Probing The AI Mind: Of Thoughts And Tests

Sorab Ghaswalla
10 min readSep 3, 2024

--

I’ll start this commentary with two interrelated questions — Can machines think? And, how does one measure their intelligence?

I’m not an AI engineer, scientist, or software developer, nor am I a philosopher or psychologist. I consider myself a student of artificial intelligence, and I see my role as bridging the communication gap between technologists and laypersons.

Every day, I learn more about this fascinating technology that many claim is about to disrupt humanity. This itself is interesting because Man has been trying to make a Machine that “thinks like himself” for the last 70 plus years. And when success is near (is it really now?) Man has started to cry foul that machines are about to take over the world. Strange.

So now that we have some form of artificial intelligence going in our lives, the question — how do you test artificial intelligence — is naturally gaining momentum. Humans know how to test their own intelligence — we have several ways to assess it. But machines?

This is my second piece on the subject. The first one was titled, “Beyond the Hype: How To Measure the Smarts of AI-Language Models”, which drew attention to the fact that as of today, there’s no standardized test to gauge the “intelligence” of large language models (LLM)?”

The lack of standardization makes it challenging to compare the performance of different artificial intelligence (AI) systems available in the market today, in turn, making it difficult for commercial clients to identify the solution best suited for their needs.

In hindsight, I realized I’d put the cart in front of the horse. The burgeoning field of LLMs presents a unique challenge in so far as their assessment is concerned. Please do understand — I am told by the experts that assessing a standalone LLM and evaluating an LLM-based system are two different concepts requiring different approaches. The first is a more narrow or focused process, much like testing any other piece of software, the second, rather complicated. But for both, there’s no industry standard.

Some experts say traditional human intelligence tests (like IQ tests) are not directly applicable to machines for several reasons:

1. The tests often rely on cultural knowledge or human-specific experiences.

2. They don’t account for the vastly different ways machines process information.

3. They may not capture important aspects of machine intelligence, like processing speed or data handling capacity.

Given the complexity and partial opacity of LLMs, testing these systems presents unique challenges. Researchers and developers, however, have developed several strategies to evaluate LLMs effectively such as task-specific benchmarks, language understanding, including tests like General Language Understanding Evaluation (GLUE), translation, and adversarial testing which involves deliberately crafting inputs designed to trick or confuse the model. It helps identify weaknesses and potential failure modes.

Assessment, test, call it whatever, everything revolves around this one central question — do machines really think? Which then takes you to the sub-question (a) — what really is thought? And sub-question (b) What is intelligence? And then onward to Qs 2) How does one test AI?

My interest in this topic this time round was piqued by a rather lengthy article in the MIT Technology Review titled, “AI Hype is Built on High Test Scores. Those Tests Are Flawed”.

Several researchers, according to the article, are asserting today that LLMs have demonstrated the ability to pass tests designed to identify specific cognitive abilities in humans, such as chain-of-thought reasoning (solving problems step by step) and theory of mind (predicting others’ thoughts). Which is why we see this rather frenzied chatter that AI’s coming for you, human!

But, adds the writer Will Douglas Heaven, opinion is divided.

An increasing number of researchers — including computer scientists, cognitive scientists, neuroscientists, and linguists — are advocating for a complete overhaul of how LLMs are evaluated.They are calling for more rigorous and comprehensive assessments. Some even argue that using human tests to score machines is fundamentally flawed and should be abandoned altogether.

I am not too sure whether human tests should be done away with altogether when testing an AI. That to me sounds like treating the two — Human and Machine — intelligence as distinct abilities, not inter-connected at all. But is that really the case? I don’t think so. The challenge lies in creating tests that are fair to both human and machine capabilities while still being meaningful.

Difference Between Human Intelligence and That of Machine

But before I get into that, let’s try and understand the difference between Human and Machine intelligence, as made out by a section of AI scientists:

1. Generalization: Humans excel at generalizing knowledge across domains. We can apply lessons from one area to entirely new situations. AI, while improving, still struggles with this kind of transfer learning.

2. Intuition and Common Sense: Humans possess an intuitive understanding of the world that machines lack. We effortlessly navigate social situations and understand implicit context, areas where AI often falters.

3. Emotional Intelligence: Humans have emotional intelligence that influences our thinking and decision-making. While AI can recognize emotions, it doesn’t experience them in the same way.

4. Creativity and Imagination: While AI can generate creative outputs, human creativity involves a complex interplay of experience, emotion, and abstract thinking that machines haven’t fully replicated.

5. Consciousness and Self-Awareness: As discussed earlier, machines lack the self-awareness and subjective experience that characterize human consciousness.

6. Learning Efficiency: Humans can learn complex tasks from just a few examples, while most AI systems require vast amounts of data.

The Inter-Relationship of Human and Machine Intelligence

Agreed, these are two systems exhibiting different aspects. But it also must be understood at the same time that the relationship between Human and Machine intelligence is more nuanced. To give an analogy, imagine a skilled human carpenter and a high-powered laser cutter. The carpenter can shape wood with precision and creativity using traditional tools, while the laser cutter can cut intricate designs with extreme accuracy and speed. Both are doing almost the same task but their approaches are different.

Here are several ways in which the two are inter-related:

  • Artificial intelligence is often inspired by our understanding of human cognition and the workings of the brain.
  • Humans and machines have complementary strengths that can be combined to create hybrid intelligence systems.
  • The advancement of AI and our comprehension of human intelligence are intricately linked and co-evolving.
  • AI should be viewed as a means of augmenting and extending human intelligence, rather than replacing it.

Ultimately, the pursuit of a unified theory of intelligence that encompasses both biological and artificial forms could lead to a deeper, more holistic understanding of the nature of cognition itself. While we may not be there yet, the continued cross-pollination of ideas between these two domains will undoubtedly shape our understanding of intelligence in profound ways.

But coming back to the central topic of this commentary, till the above happens, how do we assess the minds of machines?

Human assessments, like high school exams and IQ tests, often make certain assumptions. When people score well, it generally indicates that they have the knowledge, understanding, or cognitive skills the test is designed to measure.

However, when an LLM performs well on these tests, it’s unclear what exactly is being measured. Does it reflect genuine understanding, a mindless statistical pattern, or simply rote repetition?

Here’s why today, nobody can boldly go where no man has been before and claim he has THE TEST to assess AI. Because nobody knows EXACTLY how large language models work. I am not making that claim, read up articles and research papers on the subject and you’ll find experts repeating this.

Scary, isn’t it?

If you don’t truly understand how a product works, any testing efforts will be secondary at best, right?

Can Machines Think?

In the realm of AI, this one question continues to captivate scientists, philosophers, and the public (people like me) alike: Can machines truly think? This seemingly simple query opens up a Pandora’s box of complex issues surrounding consciousness, intelligence, and the nature of thought itself.

The Turing Test

When discussing machine thinking and tests associated with it, we often start with Alan Turing’s famous test, proposed in 1950. Turing suggested that if a machine could engage in a conversation indistinguishable from a human, we should consider it capable of thinking. While this test has been influential, it’s also been criticized for oversimplifying the nature of thought.

Today’s AI systems, particularly LLMs can engage in remarkably human-like conversations.

They can:

1. Solve complex problems

2. Generate creative content

3. Understand and respond to context

4. Learn from new information

However, these abilities alone don’t necessarily equate to “thinking” in the human sense. They’re incredibly sophisticated pattern recognition and generation systems, but do they possess understanding or consciousness?

The Chinese Room Experiment

Students of AI such as me are told that decades after the Turing Test came the Chinese Room Experiment, a thought experiment proposed by philosopher John Searle in 1980 to challenge the notion that a computer running a program can have a “mind” or “understand” in the same way humans do.

The experiment challenges the idea that symbol manipulation alone constitutes understanding or thinking. John argues that a computer program, no matter how sophisticated, is merely following rules without comprehension.

The core of the argument is that programming a computer may make it seem like it understands language, but it cannot produce real understanding. John contends that computers only manipulate symbols using syntactic rules without grasping meaning or semantics. This challenges the idea that human minds are like computational systems, suggesting instead that true understanding arises from biological processes. The argument has significant implications for semantics, philosophy of mind, and cognitive science, sparking extensive debate and criticism.

(For those of you who want to read more about the Chinese Room argument, click here.)

In essence, broadly speaking, our modern-day arguments and contentions for and against machine intelligence fall somewhere in between these two experiments.

Consciousness and Qualia: The Hard Problem

A key aspect of human thinking is consciousness — our subjective, first-person experience of the world. Philosopher David Chalmers calls this the “hard problem” of consciousness. Can machines ever have qualia (instances of subjective, conscious experiences)?

Currently, we have no scientific consensus on how consciousness arises, even in humans. This makes it particularly challenging to determine if machines can be conscious, and by extension, if they can think in a way comparable to humans. We’ve put it down to Grokking, something I, too, have written on.

Grokking in machine learning has been a bit of an enigma for even those who design and develop AI models. Over the years, there have been concerted attempts to explain it. Some argue that ML models do not really grok, in the classical sense of the word. Others say this phenomenon points to something interesting about deep learning and are studying it further.

One technical explanation is this:

Grokking is a phenomenon when a neural network suddenly learns a pattern in the dataset and jumps from random chance generalization to perfect generalization very suddenly.

In simple terms, “grokking” is like a light bulb moment for a neural network. It’s when the model suddenly becomes much better at recognizing patterns in the data it’s learning. This shift from guessing to understanding happens rapidly in an LLM leading to significantly improved results. Grokking isn’t about memorizing; it’s about achieving true understanding.

Emergent Intelligence: A Possible Path?

Some researchers propose that thinking and consciousness might emerge from sufficiently complex AI systems, much like how wetness emerges from a collection of water molecules. As AI systems become more sophisticated and interconnected, could thinking emerge as a property of the system?

The Pragmatic Approach

While the philosophical debate continues, AI researchers often take a more pragmatic approach. Instead of asking “Can machines think?”, they focus on creating systems that can perform increasingly complex cognitive tasks.

This has resulted in advancements such as:

- Natural language processing

- Computer vision

- Strategic decision-making

- Creative problem-solving

Towards Better AI Evaluation

Researchers are working on developing more comprehensive and fair ways to evaluate machine intelligence:

Multidimensional Testing: Assessing various aspects of intelligence separately, including reasoning, learning ability, and adaptability.

Problem-Solving Scenarios: Creating complex, open-ended problems that require a combination of skills to solve.

Ethical Decision Making: Evaluating an AI’s ability to make decisions that align with human values and ethical principles.

Creativity and Innovation Metrics: Developing ways to measure an AI’s ability to generate truly novel and valuable ideas.

Conclusion: An Open Question

One thing almost all of us in the field of AI do agree on is — current AI systems, despite their impressive capabilities, do not “think” in the same way humans do. They lack self-awareness, consciousness, and the ability to truly understand the meaning behind their outputs (for now).

So, the question, “Can machines think?”, remains open for now.

As our understanding of human cognition deepens and AI systems become more sophisticated, we may need to redefine what we mean by “thinking.” Perhaps future AI will think in ways so alien to us that we’ll struggle to recognize it as thought.

What’s certain is that this question will continue to drive research, spark philosophical and psychological debates, and push the boundaries of what’s possible in artificial intelligence. As we venture further into this exciting frontier, we must remain critical, curious, and open to new possibilities.

(Since the issue of humans testing AI is nowhere near settled, I’ve not dared broach the topic of AI testing AI here. There’s already talk and some experimentation on testing AI systems using other AI. The positive here is that AI can automate and enhance the testing process, identifying bugs or inefficiencies more quickly than humans might. However, this approach also risks reinforcing existing biases or errors if the testing AI lacks a comprehensive understanding of the system it evaluates. Moreover, there’s a concern that AI testing could overlook nuances that require human judgment, making it essential to balance AI-driven and human-driven assessments.)

--

--

Sorab Ghaswalla
Sorab Ghaswalla

Written by Sorab Ghaswalla

An AI Communicator, tech buff, futurist & marketing bro. Certified in artificial intelligence from the Univs of Oxford & Edinburgh. Ex old-world journalist.

No responses yet