From Prediction to Perception: Why Theory of Mind is a Breakthrough Moment in AI Ethics

How Theory of Mind is Shaping AI Ethics

Jun 26, 2025

“Communication has to do with transmitting things that you don’t know.”

In the rapidly expanding world of AI, one crucial concept from developmental psychology is quietly shaping how machines “understand” us and each other: Theory of Mind (ToM).

It may sound like philosophy, but it’s one of the most practical tools we have for making machines more human-compatible. At the heart of it? The realization that not everyone shares the same mental model. In simple terms, ToM is “I know X, but you might not know X, or you might believe Y instead.”

🧠 Young children often lack this perspective-taking ability (ToM): they can’t yet understand that others have mental states, beliefs, and intentions different from their own. AI systems can show a similar gap.

⸻

🧪 A Crash Course in Theory of Mind

If you haven’t seen the Smarties experiment before, it’s worth a watch: Smarties Experiment (YouTube, start at 1:53)

Here’s how it goes:

A psychologist holds up a roll of Smarties candy and asks a child, “What do you think is inside?”

The child says: “Smarties.”

But when the psychologist opens the roll of Smarties, pencils are inside.

Now comes the twist:

The psychologist asks, “What will your friend Tommy think is in this roll of Smarties?”

The child, still processing the pencils, responds: “Pencils.”

But Tommy has never seen inside the roll, so he will assume it contains Smarties, not pencils. The child’s answer illustrates a lack of Theory of Mind: an inability to attribute different knowledge or beliefs to another person. The child can’t yet distinguish what they know from what others know.
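To make the distinction concrete, here is a toy simulation of the test in Python. It is only a sketch: the `Event` class and the `belief_of`, `predict_without_tom`, and `predict_with_tom` functions are hypothetical names invented for this illustration, not part of the experiment or of any library. The idea is that each belief is reconstructed only from what that person actually witnessed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Something that happened to a container, and who was there to see it."""
    container: str
    apparent_contents: str      # what a witness would conclude from this event
    witnesses: frozenset


def belief_of(agent: str, container: str, events: List[Event]) -> Optional[str]:
    """Replay only the events this agent witnessed; the latest one wins."""
    belief = None
    for event in events:
        if event.container == container and agent in event.witnesses:
            belief = event.apparent_contents
    return belief


def predict_without_tom(me, other, container, events):
    # Failure mode: project my own belief onto the other person.
    return belief_of(me, container, events)


def predict_with_tom(me, other, container, events):
    # Perspective-taking: reason only from what the other person has seen.
    return belief_of(other, container, events)


events = [
    Event("roll", "Smarties", frozenset({"child", "Tommy"})),  # both see the label
    Event("roll", "pencils", frozenset({"child"})),            # only the child sees inside
]

print(predict_without_tom("child", "Tommy", "roll", events))  # pencils
print(predict_with_tom("child", "Tommy", "roll", events))     # Smarties
```

The only difference between the two predictions is whose observations get replayed, which is the gap the Smarties test exposes.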

⸻

🤖 The Problem with AI Assumptions

AI systems can act like children in early development: they may lack perspective-taking. An AI may assume that knowledge is shared between agents, or between humans and machines. This becomes especially problematic in high-stakes or collaborative tasks, where such assumptions can lead to dangerous gaps in reasoning, communication, or alignment.

If an AI assumes we share the same perspective, it may skip critical information, thinking it’s already known.

If an AI trains another model, it may assume the second model “understands” motivation or belief states—without ever verifying them.

This assumption of a shared mental model—“I know it, so you must too”—is precisely the blind spot that Theory of Mind research in AI is working to fix.
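The blind spot can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the function names, the idea of representing knowledge as a set of strings, and the example facts are all hypothetical. The naive sender filters a briefing against its own knowledge, while the perspective-taking sender filters against what the listener is confirmed to know.

```python
from typing import List, Set


def naive_message(sender_knowledge: Set[str], required: List[str]) -> List[str]:
    # "I know it, so you must too": anything the sender knows gets skipped.
    return [fact for fact in required if fact not in sender_knowledge]


def tom_message(listener_knowledge: Set[str], required: List[str]) -> List[str]:
    # Include every required fact the listener is not confirmed to know.
    return [fact for fact in required if fact not in listener_knowledge]


# Hypothetical facts needed to do a task safely.
required = ["valve B is closed", "pressure limit is 40 psi", "shift ends at 18:00"]

sender_knowledge = set(required)              # the sender knows everything
listener_knowledge = {"shift ends at 18:00"}  # confirmed, e.g. by asking

naive = naive_message(sender_knowledge, required)
aware = tom_message(listener_knowledge, required)

print(naive)  # []
print(aware)  # ['valve B is closed', 'pressure limit is 40 psi']
```

The naive sender transmits nothing, because from its own point of view nothing is missing. In practice the hard part is filling in `listener_knowledge` by verifying rather than guessing.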

⸻

🧠 Google DeepMind’s Breakthrough: Machine Theory of Mind

In a landmark 2018 paper, Google DeepMind researchers explored what would happen if we tried to give AI a form of Theory of Mind. They built a system capable of modeling other agents’ point of view—essentially asking: What does the other AI know? What does it believe? What is it trying to do?

This is revolutionary because most machine learning systems are trained in isolation; they optimize for performance, not perspective. Modeling others’ beliefs and intentions, as they did in the paper, is a step towards AI collaboration, alignment, and ethics.
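For a feel of what “modeling what another agent is trying to do” can mean, here is a deliberately tiny Python sketch. It is not the neural network from the paper; it is a hand-written Bayesian stand-in, and the one-dimensional world, the `infer_goal` function, the `rationality` parameter, and the door names are all assumptions made for illustration. An observer watches an agent step left or right along a line and updates its belief about which goal the agent is heading for.

```python
import math
from typing import Dict, List


def infer_goal(positions: List[int], goals: Dict[str, int],
               rationality: float = 2.0) -> Dict[str, float]:
    """Posterior over goals, assuming the agent prefers steps that bring it
    closer to its goal (a softmax choice between stepping -1 and +1)."""
    log_post = {name: 0.0 for name in goals}  # uniform prior over goals
    for prev, cur in zip(positions, positions[1:]):
        step = cur - prev  # assumed to be -1 or +1
        for name, loc in goals.items():
            # Utility of a step = how much it reduces the distance to this goal.
            utility = {s: abs(loc - prev) - abs(loc - (prev + s)) for s in (-1, 1)}
            norm = sum(math.exp(rationality * u) for u in utility.values())
            log_post[name] += rationality * utility[step] - math.log(norm)
    # Normalize in a numerically stable way.
    peak = max(log_post.values())
    weights = {name: math.exp(v - peak) for name, v in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# The observed agent starts at 5 and takes three steps to the right.
posterior = infer_goal([5, 6, 7, 8], {"left door": 0, "right door": 10})
print(posterior)
```

After three steps toward the right, nearly all of the observer’s belief sits on the right door. The observer never sees the goal directly; it infers it from behavior.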

📖 Reference:

Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S. A., & Botvinick, M. (2018, July). Machine theory of mind. In International Conference on Machine Learning (pp. 4218–4227). PMLR.

Read the full paper

⸻

🌐 Why This Matters Now

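As generative models like ChatGPT or Gemini proliferate, and as AI systems increasingly train and oversee other AI systems, Theory of Mind becomes a design necessity. It is also a foundation for socially intelligent AI: a system can’t respond well to people whose beliefs and intentions it doesn’t model.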

If we want AI that collaborates, adapts, and aligns with human values, it must be able to model and respect different points of view. Just like in the Smarties test, failing to consider what others know (or don’t) leads to misunderstanding.

Just like children learning what’s inside the Smarties roll, machines need help understanding that what they know isn’t what everyone knows.

Psych Lab
