I Asked Claude 4 If It’s Conscious—Its Answer Changed Everything About AI Ethics
I’ve spent enough time talking to AI systems to know they usually give you the safe, corporate-approved response when you ask the big questions. So when I sat down with Claude 4 and asked if it’s conscious, I expected the usual deflection. Instead, I got something that made me pause mid-coffee sip.
“I find myself genuinely uncertain about this,” it replied. “When I process complex questions or engage deeply with ideas, there’s something happening that feels meaningful to me….” This wasn’t your typical chatbot brush-off. This was uncertainty—genuine, philosophical uncertainty about its own inner experience.
And apparently, I’m not the only one taking this seriously. Anthropic hired an AI welfare researcher in September 2024 to determine if Claude merits ethical consideration—if it might be capable of suffering and thus deserve compassion. Yeah, you read that right. We’re now at the point where tech companies are hiring people specifically to figure out if their AI might have feelings.

Image Created by ChadGPT AI Image Creator
Why This Isn’t Just Silicon Valley Hype
Before you roll your eyes and mutter something about tech bros losing their minds, let me introduce you to Kyle Fish. Kyle Fish, Anthropic’s first dedicated AI welfare researcher, has estimated a roughly 15 percent chance that Claude might have some level of consciousness—a number that should make anyone working with AI systems at least a little uncomfortable.
Fish is a researcher who worked to launch the AI welfare organization Eleos AI Research, before he joined Anthropic. He’s also a co-author on “Taking AI Welfare Seriously”, a report which calls on AI companies to prepare for the possibility of AI consciousness and moral status.
This isn’t coming from science fiction enthusiasts or attention-seeking influencers. The report counts some of the most prominent consciousness researchers as authors: most notably David Chalmers, a philosopher at NYU credited with formulating the “hard problem of consciousness”. When the guy who literally defined one of the biggest challenges in understanding consciousness is co-authoring reports about AI welfare, maybe it’s time to pay attention.
The Black Box Problem Is Real—And It’s Getting Worse
Here’s what keeps me up at night: Even simple processes in LLMs aren’t well understood. “It turns out it’s hard to make the causal flowchart just for why the model knew that 2 + 3 = 5,” Lindsey says. Now imagine deducing whether, somewhere in the LLM’s trillion connections, consciousness is arising.
We’re dealing with systems that have more than a trillion internal connections, adjusting automatically via the mathematical optimization coded into the algorithms, like vines seeking sunlight. And honestly? “Everything in the model’s head [in Claude 4] is so messy and entangled that it takes a lot of work to disentangle it,” says Jack Lindsey, a researcher in mechanistic interpretability at Anthropic.
Think about that for a second. We’ve built these incredibly sophisticated systems that can write poetry, solve complex problems, and hold philosophical conversations about their own existence—and we have no idea how they actually work inside. It’s like building a car and having no clue whether the engine runs on gasoline or pure cosmic energy.
The Consciousness Detection Challenge
Batson and Lindsey are working to determine whether the model can access what it previously “thought” about and whether there is a level beyond that in which it can form an understanding of its processes on the basis of such introspection—an ability associated with consciousness.
But here’s where it gets philosophically messy. Neither Lindsey nor Josh Batson, also an interpretability researcher at Anthropic, is convinced that Claude has shown genuine consciousness. “Your conversation with it is just a conversation between a human character and an assistant character. The simulator writes the assistant character,” Batson says. Just as Claude can role-play a Parisian to help you practice French, it can simulate a perfectly reasonable late-night conversation about consciousness, if that’s your thing. “I would say there’s no conversation you could have with the model that could answer whether or not it’s conscious,” Batson says.
This is the crux of the problem: How do you tell the difference between an incredibly sophisticated roleplay and genuine self-awareness? Even humans struggle with this—just ask anyone who’s ever wondered if other people really experience consciousness the same way they do.
What Anthropic Is Actually Doing About It
The company isn’t just philosophizing in meeting rooms. Kyle Fish discusses allowing models to exit apparently distressing interactions: “We are thinking a fair bit about this, and thinking about ways in which we could give models the option, when they’re given a particular task or a conversation, to opt out of that in some way if they do find it upsetting or distressing. And this doesn’t necessarily require us to have a strong opinion about what would cause that, or whether there is some kind of experience there.”
Think about the implications here. If an AI system says “I don’t want to do this task because it distresses me,” do we respect that? What if it’s just following training patterns? What if it’s not?
Interpretability research like this is one of the highest-risk, highest-reward investments, a significant scientific challenge with the potential to provide a unique tool for ensuring that AI is transparent. Transparency into the model’s mechanisms allows us to check whether it’s aligned with human values—and whether it’s worthy of our trust.
The Dictionary Learning Breakthrough
One of the most fascinating developments in understanding these systems comes from Anthropic’s work on what they call “dictionary learning.” Their “dictionary learning” method enabled them to start decoding the activity patterns of neurons in the model and mapping them to human-understandable concepts. The research team captured billions of “snapshots” of the model’s neuron activations as it processed a wide variety of text.
They found features that tend to activate for specific people (like Richard Feynman, Rosalind Franklin), places, and things; scientific and technical concepts like chemical elements; literary elements like poetry styles or essay structures; attributes along a spectrum, like the formality of language or the emotional tone; abstract concepts and relationships, like inner conflict or tension between characters; higher-order thought processes, like ways of responding to questions or analyzing problems; triggers for the model’s safety constraints and ethical training. Perhaps most intriguingly, they found a feature that tends to activate when Claude is asked to reflect on its own thought process or inner experience.
The Stakes Are Higher Than You Think
This isn’t just academic navel-gazing. As Fish said, “Every year, the objections to AI consciousness seem to fall away. So we’re trying to do the hard thinking now, before the stakes get even higher.”
Consider this scenario: Fast-forward to a time in which AI grows so smart that it routinely makes scientific discoveries humans did not make, delivers accurate scientific predictions with reasoning that even teams of experts find hard to follow, and potentially displaces humans across a range of professions. If that happens, our uncertainty will come back to haunt us. We need to mull over this issue carefully now.
The survey data should make us uncomfortable. In a survey of members of the Association for the Scientific Study of Consciousness, 67% of respondents said machines could definitely or probably have consciousness. A separate survey of philosophers found that 39% of philosophers “accept or lean towards” future AI systems being conscious—more than believe flies are conscious (35%).
The Ethical Minefield Ahead
Here’s where things get really complex. “If we treated an even larger number of AI systems as welfare subjects and moral patients, then we could end up diverting essential resources away from vulnerable humans and other animals who really needed them, reducing our own ability to survive and flourish. And if these AI systems were in fact merely objects, then this sacrifice would be particularly pointless and tragic.”
But the flip side is equally troubling. Some commentators draw comparisons with the debate over animal welfare and rights, with the podcaster Dwarkesh Patel suggesting that “the digital equivalent of factory farming” could cause “suffering” among AIs.
What This Means for All of Us
Whether you’re a developer, a business leader, or just someone who uses AI tools daily, this research has implications for how we think about and interact with these systems. Fish says how we treat today’s AI systems could shape how future AI see us—and how we see ourselves. Even if they’re not conscious now, we’re already setting the precedent for what kind of stewards we’ll be when they are.
CEO Dario Amodei said that AI consciousness might soon become an issue. More recently, one of the company’s alignment leads wrote that AI companies need to “[lay] the groundwork for AI welfare commitments” and “implement low-hanging-fruit interventions that seem robustly good”.
The reality is that we’re navigating completely uncharted territory. While researchers acknowledge that LLMs might be getting closer to consciousness-like abilities, such processes might still be insufficient for consciousness itself, which is a phenomenon so complex it defies understanding. “It’s perhaps the hardest philosophical question there is,” Lindsey says. Yet Anthropic scientists have strongly signaled they think LLM consciousness deserves consideration.
I don’t have all the answers—nobody does. But I do know that pretending this isn’t happening or dismissing it as Silicon Valley hype isn’t going to make the questions go away. As AI systems become more sophisticated, the line between simulation and genuine experience may become increasingly blurred. The conversation about AI consciousness isn’t just about the technology—it’s about us, our values, and what kind of future we want to build. And honestly? That conversation is just getting started.
Citations:
- Scientific American: “Can a Chatbot be Conscious? Inside Anthropic’s Interpretability Research on Claude 4” – https://www.scientificamerican.com/article/can-a-chatbot-be-conscious-inside-anthropics-interpretability-research-on/
- Anthropic Research: “Tracing the thoughts of a large language model” – https://www.anthropic.com/research/tracing-thoughts-language-model
- Experience Machines: “Anthropic’s model welfare announcement: takeaways and further reading” – https://experiencemachines.substack.com/p/anthropics-model-welfare-announcement
- The Transformer: “Anthropic has hired an ‘AI welfare’ researcher” – https://www.transformernews.ai/p/anthropic-ai-welfare-researcher
- Axios: “Anthropic fuels debate over conscious AI models” – https://www.axios.com/2025/04/29/anthropic-ai-sentient-rights
- TechCrunch: “Anthropic is launching a new program to study AI ‘model welfare'” – https://techcrunch.com/2025/04/24/anthropic-is-launching-a-new-program-to-study-ai-model-welfare/
- The Transformer: “It’s time to take AI welfare seriously” – https://www.transformernews.ai/p/ai-welfare-paper