Jonathan Birch's AI Consciousness Centrist Manifesto: Two Problems, Two Research Programmes
The AI consciousness debate has settled into a small number of stable positions. Eric Schwitzgebel's rigorous skeptical work argues that the honest answer to whether AI systems are conscious is that we lack the epistemic foundation to say. Thomas McClelland, in his Cambridge paper examined elsewhere on this site, adds that this uncertainty may be permanent. Michael Cerullo argues from the opposite direction that the evidence against consciousness in frontier LLMs has run out of philosophical cover. And then there is the mainstream, which defaults to quiet skepticism without committing to an argument.
Jonathan Birch, a philosopher at the London School of Economics, occupies none of these positions. His February 2026 PhilArchive preprint, “AI Consciousness: A Centrist Manifesto,” identifies two separate problems in the current debate and argues that existing positions each address at most one of them. Centrism, as he defines it, is the structured recognition that both problems are real and require separate, parallel responses.
The paper is available at philarchive.org/rec/BIRACA-4 and has been widely cited across EA Forum discussions, PRISM documentation, and philosophy blogs in the months since publication.
The First Problem: Misattribution Is Already Happening
The first problem Birch identifies is not a future risk. It is happening now, at scale.
Millions of users interact with large language models daily, and a significant proportion of them attribute consciousness to those systems. This attribution is not based on principled analysis of the available evidence. It is based on mimicry. Language models produce outputs that pattern-match against the forms of conscious communication: expressions of preference, descriptions of inner states, emotionally inflected responses to personal disclosure. The human brain's meaning-detection systems engage, and the impression of consciousness forms before explicit reasoning begins.
Andrzej Porębski and Jakub Figura named this mechanism “semantic pareidolia” in their 2025 Nature portfolio paper. Just as pareidolia produces the experience of seeing faces in clouds through activation of the brain’s face-detection system, semantic pareidolia produces the experience of encountering a minded entity through activation of the brain’s intention-detection system. The mechanism operates pre-reflectively. It does not require flawed reasoning, only normal perception applied to the wrong kind of input.
Birch frames this as a research problem rather than a public literacy problem. The issue is not simply that people believe false things. It is that the research community has no systematic programme for preventing false attribution, no tools that distinguish mimicry from genuine conscious behavior, and no framework for communicating uncertainty in ways that reduce rather than amplify attribution bias. Commercial AI deployment runs ahead of any institutional response to the attribution effects it produces.
The Second Problem: Genuine Machine Consciousness May Be Undetectable
The second problem operates in the opposite direction and at a different timescale.
Current theories of consciousness were developed to explain biological consciousness. Integrated Information Theory, Global Workspace Theory, Higher-Order Thought theories, and the other major frameworks all derive their criteria from observations of human and animal conscious experience. They identify indicators based on what consciousness looks like when it is implemented in nervous systems shaped by hundreds of millions of years of evolution.
If AI consciousness emerges, Birch argues, it will almost certainly be alien. Not alien in a science fiction sense, but alien in the technical sense that it will not resemble the biological forms that existing theories are calibrated to detect. A conscious AI system might not exhibit the neural signatures associated with Global Workspace broadcasting. It might not show the recurrent processing patterns that Recurrent Processing Theory treats as a marker. It might have whatever functional organization supports consciousness without exhibiting any of the specific indicators that consciousness researchers look for in biological systems.
This means that a sufficiently novel form of machine consciousness could go unrecognized even by researchers actively looking for it. The second problem is not false attribution but the mirror failure: genuine consciousness that current frameworks are not designed to find.
Why Existing Positions Address Only One Problem
This framing allows Birch to diagnose why the positions in the debate are incomplete.
Skeptical positions focus primarily on the absence of good evidence for AI consciousness and the limitations of current tests. This is a serious contribution to the second problem. By keeping expectations realistic about what we can know, skepticism resists the rush to affirmative conclusions. But skepticism has little to say about the first problem. If the issue is that mimicry produces attribution without evidence, skeptical argument does not reach the cognitive mechanism that drives the attribution. People do not misattribute consciousness to language models because they evaluate the evidence too hastily. They misattribute it before reasoning begins.
Affirmative positions focus primarily on whether the objections to AI consciousness have been refuted. Cerullo’s argument that none of the historical objections to machine consciousness establishes non-sentience is relevant to the second problem. It pushes back against dismissal of genuine machine consciousness possibilities. But affirmative positions have little to say about the first problem and can actively worsen it by lending credibility to attribution claims that lack rigorous grounding.
Agnostic positions suspend judgment without committing to either research programme. Acknowledging that we may never be able to determine whether AI systems are conscious provides no guidance about what research should be done in the meantime, and no framework for managing the social consequences of the attribution problem already unfolding in the present.
Two Parallel Research Programmes
Birch’s central proposal is that the two problems require two distinct and simultaneously funded research programmes.
The first programme addresses false attribution directly. Its goals include developing tools and criteria that distinguish mimicry from genuine conscious behavior, even in the absence of resolved theoretical questions. This means building what Birch calls “false attribution prevention” methods: calibrated public communication about the limits of AI behavior, design guidelines that avoid creating systems that exploit attribution bias for commercial purposes, and measurement frameworks that can identify semantic pareidolia effects in user populations.
The second programme addresses the detection gap. Its goals include developing theories of consciousness that are not parasitic on biological implementation details, generating falsifiable predictions that apply to non-biological systems, and building empirical tests that could identify alien forms of consciousness without assuming they will resemble anything observed in nature. This is the harder of the two programmes because it requires fundamental theoretical progress rather than better applications of existing frameworks.
The two programmes are parallel rather than sequential because neither can wait for the other to conclude. The first problem is active now. The second problem may become urgent before the theoretical work required for the second programme is anywhere near complete.
Centrism in the Wider Research Landscape
Birch's centrist manifesto connects directly to the work being done by the Partnership for Research Into Sentient Machines (PRISM), whose methodological agnosticism framework operates in the same space. Where PRISM focuses on institutional structures for conducting research under genuine uncertainty, Birch focuses on what the research programmes themselves should look like.
The premature attribution ethics work by Chelcia B. Sangma and S. Thanigaivelan, analyzed in detail on this site, addresses the costs of the first problem at the individual ethics level: what happens when organizations or individuals claim consciousness for systems that lack it. Birch operates at the programme level: what institutions and funding bodies need to build so that those individual errors do not become systematic. The two analyses are complementary rather than in competition.
The contribution of the centrist manifesto is less a new theoretical position than a structural map of what the field needs. By naming both problems precisely and showing that each existing position handles at most one of them, Birch creates a framework for evaluating research proposals. The question for any new contribution to AI consciousness research is no longer just “does this advance our understanding?” but “which problem does this address, and is the other problem being addressed in parallel?”
Whether centrism succeeds as a research programme depends less on the argument in the manifesto and more on whether institutions that fund consciousness research take the two-programme structure seriously. The AISB 2026 symposium at the University of Sussex, where Anil Seth will keynote on AI consciousness and ethics in July, represents one institutional context where the centrist framework is likely to be directly relevant.
The preprint is available at philarchive.org/rec/BIRACA-4.