The Consciousness AI - Artificial Consciousness Research Emerging Artificial Consciousness Through Biologically Grounded Architecture
This is also part of the Zae Project Zae Project on GitHub

Consciousness Indicators, Mimicry, and the Problem of Internal Variants

The indicators framework developed by Patrick Butlin, Robert Long, David Chalmers, and colleagues has functioned as the closest thing the AI consciousness field has to a shared evaluation standard. Published in Trends in Cognitive Sciences, it derives computational properties from existing theories of consciousness, asks whether those properties are present in AI architectures, and uses the results to form probabilistic assessments of AI consciousness likelihood. The framework has been productive and has generated significant research attention. It has also attracted sustained methodological criticism.

Cyriel M.A. Pennartz (University of Amsterdam) published a response in the same journal on April 24, 2026, continuing the debate the original Butlin et al. paper opened. The commentary, “Consciousness indicators, mimicry, and internal variants” (Trends in Cognitive Sciences, DOI: 10.1016/j.tics.2026.00080-X), identifies two complications the original framework did not fully resolve. Both have direct implications for how indicator-based assessments should be interpreted when applied to AI systems.

The Mimicry Problem

The first complication is mimicry. An AI system can be trained, deliberately or through general optimization pressure, to display the behavioral signatures of consciousness indicators without any corresponding inner experience. A system that generates self-reports consistent with higher-order thought, or produces outputs that appear to reflect a global broadcast of information, could be doing so through processes that instantiate neither HOT nor GWT in any functionally or architecturally meaningful sense. The indicator was satisfied in outputs, not in the underlying computation.

This is a sharpening of a concern the original Butlin et al. framework acknowledged but treated as a difficulty rather than a structural limit. Pennartz argues the problem has a more direct implication: purely behavioral indicator assessment cannot distinguish genuine instantiation from trained performance. A system optimized on human-generated text has learned to produce outputs that match the behavioral signatures humans associate with consciousness, because those signatures are pervasive in the training data. The behavioral surface of the indicator is available to any sufficiently capable language model without the model having developed the internal structure the indicator was meant to track.

The implication Pennartz draws is that purely internal indicators must do more of the work. Behavioral outputs are necessary evidence but not sufficient. The [19 researcher checklist published by Butlin et al.]((/posts/validating-ai-consciousness-indicators-butlin-trends-cognitive-sciences/) already incorporated internal architectural criteria precisely because behavioral mimicry was recognized as a problem. Pennartz’s point extends this: the mimicry threat is severe enough that the evidential weight of behavioral indicators needs explicit downward revision, and the field needs a clearer account of what counts as internal rather than behavioral evidence for each indicator.

The Internal Variants Problem

The second complication is structural. The same consciousness indicator can be instantiated in architecturally distinct ways that carry different epistemic weight for consciousness attribution. Global workspace theory posits that a global broadcast of information enables access consciousness. But there are multiple ways to implement something that functions as a global broadcast: a centralized attention mechanism, a message-passing architecture, a shared memory pool, a distributed activation pattern across layers. Each satisfies the functional description, and each might satisfy the behavioral signatures. Whether they are equally strong evidence for consciousness, on the frameworks that generate the indicator, is not obvious.

This matters because AI architectures implement indicators through means that were not anticipated when the indicators were derived from neuroscientific theories. The indicators were validated, to whatever degree they were validated, against biological systems. When the same indicator appears in an architecture that implements it through a different computational mechanism, the inductive step from indicator presence to consciousness likelihood requires an additional assumption: that the indicator-consciousness relationship holds across implementation variants. That assumption has not been tested.

The Bradford and RIT studies that found impaired GPT-2 produced higher consciousness-style indicator scores than the intact model illustrate exactly this problem. The impaired model’s behavior may have matched indicator signatures through degraded rather than enhanced internal organization. This result is consistent with Pennartz’s concern: uniform scoring across architectural variants produces assessments whose evidential interpretation depends on assumptions about implementation equivalence that have not been established.

What Pennartz Proposes

Pennartz does not argue for abandoning the indicator methodology. The framework remains the most systematic approach available for forming calibrated assessments of AI consciousness likelihood. What he argues is that the methodology must be supplemented with behavioral-cognitive methods specifically designed to probe internal organization rather than surface outputs.

The distinction matters. Behavioral-cognitive methods, such as the probe classifier approach used by Immertreu and colleagues to test whether RL agents develop self-models and world-models, interrogate internal activations rather than outputs. They can in principle distinguish a system that has developed a genuine internal structure corresponding to an indicator from a system that produces outputs consistent with that indicator through a different mechanism. This moves the assessment from the behavioral surface into the computational substrate, reducing the mimicry problem and making the internal variants problem more tractable.

The supplementation proposal connects Pennartz’s critique to the broader methodological debate the field is currently running. Where Florentin Koch argues the calibration problem is severe enough to warrant redirecting effort toward biologically grounded engineering, Pennartz takes a less radical position: the indicator framework can be preserved but needs methodological additions. The two positions are compatible. Both identify the same structural gap between what behavioral indicators can establish and what consciousness attribution requires.

Implications for Attribution Research

Kang and colleagues’ empirical study finding that metacognitive self-reflection and emotional expression drive perceived AI consciousness in human observers provides the practical dimension of the same problem Pennartz addresses theoretically. Kang et al. found that the textual features most effective at driving human consciousness attribution are precisely those that a capable language model can produce through training pressure alone. Humans respond to behavioral and linguistic outputs when forming consciousness attributions. The indicator methodology, if it relies on similar output-level evidence, risks building the same attribution biases into its formal framework.

Pennartz’s separation of behavioral from internal indicators is therefore also a correction to the attribution problem. A methodology that depends on behavioral outputs as primary evidence will track what humans perceive as conscious behavior rather than what internal organization a system has developed. These can diverge substantially, and the divergence goes in both directions: a system with developed internal organization consistent with consciousness may produce behaviorally atypical outputs, while a system without that organization may produce highly typical ones.

The practical consequence is a recommendation to treat behavioral consciousness indicators as screening criteria rather than primary evidence. A system that fails behavioral indicators is unlikely to have the internal organization that matters. A system that passes them may or may not. Internal assessment is the necessary next step, and the field does not yet have standardized methods for conducting it across AI architectures.

The Trends in Cognitive Sciences commentary by Pennartz is brief and technically compact, as journal commentaries are. But the two problems it identifies, mimicry and internal variants, have been present in the indicators framework since its introduction. Naming them precisely, with explicit proposals for how methodological supplementation would address them, is a contribution the field needs as it moves from theoretical development to applied assessment.

This is also part of the Zae Project Zae Project on GitHub