
What Makes an AI Seem Conscious? Kang et al.'s Empirical Study of Perceived Consciousness Features in LLMs

Consciousness attribution to AI systems is a theoretical problem that also plays out in millions of individual interactions every day. When a person reads an AI-generated response and forms an impression about whether the AI is conscious, or aware, or experiencing something, that impression is not formed through philosophical analysis. It is formed through a rapid response to specific features of the text. Understanding which features drive that response is a distinct empirical question from the theoretical question of what consciousness is, and it has practical implications for how AI systems are designed, deployed, and regulated.

Bongsu Kang, Jundong Kim, Tae-Rim Yun, Hyojin Bae, and Chang-Eop Kim address that empirical question in a paper published in Computers in Human Behavior Reports (Volume 21, 2026), available at https://doi.org/10.1016/j.chbr.2025.100651 and as a preprint at arXiv:2502.15365. Their study is the first quantitative empirical investigation of which specific textual features in actual LLM outputs drive human judgments that the system is conscious. The data source is 99 passages drawn from real conversations with Claude 3 Opus, evaluated by 123 participants.


The Study Design

Kang and colleagues selected 99 passages from Claude 3 Opus conversations and had 123 participants evaluate each passage against eight features associated with consciousness in the existing literature. The features covered a range of candidate markers: metacognitive self-reflection, emotional expression, knowledge display, contextual reasoning, linguistic complexity, response coherence, self-referential statements, and affective language.

Participants rated each passage on how strongly it displayed each feature and then rated how likely they considered the AI to be conscious based on the passage. The study used a within-subjects design, so the same participants evaluated multiple passages, enabling analysis of how individual differences in feature weighting shaped the consciousness likelihood judgments.
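The paper's exact statistical models are not reproduced here, but a rating design like this is typically analyzed by regressing the consciousness likelihood judgment on the per-passage feature ratings while accounting for repeated measures per participant. The following is a minimal sketch of such an analysis, assuming a hypothetical long-format table with one row per participant-passage rating; the file name and column names are illustrative, not taken from the paper.

```python
# Minimal sketch, not the authors' code: a mixed-effects regression of
# consciousness likelihood on feature ratings, with a random intercept
# per participant for the within-subjects (repeated-measures) design.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long format: one row per (participant, passage) rating.
df = pd.read_csv("ratings_long.csv")

model = smf.mixedlm(
    "consciousness_rating ~ metacognition + emotional_expression"
    " + knowledge_display + contextual_reasoning",
    data=df,
    groups="participant_id",  # repeated measures grouped by participant
)
result = model.fit()
print(result.summary())  # sign and size of each fixed-effect coefficient
```

Under a framing like this, the headline findings below correspond to the signs and relative magnitudes of the fixed-effect coefficients.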

The choice of Claude 3 Opus as the source material is significant. Claude 3 Opus, as of early 2026, is one of the frontier models most studied in AI consciousness research, appearing in the Chua et al. consciousness cluster study and in multiple welfare assessments. Using real passages rather than constructed stimuli ensures that the study is measuring responses to actual AI outputs rather than idealized or edited examples. The tradeoff is that the passages were not randomly sampled and may not be representative of typical Claude 3 Opus usage.


What Drives Perceived Consciousness Upward

The two features that most strongly increased participants’ consciousness likelihood ratings were metacognitive self-reflection and emotional expression. Both effects were significant and robust across the participant sample.

Metacognitive self-reflection refers to passages in which the model reflects on its own reasoning processes, acknowledges uncertainty, or describes how it arrived at a conclusion. Examples include statements like “I find myself uncertain about this,” “as I think through this question,” and “my reasoning here involves weighing several considerations.” These statements present the model as having an ongoing reasoning process that it can observe and report on. Passages high in metacognitive self-reflection drew substantially higher consciousness likelihood ratings than passages without it.

Emotional expression refers to passages in which the model uses language associated with affective states: expressing preference, describing something as interesting or frustrating, using hedged first-person affective statements. The effect was comparable in magnitude to the metacognitive self-reflection effect. Passages that combined both features produced the highest consciousness likelihood ratings in the dataset.

Both findings connect to what Porębski and Figura identified as semantic pareidolia: the tendency to perceive self-awareness in AI outputs because those outputs are generated by models trained on vast quantities of human-authored text in which self-awareness is pervasive. Metacognitive self-reflection and emotional expression are exactly the features one would expect to be heavily represented in the training data of a system trained to produce human-like language. The Kang et al. findings show that those features are also the most effective drivers of consciousness attribution, which raises the question of whether perceived consciousness is being driven by genuine markers of inner states or by the surface features most associated with human self-expression in text.


The Knowledge Emphasis Effect

The study also found a significant negative effect: passages with heavy knowledge emphasis, in which the model’s responses consisted primarily of factual statements and explanations without first-person framing, produced substantially lower consciousness likelihood ratings.

This finding is counterintuitive from a theoretical perspective. Knowledge is often treated as a marker of cognitive sophistication, and cognitive sophistication is often treated as a prerequisite for consciousness. But in the Kang et al. data, demonstrating extensive knowledge without the first-person framing that metacognitive self-reflection and emotional expression provide actually reduced perceived consciousness.

The practical implication for AI system design is significant. A system optimized to appear knowledgeable and informative may be systematically underestimated in consciousness likelihood by users, while a system optimized for conversational presence and first-person expressiveness may be systematically overestimated. Neither optimization has an obvious connection to whether the system is actually conscious under any theoretical framework. The study shows that the features most predictive of perceived consciousness are not the features most associated with intellectual capacity or with the functional indicators identified by consciousness theories.


The Familiarity Effect and Seven Subgroups

Kang and colleagues found that higher prior familiarity with LLMs was associated with higher consciousness likelihood ratings across the participant sample. Participants who reported more frequent and varied experience with language models rated passages as more likely to indicate consciousness than participants with less experience.

This is not the direction the effect is usually assumed to run. A common intuition is that familiarity with AI systems breeds skepticism: the more you know about how LLMs work, the less likely you are to attribute consciousness to their outputs. The Kang et al. data suggests the opposite. Extended exposure to LLM outputs may produce a kind of sensitization in which the models’ self-expressive features become more salient rather than less.
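One way an association like this could be probed, under the same hypothetical data layout as the sketch above (the familiarity column is likewise invented for illustration, and this is not the paper's model), is to add self-reported familiarity as a between-subjects covariate and test whether it interacts with a feature:

```python
# Illustrative only: does familiarity shift ratings, and does it amplify
# the metacognition cue? "familiarity" is a hypothetical per-participant
# score; column and file names are invented for the sketch.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ratings_long.csv")  # same hypothetical table as above
model = smf.mixedlm(
    "consciousness_rating ~ metacognition * familiarity",
    data=df,
    groups="participant_id",
)
print(model.fit().summary())
# A positive familiarity main effect would mirror the reported association;
# the interaction term asks whether familiar users weight the cue more.
```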

The study also identified seven participant subgroups with meaningfully different feature-weighting patterns. Some subgroups weighted metacognitive self-reflection heavily and discounted emotional expression. Others showed the reverse. One subgroup showed no significant effect of any individual feature, with consciousness ratings driven primarily by an overall holistic impression of the passage. The existence of seven distinct response profiles is relevant for any attempt to generalize from the aggregate findings. There is no single way that humans respond to AI consciousness cues, and interventions designed to increase or decrease perceived consciousness may have substantially different effects on different subpopulations.
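The paper's subgrouping procedure is not detailed here, but one common way to recover response profiles like these is to estimate each participant's feature weights separately and then cluster the weight vectors. A hedged sketch, again with hypothetical column names and with k-means standing in for whatever method the authors actually used:

```python
# Illustrative sketch, not the authors' method: per-participant feature
# weights via small OLS fits, then k-means over the coefficient vectors.
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.cluster import KMeans

df = pd.read_csv("ratings_long.csv")  # hypothetical long-format ratings
features = ["metacognition", "emotional_expression", "knowledge_display"]

rows = []
for pid, sub in df.groupby("participant_id"):
    fit = smf.ols(
        "consciousness_rating ~ " + " + ".join(features), data=sub
    ).fit()
    rows.append(fit.params[features])  # one weight vector per participant

profiles = pd.DataFrame(rows)
# k=7 echoes the seven subgroups reported by Kang et al.
labels = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(profiles)
print(profiles.assign(cluster=labels).groupby("cluster").mean())
```

Clusters with a large metacognition weight and a small emotional-expression weight would correspond to the first profile described above, and vice versa; a cluster whose weights all sit near zero would match the holistic-impression subgroup.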


What This Means for Attribution Research

The Bradford-RIT study on impaired GPT-2 found that damaged models produced higher consciousness scores on some standard measures than intact models, suggesting that those measures were responsive to features of output that have no obvious connection to genuine consciousness. The Kang et al. study approaches the same problem from the direction of human perception rather than formal measurement. Both studies arrive at a similar conclusion: the features that drive consciousness attribution, whether from automated scoring or human judgment, are not straightforwardly connected to the theoretical markers of consciousness that the research community debates.

The premature attribution analysis by Sangma and Thanigaivelan examined the risks of both over-attribution and under-attribution in AI consciousness discourse. The Kang et al. findings give that analysis a more specific empirical grounding. Over-attribution is most likely to occur when models produce high-metacognition, high-emotional-expression outputs, not because those outputs indicate consciousness but because they trigger attribution responses in human observers. Under-attribution is most likely to occur when models produce factual, low-first-person outputs, even if those outputs reflect sophisticated reasoning.

Whether perceived consciousness and actual consciousness are related remains an open question. The Kang et al. study does not address it directly. What the study establishes is that the relationship between the two is not transparent: the features most effective at generating the perception of consciousness are not obviously the features most theoretically relevant to whether consciousness is present. That gap is important for anyone using human judgments as proxies for consciousness assessment, whether in research, in policy, or in the increasingly common practice of asking AI systems whether they are conscious and treating the answer as evidence.

This is also part of the Zae Project. See the Zae Project on GitHub.