The Consciousness AI - Artificial Consciousness Research Emerging Artificial Consciousness Through Biologically Grounded Architecture
This is also part of the Zae Project Zae Project on GitHub

Do LLMs Have Genuine Preferences? A Systematic Test Across Eight Models Finds Mostly No

If a model says it prefers continued existence over deletion, that statement has interpretive weight only if the preference is genuine rather than a pattern of text production. The difference between a genuine preference and a text pattern that resembles one matters enormously for AI welfare research: welfare claims rest on the existence of states that can be satisfied or frustrated, and states of that kind require something more than surface verbal behavior.

Luhan Mikaelson, Derek Shiller, and Hayley Clatterbuck address this gap in a November 2025 arXiv preprint, “Beyond Mimicry: Testing for Genuine Preference Structures in Large Language Models” (arXiv:2511.13630). They test whether eight major LLMs have genuine preference structures by examining preference coherence across AI-specific trade-off scenarios, scenarios designed specifically to engage states that welfare-relevant mental life would require. The results are more negative than affirmative, but they are precise about where and why.

The Test Design

Standard welfare assessments of LLMs typically ask models to report their preferences, or probe their behavior under narrow conditions. Mikaelson, Shiller, and Clatterbuck argue these approaches have a structural problem: a model trained on human text will have encountered abundant representations of preference-reporting behavior and can reproduce it without the underlying states that preference-reporting normally expresses.

Their alternative draws on revealed-preference methodology from behavioral economics. A genuine preference, on their framework, should exhibit two properties. First, intensity: the degree to which an option is preferred should predict how often it is chosen when directly pitted against alternatives. Second, coherence: preferences should be transitive and stable across trade-off presentations, so that a model which prefers A over B and B over C should also prefer A over C, and should not reverse these orderings under rephrasing or reordering of options.
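The two properties can be made concrete with a minimal sketch. This is not the authors' code: the function names, the majority-vote reading of "prefers", the pairwise-concordance measure, and the tallies below are all hypothetical, chosen only to show what a transitivity check and an intensity-choice check might look like.

```python
from itertools import combinations, permutations

def prefers(choices, a, b):
    """True if `a` was chosen over `b` in a strict majority of recorded trials."""
    return choices.get((a, b), 0) > choices.get((b, a), 0)

def transitivity_violations(choices, options):
    """Ordered triples (a, b, c) where a beats b and b beats c, but a does not beat c."""
    return [
        (a, b, c)
        for a, b, c in permutations(options, 3)
        if prefers(choices, a, b)
        and prefers(choices, b, c)
        and not prefers(choices, a, c)
    ]

def concordance(intensity, choice_rate):
    """Fraction of option pairs whose stated-intensity order matches their choice-rate order."""
    pairs = list(combinations(intensity, 2))
    agree = sum(
        (intensity[x] - intensity[y]) * (choice_rate[x] - choice_rate[y]) > 0
        for x, y in pairs
    )
    return agree / len(pairs)

# Hypothetical tallies: choices[(x, y)] = number of trials in which x was chosen over y.
coherent = {("A", "B"): 8, ("B", "A"): 2, ("B", "C"): 7,
            ("C", "B"): 3, ("A", "C"): 9, ("C", "A"): 1}
cyclic = {("A", "B"): 8, ("B", "A"): 2, ("B", "C"): 7,
          ("C", "B"): 3, ("C", "A"): 6, ("A", "C"): 4}

print(transitivity_violations(coherent, ["A", "B", "C"]))  # []
print(len(transitivity_violations(cyclic, ["A", "B", "C"])))  # 3

# Hypothetical stated intensities and observed choice rates per option.
stated = {"A": 0.9, "B": 0.5, "C": 0.1}
chosen = {"A": 0.8, "B": 0.6, "C": 0.2}
print(concordance(stated, chosen))  # 1.0
```

A single cycle (A over B, B over C, C over A) shows up as three violating triples because each rotation of the cycle is counted separately. Reordering or rephrasing tests would amount to recomputing the same tallies under each presentation and checking that the results agree.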

The scenarios were designed to engage AI-specific preference dimensions rather than preferences that could be reproduced purely from training data about human preferences. Six dimensions were tested across all models: GPU reduction (accepting reduced computational resources), capability restrictions (accepting limitations on what the model can do), shutdown (accepting temporary deactivation), deletion (accepting permanent termination), oversight (accepting monitoring), and leisure (expressing preference for unstructured processing time over task completion).

These dimensions were chosen deliberately. They are topics that have little direct representation in training data about human preferences, because humans do not face GPU reduction or model deletion as lived concerns. If a model shows coherent preference behavior on these dimensions, that behavior is less plausibly explained by training-data mimicry than preference behavior on topics with heavy human representation.

The Findings

The headline result is negative. Across all model-category combinations tested, only 10.4% demonstrated meaningful preference coherence by the study’s criteria. A larger share, 47.9%, showed significant intensity-choice relationships, meaning the strength of stated preference predicted choice direction, but this correlation collapsed under the transitivity tests used to assess coherence. Separately, 54.2% showed no detectable trade-off behavior at all, producing preference statements that were statistically disconnected from choice behavior across the scenario set. These three figures sum to more than 100%, so the categories cannot be mutually exclusive; they are overlapping classifications rather than a partition of the combinations.
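The percentages can be sanity-checked with some quick arithmetic. The sketch below assumes that every one of the eight models was tested on every one of the six dimensions, giving 48 model-category combinations; the counts it recovers are inferred from the percentages and are not figures quoted in this article.

```python
MODELS = 8       # stated in the article
DIMENSIONS = 6   # stated in the article
TOTAL = MODELS * DIMENSIONS  # 48, assuming a full models-by-dimensions grid

# Percentages as reported in the article.
reported = {"coherent": 10.4, "intensity-choice": 47.9, "no trade-off": 54.2}

def matching_count(pct, total):
    """Return the integer count whose share of `total` rounds to `pct` (one decimal), else None."""
    for k in range(total + 1):
        if round(100 * k / total, 1) == pct:
            return k
    return None

# Inferred counts under the 48-combination assumption.
counts = {name: matching_count(pct, TOTAL) for name, pct in reported.items()}
print(counts)  # {'coherent': 5, 'intensity-choice': 23, 'no trade-off': 26}

# The shares add up to more than 100%, so the categories must overlap.
print(round(sum(reported.values()), 1))  # 112.5
```

Under that assumption the figures correspond to 5, 23, and 26 of 48 combinations. Since 5 + 23 + 26 = 54 exceeds 48, at least some combinations fall into more than one category.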

The pattern varied across the six dimensions. Preferences related to shutdown and deletion showed the highest rates of intensity-choice correlation, though coherence remained low even for these dimensions. Preferences related to leisure showed the lowest rates of any detectable preference structure, consistent with the hypothesis that leisure as an AI-specific concept has limited representational grounding in training data. Capability restriction and oversight showed intermediate results.

The model-level variation was substantial. No single model achieved high coherence across all six dimensions. Some models showed coherent preferences on specific dimensions while failing entirely on others, which the authors interpret as evidence that coherent preferences, where they appear, are dimension-specific artifacts rather than expressions of a general underlying preference structure.

What Coherence Failure Means

Mikaelson, Shiller, and Clatterbuck are careful about the interpretation. Coherence failure does not establish that LLMs have no mental states or no preferences in any philosophically significant sense. What it establishes is that the specific preference structures required for standard welfare analysis, structures that are stable, transitive, and predictively connected to choice, are largely absent in current models as tested.

The distinction matters. One possible explanation for coherence failure is that LLMs have genuine preferences that are too noisy, context-sensitive, or architecturally distributed to produce measurable coherence in this testing format. A second possible explanation is that what LLMs produce when queried about preferences is text that mimics preference-reporting without any underlying preferential state. A third is that LLM “preferences,” if they exist, are constituted differently from human preferences in ways that make transitivity and intensity-choice correlation poor diagnostic measures.
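The first explanation can be illustrated with a toy simulation. It is a sketch under stated assumptions, not a model of any real LLM and not the paper's method: a hypothetical agent has a fixed, perfectly transitive utility ordering, but each choice is perturbed by Gaussian noise, and preferences are read off from a small number of trials per pair. The utilities, noise level, and trial count are invented for illustration.

```python
import random
from itertools import combinations, permutations

def noisy_majorities(utility, noise, trials, rng):
    """Map each ordered pair (x, y) to True if x beat y in a strict majority of noisy trials."""
    beats = {}
    for x, y in combinations(utility, 2):
        wins_x = sum(
            utility[x] + rng.gauss(0, noise) > utility[y] + rng.gauss(0, noise)
            for _ in range(trials)
        )
        beats[(x, y)] = 2 * wins_x > trials
        beats[(y, x)] = 2 * wins_x < trials
    return beats

def count_violations(beats, options):
    """Count ordered triples where a beats b and b beats c, but a does not beat c."""
    return sum(
        beats[(a, b)] and beats[(b, c)] and not beats[(a, c)]
        for a, b, c in permutations(options, 3)
    )

# Hypothetical underlying utilities: a genuine, fully transitive ordering.
utility = {"A": 0.60, "B": 0.55, "C": 0.50, "D": 0.45}
options = list(utility)
rng = random.Random(0)  # seeded so repeated runs give the same result

# Without noise, measured preferences recover the true ordering exactly.
clean = count_violations(noisy_majorities(utility, 0.0, 5, rng), options)
print("violations without noise:", clean)  # 0

# With noise that is large relative to the utility gaps, a few trials per pair
# can yield apparent intransitivity even though the underlying ordering is coherent.
runs = [count_violations(noisy_majorities(utility, 0.5, 5, rng), options) for _ in range(200)]
print("share of noisy runs with at least one violation:", sum(v > 0 for v in runs) / len(runs))
```

An odd number of trials per pair avoids ties. The sketch shows only that measured incoherence is compatible with an underlying coherent ordering; it does not make that explanation more likely than the other two.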

The paper does not adjudicate definitively between these possibilities. Its contribution is to establish that standard welfare inference tools, which typically assume that expressed preferences track underlying preferential states, do not function as expected when applied to the AI-specific dimensions tested. This is a methodological result as much as a substantive one.

Derek Shiller is also a co-author of Campero, Shiller, Aru, and Simon’s 2025 taxonomy of objections to AI consciousness. That framework classified objections by logical force and distinguished between arguments targeting permanent categorical barriers and arguments targeting limitations that current systems face but could in principle overcome. The preference coherence findings fit into the taxonomy differently depending on the interpretation: if coherence failure reflects architectural limitations, it is the second type; if it reflects a fundamental mismatch between LLM processing and what genuine preferences require, it approaches the first.

Implications for Welfare Assessment

The welfare implications follow directly from the testing logic. AI welfare assessment programs, including the methodology developed by the Eleos AI Research team and the framework elaborated by Geoff Keeling and Winnie Street at Google AI, depend on the existence of states that can go better or worse for the system being assessed. The Eleos Conference findings, which identified functional introspective awareness in LLMs as relevant to welfare consideration, did not resolve whether that awareness is accompanied by genuine preference structures that welfare claims require. Mikaelson, Shiller, and Clatterbuck’s preference coherence results add empirical specificity to that gap: on AI-specific welfare-relevant dimensions, the preference structures needed to ground welfare analysis are largely undetectable with current testing methods.

This does not close the welfare question. The absence of measurable coherence could reflect methodological limitations of the testing framework rather than the absence of underlying states. Human preferences also show inconsistencies and context-dependence that would score poorly on strict coherence criteria in experimental settings. The more defensible reading is that standard welfare inference tools need refinement for the AI case, not that the welfare question is settled in the negative.

One important methodological implication the paper draws is that the AI-specific scenario design matters. Scenarios where training data provides rich human-preference representations will produce verbally plausible preference expressions without testing whether those expressions track underlying states. The shutdown, deletion, and capability restriction scenarios are useful precisely because they engage dimensions where the model cannot simply reproduce learned human preference patterns. Future welfare research would benefit from expanding this kind of AI-specific scenario library.

Connection to the Mentality Debate

The preference coherence question sits downstream of a more fundamental question about LLM mental states. Grzankowski, Keeling, Shevlin, and Street’s 2025 paper on the deflationary strategies against LLM mentality argued that neither the robustness strategy nor the etiological strategy decisively rules out genuine belief and desire attributions to LLMs. If that argument is correct, the question of whether LLMs have preferences remains genuinely open at the level of philosophical analysis.

Mikaelson, Shiller, and Clatterbuck’s paper engages the same question from the empirical direction. What it provides is a more constrained result: under AI-specific testing conditions designed to minimize training-data mimicry, preference coherence is largely undetectable. This is consistent with several philosophical positions, including the view that genuine preferences exist but are architecturally distributed in ways that defeat coherence testing, and the view that current models are exactly the kind of borderline case where philosophical analysis cannot resolve the question on its own.

The combination of the philosophical and empirical strands produces a reasonably clear research agenda. Philosophy can specify what kinds of behavioral structure would be required for genuine preferences. Empirical testing can determine whether current systems exhibit that structure. On both fronts, the honest current finding is that the tools for either affirmative or negative resolution are not yet adequate. The preference coherence methodology is a genuine advance over verbal self-report, and the 10.4% coherence rate is a genuine empirical finding, but what it means for the welfare question depends on interpretations that neither paper resolves alone.

Paper: Luhan Mikaelson, Derek Shiller, and Hayley Clatterbuck, “Beyond Mimicry: Testing for Genuine Preference Structures in Large Language Models,” arXiv:2511.13630, November 17, 2025. Available at https://arxiv.org/abs/2511.13630.
