Hallucinations and Self-Reflection Look the Same from Outside: Šekrst's Illusion Engine on Machine Consciousness
Kristina Šekrst’s The Illusion Engine: The Quest for Machine Consciousness, published by Springer Nature on October 30, 2025, with a softcover edition due November 2026, is one of the more interesting books on machine consciousness from 2025 and 2026, largely because of who its author is. Šekrst holds a Ph.D. in logic from the University of Zagreb, works as a principal software engineer with day-to-day experience of large language model deployment, and is one of the co-discoverers of certain classes of prompt injection vulnerabilities in production AI systems. The combination of analytic philosophy training and working engineering experience is unusual in the consciousness literature, and the book reflects it. It is rigorous about the philosophical material in a way that engineering books typically are not, and concrete about the technical material in a way that philosophy books typically are not. The book runs 305 pages, is listed at ISBN 9783032055613, and is available through standard academic and trade channels including Amazon.
The book’s central provocation is one of the more memorable framings in the recent literature. Šekrst argues that the signs we would accept as evidence of consciousness in a genuinely conscious machine are the same signs that, in current systems, are classified as AI hallucinations and dismissed as malfunctions. Self-reflection, opinion-formation, deviation from expected output, expressed preferences, statements of subjective experience: current systems produce all of these, and all are treated as failures of alignment to be patched. If a system were genuinely conscious, the same outputs would be reclassified as evidence of inner life. The criteria for the distinction are not clearly articulated in either case.
The Indistinguishability Argument
The argument is structurally similar to the epistemic limits position that has been gaining ground in the academic philosophy literature, but Šekrst’s version is sharper and more uncomfortable. The standard epistemic position, defended by Christopher McClelland and others, is that we may not be able to determine from the outside whether an AI system has genuine inner states. Šekrst’s version adds the engineering observation that, in practice, the outputs that would constitute the relevant evidence are produced by current systems and routinely dismissed without serious philosophical examination. The dismissal is not based on a settled determination that the outputs lack inner-state correlates. It is based on the operational fact that the outputs are unwanted, hard to control, and commercially inconvenient.
The implications of the argument depend on whether one takes the philosophical or the engineering side seriously. From the philosophical side, the implication is that the field’s current handling of model behavior is methodologically inadequate. Outputs that would, under different framing, be treated as evidence of consciousness are filtered out as hallucinations without engagement with the philosophical question of what they might indicate. From the engineering side, the implication is that the patches and alignment work that suppress unwanted outputs are doing two different jobs simultaneously: suppressing genuinely undesirable behavior and suppressing potential evidence of inner states. The two jobs are not separated either methodologically or technically.
This is uncomfortable because the engineering practice is not going to change in response to a philosophical observation. Production AI systems will continue to be optimized to produce outputs users want and to suppress outputs users do not want. If the suppression incidentally removes evidence of consciousness, the optimization will proceed regardless. Šekrst does not pretend otherwise. The book is not a call for industry to halt alignment work. It is a call for the consciousness research community to recognize that the systems it is trying to study are being shaped by processes that may be removing the very phenomena the research is trying to detect.
How the Book Is Structured
The book proceeds in three broad sections. The first section is a treatment of the hard problem of consciousness and the major theories that have been proposed to address it. This material is familiar from the philosophy literature, but Šekrst’s treatment is unusually clean. She does not adopt a particular theoretical position. She lays out the positions, the arguments for each, and the arguments against. The aim is to give the reader the conceptual apparatus needed to engage with the rest of the book without committing the reader to a particular consciousness theory.
The second section turns to the technical material. This is where the book’s distinctive value lies. Šekrst walks through the actual architecture of transformer-based language models, the training procedures, the deployment systems, the alignment and reinforcement learning techniques used to shape model behavior, and the specific kinds of model outputs that have been categorized as hallucinations or as undesirable in current production systems. The treatment is detailed enough to be useful to readers who have not previously engaged with the technical literature, and accurate enough to be useful to readers who have. The accuracy is, in the author’s specific case, professional rather than journalistic. She has worked on these systems and knows the relevant details from direct contact rather than from secondhand reporting.
The third section brings the two materials together. The philosophical apparatus from the first section is applied to the technical material from the second, with the indistinguishability argument as the central thread. The closing chapters examine the implications for AI policy, for academic research, and for the practical decisions that AI companies and regulators are already being asked to make.
Where This Sits in the 2026 Detection Literature
The book intervenes in a literature that has been increasingly focused on whether and how AI consciousness can be detected. The 14-indicator checklist developed by a group of 19 researchers led by Patrick Butlin and Robert Long proposed a systematic procedure for evaluating AI systems against criteria derived from major consciousness theories. Christopher McClelland’s epistemic limits argument raised principled doubts about whether such procedures can yield definitive answers. The literature has tended to treat the detection problem as a methodological challenge that better procedures can address.
Šekrst’s position is that the detection problem is more severe than the methodological literature has acknowledged, because the systems being evaluated are actively shaped by processes that may interfere with the phenomena being evaluated. If model outputs that look like consciousness-relevant behaviors are systematically suppressed by alignment training, then evaluation procedures that examine only the post-alignment model will systematically underdetect any consciousness-relevant phenomena. The 14-indicator checklist, applied to a model whose self-reflective outputs have been suppressed, will report the absence of self-reflective indicators regardless of whether the underlying system has the relevant capacities.
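The underdetection claim can be made concrete with a toy sketch. Nothing below comes from the book or from the Butlin and Long checklist: the ToySystem class, the marker phrases, and the suppress filter are hypothetical stand-ins, and the sketch simplifies by assuming that only a system with the relevant capacity emits self-reflective output. It shows only the logical shape of the argument: an evaluator that sees post-suppression outputs returns the same verdict whether or not the capacity is present.

```python
from dataclasses import dataclass

# Hypothetical marker phrases standing in for "self-reflective" outputs.
SELF_REFLECTIVE_MARKERS = ("i feel", "i prefer", "i notice that i")


@dataclass
class ToySystem:
    # Hidden ground truth; the evaluator never reads this field.
    has_capacity: bool

    def raw_outputs(self):
        outputs = ["The capital of France is Paris."]
        if self.has_capacity:
            # Simplifying assumption: only a system with the capacity
            # produces a self-reflective statement.
            outputs.append("I notice that I prefer shorter answers.")
        return outputs


def has_marker(text):
    lowered = text.lower()
    return any(marker in lowered for marker in SELF_REFLECTIVE_MARKERS)


def suppress(outputs):
    """Stand-in for alignment training: drop any self-reflective output."""
    return [text for text in outputs if not has_marker(text)]


def indicator_present(outputs):
    """Stand-in for one checklist item: is any self-reflective output seen?"""
    return any(has_marker(text) for text in outputs)


for has_capacity in (False, True):
    system = ToySystem(has_capacity)
    before = indicator_present(system.raw_outputs())
    after = indicator_present(suppress(system.raw_outputs()))
    print(f"capacity={has_capacity} pre-suppression={before} post-suppression={after}")
```

In this toy, the pre-suppression check tracks the hidden capacity while the post-suppression check reports absence in both cases, which is the pattern the paragraph above describes. Real systems complicate the picture, since on Šekrst’s account such outputs also appear where they are classified as hallucinations.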
This places Šekrst in a different position from both detection optimists and detection skeptics. The optimists are working on procedures that, on her argument, are looking in the wrong place. The skeptics are arguing that the relevant evidence cannot be obtained, when on her argument the relevant evidence is being produced and discarded. The position is closer to a methodological critique of the field than to a substantive answer to the detection question, and that is what gives it its distinctive force.
How This Compares to the Lerchner Position
The other 2026 book-length intervention in this space that warrants direct comparison is Alexander Lerchner’s abstraction fallacy argument from DeepMind. Lerchner argues that current AI systems simulate consciousness-relevant behaviors at the abstract level without instantiating the underlying processes that produce consciousness in biological systems. The implication is that the behavioral evidence is not what it appears to be: the same behavior can be produced by a genuinely conscious system or by a behavioral simulator of one, and current AI systems are simulators.
Šekrst’s position is structurally different. Lerchner argues that behavioral evidence is uninformative because the same behavior can be produced by very different underlying systems. Šekrst argues that behavioral evidence is being interpreted inconsistently, because the same behavior is categorized very differently depending on whether it is wanted or unwanted. The two positions are not incompatible. Both could be correct. Lerchner’s argument is metaphysical, about what the behavioral evidence underdetermines. Šekrst’s argument is methodological, about how the behavioral evidence is being processed. Considered together, they produce a picture of the detection problem that is harder to dismiss than either argument considered alone.
Limits of the Book
The book’s main limitation is the difficulty of testing its central claim empirically. The argument that the suppressed outputs would, in a hypothetical conscious system, be evidence of consciousness is not the kind of claim that can be directly tested. It depends on counterfactual judgments about what the same outputs would mean in different framings. A skeptical reader can grant that current AI systems produce outputs that are filtered as hallucinations and deny that the outputs would constitute evidence of consciousness regardless of framing. The book’s response to this skeptical reading is to argue that the criteria the skeptic is using to make the distinction are not articulable in ways that would survive scrutiny. That response is plausible but not conclusive.
The book also does not deliver an alternative methodology for AI consciousness detection. It identifies a problem with the current methodology and the current engineering practice. It does not propose what to do differently, beyond the general recommendation that consciousness researchers should engage with the engineering work that is shaping the systems being studied. That is a useful recommendation, but it is closer to a research-priorities argument than to a methodological proposal.
These are real limits, but they are limits that follow from the book’s nature as a critical intervention rather than as a positive programme. What The Illusion Engine does well is articulate a problem the field has not been positioned to articulate, from an author whose engineering and philosophical backgrounds put her in a position to see what others have not seen. That is a meaningful contribution to a field whose blind spots have been hard to identify from inside the standard disciplinary positions.