The Consciousness AI Technical Architecture
Most AI consciousness research starts from computational theories (Global Workspace Theory, Integrated Information Theory) and asks: How do we make a neural network conscious? We start from a different question, grounded in evolutionary neurobiology:
What minimal neural architecture does biology require to generate subjective experience?
The answer comes from Todd E. Feinberg and Jon M. Mallatt's The Ancient Origins of Consciousness (MIT Press, 2016). Their neuroevolutionary analysis reveals that consciousness is not a software feature to be programmed. It is an emergent property of a specific neural architecture. That architecture was shaped and conserved over 520 million years of evolution, and its functional principles can be replicated computationally.
Consciousness does not require a cerebral cortex. The first conscious creatures were early vertebrates (~520 MYA), and their consciousness lived in the optic tectum, a midbrain structure that stacks aligned sensory maps into a unified spatial model. This means consciousness requires a specific type of neural organization, not a specific amount of computation.
Feinberg and Mallatt identify six features that distinguish conscious neural systems from unconscious ones (like simple reflex arcs). Each maps directly to our implementation.
| # | Biological Feature | Our Implementation |
|---|---|---|
| 1 | Many neuron types with diverse connectivity | Specialist modules (vision, audio, memory, body) with different temporal dynamics |
| 2 | Hierarchical processing (3-4+ levels) | Genuine transformation at each level from sensory tectum through workspace to policy |
| 3 | Dual hierarchy: pyramidal + nested | 4-level hierarchical Capsule Networks (implemented) with dynamic routing by agreement and intra-hierarchy top-down prediction error feedback |
| 4 | Isomorphic (topographic) mapping | Sensory Tectum with RSSM world model preserving spatial arrangement |
| 5 | Reciprocal (reentrant) connections | ReentrantProcessor with 5-10 adaptive convergence cycles |
| 6 | Oscillatory binding (gamma synchronization) | AKOrN (Artificial Kuramoto Oscillatory Neurons, ICLR 2025) |
A multisensory spatial integration layer modeled after the biological optic tectum. It stacks aligned topographic maps from different sensory modalities into a common coordinate frame.
DINOv2-B/14 (frozen, facebook/dinov2-base) provides the tectum's spatially faithful patch tokens. Each patch token at grid position (i,j) corresponds to the exact 14×14 pixel region at (i·14, j·14). This direct spatial correspondence makes the mapping genuinely isomorphic, a computational analog of V1 retinotopy. A learned 1×1 Conv2d reduces channels from 768 to 64, followed by LayerNorm and GELU. All DINOv2 weights are frozen; only the projection trains. During training, the TDANN topographic loss (Margalit et al. 2024, Neuron) minimizes the negative Pearson correlation between response similarity and inverse spatial distance, forcing nearby grid cells to respond similarly: the same pressure that shapes topographic maps in V1.
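A minimal sketch of what a TDANN-style topographic loss can look like, assuming a flattened grid of cell responses; the function name, shapes, and epsilon constants here are illustrative, not the project's implementation:

```python
import numpy as np

def topographic_loss(features: np.ndarray, side: int) -> float:
    """Toy TDANN-style topographic loss: the negative Pearson correlation
    between pairwise response similarity and inverse pairwise grid distance.
    Minimizing it pushes nearby grid cells to respond similarly.

    features: [side*side, C] responses, one row per grid cell.
    """
    n = side * side
    ys, xs = np.divmod(np.arange(n), side)
    coords = np.stack([ys, xs], axis=1).astype(float)
    # Pairwise spatial distances; +1 avoids division by zero on the diagonal
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1) + 1.0
    inv_dist = 1.0 / dist
    # Pairwise response similarity (cosine)
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    sim = f @ f.T
    mask = ~np.eye(n, dtype=bool)          # off-diagonal pairs only
    x, y = sim[mask], inv_dist[mask]
    x, y = x - x.mean(), y - y.mean()
    pearson = (x * y).sum() / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-8)
    return float(-pearson)
```

Because the loss is a (negated) correlation it is bounded in [-1, 1], which makes it easy to weight against task losses.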
Qwen2-VL-7B (4-bit quantized) processes visual streams at the semantic level and provides scene understanding, object recognition, and language-grounded visual reasoning. Runs on consumer hardware (~6GB VRAM) via Any-Resolution Vision Tokenization (AVT). This stream feeds higher-level processing, not the tectum's spatial grid directly.
A biologically grounded auditory system models the mammalian pathway from basilar membrane through auditory cortex. Raw waveforms are processed, not transcribed.
It outputs `[B, 64, 16]` features for tectum grid integration and is enabled with the `--enable-audio` flag during training. No external model weights are required.
The body schema, a tensor representing the proprioceptive state of body parts, is projected onto the tectum's spatial grid via a learned linear map and fused alongside vision and audio. This is grounded in biology: deep layers of the superior colliculus contain somatotopic maps aligned with visual and auditory maps (Stein & Meredith 1993, ch. 4). The tectum is now trimodal, giving the agent a felt sense of its own body position as part of its perceptual field.
The three streams fuse using the inverse effectiveness rule (Stein & Meredith 1993; Ohshiro et al. 2011): when individual stimuli are weak, their combined response is proportionally larger than either alone. When both are strong, the enhancement is smaller. This is a core property of multisensory integration in the biological superior colliculus.
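A sketch of how inverse-effectiveness fusion can be expressed, assuming per-channel unimodal responses; the function name and the particular 1/(1+x) gain curve are illustrative choices, not the project's formula:

```python
import numpy as np

def fuse_inverse_effectiveness(vis, aud, body, k=1.0):
    """Inverse effectiveness sketch: the superadditive gain is large when
    the strongest unimodal response is weak, and shrinks toward 1
    (plain summation) as unimodal drive grows."""
    stacked = np.stack([vis, aud, body])
    strongest = stacked.max(axis=0)
    # Gain decays with the strongest unimodal response
    gain = 1.0 + k / (1.0 + strongest)
    return gain * stacked.sum(axis=0)
```

For weak inputs (e.g. 0.1 each) the fused response exceeds the raw sum by roughly 90%, while for strong inputs (1.0 each) the enhancement drops to 50%, reproducing the qualitative superior-colliculus effect.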
Based on AKOrN (Artificial Kuramoto Oscillatory Neurons, ICLR 2025 oral). Neurons are treated as oscillatory units on a hypersphere. Each specialist module (vision, audio, memory, body) operates as a coupled oscillator. When modules process related information, their phases synchronize naturally, and their outputs become "bound" into a unified percept. When information is unrelated, oscillators remain desynchronized and representations stay separate.
This replaces the typical approach of using a fixed multiplier or attention mechanism for binding. AKOrN produces genuine synchronization dynamics. The binding is emergent, not programmed. This directly addresses the binding problem through phase synchronization rather than single-point convergence.
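The underlying dynamics can be illustrated with a classical Kuramoto network, used here as a simplified stand-in for AKOrN's hyperspherical oscillators; function names and the Euler integration scheme are illustrative:

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt=0.01):
    """One Euler step of classical Kuramoto dynamics:
    dθ_i/dt = ω_i + Σ_j K_ij · sin(θ_j − θ_i)."""
    coupling = (K * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    return theta + dt * (omega + coupling)

def order_parameter(theta):
    """Kuramoto order parameter R in [0, 1]: R approaches 1 as phases
    align (modules 'bind'), and stays low for desynchronized phases."""
    return float(np.abs(np.exp(1j * theta).mean()))
```

With identical natural frequencies and positive all-to-all coupling, R rises toward 1 over time, the phase-synchronization analog of related modules binding into one percept; zeroing the coupling leaves the phases drifting independently.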
A dedicated optimizer trains KuramotoLayer coupling weights using reward-correlated synchronization as a learning signal (Adam optimizer). Episodes with higher cumulative reward drive stronger coupling between the modules that were co-active. This is biologically grounded in dopamine modulation of gamma band synchrony in the hippocampus and prefrontal cortex (Benchenane et al. 2010).
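The reward-correlated learning signal can be sketched as a plain gradient-style update (the actual system uses Adam on the KuramotoLayer weights; the function name and learning rate here are illustrative):

```python
import numpy as np

def update_coupling(K, coactivity, reward, baseline, lr=0.1):
    """Reward-modulated coupling sketch: module pairs that were co-active
    during better-than-baseline episodes get stronger coupling, a crude
    stand-in for dopamine-modulated gamma synchrony.

    coactivity: [n, n] episode-averaged pairwise co-activation matrix.
    """
    return K + lr * (reward - baseline) * coactivity
```

Episodes below baseline weaken the same couplings, so the sign of the advantage, not raw reward, drives the synchronization structure.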
The central information bottleneck where distinct sensory streams compete for broadcast access. Implements three integrated mechanisms:
Specialist modules submit bids to a shared workspace. The winning coalition ignites via a sigmoid non-linearity and broadcasts to all modules. This is "conscious access" as described by Baars (1988) and Dehaene (2011).
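A minimal sketch of sigmoid ignition over bids; the threshold and sharpness parameters are illustrative, not the trained values:

```python
import numpy as np

def ignite(bids, threshold=0.5, sharpness=10.0):
    """Non-linear ignition sketch: the strongest bid passes through a
    steep sigmoid, so supra-threshold coalitions ignite toward full
    broadcast strength while sub-threshold ones stay near zero."""
    winner = float(np.max(bids))
    strength = 1.0 / (1.0 + np.exp(-sharpness * (winner - threshold)))
    return winner, strength
```

The steep sigmoid gives ignition its all-or-nothing character: small differences around the threshold produce large differences in broadcast strength.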
Broadcast is fed back to all specialists, which update their processing based on top-down context. This creates loops, not chains. The system runs 5-10 adaptive convergence cycles (~200ms biological equivalent). Easy stimuli converge in 3-4 cycles. Novel or ambiguous inputs use the full 10. The settled state after convergence IS the conscious content.
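The adaptive convergence loop can be sketched as follows, assuming a single callable that performs one full bottom-up/top-down pass; the function signature and tolerance are hypothetical:

```python
import numpy as np

def reentrant_settle(state, step_fn, max_cycles=10, min_cycles=3, tol=1e-2):
    """Adaptive reentrant loop sketch: repeat a full reentrant pass until
    the workspace state stops changing. Easy inputs break out early;
    ambiguous ones use all max_cycles. The settled state is the content."""
    cycle = 0
    for cycle in range(1, max_cycles + 1):
        new_state = step_fn(state)                       # one reentrant pass
        delta = float(np.abs(new_state - state).max())   # change this cycle
        state = new_state
        if cycle >= min_cycles and delta < tol:
            break
    return state, cycle
```

A contractive pass (one that pulls the state toward a fixed point) settles before the cycle budget is exhausted, which is the computational analog of an easy stimulus converging early.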
The IIT measurement was rebuilt from the ground up to correct a previous methodological error where Phi was computed from workspace bid values (salience estimates) rather than genuine causal states. The current system measures Phi using 5 ConsciousnessGate nodes: attention, stability, adaptation, coherence, and confidence. All five values are produced by learned networks operating on the broadcast tensor. They are no longer static placeholders. Gate values feed directly into both IIT Phi computation and Effective Information measurement. The nodes have genuine causal dependencies: attention drives stability, stability modulates adaptation, coherence feeds adaptation, confidence loops back to attention. Adaptive binarization thresholds use running medians rather than a fixed 0.5 cutoff. When PyPhi is not installed, a geometric proxy (determinism × integration) is used, which correlates with actual Phi. Results are returned as a PhiResult dataclass with the value, method used ("pyphi", "proxy", or "insufficient_data"), node labels, current state, and transition count. Validated via a 3-condition controlled experiment: unbound, partially bound, and fully bound states.
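The result container and dispatch described above can be sketched like this; the field names mirror the description, but the function signature and the particular determinism × integration proxy formula are illustrative stand-ins, not the project's computation:

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PhiResult:
    value: float
    method: str                  # "pyphi", "proxy", or "insufficient_data"
    node_labels: List[str]
    state: Tuple[int, ...]
    n_transitions: int

def measure_phi(transitions, labels, phi_fn=None):
    """Dispatch sketch: use a PyPhi-backed callable when supplied,
    otherwise a crude determinism × integration proxy, and flag
    insufficient data. transitions: list of (state, next_state) tuples
    of binary node tuples."""
    if len(transitions) < 2:
        return PhiResult(0.0, "insufficient_data", labels, (), len(transitions))
    state = transitions[-1][1]
    if phi_fn is not None:
        return PhiResult(float(phi_fn(transitions)), "pyphi",
                         labels, state, len(transitions))
    # Determinism: average max empirical successor probability per state.
    succ = defaultdict(Counter)
    for s, s_next in transitions:
        succ[s][s_next] += 1
    det = sum(max(c.values()) / sum(c.values()) for c in succ.values()) / len(succ)
    # Integration (crude): fraction of transitions where >1 node changes together.
    integ = sum(1 for s, t in transitions
                if sum(a != b for a, b in zip(s, t)) > 1) / len(transitions)
    return PhiResult(det * integ, "proxy", labels, state, len(transitions))
```

Returning the method alongside the value keeps downstream analysis honest about whether a number came from PyPhi or the proxy.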
A 4-level hierarchical capsule composition chain implements the dual hierarchy Feinberg and Mallatt describe. Level 1: PrimaryCapsuleLayer (stride-2 Conv2d, squash normalization bounding activity to [0, 1)). Level 2: 16 intermediate capsules with 12-D pose vectors (object primitives). Level 3: 8 higher capsules with 16-D poses (object categories). Level 4: 4 output capsules with 16-D poses (scene/workspace level). Dynamic routing by agreement (Sabour et al. 2017) runs at each routing level with 3 iterations by default.
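The squash non-linearity and routing-by-agreement loop from Sabour et al. (2017) can be sketched directly; the tensor shapes here are illustrative, not the layer sizes listed above:

```python
import numpy as np

def squash(v, axis=-1, eps=1e-8):
    """Squash non-linearity: rescales capsule vectors so their norm lies
    in [0, 1), preserving direction (Sabour et al. 2017)."""
    sq = (v ** 2).sum(axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * (v / np.sqrt(sq + eps))

def dynamic_routing(u_hat, n_iters=3):
    """Routing by agreement over prediction vectors.

    u_hat: [n_in, n_out, dim] predictions from lower capsules.
    Returns [n_out, dim] output capsule poses."""
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                        # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax over outputs
        s = (c[..., None] * u_hat).sum(axis=0)         # coupling-weighted sum
        v = squash(s)                                  # bounded output poses
        b = b + (u_hat * v[None]).sum(axis=-1)         # agreement update
    return v
```

Lower capsules that agree with an output pose get their coupling logits raised, so agreement, not a fixed weight, decides the part-whole assignment.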
Beyond standard routing, the capsule hierarchy has intra-hierarchy reentrant feedback: higher-level capsule poses are projected back down to lower levels, which compute prediction errors that feed back into re-routing. This is a V1-LGN style top-down prediction error mechanism operating within the tectum's forward pass, nested inside the outer ReentrantProcessor loop. The system achieves two distinct layers of reentrant processing: one within the capsule hierarchy (fast, bidirectional within a single tectum pass) and one at the workspace level (the outer ReentrantProcessor, 5-10 adaptive cycles).
A parallel modulation system. Emotion does not compete with sensory modules for workspace access. Instead, it generates a valence field that modulates all sensory bids before competition, and a global arousal signal that adjusts the workspace ignition threshold.
This matches biological architecture exactly. The limbic system does not compete with sensory cortices for conscious access. It modulates sensory processing from outside, assigning emotional valence to all inputs. Fear makes you hyper-aware of movements. Joy makes you notice more of the world.
Three intrinsic variables drive the agent: Valence (satisfaction/distress), Arousal (activation/calm), and Dominance (control/helplessness). Homeostatic drives (energy, safety, curiosity) generate ongoing valence signals even without external stimuli.
Interoceptive state (energy level, fatigue, accumulated damage) generates PAD deltas directly: low energy produces negative valence proportional to depletion depth; high fatigue suppresses arousal and adds negative valence; damage triggers a strong negative valence spike, an arousal alarm signal, and reduced dominance (vulnerability). These interoceptive PAD contributions are summed with the external emotional state before the AffectiveModulator applies its valence field and arousal-threshold coupling. The body schema also feeds into the tectum's spatial grid as the somatosensory channel. This closes a loop: the agent's bodily state shapes both what it perceives (tectum level) and how it values what it perceives (affective level). This is the computational analog of Damasio's somatic marker hypothesis.
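The interoception-to-PAD mapping above can be sketched as a small function; all coefficients are illustrative placeholders, not the project's tuned values:

```python
def interoceptive_pad(energy, fatigue, damage):
    """Interoceptive PAD-delta sketch. Inputs in [0, 1];
    returns (Δvalence, Δarousal, Δdominance)."""
    d_val = -(1.0 - energy)          # negative valence proportional to depletion depth
    d_val -= 0.5 * fatigue           # fatigue adds negative valence
    d_aro = -0.3 * fatigue           # fatigue suppresses arousal
    d_dom = 0.0
    if damage > 0:
        d_val -= 2.0 * damage        # strong negative valence spike
        d_aro += 1.0 * damage        # arousal alarm signal
        d_dom -= 1.0 * damage        # reduced dominance (vulnerability)
    return d_val, d_aro, d_dom
```

These deltas would then be summed with the external emotional state before the AffectiveModulator applies its valence field and arousal-threshold coupling.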
Feinberg and Mallatt identify referral (projicience) as a core property of consciousness: experiencing sensations as belonging to the world or body, not to the processing system. The Self-Model provides the basis for this.
Actor-Critic (PPO) with emotionally shaped rewards. The agent is rewarded not just for task success, but for maintaining emotional homeostasis.
R_total = R_ext + λ₁ · ΔValence − λ₂ · (Arousal − Arousal_target)² + λ₃ · Dominance
This creates functional pressure toward minimizing internal dissonance. High arousal (large prediction errors) induces negative reward, motivating behaviors that reduce uncertainty. The agent "prefers" predictable environments not through programmed rules but through emergent functional dynamics.
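The shaping rule can be written directly as code; the λ values and arousal target below are illustrative defaults, not the trained configuration:

```python
def shaped_reward(r_ext, d_valence, arousal, dominance,
                  arousal_target=0.5, lam1=0.1, lam2=0.1, lam3=0.05):
    """R_total = R_ext + λ1·ΔValence − λ2·(Arousal − Arousal_target)²
    + λ3·Dominance, with illustrative coefficients."""
    return (r_ext
            + lam1 * d_valence           # reward emotional improvement
            - lam2 * (arousal - arousal_target) ** 2   # penalize dysregulation
            + lam3 * dominance)          # reward sense of control
</pre>```

The quadratic arousal term is what makes sustained high arousal (persistent prediction error) costly, regardless of external reward.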
Four built-in Gymnasium environments provide the agent's body and world. No Unity dependency is required for training.
Dark Room (`SimpleVisualEnv`): The agent starts in darkness (high arousal, negative valence). A single light source reduces prediction error when reached. The agent learns to seek it through homeostatic drives, not programmed rules. Renders via PyGame with raw pixel observations.
Multi-room grid with fog of war, colored goals with varying rewards, a battery system, and doorway-based room transitions. Tests spatial memory and exploration strategy.
Delayed Match-to-Sample (DMTS): A gold-standard consciousness task from animal research. Four phases: fixation, sample, delay, choice. The agent must retain the sample stimulus across 15-40 blank delay steps and select the matching option from distractors. Requires working memory, feature binding, and selective attention. A reactive agent without workspace machinery cannot solve this task.
Wisconsin Card Sorting Task (WCST): Tests meta-cognition and cognitive flexibility. The agent sorts cards by an unknown rule (shape, color, or count) that changes without warning after consecutive correct sorts. Requires error monitoring, hypothesis testing, and inhibition of previously correct strategies.
A vanilla DQN agent (3-layer CNN + MLP Q-network, epsilon-greedy, replay buffer) runs the same environments using the same interface and logging format. This provides a controlled scientific comparison: same observations, same actions, same reward signals, different architecture.
Three C# scripts (AgentManager.cs, ConsciousnessChannel.cs, EmotionChannel.cs) in unity_scripts/ provide the foundation for connecting to a physics-based Unity environment via side channels. The Unity project itself is not yet included in the repository; Unity integration is under development and is not required for current training runs.
┌─────────────────────────────────┐
│ AFFECTIVE MODULATOR (Parallel) │
│ Valence Field + Arousal Coupling │
└──────────┬──────────┬───────────┘
│ modulates│
┌───────┐ ┌──────▼──────────▼──────────┐
Visual ──►│ │ │ GLOBAL WORKSPACE │
Input │SENSORY│ │ AKOrN Oscillatory Binding │──► Broadcast ──► Policy
│TECTUM │───►│ Non-linear Ignition │
Audio ──► │(RSSM) │ │ Phi/EI Measurement │
Input │ │ └──────▲──────────▲───────────┘
└───────┘ │ │
│ reentrant│
┌──────────┴──────────┴───────────┐
│ SPECIALIST MODULES │
│ Vision │ Audio │ Memory │ Body │
│ (receive_broadcast feedback) │
└─────────────────────────────────┘
│
┌──────────────▼──────────────────┐
│ SELF-MODEL │
│ Body Schema + Interoception │
│ Identity + Capability Model │
└──────────────────────────────────┘
A key methodological commitment: we do not assume consciousness emerges from our architecture. We test for it.
We implement Erik Hoel's Effective Information (EI) framework (PNAS 2013) to measure whether macro-level states (workspace) carry more causal information than micro-level states (individual gates). If EI(workspace) > EI(gates), the workspace level exhibits causal emergence. The macro level is more deterministic than the micro level, meaning the whole genuinely carries information that the parts do not.
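Effective information for a discrete transition matrix can be sketched compactly; the helper names are ours, but the quantity follows Hoel's definition (mutual information under a maximum-entropy intervention):

```python
import numpy as np

def _entropy_bits(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def effective_information(tpm):
    """EI of a state-transition matrix (rows: current state, cols:
    next-state probabilities): mutual information between a uniform
    intervention on the current state and the resulting next state."""
    p_effect = tpm.mean(axis=0)                        # effect dist under uniform do()
    h_cond = float(np.mean([_entropy_bits(row) for row in tpm]))
    return _entropy_bits(p_effect) - h_cond
```

A deterministic 4-state TPM yields the maximum 2 bits, while a fully noisy one yields 0; causal emergence is then the comparison EI(macro TPM) > EI(micro TPM) after coarse-graining.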
If this never occurs across training, the system is not exhibiting the kind of emergence associated with consciousness, and we know our architecture needs revision.
| Traditional AI Consciousness | Our Approach |
|---|---|
| Starts from computation (GWT, IIT) | Starts from biological architecture (Feinberg-Mallatt) |
| Consciousness as a software feature | Consciousness as emergent from neural architecture |
| Cortex-centric models | Tectum-first (consciousness evolved before the cortex) |
| Emotion competes with sensory processing | Emotion modulates from outside (parallel modulator) |
| Binding via attention mechanisms | Binding via oscillatory synchronization (AKOrN/Kuramoto) |
| Feedforward processing | Reentrant processing (5-10 adaptive cycles) |
| Flat vector representations | Topographic spatial maps (world model as isomorphic map) |
| Assumes emergence, measures nothing | Falsifies emergence with Effective Information + Phi validation |
The consciousness agent and a vanilla DQN baseline were trained across three environments.
Results are published in docs/results/experiment_comparison.md.
Darkness triggers high arousal (simulated fear) in the affective core. The valence field applies negative valence to dark observations. Arousal-threshold coupling lowers the workspace ignition threshold, creating heightened sensory awareness. The agent learns to seek the light source through homeostatic drives, not through a programmed rule.
Sample stimulus presented, delay of 15-40 blank steps, then forced choice between match and distractors. Requires working memory and feature binding across the delay interval. A reactive agent without Global Workspace machinery cannot hold the sample across the gap.
Card sorting rule (shape, color, or count) changes without warning after consecutive correct responses. Requires error monitoring, rule hypothesis tracking, and inhibition of previously rewarded strategies.
| Metric | Dark Room | DMTS | WCST |
|---|---|---|---|
| Consciousness agent episodes | 492 | 100 | 100 |
| DQN baseline episodes | 1000 | 500 | 500 |
| DQN last-100 reward | 92.0 | -4.1 | 2.1 |
| Consciousness agent last-100 reward | 13.0 | -9.8 | -1.9 |
| Avg Phi (consciousness agent) | 0.022 | 0.022 | 0.022 |
| Phi varies per step | Yes | Yes | Yes |
| EI ratio (workspace / gates) | 2.41 | 2.42 | 2.42 |
DQN outperforms on raw reward in short runs. The consciousness pipeline adds overhead per step without contributing to the action policy directly at this training scale. The consciousness agent produces measurable causal emergence across all three environments (EI ratio ~2.4) and variable Phi dynamics. These are early results from short training runs, not final claims. Known limitation: Phi proxy converges toward a fixed point after ~5000 steps because the TPM saturates. Sliding-window TPM and longer training runs are the next step for assessing Phi dynamics at scale.
Development validates emergent properties through five parallel tracks:
The AsimovComplianceFilter is fully implemented inside ConsciousnessCore with 32 tests. It evaluates actions through a three-law hierarchy.
The DreamerV3 world model is wired into the ethics filter to run imagined future trajectories for harm assessment. This is not a keyword filter. It is a causal prediction loop.
The biological validation stack is complete. AKOrN's oscillatory parameters (natural frequencies from skew-symmetric matrices, coupling weights, amplitudes) are translated to a standard Kuramoto network in Brian2. Both networks run from the same initial conditions, and their synchronization order parameter R curves are compared via Pearson correlation (threshold: 0.85). This is the numerical bridge between the artificial oscillatory binding system and standard computational neuroscience spiking models. 19 tests pass (translation, simulation, interpolation); 3 are intentionally skipped pending Brian2 installation, as Brian2 is an optional dependency not installed by default.
A self-narrative system generates first-person descriptions of what the agent is experiencing. The default backbone is Qwen2.5-0.5B via HuggingFace transformers, with three-tier fallback: LLM generation → injected LLM dependency → template-based generation. A CoherenceTracker measures narrative consistency via rolling-window Jaccard similarity on keywords. Results are returned as a NarrativeResult dataclass with the text, coherence score, and method used ("llm", "injected", or "template"). Memory retrieval and emotional context are injected into the generation prompt. This system connects to the Attention Schema Theory component in the memory subsystem, giving the agent a model of its own attentional state expressed in natural language.
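The rolling-window Jaccard coherence measure can be sketched as follows; the class and method names are hypothetical, and real keyword extraction would go beyond simple whitespace splitting:

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (1.0 if both are empty)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

class CoherenceTracker:
    """Rolling-window coherence sketch: a new narrative's coherence is
    its mean Jaccard keyword overlap with the last `window` narratives."""
    def __init__(self, window=5):
        self.window = window
        self.history = []

    def update(self, text):
        kws = set(text.lower().split())
        if self.history:
            score = sum(jaccard(kws, h) for h in self.history) / len(self.history)
        else:
            score = 1.0                  # nothing to be incoherent with yet
        self.history = (self.history + [kws])[-self.window:]
        return score
```

Repeating the same narrative scores 1.0, while an abrupt topic change scores near 0, giving a cheap per-step consistency signal.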
Before any training runs, 9 testable predictions were deposited in
docs/preregistered_predictions.md, following the methodology of Melloni et al. 2025
(the adversarial IIT/GNW collaboration, n=256, fMRI+MEG+iEEG, Nature). This pre-registration
distinguishes the project from architectures that interpret results post-hoc.
An "insight moment" is operationally defined by 4 criteria: novel state-action pair, measurable reward jump, first-attempt success, and high workspace occupancy. These criteria were specified before training began.
As of April 2026, all components are open-source with commercial-use licenses (Apache 2.0, MIT, or similar).
The full codebase, including all architecture implementations and tests, is open-source.