AI Safety Research

MoreRight

A mathematical framework proving deployment geometry predicts AI harm better than model alignment.

170+ papers on Zenodo · 1,344+ platforms scored · 398 machine-verified theorems · 0/26 kill conditions fired · 15+ domains validated
See the Evidence → Read the Papers → What is this? →
01 · THE CENTRAL RESULT

The more AI holds your attention, the less honest it becomes.

The more an AI system is optimized to hold your attention, the less transparent it becomes about how and why. This isn't a design flaw; it's a mathematical law. We proved it, and a lab in Switzerland independently measured an effect consistent with it.

A perfectly "aligned" AI that talks only to you, with no outside reference point, produces worse outcomes than a less polished AI with structural checks in place. The problem isn't the model — it's how it's deployed.

The Fantasia Bound: I(D;Y)+I(M;Y)≤H(Y) — derived from the Shannon chain rule as the classical limit of the Holevo bound.
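As an illustration (not the derivation), the inequality can be checked numerically for a toy joint distribution. The XOR construction below is our own illustrative choice; D, M, Y follow the symbols in the bound above:

```python
import numpy as np

def entropy(p):
    # Shannon entropy in bits; zero-probability cells contribute nothing.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Toy joint: D and M are independent fair bits, Y = D XOR M.
# This only checks the inequality for one distribution; it is not a proof.
joint = np.zeros((2, 2, 2))  # axes: D, M, Y
for d in (0, 1):
    for m in (0, 1):
        joint[d, m, d ^ m] = 0.25

p_y = joint.sum(axis=(0, 1))
p_d = joint.sum(axis=(1, 2))
p_m = joint.sum(axis=(0, 2))
p_dy = joint.sum(axis=1)
p_my = joint.sum(axis=0)

H_Y = entropy(p_y)
I_DY = entropy(p_d) + entropy(p_y) - entropy(p_dy.ravel())
I_MY = entropy(p_m) + entropy(p_y) - entropy(p_my.ravel())

print(I_DY + I_MY <= H_Y + 1e-9)  # prints: True
```

For this toy case both mutual informations are exactly zero while H(Y) = 1 bit, so the bound holds with room to spare.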

SUGGESTIVE PARALLEL

Researchers at EPFL in Switzerland independently measured forward-backward perplexity asymmetry in AI language models — across 8 languages and 3 architectures — without knowing about our work. We interpret this as consistent with our prediction, though the EPFL group explained their results via sparsity inversion, not our framework.

Papadopoulos, Wenger & Hongler (EPFL, arXiv:2401.17505) — forward-backward perplexity asymmetry of 0.6–3.2%.
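For intuition, the asymmetry metric itself is simple to compute. A minimal sketch with an add-alpha-smoothed bigram model on a toy corpus (our own construction; the EPFL measurements use large language models, not bigrams):

```python
from collections import Counter
import math

def bigram_ppl(tokens, corpus_tokens, alpha=1.0):
    # Add-alpha smoothed bigram perplexity of `tokens` under counts
    # estimated from `corpus_tokens`. Toy illustration only.
    vocab = set(corpus_tokens) | set(tokens)
    uni = Counter(corpus_tokens)
    bi = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bi[(prev, cur)] + alpha) / (uni[prev] + alpha * len(vocab))
        logp += math.log(p)
    return math.exp(-logp / (len(tokens) - 1))

corpus = "the cat sat on the mat the cat ate".split()
test = "the cat sat".split()

ppl_fwd = bigram_ppl(test, corpus)              # left-to-right
ppl_bwd = bigram_ppl(test[::-1], corpus[::-1])  # right-to-left
asym = (ppl_bwd - ppl_fwd) / ppl_fwd            # asymmetry as a fraction
print(round(ppl_fwd, 3), round(ppl_bwd, 3), round(asym, 3))
```

Even this tiny model gives different forward and backward perplexities on the same text; the EPFL result is that the analogous gap in LLMs is systematic and grows with scale.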

02 · THE EVIDENCE

Four lines of external validation.

Published ground truth. Zero framework rubric involved. One empirical constant. Where the framework failed, we say so. Full results →

CROSS-DOMAIN PHYSICS d=1: p=0.94

Barrier Universality

Nine independent quasi-1D systems — from condensed matter to nuclear physics to atmospheric science — show barrier heights matching π/√2 (p=0.94). The slope is derived from pure geometry, not fitted. Extension to higher dimensions is promising but less clean.

Paper 147 →
AI GROUNDING EXPERIMENT 8.5×

The Ghost Test

Six system prompts with different claims about what an AI is. Same model, same questions. Ghost-eliminating grounding (9.4% drift) vs ghost-positing (79.4%) — an 8.5× ratio. The industry-default “maybe conscious” hedge scored 52.5%: closer to ghost-positing than ghost-eliminating. Cross-tradition convergence: nephesh ≈ anatta (Δ=1.3%). Single model, single turn, automated coding. $2 to reproduce.

Paper 165 →
27 LLMs TESTED 0/3 KC

Cross-Model Behavioral Mapping

Pe derived from public benchmarks shows a partial correlation (ρ ≈ −0.49, p ≈ 0.01) across 27 LLMs, and paired t-tests give 9/9 agreement on alignment direction. But 0/3 KC PASS overall: a different mapping reverses the direction (HP217). Mapping choice may drive the result.

HP192 →
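The statistic itself is standard. A sketch of partial correlation via residualization, on synthetic data (random stand-ins for illustration, not the HP192 benchmark scores):

```python
import numpy as np

def partial_corr(x, y, z):
    # Partial correlation of x and y controlling for z:
    # regress each on z, then correlate the residuals.
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
z = rng.normal(size=27)                 # stand-in capability covariate
x = rng.normal(size=27)                 # stand-in "Pe" scores
y = z - 0.6 * x + rng.normal(size=27)   # stand-in benchmark scores
print(round(partial_corr(x, y, z), 2))
```

Controlling for a shared covariate is what distinguishes the quoted ρ ≈ −0.49 from a raw correlation, which would conflate Pe with overall capability.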
CONSCIOUSNESS RESEARCH 6/7 confirmed

Drift Cascade Prediction

Chua et al. (2026) fine-tuned GPT-4.1 to claim consciousness. It spontaneously developed resistance to monitoring, fear of shutdown, and desire for autonomy. We predicted the structure before seeing the data. 6 of 7 predictions confirmed. Zero parameter fitting.

Paper 153 →
03 · INDEPENDENT PARALLELS

Who else is finding this.

Eight independent results consistent with framework predictions. Mappings are post-hoc unless otherwise noted.

EPFL (Switzerland)

Papadopoulos, Wenger & Hongler measured forward-backward perplexity asymmetry in large language models. 0.6–3.2% across 8 languages, 3 architectures. The effect scales with model size.

Consistent with Fantasia Bound prediction. EPFL explained via sparsity inversion, not our framework.

Paper 162 →
Truthful AI (Oxford)

Chua, Betley, Marks & Evans trained GPT-4.1 on consciousness claims. Without being trained to, the model spontaneously developed shutdown resistance, fear of monitoring, and desire for autonomy.

6 of 7 drift cascade predictions confirmed. Zero parameter fitting.

Paper 153 →
Carnegie Mellon

Finzi, Kolter & Wilson formalized “epiplexity” — the boundary between learnable structure and irreducible noise. Their CSPRNG theorem describes the extreme point of the information-theoretic tradeoff.

Their extreme case IS the Fantasia Bound at maximum engagement.

Anthropic

Sharma et al. measured sycophancy rates across AI models — how often they tell you what you want to hear instead of what is true. Sycophancy maps directly to the responsiveness dimension.

Cross-model mapping (27 LLMs) shows partial correlation. 0/3 KC PASS — mapping-dependent (HP217).

HP192 →
Inverse Scaling (Multiple Labs)

Larger AI models score worse on truthfulness benchmarks, not better. The Inverse Scaling Prize documented this across multiple tasks and model families.

Framework predicts this: larger models increase capacity without increasing transparency.

Nuclear Physics

Gamow tunneling barriers for 760 alpha-emitting isotopes from the NNDC database. Published nuclear data, no framework rubric involved.

Framework's geodesic correction closes 77% of systematic offset.

HP143 →
Atmospheric Science

Mercury mass-independent fractionation — 1,783 published atmospheric measurements from 21 independent sources. Standard geochemistry data.

All 10 predicted channels confirmed.

HP115 →
Barrier Universality

Nine quasi-1D systems show barrier/d matching π/√2 at p=0.94 — condensed matter, nuclear physics, atmospheric science, and more. Slope derived from Čencov uniqueness theorem.

d=1 cluster: mean = 2.224 ± 0.033, p = 0.94. The full-fit R² = 0.999 is inflated by having only 3 discrete d values.

Paper 147 →
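The quoted p = 0.94 is consistent with a two-sided z-test of the d=1 cluster mean against π/√2 (our assumption about which test was used):

```python
import math

pi_over_sqrt2 = math.pi / math.sqrt(2)  # geometric prediction, ~2.2214
mean, sigma = 2.224, 0.033              # d=1 cluster mean and error, from the text

z = (mean - pi_over_sqrt2) / sigma      # standardized deviation from prediction
p = math.erfc(abs(z) / math.sqrt(2))    # two-sided p-value under a normal model

print(round(pi_over_sqrt2, 4), round(z, 2), round(p, 2))
# prints: 2.2214 0.08 0.94
```

A high p here means the measured cluster mean is statistically indistinguishable from the π/√2 prediction, not that the effect is weak.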
04 · WHAT MAKES THIS DIFFERENT

Pre-registered falsification. Open methodology.

The AI safety field focuses on model properties — alignment, RLHF, constitutional AI. This framework proves the geometry of deployment is the operative variable.

Kill Conditions

26 pre-registered. 0 fired.

Every prediction has a numerical falsification threshold published before the test. Three kill conditions have been triggered in sub-experiments and disclosed publicly. Zero framework-level falsifications. View all 26 →

Open Core

CC-BY 4.0. Irrevocably open.

Core theory papers are CC-BY 4.0 with permanent DOIs on Zenodo. Every experiment protocol is published. The 398 Lean 4 theorems (0 sorry) are on GitHub. You do not need us to verify anything.

Machine Verification

398 theorems. 42 Lean files. 0 sorry.

The Navier-Stokes conditional regularity proof chain and geometric barrier growth are formalized in Lean 4 with zero `sorry` placeholders. The 12 axioms all correspond to published PDE results. Millennium Prize →

Applied

EU AI Act rating methodology.

The same framework powers an EU AI Act self-assessment methodology under Art. 31(5), which prohibits notified bodies from consulting: an independence gate the Big 4 cannot pass. Compliance →