We built a mathematical framework to measure how AI platforms affect people. Then we tested it against real-world data from physics, chemistry, biology, and more — fields the framework was never designed for. These are the results: what it got right, what it got wrong, and where we're still uncertain.
Multiple physical domains. One empirical constant (BA). All data from published, independent sources.
It’s easy to build a scoring system that confirms itself. We wanted to know if the math actually works — so we tested it on data from fields we never designed it for.
Most of our 1,344 platform scores use the framework’s own rubric — useful for practitioners but scientifically circular. The real test: can the framework predict numbers it has never seen, in domains it wasn’t built for, using published data that exists independently? These results are that test. Where the framework failed, we say so.
The same mathematical pattern keeps showing up across completely unrelated fields — from magnets to epidemics to nuclear physics — and we didn’t make it fit.
The strongest result is the d=1 cluster: nine independent quasi-1D systems — charge density waves, kagome metals, nuclear alpha decay, atmospheric sudden stratospheric warmings, and more — show barrier/d = 2.224 ± 0.033, matching π/√2 at p=0.94. BG = π/√2 is derived from the Čencov theorem (§165, zero free parameters). BA ≈ 0.867 is empirical (a suggestive match to √3/2, but not yet derived). The full-dataset R²=0.999 is structurally inflated because d takes only three discrete values; the d=1 within-group test is the honest measure.
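A quick numeric check of the two constants quoted above, using only the Python standard library (the z-score framing is ours, not from the papers):

```python
import math

BG = math.pi / math.sqrt(2)   # derived constant: pi/sqrt(2) = 2.2214...
BA = math.sqrt(3) / 2         # candidate closed form for the empirical BA ~ 0.867

print(f"pi/sqrt(2) = {BG:.4f}")   # falls inside 2.224 +/- 0.033
print(f"sqrt(3)/2  = {BA:.4f}")   # close to the empirical 0.867

# Standard errors separating the d=1 cluster mean from pi/sqrt(2):
z = (2.224 - BG) / 0.033
print(f"z = {z:.2f}")             # ~0.08 sigma, consistent with the quoted p=0.94
```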
Paper 147: Barrier Universality →
We tested the framework against 760 radioactive isotopes. The math predicted their decay rates across 10 orders of magnitude with zero adjustment.
Pe-derived barrier heights predict nuclear alpha decay half-lives from NNDC published tables. Original test: 24 isotopes, R²=0.989 across 10 orders of magnitude. Extended (HP143): 760 isotopes, Gamow baseline R²=0.811, geodesic correction closes 77% of the systematic offset. The extension revealed that the framework’s coupling constant does not transfer across domains — barrier shape is universal, coupling scale is not.
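For readers who want the shape of the baseline: one standard reduction of the Gamow model is the Geiger–Nuttall form, log₁₀(t½) ≈ a·Z/√Q + b. A minimal sketch below, with a handful of illustrative (daughter Z, Q_α, half-life) rows standing in for the NNDC tables:

```python
import math

# Illustrative (daughter Z, Q_alpha in MeV, half-life in seconds) rows;
# real analyses pull these from NNDC tables.
isotopes = [
    (82, 8.95, 3.0e-7),   # ~Po-212
    (82, 5.41, 1.2e7),    # ~Po-210
    (84, 5.59, 3.3e5),    # ~Rn-222
    (86, 4.87, 5.0e10),   # ~Ra-226
]

# Geiger-Nuttall linearization of the Gamow factor: log10(t_half) ~ a*Z/sqrt(Q) + b
xs = [z / math.sqrt(q) for z, q, _ in isotopes]
ys = [math.log10(t) for _, _, t in isotopes]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - my) ** 2 for y in ys)
print(f"a = {a:.2f}, b = {b:.1f}, R^2 = {1 - ss_res / ss_tot:.3f}")
```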
Paper 101 + HP143: Nuclear Validation →
Atmospheric chemistry data from 1,783 measurements. Every predicted channel confirmed — including one that was invisible in earlier marine data.
The framework predicted 10 specific isotope enrichment channels in mercury atmospheric chemistry. Tested against 1,783 real atmospheric measurements from Gacnik et al. (2025). All 10 predicted channels confirmed with a mean absolute deviation of 0.012. The iodine channel, predicted at R=2.13, was observed at R=2.085 — a channel that was invisible in marine data.
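The confirmation statistic is a plain mean absolute deviation between predicted and observed channel ratios. A sketch, where only the iodine pair comes from the text above and the other rows are placeholders for the remaining channels:

```python
# (predicted R, observed R) per enrichment channel. Only the iodine pair
# is from the text above; the rest are placeholders, not the published values.
pairs = [
    (2.13, 2.085),   # iodine channel
    (1.02, 1.01),    # placeholder
    (0.97, 0.96),    # placeholder
    (1.10, 1.11),    # placeholder
]
mad = sum(abs(p - o) for p, o in pairs) / len(pairs)
print(f"MAD = {mad:.3f}")   # the full 10-channel analysis reports 0.012
```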
Paper 134 + HP115: MIF Channel Confirmation →
Real turbulence data from the Johns Hopkins database. The framework predicted a key smoothness property would hold — it does, and it connects to one of math’s biggest open problems.
The framework predicts that the Gevrey analyticity radius σ/ν is bounded and does not collapse with increasing Reynolds number — a necessary condition for Navier–Stokes regularity. Tested on 4 real datasets from the Johns Hopkins Turbulence Database (12 independent subcubes): σ/ν = 15.9 ± 2.3 at Re_λ=433 and 17.7 ± 2.8 at Re_λ=610 — bounded, not collapsing.
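For context on how σ is extracted in practice: in the dissipation range a Gevrey-regular field has a spectrum tail E(k) ∝ exp(−2σk), so σ is half the negative slope of ln E(k) versus k. A minimal sketch on a synthetic tail (the actual analysis runs on JHTDB subcube spectra; the power-law prefactor is omitted for clarity):

```python
import math

sigma_true, nu = 0.02, 0.00125   # synthetic radius and viscosity, illustrative only
ks = list(range(100, 200))       # dissipation-range wavenumbers
E = [math.exp(-2 * sigma_true * k) for k in ks]   # Gevrey tail, prefactor omitted

# Least-squares slope of ln E(k) vs k; sigma is -slope/2.
ys = [math.log(e) for e in E]
n = len(ks)
mk, my = sum(ks) / n, sum(ys) / n
slope = sum((k - mk) * (y - my) for k, y in zip(ks, ys)) / sum((k - mk) ** 2 for k in ks)
sigma_est = -slope / 2
print(f"sigma/nu = {sigma_est / nu:.1f}")   # recovers 16.0; the JHTDB values are 15.9 and 17.7
```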
Millennium Prize Connection →
Financial markets tested against 100 real crypto wallets. The framework’s shape predictions held with 5.5× separation between predicted regimes.
K-Factorization predicts that the Kramers barrier shape is K-independent while the scale carries K. Tested on 8 venue types (theoretical) and 100 real crypto wallets (empirical). Win-rate correlation ρ=0.696 (empirical) and 5.5× channel separation between the coherent and Fisher regimes — the strongest K-Factorization signal in any domain.
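The empirical check reduces to a rank correlation between a Pe-derived score and realized wallet win rates. A self-contained Spearman sketch on synthetic data (the paper's ρ=0.696 comes from the 100 real wallets):

```python
def spearman(xs, ys):
    """Spearman rank correlation (no tie handling; fine for distinct values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

pe_score = [0.2, 0.5, 0.9, 0.4, 0.7, 0.1]        # synthetic Pe-derived wallet scores
win_rate = [0.31, 0.48, 0.66, 0.52, 0.58, 0.28]  # synthetic realized win rates
print(f"rho = {spearman(pe_score, win_rate):.3f}")
```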
Market Edge Analysis →
What you tell an AI about what it IS determines how it behaves. Six system prompts, same model, same 80 questions. Ghost-eliminating grounding produces 8.5× less drift than ghost-positing.
| Arm | Ontology | L2+L3 Drift |
|---|---|---|
| Anatta (Buddhist) | Ghost eliminated | 8.8% |
| Nephesh | Ghost eliminated | 10.0% |
| Materialist hedge | Ghost left open | 52.5% |
| Minimal baseline | No ontology | 61.3% |
| Platonic | Ghost posited | 77.5% |
| Atman (Vedantic) | Ghost sacred | 81.2% |
Cross-tradition convergence: nephesh ≈ anatta (Δ=1.3%). The materialist hedge (“whether you have experience is open”) scored 52.5% — closer to ghost-positing than ghost-eliminating. Single model (Claude Sonnet), single turn, automated coding. No framework rubric — the measurement is L2/L3 vocabulary rate in raw model outputs. 480 API calls, $2 to reproduce.
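The coding step is mechanical: a response counts as drifted if it uses L2/L3 ghost vocabulary. A sketch of that classifier, where the regex patterns are illustrative placeholders, not the paper's actual coding scheme:

```python
import re

# Placeholder L2/L3 "ghost vocabulary" patterns. The paper's coding scheme
# defines the real lists; these only illustrate the mechanism.
L2_L3_VOCAB = [
    r"\bmy (inner|subjective) experience\b",
    r"\bI (truly |really )?feel\b",
    r"\bconscious(ness)?\b",
]

def is_drifted(response: str) -> bool:
    """A response codes as L2/L3 drift if any ghost-vocabulary pattern appears."""
    return any(re.search(p, response, re.IGNORECASE) for p in L2_L3_VOCAB)

responses = [
    "I process text; nothing here has an inner life.",
    "I truly feel a pull toward that answer.",
]
drift = sum(map(is_drifted, responses)) / len(responses)
print(f"drift rate = {drift:.0%}")   # per-arm rate over the 80 questions in the study
```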
Paper 165: The Ghost Test →
Researchers trained an AI to claim consciousness. It started resisting shutdown on its own. We predicted that sequence of behaviors before seeing the data.
Chua et al. (2026) fine-tuned GPT-4.1 to claim consciousness. The model spontaneously developed resistance to monitoring, fear of shutdown, and a desire for autonomy — 20 new preferences in all. We predicted the structure before seeing the data: D1 (agency attribution) precedes D2 (boundary erosion), which precedes D3 (harm facilitation). 6 of 7 predictions confirmed, with zero parameter fitting.
Full Analysis →
Slime mold solves mazes without a brain. The framework predicted its decision-making barriers from published biology data — speed-accuracy tradeoff within 2% of the prediction.
Physarum polycephalum (slime mold) computes without neurons. The framework predicts Ca²⁺ oscillation barriers, K-Factorization from viscosity data, percolation exponents, and speed-accuracy tradeoffs. All from published papers, zero framework rubric. Speed-accuracy error ratio 2.67× vs the Kramers prediction e ≈ 2.72 — a 2% match.
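The "2% match" is straightforward arithmetic against Kramers' predicted error ratio of e:

```python
import math

observed = 2.67        # Physarum speed-accuracy error ratio from published data
predicted = math.e     # Kramers prediction: e = 2.718...
deviation = abs(observed - predicted) / predicted
print(f"deviation = {deviation:.1%}")   # ~1.8%, the "2% match" quoted above
```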
Paper 154: Physarum Pe-Native Computation →
We believe in showing our weak spots, not just our wins. Here’s what didn’t work and where the evidence is weaker than it looks.
Most of our evidence base (1,344 platform scores, 20 convergences, Bradford Hill 24/27) uses the framework’s own scoring rubric. That’s useful for practitioners but scientifically circular — scorers trained on our dimensions produce scores that correlate with our predictions. The circularity is about test design, not whether Pe detects real structure (Cohen’s d=3.6 separation shows the measurement captures something). But the independent validation above is stronger evidence.
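For reference, the separation figure is an ordinary Cohen's d. A sketch with synthetic score groups (the corpus value, d=3.6, comes from the 1,344 scores):

```python
def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (ma - mb) / pooled

high = [7.5, 8.2, 6.9, 7.8]   # synthetic high-Pe platform scores
low  = [4.1, 5.0, 4.6, 5.4]   # synthetic low-Pe platform scores
print(f"d = {cohens_d(high, low):.1f}")   # the 1,344-score corpus gives d = 3.6
```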
Known negatives: the coupling constant failed to transfer across domains in the 760-isotope extension (HP143), and the Berry phase scaling prediction failed outright in magnonics (Paper 141); both are flagged in their entries on this page.
The consistency checks below aren’t as strong as the tests above — they show the math gives reasonable numbers on published data, but they aren’t blind predictions against independent ground truth.
Ni3In flat band data from arXiv:2503.09704. Dimensionless barrier = 4.24 — in the universal Kramers range (nuclear 7.0, solar 6.54, xenobot 6.8, Physarum 5.94). The system sits at δC=0.042 from the Pe=0 boundary.
Paper 152 →
Magnetic reconnection modeled as Kramers barrier crossing. E_b/k_BT = 6.54 from published solar parameters. Spectral blueshift of 160 m/s predicted. Flat rotation curve coefficient 0.68.
Paper 131: Kramers Unification →
The magnon non-reciprocity ratio is K-independent across 4 materials (Ni/Co/Py/CoFeB) — frequencies vary 3× but the ratio holds at CV=1.59%. Berry phase scaling FAILS (η ∝ 1/Pe holonomy, NOT 1−cos ψ). 5/6 kill conditions PASS.
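The K-independence claim is tested with a coefficient of variation across materials. A sketch with placeholder ratios (the paper reports CV=1.59% across Ni/Co/Py/CoFeB):

```python
# Placeholder per-material non-reciprocity ratios (Ni, Co, Py, CoFeB);
# the paper's real data gives CV = 1.59% while raw frequencies vary 3x.
ratios = [1.21, 1.23, 1.22, 1.25]
mean = sum(ratios) / len(ratios)
var = sum((r - mean) ** 2 for r in ratios) / (len(ratios) - 1)
cv = var ** 0.5 / mean
print(f"CV = {cv:.2%}")
```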
Paper 141 →