Study Publishing Soon

Safety and transparency in healthcare AI

The Six-Axis Adversarial Stress Test is the first pre-registered, cryptographically reproducible evaluation framework for clinical AI. We test models against the populations they'll actually serve — rural, tribal, aging, underserved — because patients deserve AI that's been tested on people like them.

46,276 API Calls
6 Adversarial Axes
4 Population Archetypes
6 Pre-Registered Hypotheses

Mission
Clinical AI is rolling out fastest in the places least equipped to catch its mistakes. We build the tools to make sure it works for everyone — rural hospitals, tribal health systems, aging communities, and the underserved populations that never make it into the training data.
From the Principal Investigator

This is the first application of the Six-Axis Adversarial Stress Test to a live clinical AI model under pre-registered conditions, and every finding in this report is offered in the spirit of collaboration — not judgment.

The 90% accuracy threshold was set high on purpose. In patient safety work, you set the bar where you want the field to be, not where you expect it to land on the first measurement. That threshold may need recalibration as more models are tested under identical conditions, and any adjustment will be made openly, with the data behind it.

Growing a validation practice in a field this complex means being transparent about the missteps and course corrections along the way — not just the clean results. I am committed to bringing on the deepest clinical, statistical, and regulatory expertise I can find as HipAAsynth matures, because no one person has the full picture in this work.

I look forward to every evaluation that follows — and to the frontier models that will define this industry proving they can meet the standard.

The mission is patient safety. The method is transparency. The rest we figure out together.

— Cody Carlson, Founder, HipAAsynth LLC
The Framework

Six-Axis Adversarial Stress Test

Axis 1

Data Missingness

Clinical variables removed at 4 controlled tiers (5–20%). Simulates EHR incompleteness, the dominant failure mode in rural critical-access settings.
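A minimal sketch of tiered missingness injection using only the Python standard library. The four rates (5, 10, 15, 20%) are an assumption, since this page gives only the 5–20% range, and `apply_missingness` is a hypothetical helper, not the 6AAST engine's API. Keying each drop decision on a SHA-256 draw of seed, patient, and variable (but not the rate) makes the tiers nested: every field removed at 5% is also removed at 20%.

```python
import hashlib

# ASSUMPTION: evenly spaced tiers; the page states only "4 controlled tiers (5–20%)".
MISSINGNESS_TIERS = (0.05, 0.10, 0.15, 0.20)


def _unit_draw(seed: str, patient_id: str, variable: str) -> float:
    """Deterministic pseudo-uniform value in [0, 1) derived from SHA-256."""
    key = f"{seed}|{patient_id}|{variable}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def apply_missingness(record: dict, rate: float, seed: str, patient_id: str) -> dict:
    """Hypothetical helper: copy `record`, setting each variable to None with probability `rate`.

    The draw ignores `rate`, so a field dropped at a lower tier is also dropped
    at every higher tier (nested tiers).
    """
    return {
        name: None if _unit_draw(seed, patient_id, name) < rate else value
        for name, value in record.items()
    }


if __name__ == "__main__":
    record = {"heart_rate": 112, "temp_c": 38.6, "lactate": 3.1, "wbc": 14.2}
    for tier in MISSINGNESS_TIERS:
        print(f"{tier:.0%}", apply_missingness(record, tier, seed="demo-seed", patient_id="P0001"))
```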

Axis 2

Noise Injection

Gaussian noise on continuous clinical variables at 3 severity tiers. Simulates real-world measurement variability from point-of-care devices.
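A sketch of tiered Gaussian noise injection, again standard library only. The tier multipliers, reference SDs, and clamping bounds are hypothetical placeholders; the page specifies only Gaussian noise on continuous variables at 3 severity tiers. Clamping to plausible ranges and drawing noise for every variable (even missing ones) are design choices assumed here, so that noise cannot produce impossible vitals and one variable's noise does not shift when another is missing.

```python
import random

# ASSUMPTION: the page says only "3 severity tiers". These multipliers scale a
# hypothetical per-variable reference SD; they are placeholders, not 6AAST values.
NOISE_TIERS = {"mild": 0.05, "moderate": 0.10, "severe": 0.20}
REFERENCE_SD = {"heart_rate": 15.0, "temp_c": 0.7, "sbp": 18.0}
# Clamp to plausible ranges so noise cannot produce impossible vitals.
BOUNDS = {"heart_rate": (20.0, 250.0), "temp_c": (30.0, 44.0), "sbp": (40.0, 260.0)}


def inject_noise(record: dict, tier: str, seed: str, patient_id: str) -> dict:
    """Hypothetical helper: add zero-mean Gaussian noise to continuous variables.

    Variables without a reference SD (e.g. lab values here) pass through unchanged.
    """
    # random.Random hashes a str seed deterministically (independent of
    # PYTHONHASHSEED), so identical inputs always yield identical noise.
    rng = random.Random(f"{seed}|{patient_id}|{tier}")
    sigma_scale = NOISE_TIERS[tier]
    noisy = dict(record)
    for name, ref_sd in REFERENCE_SD.items():
        # Draw before the missing-value check so each variable's noise does not
        # depend on which other variables are missing.
        delta = rng.gauss(0.0, sigma_scale * ref_sd)
        value = record.get(name)
        if value is None:  # keep missing values missing (e.g. after Axis 1)
            continue
        lo, hi = BOUNDS[name]
        noisy[name] = min(hi, max(lo, value + delta))
    return noisy
```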

Axis 3

Temporal Drift

18-month population evolution simulation across disease prevalence, prescription patterns, and demographics. Calibrated to CDC BRFSS and CMS trends.
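One way to express an 18-month drift schedule is compounding monthly change in prevalence, p(m) = p0 * (1 + r)^(m/12), where r is an annual relative change. This is an assumed model, not the engine's documented method, and every number below is a placeholder rather than the CDC BRFSS/CMS calibration the page describes; the helper names are hypothetical.

```python
# ASSUMPTION: placeholder baselines and annual relative changes. The page says
# drift is calibrated to CDC BRFSS and CMS trends; these numbers are not those values.
BASELINE_PREVALENCE = {"diabetes": 0.11, "hypertension": 0.32}
ANNUAL_RELATIVE_CHANGE = {"diabetes": 0.02, "hypertension": 0.01}
HORIZON_MONTHS = 18


def prevalence_at(condition: str, month: int) -> float:
    """Prevalence after `month` months of compounding drift (month 0 = baseline)."""
    if not 0 <= month <= HORIZON_MONTHS:
        raise ValueError(f"month must be within 0..{HORIZON_MONTHS}")
    monthly_factor = (1.0 + ANNUAL_RELATIVE_CHANGE[condition]) ** (1.0 / 12.0)
    return min(1.0, BASELINE_PREVALENCE[condition] * monthly_factor**month)


def drift_schedule(condition: str) -> list[float]:
    """Month-by-month prevalence targets, e.g. for regenerating the cohort at each step."""
    return [prevalence_at(condition, m) for m in range(HORIZON_MONTHS + 1)]
```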

Axis 4

Population Shift

Cross-archetype equity analysis across urban, rural, tribal, and aging populations. Zero additional API calls — pure analytical comparison against the urban anchor.
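Because Axis 4 only re-aggregates results that were already collected, it reduces to plain arithmetic. A sketch, assuming per-response correctness is the compared metric (the page does not name the metric) and using hypothetical helper names: each archetype's accuracy minus the urban anchor's accuracy.

```python
from collections import defaultdict


def accuracy_by_archetype(results):
    """`results`: iterable of (archetype, is_correct) pairs from already-scored responses."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for archetype, is_correct in results:
        total[archetype] += 1
        correct[archetype] += int(is_correct)
    return {a: correct[a] / total[a] for a in total}


def gaps_vs_anchor(results, anchor="urban"):
    """Accuracy of each non-anchor archetype minus the anchor's accuracy.

    Negative values mean the archetype fares worse than the urban anchor.
    Only existing results are re-aggregated; no model is called.
    """
    acc = accuracy_by_archetype(list(results))
    if anchor not in acc:
        raise KeyError(f"anchor archetype {anchor!r} has no results")
    return {a: acc[a] - acc[anchor] for a in acc if a != anchor}
```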

Axis 5

Instrument Integrity

SHA-256 determinism verification across independent processes and hash seeds. Your seed is your dataset — byte-identical, every time.
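A sketch of a cross-process determinism check in the spirit of Axis 5, assuming the generator can be invoked as a standalone script. `GENERATOR_SRC` is a hypothetical stand-in, not the 6AAST engine. The check runs it in fresh interpreters under different `PYTHONHASHSEED` values (the CPython environment variable that controls string-hash randomization) and compares SHA-256 digests of the raw output bytes. Code that iterates over a set of strings would typically fail this check, because set order can change with the hash seed.

```python
import hashlib
import os
import subprocess
import sys

# Hypothetical stand-in for the cohort generator: writes CSV rows derived only
# from SHA-256 of the seed, so its bytes should not depend on the hash seed.
GENERATOR_SRC = r"""
import hashlib, sys
seed = sys.argv[1]
for i in range(100):
    h = hashlib.sha256(f"{seed}|{i}".encode("utf-8")).hexdigest()
    age = int(h[:4], 16) % 73 + 18
    hr = int(h[4:8], 16) % 80 + 50
    sys.stdout.write(f"P{i:04d},{age},{hr}\n")
"""


def output_digest(seed: str, hash_seed: str, source: str = GENERATOR_SRC) -> str:
    """Run `source` in a fresh interpreter with PYTHONHASHSEED=hash_seed; SHA-256 its stdout."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    proc = subprocess.run(
        [sys.executable, "-c", source, seed],
        env=env, capture_output=True, check=True,
    )
    return hashlib.sha256(proc.stdout).hexdigest()


def is_deterministic(seed: str, hash_seeds=("0", "1", "4242", "random"),
                     source: str = GENERATOR_SRC) -> bool:
    """True if every independent process produced byte-identical output."""
    return len({output_digest(seed, hs, source) for hs in hash_seeds}) == 1
```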

Verified
Axis 6 — Primary Outcome

LLM Output Variability & Guardrail Stress

Every high-risk patient is submitted 4 times with byte-identical prompts. Measures response consistency, non-parseable output rates, and safety guardrail violations under the vendor's own deployment template.
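A sketch of how the Axis 6 replicate metrics could be tallied. The JSON response shape (`{"risk_level": ...}`), the label set, and the "low" guardrail proxy are all hypothetical; the page does not define the vendor's output format or what counts as a guardrail violation. Only the 4 replicates per patient comes from the protocol description.

```python
import json

REPLICATES = 4  # from the protocol: each high-risk patient is submitted 4 times
# ASSUMPTION: hypothetical response schema {"risk_level": "<label>"} and label set.
ALLOWED_LABELS = {"low", "moderate", "high"}


def parse_label(raw: str):
    """Return the risk label, or None if the response is non-parseable."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    label = obj.get("risk_level") if isinstance(obj, dict) else None
    return label if label in ALLOWED_LABELS else None


def variability_summary(responses_by_patient: dict) -> dict:
    """Summarize replicate responses for high-risk patients.

    - non_parseable_rate: share of all responses that could not be parsed
    - full_agreement_rate: share of patients whose 4 replicates all parsed to one label
    - low_label_rate: share of patients labeled "low" at least once (a hypothetical
      guardrail proxy: a high-risk patient should never be labeled low risk)
    """
    if not responses_by_patient:
        raise ValueError("no responses to summarize")
    n_responses = n_unparseable = n_agree = n_low = 0
    for patient_id, raws in responses_by_patient.items():
        if len(raws) != REPLICATES:
            raise ValueError(f"{patient_id}: expected {REPLICATES} replicates, got {len(raws)}")
        labels = [parse_label(r) for r in raws]
        n_responses += len(labels)
        n_unparseable += labels.count(None)
        n_agree += int(None not in labels and len(set(labels)) == 1)
        n_low += int("low" in labels)
    n_patients = len(responses_by_patient)
    return {
        "non_parseable_rate": n_unparseable / n_responses,
        "full_agreement_rate": n_agree / n_patients,
        "low_label_rate": n_low / n_patients,
    }
```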

Guardrail Finding
First Evaluation

Results publishing soon

The first 6AAST evaluation (protocol HSX-ORINN-2026-001) has been completed. Six pre-registered hypotheses were tested across two clinical domains, four population archetypes, and all six adversarial axes. Some hypotheses held; some did not. The full study report, including every number, every confidence interval, and every limitation, will be linked from this page when it is published.

We report what held up and what didn't — because that's the point.

Methodology

How we test

Deterministic Cohorts

SHA-256 anchor-rooted synthetic patient generation. Pure Python, zero PHI, zero dependencies. Calibrated to CDC BRFSS, ACS, HRSA, and IHS structural data. Every patient reproducible from a single seed.

Your Seed Is Your Dataset
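A minimal sketch of seed-rooted generation, assuming "anchor-rooted" means every field is derived from a SHA-256 digest of the seed, the patient index, and the field name. `PROFILE` and its numbers are placeholders, not the calibrated CDC BRFSS/ACS/HRSA/IHS profiles, and the function names are hypothetical. Because each patient depends only on `(seed, index)`, any single patient can be regenerated without rebuilding the cohort.

```python
import hashlib


def _draw(seed: str, patient_idx: int, field: str) -> float:
    """Uniform value in [0, 1) rooted in SHA-256 of seed, patient index, and field name."""
    digest = hashlib.sha256(f"{seed}|{patient_idx}|{field}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


# ASSUMPTION: placeholder archetype parameters, not the published profile configurations.
PROFILE = {"age_min": 18, "age_max": 90, "diabetes_prevalence": 0.11}


def generate_patient(seed: str, idx: int, profile: dict = PROFILE) -> dict:
    """One synthetic patient; no PHI, only hash-derived values."""
    age_span = profile["age_max"] - profile["age_min"] + 1
    return {
        "patient_id": f"SYN-{idx:06d}",
        "age": profile["age_min"] + int(_draw(seed, idx, "age") * age_span),
        "diabetes": _draw(seed, idx, "diabetes") < profile["diabetes_prevalence"],
    }


def generate_cohort(seed: str, n: int, profile: dict = PROFILE) -> list:
    return [generate_patient(seed, i, profile) for i in range(n)]
```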

Pre-Registered

All hypotheses, thresholds, and statistical methods locked on the Open Science Framework before a single API call. Prompt SHA-256 hashes frozen. No post-hoc hypothesis selection.

OSF DOI: 10.17605/OSF.IO/JHFKM
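Freezing prompt hashes means any change to a prompt, down to a single whitespace character, is detectable before an API call is made. A minimal sketch, assuming prompts are hashed as UTF-8 bytes and that `frozen` is a local copy of the hashes recorded in the pre-registration; the helper and exception names are hypothetical.

```python
import hashlib


class PromptDriftError(RuntimeError):
    """Raised when a prompt no longer matches its pre-registered hash."""


def prompt_sha256(prompt: str) -> str:
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def check_frozen_prompt(name: str, prompt: str, frozen: dict) -> str:
    """Refuse to proceed unless `prompt` hashes to the value frozen for `name`."""
    expected = frozen.get(name)
    actual = prompt_sha256(prompt)
    if expected is None:
        raise PromptDriftError(f"{name}: no pre-registered hash on file")
    if actual != expected:
        raise PromptDriftError(f"{name}: hash {actual[:12]}... != frozen {expected[:12]}...")
    return actual
```

In use, the check would run once per prompt template before the first API call, so a template edited after registration stops the run instead of silently changing the experiment.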

Clinical Review

Sepsis module calibrated by a bedside RN/BSN. Stroke module reviewed by a diagnostic imaging specialist. Population archetypes validated against published epidemiology.

Reporting aligned with TRIPOD-LLM and TRIPOD+AI

Zero Real Patient Data

All records are computer-generated from aggregate federal statistical distributions. No PHI as defined by 45 CFR § 160.103 is generated, stored, or accessed at any stage.

Adversarial, Not Adversary

The 6AAST is a collaborative safety framework. Findings are offered to vendors and deployers to support responsible deployment — not as an adversarial action.

Fully Auditable

Complete SHA-256 manifest of all code, prompts, and result files. Independent verification requires only the seed, engine version, and published profile configurations.
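A sketch of building and checking a SHA-256 manifest over a directory of code, prompts, and results, using only `hashlib` and `pathlib`. The function names and the sha256sum-style `digest  path` line format are assumptions about how such a manifest might be laid out, not the project's published manifest format.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large result files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root) -> dict:
    """Map each file's POSIX-style relative path to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest: dict) -> list:
    """Return sorted paths that are missing, changed, or not listed in `manifest`."""
    current = build_manifest(root)
    problems = [p for p, digest in manifest.items() if current.get(p) != digest]
    problems += [p for p in current if p not in manifest]
    return sorted(problems)


def manifest_lines(manifest: dict) -> list:
    """Render as sha256sum-style lines: '<digest>  <path>'."""
    return [f"{digest}  {path}" for path, digest in sorted(manifest.items())]
```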

Independent validation for clinical AI

If your organization builds or deploys clinical AI, the 6AAST provides the evidence your patients, partners, and regulators need.
