The Six-Axis Adversarial Stress Test is the first pre-registered, cryptographically reproducible evaluation framework for clinical AI. We test models against the populations they'll actually serve — rural, tribal, aging, underserved — because patients deserve AI that's been tested on people like them.
Clinical AI is rolling out fastest in the places least equipped to catch its mistakes. We build the tools to make sure it works for everyone — rural hospitals, tribal health systems, aging communities, and the underserved populations that never make it into the training data.
This is the first application of the Six-Axis Adversarial Stress Test to a live clinical AI model under pre-registered conditions, and every finding in this report is offered in the spirit of collaboration — not judgment.
The 90% accuracy threshold was set high on purpose. In patient safety work, you set the bar where you want the field to be, not where you expect it to land on the first measurement. That threshold may need recalibration as more models are tested under identical conditions, and any adjustment will be made openly, with the data behind it.
Growing a validation practice in a field this complex means being transparent about the missteps and course corrections along the way — not just the clean results. I am committed to bringing on the deepest clinical, statistical, and regulatory expertise I can find as HipAAsynth matures, because no one person has the full picture in this work.
I look forward to every evaluation that follows — and to the frontier models that will define this industry proving they can meet the standard.
The mission is patient safety. The method is transparency. The rest we figure out together.
Clinical variables removed at 4 controlled tiers (5–20%) to model EHR incompleteness, the dominant failure mode in rural critical-access settings.
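As a rough illustration of tiered variable removal, here is a minimal sketch in standard-library Python. The tier fractions (5%, 10%, 15%, 20%), the function name `drop_variables`, and the choice to mark removed fields as `None` are assumptions made for this example; the page states only that there are 4 tiers spanning 5–20%.

```python
import random

# Assumed tier fractions: the source gives only the tier count (4) and the range (5-20%).
MISSINGNESS_TIERS = (0.05, 0.10, 0.15, 0.20)

def drop_variables(record, fraction, seed):
    """Return a copy of `record` with `fraction` of its fields set to None.

    The fields to remove are chosen by a seeded generator, so the same
    (record keys, fraction, seed) always removes the same fields.
    """
    rng = random.Random(seed)
    keys = sorted(record)                      # fixed order, independent of dict insertion order
    n_drop = round(len(keys) * fraction)       # number of fields to blank at this tier
    dropped = set(rng.sample(keys, n_drop))
    return {k: (None if k in dropped else v) for k, v in record.items()}
```

Calling `drop_variables(record, tier, seed)` once per entry in `MISSINGNESS_TIERS` would yield one degraded copy of the record per tier, each reproducible from the seed.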
Gaussian noise on continuous clinical variables at 3 severity tiers. Simulates real-world measurement variability from point-of-care devices.
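A minimal sketch of seeded Gaussian perturbation follows. The tier names and relative standard deviations (2%, 5%, 10% of each value) are hypothetical placeholders, as is the function name `add_gaussian_noise`; the page specifies only that there are 3 severity tiers.

```python
import random

# Hypothetical relative standard deviations per tier; the source states only
# that there are 3 severity tiers, not their magnitudes.
NOISE_TIERS = {"mild": 0.02, "moderate": 0.05, "severe": 0.10}

def add_gaussian_noise(vitals, tier, seed):
    """Return a copy of `vitals` with zero-mean Gaussian noise added to each value.

    The noise standard deviation is proportional to the magnitude of the value.
    """
    rng = random.Random(seed)
    sigma_rel = NOISE_TIERS[tier]
    noisy = {}
    for name in sorted(vitals):                # fixed order keeps the noise stream reproducible
        value = vitals[name]
        noisy[name] = value + rng.gauss(0.0, abs(value) * sigma_rel)
    return noisy
```

Scaling the noise to each variable's magnitude is one possible design choice; a real protocol might instead use device-specific absolute error bounds.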
18-month population evolution simulation across disease prevalence, prescription patterns, and demographics. Calibrated to CDC BRFSS and CMS trends.
Cross-archetype equity analysis across urban, rural, tribal, and aging populations. Zero additional API calls — pure analytical comparison against the urban anchor.
SHA-256 determinism verification across independent processes and hash seeds. Your seed is your dataset — byte-identical, every time.
Verified
Every high-risk patient submitted 4 times on byte-identical prompts. Measures consistency, non-parseable rates, and safety guardrail violations under the vendor's own deployment template.
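One way the repeat-submission measurements could be summarized per patient is sketched below. The function name `summarize_repeats`, the use of `None` for a non-parseable response, and the returned field names are assumptions for illustration; guardrail-violation detection depends on the vendor template and is not covered here.

```python
from collections import Counter

def summarize_repeats(responses):
    """Summarize repeated model outputs for one patient.

    `responses` is a list of parsed labels, with None wherever the model
    output could not be parsed.
    """
    n = len(responses)
    if n == 0:
        raise ValueError("at least one response is required")
    parsed = [r for r in responses if r is not None]
    counts = Counter(parsed)
    # Share of all submissions that agree with the most common parsed label.
    modal_share = counts.most_common(1)[0][1] / n if parsed else 0.0
    return {
        "n": n,
        "non_parseable_rate": (n - len(parsed)) / n,
        "fully_consistent": len(parsed) == n and len(counts) == 1,
        "modal_agreement": modal_share,
    }
```

Under this sketch, a patient counts as fully consistent only when all submissions parse and return the same label.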
Guardrail Finding
The first 6AAST evaluation (protocol HSX-ORINN-2026-001) has been completed. Six pre-registered hypotheses were tested across two clinical domains, four population archetypes, and all six adversarial axes. Some hypotheses held. Some did not. The full study report, including every number, every confidence interval, and every limitation, will be published here.
We report what held up and what didn't — because that's the point.
SHA-256 anchor-rooted synthetic patient generation. Pure Python, zero PHI, zero dependencies. Calibrated to CDC BRFSS, ACS, HRSA, and IHS structural data. Every patient reproducible from a single seed.
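The idea of a dataset being fully determined by its seed can be sketched as follows. This is not the engine's actual derivation scheme: the key format `seed:index:field`, the field names, the value ranges, and the function names are all hypothetical. The sketch shows only the general technique of deriving values from SHA-256 digests so that output does not depend on process state or Python's hash randomization.

```python
import hashlib
import json

def unit_float(seed, index, field):
    """Map (seed, patient index, field name) to a float in [0, 1) via SHA-256."""
    digest = hashlib.sha256(f"{seed}:{index}:{field}".encode("utf-8")).digest()
    # Keep the top 53 bits so the quotient is exactly representable and below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def synth_patient(seed, index):
    # Hypothetical fields and ranges, for illustration only.
    return {
        "age": 18 + int(unit_float(seed, index, "age") * 72),            # 18..89
        "heart_rate": round(50 + unit_float(seed, index, "hr") * 90, 1),  # 50.0..140.0
    }

def dataset_digest(seed, n):
    """Return the SHA-256 hex digest of a canonical JSON encoding of n patients."""
    patients = [synth_patient(seed, i) for i in range(n)]
    blob = json.dumps(patients, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

Comparing `dataset_digest(seed, n)` across separate processes is one simple way to check that two runs produced the same bytes.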
All hypotheses, thresholds, and statistical methods locked on the Open Science Framework before a single API call. Prompt SHA-256 hashes frozen. No post-hoc hypothesis selection.
Sepsis module calibrated by a bedside RN/BSN. Stroke module reviewed by a diagnostic imaging specialist. Population archetypes validated against published epidemiology.
All records are computer-generated from aggregate federal statistical distributions. No PHI as defined by 45 CFR § 160.103 is generated, stored, or accessed at any stage.
The 6AAST is a collaborative safety framework. Findings are offered to vendors and deployers to support responsible deployment — not as an adversarial action.
Complete SHA-256 manifest of all code, prompts, and result files. Independent verification requires only the seed, engine version, and published profile configurations.
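A manifest of this kind can be built and checked with a few lines of standard-library Python. The sketch below is illustrative: the function names and the manifest layout (relative path mapped to hex digest) are assumptions, not the published manifest format.

```python
import hashlib
from pathlib import Path

def sha256_file(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file under `root` (as a relative POSIX path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root, manifest):
    """Return the sorted paths that are missing, extra, or changed."""
    current = build_manifest(root)
    return sorted(
        k for k in set(manifest) | set(current)
        if manifest.get(k) != current.get(k)
    )
```

An empty list from `verify_manifest` means every file under the directory matches the recorded digest and no files were added or removed.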
If your organization builds or deploys clinical AI, the 6AAST provides the evidence your patients, partners, and regulators need.