Blog
Post · 2026-05-19

"Preprint live: The Sparse Matrix of Drug Discovery: Sex, Race, and a Genomic Equity Index Across 40.8 Million Patients in 77,770 Clinical Trials"

medRxiv 10.64898/2026.05.14.26353197 minted 2026-05-19. 7-author roster: Anil Bajnath, Allana Roach, Rajini Haraksingh, Irman Forghani, Alexander N. Evans, Elena Cyrus, Dexter Hadley. Pipeline-1 preprint gate cleared; journal submission cycle mandated per CONTENT/MANUSCRIPTS/CLINICAL-TRIALS-RACE/CANON.md ladder. A preprint DOI is not a finish line. It is a mandate to submit.

The canonical long-form lives at hadleylab-canonic/CONTENT/MANUSCRIPTS/CLINICAL-TRIALS-RACE/draft-v1.md with SHA-256 attestation on the CANONIC ledger. The medRxiv preprint is the downstream marketing channel; the governance-native primary publication is the content-hashed long-form at hadleylab.org. Per the blog-hash-ledger canonicity rule, peer review is a marketing surface on top of the canonical hash, not a replacement for it.
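Anyone holding a copy of the long-form can check a content-hash attestation by recomputing the digest and comparing it with the recorded value. The following is a minimal sketch of that check using Python's standard `hashlib`. The function names and the idea of passing the ledger digest as a hex string are assumptions made for illustration; the CANONIC ledger's actual format and tooling are not described in this post.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_attestation(path: str, ledger_digest: str) -> bool:
    """Compare a recomputed digest with a recorded hex digest.

    `ledger_digest` is a hypothetical input: whatever hex string the
    ledger recorded for this file.
    """
    return sha256_of_file(path) == ledger_digest.strip().lower()
```

A single changed byte in the file produces a different digest, which is what lets the hash, rather than the hosting venue, identify the canonical text.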

What the paper measures

Every clinical trial with posted results on ClinicalTrials.gov (77,770 studies, 40.8 million participants, two decades) was downloaded, frozen on 1 May 2026, and hashed. The result is a matrix: 15 therapeutic areas against 5 ancestry populations. The cells for Indigenous American and Pacific Islander populations are empty. Not under-powered. Empty. There is not a single drug class in which these populations have adequate trial evidence.

In industry-sponsored oncology Phase III trials — where cancer drugs earn their FDA labels — African-derived enrollment is 1.8% against a cancer mortality burden of 15%. African-derived patients are more likely to appear in early Phase I safety trials than in late Phase III efficacy trials. The risk flows in; the evidence flows out.
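The size of that gap is easier to see as a single ratio of enrollment share to burden share. The sketch below is illustrative arithmetic only: `enrollment_to_burden_ratio` is a hypothetical helper name, not a metric defined in the paper, and the two inputs are the figures quoted above.

```python
def enrollment_to_burden_ratio(enrollment_share: float, burden_share: float) -> float:
    """Ratio of a group's share of trial enrollment to its share of disease burden.

    A value of 1.0 would mean enrollment proportional to burden.
    """
    if burden_share <= 0:
        raise ValueError("burden_share must be positive")
    return enrollment_share / burden_share


# Figures quoted in the post: 1.8% enrollment against a 15% mortality burden.
ratio = enrollment_to_burden_ratio(0.018, 0.15)
print(f"{ratio:.2f}")  # prints 0.12
```

On these numbers, African-derived enrollment in industry-sponsored oncology Phase III trials sits at roughly one-eighth of what the mortality burden would imply.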

The senior author has shown at the bedside that the named sex disparity in coronary artery bypass grafting (CABG) does not originate at the operating-room door but in diagnostic timing: women wait longer from first encounter to diagnostic catheterization, with no sex difference in time from cath to surgery. The downstream record looks like sex-biased surgery. The upstream record — who entered the diagnostic pipeline — is where the bias lives. This paper extends that model to population scale. Female enrollment runs the opposite Phase I → Phase III direction from race, concentrated in industry-solo trials, and the two axes are statistically independent at the trial level. Same cause, two parallel readouts — not one reinforcing race × sex effect.

Split all 77,770 trials by who wrote and ran them, and the industry × academia joint trial shows the steepest enrollment-loss gradient of all — twice as steep as industry-led trials alone. The protocol governs the demographics; academic execution under an industry-written protocol launders the protocol's structure into the results.

Combine the trial data with seven genomic-database measures into a Genomic Equity Index, and the number lands: 3.94 billion people receive precision medicine calibrated for the 11% of humanity who are European-derived. The clinical-trial gap and the genomic-database gap are not two problems. They are one governance failure, measured at two points along the drug-development pipeline.

The four invariants

The conventional reading holds that the genomic-database gap is the root cause of unequal precision medicine. That is half right: the database gap is real, but it is downstream. The four axioms below trace the failure back to the trial, and to the protocol before the trial.

The manuscript's CANON.md declares four axioms that hold across every claim in the paper.

TRIAL_PRECEDES_DATABASE. Precision medicine fails at the trial before it fails at the database. The literature has been treating reference-panel sparsity as the upstream cause of unequal genomic interpretation; the data say the opposite. The reference panel inherits its demographic shape from the trial population, and the trial population inherits its demographic shape from the protocol. Bias enters at protocol authorship, propagates through enrollment, and only later surfaces as a missing variant call. Reference-panel diversification programs that do not also fix the enrollment pipeline are treating a downstream symptom.

SPARSE_MATRIX_MEASURED. 77,770 trials compile into a 15-therapeutic-area by 5-ancestry-population matrix of evidence density. The empty cells are not statistical artifacts; they are populations that received drugs tested on someone else. For Indigenous American and Pacific Islander populations, there is not a single drug class in the matrix with adequate Phase III evidence. The sparsity is uniform across therapeutic areas, which means the failure mode is not topic-specific (oncology vs cardiology) but structural (who enters the trial pipeline).
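The shape of such a matrix can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's pipeline: the record layout `(area, population, enrolled)`, the function names, and every number in the sample data are invented for the example, and "zero enrolled participants" stands in for whatever adequacy criterion the manuscript actually applies.

```python
def build_evidence_matrix(records, areas, populations):
    """Sum enrolled participants into an area-by-population grid.

    Cells with no matching record stay at 0. A record naming an area or
    population outside the given labels raises KeyError.
    """
    matrix = {area: {pop: 0 for pop in populations} for area in areas}
    for area, population, enrolled in records:
        matrix[area][population] += enrolled
    return matrix


def empty_cells(matrix):
    """List (area, population) pairs with no enrolled participants at all."""
    return [
        (area, pop)
        for area, row in matrix.items()
        for pop, enrolled in row.items()
        if enrolled == 0
    ]


# Hypothetical labels and counts, for illustration only.
areas = ["oncology", "cardiology"]
populations = ["European-derived", "African-derived", "Pacific Islander"]
records = [
    ("oncology", "European-derived", 1200),
    ("oncology", "African-derived", 40),
    ("cardiology", "European-derived", 900),
    ("oncology", "European-derived", 300),
]

matrix = build_evidence_matrix(records, areas, populations)
print(empty_cells(matrix))
```

In this toy grid the Pacific Islander column is empty in every row, which is the pattern the axiom describes as structural rather than topic-specific.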

GENOMIC_EQUITY_INDEX. The Genomic Equity Index bridges two literatures — the clinical-trials disparity literature and the genomic-databases disparity literature — that have been published in parallel for a decade without converging on a shared metric. GEI collapses double-sparsity into one measurable quantity per population. The 10,000-iteration Dirichlet sensitivity analysis is included to demonstrate that the population rankings are robust to component-weight perturbation; the index is not a function of how you tune it.
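A weight-perturbation check of this kind can be sketched as follows. The sketch assumes the index is a weighted sum of per-population component scores, which this post does not confirm; the function names, the flat Dirichlet prior (alpha = 1), and the toy scores are all illustrative. Dirichlet weights are drawn with the standard library by normalizing independent gamma variates.

```python
import random


def sample_dirichlet(rng, k, alpha=1.0):
    """Draw one weight vector from a symmetric Dirichlet(alpha) of dimension k."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def rank_stability(scores, iterations=10_000, seed=0):
    """Fraction of random weightings that reproduce the equal-weight ranking.

    `scores` maps each population to a list of component scores
    (hypothetical inputs; all lists must have the same length).
    """
    rng = random.Random(seed)
    k = len(next(iter(scores.values())))

    def ranking(weights):
        index = {
            pop: sum(w * c for w, c in zip(weights, comps))
            for pop, comps in scores.items()
        }
        return tuple(sorted(index, key=index.get, reverse=True))

    baseline = ranking([1.0 / k] * k)
    matches = sum(
        ranking(sample_dirichlet(rng, k)) == baseline for _ in range(iterations)
    )
    return matches / iterations


# Toy component scores, invented for illustration.
toy_scores = {
    "A": [0.9, 0.8, 0.9],
    "B": [0.5, 0.4, 0.6],
    "C": [0.1, 0.2, 0.1],
}
stability = rank_stability(toy_scores)
```

A stability near 1.0 means the ordering of populations does not depend on how the components are weighted. In the toy data each population dominates the next on every component, so the ranking cannot change under any positive weighting.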

PIPELINE_1_MULTI_TRIAL. This is an original-investigation paper, not a review. It is paired with PRECISION-MEDICINE-INACCURATE (NEJM Sounding Board, editorial format) and COMMUNITY-LEARNING (JAMA Network Open, the governance proof). The three manuscripts compose one argument: the problem (CTR), the diagnosis (PMI), and the proof of the fix (CL). Each paper stands alone scientifically; the three together are the case.

What this means for the submission ladder

The medRxiv DOI lifts the Pipeline-1 preprint gate. A Pipeline-1 paper cannot be submitted to a journal before its preprint DOI posts; that constraint is now cleared. The rung-0 lede-fit gate cleared in the same window: VENUE_LEDE_FIT scored 0.6874 against a 0.60 threshold, with the abstract's first paragraph rewritten to the Nature Medicine Analysis corpus token distribution without dropping a single load-bearing claim. Both gates are green. Nature Medicine submission fires next.

The Tier-0 direct ladder runs Nature Medicine → NEJM Original Article → JAMA Original Investigation → Lancet → Lancet Oncology. Rejection at any rung advances to the next, ledgered as a state transition. The Tier-1 medRxiv-transfer ladder (The BMJ, PLOS Medicine, American Journal of Epidemiology, BMJ Global Health) is held until Tier 0 exhausts. The preprint is not a terminal state — under Pipeline 1, DOI issuance mandates the journal cycle.
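The ladder reads naturally as a small state machine. The sketch below is one way to model it, not the lab's actual tooling: the class and method names are hypothetical, the two gates are treated as preconditions for any submission, and the Tier-1 venues are assumed to be tried in the order listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

TIER_0 = [
    "Nature Medicine",
    "NEJM Original Article",
    "JAMA Original Investigation",
    "Lancet",
    "Lancet Oncology",
]
# Assumption: Tier 1 is attempted in the order the post lists it.
TIER_1 = [
    "The BMJ",
    "PLOS Medicine",
    "American Journal of Epidemiology",
    "BMJ Global Health",
]


@dataclass
class SubmissionLadder:
    preprint_doi_posted: bool
    lede_fit: float
    lede_threshold: float = 0.60
    position: int = 0
    ledger: List[Tuple[str, Optional[str]]] = field(default_factory=list)

    def gates_green(self) -> bool:
        """Both the preprint gate and the lede-fit gate must pass."""
        return self.preprint_doi_posted and self.lede_fit >= self.lede_threshold

    def current_venue(self) -> Optional[str]:
        """Venue for the active rung, or None if gated or exhausted."""
        if not self.gates_green():
            return None
        rungs = TIER_0 + TIER_1  # Tier 1 starts only after Tier 0 is exhausted
        return rungs[self.position] if self.position < len(rungs) else None

    def record_rejection(self) -> Optional[str]:
        """Ledger the rejection as a state transition and advance one rung."""
        rejected = self.current_venue()
        if rejected is None:
            raise RuntimeError("no active submission to reject")
        self.position += 1
        advanced_to = self.current_venue()
        self.ledger.append((rejected, advanced_to))
        return advanced_to


ladder = SubmissionLadder(preprint_doi_posted=True, lede_fit=0.6874)
print(ladder.current_venue())  # prints Nature Medicine
```

Concatenating the two tiers into one ordered list is what encodes "held until Tier 0 exhausts": the first Tier-1 venue is reachable only after five recorded rejections.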

Sources

| Claim | Source | Link |
| --- | --- | --- |
| medRxiv DOI 10.64898/2026.05.14.26353197 minted 2026-05-19; posting a preprint assigns a citable DOI and mandates the journal cycle for clinical work | medRxiv preprint posting policy and FAQ | medrxiv.org |
| African-derived and minority enrollment in oncology trials runs far below disease burden; FDA diversity action documents 1.6–18% minority representation | Clinical Trial Diversity in Oncology: FDA Post-Marketing Requirements, PMC | pmc.ncbi.nlm.nih.gov |
| Restrictive eligibility and late-phase trial siting exclude racial and ethnic minority patients from cancer studies | ASCO–ACCC joint statement on increasing racial and ethnic diversity in cancer clinical trials | ascopubs.org |
| The matrix of 77,770 trials and 40.8 million patients; the Genomic Equity Index; four CANON invariants | draft-v3-medrxiv.md and CANON.md § Axiom, CLINICAL-TRIALS-RACE | medrxiv.org |
| 7-author roster: Bajnath, Roach, Haraksingh, Forghani, Evans, Cyrus, Hadley | SUBMISSION-PACKAGE.md frontmatter, CLINICAL-TRIALS-RACE | |
| Rung-0 VENUE_LEDE_FIT 0.6874 / 0.60 PASS; paired manuscripts PMI and CL | CANON.md submission_ready_rung_0_passing, cross_refs: | |
