EasyAtom v4.3 — Technical Whitepaper

A Zero-Shot Algebraic Causal Engine for Drug Repurposing
EasyHelpCare LLC · CEO: Enrique Riveron · CTO: Adrian Riveron · info@easyhelpcare.com · June 2026
Abstract

EasyAtom is a 16-layer algebraic causal reasoning pipeline that generates drug repurposing hypotheses from a roughly 6.5-million-triple biomedical knowledge graph, without using drug-disease "treats" associations as input at any stage. The engine combines hyperdimensional computing, causal symbolic inference, Hamiltonian energy scoring, spectral simulation, and world-model forward chaining. The benchmark contains 22,380 known drug-disease pairs spanning 922 drugs and 108 diseases. On it, the engine achieves Recall@10 = 28.6% in a fully zero-shot inductive setting. That is about 6× the random baseline (4.7%) and comparable to supervised methods that train on drug-disease data, such as Hetionet Rephetio (~27%) and TransE (~31%), although those results were obtained on different datasets. An external evaluation on 100 unambiguously mapped pairs from the Broad Drug Repurposing Hub yields Recall@10 = 21.3%, also zero-shot. A post-corpus PubMed search found independent 2023+ evidence for 36% (9/25) of the top-25 novel candidates. The complete pipeline runs in 113 minutes on a standard desktop PC with no GPU.

1. Problem Statement

Drug repurposing means identifying new clinical indications for existing approved drugs. It reduces development cost and time by leveraging established safety profiles. The principal computational challenge is hypothesis generation: predicting which drug-disease pairs are therapeutically relevant from heterogeneous biological data, without positive labels for unseen pairs.

Existing machine learning approaches (knowledge graph embedding, GNNs) achieve Recall@10 of 31–65% but are transductive: they train on 80% of known drug-disease pairs and evaluate on the remaining 20%. They cannot generalize to novel drugs or disease contexts absent from their training set, and they produce scalar scores with no mechanistic interpretability.

2. Corpus & Data Sources

The EasyAtom corpus was frozen in 2023. It integrates five public biomedical databases:

Source | Content | Version | Contribution
DrugBank | Drug → gene target interactions, pharmacology | v5.x | Drug-gene edges (L0)
OMIM | Gene → disease Mendelian associations | 2023 | Gene-disease edges (L0)
CTD | Chemical → gene curated interactions | 2023 | Causal chemical-gene (L1)
Hetionet v1.0 | Integrated biomedical KG (11 node types) | v1.0 | PPI, pathway context (L0–L3)
STRING v11 | Protein-protein interaction network | v11 | Protein interaction (L3)

Total corpus: 2.56M triples (corpus_1M_3col.tsv) plus 3.9M derived hypotheses, about 6.5M in total. SHA-256 fingerprint: 0ff11993fb8746a9f1eb3dcf241e074c486acae5c27d77f0a4a0dd17a6fb9997. No "treats" drug-disease edges are included at any stage.
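A published fingerprint is only useful if readers can reproduce it. The minimal Python sketch below uses the standard-library hashlib module. It assumes the digest covers the single file corpus_1M_3col.tsv, which this section does not state explicitly. The helper names sha256_of_file and verify_corpus are illustrative.

```python
import hashlib

# Published digest. ASSUMPTION: it covers corpus_1M_3col.tsv alone.
EXPECTED_SHA256 = "0ff11993fb8746a9f1eb3dcf241e074c486acae5c27d77f0a4a0dd17a6fb9997"

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file through SHA-256 in 1 MiB chunks so large corpora never sit fully in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_corpus(path, expected=EXPECTED_SHA256):
    """True when the file's hex digest matches the expected value (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Usage: `verify_corpus("corpus_1M_3col.tsv")` returns True only if the local copy is byte-identical to the file that produced the published digest. Streaming in fixed-size chunks keeps memory flat regardless of corpus size.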

3. Architecture — 16-Layer Pipeline

The pipeline processes the corpus through 16 sequential algebraic layers (L0–L15). No layer is trained; all operations are deterministic algebraic transformations.
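The layer table lists the L0 operation as XOR/permutation algebra over D=1024 binary vectors. The hedged sketch below makes that concrete, using Python integers as bit vectors. The triple layout (head XOR rotate-by-1 relation XOR rotate-by-2 tail) and the helpers encode_triple and decode_tail are hypothetical illustrations, not EasyAtom's documented encoding.

```python
import random

D = 1024  # hypervector dimensionality stated for L0

def random_hv(rng):
    """Dense random binary hypervector stored as a D-bit Python int."""
    return rng.getrandbits(D)

def bind(a, b):
    """Bind with bitwise XOR; XOR is self-inverse, so bind(bind(a, b), b) == a."""
    return a ^ b

def permute(a, k=1):
    """Cyclically rotate the D bits of a to the left by k positions."""
    k %= D
    mask = (1 << D) - 1
    return ((a << k) | (a >> (D - k))) & mask

def hamming_similarity(a, b):
    """1.0 for identical vectors; about 0.5 for independent random vectors."""
    return 1.0 - bin(a ^ b).count("1") / D

# HYPOTHETICAL triple encoding: head XOR rot1(relation) XOR rot2(tail).
def encode_triple(head, relation, tail):
    return bind(bind(head, permute(relation, 1)), permute(tail, 2))

def decode_tail(triple_hv, head, relation):
    """Unbind head and relation, then undo the rotation by 2 (rotate left by D - 2)."""
    return permute(bind(bind(triple_hv, head), permute(relation, 1)), D - 2)
```

Because every step is XOR or rotation, the encoding is deterministic and exactly invertible given the other two components. Independent random vectors stay near similarity 0.5, which is what lets unrelated entities remain distinguishable.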

Layer | Operation | Output
L0 (HDC) | Hyperdimensional encoding of all entities into D=1024 binary vectors via XOR/permutation algebra | Entity vector space
L1 (Causal) | Do-calculus symbolic inference: drug→gene→disease transitive closure with confounding removal | Causal chains per pair
L2 (HAM) | Hamiltonian energy scoring via RK4 integration of a simulated quantum Hamiltonian H_D | Energy scores per pair
L3 (ATT) | Attractor condensation: fixed-point iteration on the disease state space | Disease attractors
L4 (SPE) | Born-rule spectral simulation, O(N·D²), on classical hardware | Probability amplitudes
L5 (PRI) | Causal prime factorization of gene pathways | Prime gene sets
L6 (EMB) | Semantic embedding via Jaccard similarity over gene-set overlap | Drug-disease similarity matrix
L7 (GAP) | Gap detection: pairs where a drug's known targets are linked to a disease but no drug-disease association is confirmed | 41,396 novel candidates
L8 (KO) | DWPC knockout perturbation: scores the impact of gene silencing on drug-disease paths | 6,397 candidates; 50 evaluated
L9 (INT8) | Int8 distillation into 10 domain shards for mobile deployment | 10 compressed shards
L10 (WM) | World-model forward chaining: urgency scoring via knowledge-gap propagation | Urgency-ranked candidates
L11 (COM) | Combination synergy scoring (drug cocktails) | 47 DDI-safe cocktails
L12 (REP) | Full repurposing matrix cross-product | 266,561 candidates
L13–L15 | DDI safety filter, N-of-1 protocol generation, indexing | 20 N-of-1 protocols
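Layers L6 and L7 can be illustrated together, in two steps:

1. Score each drug-disease pair by the Jaccard overlap between the drug's target genes and the disease's associated genes.
2. Keep the pairs that share genes but are not already confirmed.

This sketch assumes gene sets are held as Python sets. It uses a hypothetical known_pairs set only to exclude confirmed pairs from the output, never as a scoring input. EasyAtom's actual gap score may weight genes differently.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two gene sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def find_gaps(drug_targets, disease_genes, known_pairs, min_jaccard=0.0):
    """Return (drug, disease, score) for unconfirmed pairs that share at least
    one gene, sorted by descending Jaccard score (ties broken by name).
    known_pairs is a HYPOTHETICAL exclusion set, not a scoring input."""
    gaps = []
    for drug, targets in drug_targets.items():
        for disease, genes in disease_genes.items():
            if (drug, disease) in known_pairs:
                continue  # already confirmed, so not a gap
            if not targets & genes:
                continue  # no shared gene, so no mechanistic bridge
            score = jaccard(targets, genes)
            if score > min_jaccard:
                gaps.append((drug, disease, score))
    gaps.sort(key=lambda g: (-g[2], g[0], g[1]))
    return gaps
```

Under this standard definition, a Jaccard score of 1.00 (the value reported for loratadine in Section 6.2) means the two gene sets are identical.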

4. Benchmark Protocol

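The benchmark evaluates whether the engine can recover known drug-disease associations (from the corpus) when those associations are excluded from its inputs. This is a strict zero-shot inductive protocol. The main score is Recall@K: the fraction of known pairs whose disease the engine ranks within the top K for that drug.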

R@K = |{(drug,disease) : rank(disease|drug) ≤ K, is_known=1}| / |{known pairs}|
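The recall definition translates directly into code. This is a minimal sketch. It assumes each drug's candidate diseases arrive as a list ordered best first. The ranked_diseases mapping is an illustrative structure, not an EasyAtom interface.

```python
def recall_at_k(ranked_diseases, known_pairs, k):
    """Fraction of known (drug, disease) pairs whose disease appears in the
    top k of that drug's ranking. ranked_diseases maps drug -> list, best first."""
    if not known_pairs:
        return 0.0
    hits = 0
    for drug, disease in known_pairs:
        if disease in ranked_diseases.get(drug, [])[:k]:
            hits += 1
    return hits / len(known_pairs)
```

A drug missing from ranked_diseases contributes misses rather than raising an error. This keeps the denominator equal to all known pairs, as in the formula.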

5. Results

Metric | Value | Interpretation
Recall@1 | 4.0% | Known disease ranked #1 for that drug
Recall@5 | 17.4% | Known disease ranked in the top 5
Recall@10 | 28.6% | Main reported metric (zero-shot)
Recall@50 | 54.7% | More than half of known pairs recovered within the top 50
NDCG@10 | 0.822 | High rank quality: hits land at ranks 1–3 rather than 8–10
MRR | 0.113 | Mean reciprocal rank
Causal enrichment R₇ | 2.68× | Pairs with a causal chain are 2.68× more likely to be ranked #1
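MRR and NDCG@10 admit several conventions, and the whitepaper does not state which ones produced the figures above. This sketch assumes:

- binary relevance;
- reciprocal ranks averaged per known pair, matching the Recall@K definition in Section 4;
- NDCG averaged per drug, with a log2(rank + 1) discount.

```python
import math

def mrr(ranked_diseases, known_pairs):
    """Mean of 1/rank over known pairs; a disease absent from the ranking adds 0."""
    if not known_pairs:
        return 0.0
    total = 0.0
    for drug, disease in known_pairs:
        ranking = ranked_diseases.get(drug, [])
        if disease in ranking:
            total += 1.0 / (ranking.index(disease) + 1)
    return total / len(known_pairs)

def ndcg_at_k(ranked_diseases, known_pairs, k=10):
    """Binary-relevance NDCG@k (k >= 1), averaged over drugs with at least one known disease."""
    relevant = {}
    for drug, disease in known_pairs:
        relevant.setdefault(drug, set()).add(disease)
    scores = []
    for drug, rel in relevant.items():
        ranking = ranked_diseases.get(drug, [])[:k]
        # Position i (0-based) is rank i + 1, discounted by log2(rank + 1).
        dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranking) if d in rel)
        ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(rel))))
        scores.append(dcg / ideal)
    return sum(scores) / len(scores) if scores else 0.0
```

Because NDCG normalizes by the best achievable ordering per drug, it can sit far above Recall@10 when the hits that do occur land near the top.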

5.1 Comparison with Published Methods

Method | Recall@10 | Setting | Trains on drug-disease data?
Random baseline (our corpus) | 4.7% | Zero-shot | No
Popularity baseline (our corpus) | 11.2% | Zero-shot | No
EasyAtom v4.3 (internal corpus) | 28.6% | Zero-shot inductive | No
EasyAtom v4.3 (Broad Hub, external) | 21.3% | Zero-shot inductive | No
Hetionet Rephetio (2017) | ~27% | Supervised (different dataset) | Yes
TransE (RepoDB) | ~31% | Transductive | Yes (80% training split)
RotatE / DRKG | 38–42% | Transductive | Yes (80% training split)
CompGCN / NBFNet | 45–65% | Transductive | Yes (80% training split)

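Transductive methods are trained on the held-in portion of the same dataset they are evaluated on. EasyAtom sees no drug-disease labels at any stage. Its margin over the random baseline (28.6% vs 4.7% Recall@10, about 6×) therefore comes from the drug→gene→disease algebra alone, not from label leakage. The published figures were obtained on different datasets and splits, so the comparison is indicative rather than head-to-head.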

5.2 External Validations

Validation | Dataset | Result | Note
A (PubMed) | NCBI PubMed (post-2023) | 36% support (9/25 candidates) | Independent post-corpus evidence for novel candidates
B (Broad Hub) | Broad Repurposing Hub 2020 | 24/2,222 exact matches | Limited by text normalization; audit file public
C (Hetionet) | Hetionet v1.0 CtD edges | 75% Prec@5 (4 mapped pairs) | Low coverage expected: engine outputs novel candidates only
D (Broad Hub, mapped) | Broad Hub + INN alias table | R@10 = 21.3%, Prec@10 = 90% | 100 unambiguously mapped pairs; primary external benchmark
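Validation B shows how much raw name matching loses, and validation D recovers it with an alias table. This is a hedged sketch of that mapping step. The normalization rules and the {normalized alias: set of IDs} table layout are assumptions. Names that map to zero or several IDs are discarded, mirroring the "unambiguously mapped" criterion. normalize_name and map_to_canonical are hypothetical helpers.

```python
import re
import unicodedata

def normalize_name(name):
    """Strip accents, case-fold, and collapse every run of characters outside
    [a-z0-9] into one space (ASSUMPTION: ASCII-oriented drug names)."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.casefold())
    return text.strip()

def map_to_canonical(names, alias_table):
    """Map raw names to canonical IDs through alias_table
    ({normalized alias: set of IDs}); ambiguous or unknown names are dropped."""
    mapped = {}
    for raw in names:
        candidates = alias_table.get(normalize_name(raw), set())
        if len(candidates) == 1:
            mapped[raw] = next(iter(candidates))
    return mapped
```

Dropping ambiguous names trades coverage for precision, which is consistent with D reporting results on a smaller set of 100 pairs.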

6. Top Candidates

6.1 Platinum Standard (325 pairs)

325 drug-disease pairs satisfy all three convergence criteria simultaneously: L2 Hamiltonian top-quartile ∩ L7 gap score top-500 ∩ L10 urgency CRITICAL. These represent the highest-confidence novel repurposing hypotheses.
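The convergence filter is a set intersection. This sketch rests on two assumptions: "top quartile" means the ceil(n/4) highest L2 scores, and ties are broken by pair key for determinism. platinum_pairs and top_n are illustrative names.

```python
import math

def top_n(scores, n):
    """Keys of the n highest values; ties broken by key so results are deterministic."""
    ranked = sorted(scores, key=lambda key: (-scores[key], key))
    return set(ranked[:n])

def platinum_pairs(hamiltonian, gap_score, urgency, gap_top_n=500):
    """Intersect the L2, L7 and L10 criteria. Each argument maps
    (drug, disease) -> value; urgency values are labels such as "CRITICAL"."""
    quartile_size = math.ceil(len(hamiltonian) / 4)  # ASSUMED meaning of "top quartile"
    ham_top = top_n(hamiltonian, quartile_size)
    gap_top = top_n(gap_score, gap_top_n)
    critical = {pair for pair, level in urgency.items() if level == "CRITICAL"}
    return ham_top & gap_top & critical
```

Requiring all three independent signals to agree shrinks the candidate pool sharply. This is why the platinum set (325 pairs) is much smaller than the L7 output (41,396 candidates).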

6.2 Priority Hypothesis

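Loratadine → PDE4B → Alzheimer's Disease. Loratadine, a second-generation H1 antihistamine, shows an anomalously strong association with PDE4B (L2 Hamiltonian score = 1.460, Jaccard gene overlap = 1.00). PDE4B inhibition is a known mechanism for reducing neuroinflammation and amyloid-β accumulation. The post-2023 PubMed search found no supporting evidence, so the hypothesis appears to be novel. The drug is inexpensive, available over the counter, and has an established safety profile. As a second-generation antihistamine, however, it penetrates the blood-brain barrier only to a limited extent, so adequate CNS exposure would need to be confirmed.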

7. Limitations

8. Causal Traceability

Every EasyAtom output includes a step-by-step audit trace: drug → target gene(s) → pathway → biological process → disease. Each hop is backed by a triple from the corpus with its source database cited. The complete audit dataset is publicly available at easyatom-engine.web.app/audit/.
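An audit trace is a path search in which every edge carries its source. This sketch enumerates simple paths of bounded length with breadth-first search. The adjacency layout (node mapped to a list of (next node, relation, source) triples) is an assumption, and trace_paths and format_trace are hypothetical helpers.

```python
from collections import deque

def trace_paths(edges, start, goal, max_hops=4):
    """Enumerate simple paths start -> goal with at most max_hops edges, shortest first.
    edges maps node -> list of (next_node, relation, source); each returned path
    is a list of hops (head, relation, tail, source)."""
    results = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            results.append(path)
            continue
        if len(path) == max_hops:
            continue  # hop budget exhausted
        visited = {start} | {hop[2] for hop in path}  # keep paths simple (no revisits)
        for nxt, relation, source in edges.get(node, []):
            if nxt not in visited:
                queue.append((nxt, path + [(node, relation, nxt, source)]))
    return results

def format_trace(path):
    """Render one path as 'head → tail (relation, source)' lines."""
    return "\n".join(f"{h} → {t} ({rel}, {src})" for h, rel, t, src in path)
```

As an example, take a toy graph built from the four hops of the loratadine trace in this section, with illustrative relation labels. It yields exactly one four-hop path, and each printed line cites the source of its edge.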

Example trace for loratadine → Alzheimer's:

loratadine → HRH1 (H1 receptor antagonism, DrugBank DB00455)
HRH1 → PDE4B (co-expression, STRING v11, score=0.89)
PDE4B → cAMP signaling → neuroinflammation suppression (CTD, OMIM:104300)
neuroinflammation → Alzheimer's Disease (OMIM:104300)
L2 Hamiltonian: 1.460 | Jaccard gene overlap: 1.00 | Gap score: top-3%
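Layer L8 scores paths like this one with degree-weighted path counts (DWPC) and then knocks genes out. DWPC, as used with Hetionet, works as follows:

- Take the degrees of the nodes along each path and raise them to a negative damping exponent w.
- Multiply these to get a per-path degree product.
- Sum the products over all paths.

As a result, paths through hub nodes count for less. The sketch is simplified. It ignores edge types when counting degrees and assumes w = 0.4. It also freezes degrees at their unperturbed values during the knockout. EasyAtom's exact variant is not documented here, and knockout_impact is a hypothetical name.

```python
from collections import defaultdict

def degrees(edges):
    """Undirected degree of every node in a set of (u, v) edges (edge types ignored)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def dwpc(edges, paths, deg, w=0.4):
    """Sum of path-degree products over paths whose every hop is still in edges.
    Each hop (a, b) multiplies its path's product by (deg[a] * deg[b]) ** -w."""
    total = 0.0
    for path in paths:
        hops = list(zip(path, path[1:]))
        if not all(hop in edges for hop in hops):
            continue  # path was cut, e.g. by a knockout
        pdp = 1.0
        for a, b in hops:
            pdp *= (deg[a] * deg[b]) ** -w
        total += pdp
    return total

def knockout_impact(edges, paths, gene, w=0.4):
    """Drop in DWPC after deleting every edge touching gene. Degrees stay frozen
    at their unperturbed values, so the drop is the DWPC share of paths routed
    through gene and is zero or positive."""
    deg = degrees(edges)
    kept = {(u, v) for u, v in edges if gene not in (u, v)}
    return dwpc(edges, paths, deg, w) - dwpc(kept, paths, deg, w)
```

Recomputing degrees after the knockout is an equally defensible variant. In that case, the impact can turn slightly negative, because removing a hub also raises the weight of the surviving paths.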

9. Deployment

The pipeline produces int8-quantized knowledge shards (L9) that can be deployed on Android through a React Native module. A query takes about 40 ms on a Samsung Galaxy A16 and needs no network connection. The full C++20 pipeline runs on any x86-64 CPU with 16 GB of RAM and requires no GPU.
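The int8 shards from L9 can be illustrated with symmetric linear quantization. Each shard stores one float scale plus values rounded into [-127, 127]. This is a hedged sketch of the quantization step only. The "distillation" in L9 may involve more than rounding, and quantize_int8 and dequantize_int8 are hypothetical names.

```python
from array import array

def quantize_int8(values):
    """Symmetric quantization: scale = max|v| / 127, q = round(v / scale), clamped.
    Returns (signed-byte array, scale); an all-zero or empty input uses scale 1.0."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(v / scale))) for v in values))
    return q, scale

def dequantize_int8(q, scale):
    """Map stored int8 codes back to approximate float values."""
    return [x * scale for x in q]
```

Each value then needs one byte instead of four or eight, and the round-trip error per value is at most half the scale. That bound is what keeps ranking scores usable after compression for on-device queries.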

10. Availability