
REI · Apollo — the journey

Historical record from JOURNEY.md — Apollo web hub decommissioned 2026-06-01; model unaffected

rendered from notes/REI/JOURNEY.md

REI · Apollo — the journey, the comparison, and the state

⚠ Decommission note (2026-06-01). The Apollo Next.js web hub described in this document (the web/ tree → 8020rei-new-model.web.app) was decommissioned: web/ was removed from the repo and the 8020rei-new-model Firebase hosting site was permanently deleted (the URL now returns HTTP 404). The live model surface is now the 8020IQ Models Wiki at models-8020iq.web.app, served from platform/ as plain static HTML. This is a historical record — the Apollo model itself is unaffected; only the web hub is gone. All references below to 8020rei-new-model.web.app and the web/app/** source paths are preserved as a record.
TL;DR. Apollo is a per-county supervised classifier (HistGradientBoosting + isotonic calibration, 117 features over 4.05M parcels across 5 counties). It replaces Alpha, 8020REI's 25-signal hand-weighted heuristic, at step 4 of the Gaia ETL. As of 2026-05-08, Apollo's Lift@top-1% exceeds Alpha's by a geometric mean of 3.03× across all five counties. Across the three counties where the lift ratio is statistically distinguishable from noise (Jackson 7.87×, Harris 6.92×, Maricopa 3.43×), the geometric mean is 5.72×. For Miami (1.22× ± 0.27) and Philadelphia (1.12× ± 0.21), the 95% CI includes 1.0×. Nine of ten audit ship-blockers are closed. Scenario A (recency-feature leakage on embargoed Fold 5) is FLAG-band pending V2 ablation, and the locked March 2025 head-to-head test is gated on written sign-off from Eduardo and Camilo. State as of distillation: 2026-05-25 (REI bucket created; CallZeke moved to Roofing).

1 · The macro project

Problem being solved

8020REI is a deal-sourcing engine for small investors operating in 14 states with active county-level campaigns in five. The business runs on ranked lists of properties delivered to acquisition teams who work outreach off-market. Speed and precision matter equally: lists too broad waste acquisition bandwidth; lists that miss real opportunities cost deal flow.

Source: web/app/context/page.tsx:50-66.

The 8-week rock

Competitive build with Camilo, coached by Eduardo, weekly Thursday check-ins. Apollo is the supervised replacement for Alpha at step 4 of the Gaia 7-step ETL — the first training loop inside Gaia.

Source: web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:70-73 (PullQuote).

Win condition (locked, three bars)

All three must clear:

| Bar | Threshold | Status |
|---|---|---|
| Top-decile recall | ≥ Alpha AND ≥ Camilo on locked March 2025 cohort | Not yet scored (gated on sign-off) |
| Calibration | Within ±15% on 30/60/90-day deal-rate buckets | Achieved on 4 of 5 counties; Jackson at honest floor |
| Transferability | Per-county model trained on county-X data explains county-X outcomes | Per-county architecture validated; pooled costs 15–27% AUC-PR |

Source: web/app/decks/archive/19-current-state-2026-04-22/page.tsx:67-86; web/app/context/page.tsx:227-246.

Players

| Role | Name | Lane |
|---|---|---|
| Builder | Ignacio Araya | Apollo (DS, model, features, pipeline) |
| Competitor | Camilo | Parallel model, baseline artifact pending |
| Coach | Eduardo | Sign-off authority, P/R/F1 evaluation against client deals |
| Cadence | | Weekly Thursday check-ins |

Source: web/app/decks/archive/19-current-state-2026-04-22/page.tsx:67-86; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:194-209.

Sandbox & coverage

| Property | Value | Source |
|---|---|---|
| Active states | 14 | context/page.tsx:67-75 |
| Active counties (pilot) | 5 | context/page.tsx:67-75 |
| Sandbox time span | 2021-01 → 2025-09 (57 months) | data/page.tsx:235-239 |
| Sandbox storage | 680 GB | context/page.tsx:67-75 |
| Total parcels scored at T0=2025-09 | 4,052,593 (rounded to 4.05M; some sources give 5.17M, which includes non-residential strata) | data/page.tsx:28-34, 244-251; brief/page.tsx:62-67 |

Per-county parcels at T0=2025-09 (data/page.tsx:28-34):

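| FIPS | County | State | Parcels |
|---|---|---|---|
| 04013 | Maricopa | AZ | 1,384,985 |
| 48201 | Harris | TX | 1,226,790 |
| 12086 | Miami-Dade | FL | 782,077 |
| 42101 | Philadelphia | PA | 428,931 |
| 29095 | Jackson | MO | 229,810 |
| Total | | | 4,052,593 |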

2 · Alpha — the incumbent

What Alpha is

A weighted sum of 25 distress indicators. Weights set by hand, tuned on Miami, unchanged since launch. PreforeclosureDistress carries weight 6.0; 16 other signals trail between 0.25 and 1.0.

Source: web/app/decks/01-why-apollo/page.tsx:57-71; web/app/context/page.tsx:138-156.

How it scores

  • No training loop
  • No outcome feedback
  • No re-weighting as markets shift
  • No mechanism to explain which signal fired on a given property — score 72 is an opaque sum, not a ranked list of reasons

Source: web/app/decks/01-why-apollo/page.tsx:57-89; web/app/context/page.tsx:138-156.
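For contrast with Apollo, here is a minimal sketch of an additive, hand-weighted scorer of the kind described above. Only the PreforeclosureDistress weight (6.0) comes from this document. The other signal names and weights are hypothetical placeholders inside the documented 0.25–1.0 range, and the sketch omits the rescaling to Alpha's 0–100 output.

```python
# Hypothetical Alpha-style scorer: a fixed, hand-set weighted sum of binary
# distress flags. Only the PreforeclosureDistress weight (6.0) is from the
# source; every other name and weight here is an illustrative placeholder.
ALPHA_WEIGHTS: dict[str, float] = {
    "PreforeclosureDistress": 6.0,
    "ProbateDistress": 1.0,         # hypothetical
    "TaxDelinquentDistress": 0.5,   # hypothetical
    "VacancyDistress": 0.25,        # hypothetical
}


def alpha_raw_score(flags: dict[str, int]) -> float:
    """Sum of weight * flag over the known signals; absent flags count as 0."""
    return sum(w * flags.get(name, 0) for name, w in ALPHA_WEIGHTS.items())


# Only the total is reported, so the output alone does not say which signals
# fired, and nothing here ever updates the weights from observed outcomes.
print(alpha_raw_score({"PreforeclosureDistress": 1, "ProbateDistress": 1}))  # 7.0
```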

Where Alpha falls short (deck claims)

| Failure mode | Mechanism | Evidence |
|---|---|---|
| Frozen calibration | Static weights, last tuned 2021 | context/page.tsx:138-146 |
| Miami-tuned only | Weights don't transfer to TX/MO/AZ | context/page.tsx:148-156 |
| Cannot explain | Sum gives no per-feature attribution | context/page.tsx:148-156 |
| Wrong feature ordering | The 25 signals Alpha weights are not the 25 that matter most empirically | context/page.tsx:148-156 |
| Distress signals don't clear the bar | Distress forensics: only 3 of Alpha's signals show notable lift (Preforeclosure 5.44×, Probate 3.46×, Affidavit 2.32×), and only Preforeclosure clears 5× | decks/01-why-apollo/page.tsx:115-124 |

Why Alpha is still the baseline

  • It is the production scorer (step 4 of Gaia)
  • Apollo's win condition is defined against it ("recall ≥ Alpha")
  • The head-to-head is the gate to Phase 4

Source: web/app/decks/archive/01-macro-project/page.tsx:54-65.


3 · Apollo — the contender

What Apollo is

A supervised gradient-boosted classifier replacing Alpha (step 4 of Gaia) with: per-county HistGradientBoosting, isotonic calibration on a held-out non-downsampled slice, walk-forward folds, and a CRM-leak guard. Output contract identical to Alpha: 0–100 score per property within county.

Source: web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:54-69.

Architecture overview

INPUT  : T0 month-end silver snapshot · 481 columns
TRIAGE : 481 → 117 curated features (sparse / constant / leaky dropped)
TRAIN  : per-county HistGradientBoosting · seed=42 · early_stopping=False
         training T0 ≤ 2025-03 · CRM-leak guard drops is_crm_matched_anywindow=1
CALIB  : Isotonic regression on held-out non-downsampled slice (~60K rows/county)
RANK   : Within-county percentile of calibrated_probability_isotonic → score_0_100
AUDIT  : 69 deterministic sanity checks (monotonicity · prevalence · ECE · CRM · numeric)

Sources: web/app/decks/02-how-apollo-trains/page.tsx:56-100, 230-285; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:115-135.

Why HistGB beat the field

Architecture chosen via the 5×4 ablation matrix (5 counties × {HistGB, LightGBM, logistic, random forest}) on Fold 1. HistGB never loses by a meaningful margin and wins three counties outright.

| County | HistGB AUC-PR | LightGBM AUC-PR | Winner |
|---|---|---|---|
| Maricopa | 0.274 | 0.271 | HistGB |
| Harris | 0.192 | 0.186 | HistGB |
| Jackson | 0.166 | 0.151 | HistGB (+10.3%) |
| Miami | tie (Δ<0.002) | tie (Δ<0.002) | tie |
| Philadelphia | tie (Δ<0.002) | tie (Δ<0.002) | tie |

Logistic regression loses 50–72% of LightGBM's AUC-PR (Philly: 0.194 → 0.055; Harris: 0.186 → 0.089). The problem is non-linear. Tree splits on tenure curves, leverage×valuation interactions, and distress-trajectory families earn their keep. Random forest trails both GBMs everywhere.

Source: web/app/decks/04-where-it-wins/page.tsx:206-249.

Pooled rejected — per-county wins

Cross-county transfer (Harris→Maricopa) scores AUC-PR 0.201, against 0.274 for the native Maricopa model: a 27% drop. Finding 11 measured a 15–27% AUC-PR cost on cross-county transfer. The production configuration is therefore five separate HistGB models, each with its own isotonic calibration.

Source: web/app/decks/04-where-it-wins/page.tsx:235-244; web/app/brief/page.tsx:109-116.
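The AUC-PR costs quoted in this section are relative drops measured against the stronger reference. The short check below recomputes them from the rounded values above. The helper name is hypothetical.

```python
def relative_drop(reference: float, other: float) -> float:
    """Fraction of the reference AUC-PR lost by the other model or setup."""
    return 1.0 - other / reference


cases = {
    "Philadelphia, logistic vs LightGBM": (0.194, 0.055),
    "Harris, logistic vs LightGBM": (0.186, 0.089),
    "Harris->Maricopa transfer vs native": (0.274, 0.201),
}
for name, (reference, other) in cases.items():
    print(f"{name}: {relative_drop(reference, other):.1%}")
# Prints roughly 71.6%, 52.2% and 26.6% respectively.
```

The rounded inputs reproduce the 72% upper bound and the 27% transfer cost. The Harris logistic drop lands near 52%, inside the quoted 50–72% range.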

Feature tiers (per CLAUDE.md §Data conventions, mirrored in data/page.tsx:47-82)

| Tier | Name | Examples | Note |
|---|---|---|---|
| A | Property physical | parcel size, building area, living area, year built, use type | Most stable; in Miami, property_age alone = 92.7% of importance |
| B | Owner + distress | 23 distress trajectories, absentee level, leverage ratio, days of ownership | Information-dense; 3 signals under leakage audit |
| C | Valuation + activity | AVM, assessed value, market value, appreciation rate, valuation gap | Valuation-gap feature broken at data layer (V2 repair queued) |
| D | Date-derived | mortgage_age_months, listing_duration_months, months_since_prev_sale | Under active leakage audit; AUC-PR contribution not validated until ablation completes |
| E | National macro (FRED) | 30-yr mortgage rate, Fed funds, HPI, CPI, unemployment | Zero within-T0 variance: every parcel in a snapshot shares the same value, so these cannot rank parcels on their own. V2.1 interaction features unlock cohort signal |
| F | Local market context | BLS county unemployment, ACS county median income, FHFA state HPI | Currently being wired in |

Counts of source columns

  • Silver carries 481 columns per row (First American provider + 8020REI distress trajectories + ETL metadata)
  • Two-reviewer triage: 117 included, 359 excluded (sparse >70% null, constant, leaky, redundant)
  • 77% of included columns have meanings sourced directly from the First American data dictionary (8 dictionaries, 984 provider-authoritative defs)
  • 25 hand-engineered synthetic features; eight of the top 15 importance slots are occupied by synthetics

Sources: web/app/decks/02-how-apollo-trains/page.tsx:112-129, 200-217; web/app/data/page.tsx:316-322.
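The triage rules listed above can be read as an ordered filter over per-column statistics. The minimal sketch below assumes those statistics are precomputed. The ColumnStats type and the order of the checks are hypothetical. The leaky and redundant flags stand in for what was a two-reviewer judgment, not an automatic test.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    name: str
    null_fraction: float          # share of rows that are null
    n_distinct: int               # distinct non-null values
    is_leaky: bool = False        # reviewer judgment (hypothetical flag)
    is_redundant: bool = False    # reviewer judgment (hypothetical flag)


def triage(col: ColumnStats, max_null: float = 0.70) -> tuple[bool, str]:
    """Return (keep, reason) using the four documented exclusion reasons."""
    if col.null_fraction > max_null:
        return False, "sparse"
    if col.n_distinct <= 1:
        return False, "constant"
    if col.is_leaky:
        return False, "leaky"
    if col.is_redundant:
        return False, "redundant"
    return True, "included"


# vacant_flag is 99.2% null per the dead-column audit, so it is dropped as sparse.
print(triage(ColumnStats("vacant_flag", 0.992, 2)))  # (False, 'sparse')
```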

Training method

  • Six expanding walk-forward folds. Fold 1 trains 15 months. Fold 6 trains 45 months. Each subsequent fold absorbs the previous eval window.
  • Horizon = 6 months. Train on history up to T0, predict on properties observed at T0, and score on outcomes recorded in T0+1..T0+6.
  • Embargo = 1 month. The eval window is shifted past the prediction horizon, and properties sold inside the embargo are dropped from both train and eval. This closes the gap where a property listed at T0 and sold at T0+1 could carry signal into training while its outcome is visible.
  • T0 anchor. The feature builder reads only as-of-T0 columns via base_globs = _globs([t0], fips). No future data crosses the boundary.
  • Training T0 cap = 2025-03. The six-month horizon of the last training T0 ends 2025-09, which is the inference T0. The prediction window opens in 2025-10, so training labels and the prediction window do not overlap.

Sources: web/app/decks/02-how-apollo-trains/page.tsx:56-100; web/app/data/page.tsx:444-499.

CRM-leak guard

Properties that 8020REI had already worked through its CRM are dropped via is_crm_matched_anywindow = 1. Not down-weighted, not isolated — dropped.

  • 4,431 CRM deals → 2,463 silver-matched after address join
  • All 2,463 carry the flag and never enter training
  • Verified as one of 69 deterministic sanity checks every run

Sources: web/app/data/page.tsx:444-499; web/app/decks/02-how-apollo-trains/page.tsx:241-249.
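The guard and its sanity check can be sketched as one filter plus one assertion-style check. The sketch assumes rows are dicts and treats a missing flag as unmatched. Both choices, and the function names, are illustrative.

```python
CRM_FLAG = "is_crm_matched_anywindow"


def apply_crm_leak_guard(rows: list[dict]) -> list[dict]:
    """Drop (not down-weight, not isolate) every row already worked in the CRM."""
    return [r for r in rows if r.get(CRM_FLAG, 0) != 1]


def check_no_crm_rows(training_rows: list[dict]) -> None:
    """Sanity check on the final training set: zero CRM-matched rows."""
    leaked = sum(1 for r in training_rows if r.get(CRM_FLAG, 0) == 1)
    if leaked:
        raise ValueError(f"{leaked} CRM-matched rows reached training")


rows = [
    {"property_id": 1, CRM_FLAG: 1},
    {"property_id": 2, CRM_FLAG: 0},
    {"property_id": 3},  # flag absent: treated as unmatched (assumption)
]
train = apply_crm_leak_guard(rows)
check_no_crm_rows(train)
print([r["property_id"] for r in train])  # [2, 3]
```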

Other safety nets

  • The feature cache cuts per-period feature building from 30s to 0.02s (1,500×), which makes iterative training practical
  • 69/69 deterministic sanity checks pass on every run before any ZIP ships (monotonicity, prevalence stability, ECE, CRM, numeric integrity)
  • Test suite: 0.41s, 5 categories (ZIP validator, cohort map, score formula, prefix collision, filter behavior)
  • A 186.7M-row overnight audit retired 28 dead columns. Examples include stories (sentinel code 100), is_listed (binarizer bug, always 0 despite 1.66M "Y" rows), vacant_flag (99.2% null), and 7 distress trajectories with max=0.0 across all 186M rows

Sources: web/app/decks/02-how-apollo-trains/page.tsx:252-275; web/app/brief/page.tsx:232-249.


4 · Apollo vs Alpha — head-to-head numbers

Locked evaluation window

Fold 5 embargoed: train 2021-01..2024-03, eval 2024-10..2025-03, residential-wide (SFH + Condo + Townhouse + 2-9 units). Same window for Alpha and Apollo — apples-to-apples.

Source: web/app/decks/archive/20-executive-submission/page.tsx:93-99.

Headline metrics — per county (Fold 5 embargoed)

| County | FIPS | Apollo Lift@1% | Alpha Lift@1% | Lift ratio | 95% CI half-width | Stat-sig vs 1.0× | AUC-ROC | Deck source |
|---|---|---|---|---|---|---|---|---|
| Jackson | 29095 | 15.36× | 1.95× | 7.87× | ±0.23 | YES | 0.76 | brief/page.tsx:138-143 |
| Harris | 48201 | 13.54× | 1.96× | 6.92× | ±0.11 | YES | 0.82 | brief/page.tsx:138-143 |
| Maricopa | 04013 | 16.76× | 4.88× | 3.43× | ±0.21 | YES | 0.83 | brief/page.tsx:138-143 |
| Miami | 12086 | 10.09× | 8.30× | 1.22× | ±0.27 | NO | 0.69 | brief/page.tsx:138-143 |
| Philadelphia | 42101 | 2.42× | 2.16× | 1.12× | ±0.21 | NO | 0.66 | brief/page.tsx:138-143 |

Citations: web/app/decks/04-where-it-wins/page.tsx:67-94, 152-160; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:83-105.
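Lift@1% and the lift ratio in the table follow from their definitions. The sketch below assumes the standard reading: the deal rate in the top 1% of a county's rows, divided by that county's overall deal rate. Rounding k up to at least one row and the tie handling are choices made here, not documented ones. The table is consistent with ratio = Apollo lift / Alpha lift; for example, Maricopa gives 16.76 / 4.88 ≈ 3.43.

```python
import math


def lift_at_top(scores: list[float], labels: list[int], frac: float = 0.01) -> float:
    """Deal rate in the top `frac` of rows by score, divided by the overall deal rate."""
    n = len(scores)
    k = max(1, math.ceil(n * frac))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    top_rate = sum(labels[i] for i in top) / k
    base_rate = sum(labels) / n
    return top_rate / base_rate


def lift_ratio(apollo: list[float], alpha: list[float], labels: list[int],
               frac: float = 0.01) -> float:
    """Head-to-head: Apollo's Lift@frac over Alpha's, on the same rows and labels."""
    return lift_at_top(apollo, labels, frac) / lift_at_top(alpha, labels, frac)


# Toy county: 200 rows, 4 deals, so the top 1% is 2 rows and the base rate is 2%.
labels = [1] * 4 + [0] * 196
apollo = [200.0 - i for i in range(200)]  # ranks all four deals first
alpha = list(apollo)
alpha[10] = 1000.0                        # Alpha puts a non-deal at the very top
print(lift_at_top(apollo, labels), lift_at_top(alpha, labels))  # 50.0 25.0
print(lift_ratio(apollo, alpha, labels))                        # 2.0
```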

Dual-geomean framing (non-negotiable comms rule)

| Geomean | Value | Use case |
|---|---|---|
| All five counties | 3.03× | The honest all-markets headline; ships with the deliverable |
| Signal-three (Jackson, Harris, Maricopa) | 5.72× | Where Apollo clearly separates from Alpha |
"Both numbers travel together, or neither does." — findings/41_alpha_head_to_head.md, quoted in brief/page.tsx:69-72, decks/05-the-submission/page.tsx:132-136.

Computed: (3.43 × 6.92 × 7.87)^(1/3) = 5.72× (Deck 24 mathematical re-audit).
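Both geomeans and the stat-sig column can be reproduced from the per-county table. The sketch assumes a lift ratio counts as distinguishable from noise when its 95% CI (ratio ± half-width) excludes 1.0×. That reading matches the YES/NO column and the TL;DR statement about Miami and Philadelphia.

```python
import math

# (lift ratio, 95% CI half-width) per county, from the Fold 5 embargoed table
RATIOS = {
    "Jackson": (7.87, 0.23),
    "Harris": (6.92, 0.11),
    "Maricopa": (3.43, 0.21),
    "Miami": (1.22, 0.27),
    "Philadelphia": (1.12, 0.21),
}


def geomean(values) -> float:
    """Geometric mean computed as exp(mean(log v))."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def excludes_one(ratio: float, half_width: float) -> bool:
    """True when the interval ratio ± half_width does not contain 1.0."""
    return ratio - half_width > 1.0 or ratio + half_width < 1.0


signal = [c for c, (r, hw) in RATIOS.items() if excludes_one(r, hw)]
all_five = geomean(r for r, _ in RATIOS.values())
signal_three = geomean(RATIOS[c][0] for c in signal)
print(signal)                                     # ['Jackson', 'Harris', 'Maricopa']
print(f"{all_five:.2f}x / {signal_three:.2f}x")  # 3.03x / 5.72x
```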

Calibration (Fold 5 embargoed, T0=2025-09 inference)

| County | Raw BSS | Isotonic BSS | ECE top-10% reduction | Verdict |
|---|---|---|---|---|
| Miami | −0.0008 | +0.0011 | 87% | First positive BSS in project |
| Maricopa | not in source | positive | not in source | Positive BSS post-isotonic |
| Harris | not in source | positive | 95% | Positive BSS post-isotonic |
| Philadelphia | not in source | positive | in 69–95% band | Positive BSS post-isotonic |
| Jackson | −0.0041 | −0.0003 | in 69–95% band | Honest floor (not a pass) |

Sources: web/app/decks/05-the-submission/page.tsx:160-175; web/app/decks/archive/20-executive-submission/page.tsx:143-157.

Top-decile ECE improvement: 69–95% across all five counties (web/app/context/page.tsx:251-258).
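For reference, here is a minimal sketch of the two calibration metrics. The source does not state the definitions used, so two assumptions apply. First, the Brier skill score uses a constant base-rate forecast as its reference. Second, ECE uses ten equal-width probability bins weighted by row count. The tables above report ECE on the top decile, which would mean filtering to the highest-scored 10% of rows before calling the function.

```python
def brier(probs: list[float], labels: list[int]) -> float:
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def brier_skill_score(probs: list[float], labels: list[int]) -> float:
    """1 - Brier / Brier of a constant base-rate forecast; > 0 beats the base rate."""
    base = sum(labels) / len(labels)
    reference = base * (1 - base)  # Brier score of always predicting `base`
    return 1.0 - brier(probs, labels) / reference


def expected_calibration_error(probs: list[float], labels: list[int],
                               n_bins: int = 10) -> float:
    """Row-weighted mean |mean predicted - observed rate| over equal-width bins."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            ece += len(b) / len(labels) * abs(mean_p - rate)
    return ece
```

Under this definition, a BSS just below zero (like Jackson's −0.0003) means the calibrated probabilities score marginally worse than always predicting the county base rate.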

Fold 1 Miami baseline (the deck that opened the project)

| Metric | Apollo | Alpha | Notes |
|---|---|---|---|
| AUC-PR | 0.259 | 0.030 | 8.7× ratio |
| Precision@top-1% | 48.6% | 6.4% | 7.6× ratio |
| Recall@top-10% | 53.8% | 16.7% | 3.2× ratio |
| Brier score | 0.023 | | Inside 0.025 calibration target |

Source: web/app/decks/archive/05-fold1-vs-alpha/page.tsx:58-83.

Caveat on Fold 1 Miami: measured pre-embargo; the legacy "33×" claim that appeared in early decks came from a window with 5-month label overlap, since closed. The Fold 5 embargoed Miami ratio is 1.22× — much narrower. See web/app/decks/archive/20-executive-submission/page.tsx:101-105.

Top-5 SHAP gain features on Fold 1 Miami

| Rank | Feature | SHAP gain | Origin |
|---|---|---|---|
| 1 | days_ownership | 3,146 | engineered |
| 2 | lot_size_sqft | 2,581 | raw → synthetic coalesce |
| 3 | property_age_years | 2,546 | engineered (synthetic from YearBuilt) |
| 4 | assd_total_value | 2,100 | raw provider |
| 5 | market_total_value | 1,900 | raw provider |

13 of top 25 by SHAP gain are engineered, not raw.

Source: web/app/decks/archive/05-fold1-vs-alpha/page.tsx:148-159.

Property age dominance — Miami vs others (finding 54 stratified ablation)

| Age band | Eval rows | Deal rate | Within-band AUC | Δ vs full (0.6942) | Lift@1% |
|---|---|---|---|---|---|
| < 20 yr (post-2005) | 756,783 | 0.0003 | 0.6068 | −0.0874 | 2.87× |
| 20–50 yr (1976–2005) | 1,855,499 | 0.0004 | 0.6359 | −0.0583 | 10.80× |
| ≥ 50 yr (pre-1976) | 2,023,471 | 0.0014 | 0.6767 | −0.0175 | 7.29× |

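Verdict: BUY-BOX. Inside each age band, AUC falls by 0.0175–0.0874 relative to the full-population 0.6942, so much of the model's ranking power comes from separating age bands rather than from ranking within them. The 4.7× deal-rate spread across bands (0.0003 → 0.0014) is structural population separation, not within-band motivation.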

Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:213-243.

property_age_years alone explains 92.71% of feature importance in Miami; Gini coefficient 0.94; 69× dominance gap over the second feature. In Maricopa / Philly / Harris / Jackson the top-feature ratio is only 1.02×–1.11× — Miami is structurally a different model.

Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:203-207, 272-277.
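Finding 54's stratified check works in three steps: compute AUC on the full population, recompute it inside each age band, and report the change. Below is a pure-Python sketch. The pairwise AUC is quadratic and only suitable for illustration; millions of eval rows would need a rank-based AUC. The band edges follow the table above, and the function names are hypothetical.

```python
from collections import defaultdict


def auc(scores: list[float], labels: list[int]) -> float:
    """P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def age_band(age_years: float) -> str:
    """Band edges from the finding 54 table: <20, 20-50 and >=50 years."""
    if age_years < 20:
        return "<20"
    if age_years < 50:
        return "20-50"
    return ">=50"


def within_band_auc(ages, scores, labels) -> dict[str, tuple[float, float]]:
    """Per-band AUC and its change against the full-population AUC."""
    full = auc(scores, labels)
    groups: dict[str, tuple[list, list]] = defaultdict(lambda: ([], []))
    for a, s, y in zip(ages, scores, labels):
        band_scores, band_labels = groups[age_band(a)]
        band_scores.append(s)
        band_labels.append(y)
    result = {}
    for band, (band_scores, band_labels) in groups.items():
        band_auc = auc(band_scores, band_labels)
        result[band] = (band_auc, band_auc - full)
    return result
```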


5 · The data backbone

The sandbox

  • 14 states, 680 GB of monthly snapshots
  • 57 month-end snapshots covering 2021-01 → 2025-09
  • 4.05M total scored parcels at T0=2025-09
  • 5 active counties (pilot)

Source: web/app/data/page.tsx:235-251.

T0 conventions

  • T0 = month-end timestamp, stored YYYY-MM string
  • Features computed as-of T0 month-end (_month_end in src/new_model/features.py)
  • Horizon: 6 months; y_sold = 1 iff any sale recorded in T0+1..T0+6
  • FIPS always 5-digit zero-padded string. f"{fips:05d}" in Python; string-type in CSV/JSON

Source: web/app/data/page.tsx:293-298.
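The sketch below implements these conventions in plain Python, assuming sales are recorded as 'YYYY-MM' strings. The helper names are hypothetical and are not the actual _month_end in src/new_model/features.py.

```python
from __future__ import annotations

import calendar
from datetime import date


def month_end(t0: str) -> date:
    """'YYYY-MM' -> last calendar day of that month, the as-of date for features."""
    year, month = map(int, t0.split("-"))
    return date(year, month, calendar.monthrange(year, month)[1])


def month_index(ym: str) -> int:
    """Months since year 0, so month arithmetic is plain integer subtraction."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def y_sold(t0: str, sale_months: list[str], horizon: int = 6) -> int:
    """1 iff any recorded sale falls in T0+1 .. T0+horizon (inclusive)."""
    start = month_index(t0)
    return int(any(1 <= month_index(m) - start <= horizon for m in sale_months))


def fips5(fips: int | str) -> str:
    """FIPS as a 5-digit zero-padded string, e.g. 4013 -> '04013'."""
    return f"{int(fips):05d}"
```

A sale in the T0 month itself does not count, and neither does one in T0+7. Only T0+1 through T0+6 set the label.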

BuildZoom permit refresh

| Snapshot date | Cohort permits | S3 size | Verdict |
|---|---|---|---|
| 2026-04-28 | 64,513 | 32 MB | "Structural data ceiling" (finding 52) |
| 2026-05-07 | 15,645,153 (242× growth) | 15 GB / 2,851 part-files | Finding 52 obsolete |

Per-county coverage at the 2026-05-07 refresh (data/page.tsx:394-425):

| FIPS | County | Silver props | Lifetime permits | Props w/ permit | Coverage | Recent 24m |
|---|---|---|---|---|---|---|
| 29095 | Jackson MO | 304,044 | 1,392,278 | 270,759 | 89.1% | 137,470 |
| 12086 | Miami-Dade FL | 924,426 | 4,723,912 | 597,111 | 64.6% | 319,663 |
| 48201 | Harris TX | 1,592,524 | 6,004,159 | 861,039 | 54.1% | 423,912 |
| 04013 | Maricopa AZ | 1,701,793 | 2,550,776 | 764,265 | 44.9% | 312,165 |
| 42101 | Philadelphia PA | 588,987 | 974,028 | 232,378 | 39.5% | 112,816 |
| | 5-county cohort | 5,111,774 | 15,645,153 | 2,725,552 | 53.3% | 1,306,026 |

S3 prefix: s3://8020rei-sandbox/ignacio_sandbox_roofing/. Jackson's coverage lead is not permit density — it's the smallest silver universe, so a moderate permit count saturates it.

Sources: web/app/data/page.tsx:362-434; web/app/decks/06-the-audits/page.tsx:248-265.

FIPS-86052 bug (fixed)

ZIP 86052 (Page, AZ — 270 miles from Maricopa core) was classified under FIPS 04013. 1,754 Maricopa + 1 Miami + 7 Harris = 1,762 mis-FIPS'd rows. Consumer-side filter shipped in src/new_model/feature_cache.py plus new module src/new_model/ref/zip_fips_validation.py. Post-filter Maricopa frame at T0=2025-03 has zero rows with ZIP 86052.

Source: web/app/decks/06-the-audits/page.tsx:255-265.
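A consumer-side filter of this kind can be sketched as an allow-list check at read time. The sketch makes three assumptions. Rows are dicts with fips and zip5 keys. Counties missing from the map pass through unchanged. The sample Maricopa ZIPs are illustrative and are not the contents of zip_fips_validation.py.

```python
# Hypothetical allow-list: county FIPS -> ZIPs accepted for that county. The
# real src/new_model/ref/zip_fips_validation.py is not reproduced here, and
# these Maricopa ZIPs are a small illustrative sample.
VALID_ZIPS: dict[str, set[str]] = {
    "04013": {"85004", "85251", "85281"},
}


def filter_mis_fipsed(rows: list[dict], valid_zips: dict[str, set[str]] = VALID_ZIPS):
    """Split rows into (kept, dropped); drop a row whose ZIP is not valid for its FIPS.

    Counties missing from the allow-list pass through unchanged.
    """
    kept, dropped = [], []
    for r in rows:
        allowed = valid_zips.get(r["fips"])
        if allowed is not None and r["zip5"] not in allowed:
            dropped.append(r)
        else:
            kept.append(r)
    return kept, dropped


rows = [
    {"fips": "04013", "zip5": "85004"},
    {"fips": "04013", "zip5": "86052"},  # the mis-FIPS'd ZIP from the bug report
    {"fips": "12086", "zip5": "33101"},  # county not in the allow-list: kept
]
kept, dropped = filter_mis_fipsed(rows)
print(len(kept), [r["zip5"] for r in dropped])  # 2 ['86052']
```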

Deliverable schema (Ranked CSV + Sidecar)

Deliverable ZIP: scored_properties_2026-05-07.zip — 63 MB compressed, 577 MB uncompressed, 4,052,593 rows across 5 county-scoped CSVs plus cross-county calibration sidecar (14 columns × 5 rows: AUC-ROC per county, Lift@1%, lift ratio with 95% CI, stat_significant_lift flag, empirical deal rates at top 1%/5%/10%).

score_0_100 is within-county percentile of raw_probability (or calibrated_probability_isotonic); cross-county percentile comparison is NOT meaningful — use sidecar for cross-county base rates.

Sources: web/app/data/page.tsx:509-557; web/app/brief/page.tsx:62-67, 322-344.

Cross-county comparability gap (sidecar fix)

A "score 99" Jackson property has 6.29% expected deal rate; a "score 99" Philly property has 0.62%. 10.10× gap. This is why the sidecar exists.

Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:159-171.

Deal labels (Oracle v1.2)

| Property | Value |
|---|---|
| Transactions screened | 419,669 |
| Labeled deals | 36,648 (8.73%) |
| Aggregate deal rate | 8.78% |
| Maricopa deal rate | 1.94% |
| Jackson deal rate | 14.84% |
| CRM deals in scope | 4,431 |
| Silver-matched CRM deals | 2,463 |

Five-criterion AND-rule: DOCTYPE_CLEAN ∧ FLAG_CLEAN ∧ SELLER_CLEAN ∧ BUYER_INVESTOR ∧ PRICE_GATE. The C5c NDS fallback (TX/MO non-disclosure states) admits 62% of deals with zero price verification. This is the largest acknowledged structural gap and is flagged in every downstream deck.

Sources: web/app/context/page.tsx:111-119; web/app/decks/archive/19-current-state-2026-04-22/page.tsx:132-140.
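The labeling rule is a strict conjunction, sketched below. How each criterion is evaluated, including the C5c NDS fallback, is not described in this document, so the sketch takes the five results as precomputed booleans. The type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OracleChecks:
    """Result of each Oracle v1.2 criterion for one transaction (inputs assumed)."""
    doctype_clean: bool
    flag_clean: bool
    seller_clean: bool
    buyer_investor: bool
    price_gate: bool


def is_deal(c: OracleChecks) -> bool:
    """Five-criterion AND-rule: a single failing criterion rejects the transaction."""
    return all((c.doctype_clean, c.flag_clean, c.seller_clean,
                c.buyer_investor, c.price_gate))


# Labeled share from the table above: 36,648 deals out of 419,669 screened.
print(f"{36_648 / 419_669:.2%}")  # 8.73%
```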


6 · How Apollo trains

End-to-end pipeline: data → features → folds → train → calibrate → audit.

Stage 1 — Silver materialisation

S3 silver parquet, monthly snapshots 2021-01..2025-09, 481 columns per row. FIPS-86052 consumer-side filter at read time (post-2026-05-07).

Stage 2 — Feature builder (T0 month-end)

src/new_model/features.py. Reads base_globs = _globs([t0], fips) — only as-of-T0 columns. 25 hand-engineered synthetics layered on top:

  • Coalesce with provenance: 3 leverage cols (CLBTV, CLTV, LTV) → canonical leverage_ratio + companion audit tag
  • Semantic derivation: 3-tier absentee level vs binary flag
  • Temporal construction: YearBuilt → property_age_years recomputed per snapshot, clipped to [0, 200]

Sources: web/app/decks/02-how-apollo-trains/page.tsx:200-223.
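Two of the three synthetic patterns are easy to sketch. The preference order CLBTV → CLTV → LTV, the use of None for missing values, and the row-as-dict representation are illustrative assumptions, not taken from features.py. The sketch omits the 3-tier absentee derivation because its tiers are not described here.

```python
from __future__ import annotations

LEVERAGE_PRIORITY = ("CLBTV", "CLTV", "LTV")  # assumed order of preference


def coalesce_leverage(row: dict) -> tuple[float | None, str | None]:
    """Canonical leverage_ratio plus a companion tag naming the column it came from."""
    for col in LEVERAGE_PRIORITY:
        value = row.get(col)
        if value is not None:
            return float(value), col
    return None, None


def property_age_years(year_built: int | None, snapshot_year: int) -> float | None:
    """Age recomputed for each snapshot from YearBuilt and clipped to [0, 200]."""
    if year_built is None:
        return None
    return float(min(max(snapshot_year - year_built, 0), 200))


print(coalesce_leverage({"CLBTV": None, "CLTV": 0.8, "LTV": 0.7}))  # (0.8, 'CLTV')
print(property_age_years(1700, 2025))                              # 200.0
```

Keeping the source-column tag next to the coalesced value is what lets a later audit tell whether a leverage effect comes from one provider field in particular.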

Stage 3 — Cache + ZIP/FIPS filter

1,500× speedup (30s → 0.02s per period). Never rsync --delete over data/cache/. Mini compute syncs via ./scripts/mini.sh sync-cache (one-way merge).

Stage 4 — Oracle label join (v1.2)

Y_deal label joined on (normalized_address, zip5, fips5). 99.6% address match rate on Maricopa validation sample. CRM-leak guard drops is_crm_matched_anywindow=1 rows entirely.
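Below is a sketch of the exact-key label join. It assumes addresses are already normalized upstream and that properties without a matching deal get Y_deal = 0. attach_labels_sketch is a hypothetical stand-in, not the attach_y_deal function named in the audit list. Measuring the match rate as the share of deal records that find a property row is also an assumption.

```python
JOIN_KEY = ("normalized_address", "zip5", "fips5")


def join_key(row: dict) -> tuple:
    """The three-part key used for the Oracle label join."""
    return tuple(row[k] for k in JOIN_KEY)


def attach_labels_sketch(properties: list[dict], deals: list[dict]) -> list[dict]:
    """Add Y_deal = 1 when a property's key matches an Oracle deal, else 0."""
    deal_keys = {join_key(d) for d in deals}
    return [{**p, "Y_deal": int(join_key(p) in deal_keys)} for p in properties]


def match_rate(deals: list[dict], properties: list[dict]) -> float:
    """Share of deal records whose key finds a property row."""
    property_keys = {join_key(p) for p in properties}
    return sum(join_key(d) in property_keys for d in deals) / len(deals)


props = [
    {"normalized_address": "1 MAIN ST", "zip5": "85004", "fips5": "04013"},
    {"normalized_address": "2 OAK AVE", "zip5": "85004", "fips5": "04013"},
]
deals = [
    {"normalized_address": "1 MAIN ST", "zip5": "85004", "fips5": "04013"},
    {"normalized_address": "9 ELM RD", "zip5": "85251", "fips5": "04013"},
]
print([p["Y_deal"] for p in attach_labels_sketch(props, deals)])  # [1, 0]
print(match_rate(deals, props))                                   # 0.5
```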

Stage 5 — Walk-forward training (6 folds)

| Fold | Train range | Eval range | Notes |
|---|---|---|---|
| 1 | 2021-01..2022-03 (15 mo) | 2022-04..2022-09 + embargo | Macro regime: rate-hiking onset |
| 2 | + Fold 1 eval | 2022-10..2023-03 + embargo | |
| 3 | + Fold 2 eval | 2023-04..2023-09 + embargo | |
| 4 | + Fold 3 eval | 2023-10..2024-03 + embargo | |
| 5 | 2021-01..2024-03 (39 mo) | 2024-10..2025-03 + embargo | The v8 fix shifted Fold 5 eval from 2024-04..09 (v7 had a 5-month label-window overlap inflating AUC to 0.843; honest AUC 0.694). Embargo permanently sealed by default. |
| 6 | 2021-01..2024-09 (45 mo) | 2025-04..2025-09 + embargo | Most recent pre-test |

Sources: web/app/decks/02-how-apollo-trains/page.tsx:62-100; web/app/decks/archive/19-current-state-2026-04-22/page.tsx:152-179.
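The fold schedule can be generated rather than hard-coded. The sketch below rests on stated assumptions. Its train ranges reproduce the table (15, 39 and 45 months for Folds 1, 5 and 6). Each eval window starts the month after the last training T0's 6-month label window closes, which reproduces the Fold 5 and Fold 6 rows. The table lists the Folds 1–4 eval windows without that shift, so applying it there is an assumption of this sketch, as are the helper names. The sketch does not model the 1-month embargo's row-level drop.

```python
from dataclasses import dataclass


def month_index(ym: str) -> int:
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def add_months(ym: str, k: int) -> str:
    """Shift a 'YYYY-MM' string by k months."""
    idx = month_index(ym) + k
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


@dataclass(frozen=True)
class Fold:
    number: int
    train_start: str
    train_end: str
    eval_start: str
    eval_end: str

    @property
    def train_months(self) -> int:
        return month_index(self.train_end) - month_index(self.train_start) + 1


def expanding_folds(start: str = "2021-01", first_train_end: str = "2022-03",
                    n_folds: int = 6, step: int = 6, horizon: int = 6) -> list[Fold]:
    """Expanding walk-forward folds.

    Every fold trains from `start`; the train end advances `step` months per
    fold; evaluation starts the month after the last training T0's
    `horizon`-month label window closes and lasts `step` months.
    """
    folds = []
    for i in range(n_folds):
        train_end = add_months(first_train_end, i * step)
        eval_start = add_months(train_end, horizon + 1)
        eval_end = add_months(eval_start, step - 1)
        folds.append(Fold(i + 1, start, train_end, eval_start, eval_end))
    return folds


for f in expanding_folds():
    print(f"Fold {f.number}: train {f.train_start}..{f.train_end} ({f.train_months} mo), "
          f"eval {f.eval_start}..{f.eval_end}")
```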

Stage 6 — Per-county HistGB

5 separate models. Deterministic: early_stopping=False, seed=42. scripts/train_model.py writes serialized artifacts to models/<FIPS>/v8/; manifest.feature_cache_version asserted at score-time. Old generate_final_ranked_list.py deleted (2026-05-07 audit fix).

Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:289-304.
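The score-time guard can be sketched as a manifest comparison. The models/<FIPS>/v8/ layout and the feature_cache_version field come from this document. The manifest.json file name, its flat JSON shape, and the helper names are assumptions. The parameter names in HISTGB_PARAMS are real scikit-learn HistGradientBoostingClassifier arguments, but the sketch does not show training itself.

```python
import json
from pathlib import Path

# Determinism settings named in this document. In a real run these would be
# passed to sklearn.ensemble.HistGradientBoostingClassifier(**HISTGB_PARAMS);
# training is not shown here.
HISTGB_PARAMS = {"random_state": 42, "early_stopping": False}


def model_dir(fips: str, version: str = "v8") -> Path:
    """Artifact directory layout described above: models/<FIPS>/v8/."""
    return Path("models") / fips / version


def load_manifest(directory: Path) -> dict:
    """Read a manifest stored as JSON next to the model (file name assumed)."""
    return json.loads((directory / "manifest.json").read_text())


def assert_cache_version(manifest: dict, current_cache_version: str) -> None:
    """Refuse to score with a feature cache other than the one used in training."""
    trained = manifest.get("feature_cache_version")
    if trained != current_cache_version:
        raise RuntimeError(
            f"model trained on feature cache {trained!r}, "
            f"but scoring would use {current_cache_version!r}"
        )
```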

Stage 7 — Isotonic calibration

Training universe split: 49 fit months (downsampled 10:1) + 2 held-out calibration months (non-downsampled, ~60K rows/county). Fit HistGB; predict on held-out slice; fit IsotonicRegression(p → y); apply at T0=2025-09 inference. Output carries 4–11 distinct probability tiers per county — use for threshold bands, not as continuous discriminator.

Sources: web/app/decks/archive/20-executive-submission/page.tsx:135-158; web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:115-135.
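The calibration step fits a monotone map from raw HistGB probability to observed deal rate on the held-out slice. Calibrating on non-downsampled rows is what lets the output be read as an empirical deal rate despite the 10:1 downsampling used for fitting. Below is a pure-Python sketch of pool-adjacent-violators (PAV), the algorithm behind isotonic regression. scikit-learn's IsotonicRegression.predict interpolates linearly between fitted points. This version instead returns a step function, which makes the small number of distinct tiers easy to see. The function names are hypothetical.

```python
from collections import defaultdict


def isotonic_fit(raw_probs: list[float], labels: list[int]) -> list[tuple[float, float]]:
    """Pool-adjacent-violators fit of label ~ raw probability.

    Returns (upper raw probability, calibrated rate) steps whose rates are
    non-decreasing. Identical raw probabilities are pooled first.
    """
    totals: dict[float, list[float]] = defaultdict(lambda: [0.0, 0.0])
    for p, y in zip(raw_probs, labels):
        totals[p][0] += y
        totals[p][1] += 1
    blocks: list[list[float]] = []  # each block: [sum of labels, count, max raw p]
    for p in sorted(totals):
        s, n = totals[p]
        blocks.append([s, n, p])
        # Merge backwards while a block's rate is below its predecessor's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s_last, n_last, p_last = blocks.pop()
            blocks[-1][0] += s_last
            blocks[-1][1] += n_last
            blocks[-1][2] = p_last
    return [(p_max, s / n) for s, n, p_max in blocks]


def isotonic_predict(steps: list[tuple[float, float]], p: float) -> float:
    """Step-function lookup, clipped to the first and last fitted rates."""
    for p_max, rate in steps:
        if p <= p_max:
            return rate
    return steps[-1][1]


steps = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5], [0, 1, 0, 1, 1])
print(steps)  # [(0.1, 0.0), (0.3, 0.5), (0.4, 1.0), (0.5, 1.0)]
```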

Stage 8 — Score + rank → CSV + ZIP

Within-county percentile rank of calibrated_probability_isotonic → score_0_100. Monotonicity invariant: score_0_100 is strictly monotone with raw_probability within county (asserted on output).
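Below is a sketch of the ranking step and its invariant. It rests on two assumptions the source leaves open. First, rows are ranked on raw_probability. The isotonic map is non-decreasing in it, so this orders rows the same way while also separating rows that share a calibrated tier. Second, the percentile is scaled linearly so the lowest row in a county gets 0 and the highest gets 100.

```python
from collections import defaultdict


def assign_scores(rows: list[dict]) -> list[dict]:
    """Write a within-county 0-100 percentile of raw_probability to each row."""
    by_county: dict[str, list[dict]] = defaultdict(list)
    for r in rows:
        by_county[r["fips"]].append(r)
    for county_rows in by_county.values():
        county_rows.sort(key=lambda r: r["raw_probability"])
        n = len(county_rows)
        for rank, r in enumerate(county_rows):
            # Lowest row in the county gets 0, highest gets 100.
            r["score_0_100"] = 100.0 if n == 1 else 100.0 * rank / (n - 1)
    return rows


def monotone_within_county(rows: list[dict]) -> bool:
    """Invariant: higher raw_probability implies a strictly higher score in a county."""
    by_county: dict[str, list[tuple[float, float]]] = defaultdict(list)
    for r in rows:
        by_county[r["fips"]].append((r["raw_probability"], r["score_0_100"]))
    for pairs in by_county.values():
        pairs.sort()
        for (p0, s0), (p1, s1) in zip(pairs, pairs[1:]):
            if p1 > p0 and not s1 > s0:
                return False
    return True
```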

69/69 sanity checks

| Category | Examples |
|---|---|
| Monotonicity | Older properties trend toward higher sale rates up to a structural ceiling; reversals flag leakage candidates |
| Per-county prevalence | Eval-window sale rate within ±10% of training prevalence across folds |
| Calibration error | Within ±15% on 30/60/90-day deal-rate buckets after isotonic |
| CRM-leak | Zero rows where is_crm_matched_anywindow=1 reach training |
| Numerical | No NaN in score_0_100; no within-county duplicates; FIPS always 5-digit zero-padded |

Source: web/app/brief/page.tsx:232-249.
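Two of the check families above can be sketched as plain functions. The sketch makes two assumptions. The ±10% prevalence band is relative to the training prevalence, and property_id is a hypothetical row identifier used for the duplicate check.

```python
import math


def prevalence_stable(eval_rate: float, train_rate: float, tol: float = 0.10) -> bool:
    """Eval-window sale rate within ±tol of training prevalence (relative tolerance assumed)."""
    return abs(eval_rate - train_rate) <= tol * train_rate


def numeric_checks(rows: list[dict]) -> list[str]:
    """Names of failed numerical checks; an empty list means all passed."""
    failures = []
    if any(r["score_0_100"] is None or math.isnan(r["score_0_100"]) for r in rows):
        failures.append("nan_score")
    keys = [(r["fips"], r["property_id"]) for r in rows]
    if len(keys) != len(set(keys)):
        failures.append("within_county_duplicate")
    if any(not (isinstance(r["fips"], str) and len(r["fips"]) == 5 and r["fips"].isdigit())
           for r in rows):
        failures.append("fips_not_5_digit_string")
    return failures
```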


7 · The buy-box

Apollo identifies who fits the buy-box. It says nothing about motivation. Pairing Apollo (buy-box) with V2 motivation signals (probate fix, foreclosure oracle, valuation gap) closes the loop.

Source: web/app/decks/03-the-buy-box/page.tsx:31-37.

Three families define the box

Physical (web/app/decks/03-the-buy-box/page.tsx:54-72):

  • property_age_years — 92.7% of importance in Miami; structural age, deferred maintenance, equity gaps
  • year_built — raw construction year (used directly in non-Miami counties where weight distributes more evenly)
  • building_area_sqft / living_area_sqft / lot_size_sqft — size thresholds define sub-market (small-footprint rowhouses Philly; condo towers Miami; sprawling lots Maricopa)

Location (web/app/decks/03-the-buy-box/page.tsx:75-92):

  • situs_zip5 — top-5 ZIPs capture 39% of Philly deals, 41% of Jackson, 13% of Maricopa. Geographic micro-concentration is the signal
  • County prevalence: Maricopa 1.94% vs Jackson 14.84% (≈7.6× spread). A pooled model washes out market-specific signal
  • BuildZoom permit density — renovation activity in ZIP predicts demand + pricing

Ownership (web/app/decks/03-the-buy-box/page.tsx:95-110):

  • days_ownership (rank 1 globally): owners 7–12 years into ownership are the band statistically most likely to sell; recent buyers are near zero
  • owner_occupancy — absentee owners exit at higher rates with less friction; top-8 in Harris and Jackson
  • mortgage_age_months — refi-or-sell decisions when rates shift. Under active leakage audit (finding 09)
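The ZIP micro-concentration figures above can be computed as a top-k share. The sketch assumes "top-5 ZIPs" means the five ZIPs with the most deals in that county. The ZIP list is made-up illustration, not project data.

```python
from collections import Counter


def top_zip_share(deal_zips: list[str], k: int = 5) -> float:
    """Share of a county's deals located in its k ZIPs with the most deals."""
    counts = Counter(deal_zips)
    return sum(n for _, n in counts.most_common(k)) / len(deal_zips)


# Illustrative only: 13 deals spread over 7 ZIPs.
zips = ["19134"] * 5 + ["19140"] * 3 + ["19121", "19132", "19133", "19143", "19144"]
print(round(top_zip_share(zips), 3))  # 0.846 (11 of 13 deals)
```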

Top-15 feature breakdown by category

| Category | Count | Source |
|---|---|---|
| Buy-Box (physical / location) | 6 | decks/archive/22-ceo-summary-2026-04-27/page.tsx:169-172 |
| Deal-Motivation (distress / activity) | 5 | same |
| Hybrid | 3 | same |
| Ambiguous | 1 | same |

Camilo's critique "Buy Box matters more than Likely Deal Score" is quantitatively validated by this breakdown.

Investor identity bound

Target buyer: small operator (portfolio < 10 properties, holding periods < 2 yr, acquiring at a price below the market-value estimate). Large institutional buyers (iBuyers, SFR REITs) are explicitly out of scope. In 2024, small investors accounted for 60–90% of investor-purchase flow nationally, a share that is growing as institutions become net sellers.

Target volume: ~0.46% of housing units/yr ≈ 670K client-like investor purchases nationally, ~180K recoverable across 5-county × 8-yr training window.

Source: web/app/context/page.tsx:99-119.

Three broken motivation signals (V2 territory)

| Signal | Defect | Fix |
|---|---|---|
| ProbateDistress_active | Probate dates NULL in 4 of 5 counties (data layer). Fires #3 in Miami only. Upstream ETL over-fires on partial string match | Tighten flag predicate to court-record document types only. Bronze-side, 1-day fix |
| PreforeclosureDistress_active / foreclosure trajectories | Not in top-30 anywhere. Oracle rule C2 excludes REO acquisitions at discount ratios < 0.85, exactly the transactions wholesalers target. Rule is backwards | Correct rule C2 to include the 3,261 entity-buyer REO acquisitions at ratio < 0.85 |
| valuation_gap | Constant 1.0 in PA and TX (normalization defect); ~20× in AZ (Save-Our-Homes equivalent). Non-discriminating in 2 of 5 markets | HPI-adjusted replacement; rebuild from raw assessment rolls with county-specific refresh calendars |

Sources: web/app/decks/03-the-buy-box/page.tsx:208-239; web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:246-258.

Three "surprises" surfaced by business-sense audit

| Feature | Rank | Why surprising |
|---|---|---|
| bathrooms | #3 in Jackson at 0.0342 | Higher than any distress feature anywhere. KC metro is 1-bath bungalows (investor rental) vs 2+ bath (owner-occupied). Buy-box proxy disguised as physical feature |
| TaxDelinquentDistress_months_active | #18–25 globally | Top wholesale signal in practice but ranks low. Annual assessment cycle creates near-degenerate distribution (Miami p10=p50=p90=27 months); tree-split utility collapses on constant data |
| property_age_years (Miami) | #1 at 0.0844, 70× over #2 | In Maricopa/Philly/Harris/Jackson top ratio is only 1.02×–1.11×. Miami is structurally a different model |

Source: web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:260-277.


8 · Where Apollo wins

Quantitative summary (lifted from Section 4):

| Tier | Counties | Geomean lift ratio |
|---|---|---|
| Signal-three | Jackson, Harris, Maricopa | 5.72× |
| All five | Signal-three + Miami + Philadelphia | 3.03× |

Why Miami flat

Alpha's Miami baseline lift = 8.30× (Alpha was originally tuned for Miami). Apollo's Miami lift = 10.09×, ratio 1.22× ± 0.27. The CI is wide because the base rate is already high. The Fold 1 Miami "8.7× AUC-PR" result that opened the project was measured on a different metric (AUC-PR not Lift@1% ratio) and on a single pre-embargo fold; the multi-fold embargoed evaluation showed the narrower gap.

Source: web/app/decks/04-where-it-wins/page.tsx:130-141.

Why Philly flat

AUC-ROC 0.66 is the lowest in the portfolio. Apollo model lift is 2.42× against an Alpha baseline of 2.16×, a ratio of 1.12× ± 0.21. Philadelphia has high sale prevalence (2.56%), a Northeast row-house ownership structure, and a judicial foreclosure cycle unlike Sunbelt markets. The feature stack transfers, but the signal-to-noise environment is tighter. Before the embargo expansion, the 482 available positives were below the 1K threshold. Apollo's 62.6% Townhouse composition in Philadelphia (only 5% SFH) was masked while the model was SFH-only.

Sources: web/app/decks/04-where-it-wins/page.tsx:143-150; web/app/decks/archive/20-executive-submission/page.tsx:107-119.

The honest framing

Apollo is a buy-box model that has proven itself in 3 of 5 markets. In the remaining 2, Alpha is competitive enough that Apollo does not statistically dominate at the top of the list. That does not prevent Apollo from being useful — AUC-ROC scores (Miami 0.69, Philly 0.66) indicate meaningful ranking discrimination across the full distribution. It does mean the Lift@1% ratio headline should not be cited without the noise-band disclosure.

Source: web/app/decks/04-where-it-wins/page.tsx:167-175.


9 · The submission

Deliverable artifact

| Property | Value |
|---|---|
| File | scored_properties_2026-05-07.zip |
| Compressed | 63 MB |
| Uncompressed | 577 MB |
| Rows | 4,052,593 (5 ranked CSVs + 1 sidecar) |
| Inference T0 | 2025-09 |
| Prediction window | Oct 2025 – Mar 2026 |
| meta.json | Embeds oracle sha256, feature cache version, train/calibration windows |

Dual-size cut-off (judging flexibility)

| Pack | Rows | Size | Optimised for |
|---|---|---|---|
| Top-1,000 per county | 5,000 | 660 KB | Precision@K, Lift@K, operational wholesale lists |
| Top-50K per county | 250,000 | 31 MB | F1@K, Recall@K |
| TOP_1000_PER_COUNTY/ folder | 5,000 across 5 files | | Split-by-county convenience for judges |
| head_to_head_by_county.csv | 5 rows × 26 cols | | Per-county metrics (AUC, BSS, ECE, Lift, Recall) |

Source: web/app/decks/archive/20-executive-submission/page.tsx:172-202.
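Both packs are per-county top-K cuts of the same ranked output. The sketch below uses heapq.nlargest and assumes rows carry fips, score_0_100 and raw_probability, with raw_probability breaking score ties. With five counties, k=1_000 yields the 5,000-row pack and k=50_000 the 250,000-row pack.

```python
import heapq
from collections import defaultdict


def top_k_per_county(rows: list[dict], k: int) -> dict[str, list[dict]]:
    """The k highest-scoring rows per county, best first."""
    by_county: dict[str, list[dict]] = defaultdict(list)
    for r in rows:
        by_county[r["fips"]].append(r)
    return {
        fips: heapq.nlargest(k, county_rows,
                             key=lambda r: (r["score_0_100"], r["raw_probability"]))
        for fips, county_rows in by_county.items()
    }
```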

How to use the output

  • score_0_100 — within-county percentile of calibrated probability. Use for intra-top-100 ordering. NOT comparable across counties.
  • calibrated_probability_isotonic — empirical deal rate (4–11 distinct tiers per county). Use for threshold bands.
  • cross_county_calibration_2026-05-07.csv — per-county prevalence, lift ratios w/ 95% CI, stat_significant_lift flag, expected deal rate at top 1%/5%/10%.

Sources: web/app/decks/05-the-submission/page.tsx:175-179; web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:161-171.

Top calibrated probability examples

  • Miami #1 row: calibrated_probability_isotonic = 0.579 (58% deal probability)
  • Philly top-10 global dominated by Philly rows at probability 0.50–1.00 (isotonic calibration ceiling — optics note, not a bug)

Sources: web/app/decks/archive/20-executive-submission/page.tsx:153-157, 114-119.


10 · The audits

Three audits ran between 2026-04-23 and 2026-05-07. Together they closed 9 of 10 ship-blockers. Item #10 (Scenario A) is FLAG, not failed.

Audit comparison

| Audit | Date | Lens | Checks | Verdict | Key finding |
|---|---|---|---|---|---|
| Triple-Critic | 2026-04-23 | CRM leak · oracle proxy · prediction window · use_type filter · placebo · data integrity | 10 questions | 4 FAIL→FIXED · 2 CAVEATED · 2 PASS | CRM rows in training, proxy features in model, window off by one month; all three fixed before submission |
| Permit Density | 2026-05-07 | BuildZoom S3 coverage · per-county permit density · ZIP/FIPS integrity | 5 counties · 999 ZIPs | FINDING-52 OBSOLETE · FIPS BUG FIXED | 64K → 15.6M permits (242×) · finding-52 data ceiling closed · 1,762 mis-FIPS rows fixed |
| Scientific Re-Audit | 2026-05-07 | Mathematical · Business-sense · Pipeline structural (3 parallel specialist agents, no cross-coordination) | 36 checks across 3 agents | 24 PASS · 8 FLAG · 4 FAIL · 9 of 10 closed | Pipeline holds · geomean 3.03× · Miami/Philly within noise · model is a Buy-Box classifier |

Source: web/app/decks/06-the-audits/page.tsx:48-72.

Scientific re-audit scorecard (decks/archive/24-scientific-audit-2026-05-07/page.tsx:91-127)

| Lens | Checks | PASS | FLAG | FAIL | Key finding |
|---|---|---|---|---|---|
| Mathematical | 14 | 10 | 1 | 3 | 2 stat-sig fails (Miami · Philly) · 1 ECE undocumented · 1 cross-county comparability structural |
| Business-sense | 14 | 8 | 5 | 1 | property_age PASS→FLAG · 3 broken motivation signals · model is Buy-Box |
| Pipeline structural | 8 stages | 6 | 2 | 0 | 2 P0 fragility risks · 0 automated tests before audit · 2 canonical scoring scripts coexisted |
| Combined | 36 | 24 | 8 | 4 | 67% PASS · research-quality · ship with caveats |

Ten ship-blockers — action list

| # | Item | Status | Note |
|---|---|---|---|
| 1 | Exclude CRM-matched rows from training | CLOSED | 4,442 rows dropped · attach_y_deal(exclude_crm=True) |
| 2 | Remove 5 oracle-proxy features (PROXY_DROPS: cash_buyer_flag, is_distress_deed, +3) | CLOSED | Re-run: 103 clean columns · no oracle-input detected |
| 3 | Fix prediction window label: Oct 2025–Mar 2026 (was off by one month) | CLOSED | All artifacts corrected |
| 4 | Document --use-types default = expanded residential set (SFH + Condo + Townhouse + Duplex + Triplex + Quadruplex + 5-9 units) | CLOSED | |
| 5 | Add stat-sig caveat · dual geomean (5/5 = 3.03× · 3/5 = 5.72×) | CLOSED | Miami + Philly within noise of 1.0×; disclosed in deck 22 + sidecar CSV |
| 6 | Serialize model · train_model.py + score_model.py · seed=42 · early_stopping=False | CLOSED | Deterministic · manifest.feature_cache_version asserted at score-time |
| 7 | Eliminate hardcoded paths in features.py:688 + macro.py:48 | CLOSED | Path(__file__).resolve().parents[2] |
| 8 | Add monotonicity invariant to sanity_check | CLOSED | score_0_100 strictly monotone with raw_probability within county |
| 9 | Ship test baseline · tests/test_features.py · 5 checks | CLOSED | 0.41s · ZIP validator · cohort map · score formula · prefix collision · filter |
| 10 ⚠ | Scenario A leakage ablation on embargoed Fold 5 (Miami) | FLAG | ΔAUC-ROC −0.0270 vs −0.0033 pre-embargo · 8× larger drop · not broken but FLAG-band · not a pass |

Source: web/app/decks/06-the-audits/page.tsx:84-95.
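Ship-blocker #8 ties score_0_100 to raw_probability within each county. The sketch below shows one way to check that invariant. It makes two assumptions. First, score_0_100 is taken to be a mid-rank percentile of calibrated_probability_isotonic, because the glossary says only "within-county percentile rank". Second, it checks non-decreasing order rather than strictly increasing order, because isotonic tiers create ties. The row layout, EXAMPLE_ROWS, and both function names are hypothetical; this is not the project's sanity_check code.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def score_0_100(rows):
    """Hypothetical: within-county mid-rank percentile of calibrated_probability_isotonic."""
    by_county = defaultdict(list)
    for i, row in enumerate(rows):
        by_county[row["fips"]].append(i)
    scores = [0.0] * len(rows)
    for idx in by_county.values():
        probs = sorted(rows[i]["calibrated_probability_isotonic"] for i in idx)
        n = len(probs)
        for i in idx:
            p = rows[i]["calibrated_probability_isotonic"]
            below = bisect_left(probs, p)          # parcels strictly below p
            ties = bisect_right(probs, p) - below  # parcels sharing p's isotonic tier
            scores[i] = 100.0 * (below + 0.5 * ties) / n
    return scores


def check_monotone(rows, scores):
    """True iff, within every county, the score never drops as raw_probability rises."""
    by_county = defaultdict(list)
    for row, s in zip(rows, scores):
        by_county[row["fips"]].append((row["raw_probability"], s))
    for pairs in by_county.values():
        pairs.sort(key=lambda t: t[0])
        if any(a[1] > b[1] for a, b in zip(pairs, pairs[1:])):
            return False
    return True


# Toy rows; FIPS stay 5-digit zero-padded strings ("04013", not 4013).
EXAMPLE_ROWS = [
    {"fips": "12086", "raw_probability": 0.10, "calibrated_probability_isotonic": 0.02},
    {"fips": "12086", "raw_probability": 0.40, "calibrated_probability_isotonic": 0.05},
    {"fips": "12086", "raw_probability": 0.80, "calibrated_probability_isotonic": 0.05},
    {"fips": "12086", "raw_probability": 0.90, "calibrated_probability_isotonic": 0.11},
    {"fips": "04013", "raw_probability": 0.30, "calibrated_probability_isotonic": 0.03},
    {"fips": "04013", "raw_probability": 0.60, "calibrated_probability_isotonic": 0.09},
]
```

Isotonic calibration is monotone in the raw score, and any percentile of the calibrated value inherits that order. A failure here would therefore point to a join or county-partition bug rather than a modelling issue.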

Scenario A — the one open flag

  • Test: drop listing_duration_months, months_since_prev_sale, mortgage_age_months on embargoed Fold 5 Miami
  • Pre-embargo (finding 31): ΔAUC-ROC −0.0033 (within noise)
  • Embargoed: ΔAUC-ROC −0.0270 (8× larger)
  • Verdict: the model is not "broken" with the features in, but it may perform worse than finding 31 suggested when they are ablated. The Eduardo + Camilo head-to-head uses the full model output, not the ablated one. The flag sits on the research trail, not on the submission artifact.
  • Resolution: targeted bronze-ingest probate-date fix + clean Scenario A re-run on all five counties with embargoed window. ~1 day of compute. Scoped for V2.

Sources: web/app/decks/06-the-audits/page.tsx:289-328; web/app/brief/page.tsx:250-258.
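Scenario A is reported as a change in AUC-ROC when the three recency features are dropped. Below is a minimal stdlib sketch of that bookkeeping. It assumes ΔAUC-ROC means ablated minus full, so a negative value means the features carried signal. The real runner retrains the model via scripts/train_fold.py with EXTRA_DROP_COLS; this sketch instead compares two precomputed score lists. The pairwise AUC here is O(n²), which is fine for illustration but not for 4M parcels.

```python
from itertools import product


def auc_roc(y, scores):
    """AUC-ROC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y, scores) if t == 1]
    neg = [s for t, s in zip(y, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))


def ablation_delta(y, full_scores, ablated_scores):
    """Delta AUC-ROC = ablated minus full; negative means the dropped features carried signal."""
    return auc_roc(y, ablated_scores) - auc_roc(y, full_scores)


# Published Scenario A deltas (Fold 5, Miami).
PRE_EMBARGO_DELTA = -0.0033
EMBARGOED_DELTA = -0.0270
DROP_RATIO = EMBARGOED_DELTA / PRE_EMBARGO_DELTA  # about 8.2, quoted as "8x larger"
```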


11 · Methodology evolution (timeline)

Chronological milestones from archive decks 01 → 24:

| Date | Milestone | Source |
|---|---|---|
| Project kickoff | Macro project brief: replace Alpha with a calibrated, transferable, explainable ranker. 8-week rock vs Camilo, coached by Eduardo, with weekly Thursday check-ins | decks/archive/01-macro-project |
| Phase 1 | Column inventory, foreclosure law validation, distress trajectory audit, 25 synthetic features, external data caching | decks/archive/01-macro-project/page.tsx:181-186 |
| Phase 2 | Six walk-forward folds across five counties | same |
| Phase 3 | Architecture sweep: HistGB vs LightGBM vs logistic vs random forest (5×4 = 20 cells); HistGB wins | decks/archive/07-arch-ablation |
| ~2026-04 | Fold 1 Miami head-to-head: Apollo 8.7× AUC-PR over Alpha (0.259 vs 0.030), Brier 0.023 inside target | decks/archive/05-fold1-vs-alpha |
| ~2026-04 | Spatial expansion: Fold 1 across all five counties | decks/archive/06-spatial-expansion |
| 2026-04-20 | Architecture ablation matrix verdict: HistGB ships as default | decks/archive/07-arch-ablation |
| 2026-04-20 | Fold-by-fold results: 25-cell matrix, 5 county trajectories | decks/archive/08-fold-results |
| 2026-04-21 | Investor criteria · 6-box specification · deal oracle v1.1 (decks/archive/11-investor-criteria). 4,431 CRM deals as ground truth; 5-step deal-discovery pipeline | decks/archive/12-deal-discovery |
| 2026-04-21 | Identification criteria V2 · EXCLUDE + VALIDATE rule library · LIFT methodology | decks/archive/13-identification-criteria |
| 2026-04-22 | v8 fix shipped: Fold 5 eval shifted from 2024-04..09 to 2024-10..2025-03. The v7 AUC inflation (0.843 → 0.694 honest) was caused by a 5-month label-window overlap, now sealed by the embargo default. 17 unit-test assertions PASS | decks/archive/19-current-state-2026-04-22/page.tsx:152-179 |
| 2026-04-22 | Current State deck 19 prepared for the Thursday Eduardo + Camilo check-in: research-ready with caveats | decks/archive/19-current-state-2026-04-22 |
| 2026-04-23 | Triple-Critic audit: 4 FAIL→FIXED · 2 CAVEATED · 2 PASS. CRM leak, oracle proxies, and prediction window all fixed in the same session | decks/archive/21-triple-audit-2026-04-23 |
| 2026-04-23 | Executive submission deck 20: 3.03× geomean, 5K + 250K cut-off variants, calibration P1 solved | decks/archive/20-executive-submission |
| 2026-04-23 | V2 overnight report · oracle v1.1 · five-stream brief | decks/archive/18-v2-overnight-report |
| 2026-04-27 | CEO summary deck 22 (one-page brief, 5 questions/5 answers) | decks/archive/22-ceo-summary-2026-04-27 |
| 2026-05-07 | BuildZoom refresh: 64,513 → 15,645,153 permits (242×, 15 GB, 2,851 part-files). Finding 52's "structural data ceiling" verdict obsolete | decks/archive/23-permit-data-density-2026-05-07 |
| 2026-05-07 | FIPS-86052 fix: 1,762 mis-FIPS'd rows filtered out (ZIP 86052 = Page, AZ, 270 mi from Maricopa core) | decks/06-the-audits/page.tsx:255-265 |
| 2026-05-07 | Scientific Re-Audit (3 parallel agents: math, business-sense, pipeline). 36 checks · 24 PASS · 8 FLAG · 4 FAIL. 9 of 10 ship-blockers closed the same day. Verdict: SHIP-WITH-SHARPER-CAVEATS | decks/archive/24-scientific-audit-2026-05-07 |
| 2026-05-07 | P0 risks fixed: model serialization (train_model.py + score_model.py, seed=42, deterministic), hardcoded paths replaced (Path(__file__).resolve().parents[2]), monotonicity invariant + 5-check pytest baseline shipped | same |
| 2026-05-07 | Finding 54: stratified ablation confirms property_age = cross-band separator, not within-band motivation. Apollo is a Buy-Box classifier | same |
| 2026-05-07 | Finding 55: Scenario A re-run on embargoed Fold 5 Miami returns FLAG (ΔAUC-ROC −0.0270 vs −0.0033 pre-embargo, 8× larger) | same |
| 2026-05-07 | Deliverable scored_properties_2026-05-07.zip shipped: 4.05M rows, 63 MB | decks/05-the-submission, brief |
| 2026-05-08 | Brief / Context / Data / Decks 01–06 published as the live hub at 8020rei-new-model.web.app (this is the source distilled here) | brief/page.tsx:30, data/page.tsx:198, decks/04-where-it-wins/page.tsx:28 |

12 · Current state

What's shipped

  • scored_properties_2026-05-07.zip — 4,052,593 properties across 5 counties (Miami-Dade, Maricopa, Philadelphia, Harris, Jackson), 63 MB compressed, 577 MB uncompressed
  • Five ranked CSVs + cross-county calibration sidecar
  • Per-county HistGB + isotonic calibration models serialized at models/<FIPS>/v8/ (deterministic, seed=42)
  • 117 features (from a 481-column universe), 25 synthetics, 6 tiers
  • 69/69 deterministic sanity checks pass; 0.41s pytest baseline
  • 9 of 10 audit ship-blockers closed
  • Hub deployed at 8020rei-new-model.web.app (Next.js 15 + Tailwind v4, brand-token-synced from BigQuery, paths in web/app/)
  • Cross-county comparability addressed via sidecar CSV

What's open

One audit flag:

  • Scenario A leakage ablation on embargoed Fold 5 (Miami): ΔAUC-ROC −0.0270 vs −0.0033 pre-embargo. It does not block the shipped deliverable (the full model output is unchanged), but it prevents a "10 of 10" sign-off. Scoped for V2.

Three features under leakage audit (per CLAUDE.md hard rule #7):

  • listing_duration_months
  • months_since_prev_sale
  • mortgage_age_months

Until ablation completes, these features' AUC-PR contribution is NOT cited as validated.

External gates (Eduardo + Camilo):

  • Locked March 2025 head-to-head test: written sign-off required on universe, cut-off K, scoring metric
  • Camilo's baseline artifact: needs his top-N list on the same eval cohort (currently only Alpha measured)
  • Eduardo's P/R/F1 evaluation covers three targets: client deals, market deals (sold), and market deals at a discount. Eduardo has access to the post-Oct silver layer; Apollo's role is shipping the list (done)
  • Alpha sunset timeline depends on the locked March 2025 test gate opening

Known blockers / structural gaps:

  • C5c NDS fallback — 62% of oracle deals (TX/MO non-disclosure states) have zero price verification. Acknowledged P1 gap, flagged in every downstream deck.
  • Probate dates NULL in 4 of 5 counties — bronze-ingest fix, ~1 day. Highest-leverage single improvement to motivation signal.
  • Foreclosure oracle rule C2 backwards — currently excludes REO acquisitions at discount ratios < 0.85, which is exactly what wholesalers target. 3,261 entity-buyer rows to recover.
  • Valuation gap constant 1.0 in PA and TX — non-discriminating in 2 of 5 markets. HPI-adjusted replacement is V3 backlog.
  • Silver universe saturated: 8 feature-addition experiments produced zero AUC gains. Next tier of gain requires fresh data sources — MLS DOM, permits (now refreshed), skip-trace, rent rolls.
  • Arms-length filter intentionally OFF: foreclosures, quit-claims, probate transfers, and divorce sales all count as y_sold=1. A V2 second pass follows once Eduardo signs off on the policy definition. The impact on thin-positive counties such as Philly (2.56% prevalence) is unknown.

Sources: web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:190-227; web/app/decks/05-the-submission/page.tsx:255-303; web/app/decks/archive/20-executive-submission/page.tsx:212-241.

Calibration / leakage audit status

| Item | Status |
|---|---|
| Isotonic calibration | LIVE; 4 of 5 counties Brier-positive; Jackson at honest floor (−0.0003) |
| ECE top-decile reduction | 69–95% across all 5 counties |
| Calibration target ±15% on 30/60/90-day | Met |
| 3-feature leakage audit | Pending Scenario A V2 ablation |
| CRM-leak guard | ENFORCED on every run, verified as 1 of 69 sanity checks |
| Walk-forward embargo | ENFORCED structurally; v8 default since 2026-04-22 |
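The calibration rows above rest on three scores defined in the glossary: Brier, BSS, and ECE. The stdlib sketch below computes all three under two assumptions: ECE uses equal-width probability bins, and the BSS reference is a constant base-rate forecast. The project's actual reference forecast is not stated here. The project's top-decile ECE would apply the same computation only to the highest-scored decile.

```python
def brier(p, y):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)


def brier_skill_score(p, y, p_ref):
    """1 - Brier(model) / Brier(reference); positive means better than the reference."""
    return 1.0 - brier(p, y) / brier(p_ref, y)


def ece(p, y, n_bins=10):
    """Expected calibration error: bin-size-weighted |mean predicted - observed rate|."""
    bins = [[] for _ in range(n_bins)]
    for pi, yi in zip(p, y):
        bins[min(int(pi * n_bins), n_bins - 1)].append((pi, yi))
    total = 0.0
    for b in bins:
        if b:
            mean_p = sum(pi for pi, _ in b) / len(b)
            rate = sum(yi for _, yi in b) / len(b)
            total += len(b) / len(y) * abs(mean_p - rate)
    return total


# Toy forecast and outcomes; BASE_RATE is the assumed constant reference forecast.
PROBS = [0.1, 0.1, 0.9, 0.9]
OUTCOMES = [0, 0, 1, 1]
BASE_RATE = [sum(OUTCOMES) / len(OUTCOMES)] * len(OUTCOMES)
```

Against the glossary's Brier target of ≤ 0.025, the toy forecast scores 0.01. Real county numbers depend on prevalence, which is why BSS (skill over a reference) is reported alongside raw Brier.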

V2 roadmap (the contender's next phase)

  1. Sharpen the dependent variable: from "any sale" → "client-like investor purchase" (4,431 CRM deals as tightened ground truth)
  2. Three motivation signal repairs: probate ingest (bronze), foreclosure oracle C2 correction, valuation gap HPI-adjusted normalization
  3. Scenario A clean ablation on all five counties with embargoed window
  4. Arms-length filter second pass once policy signed off

V3 horizon (paradigm shifts, documented but out of V1 scope)

  • Survival analysis
  • Uplift modelling
  • Computer vision on property imagery
  • Open public data at national scale

Source: web/app/decks/01-why-apollo/page.tsx:170-178.


13 · Glossary

| Term | Definition |
|---|---|
| Apollo | Per-county supervised classifier (HistGB + isotonic) replacing Alpha at step 4 of Gaia. 117 features, 4.05M parcels, 5 counties. The contender |
| Alpha | 8020REI's incumbent scorer. Weighted sum of 25 hand-tuned distress indicators, Miami-tuned, no training loop, no calibration. The baseline |
| Gaia | Upstream 7-step ETL (ingest → dedup → join → label → enrich → BuyBox → export). Apollo replaces step 4 (scoring) only |
| Camilo | Competing modeller on the 8-week rock; baseline artifact pending. Apollo must clear top-decile recall ≥ Alpha AND ≥ Camilo |
| Eduardo | Coach; sign-off authority on locked March 2025 test; owns P/R/F1 evaluation against client deals |
| T0 | The month-end "as-of" timestamp for a snapshot. Features computed at T0 month-end; outcome window T0+1..T0+6 |
| fold | A walk-forward train/eval split. 6 expanding folds (Fold 1: 15-mo train; Fold 6: 45-mo train). Each absorbs prior eval into training |
| embargo | 1-month buffer between training T0 and evaluation window start. Closes the leak where a property listed at T0 and sold at T0+1 carries signal into training while its outcome is visible |
| HistGB | scikit-learn HistGradientBoostingClassifier. Handles tabular mixed types; interpretable feature importance. Beat LightGBM, logistic, random forest in 5×4 ablation. Deterministic (early_stopping=False, seed=42) |
| isotonic | Monotone non-parametric calibration. Maps raw model probability to empirical deal rate via IsotonicRegression(p → y) fit on held-out non-downsampled slice |
| lift / Lift@K | (positives in top-K of model list) ÷ (positives in random top-K). Lift@1% = how many more deals the top 1% of Apollo's list captures vs a random 1% of properties |
| lift ratio | Apollo Lift@1% ÷ Alpha Lift@1%. The headline head-to-head metric |
| AUC-PR | Area under precision-recall curve. Robust to class imbalance (deal rates < 9%) |
| AUC-ROC | Area under receiver-operating-characteristic curve. Used as the secondary discriminator |
| Brier score | Mean squared error between predicted probability and outcome. Lower = better calibrated. Target ≤ 0.025 |
| BSS (Brier Skill Score) | 1 − (model Brier ÷ reference Brier). Positive = better than reference. Miami went from −0.0008 → +0.0011 post-isotonic (first positive BSS in the project) |
| ECE (Expected Calibration Error) | Weighted mean of bin-level miscalibration. Top-decile ECE improved 69–95% across all 5 counties post-isotonic |
| CRM-leak guard | is_crm_matched_anywindow = 1 rows (properties 8020REI already worked through CRM) dropped entirely from training. Prevents fake head-to-head wins from prior business actions |
| Oracle v1.2 | 5-criterion AND-rule deal definition: DOCTYPE_CLEAN ∧ FLAG_CLEAN ∧ SELLER_CLEAN ∧ BUYER_INVESTOR ∧ PRICE_GATE. 8.73% prevalence; 36.6K labels across 419.7K transactions |
| C5c NDS fallback | Non-disclosure-state branch of PRICE_GATE for TX/MO. Acknowledged structural gap: 62% of deals carry zero price verification |
| arms-length filter | Filter excluding non-arms-length transactions (foreclosures, quit-claims, probate transfers, divorce sales). Intentionally OFF in current phase per CLAUDE.md hard rule #3. V2 second pass planned once Eduardo signs off on the policy |
| score_0_100 | Display score: within-county percentile rank of calibrated_probability_isotonic. Intra-county only. Cross-county comparison NOT meaningful |
| calibrated_probability_isotonic | The actual empirical deal probability per property (4–11 distinct tiers per county) |
| stat_significant_lift | Sidecar boolean flag: TRUE iff 95% CI on lift ratio excludes 1.0×. TRUE for Jackson/Harris/Maricopa; FALSE for Miami/Philly |
| signal-three | Jackson + Harris + Maricopa: the 3 counties where Apollo separates from Alpha at statistical significance. Geomean lift ratio 5.72× |
| buy-box | The structural property/location/ownership fingerprint that defines a target deal. Apollo finds who fits the box; it does NOT predict motivation |
| dual-geomean framing | Comms rule: report 3.03× (all 5) and 5.72× (signal-3) together. Citing either alone misrepresents the evidence |
| Scenario A | Recency-features leakage ablation. Drop listing_duration_months + months_since_prev_sale + mortgage_age_months. Pre-embargo: −0.0033 ΔAUC-ROC. Embargoed: −0.0270. The 8× delta is the one open audit flag |
| FIPS | Federal Information Processing Standards county code. Always 5-digit zero-padded string: 04013 not 4013 (CLAUDE.md hard rule #1) |
| finding NN | Dated, evidence-first entry in notes/findings/NN_<topic>.md. Append-only; older facts may be stale (date wins) |
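The lift and lift-ratio definitions above can be made concrete. In this stdlib sketch, "random top-K" is read as the number of positives expected in a K-sized random draw (K × prevalence), and K is rounded from a fraction of the county universe. Both readings are assumptions. The project's exact K rounding and tie-breaking are not specified here, and the toy arrays are illustrative.

```python
def lift_at_k(y, scores, frac=0.01):
    """Positives in the model's top-K divided by the K * prevalence expected at random."""
    n = len(y)
    positives = sum(y)
    if positives == 0:
        raise ValueError("lift is undefined with zero positives")
    k = max(1, round(n * frac))
    ranked = sorted(zip(scores, y), key=lambda t: t[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / (k * positives / n)


def lift_ratio(y, apollo_scores, alpha_scores, frac=0.01):
    """Headline head-to-head metric: Apollo Lift@K divided by Alpha Lift@K."""
    return lift_at_k(y, apollo_scores, frac) / lift_at_k(y, alpha_scores, frac)


# Toy usage: 100 parcels, 5 positives. Apollo ranks all positives first;
# Alpha ranks one positive first and then nine negatives.
Y = [1] * 5 + [0] * 95
APOLLO = [100 - i for i in range(100)]
ALPHA = [1000 if i == 4 else i for i in range(100)]
```

With frac=0.10, Apollo captures 5 of the 10 top slots against 0.5 expected at random (lift 10.0). Alpha captures 1 (lift 2.0), so the lift ratio is 5.0.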

14 · Cross-bucket notes

  • CallZeke deliverable was moved from REI hub → Roofing bucket on 2026-05-25. See notes/Roofing/callzeke/. The REI hub at 8020rei-new-model.web.app no longer hosts CallZeke content.
  • The REI bucket is notes/REI/ (this file JOURNEY.md). The Roofing bucket is notes/Roofing/ (see notes/Roofing/PROGRESS_NOTEBOOK.html for live state).
  • Coverage platform at coverage.8020roof.com is Roofing-side, NOT REI-side. It is a separate Firebase site (hosting:8020roof-coverage); never run a bare firebase deploy without --only hosting:8020roof-coverage.
  • Brand tokens are BigQuery-synced and live at presentations/assets/mck-ds/{colors_and_type,tokens.bigquery}.css. The web/ Next.js app symlinks them in via web/styles/. Single source of truth for color/type/spacing/motion across HTML and React.
  • Memory of arms-length scope lives at ~/.claude/projects/-Users-ignacioaraya-Projects-new-model/memory/project_arms_length_phase.md.

15 · Source map

| JOURNEY.md section | Primary source file | Secondary sources |
|---|---|---|
| 1 · Macro project | web/app/context/page.tsx:45-130 | web/app/decks/archive/01-macro-project/page.tsx, CLAUDE.md |
| 2 · Alpha | web/app/context/page.tsx:132-175 | web/app/decks/01-why-apollo/page.tsx:45-101, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:54-69 |
| 3 · Apollo | web/app/context/page.tsx:177-218, web/app/decks/02-how-apollo-trains/page.tsx | web/app/decks/01-why-apollo/page.tsx:103-145, web/app/data/page.tsx:38-83 |
| 4 · Head-to-head | web/app/brief/page.tsx:119-161, web/app/decks/04-where-it-wins/page.tsx, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:130-187 | web/app/decks/archive/05-fold1-vs-alpha/page.tsx, web/app/decks/archive/19-current-state-2026-04-22/page.tsx:194-239, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:76-105 |
| 5 · Data backbone | web/app/data/page.tsx | web/app/decks/archive/19-current-state-2026-04-22/page.tsx:96-146 |
| 6 · How Apollo trains | web/app/decks/02-how-apollo-trains/page.tsx | web/app/decks/archive/19-current-state-2026-04-22/page.tsx:148-192 |
| 7 · Buy-box | web/app/decks/03-the-buy-box/page.tsx | web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:138-180, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:189-277 |
| 8 · Where Apollo wins | web/app/decks/04-where-it-wins/page.tsx | web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx:130-187 |
| 9 · The submission | web/app/decks/05-the-submission/page.tsx | web/app/decks/archive/20-executive-submission/page.tsx:159-202, web/app/data/page.tsx:501-569 |
| 10 · Audits | web/app/decks/06-the-audits/page.tsx, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx | web/app/decks/archive/21-triple-audit-2026-04-23/page.tsx (not read; referenced via deck 06 + 24) |
| 11 · Timeline | All archive decks 01 → 24 | web/app/decks/archive/19-current-state-2026-04-22/page.tsx, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx, web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx |
| 12 · Current state | web/app/brief/page.tsx:260-318, web/app/decks/05-the-submission/page.tsx:249-304, web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx:182-237 | CLAUDE.md (hard rules), notes/PROJECT_STATUS.md, notes/findings/00_index.md |
| 13 · Glossary | Distilled across all 15 files | CLAUDE.md |
| 14 · Cross-bucket | Project memory (~/.claude/projects/.../memory/), CLAUDE.md | |

Source-file inventory used

Live hub (6 decks + 3 pages):

  • web/app/brief/page.tsx (370 lines · executive brief)
  • web/app/context/page.tsx (289 · background)
  • web/app/data/page.tsx (593 · datasets feeding model)
  • web/app/decks/01-why-apollo/page.tsx (238)
  • web/app/decks/02-how-apollo-trains/page.tsx (353)
  • web/app/decks/03-the-buy-box/page.tsx (289)
  • web/app/decks/04-where-it-wins/page.tsx (333)
  • web/app/decks/05-the-submission/page.tsx (332)
  • web/app/decks/06-the-audits/page.tsx (434)

Archive milestones (6):

  • web/app/decks/archive/01-macro-project/page.tsx (225 · the original framing)
  • web/app/decks/archive/05-fold1-vs-alpha/page.tsx (216 · the head-to-head opener)
  • web/app/decks/archive/19-current-state-2026-04-22/page.tsx (304)
  • web/app/decks/archive/20-executive-submission/page.tsx (263)
  • web/app/decks/archive/22-ceo-summary-2026-04-27/page.tsx (272)
  • web/app/decks/archive/24-scientific-audit-2026-05-07/page.tsx (421)

*Document status: distilled 2026-05-25 from live hub (8020rei-new-model.web.app) source. The hub remains the live source of truth — when in doubt, check web/app/**/page.tsx for the latest framing. (Hub decommissioned 2026-06-01: web/ removed and the 8020rei-new-model Firebase site deleted; the web/app/** source no longer exists. Live surface is now the platform/ Models Wiki at models-8020iq.web.app. This document is preserved as a historical record.) Confidential.*

Progress Notebook

Phase-by-phase build log — from PROGRESS_NOTEBOOK.html

REI · Apollo · Progress Notebook · single source of truth · 9-phase Apollo lifecycle · 2026-05-25

⚠ Decommissioned 2026-06-01 — Apollo Next.js web hub. The Apollo Next.js hub described below (the web/ directory → 8020rei-new-model.web.app) was decommissioned 2026-06-01: the web/ tree was removed from the repo and the 8020rei-new-model Firebase hosting site was permanently deleted (URL now 404). The live model surface is now the 8020IQ Models Wiki at models-8020iq.web.app, served from platform/ as plain static HTML (no build step). Note: "Apollo" here also names the REI scoring model — the model is unchanged. All historical references to the web hub below are preserved as a record.
This is the only REI notebook. Macro / decision layer for Apollo, the supervised model replacing Alpha (the 25-signal hand-weighted heuristic) at step 4 of the Gaia ETL. Every phase below has a status pill and a link to its source. Detail lives in JOURNEY.md (776-line distillation of every deck on the live hub). Decisions live here. 8-week rock with Camilo (competitor) and Eduardo (coach); weekly Thursday check-ins. Currently at the end of week 6: model trained, audited (9/10 blockers closed), submission shipped — locked March 2025 head-to-head is gated on Eduardo + Camilo sign-off.
📘 Journey · full distillation
Every deck on the live hub, distilled into one markdown — Apollo overview, Alpha baseline, head-to-head numbers (3.03× geomean, 5.72× on stat-sig counties), data backbone, training pipeline, audit closures, current state, glossary. 776 lines · 54 KB · numbers not adjectives.
🌐 Live hub · public artifact
Firebase static export at 8020rei-new-model.web.app — 41 routes, 6 current decks + 24 archived. Source: web/app/ (Next.js 15). 2026-05-25: CallZeke content removed (moved to Roofing bucket); REI hub now 100% Apollo. (Hub decommissioned 2026-06-01 — web/ removed, 8020rei-new-model Firebase site deleted; live surface is now the platform/ Models Wiki at models-8020iq.web.app.)

Macro project · win condition

| Bar | Threshold | Status |
|---|---|---|
| Top-decile recall | ≥ Alpha AND ≥ Camilo on locked March 2025 cohort | GATED · sign-off pending |
| Calibration | Within ±15% on 30/60/90-day deal-rate buckets | 4 of 5 · Jackson at honest floor |
| Transferability | Per-county model trained on county-X explains county-X outcomes | VALIDATED · pooled costs 15–27% AUC-PR |

| Player | Role |
|---|---|
| Ignacio Araya | Builder · Apollo (DS, model, features, pipeline) |
| Camilo | Competitor · parallel model · baseline artifact pending |
| Eduardo | Coach · sign-off authority · P/R/F1 vs client deals |
Weekly Thursday check-ins

Bottom-line KPIs

| KPI | Value | Note |
|---|---|---|
| Geomean lift@1% (5 counties) | 3.03× | vs Alpha · all counties |
| Geomean lift@1% (3 stat-sig counties) | 5.72× | Jackson + Harris + Maricopa |
| Jackson lift@1% | 7.87× | stat-sig |
| Harris lift@1% | 6.92× | stat-sig |
| Maricopa lift@1% | 3.43× | stat-sig |
| Miami lift@1% | 1.22× | ±0.27 · inside 95% CI of 1.0 |
| Philly lift@1% | 1.12× | ±0.21 · inside 95% CI of 1.0 |
| Parcels scored | 4.05M | T0=2025-09 · 5 counties |
| Features | 117 | tiers A-F · 3 under leakage audit |
| Audit blockers closed | 9 / 10 | Scenario A FLAG-band pending |
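The two headline geomeans can be reproduced from the per-county ratios. This stdlib sketch uses statistics.geometric_mean and returns both numbers from one call, in line with the dual-geomean comms rule. It assumes the ± figures on the Miami and Philly cards are symmetric 95% CI half-widths, which the cards do not state explicitly. The CIs for the signal-three counties are not given here, so their membership is hard-coded from the sidecar flag.

```python
from statistics import geometric_mean

# Published Lift@1% ratios vs Alpha (as of 2026-05-08).
LIFT_RATIO = {"Jackson": 7.87, "Harris": 6.92, "Maricopa": 3.43, "Miami": 1.22, "Philadelphia": 1.12}
# Counties whose sidecar stat_significant_lift flag is TRUE.
SIGNAL_THREE = {"Jackson", "Harris", "Maricopa"}


def stat_significant(ratio: float, ci_half_width: float) -> bool:
    """TRUE iff the 95% CI [ratio - hw, ratio + hw] excludes 1.0 (assumes a symmetric CI)."""
    return not (ratio - ci_half_width <= 1.0 <= ratio + ci_half_width)


def dual_geomean(ratios: dict, significant: set) -> tuple:
    """Return (all-county geomean, signal-three geomean), rounded to 2 decimals."""
    all_counties = geometric_mean(list(ratios.values()))
    signal = geometric_mean([r for c, r in ratios.items() if c in significant])
    return round(all_counties, 2), round(signal, 2)
```

dual_geomean(LIFT_RATIO, SIGNAL_THREE) gives (3.03, 5.72), matching the KPI cards. Under the half-width assumption, Miami's interval [0.95, 1.49] and Philly's [0.91, 1.33] both contain 1.0.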
⚠ Phase 7 open finding (Scenario A · recency-feature leakage on Fold 5) — three features under active leakage audit: listing_duration_months, months_since_prev_sale, mortgage_age_months. See notes/findings/09_leakage_audit.md. Do NOT cite their AUC-PR contribution as validated until the ablation runner completes.

PHASE 1 · Data backbone DONE

Sandbox + universe

| Property | Value | Source |
|---|---|---|
| Active states | 14 | web/app/context/page.tsx:67-75 |
| Active counties (pilot) | 5: Miami 12086 · Maricopa 04013 · Philly 42101 · Harris 48201 · Jackson 29095 | CLAUDE.md |
| Sandbox time span | 2021-01 → 2025-09 (57 months) | web/app/data/page.tsx:235-239 |
| Sandbox storage | 680 GB | web/app/context/page.tsx:67-75 |
| Total parcels scored at T0=2025-09 | 4,052,593 (5.17M incl. non-residential) | web/app/data/page.tsx:28-34 |
| FIPS rule | 5-digit zero-padded everywhere (04013 not 4013) | CLAUDE.md |

T0 + horizon conventions

T0 = month-end snapshot string YYYY-MM. Features computed as-of T0 month-end. Horizon = 6 months. y_sold = 1 iff any sale recorded in T0+1..T0+6. Walk-forward folds enforce no-future-leakage via t0 boundaries; src/new_model/features.py reads only base_globs = _globs([t0], fips).
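The T0 conventions translate directly into month arithmetic. This minimal sketch assumes sale events are available as 'YYYY-MM' strings, whereas the real pipeline reads parquet globs via _globs. The function names are illustrative. For T0 = 2025-09 the sketch yields the Oct 2025 to Mar 2026 prediction window quoted in ship-blocker #3.

```python
def add_months(ym: str, k: int) -> str:
    """Shift a 'YYYY-MM' string by k months."""
    year, month = map(int, ym.split("-"))
    total = year * 12 + (month - 1) + k
    return f"{total // 12:04d}-{total % 12 + 1:02d}"


def outcome_window(t0: str, horizon: int = 6) -> tuple:
    """Outcome window T0+1..T0+horizon, inclusive, as 'YYYY-MM' strings."""
    return add_months(t0, 1), add_months(t0, horizon)


def y_sold(t0: str, sale_months, horizon: int = 6) -> int:
    """1 iff any recorded sale month falls in T0+1..T0+horizon (arms-length filter OFF)."""
    start, end = outcome_window(t0, horizon)
    # Zero-padded 'YYYY-MM' strings compare correctly as plain strings.
    return int(any(start <= s <= end for s in sale_months))
```

A sale recorded in the T0 month itself does not count, because features are computed as of T0 month-end and the window opens at T0+1.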

Arms-length filter — intentionally OFF this phase

Every sale event counts as y_sold=1 (foreclosures, quit-claims, probate, divorce included). This is phase scope, not an oversight. A re-run with the filter is a planned second pass after Eduardo signs off on the policy. Do NOT propose adding it as a blocker.

PHASE 2 · Alpha baseline (the incumbent) DONE

What Alpha is

25-signal hand-weighted heuristic. Step 4 of the Gaia 7-step ETL. Implementation reference in src/new_model/alpha.py. Output is a single score per parcel.

Why it's the baseline

Apollo must beat Alpha at top-decile recall AND calibration AND transferability. Alpha is what Acquisition teams use today. Numbers reported throughout this notebook are relative to Alpha unless otherwise stated.

JOURNEY.md §2 — Alpha details

PHASE 3 · Features WIP

117 features · 6 tiers

| Tier | Family | Status |
|---|---|---|
| A | Property physical: parcels, size, use, year_built | SHIPPED |
| B | Owner + distress: 23 distress trajectories, absentee, leverage | SHIPPED |
| C | Valuation + activity: AVM, appreciation, days_ownership | SHIPPED |
| D | Date-diffs: mortgage_age_months · listing_duration_months · months_since_prev_sale | UNDER LEAKAGE AUDIT |
| E | National macro: FRED mortgage rate, Fed funds, HPI, CPI, unemployment | SHIPPED |
| F | Local market context: BLS county unemp, ACS income, FHFA state HPI | WIP (being wired in) |

Three features under active leakage audit

See notes/findings/09_leakage_audit.md. Do not cite tier-D AUC-PR contribution as validated until the ablation runner completes. Tier F leakage was FIXED 2026-04-22: CSUSHPINSA/FHFA/ACS/BLS publication-lag shifts shipped, and cache v8 supersedes v7 (memory feedback_tier_f_leakage_fixed).

Reproducibility pin

Every result JSON embeds a cache_manifest (sha256-16 per cohort + oracle) at write time. No hardcoded version strings (per 5-reviewer audit 2026-04-22).
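The reproducibility pin is sketched below. It assumes "sha256-16" means the first 16 hex characters of a SHA-256 digest and that each cohort and the oracle are single files. The manifest layout and all function names are hypothetical. The point is that the manifest is computed from file contents at write time, so no version string is hardcoded.

```python
import hashlib
import json
from pathlib import Path


def sha256_16_bytes(data: bytes) -> str:
    """First 16 hex characters of the SHA-256 digest (assumed reading of "sha256-16")."""
    return hashlib.sha256(data).hexdigest()[:16]


def sha256_16_file(path: Path, chunk: int = 1 << 20) -> str:
    """Same digest, streamed in 1 MiB chunks so large cohort caches are never fully loaded."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()[:16]


def build_cache_manifest(cohort_files: dict, oracle_file: Path) -> dict:
    """Hash every cohort file plus the oracle file at write time."""
    return {
        "cohorts": {name: sha256_16_file(p) for name, p in sorted(cohort_files.items())},
        "oracle": sha256_16_file(oracle_file),
    }


def result_json(result: dict, manifest: dict) -> str:
    """Serialize a result with its cache_manifest embedded next to the metrics."""
    return json.dumps({**result, "cache_manifest": manifest}, indent=2, sort_keys=True)
```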

JOURNEY.md §5 — Data backbone

PHASE 4 · Walk-forward folds DONE

6-fold walk-forward · embargo enforced

The fold embargo was fixed 2026-04-22 (Tier F leakage). Folds 1-6 × 5 counties = 30 cells. Wall-clock is ~17 min per cell with a warm cache, so the 25 cells of Folds 2-6 take ≈ 7 hours. Runner: scripts/run_folds_2_6.sh.
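Two properties of the fold design can be checked with month arithmetic. The sketch assumes each fold adds exactly one 6-month evaluation window to training; that assumption reproduces the 15-month (Fold 1) and 45-month (Fold 6) spans in the glossary. It also measures how far the last training T0's outcome window reaches into an evaluation window. The example dates are illustrative, not the project's fold boundaries. The project's additional 1-month embargo buffer is not modelled.

```python
def month_index(ym: str) -> int:
    """'YYYY-MM' to a monotone month counter."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def train_months(fold: int, first: int = 15, step: int = 6) -> int:
    """Training span of fold k under the assumed 6-month expansion per fold."""
    if not 1 <= fold <= 6:
        raise ValueError("the project uses folds 1..6")
    return first + step * (fold - 1)


def label_overlap_months(last_train_t0: str, eval_start: str, horizon: int = 6) -> int:
    """Months of the last training outcome window (T0+1..T0+horizon) at or after eval_start."""
    last_label_month = month_index(last_train_t0) + horizon
    return max(0, last_label_month - month_index(eval_start) + 1)
```

For example, label_overlap_months("2024-03", "2024-04") returns 6 (full overlap, the kind of leak that inflated v7). Moving the evaluation start to "2024-10" returns 0.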

Compute topology

| Node | RAM | Role |
|---|---|---|
| Main (this machine) | 32 GB | Primary training + dev + builds |
| Mini | 32 GB | Second training node · ssh mini · ./scripts/mini.sh |

The feature cache is shared via ./scripts/mini.sh sync-cache (one-way, merge only; never rsync --delete). Sharing gives a 1500× speedup (30s → 0.02s per period). Prefer mini for jobs longer than 5 min, unless mini has less than 100 MB free.

Maricopa Folds 2-6 OOM

FIPS 04013 (Maricopa) OOMs on main. Route these runs to mini or lower the downsample ratio (memory project_maricopa_oom).

PHASE 5 · Training DONE

HistGradientBoosting · selected over LightGBM

HistGB was chosen over LightGBM (+0.020 AUC). At one point, 10 experiments had been measured before the metric switch. The per-county architecture was validated against a pooled model (pooling costs 15-27% AUC-PR).

Single fold + single county

```shell
uv run python scripts/train_fold.py fold_1 12086
```

Ablation: drop a feature under audit

```shell
EXTRA_DROP_COLS=listing_duration_months OUT_SUFFIX=ablation_listing \
  uv run python scripts/train_fold.py fold_1 12086
```

Multi-arch sweep

```shell
uv run python scripts/train_fold_arch.py fold_1 12086 histgb
```
JOURNEY.md §6 — Training pipeline

PHASE 6 · Calibration DONE

Isotonic calibration · 4 of 5 counties pass

Deployed across 5 counties. +0.0019 Brier, ECE −87%. Jackson sits at honest floor (small sample, hardest cohort). HistGB calibration ships +3.3× lift vs uncalibrated baseline (memory: Week of 2026-04-28).

JOURNEY.md §4 — Calibration BSS table

PHASE 7 · Audits WIP — 9/10

Scientific re-audit (2026-05-07) · 10-item ship-blocker list

9 of 10 closed. Pending: Scenario A — recency-feature leakage on embargoed Fold 5 (FLAG-band, not PASS; pending V2 ablation).

Closed blockers

  1. Tier F publication-lag (CSUSHPINSA/FHFA/ACS/BLS shifts) — shipped 2026-04-22
  2. Fold embargo bug — shipped 2026-04-22
  3. Reproducibility pinning (cache_manifest in every result JSON) — shipped 2026-04-22
  4. Calibration ECE −87% via isotonic — shipped 2026-04-21..28
  5. Per-county architecture validated vs pooled — shipped 2026-04-21..28
  6. Algorithm selection HistGB vs LightGBM (+0.020 AUC) — shipped
  7. Triple-critic audit 2026-04-23
  8. CEO summary verified 2026-04-27
  9. Permit data density audit 2026-05-07

Open blocker · Scenario A

Recency-feature leakage on embargoed Fold 5. Three features (listing_duration_months, months_since_prev_sale, mortgage_age_months) suspected of leaking embargo-period information into the model. Ablation runner pending. Status: FLAG-band, not PASS.

notes/findings/09_leakage_audit.md

PHASE 8 · Submission deliverable DONE

63 MB ZIP · dual-size cut-off

Static export bundle (parcels + scores + calibration metadata + audit manifests). Dual-size cut-off pattern: top-K small list + extended ranked list. Deliverable lives in data/sandbox/model/. Walk-forward backtest value metric: lift vs random.
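The dual-size cut-off pattern is sketched below. The 5K and 250K defaults come from the executive submission deck 20 entry in the timeline. Whether those cut-offs apply per county or across the whole bundle is not stated, so the sketch works on a single list. Breaking ties within an isotonic tier by raw probability is an assumption. The tuple layout and EXAMPLE are hypothetical.

```python
def dual_cutoff(scored, small_k=5_000, large_k=250_000):
    """Return (top small_k, top large_k) lists ranked by calibrated probability.

    scored: iterable of (parcel_id, calibrated_probability_isotonic, raw_probability).
    Raw probability breaks ties between parcels in the same isotonic tier (assumption).
    """
    if small_k > large_k:
        raise ValueError("the small list must not be larger than the extended list")
    ranked = sorted(scored, key=lambda r: (r[1], r[2]), reverse=True)
    return ranked[:small_k], ranked[:large_k]


# Toy parcels: (id, calibrated, raw). Two isotonic tiers are shared.
EXAMPLE = [("a", 0.05, 0.30), ("b", 0.11, 0.90), ("c", 0.05, 0.40), ("d", 0.02, 0.10), ("e", 0.11, 0.85)]
```

Because the small list is a prefix of the extended list, both come from one sort and never disagree on ordering.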

JOURNEY.md §9 — Submission

PHASE 9 · Head-to-head vs Alpha (March 2025 locked) GATED

Locked March 2025 test · untouchable until sign-off

HARD RULE. The locked March 2025 test is untouchable until Eduardo + Camilo sign off in writing. This is the Phase 4 gate and must not be run early, even for sanity checks. (CLAUDE.md Hard Rule #4)

What gets measured at the gate

  • Apollo top-decile recall vs Alpha top-decile recall
  • Apollo top-decile recall vs Camilo top-decile recall
  • 30/60/90-day calibration buckets within ±15%
  • Per-county results — Apollo must NOT regress on any of 5 counties
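The gate criteria reduce to a per-county comparison. This stdlib sketch assumes "top decile" means the floor(n/10) highest-scored parcels and that per-county recalls arrive as a dict keyed by FIPS. Both the dict layout and the function names are hypothetical. It only illustrates the comparison logic and must not be run against the locked March 2025 cohort before sign-off.

```python
def top_decile_recall(y, scores):
    """Share of all positives captured in the top 10% of parcels by score."""
    n, positives = len(y), sum(y)
    if positives == 0:
        raise ValueError("recall is undefined with zero positives")
    k = max(1, n // 10)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sum(y[i] for i in order[:k]) / positives


def passes_gate(per_county):
    """True iff Apollo >= Alpha and >= Camilo on top-decile recall in every county."""
    return all(
        r["apollo"] >= r["alpha"] and r["apollo"] >= r["camilo"]
        for r in per_county.values()
    )
```

A single regressing county fails the gate even when the other four pass. This mirrors the "must NOT regress on any of 5 counties" rule rather than an averaged criterion.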

Status summary

| Phase | Status | Note |
|---|---|---|
| P1 · Data backbone | DONE | 4.05M parcels · 5 counties · 57 months |
| P2 · Alpha baseline | DONE | 25-signal heuristic at Gaia step 4 |
| P3 · Features | WIP | 117 feats · tier F being wired · 3 under leakage audit |
| P4 · Walk-forward folds | DONE | 6-fold × 5 counties · embargo fixed 2026-04-22 |
| P5 · Training | DONE | HistGB selected · +0.020 AUC over LightGBM |
| P6 · Calibration | DONE | Isotonic · 4/5 counties pass · ECE −87% |
| P7 · Audits | WIP · 9/10 | Scenario A leakage ablation pending |
| P8 · Submission | DONE | 63 MB ZIP shipped |
| P9 · Head-to-head | GATED | March 2025 locked · pending sign-off |

Next-action queue

1 · Close Scenario A ablation

Run ablation on the 3 tier-D features (listing_duration_months, months_since_prev_sale, mortgage_age_months). Confirm whether they leak embargo data on Fold 5. If yes → drop or revise. If no → reclassify FLAG → PASS.

2 · Tier F local-market wire-in

Finish wiring BLS county unemployment + ACS income + FHFA state HPI into the feature pipeline. Re-train + re-calibrate.

3 · Camilo parallel baseline

Coordinate with Camilo: extract his model artifact in the same scoring format. Required for Phase 9 head-to-head.

4 · Eduardo sign-off package

Prepare the write-up Eduardo needs to sign off on the locked March 2025 test. Should include: audit closures, current calibration tables, per-county lift evidence, arms-length-filter intentional-deferral note.

5 · REI hub redeploy

Local build clean post CallZeke purge (2026-05-25). Pending: firebase deploy --only hosting:8020rei-new-model. Awaits user confirmation. (Obsolete — hub decommissioned 2026-06-01: web/ removed and the 8020rei-new-model Firebase site deleted. No redeploy; live surface is the platform/ Models Wiki at models-8020iq.web.app.)

References

| File | What's in it |
|---|---|
| JOURNEY.md | 776-line distillation · every deck · numbers tables · timeline · glossary · source map |
| README.md | Bucket overview · cross-bucket pointers · read order |
| INDEX.md | Folder map · phase status mirror · grep-friendly |
| Live hub | Public artifact · 6 current decks + 24 archived · Firebase static |
| web/app/brief/page.tsx | Executive brief source |
| web/app/context/page.tsx | Macro context source |
| web/app/data/page.tsx | Data backbone source |
| web/app/decks/01-why-apollo/page.tsx through 06-the-audits/page.tsx | 6 current decks source |
| src/new_model/ | Implementation: features, folds, alpha, cache |
| scripts/train_fold.py · train_fold_arch.py | Training runners |
| notes/findings/00_index.md | Append-only research log · dated · evidence-first |
| notes/findings/09_leakage_audit.md | The Scenario A audit |
| notes/MASTER_PLAN.md | Tier definitions · phase boundaries (read only when needed) |
| notes/PROJECT_STATUS.md | What's running · what's blocked (ephemeral) |
| notes/SESSION_HANDOFF.md | Mid-session pickup |
| Memory entries | project_rock_feedback_loop · project_gaia_architecture · project_arms_length_phase · project_maricopa_oom · feedback_tier_f_leakage_fixed · feedback_reproducibility_pinning; agent-side facts persisted |

Cross-bucket pointers

| Topic | Bucket | Where |
|---|---|---|
| Roofing pipeline · 9-step MECE | Roofing | notes/Roofing/PROGRESS_NOTEBOOK.html |
| Roofing rules · labeling + coverage | Roofing | notes/Roofing/RULES_REFERENCE.html |
| CallZeke (Roofing-client deliverable) | Roofing | notes/Roofing/callzeke/ (moved from REI hub 2026-05-25) |
| Coverage platform (live) | Roofing | coverage.8020roof.com · source evidence/ |
| Roofing audit catalogue | Roofing | notes/Roofing/audits/ (per-step empirical reports) |
Two-bucket rule. REI bucket = Apollo (this notebook). Roofing bucket = the roof-replacement pipeline. They share infrastructure (mini/main compute, AWS access, cache), but their decks, notebooks, and findings live in separate directories. Use the cross-bucket pointers above when a topic touches both.