Methodology

The rules — how every 8020IQ model is built, validated, and judged. For the step-by-step map of the pipeline, see the Framework.

One method, many models: a shared platform, then a per-model spine. Roofing, Apollo, Olivia, Garage and Windows share the same recipe: a point-in-time snapshot, a six-month label, a tiered feature library, walk-forward validation, a modeling ladder, and calibration to the true base rate. The shared platform is the prediction frame, the feature library, and the audit method. Each model owns its label (§2) and its modeling and delivery. This page documents the method; each model page shows its own numbers.

1 · The prediction frame (shared)

Every model answers one question, framed identically across surfaces: given what we know about a property at a fixed point in time, will the target event happen in the next six months? The frame is shared; only the label changes per model.

History → features: everything knowable as of the snapshot, and nothing after it

T0: month-end

Label window · T0+1 .. T0+6: y = 1 iff the target event lands in these six months

The model stands at T0 and looks forward. Features may only look left; the label only looks right.

Element	Definition (from `notes/MASTER_PLAN.md` §3 + `CLAUDE.md`)
T0 — the snapshot	A month-end boundary, stored as a `YYYY-MM` string. Every feature is computed as-of T0 month-end. The model only ever sees the world as it stood on that date.
Horizon	Six months. The window is `T0+1 .. T0+6`.
The label (y)	`y = 1` if the model's target event is recorded anywhere in `T0+1 .. T0+6`, else `0`. Formally `y(T0)=1 if event ∈ (T0, T0+6 months]`. The target event differs by model — see Labeling.
Never train on future data	Features must be computable from the T0 snapshot alone. Training stops at `T0 = T_today − 6 months` because the labels of later snapshots only become observable once their horizon has elapsed. Walk-forward folds enforce this via `t0` boundaries.
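
The frame can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's code: the helper names (`month_index`, `month_string`, `label_window`, `label`, `latest_trainable_t0`) are hypothetical, and events are assumed to be bucketed by `YYYY-MM` month.

```python
HORIZON_MONTHS = 6  # six-month horizon, as defined in the table above


def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string into a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def month_string(index: int) -> str:
    """Inverse of month_index."""
    year, month0 = divmod(index, 12)
    return f"{year:04d}-{month0 + 1:02d}"


def label_window(t0: str, horizon: int = HORIZON_MONTHS) -> list[str]:
    """Months T0+1 .. T0+horizon; T0 itself is excluded."""
    start = month_index(t0) + 1
    return [month_string(start + k) for k in range(horizon)]


def label(t0: str, event_months: set[str], horizon: int = HORIZON_MONTHS) -> int:
    """y = 1 iff at least one event month falls inside the label window."""
    return int(any(m in event_months for m in label_window(t0, horizon)))


def latest_trainable_t0(today_month: str, horizon: int = HORIZON_MONTHS) -> str:
    """Last snapshot whose full label window has already elapsed."""
    return month_string(month_index(today_month) - horizon)
```

For example, `label_window("2021-09")` runs from `2021-10` through `2022-03`, so an event recorded in `2021-09` itself does not make the row positive.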

2 · Labeling, by label family (shared method · per-model label)

The frame is shared; the label is where models differ. Two families: a permit event actually happened, or a transaction happened. The methodology for defining a leakage-safe, business-valid label is shared; the specific positive is the model's own.

2.1 · Permit-event labels — Roofing · Garage · Windows

y = 1 iff a qualifying permit event is recorded in the horizon.

A positive is a real permit of the right kind in T0+1 .. T0+6. The work is (a) classifying which permits count (type × action), and (b) bounding where a 0 is trustworthy — a "no permit" only means "no event" where the vendor actually covers that jurisdiction (coverage).

Roofing (Hestia): a qualifying roof-replacement permit, restricted by the owner-occupied-at-permit rule (a permit only counts as a positive if the owner occupied the property at permit time) and single-family.
Garage / Windows: the same permit-event frame on a different permit type; reuse the roofing classification + coverage template.
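
The coverage rule can be read as a three-valued label. The sketch below is one possible reading, under stated assumptions: the function name `permit_label` and its two boolean inputs are hypothetical, and treating an uncovered "no permit" as unknown (rather than dropping it in some other way) is an assumption, not something this page specifies.

```python
from typing import Optional


def permit_label(has_qualifying_permit: bool, jurisdiction_covered: bool) -> Optional[int]:
    """Permit-event label with coverage-bounded negatives.

    1    -> a qualifying permit is recorded in T0+1 .. T0+6
    0    -> no permit, and the vendor covers the jurisdiction (trustworthy zero)
    None -> no permit, but coverage is missing, so absence proves nothing
    """
    if has_qualifying_permit:
        return 1
    if jurisdiction_covered:
        return 0
    return None  # assumption: unknown rows are kept out of training and evaluation
```

Rows that come back as `None` would then be filtered out before sampling, so that missing vendor coverage is never learned as a negative.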

See also: Step 1 · Labeling | Permit classification | Coverage | How we match permits↔parcels | Roofing rules

2.2 · Transaction labels — Apollo · Olivia

y = 1 iff a qualifying transaction happens in the horizon.

Here the positive is a deed / sale event, and the two models cut it differently:

Apollo (REI): any property sale in the horizon (y_sold). The arms-length filter is intentionally OFF in this phase — every sale (including foreclosure, quit-claim, probate, divorce) counts as a positive, by design. Apollo is the generic "will it transact" signal.
Olivia: a dealable transaction for the client (wholesale / fix-flip) — narrower than any sale. Built by funnel decomposition: P(dealable) = P(transacts) × P(dealable | transacts). The client's closed deals are the known positives; every other transaction is unlabeled (a mix of dealable + not), so this is a positive-unlabeled problem, corrected with a true prior. Apollo's signal is Olivia's Stage A.

See also: Apollo · Label (sale) | Olivia · Label (dealable) | Investor-deal criteria (#12) | Deal-pattern analysis (#13)

Model	Family	y = 1 (positive)	Key rule
Roofing · Hestia	Permit	qualifying roof-replacement permit in T0+1..T0+6	owner-occupied-at-permit · single-family · coverage-bounded
Garage	Permit	garage-addition permit in horizon	permit classification · coverage-bounded
Windows	Permit	window-replacement permit in horizon	permit classification · coverage-bounded
Apollo · REI	Transaction	any property sale (`y_sold`) in horizon	arms-length filter OFF (every sale counts), by design
Olivia	Transaction · dealable	client closes a dealable deal (wholesale / fix-flip)	funnel decomposition · positive-unlabeled + true-prior

3 · Feature library: tiers A through F (shared)

The feature library is organized into six tiers and shared across models — each model picks its own subset and driver clusters. Tiers A–E are the core (sufficient to beat the incumbent heuristic); Tier F (local market context) is the geographic-correction layer. Each tier carries a null-policy and a provenance tag. See the full feature taxonomy.

Tier	What	Examples
A	Property physical	parcels, lot size, use type, year built, building / living area (`YearBuilt`, `LotSizeSqFt`, `UseType`, `property_age_years`)
B	Owner + distress	23 distress trajectories (active flag, months-active, months-since-resolved, was-ever-active), absentee level, leverage. The behavioral spine of the model.
C	Valuation + activity	AVM, assessed value, `valuation_gap`, equity / leverage ratio, appreciation, days-ownership (`current_avm_value`, `leverage_ratio`, `price_appreciation`)
D	Date-diffs (under leakage audit)	`mortgage_age_months`, `listing_duration_months`, `months_since_prev_sale`. Three features under active leakage audit; not cited as validated until the ablation runner completes.
E	National macro	FRED 30-yr mortgage rate, Fed Funds, Case-Shiller HPI (CSUSHPINSA), national unemployment. Same value for every property within a T0 month; joined on `Period`.
F	Local market context	county unemployment (BLS LAUS), ACS income, FHFA state HPI, foreclosure-law speed class. The geographic-correction layer; publication-lag-shifted to avoid future leakage.
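
Publication-lag shifting for Tier F can be illustrated as follows. This is a sketch under assumptions: the function name `latest_usable_period` is hypothetical, and the lag value in the usage note is made up for illustration, since this page does not list the actual lags per source.

```python
from typing import Optional


def month_index(ym: str) -> int:
    """Convert a 'YYYY-MM' string into a running month count."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def latest_usable_period(t0: str, periods: list[str], lag_months: int) -> Optional[str]:
    """Most recent reference period that was already published as of T0.

    A period is treated as visible at T0 only if period + lag_months <= T0,
    so a statistic released after the snapshot can never reach its features.
    """
    cutoff = month_index(t0) - lag_months
    usable = [p for p in periods if month_index(p) <= cutoff]
    return max(usable, default=None)  # 'YYYY-MM' strings sort chronologically
```

With a hypothetical two-month lag, a snapshot at `2024-06` would join the `2024-04` value even when `2024-05` and `2024-06` rows exist in the source table.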

Tier D caveat (active audit). Three Tier-D features (`listing_duration_months`, `months_since_prev_sale`, `mortgage_age_months`) are under leakage audit per `notes/findings/09_leakage_audit.md`. In the REI model `listing_duration_months` alone has contributed roughly half of one county's ranking signal in some windows, so single-feature dependency is tracked explicitly. Their contribution is not cited as validated until the ablation completes.

4 · Modeling: folding logic & calibration (per-model spine)

The modeling spine is the same for every model: walk-forward folds with an embargo, case-control sampling, a modeling ladder, and calibration to the true base rate.

4.1 · Walk-forward folds + embargo

Validation is walk-forward, never random K-fold. Train on everything available up to a fold, evaluate on the next fold, advance six months, repeat — mimicking production, where the model retrains every six months. Random K-fold on time-series data leaks future information into training and produces AUC figures that evaporate in production.

An embargo equal to the horizon separates train and eval: each dev fold's first eval T0 equals the last train T0 plus the horizon plus one month, so the last training label-window ends exactly one month before the first eval label-window starts — zero overlap. (The earlier layout had up to a 5-month label-window overlap; the embargo default eliminates it.)

Fold	Training T0 range	Evaluation T0	Label observable by
1	2021-01 .. 2021-09	2022-03	2022-09
2	2021-01 .. 2022-03	2022-09	2023-03
3	2021-01 .. 2022-09	2023-03	2023-09
4	2021-01 .. 2023-03	2023-09	2024-03
5	2021-01 .. 2023-09	2024-03	2024-09
6	2021-01 .. 2024-03	2024-09	2025-03
TEST (locked)	2021-01 .. 2024-09	2025-03	2025-09
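
The fold table can be checked mechanically for label-window overlap. The sketch below encodes the rows exactly as printed and computes, for each fold, how many months the last training label window shares with the evaluation label window. The names `FOLDS`, `window_bounds` and `check_fold` are hypothetical.

```python
HORIZON = 6  # months

# (fold, first train T0, last train T0, eval T0), copied from the table above
FOLDS = [
    ("1", "2021-01", "2021-09", "2022-03"),
    ("2", "2021-01", "2022-03", "2022-09"),
    ("3", "2021-01", "2022-09", "2023-03"),
    ("4", "2021-01", "2023-03", "2023-09"),
    ("5", "2021-01", "2023-09", "2024-03"),
    ("6", "2021-01", "2024-03", "2024-09"),
    ("TEST", "2021-01", "2024-09", "2025-03"),
]


def to_index(ym: str) -> int:
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1


def to_month(index: int) -> str:
    year, month0 = divmod(index, 12)
    return f"{year:04d}-{month0 + 1:02d}"


def window_bounds(t0: str, horizon: int = HORIZON) -> tuple[int, int]:
    """First and last month index of the label window T0+1 .. T0+horizon."""
    start = to_index(t0) + 1
    return start, start + horizon - 1


def check_fold(last_train_t0: str, eval_t0: str) -> dict:
    _, train_last = window_bounds(last_train_t0)
    eval_first, eval_last = window_bounds(eval_t0)
    return {
        # months shared by the last training window and the eval window
        "overlap_months": max(0, train_last - eval_first + 1),
        "label_observable_by": to_month(eval_last),
    }


for name, _, last_train, eval_t0 in FOLDS:
    print(name, check_fold(last_train, eval_t0))
```

Every row reports zero overlapping months, and the computed `label_observable_by` matches the table's last column. By contrast, `check_fold("2021-09", "2021-10")` reports a five-month overlap, which is what happens when evaluation starts the month after training ends.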

4.2 · Case-control negative sampling + true-prior correction

Sale and roof-permit events are rare (single-digit-percent base rates), so the negative class is downsampled per cohort — keep all positives plus a fixed ratio of negatives (NEG_TO_POS_RATIO × positives). This unblocks the largest counties (Maricopa, 04013, ~66M rows; Harris, 48201, ~59M rows) on a single 32 GB machine. Downsampling distorts the output probabilities, so the model's scores are corrected back to the true base rate at calibration time. For Olivia the same machinery doubles as the positive-unlabeled correction — known positives (client deals) against sampled unlabeled rows, rescaled to the estimated true prior.
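
A minimal sketch of the per-cohort downsampling, assuming each row is a dict with a `y` key. The function name `downsample_negatives` and the returned `keep_fraction` are hypothetical; `neg_to_pos_ratio` stands in for `NEG_TO_POS_RATIO`, whose actual value is not given on this page.

```python
import random


def downsample_negatives(rows: list[dict], neg_to_pos_ratio: float, seed: int = 0):
    """Keep all positives plus at most neg_to_pos_ratio x positives negatives.

    Returns the sampled rows and the fraction of negatives that survived,
    which is what a later true-prior correction needs.
    """
    positives = [r for r in rows if r["y"] == 1]
    negatives = [r for r in rows if r["y"] == 0]
    n_keep = min(len(negatives), int(neg_to_pos_ratio * len(positives)))
    rng = random.Random(seed)  # fixed seed so a cohort resamples identically
    kept = rng.sample(negatives, n_keep)
    keep_fraction = n_keep / len(negatives) if negatives else 1.0
    return positives + kept, keep_fraction
```

Calling this once per cohort (for example per county and T0) keeps memory bounded by the number of positives rather than by the raw row count. Recording `keep_fraction` per cohort is a design choice of this sketch: it is the one number needed to undo the distortion afterwards.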

4.3 · Modeling ladder

Three rungs, in order. No rung is built until the previous rung has beaten the incumbent heuristic on the current fold. Each rung has a go/no-go gate.

Rung	Model	Role	Gate to proceed
1	Logistic Regression (Tier A + active distress)	Interpretable sanity-check floor — falsifies "the signal is non-linear"	`lift@top-10%` > Alpha
2	Gradient-boosted trees, calibrated (Tier A+B+C+D+E)	Production candidate	`recall@top-10%` > Alpha
3	State-stratified / hierarchical boosting (+ Tier F)	Answers "does per-state training help?"	Decided on Fold 6 only — never on the locked test

Rung 3 is a tie-breaker question, not a default. If per-state training is a dead heat against one national model, the simpler national model ships. Simplicity wins ties.
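
The two gate metrics can be computed as below. This assumes the usual definitions, which this page does not spell out: `recall@top-10%` is the share of all positives captured in the highest-scoring 10% of rows, and `lift@top-10%` is the positive rate inside that slice divided by the overall base rate. The function name `top_fraction_metrics` is hypothetical.

```python
def top_fraction_metrics(scores: list[float], labels: list[int], fraction: float = 0.10):
    """Return (recall, lift) for the top `fraction` of rows ranked by score."""
    n = len(scores)
    k = max(1, int(n * fraction))  # size of the top slice
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:k])
    total_pos = sum(labels)
    recall = hits / total_pos if total_pos else 0.0
    base_rate = total_pos / n
    lift = (hits / k) / base_rate if base_rate else 0.0
    return recall, lift
```

A rung's gate then reduces to comparing the candidate's metric with Alpha's on the same fold, for example `recall_model > recall_alpha`.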

4.4 · Calibration — rank, then forecast on the true base rate

The boosted model is first a ranker (who is most likely to transact); calibration turns that rank into a trustworthy forecast (a 0.8 score means roughly 80% of such properties actually transact). Calibration is a post-processing step fit against the true base rate — undoing the case-control downsampling — so the predicted probabilities are neither inflated nor deflated.
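
Undoing the downsampling has a simple closed form when all positives are kept and negatives are kept with a known fraction: the odds learned on the sample are the true odds divided by that fraction. The sketch below applies this standard odds rescaling. It is an assumption that the pipeline uses this exact form, and it covers only the prior shift, not whichever calibrator (isotonic, Platt, or other) is fit afterwards. The function name `correct_to_true_prior` is hypothetical.

```python
def correct_to_true_prior(p_sampled: float, keep_fraction: float) -> float:
    """Map a probability learned on downsampled negatives back to the full population.

    keep_fraction is the share of negatives retained by case-control sampling.
    true_odds = sampled_odds * keep_fraction, rewritten to avoid dividing by zero.
    """
    numerator = keep_fraction * p_sampled
    return numerator / (numerator + (1.0 - p_sampled))
```

For example, a score of 0.5 from a model trained with one in ten negatives kept corresponds to odds of 1:10 in the full population, or roughly 0.09. The mapping is monotone, so it changes the forecast but not the ranking.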

5 · The win condition (target, not a result)

Each model fixes its bar in advance and is judged against it. The bar is model-specific, but the discipline — set it before you look, judge on held-out data — is shared.

Apollo (REI): beat Alpha + Camilo on the locked March-2025 test

recall@top-10%: must exceed both the incumbent heuristic (Alpha) and Camilo's model on the locked test cohort
±15%: 30 / 60 / 90-day calibration must hold within ±15% across deciles
2025-03: single locked test T0; train ends 2025-02, separated by the full horizon
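
One way to read the ±15% calibration criterion is as a per-decile relative error between the mean predicted probability and the observed event rate. That reading is an assumption, as is the function name `decile_calibration`. The sketch is meant for walk-forward validation folds, not the locked test.

```python
def decile_calibration(probs: list[float], labels: list[int], tolerance: float = 0.15):
    """Return (all_within_tolerance, rows), one row per score decile.

    Each row is (decile, mean_predicted, observed_rate, relative_error).
    """
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0])
    n = len(pairs)
    rows = []
    for d in range(10):
        chunk = pairs[d * n // 10:(d + 1) * n // 10]
        if not chunk:
            continue  # fewer than ten rows: some deciles are empty
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        if observed > 0:
            rel_error = abs(mean_pred - observed) / observed
        else:
            rel_error = 0.0 if mean_pred == 0 else float("inf")
        rows.append((d + 1, mean_pred, observed, rel_error))
    return all(row[3] <= tolerance for row in rows), rows
```

A relative tolerance is strict in the lowest deciles, where observed rates are small. An absolute tolerance would be the alternative reading if that proves too noisy.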

The locked test is untouched. The March-2025 evaluation cohort is frozen and not run early — no feature engineering, no hyperparameter tuning, no inspection — until Eduardo and Camilo co-sign in writing. Everything reported on the model pages is walk-forward validation ending earlier; the locked test is the final gate, not a result already achieved. Two of the three criteria met is a draw; three of three is a win.

All models draw on the same feature library and the data layer. Where each model stands — status, live validation numbers, changelogs — lives on the Models page and each model's hub; the running research log is in the Log.

Rendered from notes/MASTER_PLAN.md + notes/METHODOLOGY.md.