Methodology
The rules — how every 8020IQ model is built, validated, and judged. For the step-by-step map of the pipeline, see the Framework.
1 · The prediction frame (shared)
Every model answers one question, framed identically across surfaces: given what we know about a property at a fixed point in time, will the target event happen in the next six months? The frame is shared; only the label changes per model.
The model stands at T0 and looks forward. Features may only look left; the label only looks right.
| Element | Definition (from notes/MASTER_PLAN.md §3 + CLAUDE.md) |
|---|---|
| T0 — the snapshot | A month-end boundary, stored as a YYYY-MM string. Every feature is computed as-of T0 month-end. The model only ever sees the world as it stood on that date. |
| Horizon | Six months. The window is T0+1 .. T0+6. |
| The label (y) | y = 1 if the model's target event is recorded anywhere in T0+1 .. T0+6, else 0. Formally, y(T0) = 1 iff event ∈ (T0, T0 + 6 months]. The target event differs by model (see Labeling). |
| Never train on future data | Features must be computable from the T0 snapshot alone. Training stops at T0 = T_today − 6 months, because a label only becomes observable once its six-month horizon has elapsed; any later T0 would still have an incomplete label window. Walk-forward folds enforce this via their T0 boundaries. |
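A minimal sketch of the frame in Python, assuming periods are handled as the YYYY-MM strings described in the table above. The helper names (`add_months`, `label_for`, `last_trainable_t0`) are hypothetical, not the pipeline's own. The label window is open at T0, so an event recorded in the T0 month itself is not counted as a positive.

```python
from __future__ import annotations

HORIZON_MONTHS = 6  # the window is T0+1 .. T0+6


def add_months(period: str, n: int) -> str:
    """Shift a 'YYYY-MM' period by n months (n may be negative)."""
    year, month = (int(part) for part in period.split("-"))
    index = year * 12 + (month - 1) + n
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def label_for(t0: str, event_periods: list[str]) -> int:
    """y(T0) = 1 if any target event falls in (T0, T0 + 6 months], else 0."""
    window_end = add_months(t0, HORIZON_MONTHS)
    # Zero-padded 'YYYY-MM' strings sort chronologically, so string comparison works.
    return int(any(t0 < p <= window_end for p in event_periods))


def last_trainable_t0(today: str) -> str:
    """Latest T0 whose whole label window has elapsed: T_today - 6 months."""
    return add_months(today, -HORIZON_MONTHS)
```

For example, `label_for("2024-01", ["2024-07"])` is 1 (the event lands in the last horizon month), while an event in "2024-08" falls outside the window; `last_trainable_t0("2025-03")` returns `"2024-09"`.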
2 · Labeling by label family (shared method · per-model label)
The frame is shared; the label is where models differ. Two families: a permit event actually happened, or a transaction happened. The methodology for defining a leakage-safe, business-valid label is shared; the specific positive is the model's own.
2.1 · Permit-event labels — Roofing · Garage · Windows
A positive is a real permit of the right kind in T0+1 .. T0+6. The work is (a) classifying which permits count (type × action), and (b) bounding where a 0 is trustworthy: a "no permit" only means "no event" where the vendor actually covers that jurisdiction (coverage).
- Roofing (Hestia): a qualifying roof-replacement permit on a single-family property, subject to the owner-occupied-at-permit rule (a permit only counts as a positive if the owner occupied the property at permit time).
- Garage / Windows: the same permit-event frame on a different permit type; reuse the roofing classification + coverage template.
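The coverage rule is what separates a trustworthy 0 from a missing one. The sketch below assumes a simplified permit record and a hypothetical `QUALIFYING` set of (type, action) pairs standing in for the real classification table: rows outside vendor coverage return `None` (dropped) rather than 0, and the owner-occupied-at-permit and single-family rules apply as described for Roofing.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical stand-in for the real type x action classification table.
QUALIFYING = {("roof", "replacement")}


@dataclass
class Permit:
    period: str           # 'YYYY-MM' in which the permit was recorded
    permit_type: str      # e.g. "roof"
    action: str           # e.g. "replacement", "repair"
    owner_occupied: bool  # owner occupied the property at permit time


def permit_label(
    t0: str,
    window_end: str,
    permits: list[Permit],
    single_family: bool,
    covered: bool,
) -> int | None:
    """Return 1 or 0 when the label is trustworthy, None when the row must be dropped."""
    if not covered:
        # Vendor does not cover this jurisdiction: "no permit" is not "no event".
        return None
    if not single_family:
        # Outside the Roofing population.
        return None
    hit = any(
        t0 < p.period <= window_end
        and (p.permit_type, p.action) in QUALIFYING
        and p.owner_occupied
        for p in permits
    )
    return int(hit)
```

Returning `None` instead of 0 keeps uncovered jurisdictions out of the negative class entirely, which is the point of bounding where a 0 is trustworthy. Garage and Windows would swap in their own qualifying pairs.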
2.2 · Transaction labels — Apollo · Olivia
Here the positive is a deed / sale event, and the two models cut it differently:
- Apollo (REI): any property sale in the horizon (y_sold). The arms-length filter is intentionally OFF in this phase: every sale (including foreclosure, quit-claim, probate, and divorce transfers) counts as a positive, by design. Apollo is the generic "will it transact" signal.
- Olivia: a dealable transaction for the client (wholesale / fix-flip), narrower than any sale. Built by funnel decomposition: P(dealable) = P(transacts) × P(dealable | transacts). The client's closed deals are the known positives; every other transaction is unlabeled (a mix of dealable and not dealable), so this is a positive-unlabeled problem, corrected with the true prior. Apollo's signal is Olivia's Stage A (the P(transacts) term).
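The funnel and the positive-unlabeled step can be sketched as follows. The PU correction shown is the Elkan-Noto label-frequency correction, swapped in here as one standard way to use a known prior; it is an assumption, not necessarily how Olivia's pipeline applies its true prior. It also assumes the labeled-vs-unlabeled classifier was fit without downsampling the unlabeled rows (or that the case-control sampling was undone first).

```python
def funnel_score(p_transacts: float, p_dealable_given_transacts: float) -> float:
    """P(dealable) = P(transacts) * P(dealable | transacts); Stage A is Apollo's score."""
    return p_transacts * p_dealable_given_transacts


def pu_correct(g: float, labeled_fraction: float, true_prior: float) -> float:
    """Turn P(known client deal | x) into P(dealable | x) among transactions.

    Assumes known positives are a random sample of all dealable transactions
    ("selected completely at random"). Then g = c * P(dealable | x), where the
    label frequency c = P(labeled) / P(dealable) = labeled_fraction / true_prior.
    """
    c = labeled_fraction / true_prior
    return min(1.0, g / c)  # cap: a noisy g can exceed c
```

For instance, if 1% of transactions are known client deals and the estimated true prior is 5%, then c = 0.2 and a classifier output of 0.02 maps to P(dealable | transacts) = 0.1.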
| Model | Family | y = 1 (positive) | Key rule |
|---|---|---|---|
| Roofing · Hestia | Permit | qualifying roof-replacement permit in T0+1..T0+6 | owner-occupied-at-permit · single-family · coverage-bounded |
| Garage | Permit | garage-addition permit in horizon | permit classification · coverage-bounded |
| Windows | Permit | window-replacement permit in horizon | permit classification · coverage-bounded |
| Apollo · REI | Transaction | any property sale (y_sold) in horizon | arms-length filter OFF (every sale counts), by design |
| Olivia | Transaction · dealable | client closes a dealable deal (wholesale / fix-flip) | funnel decomposition · positive-unlabeled + true-prior |
3 · Feature library: tiers A through F (shared)
The feature library is organized into six tiers and shared across models — each model picks its own subset and driver clusters. Tiers A–E are the core (sufficient to beat the incumbent heuristic); Tier F (local market context) is the geographic-correction layer. Each tier carries a null-policy and a provenance tag. See the full feature taxonomy.
| Tier | What | Examples |
|---|---|---|
| A | Property physical | parcels, lot size, use type, year built, building / living area (YearBuilt, LotSizeSqFt, UseType, property_age_years) |
| B | Owner + distress | 23 distress trajectories (active flag, months-active, months-since-resolved, was-ever-active), absentee level, leverage. The behavioral spine of the model. |
| C | Valuation + activity | AVM, assessed value, valuation_gap, equity / leverage ratio, appreciation, days-ownership (current_avm_value, leverage_ratio, price_appreciation) |
| D | Date-diffs (under leakage audit) | mortgage_age_months, listing_duration_months, months_since_prev_sale. Three features under active leakage audit; not yet cited as validated until the ablation runner completes. |
| E | National macro | FRED 30-yr mortgage rate, Fed Funds, Case-Shiller HPI (CSUSHPINSA), national unemployment. Same value for every property within a T0 month; joined on Period. |
| F | Local market context | county unemployment (BLS LAUS), ACS income, FHFA state HPI, foreclosure-law speed class. The geographic-correction layer; publication-lag-shifted to avoid future leakage. |
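Tier B's four trajectory features can be sketched for a single distress signal. This assumes the signal's monthly active flags are available as a list ending at the T0 month (oldest first); the exact definitions below, such as counting months-since-resolved from the first inactive month after the last active run, are illustrative assumptions rather than the library's specification.

```python
from __future__ import annotations


def distress_trajectory(history: list[bool]) -> dict[str, object]:
    """Trajectory features for one distress signal.

    history: monthly active flags, oldest first, ending at the T0 month.
    Nothing after T0 is passed in, so the features only look left.
    """
    active = bool(history) and history[-1]

    # Length of the current active run ending at T0 (0 if inactive now).
    months_active = 0
    for flag in reversed(history):
        if not flag:
            break
        months_active += 1

    # Months since the first inactive month after the last active run
    # (None if the signal is active now or was never active).
    months_since_resolved = None
    if not active:
        for offset, flag in enumerate(reversed(history)):
            if flag:
                months_since_resolved = offset - 1
                break

    return {
        "active": active,
        "months_active": months_active,
        "months_since_resolved": months_since_resolved,
        "was_ever_active": any(history),
    }
```

A signal active in October and November and resolved in December yields, at a January T0, `active=False`, `months_since_resolved=1`, `was_ever_active=True`.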
Tier D caveat (active audit). Three Tier-D features (listing_duration_months, months_since_prev_sale, mortgage_age_months) are under leakage audit per notes/findings/09_leakage_audit.md. In the REI model, listing_duration_months alone has contributed roughly half of one county's ranking signal in some windows, so single-feature dependency is tracked explicitly. Their contribution is not cited as validated until the ablation completes.
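Tier F's publication-lag shift can be sketched as an as-of lookup: a reference month is only usable at T0 if that month plus its publication lag falls on or before T0. The `as_of_value` helper and the lag values are hypothetical; Tier E, joined directly on Period, corresponds to a lag of 0 in this sketch.

```python
from __future__ import annotations


def add_months(period: str, n: int) -> str:
    """Shift a 'YYYY-MM' period by n months (n may be negative)."""
    year, month = (int(part) for part in period.split("-"))
    index = year * 12 + (month - 1) + n
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def as_of_value(series: dict[str, float], t0: str, lag_months: int) -> float | None:
    """Latest value of a monthly series that was already published at T0.

    series maps reference month 'YYYY-MM' -> value; a month m is usable at T0
    only if m + lag_months <= T0.
    """
    cutoff = add_months(t0, -lag_months)
    usable = [m for m in series if m <= cutoff]
    return series[max(usable)] if usable else None
```

With a hypothetical two-month lag, a T0 of 2024-04 sees the February value of a county series, not the March one, even though March's reference month is already past.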
4 · Modeling: folding logic and calibration (per-model spine)
The modeling spine is the same for every model: walk-forward folds with an embargo, case-control sampling, a modeling ladder, and calibration to the true base rate.
4.1 · Walk-forward folds + embargo
Validation is walk-forward, never random K-fold. Train on everything available up to a fold, evaluate on the next fold, advance six months, repeat — mimicking production, where the model retrains every six months. Random K-fold on time-series data leaks future information into training and produces AUC figures that evaporate in production.
An embargo equal to the horizon separates train and eval: each dev fold's first eval T0 equals the last train T0 plus the horizon (Fold 1: 2021-09 + 6 months = 2022-03). The last training label window (2021-10 .. 2022-03) therefore closes the month before the first eval label window (2022-04 .. 2022-09) opens, so there is zero overlap. (The earlier layout had up to a 5-month label-window overlap; the embargo default eliminates it.)
| Fold | Training T0 range | Evaluation T0 | Label observable by |
|---|---|---|---|
| 1 | 2021-01 .. 2021-09 | 2022-03 | 2022-09 |
| 2 | 2021-01 .. 2022-03 | 2022-09 | 2023-03 |
| 3 | 2021-01 .. 2022-09 | 2023-03 | 2023-09 |
| 4 | 2021-01 .. 2023-03 | 2023-09 | 2024-03 |
| 5 | 2021-01 .. 2023-09 | 2024-03 | 2024-09 |
| 6 | 2021-01 .. 2024-03 | 2024-09 | 2025-03 |
| Test (locked) | 2021-01 .. 2024-09 | 2025-03 | 2025-09 |
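The fold table can be generated rather than hand-maintained. This sketch reproduces it, reading the embargo as "first eval T0 = last train T0 + horizon", which is what the table's dates show; the function and parameter names are hypothetical.

```python
from __future__ import annotations


def add_months(period: str, n: int) -> str:
    """Shift a 'YYYY-MM' period by n months (n may be negative)."""
    year, month = (int(part) for part in period.split("-"))
    index = year * 12 + (month - 1) + n
    return f"{index // 12:04d}-{index % 12 + 1:02d}"


def walk_forward_folds(
    train_start: str,
    first_last_train: str,
    n_dev_folds: int,
    horizon: int = 6,
    step: int = 6,
) -> list[tuple[str, str, str, str, str]]:
    """(fold, train from, train to, eval T0, label observable by) per fold.

    The final fold is the locked test; it is never used for tuning.
    """
    folds = []
    last_train = first_last_train
    for i in range(n_dev_folds + 1):
        eval_t0 = add_months(last_train, horizon)    # embargo equal to the horizon
        observable = add_months(eval_t0, horizon)    # eval labels complete here
        name = str(i + 1) if i < n_dev_folds else "test (locked)"
        folds.append((name, train_start, last_train, eval_t0, observable))
        last_train = add_months(last_train, step)    # retrain every six months
    return folds
```

`walk_forward_folds("2021-01", "2021-09", n_dev_folds=6)` yields the six dev folds and the locked test row above, from ("1", "2021-01", "2021-09", "2022-03", "2022-09") to ("test (locked)", "2021-01", "2024-09", "2025-03", "2025-09").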
4.2 · Case-control negative sampling + true-prior correction
Sale and roof-permit events are rare (single-digit-percent base rates), so the negative class is downsampled per cohort: keep all positives plus a fixed ratio of negatives (NEG_TO_POS_RATIO × positives). This unblocks the largest counties (Maricopa, 04013, ~66M rows; Harris, 48201, ~59M rows) on a single 32 GB machine. Downsampling inflates the output probabilities (the training base rate sits far above the true one), so the model's scores are corrected back to the true base rate at calibration time. For Olivia the same machinery doubles as the positive-unlabeled correction: known positives (client deals) against sampled unlabeled rows, rescaled to the estimated true prior.
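A sketch of both halves, assuming rows are dicts with a 0/1 `y` key and an illustrative `NEG_TO_POS_RATIO` of 10 (the real value is a pipeline setting not stated here). The correction shown is the standard prior correction for case-control sampling, which may differ in detail from the pipeline's: it multiplies the odds by the ratio of true to sample prior odds, so a score equal to the sample base rate maps back to the true base rate.

```python
from __future__ import annotations

import random

NEG_TO_POS_RATIO = 10  # illustrative only; the real value is a pipeline setting


def case_control_sample(rows: list[dict], ratio: int = NEG_TO_POS_RATIO, seed: int = 0) -> list[dict]:
    """Keep every positive plus ratio x (#positives) randomly chosen negatives."""
    positives = [r for r in rows if r["y"] == 1]
    negatives = [r for r in rows if r["y"] == 0]
    k = min(len(negatives), ratio * len(positives))
    return positives + random.Random(seed).sample(negatives, k)


def prior_correct(p_sample: float, sample_rate: float, true_rate: float) -> float:
    """Map a score fit on downsampled data (0 < p_sample < 1) back to the true base rate.

    Multiplies the odds by (true prior odds) / (sample prior odds), i.e. shifts
    the log-odds by the difference between the two prior log-odds.
    """
    odds = p_sample / (1.0 - p_sample)
    shift = (true_rate / (1.0 - true_rate)) / (sample_rate / (1.0 - sample_rate))
    corrected = odds * shift
    return corrected / (1.0 + corrected)
```

When the cohort has enough negatives, the sample's positive share is 1 / (1 + NEG_TO_POS_RATIO), which is the `sample_rate` to pass in alongside that same cohort's true rate.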
4.3 · Modeling ladder
Three rungs, in order. No rung is built until the previous rung has beaten the incumbent heuristic on the current fold. Each rung has a go/no-go gate.
| Rung | Model | Role | Gate to proceed |
|---|---|---|---|
| 1 | Logistic Regression (Tier A + active distress) | Interpretable sanity-check floor — falsifies "the signal is non-linear" | lift@top-10% > Alpha |
| 2 | Gradient-boosted trees, calibrated (Tier A+B+C+D+E) | Production candidate | recall@top-10% > Alpha |
| 3 | State-stratified / hierarchical boosting (+ Tier F) | Answers "does per-state training help?" | Decided on Fold 6 only — never on the locked test |
Rung 3 is a tie-breaker question, not a default. If per-state training is a dead heat against one national model, the simpler national model ships. Simplicity wins ties.
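The ladder's gates compare a rung against the incumbent on the same fold. A sketch of the two gate metrics, assuming the incumbent (Alpha) is available as a list of scores for the same evaluation rows, which is an assumption about how the comparison is run; the helper names are hypothetical.

```python
from __future__ import annotations


def _top_rows(scores: list[float], frac: float) -> list[int]:
    """Indices of the top `frac` of rows by score (ties keep row order)."""
    n = max(1, round(len(scores) * frac))
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


def recall_at_top(scores: list[float], labels: list[int], frac: float = 0.10) -> float:
    """Share of all positives that land in the top `frac` of scores."""
    positives = sum(labels)
    hits = sum(labels[i] for i in _top_rows(scores, frac))
    return hits / positives if positives else 0.0


def lift_at_top(scores: list[float], labels: list[int], frac: float = 0.10) -> float:
    """Precision in the top `frac` divided by the overall base rate."""
    top = _top_rows(scores, frac)
    base_rate = sum(labels) / len(labels)
    precision = sum(labels[i] for i in top) / len(top)
    return precision / base_rate if base_rate else 0.0


def passes_gate(metric, model_scores, baseline_scores, labels, frac=0.10) -> bool:
    """Go/no-go: the rung must beat the incumbent's scores on the same fold."""
    return metric(model_scores, labels, frac) > metric(baseline_scores, labels, frac)
```

On one eval fold with a fixed cutoff, lift@top-10% equals recall@top-10% divided by the top share (0.10 when the top set is exactly a tenth of the rows), so the two gates order models the same way and differ only in the scale reported.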
4.4 · Calibration — rank, then forecast on the true base rate
The boosted model is first a ranker (who is most likely to transact); calibration turns that rank into a trustworthy forecast (a 0.8 score means roughly 80% of such properties actually transact). Calibration is a post-processing step fit against the true base rate — undoing the case-control downsampling — so the predicted probabilities are neither inflated nor deflated.
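One way to check the property calibration is meant to deliver (a 0.8 score means roughly 80% of such properties transact) is a binned reliability table with an expected-calibration-error summary. This is a diagnostic sketch, not the calibration step itself; the ten-bin default is an arbitrary choice.

```python
from __future__ import annotations


def reliability_table(probs: list[float], outcomes: list[int], n_bins: int = 10):
    """Per score bin: (bin index, count, mean predicted probability, observed rate)."""
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 lands in the top bin
        bins[idx].append((p, y))
    table = []
    for idx, rows in enumerate(bins):
        if rows:
            mean_pred = sum(p for p, _ in rows) / len(rows)
            observed = sum(y for _, y in rows) / len(rows)
            table.append((idx, len(rows), mean_pred, observed))
    return table


def expected_calibration_error(probs: list[float], outcomes: list[int], n_bins: int = 10) -> float:
    """Count-weighted mean gap between predicted and observed rates across bins."""
    n = len(probs)
    return sum(
        count / n * abs(mean_pred - observed)
        for _, count, mean_pred, observed in reliability_table(probs, outcomes, n_bins)
    )
```

Ten properties scored 0.8 of which eight transact give an error near zero; ten scored 0.9 of which only five transact give an error of about 0.4, the signature of inflated scores that the base-rate correction is meant to remove.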
5 · The win condition (target, not a result)
Each model fixes its bar in advance and is judged against it. The bar is model-specific, but the discipline — set it before you look, judge on held-out data — is shared.
Apollo (REI) — beat Alpha + Camilo on the locked March-2025 test
The locked test is untouched. The March-2025 evaluation cohort is frozen and not run early — no feature engineering, no hyperparameter tuning, no inspection — until Eduardo and Camilo co-sign in writing. Everything reported on the model pages is walk-forward validation ending earlier; the locked test is the final gate, not a result already achieved. Two of the three criteria met is a draw; three of three is a win.
All models draw on the same feature library and the data layer. Where each model stands — status, live validation numbers, changelogs — lives on the Models page and each model's hub; the running research log is in the Log.
Rendered from notes/MASTER_PLAN.md + notes/METHODOLOGY.md.