REI rules
src line on each block.
Decision & policy rules
y_sold = 1 iff any sale is recorded in T0+1 .. T0+6
(6-month horizon). During this phase every sale event counts as a positive —
foreclosures, quit-claims, probate, and divorce sales are all included. The
arm's-length filter is intentionally OFF. This is scope, not an oversight:
re-running with the filter is a planned second pass once Eduardo signs off on the
policy. Do not treat the missing filter as a bug or a blocker.
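A minimal sketch of the label rule above, assuming sale events have already been reduced to YYYY-MM strings. The helper names `add_months` and `y_sold` are illustrative and are not taken from the repository. Every sale type counts, so no arm's-length filtering appears here.

```python
def add_months(ym: str, k: int) -> str:
    """Shift a zero-padded YYYY-MM string by k months (k may be negative)."""
    year, month = map(int, ym.split("-"))
    idx = year * 12 + (month - 1) + k
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def y_sold(t0: str, sale_months: list[str], horizon: int = 6) -> int:
    """Return 1 if any sale falls in T0+1 .. T0+horizon (inclusive), else 0.

    Zero-padded YYYY-MM strings sort lexicographically in date order,
    so plain string comparison is enough here.
    """
    first, last = add_months(t0, 1), add_months(t0, horizon)
    return int(any(first <= m <= last for m in sale_months))


# A sale in the T0 month itself is outside the window; T0+6 is inside it.
print(y_sold("2025-03", ["2025-03"]))  # 0
print(y_sold("2025-03", ["2025-09"]))  # 1
```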
Never train on future data. Features must be computable from the T0
month-end snapshot alone. Walk-forward folds enforce this via t0
boundaries; the feature builder (src/new_model/features.py) reads only
the as-of-T0 columns (base_globs = _globs([t0], fips)). T0 is a
month-end boundary stored as a YYYY-MM string.
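One way to probe the as-of-T0 rule is to flag any source row dated after the T0 month. The sketch below assumes rows are dicts with an ISO `record_date` field; that field name and the function `post_t0_rows` are hypothetical, and this is not a description of what src/new_model/features.py does internally.

```python
def post_t0_rows(rows: list[dict], t0: str, date_key: str = "record_date") -> list[dict]:
    """Return rows whose month (first 7 chars of an ISO date) is later than t0.

    t0 is a YYYY-MM string; a non-empty result means the snapshot
    contains information from after the T0 month-end boundary.
    """
    return [r for r in rows if r[date_key][:7] > t0]


rows = [
    {"parcel": "p1", "record_date": "2025-03-31"},  # on the boundary: allowed
    {"parcel": "p2", "record_date": "2025-04-01"},  # after T0: leakage
]
leaks = post_t0_rows(rows, "2025-03")
print([r["parcel"] for r in leaks])  # ['p2']
```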
Three features are under active leakage audit: listing_duration_months,
months_since_prev_sale, and mortgage_age_months. Do not
cite their AUC-PR contribution as validated until the ablation runner completes.
(Finding 9 records the as-of-T0 date probe — no post-T0 records found — but the
cross-county ablation is still in progress; treat the dependence figures there as
preliminary.)
| Tier | Family | Note |
|---|---|---|
| A | Property physical — parcels, size, use, year built | |
| B | Owner + distress — 23 distress trajectories, absentee, leverage | |
| C | Valuation + activity — AVM, appreciation, days_ownership | |
| D | Date-diffs — mortgage age, listing duration, prev-sale recency | under leakage audit (see Rule 3) |
| E | National macro — FRED mortgage rate, Fed funds, HPI, CPI, unemployment (same value per T0 month) | |
| F | Local market context — BLS county unemployment, ACS income, FHFA state HPI | currently being wired in |
MASTER_PLAN §4 builds these in tiers (A–E for the MVP, F deferred). Feature counts vary across findings — this register links the findings rather than asserting one number.
Evaluation is walk-forward: train on everything up to a fold, evaluate on the next fold, advance six months, and repeat (MASTER_PLAN §3). The embargo between the last train T0 and the first eval T0 must equal the horizon: 6 months. Each T0's label window runs through T0+6, so with a shorter gap the last train T0s carry label windows that extend into the eval window, contaminating the outcome (finding 32 quantified a 5-month overlap in the pre-fix folds). The locked test fold is structurally clean (0 overlap T0s).
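The embargo arithmetic can be sketched as follows. This assumes each eval fold covers six consecutive T0 months and that overlap is counted as shared label-window months between a train T0 and an eval T0; both are assumptions for illustration, and the function names are hypothetical rather than the project's fold builder.

```python
def to_index(ym: str) -> int:
    """Map a YYYY-MM string to a running month index."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def from_index(idx: int) -> str:
    """Inverse of to_index."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def label_overlap_months(last_train_t0: str, first_eval_t0: str, horizon: int = 6) -> int:
    """Months shared by the two label windows (T0+1 .. T0+horizon each)."""
    gap = to_index(first_eval_t0) - to_index(last_train_t0)
    return max(0, horizon - gap)


def walk_forward_folds(t0s, first_eval_t0, n_folds, horizon=6, step=6):
    """Build folds whose train T0s end `horizon` months before the eval start."""
    folds = []
    for i in range(n_folds):
        start = to_index(first_eval_t0) + i * step
        eval_t0s = [t for t in t0s if start <= to_index(t) < start + step]
        train_t0s = [t for t in t0s if to_index(t) <= start - horizon]
        folds.append({"train": train_t0s, "eval": eval_t0s})
    return folds


t0s = [from_index(to_index("2023-01") + k) for k in range(24)]
folds = walk_forward_folds(t0s, "2024-01", n_folds=2)
print(folds[0]["train"][-1], folds[0]["eval"][0])  # 2023-07 2024-01
print(label_overlap_months("2023-07", "2024-01"))  # 0 with a 6-month embargo
print(label_overlap_months("2023-12", "2024-01"))  # 5 with only a 1-month gap
```

Under these assumptions a 6-month gap gives zero shared label months, while any shorter gap leaves `horizon - gap` months of overlap.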
Training uses a 10:1 negative-to-positive downsample to fit in memory, which inflates the training positive rate (1 in 11, about 9.1%) above the true eval rate (2–4%). Raw gradient-boosted probabilities are therefore globally too high on eval. Calibration must be performed at the true population base rate on a held-out, non-downsampled slice, not on the downsampled training pool, which inherits the wrong prior and cannot correct the shift. Isotonic regression is the non-negotiable post-processing step (MASTER_PLAN §6); the prior-ratio rescaling in finding 10 moved 4 of 5 counties inside the Brier target without changing AUC-PR.
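For intuition, a standard prior-ratio (odds) rescaling can be sketched as below. It is offered as an illustration of the general technique, not as the exact procedure in finding 10; the function name `prior_shift` and the 3% base rate are assumptions (3% is simply a value inside the 2–4% range quoted above). The isotonic fit itself would still be done separately on the held-out, non-downsampled slice.

```python
def prior_shift(p: float, train_rate: float, true_rate: float) -> float:
    """Rescale a probability from the training prior to the true prior.

    Multiplies the odds of p by the ratio of true-prior odds to
    training-prior odds. Written in a form that is safe at p = 0 and p = 1.
    """
    ratio = (true_rate / (1.0 - true_rate)) / (train_rate / (1.0 - train_rate))
    return (p * ratio) / (p * ratio + (1.0 - p))


train_rate = 1 / 11   # 10:1 negative-to-positive downsample
true_rate = 0.03      # assumed illustrative population base rate

# A score equal to the training base rate maps to the true base rate.
print(round(prior_shift(train_rate, train_rate, true_rate), 4))  # 0.03

# The mapping is strictly increasing, so the ranking of scores is unchanged.
scores = [0.05, 0.2, 0.5, 0.9]
rescaled = [prior_shift(s, train_rate, true_rate) for s in scores]
print(rescaled == sorted(rescaled))  # True
```

Because the transform is monotone, it leaves rank-based metrics such as AUC-PR untouched and only moves the probability scale, which is consistent with the behavior reported for finding 10.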
Model scores rank properties within a county. A top score in one county does not imply the same deal rate as a top score in another — base rates and Alpha's separation differ markedly across markets (finding 41: Alpha AUC 0.53–0.59 outside Miami; eval positive prevalence varies county to county). Do not compare a raw score in Jackson to a raw score in Miami as if the scale were shared.
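A minimal sketch of within-county ranking, assuming rows arrive as `(county, property_id, score)` tuples; the function name and the tuple layout are hypothetical. The percentile it returns only orders properties inside one county. It does not make deal rates comparable across counties.

```python
from collections import defaultdict


def within_county_percentile(rows):
    """Map property_id -> rank percentile in (0, 1] within its own county.

    Ties are broken by property_id order, which is arbitrary but deterministic.
    """
    by_county = defaultdict(list)
    for county, pid, score in rows:
        by_county[county].append((score, pid))

    out = {}
    for items in by_county.values():
        items.sort()
        n = len(items)
        for rank, (_, pid) in enumerate(items, start=1):
            out[pid] = rank / n
    return out


rows = [
    ("Miami", "a", 0.90),
    ("Miami", "b", 0.50),
    ("Jackson", "c", 0.20),
    ("Jackson", "d", 0.10),
]
pct = within_county_percentile(rows)
# 'c' tops Jackson with a raw score of 0.20, far below Miami's 'b' at 0.50.
print(pct["a"], pct["b"], pct["c"], pct["d"])  # 1.0 0.5 1.0 0.5
```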
The locked March-2025 test cohort (T0 = 2025-03, label window through 2025-09) is
untouchable until Eduardo and Camilo both sign off in writing (MASTER_PLAN §3).
No feature engineering, no hyperparameter tuning, no inspection, not even a
sanity check, touches that fold before the gate. As of this page the test fold
is untouched.
Win condition: top-decile recall ≥ Alpha AND ≥ Camilo, with 30/60/90-day calibration within ±15%, on the locked March-2025 head-to-head evaluation. This is the target, NOT a current result — it can only be adjudicated once the locked test in Rule 8 is opened. Finding 41 is the honest Alpha-vs-model comparison on the embargoed dev fold (Fold 5), not the locked gate; do not read it as the win condition being met.
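The recall half of the win condition can be sketched as below. This assumes top-decile recall means the share of all positives captured in the top 10% of a cohort ranked by score, computed on a single county's cohort; the source does not spell out the exact definition, and the function names are hypothetical. The 30/60/90-day calibration check is not covered here.

```python
def top_decile_recall(scores, labels):
    """Share of all positives that fall in the top 10% of rows by score."""
    n = len(scores)
    k = max(1, n // 10)  # size of the top decile, at least one row
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:k])
    total = sum(labels)
    return hits / total if total else float("nan")


def meets_recall_gate(model: float, alpha: float, camilo: float) -> bool:
    """Recall part of the gate only: model must match or beat both baselines."""
    return model >= alpha and model >= camilo


scores = [i / 20 for i in range(20)]       # row 19 has the highest score
labels = [0] * 20
for i in (0, 5, 18, 19):                   # four positives, two in the top decile
    labels[i] = 1
print(top_decile_recall(scores, labels))   # 0.5
```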
Related pages
REI overview · REI pipeline · how it works · model card · changelog
Rules from CLAUDE.md hard rules + notes/MASTER_PLAN.md · REI findings · status: building (win condition gated, not yet met).