Roofing pipeline · Progress Notebook cuaderno · single source of truth · 9-step MECE pipeline · 2026-05-20 rev2 (Step 1 restructure + dev-layer)
ADR.
permit_scope label · how FA & BuildZoom municipalities are cleaned into one canonical key · the 4-gate coverage decision tree: all visual, all clear.
a31bdbc, 188 feat post 6 structural fixes + insurance MVP): model-quality lift 6.88× fold-1 / 6.44× cross-fold mean (range 5.88–6.86×); deliverable lift (after 50/25/25 quota) 5.66× / 1,654 caught / 11.03 % precision (174 feat after synth_v4 drop, parsimony win). Status: HOLD per 4-of-4 external AI reviewer consensus (round 4); re-anchor to 2026-05-31 + renegotiate quota required before shipping to CallZeke.
DOSSIER_v5_FULL_for_external_ai.md, 55 KB self-contained):
2 ship-blockers: (1) Stale 2025-05-31 anchor — the 6-month forward window (Jun–Nov 2025) has already transpired; SDRs would call homeowners who may have already reroofed in late 2025; crosses the SB-76 / Helene / Milton regime change. Re-score at 2026-05-31 required (AWS-cost gated, REGLA DE ORO). (2) Quota cannibalization — 50/25/25 county allocation costs -1.31× lift, -373 caught positives. Pinellas is 67 % tile/metal (35–50 yr life) while Pasco is 99 % asphalt (15–25 yr); model correctly ranks Pinellas low but quota forces 1,939 below-natural-threshold rows in. Renegotiate with CallZeke before shipping next deliverable.
New critical/major findings to act on (quick wins, ≤1 day each):
- Action-zone calibration (top-1k/5k/15k + per-county + per-material) — global ECE 0.0003 hides top-decile -6.7 % relative under-call
- Triple-baseline lift report (full-pop / gated-pool / age-cohort-matched) — gate alone delivers ~1.75× of headline 7.0×, model-on-top is ~4.0×
- 10-seed insurance-wall validation — single-seed +0.41× may be Milton-driven, cross-fold +0.04× still within σ
- Block bootstrap CI on +0.50× cumulative gain — folds correlated, effective n < 6
- Jurisdiction match-rate as explicit feature — closes gate-fallback circularity (reviewer A3)
- Synth_v4 ablation given v5 material-window — test for age-cohort leakage
- HOA fee residualization on AVM + zip-income — disentangle wealth proxy vs roof signal
- Soft-label NA at weight=0.5 — definitive close of D21 / Q20
- Material-aware quota simulation — recovers est. +0.3-0.5× of the 1.31× quota cost
What reviewers said NOT to do: defer full Citizens depopulation external data until 10-seed validates insurance-wall MVP; kill Q19 (FEMA reclass / 4-point inspection chasing); kill per-storm re-scoring; do not drop fold-6 from cross-fold mean even though it shows brittleness (it's the honest stress test for distribution shift, 4 of 4 reviewers agree).
Triple agents synthesised: (1) DS expert found a synth_v4 subject-self-contamination bug + collinearity in roof-age cluster; (2) FL roofing domain expert called out insurance-renewal signals + sign-flipped distress composites; (3) Apollo cross-pollination identified Tier F micro (BLS+ACS+FHFA) and V2.1 regime interactions as free wins. Acted on: synth_v4 fix + synth_v5 layer (sign-flipped distress composites, material-adjusted replacement window 15-25 / 35-50 / 40-60 / 12-20 yr by cover code, Apollo-port equity/absentee features) shipped. Tier F micro built + tested + DROPPED (0 features in top-40; Apollo a-priori warning confirmed). New top-2 by gain: in_replacement_window_by_material (#1, 13.8 %) and pct_of_useful_life (#2, 8.8 %), a clean substitution of the asphalt-hardcoded version.
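A minimal sketch of how the two material-aware features could be computed. Only the four windows come from this notebook; the asphalt and tile/metal pairings follow the lifespans quoted elsewhere here, while the other two keys (`cover_c`, `cover_d`) are placeholders because the notebook does not name those materials. The definition of `pct_of_useful_life` as age divided by the upper window bound is an assumption, not the shipped formula.

```python
# Replacement windows in years, keyed by a HYPOTHETICAL material label.
# The real pipeline keys on a roof cover code; these names are illustrative.
REPLACEMENT_WINDOW_YEARS = {
    "asphalt": (15, 25),
    "tile_metal": (35, 50),
    "cover_c": (40, 60),  # material not named in the notebook
    "cover_d": (12, 20),  # material not named in the notebook
}


def in_replacement_window_by_material(roof_age_years, cover):
    """True when the roof age falls inside the window for its material."""
    low, high = REPLACEMENT_WINDOW_YEARS[cover]
    return low <= roof_age_years <= high


def pct_of_useful_life(roof_age_years, cover):
    """Age as a fraction of the window's upper bound (assumed definition)."""
    _, high = REPLACEMENT_WINDOW_YEARS[cover]
    return roof_age_years / high
```

Under these assumptions a 20-year-old asphalt roof is inside its window while a 20-year-old tile/metal roof is not, which is the behavior the asphalt-hardcoded flag could not express.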
Master reports:
triple_audit_consolidated.md ·
autopilot_overnight.md ·
master_summary.md ·
v5_champion_6fold.md.
Empirical re-verification. Same 174-feature v3 stack re-evaluated at 3 fully-observed anchors (forward windows fully covered by gold 2026-05-12 vintage):
| T0 anchor | forward window | base rate | lift@15k | recall | prec |
|---|---|---|---|---|---|
| 2025-10-31 (right-censored) | 2025-11..2026-04 | 1.21% | 9.24× | 27.8% | 11.2% |
| 2024-11-30 (fully observed) | 2024-12..2025-05 | 3.27% | 6.53× | 19.7% | 21.4% |
| 2024-05-31 (fully observed) | 2024-06..2024-11 | 2.56% | 5.99× | 18.0% | 15.3% |
| 2023-11-30 (fully observed) | 2023-12..2024-05 | 2.08% | 6.61× | 19.9% | 13.8% |
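The lift column in the table is consistent with precision divided by base rate. A short check, assuming that definition (the function name is illustrative):

```python
def lift(precision, base_rate):
    """Lift of a ranked list: hit rate in the list over the population hit rate."""
    return precision / base_rate


# (precision, base rate, reported lift@15k) per T0 anchor, copied from the table.
ANCHORS = {
    "2025-10-31": (0.112, 0.0121, 9.24),
    "2024-11-30": (0.214, 0.0327, 6.53),
    "2024-05-31": (0.153, 0.0256, 5.99),
    "2023-11-30": (0.138, 0.0208, 6.61),
}

for t0, (precision, base_rate, reported) in ANCHORS.items():
    recomputed = lift(precision, base_rate)
    # Small gaps are expected: the table's inputs are already rounded.
    print(f"{t0}: recomputed {recomputed:.2f}x vs reported {reported:.2f}x")
```

Each recomputed value lands within a few hundredths of the reported lift; the residual comes from rounding in the published precision and base-rate figures.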
Convergent BLOCKERS surfaced (will-fix list):
- B1 · Stale county multipliers (1.127 / 1.212 / 0.729 hardcoded from 2025-05-31 anchor → applied to 2025-10-31 ranking) — flagged by Model QA, Architect, Verifier.
- B5 · Confirmed leak: canceled_roof used LATEST_STATUS string equality (current status, not as-of-T0); flagged by Security. FIXED 2026-05-22 via LATEST_STATUS_DATE ≤ T0 gate in build_synth_v3.py:265-300. Expected lift drop: small (canceled_roof's marginal contribution was modest).
- B6 · is_listed 100% identical across two T0s 6mo apart (497,692 / 497,856); REM single-snapshot leakage suspected. Under investigation.
- Clobber bug: v3_metrics.json + fold1_scored_v3.parquet overwritten by parallel anchor runs. FIXED 2026-05-22: filename now embeds EVAL_T0 when RUN_TAG empty.
- 5 sidecars missing at 2025-10-31: equity_macro, micro_local, synth_coverage, synth_v4, temporal_distress. SKIP_LAYERS default protects training, but explicit rebuild required if those families are ever revived at this anchor.
Master report:
2026-05-22_10agent_audit_synthesis.md ·
data: data/sandbox/model/cross_anchor_audit_2026-05-22.json.
Cross-anchor mean lift@15k = 6.66× ± 0.33, 95% CI [6.31, 7.01] (t df=5). Six anchors all post-2022, all fully observed by gold vintage 2026-05-12: [7.08, 6.64, 6.07, 6.74, 6.80, 6.62] at T0 ∈ {2025-05-31, 2024-11-30, 2024-05-31, 2023-11-30, 2023-05-31, 2022-11-30}. Critically the 95% CI does not cross 6.0× — the model is statistically distinguishable from a 6× baseline.
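The interval can be reproduced from the six anchor lifts with a standard one-sample t interval. A sketch using only the standard library; the critical value 2.571 (two-sided 95 %, df=5) is hardcoded rather than looked up:

```python
import math
import statistics

# Lift@15k at the six fully observed anchors listed above.
lifts = [7.08, 6.64, 6.07, 6.74, 6.80, 6.62]

T_CRIT_95_DF5 = 2.571  # two-sided 95% Student-t critical value, 5 degrees of freedom

mean = statistics.mean(lifts)
sd = statistics.stdev(lifts)              # sample SD (n - 1 denominator)
std_err = sd / math.sqrt(len(lifts))
half_width = T_CRIT_95_DF5 * std_err
ci = (mean - half_width, mean + half_width)

print(f"mean={mean:.2f} sd={sd:.2f} ci=[{ci[0]:.2f}, {ci[1]:.2f}]")
```

Note that the "± 0.33" in the headline is the sample SD across anchors, not the standard error; the CI is narrower than mean ± 2 SD because it divides by √6.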
Per-county lifts (raw, no quota): Pasco 6.76× / Hernando 5.42× / Pinellas 4.29×. One-way ANOVA F=14.45, p≈0.005 — the per-county gap is REAL, not noise (driven by roof-material lifespan differences: asphalt 15-25yr in Pasco vs tile/metal 35-50yr in Pinellas).
Production deliverable:
callzeke_ranked_15k_v3.csv scored at
SCORE_DATE=2025-05-31 (most-recent fully-observed anchor; lift 7.08×).
Per-county multipliers loaded from
data/sandbox/calibration/county_multipliers_2024-05-31.json — fit at
strict temporal holdout (forward window observed by Dec 2024, applied 5 months
later) to eliminate the circular-validation leak Security R3 found.
Round-3 + R4 agent verdicts (final):
- Critic R3: SHIP-WITH-RESERVATIONS · Model QA R3: SHIP (conditional) · Reality Check R3: SHIP-OK
- Scientist R3: STATISTICALLY VALID · Architect R3: PRODUCTION-READY · Security R3→R4: LEAK CLOSED
- All 6 round-3+ agents SHIP. All 10 round-1 agents' concerns addressed by R2/R3 fixes.
dffcb7a · f8109e0 · aa09475 · 3b50764 ·
ceb9802 · 21e3a1f · 838d432 · 667153e.
Master report: per_county_lift_postYN ·
county_material_calibration (holdout fit).
1. Multi-seed 3-seed × 8-anchor canonical replaces single-seed headline. Rerun all 8 anchors at seeds 42 and 1337 (originals were 20260521). Per-anchor 3-seed means:
| T0 | seed=1 | seed=42 | seed=1337 | 3-seed mean | within-SD |
|---|---|---|---|---|---|
| 2025-05-31 (prod) | 7.08 | 7.97 | 8.07 | 7.71 | 0.55 |
| 2025-02-28 | 7.63 | 7.64 | 7.71 | 7.66 | 0.04 |
| 2024-11-30 | 6.64 | 7.31 | 7.42 | 7.12 | 0.42 |
| 2024-08-31 | 6.72 | 6.61 | 6.62 | 6.65 | 0.06 |
| 2024-05-31 | 6.07 | 6.10 | 6.14 | 6.10 | 0.04 |
| 2023-11-30 | 6.74 | 6.88 | 6.77 | 6.80 | 0.07 |
| 2023-05-31 | 6.80 | 7.07 | 6.84 | 6.90 | 0.15 |
| 2022-11-30 | 6.62 | 6.15 | 6.47 | 6.41 | 0.24 |
2. National-county transferability infrastructure SHIPPED, sanity-test BLOCKED.
- SHIPPED: T0_ONLY env var (17 builders) · model.save_model (.lgb + .feats.json) · score_county.py · COUNTIES env-var refactor · Maricopa silver_rem 2022-05..2026-04 pulled from S3 (~5GB).
- BLOCKED: Maricopa coverage_decisions has only 27 FLAG rows (no INCLUDE) → enrichment universe = 0 properties → can't build/score. Need coverage pipeline run for Maricopa (separate workstream) OR INCLUDE-bypass env that uses all FA SFH from silver_rem.
- FIPS naming gotcha: Maricopa stored as "4013" (4-digit) in S3 + coverage_run/silver/ but "04013" (5-digit zero-pad) elsewhere. Resolved via _COUNTIES_NAMES dict in build_enrichment.py but still needs coverage data.
- FL enrichment at 2024-11-30 was BACKED UP + RESTORED during the attempt (no contamination of canonical artifacts).
Master reports:
multiseed_validation ·
national_sanity_blocker.
Commits: cd3387a (multi-seed + COUNTIES env) · 62310fc (T0_ONLY refactor) · 0af7b68 (model save + score_county.py).
v9 SHIPPED (2026-05-25 23:48 · cycle 5 · 6-county Path A model): Retrain landed in <30 minutes, much faster than the ~10h estimate because the prior sidecars had been built earlier. Output model_v3_fl6_pathA_full.lgb · 189 features (synth + behavioral + baseline). Per-county AUC: Hernando 0.852 · Pasco 0.884 · Pinellas 0.714 · Duval 0.834 · Orange 0.854 · Sarasota 0.834. Lift@15K on full 6-county pool: 9.03× vs v8 baseline 8.26× = +9.3% improvement. Caught 3,721 / 15K vs v8's 3,405 = +316 reroofs. Isotonic-calibrated scored parquet at fold1_scored_v3_fl6_pathA_full_calibrated.parquet. v9 client CSV at data/sandbox/model/callzeke_PROD_15k_2026-04-30_v9_client.csv (15,000 × 30, all R1-R7 + caps preserved). v8 ∩ v9 overlap = 70.4% (4,435 row swap from re-rank). Post-train steps: isotonic calibration on top 50K + per-county AUC emission + re-score CZ counties at PROD anchor 2026-04-30 → v9 client CSV. Pre-run expectation was +5-15% lift over v8 (per Path A Maricopa precedent: 189 → 287 feat = +0.072 AUC); the realized +9.3% falls inside that range.
v8 CURRENT SHIP (2026-05-25 cycle 4 · 6-county 189-feat + all R1-R7 + caps fixed):
data/sandbox/model/callzeke_PROD_15k_2026-04-30_v8_client.csv (15,000 × 30, exact quota, all 11 ship-blockers from manual audit closed). DM/CC caps now actually enforce (was broken in v5/v6). R5/R6/R7 wired into pipeline (recent roof from gold · mailing opt-out · brand-new cutoff at 2021). +14.6% reroofs caught vs v5 (6-county model) + 2,058 audit-driven drops backfilled from scored pool.
v6 (superseded):
data/sandbox/model/callzeke_PROD_15k_2026-04-30_v6_client.csv (15,000 × 30 cols, client-ready) · internal v6_full.csv (15,000 × 43 cols with all flags). v5 superseded after 6-county Path A retrain (FL+Orange+Duval+Sarasota) demonstrated **+14.6% catch improvement** on the same 3 CallZeke counties at backtest @ 2024-11-30 (2,668 → 3,058 reroofs caught at the 15K quota). Pinellas was the biggest beneficiary (+37% lift), Pasco +13%, Hernando +8%. 5,036 rows (33.6% of list) re-ranked vs v5; tier composition essentially identical.
v3 (14,788 rows, dropped post-hoc) is superseded; v4 oversamples the scored pool and backfills so the final 15,000 is exact AFTER all hard exclusions.
Hard exclusion rules (now part of pipeline · enforced before quota slice):
- R1 SFH-only (drop 199 non-SFH)
- R2 mailable: non-blank MAILING address+city+state (drop 174)
- R3 EMV > 0 (drop 155 — Pinellas FA blanks)
- R4 USPS-deliverable via Smarty DPV ∈ {Y, S, D} (drop 161)
- From 27,000 oversampled pool → 26,447 survived → quota-sliced to 15,000.
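A sketch of R1–R4 as a single row-level predicate, applied before the quota slice. The field names (`property_type`, `mail_address`, `mail_city`, `mail_state`, `emv`, `dpv`) are hypothetical and do not claim to match pipeline_filters.py; only the rules themselves come from the list above.

```python
# DPV codes treated as USPS-deliverable under R4.
DELIVERABLE_DPV = {"Y", "S", "D"}

MAILING_FIELDS = ("mail_address", "mail_city", "mail_state")


def passes_hard_exclusions(row):
    """Return True when a scored row survives R1-R4 (hypothetical field names)."""
    is_sfh = row.get("property_type") == "SFH"                             # R1
    mailable = all((row.get(f) or "").strip() for f in MAILING_FIELDS)      # R2
    has_emv = (row.get("emv") or 0) > 0                                     # R3
    deliverable = row.get("dpv") in DELIVERABLE_DPV                         # R4
    return is_sfh and mailable and has_emv and deliverable


def apply_hard_exclusions(rows):
    """Filter an oversampled pool; the quota slice runs on the survivors."""
    return [row for row in rows if passes_hard_exclusions(row)]
```

Filtering before the slice is what lets the final list hit exactly 15,000: the pool is oversampled, so dropped rows are backfilled by the next-best survivors instead of leaving holes.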
Action Plan tiering (replaces hardcoded "30 days" constant): 1,354 High (30d) · 4,647 Medium (60d) · 8,999 Low (90d) after demoting 149 vacant/military/recent-roof rows. 12-week cadence schedules per tier. CC capped 1 per owner (14,843 unique owners) AND DM capped 2 per mailbox (14,696 unique targets). Total touches: 35,726 DM + 2,682 CC.
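The two caps (1 CC per owner, 2 DM per mailbox) are the same operation with a different key and limit. A minimal sketch, assuming rows arrive sorted best-first and that `owner_id` / `mailbox_id` are hypothetical column names:

```python
from collections import Counter


def cap_per_key(rows, key, cap):
    """Keep at most `cap` rows per distinct value of `key`, preserving order."""
    seen = Counter()
    kept = []
    for row in rows:  # rows are assumed to be sorted best-first already
        value = row[key]
        if seen[value] < cap:
            seen[value] += 1
            kept.append(row)
    return kept


def apply_touch_caps(rows):
    """Return (cc_targets, dm_targets) under the 1-per-owner / 2-per-mailbox caps."""
    cc_targets = cap_per_key(rows, key="owner_id", cap=1)
    dm_targets = cap_per_key(rows, key="mailbox_id", cap=2)
    return cc_targets, dm_targets
```

Because the input order is preserved, the highest-ranked property for each owner or mailbox is the one that keeps its touch.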
Spec + pipeline: Action Plan System (visual + flowchart) ·
finding #79 · builder scripts/roofing/build_15k_with_exclusions.py · filters lib scripts/roofing/lib/pipeline_filters.py · Smarty client scripts/roofing/lib/smarty_client.py (with JSONL cache → free re-runs).
Empirical analysis of 362,643 gold REPLACEMENT events across the 3 CallZeke counties (Hernando · Pasco · Pinellas) joined with FA silver. Answers the question "is it unlikely for a roof to be replaced under 15 years?" → No — 9.93% of all replacements happen on roofs <15yr old (storm damage, insurance claims, premature material failure). Cutoff for the client list was loosened from "<15yr drop" to "<5yr drop only" (R7 max_year_built=2020) based on this finding.
Age at replacement (empirical · 362,643 events; the nine brackets below sum to 357,369 events, which is the denominator the % column uses)
| Age bracket | Events | % of all | Interpretation |
|---|---|---|---|
| 0–5 yr | 21,330 | 5.97% | Mostly data artifacts + new-build permit re-issuance |
| 5–10 yr | 3,551 | 0.99% | Premature failure / storm-forced |
| 10–15 yr | 10,609 | 2.97% | Coastal humidity + hurricane forced |
| 15–20 yr | 44,727 | 12.52% | Sweet spot for asphalt-shingle |
| 20–25 yr | 39,177 | 10.96% | Late asphalt cycle |
| 25–30 yr | 26,599 | 7.44% | Mid-life tile / heavy asphalt |
| 30–40 yr | 67,116 | 18.78% | Tile cycle peak |
| 40–60 yr | 99,861 | 27.94% | Late tile + older stock first-replacement |
| 60–120 yr | 44,399 | 12.42% | Vintage stock modernization |
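The share column and the 9.93 % under-15-year figure can be recomputed directly from the event counts. The check below also makes the denominator explicit: the shares are taken over the sum of the bracketed events, not over the 362,643 headline count.

```python
# Events per age bracket, copied from the table above.
EVENTS_BY_BRACKET = {
    "0-5": 21_330,
    "5-10": 3_551,
    "10-15": 10_609,
    "15-20": 44_727,
    "20-25": 39_177,
    "25-30": 26_599,
    "30-40": 67_116,
    "40-60": 99_861,
    "60-120": 44_399,
}

bracketed_total = sum(EVENTS_BY_BRACKET.values())

# Percentage share of each bracket, rounded the same way as the table.
shares = {
    bracket: round(100 * count / bracketed_total, 2)
    for bracket, count in EVENTS_BY_BRACKET.items()
}

under_15 = sum(EVENTS_BY_BRACKET[b] for b in ("0-5", "5-10", "10-15"))
under_15_share = round(100 * under_15 / bracketed_total, 2)

print(bracketed_total, shares, under_15_share)
```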
Six personas
| # | Persona | N | % | Median (age @ replace · year built · EMV · sqft · LTV) |
|---|---|---|---|---|
| P1 | Classic FL Owner-Occupier · Asphalt Cycle 15–25yr asphalt · owner-occupied · individual | 21,046 | 5.89% | 18yr · 1992 · $268K · 1,640 sqft · 21% |
| P2 | Tile Roof · Late-Life Cycle 25–40yr tile/concrete · higher-dollar replacement | 35,848 | 10.03% | 31yr · 1982 · $341K · 2,090 sqft · 13% |
| P3 | Storm-Damage Early Replace (Irma cohort) <15yr roof · replaced 2017–2019 · forced | 3,205 | 0.90% | 9yr · 2007 · $245K · 1,830 sqft · 35% |
| P4 | Institutional Investor Portfolio LLC / Trust / Corp · OOS mailing · scheduled cap-ex | 37,106 | 10.38% | 26yr · 1988 · $228K · 1,520 sqft · 22% |
| P5 | Aging Affordable Stock · Long-Overdue 30–60yr · EMV <$200K · owner-occupied · deferred maintenance | 28,470 | 7.97% | 42yr · 1971 · $134K · 1,210 sqft · 31% |
| P6 | High-Value Pre-Listing Upgrade 5–15yr · EMV >$400K · low LTV · prep-for-sale | 4,194 | 1.17% | 10yr · 2010 · $580K · 2,470 sqft · 24% |
Implications for the list
- Sweet spot is 15–30yr (P1 + P2 + first half of P5 ≈ 30% of all replacements). The model already weights this via roof_age_months_est.
- P3 storm-damage (sub-15yr) is real but small (0.9%). Loosened R7 cutoff to 5yr not 15yr; drops only brand-new homes.
- P4 institutional (10%) = scheduled cap-ex, not lead-driven. CC outreach is wasted on portfolio managers. Recommend DM-only for these.
- P6 high-value (1.2%) = rare but high-LTV roofer's dream. Worth a separate tier in future versions.
scripts/roofing/audit_replacement_profiles.py · machine-readable JSON notes/Roofing/audits/2026-05-25_replacement_profiles.json.
Production list shipped:
data/sandbox/model/callzeke_PROD_15k_2026-04-30_quota_50_25_25.csv (superseded by v4 on 2026-05-25 — see panel above)
- SCORE_DATE = 2026-04-30 (forward window May-Oct 2026 — predicts NEXT 6 months)
- Quota: Hernando 7,500 / Pasco 3,750 / Pinellas 3,750 (50/25/25 strategic intent)
- Eligible pool 217,121 (post 15-yr recent-roof gate)
- Score range 0.20-0.92, median 0.32
- Expected 32% recall, 4.70× lift, ~1,940 catches (per 2025-05-31 backtest)
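A sketch of the 50/25/25 quota slice: each county contributes its top-scored rows up to its quota, regardless of how those scores compare across counties. The row fields (`county`, `score`) are illustrative names, not the pipeline's schema.

```python
def quotas_from_shares(total, shares):
    """Turn fractional county shares into integer row quotas."""
    return {county: round(total * share) for county, share in shares.items()}


def quota_slice(rows, quotas):
    """Take the top-`quota` rows by score within each county."""
    selected = []
    for county, quota in quotas.items():
        pool = [row for row in rows if row["county"] == county]
        pool.sort(key=lambda row: row["score"], reverse=True)
        selected.extend(pool[:quota])
    return selected


QUOTAS = quotas_from_shares(
    15_000, {"Hernando": 0.50, "Pasco": 0.25, "Pinellas": 0.25}
)
```

This per-county cut is where the quota cost discussed earlier comes from: a county with a fixed quota can pull in rows that score below rows left behind in another county.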
🚨 R5 RETRACT (national transferability): Critic R5 caught a smoking gun — naive year_built ASC ranking catches EXACTLY the SAME 11 Maricopa positives as the 189-feature FL-trained model. Model adds ZERO signal over trivial age-sort on Maricopa. Transferability claim RETRACTED in both audit MD and CALLZEKE deck. Honest framing: model is FL-validated; non-FL transferability requires naive-baseline tests in each market (only Maricopa tested so far → failed; Harris pending).
FL signal IS REAL: the same naive-baseline test on FL @ 2025-05-31 shows model beats naive year_built by +369% to +1,132% across all quotas. The model is NOT just an age heuristic in FL — it's a genuine 189-feature ranker that uses material, refi propensity, building condition, storm exposure, and per-county calibration to find replacement-ready properties.
Prior 25K (2026-04 DM) overlap with new 50/25/25 prod list:
- 3,088 of 25,000 (12.4%): model rebuilds list from scratch vs Apollo era
- Hernando: 2,889 / 16,519 (17.5% kept)
- Pasco: 165 / 6,935 (2.4% kept)
- Pinellas: 34 / 1,546 (2.2% kept)
- Median current rank of prior list = 273,729 (54% of pop = essentially random)
Score @ recall thresholds (eligible pool):
- 50% recall: score ≥ 0.269, list size 24,273 (11.0% of pool)
- 70% recall: score ≥ 0.178, list size 46,191 (21.0% of pool)
- 80% recall: score ≥ 0.134, list size 65,084 (29.5% of pool)
- 90% recall: score ≥ 0.098, list size 95,530 (43.4% of pool)
15K list caps at ~32% recall mathematically. If client wants 80% recall they need a 65K list.
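The threshold table above can be produced by walking the pool in descending score order until the target share of positives is caught. A minimal sketch on backtest scores and 0/1 labels; tie handling is ignored for brevity and the function name is illustrative.

```python
def threshold_for_recall(scores, labels, target_recall):
    """Return (score_threshold, list_size) that first reaches `target_recall`.

    `labels` are 0/1 backtest outcomes aligned with `scores`.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("no positives in the pool")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    caught = 0
    for list_size, (score, label) in enumerate(ranked, start=1):
        caught += label
        if caught >= target_recall * total_positives:
            return score, list_size
    raise ValueError("target recall must be at most 1.0")
```

Read in reverse, the same walk gives the recall a fixed list size buys, which is why a 15K list has a hard recall ceiling on a given pool.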
Session totals (2 days, 2026-05-22 → 2026-05-24): 33 commits · 5 audit rounds (10 + 5 + 5 + 5 + 1 agents) · 6 bug fixes shipped (canceled_roof leak, clobber, Y/N dead-feature, flood-zone dead-feature, circular-validation leak, dead-code) · 3 framework refactors (T0_ONLY env, model save + score_county.py, JSON multipliers).
Master reports:
15k_list_quota_analysis ·
national_sanity_3county (R5 RETRACT) ·
SESSION_HANDOFF.md (full handoff).
Final commits: 4ba5f25 · 183c7ee · 816edda · 574d07f · ef81b7e · f242a9d · 9a6f0c5 · d8ec057.
3-seed × 8-anchor canonical headline:
| T0 | seed=1 | seed=42 | seed=1337 | 3-seed mean | within-SD |
|---|---|---|---|---|---|
| 2025-05-31 (production) | 7.08 | 7.97 | 8.07 | 7.71 | 0.55 |
| 2025-02-28 (Helene post) | 7.63 | 7.64 | 7.71 | 7.66 | 0.04 |
| 2024-11-30 | 6.64 | 7.31 | 7.42 | 7.12 | 0.42 |
| 2024-08-31 (Helene in) | 6.72 | 6.61 | 6.62 | 6.65 | 0.06 |
| 2024-05-31 | 6.07 | 6.10 | 6.14 | 6.10 | 0.04 |
| 2023-11-30 | 6.74 | 6.88 | 6.77 | 6.80 | 0.07 |
| 2023-05-31 | 6.80 | 7.07 | 6.84 | 6.90 | 0.15 |
| 2022-11-30 | 6.62 | 6.15 | 6.47 | 6.41 | 0.24 |
Statistical tests: p vs H0(mean=6.0) = 0.0024 (CI excludes 6×). p vs H0(mean=7.0) = 0.70 (model statistically INDISTINGUISHABLE from 7× lift). Honest framing: "~7× lift @ 15k with 95% CI [6.45, 7.39]".
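The interval and both test directions can be re-derived from the 3-seed table. The sketch below stays in the standard library, so instead of p-values it compares each t-statistic with the two-sided 95 % critical value for df=7 (2.365, hardcoded).

```python
import math
import statistics

# Lift@15k per anchor for the three seeds, copied from the table above.
SEED_LIFTS = {
    "2025-05-31": (7.08, 7.97, 8.07),
    "2025-02-28": (7.63, 7.64, 7.71),
    "2024-11-30": (6.64, 7.31, 7.42),
    "2024-08-31": (6.72, 6.61, 6.62),
    "2024-05-31": (6.07, 6.10, 6.14),
    "2023-11-30": (6.74, 6.88, 6.77),
    "2023-05-31": (6.80, 7.07, 6.84),
    "2022-11-30": (6.62, 6.15, 6.47),
}

T_CRIT_95_DF7 = 2.365  # two-sided 95% Student-t critical value, 7 degrees of freedom

anchor_means = [statistics.mean(v) for v in SEED_LIFTS.values()]
grand_mean = statistics.mean(anchor_means)
std_err = statistics.stdev(anchor_means) / math.sqrt(len(anchor_means))


def t_stat(null_mean):
    """One-sample t-statistic of the anchor means against a hypothesised lift."""
    return (grand_mean - null_mean) / std_err


ci = (grand_mean - T_CRIT_95_DF7 * std_err, grand_mean + T_CRIT_95_DF7 * std_err)
t_vs_6 = t_stat(6.0)   # well beyond the critical value: 6x is rejected
t_vs_7 = t_stat(7.0)   # inside the critical value: 7x is not rejected
```

The unit of analysis here is the anchor (n=8), with seeds averaged first; treating all 24 runs as independent would understate the uncertainty.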
Transferability infrastructure shipped:
- T0_ONLY env var across 17 builders — filter anchor iteration to a single T0 for cheap per-county sanity work.
- Model save in retrain_with_v3.py: writes model_v3_<TAG>.lgb + .feats.json per run.
- score_county.py: loads model + scores a new universe (different COUNTIES) without retraining; computes lift against gold-derived labels.
- Maricopa silver_rem pulled from S3 (all 10 anchor periods, ~5GB).
- National sanity Maricopa-FL transfer test: running at this writing (background bqdkk8ags).
multiseed_validation ·
national_sanity_blocker (now unblocked).
Commits: cd3387a (multi-seed + COUNTIES env) · 62310fc (T0_ONLY refactor) · 0af7b68 (model save + score_county.py).
folds · model · calibration+rank · independent triple-audit (2026-05-22): folds→train→calibrate→rank + leakage; verdict SHIP.
F4+F10 ablation (2026-05-22): closed both. Dropping NA-as-positive raises lift +0.37× to 5.65× (clean-label story, not inflation); dropping §7 features Δ=+0.02× (noise, non-load-bearing).
Feature audit: 58-feature catalog, all internal (FA static + REM + Gold permits).
Synthetic-features feature engineering wave (2026-05-22): 5-layer stack lifted fold 1 from 5.28× → 6.47× (+1.19×, +23% relative). Layers:
- (1) synth +22 cols → 6.10× (NOAA hurricane + SUBDIVISION-or-ZIP neighbour density + business-sense flags + value ratios).
- (2) permit_context +27 cols → 6.13× (repair history + 11 adjacent trades 24/36m; only 2 of 27 cracked top-30, repair signal too sparse).
- (3) recency+distress +22 cols → 6.32× (tight 3/6/12m bands + months_since_last_X_permit per trade + distress segmentation).
- (4) synth_v3 +39 cols → 6.47× (cleaned JOB_VALUE / REM extras: RoofCoverCode, BuildingConditionCode, BuildingQualityCode, HOA1FeeValue, FA propensity scores, flood-zone / IBTrACS-2024 NOAA-v2 with parcel-grain lat/lon, which surfaces Helene + Milton 2024 storms / cancelled-roof-permit signal / out-of-zip mailing).
v3 winners: roof_cover_code #4 (gain 58K, beat fips), nearest_storm_name+_km #14-15, building_condition_code #19, hoa1_fee_value #23.
Key learnings:
- #1 by gain is is_in_replacement_window (15-25yr bool).
- RECENCY (continuous months_since) beats COUNT (n_permits_Nm) decisively: bucketed counts dropped out of top-35 once continuous recency was added.
- Distress flags carry near-zero signal for 6-mo reroof in 3-county sample (sale/financial/net segmentation all failed).
F9 closed (retrain w/ cap=1500 lost lift; baseline cap=600 correct). 2 MEDIUM still open (F5 static INCLUDE→as-of-T0, F18 50/25/25 allocation cost). Caveats: 3-county only, no hurricane Tier F' yet, scores at the 2025-05 anchor.
Why labeling before coverage DECIDED 2026-05-20
Coverage = "what fraction of SFH stock has at least one valid roof permit?" The numerator depends on the definition of "valid roof permit" — that's labeling. So labeling has to be locked first; coverage is downstream.
Full ADR (rationale, reversal cost, what was deliberately not touched):
decisions/2026-05-20_labeling_before_coverage.md.
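A toy illustration of the dependency: coverage is a function of the label predicate, so changing what counts as a "valid roof permit" changes the number even with identical data. The record fields (`parcel_id`, `scope`) and the two predicates are hypothetical.

```python
def coverage(sfh_parcels, permits, is_valid_roof_permit):
    """Fraction of SFH parcels with at least one permit the predicate accepts."""
    sfh = set(sfh_parcels)
    covered = {p["parcel_id"] for p in permits if is_valid_roof_permit(p)} & sfh
    return len(covered) / len(sfh)


sfh_parcels = ["a", "b", "c", "d"]
permits = [
    {"parcel_id": "a", "scope": "ROOFING"},
    {"parcel_id": "b", "scope": "SOLAR"},  # roof-mounted PV: roofing or not?
]

# Two candidate labeling rules applied to the same permits.
strict = coverage(sfh_parcels, permits, lambda p: p["scope"] == "ROOFING")
loose = coverage(sfh_parcels, permits, lambda p: p["scope"] in {"ROOFING", "SOLAR"})
```

Here the same four parcels yield two different coverage figures, which is why the label definition is locked before any coverage number is reported.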
AWS operating rules — read before any AWS call LOCKED 2026-05-20
Every step that touches AWS (Step 1.1 national rerun · Step 6 training · Step 9 deploy + monitor) must follow the operational context the platform team set. Hard requirements:
- Tag every resource with Project=ClaudeCode-Ignacio (local override of the original Proyecto=Roofing-Ignacio; generic tag used across all Claude Code work, not just Roofing).
- Partition pruning is mandatory in dev: filter by FIPS (+ Period for Silver) BEFORE any broad read. Verify with df.explain() that PartitionFilters appears in the physical plan.
- Clone existing EMR templates (`EduDS` / `DiegoDE`) instead of creating from scratch. Base family r5/r6g · more expensive instances require explicit confirmation.
- DO NOT TOUCH Security Groups · VPC · subnets · NACLs · VPC Endpoints · S3 bucket policies. 95 % of "network" problems are IAM or path.
- REGLA DE ORO (golden rule) above all of that: before any billable call, confirm with Ignacio. If the job is small, run it locally on main or mini.
Full doc (Hudi paths · skew handling · Lamborghini rule · agent behavior contract): notes/Roofing/aws_operational_context.md.
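The partition-pruning rule can be turned into a guard that inspects the physical-plan text before a wide read proceeds. The sketch below works on a plain string and assumes the plan prints a `PartitionFilters: [...]` entry per scan, as Spark's file-scan nodes typically do; how the plan text is captured from df.explain() is left to the caller, and the function name is illustrative.

```python
import re


def prunes_on(plan_text, column="FIPS"):
    """True if any PartitionFilters entry in the plan text mentions `column`.

    An empty `PartitionFilters: []` means the scan reads every partition.
    """
    for inner in re.findall(r"PartitionFilters: \[([^\]]*)\]", plan_text):
        if re.search(rf"\b{re.escape(column)}\b", inner):
            return True
    return False


def require_pruning(plan_text, column="FIPS"):
    """Raise before a broad read if the plan does not prune on `column`."""
    if not prunes_on(plan_text, column):
        raise RuntimeError(f"no PartitionFilters on {column}; refusing wide read")
```

A filter that shows up only under PushedFilters does not count: that is row-level filtering after the partitions have already been listed and read.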
Dev-layer strategy — Layer A / B / C LOCKED 2026-05-20
All Step 1 iteration runs on local raw Gold to keep AWS cost bounded. Three layers, two confirmed AWS pulls, then unlimited local iteration.
| Layer | FIPS | Local path | Size | Use |
|---|---|---|---|---|
| A · fast local | 5 counties (FL+AZ+PA+TX+JAX): 12086 · 04013 · 42101 · 48201 · 12031 + Pinellas 50K sample (12103) | data/sandbox/roofing_audit/gold_<FIPS>/ | 2.7 GB | Sub-second keyword tweaks · MECE assertion |
| B · variability | 10 stratified FIPS across NE/MW/S/W census regions | same dir | ~3–8 GB | Cross-AHJ keyword drift · catches Pinellas-blind rules · keeps local iteration fast |
| C · national-local | All 1,421 in-coverage FIPS | same dir (streamed per-FIPS) | ~50–150 GB | Final pre-prod replay · zero AWS after pull |
Maricopa stored as 4-digit (4013/) in S3 — local naming is canonical 5-digit (gold_04013/). Hardware: M3 Pro · 18 GB RAM · ample disk. Layers A+B pulled — 31 FIPS on disk (2026-05-20). The classifier's fast-iteration subset (DEV_FIPS in classify_v5.py / audit_v5.py) is 10 counties. ADR: decisions/2026-05-20_step1_restructure_variables_and_type_subtype.md.
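The 4-digit vs 5-digit mismatch is a zero-padding problem. A small normalizer, assuming FIPS arrives as a string or int; the helper names are illustrative and the directory pattern follows the gold_<FIPS>/ path in the layer table.

```python
def canonical_fips(raw):
    """Normalize a county FIPS to the canonical 5-digit, zero-padded string."""
    digits = str(raw).strip()
    if not digits.isdigit() or len(digits) > 5:
        raise ValueError(f"not a county FIPS: {raw!r}")
    return digits.zfill(5)


def local_gold_dir(raw):
    """Local Gold directory for a county, using the canonical 5-digit name."""
    return f"data/sandbox/roofing_audit/gold_{canonical_fips(raw)}/"
```

Applying this at every read boundary means an S3 prefix like 4013/ and a local gold_04013/ resolve to the same key.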
Define the universe of valid roof permits
- S1 · label quality unvalidated. The "is_roofing recall ≈ 99 %" headline is circular: audit_v5.py measures the roof_fn stratum as v4=ROOFING AND v5≠ROOFING, i.e. v5-vs-v4 agreement, not recall against an independent ground truth. No stratum samples roofs both versions miss; the true false-negative rate is unknown (the project_type_investigation already found a 2.21 % vocabulary gap: epdm / sbs / single ply).
- S2 · event_date rule frozen but not materialized. 1.5 is FROZEN, but labels.parquet carries no event_date column and steps/06 still names the legacy roofing_label.py (old broad_classify, pre-restructure) as owner. The Step 3 T0 anchor and the Step 4 label window both need event dates with the +90 d cap; no current-design implementation produces them.
Also MEDIUM: roof_action = NA for ~54 % of ROOFING items (12086) — gutted Step-4 positive set unless AMBIGUOUS absorbs them; gold_vintage string inconsistent across scripts. Scope-confusion clean.
audits/2026-05-21_step1_ds_audit.md
For every Gold permit row, the Step 1.2 classifier emits one MECE column — permit_scope, a list of {type, action} items covering every (object, verb) the permit touches: 20 permit_type values × 7 permit_action values (see 1.2). The old flat 14-category enum (ROOFING_REPLACEMENT, … with a separate ROOFING sub-class) was superseded by this two-axis model in the v5 restructure — see the archived 14-cat spec. Output: data/sandbox/classify_v5/<FIPS>/labels.parquet; contract in 1.4.
1.1 · Variable inventory DONE
Enumerate every Gold column and assign each a role: LABEL_SIGNAL (feeds 1.2 classifier) / FEATURE (downstream Step 5) / METADATA (provenance only) / DROP (all-null or noise). Restructure ratified 2026-05-20 (ADR 2026-05-20_step1_restructure_variables_and_type_subtype.md). Done: all 54 Gold parquet columns inventoried across the 31 FIPS → data/gold_variable_inventory.md (7 LABEL_SIGNAL · 23 FEATURE · 22 METADATA · 2 DROP), generated by scripts/roofing/inventory_v3.py.
steps/01_variable_inventory.md
1.1.1 · Schema dump DONE
Per-FIPS column inventory: name, dtype, originator. 54 columns, all present in 31/31 FIPS (no AHJ-specific columns). Recorded in the master role table of gold_variable_inventory.md (cols dtype, originator, n_fips).
1.1.2 · Null + cardinality DONE
Per-FIPS null rate + distinct count per column. gold_variable_inventory.md: master table with null_p50/null_max/dist_p50 + a complete per-FIPS null matrix (54 cols × 31 FIPS).
1.1.3 · Role assignment DONE
Every column has a role: 7 LABEL_SIGNAL (incl. BUSINESS_NAME, promoted from auxiliary), 23 FEATURE, 22 METADATA, 2 DROP (STORIES 96 % null, UNITS 99 % null). Persisted in notes/Roofing/data/gold_variable_inventory.md.
1.2 · Permit Gold taxonomy v5.3.3
Classifies every Gold permit into a single MECE column — permit_scope, a list of {type, action} items covering every (object, verb) the permit touches. A permit can do several things at once (58 % carry 2+ trades), so the taxonomy is multi-label by design; the type↔action pairing is intrinsic (same struct), not positional. is_roofing / roof_action are NOT taxonomy columns — they are roofing-project projections, derived downstream (Step 5) from permit_scope. Path: v5.0.0 single-label lost ~200 K real roofs to a tie-break bug → v5.1.0 multi-label dissolved it (is_roofing ≈ 99 %, types ≈ 93 %, 7-iteration autonomous audit loop) → v5.3.0 collapsed the schema to the single MECE column permit_scope → v5.3.1-v5.3.3 are precision/recall fixes from the 2026-05-21 audit: ROOFING keyword expansion, AHJ inspection-boilerplate stripping, and a location guard for " roof ". v5.3.3 audit: MECE pass, both defects closed. Independent recall check (vs BuildZoom PROJECT_TYPE, the S1 audit fix) measures is_roofing recall at 96.3 % on the 30-FIPS dev layer — an optimistic upper bound, ~198 K misses surfaced; the prior "≈ 99 %" was circular (v5-vs-v4 agreement). Pending next version: drop PROJECT_TYPE Tier-3 + add an explicit bn_roofer tier (needs sign-off). v4.0.1 stays shipped pending Ignacio/Eduardo sign-off.
permit_type — 20 values (the object): ROOFING · SOLAR · HVAC · ELECTRICAL · PLUMBING · POOL · FIRE · GENERATOR · WINDOWS · DOORS · GARAGE · SHED · DECK · FENCE · FOUNDATION · SIGN · SITEWORK · BUILDING · OTHER · UNKNOWN.
permit_action — 7 values (the verb): NEW · REPLACEMENT · REPAIR · ADDITION · ALTERATION · DEMOLITION · NA.
| Permit (DESCRIPTION) | permit_scope |
|---|---|
| "Tear off & re-roof, class-A shingles" | [{ROOFING, REPLACEMENT}] |
| "Install roof-mounted PV system 6.2 kW" | [{SOLAR, NEW}] — not roofing; the roof is only the location |
| "Tear off shingles, re-roof, install solar PV" | [{ROOFING, REPLACEMENT}, {SOLAR, NEW}] — one permit, two items |
| "Remove existing solar panels" | [{SOLAR, DEMOLITION}] |
| "Repair roof leak" | [{ROOFING, REPAIR}] |
| "Demolish detached garage" | [{GARAGE, DEMOLITION}] |
| "New single-family residence" | [{BUILDING, NEW}] |
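A sketch of a vocabulary check for permit_scope against the 20 × 7 taxonomy listed above. Items are modeled as plain dicts with `type` and `action` keys; the requirement that every permit carries at least one item is an assumption, and the function name is illustrative.

```python
PERMIT_TYPES = frozenset({
    "ROOFING", "SOLAR", "HVAC", "ELECTRICAL", "PLUMBING", "POOL", "FIRE",
    "GENERATOR", "WINDOWS", "DOORS", "GARAGE", "SHED", "DECK", "FENCE",
    "FOUNDATION", "SIGN", "SITEWORK", "BUILDING", "OTHER", "UNKNOWN",
})

PERMIT_ACTIONS = frozenset({
    "NEW", "REPLACEMENT", "REPAIR", "ADDITION", "ALTERATION", "DEMOLITION", "NA",
})


def validate_permit_scope(scope):
    """Raise ValueError if any item falls outside the taxonomy vocabulary."""
    if not scope:
        # Assumption: unclassifiable permits still get an OTHER/UNKNOWN item.
        raise ValueError("permit_scope must contain at least one item")
    for item in scope:
        if item["type"] not in PERMIT_TYPES:
            raise ValueError(f"unknown permit_type: {item['type']!r}")
        if item["action"] not in PERMIT_ACTIONS:
            raise ValueError(f"unknown permit_action: {item['action']!r}")


# "Tear off shingles, re-roof, install solar PV": one permit, two items.
validate_permit_scope([
    {"type": "ROOFING", "action": "REPLACEMENT"},
    {"type": "SOLAR", "action": "NEW"},
])
```

Keeping type and action in the same struct is what makes this check per-item: there is no positional pairing between two parallel lists to get out of sync.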
Result preview: 50 random permits, before (v4.0.1) vs after (v5.3.3). Review aid; reproducible via scripts/roofing/before_after_sample.py 50 0.809817.
| # | FIPS | TYPE | DESCRIPTION | BEFORE v4.0.1 | AFTER v5.3.3 permit_scope |
|---|---|---|---|---|---|
| 1 | 26163 | Building | Exterior alterations per documents. (subject to field ap… | RENOVATION / NA | BUILDING:ALTERATION |
| 2 | 26163 | Commercial | Sidewalk sale june 6 thru 8 2014 | SITEWORK / NA | OTHER:NA |
| 3 | 09001 | Plu | New 2 family dwelling. Install U.G. plumbing waster line… | NEW_CONSTRUCTION / NEW | PLUMBING:NEW |
| 4 | 09001 | Building | Tent | RENOVATION / NA | BUILDING:ALTERATION |
| 5 | 17031 | Renovation/alteration | Rm-5 a2-multiunit: replace existing open wood rear porch… | RENOVATION / NA | OTHER:NA |
| 6 | 17031 | Driveway repair residential | Remove the old concrete and put new concrete | SITEWORK / NA | SITEWORK:REPAIR |
| 7 | 17031 | New construction | Self certification review: erect masonry 3 D.U. Building… | NEW_CONSTRUCTION / NEW | BUILDING:NEW |
| 8 | 17031 | Permit – express permit program | Replace current antenna c4 with one (1) tenxc bsa-da65-1… | ELECTRICAL / NA | ELECTRICAL:REPLACEMENT |
| 9 | 17031 | plumbing |  | PLUMBING / NA | PLUMBING:NA |
| 10 | 36029 | Residence. Alteration |  | RENOVATION / NA | OTHER:NA |
| 11 | 49035 | HVAC | Water heater | HVAC / NA | HVAC:NA / PLUMBING:NA |
| 12 | 49035 | Commercial plumbing | Building | PLUMBING / NA | PLUMBING:NA |
| 13 | 49035 | Coo |  | NEW_CONSTRUCTION / NEW | OTHER:NA |
| 14 | 49035 | Building/permit/commercial/na | Building | OTHER / NA | BUILDING:NA |
| 15 | 48201 | Electrical permit | Expired*03-15-1997*occ. Report/lounge uk code | ELECTRICAL / NA | ELECTRICAL:NA |
| 16 | 48201 | Legacy building - electrical res… | 1 reconnect | ELECTRICAL / NA | ELECTRICAL:NA |
| 17 | 48201 | Structural overtime | S.f. Residence w/att. Garage (1-2-5-r3-b) 12 irc/15 iecc | NEW_CONSTRUCTION / NEW | BUILDING:NEW |
| 18 | 48201 | Fire marshal alarm permit | Enclose building for gymnasium 1-1-2-a3-n O.L. = 122 | FIRE / NA | FIRE:NA |
| 19 | 48201 | Electrical permit | Remodel for coffee carry-out kiosk to floor 1 lobby area… | ELECTRICAL / NA | ELECTRICAL:ALTERATION |
| 20 | 48201 | Mechanical permit | Residential mechanical permit | HVAC / NA | HVAC:NEW |
| 21 | 04013 | Drinking water site - drinking w… |  | OTHER / NA | OTHER:NA |
| 22 | 04013 | Residential_addition | Addition of a living room patio with cover with electr… | ELECTRICAL / NA | OTHER:NA |
| 23 | 12031 | Electrical permit | 12264747/06replace 100 amp panel | ELECTRICAL / NA | ELECTRICAL:REPLACEMENT |
| 24 | 12031 | Mechanical permit | 1 kitchen range hood1 exh fan 4c836 dayton | HVAC / NA | HVAC:NA |
| 25 | 12031 | Mechanical permit | 1 heat pump 2twr3042 trane 3.5 tons 1 air handler 2tec3f… | HVAC / NA | HVAC:NA |
| 26 | 12031 | Electrical permit | $ 5.00 repaired water heater circuitwith 2 wire nuts & a… | ELECTRICAL / NA | ELECTRICAL:REPAIR / PLUMBING:REPAIR |
| 27 | 53033 | Construction permit | Tenant improvements to research + development laboratory… | INTERIOR / NA | BUILDING:NEW |
| 28 | 53033 | Side sewer permit |  | PLUMBING / NA | PLUMBING:NA |
| 29 | 06073 | Electrical pmt | Electrical pmt:1402/k | ELECTRICAL / NA | ELECTRICAL:NA |
| 30 | 37119 | Mechanical permit |  | HVAC / NA | HVAC:NA |
| 31 | 37119 | Mechanical permit |  | HVAC / NA | HVAC:REPLACEMENT |
| 32 | 06037 | Bldg-alter/repair | Re-frame portions of (e) garage (all walls & footings to… | GARAGE / NA | OTHER:NA |
| 33 | 06037 | Bl: aircraft for hire unit 311 |  | OTHER / NA | OTHER:NA |
| 34 | 06037 | Bldg-alter/repair | Supplemental to permit 21010-10000-05624 to capture the … | RENOVATION / NA | OTHER:NA |
| 35 | 06037 | Bldg-alter/repair | Cantilever steel deck | DECK / NA | OTHER:NA |
| 36 | 06037 | Bldg-alter/repair | Guardrail replacement on the 2nd floor of existing 2-sto… | RENOVATION / NA | BUILDING:REPLACEMENT |
| 37 | 47037 | Capl - plumbing permit | Inside plumbing | NEW_CONSTRUCTION / NEW | PLUMBING:NA |
| 38 | 47037 | Electrical permit | ********2 gas heater feeds 11 high bays all on ex. Svc. … | ELECTRICAL / NA | ELECTRICAL:ADDITION / PLUMBING:ADDITION |
| 39 | 27053 | Ele exst multi family | Replace old 60 amp services with new 100 amps in each un… | ELECTRICAL / NA | BUILDING:REPLACEMENT |
| 40 | 13121 | Residential new | New duplex residence | NEW_CONSTRUCTION / NEW | OTHER:NA |
| 41 | 32003 | Residential electric | Main panel change out with new 200 amp main and panel wi… | SOLAR / NA | SOLAR:REPLACEMENT / ELECTRICAL:REPLACEMENT |
| 42 | 32003 | plumbing single family | Galgano residence | NEW_CONSTRUCTION / NEW | PLUMBING:NA |
| 43 | 12086 | Building | Single family res-clust-zero lot-town | OTHER / NA | BUILDING:NA |
| 44 | 12086 | Building | Single family res-clust-zero lot-town | ROOFING / NA | ROOFING:NA |
| 45 | 12086 | Electrical - master tv antenna | | NEW_CONSTRUCTION / NEW | ELECTRICAL:NEW |
| 46 | 12086 | Electrical | Office buildings | ELECTRICAL / NA | ELECTRICAL:NA |
| 47 | 34003 | plumbing | Water heater | PLUMBING / NA | PLUMBING:NA |
| 48 | 51059 | Residential addition/alteration | Screen porch with shed roof on concrete slab per ffx cou… | ADDITION / NA | OTHER:NA |
| 49 | 12095 | Electrical Permit | Replace Expire Permit E05002204. For Unit 1328-A Duplex.… | ELECTRICAL / NA | ELECTRICAL:REPLACEMENT |
| 50 | 18097 | plumbing permit-non-residential-… | | PLUMBING / NA | PLUMBING:ALTERATION |
steps/02_permit_taxonomy_v5.md
audits/2026-05-21_v5.3.3_classifier_audit.md
audits/2026-05-21_project_type_investigation.md
audits/_v5.3.3_audit_data/before_after_sample.md — 50 before/after examples (v4.0.1 → v5.3.3)
1.4 · Output contract DONE
Contract for the labels parquet, joined by building_permit_id: schema is building_permit_id · fips · permit_scope (list<struct<type, action>>) · spec_v · gold_vintage. Consumers derive is_roofing / roof_action from permit_scope — never re-classify. Versioned by spec_v semver + gold_vintage. Locked 2026-05-21 to the v5.3.3 artifact; provenance, cache_manifest and the partitioned Platinum-lake path are deferred to the national publish.
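A minimal sketch of how a consumer might derive is_roofing / roof_action from permit_scope without re-classifying. It assumes permit_scope arrives as a list of {type, action} records and that the roofing type is spelled "ROOFING" (as in the ROOFING:NA sample row); the function names are hypothetical, not part of the contract.

```python
from typing import Mapping, Optional, Sequence

Scope = Sequence[Mapping[str, str]]

def is_roofing(permit_scope: Scope) -> bool:
    """True when any (type, action) pair carries the ROOFING type."""
    return any(item["type"] == "ROOFING" for item in permit_scope)

def roof_action(permit_scope: Scope) -> Optional[str]:
    """Action of the first ROOFING pair, or None when there is no roofing scope."""
    for item in permit_scope:
        if item["type"] == "ROOFING":
            return item["action"]
    return None

# Hypothetical rows shaped like the before/after sample (TYPE:ACTION pairs).
solar_and_electrical = [
    {"type": "SOLAR", "action": "REPLACEMENT"},
    {"type": "ELECTRICAL", "action": "REPLACEMENT"},
]
roofing_only = [{"type": "ROOFING", "action": "NA"}]
```

Both helpers read the frozen label only, so a classifier change can reach consumers solely through a new spec_v.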
steps/05_labels_output_contract.md
1.5 · Canonical event date rule FROZEN
event_date = MIN(non-null status dates), capped at today + 90d to defuse future-date corruption (Pinellas anomaly). Downstream feature anchoring is T0-relative, not event-relative — see Step 3 (ADR 2026-05-21_step3_t0_anchor.md).
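A minimal sketch of the rule, reading "capped" literally as a clamp: take the earliest non-null status date, then limit it to today + 90 days. The function name is hypothetical, and whether the production reader clamps or instead discards future-dated status dates before taking the minimum is not stated here.

```python
from datetime import date, timedelta
from typing import Iterable, Optional

def canonical_event_date(status_dates: Iterable[Optional[date]],
                         today: date) -> Optional[date]:
    """Earliest non-null status date, clamped to today + 90 days."""
    known = [d for d in status_dates if d is not None]
    if not known:
        return None  # no usable status date on this permit
    cap = today + timedelta(days=90)
    return min(min(known), cap)

today = date(2026, 5, 21)
normal = canonical_event_date([None, date(2024, 3, 1), date(2024, 1, 15)], today)
corrupt = canonical_event_date([date(2055, 1, 1)], today)  # Pinellas-style future date
```

Here `normal` resolves to 2024-01-15, while the future-dated permit is pulled back to the cap.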
steps/06_event_date_rule.md
Combine vendor coverage with reality · per-muni training-inclusion rule
- S3 · training-universe leakage. coverage_decision is one static label per tuple, computed over the full 1900→2026-03 permit history (match_rate numerator + first_permit span the whole record). steps/07 consumes Step 2 as a frozen input, so a walk-forward fold standing at T0 = 2021 trains on a universe whose INCLUDE membership was decided by 2022-2026 permits — inflates the Step 9 backtest. Fix is code-level: make coverage_decision a function of fold T0.
- S4 · match_rate invalid as a decision metric. Numerator is pre-v5 (LABELS_SPEC_V="coverage_pipeline_pre_v5", not the frozen v5.3.3 permit_scope); the 50 % / 25 % cutoffs are self-admitted Zoom-call engineering judgment; the metric cannot separate "vendor has no coverage" from "our address join failed".
- S5 · permanent selection-bias trap. "When in doubt EXCLUDE" → 47 % SFH EXCLUDE / 43 % FLAG / 9.6 % INCLUDE, and the 2.4 elbow that would recover geographies is circularly blocked on Steps 4-9 which only ever see the INCLUDE set. No in-pipeline path back.
Also MEDIUM: Gate 2 hard-EXCLUDEs 1,524 tuples on an empty 2024 label even when the reason says "year not ingested" and earlier years = Yes; INCLUDE-set base-rate shift uncorrected; ~35 % of SFH gets a county-level decision presented as city-level.
audits/2026-05-21_step2_ds_audit.md
- L1 · FA Municipality cleaning — final non-LA error rate ~1.3 % (R1 5.3 % → R4 1.3 % after 10 classifier patches: bare-word BASIN/LIGHTING/IMPROVEMENT-DIST/TRANSPORTATION tokens, SERVICES? plural, BORO + (TOV) suffix strips, UN-?INCORP hyphen tolerance, HTS→HEIGHTS, digit-prefix stub, DIST. period tolerance).
- L2 · FA↔BZ match — 0 % error across 600 graded cases.
- L3 · coverage_decision — 0 % error across 600 graded cases (gate 3 FLAG-only verified live).
Residual ~1.3 % L1 = LA County (06037) tax-area garble (98 % of 06037 Municipality strings, no decision impact — all fall to city_under_county → CA_Los Angeles; deferred per the original brief item 4) + ~3 bare-township names (need a gazetteer to catch).
audits/2026-05-21_step2_3layer_audit_summary.md — full 4-round trace
STEP2_RUNBOOK.md — ops manual (run, audit, schemas, known limitations)
For every (FIPS, jurisdiction, fa_muni), produce coverage_decision ∈ {INCLUDE, EXCLUDE, FLAG}. Only INCLUDED tuples feed training and delivery. False negatives are the dangerous failure mode → when in doubt, EXCLUDE. Output: evidence/sources/coverage/coverage_decisions.parquet.
2.0 · First American Municipality standardization DONE 2026-05-21
The match-table spine starts from FA's Municipality field — defined by FA as the legal jurisdiction, "not necessarily the property city". Each value is sorted into 4 status buckets: city_named (resolve to a city), unincorporated (FA confirms no incorporated place → county AHJ, confident), district_code (school/fire/tax district — no AHJ signal), unknown (NULL/junk). Only city_named carries a city; the other 3 set city = NULL. NULL is not treated as unincorporated — FA has a separate explicit "UNINCORPORATED" value. Classifier scripts/roofing/classify_fa_municipality.py built + audited on the local layer (72.76M parcels, 1,420 FIPS, no AWS) over 4 cycles × 4 AI agents (85 % → 87 % → 92 % → 93 % → ~96 %). Distribution (non-NULL strings): city_named 89.3 % · unincorporated 7.7 % · district_code 2.0 % · unknown 1.0 %.
data/fa_municipality_dictionary.md
audits/2026-05-21_fa_municipality_4bucket_audit.md
2.0b · BuildZoom jurisdiction → canonical DONE 2026-05-21
The BZ side of the two-sided standardization. BuildZoom names a jurisdiction STATE_County_City / STATE_County; the provider coverage CSV and the permit feed share this vocabulary (2,135 of ~2,300 strings overlap exactly), so one normalizer serves both. scripts/roofing/normalize_bz_jurisdiction.py parses each of the 2,497 distinct strings → canonical key (state, county_fips, canonical_place, place_type) — the same shape the FA side emits, so the two can be matched. county→FIPS uses the complete Census 2024 counties gazetteer (3,222 counties; county_master only held our 1,419-county set). Fixes: CT legacy counties (CT dropped counties for planning regions in 2022), diacritic folding, NYC/GA/MD/DC aliases. Result: county→FIPS 99.7 % (0 unmatched), status resolved 80.0 % / county_level 19.7 % / malformed 0.3 %. Audited 4 cycles × 4 AI agents = 404 row-checks, 100 % all 4 cycles.
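A minimal sketch of the parsing step only, assuming the jurisdiction string splits cleanly on underscores and that a two-part string means county-level coverage. It does not reproduce normalize_bz_jurisdiction.py: the county→FIPS lookup, CT legacy counties and the NYC/GA/MD/DC aliases are left out, and all names below are hypothetical.

```python
import unicodedata
from typing import NamedTuple, Optional

class BzKey(NamedTuple):
    state: Optional[str]
    county: Optional[str]
    place: Optional[str]
    status: str  # "resolved" | "county_level" | "malformed"

def fold(text: str) -> str:
    """Strip diacritics, upper-case, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.upper().split())

def parse_bz_jurisdiction(raw: str) -> BzKey:
    """Parse STATE_County_City or STATE_County into a comparable key."""
    parts = [fold(p) for p in raw.split("_")]
    if len(parts) not in (2, 3) or not all(parts) or len(parts[0]) != 2:
        return BzKey(None, None, None, "malformed")
    if len(parts) == 2:
        return BzKey(parts[0], parts[1], None, "county_level")
    return BzKey(parts[0], parts[1], parts[2], "resolved")
```

Folding both sides with the same function is what lets the FA and BZ keys compare equal despite differences in accents or casing.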
audits/2026-05-21_bz_canonical_audit.md
2.0c · FA ↔ BZ canonical match DONE 2026-05-21
Joins the two standardized sides on the shared canonical key (county_fips, canonical_place) — scripts/roofing/match_fa_bz.py, 32,179 FA (fips, Municipality) rows. Outcome per FA municipality: city_matched (FA city ↔ BZ city jurisdiction — 31.1 % of SFH), city_under_county (FA city, BZ covers the county not the city — 22.6 %), county_matched (FA unincorporated/district/unknown ↔ BZ county jurisdiction — 12.9 %), no_bz (no BZ jurisdiction for the county — 33.3 %). 66.7 % of FA SFH falls under a BuildZoom jurisdiction. This match table supersedes the v2.7 one-sided match_table_v2 as the coverage spine: build_match_table_v3.py materializes it in the v2-compatible schema (enriched with measured sfh_with_permit + provider labels) and build_coverage_decisions.py now consumes match_table_v3. Known gap: FIPS 48113 (Dallas) silver is a dangling symlink — excluded until the silver layer is repaired.
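A minimal sketch of the four-way outcome, under stated assumptions: bz_city_keys holds (county_fips, canonical_place) pairs for BZ city jurisdictions, and bz_counties holds every county FIPS with at least one BZ jurisdiction. How match_fa_bz.py treats a county that has BZ city jurisdictions but no county-level one is not described here, so this sketch simply folds that case into the county outcomes.

```python
from typing import Optional, Set, Tuple

def match_outcome(fa_status: str,
                  county_fips: str,
                  canonical_place: Optional[str],
                  bz_city_keys: Set[Tuple[str, str]],
                  bz_counties: Set[str]) -> str:
    """Classify one FA (fips, Municipality) row against the BZ side."""
    if county_fips not in bz_counties:
        return "no_bz"
    if fa_status == "city_named":
        if (county_fips, canonical_place) in bz_city_keys:
            return "city_matched"
        return "city_under_county"
    # unincorporated / district_code / unknown carry no city
    return "county_matched"

# Hypothetical BZ coverage for illustration.
bz_city_keys = {("12103", "CLEARWATER")}
bz_counties = {"12103"}
```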
audits/2026-05-21_fa_bz_match.md
2.1 · Vendor coverage labels INGESTED
BuildZoom publishes per-(jurisdiction × year) labels in {Yes, Some, None, empty} — what the provider claims, not what we measure. Join key is COLLECTION_POINT_3PART (not "jurisdiction"). Weekly call with BZ counterpart pending Camilo intro.
steps/03_geographic_coverage.md
2.2 · Our match-rate diagnostic SHIPPED NATIONALLY
Per-FIPS pure match rate (excluding unit-numbered SFH from the denominator). National avg = 14.98 % (~15 %). Shipped 2026-05-18 to 8020roof-coverage.web.app. Will refresh once Step 1.1 national rerun lands. Honest caveat (2026-05-22): the numerator is the all-time SFH-with-≥1-matched-roof-permit count under the pre-v5 roof filter (labels_spec_v=coverage_pipeline_pre_v5, DS audit S4), with no date cap — so the Pinellas future-dated permits (last_permit 2055/2060/2066) and decades-old permits count equally. It is an optimistic upper bound, not a current-state visibility figure. FL averages 62.2 % (54 fips); rest-of-US 10.7 %. The model-relevant coverage (clean v5.3.3 roof event in a usable recent window) is lower and needs the v5.3.3 + event_date-cap national rerun (blocked AWS).
steps/03_geographic_coverage.md
2.3 · Per-tuple inclusion decision tree DONE 2026-05-21
Four gates, first-fail-decides: (1) tuple in BZ provider list, (2) provider label ∈ {Yes, Some}, (3) match_rate FLAG-only — ≥ 50 % → INCLUDE, else FLAG (gate 3 never hard-EXCLUDEs: match_rate can't tell "no vendor coverage" from "our address join failed" — DS audit S4), (4) sub-muni veto. Implemented in scripts/roofing/build_coverage_decisions.py over all 32,179 (fips, jurisdiction, fa_muni) tuples. Gate 4: 86 confirmed non-AHJ special-district vetoes (USD / FD / SD# / MSTU / CDD / EMS / water / road / ambulance / metro district) → coverage_rules/submuni_veto_list.csv. Thresholds are documented defaults pending the 2.4 elbow.
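A minimal sketch of the four gates read strictly in order, where the first gate that fails decides. The reason strings are hypothetical, and the sketch assumes a missing match_rate is treated like a low one. Under this literal reading a vetoed special district that already failed gate 3 comes back as FLAG; whether the veto should pre-empt that is not stated here.

```python
from typing import Optional, Tuple

INCLUDE, EXCLUDE, FLAG = "INCLUDE", "EXCLUDE", "FLAG"

def coverage_decision(in_provider_list: bool,
                      provider_label: Optional[str],
                      match_rate: Optional[float],
                      is_submuni_veto: bool,
                      include_threshold: float = 0.50) -> Tuple[str, str]:
    """Return (decision, reason); gates are evaluated 1 → 4."""
    if not in_provider_list:                      # gate 1
        return EXCLUDE, "gate1_not_in_provider_list"
    if provider_label not in {"Yes", "Some"}:     # gate 2
        return EXCLUDE, "gate2_provider_label"
    if match_rate is None or match_rate < include_threshold:
        return FLAG, "gate3_low_match_rate"       # gate 3 never hard-EXCLUDEs
    if is_submuni_veto:                           # gate 4
        return EXCLUDE, "gate4_submuni_veto"
    return INCLUDE, "all_gates_passed"
```

Returning the reason next to the decision mirrors the coverage_decision_reason column in the output contract.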
coverage_rules/coverage_inclusion_rules.md
audits/2026-05-21_coverage_decisions.md
audits/2026-05-21_gate4_veto_before_after.md — 81 gate-4 vetoes, before/after
2.4 · Municipality-level training-inclusion rule SPEC
"Only those that pass the test are used for training" — per-FA-muni threshold over a trailing-5-year window, using roof_sub_class ∈ {REPLACEMENT, AMBIGUOUS}. Default threshold ≥ 25 % until backtest sweep argues otherwise (elbow on held-out recall).
steps/03_geographic_coverage.md
2.5 · Output contract DONE 2026-05-21
Materialized to evidence/sources/coverage/coverage_decisions.parquet — one row per (fips, canonical_jurisdiction, fa_muni), 32,179 rows. Columns: coverage_decision, coverage_decision_reason, match_rate, muni_match_rate, provider labels, provider_first_year, back-pointers labels_spec_v + gold_vintage, last_evaluated_at. Materialized run (post DS-audit + 3-layer-audit patches): 1,006 INCLUDE / 9,445 FLAG / 21,728 EXCLUDE tuples (9.6 % / 46.4 % / 44.0 % of SFH). Covers 1,420 of 1,421 FIPS — 48113 Dallas excluded (dangling FA silver symlink, needs an AWS re-pull). A standing coverage_recovery_queue.csv (275 jurisdictions / 9.20 M SFH) lists geographies whose measured evidence contradicts a conservative decision — the non-circular input to the 2.4 elbow (S5). Dashboard wiring of coverage_decision_reason still pending.
steps/03_geographic_coverage.md
As-of enrichment — property state at T0 DONE (3 counties) 2026-05-21
event_date ≤ T0 with 0-violation assertion; behavioral recency features (0 negative = no future dates). 57-column enriched_full.parquet per anchor. Two halves:
- Local (build_enrichment.py) — FA physical (year_built, property_age@T0, lot/living/building sqft) + prior-permit-history roofing core (roof_age_months, n_wholeroof≤T0, n_roof_repair_24m, n_hvac/solar). 70-76 % carry a prior whole-roof permit.
- Behavioral (build_enrichment_behavioral.py) — unblocked by the silver-data-lake IAM grant; reuses the FA REM feature vocabulary (the data that fed Apollo, NOT the Apollo model). 23 distress flags + count, owner/occupancy (absentee/high_equity/vacant/is_listed), point-in-time valuation (AssdValue/MarketValue/CurrentAVM), leverage (CLTV/LTV/DaysOwnership), recency (months_since_prev_sale, mortgage_age — §7 audit-pending). REM-matched 96-99 %, any-distress 6-10 %, median AVM $396K.
roof_age × months_since_hurricane interaction.
audits/2026-05-21_enrichment.md (local)
audits/2026-05-21_enrichment_behavioral.md (REM)
The training unit is the pair (property_id, T0) — not "a property with an event". Features anchor on T0 (the prediction moment) plus trend anchors T0−3 / T0−6; every feature value must be knowable at T0. The permit event never sets the feature anchor — it is only read forward from T0 to set the label (Step 4): y = 1 iff a qualifying roof event in (T0, T0+6]. As-of join takes the latest FA / Silver REM snapshot ≤ each anchor.
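A minimal sketch of the anchor set and the as-of pick, assuming T0 is a month-end date and that each snapshot date marks when that snapshot's values became knowable (the vintage question listed under decisions to lock). Function names are hypothetical.

```python
import calendar
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Sequence

def shift_month_end(anchor: date, months: int) -> date:
    """Month-end date `months` away from the anchor's month."""
    idx = anchor.year * 12 + (anchor.month - 1) + months
    year, month0 = divmod(idx, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, last_day)

def anchor_set(t0: date) -> List[date]:
    """T0 plus the trend anchors T0-3 and T0-6."""
    return [t0, shift_month_end(t0, -3), shift_month_end(t0, -6)]

def as_of_snapshot(snapshot_dates: Sequence[date], anchor: date) -> Optional[date]:
    """Latest snapshot dated <= anchor; snapshot_dates must be sorted ascending."""
    i = bisect_right(snapshot_dates, anchor)
    return snapshot_dates[i - 1] if i else None

snapshots = [date(2024, 10, 15), date(2025, 1, 15), date(2025, 4, 15), date(2025, 7, 15)]
picked = [as_of_snapshot(snapshots, a) for a in anchor_set(date(2025, 5, 31))]
```

For T0 = 2025-05-31 the July snapshot is never picked, which is the point of the as-of join.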
Leakage rule. "Past data" is not automatically safe. A feature is leaky if dated after that row's T0, even if still before today. The earlier "T-1 too close" framing was wrong: a refi one month pre-permit is a real, serve-time-available signal, not leakage — true leakage is a post-T0 feature or an FA snapshot vintage backfill. No artificial freshest-month exclusion.
Repair vs replacement. Different drivers — handled distinctly. Positive label = roof replacement (permit_action ∈ {REPLACEMENT, AMBIGUOUS}); REPAIR-only permits do not count as positives (they dilute the target). Prior repair history (n_roof_REPAIR_24m) is carried as a feature — a roof patched twice is near end of life.
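A minimal sketch of the label under these rules, assuming T0 is a month-end so the window (T0, T0+6] can be tested on month indices. The function and constant names are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple

POSITIVE_ACTIONS = {"REPLACEMENT", "AMBIGUOUS"}

def month_index(d: date) -> int:
    """Months since year 0, used to compare calendar months."""
    return d.year * 12 + d.month - 1

def roof_label(t0: date,
               roof_events: Iterable[Tuple[date, str]],
               horizon_months: int = 6) -> int:
    """1 iff a replacement-type roof event falls in (T0, T0 + horizon]."""
    t = month_index(t0)
    for event_date, action in roof_events:
        in_window = t < month_index(event_date) <= t + horizon_months
        if in_window and action in POSITIVE_ACTIONS:
            return 1
    return 0  # REPAIR-only permits and out-of-window events are not positives

t0 = date(2025, 5, 31)
```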
Hurricane signal. Primary exogenous driver — storm → insurance claim → reroof, lagged months. Carry months_since_hurricane / in_hurricane_corridor; the lift is in the roof_age × months_since_hurricane interaction (seeded in Step 5).
Feature source. Internal feature families — property physical (incl. year built, roof age), distress, equity, owner state, sale recency, tenure — are reused from the Apollo macro-model feature builder. Shared dependency is the feature pipeline, NOT the Alpha model (roofing does not compete against Alpha). Reused features re-anchor to the roofing T0 and inherit the CLAUDE.md §7 leakage audit.
Decisions to lock: anchor set ({T0, T0−3, T0−6} · add T0−12?) · FA snapshot vintage audit (point-in-time integrity — gates whether near-T0 features need a guard) · negatives matched on month + FIPS, K-ratio (Step 4 cross-link).
steps/12_as_of_enrichment.md
decisions/2026-05-21_step3_t0_anchor.md
Walk-forward folds & negative sampling PARAMS LOCKED 2026-05-21
Locked: horizon H = 6 months (label window (T0, T0+6]) · embargo 6 months = H (a shorter embargo leaves adjacent folds' label windows overlapping) · fold cadence 6 months (eval windows disjoint → honest variance estimate).
Fold layout — backward from the latest observation. latest_obs = most recent month with complete data. Fold x eval anchor E_x = latest_obs − 6·x; eval window (E_x, E_x+6]; fold x trains on all rows with T0_train ≤ E_x − 6. The first validatable fold is x=1 (latest_obs − 6) — latest_obs itself is the production scoring point, not a fold. Run as many folds as history allows (~6-8); fold x=1 is the most production-like. e.g. latest_obs = May 2026 → fold 1 eval (Nov 2025, May 2026], fold 2 (May 2025, Nov 2025], …
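A minimal sketch of the backward fold layout, working on (year, month) pairs. The dictionary keys are hypothetical names; the arithmetic follows the definitions above (E_x = latest_obs − 6·x, eval window (E_x, E_x+6], train on T0 ≤ E_x − 6).

```python
from typing import Dict, List, Tuple

YearMonth = Tuple[int, int]

def shift(ym: YearMonth, months: int) -> YearMonth:
    """Move a (year, month) pair by a signed number of months."""
    idx = ym[0] * 12 + (ym[1] - 1) + months
    year, month0 = divmod(idx, 12)
    return (year, month0 + 1)

def fold_layout(latest_obs: YearMonth, n_folds: int, step: int = 6) -> List[Dict]:
    """Folds x = 1..n_folds, counted backward from latest_obs."""
    folds = []
    for x in range(1, n_folds + 1):
        e_x = shift(latest_obs, -step * x)
        folds.append({
            "fold": x,
            "eval_after": e_x,               # exclusive lower bound
            "eval_through": shift(e_x, step),  # inclusive upper bound
            "train_max_t0": shift(e_x, -step), # embargo = 6 months
        })
    return folds

folds = fold_layout((2026, 5), n_folds=2)
```

With latest_obs = May 2026 this reproduces the example: fold 1 evaluates (Nov 2025, May 2026] and fold 2 evaluates (May 2025, Nov 2025].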
Negatives. Case-control: per positive, K negatives at the same FIPS + T0, from the coverage-INCLUDE SFH universe. Positive label = roof replacement (permit_action ∈ {REPLACEMENT, AMBIGUOUS}); a property supplies positive and negative rows at different T0s (discrete-time panel). K pending base-rate measurement.
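A minimal sketch of the matched sampling, assuming the candidate pool is already restricted to coverage-INCLUDE SFH that are not positive at that T0. K is left as a parameter because it is still pending; names and the seed handling are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (fips, t0)

def sample_negatives(positives: List[Tuple[str, str, str]],
                     candidate_pool: Dict[Cell, List[str]],
                     k: int,
                     seed: int = 0) -> List[Tuple[str, str, str, int]]:
    """For each positive (property_id, fips, t0), draw up to k negatives from the same cell."""
    rng = random.Random(seed)
    rows = []
    for property_id, fips, t0 in positives:
        pool = [c for c in candidate_pool.get((fips, t0), []) if c != property_id]
        for negative_id in rng.sample(pool, min(k, len(pool))):
            rows.append((negative_id, fips, t0, 0))  # label 0
    return rows

positives = [("p1", "12103", "2025-05")]
pool = {("12103", "2025-05"): ["p1", "n1", "n2", "n3", "n4"]}
negatives = sample_negatives(positives, pool, k=3)
```

Draws are without replacement per positive; two positives in the same cell can still share a negative, which a production sampler may want to prevent.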
Held-out final test stays carved out. The roofing model is judged on a walk-forward backtest (Step 9): lift over a random list of equal size. Baseline = the random list; roofing does not compete against Alpha (the "March 2025 vs Alpha" gate in CLAUDE.md is the macro sales model). Dev folds must not touch the backtest window's cells ± embargo. Open: pick a window old enough that its permit data is fully settled.
Decisions to lock: K negatives per positive (measure base rate first) · backtest window — old enough for settled permit data · positives-per-(FIPS, fold) count feeding the Step 6 per-FIPS-vs-global call.
steps/07_walk_forward_folds.md
Synthetic features · Internal × External SPEC · audited 2026-05-21
Feature columns on the enriched (property_id, T0) anchors from Step 3. Internal = property + owner (property physical incl. year built / roof age, distress, equity, tenure, sale recency, prior-permit history) — reused from the Apollo feature builder, not the Alpha model. External = market + environment.
Feature selection — expert pre-filter, then let the model judge. ~320 candidate columns need disciplined selection. Three steps: (1) expert pre-filter — drop obvious non-signal + no-variability columns, high confidence only; keep the "plausible but I doubt it" bucket. (2) quick model + permutation importance — multivariate, captures interactions; do NOT filter on univariate correlation, which misses roof_age × hurricane-type effects; drop importance ≈ 0. (3) train on the survivors. The distress family (~276 columns) is the main pre-filter target — collapse to a count + one trend + top-N flags. Drop the 2nd-diff for the first model.
External features split by whether they vary within (FIPS, T0). Step 4 matches negatives in the same FIPS + T0, so any feature constant across that cell — county macro (FRED rates, HPI, unemployment), a county-level hurricane flag — is identical for the positive and its negatives; the case-control sampling removes its signal entirely. Those do NOT go in the per-FIPS model — they belong at calibration / threshold tuning (Step 7 / 9). Useful External = features that vary within the cell: sub-county neighbour permit density, parcel-level storm exposure.
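A minimal sketch of a check for this, assuming the training rows are dictionaries carrying fips, t0 and the candidate features. It returns the features that never vary inside any (FIPS, T0) cell, which are the ones the matched sampling cancels out. The function name and toy columns are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

def constant_within_cell(rows: Iterable[Dict], feature_names: List[str]) -> List[str]:
    """Features that take a single value inside every (fips, t0) cell."""
    seen = {f: defaultdict(set) for f in feature_names}
    for row in rows:
        cell = (row["fips"], row["t0"])
        for f in feature_names:
            seen[f][cell].add(row[f])
    return [f for f in feature_names
            if all(len(values) == 1 for values in seen[f].values())]

rows = [
    {"fips": "12103", "t0": "2025-05", "mortgage_rate": 6.8, "parcel_peak_wind": 71.0},
    {"fips": "12103", "t0": "2025-05", "mortgage_rate": 6.8, "parcel_peak_wind": 94.0},
    {"fips": "12101", "t0": "2025-05", "mortgage_rate": 6.8, "parcel_peak_wind": 55.0},
    {"fips": "12101", "t0": "2025-05", "mortgage_rate": 6.8, "parcel_peak_wind": 55.0},
]
wasted = constant_within_cell(rows, ["mortgage_rate", "parcel_peak_wind"])
```

In this toy data only the macro rate is flagged; the parcel-level wind varies within one cell and survives.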
Hurricane must be parcel-level — peak wind / distance to track at the parcel, not a county-month flag (a county flag is wasted under T0-matched case-control). Neighbour permit density is a leakage trap — the rolling window must end at T0 and exclude the subject property.
Interactions — let the model do it. Gradient-boosted trees learn interactions automatically; hand-coding a boolean AND is redundant. Skip hand-coded interactions for the first model; reserve hand-coding for ratios.
Decisions to lock: feature budget number (after Step 4 reports positives-per-FIPS) · distress collapse set · insurance / claim climate source — sub-county grain, highest-value gap · FA ROOF_COVER availability for roof material.
steps/08_synthetic_features.md
Model training SPEC · audited 2026-05-21
One global model, not per-FIPS by default. Roof replacement is a rare event — a per-FIPS model on a small county (~100-300 positives/fold) re-learns the basics from thin data, while a global model learns them once from millions of rows. Default = one global gradient-boosted model with FIPS + region as features (the tree carves FIPS-specific behaviour where data supports it, pools where it does not). A dedicated per-FIPS model is a carve-out, justified only on evidence — not tied to the 5 gold-tier dev FIPS.
Algorithm. LightGBM (native categorical for FIPS) — confirm against HistGB / logistic on a roofing-specific arch sweep. The prior arch matrix was the sales model, not roofing.
Metrics — on the true base rate, not the case-control eval. The Step 4 eval is 1:K downsampled; AUC-PR read off it is inflated and top-decile recall is optimistic. Primary metrics must be computed on a holdout with the true population base rate. Success = lift over a random list of equal size on a walk-forward backtest (Step 9) — roofing does not compete against Alpha, the REI sales heuristic.
No re-weighting. The training table arrives already downsampled 1:K from Step 4 — Step 6 does not re-sample or re-weight; prior correction is Step 7's job. Tune hyperparameters once on pooled data and reuse — not a fresh 50-trial search per FIPS (overfits thin data). Early-stopping validation is carved from train, never the eval window.
Decisions to lock: global+FIPS-feature vs per-FIPS carve-out (+ row-count threshold) · algorithm (after a roofing arch sweep) · true-prior holdout for metric reporting · backtest window for the held-out evaluation (Step 9).
steps/09_model_training.md
Calibration — turn rankers into forecasters SPEC · audited 2026-05-21
Map raw model scores to probabilities that mean what they claim. Critical — calibrate against the true base rate, not the case-control prior. The model is trained on the Step 4 table downsampled 1:K, so its scores reflect a ~1 % sample prior, not the real (rare) population rate. Calibrating against a slice of that 1:K pool yields probabilities wrong by 1-2 orders of magnitude ("23 %" when the truth is ~0.3 %). Fix: calibrate on a true-prior hold-out, or apply an explicit log-odds prior correction. This is the prior-correction Step 4 and Step 6 defer here — measure the true base rate, do not assume it.
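A minimal sketch of the explicit log-odds prior correction, assuming the only difference between the training sample and the population is the downsampling of negatives. The priors used below are illustrative placeholders, not measured values; the true base rate still has to be measured.

```python
def correct_prior(p_sample: float, sample_prior: float, true_prior: float) -> float:
    """Rescale a probability learned under sample_prior to true_prior (odds shift)."""
    if not 0.0 < p_sample < 1.0:
        return p_sample  # 0 and 1 are fixed points of the odds rescaling
    odds = p_sample / (1.0 - p_sample)
    prior_ratio = (true_prior / (1.0 - true_prior)) / (sample_prior / (1.0 - sample_prior))
    corrected_odds = odds * prior_ratio
    return corrected_odds / (1.0 + corrected_odds)

# Illustrative: a 1:4 case-control table (sample prior 0.20) and a 0.3 % true rate.
average_row = correct_prior(0.20, sample_prior=0.20, true_prior=0.003)
high_row = correct_prior(0.80, sample_prior=0.20, true_prior=0.003)
```

A row scored at the sample prior maps back to the true prior, and the ranking is unchanged because the mapping is monotonic.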
Method + scope. Platt (sigmoid) default — stable on thin data; isotonic is data-hungry, use only where the calibration sample is large. Calibrate globally (matches the global model from Step 6) — per-FIPS isotonic on ~1,421 counties starves on data; split by region only if markets genuinely differ.
Acceptance — measured on the true-prior population: calibration error within a relative tolerance (absolute percentage points are the wrong frame for a rare event), reliability diagram monotonic, Brier improves. A region that fails downgrades to "ranker only — no probability claim" (score still ranks; no fake probability shown). The ±15 % in the legacy sketch is the sales model's number — set roofing's own.
Decisions to lock: measured true base rate (shared with Step 4) · roofing tolerance (value + relative vs absolute) · scope global vs per-region · method.
steps/10_calibration.md
Output filter / sanity GATE SPEC · audited 2026-05-21
The 25K-list incident (finding 67) is fixed here, at the gate, not in the label. After calibration, apply hard rules: recent-roof exclusion (last_ROOFING_age < 15 yr), VENDOR_BLIND (property outside an INCLUDE tuple — Step 2), dedup (one mailpiece per household), confidence (combines coverage match-rate + calibration).
The gate is also a model-health detector — by design. The 15-year exclusion is intentional: if the model scores a recently-reroofed property high, the model is probably wrong. The gate catches it before the client sees it — and the catch itself is the signal. So the gate's firing rate is a monitored metric (Step 9), not silent plumbing — a rising count of high-scored recent roofs means fix the model, not the gate. A tighter very-recent band (~2-3 yr + high score) is the sharpest model-error tell, tracked separately.
Never overwrite the score. calibrated_probability is kept intact for every row; the gate only sets delivery_eligible + delivery_reason_excluded. That is what lets the recent-roof catch tell us the model is wrong — "model said 0.8, gate excluded for RECENT_ROOF". Overwriting the score to 0 / NULL would erase that evidence.
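A minimal sketch of a gate that annotates rows without touching the score. It covers only the recent-roof and VENDOR_BLIND rules; dedup and confidence are left out. The row type and field names other than calibrated_probability, delivery_eligible and delivery_reason_excluded are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class ScoredRow:
    property_id: str
    calibrated_probability: float
    last_roofing_age_years: Optional[float]  # None = no prior roof permit seen
    in_include_tuple: bool
    delivery_eligible: bool = True
    delivery_reason_excluded: Optional[str] = None

def apply_gate(row: ScoredRow, recent_roof_years: float = 15.0) -> ScoredRow:
    """Set the delivery flags; calibrated_probability is never modified."""
    if not row.in_include_tuple:
        reason = "VENDOR_BLIND"
    elif (row.last_roofing_age_years is not None
          and row.last_roofing_age_years < recent_roof_years):
        reason = "RECENT_ROOF"
    else:
        return row
    return replace(row, delivery_eligible=False, delivery_reason_excluded=reason)

gated = apply_gate(ScoredRow("p1", 0.8, last_roofing_age_years=3.0, in_include_tuple=True))
```

The gated row still reads "model said 0.8, gate excluded for RECENT_ROOF", which is the evidence the monitoring in Step 9 counts.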
Decisions to lock: very-recent alarm band threshold + high-score cutoff · dedup unit (household, not mailing address — handle multi-property owners) · 3-level confidence definition · soft vs hard filter at the 15-yr boundary.
steps/11_output_filter.md
Deployment + monitoring SPEC · audited 2026-05-21
Value demonstration — the walk-forward backtest. The client-facing proof: stand at a past T0, train on ≤ T0, generate the list we would have delivered, then check the (T0, T0+6] outcome. "If you'd been a client in January, here's the list — of all the roofs Jan–Jun, it caught X %." Honest only with strict T0 discipline and a backtest window old enough that permit data is fully settled (permits lag — a too-recent window undercounts).
The value metric. Recall (capture rate) at a fixed list size N — recall alone is gamed by enlarging the list. The honest headline is lift over a random list of equal size: "our 15 000-property list caught 85 % of the roofs — 11× better than random." Show precision@N too (the client works the list) and a gains curve so the client picks N to capacity. Baseline = the random list — this closes the held-out-test open item; roofing does not compete against Alpha.
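A minimal sketch of the three numbers for a list of size N, assuming scores and permit outcomes for the full scored population. Lift is precision@N divided by the population base rate, which is what a random list of the same size would be expected to achieve. Names are hypothetical.

```python
from typing import Dict, List, Tuple

def list_metrics(scored: List[Tuple[float, int]], n: int) -> Dict[str, float]:
    """Recall, precision and lift-vs-random for the top-n rows by score."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    n = min(n, len(ranked))
    total_positives = sum(label for _, label in ranked)
    hits = sum(label for _, label in ranked[:n])
    base_rate = total_positives / len(ranked)
    precision = hits / n
    return {
        "recall_at_n": hits / total_positives,
        "precision_at_n": precision,
        "lift_vs_random": precision / base_rate,
    }

# Toy population: 10 properties, 2 reroofed; the top-2 list catches one of them.
population = [(0.9, 1), (0.8, 0), (0.7, 0), (0.6, 1), (0.5, 0),
              (0.4, 0), (0.3, 0), (0.2, 0), (0.1, 0), (0.05, 0)]
metrics = list_metrics(population, n=2)
```

Sweeping n over this function gives the points of the gains curve.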
Ground truth is the roof permit, not the CRM. Measure recall / precision against new permit data (the label's own source). The CRM (Clients_Deals_RBB_V1) measures business conversion — a narrower, different outcome; track it separately. Avoid the feedback loop: score the full population each cycle, not just the delivered list, or keep a randomised control.
Monitoring — fast leading signals vs slow ground truth. Fast/daily: input-feature drift (PSI on features, not just score), score drift, the Step 8 gate firing rate. Slow/lagging: recall / precision vs permits — resolves ~6 months late, so it is confirmation, not a weekly trigger. Thresholds tune per-region on the calibrated probability (calibration already fixed score incomparability — no per-FIPS needed). Hybrid retraining: 90-day floor + fast-signal ceiling.
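A minimal sketch of PSI on one binned feature, assuming the bin edges were fixed on the reference (training) period and both periods are supplied as per-bin counts. The epsilon floor for empty bins is an assumption of this sketch, and alert thresholds are left open.

```python
import math
from typing import Sequence

def psi(expected_counts: Sequence[float],
        actual_counts: Sequence[float],
        eps: float = 1e-6) -> float:
    """Population Stability Index: sum((a - e) * ln(a / e)) over shared bins."""
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_frac = max(e / expected_total, eps)  # floor avoids log(0)
        a_frac = max(a / actual_total, eps)
        total += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return total

stable = psi([50, 50], [500, 500])
shifted = psi([50, 50], [80, 20])
```

Because it works on fractions, the two periods can have different row counts.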
Decisions to lock: backtest window (settled permit data) · debias method (full-population scoring vs randomised control) · alert routing / on-call.
steps/13_deploy_monitor.md
Status summary
| Step | Sub-step | Status | Blocker |
|---|---|---|---|
| STEP 1 · LABELING | 1.1.1 Schema dump (54 cols · 31 FIPS) | DONE | — |
| | 1.1.2 Null + cardinality | DONE | 1.1.1 |
| | 1.1.3 Role assignment | DONE | 1.1.2 |
| | 1.2 Permit Gold taxonomy — permit_scope classifier v5.3.3 | DONE | 1.1.3 |
| | 1.4 Labels parquet output contract | DONE | 1.2 validation |
| | 1.5 Canonical event date rule | FROZEN | — |
| STEP 2 · COVERAGE | 2.1 Vendor CSV ingested | DONE | — |
| | 2.2 Match-rate diagnostic (national) | DONE | Refresh after 1.1 rerun |
| | 2.3 Decision tree (gates 1-4) + 2.5 output contract | DONE | coverage_decisions.parquet materialized |
| | 2.4 Muni-level threshold elbow | TODO · binding | Blocked: needs backtest → Steps 4-9 |
| STEP 3 · ENRICHMENT | As-of T0 anchor + FA / Silver REM joins | SPEC · design locked 2026-05-21 | Impl: Steps 1 + 2 frozen |
| STEP 4 · FOLDS | Walk-forward folds (H=6m, embargo=6m, cadence=6m) + negatives | SPEC · params locked 2026-05-21 | Impl: Step 3 frozen |
| STEP 5 · FEATURES | Internal × External · feature budget · case-control-aware External split | SPEC · audited 2026-05-21 | Impl: Step 4 frozen |
| STEP 6 · TRAINING | Global GBM + FIPS feature · true-prior metrics | SPEC · audited 2026-05-21 | Impl: Step 5 frozen |
| STEP 7 · CALIBRATION | Global calibrator · true-prior correction | SPEC · audited 2026-05-21 | Impl: Step 6 frozen |
| STEP 8 · OUTPUT GATE | Recent-roof gate (model-health detector) · VENDOR_BLIND · dedup | SPEC · audited 2026-05-21 | Impl: Step 7 frozen |
| STEP 9 · DEPLOY + MONITOR | Walk-forward backtest (lift vs random) · permit-truth monitoring · retraining cadence | SPEC · audited 2026-05-21 | Impl: Step 8 frozen |
Next-action queue
DS audit follow-ups (2026-05-21) — 5 SERIOUS findings, all APPLIED
From the Step 1 + Step 2 audits. Detail: step1 audit · step2 audit.
- DONE S1 · independent recall check. measure_recall_independent.py — recall vs BuildZoom PROJECT_TYPE = 96.3 % (was a circular "≈ 99 %"). Artifact. Remaining: hand-grade the ~198 K FN cell + a v5-not-ROOFING random sample (true ground truth).
- DONE S2 · event_date relocated. steps/06 rewritten — the rule lives in the Step 3/4 permit reader, read forward from T0 only; legacy roofing_label.py marked legacy.
- DONE S3 · T0-relative coverage — spec'd. coverage_inclusion_rules.md / steps/03 / steps/07 now mandate coverage_decision(tuple, T0). Remaining: the per-fold-anchor materialization is code, to land before Step 4 consumes the parquet.
- DONE S4 · match_rate fixed. Gate 3 is now FLAG-only (never hard-EXCLUDE) — code shipped + re-run. Remaining: the v5.3.3 numerator refresh needs the Step 1.1 national rerun (AWS — REGLA DE ORO).
- DONE S5 · selection-bias trap broken. coverage_recovery_queue.csv emitted every run (274 jur / 9.17 M SFH); the 2.4 backtest re-scoped to evaluate the full SFH population, not INCLUDE-only.
Next week
- Design the backtest threshold sweep (held-out period, success metric).
- BZ counterpart weekly call (Camilo intro pending) — clarify "Some" semantics.
- Wire coverage_decision_reason through to evidence dashboard muni profile.
- Spec build_label_universe.py + build_delivery_list.py consumers.
Blocked / pending external
- 1.1 national rerun — AWS confirm (REGLA DE ORO) + Otata scaling.
- 2.1 Camilo → BZ counterpart intro (BZ contact on vacation this week).
- Refund proposal numbers — pending coverage decision retro-applied to prior 25K list.
Champion audit · v22 / V22.1
The 9 steps above are the rebuild spec (implementation not started). This section is about the incumbent the rebuild will replace: v22_super_fl7_v3, the shipped production champion behind the CallZeke lists (180 features, FL-7, OO-strict + Individual-only). In May 2026 it was audited against the 7-failure-families roof-pipeline-audit doctrine — leakage, case-control prior, p≫n explosion, metric validity, selection-bias feedback, scope confusion. Seven parallel findings (F1-F7). Bottom line: the champion is sound; one serious leakage risk (F3) is unresolved and blocks the one validated upgrade (V22.1) from shipping.
Detail: v2_model/README.md · V22.1_DEPLOYMENT.md (decision matrix) · V2_MULTISEED.md (3-way 3-seed) · F5_RESULTS.md
Seven-finding verdict
| # | Finding | Verdict | What it means |
|---|---|---|---|
| F1 | roof_age_months_est leakage (#1 feature, 829k gain) | CLEAN | Computable from T0 snapshot alone. No fix needed. |
| F2 | area_* neighborhood-window features | CLEAN | Windows respect T0 boundary. No fix needed. |
| F3 | FA snapshot vintage (Real_Estate_Master_V1 partition semantics) | SERIOUS · BLOCKER | 10-40 mo forward-leak risk on 11 features incl. roof_cover_code (#2). Awaiting data-team confirmation of partition vintage. Blocks V22.1 promotion + F5 replacements. |
| F4 | canceled-roof family (row support) | DENSIFY | Replaced 3 sparse cols with dense canceled_roof_recent_12m bool. Lift-neutral (doesn't crack top-40). Shipped. |
| F5 | fips + nearest_storm_name memorization | PARTIAL · KEEP | Dropping both tightens cross-fold variance −13.8% but costs −1.6% mean lift. They encode time-varying real signal (hurricane corridors, county insurance regime), not pure memorization. Keep; replace with continuous (county_base_rate_36m, drop storm name keep physics). |
| F6 | roof-age redundancy (r=1.0 between #1 and #3) | RETRACTED | Theory correct (correlation real) but dropping the 5 "redundant" cols cost 2-4% lift on multi-seed — LightGBM used them as marginal split candidates. Do not drop by correlation alone. |
| F7 | insurance-pressure design (renewal cycle) | VALIDATED | F7-b roof_age_at_purchase_anniv_y12 ranks top-5 every seed; recovers ~0.7-0.8 lift pts. Ships as the V22.1 add. F7-a falsified, F7-c deferred. |
V22.1 = v22 + F7-b · 6-fold cross-val (FL-7 OO-strict, eval anchors 2023-05 → 2025-10)
Statistical tie with a structural era-split. Mean lift@15K 8.40× (V22.1) vs 8.48× (v22), within seed noise. But hybrid wins all 3 older folds; v22 wins all 3 recent folds. Hybrid carries 14% tighter cross-fold variance (stdev 1.024 vs 1.184) — more predictable list quality. The recent-fold loss is the F3 leak surfacing: F7-b reads CurrentSaleRecordingDate, whose FA-commit-vs-T0 gap widens on recent anchors.
| Eval T0 | v22 L@15K | V22.1 L@15K | Δ | Era |
|---|---|---|---|---|
| 2025-10-31 (prod eval) | 9.68× | 9.51× | −1.7% | Recent · v22 +2.7% |
| 2025-05-31 | 9.96× | 9.70× | −2.6% | |
| 2024-11-30 | 8.75× | 8.41× | −3.8% | |
| 2024-05-31 | 7.00× | 7.16× | +2.3% | Older · V22.1 +1.2% |
| 2023-11-30 | 7.55× | 7.63× | +1.0% | |
| 2023-05-31 | 7.97× | 7.99× | +0.3% | |
| mean | 8.48× | 8.40× | −1.0% | stdev 1.184 → 1.024 (−14%) |
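The summary row can be re-derived from the six per-fold lifts. The sketch below recomputes it from the rounded table values, assuming the reported stdev is the sample standard deviation (n − 1 denominator); small differences in the third decimal are expected because the inputs are rounded.

```python
from statistics import mean, stdev

# lift@15K per eval anchor, newest first, copied from the table above
v22 = [9.68, 9.96, 8.75, 7.00, 7.55, 7.97]
v22_1 = [9.51, 9.70, 8.41, 7.16, 7.63, 7.99]

def summarize(lifts):
    """Cross-fold mean and sample standard deviation."""
    return {"mean": mean(lifts), "stdev": stdev(lifts)}

base, hybrid = summarize(v22), summarize(v22_1)
mean_delta_pct = (hybrid["mean"] / base["mean"] - 1) * 100
stdev_delta_pct = (hybrid["stdev"] / base["stdev"] - 1) * 100
v22_1_fold_wins = sum(h > b for h, b in zip(v22_1, v22))
```

On these inputs V22.1 wins 3 of the 6 folds (the three older anchors), the mean moves by about −1 % and the cross-fold stdev tightens by about 14 %, matching the table.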
v2_oostrict (full F4+F6+F7) RETRACTED. 3-way 3-seed at anchor 2025-10-31: v22 9.60±0.15× · V22.1 9.46±0.08× · v2_oostrict 9.38±0.06× (L@15K). v2_oostrict has separated losses (−3.9% L@10K, −2.2% L@15K) — the F6 drops were net-negative. The earlier "v2 wins +12.6% AUC-PR vs v21" framing is a universe-mismatch artifact (FL-3, different filter version) and is superseded.
Trim feature-set · finding 85 (concurrent · super model)
Separate experiment on the FL-6/FL-7 super model, validated + committed (76f13b0). Tested whether "lifestyle" permits (POOL/DECK/GARAGE/WINDOWS/FENCE — owner-enjoyment capex as OO-proxy + renovate-then-sell signal) and FA garage cols help. The full 11-feat lifestyle set was near-flat; the 3-feature trim won: n_pool_permits_lifetime + garage_sqft + last_lifestyle_permit_months. Dropping the 11 noise feats freed LightGBM split capacity for the 3 orthogonal signals (noise-saturation effect).
| Arm | AUC-PR Δ | lift@5K Δ | lift@15K Δ |
|---|---|---|---|
| Lifestyle (11 feats) | +3.4% | — | flat |
| Garage (4 feats) | +1.7% | — | −0.5% |
| Trim (3 feats) | +4.5% | +6.8% | +3.5% |
Cross-fold confirmed (EVAL_T0=2026-04-30): trim 10.05× vs 9.32× base @15K (+7.8%), recall 17.8% → 19.2%. Feature ranks 3 / 10 / 25 of 201 (7.03% gain). Caveat: the win is N=15K-specific; the AUC-PR % is noise-inflated (287 eval positives). Open: FL-7 super_v3 go/no-go (~9h retrain). Detail: finding 85.
Production decision — stays v22 until F3 lands
- NOW Keep v22 in production. Recent folds (what next-quarter lists look like) favor v22 by 2.7%. V22.1 is spec'd + 6-fold validated as standby.
- BLOCKER F3: resolve the Real_Estate_Master_V1 partition vintage (data-team Q). Path A (vintage-frozen) → V22.1 promotes directly. Path B (current-as-of-commit) → add a 1-day vintage gate to the hybrid build.
- THEN Re-validate post-F3. Falsifiable: V22.1 wins ≥4/6 folds, mean ≥+1.5%. If it holds → promote to V22.1. A/B test v22 vs V22.1 on a real CallZeke build, measure 30/60/90-day conversion (lift-on-labels is a proxy; business outcome is truth).
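The falsifiable re-validation criterion can be written down as a small predicate so that the post-F3 run has an unambiguous pass/fail. This is a sketch only: the function name is hypothetical, and it assumes "mean ≥+1.5%" refers to the relative change in mean lift@15K across folds (the notebook does not say whether it is that or the mean of per-fold deltas).

```python
from statistics import mean


def passes_promotion_gate(v22_lifts, v221_lifts, min_wins=4, min_mean_gain=0.015):
    """Return True if V22.1 clears both promotion conditions.

    Assumption: mean gain = mean(V22.1 lifts) / mean(v22 lifts) - 1.
    """
    if not v22_lifts or len(v22_lifts) != len(v221_lifts):
        raise ValueError("need one lift per fold for each model")
    wins = sum(b > a for a, b in zip(v22_lifts, v221_lifts))
    mean_gain = mean(v221_lifts) / mean(v22_lifts) - 1.0
    return wins >= min_wins and mean_gain >= min_mean_gain


# Pre-F3 numbers from the 6-fold table: 3/6 wins and a slightly negative mean.
pre_f3 = passes_promotion_gate(
    [9.68, 9.96, 8.75, 7.00, 7.55, 7.97],
    [9.51, 9.70, 8.41, 7.16, 7.63, 7.99],
)
print("promote V22.1 on pre-F3 numbers:", pre_f3)
```

On the current table the gate fails on both conditions, which is consistent with the decision to keep v22 in production until F3 is resolved.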
References
Markdown files (AI-detail layer) — every link in the boxes above resolves to one of these:
| File | Step | Content |
|---|---|---|
| steps/01_variable_inventory.md | 1.1 | Variable inventory · role assignment (LABEL_SIGNAL / FEATURE / METADATA / DROP) |
| steps/02_type_subtype_mece.md | 1.2 | Type + subtype MECE rule table · priority cascade · anti-FP guards |
| data/gold_variable_inventory.md | 1.1.3 | Master per-column role table + per-FIPS null/cardinality matrix (output of 1.1) |
| steps/01_bz_roof_filter.md | 1.1 (archived) | Pre-restructure binary classifier spec · kept read-only as historical reference |
| steps/02_permit_category_mece.md | 1.2 (archived) | Pre-restructure 14-cat enum spec · kept read-only as historical reference |
| steps/04_repair_vs_replace.md | 1.3 (folded) | Pre-restructure repair/replace sub-class · now folded into permit_subtype in 1.2 |
| steps/05_labels_output_contract.md | 1.4 | Platinum parquet schema · spec_v versioning policy |
| steps/06_event_date_rule.md | 1.5 | Canonical event date · anti-corruption guard · T-3 anchor |
| steps/03_geographic_coverage.md | 2.x | Decision tree, output schema, hard rule (false-negatives) · referenced from 2.1 / 2.2 / 2.4 / 2.5 |
| coverage_rules/coverage_inclusion_rules.md | 2.3 | Canonical inclusion gates · sub-muni veto list |
| steps/12_as_of_enrichment.md | 3 | T0-relative anchor · as-of join method · leakage definition · repair/replace · hurricane · design locked 2026-05-21 |
| steps/07_walk_forward_folds.md | 4 | H=6m · embargo=6m · cadence=6m · backward fold layout · case-control negatives · params locked 2026-05-21 |
| steps/08_synthetic_features.md | 5 | Internal × External · feature budget · case-control-aware External split · interactions · audited 2026-05-21 |
| steps/09_model_training.md | 6 | Global model + FIPS feature · algorithm · true-prior metrics · no re-weighting · audited 2026-05-21 |
| steps/10_calibration.md | 7 | True-prior calibration · Platt default · global scope · ranker-only downgrade · audited 2026-05-21 |
| steps/11_output_filter.md | 8 | Recent-roof gate as model-health detector · score never overwritten · VENDOR_BLIND · dedup · finding 67 fix · audited 2026-05-21 |
| steps/13_deploy_monitor.md | 9 | Walk-forward backtest value metric (lift vs random) · permit-truth closed loop · feature drift · retraining · audited 2026-05-21 |
| decisions/2026-05-20_labeling_before_coverage.md | n/a | ADR · rationale for the 2026-05-20 renumber (labeling before coverage) |
| decisions/2026-05-20_step1_restructure_variables_and_type_subtype.md | 1.1 / 1.2 | ADR · 2026-05-20 Step 1 restructure into variables + type+subtype MECE · dev-layer strategy |
| decisions/2026-05-21_step3_t0_anchor.md | 3 | ADR · 2026-05-21 Step 3 T0-relative anchor · leakage definition · negatives · repair/replace · hurricane |
| INDEX.md | n/a | Folder map + status table mirror |
Legacy pipeline HTML. The pre-renumber pipeline doc lives at notes/roofing_model_design/roofing_pipeline_scientific_method_2026-05-19.html. Marked ARCHIVED at the top of the doc; retained read-only as the historical sketch source for the Step 3-9 stubs above. Do not edit; migrate detail into the corresponding steps/*.md file as each phase becomes active.
Deep-dives · further reading
Documents that are not the source of truth for any pipeline step but that you should read to understand the data + ETL layers underneath the steps. Keep these handy when you want to dig past the "what" into the "how" and "why".
BuildZoom ETL walkthrough (2026-05-19)
The data's journey from raw provider CSVs through Bronze → Silver → Gold, including the FA-address match cascade that produces the rows our Step 1 classifier reads. Authoritative for understanding where the data we label actually comes from.
| Section | What it covers | Relevant to pipeline step |
|---|---|---|
| §1 Architecture | Swim-lane view of the ETL · all four Bronze tables · joins | Background for all steps |
| §2 Bronze | Raw ingestion · four tables · one job each | Upstream of Step 1 (labels read Bronze-derived rows) |
| §3 Silver | Fact-table join · 3-join recipe | Upstream of Step 1 + Step 3 enrichment |
| §4 Gold | FA address matching · 3-condition cascade | Direct input to Step 1.1 + Step 2 match-rate |
| §4b Temporal contract | event_date + observation_period rule | Direct source for Step 1.5 canonical event date rule |
| §5 QA gate | Same-shape audit at every layer | Pattern to mirror in Step 1 audits |
| §7 Issues identified | Bugs / surprises surfaced by the audit | Worth scanning before every Step 1 / Step 2 change |
| §8 Opportunities | Proposed improvements (Bronze → Platinum) | Several proposals (permit_category enum, status timeline, canonical event_date) are now Step 1 sub-tasks |
| §8b Missing cleaning layer | Proposal for a clean layer between Bronze and Gold | Relevant to Step 1 + Step 3 enrichment design |
| §9 Where Platinum fits | Why a separate tier — our Step 1.4 output IS the Platinum tier proposal materialized | Direct source for Step 1.4 output contract |
| §11 Deep technical audit | 3 specialist passes: critical bugs · perf wins · silent-failure DQ issues | Read before any Step 1 spec change · 13 HIGH bugs catalogued |
Other deep-dive references
| Doc | What's in it |
|---|---|
| aws_operational_context.md | AWS ops rules (already linked from the AWS callout at the top of this page) · tag · partition pruning · EMR cloning · networking-don't-touch |
| findings_index.md | Cross-reference to every notes/findings/*roof*.md entry · chronological evidence log |
| audits/ | Per-step empirical audit reports (e.g. 2026-05-20_v2_classifier_duval_audit.md · 25/25 ✓) |
| hypotheses/ | H1-H4 + experimental hypotheses (new owner × old roof · climate zone × material · hurricane wave · silver equity) |
| data/ | Field dictionaries (BZ, FA, providers, external macro) |
| callzeke/ | CallZeke client engagement archive (2026-05-13 call · 5 objections diagnosed 2026-05-15 · 9 distilled L1-L9 learnings for the roofing pipeline). Ex-REI hub HTMLs converted to AI-readable markdown 2026-05-25. Start at README.md → 00_context.md → objections.md → learnings.md. |
| Memory entries | project_buildzoom_etl_audit · project_coverage_universe · project_coverage_verified · reference_aws_operational_context — agent-side facts persisted across sessions |
Suggested reading order (if you're starting fresh)
- This cuaderno end-to-end (you are here): get the 9-step macro.
- ETL §1-4 (architecture + Bronze + Silver + Gold): understand where the rows we label come from.
- ETL §4b temporal contract: why event_date = MIN(non-null status dates).
- Step 1.1 + 1.2 MDs (steps/01_bz_roof_filter.md, 02_permit_category_mece.md): the labeling rules in detail.
- ETL §11 deep audit: 13 HIGH bugs already catalogued; useful to know what's pending DE-side.
- AWS ops context: before touching any cluster.
- Step 2 (steps/03_geographic_coverage.md) + ADR: how coverage decides the training cohort.
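The temporal-contract rule cited in the reading order, event_date = MIN(non-null status dates), can be illustrated in a few lines. This is a sketch of the rule as stated, not the ETL's implementation: the function name and the example status dates (applied / issued / finaled) are hypothetical, and the real set of status columns is defined in the ETL walkthrough §4b.

```python
from datetime import date
from typing import Optional


def canonical_event_date(*status_dates: Optional[date]) -> Optional[date]:
    """Earliest non-null status date, or None when every status date is null."""
    present = [d for d in status_dates if d is not None]
    return min(present) if present else None


# Hypothetical permit with a missing 'applied' date and two populated statuses.
applied, issued, finaled = None, date(2024, 3, 1), date(2024, 6, 15)
print(canonical_event_date(applied, issued, finaled))
```

Taking the minimum over only the populated dates means a permit with a single known status still gets an event date, while a permit with no dates at all stays null rather than receiving a sentinel value.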
v22 — How it works (field guide)
Migrated from v2_model/V22_EXPLAINED.html — the v22 model field guide as a changelog entry
How the model finds the next roof to replace.
A look inside the 163-feature, 2.27-million-row machine that scores every single-family home across seven Florida counties on its odds of pulling a roof permit in the next six months.
Most of the model's brain is the same brain a twenty-year Florida roofer carries to a job site. It looks at how old the roof is, what it's made of, whether the neighbors have been pulling permits, whether a hurricane recently came through, and whether the owner is in fix-things-up mode. The model just does it at 1.10 million properties at once instead of one driveway at a time. On a list of fifteen thousand homes, it catches roof replacements at 9.7× the rate of a random draw from that full scored universe on the current production anchor, and 8.5× averaged across six historical evaluation windows.
What the model actually weights.
How well it actually works.
Of 1.10 million scored homes across the seven counties, only 12,631 (1.15%) will pull a roof permit in the next six months. A random 15,000 from that universe catches about 170. The model picks 15,000 and catches 1,665 — a 9.7× edge.
Which baseline? The 9.7× / 8.5× figures compare against a random draw from the full 1.10M scored universe. A client mails the narrower buy-box (single-family, age 10+, no roof permit in 10 years), which already screens out the obvious non-candidates — against that harder baseline the v21 reconciliation measured roughly 4–5×. Same model, smaller multiple; the denominator is what changes.
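The headline multiple follows directly from the four numbers quoted above. The sketch below reproduces it using the rounded 1.10M universe figure, so the base rate comes out fractionally different from one computed on the exact row count; the function name is illustrative.

```python
def lift_vs_random(caught: int, list_size: int, base_rate: float) -> float:
    """Hits on the model's list divided by hits expected from a random list of the same size."""
    return caught / (list_size * base_rate)


universe = 1_100_000   # scored homes (rounded figure from the text)
positives = 12_631     # homes that pull a roof permit in the next six months
list_size = 15_000
caught = 1_665         # positives on the model's top-15K list

base_rate = positives / universe           # full-universe baseline
expected_random = list_size * base_rate    # what a random 15K would catch
lift = lift_vs_random(caught, list_size, base_rate)

print(f"base rate {base_rate:.2%}, random catches ~{expected_random:.0f}, lift {lift:.1f}x")
```

Only `base_rate` changes when the comparison moves to the buy-box: a pre-screened population has a higher base rate, so the same list yields a smaller multiple.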
Six historical windows, 7-to-10× lift.
What we checked, changed, and left open.
What pulls a score down.
| Hard filter | In-model signal | In-model signal |
|---|---|---|
| Absentee owner | Recently reroofed · roof_age_months_est | Few neighbors reroofing · area_pct_wholeroofs_36m |
| Non-individual owner | Long-life tile · roof_cover_code | No recent storm · months_since_nearest_storm |
| Mobile / manufactured | Prior canceled permit · months_since_canceled_roof | Low-rate county · fips |
| Vacant or no mailable address | | |
Two stages. The first column is a hard filter — absentees, non-individual owners, mobile homes, vacants, missing EMV, and addresses that fail Smarty DPV are cut before the model ever scores. The other two columns are in-model signals whose adverse values pull a score down (directions inferred from feature behavior, not yet SHAP-audited). The 15-year recent-roof rule is a separate hard exclusion at the output gate.
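The order of operations described above (hard filter, then scoring, then the output gate) can be sketched as follows. All field and function names are hypothetical, the scoring function is a stand-in for the model, and the 15-year rule is assumed here to exclude roofs younger than 15 years. The gate drops rows without modifying their scores.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Home:
    # Hypothetical flags mirroring the hard-filter criteria in the text.
    absentee_owner: bool
    non_individual_owner: bool
    mobile_home: bool
    vacant: bool
    has_emv: bool
    passes_dpv: bool
    roof_age_years: float


def passes_hard_filter(h: Home) -> bool:
    """Stage 1: cut ineligible homes before the model ever scores them."""
    excluded = h.absentee_owner or h.non_individual_owner or h.mobile_home or h.vacant
    return not excluded and h.has_emv and h.passes_dpv


def passes_output_gate(h: Home, min_roof_age_years: float = 15.0) -> bool:
    """Output gate: separate hard exclusion for recently replaced roofs (assumed threshold)."""
    return h.roof_age_years >= min_roof_age_years


def build_list(homes: List[Home], score_fn: Callable[[Home], float], n: int) -> List[Tuple[float, Home]]:
    eligible = [h for h in homes if passes_hard_filter(h)]
    scored = [(score_fn(h), h) for h in eligible]                   # stage 2: model score
    kept = [(s, h) for s, h in scored if passes_output_gate(h)]     # score left untouched
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:n]
```

Keeping the gate as a separate step after scoring, rather than folding it into the score, means the number of high-scoring rows it removes can be tracked over time.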
Against a random draw from the scored universe the model is roughly nine-to-ten times more efficient across six historical windows (about 4–5× against the tighter buy-box a client actually mails). Physical roof age dominates (≈ 51% of signal), followed by place, owner activity, material, and capacity — mirroring the questions a twenty-year Florida roofer would ask. What the model can't see: insurance-carrier data, owner intent before permits surface, and any market outside the seven Florida counties it was trained on. Production stays on v22 until the FA-vintage question (the largest open lever) is resolved.