8020ROOF
May 2026
Field Guide

How the model finds the next roof to replace.

A look inside the 163-feature, 2.27-million-row machine that scores every single-family home across seven Florida counties on its odds of pulling a roof permit in the next six months.

Most of the model's brain is the same brain a twenty-year Florida roofer carries to a job site. It looks at how old the roof is, what it's made of, whether the neighbors have been pulling permits, whether a hurricane recently came through, and whether the owner is in fix-things-up mode. The model just does it for 1.10 million properties at once instead of one driveway at a time. On a list of fifteen thousand homes, it catches roof replacements at 9.7× the rate of a random draw from the full scored universe on the current production anchor, and at 8.5× averaged across six historical evaluation windows.

Driver clusters · LightGBM gain, aggregated by signal

What the model actually weights.

Approximate share of the top-40 features' gain, by signal cluster · 100% total
01 · Roof age & lifecycle · ≈ 51%
The dominant driver. Six features all encode "how long until this roof needs replacing" — directly (estimated roof age, months since the last whole-roof permit), as fallback (house age, year built), or as a learned threshold (the replacement-window flag). Highly correlated; individual rankings are not independent.
roof_age_months_est · roof_age_months · property_age_yr · year_built · months_since_any_roof · is_in_replacement_window
02 · Roof material · ≈ 10%
A single FA assessor field (shingle / tile / metal / flat) that interacts strongly with age. The model learns whatever per-material replacement rates exist in the training data; we have not independently audited those rates against industry conventions.
roof_cover_code
03 · Place (county, neighborhood, storms) · ≈ 19%
Three nested location signals: county regime (insurance + climate, via the county code), subdivision herd effect (are neighbors reroofing?), and parcel-specific storm exposure (named storm, distance to track, peak wind). They overlap — Sarasota's high county reroof rate is partly the Ian effect — but encode distinct geographic scales.
fips · area_pct_wholeroofs_36m · nearest_storm_name · nearest_storm_km · nearest_storm_wind_kt · months_since_nearest_storm
04 · Owner activity · ≈ 12%
Recent permit history on this parcel for any non-roof trade — HVAC, plumbing, electrical, building, windows, solar, pool. The composite "any non-roof permit recently" carries the bulk of this signal; the model cannot pinpoint which specific trade matters most.
months_since_last_any_non_roof_permit · months_since_last_hvac_permit · months_since_last_building_permit · months_since_last_windows_permit
05 · Capacity & value · ≈ 8%
Can they pay, will the HOA enforce, what's the property worth. Physical size, market and AVM value, value per square foot, HOA tier. Smaller share but consistent across folds.
market_total_value · lot_sqft · building_sqft · avm_per_living_sqft · hoa1_fee_value
On feature counts. LightGBM reports gain per column. When multiple columns encode the same underlying signal — roof age in six different forms, for example — the gain is partitioned across them. Reading the model by individual feature rank overstates independent contribution. Reading by signal cluster is the honest version. Cluster shares above are computed over the top-40 features, which carry the bulk of total gain.
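The cluster reading can be sketched as a simple roll-up of per-column gain. The snippet below is an illustration, not the production code: the `CLUSTERS` mapping reuses feature names from the cards above (trimmed to a few per cluster), and the gain values are made-up placeholders, not model output. With a trained LightGBM booster, the inputs would come from `booster.feature_name()` and `booster.feature_importance(importance_type="gain")`.

```python
from collections import defaultdict

# Feature -> cluster mapping. Names come from the cluster cards; the list is
# deliberately trimmed, so this is not the full top-40 mapping.
CLUSTERS = {
    "roof_age_months_est": "Roof age & lifecycle",
    "months_since_any_roof": "Roof age & lifecycle",
    "roof_cover_code": "Roof material",
    "fips": "Place",
    "area_pct_wholeroofs_36m": "Place",
    "months_since_last_any_non_roof_permit": "Owner activity",
    "market_total_value": "Capacity & value",
}


def cluster_shares(names, gains, clusters=CLUSTERS):
    """Sum gain per cluster and return each cluster's share of the total."""
    totals = defaultdict(float)
    for name, gain in zip(names, gains):
        totals[clusters.get(name, "Other")] += gain
    grand_total = sum(totals.values())
    return {cluster: gain / grand_total for cluster, gain in totals.items()}


# Placeholder gains (NOT real model output), chosen only to show the roll-up.
names = list(CLUSTERS)
gains = [30.0, 20.0, 10.0, 12.0, 8.0, 12.0, 8.0]

shares = cluster_shares(names, gains)
for cluster, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{cluster:<22} {share:.0%}")
```

Summing before ranking is the point: two correlated roof-age columns that each look modest on their own add up to the largest cluster once their gain is pooled.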
Headline numbers

How well it actually works.

9.7× · Lift @ 15K · production anchor (eval at 2025-10-31)
8.5× · Lift @ 15K · 6-fold mean (honest cross-period)
13.2× · Lift @ 5K · top of list (tightest selection)
1.15% · Universe base rate (6-mo permit hit rate)

Of 1.10 million scored homes across the seven counties, only 12,631 (1.15%) will pull a roof permit in the next six months. A random 15,000 from that universe catches about 170. The model picks 15,000 and catches 1,665 — a 9.7× edge.
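The arithmetic behind that multiple can be checked in a few lines. This is a minimal sketch using only the figures quoted above; the function name `lift_at_k` is illustrative, and the universe size is the rounded 1.10 million, so the result matches the headline only to one decimal place.

```python
def lift_at_k(hits_in_list, list_size, positives, universe):
    """Lift = hits caught by the model / hits expected from a random list."""
    base_rate = positives / universe
    expected_random_hits = base_rate * list_size
    lift = hits_in_list / expected_random_hits
    return lift, expected_random_hits, base_rate


lift, random_hits, base_rate = lift_at_k(
    hits_in_list=1_665,    # permits caught in the model's top 15,000
    list_size=15_000,
    positives=12_631,      # homes that pull a roof permit in six months
    universe=1_100_000,    # scored homes (rounded figure from the text)
)
print(f"base rate {base_rate:.2%}, random hits {random_hits:.0f}, lift {lift:.1f}x")
```

With these inputs the base rate comes out at 1.15%, the random list at roughly 172 hits, and the lift at 9.7×.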

Which baseline? The 9.7× / 8.5× figures compare against a random draw from the full 1.10M scored universe. A client mails the narrower buy-box (single-family, age 10+, no roof permit in 10 years), which already screens out the obvious non-candidates — against that harder baseline the v21 reconciliation measured roughly 4–5×. Same model, smaller multiple; the denominator is what changes.

Cross-fold stability

Six historical windows, 7-to-10× lift.

Lift @ 15,000 across six walk-forward evaluation anchors: 2023-05 7.97×, 2023-11 7.55×, 2024-05 7.00×, 2024-11 8.75×, 2025-05 9.96×, 2025-10* 9.68×. Strongest on 2025 anchors (Helene + Milton + Ian in-window); weakest on the quieter 2024-05 window. *Current production anchor. The lowest fold still beats random by 7×.
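The 6-fold mean and the range can be reproduced directly from the per-anchor figures. A small sketch, using only the six lift values listed above:

```python
from statistics import mean

# Lift @ 15,000 per walk-forward anchor, as reported above.
fold_lifts = {
    "2023-05": 7.97,
    "2023-11": 7.55,
    "2024-05": 7.00,
    "2024-11": 8.75,
    "2025-05": 9.96,
    "2025-10": 9.68,  # current production anchor
}

mean_lift = mean(fold_lifts.values())
weakest = min(fold_lifts, key=fold_lifts.get)
strongest = max(fold_lifts, key=fold_lifts.get)

print(f"mean {mean_lift:.1f}x")
print(f"weakest {weakest} at {fold_lifts[weakest]:.2f}x")
print(f"strongest {strongest} at {fold_lifts[strongest]:.2f}x")
```

The mean rounds to 8.5×, which is the cross-period figure; the production anchor's 9.68× sits near the top of the range rather than at its center.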
Audit findings

What we checked, changed, and left open.

Clean
Roof-age derivation and neighborhood-window features pass T0-strict leakage audit. No code changes required.
Validated
Insurance-renewal proximity (purchase-anniversary × roof age) ranks top-5 by gain — the validated v22.1 upgrade candidate, not yet in production v22. Continuous county base-rate is proposed (not yet built). One signal confirmed, one pending.
Validated
Three-feature "lifestyle trim" (pool-permit history, garage size, last improvement-permit recency) adds +3.5% lift@15K, confirmed cross-fold. Candidate for the next retrain.
Pending
FA assessor snapshot vintage — 10-to-40-month forward-leak risk on 11 features including the #2 driver. Awaiting data-team confirmation. Production stays on v22 until this resolves.
Closed
County + storm-name memorization ablation: partially validated. Variance tightened 14% but mean lift dropped 1.6% — categoricals carry real recent-storm signal, so they stay.
Anti-signals

What pulls a score down.

Removed before scoring
  • Absentee owner
  • Non-individual owner
  • Mobile / manufactured
  • Vacant or no mailable address
Roof says "wait"
  • Recently reroofed · roof_age_months_est
  • Long-life tile · roof_cover_code
  • Prior canceled permit · months_since_canceled_roof
Quiet surroundings
  • Few neighbors reroofing · area_pct_wholeroofs_36m
  • No recent storm · months_since_nearest_storm
  • Low-rate county · fips

Two stages. The first column is a hard filter — absentees, non-individual owners, mobile homes, vacants, missing EMV, and addresses that fail Smarty DPV are cut before the model ever scores. The other two columns are in-model signals whose adverse values pull a score down (directions inferred from feature behavior, not yet SHAP-audited). The 15-year recent-roof rule is a separate hard exclusion at the output gate.
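The order of operations described here (hard filter, then scoring, then the output gate) can be sketched as follows. Everything in this snippet is hypothetical: the `Parcel` fields, the toy `score_fn`, and the reading of the 15-year rule as "estimated roof age under 15 years is excluded" are assumptions made for illustration, not the production schema or logic.

```python
from dataclasses import dataclass

RECENT_ROOF_YEARS = 15  # assumed reading of the 15-year recent-roof rule


@dataclass
class Parcel:
    # Hypothetical fields standing in for the real pre-scoring flags.
    parcel_id: str
    is_absentee: bool
    is_individual_owner: bool
    is_mobile_home: bool
    is_vacant: bool
    has_emv: bool
    passes_dpv: bool
    roof_age_years: float
    score: float = 0.0


def passes_hard_filter(p: Parcel) -> bool:
    """Stage 1: cut parcels before the model ever scores them."""
    return (
        not p.is_absentee
        and p.is_individual_owner
        and not p.is_mobile_home
        and not p.is_vacant
        and p.has_emv
        and p.passes_dpv
    )


def build_list(parcels, score_fn, list_size):
    eligible = [p for p in parcels if passes_hard_filter(p)]
    for p in eligible:
        p.score = score_fn(p)  # Stage 2: the model scores what survives
    # Output gate: drop recently reroofed parcels regardless of score.
    gated = [p for p in eligible if p.roof_age_years >= RECENT_ROOF_YEARS]
    return sorted(gated, key=lambda p: p.score, reverse=True)[:list_size]


def ok(pid, age, **overrides):
    """Build a parcel that passes every hard filter unless overridden."""
    base = dict(is_absentee=False, is_individual_owner=True, is_mobile_home=False,
                is_vacant=False, has_emv=True, passes_dpv=True)
    base.update(overrides)
    return Parcel(parcel_id=pid, roof_age_years=age, **base)


parcels = [ok("A", 22), ok("B", 25, is_absentee=True), ok("C", 8), ok("D", 18)]
toy_score = lambda p: min(p.roof_age_years / 30, 1.0)  # stand-in for the model
mail_list = build_list(parcels, toy_score, list_size=15_000)
print([p.parcel_id for p in mail_list])
```

In this toy run, B never reaches the model (absentee), C is scored but removed at the output gate (8-year-old roof), and A and D make the list in score order.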

Bottom line

Against a random draw from the scored universe the model is roughly seven-to-ten times more efficient across six historical windows (8.5× on average, 9.7× on the current production anchor; about 4–5× against the tighter buy-box a client actually mails). Physical roof age dominates (≈ 51% of signal), followed by place, owner activity, material, and capacity, mirroring the questions a twenty-year Florida roofer would ask. What the model can't see: insurance-carrier data, owner intent before permits surface, and any market outside the seven Florida counties it was trained on. Production stays on v22 until the FA-vintage question (the largest open lever) is resolved.

v22 super · FL-7 · 163 feat · build 5295029 · eval 2025-10-31