Data Quality · roofing_master_gold_v1

Gold table QA audit

23 pass/fail checks. An audit that fails in any one county fails.

Go test the queries in Athena
Snapshot under audit
Table audited
roofing_master_gold_v1 · db 8020rei-gold-data-lake
Table version (vintage)
20260512 — vintage label = data as-of 12 May 2026 (not the write date)
Table last written
29 May 2026 · Hudi _hoodie_commit_time 20260529215620885 (Hernando 20260529220723222) — the snapshot you are viewing
Audit last run
4 Jun 2026 · DuckDB on the frozen 29 May commit · DuckDB == Athena, finding 91
S3 snapshot path
s3://8020rei-gold-data-lake/Roofing_Master_Gold_V1_20260512
Audit verdict · a single failing county fails the audit

16 of 23 audits fail — fix Gold before any model.

16/23
70% failed
audits fail in ≥1 county
Columns
Florida · deep audit
National · big
National · medium
Florida · deep audit
National · big
National · medium
#IDAuditPasco12101Hernando12053Pinellas12103Los Angeles06037Cook17031Harris48201Maricopa04013San Diego06073Boulder08013St. Clair17163Cumberland23005Multnomah41051Davidson47037
1A1Permit fan-outexpect ratio = 1.01.001.001.001.001.001.001.001.001.001.001.001.001.00
Each permit must occupy exactly ONE row. If permit #5 shows twice, you double-count it.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*)*1.0 / count(DISTINCT building_permit_id) AS rows_per_permit
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, count(*) AS n_rows
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
GROUP BY building_permit_id HAVING count(*) > 1
LIMIT 50
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

2A2Duplicate permitexpect 0 groups293,37895,456824,3182,057,590689,7181,293,576466,627459,055135,21831,89681,963160,391
The same municipal permit number should not appear under two records. If #BCP-123 (Clearwater) shows twice, it is double-counted.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM (
  SELECT permit_number, permit_jurisdiction
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  WHERE fips = 12103 AND permit_number IS NOT NULL
  GROUP BY 1,2 HAVING count(*) > 1)
② Inspect the rows (drill-down) · paste into Athena
SELECT permit_number, permit_jurisdiction, count(*) AS n_rows,
       array_join(array_agg(building_permit_id), ', ') AS building_permit_ids
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND permit_number IS NOT NULL
GROUP BY 1,2 HAVING count(*) > 1
ORDER BY n_rows DESC LIMIT 20
Sample failing rows (Pinellas)
permit_numberjurisdictionn_filasbuilding_permit_ids
PER-H-CB12-04179FL_Pinellas21A97Vzro, NVYMBWN, BrR4ZXp, NVjl544, Wg4boak, WgKONBz, kpmQJ6A, QGe4x19, A9qQkax, ZeyKN3z, aO76aVp, GLQ6dm9, ddkg3w9, Br7BA26, kpZ1dXN, QGoaZAQ, M4GBQ1B, bB0wQy, 55qXJ2j, z4ygQA9, z4d2jbB
PER-H-CB14-03985FL_Pinellas21bXn09My, YxK0Om0, pQ9Ll43, kpLq4YA, 19Bakny, jlLRVa1, L50krnb, o5mLjVb, dd06EQJ, 3JMzXVd, NV0A7p4, 0rQZWlW, K6709mz, nNL1yEW, g5LrpM6, lrm1XpG, 92Elan2, E576Rmn, Wg03W6z, QG0MdV9, EW6an5
PER-H-CB12-06801FL_Pinellas209WNNdB, JrREkwE, qdoZ9mR, kpmQdbp, m5GQr4X, jlzk647, WgKOom5, RYbVkyA, 5VKYkxq, Xgedzl1, ddmj3Vr, NVaeBdG, L5aE2BQ, xMeb0Ob, 0rEM6bm, j5yyE7, 1rwwyw, rMba0mq, NVjlBmz, 7r4EAg3
3B1FA_PROPERTYID -> Addressexpect 0 ids2.16%5,160 · worst 330.15%138 · worst 21.69%5,152 · worst 2521.39%18,833 · worst 820.32%1,275 · worst 180.69%6,504 · worst 150.69%6,466 · worst 2282.50%13,960 · worst 1190.36%328 · worst 70.00%01.26%136 · worst 21.18%2,272 · worst 40.38%715 · worst 5
One property (FA id) is ONE house. If the same id appears at 2+ addresses, it is mis-linked.
① Count query (must hit the expected value) · paste into Athena
WITH ex AS (
  SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  CROSS JOIN UNNEST(fa_propertyid) AS t(pid)
  WHERE fips = 12103)
SELECT count(*) FROM (
  SELECT pid FROM ex GROUP BY pid HAVING count(DISTINCT addr) > 1)
② Inspect the rows (drill-down) · paste into Athena
WITH ex AS (
  SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  CROSS JOIN UNNEST(fa_propertyid) AS t(pid)
  WHERE fips = 12103)
SELECT pid, count(DISTINCT addr) AS n_addresses,
       array_join(array_agg(DISTINCT addr), '  |  ') AS addresses
FROM ex GROUP BY pid HAVING count(DISTINCT addr) > 1
ORDER BY n_addresses DESC LIMIT 20
Sample failing rows (Pinellas)
fa_propertyidn_direccionesmuestra_direcciones
349917762521656 SERENITY LN, APT A, DUNEDIN 34698 | 1687 PARADISE LN, DUNEDIN 34698 | 1651 WALSH LN, DUNEDIN 34698 | 1656 SERENITY LN, DUNEDIN 34698
350302881852639 SUMO DR, CLEARWATER 33764 | 2644 FUJIAMA DR, CLEARWATER 33764 | 1212 TEAHOUSE DR, CLEARWATER 33764 | 1293 S KIMONA DR, CLEARWATER 33764
35074353922690 ENTERPRISE RD E, # 77, CLEARWATER 33759 | 2769 ENTERPRISE RD E, CLEARWATER 33759 | 2791 ENTERPRISE RD E, APT 8, CLEARWATER 33759 | 2690 ENTERPRISE RD, CLEARWATER 33759
4B2Address -> FA_PROPERTYIDexpect 0 addresses0.49%1,201 · worst 182.95%2,648 · worst 400.64%1,953 · worst 3040.70%9,601 · worst 2064.47%16,516 · worst 2692.94%26,368 · worst 3731.31%12,074 · worst 2741.23%6,331 · worst 25,2781.16%1,024 · worst 793.96%102 · worst 91.87%198 · worst 382.08%3,904 · worst 2151.76%3,221 · worst 70
One house must have ONE property id. If an address carries several FA ids, we cannot tell which is real.
① Count query (must hit the expected value) · paste into Athena
WITH ex AS (
  SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  CROSS JOIN UNNEST(fa_propertyid) AS t(pid)
  WHERE fips = 12103)
SELECT count(*) FROM (
  SELECT addr FROM ex GROUP BY addr HAVING count(DISTINCT pid) > 1)
② Inspect the rows (drill-down) · paste into Athena
WITH ex AS (
  SELECT pid, coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  CROSS JOIN UNNEST(fa_propertyid) AS t(pid)
  WHERE fips = 12103)
SELECT addr, count(DISTINCT pid) AS n_fa_ids,
       array_join(array_agg(DISTINCT cast(pid AS varchar)), ', ') AS fa_ids
FROM ex GROUP BY addr HAVING count(DISTINCT pid) > 1
ORDER BY n_fa_ids DESC LIMIT 20
Sample failing rows (Pinellas)
direccionn_fa_idsmuestra_fa_ids
18675 US-19 HWY, CLEARWATER 3376430434987361, 34987783, 34987530, 34987411, 34987700
400 ISLAND WAY, CLEARWATER BEACH 3376717234835172, 34835189, 34834790, 34835265, 34835300
24862 US-19 HWY N, CLEARWATER 3376317135155713, 35155746, 35155586, 35155628, 35155571
5B3Address -> PropertyID (gold)expect 0 addresses0.21%652 · worst 190.51%568 · worst 110.64%2,364 · worst 1581.00%19,467 · worst 2170.09%404 · worst 130.37%4,491 · worst 790.20%2,175 · worst 1741.01%6,368 · worst 270.32%341 · worst 140.27%9 · worst 31.00%157 · worst 120.06%128 · worst 161.61%3,715 · worst 186
Same as B2 but with the Gold internal PropertyID. This is the trustworthy key.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM (
  SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  WHERE fips = 12103
  GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT propertyid) > 1)
② Inspect the rows (drill-down) · paste into Athena
SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr, count(DISTINCT propertyid) AS n_propertyid,
       array_join(array_agg(DISTINCT propertyid), ', ') AS propertyids
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT propertyid) > 1
ORDER BY n_propertyid DESC LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

6B4PropertyID (gold) -> Addressexpect 0 PropertyID0.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%0
One internal PropertyID must be one house.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM (
  SELECT propertyid
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  WHERE fips = 12103
  GROUP BY propertyid HAVING count(DISTINCT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'')) > 1)
② Inspect the rows (drill-down) · paste into Athena
SELECT propertyid, count(DISTINCT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'')) AS n_addresses
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
GROUP BY propertyid HAVING count(DISTINCT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'')) > 1
ORDER BY n_addresses DESC LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

7B5Address -> APNexpect 0 addresses16.08%44,744 · worst 1,183n/a11.03%30,558 · worst 4484.92%11,426 · worst 1,2944.93%2,555 · worst 1054.02%591 · worst 395.38%25,672 · worst 1,49512.03%42,447 · worst 7,30113.39%9,836 · worst 167n/an/a1.44%133 · worst 195.69%9,167 · worst 20,114
One house must have ONE parcel number (APN). Several means the catalog or the join is wrong.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM (
  SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr
  FROM "8020rei-gold-data-lake".roofing_master_gold_v1
  WHERE fips = 12103 AND parcel_number IS NOT NULL AND trim(parcel_number) <> ''
  GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT parcel_number) > 1)
② Inspect the rows (drill-down) · paste into Athena
SELECT coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') AS addr, count(DISTINCT parcel_number) AS n_apn,
       array_join(array_agg(DISTINCT parcel_number), ', ') AS apns
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND parcel_number IS NOT NULL AND trim(parcel_number) <> ''
GROUP BY coalesce(cast(fips as varchar),'')||'|'||coalesce(cast(zip_code as varchar),'')||'|'||coalesce(regexp_replace(upper(street), '[^A-Z0-9]', ''),'') HAVING count(DISTINCT parcel_number) > 1
ORDER BY n_apn DESC LIMIT 20
Sample failing rows (Pinellas)
direccionn_apnmuestra_apn
19029 US-19 HWY, CLEARWATER 3376444820-29-16-41955-000-0408, 20-29-16-41955-000-0404, 20-29-16-41937-001-0250, 20-29-16-41942-000-0120, 20-29-16-41944-000-0210
36750 US-19 HWY N, PALM HARBOR 34684440252715430990001056, 252715430930001024, 252715430911002112, 252715430860001056, 252715430800002048
18675 US-19 HWY, CLEARWATER 3376432720-29-16-03290-000-5040, 20-29-16-03290-000-1200, 20-29-16-03290-000-2330, 20-29-16-03290-000-2370, 20-29-16-03290-000-1720
8C1FA_PROPERTYID emptyexpect 0%17.43%370,84020.75%184,43713.27%717,05628.08%3,604,32126.17%854,43631.17%3,063,59419.63%828,76226.90%847,73116.38%151,74734.79%2,41730.73%12,61013.09%181,67120.62%359,038
The permit ended up linked to no property (empty list []). A letter with no recipient.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND cardinality(fa_propertyid) = 0
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, permit_number, street, city, zip_code, subtype, issued_date
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND cardinality(fa_propertyid) = 0 AND lower(subtype) LIKE '%roof%'
LIMIT 20
Sample failing rows (Pinellas)
permit_idpermit_numberstreetcityzipsubtypeissued
zGq1LwxRBP26-0003236580 SEMINOLE BLVDSEMINOLE33772Re-roof2026-05-04
G0lQ1XlEBP-26-0651312450 ULMERTON RDLARGO33774Reroof metal/alum roof over– res/comm2026-04-29
ZWoa4dzCBP26-00009510983 70TH AVE NSEMINOLE33772Re-roof2026-04-27
9C2ISSUED_DATE nullexpect 0%66.51%1,415,2957.57%67,25151.98%2,808,68123.44%3,008,9916.41%209,19715.39%1,512,80224.43%1,031,06325.25%795,60528.82%266,97019.96%1,3877.26%2,98117.35%240,69927.33%475,929
Permit with no issue date. Without it we cannot tell if the roof is from 2024 or 2005.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND issued_date IS NULL
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, subtype, latest_status, issued_date, partition_year
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND issued_date IS NULL AND lower(subtype) LIKE '%roof%'
LIMIT 20
Sample failing rows (Pinellas)
permit_idstreetcitysubtypestatusissuedpart_year
075BxW7583 MEADOWLAWN DR NSAINT PETERSBURGRoof permitcompletenan1998
07g1WW527 53RD ST SSAINT PETERSBURGRoof permitcompletenan2007
07gYpd7962 9TH AVE SSAINT PETERSBURGRoof permitcompletenan2006
10C3APN missingexpect 0%29.94%637,14199.99%888,63446.06%2,488,47391.51%11,744,35183.97%2,741,61599.54%9,783,87770.68%2,983,44760.10%1,893,94767.43%624,550100.00%6,94899.79%40,94898.78%1,370,59466.70%1,161,511
Permit with no parcel number (PARCEL_NUMBER). In Hernando 99.99% is missing.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND (parcel_number IS NULL OR trim(parcel_number) = '')
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, city, parcel_number, subtype
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND (parcel_number IS NULL OR trim(parcel_number) = '')
LIMIT 20
Sample failing rows (Pinellas)
permit_idstreetcityapnsubtype
00XMRa501 N WASHINGTON AVECLEARWATERnannan
02MGkQ607 SANDY HOLLOW RDTARPON SPRINGSnanWindows/doors/garage door/exterior coverings
02MzJb224 E TARPON AVETARPON SPRINGSnanSign
11C4No house numberexpect 0%5.39%114,7413.37%29,9550.49%26,2960.54%69,1060.42%13,6120.38%37,7373.03%127,7471.29%40,6370.35%3,2420.43%301.08%4422.14%29,7393.72%64,723
The address does not start with a number ('123 street'), so it cannot be reconstructed.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND NOT regexp_like(coalesce(street,''), '^[0-9]')
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, city, zip_code
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND NOT regexp_like(coalesce(street,''), '^[0-9]') AND street <> ''
LIMIT 20
Sample failing rows (Pinellas)
permit_idstreetcityzip
0NXm3bINNESS DRTARPON SPRINGSnan
0j7zQzd<PLACEHOLDER>PINELLAS COUNTYnan
0jMNZdW<PLACEHOLDER>PINELLAS COUNTYnan
12C5ZIP nullexpect 0%5.63%119,8558.77%77,9670.99%53,6661.50%192,7640.63%20,5221.31%128,3123.65%153,9101.81%57,1691.85%17,1063.51%24463.80%26,1811.20%16,6463.55%61,805
Permit with no ZIP code. Weakens the address match and every geo/demographic join.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND zip_code IS NULL
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, city, zip_code, subtype
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND zip_code IS NULL
LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

13C6No description or subtypeexpect 0%9.69%206,2450.01%5116.46%889,1781.46%187,5392.46%80,47520.85%2,049,10711.43%482,5464.41%138,8263.53%32,6900.00%026.86%11,0200.23%3,2599.83%171,233
Both DESCRIPTION and SUBTYPE are empty, so the work cannot be classified — no roofing label is possible for this row.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND (description IS NULL OR trim(description) = '')
  AND (subtype IS NULL OR trim(subtype) = '')
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, type, subtype, description
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND (description IS NULL OR trim(description) = '')
  AND (subtype IS NULL OR trim(subtype) = '')
LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

14D1ISSUED > COMPLETEDexpect 00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%0
Issue date is AFTER completion date. Impossible: you cannot finish before the permit is issued.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND issued_date > completed_date
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, issued_date, completed_date
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND issued_date > completed_date
ORDER BY issued_date DESC LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

15D2APPLIED > ISSUEDexpect 00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%0
You applied AFTER it was issued. Impossible.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND applied_date > issued_date
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, applied_date, issued_date
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND applied_date > issued_date
ORDER BY applied_date DESC LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

16D3Future dateexpect 00.00%00.00%70.00%1150.00%1530.00%220.00%00.00%90.00%00.00%00.00%00.02%70.00%30.00%0
Some date is in the future (after today). Capture error.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND greatest(
  coalesce(issued_date, DATE '1900-01-01'),
  coalesce(completed_date, DATE '1900-01-01'),
  coalesce(applied_date, DATE '1900-01-01')) > current_date
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, issued_date, completed_date, applied_date
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND greatest(
  coalesce(issued_date, DATE '1900-01-01'),
  coalesce(completed_date, DATE '1900-01-01'),
  coalesce(applied_date, DATE '1900-01-01')) > current_date
LIMIT 20
Sample failing rows (Pinellas)
permit_idstreetissuedcompletedapplied
3Jz2jl3252 GARDEN CIR N2054-05-27nan2001-08-23
5Va0Wkq1185 NELSON ST2050-05-23nan2019-08-16
Wg4KkNq8740 47TH ST2100-12-31nan2000-04-21
17D4Final without issueexpect 056.13%1,194,3860.40%3,54742.83%2,314,41513.68%1,755,4932.03%66,2351.66%163,27714.20%599,24111.84%373,15821.26%196,9620.00%04.11%1,6881.39%19,34810.81%188,262
Permit says 'final/complete' but has no issue date. Contradiction.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND lower(latest_status) IN ('final','complete','completed')
  AND issued_date IS NULL
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, subtype, latest_status, issued_date
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND lower(latest_status) IN ('final','complete','completed')
  AND issued_date IS NULL AND lower(subtype) LIKE '%roof%'
LIMIT 20
Sample failing rows (Pinellas)
permit_idstreetsubtypestatusissued
075BxW7583 MEADOWLAWN DR NRoof permitcompletenan
07g1WW527 53RD ST SRoof permitcompletenan
07gYpd7962 9TH AVE SRoof permitcompletenan
18D5Pre-1980 sentinel dateexpect 00.00%00.00%120.00%400.01%1,4660.00%70.01%1,2890.00%240.00%930.01%540.00%00.01%40.00%00.00%0
ISSUED_DATE before 1980 — almost always a sentinel/placeholder, not a real permit date.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND issued_date < DATE '1980-01-01'
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, street, issued_date, subtype, latest_status
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND issued_date < DATE '1980-01-01'
ORDER BY issued_date LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

19E1FIPS = COUNTY_FIPSexpect 00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%0
The row's county (FIPS) must equal COUNTY_FIPS.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND fips <> county_fips
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, fips, county_fips, street
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND fips <> county_fips
LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

20E2Partition = FIPSexpect 00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%0
The file is partitioned by county; that number must equal FIPS.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND split_part(partition_fips_salting, '_', 1) <> cast(fips AS varchar)
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, fips, partition_fips_salting
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND split_part(partition_fips_salting, '_', 1) <> cast(fips AS varchar)
LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

21E3ZIP outside its stateexpect 00.00%700.00%410.03%1,4150.19%24,0070.01%2780.00%4600.02%1,0520.02%6370.01%610.00%00.64%2630.02%2550.00%66
ZIP code falls outside the band for the row's own state (e.g. a 90210 in a Florida county). State-aware — meaningful for every county, not just FL.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND zip_code IS NOT NULL AND NOT (
  CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN zip_code BETWEEN 32000 AND 34999 WHEN '06' THEN zip_code BETWEEN 90000 AND 96199 WHEN '17' THEN zip_code BETWEEN 60000 AND 62999 WHEN '48' THEN zip_code BETWEEN 75000 AND 79999 OR zip_code BETWEEN 73300 AND 73399 OR zip_code BETWEEN 88500 AND 88599 WHEN '04' THEN zip_code BETWEEN 85000 AND 86599 WHEN '08' THEN zip_code BETWEEN 80000 AND 81699 WHEN '23' THEN zip_code BETWEEN 3900 AND 4999 WHEN '41' THEN zip_code BETWEEN 97000 AND 97999 WHEN '47' THEN zip_code BETWEEN 37000 AND 38599 ELSE true END)
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, state, city, zip_code
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND zip_code IS NOT NULL AND NOT (
  CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN zip_code BETWEEN 32000 AND 34999 WHEN '06' THEN zip_code BETWEEN 90000 AND 96199 WHEN '17' THEN zip_code BETWEEN 60000 AND 62999 WHEN '48' THEN zip_code BETWEEN 75000 AND 79999 OR zip_code BETWEEN 73300 AND 73399 OR zip_code BETWEEN 88500 AND 88599 WHEN '04' THEN zip_code BETWEEN 85000 AND 86599 WHEN '08' THEN zip_code BETWEEN 80000 AND 81699 WHEN '23' THEN zip_code BETWEEN 3900 AND 4999 WHEN '41' THEN zip_code BETWEEN 97000 AND 97999 WHEN '47' THEN zip_code BETWEEN 37000 AND 38599 ELSE true END)
LIMIT 20
Sample failing rows (Pinellas)
permit_idstreetcityzipfips
1jrGk5410810 91ST AVEPEORIA8534512103
2jlzZ4g2987 MANGULAR AVECORONA9288212103
6emprxN8225 SH 6, UNIT 300MISSOURI CITY7745912103
22E4STATE = county stateexpect 00.00%360.01%450.03%1,5000.01%1,2540.01%3070.00%3400.03%1,0730.02%5260.00%410.00%00.68%2810.01%1460.00%66
The STATE column must match the state implied by the county FIPS. A mismatch is a mislabeled / cross-pasted row.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND state IS NOT NULL AND trim(state) <> ''
  AND upper(trim(state)) <> (
  CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN 'FL' WHEN '06' THEN 'CA' WHEN '17' THEN 'IL' WHEN '48' THEN 'TX' WHEN '04' THEN 'AZ' WHEN '08' THEN 'CO' WHEN '23' THEN 'ME' WHEN '41' THEN 'OR' WHEN '47' THEN 'TN' ELSE upper(trim(state)) END)
② Inspect the rows (drill-down) · paste into Athena
SELECT building_permit_id, state, fips, county_name, city, zip_code
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103 AND state IS NOT NULL AND trim(state) <> ''
  AND upper(trim(state)) <> (
  CASE substr(lpad(cast(fips as varchar),5,'0'),1,2) WHEN '12' THEN 'FL' WHEN '06' THEN 'CA' WHEN '17' THEN 'IL' WHEN '48' THEN 'TX' WHEN '04' THEN 'AZ' WHEN '08' THEN 'CO' WHEN '23' THEN 'ME' WHEN '41' THEN 'OR' WHEN '47' THEN 'TN' ELSE upper(trim(state)) END)
LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

23F1STREET4JOIN1 ≠ STREET (mis-keyed join)expect 00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%00.00%0
STREET4JOIN1 is the key the ETL uses to attach each permit to a property. It must be THIS row's STREET, normalized (upper-case, no spaces or punctuation). When it instead holds a DIFFERENT address, the permit is joined to the WRONG house. Example: 2672 MONTAGUE CT W, Clearwater has 6 permits — only one carries the correct key (2672MONTAGUECTW); the other five are keyed to unrelated properties (2041 AUSTRALIA WAY, 3343 SAN PEDRO ST, 1175 PINELLAS POINT DR, 1589 W KLOSTERMAN RD, 1412 S BETTY LN). In Pinellas 79% of rows are mis-keyed like this — the permit→property link is silently wrong for most of the county.
① Count query (must hit the expected value) · paste into Athena
SELECT count(*) FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
  AND regexp_replace(upper(street), '[^A-Z0-9]', '')
   <> regexp_replace(upper(street4join1), '[^A-Z0-9]', '')
② Inspect the rows (drill-down) · paste into Athena
-- each row's join key (street4join1) should equal norm_street; where it doesn't, it points at another house
SELECT building_permit_id, street AS real_street,
       regexp_replace(upper(street), '[^A-Z0-9]', '') AS norm_street,
       street4join1 AS join_key, zip_code
FROM "8020rei-gold-data-lake".roofing_master_gold_v1
WHERE fips = 12103
  AND regexp_replace(upper(street), '[^A-Z0-9]', '')
   <> regexp_replace(upper(street4join1), '[^A-Z0-9]', '')
  -- reproduce the example: AND street LIKE '2672 MONTAGUE%'
LIMIT 20
Sample failing rows (Pinellas)

No sample rows captured — run the drill-down query above in Athena to inspect.

↔ scrolls sideways · toggle columns with “Show counties” above · click a row to expand it.

Data dictionary — 54 columns (click a header to sort)

ColumnTypeDescriptionExample
_hoodie_commit_timevarcharHudi: commit timestamp that wrote the row. Internal.
_hoodie_commit_seqnovarcharHudi: commit sequence number. Internal.
_hoodie_record_keyvarcharHudi: unique physical record key (lake PK).
_hoodie_partition_pathvarcharHudi: partition path. Internal.
_hoodie_file_namevarcharHudi: source parquet file. Internal.
BUILDING_PERMIT_IDvarcharUnique permit id (business key). 1 row = 1 permit.
FIPSinteger5-digit county code. 12103=Pinellas, 12101=Pasco, 12053=Hernando.
ID_PREFIXvarcharFirst char of the record key. Internal.
STATEvarcharState (FL).
PropertyIDvarcharGold internal property id. Stable per physical address -> recommended key.
CONTRACTOR_IDvarcharContractor id.
PERMIT_NUMBERvarcharPermit number assigned by the municipality (verifiable on portal).
DESCRIPTIONvarcharFree text of the work. Source for roofing classification.
TYPEvarcharPermit category (e.g. 'Building - construction permit').
SUBTYPEvarcharSubcategory (e.g. 'Roof', 'HVAC').
PROJECT_NAMEvarcharProject name (sparsely populated).
HOMEOWNERvarcharHomeowner name.
BUSINESS_NAMEvarcharBusiness / contractor name.
JOB_VALUEdoubleDeclared job value (USD).
FEESdoubleFees paid (USD).
PARCEL_NUMBERvarcharCounty APN / parcel number. Very incomplete (missing 45-99%).
SUBDIVISIONvarcharSubdivision.
STORIESintegerNumber of stories.
SQUARE_FEETintegerSquare feet.
UNITSintegerUnits.
INITIAL_STATUS_DATEdateInitial status date.
INITIAL_STATUSvarcharInitial status (e.g. 'applied').
LATEST_STATUS_DATEdateLatest status date.
LATEST_STATUSvarcharLatest status ('final','complete','expired'...).
APPLIED_DATEdateApplication date.
ISSUED_DATEdateIssue date. Key for roof recency; often null (52-72%).
COMPLETED_DATEdateCompletion date.
CANCELLED_DATEdateCancellation date.
PERMIT_JURISDICTIONvarcharJurisdiction (e.g. 'FL_Pinellas_Clearwater').
PARTITION_YEARintegerPartition year (derived from dates).
STREETvarcharReal street address of the permit.
CITYvarcharPermit city (use silver to group by property).
ZIP_CODEintegerZIP code (integer). Sometimes mistyped.
LATITUDEdoubleGeocoded latitude.
LONGITUDEdoubleGeocoded longitude.
PLACE_FIPSvarcharCensus place FIPS.
PLACE_NAMEvarcharPlace name.
COUNTY_FIPSintegerCounty FIPS. Must equal FIPS.
COUNTY_NAMEvarcharCounty name.
CBSA_FIPSvarcharMetro area (CBSA) FIPS.
CBSA_NAMEvarcharMetro area name.
DateSnapshotdateData snapshot date.
PROJECT_TYPEarray<varchar>Array of classified types (e.g. ['Roofing','Residential']).
AUX_PERMIT_STATUSvarcharAuxiliary permit status.
AUX_EFFECTIVE_STATUS_DATEdateAuxiliary status effective date.
STREET4JOIN1varcharNormalized street = ETL join key permit<->property.
Partition_FIPS_SaltingvarcharFIPS + salt for partition distribution (e.g. '12103_-2').
FA_PROPERTYIDarray<integer>Array of linked First American property ids.
tstimestampProcessing timestamp.