Hi,
I am working on a study that examines health outcomes over a long period and uses Census/ACS data to provide contextual factors. Here are some hypothetical descriptors:
- 1,000,000 individuals with addresses geocoded to X,Y coordinates
- The addresses represent where the individuals were living at the time of a common medical procedure
- Records were spatially joined to 2020 block group boundaries to get 2020 block group FIPS codes
- Records span from 1998 to 2021
- Several very different regions across the country contribute records.
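Because the records already carry 2020 block group FIPS codes, rolling them up to tracts only needs the GEOID prefix: a 12-digit block group GEOID is state (2) + county (3) + tract (6) + block group (1). The minimal sketch below assumes the codes are stored as strings or integers; the function name and the GEOIDs in the example are made up for illustration. Note that this yields 2020 tracts only. The 2000 long form, and ACS releases tabulated on older boundaries, use different tract definitions, so a boundary crosswalk (for example, the IPUMS NHGIS geographic crosswalks) would still be needed to line them up.

```python
def tract_geoid_from_block_group(bg_geoid) -> str:
    """Return the 11-digit tract GEOID that contains a 12-digit block group GEOID.

    A block group GEOID is state (2) + county (3) + tract (6) + block group (1),
    so the containing tract is simply the first 11 characters.
    """
    # Restore leading zeros that are lost when a GEOID is stored as a number
    # (e.g. state FIPS codes 01-09).
    geoid = str(bg_geoid).strip().zfill(12)
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected a 12-digit block group GEOID, got {bg_geoid!r}")
    return geoid[:11]


# Made-up GEOIDs, for illustration only.
print(tract_geoid_from_block_group("060375001001"))  # 06037500100
print(tract_geoid_from_block_group(10010201001))     # 01001020100
```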
The usual advice is "do not compare overlapping periods," but over such a long span many of these areas have changed substantially. The 1-year estimates have larger MOEs (and are only published for areas with 65,000 or more people), but there are big jumps between releases even in the 5-year estimates.
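The Census Bureau's guidance for checking whether two ACS estimates differ converts each published 90% MOE to a standard error (SE = MOE / 1.645) and computes a Z statistic. A minimal sketch follows; the function name and the numbers are hypothetical. It treats the two estimates as independent samples, which is exactly why the test should not be applied to overlapping 5-year periods: they share sample years.

```python
import math

Z90 = 1.645  # published ACS margins of error are at the 90% confidence level


def significant_difference(est1, moe1, est2, moe2, z_crit=Z90):
    """Z test for the difference between two ACS estimates.

    Converts each 90% MOE to a standard error and assumes the two estimates
    are independent (not true for overlapping 5-year periods).
    Returns the Z statistic and whether |Z| exceeds the critical value.
    """
    se1 = moe1 / Z90
    se2 = moe2 / Z90
    denom = math.sqrt(se1**2 + se2**2)
    if denom == 0:
        raise ValueError("both MOEs are zero; the test is undefined")
    z = (est1 - est2) / denom
    return z, abs(z) > z_crit


# Hypothetical medians from two non-overlapping 5-year periods.
z, is_sig = significant_difference(30_000, 5_000, 100_000, 20_000)
print(f"Z = {z:.2f}, significant at 90%: {is_sig}")
```

A large jump with a large MOE can still pass this test, so it helps separate real neighborhood change from sampling noise before deciding how to handle it.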
The study would like to use block group level estimates, but the documentation suggests two things:
- MOEs are much larger at the block group level
- Block groups should be combined prior to analysis:
- https://www.federalregister.gov/documents/2018/11/13/2018-24570/block-groups-for-the-2020-census-final-criteria
- I read this as, "you can do this, but you need to commit effort to aggregating block groups effectively."
- Only ~1% of households across the US are sampled for the ACS in a given year, which means roughly 6 to 30 survey responses would be used for a given block group, whereas a tract would consistently have ~45.
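If block groups are combined, their count estimates can be summed and the MOE of the sum approximated with the Census Bureau's root-sum-of-squares formula, MOE_sum = sqrt(sum of MOE_i^2). The coefficient of variation (CV = SE / estimate) is a convenient way to see how much reliability the aggregation buys. The sketch below is a minimal illustration with made-up counts for three hypothetical neighboring block groups; it applies only to additive counts, not to medians such as median household income.

```python
import math

Z90 = 1.645  # ACS MOEs are published at the 90% confidence level


def aggregate_count(estimates, moes):
    """Sum count estimates across areas and approximate the MOE of the sum.

    Uses MOE_sum = sqrt(sum(MOE_i^2)). Valid for additive counts only,
    not for medians or other non-additive statistics.
    """
    if len(estimates) != len(moes):
        raise ValueError("estimates and moes must have the same length")
    total = sum(estimates)
    moe = math.sqrt(sum(m * m for m in moes))
    return total, moe


def coefficient_of_variation(estimate, moe):
    """CV = SE / estimate, with SE = MOE / 1.645."""
    if estimate == 0:
        return math.inf
    return (moe / Z90) / estimate


# Made-up counts for three neighboring block groups.
bg_estimates = [120, 80, 45]
bg_moes = [60, 50, 40]

total, total_moe = aggregate_count(bg_estimates, bg_moes)
print(total, round(total_moe, 1))  # 245 87.7
for est, moe in zip(bg_estimates, bg_moes):
    print(f"block group CV: {coefficient_of_variation(est, moe):.2f}")
print(f"combined CV: {coefficient_of_variation(total, total_moe):.2f}")
```

The combined CV comes out lower than any single block group's CV, which is the practical argument for aggregating (or simply using tracts).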
Here is what I am thinking of proposing for this study:
- Use tract-level data. Block groups are possible, but each region would have to put in the work to aggregate them effectively.
- For 1998-2005 records: use the 2000 Decennial Census long-form (SF3) SES values.
- For 2006-2010 records: use the 2006-2010 ACS 5-year estimates. Although the sampling rate was lower then, the ACS was being collected throughout this period.
- For 2011-2021 records: use the ACS 5-year estimates whose end year matches the record year (e.g., 2011-2015 for a 2015 record).
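The year-to-source rule in the proposal can be written as a small lookup so that every record is assigned consistently. This is a minimal sketch under one assumption: that "matches the ACS release year" means the 5-year estimates whose final year equals the record year. The function name and the returned labels are hypothetical.

```python
def context_source(record_year: int) -> str:
    """Map a record's procedure year to the proposed contextual data product.

    Assumes 2011-2021 records use the 5-year estimates whose final year
    equals the record year (e.g. 2011-2015 for a 2015 record).
    """
    if 1998 <= record_year <= 2005:
        return "2000 Decennial Census long form (SF3)"
    if 2006 <= record_year <= 2010:
        return "ACS 2006-2010 5-year"
    if 2011 <= record_year <= 2021:
        return f"ACS {record_year - 4}-{record_year} 5-year"
    raise ValueError(f"record year {record_year} is outside the 1998-2021 study window")


for year in (1999, 2008, 2015, 2021):
    print(year, "->", context_source(year))
```

Keeping the rule in one function also makes it easy to run a sensitivity analysis later, for example by swapping in a different product for the 2006-2010 window.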
The rationale: at least in my experience in a rapidly changing city, neighborhoods can change a lot. In the neighborhood where I work, the median household income went from ~$30k to >$100k between the 2010 and 2020 ACS 5-year releases.
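ACS dollar amounts are reported in inflation-adjusted dollars of the final year of the period (a 2006-2010 median is in 2010 dollars, a 2016-2020 median in 2020 dollars), so part of any change between releases is inflation. A minimal sketch of restating an estimate and its MOE in common dollars follows, assuming you supply price-index values yourself (the Census Bureau's comparison guidance uses the CPI-U-RS). The index values in the example are hypothetical, not real CPI figures.

```python
def to_constant_dollars(value, moe, index_from, index_to):
    """Re-express a dollar estimate and its MOE in another year's dollars.

    index_from / index_to are price-index values for the estimate's dollar
    year and the target year. Scaling by a constant scales the MOE too.
    """
    factor = index_to / index_from
    return value * factor, moe * factor


# Hypothetical index values, NOT real CPI-U-RS figures.
adj_value, adj_moe = to_constant_dollars(30_000, 3_000, index_from=100.0, index_to=125.0)
print(adj_value, adj_moe)  # 37500.0 3750.0
```

After both estimates are in the same dollars, the remaining difference can be checked against the MOEs before it is read as real neighborhood change.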
Anyhow, does this seem like a reasonable methodology for blending Census and ACS data over such a long study period?
If not, can you tell me why, and what you would suggest instead (and why)?