They have set up a separate GQ Count Review for the 2020 Census. Apparently some of the GQs were put in the wrong nearby blocks. One of the problems, however, is that Differential Privacy was applied to the Block Groups, so in NY State, after the prisoner relocations, some group quarters blocks ended up with negative populations, presumably because the original counts had already been distorted by the time the prison population was relocated. According to Applied Geographic Systems there are many problems with the 2020 data:
A tremendous effort in the analytics world is devoted to the task of data preparation and cleansing. Data exchanges refer to ‘curated’ data, which suggests that the suppliers of that data have gone to the trouble of estimating missing fields, reining in outliers, and harmonizing the data with other known and trusted data sources at various geographic aggregations. Users of that curated data then rely on the dataset for their models, often automated, for making informed business decisions. If the goal of analytics is to reduce uncertainty in business decisions, the minimization of error must be a priority at all stages of the effort, since error in source data is not only propagated but magnified.
Consider it equivalent to having a termite infestation in your house. Superficially, everything looks perfectly fine, as the frame of the house is covered by layers of drywall, paint, siding and roofing materials. Sooner or later, though, the foundational rot erupts at the surface as sagging floors and cracking drywall, but by then the damage done is substantial. Structural rehabilitation of the bones of a house is an expensive and time-consuming effort. Tenting the house early in the process is, in comparison, extremely cheap, despite the neighbors' jokes about the circus coming to town.
The notion that error would be deliberately introduced into a foundational dataset is close to a moral issue. Who would do such a dastardly deed?
What is the Issue?
In years past, the census has, as required by law, made substantial efforts at protecting the privacy of individuals. As the genealogy world well knows, the physical records, which have names and addresses, are sealed for decades. When the census included both the short form and the long form, the sensitive personal data found in the long form was reasonably well protected: it was based on a sample, and techniques were employed to “borrow” characteristics between similar, nearby census blocks. With the demise of the long form, replaced by the American Community Survey (ACS), the census consists of only completely (obviously with some error) enumerated geographic areas. As a result, the data for small areas can be used in conjunction with other databases (mailing lists, property records, etc.) to potentially identify individuals within them.
We have recently been talking about the census concept of a “privacy budget” and its potential effects on the 2020 data releases. Detailed discussions of those issues can be found on the AGS blog.
The unpleasant conclusion is that the data has been seriously corrupted, so much so that a significant number of census block groups have statistically impossible data, among them –
- entire blocks of unsupervised children in households (no adults)
- ghost communes, where there are occupied dwellings with no people
- baseball team size families, complete with a stocked bullpen
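The three impossibilities listed above lend themselves to simple automated checks. What follows is only a rough sketch, not AGS's method: the record layout, the field names, the sample GEOIDs and the cutoff of 9 people per occupied unit are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BlockRecord:
    """Hypothetical per-block summary; field names are illustrative only."""
    geoid: str
    adults_in_households: int
    children_in_households: int
    occupied_units: int

    @property
    def household_population(self) -> int:
        return self.adults_in_households + self.children_in_households


# Assumed cutoff for an implausibly large average household; not an official figure.
MAX_AVG_HOUSEHOLD_SIZE = 9.0


def implausibility_flags(rec: BlockRecord,
                         max_avg_size: float = MAX_AVG_HOUSEHOLD_SIZE) -> list[str]:
    """Return the names of the sanity checks that this block fails."""
    flags = []
    # Children living in households, but no adults at all.
    if rec.children_in_households > 0 and rec.adults_in_households == 0:
        flags.append("children_without_adults")
    # Occupied dwellings that contain nobody.
    if rec.occupied_units > 0 and rec.household_population == 0:
        flags.append("occupied_units_without_people")
    # Average household size beyond the assumed cutoff.
    if (rec.occupied_units > 0
            and rec.household_population / rec.occupied_units > max_avg_size):
        flags.append("oversized_households")
    return flags


# Made-up sample records, one per case plus a plausible block.
sample = [
    BlockRecord("block-A", adults_in_households=0, children_in_households=12, occupied_units=5),
    BlockRecord("block-B", adults_in_households=0, children_in_households=0, occupied_units=4),
    BlockRecord("block-C", adults_in_households=4, children_in_households=22, occupied_units=2),
    BlockRecord("block-D", adults_in_households=40, children_in_households=25, occupied_units=30),
]

for rec in sample:
    print(rec.geoid, implausibility_flags(rec))
```

A check like this only catches the outright impossibilities; the merely improbable blocks would need a comparison against expected distributions instead of hard rules.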
For every identified impossibility, there lurk underneath it at least ten improbabilities, and these are just the baseline numbers. The real meat of the 2020 census is found in the detailed tables which address key population characteristics (age, sex, race, Hispanic origin, ancestry) and household characteristics (household size and structure).