ACS Data Users Group

  • 1.  Better stand-alone names for categories like age groups

    Posted 12-06-2023 01:50 PM

    I'm looking for a resource or some ideas about handling the metadata that indent levels provide in ACS row names. The use of indents to represent different categories often creates repeated names for rows, under different parent categories.

    Row names are often structured in a nested way, e.g. in table B01001, Sex by Age, the first several rows are

    1 Total:
    2 Male:
    3 Under 5 years
    4 5 to 9 years
    5 10 to 14 years
    6 15 to 17 years
    7 18 and 19 years
    8 20 years
    9 21 years

    So Row 3 is named just "Under 5 years" with an indent. That's the same name as Row 27, under Female. The Excel table downloads and templates don't provide the names of parent categories in the data row, and categories can go at least a couple of layers deep. CSVs do the same, with multiple spaces instead of tabs. This can make working with individual rows from downloaded data tricky.

    When you download csv files in a ZIP format, you do get metadata and data files with complete data column names like "Estimate!!Total:!!Male:!!Under 5 years", which can be converted into something legible. But that's an extra step for each distinct table you download. I don't know of a way to do it if you want to work with metadata for all tables.
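
    For illustration, here is a minimal Python sketch of that conversion. It assumes the "!!" separator shown above and that the first component is a measure prefix such as "Estimate" or "Margin of Error"; the function name readable_label is hypothetical.

```python
def readable_label(label, drop_prefix=True):
    """Turn 'Estimate!!Total:!!Male:!!Under 5 years' into 'Total: Male: Under 5 years'."""
    # Split on the '!!' separator and discard empty pieces.
    parts = [p.strip() for p in label.split("!!") if p.strip()]
    # Assumption: the leading component names the measure, not a category.
    if drop_prefix and parts and parts[0] in ("Estimate", "Margin of Error"):
        parts = parts[1:]
    return " ".join(parts)


print(readable_label("Estimate!!Total:!!Male:!!Under 5 years"))
```

    Passing drop_prefix=False keeps the "Estimate" component, which may matter if estimates and margins of error sit side by side in the same file.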

    Another inelegant solution would be to find a cell's indent level in files downloaded directly from the UI and infer that the previous indent level(s) represents the parent category/categories. That's doable in data analytic software, even in Excel with VBA, but relying on typography to infer data categories doesn't seem like the best idea.
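
    A rough Python sketch of that indent-based inference follows. It assumes each row label arrives as a string whose leading spaces (or non-breaking spaces) encode the nesting depth; the function name qualify_labels and the sample rows are illustrative only.

```python
def qualify_labels(rows, sep=" "):
    """Prefix each indented label with the labels of its parent rows."""
    stack = []  # (indent_width, label) pairs for the current chain of ancestors
    out = []
    for raw in rows:
        text = raw.replace("\u00a0", " ").rstrip()
        width = len(text) - len(text.lstrip(" "))
        label = text.strip()
        # Drop anything at the same depth or deeper: those rows are not parents.
        while stack and stack[-1][0] >= width:
            stack.pop()
        stack.append((width, label))
        out.append(sep.join(lbl for _, lbl in stack))
    return out


rows = [
    "Total:",
    "    Male:",
    "        Under 5 years",
    "        5 to 9 years",
    "    Female:",
    "        Under 5 years",
]
for name in qualify_labels(rows):
    print(name)
```

    Comparing indent widths rather than dividing by a fixed number of spaces means the sketch does not need to know how many spaces one level uses, only that deeper rows are indented further.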

    So, does anyone know of a resource that would provide columns named something like "Male: Under 5 years" directly for all tables, or for a set of tables? Or, to be complete, a name like "Total: Male: Under 5 years" might be better in general.

    Does anyone have any better ideas or resources? Am I missing something?

    Thanks -

    Jon



  • 2.  RE: Better stand-alone names for categories like age groups

    Posted 12-06-2023 07:25 PM

    For what it is worth, I have an R function that downloads the table and metadata via the API and attaches the row labels preserving the "indent" (as you call it) structure. Tidycensus doesn't seem to take care of this as far as I can tell.

    Dave Dorer



  • 3.  RE: Better stand-alone names for categories like age groups

    Posted 12-06-2023 08:06 PM

    The shells file has indent levels in its own column.



  • 4.  RE: Better stand-alone names for categories like age groups

    Posted 12-07-2023 09:42 AM

    I don't see it. Am I looking in the wrong place, e.g., https://www2.census.gov/programs-surveys/acs/tech_docs/table_shells/2022/ ?



  • 5.  RE: Better stand-alone names for categories like age groups

    Posted 12-07-2023 09:56 AM

    That's weird: the indents are not in the Excel versions of the table shells lists. They were included in the text version of the file, which for some reason the Bureau has removed from their site. Fortunately, I created an Excel version of that one; it's here: https://mcdc.missouri.edu/data/acs2022/acs2022_5yr_table_shells.xlsx



  • 6.  RE: Better stand-alone names for categories like age groups

    Posted 12-07-2023 10:00 AM

    Thanks, exactly what I needed.



  • 7.  RE: Better stand-alone names for categories like age groups

    Posted 12-07-2023 11:26 AM

    Someone on the SCD email list just mentioned that all versions of the table-shells file are available on the FTP site at https://www2.census.gov/programs-surveys/acs/summary_file/2022/table-based-SF/documentation/ -- you'll need the pipe-delimited text version to see the indents.
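
    With the indent level in its own column, full row names can be built for many tables in one pass. The Python sketch below is only an outline: the column names "Table ID", "Line", "Indent" and "Label" are assumptions about the pipe-delimited file and should be checked against its actual header, and the sample rows are illustrative.

```python
import csv
import io

# Illustrative rows; the header names are assumptions, not confirmed.
SAMPLE = """Table ID|Line|Indent|Label
B01001|1|0|Total:
B01001|2|1|Male:
B01001|3|2|Under 5 years
B01001|26|1|Female:
B01001|27|2|Under 5 years
B01002|1|0|Median age --
B01002|2|1|Total:
"""


def full_labels(text):
    """Map (table ID, line number) to a label that includes its parents."""
    result = {}
    ancestors = []
    current_table = None
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        if row["Table ID"] != current_table:
            # A new table starts a fresh hierarchy.
            current_table = row["Table ID"]
            ancestors = []
        # Keep only the rows shallower than this one, then add this row.
        del ancestors[int(row["Indent"]):]
        ancestors.append(row["Label"].strip())
        result[(current_table, int(row["Line"]))] = " ".join(ancestors)
    return result


labels = full_labels(SAMPLE)
print(labels[("B01001", 3)])
print(labels[("B01001", 27)])
```

    For the real file, the same function could be fed the file's contents read as text instead of SAMPLE.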



  • 8.  RE: Better stand-alone names for categories like age groups

    Posted 12-07-2023 03:05 PM

    A couple other sources you could use:

    • IPUMS NHGIS ACS data files, which come with concatenated variable labels in metadata files ("codebooks") and in a descriptive header row (if you choose that option). You can request multiple tables in a single NHGIS file, so you wouldn't have to do "an extra step for each distinct table you download." We also have an API you could use to get our metadata.
    • The Census Bureau's API for ACS includes endpoints for variable labels with concatenated categories in HTML, XML or JSON format. You can find links to these endpoints on any of the ACS API pages (e.g., 2022 5-year is here.)

    FWIW, when we add the data to NHGIS, we generally try to use the API endpoints to get the concatenated labels, but unfortunately, we start processing the 5-year data during the 2-day embargo period before the public data release, and new endpoints aren't available until then. This year we got started on 2022 5-year processing by using a 2022 1-year variables list and adding in 2021 5-year labels for the 10 5-year tables that aren't in 1-year data.

    Also: the API list is apparently randomly ordered, so it may take a little extra effort to select and order the labels for your variables of interest.
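
    As a sketch of that selecting and ordering step in Python: the code below assumes the JSON endpoint returns an object with a top-level "variables" key that maps names like "B01001_003E" to entries with a "label" field. The sample dictionary imitates that shape with labels from table B01001, and the function name table_labels is hypothetical.

```python
import re


def table_labels(variables, table_id):
    """Return (name, label) pairs for one table's estimates, in line order."""
    # Match e.g. B01001_003E but not B01001A_003E or margin-of-error variables.
    pattern = re.compile(rf"^{re.escape(table_id)}_(\d+)E$")
    picked = []
    for name, info in variables.items():
        match = pattern.match(name)
        if match:
            picked.append((int(match.group(1)), name, info["label"]))
    return [(name, label) for _, name, label in sorted(picked)]


# Deliberately unordered sample in the assumed shape of the "variables" object.
sample = {
    "B01001_003E": {"label": "Estimate!!Total:!!Male:!!Under 5 years"},
    "B01001A_001E": {"label": "Estimate!!Total:"},
    "B01001_001E": {"label": "Estimate!!Total:"},
    "B01001_002E": {"label": "Estimate!!Total:!!Male:"},
}

for name, label in table_labels(sample, "B01001"):
    print(name, label)
```

    To run it against the live list, the "variables" object could be loaded with json.load from a saved copy of the 2022 5-year variables.json file and passed in place of sample.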



  • 9.  RE: Better stand-alone names for categories like age groups

    Posted 12-08-2023 01:20 PM

    The tidycensus package does a nice job of concatenating the nested category labels with its load_variables function, for example

    library(tidycensus)
    x <- load_variables(2022, "acs5")

    You get output like

    name         label
    B01001A_001  Estimate!!Total:
    B01001A_002  Estimate!!Total:!!Male:
    B01001A_003  Estimate!!Total:!!Male:!!Under 5 years
    B01001A_004  Estimate!!Total:!!Male:!!5 to 9 years
    B01001A_005  Estimate!!Total:!!Male:!!10 to 14 years