ACS Data Users Group


5-Yr PUMA Geography

  • 1.  5-Yr PUMA Geography

    Posted 02-21-2024 12:05 PM

    Hello, I am accessing the 5-year estimates for 2022 but am not seeing PUMAs as an option under "Geography". Any ideas how to get this?

    [Screenshots attached: 2022 and 2021]



  • 2.  RE: 5-Yr PUMA Geography

    Posted 02-21-2024 02:23 PM

    Yes, in MDAT, for the 2022 5-year PUMS, you'll find PUMA information through two _variables_, not geographies. The PUMA10 variable identifies 2010 PUMA codes for respondents from 2018 through 2021. PUMA20 identifies 2020 PUMA codes for respondents from 2022. This two-variable system will most likely continue for 5-year PUMS until the 2026 release when, once again, the 5-year PUMS will use only one set of PUMA definitions for the entire 5-year period. (MDAT uses the same setup for the 2012 through 2015 5-year PUMS releases, which also used two sets of PUMA definitions.)
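    To illustrate the two-variable setup described above, here is a minimal Python sketch that collapses PUMA10 and PUMA20 into a single (vintage, code) pair per record. It assumes each record is a dict and that the inapplicable variable holds a placeholder such as "-0009"; that placeholder, the function name, and the sample serial numbers are assumptions for illustration, so check the PUMS data dictionary for the actual not-applicable code.

```python
# Assumed placeholder for "not applicable"; verify against the PUMS data dictionary.
NA_CODE = "-0009"


def puma_with_vintage(record, na_codes=(NA_CODE, "", None)):
    """Return (vintage, code) for one PUMS record given as a dict.

    Each record is expected to carry a real code in exactly one of
    PUMA10 or PUMA20, depending on the year it was collected.
    """
    puma20 = record.get("PUMA20")
    puma10 = record.get("PUMA10")
    if puma20 not in na_codes:
        return ("2020", puma20)
    if puma10 not in na_codes:
        return ("2010", puma10)
    return (None, None)


# Hypothetical records: one collected before 2022, one collected in 2022.
records = [
    {"SERIALNO": "2019GQ0000001", "PUMA10": "00902", "PUMA20": "-0009"},
    {"SERIALNO": "2022HU0000002", "PUMA10": "-0009", "PUMA20": "00902"},
]
tagged = [puma_with_vintage(r) for r in records]
```

    Keeping the vintage next to the code matters because the same five-digit code can refer to different areas under the 2010 and 2020 definitions.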

    For IPUMS USA, we're working on providing PUMA codes for the 2022 5-year sample through a single "PUMA" variable (as we already do for the 2012 through 2015 5-year samples). We aim to release that update sometime in the next couple weeks. We have several other resources related to PUMAs and PUMA changes through our Geographic Tools & Resources page.



  • 3.  RE: 5-Yr PUMA Geography

    Posted 02-21-2024 02:40 PM

    Thank you, Jonathan. I found the variable, but there is no detail with it, so there is no telling which PUMA each row is in.



  • 4.  RE: 5-Yr PUMA Geography

    Posted 02-22-2024 01:13 PM

    As Jonathan indicated, the 2018-2022 PUMS data is a "mixed geography" file. It has two variables that don't exist in the 2017-2021 file, PUMA10 and PUMA20. The file is 4/5ths PUMA10 records and 1/5th PUMA20 records. This makes the 2022 5-year file pretty useless. You are better off using the 2022 1-year PUMS data. For the 2018-2022 API you can't use the "&for=public use microdata area:PUMAFIPS" construction. You need to use "&for=state:STATEFIPS" and then "subset" on PUMA10 or PUMA20 FIPS code records.
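    The state-plus-subset construction can be sketched in Python as below. This only builds the query string and sends no request. The endpoint path, the helper name, and the variables PWGTP and AGEP are assumptions for illustration, so confirm them against the Census API documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2018-2022 5-year PUMS; verify before use.
BASE_URL = "https://api.census.gov/data/2022/acs/acs5/pums"


def build_pums_query(variables, state_fips, puma_var=None, puma_code=None):
    """Build a PUMS query restricted to a state, optionally subset on a PUMA variable."""
    params = [
        ("get", ",".join(variables)),
        ("for", f"state:{state_fips}"),
    ]
    # Always pair the PUMA filter with the state restriction, because
    # PUMA codes repeat across states.
    if puma_var and puma_code:
        params.append((puma_var, puma_code))
    return BASE_URL + "?" + urlencode(params, safe=":,")


url = build_pums_query(["PWGTP", "AGEP"], "25", "PUMA20", "00902")
```

    Building the housing and person calls from the same helper keeps the state restriction on both, which guards against a PUMA filter that is applied regardless of state.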

    From the people at ACSO (American Community Survey Operations) :

    Hi David,
    I think what you are running into is the dual-PUMA issue.
    This is what I found in the 2022 PUMS 5-year User Guide: The current PUMA boundaries are based on Census 2020 definitions, while records from 2021 and earlier use boundaries based on Census 2010 definitions. Therefore, multi-year files for 2022 will contain PUMA codes created from both Census 2010 and Census 2020. PUMA codes defined using Census 2010 are called PUMA10, while the newer PUMA codes defined from Census 2020 are called PUMA20. Each record on the PUMS files will contain either the PUMA10 or PUMA20 code, based on which year the record’s data were collected. Due to disclosure concerns, it is not possible to update the PUMA codes for the records from 2021 and earlier to 2020-based PUMAs by using their detailed geographic locations. Data users will need to crosswalk their data to obtain a single PUMA geography using other means, such as using allocation rates using GEOCORR.
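    One way to read "crosswalk using allocation rates" is to split each 2010-vintage record's weight across 2020 PUMAs in proportion to an allocation factor. The Python sketch below does that for a weighted total. The factors, PUMA codes, and record layout are made up for illustration; real factors would come from a crosswalk file such as one produced by Geocorr, and this is not necessarily the method the Census Bureau or any other provider uses.

```python
from collections import defaultdict

# Hypothetical allocation factors: share of each 2010 PUMA's population
# that falls in each 2020 PUMA. Each row should sum to 1.
ALLOCATION = {
    "00100": {"00101": 0.75, "00102": 0.25},
    "00200": {"00102": 1.0},
}


def weighted_total_by_puma20(records, allocation):
    """Sum record weights by 2020 PUMA, splitting 2010-vintage records by allocation factor."""
    totals = defaultdict(float)
    for rec in records:
        if rec["vintage"] == "2020":
            # Already coded to a 2020 PUMA: use the full weight.
            totals[rec["puma"]] += rec["weight"]
        else:
            # Coded to a 2010 PUMA: spread the weight across 2020 PUMAs.
            for puma20, share in allocation[rec["puma"]].items():
                totals[puma20] += rec["weight"] * share
    return dict(totals)


records = [
    {"vintage": "2010", "puma": "00100", "weight": 40},
    {"vintage": "2010", "puma": "00200", "weight": 20},
    {"vintage": "2020", "puma": "00101", "weight": 10},
]
totals = weighted_total_by_puma20(records, ALLOCATION)
```

    Splitting weights preserves the overall weighted total, but the resulting estimates for individual 2020 PUMAs are approximations.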

    I have reached out to the PUMS subject matter experts and have received the following responses:

    There's an error in the first API URL you provided, for the housing file call. It's retrieving records where PUMA20=00902 regardless of state, while the person API call is restricting to PUMA20=00902 and state=25.

    Dual PUMAs were used for the DY22 5-year PUMS, as PUMA20 is only on the 2022 records, while PUMA10 is on the 2018 through 2021 records. The information available is on p.12 of the ACS 5-year PUMS User Guide: https://www.census.gov/programs-surveys/acs/microdata/documentation.html

    You may benefit from using the PUMA10 and PUMA20 Variables to narrow down the search with the universe of 00902 below or could use the state link and do the same thing.
    This might be helpful if you want to only be in that PUMA area for the Geography:


    I have attached the PUMS Data Dictionary for you as well.


    I hope this helps. Let me know if you have any other questions.

    Vicki


  • 5.  RE: 5-Yr PUMA Geography

    Posted 02-22-2024 01:33 PM

    Thank you David. I need to use the 5-year estimates as I want to go down to the PUMA level (I actually want county data). But this seems to not be an option with this file. Like I've stated, I used the PUMA20 or PUMA10 variable, but it's pretty useless with the way I'm using it. The PUMA value does not show on the table.

    Any ideas of how to accurately get to the county level for 2022 data?

    Thank you,

    Lorna



  • 6.  RE: 5-Yr PUMA Geography

    Posted 02-22-2024 02:31 PM

    Dear Lorna,

    If you look at my post about a Small Area Estimation (SAE) program that I wrote, you can use the program to get county data. The problem is that a county may contain several PUMAs. This situation is pretty easy to handle: you can just "add up" the tables for the relevant PUMAs. For PUMAs that cross county lines, which may contain parts of several counties, you are stuck. I wrote the SAE program to handle this situation. I start with PUMS data for all the relevant PUMAs (large area) and then I create PUMS-like tract-level data. Next I "stack" the tracts for the county that I want. There are many potential issues with this approach but it seems reasonable. You can then create any county table that you want using the synthetic data.
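    For the easy case, where a county is made up of whole PUMAs, "adding up" the tables can be sketched in Python as follows. The PUMA codes, category labels, and counts are hypothetical, and this does not cover PUMAs that cross county lines, which is the case the SAE program is meant for.

```python
from collections import Counter


def county_table(puma_tables, pumas_in_county):
    """Add up per-PUMA weighted counts for a county made of whole PUMAs."""
    total = Counter()
    for puma in pumas_in_county:
        total.update(puma_tables[puma])  # Counter.update adds counts
    return dict(total)


# Hypothetical weighted counts by poverty category for three PUMAs.
puma_tables = {
    "10101": {"below_200_fpl": 1200, "at_or_above_200_fpl": 3800},
    "10102": {"below_200_fpl": 900, "at_or_above_200_fpl": 4100},
    "10200": {"below_200_fpl": 500, "at_or_above_200_fpl": 2500},
}
county = county_table(puma_tables, ["10101", "10102"])
```

    Point estimates can be added this way, but margins of error cannot; those need to be recomputed for the combined area, for example with replicate weights.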

    The current version of the program does not produce useful MOEs. I'm working on an extension that produces replicate weights. You can produce MOEs using the replicate weights. To do all this you need to be able to use R. Do you have any experience with R? If you work for a nonprofit 501(c)(3) or government, you can get free support through my foundation, dorerfoundation.org.

    Dave



  • 7.  RE: 5-Yr PUMA Geography

    Posted 02-23-2024 10:10 AM

    Thanks again David. I think I will switch over and use the PUMS data with SAS. I need demographics by county/zip for 200% FPL. I'm VERY interested in using the Supplemental Poverty data though. Can you get me started? I work for the state of Washington, DSHS.



  • 8.  RE: 5-Yr PUMA Geography

    Posted 02-23-2024 10:47 AM

    I haven't used SAS in years and years, so I don't recall how transferable this resource is. These two links were extremely helpful for using PUMS data in R.

    https://walker-data.com/census-r/introduction-to-census-microdata.html

    https://walker-data.com/tidycensus/articles/pums-data.html

    Meghan



  • 9.  RE: 5-Yr PUMA Geography

    Posted 02-23-2024 11:43 AM

    Dear Lorna,

    Go to dorerfoundation.org and look for the "Contact Us" tab across the top. Send an email to the address and we can communicate via email.

    Best,

    Dave



  • 10.  RE: 5-Yr PUMA Geography

    Posted 03-06-2024 12:15 PM

    Jonathan, thanks for your note. Will there be an announcement when IPUMS releases the single PUMA geography?



  • 11.  RE: 5-Yr PUMA Geography

    Posted 03-06-2024 12:59 PM

    We will announce to registered IPUMS users by email, but sometimes there's a week or two delay between release and the email. You can also occasionally check the Revision History page on the site for current status.

    Some insider info: we're on track for a release this Thursday or Friday. It's fully prepped but there are some technical hold-ups on the deployment. I'm not sure, but it sounds like our IT team will get those worked out soon.



  • 12.  RE: 5-Yr PUMA Geography

    Posted 03-06-2024 02:06 PM

    That's great to hear, I'm excited to use the new PUMAs in the 2022 5yr survey for the older data for a question I'm working on. I have a tangential question.

    In my work I've been using the tidycensus, survey, and srvyr packages in R and calling the data through ACS API. I'm using it because of the resources available on how to use these to get margins of error as recommended by the census bureau (primarily: https://walker-data.com/tidycensus/articles/pums-data.html and https://walker-data.com/census-r/introduction-to-census-microdata.html).

    My question is, do you know of similar resources that walk a novice R user on how to do similar manipulations and summarizations of IPUMS microdata that include error estimates? Or perhaps a crosswalk of how IPUMS should be handled differently than the data extracted from ACS to get the margin of error using replicate weights (since the survey and srvyr packages do that automatically).

    Thank you,

    Meghan



  • 13.  RE: 5-Yr PUMA Geography

    Posted 03-07-2024 01:55 PM

    If I am understanding your question correctly, you are interested in sample code for applying replicate weights to the ACS PUMS available from IPUMS USA to generate empirically derived standard errors for your estimates. IPUMS USA offers both the household (REPWT) and the person (REPWTP) replicate weights through their data access system. This IPUMS USA replicate weights summary page provides a bit of background information as well as sample code for applying replicate weights in R with the srvyr package.
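    As a language-neutral illustration of what the replicate weights are for, the Python sketch below applies the standard ACS successive-difference replication formula: the variance is 4/80 times the sum of squared differences between each of the 80 replicate estimates and the full-sample estimate, and the 90 percent margin of error is 1.645 times the standard error. The function names and the numbers are made up for illustration, and in practice a survey package such as srvyr handles this for you.

```python
import math


def replicate_se(estimate, replicate_estimates):
    """Standard error from ACS-style replicate estimates (normally 80 of them)."""
    n = len(replicate_estimates)
    squared_diffs = sum((rep - estimate) ** 2 for rep in replicate_estimates)
    return math.sqrt(4.0 * squared_diffs / n)


def margin_of_error(estimate, replicate_estimates, z=1.645):
    """90 percent margin of error, the confidence level used in ACS publications."""
    return z * replicate_se(estimate, replicate_estimates)


# Hypothetical numbers: a full-sample estimate and 80 replicate estimates,
# each computed the same way but using one replicate weight in place of the main weight.
estimate = 1000.0
replicates = [1010.0] * 40 + [990.0] * 40
se = replicate_se(estimate, replicates)
moe = margin_of_error(estimate, replicates)
```

    Each replicate estimate is the same statistic recomputed with one replicate weight in place of the main weight, so the statistic has to be computed 81 times in total.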


  • 14.  RE: 5-Yr PUMA Geography

    Posted 03-07-2024 02:46 PM

    Update: IPUMS USA has now added PUMA identifiers to its version of the 2022 5-year sample through a single PUMA variable. Almost all the other geographic variables that we derive from PUMA information have also been added, identifying counties, cities, metropolitan areas, percent metro population, and metropolitan / principal city status (where possible).

    We also extended three more geographic variables to both the 2022 1-year and 5-year samples: DENSITY, METPOP10, and HOMELAND. (We hadn't yet updated these for 2020 PUMAs, so unlike the other variables mentioned above, they weren't yet available in the 2022 1-year sample until today.)



  • 15.  RE: 5-Yr PUMA Geography

    Posted 03-11-2024 06:48 PM

    While you were working on generating the single PUMA variable, did you uncover any mislabeled 2020 PUMAs in the Census Bureau Reference Maps?

    In Arkansas, for example, 0500800 is labeled as "White, Lonoke & Woodruff Counties" but it does not include any part of Lonoke County, which now resides in 0500900.

    It's not a huge deal, but I was hoping not to have to check and manually relabel all the PUMAs before we publish our data dashboard.



  • 16.  RE: 5-Yr PUMA Geography

    Posted 03-12-2024 09:12 AM

    To KatherinRPhillips: We didn't notice any PUMA naming issues, but IPUMS USA doesn't do much with PUMA names. In our versions of the microdata, we provide PUMA codes but not names. Out of curiosity, I looked into the naming issue you describe, and I agree that PUMA 0500800 is apparently misnamed. In the Census Bureau's relationship file between 2020 tracts and 2020 PUMAs, all of the tracts in Lonoke County (state 05, county 085) are in PUMA 0500900 and none are in 0500800.



  • 17.  RE: 5-Yr PUMA Geography

    Posted 03-12-2024 09:23 AM

    Thanks for the quick response. Looks like I will have to hand check each label as I suspect the shape files contain the same issue.



  • 18.  RE: 5-Yr PUMA Geography

    Posted 03-12-2024 10:00 AM

    In New York the names are very long. My guess is they edited the PUMAs, but did not edit all of the names. PUMAs now are almost always a set of tracts. In the distant past they were created from Summary Level 80, which was tract split by place, but that summary level has been abandoned and is no longer used to report data. Originally the PUMAs were not developed for mapping and were not necessarily contiguous; in fact, one PUMA in Will County, Illinois was in as many as 50 parts. Once the ACS was started and PUMAs were used for mapping, stricter rules were imposed. They are still created in cooperation with the state governments.

    Andy



  • 19.  RE: 5-Yr PUMA Geography

    Posted 03-12-2024 10:04 AM

    PUMA boundaries and names are defined by the states, specifically the SDCs. I don't think the names go through any kind of review, although I could be wrong about that.

    At MCDC, we don't put much stock in PUMA names, either -- they're a convenience, but just aren't that relevant. PUMAs weren't even named until the 2010 census.



  • 20.  RE: 5-Yr PUMA Geography

    Posted 03-12-2024 10:13 AM

    Before the advent of the ACS, PUMAs were mostly used for data analysis with no spatial reference. They started in the 1990 Census. Once the ACS was launched and the 65,000 threshold was adopted for release of the one-year data, the PUMAs became the most detailed complete coverage of the US that could be released. Places are not a complete coverage, and only about 800 of 3,200 counties are larger than 65K. So they had defined PUMAs and decided to use them to map the ACS, since they are larger than 100K and cover the entire country (there are somewhere around 2,200 of them). At that point, how messy some of them were became apparent. They had no names until the 2010 ones, which were released in 2012.



  • 21.  RE: 5-Yr PUMA Geography

    Posted 03-12-2024 10:30 AM

    Thanks Andrew, I'm already aware of this history.

    Here in Missouri, MCDC retrospectively gave names to the 2000 PUMAs for our state only.



  • 22.  RE: 5-Yr PUMA Geography

    Posted 08-21-2025 12:40 PM

    Hi Jonathan,

    I'm curious why the Census Bureau does not simply regeocode the 2018 to 2021 respondents into the 2020 PUMA boundaries. This way all respondents would be in the same geographies. This would take very little time and would make the dataset much easier to use.

    Thanks,
    Charles



  • 23.  RE: 5-Yr PUMA Geography

    Posted 08-21-2025 01:39 PM

    Geocorr (Glenn Rice) has crosswalks between various vintages of tracts and PUMAs. Between decennial censuses the tract boundaries may change as well as the PUMA boundaries. Again, Geocorr has crosswalks involving the "new 2020" boundaries (put into effect in 2022 for PUMAs).

    https://mcdc.missouri.edu/applications/geocorr.html

    For tracts or PUMAs or counties or any combination of geos and vintages, when the boundaries split another geography (such as 2019 tracts and 2022 tracts), Geocorr gives an "allocation factor" based on the populations in the split geographies. I think that Geocorr disaggregates down to the block level and then regroups up to the higher geography to get the allocation factors. I'm not sure how Geocorr works when the block boundaries change between vintages, for example when a block is split between vintages.
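    The block-level idea described above can be sketched in Python as follows. This is only an illustration of population-weighted allocation under the assumption that every block is assigned to exactly one source and one target geography; it is not Geocorr's actual implementation, and the geography labels and populations are made up.

```python
from collections import defaultdict


def allocation_factors(blocks):
    """Compute source-to-target allocation factors from block populations.

    blocks: iterable of (source_geo, target_geo, population) tuples, one per block.
    Returns {(source_geo, target_geo): share of the source population in the target}.
    """
    pair_pop = defaultdict(float)
    source_pop = defaultdict(float)
    for source, target, pop in blocks:
        pair_pop[(source, target)] += pop
        source_pop[source] += pop
    return {
        (source, target): pop / source_pop[source]
        for (source, target), pop in pair_pop.items()
        if source_pop[source] > 0  # skip unpopulated source areas
    }


# Hypothetical blocks: old PUMA "A" is split between new PUMAs "X" and "Y".
blocks = [
    ("A", "X", 300),
    ("A", "X", 100),
    ("A", "Y", 100),
    ("B", "Y", 50),
]
factors = allocation_factors(blocks)
```

    The factors for each source geography sum to 1, so they can be used to split weights or counts without changing the overall total.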

    Hope this helps. I use R to download and analyze PUMS data (including the geocorr files).



  • 24.  RE: 5-Yr PUMA Geography

    Posted 08-21-2025 03:21 PM

    The short answer is that I use a GIS to intersect the older geography boundaries with the current block boundaries.

    All geographies within a single vintage of Geocorr must use the same "atoms" (e.g. 2020 blocks for Geocorr 2022). In order to add an earlier vintage of geography (e.g. 2010-vintage PUMAs) to Geocorr 2022, I have to redefine it in terms of 2020 blocks.

    This is tedious, as you might expect, so I do this only for geo types that I consider most useful to compare across decennial vintages.



  • 25.  RE: 5-Yr PUMA Geography

    Posted 08-21-2025 06:28 PM

    Thank you very much David, I'll check this out.

    I'm still curious as to why the respondents in 2010 PUMAs are not geocoded to 2020 PUMAs in this latest file.



  • 26.  RE: 5-Yr PUMA Geography

    Posted 08-25-2025 01:18 PM

    Charles,

    Since my original post here, the Census Bureau has done something very much like what you suggest. Their first release of the 2019-2023 5-year PUMS identified 2020 PUMAs for all respondents. Later, they released a new version of the 2018-2022 5-year PUMS that identifies 2020 PUMAs for all respondents, but they later removed that.

    (We at IPUMS discovered some 2018 records in the 2018-2022 updated release that had invalid 2020 PUMA IDs. We let the Bureau know about these errors, and they temporarily removed the entire updated version of the PUMS. In this errata note, they state that they plan to release a corrected version at an unspecified later date.)

    We’ve integrated the 2019-2023 5-year release into IPUMS USA, and we will also add the updated 2018-2022 PUMS if/when they're again available.

    That said, as I understand, the approach they've taken is not exactly what you suggest--"regeocoding"--and there's a very good reason for that: it would violate respondent privacy.

    If they had simply regeocoded, then for the many households that appear in both the 2017-2021 5-year PUMS (using 2010 PUMAs) and in the 2018-2022 5-year PUMS (using 2020 PUMAs), it would typically be possible for a user to identify both the 2010 and 2020 PUMAs where each household resided. If a household resided in an area affected by a small PUMA boundary change--maybe encompassing a few thousand residents or less--then we could determine the location of that household MUCH more precisely than if we only knew which 2010 PUMA the household is in. (PUMAs are required to have at least 100,000 residents in order to prevent users from locating respondents more precisely.)

    To my knowledge, the Bureau has yet to provide exact details on how they allocated 2020 PUMA identifiers to 2018-2021 responses in these new PUMS releases, but the documentation they have provided indicates that it was not by regeocoding. Rather, they used some kind of crosswalk from 2010 PUMAs to 2020 PUMAs, as you might get from Geocorr. I’ve confirmed that the allocation is not based on areal weighting (simplistically assuming that the likelihood of a 2010 PUMA resident living in a 2020 PUMA is equal to the proportion of the 2010 PUMA’s area in the 2020 PUMA.) But beyond that, we don’t know what the technique was.

    They may have used population allocation factors like those provided in Geocorr files, but I can't verify that. Unfortunately, whatever allocation technique they used--if it was not in fact regeocoding--will have introduced some allocation errors, so it'd be helpful if at some point they could provide more information about their approach and/or the potential scope and impact of any resultant PUMA misidentifications.



  • 27.  RE: 5-Yr PUMA Geography

    Posted 08-25-2025 03:13 PM

    Hi Jonathan,

    Thanks so much for the detailed explanation about why they did not regeocode the older respondents. That makes sense (and did not occur to me). We are going to proceed with using the 2019-2023 5-year since it has been "corrected", but I agree that it would be helpful to know how they allocated 2010 PUMA respondents to 2020 PUMAs.

    Thanks,
    Charles