ACS Data Users Group

  • 1.  ACS 5 years 2019-2023 PUMS data

    Posted 07-08-2025 09:59 AM

    Hello,

    I am trying to use the ACS 5-year 2019-2023 PUMS data to do some analysis by occupation and industry. I have downloaded the pusa-pusd SAS datasets but am unable to find the occupation variable, which is supposed to be OCCP. There is a variable OCPIP, which is not explained in the data dictionary. I previously worked with the one-year datasets for 2019 and 2020, which had the occupation variable as OCCP. I looked through the FAQs on the ACS website but had no luck. Can any of you help me with this? Am I doing something wrong, should I be using some other ACS PUMS data files, or is something missing in these 2019-2023 datasets? Any help is appreciated.

    Thanks,

    Anasua



  • 2.  RE: ACS 5 years 2019-2023 PUMS data

    Posted 07-08-2025 10:37 AM


  • 3.  RE: ACS 5 years 2019-2023 PUMS data

    Posted 07-10-2025 11:05 AM

    According to the official data dictionary for 2019 to 2023 ACS PUMS data, OCPIP refers to "Selected monthly owner costs as a percentage of household income during the past 12 months".

    If you are worried about the completeness or format of this "pusa-pusd" SAS dataset, you can always look at the CSV version in the official repository of PUMS data. The naming convention of the files there tells you a few things. I always like to start with Wyoming when exploring the structure of the data: even though I live in California, the most populous state, I prefer to start with Wyoming, the least populous, to check what the data looks like before loading the much larger files I intend to use later.

    So I start with csv_pwy.zip and csv_hwy.zip. The three letters before the underscore tell me whether the files within the zip archive are CSV or SAS files; the letter after the underscore tells me whether each line represents a housing unit/household/group quarters (h) or a person (p); the last two letters are a state abbreviation, which can be anything from California (ca) to Wyoming (wy) to Puerto Rico (pr), or even the United States as a whole (us).
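
    As a rough illustration of the naming convention described above, here is a minimal Python sketch that splits a name like csv_pwy.zip into its parts. The function name parse_pums_zip_name and the exact pattern are my own assumptions based only on the description in this post, not an official specification of the file names.

```python
import re

# Pattern assumed from the description in this thread:
#   <3-letter format>_<h or p><2-letter area>.zip   e.g. csv_pwy.zip
_PUMS_ZIP = re.compile(r"^(?P<fmt>[a-z]{3})_(?P<rec>[hp])(?P<area>[a-z]{2})\.zip$")

RECORD_TYPES = {"h": "housing", "p": "person"}


def parse_pums_zip_name(filename):
    """Split a PUMS zip file name into format, record type, and area.

    Hypothetical helper: raises ValueError if the name does not follow
    the assumed pattern.
    """
    match = _PUMS_ZIP.match(filename.lower())
    if match is None:
        raise ValueError(f"unrecognized PUMS zip name: {filename!r}")
    return {
        "format": match["fmt"],                      # e.g. "csv"
        "record_type": RECORD_TYPES[match["rec"]],   # "housing" or "person"
        "area": match["area"],                       # e.g. "wy", "ca", "pr", "us"
    }
```

    For example, parse_pums_zip_name("csv_pwy.zip") gives format "csv", record type "person", and area "wy", while csv_hwy.zip comes back as a housing file.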

    I just checked both the SAS and CSV versions of the Wyoming person records, and both had 286 fields/variables with 29,065 observations. So I have no reason to believe the larger files would be different.
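
    If you want to repeat this kind of check yourself without loading the whole file, something along these lines should work. It is only a sketch: the helper names csv_columns_in_zip and missing_variables are made up for illustration, and it assumes the archive contains at least one .csv member whose first row is the header.

```python
import csv
import io
import zipfile


def csv_columns_in_zip(zip_source):
    """Return the header row of the first .csv member in a zip archive.

    zip_source may be a path or a binary file-like object. Only the
    first line of the CSV is read, so this stays cheap on large files.
    """
    with zipfile.ZipFile(zip_source) as archive:
        csv_members = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_members:
            raise ValueError("no .csv file found in archive")
        with archive.open(csv_members[0]) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return next(csv.reader(text), [])


def missing_variables(columns, wanted):
    """List the wanted variable names that are absent (case-insensitive)."""
    have = {c.upper() for c in columns}
    return [v for v in wanted if v.upper() not in have]


# Usage, assuming csv_pwy.zip has already been downloaded to the
# working directory (the path is a placeholder):
#   columns = csv_columns_in_zip("csv_pwy.zip")
#   print(len(columns))
#   print(missing_variables(columns, ["OCCP"]))
```

    An empty list from missing_variables means every variable you asked about is present in the header.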

    Hope this helps!



  • 4.  RE: ACS 5 years 2019-2023 PUMS data

    Posted 07-10-2025 11:45 AM

    Thank you, I will try.