ACS Data Users Group

  • 1.  ACS data 2019 and 2020

    Posted 12-02-2024 03:40 PM

    Hello,

    I am using ACS PUMS data for 2019 and 2020 to develop some dashboards and am writing a manuscript about them. I found a few records for individuals aged 16-17 with below high school education who have very high incomes, and they show up as the highest mean incomes (wage income and total income). They are greater than $400,000 annually and cannot be correct. Should I remove these records before doing my analysis?

    Thanks,

    Anasua



  • 2.  RE: ACS data 2019 and 2020

    Posted 12-02-2024 03:54 PM

    Would need more information to answer you. Paper's topic and universe?



  • 3.  RE: ACS data 2019 and 2020

    Posted 12-03-2024 11:10 AM

    I am working on a project using 2019 and 2020 one-year data to develop dashboards classified by occupation code and split into two groups, healthcare workers and non-healthcare workers. Three dashboards are being developed: (i) the first uses data on selected demographic characteristics, (ii) the second uses data on mean wage income and mean total income, and (iii) the third uses data on income and demographic characteristics for health workers. As part of this project I am also writing a manuscript to explain these dashboards, which will include some supporting tables.

    While working on the tables I came across a few of the highest mean incomes for 2019 and 2020 being earned by people in the 16-17 age group, and by a few who are older but have below high school education. If we keep these records, they show up as the highest incomes, which cannot be correct. If we remove them, we also have to remove every other observation that meets whatever condition we use, for example, those with below high school education or aged 16-17 who earn more than $400,000. My question is how to address this issue. I am adding some of these observations in the table below, but there are more.

    occupation code | year | health workers | wage income | total income | age groups | education | race | hispanic ethnicity | marital status | citizenship | nativity | language spoken | sex
    120 | 2020 | 0 | 408,497 | 408,497 | 16-17 | Below high school | White alone | Non Hispanic | Married | Born in the United States | Native | Only English | Female
    1310 | 2020 | 0 | 412,521 | 412,521 | 35-54 | Below high school | White alone | Non Hispanic | Married | Born in the United States | Native | Only English | Male
    8610 | 2019 | 0 | 458,606 | 458,606 | 16-17 | Below high school | White alone | Non Hispanic | Never married | Born in the United States | Native | Only English | Female
    3324 | 2020 | 1 | 501,062 | 501,062 | 55-64 | Below high school | Some Other Race alone | Non Hispanic | Married | Born in the United States | Native | Only English | Male
    3324 | 2020 | 1 | 540,302 | 540,302 | 55-64 | Below high school | American Indian alone | Hispanic | Married | U.S. citizen by naturalization | Foreign born | Language other than English | Male
    3324 | 2019 | 1 | 557,600 | 555,681 | 18-34 | Below high school | White alone | Hispanic | Never married | Born abroad of U.S. citizen parent or parents | Native | Language other than English | Male
    3324 | 2019 | 1 | 557,600 | 555,681 | 18-34 | Below high school | White alone | Hispanic | Never married | U.S. citizen by naturalization | Foreign born | Language other than English | Male
    40 | 2020 | 0 | 665,065 | 671,101 | 55-64 | Below high school | Two or More Races | Hispanic | Married | Not a U.S. citizen | Foreign born | Language other than English | Male
    3256 | 2020 | 1 | 665,065 | 673,516 | 35-54 | Below high school | Black or African American alone | Non Hispanic | Never married | Born in the United States | Native | Only English | Female
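
    The removal condition described in this post can be written out explicitly, which makes it easier to see how many records it would catch before anything is deleted. This is only a minimal sketch: the records are made up, the field names mirror the recoded table columns rather than official PUMS variable names, and the rule is read as "income above $400,000 AND (aged 16-17 OR below high school)", which is one possible interpretation.

```python
# Hypothetical recoded records; field names follow the table columns above,
# not the official PUMS variable names. Values are illustrative only.
records = [
    {"age_group": "16-17", "education": "Below high school", "wage_income": 408_497},
    {"age_group": "35-54", "education": "Below high school", "wage_income": 412_521},
    {"age_group": "16-17", "education": "Below high school", "wage_income": 9_500},
    {"age_group": "35-54", "education": "Bachelor's degree", "wage_income": 450_000},
]

INCOME_THRESHOLD = 400_000  # assumed cutoff, taken from the ">$400,000" in the post


def is_flagged(rec):
    """True when income exceeds the cutoff and the person is 16-17 or below high school."""
    young = rec["age_group"] == "16-17"
    low_edu = rec["education"] == "Below high school"
    return rec["wage_income"] > INCOME_THRESHOLD and (young or low_edu)


# Flag rather than delete, so the effect of the rule can be reviewed first.
flagged = [r for r in records if is_flagged(r)]
print(len(flagged), "of", len(records), "records match the rule")
```

    Flagging instead of dropping keeps the original file intact and lets the same rule be applied consistently to both years.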


  • 4.  RE: ACS data 2019 and 2020

    Posted 12-03-2024 05:02 PM

    May I ask the source of this PUMS file? These are row-level records that you've recoded, correct? You have not aggregated them, right?



  • 5.  RE: ACS data 2019 and 2020

    Posted 12-03-2024 10:14 PM

    I downloaded the psam_pusa and psam_pusb files from the website. And yes, these are recoded records, not aggregate data.



  • 6.  RE: ACS data 2019 and 2020

    Posted 12-04-2024 08:40 AM

    I'm still not 100% sure what your goal is here, but if you are using the PUMS as input data then I would leave it alone and not throw out these records. They are outliers, and I would not characterize them as "incorrect", implausible as they seem.

    On the other hand, if you feel you have to discard them, their weight would be so small in your aggregation that dropping them probably wouldn't have a huge effect, since you're using the national files.
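
    One way to check how much these records actually matter is to compute the weighted mean with and without them for the cell in question. The sketch below is illustrative only: the incomes and weights are made up, the third field is a hypothetical "flagged" marker, and the weight stands in for the PUMS person weight (PWGTP).

```python
# Hypothetical records: (wage_income, person_weight, flagged_as_implausible).
# The weight stands in for the PUMS person weight; all values are made up.
records = [
    (32_000, 95, False),
    (45_000, 110, False),
    (58_000, 80, False),
    (27_000, 120, False),
    (458_606, 12, True),  # a high-income record with a small weight
]


def weighted_mean(rows):
    """Weighted mean of income: sum(weight * income) / sum(weight)."""
    total_weight = sum(w for _, w, _ in rows)
    return sum(x * w for x, w, _ in rows) / total_weight


mean_all = weighted_mean(records)
mean_trimmed = weighted_mean([r for r in records if not r[2]])
print(f"weighted mean, all records:     {mean_all:,.0f}")
print(f"weighted mean, flagged dropped: {mean_trimmed:,.0f}")
```

    Running the comparison per occupation group shows whether the gap is negligible, as it likely is at the national level, or large enough in a small cell to be worth noting in the manuscript.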