ACS Data Users Group

 View Only
Expand all | Collapse all

calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

  • 1.  calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-21-2024 06:46 PM

    Using the srvyr package and replicate weights REPWT, I would like to calculate the standard error and margin of error of the proportion (p) of households (HHWT) in the dataframe "data" that are paying over housing tax credit level rents, this is denoted in the dataframe by a "1" in the field OVERLIHTC. I'm able to get as far as shown below, but I don't know how to finish the code and print the results. Ideas?

    p <- sum(data$HHWT[data$OVERLIHTC == 1]) / sum(data$HHWT)

    svy <- as_survey(data, weight = HHWT , repweights = matches("REPWT[0-9]+"), type = "JK1", scale = 4/ 80 , rscales = rep(1, 80 ), mse = TRUE)

    sub_design <- subset(svy, OverLIHTC == 1 )



  • 2.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-22-2024 05:31 PM

    I would personally use dplyr (https://dplyr.tidyverse.org/index.html) to summarize this (since srvyr is designed to work with the dplyr syntax) and do something like this. (I can't see your actual dataset, so you may need to tweak this.) NOTE: You'll need to convert OVERLIHTC to a character vector if it's not one already for this to work correctly.

    svy |>

    group_by(OVERLIHTC) |>
    summarise(Percent = survey_mean())

    That should provide a df with the percentage and SE for each percentage.



  • 3.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-22-2024 05:31 PM

    I would personally use dplyr (https://dplyr.tidyverse.org/index.html) to summarize this (since srvyr is designed to work with the dplyr syntax) and do something like this. (I can't see your actual dataset, so you may need to tweak this.) NOTE: You'll need to convert OVERLIHTC to a character vector if it's not one already for this to work correctly.

    svy |>

    group_by(OVERLIHTC) |>
    summarise(Percent = survey_mean())

    That should provide a df with the percentage and SE for each percentage.



  • 4.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-22-2024 05:36 PM

    Here is a great summary of srvyr, which includes the code for outputting SEs and MOEs:

    https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html

    Cheers--

    AB



  • 5.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-23-2024 11:37 PM

    Here is an example using the R survey package and the variable SEX

    Download PUMS data (using API)
    with variable SEX PWGTP and paste0("PWGTP",seq(1,80)) and save in data.frame pums
    (SEX 1=male 2=female)

    Create an indicator variable for the category "male"
    pums$male<-as.numeric(pums$SEX==1);
    require(survey)

    design<-svrepdesign(ids=~1,data=pums$male,weights=pums$PWGTP,repweights=pums[,paste0("PWGTP",seq(1,80))])

    sm<-svymean(~male,design);
    mean SE
    male 0.45799 0.0094

    For an indicator (0/1) variable the mean is just the proportion.
    MoE<-sqrt(as.numeric(attr(sm,"var")))*qnorm(0.95);

    print(MoE)

    0.01200602



  • 6.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-24-2024 12:04 PM

    Thanks everyone. Elizabeth, I did add the code you suggested and it produced a percent_se of .0275. I presume the is a standard error of 2.75% (not .0275%), correct? And presuming all this is on the right track, what is the code to get the margin of error (let's say for the 90% confidence interval with its factor of 1.645)?



  • 7.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-24-2024 02:21 PM

    The calculation above should use qnorm(0.95) for a 90% CI



  • 8.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-26-2024 11:55 AM

    Thanks David, and in my original code with the additions provided by Elizabeth, how do I then utilize qnorm(0.95) to get the margin of error?

    And do you agree that standard error is 2.75% (not .0275%)?



  • 9.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-26-2024 12:33 PM

    The "output" from the calculation is a fraction so you need to multiply by 100 to get a percent.

    the number qnorm(0.95) is 1.644854 If you look in the ACS handbook chapter 8 they use 1.645 to convert a standard error (SE) to a margin of error (MoE) which is what you get when you round qnorm(0.95) up. This gives a conservative estimate for the MoE. In my survey package example the output from the svymean function is complicated (use str to see the actual structure of the return from svymean). The variance of the mean is in an attribute. The SE is the square root (sqrt) of the variance. You then scale the SE to get the MoE. FYI you can use svymean on a categorical variable and it will give the fractions for the various categories along with the SE for each category. You still need to use the attr function to extract the variance covariance matrix. In the case of multiple categories the SE is sqrt(diag(attr(svymean(x),"var"))) The advantage of using svymean is that it will give a better estimate than using svytable and then applying the ratio calculation in chapter 8 of the ACS handbook.

    As a note I like to use the "oldest" and simplest package to make a calculation. This means that I prefer the "survey" package over "srvyr" package



  • 10.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-26-2024 12:46 PM

    I just completed the calculation of SE and SOE manually, following the Census guide. And what I got was the Variance is .0275, the SE was .166 and the MOE is .273. So it appears that the code provided by Elizabeth yielded the Variance -- Although eyeing the data that seemed to be what I would expect for a margin of error. Suggestions?



  • 11.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-26-2024 12:57 PM

    Forget the most recent post, I made an error in the excel spreadsheet with the manual calculation -- with a correction, I do get a SE of .0275. My question on getting to the MOE still stands however.



  • 12.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-26-2024 02:28 PM

    what state and puma are you using ?



  • 13.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-26-2024 04:50 PM

    Nevada, 2021 5YR ACS, all PUMAs



  • 14.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-28-2024 07:53 PM

    I double checked, my table was drawn from NV PUMA #200, 2021 5YR ACS microdata



  • 15.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-29-2024 11:27 AM

    Thanks for the info. I assume that you mean PUMA FIPS 00200 (PUMAs have 5 digits). Also where did you get OVERLIHTC LIHTC is a HUD "variable" which is based on the AMI (Area median income). Where did you get the AMI ? I would like to recreate your calculation using my R program.

    Thanks,

    Dave



  • 16.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-29-2024 05:28 PM

    I think David has much more statistical expertise than I do, so if he disagrees I would defer to him. I typically estimate the MOE by multiplying the SE by the critical value. I typically use 95% CI in my work and use 1.96.



  • 17.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 07-29-2024 09:59 PM

    Dear Elizabeth,

    1.96 would be a 95% confidence interval [qnorm(0.975) ]which is typical for many statistical analyses. However the Census uses a 90% confidence interval for the MoE so the "scale" factor would be approximately 1.65

    See the ACS handbook chapter 8 for the calculation

    https://www.census.gov/content/dam/Census/library/publications/2020/acs/acs_general_handbook_2020_ch08.pdf

    formula (2) page 60 that gives the relationship between SE (standard error) and the MoE (margin of error)

    Dave



  • 18.  RE: calculating SE and MOE in R package srvyr for a proportion using replicate weights and PUMS data

    Posted 08-20-2024 04:31 AM

    hi, the instructions on this page hopefully allow you to create a survey design and run the most common analysis commands.. the latter section also includes the `srvyr` conversion in case you'd like to work with dplyr syntax. hope this helps :-)

    https://asdfree.com/american-community-survey-acs.html#analysis-examples-with-srvyr