Dear Christine,
MAKING PROGRESS
I was able to reproduce the AL row in the Lifeline spreadsheet. The value is 626087 I used the 2021 1-year PUMS. I added SPORDER==1 to the code posted earlier This code here computes with only SPORDER==1 person records and with all person records If you can tell me now to paste in the code like you did so it looks OK then I'll post it like you did. If you send an email to info@dorerfoundation.org I'll email the code as an attachment.
I don't use the %>% format in my code but I believe if you add | SPORDER == 1 to the first subset line and eliminate the second subset line you will get the right thing. I'm a long time FORTRAN programmer so I still use loops with an integer index. The other issue is that you used 2020 5 year data and the spreadsheet uses 2021 1-year PUMS data. You might also ask why USAC Lifeline people didn't "|" across all members of the household. The application seems to say that if someone in the household is eligible then the entire household is eligible. So if anyone receives Medicaid then the household is eligible. The ACS has the concept of a single Householder (person 1 on the form). If there is a husband and wife for example and the wife fills out the form as "person 1" and the husband is on Medicaid then the ACS based calculation with SPORDER==1 is wrong and will count the household as NOT eligible. This makes the spreadsheet computed participation rate too high. They should be using the denominator of 774694 (for AL 2021 - 1 year data) which looks at all persons in the household.
Another thing is that the csv and API downloaded data have a different rule for missing values. The csv files have a ",," blank field for missing. The API cannot handle a "blank" data field so the API puts in a value. The rule that a Census person posted on the forum is lowest possible value -1 for missing with the caveat of "usually," This detail does not appear in the PUMS pdf codebook. Your code with POVPIP==0:200 should take care of this in the API case but I'm not sure. Also remember to use a 135 limit for POVPIP to match the Lifeline spreadsheet.
Reference - Lifeline Federal Regulations giving qualification for Lifeline: (uses 135% of poverty GUIDELINE)
https://www.law.cornell.edu/cfr/text/47/54.409
"The consumer, one or more of the consumer's dependents, or the consumer's household must receive benefits from one of the following federal assistance programs: Medicaid; Supplemental Nutrition Assistance Program; Supplemental Security Income; Federal Public Housing Assistance; or Veterans and Survivors Pension Benefit."
ACP (Affordable Connectivity Program (uses 200% of Poverty GUIDELINE (Not Poverty Threshold as does the ACS)
https://www.law.cornell.edu/cfr/text/47/54.1805
Both Lifeline and ACP use "Family"
Household and income are defined here:
https://www.law.cornell.edu/cfr/text/47/54.400#f
#
# USAC lifeline_pums_analysis_021823.R
#
# R function to compute number of eligible Lifeline Households from ACS PUMS data
#
# www.usac.org/.../LI_Application_NVstates.pdf
#
# By David J Dorer, Ph.D.
#
# copyright 2023 by Dorer Community Service Foundation Inc of Massachusetts
#
# info@dorerfoundation.org
#
# v1.0 djd 16 Feb 2023 12:40
# v1.1 djd 18 Feb 2023 13:38 added group quarters eligible persons calculation
# v1.2 djd 19 Feb 2023 18:43 added calculation using only person records with SPORDER==1
#
#
# z1<-usac(vintage=2021,period=1)
# z2<-usac(vintage=2020,period=1);
# z3<-usac(vintage=2020,period=5);
# z4<-usac(vintage=2021,period=5);
# z5<-usac(vintage=2021,period=1,state="MA",fips="25");
usac<-function(vintage=2021,state="AL",poverty.level=135,fips="01",period=1) {
#
# ARGUMENTS:
#
# vintage: PUMS vintage
# period: PUMS period
# state: state abbreviation
# fips: state FIPS Code
# poverty.level: poverty threshold to use in calculation
#
# VALUE:
#
# list with
#
# households: number of eligible households in state
# householders: number of eligible householders (SPORDER==1) records only
# vintage: argument
# period: argument
# url.person: Census FTP site url for PUMS Person records
# url.house: Census FTP site url for PUMS House records
# state: argument
# fips argument
# data: Derived data.frame used to make calculation.
#
state<-tolower(state);
error<-0; note<-paste0(vintage," ",period,"- PUMS");
msg<-paste("State=",state,"fips=",fips,"PUMS vintage=",vintage,"period=",period);
note<-paste("vintage: ",vintage,"period: ",period," year PUMS");
urlh<-paste0(""/",period,"-Year/csv_h",state,".zip");">www2.census.gov/.../csv_h",state,".zip");
urlp<-paste0(""/",period,"-Year/csv_p",state,".zip");">www2.census.gov/.../csv_p",state,".zip");
if((vintage==2020)&(period==1)) { # because of covid 2020 1 year PUMS was released with experimental weights
urlh<-paste0("".zip");">www2.census.gov/.../csv_h",state,".zip");
urlp<-paste0("".zip");">www2.census.gov/.../csv_p",state,".zip");
}; # if
# note this code downloads zip file to your "working directory" use getwd() to see the directory/folder
vh<-try(download.file(urlh,"pumsh.zip")); # housing records from FTP site
vp<-try(download.file(urlp,"pumsp.zip")); # person records from FTP site
if(class(vh)[1]=="try-error") {
msg<-c(msg,paste0("ERROR downloading housing file url=",urlh,"vintage=",vintage,"period=",period));
error<-error+1;
};
if(class(vp)[1]=="try-error") {
msg<-c(msg,paste0("ERROR downloading person file url=",urlp));
error<-error+1;
};
if(error>0) {cat("usac: ERRORS\n");return(msg); };
hfile<-paste0("psam_h",fips,".csv");
pfile<-paste0("psam_p",fips,".csv");
house<-try(read.csv(unz("pumsh.zip",hfile))); # load housing records directly from zip file
person<-try(read.csv(unz("pumsp.zip",pfile))); # load person records directly from zip file
if(class(house)[1]=="try-error") {
msg<-c(msg,paste0("ERROR reading zip archive: pumsh.zip file=",hfile))
error<-error+1;
};
if(class(person)[1]=="try-error") {
msg<-c(msg,paste0("ERROR reading zip archive: pumsp.zip file=",pfile))
error<-error+1;
};
if(error>0) {cat("usac: ERRORS\n");return(msg); };
# trim down house and person to the variables of interest
hvar0<-c("SERIALNO","TYPEHUGQ","WGTP","FS"); hvar<-c("TYPEHUGQ","WGTP","FS");
pvar<-c("SERIALNO","SPORDER","HINS4","PAP","SSIP","POVPIP","PWGTP");
house<-house[,hvar0];
cat("\nsummary house data.frame=\n");print(summary(house));
person<-person[,pvar];
# merge person and housing records
m<-match(as.character(person$SERIALNO),as.character(house$SERIALNO));
# data.frame like the on returned by tidycensus function get_pums() ??
# note tidycensus get_pums() appears to use the API.
# The API cannot handle blank fields, missing is lowest value - 1 (typically)
dat<-data.frame(person,house[m,c("TYPEHUGQ","WGTP","FS")]);
cat("\nWGTP==0 by TYPEHUGQ\n");print(table(wgtp0=addNA(house$WGTP==0),typehq=addNA(as.factor(house$TYPEHUGQ))));
cat("\nusac: summary dat merged person/house data.frame:\n");print(summary(dat));
cat("\nusac: summary dat merged person/house data.frame with house weight (WGTP)==0:\n");print(summary(dat[dat$WGTP==0,]));
# create logical variables in the merged "dat" data.frame
pov<-dat$POVPIP;pov[pov<0]<-NA; pov[pov<=poverty.level]<-1; # <= poverty.level argument poverty level clause
pov3<-rep(NA,length(pov)); pov3[pov<0]<-3;pov3[is.na(pov)]<-3;
flag<-(pov>=0)|(pov<=poverty.level);flag[is.na(flag)]<-FALSE;pov3[flag]<-1;
flag<-pov>poverty.level;flag[is.na(flag)]<-FALSE;pov3[flag]<-2;
pov3f<-factor(pov3,levels=c(1,2,3),c("below","above","undefined"));
pov[pov>poverty.level]<-0;
sporder<-dat$SPORDER;
ssip<-dat$SSIP; ssip[is.na(ssip)]<-0; # Receive Supplemental Social Security Income SSI >= 1 otherwise 0
pap<-dat$PAP; pap[is.na(pap)]<-0; # Public Assistance Income missing value set to zero
# HINS4 Health Insurance Medicade etc ==1 TRUE other wise no missing
hins<-dat$HINS4; hins[is.na(hins)]<-0; # result 1==receives Medicaid etc. 0==missing 2==no
# FS Receives Food Stamps/SNAP == 1 (yes) == 2 (no) no missing values
cat("summary of PAP>0\n"); print(summary(dat$PAP[dat$PAP>0]));
cat("summary of pap>0\n"); print(summary(pap[pap>0]));
cat("summary PAP==0/pap==0\n");print(table(PAP0=addNA(dat$PAP==0),pap0=addNA(pap==0)));
cat("summary of SSIP>0\n"); print(summary(dat$SSIP[dat$SSIP>0]));
cat("summary of ssip>0\n"); print(summary(ssip[ssip>0]));
cat("summary SSIP==0/ssip==0\n");print(table(SSIP00=addNA(dat$SSIP==0),ssip0=addNA(ssip==0)));
# some defensive coding
fs<-dat$FS==1; fs[is.na(fs)]<-0; fs[fs==2]<-0; # recode 1=receives SNAP yes, 0 missing or no
cat("summary of FS/fs\n"); print(table(addNA(as.factor(fs)),addNA(as.factor(dat$FS))));
life<-(fs==1)|(pov3f=="below")|(pap>0)|(ssip>0)|(hins==1); # logic at household person record level
# no missing values for all derived variables including "life"
dfs<-data.frame(dat,pov3f,pap,hins,fs,ssip,lifeline=life); # data.frame to make sure everything aligns and check logic
cat("\nusac: Summary of derived per person data frame (dat with derived variables)\n");print(summary(dfs));
cat("\nusac: Summary of derived per person (for households) data frame with POVPIP missing.:\n");
print(summary(dfs[is.na(dfs$POVPIP),])); # check logic on pov variable POVPIP missing pov set to 0 (no)
serialH<-sort(unique(as.character(dat$SERIALNO))); # unique household SERIALNO in dat data.frame
mhp<-match(house$SERIALNO,serialH); # look for records in house data.frame that don't match records in dat data.frame
multi.person<-house[is.na(mhp),];
cat("\nusac: PUMS Housing Records that match multiple SERIALNO id's from Person Records\n");
cat(" should be all Household records)\n");
# all records have TYPEHUGQ set to 1 i.e. household records.
print(summary(multi.person));
# now apply | (or) logic across values of life for all persons in a household
#
# application form that lifeline applicants fill out
# www.usac.org/.../LI_Application_NVstates.pdf
# This application seems to indicate that the Household is elgible in anyone individual personin the househould
# is eligible not just the ACS SPORDER==1 (Head of Household -- ACS Questionnaire Person 1)
# so the calculation should OR across all person records in the Household
#
N<-length(serialH); # loop over serialH (unique dat$SERIALNO values)
lifeline1<-lifeline<-rep(0,N);
pweight<-rep(NA,N);
for(i in seq(1,N)) {
if(!(i%%2000)) cat("i=",i,"\n");
housei<-serialH[i]==dat$SERIALNO;
lifeline[i]<-any(life[housei]);
lifeline1[i]<-any(life[housei&(sporder==1)]);
pweight[i]<-sum(person$PWGTP[serialH[i]==person$SERIALNO]); # sum person weights
}; # for i
weight<-dat$WGTP[match(serialH,dat$SERIALNO)];
weight2<-house$WGTP[match(serialH,as.character(house$SERIALNO))];
dataH<-data.frame(SERIALNO=serialH,weight,pweight,lifeline,lifeline1,house[match(serialH,house$SERIALNO),]);
cat("\nusac: compare weights: weight-weight2\n");print(summary(weight-weight2));
cat("\nsummary dataH: \n");print(summary(dataH));
cat("\nsummary 0 weights \n");print(summary(dataH[dataH$weight==0,]));
# number of qualified households
cat("\nsummary dataH: for weight==0\n");print(summary(dataH[dataH$weight==0,]));
qhouse<-sum(dataH$weight[dataH$lifeline==1],na.rm=TRUE); # any person in Household qualifed make Household qualified
qhouse1<-sum(dataH$weight[dataH$lifeline1==1],na.rm=TRUE); # qualified Household based on person SPORDER==1
qperson<-sum(dataH$pweight[dataH$TYPEHUGQ!=1]); # qualified Group Quarters persons (sum person weights)
qhouse2<-sum(dataH$weight[(dataH$lifeline==1)&(dataH$TYPEHUGQ==1)],na.rm=TRUE);
cat("\nLifeline/TYPEHUGQ=\n");print(table(H.type=addNA(as.factor(dataH$TYPEHUGQ)),lifeline=addNA(as.factor(dataH$lifeline))));
cat("summary(dataH: for Group Quarters=\n");print(summary(dataH[dataH$TYPEHUGQ!=1,]));
cat("\nsummary qualified households\n");print(summary(dataH[dataH$lifeline==1,]));
cat("\nusac: Results: ",date(),note,"\n");
cat("usac: Number of households/group.quarters qualified for Lifeline Households=",qhouse,"\n");
cat("usac: Number of households qualified for Lifeline Households=",qhouse2,"\n");
cat("usac: Number of households qualified for Lifeline based on Householder=",qhouse1,"\n");
cat("usac: Number of Group.Quarters persons qualified for Lifeline Households=",qperson,"\n");
rtn<-list(households=qhouse2,householders=qhouse1,personsGQ=qperson,includeGQ=qhouse
,poverty.level=poverty.level,state=toupper(state),fips=fips,note=note,error=error
,url.house=urlh,url.person=urlp,vintage=vintage,period=period,data.person=dfs,data.house=dataH);
invisible(rtn);
}; # usac
## END ##