Small typo -- that's wihh (workers in HH), not whih. Other than that, what Matt posted does the job.
There are multiple ways to get this done in R (and FWIW there were multiple ways to do it in Stata). There are also a few cracks in both the original code and Matt Herman's code: tiny things where a mismatch between the code and the data may cause minor issues.
1. The concept of a "flag" is that it is a 0/1 variable indicating a certain logical condition. Stata does not have an explicit logical type, but it internally converts true statements (2==2) to the value of 1, and false statements (0==1) to 0. Conversely, 0 values are understood as false, and nonzero values as true. So if you want to create a variable that identifies a subset, the above code does exactly that. As Matt said, in R you would want to use the explicit logical type for that.
2. Matt's code does not explicitly reproduce the step that creates missing values when WKL is missing, but it is likely that this is picked up by the WKL==1 logical condition, since in R a comparison with NA returns NA.
3. In fact, WKL appears to be a string variable in Stata, but Matt treats it as a numeric variable, so the PUMS data set may have been read in differently in Stata and in R.
4. I would have to assume that get_pums() picks up all the technical variables by default, including SERIALNO, the household ID. This variable was not specified in the list of variables to read, variables=c("WKL"), but it appears in the data set nevertheless, along with the weights and such.
5. This is a style issue -- I would put ungroup() at the end of the pipe; having your grouping variables carried into the summary data sets sometimes becomes annoying.
6. In R, you could have skipped the whole step of creating a separate worker variable and just gone with
vt_pums %>% group_by(SERIALNO) %>% mutate(wihh = sum(WKL==1))...
and likewise in Stata, you could simply use
egen wihh = total(WKL=="1"), by(SERIALNO)
Actually, the code that is there is quite weird and not quite Stata-style; the person who wrote it was confusing the sum() function, which creates a running sum, with the total() function of the egen command, which creates groupwise totals. So that would raise a brow with me, frankly.
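To make points 2, 5 and 6 concrete, here is a minimal sketch on a made-up toy data set, not the actual PUMS extract. The name toy_pums and its values are hypothetical, and I am assuming WKL comes in as numeric, which may not match how your file was read. Whether to drop the missing values with na.rm = TRUE or let them propagate is a judgment call; dropping them is not what the original Stata code did.

```r
library(dplyr)

# Hypothetical stand-in for the PUMS extract; values are invented for illustration.
toy_pums <- data.frame(
  SERIALNO = c("A", "A", "A", "B", "B", "C"),
  WKL      = c(1, 3, NA, 1, 1, 2)
)

toy_counts <- toy_pums %>%
  group_by(SERIALNO) %>%
  # WKL == 1 is a logical vector; sum() counts the TRUEs within each household.
  # na.rm = TRUE keeps a single missing WKL from turning the whole count into NA.
  mutate(wihh = sum(WKL == 1, na.rm = TRUE)) %>%
  # Drop the grouping so later steps are not silently done by household.
  ungroup()

print(toy_counts)
```

Without na.rm = TRUE, household A would get NA for wihh because one of its members has a missing WKL.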
It looks like regonzalez prefers the base R syntax; group operations in base R must be weird, although I have to admit I never bothered going back to base R for data management. Also, there are at least two recode functions, car::recode() and dplyr::recode(); it looks like the former was used.
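For what it's worth, one way a groupwise count could be done in base R is with ave(), which behaves a bit like egen ..., by() in that it returns a vector as long as the input. This is only a sketch on invented toy data (toy_pums and its values are hypothetical, and WKL is assumed to be numeric), not necessarily how regonzalez would write it.

```r
# Hypothetical stand-in for the PUMS extract; values are invented for illustration.
toy_pums <- data.frame(
  SERIALNO = c("A", "A", "A", "B", "B", "C"),
  WKL      = c(1, 3, NA, 1, 1, 2)
)

# ave() applies FUN within each level of the grouping variable and
# returns the result aligned with the original rows.
toy_pums$wihh <- ave(
  as.numeric(toy_pums$WKL == 1),           # 1/0 flag, NA where WKL is missing
  toy_pums$SERIALNO,                       # grouping variable (household ID)
  FUN = function(x) sum(x, na.rm = TRUE)   # count workers, ignoring missing WKL
)

print(toy_pums)
```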