Charles,
Since my original post here, the Census Bureau has done something very much like what you suggest. Their first release of the 2019-2023 5-year PUMS identified 2020 PUMAs for all respondents. Later, they released a new version of the 2018-2022 5-year PUMS that also identifies 2020 PUMAs for all respondents, but they have since withdrawn it.
(We at IPUMS discovered that some 2018 records in the updated 2018-2022 release had invalid 2020 PUMA IDs. We let the Bureau know about these errors, and they temporarily removed the entire updated version of the PUMS. In this errata note, they state that they plan to release a corrected version at an unspecified later date.)
We’ve integrated the 2019-2023 5-year release into IPUMS USA, and we will also add the updated 2018-2022 PUMS if/when they're again available.
That said, as I understand it, the approach they've taken is not exactly what you suggest (regeocoding), and there's a very good reason for that: regeocoding would violate respondent privacy.
If they had simply regeocoded, then for the many households that appear in both the 2017-2021 5-year PUMS (using 2010 PUMAs) and the 2018-2022 5-year PUMS (using 2020 PUMAs), a user could typically identify both the 2010 and the 2020 PUMA where each household resided. If a household lived in an area affected by a small PUMA boundary change (perhaps encompassing a few thousand residents or fewer), we could then locate that household MUCH more precisely than if we knew only its 2010 PUMA. (PUMAs are required to have at least 100,000 residents precisely so that users cannot narrow a respondent's location below that threshold.)
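To make the privacy risk concrete, here is a minimal Python sketch with invented block populations and hypothetical PUMA codes (P10-A, P20-Y, and so on are made up, not real PUMAs). It compares the smallest population a household can hide among when only one PUMA vintage is known with the smallest population when its 2010 and 2020 PUMAs can be linked.

```python
from collections import defaultdict

# Hypothetical census blocks: (block_id, 2010 PUMA, 2020 PUMA, population).
# All IDs and counts are invented for illustration; none are real PUMAs.
BLOCKS = [
    ("b1", "P10-A", "P20-X", 60_000),
    ("b2", "P10-A", "P20-X", 42_000),
    ("b3", "P10-A", "P20-Y", 2_500),   # small area shifted by a boundary change
    ("b4", "P10-B", "P20-Y", 101_000),
]

def residents_by(blocks, key):
    """Total block population for each group returned by key(block)."""
    totals = defaultdict(int)
    for block in blocks:
        totals[key(block)] += block[3]
    return dict(totals)

by_2010 = residents_by(BLOCKS, lambda b: b[1])          # only 2010 PUMA known
by_2020 = residents_by(BLOCKS, lambda b: b[2])          # only 2020 PUMA known
by_pair = residents_by(BLOCKS, lambda b: (b[1], b[2]))  # both can be linked

if __name__ == "__main__":
    print("smallest 2010 PUMA:", min(by_2010.values()))
    print("smallest 2020 PUMA:", min(by_2020.values()))
    print("smallest linked pair:", min(by_pair.values()))
```

In this made-up example every PUMA of either vintage has more than 100,000 residents, yet linking the two codes places the households in the (P10-A, P20-Y) intersection within an area of only 2,500 residents.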
To my knowledge, the Bureau has not yet provided exact details on how they allocated 2020 PUMA identifiers to 2018-2021 responses in these new PUMS releases, but the documentation they have provided indicates that it was not by regeocoding. Rather, they used some kind of crosswalk from 2010 PUMAs to 2020 PUMAs, such as you might get from Geocorr. I've confirmed that the allocation is not based on areal weighting, which simplistically assumes that the likelihood of a 2010 PUMA resident living in a given 2020 PUMA equals the proportion of the 2010 PUMA's area that lies in that 2020 PUMA. Beyond that, we don't know what technique they used.
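For readers unfamiliar with the distinction, here is a minimal sketch of how area-based and population-based allocation factors for a single 2010 PUMA can diverge. The split into P20-X and P20-Y and all areas and populations are hypothetical, and this is not meant to represent the Bureau's method, which remains undocumented.

```python
# One hypothetical 2010 PUMA, split into the parts lying in each 2020 PUMA.
# Invented numbers: a dense town (P20-X) plus a large rural remainder (P20-Y).
PIECES = {
    "P20-X": {"area_km2": 50.0, "population": 90_000},
    "P20-Y": {"area_km2": 450.0, "population": 10_000},
}

def allocation_factors(pieces, field):
    """Share of the 2010 PUMA's total `field` that lies in each 2020 PUMA."""
    total = sum(piece[field] for piece in pieces.values())
    return {puma: piece[field] / total for puma, piece in pieces.items()}

# Areal weighting: assumes residents are spread evenly over land area.
areal = allocation_factors(PIECES, "area_km2")
# Population weighting: uses where residents actually live.
population = allocation_factors(PIECES, "population")

if __name__ == "__main__":
    print("areal:", areal)
    print("population:", population)
```

Under areal weighting, a resident of this made-up 2010 PUMA would get a 90% chance of being placed in the sparsely populated P20-Y, even though only 10% of its residents live there.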
They may have used population allocation factors like those provided in Geocorr files, but I can't verify that. Unfortunately, whatever allocation technique they used (assuming it was not in fact regeocoding) will have introduced some allocation errors, so it would be helpful if at some point they could provide more information about their approach and/or the potential scope and impact of any resulting PUMA misidentifications.
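To give a rough sense of why crosswalk-based allocation produces some misidentifications, here is a minimal sketch under loudly hypothetical assumptions: the only thing known about each 2018-2021 record is its 2010 PUMA, and invented population factors (90% of that PUMA's residents in P20-X, 10% in P20-Y) describe where residents truly live. It compares two hypothetical assignment rules, a random draw weighted by the factors and always picking the largest factor. Neither is claimed to be what the Bureau did.

```python
import random

# Hypothetical population allocation factors for one 2010 PUMA (invented).
FACTORS = {"P20-X": 0.9, "P20-Y": 0.1}

def allocate_random(factors, rng):
    """Assign one record a 2020 PUMA, drawn in proportion to the factors."""
    pumas = list(factors)
    return rng.choices(pumas, weights=[factors[p] for p in pumas], k=1)[0]

def correct_share_random(factors):
    """Expected share of records placed in their true 2020 PUMA when the true
    location and the assignment are independent draws from the same factors."""
    return sum(p * p for p in factors.values())

def correct_share_modal(factors):
    """Expected share placed correctly when every record is assigned the
    2020 PUMA with the largest factor."""
    return max(factors.values())

if __name__ == "__main__":
    rng = random.Random(2023)
    pumas = list(FACTORS)
    n = 100_000
    # Simulate true 2020 PUMAs, then compare them with random assignments.
    truth = rng.choices(pumas, weights=[FACTORS[p] for p in pumas], k=n)
    hits = sum(t == allocate_random(FACTORS, rng) for t in truth)
    print("random rule, simulated:", hits / n)
    print("random rule, expected: ", correct_share_random(FACTORS))
    print("modal rule, expected:  ", correct_share_modal(FACTORS))
```

With these made-up factors, the random rule would misplace about 18% of records and the modal rule about 10%, but the modal rule would also assign none of this 2010 PUMA's records to P20-Y, a systematic rather than random error.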