R/generateHHID.R
generate.HHID.Rd
Generates a new household ID for survey data using a household ID and a personal ID. In surveys with a rotating panel design that contain households, household members can move from an existing household to a new one that was not originally in the sample. This creates so-called split households. Using a personal ID (which stays fixed over the whole survey), an indicator for the different time steps, and a household ID, a new household ID is assigned to the original and the split household.
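As a minimal sketch of what a split household looks like in the data, the toy panel below is entirely hypothetical (the table `toy`, its columns `year`, `pid`, `hid`, and all values are assumptions made for illustration, not part of the package). It only shows how to spot persons who appear in more than one household over time, and does not reproduce what generate.HHID does internally.

```r
library(data.table)

# Hypothetical toy panel: in 2011 person 3 leaves household 1
# and forms household 9, which was not in the original sample.
toy <- data.table(
  year = c(2010, 2010, 2010, 2011, 2011, 2011),
  pid  = c(1, 2, 3, 1, 2, 3),
  hid  = c(1, 1, 1, 1, 1, 9)
)

# Count distinct household IDs per person and keep persons
# observed in more than one household (members of split households).
split_pids <- toy[, .(n_hid = uniqueN(hid)), by = pid][n_hid > 1]
split_pids
```

Any person returned by this check belongs to a split household in the sense described above.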
generate.HHID(dat, period = "RB010", pid = "RB030", hid = "DB030")
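dat | data.table or data.frame containing the survey data |
---|---|
period | column name of dat containing the indicator for the time steps of the survey |
pid | column name of dat containing the personal ID, which stays fixed over the whole survey |
hid | column name of dat containing the household ID |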
The survey data dat as a data.table object containing a new and an old household ID. The new household ID, which takes the split households into account, is now named hid, and the original household ID has a trailing "_orig".
if (FALSE) {
library(surveysd)
library(laeken)
library(data.table)

eusilc <- surveysd:::demo.eusilc(n = 4)

# create split households
eusilc[, rb030split := rb030]
year <- eusilc[, unique(year)]
year <- year[-1]
leaf_out <- c()
for (y in year) {
  split.person <- eusilc[year == (y - 1) & !duplicated(db030) & !db030 %in% leaf_out,
                         sample(rb030, 20)]
  overwrite.person <- eusilc[year == (y) & !duplicated(db030) & !db030 %in% leaf_out,
                             .(rb030 = sample(rb030, 20))]
  overwrite.person[, c("rb030split", "year_curr") := .(split.person, y)]

  eusilc[overwrite.person, rb030split := i.rb030split, on = .(rb030, year >= year_curr)]
  leaf_out <- c(
    leaf_out,
    eusilc[rb030 %in% c(overwrite.person$rb030, overwrite.person$rb030split),
           unique(db030)])
}

# pid which are in split households
eusilc[, .(uniqueN(db030)), by = list(rb030split)][V1 > 1]

eusilc.new <- generate.HHID(eusilc, period = "year", pid = "rb030split", hid = "db030")

# no longer any split households in the data
eusilc.new[, .(uniqueN(db030)), by = list(rb030split)][V1 > 1]
}