Further, the NSS estimates of employment and unemployment, of people engaged in the agricultural sector, and of the workforce participation rate might be exaggerated, as the survey over-represents the rural population and the proportion of the working-age population at the national, rural and urban levels, it said.
“National Sample Survey done by the ministry of statistics and programme implementation over-represents the rural population, the scheduled caste (SC) population, and the working-age (age between 15 and 59 years) population compared to the Census done during the same period, raising doubts on the representativeness of the Surveys,” EAC-PM member Shamika Ravi said in a working paper analysing the data quality of the NSS. The paper has been co-authored by Mudit Kapoor of the Indian Statistical Institute and SV Subramanian of Harvard University.
“Quantitatively, our data quality analysis suggests a reduction in the statistical efficiency ranging from 97% to 99.9%, implying that the NSS is statistically non-representative at the national level, including at the rural and urban levels,” the authors said.
Talking about the implications of the analysis for survey strategy in general, the paper said there are consequences for surveys that use the same sampling strategy, such as the National Family Health Survey (NFHS) in 2019–21 and the Periodic Labour Force Survey (PLFS) in 2021–22.
“Since both these Surveys use Census 2011 for the sampling frame and supposing the urbanization process was as rapid as in the previous decade, our analysis suggests that both these surveys will have a rural bias because the sampling frame does not account for the dynamic changes in the target population,” it said, adding that the estimates from the Survey might not be representative.
Talking about the key implication from the post enumeration surveys (PES) of Census 2011, the paper said that the data quality of the NSS is perhaps worse than what has been estimated, suggesting a need for adequate attention on data quality. “Given the importance of data in framing policies, adequate attention must be paid to data quality. Otherwise, there is a possibility of misguided policies that are based on biased estimates, which might not reflect society’s true changes or progress,” the authors said in the paper.
“However, we believe that if the survey estimates have to become more representative and robust, we would need further rigorous research to understand the nature of data defects,” the paper said, suggesting that a similar analysis should also be done at the state level, given that government policies vary across states.
Listing out some of the limitations of the current working paper, the authors concluded that the bigness of the data cannot address issues related to data quality. On the contrary, it makes us “precisely wrong”, they said.