Census releases guidelines for controversial privacy tool
Hold onto your calculators, statisticians
Hold onto your calculators, statisticians!
After three years of fierce debates, conflicting academic papers and a lawsuit, the U.S. Census Bureau on Wednesday announced guidelines for how a controversial statistical method will be applied to the numbers used for drawing congressional and legislative districts. The method is meant to protect the privacy of people who participated in the 2020 census, though critics claim it favors confidentiality at the expense of accurate numbers.
The privacy method adds controlled “noise,” or intentional errors, to the data to obscure the identity of any given participant in the 2020 census while still providing statistically valid information. The final guidelines announced by the Census Bureau tilt further toward accuracy, and away from privacy, than the earlier test versions that the statistical agency released for interested parties to evaluate.
The debate over the method, known as differential privacy, has resulted in a nerd knife-fight of sorts among statisticians, demographers and redistricting experts, who have argued over whether its application would make the numbers used for redrawing congressional and legislative districts unusable. Release of the specific guidelines could further intensify the ongoing debate about the accuracy of numbers gathered during a national headcount that took place in the midst of a global pandemic and an already supercharged political climate.
If you picture the privacy tool as a dial with lower settings offering the most privacy and higher settings providing the most accuracy, the Census Bureau dialed up the accuracy in the final guidelines. The statistical term for this dial is “epsilon,” and the bureau settled on an epsilon of 19.61, significantly higher than where the dial was set in earlier versions that critics raised concerns about.
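For readers who want to see the dial in action, here is a rough sketch of how epsilon trades privacy for accuracy. It uses the textbook Laplace mechanism on a single count, which is an assumption made for illustration and not a description of the Census Bureau's production system. The function names, the sample epsilon values of 0.5 and 4.0, and the block population of 100 are all hypothetical. Treating 19.61 as the setting for one count is also a simplification.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with
    # mean `scale`, follows a Laplace distribution centered on zero.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    # Textbook Laplace mechanism: noise scale = sensitivity / epsilon.
    # A higher epsilon means a smaller scale, so less noise and less privacy.
    scale = sensitivity / epsilon
    return true_count + laplace_noise(scale, rng)


def mean_abs_error(true_count, epsilon, trials=10_000, seed=0):
    # Average distance between the noisy count and the true count.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    block_population = 100  # hypothetical count for one census block
    for epsilon in (0.5, 4.0, 19.61):
        error = mean_abs_error(block_population, epsilon)
        print(f"epsilon={epsilon:>5}: typical error of about {error:.2f} people")
```

Under these assumptions the typical error is roughly 1 divided by epsilon, so turning the dial from 0.5 up to 19.61 shrinks the noise on a single count from about two people to a small fraction of one person.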
“The decisions strike the best balance between the need to release detailed, usable statistics from the 2020 Census with our statutory responsibility to protect the privacy of individuals’ data,” said Ron Jarmin, acting director of the Census Bureau. “They were made after many years of research and candid feedback from data users and outside experts – whom we thank for their invaluable input.”
University of Minnesota demographer Steven Ruggles, who had raised accuracy concerns about earlier versions, said Wednesday that the epsilon in the final guidelines is now so high it won't offer much privacy protection.
“The inventors of differential privacy regard such a high epsilon as pointless,” Ruggles said.
The state of Alabama sued in an effort to stop differential privacy from being used at all on the redistricting data, claiming it would produce inaccurate numbers, and a panel of three judges could make a decision any day.
The Census Bureau says more privacy protections are needed than in past decades, as technological innovations magnify the threat of people being identified through their census answers, which are confidential by law. Computing power is now so vast that it can easily crunch third-party data sets that combine personal information from credit ratings and social media companies, purchasing records, voting patterns and public documents, among other things.
The redistricting data is expected to be released in mid-August. Differential privacy wasn't applied to the state-level numbers used for divvying up congressional seats among the states. Those numbers were released in April.
___
Follow Mike Schneider on Twitter at https://twitter.com/MikeSchneiderAP