Privacy and the Census
Maria Amezcua ’20, a member of the graduating class of the M.A. Urban and Public Affairs (UPA) program, writes on the privacy concerns related to the 2020 Census. Her guest blog post is taken from a class assignment for her coursework in “Social Justice and the Census.” After reading this post, follow @USFVotes on Instagram or Twitter to stay current on the conversation!
Throughout its history, the U.S. Census Bureau has had both successes and failures in protecting privacy and personal information. For the 2020 Census, privacy has become a sensitive topic as tensions rise over advanced technology, hackers, and continued distrust of the government. How the Census Bureau responds on privacy could significantly affect both people’s willingness to self-report and the accuracy of its reports.
Identifying U.S. Census Privacy Systems
The Census Bureau’s Disclosure Avoidance System team works to find weaknesses in census privacy protections and to develop new methods that close those gaps before the next census. For the 2020 Census, the team studied the 2010 Census in depth through a process it calls reconstruction. Because the data and reports that the census releases are aggregated, that is, grouped together by block, county, or larger area, they contain no individual records that can be directly targeted for personal information. In reconstruction, however, the published tables are used to recreate individual-level records. The team then takes a set of these records and tries to link each one to a specific person, a step called re-identification. External data containing fields such as name, age, and sex is matched against the records recreated from published census data, which include age, sex, race, and so on. The Disclosure Avoidance team uses the results of these steps to develop updated methods that protect the privacy of individuals while still providing accurate reports to the public (Hawes 2020).
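To make the re-identification step more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the records, the field names, and the `link_records` helper are invented for illustration and are not the Bureau's actual data or procedure. It simply matches an external file that has names against nameless "reconstructed" records on the fields they share (block, age, and sex), and counts a person as re-identified only when exactly one record matches.

```python
from collections import defaultdict

# Hypothetical records "reconstructed" from published tables: no names attached.
reconstructed = [
    {"block": "A", "age": 34, "sex": "F", "race": "Asian"},
    {"block": "A", "age": 61, "sex": "M", "race": "White"},
    {"block": "B", "age": 34, "sex": "F", "race": "Black"},
    {"block": "B", "age": 34, "sex": "F", "race": "White"},
]

# Hypothetical external file (for example, a commercial list) that does have names.
external = [
    {"name": "Person 1", "block": "A", "age": 34, "sex": "F"},
    {"name": "Person 2", "block": "B", "age": 34, "sex": "F"},
]

def link_records(reconstructed, external):
    """Return {name: race} for external people who match exactly one record."""
    # Index the reconstructed records by the fields both files share.
    index = defaultdict(list)
    for rec in reconstructed:
        index[(rec["block"], rec["age"], rec["sex"])].append(rec)

    matches = {}
    for person in external:
        candidates = index[(person["block"], person["age"], person["sex"])]
        # A unique match attaches a name to an attribute the person never
        # shared publicly; several candidates leave the link ambiguous.
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]["race"]
    return matches

print(link_records(reconstructed, external))
```

In this toy data, "Person 1" is the only 34-year-old woman in block A, so her race is exposed, while "Person 2" matches two records in block B and stays ambiguous. The fewer people who share a combination of traits in a small area, the easier the link becomes.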
The Census Bureau released an article identifying the kinds of privacy protection needed to defend against attacks on its data and explaining how they can be balanced. Looking back at the 2010 Census, the Bureau agrees that advances in technology require many changes to properly protect people and their information in the 2020 Census. After analyzing the previous census, the Bureau decided that the 2020 Census will not use the same privacy methods as the 2010 Census. Instead, it will use differential privacy, or “formal privacy,” a leading method, to protect the information gathered by the census (Hawes 2020; Abowd and Velkoff 2019).
What is “differential privacy”?
Differential privacy is a relatively new method, developed by cryptographers, to counter modern privacy risks and protect individual information from outside entities. It measures the privacy risk across the entire process of gathering, producing, and releasing reports and information (Abowd and Velkoff 2019). Because its protection does not depend on what external data an outside entity holds, it keeps such entities from easily and accurately reconstructing and re-identifying the data released by the Census Bureau.
Differential privacy works by adding noise, meaning small random changes, to the data that is gathered, and it lets the Bureau control how much uncertainty is applied to tables, variables, and records. In practice, the system automatically mixes random noise into the actual data, creating uncertainty about which values are real. Noise prevents outside entities from accurately re-identifying people and connecting them to their information: the more uncertainty in the data, the more privacy, because the information is harder to link. Differential privacy is a major improvement over the 2010 Census, which added privacy protections and noise to only some of the data before it was released as reports and statistics. When the Disclosure Avoidance team tested reconstruction and re-identification, it concluded that outside entities could readily link the reports published by the census to external data (Hawes 2020; Abowd and Velkoff 2019).
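For readers who like to see the idea in code, the sketch below adds noise to a single count using the Laplace mechanism, a standard textbook way to achieve differential privacy. It is an illustration only and not necessarily the algorithm the Census Bureau runs in production. The names `laplace_noise` and `noisy_count`, the block population of 37, and the `epsilon` setting (the usual name for the dial that controls how much noise is added) are all assumptions made for this example.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered on zero.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng):
    # Adding or removing one person changes a count by at most 1, so the
    # noise scale is 1 / epsilon. Smaller epsilon means more noise.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(2020)   # fixed seed so the sketch is repeatable
true_count = 37             # hypothetical population of one census block

for _ in range(3):
    print(round(noisy_count(true_count, epsilon=0.5, rng=rng), 1))
```

Each printed value is a plausible published count for the block. Anyone looking at one of them cannot be sure whether a given person was included, yet the values stay centered on the true number, so averages over many blocks remain useful.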
The above video uses drawings and math to explain privacy concerns and the options available for balancing the need for privacy against the need for accuracy (Minutephysics 2019).
Privacy or Accuracy?
While differential privacy protects the information, it raises a new concern: the accuracy of the data. The more noise is added, the more the released reports and statistics can drift from the actual data, until they are no longer accurate. Because only so much privacy protection can be built into the data, released data will always carry some risk. It can be one hundred percent safe and risk-free only if the Census Bureau keeps the data completely private and releases no information. Since that option is not realistic, we must balance our wants and needs for both privacy and accuracy.
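The trade-off described above can be seen in a small simulation. This is a rough sketch under assumptions, not the Bureau's method: it uses textbook Laplace noise, a hypothetical `mean_abs_error` helper, and three made-up settings of `epsilon`, the conventional name for the privacy dial (lower means more privacy and more noise).

```python
import random

def laplace_noise(scale, rng):
    # Difference of two exponential draws with mean `scale` gives Laplace noise.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def mean_abs_error(epsilon, trials, rng):
    # Average distance between a noisy count and the true count when the
    # noise scale is 1 / epsilon.
    scale = 1.0 / epsilon
    return sum(abs(laplace_noise(scale, rng)) for _ in range(trials)) / trials

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for epsilon in (0.1, 1.0, 10.0):
    err = mean_abs_error(epsilon, trials=5000, rng=rng)
    print(f"epsilon={epsilon}: counts are off by about {err:.2f} people on average")
```

The average error shrinks as `epsilon` grows, which is the accuracy side of the bargain. The privacy side moves the opposite way: the setting that gives the most accurate counts also gives an outside entity the most to work with.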
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Abowd, John M., and Victoria A. Velkoff. 2019. “Balancing Privacy and Accuracy: New Opportunity for Disclosure Avoidance Analysis.” US Census Bureau.
Hawes, Michael. 2020. “Differential Privacy and the 2020 Decennial Census.” US Census Bureau.
Minutephysics. 2019. “Protecting Privacy with Math (Collab with the Census).” https://youtu.be/pT19VwBAqKA