How is anonymity ensured in the 2011 Census?

The statistical offices are often asked how it is ensured that individuals cannot be identified after the census results have been published. Here we explain how anonymisation of the results works in the 2011 Census.

The individual data for the census results are obtained both from existing registers and from surveys. Using auxiliary variables such as name and address, the statistical offices link the information for every individual. Once the records have been linked, these sensitive auxiliary variables are no longer needed, so they are separated from the survey variables and deleted. Only the survey variables required for statistical evaluation remain. Consequently, it is no longer possible to identify individuals directly (for example, by their name).
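To make the linkage step more concrete, here is a minimal Python sketch. It assumes that records from a register and from a survey can be matched exactly on name and address; the field names (`name`, `address`, `age`, `sex`, `dwelling`) and the function `link_and_strip` are hypothetical, and real record linkage has to cope with spelling variants and missing values, which this toy ignores.

```python
from typing import Any

Record = dict[str, Any]

# Hypothetical auxiliary (identifying) variables, used only for linkage.
AUX_VARS = ("name", "address")


def link_and_strip(register: list[Record], survey: list[Record]) -> list[Record]:
    """Link register and survey records on the auxiliary variables (exact match
    only, in this sketch), then return the merged records without them."""
    # Index the survey records by their auxiliary-variable key.
    survey_by_key = {tuple(r[v] for v in AUX_VARS): r for r in survey}
    linked = []
    for reg in register:
        key = tuple(reg[v] for v in AUX_VARS)
        match = survey_by_key.get(key)
        if match is None:
            continue  # in this toy, unmatched register records are skipped
        merged = {**reg, **match}
        # Linkage is done: drop the identifying auxiliary variables.
        linked.append({k: v for k, v in merged.items() if k not in AUX_VARS})
    return linked


register = [{"name": "Person A", "address": "Street 1", "age": 93, "sex": "m"}]
survey = [{"name": "Person A", "address": "Street 1", "dwelling": "house"}]
anonymised = link_and_strip(register, survey)
# anonymised == [{"age": 93, "sex": "m", "dwelling": "house"}]
```

The output keeps the linked survey variables but no longer contains any name or address.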

However, it must also be ruled out that individuals can be identified through unique combinations of variables, since that would defeat the purpose of anonymisation.

Protection of individual data …

Imagine a small village in which there is only one 93-year-old man. If this single case were published in a table broken down by age, many people could probably identify the man even though his name is not mentioned. They could then learn a great deal about him from the census results, for example what kind of dwelling he lives in, his religion, his level of education and whether he is married.

Every individual must be protected from such disclosure. A central principle applying to all official statistics in Germany states that the respondents’ individual data must be kept strictly confidential (statistical confidentiality, cf. Article 16 of the Federal Statistics Law).

…versus information value of the data

Alongside data protection, the census results must meet an equally important requirement: they should represent population structures accurately, even at a detailed level. A special feature of the census is that its data are available down to municipality level, which makes them a unique data source.

These two requirements, largely preserving the analytical potential of the data and ensuring anonymity, must be reconciled. This is achieved by SAFE, a procedure for the safe anonymisation of individual data, which is explained below.

The solution: the SAFE procedure

The SAFE procedure is a data perturbation method that was originally developed by staff members of the Land Statistical Office of Berlin-Brandenburg.

The idea is to produce a dataset in which every individual record is identical to at least two other records, so that no individual can be identified from it. In other words, table cells containing only one or two cases are either turned into cells with at least three cases or shown as zero. For our 93-year-old man, this would mean changing the age in two other records (preferably records that are themselves single cases). For example, the age of a 92-year-old man could be raised by one year and that of a 94-year-old man lowered by one year, so that the table would contain three 93-year-old men but no 92-year-old or 94-year-old man. Alternatively, the age of the 93-year-old man could be changed, so that the table would contain no 93-year-old man at all.
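The rule "every record is identical to at least two others" can be checked mechanically. The following minimal Python sketch counts how often each combination of variables occurs and applies the age shift from the example above. It is a toy illustration only: the actual SAFE procedure is considerably more complex and is not reproduced here. The names `is_protected` and `MIN_CASES` and the toy data are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Iterable

MIN_CASES = 3  # each record must match at least two other records


def is_protected(records: Iterable[Hashable]) -> bool:
    """Return True if every distinct record occurs at least MIN_CASES times.

    Combinations that do not occur at all are simply absent from the count
    (a zero cell), which is also acceptable.
    """
    counts = Counter(records)
    return all(n >= MIN_CASES for n in counts.values())


# Toy data: (age, sex) of the three oldest men in a hypothetical village.
before = [(92, "m"), (93, "m"), (94, "m")]

# Shift the 92-year-old up one year and the 94-year-old down one year.
shift = {92: +1, 94: -1}
after = [(age + shift.get(age, 0), sex) for age, sex in before]
# after == [(93, "m"), (93, "m"), (93, "m")]
```

`is_protected(before)` is `False` (three single cases), while `is_protected(after)` is `True`. Because the two shifts cancel out, the total and mean age of the group are unchanged in this toy.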

A short digression: why must there be at least three cases rather than two?

If there were only two 93-year-old men in a municipality, either of them could recognise his own data in the results and, by subtracting it, would automatically know all the data on the other 93-year-old. Sufficient protection is therefore ensured only when there are three or more cases.
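The subtraction argument can be illustrated with a published frequency table. In this minimal Python sketch, a hypothetical table shows the marital status of the 93-year-old men in a municipality; one of them removes his own entry and looks at what is left. The function name `subtract_own` and the table values are assumptions made for illustration.

```python
from collections import Counter


def subtract_own(published: Counter, own_value: str) -> Counter:
    """Remove one's own entry from a published frequency table."""
    remaining = published.copy()
    remaining[own_value] -= 1
    # Unary + drops categories whose count has fallen to zero.
    return +remaining


# Hypothetical table with two cases: one married, one widowed.
two_cases = Counter({"married": 1, "widowed": 1})
print(subtract_own(two_cases, "married"))
# Counter({'widowed': 1}) -> exactly one person is left, so the married man
# now knows that the other 93-year-old is widowed.

# Hypothetical table with three cases.
three_cases = Counter({"married": 2, "widowed": 1})
print(subtract_own(three_cases, "married"))
# Counter({'married': 1, 'widowed': 1}) -> two people remain, and this table
# alone does not say which of them is married.
```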

Although the SAFE data perturbation method may at first sound rather arbitrary to non-statisticians, it is in fact a rigorous and highly complex scientific method.

Data perturbation introduces some uncertainty into very small numbers of cases, so that individuals cannot be identified. At the same time, the quality and statistical information value of the data are largely preserved, because the data are modified in such a way that important statistical information and distributions change only marginally.

Returning to our example of the 93-year-old man: when the housing and living situation of older people is examined, what matters is of course not the exact number of 93-year-olds. The focus is instead on structural data, for example whether a municipality has many or few older people and, in a second step, how people live in old age and whether they are married, widowed or divorced.
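Why such small changes hardly affect structural results can be seen by grouping ages into bands. In this minimal Python sketch, only the ages 92 and 94 of a hypothetical village are shifted to 93, as in the example above, and the counts per ten-year age band are compared. The ages and the band width are assumptions; the other single-year cases in the toy are left untreated because only the over-90s are of interest here, and a real perturbation is not guaranteed to stay within one band.

```python
from collections import Counter


def age_band_counts(ages: list[int], width: int = 10) -> Counter:
    """Count people per age band of the given width (e.g. 90-99 -> key 90)."""
    return Counter((age // width) * width for age in ages)


# Hypothetical village (men only); only the 92- and 94-year-old are shifted.
original_ages = [67, 71, 71, 85, 92, 93, 94]
perturbed_ages = [67, 71, 71, 85, 93, 93, 93]

print(age_band_counts(original_ages))
# Counter({90: 3, 70: 2, 60: 1, 80: 1})
print(age_band_counts(original_ages) == age_band_counts(perturbed_ages))
# True: the number of men aged 90 and over is unchanged
```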

The official number of inhabitants is shown as an original value

The SAFE method is applied to protect the confidentiality of personal data obtained from the population registers, from the survey of residential establishments and collective living quarters, and from the census of buildings and housing. The official number of inhabitants, however, is always shown as the original value (without data perturbation).

No data perturbation is necessary for the household survey, which was conducted among roughly 10 % of the population, because its results are extrapolated and rounded. Household survey results that are based on very small numbers of cases are not published.
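The logic of extrapolating, rounding and suppressing can be sketched as follows. This minimal Python example assumes that each sampled person carries an extrapolation weight of roughly 10 (the inverse of a 10 % sample), that estimates are rounded to the nearest multiple of 10, and that results based on fewer than 5 sampled cases are withheld. All three parameters and the function name `publishable_estimate` are hypothetical and do not reflect the census's actual rules.

```python
import math
from typing import Optional

ROUNDING_BASE = 10     # hypothetical rounding unit
MIN_SAMPLE_CASES = 5   # hypothetical minimum number of sampled cases


def publishable_estimate(weights: list[float]) -> Optional[int]:
    """Extrapolate a sample cell to a population estimate and round it.

    `weights` holds one extrapolation weight per sampled person in the cell.
    Returns None (not published) if the cell has too few sampled cases.
    """
    if len(weights) < MIN_SAMPLE_CASES:
        return None
    estimate = sum(weights)  # extrapolated population total
    # Round half up to the nearest multiple of ROUNDING_BASE.
    return math.floor(estimate / ROUNDING_BASE + 0.5) * ROUNDING_BASE


print(publishable_estimate([9.8, 10.3, 10.1, 9.9, 10.4, 10.2]))  # 60
print(publishable_estimate([10.0, 10.0]))                        # None
```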

© Statistische Ämter des Bundes und der Länder 2020