How to balance privacy and informativeness
Privacy is a fundamental ethical requirement in data usage. Protecting privacy means safeguarding the rights and freedoms of data subjects. But how can data be segmented so that it stays informative while still protecting individuals’ personal space? Finding this balance is the most serious challenge in the use of mobility data.
Privacy is simultaneously a normative concept, encompassing philosophical, legal, sociological, political, and economic aspects, and a technical term, signifying specific actions in technology design. Philosophers declared the importance of protecting privacy almost two thousand years ago. In simple words, this requirement means the “right to be let alone”, which allows for the self-determination of personality. In the context of data usage, semantic and syntactic privacy are distinguished. The semantic definition of privacy means the right of individuals to control the collection, use, and sharing of information that pertains to their movements and location: their location history, usual travel routes, and travel preferences, including the frequency and duration of their trips. Syntactic privacy refers not to the informativeness of the data, but to the architecture of its organization for usage. The syntactic definition covers the technical specifics of how data is anonymized, encrypted, or aggregated to maintain privacy.
Personal data
As a normative or semantic concept, privacy encompasses personal data (name, address, phone number, i.e., information that identifies a person), communicative data (emails, phone calls, messenger messages), financial data (bank account, credit card number, transaction history), health data (medical services, treatment history, insurance), behavioral data (people’s tastes and habits, shopping patterns), location and movement data, intellectual property data, internet behavior, political views, religious beliefs, and employment history. In the use of mobility data, privacy also covers queries to a location-based service, the timing of someone’s movements, the purpose of a trip, and attendance at specific events or locations. All these examples describe the semantic level of privacy.
Technical solutions
The syntactic level of privacy is implemented through technical solutions in data use. These include the removal of personal attributes (personally identifiable information, PII), their masking, or pseudonymization (replacing private identifiers with artificial ones). Data aggregation is often used (combining individual data points into larger sets or summaries), as is the indistinguishability method, where individual data points (such as the locations or movements of a person) are made less distinct or less identifiable within the dataset. Apart from operations on the existing data content, differential privacy practices are also used. These involve adding carefully calibrated random noise to the dataset with the aim of preventing the (re-)identification of data subjects. Finding a balance in this practice is difficult, as the level of noise determines both the degree of privacy and the accuracy. More noise means higher privacy but potentially less useful data, and vice versa.
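The noise trade-off described above can be illustrated with a minimal sketch of the Laplace mechanism, one common way of implementing differential privacy for aggregated counts. This is an illustration under stated assumptions, not a description of any particular system: the function names, the zone names, and the choice of a sensitivity of 1 (each person contributes to at most one count) are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centred on 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(counts, epsilon, rng=None):
    """Add Laplace noise to per-zone visitor counts.

    Assumption: each person contributes to at most one count, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    return {zone: count + laplace_noise(scale, rng)
            for zone, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical aggregated counts of people observed per zone.
    counts = {"zone_a": 120, "zone_b": 45, "zone_c": 3}
    for epsilon in (0.1, 1.0, 10.0):
        print(epsilon, noisy_counts(counts, epsilon))
```

A small epsilon gives stronger privacy but noisier counts, while a large epsilon keeps the counts close to the originals and protects less. Small counts such as the hypothetical `zone_c` are the first to become unusable as the noise grows.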
To protect privacy, coarsening of mobility data is used, which means reducing the precision or granularity of the data. For example, instead of documenting the exact GPS coordinates of a person, the information can be simplified to denote city districts or streets. And instead of precise time stamps, data can be presented in more generalized time frames, such as hourly or daily intervals. Coarsening is a compromise between ensuring confidentiality and maintaining the usefulness of the data. Excessively coarse data can become unsuitable for practical use, while insufficiently coarse data may not adequately protect confidential information. One form of this strategy worth mentioning is ‘cropping trajectories’, where only segments or portions of an individual’s movement trajectory are retained or used in the dataset. Another approach, which is not an anonymization technique as such, is to avoid centralized data processing: the analysis is conducted in a distributed manner, so the raw, detailed mobility data does not need to be sent to or stored on a central server.
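Coarsening and cropping as described above can be sketched in a few lines. This is a minimal illustration, not a production anonymization routine: the function names are hypothetical, rounding coordinates to a grid stands in for mapping them to districts or streets, and dropping the first and last points is only one possible way of cropping a trajectory.

```python
from datetime import datetime


def coarsen_point(lat, lon, ts, decimals=2):
    """Reduce spatial and temporal precision of a single GPS fix.

    Two decimal places of latitude correspond to roughly one kilometre;
    the timestamp is truncated to the start of its hour.
    """
    coarse_ts = ts.replace(minute=0, second=0, microsecond=0)
    return (round(lat, decimals), round(lon, decimals), coarse_ts)


def crop_trajectory(points, drop=1):
    """Drop the first and last `drop` points of a trajectory.

    Assumption: trip endpoints (often home or work) are the most
    identifying part, so only the middle segment is kept.
    """
    if drop <= 0:
        return list(points)
    return list(points[drop:-drop])


if __name__ == "__main__":
    # Hypothetical trajectory: (latitude, longitude, timestamp) tuples.
    trajectory = [
        (47.376887, 8.541694, datetime(2024, 5, 1, 8, 47, 12)),
        (47.378177, 8.540192, datetime(2024, 5, 1, 8, 52, 40)),
        (47.390434, 8.515560, datetime(2024, 5, 1, 9, 5, 3)),
        (47.411230, 8.544120, datetime(2024, 5, 1, 9, 21, 55)),
    ]
    cropped = crop_trajectory(trajectory, drop=1)
    print([coarsen_point(*p) for p in cropped])
```

The `decimals` and `drop` parameters are the knobs of the trade-off: lowering `decimals` or raising `drop` removes more identifying detail but also leaves less for analysis.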
The most radical solution in protecting privacy is the use of synthetic data, artificially generated datasets that mimic the statistical properties of the original data. However, this technical solution is difficult to implement in practice.
Finding the balance
Finding a compromise in ensuring the indistinguishability of data is a matter of precise balance. If the data is too heavily modified, it becomes less suitable for analysis and research purposes. Conversely, if the changes to the data are minimal, confidentiality is at risk: presenting too much detailed data can lead to a loss of privacy. On the other hand, unrealistic privacy standards that demand perfect confidentiality are impossible to implement. Current technical solutions therefore aim to protect privacy while obtaining “almost the same result” in terms of data informativeness. The effectiveness of the balance between informativeness and indistinguishability is measured on a two-dimensional scale, de-identification techniques and data sharing scenarios, different combinations of which can be found in the literature and in practical cases.
About the project
The Posmo (POSitive MObility) co-operative collects mobility data of a quality not previously available in Switzerland. The data is not only used as a basis for decision-making for the design of more sustainable mobility, but is also made available in a data market for research, urban development or mobility planning. The aim of the cooperative is not to make a profit, but to make an important contribution to a better future for Switzerland. As mobility data is highly sensitive under data protection law, Posmo has developed an initial concept for the ethical data market in an earlier Innocheque project together with researchers from the Institute for Data Applications and Security IDAS, which is now to be further developed.