Data ethics: balancing innovation and responsibility
POSMO, as an ethical data market, prioritizes protecting the rights and freedoms of its data producers. Accordingly, the company’s ethical principles govern how innovative technological solutions are implemented and used. Numerous theoretical publications attest to the importance and effectiveness of this approach. However, experts note how difficult it is in practice to realize ethical data use in projects and to maintain a fair balance between implementing technologies and using them responsibly.
The use of mobility data carries potential risks for its producers. Data may be re-identified when linked with other datasets, or at a later point in time. Re-identification does not necessarily imply immediate harm to the data subjects; it only indicates a potential risk. Nonetheless, ethical standards in data usage demand that this possibility be warned about and prevented.
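To make this linkage risk concrete, here is a minimal sketch in Python; all field names and records (home_zone, departure_hour, and so on) are invented for illustration and do not reflect POSMO’s actual data model. It shows how a de-identified trip record can be re-identified by matching it against auxiliary data an attacker already holds:

```python
# Minimal illustration of a linkage attack: matching a de-identified
# dataset against auxiliary data on shared quasi-identifiers.
# All field names and records are hypothetical.

deidentified_trips = [
    {"home_zone": "8001", "departure_hour": 8, "destination": "clinic"},
    {"home_zone": "8004", "departure_hour": 7, "destination": "school"},
]

# Auxiliary data an attacker might hold (e.g., from a public register
# or social media), linking the same quasi-identifiers to names.
auxiliary = [
    {"name": "A. Muster", "home_zone": "8004", "departure_hour": 7},
]

# Re-identification: match records on the shared quasi-identifiers.
for trip in deidentified_trips:
    for person in auxiliary:
        if (trip["home_zone"] == person["home_zone"]
                and trip["departure_hour"] == person["departure_hour"]):
            print(f"{person['name']} likely made the trip to: {trip['destination']}")
```

The trip record itself contains no name, yet the combination of home zone and departure hour is unique enough to link it back to a person.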
The most significant risks concern the loss of anonymity or the possible re-identification of data subjects. For instance, information about the frequency and nature of visits to medical institutions could negatively affect the terms of insurance coverage, and information about the absence of residents from a house at a certain time could be exploited for criminal purposes.
Moreover, there is always a risk of data breaches, in which sensitive information falls into the wrong hands and is used for malicious purposes such as identity theft, stalking, or harassment. For example, data about a large gathering of people at a specific time can be used for social engineering, influencing public opinion, terrorist acts, political provocations, and other antisocial activities. Attempts to distribute illegal substances, for instance, may target places where large numbers of teenagers gather; and while official institutions such as schools have certain security measures in place, such spontaneous gathering spots lack them.
What are the risks?
Serious risks associated with the collection of mobility data relate to surveillance: individuals who feel constantly monitored suffer a loss of anonymity and personal freedom. Intense tracking of workers’ activities by an organization’s management can, under certain conditions, cross the boundaries of self-determination and become abusive.
Another group of risks is associated with discrimination and bias: if the data is biased (e.g., over-representing certain populations, or containing social biases inherent to the collection method), it can lead to discriminatory practices in urban planning, resource allocation, or targeted advertising. For example, information about neighborhoods predominantly inhabited by migrants could lead to a lower level of infrastructure development (availability of stores and schools, public transportation) and provoke conflicts based on cultural, ethnic, and religious differences. Such information can also become a basis for manipulating the population, for example through varying prices for identical goods or aggressive marketing campaigns. This is also linked to the risk of economic exploitation, associated with spam, targeted advertising, and the use of data without compensation.
Next on the list of potential risks are data accuracy and reliability: incorrect or misleading data can lead to poor decision-making in urban planning, transport management, and other civic areas. For instance, incomplete data about the number of cyclists can result in a lack of bike racks in locations convenient for users.
A particular risk of using mobility data is dependency on technology: when decisions follow the data alone, a low number of public transport users at certain times leads to a reduction in service, which can negatively impact the comfort of the remaining users. Moreover, serious risks stem from psychological factors, such as data subjects’ fear of manipulation or of losing control over decision-making.
One should not overlook the impact on vulnerable populations. Such groups are also difficult to protect within datasets, because it is very hard to predict use cases and to guard against biased or malicious analyses. For example, children and the elderly might become simply ‘invisible’ in the decision-making process, since they are typically not the subjects of data.
Four conditions for a violation of privacy
Figure 1: The four conditions that must all be met before an individual’s privacy is violated (Source: https://aircloak.com/the-five-private-eyes-part-1-the-surprising-strength-of-de-identified-data/ )
As has been demonstrated, the landscape of mobility data is fraught with potential risks. Yet while handling such data may seem perilous, a closer examination reveals several layers of safeguards that stand between the data and any harm to its subjects.
There are four safeguarding conditions against re-identification, all of which have to be broken, as shown in the figure above. They are as follows:
- Ingress: Securing Access to Data. The first condition necessitates that an attacker has access to the de-identified data in the first place. Limiting access to sensitive information is therefore a foundational safeguard.
- Incentive: Weighing Risks and Benefits. The second condition addresses the analyst’s motivation to re-identify data. Contractual prohibitions and monitoring serve as effective deterrents, while the cost of re-identification rises with the strength of the de-identification mechanisms.
- Isolation: Breaking Recognition Patterns. Strong anonymization mechanisms make it challenging to recognize or isolate individual data points. Techniques such as k-anonymity (see the sketch after this list) and more advanced systems make re-identification increasingly difficult and expensive.
- Identification: Bridging Isolation and Intrusion. Even if an individual is isolated in the data, true re-identification only occurs when personal information is associated with that record. The distinction between isolating and identifying individuals is crucial for safeguarding privacy.
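To illustrate the k-anonymity technique named under Isolation, here is a minimal sketch; the records, field names, and threshold are invented for illustration. A dataset is k-anonymous with respect to a set of quasi-identifiers if every combination of their values occurs at least k times, so that no single record can be isolated:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears at least k times, i.e. no record can be isolated."""
    combos = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in combos.values())

# Hypothetical trip records; home zone and departure hour act as
# quasi-identifiers that could single out an individual.
trips = [
    {"home_zone": "8001", "departure_hour": 8, "mode": "bike"},
    {"home_zone": "8001", "departure_hour": 8, "mode": "tram"},
    {"home_zone": "8004", "departure_hour": 7, "mode": "bike"},
]

print(is_k_anonymous(trips, ["home_zone", "departure_hour"], k=2))
# False: the single trip from zone 8004 at hour 7 can be isolated.
```

When the check fails, the usual remedy is to generalize the quasi-identifiers (for example, coarsening zones or time buckets) or to suppress the offending records until the property holds.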
Only once all four conditions are met can the privacy of an individual be violated. Even then, this does not automatically lead to harm: from an attacker’s perspective, the knowledge gained about the victim needs to be useful, and the attacker has to act on that knowledge to inflict harm. From the victim’s perspective, the subjective feeling of privacy violation plays a pivotal role; understanding whether an individual feels intruded upon adds a nuanced layer to the ethical considerations of handling de-identified data.
Contrary to the widely held belief that anonymization is an insurmountable challenge, instances of malicious re-identification of anonymized data are notably rare. Efforts to re-identify such data are largely confined to white-hat attacks by academics or journalists, and documented malicious attacks on properly anonymized data are conspicuously absent, suggesting that successful breaches are not as commonplace as often assumed. However, technological progress enables new attacks on datasets that were anonymized with algorithms that have since become outdated. It is therefore imperative to publish as little data as possible.
About the project
The Posmo (POSitive MObility) co-operative collects mobility data of a quality not previously available in Switzerland. The data is not only used as a basis for decisions on designing more sustainable mobility, but is also made available in a data market for research, urban development, or mobility planning. The aim of the co-operative is not to make a profit, but to make an important contribution to a better future for Switzerland. As mobility data is highly sensitive under data protection law, Posmo developed an initial concept for the ethical data market in an earlier Innocheque project together with researchers from the Institute for Data Applications and Security IDAS; this concept is now to be further developed.