It is widely known that when you use a social login, e.g. with Google or Facebook, user data is collected, and the profiles built from this data reveal a great deal about the user. But how does this actually work, and how could it be prevented? In recent years, digital identities have evolved from purely isolated systems to federated ones. The article “Secure Identities” compares these models in detail. Social logins are a good example of a federated identity model. As described in “Secure Identities”, they are services that provide electronic identities, known as identity providers (IdPs). In a first step (see Fig. 1), a user usually creates an identity with the IdP, which then stores it centrally in a user database.
Figure 1: Creating a centrally managed identity
When a user wants to access a particular website, the website redirects the user to the IdP (see Fig. 2). There, the user first authenticates, e.g. with a password or multi-factor authentication. If authentication succeeds, the IdP sends the web application a confirmation and, where required, information about the user.
Figure 2: Using the electronic identity
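To make the flow in Fig. 2 concrete, here is a minimal Python sketch of a federated login. It is a simplification under stated assumptions. `IdentityProvider` and `verify_assertion` are hypothetical names. The IdP "signs" its assertion with an HMAC key it shares with each web application, whereas real protocols such as OpenID Connect or SAML typically use public-key signatures. The claim names `sub`, `aud` and `iat` are borrowed from JWT conventions.

```python
import hashlib
import hmac
import json
import secrets
import time

class IdentityProvider:
    """Hypothetical IdP: stores identities centrally and issues signed assertions."""

    def __init__(self):
        self.users = {}    # central user database: user_id -> (salt, password hash, attributes)
        self.rp_keys = {}  # per-web-application keys used to sign assertions (simplification)

    def register_user(self, user_id, password, attributes):
        salt = secrets.token_bytes(16)
        pw_hash = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self.users[user_id] = (salt, pw_hash, attributes)

    def register_web_app(self, rp_id):
        key = secrets.token_bytes(32)
        self.rp_keys[rp_id] = key
        return key  # assumed to be shared out of band with the web application

    def authenticate(self, user_id, password, rp_id):
        """Check the password; on success return (payload, tag) for the web application."""
        if user_id not in self.users:
            return None
        salt, pw_hash, attributes = self.users[user_id]
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        if not hmac.compare_digest(candidate, pw_hash):
            return None
        claims = {"sub": user_id, "aud": rp_id, "iat": int(time.time()), **attributes}
        payload = json.dumps(claims, sort_keys=True).encode()
        tag = hmac.new(self.rp_keys[rp_id], payload, hashlib.sha256).hexdigest()
        return payload, tag

def verify_assertion(key, rp_id, payload, tag):
    """Web application side: accept the assertion only if tag and audience match."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return None
    claims = json.loads(payload)
    return claims if claims.get("aud") == rp_id else None

idp = IdentityProvider()
idp.register_user("alice", "correct horse", {"email": "alice@example.org"})
shop_key = idp.register_web_app("shop.example")
assertion = idp.authenticate("alice", "correct horse", "shop.example")
claims = verify_assertion(shop_key, "shop.example", *assertion)
print(claims["sub"], claims["email"])  # alice alice@example.org
```

Every call to `authenticate` passes the web application's identifier (`rp_id`) to the IdP at the moment of login. Any IdP therefore receives this information as part of the protocol itself.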
A web application provider no longer has to handle user authentication itself. Instead, it can delegate this to an external service (Google, Facebook, etc.) and, in addition, receives certain information about the user.
Figure 3: Typical login screen with social login
It is easy to see that, with this approach, the IdP learns when the user accesses which website, and the website in turn learns which IdP the user has logged in with.
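The following sketch uses hypothetical log data and plain Python to show why this matters. The IdP does not need to do anything special to build a profile. The data it needs (which user logged in to which web application, and when) accumulates as a by-product of every authentication request.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical entries an IdP accumulates as a side effect of logins: (user, web application, time)
login_log = [
    ("alice", "news.example", datetime(2024, 3, 1, 7, 55)),
    ("alice", "pharmacy.example", datetime(2024, 3, 1, 12, 10)),
    ("alice", "news.example", datetime(2024, 3, 2, 7, 58)),
    ("bob", "jobs.example", datetime(2024, 3, 2, 22, 40)),
    ("alice", "news.example", datetime(2024, 3, 3, 8, 1)),
]

def build_profiles(log):
    """Aggregate per-user usage: which sites, how often, and at what hour of day."""
    profiles = defaultdict(lambda: {"sites": Counter(), "hours": Counter()})
    for user, site, when in log:
        profiles[user]["sites"][site] += 1
        profiles[user]["hours"][when.hour] += 1
    return profiles

profiles = build_profiles(login_log)
print(profiles["alice"]["sites"].most_common(1))  # [('news.example', 3)]
print(sorted(profiles["bob"]["sites"]))           # ['jobs.example']
```

The reverse direction is similar. The web application learns the user's IdP from who issued the assertion it receives, for example from an `iss` claim.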
To prevent this technically, the following requirements must be met:
- Restricted observability (blindness of the IdP): The IdP must not be able to observe and collect data on how users use web applications in order to infer their personal interests or behaviour.
- Restricted observability (blindness of the web application): A web application must not be able to find out where and when the user authenticated, or from whom the confirmation of the user’s identity originates.
- Unlinkability: Web applications must not be able to combine personal data without the user’s knowledge. They may link a user’s information only where this is necessary for a legitimate purpose. This applies not only to unique identifiers but also to attributes that are very likely to identify the user (e.g. an e-mail address).
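One well-known building block for unlinkability is to give each web application a different, stable pseudonym for the same user instead of a global identifier. OpenID Connect's “pairwise” subject identifiers follow this idea. The sketch below assumes an HMAC over the web application's identifier and the user's internal ID, keyed with a secret known only to the IdP. `pairwise_id` and `IDP_PSEUDONYM_KEY` are hypothetical names. The technique only prevents linking via identifiers. The IdP still sees every login, and identifying attributes such as an e-mail address would still allow linking.

```python
import hashlib
import hmac
import secrets

# Hypothetical IdP-side secret; it is never given to web applications.
IDP_PSEUDONYM_KEY = secrets.token_bytes(32)

def pairwise_id(user_id: str, rp_id: str, key: bytes = IDP_PSEUDONYM_KEY) -> str:
    """Stable per-web-application pseudonym: same inputs -> same ID, different apps -> unrelated IDs."""
    msg = f"{rp_id}\x00{user_id}".encode()  # separator avoids ambiguous concatenations
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

shop_view = pairwise_id("alice", "shop.example")
news_view = pairwise_id("alice", "news.example")
print(shop_view == pairwise_id("alice", "shop.example"))  # True: stable across logins
print(shop_view == news_view)                              # False: apps cannot join on the ID
```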
There are several technical ways to meet these privacy requirements.
1. Infrequent contact with the IdP:
The web application sends users to the identity provider only once, at the beginning, so that they can authenticate and release their personal data. The web application retrieves the data and stores it in its own local user profile. This process is repeated only at the user’s request or sporadically (e.g. once a year). Since the web application now authenticates the user itself, the IdP only learns that the user is interested in the web application and has logged in at least once. It gains no knowledge of when or how often the user uses it. This approach makes sense if the web application has its own user administration, with many domain-specific details and its own authentication. The central IdP can then be used for “onboarding” (the initial registration of the user). This is particularly attractive for a web application if the IdP holds high-quality user data (e.g. state-certified attributes), because the application can then delegate the identification of its users to the IdP once.
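The policy can be expressed as a small decision function. This is a minimal sketch. The one-year interval comes from the example in the text, while `LocalProfile`, `needs_idp_contact` and the simulated daily logins are hypothetical. Over two years of daily logins, the IdP is contacted only twice: once at onboarding and once at the yearly refresh. All other logins use the web application's own authentication.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption taken from the text's example: refresh attributes from the IdP about once a year.
MAX_ATTRIBUTE_AGE = timedelta(days=365)

@dataclass
class LocalProfile:
    """Hypothetical local user profile kept by the web application after onboarding."""
    user_id: str
    attributes: dict
    onboarded_at: datetime

def needs_idp_contact(profile, now, user_requested_refresh=False):
    """Return True only when the user must be sent to the IdP."""
    if profile is None:
        return True  # first visit: onboarding via the IdP
    if user_requested_refresh:
        return True  # the user explicitly asked for a refresh
    return now - profile.onboarded_at >= MAX_ATTRIBUTE_AGE

def simulate(logins):
    """Count how often the IdP is contacted for a sequence of login times."""
    profile, idp_contacts = None, 0
    for when in logins:
        if needs_idp_contact(profile, when):
            idp_contacts += 1
            # Hypothetical attributes fetched from the IdP and stored locally.
            profile = LocalProfile("alice", {"verified_name": "Alice Example"}, when)
        # Otherwise the web application authenticates the user with its own mechanism.
    return idp_contacts

start = datetime(2024, 1, 1)
daily = [start + timedelta(days=d) for d in range(730)]  # two years of daily logins
print(simulate(daily))  # 2
```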
2. Intermediaries:
Another way to limit observability by both the IdP and the web application is to interpose intermediaries (also called brokers or hubs). Because the connection is split into two segments during authentication, the IdP does not know which service the user is using. Conversely, the service does not know which IdP the user has logged in with. This approach also benefits the web application, since it only has to interact with the intermediary. If multiple IdPs are to be supported, it does not have to maintain several connections with potentially different technical interfaces. However, the intermediary introduces a new problem: it can now log the user’s authentication processes. It is technically possible to design the intermediary so that it, too, is “blind”, but such procedures are quite complex and expensive.
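A minimal sketch of the split connection follows, assuming toy HMAC-tagged assertions (real systems use proper signatures). The class names `IdP` and `Broker` and the key setup are hypothetical. The broker asks the IdP on its own behalf and then re-issues the assertion under its own name. As a result, the IdP only sees "broker" as the requester, and the web application only sees "broker" as the issuer. The broker's own log, however, records both sides.

```python
import hashlib
import hmac
import json
import secrets

def sign(key: bytes, claims: dict):
    """Toy assertion: JSON payload plus HMAC tag (real systems use proper signatures)."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return payload, hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(key: bytes, payload: bytes, tag: str) -> dict:
    if not hmac.compare_digest(hmac.new(key, payload, hashlib.sha256).hexdigest(), tag):
        raise ValueError("invalid assertion")
    return json.loads(payload)

class IdP:
    def __init__(self, name: str, key: bytes):
        self.name, self.key, self.log = name, key, []

    def authenticate(self, user_id: str, requester: str):
        self.log.append(requester)  # the IdP only learns who asked: the broker
        return sign(self.key, {"iss": self.name, "aud": requester, "sub": user_id})

class Broker:
    def __init__(self, idp_keys: dict, app_keys: dict):
        self.idp_keys = idp_keys  # IdP name -> key shared with that IdP
        self.app_keys = app_keys  # web application -> key shared with that application
        self.log = []             # the broker sees both sides: the new privacy risk

    def login(self, idp: IdP, user_id: str, app_id: str):
        claims = verify(self.idp_keys[idp.name], *idp.authenticate(user_id, requester="broker"))
        self.log.append((claims["iss"], app_id))
        # Re-issue the assertion under the broker's own name, so the app does not learn the IdP.
        return sign(self.app_keys[app_id], {"iss": "broker", "aud": app_id, "sub": claims["sub"]})

idp_key, app_key = secrets.token_bytes(32), secrets.token_bytes(32)
idp_a = IdP("idp-a", idp_key)
broker = Broker({"idp-a": idp_key}, {"shop.example": app_key})
claims = verify(app_key, *broker.login(idp_a, "alice", "shop.example"))
print(claims["iss"])  # broker
print(idp_a.log)      # ['broker']
print(broker.log)     # [('idp-a', 'shop.example')]
```

For simplicity, the sketch forwards the IdP's `sub` value unchanged. A broker that also aims to hide the IdP's identifier format or prevent linking would need to translate identifiers as well.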
3. Decentralised identities:
A completely different approach is “decentralised identities”, as described in this article. Here, users act as their own IdP: they create their own identity (1) and publish it. This eliminates the problem of observability by an IdP. The user then collects so-called verifiable statements (2) about this identity from authoritative sources (publishers), who in turn publish information that establishes their own credibility. Authentication now takes place only between the user and the web application being used (3). The web application can check the statements presented to it against a public identity directory.
Figure 4: Decentralised identity
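The three steps in Fig. 4 can be sketched using only Python's standard library. To keep the example self-contained, it uses Lamport one-time signatures built from SHA-256. Real decentralised-identity systems use reusable signature schemes (e.g. Ed25519), whereas here each key pair may sign only one message. The `did:hypothetical:` identifier format, the `directory` dictionary standing in for a public identity directory, and the statement layout are all assumptions for illustration. During login, no IdP is contacted; the web application only consults the public directory.

```python
import hashlib
import json
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def _bits(msg: bytes):
    """The 256 bits of SHA-256(msg), most significant bit first."""
    digest = _h(msg)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]

def keygen():
    """Lamport one-time key pair: 256 pairs of random secrets and their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk

def sign(sk, msg: bytes):
    """Reveal one secret per bit of the message hash. Each key pair may sign only once."""
    return [sk[i][bit] for i, bit in enumerate(_bits(msg))]

def verify(pk, msg: bytes, sig) -> bool:
    bits = _bits(msg)
    return len(sig) == 256 and all(_h(s) == pk[i][b] for i, (s, b) in enumerate(zip(sig, bits)))

def fingerprint(pk) -> str:
    return _h(b"".join(a + b for a, b in pk)).hex()[:16]

# (1) The user creates an identity (a key pair) and publishes it; no IdP involved.
user_sk, user_pk = keygen()
user_id = "did:hypothetical:" + fingerprint(user_pk)
issuer_sk, issuer_pk = keygen()
directory = {user_id: user_pk, "issuer.example": issuer_pk}  # stand-in for a public directory

# (2) A publisher issues a verifiable statement about the user's identifier.
statement = json.dumps({"subject": user_id, "issuer": "issuer.example",
                        "claim": {"age_over_18": True}}, sort_keys=True).encode()
statement_sig = sign(issuer_sk, statement)

# (3) Login between user and web application only: the user signs a fresh nonce.
nonce = secrets.token_bytes(16)
proof = sign(user_sk, nonce)

def web_app_accepts(statement, statement_sig, nonce, proof, directory) -> bool:
    """Check the publisher's signature and the user's proof of key possession."""
    data = json.loads(statement)
    return (verify(directory[data["issuer"]], statement, statement_sig)
            and verify(directory[data["subject"]], nonce, proof))

print(web_app_accepts(statement, statement_sig, nonce, proof, directory))  # True
```

A real verifier would also have to decide whether it trusts the issuer and whether the statement has been revoked. These checks relate to the open questions of trust and revocation.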
To ensure unlinkability between web applications, a web application must not be able to discover the user’s real identity, yet it must still be able to verify the statements in a trustworthy manner. This can be achieved with suitable cryptographic procedures. However, the decentralised solutions available today are not yet mature enough for use by the general public. In addition, several research questions, e.g. on key recovery, trust and revocation (blocking of statements), have not yet been satisfactorily solved.
Despite these open issues, decentralised digital identities offer advantages for both users and companies. For example, decentralised identity systems inherently protect the user’s privacy, because they give users full control over their identity and authentication does not require a central authority. Decentralised identities also add value for companies: the risk posed by a central user database is eliminated entirely. Companies no longer have to store identities and “credentials” (e.g. passwords) and protect them against unauthorised access. This applies to identity providers and web application operators alike. An identity provider no longer has to maintain users’ identities (users create these themselves), only personal information. As an authoritative source (publisher), it is responsible for this information and can make it available to the user in the form of verifiable statements. The web application operator also benefits, because it can always authenticate the bearer of the information without having to store login IDs and passwords itself. This can significantly reduce identity fraud, since such information can no longer be stolen and reused. Beyond data management and security, this also reduces the costs of “customer onboarding” and overall lifecycle management.
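One family of procedures that lets a web application verify statements without learning more than necessary is selective disclosure with salted hashes. The IETF's SD-JWT work follows a similar idea. In this minimal sketch, all names are hypothetical. The issuer commits to each attribute separately, and the user then reveals only the attributes a given web application needs. The issuer's signature over the digest list is omitted here but would be required in practice. The random salts prevent the web application from guessing hidden values, such as a date of birth, by hashing candidate values.

```python
import hashlib
import json
import secrets

def _digest(disclosure: str) -> str:
    return hashlib.sha256(disclosure.encode()).hexdigest()

def issue(attributes: dict):
    """Issuer: one salted disclosure per attribute, plus the digests it would sign."""
    disclosures = {name: json.dumps([secrets.token_hex(16), name, value])
                   for name, value in attributes.items()}
    digests = sorted(_digest(d) for d in disclosures.values())  # sorted: order reveals nothing
    return disclosures, digests

def accept(digests, presented):
    """Web application: keep only attributes whose disclosure matches a committed digest."""
    accepted = {}
    for d in presented:
        if _digest(d) not in digests:
            raise ValueError("disclosure not covered by the issuer's statement")
        _salt, name, value = json.loads(d)
        accepted[name] = value
    return accepted

disclosures, digests = issue({"name": "Alice Example", "birthdate": "1990-05-17", "age_over_18": True})
print(accept(digests, [disclosures["age_over_18"]]))  # {'age_over_18': True}
```

Selective disclosure alone does not guarantee unlinkability. If the same digest list is shown to two web applications, the list itself becomes a shared value they can correlate on, so separate credentials per web application or more advanced cryptography would be needed.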
The solutions presented for protecting privacy when logging into websites and apps each have advantages and disadvantages. Further research is under way, including at the IDAS Institute at BFH, to achieve unobservability and unlinkability in the future. Until then, users can only trust the IdP not to misuse its knowledge for commercial purposes. Legal regulations, such as those for private IdPs under the planned E-ID, attempt to strengthen this trust, but the risk of having revealed too much about one’s private life remains with the user.