Taking back control of our data: Re-decentralizing the Web, for good this time (Part 2)

A recurring theme in the above centralization races is the lack of choice: a choice of browser and operating system, of entry point to the Web, of storage for our personal data. Decentralization is fundamentally about enabling choice, by breaking up artificially coupled decisions into individual options that can be combined at will.

Just as we are free to choose any combination of device, operating system, and browser to access the Web, we should be able to interact with websites and other people without commitment to a single social or other platform. The resulting universality is crucial for permissionless innovation, because it establishes inventors’ independence of these platforms and of other centralized forces.

Taking back control of our data

Taking back control of our personal data, as envisioned by Tim Berners-Lee, is realized by decoupling data storage from services. This means people can store their data wherever they choose, while still enjoying the services they want. We could pick any provider to store our texts, photos, and videos—or simply store them on our own Web server—and rely on any third-party service to interact with them, regardless of storage location. The crucial service of identity can, but does not need to, be provided along with storage.

This mindset gives rise to the concept of a personal data pod, in which we can store every single piece of information we or others produce about us. As shown in the figure below this statement can be taken quite literally: even a seemingly trivial piece of data, such as simple “like” we gave a certain webpage or thing, can be stored in our own pod. While such a degree of decentralization might seem extreme, recall that even supposedly trivial likes can reveal much deeper personal information [12], so it makes sense to give people control over them. Furthermore, since we do not depend on anyone’s permission to publish data in our own pod, we can place public or private likes, dislikes, and comments on anything we want, without fear of them being censored or deleted.

 

On a decentralized Web, every single piece of data can be stored in a place chosen by its author [14].

Storing data ourselves enables highly granular access control: people can selectively give permission to friends or applications to read or write specific parts of their data pod. For instance, they can decide whether or not they make their profile picture and full name public, who can see which of their likes and comments, and what applications can edit their pictures or posts on their behalf. These permissions can be changed or revoked at any time. People can have multiple data pods for different purposes, for instance, a pod for personal and family pictures at home, a pod governed by retention policies for professional data at the workplace, and a university pod with study materials and grades. Upon creation, they can decide which data is stored in which pod.

By choosing the storage location of our own data, we prevent unauthorized access and exploitation. We are no longer obliged to pay with our data in order to access a certain service. Moreover, we can protect the most sensitive parts of our data by keeping them to ourselves, and limit sharing to people and services that really require it—but only for as long as they need it.

Independent innovation in separate data and service spaces

A key point, however, is that this separation of data and apps benefits not only people, but also the companies providing apps or services. Nowadays, the first task for any app is collecting the data it needs for its operation—either by asking the user to input it (once again), or by integrating with existing data collection platforms. In order words: either they compete with the big platforms, or they depend on them. Of course, data is a prerequisite for many services in today’s data-driven world, but the fallacy is that this would imply every single company has to collect that data over and over again. Through personal data vaults, every service gains an equal opportunity to innovate with data, when people explicitly give them access. This levels the playing field, since every company can now create experiences with readily available data without depending on any central party.

Moreover, user-managed data removes both the burden and the benefits of data collection, which is expensive and legally complex for most companies. Privacy-unfriendly business models centered around data collection will become less attractive. Such an economic change can be accelerated through legislation, like the EU’s General Data Protection Regulation (GDPR), as well as growing awareness among the general population about the dangers of centralization, given data scandals at companies such as Equifax and Facebook. Consequently, new business models for applications become necessary.

Decentralization requires the nature of applications to evolve from silos to shared views. As shown in the figure below, current Web apps combine data and service. Because of this coupling, our LinkedIn contacts cannot comment on our Facebook pictures, and an RSVP on a Facebook event will not be reflected in our Doodle calendar. Decentralized applications, on the other hand, act as views on top of our data pod and those of others. When granted specific access rights, photos uploaded into our data pod by a photo gallery application can be accessed by a social feed app. Events in my personal calendar that have public visibility can show up in that same feed. Our friends can view the parts of our data to which we grant them access through whatever application they wish to use.

Centralized Web applications act as silos that do not share data with each other. Decentralized Web applications act as shared views on top of personal data pods.

Because the choices of data and service become untangled, separate competitions for storage and apps emerge. The figure below shows that centralized applications compete in a single market based on data collection, because usage of a service is coupled with usage of its storage. As such, people cannot easily switch to a better application experience, as migrating their data—if possible—is technically challenging. The incentives for large application builders to develop great apps are small compared with the temptation to collect large heaps of data. Furthermore, new applications that could offer a better experience have trouble entering the market, since they do not have any data at all. With decentralized Web applications, people select their storage and service providers separately, which allows an independent competition on the level of storage and on the level of services. On both levels, the competition is solely based on service quality and features versus cost, not by which party happens to collect the most data.

This independence means we can freely switch data and service providers, without requiring our friends to choose the same ones. It tears down the walls in between the gardens, because we gain the ability to reuse and move our data, and can interact with anyone in the entire landscape. Data and service providers can evolve without dependency on each other, which enables a faster and more creative innovation cycle. Through the implementation of standards and specifications, anyone can enter either market and attract customers by providing a better experience than others, without demanding control of our data.

Centralized applications compete in a single space, with the aim of collecting the most data. On a decentralized Web, data and service solutions compete in different spaces [14], independently aiming to provide a good experience.

The Solid project

In order to realize this vision of independent data storage and services, Tim Berners-Lee started the Solid project [15]. Solid consists of specifications for interoperability, implementations of servers, clients, and applications, and a community of people who build new things. In the next sections, we will discuss some of Solid’s unique aspects.

Personal data linking and integration

The goal of Solid is empowering people through personal data management, as a counterpart to enterprise data management. We can consider a Solid server or data pod as the equivalent of a hard disk for the Web, on which we can store arbitrary documents and pieces of data. Then Solid apps are like our desktop applications, except that they open documents from Solid servers on the Web. In contrast to actual hard disks, Solid servers are typically public to the entire world, so detailed access control settings allow us to specify who can view or edit which of our documents. Tim Berners-Lee has been leading by example, by managing his personal and professional life with Solid for several years already.

In order for such data management to work at Web scale, data in different pods need to link to each other, similar to how hypertext documents allow us to jump from one website to another. Solid uses Linked Data [16] to achieve this: every piece of data can link to any other. This is how, for example, a comment in your data pod can be attached to a photo in someone else’s pod, while both of you can remain in control of your own data. At runtime, Solid applications integrate data from multiple sources and blend them together into a single experience.

Solid pods can offer people a decentralized means of identification. People can pick a so-called WebID, which is a Web address that identifies them. This Web address leads to their public profile, and people can log on to any pod with their own WebID, instead of requiring a new login on every website or resorting to a centralized identity provider.

Read–Write Web

One of the crucial aspects of Solid is that it provides a Read–Write platform, as was Tim Berners-Lee’s original intention for the Web [17]. While writing has always been possible, in the sense that anyone could start their own website, the Web 2.0 and social media revolutions should be credited with making it considerably easier. This explains part of the success of these platforms, as anyone can now be a content producer at any time, especially through their mobile devices.

Solid should make authoring content similarly easy, the difference being of course that we would always write to our own data pods instead of to the application through which we create. To maximize interoperability, our Linked Data should be stored using Semantic Web technologies [18], which interweave a piece of data with its meaning. That way, applications can make sense of (parts of) each other’s data, without having to agree upfront exactly what our data should look like. When storing data in our own pods, we need a mechanism to inform others when things have been created or modified—especially if these are comments on their data. This is enabled through Linked Data Notifications [19], small automated messages similar to email, which different data pods can send to each other. By combining these technologies, Solid aims to realize the Read–Write Linked Data vision [17], in order to ensure that everyone can participate in the Web of Data.

Potential for disruption

By transforming the role of applications in a decentralized ecosystem, Solid is able to disrupt many interactions that happen on the Web. Many processes that currently depend on centralization can be revolutionized in a decentralized way, by cutting out the middlemen currently executing these processes. This can stimulate innovation in areas that are embracing the current status quo and resisting change.

A first obvious target are social interactions between people. Sharing multimedia with friends, colleagues, and family members without privacy concerns becomes possible through Solid. Other examples include collaborating on various kinds of documents under transparent access control, and organizing meetings and events—again with independent choice of application and storage, and seamless synchronization between different apps.

Moreover, Solid has the technological potential to disrupt entire industries, such as for instance scholarly publishing. The current scholarly publication process assumes that an author uploads a scientific manuscript to a centralized platform, where a closed group of reviewers evaluates it. After acceptance, the manuscript is published as an article and then becomes accessible to the public, possibly at a fee. This process is rather slow, as the wider scientific community can only read the article at the end—if accepted. It is also non-transparent because valuable artefacts of the process, such as reviews and revisions, remain hidden. Further participation is typically only possible through a reply that has to undergo a similar slow process. A decentralized authoring application such as dokieli [20] instead allows researchers to self-publish their manuscripts online in their own Solid pod. Their peers can annotate these manuscripts with comments and reviews, which are stored in their own pods, guaranteeing freedom of expression to any researcher who wants to participate. All outcomes of this process are online, and the scientific community can continuously provide feedback, even after publication on the Web.

A decentralized Web for all

Re-decentralizing the Web along the lines of the Solid vision can help us tackle Tim Berners-Lee’s three challenges [4]. We can take back control of our personal data by storing data in our own data pods. The spread of misinformation can be halted, because a free choice of applications allows us to influence our news feed—and all information in there can be traced back to its source. Political advertising becomes more transparent, as we can decide which parts of our data we expose to whom; moreover, the separate data and services spaces allow us to consider other options that are not based on advertising in the first place. While this does not fully address all aspects of the challenges, control and choice are major factors.

Freedom of course always comes at a cost: what constitutes a victory for personal rights and freedom of speech also facilitates the spread of illegal messages, since decentralized networks make it harder to control what information is exchanged. Legality is of course a tricky matter, as some countries instate laws that prevent their citizens from voicing opinions that would be legal elsewhere. An intriguing case is the increased popularity of the decentralized social network Mastodon in Japan [21]: as Twitter started removing images that were deemed questionable under US norms, Japanese users began publishing them on platforms with less censorship. Japanese users began publishing them on their own platforms. We might have to accept this trade-off between freedom and control—and in absence of a globally accepted set of norms, centralized filtering will never be an adequate solution.

Freedom and internet

This brings us to another aspect of decentralization, which is the tension between freedom and universality. The Paradox of Freedom states that we can only be free if we subject ourselves to certain rules. Simply said, we can take our bike and ride anywhere—if only we stay on the right side of the road (which in several countries is actually left). If not for such rules, we would be unable to get anywhere without causing accidents. Given that universality has always been a main goal of the Web [7], decentralized communities can only flourish if they agree on some basic framework on how to decentralize. As with the universality of browsers, there is a major role for the W3C in creating the standards that will allow decentralized data pods and apps to interoperate. Fortunately, we do not have to agree on everything. Linked Data enables layered agreements, in which a few rules need to be adopted by many, and sets of additional rules are agreed upon by smaller groups as required.

Importantly, the arrows of decentralization and Solid are not aimed at specific companies. Instead, they point at centralization in general, since many of the problems and challenges faced by these companies are inherent to centralization and the business model of data collection. We have come to the point where companies possess so much data that they themselves are unable to predict the long-term effects that such a centralization might have [10]. Therefore, it is an unreasonable excuse for them to claim that people could have given “informed consent” to let their data be processed the way it is. No individual can reasonably understand what happens when they give up control over small parts of their data, since that leads to very different effects in the Big Data picture. Storing our data in a trusted place of our choice, combined with a granular permission model, is therefore a much safer bet.

Tim Berners-Lee insists that the Web should always remain scale-free [22], with room for the very large and the very small, and everything in between. The goal is thus not to have a Web without large players. However, the problem is that the very large are currently trying to make the rest obsolete, which endangers the online freedom and permissionless innovation we have been enjoying for so many years. As argued above, decentralization is foremost about choice, so people should be free to join large or small communities. While several technical issues concerning decentralized applications lie ahead of us, notably providing a better user experience than centralized platforms, the first technological proof has been delivered with Solid. Now, it is up to all of us to anchor this technological progress in today’s and tomorrow’s socio-economic reality in order to re-decentralize the Web for good. Only when we succeed in taking back control over our most precious digital assets, we can truly say: this is for everyone.


References

  1. Derakhshan, H. (2015), “The Web We Have to Save”, 14 July, available at: https://medium.com/matter/the-web-we-have-to-save-2eb1fe15a426.
  2. “Break down these walls”. (2008), The Economist, available at: https://www.economist.com/node/10880516.
  3. Pariser, E. (2011), The Filter Bubble, Penguin Books.
  4. Berners-Lee, T. (2017), “Three challenges for the Web, according to its inventor”, Web Foundation, 12 March, available at: https://webfoundation.org/2017/03/web-turns-28-letter/.
  5. Rosenthal, D. (2018), “It Isn’t About The Technology”, 11 January, available at: https://blog.dshr.org/2018/01/it-isnt-abouttechnology.html.
  6. Berners-Lee, T. (2018), “The Web is under threat. Join us and fight for it”., Web Foundation, 12 March, available at: https://webfoundation.org/2018/03/web-birthday-29/.
  7. Berners-Lee, T. (2005), “Universality of the Web”, 23 March, available at: https://www.w3.org/2005/Talks/0323-yorkshire-tbl/slide5-2.html.
  8. Berners‐Lee, T., Cailliau, R., Groff, J.F. and Pollermann, B. (1992), “World‐wide web: the information universe”, Electronic Networking, Vol. 2 No. 1.
  9. Gustafson, A. (2008), “Beyond DOCTYPE: Web Standards, Forward Compatibility, and IE8”, 21 January, available at: https://alistapart.com/article/beyonddoctype.
  10. Berjon, R. (2018), “Advertising’s War on Consent”, 19 March, available at: https://berjon.com/advertising-war-on-consent/.
  11. Berners-Lee, T. (2018), “From Utopia to Dystopia in 29 Short Years”, 18 May, available at: https://www.csail.mit.edu/news/utopiadystopia-29-short-years.
  12. Kosinski, M., Stillwell, D. and Graepel, T. (2013), “Private traits and attributes are predictable from digital records of human behavior”, Proceedings of the National Academy of Sciences, National Academy of Sciences, Vol. 110 No. 15, pp. 5802–5805.
  13. Samarajiva, R. (2014), “More Facebook users than Internet users in South East Asia?”, 30 August, available at: http://lirneasia.net/2014/08/more-facebook-users-than-internet-users-in-south-east-asia/.
  14. Verborgh, R. (2017), “Paradigm shifts for the decentralized Web”, 20 December, available at: https://ruben.verborgh.org/blog/2017/12/20/paradigm-shifts-for-the-decentralized-web/.
  15. “Solid”. (n.d.). , available at: https://solid.mit.edu/.
  16. Berners-Lee, T. (2006), “Linked Data”, 27 July, available at: https://www.w3.org/DesignIssues/LinkedData.html.
  17. Berners-Lee, T. and O’Hara, K. (2013), “The read–write Linked Data Web”, Philosophical Transactions of the Royal Society A, Vol. 371 No. 1987.
  18. Berners-Lee, T., Hendler, J. and Lassila, O. (2001), “The Semantic Web”, Scientific American, Vol. 284 No. 5, pp. 34–43.
  19. Capadisli, S. and Guy, A. (Eds.). (2017), Linked Data Notifications, Recommendation, World Wide Web Consortium, available at: https://www.w3.org/TR/ldn/.
  20. Capadisli, S., Guy, A., Verborgh, R., Lange, C., Auer, S. and Berners-Lee, T. (2017), “Decentralised Authoring, Annotations and Notifications for a Read–Write Web with dokieli”, in Proceedings of the 17 International Conference on Web Engineering, pp. 469–481, available at: https://csarven.ca/dokieli-rww.
  21. Zuckerman, E. (2017), “Mastodon is big in Japan. The reason why is… uncomfortable”, 18 August, available at: http://www.ethanzuckerman.com/blog/2017/08/18/mastodon-is-big-in-japan-the-reason-why-is-uncomfortable/.
  22. Barabási, A.-L. and Albert, R. (1999), “Emergence of Scaling in Random Networks”, Science, Vol. 286, pp. 509–512, 286, pp. 509–512.
Creative Commons Licence

AUTHOR: Ruben Verborgh

Ruben Verborgh is a professor of Decentralized Web technology at IDLab, Ghent University – imec, and a research\ affiliate at the Decentralized Information Group of CSAIL at MIT.

Create PDF

Related Posts

None found

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *