OpenMined: an example of 'decentralized AI' - Fondazione Giannino Bassetti

The era of the algorithm and the frontiers of digital “trusting”

The full implementation of the General Data Protection Regulation (GDPR) marks the end of the transitional period that preceded the application of the sanction regime. It has brought transformations, both small and large, that affect the behaviour of practically all keepers of information and, as a consequence, all of us.

The EU already defines the protection of privacy as a fundamental right, an integral building block of human dignity and personal freedom. In the era of the algorithm this protection relies on the “dynamic protection” of personal data, which extends users’ rights over information concerning them (for example, the possibility of transferring their own data from one online service to another, or of limiting or opposing the holding of that data).

The question, however, has decidedly broad borders: it goes beyond the remit of local legislation to touch upon fundamental themes of civil life and the roles of the institutions held responsible for regulating them.

For a concrete example we only have to look at the recent case of Cambridge Analytica. Facebook, a giant international company that guarantees the communications of billions of people across the world, was more or less knowingly taken advantage of in order to manipulate (more or less effectively) the results of the presidential election of the most powerful democracy on Earth. Who was responsible for regulating this operation and under which international laws should they have regulated it?

When we speak of the immense quantity of sensitive information collected through IT and digital instruments, the traditional relationships between content and container, goods and instrument, operator and third party are superseded in such a way, and with such ease, as to make our heads spin. It is a little like that image borrowed from quantum physics which has become part of common-sense rhetoric: the observer modifies the object observed. The keeper of the data is its manipulator. Whoever should (be able to) regulate the keeper finds themselves, from the moment they gain access to the tools that would allow them to do so, in the paradoxical situation of taking on the role of keeper themselves, with all of the risk that this involves.

This comes about because technology, and the techniques drawn from it, have become the paradigm and have already assumed a role of total predominance. Whoever has access to certain technologies holds the power that they bring. The model is no longer that of an army which increases its power by adding arms and men to its arsenal, and which is in turn limited by the defensive tools of the enemy; now it is the arsenal itself, having become global, that holds the force, and whoever can manage or control it has all the power. Any third party that needs to fulfil the role of regulator has to take total control of this total power. The risk we run is that there is no longer any space for either mediation or roles.

And so, consequently, whom can we trust in the immediate future, and on which dynamics is this trust to be based? One not so far-fetched idea is that trust should be placed not in one or more actors in this scenario, but in a combination of transparency and technology. But, and here a range of scenarios opens, which technology? Implemented according to which technical principles?

One extremely innovative approach has been adopted by the recent “OpenMined” project, which uses an ingenious combination of technologies to propose a new “trusting” dynamic for immediate practical application within the field of Artificial Intelligence (AI). The criteria and methods proposed allow a high level of transparency in the functioning of the technological architecture to be combined with an extremely high level of confidentiality for each user.

The OpenMined Project: total decentralization

When we think about the need to change mindset in order to face today’s technological challenges, the intuition seems absolutely convincing. When we hear the ideas of someone who is actually doing it, however, we feel ourselves pushing back against them. The recent OpenMined project and its community of supporters have precisely this effect.

The claim made by the project is explicit enough: “OpenMined is a community focused on building open-source technology for the decentralized ownership of data and intelligence”.

The project’s mission (at the time of writing it involves a community of about 150 developers and a website that has existed since May 2018) is certainly challenging: to build new technology capable of maintaining the confidentiality of all parties involved, and to make it available both to the users who furnish it with their data and to the developers of machine learning solutions who will build models and train them on that data. If we add to this the idea that each participant should be paid for collaborating constructively in the improvement of the applications, without having to sell their own data, there seem to be two possibilities: there is either a trick or magic involved.

Every aspect of this project features many elements of what we might call a responsible research and innovation (RRI) approach. The project began with the growth of a broad, multidisciplinary community not composed merely of technicians, while the entire production and development process is published as open source so that the project can become a framework shared as widely as possible. The solution displays many of the “privacy by design” and “privacy by default” characteristics of the aforementioned GDPR. It uses some of the most interesting, and often most controversial, technologies in a particularly innovative and extremely integrated way. If there is a risk, however, it is that of having raised expectations too high by arguing that this particular use of technology may be able to address some of the thornier problems “auto-magic-ally”.

The current state of the art in the development of AI applications displays characteristics and dynamics that are more or less consolidated: the need for a large amount of data and for enormous computing capacity to analyse it. The main questions raised by this approach concern the ownership of data and the ability of those who produce it to retain some form of control over how the collected data is actually processed and the real purposes for which it is used. The importance of this aspect is amplified to a greater or lesser degree by the type of data collected: medical, behavioural, political, shopping, sexual and so on.

The data that individuals produce is collected via applications, smartphones or IoT devices and given (more or less consciously) to those who run the projects in question, who, in order to deliver what they have promised to the customer, pass it on to third-party companies. The cloud services offered by Amazon, Google or Microsoft, for example, are ever more frequently the technological answer to the complexity of organizing such huge volumes of data, offering a competitive advantage over other solutions or over those run entirely in-house. The choice to pass data on to external service providers is therefore often dictated by technical and economic constraints arising from the quantity of resources needed to store and process the information, with all of the implications for confidentiality that one can imagine.

Similar considerations apply to computing capacity alone. An already consolidated phenomenon is the concentration of this capacity in an oligopoly of large providers that a broad range of users is ever more incentivized to use. These dynamics are similar in every type of application: from start-ups to giant companies, and from private research centres to public ones.

The OpenMined project: a technology based upon magic

“Any sufficiently advanced technology is indistinguishable from magic” (Arthur C. Clarke)

The new aspects of the OpenMined project amount to a form of innovation that could be defined as incremental, because it develops already existing paradigms through a particularly inventive approach and therefore represents a certain continuity. Its innovativeness is, however, radically amplified by the fact that the technologies upon which it is based are themselves considered radical: regardless of how long they have existed, their potential for innovation is coming to the fore at this moment.

Deep learning, Federated learning, Homomorphic Encryption, Blockchain, Smart Contracts: all of these technologies come together in the construction of a new architecture whose aim is to allow the implementation of AI solutions capable of efficient and functional learning. All of this permits the use of distributed computing power while fully respecting the confidentiality of the data providers and the industrial property rights of the companies that propose the models.
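As a rough illustration of the federated learning ingredient, the sketch below trains a one-parameter model across three participants: each one improves the shared weight on its own data and returns only that weight, which is then averaged. Everything here (the function names `local_update` and `federated_round`, the toy datasets, the learning rate) is a hypothetical example chosen for brevity, not OpenMined’s actual code or API.

```python
# Minimal federated-averaging sketch (illustrative only, not OpenMined's API).
# Each participant fits y = w * x on its own private data; only the updated
# weight, never the raw data, is sent back to be averaged.

def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps on one participant's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, participants):
    """Average the locally trained weights, weighted by local dataset size."""
    total = sum(len(d) for d in participants)
    return sum(local_update(w_global, d) * len(d) for d in participants) / total


# Hypothetical private datasets, all generated from the relation y = 3x.
participants = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, participants)

print(round(w, 3))
```

The shared weight approaches the underlying value of 3 even though no participant ever discloses a data point; only model parameters travel between the parties.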

“With OpenMined, an AI can be trained in environments that are not secure on data it never has access to”

In the OpenMined approach the aspects that seem “magic” are numerous:

Privacy and ownership of user data.

The data produced by users can remain their property and entirely under their own control. There is no technical need to pass on or share personal data with the other parties involved. This simplifies (in many cases to the point of eliminating) the need for legal agreements guaranteeing that the different parties will maintain confidentiality. The machine learning systems proposed can carry out their training on data that has been mathematically treated to make it untraceable, while maintaining its value for the learning process.

Confidentiality and intellectual property of the AI models

The models proposed by research organizations, start-ups and private companies can be distributed to computing centres, or more broadly to single users with adequate computing capacity, in order to undergo the training needed to improve their effectiveness. Technically speaking, however, the models can be run without the user gaining information about the model itself. In this way the technology cannot be taken, copied or used without explicit authorization from its owner, safeguarding economic investment and technological know-how.
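One way to picture a model that can be run without being revealed is additively homomorphic encryption. The sketch below is a toy version of the Paillier scheme, with primes far too small for real use and non-negative integer weights only; the names `keygen`, `encrypt`, `decrypt` and `encrypted_dot` are assumptions made for illustration and are not taken from the OpenMined codebase. The model owner encrypts the weights of a linear model, the user combines the ciphertexts with their own features, and only the holder of the private key can read the resulting score.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key pair. These primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n, so no modular power is needed for g^m.
    return (1 + m * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext with the private key (lam, mu)."""
    lam, mu = priv
    n2 = n * n
    return (pow(c, lam, n2) - 1) // n * mu % n


def encrypted_dot(n, enc_weights, features):
    """Compute Enc(sum(w_i * x_i)) from encrypted weights and plain features."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, x in zip(enc_weights, features):
        # Raising a ciphertext to x multiplies the hidden weight by x;
        # multiplying ciphertexts adds the hidden values.
        acc = acc * pow(c, x, n2) % n2
    return acc


# Model owner: encrypts hypothetical integer weights and publishes ciphertexts.
n, priv = keygen()
weights = [2, 5, 1]
enc_weights = [encrypt(n, w) for w in weights]

# User: evaluates the model on private features without seeing the weights.
features = [3, 4, 10]
enc_score = encrypted_dot(n, enc_weights, features)

# Key holder: decrypts only the final score.
score = decrypt(n, priv, enc_score)
print(score)
```

In this sketch the user handles nothing but ciphertexts of the weights, and the key holder sees only the final score. A real deployment would need large keys, an encoding for negative and fractional values, and a decision about who holds the private key.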

Payment for participation

The technological solution proposed by OpenMined allows the implementation of a reliable payment system through which users can be paid for having participated, contributing their data and computation to the system’s learning and therefore to its overall improvement. The payment can be based upon the level of improvement that their involvement leads to, and is decided in a transparent and fair way.
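To make the idea of payment based on improvement concrete, here is a minimal sketch of one possible rule: a fixed budget is split in proportion to how much each participant’s contribution lowers the model’s validation loss. The rule itself, the function `split_payout`, the participant names and all the figures are hypothetical; the article does not specify how OpenMined actually computes payments.

```python
def split_payout(budget, baseline_loss, loss_after):
    """Split a budget in proportion to each participant's loss reduction.

    loss_after maps a participant to the validation loss measured after
    applying that participant's update on its own. Contributions that do
    not improve the model earn nothing.
    """
    gains = {p: max(0.0, baseline_loss - loss) for p, loss in loss_after.items()}
    total = sum(gains.values())
    if total == 0:
        return {p: 0.0 for p in gains}
    return {p: budget * g / total for p, g in gains.items()}


# Hypothetical figures: the baseline loss is 0.50 before any contribution.
payouts = split_payout(
    budget=100.0,
    baseline_loss=0.50,
    loss_after={"alice": 0.40, "bob": 0.45, "carol": 0.55},
)

for participant, amount in payouts.items():
    print(participant, round(amount, 2))
```

Because the rule depends only on measured losses, anyone holding the same measurements can recompute the split, which is one simple reading of “transparent and fair”.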

Furthermore, the relationships between the various parties involved do not need to be regulated by adhering to or drawing up legal agreements. The architecture of the proposed solution is firmly based upon technology which, having blockchain-based components, inherits many of the characteristics of blockchain. Trust between participants does not therefore require trust as such, but is expressed as a collection of constraints whose observance must be algorithmically verifiable and checked.
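A very small example of what “algorithmically verifiable” can mean is a hash-chained record, the basic building block that blockchains rely on: each entry commits to the one before it, so any later alteration can be detected by recomputation. The sketch below is a generic illustration using Python’s standard `hashlib`; the functions `add_record` and `verify` and the record contents are hypothetical and do not describe OpenMined’s smart contracts.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def record_hash(prev_hash, payload):
    """Hash a record's payload together with the hash of its predecessor."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def add_record(chain, payload):
    """Append a record that commits to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev_hash, "payload": payload,
                  "hash": record_hash(prev_hash, payload)})


def verify(chain):
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS
    for rec in chain:
        if rec["prev"] != prev_hash:
            return False
        if rec["hash"] != record_hash(prev_hash, rec["payload"]):
            return False
        prev_hash = rec["hash"]
    return True


# Hypothetical ledger of contributions agreed between the parties.
ledger = []
add_record(ledger, {"participant": "alice", "gain": 0.10})
add_record(ledger, {"participant": "bob", "gain": 0.05})
ok_before = verify(ledger)

# Someone rewrites an earlier entry without recomputing the hashes.
tampered = copy.deepcopy(ledger)
tampered[0]["payload"]["gain"] = 0.90
ok_after = verify(tampered)

print(ok_before, ok_after)
```

No party has to take another’s word for the history of the ledger: the check is a computation that any participant can repeat.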

Politics of control: the “Oracle”

The functionality of the entire architecture is based upon a logical/mathematical component called the Oracle. This structure is responsible for autonomously managing the correct distribution of the necessary cryptographic keys, and uses a smart contract to guarantee the maximum level of trust between all of the components of the solution. It can therefore be seen as the guarantor. The management of the Oracle’s technological infrastructure is consequently the most sensitive element within the architecture, and at least for the time being the community that is developing OpenMined considers it something the architecture cannot do without. Having trust in it therefore becomes unavoidable; but we are obviously talking about relative trust, because its behaviour is visible through the smart contract.

The organization entrusted with the management of the oracle can therefore be either public or private, as long as it has an interest in taking the role of guarantor between all of the various components and users involved. This role could be held, for example, by a public entity that guarantees the process of developing an AI model in the medical field starting from patient data, without the patients having to hand this data over to those carrying out the research. In other contexts, for example within the broad field of business analysis, the correct management of the oracle could be delegated to a third party in return for compensation directly proportional to the scope of the project and the sensitivity of the data treated.
