Integral and Open Systems, Inc. helps solve complex and novel business problems by deploying data-driven and AI systems into production. By working with the business from the initial data strategy onwards and taking a holistic approach to our services, we combine the latest technical advances, real-world expertise, AI engineering, and an understanding of business and data requirements to generate operational value.
Using OpenMined, we apply federated learning to train machine learning models on data stored on different devices or servers across the world, without having to collect the data samples centrally.
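To make this concrete, the sketch below simulates federated averaging, the aggregation scheme underlying this kind of training, in plain NumPy rather than through OpenMined's API; the clients, data, model, and hyperparameters are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate three clients, each holding a private shard of (X, y) data.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w_global = np.zeros(2)  # shared model weights

for _ in range(20):  # communication rounds
    local_weights = []
    for X, y in clients:
        w = w_global.copy()
        # Each client runs a few gradient-descent steps locally;
        # the raw data never leaves the client.
        for _ in range(5):
            grad = 2 * X.T @ (X @ w - y) / len(y)
            w -= 0.05 * grad
        local_weights.append(w)
    # The server aggregates only the model updates (here: a simple mean).
    w_global = np.mean(local_weights, axis=0)

print("recovered weights:", w_global)  # close to [2.0, -1.0]
```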
Encryption: keep models and data private from malicious actors (a minimal sketch appears after this list).
Federated learning: train computationally demanding machine learning models in a decentralised manner on less powerful devices.
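For the encryption piece, the sketch below shows one narrow slice of the problem: protecting serialized model weights at rest with authenticated symmetric encryption, via the `cryptography` package's Fernet recipe. Computing on data that remains encrypted requires heavier machinery (e.g. homomorphic encryption or secure multi-party computation) and is not shown; all names here are illustrative.

```python
import pickle

import numpy as np
from cryptography.fernet import Fernet

weights = np.array([2.0, -1.0])  # stand-in for trained model weights

key = Fernet.generate_key()      # must itself be stored and distributed securely
fernet = Fernet(key)

# Ciphertext is safe to store or ship; it is also integrity-protected,
# so tampering is detected on decryption.
token = fernet.encrypt(pickle.dumps(weights))

restored = pickle.loads(fernet.decrypt(token))  # only key holders can recover
assert np.allclose(weights, restored)
```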
Often, deep neural networks are over-parameterized, meaning that they can encode more information than is necessary for the prediction task. The result is a machine learning model that can inadvertently memorize individual samples. For example, a language model designed to emit predictive text (such as the next-word suggestions seen on smartphones) can be probed to release information about individual samples that were used for training (“my social security number is …”).
Differential privacy is a mathematical framework for measuring this leakage. It describes the following promise to data owners: “you will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, datasets, or information sources are available”. Differential privacy works by injecting a controlled amount of statistical noise to obscure the contributions of individuals in the dataset, while ensuring that the model still gains insight into the overall population and thus provides predictions that are accurate enough to be useful. Research in this field allows the degree of privacy loss to be calculated and evaluated against a privacy ‘budget’; ultimately, the use of differential privacy is a careful trade-off between privacy preservation and model utility.
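As a concrete illustration, the sketch below implements the Laplace mechanism, one standard way to realise a differentially private release of a simple statistic; the dataset, clipping bounds, and epsilon values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Private data: one row per person.
incomes = rng.uniform(20_000, 120_000, size=1_000)

def private_mean(values, lower, upper, epsilon):
    """Release the mean with epsilon-differential privacy (Laplace mechanism)."""
    clipped = np.clip(values, lower, upper)      # bound each individual's influence
    sensitivity = (upper - lower) / len(values)  # max change from altering one row
    noise = rng.laplace(scale=sensitivity / epsilon)
    return clipped.mean() + noise

# A smaller epsilon spends less of the privacy budget but adds more noise,
# making the privacy/utility trade-off explicit:
for eps in (0.01, 0.1, 1.0):
    print(f"epsilon={eps}: {private_mean(incomes, 20_000, 120_000, eps):.2f}")
```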
Data partitions
Federated learning supports three types of data partitions: horizontal, vertical, and federated transfer learning. A brief summary of each is below, with a toy sketch after the list:
Horizontally partitioned federated learning (HFL):
data distributed across different silos share the same feature space but contain different samples.
Vertically partitioned federated learning (VFL):
data distributed across different silos contain different feature spaces but the same samples.
Federated transfer learning (FTL):
data distributed across different silos contain different feature spaces and different samples.
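The toy sketch below illustrates the three partition types with pairs of silos holding tabular data; the silo names, features, and values are all invented.

```python
import pandas as pd

# HFL: both silos record the same features (age, income) for different people.
silo_a = pd.DataFrame({"age": [34, 51], "income": [48_000, 72_000]},
                      index=["alice", "bob"])
silo_b = pd.DataFrame({"age": [29, 46], "income": [39_000, 65_000]},
                      index=["carol", "dave"])

# VFL: the silos record different features for the same people.
bank = pd.DataFrame({"income": [48_000, 72_000]}, index=["alice", "bob"])
clinic = pd.DataFrame({"blood_pressure": [118, 135]}, index=["alice", "bob"])

# FTL: different features *and* mostly different people; only a small
# overlap (here: alice) is available to bridge the two domains.
retailer = pd.DataFrame({"purchases": [12, 3]}, index=["alice", "erin"])
insurer = pd.DataFrame({"claims": [1, 4]}, index=["alice", "frank"])
```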
Below are generalized manifestations of horizontally partitioned, cross-silo learning problems:
Structured data: Examples include application data from enterprise software installed across many institutions, and/or databases split across international boundaries in adherence to data localization requirements.
Unstructured data: Examples include clinical documentation, tomographic images, and/or VCF files from multiple cooperating healthcare institutions.