That's Fresh! Newsletter
Read a selection of our past issues.
- 🙌 NumPy 2.0 is almost out! And: Our new data preprocessor with Polars | Interview with S2E at Italy Insurance Forum (June 5, 2024)
- 😮 What a month for new LLMs! And: Datacamp webinar with Shalini (May 22, 2024)
- ✨ GenAI true value lies beyond operational enhancements. And: The Future of Data Protection | New updates about AI Act (April 24, 2024)
- 👁 What are 1-bit Large Language Models? And: Linkedin Live about AI Act | Mastercard's Country Manager interviewed our CEO (March 6, 2024)
- LLaMAntino - Effective Text Generation in Italian. And: Creating train and test datasets | Use case: Detecting money muling with the help of synthetic data (February 21, 2024)
- 🗞️ The NY Times sues OpenAI and Microsoft. And: Can AI work with little data? | La Stampa: AI means development (January 10, 2024)
- Synthetic Data 101 🚨 And: Why synthetic data? | New project with Poste Italiane (November 8, 2023)
- How easy is it for LLM to infer sensitive information? And: Why is data sharing important? | Our new partnership with S2E (October 25, 2023)
- Have you heard of Pythia? And: Data augmentation tutorial | Did you say AI apocalypse? (August 30, 2023)
- Google's answer to ChatGPT. And: Generating synthetic data within relational databases. Let's meet at WAICF! (February 8, 2023)
- Understanding ChatGPT better. And: How to deal with imbalanced data. More about our product (December 14, 2022)
- A curated list of failed ML projects. And: How to build a data strategy. Clearbox AI and Bearing Point partnership. (November 16, 2022)
- Our open source library is now on GitHub. And: Clearbox AI on Cybernews. (June 22, 2022)
- Discovering Dagster. And: Quantifying privacy risks. Use case: a synthetic data sandbox to freely share data. (June 8, 2022)
- Can interaction data be fully anonymized? And: Synthetic Data for privacy preservation: understanding privacy risks. Discover our Enterprise solution. (April 6, 2022)
- What are GFlow nets? And: Improve models with Synthetic Data. Use case: augment financial time series. (March 16, 2022)
- The European Commission selected us for Women TechEU pilot project! And: What is Synthetic Data. The new Synthetic Data platform. (March 09, 2022)
- The EDPS on Synthetic Data. And: From raw to good quality data. Changelogs: now you can upload unlabeled datasets. (February 23, 2022)
- 2022 Gartner’s Technology Trends. And: How to harness the power of AI in companies. Changelogs: new metrics available for your synthetic dataset. (February 09, 2022)
FROM THE AI WORLD
This week’s discussion topic is data privacy applied to interaction data. Interaction data is usually collected by phone carriers, messaging apps or social media companies and, when pseudonymised, is generally regarded as safe with respect to privacy risks. However, in the linked article, researchers from the Computational Privacy Group at Imperial College London argue otherwise, showing that supposedly anonymised data is susceptible to profiling attacks.
In particular, they demonstrate that deep learning algorithms can be trained to perform successful linkability attacks. Linkability is defined as “the ability to link, at least, two records concerning the same data subject.” Vulnerability to this specific attack means that interaction data should be treated as personal data even when direct and indirect identifiers are removed.
From a technical point of view, the fascinating part is how the researchers devised the attack itself. They first represent each individual from interaction datasets using interaction graphs describing interactions up to a specified depth. They then train a geometric deep learning model based on the interaction graphs to link individuals in the dataset. They demonstrate the accuracy of the attack on a few datasets, including a Bluetooth proximity dataset similar to that of COVID-19 contact tracing apps.
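To make the two steps above more concrete, here is a minimal Python sketch. It is not the researchers' method: instead of a geometric deep learning model, it swaps in a hand-crafted structural "fingerprint" and a nearest-neighbour match. The function names, the fingerprint features and the toy interaction records are all assumptions made up for illustration.

```python
from collections import deque

def build_adjacency(interactions):
    """Turn (user, user) interaction records into an undirected weighted graph."""
    adj = {}
    for a, b in interactions:
        adj.setdefault(a, {}).setdefault(b, 0)
        adj.setdefault(b, {}).setdefault(a, 0)
        adj[a][b] += 1
        adj[b][a] += 1
    return adj

def hop_distances(adj, source, depth):
    """Breadth-first search: map each node within `depth` hops to its distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # do not expand beyond the requested depth
        for neighbour in adj.get(node, {}):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def fingerprint(adj, source, depth=2):
    """Hypothetical pseudonym-independent summary of a person's interaction graph:
    (total interactions of the person, nodes at hop 1, ..., nodes at hop `depth`)."""
    dist = hop_distances(adj, source, depth)
    total_interactions = sum(adj.get(source, {}).values())
    nodes_per_hop = [
        sum(1 for d in dist.values() if d == hop) for hop in range(1, depth + 1)
    ]
    return (total_interactions, *nodes_per_hop)

def link(target_fp, candidates):
    """Return the candidate pseudonym whose fingerprint is closest (L1 distance)."""
    def distance(fp):
        return sum(abs(x - y) for x, y in zip(target_fp, fp))
    return min(candidates, key=lambda name: distance(candidates[name]))

# Toy data (invented): the same people observed in two periods under different
# pseudonyms, with one extra repeated interaction in the second period.
period_1 = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e"), ("e", "f")]
period_2 = [("x", "y"), ("x", "y"), ("x", "z"), ("x", "w"), ("y", "v"), ("v", "u")]

adj_1 = build_adjacency(period_1)
adj_2 = build_adjacency(period_2)

candidates = {name: fingerprint(adj_2, name) for name in adj_2}
links = {
    name: link(fingerprint(adj_1, name), candidates)
    for name in ("a", "b", "e", "f")
}
print(links)
```

In this toy data, the pseudonyms a, b, e and f from the first period are matched to x, y, v and u in the second, even though no identifier is shared between the two periods. The individuals c and d are left out because their interaction graphs are structurally identical, so this simple fingerprint cannot tell them apart; a learned model on richer graphs is what lets the attack described above scale to real datasets.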
The interesting takeaway from reading about these sophisticated attacks is that data cannot be fully anonymised. As attackers' capabilities grow, for example thanks to deep learning, we cannot take a simplistic approach to privacy risks when anonymity cannot be guaranteed. On the bright side, we are seeing progress in privacy engineering, risk quantification, and comprehensive risk assessment, as opposed to a checkbox approach. More on this in the coming weeks!
Anonymous, but not so much
In this article on Nature, researchers from the CPG at Imperial College London demonstrate that interaction data are identifiable even across long periods of time.
CLEARBOX AI
Have you seen our Enterprise solution?
We care about your needs. That's why we offer our technology as a flexible solution that can be installed locally or in the cloud. Are you curious?
BLOGPOST
Understanding privacy risks
The first part of this blogpost series about Synthetic Data for privacy preservation introduces the analysis of privacy risks, to better understand how to protect data.