Resources
That's Fresh! Newsletter
Read a selection of our past issues.
- 🙌 NumPy 2.0 is almost out! And: Our new data preprocessor with Polars | Interview with S2E at Italy Insurance Forum (June 5, 2024)
- 😮 What a month for new LLMs! And: Datacamp webinar with Shalini (May 22, 2024)
- ✨ GenAI true value lies beyond operational enhancements. And: The Future of Data Protection | New updates about AI Act (April 24, 2024)
- 👁 What are 1-bit Large Language Models? And: LinkedIn Live about AI Act | Mastercard's Country Manager interviewed our CEO (March 6, 2024)
- LLaMAntino - Effective Text Generation in Italian. And: Creating train and test datasets | Use case: Detecting money muling with the help of synthetic data (February 21, 2024)
- 🗞️ The NY Times sues OpenAI and Microsoft. And: Can AI work with little data? | La Stampa: AI means development (January 10, 2024)
- Synthetic Data 101 🚨 And: Why synthetic data? | New project with Poste Italiane (November 8, 2023)
- How easy is it for LLM to infer sensitive information? And: Why is data sharing important? | Our new partnership with S2E (October 25, 2023)
- Have you heard of Pythia? And: Data augmentation tutorial | Did you say AI apocalypse? (August 30, 2023)
- Google's answer to ChatGPT. And: Generating synthetic data within relational databases. Let's meet at WAICF! (February 8, 2023)
- Understanding ChatGPT better. And: How to deal with imbalanced data. More about our product (December 14, 2022)
- A curated list of failed ML projects. And: How to build a data strategy. Clearbox AI and Bearing Point partnership. (November 16, 2022)
- Our open source library is now on GitHub. And: Clearbox AI on Cybernews. (June 22, 2022)
- Discovering Dagster. And: Quantifying privacy risks. Use case: a synthetic data sandbox to freely share data. (June 8, 2022)
- Can interaction data be fully anonymized? And: Synthetic Data for privacy preservation: understanding privacy risks. Discover our Enterprise solution. (April 6, 2022)
- What are GFlow nets? And: Improve models with Synthetic Data. Use case: augment financial time series. (March 16, 2022)
- The European Commission selected us for Women TechEU pilot project! And: What is Synthetic Data. The new Synthetic Data platform. (March 09, 2022)
- The EDPS on Synthetic Data. And: From raw to good quality data. Changelogs: now you can upload unlabeled datasets. (February 23, 2022)
- 2022 Gartner’s Technology Trends. And: How to harness the power of AI in companies. Changelogs: new metrics available for your synthetic dataset. (February 09, 2022)
FROM THE AI WORLD
Over the last couple of months, ChatGPT has become a viral phenomenon. Meanwhile, people have started wondering whether a conversational agent such as ChatGPT could change how we search for information online, eventually replacing traditional search engines.
Feeling the pressure, Google announced that it would gradually start deploying its conversational AI as an add-on to its search engine. This conversational agent, Bard, will be tested in the coming days by a select group of developers, with the idea of releasing it to the public in the coming weeks or months.
Internally, Bard uses Google's conversational model LaMDA. LaMDA is a transformer-based model, fine-tuned using human annotations. The main idea behind its design is to align Large Language Models with human values by defining metrics that cover aspects such as output quality and safety. These metrics are then used to fine-tune the model with a supervised approach. Furthermore, LaMDA can call external APIs, for example information retrieval systems and calculators, to improve answers containing factual content.
Let's see how Bard will fare, especially considering the competition from ChatGPT!
Introducing Google's Bard
ChatGPT recently became a viral phenomenon. Now Google has announced that it will start deploying its own conversational AI as an add-on to its search engine.
CLEARBOX AI
Let's meet at WAICF!
Will you be at WAICF Cannes this weekend? We can't wait to meet you! Find us at Booth S33 or contact us directly and don't miss our pitch on 10th Feb at 4pm.
BLOG POST
Synthetic relational data
How do you generate synthetic data when the original database includes sensitive personal information and contains dozens of relational tables?