That's Fresh! Newsletter
Read a selection of our past issues.
- Google's answer to ChatGPT. And: Generating synthetic data within relational databases. Let's meet at WAICF! (February 8, 2023)
- Understanding ChatGPT better. And: How to deal with imbalanced data. More about our product. (December 14, 2022)
- A curated list of failed ML projects. And: How to build a data strategy. Clearbox AI and Bearing Point partnership. (November 16, 2022)
- Our open source library is now on GitHub. And: Clearbox AI on Cybernews. (June 22, 2022)
- Discovering Dagster. And: Quantifying privacy risks. Use case: a synthetic data sandbox to freely share data. (June 8, 2022)
- Can interaction data be fully anonymized? And: Synthetic Data for privacy preservation: understanding privacy risks. Discover our Enterprise solution. (April 6, 2022)
- What are GFlow nets? And: Improve models with Synthetic Data. Use case: augment financial time series. (March 16, 2022)
- The European Commission selected us for Women TechEU pilot project! And: What is Synthetic Data. The new Synthetic Data platform. (March 09, 2022)
- The EDPS on Synthetic Data. And: From raw to good quality data. Changelogs: now you can upload unlabeled datasets. (February 23, 2022)
- 2022 Gartner’s Technology Trends. And: How to harness the power of AI in companies. Changelogs: new metrics available for your synthetic dataset. (February 09, 2022)
If you have spent some time on social media during the last couple of weeks, you might have already read many impressive answers from ChatGPT. ChatGPT is a chatbot built on top of OpenAI's famous Large Language Model, GPT-3.5. Large Language Models are Natural Language Processing models, usually based on Deep Learning, which are trained on massive datasets containing text from different sources (GPT-3, for example, was trained on 45 TB of text data).
OpenAI's researchers created ChatGPT by first fine-tuning GPT-3.5 on a dataset manually created by human labelers, containing many prompts and their corresponding answers. The model was then further improved with a reinforcement learning approach, which updates the model's parameters by continuously assigning rewards based on the quality of its output.
The results are impressive, [as you can see for yourself](https://chat.openai.com/?__s=xxxxxxx)! As OpenAI's CEO stated, despite its believable answers, ChatGPT should not be used to obtain reliable and accurate information. However, there's a lot of discussion about the risk that such plausible answers might trick inexperienced users into blindly trusting their content!
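For the curious, the reward-driven update described above can be illustrated with a toy example. This is only a minimal sketch, not OpenAI's training procedure: it assumes a "model" that merely picks one of a few canned answers, a made-up quality score per answer, and a simple policy-gradient (REINFORCE) update. The names `train`, `rewards`, and the learning-rate values are ours, chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE loop over a fixed set of candidate answers.

    rewards[i] is a hypothetical quality score for answer i
    (in a real system it would come from a learned reward model).
    Returns the final probability of choosing each answer.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # the only "parameters" of this toy model
    baseline = 0.0                 # running average reward, reduces variance

    for _ in range(steps):
        probs = softmax(logits)
        # Sample an answer from the current policy.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit j is 1[j == action] - probs[j].
        for j in range(len(logits)):
            grad = (1.0 if j == action else 0.0) - probs[j]
            logits[j] += lr * advantage * grad

    return softmax(logits)


if __name__ == "__main__":
    # Answer 1 has the highest (made-up) quality score.
    print(train([0.1, 0.9, 0.3]))
```

Answers that earn a reward above the running average become more likely, and the others less likely, which is the basic intuition behind rewarding good outputs.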
OpenAI's model ChatGPT interacts in a conversational way. With its dialogue format, it answers follow-up questions, rejects inappropriate requests, and more.
To help your company enjoy all the benefits of synthetic data, we developed a product called Enterprise Solution. Discover how it matches your business needs.
Our Luca discusses the concepts related to imbalanced datasets and presents two techniques to augment your dataset when you encounter such an issue.