That's Fresh! Newsletter
Read a selection of our past issues.
- Google's answer to ChatGPT. And: Generating synthetic data within relational databases. Let's meet at WAICF! (February 8, 2023)
- Understanding ChatGPT better. And: How to deal with imbalanced data. More about our product. (December 14, 2022)
- A curated list of failed ML projects. And: How to build a data strategy. Clearbox AI and Bearing Point partnership. (November 16, 2022)
- Our open source library is now on GitHub. And: Clearbox AI on Cybernews. (June 22, 2022)
- Discovering Dagster. And: Quantifying privacy risks. Use case: a synthetic data sandbox to freely share data. (June 8, 2022)
- Can interaction data be fully anonymized? And: Synthetic Data for privacy preservation: understanding privacy risks. Discover our Enterprise solution. (April 6, 2022)
- What are GFlow nets? And: Improve models with Synthetic Data. Use case: augment financial time series. (March 16, 2022)
- The European Commission selected us for Women TechEU pilot project! And: What is Synthetic Data. The new Synthetic Data platform. (March 09, 2022)
- The EDPS on Synthetic Data. And: From raw to good quality data. Changelogs: now you can upload unlabeled datasets. (February 23, 2022)
- 2022 Gartner’s Technology Trends. And: How to harness the power of AI in companies. Changelogs: new metrics available for your synthetic dataset. (February 09, 2022)
I recently discovered Dagster at a conference and was genuinely impressed by its potential. Dagster is an open-source data orchestration platform that lets data engineers develop, test, and deploy data pipelines at scale. It can be seen as an alternative to Airflow, and it aims to let developers write data pipelines in Python.
Data pipelines are defined as Directed Acyclic Graphs (DAGs), where each graph is composed of several ops. Ops are units of computation, written as Python functions that can be tested locally. One of the most important ideas behind Dagster is that developers can start with small pipelines and scale them up gradually. It also includes a web interface, Dagit, for visualising graphs and ops.
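To make the "graph of ops" idea concrete, here is a minimal sketch in plain Python. It deliberately does not use Dagster's own API: the op functions, the `GRAPH` dictionary, and the `run_graph` helper are all hypothetical names invented for illustration, and the scheduling relies only on the standard library's `graphlib` module (Python 3.9+). It only shows how small, individually testable functions can be wired into a DAG and run in dependency order.

```python
from graphlib import TopologicalSorter


# Each "op" is an ordinary Python function, so it can be tested on its own.
def load_numbers():
    return [1, 2, 3, 4]


def double(numbers):
    return [n * 2 for n in numbers]


def total(numbers):
    return sum(numbers)


# Hypothetical graph description: op name -> (function, names of upstream ops).
GRAPH = {
    "load_numbers": (load_numbers, []),
    "double": (double, ["load_numbers"]),
    "total": (total, ["double"]),
}


def run_graph(graph):
    """Run every op once, in dependency order, and return all outputs by name."""
    sorter = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    outputs = {}
    # static_order() yields each op only after all of its upstream ops.
    for name in sorter.static_order():
        func, deps = graph[name]
        outputs[name] = func(*(outputs[dep] for dep in deps))
    return outputs


results = run_graph(GRAPH)
print(results["total"])  # prints 20
```

Because each op is just a function, `double([1, 2])` can be checked in isolation before the whole graph is ever run, which is the "start small, scale up" workflow described above. A real orchestrator adds scheduling, retries, and observability on top of this basic shape.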
What interested me, and will probably push me to play with Dagster, is that the learning curve doesn’t seem steep, especially for small-scale projects!
Dagster is an open-source data orchestration platform for the development, production, and observation of data assets. Do you want to dig deeper?
In this use case, we show how our technology helps you share and move your data inside and outside your organisation while complying with privacy regulations.
In the second part of the 'Synthetic Data for privacy preservation' series, our Andrea provides a tutorial on how to quantify and prevent re-identification risk.