Nikhil Das Nomula
Orchestrators play an important role in data engineering by automating workflows. They have grown from tools that simply run a sequence of steps into tools that we also expect to provide a nice interface where we can "observe" what is going on with our workflows and data pipelines.
When it comes to orchestrators, Apache Airflow and Kestra have been great options, but their approach is task-based. That means we approach the problem by focusing on the hows: the tasks, or verbs.
Dagster takes a different approach and focuses on the whats, which Dagster calls assets. Dagster provides a great example in its documentation of how this makes a difference when it comes to reusability.
For example, if we want to make cookies the task-centric way, we approach the problem by listing the steps to carry out, one after another.
If we take the asset-centric approach instead, we approach the problem by listing the things we want to produce along the way.
What makes the asset-centric approach different is that we can reuse these assets. For example, if we want to make peanut cookies with the asset-based approach, we can take the existing cookie dough asset and add peanuts to it.
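To make the reuse idea concrete, here is a minimal sketch in plain Python. It does not use Dagster's API: the `asset` decorator and `materialize` function below are toy stand-ins written for this illustration, and the ingredient list is invented. Each asset is a function, and its parameter names say which other assets it is built from, so the peanut cookies can build on the same cookie dough as the plain ones.

```python
import inspect

# Toy registry (not Dagster): maps an asset name to the function that produces it.
ASSETS = {}

def asset(fn):
    """Register fn as an asset; its parameter names are its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn

def materialize(name, cache=None):
    """Build the named asset, building (and caching) its upstream assets first."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        upstream = inspect.signature(fn).parameters
        cache[name] = fn(**{dep: materialize(dep, cache) for dep in upstream})
    return cache[name]

@asset
def cookie_dough():
    # Hypothetical ingredients, for illustration only.
    return ["flour", "sugar", "butter", "eggs"]

@asset
def cookies(cookie_dough):
    return {"kind": "plain", "ingredients": list(cookie_dough)}

@asset
def peanut_cookies(cookie_dough):
    # Reuses the existing cookie_dough asset and only adds peanuts.
    return {"kind": "peanut", "ingredients": list(cookie_dough) + ["peanuts"]}

cache = {}
plain = materialize("cookies", cache)
peanut = materialize("peanut_cookies", cache)
```

Because both cookie assets share one cache, `cookie_dough` is produced once and reused. In Dagster itself, assets are declared with its own `@asset` decorator, which the later articles in this series cover in more detail.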
We will get into more detail in the series of Dagster articles, but this should give you an idea of what Dagster is and how it differs from Apache Airflow.
In this post we will go over three approaches that we see across organizations when it comes to data engineering. The three approaches are...
One might wonder: if relational databases have been mainstream for so long, how come document-based databases, i.e. NoSQL, have become so popular?