Introduction

A robust data engineering framework cannot be deployed without a sophisticated workflow management tool. For a significant period of my career, I used Pentaho Kettle extensively for large-scale deployments. The objective of this post is to explore a few obvious challenges of designing and deploying data engineering pipelines, with a specific focus on the trigger rules of Apache Airflow 2.0. In the second section, we shall study the 10 different branching strategies that Airflow provides to build complex data pipelines. I thank Marc Lamberti for his guide to Apache Airflow; this post is just an attempt to complete what he started in his blog.

When I say large scale, I mean significantly large, though not of the order of social media platforms: the most voluminous data transfer was around 25-30 million records at a frequency of 30 minutes, with a promise of 100% data integrity, for an F500 company. It worked, but not without problems; we had a rough journey and paid hefty prices in the process, but eventually succeeded. The major problems I encountered are the following.