ETL is an integration pattern in which data is extracted from various on-premise or cloud sources, transformed and enriched on a staging server, and then loaded into a target destination such as a data warehouse or data lake. While it is commonly used for batch ingestion of mostly structured data in legacy data architectures, it tends to create a separate data silo for each production application, whether on premise or in the cloud. Given the diversity of data types in the current landscape, from website interactions to equipment sensors and social media streams, ELT is the preferred option. With an ELT pipeline, newer data formats can be accommodated easily, and compute-intensive transformation processes are deferred until specific use cases are defined.
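To make the load-first, transform-later flow concrete, here is a minimal ELT-style sketch. SQLite is used purely as a stand-in for a warehouse engine, and the table and field names (raw_events, user_event_counts) are illustrative assumptions rather than part of any specific stack.

```python
import json
import sqlite3

# Minimal ELT sketch: land raw, semi-structured events first, transform later
# inside the "warehouse". SQLite stands in for a real warehouse here; it needs
# a build with the JSON1 functions (json_extract) available.

raw_events = [
    {"user_id": 1, "event": "page_view", "ts": "2024-01-01T10:00:00"},
    {"user_id": 1, "event": "click", "ts": "2024-01-01T10:00:05"},
    {"user_id": 2, "event": "page_view", "ts": "2024-01-01T10:01:00"},
]

conn = sqlite3.connect(":memory:")

# E + L: load the raw payloads untouched into a staging table. New fields in
# the JSON do not break the load because no schema is imposed at this stage.
conn.execute("CREATE TABLE raw_events (payload TEXT)")
conn.executemany(
    "INSERT INTO raw_events (payload) VALUES (?)",
    [(json.dumps(e),) for e in raw_events],
)

# T: transform only once a concrete use case exists, here a per-user event
# count, using the warehouse's own SQL engine rather than a staging server.
conn.execute("""
    CREATE TABLE user_event_counts AS
    SELECT json_extract(payload, '$.user_id') AS user_id,
           COUNT(*)                           AS event_count
    FROM raw_events
    GROUP BY user_id
""")

for row in conn.execute("SELECT user_id, event_count FROM user_event_counts"):
    print(row)  # e.g. (1, 2) and (2, 1)
```

The same pattern applies with a managed warehouse: raw data is loaded as-is, and transformations run in SQL against the loaded tables when a use case calls for them.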
For full tutorials and examples on MLflow, refer to the official MLflow documentation.
Designing and implementing an open-source machine learning platform undoubtedly comes with its own set of challenges and unknowns, ranging from the trivial to the complex.