Pipeline Data Engineering Academy

The Data Janitor Letters - September 2020

Data engineering salon. News and interesting reads about the world of data.

Data Cleaning IS Analysis, Not Grunt Work
Randy Au, Quantitative UX Researcher, Google

Also, most data cleaning articles suck.


Importing account statements and building a data warehouse
Kristian Köhntopp, Senior Scalability Engineer, Booking.com

I was experimenting with importing the account statements from my German Sparkasse, which at that time were being made available as a CSV.


Our Online Analytical Processing Journey with ClickHouse on Kubernetes
Sudeep Kumar, Member of Technical Staff - 2, eBay

With our new, cross-region aware OLAP pipeline, we reduced our overall infrastructure footprint by over 90 percent.


Dawn of DataOps: Can We Build a 100% Serverless ETL Following CI/CD Principles?
Luis Velasco, Cloud Data Analytics Specialist, Google

Is it time to enjoy the benefits of DevOps in the informational space?


Publish events, not logs
Kislay Verma, Software Engineer, Cure.Fit

I believe that logging as understood commonly is an ad hoc activity, and cannot handle the unknown-unknowns of a production system. We need to switch to an event perspective to leverage it more effectively for system design and reliability.


Be Vigilant about Time Order in Event-Based Data Processing
Mingwei Li, Senior Software Engineer, Expedia Group Technology

How to handle the timing of events.