Pipeline Data Engineering Academy

The Data Janitor Letters - April 2021

Data engineering salon. News and interesting reads about the world of data.

Diving Deep on S3 Consistency
Dr. Werner Vogels, CTO, Amazon.com

We built S3 on the design principles that we called out when we launched the service in 2006, and every time we review a design for a new feature or microservice in S3, we go back to these same principles.


We were promised Strong AI, but instead we got metadata analysis
Cal Paterson, contract software engineer

How simple structured data trumps clever machine learning.


A Comprehensive Framework for Data Quality Management
Chau Vinh Loi, Data Scientist, ANZ Australia

How to monitor and maintain Data Quality to make sure the data meets certain standards for specific business use-cases.


Layering Your Data Warehouse
Mitchell Silverman, Analytics Engineer, Spotify

I never thought I would be comparing my work in data engineering to the great Mike Myers but Ogres and Data Warehouses have a lot in common. Both are misunderstood by most and both can save the day when called upon.


The missing piece of the modern data stack
Benn Stancil, Chief Analytics Officer + Founder, Mode

The core problem is that there’s no central repository for defining a metric.


Benchmarking SQL engines for Data Serving: PrestoDb, Trino, and Redshift
Anton Peniaziev, data and machine learning engineer, Explorium

Data serving is a special business case, which demands real-time low-latency on small queries, the ability to scale and withstand abrupt peak loads, and high concurrency.


Software infrastructure 2.0: a wishlist
Erik Bernhardsson, Ex-CTO, Better

I mean, as a user, I can set up a static website in AWS, but it takes 45 steps in the console and 12 of them are highly confusing if you never did it before.


How Litestream Eliminated My Database Server for $0.03/month
Michael Lynch, Builder of @TinyPilotKVM

Data persistence for people who hate database servers.


It is time to fulfill the promise of CI/CD
Charity Majors, Cofounder/CTO, @honeycombio

Why your software should be auto-deployed within 15 minutes after you merge it, with no manual gates. This is the key to high performing teams and high-quality software.