Pipeline Data Engineering Academy

The Data Janitor Letters - May 2021

Data engineering salon. News and interesting reads about the world of data.

Analytics Engineering Everywhere
Jason Ganz, Cofounder, ChattyKathi

Or why in five years every organization will have an Analytics Engineering team.


Common data model mistakes made by startups
Metabase

It’s important to note that the anti-patterns we’ll discuss below are specific to startups.


16 fundamental principles for transforming data in a warehouse
Rahul Jain, Head of BI and Data Engineering, Beat

The current discourse on data can get a little tiring because of its over-focus on tooling.


Using PostgreSQL as a Data Warehouse
Cedric Dussud, Cofounder, Narrator.ai

With some tweaking, Postgres can be a great data warehouse. Here's how to configure it.


Why Spark is NOT the right tool for ETL work
Kevin Bair, Director Sales Engineering, Snowflake

My larger point here is that when you have a hammer (Spark), everything looks like a nail.


Using Apache Airflow DockerOperator with Docker Compose
Flávio Clésio, Staff Engineer Data/Machine Learning, Artsy

I personally believe that Airflow + Docker is a good combination for flexible, scalable, and hassle-free environments for ELT/ETL tasks.


The Metagame of Applying Machine Learning
Eugene Yan, Applied Scientist, Amazon

When designing systems, less is more.


PostGIS at 20, The Beginning
Paul Ramsey, Executive Geospatial Engineer, Crunchy Data

All the development was done on the trusty Sun Ultra 10 I had taken out a $10,000 loan to purchase when starting up the company.


Drunk Post: Things I've learned as a Sr Engineer

SQL is king. Airflow is shit, yes.