Pipeline Data Engineering Academy

The Data Janitor Letters - August 2020

Data engineering salon. News and interesting reads about the world of data.

The startup data stack starter pack (2020)
Lewis Hemens, CTO, Dataform

I advise a lot of people on how to build out their data stack, from tiny startups to enterprise companies that are moving to the cloud or from legacy solutions. There are many choices out there, and navigating them all can be tricky.


Why is dbt so important?
Stephen Whitworth, Senior Software Engineer, Monzo Bank

If you choose not to use dbt, you’ll probably waste time building a less-fully featured, buggy implementation of it yourself. Give it a serious look.


Guiding principles for a data engineering team
Rahul Jain, Principal Engineering Manager, Data engineering and BI platform, Omio/GoEuro

We prefer boring but battle-tested technologies over tech-fetishism.


ClickHouse & Redshift Face Off in NYC Taxi Rides Benchmark
Alexander Zaitsev, Co-founder, Altinity

2020 versions of both ClickHouse and Redshift show much better performance. However, open source ClickHouse continues to outperform Redshift on similarly sized hardware, and the difference increases as the query complexity grows.


Unpopular Opinion - Data Scientists Should Be More End-to-End
Eugene Yan, Applied Scientist, Amazon

Going out of the regular DS & ML job scope helped with delivering more value, faster.


Dear Google Cloud: Your Deprecation Policy is Killing You
Steve Yegge, Head Dude, Ghost Track

Backwards compatibility keeps systems alive and relevant for decades.


Get rid of AI Saviorism
Shreya Shankar, Machine Learning Engineer, Viaduct

Machine learning is a tool, not a panacea. If our tools don’t immediately work for them, it’s not their fault.