Pipeline Data Engineering Academy

The Data Janitor Letters - January 2021

Data engineering salon. News and interesting reads about the world of data.

We Don't Need Data Scientists, We Need Data Engineers
Mihail Eric, Machine Learning Scientist, Amazon Alexa AI

There are 70% more open roles at companies in data engineering as compared to data science.


What can we learn from SQL's 50 year reign? A story of 2 Turing Awards
Felix Schildorfer, Chief Data Scientist, First Retail Inc.

The relational data model was introduced in 1970 and has dominated for 50 years. What led to its success? Building on first principles and Bushnell's law.


Automating my job by using GPT-3 to generate database-ready SQL to answer business questions
Brian Kane, Data Engineer, SeekWell

Now, I've got a GPT-3 instance that takes a plain English question and translates it to SQL that really works on my database.


How Shopify Is Building Their Production Data Warehouse Using DBT - Episode 171
Michelle Ark + Zeeshan Qureshi, Senior Data Engineer + Tech Lead/Engineering Manager, Shopify

Structure the project to allow for multiple teams to collaborate in a scalable manner, have the additional tooling to address the edge cases, and optimize the continuous integration process to provide fast feedback and reduce costs.


Introduction to Databases for Data Engineers
Oleg Agapov, Data analyst/BI, GOG.com

Data engineers need to have a broad knowledge about ways of storing and processing data. Most of this knowledge will come with practice. But it is still important to understand general ideas behind all concepts I've explained here.


Kafka As A Database? Yes Or No – A Summary Of Both Sides
David Xiang, Engineering Team Manager, Squarespace

I personally have never used a Kafka log as the source-of-truth for my data. Software development is hard enough as it is, even when trying to go “by the book.”


An unlikely database migration
Brad Fitzpatrick + David Crawshaw, Late Stage Co-Founder + CTO, Tailscale

The goal is to keep development speed as close to the early days of JSONMutexDB, when you could recompile and run locally in a fraction of a second and deploy ten times a day.


Achieving 11M IOPS & 66 GB/s IO on a Single ThreadRipper Workstation
Tanel Põder, Co-founder, Gluent Inc.

Modern disks are so fast that the system performance bottleneck shifts to RAM access and CPU.