Pipeline Data Engineering Academy

The Data Janitor Letters - November 2021

Data engineering salon. News and interesting reads about the world of data.

Tracking-industry body IAB Europe told that it has infringed the GDPR
Irish Council for Civil Liberties

Google and the entire tracking industry rely on IAB Europe’s consent system, which has now been found to be illegal.


Why the Data Analyst role has never been harder
Petr Janda, CTO, Pleo

The curse of complexity around the Modern Data Stack.


Spreadsheets: The Duct Tape of the Modern Data Stack
Mikkel Dengsøe, Head of Data Science, Operations & Financial Crime, Monzo Bank

Spreadsheets are the interface that allows anyone to quickly and easily bring data into the data warehouse.


Serverless dbt on Google Cloud Platform
Robert Sahlin, Senior Data Engineer, MatHem.se

A serverless solution for running dbt in a self-hosted, collaborative setup that follows a GitOps style.


Orchestrating ELT with Prefect and dbt — a Flow of Flows (Part 1)
Anna Geller, Solutions Engineer, Prefect

How to manage dependencies between data pipelines.


Modelling Type 1 + 2 Slowly Changing Dimensions with dbt
Weng-Kin Lee, Associate Consultant, Servian

By following this pattern to create dimension models, it is easy to incorporate both Type 1 and Type 2 changes into the same dimension. Using dbt macros, we have modularised the dimension’s functionality for reusability and left room to add more functionality to the dimension.
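
For readers new to the terminology, here is a minimal plain-Python sketch of what combining Type 1 and Type 2 changes in one dimension can look like. It is not the dbt macro pattern from the article: the `DimRow` fields, the `apply_change` helper, and the choice of `email` as a Type 1 attribute and `city` as a Type 2 attribute are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int                 # natural key (hypothetical)
    email: str                       # Type 1 attribute: overwritten in place
    city: str                        # Type 2 attribute: a change opens a new version
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(rows: List[DimRow], customer_id: int, email: str,
                 city: str, as_of: date) -> List[DimRow]:
    """Return a new list of dimension rows with one source record applied."""
    current = next(
        (r for r in rows if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is None:
        # First time this key is seen: insert the initial version.
        return rows + [DimRow(customer_id, email, city, as_of)]

    # Type 1: overwrite the attribute on every version of this key, no history kept.
    rows = [replace(r, email=email) if r.customer_id == customer_id else r
            for r in rows]

    if current.city != city:
        # Type 2: close the current version and append a new one.
        rows = [
            replace(r, valid_to=as_of)
            if r.customer_id == customer_id and r.valid_to is None else r
            for r in rows
        ]
        rows.append(DimRow(customer_id, email, city, as_of))
    return rows


rows = apply_change([], 1, "a@example.com", "Berlin", date(2021, 1, 1))
rows = apply_change(rows, 1, "b@example.com", "Berlin", date(2021, 6, 1))   # Type 1 only
rows = apply_change(rows, 1, "b@example.com", "Munich", date(2021, 11, 1))  # Type 2
```

After the three calls, the dimension holds two versions of customer 1: a closed Berlin row and a current Munich row, both carrying the latest email.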


Mapping our data journey with column lineage
Borja Vázquez Barreiros, Senior Analytics Engineer, Monzo

At the time of writing, we have over 4700 data models in our production dbt project, and over 800 views defined in Looker 🤯.


Lesser Known PostgreSQL Features
Haki Benita, Development Team Lead, PCENTRA

Most of us are not aware of all the features in the tools we use on a daily basis, especially when a tool is as big and extensive as PostgreSQL.


DuckDB-Wasm: Efficient Analytical SQL in the Browser
André Kohn and Dominik Moritz, DuckDB

It is powered by WebAssembly, speaks Arrow fluently, reads Parquet, CSV and JSON files backed by Filesystem APIs or HTTP requests and has been tested with Chrome, Firefox, Safari and Node.js.


From Data Engineer to SysAdmin
Jonathon Belotti, Team Lead Data Platform, Canva

Put down the K8s cluster, your pipelines can run without it.


Will Nix Overtake Docker?
Connor Brewster, Software Engineer, Replit

No. These tools accomplish different goals; however, they can be used in combination to provide the best of both worlds: reproducible builds and containerized deployments.


Storm in the stratosphere: how the cloud will be reshuffled
Erik Bernhardsson

Cloud vendors will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API. Other pure-software providers will build all the stuff on top of it. Databases, running code, you name it.