Pipeline Data Engineering Academy

The Data Janitor Letters - June 2021

Data engineering salon. News and interesting reads about the world of data.

The Analytics Engineering Guide
dbt Labs

Collaborating as a data team to produce excellent datasets -- some parts are bullshit, but it's an interesting read.


Welcome to Snowpark: New Data Programmability for the Data Cloud
Isaac Kunen, Senior Product Manager, Snowflake

Two words: Java functions.


Accidentally exponential behavior in Spark
Ivan Vergiliev, Tech Lead, Leanplum

Don't use Spark for tasks that require complex logic.


The ritual of the deploy
Vicki Boykis, Machine Learning Engineer, Tumblr

Deploying is a ritual. It’s a sacred place, a quiet place, and a dangerous place, where anything can happen. In deployment, the system is in a fragile state, and you are in a fragile state.


Gently Down the Stream
Mitch Seymour, Illustrator, Author, Founder, Round Robin Publishing

A gentle introduction to Apache Kafka.


What's Kafka and what does Confluent do?
Justin Gage, Technically

Help with solving Kafka-esque data problems.


Cloudera to go private as KKR & CD&R grab it for $5.3B
Ron Miller, TechCrunch

Cloudera was once one of the hottest Hadoop startups, but over time the shine has come off that market, and today it went private.


Meltano Spins Out of GitLab, Raises $4.2M in Seed Funding Led by GV to Enhance Open Source Data Integration

"Meltano aims to bring the entire data lifecycle into the DataOps Era." Wut?