Pipeline Data Engineering Academy

The Data Janitor Letters - October 2020

Data engineering salon. News and interesting reads about the world of data.

Testing SQL for BigQuery
Barbara Scherlein, Scala Backend and Data Engineer, Soundcloud

When I finally deleted the old Spark code, it was a net delete of almost 1,700 lines of code; the resulting two SQL queries have, respectively, 155 and 81 lines of SQL code; and the new tests have about 1,231 lines of Python code.


Why Percona wants your database to be open source, and not everyone is happy about it
Matt Asay, Head of Open Source Strategy and Marketing, AWS

Percona runs open source databases as managed services, which makes the company popular with customers but less so with competitors.


ETL Batch Processing With Kafka?
Tomasz Kaszuba, Java Big Data Engineer, Swiss Re

For small batch loads, using traditional ETL tools is less complicated and much simpler to implement. But if the ETL pipeline needs to handle large amounts of data and scale, Kafka wins hands down.


Meet whale! 🐳 The stupidly simple data discovery tool.
Robert Yi, Co-founder & Chief Data Officer, Dataframe

A Python library that scrapes metadata and formats it as markdown. A Rust CLI to search over that data.


A Short Story About SQL’s Biggest Rival
Cedric Chin, Content Marketing, Holistics Software

We might have once lived in a world where QUEL and SQL would have continued to duke it out, and where the ‘best’ language might have found its own niches.


Using CTEs to do a binary search of large tables with non-indexed correlated data
David Christensen, Senior Software and Database Engineer, End Point

The initial query went from timing out in the webservice in question to returning results in a fraction of a second with basic binary search.
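
For readers who want a feel for the idea before clicking through, here is a minimal sketch of binary search over an indexed key to locate a value in a non-indexed but correlated column. It is not the article's implementation: the article does this in SQL with CTEs, while this sketch drives the probes from a Python loop against an in-memory SQLite database. The table name `events`, the columns `id` and `created_at`, and the function `first_id_at_or_after` are all hypothetical, and the sketch assumes ids are dense and `created_at` never decreases as `id` grows.

```python
import sqlite3


def first_id_at_or_after(conn, target_ts):
    """Return the smallest id whose created_at is >= target_ts, or None.

    Assumptions (hypothetical schema): `id` is the indexed primary key with
    no gaps, and the non-indexed `created_at` never decreases as id grows.
    """
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM events").fetchone()
    if lo is None:  # empty table
        return None
    result = None
    while lo <= hi:
        mid = (lo + hi) // 2
        # Each probe is a cheap primary-key lookup, not a scan on created_at.
        (ts,) = conn.execute(
            "SELECT created_at FROM events WHERE id = ?", (mid,)
        ).fetchone()
        if ts >= target_ts:
            result = mid      # candidate found; keep looking to the left
            hi = mid - 1
        else:
            lo = mid + 1      # target must be to the right
    return result


# Small demo table: 1,000 rows, created_at grows by 10 per id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, 1000 + 10 * i) for i in range(1, 1001)],
)
```

With a thousand rows the loop needs about ten indexed lookups instead of a full scan, which is the kind of saving the quote describes.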


New BigQuery Integration for GA4 Properties
Charles Farina, Head of Innovation, Adswerve, Inc.

Free BigQuery export from GA to all customers.


How to set up a multi-touch attribution model
Cyprien Marcos, Business Intelligence Manager, Project A Ventures

We show you how you can easily set up a multi-touch attribution model to track website conversions with Google Analytics, Google Tag Manager and a Jupyter notebook.