Pipeline Data Engineering Academy

The Data Janitor Letters - March 2021

Data engineering salon. News and interesting reads about the world of data.

Star Schema Benchmark: ClickHouse Nails Cost-Efficiency Challenge Against Druid & Rockset
Alexander Zaitsev, Co-Founder & CTO, Altinity

When we used the same schema approaches as Druid and Rockset, ClickHouse significantly outperformed both whilst using the cheaper AWS setup.


Speeding up SQL queries by orders of magnitude using UNION
Ben Levy and Christian Charukiewicz, Partners and Principal Software Engineers, Foxhound Systems

SQL’s UNION operation is not usually thought of as a means to boost performance.


Building the world's fastest website analytics
Jack Ellis, Co-founder, Fathom

Performing a migration is such a high-adrenaline, stressful task.


SQLite is not a toy database
Anton Zhiyanov

Whether you are a developer, data analyst, QA engineer, DevOps person, or product manager - SQLite is a perfect tool for you.


Taming the Dependency Hell with dbt
Rafael Barbosa, Data Engineering Team Lead, WeTransfer

We’re using dbt to simplify the management of the views and build more trust in the data we store in our data warehouse.


Storage size and generation time in popular file formats
Barthelemy Ngom, Solution Architect & Data Engineer, Adaltas

For archiving, it is preferable to choose a column-based format, and ORC in particular.


Why We Don’t Use Docker (We Don’t Need It)
Nicky Rees, MeeZeeCo

We get a single binary.