The Data Janitor Letters - July 2020

Data engineering salon. News and interesting reads about the world of data.

Some SQL Tricks of an Application DBA
Haki Benita, Development Team Lead, PCENTRA

Databases are the backbone of most modern systems, so taking some time to understand how they work is a good investment for any developer!


Sessions for analysis, the eternal fiction
Randy Au, Quantitative UX Researcher, Google

There are a vast multitude of hypotheses and potential narratives we could attach to any session.


Our journey to a new data warehouse
Ana Gulevskaia, BI Analyst, Omio

How do the benefits of 3NF and SF fit in here?


1.1 Billion Taxi Rides using OmniSciDB and a MacBook Pro
Mark Litwintschik, Big Data Consultant

The Q1 time is the fastest for any workstation benchmark I've done. To get this level of performance on a regular piece of office equipment is a big game changer.


Big Data Small GPU, No Problem
Rodrigo Aramburu, CEO, BlazingSQL

BlazingSQL is no longer limited by available GPU memory for query execution.


Evolution of the Modern Data Warehouse
Paige Roberts, Open Source Relations Manager, Vertica

A data warehouse is essentially a business-driven, enterprise-centric and technology-based solution.


How much can you trust your data?
Ellen König, Senior Data Engineer, ThoughtWorks

Data quality assessments are an effective, but often overlooked way to make your company’s data products more trustworthy.


Let's build a Full-Text Search engine
Artem Krylysov, Senior Software Engineer, Datadog

Despite its simplicity, it can be a solid foundation for more advanced projects.


How to make simple Geolocation service
Max Kostinevich

On Cloudflare Workers I've got almost x10 better performance in comparison to AWS.


Apache Arrow 1.0.0 Release

The 1.0.0 release indicates that the Arrow columnar format is declared stable, with forward and backward compatibility guarantees.