
The Data Janitor Letters - January 2020

Data engineering salon. News and interesting reads about the world of data.

Code reading is the single largest expense in software development, and very few even talk about it.

@girba


Stop Hiring Data Scientists.
Luke Posey, CEO, Spawner

Your ROI is suffering from an inability to hire properly.



GPT-2 and the Nature of Intelligence
Gary Marcus, CEO, Robust.AI

In essence, GPT-2 has been a monumental experiment in Locke's hypothesis, and so far it has failed.


Our Neophobic, Conservative AI Overlords Want Everything to Stay the Same
Cory Doctorow

Ultimately, machine learning is about finding things that are similar to things the machine learning system can already model.


Are you using #postgres via #docker for mac? Have you ever noticed EXPLAIN ANALYZE slowing down your queries by like 60x? The important takeaway is that our modern stacks are incredibly complex and fragile.

@felixge


ClickHouse Cost-Efficiency in Action: Analyzing 500 Billion Rows on an Intel NUC
Alexander Zaitsev, Co-founder, Altinity

A single ClickHouse server can be used to collect and monitor temperature data from 1,000,000 homes, find temperature anomalies, provide data for real-time visualisation and much more. Since it is a single server, setting it up, loading 500B rows and running sample queries is very easy.


Security in Machine Learning Engineering: A white-box attack and simple countermeasures
Flávio Clésio, Senior Machine Learning Engineer, MyHammer AG

After running a simple script based in using Scikit-Learn, I noticed there’s some latent vulnerabilities not only in terms of objects but also in regarding to have a proper security mindset when we’re developing ML models.


Fast IPv4 to Host Lookups
Mark Litwintschik, #BigData Consultant

My interest here is in seeing the performance differences between using PostgreSQL with a B-Tree index versus ClickHouse and its MergeTree engine for this use case. The performance gap in the hourly lookup rate favouring ClickHouse is off by an order of magnitude.
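
For readers new to the lookup problem, here is a minimal Python sketch of the general idea: convert each IPv4 address to an integer and binary-search a sorted key list, which is the same ordered-key principle a B-Tree index or a sorted MergeTree primary key relies on. This is not the article's benchmark code; the sample records and the names `build_index` and `lookup` are hypothetical and only for illustration.

```python
import bisect
import ipaddress

# Hypothetical sample records (IPv4 address, hostname); not data from the article.
RECORDS = [
    ("198.51.100.7", "c.example"),
    ("192.0.2.10", "a.example"),
    ("192.0.2.200", "b.example"),
]


def build_index(records):
    """Return parallel lists of sorted integer keys and their hostnames."""
    rows = sorted((int(ipaddress.IPv4Address(ip)), host) for ip, host in records)
    keys = [key for key, _ in rows]
    hosts = [host for _, host in rows]
    return keys, hosts


def lookup(index, ip):
    """Binary-search the sorted keys; return the hostname or None if absent."""
    keys, hosts = index
    key = int(ipaddress.IPv4Address(ip))
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return hosts[pos]
    return None


INDEX = build_index(RECORDS)
```

Calling `lookup(INDEX, "192.0.2.10")` walks the sorted keys in logarithmic time instead of scanning every record. A real database adds disk layout, compression and caching on top, and those are what a benchmark like the one in the article actually measures.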