Data Lakehouse

Written by Ronald Baan

Ronald is a data enthusiast who spends his time sharing his passion for data with others.

25 July 2022

Whether you’re pretty content with your data lake or not at all, it’s time to upgrade the implementation around it.

Logically, a data lake can still consist of raw, enriched and curated layers (or bronze, silver and gold, just pick a label), and you can still apply all the data mesh principles. The technology, however, has already taken the next step: you no longer have to physically create these layers, and you no longer have to set everything up in advance with DWH, SQL and Spark instances.

The Data Lakehouse has a bottom side (for ingestion of data) and a top side (for addressing all kinds of awesome functionality). You keep logical control over what happens in between, but physically the Data Lakehouse takes care of it.
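To make "logical control, physical hands-off" a bit more concrete, here is a toy sketch in Python. It is not how Databricks or any other vendor implements a lakehouse. The class `TinyLakehouse`, its methods and the sample records are all made up for illustration. The only point is that data lands once at the bottom, while the enriched and curated layers exist purely as definitions that are resolved when you query from the top.

```python
class TinyLakehouse:
    """Toy model (hypothetical): one physical store, layers defined as logic only."""

    def __init__(self):
        self._store = []    # the single physical store: the "bottom side"
        self._layers = {}   # layer name -> (parent layer name, transform function)

    def ingest(self, records):
        """Bottom side: land records as-is; this is the implicit 'raw' layer."""
        self._store.extend(records)

    def define_layer(self, name, parent, transform):
        """Declare a layer logically; nothing is copied or materialised here."""
        self._layers[name] = (parent, transform)

    def query(self, name):
        """Top side: resolve a layer on demand by walking back to 'raw'."""
        if name == "raw":
            return list(self._store)
        parent, transform = self._layers[name]
        return list(transform(self.query(parent)))


def drop_missing_amounts(rows):
    """Enriched layer logic: keep only rows that have an amount."""
    return [r for r in rows if r.get("amount") is not None]


def total_per_customer(rows):
    """Curated layer logic: sum the amounts per customer."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


lake = TinyLakehouse()
lake.define_layer("enriched", "raw", drop_missing_amounts)
lake.define_layer("curated", "enriched", total_per_customer)

lake.ingest([
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": None},
    {"customer": "a", "amount": 5},
])

print(lake.query("curated"))
```

Note that the layers were defined before any data arrived, and that ingesting more records later changes what `query("curated")` returns without anyone rebuilding a layer by hand. A real lakehouse adds a lot on top of this (storage formats, transactions, caching, governance), which is exactly the part you no longer have to set up yourself.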

[Just a quick note, to pique the interest of those who may not have gotten to this yet.]

For completeness: Data Lakehouse is a term frequently used by Databricks. However, it is quite possible to build your own data lakehouse from cloud components, or to look at similar products from other vendors.
