
Practical Lakehouse Architecture By Gaurav Thalpati

In this post, I review Gaurav Ashok Thalpati’s 2024 book ‘Practical Lakehouse Architecture’ published by O’Reilly Media.

Introduction

I first found O’Reilly books a few years back in a Data Engineering-themed Humble Bundle. Since then, I’ve built an extensive library of both e-books and physical books, with many more on my Amazon wish list. At the start of 2025, I decided to actually start reading them…

So far, I’ve finished three. Now, I don’t feel compelled to review them all. But having finished Practical Lakehouse Architecture I decided to start the Shark Shelf. This will be an occasional series of review posts about books that I really like, or that deserve some fanfare. And yes – How To Solve It belongs on the Shark Shelf.

Now let’s talk about Practical Lakehouse Architecture.

The Author

Gaurav Ashok Thalpati hails from Pune, India, where he’s worked as an independent cloud data consultant for decades. He’s a blogger and YouTuber, holds multiple data certifications and is an AWS Community Builder.

In July 2024, O’Reilly published his first book, Practical Lakehouse Architecture.

The Book

From the Practical Lakehouse Architecture blurb:

This guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact your data platform, from managing structured and unstructured data and supporting BI and AI/ML use cases to enabling more rigorous data governance and security measures.

Practical Lakehouse Architecture was released in July 2024. It is available in both physical and eBook forms from O’Reilly, Amazon US, Amazon UK and eBooks.

Motivations

Reading a book?! In 2025?! I know, right? This section examines my motivations for buying and reading Practical Lakehouse Architecture.

Project Wolfie

I recently wrote about the beginning of Project Wolfie. I kinda expected to have started coding by now. Instead, most of my work is currently on paper and whiteboards. But there’s a good reason for this.

Project Wolfie is greenfield. I don’t have any existing code or resources, and I can use modern tools freely. However, with this freedom comes responsibility. Every choice I make now affects the architecture and involves tradeoffs. As much as I want to start working on the deliverables, I also want to make sensible decisions that can withstand scrutiny.

My hope with Practical Lakehouse Architecture was that it would help me with critical areas like observability, CI/CD, and security. Because it’s not that there isn’t advice online…

Advice Spread Thin

Lakehouse architectures are relatively recent in the data landscape. As a result, they are not as well understood as data warehouses and data lakes, and some aspects of Lakehouse architecture are still evolving.

Many Lakehouse resources are either brief overviews, opinionated deep dives into specific use cases or marketing posts presented as best practices. This makes it hard to find balanced advice. My hope with Practical Lakehouse Architecture was that it would offer clear, unbiased views.

Professional Curiosity

As of 2025, I’ve spent nearly a decade in technical data roles. And in that time I’ve seen massive changes in data management, ranging from a server cupboard in Stockport to huge, multi‑region distributed data platforms.

Over the years, I’ve cultivated a passion for data technology, evolving from writing blog posts and speaking at meetups to working as an AWS consultant. As an AWS Community Builder in the Data category, I can access early previews and best practices from AWS experts. Additionally, as an AWS User Group Leader, I help attendees and guest speakers discuss data patterns.

With this in mind, I was curious about what new insights Practical Lakehouse Architecture could offer me.

Book Review

Onto the review! In this section, I’ll summarise the chapters and examine what stood out in each.

Chapters 1 – 3

The first set of chapters introduces the foundations of Lakehouse architecture, comparing it with traditional models and exploring the importance of storage in modern data platforms.

Chapter 1: Introduction to Lakehouse Architecture lays the groundwork for the book, putting all readers on equal footing for the chapters ahead. Gaurav starts by defining and exploring the ideas and concepts of various data architectures. He then examines the characteristics, evolution and benefits of the Lakehouse architecture.

Chapter 1 can be viewed on the O’Reilly site.

Chapter 2: Traditional Architectures and Modern Platforms contrasts the Lakehouse architecture with traditional data lakes and data warehouses, outlining the benefits and limitations of each. Gaurav then shifts his focus to how modern cloud platforms have transformed these traditional architectures.

I like how Gaurav hasn’t dismissed lakes and warehouses here. Both are proven and well-understood options, and they are still the better choice in certain situations over Lakehouses.

Chapter 3: Storage: The Heart Of The Lakehouse examines the various factors surrounding data storage. Gaurav looks at row-based and column-based storage formats. He then explains the features and uses of Parquet, ORC, and Avro. He also compares newer open table formats, like Iceberg, Hudi, and Delta Lake, highlighting their similarities, differences, and use cases.

This is one area where the book really shines. Having topics like this explained clearly in one place, without having to go online, is incredibly useful!
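As a rough illustration of the row-based versus column-based distinction, here is a minimal Python sketch. It is a toy in-memory example, not taken from the book, and the sample records and the `to_columns` helper are assumptions made up for illustration. Real formats like Parquet and ORC add encoding, compression and metadata on top of this basic idea.

```python
# Hypothetical sample records; none of these names or values come from the book.
rows = [
    {"order_id": 1, "country": "UK", "amount": 20.0},
    {"order_id": 2, "country": "IN", "amount": 35.5},
    {"order_id": 3, "country": "UK", "amount": 12.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: reading one whole record is a single lookup.
second_order = rows[1]

# Column layout: an aggregate touches only the one column it needs.
total_amount = sum(columns["amount"])
```

The row layout suits workloads that read or write whole records at a time, while the column layout suits analytical queries that scan a few fields across many records.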

Chapters 4 – 6

Next, these chapters focus on the operational and organisational elements of Lakehouse architectures. Topics include metadata management, compute engines, and governance. These elements are essential for effectively scaling and securing a modern data platform.

Chapter 4: Data Catalogs explores the purpose of data catalogs and the different types of metadata they can contain. It explains how catalogs support essential processes such as classification, governance, and lineage. Gaurav also compares data catalog implementations across AWS, Azure, and GCP.

Including multi-cloud examples both broadens the chapter’s scope and reinforces the cloud-agnostic nature of Lakehouse architecture – an important theme of the book.

Chapter 5: Compute Engines for Lakehouse Architectures examines compute options for batch and real-time data processing. Gaurav covers open-source tools such as Spark, Flink, and Presto, as well as cloud-native services like AWS Glue, Google BigQuery, and Databricks. He offers practical advice for selecting a compute engine, considering factors such as provisioning complexity, open-source support and AI/ML capabilities.

Chapter 6: Data and AI Governance and Security in Lakehouse Architecture explores governance and security, crucial areas for any production-ready data platform. Gaurav discusses core topics such as data quality, ownership, sensitivity and compliance. He also explores how governance responsibilities span both business and technical domains, emphasising the importance of organisational roles in maintaining control and oversight.

Chapters 7 – 9

Finally, these chapters focus on the practical realities of Lakehouse implementation – moving between theory and practice, and looking ahead to the architecture’s potential future.

Chapter 7: The Big Picture: Designing and Implementing a Lakehouse Platform examines considerations ranging from requirements gathering to defining business goals. Recommended Lakehouse zones are analysed and explained, and the expectations for each zone are defined. Finally, CI/CD is considered, and a sample design questionnaire is provided to help guide implementation planning.

Zones, or layers, are currently one of the most contentious areas of Lakehouse architectures. I like Gaurav’s stance on this – it’s somewhat similar to Simon Whiteley’s. Yup – this video again.

Chapter 8: Lakehouse in the Real World does something I don’t see often – contrasting ideal scenarios with real-world events. It covers key stages in a Lakehouse’s development like analysis, testing and maintenance, examining what could go wrong and offering mitigation strategies.

This section is definitely accurate, as I’ve encountered some of these factors! It includes comparing greenfield and brownfield implementations, examining how business constraints affect technology choices, and considering if the desired RPO and RTO targets are financially and logistically possible.

Finally, Chapter 9: Lakehouse Of The Future looks ahead, exploring how Lakehouses might evolve in the years to come. Gaurav discusses potential intersections with trends like Data Mesh, Zero ETL and AI model integration. He also introduces emerging technologies like Delta UniForm and Apache XTable, which aim to improve interoperability across data processing systems and query engines. Finally, he touches on future innovations such as Apache Puffin and Ververica Streamhouse that could further transform the data landscape.

(Sidenote: this Dremio post explores UniForm and XTable very well.)

Thoughts

Having finished the book (in two weeks no less!), here are my thoughts:

Firstly, it’s not an intimidating read. At 283 pages, Practical Lakehouse Architecture is authoritative and content-rich without being overly complex or wordy. It also uses familiar O’Reilly conventions and style. When placed next to similar books I own, like The Data Warehouse Toolkit (600 pages) and Designing Data-Intensive Applications (614 pages), it’s easier to pick up and get into. And with some books, that’s a battle in itself!

Also, Practical Lakehouse Architecture’s flow is very natural and the chapters make their points very well. I find some technical books, including some O’Reilly ones, hard to follow because they feel disjointed and jargon-heavy. That wasn’t the case here. The book held my attention very well throughout, and will serve me well as a future reference point.

Practical Lakehouse Architecture also feels like it will be relevant for a while. Some of my technical books have sections that are now outdated due to rapid technological changes. Here, ideas such as decoupled storage and compute, unified governance, and data personas will continue to matter for years to come.

Overall, an excellent book that I enjoyed reading.

Summary

In this post, I reviewed Gaurav Ashok Thalpati’s 2024 book ‘Practical Lakehouse Architecture’ published by O’Reilly Media.

Ultimately, Practical Lakehouse Architecture is a well-written and informative book that caters to a wide range of skills. It’s a strong addition to the O’Reilly catalogue and complements titles like Rukmani Gopalan’s 2022 book, The Cloud Data Lake, which I’m currently reading. It’s a great knowledge source for this constantly evolving modern data architecture.

If this post has been useful then the button below has links for contact, socials, projects and sessions:

SharkLinkButton 1

Thanks for reading ~~^~~

Fixing A Broken Tap With George Pólya

In this post, I use the principles in “How To Solve It” by George Pólya to diagnose and fix my broken kitchen tap. Yes – really.

Introduction

Bit of a change this time. Let me set the scene.

It’s time to top up Wolfie’s water bowl, so to the kitchen sink we go. Two unexpected events happen when the tap is turned on:

  1. The water flow goes mental and starts spraying everywhere.
  2. Something gets launched out of the tap into the water bowl:
Aerator Initial

My first thought is that the tap is broken and that I’ll need to buy a new one. And then get a plumber to fit it. Great.

But wait. Last year I fixed some broken panes in our greenhouse. This year I’ve built a potting bench, fixed a leaky water butt and mounted a shower rail. Is this a problem I can solve?

This Doesn’t Sound Like Technology

True. It is, though, a chance to write a post I’ve fancied doing for a while. And this set of circumstances was too compelling to pass up.

Last year I became aware of a book called “How To Solve It” by George Pólya. The recommendation included a chart based on the book, similar to this one:

Source: KPMathematics

What struck me was how close these steps were to the Systems Development Life Cycle I was taught at college. My interest was piqued.

Around the same time, I was getting to grips with my new Data Engineer role. Since then, I’ve used the “How To Solve It” principles to help me complete work both for my role and for this blog.

Now, faced with a new unfamiliar situation, I can demonstrate how the “How To Solve It” principles can be applied beyond mathematics. In this case I’m fixing a broken tap, but this could just as easily be a Python bug, a poorly performing SQL query or an AWS authentication issue.
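For the Python bug case, a minimal sketch of how the four principles might map onto a debugging session could look like the following. Everything here is hypothetical and made up for illustration: the `mean_buggy` and `mean_fixed` functions, the choice of bug, and the decision to return `None` for empty input are all assumptions, not anything taken from the book.

```python
# Hypothetical example: none of this code comes from "How To Solve It".

def mean_buggy(values):
    # Original behaviour: raises ZeroDivisionError on an empty list.
    return sum(values) / len(values)


# 1. Understand the problem.
#    Unknown: a sensible result for empty input.
#    Data: the failing call below.
#    Condition: non-empty inputs must behave exactly as before.
def reproduces_bug():
    try:
        mean_buggy([])
    except ZeroDivisionError:
        return True
    return False


# 2. Devise a plan.
#    A related, already-solved problem is "guard against empty input",
#    so the plan (an assumption) is to return None when there is nothing to average.

# 3. Carry out the plan, checking each step.
def mean_fixed(values):
    if not values:
        return None
    return sum(values) / len(values)


# 4. Look back: check the result against both the new and the old cases.
def check_solution():
    return mean_fixed([]) is None and mean_fixed([2, 4, 6]) == 4.0
```

Calling `reproduces_bug()` confirms the problem exists before any fix is attempted, and `check_solution()` plays the role of the final "Can you check the result?" question.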

Here is my plan:

  • Firstly, I’ll examine the “How To Solve It” book.
  • Secondly, I’ll look at the author of the book – George Pólya.
  • Then I’ll look at each of the George Pólya principles, relating them to the broken tap problem I want to solve.

Let’s start with the book.

How To Solve It

Source: Penguin

‘A superb book on how to think fresh thoughts … A walk inside Pólya’s mind as he builds up maxims on how to comprehend a problem, how to build up a strategy, and then how to test it.’

David Bodanis, Guardian

‘Everyone should know the work of George Polya on how to solve problems’

Marvin Minsky

How To Solve It can be bought on Penguin’s website.

History

How To Solve It was written in 1945 by George Pólya. Since then, the book has stayed in print and has been translated into over a dozen languages. It has sold more than 1 million copies, making it one of the most widely circulated mathematics books in history.

Four Principles

How To Solve It explains in non-technical terms how to think about invention, discovery, creativity and analysis. Central to this are four principles:

  1. First. You have to understand the problem.
  2. Second. Find the connection between the data and the unknown. You may be obliged to consider auxiliary problems if an immediate connection cannot be found. You should obtain eventually a plan of the solution.
  3. Third. Carry out your plan.
  4. Fourth. Examine the solution obtained.

The book also poses several questions for each principle. They aim to stimulate thought and produce the answers needed to satisfy each principle.

These can be seen in the below image from the book’s first edition:

2022 10 25 HowToSolveItInsideCover

They are also available as text from the University of Utah’s summary of 1957’s second edition.

Although How To Solve It was written with mathematics in mind, the book’s principles have been applied to additional disciplines over the decades. Pólya seems to take great care not to limit the scope of How To Solve It, speaking of problems in general terms throughout the book.

One such example is this extract:

A great discovery solves a great problem but there is a grain of discovery in the solution of any problem. Your problem may be modest; but if it challenges your curiosity and brings into play your inventive faculties, and if you solve it by your own means, you may experience the tension and enjoy the triumph of discovery.

“How To Solve It” – George Pólya

How To Solve It remains in high regard to this day. The Math Sorcerer produced this video in July 2022, and his affection for the book is clear.

Next, let’s look at the book’s author – George Pólya.

George Pólya

George Pólya
Source: MacTutor

George Pólya (December 13 1887 – September 7 1985, aged 97) was a Hungarian mathematician. He was a professor of mathematics at ETH Zürich in Switzerland from 1914 to 1940, and at Stanford University in the United States from 1940 to 1953, having moved there during World War 2.

After retiring from Stanford, Pólya remained active in his field. He continued his association with Stanford as Professor Emeritus well into his 90s and taught a course in their Computer Science Department in 1978.

Works

In pure mathematics, Pólya made important discoveries in fields including probability, real and complex analysis, combinatorics, geometry, number theory and mathematical physics.

Several of his discoveries bear his name, including:

Pólya also authored and contributed to numerous books and articles throughout his life, a selection of which can be seen on Wikipedia.

Recognition

Pólya was well-regarded by his peers and awards given to him included:

“He has given a new dimension to problem-solving by emphasizing the organic building up of elementary steps into a complex proof, and conversely, the decomposition of mathematical invention into smaller steps.”

and:

“Problem solving a la Polya serves not only to develop mathematical skill but also teaches constructive reasoning in general.”

Sources

There is much more to know about Pólya. The following links detail his life, works and legacy in far greater detail:

Now I’m going to apply the George Pólya principles to my broken tap!

Applying The Principles

In the following sections, I will apply each George Pólya principle from How To Solve It to my tap problem. In each section I will:

  • Quote each principle in full.
  • State the supporting questions that I’ll answer.
  • Relate these to my tap problem.

Principle 1: Understanding The Problem

First. You have to understand the problem.

“How To Solve It” – George Pólya
  • What is the unknown? What are the data? What is the condition?

The Unknown

The unknown is what I want. Here, I want to restore the tap’s original flow rate.

The Data

The data is the information available. This is what was expelled from the tap:

Aerator Initial

Other data:

  • Water was still flowing from the tap.
  • The flow of water was under more pressure than before.

The Condition

The condition is the link between the unknown and the data. Here, whatever has come out of the tap has changed the water’s flow but hasn’t obstructed it.

Principle 2: Devising A Plan

Second. Find the connection between the data and the unknown. You may be obliged to consider auxiliary problems if an immediate connection cannot be found. You should obtain eventually a plan of the solution.

“How To Solve It” – George Pólya
  • Do you know a related problem? Do you know a theorem that could be useful?
  • Here is a problem related to yours and solved before. Could you use it? Could you use its result? Could you use its method?

I searched Google for the phrase “kitchen tap water flow changed”. There was an immediate common thread in the results:

2022 10 26 GoogleResults

The Estes Services link gave a useful definition of an aerator:

“The aerator on your faucet is a mesh screen and covers the water outlet. The aerator catches minerals and other debris in your pipes. It also helps save water by introducing air into the water stream.”

“How to Fix Low Water Pressure in Kitchen” on Estes

Getting somewhere! This led me to “Everything you Need to Know About Tap Aerators” on TapWarehouse, which includes this:

“They save you water by adding oxygen to the flow (and that means saving pennies) and reduce splashing around the bowl of the basin.”

“Everything you Need to Know About Tap Aerators” on TapWarehouse

Mesh screen? Reduced splashing? This definitely sounded like the right area!

Solved Problems

At this point, what came out of the tap sounded very much like an aerator. However, there’s no cleaning something that’s disintegrated, so it was time for a replacement.

TapWarehouse to the rescue again:

“If your existing tap already has an aerator, simply turn it anticlockwise until it’s unscrewed from the tap. Then, simply screw in the new aerator until it’s secure, being careful not to screw it too tightly.”

“How can I Install a Tap Aerator?” on TapWarehouse

TapWarehouse also gave advice on aerator types. There are male and female aerators depending on the tap. There are also various aerator sizes ranging from 16mm to 28mm.

Planned Solution

Based on this research, the solution needed the following steps:

  • Remove the broken aerator.
  • Confirm the aerator type.
  • Confirm the aerator size.
  • Buy a replacement aerator.
  • Fit the replacement aerator.
  • Test the replacement aerator.

Principle 3: Carrying Out The Plan

Third. Carry out your plan.

“How To Solve It” – George Pólya
  • Carrying out your plan of the solution, check each step. Can you see clearly that the step is correct?

Time to remove the broken aerator! Straight into a problem. It wouldn’t budge.

Fortunately, there’s a DIY StackExchange! Advice ranged from WD-40 to vinegar to a hammer and chisel (!), but in the end I used my heat gun on the aerator and removed it with pliers.

I then determined that I needed a 24mm male aerator as a replacement. One trip to B&Q later and:

Aerator Replacement

Fitting the new aerator was a simple matter of screwing it on.

Principle 4: Looking Back

Fourth. Examine the solution obtained.

“How To Solve It” – George Pólya
  • Can you check the result?

BEHOLD:

Tap with running water

Summary

In this post, I used the principles in “How To Solve It” by George Pólya to diagnose and fix my broken kitchen tap. I applied each of the Pólya principles to my problem, and was able to solve it by answering the relevant questions and doing some investigation with the knowledge gained.

If this post has been useful, please feel free to follow me on the following platforms for future updates:

Thanks for reading ~~^~~