Modern data engineering: dbt, Iceberg and the lakehouse come together
Updated: 2026-05-03
The analytical data stack has spent the last decade promising a clean separation between storage, engine and transformation, but only in the last twelve months has that promise stopped sounding like a brochure. Apache Iceberg with REST catalogs, dbt as a declarative layer, and an interchangeable query engine underneath have become the real skeleton of new analytics projects.
Key takeaways
- Iceberg has won the open table format adoption war for mundane rather than technical reasons: Databricks, Snowflake, BigQuery and Redshift all support it natively.
- The Iceberg 1.6 REST specification standardized an HTTP API any engine can consume; today Trino, Spark, DuckDB and Python can read and write the same tables through a single service.
- dbt Fusion, the Rust rewrite of dbt-core, introduces static column and type validation at compile time and accepts Iceberg as a native target.
- Interactive query performance over Iceberg in an open engine still trails a well-tuned closed warehouse, sometimes by a factor of two or three.
- The open lakehouse with Iceberg and dbt pays off when three conditions hold simultaneously: sufficient volume, workload heterogeneity, and data engineering maturity.
Why Iceberg won the table-format war
By late 2025 the reality is that Iceberg has won the adoption war for mundane rather than technical reasons. Databricks bought Tabular in 2024 and began converging Delta and Iceberg. Snowflake, BigQuery and Redshift support Iceberg natively. AWS Glue offers Iceberg as the default format in new catalogs. When the three major closed warehouses and the largest open analytics engine all converge on the same format, the choice stops being a technical debate.
Iceberg isn’t the best at any single aspect. What it offers is the cleanest separation between specification and engine: its metadata is plain JSON and Avro, its hidden partitioning model avoids the classic pitfalls of ill-written partition predicates, and its time-travel and schema-evolution support is the most rigorous of the three major open table formats.
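Two of those features are easiest to see in Iceberg's Spark SQL syntax. This is a minimal sketch; the catalog, table and column names are illustrative:

```sql
-- Hidden partitioning: partition by a transform of a column, so queries
-- filtering on event_ts prune files automatically without readers or
-- writers ever touching a derived partition column.
CREATE TABLE lake.db.events (
    event_id BIGINT,
    event_ts TIMESTAMP,
    payload  STRING
)
USING iceberg
PARTITIONED BY (days(event_ts));

-- Time travel: read the table as of a past timestamp or snapshot.
SELECT * FROM lake.db.events TIMESTAMP AS OF '2025-10-01 00:00:00';
SELECT * FROM lake.db.events VERSION AS OF 4348204320424;
```

Because the partition spec lives in table metadata rather than in the directory layout, it can also evolve later without rewriting existing data.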
The 2025 twist has been the REST catalog. The Iceberg 1.6 REST specification standardized an HTTP API any engine can consume. Today Trino, Spark, DuckDB and a Python client can all read and write the same tables through a single service.
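Concretely, pointing an engine at a shared REST catalog is a few lines of configuration. A sketch for Trino's Iceberg connector follows; the URIs and names are placeholders, not a recommended production setup:

```properties
# etc/catalog/lake.properties — Trino Iceberg connector
# backed by a shared REST catalog (endpoint and bucket are illustrative).
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://catalog.internal:8181
iceberg.rest-catalog.warehouse=s3://my-warehouse/
```

Spark, DuckDB and pyiceberg each have an equivalent handful of properties pointing at the same endpoint, which is what makes the catalog, not the engine, the integration point.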
dbt on top: what Fusion changes
In May 2025 dbt Labs unveiled the Fusion engine, a Rust rewrite of dbt-core that compiles the model graph, validates SQL before executing it, and emits optimized execution plans. Fusion introduces static column and type validation at compile time, a capability dbt long lacked. It also accepts Iceberg as a native target rather than as one more adapter.
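What an Iceberg-materialized dbt model looks like varies by adapter; as a sketch, this follows the Snowflake adapter's Iceberg options, with the model, volume and source names invented for illustration:

```sql
-- models/marts/orders_daily.sql
-- Materialize this model as an Iceberg table (adapter-specific config;
-- dbt-snowflake style options shown, values illustrative).
{{ config(
    materialized='table',
    table_format='iceberg',
    external_volume='lake_volume'
) }}

select
    order_date,
    count(*)    as orders,
    sum(amount) as revenue
from {{ ref('stg_orders') }}
group by order_date
```

The model body is ordinary dbt SQL; only the `config()` block changes, which is what keeps the transformation layer portable across targets.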
The combination of dbt plus Iceberg solves the real problem most midsize data platforms face: vendor lock-in doesn’t come from the engine, it comes from the data. If your tables live in Iceberg with a REST catalog, moving from Snowflake to Trino stops being a full rewrite and becomes a connection change.
Where it still hurts
Interactive query performance over Iceberg in an open engine like Trino still trails a well-tuned closed warehouse, sometimes by a factor of two or three. Snowflake and BigQuery have query optimizers, result caches and resource management refined over a decade.
The second issue is operating the catalog itself. A REST catalog is critical infrastructure: if it goes down, every engine loses its coherent view of the tables. Managed catalog services take this burden on for you, but self-hosting an open catalog like Lakekeeper means owning high availability, metadata backups, and a recovery plan.
The third front is write cost. Iceberg optimizes for analytical reads over many partitions and large Parquet files, but maintaining that optimum requires regular compaction and cleanup jobs. Forget to compact a streaming-ingested table and within weeks you have thousands of small files and queries that take minutes instead of seconds.
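That maintenance is typically scheduled as a handful of Iceberg's Spark procedures. A sketch, with catalog and table names illustrative:

```sql
-- Periodic Iceberg maintenance (run these as scheduled jobs;
-- 'lake' and 'db.events' are placeholder names).

-- Compact small files into larger Parquet files.
CALL lake.system.rewrite_data_files(table => 'db.events');

-- Expire old snapshots and the data files only they reference.
CALL lake.system.expire_snapshots(
  table => 'db.events',
  older_than => TIMESTAMP '2025-11-01 00:00:00'
);

-- Remove files left orphaned by failed writes.
CALL lake.system.remove_orphan_files(table => 'db.events');
```

How aggressively to run these depends on ingestion pattern: streaming tables may need daily compaction, while batch-loaded tables can often go weekly.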
When it pays off
The open lakehouse with Iceberg and dbt pays off when three conditions hold simultaneously:
- Enough volume for the closed warehouse cost to be a significant budget line, typically above six figures annually.
- Workload heterogeneity: Iceberg shines when you have to mix Python, Spark, interactive SQL and machine learning over the same tables without duplicating data.
- Data engineering maturity to operate your own infrastructure. Teams arriving from a closed warehouse often underestimate this curve.
Conclusion
The Iceberg plus dbt stack is today the most sensible answer for organizations wanting to escape vendor lock-in without giving up quality declarative transformations. Tools have matured enough that operational friction is reasonable, but the learning curve is real and shouldn’t be underestimated.