Ryft Blog | Insights on Iceberg, Lakehouse & Data Optimization

Featured

News

The State of Apache Iceberg in the Enterprise (2026)

Independent research based on a survey of 252 data leaders examines Apache Iceberg adoption, operational maturity, performance, and governance in enterprise production environments.

Yossi Reitblat

February 25, 2026

February 19, 2026

News

Announcing the Ryft Context Layer

Ryft already monitors Iceberg lakehouses for optimization and observability. That means we already collect the signals that matter most for context: schema and structure, query patterns across every engine (Spark, Trino, Snowflake, Athena), write and ingestion behavior, freshness, and statistics. It's the same information a senior analyst would use to understand a table, captured in real time, at infrastructure scale. The Lakehouse Context Layer combines these signals into rich, agent-readable context for every table. Instead of starting from a blank documentation page, your tables come with context that reflects how the data actually behaves: what gets queried, how it's joined, how often it's updated, what the common access patterns look like.

Guy Yasoor

Yuval Yogev

April 11, 2026

April 7, 2026

Engineering

Migrating from Hive to Apache Iceberg

Most companies adopting Iceberg aren't starting from scratch. They have years of data in Hive tables or raw Parquet files, and migrating that data without disrupting existing pipelines is the real challenge.

Yossi Reitblat

February 6, 2026

February 4, 2026

News

Announcing Iceberg Backups In Ryft

Today we are introducing something we are really excited about - Iceberg Backups: a new way to manage snapshots that gives you reliable recovery points, predictable costs, and zero manual overhead.

Yossi Reitblat

January 14, 2026

January 13, 2026

Engineering

Unlocking Faster Iceberg Queries: The Writer Optimizations You’re Missing

Apache Iceberg query performance is often limited long before a query engine gets involved. In a joint post with Firebolt, we break down why writer configuration, file layout, and continuous table maintenance matter most.

Yuval Yogev

January 12, 2026

January 6, 2026

Engineering

CDC Strategies in Apache Iceberg

One of the most common patterns in modern data lakes is replicating operational databases into analytical storage. We explore the different CDC strategies with Apache Iceberg.

Yuval Yogev

December 31, 2025

December 30, 2025

Engineering

Apache Iceberg V3: Is It Ready?

Apache Iceberg V3 is a huge step forward for the lakehouse ecosystem. The V3 specification was finalized and ratified earlier this year, bringing several long-awaited capabilities into the core of the format: efficient row-level deletes, built-in row lineage, better handling of semi-structured data, and the beginnings of native encryption. This post breaks down the major features, the current state of implementation, and what this means for real adoption.

Guy Yasoor

December 22, 2025

December 16, 2025

News

Announcing Ryft Data Retention & Compliance Enforcement for Apache Iceberg

Today, we’re introducing two new capabilities in Ryft: Automated Data Retention and Data Compliance Enforcement for Apache Iceberg™. These features integrate directly into the Ryft platform to ensure efficient, policy-driven data deletion and compliance, working seamlessly alongside table maintenance and optimization.

Yuval Yogev

December 8, 2025

Engineering

Streaming with Apache Iceberg: The Operational Problems at Scale

Streaming into Iceberg creates three operational problems most teams don't see coming: small files pile up faster than you can compact them, storage costs climb because you're paying for data you've already replaced, and merges take much longer than they should. After seeing these problems in real production data lakes, we decided to share more about the causes and possible solutions.

Yuval Yogev

December 8, 2025

December 1, 2025

Engineering

How to Choose an Apache Iceberg Catalog

Apache Iceberg has become the table format of choice for building open data lakehouses. It solves long-standing problems around ACID transactions and engine interoperability. This post covers how to choose an Iceberg Catalog for production.

Guy Yasoor

December 8, 2025

November 21, 2025

Engineering

How to Fix Corrupted Iceberg Tables

In Part 1 and Part 2 of this series, we analyzed two different scenarios that led to Iceberg table corruption - from silent overwrites to inconsistent metadata. Since publishing those posts, we have received more requests from people who ran into these situations, asking how to safely repair their tables. In this post, we’ll focus on the remediation process: identifying what’s affected, how to safely clean it up, and how to prevent further damage.

Omer Hadari

November 7, 2025

November 4, 2025

Engineering

Handling Commit Conflicts in Apache Iceberg: Patterns and Fixes

Commit conflicts in Apache Iceberg are one of those problems that seem rare - until you start operating at scale. The first time a long-running compaction job fails after hours of compute, or a CDC pipeline spends half its time retrying commits, you realize this isn’t a corner case. It’s a core operational challenge that directly impacts cost, latency, and reliability. This post covers what commit conflicts are, why they happen, and how to fix them without creating new problems in the process.

Yossi Reitblat

October 23, 2025

Engineering

Data Retention in Apache Iceberg: Implementation Details and Best Practices

Data retention in Apache Iceberg is one of those critical operations that seems simple until you implement it at scale. Delete old data, save money, stay compliant - straightforward enough. But the implementation details matter, and getting them wrong can mean failed compliance audits, runaway storage costs, or accidentally purging the wrong data.

Yuval Yogev

January 27, 2026

October 3, 2025

News

Announcing Ryft Adaptive Optimization

Today, we’re officially introducing Ryft Adaptive Optimization - an always-on, dynamic optimization engine for Apache Iceberg™. Our engine continuously compacts, rewrites, indexes, and reorders data based on how your tables are actually used, delivering up to 5x faster queries, 10x storage reduction, and 7x better compaction efficiency compared to other engines.

Yossi Reitblat

February 25, 2026

September 17, 2025

Engineering

Iceberg Table Corruption and Data Loss in the Wild: Part 2

Data and metadata integrity issues in Iceberg, particularly in streaming workloads, often present similar patterns. In these two posts we covered two similar Iceberg table corruption issues, which manifested in almost the exact same way - data file overwrites. Each time, the underlying reason was different.

Omer Hadari

October 21, 2025

August 20, 2025

Engineering

Why Apache Iceberg Finally Unlocks Security Data Lakes

Security data lakes are notoriously hard to build and operate efficiently. Apache Iceberg changes the equation. It’s an open table format purpose-built for scalable, flexible, and cost-effective analytics - and it’s quickly becoming the new standard for modern cybersecurity data lakes. Here’s why.

Yuval Yogev

Guy Yasoor

October 23, 2025

August 6, 2025

Engineering

GDPR Compliance with Apache Iceberg: A Practical Guide

GDPR compliance boils down to one critical requirement: when a user requests deletion of their data, you must delete ALL traces of their “user identifiable information” across ALL systems and copies. Not hide it. Not mark it as deleted. Delete it completely.

Yossi Reitblat

January 27, 2026

July 30, 2025

News

High performant graph queries on Apache Iceberg powered by Ryft and PuppyGraph

Graph workloads traditionally rely on specialized graph storage systems, but these come with significant challenges in scalability, performance, and data duplication. By combining the storage optimization capabilities of Ryft on Apache Iceberg with PuppyGraph’s advanced graph query engine, teams can run high-performance, scalable graph queries directly on their Iceberg data lake - without moving or duplicating data.

Yossi Reitblat

October 23, 2025

July 21, 2025

Engineering

Iceberg Table Corruption and Data Loss in the Wild: Part 1

In this post, we want to share a story about a sneaky bug we encountered that caused table corruption, as well as silent data loss in Iceberg tables. If you're using Iceberg, if your ingestion is based on a streaming pipeline, if you're an AWS EMR user, or if you just like a good bug hunt - read on.

Omer Hadari

October 27, 2025

July 16, 2025

News

Ryft Raises $8M to Help Enterprises Take Control Over Their Data

For years, cloud giants like Snowflake, Databricks, Microsoft, and Google have made billions by offering enterprises an easy way to store and analyze data, as long as that data stays within their platforms. But that convenience came at a hidden cost: soaring expenses, rigid infrastructure, and deep vendor lock-in that slowed innovation and made AI adoption harder. Still, many companies remain stuck in outdated systems, deterred by the complexity of managing their data independently.

Yossi Reitblat

October 23, 2025

July 9, 2025

News

Unlocking Iceberg management for everyone

Ryft, in many ways, is a story 15 years in the making. Yuval Yogev, Guy Gadon and I went to the same high school, worked together at 8200, building high-scale data infrastructure, and went our separate ways - all to realize that we really enjoy solving complicated data infrastructure problems together with the people we love.

Yossi Reitblat

October 27, 2025

July 8, 2025

Engineering

Athena vs. Snowflake on Iceberg: Performance and Cost Comparison on TPC-H

How do Amazon Athena and Snowflake compare when running real-world analytics on Apache Iceberg tables? We ran a TPC-H benchmark to break down the trade-offs in performance, cost, and architecture.

Yuval Yogev

Yossi Reitblat

October 27, 2025

May 26, 2025

Engineering

Making Sense of Apache Iceberg Statistics

Apache Iceberg™ is known for its rich metadata model, and one of its most powerful (but often confusing) features is its support for statistics. In this blog post we will break them down, helping you understand what exists today, what you should configure, and what’s coming next.

Guy Yasoor

October 27, 2025

May 26, 2025
