Let me give you the straight story on a challenge I'm seeing across the enterprise landscape: Snowflake costs are climbing, and many organizations are looking for strategic ways to control their data platform spend without sacrificing capability. The good news? There's a proven approach that puts you back in the driver's seat.
Snowflake delivers exceptional performance and ease of use—that's not up for debate. But here's what I'm seeing in the field: as data volumes grow and analytics teams scale their workloads, compute costs can become unpredictable. Every query spins up compute resources, and those credits add up fast. For organizations running continuous analytics, real-time dashboards, or heavy transformation workloads, the monthly bill can become a serious line item.
What's our objective here? We want to maintain the analytical flexibility that Snowflake provides while controlling where we spend our compute dollars. The solution lies in understanding a fundamental truth about modern data architecture: storage and compute don't have to be coupled.
Apache Iceberg is an open table format that's changing how we think about data lake architecture. It brings the reliability and simplicity of SQL tables to cloud object storage while enabling multiple compute engines—Spark, Trino, Flink, Presto, and yes, Snowflake—to work with the same tables simultaneously.
Here's where it gets interesting for cost control: Iceberg tables store their data and metadata files in external cloud storage that you manage. When Snowflake connects to these external Iceberg tables, something significant happens from a billing perspective:
Snowflake does not charge storage costs for Iceberg tables. Your cloud storage provider bills you directly—and hyperscaler object storage (S3, GCS, Azure Storage) is significantly less expensive than Snowflake-managed storage.
But storage savings are just the opening move. The real strategic advantage is in compute.
When you store your data in Apache Iceberg format on hyperscaler object storage, you gain the ability to choose which compute engine runs your queries. Need to run a heavy transformation job? Spin up a Spark cluster on AWS EMR, Google Dataproc, or Azure HDInsight. Running ad-hoc analytics? Fire up Trino or Presto. Want the Snowflake experience for specific workloads? Connect Snowflake to your Iceberg tables using an external volume—you'll only pay for the compute cycles you actually use.
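To make compute portability concrete, here's a minimal sketch, assuming a hypothetical Iceberg table called analytics.orders registered in an AWS Glue catalog. The catalog names (iceberg in Trino, glue_catalog in Spark) are simply whatever you configure in each engine:

```sql
-- Same Iceberg table, same files in your bucket; only the engine changes.

-- Trino (an Iceberg connector catalog configured against Glue, named "iceberg" here):
SELECT order_date, SUM(order_total) AS revenue
FROM iceberg.analytics.orders
GROUP BY order_date;

-- Spark SQL on EMR, Dataproc, or HDInsight (same Glue catalog, exposed as "glue_catalog"):
SELECT order_date, SUM(order_total) AS revenue
FROM glue_catalog.analytics.orders
GROUP BY order_date;
```

Neither query copies data anywhere; both engines read the same Parquet and metadata files sitting in your bucket.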
This is what I call compute portability, and it's a game-changer for cost management.
The hyperscalers offer reserved instances, spot pricing, and committed use discounts that can dramatically reduce your compute costs compared to on-demand Snowflake credits. You're not locked in—you're making strategic choices based on workload requirements and budget constraints.
Now let's talk execution. How do you get your data into Apache Iceberg format continuously and reliably? This is where Oracle GoldenGate 23ai for Distributed Applications and Analytics (DAA) delivers.
GoldenGate's Iceberg Replicat writes to Iceberg tables without requiring a SQL engine in the middle. It uses the Iceberg Java SDK along with the object storage SDKs to write data directly to your hyperscaler storage, which means no separate compute cluster has to run (or be paid for) just to land the data.
The Replicat process handles all the complexity of Iceberg's specification—metadata management, snapshot creation, and delete file generation—so your data arrives ready for analytics.
Here's how a typical implementation looks:
Source Systems → Oracle, SQL Server, PostgreSQL, or other supported databases where your transactional data lives.
Oracle GoldenGate 23ai DAA → Captures changes in real-time and replicates to Iceberg format. Supports multiple catalog types including AWS Glue, Polaris, Nessie, and REST catalogs.
Hyperscaler Object Storage → Your data lands in S3, GCS, or ADLS in open Parquet format, organized as Iceberg tables.
Compute Engines → Choose your weapon based on the mission: Spark for heavy transformations, Trino or Presto for ad-hoc analytics, Flink for streaming pipelines. A sketch of the transformation lane follows this list.
Snowflake (Optional) → Query your Iceberg tables through catalog integration. You get Snowflake's query experience without Snowflake storage costs, and you only pay compute when you actually query.
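Here's a minimal sketch of that heavy-transformation lane, assuming the same hypothetical Glue-backed catalog and orders table as before. Run it with Spark SQL on EMR, Dataproc, or HDInsight, ideally on spot or preemptible capacity:

```sql
-- Build a daily aggregate as a new Iceberg table in the same catalog,
-- so Trino, Flink, and Snowflake can all read the result immediately.
CREATE TABLE glue_catalog.analytics.daily_order_summary
USING iceberg
PARTITIONED BY (order_date)
AS
SELECT order_date,
       customer_id,
       SUM(order_total) AS daily_total,
       COUNT(*)         AS order_count
FROM glue_catalog.analytics.orders
GROUP BY order_date, customer_id;
```

The cluster exists only for the life of the job, billed at hyperscaler rates, and the output is just another Iceberg table.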
Let's look at a practical scenario. Say you have 50TB of analytical data that's refreshed continuously throughout the day.
Traditional Snowflake Approach: the 50TB sits in Snowflake-managed storage at Snowflake storage rates, and warehouse credits accrue around the clock for the continuous loading, transformation, and query activity.
Iceberg + GoldenGate Approach: the same 50TB lands in your own S3, GCS, or ADLS buckets at hyperscaler storage rates, GoldenGate handles the continuous ingestion, heavy transformations run on whichever engine and pricing model you choose, and Snowflake credits are spent only on the interactive queries that genuinely need Snowflake.
Organizations I've worked with have seen 40-60% reductions in their total data platform costs using this approach—and they've gained flexibility they didn't have before.
For teams ready to execute, here's what you need to know about the GoldenGate configuration. The Iceberg Replicat supports automatic configuration—set gg.target=iceberg and the handler autoconfigures the required components.
Key configuration properties:
gg.target=iceberg
gg.eventhandler.iceberg.catalogType=glue # or nessie, polaris, rest
gg.eventhandler.iceberg.fileSystemScheme=s3://
gg.eventhandler.iceberg.awsS3Bucket=your-iceberg-bucket
gg.eventhandler.iceberg.awsS3Region=us-east-2
GoldenGate handles automatic table creation, operation aggregation, and proper Iceberg metadata management. The default flush interval is 15 minutes, configurable based on your latency requirements.
Once your data is in Iceberg format on your hyperscaler storage, Snowflake connects through an external volume and a catalog integration. Snowflake supports both externally managed catalogs (such as AWS Glue or an Iceberg REST catalog like Polaris) and reading the Iceberg metadata files directly from object storage.
Either way, your Iceberg tables show up in Snowflake like native tables, but the storage costs stay with your hyperscaler agreement—where you likely have better rates and more control.
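As a rough sketch of what that looks like on the Snowflake side, assuming the AWS Glue catalog and bucket used in the earlier examples (your role ARNs, account ID, and object names will differ, and the exact parameters depend on your catalog type, so verify against current Snowflake documentation):

```sql
-- External volume pointing at the bucket GoldenGate writes to.
CREATE EXTERNAL VOLUME iceberg_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'iceberg-us-east-2'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://your-iceberg-bucket/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_iceberg_access'
    )
  );

-- Catalog integration against the Glue catalog GoldenGate registers tables in.
CREATE CATALOG INTEGRATION glue_cat_int
  CATALOG_SOURCE = GLUE
  CATALOG_NAMESPACE = 'analytics'
  TABLE_FORMAT = ICEBERG
  GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_glue_access'
  GLUE_CATALOG_ID = '123456789012'
  GLUE_REGION = 'us-east-2'
  ENABLED = TRUE;

-- Externally managed Iceberg table: Snowflake queries it but stores nothing.
CREATE ICEBERG TABLE orders
  EXTERNAL_VOLUME = 'iceberg_vol'
  CATALOG = 'glue_cat_int'
  CATALOG_TABLE_NAME = 'orders';
```

From there, a SELECT against orders spends warehouse credits only while the query runs; the Parquet and metadata files never leave your bucket.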
When this architecture is implemented correctly, you achieve several strategic objectives: storage billed at hyperscaler rates instead of Snowflake rates, compute matched to each workload and its pricing model, Snowflake credits spent only where Snowflake adds the most value, and an open table format that keeps every current and future engine on the table.
If you're facing Snowflake cost challenges, here's my recommendation: start with a single high-volume table or workload. Implement GoldenGate 23ai DAA replication to Iceberg on your hyperscaler of choice. Measure the cost differential over 30 days. The numbers will tell the story.
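To put numbers on the Snowflake side of that 30-day comparison, a simple starting point is the standard ACCOUNT_USAGE share (your role needs access to it, and converting credits to dollars depends on your contracted rate):

```sql
-- Warehouse credits consumed over the last 30 days, by warehouse.
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;
```

Pair that with the hyperscaler bill for storage, the Spark or Trino compute, and the GoldenGate footprint to see the full differential.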
This isn't about replacing Snowflake—it's about using the right tool for each part of the mission. Snowflake excels at interactive analytics and business intelligence. Spark excels at heavy transformations. Iceberg lets them coexist on the same data without duplicating storage or sacrificing capability.
Outstanding work comes from making strategic architectural decisions. This is one of them.
Have questions about implementing this architecture? Let's coordinate. Reach out to discuss how RheoData can help you achieve your data platform objectives.