Controlling Snowflake Costs: The Apache Iceberg Strategy with Oracle GoldenGate 23ai DAA

        Bobby Curtis

        Let me give you the straight story on a challenge I'm seeing across the enterprise landscape: Snowflake costs are climbing, and many organizations are looking for strategic ways to control their data platform spend without sacrificing capability. The good news? There's a proven approach that puts you back in the driver's seat.

        The Cost Challenge

        Snowflake delivers exceptional performance and ease of use—that's not up for debate. But here's what I'm seeing in the field: as data volumes grow and analytics teams scale their workloads, compute costs can become unpredictable. Every query spins up compute resources, and those credits add up fast. For organizations running continuous analytics, real-time dashboards, or heavy transformation workloads, the monthly bill can become a serious line item.

        What's our objective here? We want to maintain the analytical flexibility that Snowflake provides while controlling where we spend our compute dollars. The solution lies in understanding a fundamental truth about modern data architecture: storage and compute don't have to be coupled.

        Apache Iceberg: The Strategic Foundation

        Apache Iceberg is an open table format that's changing how we think about data lake architecture. It brings the reliability and simplicity of SQL tables to cloud object storage while enabling multiple compute engines—Spark, Trino, Flink, Presto, and yes, Snowflake—to work with the same tables simultaneously.

        Here's where it gets interesting for cost control: Iceberg tables store their data and metadata files in external cloud storage that you manage. When Snowflake connects to these external Iceberg tables, something significant happens from a billing perspective:

        Snowflake does not charge storage costs for Iceberg tables. Your cloud storage provider bills you directly—and hyperscaler object storage (S3, GCS, Azure Blob Storage) is significantly less expensive than Snowflake-managed storage.
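
        To make this concrete, here's a rough sketch of how an Iceberg table lays out on object storage (bucket, path, and file names here are illustrative; the actual names are generated):

        s3://your-iceberg-bucket/warehouse/analytics/orders/
          metadata/
            00001-xxxx.metadata.json    (table schema, partition spec, snapshot history)
            snap-xxxx.avro              (manifest list for one snapshot)
            xxxx-m0.avro                (manifests tracking data and delete files)
          data/
            order_date=2025-01-15/
              00000-0-xxxx.parquet      (the actual column data)

        Every engine that speaks Iceberg reads and writes this same layout, which is what makes the multi-engine story work.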

        But storage savings are just the opening move. The real strategic advantage is in compute.

        Offloading Compute to the Hyperscaler

        When you store your data in Apache Iceberg format on hyperscaler object storage, you gain the ability to choose which compute engine runs your queries. Need to run a heavy transformation job? Spin up a Spark cluster on AWS EMR, Google Dataproc, or Azure HDInsight. Running ad-hoc analytics? Fire up Trino or Presto. Want the Snowflake experience for specific workloads? Connect Snowflake to your Iceberg tables using an external volume—you'll only pay for the compute cycles you actually use.
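
        To illustrate that choice, here's the same hypothetical query against the same Iceberg table from two engines; only the catalog wiring differs, not the data:

        -- Trino, via an Iceberg catalog backed by AWS Glue (names are illustrative)
        SELECT customer_id, SUM(order_total) AS lifetime_value
        FROM iceberg.analytics.orders
        WHERE order_date >= DATE '2025-01-01'
        GROUP BY customer_id;

        -- Snowflake, same table surfaced through a catalog integration
        SELECT customer_id, SUM(order_total) AS lifetime_value
        FROM analytics_db.public.orders
        WHERE order_date >= '2025-01-01'
        GROUP BY customer_id;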

        This is what I call compute portability, and it's a game-changer for cost management.

        The hyperscalers offer reserved instances, spot pricing, and committed use discounts that can dramatically reduce your compute costs compared to on-demand Snowflake credits. You're not locked in—you're making strategic choices based on workload requirements and budget constraints.

        Oracle GoldenGate 23ai DAA: Your Data Pipeline

        Now let's talk execution. How do you get your data into Apache Iceberg format continuously and reliably? This is where Oracle GoldenGate 23ai for Distributed Applications and Analytics (DAA) delivers.

        GoldenGate's Iceberg Replicat writes directly to Iceberg tables without requiring a SQL engine. It uses the Iceberg Java SDK along with object storage-specific SDKs to write data directly to your hyperscaler storage. This means:

        • Real-time data replication from your source databases to Iceberg tables
        • No intermediate SQL engine consuming compute resources during the write process
        • Direct writes to S3, GCS, or Azure Data Lake Storage in Parquet format
        • Full ACID transaction support with proper handling of inserts, updates, and deletes

        The Replicat process handles all the complexity of Iceberg's specification—metadata management, snapshot creation, and delete file generation—so your data arrives ready for analytics.
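
        To give you a feel for how thin that layer is operationally, here's a minimal Replicat parameter file sketch (process, schema, and file names are hypothetical; the Iceberg-specific settings live in the properties file covered under Configuration Essentials below):

        REPLICAT riceberg
        -- Hand off to the Java adapter; Iceberg specifics live in the properties file
        TARGETDB LIBFILE libggjava.so SET property=dirprm/iceberg.props
        -- Map captured source tables to the target Iceberg namespace
        MAP sales_db.sales.*, TARGET sales.*;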

        The Architecture in Practice

        Here's how a typical implementation looks:

        Source Systems → Oracle, SQL Server, PostgreSQL, or other supported databases where your transactional data lives.

        Oracle GoldenGate 23ai DAA → Captures changes in real-time and replicates to Iceberg format. Supports multiple catalog types including AWS Glue, Polaris, Nessie, and REST catalogs.

        Hyperscaler Object Storage → Your data lands in S3, GCS, or ADLS in open Parquet format, organized as Iceberg tables.

        Compute Engines → Choose your weapon based on the mission:

        • Heavy ETL/transformation: Spark on your preferred hyperscaler
        • Interactive analytics: Trino, Presto, or Dremio
        • Specific Snowflake workloads: Connect via external volume for seamless querying

        Snowflake (Optional) → Query your Iceberg tables through catalog integration. You get Snowflake's query experience without Snowflake storage costs, and you only pay compute when you actually query.

        The Cost Math

        Let's look at a practical scenario. Say you have 50TB of analytical data that's refreshed continuously throughout the day.

        Traditional Snowflake Approach:

        • Storage: Snowflake-managed (premium pricing)
        • Compute: All queries run on Snowflake credits
        • You're paying Snowflake for everything

        Iceberg + GoldenGate Approach:

        • Storage: Hyperscaler object storage (significantly lower cost per TB)
        • Heavy compute: Spark clusters with spot/reserved pricing
        • Ad-hoc analytics: Trino or Snowflake, depending on need
        • You're paying hyperscaler rates for storage and choosing the most cost-effective compute for each workload
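
        The storage math alone is easy to sketch (illustrative list prices; actual rates vary by region, tier, and contract):

        S3 Standard:           ~$23/TB-month x 50TB  ≈ $1,150/month
        Snowflake on-demand:   ~$40/TB-month x 50TB  ≈ $2,000/month

        And storage is the smaller lever: the compute side, where spot and reserved pricing come into play, is where the bigger savings live.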

        Organizations I've worked with have seen 40-60% reductions in their total data platform costs using this approach—and they've gained flexibility they didn't have before.

        Configuration Essentials

        For teams ready to execute, here's what you need to know about the GoldenGate configuration. The Iceberg Replicat supports automatic configuration—set gg.target=iceberg and the handler autoconfigures the required components.

        Key configuration properties:

        gg.target=iceberg
        # catalogType options: glue, nessie, polaris, rest
        gg.eventhandler.iceberg.catalogType=glue
        gg.eventhandler.iceberg.fileSystemScheme=s3://
        gg.eventhandler.iceberg.awsS3Bucket=your-iceberg-bucket
        gg.eventhandler.iceberg.awsS3Region=us-east-2

        GoldenGate handles automatic table creation, operation aggregation, and proper Iceberg metadata management. The default flush interval is 15 minutes, configurable based on your latency requirements.

        Snowflake Integration

        Once your data is in Iceberg format on your hyperscaler storage, Snowflake connects through an external volume and catalog integration. Snowflake supports two scenarios:

        1. External catalog (AWS Glue, Polaris, REST): Snowflake reads metadata from your existing catalog
        2. Snowflake as catalog: Snowflake manages the Iceberg metadata while data stays in your external storage

        Either way, your Iceberg tables show up in Snowflake like native tables, but the storage costs stay with your hyperscaler agreement—where you likely have better rates and more control.
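
        For the first scenario, the Snowflake-side wiring looks roughly like this (identifiers, ARNs, and paths are placeholders; check Snowflake's Iceberg documentation for the full option list):

        -- External volume pointing at the bucket GoldenGate writes to
        CREATE EXTERNAL VOLUME iceberg_vol
          STORAGE_LOCATIONS = ((
            NAME = 'iceberg-s3'
            STORAGE_PROVIDER = 'S3'
            STORAGE_BASE_URL = 's3://your-iceberg-bucket/warehouse/'
            STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/snowflake-iceberg'
          ));

        -- Catalog integration that reads table metadata from AWS Glue
        CREATE CATALOG INTEGRATION glue_catalog
          CATALOG_SOURCE = GLUE
          CATALOG_NAMESPACE = 'analytics'
          TABLE_FORMAT = ICEBERG
          GLUE_AWS_ROLE_ARN = 'arn:aws:iam::111111111111:role/snowflake-glue'
          GLUE_CATALOG_ID = '111111111111'
          ENABLED = TRUE;

        -- Surface the externally managed table; the storage stays in your bucket
        CREATE ICEBERG TABLE orders
          EXTERNAL_VOLUME = 'iceberg_vol'
          CATALOG = 'glue_catalog'
          CATALOG_TABLE_NAME = 'orders';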

        Mission Accomplished: What Success Looks Like

        When this architecture is implemented correctly, you achieve several strategic objectives:

        • Cost predictability: Storage costs are transparent and controlled through your hyperscaler agreement
        • Compute flexibility: Choose the right engine for each workload without lock-in
        • Data openness: Your data is in open formats (Parquet, Iceberg), accessible by any compatible tool
        • Real-time currency: GoldenGate keeps your Iceberg tables current with continuous replication
        • Operational simplicity: One data copy serves multiple compute engines

        Taking Action

        If you're facing Snowflake cost challenges, here's my recommendation: start with a single high-volume table or workload. Implement GoldenGate 23ai DAA replication to Iceberg on your hyperscaler of choice. Measure the cost differential over 30 days. The numbers will tell the story.

        This isn't about replacing Snowflake—it's about using the right tool for each part of the mission. Snowflake excels at interactive analytics and business intelligence. Spark excels at heavy transformations. Iceberg lets them coexist on the same data without duplicating storage or sacrificing capability.

        Outstanding work comes from making strategic architectural decisions. This is one of them.


        Have questions about implementing this architecture? Let's coordinate. Reach out to discuss how RheoData can help you achieve your data platform objectives.

         
