When Your Oracle LOBs Won’t Play Nice with Snowflake: A Real-World Solution
This week, I found myself deep in a familiar challenge – helping a client navigate the complexities of replicating Oracle Large Objects (LOBs) to Snowflake using Oracle GoldenGate. What started as a seemingly straightforward data integration project quickly revealed the nuanced technical considerations that separate successful migrations from frustrating dead ends.
Let me give you the straight story about what we discovered and how we solved it.
The Challenge: More Than Just Moving Data
Our client came to us with what appeared to be a standard requirement: replicate Oracle CLOB data to Snowflake in real-time. Simple enough, right? Well, as anyone who’s worked with LOBs knows, there’s always more beneath the surface.
The real challenge wasn’t just moving the data – it was understanding how Oracle stores LOBs and how that impacts replication to a completely different platform like Snowflake.
Understanding the Foundation: What Are LOBs Anyway?
Before we dive into the solution, let’s establish some common ground. In Oracle, LOBs (Large Objects) are data types designed to handle substantial amounts of character or binary data – think documents, images, or large text fields that exceed the limitations of standard VARCHAR2 columns.
But here’s where it gets interesting: Oracle doesn’t store all LOBs the same way.
Inline vs. Out-of-Line LOBs: The Critical Distinction
Oracle uses two storage methods for LOBs, and understanding this distinction is crucial for successful replication:
Inline LOBs:
- Small LOB data (typically under 4KB to 8KB, depending on your Oracle version)
- Stored directly within the table row alongside other column data
- Oracle GoldenGate can capture these changes directly from the redo logs
- More efficient for replication purposes
Out-of-Line LOBs:
- LOB data exceeding the inline threshold
- Stored in separate LOB segments with only a pointer in the table row
- GoldenGate must fetch this data directly from the database
- Less performant for replication, especially with large LOBs (the queries below show how to check which category your data falls into)
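If you are not sure which category your data falls into, a couple of dictionary queries will tell you. This is a minimal sketch: it assumes the document_store table and content column used later in this post, and treats roughly 4,000 bytes as the inline cutoff (the exact threshold varies by version and storage settings).

-- How is the LOB column configured? IN_ROW = YES means small values can be stored inline.
SELECT table_name, column_name, in_row, securefile
FROM   dba_lobs
WHERE  owner = 'CTMS_PSO'
AND    table_name = 'DOCUMENT_STORE';

-- Roughly how many rows are large enough to be stored out of line (and therefore fetched)?
SELECT COUNT(*) AS likely_out_of_line
FROM   ctms_pso.document_store
WHERE  DBMS_LOB.GETLENGTH(content) > 4000;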
The Oracle-to-Snowflake Translation Challenge
Here’s where our client’s project got interesting. In Oracle, we’re dealing with CLOB data types. In Snowflake, these become VARCHAR columns. This isn’t just a simple rename – it’s a fundamental data type conversion that requires careful planning.
Our client’s Oracle environment had CLOB columns that could theoretically hold massive amounts of data, but their business requirements kept most content under 15MB. Meanwhile, a Snowflake VARCHAR column can hold up to 16MB by default (recent Snowflake releases raise that ceiling considerably), which gave us enough headroom.
The challenge was ensuring GoldenGate could handle this conversion seamlessly.
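Before committing to a target column size, it is worth profiling what the source actually holds. A quick query along these lines (assuming the same document_store table shown in the next section) makes it easy to confirm the 15MB business rule actually holds:

-- Profile source CLOB lengths to sanity-check the planned Snowflake VARCHAR sizing.
SELECT COUNT(*)                                 AS total_rows,
       MAX(DBMS_LOB.GETLENGTH(content))         AS max_length,
       ROUND(AVG(DBMS_LOB.GETLENGTH(content)))  AS avg_length
FROM   ctms_pso.document_store;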
The Standard Setup: Getting the Basics Right
Let me walk you through how we structured the tables to ensure compatibility.
In Oracle, our standard table looked like this:
CREATE TABLE CTMS_PSO.document_store (
id NUMBER GENERATED BY DEFAULT AS IDENTITY,
content CLOB NOT NULL CHECK (LENGTH(content) <= 15728640),
created_date DATE DEFAULT SYSDATE,
CONSTRAINT pk_document_store PRIMARY KEY (id)
);
-- Optional: create an index on created_date for query performance
CREATE INDEX CTMS_PSO.idx_document_store_created ON CTMS_PSO.document_store(created_date);
Notice the check constraint limiting CLOB size to 15MB – this business rule became crucial for our Snowflake design.
The corresponding Snowflake table:
CREATE TABLE ctms_pso.document_store (
id NUMBER AUTOINCREMENT,
content VARCHAR(16777216) NOT NULL,
created_date TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
CONSTRAINT pk_document_store PRIMARY KEY (id)
);
The VARCHAR(16777216) gives us 16MB capacity – slightly larger than our Oracle constraint to provide a safety buffer.
The GoldenGate Configuration: Where the Magic Happens
Here’s where our experience really paid off. Oracle GoldenGate handles LOB replication differently depending on your target system:
1. Oracle-to-Oracle: LOBs replicate in pieces (partial LOB replication)
2. Oracle-to-Non-Oracle: You need the complete LOB for each transaction
For our Snowflake target, we needed to ensure complete LOB capture. The key parameter in our Extract configuration:
EXTRACT ETSCSF3
USERIDALIAS SOURCE DOMAIN OracleGoldenGate
EXTTRAIL WW
REPORTCOUNT EVERY 2 MINUTES, RATE
WARNLONGTRANS 30MIN CHECKINTERVAL 10MIN
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512, PARALLELISM 2)
TRANLOGOPTIONS FETCHPARTIALLOB
NOCOMPRESSUPDATES
TABLE FREEPDB1.CTMS_PSO.document_store;
The `TRANLOGOPTIONS FETCHPARTIALLOB` parameter is your best friend here. When Extract receives partial LOB content from the logmining server, this forces it to fetch the complete LOB image instead of just processing the partial content.
The Replicat configuration remained straightforward:
REPLICAT REPSF
REPERROR(DEFAULT, ABEND)
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAXTRANSOPS 20000
MAP FREEPDB1.CTMS_PSO.document_store, TARGET TRACTORSUPPLY.CTMS_PSO.document_store;
The Plot Twist: When 15MB Becomes 8MB
Just when we thought we had everything figured out, our client threw us a curveball. Due to their existing Snowflake table structure and constraints carried over from a previous replication tool, every LOB had to be capped at 8MB during replication.
This required a more sophisticated approach using GoldenGate’s SQLEXEC functionality.
SQLEXEC: The Swiss Army Knife of GoldenGate
SQLEXEC allows GoldenGate to execute database commands within the replication process. Think of it as a way to transform data on-the-fly during extraction.
Here’s how we modified the Extract to capture only the first 8MB of each LOB:
EXTRACT ETSCSF3
USERIDALIAS SOURCE_TSC DOMAIN OracleGoldenGate
EXTTRAIL WW
REPORTCOUNT EVERY 2 MINUTES, RATE
WARNLONGTRANS 30MIN CHECKINTERVAL 10MIN
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512, PARALLELISM 2)
TRANLOGOPTIONS FETCHPARTIALLOB
NOCOMPRESSUPDATES
TABLE FREEPDB1.CTMS_PSO.document_store, SQLEXEC(ID lob_id, QUERY "select dbms_lob.SUBSTR(content, 8388608, 1) from CTMS_PSO.document_store where ID = :LOB_ID", PARAMS(LOB_ID = ID), EXEC SOURCEROW);
Let me break down this SQLEXEC command:
- ID lob_id – Assigns a logical name to the query so its results can be referenced
- QUERY “select dbms_lob.SUBSTR(content, 8388608, 1)…” – Calls DBMS_LOB.SUBSTR, which takes the LOB, the amount to read, and the starting offset, returning the first 8MB (8388608 characters) of each LOB
- PARAMS(LOB_ID = ID) – Maps the table’s ID column to our parameter
- EXEC SOURCEROW – Runs this SQL once for every captured source row
The beauty of this approach is that it handles all DML operations – inserts, updates, and deletes – automatically applying the 8MB limit during extraction.
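Verifying the cap afterwards is straightforward. A pair of checks along these lines (table names as used throughout this post; your own validation scripts may differ) shows how many source documents were subject to truncation and confirms nothing oversized landed in Snowflake:

-- On Oracle: how many source documents exceed the 8MB cap and will therefore arrive truncated?
SELECT COUNT(*) AS rows_over_8mb
FROM   ctms_pso.document_store
WHERE  DBMS_LOB.GETLENGTH(content) > 8388608;

-- On Snowflake: confirm nothing replicated beyond the 8MB cap.
SELECT MAX(LENGTH(content)) AS max_replicated_length
FROM   ctms_pso.document_store;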
The Results: Mission Accomplished
After implementing this solution, our client achieved exactly what they needed:
- Real-time replication of Oracle CLOBs to Snowflake VARCHARs
- Automatic truncation to 8MB to match their existing architecture
- Reliable, consistent performance across all transaction types
- Clean integration with their existing Snowflake environment
Key Takeaways for Your Oracle-to-Snowflake Journey
- Understand your LOB storage patterns – inline vs. out-of-line makes a significant difference in replication performance
- Plan your data type mapping carefully – Oracle CLOBs to Snowflake VARCHARs requires thoughtful sizing
- Use FETCHPARTIALLOB for non-Oracle targets – this ensures complete LOB capture in your trail files
- Leverage SQLEXEC for data transformation – when you need to modify data during extraction, this is your tool
- Test thoroughly with realistic data volumes – LOB replication behaves differently under various load conditions (the sketch below shows one way to generate oversized test documents)
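On that last point, it helps to have a repeatable way to produce documents near your size limits rather than relying on whatever happens to be in the source table. The PL/SQL below is a minimal sketch (the ~10MB target is an illustrative value, and the table name matches the example used earlier in this post) for generating a large test document:

-- Generate a ~10MB test document so truncation and throughput can be observed under load.
DECLARE
  l_chunk   VARCHAR2(32767) := RPAD('x', 32767, 'x');
  l_target  PLS_INTEGER     := 10 * 1024 * 1024;   -- illustrative ~10MB, under the 15MB rule
  l_content CLOB;
BEGIN
  DBMS_LOB.CREATETEMPORARY(l_content, TRUE);
  WHILE DBMS_LOB.GETLENGTH(l_content) < l_target LOOP
    DBMS_LOB.WRITEAPPEND(l_content, LENGTH(l_chunk), l_chunk);
  END LOOP;
  INSERT INTO CTMS_PSO.document_store (content) VALUES (l_content);
  COMMIT;
  DBMS_LOB.FREETEMPORARY(l_content);
END;
/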
Partner with the Experts
Data integration projects like Oracle-to-Snowflake migrations involve countless technical nuances that can make or break your success. At RheoData, we’ve navigated these challenges across dozens of enterprise implementations, combining deep Oracle expertise with modern cloud platform knowledge. Whether you’re planning a complete migration or need to solve specific replication challenges, our team brings the experience and proven methodologies to ensure your data integration project succeeds.
Ready to tackle your Oracle-to-Snowflake integration challenge? Let’s coordinate on a solution that fits your specific requirements. Contact RheoData ([email protected]) today to discuss how we can accelerate your data transformation journey.