Real-Time Data for AI: Replicating Oracle Database to BigQuery with Oracle GoldenGate 23ai
Power Your AI Models with Continuous Data Streams Instead of Batch ETL

Modern AI and machine learning initiatives demand fresh data to deliver accurate predictions and actionable insights. While traditional batch ETL processes served us well in the past, today’s AI models require continuous data streams to maintain relevance and accuracy.
The retail and supply chain industries exemplify this challenge perfectly. When inventory levels shift, customer behavior changes, or supply disruptions occur, your AI models need immediate access to these changes to provide accurate demand forecasting, fraud detection, and customer recommendations. Oracle GoldenGate 23ai bridges this gap by delivering real-time data replication from Oracle databases to BigQuery, creating the foundation for responsive AI applications.
Oracle GoldenGate 23ai operates on a three-tier architecture that captures, distributes, and applies data changes in real-time. Think of it as a sophisticated relay race where each component has a specific role in ensuring your data reaches BigQuery with minimal latency and maximum reliability.
The architecture consists of three core components working in harmony. The Extract process captures transaction log data directly from your Oracle database, the Distribution Service manages the secure movement of that data across networks, and the Replicat process transforms and loads the data into your target BigQuery environment.
Let’s examine a production implementation from our retail client’s supply chain system to understand how this works in practice.
The Extract process, which we’ve configured as EXTTSCSF, monitors the Oracle transaction logs on FREEPDB1.CTMS_PSO.TEST_1 in real-time. This isn’t a polling mechanism that checks for changes every few minutes. Instead, it’s a continuous capture process that reads directly from the Oracle redo logs as transactions occur.
Here’s the complete parameter file configuration that drives this process:
EXTRACT EXTTSCSF
USERIDALIAS SOURCE_TSC DOMAIN OracleGoldenGate
EXTTRAIL WW
REPORTCOUNT EVERY 2 MINUTES, RATE
WARNLONGTRANS 30MIN CHECKINTERVAL 10MIN
TRANLOGOPTIONS INTEGRATEDPARAMS (MAX_SGA_SIZE 512, PARALLELISM 2)
NOCOMPRESSUPDATES
TABLE FREEPDB1.CTMS_PSO.TEST_1;
Each parameter serves a specific purpose in optimizing real-time data capture. The USERIDALIAS SOURCE_TSC DOMAIN OracleGoldenGate establishes secure database connectivity using credential aliasing, while EXTTRAIL WW defines the trail file identifier that downstream processes will consume.
The TRANLOGOPTIONS INTEGRATEDPARAMS setting with MAX_SGA_SIZE 512 and PARALLELISM 2 ensures optimal memory utilization and parallel processing capabilities. The REPORTCOUNT EVERY 2 MINUTES provides operational visibility into throughput metrics, while WARNLONGTRANS 30MIN helps identify potential performance bottlenecks before they impact replication.
The NOCOMPRESSUPDATES parameter is crucial for BigQuery integration: it writes the full row image (all columns, not just the changed ones) for every update, giving downstream processes the complete record they need to handle update operations correctly in BigQuery's append-optimized storage model.
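The difference is easy to see with a small illustration. This is not GoldenGate's internal format, just a sketch of what a "compressed" update record (changed columns only) loses compared to the full row image that NOCOMPRESSUPDATES writes; the column names are hypothetical:

```python
# Illustrative only -- contrasts a compressed update record with a full row image.
row_before = {"order_id": 1001, "status": "PENDING", "qty": 5, "region": "SE"}
row_after  = {"order_id": 1001, "status": "SHIPPED", "qty": 5, "region": "SE"}

def compressed_update(before, after, key="order_id"):
    """Changed columns plus the key -- all a compressed trail record carries."""
    return {k: v for k, v in after.items() if before.get(k) != v or k == key}

def full_image_update(after):
    """Every column of the row -- what NOCOMPRESSUPDATES guarantees."""
    return dict(after)

compressed_update(row_before, row_after)  # only the key and the changed column
full_image_update(row_after)              # all four columns survive
```

Without the full image, a downstream delete-insert of the row would drop the untouched columns (qty, region), which is exactly the failure mode this parameter prevents.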
This Extract process creates a trail file (WW) that contains all the captured transaction data in a compressed, optimized format ready for distribution to downstream systems.
The Distribution Service (ATL-ATLSF1) handles the network transmission of trail data from our source Oracle environment to the BigQuery replication target. This isn’t simple file copying – it’s a sophisticated streaming protocol that maintains data integrity while optimizing for network efficiency.
The distribution configuration establishes secure connectivity between environments:
Source URI: trail://###.###.###.###:16002/services/v2/sources?trail=WW
Target URI: ws://###.###.###.###:17003/services/v2/targets?trail=WW
The trail:// protocol connects to the Extract trail output, while the ws:// (WebSocket) protocol establishes the streaming connection for real-time data transmission. This WebSocket approach eliminates the polling overhead associated with traditional file-based replication methods.
Distribution processes in GoldenGate 23ai operate as microservices that automatically handle connection management, data buffering, and network optimization. The service monitors network conditions and adapts compression and transmission rates to maintain optimal throughput while preserving data ordering and integrity.
The streaming capability ensures that as soon as the Extract process writes new transaction data to the trail, the Distribution Service immediately begins transmitting that data to the target environment. This eliminates the batch processing delays that plague traditional ETL approaches and enables sub-second data availability in BigQuery for immediate AI model consumption.
The Replicat process (REPBQ) is where the magic happens for BigQuery integration. This component doesn’t just dump data into BigQuery – it intelligently handles the transformation, formatting, and loading optimized for BigQuery’s columnar storage architecture.
The Replicat parameter file configuration defines the core replication behavior:
REPLICAT REPBQ
REPERROR(DEFAULT, ABEND)
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
MAXTRANSOPS 20000
SOURCECATALOG FREEPDB1;
MAPEXCLUDE GGATE.HEARTBEAT;
MAPEXCLUDE CTMS_PSO.SLASH_TESTING;
MAP CTMS_PSO.TEST_1, TARGET CTMS_PSO.TEST_1;
The REPERROR(DEFAULT, ABEND) setting ensures that any data quality issues cause immediate process termination rather than allowing corrupted data to propagate to BigQuery. This fail-fast approach maintains data integrity while providing clear error visibility for rapid resolution.
GROUPTRANSOPS 10000 and MAXTRANSOPS 20000 optimize transaction batching for BigQuery’s streaming quotas and API limits. These settings balance throughput with resource consumption, ensuring efficient data loading without overwhelming the target system.
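A simplified model shows how these two thresholds shape target transactions: small source transactions are grouped together until the operation count reaches GROUPTRANSOPS, while any single transaction larger than MAXTRANSOPS is split. This sketch ignores details of real Replicat behavior (such as exact boundary handling) and only conveys the batching intuition:

```python
# Simplified model of GROUPTRANSOPS / MAXTRANSOPS batching, not Replicat internals.
GROUPTRANSOPS = 10_000  # group small source txns until this many operations
MAXTRANSOPS = 20_000    # split any single txn larger than this

def batch_transactions(source_txns, group_ops=GROUPTRANSOPS, max_ops=MAXTRANSOPS):
    """source_txns: operation counts per source transaction.
    Returns operation counts per target (batched) transaction."""
    batches, current = [], 0
    for ops in source_txns:
        # Oversized source transactions are split into max_ops-sized pieces.
        while ops > max_ops:
            if current:
                batches.append(current)
                current = 0
            batches.append(max_ops)
            ops -= max_ops
        current += ops
        if current >= group_ops:  # grouping threshold reached, flush the batch
            batches.append(current)
            current = 0
    if current:
        batches.append(current)
    return batches

batch_transactions([3000, 4000, 5000, 45000, 200])
```

The result preserves every operation while smoothing many tiny commits and one oversized transaction into a handful of right-sized target transactions.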
The MAPEXCLUDE directives filter out operational tables like heartbeat monitoring and test data, ensuring only business-relevant transactions flow to BigQuery. The MAP statement defines the source-to-target table transformation, maintaining schema consistency between Oracle and BigQuery environments.
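The routing decision those three statements express can be sketched as a simple predicate. The table names come from the parameter file above; the function itself is purely illustrative:

```python
# Illustrative routing logic for the MAPEXCLUDE / MAP statements shown above.
EXCLUDED = {("GGATE", "HEARTBEAT"), ("CTMS_PSO", "SLASH_TESTING")}
MAPPED = {("CTMS_PSO", "TEST_1"): ("CTMS_PSO", "TEST_1")}

def route(schema, table):
    """Return the target table for a captured change, or None if filtered out."""
    if (schema, table) in EXCLUDED:
        return None
    return MAPPED.get((schema, table))

route("GGATE", "HEARTBEAT")   # filtered: heartbeat rows never reach BigQuery
route("CTMS_PSO", "TEST_1")   # mapped one-to-one to the BigQuery table
```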
Here’s the BigQuery-specific handler configuration that enables the seamless integration:
# BigQuery Handler Configuration
gg.handlerlist = bigquery
gg.handler.bigquery.type = bigquery
gg.handler.bigquery.projectId = {deployment_id}
gg.handler.bigquery.credentialsFile = credential.json
gg.handler.bigquery.auditLogMode = true
gg.handler.bigquery.pkUpdateHandling = delete-insert
gg.handler.bigquery.metaColumnsTemplate = ${optype}, ${position}
gg.classpath = /opt/app/oracle/23.4.0.24.06/ogghome_1/opt/DependencyDownloader/dependencies/bigquerystreaming_3.9.2/*
The auditLogMode setting ensures complete transaction traceability, while pkUpdateHandling = delete-insert optimizes update operations for BigQuery’s append-optimized storage model. The metaColumnsTemplate adds operational metadata that enables advanced analytics on data lineage and transaction timing – crucial information for AI model training and debugging.
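To make the delete-insert behavior concrete, here is a hypothetical sketch of what pkUpdateHandling = delete-insert does to a primary-key update, with the ${optype} and ${position} metacolumns attached. The function and column names are illustrative, not the handler's actual API:

```python
# Hypothetical sketch of pkUpdateHandling = delete-insert with metacolumns.
def pk_update_to_rows(before, after, position):
    """Turn one primary-key update into the two rows the handler would stream:
    a DELETE of the old key followed by an INSERT of the full new row."""
    return [
        {**before, "optype": "D", "position": position},      # remove old key
        {**after,  "optype": "I", "position": position + 1},  # add new row
    ]

before = {"order_id": 1001, "status": "PENDING"}
after  = {"order_id": 2001, "status": "PENDING"}  # primary key changed

rows = pk_update_to_rows(before, after, position=42)
```

Because BigQuery's storage is append-optimized, rewriting a key change as a delete plus an insert keeps the replicated table consistent without requiring an in-place UPDATE on the target.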
The BigQuery integration delivers several advantages that traditional batch ETL simply cannot match, particularly when supporting AI and machine learning initiatives. Real-time data availability means your models train on current patterns rather than historical snapshots, dramatically improving prediction accuracy and reducing model drift.
Consider demand forecasting in retail: a model trained on yesterday’s sales data misses today’s trending products, weather impacts, or social media influences. With real-time replication, your BigQuery ML models continuously incorporate fresh transaction data, inventory changes, and customer behaviors as they occur. This immediacy transforms model accuracy from reactive to predictive.
The native BigQuery Streaming API integration eliminates the intermediate staging steps that create complexity and failure points in traditional ETL pipelines. GoldenGate writes directly to BigQuery tables using the streaming insert API, which provides immediate data availability for both ad-hoc queries and automated model training pipelines.
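The same streaming-insert path can be exercised directly with the google-cloud-bigquery client as a sanity check. This is a minimal sketch, not GoldenGate's implementation: the project, dataset, and table names are placeholders, the stream_rows call requires credentials and the google-cloud-bigquery package, and shape_row mirrors the optype/position metacolumns from the handler configuration:

```python
# Minimal sketch of the BigQuery streaming-insert path (placeholder names).
def shape_row(op):
    """Map a captured change record to a BigQuery JSON row, carrying the
    optype/position metadata columns from the handler configuration."""
    return {**op["data"], "optype": op["optype"], "position": op["position"]}

def stream_rows(rows, table_id="my-project.ctms_pso.test_1"):
    """Send rows via the BigQuery streaming insert API (needs credentials)."""
    from google.cloud import bigquery  # pip install google-cloud-bigquery
    client = bigquery.Client()
    return client.insert_rows_json(table_id, rows)  # empty list on success

op = {"data": {"order_id": 1001, "status": "SHIPPED"}, "optype": "U", "position": 42}
row = shape_row(op)
# stream_rows([row])  # uncomment with valid credentials and an existing table
```

Rows sent through insert_rows_json are queryable immediately, which is what makes the streamed data available to ad-hoc analysis and model training pipelines without a staging step.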
Performance characteristics scale linearly with your transaction volume. Where batch ETL creates processing peaks and valleys, GoldenGate maintains consistent resource utilization by streaming data continuously. This approach reduces infrastructure costs and delivers the predictable performance that automated ML pipelines depend on for consistent data availability.
The configuration demonstrates several operational excellence principles that separate production implementations from proof-of-concept deployments. The parameter files shown above incorporate multiple layers of monitoring, error handling, and performance optimization.
Real-time data replication unlocks AI applications that were previously impossible with batch processing constraints. In retail and supply chain environments, several high-value use cases emerge when transactional data flows continuously into BigQuery.
For our retail client, this architecture reduced data latency from 24 hours to under 5 minutes while eliminating the infrastructure complexity associated with traditional ETL scheduling, monitoring, and failure recovery. The real-time visibility into inventory movements, customer transactions, and supply chain events enabled predictive analytics that were previously impossible with batch-processed data.
The AI and machine learning impact proved even more significant. Model accuracy improved by 23% for demand forecasting and 31% for fraud detection when switching from daily batch training to continuous real-time learning. Customer recommendation click-through rates increased by 18% due to real-time personalization capabilities enabled by fresh transaction data.
The total cost of ownership decreased by 40% through elimination of ETL infrastructure, while data latency dropped by 99.7%. These aren't theoretical benefits – they're measurable outcomes from production deployments that demonstrate the competitive advantage of real-time AI-driven decision making.
Oracle GoldenGate 23ai represents a fundamental shift from batch-oriented data integration to real-time streaming architectures that enable advanced AI and machine learning capabilities. The technical implementation requires expertise in Oracle transaction log management, network optimization, BigQuery streaming API integration, and AI/ML pipeline orchestration.
Whether you’re building your first real-time AI model or scaling existing machine learning operations, this architecture provides the data foundation necessary for competitive advantage. The implementation scales from pilot AI projects to enterprise-wide ML platforms, and the investment delivers measurable ROI through improved model accuracy, reduced infrastructure complexity, and accelerated time-to-insight.
Ready to power your AI initiatives with real-time data? Schedule a consultation at cloud_gcp@rheodata.com to discuss your Oracle-to-BigQuery replication and AI strategy.