The age of AI with Oracle GoldenGate 23ai
Artificial Intelligence (AI) is buzzing in today’s enterprise circles. Let’s briefly examine using AI in data integration, specifically with Oracle GoldenGate. First, what are the benefits of AI in data integration?
Using AI in data integration offers several significant benefits, improving the efficiency, accuracy, and overall capability of data management processes. Here are some of the key advantages:
- Automation of Repetitive Tasks: AI can automate routine data integration tasks, reducing the need for manual intervention. This includes data extraction, transformation, and loading (ETL) processes, which can be labor-intensive and error-prone when done manually.
- Improved Data Quality: AI algorithms can detect and correct errors, inconsistencies, and duplicates in data. Machine learning models can learn from historical data to identify patterns and anomalies, ensuring higher data quality.
- Enhanced Data Matching and Merging: AI can improve the accuracy of data matching and merging from different sources by using advanced algorithms to reconcile discrepancies and integrate data seamlessly. This is particularly useful when data comes from various formats and systems.
- Real-time Data Integration: AI can facilitate real-time data integration by continuously monitoring and updating data streams. This ensures that integrated data is always up to date, supporting real-time analytics and decision-making.
- Scalability: AI-driven data integration solutions can scale efficiently with growing data volumes and complexity. They can handle large datasets and complex data structures more effectively than traditional methods.
- Intelligent Data Mapping: AI can automate the process of data mapping, where data fields from different sources are aligned with each other. Machine learning models can infer and adapt to changes in data schema, reducing the need for manual mapping.
- Cost Efficiency: AI can significantly reduce operational costs by automating many aspects of data integration. It minimizes the need for extensive human resources and reduces errors that might lead to costly rectifications.
- Enhanced Data Governance and Compliance: AI can help maintain compliance with data governance standards and regulations by automatically enforcing data quality rules, auditing data usage, and ensuring proper data lineage tracking.
- Better Insights and Analytics: Integrated data enriched by AI capabilities allows for deeper insights and more sophisticated analytics. AI can help uncover hidden patterns, trends, and correlations within integrated datasets, leading to more informed business decisions.
- Handling Unstructured Data: AI is particularly adept at handling unstructured data (e.g., text, images, videos) and integrating it with structured data, providing a more comprehensive view of the data landscape.
- Personalization and Customization: AI can tailor data integration processes to specific business needs, learning and adapting to unique requirements and preferences, leading to more effective and relevant data integration solutions.
By leveraging AI in data integration, organizations can achieve more reliable, efficient, and scalable data management, ultimately driving better business outcomes and gaining a competitive edge.
These eleven benefits are all great in concept, but how do you put these into practice?
Oracle GoldenGate has long been the data integration platform of choice for many organizations. With the recent rise in AI, many organizations seek solutions to build robust AI infrastructure with minimal downtime and maximum value.
Among the data integration platforms available (Oracle GoldenGate, Fivetran, Qlik, Airbyte, Striim, and many others), Oracle GoldenGate has the most robust and stable approach to tackling the new age of AI, although it is currently geared mainly towards Oracle and PostgreSQL workloads.
With the release of Oracle GoldenGate 23ai, many are excited to see AI at work. However, the 23ai in Oracle GoldenGate 23ai only means that Oracle GoldenGate can replicate the new datatype in Oracle Database 23ai: the vector datatype. The core replication concepts that Oracle GoldenGate follows are still the bedrock of replication.
Since we touched on the vector datatype, let’s look at what vectors are and their benefits.
Vectors
Vectors are nothing new within the IT industry; after all, search engines have used different vectors for decades. The vectors being introduced now are high-dimensional numerical representations of data items. These vectors capture the semantic meaning of the data, enabling machines to understand and process the information effectively.
A primary use of these vectors is within Retrieval Augmented Generation (RAG), where they facilitate efficient and accurate retrieval of relevant information from a large corpus of structured and unstructured data.
How Vectors are used
There are four steps to using vectors with Retrieval Augmented Generation (RAG). The basic concepts are:
- Query Encoding: When a user inputs a query, it is transformed into a vector. This process is known as query encoding.
- Document Encoding: Similarly, all documents in the corpus are pre-encoded into vectors.
- Similarity Search: The query vector is compared against the document vectors using a similarity measure (e.g., cosine similarity). The documents whose vectors are most similar to the query vector are retrieved as relevant results.
- Generation: After the similarity search retrieves the relevant documents, these are used to augment the generative model. The generative model uses the information from the retrieved documents to generate a coherent and contextually appropriate response.
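The four steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how Oracle Database 23ai or any production RAG system works: the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, the sample documents are invented, and the generation step is reduced to building the prompt that would be handed to a language model.

```python
import math
from collections import Counter


def embed(text, vocabulary):
    """Toy embedding (assumption): word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, vocabulary, top_k=1):
    query_vec = embed(query, vocabulary)                   # 1. query encoding
    doc_vecs = [embed(d, vocabulary) for d in documents]   # 2. document encoding
    ranked = sorted(                                       # 3. similarity search
        zip(documents, doc_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(query, retrieved):
    """4. generation input: retrieved text augments the user question."""
    context = "\n".join(retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Invented sample corpus for illustration only.
documents = [
    "replicat applies changes to the warehouse",
    "extract captures changes from the source",
    "vectors represent semantic meaning",
]
vocabulary = sorted({word for doc in documents for word in doc.split()})

query = "which process captures changes from the source"
top_documents = retrieve(query, documents, vocabulary)
prompt = build_prompt(query, top_documents)
print(prompt)
```

In a real system the embedding would come from a trained model and the similarity search would run inside the vector database, but the shape of the flow is the same.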
Benefits of using Vectors with RAG systems
- Efficiency: Vectors allow for fast and efficient retrieval of information from large datasets.
- Accuracy: By capturing semantic meanings, vectors improve the accuracy of retrieval and the relevance of generated responses.
- Scalability: Vector-based retrieval scales well with increasing data sizes, maintaining performance even with large corpora.
- Contextual Understanding: Vectors enable the system to understand and leverage the context of queries and documents, enhancing the overall quality of generated output.
Now, with a brief understanding of what vectors are, how they are used, and their benefits, how does this apply to Oracle GoldenGate 23ai?
Consider a simple uni-directional use case: populating a data warehouse running Oracle Database 23ai (cloud or on-premises) in the manufacturing vertical.
Basic replication steps
Data is captured from data marts located at remote sites and applied to the data warehouse at a central site. The steps would look like:
- Source: Oracle GoldenGate (Extract) captures changed data from data marts.
- Source: Captured data is placed in trail files.
- Source: Trail files are shipped across the network (if needed).
- Remote: Trail files are staged.
- Remote: Trail files are read by the apply process.
- Remote: Oracle GoldenGate (Replicat) applies changed data to the data warehouse.
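To make the flow concrete, here is a toy simulation of the capture, ship, and apply steps. It is a sketch only and does not use GoldenGate's API or trail format: `ChangeRecord`, `extract`, `ship`, and `replicat` are hypothetical names, the "trail" is just a list of JSON lines in memory, and the sample rows (including the `embedding` column) are invented.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ChangeRecord:
    op: str      # "INSERT", "UPDATE", or "DELETE"
    table: str
    key: int
    row: dict    # column values, may include a vector column


def extract(changes):
    """Source side: write captured changes to an in-memory 'trail'."""
    return [json.dumps(asdict(change)) for change in changes]


def ship(trail):
    """Stand-in for network transfer: stage a copy at the remote site."""
    return list(trail)


def replicat(trail, warehouse):
    """Remote side: read the staged trail in order and apply each change."""
    for line in trail:
        record = ChangeRecord(**json.loads(line))
        table = warehouse.setdefault(record.table, {})
        if record.op == "DELETE":
            table.pop(record.key, None)
        else:  # INSERT and UPDATE both store the latest row image
            table[record.key] = record.row
    return warehouse


# Invented changes from a hypothetical data mart.
changes = [
    ChangeRecord("INSERT", "parts", 1, {"name": "gear", "embedding": [0.1, 0.2]}),
    ChangeRecord("INSERT", "parts", 2, {"name": "bolt", "embedding": [0.5, 0.4]}),
    ChangeRecord("UPDATE", "parts", 1, {"name": "gear v2", "embedding": [0.1, 0.3]}),
    ChangeRecord("DELETE", "parts", 2, {}),
]

warehouse = replicat(ship(extract(changes)), {})
print(warehouse)
```

The point of the sketch is ordering: changes are applied at the target in the same sequence in which they were captured, so the warehouse ends up in the same state as the source.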
Where does AI come in
The six steps in the previous section follow general data replication principles, which are often framed in terms of the CAP theorem. CAP describes three properties of a distributed data system: consistency (among replicated copies), availability (of the system for read/write operations), and partition tolerance (continued operation when a network fault partitions the nodes in the system). The theorem states that when a partition occurs, a system must trade consistency against availability, so a replication design has to decide how it balances the three.
Overlaying AI onto these replication steps means nothing more than ensuring that vectors, the data that drives a RAG system, are available wherever a RAG system needs them. In Oracle GoldenGate 23ai, Oracle has ensured that vectors can be captured, transferred, and applied to data platforms that support the vector datatype. This enables organizations to bring their real-time unstructured data to their AI applications and deliver next-generation user experiences.
Data Platforms that support Vectors
Oracle GoldenGate 23ai has added to its extensive heterogeneous platforms by including popular vector databases in its growing portfolio. The new additions to the portfolio are:
- Oracle Database 23ai (OCI & On-Premises)
- MySQL HeatWave
- Postgres + pgVector (OCI)
- Postgres + pgVector
- EnterpriseDB + pgVector
- AlloyDB
- Amazon RDS
- Amazon Aurora
- Azure Postgres
- Elasticsearch
- OpenSearch
Now that Oracle GoldenGate 23ai supports so many Oracle and non-Oracle vector platforms, it is possible to get your mission-critical unstructured data to where it is needed. The data integration patterns below can be used immediately after upgrading to Oracle GoldenGate 23ai:
- Migration of vectors to Oracle Database 23ai vector database
- Multi-Master/Multi-Cloud/Active-Active database replication
- Consolidation of vector changes
Because an organization’s real-time unstructured data can now be represented as vectors and replicated, both structured and unstructured data can be moved across the enterprise.
As Artificial Intelligence (AI) starts to take hold in organizations, the typical use case that will leverage Oracle GoldenGate 23ai will be private Retrieval Augmented Generation (RAG) systems. These private RAG systems will leverage Oracle GoldenGate 23ai by moving structured and unstructured data to a central data hub where vectors will be leveraged against private large language models.
An illustration of the architecture this would follow is:
In this illustration, you can see that Oracle GoldenGate 23ai is moving data from various database sources to the data warehouse in real-time. Don’t let the illustration fool you; the sources that Oracle GoldenGate 23ai can capture include more than databases now. New capture types that can be supported include ERP & SaaS applications, Vector stores, Event Messaging, and NoSQL.
Pitfalls
Although Oracle GoldenGate 23ai makes it easy to move structured and unstructured data in real-time, including vectors, there is an inherent problem here. This problem is common with all vector databases – what embedding model is used?
As AI grows and is used more in enterprises, ensuring that the correct embedding model is used to embed data will be critical. Being able to replicate vectors and feed Retrieval Augmented Generation (RAG) platforms with fresh and relevant data does not mean that data sources can use different embeddings, because vectors produced by different embedding models are generally not comparable in a similarity search. What needs to be understood and examined:
- What embedding model is being used at each site?
- How does an organization standardize an embedding model?
- How do you troubleshoot embedding models?
- Who is curating the embedding models and ensuring validity?
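One way to start answering these questions is to record which embedding model each site uses and compare it against an agreed standard before any vectors are replicated. The sketch below is an assumption about how such a check could look, not a GoldenGate feature: `EmbeddingSpec`, the site names, and the model name `example-embedder` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    model_name: str
    model_version: str
    dimensions: int


def find_mismatched_sites(site_specs, standard):
    """Return the sites whose embedding spec differs from the standard."""
    return {
        site: spec
        for site, spec in site_specs.items()
        if spec != standard
    }


def vector_matches_spec(vector, spec):
    """Cheap sanity check: the vector length must equal the model dimension."""
    return len(vector) == spec.dimensions


# Hypothetical organization-wide standard and per-site inventory.
standard = EmbeddingSpec("example-embedder", "1.0", 4)
site_specs = {
    "plant_a": EmbeddingSpec("example-embedder", "1.0", 4),
    "plant_b": EmbeddingSpec("example-embedder", "0.9", 4),
    "plant_c": EmbeddingSpec("other-embedder", "2.1", 8),
}

mismatched = find_mismatched_sites(site_specs, standard)
for site, spec in sorted(mismatched.items()):
    print(f"{site}: {spec.model_name} {spec.model_version} ({spec.dimensions} dims)")
```

A matching dimension does not prove that two vectors came from the same model, so the dimension check only catches the most obvious mismatches; the model name and version still need to be tracked alongside the data.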
These are only some of the questions that should be asked or evaluated before building a Retrieval Augmented Generation (RAG) system or implementing real-time replication of vectors with Oracle GoldenGate 23ai.
In closing, Oracle GoldenGate 23ai is a leap forward in ensuring that enterprises can quickly and dynamically build Retrieval Augmented Generation (RAG) systems for their private and secure use cases.