
00:00
Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started!
00:26
Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Communications and Adoption Programs with Customer Success Services, and with me is Nikita Abraham, Team Lead: Editorial Services with Oracle University.
Nikita: Hi everyone! Today, we're beginning a brand-new season, this time on Oracle AI Vector Search. Whether you're new to vector searches or you've already been experimenting with AI and data, this episode will help you understand why Oracle's approach is such a game-changer.
Lois: To make sure we're all starting from the same place, here's a quick overview. Oracle AI Vector Search lets you go beyond traditional database searches. Not only can you find data based on specific attribute values or keywords, but you can also search by meaning, using the semantics of your data, which opens up a whole new world of possibilities.
01:20
Nikita: That's right, Lois. And guiding us through this episode is Senior Principal APEX & Apps Dev Instructor Brent Dayley. Hi Brent! What's unique about Oracle's approach to vector search? What are the big benefits?
Brent: Now one of the biggest benefits of Oracle AI Vector Search is that semantic search on unstructured data can be combined with relational search on business data, all in one single system. This is very powerful, and also a lot more effective because you don't need to add a specialized vector database. And this eliminates the pain of data fragmentation between multiple systems.
It also supports Retrieval Augmented Generation, also known as RAG. Now this is a breakthrough generative AI technique that combines large language models and private business data. And this allows you to deliver responses to natural language questions. RAG provides higher accuracy and avoids having to expose private data by including it in the large language model training data.
02:41
Lois: OK, and can you explain what the new VECTOR data type is?
Brent: So, this data type was introduced in Oracle Database 23ai. And it allows you to store vector embeddings alongside other business data.
Now, the VECTOR data type provides a foundation for storing vector embeddings. This allows you to store your business data in the database alongside your unstructured data, and to use both in your queries. So it allows you to apply semantic queries on business data.
03:24
Lois: For many of our listeners, "vector embeddings" might be a new term. Can you explain what vector embeddings are?
Brent: Vector embeddings are mathematical representations of data points. They assign mathematical representations based on meaning and context of your unstructured data.
You have to generate vector embeddings from your unstructured data either outside or within the Oracle Database. In order to get vector embeddings, you can either use ONNX embedding machine learning models or access third-party REST APIs.
Embeddings can be used to represent almost any type of data, including text, audio, or visual such as pictures. And they are used in proximity searches.
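To make "proximity" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and their labels are made up for illustration (real embedding models produce hundreds or thousands of dimensions), and cosine distance is just one common choice of metric, assumed here rather than taken from the episode.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy embeddings, invented for this example only.
embeddings = {
    "kitten": [0.9, 0.1, 0.0],
    "cat": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

d_cat = cosine_distance(embeddings["kitten"], embeddings["cat"])
d_invoice = cosine_distance(embeddings["kitten"], embeddings["invoice"])

# Semantically related items end up closer together than unrelated ones.
print(d_cat < d_invoice)
```

The smaller the distance, the more similar the two items are considered to be.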
04:19
Nikita: Now, searching with these embeddings isn't about looking for exact matches like traditional search, right? This is more about meaning and similarity, even when the words or images differ? Brent, how does similarity search work in this context?
Brent: So vector data tends to be unevenly distributed and clustered into groups that are semantically related. Doing a similarity search based on a given query vector is equivalent to retrieving the k nearest vectors to your query vector in your vector space.
What this means is that basically you need to find an ordered list of vectors by ranking them, where the first row is the closest or most similar vector to the query vector. The second row in the list would be the second closest vector to the query vector, and so on, depending on your data set. What we need to do is to find the relative order of distances. And that's really what matters rather than the actual distance.
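The ranking idea can be sketched in a few lines of Python. This is an illustrative exact search over made-up two-dimensional vectors, not Oracle's implementation. It also shows why only the relative order matters: ranking by squared Euclidean distance, which skips the square root, returns the same ordered list as ranking by the true distance.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    # Same ordering as euclidean(), because sqrt is monotonic.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(query, vectors, k, distance):
    """Exact search: score every vector, sort by distance, keep the k closest ids."""
    ranked = sorted(vectors, key=lambda vid: distance(query, vectors[vid]))
    return ranked[:k]

# Hypothetical stored vectors keyed by row id.
vectors = {
    "a": [1.0, 1.0],
    "b": [2.0, 2.0],
    "c": [5.0, 5.0],
    "d": [0.0, 1.5],
}
query = [1.0, 1.2]

by_distance = top_k(query, vectors, 3, euclidean)
by_squared = top_k(query, vectors, 3, squared_euclidean)
print(by_distance, by_squared)
```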
Now, similarity searches tend to get data from one or more clusters, depending on the value of the query vector and the fetch size. Approximate searches using vector indexes can limit the searches to specific clusters. Exact searches visit vectors across all clusters.
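The difference between exact and approximate search can be illustrated with a toy, partition-style sketch in Python. The clusters, centroids, and the `probes` parameter are assumptions made for this example; it does not describe how Oracle's vector indexes are built internally.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical pre-built clusters: each has a centroid and its member vectors.
clusters = [
    {"centroid": [0.0, 0.0], "members": {"p1": [0.1, 0.2], "p2": [-0.2, 0.1]}},
    {"centroid": [10.0, 10.0], "members": {"q1": [9.8, 10.1], "q2": [10.3, 9.9]}},
    {"centroid": [0.0, 10.0], "members": {"r1": [0.2, 9.7], "r2": [-0.1, 10.4]}},
]

def rank(query, candidates, k):
    return sorted(candidates, key=lambda vid: squared_distance(query, candidates[vid]))[:k]

def exact_search(query, k):
    """Visit the vectors in every cluster."""
    candidates = {}
    for cluster in clusters:
        candidates.update(cluster["members"])
    return rank(query, candidates, k)

def approximate_search(query, k, probes=1):
    """Visit only the `probes` clusters whose centroids are closest to the query."""
    nearest = sorted(clusters, key=lambda c: squared_distance(query, c["centroid"]))[:probes]
    candidates = {}
    for cluster in nearest:
        candidates.update(cluster["members"])
    return rank(query, candidates, k)

query = [9.0, 9.5]
exact = exact_search(query, 2)
approx = approximate_search(query, 2, probes=1)
print(exact, approx)
```

In this toy data the approximate search finds the same two neighbors while scanning a third of the vectors. Asking for more rows than the probed cluster holds shows the trade-off: the approximate search can miss neighbors that live in other clusters.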
05:51
Lois: Let's talk about how we actually convert information into these vectors. There are models behind the scenes, right? Kind of like translators between words, images, and numbers. Brent, what embedding models does Oracle support, and how do they handle different data types?
Brent: Vector embedding models allow you to assign meaning to a word, or a sentence, or the pixels in an image, or perhaps audio. What does that actually mean? It means they allow you to quantify features, or dimensions.
Most modern vector embeddings use a transformer model. Bear in mind that convolutional neural networks can also be used. Depending on the type of your data, you can use different pretrained open-source models to create vector embeddings. As an example, for textual data, sentence transformers can transform words, sentences, or paragraphs into vector embeddings.
For visual data, you can use a residual network, also known as ResNet, to generate vector embeddings. For audio data, you can use a visual spectrogram representation, which lets the audio data fall back into the visual data case. Now, these models can also be based on your own data set. Each model also determines the number of dimensions for your vectors.
As an example, Cohere's embedding model, embed-english-v3.0, has 1,024 dimensions. OpenAI's embedding model, text-embedding-3-large, has 3,072 dimensions.
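The number of dimensions has a direct effect on storage. The back-of-the-envelope sketch below assumes each dimension is stored as a 32-bit float (4 bytes) and ignores any per-row or per-vector overhead; both are assumptions for illustration, not figures from the episode.

```python
BYTES_PER_FLOAT32 = 4  # assumption: one 32-bit float per dimension

def raw_vector_bytes(dimensions, bytes_per_dimension=BYTES_PER_FLOAT32):
    """Raw payload size of one embedding, ignoring storage overhead."""
    return dimensions * bytes_per_dimension

# Dimension counts as quoted in the episode.
models = {
    "Cohere embed-english-v3.0": 1024,
    "OpenAI text-embedding-3-large": 3072,
}

sizes = {name: raw_vector_bytes(dims) for name, dims in models.items()}

# Rough raw size of one million embeddings, in GiB.
million_rows_gib = {name: size * 1_000_000 / 2**30 for name, size in sizes.items()}

for name in models:
    print(name, sizes[name], "bytes per vector,", round(million_rows_gib[name], 2), "GiB per million")
```

Under these assumptions, a 3,072-dimension model needs three times the raw space of a 1,024-dimension one for the same number of rows.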
07:45
Nikita: For organizations ready to put this into practice, there's the question of how to get the models up and running inside Oracle Database. Can you walk us through how these models are brought into Oracle Database?
Brent: Although you can generate vector embeddings outside the Oracle Database using pre-trained open-source embedding models or your own embedding models, you also have the option of generating them within the Oracle Database. In order to do that, you need to use models that are compatible with the Open Neural Network Exchange standard, or ONNX, pronounced "onn-ex."
Oracle Database implements an ONNX runtime directly within the database, and this is going to allow you to generate vector embeddings directly inside the Oracle Database using SQL.
08:41
AI is transforming every industry. So, it's no wonder that AI skills are the most sought-after by employers. If you're ready to dive into AI, check out the OCI AI Foundations training and certification that's available for free! It's the perfect starting point to build your AI knowledge. Head over to mylearn.oracle.com to kickstart your AI journey today!
09:06
Nikita: Welcome back! Let's make this practical. Imagine I'm setting this up for the first time. What are the big steps? Can you walk us through the end-to-end workflow using Oracle AI Vector Search?
Brent: The first step is to generate vector embeddings from your data, either outside the database or within the database. Now, embeddings are a mathematical representation of what your data means. So, what does this long sentence mean, for instance? What are the main keywords out of it?
You can generate embeddings not only on your typical string type of data, but also on other types of data, such as pictures or perhaps audio waveforms.
Maybe we want to convert text strings to embeddings or convert files into text. And then from text, maybe we can chunk that up into smaller chunks and then generate embeddings on those chunks. Maybe we want to convert files to embeddings, or maybe we want to use embeddings for end-to-end search.
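Chunking can be sketched as follows. This is a deliberately simple word-based splitter with overlap, written in plain Python; the function name, the chunk size, and the overlap are all hypothetical choices, and it is not the chunking utility that Oracle Database provides.

```python
def chunk_words(text, chunk_size=8, overlap=2):
    """Split text into chunks of up to chunk_size words, sharing `overlap` words between neighbors."""
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > overlap >= 0")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

document = (
    "Oracle AI Vector Search lets you search by meaning, "
    "using the semantics of your data rather than exact keywords."
)

for chunk in chunk_words(document, chunk_size=8, overlap=2):
    print(chunk)
```

Each chunk would then be passed to an embedding model separately, so that a long document yields several vectors instead of one. The overlap keeps a sentence that straddles a boundary from being split without context.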
Now you have to generate vector embeddings from your unstructured data, as we mentioned, either outside or within the Oracle Database. You can either use the ONNX embedding machine learning models or you can access third-party REST APIs.
You can import pretrained models in ONNX format for vector generation within the database. You can download pretrained embedding machine learning models, convert them into the ONNX format if they are not already in that format. Then you can import those models into the Oracle Database and generate vector embeddings from your data within the database.
Oracle also allows you to convert pre-trained models to the ONNX format using Oracle Machine Learning for Python. This enables the use of text transformers from different companies.
11:36
Nikita: Once those embeddings are generated, what's the next step?
Brent: The next step is to store the vector embeddings. So you can create one or more columns of the VECTOR data type in your standard relational data tables. You can also store those in secondary tables that are related to the primary tables using primary key and foreign key relationships.
Either way, you store the resulting vector embeddings and the associated unstructured data together with your relational business data inside the Oracle Database.
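The "secondary table" layout can be sketched with Python's built-in sqlite3 module. SQLite is only a stand-in here so the example runs anywhere: it has no VECTOR data type, so the embedding is kept as JSON text, and the table and column names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Primary table: ordinary relational business data.
conn.execute("CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Secondary table: one row per chunk, linked back by a foreign key.
conn.execute("""
    CREATE TABLE doc_chunks (
        chunk_id   INTEGER PRIMARY KEY,
        doc_id     INTEGER NOT NULL REFERENCES documents(doc_id),
        chunk_text TEXT NOT NULL,
        embedding  TEXT NOT NULL  -- JSON stand-in for a VECTOR column
    )
""")

conn.execute("INSERT INTO documents (doc_id, title) VALUES (1, 'Returns policy')")
conn.execute(
    "INSERT INTO doc_chunks (doc_id, chunk_text, embedding) VALUES (?, ?, ?)",
    (1, "Items can be returned within 30 days.", json.dumps([0.12, 0.80, 0.05])),
)
conn.commit()

rows = conn.execute(
    "SELECT d.title, c.chunk_text, c.embedding "
    "FROM doc_chunks c JOIN documents d ON d.doc_id = c.doc_id"
).fetchall()
title, chunk_text, raw_embedding = rows[0]
stored_vector = json.loads(raw_embedding)
print(title, chunk_text, stored_vector)
```

The point of the layout is that the embedding travels with the business row it describes, so one join brings back both.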
12:17
Lois: And when do vector indexes come into play?
Brent: Now you may want to create vector indexes in the event that you have huge vector spaces. This is an optional step, but this is beneficial for running similarity searches over those huge vector spaces.
12:38
Nikita: Now, once all of that is in place, how do users perform similarity searches?
Brent: So once you have generated the vector embeddings and stored those vector embeddings and possibly created the vector indexes, you can then query your data with similarity searches. This allows for native SQL operations and allows you to combine similarity searches with relational searches in order to retrieve relevant data.
So let's take a look at the combined complete workflow. Step number one, generate the vector embeddings from your unstructured data. Step number two, store the vector embeddings. Step number three, optionally create vector indexes. And step number four, combine similarity and relational searches.
Now there is another optional step. You could generate a prompt and send it to a large language model for a full RAG inference. You can use the similarity search results to generate a prompt and send it to your generative large language model in order to complete your RAG pipeline.
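The prompt-building part of that optional step can be sketched as below. The function name, the prompt wording, and the sample chunks are all hypothetical, and the call to the large language model itself is left out because it depends on which provider you use.

```python
def build_rag_prompt(question, retrieved_chunks, max_chunks=3):
    """Assemble a prompt from the top similarity search results and the user's question."""
    context = "\n".join(
        "[%d] %s" % (i, chunk)
        for i, chunk in enumerate(retrieved_chunks[:max_chunks], start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + question + "\n"
        "Answer:"
    )

# Hypothetical chunks, already ordered from most to least similar.
chunks = [
    "Returns are accepted within 30 days.",
    "Refunds go to the original payment method.",
    "Gift cards are non-refundable.",
    "Stores open at 9am.",
]

prompt = build_rag_prompt("How long do I have to return an item?", chunks, max_chunks=3)
print(prompt)
```

Because the private data is supplied in the prompt at query time, it does not need to be part of the model's training data.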
14:07
Lois: Thanks for that detailed walk-through, Brent. To sum up, today we introduced Oracle AI Vector Search, discussed its core concepts, data types, embedding models, and the complete workflow you'll use to get real value out of your business data, securely and efficiently.
Nikita: If you want to learn more about the topics we discussed today, go to mylearn.oracle.com and search for the Oracle AI Vector Search Fundamentals course. And if you're feeling inspired to try this out for yourself, don't forget to check out the Oracle Database 23ai SQL Workshop for hands-on training. Until next time, this is Nikita Abraham…
Lois: And Lois Houston, signing off!
14:49
That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
By Oracle Corporation