Vector
Exposed supports vector data types, allowing you to store and query high-dimensional embeddings directly in your database and to run AI-powered similarity searches. Oracle, SQL Server, MySQL, and MariaDB all support explicit vector types, while PostgreSQL provides one through the open-source pgvector extension.
Supported types
The exposed-core module supports three main ways to define vector columns using the .vector() method:
The most common vector definition, which accepts FloatArray values and lets you define the dimensions:
    val embedding = vector("embedding", dimensions = 3)
This sets the element format to 32-bit floating-point numbers, if your database supports explicit formats.
Vector definition with type inference that sets the underlying format. Currently, the only options available are FloatArray and IntArray:
    val embedding = vector<IntArray>("embedding", 99)
By default, FloatArray sets a 32-bit floating-point format, while IntArray sets an 8-bit integer format.
Full vector definition with control over the accepted input type, the dimensions, and the underlying format:
    val embedding = vector<FloatArray>("embedding", 2049, VectorFormat.FLOAT64)
This is useful, for example, when an alternative floating-point format is required, such as 64-bit numbers.
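To get a feel for what the format choice means in practice, the following plain-Kotlin sketch estimates the raw payload of one vector as dimensions multiplied by element width. `ElementFormat` and `payloadBytes` are hypothetical names used only for illustration; they are not part of Exposed, and actual on-disk size depends on the database and includes overhead not counted here.

```kotlin
// Hypothetical stand-in for the element formats named above; this is NOT Exposed's VectorFormat.
enum class ElementFormat(val bytesPerElement: Int) {
    INT8(1),
    FLOAT32(4),
    FLOAT64(8)
}

// Raw payload only: dimensions x element width. Database-specific overhead is ignored.
fun payloadBytes(dimensions: Int, format: ElementFormat): Long {
    require(dimensions > 0) { "dimensions must be positive" }
    return dimensions.toLong() * format.bytesPerElement
}

fun main() {
    println(payloadBytes(3, ElementFormat.FLOAT32))    // 12
    println(payloadBytes(99, ElementFormat.INT8))      // 99
    println(payloadBytes(2049, ElementFormat.FLOAT64)) // 16392
}
```

The same number of dimensions costs eight times more payload in a 64-bit floating-point format than in an 8-bit integer format, which is one reason to choose the narrowest format that preserves the precision you need.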
Basic usage
Vector columns store and retrieve data as Kotlin primitive arrays, either FloatArray or IntArray, providing a natural way to work with vector data in your code.
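Because the values are ordinary primitive arrays, the usual Kotlin array rules apply. The sketch below is plain Kotlin with no Exposed calls; `toEmbedding` is a hypothetical helper, and the `List<Double>` input is only an assumption about what an embedding client might return. It shows narrowing values to a FloatArray and comparing arrays by content rather than by reference.

```kotlin
// Hypothetical helper: narrows a list of doubles to the FloatArray a default vector column accepts.
fun toEmbedding(values: List<Double>): FloatArray =
    FloatArray(values.size) { index -> values[index].toFloat() }

fun main() {
    val embedding = toEmbedding(listOf(0.25, 0.5, 0.75))
    val expected = floatArrayOf(0.25f, 0.5f, 0.75f)

    // == on arrays compares references, so two distinct arrays are never equal this way.
    println(embedding == expected)              // false
    // contentEquals compares element by element.
    println(embedding.contentEquals(expected))  // true
}
```

When asserting on vectors read back from the database, `contentEquals` (or a tolerance-based comparison for floating-point data) is the safer choice.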
You can define vector columns within a table definition as follows:
Here's an example of inserting data into vector columns:
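A minimal sketch of a table definition and an insert might look like the following. Only the `vector(...)` calls are taken from the definitions described earlier; the `Documents` table, its other columns, and the `insertDocument` function are hypothetical, and the package names assume Exposed 1.x with the JDBC module, so adjust the imports to match your version.

```kotlin
// Package names are an assumption (Exposed 1.x, JDBC module).
import org.jetbrains.exposed.v1.core.*
import org.jetbrains.exposed.v1.jdbc.insert
import org.jetbrains.exposed.v1.jdbc.transactions.transaction

// Hypothetical table: only the vector(...) definitions come from the documentation text.
object Documents : Table("documents") {
    val id = integer("id").autoIncrement()
    val title = varchar("title", 128)
    val embedding = vector("embedding", dimensions = 3)  // FloatArray, 32-bit floats by default
    val quantized = vector<IntArray>("quantized", 3)     // IntArray, 8-bit integers by default

    override val primaryKey = PrimaryKey(id)
}

// Assumes a database connection is already configured and the table has been created.
fun insertDocument() {
    transaction {
        Documents.insert {
            it[title] = "Getting started"
            it[embedding] = floatArrayOf(0.1f, 0.2f, 0.3f)
            it[quantized] = intArrayOf(12, -7, 127)  // values kept within the 8-bit range
        }
    }
}
```

The arrays passed to each column should contain exactly as many elements as the dimensions declared for that column.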
PostgreSQL usage
PostgreSQL provides support for the vector data type through the installation of the pgvector extension. Exposed assumes that this extension has already been enabled at some point prior to table creation or use of a vector column:
If this is not the case, usage of the column will result in a database exception.
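The extension is enabled once per database with pgvector's standard `CREATE EXTENSION` statement. One way to run it through Exposed is sketched below, assuming Exposed 1.x with the JDBC module (the package names are an assumption) and a database role that is allowed to create extensions; `enablePgVector` is a hypothetical helper name.

```kotlin
// Package names are an assumption (Exposed 1.x, JDBC module).
import org.jetbrains.exposed.v1.jdbc.Database
import org.jetbrains.exposed.v1.jdbc.transactions.transaction

// Hypothetical helper: enables pgvector in the connected PostgreSQL database.
// IF NOT EXISTS makes the statement safe to repeat, for example on every application start.
fun enablePgVector(db: Database) {
    transaction(db) {
        exec("CREATE EXTENSION IF NOT EXISTS vector")
    }
}
```

Calling a helper like this before creating any table with a vector column avoids the exception described above.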
Vector functions
Exposed provides vector distance functions that calculate results using one of the following distance metrics:
Cosine
Euclidean
Dot product
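To make the three metrics concrete, the following plain-Kotlin sketch computes them in memory for two FloatArray values. These are not Exposed's functions, and the names `dotProduct`, `euclideanDistance`, and `cosineDistance` are hypothetical; in a real query the database performs the calculation.

```kotlin
import kotlin.math.sqrt

// Sum of element-wise products; larger values mean more similar for normalized vectors.
fun dotProduct(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimensions" }
    var sum = 0.0
    for (i in a.indices) sum += a[i].toDouble() * b[i]
    return sum
}

// Straight-line (L2) distance between the two points.
fun euclideanDistance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimensions" }
    var sum = 0.0
    for (i in a.indices) {
        val diff = a[i].toDouble() - b[i]
        sum += diff * diff
    }
    return sqrt(sum)
}

// 1 minus cosine similarity: 0 for the same direction, 1 for orthogonal vectors.
fun cosineDistance(a: FloatArray, b: FloatArray): Double {
    val norms = sqrt(dotProduct(a, a)) * sqrt(dotProduct(b, b))
    require(norms > 0.0) { "Cosine distance is undefined for zero vectors" }
    return 1.0 - dotProduct(a, b) / norms
}

fun main() {
    val a = floatArrayOf(1f, 2f, 3f)
    val b = floatArrayOf(4f, 5f, 6f)
    println(dotProduct(a, b))  // 32.0
    println(euclideanDistance(floatArrayOf(0f, 0f), floatArrayOf(3f, 4f)))  // 5.0
    println(cosineDistance(a, b))
}
```

Databases may report these values with their own conventions. pgvector, for instance, returns cosine distance rather than cosine similarity and negates the inner product so that smaller results always mean closer vectors.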
Here's an example of querying vector data using a distance function: