
Vector Search

Nomic Atlas enables you to search your dataset semantically with vector search.

You can run neural search over embeddings generated by Nomic Embedding models or your own.

In this example, we create a dataset of 25,000 news articles with the default Nomic Text Embedding model and run various types of semantic search.

from nomic import atlas
import pandas

news_articles = pandas.read_csv('https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv')

dataset = atlas.map_data(data=news_articles, indexed_field='text')
print(dataset)

Atlas Interface

Vector search is built into the Atlas interface. Use it to visualize semantic neighborhoods and create complex data selections.

Vector Search Tool

Open the vector search modal by clicking its selection icon or using the hotkey 'V'.


Vector search has three modes:

  1. Query Search (Default): Find data points that best answer a question you have about your data (e.g. "What's happening with the stock market?").

  2. Document Search: Find data points that are most similar to a text sample you provide (e.g. "Stock market performance").

  3. Embedding Search: Find data points that are close to an embedding you supply (e.g. [0.42, 0.87, 0.013, ...]). This is the only mode supported for Embedding datasets. The search will fail or return incorrect results if your embedding does not have the same dimensionality as your dataset's embeddings, or if it comes from a different embedding model or space.
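The dimensionality requirement for Embedding Search can be checked before you paste a vector into the search box. The following is a minimal sketch, not part of the Atlas API: `check_query_embedding` is a hypothetical helper, and 768 is an assumed dimensionality that you should replace with the one your dataset actually uses.

```python
from typing import List, Sequence


def check_query_embedding(query: Sequence[float], dataset_dim: int) -> List[float]:
    """Return the query as a list of floats, or raise if its length is wrong.

    Hypothetical helper for illustration; Atlas does not provide it.
    """
    values = [float(x) for x in query]
    if len(values) != dataset_dim:
        raise ValueError(
            f"Query has {len(values)} dimensions, "
            f"but the dataset embeddings have {dataset_dim}."
        )
    return values


# Assumption: the dataset embeddings are 768-dimensional.
query = check_query_embedding([0.1] * 768, dataset_dim=768)
print(len(query))
```

A length check like this cannot tell whether the vector comes from the same embedding model or space as the dataset, so that part still has to be verified by hand.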

After a successful search, a slider appears. It ranges over similarity values (dot products): the larger the value, the more similar the data point is to the search vector. Drag the slider left to include less similar data points and right to keep only more similar ones. The displayed percentages show the percentile of data points captured between the similarity cutoffs. The Atlas map selection updates automatically as the slider range changes.
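As a rough illustration of what the slider does, the sketch below scores every point by its dot product with a search vector and keeps the points whose score falls between two cutoffs. The function names, the toy 2-dimensional embeddings, and the way the percentage is computed are assumptions made for this example; Atlas may compute its displayed values differently.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_by_similarity(embeddings, query, low, high):
    """Return the indices whose similarity lies in [low, high]
    and the percentage of all points that were selected."""
    similarities = [dot(e, query) for e in embeddings]
    selected = [i for i, s in enumerate(similarities) if low <= s <= high]
    percent = 100.0 * len(selected) / len(embeddings)
    return selected, percent


# Toy data (made up): four 2-dimensional embeddings and one search vector.
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
query = [1.0, 0.0]

# Narrow range: only the most similar points are kept.
selected, percent = select_by_similarity(embeddings, query, low=0.5, high=1.0)
print(selected, percent)

# Moving the lower cutoff left admits less similar points.
wider, wider_percent = select_by_similarity(embeddings, query, low=-0.5, high=1.0)
print(wider, wider_percent)
```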

Vector Search Results

Vector Search as a Filter

Vector Search is part of the selection paradigm within Atlas. This means that out-of-the-box, you can combine your vector search filter with other Atlas tools to compose complex data selections.

Below, we join our vector search results with an exact term search and a visual lasso to find a subset of technology stock articles we are interested in.
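Conceptually, composing selections like this can be pictured as combining sets of data point IDs. The sketch below assumes the three filters are combined as an intersection; the article texts and the ID sets are made up for illustration and do not come from Atlas.

```python
# Made-up articles keyed by data point ID.
articles = {
    0: "Tech stocks rally as chipmakers post strong earnings",
    1: "Davis Cup final: US team set for Spain",
    2: "Oil stocks slide on supply fears",
    3: "Software stocks lead the market higher",
}

# Hypothetical IDs returned by a vector search about the stock market.
vector_search_ids = {0, 2, 3}

# Hypothetical IDs enclosed by a lasso drawn around a technology region of the map.
lasso_ids = {0, 1, 3}

# Exact term search: keep articles whose text contains "stocks".
term_ids = {i for i, text in articles.items() if "stocks" in text.lower()}

# Assumption: the combined selection keeps only points matched by all three filters.
selection = vector_search_ids & term_ids & lasso_ids
print(sorted(selection))
```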

Vector Search Filter

Nomic Multimodality

Nomic Text and Vision embedding models provide compatible, aligned embeddings (see details here). This means you can run text-to-image and image-to-text vector searches on your data (e.g. find cat articles by providing a picture of a cat, or find cat images that match the query "What animals are cute to cuddle with?").

Vector Search Text-to-Image

Python

Searching by datapoint

Running vector_search on your dataset returns the IDs of the k closest datapoints to each query point, along with their distances.

# Load map and perform vector search
map = dataset.maps[0]

# Run vector search on your map for points with ID numbers 100, 111, 112
neighbors, distances = map.embeddings.vector_search(ids=[100,111,112], k=5)

Use the returned IDs to print the corresponding datapoints:

# Print the 5 datapoints most similar to your first query point (id #100)
# Your query: 'The US team is set for Spain for the Davis Cup final NEW YORK Andy Roddick and Mardy Fish will represent the United States in singles play at next month #39;s Davis Cup final against Spain.',
similar_datapoints = dataset.get_data(ids=neighbors[0])
for i, point in enumerate(similar_datapoints):
    if i == 0:
        # The first result is the query point itself
        print('Initial point:', point, '\n')
        print('Nearest neighbors:')
    else:
        print(point)

Searching by embedding

You can also search with a query vector instead of an ID. In this case, vector_search finds the nearest neighbors of your input vector.

import numpy as np

# Generates a random query vector
random_query_vector = np.random.rand(1, 768)

# Searches for k-nearest neighbors of random_query_vector
with dataset.wait_for_dataset_lock():
    neighbors, distances = map.embeddings.vector_search(queries=random_query_vector, k=10)

print("Neighbor IDs:", neighbors)

# One row of neighbor IDs is returned per query vector
for query_index, query_neighbors in enumerate(neighbors):
    neighbor_data = dataset.get_data(ids=query_neighbors)
    print(f"The ten nearest neighbors to query vector {query_index} are {neighbor_data}")
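To make the idea of a k-nearest-neighbor search over embeddings concrete, here is a minimal brute-force sketch in plain Python. It is not how Atlas implements vector_search: `brute_force_vector_search` is a hypothetical function, the toy vectors are made up, and it ranks by dot-product similarity (higher is closer), whereas the values Atlas returns as `distances` may be defined differently.

```python
def brute_force_vector_search(embeddings, queries, k):
    """For each query vector, return the indices of the k most similar
    embeddings and their dot-product similarity scores."""
    all_neighbors, all_scores = [], []
    for query in queries:
        # Score every stored embedding against this query.
        similarities = [
            sum(x * y for x, y in zip(embedding, query))
            for embedding in embeddings
        ]
        # Sort indices from most to least similar and keep the first k.
        ranked = sorted(
            range(len(embeddings)),
            key=lambda i: similarities[i],
            reverse=True,
        )[:k]
        all_neighbors.append(ranked)
        all_scores.append([similarities[i] for i in ranked])
    return all_neighbors, all_scores


# Toy data (made up): four stored embeddings and a single query vector.
toy_embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
toy_queries = [[0.0, 1.0]]

toy_neighbors, toy_scores = brute_force_vector_search(toy_embeddings, toy_queries, k=2)
print(toy_neighbors, toy_scores)
```

Like the call above, this returns one row of results per query vector, so a single query still yields a nested list.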

Retrieving search distances

# Load map and perform vector search
map = dataset.maps[0]

with dataset.wait_for_dataset_lock():
    neighbors, distances = map.embeddings.vector_search(ids=[100,111,112], k=5)

print(distances)
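Distances are usually easier to read next to the IDs they belong to. The sketch below assumes that `distances` has the same shape as `neighbors`, with the entry at row i, column j describing the j-th neighbor of the i-th query. The helper `pair_neighbors` and the sample values are made up for illustration and are not real Atlas output.

```python
def pair_neighbors(neighbors, distances):
    """Zip each row of neighbor IDs with the matching row of distances."""
    return [
        list(zip(neighbor_row, distance_row))
        for neighbor_row, distance_row in zip(neighbors, distances)
    ]


# Placeholder values standing in for the results of vector_search.
sample_neighbors = [[100, 17, 42], [111, 8, 93]]
sample_distances = [[0.0, 0.21, 0.35], [0.0, 0.18, 0.4]]

for query_row in pair_neighbors(sample_neighbors, sample_distances):
    for neighbor_id, distance in query_row:
        print(f"id={neighbor_id} distance={distance}")
    print()
```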