Neural search

Nomic Atlas enables you to search your dataset semantically with vector search.

You can run neural search over embeddings generated by Nomic Embedding models or your own.

In this example, we create a dataset of 25,000 news articles with the default Nomic Text Embedding model and run various types of semantic search.

from nomic import atlas
import pandas

news_articles = pandas.read_csv('https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv')

dataset = atlas.map_data(data=news_articles, indexed_field='text')
print(dataset)

Searching by datapoint

Running vector_search on your dataset returns the IDs of the k datapoints closest to each query point, along with their distances.

# Load map and perform vector search
map = dataset.maps[0]

# Run vector search on your map for points with ID numbers 100, 111, 112
neighbors, distances = map.embeddings.vector_search(ids=[100,111,112], k=5)

Use the returned IDs to look up and print the datapoints:

# Print the 5 datapoints most similar to your first query point (ID 100)
# Your query: 'The US team is set for Spain for the Davis Cup final NEW YORK Andy Roddick and Mardy Fish will represent the United States in singles play at next month #39;s Davis Cup final against Spain.'
similar_datapoints = dataset.get_data(ids=neighbors[0])
for i, point in enumerate(similar_datapoints):
    if i == 0:
        # The first result is the query point itself
        print('Initial point:', point, '\n')
        print('Nearest neighbors:')
    else:
        print(point)
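
For intuition about what a k-nearest-neighbor lookup by ID computes, the following is a brute-force sketch in plain NumPy. It is not how Atlas implements vector_search. The function name knn_by_id, the toy embeddings, and the choice of cosine similarity are all assumptions made for illustration.

```python
import numpy as np

def knn_by_id(embeddings, query_ids, k):
    """Brute-force k-nearest neighbors by cosine similarity (illustrative only)."""
    # Normalize rows so that a dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Similarity of each query point to every point in the dataset
    sims = normed[query_ids] @ normed.T
    # Sort each row by descending similarity and keep the top k
    order = np.argsort(-sims, axis=1)[:, :k]
    return order, np.take_along_axis(sims, order, axis=1)

# Toy data: 200 random 8-dimensional vectors standing in for real embeddings
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(200, 8))

neighbor_ids, similarities = knn_by_id(toy_embeddings, [100, 111, 112], k=5)
print(neighbor_ids[0])
```

In this sketch each query point comes back as its own nearest neighbor, which mirrors the 'Initial point' handling in the loop above.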

Searching by embedding

You can also run a vector search with a query vector instead of an ID. The search returns the nearest neighbors of each input vector.

import numpy as np

# Generates a random query vector
random_query_vector = np.random.rand(1, 768)

# Searches for k-nearest neighbors of random_query_vector
with dataset.wait_for_dataset_lock():
    neighbors, distances = map.embeddings.vector_search(queries=random_query_vector, k=10)

print("Neighbor IDs:", neighbors)

# Look up the data for each query vector's neighbors
for query_neighbors in neighbors:
    neighbor_data = dataset.get_data(ids=query_neighbors)
    print(f"The ten nearest neighbors to the query vector are {neighbor_data}")
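
The example above passes its query as a 2-D array of shape (1, 768), one row per query. A small helper along these lines can catch shape mistakes before a search is run. The helper as_query_matrix is hypothetical and not part of the nomic package, and the default dimension of 768 is taken only from the random vector in the example.

```python
import numpy as np

EMBEDDING_DIM = 768  # assumption: matches the width of random_query_vector above

def as_query_matrix(vectors, dim=EMBEDDING_DIM):
    """Return vectors as a 2-D float array of shape (n_queries, dim)."""
    arr = np.asarray(vectors, dtype=float)
    # Promote a single 1-D vector to a one-row matrix
    if arr.ndim == 1:
        arr = arr[np.newaxis, :]
    if arr.ndim != 2 or arr.shape[1] != dim:
        raise ValueError(f"expected shape (n_queries, {dim}), got {arr.shape}")
    return arr

single = as_query_matrix(np.random.rand(768))
batch = as_query_matrix(np.random.rand(3, 768))
print(single.shape, batch.shape)  # (1, 768) (3, 768)
```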

Retrieving search distances

# Load map and perform vector search
map = dataset.maps[0]

with dataset.wait_for_dataset_lock():
    neighbors, distances = map.embeddings.vector_search(ids=[100,111,112], k=5)

print(distances)
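
To read the results side by side, each neighbor ID can be paired with its distance. The sketch below assumes that neighbors and distances have the same shape, with one row per query and one column per neighbor, and that they are aligned position by position. The arrays are made-up stand-in values, not real search output, and pair_results is a hypothetical helper.

```python
import numpy as np

# Stand-in values shaped like search results: one row per query, one column per neighbor
neighbors = np.array([[100, 7, 42], [111, 3, 58]])
distances = np.array([[0.0, 0.21, 0.35], [0.0, 0.18, 0.40]])

def pair_results(neighbors, distances):
    """Map each query row to a list of (neighbor_id, distance) tuples."""
    return [
        [(int(n), float(d)) for n, d in zip(row_ids, row_dists)]
        for row_ids, row_dists in zip(neighbors, distances)
    ]

for query_index, pairs in enumerate(pair_results(neighbors, distances)):
    print(f"query {query_index}:", pairs)
```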