
Data upload

See the API Reference for details and additional options.

Creating an embedding dataset

The following minimal example uploads a matrix of embeddings to Atlas and creates a dataset you can explore there.

from nomic import atlas
import numpy as np

# 10,000 random vectors with 512 dimensions each
num_embeddings = 10000
embeddings = np.random.rand(num_embeddings, 512)

dataset = atlas.map_data(embeddings=embeddings)
print(dataset)

This dataset will contain 10,000 random 512-dimensional embeddings. To explore it in Nomic Atlas, organized by the nomic-project-v1 model, open the dataset's browser link.
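Real embeddings usually come from a model rather than `np.random.rand`, so it can help to sanity-check the array before uploading. The sketch below shows one way to do that. `prepare_embeddings` is a hypothetical helper, not part of the nomic library, and the float32 cast is an assumption made here to keep the array compact, not a documented requirement.

```python
import numpy as np


def prepare_embeddings(vectors):
    """Hypothetical helper: return a clean 2-D float32 array, or raise ValueError."""
    arr = np.asarray(vectors, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError(f"expected shape (n, dim), got {arr.shape}")
    if arr.shape[0] == 0:
        raise ValueError("no embeddings to upload")
    if not np.isfinite(arr).all():
        raise ValueError("embeddings contain NaN or infinite values")
    return arr


embeddings = prepare_embeddings(np.random.rand(100, 512))
print(embeddings.shape, embeddings.dtype)  # (100, 512) float32
```

The returned array can then be passed as `embeddings=` to `atlas.map_data`, as in the example above.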

Creating a text dataset

from nomic import atlas
import pandas

news_articles = pandas.read_csv('https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv')

dataset = atlas.map_data(data=news_articles, indexed_field='text')
print(dataset)

This dataset will contain 25,000 news articles embedded with the default Nomic Embed Text model.
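Because `indexed_field` names the field whose text is embedded, it can be worth filtering out rows where that field is missing or blank before uploading. The following is a minimal sketch, assuming the records are plain dictionaries (the form the image example in this document passes as `data`). `drop_blank_text` and the sample rows are hypothetical.

```python
def drop_blank_text(records, field="text"):
    """Hypothetical helper: keep records whose `field` is a non-empty string."""
    kept = []
    for record in records:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            kept.append(record)
    return kept


# Made-up sample rows, for illustration only
rows = [
    {"text": "Stocks rally as markets reopen", "label": "Business"},
    {"text": "   ", "label": "Sports"},
    {"label": "World"},
]
clean_rows = drop_blank_text(rows)
print(len(clean_rows))  # 1
```

The filtered list could then be passed as `data=clean_rows` together with `indexed_field='text'`. With a pandas DataFrame, the same filter would be applied to the frame itself.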

Creating an image dataset

from nomic import atlas
from datasets import load_dataset

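# Load the CIFAR10 training split from the Hugging Face Hub
cifar = load_dataset('cifar10', split="train")

images = cifar["img"]
# One metadata record per image, in the same order as the images
data = [{"label": label} for label in cifar["label"]]
dataset = atlas.map_data(blobs=images, data=data)
print(dataset)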

This will map the CIFAR10 training images with the default Nomic Embed Vision model.
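As the example above implies, the `data` list carries one metadata record per image, in the same order as `blobs`. CIFAR10 stores labels as integers from 0 to 9, so a readable class name can be attached alongside each one. The sketch below is an illustration only. `build_metadata` is a hypothetical helper, the `class_name` field is an added assumption, and the sample labels are made up.

```python
# Standard CIFAR10 class names, indexed by integer label
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def build_metadata(labels, n_images):
    """Hypothetical helper: one metadata dict per image, in image order."""
    if len(labels) != n_images:
        raise ValueError(f"{len(labels)} labels for {n_images} images")
    return [
        {"label": label, "class_name": CIFAR10_CLASSES[label]}
        for label in labels
    ]


# Made-up labels standing in for the dataset's "label" column
data = build_metadata([6, 9, 9, 4], n_images=4)
print(data[0])  # {'label': 6, 'class_name': 'frog'}
```

Checking the lengths up front catches a misaligned metadata list before anything is uploaded.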