Data upload
This page gives basic starter code for creating new data maps in Atlas by uploading data with the Python SDK.
Our API Reference has more details about the Python SDK, including how to add batches of data to existing datasets.
Upload a text dataset
This will create a map of 25,000 news articles. Each article will be embedded automatically with Nomic Embed Text.
from nomic import atlas
import pandas

# Load the 25,000 news articles into a DataFrame
news_articles = pandas.read_csv(
    'https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv'
)

# Create the map; the 'text' column is the field that gets embedded
atlas.map_data(
    data=news_articles,
    indexed_field='text',
    identifier="Example-text-dataset-news"
)
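Before uploading, it can help to drop rows whose text is missing or blank, since those rows have no content to embed. The following is a minimal sketch, assuming the records are plain dictionaries (the image example on this page passes `data` in that form). `clean_records` is a hypothetical helper written for illustration, not part of the Nomic SDK.

```python
def clean_records(records, text_field="text"):
    """Return only the records whose text_field holds a non-blank string."""
    cleaned = []
    for record in records:
        value = record.get(text_field)
        if isinstance(value, str) and value.strip():
            cleaned.append(record)
    return cleaned


sample = [
    {"text": "Stocks rally as markets reopen"},
    {"text": "   "},             # blank text, dropped
    {"text": None},              # missing value, dropped
    {"label": "no text field"},  # field absent, dropped
]
kept = clean_records(sample)
print(len(kept))
```

With a DataFrame, `news_articles.to_dict('records')` produces records in this form.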
Upload an image dataset
This will create a map of the CIFAR10 dataset. Each image will be embedded automatically with Nomic Embed Vision.
from nomic import atlas
from datasets import load_dataset

# Load the CIFAR10 training split from Hugging Face Datasets
cifar = load_dataset('cifar10', split="train")
images = cifar["img"]

# One metadata row per image, holding its class label
data = [{"label": label} for label in cifar["label"]]

atlas.map_data(
    blobs=images,
    data=data,
    identifier="Example-image-dataset-CIFAR10"
)
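The `label` values in this example are integer class indices. If you would rather see readable class names in the map, one option is to add a second metadata field before uploading. This is a minimal sketch: `build_metadata` and the `label_name` field are hypothetical names chosen for illustration, and the class list is hard-coded here in the standard CIFAR-10 order rather than read from the dataset.

```python
# The ten CIFAR-10 class names, listed in label-index order (hard-coded assumption).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def build_metadata(labels, class_names=CIFAR10_CLASSES):
    """Build one metadata row per image with both the index and the class name."""
    return [
        {"label": label, "label_name": class_names[label]}
        for label in labels
    ]


rows = build_metadata([6, 9, 9, 4])
print(rows[0])
```

The resulting list can be passed as `data` in place of the label-only rows above.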
Upload an embeddings dataset
This will create a map of the same 25,000 news articles as the text dataset example above. Instead of having Atlas create the embeddings automatically on upload, we first generate the embeddings ourselves in Python and then upload those embedding vectors directly.
We use Nomic's embedding model nomic-embed-text-v1.5 to create the embeddings, with the task prefix "clustering" to produce embeddings specialized for visual clustering, and with local inference mode so that the embeddings are generated on your own device using the downloaded model.
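For context, a task prefix is a short string placed in front of the input text so that the model knows what the embedding will be used for. The sketch below only illustrates that idea. It assumes the four task types and the `"<task>: <text>"` format described in the public nomic-embed-text model card, and `add_task_prefix` is a hypothetical helper. When you pass `task_type` to `embed.text`, you should not need to add the prefix yourself.

```python
# Task types as described in the nomic-embed-text model card (assumed list).
TASK_TYPES = ("search_document", "search_query", "classification", "clustering")


def add_task_prefix(text, task_type="clustering"):
    """Prepend the task prefix to a single input string."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {task_type!r}")
    return f"{task_type}: {text}"


prefixed = add_task_prefix("Oil prices climb after supply cut")
print(prefixed)
```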
We use NomicTopicOptions to create topic labels for our embeddings dataset. Topics are generated automatically when you create a text dataset, but when you create an embeddings dataset you need to specify in this way which field to use for descriptive topics. To learn more about topics and topic modeling in Atlas, visit our Topic modeling explainer guide.
from nomic import atlas, embed
from nomic.data_inference import NomicTopicOptions
import numpy as np
import pandas

news_articles = pandas.read_csv(
    'https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv'
)

# Generate the embeddings locally with the downloaded model
embeddings = embed.text(
    texts=news_articles.text.tolist(),
    model='nomic-embed-text-v1.5',
    task_type='clustering',
    inference_mode='local'
)['embeddings']

# Upload the vectors as an [N, d] NumPy array alongside the article metadata
atlas.map_data(
    data=news_articles,
    embeddings=np.array(embeddings),
    topic_model=NomicTopicOptions(
        build_topic_model=True,
        topic_label_field='text'
    ),
    identifier='Example-embeddings-dataset-news'
)
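Because the vectors are computed separately from the metadata, it is worth confirming that there is exactly one vector per row and that all vectors share the same dimension before calling `atlas.map_data`. This is a minimal sketch in plain Python; `check_alignment` is a hypothetical helper, not an SDK function.

```python
def check_alignment(rows, embeddings):
    """Return (row_count, dimension), or raise ValueError if they do not line up."""
    if len(rows) == 0:
        raise ValueError("no rows to upload")
    if len(rows) != len(embeddings):
        raise ValueError(f"{len(rows)} rows but {len(embeddings)} embeddings")
    # Collect the distinct vector lengths; there should be exactly one.
    dims = {len(vector) for vector in embeddings}
    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return len(rows), dims.pop()


rows = [{"text": "first article"}, {"text": "second article"}]
vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
shape = check_alignment(rows, vectors)
print(shape)
```

For the example above, the call would be `check_alignment(news_articles, embeddings)`, since `len()` of a DataFrame is its row count.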
Note: Nomic Embed vs other embedding vectors
You can upload embedding vectors from any embedding model to Atlas, but features like vector search are only available when you upload vectors in Nomic's embedding space.