Embeddings
Nomic Atlas allows anyone to access the power of embeddings.
An embedding is a vector representation of an unstructured datapoint that enables computers to manipulate the data based on semantics and meaning.
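To make "manipulating data based on semantics" concrete, here is a minimal sketch that compares embeddings with cosine similarity. The vectors are hypothetical, hand-written values invented for illustration; they are not produced by a Nomic Embedding Model, and the `cosine_similarity` helper is not part of the Nomic client.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; values near 1.0 mean 'similar'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical 4-dimensional embeddings, invented for illustration only.
# Real embedding models produce vectors with hundreds of dimensions.
toy_embeddings = {
    "cat": np.array([0.9, 0.1, 0.0, 0.2]),
    "kitten": np.array([0.8, 0.2, 0.1, 0.3]),
    "invoice": np.array([0.0, 0.9, 0.8, 0.1]),
}

sim_related = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
sim_unrelated = cosine_similarity(toy_embeddings["cat"], toy_embeddings["invoice"])

print(f"cat vs kitten:  {sim_related:.3f}")
print(f"cat vs invoice: {sim_unrelated:.3f}")
```

In this toy setup the two related words score much higher than the unrelated pair, which is the property that lets software compare text by meaning rather than by exact wording.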
Learn more about Nomic Embedding Models and the Embedding Inference API.
We've launched nomic-embed-text-v1.5, a text embedding model that supports variable output sizes!
You can use it as the text embedding model powering your AtlasDataset and it is available in the Nomic Embedding API.
Read more in our official blog post and learn how to use it in the API Reference.
Embeddings in Atlas
When an unstructured dataset is uploaded to Atlas, an embedding is associated with each datapoint using a Nomic Embedding Model.
Nomic Atlas operates over embeddings to enable its unstructured data capabilities.
2D Embeddings: All embeddings stored in Atlas have a corresponding 2D, human-interpretable representation. These 2D embeddings power the layout of the Atlas Map. They are generated with a Nomic Dimensionality Reduction model.
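As a rough illustration of what reducing embeddings to 2D means, the sketch below projects random high-dimensional vectors onto two axes with PCA. PCA is a stand-in chosen for brevity; it is not the Nomic Dimensionality Reduction model, and the `project_to_2d` helper and the random data are assumptions made for this example.

```python
import numpy as np


def project_to_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n, d) embeddings onto their top two principal components (PCA)."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# Random stand-in data: 100 datapoints with 64-dimensional embeddings.
rng = np.random.default_rng(0)
high_dim = rng.normal(size=(100, 64))

points_2d = project_to_2d(high_dim)
print(points_2d.shape)  # (100, 2)
```

Each row of `points_2d` is an (x, y) position that could be drawn on a scatter plot, which is the role the 2D embeddings play in the Atlas Map layout.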
Accessing Embeddings
You can use the Nomic Python client to access and download low-dimensional (2D) and high-dimensional embeddings of your dataset.
- Low-dimensional (2D): These are the embeddings used to visualize your datasets in the Atlas Map.
- High-dimensional (latent): These are produced by Nomic Embedding Models or are your uploaded embeddings.
Your dataset's embeddings are available through the map.embeddings attribute of the AtlasDataset:
from nomic import AtlasDataset

# Load the dataset by name and take its first map
map = AtlasDataset('my-dataset').maps[0]
# Access the embeddings associated with that map
map.embeddings
Latent and 2D Embeddings
The map.embeddings.projected attribute contains a pandas DataFrame of your 2D embeddings.
The map.embeddings.latent attribute contains the high-dimensional embeddings produced by a Nomic Embedding Model.
# 2D
projected_embeddings = map.embeddings.projected
# Latent high dimensional
latent_embeddings = map.embeddings.latent
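Once both objects are in hand, a common next step is to find similar datapoints in latent space and look up where they sit on the map. The sketch below runs offline on stand-in data, so several things are assumptions rather than confirmed behavior: that map.embeddings.latent is a NumPy array with one row per datapoint, that the projected DataFrame has `id`, `x`, and `y` columns, and that both share the same row order. The `nearest_neighbors` helper is hypothetical and not part of the Nomic client.

```python
import numpy as np
import pandas as pd

# Stand-ins for map.embeddings.latent and map.embeddings.projected.
# Shapes and column names are assumptions made for this example.
rng = np.random.default_rng(42)
latent_embeddings = rng.normal(size=(5, 8))
projected_embeddings = pd.DataFrame({
    "id": [f"doc-{i}" for i in range(5)],
    "x": rng.normal(size=5),
    "y": rng.normal(size=5),
})


def nearest_neighbors(latent: np.ndarray, query_index: int, k: int = 2) -> np.ndarray:
    """Return the row indices of the k rows most cosine-similar to the query row."""
    normalized = latent / np.linalg.norm(latent, axis=1, keepdims=True)
    similarities = normalized @ normalized[query_index]
    similarities[query_index] = -np.inf  # exclude the query itself
    return np.argsort(similarities)[::-1][:k]


neighbor_rows = nearest_neighbors(latent_embeddings, query_index=0, k=2)

# Look up the 2D coordinates of the neighbors, assuming aligned row order.
print(projected_embeddings.iloc[neighbor_rows][["id", "x", "y"]])
```

With real data, replace the stand-ins with the values read from map.embeddings, and verify the column names and row alignment before relying on positional lookups.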