
Embeddings

Nomic Atlas allows anyone to access the power of embeddings.

An embedding is a vector representation of an unstructured datapoint that enables computers to manipulate the data based on semantics and meaning.
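To make "manipulating data based on semantics" concrete, here is a minimal sketch that compares vectors with cosine similarity, a common way to measure how close two embeddings are. The three vectors are hand-written toy values chosen for illustration, not output from a Nomic model, and the `cosine_similarity` helper is hypothetical rather than part of the Nomic client.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-written toy vectors (assumption): real embeddings have hundreds of
# dimensions and come from an embedding model.
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

cat_vs_kitten = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
cat_vs_invoice = cosine_similarity(toy_embeddings["cat"], toy_embeddings["invoice"])

# Related items should score higher than unrelated ones.
print(cat_vs_kitten > cat_vs_invoice)
```

With real embeddings the same comparison is what lets related datapoints be grouped or retrieved together.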

Learn more about Nomic Embedding Models and the Embedding Inference API.

New Release: Nomic Embedding Model and API

We've launched nomic-embed-text-v1.5, a text embedding model that supports variable output size!

You can use it as the text embedding model powering your AtlasDataset, and it is also available in the Nomic Embedding API.

Read more in our official blog post and learn how to use it in the API Reference.
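As a rough illustration of what "variable output size" can mean, the sketch below shortens a vector by keeping its leading components and rescaling the result to unit length. This is an assumption about the general technique, not a description of how the Nomic API resizes embeddings; `resize_embedding` is a hypothetical helper and the 8-dimensional vector is made up. In practice you would request the output size through the API as described in the API Reference rather than truncating by hand.

```python
import math


def resize_embedding(vector, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    truncated = vector[:dim]
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0:
        # An all-zero vector cannot be normalized; return it unchanged.
        return truncated
    return [x / norm for x in truncated]


# Made-up 8-dimensional vector standing in for a model output (assumption).
full = [0.5, -0.5, 0.5, -0.5, 0.1, 0.2, -0.1, 0.0]
small = resize_embedding(full, 4)

print(len(full), len(small))
```

Smaller vectors cost less to store and compare, usually in exchange for some loss of detail.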

Embeddings in Atlas

When an unstructured dataset is uploaded to Atlas, an embedding is associated with each datapoint using a Nomic Embedding Model.

Nomic Atlas operates over embeddings to enable its unstructured data capabilities.

2D Embeddings: All embeddings stored in Atlas have a corresponding 2D, human-interpretable representation. These 2D embeddings power the layout of the Atlas Map. They are generated with a Nomic Dimensionality Reduction model.
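The sketch below only shows the shape of a dimensionality reduction step: N high-dimensional rows go in, N (x, y) points come out. It uses a plain random linear projection as a stand-in, which is not the Nomic Dimensionality Reduction model and will not reproduce an Atlas Map layout. The `project_to_2d` helper and the sample rows are hypothetical.

```python
import random


def project_to_2d(rows, seed=0):
    """Map each high-dimensional row to an (x, y) pair with a random linear projection."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    dim = len(rows[0])
    axis_x = [rng.gauss(0, 1) for _ in range(dim)]
    axis_y = [rng.gauss(0, 1) for _ in range(dim)]
    return [
        (
            sum(v * a for v, a in zip(row, axis_x)),
            sum(v * a for v, a in zip(row, axis_y)),
        )
        for row in rows
    ]


# Made-up 4-dimensional rows standing in for real embeddings (assumption).
high_dim = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]
points_2d = project_to_2d(high_dim)

for point in points_2d:
    print(point)
```

Each output pair plays the role of one datapoint's position on a 2D map.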

Accessing Embeddings

You can use the Nomic Python client to access and download low-dimensional (2D) and high-dimensional embeddings of your dataset.

  • Low-dimensional (2D): These are the embeddings used to visualize your dataset in the Atlas Map.
  • High-dimensional (latent): These are either produced by a Nomic Embedding Model or uploaded by you.

Your dataset's embeddings are available through the map.embeddings attribute, where map is one of the maps of your AtlasDataset:

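from nomic import AtlasDataset

# Load the dataset by name and take its first map.
# Note: the name `map` shadows Python's built-in map() in this scope.
map = AtlasDataset('my-dataset').maps[0]

# Entry point for both the 2D and the latent embeddings.
map.embeddings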

Latent and 2D Embeddings

The map.embeddings.projected attribute contains a pandas DataFrame of your 2D embeddings.

The map.embeddings.latent attribute contains your high-dimensional embeddings, either produced by a Nomic Embedding Model or uploaded by you.

# 2D (projected) embeddings
projected_embeddings = map.embeddings.projected

# Latent (high-dimensional) embeddings
latent_embeddings = map.embeddings.latent