Improve AI Model Performance with Embedding Visualization and Evaluation
In this guide, we’ll walk through how to visualize embedding model decision boundaries to improve AI model accuracy using Nomic Atlas. Embedding visualizations help engineers debug models, detect overlapping clusters, and refine vector search performance.
By using Nomic Atlas’s interactive embedding visualizations, AI engineers can:
- Quickly diagnose model failures
- Optimize embedding separability for improved performance
- Debug vector search, clustering, and classification models
Why Visualizing Embedding Decision Boundaries Matters
Embeddings encode data into high-dimensional spaces for use in AI applications such as NLP, recommendation systems, and search engines. However, when embeddings are poorly formed, models struggle to separate concepts correctly, leading to misclassification and poor model performance.
With Atlas, engineers can explore embeddings in an interactive space and detect:
- Cluster overlap causing poor classification performance
- Misclassified points that require dataset adjustments
- Feature drift in embeddings over different training iterations
Setup
To run the code in this guide, make sure you have nomic and numpy installed in your Python environment:
- pip:
    pip install nomic numpy
- uv:
    uv add nomic numpy
Then, log in to nomic with your Nomic API key. If you don't have a Nomic API key, you can create one here.
- Terminal:
    nomic login nk-...
- Python:
    import nomic
    nomic.login("nk-...")
Uploading Embeddings to Atlas
Let’s assume you have a 128-dimensional embedding dataset from a well-trained NLP model on three classes. Since the model is well-trained, the embeddings will be grouped into three distinct clusters. Let's see how Atlas can help us debug this model.
Prepare Embeddings and Create an Atlas Dataset
The following code creates an embedding dataset with 3 distinct clusters of embeddings.
from nomic import AtlasDataset
import numpy as np

n_per_class = 150
embedding_dim = 128

# Three Gaussian clusters centered at -1, 0, and 1 in every dimension
embedding_cluster_1 = np.random.normal(loc=-1, scale=.1, size=(n_per_class, embedding_dim))
labels_cluster_1 = np.zeros(n_per_class, dtype=int)
embedding_cluster_2 = np.random.normal(loc=0, scale=.1, size=(n_per_class, embedding_dim))
labels_cluster_2 = np.zeros(n_per_class, dtype=int) + 1
embedding_cluster_3 = np.random.normal(loc=1, scale=.1, size=(n_per_class, embedding_dim))
labels_cluster_3 = np.zeros(n_per_class, dtype=int) + 2

labels = np.concatenate([labels_cluster_1, labels_cluster_2, labels_cluster_3], axis=0)
embeddings = np.concatenate([embedding_cluster_1, embedding_cluster_2, embedding_cluster_3], axis=0)

# One metadata record per embedding; integer labels yield names like 'class_0' rather than 'class_0.0'
data = [
    {'class': f'class_{label}', 'id': i}
    for i, label in enumerate(labels)
]
Now we create an Atlas Dataset with the embeddings and labels and then build a data map.
dataset = AtlasDataset(
    identifier='three-embedding-clusters',
    description='Visualizing three embedding clusters',
    unique_id_field='id')
dataset.add_data(embeddings=embeddings, data=data)
data_map = dataset.create_index(topic_model=False)
View Your Atlas Map
Once uploaded, Nomic Atlas generates an interactive visualization where you can zoom, filter, and analyze embedding clusters. The embeddings fall into three distinct clusters with clear boundaries between the class labels. To color the points by class label, use the View Settings menu.
Note that we never computed 2D coordinates explicitly: we drew random samples in 128-dimensional space, and Atlas represents them with 2D coordinates that capture the essential information about which embeddings are nearest to which others.
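To build intuition for what projecting 128-dimensional points to 2D means, here is a minimal local sketch using PCA computed with NumPy's SVD. This is an illustration only: PCA is a simple linear stand-in, not the projection algorithm Atlas uses, and the helper name `pca_2d` and the seeded stand-in data are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Stand-in data shaped like the guide's example: three Gaussian clusters in 128-D.
n_per_class, embedding_dim = 150, 128
embeddings = np.concatenate([
    rng.normal(loc=center, scale=0.1, size=(n_per_class, embedding_dim))
    for center in (-1.0, 0.0, 1.0)
])
labels = np.repeat(np.arange(3), n_per_class)


def pca_2d(x):
    """Project the rows of x onto their top two principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


coords = pca_2d(embeddings)

# Mean 2D position of each class; separated classes give clearly distinct centroids.
centroids_2d = np.stack([coords[labels == c].mean(axis=0) for c in range(3)])
print(coords.shape)
print(centroids_2d.round(2))
```

Because the three clusters differ along a single direction, a linear projection already separates them here. Real embeddings often need a nonlinear, neighborhood-preserving projection to show their structure in 2D.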
Once embeddings are mapped in Nomic Atlas, you can:
- Inspect clusters for separation quality
- Highlight misclassified points
- Compare embeddings across different training iterations
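One way to put a number on the comparison across training iterations is to check how many nearest neighbors each point keeps from one embedding version to the next. The sketch below is a NumPy-only illustration under stated assumptions: the helper names `knn_indices` and `neighbor_overlap` are hypothetical, both versions must embed the same items in the same row order, and the data is synthetic.

```python
import numpy as np


def knn_indices(x, k):
    """Indices of the k nearest neighbors (Euclidean) for every row of x."""
    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * x @ x.T  # squared distances
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


def neighbor_overlap(x_old, x_new, k=10):
    """Mean fraction of each point's k nearest neighbors shared by both versions."""
    old, new = knn_indices(x_old, k), knn_indices(x_new, k)
    shared = [len(set(o) & set(n)) for o, n in zip(old, new)]
    return float(np.mean(shared)) / k


rng = np.random.default_rng(0)
x_old = rng.normal(size=(300, 32))

# A pure rotation changes every coordinate but leaves neighborhoods intact.
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
x_rotated = x_old @ rotation

# Unrelated embeddings share neighbors only by chance.
x_unrelated = rng.normal(size=(300, 32))

same = neighbor_overlap(x_old, x_rotated)
changed = neighbor_overlap(x_old, x_unrelated)
print(f"rotated: {same:.2f}, unrelated: {changed:.2f}")
```

Comparing neighborhoods rather than raw coordinates matters because two training runs can place the same structure in differently oriented spaces. An overlap near 1 suggests the structure is stable, while a low overlap suggests the model has reorganized its representation.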
Visualizing Decision Boundaries
When analyzing your embedding visualization, there are several key patterns to look for. Overlapping clusters are a warning sign that your model is failing to distinguish between classes. Diffuse embeddings with no clear clustering may indicate that your model has learned a poor feature representation. Tightly packed, well-separated clusters, on the other hand, suggest that your embeddings are well-structured and that your model is learning to differentiate between classes.
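The visual patterns above can be complemented with a numeric check. The following is a minimal sketch of the mean silhouette coefficient written in plain NumPy: values near 1 indicate tight, well-separated clusters, and values near 0 indicate overlap. It assumes Euclidean distance, at least two points per class, and a dataset small enough for an n-by-n distance matrix. The function and the synthetic data are illustrative, not part of the Nomic SDK.

```python
import numpy as np


def silhouette_score(x, labels):
    """Mean silhouette coefficient with Euclidean distance (O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n = len(x)

    # Pairwise Euclidean distances.
    sq = (x ** 2).sum(axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * x @ x.T, 0.0))
    np.fill_diagonal(dist, 0.0)

    classes = np.unique(labels)
    # mean_dist[i, j] = mean distance from point i to the members of classes[j]
    mean_dist = np.empty((n, len(classes)))
    for j, c in enumerate(classes):
        members = labels == c
        count = members.sum()
        # Exclude the point itself when averaging within its own class.
        denom = np.where(members, count - 1, count)
        mean_dist[:, j] = dist[:, members].sum(axis=1) / np.maximum(denom, 1)

    own = np.searchsorted(classes, labels)
    a = mean_dist[np.arange(n), own]      # cohesion: own-class mean distance
    other = mean_dist.copy()
    other[np.arange(n), own] = np.inf
    b = other.min(axis=1)                 # separation: nearest other class
    return float(np.mean((b - a) / np.maximum(a, b)))


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 150)
centers = np.array([-1.0, 0.0, 1.0])[labels][:, None]

# Tight clusters, as in the guide, versus heavily overlapping ones.
tight = centers + rng.normal(scale=0.1, size=(450, 128))
overlapping = 0.05 * centers + rng.normal(scale=1.0, size=(450, 128))

tight_score = silhouette_score(tight, labels)
overlap_score = silhouette_score(overlapping, labels)
print(f"tight: {tight_score:.2f}, overlapping: {overlap_score:.2f}")
```

If scikit-learn is available, `sklearn.metrics.silhouette_score` is a tested alternative to this hand-written version. A single score will not tell you which classes overlap, so it works best alongside the map rather than in place of it.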
Debugging and Improving Model Accuracy with Atlas
Nomic Atlas makes it simple to iteratively improve your model's performance during training and debugging. By visualizing your embeddings in Atlas, you can quickly identify mislabeled data points that appear in unexpected clusters, then use that insight to relabel ambiguous cases. As you fine-tune your model with techniques like contrastive learning, you can continuously upload new embeddings to Atlas to track improvements in cluster separation and overall embedding quality. This visual feedback loop helps you efficiently identify and address issues like outliers or poor feature representation, allowing you to rapidly iterate until you achieve well-structured, semantically meaningful embeddings that translate to better model performance.
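As a programmatic companion to spotting points that sit in unexpected clusters, the sketch below flags every point whose label disagrees with the majority label of its nearest neighbors. It is a simple heuristic under stated assumptions, not an Atlas feature: the helper name `flag_label_disagreements`, the choice of `k=10`, and the synthetic data with three deliberately flipped labels are all made up for this example.

```python
import numpy as np


def flag_label_disagreements(x, labels, k=10):
    """Indices of points whose label differs from the majority label of their k nearest neighbors."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)

    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2 * x @ x.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                  # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    classes, encoded = np.unique(labels, return_inverse=True)
    # votes[i, c] = how many of point i's neighbors carry class c
    votes = np.zeros((len(x), len(classes)), dtype=int)
    for c in range(len(classes)):
        votes[:, c] = (encoded[neighbors] == c).sum(axis=1)

    majority = votes.argmax(axis=1)
    return np.flatnonzero(majority != encoded)


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 150)
x = np.array([-1.0, 0.0, 1.0])[labels][:, None] + rng.normal(scale=0.1, size=(450, 128))

# Simulate annotation errors by flipping the labels of three known points.
noisy_labels = labels.copy()
flipped = np.array([3, 200, 410])
noisy_labels[flipped] = (noisy_labels[flipped] + 1) % 3

suspects = flag_label_disagreements(x, noisy_labels, k=10)
print(suspects.tolist())  # → [3, 200, 410]
```

On real data the flagged points are candidates for review rather than confirmed errors: a point near a true class boundary can disagree with its neighbors and still be labeled correctly.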
Conclusion
Embedding visualizations help AI teams gain deeper insight into feature learning than traditional accuracy metrics alone can provide.
Try this tutorial with your own embeddings using the Nomic Atlas Python SDK.