
Image Datasets

Nomic Atlas natively supports image datasets, allowing users to upload, explore, and search them with ease. Using Nomic Embed Vision, Atlas Datasets are built over users' uploaded images, enabling powerful search and exploration capabilities.

Cleaning and annotating image datasets is a breeze with Atlas's intuitive interface. Users can apply filters, lassos, and other selection tools to annotate and clean their image datasets.

With the release of Nomic Embed Vision, users can perform multimodal vector search over image datasets, including text search over images.

Nomic Embed Vision is aligned with Nomic Embed Text, enabling powerful multimodal search. This capability is available in the Atlas interface as well as the Nomic client API.

As in the Vector Search Demo, you can perform multimodal search over an image dataset. In simpler terms, you can search for images using text queries.
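Because the text and vision models embed into a shared space, text-to-image search reduces to nearest-neighbor ranking: embed the query with the text model, embed the images with the vision model, and rank images by cosine similarity. A minimal sketch of that ranking step, using small made-up vectors in place of real Nomic embeddings:

```python
import numpy as np

# Made-up aligned embeddings purely to illustrate the ranking step;
# in practice these would come from Nomic Embed Text and Nomic Embed Vision.
query_embedding = np.array([0.9, 0.1, 0.0])  # text query, e.g. "tiny white ball"
image_embeddings = np.array([
    [0.8, 0.2, 0.1],   # golf ball
    [0.1, 0.9, 0.2],   # church
    [0.0, 0.1, 0.95],  # parachute
])

# Cosine similarity between the query and every image embedding
norms = np.linalg.norm(image_embeddings, axis=1) * np.linalg.norm(query_embedding)
scores = image_embeddings @ query_embedding / norms

# Image indices ranked from most to least similar to the text query
ranking = np.argsort(-scores)
print(ranking)  # the golf-ball image ranks first
```

Atlas performs this search server-side over the full dataset; the sketch only shows why aligned embedding spaces make text queries over images possible.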

Open the vector search modal by clicking its selection icon or using the hotkey 'V'.

Vector Search Tool

For example, searching for a tiny white ball over the Imagenette dataset returns images of golf balls.

Multimodal Vector Search Results

Storing Images in Atlas with Python

When you upload an image dataset to Atlas, each image is associated with an embedding computed by a Nomic Embed Vision model.

Here, we show how to create a native image Atlas Dataset using the Imagenette dataset.

Note: The images must be located on your local machine. You can pass a list of image paths or PIL.Image objects to the blobs parameter in add_data. See the API Reference for more information.

from nomic import atlas
from datasets import load_dataset
from tqdm import tqdm

id2label = {
    "0": "tench",
    "1": "English springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute",
}

dataset = load_dataset('frgfm/imagenette', '160px')['train'].shuffle(seed=42)

images = dataset["image"]
labels = dataset["label"]

metadata = [{"label": id2label[str(label)]} for label in tqdm(labels, desc="Creating metadata")]

atlas.map_data(
    blobs=images,
    data=metadata,
    identifier='nomic/imagenette10k-block',
    description='10k 160px Imagenette images',
    topic_model={"build_topic_model": False},
)

Note: If you have a large dataset of images, you can instead add them to the dataset in successive batches using the add_data method.

from nomic import AtlasDataset

dataset = AtlasDataset(
    'zach/imagenette10k-successive-adds',
    unique_id_field="id",
)

# Attach a unique id to each metadata record
for i, record in enumerate(metadata):
    record["id"] = i

# Upload images and metadata in batches of 1,000
for i in range(0, len(images), 1000):
    dataset.add_data(
        blobs=images[i:i+1000],
        data=metadata[i:i+1000],
    )

atlas_map = dataset.create_index(topic_model={"build_topic_model": False}, embedding_model="nomic-embed-vision-v1.5")

print(f"Map URL: {atlas_map.dataset_link}")

Bring Your Own Image Embeddings

You can also upload your own image embeddings to Atlas.

from nomic import atlas

# Load your image embeddings

embeddings = ...  # np.array of shape (n_images, embedding_dim)

atlas.map_data(
    embeddings=embeddings,
    identifier='My Image Embeddings',
)