Image Datasets
Nomic Atlas natively supports image datasets, allowing users to upload, explore, and search them with ease. Using Nomic Embed Vision, Atlas Datasets are built over users' uploaded images, enabling powerful search and exploration capabilities.
Cleaning and annotating image datasets is a breeze with Atlas's intuitive interface: filters, lassos, and other selection tools make it easy to annotate and clean image data.
With the release of Nomic Embed Vision, users can also perform multimodal vector search over image datasets, enabling powerful text-based search over images.
Multimodal Vector Search
Nomic Embed Vision is aligned with Nomic Embed Text, enabling high-powered multimodal search capabilities. This capability is available in the Atlas interface as well as the Nomic client API.
Similar to the Vector Search Demo, you can perform multimodal search over an image dataset. In simpler terms, you can search for images using text queries.
Open the vector search modal by clicking its selection icon or using the hotkey 'V'.
For example, searching for "a tiny white ball" over the Imagenette dataset returns images of golf balls.
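The same kind of search can be done programmatically with the Nomic client. The snippet below is a minimal sketch, assuming a dataset identifier like nomic/imagenette10k-block (created later in this guide) and that your API key is already configured; it embeds a text query with Nomic Embed Text and looks up the nearest image embeddings in the map. Check the API Reference for the exact return format of vector_search.
import numpy as np
from nomic import AtlasDataset, embed

# Load an existing image dataset (identifier assumed for illustration)
dataset = AtlasDataset('nomic/imagenette10k-block')
atlas_map = dataset.maps[0]

# Embed the text query with Nomic Embed Text, which shares an
# embedding space with Nomic Embed Vision
query = embed.text(
    texts=['a tiny white ball'],
    model='nomic-embed-text-v1.5',
    task_type='search_query',
)['embeddings']

# Retrieve the images closest to the text query in embedding space
neighbors, distances = atlas_map.embeddings.vector_search(queries=np.array(query), k=5)
print(neighbors)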
Storing Images in Atlas with Python
When you upload an image dataset to Atlas, each image is embedded with a Nomic Embed Vision model.
Here, we show how to create a native image Atlas Dataset using the Imagenette dataset.
Note: The images must be located on your local machine. You can pass a list of image paths or PIL.Image objects to the blobs parameter of map_data or add_data. See the API Reference for more information.
from nomic import atlas
from datasets import load_dataset
from tqdm import tqdm

# Map Imagenette class indices to human-readable labels
id2label = {
    "0": "tench",
    "1": "English springer",
    "2": "cassette player",
    "3": "chain saw",
    "4": "church",
    "5": "French horn",
    "6": "garbage truck",
    "7": "gas pump",
    "8": "golf ball",
    "9": "parachute"
}

# Load the 160px Imagenette training split and shuffle it
dataset = load_dataset('frgfm/imagenette', '160px')['train'].shuffle(seed=42)
images = dataset["image"]
labels = dataset["label"]

# Attach a human-readable label to each image as metadata
metadata = [{"label": id2label[str(label)]} for label in tqdm(labels, desc="Creating metadata")]

# Upload the images and metadata; Atlas embeds each image with Nomic Embed Vision
atlas.map_data(
    blobs=images,
    data=metadata,
    identifier='nomic/imagenette10k-block',
    description='10k 160px Imagenette images',
    topic_model={"build_topic_model": False},
)
Note: If you have a large dataset of images, you can instead add images to the dataset incrementally using the add_data method.
from nomic import AtlasDataset

# Create an empty Atlas dataset keyed on the "id" field
dataset = AtlasDataset(
    'zach/imagenette10k-successive-adds',
    unique_id_field="id",
)

# Give each record a unique id so Atlas can track it across uploads
for i, record in enumerate(metadata):
    record["id"] = i

# Upload the images and metadata in batches of 1000
for i in range(0, len(images), 1000):
    dataset.add_data(
        blobs=images[i:i+1000],
        data=metadata[i:i+1000],
    )

# Build the map once all data has been added
atlas_map = dataset.create_index(topic_model={"build_topic_model": False}, embedding_model="nomic-embed-vision-v1.5")
print(f"Map URL: {atlas_map.dataset_link}")
Bring Your Own Image Embeddings
You can also upload your own image embeddings to Atlas.
from nomic import atlas
# Load your image embeddings
embeddings = ... # np.array of shape (n_images, embedding_dim)
atlas.map_data(embeddings=embeddings,
identifier='My Image Embeddings',
)
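For example, here is a minimal sketch that produces the embeddings with Nomic Embed Vision via the embed API and uploads them alongside per-image metadata. The image paths and identifier are placeholders, and any embedding model can be substituted for the embed.image call.
import numpy as np
from nomic import atlas, embed

# Placeholder image paths; replace with your own files
image_paths = ['images/img_0.jpg', 'images/img_1.jpg']

# Embed the images with Nomic Embed Vision (or any model of your choice)
output = embed.image(images=image_paths, model='nomic-embed-vision-v1.5')
embeddings = np.array(output['embeddings'])  # shape (n_images, embedding_dim)

# Optional per-image metadata to label the points in the map
metadata = [{'id': i, 'path': path} for i, path in enumerate(image_paths)]

atlas.map_data(
    embeddings=embeddings,
    data=metadata,
    identifier='My Image Embeddings',
)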