Unstructured data interface

Nomic Atlas introduces a revolutionary interface for working with unstructured data: The Atlas Map.

The Atlas Map plots your entire dataset on one screen and organizes it by meaning into clusters. Clusters on your Atlas Map contain datapoints that are semantically similar. With an Atlas Map you can search, filter and export data at scale.

tip

All operations on the Atlas Map browser interface can be executed with the API.

Use cases

  • Understand and find insights in your text, image, audio and video datasets.
  • Cluster and auto-categorize your datasets.
  • Discover outlier and anomalous regions in your data.
  • Rapidly and collaboratively iterate on your dataset by removing undesired datapoints, applying tags and sharing insights.

To better explain how to interact with an Atlas Map, let’s take a tour around a dataset of news articles. Click around the American News Sample map below or visit it in the browser.


An Atlas Map has the following properties:

  1. Points close to each other on the map are semantically similar/related.

    All news articles about sports are on the left side of the map. Inside the sports region, the map breaks down by type of sport because news articles about a given sport (e.g. football) are more similar to each other than to news articles about other sports (e.g. baseball).

  2. Numerical distances between 2D point positions do not have concrete meaning.

    For example, the observation that the Baseball and Football clubs news article clusters are adjacent signifies a relationship between Baseball and Football in the embedding space. You should not, however, make claims or draw conclusions using the Euclidean distance between points in the two clusters. Distance information is only meaningful in the ambient embedding space and can be retrieved with vector_search.

  3. Floating labels correspond to distinct topics in your data.

    For example, the Baseball cluster has the label 'Astros win Game 1 of World'. Labels are automatically determined from the textual contents of your data and are crucial for navigating the map. Learn more about how topic labels are generated.

  4. Topics have a hierarchy.

    Topics group your dataset into homogeneous regions. As you zoom around the map, more granular versions of your dataset's topics will appear.

  5. Maps update as your data updates.

    When new data enters your dataset, Atlas can rebuild the map to reflect how the new data relates to existing data.

  6. Built for collaboration across technical and non-technical teams.

    All information and operations that are visually presented on an Atlas Map have a programmatic analog. For example, engineers can access cluster ids, topic information, duplicate clusters and run vector search through the Python client.
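To illustrate properties 2 and 6, here is a minimal sketch of what "distance in the ambient embedding space" means. It is not the Atlas `vector_search` implementation: the function names, the toy 4-dimensional vectors and the choice of cosine similarity are all assumptions made for illustration. The point is only that neighbors are ranked using the full embedding vectors, not the 2D map positions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, embeddings, k=2):
    """Return the ids of the k embeddings most similar to `query`."""
    ranked = sorted(
        embeddings,
        key=lambda item_id: cosine_similarity(query, embeddings[item_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy embeddings; real embeddings have hundreds of dimensions.
embeddings = {
    "baseball_1": [0.9, 0.1, 0.0, 0.1],
    "baseball_2": [0.8, 0.2, 0.1, 0.0],
    "football_1": [0.1, 0.9, 0.1, 0.0],
    "politics_1": [0.0, 0.1, 0.9, 0.3],
}

# The query point is its own nearest neighbor, followed by the other baseball article.
neighbors = nearest_neighbors(embeddings["baseball_1"], embeddings, k=2)
```

In practice you would retrieve neighbors through the Python client rather than compute them by hand; the sketch only shows why the ranking should come from the embedding vectors.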

Search

Instantly search datasets with up to tens of millions of points in Atlas.

You can search over any column in your dataset and matches will display on your Atlas Map.

For more involved searches, you can layer the helper options below onto your search to match only complete words or to match case exactly. Regular expressions let you apply complex pattern matching to your search.

Search options

Only match complete words

Match case exactly

Regular expression search
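
The semantics of the three options above can be sketched with Python's standard `re` module. This is an illustration only, not how Atlas implements search: the helper names `build_pattern` and `search` are hypothetical, and the sketch assumes that a plain search is a case-insensitive substring match.

```python
import re

def build_pattern(query, whole_word=False, match_case=False, use_regex=False):
    """Compile a search pattern from a query and the three search options."""
    # Treat the query as literal text unless regex mode is on.
    body = query if use_regex else re.escape(query)
    if whole_word:
        # \b anchors the match to word boundaries.
        body = rf"\b(?:{body})\b"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(body, flags)

def search(rows, column, query, **options):
    """Return the rows whose `column` value matches the query."""
    pattern = build_pattern(query, **options)
    return [row for row in rows if pattern.search(str(row[column]))]

# Hypothetical sample rows.
reviews = [
    {"id": 1, "text": "Great hair dryer, dries fast"},
    {"id": 2, "text": "This chair is not a beauty product"},
    {"id": 3, "text": "Hair removal cream worked well"},
]

plain = [r["id"] for r in search(reviews, "text", "hair")]
whole = [r["id"] for r in search(reviews, "text", "hair", whole_word=True)]
cased = [r["id"] for r in search(reviews, "text", "hair", match_case=True)]
regex = [r["id"] for r in search(reviews, "text", r"^hair\b", use_regex=True)]
```

Under these assumptions, the plain search also matches "chair", the whole-word search does not, the case-sensitive search skips the capitalized "Hair", and the regular expression matches only text that starts with the word.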

Example: Search over beauty reviews

On an Atlas Map of Amazon Beauty Reviews, searching for the keyword “hair” highlights at least two large areas of the map that contain the word “hair.”

Zooming in shows that the cluster on the left side is mostly composed of reviews related to shaving and hair removal products, while the cluster on the right side has reviews related to hairstyling.

Check out the Atlas map on Amazon Beauty Reviews yourself and try running a search!

Atlas Map on Amazon Beauty Reviews

Below: the map above, zoomed in on the right-most cluster for "hair"

Metadata Filters

Apply filters over your metadata to get new views of your dataset and greater insight into it. Slicing by timestamp lets you see how topics change over time. You can filter over any numerical metadata value, such as sentiment, temperature, price or score.

Example: Filter a dataset of TikTok videos

The example below shows the same map, before and after applying a filter. This map uses a dataset of TikTok videos from a one-week span in 2023. In this dataset, metadata such as like count and play count were collected along with the videos.

If you were interested in popular videos, you could filter your map to show only datapoints above a certain threshold. In the map below, the data is filtered to videos with like counts above 1M. As the data sidebar shows, only 10 of the 39k videos surpassed 1M likes.
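The same kind of threshold and range filtering can be sketched in plain Python. The rows, the column names (`like_count`, `play_count`, `posted`) and the helper `in_range` below are hypothetical stand-ins for illustration, not the actual TikTok dataset or an Atlas API.

```python
from datetime import date

# Hypothetical sample rows with numerical and timestamp metadata.
videos = [
    {"id": "v1", "like_count": 2_400_000, "play_count": 30_000_000, "posted": date(2023, 1, 2)},
    {"id": "v2", "like_count": 15_000, "play_count": 400_000, "posted": date(2023, 1, 3)},
    {"id": "v3", "like_count": 1_000_000, "play_count": 9_000_000, "posted": date(2023, 1, 5)},
    {"id": "v4", "like_count": 1_200_000, "play_count": 12_000_000, "posted": date(2023, 1, 6)},
]

def in_range(rows, column, low, high):
    """Keep rows whose `column` value lies in the inclusive range [low, high]."""
    return [row for row in rows if low <= row[column] <= high]

# Threshold filter: strictly more than 1M likes, so v3 (exactly 1M) is excluded.
popular = [row for row in videos if row["like_count"] > 1_000_000]

# Timestamp slice: videos posted in the middle of the week.
midweek = in_range(videos, "posted", date(2023, 1, 3), date(2023, 1, 5))
```

Note the difference between a strict threshold ("above 1M") and an inclusive range; which one applies depends on how the filter bounds are defined.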

Check out the TikTok map shown below and try applying your own filters.

Original map on sample of a week of TikTok videos

Below: the map above, filtered for videos with more than 1M likes

Lasso and Tagging

The Lasso tool allows you to select points on the map by circling them with your mouse. Lassoing can be a part of your data pipeline as you find, select, tag, and clean your data.

Example: Identifying an outlier cluster from a news dataset

In the news dataset example below, we can use Lasso to select an outlier cluster.

On inspection, we see an area of the map containing points related to betting and casinos. Let's say we don’t want to include these points in our news analysis.

To tag these points using the Lasso tool:

  1. Select the lasso function under “Selection Tools.”
  2. Draw an outline on the map which captures the points of interest.
  3. On the data sidebar, click +tag all and add the name of the tag you want to apply to all lassoed points.
  4. Your points are now tagged!
note

See the API reference or the data tagging walkthrough to learn how to use your tags from Python to clean data.
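
Conceptually, a lasso selection is a point-in-polygon test over the 2D map positions, followed by a tag on every point that falls inside. The sketch below uses the standard ray-casting test; the function names, the point fields (`x`, `y`, `tags`) and the sample rows are hypothetical and do not reflect the Atlas client API.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tag_lassoed(points, polygon, tag):
    """Add `tag` to every point inside the polygon; return the tagged ids."""
    tagged = []
    for point in points:
        if point_in_polygon(point["x"], point["y"], polygon):
            point.setdefault("tags", set()).add(tag)
            tagged.append(point["id"])
    return tagged

# Hypothetical map positions and a square lasso outline.
points = [
    {"id": 1, "x": 0.5, "y": 0.5, "text": "Casino odds for the weekend"},
    {"id": 2, "x": 0.8, "y": 0.2, "text": "Sportsbook betting lines"},
    {"id": 3, "x": 3.0, "y": 3.0, "text": "City council election results"},
]
lasso = [(0, 0), (1, 0), (1, 1), (0, 1)]

tagged_ids = tag_lassoed(points, lasso, "betting")

# Cleaning step: drop everything that carries the tag.
cleaned = [p for p in points if "betting" not in p.get("tags", set())]
```

Once points carry a tag, cleaning the dataset reduces to filtering on that tag, which is the step the tagging walkthrough covers for real datasets.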

Atlas News Map zoomed into betting and gambling area

Video: Example of lasso tool used to tag Atlas map

Duplicate detection

Duplicate detection in Atlas streamlines your data by identifying and consolidating duplicate entries. This tool ensures data accuracy and integrity, enabling cleaner datasets for more reliable analysis. Use the tool in the browser to find your duplicate datapoints.
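
This page does not describe how Atlas detects duplicates, so the following is only a simplified stand-in: it groups rows whose text is identical after normalization (lowercasing, stripping punctuation, collapsing whitespace). The helper names and sample rows are hypothetical, and a real duplicate detector may also catch near-duplicates that this sketch would miss.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())

def duplicate_groups(rows, column):
    """Return lists of ids whose normalized `column` text is identical."""
    groups = defaultdict(list)
    for row in rows:
        groups[normalize(row[column])].append(row["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical sample rows.
articles = [
    {"id": 1, "text": "Astros win Game 1!"},
    {"id": 2, "text": "astros win  game 1"},
    {"id": 3, "text": "Markets close higher"},
]

groups = duplicate_groups(articles, "text")

# Consolidate by keeping only the first member of each duplicate group.
to_drop = {dup_id for ids in groups for dup_id in ids[1:]}
deduplicated = [row for row in articles if row["id"] not in to_drop]
```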

Visual Configuration

You can customize the color scheme and point sizes on your map in View Settings.

You may color by existing columns in your data. Depending on your metadata, you might be able to color a news map, for example, by language, news outlet name, country of origin, or number of views. Coloring works for both categorical and numerical data types.

To color datapoints by topic clusters, you can color by Nomic Topic: 1/2/3. Depth level 1 is the most general and depth level 3 is the most specific. This can give you a clearer view of the divisions and overlap between different topics in your data.

The legend on your map shows which labels correspond to the colors of points on the map. If the field you color by is one of the Nomic Topic depth levels, the labels in the legend and on the map are the topic labels themselves.

Adjust your point size to any size that works for you. The right point size can better highlight the structures in the map and help you identify outliers and color patterns more quickly.

Point Positioning

By default, all points are positioned using our own projection algorithm, Nomic Project. However, Atlas allows you to reposition points using alternative positioning schemes.

tip

Combining point repositioning with selection filters allows for more precise data selection. For example, make a lasso in one positioning scheme and then switch to another to see the selected points in a different context.

X-Y Positioning

To use an alternative X-Y coordinate positioning scheme, you must include a pair of named X and Y columns in your dataset. For example, if you want a position scheme called MyPosition, you would include any of the following pairs of columns in your dataset:

  • MyPosition_X and MyPosition_Y (x and y are case-insensitive)
  • MyPosition-X and MyPosition-Y
  • MyPosition.X and MyPosition.Y
  • MyPosition X and MyPosition Y

This will appear as "MyPosition XY" in the "Position Mode" dropdown. Multiple X-Y pairs can be used in the same dataset, and you can switch between them in the dropdown.
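
The naming rule above can be checked before uploading a dataset. The sketch below is a hypothetical helper, not part of the Atlas client: it assumes a scheme is recognized when both columns share the same name and the same separator (`_`, `-`, `.` or a space) and end in X or Y in any case.

```python
import re

# A scheme name, one of the four separators, then X or Y (case-insensitive).
AXIS_COLUMN = re.compile(r"^(?P<name>.+)(?P<sep>[_\-. ])(?P<axis>[xXyY])$")

def find_xy_schemes(columns):
    """Map each scheme name to its (x_column, y_column) pair."""
    found = {}
    for column in columns:
        match = AXIS_COLUMN.match(column)
        if match:
            key = (match.group("name"), match.group("sep"))
            found.setdefault(key, {})[match.group("axis").lower()] = column
    # Keep only schemes that have both an X and a Y column.
    return {
        name: (axes["x"], axes["y"])
        for (name, _sep), axes in found.items()
        if "x" in axes and "y" in axes
    }

# Hypothetical column names; "Lonely_X" has no matching Y column.
columns = ["text", "MyPosition_X", "MyPosition_Y", "umap.x", "umap.y", "price", "Lonely_X"]
schemes = find_xy_schemes(columns)
```

Here `schemes` contains two entries, `MyPosition` and `umap`, which matches the statement that multiple X-Y pairs can coexist in one dataset.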

Geospatial Positioning

To use geospatial positioning, you must include a pair of Latitude and Longitude columns in your dataset. These can be lat and lon or latitude and longitude.

note

Unlike X-Y positioning, geospatial positioning only supports one pair of latitude and longitude columns in a dataset for now.
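
A similar pre-upload check can be sketched for geospatial columns. The helpers below are hypothetical. They assume the column names must match `lat`/`lon` or `latitude`/`longitude` exactly as written (whether other casings are accepted is not stated here), and the range check is an extra sanity check based on the standard bounds for latitude and longitude, not something this page requires.

```python
# The two accepted column pairs named in the text above.
LATLON_PAIRS = [("lat", "lon"), ("latitude", "longitude")]

def find_latlon_columns(columns):
    """Return the first (latitude, longitude) column pair present, else None."""
    present = set(columns)
    for lat_col, lon_col in LATLON_PAIRS:
        if lat_col in present and lon_col in present:
            return lat_col, lon_col
    return None

def valid_coordinates(rows, lat_col, lon_col):
    """Keep rows whose coordinates fall within valid latitude/longitude bounds."""
    return [
        row for row in rows
        if -90 <= row[lat_col] <= 90 and -180 <= row[lon_col] <= 180
    ]

# Hypothetical sample rows; the third has an out-of-range latitude.
places = [
    {"name": "New York", "lat": 40.71, "lon": -74.01},
    {"name": "Sydney", "lat": -33.87, "lon": 151.21},
    {"name": "Bad row", "lat": 123.0, "lon": 10.0},
]

pair = find_latlon_columns(["name", "lat", "lon"])
usable = valid_coordinates(places, *pair)
```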