Atlas Data Maps
What are Data Maps?
The Atlas Data Map plots your entire dataset on one screen and organizes it by meaning into clusters. Clusters on your Atlas Map contain datapoints that are semantically similar. An Atlas Map can search, filter and export data at scale.
All operations on the Atlas Map browser interface can be executed with the API.
Use Cases for Data Maps
- See and explore your text, image, audio and video datasets.
- Cluster and auto-categorize your datasets.
- Discover outlier and anomolous regions in your data.
- Rapidly and collaboratively iterate on your dataset by removing undesired datapoints, applying tags and sharing insights.
An Atlas Map has the following properties:
-
Points close to each other on the map are semantically similar/related.
All news articles about sports are on the left side of the map. Inside the sports region, the map breaks down by type of sport because news articles about a fixed sport (e.g. football) have more similarity to each other than with news articles about other types of sports (e.g. baseball).
-
Numerical distances between 2D point positions do not have concrete meaning.
For example, the observation that the Baseball and Football clubs news article clusters are adjacent signify a relationship between Baseball and Football in the embedding space. You should not, however, make claims or draw conclusions using the Euclidean distance between points in the two clusters. Distance information is only meaningful in the ambient embedding space and can be retrieved with
vector_search
. -
Floating labels correspond to distinct topics in your data.
For example, the Baseball cluster has the label 'Astros win Game 1 of World'. Labels are automatically determined from the textual contents of your data and are crucial for navigating the map. Learn more about how topics labels are generated.
-
Topics have a hierarchy.
Topics group your dataset into homogenous regions. As you zoom around the map, more granular versions of your datasets topics will appear.
-
Maps update as your data updates.
When new data enters your dataset, Atlas can rebuild the map to reflect how the new data relates to existing data.
-
Built for collaboration across technical and non-technical teams.
All information and operations that are visually presented on an Atlas Map have a programmatic analog. For example, engineers can access cluster ids, topic information, duplicate clusters and run vector search through the Python client.