Quickstart
Create Account
Visit the Nomic Atlas website and sign up for a free account. You'll then be asked to choose a name for your Atlas organization, after which you can upload datasets and create data maps in Atlas.
Atlas Dashboard
Once you are signed up for Atlas and logged in, visit https://atlas.nomic.ai/data to open your Atlas Dashboard.
Alternatively, click the Dashboard button on the Nomic Atlas homepage.
Your organization's data maps will live here. If your organization has no datasets yet, you will be prompted to upload your first dataset.
Data upload
Here are a few pathways available to bring data into Atlas:
Data connector upload
This path involves using a connector to bring in a dataset from an external platform (e.g. using the Hugging Face integration).
Drag & drop upload
Atlas lets you take a dataset file (in CSV, TSV, JSON, or JSONL format) and upload it directly.
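If your data lives in memory rather than in a file, one way to prepare it for drag & drop upload is to write it out as CSV or JSONL first. The following is a minimal sketch using only the Python standard library; the records, field names, and file names are hypothetical and chosen purely for illustration.

```python
import csv
import json
import os
import tempfile

# Hypothetical records; the field names are illustrative only.
records = [
    {"id": "1", "text": "Stocks rallied after the earnings report."},
    {"id": "2", "text": "The home team won in extra time."},
]

def write_csv(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

def write_jsonl(rows, path):
    """Write a list of dicts to a JSONL file, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Write both formats to a temporary directory.
out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "articles.csv")
jsonl_path = os.path.join(out_dir, "articles.jsonl")
write_csv(records, csv_path)
write_jsonl(records, jsonl_path)
```

Either resulting file is in one of the formats listed above.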
SDK upload
Here is an example of using the Atlas Python SDK to upload data for an Atlas data map.
First, log in to Nomic with your API key from your terminal/command line:
nomic login nk-...
Then, use the atlas.map_data() function to upload your data (e.g. as a pandas DataFrame) to Atlas and create a map from it:
from nomic import atlas
import pandas

# Load the example news dataset into a DataFrame
news_articles = pandas.read_csv(
    'https://raw.githubusercontent.com/nomic-ai/maps/main/data/ag_news_25k.csv'
)

# Upload the data and build a map, embedding the 'text' column
atlas.map_data(
    data=news_articles,
    indexed_field='text',
    identifier='Example-text-dataset-news'
)
Developers can see more examples of uploading data to Atlas in our Python SDK documentation.
Upload options
Name
The name used to display your data map in the Dashboard and in its URL.
Embedding Field / Indexed Field
The attribute of your dataset used to arrange the points in the Atlas map.
Uploading data to Atlas requires choosing which field/column from your dataset to embed with an embedding model. This choice determines how the data points are arranged in the Atlas map: points that appear as neighbors in the data map have similar semantic content in this field/column (and thus similar embeddings from the embedding model). Typically, you will want this to be the text column of your data, as opposed to non-semantic content like IDs or numerical metadata.
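If you are unsure which column holds the semantic text, a rough heuristic is to look for the field with the most words per value. The sketch below is only an illustration of that idea: suggest_indexed_field is a hypothetical helper (not part of the Atlas SDK), and the sample rows are made up.

```python
def suggest_indexed_field(rows):
    """Return the field whose string values contain the most words on average.

    Heuristic only: free-text columns tend to have many words per value,
    while IDs and numeric metadata have few or none.
    """
    totals = {}
    counts = {}
    for row in rows:
        for field, value in row.items():
            if isinstance(value, str):
                totals[field] = totals.get(field, 0) + len(value.split())
                counts[field] = counts.get(field, 0) + 1
    if not totals:
        raise ValueError("no string-valued fields found")
    return max(totals, key=lambda f: totals[f] / counts[f])

# Hypothetical sample rows for illustration.
rows = [
    {"id": "a1", "year": 2004, "text": "Oil prices climb as supply concerns grow."},
    {"id": "a2", "year": 2004, "text": "New chip promises faster mobile graphics."},
]
field = suggest_indexed_field(rows)
print(field)
```

Treat the result as a suggestion to review rather than a decision: a column of long URLs or serialized JSON could score highly without being meaningful text.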
Build Topic Model
Whether to build a topic model, which displays labels over clusters and subclusters within the Atlas data map interface. You can read more about how it works here.
Duplicate Detection
Whether to activate duplicate detection, which will create a new column of metadata for your dataset indicating which points are likely duplicates of other data points.
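This page does not describe how Atlas decides that two points are likely duplicates. Purely to illustrate what a per-row duplicate flag looks like, the sketch below marks exact repeats after lowercasing and collapsing whitespace. This is a much simpler stand-in technique, not Atlas's method, and flag_exact_duplicates is a hypothetical helper.

```python
def flag_exact_duplicates(texts):
    """Return one boolean per text: True if an earlier text matches it
    after lowercasing and collapsing runs of whitespace."""
    seen = set()
    flags = []
    for text in texts:
        key = " ".join(text.lower().split())
        flags.append(key in seen)
        seen.add(key)
    return flags

# Hypothetical sample texts for illustration.
texts = ["Breaking news today", "breaking   news TODAY", "Something else"]
flags = flag_exact_duplicates(texts)
print(flags)
```

A check like this can be useful before uploading, but it only catches exact repeats after normalization and will miss reworded copies.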
Use Multilingual Model
Whether to use a multilingual embedding model, which will group data points together based on semantic meaning regardless of language used in the text (as opposed to the default nomic-embed-text-v1.5 embedding model, which will create distinct clusters of data depending on the language used in the text).
Private Map
For Pro & Enterprise accounts, you can make a map private so that it is not publicly accessible and only members of your organization can find or access it.