
Overview

The Nomic Python client lets you upload large unstructured datasets to Nomic Atlas and interact with them programmatically.

Installation

pip install nomic

Data Upload

def map_data(
    data: Optional[Union[DataFrame, List[Dict], Table]] = None,
    blobs: Optional[List[Union[str, bytes, Image.Image]]] = None,
    embeddings: Optional[np.ndarray] = None,
    identifier: Optional[str] = None,
    description: str = "",
    id_field: Optional[str] = None,
    is_public: bool = True,
    indexed_field: Optional[str] = None,
    projection: Union[bool, Dict, NomicProjectOptions] = True,
    topic_model: Union[bool, Dict, NomicTopicOptions] = True,
    duplicate_detection: Union[bool, Dict, NomicDuplicatesOptions] = True,
    embedding_model: Optional[Union[str, Dict, NomicEmbedOptions]] = None
) -> AtlasDataset
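If you pass your own vectors through embeddings, the array needs exactly one row per datapoint, and every row must have the same length. The following is a minimal sketch that assumes row i describes the i-th entry of data. It uses two hypothetical helpers:

- embedding_shape checks a nested list of floats before the list is converted with numpy.asarray.
- upload_embeddings shows the map_data call. It needs numpy, nomic, network access, and a Nomic API key, so it isn't run here.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def embedding_shape(vectors: Sequence[Sequence[float]], n_records: int) -> Tuple[int, int]:
    """Return (N, d) for a list of vectors, or raise if they can't form an [N, d] array.

    Hypothetical helper: there must be one vector per record in `data`,
    and every vector must have the same length d.
    """
    if len(vectors) != n_records:
        raise ValueError(f"got {len(vectors)} embeddings for {n_records} records")
    if not vectors:
        raise ValueError("no embeddings supplied")
    d = len(vectors[0])
    for i, vec in enumerate(vectors):
        if len(vec) != d:
            raise ValueError(f"embedding {i} has length {len(vec)}, expected {d}")
    return len(vectors), d


def upload_embeddings(
    records: List[Dict], vectors: List[List[float]], identifier: Optional[str] = None
):
    """Upload precomputed embeddings with their metadata (hypothetical wrapper, not run here)."""
    import numpy as np  # map_data takes `embeddings` as an np.ndarray
    from nomic import atlas

    embedding_shape(vectors, len(records))  # fail fast before any network call
    return atlas.map_data(
        data=records,
        embeddings=np.asarray(vectors),
        identifier=identifier,
    )
```

Checking the shape locally surfaces a mismatched or ragged array before anything is sent to Atlas.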

Arguments:

  • data: An ordered collection of the datapoints you are structuring. Can be a list of dictionaries, a pandas DataFrame, or a PyArrow Table.
  • blobs: A list of locally stored images to add to your image dataset, given as file paths, raw bytes, or PIL images.
  • embeddings: An [N, d] NumPy array containing the N embeddings to add, each of dimension d.
  • identifier: A name for your dataset that is used to generate the dataset identifier. A unique name will be chosen if not supplied.
  • description: A description of your dataset.
  • id_field: The name of the field that uniquely identifies each datapoint. Values in this field can be up to 36 characters long. If not specified, a field named id_ is created for you.
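  • is_public: Whether the dataset is accessible outside your Nomic Atlas organization.
  • indexed_field: The text field in data that Atlas embeds. These embeddings determine the layout of the map Atlas builds for your dataset.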
  • projection: Options for Nomic Project, the dimensionality reduction algorithm that organizes your dataset.
  • topic_model: Options for Nomic Topic, the topic model that organizes your dataset.
  • duplicate_detection: Options for Nomic Duplicates, the duplicate detection algorithm.
  • embedding_model: Options for the embedding model used to embed your dataset.
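The id_field constraints above (unique values of at most 36 characters) are easy to violate with keys taken from real-world data. The following is a minimal sketch that assumes your data starts as a list of dictionaries with a "text" key. It uses two hypothetical helpers:

- prepare_records checks the ids and fills in missing ones with random UUID4 strings, which are exactly 36 characters long.
- upload wraps atlas.map_data. It requires `pip install nomic`, network access, and a Nomic API key, so it isn't run here.

Normalizing ids to strings is a choice made by this sketch, not a documented requirement.

```python
import uuid
from typing import Dict, List, Optional

MAX_ID_LENGTH = 36  # id_field values may be at most 36 characters


def prepare_records(records: List[Dict], id_field: str = "id") -> List[Dict]:
    """Return copies of `records`, each with a unique id of at most 36 characters.

    Hypothetical helper: missing ids get a random UUID4 string (36 characters),
    and existing ids are converted to strings (an assumption of this sketch).
    """
    seen = set()
    prepared = []
    for record in records:
        row = dict(record)  # copy so the caller's data is left untouched
        value = row.get(id_field)
        if value is None:
            value = str(uuid.uuid4())
        value = str(value)
        if len(value) > MAX_ID_LENGTH:
            raise ValueError(f"id {value!r} is longer than {MAX_ID_LENGTH} characters")
        if value in seen:
            raise ValueError(f"duplicate id {value!r}")
        seen.add(value)
        row[id_field] = value
        prepared.append(row)
    return prepared


def upload(records: List[Dict], identifier: Optional[str] = None):
    """Upload text records to Atlas (hypothetical wrapper; needs network and an API key)."""
    from nomic import atlas  # imported lazily so prepare_records works without nomic

    return atlas.map_data(
        data=prepare_records(records),
        id_field="id",
        indexed_field="text",  # assumes each record stores its text under "text"
        identifier=identifier,
        description="Example text dataset",
    )
```

Validating ids before upload keeps the error close to your own data, rather than surfacing only after the request reaches Atlas.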