Overview
The Nomic Python client is the best way to upload and interact with large unstructured datasets.
Installation
pip install nomic
Data Upload
def map_data(
    data: Optional[Union[DataFrame, List[Dict], Table]] = None,
    blobs: Optional[List[Union[str, bytes, Image.Image]]] = None,
    embeddings: Optional[np.ndarray] = None,
    identifier: Optional[str] = None,
    description: str = "",
    id_field: Optional[str] = None,
    is_public: bool = True,
    indexed_field: Optional[str] = None,
    projection: Union[bool, Dict, NomicProjectOptions] = True,
    topic_model: Union[bool, Dict, NomicTopicOptions] = True,
    duplicate_detection: Union[bool, Dict, NomicDuplicatesOptions] = True,
    embedding_model: Optional[Union[str, Dict, NomicEmbedOptions]] = None
) -> AtlasDataset
Arguments:
data
: An ordered collection of the datapoints you are structuring. Can be a list of dictionaries, a Pandas DataFrame, or a PyArrow Table.
blobs
: A list of locally stored image paths, bytes, or PIL images to add to your image dataset.
embeddings
: An [N, d] NumPy array containing the N embeddings to add.
identifier
: A name for your dataset, used to generate the dataset identifier. A unique name will be chosen if not supplied.
description
: The description of your dataset.
id_field
: The field that holds each datapoint's unique id. This field can be up to 36 characters in length. If not specified, one will be created for you named id_.
is_public
: Whether the dataset should be accessible outside your Nomic Atlas organization.
projection
: Options to adjust Nomic Project, the dimensionality reduction algorithm organizing your dataset.
topic_model
: Options to adjust Nomic Topic, the topic model organizing your dataset.
duplicate_detection
: Options to adjust Nomic Duplicates, the duplicate detection algorithm.
embedding_model
: Options to adjust the embedding model used to embed your dataset.
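As a minimal sketch of how these arguments fit together, the example below builds a small list of dictionaries and passes it to map_data. Several things here are assumptions rather than documented behavior: the helper names build_records and upload, the dataset name "example-notes", and the "id" and "text" field names are all hypothetical; the 36-character limit is assumed to apply to the id values (a UUID string is exactly 36 characters); and indexed_field is assumed to name the text field to embed, which the argument list above does not describe. The upload itself requires an installed and authenticated nomic client, so that call is left commented out.

```python
import uuid
from typing import Dict, List


def build_records(texts: List[str]) -> List[Dict[str, str]]:
    """Turn raw strings into a list of dictionaries, one per datapoint.

    Each record gets a unique "id" value. str(uuid.uuid4()) is exactly
    36 characters long, so it stays within the id_field length limit
    (assuming that limit applies to the id values).
    """
    return [{"id": str(uuid.uuid4()), "text": text} for text in texts]


def upload(records: List[Dict[str, str]]):
    """Send the records to Atlas. Needs `pip install nomic` and valid credentials."""
    # Imported lazily so the data preparation above runs without nomic installed.
    from nomic import atlas

    return atlas.map_data(
        data=records,
        id_field="id",                 # hypothetical unique id field
        indexed_field="text",          # assumption: the text field to embed
        identifier="example-notes",    # hypothetical dataset name
        description="A few example notes",
        is_public=False,               # keep the dataset inside the organization
        topic_model=True,
        duplicate_detection=True,
    )


if __name__ == "__main__":
    records = build_records(["first note", "second note", "third note"])
    print(len(records), "records ready for upload")
    # dataset = upload(records)  # uncomment once credentials are configured
```

Generating the ids yourself, rather than relying on the automatically created id_ field, keeps them stable if the same records are uploaded again from a saved file.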