Skip to main content

Embedding and projection algorithms

Embeddings are vector representations of real-world objects, and embedding projection preserves the semantic neighborhood information embeddings capture in two dimensions.

Despite their mathematical utility, high-dimensional embeddings cannot be easily interpreted because humans do not have a good point of reference for understanding what a 768-dimensional vector "looks" like.

Atlas augments your dataset using embedding projection to build a contextual two-dimensional data map from high-dimensional embeddings. This map preserves high-dimensional relationships present between embeddings in a two-dimensional, human interpretable view.

Embedding

Atlas uses a contrastively trained deep neural network, similar to Neelakantan et al., to vectorize text. Users may also upload their own embeddings to build Atlas maps. Image, audio, video, and multimodal embeddings are on the way to Atlas.

Projection

Nomic Atlas takes high-dimensional embeddings and generates a low-dimensional set of vectors for mapping. Nomic’s layout optimizer performs dimensionality reduction on embeddings, and decides on parameters for the high dimensional kernel, low dimensional kernel, and optimizer that enable optimized projection speed, size, and quality.