Topic modeling

Nomic Atlas organizes your data into a semantic topic hierarchy, allowing you to quickly group similar datapoints together.

Learn how to access your topics in Python or read more about the topic modeling algorithms behind the Atlas system.

Use cases

  • Document clustering and classification
  • Content recommendation
  • Trend analysis and monitoring / topic evolution over time
  • Text summarization
  • Knowledge discovery, pattern-finding, and data mining

What is an "indexed field"?

Your indexed_field is the attribute of your data that Atlas uses to arrange the map, and the choice is up to you. A map of news articles indexed over their text content, for instance, will be a semantic layout of that content. Because topic labels correspond to the data being indexed, the labels on such a map describe article content. Ideally, each topic label describes an entire cluster of datapoints.

A news dataset may also contain metadata, like time of publication, language, author name, and outlet name, and possibly other user-generated attributes like bias, objectivity, and polarity.
If you build an Atlas map on this dataset, you will likely want to tell Atlas to "index over" the article's text content, so that the resulting map shows the landscape of article content rather than some other metadata attribute like time or polarity.
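
As a rough illustration of the idea (this is not the Atlas API), the sketch below treats each article as a dictionary and picks one attribute as the indexed field. The record layout, the field names and values, and the helper values_to_index are all hypothetical. The point is only that the indexed field determines which values drive the layout and the topic labels, while the other attributes travel along as metadata.

```python
# Hypothetical records: the field names and values are made up for illustration.
articles = [
    {"text": "Ukrainian president to visit Canada", "language": "en", "outlet": "Example Wire"},
    {"text": "Central bank holds interest rates steady", "language": "en", "outlet": "Example Post"},
]


def values_to_index(records, indexed_field):
    """Return the values of the chosen field, one per record.

    In this sketch, these are the values a mapping system would lay out
    and label; every other attribute stays attached as metadata.
    """
    missing = [i for i, record in enumerate(records) if indexed_field not in record]
    if missing:
        raise KeyError(f"records {missing} lack the field {indexed_field!r}")
    return [record[indexed_field] for record in records]


# Index over article content, so a map would reflect what the articles say.
by_content = values_to_index(articles, "text")

# Indexing over a metadata attribute instead would arrange a map by outlet.
by_outlet = values_to_index(articles, "outlet")
```

Swapping "text" for "outlet" changes nothing about the records themselves, only which attribute the layout and labels would be built from.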



Understanding topics in Atlas

The topic labels on the Atlas map are automatically generated based on the underlying data. More specifically, the topics describe one user-selected attribute of the data, like the text content of news articles.

Example: News map with Atlas topics

Atlas News Map zoomed in on Russia/World politics

Reading clusters and labels

We can see clustering and topic labeling in action in the map above, a multilingual map on world news (try browsing it yourself!).

  • High-level topics: A very general topic like "Russia-Ukraine conflict" broadly describes the data points within the bottom-right cluster.
  • Sub-topics: Within the cluster, there are sub-clusters with topic labels that describe more specific themes, like "Russian Navy", "Drones", "Russia-Canada Relations", and "Black Sea Fleet Headquarters".
  • Individual points comprise clusters: Zooming in and hovering over individual data points in the "Russia-Canada Relations" cluster, we see headlines like "Le président de l'Ukraine, Volodymyr Zelenskyy, effectuera une visite au Canada" ("Ukrainian President Volodymyr Zelenskyy to visit Canada") and "Tổng thống Ukraine tiếp tục chuyến vận động tới Canada sau những thách thức tại Mỹ" ("The President of Ukraine continues his campaign trip to Canada after challenges in the US").
  • Topic inference from clusters: The topic model infers labels such as "Russia-Canada Relations" based on clusters of individual datapoints like these. In this case in particular, the system uses a multilingual-aware model.

To learn more about how topic labels are generated, see the Topics section in How Atlas Works.

Intuition behind Atlas topics

Think of an Atlas map like an actual map, whether one in a phone app or a classic tri-fold tucked in your glove compartment. Both serve as a guide through a landscape, except Atlas does so with your data.

To understand the labels on an Atlas map, we can look to real maps as an analogy.

Consider a digital map of Earth.

  • When we see the whole world on our screen at once, we can see labels of continents, countries, ocean names and mountain ranges.
  • Zooming in brings more granularity, like states and provinces, rivers, and lakes.
  • Zooming in further, names of cities, towns and villages come into view.
  • Zooming in even further, we may see labeled buildings, roads, paths, bridges, or monuments.

The Atlas projection algorithm applies statistical methods to generate a "landscape" of your data where semantically similar datapoints are close to each other on the map (for more info on semantic layouts, see the note below). This imposes order over a previously disordered dataset.
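
The notion of "semantically similar" can be made concrete with a toy example. The sketch below is not the Atlas projection algorithm. It assumes each datapoint has already been turned into an embedding vector (the three-dimensional vectors here are invented for illustration) and uses cosine similarity, one common similarity measure, to show that related items score closer to each other than unrelated ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings, invented for illustration; real embeddings have many more dimensions.
embeddings = {
    "Russian Navy": [0.9, 0.1, 0.0],
    "Black Sea Fleet Headquarters": [0.8, 0.2, 0.1],
    "Baked goods": [0.0, 0.1, 0.9],
}

navy_vs_fleet = cosine_similarity(
    embeddings["Russian Navy"], embeddings["Black Sea Fleet Headquarters"]
)
navy_vs_bakery = cosine_similarity(
    embeddings["Russian Navy"], embeddings["Baked goods"]
)

# A semantic layout would place the first pair closer together than the second.
print(navy_vs_fleet > navy_vs_bakery)
```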

To help users better understand this data landscape, Atlas generates topic labels that describe the underlying data. Because the clustering is hierarchical, the labels range from general to specific. Zooming into the map reveals more specific labels; zooming out hides them.
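
To make the zoom behavior concrete, here is a minimal sketch that models a topic hierarchy as a nested dictionary and lists the labels found at a given depth. The topic names come from the news map example earlier on this page. The nested-dictionary layout and the helper labels_at_depth are assumptions for illustration, not how Atlas stores or renders topics.

```python
# Hypothetical hierarchy: general topics map to their more specific sub-topics.
topic_hierarchy = {
    "Russia-Ukraine conflict": {
        "Russian Navy": {},
        "Drones": {},
        "Russia-Canada Relations": {},
        "Black Sea Fleet Headquarters": {},
    },
}


def labels_at_depth(hierarchy, depth):
    """Collect the labels found exactly `depth` levels down (1 = most general)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        return list(hierarchy)
    labels = []
    for children in hierarchy.values():
        labels.extend(labels_at_depth(children, depth - 1))
    return labels


zoomed_out = labels_at_depth(topic_hierarchy, 1)  # general labels only
zoomed_in = labels_at_depth(topic_hierarchy, 2)   # more specific sub-topic labels
```

In this sketch, depth plays the role of zoom level: a shallow depth returns the broad label, and a deeper one returns the sub-topics nested beneath it.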

Real-life examples

Here are some other real-life examples of unstructured collections being given structure:

  • The grocery store: A grocery store organizes thousands of individual products into sections like produce, dairy, frozen goods, and baked goods based on the temperature, taste, type, culinary use, and origin of food and drink products. Often, there is an order within each section: the baked goods section may keep sweet and savory items separate.
  • The library: A library is organized by different attributes of its many books. The Dewey Decimal System divides books into ten main classes of content like history, science, and the arts, and these get arranged around the library's stacks. Libraries could also be organized by author's last name, genre, reading level, and more.

Although far removed from data analysis, these examples show how a complex disarray of objects can be turned into a navigable space.