Indexing

Before Assistant can reason over a file, Nomic indexes it: the file is parsed into structured text and images, then embedded for semantic search. Indexing is a one-time cost per file version — after that, every question Assistant answers about the file uses only foundation-model tokens.

How indexing works

Indexing runs two models under the hood:

Nomic Parse — Nomic's domain-specific parsing model turns a document into structured text, tables, and figure extractions. For drawings and PDFs, Parse handles sheet segmentation, OCR, and layout understanding.
Nomic Embed — Parsed content is chunked and embedded so it can be retrieved by semantic search.

Once a file is indexed, its parsed content and embeddings are stored and reused across every session, workflow, and search. Re-indexing only happens when a new version of the file is uploaded or synced. You can see each file's indexing status — Indexed, Indexing, Failed, or not yet indexed — in the Status column of the Files view. See Files for the full UX.
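The "index once per file version, then reuse" behavior described above can be pictured with a small sketch. This is a toy in-memory model, not Nomic's API: `IndexCache`, `get_or_index`, and the stored fields are hypothetical names chosen for illustration, and the real Parse and Embed models are replaced by a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class IndexCache:
    """Toy model of version-keyed indexing (hypothetical, not Nomic's API)."""
    _store: dict = field(default_factory=dict)
    parse_calls: int = 0

    def get_or_index(self, file_id, version, pages):
        key = (file_id, version)
        if key not in self._store:
            # Placeholder for Parse + Embed; the real models are not called here.
            self.parse_calls += 1
            self._store[key] = {"chunks": list(pages), "page_count": len(pages)}
        return self._store[key]


cache = IndexCache()
cache.get_or_index("spec.pdf", 1, ["p1", "p2"])
cache.get_or_index("spec.pdf", 1, ["p1", "p2"])        # same version: reused
cache.get_or_index("spec.pdf", 2, ["p1", "p2", "p3"])  # new version: indexed again
print(cache.parse_calls)  # 2
```

Keying on the pair (file, version) is what makes a second session free of parse cost while a newly uploaded version triggers a fresh parse.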

Supported file formats

PDF — including multi-sheet drawing sets, specifications, and reports
Microsoft Word (.doc, .docx, .docm, .dotx, .dotm)
Microsoft Excel (.xls, .xlsx, .xlsm, .xltx, .xltm)
Microsoft PowerPoint (.ppt, .pptx, .pptm, .potx, .potm)
Google Docs, Sheets, and Slides
Images — for visual understanding in drawing review workflows
BIM models (.ifc, .ifczip) — prepared on demand for BIM viewing and Assistant queries

Files synced or imported from integrations (SharePoint, Egnyte, Autodesk Forma, Bentley, Fieldwire) are indexed automatically when file sync is enabled for that integration. Directly uploaded files can be indexed on demand via Index Document in the file's action menu — for yourself only, or for everyone in the organization.
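If you want to pre-screen a folder before syncing or uploading, a local check along these lines may help. It is a sketch only: `classify` is a hypothetical helper, the extension table is copied from the list above (plus the standard `.pdf` extension), and image and Google file types are left out because the list does not name specific extensions for them.

```python
from pathlib import PurePath
from typing import Optional

# Extensions taken from the supported-formats list; ".pdf" is the standard PDF extension.
FORMATS_BY_EXTENSION = {
    ".pdf": "PDF",
    **{ext: "Microsoft Word" for ext in (".doc", ".docx", ".docm", ".dotx", ".dotm")},
    **{ext: "Microsoft Excel" for ext in (".xls", ".xlsx", ".xlsm", ".xltx", ".xltm")},
    **{ext: "Microsoft PowerPoint" for ext in (".ppt", ".pptx", ".pptm", ".potx", ".potm")},
    **{ext: "BIM model" for ext in (".ifc", ".ifczip")},
}


def classify(filename: str) -> Optional[str]:
    """Return the format family for a filename, or None if the extension is not listed."""
    return FORMATS_BY_EXTENSION.get(PurePath(filename).suffix.lower())


print(classify("A-101.PDF"))     # PDF
print(classify("model.ifczip"))  # BIM model
print(classify("notes.txt"))     # None
```

A `None` result only means the extension is not in this local table; it does not prove the file is unsupported.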

BIM model preparation

BIM models do not follow the normal Parse and Embed indexing flow. Nomic prepares IFC-derived model artifacts the first time you open or query a model, then caches those artifacts for the file version. Because BIM models are not split into document pages, they do not show the same document indexing status or page-based indexing estimate as PDFs and Office files.

Pricing

Indexing cost is driven by pages parsed, not the number of files:

Step	Unit	Price
Nomic Parse	Per page	$0.012 / page
Nomic Embed	Per file	Included in the parse cost for typical documents

Reference costs:

A 10-page spec costs about $0.12 to index.
A 100-sheet drawing set costs about $1.20 to index.
A 1,000-page project archive costs about $12 to index.
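The reference costs above follow from multiplying total pages by the per-page Parse price. A minimal sketch of that arithmetic, assuming the $0.012 rate from the table and using a hypothetical helper name (`estimate_indexing_cost`):

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.012")  # Nomic Parse price from the pricing table


def estimate_indexing_cost(page_counts):
    """Estimated one-time indexing cost in USD for files with the given page counts."""
    return sum(page_counts) * PRICE_PER_PAGE


# Cost depends on total pages, not on how many files they are split across.
assert estimate_indexing_cost([10] * 10) == estimate_indexing_cost([100])

cost = estimate_indexing_cost([10, 100, 1000])
print(f"${cost:.2f}")  # $13.32
```

`Decimal` is used instead of floats so that small per-page amounts add up without rounding drift.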

Each parse event is recorded as a spend event against your AI Usage pool with the page count and cost. Admins can review parse events across the organization in Admin → Usage or Admin → Analytics → Parsing; individual users can review their own parse events in Settings → Usage.

The per-page price absorbs layout segmentation, OCR for scanned or mixed-content PDFs, table and figure extraction, drawing sheet decomposition, and embedding for both text chunks and image regions. There is no separate per-token charge.

Indexing vs. not indexing

You can attach a non-indexed file directly to a single Assistant session, and Assistant will read it for that conversation. The tradeoffs:

	Indexed	Not indexed (attached only)
Retrievable across sessions	Yes — available in every thread and workflow in the project	No — only visible to the session it is attached to
Searchable via semantic search	Yes	No
Scalable to large sets	Yes — Assistant pulls only relevant chunks	No — the whole file goes into the model's context
Cost per question	Low — only pays for the tokens Assistant actually uses	High — every question re-reads the full file, paying foundation-model input tokens each time
One-time cost	Yes, $0.012 / page, then reused for that file version	None upfront, but every turn pays for the full file

The break-even point is low. For a 50-page document, indexing costs $0.60 once. Reading that same file inline on Sonnet 4.6 costs roughly $0.05 – $0.15 per question in input tokens alone. At those rates, indexing has paid for itself after roughly 4 to 12 questions, and the file is then available to every future session and workflow.
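The break-even estimate can be reproduced with a short calculation. This is a simplified sketch: `break_even_questions` is a hypothetical helper, and it ignores the smaller per-question token cost that indexed retrieval still incurs, so the real break-even point is somewhat later than the number it returns.

```python
import math
from decimal import Decimal


def break_even_questions(pages, inline_cost_per_question,
                         price_per_page=Decimal("0.012")):
    """Questions after which one-time indexing is cheaper than inline reads.

    Simplification: treats the per-question cost of an indexed file as zero.
    """
    indexing_cost = pages * price_per_page
    return math.ceil(indexing_cost / inline_cost_per_question)


print(break_even_questions(50, Decimal("0.15")))  # 4
print(break_even_questions(50, Decimal("0.05")))  # 12
```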

Rule of thumb: index anything you expect to reference more than once, or anything that will live in a project the team will return to.

Managing indexing costs

Index selectively. You don't need to index every file in a SharePoint site or Autodesk Forma project. Use integration configuration to scope which folders sync into Nomic.
Prefer personal indexing for one-off exploration. Indexing a file for yourself only, rather than for everyone in the organization, avoids promoting exploratory files to canonical project context.
Avoid re-uploading. Re-indexing is tied to file versions — uploading the same file twice indexes twice.
Watch large binary PDFs. Sort the Files view by page count to find the most expensive files to index.
Audit indexing spend. Admins can filter Admin → Usage to parse rows or open Admin → Analytics → Parsing to see parse spend by user and individual parse events. Individual users can open Settings → Usage to see their own parse events.
Use tags to scope Assistant context. Even after indexing, narrowing context with Tags and Projects keeps per-question costs in check.
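For the "avoid re-uploading" tip, one option is to check a local folder for byte-identical copies before uploading. The sketch below is an assumption-laden local helper, not a Nomic feature: `sha256_of` and `find_duplicates` are hypothetical names, and the check only catches files whose contents match exactly.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large PDFs are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash and return only groups with more than one file."""
    by_hash = {}
    for path in paths:
        by_hash.setdefault(sha256_of(path), []).append(Path(path))
    return [group for group in by_hash.values() if len(group) > 1]
```

For example, `find_duplicates(Path("to_upload").rglob("*.pdf"))` would list sets of identical PDFs under a hypothetical `to_upload` folder, so only one copy from each set needs to be uploaded.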

Where to next

Files — The Files view, supported formats, and indexing statuses.
BIM Model Intelligence — Supported BIM formats, viewer tabs, and Assistant query behavior.
Models & Pricing overview — How Assistant token pricing interacts with indexed vs. attached files.
Analytics & monitoring — Audit parse events and organization-wide indexing spend.
