Indexing

Before Assistant can reason over a file, Nomic indexes it: the file is parsed into structured text and images, then embedded for semantic search. Indexing is a one-time cost per file version — after that, every question Assistant answers about the file uses only foundation-model tokens.

How indexing works

Indexing runs two models under the hood:

  1. Nomic Parse — Nomic's domain-specific parsing model turns a document into structured text, tables, and figure extractions. For drawings and PDFs, Parse handles sheet segmentation, OCR, and layout understanding.
  2. Nomic Embed — Parsed content is chunked and embedded so it can be retrieved by semantic search.

Once a file is indexed, its parsed content and embeddings are stored and reused across every session, workflow, and search. Re-indexing only happens when a new version of the file is uploaded or synced. You can see each file's indexing status — Indexed, Indexing, Failed, or not yet indexed — in the Status column of the Files view. See Files for the full UX.
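The "one-time cost per file version" behavior can be pictured as a cache keyed by file and version. The sketch below is purely illustrative: `IndexCache`, `get_or_index`, and the placeholder "parse" step are hypothetical names invented for this example, not Nomic's implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class IndexCache:
    """Toy store of indexed results keyed by (file_id, version). Hypothetical."""

    entries: dict = field(default_factory=dict)
    parse_calls: int = 0  # counts how many times the (billable) indexing step ran

    def get_or_index(self, file_id: str, version: str, content: str) -> str:
        key = (file_id, version)
        if key not in self.entries:
            # Placeholder for the real parse + embed work, which is what gets billed.
            self.parse_calls += 1
            self.entries[key] = content.strip().lower()
        return self.entries[key]


cache = IndexCache()
cache.get_or_index("spec.pdf", "v1", "Section 1 ...")  # indexed once
cache.get_or_index("spec.pdf", "v1", "Section 1 ...")  # reused, no new parse
cache.get_or_index("spec.pdf", "v2", "Section 1 (rev) ...")  # new version, indexed again
print(cache.parse_calls)
```

Under these assumptions, repeated questions against the same version never trigger another parse; only a new version does.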

Supported file formats

  • PDF — including multi-sheet drawing sets, specifications, and reports
  • Microsoft Word (.docx)
  • Microsoft Excel (.xlsx)
  • Microsoft PowerPoint (.pptx)
  • Google Docs, Sheets, and Slides
  • Images — for visual understanding in drawing review workflows

Files synced from integrations (SharePoint, Egnyte, ACC, Bentley) are indexed automatically. Directly uploaded files can be indexed on demand via Index Document in the file's action menu — for yourself only, or for everyone in the organization.

Pricing

Indexing cost is driven by pages parsed, not the number of files:

| Step | Unit | Price |
|---|---|---|
| Nomic Parse | Per page | $0.012 / page |
| Nomic Embed | Per file | Included in the parse cost for typical documents |

Reference costs:

  • A 10-page spec costs about $0.12 to index.
  • A 100-sheet drawing set costs about $1.20 to index.
  • A 1,000-page project archive costs about $12 to index.
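The reference costs above follow directly from the per-page price. A minimal sketch of that arithmetic, assuming the $0.012 per-page rate applies uniformly (the function name `indexing_cost` is made up for this example):

```python
from decimal import Decimal

# Per-page parse price taken from the pricing table.
PRICE_PER_PAGE = Decimal("0.012")


def indexing_cost(pages: int) -> Decimal:
    """Return the estimated one-time indexing cost in USD for `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_PAGE * pages


for pages in (10, 100, 1000):
    print(f"{pages:>5} pages -> ${indexing_cost(pages):.2f}")
```

`Decimal` is used instead of floats so that currency arithmetic stays exact.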

Each parse event is recorded as a SpendEvent against your AI Usage pool and is visible in Admin → Usage with the page count and cost.

The per-page price absorbs layout segmentation, OCR for scanned or mixed-content PDFs, table and figure extraction, drawing sheet decomposition, and embedding for both text chunks and image regions. There is no separate per-token charge.

Indexing vs. not indexing

You can attach a non-indexed file directly to a single Assistant session, and Assistant will read it for that conversation. The tradeoffs:

| | Indexed | Not indexed (attached only) |
|---|---|---|
| Retrievable across sessions | Yes: available in every thread and workflow in the project | No: only visible to the session it is attached to |
| Searchable via semantic search | Yes | No |
| Scalable to large sets | Yes: Assistant pulls only relevant chunks | No: the whole file goes into the model's context |
| Cost per question | Low: only pays for the tokens Assistant actually uses | High: every question re-reads the full file, paying foundation-model input tokens each time |
| One-time cost | Yes: $0.012 / page, then reused forever | None upfront, but every turn pays for the full file |

The break-even point is low. For a 50-page document, indexing costs $0.60 once (50 × $0.012). Reading that same file inline on Sonnet 4.6 costs roughly $0.05 – $0.15 per question in input tokens alone, so inline reading matches the indexing cost after roughly 4 to 12 questions. From then on indexing has paid for itself, and the file is available to every future session and workflow.
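The break-even estimate can be reproduced with a few lines of arithmetic. This is a rough sketch: it compares the one-time indexing cost against inline input-token spend only, and it ignores the (smaller) per-question token cost of querying an indexed file. The function name and its parameters are illustrative assumptions.

```python
from decimal import Decimal, ROUND_CEILING

PRICE_PER_PAGE = Decimal("0.012")  # USD per page, from the pricing section


def break_even_questions(pages: int, inline_cost_per_question: Decimal) -> int:
    """Smallest number of questions at which inline reading costs
    at least as much as indexing the file once."""
    indexing_cost = PRICE_PER_PAGE * pages
    ratio = indexing_cost / inline_cost_per_question
    return int(ratio.to_integral_value(rounding=ROUND_CEILING))


# 50-page document, using the per-question range quoted above.
print(break_even_questions(50, Decimal("0.15")))  # upper end of the range
print(break_even_questions(50, Decimal("0.05")))  # lower end of the range
```

Larger files shift both numbers proportionally, so in practice it is the expected number of questions, not the file size, that decides whether indexing is worthwhile.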

Rule of thumb: index anything you expect to reference more than once, or anything that will live in a project the team will return to.

Managing indexing costs

  • Index selectively. You don't need to index every file in a SharePoint site or ACC project. Use integration configuration to scope which folders sync into Nomic.
  • Prefer personal indexing for one-off exploration. Indexing a file for yourself only, rather than for the whole organization, avoids promoting it to canonical project context.
  • Avoid re-uploading. Re-indexing is tied to file versions, so uploading the same file twice indexes it twice.
  • Watch large binary PDFs. Sort the Files view by page count to find the most expensive files to index.
  • Audit in Admin → Usage. Filter the Usage events table to parse rows to see which files have consumed indexing spend, and which users kicked them off.
  • Use tags to scope Assistant context. Even after indexing, narrowing context with Tags and Projects keeps per-question costs in check.
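As an illustration of the audit step, the sketch below totals parse spend per file and per user. It assumes the Usage events have been copied into a list of dictionaries; the field names (`type`, `file`, `user`, `pages`, `cost`) and the sample rows are hypothetical and do not reflect a documented export format.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical sample rows, shaped loosely like the Usage events table.
events = [
    {"type": "parse", "file": "specs.pdf", "user": "ana", "pages": 10, "cost": Decimal("0.12")},
    {"type": "parse", "file": "drawings.pdf", "user": "ben", "pages": 100, "cost": Decimal("1.20")},
    {"type": "chat", "file": None, "user": "ana", "pages": 0, "cost": Decimal("0.03")},
    # Same file uploaded a second time: it is parsed (and billed) again.
    {"type": "parse", "file": "specs.pdf", "user": "ana", "pages": 10, "cost": Decimal("0.12")},
]


def parse_spend_by(rows, key):
    """Sum the cost of parse events, grouped by the given field name."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        if row["type"] == "parse":
            totals[row[key]] += row["cost"]
    return dict(totals)


by_file = parse_spend_by(events, "file")
by_user = parse_spend_by(events, "user")

# List files from most to least expensive to index.
for name, cost in sorted(by_file.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ${cost:.2f}")
```

Grouping by file surfaces duplicate uploads, while grouping by user shows who triggered the spend.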

Where to next

  • Files — The Files view, supported formats, and indexing statuses.
  • Models & Pricing overview — How Assistant token pricing interacts with indexed vs. attached files.
  • Admin → Usage — Audit parse events per file, per user.