Indexing
Before Assistant can reason over a file, Nomic indexes it: the file is parsed into structured text and images, then embedded for semantic search. Indexing is a one-time cost per file version — after that, every question Assistant answers about the file uses only foundation-model tokens.
How indexing works
Indexing runs two models under the hood:
- Nomic Parse — Nomic's domain-specific parsing model turns a document into structured text, tables, and figure extractions. For drawings and PDFs, Parse handles sheet segmentation, OCR, and layout understanding.
- Nomic Embed — Parsed content is chunked and embedded so it can be retrieved by semantic search.
Once a file is indexed, its parsed content and embeddings are stored and reused across every session, workflow, and search. Re-indexing only happens when a new version of the file is uploaded or synced. You can see each file's indexing status — Indexed, Indexing, Failed, or not yet indexed — in the Status column of the Files view. See Files for the full UX.
Supported file formats
- PDF — including multi-sheet drawing sets, specifications, and reports
- Microsoft Word (.docx)
- Microsoft Excel (.xlsx)
- Microsoft PowerPoint (.pptx)
- Google Docs, Sheets, and Slides
- Images — for visual understanding in drawing review workflows
Files synced from integrations (SharePoint, Egnyte, ACC, Bentley) are indexed automatically. Directly uploaded files can be indexed on demand via Index Document in the file's action menu — for yourself only, or for everyone in the organization.
Pricing
Indexing cost is driven by pages parsed, not the number of files:
| Step | Unit | Price |
|---|---|---|
| Nomic Parse | Per page | $0.012 / page |
| Nomic Embed | Per file | Included in the parse cost for typical documents |
Reference costs:
- A 10-page spec costs about $0.12 to index.
- A 100-sheet drawing set costs about $1.20 to index.
- A 1,000-page project archive costs about $12 to index.
Each parse event is recorded as a SpendEvent against your AI Usage pool and is visible in Admin → Usage with the page count and cost.
The per-page price absorbs layout segmentation, OCR for scanned or mixed-content PDFs, table and figure extraction, drawing sheet decomposition, and embedding for both text chunks and image regions. There is no separate per-token charge.
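The reference costs above are page count multiplied by the per-page parse price. A minimal sketch of that arithmetic, assuming a flat $0.012 per page with no volume tiers or minimums (the source does not mention any); `indexing_cost` is an illustrative helper, not part of any Nomic API:

```python
from decimal import Decimal

# Per-page parse price in USD, taken from the pricing table above.
PARSE_PRICE_PER_PAGE = Decimal("0.012")

def indexing_cost(pages: int) -> Decimal:
    """Estimate the one-time indexing cost in USD for one file version.

    Hypothetical helper: assumes a flat per-page rate and nothing else.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PARSE_PRICE_PER_PAGE * pages

# Reproduce the three reference costs listed above.
for pages in (10, 100, 1000):
    print(f"{pages:>5} pages -> ${indexing_cost(pages):.2f}")
```

Decimal arithmetic is used here so that the cent values come out exact rather than picking up binary floating-point error.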
Indexing vs. not indexing
You can attach a non-indexed file directly to a single Assistant session, and Assistant will read it for that conversation. The tradeoffs:
| | Indexed | Not indexed (attached only) |
|---|---|---|
| Retrievable across sessions | Yes — available in every thread and workflow in the project | No — only visible to the session it is attached to |
| Searchable via semantic search | Yes | No |
| Scalable to large sets | Yes — Assistant pulls only relevant chunks | No — the whole file goes into the model's context |
| Cost per question | Low — only pays for the tokens Assistant actually uses | High — every question re-reads the full file, paying foundation-model input tokens each time |
| One-time cost | Yes — $0.012 / page, then reused forever | None upfront, but every turn pays for the full file |
The break-even point is low. For a 50-page document, indexing costs $0.60 once (50 × $0.012). Reading that same file inline on Sonnet 4.6 costs roughly $0.05 – $0.15 per question in input tokens alone, so indexing pays for itself after roughly 4 to 12 questions, and the file is then available to every future session and workflow.
Rule of thumb: index anything you expect to reference more than once, or anything that will live in a project the team will return to.
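The break-even estimate can be worked out for any file size. The sketch below is a rough approximation under two assumptions: the per-page price is the flat $0.012 from the pricing table, and the per-question cost of the indexed path is treated as zero (it is described as low, not free, so the real break-even sits slightly later). The function name and parameters are illustrative only:

```python
import math
from decimal import Decimal

PARSE_PRICE_PER_PAGE = Decimal("0.012")  # USD per page, from the pricing table

def break_even_questions(pages: int, inline_cost_per_question: str) -> int:
    """Number of questions after which one-time indexing costs no more than
    re-reading the full file inline on every question.

    Hypothetical helper: ignores the (small) per-question cost of the
    indexed path, so treat the result as a lower bound.
    """
    per_question = Decimal(inline_cost_per_question)
    if per_question <= 0:
        raise ValueError("inline cost per question must be positive")
    one_time = PARSE_PRICE_PER_PAGE * pages
    return math.ceil(one_time / per_question)

# The 50-page example, at both ends of the quoted inline cost range.
print(break_even_questions(50, "0.05"))  # 12
print(break_even_questions(50, "0.15"))  # 4
```

The inline cost is passed as a string so it converts to an exact decimal value; passing a float such as `0.05` would carry rounding error into the division.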
Managing indexing costs
- Index selectively. You don't need to index every file in a SharePoint site or ACC project. Use integration configuration to scope which folders sync into Nomic.
- Use personal indexing for one-off exploration. Indexing a file for yourself only, rather than for everyone in the organization, avoids promoting it to canonical project context.
- Avoid re-uploading. Re-indexing is tied to file versions, so uploading the same file twice indexes it twice.
- Watch large binary PDFs. Sort the Files view by page count to find the most expensive files to index.
- Audit in Admin → Usage. Filter the Usage events table to parse rows to see which files have consumed indexing spend, and which users kicked them off.
- Use tags to scope Assistant context. Even after indexing, narrowing context with Tags and Projects keeps per-question costs in check.
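The audit step above amounts to filtering usage events down to parse rows and totalling them per file. The sketch below illustrates that aggregation on made-up rows. The field names (`type`, `file`, `user`, `pages`, `cost_usd`) and the sample values are assumptions for illustration; they are not the actual SpendEvent schema or an export format documented here:

```python
from collections import defaultdict
from decimal import Decimal

# Made-up rows for illustration only; the field names are assumptions,
# not the real SpendEvent schema.
events = [
    {"type": "parse", "file": "spec.pdf", "user": "ana", "pages": 10, "cost_usd": Decimal("0.12")},
    {"type": "chat", "file": None, "user": "ana", "pages": 0, "cost_usd": Decimal("0.03")},
    {"type": "parse", "file": "drawings.pdf", "user": "ben", "pages": 100, "cost_usd": Decimal("1.20")},
    {"type": "parse", "file": "spec.pdf", "user": "ben", "pages": 10, "cost_usd": Decimal("0.12")},
]

def parse_spend_by_file(rows):
    """Total pages, cost, and triggering users per file, parse rows only."""
    totals = defaultdict(lambda: {"pages": 0, "cost_usd": Decimal("0"), "users": set()})
    for row in rows:
        if row["type"] != "parse":
            continue  # skip non-indexing spend
        entry = totals[row["file"]]
        entry["pages"] += row["pages"]
        entry["cost_usd"] += row["cost_usd"]
        entry["users"].add(row["user"])
    return dict(totals)

report = parse_spend_by_file(events)

# List files from most to least expensive.
for name, entry in sorted(report.items(), key=lambda kv: kv[1]["cost_usd"], reverse=True):
    print(name, entry["pages"], entry["cost_usd"], sorted(entry["users"]))
```

In this sample, `spec.pdf` appears in two parse rows, which is the pattern to look for when checking whether the same file has been uploaded and indexed more than once.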
Where to next
- Files — The Files view, supported formats, and indexing statuses.
- Models & Pricing overview — How Assistant token pricing interacts with indexed vs. attached files.
- Admin → Usage — Audit parse events per file, per user.