Parse files into structured information

POST /v1/parse

Parse a file into structured information.

Supports PDF files with configurable chunking strategies and optional embedding generation.

Request

Body (application/json, required)

file_url (string, required)

File URL to process. Supports two URL types:

Public URLs - accessible from the internet

`nomic://` prefixed URLs - obtained from the `/upload` endpoint
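A client can check which of the two URL types it is passing before it sends the request. The sketch below is a hypothetical helper (`classify_file_url` is not part of the API). It assumes that public URLs use the `http` or `https` scheme, and the URLs it uses are illustrative values only.

```python
from urllib.parse import urlparse


def classify_file_url(file_url: str) -> str:
    """Return "nomic" or "public" for a file_url value (hypothetical helper).

    Assumption: public URLs use the http or https scheme.
    """
    if file_url.startswith("nomic://"):
        # URL obtained from the /upload endpoint.
        return "nomic"
    parsed = urlparse(file_url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # Must be reachable from the internet so the service can fetch it.
        return "public"
    raise ValueError(f"unsupported file_url: {file_url!r}")


print(classify_file_url("nomic://abc123"))
print(classify_file_url("https://example.com/report.pdf"))
```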

options (object)

Options to customize document parsing.

chunking (object)

Options that control how the document is split into chunks.

chunk_mode DocChunkMode (string)

Possible values: [page]

Default value: page

The method by which the document is split into chunks.

ocr_system DocOcrSystem (string)

Possible values: [standard]

Default value: standard

The OCR method used to extract text from the document.

content_extraction_mode DocContentExtractionMode (string)

Possible values: [metadata, hybrid, ocr]

Default value: hybrid

The overall strategy for extracting content from the document:

metadata: Disable all OCR and use only the text embedded in the document.

hybrid: Use a VLM for tables, and run an OCR model on all bitmaps found in the document.

ocr: Use a VLM for tables, and run an OCR model on full pages.

table_summary (object)

Options for generating table summaries.

enabled (boolean)

Whether to generate a summary of table content.

figure_summary (object)

Options for generating figure summaries.

enabled (boolean)

Default value: true

Whether to generate a summary of figure content.
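The request body can be assembled from the options above. This is a minimal sketch with a hypothetical helper name (`build_parse_body`). It checks values against the enumerations listed in this reference and omits any option left as `None`, so the server's defaults apply. Assumption: the flattened listing above does not show nesting, so the sketch places `ocr_system`, `content_extraction_mode`, `table_summary` and `figure_summary` directly under `options`, next to `chunking`.

```python
import json
from typing import Optional

# Allowed values as listed in this reference.
ALLOWED = {
    "chunk_mode": {"page"},
    "ocr_system": {"standard"},
    "content_extraction_mode": {"metadata", "hybrid", "ocr"},
}


def _check(name: str, value: Optional[str]) -> None:
    # Reject values outside the documented enumerations.
    if value is not None and value not in ALLOWED[name]:
        raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}, got {value!r}")


def build_parse_body(
    file_url: str,
    chunk_mode: Optional[str] = None,
    ocr_system: Optional[str] = None,
    content_extraction_mode: Optional[str] = None,
    table_summary: Optional[bool] = None,
    figure_summary: Optional[bool] = None,
) -> dict:
    """Build a /v1/parse body; options left as None are omitted (hypothetical helper)."""
    for name, value in (
        ("chunk_mode", chunk_mode),
        ("ocr_system", ocr_system),
        ("content_extraction_mode", content_extraction_mode),
    ):
        _check(name, value)

    options: dict = {}
    if chunk_mode is not None:
        options["chunking"] = {"chunk_mode": chunk_mode}
    # Assumption: the fields below sit directly under "options", next to "chunking".
    if ocr_system is not None:
        options["ocr_system"] = ocr_system
    if content_extraction_mode is not None:
        options["content_extraction_mode"] = content_extraction_mode
    if table_summary is not None:
        options["table_summary"] = {"enabled": table_summary}
    if figure_summary is not None:
        options["figure_summary"] = {"enabled": figure_summary}

    body: dict = {"file_url": file_url}
    if options:
        body["options"] = options
    return body


print(json.dumps(
    build_parse_body("nomic://abc123", content_extraction_mode="ocr", table_summary=True),
    indent=2,
))
```

Omitting unset options, rather than filling in defaults client-side, keeps the request correct if the server-side defaults change.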

Responses

Success: returns the ID of the parsing task.

Response body (application/json)

task_id (string, required)

The ID of the task.

Example:

{
  "task_id": "string"
}
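The sketch below prepares the POST request with the standard library and reads `task_id` from a successful response. The request is built but not sent. Assumptions, none of which come from this reference: the base URL is a hypothetical placeholder, the API key is sent as a bearer token in the `Authorization` header, and the helper names are illustrative.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the real API host.
BASE_URL = "https://api.example.com"


def make_parse_request(body: dict, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/parse request.

    Assumption: the API accepts a bearer token in the Authorization header.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def read_task_id(response_bytes: bytes) -> str:
    """Extract the required task_id field from a successful response body."""
    payload = json.loads(response_bytes)
    return payload["task_id"]


req = make_parse_request({"file_url": "nomic://abc123"}, api_key="YOUR_API_KEY")
# To send it: pass req to urllib.request.urlopen and give the response's
# .read() bytes to read_task_id.
```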

403: The user is not authorized to perform this action.

Response body (application/json): any

Example:

{
  "status_code": 403,
  "detail": "The user is not authorized to perform this action."
}

Validation Error

Response body (application/json)

detail (object[]): the list of validation errors. Each item has:

loc (array of string or integer, required)

msg (string, required): the error message.

type (string, required): the error type.

Example:

{
  "detail": [
    {
      "loc": [
        "string",
        0
      ],
      "msg": "string",
      "type": "string"
    }
  ]
}
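Both error bodies above carry a `detail` field: a string in the authorization error and a list of objects in the validation error. A client can turn either into a readable message. This is a minimal sketch with a hypothetical helper name (`describe_parse_error`); it dispatches on the shape of `detail` rather than on the status code.

```python
import json


def describe_parse_error(response_bytes: bytes) -> str:
    """Turn an error body from /v1/parse into a readable message (hypothetical helper).

    A string "detail" (as in the authorization error) is returned as-is;
    a list "detail" (validation error) is formatted one entry per line.
    """
    payload = json.loads(response_bytes)
    detail = payload.get("detail")
    if isinstance(detail, str):
        return detail
    if isinstance(detail, list):
        lines = []
        for err in detail:
            # loc entries may be strings or integers; join them into a path.
            location = ".".join(str(part) for part in err["loc"])
            lines.append(f"{location}: {err['msg']} ({err['type']})")
        return "\n".join(lines)
    # Unknown shape: fall back to the raw JSON.
    return json.dumps(payload)
```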