Parse files into structured information
POST /v1/parse
Parse a file into structured information.
Supports PDF files with configurable chunking strategies and optional embedding generation.
Request
- application/json
Body
required
List of file URLs to process. Supports two URL types:
1. Public URLs - accessible from the internet
2. `nomic://` prefixed URLs - obtained from the `/upload` endpoint
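As a minimal sketch of the two URL types above, a client could classify each URL before submitting it. The helper name `classify_file_url` is hypothetical, and treating only `http` and `https` as public URLs is an assumption; the endpoint may accept other schemes.

```python
from urllib.parse import urlparse


def classify_file_url(url: str) -> str:
    """Return "nomic" for nomic:// URLs and "public" for http(s) URLs.

    Hypothetical helper; restricting public URLs to http/https is an
    assumption, not something the endpoint description states.
    """
    scheme = urlparse(url).scheme.lower()
    if scheme == "nomic":
        return "nomic"
    if scheme in ("http", "https"):
        return "public"
    raise ValueError(f"unsupported URL scheme: {url!r}")
```

Rejecting anything else early gives a clearer error on the client side than waiting for a validation response from the server.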
Custom storage URL where parse results will be uploaded via a PUT request. If not provided, results are stored temporarily by the service and are accessible via the task status endpoint.
result_put_headers
object
HTTP headers to include when uploading parse results to the `result_url`.
options
object
Possible values: [`hybrid`, `hierarchical`]
Default value: `hybrid`
Chunking strategy: `hybrid` splits documents using both content and structure; `hierarchical` splits at natural document sections like headings and chapters.
When enabled, generates vector embeddings for each chunk after parsing is complete. Embedding generation runs as a separate background job and won't slow down the main parsing process.
The id of an uploaded file to parse. (deprecated)
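The request body described above could be assembled roughly as follows. This is a sketch under stated assumptions: the key names `file_urls`, `chunking_strategy`, and the use of `embed_chunks` as a request option are guesses (only `options`, `result_url`, and `result_put_headers` are named in this page), and the base URL and Bearer authentication scheme are placeholders.

```python
import json
import urllib.request

# Assumed key names: this page does not state them for the request body.
FILE_URLS_KEY = "file_urls"
CHUNKING_KEY = "chunking_strategy"
EMBED_KEY = "embed_chunks"


def build_parse_body(file_urls, strategy="hybrid", embed=False,
                     result_url=None, result_put_headers=None):
    """Assemble a JSON-serializable body for POST /v1/parse."""
    if strategy not in ("hybrid", "hierarchical"):
        raise ValueError(f"unknown chunking strategy: {strategy!r}")
    body = {
        FILE_URLS_KEY: list(file_urls),
        "options": {CHUNKING_KEY: strategy, EMBED_KEY: embed},
    }
    if result_url is not None:
        body["result_url"] = result_url
        # Headers only matter when results are PUT to a custom URL.
        if result_put_headers:
            body["result_put_headers"] = dict(result_put_headers)
    return body


def make_request(base_url, token, body):
    """Build (but do not send) the request. Bearer auth is an assumption."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the result of `make_request` to `urllib.request.urlopen` would send it; building the request separately keeps the payload easy to inspect before any network call.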
Responses
- 201
- 403
- 422
The task id of the parsing task.
- application/json
- Schema
- Example (from schema)
Schema
The id of the task.
options_ids
object
The id of the options used for the parsing task (e.g., `embed_chunks`).
{
"task_id": "string",
"options_ids": {}
}
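A client might read the 201 response shown above along these lines. The `ParseTask` class and `read_parse_response` function are hypothetical names used for illustration; only the `task_id` and `options_ids` fields come from the schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ParseTask:
    task_id: str
    options_ids: dict = field(default_factory=dict)


def read_parse_response(raw: str) -> ParseTask:
    """Decode a 201 response body into a ParseTask."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("task_id"), str):
        raise ValueError("response does not contain a string task_id")
    # options_ids may be empty, as in the schema example.
    return ParseTask(task_id=data["task_id"],
                     options_ids=data.get("options_ids") or {})
```

The returned `task_id` is the value to keep for later lookups against the task status endpoint.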
The user is not authorized to perform this action.
- application/json
- Schema
- Example (from schema)
- Example
Schema
any
{
"status_code": 403,
"detail": "The user is not authorized to perform this action."
}
Validation Error
- application/json
- Schema
- Example (from schema)
Schema
detail
object[]
loc
object[]
required
anyOf: string, integer
{
"detail": [
{
"loc": [
"string",
0
],
"msg": "string",
"type": "string"
}
]
}
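The validation error shape above (`detail` entries with `loc`, `msg`, and `type`) can be flattened into readable messages. This is a sketch only; `format_validation_errors` is a hypothetical helper, and joining `loc` parts with dots is a presentation choice rather than anything the API prescribes.

```python
def format_validation_errors(payload: dict) -> list[str]:
    """Turn a 422 response body into one human-readable line per error."""
    lines = []
    for err in payload.get("detail", []):
        # loc mixes strings (field names) and integers (list indexes).
        path = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{path}: {err.get('msg', '')} ({err.get('type', '')})")
    return lines
```

Each line then points at the offending field in the request body, which is usually enough to correct the request and retry.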