
Parse files into structured information

POST /v1/parse

Parse a file into structured information.

Supports PDF files with configurable chunking strategies and optional embedding generation.

Request

Body

required

    file_url File Url (string) required

    File URL to process. Supports two URL types:

        1. Public URLs - accessible from the internet

        2. `nomic://` prefixed URLs - obtained from the `/upload` endpoint
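As a rough illustration of the two accepted URL types, the sketch below classifies a `file_url` before it is sent. The helper name `classify_file_url` is hypothetical, and treating only `http` and `https` as "public" schemes is an assumption; the API itself may accept or reject URLs differently.

```python
from urllib.parse import urlparse


def classify_file_url(file_url: str) -> str:
    """Return "public" or "nomic" depending on the URL scheme."""
    scheme = urlparse(file_url).scheme.lower()
    if scheme in ("http", "https"):  # assumption: public URLs use http(s)
        return "public"
    if scheme == "nomic":  # nomic:// URLs are obtained from the /upload endpoint
        return "nomic"
    raise ValueError(f"unsupported file_url scheme: {scheme!r}")
```

A client-side check like this only catches obvious mistakes early; the server remains the authority on which URLs it can fetch.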

    options

    object

    Options to customize document parsing.

    chunking

    object

    Options that control how the document is split into chunks.

    chunk_mode DocChunkMode (string)

    Possible values: [page]

    Default value: page

    The method by which the document is split into chunks.

    ocr_system DocOcrSystem (string)

    Possible values: [standard]

    Default value: standard

    The OCR method used to extract text from the document.

    content_extraction_mode DocContentExtractionMode (string)

    Possible values: [metadata, hybrid, ocr]

    Default value: hybrid

    The overall strategy for extracting content from the document:

        metadata: Disable all OCR. Only use embedded document text.

        hybrid: Use a VLM for tables, and run an OCR model on all bitmaps found in the document.

        ocr: Use a VLM for tables, and run an OCR model on full pages.

    table_summary

    object

    Options for generating table summaries.

    enabled Enabled (boolean)

    Whether to generate a summary of table content.

    figure_summary

    object

    Options for generating figure summaries.

    enabled Enabled (boolean)

    Default value: true

    Whether to generate a summary of figure content.
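To make the option fields concrete, here is a minimal sketch of a request body built in Python. The function name `build_parse_body` is hypothetical. The nesting is also an assumption: `chunk_mode` is placed under `chunking`, and the remaining fields directly under `options`. Because no default is listed for `table_summary.enabled`, the sketch omits that field unless a value is passed explicitly.

```python
import json
from typing import Optional

# Values listed for content_extraction_mode in this reference.
EXTRACTION_MODES = {"metadata", "hybrid", "ocr"}


def build_parse_body(
    file_url: str,
    content_extraction_mode: str = "hybrid",
    table_summary: Optional[bool] = None,
    figure_summary: bool = True,
) -> dict:
    """Build a JSON-serializable body for POST /v1/parse (assumed nesting)."""
    if content_extraction_mode not in EXTRACTION_MODES:
        raise ValueError(f"unknown content_extraction_mode: {content_extraction_mode!r}")
    options = {
        "chunking": {"chunk_mode": "page"},  # only listed value
        "ocr_system": "standard",            # only listed value
        "content_extraction_mode": content_extraction_mode,
        "figure_summary": {"enabled": figure_summary},
    }
    if table_summary is not None:
        # No default is documented, so only send the field when asked to.
        options["table_summary"] = {"enabled": table_summary}
    return {"file_url": file_url, "options": options}


if __name__ == "__main__":
    body = build_parse_body("https://example.com/report.pdf", "ocr", table_summary=True)
    print(json.dumps(body, indent=2))
```

Validating `content_extraction_mode` locally keeps a typo from turning into a failed request, but the list of valid values should be checked against the live schema.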

Responses

The task ID of the parsing task.

Schema

    task_id Task Id (string) required

    The ID of the task.
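The request and response shapes can be sketched with the Python standard library as follows. Several details here are assumptions not stated in this reference: the host in `BASE_URL` is a placeholder, bearer-token authentication is a guess, and the helper names `make_parse_request` and `extract_task_id` are hypothetical. The sketch only prepares the request and parses a response body; it does not send anything.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host; replace with the real one


def make_parse_request(body: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/parse request."""
    return urllib.request.Request(
        BASE_URL + "/v1/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
        },
        method="POST",
    )


def extract_task_id(response_text: str) -> str:
    """Pull the required task_id string out of a JSON response body."""
    payload = json.loads(response_text)
    task_id = payload.get("task_id") if isinstance(payload, dict) else None
    if not isinstance(task_id, str):
        raise ValueError("response does not contain a string task_id")
    return task_id
```

To send the request, pass the prepared object to `urllib.request.urlopen` and give the decoded response body to `extract_task_id`.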
