Extract
Extract turns unstructured documents into predictable, structured JSON. Provide a JSON Schema describing the shape of the output you want, and Extract reads the parsed content of each document to fill that schema with values from the source material.
Extract does not invent data — if a value does not exist in the document, the field will be empty or omitted.
The Developer Console at /developer on your Nomic instance lets you upload files, define a schema with a visual builder or JSON editor, and inspect extracted results in the browser — no code required.
Submit an extraction job
POST /extract
Submit one or more files for extraction against a JSON Schema. Files must be uploaded through the upload endpoint first.
Scope: developer:parse · Rate limit: Heavy (30 req / min)
Request body
| Field | Type | Required | Description |
|---|---|---|---|
fileVersionIds | string[] (uuid) | Yes | One or more fileVersionId values returned by the upload endpoint. |
schema | object | Yes | A valid JSON Schema describing the fields to extract. |
enableCitations | boolean | No | Include source references linking each value back to its location in the document. |
Response
{
"taskId": "019abc12-3456-7890-abcd-ef1234567893"
}
Errors
| Status | Cause |
|---|---|
400 | Invalid body, or a file is an integration file (not an uploaded file). |
401 | Missing or invalid API key. |
403 | API key lacks developer:parse scope. |
404 | One or more file versions not found or not accessible. |
422 | Schema validation failed. |
501 | Document extraction unavailable on this instance. |
Example
curl -X POST "https://<your-domain>.nomic.ai/api/v0/extract" \
-H "Authorization: Bearer $NOMIC_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"fileVersionIds": ["019abc12-3456-7890-abcd-ef1234567891"],
"schema": {
"type": "object",
"properties": {
"permitNumber": {
"type": "string",
"description": "The permit or license number printed on the document."
},
"operator": {
"type": "string",
"description": "Name of the well operator."
}
},
"required": ["permitNumber"]
}
}'
Get extraction task status
GET /extract/{taskId}
Poll the status of an extraction task.
Scope: developer:parse · Rate limit: Standard (300 req / min)
Path parameters
| Parameter | Type | Description |
|---|---|---|
taskId | string (uuid) | The taskId returned by the submit endpoint. |
Response
Pending:
{ "status": "pending" }
Completed:
{
"status": "completed",
"result": { ... }
}
Failed:
{
"status": "failed",
"error": "Description of what went wrong"
}
Errors
| Status | Cause |
|---|---|
401 | Missing or invalid API key. |
403 | API key lacks developer:parse scope. |
404 | Task not found or not accessible. |
501 | Document extraction unavailable on this instance. |
Defining a schema
A schema tells Extract what fields to look for, their types, and what they mean. Schemas follow the JSON Schema standard.
Tips for reliable extractions:
- Use clear field names and
descriptiontext that mirrors how the value appears in the document. - Set
typefor every field. - Use
requiredfor must-have fields so missing data is surfaced rather than silently omitted. - Use
enumto constrain values when there is a known set of options. - Keep nesting shallow (2–3 levels).
- Use
arrayfor repeating items like line items or rows.
Example schema for a multi-well permit:
{
"type": "object",
"properties": {
"permitNumber": {
"type": "string",
"description": "The permit or license number printed on the document."
},
"operator": {
"type": "string",
"description": "Name of the well operator."
},
"wells": {
"type": "array",
"description": "List of wells covered by this permit.",
"items": {
"type": "object",
"properties": {
"wellName": { "type": "string" },
"county": { "type": "string" },
"depth": { "type": "number", "description": "Total depth in feet." }
},
"required": ["wellName"]
}
}
},
"required": ["permitNumber"]
}
End-to-end example
Upload a file and extract structured data from it:
# 1. Upload
UPLOAD=$(curl -s -X POST "https://<your-domain>.nomic.ai/api/v0/files/upload" \
-H "Authorization: Bearer $NOMIC_API_KEY" \
-F "file=@permit.pdf")
FILE_VERSION_ID=$(echo "$UPLOAD" | jq -r '.fileVersionId')
# 2. Submit extraction job
EXTRACT=$(curl -s -X POST "https://<your-domain>.nomic.ai/api/v0/extract" \
-H "Authorization: Bearer $NOMIC_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"fileVersionIds\": [\"$FILE_VERSION_ID\"],
\"schema\": {
\"type\": \"object\",
\"properties\": {
\"permitNumber\": { \"type\": \"string\", \"description\": \"Permit number.\" }
}
}
}")
TASK_ID=$(echo "$EXTRACT" | jq -r '.taskId')
# 3. Poll until complete
while true; do
STATUS=$(curl -s "https://<your-domain>.nomic.ai/api/v0/extract/$TASK_ID" \
-H "Authorization: Bearer $NOMIC_API_KEY")
STATE=$(echo "$STATUS" | jq -r '.status')
if [ "$STATE" = "completed" ]; then
echo "$STATUS" | jq '.result'
break
elif [ "$STATE" = "failed" ]; then
echo "Extraction failed: $(echo "$STATUS" | jq -r '.error')"
exit 1
fi
sleep 5
done