Rate Limits
To ensure consistent service quality, the Nomic Platform API enforces rate limits per user account. All API keys owned by the same user share a single quota. These limits help maintain platform stability for all users.
Platform API (v0)
All v0 endpoints are rate-limited per user. Limits are enforced per tier — all endpoints in the same tier share a single bucket. Requests are paced evenly over time, so clients that respect the Retry-After header will naturally spread their traffic at a steady rate.
| Tier | Limit | Applies to |
|---|---|---|
| Standard | 300 req/min | Read endpoints (list/get users, audit logs, API keys, parse status) |
| Write | 60 req/min | Mutation endpoints (disable/enable user, revoke API key) |
| Heavy | 30 req/min | Resource-intensive endpoints (file upload, submit parse) |
| Search | 60 req/min | Search endpoints (codes search) |
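Because requests are paced evenly, each tier's limit translates into a simple per-request interval on the client side. A minimal sketch (the tier names here are informal labels taken from the table above, not API parameters):

```python
# Client-side pacing intervals derived from the per-tier limits above.
# Requests are paced evenly, so the safe gap between requests is 60s / limit.
TIER_LIMITS = {"standard": 300, "write": 60, "heavy": 30, "search": 60}

def pacing_interval(tier: str) -> float:
    """Seconds to wait between requests to stay within a tier's budget."""
    return 60.0 / TIER_LIMITS[tier]
```

For example, write-tier calls (60 req/min) are safe at one request per second, while heavy-tier calls (30 req/min) need a two-second gap.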
Rate Limit Headers
Every API response includes the following headers so you can track your usage:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per window |
| X-RateLimit-Remaining | Approximate remaining requests before throttling |
| X-RateLimit-Reset | Unix timestamp (seconds) when the quota resets |
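Since these headers arrive on every response, a client can keep a running picture of its quota. A minimal sketch, assuming `headers` is any mapping of header name to string value (such as `response.headers` from the Python requests library):

```python
import time

def quota_snapshot(headers) -> dict:
    """Summarize rate-limit state from a response's headers.

    `headers` is any mapping of header name -> string value,
    e.g. `response.headers` from the requests library.
    """
    reset = int(headers["X-RateLimit-Reset"])
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        "seconds_until_reset": max(0, reset - int(time.time())),
    }
```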
When a limit is exceeded, the API returns 429 Too Many Requests along with a Retry-After header indicating the exact number of seconds until your next request will be accepted. Clients that sleep for this duration before retrying will be paced at a steady, sustainable rate.
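This sleep-and-retry behavior can be sketched as a small wrapper. `send` here is a placeholder for whatever function issues the actual HTTP request; the helper is illustrative and not part of any Nomic SDK:

```python
import time

def request_with_retry(send, max_retries: int = 5):
    """Call `send()` (any function returning a response-like object with
    `.status_code` and `.headers`) and, on 429, sleep for the
    server-specified Retry-After duration before trying again.
    """
    for _ in range(max_retries):
        response = send()
        if response.status_code != 429:
            return response
        # Honor the server's pacing hint; fall back to 1s if absent.
        delay = float(response.headers.get("Retry-After", 1))
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```

Because Retry-After reflects the server's even pacing, sleeping for exactly that duration keeps your traffic at the maximum sustainable rate without further 429s.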
Embedding Inference API
We enforce a rate limit of 1200 requests per 5-minute rolling window per IP address for the Embedding Inference API.
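To stay under a rolling-window limit like this on the client side, you can track your own request timestamps and wait when the window is full. A sketch of one approach (the class name and structure are illustrative; the server remains the source of truth):

```python
import collections
import time

class RollingWindowLimiter:
    """Client-side guard mirroring a rolling-window rate limit,
    e.g. 1200 requests per 300 seconds for the Embedding Inference API."""

    def __init__(self, max_requests: int = 1200, window_seconds: float = 300.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = collections.deque()

    def acquire(self) -> None:
        """Block until a request can be sent without exceeding the window."""
        now = time.monotonic()
        # Drop timestamps that have aged out of the rolling window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            # Wait until the oldest request leaves the window, then recheck.
            time.sleep(self.window_seconds - (now - self._timestamps[0]))
            return self.acquire()
        self._timestamps.append(time.monotonic())
```

Call `limiter.acquire()` before each embedding request; it returns immediately while you are under the limit and blocks only when the window is exhausted.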
If your use case requires higher throughput than our standard rate limits allow, we offer two recommended solutions:
Atlas Datasets
You can upload your dataset into Atlas and then download the embeddings we generate.
The Atlas Platform prioritizes embedding generation for Atlas Datasets and does not apply rate limits. This solution offers you an efficient way to generate embeddings at scale, coupled with the ability to visualize and explore your data in the Atlas UI in your browser.
You can get started generating embeddings for your data via Atlas by visiting our Upload a Dataset guide.
Amazon SageMaker
Our text embedding and image models are available for inference on Amazon SageMaker, giving you the ability to configure settings for throughput and processing power. This solution is well-suited for teams already working within the AWS ecosystem who need high-volume embedding generation.
You can get started generating text and image embeddings via Amazon SageMaker with these example notebooks from our GitHub repository.