Papermill API Reference
Papermill exposes a REST API for PDF ingestion, document conversion, and figure extraction. Integrations use resource-oriented URLs, JSON metadata payloads, and standard HTTP verbs and status codes.
Upload a document with POST /tasks, poll task status with GET /tasks/{task_id} until the task is DONE or FAILED, then download the Markdown output and the extracted figures once processing has completed.
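The poll step of this workflow can be sketched in Python as follows. This is a minimal sketch, not part of Papermill: `wait_for_task`, `fetch_task`, and the interval and timeout values are hypothetical, and treating DONE and FAILED as the only terminal states is an assumption based on the status values listed for the Task object.

```python
import time
from typing import Callable

# Assumption: these are the only states after which polling can stop.
TERMINAL_STATES = {"DONE", "FAILED"}


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_task(task_id) until the task is DONE or FAILED.

    fetch_task is a caller-supplied function that returns the decoded
    Task JSON, for example by calling GET /tasks/{task_id}.
    Raises TimeoutError once roughly `timeout` seconds have been waited.
    """
    waited = 0.0
    while True:
        task = fetch_task(task_id)
        if task["status"] in TERMINAL_STATES:
            return task
        if waited >= timeout:
            raise TimeoutError(
                f"task {task_id} still {task['status']} after {waited:.0f}s"
            )
        sleep(interval)
        waited += interval
```

Injecting `fetch_task` and `sleep` keeps the loop independent of any HTTP client and makes it easy to exercise without a network connection.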
Base URL
https://papermill.akl773.com
Quick start
Quick start request
curl -X POST https://papermill.akl773.com/tasks \
-H "Authorization: Bearer <access-token>" \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/document.pdf"
Initial response
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
"status": "QUEUED"
}
Authentication
Papermill accepts Bearer tokens using either a Clerk session JWT or a Papermill API key (pmk_...).
Task and figure routes also support guest sessions via the `guest_session` cookie when no Bearer token is supplied. API key management routes always require Bearer auth.
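The choice between Bearer auth and a guest session can be sketched as a small header builder. This is an illustrative sketch only: `build_auth_headers` and its parameters are hypothetical, and it assumes the caller already holds the `guest_session` cookie value, since how that cookie is issued is not described here.

```python
from typing import Optional


def build_auth_headers(
    token: Optional[str] = None,
    guest_session: Optional[str] = None,
    bearer_required: bool = False,
) -> dict:
    """Return request headers for a Papermill call.

    token: a Clerk session JWT or a Papermill API key (pmk_...).
    guest_session: value of the guest_session cookie, used only when
        no token is supplied.
    bearer_required: set for API key management routes, which always
        require Bearer auth.
    """
    if token:
        return {"Authorization": f"Bearer {token}"}
    if bearer_required:
        raise ValueError("this route requires a Bearer token")
    if guest_session:
        return {"Cookie": f"guest_session={guest_session}"}
    return {}
```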
Bearer auth
Authenticated request
curl -X POST "https://papermill.akl773.com/tasks" \
-H "Authorization: Bearer <access-token>" \
-F "file=@document.pdf"
Errors
Papermill uses conventional HTTP status codes with FastAPI error bodies. Missing resources return 404, while result-not-ready states return 409.
All error payloads include a detail field that can be surfaced directly in client logs or retry workflows.
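A client-side helper for reading the `detail` field might look like the sketch below. The function name `describe_error` and the retry rule are assumptions: only a 409 whose detail is "Task result not ready" is treated as worth retrying, because 409 is also used for "Cannot delete task while processing". FastAPI validation errors usually carry a list rather than a string in `detail`, so non-string values are serialized for logging.

```python
import json
from typing import Tuple


def describe_error(status_code: int, body: str) -> Tuple[str, bool]:
    """Return (detail, retryable) for an error response.

    detail is a log-friendly string; retryable is True only for the
    result-not-ready conflict, which polling is expected to resolve.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None

    detail = payload.get("detail") if isinstance(payload, dict) else None
    if detail is None:
        # Body was not the expected JSON shape; fall back to the status.
        detail = f"HTTP {status_code}"
    elif not isinstance(detail, str):
        # Validation errors may carry structured detail; keep it readable.
        detail = json.dumps(detail)

    retryable = status_code == 409 and detail == "Task result not ready"
    return detail, retryable
```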
Error payloads
Error shape
{
"detail": "Task not found"
}
Conflict response
HTTP/1.1 409 Conflict
Content-Type: application/json
{
"detail": "Task result not ready"
}
Pagination
List endpoints support offset-based pagination through limit and offset query parameters.
Responses use a stable envelope with items, total, limit, offset, and has_more so clients can paginate deterministically.
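Walking every page of a list endpoint can be sketched as a generator over this envelope. This is a minimal sketch: `iter_items` and `fetch_page` are hypothetical names, and `fetch_page(limit, offset)` stands for any caller-supplied function that returns the decoded list response.

```python
from typing import Callable, Iterator


def iter_items(
    fetch_page: Callable[[int, int], dict], limit: int = 50
) -> Iterator[dict]:
    """Yield every item from a paginated list endpoint.

    Advances the offset by the number of items actually returned and
    stops when has_more is false or a page comes back empty.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["items"]
        yield from items
        if not page["has_more"] or not items:
            break
        offset += len(items)
```

Stopping on an empty page is a defensive choice so that an unexpected `has_more: true` with no items cannot cause an endless loop.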
Paginated list payload
List response
{
"items": [
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
"status": "DONE",
"created_at": "2026-03-08T09:14:12Z"
},
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35562",
"status": "PROCESSING",
"created_at": "2026-03-08T09:11:04Z"
}
],
"total": 37,
"limit": 20,
"offset": 0,
"has_more": true
}
The Task object
Represents a single document processing task from upload through completion.
| Name | Description |
|---|---|
| id | Unique identifier for the task. |
| file_name | Original filename provided during upload. |
| status | Current processing lifecycle state. Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED. |
| input_path | Storage path to the uploaded source file. |
| output_path | Storage path to the processed output file when the task is done. |
| error_message | Error details when processing fails. |
| created_at | UTC timestamp when the task row was created. |
| updated_at | UTC timestamp for the latest task state change. |
Object example
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
"file_name": "document.pdf",
"status": "QUEUED",
"input_path": "uploads/d227b2b7-1c3f-4e6f-96a9-450f11e35561/input.pdf",
"output_path": null,
"error_message": null,
"created_at": "2026-03-08T12:00:00Z",
"updated_at": "2026-03-08T12:00:01Z"
}
The Figure object
Represents an extracted figure image with a pre-signed URL for direct retrieval.
| Name | Description |
|---|---|
| figure_index | 1-based index of the figure within the source document. |
| title | Title extracted for the figure from the parsed document metadata. |
| s3_key | Object storage key used to retrieve the image from MinIO. |
| url | Time-limited pre-signed URL for direct retrieval. |
Object example
{
"figure_index": 1,
"title": "Figure 1. Evaluation pipeline overview",
"s3_key": "d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/picture-1.png",
"url": "https://minio.localhost/papermill/..."
}
The API key object
Represents a programmatic authentication credential linked to a user.
| Name | Description |
|---|---|
| id | Unique identifier for the API key. |
| name | User-defined label for the key. |
| key_prefix | Non-sensitive prefix used to identify the key. |
| created_at | UTC timestamp when the key was created. |
| last_used_at | UTC timestamp of the most recent authenticated API call. |
| revoked_at | UTC timestamp when the key was revoked. |
Object example
{
"id": "7f4bfe76-c8ca-4e36-a107-88f14acbf391",
"name": "CI runner",
"key_prefix": "pmk_2fd2058f53c",
"created_at": "2026-03-09T12:00:00Z",
"last_used_at": "2026-03-09T12:15:30Z",
"revoked_at": null
}
Create API Key
Creates a Papermill API key for the authenticated user and returns the one-time secret token.
Request body
Content-Type: application/json
Friendly key label for auditing and rotation.
| Name | Description |
|---|---|
| name | Display name for the API key. |
Response attributes
| Name | Description |
|---|---|
| id | Unique identifier for the API key. |
| name | User-defined label for the key. |
| key_prefix | Non-sensitive prefix used to identify the key. |
| created_at | UTC timestamp when the key was created. |
| token | One-time API key secret. Store it immediately; it is not returned again. |
Status codes
| Code | Description |
|---|---|
| 201 | API key created. |
| 401 | "Authentication required" |
| 422 | "API key name must not be empty" or request validation error |
cURL request
curl -X POST https://papermill.akl773.com/api-keys \
-H "Authorization: Bearer <access-token>" \
-H "Content-Type: application/json" \
-d '{"name":"CI runner"}'
Response (application/json)
{
"id": "7f4bfe76-c8ca-4e36-a107-88f14acbf391",
"token": "pmk_2fd2058f53cd84ce3f0ff8f8db3f587_..."
}
List API Keys
Lists API keys created by the authenticated user.
Parameters
Query parameters
| Name | Type | Required | Description |
|---|---|---|---|
| limit | integer | Optional (default: 50) | Maximum number of API keys to return. |
| offset | integer | Optional (default: 0) | Zero-based index of the first row to return. |
Response attributes
| Name | Description |
|---|---|
| items | Ordered list of API key objects. |
| items[].id | Unique identifier for the API key. |
| items[].name | User-defined label for the key. |
| items[].key_prefix | Non-sensitive prefix used to identify the key. |
| items[].created_at | UTC timestamp when the key was created. |
| items[].last_used_at | UTC timestamp of the most recent authenticated API call. |
| items[].revoked_at | UTC timestamp when the key was revoked. |
| total | Total number of API keys available for the authenticated user. |
| limit | Page size requested through the query string. |
| offset | Zero-based starting row for this page. |
| has_more | True when additional records exist beyond this page. |
Status codes
| Code | Description |
|---|---|
| 200 | List of API key metadata. |
| 401 | "Authentication required" |
cURL request
curl -X GET "https://papermill.akl773.com/api-keys?limit=50&offset=0" \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"items": [
{
"id": "7f4bfe76-c8ca-4e36-a107-88f14acbf391",
"name": "CI runner",
"key_prefix": "pmk_2fd2058f53c"
}
],
"total": 1,
"limit": 50,
"offset": 0,
"has_more": false
}
Revoke API Key
Revokes an API key so it can no longer be used for authentication.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| api_key_id | uuid | Required | API key identifier. |
Response attributes
| Name | Description |
|---|---|
| revoked | Indicates whether the key is revoked. |
Status codes
| Code | Description |
|---|---|
| 200 | Revocation confirmation. |
| 401 | "Authentication required" |
| 404 | "API key not found" |
cURL request
curl -X DELETE https://papermill.akl773.com/api-keys/7f4bfe76-c8ca-4e36-a107-88f14acbf391 \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"revoked": true
}
Create a Task
Uploads a PDF and enqueues it for processing. This route accepts optional Bearer auth; without a token, tasks are scoped to a guest session cookie.
Parameters
Form fields
| Name | Type | Required | Description |
|---|---|---|---|
| file | file | Required | PDF file to process. |
Request body
Content-Type: multipart/form-data
Upload payload with one required form field, `file`: the PDF document to process.
Response attributes
| Name | Description |
|---|---|
| id | Unique identifier for the task. |
| file_name | Original filename provided during upload. |
| status | Current processing lifecycle state. Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED. |
| input_path | Storage path to the uploaded source file. |
| output_path | Storage path to the processed output file when the task is done. |
| error_message | Error details when processing fails. |
| created_at | UTC timestamp when the task row was created. |
| updated_at | UTC timestamp for the latest task state change. |
Status codes
| Code | Description |
|---|---|
| 201 | Task created and enqueued. |
| 400 | "Uploaded file must have a filename" |
| 500 | "Failed to enqueue task" |
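For clients that cannot shell out to cURL, the multipart upload can be sketched with the Python standard library alone. This is a hedged sketch rather than an official client: `encode_multipart_file` and `build_upload_request` are hypothetical helpers, the `application/pdf` part type is an assumption, and filenames containing double quotes are not escaped. The empty-filename check mirrors the 400 "Uploaded file must have a filename" response.

```python
import urllib.request
import uuid
from typing import Optional, Tuple

TASKS_URL = "https://papermill.akl773.com/tasks"


def encode_multipart_file(
    field: str, filename: str, content: bytes,
    content_type: str = "application/pdf",
) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns (body, content_type_header); the header carries the
    boundary that delimits the single part.
    """
    if not filename:
        raise ValueError("uploaded file must have a filename")
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(
    filename: str, content: bytes, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) the POST /tasks request."""
    body, content_type = encode_multipart_file("file", filename, content)
    headers = {"Content-Type": content_type}
    if token:  # Bearer auth is optional on this route
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        TASKS_URL, data=body, headers=headers, method="POST"
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; the response body is the JSON Task object described above.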
cURL request
curl -X POST https://papermill.akl773.com/tasks \
-H "Authorization: Bearer <access-token>" \
-H "Content-Type: multipart/form-data" \
-F "file=@/path/to/document.pdf"
Response (application/json)
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
"status": "QUEUED"
}
List Tasks
Returns tasks ordered by creation time. This route accepts optional Bearer auth; without a token, results are scoped to a guest session cookie.
Parameters
Query parameters
| Name | Type | Required | Description |
|---|---|---|---|
| limit | integer | Optional (default: 50) | Maximum number of tasks to return. |
| offset | integer | Optional (default: 0) | Zero-based index of the first row to return. |
Response attributes
| Name | Description |
|---|---|
| items | Ordered list of task objects. |
| items[].id | Unique identifier for the task. |
| items[].file_name | Original filename provided during upload. |
| items[].status | Current processing lifecycle state. Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED. |
| items[].input_path | Storage path to the uploaded source file. |
| items[].output_path | Storage path to the processed output file when the task is done. |
| items[].error_message | Error details when processing fails. |
| items[].created_at | UTC timestamp when the task row was created. |
| items[].updated_at | UTC timestamp for the latest task state change. |
| total | Total number of task records available for the current auth or guest scope. |
| limit | Page size requested through the query string. |
| offset | Zero-based starting row for this page. |
| has_more | True when additional records exist beyond this page. |
Status codes
| Code | Description |
|---|---|
| 200 | List of task objects. |
cURL request
curl -X GET "https://papermill.akl773.com/tasks?limit=50&offset=0" \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"items": [
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
"status": "DONE"
}
],
"total": 1,
"limit": 50,
"offset": 0,
"has_more": false
}
List Public Guest Tasks
Returns tasks scoped to the guest cookie session only. This endpoint ignores Bearer headers and is intended for unauthenticated browser flows.
Parameters
Query parameters
| Name | Type | Required | Description |
|---|---|---|---|
| limit | integer | Optional (default: 50) | Maximum number of tasks to return. |
| offset | integer | Optional (default: 0) | Zero-based index of the first row to return. |
Response attributes
| Name | Description |
|---|---|
| items | Ordered list of task objects. |
| items[].id | Unique identifier for the task. |
| items[].file_name | Original filename provided during upload. |
| items[].status | Current processing lifecycle state. Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED. |
| items[].input_path | Storage path to the uploaded source file. |
| items[].output_path | Storage path to the processed output file when the task is done. |
| items[].error_message | Error details when processing fails. |
| items[].created_at | UTC timestamp when the task row was created. |
| items[].updated_at | UTC timestamp for the latest task state change. |
| total | Total number of task records available for the current auth or guest scope. |
| limit | Page size requested through the query string. |
| offset | Zero-based starting row for this page. |
| has_more | True when additional records exist beyond this page. |
Status codes
| Code | Description |
|---|---|
| 200 | List of task objects for current guest cookie scope. |
cURL request
curl -X GET "https://papermill.akl773.com/public/tasks?limit=50&offset=0"
Response (application/json)
{
"items": [],
"total": 0,
"limit": 50,
"offset": 0,
"has_more": false
}
Retrieve a Task
Fetches one task by UUID. This route accepts optional Bearer auth; without a token, access is scoped to a guest session cookie.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
Response attributes
| Name | Description |
|---|---|
| id | Unique identifier for the task. |
| file_name | Original filename provided during upload. |
| status | Current processing lifecycle state. Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED. |
| input_path | Storage path to the uploaded source file. |
| output_path | Storage path to the processed output file when the task is done. |
| error_message | Error details when processing fails. |
| created_at | UTC timestamp when the task row was created. |
| updated_at | UTC timestamp for the latest task state change. |
Status codes
| Code | Description |
|---|---|
| 200 | Task object. |
| 404 | "Task not found" |
cURL request
curl -X GET https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561 \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
"status": "DONE"
}
Delete a Task
Deletes a task in the current auth scope (Bearer user or guest session) and removes local artifacts. Active queued/processing tasks cannot be deleted.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
Response attributes
| Name | Description |
|---|---|
| deleted | Indicates whether the task and associated storage were removed. |
Status codes
| Code | Description |
|---|---|
| 200 | Deletion confirmation. |
| 404 | "Task not found" |
| 409 | "Cannot delete task while processing" |
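A client can skip a delete call that is likely to return 409 by checking the task status first. The sketch below is illustrative: `can_delete` is a hypothetical helper, and treating only QUEUED and PROCESSING as active is an assumption drawn from the description above (whether PENDING tasks can be deleted is not stated).

```python
# Assumption: these are the states the API treats as "active".
ACTIVE_STATES = frozenset({"QUEUED", "PROCESSING"})


def can_delete(task: dict) -> bool:
    """Return True when a task's status suggests DELETE will succeed.

    This is only a client-side hint; the status can change between the
    check and the request, so a 409 response must still be handled.
    """
    return task.get("status") not in ACTIVE_STATES
```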
cURL request
curl -X DELETE https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561 \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"deleted": true
}
Download Parsed Document
Streams the final output artifact for a completed task. The content type is inferred from file extension.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
Status codes
| Code | Description |
|---|---|
| 200 | File stream (.md or .txt). |
| 404 | "Task not found" or "Task output file missing" |
| 409 | "Task result not ready" |
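Because the output may be Markdown or plain text, a client may want to pick the local file extension from the response's Content-Type header instead of hard-coding `.md`. The sketch below assumes the three content types listed for this endpoint (text/markdown, text/plain, application/octet-stream); `output_filename` and the `.bin` fallback are hypothetical choices.

```python
# Media types this endpoint is documented to return, mapped to extensions.
EXTENSIONS = {"text/markdown": ".md", "text/plain": ".txt"}


def output_filename(task_id: str, content_type: str) -> str:
    """Choose a local filename for a downloaded task result.

    Parameters such as "; charset=utf-8" are ignored; unknown media
    types (including application/octet-stream) fall back to ".bin".
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return f"{task_id}{EXTENSIONS.get(media_type, '.bin')}"
```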
cURL request
curl -L -o result.md https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/download \
-H "Authorization: Bearer <access-token>"
Response (text/markdown | text/plain | application/octet-stream)
# Markdown output
Generated content...
Get Extracted Assets JSON
Returns structural metadata from `result.assets.json` produced during parsing.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
Response attributes
| Name | Description |
|---|---|
| figures | Extracted figure metadata generated by the parser. Exact structure can evolve with parser versions. |
| figures[].index | 1-based figure index in the source document. |
| figures[].page | Source page number where the figure was detected. |
| figures[].bbox | Bounding box coordinates as `[x0, y0, x1, y1]`. |
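As a small worked example of the `[x0, y0, x1, y1]` layout, the width and height of a figure follow from subtracting the first corner from the second. The helper name `bbox_size` is hypothetical, and the coordinate units are whatever the parser emits; they are not specified here.

```python
from typing import Sequence, Tuple


def bbox_size(bbox: Sequence[float]) -> Tuple[float, float]:
    """Return (width, height) for a bounding box [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = bbox
    return x1 - x0, y1 - y0


# Using the bbox from the sample response for this endpoint:
print(bbox_size([110, 210, 430, 512]))  # (320, 302)
```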
Status codes
| Code | Description |
|---|---|
| 200 | Assets metadata JSON. |
| 404 | "Task not found", "Task output file missing", or "Task assets file missing" |
| 409 | "Task result not ready" |
| 500 | "Task assets file is invalid" |
cURL request
curl -X GET https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/assets \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"figures": [
{ "index": 1, "page": 2, "bbox": [110, 210, 430, 512] }
]
}
List Extracted Figures
Lists extracted figures for a completed task and returns pre-signed URLs for direct retrieval.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
Query parameters
| Name | Type | Required | Description |
|---|---|---|---|
| limit | integer | Optional (default: 50) | Maximum number of figures to return. |
| offset | integer | Optional (default: 0) | Zero-based index of the first row to return. |
Response attributes
| Name | Description |
|---|---|
| items | Ordered list of extracted figure objects. |
| items[].figure_index | 1-based index of the figure within the source document. |
| items[].title | Title extracted for the figure from the parsed document metadata. |
| items[].s3_key | Object storage key used to retrieve the image from MinIO. |
| items[].url | Time-limited pre-signed URL for direct retrieval. |
| total | Total number of extracted figures for the task. |
| limit | Page size requested through the query string. |
| offset | Zero-based starting row for this page. |
| has_more | True when additional figure records exist beyond this page. |
Status codes
| Code | Description |
|---|---|
| 200 | List of figure objects. |
| 404 | "Task not found" |
| 409 | "Task result not ready" |
cURL request
curl -X GET "https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures?limit=50&offset=0" \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"items": [
{
"figure_index": 1,
"title": "Figure 1. Evaluation pipeline overview",
"s3_key": "d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/picture-1.png",
"url": "https://minio.localhost/papermill/..."
}
],
"total": 1,
"limit": 50,
"offset": 0,
"has_more": false
}
Retrieve a Figure
Returns metadata for a specific figure index.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
| n | integer | Required | 1-based figure index; must be at least 1. |
Response attributes
| Name | Description |
|---|---|
| figure_index | 1-based index of the figure within the source document. |
| title | Title extracted for the figure from the parsed document metadata. |
| s3_key | Object storage key used to retrieve the image from MinIO. |
| url | Time-limited pre-signed URL for direct retrieval. |
Status codes
| Code | Description |
|---|---|
| 200 | Single figure object. |
| 404 | "Task not found" or "Figure not found" |
| 409 | "Task result not ready" |
| 422 | Validation error when `n` is less than 1. |
cURL request
curl -X GET https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/1 \
-H "Authorization: Bearer <access-token>"
Response (application/json)
{
"figure_index": 1,
"title": "Figure 1. Evaluation pipeline overview",
"s3_key": "d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/picture-1.png",
"url": "https://minio.localhost/papermill/..."
}
Download a Figure
Streams the figure image directly. Set `download=true` to force attachment download.
Parameters
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| task_id | uuid | Required | Task identifier. |
| n | integer | Required | 1-based figure index. |
Query parameters
| Name | Type | Required | Description |
|---|---|---|---|
| download | boolean | Optional (default: false) | When true, returns attachment disposition. |
Status codes
| Code | Description |
|---|---|
| 200 | PNG byte stream. |
| 404 | "Task not found", "Figure not found", or "Figure file not found" |
| 409 | "Task result not ready" |
| 422 | Validation error when `n` is less than 1. |
| 502 | "Failed to fetch figure file" |
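Building the download URL in code can be sketched as below. `figure_download_url` is a hypothetical helper; it validates `n` locally so that an index below 1 fails before the request is made, instead of surfacing as a 422 from the server.

```python
from urllib.parse import urlencode

BASE_URL = "https://papermill.akl773.com"


def figure_download_url(task_id: str, n: int, download: bool = False) -> str:
    """Return the URL that streams figure n of a task.

    n is the 1-based figure index; download=True adds the query flag
    that asks for an attachment disposition.
    """
    if n < 1:
        raise ValueError("figure index n is 1-based and must be >= 1")
    url = f"{BASE_URL}/tasks/{task_id}/figures/{n}/download"
    if download:
        url += "?" + urlencode({"download": "true"})
    return url
```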
cURL request
curl -L -o figure-1.png "https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/1/download?download=true" \
-H "Authorization: Bearer <access-token>"
Response (image/png)
(binary image data)
Health Check
Verifies that the API process is reachable.
Response attributes
| Name | Description |
|---|---|
| status | Health marker. The only allowed value is `ok`, returned whenever the service is reachable. |
Status codes
| Code | Description |
|---|---|
| 200 | Service health payload. |
cURL request
curl -X GET https://papermill.akl773.com/health
Response (application/json)
{
"status": "ok"
}
Service Metadata
Returns runtime metadata and queue configuration values.
Response attributes
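| Name | Description |
|---|---|
| app_name | Service name configured at startup. |
| environment | Runtime environment name. |
| version | Application version. |
| debug | Debug mode flag. |
| queue | ARQ queue name used for document processing jobs. |
| placeholder_processing | Reserved feature flag for future processing behavior. |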
Status codes
| Code | Description |
|---|---|
| 200 | Metadata payload. |
cURL request
curl -X GET https://papermill.akl773.com/api/v1/meta
Response (application/json)
{
"app_name": "Document Task Queue Service",
"environment": "development",
"version": "0.1.0",
"debug": false,
"queue": "document_tasks",
"placeholder_processing": false
}