Papermill API Reference

Papermill exposes a REST API for PDF ingestion, document conversion, and figure extraction. Integrations use resource-oriented URLs, JSON metadata payloads, and standard HTTP verbs and status codes.

Upload a document with POST /tasks, poll GET /tasks/{task_id} until the task status is DONE or FAILED, and then download the markdown output and extracted figures for completed tasks.

Base URL

https://papermill.akl773.com

Quick start

Quick start request

curl -X POST https://papermill.akl773.com/tasks \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/document.pdf"

Initial response

{
  "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
  "status": "QUEUED"
}
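
The upload-then-poll flow can be sketched in Python as follows. This is a minimal sketch, not an official client: `wait_for_task`, its parameters, and the caller-supplied `fetch_task` function (assumed to perform GET /tasks/{task_id} and return the decoded JSON body) are hypothetical names, and the polling interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Callable

# Terminal states taken from the Task status enum in this reference.
TERMINAL_STATES = {"DONE", "FAILED"}


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task reaches DONE or FAILED, then return the last payload.

    fetch_task is a hypothetical caller-supplied function that performs
    GET /tasks/{task_id} and returns the decoded JSON body.
    """
    for attempt in range(max_attempts):
        task = fetch_task(task_id)
        if task["status"] in TERMINAL_STATES:
            return task
        # Do not sleep after the final attempt; fall through to the timeout.
        if attempt + 1 != max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"Task {task_id} did not finish after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays.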

Authentication

Papermill accepts Bearer tokens using either a Clerk session JWT or a Papermill API key (pmk_...).

Task and figure routes also support guest sessions via the guest_session cookie when no Bearer token is supplied. API key management routes always require Bearer auth.

Bearer auth

Authenticated request

curl -X POST "https://papermill.akl773.com/tasks" \
  -H "Authorization: Bearer <access-token>" \
  -F "file=@document.pdf"
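
The same choice between Bearer auth and a guest session can be expressed in client code. The sketch below is illustrative only; `build_auth_headers` and `looks_like_api_key` are hypothetical helper names, and the only facts taken from this reference are the `Bearer` scheme and the `pmk_` key prefix.

```python
from typing import Optional


def build_auth_headers(token: Optional[str]) -> dict:
    """Return request headers for a Clerk session JWT or a pmk_ API key.

    With no token, no Authorization header is sent, so task and figure
    routes fall back to the guest_session cookie.
    """
    if not token:
        return {}
    return {"Authorization": f"Bearer {token}"}


def looks_like_api_key(token: str) -> bool:
    """True when the token carries the pmk_ prefix used by Papermill API keys."""
    return token.startswith("pmk_")
```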

Errors

Papermill uses conventional HTTP status codes with FastAPI error bodies. Missing resources return 404, while result-not-ready states return 409.

All error payloads include a detail field that can be surfaced directly in client logs or retry workflows.

Error payloads

Error shape

{
  "detail": "Task not found"
}

Conflict response

HTTP/1.1 409 Conflict
Content-Type: application/json

{
  "detail": "Task result not ready"
}
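
A client can turn these responses into a retry decision. The following is a minimal sketch under stated assumptions: `classify_error` and the action labels it returns are hypothetical, and treating every 409 as "retry later" is a simplification that fits the result-not-ready case described above.

```python
import json


def classify_error(status_code: int, body: bytes) -> tuple:
    """Return (action, detail) for an error response.

    action is one of "retry_later", "not_found", or "fail" (hypothetical labels).
    detail is the `detail` field when the body is a JSON object, else None.
    """
    detail = None
    try:
        payload = json.loads(body.decode("utf-8"))
        if isinstance(payload, dict):
            detail = payload.get("detail")
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        detail = None

    if status_code == 409:
        action = "retry_later"
    elif status_code == 404:
        action = "not_found"
    else:
        action = "fail"
    return action, detail
```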

Pagination

List endpoints support offset-based pagination through limit and offset query parameters.

Responses use a stable envelope with items, total, limit, offset, and has_more so clients can paginate deterministically.

Paginated list payload

List response

{
  "items": [
    {
      "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
      "status": "DONE",
      "created_at": "2026-03-08T09:14:12Z"
    },
    {
      "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35562",
      "status": "PROCESSING",
      "created_at": "2026-03-08T09:11:04Z"
    }
  ],
  "total": 37,
  "limit": 20,
  "offset": 0,
  "has_more": true
}
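
Walking every page of a list endpoint with this envelope might look like the sketch below. `iter_items` and the caller-supplied `fetch_page(limit, offset)` function are hypothetical; the sketch relies only on the `items`, `offset`, and `has_more` fields described above.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int, int], dict], limit: int = 50) -> Iterator[dict]:
    """Yield every item from a paginated list endpoint.

    fetch_page is a hypothetical caller-supplied function that requests
    ?limit=...&offset=... and returns the decoded response envelope.
    """
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["items"]
        yield from items
        # Stop on the last page, or on an empty page to avoid looping forever.
        if not page["has_more"] or not items:
            break
        offset = page["offset"] + len(items)
```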

The Task object

Represents a single document processing task from upload through completion.

id (uuid, required)

Unique identifier for the task.

file_name (string, required)

Original filename provided during upload.

status (enum<string>, required)

Current processing lifecycle state.

Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED

input_path (string, required)

Storage path to the uploaded source file.

output_path (string, required, nullable)

Storage path to the processed output file when task is done.

error_message (string, required, nullable)

Error details when processing fails.

created_at (datetime, required)

UTC timestamp when the task row was created.

updated_at (datetime, required)

UTC timestamp for the latest task state change.

Object example

{
  "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
  "file_name": "document.pdf",
  "status": "QUEUED",
  "input_path": "uploads/d227b2b7-1c3f-4e6f-96a9-450f11e35561/input.pdf",
  "output_path": null,
  "error_message": null,
  "created_at": "2026-03-08T12:00:00Z",
  "updated_at": "2026-03-08T12:00:01Z"
}
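
For typed client code, the Task payload can be mapped onto a small data class. This is a sketch, not part of the API: the `Task` class, `parse_utc`, and the `is_finished` property are hypothetical names, and only the field names and nullability come from the attribute list above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing Z is rewritten for older Pythons."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass(frozen=True)
class Task:
    id: str
    file_name: str
    status: str
    input_path: str
    output_path: Optional[str]    # null until the task is done
    error_message: Optional[str]  # null unless processing failed
    created_at: datetime
    updated_at: datetime

    @classmethod
    def from_json(cls, data: dict) -> "Task":
        return cls(
            id=data["id"],
            file_name=data["file_name"],
            status=data["status"],
            input_path=data["input_path"],
            output_path=data["output_path"],
            error_message=data["error_message"],
            created_at=parse_utc(data["created_at"]),
            updated_at=parse_utc(data["updated_at"]),
        )

    @property
    def is_finished(self) -> bool:
        """True once the task has reached DONE or FAILED."""
        return self.status in ("DONE", "FAILED")
```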

The Figure object

Represents an extracted figure image with a pre-signed URL for direct retrieval.

figure_index (integer, required)

1-based index of the figure within the source document.

title (string, required)

Title extracted for the figure from the parsed document metadata.

s3_key (string, required)

Object storage key used to retrieve the image from MinIO.

url (string, required)

Time-limited pre-signed URL for direct retrieval.

Object example

{
  "figure_index": 1,
  "title": "Figure 1. Evaluation pipeline overview",
  "s3_key": "d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/picture-1.png",
  "url": "https://minio.localhost/papermill/..."
}

The API key object

Represents a programmatic authentication credential linked to a user.

id (uuid, required)

Unique identifier for the API key.

name (string, required)

User-defined label for the key.

key_prefix (string, required)

Non-sensitive prefix used to identify the key.

created_at (datetime, required)

UTC timestamp when the key was created.

last_used_at (datetime, required, nullable)

UTC timestamp of the most recent authenticated API call.

revoked_at (datetime, required, nullable)

UTC timestamp when the key was revoked.

Object example

{
  "id": "7f4bfe76-c8ca-4e36-a107-88f14acbf391",
  "name": "CI runner",
  "key_prefix": "pmk_2fd2058f53c",
  "created_at": "2026-03-09T12:00:00Z",
  "last_used_at": "2026-03-09T12:15:30Z",
  "revoked_at": null
}
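
Because `revoked_at` is null until a key is revoked, a client can filter usable keys with a simple check. The helper names below are hypothetical, and the sketch assumes a non-null `revoked_at` always means the key is no longer valid.

```python
def is_active(api_key: dict) -> bool:
    """True when the key has not been revoked (revoked_at is null)."""
    return api_key.get("revoked_at") is None


def active_keys(api_keys: list) -> list:
    """Keep only the keys that can still be used for authentication."""
    return [key for key in api_keys if is_active(key)]
```
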

POST /api-keys

Create API Key

Creates a Papermill API key for the authenticated user and returns the one-time secret token.

Request body

Content-Type: application/json

Friendly key label for auditing and rotation.

name (string, required)

Display name for the API key.

Response attributes

id (uuid, required)

Unique identifier for the API key.

name (string, required)

User-defined label for the key.

key_prefix (string, required)

Non-sensitive prefix used to identify the key.

created_at (datetime, required)

UTC timestamp when the key was created.

token (string, required)

One-time API key secret. Store immediately; it is not returned again.

Status codes

Code | Description
201 | API key created.
401 | "Authentication required"
422 | "API key name must not be empty" or request validation error

cURL request

curl -X POST https://papermill.akl773.com/api-keys \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{"name":"CI runner"}'

Response (application/json)

{
  "id": "7f4bfe76-c8ca-4e36-a107-88f14acbf391",
  "token": "pmk_2fd2058f53cd84ce3f0ff8f8db3f587_..."
}
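
The same request can be prepared with the Python standard library. This is a sketch rather than an official client: `build_create_key_request` is a hypothetical helper, and rejecting whitespace-only names locally is an assumption modeled on the 422 message above. The request is only built here; sending it with `urllib.request.urlopen` is left to the caller.

```python
import json
import urllib.request

BASE_URL = "https://papermill.akl773.com"


def build_create_key_request(name: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api-keys request with a JSON body."""
    # Assumption: whitespace-only names are treated as empty, like the 422 case.
    if not name.strip():
        raise ValueError("API key name must not be empty")
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api-keys",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```
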

GET /api-keys

List API Keys

Lists API keys created by the authenticated user.

Parameters

Query parameters

Name | Type | Required | Description
limit | integer | Optional | Maximum number of API keys to return. Default: 50.
offset | integer | Optional | Zero-based index of the first row to return. Default: 0.

Response attributes

items (array<ApiKey>, required)

Ordered list of API key objects.

  id (uuid, required)

  Unique identifier for the API key.

  name (string, required)

  User-defined label for the key.

  key_prefix (string, required)

  Non-sensitive prefix used to identify the key.

  created_at (datetime, required)

  UTC timestamp when the key was created.

  last_used_at (datetime, required, nullable)

  UTC timestamp of the most recent authenticated API call.

  revoked_at (datetime, required, nullable)

  UTC timestamp when the key was revoked.

total (integer, required)

Total number of API keys available for the authenticated user.

limit (integer, required)

Page size requested through the query string.

offset (integer, required)

Zero-based starting row for this page.

has_more (boolean, required)

True when additional records exist beyond this page.

Status codes

Code | Description
200 | List of API key metadata.
401 | "Authentication required"

cURL request

curl -X GET "https://papermill.akl773.com/api-keys?limit=50&offset=0" \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "items": [
    {
      "id": "7f4bfe76-c8ca-4e36-a107-88f14acbf391",
      "name": "CI runner",
      "key_prefix": "pmk_2fd2058f53c"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0,
  "has_more": false
}

DELETE /api-keys/{api_key_id}

Revoke API Key

Revokes an API key so it can no longer be used for authentication.

Parameters

Path parameters

Name | Type | Required | Description
api_key_id | uuid | Required | API key identifier.

Response attributes

revoked (boolean, required)

Indicates whether the key is revoked.

Status codes

Code | Description
200 | Revocation confirmation.
401 | "Authentication required"
404 | "API key not found"

cURL request

curl -X DELETE https://papermill.akl773.com/api-keys/7f4bfe76-c8ca-4e36-a107-88f14acbf391 \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "revoked": true
}

POST /tasks

Create a Task

Uploads a PDF and enqueues it for processing. This route accepts optional Bearer auth; without a token, tasks are scoped to a guest session cookie.

Parameters

Form fields

Name | Type | Required | Description
file | file | Required | PDF file to process.

Request body

Content-Type: multipart/form-data

Upload payload with one required form field.

file (file, required)

PDF document to process.

Response attributes

id (uuid, required)

Unique identifier for the task.

file_name (string, required)

Original filename provided during upload.

status (enum<string>, required)

Current processing lifecycle state.

Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED

input_path (string, required)

Storage path to the uploaded source file.

output_path (string, required, nullable)

Storage path to the processed output file when task is done.

error_message (string, required, nullable)

Error details when processing fails.

created_at (datetime, required)

UTC timestamp when the task row was created.

updated_at (datetime, required)

UTC timestamp for the latest task state change.

Status codes

Code | Description
201 | Task created and enqueued.
400 | "Uploaded file must have a filename"
500 | "Failed to enqueue task"

cURL request

curl -X POST https://papermill.akl773.com/tasks \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/path/to/document.pdf"

Response (application/json)

{
  "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
  "status": "QUEUED"
}
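
Clients without a multipart helper need to encode the `file` field themselves. The sketch below builds such a body with the standard library only; `encode_pdf_upload` is a hypothetical helper, the `application/pdf` part type is an assumption, and filenames containing double quotes are not escaped here.

```python
import uuid


def encode_pdf_upload(file_name: str, pdf_bytes: bytes) -> tuple:
    """Return (content_type, body) for a multipart upload with one `file` field."""
    if not file_name:
        # Mirrors the 400 "Uploaded file must have a filename" case.
        raise ValueError("Uploaded file must have a filename")
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    # The Content-Type header must carry the same boundary used in the body.
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, head + pdf_bytes + tail
```

The returned `content_type` goes into the request's Content-Type header and the bytes become the POST /tasks body.
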

GET /tasks

List Tasks

Returns tasks ordered by creation time. This route accepts optional Bearer auth; without a token, results are scoped to a guest session cookie.

Parameters

Query parameters

Name | Type | Required | Description
limit | integer | Optional | Maximum number of tasks to return. Default: 50.
offset | integer | Optional | Zero-based index of the first row to return. Default: 0.

Response attributes

items (array<Task>, required)

Ordered list of task objects.

  id (uuid, required)

  Unique identifier for the task.

  file_name (string, required)

  Original filename provided during upload.

  status (enum<string>, required)

  Current processing lifecycle state.

  Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED

  input_path (string, required)

  Storage path to the uploaded source file.

  output_path (string, required, nullable)

  Storage path to the processed output file when task is done.

  error_message (string, required, nullable)

  Error details when processing fails.

  created_at (datetime, required)

  UTC timestamp when the task row was created.

  updated_at (datetime, required)

  UTC timestamp for the latest task state change.

total (integer, required)

Total number of task records available for the current auth or guest scope.

limit (integer, required)

Page size requested through the query string.

offset (integer, required)

Zero-based starting row for this page.

has_more (boolean, required)

True when additional records exist beyond this page.

Status codes

Code | Description
200 | List of task objects.

cURL request

curl -X GET "https://papermill.akl773.com/tasks?limit=50&offset=0" \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "items": [
    {
      "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
      "status": "DONE"
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0,
  "has_more": false
}

GET /public/tasks

List Public Guest Tasks

Returns tasks scoped to the guest cookie session only. This endpoint ignores Bearer headers and is intended for unauthenticated browser flows.

Parameters

Query parameters

Name | Type | Required | Description
limit | integer | Optional | Maximum number of tasks to return. Default: 50.
offset | integer | Optional | Zero-based index of the first row to return. Default: 0.

Response attributes

items (array<Task>, required)

Ordered list of task objects.

  id (uuid, required)

  Unique identifier for the task.

  file_name (string, required)

  Original filename provided during upload.

  status (enum<string>, required)

  Current processing lifecycle state.

  Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED

  input_path (string, required)

  Storage path to the uploaded source file.

  output_path (string, required, nullable)

  Storage path to the processed output file when task is done.

  error_message (string, required, nullable)

  Error details when processing fails.

  created_at (datetime, required)

  UTC timestamp when the task row was created.

  updated_at (datetime, required)

  UTC timestamp for the latest task state change.

total (integer, required)

Total number of task records available for the current auth or guest scope.

limit (integer, required)

Page size requested through the query string.

offset (integer, required)

Zero-based starting row for this page.

has_more (boolean, required)

True when additional records exist beyond this page.

Status codes

Code | Description
200 | List of task objects for current guest cookie scope.

cURL request

curl -X GET "https://papermill.akl773.com/public/tasks?limit=50&offset=0"

Response (application/json)

{
  "items": [],
  "total": 0,
  "limit": 50,
  "offset": 0,
  "has_more": false
}

GET /tasks/{task_id}

Retrieve a Task

Fetches one task by UUID. This route accepts optional Bearer auth; without a token, access is scoped to a guest session cookie.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.

Response attributes

id (uuid, required)

Unique identifier for the task.

file_name (string, required)

Original filename provided during upload.

status (enum<string>, required)

Current processing lifecycle state.

Allowed values: PENDING, QUEUED, PROCESSING, DONE, FAILED

input_path (string, required)

Storage path to the uploaded source file.

output_path (string, required, nullable)

Storage path to the processed output file when task is done.

error_message (string, required, nullable)

Error details when processing fails.

created_at (datetime, required)

UTC timestamp when the task row was created.

updated_at (datetime, required)

UTC timestamp for the latest task state change.

Status codes

Code | Description
200 | Task object.
404 | "Task not found"

cURL request

curl -X GET https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561 \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "id": "d227b2b7-1c3f-4e6f-96a9-450f11e35561",
  "status": "DONE"
}

DELETE /tasks/{task_id}

Delete a Task

Deletes a task in the current auth scope (Bearer user or guest session) and removes local artifacts. Active queued/processing tasks cannot be deleted.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.

Response attributes

deleted (boolean, required)

Indicates whether the task and associated storage were removed.

Status codes

Code | Description
200 | Deletion confirmation.
404 | "Task not found"
409 | "Cannot delete task while processing"

cURL request

curl -X DELETE https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561 \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "deleted": true
}
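
A client can avoid a predictable 409 by checking the task status before calling DELETE. In this sketch `can_delete` is a hypothetical helper, and it assumes only QUEUED and PROCESSING count as active; whether PENDING tasks can be deleted is not stated in this reference.

```python
# Assumption: these are the "active" states that the API refuses to delete.
ACTIVE_STATES = {"QUEUED", "PROCESSING"}


def can_delete(task: dict) -> bool:
    """True when the task is not queued or processing."""
    return task["status"] not in ACTIVE_STATES
```
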

GET /tasks/{task_id}/download

Download Parsed Document

Streams the final output artifact for a completed task. The content type is inferred from file extension.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.

Status codes

Code | Description
200 | File stream (.md or .txt).
404 | "Task not found" or "Task output file missing"
409 | "Task result not ready"

cURL request

curl -L -o result.md https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/download \
  -H "Authorization: Bearer <access-token>"

Response (text/markdown | text/plain | application/octet-stream)

# Markdown output

Generated content...
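
Since the response content type varies, a client may want to pick the local file extension from the Content-Type header instead of hard-coding `.md`. The mapping below is a sketch: `output_extension` is a hypothetical helper, and the `.bin` fallback for `application/octet-stream` or unknown types is an assumption.

```python
def output_extension(content_type: str) -> str:
    """Map a Content-Type header value to a local file extension."""
    # Drop parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    known = {"text/markdown": ".md", "text/plain": ".txt"}
    return known.get(media_type, ".bin")
```
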

GET /tasks/{task_id}/assets

Get Extracted Assets JSON

Returns structural metadata from `result.assets.json` produced during parsing.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.

Response attributes

figures (array<object>, optional)

Extracted figure metadata generated by the parser. Exact structure can evolve with parser versions.

  index (integer, required)

  1-based figure index in the source document.

  page (integer, required)

  Source page number where the figure was detected.

  bbox (array<number>, required)

  Bounding box coordinates as `[x0, y0, x1, y1]`.

Status codes

Code | Description
200 | Assets metadata JSON.
404 | "Task not found", "Task output file missing", or "Task assets file missing"
409 | "Task result not ready"
500 | "Task assets file is invalid"

cURL request

curl -X GET https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/assets \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "figures": [
    { "index": 1, "page": 2, "bbox": [110, 210, 430, 512] }
  ]
}
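
The `[x0, y0, x1, y1]` layout makes it straightforward to derive figure dimensions. The helpers below are hypothetical and assume the documented field names; the units of the coordinates are not specified in this reference, and `figures` is treated as possibly absent because it is optional.

```python
def bbox_size(bbox: list) -> tuple:
    """Return (width, height) for a bounding box given as [x0, y0, x1, y1]."""
    x0, y0, x1, y1 = bbox
    return x1 - x0, y1 - y0


def figures_on_page(assets: dict, page: int) -> list:
    """Return the figure entries detected on the given page."""
    return [fig for fig in assets.get("figures", []) if fig["page"] == page]
```
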

GET /tasks/{task_id}/figures

List Extracted Figures

Lists extracted figures for a completed task and returns pre-signed URLs for direct retrieval.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.

Query parameters

Name | Type | Required | Description
limit | integer | Optional | Maximum number of figures to return. Default: 50.
offset | integer | Optional | Zero-based index of the first row to return. Default: 0.

Response attributes

items (array<Figure>, required)

Ordered list of extracted figure objects.

  figure_index (integer, required)

  1-based index of the figure within the source document.

  title (string, required)

  Title extracted for the figure from the parsed document metadata.

  s3_key (string, required)

  Object storage key used to retrieve the image from MinIO.

  url (string, required)

  Time-limited pre-signed URL for direct retrieval.

total (integer, required)

Total number of extracted figures for the task.

limit (integer, required)

Page size requested through the query string.

offset (integer, required)

Zero-based starting row for this page.

has_more (boolean, required)

True when additional figure records exist beyond this page.

Status codes

Code | Description
200 | List of figure objects.
404 | "Task not found"
409 | "Task result not ready"

cURL request

curl -X GET "https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures?limit=50&offset=0" \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "items": [
    {
      "figure_index": 1,
      "title": "Figure 1. Evaluation pipeline overview",
      "s3_key": "d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/picture-1.png",
      "url": "https://minio.localhost/papermill/..."
    }
  ],
  "total": 1,
  "limit": 50,
  "offset": 0,
  "has_more": false
}

GET /tasks/{task_id}/figures/{n}

Retrieve a Figure

Returns metadata for a specific figure index.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.
n | integer | Required | 1-based figure index. Default: 1.

Response attributes

figure_index (integer, required)

1-based index of the figure within the source document.

title (string, required)

Title extracted for the figure from the parsed document metadata.

s3_key (string, required)

Object storage key used to retrieve the image from MinIO.

url (string, required)

Time-limited pre-signed URL for direct retrieval.

Status codes

Code | Description
200 | Single figure object.
404 | "Task not found" or "Figure not found"
409 | "Task result not ready"
422 | Validation error when `n` is less than 1.

cURL request

curl -X GET https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/1 \
  -H "Authorization: Bearer <access-token>"

Response (application/json)

{
  "figure_index": 1,
  "title": "Figure 1. Evaluation pipeline overview",
  "s3_key": "d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/picture-1.png",
  "url": "https://minio.localhost/papermill/..."
}

GET /tasks/{task_id}/figures/{n}/download

Download a Figure

Streams the figure image directly. Set `download=true` to force attachment download.

Parameters

Path parameters

Name | Type | Required | Description
task_id | uuid | Required | Task identifier.
n | integer | Required | 1-based figure index.

Query parameters

Name | Type | Required | Description
download | boolean | Optional | When true, returns attachment disposition. Default: false.

Status codes

Code | Description
200 | PNG byte stream.
404 | "Task not found", "Figure not found", or "Figure file not found"
409 | "Task result not ready"
422 | Validation error when `n` is less than 1.
502 | "Failed to fetch figure file"

cURL request

curl -L -o figure-1.png "https://papermill.akl773.com/tasks/d227b2b7-1c3f-4e6f-96a9-450f11e35561/figures/1/download?download=true" \
  -H "Authorization: Bearer <access-token>"

Response (image/png)

(binary image data)
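
Building the download URL in code lets a client reject an invalid index before the API returns 422. The sketch below is illustrative: `figure_download_url` is a hypothetical helper, and the local check only mirrors the documented rule that `n` must be at least 1.

```python
from urllib.parse import urlencode

BASE_URL = "https://papermill.akl773.com"


def figure_download_url(task_id: str, n: int, download: bool = False) -> str:
    """Return the figure download URL, optionally forcing attachment download."""
    if n < 1:
        # Mirrors the 422 validation error for indexes below 1.
        raise ValueError("n must be a 1-based figure index")
    url = f"{BASE_URL}/tasks/{task_id}/figures/{n}/download"
    if download:
        url += "?" + urlencode({"download": "true"})
    return url
```
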

GET /health

Health Check

Verifies that the API process is reachable.

Response attributes

status (string, required)

Health marker. Always `ok` when the service is reachable.

Allowed values: ok

Status codes

Code | Description
200 | Service health payload.

cURL request

curl -X GET https://papermill.akl773.com/health

Response (application/json)

{
  "status": "ok"
}

GET /api/v1/meta

Service Metadata

Returns runtime metadata and queue configuration values.

Response attributes

app_name (string, required)

Service name configured at startup.

environment (string, required)

Runtime environment name.

version (string, required)

Application version.

debug (boolean, required)

Debug mode flag.

queue (string, required)

ARQ queue name used for document processing jobs.

placeholder_processing (boolean, required)

Reserved feature flag for future processing behavior.

Status codes

Code | Description
200 | Metadata payload.

cURL request

curl -X GET https://papermill.akl773.com/api/v1/meta

Response (application/json)

{
  "app_name": "Document Task Queue Service",
  "environment": "development",
  "version": "0.1.0",
  "debug": false,
  "queue": "document_tasks",
  "placeholder_processing": false
}