API Reference

Lucena One exposes an OpenAI-style chat API on the branded hosted endpoint below. The production surface currently includes model discovery, chat completions, token counting, and runtime stats.

Authentication

All API requests require a Bearer token in the Authorization header. Keys are prefixed with sk-lucena-.

Authorization: Bearer sk-lucena-your_key_here

All public routes on models.lucena.one are authenticated. If the header is missing, the gateway returns 401 with {"error":"Missing API Key"}.
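
A quick way to observe this from a shell (a sketch; the exact response framing may vary):

curl -i https://models.lucena.one/v1/models
# Expect a 401 status and the body {"error":"Missing API Key"}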

Base URL

https://models.lucena.one

All endpoints are served over HTTPS. HTTP requests will be redirected.

Environment    Base URL                      Status
Production     https://models.lucena.one     Live
Hosted alias   https://lucenalabs.web.app    Equivalent

Use the branded production hostname for client integrations. Routes not listed on this page should be treated as internal or unstable.

Quick Start

Verify your key, inspect the live model list, then send a chat request.

1. List models

curl https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here"

2. Send a completion request

curl -N https://models.lucena.one/v1/chat/completions \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "messages": [
      { "role": "user", "content": "Reply with the single word pong." }
    ]
  }'

List Models

GET /v1/models

Returns the models currently loaded on the hosted deployment. Query this endpoint instead of hardcoding model availability.

Response

{
  "object": "list",
  "data": [
    {
      "id": "lucena-coder-latest",
      "object": "model",
      "created": 1700000000,
      "owned_by": "lucena",
      "description": "Lucena Coder: advanced code generation and understanding.",
      "name": "Lucena Coder",
      "max_input_tokens": 100000,
      "max_output_tokens": 16000,
      "context_window": 101000,
      "default_shape": null,
      "available_shapes": []
    }
  ]
}
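
Because availability can change, scripts can pull live model IDs instead of pinning them (a sketch, assuming jq is installed):

curl -s https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here" | jq -r '.data[].id'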

Chat Completions

POST /v1/chat/completions

Creates a model response for a conversation. The current hosted deployment responds with text/event-stream chunk data, so clients should be prepared to consume SSE.

Request Body

model (string, required): Model ID returned by /v1/models. The current production deployment returns lucena-coder-latest.
messages (array, required): Array of message objects with role (user | assistant | system) and content.
stream (boolean, optional): OpenAI-style streaming flag. The current hosted deployment streams chunk responses even when this field is omitted or false.
max_tokens (integer, optional): Maximum tokens to generate. The live model advertises an output limit of 16000.
temperature (float, optional): Sampling temperature between 0 and 2. Lower = more deterministic. Default: 0.7.
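
Putting the optional fields together (a sketch; the parameter values are illustrative, not recommendations):

curl -N https://models.lucena.one/v1/chat/completions \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "stream": true,
    "max_tokens": 512,
    "temperature": 0.2,
    "messages": [
      { "role": "user", "content": "Summarize this endpoint in one sentence." }
    ]
  }'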

Response

The live endpoint currently returns Content-Type: text/event-stream. A typical response looks like this:

: connected

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"pong"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":62,"completion_tokens":1,"total_tokens":63}}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Token Count

POST /v1/tokens/count

Returns a prompt token estimate and model limits without generating a completion.

Request

{
  "model": "lucena-coder-latest",
  "messages": [
    { "role": "user", "content": "Hello world" }
  ]
}
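
The same payload can be sent with curl, following the pattern of the chat examples above (a sketch):

curl https://models.lucena.one/v1/tokens/count \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "messages": [{ "role": "user", "content": "Hello world" }]
  }'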

Response

{
  "object": "token_count",
  "model": "lucena-coder-latest",
  "usage": {
    "prompt_tokens": 57,
    "completion_tokens": 0,
    "total_tokens": 57
  },
  "limits": {
    "trim_limit": 100000,
    "context_size": 2048,
    "context_limit": 101000,
    "output_limit": 16000
  },
  "_lucena": {
    "system_prompt_source": "default",
    "trimmed_messages": 1,
    "shape_id": null
  }
}

Runtime Stats

GET /v1/stats

Returns lightweight memory stats from the currently running worker.
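
For example (a sketch; this assumes the gateway's bearer auth applies to this route like the others):

curl https://models.lucena.one/v1/stats \
  -H "Authorization: Bearer sk-lucena-your_key_here"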

Response

{
  "heap_used_mb": 41,
  "heap_total_mb": 85,
  "rss_mb": 1276
}

Streaming

Lucena uses Server-Sent Events (SSE) on the hosted chat endpoint. You may receive keepalive comment lines such as ": connected". Parse lines prefixed with "data: ", ignore the final "data: [DONE]" marker, and concatenate each chunk's choices[0].delta.content.

Usage arrives in a separate chunk with an empty choices array and a populated usage object.

Node.js (fetch)

const response = await fetch('https://models.lucena.one/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-lucena-your_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'lucena-coder-latest',
    stream: true,
    messages: [{ role: 'user', content: 'Write a binary search in Python.' }]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode incrementally; keep any trailing partial line in the buffer.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';

  for (const line of lines) {
    // Skip keepalive comments (e.g. ": connected") and the final [DONE] marker.
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const chunk = JSON.parse(line.slice(6));
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
}

Code Examples

cURL

curl -N https://models.lucena.one/v1/chat/completions \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "messages": [
      { "role": "user", "content": "Refactor this function to use async/await." }
    ]
  }'

Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-lucena-your_key_here',
  baseURL: 'https://models.lucena.one/v1'
});

const stream = await client.chat.completions.create({
  model: 'lucena-coder-latest',
  stream: true,
  messages: [{ role: 'user', content: 'Write a REST API in Express.' }]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-lucena-your_key_here",
    base_url="https://models.lucena.one/v1"
)

stream = client.chat.completions.create(
    model="lucena-coder-latest",
    stream=True,
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)

for chunk in stream:
    # Usage-only chunks arrive with an empty choices array; skip them.
    if not chunk.choices:
        continue
    text = chunk.choices[0].delta.content or ""
    print(text, end="")

Error Handling

Lucena uses standard HTTP status codes. Authentication errors are returned by the gateway, while backend route errors come from the model server.

Missing API key

{
  "error": "Missing API Key"
}

Backend route error

{
  "error": {
    "message": "Not found.",
    "type": "not_found",
    "code": 404
  }
}

Status   Type                    Description
400      invalid_request_error   Malformed request body or missing required fields.
401      gateway_auth_error      Missing or invalid bearer key.
404      not_found               Endpoint is not exposed on the current hosted deployment.
500      server_error            Inference failed or internal error.
503      server_error            Model worker is loading or temporarily unavailable.
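
When debugging, printing the HTTP status next to the body helps tell gateway errors from backend errors (a sketch using standard curl flags):

curl -s -w "\nHTTP status: %{http_code}\n" https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here"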