API Reference
Lucena One exposes an OpenAI-style chat API on the branded hosted endpoint below. The production surface currently includes model discovery, chat completions, token counting, and runtime stats.
Authentication
All API requests require a Bearer token in the Authorization header. Keys are prefixed with sk-lucena-.
Authorization: Bearer sk-lucena-your_key_here
All public routes on models.lucena.one are authenticated. If the header is missing, the gateway returns 401 with {"error":"Missing API Key"}.
Base URL
https://models.lucena.one
All endpoints are served over HTTPS. HTTP requests will be redirected.
| Environment | Base URL | Status |
|---|---|---|
| Production | https://models.lucena.one | Live |
| Hosted alias | https://lucenalabs.web.app | Equivalent |
Use the branded production hostname for client integrations. Routes not listed on this page should be treated as internal or unstable.
Quick Start
Verify your key, inspect the live model list, then send a chat request.
1. List models
curl https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here"
2. Send a completion request
curl -N https://models.lucena.one/v1/chat/completions \
-H "Authorization: Bearer sk-lucena-your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "lucena-coder-latest",
"messages": [
{ "role": "user", "content": "Reply with the single word pong." }
]
}'
List Models
Returns the models currently loaded on the hosted deployment. Query this endpoint instead of hardcoding model availability.
Response
{
"object": "list",
"data": [
{
"id": "lucena-coder-latest",
"object": "model",
"created": 1700000000,
"owned_by": "lucena",
"description": "Lucena Coder: advanced code generation and understanding.",
"name": "Lucena Coder",
"max_input_tokens": 100000,
"max_output_tokens": 16000,
"context_window": 101000,
"default_shape": null,
"available_shapes": []
}
]
}
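Rather than hardcoding model availability, clients can derive a model's limits from this response at runtime. A minimal Python sketch against the response shape shown above (the helper name is illustrative, not part of the API):

```python
def pick_model(models_response: dict, model_id: str) -> dict:
    """Return the token limits for one model from a /v1/models response."""
    for entry in models_response.get("data", []):
        if entry["id"] == model_id:
            return {
                "max_input_tokens": entry["max_input_tokens"],
                "max_output_tokens": entry["max_output_tokens"],
                "context_window": entry["context_window"],
            }
    raise KeyError(f"model {model_id!r} not in /v1/models response")

# Example using the response body documented above.
sample = {
    "object": "list",
    "data": [{
        "id": "lucena-coder-latest",
        "object": "model",
        "max_input_tokens": 100000,
        "max_output_tokens": 16000,
        "context_window": 101000,
    }],
}
limits = pick_model(sample, "lucena-coder-latest")
```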
Chat Completions
Creates a model response for a conversation. The current hosted deployment responds with text/event-stream chunk data, so clients should be prepared to consume Server-Sent Events (SSE).
Request Body
| Parameter | Type | Description |
|---|---|---|
| model required | string | Model ID returned by /v1/models. The current production deployment returns lucena-coder-latest. |
| messages required | array | Array of message objects with role (user \| assistant \| system) and content. |
| stream optional | boolean | OpenAI-style streaming flag. The current hosted deployment streams chunk responses even when this field is omitted or false. |
| max_tokens optional | integer | Maximum tokens to generate. The live model advertises an output limit of 16000. |
| temperature optional | float | Sampling temperature between 0 and 2. Lower = more deterministic. Default: 0.7. |
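The table above maps directly onto a small request builder. A sketch in Python (the function and its validation rules are illustrative; only the field names and the 0–2 temperature range come from this page):

```python
VALID_ROLES = {"user", "assistant", "system"}

def build_chat_request(model, messages, stream=None, max_tokens=None, temperature=None):
    """Assemble a /v1/chat/completions request body, validating the documented fields."""
    for m in messages:
        if m.get("role") not in VALID_ROLES:
            raise ValueError(f"unsupported role: {m.get('role')!r}")
        if "content" not in m:
            raise ValueError("each message needs a content field")
    body = {"model": model, "messages": messages}
    if stream is not None:
        body["stream"] = stream
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        body["temperature"] = temperature
    return body

req = build_chat_request(
    "lucena-coder-latest",
    [{"role": "user", "content": "Reply with the single word pong."}],
    max_tokens=32,
)
```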
Response
The live endpoint currently returns Content-Type: text/event-stream. A typical response looks like this:
: connected
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"pong"},"finish_reason":null}]}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":62,"completion_tokens":1,"total_tokens":63}}
data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Token Count
Returns a prompt token estimate and model limits without generating a completion.
Request
{
"model": "lucena-coder-latest",
"messages": [
{ "role": "user", "content": "Hello world" }
]
}
Response
{
"object": "token_count",
"model": "lucena-coder-latest",
"usage": {
"prompt_tokens": 57,
"completion_tokens": 0,
"total_tokens": 57
},
"limits": {
"trim_limit": 100000,
"context_size": 2048,
"context_limit": 101000,
"output_limit": 16000
},
"_lucena": {
"system_prompt_source": "default",
"trimmed_messages": 1,
"shape_id": null
}
}
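A practical use of this response is pre-flight budgeting: confirm that a prompt plus the desired output fits before paying for a completion. A sketch assuming the response shape above (the helper is illustrative, not part of the API):

```python
def fits_context(token_count_response: dict, requested_output: int) -> bool:
    """Check that prompt tokens plus requested output stay inside the model limits."""
    usage = token_count_response["usage"]
    limits = token_count_response["limits"]
    if requested_output > limits["output_limit"]:
        return False
    return usage["prompt_tokens"] + requested_output <= limits["context_limit"]

# Example using the response body documented above.
sample = {
    "usage": {"prompt_tokens": 57, "completion_tokens": 0, "total_tokens": 57},
    "limits": {"trim_limit": 100000, "context_size": 2048,
               "context_limit": 101000, "output_limit": 16000},
}
```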
Runtime Stats
Returns lightweight memory stats from the currently running worker.
Response
{
"heap_used_mb": 41,
"heap_total_mb": 85,
"rss_mb": 1276
}
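These numbers can feed a simple health check, for example flagging a worker whose heap usage is close to its total. A sketch against the response shape above (the function and the 90% threshold are illustrative):

```python
def heap_pressure(stats: dict, threshold: float = 0.9) -> bool:
    """Return True when heap_used_mb exceeds `threshold` of heap_total_mb."""
    return stats["heap_used_mb"] / stats["heap_total_mb"] > threshold

# Example using the response body documented above.
sample = {"heap_used_mb": 41, "heap_total_mb": 85, "rss_mb": 1276}
```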
Streaming
Lucena uses Server-Sent Events (SSE) on the hosted chat endpoint. You may receive keepalive comment lines such as : connected. Parse lines that begin with data:, skip the final data: [DONE] marker, and concatenate each chunk's choices[0].delta.content.
Usage arrives in a separate chunk with an empty choices array and a populated usage object.
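The same parsing rules apply in any language. A minimal Python line parser for this stream format (pure parsing only; transport is left out):

```python
import json

def parse_sse_line(line: str):
    """Return the delta text from one SSE line, or None for comment/keepalive
    lines, usage-only chunks, and the [DONE] marker."""
    line = line.strip()
    if not line.startswith("data:"):
        return None  # comment lines such as ": connected"
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    choices = chunk.get("choices") or []
    if not choices:
        return None  # usage arrives in a chunk with an empty choices array
    return choices[0].get("delta", {}).get("content")

# Example using the stream transcript documented above.
lines = [
    ": connected",
    'data: {"choices":[{"index":0,"delta":{"content":"pong"},"finish_reason":null}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":62,"completion_tokens":1,"total_tokens":63}}',
    "data: [DONE]",
]
text = "".join(t for t in (parse_sse_line(l) for l in lines) if t)
```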
JavaScript (fetch)
const response = await fetch('https://models.lucena.one/v1/chat/completions', {
method: 'POST',
headers: {
'Authorization': 'Bearer sk-lucena-your_key_here',
'Content-Type': 'application/json'
},
body: JSON.stringify({
model: 'lucena-coder-latest',
stream: true,
messages: [{ role: 'user', content: 'Write a binary search in Python.' }]
})
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split('\n');
buffer = lines.pop() || '';
for (const line of lines) {
if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
const chunk = JSON.parse(line.slice(6));
const content = chunk.choices[0]?.delta?.content || '';
process.stdout.write(content);
}
}
Code Examples
cURL
curl -N https://models.lucena.one/v1/chat/completions \
-H "Authorization: Bearer sk-lucena-your_key_here" \
-H "Content-Type: application/json" \
-d '{
"model": "lucena-coder-latest",
"messages": [
{ "role": "user", "content": "Refactor this function to use async/await." }
]
}'
Node.js (OpenAI SDK)
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: 'sk-lucena-your_key_here',
baseURL: 'https://models.lucena.one/v1'
});
const stream = await client.chat.completions.create({
model: 'lucena-coder-latest',
stream: true,
messages: [{ role: 'user', content: 'Write a REST API in Express.' }]
});
for await (const chunk of stream) {
process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
api_key="sk-lucena-your_key_here",
base_url="https://models.lucena.one/v1"
)
stream = client.chat.completions.create(
model="lucena-coder-latest",
stream=True,
messages=[{"role": "user", "content": "Write a binary search in Python."}]
)
for chunk in stream:
if not chunk.choices:
continue
text = chunk.choices[0].delta.content or ""
print(text, end="")
Error Handling
Lucena uses standard HTTP status codes. Authentication errors are returned by the gateway, while backend route errors come from the model server.
Missing API key
{
"error": "Missing API Key"
}
Backend route error
{
"error": {
"message": "Not found.",
"type": "not_found",
"code": 404
}
}
| Status | Type | Description |
|---|---|---|
| 400 | invalid_request_error | Malformed request body or missing required fields. |
| 401 | gateway_auth_error | Missing or invalid bearer key. |
| 404 | not_found | Endpoint is not exposed on the current hosted deployment. |
| 500 | server_error | Inference failed or internal error. |
| 503 | server_error | Model worker is loading or temporarily unavailable. |
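Because the gateway and the backend return different error shapes (a bare string versus a nested object), clients may want to normalize them before logging or retrying. A sketch that assumes only the two shapes shown above:

```python
def normalize_error(body: dict, status: int) -> dict:
    """Collapse both documented error shapes into {status, type, message}."""
    err = body.get("error")
    if isinstance(err, str):
        # Gateway shape: {"error": "Missing API Key"}
        return {"status": status, "type": "gateway_auth_error", "message": err}
    if isinstance(err, dict):
        # Backend shape: {"error": {"message": ..., "type": ..., "code": ...}}
        return {
            "status": err.get("code", status),
            "type": err.get("type", "server_error"),
            "message": err.get("message", ""),
        }
    return {"status": status, "type": "unknown", "message": str(body)}

gateway = normalize_error({"error": "Missing API Key"}, 401)
backend = normalize_error(
    {"error": {"message": "Not found.", "type": "not_found", "code": 404}}, 404)
```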