API Reference

Lucena One exposes an OpenAI-style chat API on the branded hosted endpoint below. The production surface currently includes model discovery, chat completions, token counting, and runtime stats.

Authentication

All API requests require a Bearer token in the Authorization header. Keys are prefixed with sk-lucena-.

Authorization: Bearer sk-lucena-your_key_here

All public routes on models.lucena.one are authenticated. If the header is missing, the gateway returns 401 with {"error":"Missing API Key"}.
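
A quick way to observe this from a shell (a sketch; the exact response framing may vary):

curl -i https://models.lucena.one/v1/models
# Expect a 401 status and the body {"error":"Missing API Key"}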

Base URL

https://models.lucena.one

All endpoints are served over HTTPS. HTTP requests will be redirected.

Environment    Base URL                      Status
Production     https://models.lucena.one     Live
Hosted alias   https://lucenalabs.web.app    Equivalent

Use the branded production hostname for client integrations. Routes not listed on this page should be treated as internal or unstable.

Quick Start

Verify your key, inspect the live model list, then send a chat request.

1. List models

curl https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here"

2. Send a completion request

curl -N https://models.lucena.one/v1/chat/completions \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "messages": [
      { "role": "user", "content": "Reply with the single word pong." }
    ]
  }'

List Models

GET /v1/models

Returns the models currently loaded on the hosted deployment. Query this endpoint instead of hardcoding model availability.

Response

{
  "object": "list",
  "data": [
    {
      "id": "lucena-coder-latest",
      "object": "model",
      "created": 1700000000,
      "owned_by": "lucena",
      "description": "Lucena Coder: advanced code generation and understanding.",
      "name": "Lucena Coder",
      "max_input_tokens": 100000,
      "max_output_tokens": 16000,
      "context_window": 101000,
      "default_shape": null,
      "available_shapes": []
    }
  ]
}
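
Because availability can change, scripts can pull live model IDs instead of pinning them (a sketch, assuming jq is installed):

curl -s https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here" | jq -r '.data[].id'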

Chat Completions

POST /v1/chat/completions

Creates a model response for a conversation. The current hosted deployment responds with text/event-stream chunk data, so clients should be prepared to consume SSE.

Request Body

model (string, required): Model ID returned by /v1/models. The current production deployment returns lucena-coder-latest.
messages (array, required): Array of message objects with role (user | assistant | system) and content.
stream (boolean, optional): OpenAI-style streaming flag. The current hosted deployment streams chunk responses even when this field is omitted or false.
max_tokens (integer, optional): Maximum tokens to generate. The live model advertises an output limit of 16000.
temperature (float, optional): Sampling temperature between 0 and 2. Lower = more deterministic. Default: 0.7.
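
Putting the optional fields together (a sketch; the parameter values are illustrative, not recommendations):

curl -N https://models.lucena.one/v1/chat/completions \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "stream": true,
    "max_tokens": 512,
    "temperature": 0.2,
    "messages": [
      { "role": "user", "content": "Summarize this endpoint in one sentence." }
    ]
  }'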

Response

The live endpoint currently returns Content-Type: text/event-stream. A typical response looks like this:

: connected

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"pong"},"finish_reason":null}]}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":62,"completion_tokens":1,"total_tokens":63}}

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

Token Count

POST /v1/tokens/count

Returns a prompt token estimate and model limits without generating a completion.

Request

{
  "model": "lucena-coder-latest",
  "messages": [
    { "role": "user", "content": "Hello world" }
  ]
}
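
The same payload can be sent with curl, following the pattern of the chat examples above (a sketch):

curl https://models.lucena.one/v1/tokens/count \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "messages": [{ "role": "user", "content": "Hello world" }]
  }'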

Response

{
  "object": "token_count",
  "model": "lucena-coder-latest",
  "usage": {
    "prompt_tokens": 57,
    "completion_tokens": 0,
    "total_tokens": 57
  },
  "limits": {
    "trim_limit": 100000,
    "context_size": 2048,
    "context_limit": 101000,
    "output_limit": 16000
  },
  "_lucena": {
    "system_prompt_source": "default",
    "trimmed_messages": 1,
    "shape_id": null
  }
}

Runtime Stats

GET /v1/stats

Returns lightweight memory stats from the currently running worker.
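
For example (a sketch; this assumes the gateway's bearer auth applies to this route like the others):

curl https://models.lucena.one/v1/stats \
  -H "Authorization: Bearer sk-lucena-your_key_here"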

Response

{
  "heap_used_mb": 41,
  "heap_total_mb": 85,
  "rss_mb": 1276
}

Streaming

Lucena uses Server-Sent Events (SSE) on the hosted chat endpoint. You may receive keepalive comment lines such as ": connected". Parse lines prefixed with "data: ", ignore the final "data: [DONE]" marker, and concatenate each chunk's choices[0].delta.content.

Usage arrives in a separate chunk with an empty choices array and a populated usage object.

Node.js (fetch)

const response = await fetch('https://models.lucena.one/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-lucena-your_key_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'lucena-coder-latest',
    stream: true,
    messages: [{ role: 'user', content: 'Write a binary search in Python.' }]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode incrementally; keep any trailing partial line in the buffer.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop() || '';

  for (const line of lines) {
    // Skip keepalive comments (e.g. ": connected") and the final [DONE] marker.
    if (!line.startsWith('data: ') || line === 'data: [DONE]') continue;
    const chunk = JSON.parse(line.slice(6));
    const content = chunk.choices[0]?.delta?.content || '';
    process.stdout.write(content);
  }
}

Code Examples

cURL

curl -N https://models.lucena.one/v1/chat/completions \
  -H "Authorization: Bearer sk-lucena-your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lucena-coder-latest",
    "messages": [
      { "role": "user", "content": "Refactor this function to use async/await." }
    ]
  }'

Node.js (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-lucena-your_key_here',
  baseURL: 'https://models.lucena.one/v1'
});

const stream = await client.chat.completions.create({
  model: 'lucena-coder-latest',
  stream: true,
  messages: [{ role: 'user', content: 'Write a REST API in Express.' }]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="sk-lucena-your_key_here",
    base_url="https://models.lucena.one/v1"
)

stream = client.chat.completions.create(
    model="lucena-coder-latest",
    stream=True,
    messages=[{"role": "user", "content": "Write a binary search in Python."}]
)

for chunk in stream:
    # Usage-only chunks arrive with an empty choices array; skip them.
    if not chunk.choices:
        continue
    text = chunk.choices[0].delta.content or ""
    print(text, end="")

Error Handling

Lucena uses standard HTTP status codes. Authentication errors are returned by the gateway, while backend route errors come from the model server.

Missing API key

{
  "error": "Missing API Key"
}

Backend route error

{
  "error": {
    "message": "Not found.",
    "type": "not_found",
    "code": 404
  }
}

Status   Type                    Description
400      invalid_request_error   Malformed request body or missing required fields.
401      gateway_auth_error      Missing or invalid bearer key.
404      not_found               Endpoint is not exposed on the current hosted deployment.
500      server_error            Inference failed or internal error.
503      server_error            Model worker is loading or temporarily unavailable.
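
When debugging, printing the HTTP status next to the body helps tell gateway errors from backend errors (a sketch using standard curl flags):

curl -s -w "\nHTTP status: %{http_code}\n" https://models.lucena.one/v1/models \
  -H "Authorization: Bearer sk-lucena-your_key_here"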