Using Alan's OpenAI-compatible API Interface

This article describes the steps needed to use Alan via the OpenAI-compatible API. This allows Alan to be used directly through the official OpenAI client or indirectly through other OpenAI-compatible software.

Provided Endpoints

To enable usage with OpenAI-compatible software, Alan offers various endpoints:

GET /oai/models Lists all models available via the OpenAI-compatible API and reports the status of deployed models.

GET /oai/models/{model} Retrieves detailed information about a specific model.

POST /oai/chat/completions Generates text responses based on a chat history.

POST /oai/embeddings Generates embeddings from text.

The provided endpoints match the official OpenAI endpoints in expected input and generated output.
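
Since the endpoints follow the OpenAI conventions, they can also be called directly over HTTP. The following minimal sketch uses the requests library and assumes the API key is sent as a standard Bearer token, as the official OpenAI client does:

python
import requests

api_key = "alan-..."  # your Alan API key
base_url = "https://app.alan.de/api/v1/oai"

# list all available models and their status
response = requests.get(
    f"{base_url}/models",
    headers={"Authorization": f"Bearer {api_key}"},
)
print(response.json())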

Using the Official OpenAI Client

The following code examples demonstrate how to use Alan with the official OpenAI client. Both the synchronous and asynchronous OpenAI clients can be used.

Preparation

Using Alan with an OpenAI client requires an Alan API key, which every user can create in the Alan web interface.

Synchronous Client

python
from openai import OpenAI

api_key = "alan-..."
base_url = "https://app.alan.de/api/v1/oai"

client = OpenAI(api_key=api_key, base_url=base_url)

Asynchronous Client

python
from openai import AsyncOpenAI

api_key = "alan-..."
base_url = "https://app.alan.de/api/v1/oai"

async_client = AsyncOpenAI(api_key=api_key, base_url=base_url)
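
The asynchronous client is used in the same way as the synchronous one, except that its methods are awaited. A minimal sketch of a non-streaming chat completion with the client created above:

python
import asyncio

async def main():
    completion = await async_client.chat.completions.create(
        messages=[{"role": "user", "content": "How are you?"}],
        model="comma-soft/comma-llm-l",
        stream=False
    )
    print(completion)

asyncio.run(main())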

Retrieving Available Models and Specific Model Information

Lists all available models and their status. The response schema matches the OpenAI format.

python
model = "comma-soft/llama-comma-llm-s-v3"
print(client.models.list())
print(client.models.retrieve(model))

Generating Chat Completions

Generates text responses based on a chat history, matching the OpenAI standard. Besides messages, model, and stream, only the following parameters are considered: temperature, top_p, and max_tokens. Parameter ranges, types, and the response format match the OpenAI standard.

Streaming Response

python
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "How are you?"}],
    model="comma-soft/comma-llm-l",
    temperature=...,
    top_p=...,
    max_tokens=...,
    stream=True
)

for chunk in completion:
    print(chunk)
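
Each streamed chunk only carries an incremental delta; to reconstruct the full reply, the content of the deltas can be concatenated. A minimal sketch:

python
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "How are you?"}],
    model="comma-soft/comma-llm-l",
    stream=True
)

full_text = ""
for chunk in completion:
    # some chunks (e.g. the final one) may carry no content
    if chunk.choices and chunk.choices[0].delta.content:
        full_text += chunk.choices[0].delta.content
print(full_text)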

Non-Streaming Response

python
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Wie geht es dir?"}],
    model="comma-soft/comma-llm-l",
    stream=False
)

print(completion)

Embeddings

Calculates a numerical vector for a given input text. All parameters of the OpenAI embeddings endpoint can be used.

python
embeddings = client.embeddings.create(
    input="This is a text!",
    model="comma-soft/comma-embedding-20240625"
)
print(embeddings)
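
As with the OpenAI endpoint, a list of texts can be passed to embed several inputs in one call; the raw vectors are then available on the data entries of the response. A minimal sketch, assuming list inputs are accepted as in the OpenAI API:

python
embeddings = client.embeddings.create(
    input=["This is a text!", "This is another text!"],
    model="comma-soft/comma-embedding-20240625"
)

# one embedding per input, in the same order
for item in embeddings.data:
    print(len(item.embedding))  # dimensionality of the vector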

Extending with Alan-specific Features

Alan-specific features such as experts and knowledge databases can also be used through the OpenAI interface. For this, the extra_body parameter is used. The behavior and prioritization of knowledge databases match the generate_stream endpoint.

python
extra_body = {
    "alan": {
        "expert_id": "<expert-id>",
        "knowledgebase_ids": ["<knowledgebase_id>"]
    }
}

message = {
    "role": "user",
    "content": "Was kannst du mir über Max erzählen?"
}

completion = client.chat.completions.create(
    messages=[message], model="comma-soft/llama-comma-llm-l-v3", stream=False, extra_body=extra_body
)

print(completion)

Limitations

Currently, we only support the basic functionality described above. Support for advanced features such as Guided Generation and Beam Search is planned for an upcoming update.