Using Alan's OpenAI-compatible API Interface
This article describes the steps required to use Alan via its OpenAI-compatible API. This allows Alan to be used directly with the official OpenAI client or indirectly through other OpenAI-compatible software.
Provided Endpoints
To enable usage with OpenAI-compatible software, Alan offers various endpoints:
GET /oai/models
Lists all models that can be used via the OpenAI-compatible API; it can also be used to check the status of deployed models.
GET /oai/models/{model}
Retrieves detailed information about a specific model.
POST /oai/chat/completions
Generates text responses based on a chat history.
POST /oai/embeddings
Generates embeddings from text.
The provided endpoints match the official OpenAI endpoints in expected input and generated output.
Using the Official OpenAI Client
The following code examples demonstrate how to use Alan with the official OpenAI client. Both the synchronous and the asynchronous OpenAI client can be used.
Preparation
Using Alan with an OpenAI client requires an Alan API key; every user can create one for themselves.
Synchronous Client
from openai import OpenAI
api_key = "alan-..."
base_url = "https://app.alan.de/api/v1/oai"
client = OpenAI(api_key=api_key, base_url=base_url)
Asynchronous Client
from openai import AsyncOpenAI
api_key = "alan-..."
base_url = "https://app.alan.de/api/v1/oai"
async_client = AsyncOpenAI(api_key=api_key, base_url=base_url)
Retrieving Available Models and Specific Model Information
Lists all available models and their status. The response schema matches the OpenAI format.
model = "comma-soft/llama-comma-llm-s-v3"
print(client.models.list())
print(client.models.retrieve(model))
Generating Chat Completions
Generates text responses based on a chat history, matching the OpenAI standard. When using this endpoint, only the following parameters are considered in addition to messages, model, and stream: temperature, top_p, and max_tokens. The parameter ranges, types, and response format match the OpenAI standard.
Streaming Response
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Wie geht es dir?"}],
    model="comma-soft/comma-llm-l",
    temperature=0.7,  # example value
    top_p=0.9,  # example value
    max_tokens=256,  # example value
    stream=True
)
for chunk in completion:
    print(chunk)
Non-Streaming Response
completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Wie geht es dir?"}],
    model="comma-soft/comma-llm-l",
    stream=False
)
print(completion)
Embeddings
Calculates a numerical vector for a given input text. All parameters available in the official OpenAI embeddings endpoint can be used.
embeddings = client.embeddings.create(
    input="Dies ist ein Text!",
    model="comma-soft/comma-embedding-20240625"
)
print(embeddings)
Extending with Alan-specific Features
We also provide the ability to use Alan-specific features such as experts and knowledge databases through the OpenAI interface. For this, the extra_body parameter of the OpenAI client is used, which passes additional fields in the request body. The behavior and prioritization of knowledge databases match the generate_stream endpoint.
extra_body = {
    "alan": {
        "expert_id": "<expert-id>",
        "knowledgebase_ids": ["<knowledgebase_id>"]
    }
}
message = {
    "role": "user",
    "content": "Was kannst du mir über Max erzählen?"
}
completion = client.chat.completions.create(
    messages=[message],
    model="comma-soft/llama-comma-llm-l-v3",
    stream=False,
    extra_body=extra_body
)
print(completion)
Limitations
Currently, we only support the basic functionality described above. Support for advanced features such as Guided Generation or Beam Search is planned for an upcoming update.