LLM as a Service (LLMaaS)
API Access
The API is accessible via the Cloud Temple Console. You can manage your API keys, monitor your usage, and configure your tiers in your account settings. The console also allows you to view the usage of your models.
Authentication
All requests to the LLMaaS API must include an Authorization header with your API key in Bearer token format. If you use the client SDKs, the key will be automatically included in each request. If you integrate directly with the API, you must send this header yourself.
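For a direct integration, the header can be assembled by hand. The following is a minimal Python sketch; the helper name `build_auth_headers` and the environment variable `LLMAAS_API_KEY` are illustrative choices for this example, not names defined by the platform.

```python
import os


def build_auth_headers(api_key: str) -> dict:
    """Return the Authorization header in Bearer token format."""
    if not api_key:
        raise ValueError("an API key is required")
    return {"Authorization": f"Bearer {api_key}"}


# LLMAAS_API_KEY is a hypothetical variable name chosen for this example.
headers = build_auth_headers(os.environ.get("LLMAAS_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would serve the same purpose.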
Content Types
The LLMaaS API accepts JSON in the request body and returns JSON in the response body. You must send the Content-Type: application/json header with every request. If you use the client SDKs, this is handled automatically.
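As an illustration of a direct integration, the sketch below uses only the Python standard library to build a JSON request with both required headers. The helper name `build_json_request` is hypothetical, and the request is constructed but deliberately not sent.

```python
import json
import urllib.request


def build_json_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request with a JSON-encoded body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_json_request(
    "https://api.ai.cloud-temple.com/v1/chat/completions",
    "YOUR_API_KEY",
    {"model": "gpt-oss:120b", "messages": [{"role": "user", "content": "Hello"}]},
)
# Sending is left to the caller, for example with urllib.request.urlopen(request).
```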
Response Headers
The LLMaaS API includes the following headers in each response:
- id: A globally unique identifier for the request
- backend: Information about the infrastructure used (engine_type, machine_name)
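In the example response on this page, id and backend appear as fields of the JSON body. The sketch below reads them from a decoded body under that assumption; the function name `describe_request` is hypothetical, and the sample values are copied from the example response.

```python
def describe_request(response: dict) -> str:
    """Summarize which backend served a request, for logs or support tickets."""
    backend = response.get("backend", {})
    return "{} served by {} on {}".format(
        response.get("id", "unknown"),
        backend.get("engine_type", "unknown"),
        backend.get("machine_name", "unknown"),
    )


sample = {
    "id": "chatcmpl-ollama-14b812ef-b21f-430c-b93c-d0d1bf653806",
    "backend": {"engine_type": "engo", "machine_name": "ma02"},
}
print(describe_request(sample))
```

Logging the identifier alongside the backend details makes it easier to correlate a problematic response with the infrastructure that produced it.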
Examples
cURL Request
curl -X POST "https://api.ai.cloud-temple.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [
      {
        "role": "user",
        "content": "Salut ! Peux-tu te présenter en français ?"
      }
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'
Response
{
  "backend": {
    "engine_type": "engo",
    "machine_name": "ma02"
  },
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Bonjour ! Je suis un modèle de langage virtuel...",
        "role": "assistant"
      }
    }
  ],
  "created": 1749110753,
  "id": "chatcmpl-ollama-14b812ef-b21f-430c-b93c-d0d1bf653806",
  "model": "gpt-oss:120b",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 200,
    "prompt_tokens": 70,
    "reasoning_tokens": 0,
    "total_tokens": 270
  }
}
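Client code usually needs only the assistant text and the token counts from a response like the one above. The following Python sketch extracts them from an abbreviated copy of that response; the helper name `summarize_completion` is hypothetical.

```python
import json

# Abbreviated copy of the example response above.
raw_response = """
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Bonjour ! Je suis un modèle de langage virtuel...",
        "role": "assistant"
      }
    }
  ],
  "model": "gpt-oss:120b",
  "usage": {
    "completion_tokens": 200,
    "prompt_tokens": 70,
    "reasoning_tokens": 0,
    "total_tokens": 270
  }
}
"""


def summarize_completion(response: dict) -> dict:
    """Pull the assistant text, finish reason, and token counts out of a response."""
    first_choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": first_choice["message"]["content"],
        "finish_reason": first_choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }


summary = summarize_completion(json.loads(raw_response))
print(summary["finish_reason"], summary["total_tokens"])  # prints: stop 270
```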
Available Parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | The model to use (see model catalog) |
| messages | array | List of conversation messages |
| max_tokens | integer | Maximum number of tokens to generate |
| temperature | float | Controls creativity (0.0-2.0) |
| top_p | float | Controls response diversity |
| stream | boolean | Enables response streaming |
| user | string | Unique identifier for the end user |
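The parameters in this table can be assembled into a request body as sketched below. This is a client-side illustration only: the helper name `build_chat_payload` is hypothetical, treating model and messages as the only required fields is an assumption based on the cURL example, and the temperature check simply mirrors the 0.0-2.0 range given in the table.

```python
def build_chat_payload(model, messages, max_tokens=None, temperature=None,
                       top_p=None, stream=None, user=None) -> dict:
    """Assemble a chat request body, omitting optional parameters left unset."""
    if not messages:
        raise ValueError("messages must contain at least one message")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    payload = {"model": model, "messages": messages}
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
        "user": user,
    }
    # Only send parameters the caller actually set.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_chat_payload(
    "gpt-oss:120b",
    [{"role": "user", "content": "Hello"}],
    max_tokens=200,
    temperature=0.7,
)
```

Omitting unset parameters leaves the choice of defaults to the service rather than hard-coding them in the client.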
Base URL
The base URL for all API requests is:
https://api.ai.cloud-temple.com/v1/
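When building endpoint URLs in code, note that the base URL ends with /v1/ and endpoint paths are usually written with a leading slash. The Python sketch below (the helper name `endpoint_url` is hypothetical) joins the two by plain concatenation, because `urllib.parse.urljoin` treats a leading slash as an absolute path and would drop the /v1 prefix.

```python
from urllib.parse import urljoin

BASE_URL = "https://api.ai.cloud-temple.com/v1/"


def endpoint_url(path: str) -> str:
    """Join the base URL and an endpoint path while keeping the /v1 prefix."""
    return BASE_URL + path.lstrip("/")


models_url = endpoint_url("/models")
chat_url = endpoint_url("chat/completions")

# Pitfall: urljoin replaces the whole path when the second argument starts with "/".
wrong_url = urljoin(BASE_URL, "/models")

print(models_url)  # https://api.ai.cloud-temple.com/v1/models
print(wrong_url)   # https://api.ai.cloud-temple.com/models
```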
Available Endpoints
- /chat/completions: Conversational response generation
- /completions: Simple text completion
- /embeddings: Vectorization for semantic search and RAG
- /rerank and /v2/rerank: Result reranking (Cohere SDK compatible)
- /audio/transcriptions: Batch audio transcription (Whisper)
- /audio/speech: Voice synthesis (TTS)
- /images/generations: Image generation
- /models: List of available models
Example: List of models
curl -X GET "https://api.ai.cloud-temple.com/v1/models" \
  -H "Authorization: Bearer YOUR_API_KEY"
Response:
{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss:120b",
      "object": "model",
      "created": 1749110897,
      "owned_by": "CloudTemple",
      "root": "gpt-oss:120b",
      "aliases": ["gpt-oss:120b"],
      "parent": null,
      "max_model_len": 60000,
      "permission": [
        {
          "id": "modelperm-granite3.3:8b-1749110897",
          "object": "model_permission",
          "created": 1749110897,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
The response contains all available models along with their specifications and permissions.
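A client might use this response to check which models exist and how long their context is. The sketch below, in Python, maps each model id to its max_model_len using an abbreviated copy of the example response; the helper name `list_model_limits` is hypothetical.

```python
def list_model_limits(models_response: dict) -> dict:
    """Map each model id to its max_model_len (None when the field is absent)."""
    return {
        model["id"]: model.get("max_model_len")
        for model in models_response.get("data", [])
    }


# Abbreviated copy of the example response above.
models_response = {
    "object": "list",
    "data": [
        {
            "id": "gpt-oss:120b",
            "object": "model",
            "owned_by": "CloudTemple",
            "aliases": ["gpt-oss:120b"],
            "max_model_len": 60000,
        }
    ],
}

limits = list_model_limits(models_response)
print(limits)  # {'gpt-oss:120b': 60000}
```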