LLM as a Service (LLMaaS)

API Access

The API is accessible via the Cloud Temple Console. You can manage your API keys, monitor your usage, and configure your tiers in your account settings. The console also allows you to view the usage of your models.

Authentication

All requests to the LLMaaS API must include an Authorization header with your API key in Bearer token format. If you use the client SDKs, the key will be automatically included in each request. If you integrate directly with the API, you must send this header yourself.

Content Types

The LLMaaS API always accepts JSON in the request body and returns JSON in the response body. You must send the content-type: application/json header in your requests. If you are using client SDKs, this will be handled automatically.

Response Headers

The LLMaaS API includes the following headers in each response:

id : A globally unique identifier for the request
backend : Information about the infrastructure used (engine_type, machine_name)

Examples

cURL Request

curl -X POST "https://api.ai.cloud-temple.com/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "messages": [
      {
        "role": "user", 
        "content": "Salut ! Peux-tu te présenter en français ?"
      }
    ],
    "max_tokens": 200,
    "temperature": 0.7
  }'

Response

{
  "backend": {
    "engine_type": "engo",
    "machine_name": "ma02"
  },
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Bonjour ! Je suis un modèle de langage virtuel...",
        "role": "assistant"
      }
    }
  ],
  "created": 1749110753,
  "id": "chatcmpl-ollama-14b812ef-b21f-430c-b93c-d0d1bf653806",
  "model": "gpt-oss:120b",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 200,
    "prompt_tokens": 70,
    "reasoning_tokens": 0,
    "total_tokens": 270
  }
}

Available Parameters

Parameter	Type	Description
`model`	string	The model to use (see model catalog)
`messages`	array	List of conversation messages
`max_tokens`	integer	Maximum number of tokens to generate
`temperature`	float	Controls creativity (0.0-2.0)
`top_p`	float	Controls response diversity
`stream`	boolean	Enables response streaming
`user`	string	Unique identifier for the end user

Base URL

The base URL for all API requests is:

https://api.ai.cloud-temple.com/v1/

Available Endpoints

/chat/completions : Conversational response generation
/completions : Simple text completion
/embeddings : Vectorization for semantic search and RAG
/rerank and /v2/rerank : Result reranking (Cohere SDK compatible)
/audio/transcriptions : Batch audio transcription (Whisper)
/audio/speech : Voice synthesis (TTS)
/images/generations : Image generation
/models : List of available models

Example: List of models

curl -X GET "https://api.ai.cloud-temple.com/v1/models" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response:

{
  "object": "list",
  "data": [
    {
      "id": "gpt-oss:120b",
      "object": "model",
      "created": 1749110897,
      "owned_by": "CloudTemple",
      "root": "gpt-oss:120b",
      "aliases": ["gpt-oss:120b"],
      "parent": null,
      "max_model_len": 60000,
      "permission": [
        {
          "id": "modelperm-granite3.3:8b-1749110897",
          "object": "model_permission",
          "created": 1749110897,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}

The response contains all available models along with their specifications and permissions.

API Access​

Authentication​

Content Types​

Response Headers​

Examples​

cURL Request​

Response​

Available Parameters​

Base URL​

Available Endpoints​

Example: List of models​