LLM as a Service Modellkatalog
Übersicht
Cloud Temple LLMaaS bietet 46 sorgfältig ausgewählte und optimierte große Sprachmodelle, die den anspruchsvollsten Anforderungen von SecNumCloud entsprechen. Unser Katalog umfasst die gesamte Bandbreite – von ultraeffizienten Mikromodellen bis hin zu extrem umfangreichen Modellen.
Globale Statistiken
| Metrik | Wert |
|---|---|
| Gesamte Anzahl an Modellen | 46 Modelle |
| Minimale Kontextlänge | 2.048 Tokens |
| Maximale Kontextlänge | 262.144 Tokens |
| Konformität | SecNumCloud ✅ HDS ✅ Souveränität ✅ C5 ✅ |
| Standort | 100 % Frankreich 🇫🇷 |
Pricing
| Usage Type | Price |
|---|---|
| Input Tokens | 1.9€ / million tokens |
| Output Tokens | 8€ / million tokens |
| Advanced Reasoning | 8€ / million tokens |
Large Language Models
cogito:32b
Deep Cogito • 32B parameters • Context: 32,000 tokens
Advanced version of the Cogito model offering significantly enhanced reasoning and analytical capabilities, designed for the most demanding AI analytical applications.
Technical specifications:
- Speed : 20 tokens/second
- Consumption : 6.67 kWh/million tokens
- License : LLAMA 3.2 Community License
- Location : FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Reasoning Comprehension Analysis
Use cases:
- Multi-factorial scenario analysis with probabilistic evaluation of outcomes
- Scientific problem solving with formal demonstration of steps
- High-criticality applications requiring precision and verifiability of results
- Expert systems in specialized domains (legal, medical, technical)
- Multi-step reasoning analysis with full explainability of conclusions
gemma3:27b
Google • 27B Parameters • Context: 120,000 tokens
Revolutionary model from Google offering an optimal balance between power and efficiency, with an exceptional performance-to-cost ratio for demanding professional applications.
Technical Specifications:
- Speed: 21 tokens/second
- Energy Consumption: 6.35 kWh per million tokens
- License: Google Gemma Terms of Use
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Vision Agent Large Context
Use Cases:
- Document analysis with extended context up to 120K tokens (approximately 400 pages)
- Semantic indexing and search in large document databases
- Simultaneous processing of images and text thanks to multimodal capabilities
- Structured data extraction from PDFs and scanned documents
- Integration with external tools via function calling API
glm-4.7-flash:30b
Zhipu AI • 30B parameters • Context: 120,000 tokens
Flash version of the GLM-4.7 model, optimized for speed and efficiency.
Technical specifications:
- Speed: 103 tokens/second
- Energy consumption: 1.41 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Fast Large Context Multilingual
Use cases:
- Fast conversational assistants
- Long document analysis (up to 200k)
- Reasoning tasks with low latency
glm-4.7:358b
⚠️ WARNING: This model is deprecated. Model removed from catalog on 03/30/2026.
Zhipu AI • 358B parameters • Context: 120,000 tokens
High-performance general-purpose model developed by Zhipu AI, excelling in logical reasoning, multilingual understanding, and complex tasks.
Technical Specifications:
- Speed : 18 tokens/second
- Consumption : 7.41 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Reasoning Large Context Multilingual
Use Cases:
- Complex reasoning tasks
- Long document analysis
- Advanced conversational assistants
gpt-oss:120b
OpenAI • 120B parameters • Context: 120,000 tokens
State-of-the-art open-weight language model from OpenAI, delivering strong performance with a flexible Apache 2.0 license.
Technical specifications:
- Speed: 104 tokens/second
- Energy consumption: 2.19 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: MoE Agent Reasoning Open-Source Very Large
Use cases:
- Advanced conversational agents with complex reasoning and tool integration.
- Applications requiring full transparency in the reasoning process (chain-of-thought).
- Commercial scenarios needing a permissive license (Apache 2.0).
- Fine-tuning for specialized tasks requiring a powerful base model.
llama3.3:70b
Meta • 70B parameters • Context: 132,000 tokens
State-of-the-art multilingual model developed by Meta, designed to excel in natural dialogue, complex reasoning, and nuanced instruction understanding.
Technical specifications:
- Speed: 29 tokens/second
- Energy consumption: 7.85 kWh per million tokens
- License: LLAMA 3.3 Community License
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Agent Dialogue Multilingual
Use cases:
- Multilingual chatbots supporting 8 languages simultaneously
- Execution of complex, chained instructions (prompt chaining)
- Processing of 60K-token dialogue windows for conversational history
- Analysis of large legal or technical documents (>100 pages)
- Generation of structured text with strict adherence to stylistic guidelines
ministral-3:14b
Mistral AI • 14B parameters • Context: 250,000 tokens
The most powerful model in the Ministral family, designed for complex tasks on local infrastructure.
Technical specifications:
- Speed: 31 tokens/second
- Energy consumption: 4.30 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: High Performance Edge Reasoning Code
Use cases:
- Solving complex problems locally
- Coding and engineering assistants
- In-depth document analysis with reasoning
nemotron-3-nano:30b
NVIDIA • 30B parameters • Context: 250,000 tokens
NVIDIA-optimized model for complex reasoning and tool utilization, deployed with an extended context.
Technical Specifications:
- Speed: 89 tokens/second
- Energy Consumption: 1.62 kWh per million tokens
- License: NVIDIA Community License
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Reasoning Large Context
Use Cases:
- Complex autonomous agents with multiple tool calls
- Logical reasoning and problem solving
- Long document analysis with precise extraction
olmo-3:32b
AllenAI • 32B Parameters • Context: 65,536 tokens
The first fully open reasoning model at this scale, competing with the best proprietary models.
Technical Specifications:
- Speed: 19 tokens/second
- Energy Consumption: 7.02 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Open-Source Large Context Reasoning Transparent Code High Performance
Use Cases:
- Complex reasoning and multi-step problem solving
- Advanced software development and code generation
- In-depth analysis requiring transparency in decision-making processes
olmo-3:7b
AllenAI • 7B parameters • Context: 65,536 tokens
Reference "Fully Open" model, offering complete transparency (data, code, weights) and remarkable efficiency.
Technical specifications:
- Speed : 37 tokens/second
- Energy consumption : 1.65 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Open-Source Large Context Transparent Efficient Maths Code
Use cases:
- Academic and scientific research requiring full reproducibility
- Programming tasks and mathematical problem solving
- Analysis of medium-sized documents with full traceability
qwen-coder-next:80b
Qwen Team • 80B Parameters • Context: 250,000 tokens
State-of-the-art MoE model optimized for code and complex reasoning.
Technical Specifications:
- Speed: 98 tokens/second
- Energy Consumption: 1.47 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Programming MoE Large Context AWQ
Use Cases:
- Advanced code assistant (repo-scale)
- Complex code analysis and refactoring
- Autonomous software engineering agents
qwen3-2507:235b
Qwen Team • 235B Parameters • Context: 130,000 tokens
Massive MoE model with 235 billion parameters, activating only 22 billion at a time, delivering state-of-the-art performance.
Technical Specifications:
- Speed: 58 tokens/second
- Energy Consumption: 3.93 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: MoE Agent Reasoning Very Large
Use Cases:
- Solving complex mathematical and logical problems
- Tasks requiring extensive knowledge base
- Advanced coding assistant
- In-depth document analysis
qwen3-2507:30b-a3b
Qwen Team • 30B Parameters • Context: 250,000 tokens
Enhanced version of the non-thinking mode of Qwen3-30B, featuring improved general capabilities, broader knowledge coverage, and better user alignment.
Technical Specifications:
- Speed: 104 tokens/second
- Energy Consumption: 1.39 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Agent Large Context MoE Multilingual
Use Cases:
- Complex tasks requiring precise instruction following and logical reasoning.
- Multilingual applications with extensive knowledge coverage.
- High-quality text generation for open-ended and subjective tasks.
- Analysis of very large documents thanks to the 250k-token context.
qwen3-coder:30b
⚠️ WARNING: This model is deprecated. Recommendation to migrate to qwen-coder-next:80b.
Qwen Team • 30B parameters • Context: 250,000 tokens
MoE-optimized model for software engineering tasks, featuring an extremely long context.
Technical Specifications:
- Speed: 104 tokens/second
- Energy Consumption: 1.39 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Programming Large Context MoE
Use Cases:
- Software engineering agents for exploring and modifying codebases
- Generation of complex code with repository-scale understanding
- Reasoning tasks over extended contexts
- Code improvement via reinforcement learning
qwen3-next:80b
Qwen Team • 80B Parameters • Context: 250,000 tokens
Next 80B model from Qwen, optimized for large contexts and reasoning.
Technical Specifications:
- Speed : 98 tokens/second
- Consumption : 1.47 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Agent Reasoning Large Context MoE
Use Cases:
- Advanced conversational agents with tool integration
- Analysis of very large documents (up to 260k tokens)
- Code generation and complex tasks requiring structured reasoning
qwen3-omni:30b
Qwen Team • 30B Parameters • Context: 32,768 tokens
Qwen3-Omni 30B is a native multimodal model capable of understanding text, images, video, and audio within a single stream.
Technical Specifications:
- Speed: 86 tokens/second
- Energy Consumption: 2.65 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ✅ Reasoning • ❌ Security
Tags: Omni Audio Vision Agent Multimodal BF16
Use Cases:
- Seamless multimodal interactions (speaks, sees, listens)
- Combined video and audio analysis
- Next-generation intelligent assistants
qwen3-vl:235b
Qwen Team • 235B Parameters • Context: 200,000 tokens
The most powerful multimodal model in the catalog, combining state-of-the-art visual understanding with exceptional reasoning capabilities.
Technical Specifications:
- Speed: 31 tokens/second
- Energy Consumption: 7.35 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Reasoning Large Context NVFP4 Blackwell Vision
Use Cases:
- Automation of complex document workflows (multilingual OCR, structured extraction)
- Intelligent visual agents for software interaction and GUI automation
- Advanced scientific and technical analysis (STEM, 3D spatial reasoning)
- Multimodal RAG on large documents (>200k tokens) and videos
qwen3-vl:30b
Qwen Team • 30B Parameters • Context: 250,000 tokens
State-of-the-art multimodal model (Qwen3-VL) offering exceptional visual understanding and precise temporal reasoning.
Technical Specifications:
- Speed: 43 tokens/second
- Energy Consumption: 3.10 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Vision Agent Large Context Multimodal Video OCR
Use Cases:
- Deep analysis of long videos and intelligent surveillance
- Extraction of complex structured data (documents, tables, charts)
- Advanced visual assistants with spatial understanding
- Multimodal reasoning over sequences of events
qwen3-vl:32b
Qwen Team • 32B parameters • Context: 250,000 tokens
High-performance variant of Qwen3-VL, optimized for the most demanding vision tasks.
Technical Specifications:
- Speed: 17 tokens/second
- Energy Consumption: 7.84 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Vision Agent Large Context Multimodal Video OCR
Use Cases:
- Scientific and technical analysis of high-resolution images
- Automation of complex visual processes
- Detailed understanding of dynamic scenes
qwen3:14b
Qwen Team • 14B parameters • Context: 131,072 tokens
Balanced Qwen3 14B model, delivering strong general performance with good inference speed.
Technical Specifications:
- Speed: 68.2 tokens/second
- Energy Consumption: 0.90 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Agent Versatile Multilingual
Use Cases:
- High-performance virtual assistants
- High-quality content generation
- Classification and extraction tasks
Specialized Models
bge-m3:567m
BAAI • 567M parameters • Context: 8,192 tokens
State-of-the-art multilingual embedding model (BGE-M3), delivering exceptional semantic search capabilities across more than 100 languages.
Technical Specifications:
- Speed: 171 tokens/second
- Energy Consumption: 0.36 kWh per million tokens
- License: MIT
- Location: FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Embedding Multilingual Efficient
Use Cases:
- Multilingual semantic search
- Retrieval-Augmented Generation (RAG)
- Document clustering and classification
deepseek-ocr
DeepSeek AI • 3B parameters • Context: 8,192 tokens
Specialized OCR model from DeepSeek, designed for high-precision text extraction with formatting preservation.
Technical specifications:
- Speed: 79 tokens/second
- Consumption: 1.01 kWh per million tokens
- License: MIT License
- Location: FR 🇫🇷
Capabilities: ❌ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Vision OCR Efficient
Use cases:
- Extraction of structured text (Markdown/LaTeX) from images/PDFs
- Document digitization with complex tables and formulas
devstral-small-2:24b
Mistral AI & All Hands AI • 24B parameters • Context: 200,000 tokens
Second iteration of Devstral (Small 2), state-of-the-art agent model for software engineering, deployed on high-performance GPU server.
Technical specifications:
- Speed : 38 tokens/second
- Consumption : 3.80 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ✅ Security
Tags: Agent Programming Vision Open-Source Large Context FP8 Fast
Use cases:
- Autonomous coding agents requiring low latency
- Rapid code refactoring
- Iterative engineering tasks
devstral:24b
⚠️ WARNING: This model is deprecated. Recommendation to migrate to devstral-small-2:24b.
Mistral AI & All Hands AI • 24B parameters • Context: 120,000 tokens
Devstral 24b is an agent-based LLM specialized in software engineering, co-developed by Mistral AI and All Hands AI.
Technical specifications:
- Speed: 44 tokens/second
- Energy consumption: 3.28 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ✅ Security
Tags: Agent Programming Open-Source Large Context FP8
Use cases:
- Exploration and modification of codebases
- Autonomous software engineering agents
- Complex code refactoring and generation
embeddinggemma:300m
Google • 300M parameters • Context: 2,048 tokens
State-of-the-art embedding model from Google, optimized for its size, ideal for search and semantic retrieval tasks.
Technical specifications:
- Speed : 175 tokens/second
- Energy consumption : 0.35 kWh per million tokens
- License : Google Gemma Terms of Use
- Location : FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Embedding Compact Semantic Efficient Multilingual
Use cases:
- Information retrieval and search
- Document classification and clustering
- Semantic similarity search
- Deployment on resource-constrained devices (mobile, laptop)
gemma3:1b
Google • 1B parameters • Context: 120,000 tokens
Ultra-fast and efficient micro-model Gemma 3.
Technical specifications:
- Speed : 53 tokens/second
- Energy consumption : 1.15 kWh per million tokens
- License : Google Gemma Terms of Use
- Location : FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Compact Efficient Edge
Use cases:
- Fast text classification
- Simple chatbots
- Rapid prototyping
gemma3:4b
Google • 4B Parameter • Kontext: 120.000 Tokens
Kompakter Gemma 3-Modell mit 4B Parametern, bietet ein hervorragendes Leistungs-/Größen-Verhältnis.
Technische Spezifikationen:
- Geschwindigkeit : 48,0 Tokens pro Sekunde
- Energieverbrauch : 1,27 kWh pro Million Tokens
- Lizenz : Google Gemma Nutzungsbedingungen
- Standort : FR 🇫🇷
Funktionen: ❌ Werkzeuge/Agent • ❌ Vision • ❌ Schlussfolgerung • ❌ Sicherheit
Tags: Kompakt Effizient Edge
Anwendungsfälle:
- Persönliche Assistenten auf Laptop
- Textzusammenfassung
- Leichte Übersetzungen
gpt-oss:20b
OpenAI • 20B Parameters • Context: 120,000 tokens
Open-weight language model from OpenAI, optimized for efficiency and deployment on consumer-grade hardware.
Technical Specifications:
- Speed: 9 tokens/second
- Energy Consumption: 14.81 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: MoE Agent Reasoning Open-Source Compact Fast
Use Cases:
- Deployments on resource-constrained devices (edge devices) or low-cost servers.
- Applications requiring fast inference with strong reasoning capabilities.
- Agent-based use cases involving function calls, web navigation, and code execution.
- Fine-tuning for specialized tasks on consumer-grade hardware.
granite-embedding:278m
IBM • 278M parameters • Context: 8,192 tokens
Ultra-compact IBM Granite embedding model, designed for maximum efficiency.
Technical specifications:
- Speed : 196.3 tokens/second
- Energy consumption : 0.31 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Embedding Compact Efficient
Use cases:
- Semantic search
- Document clustering
granite4-small-h:32b
IBM • 32B (9B active) Parameters • Context: 128,000 tokens
IBM's MoE (Mixture-of-Experts) model, designed as a "workhorse" for daily enterprise tasks, featuring excellent efficiency for long contexts.
Technical Specifications:
- Speed: 49 tokens/second
- Energy Consumption: 2.95 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ✅ Security
Tags: Agent Reasoning Security MoE Large Context Efficient
Use Cases:
- Conversational agents for customer support with access to extensive knowledge bases.
- Enterprise workflow automation requiring the use of multiple tools.
- Analysis of long documents with optimized resource consumption.
- Deployment on medium-sized infrastructures thanks to its efficiency.
granite4-tiny-h:7b
IBM • 7B (1B active) parameters • Context: 128,000 tokens
Ultra-efficient hybrid MoE model from IBM, designed for low latency, edge and local applications, and as a foundational component for agent workflows.
Technical Specifications:
- Speed: 58 tokens/second
- Energy Consumption: 2.30 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ✅ Security
Tags: Agent Reasoning Security MoE Large Context Efficient Fast Compact
Use Cases:
- Embedded and edge applications requiring low latency.
- Fast tasks within larger agent workflows (e.g., function calling).
- Document analysis on consumer-grade hardware.
- Deployments requiring minimal memory footprint.
medgemma:27b
Google • 27B Parameters • Context: 128,000 tokens
MedGemma is one of Google's most advanced open models for understanding medical text and images, based on Gemma 3.
Technical Specifications:
- Speed: 22 tokens/second
- Energy Consumption: 6.56 kWh per million tokens
- License: Google Gemma Terms of Use
- Location: FR 🇫🇷
Capabilities:
✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Medical Vision Specialized Large Context
Use Cases:
- Medical image interpretation (Report generation and VQA)
- Medical text understanding and clinical reasoning (Decision support)
- Patient interaction (Interviews and medical triage)
- Medical record synthesis and literature search
ministral-3:3b
Mistral AI • 3B parameters • Context: 250,000 tokens
High-performance compact model from Mistral AI, designed for efficiency in local and edge deployments.
Technical Specifications:
- Speed : 50 tokens/second
- Energy Consumption : 1.22 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Compact Efficient Edge
Use Cases:
- Local inference on mobile devices or edge devices
- Responsive personal assistants
- Fast routing and classification tasks
ministral-3:8b
Mistral AI • 8B parameters • Context: 250,000 tokens
Intermediate-sized model from the Ministral family, offering an optimal balance between performance and resource usage.
Technical specifications:
- Speed: 55 tokens/second
- Energy consumption: 2.42 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Efficient Edge Reasoning
Use cases:
- Advanced local conversational assistants
- Document analysis and information extraction
- Tasks requiring a good trade-off between speed and quality
mistral-small3.2:24b
Mistral AI • 24B parameters • Context: 128,000 tokens
Minor update to Mistral Small 3.1, improving instruction following, function calling robustness, and reducing repetition errors.
Technical specifications:
- Speed : 27 tokens/second
- Energy consumption : 5.35 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ✅ Security
Tags: Vision Agent Security Instruction Following
Use cases:
- Conversational agents with enhanced instruction following
- Robust integration with external tools via function calling
- Applications requiring high reliability to avoid repetitions
- Use cases identical to Mistral Small 3.1, with improved performance
qwen3-2507-think:4b
Qwen Team • 4B parameters • Context: 250,000 tokens
Qwen3-4B model optimized for reasoning, with improved performance on logical tasks, mathematics, science, and code, featuring an extended context of 250K tokens.
Technical Specifications:
- Speed: 52 tokens/second
- Energy Consumption: 2.56 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Agent Reasoning Large Context Compact Fast
Use Cases:
- Highly complex reasoning tasks (logic, math, science, code).
- Conversational agents with extremely long conversation histories (256k tokens).
- Deep reasoning analysis of very large documents.
- Integration with external tools via function calling on extremely large contexts.
qwen3-2507:4b
⚠️ WARNING: This model is deprecated. Deprecated.
Qwen Team • 4B parameters • Context: 250,000 tokens
Updated version of the non-thinking mode of Qwen3-4B, featuring significant improvements in general capabilities, expanded knowledge coverage, and better alignment with user preferences.
Technical Specifications:
- Speed: 30 tokens/second
- Energy Consumption: 4.44 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Agent Large Context Compact Fast Multilingual
Use Cases:
- General tasks requiring precise instruction following and logical reasoning.
- Multilingual applications with broad knowledge coverage.
- High-quality text generation for open-ended and subjective tasks.
- Analysis of very large documents thanks to a 256k-token context.
qwen3-embedding:0.6b
Qwen Team • 0.6B parameters • Context: 32,768 tokens
Ultra-lightweight Qwen3 embedding model, optimized for speed and efficiency on resource-constrained infrastructure.
Technical Specifications:
- Speed: N/A
- Energy Consumption: 0.57 kWh per million tokens
- License: Apache 2.0
- Localization: FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Embedding Compact Efficient
Use Cases:
- Fast semantic search
- Real-time text classification
qwen3-embedding:4b
Qwen Team • 4B parameters • Context: 40,000 tokens
Ultra-performant embedding model Qwen3-4B, offering deep semantic understanding and an extended context window.
Technical Specifications:
- Speed : N/A
- Energy Consumption : 0.57 kWh per million tokens
- License : Apache 2.0
- Location : FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Embedding Large Context Efficient
Use Cases:
- Semantic search on long documents
- RAG with extended context windows
- High-precision semantic analysis
qwen3-vl:2b
Qwen Team • 2B Parameters • Context: 250,000 tokens
Ultra-compact multimodal model Qwen3-VL, bringing advanced vision capabilities to edge devices.
Technical Specifications:
- Speed: 64 tokens/second
- Power Consumption: 0.95 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Vision Compact Efficient Multimodal Edge OCR
Use Cases:
- Real-time image analysis on mobile devices
- Lightweight OCR and document reading
- Fast visual sorting and classification
qwen3-vl:4b
Qwen Team • 4B Parameter • Kontext: 250.000 Tokens
Ausgewogener multimodaler Qwen3-VL-Modell mit solider Bildverarbeitungsleistung bei geringem Ressourcenverbrauch.
Technische Spezifikationen:
- Geschwindigkeit : 57 Tokens/Sekunde
- Energieverbrauch : 2,34 kWh pro Million Tokens
- Lizenz : Apache 2.0
- Standort : FR 🇫🇷
Funktionen: ✅ Werkzeuge/Agent • ✅ Bildverarbeitung • ❌ Schlussfolgerung • ❌ Sicherheit
Tags: Bildverarbeitung Kompakt Multimodal Effizient Video OCR
Anwendungsfälle:
- Automatisierte Dokumentenanalyse (Rechnungen, Formulare)
- Verständnis von Videoinhalten
- Interaktive visuelle Assistenten
qwen3-vl:8b
Qwen Team • 8B Parameters • Context: 250,000 tokens
Multimodal model Qwen3-VL (8B), delivering advanced vision capabilities with a reasonable footprint.
Technical Specifications:
- Speed: 44 tokens/second
- Energy Consumption: 3.03 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ✅ Vision • ❌ Reasoning • ❌ Security
Tags: Vision Compact Multimodal Efficient Video OCR
Use Cases:
- Automated document analysis
- Video content understanding
- Interactive visual assistants
qwen3:0.6b
Qwen Team • 0.6B parameters • Context: 40,000 tokens
Ultra-light Qwen3 model with 0.6 billion parameters, delivering exceptional inference speed for simple and fast tasks.
Technical Specifications:
- Speed: 46 tokens/second
- Energy Consumption: 1.33 kWh per million tokens
- License: Apache 2.0
- Location: FR 🇫🇷
Capabilities: ✅ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Compact Fast Efficient Multilingual
Use Cases:
- Simple text processing tasks
- Fast classification and sorting
- Lightweight assistants with low latency
rnj-1:8b
Essential AI • 8B parameters • Context: 32,000 tokens
8B "Open Weight" model specialized in code, mathematics, and sciences (STEM).
Technical specifications:
- Speed : 31 tokens/second
- Consumption : 1.97 kWh per million tokens
- License : Open Weights
- Location : FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ✅ Reasoning • ❌ Security
Tags: Code Maths STEM Reasoning Efficient
Use cases:
- Advanced programming assistant and code generation
- Solving complex mathematical problems
- Scientific and technical tasks (STEM)
translategemma:12b
Google • 12B parameters • Context: 128,000 tokens
State-of-the-art open translation model based on Gemma 3, supporting 55 languages.
Technical specifications:
- Speed : 30 tokens/second
- Energy consumption : 4.44 kWh per million tokens
- License : Gemma Terms of Use
- Location : FR 🇫🇷
Capabilities:
❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Translation Multilingual Specialized
Use cases:
- Translation of long documents
- Cross-language communication
- Content localization
translategemma:27b
Google • 27B Parameters • Context: 120,000 tokens
High-performance translation model based on Gemma 3 27B.
Technical specifications:
- Speed: 44 tokens/second
- Energy consumption: 6.35 kWh per million tokens
- License: Gemma Terms of Use
- Localization: FR 🇫🇷
Capabilities:
❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Translation Multilingual Specialized High Performance
Use cases:
- High-precision translation
- Technical document translation
- Literary and cultural nuances
translategemma:4b
Google • 4B Parameters • Context: 128,000 tokens
Compact version of the TranslateGemma translation model, optimized for speed.
Technical Specifications:
- Speed: 38 tokens/second
- Energy Consumption: 1.27 kWh per million tokens
- License: Gemma Terms of Use
- Localization: FR 🇫🇷
Capabilities: ❌ Tools/Agent • ❌ Vision • ❌ Reasoning • ❌ Security
Tags: Translation Multilingual Specialized Efficient
Use Cases:
- Fast text translation
- Translation on devices with limited resources
- Real-time localization
Recommended Use Cases
Multilingual Dialogue
Chatbots and assistants capable of communicating in multiple languages with automatic language detection, context preservation throughout the entire conversation, and understanding of linguistic nuances.
Recommended Models:
- Llama 3.3
- Mistral Small 3.2
- Qwen 3
- Openai OSS
- Granite 4
Long Document Analysis
Processing large documents (>100 pages) while preserving context across the entire text, extracting key information, generating relevant summaries, and answering specific questions about the content.
Recommended Models:
- Gemma 3
- Qwen next
- Qwen 3
- Granite 4
Programming and Development
Generation and optimization of code in multiple languages, debugging, refactoring, development of complete features, understanding of complex algorithmic implementations, and creation of unit tests
Recommended models:
- DeepCoder
- Qwen3 coder
- Granite 4
- Devstral
Visual Analysis
Direct processing of images and visual documents without prior OCR preprocessing, interpretation of technical diagrams, charts, tables, drawings, and photos with detailed textual explanations of the visual content.
Recommended Models:
- deepseek-OCR
- Mistral Small 3.2
- Gemma 3
- Qwen 3 VL
Security and Compliance
Applications requiring specific security capabilities; sensitive content filtering, reasoning traceability, GDPR/HDS compliance verification, risk minimization, vulnerability analysis, and adherence to industry-specific regulations
Recommended models:
- Granite Guardian
- Granite 4
- Devstral
- Mistral Small 3.2
- Magistral small
Lightweight and Embedded Deployments
Applications requiring minimal resource footprint, deployment on devices with limited capacity, real-time inference on standard CPUs, and integration into embedded systems or IoT devices
Recommended models:
- Gemma 3n
- Granite 4 tiny
- Qwen 3 VL (2B)