Concepts

Managed Kubernetes Overview

The Managed Kubernetes offering (also known as "Kub Managé" or "KM") is a Kubernetes container platform managed by Cloud-Temple, deployed as virtual machines running on Cloud-Temple's OpenIaaS infrastructure.

Managed Kubernetes is built on Talos Linux (https://www.talos.dev/), a lightweight and secure operating system dedicated to Kubernetes. It is immutable, with no shell or SSH access, and configured exclusively through a declarative API using gRPC.

The standardized installation includes a set of components, mostly open-source and certified by the CNCF:

  • Cilium CNI, with observability interface (Hubble): Cilium is a Kubernetes container networking solution (Container Network Interface). It handles security, load balancing, service mesh, observability, encryption, and more. It is a core networking component found in most Kubernetes distributions (OpenShift, AKS, GKE, EKS, etc.). We have included the Hubble graphical interface for visualizing Cilium traffic flows.

  • MetalLB and nginx: For exposing web applications, three nginx ingress-classes are included by default:

    • nginx-external-secured: exposed on a public IP, filtered through the firewall to allow only known IPs (used for product web interfaces and the Kubernetes API).
    • nginx-external: exposed on a second public IP without filtering (or with client-specific filtering).
    • nginx-internal: exposed only on an internal IP.

    For non-web services, a MetalLB load balancer enables internal or public IP exposure (allowing deployment of other ingresses, such as a WAF).
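An application selects one of these classes through the standard `ingressClassName` field. As a minimal sketch (the service name, namespace, and hostname below are hypothetical placeholders):

```yaml
# Hypothetical example: expose a web app on the internal-only ingress class.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-namespace
spec:
  ingressClassName: nginx-internal   # or nginx-external / nginx-external-secured
  rules:
    - host: my-app.example.internal
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```

Switching the exposure (internal vs. public, filtered vs. unfiltered) is then only a matter of changing `ingressClassName`.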

  • Distributed Rook-Ceph Storage: For persistent volumes (PVs), an open-source distributed storage solution, Ceph, is integrated into the platform. It supports the following storage classes: ceph-block, ceph-bucket, and ceph-filesystem. A high-performance storage backend with 7,500 IOPS is used, enabling excellent performance. In production deployments (across 3 AZs), storage nodes are dedicated (one per AZ); in non-production deployments (1 AZ), storage is shared with worker nodes.

  • Cert-Manager: The open-source Cert-Manager is natively integrated into the platform for automated certificate management.

  • ArgoCD is available for automated deployments via a CI/CD pipeline.

  • Prometheus Stack (Prometheus, Grafana, Loki): Managed Kubernetes clusters come standard with a complete open-source Prometheus observability stack, including:

    • Prometheus
    • Grafana, with numerous pre-built dashboards
    • Loki: Platform logs are exported to Cloud-Temple's S3 storage and integrated into Grafana.
  • Harbor is a container registry that allows you to store your container images or Helm charts directly within the cluster. It performs vulnerability scanning on your images and supports digital signing. Harbor also enables synchronization with other registries. (https://goharbor.io/)

  • OpenCost (https://github.com/opencost/opencost) is a FinOps tool for Kubernetes cost management. It enables fine-grained tracking of Kubernetes resource consumption and supports cost allocation by project/namespace.

  • Advanced security policies with Kyverno and Capsule:

    • Kyverno (https://kyverno.io/) is a Kubernetes admission controller that enforces policies. It is essential for governance and security in Kubernetes environments.
    • Capsule (https://projectcapsule.dev/) is a permission management tool that simplifies access control in Kubernetes. It introduces the concept of tenants, enabling centralized and delegated permissions across multiple namespaces. With Capsule, users of the Managed Kubernetes platform are granted access restricted only to their own namespaces.
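As an illustration of the kind of governance Kyverno provides, the sketch below validates that every Pod carries an `owner` label (the policy name and label key are hypothetical, not part of the standard installation):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-owner-label
spec:
  validationFailureAction: Audit   # switch to Enforce to reject non-compliant resources
  rules:
    - name: check-owner-label
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Pods must carry an 'owner' label."
        pattern:
          metadata:
            labels:
              owner: "?*"          # any non-empty value
```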
  • Veeam Kasten (also known as "K10") is a backup solution for Kubernetes workloads.

    It enables full deployment backups: manifests, volumes, etc., to Cloud-Temple’s S3 object storage. Kasten uses Kanister to ensure application-consistent backups, such as for databases (https://docs.kasten.io/latest/usage/blueprints/).

    Kasten is a cross-platform tool compatible with other Kubernetes clusters (OpenShift, hyperscalers, etc.). It can be used for disaster recovery, migration scenarios (K10 handles necessary adaptations via transformations, e.g., changing ingress-class), and environment refreshes (e.g., scheduled restoration of a production environment into pre-production).

  • SSO Authentication with external OIDC Identity Providers (Microsoft Entra, FranceConnect, Okta, AWS IAM, Google, Salesforce, ...)

SLA & Support Information

  • Guaranteed Availability (production 3 AZ): 99.90%
  • Support: N1/N2/N3 included for the core scope (infrastructure and standard operators).
  • Mean Time to Recovery (MTTR): As per the Cloud Temple framework agreement.
  • Maintenance (MCO): Regular patching of Talos / Kubernetes / standard operators by MSP, without service interruption (rolling upgrade).

Response and recovery times depend on the incident severity, according to the support matrix (P1 to P4).

Versioning Policy & Lifecycle

  • Supported Kubernetes: N-2 (3 major releases per year, approximately every 4 months). Each release is officially supported for 12 months, ensuring a maximum Cloud Temple support window of ~16 months per version.
  • Talos OS: aligned with stable Kubernetes releases.
    • Each branch is maintained for approximately 12 months (including security patches).
    • Recommended upgrade frequency: 3 times per year, consistent with Kubernetes upgrades.
    • Critical patches (CVE, kernel) are applied via rolling upgrade, without service interruption.
  • Standard Operators: updated within 90 days following stable release.
  • Updates:
    • Major (Kubernetes N+1, Talos X+1): scheduled 3 times per year, via rolling update.
    • Minor: applied automatically within 30 to 60 days.
  • Deprecation: version N-3 → end of support within 90 days after the release of N.

Kubernetes Nodes

Production (multi-zonal)

For a "production" (multi-zone) deployment, the following machines are used:

| AZ   | Machine            | vCores | RAM   | Local Storage                       |
|------|--------------------|--------|-------|-------------------------------------|
| AZ07 | Git Runner         | 4      | 8 GB  | OS: 64 GB                           |
| AZ05 | Control Plane 1    | 8      | 12 GB | OS: 64 GB                           |
| AZ06 | Control Plane 2    | 8      | 12 GB | OS: 64 GB                           |
| AZ07 | Control Plane 3    | 8      | 12 GB | OS: 64 GB                           |
| AZ05 | Storage Node 1     | 12     | 24 GB | OS: 64 GB + Ceph 500 GB minimum (*) |
| AZ06 | Storage Node 2     | 12     | 24 GB | OS: 64 GB + Ceph 500 GB minimum (*) |
| AZ07 | Storage Node 3     | 12     | 24 GB | OS: 64 GB + Ceph 500 GB minimum (*) |
| AZ05 | Worker Node 1 (**) | 12     | 24 GB | OS: 64 GB                           |
| AZ06 | Worker Node 2 (**) | 12     | 24 GB | OS: 64 GB                           |
| AZ07 | Worker Node 3 (**) | 12     | 24 GB | OS: 64 GB                           |

(*) Each storage node comes with a minimum of 500 GB of disk space, providing 500 GB of usable distributed Ceph storage (data is replicated across each AZ, hence ×3). The available free space for the client is approximately 350 GB. This initial size can be increased during provisioning or later, depending on requirements. Quotas are enforced on Ceph, with a distribution between Block and File storage.

(**) The size and number of Worker Nodes can be adjusted according to the client’s compute capacity needs. The minimum number of Worker Nodes is 3 (1 per AZ), and we recommend increasing the number in batches of 3 to maintain consistent multi-zone distribution. Worker Node size can be customized, with a minimum of 12 cores and 24 GB of RAM; the upper limit per Worker Node is determined by the hypervisor capacity (up to 112 cores / 1536 GB RAM with Performance 3 blade servers). The maximum number of Worker Nodes is capped at 100. The CNCF recommends using Worker Nodes of identical size. The maximum number of pods per Worker Node is 110.

Dev/Test

For a "dev/test" deployment, the following machines are provisioned:

| AZ   | Machine            | vCores | RAM   | Local Storage                       |
|------|--------------------|--------|-------|-------------------------------------|
| AZ0n | Git Runner         | 4      | 8 GB  | OS: 30 GB                           |
| AZ0n | Control Plane      | 8      | 12 GB | OS: 64 GB                           |
| AZ0n | Worker Node 1 (**) | 12     | 24 GB | OS: 64 GB + Ceph 300 GB minimum (*) |
| AZ0n | Worker Node 2 (**) | 12     | 24 GB | OS: 64 GB + Ceph 300 GB minimum (*) |
| AZ0n | Worker Node 3 (**) | 12     | 24 GB | OS: 64 GB + Ceph 300 GB minimum (*) |

(*) : Three Worker nodes are used as Storage nodes and are provisioned with a minimum of 300 GB of disk space, providing a distributed usable storage capacity of 300 GB (data is replicated three times). The available free space for the client is approximately 150 GB. This initial size can be increased during deployment or later, depending on requirements.

(**) : The size and number of Worker Nodes can be adjusted based on the client’s compute capacity needs. The minimum number of Worker Nodes is 3 (due to storage replication requirements). The minimum configuration per Worker Node is 12 cores and 24 GB of RAM; the upper limit per Worker Node is constrained by the hypervisor size (up to 112 cores / 1536 GB RAM with Performance 3 blade servers). The maximum number of Worker Nodes is 250. The CNCF recommends using Worker Nodes of identical size. The maximum number of pods per Worker Node is 110.

RACI

Architecture & Infrastructure

| Activity                                                 | Client | Cloud Temple |
|----------------------------------------------------------|--------|--------------|
| Define the overall architecture of the Kubernetes service | C      | RA           |
| Size the Kubernetes service (number of nodes, resources)  | C      | RA           |
| Install the Kubernetes service with default configuration | I      | RA           |
| Configure the Kubernetes service                          | C      | RA           |
| Set up the base networking for the Kubernetes service     | I      | RA           |
| Deploy initial configuration for identities and access    | C      | RA           |
| Define scaling and high availability strategy             | C      | RA           |

Project and Business Applications Management

| Activity                                   | Client | Cloud Temple |
|--------------------------------------------|--------|--------------|
| Create and manage Kubernetes projects      | RA     | I*           |
| Deploy and manage applications in Kubernetes | RA   | I*           |
| Configure CI/CD pipelines                  | RA     | I*           |
| Manage container images and registries     | RA     | I*           |

*May transition to "C" depending on the managed services contract

Monitoring and Performance

| Activity                                       | Client | Cloud Temple |
|------------------------------------------------|--------|--------------|
| Monitor Kubernetes service performance         | I      | RA*          |
| Monitor application performance                | RA     |              |
| Manage alerts related to the Kubernetes service | I     | RA*          |
| Manage alerts related to applications          | RA     |              |

(*) : Production cluster only. In Dev/Test, the client is fully autonomous and responsible.

Infrastructure Maintenance and Updates

| Activity                                    | Client | Cloud Temple |
|---------------------------------------------|--------|--------------|
| Update Kubernetes/OS service                | C      | RA           |
| Apply security patches to Kubernetes        | C      | RA           |
| Update deployed applications (operators*)   | C      | RA           |

*Operator package included in Managed Kube - see sections: Managed Helm Packages

Security

| Activity                                                   | Client | Cloud Temple |
|------------------------------------------------------------|--------|--------------|
| Manage security for the Kubernetes service                 | RA     | RA*          |
| Configure and manage pod security policies                 | RA     | I            |
| Manage SSL/TLS certificates for the Kubernetes service     | C      | RA*          |
| Manage SSL/TLS certificates for applications               | RA     | I            |
| Implement and manage Role-Based Access Control (RBAC)      | C      | R*           |
| Implement and manage client-side Role-Based Access Control (RBAC) | RA | I          |

(*) : Production cluster only. In Dev/Test, the client has full autonomy and responsibility.

Backup and Disaster Recovery

| Activity                                                     | Client | Cloud Temple |
|--------------------------------------------------------------|--------|--------------|
| Define the backup strategy for the Kubernetes service        | I      | RA           |
| Implement and manage backups for the Kubernetes service      | I      | RA           |
| Define the backup strategy for applications                  | RA*    | I*           |
| Implement and manage backups for applications                | RA*    | I*           |
| Test disaster recovery procedures for the Kubernetes service | CI     | RA           |
| Test disaster recovery procedures for applications           | RA*    | CI*          |

*May change to "CI | RA" depending on the managed services contract

Support and Troubleshooting

| Activity                                        | Client | Cloud Temple |
|-------------------------------------------------|--------|--------------|
| Provide level 1 support for infrastructure      | I      | RA           |
| Provide level 2 and 3 support for infrastructure | I     | RA           |
| Resolve issues related to the Kubernetes service | C     | RA           |
| Resolve issues related to applications          | RA     | I            |

Capacity Management and Evolution

Production cluster only. In Dev/Test, the client is fully autonomous and responsible.

| Activity                                        | Client | Cloud Temple |
|-------------------------------------------------|--------|--------------|
| Monitor Kubernetes resource usage               | C      | RA           |
| Plan service capacity evolution                 | RA     | C            |
| Implement capacity changes                      | I      | RA           |
| Manage application evolution and their resources | RA    | I            |

Documentation and Compliance

| Activity                                 | Client | Cloud Temple |
|------------------------------------------|--------|--------------|
| Maintain Kubernetes service documentation | I     | RA           |
| Maintain application documentation       | RA     | I            |
| Ensure Kubernetes service compliance     | I      | RA           |
| Ensure application compliance            | RA     | I            |
| Conduct Kubernetes service audits        | I      | RA           |
| Conduct application audits               | RA     | I            |

Kubernetes Operators/CRD Management (included in the offering)

| Activity                                          | Client | Cloud Temple |
|---------------------------------------------------|--------|--------------|
| Provisioning of default Operators catalog         | CI     | RA           |
| Updating Operators                                | CI     | RA           |
| Monitoring Operators' status                      | CI     | RA           |
| Troubleshooting issues related to Operators       | CI     | RA           |
| Managing Operators' permissions                   | CI     | RA           |
| Managing Operators' resources (addition/removal)  | CI     | RA           |
| Backing up Operators' resource data               | CI     | RA           |
| Monitoring Operators' resources                   | CI     | RA           |
| Restoring Operators' resource data                | CI     | RA           |
| Security auditing of Operators                    | CI     | RA           |
| Operator support                                  | CI     | RA           |
| License management for operators                  | CI     | RA           |
| Management of specific support plans for operators | CI    | RA           |

*Operator package included in Managed Kube – see sections: Managed Helm Packages

Kubernetes Applications/Operators/CRD Management (Client Side)

Production cluster only. In Dev/Test, the client is fully autonomous and responsible.

| Activity                                         | Client | Cloud Temple |
|--------------------------------------------------|--------|--------------|
| Deployment of CRDs                               | I*     | RA*          |
| Updating Operators                               | RA     | I            |
| Monitoring Operator status                       | RA     | I            |
| Troubleshooting issues related to Operators      | RA     | I            |
| Managing Operator permissions                    | RA     | I            |
| Managing Operator resources (addition/removal)   | RA     | I            |
| Backing up Operator resource data                | RA     | I            |
| Monitoring Operator resources                    | RA     | I            |
| Restoring Operator resource data                 | RA     | I            |
| Security auditing of Operators                   | RA     | I            |
| Supporting Operators                             | RA     | I            |
| Managing licenses for operators                  | RA     | I            |
| Managing specific support plans for operators    | RA     | I            |

Some operator services may be managed depending on the managed services contract.

*May change to "A | RC" depending on the managed services contract

Application Support

| Activity                               | Client | Cloud Temple |
|----------------------------------------|--------|--------------|
| Application Support (external service) | RA     | I            |

Application support can be provided through an additional service.

RACI (summary)

  • Cloud Temple: responsible and accountable (RA) for the Kubernetes foundation, cluster security, infrastructure backups, monitoring, and CRDs.
  • Client: responsible and accountable (RA) for application projects, business operators, CI/CD pipelines, and application-level backups.
  • "Grey zone": adaptations and extensions (IAM, specific operators, cluster compliance/security hardening) — billed on a project basis.