AI for Kubernetes - part 1: The big picture

Not a day goes by without an update on AI. For us at Softwaredam it is time to use the Blogtober month to dive a little deeper into AI and start writing about it.

In this AI for Kubernetes series, we will keep it simple. First, we gather general facts in this blog to sketch the bigger picture. Then we try some models and tools in the next two or three blogs, and finally we wrap up with a closing blog.

By: Yosuf Haydary and Stephan Duivelshof

As part of Blogtober 2025

Basic AI vocabulary

AI – Artificial Intelligence is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making.
GenAI – Generative AI is a subset of AI which uses generative models to produce text, images, video, and source code.
Model – A model is a (mathematical) system trained on data. This system can classify information, predict based on what it has learned, or generate new information.
LLM – A Large Language Model is a model trained on a huge amount of (language) data.
GPT – A Generative Pre-trained Transformer is a type of LLM built on the transformer architecture and pre-trained on large amounts of text.
Token – A token is the basic unit of input/output for LLMs. It is roughly comparable to a word in natural language, although a token is often only part of a word.
Parameters – The numeric values (weights) inside a model that are learned during training and determine its predictions. In general, the more parameters, the richer the model.
Training – Training is the process that uses an algorithm to process input data and create a model.
Prompt – A prompt is the input you give to an AI model, such as a question. Anything you type into it counts as a prompt.
AI Agent – Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with limited or no human intervention.
AI Hallucination – AI models can confidently present false information as fact. Such outputs are called hallucinations.

Current GenAI LLMs

Companies like OpenAI, Microsoft, Meta, and the like invest billions in AI talent, in creating better algorithms, in training better LLMs, and in making these LLMs easier to use. However, there are also many open source projects that put effort into developing AI models and tools. In the following table we summarize some of the most popular LLMs and their important properties.

Model	Company	Description	Open Source	Parameters
GPT-5	OpenAI	GPT-5 is the model that ChatGPT uses at the time of writing. If you use ChatGPT, you get a certain number of tokens before it switches to the GPT-5 mini model.	No	Unknown
Claude 4.1	Anthropic	Anthropic presents Claude as the industry leader when it comes to coding and agentic capabilities.	No	Unknown
Grok 5	xAI	This is the model used on the platform X. It can answer questions, generate images, and help with coding.	No	Unknown
Llama 4 Scout	Meta	This is a good model for a range of tasks, from coding to multilingual support. Unfortunately, it is only available to people outside of the European Union.	Yes	17b, 109b
Gemini 2.5 Pro	Google	This is the model Google currently uses when you search for something and it shows an answer at the top of the page. It is capable of more, of course, such as coding, text generation, and generating videos.	No	Unknown
Qwen 3	Alibaba Cloud	Qwen 3 is the latest addition from Alibaba Cloud. It is an open source LLM that handles basic question answering, but it excels at solving math equations and helping you understand them.	Yes	32.8b
Phi-3	Microsoft	Phi-3 is a smaller model, but it is not to be underestimated. It is able to run on devices without cloud connectivity.	Yes	3b, 7b, 14b

Basic Kubernetes vocabulary

Container image – a packaged file with an application and everything that the application needs to run.
Container – an isolated process that runs on a Linux machine.
Kubernetes – a system that automates running containers, robustly, on Linux machines.
K8s – short for Kubernetes. The 8 represents the 8 characters between the first K and the last s.
Node – a Linux (virtual) machine/server which is controlled by Kubernetes to run processes on.
Cluster – a set of one or more nodes managed by Kubernetes.
Pod – the smallest unit that Kubernetes runs: one or more containers, usually hosting a website or doing some other work.
PVC – PersistentVolumeClaim, a request for storage that gives a Pod access to a storage volume for its data.
Autoscaler – a mechanism in Kubernetes that automatically scales pods up or down on demand, based on all kinds of inputs.
Manifest – also called a deployment manifest. Refers to a set of YAML files that instruct Kubernetes to run applications, scale up, expose a website, and so on.
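
To make the Manifest entry above a bit more concrete, here is a minimal sketch of what such a manifest contains, built as a plain Python dict and printed as JSON (Kubernetes accepts manifests in JSON as well as YAML). The function name `build_deployment`, the application name `demo-web`, and the image `nginx:1.27` are hypothetical values chosen for illustration only.

```python
import json


def build_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Return a minimal apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            # How many Pods Kubernetes should keep running.
            "replicas": replicas,
            # The selector must match the labels on the Pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # "demo-web" and the image tag are hypothetical example values.
    print(json.dumps(build_deployment("demo-web", "nginx:1.27"), indent=2))
```

The printed JSON could be saved to a file and applied to a cluster like any other manifest. In practice most teams write these files in YAML directly or generate them with Helm or Kustomize.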

AI for Kubernetes?

Those who know Kubernetes also know that it comes with a lot of technical and management complexity. Of course it would be a dream (and also foolish) to assume that AI will take care of all that at once. Until we get there, I think we had better understand the different aspects, break them down, and then see how (gen) AI can be applied to help us with Kubernetes management.

Application Development aspect

The application development aspect is the process of creating Kubernetes or cloud-native applications and manifests. Off the top of my head, some parts where AI can assist are:

Assisting in creating containers
Assisting in developing deployment manifests like Helm charts, Kustomize, or plain YAML
Assisting in quality assurance of deployment manifests, like enforcing rules and best practices
Assisting in simplifying complex setups
Assisting in designing clusters
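
As an illustration of the quality assurance item in the list above, the following is a minimal sketch of a rule check on a Deployment manifest. The two rules (pin an image tag, set CPU and memory limits) are common best practices, but the function name `lint_deployment` and the wording of the findings are our own assumptions, not part of any existing tool. An AI assistant could produce or explain findings like these; the sketch only shows the kind of rules involved.

```python
def lint_deployment(manifest: dict) -> list:
    """Return a list of human-readable findings for a Deployment manifest."""
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")

        # Rule 1: the image should pin a tag other than "latest".
        # Only look for ":" after the last "/" so a registry port is not
        # mistaken for a tag (e.g. "registry:5000/nginx").
        last_segment = image.rsplit("/", 1)[-1]
        tag = image.rsplit(":", 1)[1] if ":" in last_segment else ""
        if tag in ("", "latest"):
            findings.append(f"{name}: image '{image}' should pin a specific tag")

        # Rule 2: CPU and memory limits should be set.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                findings.append(f"{name}: no {resource} limit set")
    return findings


if __name__ == "__main__":
    # Hypothetical manifest fragment used only to demonstrate the checks.
    example = {
        "spec": {
            "template": {
                "spec": {"containers": [{"name": "web", "image": "nginx:latest"}]}
            }
        }
    }
    for finding in lint_deployment(example):
        print(finding)
```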

Operational aspect

The operational aspect of Kubernetes is keeping the Kubernetes environments, and the applications running on them, up and running and behaving as required and expected. Here too the list is long, but off the top of my head, here are a few things that (gen) AI might be able to help with:

Assisting in error detection and analysis
Assisting in spotting complex problems
Assisting in error prevention, or recovery
Assisting in security risk analysis and detection (like DDoS attacks, CVEs)
Assisting in security risk mitigation
Assisting in creating reports
Assisting in alerting to keep metrics below their thresholds
Assisting in improving resource utilization and cost-effectiveness
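
To give an idea of the error detection and analysis item above, here is a minimal sketch that collects containers stuck in a waiting state from a pod list and turns them into a prompt for an LLM. It assumes the input has the shape of a Kubernetes pod list (as returned by the API in JSON form). The function names and the prompt wording are hypothetical, and no model is actually called.

```python
def summarize_waiting_containers(pod_list: dict) -> list:
    """Return one line per container that is stuck in a waiting state."""
    lines = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            waiting = status.get("state", {}).get("waiting")
            if waiting:
                lines.append(
                    f"{meta.get('namespace', 'default')}/{meta.get('name')}: "
                    f"container {status.get('name')} is waiting, "
                    f"reason={waiting.get('reason')}, "
                    f"restarts={status.get('restartCount', 0)}"
                )
    return lines


def build_prompt(lines: list) -> str:
    """Wrap the findings in a (hypothetical) prompt for an LLM."""
    header = "These Kubernetes containers are not running. Suggest likely causes:\n"
    return header + "\n".join(f"- {line}" for line in lines)


if __name__ == "__main__":
    # Hand-written sample data in the shape of a pod list; not real output.
    sample = {
        "items": [
            {
                "metadata": {"name": "web-1", "namespace": "shop"},
                "status": {
                    "containerStatuses": [
                        {
                            "name": "web",
                            "restartCount": 7,
                            "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                        }
                    ]
                },
            }
        ]
    }
    print(build_prompt(summarize_waiting_containers(sample)))
```

Keeping the filtering step in plain code and sending only the short summary to a model limits how much cluster data leaves our hands, which matters for the security aspect discussed next.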

Security and Sovereignty aspect

Security is one of the most important aspects of operating a Kubernetes environment. Companies invest a lot of manpower, time, and money to keep these environments and their data safe. Since AI primarily works on data, it is very important that security is guaranteed. For the same reason, we do not want to rely on AI 100% for certain aspects. Rather, we use it to assist us until we are sure that the necessary quality and security requirements are met.

From a security-by-default perspective, we will also ditch any publicly hosted LLM and focus mainly on keeping AI and its models within the controllable boundaries of our own environments. Since we are huge fans of open source, we will also ditch any non-open-source LLM for now.

Current AI applications for Kubernetes

There are already many initiatives that try to address one or more of the Kubernetes challenges mentioned above. The following table contains a short list of AI tools and projects for Kubernetes.

Name	Description	Open source
KubeAI	A tool for deploying machine learning models.	Yes
Kubeflow	A foundation for building AI-powered platforms.	Yes
K8sGPT	A tool that analyzes your Kubernetes cluster.	Yes
PredictKube	An AI-based predictive autoscaler for KEDA made by Dysnix.	Yes
Cast AI	Cast AI automates Kubernetes optimization, which lets it lower cloud costs and enhance performance without human involvement.	No

What’s next?

So far, we have laid the groundwork with a common terminology and listed the well-known gen AI models. We also had a look at Kubernetes and its development, operational, and security aspects, and listed a few interesting ways in which AI might help us manage Kubernetes.

In the coming blogs, we will get our hands dirty and start trying different models and tools. The goal is to learn, write and in the process see how much (gen) AI can help us in our day-to-day DevOps with Kubernetes.
