Recently at Render, we built a data pipeline to track which node every Pod is scheduled on in our Kubernetes clusters. This was part of a broader effort to measure the efficiency of the underlying compute we provision.
Under the hood, the collection of data was done with Kubernetes Informers. Informers stream updates for a particular resource (Pods, Services, Deployments, etc.) from the Kubernetes API to your code. They're well-weathered, battle-tested software that continues to improve.
In this post, we'll share some of Render's hard-learned lessons about using Informers in Very Large Clusters™. Who knows, they might save you a few hours of debugging—or even help avoid an incident!
Informer fundamentals
A Kubernetes cluster is an endless cycle of action and reaction: creating a Service triggers the creation of an Endpoint; scheduling a Pod triggers the startup of a container; and a Pod failing its health check triggers an iptables update. The Kubernetes term for this infinite machinery is reconciliation.
Informers are central to the reconciliation process: they power native components like kubelet, kube-proxy, and kube-controller-manager; third-party components like Knative and Calico; and most likely the custom controller you wrote that one time. Operators, controllers, watchers—whatever you call them, they all use Informers to stay, well, informed.
Here's a minimal initialization of a Pod Informer:
import (
    "context"
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

func SetupPodInformer(ctx context.Context, kubeClient kubernetes.Interface) {
    // Unless you have an advanced use case, create informers with NewSharedInformerFactory:
    sif := informers.NewSharedInformerFactory(kubeClient, 30*time.Minute)
    podInformer := sif.Core().V1().Pods().Informer()

    // Define handler functions for each event type
    podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj any) {
            // Handle the addition of a Pod
        },
        UpdateFunc: func(oldObj, newObj any) {
            // Handle the modification of a Pod
        },
        DeleteFunc: func(obj any) {
            // Handle the deletion of a Pod
        },
    })

    sif.Start(ctx.Done())
}
The core setup is straightforward: given a Kubernetes client, invoke the correct incantation of method calls to register a handler of type cache.ResourceEventHandlerFuncs that will receive Pod updates.
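One detail the snippet above glosses over: Start returns immediately, before the initial LIST has been processed. If your code reads from the Informer's cache (more on that in Lesson 3), it's common to block until the first sync completes. A minimal sketch:

    sif.Start(ctx.Done())

    // Start returns immediately; block here until every Informer created from
    // this factory has completed its initial LIST and populated its cache.
    sif.WaitForCacheSync(ctx.Done())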
Behind the scenes, the Informer communicates with the Kubernetes API, organizes returned data neatly in a local cache, and calls registered handlers with the relevant Pod specs:
- On initialization, the Informer sends a LIST request to fetch all Pods in the cluster. Because the Informer's cache starts out empty, all returned Pods are treated as "new" and passed to AddFunc.
  - This response includes a resourceVersion marker, which is used in the next step.
  - Importantly, the Informer stores returned Pods in its in-memory cache, which is kept up-to-date using events from the WATCH below.
- The Informer then sends a WATCH request, which subscribes to all Pod updates that happen after the resourceVersion marker (see the hand-rolled sketch after this list).
  - Pod creations are passed to AddFunc, deletions to DeleteFunc, and modifications to UpdateFunc.
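To make those two steps concrete, here's a rough, hand-rolled approximation of what the Informer's internals do, minus caching, error recovery, and reconnection logic (listAndWatchPods is our hypothetical name for it):

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    func listAndWatchPods(ctx context.Context, kubeClient kubernetes.Interface) error {
        // Step 1: LIST all Pods. The response's resourceVersion marks "now".
        podList, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{})
        if err != nil {
            return err
        }

        // Step 2: WATCH for all updates that happen after that resourceVersion.
        w, err := kubeClient.CoreV1().Pods(metav1.NamespaceAll).Watch(ctx, metav1.ListOptions{
            ResourceVersion: podList.ResourceVersion,
        })
        if err != nil {
            return err
        }
        defer w.Stop()

        for event := range w.ResultChan() {
            // event.Type is Added, Modified, or Deleted; a real Informer routes
            // these to AddFunc, UpdateFunc, and DeleteFunc respectively.
            fmt.Println(event.Type)
        }
        return nil
    }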
That's Informers in a nutshell. With these basics in mind, let's dive into some of the lessons we've learned working with Informers at scale.
Lesson 1: Be careful with your UpdateFunc
Here's a basic declaration of a handler's UpdateFunc function, which is called whenever something changes about the resource (like its Status changing):
UpdateFunc: func(oldObj, newObj any) {
    // Handle the modification of a Pod
},
Based on the function signature, it's natural to think, "I should compare oldObj and newObj to detect state changes." After all, why else would both be passed here? We can just skip taking action if they're the same, right?
This point from the official Kubernetes guidelines for writing controllers is particularly salient (emphasis added):
Level driven, not edge driven. Just like having a shell script that isn't running all the time, your controller may be off for an indeterminate amount of time before running again.
If an API object appears with a marker value of true, you can't count on having seen it turn from false to true, only that you now observe it being true. Even an API watch suffers from this problem, so be sure that you're not counting on seeing a change unless your controller is also marking the information it last made the decision on in the object's status.
Taking action based on a state transition is edge-driven logic: it requires the code to have observed the change in order to act correctly. However, that's not how Informers are architected. This GitHub comment sums it up:
Informers are guaranteed to report the current state of an object, eventually. They are not guaranteed to report every in-between state.
This is why Kubernetes emphasizes level-driven logic:
- Whatever you see right now is the state of the world.
- Based entirely on that state, update downstream dependencies accordingly.
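Concretely, a level-driven handler ignores oldObj and recomputes everything from the object it was just handed. Here's a minimal sketch, where syncPod is a hypothetical idempotent function that reconciles downstream state:

    UpdateFunc: func(_, newObj any) {
        // corev1 is k8s.io/api/core/v1
        pod := newObj.(*corev1.Pod)

        // Level-driven: act on the state we observe right now, not on a diff
        // against oldObj. Because syncPod is idempotent, observing the same
        // state twice (e.g., during a resync) is harmless.
        syncPod(pod)
    },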
Edge-driven logic has caused us more than its share of headaches over the years. Code that needs to observe each state transition will work consistently in a low-load dev cluster (leading you to trust it), only to silently miss updates in a high-churn production cluster (leading you to question whether you can ever trust again).
Fun fact: These uses of "edge" and "level" originate from edge- and level-triggered signaling in digital electronics.
Edge-driven logic also causes issues during production rollouts. Because the old and new replicas can't coordinate hand-off (at least, not with vanilla Informers), you can't guarantee that all transitions were caught during a rollout.
(Personally, I think OnUpdate should simply not pass oldObj to eliminate the footgun of comparing the two arguments. But maybe that's just a skill issue 🤷♂️)
Lesson 2: Beware the "OOMLoop"
Each of your handlers processes incoming events in serial. This may seem surprising (it certainly was to us). It's the result of Informers using Go channels to shepherd events to handlers.
This means that if an Informer's initial LIST returned 10,000 Pods, the handler's OnAdd function would run one at a time, 10,000 times. If each run took one second, well, you do the math. To keep things moving in these high-load scenarios, handlers should avoid high-latency operations.
While a handler is working through its backlog, memory is allocated elsewhere to hold the "pending" items in an unbounded ring buffer (the comments on that struct are worth a read; and yes, we do realize "unbounded ring buffer" is an oxymoron... we didn't name it!). This "unboundedness" means that a handler needs to keep up with the rate of change. Falling behind creates backpressure, causes the buffer to allocate more and more memory, and eventually leads to an OOM (Out-Of-Memory) kill.
The relaying and buffering of notifications is done through processorListener.pop().
If your handlers are quick (they log a line or operate entirely on in-memory data), then you're probably safe. If you can tolerate some amount of duct tape, simply slap a go func() {} around your code. But if you require a higher degree of sophistication, consider using a workqueue.Interface.
This is nothing new. Most controllers and operators already use a workqueue—in fact, it's the first recommendation listed in the Controller Guidelines. But it helps to illuminate why this is recommended practice. Effectively, instead of keeping your objects in a dimly lit unbounded buffer with no observability, keep them in a metrics-infused queue with amenities like retries and delays! A workqueue is no free lunch, though: it requires quite a bit more boilerplate code.
In the context of the Informer's architectural diagram below (source article), the scenario we're describing happens in Step 6: after updating the indexer (which provides the in-memory cache of objects), events are sent to individual handlers by way of the unbounded buffer:
At Render, we've had multiple incidents where important controllers got stuck in an "OOMLoop" due to stalling handlers. The sequence looked like this:
- The Informer's initial LIST returns items to work through, on the order of 10⁵. Each item is enqueued in the ring buffer.
- The handler hangs for an indefinite amount of time processing certain items. Regardless of the reason (typically a slow network call), memory usage starts to build as items accumulate.
- The Informer continues with the WATCH request (see ListAndWatch), exacerbating the issue by generating more notifications. If an Informer resync coincides, the effect is compounded.
- Staying on this trajectory, the Pod eventually OOMs.
- Kubernetes recreates the Pod, and it starts all over.
This pathology is neatly summarized by a comment in the client-go source:
// pendingNotifications is an unbounded ring buffer that holds all notifications not yet distributed.
// There is one per listener, but a failing/stalled listener will have infinite pendingNotifications
// added until we OOM.
pendingNotifications buffer.RingGrowing
Here's an example of that happening to one of our controllers:
Lesson 3: Work smarter, not harder
Here are a few optimizations you might consider the next time you write a controller:
Use the Informer's cache whenever possible. As mentioned earlier, every Informer maintains an in-memory cache of its objects. If you need the latest spec for a given Pod, it's much faster to fetch from the cache than to hit the Kubernetes API (which might be in a different region or network topology):
// Obtain a Lister backed by the Informer's cache (from the same factory):
podLister := sif.Core().V1().Pods().Lister()

podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: func(obj any) {
        // Let's say you want to look up this specific Pod's state on each Add
        podNamespace, podName := "default", "pod-a"

        // ✅ This fetches from the Informer's own cache: super fast.
        podLister.Pods(podNamespace).Get(podName)

        // ⚠️ This fetches from the K8s API: much slower.
        kubeClient.CoreV1().Pods(podNamespace).Get(ctx, podName, metav1.GetOptions{})
    },
})
Use WithTweakListOptions when initializing your SharedInformerFactory. This option lets you modify the Informer's LIST and WATCH requests, most commonly to apply a label selector so that only matching resources are returned (instead of every resource in the cluster). This can make a big difference in large clusters, reducing the number of Pods (and Pod updates) that the Informer receives. This translates to savings on both handler load and bytes over the network.
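Here's a sketch of the wiring, with a made-up label selector standing in for whatever marks the resources you care about:

    // metav1 is k8s.io/apimachinery/pkg/apis/meta/v1
    sif := informers.NewSharedInformerFactoryWithOptions(
        kubeClient,
        30*time.Minute,
        informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
            // Hypothetical label: only LIST/WATCH Pods our controller manages.
            opts.LabelSelector = "example.com/managed-by=our-controller"
        }),
    )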
Use the SetTransform hook to keep memory usage in check. Specs of certain objects can become large: think Deployments with a long list of containers, or containers with a long list of environment variables. SetTransform lets you reduce the Informer's memory footprint by zeroing out fields that you don't need, saving precious bytes. However, this tactic is a double-edged sword: zeroed-out fields can confuse debugging tools, and they're easy to overlook—especially for future team members who aren't aware of the transformations.
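As an example, here's a sketch of a transform that drops fields we assume (for this illustration) are never read. Note that it must be registered before the factory is started:

    podInformer.SetTransform(func(obj any) (any, error) {
        pod, ok := obj.(*corev1.Pod)
        if !ok {
            return obj, nil
        }
        // Assumption for this sketch: we never read managedFields or env vars.
        pod.ManagedFields = nil
        for i := range pod.Spec.Containers {
            pod.Spec.Containers[i].Env = nil
        }
        return pod, nil
    })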
Wrapping up
Kubernetes Informers are powerful abstractions—but as we’ve learned the hard way, that power comes with a few sharp edges at scale. If you're writing your own controllers, you'll save yourself a lot of time and frustration by designing with those edges in mind from the start.
Thanks for reading. If this kind of deep-systems thinking sounds fun, come work with us at Render 😉
Acknowledgments
Shoutouts to Kenny Do, Oliver Huang, and David Mauskop for the technical insights shared in this post.