I'm using Prometheus query language, PromQL, quite a bit these days. But all I do are very simple queries like sum(...)
or rate(...[5m])
on a OpenShift cluster I work with.
For few weeks now, mine inner me was bothered with one slightly more complex stuff. To filter one metric by label from different metric - something like JOIN in SQL world. Specifically, I wanted to see number of pods running on each cluster node with "worker" role.
We have (I'm on OpenShift 4.13) kube_node_role{role="worker"}
(AFAICT this is what we call "vector" in PromQL) that have these labels:
Name container endpoint job namespace node prometheus role service Value kube_node_role kube-rbac-proxy-main https-main kube-state-metrics openshift-monitoring ip-1-2-3-4.ec2.internal openshift-monitoring/k8s worker kube-state-metrics 1 kube_node_role kube-rbac-proxy-main https-main kube-state-metrics openshift-monitoring ip-1-2-3-5.ec2.internal openshift-monitoring/k8s worker kube-state-metrics 1 [...]
And we have kube_pod_info
with these labels:
Name container created_by_kind created_by_name endpoint host_ip host_network job namespace node pod pod_ip priority_class prometheus service uid Value kube_pod_info kube-rbac-proxy-main <none> <none> https-main 10.201.24.232 false kube-state-metrics openshift-etcd ip-1-2-3-6.ec2.internal etcd-guard-ip-10-201-24-232.ec2.internal 10.128.2.14 system-cluster-critical openshift-monitoring/k8s kube-state-metrics a2eec7b0-9f29-42b4-853d-6919d963ffa1 1 kube_pod_info kube-rbac-proxy-main <none> <none> https-main 10.201.24.232 false kube-state-metrics openshift-etcd ip-1-2-3-6.ec2.internal revision-pruner-13-ip-10-201-24-232.ec2.internal 10.128.2.4 system-node-critical openshift-monitoring/k8s kube-state-metrics df5cdd67-b0f5-4896-b0b0-85095a9f3122 1
We will use on(...)
and group_left(...)
PromQL operators. I had some issues understanding what these do, so here is mine interpretation:
*
because values are always 1 in these vectors, it is safe to multiply these.
on(...)
allows me to define common label(s) that should be used to match two different vectors.
group_left(...)
... thinking, thinking, nah. I forgot mine mental model here :-/
And this is the final query I used:
sum( kube_pod_info{} * on(node) group_left(role) kube_node_role{role="worker"} ) by(node)
These links helped me a lot: