Node Tuning Operator on OpenShift Container Platform
Node Tuning Operator
The Operator manages the containerized Tuned daemon for OpenShift as a Kubernetes DaemonSet.
It ensures that the custom tuning specification is passed to all containerized Tuned daemons running in the cluster, in a format the daemons understand. The daemons run on every node in the cluster, one per node.
Node-level settings applied by the containerized Tuned daemon are rolled back when an event triggers a profile change, or when the containerized Tuned daemon terminates gracefully after receiving and handling a termination signal.
Once the Operator is installed via YAML, it creates a Deployment and a default Custom Resource (CR). This default CR provides standard node-level tuning; however, any customizations made directly to the default CR are overwritten by the Operator, so custom tuning must be defined in separate custom CRs.
Labels can be used to apply selective custom tuning of kernel sysctl settings, matched against either node labels or pod labels.
Newly created CRs take effect based on these label matches as well as profile priorities.
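For instance, sysctl settings can be targeted only at nodes that run pods carrying a specific label by creating a separate custom CR. The sketch below follows the Tuned CR schema shown later in this article; the profile name, label key, label value, and sysctl choices are illustrative assumptions, not values the Operator requires:

```yaml
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: ingress                      # illustrative CR name
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: "openshift-ingress"        # illustrative profile name
    data: |
      [main]
      summary=A custom OpenShift ingress profile
      include=openshift-control-plane
      [sysctl]
      net.ipv4.ip_local_port_range="1024 65535"
      net.ipv4.tcp_tw_reuse=1
  recommend:
  - profile: "openshift-ingress"
    priority: 10
    match:
    - label: "tuned.openshift.io/ingress-pod-label"   # illustrative pod label
      value: "ingress-pod-label-value"
      type: pod
```

Because `type: pod` is set, the profile is recommended for any node hosting a pod with the matching label, rather than for nodes labeled directly.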
An example of a custom tuning specification is shown below.
Multiple profiles are defined, and the recommend section at the bottom contains the logic for selecting a profile based on label matches and priority.
apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: default
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - name: "openshift"
    data: |
      [main]
      summary=Optimize systems running OpenShift (parent profile)
      include=${f:virt_check:virtual-guest:throughput-performance}
      [selinux]
      avc_cache_threshold=8192
      [net]
      nf_conntrack_hashsize=131072
      [sysctl]
      net.ipv4.ip_forward=1
      kernel.pid_max=>4194304
      net.netfilter.nf_conntrack_max=1048576
      net.ipv4.conf.all.arp_announce=2
      net.ipv4.neigh.default.gc_thresh1=8192
      net.ipv4.neigh.default.gc_thresh2=32768
      net.ipv4.neigh.default.gc_thresh3=65536
      net.ipv6.neigh.default.gc_thresh1=8192
      net.ipv6.neigh.default.gc_thresh2=32768
      net.ipv6.neigh.default.gc_thresh3=65536
      vm.max_map_count=262144
      [sysfs]
      /sys/module/nvme_core/parameters/io_timeout=4294967295
      /sys/module/nvme_core/parameters/max_retries=10
  - name: "openshift-control-plane"
    data: |
      [main]
      summary=Optimize systems running OpenShift control plane
      include=openshift
      [sysctl]
      # ktune sysctl settings, maximizing i/o throughput
      #
      # Minimal preemption granularity for CPU-bound tasks:
      # (default: 1 msec# (1 + ilog(ncpus)), units: nanoseconds)
      kernel.sched_min_granularity_ns=10000000
      # The total time the scheduler will consider a migrated process
      # "cache hot" and thus less likely to be re-migrated
      # (system default is 500000, i.e. 0.5 ms)
      kernel.sched_migration_cost_ns=5000000
      # SCHED_OTHER wake-up granularity.
      #
      # Preemption granularity when tasks wake up. Lower the value to
      # improve wake-up latency and throughput for latency critical tasks.
      kernel.sched_wakeup_granularity_ns=4000000
  - name: "openshift-node"
    data: |
      [main]
      summary=Optimize systems running OpenShift nodes
      include=openshift
      [sysctl]
      net.ipv4.tcp_fastopen=3
      fs.inotify.max_user_watches=65536
      fs.inotify.max_user_instances=8192
  recommend:
  - profile: "openshift-control-plane"
    priority: 30
    match:
    - label: "node-role.kubernetes.io/master"
    - label: "node-role.kubernetes.io/infra"
  - profile: "openshift-node"
    priority: 40