The following sections provide details of new features and updates introduced in each release.

Densify

This topic summarizes both new and updated features introduced in each Densify release.
The following changes have been made to Kubex in this release:
  • Updated Container Summary Tab—The container summary page has been updated with a new look and feel to better highlight current spend, waste, and risk.

Figure: Container Summary Tab

  • Container Out of Memory Kills—Container out-of-memory kill data is now visible in the OOM Kills (Last Day) column on the AI Analysis Details page:
    • Badges are displayed on the Summary and Histogram pages to indicate if there are OOM kills in the environment
    • ML model and historical data are available in the Utilization Charts carousel and Metrics Viewer
  • CPU Throttling (%)—CPU throttling data is now visible in the CPU Throttling (%) column on the AI Analysis Details page:
    • ML model and historical data are available in the Utilization Charts carousel and Metrics Viewer
  • Automation Tab—An Automation tab has been added to highlight request and limit changes made by the Kubex Automation Stack. An overview of optimized manifests is also available on the summary page.
  • Business & Operational Attributes—Various business and operations related attributes (e.g., Application, Business Unit, Cost Center, Operational Environment) have been added to the tree view/filter options and AI Analysis Details columns
  • GPU Optimization for Container Resource Requests—Containers with GPUs allocated now receive recommendations for downsize opportunities. The GPU Overview page has been updated to highlight this information:
    • Added system views to AI Analysis Details (Immediate GPU Savings & Additional GPU Waste)
The following changes have been made to Cloudex in this release:
  • Effort Popup—The Effort column on the AI Analysis Details page is now clickable, showing a popup with additional details on the effort required to change to the recommended instance type.
    • Overall effort is the highest effort of the individual rule items. Possible values are None, Low, Medium, and High.
  • Business & Operational Attributes—Various business and operations related attributes (e.g., Application, Business Unit, Cost Center, Operational Environment) have been added to the tree view/filter options and AI Analysis Details columns
The following changes have been made to Kubex in this release:
  • Updated Container Summary Tab—The container summary page has been updated to better highlight both your current spend and the details of possible savings.
  • Updated Node Metrics—The Node Metrics Viewer has been updated to show the following GPU metrics:
    • GPU Utilization %
    • GPU Memory Utilization (%)
    • GPU Power Usage (Watts)
  • Updated Utilization Chart—The node chart showing the number of pods running on a node has been updated to show the limit for the selected node.
  • Customizable Data Aggregation—You now have the ability to edit how grouped data in a table is aggregated (i.e. sum, average, max, etc.) and save your customization in a table view. This feature is available for containers, node groups, nodes and cloud instances.
Refer to the Kubex documentation for details.
The following changes have been made to Cloudex in this release:
  • Cloud Connections—In this release you can now create and edit the connections to AWS and Azure. You can also view the data collection status for each of the connections. Cloudex uses these connections to collect data from each account daily.
  • Accessing the Catalog Map—You can now access the catalog map directly from the AI Analysis Details tab. A modal view of the catalog map has been added to the utilization charts, so you no longer need to leave the AI Analysis Details tab to use the catalog map.
  • Summary Tab—The Cloud Summary tab has been updated to improve usability.
Refer to the Cloudex documentation for details.
The following changes have been made to improve how Densify determines the cost of public cloud instances, using customer-specific discounts:
  • Azure Hybrid Benefit Status—As part of enhancing the cost data model, Densify needs to distinguish systems that are BYOL (also called Azure Hybrid Benefit or AHB) from those that are not, to accurately represent the cost of the system. The license model is available in the existing data collection configuration, so no changes are required to the minimum permissions for BYOL VMs.
  • Azure Instance OS Name—Densify now determines the type of Linux OS. In this release RHEL and SUSE are supported. The OS name is used to accurately determine instance costs.
  • Updated Cost Model—The cloud cost data model has been updated to apply discounts based on the following details:
    • OS Type—The type of Linux OS installed on a cloud instance. This value is stored in a table rather than as an attribute, so it cannot be edited or cleared the way an attribute can.
    • License Model (Attr_key is aws:licensemodel)—The licensing model that has been configured for a cloud instance.
    • Life Cycle (Attr_key is aws:lifecycle)—This attribute is collected for future development and is not currently used. It stores whether or not the instance is a spot instance. Possible values are “spot” and “normal”.
Refer to the Cloudex documentation for details.
See Resolved and Known Issues for the list of resolved defects.
The following changes have been made to Kubex in this release:
  • Node Modal—A new modal opens when you click a node name on the Node Details page.
  • Performance Improvements—The Kubex/Cloudex UI no longer retrieves data from the Densify Reporting Database (RDB). All Kubex and Cloudex UI component queries are directed exclusively to Kubex-specific tables in MS SQL.
Refer to the Kubex documentation for details.
See Resolved and Known Issues for the list of resolved defects.
The following changes have been made to Kubex in this release:
  • GPU Metrics for Containers—In this release Densify introduces reporting of GPU configuration and utilization data for containers, node groups and nodes. This data enables you to identify GPU-enabled containers and nodes in your environment and see how effectively you are utilizing your GPU resources. You must have data forwarder v4.2.1 deployed to collect the GPU data from your environments. There are updates on many pages to highlight GPU resource configuration and utilization:
    • Updated AI Analysis Details—A number of data points allow you to identify containers with GPU requests and to determine where the GPU and GPU memory request values are higher than peak utilization over a given period. The analysis populates these attribute values based on the date range and data selection options that are defined in your container optimization policy. The AI Analysis Details page has been updated to display the following GPU data.
      • GPU Model
      • GPU Request
      • Total GPU Request
      • GPU Average
      • %GPU Average
      • GPU Sustained
      • GPU Min
      • GPU Memory Request
      • Total GPU Memory Request
      • GPU Memory Average
      • % GPU Memory Average
      • GPU Memory Sustained
      • GPU Memory Min
      • GPU Power Usage Peak
      • GPU Power Usage Average
    • New Views—Four new views have been added to the AI Analysis Details page. These views allow you to focus on your GPU data:
      • GPU Inventory
      • GPU Cost
      • GPU Low Utilization
      • GPU Memory Low Utilization
      The default view has been updated to include GPU data.
    • Updated Summary—New GPU-specific cards have been added to the containers Summary page. A note includes a link that opens a modal containing GPU pricing details. The top two cards on the Summary page have also been updated to better highlight both your current spend and the details of possible savings.
    • New GPU Tab—A new GPU tab has been added when reviewing a single container. This new page provides details of the GPU utilization on the selected container.
    • Updated Metrics Viewer—The following GPU metrics are available in the metrics viewer:
      • GPU Utilization in GPUs - Average Container
      • GPU Memory Utilization (MB) - Average Container
      • GPU Memory Utilization (%) - Average Container
      • GPU Power Usage (Watts) - Average Container
  • GPU Metrics for Node Groups—GPU metrics have also been added for Node Groups and individual nodes.
    • Allocatable GPU Memory (GB)
    • Allocatable GPUs
    • Allocatable Memory (GB)
    • Average CPU Utilization (%)
    • Average GPU Memory Utilization (%)
    • Average GPU Utilization (%)
    • Average Memory Utilization (%)
    • GPU - Node Balance Ratio
    • GPU Capacity
    • GPU Memory - Node Balance Ratio
    • GPU Memory Capacity (GB)
    • GPU Request (% of Allocatable)
    • No. of Nodes with Underused GPU
    • No. of Nodes with Underused GPU Memory
    • Nodes with Underused GPU Memory (%)
    • Nodes with Underused GPU (%)
    • Peak GPU Memory Utilization (%)
    • Peak GPU Utilization (%)
    • Primary Node GPU Model
    • Primary Node Type GPU (GPUs)
    • Primary Node Type GPU Memory (GB)
    • Unallocated GPUs
  • Updated Node Overview Tab—This tab has been updated to add GPU details and to improve usability.
  • New Data in AI Analysis Details—Six new data points allow you to find containers where the CPU and memory request values are higher than peak utilization over a given period. The analysis populates these values based on the date range and data selection options that are defined in your container optimization policy.
    • CPU Peak—Highest peak value in mCores, on the busiest container.
    • CPU Sustained—Highest sustained value in mCores, on the average container.
    • CPU Min—Lowest value in mCores, on the average container.
    • Memory Peak—Highest peak value in MB, on the busiest container.
    • Memory Sustained—Highest sustained value in MB, on the average container.
    • Memory Min—Lowest value in MB, on the average container.
  • Timezone—The utilization charts for containers, node groups and nodes have been updated to standardize the timeline on UTC. The X-axis label for all charts has been updated to indicate the timezone designation “UTC”.
  • Updated Node Overview—The node overview page has been updated to improve usability and add details of GPU utilization.
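As a minimal illustration of the request-versus-peak comparison that the GPU low-utilization views surface, consider the following sketch. The field names and values are hypothetical, not Densify's schema:

```python
# Sketch of the "GPU request higher than peak utilization" check that the
# GPU Low Utilization view surfaces. Field names and values are hypothetical.
containers = [
    {"name": "trainer",   "gpu_request": 2, "gpu_peak": 0.6},  # over-requested
    {"name": "inference", "gpu_request": 1, "gpu_peak": 1.0},  # fully used
]

# Containers whose peak GPU utilization (in GPUs) over the policy's date
# range is below their request are candidates for downsizing.
downsize_candidates = [
    c["name"] for c in containers if c["gpu_peak"] < c["gpu_request"]
]
print(downsize_candidates)  # ['trainer']
```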
Refer to the Kubex documentation for details.
The following changes have been made to Cloudex in this release:
  • Timezone—The X-axis label has been updated to indicate the timezone designation “UTC” for all historical utilization charts. This change does not currently apply to the ML charts. The ML charts on the Cloudex AI Analysis Details page use the Densify server’s time, so this will be either “Eastern Time” or “Central Europe Time”.
Refer to the Cloudex documentation for details.
See 3.2.0 June 11, 2025 for the list of resolved defects.
The following changes have been made to Kubex in this release:
  • Users no longer need to clear their browser cache after an upgrade. This change applies to Kubex and Cloudex only. When using the Densify Console, users must still clear their browser cache after an upgrade.
  • AI Analysis Details Export—When the content of the AI Analysis Details is exported, the Container Name is provided as a hyperlink that takes you to the Overview page for the selected container. When you open the exported .XLS file, the content initially appears as plain text; when you click the cell it becomes a hyperlink.
  • Node Group Overview—The cards on the Node Group Overview page have been updated to better report on the status of a single node group. Additionally, utilization charts have been added to this page so you can review detailed metrics for a selected node group.
  • New Filter—A new public filter, “With automation enabled”, has been added to the container tree viewer. When selected, this filter displays all containers with automation enabled (i.e. Automation Enabled = true).
  • Improved Performance—By focusing on nodes that were running in the last day, rather than the past 7 days, you will notice improved performance when working with nodes and node groups.
  • Access to the Online Help—A new help icon provides direct access to the online help.
  • Renamed Value—The value “No. of Containers” has been renamed to “Avg No. of Containers”. The new name more appropriately describes the value. The following pages have been updated: Summary, Overview, Histogram and AI Analysis Details.
Refer to the Kubex documentation for details.
See Resolved and Known Issues for the list of resolved defects.

In this release Densify introduces Cloudex. In this new console, Densify’s patented analytics determine optimal resource settings for your public cloud environments and display the results in a new user interface.

Use the new Cloudex console to review resource utilization across your AWS, Azure and Google Cloud environments.

You can access the new console from your existing instance using:
https://<customer name>.densify.com/kubex
See the new Cloudex website to start a free trial. Use your existing Densify credentials to access the Cloudex console. You can either close the browser window or navigate back to the Densify Console to log out. The new console uses the latest MSSQL and Postgres databases.
The following changes have been made to Kubex in this release:
  • Automation Tab—A new automation tab has been added to report on the status of the Densify Container Automation solution.
  • Updated Connections Table—The following changes have been made to this table:
    • Three new columns have been added: Cluster Version, Kubex Automation Stack Version and Prometheus Version
    • A link that provides secure Densify credentials has been added. The credentials are provided in a code snippet that can be copied directly into your values.edit.yaml file. This new method makes it easier for you to set up container data collection, reducing the previous manual dependency of requesting these credentials from Densify.
  • Updated Optimization Breakdown—Hovering over any bar in this chart displays a description of the recommended optimization. Badges have also been added to indicate significant information, such as memory events or unspecified resource settings.
  • Summary Page—The Summary page has been updated for improved readability.
  • Cost Modeling—Kubex now has improved algorithms that use existing utilization data to estimate the cost for containers with unspecified CPU/memory request values. The Summary, AI Analysis Details, and Overview pages have all been updated to show the cost estimates for these containers. The following updates have been made to show these values:
    • Two columns have been added to the AI Analysis Details tab:
      • “Total Surplus CPU Request from Unspecified (mCores)”
      • “Total Surplus Memory Request from Unspecified (MB)”
    • Two system views have been added to the AI Analysis Details tab:
      • “CPU Request Shortfall (from Unspecified)”
      • “Memory Request Shortfall (from Unspecified)”
    • The Overview tab and single instance modal page have been updated to add details of unspecified CPU and memory request settings. These additional cards are only displayed if the selected container does not have specified CPU and/or memory request settings.
Refer to the Kubex documentation for details.
This release provides enhancements to the Kubernetes automation framework, including an updated version of the Mutating Admission Controller (MAC) that sends configuration and modification event status information to your Kubex instance. A new Automation status page has been added to Kubex that reports on the status of containers that have been enabled for automation. You can enable/disable containers for automation using a right-click context menu in the Kubex container tree viewer. Refer to the Kubex documentation for details. Refer to Mutating Admission Controller for details on deploying and using the Densify Container Automation solution.
The following endpoints have been added to the Densify API:
  • GET /kubernetes/clusters
  • GET /kubernetes/clusters/<clusterName>
  • GET /kubernetes/clusters/<clusterName>/containers
  • GET /kubernetes/clusters/<clusterName>/containers?details=true
  • GET /kubernetes/clusters/<clusterName>/automation
Two new attributes have been added to support the automation features.
  • Kubex Automation—Indicates whether the container is eligible for automation within Kubex. The attribute is set once, is static, and is not inherited from the cluster, node group, or any higher level.
  • Kubex Policy—Indicates what policy to use when automation is enabled.
See Densify API New Features
The token expiry has been increased from 5 to 60 minutes. Re-authorization is no longer required for longer-running pipeline operations that previously exceeded the 5-minute limit. The JSON Web Token (JWT) secures the Densify API so users can only make authorized API requests. See Authorize.
In this release, singleton scale groups, both ASGs and VM Scale Sets, will now be analyzed as scale groups rather than as EC2 or VM instances. You will observe the following changes:
  • Densify Console:
    • Singleton ASGs/VM Scale Sets will now only appear in the corresponding ASG or VM Scale Set tabs.
    • If you have a large number of these singleton scale groups, you may notice a decrease in the number of EC2 and Azure VMs reported, while the ASG and VM Scale Set counts will increase accordingly.
    • The Catalog Map will no longer report on ASGs and VM Scale Sets with Max Size = 1.
    • No change to the Impact Analysis and Recommendation Report for ASGs, as this report recognized singleton ASG instances and reported on them accordingly.
  • Scaling Recommendations:
    • Singleton scale groups will now have scaling recommendations, as required. Previously, there were no scaling recommendations, only instance type change recommendations.
    • If you need to control the group size and prevent scaling, use the “Group Max Size Override” and “Group Min Size Override” attributes. See Overriding Cloud Recommendations. Contact Support@Densify.com for details on configuring these attributes.
  • API Results:
  • Licensing:
    • Currently, singleton scale groups are licensed as individual EC2/VMs.
    • With this update, ASG licensing can be based on 1 ASG = 4 licenses OR 1 license per instance, based on the average number of instances in the scale group. VM Scale Sets are licensed using 1 license per instance, based on the average number of instances in the group. See License Compliance Report.
    • The current container licensing model is based on the last known number of containers on which the scale groups are running. With this change, container licensing will be calculated based on the average number of containers used over the data range, as defined in the policy.
The public cloud metadata catalogs have been updated to ensure Densify recommendations are based on the latest vendor pricing:
  • AWS Metadata Updates:
    • The pricing has been updated and is correct as of April 7, 2025.
    • The following new instance type has been added:
      • i8g.48xlarge

  • Azure Metadata Updates:
    • The pricing has been updated and is correct as of April 7, 2025.
    • The following new instance families have been added:
      • Dsv6
      • Ddsv6
      • Dasv6
      • Dadsv6
      • Dlsv6
      • Dldsv6
      • Dalsv6
      • Daldsv6
      • Esv6
      • Edsv6
      • Easv6
      • Eadsv6
      • Fasv6
      • Falsv6
      • Famsv6
    • Retired the following instances:
      • NC
      • NC-Promo instances
      • NCv2 instances
  • GCP Metadata Updates:
    • The pricing has been updated and is correct as of April 14, 2025.
    • The following new instance families have been added:
      • c4a-standard-lssd
      • c4a-highmem-lssd
See 3.0 April 23, 2025 for the list of resolved defects.

Container Data Forwarder

This section lists new features and updates to the Container Data Forwarder. A Helm chart bundles all of the components required for container data collection and automates the process. See Kubex Automation Stack for details of a single-cluster configuration. Refer to the Github repository for samples and configuration files for multi-cluster configurations. When deploying the Container Data Forwarder, ensure that the same version is deployed for all of your clusters. See Data Collection for Containers.
This update collects additional GPU metrics.
  • The following GPU metrics are now collected for Containers:
    • GPU Utilization in GPUs - Average Container*
    • GPU Utilization in Percent - Average Container*
    • GPU Memory Utilization in MB - Average Container*
    • GPU Memory Utilization (%) - Average Container*
    • GPU Power Usage (W) - Average Container*
    • GPU Utilization in GPUs - Busiest Container
    • GPU Utilization in Percent - Busiest Container
    • GPU Memory Utilization in MB - Busiest Container
    • GPU Memory Utilization (%) - Busiest Container
    • GPU Power Usage (W) - Busiest Container
  • The following GPU metrics are now collected for nodes:
    • GPU Utilization (GPUs)
    • GPU Utilization (%)*
    • GPU Memory Utilization (MB)
    • GPU Memory Utilization (%)*
    • GPU Power Usage (W)*
    • GPU Requests (GPU)
    • GPU Limit (GPU)
    Only those metrics indicated with an asterisk (*) are displayed in the current Kubex user interface. The remaining metrics will be covered in a future release.
The Helm charts have been updated as follows:
  • All-In-One Kubex-automation-stack has been updated to version 0.9.8;
  • The chart containing only the data forwarder has been updated to version 4.0.6.
These updates address the following issues:
  • The metrics that are retained in Prometheus are now limited to only the metrics that Densify requested. This change reduces the resources consumed by Prometheus, in large clusters.
  • Both the pod security context and container security context are now set for all of the AIO chart components, including the Densify data forwarder, Prometheus, Node exporter etc. This includes using the runtime default seccomp profile, running as a non-root user, no privileges for escalation, mounting the root file system as read-only, etc.
  • A sizing option for Prometheus resources has been added. The size is based on the cluster size.
  • The Prometheus subchart has been updated to the latest version.
  • The Prometheus scrape configuration has been updated to use the k8s endpointslice. The endpoints that the data forwarder was using were deprecated in the K8s v1.33 API and will be removed in a future release. Endpointslice has been available since K8s v1.21.
  • Installation instructions for offline (or air-gapped) mode are now provided.
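The endpointslice change above corresponds to switching the role used in Prometheus's kubernetes_sd_configs. An illustrative fragment follows; the job name and layout here are assumptions, not the shipped chart configuration:

```yaml
# Illustrative Prometheus scrape configuration using the endpointslice role
# in place of the deprecated "endpoints" role. Job name is hypothetical.
scrape_configs:
  - job_name: kubernetes-service-endpoints   # hypothetical job name
    kubernetes_sd_configs:
      - role: endpointslice                  # replaces: role: endpoints
```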
See Kubex Automation Stack for details. Ensure that your helm chart has been upgraded to the latest version, using the helm upgrade command.
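The upgrade is typically applied with the standard helm upgrade command. In this sketch the release name, chart reference, and namespace are assumptions; substitute your own values:

```shell
# Hypothetical release/chart/namespace names; substitute your own values.
helm repo update
helm upgrade kubex-automation-stack densify/kubex-automation-stack \
  --namespace densify --reuse-values
```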
This update upgrades Go to 1.24 for all package dependencies. This upgrade eliminates security scan warnings that were preventing some customers from upgrading their data forwarder to the latest version.
This update adds support for NVIDIA GPU metrics and HPA:
NVIDIA GPU Metrics—With this update, the Densify data forwarder now collects GPU data from your Kubernetes clusters. The analysis and display of the collected metrics will be provided in a future release of Kubex. Note the following additional prerequisites to collect the GPU data:
  • NVIDIA-device-plugin—This plugin allows containers to access the NVIDIA GPUs. It must be installed on all your Kubernetes clusters to allocate NVIDIA GPU resources to workloads and to provide the GPU data.
  • dcgm-exporter—This Prometheus exporter exposes GPU metrics from the Data Center GPU Manager (DCGM). It is required to collect GPU data such as utilization, memory usage, and power usage from NVIDIA GPUs. The dcgm-exporter can be deployed as a DaemonSet, where each node with an NVIDIA GPU runs a pod that exposes these metrics in a format that Prometheus can scrape and the Densify data forwarder then collects.
The GPU data collection is currently supported on the following platforms:
  • AKS
  • EKS
  • GKE
HPA Scales Based on Multiple Metrics—With this update, Densify recommendations are not implemented for containers enabled for HPA that scale based on single or multiple metrics and that are also enabled for automation using the Densify Mutating Admission Controller (MAC). Previously, Densify did not support HPA based on multiple metrics.
The Kubex Automation Stack has been updated to collect data from the NVIDIA DCGM exporter. You must ensure that your helm chart has been upgraded to the latest version (0.9.6) to get GPU data. You will get the updates automatically if you have the following settings, in the data forwarder cronJob specification:
  • image: densify/container-optimization-data-forwarder:4
  • imagePullPolicy: Always
If the settings are not configured as above, then you need to update the image, manually.
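In the CronJob spec, the two settings above look roughly like the fragment below (surrounding fields are elided; the container name is an assumption):

```yaml
# Fragment of the data forwarder CronJob pod template. With tag "4" and
# imagePullPolicy: Always, updated 4.x images are pulled automatically.
containers:
  - name: data-forwarder   # hypothetical container name
    image: densify/container-optimization-data-forwarder:4
    imagePullPolicy: Always
```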
This update resolves an issue with reporting a container’s parent node and node groups.
This update collects new container and node-level metrics to provide more accurate recommendations. Upgrade to data forwarder v4.1.2 after upgrading to Densify v2.10.0. New database columns have been added in v2.10.0 that are required by this version of the data forwarder for the new metrics and attributes. When updating the data forwarder, you need to ensure that the same version is deployed for all of your clusters.
The data forwarder has been updated to collect the following:
  • Horizontal Pod Autoscaler (HPA) target metrics—Three new attributes have been added.
    • hpa_target_metric_name
    • hpa_target_metric_type
    • hpa_target_metric_value
  • Node taints—The configuration attribute and new metrics will be collected:
    • Added the multi-value attribute, “Node Taints” (attr_NodeTaints)
  • QoS class—The configuration attribute and new metrics will be collected.
    • Added the attribute, “Quality of Service Class” (attr_QOSClass)
  • Node Working Set Memory Metrics—Working set memory is a process (or container) metric, rather than a node metric. It has been added to align with what is already shown in the AKS console. Densify provides the following additional node memory metrics:
    • working set memory (in bytes)
    • working set memory utilization (percent)
    • memory utilization (percent) (based on memory_bytes metric)
    • memory actual utilization (percent) (based on memory_actual_workload metric)
    • total node memory (in bytes) (configuration attribute)
Densify’s container analyses will be updated in future releases to utilize this HPA data when generating recommendations.
To improve the identification of a container’s parent node, the provider_id has been introduced as a third identification component. Currently, nodes are identified using cluster_name and node_name. The provider_id is optional and will be used only if required. In Kubernetes clusters without a provider_id, node identification will continue to rely on cluster_name and node_name, ensuring that node IDs remain unchanged. For EKS and OKE clusters, however, existing node IDs will change and, as a result, node history will be lost since these nodes receive new IDs.
The data forwarder’s configmap.yaml file has been updated. The setting node_group_list has been updated to add label_karpenter_sh_nodepool. When enabled, Karpenter NodeGroups are discovered and created in Densify.
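Based on the description above, the updated setting in configmap.yaml would look roughly like this. The exact list syntax depends on your chart version, and the first list entry shown is an assumed example:

```yaml
# Illustrative fragment of the data forwarder's configmap.yaml. Adding
# label_karpenter_sh_nodepool enables discovery of Karpenter NodeGroups.
node_group_list: label_eks_amazonaws_com_nodegroup,label_karpenter_sh_nodepool
```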
See 4.2.0 - May 15, 2024 for the list of resolved defects.
This update resolves an issue with reporting CPU and memory reservation percent for nodes, node groups and clusters. See Resolved and Known Issues for the resolved defect.
Version 4.1.0 collects new container and node-level metrics to provide more accurate container sizing recommendations.
Node level metrics and some new container metrics are collected, but will not be available in the Densify Console until the release of Densify v2.6.
The following container metrics are now collected:
  • max_cpu_throttling_percent—The Linux kernel allocates “CPU periods” (default = 100 milliseconds) to both processes and containers. This value is the percentage of periods that were throttled out of the total number of periods in which CPU was actually requested, which provides a more accurate indication of resource limitations than throttled time alone. For example, if a pod was throttled for 8 seconds out of a 5-minute window, that is 2.7%; but if CPU was not requested for the full 5 minutes, then 2.7% understates the throttling. If the pod was throttled for 8 seconds and its requested time was less than the full 300 seconds, the throttled percentage is higher.
  • avg_cpu_throttling_percent—Indicates the average percentage of 100-ms periods in which a container’s CPU usage was throttled. Both the average and maximum values are collected because the container metric is likely aggregated; metrics are aggregated at the highest level of the pod owner. For a deployment of one pod with one container, the average and maximum will be the same; for a deployment of 10 pods, they generally will not.
  • sum_cpu_throttling_seconds—Aggregates the total time during which throttling has been applied.
  • Container Events—These are not metrics collected at 5-minute intervals, but individual events along with the time each event occurred or was detected. In this version of the data forwarder, “process exit” is now collected. The exit code, and whether or not this is the main process of the container (i.e. PID #1, as a true/false flag), are collected and stored. In this version, process exit will be false only when PID #1 exits with code 137, which corresponds to an OOM kill. In all other cases, process exit will be true.
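The throttling percentage described above reduces to the ratio of throttled periods to CPU-requested periods. The following sketch reproduces the 8-seconds-in-5-minutes example; it is an illustration, not the data forwarder's implementation:

```python
def cpu_throttling_percent(throttled_seconds: float,
                           requested_seconds: float) -> float:
    """Share of CPU-requested time that was throttled, as a percent.

    Equivalent to (throttled 100-ms periods) / (requested 100-ms periods),
    since the fixed period length cancels out of the ratio.
    """
    return 100.0 * throttled_seconds / requested_seconds

# 8 s of throttling over a 5-minute window in which CPU was requested the
# whole time (300 s): ~2.7%.
full_window = cpu_throttling_percent(8, 300)

# Same 8 s of throttling, but CPU was only requested for 100 s of that
# window: the throttled share is much higher (8%).
partial_window = cpu_throttling_percent(8, 100)
```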
CPU and memory request and limit values at the top of the hour are already collected and stored as attributes, once per day. In this release the data forwarder collects the following metrics at 5-minute intervals:
  • CPU Limit—The defined CPU allocation limit for the container.
  • Memory Limit—The defined memory allocation limit for the container.
  • CPU Request—The defined CPU allocation requested for the container.
  • Memory Request—The defined memory allocation requested for the container.
You must have Densify v2.5 to see these new metrics in the Container Details report and metrics viewer. See Optimizing Your Containers - Details Tab and Using the Metrics Viewer for Containers.
Densify currently sets configuration attributes with both the node-level and container-level CPU, memory request and limit values from the top of the hour. This provides a point-in-time data point that allows you to see how a node’s request and limit values move during the day. In this release the data forwarder collects the following metrics at 5-minute intervals:
  • CPU Limit—The defined CPU allocation limit for the node.
  • Memory Limit—The defined memory allocation limit for the node.
  • CPU Request—The defined CPU allocation requested for the node.
  • Memory Request—The defined memory allocation requested for the node.
  • Pod Count—The number of pods running on the selected node.
If values have not been specified, this is indicated. The following node events are collected at 5-minute intervals:
  • oom_kill_events—The number of OOM kill events that happened on the node in a 5-minute interval.
  • cpu_throttling_events—The number of CPU throttling events that happened on the node in a 5-minute interval.
The following metrics are calculated from the raw data:
  • cpu_reservation_percent—The percentage of the node’s total CPU resources that are reserved or guaranteed for workloads, containers, or virtual machines.
  • memory_reservation_percent—The percentage of the node’s total memory resources that are reserved or guaranteed for workloads, containers, or virtual machines.
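The reservation metrics above reduce to a simple ratio of reserved (requested or guaranteed) resources to node capacity. A sketch with made-up numbers, not Densify's implementation:

```python
def reservation_percent(reserved: float, total: float) -> float:
    """Share of a node's total resources reserved for workloads, containers,
    or virtual machines, expressed as a percent."""
    return 100.0 * reserved / total

# Hypothetical node: 8000 mCores total, 6000 mCores requested by pods.
cpu_reservation = reservation_percent(6000, 8000)   # 75.0
# Hypothetical node: 32 GB memory, 24 GB reserved.
memory_reservation = reservation_percent(24, 32)    # 75.0
```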
The following configuration details are now collected and stored in attributes:
  • provider_ID—The provider_id is available as a Prometheus label of the kube_node_info metric and is extracted to its own column in the k8s_node_v0 Postgres table. The relevant node data is then used to link the Kubernetes node to a cloud instance. This linking is done in Postgres, as is the determination of the relationship between cloud instances and ASGs or VM Scale Sets.
  • k8s_version—The Kubernetes version of the node is collected and stored in the attribute, k8s_node. This value is collected for each node and for the cluster. It is currently informational only and is not used in the analysis.
The features allowing you to view node details will be provided in Densify v2.6.
The Kubernetes version of the cluster is collected and stored in the attribute, k8s_cluster. This is the version of the control plane/API server. It can be higher than those of the nodes in cases where the control plane was upgraded and the nodes have not yet been updated. This is used when determining OOM kills for Kubernetes versions older than 1.28.
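Since OOM-kill handling depends on whether the cluster is older than Kubernetes 1.28, a version comparison along these lines may be involved (illustrative only; real parsing must handle provider-specific suffixes such as EKS build strings):

```python
def k8s_minor(version):
    """Parse a 'v1.27.9'-style version string into a (major, minor) tuple."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])

# OOM-kill determination differs for clusters older than 1.28.
print(k8s_minor("v1.27.9") < (1, 28))  # True
print(k8s_minor("1.28.0") < (1, 28))   # False
```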
Containers will now be linked to their node group using a provider ID that is populated by each public cloud provider’s Kubernetes offering (i.e., EKS, AKS, and GKE). The provider_id attribute is passed to the kubelet and surfaced as a Prometheus label on the kube_node_info metric; the linking of nodes to cloud instances, and of cloud instances to ASGs or VM Scale Sets, is done in Postgres as described above.
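For illustration, an EKS-style provider ID typically looks like aws:///&lt;availability-zone&gt;/&lt;instance-id&gt;; the hypothetical helper below splits it into the fields used for linking (AKS and GKE use different formats, which this sketch does not cover):

```python
def parse_aws_provider_id(provider_id):
    """Split an EKS-style provider ID ('aws:///<az>/<instance-id>')
    into availability zone and EC2 instance ID."""
    scheme, _, rest = provider_id.partition("://")
    if scheme != "aws":
        raise ValueError("not an AWS provider ID")
    az, _, instance_id = rest.lstrip("/").partition("/")
    return az, instance_id

print(parse_aws_provider_id("aws:///us-west-2a/i-0123456789abcdef0"))
# ('us-west-2a', 'i-0123456789abcdef0')
```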
See 4.1.0 - September 19, 2024 for the list of resolved defects.
Version 4 introduces significant updates to Densify’s container data collection solution. This version resides in a new GitHub repository. While the core PromQL queries remain the same, apart from some improvements and bug fixes, the data forwarder has been re-architected to support multi-cluster data collection. You can upgrade from version 2.x or 3.x to version 4.0 by updating the image tag to “4”:
  • image: densify/container-optimization-data-forwarder:4
  • imagePullPolicy: Always
When updating the data forwarder, ensure that the same version is deployed for all of your clusters. See Data Collection for Containers for details of configuring the connection.
In addition to Prometheus, third-party observability platforms are supported. Commercially available observability platforms do not need to reside within a cluster and can be used to monitor multi-cluster environments. The observability platform must support the Prometheus API and a supported authentication mechanism. If you are using Amazon Managed Service for Prometheus (AMP), an AWS role is required to associate the Kubernetes service account with Densify for container data collection. The required annotations can be added to your Helm charts. The following Prometheus dependencies apply:
  • AWS Managed Prometheus data ingestion requires Prometheus v2.26.0 or higher.
  • Starting with version 16.0, the Prometheus chart requires Helm 3.7 or higher to install successfully.
Both commercial observability platforms and self-hosted OSS solutions are supported.
Version 4 can collect data from external Prometheus servers. In this use case, Prometheus runs outside the Kubernetes cluster from which data is collected.
Since Prometheus can run externally, a single job can now collect data for multiple Kubernetes clusters from a single Prometheus server or observability platform. Using Prometheus labels, the job determines which data belongs to which cluster.
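A sketch of the label-based demultiplexing described above, assuming a cluster label on each series (the label name, function, and data shapes are illustrative, not the forwarder's actual code):

```python
def split_by_cluster(series, cluster_label="cluster"):
    """Group Prometheus-style series ((labels, value) pairs) per cluster."""
    clusters = {}
    for labels, value in series:
        cluster = labels.get(cluster_label, "unknown")
        clusters.setdefault(cluster, []).append((labels, value))
    return clusters

series = [
    ({"cluster": "prod-east", "pod": "web-1"}, 0.4),
    ({"cluster": "prod-west", "pod": "web-1"}, 0.6),
    ({"cluster": "prod-east", "pod": "db-1"}, 1.2),
]
grouped = split_by_cluster(series)
print(sorted(grouped))             # ['prod-east', 'prod-west']
print(len(grouped["prod-east"]))   # 2
```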
You now have the option to select “container_memory_working_set_bytes” in addition to “container_memory_usage_bytes” or “container_memory_rss” for container data collection. This choice determines how aggressively container memory allocations are optimized when analyzing your Kubernetes environments. Using “container_memory_rss” and “container_memory_working_set_bytes” produces more aggressive recommendations that allow you to reclaim more memory. Using “container_memory_usage_bytes” and “container_memory_rss” produces more conservative recommendations that avoid downsizing that may lead to out-of-memory (OOM) issues. The corresponding workload charts were added to the Containers - Details Tab in v2.2.0. See Optimizing Your Containers - Details Tab. Contact Support@Densify.com for details on selecting memory utilization metrics for your container data collection.
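To make the aggressiveness trade-off concrete: container_memory_working_set_bytes excludes reclaimable page cache, so it typically sits below container_memory_usage_bytes, and sizing against it reclaims more memory. The sketch below is hypothetical (the peak-plus-headroom rule and sample values are not Densify's actual sizing logic):

```python
def memory_target(samples, headroom=1.1):
    """Size a memory request from a chosen metric's peak plus 10% headroom
    (an assumed rule for illustration, not Densify's algorithm)."""
    return max(samples) * headroom

usage = [900, 950, 1000]       # container_memory_usage_bytes (MiB), includes page cache
working_set = [600, 640, 700]  # container_memory_working_set_bytes (MiB)

# Sizing on the working-set peak reclaims more memory than sizing on usage.
print(round(memory_target(usage), 1))        # 1100.0
print(round(memory_target(working_set), 1))  # 770.0
```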
The following updates and bug fixes were made in this release:
  • HTTP retries have been added to the calls to the Prometheus API. This handles observability platform rate limiting.
  • The data forwarder now handles the relabel configs of Node Exporter.
  • Outdated Node Exporter metrics have been addressed.
  • Utilizes improved Horizontal Pod Autoscaler metrics (autoscaling v2).
  • The data forwarder has been upgraded to Go 1.22.
  • Updated examples for both single and multiple cluster configurations are provided in the new GitHub repository.
See 4.0.0 - May 28, 2024 for the list of resolved defects.
Densify will discontinue support of Container Data Forwarder versions 3.x.x on June 30, 2024. Contact Support@Densify.com for details on migrating to Densify’s current version of the Container Data Forwarder.