Pinpointing Health Issues in the Cloud with a Three-Prong Attack
Complex era of multi-cloud
As more organizations move to the cloud, application performance management keeps getting harder, because they must track more applications across on-premises and hybrid cloud environments managed by multiple providers.
A recent industry survey of enterprise IT professionals revealed that 85 percent are managing a multi-cloud strategy with 32 percent of their workloads running in a public cloud and 43 percent in a private cloud.
Many applications are not prepared to support multi-tiered workloads (think storage tiers) on premises, let alone in the cloud. And many legacy IT management tools haven’t adapted to monitoring cloud-based applications. Pinpointing application performance problems in a traditional application stack was already tough enough. Throw in on-premises stacks and one or more cloud providers, and suddenly you have an environment with multiple moving parts that can affect application performance at several points.
Application performance challenges in this environment have deterred many IT leaders from adopting the cloud. But in the end, the need for lower costs and added flexibility invariably drove them there.
Three-pronged monitoring and management strategy
Multiple points of performance degradation call for a three-pronged monitoring and management strategy that includes application performance management (APM), infrastructure monitoring and log aggregation tools. Combining these three solutions provides a complete view across the combined stack of on-premises and cloud-based applications, so you can identify bottlenecks before they affect users.
In hybrid clouds, an organization’s key performance bottlenecks are related to capacity and network performance between on-premises and cloud workloads. This combination offers the greatest potential to negatively affect the end-user experience. APM tools can help track real user experiences, or instrument synthetic test transactions, to monitor performance across an application.
With an APM tool, you can monitor a single user’s transaction or aggregate multiple transactions to identify the overall performance of a given application or transaction.
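The difference between a single user's view and the aggregate view can be sketched in a few lines of Python. This is only an illustration: the sample records, the `summarize` helper and the choice of a nearest-rank 95th percentile are assumptions, not the output or API of any particular APM product.

```python
from statistics import mean

def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, never below rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def summarize(durations_ms):
    """Summarize a list of transaction durations in milliseconds."""
    return {
        "count": len(durations_ms),
        "mean_ms": mean(durations_ms),
        "p95_ms": percentile(durations_ms, 95),
    }

# Hypothetical records: (user, transaction name, duration in ms).
samples = [
    ("alice", "checkout", 120),
    ("bob", "checkout", 135),
    ("alice", "checkout", 980),
    ("carol", "search", 40),
]

# Aggregate view: every "checkout" transaction, regardless of user.
overall = summarize([ms for _, name, ms in samples if name == "checkout"])

# Single-user view: only alice's "checkout" transactions.
alice_only = summarize(
    [ms for user, name, ms in samples if user == "alice" and name == "checkout"]
)
```

Comparing `overall` with `alice_only` shows whether one user's slow transaction is an isolated case or part of a wider pattern.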
APM tools don’t always provide the full picture. You can know the transaction is running slowly, but what’s the source of that problem? Is it an oversaturated network switch, a misconfigured load balancer, a physical storage bottleneck or a bad fan on a compute blade that is causing a CPU to overheat?
This stage is where APM tools fall short, but infrastructure monitoring and log analytics tools fill the gap. Infrastructure monitoring tools can cover the network, storage and compute levels through indicators like throughput, latency, IOPS, capacity, consumed bandwidth, and memory or CPU usage. They can also dive into the database or application layers to look at database response times or cache ratios.
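As a rough sketch of how such indicators get turned into alerts, the Python below compares one metrics sample against fixed limits. The metric names and threshold values are made up for illustration; real monitoring tools typically use their own metric names and often baseline-driven thresholds.

```python
# Assumed limits, for illustration only.
THRESHOLDS = {
    "cpu_pct": 90.0,
    "storage_latency_ms": 20.0,
    "bandwidth_pct": 80.0,
}

def breaches(sample, thresholds=THRESHOLDS):
    """Return the names of metrics in the sample that exceed their limit."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if sample.get(name, 0.0) > limit
    )

# Hypothetical reading from one host.
sample = {"cpu_pct": 97.5, "storage_latency_ms": 4.2, "bandwidth_pct": 85.0}
flagged = breaches(sample)
```

Here `flagged` names the CPU and bandwidth metrics, which points the investigation at compute and network rather than storage.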
Log tools consolidate and index log and machine data, including structured, unstructured and complex multi-line application logs. This indexing enables you to collect, search, correlate, visualize, analyze and report on machine-generated data to identify and resolve operational and security issues. Examples of tools that work both on premises and in the cloud include Splunk and VMware vRealize Operations Manager.
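To make the idea of indexing concrete, here is a toy inverted index in Python that maps each token to the log lines containing it. It is a deliberately simplified sketch with invented log lines and hypothetical helper names; it does not reflect how Splunk or any other product implements indexing.

```python
import re
from collections import defaultdict

def build_index(lines):
    """Map each lowercase alphanumeric token to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in re.findall(r"[a-z0-9]+", line.lower()):
            index[token].add(line_no)
    return index

def search(index, lines, *terms):
    """Return the lines that contain every search term."""
    if not terms:
        return []
    hits = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [lines[i] for i in sorted(hits)]

# Invented log lines from two hosts.
logs = [
    "2024-01-01 10:00:01 web01 ERROR timeout calling db01",
    "2024-01-01 10:00:02 web02 INFO request ok",
    "2024-01-01 10:00:03 db01 ERROR disk latency high",
]

index = build_index(logs)
db_errors = search(index, logs, "error", "db01")
```

Searching for both "error" and "db01" returns the web server's timeout and the database server's disk warning together, which is the kind of cross-source correlation a log tool makes routine.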
Once you’ve built your portfolio of tools, you need to develop a sequence of steps for finding the root cause of performance issues in cloud-hosted or hybrid-cloud-hosted applications. Most likely, you’re looking at a top-down approach that applies the three-prong strategy in four steps.
The four-step investigation process
First, use your APM tool to determine whether the problem is application-wide or affects only specific transactions. The APM tool helps identify the affected application and the transactions within it that are degraded.
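One way to picture this first step is to compare each transaction's current latency with a baseline and count how many have degraded. The sketch below assumes a 1.5x slowdown rule and an 80 percent cutoff for calling a problem application-wide; both numbers, the function name and the sample data are hypothetical.

```python
def classify(baseline_ms, current_ms, slowdown=1.5, wide_share=0.8):
    """Classify a slowdown as application-wide or limited to specific transactions."""
    # A transaction counts as degraded when it runs slower than slowdown x baseline.
    degraded = sorted(
        name
        for name, base in baseline_ms.items()
        if current_ms.get(name, base) > slowdown * base
    )
    if not degraded:
        scope = "healthy"
    elif len(degraded) >= wide_share * len(baseline_ms):
        scope = "application-wide"
    else:
        scope = "specific transactions"
    return scope, degraded

# Hypothetical typical and current latencies per transaction, in ms.
baseline = {"login": 200, "search": 150, "checkout": 400}
current = {"login": 210, "search": 160, "checkout": 1300}

scope, degraded = classify(baseline, current)
```

With this data only the checkout transaction is flagged, so the investigation can focus on what checkout touches rather than on the whole application.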
Next, compare individual user performance to the aggregate performance. Individual users will show you the outliers, but you also need to look at the aggregate performance of all users and of your synthetic transactions. An APM tool should help you isolate which parts of the transaction are taking longer than normal.
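A minimal sketch of that comparison, assuming per-user timing data is available as plain lists, is to flag any user whose median latency is well above the median for everyone. The factor of two and the sample data are arbitrary choices for illustration.

```python
from statistics import median

def slow_users(durations_by_user, factor=2.0):
    """Return users whose median latency exceeds factor x the overall median."""
    overall = median(
        ms for values in durations_by_user.values() for ms in values
    )
    return sorted(
        user
        for user, values in durations_by_user.items()
        if median(values) > factor * overall
    )

# Hypothetical per-user durations in milliseconds.
timings = {
    "alice": [110, 120, 130],
    "bob": [115, 125, 118],
    "carol": [900, 950, 1000],
}

outliers = slow_users(timings)
```

If only one user stands out, as carol does here, the cause is more likely tied to that user's location, network path or data than to the application as a whole.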
After you’ve isolated the problem, evaluate the tiers that service the degraded application using your infrastructure monitoring tools. Sometimes the problem exists between two infrastructure tiers, such as the network connection from the compute chassis to the top-of-rack switch or the Fibre Channel network from compute to storage. So again, isolate the tier servicing the application or transaction that was degraded.
Leverage your infrastructure monitoring tools to help you isolate a problem between or within a given tier. They can isolate problems like hung processes, CPU or memory issues, or storage latency.
Once you’ve narrowed the problem down to a specific component, you can determine whether the root cause is a configuration issue, an algorithmic issue, or a result of extensive VM extensions, oversubscription, or I/O and database bottlenecks. This is also where log analytics tools come into play to help you diagnose the problem further and drive toward isolating it.
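A common way log data helps at this stage is by showing what else happened around the time of the slowdown. The sketch below filters log entries to a time window around an incident; the 30-second window, the function name and the sample entries are assumptions for illustration.

```python
from datetime import datetime, timedelta

def logs_near(incident_time, log_entries, window_seconds=30):
    """Return (timestamp, message) entries within the window around the incident."""
    window = timedelta(seconds=window_seconds)
    return [
        (ts, msg)
        for ts, msg in log_entries
        if abs(ts - incident_time) <= window
    ]

# Hypothetical entries collected from several components.
entries = [
    (datetime(2024, 1, 1, 9, 58, 0), "storage: snapshot started"),
    (datetime(2024, 1, 1, 10, 0, 10), "db01: slow query, 4200 ms"),
    (datetime(2024, 1, 1, 10, 0, 25), "lb01: backend health check failed"),
    (datetime(2024, 1, 1, 10, 5, 0), "web02: deploy finished"),
]

# Time at which the APM tool reported the slow transaction (assumed).
incident = datetime(2024, 1, 1, 10, 0, 20)
related = logs_near(incident, entries)
```

Only the slow query and the failed health check fall inside the window, which narrows the list of candidate causes to check first.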
The three-prong approach makes it easier to monitor and improve your cloud performance. Applications spread across multiple clouds and service providers can create headaches across departmental lines. But if you put the power of three to work in an integrated fashion across APM, infrastructure, and log analytics tools or use a single tool that integrates all three, you’ll find your problems can be located and solved before they do damage to your application performance.