In today’s post, we continue looking at the Blue Medora ITM Agent for Amazon EC2 with a walkthrough of the agent in the TEP. We will consider a number of enterprise scenarios and use the agent to solve them. I hope the information given here helps you quickly begin taking advantage of the agent’s features in your ITM environment. - Mike Major
Let’s start with an easy one: Amazon is known for its uncanny ability to maximize the uptime of its services, but even Amazon may experience some downtime or slow time once in a while. In this scenario, let’s assume that an EC2 instance is not responding and we want to know whether we currently have a connection to the service. This problem is easily solved by clicking the AWS Health node.
Here we can see a quick overview of the status of all Amazon Web Services. Near the bottom, we can also view our current connection with the EC2 and CloudWatch services. There are also a couple of pre-packaged situations that could help us to become aware of an unreachable service. The situations KB5_EC2_Unreachable and KB5_Cloudwatch_Unreachable fire when there is no response from the EC2 service or CloudWatch service respectively. We will get into the ITM Situations available a little more in the next scenario.
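For readers who want a comparable check outside the TEP, the idea behind the two unreachable situations can be sketched in Python. This is only an illustration and not how the agent implements KB5_EC2_Unreachable or KB5_Cloudwatch_Unreachable: the helper name service_reachable is hypothetical, and the probes mentioned afterwards assume the boto3 library.

```python
from typing import Callable


def service_reachable(probe: Callable[[], object]) -> bool:
    """Return True if the probe call completes, False if it raises.

    `probe` is any zero-argument callable that makes one cheap request
    to the service being checked. Hypothetical helper, not agent code.
    """
    try:
        probe()
    except Exception:
        # Any failure (timeout, DNS error, bad credentials) counts as
        # "unreachable" in this simplified sketch.
        return False
    return True
```

With boto3 installed and credentials and a region configured, service_reachable(boto3.client("ec2").describe_regions) and service_reachable(boto3.client("cloudwatch").list_metrics) would roughly approximate the two connection rows on the AWS Health workspace.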
Consider this scenario: The systems administrator for your enterprise needs CloudWatch data to keep a close watch on the performance of the instances in your EC2 environment. The first thing we will want to do is determine how to view the needed CloudWatch data in the TEP. From the TEP desktop client, we will first view the collected CloudWatch data for all the instances. Click the CloudWatch Summary node.
Here we get a quick overview of the major CloudWatch data such as CPU utilization, network activity and disk operations. For a more detailed look at any of these areas, we can right-click the node, click Workspace, and click the workspace view for the data we are interested in.
If we have warehousing enabled, we can view historical CloudWatch data for the last hour, day or week. Let’s take a look at the last day.
Historical views give a quick, graphical indicator of the trends of data in your environment. They can be used to debug problem areas and identify areas that could handle a bigger workload. We can also view the instances that currently have CloudWatch enabled and disabled. Let’s do that now.
Here we can see much of the same information as the higher-level CloudWatch node, except that this information is specific to a single instance. As above, we can view historical data with warehousing enabled by right-clicking the CloudWatch node and clicking the view we would like to see.
Historical data for a specific instance is very useful for analyzing the performance of an instance. In our scenario, this may be the screen that the administrator is most interested in.
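The same kind of per-instance history can also be pulled straight from CloudWatch, which may be handy for comparing against what the TEP shows. The sketch below assumes a boto3 CloudWatch client is passed in as cloudwatch_client; the function name average_cpu_last_day and the hourly period are my own choices for illustration, not something the agent exposes.

```python
from datetime import datetime, timedelta, timezone


def average_cpu_last_day(cloudwatch_client, instance_id, period_seconds=3600):
    """Return (timestamp, average CPU %) pairs for the last 24 hours.

    `cloudwatch_client` is assumed to be a boto3 CloudWatch client.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=1)
    response = cloudwatch_client.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=period_seconds,
        Statistics=["Average"],
    )
    # CloudWatch does not guarantee datapoint order, so sort by time.
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

Taking the client as a parameter keeps the function easy to try against a stub before pointing it at a real account.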
Now, we want to know when one of our instances has CloudWatch disabled, since we are depending on the CloudWatch data to notice any problem areas. As I mentioned earlier, we could check this list to see when one of our instances does not have CloudWatch enabled, but that is tedious and takes unnecessary time out of our day. A better solution is to use an ITM Situation, specifically the pre-packaged situation named KB5_Instance_CW_Disabled. To start the situation, right-click the instance node and click Manage Situations.
We will now see a list of the available situations for this agent. Click the KB5_Instance_CW_Disabled situation, then click Start Situation… near the top (it looks like a ‘play’ button).
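To make the condition behind this situation concrete, here is a rough Python sketch of the same check. I am assuming that "CloudWatch disabled" corresponds to the Monitoring state that EC2 reports for each instance, and that the input has the shape of a boto3 describe_instances response; the function name instances_without_monitoring is hypothetical.

```python
def instances_without_monitoring(describe_instances_response):
    """Return IDs of instances whose monitoring state is not 'enabled'.

    The argument is assumed to look like a boto3 `describe_instances`
    response: Reservations -> Instances -> Monitoring -> State.
    """
    missing = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("Monitoring", {}).get("State")
            # 'pending' and 'disabling' are also reported here, since
            # monitoring data is not reliably available in those states.
            if state != "enabled":
                missing.append(instance["InstanceId"])
    return missing
```

The situation saves us from running anything like this by hand, but the sketch shows how little logic the check needs.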
Now that we are being notified when our instances have CloudWatch disabled, let’s fix the problem with an ITM Take Action. Right-click the Amazon EC2 node and click Take Action… followed by Select…
Choose the Start_CloudWatch_All_Instances Take Action and click OK. You could also start CloudWatch for a single instance by right-clicking the Instance node and choosing a Take Action there.
Congratulations! Your instance now has CloudWatch enabled!
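For comparison, turning monitoring on from a script might look like the sketch below. It is not the code behind Start_CloudWatch_All_Instances; it simply assumes a boto3 EC2 client is passed in as ec2_client and uses its monitor_instances call. The function name start_cloudwatch is my own.

```python
def start_cloudwatch(ec2_client, instance_ids):
    """Enable monitoring for the given instances.

    `ec2_client` is assumed to be a boto3 EC2 client. Returns a dict
    mapping each instance ID to the monitoring state EC2 reports back.
    """
    ids = list(instance_ids)
    if not ids:
        # Nothing to do; avoid sending an empty request.
        return {}
    response = ec2_client.monitor_instances(InstanceIds=ids)
    return {
        item["InstanceId"]: item["Monitoring"]["State"]
        for item in response["InstanceMonitorings"]
    }
```

The state returned right after the call is often "pending" rather than "enabled", so a follow-up check a little later is a reasonable habit.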
Consider this scenario: You have a number of critical servers in your Amazon EC2 environment. These servers need to be running smoothly in order to work effectively. To solve this problem, we first need to set a situation to warn us when a critical server has very high CPU utilization. Luckily, the agent comes with a couple of pre-packaged situations that do just that. Navigate to the Manage Situations screen as in the above example. We have two pre-packaged situations to choose from: KB5_Instance_CPU_Warn (which warns us of CPU utilization from 75% to 90%) and KB5_Instance_CPU_Crit (which warns us of CPU utilization of 90% or higher).
Choose the one that feels the most appropriate and click the Start Situation button.
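The two thresholds can be written out as a small Python function to make the ranges explicit. This is a sketch of the ranges described above, not the situations’ actual formulas: the name classify_cpu is hypothetical, and treating exactly 75% as a warning is my assumption, while 90% counting as critical follows the "90% or higher" wording.

```python
def classify_cpu(utilization_percent):
    """Map a CPU utilization percentage to 'ok', 'warning' or 'critical'.

    Mirrors the ranges described for KB5_Instance_CPU_Warn (75% to 90%)
    and KB5_Instance_CPU_Crit (90% or higher). Treating exactly 75 as a
    warning is an assumption.
    """
    if utilization_percent >= 90:
        return "critical"
    if utilization_percent >= 75:
        return "warning"
    return "ok"
```

For example, classify_cpu(82.5) falls in the warning range, while classify_cpu(90) is already critical.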
Now that we are aware of any instances behaving badly, let’s fix the situation. Most likely, the instance is being overtaxed. Identifying these instances with the situation warnings lets you know when it is time to spawn another Amazon EC2 instance and balance the workload. Let’s assume that, in our example, the problem is a simple runaway process. One way to fix this might be to reboot the instance at the end of the work day. We can do this with a Take Action. Right-click the node of the instance in question and navigate to the Take Action window as in the above example.
Now choose the Reboot_EC2_Instance Take Action and click OK.
Notice the list of instances at the bottom of the screen. The instance we are working with is highlighted by default, but you can choose to reboot more instances at the same time by holding the Ctrl key and clicking the instances you would like to reboot. Click OK and the instance will reboot!
Well, that’s it for now. I hope these scenarios have given you a good idea of the power behind the Blue Medora ITM Agent for Amazon EC2.