Tech Field Day at VMworld: Why Metric Collection Matters

I wanted to talk a little bit about why metric collection matters. Back in June, we surveyed a number of VMUG members. We talked to over 400, specifically asking questions about their monitoring environment: some of the challenges that they had, how they were using it, how they were implementing their monitoring strategy.

Just to give you a little bit of background, these are four items that we think about when we talk about Dimensional Data in monitoring. First, we talk about having a universal data language. This is basically asking: do we have the same data available across all of our platforms? So if you’re using multiple monitoring solutions, you can get that data in each one and it’s the same. You’re not collecting it in a different way for each tool; you’re using something common that is not siloed.

Next, we talk about internal relationship links. Once you get to a certain level of depth monitoring a specific technology, you’re going to break it into different components. So if we’re talking about a Kubernetes cluster, you’re going to have a lot of individual components. It’s going to have nodes and pods, and master/slave controllers. Each one of those is going to have its own metrics associated with it. But we also need to know how they relate to each other. If you don’t have that context, it becomes more difficult to understand what’s going on in your environment.

The next is the external relationship metadata. So not just at the component level but, as I mentioned earlier, across the stack: let’s say from a database, to a virtual machine or an EC2 instance, and then to the underlying infrastructure. Being able to relate those to one another has a huge impact on your ability to take action.
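To make the two kinds of links concrete, here is a minimal sketch of how internal and external relationships might be modeled alongside metrics. The `Component` class, its field names, and the sample environment are all hypothetical and chosen only for illustration; this is not how any particular monitoring product stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Component:
    name: str
    kind: str
    metrics: Dict[str, float] = field(default_factory=dict)
    # Internal links: the parts this component is broken into.
    children: List["Component"] = field(default_factory=list)
    # External link: the layer of the stack this component runs on.
    runs_on: Optional["Component"] = None


def stack(component: Component) -> List[str]:
    """Follow the external links from a component down to the infrastructure."""
    path = []
    current: Optional[Component] = component
    while current is not None:
        path.append(f"{current.kind}:{current.name}")
        current = current.runs_on
    return path


# Hypothetical environment: a database on a VM on a physical host.
host = Component("esx-01", "host", {"cpu_pct": 71.0})
vm = Component("vm-db-01", "virtual_machine", {"cpu_pct": 88.0}, runs_on=host)
db = Component("orders-db", "database", {"query_latency_ms": 420.0}, runs_on=vm)

# Internal links: a cluster broken into nodes, each with its own metrics.
node = Component("node-1", "k8s_node", {"pod_count": 12.0}, runs_on=vm)
cluster = Component("cluster-a", "k8s_cluster", children=[node])

print(stack(db))
print([child.name for child in cluster.children])
```

With the links in place, a slow query on the database can be traced to the VM and host underneath it without anyone having to remember where the database runs.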

And then finally something we call super metrics, which really go beyond the raw metrics. It is taking key performance indicators and pulling together data that gives you a realistic and meaningful health measure for each one of your components, and then shows how that health relates to the different pieces within your environment.
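As a rough illustration of the idea, the sketch below derives a single health number from several raw KPIs and rolls it up across related components. The scoring rule (the worst KPI drives the score), the limits, and the metric names are assumptions made up for this example, not a formula taken from the talk.

```python
from typing import Dict, Iterable


def health(metrics: Dict[str, float], limits: Dict[str, float]) -> float:
    """Score a component from 0 (at or over a limit) to 100 (idle).

    Each KPI is scaled against its limit and the worst KPI drives the score.
    """
    worst = max(
        (min(metrics[k] / limits[k], 1.0) for k in limits if k in metrics),
        default=0.0,
    )
    return round(100.0 * (1.0 - worst), 1)


def rollup(scores: Iterable[float]) -> float:
    """A parent is treated as only as healthy as its least healthy part."""
    return min(scores)


# Hypothetical raw metrics and limits.
db_health = health({"query_latency_ms": 420.0}, {"query_latency_ms": 500.0})
vm_health = health(
    {"cpu_pct": 88.0, "mem_pct": 60.0},
    {"cpu_pct": 100.0, "mem_pct": 100.0},
)

print(db_health, vm_health, rollup([db_health, vm_health]))
```

Taking the minimum in `rollup` is one simple design choice; a weighted average would hide a single failing component, which is usually the thing you want surfaced.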

Q. I have a question on the internal and external relationship pieces. Earlier on you mentioned you guys can visualize or understand common threads between system interdependencies, right? You see the linkage. From an operational and incident management perspective, do you guys do any sort of advanced correlation or event-based triggers that say “we see these things happening, we visualize it all”? Is there forecasting or trending that kind of predicts the severity of the incident, so that you can say “you should check these out,” let’s raise the ticket, let’s proactively look into what we’re seeing? Is that a function or feature?

A. It’s…so, this is something that we will focus on with the platform. A lot of the platforms that we integrate with do have that. We’re providing the data and then they’re doing the analysis on that. And because those relationships are there, it allows them to make, you know, exactly those kinds of conclusions.

Q. Is there anything in your product that helps with application dependency mapping?

A. Yeah, so just as I mentioned with the relationships. Specifically, what we’re not doing is app instrumentation. What we do is take a system that an app is running on, or the app itself if it’s being monitored, and then identify the dependencies of that. And just like I mentioned with the database, it’ll go all the way through the stack, whether it’s a cloud stack or it’s on-prem. We can identify the different components and how they’re related.

Q. You’re going from the app down into the infrastructure, as opposed to something like New Relic, which goes app to app as its basis?

A. Yes, yep.

Q. So in application dependency mapping, they’re saying whatever these metrics can see, you can see. So it’s not necessarily communication-based. It’s whatever metrics these tools are actually spitting out. So if I’m doing a database, I may know who is querying me, but if I’m not monitoring the application, or whatever was querying me, I may not know what’s calling it. It’s a very limited map.

A. Sure. I mean, I think that’s fair. We’re not automatically reaching out to that application. But if you do have it monitored within the environment, we’ll automatically make those links.

Q. It’s whatever you can see we’ll link, but there could be a lot missed, a lot of gaps?

A. Sure. And that’s, you know, we try and cover those. That’s why we have 150 different integrations. We’re always trying to find those gaps, and then fill them in.

Q. Okay. Can you describe a way that your tools, or anything like them, even, would tell me what I’m missing? So that I’d know, “oh, I need that particular integration to make this work better.”

A. I think that it would be apparent in a visual map of it. So if you look at something like…

Q. I’d love to see the relationships, and I know it’s apparent that there may be a gap. It’s not apparent what tool I need to measure that, to monitor that. See, the problem is a lot of people that look at monitoring data are not necessarily the application developers, depending on how they’re used. The bigger the organization, the more siloed they are like that. It would be great if you actually said, “Oh, you know what, this is a database. It’s MySQL. You need this tool to monitor MySQL better. That’s a gap you have.” Or, “Hey, we notice that this is being called in a certain way that we recognize as SharePoint. This is the tool you need to monitor SharePoint.” Tell us more than just “Oh, it’s apparent. It’s a gap.” Well, yeah, gaps are apparent, but how we fill those gaps is not apparent.

A. Sure.

Q. Because not all the people that do the monitoring know the applications.

A. Okay.

Q. This is the same problem we have with data protection people doing backup. They don’t know the applications, or they just do what they’re told. We need the tools to be smarter, to tell us what we’re missing and how to fill those gaps. So that comes back full circle. It’s like, “oh, you know what, you needed this. Let me put it in automatically for you.” Or give me some workflow: you know what, the next time we do this…here’s the workflow to put that in place.

A. But Blue Medora’s in the marketplace, VMware’s marketplace. So as long as you have that set of integrations done, you’ll at least get visibility to see what toolsets are available. It may not match one-for-one but you can only browse your data.

Q. But again, that’s a human involved. In a large environment, you don’t want a human involved. I want the tool to tell me what I’m missing.

A. I think auto-discovery is one of the things that comes up a lot. That partially addresses what you’re talking about. It’s something that we do on an individual technology level, and we’re moving towards doing that across the board.
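The kind of gap report being asked for in this exchange could look something like the sketch below: compare what discovery has found against what is actually monitored, and name an integration for each gap. The catalog, the technology labels, and the function name are hypothetical; nothing here reflects an actual BindPlane feature or API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical catalog mapping a detected technology to an integration name.
CATALOG: Dict[str, str] = {
    "mysql": "MySQL integration",
    "sharepoint": "SharePoint integration",
}


def find_gaps(
    discovered: Dict[str, str], monitored: Set[str]
) -> List[Tuple[str, str]]:
    """Return (component, suggested integration) for each unmonitored component."""
    gaps = []
    for component, technology in discovered.items():
        if component not in monitored:
            suggestion = CATALOG.get(technology, "no known integration")
            gaps.append((component, suggestion))
    return gaps


# Hypothetical discovery results: component name -> detected technology.
discovered = {
    "orders-db": "mysql",
    "intranet": "sharepoint",
    "vm-db-01": "vmware",
}
monitored = {"vm-db-01"}

for component, suggestion in find_gaps(discovered, monitored):
    print(f"{component}: not monitored, consider the {suggestion}")
```

The hard part in practice is the discovery step that fills in `discovered`, which this sketch takes as given.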

Q. Seems like a perfect application of what BindPlane can do is to pick out what’s missing. I’m not necessarily asking for the individual management pack to tell me that there’s something missing over here. BindPlane could probably tell me what’s missing.

A. Yeah. So, I’ll go through some of these pretty quickly, but I wanted to hit on a few key points that we got from this research, from the survey that we did. One was we asked, “how many unique monitoring tools do you run inside of your environments?”

And what we saw, and this wasn’t a big surprise, was that 60% of the customers we spoke with had over three monitoring platforms. So like I said, that’s not a surprise in and of itself, but what was interesting to me was when we followed up and said, “how many of you are planning to consolidate?” What we’ve been hearing in the past is usually everyone says, “yeah, well, we all want to consolidate it down to one.” That is sort of the Holy Grail. And that’s not the case as strongly as it was in the past. Now what we’re hearing is…well, 50% of the folks we talked to said, “no, we’re not planning to combine anymore because every single one of these tools is critical to us and serves a different need inside of the environment.” I think that was really interesting. I think that we’ve seen it with our customers in the past, but it wasn’t quite as clear as what we saw through this survey.

Several of these are really looking at, “okay, with your monitoring integrations, how does it help you, in this case, improve resource utilization?” Alright, so some of these questions are somewhat vague, but we allow the user to define that on their own. And so we look at how many of them are showing that it’s, you know, 50 to 100 percent. And then what we did on top of that is overlay how many of those customers said that they are implementing all four components of Dimensional Data that we mentioned earlier. So do they have relationships internal and external, do they have an overview of the entire system today, are they using super metrics. And what we found was that the vast majority of those in that top utilization bracket, the people who were happiest with what they were seeing, were also implementing those four values in their monitoring environments.

Similarly, we asked a question about how much your monitoring integrations improve your team’s productivity. So we’re narrowing this down to integrations specifically, and again the folks in the top 50 to 100 percent were heavily skewed towards those who were fully deployed, or who were looking at integrations and all the different components of Dimensional Data that we mentioned earlier. Then finally we asked about cost savings. And so this is quite a range of companies, you know, some of them were…well, I think everyone was over 500 employees, but there was a range from there. Some of them were smaller, some of them were significantly larger.

So we asked, “how much cost savings can be attributed to that?” This was interesting when we looked at the fully deployed, so folks that had followed all of the principles that we laid out. One in five of them saw one million-plus in savings per year. So this is pretty significant, to see a correlation between being fully deployed and the cost savings that they were seeing.

Q. Quick question: what does fully deployed mean?

A. So what we’re saying there is all four of those things that we mentioned in the beginning. So we talked about: do they have relationships internal and external, do they have visibility across, you know, the same metrics available in all their platforms, and super metric components.

Q. We have a question from the internet as well. Mike Ridge wants to ask, “is there an API to trigger additional metric collection?” So, for example, if there is a problem with some disks.

A. Sure, so can you run an API call to get more info on the SAN disks, for example? In some platforms, we have done that. We’ve done that inside of VMware, with triggered actions in certain cases. We’ve also had situations where we can trigger a query plan for database monitoring.

If you want to do something like build out a query plan, well, that can be expensive. We can’t do that every collection. What we allow the user to do is go in and request a special collection, where they get a query plan for a specific query on demand.
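The split between cheap scheduled collection and expensive on-demand collection can be sketched as follows. The `Collector` class, its method names, the threshold, and the placeholder query plan are all hypothetical stand-ins; a real collector would call out to the database, and the trigger rule shown is only one way such a request might be made.

```python
from typing import Dict, List


class Collector:
    """Toy collector separating cheap scheduled data from expensive extras."""

    def __init__(self, latencies_ms: List[float]) -> None:
        self._latencies = iter(latencies_ms)
        self.expensive_calls = 0

    def collect(self) -> Dict[str, object]:
        # Cheap: runs on every scheduled collection cycle.
        return {"query_latency_ms": next(self._latencies)}

    def collect_query_plan(self, query: str) -> str:
        # Expensive: only run when explicitly requested.
        self.expensive_calls += 1
        return f"<plan for: {query}>"  # placeholder, not a real plan


def run_cycle(collector: Collector, threshold_ms: float, query: str) -> Dict[str, object]:
    """Collect the cheap metrics; request the plan only when latency is high."""
    sample = collector.collect()
    if sample["query_latency_ms"] > threshold_ms:
        sample["query_plan"] = collector.collect_query_plan(query)
    return sample


collector = Collector([120.0, 135.0, 640.0])
samples = [run_cycle(collector, 500.0, "SELECT * FROM orders") for _ in range(3)]

print(samples[-1])
print("expensive collections:", collector.expensive_calls)
```

Three cycles ran, but the expensive call happened only once, on the cycle where latency crossed the threshold.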

Q. So there’s API access. Is it supported by any sort of PowerShell or whatever they want to use?

A. It’s really platform dependent. So it depends on which platform we’re talking about. If we’re talking about VMware, then it’s CLI based and you can do it that way.

Q. So, yeah, you may have to jump through a few hoops.

A. So the majority of what we’re doing is monitoring focused, so it’s not pushing over into automation.

Q. But I guess it stems from the fact that you can control exactly what you’re monitoring or not monitoring. I guess that question means that you typically wouldn’t be monitoring something, but then on an ad hoc basis you can enable it.

A. So, yes, that’s actually…I think I may have missed part of the question. So with BindPlane, for example, and with a lot of our solutions, customers have massive environments, so it’s not realistic to do manual configuration of everything there. So wherever possible we have an API alongside it that will allow you to give it a list, for example, or a standard pattern, and it’ll automatically configure those for you.
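Configuring from a list or a pattern, rather than by hand, might look like the sketch below. The inventory, the config fields, and the `build_configs` helper are hypothetical; the talk does not describe the actual API, so this only shows the general shape of pattern-driven configuration using Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def build_configs(
    inventory: List[str], pattern: str, interval_s: int = 300
) -> List[Dict[str, object]]:
    """Create one monitoring config per inventory entry matching the pattern."""
    return [
        {"target": name, "interval_s": interval_s, "enabled": True}
        for name in inventory
        if fnmatchcase(name, pattern)
    ]


# Hypothetical inventory of systems in a large environment.
inventory = ["db-prod-01", "db-prod-02", "db-test-01", "web-prod-01"]

configs = build_configs(inventory, "db-prod-*", interval_s=60)
for config in configs:
    print(config)
```

The same helper covers the ad hoc case: generate configs for a pattern when something needs watching for a while, then disable them afterwards.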

Q. I guess it also is useful in the sense that you could, for example, have certain recipes which you can enable ad hoc when you want to monitor something for a period of time.

A. Yeah. In fact, our DevOps team does this so often internally, because we’re testing hundreds of different technologies. We need to turn them on and off, automate them, set them up. So they built their own vROps CLI tool, for VMware specifically, that allows them to do that, and that’s available to all of our customers.

A. Thank you.
