Google Operations is Here! But Where’s Stackdriver?

by Craig Lee on March 9, 2020

Last week, Google Cloud Platform (GCP) announced that Stackdriver, the monitoring and logging platform Google acquired in 2014, has been rebranded as part of its new Google Operations platform. The rebrand renames Google Stackdriver Monitoring to Google Cloud Monitoring and Google Stackdriver Logging to Google Cloud Logging. So what does this mean for Stackdriver customers?

While I for one am excited to see Google pulling all of its operations products together, I also want to be clear that, aside from a few new feature releases, these products are in fact still Stackdriver! We are looking at this rebrand as essentially Google Stackdriver 2.0. It allows Google to say goodbye to the Stackdriver brand and fully embrace its Google-esque naming conventions, making it clear what the products deliver. The new Google Cloud Operations SKU lets Google take the monitoring and logging functionality that Stackdriver customers know and love and promote it to the Googleverse, so that other GCP customers can also benefit.

This change in direction can be seen in the recent merging of the Stackdriver Metrics UI into the Google Cloud Console, a change that makes for a more unified experience in the Google ecosystem.


Google Cloud Monitoring is now available in the same console as all the other services.

BindPlane Logs and Metrics will continue to integrate with Google Cloud Logging and Google Cloud Monitoring, extending Google Cloud’s monitoring capabilities to on-prem, hybrid cloud, and multi-cloud environments. This lets GCP users manage more than 150 of the most common non-GCP technology sources without leaving Google Cloud, enhancing observability across their environments. One of the most exciting parts of this release is that Google added a few feature updates that BindPlane customers have been asking for, including:

  • Dashboard API to create and share dashboards across projects
  • Log storage for up to 10 years
  • Metrics retention for up to 24 months
  • Increased granularity of metric writes, down to 10-second intervals
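To illustrate the first item on that list: a dashboard created through the Dashboards API is just a JSON document. The sketch below builds a minimal dashboard body in Python; the project ID, display name, and chart are invented, and the field names follow the public Dashboard resource as I understand it, so treat this as a hedged example rather than a definitive recipe.

```python
# Sketch: building a Cloud Monitoring dashboard definition that could be
# sent to the Dashboards API (monitoring.googleapis.com/v1).
# Field names follow the public Dashboard resource; values are hypothetical.
import json

def build_cpu_dashboard(project_id: str) -> dict:
    """Return a minimal dashboard body with one CPU-utilization chart."""
    return {
        "displayName": "Shared VM Overview",
        "gridLayout": {
            "widgets": [
                {
                    "title": "CPU utilization",
                    "xyChart": {
                        "dataSets": [{
                            "timeSeriesQuery": {
                                "timeSeriesFilter": {
                                    "filter": 'metric.type="compute.googleapis.com/instance/cpu/utilization"'
                                }
                            }
                        }]
                    },
                }
            ],
        },
    }

body = build_cpu_dashboard("my-project")
print(json.dumps(body, indent=2))
```

Because the body is plain JSON, the same definition can be created once and shared across projects, which is the point of the new API.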

As Google continues its momentum with Google Cloud Operations, one thing is for sure: whether we call it Stackdriver or Google Cloud Logging and Monitoring, BindPlane will continue to help GCP customers extend their visibility to on-prem and hybrid clouds so they can accurately troubleshoot, monitor, report, and alert in real time on their full stack, all within GCP!

Get Started with BindPlane for Google Cloud Logging & Monitoring:

Registering for BindPlane to support Google Cloud Operations is still the same process. You can sign up for your BindPlane account here.

To learn more about BindPlane, visit Google’s documentation for Google Operations or view BindPlane on the Google Marketplace.

Blue Medora: Connecting Health and Performance data with Any Monitoring or Analytics Platform

by bluemedora_admin on February 20, 2020

This article originally appeared on Intellyx

We last covered Blue Medora in May 2018. The company continues to pull metrics data from multiple sources, correlate it, and send it to the monitoring or analytics platform of choice.

Today, Blue Medora offers two products with different go-to-market strategies: True Visibility for on-premises VMware deployments, and BindPlane, a SaaS offering that supports cloud and hybrid IT environments.

Blue Medora has also recently added log monitoring to its capabilities, rounding out its cloud-native observability story and extending its hybrid IT capabilities.

BindPlane is also able to query the environments it has access to, identifying available sources of data automatically, thus dramatically simplifying installation and configuration.

The Cloud Podcast: Bring Order to Your Monitoring with BindPlane

by Mike Kelly on February 12, 2020

Update: Google Stackdriver is now Google Cloud Logging and Google Cloud Monitoring. BindPlane will continue to integrate and support both of these products.

A few weeks ago I sat down with Justin Brodley and Jonathan Baker, hosts of The Cloud Pod podcast, to talk about BindPlane and to discuss logging and monitoring. The podcast was posted this past week, and I want to give you a quick overview of the topics we covered and how BindPlane from Blue Medora fits into the mix. You can listen to the podcast here.


In order to get the most out of the podcast and the concepts we discussed, I want to quickly break down who Blue Medora is and share some background on our latest SaaS product, BindPlane. Blue Medora lives in the IT performance monitoring space, but we don’t think of ourselves as the user’s primary platform. With BindPlane, we help customers ship their logs and metrics to preferred destinations like Google Cloud Platform (GCP) and New Relic, expanding the aperture of what they are able to observe. Our goal is to reduce the number of monitoring tools an organization needs to understand its full environment. We do this by making it easy to deploy agents, monitor the different components an organization needs, and put it all into a single view in the customer’s preferred centralized location.

Our latest release expands BindPlane to monitor logs within Google Stackdriver and New Relic. BindPlane Logs lets customers deploy a fully managed log agent that comes with pre-configured log bundles, so they can get up and running while monitoring only what they need and want. Giving customers control over what they monitor, and how frequently, helps them avoid the surprise bill at the end of the month from over-monitoring. By making it easy for users to get started and stay up to date, BindPlane separates itself from other open source solutions by putting the customer in control.

Now that you have a basic understanding of BindPlane, here are a few of the additional topics that we dive into on the podcast:

  1. Serverless Technology (Kubernetes vs. Serverless)
  2. Machine Learning and AI 
  3. Configuration and Compliance Management
  4. On-premise to Cloud Migration 
  5. The BindPlane Roadmap

I don’t want to give too much away, but as we discuss the above topics, I share relevant customer use cases showing how BindPlane is helping DevOps teams save time and cut costs when monitoring both logs and metrics. I would also like to thank Justin and Jonathan once more for being great hosts, and I hope that you will get a lot out of our conversation!

If you are interested in learning more about BindPlane, you can request a demo or get started today using our documentation portal to guide your setup. 

Stackdriver Log Management with BindPlane

by Nate Coppinger on January 20, 2020

Update: Google Stackdriver is now Google Cloud Logging and Google Cloud Monitoring. BindPlane will continue to integrate and support both of these products.

Technical Contributions from Blue Medora Chief Systems Architect, Craig Lee.

This is part one of a three-part blog series on log management for Google Stackdriver with BindPlane.

Proper Log Management

Whether you’re a multi-billion-dollar tech firm or a small startup, sorting your data in a usable and logical manner can prove to be a major challenge. No matter the size of your company, you are going to have individuals within your organization who only need access to specific data and information. Providing them with anything extra slows down workflows and creates digital clutter. When you first begin ingesting data with Google Stackdriver Logging, the amount of data can be overwhelming, and you’ll almost immediately find a need for log management.


Unfiltered logs can greatly slow down workflows when you are ingesting log data from multiple data centers, web apps, databases, and more, especially when each team only requires data from specific sources. For example, without the ability to sort and filter your data by source, your database team is left sifting through web-app and other irrelevant data to locate what they are searching for, and vice versa. Even if you are pulling data from a single source, it can be difficult to find the specific log or log set that you need.

Users may be looking to keep an eye out for different severity levels, specific log types, or even a single specific log. All of this can be extremely difficult and time-consuming to sort out without proper log management: 71% of users report that current tools give hints but rarely surface the root cause, and they end up losing days of productivity in this fruitless search. This is where log management and tagging come into play.

Why you need Log Tagging

Log tagging will prove to be one of the most useful tools in your log management arsenal. Once you have implemented tagging, all of your filtering challenges within Google Stackdriver will be a thing of the past. Out of the box, Google Stackdriver offers some basic tagging features. These allow users to sort logs by message severity, namespace, the application that sent the log, and the respective Google data center. These basic tags are a great tool to help teams sort out the majority of the noise in their log ingestion, but they still do not solve the problem of sorting out individual log instances within these applications and data centers.
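As a rough sketch of the kind of filtering those built-in tags enable, assume log entries have already been parsed into simple records; this record layout is illustrative, not Stackdriver’s exact schema.

```python
# Sketch: filter parsed log entries by severity and namespace, mimicking
# the built-in tag filters described above. Entry layout is illustrative.
SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "CRITICAL": 4}

def filter_logs(entries, min_severity="WARNING", namespace=None):
    """Keep entries at or above min_severity, optionally in one namespace."""
    keep = []
    for e in entries:
        if SEVERITY_RANK[e["severity"]] < SEVERITY_RANK[min_severity]:
            continue
        if namespace and e.get("namespace") != namespace:
            continue
        keep.append(e)
    return keep

logs = [
    {"severity": "INFO", "namespace": "web", "message": "request ok"},
    {"severity": "ERROR", "namespace": "db", "message": "slow query"},
    {"severity": "CRITICAL", "namespace": "web", "message": "pod crash"},
]
print(filter_logs(logs, "ERROR"))  # keeps the ERROR and CRITICAL entries
```

This already cuts out most of the noise, but notice that it can only slice along the fields the platform tags for you, which is exactly the limitation custom tagging addresses.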

Really drilling down into tagging and custom log management requires some custom Fluentd work, which can prove difficult without the assistance of a service like BindPlane. We’ll dig more into that in part two of the series. Below you will see an example of a Kubernetes log message with custom log tagging applied.

Example Kubernetes Log Message:

Namespace, node_name, container_name, and other tags are in the JSON Payload of the log message
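A hypothetical entry of the shape described above might look like the following; all of the values (cluster, node, namespace, and tag names) are invented for illustration.

```python
# Illustrative only: a Kubernetes log entry with custom tags carried in
# the jsonPayload, shaped like the example described above. Values invented.
import json

entry = {
    "severity": "INFO",
    "jsonPayload": {
        "message": "Started container order-api",
        "namespace": "production",
        "node_name": "gke-cluster-1-default-pool-abc123",
        "container_name": "order-api",
        "bindplane_tags": {"team": "order-processing", "location": "us-east1"},
    },
}
print(json.dumps(entry, indent=2))
```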


Log Management on a Global Scale

Tagging seems like such a simple concept that you’re probably thinking it would be common sense to have it implemented. As we said earlier, though, creating custom tags and log management can be very difficult and time-consuming without help. This was made evident when one of our clients came to us with the challenge of scaling their log monitoring on a global level.

Our client was running seven application services across two data centers, in us-east1 and europe-west2, ingesting logs from six data sources, used by six different teams, on 50 different servers. That was complicated to type out, let alone to manage when sorting the data to the correct teams, so you can see why tagging and log management are not as simple as they seem on paper. Our client’s database admins administer their MySQL databases in us-east1 and have no need to see irrelevant processes coming from order procurement in europe-west2, or from any of the other teams working within the organization. But without proper sorting, critical signals get lost as these teams manually sift through the flow of thousands of log messages.

Custom Tagging with BindPlane

Basic tagging in Google Stackdriver Logging is a good place to start for log management, but in a use case like this customer’s, it gets complicated to manage at scale. Implementing BindPlane to help monitor your logs lets you customize the log tagging for each individual source and create templates to apply at any scale. These templates save you from manually recreating and entering each tagging option for every single data source.
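To illustrate the template idea, the sketch below defines one tag set and stamps it onto every source, letting per-source tags override the template. The data structures here are hypothetical and are not BindPlane’s actual configuration format.

```python
# Sketch of tag templates: merge a shared tag set into many source
# configurations instead of re-entering tags per source.
# Structures and values are hypothetical, not BindPlane's real format.
def apply_tag_template(sources, template):
    """Merge template tags into each source; per-source tags win on conflict."""
    return [{**src, "tags": {**template, **src.get("tags", {})}} for src in sources]

template = {"location": "us-east1", "team": "order-processing"}
sources = [
    {"name": "mysql-prod"},
    {"name": "mongodb-orders", "tags": {"team": "dba"}},
]
tagged = apply_tag_template(sources, template)
print(tagged[1]["tags"])  # {'location': 'us-east1', 'team': 'dba'}
```

The per-source override is the important design choice: a template covers the common case, while an individual source can still carry its own team or location tag.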

Once these customizations have been applied to your different sources, users on different teams will save time and effort when trying to find exactly what they need. Rather than the order processing team in South Carolina manually sorting through production team logs from Frankfurt, Germany, they can simply filter for logs coming from us-east1, for their own team (order processing), or for the specific application they want log data from, such as MongoDB.

Example Log Message with Tags:

Note the bindplane_app, function, and location tags.


Effectively managing log data can be challenging if your organization does not have a way to easily tag and identify the logs that are important to it. Join us in part two of the series to learn how to customize your log tagging in BindPlane for Stackdriver Logging and how it can increase efficiency and completely change your workflow.

Troubleshoot MongoDB Replication Lag

by Nate Coppinger on December 18, 2019

Update: Google Stackdriver is now Google Cloud Logging and Google Cloud Monitoring. BindPlane will continue to integrate and support both of these products.

What is MongoDB Replication Lag?

MongoDB is no different from other databases in that it relies on data replication, and even if we had quantum computers at our disposal, there would always be at least a small amount of lag when replicating operations from the primary to the secondary node. MongoDB replication lag is specifically the interval between an operation being run on the primary node and that operation being applied to the secondary node from the oplog.
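In miniature, that definition reduces to a timestamp subtraction. Here is a small illustrative sketch; the timestamps are invented.

```python
# Replication lag in miniature: the gap between the last operation written
# on the primary and the last one applied on the secondary. Times invented.
from datetime import datetime

def replication_lag(primary_last_op: datetime, secondary_last_applied: datetime) -> float:
    """Return lag in seconds (0 if the secondary is fully caught up)."""
    return max(0.0, (primary_last_op - secondary_last_applied).total_seconds())

primary = datetime(2019, 12, 18, 12, 0, 30)
secondary = datetime(2019, 12, 18, 12, 0, 12)
print(replication_lag(primary, secondary))  # 18.0
```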


Why are You Experiencing MongoDB Replication Lag?

MongoDB replication lag occurs when the secondary node cannot replicate data fast enough to keep up with the rate at which data is being written to the primary node. This can happen for a few reasons, so it can be hard to pinpoint exactly why you are experiencing replication lag. Some of the main culprits include network latency, disk throughput, concurrency, and large amounts of data written to MongoDB. Your replication lag could be caused by something as simple as network latency, packet loss within your network, or a routing issue, any of which could slow down replication from your primary node to your secondary.

One of the leading causes of replication lag in multi-tenant systems is slow disk throughput. If the secondary cannot write data to its disks as fast as the primary does, it will have trouble keeping up. The secondary may also run short of memory, I/O, or CPU, keeping data from being written to its disks and letting it fall further behind the primary.

Concurrency strikes again! As we mentioned in our last blog on MongoDB lock percentage, concurrency (while entirely supported and well handled by MongoDB’s lock granularity) can sometimes cause unintended consequences within your system. In this case, large, long-running write operations lock up the system and block replication to secondaries until they complete, increasing replication lag. Similarly, under frequent, large write operations, the secondary node will be unable to read the oplog as fast as the primary is written to, and will fall behind on replication.

When and Why Should you be Concerned?

As stated above, even with the most powerful computers and databases at your fingertips, you will see some replication lag; the question is, how much lag is too much? Ideally, in a healthy system, MongoDB replication lag remains as close to zero as possible, but that’s not always achievable. Sometimes the secondary nodes lag behind the primary but catch up without any intervention, and that’s perfectly normal. However, if replication lag stays persistently high and continues to rise, you will need to step in and remedy the situation before the quality of your database degrades and you have even bigger problems on your hands.


You need to stay on top of this for a number of reasons. As you probably know, the main reason you have secondary nodes is so one can take over if your primary node steps down because it is no longer part of the majority active set, or if it fails outright. You do not want an out-of-date secondary taking over for your primary; if it is so far behind that your database won’t function correctly, you may even have to take down your entire database until the new primary catches up or the old one is recovered. Along the same lines, if your secondary falls too far behind the primary and operations are not replicated and kept up to date, then a primary failure that cannot be recovered will leave a large amount of manual reconciliation to be done, which takes time and resources and creates headaches for everyone involved.

How to Monitor and Minimize MongoDB Replication Lag

Make sure to use all the tools at your disposal when minimizing MongoDB replication lag. Since the oplog has limited space, you won’t want the secondary to fall too far behind, or it will be unable to catch up with the primary and you will need to run a full sync. Avoid a full sync at all costs, since it can be extremely expensive. There are a few methods to stay on top of MongoDB replication lag and keep it from getting out of hand. First, frequently check the current replication lag interval: run rs.printSlaveReplicationInfo() in a mongo shell connected to the primary node. This returns the ‘syncedTo’ value for each member, showing when the most recent oplog entry was written to each secondary.
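As a rough illustration of what that shell command reports, the sketch below computes per-secondary lag from member optime timestamps like those MongoDB’s replSetGetStatus exposes as optimeDate. The member data is fabricated for the example.

```python
# Sketch: compute each secondary's lag behind the primary from per-member
# optime timestamps (as replSetGetStatus exposes via optimeDate).
# The member data below is fabricated.
from datetime import datetime

def member_lags(members):
    """Map each secondary's name to its lag, in seconds, behind the primary."""
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }

members = [
    {"name": "db0:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2019, 12, 18, 12, 5, 0)},
    {"name": "db1:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2019, 12, 18, 12, 4, 45)},
]
print(member_lags(members))  # {'db1:27017': 15.0}
```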

If you don’t want to check MongoDB replication lag manually every day, consider monitoring it with a service such as Google Cloud’s Stackdriver via BindPlane. BindPlane works in tandem with the leading monitoring services, letting you track metrics such as slave delay and replication count within Stackdriver and create intelligent alerts. By definition, replication lag is the time it takes the secondary to read the oplog and replicate the data from the primary node: if it takes 15 minutes for the secondary to read the oplog, there is a 15-minute replication lag. That would obviously be too long, so to mitigate it you can set alerts that fire when the replication interval exceeds your ideal time, 2 to 5 minutes for example. Along with monitoring replication lag directly, you can monitor disk-related metrics on the secondary node, such as network I/O, CPU, and disk space, to make sure it keeps up with the primary. Alerts can be set up for these as well, to keep you on your toes and help you stop excessive replication lag before it happens.
