Update: Google Stackdriver is now Google Cloud Logging and Google Cloud Monitoring. BindPlane will continue to integrate and support both of these products.
RabbitMQ is a great tool to have for resilient message passing in your system instead of HTTP calls that can get lost in the void. It also sets you up nicely to drive your systems with events for looser coupling among workflow steps. In this post, we’re going to look at how we can quickly get RabbitMQ up and running when using Google Cloud Platform, also known as GCP.
First, we’ll look at what options we have to install RabbitMQ. Then we’ll look at how to run and connect to it. After that, we’ll talk about how to monitor our queues for efficiency with Stackdriver. Since I’m new to GCP, we’ll be going through this with the eyes of a GCP beginner.
First, we want to quickly run through how to install RabbitMQ onto GCP. There are a few options to do this: as a Docker container, as a series of VMs, or as a Kubernetes cluster on Google Kubernetes Engine. We’ll be taking the third option: Google Kubernetes Engine, also known as GKE.
We’re choosing to install RabbitMQ onto GKE because GKE will handle most of the low-level details. This will allow us to maintain the best level of flexibility for most business applications. However, we do need to trade low-level infrastructure knowledge for Kubernetes knowledge. If you’re new to Kubernetes, like I am, this little book can get you started.
With knowledge of Kubernetes in tow, we’ll be taking the path of many great IT administrators: installing via the command line. If you’re already using GCP, you may know that you can access Cloud Shell from your browser within the GCP web UI. This allows us to do manual installs without needing to set everything up on our local workstation. However, for business systems, I do recommend scripting the installation out, version controlling it, and automating it as part of environment provisioning. While we’ll be installing via the command line in Google’s Cloud Shell, I’ll also show how the instructions look from the web UI.
For the vast majority of our installation, we’ll use Google’s RabbitMQ Cluster User Guide. This gives you guidance whether you prefer to install via the UI or command line. Follow the installation instructions up to getting the cluster status. The status should tell us that everything is up and running. Ensure that you enable Stackdriver Exporting. This is key for later when we talk about how to run RabbitMQ efficiently. Your status should look something like this:
There is one caveat to the process: I did run into an issue with the stateful set the first time I installed RabbitMQ via the UI. It didn’t properly provision a persistent volume, aka disk storage space, for the pod. Installing via the command line did take longer, but I didn’t run into any snags.
Here’s what it looks like when installing via the UI:
Once the status is good, we can figure out how to connect our applications to RabbitMQ. This depends on where your app is located.
The first step to connecting our applications is to get appropriate credentials. GKE stores them in a configuration secret called $INSTANCE_NAME-rabbitmq-secret. Follow the instructions for authentication and copy the password somewhere secure. The username is the one we already specified as part of the installation process.
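Kubernetes stores secret values base64-encoded, so the password you read out of the secret’s data needs decoding before your application can use it. Here is a minimal sketch of that step in Python. The sample value is made up for illustration, and the key that holds the password inside your secret may be named differently in your deployment.

```python
import base64


def decode_secret_value(encoded: str) -> str:
    """Decode one base64-encoded value from a Kubernetes Secret's data map."""
    return base64.b64decode(encoded).decode("utf-8")


# Hypothetical value, standing in for what the secret's data field would hold.
sample_encoded = base64.b64encode(b"not-a-real-password").decode("ascii")
print(decode_secret_value(sample_encoded))
```

In a real application you would more likely mount the secret as an environment variable or file, in which case Kubernetes hands you the decoded value directly.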
The next step to connecting our applications is to get the right URL wired up. If your application is also on GKE, you can cohesively manage and monitor everything together. In this case, we’ll use port-forwarding. Follow access option 2 to make RabbitMQ accessible to your application’s pod. You can check access from within the Cloud Shell by hitting http://127.0.0.1:15672 and logging in to the admin site with the credentials.
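If you’d rather check access from a script than from a browser, the management plugin that serves the admin site on port 15672 also exposes an HTTP API with basic authentication. The sketch below asks it for `/api/overview`. The function names are my own, and it assumes the port-forward is already running and the credentials are the ones retrieved earlier.

```python
import base64
import json
import urllib.request


def build_overview_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a basic-auth GET request for the management API's overview endpoint."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/overview",
        headers={"Authorization": f"Basic {token}"},
    )


def fetch_overview(base_url: str, username: str, password: str, timeout: float = 5.0) -> dict:
    """Call the endpoint and return the parsed JSON body. Needs a reachable broker."""
    request = build_overview_request(base_url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_overview("http://127.0.0.1:15672", username, password)` should return a dictionary describing the broker if the port-forward and credentials are correct, and raise an error otherwise.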
If your app is hosted elsewhere on GCP or even outside of GCP, we’ll follow the instructions on exposing the service externally. You can log in from your browser at http://[EXTERNAL IP]:15672 to ensure the service is working.
Now that we’re up and running, there are many different ways to add and maintain RabbitMQ exchanges and queues. The best ways are highly sensitive to the type of application you have and frameworks you use. Most languages have an admin SDK you can use to configure RabbitMQ. You can also use the HTTP API no matter your language or framework.
Whatever you use to configure, I recommend you version control the changes and automate them. This is often done as part of your continuous delivery pipeline.
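One way to make queue configuration easy to version control is to keep the queue definitions as data and let a small script turn them into HTTP API calls. The sketch below only builds the requests. The queue names are hypothetical, authentication is left out for brevity, and it assumes the default virtual host `/`, which has to be URL-encoded as `%2F` in the path.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical queue definitions; in practice this list would live in version control.
QUEUES = [
    {"name": "orders", "durable": True},
    {"name": "notifications", "durable": True},
]


def build_declare_queue_request(base_url: str, vhost: str, queue: dict) -> urllib.request.Request:
    """Build a PUT request that declares one queue through the management HTTP API."""
    path = "/api/queues/{}/{}".format(
        urllib.parse.quote(vhost, safe=""),
        urllib.parse.quote(queue["name"], safe=""),
    )
    body = json.dumps(
        {"durable": queue.get("durable", True), "auto_delete": False, "arguments": {}}
    ).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


requests_to_send = [build_declare_queue_request("http://127.0.0.1:15672", "/", q) for q in QUEUES]
for request in requests_to_send:
    print(request.get_method(), request.full_url)
```

A pipeline step could send each request with `urllib.request.urlopen` after adding an `Authorization` header. Declaring a queue that already exists with the same settings is harmless, which makes a script like this safe to rerun.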
We’ve made good progress: RabbitMQ is installed and our application can connect to it. Now we want to ensure we run RabbitMQ efficiently. To us, “efficiently” means that we’re only paying for what we need in order to meet our service-level objectives. There are many aspects of performance we can look at to monitor efficiency, but the main two we care about are message lead times and utilization. We’ll look at both through Stackdriver metrics, since we enabled exporting during installation.
Message lead time is like web request latency. If it’s too slow, people may experience a laggy application and look for alternatives. Unfortunately, the metrics that come out of RabbitMQ to Stackdriver don’t directly tell us what these times are. Instead, we can derive them from the count of messages sitting in the queue in each time period. We call this queue depth. Ideally, our queue depth should be low for most parts of the day, with occasional spikes during busy periods. We should see these spikes quickly go back down, assuring us that we have low lead time per message.
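To turn queue depth into a rough lead time, one option is Little’s Law: average time in the queue is approximately average depth divided by the rate at which messages are consumed. This is my own framing rather than something Stackdriver reports, and it only holds as an average for a queue in a reasonably steady state. The numbers below are invented for illustration.

```python
def estimate_lead_time_seconds(avg_queue_depth: float, consumed_per_second: float) -> float:
    """Estimate average time a message waits, using Little's Law (W = L / throughput)."""
    if consumed_per_second <= 0:
        raise ValueError("consumed_per_second must be positive")
    return avg_queue_depth / consumed_per_second


# Hypothetical reading: 120 messages in the queue on average, consumers handling 40 per second.
print(estimate_lead_time_seconds(120, 40))  # 3.0
```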
Here is what queue depth can look like in Stackdriver:
In this chart, I break down depth by queue. I also track total messages, ready messages, and unacknowledged messages. This lets me detect errors in consumers. For example, a lot of unacknowledged messages could mean a bad consumer. In this chart, I only have one published message, so it’s not that exciting.
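The “bad consumer” check can be expressed as a simple ratio. Total messages in a queue are the ready messages plus the unacknowledged ones, so a queue where most messages are unacknowledged suggests consumers are taking messages but not finishing them. The 50% cutoff here is an arbitrary placeholder, not a recommendation.

```python
def unacked_share(total_messages: int, unacked_messages: int) -> float:
    """Fraction of a queue's messages that are delivered but not yet acknowledged."""
    if total_messages <= 0:
        return 0.0
    return unacked_messages / total_messages


def looks_like_stuck_consumer(total_messages: int, unacked_messages: int,
                              max_share: float = 0.5) -> bool:
    """Flag a queue whose unacknowledged share exceeds a (hypothetical) cutoff."""
    return unacked_share(total_messages, unacked_messages) > max_share


print(looks_like_stuck_consumer(total_messages=40, unacked_messages=30))  # True
```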
Using this data, I can make a few changes to my queuing to make it efficient. The main change is often to add more consumers to a specific queue where the depth builds up and stays high. By monitoring real usage, I add consumers only to the queues that need them, and only when they need them.
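To separate a short spike from a depth that stays high, one simple approach is to look for a run of consecutive samples above a threshold. This is only a sketch of that idea. The threshold and run length are placeholders you would tune against your own service-level objectives.

```python
from typing import Iterable


def depth_is_sustained(samples: Iterable[int], threshold: int, min_consecutive: int) -> bool:
    """Return True if depth stays above `threshold` for `min_consecutive` samples in a row."""
    run = 0
    for depth in samples:
        run = run + 1 if depth > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical per-minute depth samples for one queue.
spike = [5, 900, 4, 3, 2]
buildup = [5, 200, 300, 400, 350]
print(depth_is_sustained(spike, threshold=100, min_consecutive=3))    # False
print(depth_is_sustained(buildup, threshold=100, min_consecutive=3))  # True
```

A single spike that drains quickly is the healthy pattern described earlier, so only the second series would be a candidate for more consumers.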
The next aspect of efficiency is utilization. This is how much of a resource, mainly CPU and memory, RabbitMQ uses up. Since RabbitMQ is stateful, meaning it persists the state of its message queues, we want to vertically scale pod CPU and memory capacity up or down. Kubernetes can actually autoscale these using its vertical pod autoscaler. While Kubernetes can simplify our scaling process, we still want to keep an eye on our utilization. We can do this directly on the pods through Stackdriver:
Ideally, both our CPU and memory usage stay below 80%; above that, we can expect performance to degrade sharply. If these charts go above the 80% threshold, we may need to tweak our autoscaling settings. Besides this, most scaling in RabbitMQ will be done at the consumer/queue level.
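The same 80% rule is easy to apply to numbers pulled out of your monitoring. The sketch below takes usage and limit per pod and lists the pods over the threshold. The pod names and figures are invented, and how you fetch the values is left out.

```python
from typing import Dict, List, Tuple


def utilization(used: float, limit: float) -> float:
    """Fraction of a resource limit currently in use."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return used / limit


def pods_over_threshold(pods: Dict[str, Tuple[float, float]],
                        threshold: float = 0.80) -> List[str]:
    """Return the names of pods whose used/limit ratio exceeds the threshold."""
    return sorted(
        name for name, (used, limit) in pods.items()
        if utilization(used, limit) > threshold
    )


# Hypothetical memory readings in GiB as (used, limit) per pod.
memory = {"rabbitmq-0": (0.9, 1.0), "rabbitmq-1": (0.4, 1.0)}
print(pods_over_threshold(memory))  # ['rabbitmq-0']
```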
If these pod-level metrics are too coarse-grained, we can add more of our exported RabbitMQ metrics that show utilization by exchange or queue.
Even though I’m new to both GCP and Kubernetes, I was able to understand and deploy a working RabbitMQ service in under two hours. I never cease to be amazed at how far we’ve come in the provisioning of new infrastructure. In my early career, something like this would have taken days or weeks just to procure the software. Then it would take hours or days to install and start up the service. Finally, it would take months of painful operational errors for us to get the scale and settings “just right.” Although GKE has a learning curve, the flexibility and power we have to keep our RabbitMQ healthy are amazing.
This post was written by Mark Henke. Mark has spent over 10 years architecting systems that talk to other systems, doing DevOps before it was cool, and matching software to its business function. Every developer is a leader of something on their team, and he wants to help them see that.