How to Measure Ruby App Performance with New Relic
It’s 12pm on a Monday. Your boss walks by: “The site feels…slow. I don’t know, it just does.” Hmmm. You riposte with the classic developer reply: “Well, it’s fast on my local machine.” Boom! Boss averted!
Unfortunately, you were a little dishonest. You know better than to think that speed in the local environment has anything to do with speed in production. You know that, right? Wait, we’re not on the same page here?
Several factors can cause Ruby applications to have performance discrepancies between production and development:
- Application settings, like code reloading
In development, Rails and most other Ruby web frameworks reload (almost) all of your application code on every request to pick up on changes you’ve made to files. That’s a pretty slow process. In addition, there are a lot of subtle differences between apps in development and production modes, especially surrounding asset pipelines. Simpler frameworks may not have these behaviors, but don’t kid yourself - if anything in your app is changing in response to “RACK_ENV”, you could be introducing performance problems that you can’t catch in development.
- Caching behavior
Rails disables caching in development mode by default. Obviously, turning that on has a big performance impact in production. In addition, caches in development work differently than caches in production, mostly due to the introduction of network latency. Even 10ms of network latency between your application server and your cache store can cripple pages that make many cache calls.
- Differences in data
This is an insidious one, and the usual cause for an app that seems slow in production but fast in development. Sure, that query you run locally (User.all, for example) only returns 100 rows in development using your seed data. But in production, that query could return 10,000 or 100,000 rows! In addition, consider what happens when those 10,000 rows you return need to be attached to 10,000 other records because you used includes to pre-load them. Get ready to wait around.
- System configuration and resources
Unless you’re using containers, system configuration will always differ between environments. This can even be as subtle as utilities being compiled with different compiler flags! Of course, even containers will run on different physical hardware, which can have severe performance consequences, especially regarding threading and concurrency.
- Virtualization
Most people deploy to shared, virtualized environments nowadays. Unfortunately, that means a physical server will share resources with up to half-a-dozen or so virtual servers, which can negatively and unpredictably impact performance when one virtualized server is hogging up the resources available.
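To put a rough number on the caching point above, here is a back-of-the-envelope sketch. The 10 ms latency comes from the text; the 50 cached fragments per page and the helper name cache_wait_ms are assumptions I made up for illustration.

```ruby
# Total time one page spends waiting on the cache store, in milliseconds.
# This is plain arithmetic, not a measurement of any real cache.
def cache_wait_ms(round_trips, latency_ms)
  round_trips * latency_ms
end

latency_ms = 10           # the 10 ms figure from the text
fragments_per_page = 50   # hypothetical: a page built from 50 cached fragments

# One network round trip per fragment.
sequential = cache_wait_ms(fragments_per_page, latency_ms)
# All keys fetched in a single round trip.
batched = cache_wait_ms(1, latency_ms)

puts "sequential reads: #{sequential} ms"
puts "batched read:     #{batched} ms"
```

On localhost the latency is close to zero, so both numbers look the same in development. In production the sequential version adds half a second to every page. If your app is on Rails, Rails.cache.read_multi is one way to collapse many reads into a single round trip.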
So what’s a developer to do? Why, install a performance monitoring solution in production! New Relic is the tool I reach for. Not only is it free to start with, the tools included are extensive, even at the free level. In this post, I’m going to give you a tour of each of New Relic’s features and how they can help you diagnose performance hotspots in a Rails app.
Full disclosure - I don’t work for New Relic, and no one from New Relic paid for or even talked to me about this post. I haven’t used Skylight, New Relic’s biggest competitor in this space, so I can’t give you a good comparison of the two or their features. I hope to someday do a post on Skylight, but I’ll need a production app I can use it on first.
Pareto and Zipf
Before we get into any code or fancy graphs, though, I want to talk about principles. Actually, I want to tell you about an American linguist and an Italian economist.
George Kingsley Zipf was an American philologist who studied languages using a field that was new and interesting in his time - statistics. Zipf’s novel idea to apply statistics to the study of language landed him an astonishing insight: in nearly every language, some words are used a lot, but most (nearly all) words are used hardly at all. That is to say, if you took every English word ever written and plotted the frequency of words used as a histogram, you’d end up with a graph that looked something like what you see to the right. It’s a power law.
The Brown Corpus is 500 samples of English-language text comprising 1 million words. But just 135 unique words are needed to account for 50% of those million. That’s insane.
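If you want to see this effect on a text of your own, here is a minimal sketch that counts how many distinct words are needed to cover a given share of all the words in a string. The method name words_needed_for_share and the toy sentence are mine, and it assumes Ruby 2.7+ for Array#tally. A real corpus would obviously be much bigger than one sentence.

```ruby
# How many distinct words, taken from most to least frequent, are needed
# to account for `share` (0.0..1.0) of all the words in `text`?
def words_needed_for_share(text, share)
  words = text.downcase.scan(/[a-z']+/)
  counts = words.tally.values.sort.reverse # frequencies, highest first
  target = words.size * share

  covered = 0
  counts.each_with_index do |count, index|
    covered += count
    return index + 1 if covered >= target
  end
  counts.size
end

sample = "the cat and the dog and the bird saw the fish"
puts words_needed_for_share(sample, 0.5) # prints 2 ("the" and "and" cover 6 of 11 words)
```

Even in an 11-word toy sentence, 2 of the 7 distinct words cover more than half of the text.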
If you take Zipf’s probability distribution and make it continuous instead of discrete, you get the Pareto distribution.
Many of you probably see where I’m going with this by now. Stay with me.
The Pareto distribution, pictured at right, has been found to hold for a scary number of completely different and unrelated fields in the sciences. For example, here are some natural phenomena that exhibit a Pareto (power law) distribution:
- Wealth inequality
- Sizes of rocks on a beach
- Hard disk drive error rates (!)
- File size distribution of Internet traffic (!!!)
We tend to think of the natural world as random or chaotic. But often, it is anything but. Many probability distributions, in the wild, support the Pareto Principle:
80% of the output will come from 20% of the input
While you may have heard this before, what I’m trying to get across to you is that it isn’t made up. The Pareto distribution is the real deal - utilized in hundreds of otherwise completely unrelated scientific fields - and we can use its ubiquity to our advantage.
Allow me to reformulate and apply this to web application performance:
80% of an application’s work occurs in 20% of its code.
[Image: I pity the fool that prematurely optimizes their application!]
This is why premature optimization is so bad and why performance monitoring, profiling and benchmarking are so important. What the Pareto Principle reveals to us is that optimizing any random line of code in our application is in fact unlikely to speed up our application at all! 80% of the “slowness” in any given app will be hidden away in a minority of the code. So instead of optimizing blindly, applying at random the principles we read about in blog posts, or engaging in Hacker-News-Driven-Development by using the latest and “most performant” web technologies, we need to measure where the bottlenecks and problem areas are in our application.
Repeat after me: I will not optimize anything in my application until my metrics tell me so.
Getting an Overview
Let’s walk through the process I use when I look at a Ruby app on New Relic.
When I first open up a New Relic dashboard, I’m trying to establish the broad picture: How big is this application? Where does most of its time go? Are there any “alarm bells” going off just on the main dashboard?
New Relic uses a couple of terms that we’ll need to define:
Transactions
This is New Relic’s cross-platform way of saying “response”. In Rails, a single “transaction” would be a single response from a controller action. Transactions from a Rails app in New Relic look like “WelcomeController#index” and so on.
Real-User Monitoring (also RUM and Browser monitoring)
Response time - where does it go?
The web transaction response time graph is one of the most important on New Relic, and forms the broadest possible picture of the backend performance of your app. New Relic defaults to 30 minutes as the timeframe, but I immediately change this to the longest interval available - preferably about a month, although 7 days will do.
The first thing I’ll look at here is the app server and browser response averages. Here are some rules of thumb for what you should expect these numbers to be in an average Rails application:
|App server avg response time||Status|
Of course, those numbers are just rules of thumb for Rails applications that serve up HTML - your typical “Basecamp-style” application. For simple API servers that serve JSON only, I might divide by 2, for example.
| Browser avg load time | Status |
|---|---|
| < 3 sec | Fast! |
| < 6 sec | Average |
| > 6 sec | Slow! |
I can already hear the keyboards clattering as people furiously email me: “That’s so slow! Rails sucks! Blah blah…”
I’m just sharing what I’ve seen in the wild in my own experience. Remember - GitHub, Basecamp and Shopify are all enormous WebScale™ Ruby shops that average 50-100ms responses, which is pretty good by anyone’s measure.
Based on what I’m seeing with these numbers, I know where to pay attention later on. For example, if I notice a fast or average backend but slow browser (real-user monitoring) numbers, I’ll go look at the browser numbers next rather than delving deeper into the backend numbers.
Note that most browser load times are 1-3 seconds, while most application server response times are 1-300 milliseconds. Application server responses, on average, are just 10% of the end user’s total page loading experience. This means front-end performance optimization is actually far more important than most Rails developers will give it credit for. Back-end optimization remains important for scaling (lower response times mean more responses per second), but when thinking about the browser experience, it usually means vanishingly little.
Next, I’m considering the shape of the response time graph. Does the app seem to slow down at certain times of day or during deploys?
The most important part of this graph, though, is to figure out how much time goes to what part of the stack. Here’s a typical Ruby application - most of its time is spent in Ruby. If I see an app that spends a lot of time in the database, web external, or other processes, I know there’s a problem. Most of your time should be spent in Ruby (running Ruby code is usually the slowest part of your app!). If, for example, I see a lot of time in web external, I know there’s probably a controller or view that’s waiting, synchronously, on an external API. That’s almost never necessary and I’d work to remove that. A lot of time in request queueing means you need more servers, because requests are spending too much time waiting for an open application instance.
Percentiles and Histograms
The histogram makes it easy to pick out which transactions are causing extra-long response times. Just click the histogram bars that are way far out to the right and pay attention to which controller actions are usually behind them. Optimizing these transactions will have the biggest impact on 95th percentile response times.
Most Ruby apps’ response time histograms look like a power curve. Remember what I said above about Pareto. So, conversely, be sure to check out what actions take the least amount of time (the histogram bar furthest to the left). Are they asset requests? Redirects? Errors? Is there any way we can not serve these requests (in the case of assets, for example, you should be using a CDN)?
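If percentiles and histograms feel abstract, here is a small sketch of how both can be computed from a list of response times. This is not how New Relic computes them internally (I don’t know that). It uses the simple nearest-rank definition of a percentile, and the sample data and method names are made up.

```ruby
# Nearest-rank percentile: the smallest sample such that at least
# pct% of all samples are less than or equal to it.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Group integer millisecond samples into buckets of bucket_ms width,
# keyed by each bucket's lower bound.
def histogram(samples, bucket_ms)
  samples.group_by { |ms| (ms / bucket_ms) * bucket_ms }
         .transform_values(&:size)
         .sort.to_h
end

# Hypothetical response times in milliseconds, with a long right tail.
response_times_ms = [40, 45, 50, 55, 60, 60, 65, 70, 80, 90,
                     95, 110, 120, 150, 180, 220, 300, 450, 900, 2400]

average = response_times_ms.sum / response_times_ms.size.to_f
puts "average: #{average} ms"                                 # 277.0
puts "median:  #{percentile(response_times_ms, 50)} ms"       # 90
puts "95th:    #{percentile(response_times_ms, 95)} ms"       # 900
puts histogram(response_times_ms, 100).inspect
```

Notice that the average (277 ms) is three times the median (90 ms). Two slow requests out on the right of the histogram drag it up, which is exactly why those far-right bars are worth clicking on.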
What realm of RPM are we playing in?
[Image: What it looks like optimizing a high-scale app in production]
It’s always helpful to check what “order of magnitude” we’re at as far as scale. Here are my rules of thumb:
| Requests per minute | Scale |
|---|---|
| < 10 | Tiny. Should only have 1 server or dyno. |
| 10 - 1000 | Average |
| > 1000 | High. “Just add more servers” may not work anymore. |
Apps above 1000 RPM may start running into scaling issues outside of the application in external services, such as databases or cache stores. When I see scale like that, I know my job just got a lot harder because the surface area of potential problems just got bigger.
[Image: Note that the top 3 transactions account for 2/3 of time consumed]
Now that I’ve gotten the lay of the land, I’ll start digging into the specifics. We know the averages, but what about the details? At this stage, I’m looking for my “top 5 worst offenders” - where does the app slow to a crawl? What’s the 80/20 of time consumed in this application - in other words, in what actions does this application spend 80% of its time?
Most Ruby applications will spend 80% of their time in just 20% of the application’s controllers (or code). This is good for us performance tweakers - rather than trying to optimize across an entire codebase, we can concentrate on just the top 5 or 10 slowest transactions.
For this reason, in the transactions tab, I almost always sort by most time consuming. If the top 5 actions in this tab consume 50% of the server’s time (they almost always do), and we speed them up by 2x, we’ve cut the server’s total time by 25%, which works out to roughly a third more capacity on the same hardware! That’s free scale.
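The arithmetic behind that is worth spelling out, because it also tells you when an optimization isn’t worth doing. Here is a minimal sketch (the method name is mine). It’s the same reasoning as Amdahl’s Law: only the fraction of time you actually touch gets faster.

```ruby
# Fraction of the original server time that remains after making
# `fraction_affected` of the workload `speedup` times faster.
def remaining_time(fraction_affected, speedup)
  (1.0 - fraction_affected) + fraction_affected / speedup
end

# Top 5 actions: 50% of server time, made 2x faster.
top_five = remaining_time(0.5, 2.0)
puts "time remaining: #{(top_five * 100).round}%"          # 75%
puts "capacity:       #{(1.0 / top_five).round(2)}x"       # 1.33x

# A random action that uses 1% of server time, made 10x faster.
random_action = remaining_time(0.01, 10.0)
puts "time remaining: #{(random_action * 100).round(1)}%"  # 99.1%
```

Halving the top 5 actions frees up a quarter of the server’s time. A heroic 10x speedup of an action nobody hits frees up less than 1%.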
Alternatively, if an application is on the lower end of the requests-per-minute scale, I might sort by slowest average response time instead. This sort also helps if you’re concentrating on squashing 95th percentiles.
I’m carrying that “worst offender” mindset into the database. Now, if the previous steps have shown that the database isn’t a problem, I may gloss over this section or just try and make sure it’s not a single query that’s taking up all of our database time. Again, “most time consuming” is probably the best sort here.
Here are some symptoms you might see here:
- Lots of time in #find
If your top SQL queries are all model lookups, you’ve probably got a bad query somewhere. Pay attention to the “time consumption by caller” graph on the right - where is this query being called the most? Go check out those controllers and see if you’re doing a WHERE on a column that hasn’t been properly indexed, or if you’ve accidentally added an N+1 query.
- SQL - OTHER
You may see this one if you’ve got a Rails app. Rails periodically issues queries just to check if the database connection is active, and those queries show up under this “OTHER” label. Don’t worry about them - there isn’t really anything you can do about it.
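In case the N+1 query mentioned above is unfamiliar, here is a toy simulation in plain Ruby. There is no real database or ActiveRecord involved. FakeDB is a made-up class that just counts how many “queries” it receives, so you can see the shape of the problem.

```ruby
# Hypothetical in-memory "database" that counts the queries it receives.
class FakeDB
  attr_reader :query_count

  def initialize(posts_by_user)
    @posts_by_user = posts_by_user
    @query_count = 0
  end

  # Stands in for: SELECT * FROM posts WHERE user_id = ?
  def posts_for(user_id)
    @query_count += 1
    @posts_by_user.fetch(user_id, [])
  end

  # Stands in for: SELECT * FROM posts WHERE user_id IN (...)
  def posts_for_all(user_ids)
    @query_count += 1
    @posts_by_user.slice(*user_ids)
  end
end

user_ids = (1..100).to_a
data = user_ids.to_h { |id| [id, ["post by user #{id}"]] }

# N+1 shape: one posts query per user, inside a loop.
n_plus_one = FakeDB.new(data)
user_ids.each { |id| n_plus_one.posts_for(id) }

# Batched shape: one posts query for all users.
batched = FakeDB.new(data)
batched.posts_for_all(user_ids)

puts "looped queries:  #{n_plus_one.query_count}" # 100
puts "batched queries: #{batched.query_count}"    # 1
```

Add the one query that loaded the users in the first place and you get the “N+1” in the name. In ActiveRecord, the batched shape is what includes gives you. In New Relic, the looped shape shows up as a cheap-looking #find with an enormous call count.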
What I’m looking for here is to make sure that there aren’t any external services being pinged during a request. Sometimes that’s inevitable (payment processing) but usually it isn’t necessary.
Most Ruby applications will block on network requests. For example, if to render my cool page, my controller action tries to request something from the Twitter API (say I grab a list of tweets), the end user has to wait until the Twitter API responds before the application server even returns a response. This can delay page loading by 200-500ms on average, with 95th percentile times reaching 20 seconds or more, depending on what your timeouts are set at.
For example, what I can tell from this graph is that Mailchimp (purple spikes in the graph to the right) seems to go down a lot. Wherever I can, I need to make sure that my calls to Mailchimp have an aggressive timeout (something like 5 seconds is reasonable). I may even consider coding up a Circuit Breaker. If my app tries to contact Mailchimp a certain number of times and times out, the circuit breaker will trip and stop any future requests before they’ve even started.
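A circuit breaker doesn’t need to be complicated. Here is a minimal sketch of the idea, not a production implementation: the class name, max_failures, and reset_after are all my own choices, it isn’t thread-safe, and it only counts timeouts as failures. It relies on Net::HTTP’s timeout errors (Net::OpenTimeout and Net::ReadTimeout) being subclasses of Timeout::Error.

```ruby
require "timeout"

# Stops calling a flaky service after too many timeouts in a row.
class CircuitBreaker
  class OpenError < StandardError; end

  def initialize(max_failures: 3, reset_after: 60)
    @max_failures = max_failures # consecutive timeouts before tripping
    @reset_after = reset_after   # seconds to wait before trying again
    @failures = 0
    @opened_at = nil
  end

  def call
    if @opened_at
      # Circuit is open: fail fast without touching the network.
      raise OpenError, "circuit open, skipping call" if Time.now - @opened_at < @reset_after

      # Cool-down has elapsed: close the circuit and try again.
      @opened_at = nil
      @failures = 0
    end

    result = yield
    @failures = 0
    result
  rescue Timeout::Error
    @failures += 1
    @opened_at = Time.now if @failures >= @max_failures
    raise
  end
end

# Simulate an API that always times out.
breaker = CircuitBreaker.new(max_failures: 2, reset_after: 60)
results = 3.times.map do
  begin
    breaker.call { raise Timeout::Error, "simulated slow API" }
  rescue Timeout::Error
    :timed_out
  rescue CircuitBreaker::OpenError
    :skipped
  end
end

p results # [:timed_out, :timed_out, :skipped]
```

In a real app, the block would wrap the HTTP call itself, with the aggressive timeouts set on the client. With Net::HTTP that could look like Net::HTTP.start(host, 443, use_ssl: true, open_timeout: 2, read_timeout: 5), where the host and the exact timeout values are up to you. The third call above never reaches the network at all, which is the whole point.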
GC stats and Reports
To be honest, I don’t find New Relic’s statistics here very useful. You’re better off with a tool like memory_profiler. I don’t find New Relic’s “average memory usage per instance” graph very accurate for threaded or multi-process setups either.
If you’re having issues with garbage collection, I recommend debugging that in development rather than trying to use New Relic’s tools to do it in production. Here’s an excellent article by Heroku’s Richard Schneeman about how to debug memory leaks in Ruby applications.
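For a quick first look in development, before reaching for a dedicated tool, Ruby’s built-in GC.stat can tell you how many objects a piece of code allocates. This is a rough sketch assuming CRuby 2.2 or later. The helper name is mine, and the count can include a few incidental allocations, so treat it as a ballpark figure.

```ruby
# Returns roughly how many objects were allocated while the block ran.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical workload: build 1,000 strings.
allocated = allocations_during do
  1_000.times.map { |i| "row #{i}" }
end

puts "objects allocated: #{allocated}"
puts "GC runs so far:    #{GC.stat(:count)}"
```

This only gives you a total. A tool like memory_profiler can break allocations down by gem, file, and line, which is what you need to actually track down the source.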
In addition, I’m not going to cover the Reports, as they’re part of New Relic’s (rather expensive) paid plans. For what it’s worth, they’re pretty self-explanatory.
Browser / Real user monitoring (RUM)
Remember how we applied an 80/20 mindset to the top offenders in the web transactions tab? We want to do the same thing here. Change the timescale on the main graph to the longest available. Instead of the percentile graph (which is the default view), change it to the “Browser page load time” graph that breaks average load time down by its components.
- Request queueing
Same as the web graph. Notice how little of an impact it usually has on a typical Ruby app - most queueing times are something like 10-20ms, which is just a minuscule part of the average 5 second page load.
- Web application
This is the entire time taken by your app to process a request. Also notice how little time this takes out of the entire stack required to render a webpage.
- Network Latency. For most Ruby applications, average latency will be longer than the amount of time spent queuing and responding! This number includes the latency in both directions - to and from your server.
- DOM Processing
This is usually the bulk of the time in your graph. DOM Processing in New Relic-land is the time between your client receiving the full response and the DOMContentReady event. This is usually slowed down by synchronous JavaScript - pretty much any script without the async tag. For more about getting rid of that, check out Google. In addition, DOMContentReady usually also gets slowed down by external CSS. Note that, in most browsers, the page pretty much still looks like a blank white window at this point.
- Page Rendering
Page Rendering, according to New Relic, is everything that happens between the DOMContentReady event and the load event. load won’t fire until every image, script, and iframe is fully ready. So, the browser may have started displaying at least parts of the page before this is finished. Note also that load always fires after DOMContentReady ($(document).ready attaches functions to fire after DOMContentLoaded, for example).
For a full guide to optimizing front-end performance issues you find here, see my extensive guide on the topic.
It’s important to note that while most users won’t see anything of your site until at least DOM Processing has finished, they probably will start seeing parts of it during Page Rendering. It’s impossible to know just how much of it they see. If your site has a ton of images, for example, Page Rendering might take ages as it downloads all of the images on the page.
New Relic, like other production performance monitoring tools, is invaluable for the performance-minded Rubyist. You simply cannot be serious about speed and not have a production profiling solution installed.
As a takeaway, I hope you’ve learned how to apply an 80/20 mindset to your Ruby application. This mindset can be applied at all levels of the stack, but don’t forget - profiling that isn’t based on what the end user experiences isn’t based in reality. That’s why, for a browser-based application, we should be paying attention first to our browser experience, not to our backend, even if that’s sometimes easier to measure.
I'm Nate Berkopec (@nateberkopec). I write online about web performance from a full-stack developer's perspective. I primarily write about frontend performance and Ruby backends.