Understanding Ruby GC through GC.stat

by Nate Berkopec (@nateberkopec) of (who?), a Rails performance consultancy.
Summary: Have you ever wondered how the heck Ruby's GC works? Let's see what we can learn by reading some of the statistics it provides us in the GC.stat hash. (1560 words/8 minutes)


I call that an object leak.
Most Ruby programmers don’t have any idea how garbage collection works in their runtime - what triggers it, how often it runs, and what is garbage collected and what isn’t. That’s not entirely a bad thing - garbage collection in dynamic languages like Ruby is usually pretty complex, and Ruby programmers are better off just focusing on writing code that matters for their users.

But, occasionally, you get bitten by GC - either it’s running too often or not enough, or your process is using tons of memory but you don’t know why. Or maybe you’re just curious about how GC works!

One way we can learn a bit about garbage collection in CRuby (that is, the standard Ruby runtime, written in C) is to look at the built-in GC module. If you haven’t read the docs of this module, check it out. There’s a lot of interesting methods in there. But for right now, we’re just going to look at one: GC.stat.


Me, reading MRI source.

GC.stat outputs a hash with a bunch of different numbers, but none of these numbers are really well documented, and some are just completely confusing unless you actually read the C code for Ruby’s GC! Rather than having you do that yourself, I’ve done it for you. Let’s take a look at all the information in GC.stat and see what we can learn about GC in Ruby.

Here’s what my GC.stat looks like in a just-booted irb session using Ruby 2.4.0:

{
  :count=>15,
  :heap_allocated_pages=>63,
  :heap_sorted_length=>63,
  :heap_allocatable_pages=>0,
  :heap_available_slots=>25679,
  :heap_live_slots=>25506,
  :heap_free_slots=>173,
  :heap_final_slots=>0,
  :heap_marked_slots=>17773,
  :heap_eden_pages=>63,
  :heap_tomb_pages=>0,
  :total_allocated_pages=>63,
  :total_freed_pages=>0,
  :total_allocated_objects=>133299,
  :total_freed_objects=>107793,
  :malloc_increase_bytes=>45712,
  :malloc_increase_bytes_limit=>16777216,
  :minor_gc_count=>13,
  :major_gc_count=>2,
  :remembered_wb_unprotected_objects=>182,
  :remembered_wb_unprotected_objects_limit=>352,
  :old_objects=>17221,
  :old_objects_limit=>29670,
  :oldmalloc_increase_bytes=>46160,
  :oldmalloc_increase_bytes_limit=>16777216
}

Ok, there’s a lot there. That’s 25 undocumented keys! Yay!

First, let’s talk about the GC counts:

{
  :count=>15,
  # ...
  :minor_gc_count=>13,
  :major_gc_count=>2
}


RGenGC, introduced in Ruby 2.1. Slide from Koichi Sasada.
These are pretty straightforward. minor_gc_count and major_gc_count are just counts of each type of GC since the start of this Ruby process. In case you didn’t know, since Ruby 2.1 there have been two types of garbage collections, major and minor. A minor GC will only attempt to garbage collect objects which are “new” - that is, they have survived 3 or less garbage collection cycles. A major GC will attempt to garbage collect all objects, even ones which have already survived more than 3 GC cycles. count will always equal minor_gc_count + major_gc_count. For more about this, see my talk at FOSDEM about the history of Ruby Garbage Collection.

Tracking GC counts can be useful for a few reasons. We can figure out if a particular background job, for example, always triggers GCs (and how many it triggers). For example, here’s a Rack middleware that logs the number of GCs that occurred while a web request was processing:

class GCCounter
  def initialize(app)
    @app = app
  end

  def call(env)
    gc_counts_before = GC.stat.select { |k,v| k =~ /count/ }
    @app.call(env)
    gc_counts_after = GC.stat.select { |k,v| k =~ /count/ }
    puts gc_counts_before.merge(gc_counts_after) { |k, vb, va| va - vb }
  end
end

This won’t be 100% accurate if your application is multithreaded, because another thread executing may have actually created the memory pressure which triggered these GC’s, but it’s a starting point!

Now, let’s move on to the heap numbers.

{
  # Page numbers
  :heap_allocated_pages=>63,
  :heap_sorted_length=>63,
  :heap_allocatable_pages=>0,

  # Slots
  :heap_available_slots=>25679,
  :heap_live_slots=>25506,
  :heap_free_slots=>173,
  :heap_final_slots=>0,
  :heap_marked_slots=>17773,

  # Eden and Tomb
  :heap_eden_pages=>63,
  :heap_tomb_pages=>0
}

In this context, the heap is a C data structure, sometimes also called the ObjectSpace, in which we keep references to of all the currently live Ruby objects. On a 64-bit system, each heap page contains approximately 408 slots. Each slot contains information about a single live Ruby object.

First, you’ve got some information about the overall size of the entire Ruby object space. heap_allocated_pages is just the number of currently allocated heap pages (er, duh). These pages may be completely empty, completely full, or somewhere in between. heap_sorted_length is the actual size of the heap in memory - if we have 10 heap pages, and then free the 5th page (or some other random page from the middle), the length of the heap is still 10 pages (since we cannot move pages around in memory). heap_sorted_length will always be greater than or equal to the number of actually allocated pages. Finally, we’ve got heap_allocatable_pages - these are heap-page-sized chunks of memory that Ruby currently owns (i.e., has already malloced) that we could allocate a new heap page in. If Ruby needs a new heap page for additional objects, it will use this memory space first.

Okay, now we’ve got a lot of numbers relating to the individual object slots. heap_available_slots is obviously the total number of slots in heap pages - GC.stat[:heap_available_slots] divided by GC::INTERNAL_CONSTANTS[:HEAP_PAGE_OBJ_LIMIT] will always equal GC.stat[:heap_allocated_pages]. heap_live_slots is the number of live objects, and heap_free_slots are slots in heap pages which are empty. heap_final_slots are object slots which have finalizers attached to them. Finalizers are sort of an obscure feature of Ruby - they’re just Procs which run when an object is freed. Here’s an example:

ObjectSpace.define_finalizer(self, self.class.method(:finalize).to_proc)

heap_marked_slots are pretty much the count of old objects (objects that have survived more than 3 GC cycles) plus write barrier unprotected objects, which we’re going to get to a minute.

As for practical use of the slot counts in GC.stat, I’d suggest monitoring heap_free_slots if you’re having memory bloat issues. Large numbers of free slots (like more than 300,000) usually indicates that you have a few controller actions which are allocating large numbers of objects and then freeing them, which can permanently bloat the size of your Ruby process. For more about fixing that, check out my Rubyconf talk of memory issues in Ruby.


WELCOME TO THE TOMB PAGES
Now we’ve got tomb_pages and eden_pages. Eden pages are heap pages which contain at least one live object in them. Tomb pages contain no live objects, and so have completely free slots. The Ruby runtime can only release tomb pages back to the operating system, eden pages can never be freed.

Briefly, there are a few cumulative allocated/freed numbers.

{
  :total_allocated_pages=>63,
  :total_freed_pages=>0,
  :total_allocated_objects=>133299,
  :total_freed_objects=>107793
}

These numbers are cumulative for the life of the process - they are never reset and will not go down. They’re pretty self explanatory.

Finally, we have the garbage collection thresholds.

{
  :malloc_increase_bytes=>45712,
  :malloc_increase_bytes_limit=>16777216,
  :remembered_wb_unprotected_objects=>182,
  :remembered_wb_unprotected_objects_limit=>352,
  :old_objects=>17221,
  :old_objects_limit=>29670,
  :oldmalloc_increase_bytes=>46160,
  :oldmalloc_increase_bytes_limit=>16777216
}

So, one major misconception Ruby developers have is about when garbage collection is triggered. We can trigger GCs manually with GC.start, but that doesn’t happen in production. Many seem to think that GC runs on some sort of timer - every X seconds or requests. That’s not true.

Minor GCs are triggered by a lack of free slots. Ruby doesn’t automatically GC anything - it only GCs when it runs out of space. So when there are no free_slots left, we run a minor GC - marking and sweeping all of the “new” (i.e. not old, have survived fewer than 3 GCs) objects and objects in the remember set and those which are not protected by the write-barrier. I’ll define those terms in a second.

Major GCs can be triggered by a lack of free slots after a minor GC, or any of the following 4 thresholds being exceeded: oldmalloc, malloc, old object count, or the “shady”/writebarrier-unprotected count. The part of GC.stat we’re looking at here shows each of those four thresholds (the limit) and the current state of the runtime on the way to that threshold.

malloc_increase_bytes refers to when Ruby allocates space for objects outside of the “heap” we’ve been discussing so far. Each object slot in the heap pages is only 40 bytes (see GC::INTERNAL_CONSTANTS[:RVALUE_SIZE]) so what happens when we have an object larger than 40 bytes (say, a long string)? We malloc some space just for that object somewhere else! If we allocate 80 bytes for a string, for example, malloc_increase_bytes will increase by 80. When this number reaches the limit, we trigger a major GC.

oldmalloc_increase_bytes is the same thing, but only includes objects that are old.

remembered_wb_unprotected_objects is a count of objects which are not protected by the write-barrier and are part of the remembered set. Let’s define both of those terms. The write-barrier is simply a interface between the Ruby runtime and an object, so that we can track references to and from the object when they’re created. C-extensions can create new references to objects without going through the write-barrier, so objects which have been touched by C-extensions are called “shady” or “write-barrier unprotected”. The remembered set is a list of old objects which have a reference to a new object.

old_objects is just a count of object slots marked as old.

Tracking these thresholds might be helpful if you’re having trouble with a large number of major GCs.

I hope this has been an educational look at GC.stat - it’s an informative hash which can be used to build temporary debugging solutions for when you’ve got bad GC behavior that you need to fix.

SHARE:
Reddit

Want a faster website?

I'm Nate Berkopec (@nateberkopec). I write online about web performance from a full-stack developer's perspective. I primarily write about frontend performance and Ruby backends. If you liked this article and want to hear about the next one, click below. I don't spam - you'll receive about 1 email per week. It's all low-key, straight from me.

Products from Speedshop

The Complete Guide to Rails Performance is a full-stack performance book that gives you the tools to make Ruby on Rails applications faster, more scalable, and simpler to maintain.

Learn more

The Rails Performance Workshop is the big brother to my book. Learn step-by-step how to make your Rails app as fast as possible through a comprehensive video and hands-on workshop. Available for individuals, groups and large teams.

Learn more

More Posts

Announcing the Rails Performance Apocrypha

I've written a new book, compiled from 4 years of my email newsletter.

Read more

We Made Puma Faster With Sleep Sort

Puma 5 is a huge major release for the project. It brings several new experimental performance features, along with tons of bugfixes and features. Let's talk about some of the most important ones.

Read more

The Practical Effects of the GVL on Scaling in Ruby

MRI Ruby's Global VM Lock: frequently mislabeled, misunderstood and maligned. Does the GVL mean that Ruby has no concurrency story or CaN'T sCaLe? To understand completely, we have to dig through Ruby's Virtual Machine, queueing theory and Amdahl's Law. Sounds simple, right?

Read more

The World Follows Power Laws: Why Premature Optimization is Bad

Programmers vaguely realize that 'premature optimization is bad'. But what is premature optimization? I'll argue that any optimization that does not come from observed measurement, usually in production, is premature, and that this fact stems from natural facts about our world. By applying an empirical mindset to performance, we can...

Read more

Why Your Rails App is Slow: Lessons Learned from 3000+ Hours of Teaching

I've taught over 200 people at live workshops, worked with dozens of clients, and thousands of readers to make their Rails apps faster. What have I learned about performance work and Rails in the process? What makes apps slow? How do we make them faster?

Read more

3 ActiveRecord Mistakes That Slow Down Rails Apps: Count, Where and Present

Many Rails developers don't understand what causes ActiveRecord to actually execute a SQL query. Let's look at three common cases: misuse of the count method, using where to select subsets, and the present? predicate. You may be causing extra queries and N+1s through the abuse of these three methods.

Read more

The Complete Guide to Rails Performance, Version 2

I've completed the 'second edition' of my course, the CGRP. What's changed since I released the course two years ago? Where do I see Rails going in the future?

Read more

A New Ruby Application Server: NGINX Unit

NGINX Inc. has just released Ruby support for their new multi-language application server, NGINX Unit. What does this mean for Ruby web applications? Should you be paying attention to NGINX Unit?

Read more

Malloc Can Double Multi-threaded Ruby Program Memory Usage

Memory fragmentation is difficult to measure and diagnose, but it can also sometimes be very easy to fix. Let's look at one source of memory fragmentation in multi-threaded CRuby programs: malloc's per-thread memory arenas.

Read more

Configuring Puma, Unicorn and Passenger for Maximum Efficiency

Application server configuration can make a major impact on the throughput and performance-per-dollar of your Ruby web application. Let's talk about the most important settings.

Read more

Is Ruby Too Slow For Web-Scale?

Choosing a new web framework or programming language for the web and wondering which to pick? Should performance enter your decision, or not?

Read more

Railsconf 2017: The Performance Update

Did you miss Railsconf 2017? Or maybe you went, but wonder if you missed something on the performance front? Let me fill you in!

Read more

Rubyconf 2016: The Performance Update

What happened at RubyConf 2016 this year? A heck of a lot of stuff related to Ruby performance, that's what.

Read more

What HTTP/2 Means for Ruby Developers

Full HTTP/2 support for Ruby web frameworks is a long way off - but that doesn't mean you can't benefit from HTTP/2 today!

Read more

How Changing WebFonts Made Rubygems.org 10x Faster

WebFonts are awesome and here to stay. However, if used improperly, they can also impose a huge performance penalty. In this post, I explain how Rubygems.org painted 10x faster just by making a few changes to its WebFonts.

Read more

Page Weight Doesn't Matter

The total size of a webpage, measured in bytes, has little to do with its load time. Instead, increase network utilization: make your site preloader-friendly, minimize parser blocking, and start downloading resources ASAP with Resource Hints.

Read more

Hacking Your Webpage's Head Tags for Speed and Profit

One of the most important parts of any webpage's performance is the content and organization of the head element. We'll take a deep dive on some easy optimizations that can be applied to any site.

Read more

How to Measure Ruby App Performance with New Relic

New Relic is a great tool for getting the overview of the performance bottlenecks of a Ruby application. But it's pretty extensive - where do you start? What's the most important part to pay attention to?

Read more

Ludicrously Fast Page Loads - A Guide for Full-Stack Devs

Your website is slow, but the backend is fast. How do you diagnose performance issues on the frontend of your site? We'll discuss everything involved in constructing a webpage and how to profile it at sub-millisecond resolution with Chrome Timeline, Google's flamegraph-for-the-browser.

Read more

Action Cable - Friend or Foe?

Action Cable will be one of the main features of Rails 5, to be released sometime this winter. But what can Action Cable do for Rails developers? Are WebSockets really as useful as everyone says?

Read more

rack-mini-profiler - the Secret Weapon of Ruby and Rails Speed

rack-mini-profiler is a powerful Swiss army knife for Rack app performance. Measure SQL queries, memory allocation and CPU time.

Read more

Scaling Ruby Apps to 1000 Requests per Minute - A Beginner's Guide

Most "scaling" resources for Ruby apps are written by companies with hundreds of requests per second. What about scaling for the rest of us?

Read more

Make your Ruby or Rails App Faster on Heroku

Ruby apps in the memory-restrictive and randomly-routed Heroku environment don't have to be slow. Achieve <100ms server response times with the tips laid out below.

Read more

The Complete Guide to Rails Caching

Caching in a Rails app is a little bit like that one friend you sometimes have around for dinner, but should really have around more often.

Read more

How To Use Turbolinks to Make Fast Rails Apps

Is Rails dead? Can the old Ruby web framework no longer keep up in this age of "native-like" performance? Turbolinks provides one solution.

Read more

Close

Get notified on new posts.

Straight from the author. No spam, no bullshit. Frequent email-only content.