Ludicrously Fast Page Loads - A Guide for Full-Stack Devs
Server response times, while easy to track and instrument, are ultimately a meaningless performance metric from an end-user perspective.
[Image: Actual end-user response to the word ‘microservices’]
End-users don’t care how fast your super-turbocharged bare-metal Node.js server is - they care about the page being completely loaded as fast as possible. Your boss is breathing down your neck about the site being slow - but your Elixir-based microservices architecture has average server response times of 10 nanoseconds! What’s going on?
- Server response times can easily balloon without proper use of caching, both at the application and HTTP layers. Bad SQL queries in certain parts of the application can send times skyrocketing.
- JS and CSS assets must be concatenated, minified and put in the right place in the document, or rendering may be blocked while the browser stops to load external resources (more on this later). In addition, these days, when there’s a jQuery plugin or CSS mixin for just about anything, most developers have completely lost track of just how much CSS and JS is being loaded on each page. Even if your CSS and JS assets are under 100kb once gzipped and minified, they still must be un-gzipped, parsed and loaded to create the DOM and CSSOM (explained in more detail below). While gzipped size is important when considering how long CSS or JS will take to come across the network, uncompressed size is important for figuring out how long it will take the client to parse these resources and construct the page.
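To see the gap between wire size and parse size for yourself, here is a minimal Node.js sketch. The `assetSizes` helper and the generated stylesheet are made up for illustration; only `zlib.gzipSync` and `Buffer` are real Node APIs. The exact numbers depend entirely on the input, so treat this as a way to measure your own assets rather than a benchmark.

```javascript
const zlib = require("zlib");

// Hypothetical helper: report how many bytes travel over the network (gzipped)
// versus how many bytes the browser has to parse (uncompressed).
function assetSizes(source) {
  const raw = Buffer.from(source, "utf8");
  const gzipped = zlib.gzipSync(raw);
  return { rawBytes: raw.length, gzippedBytes: gzipped.length };
}

// Made-up stylesheet: lots of near-identical rules, which compress very well.
const css = Array.from(
  { length: 2000 },
  (_, i) => `.col-${i} { float: left; width: ${i % 12}%; }`
).join("\n");

const { rawBytes, gzippedBytes } = assetSizes(css);
console.log(`over the wire: ${gzippedBytes} bytes`);
console.log(`parsed by the browser: ${rawBytes} bytes`);
```

To try it on a real asset, read your compiled CSS or JS file into a string and pass that to `assetSizes` instead of the generated one.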
So what’s a good, performance-minded full stack developer to do? How can we take our page loads from slow to ludicrous speed?
But, rather than just tell you that XYZ technique is faster than another, I’m going to show you how and why. Rather than take my word for it, you can test different frontend optimizations for yourself. To do that, we’re going to need a profiling tool.
Enter Chrome Timeline
Note that most of Google’s documentation on Chrome Timeline is severely out of date and shows a “waterfall” view that no longer exists in Chrome as of October 2015 (Chrome 45). This post is up-to-date as of that time.
To open Chrome Timeline, open up Chrome Developer Tools (Cmd + Alt + I on Mac) and click on the Timeline tab. You’ll see a blank timeline with millisecond markings. For now, uncheck the “causes”, “paint” and “memory” checkboxes on the top, and disable the FPS counter by clicking the bar graph icon.
[Image: What your settings should look like]
These tools are mostly useful for people profiling client-side JS apps, which I won’t get into here.
The Chrome Timeline records page interactions a lot like a VCR. You can click the little circular icon (the record button) at any time to turn on Timeline recording, and then click it again to stop recording. If the Timeline is open during a refresh, it will automatically record until the page has loaded.
Let’s try it on http://todomvc-turbolinks.herokuapp.com/. This is a TodoMVC implementation I did for a previous blog post on Turbolinks. While the Timeline is open, you can trigger a full page load with Cmd + Shift + R and Chrome will automatically record the page load for you in Timeline. (Be sure you’re doing a hard refresh here, otherwise you may not redownload any assets.)
Note that browser extensions will show up on Chrome Timeline. Any extension that alters the page may show up and make your timelines confusing. Do yourself a favor and disable all of your extensions while profiling with Chrome Timeline.
We’re going to start with a walkthrough of a typical HTML page load in Timeline, and then we’re going to identify what this performance profile says about our application and how we can speed it up.
Here’s what my Timeline looked like:
254 ms from refresh to done - not bad for an old Rails app, eh?
Receiving the HTML
The first thing you’ll notice is that big chunk of idle time at the beginning. Almost nothing is happening until about 67ms after I hard-refreshed.
[Image: “An idle browser is the devil’s workshop.”]
What’s going on there? It’s a combination of server response time (on this particular app, I know it hovers around 20ms), and network latency (depending on how far you are from the US East Coast, anywhere from 10-300ms).
Even though we live in an age of mass cable and fiber optic internet, our HTTP requests still take a lot of time to go from place to place. Even at the theoretical maximum speed of an HTTP request (the speed of light), it would take a user in Singapore about 70ms to reach a server in the US. And HTTP doesn’t travel at the speed of light - cable internet works at about half that speed. In addition, packets make as many as a dozen intermediate stops along the Internet backbone. You can see these stops using traceroute. You can also get the approximate network latency to a given server by simply using ping (that’s what it was designed for!).
For example, I live in New York City. Pinging a NIST time server in Oregon, I usually see network latency times of about 100ms.
[Image: Oregon? Well these packets Oregonna take a long time to get there!]
That’s a pretty substantial increase over the time we’d expect if the packets were traveling at the speed of light (~26ms). By comparison, my average network latency for a time server in Pennsylvania is just 20ms. And Indonesia? Packets take a whopping 364ms to make the round trip. For websites that are trying to keep page load times under 1 second, this highlights the importance of geographically distributed CDNs and mirrors.
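If you want to check the speed-of-light math yourself, here is a rough sketch. The `minRoundTripMs` function is just illustrative, and the 3,900 km figure is my own approximation of the straight-line distance from New York City to Oregon, not a measured value.

```javascript
// Speed of light in a vacuum, expressed in kilometers per millisecond.
const SPEED_OF_LIGHT_KM_PER_MS = 299792.458 / 1000;

// Best-case round trip: there and back in a straight line, with no routers,
// no queuing, and no slowdown from the cable itself.
function minRoundTripMs(distanceKm) {
  return (2 * distanceKm) / SPEED_OF_LIGHT_KM_PER_MS;
}

// Assumption: NYC to Oregon is roughly 3,900 km as the crow flies.
const NYC_TO_OREGON_KM = 3900;

console.log(minRoundTripMs(NYC_TO_OREGON_KM).toFixed(1)); // → 26.0
```

Comparing that floor against what ping actually reports (about 100ms in my case) gives you a feel for how much of your latency is physics and how much is the network in between.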
Let’s zoom in on the first event on the timeline. It seems to happen in the middle of this big idle period. You can use the mouse wheel to zoom.
The first event on the Timeline is “Receive Response”.
A few milliseconds later, you’ll see a (tiny) “Receive Data” event. You might see one or two more miscellaneous events related to page unloading, another “Receive Data” event, and finally a “Finish Loading” event. What’s going on here?
The server has started responding to your request when you see that first “Receive Response” event. You’ll see several “Receive Data” events as bytes come down over the wire, completing with the “Finish Loading” event. This pattern of events will occur for any resource the page needs - images, CSS, JS, whatever. Once we’ve finished downloading the document, we can move on to parsing it.
“Parsing HTML” sounds like a pretty simple process, but Chrome (and any browser) actually has a lot of work to do. The browser will read the bytes of HTML off the network (or disk, if you’re viewing a page on your computer), and convert those bytes into UTF-8 or whatever document encoding you’ve specified. Then, the browser has to “tokenize” - basically taking the long text string of the HTML and picking out each tag, like <a>. Imagine that the browser converts the ~100kb string of HTML into an array of several strings.
[Image: Me, waiting for The Verge to load]
Then it “lexes” these tokens (basically converts them into fancy objects) and finally constructs a DOM out of them. On complicated pages, these steps add up - on my machine, The Verge takes over 200ms just to parse the HTML. Yow.
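Here is a toy version of the “tokenize, then lex” steps described above, just to make the idea concrete. It is nowhere near what a real browser does (the real tokenizer is a large state machine that handles comments, scripts, entities and malformed markup); `tokenizeHtml` and `toToken` are names I made up for this sketch.

```javascript
// Step 1, "tokenize": split one long HTML string into an array of small strings.
// A chunk is either a tag ("<" up to the next ">") or a run of text between tags.
function tokenizeHtml(html) {
  const pattern = /<[^>]+>|[^<]+/g;
  return (html.match(pattern) || [])
    .map((chunk) => chunk.trim())
    .filter((chunk) => chunk.length > 0);
}

// Step 2, "lex": turn each string into a small object describing what it is.
function toToken(chunk) {
  if (!chunk.startsWith("<")) {
    return { type: "text", value: chunk };
  }
  if (chunk.startsWith("</")) {
    return { type: "endTag", name: chunk.slice(2, -1).trim().toLowerCase() };
  }
  // Start tag: the name is the first word after "<", before any attributes.
  const name = chunk.slice(1, -1).trim().split(/[\s/]/)[0].toLowerCase();
  return { type: "startTag", name };
}

const chunks = tokenizeHtml('<p>Hello <a href="/x">world</a></p>');
console.log(chunks);
console.log(chunks.map(toToken));
```

A real browser then walks those token objects to build the DOM tree, which this sketch leaves out.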
The two teeny tiny blue lines there are the JS and CSS requests being sent.
<script src="/assets/application-0b54454ea478523c05eca86602b42d6542063387c4ee7e6d28b0ce20f5e2c86c.js" async="async" data-turbolinks-track="true"></script>
Because this script tag was marked with the async attribute, the browser does not stop parsing the HTML while the script downloads.
[Image: Non-blocking async! WhoOOoOOAaaa!]
This can be a huge boost to speeding up time-to-first-paint for most websites.
Browsers will not wait on external CSS before continuing past this step. If you think about it, this makes sense. CSS cannot modify the DOM, it can only style it and make it pretty. In order to even apply the CSS, we need to have the DOM constructed first. So the browser, smartly, simply sends the request for the CSS and moves on to the next step.
Note that this “Parse HTML” step will reoccur every time the browser has to read new HTML - for example, from an AJAX request.
The next major event you’re going to see is the purple “Recalculate Styles”. Unfortunately, this event covers a lot of things that actually happen during page construction. The first is the construction of the CSSOM.
As HTML is to the DOM, so CSS is to the CSSOM. Your CSS, after it’s downloaded has to be converted -> tokenized -> lexed -> constructed just like the HTML was. This process is usually the cause of any “Recalculate Styles” bars you see at the beginning of the page load.
“Recalculate Styles” can also mean a lot of other confusing things are happening with your CSS, like “recursive calculation of computed styles”, or whatever that means. The gist is that if you’re seeing a lot of time in “Recalculate Styles”, your CSS is too complicated. Try to eliminate unused or unnecessary style rules.
Why are we seeing Recalculate Styles events when the CSS hasn’t even been downloaded yet? The browser is applying the browser’s default CSS to the document, and it may also be applying any style attributes present in the HTML markup itself (display: none being a common one, present on this page).
You will probably see more purple events (Recalculate Styles and its cousin, Layout) later on in the timeline. Again, your browser does not wait for CSS to finish downloading - it’s already calculating styles and layouts based on just your HTML markup and the browser defaults right now. The rendering events you see later on occur once the CSS is finished downloading.
Slightly after your first Recalculate Styles event, you should see a purple “Layout” event. Basically, at this point, your browser has all of the DOM and CSSOM in memory and needs to turn it into pixels on the screen.
The browser traverses the visible elements of the DOM (actually the render tree), and figures out each node’s visibility, applicable CSS styles, and relative geometry (50% width of its parent and so on). Complicated CSS will obviously make this step longer, but so will complicated HTML.
If you’re seeing a lot of “layout” events during a page load, you may be experiencing something called “layout thrashing”.
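Layout thrashing is usually described as script that alternates between reading a layout value (which forces the browser to do a layout) and writing a style (which invalidates that layout again). Here is a minimal sketch of the problem and the usual fix. The two function names are mine; `offsetWidth` and `style.width` are standard DOM properties.

```javascript
// Thrashing: every iteration reads a layout value and then writes a style.
// Each write dirties the layout, so the next read forces a fresh layout.
function halveWidthsThrashing(elements) {
  for (const el of elements) {
    el.style.width = el.offsetWidth / 2 + "px";
  }
}

// Batched: do all of the reads first, then all of the writes.
// The browser only has to lay the page out once for the whole batch of reads.
function halveWidthsBatched(elements) {
  const widths = elements.map((el) => el.offsetWidth);
  elements.forEach((el, i) => {
    el.style.width = widths[i] / 2 + "px";
  });
}

// In a browser you might call it like this (the ".column" selector is hypothetical):
// halveWidthsBatched(Array.from(document.querySelectorAll(".column")));
```

Both versions leave the page looking the same. The difference shows up in Timeline as one “Layout” event instead of one per element.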
In summary - in the “Layout” step, then, the browser is just calculating what’s visible, what isn’t, and where it should go on the page.
It’s generally at this point that you’ll see the blue bar in Timeline - this is the
async). Most browsers have not painted anything to the screen by this point.
To speed up DOMContentLoaded, you can do a few things:
- Make script tags async where possible. Moving script tags to the end of the document doesn’t help speed up
async tags is generally cleaner and more effective than using so-called ‘async’ script injection.
- Use less complex HTML markup.
- Avoid layout thrash (see above). Don’t use more than one stylesheet - concatenate your assets!
- Inline styles in moderation. Inlining styles means that the browser may try to parse the stylesheet before moving on to the rest of the document. Google recommends inlining only styles required to display above-the-fold content. This will slow down DOMContentLoaded but will speed up the window’s load event. This may be true, but you certainly don’t want to inline all of your CSS. Also, figuring out what CSS rules you need for the above-the-fold content in this age of CSS frameworks and Bootstrap sounds like a lot of work to me. How much CSS do you need to render above-the-fold? All of it. As a rule of thumb, don’t consider inlining all of your CSS unless you’ve got about 50kb or less of it. Once HTTP/2 becomes more common and we can download CSS, HTML and JS over the same connection, this optimization will no longer be needed.
As we move along the timeline to the right, you should start seeing some green bars in the flamegraph. These are Paint related events. There’s a whole lot that can go on in these events (and Chrome even provides profiling tools just for these painting events), but I’m not going to go too deep on them here. All you need to know is that paint events happen when the browser is done rendering (the purple bars - the process of turning your CSS and HTML into a layout) and needs to turn the layout into pixels on a screen.
The green bar in the timeline is the first paint - the first time anything is rendered to screen. Optimizing first paint is largely a matter of optimizing DOMContentLoaded and getting the stylesheet to the client as fast as possible. Any stylesheet that doesn’t specify a media query (like
Parse Author Style Sheet
Keep scrolling to the right on the Timeline. Wow - see how much longer it took to get to this part?
In my case, it took almost 40 ms of just waiting around to download the whole stylesheet - and this app’s stylesheet isn’t even that big! To be exact, we sent the request for the stylesheet at about 65ms, and it didn’t come back until 101ms. In reality, this is actually extremely fast (in a real app, you would expect that to be more like 200-350ms at least), and we can’t really optimize that much further. I’m in NYC and Heroku is in Virginia, so most of that time is network latency anyway.
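You can pull the same “request sent, response finished” numbers out of the browser without squinting at the flamegraph. This is a small sketch built on the Resource Timing API: `performance.getEntriesByType("resource")` is a real browser call, while `summarizeResources` is a helper I made up. The sample data reuses the stylesheet timings from this page load; the JS entry and both file names are invented for illustration.

```javascript
// Given Resource Timing entries, report how long each asset took from the
// moment it was requested until its last byte arrived, slowest first.
function summarizeResources(entries) {
  return entries
    .map((entry) => ({
      name: entry.name,
      startMs: entry.startTime,
      endMs: entry.responseEnd,
      waitMs: entry.responseEnd - entry.startTime,
    }))
    .sort((a, b) => b.waitMs - a.waitMs);
}

// In a browser console, you would feed it real entries:
//   summarizeResources(performance.getEntriesByType("resource"))

// Sample data so the sketch also runs outside a browser.
const sampleEntries = [
  { name: "/assets/application.js", startTime: 66, responseEnd: 90 }, // invented
  { name: "/assets/application.css", startTime: 65, responseEnd: 101 }, // timings from the text
];

console.log(summarizeResources(sampleEntries));
```

The stylesheet comes out on top with a 36ms wait, which matches the “almost 40 ms” read off the Timeline.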
Once the stylesheet is downloaded, it’s parsed. You’ll see another cycle of purple events (as the CSSOM is re-calculated, we re-render the layout) and green events (now that the layout is updated, we render the result to the screen).
The stylesheet for this app is extremely simple, and my app appears to be wasting about 30ms waiting for the CSS to download. It may be worth investigating the performance impact of inlining the entire stylesheet in the HEAD of this page. Most sites won’t benefit from this optimization (see my bit about this above), but because this app is idling for about 20ms waiting for the styles to download, we may want to eliminate that network round-trip.
Finally, you should see the
Once all of those callbacks attached to load have completed, you’ll see the red bar, which signifies the end of the load. This is generally when the page is “ready” and finished loading. Finally!
Using Chrome Timeline to Debug Browser Speed
So, you’ve got a site that takes 5-10 seconds to get to the load event. How can you use Timeline to profile it and find the performance hotspots?
- Hard reload (Ctrl + Shift + R, or Cmd + Shift + R on a Mac) and load the Timeline with fresh data.
- Look at the pie graph for the entire page load. After hard reloading, Chrome will show the aggregate stats for the entire page load in the pie graph. You can see here that it took about 2.23 seconds from my refresh input to get to load. Get an idea of where you spend most of your time - is it in parsing (loading), scripting or rendering and painting? Is it idle time?
- Reduce Idle. Idling comes from slow server responses and asset requests. If you’re idling a lot, make sure your server is still zippy-quick. If it is, you may have an unoptimized order of assets. See the “DOMContentLoaded” section above.
- Reduce Loading. Recall that “loading” here refers to time spent parsing HTML and CSS. To decrease loading time, you don’t have many options other than to decrease the amount of HTML and CSS you’re sending to the client.
async tags to these scripts to get them off the rendering critical path, even if the vendor proudly claims the script is already “async!”. Try to look at the call stacks and figure out where you’re spending most of your time.
- Reduce Rendering and Painting. Sites can also have quite a few layout changes and re-renders due to tools like Optimize.ly, something we can see by checking the “First Layout Invalidation” property of some of the “Layout” events in the Timeline. This is a tough one. Optimize.ly’s whole purpose is to essentially change the content of the page, so moving it to an async script tag may cause a “flash of unstyled content” where part of the page would look one way and then suddenly flash into a different styling. That isn’t acceptable, so we’re stuck with Optimize.ly’s slow and painful re-layouts here.
Do these things to make your pages load faster.
- You should have only one remote JS file and one remote CSS file. If you’re using Rails, this is already done for you. Remember that every little marketing tool - Olark, Optimize.ly, etc etc - will try to inject scripts and stylesheets into the page, slowing it down. These tools are not free. However, there’s no excuse for serving multiple CSS or JS files from your own domain. Having just one JS file and one CSS file eliminates network roundtrips - a major gain for users in high-latency network environments (international and mobile come to mind). In addition, multiple stylesheets cause layout thrashing.
src attribute), so you may need to drop things like Mixpanel’s script into a remote file you host yourself (in Rails, you might put it into application.js for example) and then make sure that remote script has an
async on external scripts takes them off the blocking render path, so the page will render without waiting for these scripts to finish evaluating.
async tag, external CSS must go first. External CSS doesn’t block further processing of the page, unlike external JS. We want to send off all of our requests before we wait on remote JS to load.
- $(document).ready is not free. Every time you attach something to the document being ready, you’re adding script execution that delays the completion of page loads. Look at the Chrome Timeline’s flamegraph when your load event fires - if it’s long and deep, you need to investigate how you can tie fewer events to the document being ready. Can you attach your handlers to
I'm Nate Berkopec (@nateberkopec). I write online about web performance from a full-stack developer's perspective. I primarily write about frontend performance and Ruby backends.