Profiling Drupal

Nathaniel Catchpole

on

February 10, 2009

Profiling Drupal

Over the past few months I've been working a fair bit on optimizing certain areas of Drupal core, mainly node_load(). While I've used devel and ab pretty much exclusively in the past, I've found myself needing to see exactly why a page is slow, and devel and benchmarks only get you so far with that. So I ended up using xdebug and kcachegrind/webgrind, which I'd never used before, and had regarded as voodoo before trying them out; they looked complicated and hard to set up. Turns out that they're not to hard to set up, quite easy to use, and very, very helpful. While Drupal is being used for some very large, very busy websites, there's still room for improvement in core in terms of performance, so we can do better. And part of doing better is knowing where the problems are and how to measure improvements. Before you profile, you need to know what to profile. Optimizing either the database or PHP is quite labour intensive, and particularly when working on Drupal core (but ideally in contributed modules or specific projects too), you also have to keep the code readable, extensible, and maintain functionality. Making something a couple of milliseconds faster at the cost of readability won't fly. So in my opinion it's best to focus on things that happen often - loading nodes, viewing users, or the code Drupal loads for every request. When optimizing a real site, this usually means your front page, and traffic logs will tell you which other pages are used a lot. Or the mysql query log analyzer will tell you which queries are taking the most time overall. If one administration page takes 3 seconds to load, then that's not great, but that page could take much more effort to optimize than shaving 20 ms off every page request on your site - which in terms of server load and user experience for most of your users makes a much greater difference. The next thing to do is to identify the actual bottlenecks. yslow and firebug can help you for front end performance - which are often a bigger issue in the wild, but while Drupal core can make those problems easier to solve (reverse cache support, CDN support etc.), they need real world situations to be measured in, and the actual solutions are often site specific. So here I'm going to focus on what you can find on your local development environment - and three tools, devel module, ab, and xdebug/cachegrind. devel has three very useful features for profiling: #1 devel generate. Unless you're doing micro-optimization of the bootstrap, you want to benchmark Drupal with a few thousand users, nodes comments and taxonomy terms. Devel generate gives you those in a matter of minutes. #2 query log - this will show you all the rendered queries on a page request, how many milliseconds they took, which function they were called from. For slow queries, you can then run them through MySQL's EXPLAIN - which will show you if they're using indexes, whether they cause table scans or filesorts etc.. One of the classic 'bad' queries in drupal is from tracker module, so if you want an example to try out visit /tracker with some pre-populated content, copy the query from the devel report, and paste it into the mysql command line with EXPLAIN in front of it. If a query looks hard to optimize in itself, then caching is another option to explore to avoid doing the same hard work over and over again. If you see exactly the same query being run more than once, then it's a good candidate for a static cache. For an example, visit user/n - and see user_load() querying the database several times for the same information (patch here). Beware though, over-use of static caches can bloat memory usage if they're not actually used, which limits the number of Apache processes you can run, and they can also cause unexpected behavior if not reset properly when data is modified during a page request. #3 Performance logging module. This is more useful on real sites rather than for profiling core, but it's quite new, so may not be well known yet. If you've got APC installed on your server, you can log execution times and memory usage for all the pages on your site, and have averages and page counts show up on a handy administration screen, with minimal impact on the running of your site. It's very handy, try it out! This started out as it's own project, but now comes in the devel download. However, once you've added indexes for a slow query, or cached a common request, you need to know if it actually made any difference. This is where Apache Bench comes in. There are basic instructions for using Apache Bench on Drupal.org, so I won't repeat them here. However, there's a few things to remember which are easy to get caught out with. 1. Disable Drupal's page caching - unless you pass in a cookie, ab is viewing your site as anonymous user, and benchmarking a cached page is only useful if you want to benchmark the effectiveness of the page cache. 2. Disable the MySQL query cache in my.cnf and restart MySQL. The query cache will do its best to avoid processing the same query from scratch across multiple requests - great for optimizing your LAMP stack, terrible when trying to optimise the database queries themselves. 3. Make sure your anonymous user has permissions to view the page you're benchmarking, and that while modifying the code, you didn't introduce syntax errors. Drupal's 403 pages, and white screens of death are very, very fast, as are pages with most of the content missing (for instance trying to benchmark comments when the anonymous user doesn't have the 'access comments' permission is quite useless). If in doubt, log out and take a look first. 4. If you're trying to optimise code, don't overwhelm your server with too much concurrency, ab -c1 n500 is much more likely to give you accurate results than ab -c10 n50. Also, run benchmarks three times before and after your patch is applied to ensure you don't have too much deviation in the results. If you've read this far, it's likely that you've already used devel and ab before, or at least heard of them. However I'd been using both for a couple of years before I tried xdebug and kcachegrind or webgrind. And kcachegrind/webgrind turn out to be a lot of fun. First, install xdebug. For me, installing kcachegrind just meant typing "aptitude install kcachegrind" - but if it's not that simple on your platform, you could try http://code.google.com/p/webgrind/ instead - for that you just unzip the folder into your webroot, change a couple of settings in config.php, and you're done. kcachegrind recently stopped working for me after an ubuntu upgrade, so I tried webgrind, and actually found it better for profiling since it concentrates much more on function calls and execution time than code flow without the extra clutter of function call graphs (pretty though they are). Next a bit of setup. xebug needs some entries in php.ini - these are the important ones:
zend_extension="/usr/lib/php5/20060613/xdebug.so"
# These let you trigger the profiling of drupal by adding ?XDEBUG_PROFILE=1
# to the URL in your browser.
xdebug.profiler_enable_trigger = on
xdebug.remote_enable = 1
xdebug.remote_port = 9000
xdebug.remote_host = localhost
Two more sneaky settings later on. So once this is working, you can visit http://drupal7.localhost?XDEBUG_PROFILE=1 - and xdebug will save a file named something like cachegrind.out.1234 to your /tmp folder. I usually rename these to something like 'cachegrind.out.patch' or 'cachegrind.out.HEAD' so I know what I'm looking at once they're open. Then, just open kcachegrind, browse to the file in the normal way, and take a look. You'll then see a screen like this: kcachegrind screenshot kcachegrind does much more than show performance bottlenecks, so we're going to focus on just a couple of columns in the left pane. Incl.: This tells you the total time spent in the function, including any functions it calls. So when menu_execute_active_handler() looks expensive, that's because all the functions needed to build a page in Drupal start there. So in general, look at this column last - in most cases, it's a red herring. You'll also find that the Incl. column adds up to much longer than the total request time, because every function is counted for itself, and for every function that calls it. Self.: The screenshot is sorted by this column, and that's because it's by far the most useful one there - it tells you how much time was spent inside any one function during the course of a page request. So a function that's called once, doesn't call any other functions, and takes 30ms to complete - will show up as 30ms (or 30,000 microseconds) in self. A function which takes 2ms, but is called 200 times, will show up as 400ms in this column. Function calls are also shown on this page. Note that cachegrind can also show these numbers as percentages, but personally I find the numbers easier to read. As you can see from that screenshot, one of the most expensive functions in all of Drupal core is format_date() who new! If viewing a few nodes or comments on a page, the time quickly starts to add up. If you see something which can be optimized - putting an if statement outside a loop, static caching or whatever, you then want to see if it made any difference. If it's a big difference, then ab will usually find it - but it needs to be quite a major change to show up in benchmarks like that. However, doing cachegrinds on a page only tests one request - and these can have a lot of variation especially if you're profiling on a machine doing lots of other things. So one way to get the best of both worlds is to use cachegrind and ab together. First, two more lines to add to php.ini -
# Set the filenames generated by xdebug to use REQUEST_URI instead of the apache process number
# so all requests to the same page end up in the same file.
xdebug.profiler_output_name = cachegrind.out.%R

# Set xdebug to append data to files instead of overwriting, this means you can view aggregate
# function calls over multiple requests.
xdebug.profiler_append = TRUE
Then use ab -c1 n50 (50's more than enough here, even 10 will probably do) to benchmark the page as before. You should ignore the ab output - since generating cachegrind files is expensive by itself. However, you'll now have a file in the same format as a single request, but showing function calls added together from all 50 requests. This gives you more solid data about which functions are the most expensive, and it can also show up functions which are being called far too many times - i.e. recently I discovered that element_sort() (a Drupal helper function for uasort(), to sort arrays with '#weight' properties) was getting called anything up to 5000 times on a request, and taking more time than was spent in the database on pages with a lot of nodes. About 10 lines of code later, and it's now called about 180 times on the same page in HEAD. So profiling is very useful for understanding existing bottlenecks, and can help you find new ones you didn't know existed. If you've ever found yourself trying to optimize something in the dark, then it can be many hours of trial and error to find bottlenecks (at worst disabling modules and slowly re-enabling them), a couple hours spent familiarizing yourself with profiling could save you a lot of time in the long run, and it's also hard data which helps to get your patches committed faster.

Share it!

but, I'm very curious about the patch that reduced 5000 function calls to 180 calls, would be helpful if you had a link to the commit, this would extend the tutorial to what you should do after grinding the cache :)

Thanks Catch!
Excellent post! This will be helpful many I'm sure. Including myself one day...
First, two more lines to add to settings.php -
Do you mean php.ini?
Both the maths, and the settings.php/php.ini switch.
A function which takes 2ms, but is called 200 times, will show up as 200ms in this column
400ms