Logging Apache's *_cache modules efficiency

Recently I had to setup several Apache web servers with an in-RAM cache for frequently accessed content with the help of Apache's mem_cache module. All the servers where front-end servers to application servers that sits behind.

Although in some cases the content expiration time was set as low as 30 seconds, it worked great because it reduced request concurrency for the most costly requests, providing an important performance improvement and reducing heavily the requests to the application servers. The problem comes when you want to log cache hits and misses in order to analyze cache efficiency.
The *_cache modules doesn't provide any information to apache through environment variables, e.g., neither provides any logging facility.

Googling about it I even found some people that modified mem_cache module source code to add some logging facility to it. However, that's not the way in my opinion. If you are working with separate servers (separate Apache instances, not necessarily two different machines althought it's better), there is a better solution to it:

1. Enable mod_headers in the application server and add an HTTP header to register the "cost" to process the request (the time taken by the server since it received the request until the response is sent to the wire). We need to add the following header:


Header set CacheMiss "%D %t"
2. In the front-end server we will add a new header (so we ned mod_headers again) and create a specific log file for the cache performance data.


Header set CacheHit "%D %t"
LogFormat "%t \"%r\" %X \"%{CacheMiss}o\" \"%{CacheHit}o\" %>s" performance
CustomLog /var/log/apache2/performance.log performance


The LogFormat drective tells Apache to define a new log entry type containing:

  • date and time the request came in

  • the first line of the HTTP request (like in common or combined log format)

  • Connection status when response is completed

  • the CacheMiss header sent by the backend. This is the most interesting thing: if the request is a cache hit, this field will be empty because no request is forwarded to the backend server! it will contain "D=n t=m", where m is the time the request was received, and m is the time from when the request was received to the time the headers are sent on the wire. This is a measure of the duration of the request.

  • the CacheHit header contain the same information than "CacheMiss", but from the front-end server point of view. This field will appear always, no matter if it's a cache hit or a miss.

  • Finally, "%>s" is the HTTP status code for the request.


Provided this log, is quite easy to analyze how many cache hits and misses we have over the total number of requests, the time to process a cache hit vs. a cache miss,... I highly recommend to enable this kind of log some days or weeks before enabling the cache, so you can analyze your cache performance better by comparing with historical data. This way you can know better the impact of every configuration change. This method is valid also to measure the request processing cost between front-end and backend servers, no matter if you are using a cache or not.

Enjoy with mem_cache and logging!

PS: I encourage you to read apache documentation for mod_log_config, mod_cache, mod_mem_cache and mod_headers. You may need to take a look at mod_proxy and mod_http_proxy too.

Story URL: 

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
To prevent automated spam submissions leave this field empty.