Writing Filters for Apache 2.0
Posted by admin on October 12 2007 20:20:11

When the Apache developers first began talking about Apache 2.0, one of the major goals was for one module to be able to modify the output of another. This goal was realized earlier this year with the sixth alpha version. The mechanisms used to make these modifications are called filters. Originally it was difficult to write filters, but over the past few releases the developers have improved the interface so that filters are much easier to create.

This article will cover some of the basic concepts of Apache filters. In my next column, I'll walk you through creating a filter. In the column after that, I will apply the same concepts toward writing an input filter.

Standard Filters

Filters work because the Apache developers consider Web pages as chunks of information. In general, we don't care what those chunks look like or how they are stored on the server. In Apache filter terminology, each chunk is stored in a bucket, and lists of buckets form brigades. Lists of brigades can then create a Web document. Filters operate on one brigade at a time, and are called upon repeatedly until the entire document has been processed. This allows the server to stream information to the client.
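The bucket and brigade idea can be sketched with a toy model. This is only an illustration under stated assumptions: the toy_ types and functions below are hypothetical and are not Apache's real bucket or brigade structures. The sketch shows a filter being called once per brigade while keeping its running state in a context between calls.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT Apache's real bucket/brigade types. */
typedef struct toy_bucket {
    const char *data;          /* one chunk of the response */
    size_t len;                /* number of bytes in this chunk */
    struct toy_bucket *next;   /* next bucket in the same brigade */
} toy_bucket;

typedef struct toy_brigade {
    toy_bucket *head;          /* a brigade is a list of buckets */
} toy_brigade;

/* State a filter keeps between calls, since it sees one brigade at a time. */
typedef struct toy_count_ctx {
    size_t bytes_seen;
    size_t calls;
} toy_count_ctx;

/* Hypothetical filter: walks one brigade and counts the bytes in it. */
void toy_count_filter(toy_count_ctx *ctx, const toy_brigade *bb)
{
    assert(ctx != NULL && bb != NULL);
    ctx->calls++;
    for (const toy_bucket *b = bb->head; b != NULL; b = b->next) {
        ctx->bytes_seen += b->len;
    }
}

/* Streams a two-brigade "document" through the filter, one brigade per call. */
toy_count_ctx toy_stream_document(void)
{
    toy_bucket b2 = { "world", 5, NULL };
    toy_bucket b1 = { "hello ", 6, &b2 };
    toy_bucket b3 = { "!\n", 2, NULL };
    toy_brigade first = { &b1 };
    toy_brigade second = { &b3 };
    toy_count_ctx ctx = { 0, 0 };

    toy_count_filter(&ctx, &first);
    toy_count_filter(&ctx, &second);
    return ctx;
}
```

Calling toy_stream_document() runs the filter twice, once for each brigade, which mirrors how a filter never needs the whole document in memory at once.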

The basic Apache distribution includes several standard filters.

The first is the content_length_filter. This filter computes the content length of the response if possible. If the response is not fully available when this filter is first called, and the protocol allows the server to send the response without a Content-Length header, then this filter just passes data to the next filter. It continues to count bytes, however, for logging purposes.

The second standard filter is the header_filter. The first time this filter is called, it formats the header table and sends all of the headers to the next filter before sending the current page. This is important, because if your filter wants to modify headers, it must be inserted before the header_filter and it must buffer the entire page until it has made all of its modifications to the headers. Once your filter passes data to the next filter in the stack, you have effectively told Apache that you are done with that data, and it can be sent to the client.

The final filter is always the core_output_filter. This filter is responsible for writing all data to the network. To provide optimal usage of the available network bandwidth, Apache will buffer as much as 9KB of data before sending it to the client. However, filters can force Apache to send data immediately by flushing the current filter stack.
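The buffering behavior described above can be sketched as a small model. This is a simplified illustration, not the core_output_filter's actual code: the toy_ names are hypothetical, the sketch tracks byte counts rather than real data, and the threshold simply reuses the 9KB figure quoted in the text.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: threshold taken from the 9KB figure in the text. */
#define TOY_SEND_THRESHOLD (9 * 1024)

typedef struct toy_net_buffer {
    size_t pending;   /* bytes held back, not yet written */
    size_t sent;      /* bytes "written to the network" so far */
    size_t writes;    /* number of network writes performed */
} toy_net_buffer;

/* Write out whatever is pending, regardless of how little there is. */
void toy_flush(toy_net_buffer *nb)
{
    assert(nb != NULL);
    if (nb->pending > 0) {
        nb->sent += nb->pending;
        nb->pending = 0;
        nb->writes++;
    }
}

/* Queue len bytes; only write once the threshold has been reached. */
void toy_write(toy_net_buffer *nb, size_t len)
{
    assert(nb != NULL);
    nb->pending += len;
    if (nb->pending >= TOY_SEND_THRESHOLD) {
        toy_flush(nb);
    }
}
```

With this model, two writes of 4000 and 6000 bytes produce a single network write, while a later 100-byte write stays pending until toy_flush is called explicitly.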

Filter Types and Their Meanings

Before a filter can be enabled for a given request, it must be registered with the server. This is done using the ap_register_output_filter function. This function is invoked with three arguments: the filter name, the filter function pointer, and the filter type.
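To make the registration step concrete, here is a minimal sketch of a name-keyed filter registry. It is a stand-in, not Apache's implementation: every toy_ identifier, the function pointer signature, and the numeric type values are assumptions made for illustration. It shows the three pieces of information a registration carries and why a name can only be registered once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_FILTERS 16

/* Hypothetical stand-ins; Apache's real types and values are not reproduced. */
typedef int (*toy_filter_func)(void *ctx, const char *data, size_t len);
typedef enum { TOY_FTYPE_CONTENT = 10, TOY_FTYPE_OTHER = 20 } toy_filter_type;

typedef struct toy_filter_rec {
    const char *name;        /* server-wide unique identifier */
    toy_filter_func func;    /* function to run when the filter is used */
    toy_filter_type type;    /* used to order filters relative to each other */
} toy_filter_rec;

static toy_filter_rec toy_registry[TOY_MAX_FILTERS];
static size_t toy_registry_len = 0;

/* Returns the registered record for name, or NULL if none exists. */
const toy_filter_rec *toy_lookup_filter(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < toy_registry_len; i++) {
        if (strcmp(toy_registry[i].name, name) == 0) {
            return &toy_registry[i];
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if the name is taken or the table is full. */
int toy_register_output_filter(const char *name, toy_filter_func func,
                               toy_filter_type type)
{
    assert(name != NULL && func != NULL);
    if (toy_lookup_filter(name) != NULL || toy_registry_len == TOY_MAX_FILTERS) {
        return -1;
    }
    toy_registry[toy_registry_len].name = name;
    toy_registry[toy_registry_len].func = func;
    toy_registry[toy_registry_len].type = type;
    toy_registry_len++;
    return 0;
}

/* Placeholder filter body: reports how many bytes it was handed. */
int toy_noop_filter(void *ctx, const char *data, size_t len)
{
    (void)ctx;
    (void)data;
    return (int)len;
}
```

Registering "MYMOD_UPPERCASE" twice fails the second time in this sketch, which is the reason for prefixing filter names with something unique to the module.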


The filter name is a server-wide unique identifier for this filter. No two filters can use the same string as their filter_name. For this reason, it is recommended that filter names have some sort of namespace protection unique to each module. The filter function is the function that is added to the filter stack whenever this filter is specified. Next month, we will cover this function in more detail. Finally, a filter type must be specified. All filters have a type associated with them; this helps Apache to order filters correctly. The following is a list of filter types with their associated meanings.

Most filter writers will focus exclusively on AP_FTYPE_CONTENT filters. Once a filter is registered with the server, it can be added for a request. This is done using the ap_add_output_filter function, and is usually specified with the SetFilter directive in the httpd.conf file. The ap_add_output_filter function accepts four arguments:

ap_add_output_filter(const char *name, void *ctx,
                     request_rec *r, conn_rec *c);

The first argument is the name that was registered with ap_register_output_filter. The ctx argument is an arbitrary pointer that is passed to the filter each time it is called. This is useful when a single function implements multiple filters. The final two arguments are a request_rec and a conn_rec that the filter uses each time it is called. If a request_rec is not available, that field can safely be NULL. If the request_rec is NULL, the conn_rec must be provided. This allows a single filter chain to be used on both a request and a sub-request, without requiring Apache to determine which request goes with which filter. Associating a request with a filter is done when adding the filter to the filter stack.
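The argument rules above can be sketched with a toy filter stack. Again, this is an illustration under assumptions rather than Apache's code: the toy_ types stand in for request_rec and conn_rec, the caller supplies the storage for each stack entry, and ordering by filter type is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for request_rec and conn_rec. */
typedef struct toy_request { const char *uri; } toy_request;
typedef struct toy_conn { int id; } toy_conn;

typedef struct toy_filter {
    const char *name;          /* name the filter was registered under */
    void *ctx;                 /* arbitrary pointer handed back on each call */
    toy_request *r;            /* may be NULL when no request is available */
    toy_conn *c;               /* must be set when r is NULL */
    struct toy_filter *next;   /* next filter in the stack */
} toy_filter;

/*
 * Fills in the caller-provided node and pushes it onto the stack.
 * Returns 0 on success, or -1 when neither a request nor a connection
 * is given, in which case the stack is left unchanged.
 */
int toy_add_output_filter(toy_filter **stack, toy_filter *node,
                          const char *name, void *ctx,
                          toy_request *r, toy_conn *c)
{
    assert(stack != NULL && node != NULL && name != NULL);
    if (r == NULL && c == NULL) {
        return -1;
    }
    node->name = name;
    node->ctx = ctx;
    node->r = r;
    node->c = c;
    node->next = *stack;
    *stack = node;
    return 0;
}
```

Because the request pointer is stored on the stack entry itself, the same filter function can appear in a request's stack and a sub-request's stack without any lookup to work out which request it belongs to.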

This article has barely scratched the surface of filters, and we will take the next two months to delve further into the topic. Writing filters is complex, but if you take it slowly, filters can become a powerful way to enhance a Web server.