Fully grasping the flexibility provided by the mod_perl API is impossible without first understanding the different parts of the HTTP request, how they interact with each other, and how Apache stores this information internally. The chapter at hand begins our discussion of the fundamentals by introducing the Apache request object, which provides a framework for interacting with all of these.
At the heart of the Apache API is the request record, defined in the file src/include/httpd.h in the Apache source distribution. The request record contains information about the current request, such as incoming and outgoing HTTP headers, the relationship of the current request to any subrequests, the request URI, resulting physical filename, and more. We highly recommend that you spend a moment going through httpd.h; contained within are many of the minor details of Apache that you will not find documented anywhere else.
For mod_perl, access to the request record is granted through the instantiation of the Apache request object and a handful of methods provided by the Apache class. The request object is the key that releases the Apache request record, and with it you can begin to harness the full power of the mod_perl API.
You need to retrieve the Apache request object.
Use the request() method from the Apache class to construct the request object directly:
my $r = Apache->request;
or, more idiomatically, simply pull the request object off of the argument list from a handler or Apache::Registry script:
my $r = shift;
The Apache request object is at the center of the mod_perl API. It provides access to the Apache request record as well as other core mod_perl methods: almost all the things that you will want to either peek at or manipulate. Nearly all of your mod_perl-specific code will begin by capturing the request object using one of the two methods shown here.
The Apache request object, like all objects in Perl, is merely a data structure bless()ed into the Apache class. The constructor for the Apache class is the request() method, which returns a new request object. Unlike traditional objects, however, the Apache request object has singleton-like properties: every request object created for a given request points to the same Apache request record and manipulates the same set of per-request data. Traditionally, most programmers end up placing the request object into $r, which is how you will see it appear throughout this book.
Because creating the Apache request object is such a frequent task, the request object is the first argument passed to mod_perl handlers. Well, unless your handler is a method handler, in which case the first argument is the invoking class, but we'll save that until Chapter 10.
As we already mentioned, because Apache::Registry is an example of a mod_perl handler, the request object is also the first argument passed to Registry scripts.
Although the idiomatic code of the second example is far more prevalent in both this book and on CPAN, the request() method is sometimes a preferable way to cleanly get at the request object. For instance, if you are writing a Perl module that needs to be intelligent about whether it is running under mod_perl or mod_cgi, you can effectively retrieve the request object using the request() syntax.
if ($ENV{MOD_PERL}) {
  my $r = Apache->request;
  $r->send_http_header('text/html');
}
else {
  print "Content-type: text/html\n\n";
}
As mentioned in the Introduction, the request object offers methods for accessing the fields of the Apache request record. The most important methods are described in the remaining recipes in this chapter, which will give you a glimpse into some of the more fundamental, interesting, and practical uses for mod_perl and $r. A few of the less-frequented methods are saved until later chapters where we show them in specific applications.
You want to see the entire request.
Use the $r->as_string() method to view the message, including the client request and server response headers.
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  print STDERR $r->as_string;
  return OK;
}
If you print out the results of $r->as_string() after Apache has finished sending the response (such as from a PerlCleanupHandler), you should see something similar to
GET /index.html HTTP/1.0
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
Accept-Charset: iso-8859-1,*,utf-8
Accept-Encoding: gzip
Accept-Language: en
Connection: Keep-Alive
Host: www.example.com
Pragma: no-cache
User-Agent: Mozilla/4.73 (Windows NT 5.0; U)

HTTP/1.0 200 OK
Content-Location: index.html.en
Vary: negotiate
TCN: choice
Last-Modified: Fri, 19 Jan 2001 19:39:47 GMT
ETag: "4d52-51e-3a689803;3aedadb0"
Accept-Ranges: bytes
Content-Length: 1310
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
Content-Language: en
Content-Location: index.html.en
This represents both the HTTP Request message and the HTTP Response message (without the message bodies) as defined by the HTTP protocol. You can find the entire protocol at http://www.w3.org/Protocols/rfc2616/rfc2616.html, but it doesn't exactly make for interesting bedtime reading. From a mod_perl programmer's point of view, the important things to understand (and understand well) about the protocol are the concepts of the HTTP message and client-request/server-response cycle.
The mechanism that drives the Web we interact with every day is incredibly different from the typical client-server environment that may be familiar programming territory. If you are already doing CGI programming, then much of this is not terribly new, but it is worth taking the time to understand the mechanics of it.
The HTTP request cycle consists of transmitting a series of HTTP messages back and forth between the user agent and server. All HTTP messages consist of an initial identifying line, followed by message headers, and ending with the message contents. The important concept to grasp is that a request cycle consists of a single iteration of each of these parts: the HTTP protocol is itself "stateless," and does nothing but describe the mechanism for the retrieval of a single resource.
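The three-part layout of an HTTP message can be illustrated without Apache at all. The following is a minimal sketch in plain Perl; parse_http_message() is a hypothetical helper invented for this illustration, not part of the mod_perl API, and it assumes a simplified message (no repeated headers and no folded header lines).

```perl
use strict;
use warnings;

# Split a raw HTTP message into its three parts: the initial line,
# the headers, and the body. This helper is hypothetical.
sub parse_http_message {
    my ($raw) = @_;

    # A blank line separates the header section from the body.
    my ($head, $body) = split /\r?\n\r?\n/, $raw, 2;
    $body = '' unless defined $body;

    # The first line of the head is the Request-Line or Status-Line.
    my ($initial_line, @header_lines) = split /\r?\n/, $head;

    # Simplification: a repeated header overwrites the earlier value.
    my %headers;
    foreach my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{$name} = $value;
    }

    return ($initial_line, \%headers, $body);
}

my $request = join "\r\n",
    'POST /index.html HTTP/1.0',
    'Host: www.example.com',
    'Content-Length: 3',
    '',
    'x=1';

my ($line, $headers, $body) = parse_http_message($request);
print "Initial line: $line\n";
print "Header $_: $headers->{$_}\n" for sort keys %$headers;
print "Body: $body\n";
```

Under mod_perl you would not parse the message yourself; Apache has already done so by the time your handler runs, and the request object exposes each part through its own method.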
Table 3.1 illustrates the methods used to access each of the parts of the HTTP message from the Apache request object.
| Method Name | Description |
|---|---|
| the_request() | Provides access to the Request-Line of the client request. For example, GET /index.html HTTP/1.0 in the preceding output. |
| headers_in() | Provides access to the incoming headers from the client request. |
| content() | Returns the message body of the client request, such as POSTed HTML form data. |
| status_line() | Provides access to the Status-Line of the server response. For example, HTTP/1.0 200 OK in the preceding output. |
| headers_out() | Provides access to the server response headers. |
| print() | Generates the message body of the server response. |
Although all Web programmers are interested in delivering content, mod_perl programmers usually take a special interest in the request headers, which describe aspects of how the content is delivered and received. Headers are used to communicate things such as whether the user agent can interpret compressed content, what language preference the end user has, and even whether content is considered stale and should be updated. The HTTP/1.1 protocol defines four types of headers:
Request Headers. Describe aspects of the incoming client request
Response Headers. Describe aspects of the server response
Entity Headers. Describe the contents of the transferred entity (usually the server resource)
General Headers. Multipurpose headers that can appear in either a request or response
Each of these headers has its own section in RFC 2616, with the exception of headers related to cookies, which you can find in RFC 2109 (http://www.w3.org/Protocols/rfc2109/rfc2109.txt). The recipes contained within the remainder of this book will often make reference to specific headers and use them to control how content is generated, so being familiar with them (or at least knowing where to look) is good.
Other than as an introduction to the basics of the HTTP protocol, and perhaps as a debugging tool, the as_string() method is not terribly useful in and of itself. The other methods described in this recipe that allow you to interact with the various parts of the HTTP message directly are much more interesting, and are discussed in more detail in the following recipes.
You want to access basic information about the incoming client request.
Use the request object to access the various fields of the Apache request record.
#!/usr/bin/perl -w
use strict;
my $r = shift;
# Send our basic headers.
$r->send_http_header('text/plain');
# Read things you would normally get from CGI.
print " REQUEST_METHOD is: ", $r->method, "\n";
print " REQUEST_URI is: ", $r->uri, "\n";
print "SERVER_PROTOCOL is: ", $r->protocol, "\n";
print " PATH_INFO is: ", $r->path_info, "\n";
print " QUERY_STRING is: ", scalar $r->args, "\n";
print "SCRIPT_FILENAME is: ", $r->filename, "\n";
print " SERVER_NAME is: ", $r->hostname, "\n";
Before any of your mod_perl code is allowed to interact with the incoming request, Apache populates various fields within the request record with various bits of important information. The Apache class provides a large set of methods to access the details of the incoming request directly from the Apache request record. These methods include uri(), args() and others. The sample Apache::Registry script illustrates their use.
Assume our sample Apache::Registry script is placed within the directory /usr/local/apache/perl-bin/, and we type the following URL into our Web browser:
http://www.example.com/perl-bin/echo.pl/extra?x=1
The results would be comparable to what you would expect to see contained in %ENV for a normal CGI script. When programming in a mod_perl environment, such as when using Apache::Registry, the Apache methods are preferred over their %ENV counterparts because populating %ENV is expensive to do on every request (which is why the PerlSetupEnv directive exists, as described in Recipe 2.6). Additionally, these methods allow for greater flexibility, because most can also modify their corresponding field in the request record, and as you begin to program outside of the content-generation phase, the ability to alter the request record becomes important.
Table 3.2 summarizes the methods available to the Apache request object for interfacing with the client request data found in the request record.
| Method | Example Value | Details |
|---|---|---|
| args() | x=1 | Returns the chunk of text following the ? in the URL when called in a scalar context. |
| filename() | /usr/local/apache/ | Provides access to the translated script name for this request. |
| header_only() | TRUE | Returns true if the request is a HEAD request. |
| hostname() | spinnaker.example.com | Returns the name of the host running the script; this may well be different from the host in the URL in the user's browser. |
| method() | GET | Provides access to the HTTP method used for this request. GET and POST are most commonly used. |
| path_info() | /extra | Provides access to the additional path information located after the script name for this request. Does not include the query string. |
| protocol() | HTTP/1.0 | Returns the protocol for this request. Generally this is either HTTP/1.0 or HTTP/1.1. |
| proxyreq() | TRUE | Returns true if the request is a proxy request. |
| uri() | /perl-bin/echo.pl/extra | Provides access to the request URI, which includes the basic request, plus additional path information. |
You can do a lot with this basic set of methods for reading the client request. However, mod_perl provides even higher level abstractions of this data. Later recipes introduce classes such as Apache::URI and Apache::Request, both high level object interfaces to the request information.
You need to access the headers from the incoming request.
Use $r->headers_in() to obtain access to the header data.
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  # Grab all of the headers at once...
  my %headers_in = $r->headers_in;
  # ... or get a specific header and do something with it.
  # The empty-string fallback avoids a warning when the header is absent.
  my $gzip = ($r->headers_in->get('Accept-Encoding') || '') =~ m/gzip/;
  $r->send_http_header('text/plain');
  print "The host in your URL is: ", $headers_in{'Host'}, "\n";
  print "Your browser is: ", $headers_in{'User-Agent'}, "\n";
  print "Your browser accepts gzip encoded data\n" if $gzip;
  return OK;
}
As shown in Recipe 3.3, parts of the incoming client request have their own accessor methods. The request headers, by contrast, are all stored together in a table in the Apache request record and are accessible through the headers_in() method. headers_in() returns a list of key/value pairs in a list context or an Apache::Table object in a scalar context; our sample code uses both forms.
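If context-sensitive return values are unfamiliar, the following plain-Perl sketch shows the general idea using wantarray. The %table hash and headers_sketch() are hypothetical stand-ins invented for this illustration; they are not how mod_perl implements headers_in(), and a plain hash reference takes the place of the Apache::Table object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a header table; not the Apache::Table class.
my %table = (
    'Host'       => 'www.example.com',
    'User-Agent' => 'Mozilla/4.73',
);

# Return a flat key/value list in list context, a reference otherwise.
sub headers_sketch {
    return wantarray ? %table : \%table;
}

my %copy = headers_sketch();   # list context: a copy of the pairs
my $ref  = headers_sketch();   # scalar context: a single reference

print "Host via copy: $copy{'Host'}\n";
print "Host via ref:  $ref->{'Host'}\n";

# Changing the copy leaves the underlying table untouched.
$copy{'Host'} = 'changed.example.com';
print "Table still says: $ref->{'Host'}\n";
```

In this sketch the list-context form is a snapshot suited to reading, while the scalar-context form is the one that reaches the underlying data.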
Acceptable client request headers are defined in section 5.3 of RFC 2616 (with the exception of the Cookie header, which is described in RFC 2109). As already mentioned, in addition to the client request headers, some general and entity headers may also apply to a client request. These three classes of headers contain many more headers than those listed here in Table 3.3, but the following are the ones that you are most likely to find yourself programming against.
| Header | Description |
|---|---|
| Accept | Lists acceptable media types for the server to present in response |
| Accept-Charset | Lists character sets the client will accept |
| Accept-Encoding | Lists encodings the client will accept |
| Accept-Language | Lists languages the client is most interested in |
| Authorization | A series of authorization fields |
| Cookie | Describes a client cookie |
| Host | Name of the requested host server |
| If-Match | The entity tag of the client's cached version of the requested resource |
| If-Modified-Since | An HTTP-formatted date for the server to use in resource comparisons |
| If-None-Match | A list of entity tags representing the client's possible cached resources |
| If-Unmodified-Since | An HTTP-formatted date for the server to use in resource comparisons |
| Referer | An absolute or partial URI of the resource from which the current request was obtained |
| User-Agent | A string identifying the client software |
Note that the best way to access or alter a header isn't necessarily by going after the raw data with headers_in(). A multitude of methods are available in mod_perl that take care of the gory work of processing and parsing the incoming headers. For instance, the Apache::File class has methods for dealing with all the conditional If-* headers. Apache::File and these headers are discussed in detail in Chapter 6.
You need access to user HTML form input.
Use the args() or content() methods provided by the Apache class or, for greater flexibility, the param() method from the Apache::Request class.
The simple, but less flexible way is:
sub handler {
  my $r = shift;
  my %query_string = $r->args;   # GET data
  my %post_data = $r->content;   # POST data
  # Continue along...
}
Or, using the Apache::Request class:
use Apache::Request;
sub handler {
  my $r = Apache::Request->new(shift);
  $r->send_http_header('text/plain');
  # Now, we use the param() method, which covers both GET and POST data.
  foreach my $param ($r->param) {
    print "$param => ", $r->param($param), "\n";
  }
  # Continue along...
}
Recipe 3.3 illustrates the args() method as a way of gaining access to form data. Unlike a GET request, which has a field in the Apache request record dedicated to holding the query string portion of the URI, data from a POST request is contained within the message body of the incoming request. To access this, mod_perl provides the content() method. Both args() and content() return a list of unescaped key/value pairs in a list context, providing a simple interface to end-user form data.
The Web has become an increasingly complex programming environment, using HTML forms for more and more intricate purposes. As such, you may find that the typical args() and content() syntax is rather limiting. For instance, assigning the results of either method to a hash will discard all but one value for any like-named keys, whereas using an array instead will preserve every pair but does not lend itself to easy manipulation. Additionally, both will mishandle form fields that allow multiple choices, like
<form>
  <select name="castaways" multiple>
    <option>Gilligan</option>
    <option>The Skipper</option>
    <option>Mr. Howell</option>
  </select>
</form>
For those cases where args() or content() prove to be too restrictive, or for a more general solution that handles (nearly) all types of HTML form data, consider using the Apache::Request module.
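The hash-versus-array trade-off can be seen in plain Perl, with no Apache involved. The sketch below is an illustration only: the query string is a hypothetical submission from a multiple-choice select box like the one above, and unescape() is a simplified helper written for this example rather than a mod_perl function.

```perl
use strict;
use warnings;

# Hypothetical query string, as a multiple-choice <select> might submit it.
my $query = 'castaways=Gilligan&castaways=The+Skipper&castaways=Mr.+Howell';

# Decode one URL-encoded component: '+' becomes a space, %XX becomes a byte.
sub unescape {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Build a flat key/value list, the same shape as a list-context result.
my @pairs;
foreach my $pair (split /[&;]/, $query) {
    my ($key, $value) = split /=/, $pair, 2;
    $value = '' unless defined $value;
    push @pairs, unescape($key), unescape($value);
}

# Assigning the list to a hash keeps only the last value for each key.
my %flat = @pairs;
print "Hash assignment keeps: $flat{castaways}\n";

# Collecting values into array references preserves every selection.
my %multi;
for (my $i = 0; $i < @pairs; $i += 2) {
    push @{ $multi{ $pairs[$i] } }, $pairs[$i + 1];
}
print "All selections: ", join(', ', @{ $multi{castaways} }), "\n";
```

The second loop is the kind of bookkeeping that a param()-style interface saves you from writing by hand.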
Apache::Request is part of the libapreq package, which you can find on CPAN under the Apache tree, and which you must install separately from mod_perl. As a whole, libapreq implements a Perl interface to underlying Apache C API methods that can manipulate client request data such as cookies, file uploads, and GET and POST data of type application/x-www-form-urlencoded or multipart/form-data.
The interface for Apache::Request is modeled after that of CGI.pm. It parses and unescapes both GET and POST data, and can be used to either get or set input parameters. The provided param() method functions just like that of CGI.pm, but unlike CGI.pm, the back-end is implemented in C instead of Perl. Apache::Request also has no methods for creating form elements. Both of these aspects make Apache::Request smaller and more efficient for simply accessing form data.
Another advantage is that Apache::Request is a complete subclass of the Apache class. Apache::Request->new() can be used as a drop-in replacement for Apache->request(), allowing you to add Apache::Request features while only requiring minimal changes to your existing code.
In addition to the new() constructor, Apache::Request also offers the instance() method. Instead of returning a new object every time, instance() always returns the same Apache::Request object for the current request.
my $r = Apache::Request->instance(Apache->request);
The instance() method becomes particularly useful when you need access to POST data from more than one handler (for instance, when processing the actual request and during logging). Because POST data is contained within the message body of the incoming request, it can only be read directly from the socket once per request. When you call param(), POSTed data is parsed, stashed in memory, and associated with the current Apache::Request object. A later call to new() creates an entirely new Apache::Request object that will not have access to the previous object's data. However, if you use the instance() method to create all of your Apache::Request objects instead of new(), then you can be sure that all calls to param() will have access to POSTed content, because they will access the data through the exact same object.
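The difference between new() and instance() can be sketched with a small stand-alone class. Everything here is hypothetical: Sketch::Request, its request IDs, and its fake param() are invented for illustration and say nothing about how Apache::Request is implemented internally. A real implementation would also need to discard the cached object when the request ends, which this sketch omits.

```perl
use strict;
use warnings;

package Sketch::Request;   # hypothetical class, not Apache::Request

# One cache slot per request, keyed here by a plain request ID.
my %instance_for;

# new() always builds a fresh object with no parsed data.
sub new {
    my ($class, $request_id) = @_;
    return bless { id => $request_id, params => undef }, $class;
}

# instance() returns the cached object for this request, creating it once.
sub instance {
    my ($class, $request_id) = @_;
    return $instance_for{$request_id} ||= $class->new($request_id);
}

# Pretend to parse the body; the result is stashed on the object itself.
sub param {
    my ($self) = @_;
    $self->{params} ||= { parsed_at => 'first call' };
    return $self->{params};
}

package main;

my $first  = Sketch::Request->instance('req-1');
my $second = Sketch::Request->instance('req-1');
my $fresh  = Sketch::Request->new('req-1');

$first->param;   # parse and stash the data on the shared object

print "instance() shares data: ", ($second->{params} ? "yes" : "no"), "\n";
print "new() shares data:      ", ($fresh->{params}  ? "yes" : "no"), "\n";
```

Because the parsed data lives on the object, only callers that obtain the same object can see it, which is why mixing new() and instance() within one request defeats the purpose.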
One thing worth keeping in mind is that param() is actually tied to the Apache::Table class. When called in a scalar context without any arguments, it will return an Apache::Table object, allowing you to use all the methods of the Apache::Table class for manipulating your data. See Recipe 3.14 later in this chapter for more information on the Apache::Table class.
You need to read data sent by the POST or PUT method that is not submitted in application/x-www-form-urlencoded or multipart/form-data format.
Use the read() method from the Apache class to read the submitted data.
use Apache::Constants qw(OK);

sub handler {
  my $r = shift;
  my $content;
  $r->read($content, $r->header_in('Content-length'));
  $r->send_http_header('text/html');
  $r->print("<html><body>\n");
  $r->print("<h1>Reading data</h1>\n");
  my (@pairs) = split(/[&;]/, $content);
  foreach my $pair (@pairs) {
    my ($parameter, $value) = split('=', $pair, 2);
    $r->print("$parameter has value $value<br>\n");
  }
  $r->print("</body></html>\n");
  return OK;
}
As discussed in Recipe 3.5, submitted form data can be read with the args() method (for the query string), the content() method (for a POST message body), or, for more flexibility, the param() method from the Apache::Request class. However, reading the message body this way works only if the request MIME type is application/x-www-form-urlencoded or multipart/form-data. For other MIME types you need to use Apache's read() method to access the submitted data. You might use this, for example, to read the data submitted by a PUT request.
To get the incoming message body data into a variable, pass a scalar and length to the read() method. If you want to read the entire submission, set the length to the value of the Content-Length header, if it exists, as done in the sample code.
Note that Apache, through the TimeOut directive, sets a value for a timeout that will abort processing if the client no longer responds. If you find yourself consistently getting timeout errors when reading in large files, you can set the TimeOut directive to a higher value, or modify the value directly through Apache::Server's timeout() method, as shown in Recipe 4.1.
You need to store persistent data on the client browser by accessing and creating cookies.
Use the Apache::Cookie module, which provides a simple, object-oriented interface to cookies.
This example reads cookies from the client request, and prints the name and value:
use Apache::Constants qw(OK);
use Apache::Cookie;
use Apache::Request;
use strict;
sub handler {
  my $r = Apache::Request->new(shift);
  my %cookiejar = Apache::Cookie->new($r)->parse;
  $r->send_http_header('text/plain');
  foreach my $cookie (keys %cookiejar) {
    $r->print($cookiejar{$cookie}->name, " => ",
              $cookiejar{$cookie}->value, "\n");
  }
  return OK;
}
This code creates two cookies and sends them with the next response.
use Apache::Cookie;
use Digest::MD5;
use strict;
sub handler {
  my $r = shift;
  my $md5 = Digest::MD5->new;
  $md5->add($$, time(), $r->dir_config('SECRET'));
  my $session_cookie = Apache::Cookie->new($r,
    -name => "sessionid",
    -value => $md5->hexdigest,
    -path => "/",
    -expires => "+10d"
  );
  # Set the cookie.
  $session_cookie->bake();
  my $identity_cookie = Apache::Cookie->new($r,
    -name => "identity",
    -value => 'Arthur McCurry',
    -path => "/hall_of_justice/",
    -expires => "+365d",
    -domain => ".superfriends.com",
    -secure => 1
  );
  # Change the value...
  $identity_cookie->value('aquaman');
  # ... then set it.
  $identity_cookie->bake();
  # Continue along...
}
Apache::Cookie, like Apache::Request, is part of the libapreq package available on CPAN. Like Apache::Request, it has a C back-end that makes fast and direct calls to the Apache API, making it preferable to the CGI::Cookie interface on which it is based.
The Apache::Cookie class is used both to get and parse cookies from incoming requests and to create and send cookies on outgoing responses. Programmatically, you can pass in either an Apache::Request object as in the first example or, if you do not need to take advantage of any of Apache::Request's added features, you can just use the standard Apache request object, as shown in the second example. In both cases you will need to use Apache::Cookie's bake() method to actually send your cookies to the client.
Reading cookies is quite easy. Create an empty cookie object with the new() method and then call parse(). This method returns either a hash or a hash reference that maps cookie names to cookie objects.
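To make the shape of the parsed result concrete, here is a minimal plain-Perl sketch of turning a Cookie request header into a name-to-value mapping. parse_cookie_header() is a hypothetical helper written for this illustration, not part of Apache::Cookie; it maps names to plain strings rather than to cookie objects, and it assumes simple name=value pairs separated by semicolons.

```perl
use strict;
use warnings;

# Parse the value of a Cookie request header into a name => value hash.
# This helper is hypothetical and much simpler than Apache::Cookie.
sub parse_cookie_header {
    my ($header) = @_;
    my %jar;
    foreach my $pair (split /;\s*/, $header) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $value = '' unless defined $value;
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # undo URL escaping
        $jar{$name} = $value;
    }
    # Return a hash in list context, a hash reference in scalar context.
    return wantarray ? %jar : \%jar;
}

my %cookiejar = parse_cookie_header('sessionid=abc123; identity=Arthur%20McCurry');
foreach my $name (sort keys %cookiejar) {
    print "$name => $cookiejar{$name}\n";
}
```

The sketch mirrors the context-sensitive return described for parse(), but the real method hands back objects whose name() and value() methods you then call.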
Creating cookies is easy, too. Just specify named parameters to the new() method for your cookie, like -name and -value, after which you can call methods to get and set the cookie's data elements. These methods all return the value of the data requested. Passing in an argument to any of these methods will change the value and return the new value. Table 3.4 summarizes the available methods and the corresponding named parameters used in the new() method.
| Method Name | Named Parameter for new() | Notes |
|---|---|---|
| name() | -name | The name of the cookie. |
| value() | -value | The value of this cookie; it can be a scalar or an array. |
| domain() | -domain | Specifies that this cookie should be sent to all hosts that end with the specified domain. The domain must begin with a dot. |
| path() | -path | Ensures that this cookie is only sent to URLs that start with the specified path. |
| expires() | -expires | Determines when the cookie becomes stale. Use any absolute or relative date format allowed by CGI.pm. |
| secure() | -secure | If set, informs the client that the cookie should only be used on an encrypted (SSL) connection. |
The interesting thing to note is that bake() places the cookies into the err_headers_out table in the Apache request record, which makes them persist across redirects and other errors. See Recipe 3.13 for more details on the different outgoing headers.
You need to store files uploaded from HTML forms.
Use the upload() method provided by Apache::Request. It returns Apache::Upload objects that contain information about the uploaded file and provide access to the file data itself.
package Cookbook::PrintUploads;
use Apache::Constants qw(OK);
use Apache::Request;
use Apache::Util qw(escape_html);
use strict;
sub handler {
  # Standard stuff, with added options...
  my $r = Apache::Request->new(shift,
                               POST_MAX => 10 * 1024 * 1024, # in bytes, so 10M
                               DISABLE_UPLOADS => 0);
  my $status = $r->parse();
  # Return an error if we have problems.
  return $status unless $status == OK;
  $r->send_http_header('text/html');
  $r->print("<html><body>\n");
  $r->print("<h1>Upload files</h1>");
  # Iterate through each uploaded file.
  foreach my $upload ($r->upload) {
    my $filename = $upload->filename;
    my $filehandle = $upload->fh;
    my $size = $upload->size;
    $r->print("You sent me a file named $filename, $size bytes<br>");
    $r->print("The first line of the file is: <br>");
    # Read the first line from the upload's filehandle.
    my $line = <$filehandle>;
    $r->print(escape_html($line), "<br>");
  }
  $r->print("Done......<br>");
  # Output a simple form.
  $r->print(<<EOF);
<form enctype="multipart/form-data" name="files" action="/upload"
method="POST">
File 1 <input type="file" name="file1"><br>
File 2 <input type="file" name="file2"><br>
File 3 <input type="file" name="file3"><br><br>
<input type="submit" name="submit" value="Upload these files">
</form>
</body></html>
EOF
  return OK;
}
1;
Processing uploads requires a few small changes to the way we have been doing things with Apache::Request. Looking at our example, you'll notice that we are adding a few parameters to the call to Apache::Request->new(). To enable uploads, we set the DISABLE_UPLOADS option to 0. We also set POST_MAX to a sensible value; in this case, 10 megabytes. Next we call the parse() method to process the form data, including the uploaded files. If there are problems with the upload, they will surface here as a bad return code, suitable for returning or comparing to values from Apache::Constants. Additionally, an error message accessible through the notes() interface
my $errmsg = $r->notes("error-notes");
is provided which can be used in a custom response, another handler, or when logging.
After verifying that there were no errors during the file upload, the next step is to call Apache::Request's upload() method. upload() returns one or more Apache::Upload objects depending on its context. If called in a list context, as in the preceding example, the upload() method returns a list of all the files the user uploaded as Apache::Upload objects. In a scalar context with a form field name as an argument, it will return the specific file (if it exists).
my $upload = $r->upload('treasure');
If you have a valid Apache::Upload object, you can access the uploaded file and all sorts of information related to it. Table 3.5 summarizes the most frequently used methods from the Apache::Upload class. For a complete list see the Apache::Request documentation.
| Method Name | Description |
|---|---|
| filename() | The filename associated with this upload. |
| fh() | An open filehandle you can use to read the uploaded file. |
| info() | Additional HTTP headers sent by the client, accessible as an Apache::Table object. |
| name() | The name of the form field containing the file. |
| size() | Size of the file, in bytes. |
| tempname() | Name of temporary spool file created on disk. |
| type() | The MIME type of this file, as determined by the client. |
You want to set the outgoing server response headers.
Use the specific server response methods, or the headers_out() method from the Apache class for those headers that do not have a specific method.
sub handler {
  my $r = shift;
  # Do something interesting, then...
  # Set the MIME type.
  $r->content_type('text/html');
  # Set some other header.
  $r->headers_out->set('Cache-Control' => 'must-revalidate');
  # Now, send the headers.
  $r->send_http_header;
  # Continue along...
}
Setting all the proper headers required of a server response is not an easy task. The $r->as_string() output in Recipe 3.2 shows all the headers from a document that has been handled by mod_negotiation. As you can see from the abundance of headers present in the response, sending appropriate and meaningful headers can mean quite a lot of work. Fortunately, mod_perl offers help in many of the most difficult aspects of setting proper headers.
Acceptable server response headers are defined in section 6.2 of RFC 2616 (with the exception of the Set-Cookie header, which is described in RFC 2109). As with the incoming client request, general or entity headers may also apply to the server response. Again, more headers exist than those listed in Table 3.6, but these are the ones you are most likely to encounter.
| Header | Description |
|---|---|
| Cache-Control | One of several fields used to specify caching behavior |
| Content-Encoding | Specifies the encoding of the sent resource |
| Content-Language | Specifies the language of the content |
| Content-Length | The length of the resource |
| Content-Type | Media type of the resource |
| Etag | Entity tag of the sent resource |
| Expires | Time the resource is considered to be stale |
| Last-Modified | Date of last modification of the resource |
| Location | An absolute URI for redirection |
| Pragma | Generic header that can be used to implement any client- or server-specific behavior |
| Set-Cookie | Describes a client cookie to be set |
| Server | Information about the server platform |
As with the client request headers, the collection of server response headers occupies its own place in the Apache request record, which is accessible through the headers_out() method from the Apache class. The headers_out() method functions in the same way as the headers_in() method discussed in Recipe 3.4; it returns a list of key/value pairs in a list context or an Apache::Table object in a scalar context. However, unlike with the client request headers, there are very few headers that you will actually use headers_out() to manipulate.
In most cases, the outgoing server response headers each have their own specialized method that is used to access and alter the server response headers. This is true for a few reasons. Some response headers are considered worthy of their own place in the Apache request record due to the far-reaching implications they may have on the request. The Content-Type and No-Cache headers are an example of this. Other response headers are so tricky to implement that it is easier to take advantage of the mod_perl API than to figure it out for yourself, such as the Etag header.
Table 3.7 lists the particular headers that should never be manipulated through the headers_out() method, but instead via their designated interface using the Apache request object.
| Method | Description |
|---|---|
| content_encoding() | Provides access to the Content-Encoding information held in the Apache request record. |
| content_languages() | Provides access to the Content-Language array held in the Apache request record. |
| content_type() | Provides access to the MIME type for the requested resource that will accompany the Content-Type header. |
| no_cache() | Provides access to various cache-controlling headers, such as Pragma and Cache-Control. |
| set_content_length() | Sets the Content-Length header. |
| set_etag() | Sets the Etag header. |
| set_last_modified() | Sets the Last-Modified header. |
Examples of many of these methods will be shown throughout the book.
You want to make sure that dynamic documents are not cached by clients or proxy servers.
Use the no_cache() method from the Apache class, passing it a single true value.
# Turn off cache headers for the current request.
$r->no_cache(1);
When creating a wholly dynamic document, chances are that you do not want either the client or any in-between proxies caching the content you are about to generate. The no_cache flag in the Apache request record is used to control whether Apache will automatically send an Expires header formatted with the time of the request (according to the server, not the client). mod_perl provides the no_cache() method which, besides getting or setting the no_cache flag, offers control of the Pragma and Cache-Control headers as well. When called with a single true argument, no_cache() will set the Pragma and Cache-Control response headers to the string no-cache. The combination of all these headers should be sufficient to ensure that the majority of (compliant) browsers and proxies will fetch your content anew on each request rather than serve it from a cache.
Calling no_cache(0) will keep Apache from sending the Expires header, and remove the Pragma and Cache-Control headers from the response header table. Because the Pragma header can be used by either the client or server to implement any custom behavior and not just caching, this particular feature may have unforeseen consequences.
An important aspect of the HTTP protocol is that, regardless of the status of the Expires, Pragma, or Cache-Control headers, clients and proxies should not be caching anything other than a successful response. Thus, it is not necessary to set no_cache(1) for all dynamic documents, just those that are generating actual content, and not those that will return a redirect or error response.
Although setting no_cache(1) is a quick and convenient solution to the problem of stale content, if you really gave your application some thought, you would probably find that the content you generate does not really have the ability to change on every request. In fact, your underlying data may only change once a day, or not for weeks. In these cases, you would benefit greatly from using cache-related headers properly, as described in Recipe 6.6.
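To make the combination of headers concrete, here is a minimal plain-Perl sketch of the three headers the discussion attributes to no_cache(1). It is an illustration only and not mod_perl's own implementation; the function names http_date() and no_cache_headers() are invented for this sketch.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time as an HTTP date, which is always expressed in GMT.
sub http_date {
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime(shift);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900,
        $hour, $min, $sec;
}

# Hypothetical helper: the headers described above for no_cache(1),
# given the server's notion of when the request arrived.
sub no_cache_headers {
    my $request_time = shift;
    return (
        'Expires'       => http_date($request_time),
        'Pragma'        => 'no-cache',
        'Cache-Control' => 'no-cache',
    );
}

my %headers = no_cache_headers(time);
print "$_: $headers{$_}\n" for sort keys %headers;
```

Because the Expires value equals the request time, a compliant cache treats the response as already expired the moment it arrives.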
You need to send the server response headers.
Use the send_http_header() method from the Apache class.
package Cookbook::SendWordDoc;

use Apache::Constants qw( OK NOT_FOUND );
use DBI;
use strict;

sub handler {

  my $r = shift;

  my $user = $r->dir_config('DBUSER');
  my $pass = $r->dir_config('DBPASS');
  my $dbase = $r->dir_config('DBASE');

  my $dbh = DBI->connect($dbase, $user, $pass,
    {RaiseError => 1, AutoCommit => 1, PrintError => 1}) or die $DBI::errstr;

  my $sql= qq(
    select document from worddocs
    where name = ?
  );

  # determine the filename the user wants to retrieve
  my ($filename) = $r->path_info =~ m!/(.*)!;

  # do some DBI specific stuff for BLOB fields
  $dbh->{LongReadLen} = 300 * 1024; # 300K

  my $sth = $dbh->prepare($sql);
  $sth->execute($filename);
  my $file = $sth->fetchrow_array;
  $sth->finish;

  return NOT_FOUND unless $file;

  $r->headers_out->set("Content-Disposition" =>
                       "inline; filename=$filename");
  $r->send_http_header("application/msword");
  print $file;

  return OK;
}
1;
After you have your server response headers in place, you can send them on their way using send_http_header(). It is important to understand that by sending headers you are initiating the start of the response, so any errors that occur after you send your headers will result in a rather unsightly document and will short-circuit Apache's built-in error-handling procedures. For this reason, doing all form field validations, error checking, and so on, prior to calling send_http_header() is considered good programming practice.
One nice thing about send_http_header() is that it accepts the MIME type of the response as an optional argument. This saves you the time of calling $r->content_type() yourself in a separate step or, in the case of legacy CGI scripts, needing to prepend Content-type: text/plain\n\n (or something similar) to your output.
Another convenient feature of the send_http_header() method is that, because it draws on the underlying Apache API, it is platform aware. This means you no longer have to be concerned with whether your script is going to be running on Unix, VMS, or an IBM 390; the proper CRLF character sequence will follow the end of the response headers, allowing for maximum portability (you were concerned, weren't you?).
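The advice about finishing all validation before calling send_http_header() can be sketched as follows. This is an illustration under stated assumptions: the %PAGES content store is hypothetical, and the two constants are stand-ins defined locally so the sketch runs outside Apache (a real handler would import them from Apache::Constants).

```perl
use strict;
use warnings;

# Stand-ins for the values Apache::Constants exports under mod_perl,
# defined here only so the sketch can run outside Apache.
use constant OK        => 0;
use constant NOT_FOUND => 404;

# Hypothetical content store, keyed by the name taken from the path info.
my %PAGES = (
    galleon => "Three masts, square rigged.\n",
    sloop   => "One mast, fore-and-aft rigged.\n",
);

sub handler {
    my $r = shift;

    # Do every check that can fail before the headers go out, so that
    # Apache can still produce a proper error response.
    my ($name) = $r->path_info =~ m!^/(\w+)$!;
    return NOT_FOUND unless defined $name && exists $PAGES{$name};

    # Nothing below this point is expected to fail.
    $r->send_http_header('text/plain');
    $r->print($PAGES{$name});

    return OK;
}
```

With this ordering, a bad request never reaches send_http_header(), so the client sees a clean 404 Not Found rather than a half-sent document.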
You want to tell the client the status of the request by indicating a successful response or an error response, such as a redirect or a server error.
Use constants exported by the Apache::Constants class to communicate the status of the response back to Apache.
package Cookbook::Regex;

use Apache::Constants qw(:common);
use Apache::File;
use Apache::Log;
use strict;

sub handler {

  my $r = shift;

  my $log = $r->server->log;

  my @change = $r->dir_config->get('RegexChange');
  my @to = $r->dir_config->get('RegexTo');

  unless ($r->content_type eq 'text/html') {
    $log->info("Request is not for an html document - skipping...");
    return DECLINED;
  }

  unless (@change && @to) {
    $log->info("Parameters not set - skipping...");
    return DECLINED;
  }

  if (@change != @to) {
    $log->error("Number of regex terms do not match!");
    return SERVER_ERROR;
  }

  my $fh = Apache::File->new($r->filename);

  unless ($fh) {
    $log->warn("Cannot open request - skipping... $!");
    return DECLINED;
  }

  $r->send_http_header('text/html');

  # Read the requested file line by line, applying each substitution.
  while (my $output = <$fh>) {
    for (my $i=0; $i < @change; $i++) {
      $output =~ s/$change[$i]/$to[$i]/eeg;
    }
    print $output;
  }

  return OK;
}
1;
Built in to the HTTP/1.1 specification is a series of status codes for communicating the status of the request back to the client. Everyone is familiar with error responses like 404 Not Found and 500 Internal Server Error that appear occasionally while surfing the Web. The 200 OK responses that are the norm typically go by unnoticed, masked by the actual content displayed by the browser. In each of these cases, Apache is returning an HTTP status code through the Status-Line of the server response. Through the mod_perl API we have the ability to control the status of the request by returning the appropriate status as the return value from our handler. Thinking of handlers as functions, not procedures, helps: the return code of the handler defines the status of the request. The Apache::Constants module provides the complete set of response codes as symbolic, human-readable names. These codes are based on the standard HTTP response codes, with the addition of some Apache-specific codes.
It is easiest to begin with the HTTP-specific codes. Internally, Apache stores the HTTP status of the request in the status slot of the Apache request record. At the start of each request, the status is set to the constant HTTP_OK, which corresponds to a 200 OK HTTP response. As the different Apache modules step into the request, each has the ability to set the response status to something other than HTTP_OK by returning an HTTP return code. For instance, mod_dir returns HTTP_MOVED_PERMANENTLY whenever a URI is received for a directory but does not contain a trailing slash, such as http://www.example.com/sails. Apache then propagates that response back to the end user in the form of a 301 Moved Permanently response, the browser redirects to the target URI http://www.example.com/sails/, and things continue as normal.
The mechanism is the same for your Perl handlers. Sending a response back to the client with a status other than 200 OK only requires that you return the appropriate HTTP status code back from your handler. These status codes are made available to your code through the Apache::Constants class, which allows you to import each constant by name or use a set of import tags. For instance, we used the :common import tag in the preceding sample handler.
Table 3.8 shows a few of the more common HTTP return codes, along with their Apache::Constants names, suitable for importing into your code. A larger list can be found in Appendix B, whereas the authoritative source is section 10 of RFC 2616.
| Apache::Constants Constant | HTTP Response Code |
| --- | --- |
| AUTH_REQUIRED | 401 Unauthorized |
| FORBIDDEN | 403 Forbidden |
| NOT_FOUND | 404 Not Found |
| REDIRECT | 302 Found |
| SERVER_ERROR | 500 Internal Server Error |
Aside from these HTTP return codes, Apache maintains three constants that are meant to help facilitate the interaction between Apache and the various request handlers: OK, DECLINED, and DONE. All of these are also available through the Apache::Constants class. The reason for these Apache-specific codes will become clear in Part III, where we examine the Apache request cycle in detail. For the moment, we can just focus on some mechanics and save the details for later.
The most common return code is OK, which indicates success. Remember that Apache has already set the response to 200 OK, so it generally is not appropriate to return HTTP_OK from your handler; OK is the proper value in nearly all cases. DECLINED tells Apache that you have declined to process the request. This does not necessarily mean that you have not altered the request, only that you have chosen not to inform Apache about it. The final Apache-specific return code is DONE, which indicates that all content has been sent to the client and Apache should immediately skip to the logging phase of the request. DONE is rarely used, but is useful in certain circumstances. Recipe 11.6 shows an interesting application of the DONE return code.
For the most part, all of your handlers will return OK or one of the HTTP-specific error codes, such as REDIRECT or SERVER_ERROR. In reality, Apache treats any return code other than OK, DECLINED, or DONE as an error. While this may sound strange, all it really means is that Apache will start its error response cycle, which allows you to capture responses other than HTTP_OK with an ErrorDocument or the custom response mechanism discussed in Recipe 8.6. Reasons to choose DECLINED over OK as a return code are more fully discussed in later chapters, where each phase of the request cycle has its own peculiarities.
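The rule that anything other than OK, DECLINED, or DONE starts the error response cycle can be expressed in a few lines of plain Perl. This is only a sketch of the rule as stated above: the helper name starts_error_cycle() is invented, and the constants are local stand-ins carrying the numeric values from Apache 1.3's httpd.h rather than imports from Apache::Constants.

```perl
use strict;
use warnings;

# Numeric values from Apache 1.3's httpd.h, repeated here as stand-ins
# for Apache::Constants so the sketch runs outside Apache.
use constant OK       => 0;
use constant DECLINED => -1;
use constant DONE     => -2;

# Hypothetical helper: returns 1 if Apache would start its error
# response cycle for this handler return value, 0 otherwise.
sub starts_error_cycle {
    my $status = shift;
    my $matches = grep { $status == $_ } (OK, DECLINED, DONE);
    return $matches ? 0 : 1;
}

for my $status (0, -1, -2, 200, 302, 404, 500) {
    printf "%4d => %s\n", $status,
        starts_error_cycle($status) ? 'error cycle' : 'normal';
}
```

Note that 200 lands in the error cycle too, which is one way to see why a handler should return OK rather than HTTP_OK.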
As we have mentioned a few times, Apache::Registry is really a mod_perl handler that wraps your Perl CGI code within a handler() subroutine. One of the side effects of the Apache::Registry design, however, is that any return value from Registry scripts is ignored by Apache::Registry::handler(). This means that the model we have described here does not apply, even though Apache::Registry scripts have access to the entire mod_perl API.
The way around this is to set the status of the request directly using the status() method from the Apache class. For instance, a typical idiom for Registry scripts that use the mod_perl API is:
$r->headers_out->set(Location => "/pirate_map.html");
$r->status(REDIRECT);
return REDIRECT;
The interesting thing to note here is that, in the case of Registry scripts, the return value actually just serves as a way to exit the script gracefully without any further processing; $r->status(REDIRECT) is the actual mechanism that is telling Apache to return 302 Found back to the client. Actually, it is a bit trickier than that. Apache::Registry returns the status you set with $r->status() back to Apache, then sets $r->status() back to HTTP_OK, because handlers typically do not alter $r->status() directly.
For our example and discussion we have only used a few of the constants available through the Apache::Constants class: Apache::Constants contains over 90 different constants used for the many different aspects of mod_perl, from the server-response codes we have discussed so far to constants only useful when dealing with the internal Apache API. Importing all those constants into your script wastes precious memory and is excessive for all but the most demanding Web application. To make life easier, Apache::Constants defines several import tags that group constants of similar purpose together, such as the :common tag used in the example code. Other convenient tags are listed in the Apache::Constants manpage.
Because every constant you import into your code increases your process size (albeit slightly), you can slim down this list by importing only those constants you actually use:
use Apache::Constants qw(OK REDIRECT SERVER_ERROR);
Not only does this keep unneeded symbols out of your process, as discussed in Chapter 9, but it also increases readability of your code. For the most part, our examples will make use of this more explicit syntax.
You want to set server response headers that will persist on errors or internal redirects.
Set the headers with the err_headers_out() method instead of headers_out().
use Apache::Constants qw(FORBIDDEN);
use My::Utils; # some fictional utility package

sub handler {

  my $r = shift;

  # Invalidate the session on error.
  unless (My::Utils::validate_user($r->user)) {
    $r->err_headers_out->set('Set-Cookie' => 'session=expired');
    return FORBIDDEN;
  }

  # Continue along...
}
Apache actually keeps two separate server response header tables in the request record: one for normal response headers and one for error headers. The difference between them is that the error headers are sent to the client even (not only) on an error response. Recall from the previous recipe that, to Apache, an error response is anything other than OK, DECLINED, or DONE.
The error headers are manipulated using the err_headers_out() method from the Apache class, which has the same interface as the headers_in() and headers_out() methods described previously. They are particularly useful for influencing browser behavior, such as creating cookies that persist across errors and can force a user to re-authenticate, as in the preceding example.
There are a few things to understand when it comes to manipulating error headers. First, it is important to note that, unlike when sending content and returning OK, you should not call $r->send_http_header() before returning an error status. When Apache receives an error return status, such as SERVER_ERROR, REDIRECT, or AUTH_REQUIRED, it will automatically send the proper set of headers, so there is no need for you to worry about it.
The second point to remember is that the error headers are always sent, which makes the name somewhat misleading. Don't fall into the trap of doing something like:
# This is a bogus example!
# err_headers_out() takes precedence over headers_out()!

# Capture errors with a cookie...
$r->err_headers_out->set('Set-Cookie' => 'error=whoops');

# ... otherwise set the session.
$r->headers_out->set('Set-Cookie' => "session=$sessionid");
Then, later in another handler, trying to capture errors with
if ($cookiejar{$cookie}->name eq "error") {
# Do some error processing.
}
Here the end result will be that every request will contain the error cookie, making it rather meaningless for distinguishing errors.
If you take a moment to think about the error header mechanism, you might be prompted to question the standard practice of setting the Location header for a REDIRECT response via headers_out()
$r->headers_out->set(Location => 'http://www.example.com/entry.html');
return REDIRECT;
and use err_headers_out() instead. As it turns out, the Location header is handled as a special case when an error occurs. Apache will first look for the Location header in the headers_out table in the request record, then look in the err_headers_out table if the headers_out table yielded no results.
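The lookup order just described can be modelled in a few lines. This is a plain-Perl sketch rather than Apache's actual code: the two tables are represented as ordinary hash references, and find_location() is a name invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper modelling the lookup order described above:
# headers_out is consulted first, err_headers_out only as a fallback.
sub find_location {
    my ($headers_out, $err_headers_out) = @_;
    return $headers_out->{Location} if defined $headers_out->{Location};
    return $err_headers_out->{Location};
}

# Either table works for a redirect; headers_out wins if both are set.
my $target = find_location(
    { Location => 'http://www.example.com/entry.html' },
    { Location => 'http://www.example.com/fallback.html' },
);
print "redirecting to $target\n";
```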
You want to manipulate headers that have multiple like header fields, but assigning them to a hash removes all but the last value.
Use the Apache::Table class to access your headers.
# Take a peek at what we are going to set.
my @cookies = $r->headers_out->get('Set-Cookie');
Calling headers_in() or headers_out() and assigning the return value to a hash has its limitations, especially when dealing with headers that may have more than one entry in the table, such as Set-Cookie. As we have mentioned a few times, calling either of these methods in a scalar context returns an Apache::Table object, which has its own set of methods, in particular the get() method, which returns a single value in a scalar context or a list of values in a list context.
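The limitation is easy to reproduce in plain Perl, without Apache at all. The sketch below uses a hard-coded list of pairs standing in for what headers_out() might return in a list context; the cookie values are made up for the illustration.

```perl
use strict;
use warnings;

# Key/value pairs as headers_out() might return them in a list context.
my @pairs = (
    'Set-Cookie'   => 'session=abc123',
    'Content-Type' => 'text/html',
    'Set-Cookie'   => 'theme=dark',
);

# Assigning to a hash keeps only the last value stored under each key.
my %flat = @pairs;
print "hash sees:  $flat{'Set-Cookie'}\n";

# Walking the pairs keeps every value, which is the job get() does
# for you when called in a list context.
my @cookies;
for (my $i = 0; $i < @pairs; $i += 2) {
    push @cookies, $pairs[$i + 1] if $pairs[$i] eq 'Set-Cookie';
}
print "pairs hold: @cookies\n";
```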
Actually, the Apache::Table class is an important one to be familiar with, in part because it is the underlying class for many methods, including methods for manipulating some fields of the Apache request record:
err_headers_out()
headers_in()
headers_out()
notes()
subprocess_env()
as well as these additional methods:
dir_config() from the Apache class
info() from the Apache::Upload class
param() from the Apache::Request class
The Apache::Table class offers a consistent, powerful interface for manipulating the data beneath each of these methods. It ties into Apache's internal table structure, which allows for things such as headers to be stored in a case-insensitive manner with multiple values per key. This comes in handy when joining the case-insensitive HTTP protocol with case-sensitive Perl, making calls such as
my $encodings = $r->headers_in->get('accept-encoding');
successful, no matter how the user agent capitalized the header.
Apache::Table has only a handful of methods:
add()
clear()
do()
get()
merge()
new()
set()
unset()
of which you will probably only use a few in everyday programming. Although most should be self-explanatory, do() is a unique method that allows you to iterate over the entire table and uses a special idiom.
# Most user agents string multiple cookies together
# using ";" as the separator. Break these cases apart
# so each appears as a separate entry.
$r->headers_in->do(sub {
  # The key/value pair is passed as the argument list.
  my ($key, $value) = @_;

  if ($key =~ m/Cookie/) {
    print map { "Cookie => $_\n" } split /;\s?/, $value;
  }
  else {
    print " $key => $value\n";
  }

  # do() exits on false, so we add this as good programming practice.
  1;
});
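To get a feel for how these methods behave, it can help to model a table in plain Perl. The class below, Sketch::Table, is invented for this illustration and is not how Apache::Table is implemented (the real class ties into Apache's internal C table structure). It assumes that a scalar-context get() returns the first matching value, and it mimics only the case-insensitive, multi-valued behavior described above.

```perl
use strict;
use warnings;

package Sketch::Table;

sub new { return bless { rows => [] }, shift }

# add() appends a value, keeping any already stored under the key.
sub add {
    my ($self, $key, $value) = @_;
    push @{ $self->{rows} }, [ $key, $value ];
    return;
}

# unset() removes every entry for the key, ignoring case.
sub unset {
    my ($self, $key) = @_;
    @{ $self->{rows} } = grep { lc($_->[0]) ne lc($key) } @{ $self->{rows} };
    return;
}

# set() replaces whatever was stored under the key.
sub set {
    my ($self, $key, $value) = @_;
    $self->unset($key);
    $self->add($key, $value);
    return;
}

# get() returns all matches in a list context, the first in a scalar one.
sub get {
    my ($self, $key) = @_;
    my @values = map  { $_->[1] }
                 grep { lc($_->[0]) eq lc($key) } @{ $self->{rows} };
    return wantarray ? @values : $values[0];
}

# do() calls the callback for each pair and stops on a false return.
sub do {
    my ($self, $callback) = @_;
    for my $row (@{ $self->{rows} }) {
        last unless $callback->(@$row);
    }
    return;
}

package main;

my $table = Sketch::Table->new;
$table->set('Set-Cookie' => 'session=abc123');
$table->add('Set-Cookie' => 'theme=dark');

my @all   = $table->get('set-cookie');   # both values, any capitalization
my $first = $table->get('SET-COOKIE');   # first value only
print scalar(@all), " cookies, first is $first\n";
```

The difference between set() and add() is the one that matters most in practice: set() discards earlier values for the key, whereas add() accumulates them.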
Using the various Apache::Table methods to access mod_perl data structures has many advantages. As we demonstrated in Recipe 2.14, the PerlAddVar construct is made possible through the use of the Apache::Table framework and the $r->dir_config->get() interface, which allows programmers to access an entire array of values set within a configuration. Aside from typical uses like this, it pays to take a moment and contemplate the full range of power that you have available through the Apache::Table methods. For instance, although setting an outgoing header using the following might seem perfectly intuitive:
$r->headers_out->set('Set-Cookie' => 'punishment=plank');
you can also do things like
# Add values to our configuration.
$r->dir_config->set(Filter => 'On');
which is equivalent to setting
PerlSetVar Filter On
in your httpd.conf for use by later handlers in the request. Cool.
You want to find out the result of a request to an internal resource, such as the resulting filename or MIME type.
Use lookup_uri() or lookup_file() methods from the Apache class to create an Apache::SubRequest object, then check the result of the subrequest using an appropriate method.
my $sub = $r->lookup_uri("/fleet/trireme.html");
# Find the absolute path to the resource.
my $filename = $sub->filename;
Because both Apache and mod_perl allow you to be creative in how URIs actually map to files (or avoid physical files altogether), the ability to simulate server behavior for a given URI can be a powerful tool. For instance, it may seem obvious that trireme.html in the preceding example is an HTML file, but names can be deceiving, especially if the following RewriteRule is in place:
RewriteRule ^/fleet/(.*)\.html /images/$1.gif [PT]
Of course, most of the time the URI will not be a hard-coded filename within the code but instead be determined dynamically at request time, making any conjecture on the programmer's part pointless. For this reason, mod_perl provides the lookup_uri() method, along with the similar but less often used lookup_file(), which allows you to run parts of the request cycle on either a URI or an absolute filename and test the results of the request against various criteria.
lookup_uri() and lookup_file() actually initiate what is known as a subrequest. A subrequest is an Apache request that is not directly associated with a request from a client browser. Subrequests can be initiated by Apache internally, such as with error responses trapped with an ErrorDocument configuration setting, or generated programmatically using the lookup_uri() and lookup_file() methods. Both lookup_uri() and lookup_file() run a request through part of the Apache request cycle and return an Apache::SubRequest object.
The Apache::SubRequest class is a complete subclass of the Apache class, and therefore the Apache::SubRequest object has access to all of the methods normally associated with the Apache request object. The Apache::SubRequest class also extends the Apache class by adding a single new method, run().
The lookup_uri() and lookup_file() methods function differently only in the respect that lookup_uri() will attempt to map the URI to a physical file using the translation phase of the request cycle, whereas lookup_file() will not. Both methods will then proceed through the access, authentication, and authorization phases, as well as the MIME type checking and fixup phases, but will stop short of actual content generation.
Generating a subrequest within your code allows for some interesting functionality: you can simulate a request and see what results, or you can transfer control from one request to another, eliminating slow client-side redirects in your scripts. In the preceding sample code, we used a subrequest to determine the physical filename of a URI the end-user might request. We could just as easily have used the resulting Apache::SubRequest object to determine the MIME type of the file, or just about any other attribute of the request.
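As a sketch of the MIME type case, the helper below asks Apache what a URI would be served as, rather than guessing from its name. The function name serves_image() is invented for this illustration; it relies only on lookup_uri() and content_type(), both discussed in this chapter, and guards against a subrequest for which no type could be determined.

```perl
use strict;
use warnings;

# Hypothetical helper: ask Apache, via a subrequest, whether a URI
# would be served as an image, whatever its name suggests.
sub serves_image {
    my ($r, $uri) = @_;

    my $sub  = $r->lookup_uri($uri);
    my $type = $sub->content_type;

    # content_type() may be undefined if no type could be determined.
    return (defined $type && $type =~ m{^image/}) ? 1 : 0;
}
```

With a rewrite rule that maps .html URIs onto GIF files, one would expect serves_image($r, '/fleet/trireme.html') to return true despite the .html suffix, because the subrequest has already been through URI translation and MIME type checking.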
After you have performed any necessary tests on your subrequest, you can optionally run the content generation phase for that resource using Apache::SubRequest's run() method. Here is an example that sends content to the browser only if the subrequest turns out to be a plain file and is accessible.
my $sub = $r->lookup_uri($url);

# Send the file if it exists and the user has permission to see it.
# Unauthorized requests might return AUTH_REQUIRED or FORBIDDEN.
if (-f $sub->finfo && $sub->status == HTTP_OK) {
  $r->send_http_header($sub->content_type);
  return $sub->run;
}

# Otherwise, do something else...
The run() method added by the Apache::SubRequest class actually runs the content generation phase for the subrequest, sends the data to the client, and returns the exit status of the content handler, not the content generated by the subrequest. This status can be compared to any of the Apache::Constants HTTP return codes to determine whether the subrequest was successful, such as HTTP_OK or FORBIDDEN.
By default, headers set by subrequests are not allowed to pass through to the client. If you are using a subrequest to send content directly to the browser using run() you are required to send the server response headers yourself from the main request. This default behavior can, however, be altered by passing run() a single true argument, which will toggle whether the subrequest will be responsible for generating and sending its own headers.
# Forget about sending headers yourself,
# let the subrequest do it.
return $sub->run(1);
You need to alter request headers for a subrequest.
Call $r->headers_in() before initiating the subrequest.
$r->headers_in->set('Accept-Language' => 'es');
my $sub = $r->lookup_uri('/armada.html');
my $filename = $sub->filename;
Ordinarily, the request headers for a subrequest are an exact copy of the headers present in the main client request. Part of the reason you may be initiating a subrequest, however, is to determine server behavior based on some additional parameter you don't currently have, such as a client cookie or, as in the preceding example, a different language tag. In these cases, tricking Apache by setting the incoming headers to simulate a different set of client parameters is often useful.
One of the downsides of this approach is that, depending on where you are in the request cycle, setting your own request headers can change the way the current request is processed. One possible workaround is to set the headers of the subrequest itself, as in
my $sub = $r->lookup_uri('/armada.html');
$sub->headers_in->set('Accept-Language' => 'es');
return $sub->run(1);
but this is only really useful if you plan on calling run(), because setting headers for the subrequest occurs too late to affect anything other than the content generation phase.
You want to be able to determine whether a request is an actual client request or a subrequest.
Use the is_initial_req() method from the Apache class.
return OK unless $r->is_initial_req;
As you begin to write mod_perl handlers that fold, spindle, and mutilate the different parts of the request cycle, you may find that your custom processing does not really need to happen on every request, just the requests that the client will actually see. For instance, in the prior recipe we checked to see whether the user was allowed to view a particular resource based on the status of the subrequest. However, if you have set up your application such that the user is already authenticated by the time he can run your script, executing the authentication routines again wastes processor cycles.
The is_initial_req() method returns false if the request is the result of a subrequest or internal redirect and true for the main request. The preceding example is the typical idiom for PerlAuth*Handlers: it only continues authenticating for the main request and avoids needless overhead for any subrequests.
You need to get or set the method used for the request.
Use the method() and method_number() methods from the Apache class.
use Apache::Constants qw(:methods NOT_FOUND);
use strict;

sub handler {

  my $r = shift;

  if ($r->method_number == M_POST) {
    # Stash away the POST data.
    my $content = $r->content;

    # Now, change the request to a GET...
    $r->method('GET');
    $r->method_number(M_GET);
    $r->headers_in->unset('Content-Length');

    # ... and repopulate the query string with the POST data.
    $r->args($content);
  }

  # Now, the custom response can use the POST data.
  $r->custom_response(NOT_FOUND, '/perl-bin/docked.pl');

  # Continue along...
}
There are times when you need to get or set the method used for a request, such as GET, POST, or HEAD. In such cases, you should set the method number as well, as shown in the sample code. The method number is an internal constant used by the Apache API, and is available using the :methods import tag with Apache::Constants. The method numbers are then referred to as M_GET, M_POST, M_PUT, and M_DELETE.
Requests that originate with the HEAD method are handled specially by Apache. When a HEAD request is received, the method number is set to M_GET and the header_only() flag within the request record is set to true. You will often see the following in a handler:
sub handler {

  my $r = shift;

  $r->send_http_header('text/html');

  # Don't generate content on a HEAD request.
  return OK if $r->header_only;

  # Continue along...
}
which honors HEAD requests by returning just the headers.
One common programmatic problem whose solution involves setting the request method is redirection of POST requests. The subtlety that arises here is that data sent via POST can be read in from the socket only once, and so must be stored somehow for later use. The solution handler snippet addresses this issue. Here, we change the method and method number to those appropriate for a GET request. We then unset the Content-Length header and populate the contents of the URI query string through the args() method. Now, our custom response can have access to any form fields submitted via a POST request.
You need access to the request object from an XS subroutine.
Use h2xs to build the stub of the module, then follow these detailed instructions.
Although Perl is a wonderful language, the extra effort needed to write an XS-based subroutine is sometimes worth the trouble, for instance, when you have intense calculations that are better geared toward C, or when you can take advantage of a particular third-party function to perform the task at hand. We describe here some special considerations that you need to take into account if you want to have access to the Apache request object within XS routines.
The example we consider is an overly simple one, but it does have its utility in illustrating a few techniques as well as some interesting history. Although mod_perl provides access to nearly all the fields of the Apache request record, there are a few that mod_perl does not offer any method for, and thus are not accessible in your Perl handlers. The assbackwards flag in the request record is used to note whether the client is making a Simple-Request, which was allowed by the 0.9 version of the HTTP protocol. You can simulate a Simple-Request by making a GET request that does not have a protocol version in the request line.
$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /perl-status
<html>
<head><title>Apache::Status</title></head>
<body>
...
If Apache sees that the request is "simple," it will set the assbackwards flag in the request record, which reminds Apache to send an appropriately formatted Simple-Response when it sends the content.
Because all modern browsers use at least HTTP version 1.0, the likelihood of having to use mod_perl to conform to an HTTP/0.9 request is negligible, and in fact Apache deals with this for us when it parses the request. However, one of the interesting things to notice about the preceding dialogue is the lack of server response headers. In fact, this is the definition of a Simple-Response.
In effect, Apache uses the assbackwards flag to determine whether the response is allowed to include headers. This is an interesting feature, and one that mod_perl effectively takes advantage of in implementing $sub->run(1). Internally, Apache sets assbackwards to 1 when running a subrequest in order to suppress header generation. When calling run(1), mod_perl actually sets assbackwards back to 0, which signals Apache to send the response headers where it otherwise wouldn't.
We can implement our own function to give us access to the assbackwards flag in the request record, which mod_perl doesn't directly provide. As with building any XS module, it is best to start off with h2xs:
$ h2xs -APn Cookbook::SimpleRequest
Writing Cookbook/SimpleRequest/SimpleRequest.pm
Writing Cookbook/SimpleRequest/SimpleRequest.xs
Writing Cookbook/SimpleRequest/Makefile.PL
Writing Cookbook/SimpleRequest/test.pl
Writing Cookbook/SimpleRequest/Changes
Writing Cookbook/SimpleRequest/MANIFEST
This will create stubs for most of the files needed to build the module Cookbook::SimpleRequest. The first step is to edit the module file SimpleRequest.pm to add the name of our XS routine to @EXPORT_OK, following the good programming practice of not exporting any symbols by default. For our SimpleRequest.pm we take some liberties with the look of DynaLoader's bootstrap() method in our edits, but the end result is the same as provided by the default .pm file.
package Cookbook::SimpleRequest;

use 5.006;
use strict;
use warnings;

require Exporter;
require DynaLoader;

our @ISA = qw(Exporter DynaLoader);
our @EXPORT_OK = qw(assbackwards);
our $VERSION = '0.01';

__PACKAGE__->bootstrap($VERSION);

1;
The next file, SimpleRequest.xs, requires substantial modification.
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#include "mod_perl.h"
#include "mod_perl_xs.h"

MODULE = Cookbook::SimpleRequest PACKAGE = Cookbook::SimpleRequest

PROTOTYPES: ENABLE

int
assbackwards(r, ...)
  Apache r

  CODE:
    get_set_IV(r->assbackwards);

  OUTPUT:
    RETVAL
This defines the function assbackwards(), which allows us to either retrieve the current value of assbackwards from the request record or set it to an integer value. Note that, in addition to the standard XS header files EXTERN.h, perl.h, and XSUB.h, we have included mod_perl.h which, in turn, will pull in any needed Apache header files. We also included mod_perl_xs.h, which defines some useful macros like get_set_IV, which does the dirty work for us.
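The combined get/set behavior can be pictured with a small pure-Perl sketch. It assumes the accessor hands back the value held before any update, which is what the handler's use of $old later in this section implies; the %record hash and the get_set_field() helper are hypothetical stand-ins for the request record and the macro, not part of mod_perl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the request record.
my %record = (assbackwards => 0);

# Return the current value of a field. If a new value is supplied,
# store it afterwards, so the caller receives the pre-update value.
sub get_set_field {
    my ($rec, $name, @new) = @_;
    my $old = $rec->{$name};
    $rec->{$name} = int $new[0] if @new;
    return $old;
}

my $old = get_set_field(\%record, 'assbackwards', 1);   # set, returns 0
my $new = get_set_field(\%record, 'assbackwards');      # plain get
print "old: $old, new: $new\n";                          # old: 0, new: 1
```

Folding the getter and setter into one routine keeps the XS interface small: a single assbackwards() function covers both uses, with the argument count deciding which one applies.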
The request record r in SimpleRequest.xs is of type Apache, which is not a data type that Perl understands on its own. The Apache type needs to be defined through a separate typemap file, which gives the rules for converting data types between C and Perl. So, we also need to create a file named typemap and drop in the following code:
TYPEMAP
Apache    T_APACHEOBJ

OUTPUT
T_APACHEOBJ
    sv_setref_pv($arg, \"${ntype}\", (void*)$var);

INPUT
T_APACHEOBJ
    r = sv2request_rec($arg, \"$ntype\", cv);
Finally, we come to Makefile.PL, which is used to build and install the module and which also requires significant modification.
#!perl

use ExtUtils::MakeMaker;
use Apache::src ();
use Config;

use strict;

my %config;

# Include directories for the Apache and mod_perl header files.
$config{INC} = Apache::src->new->inc;

# Win32 needs extra compiler flags and explicit libraries.
if ($^O =~ /Win32/) {
  require Apache::MyConfig;

  $config{DEFINE}  = ' -D_WINSOCK2API_ -D_MSWSOCK_ ';
  $config{DEFINE} .= ' -D_INC_SIGNAL -D_INC_MALLOC '
    if $Config{usemultiplicity};

  $config{LIBS} =
    qq{ -L"$Apache::MyConfig::Setup{APACHE_LIB}" -lApacheCore } .
    qq{ -L"$Apache::MyConfig::Setup{MODPERL_LIB}" -lmod_perl};
}

WriteMakefile(
  NAME         => 'Cookbook::SimpleRequest',
  VERSION_FROM => 'SimpleRequest.pm',
  PREREQ_PM    => { mod_perl => 1.26 },
  ABSTRACT     => 'An XS-based Apache module',
  AUTHOR       => 'authors@modperlcookbook.org',
  %config,
);
This Makefile.PL, although complex, accomplishes a number of tasks that are necessary to tie everything together. It
Sets the include directories for finding header files through Apache::src->new->inc()
Sets the needed library directories and libraries for Win32, through the special hash %Apache::MyConfig::Setup
Sets some needed compiler flags for Win32
Sets PREREQ_PM to mod_perl (version 1.26 or greater), so that a warning will be given if this version of mod_perl is not present
Defines the ABSTRACT and AUTHOR used in making ppd files for ActiveState-like binary distributions
At this point, we are ready to go through the standard build procedure:
$ perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Cookbook::SimpleRequest
$ make
cp SimpleRequest.pm blib/lib/Cookbook/SimpleRequest.pm
/usr/local/bin/perl -I/usr/local/lib/perl5/5.6.1/i686-linux-thread-multi
  -I/usr/local/lib/perl5/5.6.1 /usr/local/lib/perl5/5.6.1/ExtUtils/xsubpp
  -typemap /usr/local/lib/perl5/5.6.1/ExtUtils/typemap
  -typemap typemap SimpleRequest.xs > SimpleRequest.xsc && mv SimpleRequest.xsc SimpleRequest.c
...
chmod 755 blib/arch/auto/Cookbook/SimpleRequest/SimpleRequest.so
cp SimpleRequest.bs blib/arch/auto/Cookbook/SimpleRequest/SimpleRequest.bs
chmod 644 blib/arch/auto/Cookbook/SimpleRequest/SimpleRequest.bs
$ su
Password:
# make install
Installing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux-thread-multi/auto/Cookbook/SimpleRequest/SimpleRequest.so
Installing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux-thread-multi/auto/Cookbook/SimpleRequest/SimpleRequest.bs
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux-thread-multi/Cookbook/SimpleRequest.pm
Writing /usr/local/lib/perl5/site_perl/5.6.1/i686-linux-thread-multi/auto/Cookbook/SimpleRequest/.packlist
Appending installation info to /usr/local/lib/perl5/5.6.1/i686-linux-thread-multi/perllocal.pod
After all this elaborate preparation, the use of this module is a little anticlimactic; we simply write a handler that uses Cookbook::SimpleRequest in the standard way:
package Cookbook::SimpleTest;

use Apache::Constants qw(OK);
use Cookbook::SimpleRequest qw(assbackwards);

use strict;

sub handler {

  my $r = shift;

  # Get the old value and set the current value
  # to suppress the headers.
  my $old = assbackwards($r, 1);

  # Verify the new value.
  my $new = assbackwards($r);

  $r->send_http_header('text/plain');
  $r->print("look ma, no headers!\n");
  $r->print("old: $old, new: $new\n");

  return OK;
}
1;
Although this example doesn't do anything terribly useful, it does illustrate a general framework for constructing practical XS-based modules that use the Apache request object.
As we mentioned at the start, there are times when it is preferable or necessary to write a Perl interface to C routines. However, before you go off and implement a new method for some particular function that mod_perl seems to be missing, take a look through the Apache C API and try to find the functionality there. In addition to the request and related records, the Apache C API provides a number of public ap_* routines that you can hook into. Some of these are for convenience, but others should be used in preference to the corresponding data in the appropriate record.