In Semi-Dynamic Data, Sheeri writes about semi-dynamic data and content pregeneration. In her article, she suggests that for rarely changing data it is often advisable to precompute the result pages and store them as static content. Sheeri is right:
Nothing beats static content, neither for speed nor for reliability. But pregenerated pages can be a waste of system resources when the number of possible pages is very large, or when most of the pregenerated pages are never requested.
A middle ground is a statification system combined with some clever caching logic.
Statification is the process of putting your content generation code into a 404 handler and having that handler generate the requested content, writing it to the location that was requested. On the next request the content is already there, so it is served as a static file with a 200 OK, using the fast path of the web server.
A typical example for this kind of task is a script that renders its own file name into a PNG: by requesting http://vvv.k6p.de/statify/example.png, we get back a PNG image that contains the text "example". The number of potential PNGs is infinite, and we cannot possibly precompute all of them. But a few PNGs are requested over and over, most likely because some IMG tags in some HTML pages reference them. It would be pointless to generate them again and again for each requester, because these images never change.
Here is how to do it, using PHP:
On a sample web server, create a directory called statify and put a .htaccess file into it that defines a 404 handler. The 404 handler is a PHP page, and to keep things simple it should not itself be located inside /statify. The 404 handler must be able to create files inside /statify, so that directory has to be writable by the web server, for example by setting it to mode 01777 (world-writable with the sticky bit set, as in the listing below).
CODE:
kris@h3118:/home/www/servers/vvv.koehntopp.de/pages> pwd
/home/www/servers/vvv.koehntopp.de/pages
kris@h3118:/home/www/servers/vvv.koehntopp.de/pages> ls -ld statify
drwxrwxrwt 2 kris kiel 4096 2006-08-17 19:10 statify
kris@h3118:/home/www/servers/vvv.koehntopp.de/pages> cat statify/.htaccess
ErrorDocument 404 /statify.php
To better understand what is going on, create a statify.php like this:
CODE:
<?php phpinfo(INFO_VARIABLES); ?>
Requesting any missing file below /statify now prints the request variables. The interesting ones are $_SERVER["DOCUMENT_ROOT"] and $_SERVER["REQUEST_URI"]: DOCUMENT_ROOT is needed to find the directory to write the generated content to, and the basename of REQUEST_URI is the basename of the file to generate. It may be tempting to just write to $_SERVER['DOCUMENT_ROOT']/$_SERVER['REQUEST_URI'], but REQUEST_URI is supplied by the client, and I just know that you are not that naive.
A clean implementation rejects every request that is not asking for a PNG, and cleans up the REQUEST_URI before using it to construct the file name:
CODE:
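<?php
// Look only at the path; a query string must not defeat the .png check.
$path = (string) parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
if (! preg_match('/\.png$/', $path))
  exit();   // not a PNG request: leave it a plain 404
// basename() drops all directory parts, so "../" cannot escape /statify.
$text = basename($path, ".png");
$cachename = $_SERVER['DOCUMENT_ROOT'] . "/statify/" . $text . ".png";
?>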
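If you prefer an explicit whitelist over relying on basename() alone, the check can be pulled into a small function that is easy to test. This is only a sketch: the function name statify_cache_name() and the permitted character set [A-Za-z0-9_-] are my assumptions, not part of the handler above. It also URL-decodes the path, because Apache decodes %-escapes when it maps a URL to a file, so a cache file written under the still-encoded name would never be found on the fast path.

```php
<?php
// Hypothetical helper: map a request URI to a cache file path inside
// $docroot/statify, or return null if the request should not be handled.
function statify_cache_name(string $requestUri, string $docroot): ?string
{
    // Drop any query string, then undo %-encoding like Apache does
    // when it maps the URL to a file.
    $path = urldecode((string) parse_url($requestUri, PHP_URL_PATH));
    if (!preg_match('/\.png$/D', $path)) {
        return null;                      // not a PNG request
    }
    $text = basename($path, '.png');      // strips all directory parts
    // Assumed whitelist: letters, digits, underscore and dash only.
    // The D modifier stops "$" from matching before a trailing newline.
    if (!preg_match('/^[A-Za-z0-9_-]+$/D', $text)) {
        return null;
    }
    return $docroot . '/statify/' . $text . '.png';
}
```

In the handler, `$cachename = statify_cache_name($_SERVER['REQUEST_URI'], $_SERVER['DOCUMENT_ROOT']);` followed by `if ($cachename === null) exit();` would replace the inline checks, and `$text = basename($cachename, '.png');` recovers the text to render.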
Painting the image is straightforward:
CODE:
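<?php
// A 640x200 palette image; the first allocated colour becomes the background.
$im = ImageCreate(640, 200);
$bg = ImageColorAllocate($im, 255, 255, 255);
ImageColorTransparent($im, $bg);
$fg = ImageColorAllocate($im, 0, 0, 0);
ImageFilledRectangle($im, 0, 0, 639, 199, $bg);
// Render $text at 64pt, angle 0, with the baseline starting at (10, 150).
// Note the quoted array key: {$_SERVER[DOCUMENT_ROOT]} is an error in PHP 8.
$font = $_SERVER['DOCUMENT_ROOT'] . "/arial.ttf";
@ImageTTFText($im, 64, 0, 10, 150, $fg, $font, $text)
  or die("Unable to open font $font");
?>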
This now has to be written into the proper location as a file, and then served to the end user.
CODE:
<?php
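// Write the PNG into /statify so the next request is served statically.
@ImagePNG($im, $cachename) or die("Unable to write image $cachename");
ImageDestroy($im);
// Serve this first copy ourselves (the response status is still 404).
header("Content-Type: image/png");
readfile($cachename);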
?>
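One thing the handler does not guard against: two clients requesting the same new image at the same moment, in which case the web server could serve a half-written file from the fast path. A common remedy is to write to a temporary file in the same directory and rename() it into place, because rename() within one file system replaces the target atomically on POSIX systems. A minimal sketch; the function name write_atomically() is hypothetical:

```php
<?php
// Hypothetical helper: write $bytes to $target so that readers only ever
// see either no file or the complete file.
function write_atomically(string $target, string $bytes): bool
{
    // The temporary file must live in the same directory (same file
    // system); otherwise rename() cannot be atomic.
    $tmp = tempnam(dirname($target), 'statify');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $bytes) !== strlen($bytes)) {
        unlink($tmp);
        return false;
    }
    chmod($tmp, 0644);              // tempnam() creates files with mode 0600
    if (!rename($tmp, $target)) {   // atomically replaces $target
        unlink($tmp);
        return false;
    }
    return true;
}
```

With GD, the PNG bytes can be captured with output buffering (`ob_start(); ImagePNG($im); $png = ob_get_clean();`), passed to write_atomically($cachename, $png) and then echoed. If you also want the very first response to carry a 200 instead of a 404, calling http_response_code(200) before any output should do that.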
After requesting a few test images, the /statify directory might look like this:
CODE:
kris@h3118:/home/www/servers/vvv.koehntopp.de/pages> ls -l statify
total 12
-rw-r--r-- 1 wwwrun nogroup 1784 2006-08-17 19:23 blabla.png
-rw-r--r-- 1 wwwrun nogroup 1273 2006-08-17 18:16 blah.png
-rw-r--r-- 1 wwwrun nogroup 2144 2006-08-17 18:16 example.png
Looking at the request log, one can see what is going on (yes, I am in Norway right now :-):
CODE:
kris@h3118:/home/www/servers/vvv.koehntopp.de/pages> grep statify/blabla.png $V/logs/access.log
217.148.149.36 - - [17/Aug/2006:19:23:30 +0200]
"GET /statify/blabla.png HTTP/1.1" 404 1784 "-"
"Mozilla/5.0 (compatible; Konqueror/3.5; Linux)"
217.148.149.36 - - [17/Aug/2006:19:30:49 +0200]
"GET /statify/blabla.png HTTP/1.1" 200 1784 "-"
"Mozilla/5.0 (compatible; Konqueror/3.5; Linux)"
The first request, at 19:23, was handled by the 404 handler: it got a 404 status but still received the 1784 bytes of blabla.png, and the handler wrote the file. The second request, at 19:30, got a 200 and the same 1784 bytes, but this time through the web server's fast path for static content.
That is precisely what we wanted.
A similar but different approach was taken in PEAR's XML_Transformer. In XML_Transformer, XML input is transformed into XML (or XHTML) output. So you get to make up your own fantasy XML tags and tie PHP code to them that transforms them into usable XHTML. One particularly nice tag that comes with XML_Transformer is the <img:gtext> tag, which lets you write text that is then transformed into a regular <img> tag and a matching image.
CODE:
Source code:
<img:gtextdefault bgcolor="888888" fgcolor="#000000"
font="/home/www/servers/vvv.koehntopp.de/pages/arial.ttf"
fontsize="32"
border="1" spacing="2"
split="" cacheable="yes"/>
<img:gtext>Antialias</img:gtext><br/>
Transformation result:
<span><img alt="Antialias"
height="42"
src="/cache/gtext/8f6aa7fe8b14c7a1408a898ab0e4522f.png"
width="169"></img></span><br></br>
Here, no 404 handler is used. Instead, the image file is generated under a name derived from the tag's parameters, and the result of the gtext transformation is an IMG tag referencing that file. A second access to the same gtext tag will generate the same IMG tag with the same filename, so the content of a gtext tag is not rendered into an image twice. Note that here the page is not completely statified: XML_Transformer is still called to generate the XHTML, but the images it generates are reusable static files on the second and all subsequent calls.
The image filename is generated from all attributes of the gtext tag and the content of the tag. To make the filename safe to use in a file system and to keep it short, it is in fact the MD5 hash of that string. And because the order of XML attributes is not significant, the attributes of the tag have to be sorted before they are hashed.
CODE:
function attributesToString($attributes) {
$string = '';
if (is_array($attributes)) {
ksort($attributes);
foreach ($attributes as $key => $value) {
$string .= ' ' . $key . '="' . $value . '"';
}
}
return $string;
}
// $this->_gtextAttributes holds the tag's attributes, $word its text content.
$cachefile = md5(
attributesToString(
$this->_gtextAttributes
) . ':' . $word
) . '.png';
This ensures that a different filename is generated whenever any attribute of the gtext tag or the tag's content changes. Outdated images are therefore never overwritten, they simply stop being referenced, so it is sufficient to monitor the size of the cache directory: if the size crosses an upper threshold, delete the least recently used file from the directory in a loop until the size falls below a lower threshold. This implements an LRU scheme and minimizes the image generation work actually done in PHP.
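The high/low-watermark cleanup can be sketched roughly as follows. The function name prune_cache() and the thresholds are assumptions. "Least recently used" is approximated by file access time; on file systems mounted with noatime the access time is not updated on reads, and you would effectively get oldest-first (FIFO) eviction instead.

```php
<?php
// Hypothetical cache pruner: if the *.png files in $dir exceed $high bytes,
// delete the least recently accessed files until the total drops below $low.
// Returns the number of files deleted.
function prune_cache(string $dir, int $high, int $low): int
{
    clearstatcache();
    $files = [];
    $total = 0;
    foreach (glob($dir . '/*.png') ?: [] as $file) {
        $size = filesize($file);
        $files[$file] = ['size' => $size, 'atime' => fileatime($file)];
        $total += $size;
    }
    if ($total <= $high) {
        return 0;                   // below the upper threshold: nothing to do
    }
    // Oldest access time first, i.e. least recently used first.
    uasort($files, fn($a, $b) => $a['atime'] <=> $b['atime']);
    $deleted = 0;
    foreach ($files as $file => $info) {
        if ($total < $low) {
            break;                  // reached the lower threshold
        }
        if (@unlink($file)) {
            $total -= $info['size'];
            $deleted++;
        }
    }
    return $deleted;
}
```

Running this from cron rather than on every request keeps the cost of the directory scan out of the request path.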
Statification is not limited to images, though; it can be applied to anything: entire text pages or boxed page fragments generated from MySQL, as well as anything else that takes a lot of effort to generate. PEAR provides a really nice little wrapper class, Cache_Lite, to handle these cases in PHP.
To use Cache_Lite, you create an object instance and then try to fetch an id with $data = $cache->get($id). If this succeeds, you can work with $data; otherwise you have to calculate $data, and at the end of the calculation you call $cache->save($data). On the second call with the same id, $data is returned from the cache and no calculation needs to be done.
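To make the get, compute and save flow concrete without requiring PEAR, here is a minimal file-based sketch of the same pattern. It is not Cache_Lite's implementation; the function cached_call() and its file layout are hypothetical.

```php
<?php
// Dependency-free illustration of the get/compute/save pattern.
// NOT Cache_Lite's code: cached_call() and its file naming are hypothetical.
function cached_call(string $dir, string $id, int $lifetime, callable $compute): string
{
    $file = $dir . '/cache_' . md5($id);    // hash the id into a safe file name
    clearstatcache(true, $file);
    if (is_file($file) && filemtime($file) + $lifetime > time()) {
        return file_get_contents($file);    // cache hit: no calculation needed
    }
    $data = $compute();                     // cache miss: do the expensive work
    file_put_contents($file, $data, LOCK_EX);
    return $data;
}
```

With Cache_Lite itself, the equivalent is roughly `if (($data = $cache->get($id)) === false) { $data = ...; $cache->save($data); }`, where save() stores the data under the id last passed to get().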
Cache_Lite has even more useful subclasses. Cache_Lite_Output, for example, wraps any segment of PHP code using output buffering. Thus, you can use it to statify any PHP page or any subsection of a PHP page.
CODE:
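require_once "Cache/Lite/Output.php";

// CACHEDIR must name a writable directory and end with a slash.
$cache = new Cache_Lite_Output(array(
"caching" => true,
"cacheDir" => CACHEDIR,
"lifeTime" => 86400 * 7,   // one week, in seconds
"fileLocking" => true,
));
// One cache entry per script and set of request parameters.
$key = $_SERVER["PHP_SELF"].":".serialize($_REQUEST);
if ($cache->start($key)) exit(); // cache hit - output already sent, finished
include_once("blah.php"); // only include stuff on cache miss
generate_page();
$cache->end(); // write generated content to cache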
Cache_Lite can also wrap and cache function calls: using the Cache_Lite_Function class, it should be possible to write an implementation of the Ackermann function that actually terminates with results for many cases that would otherwise overflow a system's stack.
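The effect of caching on a recursive function like Ackermann can be illustrated in-process with a static memo table. This sketch does not use Cache_Lite_Function, which would persist results on disk between requests; ackermann() here is a hypothetical stand-alone function. Memoization mainly saves repeated work; for larger arguments the recursion can still become very deep.

```php
<?php
// Ackermann function with an in-process memo table, standing in for
// what Cache_Lite_Function would cache persistently.
function ackermann(int $m, int $n): int
{
    static $memo = [];
    $key = "$m,$n";
    if (isset($memo[$key])) {
        return $memo[$key];             // already computed: reuse the result
    }
    if ($m === 0) {
        $result = $n + 1;
    } elseif ($n === 0) {
        $result = ackermann($m - 1, 1);
    } else {
        $result = ackermann($m - 1, ackermann($m, $n - 1));
    }
    return $memo[$key] = $result;
}
```

For example, ackermann(2, 3) returns 9 and ackermann(3, 3) returns 61.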
The biggest drawback of Cache_Lite currently is that, although it takes a lot of work off the system, it does not take the scripting out of the request path completely the way 404 statification does, so the web server's fast path is not used.