
Serving Images From A Database

Sheeri wrote:
So, most of the “I want images in MySQL” conversations are terminated with “Don’t.”
People do say this for a reason. Here is the story of your request going through the system:
Your system's network buffer fills with a new packet containing a new request from the network card. The system delivers the packet to your Apache, switching from kernel mode into user mode.

Apache processes the packet, handing it internally to its mod_php for processing. mod_php parses the request, constructs a SQL query for the image blob, and pushes the query into a kernel buffer, switching from user mode into kernel mode.

Your system's network buffer fills with a new packet containing a new SQL query from the network layer. The system delivers the packet to your mysqld, switching from kernel mode into user mode.

mysqld processes the packet, handing it internally to the query parser, generating a query cache miss, because the result set is too large to enter the query cache, ever. The parser, the optimizer and the storage engine work to access a very wide table, with a badly packed index tree (especially if it happens to be an InnoDB table). They generate a number of file system accesses on the way. mysqld goes to sleep on the read requests, and the context switches from user mode to kernel mode.

Your system receives a number of file read requests, requesting it to load a number of blocks from the disk into the mysqld process. Eventually, the blocks show up in the buffer cache, and mysqld is woken up, receiving file data for the read requests. The context switches from kernel to user mode.

mysqld takes the data read, and constructs a result set from it. It pushes the result set into the network buffers, and goes to sleep on the writes, switching from user mode to kernel mode.

The kernel pushes the network buffer with the result set into the Apache process, and switches from kernel mode to user mode.

mod_php inside Apache receives a result set from the network, and copies it into a network buffer, pushing it to the requesting user. It goes to sleep on the write, switching from user mode to kernel mode.

The kernel receives a network buffer, eventually pushing the result into the network card, writing it to the user.

The entire process is repeated multiple times, because the image is too large to be read and written in a single operation. A thread in mysqld and an instance of httpd are occupied doing this until the end user's client connection has finally sucked the last few bytes of the image through its slow link.
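For illustration, the user-space half of that dance can be sketched in a few lines (Python, with SQLite standing in for mod_php and MySQL; the table and column names are made up for the sketch):

```python
import sqlite3

# Stand-in image store; in the article's setup this is a MySQL server on
# the other side of a network socket, adding two mode switches per hop.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB)")
db.execute("INSERT INTO images VALUES (1, ?)", (b"\x89PNG" + b"\x00" * 100,))

def handle_request(image_id):
    # Parse the request and build a SQL query (user mode, web server side).
    row = db.execute("SELECT data FROM images WHERE id = ?",
                     (image_id,)).fetchone()
    # The result set has crossed the DB connection and is now copied once
    # more into the response buffer that gets pushed back to the client.
    return row[0]

print(len(handle_request(1)))   # -> 104
```

Every line of that handler runs in user space, and each hop the blob takes is another copy plus mode switches the static path never pays.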


This is your lighttpd, serving a static file from a filesystem:
Your system's network buffer fills with a new packet containing a new request from the network card. The system delivers the packet to your lighty, switching from kernel mode into user mode.

Lighty parses the request, mapping it into a static file on disk. Lighty calls sendfile, switching from user mode to kernel mode.

The kernel sendfile() function processes filesystem buffer cache blocks, turning them inside the kernel into network buffers with no return to user mode. The kernel pushes these blocks out using the network card. When the file has been served, lighty is notified about that.
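A minimal sketch of that call, using Python's os.sendfile wrapper on Linux (the file names are made up, and a plain output file stands in for the client socket lighty would write to):

```python
import os

# A small "static file" standing in for an image on disk.
with open("payload.bin", "wb") as f:
    f.write(b"x" * 1024)

in_fd = os.open("payload.bin", os.O_RDONLY)
# In lighty this fd would be the client socket.
out_fd = os.open("served.bin", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)

# One call: the kernel moves the pages itself -- no copy through a
# user-space buffer, no extra mode switch per block.
sent = os.sendfile(out_fd, in_fd, 0, 1024)
print(sent)   # -> 1024

os.close(in_fd)
os.close(out_fd)
```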


Moral: You never want to feed from a database what can be fed as static content.

Trivia: Lighttpd is a fast and light httpd server with extremely good support for static images and FCGI. It was written and is currently maintained by MySQL employee Jan Kneschke.

Trackbacks

MySQL-dump on : Serving Images from a File System

In Serving Images from a Database, Sheeri responded to me: I appreciate your ideas, but you haven't read what I've written: "the sheer number of files is enough to kill any os." I'm serious about that. we're talking 36 million images -- that's 36 mill

Die wunderbare Welt von Isotopp on : Your Image Server Is a Data Leak

A monstrously dangerous security hole at StudiVZ is currently being driven through the German blogosphere as the sow of the week. What is happening? StudiVZ runs one web server for dynamic applications and a second web server for static data, for exampl

Comments


Jan Steinman on :

"You never want to feed from a database what can be fed as static content."

Never say never!

I guess TEXT fields should be the next to go. But why stop there? Let's have every read access to the database supply its information from a file instead!

I've benchmarked my application with images in BLOBs and as separate files. The difference was single-digit percentages. I'm willing to pay that for the ease of backup and other niceties of using a single storage mechanism.

That's not to say that single-digit performance increases are never worth pursuing, or that there aren't other reasons for storing images in files instead of in BLOBs. I'm not even arguing against most images being in files.

But a dogmatic banning of images in the database does not impress me at all!

Isotopp on :

QUOTE:
I've benchmarked my application with images in BLOBs and as separate files. The difference was single-digit percentages. I'm willing to pay that for the ease of backup and other niceties of using a single storage mechanism.


One interesting question in benchmarking is performance difference, and you say you have seen only a little performance difference between both variants.

The other interesting question in benchmarking is "What is the benchmark maxing out on?". A benchmark is usually "ramp up the load on a system and measure the number of results as well as the latency seen in generating these". Usually you get a typical saturation curve such as those shown in my Linuxtag talk.

When the system is saturated you have not only established a performance plateau, you can also tell why the plateau is there - which resource is limiting performance in this particular case.

Knowing the reason is important, because in some cases the resource that is maxed out is not the one you are interested in metering. For example, if in your case you see only a few percent performance difference, it may be because both times the system maxes out on something unrelated - the performance limit of the load generator, the network interface, the external internet connection, or something else that is not really part of the "network buffer - webserver - database server - disk" pipeline, which is what you really want to meter here.

Dave on :

"The difference was single-digit percentages."

I'm willing to bet you didn't do this while load-testing. For a single request you are quite correct: the performance difference is negligible. But try serving 100 users each grabbing images from a database, and see if you're still seeing single-digit performance differences. I'm willing to bet performance degrades almost exponentially with the number of requests. In the end, the filesystem is always the best container for information you don't plan on changing a lot. Leave the database to hold DATA.

Peter Hodge on :

"Moral: You never want to feed from a database what can be fed as static content."

Yep, so you have a background script which runs every 2 minutes and pulls new images out of the database and writes them to disk.

sheeri on :

So how do you recommend serving up 36,000,000 images, taking up 30 GB of space, with over ten web servers, when NFS doesn't cut it? The sheer number of files is enough to kill any OS.

darix on :

Hard disks are cheap nowadays.

Even taking into account some overhead for syncing the new files to all servers, it is still lower than the DB overhead for each request.

You could even use some kind of smart caching:

1. Write to the DB.
2. On request, check whether there is a local file. (If you are a lighttpd user, check out mod_cml and power-magnet.)
3. On a local cache miss, request the file from the DB, write it to disk, and serve the file.

On the next round trip you serve the static file.

In case you are an Apache user, mod_rewrite's RewriteCond %{REQUEST_FILENAME} !-f should help you.
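Under the assumptions of this sketch (the fetch_from_db callable and the .jpg naming are invented for illustration), steps 1-3 look roughly like this:

```python
import os
import tempfile

def serve_image(image_id, cache_dir, fetch_from_db):
    # Check for a local file; on a miss, pull from the DB and persist.
    path = os.path.join(cache_dir, f"{image_id}.jpg")
    if not os.path.exists(path):            # step 2: is there a local file?
        blob = fetch_from_db(image_id)      # step 3: cache miss -> hit the DB
        with open(path, "wb") as f:
            f.write(blob)                   # ...write to disk...
    with open(path, "rb") as f:
        return f.read()                     # ...and serve the file

# Demo with a stub "database" that counts how often it is queried.
calls = []
def fake_db(image_id):
    calls.append(image_id)
    return b"\xff\xd8 jpeg bytes"

cache = tempfile.mkdtemp()
serve_image(42, cache, fake_db)
serve_image(42, cache, fake_db)   # second request never touches the DB
print(len(calls))                 # -> 1
```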

darix

Isotopp on :

darix has hit it right - using the database not as a delivery mechanism but as a distribution mechanism is quite simple and elegant.

In another life I was working with a large freemail provider here in Germany. This outfit had quite a few image servers; in their case, rsync was the distribution mechanism for static files, and a specially configured web server the delivery mechanism.

Using dynamically generated content instead has a very noticeable performance impact, because of the interprocess communication overhead outlined above, and because the memory usage per slot is much higher in the dynamic case, limiting the degree of available parallelism.

Databases have their place, but (compared to filesystems) handling large binary objects is not one of their strengths - and that is even before you start thinking about handling objects larger than max_allowed_packet inside your database.

Sheeri on :

I appreciate your ideas, but you haven't read what I've written: "the sheer number of files is enough to kill any os."

I'm serious about that. We're talking 36 million images - that's 36 million files (no thumbnails, they're all reduced size) plus the directories to store them in. We have caching tuned very well, and it still kills the OS.

Physically we CANNOT store these images on disk as individual files. That's what we've been doing, and the performance is horrible. In a MyISAM database, each table is one file, and thus MUCH easier to handle. In an innodb database, all innodb tables are encompassed in one file (or you can split them out).

So performance will be MUCH improved. We have a unique situation, to be sure; photos are CONSTANTLY being looked at. These are personal ad images and so are being looked at at a rate that is much higher than, say, a photo site.

I agree that in most situations, your solution works. I can tell you from having done it that way, that we're reaching the limit on that. So as the first poster said, "never say never".

Wouter Verhelst on :

What filesystem are you doing this on? And did you create hashed directories to store your files in? (so that you would end up with stuff like "/images/s/st/sto/storage_container.jpg", which will vastly improve performance when trying to read files in a directory)

The problem really is not your operating system, but the efficiency with which the file system implements "finding a file in the file system" on such high numbers. Some file systems (e.g., ReiserFS, though I wouldn't recommend using that) have been specifically optimized for "having millions of files in one directory"; if your needs are that specific, you really want to look into such things.
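The hashed-directory layout Wouter describes can be sketched like this (the fan-out depth of three is taken from his example; real deployments tune it to their file counts):

```python
import os

def hashed_path(root, filename):
    # Nest each file under directories named after its first letters,
    # so no single directory ever holds millions of entries.
    name = filename.lower()
    return os.path.join(root, name[:1], name[:2], name[:3], filename)

print(hashed_path("/images", "storage_container.jpg"))
# -> /images/s/st/sto/storage_container.jpg
```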

Sheeri on :

We have indeed looked at those kinds of things, and they're not appropriate for our environment. Hence the reason that MySQL is appropriate, though I realize and have said that it's only appropriate in a very narrow set of circumstances, such as our own.

Daniel on :

I like how you say it's enough to kill any OS, but it seems like you just say that without backing it up (i.e., what OSes have you even _tried_ this on?).

Ever heard of ZFS? I'm more than willing to bet ZFS wouldn't break a sweat under that kind of setup. So please, don't go spreading FUD; do your homework next time and use ZFS if you've seriously got that many photos.

PHP Dreamer on :

I'm curious. What kind of setups do the big guys use to stream video and photos? I'm setting up a travel website using PHP, and of course I'm encountering the same problems many here have alluded to. I know what the low end is (because I live there and I don't like it!), but what if I wanted to scale up? I assume YouTube and folks with cash have considerably better options.

In essence, what should I be shooting for? A "Flash" (a la the defunct Macromedia) server operation? Anything in MySQL land that is competitive?

Please don't say Microsoft anything, since I will break out in hives at the sound of Mr. Softy!

Thanks!

David Sheldon on :

This is what the X-Sendfile header supported by lighttpd does. Your PHP can decide what content to serve, then pass back an instruction to lighttpd to deliver that content directly from disk using sendfile.
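A sketch of the handoff, WSGI-style rather than PHP (the lookup helper and its path mapping are invented; the exact header name your server honors depends on its configuration):

```python
def lookup(path_info):
    # Hypothetical access check / path mapping; a real application
    # would consult its database or session here.
    return "/var/www/images" + path_info

def app(environ, start_response):
    # The dynamic code decides *what* may be served...
    path = lookup(environ["PATH_INFO"])
    # ...then tells the web server to push the bytes itself via sendfile().
    start_response("200 OK", [("Content-Type", "image/jpeg"),
                              ("X-Sendfile", path)])
    return [b""]   # empty body: the server substitutes the file

# Minimal demonstration without a real server:
headers = {}
def fake_start_response(status, hdrs):
    headers.update(hdrs)

app({"PATH_INFO": "/cat.jpg"}, fake_start_response)
print(headers["X-Sendfile"])   # -> /var/www/images/cat.jpg
```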

alex on :

It would be great if lighty had special treatment for relative URL redirects. Let's say a dynamic request handler returns a redirect to a relative, thus "local", URL pointing at static content. Instead of sending that redirect back to the browser, lighty might serve it the same way as if the browser had sent a new request for that URL in response to the redirect.

By "redirect" I mean the 3xx family of HTTP response codes.

darix on :

Can you describe your setup a bit better? mod_cml should handle the request locally, without a round trip to the browser.

darix

alex on :

I'm not familiar with mod_cml, but I'm afraid a cache could not help in my situation.

I want the FastCGI script to see _all_ requests and decide which static content should be served, since there is no static mapping between any information sent by the browser and the static content.

Let's consider the current (not so good) scenario:
* Suppose some URL is handled by a script that gets a request from the browser.
* The script decides which static content matches the request, based on some criteria, and sends a redirect to the browser.
* Given such a response, the browser sends another request for that static resource.
* The HTTP server handles the request for the static content.

In a better scenario there would be no phase in which the browser receives a redirect and sends another request; that could be handled by the HTTP server internally.

darix on :

mod_cml is not a cache.

alex on :

Whoops...

I thought it was about mod_cml of lighttpd and http://www.lighttpd.net/documentation/cml.html says the following:
QUOTE:

CML (Cache Meta Language) wants to solve several problems:

* dynamic content needs caching to perform
* checking whether the content is dirty inside the application is usually more expensive than sending out the cached data
* a dynamic page is usually fragmented, and the fragments have different lifetimes
* the different fragments can be cached independently

darix on :

mod_cml is a way to control the caching, but it doesn't do the caching on its own.

See its complete documentation. :)

alex on :

Alternatively, it could be something similar to Apache's mod_rewrite map, which is handled by an external program:

RewriteMap mapname prg:/path/to/request-mapper

Jay on :

I am confused about why you are comparing:

1) using Apache to serve images stored in the database

and

2) using Lighttpd to serve images stored as files

It seems to me that your message would be more effective if you stuck to one server and used that server on both sides of the comparison (i.e., used the server as the control in the experiment).

As it stands, you are making the second part of the comparison appear shorter and simpler than it probably would be if it also used Apache.

Tom on :

If you're using Rails, this seems like a pretty solid, and simple, solution:

http://scottraymond.net/articles/2005/07/05/caching-images-in-rails

Rails will automatically create a cached version of the image after it's first requested - just as it would a dynamic page - if you set up your controllers correctly. Pretty cool.

Anthony on :

I can't believe this is a discussion! It is just plain (computer) common sense. The file system IS a database. It may not be a relational database, but it is one nonetheless. 2 million pictures, 26 million pictures, 30 GB, 100 GB, 1 TB - it does not matter. MySQL has to read and write the same amount of information to disk. MySQL is simply overhead. At the very least it has to perform the same amount of I/O. If you are not getting better performance going directly to the file system, then you are NOT using the file system correctly.

Someone snottily asks: why not write text out to the file system too? Because text can be queried and BLOBs can't. Is that not obvious? That is why relational databases are used: to retrieve information.

And why all this talk about NFS? Use a cluster! They are cheap and easy to set up. You have to put some thought into it. Take an MD5 hash of the filename or of the file's data. MOD that number by the number of computers in your cluster and write the file to that computer.

For example, you get a hash of 123456789. You MOD it by 10, because you have ten computers in the cluster, getting 9. Store the file on computer number 9, in a directory path like /1/2/3/4/5/6/7/8/9/My Picture.jpg. Then put all the metadata in MySQL. Someone wants the picture:
http://img.home.com/1/2/3/4/5/6/7/8/9/My Picture.jpg

The web server goes directly to its local hard drive. Well, it should really be a LUN over Fibre Channel in a SAN cluster with redundant failover and WAN replication. Or just use rsync. There is so much stuff out there for files; I will leave researching that as an exercise for the reader.

BTW, you can MOD it again and spread the files out over several local hard drives. As drives fill up you can easily move the data and increase the number of drives. Oh, and it really is spelled "Fibre Channel", not "fiber channel".
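The hash-and-MOD placement described above can be sketched as follows (the directory fan-out depth is a free parameter; using leading hex digits of the real MD5 digest rather than the illustrative "123456789"):

```python
import hashlib

def place(filename, n_servers, depth=4):
    # Hash the name, pick a cluster node by MOD, and turn the leading
    # hex digits of the digest into a directory fan-out.
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    server = int(digest, 16) % n_servers
    path = "/" + "/".join(digest[:depth]) + "/" + filename
    return server, path

server, path = place("My Picture.jpg", 10)
print(server, path)
```

The scheme is deterministic: every web server computes the same node and path from the filename alone, so no lookup table is needed to find a file.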

Sheeri on :

Yes, the issue is that while the filesystem *is* a database, it's not the *same* database. Which means a file may be changed on the filesystem and the metadata may not be updated. There's no way to enforce a transaction between a filesystem and a database.

The most common case is if the entry is deleted in the database but the file remains, or the file is deleted but the entry remains.

There is simply no way to ensure data integrity if you store files on a filesystem. If data integrity is not important, that's OK, but make sure it's a conscious decision.

Isotopp on :

Newer Unix systems, including Linux, do in fact have APIs that allow applications such as databases to monitor a directory, or even a directory hierarchy, for file changes.

In Linux, the standard way to do this is fam.

CODE:
kris@linux:~> rpm -qi fam
...
Description :
Fam is a file alteration monitoring service. With it, you can receive
signals when files are created or changed.
The Examples section of the FAM manual has an example titled "Example: A client monitors a directory containing two files. Later, another program creates a third file.", which is the situation that most closely matches your scenario.

Bryan Ledford on :

Have you considered using something like Amazon S3 for image storage? At $0.15/GB-month, 30 GB would be a whopping $4.50 per month plus transfer costs, which is pretty affordable.
