
Notes on VM

No matter how often it is repeated, it is still not true:
"Stripping binaries using the ‘strip’ utility can also significantly reduce the memory footprint of the application"
claims John Coggeshall.

While it is true that a file is smaller on disk after a strip, a quick run of "size" on a binary will show you that the actual binary part of the file is unchanged. Let's have a quick look at /proc/pid/maps to understand what happens.
linux:~ # cat /proc/1/maps
08048000-080bb000 r-xp 00000000 fd:01 51083      /sbin/init
080bb000-080bd000 rwxp 00072000 fd:01 51083      /sbin/init
080bd000-080df000 rwxp 080bd000 00:00 0          [heap]
bfd7b000-bfd90000 rw-p bfd7b000 00:00 0          [stack]
ffffe000-fffff000 ---p 00000000 00:00 0          [vdso]

linux:~ # ls -Ll /dev/system/root
brw-r-----  1 root disk 253, 1 Oct 14 11:35 /dev/system/root
linux:~ # ls -li /sbin/init
51083 -rwxr-xr-x  1 root root 489792 Sep  9  2005 /sbin/init

This is the memory layout of the process with PID 1, init. init has been loaded from device fd:01 (253:1) and has an inode number of 51083. From that file, the data at offset 0x0 is mapped into the memory area 0x08048000 to 0x080bb000. From the same file, the data at offset 0x72000 is mapped into the memory area 0x080bb000 to 0x080bd000.
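To make the field layout concrete, here is a minimal sketch that takes one line of the maps output apart. The names Mapping and parse_maps_line are made up for this example; the only thing it assumes is the column order visible in the listing above (address range, permissions, file offset, device as hexadecimal major:minor, inode, optional path).

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One line of /proc/<pid>/maps (hypothetical helper type)."""
    start: int
    end: int
    perms: str
    offset: int
    major: int
    minor: int
    inode: int
    path: str


def parse_maps_line(line: str) -> Mapping:
    # Columns: address range, permissions, offset, device, inode, optional path.
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    # The device is printed as hexadecimal major:minor, so fd:01 is 253:1.
    major, minor = (int(x, 16) for x in fields[3].split(":"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], int(fields[2], 16),
                   major, minor, int(fields[4]), path)


m = parse_maps_line("08048000-080bb000 r-xp 00000000 fd:01 51083      /sbin/init")
print(f"{m.path}: device {m.major}:{m.minor}, inode {m.inode}, "
      f"{m.end - m.start:#x} bytes from file offset {m.offset:#x}")
```

For the first line of the init listing this reports device 253:1, inode 51083 and a mapping of 0x73000 bytes starting at file offset 0x0, which matches the ls output above.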

In fact, the data in these regions has not actually been loaded at all, but only mapped. Linux and most other Unices will not read from the file until a mapped page is actually accessed. That access triggers a page fault, and demand paging then retrieves the data from disk, one page at a time.

Linux binary files are laid out in a way that pages on disk can be mapped into memory. Only the pages that are accessed ever hit memory. Symbol information and other debug stuff is part of the on-disk file, but since it is not mapped, it is never brought into memory, ever. Debug information has no influence at all on the memory consumption of your program. Double check with "size" and by looking at the memory map of your program: The segment sizes reported will be the same before and after the strip.
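The numbers from the init listing can be used for a quick plausibility check. The sketch below computes the highest file offset that any mapping of /sbin/init covers and compares it with the file size reported by ls. The function name mapped_file_extent is made up for this example, and what exactly sits in the unmapped tail of this particular binary is not known from the listing; in an ELF file it is typically section headers plus symbol and debug tables, if present.

```python
INIT_MAPS = """\
08048000-080bb000 r-xp 00000000 fd:01 51083      /sbin/init
080bb000-080bd000 rwxp 00072000 fd:01 51083      /sbin/init
"""
FILE_SIZE = 489792  # size column of "ls -li /sbin/init" above


def mapped_file_extent(maps_text: str, path: str) -> int:
    """Return the highest file offset covered by any mapping of path."""
    extent = 0
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or fields[5].strip() != path:
            continue  # anonymous mapping or a different file
        start, end = (int(x, 16) for x in fields[0].split("-"))
        offset = int(fields[2], 16)
        extent = max(extent, offset + (end - start))
    return extent


extent = mapped_file_extent(INIT_MAPS, "/sbin/init")
print(f"mapped up to file offset {extent:#x} ({extent} bytes)")
print(f"bytes of the file beyond any mapping: {FILE_SIZE - extent}")
```

With the values above, the mappings end at offset 0x74000 (475136 bytes), so the last 14656 bytes of the file are not mapped at all. Anything strip removes from that tail cannot change the memory map.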

And while we are discussing VM management, let's have a look at this claim as well:
"If running PHP in Apache you can increase the speed in some cases by 30% just by compiling PHP statically within Apache (Of course, this increases the footprint of Apache, and each of it’s children in prefork)"

kris@h3118:~> cat /proc/7366/maps| head -10
08048000-08099000 r-xp 00000000 03:05 3113038    /usr/sbin/httpd2-prefork
08099000-0809c000 rw-p 00051000 03:05 3113038    /usr/sbin/httpd2-prefork
0809c000-08252000 rwxp 0809c000 00:00 0
40000000-40013000 r-xp 00000000 03:05 2343070    /lib/
40013000-40014000 rw-p 00013000 03:05 2343070    /lib/
40014000-40015000 rw-p 40014000 00:00 0
40015000-40025000 r-xp 00000000 03:05 6832163    /lib/
40025000-40026000 rw-p 0000f000 03:05 6832163    /lib/
40026000-40028000 r-xp 00000000 03:05 1523985    /usr/lib/apache2/
40028000-40029000 rw-p 00001000 03:05 1523985    /usr/lib/apache2/

This is part of the memory map of my preforked, dynamically linked Apache. It consists of the httpd image, with its text, data and bss segments neatly laid out, followed by all the shared libraries and Apache modules loaded. You will notice a number of read-only segments (text in the "size" output) and writeable segments (data and bss in the "size" output).

When starting Apache, all of this is loaded (well, only mapped, actually) and then Apache forks and the children inherit the mapping. All read-only segments are using physical memory only once. For example the file 3113038 from 03:05 (httpd2-prefork) is actually shared between all children of Apache and occupies memory only once.
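As a rough illustration, the ten lines of the listing above can be split into read-only, file-backed mappings (physically present only once, no matter how many children map them) and writeable mappings (which may become private to a child). This is only a sketch: the function name shareable_and_private is made up, the library paths are kept truncated exactly as they appear in the listing, and the sizes are mapped sizes, not resident memory.

```python
HTTPD_MAPS = """\
08048000-08099000 r-xp 00000000 03:05 3113038    /usr/sbin/httpd2-prefork
08099000-0809c000 rw-p 00051000 03:05 3113038    /usr/sbin/httpd2-prefork
0809c000-08252000 rwxp 0809c000 00:00 0
40000000-40013000 r-xp 00000000 03:05 2343070    /lib/
40013000-40014000 rw-p 00013000 03:05 2343070    /lib/
40014000-40015000 rw-p 40014000 00:00 0
40015000-40025000 r-xp 00000000 03:05 6832163    /lib/
40025000-40026000 rw-p 0000f000 03:05 6832163    /lib/
40026000-40028000 r-xp 00000000 03:05 1523985    /usr/lib/apache2/
40028000-40029000 rw-p 00001000 03:05 1523985    /usr/lib/apache2/
"""


def shareable_and_private(maps_text: str) -> tuple[int, int]:
    """Sum mapped bytes: (read-only file-backed, writeable or anonymous)."""
    shareable = private = 0
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a maps line
        start, end = (int(x, 16) for x in fields[0].split("-"))
        perms, inode = fields[1], int(fields[4])
        if inode != 0 and "w" not in perms:
            shareable += end - start   # backed by a file, never written
        else:
            private += end - start     # may be copied once a process writes
    return shareable, private


ro_bytes, rw_bytes = shareable_and_private(HTTPD_MAPS)
print(f"read-only, file-backed: {ro_bytes} bytes")
print(f"writeable or anonymous: {rw_bytes} bytes")
```

For this excerpt the read-only, file-backed mappings add up to 483328 bytes and the writeable ones to 1822720 bytes, most of which is the anonymous mapping directly behind the httpd image.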

The file 2343070 from 03:05 (the dynamic linker) is shared even wider: The shared library is mapped into physical memory only once, but used by every dynamic binary in the system.

When you compile Apache statically, it will include a (partial) copy of libc, libz and all Apache modules used inside the httpd2-prefork binary, which will in turn be much larger than a dynamically linked httpd2-prefork binary. When you start up that file, it will indeed use more memory than a dynamically linked Apache, because instead of sharing a copy of libc, libz and other libraries with the rest of the system, it will load its own private copy as the library pages are referenced.

But these are one time startup costs. When your static httpd2-prefork forks and generates worker slaves of itself, these children will actually share the text segment of their parent and the text segment of the httpd2-prefork image will occupy physical memory only once, even if it is being mapped hundreds of times.

Even better: The writeable parts of the image will be shared between the master and the worker slaves as well, initially. Only when a slave writes to a page of its writeable mapped memory is a copy of that page made (copy-on-write), so each slave copies only the pages of writeable data that it actually touches.

So while there is a slight overhead associated with statically linking and running Apache, on a dedicated Apache machine that overhead may be much, much smaller than you actually think: You will have the httpd2-prefork code in physical memory only once, and the data set of each httpd2-prefork instance will be minimal at the page level.
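The arithmetic behind this can be sketched as follows. All numbers are hypothetical and chosen only for illustration (2 MiB of text, 256 KiB of writeable data, 20 pages written per child, 100 children); the function names are made up, and 4096-byte pages are assumed.

```python
PAGE_SIZE = 4096  # assumed page size


def pages(nbytes: int) -> int:
    """Number of pages needed to hold nbytes, rounded up."""
    return -(-nbytes // PAGE_SIZE)


def cow_footprint(text_bytes: int, data_bytes: int,
                  dirty_pages_per_child: int, children: int) -> int:
    """Physical memory if text is shared and data is copied on write."""
    shared = pages(text_bytes) + pages(data_bytes)  # present once, in total
    return (shared + children * dirty_pages_per_child) * PAGE_SIZE


def naive_footprint(text_bytes: int, data_bytes: int, children: int) -> int:
    """What you would get if every process had a full private copy."""
    per_process = pages(text_bytes) + pages(data_bytes)
    return (children + 1) * per_process * PAGE_SIZE


# Hypothetical figures, not measurements.
text, data = 2 * 1024 * 1024, 256 * 1024
print("copy-on-write:", cow_footprint(text, data, 20, 100), "bytes")
print("full copies:  ", naive_footprint(text, data, 100), "bytes")
```

With these made-up figures the copy-on-write estimate is about 10 MB, against roughly 238 MB if every child really carried its own copy of the image.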

In reality the memory overhead of a statically linked httpd2-prefork binary is very small. It amounts to roughly one additional copy of all the libraries that would have been shared in a dynamic binary and that are not specific to Apache in the first place (mostly libc and libm in a dedicated web server).



Anonymous on :

Don't listen to John. Most of the folks in the PHP community tend to ignore him. He means well, but he doesn't really understand the technology very well.

Rasmus on :

The presentation isn't bad. Just make sure you distinguish between performance and scalability and on that slide near the end I agree that suggesting stripped binaries is rather misguided and instead of talking about linking statically he should be explaining the performance difference between PIC and non-PIC libraries. The PHP Apache DSO is built non-PIC by default these days so you aren't going to see significant gains from linking it statically.

Isotopp on :

The presentation is okay. It is just these VM thingies that are niggling me, because almost nobody seems to be getting them right.
