A quick tour of DRBD

Snapshot of the VMware config used (two running instances are required for the example)
The DRBD tour in this blog post was created on two VMware instances, each with a SUSE 10.0 Professional installation, which I am using to show the most essential features of DRBD. Each VM has a bit of memory, a network card, a boot disk with a text-only SUSE 10 installation, and, besides the boot disk, a second simulated 1 GB SCSI disk to demonstrate things with. The two instances are connected to a simulated local vmnet and share the 10.99.99.x/24 network; they are called left (10.99.99.128) and right (10.99.99.129).
On each machine, in addition to the most basic SUSE 10 Pro installation, the packages km_drbd and drbd have been installed. On left, we partition the second hard disk completely for use with Linux using fdisk, creating /dev/sdb1 for use with DRBD.
CODE:
left:/tmp/share # fdisk /dev/sdb
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-130, default 1): 1
Last cylinder or +size or +sizeM or +sizeK (1-130, default 130): 130
Command (m for help): p
Disk /dev/sdb: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 1 130 1044193+ 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
left:/tmp/share #
We now have to define an /etc/drbd.conf to configure that disk into DRBD. A basic drbd.conf has to define a resource. A resource is something that contains a disk partition on the left node, a matching partition on the right node, a network connection between them, and definitions for error handling and synchronisation.
The config file format is straightforward and uses named blocks in curly brackets and semicolon-terminated statements - if you know named, the BIND name server daemon, you'll feel right at home.
CODE:
global {
    # we want to be able to use up to 5 drbd devices
    minor-count 5;
    dialog-refresh 5; # 5 seconds
}

resource r0 {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    on left {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.99.99.128:7788;
        meta-disk internal;
    }

    on right {
        device    /dev/drbd0;
        disk      /dev/sdb1;
        address   10.99.99.129:7788;
        meta-disk internal;
    }

    disk {
        on-io-error detach;
    }

    net {
        max-buffers   2048;
        ko-count      4;
        on-disconnect reconnect;
    }

    syncer {
        rate       10M;
        group      1;
        al-extents 257; # must be a prime number
    }

    startup {
        wfc-timeout      0;
        degr-wfc-timeout 120; # 2 minutes.
    }
}
Global options are set in a section aptly named global; they are currently limited to "minor-count" (the number of DRBD resources you'll be able to define), "dialog-refresh" (how often the startup dialog redraws itself) and "disable-ip-verification" (disables a startup sanity check that verifies we are on the right machine).
All other configuration happens inside a resource section. That section needs a name, which can be anything and can be quoted; we happen to use something boring like r0. Any resource needs to have a protocol defined, and requires two "on" sections which define the two hosts we are going to use. It can also have startup, syncer, net and disk sections.
The protocol in DRBD is something like "innodb_flush_log_at_trx_commit" in MySQL: It determines when the node committing a disk write considers that write a success, and has an essential influence on the speed and resilience of your DRBDs. It can be defined as A, B or C:
- In protocol C, which is the recommended setting, a write is considered completed when it has reached stable storage on the local and the remote node.
- In protocol B, we relax the constraints a little and consider the write completed when it has reached the local disk and the remote buffer cache. This should be faster than C, but for some reason currently is not, so you should not be using it.
- In protocol A, we consider a write completed when it has reached the local disk and the local TCP send buffer. This may be okay for you, but for most people it is not.
The beef is in the "on" sections, which also need names - specifically the hostnames of the systems carrying the devices you want to mirror. Inside an "on" section you'll define a device, a disk, an address and a meta-disk. This is fairly straightforward: The device is the /dev/drbdN that we will be working with later on.
The disk is the underlying real storage that will carry all our data, the address is an ip:port pair used to talk to the local DRBD instance for this device (a different TCP port is needed for each DRBD disk) and the meta-disk is either internal or some dedicated metadata storage device. For simplicity of our example, we are using internal for now (DRBD will then use 128M of /dev/sdb1 for its internal purposes. Yes, that is a lot!).
Look at the on-sections in the example above: On left and right we will be using /dev/drbd0, and have it write to /dev/sdb1. We will communicate using TCP port 7788 on both IP addresses, .128 and .129.
How we handle local disk errors is specified using the on-io-error handler of the unnamed disk section. "detach" means that on error we simply forget the local disk and operate in diskless mode: We read and write data from and to the disk of the remote node across the network. Other options are "pass_on" (the primary reports the error, the secondary ignores it) and "panic" (the node leaves the cluster with a kernel panic).
Both nodes are connected using a net section. Inside the net section, which is unnamed, we define the buffers and timeouts used by DRBD: sndbuf-size, timeout, connect-int, ping-int, max-buffers, max-epoch-size, ko-count, on-disconnect.
The sndbuf is specified in KB and determines how much buffer the local DRBD will reserve for communication with the remote node. It should be no smaller than 32K and no larger than 1M. The optimum size depends on the bandwidth-delay product of the connection to the remote node.
If the partner node does not reply within timeout tenths of a second, this counts as a K.O. After ko-count of these, the partner is considered dead and is dropped out of the cluster; the primary then goes into standalone mode. Also, the connection to the partner node is dropped on timeout and re-established immediately. If that fails, a new attempt is made every connect-int seconds. If, on the other hand, the connection between the two nodes is idle for more than ping-int seconds, a DRBD-internal ping is sent to the remote node to check if it is still present.
How the node handles a disconnect can be specified using the on-disconnect handler: Valid choices are stand_alone (go from primary to standalone mode), reconnect (try to reconnect as described above) or freeze_io (try to reconnect, but halt all I/O, as with an NFS hard mount, until the reconnect is successful).
DRBD uses 4K buffers to buffer writes to disk, and it uses at most max-buffers of these (minimum 32, which comes to at least 128K). If you see many writes, this number needs to be set to a larger value.
max-epoch-size needs to be at least 10, and determines the maximum number of data blocks between two write barriers.
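Putting these together, a net section with the timing parameters spelled out might look like this. The values are illustrative assumptions, not tuned recommendations - only max-buffers, ko-count and on-disconnect match the example config above:

```
net {
    sndbuf-size    512k;  # send buffer, between 32K and 1M
    timeout        60;    # 6 seconds, counted in tenths of a second
    connect-int    10;    # retry a broken connection every 10 seconds
    ping-int       10;    # ping an idle peer every 10 seconds
    ko-count       4;     # 4 timeouts in a row and the peer is dead
    max-buffers    2048;  # 2048 * 4K = 8M of write buffers
    on-disconnect  reconnect;
}
```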
Across the network connection, the syncer does its work to keep both disks neat and tidy. It will use at most rate K/sec of bandwidth to do so, and the default is quite low (250 K/sec). For synchronisation, the disks are cut up into slices, and for each slice an al-extent is used to indicate if and where it has been changed. A larger number of al-extents makes resynchronisation slower, but requires fewer metadata writes.
The number used here should be a prime, because it is used internally in hashes that benefit from prime number sized structures.
If you have multiple devices, all of those that are in the same group are resynchronized in parallel. If two DRBD devices reside on different physical disks, you can put them into the same group so that they are resynchronized in parallel without competing for seeks on the same disk. If two DRBD devices are partitions on the same physical device, put them into different groups to avoid disk head thrashing.
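As a sketch of the second case - the resource r1 and its second partition are hypothetical, not part of this tour's setup:

```
# r0 and a hypothetical r1 on partitions of the same physical disk:
# different groups, so they resynchronize one after the other.
resource r0 {
    # device/disk/address sections as in the example above
    syncer { rate 10M; group 1; al-extents 257; }
}
resource r1 {
    # second partition on the same spindle
    syncer { rate 10M; group 2; al-extents 257; }
}
```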
Inside the startup section, which is also unnamed, we define two wait-for-connection timeouts: On startup, DRBD will try to find its partner node on the network. DRBD remembers whether it was "degraded" the last time it went down - "degraded" here means that the partner node was already down and we are missing one mirror half. If we were degraded prior to the node restart, we wait up to 120 seconds for the second node to come up, and then continue the boot anyway. If we were not degraded, we require the second node to be present for our own boot to complete (or require manual intervention).
Using this config file on left and right (copy it over using scp!), we can start DRBD on both nodes. According to our config, left will hang until the start on right has completed ("wfc-timeout" is set to 0).
CODE:
left:~ # rcdrbd start
Starting DRBD resources: [ d0 s0 n0 ].
..........
DRBD's startup script waits for the peer node(s) to appear.
- In case this node was already a degraded cluster before the
reboot the timeout is 120 seconds. [degr-wfc-timeout]
- If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'r0'; 0 sec -> wait forever)
To abort waiting enter 'yes' [ 10]:
The system is now not in sync, and DRBD does not know which disk is the leading disk in our little cluster. So both disks are in secondary mode, and cannot be written to.
CODE:
left:~ # cat /proc/drbd
version: 0.7.13 (api:77/proto:74)
SVN Revision: 1942 build by root@d233, 2006-01-21 02:46:41
0: cs:Connected st:Secondary/Secondary ld:Inconsistent
ns:0 nr:0 dw:0 dr:0 al:0 bm:112 lo:0 pe:0 ua:0 ap:0
To change that, we need to make one disk (the one on left) a primary, and then watch the system synchronize.
CODE:
left:~ # drbdadm primary r0
ioctl(,SET_STATE,) failed: Input/output error
Local replica is inconsistent (--do-what-I-say ?)
Command '/sbin/drbdsetup /dev/drbd0 primary' terminated with
exit code 21
left:~ # drbdadm -- --do-what-I-say primary r0
left:~ # cat /proc/drbd
version: 0.7.13 (api:77/proto:74)
SVN Revision: 1942 build by root@d233, 2006-01-21 02:46:41
0: cs:SyncSource st:Primary/Secondary ld:Consistent
ns:9204 nr:0 dw:0 dr:9204 al:0 bm:112 lo:0 pe:0 ua:0 ap:0
[>...................] sync'ed: 1.4% (903916/913120)K
finish: 0:01:34 speed: 9,204 (9,204) K/sec
When we try to switch the left copy of r0 to primary, this does not work - the local replica is marked inconsistent and the system cannot decide if it can act as a proper primary. We have to insist that the left copy of r0 is the one we want to become the primary using the "-- --do-what-I-say" sledgehammer.
The system will then start to sync both disks, and by monitoring /proc/drbd we can follow the progress of this operation.
Our syncer is limited to 10M per second by the configuration, so we will see a synchronisation rate of approximately 10M/sec in /proc/drbd - the sync will take almost 1.5 minutes to complete.
We do not have to wait: Even with the sync running, we are free to operate on the primary copy as we like. We want to create a file system on /dev/drbd0 and then mount it. Here is how:
CODE:
left:~ # mkreiserfs /dev/drbd0
...
Format 3.6 with standard journal
Count of blocks on the device: 228272
Number of blocks consumed by mkreiserfs formatting process: 8218
Blocksize: 4096
Hash function used to sort names: "r5"
Journal Size 8193 blocks (first block 18)
Journal Max transaction length 1024
inode generation number: 0
UUID: 3f9270dd-894a-4da4-8818-35b691504974
ATTENTION: YOU SHOULD REBOOT AFTER FDISK!
ALL DATA WILL BE LOST ON '/dev/drbd0'!
Continue (y/n):y
Initializing journal - 0%....20%....40%....60%....80%....100%
Syncing..ok
ReiserFS is successfully created on /dev/drbd0.
left:~ # mount /dev/drbd0 /usr/local
left:~ # df -Th /usr/local
Filesystem Type Size Used Avail Use% Mounted on
/dev/drbd0
reiserfs 892M 33M 860M 4% /usr/local
We get 860M usable, plus the 32M reiserfs journal, plus 128M of DRBD overhead. This really is not useful for small partitions.
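A quick back-of-the-envelope check of where the space goes, using the numbers from the fdisk, mkreiserfs and df output above:

```shell
# Rough accounting for the ~1G partition (all sizes in KB).
partition_kb=1044193          # /dev/sdb1, as reported by fdisk
meta_kb=$((128 * 1024))       # internal DRBD metadata
journal_kb=$((8193 * 4))      # reiserfs journal: 8193 blocks of 4K
usable_kb=$((partition_kb - meta_kb - journal_kb))
echo "approx usable: $((usable_kb / 1024))M"
```

The result lands within a megabyte of the 860M that df reports.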
Meanwhile, on right:
CODE:
right:~ # cat /proc/drbd
version: 0.7.13 (api:77/proto:74)
SVN Revision: 1942 build by root@d233, 2006-01-21 02:46:41
0: cs:SyncTarget st:Secondary/Primary ld:Inconsistent
ns:0 nr:684728 dw:684728 dr:0 al:0 bm:149 lo:0 pe:0 ua:0 ap:0
[=============>......] sync'ed: 68.2% (294148/913120)K
finish: 0:00:25 speed: 11,380 (9,824) K/sec
Onto this system we now install MySQL.
CODE:
# tar -C /usr/local -xf /tmp/share/mysql-max-5.0.22-linux-i686-glibc23.tar.gz
# cd /usr/local
# ln -s mysql-max-5.0.22-linux-i686-glibc23/ mysql
# cd mysql
# groupadd mysql
# useradd -g mysql mysql
# chown -R mysql.mysql .
# ./scripts/mysql_install_db --user=mysql
# ./support-files/mysql.server start
...
# ./bin/mysql -u root
To fail over from left to right, a number of things need to be done:
- MySQL needs to be stopped.
- The disk needs to be unmounted.
- The disk needs to be put in secondary on left.
- The disk needs to be put in primary on right.
- The disk needs to be mounted on right.
- MySQL needs to be started.
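The steps above can be sketched as two small helper functions. This is a hypothetical wrapper, not part of DRBD; the resource name (r0), mount point and MySQL path match this article's setup, and with DRY_RUN set the commands are printed instead of executed:

```shell
#!/bin/sh
# Print (DRY_RUN set) or execute (DRY_RUN unset) a command.
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

release_primary() {     # run on the old primary (left)
    run /usr/local/mysql/support-files/mysql.server stop
    run umount /dev/drbd0
    run drbdadm secondary r0
}

take_over() {           # run on the new primary (right)
    run drbdadm primary r0
    run mount /dev/drbd0 /usr/local
    run /usr/local/mysql/support-files/mysql.server start
}

DRY_RUN=1
release_primary         # prints the three commands for the left node
```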
Here is the manual switchover, on left:
CODE:
left:~ #
/usr/local/mysql/support-files/mysql.server stop
Shutting down MySQL.
left:~ # umount /dev/drbd0
left:~ # drbdadm secondary r0
And the inverse, on right:
CODE:
right:~ # cat /proc/drbd
version: 0.7.13 (api:77/proto:74)
SVN Revision: 1942 build by root@d233, 2006-01-21 02:46:41
0: cs:Connected st:Secondary/Secondary ld:Consistent
ns:0 nr:1109196 dw:1109196 dr:0 al:0 bm:168 lo:0 pe:0 ua:0
ap:0
right:~ # drbdadm primary r0
right:~ # cat /proc/drbd
version: 0.7.13 (api:77/proto:74)
SVN Revision: 1942 build by root@d233, 2006-01-21 02:46:41
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:0 nr:1109196 dw:1109196 dr:0 al:0 bm:168 lo:0 pe:0 ua:0
ap:0
right:~ # mount /dev/drbd0 /usr/local
right:~ # /usr/local/mysql/support-files/mysql.server start
Starting MySQL.
In a real MySQL failover scenario, we do not know why the failover took place in the first place, nor whether the server data on the DRBD disk is usable. Thus, MySQL would most likely need to run an InnoDB recovery, and should also be run with MyISAM table autorepair enabled. This will slow down the failover - by a lot, if you happen to have a very large InnoDB log.
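The MyISAM autorepair can be switched on with the myisam-recover option; a my.cnf fragment might look like this (the chosen modes are an illustrative assumption, not taken from the setup above):

```
[mysqld]
# Check and repair MyISAM tables automatically after an unclean shutdown;
# BACKUP keeps a copy of any table that the repair changes.
myisam-recover = BACKUP,FORCE
```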