How large can a MySQL database become?
In "Maximum MySQL Database Size?", Nick Duncan wants to find out how large his MySQL database can possibly become. He answers that question with a list of maximum file sizes per file system type. That is not a useful answer.
While every file system does have a maximum file size, this limitation is usually not relevant when it comes to MySQL maximum database size. But let's start with file systems, anyway.
First: You never want to run a database system on a FAT file system, ever. In FAT, a file is a linked list of blocks in the file allocation table. That means certain seek operations (backward seeks in particular) become slower the larger a file is, because the file system has to position the file pointer by traversing that linked list of blocks. Since seek operations are basically what a large database does all day, FAT is completely useless for this purpose. So the actual file size limit of FAT is moot for this discussion.
Second: You also never want to run a database system on a 32-bit operating system. Not only does that limit your file size on Windows, and in certain ways on Linux as well, it also limits the amount of system memory you can invest in MySQL buffer caches. That is a problem, because it is memory that makes databases fast.
On 64-bit systems with modern file systems (NTFS on Windows, XFS on LVM2 on Linux with a recent kernel), the file size limit imposed by the operating system is multiple terabytes, or even petabytes. We will soon see that the exact number is not really relevant.
The limits listed in Nick's table are file size limits; they are not database size limits.
In MyISAM, the limit, if it is ever reached, means that a single table is limited to that much data or index size (data is stored in one file, with a MYD extension, and indices are stored in another file, with a MYI extension).
In InnoDB, we have two modes of operation, depending on innodb_file_per_table. If it is on, each table gets its own file with an ibd extension; if it is off, all tables are stored in a shared set of files most commonly called ibdataXXX, the size and number of which are freely configurable.
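As a rough sketch of the two modes, a my.cnf fragment could look like this (the file names and sizes are purely illustrative, not recommendations):

    [mysqld]
    # on: every table gets its own .ibd file in the database directory
    innodb_file_per_table = 1

    # off: all table data goes into the shared tablespace built from these files
    # innodb_file_per_table = 0
    # innodb_data_file_path = ibdata1:1G;ibdata2:1G:autoextend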
Every database uses indices, and in every database the index needs to be able to address the data records. In MyISAM, a table can have a FIXED or DYNAMIC row format (COMPRESSED also exists, but is not really relevant here). In FIXED format, the MYI file refers to positions in the MYD file by using record numbers, which are converted into byte offsets within the MYD file by multiplying them by the fixed row length. In a DYNAMIC row format table, the MYI file stores byte offsets into the MYD file.
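As an illustration (the table and column names are made up), the row format can also be forced explicitly, and SHOW TABLE STATUS reports which format a table ended up with:

    CREATE TABLE t_fixed (
      id   INT NOT NULL,
      code CHAR(10) NOT NULL
    ) ENGINE=MyISAM ROW_FORMAT=FIXED;     -- index stores record numbers

    CREATE TABLE t_dynamic (
      id   INT NOT NULL,
      note VARCHAR(200)
    ) ENGINE=MyISAM ROW_FORMAT=DYNAMIC;   -- index stores byte offsets into the MYD file

    SHOW TABLE STATUS LIKE 't_%';         -- see the Row_format column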
It is not useful to have a MYD file that is larger than the range addressable from an index, so MySQL prevents you from growing such a file further, reporting the error 'table full'. The addressable range is set in MyISAM with the table attribute MAX_ROWS: you can CREATE TABLE t ... MAX_ROWS = x, or later ALTER TABLE t MAX_ROWS=..., and MyISAM will determine how many bytes are needed for the row pointer to address that much data. Be warned, though: ALTER TABLE makes a copy of your data while it runs, and that takes time. Running an ALTER TABLE on a 4 GB table can already take quite a while, so it is not a good idea to choose a MAX_ROWS value that is too small.
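For illustration (all numbers here are arbitrary), MAX_ROWS together with AVG_ROW_LENGTH tells MyISAM how much address space it needs to reserve for the row pointer:

    CREATE TABLE t (
      id      INT NOT NULL,
      payload VARCHAR(250)
    ) ENGINE=MyISAM MAX_ROWS=1000000000 AVG_ROW_LENGTH=200;

    -- raising the limit later rewrites the entire table, which takes time on a big table
    ALTER TABLE t MAX_ROWS=1000000000000 AVG_ROW_LENGTH=200;

    -- Max_data_length in the output shows the resulting addressable size
    SHOW TABLE STATUS LIKE 't';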
If you do not define a MAX_ROWS attribute at all, MyISAM will use a default number of bytes for the row pointer: myisam_data_pointer_size. This default used to be 4 bytes in versions before MySQL 5.0, and has been 6 bytes since MySQL 5.0.
Like any default, it can of course be changed in your my.cnf file. If you change it, it will affect all newly created or altered tables, but not any existing tables. The default value of 6 bytes is good for 256 TB of data in each single MYD file (6 bytes are 48 bits of address space, and 2^48 bytes are 256 TB). Note that even MySQL versions prior to 5.0 could use more than 4 GB per file, just not by default. Note also that the documentation states that the data pointer could be as large as 8 bytes, but that is rather pointless, as we will see later.
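A sketch of changing it at runtime instead of in my.cnf; only tables created or rebuilt after the change pick up the new pointer size:

    SET GLOBAL myisam_data_pointer_size = 7;   -- 7 bytes = 2^56 addressable bytes per MYD file
    SHOW VARIABLES LIKE 'myisam_data_pointer_size';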
In InnoDB, the situation is more complicated: secondary indices use the primary key to address a row of data. Depending on the definition of the primary key, that can take less space, or far more, than in MyISAM. There are other internal limits in InnoDB as well, and these limit the size of any single table to 256 TB.
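To make the primary key dependence concrete (both table definitions are invented for illustration): every entry in an InnoDB secondary index carries a copy of the full primary key value, so a wide primary key inflates every secondary index.

    -- secondary index entries carry a 4-byte integer primary key
    CREATE TABLE orders_narrow (
      id       INT NOT NULL PRIMARY KEY,
      customer VARCHAR(100),
      KEY idx_customer (customer)
    ) ENGINE=InnoDB;

    -- every idx_customer entry also carries the whole (order_no, country, created) value
    CREATE TABLE orders_wide (
      order_no VARCHAR(40) NOT NULL,
      country  CHAR(2)     NOT NULL,
      created  DATETIME    NOT NULL,
      customer VARCHAR(100),
      PRIMARY KEY (order_no, country, created),
      KEY idx_customer (customer)
    ) ENGINE=InnoDB;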
Thus, as long as your file system supports a maximum file size of at least 256 TB, you will never run into any file system limitations before MySQL runs into other internal limits.
But it is very likely that your database will perform very badly long before that: databases need memory to perform efficiently. Disk seeks are counted in milliseconds (1/1,000 of a second), while memory accesses are counted in nanoseconds (1/1,000,000,000 of a second, that is, one million times faster). Thus, any disk seek necessary because of a cache miss will be about a hundred thousand to a million times slower than a regular cache hit.
So the useful size of your database is most likely limited by the amount of RAM your database can use. A useful rule of thumb is about 10 times the amount of memory you dedicate to your caches, although with knowledge of the internal operation of the database it is easy to construct a pathological access pattern that pushes the useful size well below that, and just as possible to optimize a workload way past that limit.
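A quick sketch of applying that rule of thumb: check how much memory your two main caches actually get, and how often reads miss them. With, say, 16 GB dedicated to the InnoDB buffer pool, something on the order of 160 GB of data is a comfortable working size (again, only a rough guide).

    -- the two main caches: MyISAM index cache and InnoDB buffer pool
    SHOW VARIABLES LIKE 'key_buffer_size';
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

    -- reads that missed the buffer pool and had to go to disk, vs. total read requests
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';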
In any case, memory is more likely to be the limiting factor than file size limits of your operating system.
Comments
Alexey Kopytov on :
That is also documented here: http://dev.mysql.com/doc/refman/5.5/en/innodb-restrictions.html
Darpan Dinker on :
Instead of a rule of thumb, we help users characterize their workload in the real world. The working set (the data frequently accessed) determines how much DRAM is required for a database to achieve the desired levels of performance. If a database becomes IO-limited on HDDs, flash memory such as FusionIO can help achieve order-of-magnitude improvements.
We constantly run into users who run MySQL on many systems without utilizing any system effectively because each system is limited and sized to run mostly out of memory (to avoid hitting HDDs on read-misses).
Another perspective :)