MySQL Fine Tuning: Tips, Examples, Recommendations (my.ini)
MySQL's full-text search capability has few user-tunable parameters.
You can exert more control over full-text searching behavior if you have a MySQL source distribution, because some
changes require source code modifications. See Section 2.4.15, “MySQL Installation Using a Source Distribution”.
Note that full-text search is carefully tuned for maximum effectiveness. In most cases, modifying the default behavior
can actually decrease effectiveness. Do not alter the MySQL sources unless you know what you are doing.
Most full-text variables described in this section must be set at server startup time. A server restart is required
to change them; they cannot be modified while the server is running.
Some variable changes require that you rebuild the FULLTEXT indexes in your tables. Instructions for doing this are given at the end of this section.
The minimum and maximum lengths of words to be indexed are defined by the ft_min_word_len and ft_max_word_len system variables.
(See Section 5.1.3, “System Variables”.) The default minimum value is four characters; the default maximum is version dependent.
If you change either value, you must rebuild your FULLTEXT indexes. For example, if you want three-character words to be searchable,
you can set the ft_min_word_len variable by putting the following lines in an option file:
[mysqld]
ft_min_word_len=3
Then you must restart the server and rebuild your FULLTEXT indexes. Note particularly the remarks regarding myisamchk in
the instructions following this list.
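To illustrate what ft_min_word_len and ft_max_word_len do, the Python sketch below approximates the indexer. It treats runs of letters, digits, underscores and apostrophes as words and keeps only those whose length falls within the configured bounds. It is a simplification that ignores stopwords, the 50% threshold and character-set details. The max_len default of 84 is an assumption, so check ft_max_word_len on your own server.

```python
import re

# Simplified word pattern: letters, digits, "_" and "'" count as word characters.
WORD_RE = re.compile(r"[\w']+")

def indexable_words(text, min_len=4, max_len=84):
    """Words that would be indexed under the given ft_min_word_len and
    ft_max_word_len values (max_len=84 is an assumed default)."""
    words = (w.lower() for w in WORD_RE.findall(text))
    return [w for w in words if min_len <= len(w) <= max_len]

text = "MySQL has SQL and PHP support"
print(indexable_words(text))             # ['mysql', 'support']
print(indexable_words(text, min_len=3))  # ['mysql', 'has', 'sql', 'and', 'php', 'support']
```

With the default minimum of four, three-letter terms such as "SQL" and "PHP" are silently skipped. That is the usual reason for lowering ft_min_word_len to 3.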
To override the default stopword list, set the ft_stopword_file system variable. (See Section 5.1.3, “System Variables”.)
The variable value should be the pathname of the file containing the stopword list, or the empty string to disable stopword filtering.
After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.
The stopword list is free-form. That is, you may use any non-alphanumeric character such as newline, space, or comma to separate stopwords.
Exceptions are the underscore character (“_”) and a single apostrophe (“'”), which are treated as part of a word. The character set of the
stopword list is the server's default character set; see the section “Server Character Set and Collation”.
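These separator rules can be mimicked in a few lines of Python to preview how a stopword file will be split. This is a sketch only. It lowercases the words on the assumption of a case-insensitive collation, and it uses Python's Unicode notion of letters and digits rather than the server's character set.

```python
import re

# Anything other than a letter, digit, "_" or "'" separates stopwords,
# so newlines, spaces, commas and semicolons all work as separators.
STOPWORD_RE = re.compile(r"[\w']+")

def parse_stopword_file(contents):
    """Set of stopwords in a free-form stopword file (lowercased)."""
    return {w.lower() for w in STOPWORD_RE.findall(contents)}

sample = "the, and\nof;a  don't\tmy_table"
print(sorted(parse_stopword_file(sample)))
# ['a', 'and', "don't", 'my_table', 'of', 'the']
```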
The 50% threshold for natural language searches is determined by the particular weighting scheme chosen.
To disable it, look for the following line in myisam/ftdefs.h:
#define GWS_IN_USE GWS_PROB
Change that line to this:
#define GWS_IN_USE GWS_FREQ
Then recompile MySQL. There is no need to rebuild the indexes in this case.
By making this change, you severely decrease MySQL's ability to provide adequate relevance values for the MATCH() function.
If you really need to search for such common words, it would be better to search using IN BOOLEAN MODE instead, which does not observe
the 50% threshold.
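The effect of the 50% threshold can be modelled with a simple document-frequency count. The sketch below approximates the GWS_PROB weighting and is not MySQL's code. It flags every word that occurs in at least half of the rows, which is the set of words a natural language MATCH() cannot usefully search for. It also shows why natural language searches on a table with only a few rows often return nothing.

```python
import re
from collections import Counter

def common_words(rows):
    """Words present in 50% or more of the rows (simplified model)."""
    doc_freq = Counter()
    for row in rows:
        # Count each word once per row, as document frequency does.
        doc_freq.update(set(re.findall(r"[\w']+", row.lower())))
    return {word for word, n in doc_freq.items() if 2 * n >= len(rows)}

rows = ["mysql tuning guide", "mysql full-text search", "postgres vacuum notes"]
print(sorted(common_words(rows)))            # ['mysql']
print(sorted(common_words(["a b", "c d"])))  # ['a', 'b', 'c', 'd']
```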
To change the operators used for boolean full-text searches, set the ft_boolean_syntax system variable.
This variable can be changed while the server is running, but you must have the SUPER privilege to do so.
No rebuilding of indexes is necessary in this case. See Section 5.1.3, “System Variables”, which describes the rules governing how to set this variable.
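Because a malformed value is rejected, it can help to check a candidate string before running SET GLOBAL ft_boolean_syntax = '...'. The Python sketch below paraphrases the rules as I understand them from the manual: 14 ASCII non-alphanumeric characters, a space in the first or second position, and no repeated characters except that the two phrase-quoting operators in positions 11 and 12 may match each other. Treat Section 5.1.3 as the authority.

```python
def check_ft_boolean_syntax(value):
    """List of rule violations for a proposed ft_boolean_syntax value;
    an empty list means it looks valid."""
    if len(value) != 14:
        return ["must be exactly 14 characters"]
    problems = []
    if any(ord(ch) > 127 or ch.isalnum() for ch in value):
        problems.append("each character must be an ASCII non-alphanumeric character")
    if " " not in value[:2]:
        problems.append("first or second character must be a space")
    # Positions 11 and 12 (phrase quotes) may equal each other; nothing else may repeat.
    quotes = sorted(set(value[10:12]))
    chars = list(value[:10]) + quotes + list(value[12:])
    if len(set(chars)) != len(chars):
        problems.append("duplicate operator characters")
    return problems

print(check_ft_boolean_syntax('+ -><()~*:""&|'))  # [] (the default value)
print(check_ft_boolean_syntax('+ -><()~*:""&'))   # ['must be exactly 14 characters']
```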
If you want to change the set of characters that are considered word characters, you can do so in two ways.
Suppose that you want to treat the hyphen character ('-') as a word character. Use either of these methods:
Modify the MySQL source: In myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-'
to one of those macros and recompile MySQL.
Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type”
table to distinguish letters and numbers from other characters. You can edit the <ctype><map> contents in one of the
character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes.
After making the modification, you must rebuild the indexes for each table that contains any FULLTEXT indexes.
If you modify full-text variables that affect indexing (ft_min_word_len, ft_max_word_len, or ft_stopword_file),
or if you change the stopword file itself, you must rebuild your FULLTEXT indexes after making the changes and restarting the server.
To rebuild the indexes in this case, it is sufficient to do a QUICK repair operation:
mysql> REPAIR TABLE tbl_name QUICK;
Each table that contains any FULLTEXT index must be repaired as just shown. Otherwise, queries for the table may yield incorrect results,
and modifications to the table will cause the server to see the table as corrupt and in need of repair.
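The rules scattered through this section (restart, rebuild, recompile) can be collected into a small lookup table. In the Python sketch below, the system variable names come from the text, but the other keys are descriptive labels invented for this example. The restart entries for source and character-set changes assume you restart after installing the new build or files. For each affected table, the sketch also emits the QUICK repair statement shown above, using hypothetical table names.

```python
# What each kind of change requires, per this section:
# (restart server, rebuild FULLTEXT indexes, recompile MySQL)
RULES = {
    "ft_min_word_len": (True, True, False),
    "ft_max_word_len": (True, True, False),
    "ft_stopword_file": (True, True, False),
    "stopword file contents": (True, True, False),
    "ft_boolean_syntax": (False, False, False),  # dynamic, needs SUPER
    "GWS_IN_USE": (True, False, True),           # edit myisam/ftdefs.h
    "word characters in source": (True, True, True),
    "word characters in charset XML": (True, True, False),
}

def plan(changes, fulltext_tables):
    """Combine the requirements of several changes into one action plan."""
    restart = any(RULES[c][0] for c in changes)
    rebuild = any(RULES[c][1] for c in changes)
    recompile = any(RULES[c][2] for c in changes)
    repairs = ([f"REPAIR TABLE {t} QUICK;" for t in fulltext_tables]
               if rebuild else [])
    return {"recompile": recompile, "restart": restart, "repair": repairs}

print(plan(["ft_boolean_syntax"], ["articles"]))
# {'recompile': False, 'restart': False, 'repair': []}
print(plan(["ft_min_word_len", "ft_boolean_syntax"], ["articles", "posts"]))
# {'recompile': False, 'restart': True, 'repair': ['REPAIR TABLE articles QUICK;', 'REPAIR TABLE posts QUICK;']}
```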
Note that if you use myisamchk to perform an operation that modifies table indexes (such as repair or analyze),
the FULLTEXT indexes are rebuilt using the default full-text parameter values for minimum word length, maximum word length,
and stopword file unless you specify otherwise. This can result in queries failing.
The problem occurs because these parameters are known only by the server. They are not stored in MyISAM index files.
To avoid the problem if you have modified the minimum or maximum word length or stopword file values used by the server,
specify the same ft_min_word_len, ft_max_word_len, and ft_stopword_file values to myisamchk that you use for mysqld.
For example, if you have set the minimum word length to 3, you can repair a table with myisamchk like this:
shell> myisamchk --recover --ft_min_word_len=3 tbl_name.MYI
To ensure that myisamchk and the server use the same values for full-text parameters, place each one in both the [mysqld] and [myisamchk]
sections of an option file:
[mysqld]
ft_min_word_len=3
[myisamchk]
ft_min_word_len=3
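A quick way to confirm that the two sections agree is to parse the option file and compare the ft_* settings. The Python sketch below uses configparser, which handles simple my.cnf files. It does not handle !include/!includedir directives or multiple option files. It treats '-' and '_' in option names as equivalent, as MySQL does, and compares values as plain strings, so a quoted and an unquoted path would be reported as different.

```python
import configparser

FT_KEYS = ("ft_min_word_len", "ft_max_word_len", "ft_stopword_file")

def ft_settings(parser, section):
    """Full-text options in one section, with '-' normalized to '_'."""
    if not parser.has_section(section):
        return {}
    return {key.replace("-", "_"): value
            for key, value in parser.items(section)
            if key.replace("-", "_") in FT_KEYS}

def ft_mismatches(option_text):
    """Map each ft_* option whose [mysqld] and [myisamchk] values differ to
    (mysqld_value, myisamchk_value); None means "not set, built-in default"."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False,
                                       interpolation=None)
    parser.read_string(option_text)
    server = ft_settings(parser, "mysqld")
    checker = ft_settings(parser, "myisamchk")
    return {key: (server.get(key), checker.get(key))
            for key in FT_KEYS if server.get(key) != checker.get(key)}

sample = """
[mysqld]
ft_min_word_len=3
skip-external-locking

[myisamchk]
ft-min-word-len=3
ft_max_word_len=20
"""
print(ft_mismatches(sample))  # {'ft_max_word_len': (None, '20')}
```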
An alternative to using myisamchk is to use the REPAIR TABLE, ANALYZE TABLE, OPTIMIZE TABLE, or ALTER TABLE statements.
These statements are performed by the server, which knows the proper full-text parameter values to use.
After changing the stopword file, it is not wise to use REPAIR TABLE tbl_name QUICK, as suggested
in the documentation, when you have lots of records.
I had a table with about 4 million records, and at first I fell into this trap. The repair took more than 10 days.
After that I tried DROP INDEX and CREATE INDEX. That took only 40 minutes!
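The drop-and-re-create route can be scripted. The Python sketch below builds a single ALTER TABLE statement that drops and re-adds a FULLTEXT index; MySQL maps DROP INDEX and CREATE INDEX to ALTER TABLE anyway. Combining both steps in one statement is a variation on the commenter's two separate statements, so time it on a copy of your data first. The table, index and column names are hypothetical.

```python
def quote_ident(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def rebuild_fulltext_sql(table, index, columns):
    """ALTER TABLE statement that drops and re-creates one FULLTEXT index."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (f"ALTER TABLE {quote_ident(table)} DROP INDEX {quote_ident(index)}, "
            f"ADD FULLTEXT INDEX {quote_ident(index)} ({cols});")

print(rebuild_fulltext_sql("articles", "ft_body", ["title", "body"]))
# ALTER TABLE `articles` DROP INDEX `ft_body`, ADD FULLTEXT INDEX `ft_body` (`title`, `body`);
```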
Query cache & min word length changes:
Using REPAIR ... QUICK after altering the min word length setting can make it appear that
the full text index is not working properly if the query cache is enabled.
After using REPAIR to rebuild the full-text index, I was puzzled to still get zero rows
for some queries that should by then have returned rows. After using DROP INDEX and CREATE INDEX I got the expected results.
My guess is that this is because of the query cache not being flushed for REPAIR as it is for ALTER TABLE (which is what DROP /
CREATE INDEX maps to).
I don't know whether this issue is a candidate for a bug report / feature request or just a documentation update though.
How I added '-' to the list of word characters:
The documentation is weak in two regards: (1) it doesn't explain how to modify the map and (2) it doesn't touch on
the implications of doing so. I'll try to solve (1), but cannot begin to speak to (2).
The character set files are in the directory given by the "character_sets_dir" system variable (use SHOW VARIABLES
to see it), which is typically compiled in as "/usr/share/mysql/charsets". The name of the file is given by the "character_set_..."
variables. Typically the default is "latin1", so the file I needed to change was /usr/share/mysql/charsets/latin1.xml.
The <ctype><map> is the one we are after (other maps are "upper", "lower", "unicode" and the various collation maps).
The "ctype" map differs from the others in that it has a leading 0x00 before the character map, the meaning of which is unclear to me.
Each entry of the map appears to classify the corresponding character according to the following bitmask:
0x01 Upper-case word character
0x02 Lower-case word character
0x04 Decimal digit
0x08 Printer control (Space/TAB/VT/FF/CR)
0x10 Not-white, not a word
0x20 Control-char (0x00 - 0x1F)
0x80 Hex digit (0-9, a-f, A-F)
In my case, I needed the dash '-', but nothing else, so I altered the corresponding character position (0x2D - third row,
third from the right) from 0x10 (Not-white, not a word) to 0x01 (Upper-case word).
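The position arithmetic is easy to get wrong, so here is a small Python sketch. It decodes a ctype entry using the bit meanings listed in this comment, which are the commenter's interpretation rather than an official table. It also locates a character in the map as laid out in latin1.xml: one leading 00 entry, then rows of 16 entries. My understanding is that the extra leading entry exists because MySQL's ctype lookups index the array at offset +1, so character c sits at index c + 1 of the flat list. Treat that as an assumption.

```python
# Bit meanings as listed in the comment above.
CTYPE_BITS = {
    0x01: "Upper-case word character",
    0x02: "Lower-case word character",
    0x04: "Decimal digit",
    0x08: "Printer control (Space/TAB/VT/FF/CR)",
    0x10: "Not-white, not a word",
    0x20: "Control-char (0x00 - 0x1F)",
    0x80: "Hex digit (0-9, a-f, A-F)",
}

def describe_ctype(value):
    """Names of the flags set in one <ctype><map> entry."""
    return [name for bit, name in CTYPE_BITS.items() if value & bit]

def map_position(ch):
    """(row, column from the left) of a character in the 16-wide rows
    that follow the leading 00 entry; both are 1-based."""
    code = ord(ch)
    return code // 16 + 1, code % 16 + 1

def mark_as_word_char(entries, ch):
    """Copy of the flat ctype list (leading entry included) with `ch`
    reclassified as 0x01, the upper-case word flag used above."""
    patched = list(entries)
    patched[ord(ch) + 1] = 0x01
    return patched

print(map_position("-"))     # (3, 14): third row, third from the right
print(map_position("'"))     # (3, 8)
print(describe_ctype(0x10))  # ['Not-white, not a word']
```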
There is little on the web to address this, but some commentary in the forums suggested that this was NOT the way to do it,
but rather that one should write one's own full-text engine, as changing the <ctype> map has implications for the SQL parser.
This may be true, but I suspect SQL parsing would require a stricter classification of characters.
The SQL statement "SELECT a-b FROM test" worked for me after this change.
Altering latin1.xml and restarting the server had the desired result.
Finally, there does not appear to be a way to create a new character set or collation without recompiling.
If this is true, it might be desirable for the standard distribution to include a "custom" character set for just this sort of thing.
Based on your example with the dash `-`, I tried to make the single quote `'` (a word character by default) a non-word character.
I checked an ASCII table: the single quote corresponds to the hexadecimal value 27.
I opened the file share/mysql/charsets/latin1.xml and went to the upper map (0x27 is on the 3rd row, 8th column from the left).
I went to the same position in the ctype map and, surprise: this character is already set to 0x10
(Not-white, not a word), even though it behaves as a word character in my tests!
From there, I'm pretty lost. Why isn't the single quote treated as a non-word character, as it should be?
Modifying the MySQL source in myisam/ftdefs.h does work. I modified this line:
#define misc_word_char(X) ((X)=='\'')
Is that the only way?
About BLOBs in the database: I'm guessing that they are images. I tried that once and abandoned it because it was too slow.
But at that time I had the DB and the web server on the same machine. In your case you might not want to store any dynamic data on the
web server, hence your solution of storing the images in the DB.
Otherwise, the general recommendation for images (unless you have some other good reason to keep the BLOBs in the DB, such as simpler replication)
is to give each image a system-specific name, store it as a normal file under that name in a directory on the web server,
and store the info and URL in the database. That way you avoid having a lot of data in the DB
(which speeds things up), and you also avoid the overhead of MySQL retrieving and transferring the data, which is greater than
the cost of the web server simply reading the file from the file system.
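Here is a minimal sketch of the file-based approach. It assumes the "system-specific name" is derived from a SHA-256 hash of the image bytes, which is one choice among many; a database ID or a UUID would also work. The directory /var/www/images and the URL prefix /images are hypothetical. Only the returned URL and the metadata would be stored in MySQL.

```python
import hashlib
from pathlib import PurePosixPath

def image_location(image_bytes, extension,
                   web_root="/var/www/images", url_prefix="/images"):
    """Return (file_path, url) for an image named after its content hash."""
    name = hashlib.sha256(image_bytes).hexdigest() + "." + extension.lstrip(".")
    file_path = PurePosixPath(web_root) / name
    return str(file_path), f"{url_prefix}/{name}"

path, url = image_location(b"fake image bytes", "jpg")
print(path)
print(url)
```

Identical uploads map to the same file name, so duplicates are stored only once. The trade-off is that the name says nothing about the image, so descriptive data belongs in the database row.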
Another tip I used (this mostly applies if you are using MyISAM tables) is to separate the BLOBs from the other info.
Suppose my table looked something like:
id, name, description, comment, ... , myImageBLOB
and I regularly search on the other columns, for example for a word or phrase in the description or comment fields, and don't
need the BLOB all the time. Then I split the BLOB into a separate table:
id, name, description, comment, ... ,
This also sped things up.
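The split can be sketched with Python's built-in sqlite3 module standing in for MySQL. This only illustrates the schema and the two query paths. It says nothing about MyISAM performance, and the table names item and item_image are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        name TEXT, description TEXT, comment TEXT
    );
    -- The BLOB lives in its own table, keyed by the same id.
    CREATE TABLE item_image (
        id INTEGER PRIMARY KEY REFERENCES item(id),
        image BLOB
    );
""")
conn.execute("INSERT INTO item VALUES (1, 'logo', 'company logo', 'png file')")
conn.execute("INSERT INTO item_image VALUES (1, ?)", (b"fake image bytes",))

# Frequent searches read only the narrow table.
hits = conn.execute("SELECT id, name FROM item WHERE description LIKE ?",
                    ("%logo%",)).fetchall()
print(hits)  # [(1, 'logo')]

# The BLOB is fetched by id only when it is actually needed.
(image,) = conn.execute("SELECT image FROM item_image WHERE id = ?",
                        (hits[0][0],)).fetchone()
print(len(image))  # 16
```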
But maybe you can't change anything in the application, so we can concentrate on tuning the server.
For accurate results after changing a my.cnf parameter, it is recommended to restart the MySQL
server and let it run for 48 hours before evaluating the change. After a restart, MySQL clears all memory and allocates it anew, so the caches need time to refill.
Base any further modification on the logs from those 48 hours.
Regarding read_buffer_size: each thread that does a sequential scan allocates a buffer of this size for each table it scans.
If you do many sequential scans, you might want to increase this value.
I had to do some fine tuning of MySQL 4.1.9; here is what the my.cnf file looks like for a 2 GHz machine with 1 GB of memory.
# MySQL 4.x has query caching available.
# Enable it for vast improvement and it may be all you need to tweak.
# Reduced to 200 as memory will not be enough for 500 connections.
# which is now: 64 + (1 + 1) * 200 = 464 MB
# max_connections = approx. MaxClients setting in httpd.conf file
# Default set to 100.
# Reduced wait_timeout to prevent idle clients holding connections.
# max_connect_errors is set to 10 by default
# Checked opened tables and adjusted accordingly after running for a while.
#tmp_table_size=32M by default
# Reduced it to 32 to prevent memory hogging. Also, see notes below.
# Reduced it by checking current size of *.MYI files, see notes below.
# Commented out the buffer sizes and keeping the default.
# sort_buffer_size=2M by default.
# read_buffer_size=128K by default.
# read_rnd_buffer_size=256K by default.
# myisam_sort_buffer_size=8M by default.
# thread_concurrency = 2 * (no. of CPU)
# Logging slow queries is a must. Check whether many queries take more than 2 seconds.
# If so, then your tables need enhancement.
# Remove the next comment character if you are not familiar with SQL
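The 464 MB figure in the comments appears to follow the common rough estimate key_buffer_size + (read_buffer_size + sort_buffer_size) × max_connections, with 64 MB for the key buffer and 1 MB for each per-connection buffer. The comment does not name the terms, so that reading is an assumption. The Python sketch below reproduces the arithmetic. It ignores other per-thread buffers (such as read_rnd_buffer_size) and the query cache, so treat the result as a rough figure rather than a true worst case.

```python
def estimated_memory_mb(key_buffer_mb, read_buffer_mb, sort_buffer_mb,
                        max_connections):
    """Global key buffer plus per-connection read and sort buffers, in MB."""
    return key_buffer_mb + (read_buffer_mb + sort_buffer_mb) * max_connections

def max_connections_within(budget_mb, key_buffer_mb, read_buffer_mb,
                           sort_buffer_mb):
    """Largest max_connections whose estimate stays within budget_mb."""
    per_connection = read_buffer_mb + sort_buffer_mb
    return int((budget_mb - key_buffer_mb) // per_connection)

print(estimated_memory_mb(64, 1, 1, 200))     # 464, as in the comment
print(estimated_memory_mb(64, 1, 1, 500))     # 1064, more than the 1 GB machine has
print(max_connections_within(512, 64, 1, 1))  # 224
```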
Below are notes on some of the important variables that I took down while tuning the config file.
MySQL 4 provides one feature that can prove very handy: a query cache. When the database has to run the same queries on the same data set repeatedly, returning the same results each time, MySQL can cache the result set. This avoids the overhead of running through the data over and over, and is extremely helpful on busy servers.
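To make the behaviour concrete, here is a toy Python model of a query cache. It is not MySQL's implementation. Results are keyed by the exact query text, and every cached result that reads a table is discarded when that table is written. If an operation changes a table's index contents without going through that invalidation step, stale results can be served. That is consistent with the commenter's guess about REPAIR ... QUICK earlier on this page.

```python
class ToyQueryCache:
    """Toy model: exact-text keys, per-table invalidation."""

    def __init__(self):
        self._results = {}  # query text -> cached result
        self._tables = {}   # query text -> tables the query reads

    def get(self, sql):
        return self._results.get(sql)

    def put(self, sql, tables, result):
        self._results[sql] = result
        self._tables[sql] = set(tables)

    def invalidate(self, table):
        """Drop every cached result that depends on `table` (call on writes)."""
        stale = [sql for sql, tables in self._tables.items() if table in tables]
        for sql in stale:
            del self._results[sql]
            del self._tables[sql]

cache = ToyQueryCache()
query = "SELECT COUNT(*) FROM articles"
cache.put(query, ["articles"], 42)
print(cache.get(query))                            # 42
print(cache.get("select count(*) from articles"))  # None: text differs
cache.invalidate("articles")
print(cache.get(query))                            # None: table changed
```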
The value of key_buffer_size is the size of the buffer used for indexes. The larger the buffer, the more index blocks stay in memory, so SQL commands need fewer disk reads and finish faster. The rule of thumb is to set key_buffer_size to at least a quarter, but no more than half, of the total amount of memory on the server. Ideally, it will be large enough to contain all the indexes (the total size of all .MYI files on the server).
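Both parts of that rule are easy to compute. The Python sketch below totals the .MYI files under a data directory and returns the quarter-to-half range for a given amount of RAM. The data directory path in the commented usage line is a hypothetical example, so use the datadir your server actually reports.

```python
import os

def total_myi_bytes(datadir):
    """Total size of all MyISAM index (.MYI) files under datadir."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(datadir):
        for name in filenames:
            if name.upper().endswith(".MYI"):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

def key_buffer_range(total_ram_bytes):
    """(quarter, half) of RAM: the rule-of-thumb bounds for key_buffer_size."""
    return total_ram_bytes // 4, total_ram_bytes // 2

low, high = key_buffer_range(1024 ** 3)     # a 1 GB machine
print(low // 1024 ** 2, high // 1024 ** 2)  # 256 512 (MB)
# print(total_myi_bytes("/var/lib/mysql"))  # hypothetical datadir
```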
A simple way to check the actual performance of the buffer is to examine four status variables (shown by SHOW STATUS): Key_read_requests, Key_reads, Key_write_requests, and Key_writes.
If you divide Key_reads by Key_read_requests, the result should be less than 0.01. Also, if you divide Key_writes by Key_write_requests, the result should be less than 1.
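The two ratios can be checked with a short Python helper fed with values copied from SHOW STATUS. The sample numbers are made up for illustration, and the thresholds are the ones stated above.

```python
def key_buffer_ratios(status):
    """Compute the Key_reads and Key_writes ratios and compare them
    with the thresholds from the text."""
    # max(..., 1) avoids dividing by zero on a freshly started server.
    reads = status["Key_reads"] / max(status["Key_read_requests"], 1)
    writes = status["Key_writes"] / max(status["Key_write_requests"], 1)
    return {"read_ratio": reads, "read_ok": reads < 0.01,
            "write_ratio": writes, "write_ok": writes < 1}

sample = {"Key_read_requests": 1_000_000, "Key_reads": 5_000,
          "Key_write_requests": 20_000, "Key_writes": 15_000}
print(key_buffer_ratios(sample))
# {'read_ratio': 0.005, 'read_ok': True, 'write_ratio': 0.75, 'write_ok': True}
```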
The default table_cache value is 64. Each time MySQL accesses a table, it places it in the cache. If the system accesses many tables, it is faster to have these in the cache. MySQL, being multi-threaded, may be running many queries on the same table at one time, and each of these will open the table. Examine the value of Open_tables at peak times. If you find that it stays at the same value as your table_cache setting while Opened_tables starts increasing rapidly, you should increase table_cache if you have enough memory.
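Here is a sketch of that check in Python. It takes two Opened_tables samples a known number of seconds apart at peak time. What counts as "rapidly" is left to your judgement, so the helper only reports the rate. The sample numbers are made up.

```python
def table_cache_report(table_cache, open_tables, opened_before, opened_after,
                       seconds_between):
    """Is the cache full, and how fast are tables being (re)opened?"""
    return {
        "cache_full": open_tables >= table_cache,
        "opened_per_second": (opened_after - opened_before) / seconds_between,
    }

print(table_cache_report(64, 64, 1000, 1600, 60))
# {'cache_full': True, 'opened_per_second': 10.0}
```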
The sort buffer (sort_buffer_size) is very useful for speeding up myisamchk operations (which is why it is set much higher for that purpose in the default configuration files), but it can also be useful every day when performing large numbers of sorts.
The read_rnd_buffer_size is used after a sort, when reading rows in sorted order. If you use many queries with ORDER BY, increasing it can improve performance. Remember that, unlike key_buffer_size and table_cache, this buffer is allocated for each thread. This variable was renamed from record_rnd_buffer in MySQL 4.0.3. It defaults to the same size as the read_buffer_size. A rule of thumb is to allocate 1 KB for each 1 MB of memory on the server, for example 1 MB on a machine with 1 GB of memory.
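The rule of thumb is a 1:1024 ratio, and because the buffer is per thread, the total can grow with the number of connections. Here is a minimal Python sketch of both numbers, using 200 connections as an example figure.

```python
def read_rnd_buffer_suggestion(total_ram_bytes):
    """1 KB of read_rnd_buffer_size per 1 MB of RAM (a 1:1024 ratio)."""
    return total_ram_bytes // 1024

def per_thread_total(buffer_bytes, threads):
    """Memory used if every thread allocates the buffer at the same time."""
    return buffer_bytes * threads

size = read_rnd_buffer_suggestion(1024 ** 3)     # 1 GB of RAM
print(size // 1024 ** 2)                         # 1 (MB)
print(per_thread_total(size, 200) // 1024 ** 2)  # 200 (MB)
```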
If you have a busy server that gets a lot of quick connections, set thread_cache_size high enough that the Threads_created value in SHOW STATUS stops increasing. This should take some of the load off the CPU.
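A small Python helper for that check compares two SHOW STATUS samples. It also reports Threads_created divided by Connections over the same interval, a commonly used measure of how often a new thread had to be created. That extra ratio is not part of the advice above. The sample values are made up.

```python
def thread_cache_report(created_before, created_after,
                        connections_before, connections_after):
    """Compare two samples of Threads_created and Connections."""
    new_threads = created_after - created_before
    new_connections = connections_after - connections_before
    return {
        "still_increasing": new_threads > 0,
        "new_thread_fraction": (new_threads / new_connections
                                if new_connections else 0.0),
    }

print(thread_cache_report(100, 350, 5000, 6000))
# {'still_increasing': True, 'new_thread_fraction': 0.25}
```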
Created_tmp_disk_tables is the number of implicit (internal) temporary tables created on disk while executing statements, and Created_tmp_tables is the total number of implicit temporary tables, whether in memory or on disk. Going to disk instead of memory all the time is clearly bad for performance.
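The MySQL manual describes Created_tmp_tables as counting all implicit temporary tables, so the share that went to disk is a simple ratio of the two status values. The Python sketch below computes it, using made-up numbers. If the share is high, the tmp_table_size setting discussed in the config notes is one place to look.

```python
def tmp_disk_share(created_tmp_disk_tables, created_tmp_tables):
    """Fraction of implicit temporary tables that were created on disk."""
    if created_tmp_tables == 0:
        return 0.0
    return created_tmp_disk_tables / created_tmp_tables

print(tmp_disk_share(250, 1000))  # 0.25
```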