There are many ways to optimize MySQL. I think, understanding the basic issues and discussing the fixes itself would be a great read and that’s what has been the effort here.
Let’s see how to increase the perfomance of MySQL by adding indexes and using load data infile.
1. Improper or non-existent indexes:
Improper or non-existent indexes(indexes not been added to respective or any fields in the MySQL table)are one of the primary reasons for poor performance. Identifying and fixing them can often lead to great improvements.
Let’s see an example:
We have an employee table with 7 fields, note that none of the fields are indexed.
Assume that we are trying to find the monthly salary of an employees, say Jim Jone whose employee number is 101800.For this we use
MySQL has no clue where to find this record and has to look through the entire table, potentially thousands of records to find Jim’s data.
Using an Index can solve this problem real quick. An index is a separate file that is sorted, and contains only the field/s you’re interested in sorting on. End result? You save a good amount of time!
Before we alter the table structure above, let me share an important hint for anyone serious about optimizing their queries: EXPLAIN, the keyword in MySQL that will help you figure out what is wrong!
EXPLAIN shows how your queries are being used. By putting it before a SELECT, you can see whether indexes are being used properly, and what kind of join is being performed etc.
For example:
table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|
employee | ALL | NULL | NULL | NULL | NULL | 2 | where used |
Let’s see what are all these things?
- table: Shows us which table the output is about (for when you join many tables in the query)
- type: It tells us which type of join is being used. From best to worst the types are: system, const, eq_ref, ref, range, index, all
- possible_keys: Shows which possible indexes apply to this table
- key: Which one is actually used
- key_len: Give us the length of the key used. The shorter that better.
- Ref: Tells us which column, or a constant, is used
- rows: Number of rows mysql believes it must examine to get the data
- extra: Extra info – the bad ones to see here are “using temporary” and “using filesort”
Looks like our query is the worst of the worst! There are no possible keys to use, so MySQL has to go through all the records and forms a heavy traffic! Now let’s add the index we talked about earlier.
If we re-run the EXPLAIN, we get:
table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|
employee | const | PRIMARY | PRIMARY | 10 | const | 1 | where used |
The query above is a good one. The type of “join” is “const”, which means that the table has only one matching row. The primary key is being used to find this particular record, and the number of rows is 1.
Little knowledge of how indexes work make you to use them more efficiently. Firstly, note that when you update a table with an index, you have to update the index as well.
What about if you want to select on more than one criteria? (As you can see, it only makes sense to index those fields you use in the WHERE clause.) The query:
makes no use of an index at all. An index on firstname is useless. But,
would benefit from an index on surname.
Let’s see an example for an improper indexing:
Now let’s go for a complex one. We want to list all the employees where half of their overtime rate is less than 20. Normally, you correctly decide to add an index on overtime_rate, seeing as that’s the column in the where clause.
Now let’s run the query.
table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|
employee | ALL | NULL | NULL | NULL | NULL | 2 | where used |
Oops! Something went wrong! The problem lies in the “overtime_rate/2” part of your query. Every overtime_rate (and hence every record) has to be read in order to divide it by 2. So we should try and leave the indexed field alone, and not perform any calculations on it. Let’s try this query this way:
table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|
employee | range | overtime_rate | overtime_rate | 4 | NULL | 1 | where used |
Seems better! MySQL can perform the 20*2 calculation once, and then search the index for this constant. So keep your indexed field standing alone in the comparison, so that MySQL can use it to search.
Ordering by surname is a common requirement, so it would make sense to create an index on surname.
But in this example our employee table consists of thousands of people from place, and with the same surname. So we need to index on firstname as well. The good news is that MySQL uses leftmost prefixing, which means that a multi-field index A,B,C will also be used to search not only for a,b,c combinations, but also A,B as well as just A.
In our example, this means that an index of the type
is used for a queries such as
as well as
which both result in
However, the query
does not use an index, as firstname is not available from the left of the index, as shown below.
If you needed this kind of query, you would have to add a separate index on firstname.
2 Optimizing with LOAD DTA INFILE
Most of the sites need to be highly optimized for selects – consider a news site which performs millions of queries per day, but where the data arrives in large batches of text files. So for parts of the day, inserts need to be optimal, without noticeably affecting the millions trying to access the data.
The best way to insert the data is to use MySQL’s “LOAD DATA INFILE”. This is much faster (20 times according to MySQL).
The syntax is easy, and the code becomes a lot simpler too:
$mydb->query(“LOAD DATA INFILE ‘datafile.txt’ INTO TABLE employee (employee_number,firstname,surname,tel_no,salary) FIELDS TERMINATED BY ‘|'”);
LOAD DATA INFILE has defaults of:
FIELDS TERMINATED BY ‘t’ ENCLOSED BY ” ESCAPED BY ”
if you don’t specify any of these clauses. And, just as with an ordinary insert, you need to specify a field list if the order of the fields is different, or, as in the example above, you’re not inserting data for every field. Always specifying a field list is good practice for all queries anyway – if someone adds a field to the table at a later stage, you don’t want to go back and have to fix all your previous INSERT and SELECT * statements.
If you can’t get this to work properly, have a look at the format of your text file – every problem I’ve seen with LOAD DATA has been because of a corrupted text file. Every field in every row must be delimited correctly!
Summary
It’s not only INSERT that needs to be quick – sometimes you need to DELETE quickly too.
Don’t do a:
DELETE FROM <tablename>;
Rather, use:
TRUNCATE TABLE <tablename>;
The difference here is that DELETE drops records one by one, and that can be 1 million one by one’s too slow!
This has only been a brief introduction to optimizing queries and indexes. The best way is to try and rewrite the query in as many different ways as possible, and see which one runs more efficiently (Don’t forget the EXPLAIN!).