Performance analysis and optimization of MySQL LIMIT usage and paginated queries

First, the use of LIMIT

When writing query statements, we often need to return just the first few rows, or a range of rows from the middle of the result set. What should we do then? Don't worry: MySQL already provides this feature for us.

SELECT * FROM table LIMIT [offset,] rows | rows OFFSET offset
(that is, LIMIT offset, length)
SELECT *
FROM table
WHERE condition1 = 0
  AND condition2 = 0
  AND condition3 = -1
  AND condition4 = -1
ORDER BY id ASC
LIMIT 2000 OFFSET 50000

The LIMIT clause can be used to force a SELECT statement to return a specified number of records. LIMIT accepts one or two numeric arguments, which must be integer constants. If two arguments are given, the first specifies the offset of the first row to return and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1). For compatibility with PostgreSQL, MySQL also supports the syntax LIMIT rows OFFSET offset.

mysql> SELECT * FROM table LIMIT 5,10;  -- returns record rows 6-15
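
The PostgreSQL-compatible form mentioned above returns the same rows:

mysql> SELECT * FROM table LIMIT 10 OFFSET 5;  -- same result: rows 6-15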

To retrieve all record rows from a given offset to the end of the result set, you can pass some very large number as the second parameter (note that MySQL, unlike some other databases, does not accept -1 here):

mysql> SELECT * FROM table LIMIT 95,18446744073709551615;

If only one argument is given, it specifies the maximum number of record rows to return:

mysql> SELECT * FROM table LIMIT 5;

In other words, LIMIT n is equivalent to LIMIT 0,n.

Second, performance analysis of MySQL's paginated query statements

Compared with MSSQL's TOP syntax, MySQL's LIMIT syntax for paginated SQL is far more elegant, so it is only natural to use it for pagination.

The most basic pagination method:

SELECT ... FROM ... WHERE ... ORDER BY ... LIMIT ... 

With small and medium data volumes, such a SQL statement is entirely sufficient; the only thing to watch is that the query uses an index.
For example, if the actual SQL resembles the statement below, it is better to build a composite index on the two columns category_id and id (see the sketch after the query):

SELECT * FROM articles WHERE category_id = 123 ORDER BY id LIMIT 50, 10
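
A minimal way to create such a composite index (the index name idx_category_id is only illustrative):

ALTER TABLE articles ADD INDEX idx_category_id (category_id, id);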

The subquery pagination method:

As the data volume grows, the number of pages increases, and the SQL for pages far into the result set might look like this:
SELECT * FROM articles WHERE category_id = 123 ORDER BY id LIMIT 10000, 10

In short, the further back you page, the larger the offset in the LIMIT clause and the slower the statement becomes.
At this point, we can improve pagination efficiency with a subquery, roughly as follows:

SELECT * FROM articles WHERE category_id = 123 AND id >=
(SELECT id FROM articles WHERE category_id = 123 ORDER BY id LIMIT 10000, 1)
ORDER BY id LIMIT 10

The JOIN pagination method:

SELECT * FROM `content` AS t1
JOIN (SELECT id FROM `content` ORDER BY id DESC LIMIT ".($page-1)*$pagesize.", 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT $pagesize;
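
The snippet above is SQL embedded in PHP string concatenation. With, say, $page = 501 and $pagesize = 10 (illustrative values), it would produce:

SELECT * FROM `content` AS t1
JOIN (SELECT id FROM `content` ORDER BY id DESC LIMIT 5000, 1) AS t2
WHERE t1.id <= t2.id ORDER BY t1.id DESC LIMIT 10;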

In my tests, JOIN pagination and subquery pagination are at essentially the same level of efficiency; the time each consumes is nearly identical.
The EXPLAIN output of the SQL statement:

id  select_type  table       type    possible_keys  key      key_len  ref   rows   Extra
1   PRIMARY      <derived2>  system  NULL           NULL     NULL     NULL  1
1   PRIMARY      t1          range   PRIMARY        PRIMARY  4        NULL  6264   Using where
2   DERIVED     content     index   NULL           PRIMARY  4        NULL  27085  Using index

Why is that? Because the subquery works entirely on the index, while an ordinary query works on the data file; the index file is generally much smaller than the data file, so the operation is more efficient.

In fact, you can use a similar strategy pattern to handle pagination: for example, if the requested page is within the first 100 pages, use the most basic pagination method; beyond 100 pages, use subquery pagination.

Third, LIMIT pagination has serious performance problems on MySQL tables with large data volumes.

Fetch 30 records starting after the 1,000,000th row:

-- Slow: scans past 1,000,000 full rows before returning 30
SELECT * FROM `cdb_posts` ORDER BY pid LIMIT 1000000, 30;

-- Fast: first locates the starting pid via the index, then reads 30 rows
SELECT * FROM `cdb_posts` WHERE pid >=
(SELECT pid FROM `cdb_posts` ORDER BY pid LIMIT 1000000, 1)
ORDER BY pid LIMIT 30;
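
To see the difference yourself, run EXPLAIN on both statements; assuming pid is the indexed primary key, the plan for the subquery's inner SELECT should show it reading only the index:

EXPLAIN SELECT * FROM `cdb_posts` ORDER BY pid LIMIT 1000000, 30;
EXPLAIN SELECT pid FROM `cdb_posts` ORDER BY pid LIMIT 1000000, 1;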

The first statement has to fetch the full contents of every field, so it must cross and read a large number of data blocks; the second locates the starting position directly through the index field and only then reads the corresponding rows, so its efficiency is naturally much higher. The optimized form does not apply LIMIT with a large offset directly: it first obtains the id at the offset, and then uses LIMIT with just the page size to fetch the data.

It can be seen that the further back you page, the larger the LIMIT offset and the more pronounced the speed difference between the two statements becomes.


Optimization idea: avoid scanning too many records when the data volume is large.

To keep the indexed column contiguous, you can add an auto-increment field to each table and index it; pagination can then seek directly on that column, as the sketch below shows.
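
A minimal sketch of this idea, assuming an auto-increment id column and that the application remembers the id of the last row it displayed (last_seen_id below is a placeholder supplied by the caller):

-- Seek on the contiguous auto-increment index instead of scanning past a large offset
SELECT * FROM articles WHERE id > last_seen_id ORDER BY id LIMIT 30;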

Fourth, how to implement paginated queries and the total page count with LIMIT

1. LIMIT pagination

LIMIT pagination: curPage is the current page; pageSize is the number of records per page. Since LIMIT arguments must be integer constants, the expression below has to be evaluated in application code before the statement is sent:

SELECT * FROM student LIMIT (curPage-1)*pageSize, pageSize;
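
For example, with curPage = 3 and pageSize = 10, the statement actually sent to the server would be:

SELECT * FROM student LIMIT 20, 10;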

2. Total number of pages

(1) Total number of pages: totalRecord is the total number of records; pageSize is the number of records per page

int totalPageNum = (totalRecord + pageSize - 1) / pageSize;  // integer division rounds up, so a partial last page is counted

(2) Query the total count: totalRecord is the total number of records

SELECT COUNT(*) FROM courses
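
If you prefer, both steps can be combined in a single statement; a small sketch, assuming a page size of 10 (CEIL rounds the decimal division up):

SELECT CEIL(COUNT(*) / 10) AS total_pages FROM courses;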
