SQL Server Denali

The introduction of paging within SQL Server Denali will have made a significant number of developers happy, all of which will of previously created home-baked solutions to the same problem. All the solutions have the same underlying problem – paging is by its nature is inefficient. Most solutions use the row number analytic function, and then sub-select data from that. For a large dataset that presents a problem – the data has to be fully scanned, sorted and allocated row numbers. A suitable index can eliminate the sort operator, but you still end up scanning the entire index to allocate the row numbers.

In Denali, we can see that they have added support to the Order By clause, to include a starting offsets and syntax to denote how many rows we wish to return. The syntax can be examined on MSDN (http://msdn.microsoft.com/en-us/library/ms188385%28v=SQL.110%29.aspx#Offset) and in brief is:

ORDER BY order_by_expression
    [ COLLATE collation_name ]
    [ ASC | DESC ]
    [ ,...n ]
[ <offset_fetch> ]

<offset_fetch> ::=
{
    OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
    [
      FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression }
      { ROW | ROWS } ONLY
    ]
}

Seeing this new syntax, made me want to try it out and see how the query plans are affected. I am using the trusty Adventure Works as usual – a version for Denali has been put on codeplex, so one quick download later and I was ready to test the new syntax. (Adventure Works download : http://msftdbprodsamples.codeplex.com/releases/view/55330 )

For my tests, I used the production.product table, and wished to page the products based on their name. There is a non-clustered index on the Name field of the product table as well as a clustered index on the product_id, so what would the query plan give?

select * from Production.Product order by name asc 
offset 10 rows fetch first 10 rows only

And the query plan is not very surprising

So even with a new syntax the underlying problem remains, the nature of paging is that you are scanning the data, with statistics io turned on the stats come back with Table ‘Product’. Scan count 1, logical reads 15 etc. not particularly exciting and what we would expect given the table is contained within 15 pages. It was because of the stats though that I noticed an anomaly, in one of the tests, I had dropped to returning only a single row from the table as follows:

select * from Production.Product order by name asc
offset 10 rows fetch first 1 rows only

What I noticed was that the statistics changed to Table ‘Product’. Scan count 1, logical reads 24 – the entire table is contained within 15 pages, so how could it jump to reading 24?

A quick check of the query plan showed what has changed, the engine decided that it was cheaper to use the Name index, which for the purposes of the ordering was narrower and therefore more efficient, and then join back to the main table via the clustered key. Understandable, although the additional pages read is unlikely to make this more efficient, but I doubt you would see much real world difference. An oddity, but nothing really significant in it.

This triggered a more interesting thought, what happens if we reduce our fields so that the index is considered a covering index? is SQL going to get smart when making a selection – so far we have only seen full table scans occurring.

select Name, ProductID from Production.Product order by name asc
offset 20 rows fetch first 10 rows only

The query is now being covered by the name index since the non-clustered index includes the clustered key (ProductID) – and this changes the query plan again, although its pretty subtle change to notice.

The expected index scan appears, but if you look closely at the tooltip for the scan, the number of rows being read in the scan is not the total number of rows in the index, but a product of the offset + the number of rows requested. This was also reflected within the statistics, showing only 2 logical reads – the index uses 6 pages in total. As I changed the number of rows to offset / return the Actual number of rows read changed accordingly. o with a covering index in place, the query engine gets a bit more efficient and does a forward scan of the index until the point at which we have passed a sufficient number of rows. This sounds good – we have avoided scanning the whole index to provide the paged results in a slightly more efficient manner.

Except those with a quick mind will realise that the performance degrades as you go further and further down the list, requesting the 490-500th products will results in 500 rows being checked, not 30. By putting in a covering index we have sacrificed consistency on query times to gain some potential performance – the full scans solutions will broadly speaking take the same time regardless of which 10 rows you might be requesting, since it has to scan, sort, allocate numbers and then sub-select.

As features go, I like the paging – it removes the need for all the different homegrown solutions that are out there, but the performance of it remains a problem – this is no silver bullet to paging performance problems that people have.

Filed under: SQL Server Tagged: Indexes, Query Plan, SQL Server Denali

Latest Images

Trending Articles

Latest Images