Should I reverse order a queryset before slicing the first N records, or count it to slice the last N records?

Question

Let's say I want to get the last 50 records of a query that returns around 10k records, in a table with 1M records. I could do (at the computational cost of ordering):

data = MyModel.objects.filter(criteria=something).order_by('-pk')[:50]

I could also do (at the cost of 2 database hits):

# assume I don't care about new records being added between
# the two queries being executed
index = MyModel.objects.filter(criteria=something).count()
data = MyModel.objects.filter(criteria=something)[index-50:]

Which is better for just an ordinary relational database with no indexing on the criteria (eg postgres in my case; no columnar storage or anything fancy)? Most importantly, why?

Does the answer change if the table or queryset is significantly bigger (eg 100k records from a 10M row table)?


Show source
| django   | postgresql   | sql   2016-12-23 13:12 1 Answers

Answers to Should I reverse order a queryset before slicing the first N records, or count it to slice the last N records? ( 1 )

  1. 2016-12-23 13:12

    This one is going to be very slow

     data = MyModel.objects.filter(criteria=something)[index-50:]
    

    Why because it translates into

     SELECT * FROM myapp_mymodel OFFEST (index-50)
    

    You are not enforcing any ordering here, so the server is going to have to calulcate the result set and jump to the end of it and that's going to involve a lot of reading and will be very slow. Let us not forgot that count() queries aren't all that hot either.

    OTH, this one is going to be fast

    data = MyModel.objects.filter(criteria=something).order_by('-pk')[:50]
    

    You are reverse ordering on the primary key and getting the first 50. And the first 50 you can fetch equally quickly with

    data = MyModel.objects.filter(criteria=something).order_by('pk')[:50]
    

    So this is what you really should be doing

    data1 = MyModel.objects.filter(criteria=something).order_by('-pk')[:50]
    data2 = MyModel.objects.filter(criteria=something).order_by('pk')[:50]
    

    The cost of ordering on the primary key is very low.

Leave a reply to - Should I reverse order a queryset before slicing the first N records, or count it to slice the last N records?

◀ Go back