qsort switching to insertsort

Sun Nov 27 12:51:19 UTC 2016

> From: Alfred Perlstein <alfred at freebsd.org>
> Sent: Sunday, 27 November 2016 4:40 PM
> To: Tristan Verniquet; Mehmet Erol Sanliturk; Arne Dag Fidjestøl
> Cc: Hans Petter Selasky; freebsd hackers
> Subject: Re: qsort switching to insertsort
>     
> A couple notes:
> 
> 1) It's interesting to me that it's based on a simple heuristic as 
> opposed to also checking the depth of the recursion within the qsort 
> itself.
> 
> 2) Wondering if upon detection of this situation it might even be faster 
> to perform some randomization (shuffle) of the data and then retry.
> 
> 3) Wondering if there could be a way to return an error and indicate the 
> data is unsorted if the corner case is hit allowing the caller to make a 
> choice at that point.  Obviously the API would need to be changed.
> 
> I haven't thought too hard on this.

Note that the functionalilty in question is a special-case test which has been bolted on to the side. Nothing much is really lost by removing it, which according to my first link is what NetBSD has done. I don't think we'll get far discussing how to "fix" it as tempting as that is - probably better left for a very in depth studious well tested effort..

The qsort.c file is only 200 odd lines long. I highly recommend reading it to see what I mean (ie the case if swap_cnt == 0).

(The special-case is to turn sorting sorted data from O(nlogn) to O(n), but exposes a whole new set of average-case O(n^2) cases)

>
> One thing to keep in mind: Using qsort in a codepath has "deadlines" or 
> other deterministic needs is not going to work out well.  It's better to 
> use a sort with a known complexity with an upper bound than something 
> that can be defeated by a pathological input data set.
> 
> -Alfred

I guess I have the same comment/question as I do for Hans Petter Selasky's response.

In the cases where qsort would actually be used, don't you think the much higher likelyhood of triggering the insertsort worst case especially with some types of data makes a big difference?

Maybe, I see the insertsort case as similar (though obviously not as likely to be hit) as the "choose the last element" pivot selection which sorts O(n^2) on sorted data. If users need to expect O(n^2) anyway, why not keep the last element pivot selection? Instead, a pretty good effort has been made to make sure bad pivots won't be chosen with normal data. That is undermined slightly with the insertsort switch imho.

Tristan