How come usort (php) works even when not returning integers?

1.4k Views Asked by At

Yesterday at work I stumbled upon a piece of code that was roughly like this:

uasort($array, function($a, $b) {
    return isset($a['sort_code']) && isset($b['sort_code']) && $a['sort_code'] > $b['sort_code'];
});

this flipped a couple of switches in my head seeing as this can only return a boolean true or false instead of the 0 1 or -1 that is clearly stated in php.net. Best case scenario PHP will interpret false as 0 and true as 1. So one (the colleague that wrote this) could argue that it actually works.

I made the argument that it cannot possibly work because it will never ever ever return a -1. There is a state missing in this return-either-true-or-false from the sort callback and the way I think this is you will get either a wrong result or the process is not as performant.

I actually got down to writing a couple of tests and though I am no expert in sorting I actually got to the point where you could actually do away with the -1 state and still get a correct sorting with the same amount of calculations

effectively you could replace:

if ($a === $b) {
    return 0;
}

return $a < $b ? -1 : 1; 

with

return $a > $b;

Now, as I said, I am by no means an expert in sorting and my tests were less than thorough. However my results with small and bigger arrays showed that this is true. The array always got sorted correctly with the amount of calculations being the same using either method

So it really doesnt seem to be that big of a matter that there is no third state (-1)

I still stand by my opinion that you should follow the docs, especially PHP, to the letter. When php says I need either 0, a negative or positive number then that is exactly what one should do. You shouldnt leave stuff like that to chance and watch how things work by accident.

However I am intrigued by all this and would like some insight from somebody who knows more

thoughts? anyone?

3

There are 3 best solutions below

2
Justinas On

You do not have to actually return all -1, 0 and 1 for function to work. Returning 0 would not switch elements in place. Also you can return something like -1345 to act as -1.

Returning only 0 or 1 means you want just to "sink" some elements towards end of array, like have all NULL values at end or start of array and other sorting is not important:

$input = ['c', 'a', null, 'b', null];

usort($input, function ($a, $b) {
    return !is_null($b);
});

// 'b', 'a', 'c', null, null
var_dump($input);

In your case it's sorting to elements without sort_code to end and also sort_code in order

0
mickmackusa On

The logic in your uasort() has logic that caterers to making fewer overall comparisons and therefore (assuming the sorting rules deliver the expected result) executes with higher efficiency.

PHP's uasort() uses Quicksort as far as I can tell, but the docs say you should not rely on this fact (perhaps in case it changes in the future). An SO reference.

Here is a demo that shows that the sort call only makes enough comparisons to potentially break all ties. Employing a three-way comparison ends up being more expensive because more comparisons are required but would likely deliver a different (and better) sort due to the sorting instructions.

$array = [
    ['id' => 1, 'sort_code' => 4],
    ['id' => 2, 'sort_code' => 2],
    ['id' => 3],
    ['id' => 4, 'sort_code' => 8],
    ['id' => 5],
    ['id' => 6, 'sort_code' => 6]
];

uasort($array, function($a, $b) {
    echo "\n" . json_encode($a) . " -vs- " . json_encode($b) . " eval: ";
    echo $eval = (int)(isset($a['sort_code'], $b['sort_code']) && $a['sort_code'] > $b['sort_code']);
    return $eval;
});
echo "\n---\n";
var_export($array);

echo "\n======\n";

uasort($array, function($a, $b) {
    echo "\n" . json_encode($a) . " -vs- " . json_encode($b) . " eval: ";
    echo $eval = ($a['sort_code'] ?? 0) <=> ($b['sort_code'] ?? 0);
    return $eval;
});
echo "\n---\n";
var_export($array);

Output:

{"id":1,"sort_code":4} -vs- {"id":2,"sort_code":2} eval: 1
{"id":1,"sort_code":4} -vs- {"id":3} eval: 0
{"id":3} -vs- {"id":4,"sort_code":8} eval: 0
{"id":4,"sort_code":8} -vs- {"id":5} eval: 0
{"id":5} -vs- {"id":6,"sort_code":6} eval: 0
---
array (
  1 => 
  array (
    'id' => 2,
    'sort_code' => 2,
  ),
  0 => 
  array (
    'id' => 1,
    'sort_code' => 4,
  ),
  2 => 
  array (
    'id' => 3,
  ),
  3 => 
  array (
    'id' => 4,
    'sort_code' => 8,
  ),
  4 => 
  array (
   'id' => 5,
  ),
  5 => 
  array (
    'id' => 6,
    'sort_code' => 6,
  ),
)
======

{"id":2,"sort_code":2} -vs- {"id":1,"sort_code":4} eval: -1
{"id":1,"sort_code":4} -vs- {"id":3} eval: 1
{"id":2,"sort_code":2} -vs- {"id":3} eval: 1
{"id":1,"sort_code":4} -vs- {"id":4,"sort_code":8} eval: -1
{"id":4,"sort_code":8} -vs- {"id":5} eval: 1
{"id":1,"sort_code":4} -vs- {"id":5} eval: 1
{"id":2,"sort_code":2} -vs- {"id":5} eval: 1
{"id":3} -vs- {"id":5} eval: 0
{"id":4,"sort_code":8} -vs- {"id":6,"sort_code":6} eval: 1
{"id":1,"sort_code":4} -vs- {"id":6,"sort_code":6} eval: -1
---
array (
  2 => 
  array (
    'id' => 3,
  ),
  4 => 
  array (
    'id' => 5,
  ),
  1 => 
  array (
    'id' => 2,
    'sort_code' => 2,
  ),
  0 => 
  array (
    'id' => 1,
    'sort_code' => 4,
  ),
  5 => 
  array (
    'id' => 6,
    'sort_code' => 6,
  ),
  3 => 
  array (
    'id' => 4,
    'sort_code' => 8,
  ),
)
0
Olivier On

A comparison sort requires only a total preorder, which is basically a "less than or equal to" relation. In other words, it doesn't matter whether a < b or a = b; you don't need that information to sort, all you need to know is that a <= b. That's why most (if not all) sorting algorithms don't distinguish between values -1 and 0, they only test if the comparison function returns 1 or not 1.

However, to work correctly, the function needs to be transitive (otherwise the relation would not be a total preorder). This is NOT the case of the function given in the original post.

Here is an example:

$array = [
    ['sort_code' => 2],
    [],
    ['sort_code' => 1]
];

uasort($array, function($a, $b) {
    return isset($a['sort_code']) && isset($b['sort_code']) && $a['sort_code'] > $b['sort_code'];
});

var_export($array);

which gives:

array (
  0 => 
  array (
    'sort_code' => 2,
  ),
  1 => 
  array (
  ),
  2 => 
  array (
    'sort_code' => 1,
  ),
)

sort_code 2 comes before 1, which was not intended. That's because, according to the comparison function:

$array[0] <= $array[1]
$array[1] <= $array[2]
$array[0] >  $array[2]

If the relation was transitive, the first two lines would imply:

$array[0] <= $array[2]

which contradicts with what the function says.