Find k Percentiles of a Data Set

39 Views Asked by At

The task is to find the k percentiles in a given unsorted Data Set. A Data set has in reality k - 1 percentiles. What is a percentile?

Percentiles are the k-1 elements in a sorted Data Set where they divide the Data set in k equal sub-sets in size (with +1 or -1).

For example, in the given array: [2 5 7 10 17 21 34 48] the bold numbers are the percentiles for k = 3.

Now the code I have provided you below, is false. It doesn't compute all the values of k. The core idea is that we take the unsorted array and firstly we compute (with select) the m = floor(k/2) index of the Data set which will be near the middle of our data set. Then with partition we have a "somewhat" sorted dataset with the left values of the m being all smaller percentiles, and the right values being all larger percentiles.Then with recursion we do it again for the next (k/2)/2 element.

I can't think of a way that we can compute the other percentiles, and end up with a complexity of Θ(n) = nlog(k)

find_percentiles(A,p,q,k):
  select(A, floor(k/2)
  partition(A,p,q,floor(k/2)) #partition around floor(k/2)

  if (floor(k/2)) ==1: 
    return

  find_percentiles(A,p,floor(k/2), floor(k/2))
1

There are 1 best solutions below

0
Axel Kemper On

The following c# code might be a starting point for your own experiments:

    private static int[] find_percentiles(int[] A, int k)
    {
        int len = A.Length;

        if ((len < 1) || (k < 1))
        {
            //  no percentiles left: return empty array
            return [];
        }

        if (len <= k)
        {
            //  array too short: cannot devise percentiles
            return A;
        }

        //  it is more efficient to use a fast median selection
        //  https://rcoh.me/posts/linear-time-median-finding/
        //  or https://www.techieclues.com/blogs/how-to-calculate-the-median-of-an-array-in-csharp
        int pivot = A.OrderBy(x => x).ElementAt(len / 2); 

        if (k < 2)
        {
            return [pivot];
        }

        int[] upper = A.Where(x => x > pivot).ToArray();
        int[] lower = A.Where(x => x < pivot).ToArray();
        double gap = (len - (k - 1.0)) / ((double)k);
        int kUpper = (int)(upper.Length / (gap + 1));
        int kLower = k - 1 - kUpper;
        int[] upperK = find_percentiles(upper, kUpper);
        int[] lowerK = find_percentiles(lower, kLower);
        return lowerK.Union([pivot]).Union(upperK).ToArray();
    }