Python - Compute Local Minimum In Array Using Prefix Sums


I'm trying to solve the Min Avg Two Slice question from Codility.

I came up with the following code:

def solution(S, P, Q):
    # weight (impact factor) of each nucleotide
    weights = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    retVal = []
    for i in range(0, len(P)):
        if P[i] == Q[i]:
            # single-position query: the answer is just that character's weight
            retVal.append(weights[S[P[i]]])
        else:
            retVal.append(local_min(S, weights, P[i], Q[i]))
    return retVal

# partially built memo of running minimums (written but not yet reused)
minimums = {}
def local_min(S, weights, start, end):
    # linear scan of S[start:end+1] for the smallest weight
    minVal = weights[S[start]]
    for i in range(start, end + 1):
        val = weights[S[i]]
        if val == 1:
            # 1 is the smallest possible weight, so we can stop early
            minimums[i] = 1
            return 1
        if val < minVal:
            minVal = val
        minimums[i] = minVal
    return minVal

This works fine in terms of correctness, but it has a time complexity of O(n*m), since I'm re-computing local_min for every query and not reusing any results.

I'm trying to use the prefix sums technique here to compute local minimums, or to somehow memoize the computed min values in the minimums dict and reuse them in subsequent calls to local_min.

The problem I'm running into is this: let's say we have the following array:

[1, 13, 2, 8, 20, 5]

We could compute an array of the smallest values seen up to each point as follows:

For range 0, 6 it would simply be:

[1,1,1,1,1,1]

For 1,6 it would be:

[13, 2, 2, 2, 2]

For 2,6:

[2, 2, 2, 2]

And finally:

[8, 8, 5] and [20, 5]

I'm struggling to see how to adapt this approach to compute the smallest value in a given range and reduce the time complexity of my solution.
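
To make this concrete, here is a rough sketch (illustration only, using the example array above rather than the DNA string) of the running-minimum arrays I'm describing:

def running_mins(arr, start):
    # smallest value seen so far, scanning left to right from `start`
    out = []
    current = arr[start]
    for x in arr[start:]:
        current = min(current, x)
        out.append(current)
    return out

arr = [1, 13, 2, 8, 20, 5]
print(running_mins(arr, 0))  # [1, 1, 1, 1, 1, 1]
print(running_mins(arr, 1))  # [13, 2, 2, 2, 2]
print(running_mins(arr, 3))  # [8, 8, 5]

Building one of these arrays for every possible start position is of course O(n^2) overall, which is exactly the cost I'm trying to avoid.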

EDIT:

I came up with the following solution, which achieves 100% on Codility for both correctness and performance, with a time complexity of O(n+m):

def solution(S, P, Q):
    weights = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    retVal = []
    for i in range(0, len(P)):
        if P[i] == Q[i]:
            retVal.append(weights[S[P[i]]])
        else:
            # the cheapest nucleotide present in the slice determines the answer
            if 'A' in S[P[i]:Q[i] + 1]:
                retVal.append(1)
            elif 'C' in S[P[i]:Q[i] + 1]:
                retVal.append(2)
            elif 'G' in S[P[i]:Q[i] + 1]:
                retVal.append(3)
            else:
                retVal.append(4)
    return retVal

I'm still a bit confused by this, though. My assumption would be that the in operation takes O(n), where n is the length of S, so the overall complexity should be O(n * m), where m is the length of P and Q. Could anyone explain why the complexity is in fact O(n + m)?
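
For comparison, here is a rough sketch of the kind of per-letter prefix-count approach I was originally asking about, which would be genuinely O(n + m); the function name and structure are just for illustration, not the code I submitted:

def solution_prefix_counts(S, P, Q):
    weights = {'A': 1, 'C': 2, 'G': 3, 'T': 4}
    n = len(S)
    # prefix_counts[w][i] = number of characters with weight w in S[:i]
    prefix_counts = {w: [0] * (n + 1) for w in (1, 2, 3, 4)}
    for i, ch in enumerate(S):
        for w in (1, 2, 3, 4):
            prefix_counts[w][i + 1] = prefix_counts[w][i] + (1 if weights[ch] == w else 0)
    retVal = []
    for start, end in zip(P, Q):
        # the smallest weight whose count increases over S[start:end+1] is the answer
        for w in (1, 2, 3, 4):
            if prefix_counts[w][end + 1] - prefix_counts[w][start] > 0:
                retVal.append(w)
                break
    return retVal

print(solution_prefix_counts("CAGCCTA", [2, 5, 0], [4, 5, 6]))  # [2, 4, 1]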

2 Answers

Answer from Yevgeniy Kosmak:

The approach offered by Matt Timmermans is simpler to implement and has a better time complexity. The approach below is easier to come up with, but less efficient.

If we want to solve RMQ (range minimum query) faster, we have these options:

  • Segment Tree (the best option, in my opinion). O(n) for construction, O(log(n)) for min search. Not so easy to implement in Python.
  • Sqrt Decomposition. O(n) for construction, O(sqrt(n)) for min search. Very easy to implement.
  • Fenwick Tree. O(n) for construction, O(log(n)) for min search. Extremely easy to implement, but quite hard to understand how it works.
  • Sparse Table. O(n*log(n)) for construction, O(1) for min search. Quite easy to implement (see the sketch below).

All the links contain implementations, some of them in Python.
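
For illustration, here is a minimal sparse table sketch for range minimum queries on the example array from the question; it is not from the original answer, and it assumes 0-based, inclusive query bounds:

def build_sparse_table(arr):
    # table[k][i] holds the minimum of arr[i : i + 2**k]
    n = len(arr)
    table = [list(arr)]
    k = 1
    while (1 << k) <= n:
        prev = table[k - 1]
        half = 1 << (k - 1)
        table.append([min(prev[i], prev[i + half]) for i in range(n - (1 << k) + 1)])
        k += 1
    return table

def range_min(table, lo, hi):
    # minimum of arr[lo..hi] (inclusive) in O(1) per query
    k = (hi - lo + 1).bit_length() - 1
    return min(table[k][lo], table[k][hi - (1 << k) + 1])

arr = [1, 13, 2, 8, 20, 5]
table = build_sparse_table(arr)
print(range_min(table, 1, 3))  # 2
print(range_min(table, 3, 5))  # 5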

Answer from Matt Timmermans:

The hard part of this problem is proving that there's an easy solution.

In particular, you can prove that you only need to look for slices of length 2 or 3.

Proof: Imagine that there is a slice of length >= 4 that has the smallest possible average. We can divide it into two slices: one consisting of the first 2 elements, and one consisting of the remainder. Neither part can have a smaller average than the whole, since the whole has the smallest possible average. But the whole's average is a weighted average of the two parts' averages, so if either part had a larger average, the other would have to have a smaller one. Therefore both parts must have the same average as the whole, and those initial 2 elements are a valid result that starts at the same place.

So just iterate through the start positions and test all slices of length 2 or 3, and remember the first one with the smallest average.
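
As a rough illustration of that answer (my sketch, not the answerer's own code), assuming the standard Codility MinAvgTwoSlice signature where solution(A) returns the smallest starting index of a slice with minimal average:

def solution(A):
    # check only slices of length 2 and 3, keeping the earliest start
    # position whose slice has the smallest average seen so far
    best_start = 0
    best_avg = (A[0] + A[1]) / 2.0
    for i in range(len(A) - 1):
        avg2 = (A[i] + A[i + 1]) / 2.0
        if avg2 < best_avg:
            best_avg, best_start = avg2, i
        if i + 2 < len(A):
            avg3 = (A[i] + A[i + 1] + A[i + 2]) / 3.0
            if avg3 < best_avg:
                best_avg, best_start = avg3, i
    return best_start

Because only a constant number of slices is examined per start position, this runs in O(n) overall.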