How can I implement a linear-time function that counts the occurrences of a non-consecutive subsequence?

My Programming I class assigned an exercise in which we have to implement a function that counts the number of occurrences of a non-consecutive subsequence in a string. This is what we're asked to do:

To begin with, you have to implement a function that, given a string s and three different characters c1, c2, c3 as parameters, returns how many times c1c2c3 occurs in s as a (non-consecutive) subsequence. In other words, it returns the number of triples of indices (i1,i2,i3) such that i1<i2<i3 and s[i1]=c1, s[i2]=c2, s[i3]=c3.

And this is the code I implemented:

#include <iostream>
#include <string>
#include <vector>

using namespace std;

int numberSubsequences(const string &s, char c1, char c2, char c3) {
    int count = 0;
    int lengthSequence = s.length();
    for ( int i = 0; i < lengthSequence - 2; ++i ) {
        if ( s[i] == c1 ) {
            for ( int j = i + 1; j < lengthSequence - 1; ++j ) {
                if ( s[j] == c2 ) {
                    for ( int k = j + 1; k < lengthSequence; ++k ) {
                        if ( s[k] == c3 ) {
                            count = count + 1;
                        }
                    }
                }
            }
        }
    }
    return count;
}

The problem is that I submitted this code and the submission program we use said that I exceeded the time limit for some private cases. Is there a way to simplify this function so that it passes all cases? Any help is appreciated. :)

There are 2 best solutions below

0
John Zwinck

In order to avoid entirely doing your homework for you, here is a version which handles the case of two characters rather than three:

int numberSubsequences(const string &s, char c1, char c2) {
    int count1 = 0;
    int count2 = 0;

    for (size_t ii = 0; ii < s.size(); ++ii) {
        if (s[ii] == c1)
            ++count1;
        if (s[ii] == c2)
            count2 += count1;
    }

    return count2;
}

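The main idea is to scan the input string only once. That gives O(n) time, which is the best possible "big O" for this problem because every character has to be examined at least once. At each position, we know how many times we have seen c1 so far, and whenever we see c2 we add that count to the total, since each earlier c1 forms one pair with this c2.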

Extending this to three or more characters is an exercise for the reader.
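One possible way to carry the same idea over to three characters is sketched below. The name countTriples is hypothetical (it is not from the question or this answer), the counters are widened to long long on the assumption that the inputs may be long enough to overflow int, and the sketch relies on the exercise's guarantee that the three characters are different.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: counts index triples i1 < i2 < i3 with
// s[i1] == c1, s[i2] == c2, s[i3] == c3 in a single pass.
long long countTriples(const std::string &s, char c1, char c2, char c3) {
    // Assumption taken from the exercise: the three characters differ.
    assert(c1 != c2 && c2 != c3 && c1 != c3);

    long long n1 = 0;    // occurrences of c1 seen so far
    long long n12 = 0;   // subsequences c1c2 seen so far
    long long n123 = 0;  // subsequences c1c2c3 seen so far

    for (char ch : s) {
        if (ch == c1) {
            ++n1;
        } else if (ch == c2) {
            n12 += n1;     // every earlier c1 pairs with this c2
        } else if (ch == c3) {
            n123 += n12;   // every earlier c1c2 pair extends with this c3
        }
    }
    return n123;
}
```

For instance, countTriples("abcabc", 'a', 'b', 'c') returns 4: the triples are (0,1,2), (0,1,5), (0,4,5) and (3,4,5).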

3
kcbsbo

We can record the positions where c1, c2, and c3 appear. Then, for each c1 position, we find the c2 positions to its right; these are the c2 candidates for that c1. Similarly, for each c2 candidate, we use the same method to count the c3 positions to its right.

Because the positions are stored in sorted order, we can use binary search (or an ordered container such as std::set, which is typically implemented as a red-black tree) to find the first position to the right in O(log N). Collecting the positions takes O(N) with sorted vectors, or O(N log N) with std::set. Note that the search is only close to N * log(N) if each count is obtained in logarithmic time: visiting every (c1, c2) pair is already quadratic in the worst case, and std::distance is linear for std::set iterators. This approach is often much faster than the triple loop, but it is not linear, so it may or may not meet the time requirements.

Refer to the following code:

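#include <iterator>
#include <set>

int numberSubsequences(const string &s, char c1, char c2, char c3) {
    // Sorted positions of each of the three characters.
    set<int> p1, p2, p3;
    for (int i = 0; i < static_cast<int>(s.size()); i++) {
        if (s[i] == c1) {
            p1.insert(i);
        } else if (s[i] == c2) {
            p2.insert(i);
        } else if (s[i] == c3) {
            p3.insert(i);
        }
    }

    int r = 0;
    for (auto i : p1) {
        // First c2 position strictly to the right of this c1.
        auto found = p2.upper_bound(i);
        for (auto iter = found; iter != p2.end(); iter++) {
            // First c3 position strictly to the right of this c2.
            auto found2 = p3.upper_bound(*iter);
            // std::distance walks the set, so this step is linear.
            r += std::distance(found2, p3.end());
        }
    }
    return r;
}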
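As a rough sketch of how the binary-search idea could be kept within O(N log N), the version below stores the positions in sorted vectors and precomputes suffix sums over the c2 positions, so each c1 needs only one binary search. This is a variation on the answer rather than its original method; the name countWithBinarySearch and the long long result type are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: same result as numberSubsequences, but with one
// binary search per c1 and per c2 position instead of nested scans.
long long countWithBinarySearch(const std::string &s, char c1, char c2, char c3) {
    // Assumption taken from the exercise: the three characters differ.
    assert(c1 != c2 && c2 != c3 && c1 != c3);

    // Positions are appended in increasing order, so each vector is sorted.
    std::vector<std::size_t> p1, p2, p3;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == c1) p1.push_back(i);
        else if (s[i] == c2) p2.push_back(i);
        else if (s[i] == c3) p3.push_back(i);
    }

    // suffix[j] = number of (c2, c3) pairs whose c2 is p2[j] or later.
    std::vector<long long> suffix(p2.size() + 1, 0);
    for (std::size_t j = p2.size(); j-- > 0;) {
        auto it = std::upper_bound(p3.begin(), p3.end(), p2[j]);
        suffix[j] = suffix[j + 1] + (p3.end() - it);
    }

    long long total = 0;
    for (std::size_t i : p1) {
        // Index of the first c2 strictly to the right of this c1.
        auto it = std::upper_bound(p2.begin(), p2.end(), i);
        total += suffix[it - p2.begin()];
    }
    return total;
}
```

On the string "abcabc" with 'a', 'b', 'c' this returns 4, the same count the triple loop produces. Iterator subtraction is constant time for vectors, which avoids the linear cost of std::distance on std::set iterators.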