K-means Cluster Centers Are the Same Every Run Although It Is Randomly Initialized

Question

K-means Cluster Centers Are the Same Every Run Although It Is Randomly Initialized

265 Views Asked by AmrSherbiny At 06 August 2020 at 17:41

I have an application where I need to implement K-means myself. I was trying my implementation in a small program to see how things are going; however, I find it really confusing that I get the exact same cluster centers every time I run the program. I am surprised because my cluster centers are randomly initialized, and I am using FLANN to update the clusters (both should make the output nondeterministic, right?). I'm probably missing something, and would appreciate some help.

cv::Mat kMeans(cv::Mat data, int k, int max_itr){
    cv::Mat centroids = cv::Mat(cv::Size(data.cols, k), data.type());
    std::vector<int> cluster_ids(data.rows, 0);

    //randomly initialize centroids
    for(int i = 0; i < k; i++)
    {
        int idx = rand() % data.rows;
        data.row(idx).copyTo(centroids.row(i));
    }

    int itr = 0;
    cv::flann::Index kdtree(centroids, cv::flann::KDTreeIndexParams(4));
    cv::Mat indices, dists;
    
    //TODO: add tolerance stopping condition
    while (itr < max_itr)
    {
        //assign data points to centroids
        for(int i = 0; i < data.rows; i++)
        {
            kdtree.knnSearch(data.row(i), indices, dists, 1, cv::flann::SearchParams(32));
            cluster_ids[i] = indices.data[0];
        } 
        //update centroids
        for(int i = 0; i < centroids.rows; i++)
        {
            cv::Mat sum = cv::Mat::zeros(cv::Size(data.cols, 1), data.type()); 
            int count = 0;
            for(int j = 0; j < data.rows; j++)
            {
                if(cluster_ids[j] != i) {continue;}
                sum += data.row(j);
                count++;
            }
            centroids.row(i) = sum / float(count);
        }
       itr += 1;
    }
return centroids;
}

int main()
{
    cv::Mat_<float> data;
    cv::Mat_<float> centroids;
    cv::Mat dst;
    //initialize with dummy data
    for(int i = 0; i < 1000; i++) 
    { 
        cv::Mat row = (cv::Mat_<float>(1, 3) << float(i), float(i), float(i));
        data.push_back(row);
    }
    centroids = kMeans(data, 20, 20);
    std::cout << centroids << std::endl;
 
    return 0;
}

Original Q&A

There are 2 best solutions below

stateMachine On 08 August 2020 at 22:57

OpenCV also provides a Random Number Generator class: cv::RNG. You can use it like this:

//Create the RNG object:
cv::RNG& rng = cv::theRNG();
//Seed it:
rng.state = cv::getTickCount();

//Print it:
std::cout<<"Random Number: "<<(int)rng<<std::endl;

**cigien** · Accepted Answer · 2020-08-06T17:44:09.547000

You need this line:

std::srand(std::time(nullptr)); // use current time as seed for random generator

at the start of your program to get different sequences from the rand function every time you run the program.

You can use some other seed if you want, but you still need to call srand with some source of randomness at the start of the program.

K-means Cluster Centers Are the Same Every Run Although It Is Randomly Initialized

There are 2 best solutions below

Related Questions in C++

Related Questions in OPENCV

Related Questions in K-MEANS

Related Questions in FLANN

Trending Questions

Popular # Hahtags

Popular Questions