Basically, I'm creating a parallel program to calculate a Julia set image of 50000 by 50000 pixels (or larger), and I'm using MPI and libpng to do so. I have a struct of
typedef struct Block
{
    int size;
    int place;
    png_bytep *rows;
} block;

block knapsack;
png_bytep *row_pointers;
and an allocate function of
void allocate()
{
    row_pointers = (png_bytep *)malloc(sizeof(png_bytep) * height);
    for (y = 0; y < height; y++)
        row_pointers[y] = (png_byte *)malloc(sizeof(png_byte) * width);   /* one png_byte per pixel; for multi-channel images use png_get_rowbytes() */
}
I have this function to create a 'knapsack', which is a block of row_pointers that I can distribute to other processes. (I will combine the functions later, once I solve this message-passing problem.)
void pack(int index, int size)
{
    knapsack.rows = (png_bytep *)malloc(sizeof(png_bytep) * size);
    knapsack.size = size;
    knapsack.place = index;
    for (y = 0; y < size; y++)
        knapsack.rows[y] = row_pointers[index + y];   /* just reference the existing rows */
}
Then I want to do something like
MPI_Send(&knapsack, sizeof(knapsack), MPI_BYTE, 1, 1, MPI_COMM_WORLD);
MPI_Recv(&knapsack, sizeof(knapsack), MPI_BYTE, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
to send a block of these pointers to a bunch of nodes to do calculations on. It will eventually be tested on 128 cores. The problem is I'm having a lot of trouble sending nested structs around the system, and after printing out the contents with printf("%x", knapsack.rows[0]); I can see they don't match after being sent. I've been looking into it and realized that because my pointers are not in a contiguous block, the data isn't being sent properly.

I've been looking into serializing my knapsack and came across FlatBuffers and Protocol Buffers. These seem overly complicated for this, and it's been hard to find a good tutorial on them. My other option seems to be MPI_Pack(), but I'm not sure how bad the time increase will be, as the whole objective is to push this as high as possible and do it quickly. Does anyone have any advice on the best way to go about this? Thanks!
A few issues ...
You don't have a true 2D array. You have a 1D array of pointers to rows, which are 1D arrays of pixels (e.g. png_bytep is a pointer to an array of png_byte pixels). This can be slow because of the extra pointer indirection, unless your functions operate strictly on rows.
Possibly better to have a true 2D array of pixels (e.g.):
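Something along these lines (a sketch only; the field names are mine, and I'm assuming one png_byte per pixel, so multiply by the number of channels for RGB/RGBA data):

typedef struct Block
{
    int height;        /* number of rows in this block                     */
    int width;         /* pixels per row                                   */
    int place;         /* starting row within the full image               */
    png_byte *data;    /* height * width pixels, one contiguous allocation */
} block;

block knapsack;

void allocate_block(block *b, int height, int width, int place)
{
    b->height = height;
    b->width = width;
    b->place = place;
    b->data = malloc(sizeof(png_byte) * (size_t)height * width);
}

/* row y of the block starts at b->data + (size_t)y * b->width */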
Your MPI_Send [and MPI_Recv] just send the struct. But, after a [non-zero] rank receives it, the png_byte *data; pointer is meaningless because it is an address within the address space of the rank 0 process. Sending the struct tells the receiver the geometry (i.e. height/width) and other metadata, but it does not send the actual data.
Although there [probably] are some more advanced MPI_* calls, here are examples that use the simple MPI_Send/MPI_Recv ...

Sender:
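(Just a sketch: it assumes the contiguous block layout above, that rank 0 holds the full image in knapsack, and that nranks came from MPI_Comm_size; error handling omitted.)

/* rank 0: send the geometry first, then the contiguous pixel data, to every worker */
int meta[3] = { knapsack.height, knapsack.width, knapsack.place };

for (int dest = 1; dest < nranks; dest++) {
    MPI_Send(meta, 3, MPI_INT, dest, 1, MPI_COMM_WORLD);

    /* note: the count argument is an int, so a very large image may need
       to be sent row by row or in chunks                                 */
    MPI_Send(knapsack.data, knapsack.height * knapsack.width, MPI_BYTE,
             dest, 2, MPI_COMM_WORLD);
}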
Receiver:
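(Same assumptions; the worker allocates its own buffer once it knows the geometry.)

/* worker rank: receive the geometry, allocate a local buffer, then
   receive the pixel data into it                                    */
int meta[3];

MPI_Recv(meta, 3, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

knapsack.height = meta[0];
knapsack.width  = meta[1];
knapsack.place  = meta[2];
knapsack.data   = malloc(sizeof(png_byte) *
                         (size_t)knapsack.height * knapsack.width);

MPI_Recv(knapsack.data, knapsack.height * knapsack.width, MPI_BYTE,
         0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);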
The above assumes that you want to send the entire array to each [worker] rank. This is the simplest.
But, after you get that working, you may want to use MPI_Scatter/MPI_Gather instead to send partial/submatrices to each worker rank. That is, each rank only operates on a 2D subwindow of the full matrix (see the sketch below).

Of course, you could still use row pointers, but the actual calls that transfer knapsack.data would need to be a loop that has a separate call for each row.

For some additional information on how to split up the data for performance, see my recent[!] answer: Sending custom structs in MPI
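Here is a rough sketch of that row-block version with MPI_Scatterv/MPI_Gatherv (my assumptions: the full contiguous image lives in full_image on rank 0, rank/nranks come from MPI_Comm_rank/MPI_Comm_size, and one byte per pixel; a per-row datatype keeps the counts and displacements small):

/* describe one image row as a single MPI datatype */
MPI_Datatype rowtype;
MPI_Type_contiguous(width, MPI_BYTE, &rowtype);
MPI_Type_commit(&rowtype);

/* split the rows as evenly as possible across the ranks */
int *counts = malloc(nranks * sizeof(int));   /* rows per rank         */
int *displs = malloc(nranks * sizeof(int));   /* starting row per rank */
int row = 0;
for (int r = 0; r < nranks; r++) {
    counts[r] = height / nranks + (r < height % nranks ? 1 : 0);
    displs[r] = row;
    row += counts[r];
}

/* each rank gets only its own rows ... */
png_byte *myrows = malloc((size_t)counts[rank] * width);
MPI_Scatterv(full_image, counts, displs, rowtype,
             myrows, counts[rank], rowtype, 0, MPI_COMM_WORLD);

/* ... computes them ... */

/* ... and rank 0 collects the results back into the full image */
MPI_Gatherv(myrows, counts[rank], rowtype,
            full_image, counts, displs, rowtype, 0, MPI_COMM_WORLD);

For a Julia set the scatter is arguably unnecessary, since each rank can compute its rows from scratch and only the gather is needed, but the pattern is the same.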
Side note: If your nodes will be on the same physical processor system (e.g. you have a 128 core machine) and all cores can map/share the same memory, you might be better off using pthreads or openmp. Certainly, the overhead would be less and possibly the program would be more cache friendly. YMMV ...
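For completeness, a minimal OpenMP sketch of that shared-memory version (julia() is a stand-in for whatever per-pixel kernel you already have, and one byte per pixel is assumed):

#include <omp.h>

/* build with -fopenmp (gcc/clang); each thread takes rows on demand, and
   dynamic scheduling helps because rows of a Julia set can take very
   different amounts of work                                             */
#pragma omp parallel for schedule(dynamic)
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        row_pointers[y][x] = julia(x, y);   /* hypothetical per-pixel kernel */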