GCC -O2 with -march / -ftree-vectorize


I am trying out several compiler switches on a program that performs Sobel kernel convolution on two images (2000H x 3000W and 6800H x 8500W). There are some observations that I am not able to interpret. The data follow: compiler flags and time taken in seconds (please focus on the last column, which is the convolution along the Y axis for the larger image):

-O2 -march=barcelona                    0.1483326   0.833264    1.6018882   28.6711242
-O2 -ftree-vectorize                    0.1462104   0.847973    1.506708    26.628592
-O2                                     0.1468406   0.8368156   1.5999718   20.61377564
-O2 -ftree-vectorize -march=barcelona   0.1441898   0.827366    1.4687354   15.2572644

I expected -O2 -march=barcelona to be moderately better, since the machine I am running on is an AMD Barcelona. Any ideas as to why plain -O2 is faster than -O2 -march=barcelona?

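As for -ftree-vectorize, I expected it to help: my loop is dependence free, so the compiler should be able to process several elements in parallel. Yet in the last column -O2 -ftree-vectorize on its own is slower than plain -O2, while -O2 -ftree-vectorize -march=barcelona is the fastest of the lot, even though each of the two flags individually makes things slower.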
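
For reference, a dependence-free Y-axis Sobel pass of the kind described might look roughly like the sketch below. This is not the actual program: the function name sobel_y, the 8-bit grayscale input, the int output buffer, and the row-major layout are all assumptions made for illustration. The restrict qualifiers tell the compiler that the input and output buffers do not overlap, which is one of the things the vectorizer needs to know before it can treat the inner loop as dependence free.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical Sobel Y-gradient pass (illustrative only).
 * src: width*height 8-bit grayscale pixels, row-major.
 * dst: width*height ints, row-major; border pixels are left untouched.
 * restrict promises that src and dst do not alias. */
void sobel_y(const unsigned char *restrict src, int *restrict dst,
             size_t width, size_t height)
{
    assert(width >= 3 && height >= 3);

    for (size_t y = 1; y + 1 < height; ++y) {
        const unsigned char *up   = src + (y - 1) * width; /* row above */
        const unsigned char *down = src + (y + 1) * width; /* row below */
        int *out = dst + y * width;

        /* Each out[x] depends only on src, never on another out[],
         * so iterations of this loop are independent of each other. */
        for (size_t x = 1; x + 1 < width; ++x) {
            out[x] = (down[x - 1] + 2 * down[x] + down[x + 1])
                   - (up[x - 1]   + 2 * up[x]   + up[x + 1]);
        }
    }
}
```

On reasonably recent GCC versions, adding -fopt-info-vec to the compile line should report which loops were actually vectorized, which may help tell apart "the loop was not vectorized" from "it was vectorized but the generated code is slower".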

It would be great if I could understand this behavior.

Regards,
Sayan
