I am executing a parallel kernel on a GPU using OpenCL and JOCL.
I want to know:
1/ Are there any functions to find out the kernel size in terms of work-items and work-groups, and how the kernel is executed on my Nvidia GPU platform?
2/ Is it possible to measure the execution time of the kernel without the GPU/CPU data transfers? I called Java's System.currentTimeMillis() before starting the kernel and again after it finished, but that measurement includes the data transfer time.
3/ More precisely, is it possible to know the execution time of each GPU core?
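1) In the kernel, the OpenCL built-in get_global_size(dim) returns the number of work-items in dimension dim (0, 1 or 2). The total number is the product of these values, but if the kernel is launched in only one dimension then get_global_size(0) alone is enough.
get_local_size(dim) gives the same thing for the work-items within a work-group, not the total number of items.
get_num_groups(dim) is similar but gives the number of work-groups in that dimension.
The number of dimensions is returned by get_work_dim().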
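The relationship between these sizes can also be checked on the host before launching. The following is only a minimal sketch in plain Java: the class and method names are hypothetical, the sizes are example values, and it assumes each global size is a multiple of the corresponding local size (as OpenCL 1.x requires).

```java
// Host-side sketch; WorkSizes and its methods are hypothetical helper names.
final class WorkSizes {

    /** Total number of work-items: product of the global size over all dimensions. */
    static long totalWorkItems(long[] globalSize) {
        long total = 1;
        for (long g : globalSize) {
            total *= g;
        }
        return total;
    }

    /** Number of work-groups per dimension: global size divided by local size. */
    static long[] numGroups(long[] globalSize, long[] localSize) {
        if (globalSize.length != localSize.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        long[] groups = new long[globalSize.length];
        for (int d = 0; d < globalSize.length; d++) {
            // Assumption: the global size is an exact multiple of the local size.
            if (globalSize[d] % localSize[d] != 0) {
                throw new IllegalArgumentException("global size not divisible by local size");
            }
            groups[d] = globalSize[d] / localSize[d];
        }
        return groups;
    }

    public static void main(String[] args) {
        long[] global = {1024, 512}; // example 2-D launch
        long[] local = {16, 16};     // example work-group size
        long[] groups = numGroups(global, local);
        System.out.println(totalWorkItems(global));        // prints 524288
        System.out.println(groups[0] + " x " + groups[1]); // prints 64 x 32
    }
}
```

Inside the kernel, get_global_size, get_local_size and get_num_groups should report the same numbers for the corresponding launch.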
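2) Use event-based performance queries from the host code. Attach an event to the kernel launch only, so that the data transfers are not part of the measurement; the command queue must be created with CL_QUEUE_PROFILING_ENABLE for the event to carry timing information.
http://www.jocl.org/cloth/docs/doc-utils/org/jocl/utils/Events.html
computeExecutionTimeMs(org.jocl.cl_event event): computes the execution time for the given event, in milliseconds.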
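For reference, OpenCL exposes event timing as two device timestamps in nanoseconds, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, queried with clGetEventProfilingInfo. The sketch below shows only the conversion from those two values to milliseconds. The class and method names are hypothetical and the timestamps are made-up example values, not measurements.

```java
// Sketch of the timestamp arithmetic only; ProfilingTimes is a hypothetical helper.
final class ProfilingTimes {

    /**
     * Converts a pair of device timestamps (nanoseconds, as reported for
     * CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END) to milliseconds.
     */
    static double elapsedMs(long startNs, long endNs) {
        return (endNs - startNs) / 1e6;
    }

    public static void main(String[] args) {
        long startNs = 1_000_000L; // made-up example value
        long endNs = 3_500_000L;   // made-up example value
        System.out.println(elapsedMs(startNs, endNs)); // prints 2.5
    }
}
```

Because both timestamps come from the device and belong to the kernel's own event, the result excludes host-side overhead and any transfers enqueued as separate commands.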
1), 2) and 3) A profiler can show all of this except the "each core" part. It does give information per "lane"; a lane may not map to the same core at all times, but you can see what a single thread was doing.
The visuals and tables in https://developer.nvidia.com/nvidia-nsight-visual-studio-edition give enough information about bottlenecks and kernel hotspots.