I'm using the intel-oneapi/vtune module, so aren't the OpenCL drivers already bootstrapped with it, just as they are with the intel-oneapi/compiler module?
The reason I ask is that when I run this vtune command:
vtune -collect hotspots -result-dir=vtune_result01 ./test_HPCCG 20 20 20
from the terminal I get this:
vtune: Warning: Hardware collection of CPU events is not possible on this system. Microarchitecture performance insights will not be available.
I still get some results:
CPU Time: 24.350s
    Effective Time: 11.637s
    Spin Time: 11.001s
     | A significant portion of CPU time is spent waiting. Use this metric
     | to discover which synchronizations are spinning. Consider adjusting
     | spin wait parameters, changing the lock implementation (for example,
     | by backing off then descheduling), or adjusting the synchronization
     | granularity.
     |
        Imbalance or Serial Spinning: 2.912s
         | The threading runtime function related to time spent on imbalance
         | or serial spinning consumed a significant amount of CPU time.
         | This can be caused by a load imbalance, insufficient concurrency
         | for all working threads, or busy waits of worker threads while
         | serial code is executed. If there is an imbalance, apply dynamic
         | work scheduling or reduce the size of work chunks or tasks. If
         | there is insufficient concurrency, consider collapsing the outer
         | and inner loops. If there is a wait for completion of serial
         | code, explore options for parallelization with Intel Advisor,
         | algorithm, or microarchitecture tuning of the application's
         | serial code with VTune Profiler Basic Hotspots or
         | Microarchitecture Exploration analysis respectively. For OpenMP*
         | applications, use the Per-Barrier OpenMP Potential Gain metric
         | set in the HPC Performance Characterization analysis to discover
         | the reason for high imbalance or serial spin time.
         |
        Lock Contention: 0.230s
        Other: 7.859s
    Overhead Time: 1.712s
        Creation: 0.968s
        Scheduling: 0.745s
        Reduction: 0s
        Atomics: 0s
        Other: 0s
Total Thread Count: 32
Paused Time: 0s
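To sanity-check how these figures relate to each other, here is a minimal Python sketch. It assumes (my reading of the report, not something the output states) that CPU Time is the sum of Effective, Spin and Overhead Time, and that the items listed after Spin Time and Overhead Time break those two down. The variable and function names are my own.

```python
# Values copied from the VTune summary above, in seconds.
cpu_time = 24.350
top_level = {"Effective Time": 11.637, "Spin Time": 11.001, "Overhead Time": 1.712}
spin_parts = {"Imbalance or Serial Spinning": 2.912, "Lock Contention": 0.230, "Other": 7.859}
overhead_parts = {"Creation": 0.968, "Scheduling": 0.745,
                  "Reduction": 0.0, "Atomics": 0.0, "Other": 0.0}


def total(parts):
    """Sum a group of timings, rounded to the millisecond precision VTune prints."""
    return round(sum(parts.values()), 3)


# Assumption: each group should add up to the figure reported one level above it.
print("top level:", total(top_level), "vs CPU Time", cpu_time)
print("spin     :", total(spin_parts), "vs Spin Time", top_level["Spin Time"])
print("overhead :", total(overhead_parts), "vs Overhead Time", top_level["Overhead Time"])
print(f"spin share of CPU time: {top_level['Spin Time'] / cpu_time:.1%}")
```

With these numbers the first two groups add up exactly, and the overhead items come to 1.713s against the reported 1.712s, which I assume is rounding. If that reading is right, close to half of the CPU time is spin time rather than effective time.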
Top Hotspots
Function                                            Module                         CPU Time  % of CPU Time(%)
--------------------------------------------------  -----------------------------  --------  ----------------
sched_yield                                         libc.so.6                        7.859s             32.3%
_INTERNAL65c1faee::tbb::detail::d0::machine_pause   libtbb.so.12                     5.643s             23.2%
[TBB Scheduler Internals]                           libtbb.so.12                     2.726s             11.2%
Intel::OpenCL::Utils::AtomicCounter::operator long  libcpu_device.so.2021.13.11.0    1.490s              6.1%
tbb::detail::r1::market::process                    libtbb.so.12                     0.722s              3.0%
[Others]                                            N/A                              5.910s             24.3%
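As a cross-check on the table above, this small Python sketch recomputes each row's share of the 24.350s CPU Time and adds up the rows that look like waiting or scheduling. Treating `sched_yield`, `machine_pause` and `[TBB Scheduler Internals]` as "waiting" is my own assumption based on their names, not something VTune labels that way, and the helper names are hypothetical.

```python
# Rows copied from the "Top Hotspots" table: (function, CPU time in seconds).
hotspots = [
    ("sched_yield", 7.859),
    ("_INTERNAL65c1faee::tbb::detail::d0::machine_pause", 5.643),
    ("[TBB Scheduler Internals]", 2.726),
    ("Intel::OpenCL::Utils::AtomicCounter::operator long", 1.490),
    ("tbb::detail::r1::market::process", 0.722),
    ("[Others]", 5.910),
]
cpu_time = 24.350  # "CPU Time" from the summary


def share(seconds, total=cpu_time):
    """Format a timing as a percentage of the total CPU time."""
    return f"{seconds / total:.1%}"


for name, seconds in hotspots:
    print(f"{name:<52}{seconds:>7.3f}s {share(seconds):>6}")

# Assumption: these three entries are yield/pause/scheduler time, not my own code.
waiting_names = {"sched_yield",
                 "_INTERNAL65c1faee::tbb::detail::d0::machine_pause",
                 "[TBB Scheduler Internals]"}
waiting = sum(s for name, s in hotspots if name in waiting_names)
print("assumed waiting/scheduling:", share(waiting))
```

The recomputed percentages match the table, and the six rows add up to the full 24.350s. Under the assumption above, roughly two thirds of the sampled time is in yield, pause and scheduler code, and none of the named rows is a function from test_HPCCG itself.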
It displays the correct CPU, so I'm sure it has detected the device. Or is this just not supported on this system:
Top Tasks
Task Type         Task Time  Task Count  Average Task Time
----------------  ---------  ----------  -----------------
tbb_parallel_for    11.442s      34,018             0.000s
tbb_custom           1.246s       6,388             0.000s
Collection and Platform Info
Application Command Line: ./test_HPCCG "20" "20" "20"
Operating System: 5.4.0-167-generic DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
Computer Name: ncc1
Result Size: 24.8 MB
Collection start time: 11:17:45 18/03/2024 UTC
Collection stop time: 11:17:57 18/03/2024 UTC
Collector Type: User-mode sampling and tracing
CPU
Name: Intel(R) Xeon(R) Processor code named Skylake
Frequency: 2.100 GHz
Logical CPU Count: 32
Cache Allocation Technology
Level 2 capability: not detected
Level 3 capability: available
Could someone also explain what the above output means? I don't understand why I get the entries above instead of the actual functions from my program. Thanks!