Intel oneAPI VTune Profiler not supporting my microarchitecture


I'm using the intel-oneapi/vtune module, so aren't the OpenCL drivers already set up along with it, just as they are with the intel-oneapi/compiler module?

The reason I ask is when I run this vtune command:

vtune -collect hotspots -result-dir=vtune_result01 ./test_HPCCG 20 20 20

from the terminal I get this:

vtune: Warning: Hardware collection of CPU events is not possible on this system. Microarchitecture performance insights will not be available.

I still get some results:

    CPU Time: 24.350s
        Effective Time: 11.637s
        Spin Time: 11.001s
         | A significant portion of CPU time is spent waiting. Use this metric
         | to discover which synchronizations are spinning. Consider adjusting
         | spin wait parameters, changing the lock implementation (for example,
         | by backing off then descheduling), or adjusting the synchronization
         | granularity.
         |
            Imbalance or Serial Spinning: 2.912s
             | The threading runtime function related to time spent on imbalance
             | or serial spinning consumed a significant amount of CPU time.
             | This can be caused by a load imbalance, insufficient concurrency
             | for all working threads, or busy waits of worker threads while
             | serial code is executed. If there is an imbalance, apply dynamic
             | work scheduling or reduce the size of work chunks or tasks. If
             | there is insufficient concurrency, consider collapsing the outer
             | and inner loops. If there is a wait for completion of serial
             | code, explore options for parallelization with Intel Advisor,
             | algorithm, or microarchitecture tuning of the application's
             | serial code with VTune Profiler Basic Hotspots or
             | Microarchitecture Exploration analysis respectively. For OpenMP*
             | applications, use the Per-Barrier OpenMP Potential Gain metric
             | set in the HPC Performance Characterization analysis to discover
             | the reason for high imbalance or serial spin time.
             |
            Lock Contention: 0.230s
            Other: 7.859s
        Overhead Time: 1.712s
            Creation: 0.968s
            Scheduling: 0.745s
            Reduction: 0s
            Atomics: 0s
            Other: 0s
    Total Thread Count: 32
    Paused Time: 0s

Top Hotspots
Function                                            Module                         CPU Time  % of CPU Time(%)
--------------------------------------------------  -----------------------------  --------  ----------------
sched_yield                                         libc.so.6                        7.859s             32.3%
_INTERNAL65c1faee::tbb::detail::d0::machine_pause   libtbb.so.12                     5.643s             23.2%
[TBB Scheduler Internals]                           libtbb.so.12                     2.726s             11.2%
Intel::OpenCL::Utils::AtomicCounter::operator long  libcpu_device.so.2021.13.11.0    1.490s              6.1%
tbb::detail::r1::market::process                    libtbb.so.12                     0.722s              3.0%
[Others]                                            N/A                              5.910s             24.3%

It reports the correct CPU (see below), so I'm sure it has detected the device. Or is hardware event collection simply not supported on this system?

Top Tasks
Task Type         Task Time  Task Count  Average Task Time
----------------  ---------  ----------  -----------------
tbb_parallel_for    11.442s      34,018             0.000s
tbb_custom           1.246s       6,388             0.000s

Collection and Platform Info
    Application Command Line: ./test_HPCCG "20" "20" "20" 
    Operating System: 5.4.0-167-generic DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"
    Computer Name: ncc1
    Result Size: 24.8 MB 
    Collection start time: 11:17:45 18/03/2024 UTC
    Collection stop time: 11:17:57 18/03/2024 UTC
    Collector Type: User-mode sampling and tracing
    CPU
        Name: Intel(R) Xeon(R) Processor code named Skylake
        Frequency: 2.100 GHz
        Logical CPU Count: 32
        Cache Allocation Technology
            Level 2 capability: not detected
            Level 3 capability: available

Could someone also explain what the above output means? I don't understand why I get the entries above instead of actual functions. Thanks!
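
One thing that may be worth checking with respect to the "Hardware collection of CPU events is not possible" warning: on Linux, unprivileged access to hardware performance counters is gated by the kernel setting `/proc/sys/kernel/perf_event_paranoid`, where lower values are less restrictive. The sketch below only reads and prints that value. The function name `check_perf_paranoid` is hypothetical, and it is an assumption that this setting is what triggers the warning on this machine; missing sampling drivers or a virtualized host are other possibilities.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel's perf_event_paranoid level.
# An optional first argument overrides the file to read (useful for testing).
check_perf_paranoid() {
    f="${1:-/proc/sys/kernel/perf_event_paranoid}"
    if [ -r "$f" ]; then
        # The file holds a single integer; lower means less restrictive.
        level=$(cat "$f")
        echo "perf_event_paranoid=$level"
    else
        # Not readable (e.g. non-Linux system or restricted container).
        echo "perf_event_paranoid=unknown"
    fi
}

check_perf_paranoid
```

If the reported level is high, that could be raised with whoever administers the machine (the computer name `ncc1` suggests a shared cluster node), since changing the setting requires root.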
