Are uint2 operations faster than ulong operations in OpenCL on AMD GCN cards?

Which of these "+" calculations is faster?

1) uint2 a, b, c; c = a + b;
2) ulong a, b, c; c = a + b;
Asked by user1200759
Answer:
AMD GCN has no native 64-bit integer vector support, so the second statement would be translated into two 32-bit adds: a V_ADD_U32 followed by a V_ADDC_U32, which also adds in the carry flag produced by the first V_ADD_U32.
So, to answer your question: both compile to the same number of instructions. However, the two 32-bit adds in the first statement are independent of each other and can be computed in parallel (instruction-level parallelism), so it could be faster IF your kernel is occupancy bound (i.e. it uses lots of registers).
If your statements can be executed by the scalar unit (i.e. they do not depend on the thread index), the game changes: the second one will be just one instruction (vs. two), since the scalar unit has native 64-bit integer support.
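However, keep in mind that your first statement is not equivalent to the second: if you use a uint2 to hold the two halves of a 64-bit value, its two 32-bit components are added independently, so the carry out of the low component is lost instead of being added into the high component.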