I am trying to use the Global Data Share (GDS) on an AMD RX 480 for my application, on either Linux or Windows. The GCN3 specification manual states that you can access the GDS without restrictions by writing an appropriate value to the m0 register and issuing Data Share instructions with the GDS bit set, but that has not been the case in my experience. More specifically, you cannot access the GDS on Windows at all, and only 4 KB is available at a time on Linux. Is there any way to lift this nonsensical restriction? Please refer to the following thread for the context of this question:
Is It Possible to Access the Entire 64KB of the Global Data Share (GDS) on AMD's GCN2+ GPUs?
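To make the "appropriate value" for m0 concrete, here is a minimal Python sketch of how such a value might be packed on the host before it is patched into a kernel. It assumes a layout in which m0[31:16] holds the GDS base offset in bytes and m0[15:0] holds the size in bytes of the accessible window; that layout is an assumption that should be checked against the ISA manual for the target chip, and the helper names `pack_m0_for_gds` and `unpack_m0_for_gds` are hypothetical. Packing the value does not by itself lift any limit the driver imposes.

```python
# Sketch only. ASSUMED layout (verify against the ISA manual for your chip):
#   m0[31:16] = GDS base offset in bytes
#   m0[15:0]  = size in bytes of the GDS window the wave may touch
GDS_TOTAL_BYTES = 64 * 1024  # total GDS size mentioned in the question
FIELD_MASK = 0xFFFF          # each assumed field is 16 bits wide


def pack_m0_for_gds(base: int, size: int) -> int:
    """Pack a (base, size) pair into a 32-bit m0 value (hypothetical helper)."""
    if not 0 <= base <= FIELD_MASK:
        raise ValueError("base must fit in 16 bits")
    if not 0 < size <= FIELD_MASK:
        raise ValueError("size must be non-zero and fit in 16 bits")
    if base + size > GDS_TOTAL_BYTES:
        raise ValueError("window extends past the end of the GDS")
    return (base << 16) | size


def unpack_m0_for_gds(m0: int) -> tuple[int, int]:
    """Split a 32-bit m0 value back into (base, size) under the same assumption."""
    return (m0 >> 16) & FIELD_MASK, m0 & FIELD_MASK


if __name__ == "__main__":
    # A 4 KB window starting at byte offset 4096.
    m0 = pack_m0_for_gds(base=0x1000, size=0x1000)
    print(f"m0 = {m0:#010x}")
    print(unpack_m0_for_gds(m0))
```

Note that under this assumed layout a 16-bit size field cannot express a full 65536-byte window in a single value, so the sketch rejects it rather than silently wrapping to zero.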
Asked by meriken2ch