Accumulating Two Tensor Core wmma::accumulator Fragments

Question: Let's say that I have two instances of wmma::fragment<wmma::accumulator, 16, 16, 16, half>, namely a and b. How would I go about performing an element-wise addition of a and b and storing the result back into a?
Answer:

wmma fragments are stored in the registers of the threads of a warp, so element-wise operations can be performed directly, provided each thread applies the operation to the fragment elements it owns.
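Because two fragments of the same type share the same element-to-thread mapping, a plain loop over the per-thread elements yields a correct element-wise addition. Below is a minimal sketch using the standard CUDA WMMA fragment members x and num_elements; the helper name is illustrative, not part of any API:

```cpp
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Illustrative helper: each thread adds the fragment elements it owns of b
// into a. Both fragments have the same type, so their element-to-thread
// mapping is identical and an index-wise loop over x[] is element-wise
// addition of the full 16x16 tile.
__device__ void add_accumulators(
    wmma::fragment<wmma::accumulator, 16, 16, 16, half>&       a,
    const wmma::fragment<wmma::accumulator, 16, 16, 16, half>& b)
{
    for (int i = 0; i < a.num_elements; ++i)
        a.x[i] = __hadd(a.x[i], b.x[i]);  // half-precision add from cuda_fp16.h
}
```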
Researchers at the Tokyo Institute of Technology have developed a C++ library, wmma_extension, that (among other features, such as recovering FP32 accuracy from TF32 tensor core operations) makes arithmetic operations on wmma fragments easy.
The library can be found here: https://github.com/wmmae/wmma_extension
Performing such arithmetic as a simple one-liner (plus the required include) is shown here: https://github.com/wmmae/wmma_extension/blob/main/docs/ops.md
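With that page in hand, the addition reduces to an overloaded operator on the fragments. The include path and operator in the sketch below are assumptions, not verified signatures; consult the linked ops.md for the exact header and names:

```cpp
#include <mma.h>
// Assumed include path for the library's fragment operators; check
// docs/ops.md in wmma_extension for the authoritative header and namespace.
#include <wmma_extension/operators.hpp>

using namespace nvcuda;

__device__ void add_accumulators_ext(
    wmma::fragment<wmma::accumulator, 16, 16, 16, half>&       a,
    const wmma::fragment<wmma::accumulator, 16, 16, 16, half>& b)
{
    a = a + b;  // element-wise fragment addition via the library's operator+ (assumed API)
}
```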
The authors released two related papers in 2023.