What is the instruction number per cycle in fma with minus?

Question

What is the instruction number per cycle in fma with minus?

396 Views Asked by Jannus YU At 02 September 2017 at 07:11

If I use fma(a, b, c) in cuda, it means that the formula ab+c is calculated in a single ternary operation. But if I want to calculate -ab+c, does the invoking fma(-a, b, c) take one more multiply operation ?

Original Q&A

There are 1 best solutions below

**tera** · Accepted Answer · 2017-09-02T14:10:32.400000

Unfortunately shader assembly language is undocumented at that level.

However we can try it out:

#!/bin/bash
cat <<EOF > fmatest.cu
__global__ void fma_plus(float *res, float a, float b, float c)
{
    *res = fma(a, b, c);
}

__global__ void fma_minus(float *res, float a, float b, float c)
{
    *res = fma(-a, b, c);
}
EOF
nvcc -arch sm_60 -c fmatest.cu
cuobjdump -sass fmatest.o

gives

code for sm_60
    Function : _Z9fma_minusPffff
.headerflags    @"EF_CUDA_SM60 EF_CUDA_PTX_SM(EF_CUDA_SM60)"
                                                                 /* 0x001fc400fe2007f6 */
    /*0008*/                   MOV R1, c[0x0][0x20];             /* 0x4c98078000870001 */
    /*0010*/                   MOV R0, c[0x0][0x148];            /* 0x4c98078005270000 */
    /*0018*/                   MOV R5, c[0x0][0x14c];            /* 0x4c98078005370005 */
                                                                 /* 0x001fc800fe8007f1 */
    /*0028*/                   MOV R2, c[0x0][0x140];            /* 0x4c98078005070002 */
    /*0030*/                   MOV R3, c[0x0][0x144];            /* 0x4c98078005170003 */
    /*0038*/                   FFMA R0, R0, -R5, c[0x0][0x150];  /* 0x5181028005470000 */
                                                                 /* 0x001ffc00ffe000f1 */
    /*0048*/                   STG.E [R2], R0;                   /* 0xeedc200000070200 */
    /*0050*/                   EXIT;                             /* 0xe30000000007000f */
    /*0058*/                   BRA 0x58;                         /* 0xe2400fffff87000f */
                                                                 /* 0x001f8000fc0007e0 */
    /*0068*/                   NOP;                              /* 0x50b0000000070f00 */
    /*0070*/                   NOP;                              /* 0x50b0000000070f00 */
    /*0078*/                   NOP;                              /* 0x50b0000000070f00 */
    ..................................


    Function : _Z8fma_plusPffff
.headerflags    @"EF_CUDA_SM60 EF_CUDA_PTX_SM(EF_CUDA_SM60)"
                                                                /* 0x001fc400fe2007f6 */
    /*0008*/                   MOV R1, c[0x0][0x20];            /* 0x4c98078000870001 */
    /*0010*/                   MOV R0, c[0x0][0x148];           /* 0x4c98078005270000 */
    /*0018*/                   MOV R5, c[0x0][0x14c];           /* 0x4c98078005370005 */
                                                                /* 0x001fc800fe8007f1 */
    /*0028*/                   MOV R2, c[0x0][0x140];           /* 0x4c98078005070002 */
    /*0030*/                   MOV R3, c[0x0][0x144];           /* 0x4c98078005170003 */
    /*0038*/                   FFMA R0, R0, R5, c[0x0][0x150];  /* 0x5180028005470000 */
                                                                /* 0x001ffc00ffe000f1 */
    /*0048*/                   STG.E [R2], R0;                  /* 0xeedc200000070200 */
    /*0050*/                   EXIT;                            /* 0xe30000000007000f */
    /*0058*/                   BRA 0x58;                        /* 0xe2400fffff87000f */
                                                                /* 0x001f8000fc0007e0 */
    /*0068*/                   NOP;                             /* 0x50b0000000070f00 */
    /*0070*/                   NOP;                             /* 0x50b0000000070f00 */
    /*0078*/                   NOP;                             /* 0x50b0000000070f00 */
    .................................

So the FFMA instruction can indeed take an additional sign to apply to the product (note that it is applied to b in the shader assembly instruction, however this gives the same result). You can try the same with double precision operands and other compute capabilities instead of sm_60 as well, which will give you similar results.

What is the instruction number per cycle in fma with minus?

There are 1 best solutions below

Related Questions in CUDA

Related Questions in FMA

Trending Questions

Popular # Hahtags

Popular Questions