What does the "denormal input" exactly mean in assembly when we consider using DAZ flag for SSE Floating Points

Question


410 Views Asked by lionel At 27 April 2020 at 11:35

I've read this article and the question "Do denormal flags like Denormals-Are-Zero (DAZ) affect comparisons for equality?", and I understand the usage of the FTZ and DAZ flags and the difference between them.

DAZ applies on input, FTZ on output from an FP operation.

What confuses me is where a denormal value can come from, at the assembly level, if FTZ is set. I think it can only be a constant, either an immediate operand or a value in section .rodata (accessed with RIP-relative addressing).

But in my binary there are no denormal values in these places, and it still suffers from FP-ASSIST issues that cause bad performance.

If I set both DAZ and FTZ, the issue disappears and performance gets better. I can't find any denormal inputs in my source code either. I am really confused: where do the denormal values come from?

Another question, by the way: for the instruction vmovsd 0x9498(%rip),%xmm0, supposing the value at 0x9498(%rip) is a denormal, what happens to xmm0 after this instruction executes if we set FTZ or DAZ, respectively?

In my understanding, DAZ would make it treat the value at 0x9498(%rip) as zero and move 0 to xmm0; FTZ would move the value at 0x9498(%rip) to xmm0, find that it is a denormal, and flush xmm0 to zero. I'm not sure; is this correct?


**Peter Cordes** · Answer 1 · 2020-04-27T12:30:51.430000

A denormal, aka subnormal, is a value with exponent field = 0 and a non-zero mantissa in the IEEE binary format. (Exponent field = 0 with an all-zero mantissa is ±0.0.) https://en.wikipedia.org/wiki/Double-precision_floating-point_format
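As a minimal sketch of that definition, the check below looks only at the bit pattern of a double, so its answer does not depend on the FTZ/DAZ bits in MXCSR. The helper names `double_from_bits` and `is_subnormal` are hypothetical, chosen for this illustration; the code assumes `double` is IEEE binary64.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double without doing any FP arithmetic. */
static double double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical helper: returns 1 if x is subnormal, judged purely from
   its bit pattern (assumes IEEE binary64). */
static int is_subnormal(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint64_t exponent = (bits >> 52) & 0x7FF;          /* 11-bit exponent field */
    uint64_t mantissa = bits & 0x000FFFFFFFFFFFFFULL;  /* 52-bit fraction field */
    return exponent == 0 && mantissa != 0;
}
```

For example, `is_subnormal(double_from_bits(1))` is true (the smallest positive subnormal), while `is_subnormal(0.0)` and `is_subnormal(1.0)` are false.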

When an FP math instruction (not move or pure bitwise boolean) reads such a number as an input operand, it has to handle that special case when lining up the mantissa with the other operand, and when applying the implicit top bit of the mantissa that's implied by the exponent being 0 or non-zero.

Yes, most of the time FTZ on output is sufficient, because most floating-point values are the results of other FP computations. And yes, FTZ is necessary because mul/div/add/sub on normal numbers can create a subnormal result. (For add, the inputs need opposite signs.) The other IEEE "basic" exactly-rounded operation, sqrt, can't create subnormals because it moves numbers closer to 1.0.
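A small sketch of normal inputs producing a subnormal result, and of FTZ flushing it, might look like this. It assumes x86-64 with GCC or Clang (scalar double math done with SSE2), with the underflow exception left masked as it is by default; `tiny_product_bits` is a hypothetical name for this illustration.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */

/* Multiply two normal doubles whose exact product (2^-1030) is below the
   smallest normal double (2^-1022), and return the raw bits of the result.
   volatile keeps the compiler from folding the multiply at compile time. */
static uint64_t tiny_product_bits(int ftz_on) {
    unsigned int saved = _mm_getcsr();
    _MM_SET_FLUSH_ZERO_MODE(ftz_on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);

    volatile double a = 0x1p-515;   /* normal */
    volatile double b = 0x1p-515;   /* normal */
    volatile double r = a * b;      /* 2^-1030: subnormal unless flushed */

    _mm_setcsr(saved);              /* restore the caller's MXCSR */

    double result = r;
    uint64_t bits;
    memcpy(&bits, &result, sizeof bits);
    return bits;
}
```

With FTZ off, the returned bits have exponent field 0 and a single mantissa bit set (a subnormal); with FTZ on, the returned bits are all zero (+0.0).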

The obvious thing would be to use perf record to find out where you're getting FP assists, then add extra checks there that print (or otherwise report) when a denormal shows up. (Then set a breakpoint in that branch so you can examine the situation.)

Possible sources of denormals (not exhaustive) with FTZ set, i.e. other than FP math ops:

- String-to-float that builds an FP bit-pattern with extended-precision integer math, like glibc's strtod.
- Input files / network, if you're reading binary data.
- Other threads, or shared memory from other processes, running without FTZ. (FTZ/DAZ and the rounding mode in MXCSR are per-thread architectural state. Speaking of which, if you only set FTZ in the main thread after starting another thread, it won't be in effect for the already-started thread.)
- Possibly integer manipulation of FP bit-patterns, like nextafter. Also possibly the internals of an exp implementation that stuffs an integer into the exponent field of a double.
- Compile-time constant values. They don't have to appear in the source code as a literal value, though. e.g. static double foo = DBL_MIN / 4.0; would be a compile-time denormal. But you would find them in .rodata or .data. Non-const non-zero static / global variables go in .data.
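To illustrate two of the sources listed above, here is a hedged sketch that checks values arriving from strtod and from a compile-time constant. The function names `count_subnormals` and `subnormals_in_sample_inputs` are hypothetical, the input string is made up for the example, and the check assumes `double` is IEEE binary64. The test works on bit patterns rather than FP compares, so it should not itself be affected by DAZ.

```c
#include <float.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A compile-time subnormal that never appears as a literal in the source. */
static double foo = DBL_MIN / 4.0;

/* Count elements whose exponent field is 0 and whose fraction is non-zero. */
static size_t count_subnormals(const double *v, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t bits;
        memcpy(&bits, &v[i], sizeof bits);
        if ((bits & 0x7FF0000000000000ULL) == 0 &&
            (bits & 0x000FFFFFFFFFFFFFULL) != 0)
            count++;
    }
    return count;
}

/* Hypothetical input stage: one value parsed from text, one static
   constant, and one ordinary value. */
static size_t subnormals_in_sample_inputs(void) {
    double inputs[3];
    inputs[0] = strtod("1e-310", NULL);  /* below DBL_MIN (about 2.2e-308) */
    inputs[1] = foo;
    inputs[2] = 1.0;
    return count_subnormals(inputs, 3);
}
```

A scan like `count_subnormals` could be run on a buffer right after it is read from a file or socket, which is one way to narrow down where the denormals enter.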

Obviously any manual manipulation of FP bit-patterns using integer instructions can do it, too. The trick in "How to use bits in a byte to set dwords in ymm register without AVX2? (Inverse of vmovmskps)" could have produced denormal inputs to a compare if I hadn't spent an extra instruction to avoid it, but that's an unusual manual vectorization trick that compilers wouldn't do for you.

> immediate operands

x86 doesn't have FP immediates; you'd have to mov rax, imm64 / movq xmm0, rax or similar. But compilers don't do that because it's generally more efficient to load from .rodata.

> for instruction vmovsd 0x9498(%rip),%xmm0

vmovsd is just a load and always copies the 64 bits exactly; architecturally equivalent to a vmovq SIMD-integer load.

It doesn't run the value through an ALU, so no MXCSR bits have any effect on vmovsd, FP shuffles, etc. Only instructions that do actual FP math and can raise FP exceptions are affected. You can tell by looking at the exceptions section of the asm manual entry. e.g. roundsd does obey DAZ, so it may treat a denormal input as zero before rounding it according to the specified mode.
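The difference between a plain copy and an FP math instruction under DAZ can be sketched as follows. This assumes x86-64 with GCC or Clang and a CPU that supports DAZ (bit 6 of MXCSR); `MXCSR_DAZ`, `struct daz_result` and `copy_vs_math_under_daz` are hypothetical names used only for this illustration.

```c
#include <stdint.h>
#include <string.h>
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr */

#define MXCSR_DAZ 0x0040u   /* Denormals-Are-Zero is bit 6 of MXCSR */

struct daz_result {
    uint64_t copied_bits;   /* bits after a plain load/store */
    uint64_t product_bits;  /* bits after multiplying by 1.0 */
};

static struct daz_result copy_vs_math_under_daz(void) {
    uint64_t subnormal_bits = 1;          /* smallest positive subnormal */
    double x;
    memcpy(&x, &subnormal_bits, sizeof x);

    /* volatile stops the compiler from folding the operations below. */
    volatile double in = x;
    volatile double one = 1.0;

    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | MXCSR_DAZ);        /* turn DAZ on */

    volatile double copied = in;          /* load + store only, no FP math */
    volatile double product = in * one;   /* multiply reads `in` as an FP input */

    _mm_setcsr(saved);                    /* restore the caller's MXCSR */

    struct daz_result r;
    double c = copied, p = product;
    memcpy(&r.copied_bits, &c, sizeof c);
    memcpy(&r.product_bits, &p, sizeof p);
    return r;
}
```

The copy keeps the subnormal bit pattern intact, while the multiply sees its denormal input as zero and produces +0.0.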
