ARM VFPv3 assembler instructions


I am attempting to debug a very low-level data fault on a TI AM3358 MCU. It is coming from floating point math.

The system uses TI RTOS, the GNU 7.3.1 compiler, and VFPv3 (is VFP a compiler setting? A floating point math library? I'm not clear on how the floating point code is generated). So although I have disassembly listing fragments, the fix needs to be at the C code level.

This is a two part question:

First do I understand the mnemonics correctly? And why are some not listed?

I noticed the disassembly has opcodes for which no mnemonics are shown. Here is a listing fragment; no need to get into details yet. Just notice that mnemonics are missing, and I don't think those words are immediate data (comments added by me as I reverse engineered the compiled code):

8003ced0:   EEF1FA10            vmrs       apsr_nzcv, fpscr     ; Pull STAT reg to ARM MCU
8003ced4:   DA000041            ble        #0x8003cfe0          ;  branch less-equal to 0x...3cfe0
8003ced8:   EEFD7BE0           .word       0xeefd7be0          ; ???  What is this
8003cedc:   EDC47A0A            vstr       s15, [r4, #0x28]     ; Store S15 <-  r4+28 = st->f2.z
8003cee0:   E584702C            str        r7, [r4, #0x2c]      ; Store r7 <-  r4+2c = st->f2.a
8003cee4:   E3A03000            mov        r3, #0
8003cee8:   E5843030            str        r3, [r4, #0x30]
8003ceec:   EE07CA90            vmov       s15, r12             ; ( I decode this below)
8003cef0:   EEF80BE7           .word       0xeef80be7           ; ???
8003cef4:   EE702BA2           .word       0xee702ba2           ; ???
8003cef8:   EEFD7BE2           .word       0xeefd7be2           ; ???
8003cefc:   EDC47A0D            vstr       s15, [r4, #0x34]
8003cf00:   E5845018            str        r5, [r4, #0x18]
8003cf04:   EE701BA1           .word       0xee701ba1
8003cf08:   EEFD7BE1           .word       0xeefd7be1

To be sure I could understand VFPv3 mnemonics, I decoded address 8003ceec as the following:

8003ceec:   EE07CA90            vmov       s15, r12
VMOV   (between ARM core register and single-precision register)
            1110    unconditional
            1110    
            0000    opt = 0: so this is TO the VFP
            0111    Vn = 7   (but still need one more bit from nibble 1)
            1100    Rt = 12 
            1010    
            1001    N = 1  (so n = 01111  =S15)
            0000

It came from https://developer.arm.com/documentation/ddi0406/c/Application-Level-Architecture/Instruction-Details/Alphabetical-list-of-instructions/VMOV--between-ARM-core-register-and-single-precision-register-?lang=en, (I'm pretty sure I got this correct, if not, any correction welcome)
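
To double-check the hand decode above, the same field extraction can be sketched in C. This is only an illustration: the struct and function names are made up, and the should-be-zero bits are treated as fixed for simplicity.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of VMOV (between ARM core register and single-precision register),
   encoding A1. Struct and function names are hypothetical. */
struct vmov_fields {
    unsigned cond;  /* bits 31:28, 0xE = always */
    unsigned op;    /* bit 20, 0 = core register to VFP register */
    unsigned rt;    /* bits 15:12, ARM core register number */
    unsigned sn;    /* Vn:N, single-precision register number */
};

/* Returns 1 and fills *out if word matches the fixed bits of the encoding,
   otherwise returns 0 and leaves *out untouched. */
static int decode_vmov_core_single(uint32_t word, struct vmov_fields *out)
{
    assert(out != 0);
    /* Fixed bits: 27:21 = 1110000, 11:8 = 1010, 6:4 = 001, 3:0 = 0000. */
    if ((word & 0x0FE00F7Fu) != 0x0E000A10u)
        return 0;
    out->cond = word >> 28;
    out->op   = (word >> 20) & 1u;
    out->rt   = (word >> 12) & 0xFu;
    /* The register number is the 4-bit Vn field followed by the N bit. */
    out->sn   = (((word >> 16) & 0xFu) << 1) | ((word >> 7) & 1u);
    return 1;
}
```

Feeding it 0xEE07CA90 gives op = 0, Rt = 12 and Sn = 15, which agrees with `vmov s15, r12`.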

So, what are opcodes 0xeef80be7, 0xee702ba2, etc.? I am unable to decipher them from the ARM books or sites. Following the VFP/NEON pattern, this is some kind of 'unconditional move', but beyond that I can't match the bit pattern to anything (and the web site is extremely unfriendly to this kind of search; I resorted to downloading a PDF and doing a bit search).

As for the second question, if there is an easy obvious answer, I'd appreciate being steered in the right direction.

This is a compiled C function that takes a pointer to a structure, then pulls members out of it and does some floating point math. I determined that the structure address is stored in R4.

An example prototype would be

int Function(int x, int y, struct *a);

And is called as (fictional example)

Function (5,5,&st[0]);

later on

Function (5,7,&st[1]);

There is a Data Abort crash which only occurs when accessing the second structure. Never when accessing the first. And only when the VFP/Neon is accessing it, not the regular ARM registers.

Getting into the mud of the code, R4 is the address of the structure passed in:

8003cfe0:   EEFD7BE0           .word       0xeefd7be0          ;  branch lands here 
8003cfe4:   EDC47A06            vstr       s15, [r4, #0x18]    ;  CRASH  Store S15 <-  r4+24 = st->f1.x
8003cfe8:   E584C01C            str        r12, [r4, #0x1c]    ;  r12 = st->f1.y
8003cfec:   E3A03000            mov        r3, #0
8003cff0:   E5843020            str        r3, [r4, #0x20]

I verified all the offsets of the members from the pointers, and everything is correct.

Repeating, the crash occurs at address 8003cfe4, but only when the R4 pointer is pointing to the st[1], never when pointing to st[0].

I know a "Data Abort" comes from attempting to access memory that the MMU has not been configured to permit. And yet, everything else can access all the members of st[1]. It only happens when the VFP code tries to access it.

In fact, the instructions at addresses 8003cedc, 8003cee0, and 8003cee8, which all execute before address 8003cfe4, happily access members of that structure, which makes me believe this is not an MMU access issue?

Could it be the result of a cache miss? Or is there some other VFP issue trying to move between the VFP system and memory? Or is there an issue where the coprocessor isn't ready yet?

I was able to get around this crash by removing all the floating point math. But that really harms the functionality of the application. I'd much prefer that the floating point math work correctly.

Any ideas would be welcomed.

-Scotty


There are 2 best solutions below

SpacemanScott On

While I don't have an answer to the unknown opcodes, in answer to the second part: the VFP coprocessor must have data transferred into and out of it on proper boundaries, in this case 4 bytes.

While the offsets into the structure were correctly aligned, the base of the structure itself was not. It started (due to packing) at address 0x...2931. So the member 40 bytes in (+0x28) was at an odd-numbered address.

Simply adding

}  __attribute__ ((aligned (4)))

at the end of the structure declaration solved the problem.
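
As a rough illustration of why the attribute helps, here is a minimal sketch using GCC attribute syntax. The two structures and their members are invented stand-ins, not the real declaration, and 0x2931 only stands for the low bits of the address mentioned above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real structure; member names are invented. */
struct state_packed {
    int32_t count;
    float   value;
} __attribute__((packed));              /* alignment requirement drops to 1 */

struct state_fixed {
    int32_t count;
    float   value;
} __attribute__((packed, aligned(4)));  /* same layout, 4-byte aligned base */

static_assert(_Alignof(struct state_packed) == 1,
              "a packed struct may start at any address");
static_assert(_Alignof(struct state_fixed) == 4,
              "aligned(4) restores word alignment of the base");
static_assert(offsetof(struct state_fixed, value) % 4 == 0,
              "member offset is a multiple of 4");

/* Returns 1 if addr is a multiple of 4, which is what VLDR/VSTR need. */
static int is_word_aligned(uintptr_t addr)
{
    return (addr & 3u) == 0;
}
```

With a base ending in 0x2931, `is_word_aligned(0x2931 + 0x18)` is 0 even though the offset 0x18 itself is a multiple of 4; with a base ending in 0x2930 the same member is word aligned.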

*** Update ***

I attempted many ways to replicate this issue in a code fragment. In all cases, the compiler generated code that moved the 32-bit value from memory to a register before moving it into the NEON processor.

I was able to forcibly cause the data fault with an inline assembler statement attempting to move unaligned data directly from an odd numbered address to the neon processor.

asm("vstr  s15, [r0, #0x4] ");

(R0 contained the base address ending in x1)

Therefore this is likely an optimization bug in the GNU compiler.

I post this in the event someone else gets bitten by this issue.
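
If the structure layout cannot be changed, one common workaround (not something I tried in the original project) is to go through `memcpy` whenever a float lives at a possibly unaligned address. A minimal sketch, with hypothetical helper names:

```c
#include <string.h>

/* Store a float at an address that may not be 4-byte aligned.
   memcpy makes no alignment assumption about dst, so the compiler
   should not turn this into a direct VSTR to an unaligned address. */
static void store_float_unaligned(void *dst, float v)
{
    memcpy(dst, &v, sizeof v);
}

/* Load a float from an address that may not be 4-byte aligned. */
static float load_float_unaligned(const void *src)
{
    float v;
    memcpy(&v, src, sizeof v);
    return v;
}
```

For example, `store_float_unaligned(buf + 1, 1.5f)` followed by `load_float_unaligned(buf + 1)` round-trips the value through an odd address in a byte buffer.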

Pete Lomax On

Since I'm currently writing a disassembler, I quickly threw the hex from the question at it, and got the following result - no warranties or apologies!

EEF1FA10h,fmstat
DA000041h,ble #000083AC
EEFD7BE0h,ftosizd s15,d0
EDC47A0Ah,fsts s15,[r4+40]
E584702Ch,str r7, [r4+44]
E3A03000h,mov r3, 0
E5843030h,str r3, [r4+48]
EE07CA90h,fmsr s15,r12
EEF80BE7h,fsitod d0,s15
EE702BA2h,faddd d2,d0,d2
EEFD7BE2h,ftosizd s15,d2
EDC47A0Dh,fsts s15,[r4+52]
E5845018h,str r5, [r4+24]
EE701BA1h,faddd d1,d0,d1
EEFD7BE1h,ftosizd s15,d1
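
One detail that may be worth verifying against the ARM Architecture Reference Manual: in the ARMv7 VFP encodings a double-precision register number is five bits wide, made of an extension bit (D, N or M) placed in front of the 4-bit Vd, Vn or Vm field. The sketch below extracts those numbers for a three-register instruction such as the `faddd` lines above. The function names are hypothetical and this is my reading of the encoding, not output from any disassembler.

```c
#include <stdint.h>

/* Double-precision register numbers of a VFP three-register
   data-processing instruction (function names are hypothetical).
   Each 5-bit number is the extension bit followed by the 4-bit field. */
static unsigned vfp_dd(uint32_t w)   /* D:Vd, bit 22 and bits 15:12 */
{
    return (((w >> 22) & 1u) << 4) | ((w >> 12) & 0xFu);
}

static unsigned vfp_dn(uint32_t w)   /* N:Vn, bit 7 and bits 19:16 */
{
    return (((w >> 7) & 1u) << 4) | ((w >> 16) & 0xFu);
}

static unsigned vfp_dm(uint32_t w)   /* M:Vm, bit 5 and bits 3:0 */
{
    return (((w >> 5) & 1u) << 4) | (w & 0xFu);
}
```

For 0xEE702BA2 these return 18, 16 and 18. If that reading is right, the operands are in the d16-d31 range rather than d0-d2, which could also be why a tool that only knows the lower sixteen double registers prints these opcodes as `.word`.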