I'm reading the book Learn to Program with Assembly by Jonathan Bartlett. I'm in chapter 7, Data Records. The author introduces structs and records in assembly. He creates a simple person record in the file persondata.s, which holds an array of 6 people and their characteristics: weight, hair colour, height and age.
persondata.s
.section .data
.globl people, numpeople
numpeople:
# Calculate the number of people in array
.quad (endpeople - people)/PERSON_RECORD_SIZE
people:
# Array of people
.quad 200, 2, 74, 20
.quad 280, 2, 72, 44 # me!
.quad 150, 1, 68, 30
.quad 250, 3, 75, 24
.quad 250, 2, 70, 11
.quad 180, 5, 69, 65
endpeople: # Marks the end of the array for calculation purposes
# Describe the components of the struct
.globl WEIGHT_OFFSET, HAIR_OFFSET, HEIGHT_OFFSET, AGE_OFFSET
.equ WEIGHT_OFFSET, 0
.equ HAIR_OFFSET, 8
.equ HEIGHT_OFFSET, 16
.equ AGE_OFFSET, 24
# Total size of the struct
.globl PERSON_RECORD_SIZE
.equ PERSON_RECORD_SIZE, 32
This file contains only the data records. At the end of the file there is a constant, PERSON_RECORD_SIZE, which gives the size in bytes of a single person in the array. It's later used in a loop to advance to the next person, specifically to that person's height. He then wrote a program, tallest.s, that returns the largest height value.
tallest.s
.globl _start
.section .text
_start:
### Initialize Registers ###
# Pointer to first record
leaq people, %rbx
# Record count
movq numpeople, %rcx
# Tallest value found
movq $0, %rdi
### Check Preconditions ###
# If there are no records, finish
cmpq $0, %rcx
je finish
### Main Loop ###
mainloop:
# %rbx is the pointer to the whole struct
# This instruction grabs the height field
# and stores it in %rax
movq HEIGHT_OFFSET(%rbx), %rax
# If it is less than or equal to our current
# tallest, go to the next one.
cmpq %rdi, %rax
jbe endloop
# Copy this value as the tallest value
movq %rax, %rdi
endloop:
# Move %rbx to point to the next record
addq $PERSON_RECORD_SIZE, %rbx
# Decrement %rcx and do it again
loopq mainloop
### Finish it off ###
finish:
movq $60, %rax
syscall
I understand everything in this program except one thing. We access the person's height using `movq HEIGHT_OFFSET(%rbx), %rax`, and I understand that. But to move to the next person, and specifically to that person's height, he uses `addq $PERSON_RECORD_SIZE, %rbx`. Here is my question: if this line just adds 32 to rbx to move to the next person and their height value in memory, why is there a `$` sign before the constant name? I thought direct memory mode would be appropriate here. We use it at the beginning, in `movq numpeople, %rcx`, to move the number of people into the rcx register.
I've put `$32` instead of `$PERSON_RECORD_SIZE` and it works fine. But when I put `PERSON_RECORD_SIZE` without the `$`, I get a segmentation fault. I don't understand this; there seems to be some inconsistency. Is it some other addressing mode that I'm not aware of? (Keep in mind I've been learning assembly for only a couple of weeks and I'm not a software engineer by profession.) I'm sure it's some detail I'm missing.
`.equ foo, 32` defines an assemble-time constant, not assembling any bytes in the current section of the output.

Actually it defines a symbol like a `foo:` label would, but with "address" `32`; that's why in GAS you're able to export it with `.globl foo` so it's visible to the linker in the symbol table of the `.o` file.

`add $foo, %rbx` adds 32, the "value" (aka address) of the symbol.
`add foo, %rbx` would load from absolute address `32`, using the symbol as the address of a memory operand.

In both cases, the symbol "address" becomes part of the machine code of the instruction being assembled; the difference is only whether the opcode is one that uses it as an immediate or as a memory operand. That's also true if you do `.quad foo` to emit 8 bytes with that value (address). This applies regardless of whether the symbol address aka value is labeling a position in some section, or is an integer defined with `.equ` or `foo = 123` (alternate syntax for the same thing).

Unfortunately, when assembling
`add $foo, %rbx`, the assembler doesn't know the value yet (it's only a link-time constant since it's an undefined symbol in this file). So it picks `add $imm32, %rbx`, making the instruction 3 bytes larger in machine code than `add $32, %rbx`, which would see the small value at assemble time and be able to pick the 8-bit immediate encoding (https://www.felixcloutier.com/x86/add). For this reason, I wouldn't recommend using `.equ` across asm source files. Use the C preprocessor so you can `#define foo 32` in a `.h` that you `#include` in both files.

(The difference between a symbol labeling an address in some section vs. an absolute constant (section `*UND*`) actually creates ambiguity in GNU assembler `.intel_syntax noprefix` mode, but that applies even for code that uses it earlier in the same source file than the `.equ`, not just across source files: see Distinguishing memory from constant in GNU as .intel_syntax. There's actually one ambiguity even in AT&T syntax, but only in a corner case that's not useful.)

The "value" of a symbol is its address. (A rough analogy is `extern char foo[]`, so writing `foo` is the address, but AT&T syntax dereferences bare symbol names implicitly in instructions; the analogy works better in NASM.) If there are bytes in memory at that address, you can use the symbol to assemble instructions that will at runtime access them, but you can't get the bytes into an immediate embedded in the machine code, or use it to control a `.rept` or anything like that.

Variables are a high-level language concept which you can implement in assembly using labels to define symbols ahead of directives like
`.byte` to emit some static storage, such as `bar: .byte 123`. The symbol `bar` has a "value" which other instructions can refer to at assemble/link time (e.g. to generate a byte-load from 4 bytes later, like `movzbl bar+4(%rip), %eax`).

An asm source line involving the symbol `foo` will assemble bytes into the output that include the symbol's "address" (or things based on it, like for relative addressing, or `mov $foo-bar, %eax` for the distance between two symbols).

But you can't make the assembler reference the `123` byte or any other bytes that happen to be at or near the address the symbol is attached to. Access to bytes assembled into the output can only happen at run-time. E.g. `.quad bar` emits the 64-bit absolute address, and `.byte bar` tries to fit the absolute address into a byte but will fail at link time (unless you had a linker script that put your `.data` section in the very bottom of address-space!). If you wanted to avoid repeating yourself and hard-coding `123` in multiple places, you'd need to use an assemble-time constant like `.equ barval, 123` and use that everywhere, like `.byte barval`.

Use RIP-relative addressing
In 64-bit code you'd normally never write `add symbol, %reg`; the 32-bit absolute addressing mode is not efficient or useful for anything except maybe MMIO at some absolute address, not for your own `.data`. You always want `add symbol(%rip), %reg` RIP-relative addressing if `symbol` is the address of static storage. Using symbol addresses as 32-bit absolute is useful when used with other registers as an array index, as in `add my_array(,%rdx,8), %rax` or something, but see 32-bit absolute addresses no longer allowed in x86-64 Linux? (you can't do that in a PIE executable).

See also Referencing the contents of a memory location. (x86 addressing modes) re: x86-64's selection of addressing modes.
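As a minimal sketch of the RIP-relative forms described above (the label `counter` and its contents are made up for illustration, they're not from the book), something like this should assemble with plain `as` and `ld` on x86-64 Linux:

```asm
# Sketch only: `counter` is a hypothetical label, not part of persondata.s.
.section .data
counter:
    .quad 5

.section .text
.globl _start
_start:
    movq $10, %rax
    # RIP-relative memory operand: loads the 8 bytes stored at counter
    addq counter(%rip), %rax    # %rax = 10 + 5 = 15
    # RIP-relative LEA: puts the address of counter in %rbx, no load
    leaq counter(%rip), %rbx
    # Register-indirect load through the pointer we just computed
    addq (%rbx), %rax           # %rax = 15 + 5 = 20
    movq %rax, %rdi             # exit status = 20
    movq $60, %rax              # exit system call number
    syscall
```

Running it and then checking `echo $?` should show 20. The same `leaq symbol(%rip), %reg` pattern would also work in place of `leaq people, %rbx` in tallest.s if you ever need to build it as a PIE.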
Also see How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more detail about the difference in meaning for `foo(%rip)` vs. `123(%rip)`.
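To tie this back to the original question, here's a small sketch of the immediate vs. memory-operand distinction with an `.equ` constant. The name `RECORD_SIZE` is hypothetical; it just mirrors the book's `PERSON_RECORD_SIZE` but is defined in the same file, so the assembler knows its value.

```asm
# Sketch only: RECORD_SIZE is a hypothetical stand-in for PERSON_RECORD_SIZE.
.equ RECORD_SIZE, 32

.section .text
.globl _start
_start:
    movq $0, %rbx
    # Immediate operand: the $ means "use the symbol's value itself",
    # so this adds 32 to %rbx
    addq $RECORD_SIZE, %rbx
    # Memory operand (deliberately commented out): without the $, this
    # would try to load 8 bytes from absolute address 32. Nothing is
    # mapped there in a normal Linux process, so it would segfault.
    # addq RECORD_SIZE, %rbx
    movq %rbx, %rdi             # exit status = 32
    movq $60, %rax              # exit system call number
    syscall
```

After running it, `echo $?` should print 32. Uncommenting the second `addq` should reproduce the segmentation fault from the question, while `movq numpeople, %rcx` works in the book's code because `numpeople` labels real storage in `.data`.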