The RISC-V instruction auipc computes rd = PC + (imm << 12), where rd is the destination register and imm is a 20-bit immediate.
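To make the PC dependence concrete, here is a small Python model of RV32 auipc. It is a sketch of the arithmetic only, not of any real simulator, and the addresses are made up (0x800 is 2048). The same encoded immediate produces a different rd depending on where the instruction executes.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def auipc(pc, imm20):
    """Model RV32 AUIPC: rd = pc + (imm20 << 12), wrapping to 32 bits."""
    offset = sign_extend(imm20, 20) << 12
    return (pc + offset) & MASK32

# Same instruction, two load addresses: the result moves with the PC.
print(hex(auipc(0x00000800, 0x00003)))  # 0x3800
print(hex(auipc(0x20000800, 0x00003)))  # 0x20003800
print(hex(auipc(0x00001000, 0xFFFFF)))  # 0x0 (imm20 = -1, so pc - 0x1000)
```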
The result of the above instruction depends on the address the binary is running at. Let's suppose a system uses a bootloader to boot a firmware image. In that case, the initial PC for the firmware image will be different from 0x0. This will be reflected in the linker script by doing something like:
.text :
{
_text = .;
*(.text)
_etext = .;
} > FW_IMG
where FW_IMG is something like:
FW_IMG (rx): ORIGIN = 2048, LENGTH = 2304
My question is, how can this work?
I mean, let's suppose a 32-bit CPU, and that the 4th instruction the compiler generates is an auipc. Let's suppose that the FW image is to be placed at address 0x20000000; then the PC will be 0x20000000 + 12 (the 4th instruction, assuming 4-byte instructions). Will the compiler be aware of this so it generates the right immediate for the above auipc instruction?
EDIT
A good example of this is la, a pseudo-instruction that expands to an auipc and an addi. If the compiler generates code to load the address of a symbol, the generated instructions will differ depending on where the image is to be located at runtime.
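A sketch of why the la expansion changes between links: the auipc/addi pair encodes the distance from the auipc to the symbol, split into a high 20-bit part and a signed low 12-bit part. The two (pc, symbol) pairs below are hypothetical, standing in for two different linker scripts.

```python
MASK32 = 0xFFFFFFFF

def pcrel_split(target, pc):
    """Split (target - pc) into the auipc hi20 field and the signed addi lo12 value.

    Adding 0x800 before shifting compensates for addi sign-extending lo12."""
    offset = (target - pc) & MASK32
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF
    lo12 = offset & 0xFFF
    if lo12 & 0x800:
        lo12 -= 0x1000          # addi reads its immediate as signed 12-bit
    return hi20, lo12

def la_result(pc, hi20, lo12):
    """Address produced by: auipc rd, hi20 ; addi rd, rd, lo12 (RV32)."""
    hi = hi20 - 0x100000 if hi20 & 0x80000 else hi20
    return (pc + (hi << 12) + lo12) & MASK32

# Two hypothetical links of the same `la a0, sym` (pc = address of the auipc).
for pc, sym in ((0x00000800, 0x00001234), (0x20000000, 0x20010ABC)):
    hi20, lo12 = pcrel_split(sym, pc)
    print(f"pc={pc:#010x} auipc imm={hi20:#x} addi imm={lo12} -> {la_result(pc, hi20, lo12):#x}")
```

The two links give different auipc immediates (0x1 and 0x11) and different addi immediates, yet both sequences land on the intended symbol.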
EDIT 2
I have tried building the same image with 2 completely different linker scripts, with an la as the first instruction. The generated auipc instructions are indeed different in each case, and they calculate the right address.
The only explanation I can find is that, somehow, the assembler generates auipc 'placeholders' and then the linker fills them in with the right values.
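That guess matches how the tools behave: for a non-PIC la, the assembler leaves zero immediates and emits relocations (R_RISCV_PCREL_HI20 on the auipc, R_RISCV_PCREL_LO12_I on the addi), and the linker patches the instruction bits once addresses are known. The sketch below models only that bit patching on hypothetical addresses; the real PCREL_LO12_I relocation refers back to the auipc's label rather than directly to the symbol, which is skipped here.

```python
MASK32 = 0xFFFFFFFF

AUIPC_A0_0 = 0x00000517     # auipc a0, 0      (placeholder left by the assembler)
ADDI_A0_A0_0 = 0x00050513   # addi  a0, a0, 0  (placeholder left by the assembler)

def patch_pcrel_hi20(insn, hi20):
    """PCREL_HI20-style fixup: write bits 31:12 of a U-type instruction."""
    return (insn & 0x00000FFF) | ((hi20 & 0xFFFFF) << 12)

def patch_pcrel_lo12_i(insn, lo12):
    """PCREL_LO12_I-style fixup: write bits 31:20 of an I-type instruction."""
    return (insn & 0x000FFFFF) | ((lo12 & 0xFFF) << 20)

def link_la(auipc_pc, symbol):
    """Fill both placeholders once the linker knows where the auipc and symbol live."""
    offset = (symbol - auipc_pc) & MASK32
    hi20 = ((offset + 0x800) >> 12) & 0xFFFFF   # round so the signed lo12 works out
    lo12 = offset & 0xFFF
    return patch_pcrel_hi20(AUIPC_A0_0, hi20), patch_pcrel_lo12_i(ADDI_A0_A0_0, lo12)

auipc_word, addi_word = link_la(0x00000800, 0x00001234)   # hypothetical addresses
print(f"{auipc_word:08x} {addi_word:08x}")   # 00001517 a3450513
```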
Let us ask the toolchain.
so.c
start.s
so.ld
There is no reason at this time for this to be an actually functioning program.
position dependent
position independent
In this case I put the .got in the same section, so no major adjustment is needed here. Get to the .got, use the .got to get to the data.
Apparently it tacked it onto .data when not specified. But it is all good: you add 0x3000 to 0x3000 to get to 0x6000.
The call to more_fun is a pc-relative offset.
So until the program gets very big (or you play linker games to make function calls far apart) that all works.
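"Until the program gets very big" has a concrete limit if the call is a plain jal: its immediate is a 21-bit signed, 2-byte-aligned offset, so it reaches about ±1 MiB. (The call pseudo-instruction is auipc+jalr with a much larger reach, and the linker can relax it to a jal when the target is close.) A quick range check, with made-up addresses:

```python
JAL_MIN = -(1 << 20)        # jal reach: 21-bit signed byte offset...
JAL_MAX = (1 << 20) - 2     # ...that is always a multiple of 2

def fits_jal(caller_pc, callee):
    """True if a single pc-relative jal at caller_pc can reach callee."""
    offset = callee - caller_pc
    return offset % 2 == 0 and JAL_MIN <= offset <= JAL_MAX

print(fits_jal(0x3000, 0x3040))              # True: nearby function
print(fits_jal(0x3000, 0x3000 + (1 << 20)))  # False: +1 MiB is one step too far
print(fits_jal(0x200000, 0x100000))          # True: -1 MiB is the furthest back
```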
Here is the thing about position independence: think of the binary as a blob. If you load the binary above at 0x3000, then .data is at 0x6000, 0x3000 bytes away. But if you load it at 0x20003000, then .data is at 0x20006000, which is still 0x3000 bytes away.
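The blob argument in numbers, using the 0x3000/0x6000 addresses from above and assuming the whole image is copied as one piece: the absolute addresses change with the load base, but the distance that the pc-relative code relies on does not.

```python
TEXT_LINK, DATA_LINK = 0x3000, 0x6000   # link-time addresses from the example

def runtime_address(link_addr, load_base, link_base=TEXT_LINK):
    """Where something ends up if the whole image is copied to load_base."""
    return load_base + (link_addr - link_base)

for base in (0x3000, 0x20003000):
    text = runtime_address(TEXT_LINK, base)
    data = runtime_address(DATA_LINK, base)
    print(hex(text), hex(data), hex(data - text))   # distance is 0x3000 both times
```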
But you have to update the .got.
But that is the whole point. You isolate the address of every global (or group of them) and put it in a table. Then, if you want to relocate the program elsewhere, you or the program's loader has to find and change the entries in the .got; in this case, add 0x20000000 to all of them. Then the code all works.
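A toy model of "get to the .got, use the .got to get to the data", with hypothetical globals x and y. It ignores where the .got itself lives (PIC code reaches it pc-relatively, so it moves with the blob) and only shows why its absolute entries must be bumped by the same delta.

```python
# Toy model: a .got of absolute link-time addresses, and .data contents by address.
got = {"x": 0x6000, "y": 0x6004}        # hypothetical globals
memory = {0x6000: 11, 0x6004: 22}

def load_via_got(got, memory, name):
    """What the PIC code does: get to the .got, use the .got to get to the data."""
    return memory[got[name]]

def relocate(got, memory, delta):
    """Move the whole image by delta; the .got entries must move by the same delta."""
    moved_memory = {addr + delta: value for addr, value in memory.items()}
    moved_got = {name: addr + delta for name, addr in got.items()}
    return moved_got, moved_memory

moved_got, moved_memory = relocate(got, memory, 0x20000000)
print(got["x"] in moved_memory)                      # False: a stale .got points at the old spot
print(load_via_got(moved_got, moved_memory, "x"))    # 11: the patched .got finds the data
```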
In a bootloader situation you are probably not an operating system parsing an ELF file.
In your bootstrap you would use auipc x15,0 to get the PC, then use normal (linker plus programming) techniques to get the offset to and size of the .got. And you would make the adjustment to each entry yourself before running any code that relies on the .got to find the data.
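The bootstrap fixup itself is just a loop over 32-bit words. The sketch below does it in Python on a bytearray to show the arithmetic; a real bootstrap would be a few instructions of assembly or C. The parameter names are hypothetical stand-ins for symbols you would define in your own linker script, and run_base assumes the auipc x15,0 executes at the image's first byte (otherwise subtract that instruction's offset).

```python
import struct

MASK32 = 0xFFFFFFFF

def fixup_got(image, got_offset, got_size, link_base, run_base):
    """Add (run_base - link_base) to every 32-bit little-endian word of the .got.

    image      -- bytearray holding the loaded program
    got_offset -- byte offset of the .got inside image (hypothetical linker symbol)
    got_size   -- size of the .got in bytes (hypothetical linker symbol)
    link_base  -- address the image was linked for
    run_base   -- where it actually runs, e.g. what `auipc x15, 0` returned
    """
    bias = (run_base - link_base) & MASK32
    for off in range(got_offset, got_offset + got_size, 4):
        (entry,) = struct.unpack_from("<I", image, off)
        struct.pack_into("<I", image, off, (entry + bias) & MASK32)

image = bytearray(16)
struct.pack_into("<2I", image, 8, 0x6000, 0x6004)   # two .got entries at offset 8
fixup_got(image, 8, 8, link_base=0x3000, run_base=0x20003000)
print([hex(w) for w in struct.unpack_from("<2I", image, 8)])  # ['0x20006000', '0x20006004']
```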
Could the toolchain do this without a .got?
Sure, but...
this
created an optimization I did not want.
I wanted this position independence
but despite asking for position independence I got this, which is position dependent.
-fpic vs -fpie. You probably want -fpie to make life much easier, but as shown you need to know the tools. The tools know how to do it, but we seem to be able to trip them up.
This one bothered me and delayed even writing this answer.
LOL, I thought this was completely broken, but now I see... Because I used the disassembler, it broke the values into 16-bit halves, so it is actually going to 0x40006000 and 0x30005000... whew.
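A guess at the mechanics, assuming the disassembler printed each 32-bit word as two 16-bit halves with the lower address first (for example 0x6000 then 0x4000): on little-endian RISC-V the halfword at the lower address is the low half, so the word is read back like this.

```python
import struct

def word_from_halves(first, second):
    """Rebuild a little-endian 32-bit word shown as two 16-bit halves, lower address first."""
    return struct.unpack("<I", struct.pack("<HH", first, second))[0]

print(hex(word_from_halves(0x6000, 0x4000)))  # 0x40006000
print(hex(word_from_halves(0x5000, 0x3000)))  # 0x30005000
```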
And just to confirm:
With -fpie that works fine... and -fpic does not change it, based on different assumptions.
or
Depending on how you build it from that assembly language file.
Do I expect LLVM to work exactly the same? Nope. I would personally go through these exercises before attempting to use that tool.
In general the toolchain (compiler, assembler, linker) works together; it pretty much has to. The compiler or even the assembler generates what it can with what it sees for that one object, or within one optimization domain. Then the linker does its job, which, depending on the ISA, may mean modifying individual instructions or filling in addresses or offsets in a pool or elsewhere to resolve all the externals. Segment locations count as externals too, since they are not known at compile/assemble time. But then you can get into link-time optimization, or LLVM's bitcode optimization between the frontend and backend, which you can play with.
You have to know which items have to stay at fixed pc-relative distances from each other, and from that, which items can move. Take .text relative to .data: you can move .text without moving .data, move both, or move .data without moving .text, but for some of those cases the distance from .text to .got has to stay fixed. That is under your control.
If this is a bootloader situation, then the loaded program is going into RAM, not some flash/ROM plus some RAM, so you can lump it all into one memory space and not have a .got, or you can break it up and do the extra work, etc.
The concept and construction are similar for other instruction sets too. The specific details may vary, but the tools have to work together to generate the right instructions, the right EXTRA instructions, or a .pool or other table, so that the linker can patch it all together by modifying instructions or pool/table data.
The RISC-V documents are about the worst I have seen in my career: the information we need seems to be there, but the organization and the ability to find things are dreadful.
This is basically how we do (big) pc-relative work in RISC-V. Having the lower 12 bits zeroed out saves us from doing that ourselves, and saves the linker extra work with the offset in the following instruction(s). As with most things, you let the tools do the address work; you do not want to be counting instructions/bytes between things. And that address work is sometimes done by the compiler, sometimes by the assembler, sometimes by the linker, or by a combination.
(I just did this .got thing a day or two ago here, and the tools were combining some data to make fewer entries in the .got, which is obviously a good thing. Could you imagine a program with a lot of globals or static locals? Position independence already adds enough overhead to the binary/data, but that would be... wow.)