Crash when adding or removing data in code on an IRQ

135 Views Asked by At

I am developing an operating system as an hobby, and I'm having a strange issue when handling keyboard IRQ, I don't know why but I'm getting an Invalid Op Code ISR when too few string is presents in the code, and a breakpoint ISR when adding a little more string.

If I add a sufficient amount of string, the code works as excepted, and nothing goes wrong, but I can't just add string in the code and ignore the problem without understanding it...

Here is my ISR/IRQ handler (first argument of the macro is "isr" or "irq", second argument is the c handler, third argument is present only if handling irq)

%macro handler_macro 2-3
    handle_%+%1:
        pusha

        mov ax, ds
        push eax    ; save data segment
        mov esi, eax

        mov ax, 0x10
        mov ds, ax
        mov es, ax
        mov gs, ax
        mov fs, ax  ; set data segments

        push esp ; push the stack for the first arg of the c function
        call %2
        pop esp ; pop the pushed stack
        %if %0 = 3
            pop ebx
            mov ebx, esi
            mov ds, bx
            mov es, bx
            mov gs, bx
            mov fs, bx
        %else
            pop eax
            mov eax, esi
            mov ds, ax
            mov es, ax
            mov gs, ax
            mov fs, ax  ; restore data segments
        %endif

        popa

        add esp, 8  ; Remove the 2 params that's left in the stack
        sti          ; reenable interrupts
        iret
%endmacro

And there is the C handler for irq

/*
Structure in .h:
typedef struct reg_s {
    unsigned int eax_save;
    unsigned int edi, esi, ebp, esp, ebx, edx, ecx, eax;
    unsigned int code;
    unsigned int error;
    //unsigned int eip, cs, eflags, useresp, ss;
} reg_t;
*/

void irq_handler(reg_t *test)
    // Getting ISR 6 when all printfs are commented
    //printf("IRQ: %d\n", test->code); Uncommenting this one adds keeps on ISR 6
    //printf("");                      Same
    //printf("");                      Same
    //printf("");                      Uncommenting this one make the program works
    if (handler_fun[test->code] != NULL) {
        void (*fun)(void) = (void (*)(void)) handler_fun[test->code];
        //printf("Fun: %p\n", fun);
        fun();
    }
    pic_send_eio(test->code);
}

In case this could help the value of the fun pointer doesn't change from one test to another, and a difference that could (maybe?) change something is that by uncommenting the 4 printfs, the number of sector readed from disk switch from 45 to 46

2

There are 2 best solutions below

0
Architek On BEST ANSWER

In case anyone stumble on this, I found where this was comming from.

This was a really poor error just based on where I loaded the kernel in memory.

Basically, at some point when I added some bytes, gcc optimized the program to align the address of the functions correctly in memory, which caused the program to grow slightly more.

This slight increase in the kernel size made the kernel overwrite the stack a bit, which caused a panic in the system.

So for anyone who stumble on the same thing : Just check that your kernel and stack are correctly placed (and also that your kernel doesn't overwrite the bootloader by error before you perfom the far jump)

And another advice that I would give to avoid this dumb kind of error : Just write a multi-stage bootloader, this way you will be able to unlock more memory, put your kernel above the 1MB bareer, and in addition to have a more advanced bootloader, you will not stumble into this kind of error.

0
Brendan On

Use a debugger (e.g. maybe the debugger built into Bochs) to check that the krenel is loaded properly (and not corrupted after loading and before executing); to determine the address that caused the invalid opcode exception; and/or to single-step your interrupt handler and figure out what's going on.

Apart from that (unrelated to the symptoms):

a) I find it odd that your assembly stub saves and restores ds but not es or other data segment registers. Either don't bother at all (if user-space is "flat memory, no segmentation, no reason for user-space to modify DS in the first place") or do all segment registers (if user-space is expected to use segments, which isn't a common design choice).

b) IRQ handlers should never need any information about the interrupted code's state (e.g. void irq_handler(void)) so (for IRQ handlers) there's unnecessary work done in your assembly stub (including saving the "callee preserved" registers in the pusha that the IRQ handler won't need to access). Exception handlers are not IRQ handlers (and may have an error code that needs to be taken into account, may need to save CR2 to guard against a 2nd page fault trashing the 1st page fault's CR2 contents, may need to be task gates to guard against "kernel stack buggy" problems, may have to work around quirks like the resume flag, etc); and in general exception handlers probably shouldn't use a common handler (or common macro) at all because there's so many differences in the assembly stubs for them.

c) For PIC chips; it's nice to detect "spurious IRQ 7" and "spurious IRQ 15" in the IRQ's assembly stub and only call the (higher level) IRQ handler if the IRQ is a real IRQ. This makes the higher level code cleaner while also improving efficiency for spurious IRQs (no need to comply with C calling conventions if you don't call any C code).

d) There's no point doing sti before iret in an IRQ handler (as the iret loads EFLAGS from stack anyway). For IRQ handlers (which can't be started unless interrupts/IRQs were enabled previously) there is potentially a reason to do sti then nop then iret (to encourage a pending second IRQ to interrupt while the CPU is still at CPL=0 and avoid switching to CPL=3 and then having to switch back to CPL=0 again immediately after); but this doesn't happen for sti immediately followed by iret because sti causes IRQs to be enabled after the next instruction (after the iret) and not after the sti itself. Note that sti then nop then iret (and also sti then iret) is dangerous for exception handlers because exceptions can occur in situations where IRQs must remain disabled.