Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in…


Background: we have an ILI2511 touch controller which raises an interrupt when touch movement is detected. The host then polls the touch controller via I2C. During EMC testing, the touch controller might fire many interrupts and report wrong finger positions. Nevertheless, the kernel should never crash. Here are the relevant code snippets:

Init:

error = devm_request_threaded_irq(dev, client->irq, NULL, ili251x_irq, IRQF_ONESHOT, client->name, data);

Bottom half:

static irqreturn_t ili251x_irq(int irq, void *irq_data)
{
  struct ili251x_data *data = irq_data;
  struct i2c_client *client = data->client;
  struct touchdata touchdata;
  int error;

  error = ili251x_read_reg(client, REG_TOUCHDATA,
                           &touchdata,
                           sizeof(touchdata) -
                           sizeof(struct finger) * TOUCHDATA2_FINGERS);

  /* more code */
  if (!error)
    ili251x_report_events(data, &touchdata);
  else
    dev_err(&client->dev, "Unable to get touchdata, err = %d\n", error);

  return IRQ_HANDLED;
}

ili251x_read_reg() uses the i2c controller.

I found no array overflow or pointer issues related to stack variables. Still, under heavy EMC fire I'm getting a crash. I added a crash dump further down.
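The read length passed to ili251x_read_reg() is the one place in the snippet where a size is computed by hand, so it is worth checking mechanically. The following is only a sketch: the struct layouts and the finger counts are assumptions made up for illustration (the real definitions are in ili251x.c and may differ), and read_len_fits() is a hypothetical helper. In kernel code the same check would normally be written with BUILD_BUG_ON().

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stddef.h>
#include <stdint.h>

/* ASSUMED values, for illustration only; take the real ones from ili251x.c. */
#define TOUCHDATA1_FINGERS 6   /* fingers carried by the first report */
#define TOUCHDATA2_FINGERS 4   /* fingers carried by the second report */

/* ASSUMED layout: five raw bytes per finger. */
struct finger {
	uint8_t raw[5];
};

/* ASSUMED layout: one status byte followed by all finger slots. */
struct touchdata {
	uint8_t status;
	struct finger fingers[TOUCHDATA1_FINGERS + TOUCHDATA2_FINGERS];
};

/* Same expression as in the IRQ handler: length of the first report. */
#define TOUCHDATA1_LEN \
	(sizeof(struct touchdata) - sizeof(struct finger) * TOUCHDATA2_FINGERS)

/*
 * size_t arithmetic wraps around instead of going negative, so make sure
 * the subtraction above cannot underflow into a huge read length.
 */
static_assert(sizeof(struct finger) * TOUCHDATA2_FINGERS <
	      sizeof(struct touchdata),
	      "second-report size must be smaller than struct touchdata");

/* Returns 0 if len bytes fit into a buffer of buf_size bytes, -1 otherwise. */
int read_len_fits(size_t buf_size, size_t len)
{
	return len <= buf_size ? 0 : -1;
}
```

If the compile-time check passes, the length argument itself cannot overrun the stack buffer, and the search has to move into ili251x_read_reg() and ili251x_report_events().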

I wonder if the information I'm getting through the crash dump could be used to track down or narrow down the issue. For example, what does "ili251x_irq+0x200/0x2e4" mean? Is this some code pointer / data pointer referenced in a map file? I assume the stack protector check only runs just before returning to the caller function.
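For reference, kernel backtraces print code locations as symbol+offset/size, so "ili251x_irq+0x200/0x2e4" is an address 0x200 bytes into ili251x_irq, which is 0x2e4 bytes long in total. A minimal sketch of splitting such a string into its parts follows; struct sym_loc and parse_sym_loc() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for one "symbol+offset/size" location. */
struct sym_loc {
	char name[64];         /* function name, e.g. "ili251x_irq" */
	unsigned long offset;  /* byte offset from the start of the function */
	unsigned long size;    /* total size of the function in bytes */
};

/* Parses "name+0xOFF/0xSIZE". Returns 0 on success, -1 on malformed input. */
int parse_sym_loc(const char *s, struct sym_loc *out)
{
	assert(s != NULL && out != NULL);

	/* %lx accepts an optional 0x prefix, like strtoul() with base 16. */
	if (sscanf(s, "%63[^+]+%lx/%lx",
		   out->name, &out->offset, &out->size) != 3)
		return -1;

	/* An offset beyond the function size would not be a valid location. */
	if (out->offset > out->size)
		return -1;

	return 0;
}
```

With debug info available, the offset can be mapped to a source line, for example with gdb's `list *(ili251x_irq+0x200)` on the vmlinux or module object, or with scripts/faddr2line from the kernel tree. As far as I know, the canary is checked in the function epilogue, so the reported address identifies whose stack frame was damaged, not the instruction that did the damage.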

Example:

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ili251x_irq+0x200/0x2e4
[  691.053700] 000: CPU0: stopping
[  691.056854] 000: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O      5.4.106-rt54-ge578cc0824 #1
[  691.066197] 000: Hardware name: Generic DRA74X (Flattened Device Tree)
[  691.072749] 000: Backtrace:
[  691.075634] 000:
[  691.077561] 000: [<c0b93090>] (dump_backtrace) from [<c0b93408>] (show_stack+0x20/0x24)
[  691.085606] 000:  r7:c120dd44 r6:60000193 r5:00000000 r4:c12996e4
[  691.091718] 000: [<c0b933e8>] (show_stack) from [<c0ba0d04>] (dump_stack+0x98/0xac)
[  691.099408] 000: [<c0ba0c6c>] (dump_stack) from [<c0210798>] (handle_IPI+0x404/0x47c)
[  691.107275] 000:  r7:c120dd44 r6:00000004 r5:c129ebc4 r4:c12a8270
[  691.113387] 000: [<c0210394>] (handle_IPI) from [<c0202340>] (gic_handle_irq+0x9c/0xa0)
[  691.121428] 000:  r10:c12a8000 r9:c1201ed0 r8:fa213000 r7:fa212000 r6:fa21200c r5:c125a574
[  691.129722] 000:  r4:c120df34
[  691.132694] 000: [<c02022a4>] (gic_handle_irq) from [<c0201a78>] (__irq_svc+0x58/0xa0)
[  691.140643] 000: Exception stack(0xc1201ed0 to 0xc1201f18)
[  691.146148] 000: 1ec0:                                     00000000 0041cd04 00000000 c022879c
[  691.154793] 000: 1ee0: c1200000 c120d730 00000000 c120d778 c129e63d c105ea48 c12a8000 c1201f2c
[  691.163437] 000: 1f00: c1201f0c c1201f20 c02283fc c0209b34 a0000013 ffffffff
[  691.170513] 000:  r9:c1200000 r8:c129e63d r7:c1201f04 r6:ffffffff r5:a0000013 r4:c0209b34
[  691.178719] 000: [<c0209b04>] (arch_cpu_idle) from [<c0ba713c>] (default_idle_call+0x3c/0x48)
[  691.187282] 000: [<c0ba7100>] (default_idle_call) from [<c026ced8>] (do_idle+0xe4/0x150)
[  691.195407] 000: [<c026cdf4>] (do_idle) from [<c026d26c>] (cpu_startup_entry+0x28/0x2c)
[  691.203444] 000:  r9:c105ea48 r8:00000001 r7:c12a8000 r6:00000000 r5:00000002 r4:000000ce
[  691.211649] 000: [<c026d244>] (cpu_startup_entry) from [<c0ba0ee4>] (rest_init+0xd4/0xdc)
[  691.219859] 000: [<c0ba0e10>] (rest_init) from [<c1000b88>] (arch_call_rest_init+0x18/0x1c)
[  691.228249] 000:  r5:00000001 r4:c12a8054
[  691.232268] 000: [<c1000b70>] (arch_call_rest_init) from [<c10010e0>] (start_kernel+0x4dc/0x51c)
[  691.241091] 000: [<c1000c04>] (start_kernel) from [<00000000>] (0x0)
[  691.247478] 001: ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ili251x_irq+0x200/0x2e4 ]---
[  691.259201] 001: ------------[ cut here ]------------

We use the ILI2511 chip. The driver I'm describing is from here: ili251x. I'm using the RT kernel 5.4.y and have to stay on this version.

I tried the upstream driver (https://elixir.bootlin.com/linux/v5.4.225/source/drivers/input/touchscreen/ili210x.c), but it does not work: it generates input data, but when I touch a button in the web browser, nothing happens.

Typical input queue with the ili251x.c driver running:

hexdump /dev/input/event0
0000000 9ae4 3fae 6874 0003 0003 0039 0006 0000
0000010 9ae4 3fae 6874 0003 0003 0035 1730 0000
0000020 9ae4 3fae 6874 0003 0003 0036 213f 0000
0000030 9ae4 3fae 6874 0003 0001 014a 0001 0000
0000040 9ae4 3fae 6874 0003 0003 0000 1730 0000

Typical input queue with the (non-working) ili210x.c driver running:

0000000 53a0 3fae f15d 0002 0003 0039 0008 0000
0000010 53a0 3fae f15d 0002 0003 0035 1535 0000
0000020 53a0 3fae f15d 0002 0003 0036 1767 0000
0000030 53a0 3fae f15d 0002 0001 014a 0001 0000
0000040 53a0 3fae f15d 0002 0003 0000 1535 0000
0000050 53a0 3fae f15d 0002 0003 0001 1767 0000

I have not yet analyzed this stream.
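Both dumps can be decoded by hand. The sketch below assumes the 16-byte struct input_event of a 32-bit kernel (two 32-bit time fields, a 16-bit type, a 16-bit code and a signed 32-bit value), a little-endian CPU, and hexdump's default output of 16-bit words. The names input_event32, decode_event() and abs_code_name() are hypothetical; the numeric codes are the ones from linux/input-event-codes.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout of struct input_event on a 32-bit kernel. */
struct input_event32 {
	uint32_t sec;    /* timestamp, seconds */
	uint32_t usec;   /* timestamp, microseconds */
	uint16_t type;   /* 0x01 = EV_KEY, 0x03 = EV_ABS */
	uint16_t code;   /* axis or key code */
	int32_t value;
};

/* Rebuilds one event from the eight 16-bit words of a hexdump row. */
void decode_event(const uint16_t w[8], struct input_event32 *ev)
{
	assert(w != NULL && ev != NULL);

	/* Little-endian: the low 16-bit word of each 32-bit field comes first. */
	ev->sec = ((uint32_t)w[1] << 16) | w[0];
	ev->usec = ((uint32_t)w[3] << 16) | w[2];
	ev->type = w[4];
	ev->code = w[5];
	ev->value = (int32_t)(((uint32_t)w[7] << 16) | w[6]);
}

/* Names for the EV_ABS codes that occur in the two dumps. */
const char *abs_code_name(uint16_t code)
{
	switch (code) {
	case 0x00: return "ABS_X";
	case 0x01: return "ABS_Y";
	case 0x35: return "ABS_MT_POSITION_X";
	case 0x36: return "ABS_MT_POSITION_Y";
	case 0x39: return "ABS_MT_TRACKING_ID";
	default:   return "unknown";
	}
}
```

Decoded this way, the first row of the ili251x dump is EV_ABS / ABS_MT_TRACKING_ID with value 6, followed by ABS_MT_POSITION_X = 0x1730, ABS_MT_POSITION_Y = 0x213f, EV_KEY / BTN_TOUCH (0x14a) = 1 and ABS_X = 0x1730. The ili210x dump has the same sequence with tracking ID 8, plus an ABS_Y row.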

I looked at (https://elixir.bootlin.com/linux/v5.15.82/source/drivers/input/touchscreen/ilitek_ts_i2c.c), but according to its compatibility list this one does not support the ILI2511.

There is also a driver on GitHub (https://github.com/NewhavenDisplay/ILI2511-Ilitek-CTP-Drivers/tree/main/Linux%20-%20Ubuntu%2010.04~16.04/ilitek_limv5_9_0_1), which I have not tried so far.
