Will sending `kill -11` to a Java process raise a NullPointerException?


For example, the HotSpot JVM implements null-pointer detection by catching the SIGSEGV signal. So if we manually send a SIGSEGV from outside the process, will that also be recognized as a NullPointerException in some circumstances?


VonC On BEST ANSWER

Will sending kill -11 to a Java process raise a NullPointerException?

It should not: a NullPointerException is a specific exception that occurs when an application tries to use an object reference that has the null value.

Yet, from the Java SE 17 Troubleshooting Guide, "Handle Signals and Exceptions":

The Java HotSpot VM installs signal handlers to implement various features and to handle fatal error conditions.

For example, in an optimization to avoid explicit null checks in cases where java.lang.NullPointerException will be thrown rarely, the SIGSEGV signal is caught and handled, and the NullPointerException is thrown.

In general, there are two categories where signal/traps happen:

  • When signals are expected and handled, like implicit null-handling. Another example is the safepoint polling mechanism, which protects a page in memory when a safepoint is required. Any thread that accesses that page causes a SIGSEGV, which results in the execution of a stub that brings the thread to a safepoint.

  • Unexpected signals. That includes a SIGSEGV when executing in VM code, Java Native Interface (JNI) code, or native code. In these cases, the signal is unexpected, so fatal error handling is invoked to create the error log and terminate the process.

That approach allows the JVM to optimize performance by reducing the overhead of explicit null checks in the code, relying instead on the operating system's memory protection mechanisms to detect access to null references. When such access occurs, the operating system generates a SIGSEGV signal, which the JVM then interprets as an attempt to dereference a null pointer, leading to the throwing of a NullPointerException.

However, it is important to note that this is an internal mechanism of the JVM and is distinct from externally generated SIGSEGV signals, such as those sent using the kill command. External SIGSEGV signals are generally used to indicate serious errors, including invalid memory access, and are more likely to result in a JVM crash or core dump rather than a NullPointerException.

+---------------------+         +-----------------------------------+
| External Process    |         | Java Process running on HotSpot   |
| sending SIGSEGV     | ------> | JVM                               |
| (kill -11)          |         | Likely JVM Crash or Core Dump     |
+---------------------+         +-----------------------------------+

Is the JVM always capable of detecting that a SIGSEGV was sent externally, or can it mistake an external SIGSEGV for a null access when the signal arrives at a specific moment, i.e. when a potential null access is expected?

Again, it should not, but this is an implementation-specific aspect of JVM behavior.
That means the likelihood of such confusion happening in practice may vary depending on the JVM version, the specific code being executed, and the state of the JVM at the time of the signal.

See for instance "How does the JVM know when to throw a NullPointerException"

The JVM could implement the null check using virtual memory hardware. The JVM arranges that page zero in its virtual address space is mapped to a page that is unreadable + unwriteable.

Since null is represented as zero, when Java code tries to dereference null this will try to access a non-addressable page and will lead to the OS delivering a "segfault" signal to the JVM.

The JVM's segfault signal handler could trap this, figure out where the code was executing, and create and throw an NPE on the stack of the appropriate thread.

In that scenario, it should be easy to distinguish a trapped signal from within the code execution, from a received signal from the OS.
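The trap-based check described in that quote can be imitated outside the JVM. The following is only a minimal sketch under stated assumptions (Linux, POSIX signals), not HotSpot's implementation: all function names are hypothetical, an inaccessible `mmap`'d page stands in for the unmapped zero page, and `siglongjmp` stands in for "transfer control to the code that throws the exception".

```cpp
#include <cassert>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>

// Hypothetical names throughout; this mimics the idea, not HotSpot's code.
static sigjmp_buf g_recover_point;

// SIGSEGV handler: resume at the recovery point instead of crashing.
static void on_segv(int, siginfo_t*, void*) {
    siglongjmp(g_recover_point, 1);
}

// Install the handler with SA_SIGINFO, as a runtime would at startup.
bool install_segv_trap() {
    struct sigaction sa {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// Reserve one inaccessible page; it plays the role of the protected zero page.
void* reserve_guard_page() {
    void* p = mmap(nullptr, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Performs a load without any explicit check; returns true if the load trapped.
bool load_traps(const volatile int* p) {
    if (sigsetjmp(g_recover_point, 1) == 0) {
        int value = *p;   // the "implicit check": simply perform the access
        (void)value;
        return false;     // no fault, so the pointer was readable
    }
    return true;          // reached via siglongjmp from the signal handler
}
```

Note that this toy handler never looks at who raised the signal, so a SIGSEGV sent from outside while `load_traps` is running would take the same recovery path. That is the kind of ambiguity the question is asking about.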

Also: "Can a SIGSEGV in Java not crash the JVM?"

There are definitely scenarios where the JVM's SIGSEGV signal handler may turn the SIGSEGV event into a Java exception.
You will only get a JVM hard crash if that cannot happen; e.g. if the thread that triggered the SIGSEGV was executing code in a native library when the event happened.

For instance:

HotSpot JVM deliberately generates SIGSEGV at startup to check certain CPU features. There is no switch to turn it off. I suggest skipping SIGSEGV in gdb altogether, because JVM uses it for its own purpose in many cases.


What if the thread happens to be executing a memory access at the moment the SIGSEGV is triggered externally?

HotSpot's signal handling underwent a major refactoring in JDK-8255711, resulting in commit dd8e4ff.

The current code is in os_linux_x86.cpp#PosixSignals::pd_hotspot_signal_handler

  // decide if this trap can be handled by a stub
  address stub = nullptr;

  address pc          = nullptr;

  //%note os_trap_1
  if (info != nullptr && uc != nullptr && thread != nullptr) {
    pc = (address) os::Posix::ucontext_get_pc(uc);

    if (sig == SIGSEGV && info->si_addr == 0 && info->si_code == SI_KERNEL) {
      // An irrecoverable SI_KERNEL SIGSEGV has occurred.
      // It's likely caused by dereferencing an address larger than TASK_SIZE.
      return false;
    }

    // Handle ALL stack overflow variations here
    if (sig == SIGSEGV) {
      address addr = (address) info->si_addr;

      // check if fault address is within thread stack
      if (thread->is_in_full_stack(addr)) {
        // stack overflow
        if (os::Posix::handle_stack_overflow(thread, addr, pc, uc, &stub)) {
          return true; // continue
        }
      }
    }

    if ((sig == SIGSEGV) && VM_Version::is_cpuinfo_segv_addr(pc)) {
      // Verify that OS save/restore AVX registers.
      stub = VM_Version::cpuinfo_cont_addr();
    }

    if (thread->thread_state() == _thread_in_Java) {
      // Java thread running in Java code => find exception handler if any
      // a fault inside compiled code, the interpreter, or a stub

      if (sig == SIGSEGV && SafepointMechanism::is_poll_address((address)info->si_addr)) {
        stub = SharedRuntime::get_poll_stub(pc);
      } else if (sig == SIGBUS /* && info->si_code == BUS_OBJERR */) {
        // BugId 4454115: A read from a MappedByteBuffer can fault
        // here if the underlying file has been truncated.
        // Do not crash the VM in such a case.
        CodeBlob* cb = CodeCache::find_blob(pc);
        CompiledMethod* nm = (cb != nullptr) ? cb->as_compiled_method_or_null() : nullptr;
        bool is_unsafe_arraycopy = thread->doing_unsafe_access() && UnsafeCopyMemory::contains_pc(pc);
        if ((nm != nullptr && nm->has_unsafe_access()) || is_unsafe_arraycopy) {
          address next_pc = Assembler::locate_next_instruction(pc);
          if (is_unsafe_arraycopy) {
            next_pc = UnsafeCopyMemory::page_error_continue_pc(pc);
          }
          stub = SharedRuntime::handle_unsafe_access(thread, next_pc);
        }
      }
      else

#ifdef AMD64
      if (sig == SIGFPE  &&
          (info->si_code == FPE_INTDIV || info->si_code == FPE_FLTDIV)) {
        stub =
          SharedRuntime::
          continuation_for_implicit_exception(thread,
                                              pc,
                                              SharedRuntime::
                                              IMPLICIT_DIVIDE_BY_ZERO);
#else
      if (sig == SIGFPE /* && info->si_code == FPE_INTDIV */) {
        // HACK: si_code does not work on linux 2.2.12-20!!!
        int op = pc[0];
        if (op == 0xDB) {
          // FIST
          // TODO: The encoding of D2I in x86_32.ad can cause an exception
          // prior to the fist instruction if there was an invalid operation
          // pending. We want to dismiss that exception. From the win_32
          // side it also seems that if it really was the fist causing
          // the exception that we do the d2i by hand with different
          // rounding. Seems kind of weird.
          // NOTE: that we take the exception at the NEXT floating point instruction.
          assert(pc[0] == 0xDB, "not a FIST opcode");
          assert(pc[1] == 0x14, "not a FIST opcode");
          assert(pc[2] == 0x24, "not a FIST opcode");
          return true;
        } else if (op == 0xF7) {
          // IDIV
          stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_DIVIDE_BY_ZERO);
        } else {
          // TODO: handle more cases if we are using other x86 instructions
          //   that can generate SIGFPE signal on linux.
          tty->print_cr("unknown opcode 0x%X with SIGFPE.", op);
          fatal("please update this code.");
        }
#endif // AMD64
      } else if (sig == SIGSEGV &&
                 MacroAssembler::uses_implicit_null_check(info->si_addr)) {
          // Determination of interpreter/vtable stub/compiled code null exception
          stub = SharedRuntime::continuation_for_implicit_exception(thread, pc, SharedRuntime::IMPLICIT_NULL);
      }
    } else if ((thread->thread_state() == _thread_in_vm ||
                thread->thread_state() == _thread_in_native) &&
               (sig == SIGBUS && /* info->si_code == BUS_OBJERR && */
               thread->doing_unsafe_access())) {
        address next_pc = Assembler::locate_next_instruction(pc);
        if (UnsafeCopyMemory::contains_pc(pc)) {
          next_pc = UnsafeCopyMemory::page_error_continue_pc(pc);
        }
        stub = SharedRuntime::handle_unsafe_access(thread, next_pc);
    }

    // jni_fast_Get<Primitive>Field can trap at certain pc's if a GC kicks in
    // and the heap gets shrunk before the field access.
    if ((sig == SIGSEGV) || (sig == SIGBUS)) {
      address addr = JNI_FastGetField::find_slowcase_pc(pc);
      if (addr != (address)-1) {
        stub = addr;
      }
    }
  }

The JVM uses various checks to determine the context of a SIGSEGV signal. However, I do not see a straightforward mechanism to distinguish an externally sent SIGSEGV from one internally generated due to a null reference access.

The signal handler examines the execution context, including the program counter and the stack, to infer the cause of the SIGSEGV. In case of a null reference, it looks for specific patterns that suggest a null pointer exception. But if an external SIGSEGV happens to coincide precisely with a situation where the JVM's execution state resembles that of a null pointer access, distinguishing between the two can be challenging.

However, such a scenario is relatively unlikely due to the level of precision required in timing.

Moziii On

Sending a kill -11 to a Java process will send a SIGSEGV (segmentation fault) signal to that process. SIGSEGV is a signal sent by the operating system to a process when it makes an invalid memory reference, or segmentation fault.

In the context of the Java HotSpot JVM, a NullPointerException is typically raised internally by the JVM when it detects an attempt to dereference a null reference. This is often implemented by catching a SIGSEGV signal that results from such an attempt. The JVM has a mechanism to differentiate between a SIGSEGV that is a legitimate NullPointerException and other segmentation faults.

When you externally send a SIGSEGV to a Java process (using kill -11), it's not equivalent to the JVM internally detecting a null reference access. Instead, it's an abrupt signal to the process that it has attempted to access memory that it shouldn't, which is typically outside the scope of normal Java exception handling.

To answer your questions:

  1. Will sending kill -11 to a Java process raise a NullPointerException? No, sending a kill -11 (or SIGSEGV) to a Java process will not raise a NullPointerException. Instead, it will likely cause the JVM to crash or terminate unexpectedly because it's a signal that the process has attempted to access an invalid memory location.
  2. Is the JVM capable of detecting whether a SIGSEGV is external or due to a null access? Yes, the JVM is generally capable of distinguishing between a SIGSEGV caused by a genuine null pointer dereference within the Java program (which would result in a NullPointerException) and other causes of SIGSEGV, such as an external kill command or other invalid memory accesses. The JVM uses its internal mechanisms to determine the context of the SIGSEGV and whether it corresponds to a null reference access.
  3. Can an external SIGSEGV be confused for a null access? Under normal circumstances, an external SIGSEGV should not be confused with a null access within the JVM. The JVM's signal handlers are designed to interpret the context of the fault and distinguish between different causes of SIGSEGV. However, in complex systems or in cases of JVM bugs, unexpected behavior can occur, but this would be highly unusual and not the norm.

apangin On

Summary

Yes, in some marginal cases an external kill command may cause a bogus NullPointerException in a Java application. This behavior is platform-dependent and difficult to reproduce, but I managed to trigger it in practice.

Background

The HotSpot JVM employs a technique called "implicit null check": an access to an object field whose offset is less than the page size (4096) is compiled to a single load/store instruction, with no extra overhead for checking the object reference for null. If such an instruction is executed with a null reference, the OS raises SIGSEGV. The JVM's signal handler catches this signal and transfers control to the code that throws NullPointerException.

Not every SIGSEGV ends up as an NPE. The HotSpot signal handler checks that

  • the current thread is a Java thread;
  • the SIGSEGV occurred in JIT-compiled code;
  • the address being accessed is within the zero page (0x0 - 0xfff);
  • the faulting instruction is marked as "implicit exception" and there is an exception handler assigned to this instruction.

In theory, if we craft a signal that satisfies all the conditions, HotSpot will treat it as NPE.
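To make the combined condition concrete, here is a rough sketch of the four checks as a single predicate. The `FaultContext` struct, its field names, and `would_become_npe` are hypothetical illustrations and not HotSpot identifiers; the page size of 4096 is taken from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what a handler knows when the signal arrives.
struct FaultContext {
    bool is_java_thread;                   // current thread is a Java thread
    bool pc_in_compiled_code;              // faulting pc lies in JIT-compiled code
    std::uintptr_t fault_address;          // value read from si_addr
    bool pc_has_implicit_exception_entry;  // instruction is marked "implicit exception"
};

constexpr std::uintptr_t kPageSize = 4096; // zero page spans 0x0 - 0xfff

// True only when every condition from the list holds at once.
bool would_become_npe(const FaultContext& f) {
    return f.is_java_thread
        && f.pc_in_compiled_code
        && f.fault_address < kPageSize
        && f.pc_has_implicit_exception_entry;
}
```

For example, `would_become_npe({true, true, 0xc, true})` holds, while changing the address to `0x1000` or clearing any of the flags makes it fail, which corresponds to the fatal-error path.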

Practice

To increase chances of a user signal hitting the right instruction, we'll write an infinite loop that repeatedly stores to an object field. To prevent hoisting of the null check, the reference itself should be loaded from a volatile field.

public class BogusNPE {
    static volatile BogusNPE X = new BogusNPE();

    int n;

    public static void main(String[] args) {
        while (true) {
            BogusNPE x0 = X, x1 = X, x2 = X, x3 = X, x4 = X, x5 = X, x6 = X, x7 = X, x8 = X, x9 = X;
            x0.n = x1.n = x2.n = x3.n = x4.n = x5.n = x6.n = x7.n = x8.n = x9.n = 0;
        }
    }
}

Here I generated 10 stores in a row, all with an implicit null check.

Use -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly to verify that the corresponding mov instructions are annotated with implicit exception:

  0x00007fb4a4bd440c:   mov    0x70(%r10),%edx              ;*getstatic X {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - BogusNPE::main@32 (line 8)
  0x00007fb4a4bd4410:   mov    0x70(%r10),%ebp              ;*getstatic X {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - BogusNPE::main@37 (line 8)
  0x00007fb4a4bd4414:   mov    0x70(%r10),%eax              ;*getstatic X {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - BogusNPE::main@42 (line 8)
  0x00007fb4a4bd4418:   mov    %r12d,0xc(%r12,%rax,8)       ; implicit exception: dispatches to 0x00007fb4a4bd4456
                                                            ;*putfield n {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - BogusNPE::main@66 (line 9)
  0x00007fb4a4bd441d:   mov    %r12d,0xc(%r12,%rbp,8)       ; implicit exception: dispatches to 0x00007fb4a4bd4468
                                                            ;*putfield n {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - BogusNPE::main@70 (line 9)
  0x00007fb4a4bd4422:   mov    %r12d,0xc(%r12,%rdx,8)       ; implicit exception: dispatches to 0x00007fb4a4bd447c
                                                            ;*putfield n {reexecute=0 rethrow=0 return_oop=0}
                                                            ; - BogusNPE::main@74 (line 9)

Run the program and get its PID:

$ jps
256 BogusNPE
280 Jps

Here pid=256, but we should send the signal not to the process but to a particular thread. The ID of the main thread is usually pid+1, that is, 257.
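Since "pid+1" is only the usual case, it may be worth checking the actual thread IDs first. On Linux they appear as subdirectories of /proc/<pid>/task, each with a comm file holding the kernel-visible thread name. The helper below is a small sketch under that assumption; `list_threads` is a hypothetical name, and whether the name alone identifies the Java main thread depends on the JVM version.

```cpp
#include <cassert>
#include <dirent.h>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Returns (tid, thread name) pairs for the given process, read from procfs.
// `pid` is passed as text so that "self" can be used for the calling process.
std::vector<std::pair<std::string, std::string>> list_threads(const std::string& pid) {
    std::vector<std::pair<std::string, std::string>> threads;
    const std::string task_dir = "/proc/" + pid + "/task";

    DIR* dir = opendir(task_dir.c_str());
    if (dir == nullptr) {
        return threads;  // no such process, or procfs is not available
    }
    while (dirent* entry = readdir(dir)) {
        const std::string tid = entry->d_name;
        if (tid == "." || tid == "..") {
            continue;
        }
        // Each task directory has a `comm` file with the thread's name.
        std::ifstream comm((task_dir + "/" + tid + "/comm").c_str());
        std::string name;
        std::getline(comm, name);
        threads.emplace_back(tid, name);
    }
    closedir(dir);
    return threads;
}
```

Calling `list_threads("256")` for the example above would list every thread of the BogusNPE process, so the candidate for the main thread can be picked from the result rather than guessed.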

$ sudo kill -11 257

It may take several attempts before we finally achieve the goal:

Exception in thread "main" java.lang.NullPointerException: Cannot assign field "n" because "x5" is null
        at BogusNPE.main(BogusNPE.java:9)

Nuances

On 32-bit x86, I could trigger the NPE without sudo, but on 64-bit platforms sudo is required. It also matters that the PID of the shell from which we run kill is less than 4096. Here is why.

HotSpot checks that the fault address siginfo->si_addr is located in the zero page (otherwise the load/store instruction would require an explicit null check). However, si_addr is set only when SIGSEGV is raised by the kernel; we cannot control it with the kill command. For user-generated signals, si_pid (sending process ID) and si_uid (user ID of the sending process) are set instead.

By lucky chance, the siginfo_t structure contains a union in which si_addr overlaps with si_pid and si_uid.

63       31       0
+-----------------+
|     si_addr     |
+-----------------+
| si_uid | si_pid |
+-----------------+

So, to produce si_addr value between 0 and 4096, we need to make si_uid = 0 (that is, invoke kill by user 0 or root), and set si_pid < 4096. On 32-bit systems, si_addr overlaps with si_pid only.
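The arithmetic behind this can be sketched as follows, assuming the 64-bit little-endian layout from the diagram (si_pid in the low 32 bits, si_uid in the high 32 bits). The function names are hypothetical, and the code models the overlap with plain integer arithmetic rather than reading a real siginfo_t.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;

// Value a handler would read from si_addr when the kernel actually filled in
// si_pid and si_uid (assumed layout: uid in bits 63..32, pid in bits 31..0).
constexpr std::uint64_t apparent_si_addr(std::uint32_t si_pid, std::uint32_t si_uid) {
    return (static_cast<std::uint64_t>(si_uid) << 32) | si_pid;
}

// True if that apparent address falls inside the zero page.
constexpr bool lands_in_zero_page(std::uint32_t si_pid, std::uint32_t si_uid) {
    return apparent_si_addr(si_pid, si_uid) < kPageSize;
}
```

With root (uid 0) and a sender PID of 256, the apparent address is 256 and passes the zero-page test. Any non-zero uid sets a bit at or above bit 32, and any PID of 4096 or more exceeds the page size, so both cases fail.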

If the signal misses a mov instruction with an implicit null check, or if si_addr is not below the page size, the JVM will crash with a fatal error instead of throwing an NPE.

Can JVM detect the source of SIGSEGV?

It is certainly possible to distinguish a user-generated SIGSEGV from a signal caused by an invalid memory access. The signal handler could just check the si_code field of the siginfo_t structure:

  • for a real NullPointerException, si_code will be SEGV_MAPERR;
  • for a signal sent by kill, tgkill or sigqueue, the code will be SI_USER, SI_TKILL or SI_QUEUE respectively.

However, the current HotSpot implementation does not do that, and therefore it is possible to fool the JVM using the above trick.
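Such a check could look roughly like the sketch below. This is an illustration of the idea only, not a proposed HotSpot patch: `SegvOrigin` and `classify_segv` are hypothetical names, and the code assumes Linux with glibc, where SI_TKILL is available alongside the POSIX constants.

```cpp
#include <cassert>
#include <signal.h>

// Hypothetical classification of where a SIGSEGV came from.
enum class SegvOrigin { UserSent, MemoryFault, Other };

// Decide the origin from si_code alone, as suggested in the text above.
SegvOrigin classify_segv(const siginfo_t& info) {
    switch (info.si_code) {
        case SI_USER:    // kill(2)
        case SI_TKILL:   // tkill(2) / tgkill(2), Linux-specific
        case SI_QUEUE:   // sigqueue(3)
            return SegvOrigin::UserSent;
        case SEGV_MAPERR:  // address not mapped, e.g. a real null dereference
        case SEGV_ACCERR:  // mapped, but access not permitted
            return SegvOrigin::MemoryFault;
        default:
            return SegvOrigin::Other;
    }
}
```

A handler installed with SA_SIGINFO would call `classify_segv(*info)` first and consult si_addr only for `MemoryFault`, since that field is not meaningful for user-sent signals.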