Prelude

Question

Dereferencing null is undefined behavior; how to do it anyway?

Through the adoption of a null pointer and the rules around it, the C and C++ standards have (for whatever historical reasons) effectively imposed non-uniform access to the address space, independently of the platforms they may target (which have the final word on address space layout).

This non-uniformity of access stems from several interrelated factors:

  • The null pointer constant converted to a pointer type creates a null pointer.
  • A pointer that compares equal to a null pointer is a null pointer.
  • There is no way to write a pointer constant designating address 0 that does not conflict with a null pointer constant.
  • It is undecidable at translation time, over all possible language constructions, whether a pointer equals a null pointer before further operations are applied to it (so whether those operations are undefined behavior is undecidable too).
    • Implementations represent the null pointer with a value taken from the address space (typically address 0).
  • A null pointer is guaranteed not to point to any object or function.
  • Dereferencing a pointer that doesn't point to any object (a null pointer, for example) is undefined behavior according to the standard.
    • Implementers interpret undefined behavior as license to do anything, and most of the time this just leads optimizers to act in counterintuitive ways: instead of letting the access blow up when you have no access to the data pointed to by the null pointer, they simply eliminate the object code, regardless of whether the platform grants you that access. Language rules and platform conditions are in conflict.
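The first two rules above, and the resulting collision at address 0, can be sketched as follows. This is a minimal illustration assuming a typical flat-address platform (x86-64, AArch64) where a run-time integer 0 converted to a pointer also yields the null pointer value; the standard only makes that integer-to-pointer mapping implementation-defined. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rule 1: a null pointer constant (0, NULL, nullptr) converted to a pointer
// type yields a null pointer, whatever bit pattern the platform uses for it.
int *from_constant_zero() {
  int *p = 0;
  return p;
}

// Rule 2: a pointer that compares equal to a null pointer is a null pointer.
bool is_null(const int *q) {
  return q == nullptr;
}

// A *run-time* integer zero is different: integer-to-pointer conversion is
// implementation-defined. On the flat-address platforms assumed here it also
// yields the null pointer value, so address 0 cannot be named without
// colliding with null.
int *from_runtime_zero() {
  volatile std::uintptr_t zero = 0;  // volatile: value is opaque to the optimizer
  return reinterpret_cast<int *>(zero);
}

void demo_null_rules() {
  int x = 0;
  assert(is_null(from_constant_zero()));
  assert(!is_null(&x));
  assert(is_null(from_runtime_zero()));  // holds on the assumed platforms only
}
```

None of these functions dereferences anything; they only compare pointer values, so the sketch itself stays clear of undefined behavior.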

So, to access data pointed to by a null pointer through a C/C++ compiler or implementation, one may have to look for specific means of doing so.

By what means can you do it?

In case you don't know of any, but are still curious about real situations where this matters: [1], [2], [3], [4], [5], [6].
Even though I'm talking about null pointers, the main aim here is regaining control of the full address space (in theory).


Check documented measures

  • Clang: declare your "null pointers" volatile.
  • GCC: use the -fno-delete-null-pointer-checks compilation flag.

Some compilers (like MSVC) may not try to be smart about null pointer dereferences by default; you may check that, or look for their alternative documented method.

All of these are implementation-specific measures.
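The kind of code elimination the GCC flag guards against can be seen in a sketch like the following, where `read_then_check` and `demo_read_then_check` are hypothetical names. Because `*p` is evaluated before the check, GCC may conclude that `p` cannot be null and drop the `if`; comparing the optimized assembly on godbolt.org with and without `-fno-delete-null-pointer-checks` should show whether the check survives. The sketch only exercises the non-null path, since actually passing null remains undefined behavior as far as the standard is concerned.

```cpp
#include <cassert>

// Reads *p first, then checks p. Since dereferencing a null pointer is
// undefined behavior, an optimizer may assume p is non-null after the read
// and delete the branch. GCC's -fno-delete-null-pointer-checks asks it to
// keep such checks.
int read_then_check(int *p) {
  int value = *p;      // undefined behavior if p is null
  if (p == nullptr) {  // candidate for removal by the optimizer
    return -1;
  }
  return value;
}

// Usage: only call it with a valid pointer; passing null is still UB
// according to the standard, with or without the flag.
void demo_read_then_check() {
  int x = 7;
  assert(read_then_check(&x) == 7);
}
```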

Employ some technique

The following snippet, whose assembly output can be checked on godbolt.org, gives two methods for dealing with this situation, plus a counterexample:

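#include <new>      // std::launder
#include <cstdint>  // std::uintptr_t

// Method 1: a volatile pointer laundered through std::launder.
void good() {
  volatile int *p = 0;
  *std::launder(p) = 42;
}

// Method 2: hide the zero in a volatile integer, so the optimizer
// cannot prove at translation time that p is a null pointer.
void fat() {
  volatile std::uintptr_t v = 0;
  volatile int *p = reinterpret_cast<int*>(v);
  *p = 42;
}

// Counterexample: a plain null pointer dereference, which the
// optimizer may delete or replace with a trap.
void ugly() {
  int *p = 0;
  *p = 42;
}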

You can see that the assembly for fat() is kind of convoluted:

fat():
  mov QWORD PTR [rsp-8], 0
  mov rax, QWORD PTR [rsp-8]
  mov DWORD PTR [rax], 42
  ret

But it still does the job in the end. good() gives the optimal output, coupling volatile and std::launder:

good():
  mov DWORD PTR ds:0, 42
  ret

Notice that ugly() is given as an example of the optimizer kicking in on a null pointer dereference, which curiously doesn't happen with MSVC because it's less aggressive.
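The fat() technique can be packaged into small helpers that take the address as a run-time integer, which is closer to the stated aim of reaching the whole address space. This is only a sketch, assuming a flat address space and relying on the usual guarantee that an object pointer converted to std::uintptr_t and back keeps its value; `poke`, `peek` and `demo_poke_peek` are hypothetical names. It is exercised on an ordinary object, since whether address 0 itself is mapped depends on the platform.

```cpp
#include <cassert>
#include <cstdint>

// Write `value` to the int at `addr`. As in fat(), the address is copied into
// a volatile integer so the optimizer cannot reason about its value, and the
// store goes through a volatile-qualified pointer so it cannot be elided.
void poke(std::uintptr_t addr, int value) {
  volatile std::uintptr_t opaque = addr;
  volatile int *p = reinterpret_cast<volatile int *>(opaque);
  *p = value;
}

// Read the int at `addr` the same way.
int peek(std::uintptr_t addr) {
  volatile std::uintptr_t opaque = addr;
  const volatile int *p = reinterpret_cast<const volatile int *>(opaque);
  return *p;
}

// Usage on a normal object; pointing these helpers at address 0 is only
// meaningful on platforms that actually map that page.
void demo_poke_peek() {
  int target = 0;
  auto addr = reinterpret_cast<std::uintptr_t>(&target);
  poke(addr, 42);
  assert(peek(addr) == 42);
  assert(target == 42);
}
```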

For a thorough reference on the subject of dereferencing null, you may check the following sources:


𝌔 - 古之善為士者 ("The skilful masters of old")

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

-- Sir Charles Antony Richard Hoare