Dereferencing null is undefined behavior, how to do it anyway?

The C and C++ standards have (for whatever historical reasons) effectively imposed non-uniform access to the address space, ahead of the platforms they may target (which have the final word on address space layout), by adopting the null pointer and the rules around it.

This non-uniformity of access stems from several related factors:

  • The null pointer constant converted to a pointer type creates a null pointer.
  • A pointer that compares equal to a null pointer is a null pointer.
  • There is no way to write a pointer constant for accessing address 0 that does not conflict with the null pointer constant.
  • In general, it's undecidable at translation time, across all possible language constructs, whether a pointer will equal a null pointer before further operations are applied to it (so the resulting undefined behavior is undecidable too).
    • The null pointer is implemented as a value in the address space.
  • A null pointer is guaranteed to not point to any object or function.
  • Dereferencing a pointer that doesn't point to any object (a null pointer, for example) is undefined behavior according to the standard.
    • Implementers treat constructs with undefined behavior as license to do anything. Most of the time this just leads optimizers to act in counterintuitive ways: they simply eliminate the object code instead of letting the access fault when the data at the null pointer is inaccessible, regardless of whether the platform actually grants access. Language rules and platform conditions are thus in conflict.
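The first points of the list can be checked without dereferencing anything. The following is a minimal sketch, not a statement about every implementation: the helper names pointer_from_integer and null_rules_hold are hypothetical, and the integer-to-pointer mapping is implementation-defined, so the result shown by the second comparison is only what mainstream flat-address-space platforms are expected to give.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds a pointer from an integer only known at run
// time, so no null pointer constant appears in the conversion.
inline int *pointer_from_integer(std::uintptr_t value) {
    return reinterpret_cast<int *>(value);
}

// Returns true when the rules listed above hold on this implementation.
inline bool null_rules_hold() {
    int *from_constant = 0;            // null pointer constant -> null pointer
    volatile std::uintptr_t zero = 0;  // volatile: the value is read at run time
    int *from_integer = pointer_from_integer(zero);

    // The mapping of integer 0 is implementation-defined (assumption: it maps
    // to the null pointer, as on mainstream platforms). If the comparison is
    // true, "address 0" is indistinguishable from the null pointer.
    return from_constant == nullptr && from_integer == from_constant;
}
```

Where null_rules_hold() returns true, any pointer meant to designate address 0 is, for the language, just a null pointer.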

So, to access data pointed to by a null pointer through a C/C++ compiler/implementation, one may have to look for a means of doing so.

By which means can you do it?

In case you don't know of any but are still curious about possible real situations: [1], [2], [3], [4], [5], [6].

Check documented measures

  • Clang: declare your "null pointers" volatile.
  • GCC: use the -fno-delete-null-pointer-checks compilation flag.

Some compilers may not try to be smart about null pointer dereferences by default. You may check that, or look for their alternative documented method.

All of these are implementation-specific measures.
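As a rough sketch of the volatile approach: Clang's own diagnostic for a null dereference suggests qualifying the pointer with volatile, which is read here as volatile-qualifying the pointee. The helper names read_int_at and demo_read are hypothetical, and whether address 0 is actually readable remains entirely up to the platform, so the usage below sticks to an address that certainly is.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reads an int through a volatile-qualified pointer.
// A volatile access is an observable side effect, so the compiler is
// expected to emit the load rather than reason it away.
inline int read_int_at(std::uintptr_t address) {
    return *reinterpret_cast<volatile int *>(address);
}

// Usage on an address that is certainly readable: a local variable.
inline int demo_read() {
    int value = 42;
    int result = read_int_at(reinterpret_cast<std::uintptr_t>(&value));
    assert(result == 42);
    return result;
}
```

On a platform that really maps memory at address 0, read_int_at(0) would be the call of interest. With GCC, the documented route stays the -fno-delete-null-pointer-checks flag.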

Employ some technique

The following snippet of code, whose assembly output can be inspected (here's a g++ 7.0.0 20170212 screenshot, since it may cease to compile because of its use of C++17 std::launder), gives three methods for dealing with this situation:

#include <new>
#include <cstdio>

extern void *my_address;
void *my_address = nullptr; // you should be safer putting this in a separate
                            // compilation unit

// Ask the compiler not to optimize address(): optnone on Clang,
// optimize(0) on GCC.
inline void *address (void *)
#ifdef __clang__
__attribute__((optnone))
#else
__attribute__((optimize(0)))
#endif
;

void *address (void *p) { return p; }

#ifndef __clang__
template <typename T>
T *laundered_address(T *p) { return std::launder(p); }
#endif

inline void *naive_address (void *p) { return p; }

int main() {
  if (*(int *)address((void *)0) == 42)
    std::puts("Go home std, you're drunk!\n");
  if (*(int *)my_address == 42)
    std::puts("My precious...\n");
#ifndef __clang__
  if (*laundered_address((int *)0) == 42)
    std::puts("Peace of mind is restored in C++17?!\n");
#endif
  if (*(int *)naive_address((void *)0) == 42)
    std::puts("You shall not pass!\n");
}

The first, through the address function, makes use of compiler-specific function attributes to ask the compiler not to optimize the function. The function takes an address and returns it unchanged, so it works as a blind spot for the optimizer: unable to judge whether the returned address is null, the optimizer sees only a call to a function taking an address, which could well return a "valid" non-null one. This rests on the assumption that no implementation will emit extra object code to check for null at run time on every pointer access just to "enforce undefined behavior", since that would be too costly.

The second achieves the same with the external my_address variable, ideally imported from previously compiled object code that gets linked without whole-program optimization. It uses the language, the compiler, and the established compilation model to defeat a compiler's potentially inconvenient interpretation/implementation of the standard on this matter. The same technique can be applied using an imported function. A static variable should also work.

Both methods work for both C and C++.

The third is through a new addition to the C++17 standard library, the std::launder function, which asks the compiler to drop any conflicting assumptions and assume an object at the given location. As can be seen, the current implementation doesn't try to rule out the use of a null pointer at compile time, nor is the proposal very explicit about this. That may change in the future, at which point it would cease to be a solution.
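For contrast, here is a minimal sketch of the use std::launder was designed for, on storage that really holds an object (C++17 required). The Sample type and the relaunder_demo function are hypothetical names introduced only for illustration. The null pointer trick above leans on the same "assume an object is there" effect, outside the function's stated preconditions.

```cpp
#include <cassert>
#include <new>

// Hypothetical type: the const member is what makes a pointer to an old
// object unusable for reaching a replacement object under C++17 rules.
struct Sample {
    const int value;
};

inline int relaunder_demo() {
    alignas(Sample) unsigned char storage[sizeof(Sample)];

    new (storage) Sample{1};   // first object in the storage
    new (storage) Sample{42};  // a new object now occupies the same storage

    // std::launder yields a pointer to the object currently living there,
    // without the compiler carrying over assumptions about the old one.
    Sample *current = std::launder(reinterpret_cast<Sample *>(storage));
    return current->value;
}
```

Sample is trivially destructible, so no explicit destructor call is needed before the storage is reused.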

Notice that the last call, to naive_address, is given as an example of the optimizer kicking in.

For a thorough reference on the subject of dereferencing null, you may check the following sources:

𝌔 - 古之善為士者

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.

-- Sir Charles Antony Richard Hoare