dereferencing null pointers
Prelude
- Is dereferencing a pointer that's equal to nullptr undefined behavior by the standard?
- Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?
Question
Dereferencing null is undefined behavior, how to do it anyway?
The C and C++ standards have (for whatever historical reasons) imposed non-uniform access to the address space by adopting a null pointer and the rules around it. They do so a priori, regardless of the platforms they may target, even though the platform has the final word on address space layout.
This non uniformity of access stems from several correlated factors:
- The null pointer constant converted to a pointer type creates a null pointer.
- A pointer that compares equal to a null pointer is a null pointer.
- It's impossible to write a pointer constant for address 0 that doesn't conflict with the null pointer constant.
- It's undecidable to infer at translation, given all possible language
constructions, that a pointer equates to a null pointer, before further
operations are applied to it (leading to undecidable undefined behavior).
- Implementations represent the null pointer with a value taken from the address space.
- A null pointer is guaranteed to not point to any object or function.
- Dereferencing a pointer that doesn't point to any object (a null pointer for
example) is undefined behavior by the standard.
- Implementers interpret language constructions with undefined behavior as a license to do anything. Most of the time this just leads optimizers to act in counterintuitive ways: they eliminate object code instead of letting it blow up when the data pointed to by the null pointer is inaccessible, regardless of whether the platform gives you access or not. A conflict exists between language rules and platform conditions.
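The first few factors in the list above can be checked without dereferencing anything. A minimal sketch follows; the function names are hypothetical and only exercise the guarantees about null pointer values, not the undefined part:

```cpp
#include <cassert>

// Hypothetical helper names, for illustration only.

// The literal 0 is a null pointer constant: converting it to a pointer
// type yields a null pointer, whatever bit pattern the platform uses.
inline bool zero_literal_is_null() {
    int *p = 0;
    return p == nullptr;
}

// Null pointers compare equal to each other, even when they were
// obtained from different pointer types.
inline bool null_pointers_compare_equal() {
    int *a = nullptr;
    char *b = 0;
    return static_cast<void *>(a) == static_cast<void *>(b);
}

// A pointer to an actual object never compares equal to a null pointer.
inline bool object_address_is_not_null() {
    int object = 0;
    int *p = &object;
    return p != nullptr;
}

// Usage: none of these checks dereferences a null pointer.
inline void check_null_pointer_rules() {
    assert(zero_literal_is_null());
    assert(null_pointers_compare_equal());
    assert(object_address_is_not_null());
}
```

Because an int * initialized from 0 is always a null pointer, there is no spelling of "the object at address 0" that the language does not also read as "no object at all".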
So to access data pointed to by a null pointer through a C/C++ compiler/implementation, one may have to look for a means to do so.
By which means can you do it?
In case you don't know of any but are still curious about possible real situations: [1], [2], [3], [4], [5], [6].
Even though I'm talking about null pointers, the main aim here is taking back
control of the full address space (in theory).
Check documented measures
- Clang: declare your "null pointers" volatile.
- GCC: use the -fno-delete-null-pointer-checks compilation flag.
Some compilers (like MSVC) may not try to be smart about null pointer dereferences by default. You may check whether yours does, or look for its documented alternative.
All of these are implementation-specific measures.
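As a minimal sketch of the volatile measure, the helpers below route every access through a volatile-qualified pointer built from an integer address. The names poke and peek are hypothetical, not from any library. The usage function only touches a real object, so it is safe to run. Whether the same calls with address 0 are honored remains up to the implementation and the platform; with GCC one would additionally build with -fno-delete-null-pointer-checks.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, for illustration only.

// Write an int through a volatile-qualified pointer built from an
// integer address. The volatile access must not be optimized away.
inline void poke(std::uintptr_t address, int value) {
    volatile int *p = reinterpret_cast<int *>(address);
    *p = value;
}

// Read an int back through a volatile-qualified pointer.
inline int peek(std::uintptr_t address) {
    volatile int *p = reinterpret_cast<int *>(address);
    return *p;
}

// Usage on an ordinary object, which is well defined. Passing 0 instead
// is the implementation-specific case discussed in the text.
inline void demo_poke_peek() {
    int target = 0;
    const auto address = reinterpret_cast<std::uintptr_t>(&target);
    poke(address, 42);
    assert(target == 42);
    assert(peek(address) == 42);
}
```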
Employ some technique
The following snippet, whose assembly output can be checked on godbolt.org, gives two methods for dealing with this situation (lean() and fat()), plus two counterexamples (bad1() and bad2()):
#include <new>
#include <cstdint>

void lean() {
    volatile int *p = 0;
    new (const_cast<int *>(p)) int;
    p = std::launder(p);
    *p = 42;
}

void fat() {
    volatile std::uintptr_t up = 0;
    volatile int *p = new (reinterpret_cast<int *>(up)) int;
    *p = 42;
}

void bad1() {
    int *p = 0;
    *p = 42;
}

void bad2() {
    new (0) int(42);
}
You can see that the assembly for fat() is kind of convoluted:
fat():
mov QWORD PTR [rsp-8], 0
mov rax, QWORD PTR [rsp-8]
mov DWORD PTR [rax], 42
ret
It still does the job in the end, though. Coupling volatile with std::launder, lean() gives the optimal output:
lean():
mov DWORD PTR ds:0, 42
ret
Note that bad1() and bad2() are given as examples of the optimizer kicking in over a null pointer dereference, which curiously doesn't happen with MSVC because it's less aggressive.
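The fat() approach can be generalized to any address, which also makes it testable on ordinary storage. The sketch below is an assumption-laden variant, not part of the snippet above: the function name and its parameters are hypothetical. The address is first copied into a volatile integer so the optimizer cannot see its value, then used for placement new.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical generalization of fat(): the address is a parameter.
inline volatile int *store_via_opaque_address(std::uintptr_t address, int value) {
    // Reading back from a volatile object hides the value from the
    // optimizer, so it cannot prove the resulting pointer is null.
    volatile std::uintptr_t opaque = address;
    void *where = reinterpret_cast<void *>(opaque);
    volatile int *p = new (where) int;  // placement new, no allocation
    *p = value;
    return p;
}

// Usage on suitably aligned local storage, which is well defined.
inline void demo_store_via_opaque_address() {
    alignas(int) unsigned char storage[sizeof(int)];
    const auto address =
        reinterpret_cast<std::uintptr_t>(static_cast<void *>(storage));
    volatile int *p = store_via_opaque_address(address, 42);
    assert(*p == 42);
}
```

Called with address 0 this reduces to fat(), and the result again depends on the implementation and on what the platform maps at that address.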
For a thorough reference on the subject of dereferencing null, you may check the following sources:
- A personal tale on a special value
- Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?
- Is dereferencing a pointer that's equal to nullptr undefined behavior by the standard?
- Why dereferencing a null raw pointer is undefined behaviour? (Rust)
𝌔 - 古之善為士者
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. In recent years, a number of program analysers like PREfix and PREfast in Microsoft have been used to check references, and give warnings if there is a risk they may be non-null. More recent programming languages like Spec# have introduced declarations for non-null references. This is the solution, which I rejected in 1965.