The C programming language, together with its close relative C++, has been a foundational technology for decades in operating systems, firmware, high-performance libraries, and other critical components. However, the efficiency and low-level control these languages offer rely on a feature that brings both power and risk: Undefined Behavior (UB).
In recent years, the C standardization committee (WG14) has begun promoting a coordinated effort to reduce, constrain, or reclassify numerous cases of Undefined Behavior, with the goal of improving code security and predictability without compromising the language’s characteristic efficiency.
This article presents a technical overview of Undefined Behavior: why it exists, how it affects the observable behavior of programs, and how the recent standard (C23) and the upcoming one (C2y) are working to mitigate it.
What Exactly Is Undefined Behavior?
To understand UB, it is necessary to place it within the categories of behavior defined by ISO/IEC 9899:
Implementation-Defined Behavior
The standard delegates the decision to the compiler or system, but requires that this decision be documented.
Example: the exact size of types such as int.
Unspecified Behavior
The standard offers two or more valid alternatives, and the compiler is not required to document which one it chooses. The choice does not even have to be consistent: the standard imposes no further requirement on which alternative is chosen in any given instance.
Example: the order in which function arguments are evaluated.
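The two categories above can be illustrated with a minimal sketch. The helper names (`record`, `take_two`, `demo_argument_order`, `int_storage_bits`) are hypothetical and exist only for this illustration; the point is that either call order is permitted, while the size of int is a documented platform choice.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical log of the order in which record() was called. */
static int call_order[2];
static size_t calls = 0;

/* Records its tag in the log and returns it unchanged. */
static int record(int tag) {
    if (calls < 2) {
        call_order[calls] = tag;
    }
    calls++;
    return tag;
}

/* Takes two arguments; the standard does not say which one is evaluated first. */
static int take_two(int first, int second) {
    return first * 10 + second;
}

/* Unspecified behavior: call_order may end up {1, 2} or {2, 1}.
   Both are valid, and the compiler need not document which one it picks.
   The return value is the same either way. */
int demo_argument_order(void) {
    calls = 0;
    return take_two(record(1), record(2));
}

/* Implementation-defined behavior: the storage size of int varies by
   platform, but each implementation must document its choice. */
size_t int_storage_bits(void) {
    return sizeof(int) * CHAR_BIT;
}
```

Code that only depends on the return value of `demo_argument_order` is portable; code that depends on the contents of `call_order` is not.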
Undefined Behavior (UB)
The standard imposes no requirements on the program’s behavior. Common examples include:
- Dereferencing a null pointer.
- Signed integer arithmetic overflow.
- Out-of-bounds array access.
The absence of requirements in these cases means the compiler is free to:
- Optimize under the assumption that UB never occurs.
- Omit checks.
- Generate unpredictable or non-intuitive code.
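As a rough illustration of optimizing under the assumption that UB never occurs, consider a check that itself relies on signed overflow. The function names below are hypothetical, and whether a given compiler actually folds the first function depends on its version and optimization level.

```c
#include <limits.h>
#include <stdbool.h>

/* UB-prone check: if x == INT_MAX, x + 1 overflows, which is UB.
   A compiler that assumes UB never happens may reduce this
   function to "return true". */
bool has_successor_unsafe(int x) {
    return x + 1 > x;
}

/* Well-defined alternative: compares against the limit
   without ever performing the overflowing addition. */
bool has_successor_safe(int x) {
    return x < INT_MAX;
}
```

Both functions agree for every input except INT_MAX, which is exactly the input the unsafe version was presumably written to detect.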
Why Does UB Exist in the First Place?
Contrary to popular belief (and sometimes to the stance of coding guidelines such as MISRA), UB is not a flaw in the standard. It is intentionally present for three main reasons:
1. The impossibility of systematically diagnosing certain errors: Detecting some conditions at compile time is equivalent to solving undecidable problems (such as the halting problem). UB avoids requiring mandatory diagnostics for such cases.
2. Enabling high-level optimizations and efficient use of hardware: Making a conservative choice for every possible error would require inserting costly checks. For example, left-shifting an integer by a count greater than or equal to its width produces different results on ARM, PowerPC, and x86 architectures, because each instruction set treats the excess count differently. By leaving this case undefined, the standard lets the compiler emit the fastest available hardware instruction without adding expensive safety checks.
3. Language extensions: Many traditional compilers provide extensions, additional behaviors, or alternative conventions without violating standard conformance thanks to certain situations being classified as UB (such as extra modes in fopen).
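The cost mentioned in the second point can be made concrete. In C, shifting by a count that is negative or at least the width of the promoted operand is undefined. A minimal sketch of a defensive wrapper follows; `checked_shift_left` is a hypothetical helper, not a standard function, and its range test is precisely the kind of extra work the standard avoids imposing on every shift.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: shifts value left by count bits.
   Returns false instead of invoking UB when count is at least
   the width of uint32_t; otherwise stores the result in *out. */
bool checked_shift_left(uint32_t value, unsigned count, uint32_t *out) {
    if (count >= sizeof(uint32_t) * CHAR_BIT) {
        return false;
    }
    *out = (uint32_t)(value << count);
    return true;
}
```

Code that can prove its shift counts are in range pays nothing; code that cannot has to opt in to a check like this one.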
The Danger of Optimizations: UB That “Time Travels”
One of the most fascinating and dangerous aspects of UB is how it interacts with modern optimizations under the “As If” rule. The compiler may transform your code however it wishes, as long as the observable result is the same as in the “abstract machine.”
However, when the compiler assumes that UB will never occur (a strategy known as Total License), it may eliminate safety checks or reorder code in surprising ways.
The Case of Invariant Hoisting (Loop Hoisting)
Imagine a loop where a remainder operation (%) is performed and its operands do not change across iterations. To optimize, the compiler hoists that operation out of the loop so it is computed only once at the beginning.
// Conceptual source code
puts("Start of loop");
for (int i = 0; i < 100; i++) {
    // If divisor is 0, or divisor is -1 while base is INT_MIN, this is UB
    resultado += base % divisor;
    puts("Inside the loop");
}
If divisor can trigger UB (for example, integer division by zero), the compiler may still move the % operation outside the loop to optimize. If the UB manifests in this hoisted computation, the failure can occur before the program even enters the loop, so the actual execution order no longer matches the logical order of the source.
To the programmer, this appears as a kind of “time-traveling behavior”: the failure occurs before the code that logically contains the bug, making debugging significantly harder.
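One way to keep such a loop free of UB, wherever the compiler places the computation, is to reject the problematic operands up front. The following is a sketch under assumptions: `accumulate_remainders` is a hypothetical function, and the iteration count of 100 simply mirrors the conceptual loop.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: computes the sum of base % divisor over 100
   iterations and stores it in *total.
   Rejects the two operand combinations for which % is undefined,
   so hoisting the computation cannot move a failure earlier. */
bool accumulate_remainders(int base, int divisor, long long *total) {
    if (divisor == 0 || (base == INT_MIN && divisor == -1)) {
        return false;
    }
    long long sum = 0;
    for (int i = 0; i < 100; i++) {
        sum += base % divisor;
    }
    *total = sum;
    return true;
}
```

With the guard in place, a hoisted `base % divisor` is always well defined, and the error path is reported where the programmer expects it.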
Compiler Strategies Toward UB
Compiler developers typically adopt one of three approaches:
- Hardware Behavior: Generate the assembly instruction and let the CPU decide. This is what many C programmers assume happens, but it is no longer the common practice.
- Diagnostics: Instrument the code with sanitizers (such as UBSan) to catch each error. These tools are essential during testing phases, although they come with significant performance costs.
- Assumption of No UB (Total License): The compiler assumes that UB never occurs and optimizes under that premise, potentially eliminating entire execution paths or applying aggressive transformations. This strategy is currently the most widespread in compilers following the LLVM or GCC model.
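The difference between these strategies shows up in a familiar pattern: a pointer that is dereferenced before it is checked. The sketch below uses a hypothetical `struct packet` and hypothetical function names; the build line in the final comment uses flags that exist in GCC and Clang, but the exact diagnostics vary by version.

```c
#include <stddef.h>

/* Hypothetical record type used only for this illustration. */
struct packet {
    int length;
};

/* Problematic: p is dereferenced before it is tested. Under the
   "assume no UB" strategy the compiler may conclude that p is
   non-null and delete the check below as dead code. */
int packet_length_unsafe(const struct packet *p) {
    int length = p->length;
    if (p == NULL) {
        return -1;
    }
    return length;
}

/* Fixed: the check comes first, so no path dereferences a null pointer. */
int packet_length_safe(const struct packet *p) {
    if (p == NULL) {
        return -1;
    }
    return p->length;
}

/* A possible build line for the "diagnostics" strategy:
     cc -Wall -Wextra -fsanitize=undefined -g example.c
   With UBSan enabled, a call to packet_length_unsafe(NULL) is
   reported at run time instead of silently misbehaving. */
```

On a "hardware behavior" compiler the unsafe version would typically fault at the load; on an optimizing compiler the outcome depends on what the optimizer infers from the dereference.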
The Path Forward: C23 and Memory Safety
The WG14 committee is taking significant steps to clean up the language. In preparation for the C23 standard, more than 32 cases of undefined behavior have already been removed or clarified.
A concrete example: the register storage class
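Previously, implicitly taking the address of an array declared with register (for example, by letting it convert to a pointer to its first element) was Undefined Behavior, even though explicitly applying & to a register object was already a constraint violation. In the new revisions, this has been reclassified as a constraint violation.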
What’s the difference?
UB allows the compiler to do anything silently. A constraint violation requires the compiler to emit a diagnostic (an error or warning) at compile time. This shifts the problem from an unpredictable runtime failure to a safe and visible compile-time error.
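To show what a constraint violation looks like in practice, here is a minimal sketch with a hypothetical function `sum_up_to`. The offending line is left commented out so the example compiles; it uses the explicit & operator, whose operand the standard requires not to be declared with register.

```c
/* Sums the integers 1..n using objects declared with register.
   Intended for small n; the sum is not guarded against overflow. */
int sum_up_to(int n) {
    register int total = 0;
    for (register int i = 1; i <= n; i++) {
        total += i;
    }
    /* int *alias = &total;
       Uncommenting the line above applies & to a register object,
       which violates a constraint: a conforming compiler must issue
       a diagnostic instead of silently producing arbitrary code. */
    return total;
}
```

The mistake is caught when the code is built, not when it runs on a customer's machine.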
Security Study Groups
Beyond eliminating specific UBs, a Memory Safety Study Group has been formed. The goal is not to turn C into Rust overnight, but to explore how optional “memory safety modes” or annotations could be added to detect buffer overflows and out-of-bounds accesses without breaking existing binary compatibility.
Conclusion
The C and C++ ecosystem is evolving toward a safer balance between efficiency, portability, and robustness. The removal or reclassification of numerous UB cases in C23, along with the advances planned for C2y, represents a significant shift toward greater reliability.
For today’s developers, the lesson is clear: UB cannot be ignored. Using high compiler warning levels (-Wall -Wextra), employing sanitizers during testing, and staying up to date with the C23/C2y standards are essential for writing robust and secure software.
Undefined Behavior will continue to exist because it is part of the essence of the language, but its scope and risks can be substantially reduced through good practices, modern tools, and the ongoing evolution of the standard.
