What Building malloc() Taught About Memory
Building a dynamic memory allocator from scratch reveals truths about computation that no high-level abstraction can convey—lessons in trade-offs, fragmentation, and the nature of resources.
I have been contemplating memory—not the psychological kind, but the computational kind. The finite resource that every program consumes, that every system manages, that underlies every abstraction we build.
Most developers use malloc() and free() without understanding what happens beneath the interface. They request memory; memory appears. They release memory; it disappears. The mechanism is invisible, which is precisely how good abstractions should work.
But there is a cost to this invisibility. The developer who does not understand memory management builds systems that consume resources wastefully, that fragment under load, that exhibit performance characteristics they cannot explain. The abstraction serves them until it does not, and then they are helpless.
Building a memory allocator from scratch changed how I understand computation itself. This essay explores those lessons.
The Problem
The assignment appears simple: implement malloc(), free(), and realloc() for x86-64 systems. The constraints:
- No external libraries
- Must handle arbitrary allocation sizes
- Must be thread-safe
- Must be fast
What appears simple becomes complex once the implementation begins. As the computer scientist Donald Knuth observed:
"Memory allocation is one of the oldest problems in computer science. It seems like it should be trivial—just hand out chunks of memory when asked. But the problem becomes difficult when you consider that programs request memory of many different sizes, in unpredictable orders, and may or may not free it later." [1]
The difficulty is not in the allocation itself. It is in the management—in maintaining the invariants that allow a program to request and release memory over long periods without exhausting resources or degrading performance.
The Fundamental Architecture
When a program starts, the operating system provides a heap—a contiguous region of virtual memory that the program can use for dynamic allocation. The allocator's responsibility is to carve this region efficiently.
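How an allocator obtains pages from the operating system in the first place can be sketched as follows. This assumes a POSIX system and uses anonymous mmap; `grow_heap` is an illustrative name, and the classic alternative is extending the program break with sbrk:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Ask the kernel for `bytes` of fresh, zeroed, read-write memory.
   Anonymous mmap is one common way allocators grow their heap;
   sbrk() is the traditional alternative. */
static void *grow_heap(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;  /* mmap signals failure with MAP_FAILED */
}
```

Everything the rest of the allocator does is bookkeeping over regions obtained this way.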
The naive approach maintains a linked list of free blocks. When an allocation is requested, the allocator searches the list for a block large enough to satisfy the request. When a block is freed, it is added back to the list.
This works. It is also slow. Each allocation requires traversing the free list, which is O(n) in the number of free blocks. For programs that perform many allocations—and most programs do—this cost becomes prohibitive.
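A minimal version of that first-fit search, with illustrative `fl_` names (not taken from any real allocator), makes the O(n) cost visible:

```c
#include <stddef.h>

/* One node in a singly linked free list. */
typedef struct fl_block {
    size_t size;             /* usable bytes in this free block */
    struct fl_block *next;   /* next free block, or NULL */
} fl_block;

/* First fit: walk the list until some block can hold `size` bytes.
   Worst case touches every free block -- the O(n) cost in the text. */
static fl_block *fl_find(fl_block *head, size_t size) {
    for (fl_block *b = head; b != NULL; b = b->next)
        if (b->size >= size)
            return b;
    return NULL;             /* no fit: the allocator must grow the heap */
}
```

Note that first fit also tends to leave small unusable fragments near the head of the list, which is part of what motivates the strategies below.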
The computer scientist Paul Wilson, in his seminal survey of memory allocation techniques, catalogs the alternatives:
"The choice of allocation algorithm involves fundamental trade-offs between time and space efficiency. No single algorithm is optimal for all workloads. The allocator designer must understand the characteristics of the workloads they expect to handle and choose algorithms accordingly." [2]
This insight—that there is no universally optimal solution, only trade-offs appropriate to specific contexts—applies far beyond memory allocation. It is a general principle of systems design.
Segregated Free Lists
The implementation I developed uses segregated free lists—a strategy that maintains multiple free lists, each containing blocks of a particular size class.
Size Class 0: 32-byte blocks
Size Class 1: 64-byte blocks
Size Class 2: 128-byte blocks
...
Size Class N: Large blocks
When an allocation is requested, the allocator determines the appropriate size class and searches only that list. If no block is available, it requests more memory from the operating system or splits a block from a larger size class.
This provides O(1) allocation for common sizes—a significant performance improvement over the naive approach. The cost is increased metadata overhead (multiple list pointers must be maintained) and the potential for internal fragmentation (a 33-byte request receives a 64-byte block, wasting 31 bytes).
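The size-class lookup implied by the table above can be sketched like this. `size_class` is a hypothetical helper; production allocators typically replace the loop with bit tricks such as counting leading zeros:

```c
#include <stddef.h>

/* Map a request size to a size-class index, matching the geometric
   classes above: class k holds blocks of (32 << k) bytes. */
static size_t size_class(size_t size) {
    size_t k = 0;
    size_t block = 32;         /* smallest class */
    while (block < size) {     /* round up to the next power-of-two class */
        block <<= 1;
        k++;
    }
    return k;
}
```

A 33-byte request maps to class 1 (64 bytes), which is exactly the internal fragmentation the text describes.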
The trade-off is worth it for most workloads. Programs tend to allocate many objects of similar sizes. The segregated approach exploits this locality.
The Coalescing Problem
Memory allocation creates a problem that free() alone cannot solve: fragmentation.
Consider a heap that has been in use for some time. Blocks have been allocated and freed in various orders. The free blocks are scattered throughout the heap, interspersed with allocated blocks. Even if the total free memory is large, no single contiguous block may be large enough to satisfy a large allocation request.
This is external fragmentation—the phenomenon where memory is available but unusable because it is divided into pieces too small for the current request.
The solution is coalescing: when a block is freed, check whether the adjacent blocks are also free. If so, merge them into a single larger block. This maintains larger contiguous regions and reduces fragmentation.
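A simplified sketch of the merge step, assuming a singly linked, address-ordered block model with illustrative `blk_` names. Real allocators usually use boundary tags and also merge the left-hand neighbor, which this sketch omits:

```c
#include <stdbool.h>
#include <stddef.h>

/* A block header; `next` points at the physically adjacent successor. */
typedef struct blk {
    size_t size;        /* payload bytes, excluding this header */
    bool free;
    struct blk *next;   /* adjacent successor block, or NULL */
} blk;

/* Free `b`, then coalesce with a free right-hand neighbor if one exists. */
static void blk_free(blk *b) {
    b->free = true;
    if (b->next != NULL && b->next->free) {
        b->size += sizeof(blk) + b->next->size;  /* absorb neighbor and its header */
        b->next = b->next->next;                 /* unlink the absorbed block */
    }
}
```

With boundary tags, a size field is also stored at the end of each block, so the left neighbor's header can be found in O(1) and both directions merged on every free.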
The philosopher and computer scientist Edsger Dijkstra saw in this problem a more general principle:
"The art of programming is the art of organizing complexity; of mastering multitude and avoiding its bastard chaos as effectively as possible." [3]
Fragmentation is a form of chaos—disorder that accumulates over time, degrading system performance. Coalescing is a form of maintenance—ongoing work to restore order and preserve the system's capacity to function.
This pattern appears throughout computing. File systems require defragmentation. Databases require vacuuming. Codebases require refactoring. The second law of thermodynamics applies to software: without ongoing maintenance, systems tend toward disorder.
Security Considerations
Memory allocators present an attractive target for attackers. If an attacker can overflow a buffer and corrupt the metadata that the allocator uses to manage blocks, they can manipulate the allocator into overwriting arbitrary memory locations.
The classic "heap overflow" attack exploits this vulnerability. By carefully crafting the overflow, an attacker can cause the allocator to write attacker-controlled data to attacker-specified addresses—often overwriting function pointers or return addresses to gain control of program execution.
The mitigation I implemented uses header obfuscation: each block's metadata is XORed with a secret value before being stored. When the metadata is read, it is XORed again to recover the original value. If an attacker corrupts the metadata without knowing the secret, the allocator will read garbage and likely crash—preventing the exploitation.
```c
#define MAGIC 0xDEADBEEF  /* in practice, a per-process random secret */

size_t encode_header(size_t header) {
    return header ^ MAGIC;
}
size_t decode_header(size_t stored) {
    return stored ^ MAGIC;  /* XOR with the same key is its own inverse */
}
```

This is not bulletproof. A determined attacker who can leak memory contents may be able to discover the secret. But it raises the bar significantly, transforming a straightforward exploit into a more complex attack that may not be worth the effort.
The computer security researcher Matt Miller calls this "defense in depth":
"Security is not about making attacks impossible. It is about making attacks expensive. Each layer of defense increases the cost to the attacker. At some point, the cost exceeds the value, and the attacker moves to an easier target." [4]
The principle applies beyond security. In any adversarial environment—whether the adversary is a hacker, a competitor, or simply entropy—the goal is not perfection but sufficient resilience.
Performance Results
After optimization, the allocator achieved performance competitive with production implementations:
- malloc: approximately 50 cycles for cached sizes
- free: approximately 30 cycles with coalescing
- Memory overhead: 8 bytes per block
- Fragmentation: less than 5% on typical workloads
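The cycle figures above came from the author's measurements. For readers who want a rough sense of per-call cost, a wall-clock harness like the following gives nanoseconds per malloc/free pair; `ns_per_call` is an illustrative sketch, not the instrument used for the numbers above, which would need a cycle counter such as rdtsc:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Average wall-clock cost of one malloc+free pair, in nanoseconds.
   Coarse: includes loop overhead and measures the system allocator. */
static double ns_per_call(size_t size, int iters) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        void *p = malloc(size);
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Repeatedly allocating and freeing the same size is the friendliest possible workload: it stays in one size class and exercises the cached fast path, which is why such microbenchmarks flatter every allocator.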
These numbers are not exceptional by industrial standards. Production allocators like jemalloc and tcmalloc incorporate additional optimizations—thread-local caches, size-specific arenas, huge page support—that my implementation lacks.
But the numbers are respectable. More importantly, achieving them required understanding trade-offs that the high-level developer never encounters.
The Philosophical Dimension
Building a memory allocator teaches something that reading about memory allocators cannot: the visceral experience of resource scarcity.
In high-level languages with garbage collection, memory appears infinite. You allocate objects without concern for their size or lifetime. The garbage collector handles deallocation. The abstraction is so complete that memory might as well not exist.
This abstraction is useful—it allows developers to focus on application logic rather than memory management. But it comes with a cost. The developer who has never experienced resource scarcity builds systems that assume abundance. They allocate freely in hot paths. They create objects with short lifetimes that stress the garbage collector. They never ask whether an allocation is necessary.
The systems programmer who has built an allocator thinks differently. Every allocation is a decision. Every object has a cost. The question is not "can I allocate this?" but "should I?"
The economist Thomas Sowell captures the general principle:
"The first lesson of economics is scarcity: there is never enough of anything to fully satisfy all those who want it. The first lesson of politics is to disregard the first lesson of economics." [5]
High-level abstractions are the politics of computing—they allow us to disregard scarcity, to pretend that resources are infinite, to build systems without confronting trade-offs. This is useful until it is not. When the system fails under load, when the garbage collector stalls, when memory is exhausted, the abstraction collapses and the reality of scarcity reasserts itself.
Lessons Generalized
The experience of building a memory allocator crystallized several principles that apply beyond memory management:
Every byte counts. In resource-constrained environments, overhead compounds. Headers consume memory that could hold payload. Metadata structures consume cache lines that could hold data. The trade-off between metadata and payload is constant, and the right answer depends on the workload.
Cache locality matters. Modern processors are optimized for sequential access. Data structures that scatter related information across memory pay a penalty in cache misses. Keeping metadata close to data improves performance dramatically.
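One common way to keep metadata close to data is an inline header placed immediately before the payload, so reading the size warms the very cache line the caller is about to touch. A sketch with illustrative names:

```c
#include <stddef.h>

/* Per-block metadata stored inline, directly in front of the payload. */
typedef struct {
    size_t size;   /* block size; low bits are often reused as flag bits */
} header;

/* The payload begins immediately after the header... */
static void *payload_of(header *h) {
    return (char *)h + sizeof(header);
}
/* ...so the header is recovered from a user pointer by stepping back. */
static header *header_of(void *p) {
    return (header *)((char *)p - sizeof(header));
}
```

The alternative, a separate metadata table, avoids putting headers where a buffer overflow can reach them, but costs an extra cache miss per operation: the trade-off from the previous paragraph again.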
Trade-offs are everywhere. Speed versus memory versus complexity—every design decision involves sacrifice. The naive linked-list allocator is simple but slow. The segregated free-list allocator is fast but wastes memory on internal fragmentation. The optimal choice depends on what you are optimizing for.
Debugging is difficult. When the heap is corrupted, printf does not work reliably—because printf itself allocates memory. The tools we rely on for debugging assume a functioning allocator. When the allocator is broken, we must find alternatives.
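One common workaround is to bypass stdio entirely with write(2), which goes straight to the kernel and performs no allocation; `debug_puts` is an illustrative helper:

```c
#include <string.h>
#include <unistd.h>

/* printf may malloc internally (stdio buffers), so it is unusable while
   debugging a corrupted heap. write(2) allocates nothing. */
static ssize_t debug_puts(const char *msg) {
    return write(2, msg, strlen(msg));  /* fd 2 = stderr, unbuffered */
}
```

Sprinkling calls like this through the allocator, and re-running until the last message before the crash is found, is a crude but reliable form of binary search over a corrupted heap.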
The Broader Perspective
Understanding memory allocation changes how one writes code at every level of abstraction.
You think twice before allocating in hot paths. You understand why languages implement garbage collectors—and what the garbage collector costs. You appreciate the complexity hidden behind simple APIs.
The philosopher Michael Polanyi distinguished between "focal awareness" and "subsidiary awareness"—the difference between what we attend to directly and what we rely on without attention. [6] The experienced systems programmer has subsidiary awareness of memory. They do not think about it constantly, but their thinking is shaped by understanding it.
This is perhaps the deepest lesson. Not any specific technique, but a mode of relating to computational resources. An awareness that abstractions hide complexity, that complexity has costs, and that understanding the hidden complexity makes you a better user of the abstraction.
For those seeking genuine understanding of systems programming, building something that manages memory is invaluable. The lessons cannot be transmitted through reading alone. They must be discovered through the struggle of implementation—through the hours spent debugging corruption, optimizing performance, making trade-offs. The knowledge becomes embodied, woven into how you think about computing itself.
Footnotes

1. Knuth, Donald E. The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley, 1968, p. 435.
2. Wilson, Paul R., Mark S. Johnstone, Michael Neely, and David Boles. "Dynamic Storage Allocation: A Survey and Critical Review." International Workshop on Memory Management, 1995, p. 1.
3. Dijkstra, Edsger W. "Notes on Structured Programming." Structured Programming. Academic Press, 1972, p. 6.
4. Miller, Matt. "Exploit Mitigation Improvements in Windows 8." Black Hat USA, 2012.
5. Sowell, Thomas. The Vision of the Anointed: Self-Congratulation as a Basis for Social Policy. Basic Books, 1995, p. 112.
6. Polanyi, Michael. The Tacit Dimension. University of Chicago Press, 1966, p. 10.