Memory Management in Kdb
Buddy Memory System - reference counting
Kdb uses a variant of the buddy memory system, with reference counting to track live objects.
- Objects are allocated memory in blocks of powers of 2
- Memory for objects < 32MB comes from an internal heap that only ever grows. When the object is no longer referenced, its memory is given back to the heap and can be reused for further allocations < 32MB.
- Memory for objects > 32MB will be given back to the OS when the object is no longer referenced.
- Symbols are stored in an interned pool.
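As a rough illustration of the power-of-2 rule above, the sketch below estimates the block size a vector of n longs would occupy. The helper name blk, the 16-byte object header and the 8-byte item size are assumptions made for illustration; they are not stated in this article and may differ between kdb versions.

```q
/ hypothetical helper: smallest power-of-2 block that holds n 8-byte items
/ ASSUMPTION: 16-byte object header, 8 bytes per item
blk:{[n] need:16+8*n; b:16; while[b<need; b*:2]; b}

blk 1000      / 8016 bytes needed, rounds up to 8192
blk 2048      / 16400 bytes needed, rounds up to 32768
```

Comparing the used figure from .Q.w[] before and after creating such a vector is one way to check the estimate on a given build.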
Version Differences
Version | Behaviour |
---|---|
2.4 | Memory never returned |
2.5/2.6 | Unreferenced memory blocks over 32MB/64MB are returned immediately |
2.7 | Unreferenced memory blocks returned when memory full or .Q.gc[] called |
Finding the memory used
.Q.w[] returns the following memory usage statistics.
All values are in bytes, except syms, which is a count.
- used - subset of heap in actual use.
- heap - physical memory allocated to this process.
- peak - largest heap size this q process has reached so far.
- wmax - the memory limit as set using the -w command line argument.
- mmap - memory used for memory mapping files on disk.
- mphy - physical memory available on the machine.
- syms - Number of distinct syms in this q process.
- symw - memory footprint of interned string pool.
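A minimal sketch of reading these statistics from a q session. The variable name w is arbitrary, and the actual values depend entirely on the process being inspected.

```q
w:.Q.w[]             / dictionary keyed by used, heap, peak, wmax, mmap, mphy, syms, symw
w`used`heap          / bytes in use, and bytes allocated to the process
w[`heap]-w`used      / bytes held on the heap but currently free for reuse
w`syms`symw          / number of interned symbols and the size of the pool
```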
In older versions of q, .Q.w[] is not available. The older, less user-friendly way of obtaining the above statistics is the \w system command:
- \w - used heap peak wmax mmap
- \w 0 - syms symw
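The two calls can be combined into a dictionary that resembles the output of .Q.w[]. This is a sketch under the assumption that \w returns its values in the order listed above; only the first five are labelled, since some versions may return additional values.

```q
/ label the raw \w output using the field order given above
d:`used`heap`peak`wmax`mmap!5#system"w"
/ add the symbol statistics from \w 0
d,:`syms`symw!system"w 0"
d
```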
Garbage Collection
.Q.gc[]
(since 2.7) invokes the garbage collector and returns the number of bytes that were returned to the OS.
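The effect can be observed by comparing the heap figure before and after a collection. This is a sketch that assumes the process runs in deferred mode; the vector size is arbitrary, and the figures reported will vary by version and platform.

```q
a:til 10000000       / large vector, so the heap has to grow to hold it
.Q.w[]`used`heap     / both figures now include the vector
a:0                  / drop the only reference to the vector
.Q.w[]`used`heap     / used falls; in deferred mode heap may stay high
freed:.Q.gc[]        / bytes handed back to the OS
.Q.w[]`used`heap     / heap after collection
```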
Command line -g parameter
Switch garbage collection between immediate (1) and deferred (0) modes.
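A brief sketch of inspecting and changing the mode from inside a running session. It assumes the \g system command is available in the version in use, which this article does not state.

```q
/ from the shell the mode is chosen at startup:  q -g 1  (immediate)  or  q -g 0  (deferred)
mode:system"g"     / current mode: 0 deferred, 1 immediate
system"g 1"        / switch to immediate mode
system"g 0"        / switch back to deferred mode
```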
Reference counting in detail
The C API details reference counting as encountered when extending kdb using C.
From within kdb we can use -16!, which returns the number of references to an object.
Vectors are copied by reference when possible, but editing just one value causes another entire vector to be allocated. Note that columns in a table are just vectors, so the same applies to them.
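The behaviour described above can be sketched at the q prompt as follows. The variable names are arbitrary, and the counts in the comments are what a fresh session is expected to report.

```q
a:til 10
-16!a          / 1i : only a refers to the vector
b:a            / no copy is made, b shares a's vector
-16!a          / 2i
b[0]:100       / amending b allocates a separate vector for b
-16!a          / 1i : a is no longer shared
a 0            / 0  : a itself is unchanged

/ table columns are vectors too, and can share data in the same way
v:til 5
t:([] x:v; y:v)
-16!v          / expected to be above 1 while the columns still share v
```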
Memory Mapped files
There are two modes of memory mapping - immediate and deferred:
- deferred mode - a column is memory mapped on demand, as needed, for the duration of the query.
- immediate mode - the column's memory map is maintained whether or not the memory is actually used; the resulting physical memory use is down to OS details.
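One way to see where mapped memory shows up is to watch the mmap figure around a query on an on-disk table. This is only a sketch: the path /tmp/mmdemo is a hypothetical scratch directory, and how much is mapped, and when, depends on the mapping mode and the kdb version, so no particular figures are claimed.

```q
/ ASSUMPTION: /tmp/mmdemo is a scratch directory used only for this sketch
`:/tmp/mmdemo/t/ set ([] a:til 1000000; b:1000000?100f)   / write a splayed table
t:get `:/tmp/mmdemo/t/     / load it back from disk
.Q.w[]`mmap                / bytes currently memory mapped
select max b from t        / a query that touches column b
.Q.w[]`mmap                / compare with the figure before the query
```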
Compression and Memory
Compressed columns are held uncompressed in memory for the duration of a query, which can significantly increase memory requirements.