List Theory: A narrative on the design of lists as data structures
March 10th, 2018
Manifest Statement: List data structures are a fundamental construct of computation, and they deserve a flexible, expressive operator interface that privileges composability of structure as well as composability of operations.
The motivation for this narrative comes from C++, where lists are a common intersection of arrays, singly-linked lists, doubly-linked lists, vectors, strings, etc. I restrict the intent of the design to an abstraction of such lists, but deviate from the Standard Template Library (STL) specification by following my own theory of interface design, leading to a surprisingly innovative alternative as a means of building (or rebuilding) these classes as natural extensions of this narrative.
Infrastructure
Our model starts with a list of arbitrary but fixed length, and of arbitrary content. I will leave things at an intuitive level, but if one requires higher formality, an infinite list of random content should suffice.
Substructure
To initiate our interface, we look for self-similar substructures, the natural candidates being sublists. They will be the focus of our interface. Keep in mind there are other possible (non-linear) substructures, but as we value computationally effective interfaces, we restrict ourselves to ones of the simplest recursive variety.
Algebra
Our substructures can be viewed in two ways: as structures in their own right, each with its own interface, or as objects within an interface to the original. As we are now looking for the existence of appropriate algebras, we will treat these sublists as objects within an interface.
I take for granted that the natural monoidal operator for lists has already been well researched. I thus declare concatenation, with the empty list as the identity element, to be our monoid and our primary algebra.
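For reference, the monoid laws in question, writing ++ for concatenation and [] for the empty list, are the usual ones:

    (x ++ y) ++ z = x ++ (y ++ z)    (associativity)
    [] ++ x = x = x ++ []            (identity)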
Interface Layer
With a collection of substructures and a monoidal operator forming an algebra, we now look to turn this space into an interface layer by finding it a compressed representation. Furthermore, as we support structural composability in our design, our compression must also account for this.
I assert the natural compression of a list is a selector: a pair of positional endpoints. For comparison, in the context of C++ these would be lifted as boundary nodes (or pointers) of a list. Selectors hold value because their positional endpoints have the following properties (a brief C++ sketch follows the list below):
- They compress longer lists, which would otherwise require three or more positional points to be represented in full.
- They act as boundaries for which structural and operational composition are naturally supported.
- They can be used to locally decompress and retrieve required information to algorithmically navigate and operate within this interface layer.
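As a rough illustration only (the names node and selector below are hypothetical, and a doubly-linked node representation is assumed), a selector need be nothing more than a pair of node pointers:

```cpp
// Sketch only: hypothetical names, assuming a doubly-linked node representation.
template <typename T>
struct node
{
    T value;
    node* prev;
    node* next;
};

// A selector compresses a sublist down to its two positional endpoints.
template <typename T>
struct selector
{
    node<T>* initial;   // the initial positional endpoint
    node<T>* terminal;  // the terminal positional endpoint
};
```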
With our selectors as compressed representation, we now need to show our underlying algebra still holds. Moreover, given that composability is a supported concern, with foresight we will divide our possible selectors into the following interval ranges:
- [a..b] closed: a selector containing both of its endpoints a, b;
- [a..b) closing: a selector containing its initial endpoint a, but not containing its terminal endpoint b;
- (a..b] opening: a selector not containing its initial endpoint a, but containing its terminal endpoint b;
- (a..b) open: a selector containing neither of its endpoints a, b.
Keep in mind our focus is navigational in nature, which is to say we privilege structure, not content. As such the endpoints in the above ranges are structural locations, not to be confused with list values.
From a narrative perspective, we limit ourselves to a single selector interface layer, but in implementation it is practical to split this into three layers: selectors which represent whole lists, selectors which represent arbitrary sublists, and the special case where the selector's endpoints are the same; such selectors may be optimized as the better-known iterators which already exist in C++.
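If one wishes to carry the interval type within the C++ type system, one sketch (hypothetical names again, refining the earlier selector sketch) is to make it a non-type template parameter, so that each of the four ranges gets its own specialized operator implementations:

```cpp
// Sketch only: hypothetical names.
enum struct interval
{
    closed,   // [a..b]
    closing,  // [a..b)
    opening,  // (a..b]
    open      // (a..b)
};

// A selector whose interval type is part of its type.
template <typename T, interval ival>
struct selector
{
    node<T>* initial;
    node<T>* terminal;
};
```

The familiar C++ iterator range corresponds to the closing case [a..b), while a lone iterator corresponds to the degenerate case in which both endpoints coincide.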
Ecology
Our ecology consists of operators which allow us to navigate toward all possible selectors. Structural differences can be summarized as differences in length, given that our selectors are intended to be the compressed forms of lists. This realization informs the following interpretation of the lifecycle stages of a selector as structure (signature sketches for a few of these operators follow the list below):
- Construction:
- initialize: This operator returns an empty selector with its endpoints set to the zero positional.
- Extension:
- catenate: This operator takes two selectors, links the terminal endpoint of the first to the initial endpoint of the second, and returns the single selector remaining.
- preserve: This operator takes a selector, and shifts its initial endpoint outward.
- reserve: This operator takes a selector, and shifts its terminal endpoint outward.
- Mutation:
- decrement: This operator takes a selector, and shifts its initial endpoint outward, while shifting its terminal endpoint inward.
- increment: This operator takes a selector, and shifts its initial endpoint inward, while shifting its terminal endpoint outward.
- Reduction:
- split: This operator takes a selector and a midpoint, and returns two new selectors: the first terminates at the specified midpoint, and the second begins there.
- pretract: This operator takes a selector, and shifts its initial endpoint inward.
- retract: This operator takes a selector, and shifts its terminal endpoint inward.
- Destruction:
- terminalize: This operator takes an empty selector and sets its endpoints to the zero positional.
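To make the shape of these operators concrete, here is a sketch of a few of them over the earlier selector (interval type omitted for brevity, names hypothetical, and a doubly-linked representation assumed):

```cpp
// Sketch only: a few structural operators over the earlier selector sketch.
// "Outward" moves an endpoint away from the selector's interior,
// "inward" moves it toward the interior.

template <typename T>
selector<T> catenate(selector<T> s, selector<T> t)
{
    // link the terminal endpoint of the first to the initial endpoint of the second
    s.terminal->next = t.initial;
    t.initial->prev  = s.terminal;

    return selector<T>{s.initial, t.terminal};
}

template <typename T>
selector<T> increment(selector<T> s)
{
    // shift the selector forward: initial endpoint inward, terminal endpoint outward
    return selector<T>{s.initial->next, s.terminal->next};
}

template <typename T>
selector<T> retract(selector<T> s)
{
    // shift the terminal endpoint inward
    return selector<T>{s.initial, s.terminal->prev};
}
```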
I have only outlined the most basic operators. There will be distinct implementations regarding selector parameters such as interval type (closed, closing, opening, open) as well as interval direction (forward, backward). Furthermore, each basic operator may be readily extended to more powerful variations which allow for arbitrary shift distances.
Content Mutation
Structure and content are orthogonal, and we have exploited this property so far to focus entirely on the structural operators of a list. In implementation it is worth considering an algebra of content-oriented operators as well:
- repeat: This operator takes a selector and a content value, and assigns the content value to the range of selector values.
- morph: This operator takes a selector, a function, and a numerical range, and assigns to the selector values the image values of the function applied to the numerical range.
- map: This operator takes a selector, a function, and a list of the same length, and assigns to the selector values the image values of the function applied to the list.
In practice, it is worth optimizing the special case of map when the function is the identity. In this case, map simplifies to a "copy" operator, named here as assign. This operator may be further optimized in the special case when the selector is actually an iterator: In such a case, assign simplifies to a single memory assignment also known as set.
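A sketch of these content operators over the same hypothetical selector (treated here as closed, and with map taking its input list as a second selector of equal length) might read:

```cpp
// Sketch only: content operators over the earlier selector sketch (closed interval).

template <typename T>
void repeat(selector<T> s, const T& value)
{
    // assign a single content value across the selector's range
    for (node<T>* n = s.initial; ; n = n->next)
    {
        n->value = value;
        if (n == s.terminal) break;
    }
}

template <typename T, typename F>
void map(selector<T> out, F f, selector<T> in)
{
    // assign the image values of f applied to a list of the same length
    node<T>* src = in.initial;
    for (node<T>* n = out.initial; ; n = n->next, src = src->next)
    {
        n->value = f(src->value);
        if (n == out.terminal) break;
    }
}

template <typename T>
void assign(selector<T> out, selector<T> in)
{
    // the identity special case of map: a plain copy
    map(out, [](const T& v) { return v; }, in);
}
```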
Batching
In implementation, content mutation is often interleaved with structural navigation. This is notably the case when optimizing to reduce traversal passes, which translates to refactoring several content mutations onto the same navigational path within a structure, a practice known as batching. In any case, it is necessary to classify all operators as either structural or content oriented. In the context of C++, a subtlety arises as dynamic memory requires the use of allocators and deallocators, but how do we categorize these operators?
I submit that memory-oriented operators should be interpreted as an additional form of content mutation. This can be seen by viewing hardware memory as a long array of addresses: hardware memory is then a data structure in which each address holds application content as well as memory-management content. The noteworthy pattern here is that both can be considered location content.
As such, our design will continue privileging navigational operators over mutative operators, and in the case of batching where we interleave these operations in practice, priority still goes to the act of iterating over the navigational path, and only then, while at a given location, do we batch mutate (including mutations of memory access). In practice, we thus add batch parameter support to navigational operators such as repeat, morph, map, as well as immutate, allocate, deallocate.
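As a final sketch (hypothetical names again), batching can be read as a single traversal of the navigational path carrying a pack of mutations, content and memory oriented alike, applied at each location, rather than one traversal per mutation:

```cpp
// Sketch only: one pass over the navigational path, applying a batch of
// mutations at each location (requires C++17 fold expressions).

template <typename T, typename... Fs>
void batch(selector<T> s, Fs... mutations)
{
    for (node<T>* n = s.initial; ; n = n->next)
    {
        (mutations(n), ...);    // apply every batched mutation at this location
        if (n == s.terminal) break;
    }
}

// usage: one traversal instead of two
// batch(s,
//       [](node<int>* n) { n->value += 1; },    // one content mutation
//       [](node<int>* n) { n->value *= 2; });   // another content mutation
```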