9 Recipe for Self-Referential Data (Recursive Types)
Some information has a recursive (self-referential) structure. For example, a natural number is either zero or the successor of a predecessor, which is also a natural number. An ancestor is a parent or the ancestor of a parent. A descendant is a child or the descendant of a child.
Such information can be represented by an instance of a self-referential data type: The type to be defined is mentioned in its definition. Recursive types are a special case of variant types: At least one variant is self-referential and at least one variant is not self-referential. As we shall see, self-referential data can represent information of arbitrary size.
Examples include lists and trees. A list is either empty (variant 1, not self-referential) or a value followed by a shorter list (variant 2, self-referential). A binary tree is either empty (variant 1, not self-referential) or a node with a left binary tree, a value, and a right binary tree (variant 2, two self-references).
PostFix uses arrays to represent self-referential data. The datadef operator allows to describe recursive data types. The operator also generates the necessary constructors, detectors (variant tests), and accessors.
list-sum.pf serves as the example for self-referential data.
9.1 Steps
9.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
#<
Computes the sum of the values of a list of integer numbers.
>#
9.1.2 Data definition
Write how the domain information should be represented in the program, or, vice versa, write how to interpret the data as real-world information.
Determine and name the variants. Determine the self-references. At least one variant must not be self-referential (base case) and at least one variant has to be self-referential (recursive case). For a list of numbers there are two variants. The variant that is not self-referential, which we call Null, is the empty list. (The name empty is already taken in PostFix.) The self-referential variant, which we call Cons, is an integer number followed by a list. Cons stands for construct, as it constructs a longer list by prepending a number to a shorter list. Determine the data types that occur in the different variants.
Null: no additional data in this variant, base case
Cons: two data items in this variant: value is an integer number, rest is a self-reference to the List type
Write the data definition:
List: {
Null: () # empty list
Cons: (value :Int, rest :List) # value, followed by list
} datadef
Note that the Cons variant could also be defined as Cons: (front :List, value :Int), in which case the value would be appended at the end of the existing list and not prepended in front of the existing list as in the data definition above.
9.1.3 Example Values for Data Definition
Create at least one example value per variant in the data definition. Create examples that use the self-referential variant(s) more than once, i.e., create examples of different lengths. These value examples are later used in the functional examples and expected results.
null # variant 1
10 null cons # variant 2 and then variant 1
10 20 null cons cons # variants 2, then variant 2, then variant 1
10 20 30 null cons cons cons # variants 2, 2, 2, and finally 1
9.1.4 Function name
list-sum:
9.1.5 Parameter list
list-sum: (a :List -> :Int)
9.1.6 Function stub
list-sum: (a :List -> :Int) {
0
} fun
9.1.7 Purpose statement
Briefly describes what the function does. Do not try to explain how. The purpose statement should ideally be a single sentence. However, multiple sentences may be necessary.
# Computes the sum of the values of the list.
9.1.8 Examples and expected results
Write down several examples with expected results in the test function. Write at least one example per variant in the data definition. Use the example values created above. Check that the code is parsed correctly. (Some tests will fail for the stub.)
list-sum-test: {
null list-sum 0 test=
10 null cons list-sum 10 test=
10 20 null cons cons list-sum 30 test=
10 20 30 null cons cons cons list-sum 60 test=
test-stats
} fun
list-sum-test
9.1.9 Template
Translate the data definition into a function template. Use the cond or cond-fun operators with suitable condition-action pairs. Write one condition per variant using the detector function for the respective variant. In the corresponding actions add the accessor functions. If the variant is self-referential, add one recursive call per self-reference.
The translation of the data definition of the list of integer numbers into a function template is:
list-sum: (a :List -> :Int) {
{ a null? } { ... }
{ a cons? } {
... a cons-value ...
... a cons-rest list-sum ...
}
} cond-fun
9.1.10 Function body
Implement the function body. Put required helper functions on a "wish list." These will be implemented later.
Combine the expressions in the template to obtain the expected results according to the test examples. Start with filling the non-recursive cases. For each such case there should be an example with an expected result. If not, add such examples to the test function. For the self-referential cases things are a bit more difficult. Use the recursive call as if the function had already been implemented and did what the purpose statement says. A meaningful purpose statement is thus especially important for recursive functions. The assumption that the recursive call works correctly (on the rest of the list without the first value) is the induction hypothesis. This assumption has also been termed the "leap of faith". Using the call in the implementation (and combining it with the first value) is the induction step. For the non-recursive base case we know that the function returns the right result. Do not try to mentally descend into the recursion as this typically does not work out.
# Computes the sum of the values of the list.
list-sum: (a :List -> :Int) {
{ a null? } { 0 }
{ a cons? } {
a cons-value
a cons-rest list-sum +
}
} cond-fun
The function uses the result of the recursive call (which is correct according to the induction hypothesis) and combines the result by adding it to the first value of the list (induction step).
9.1.11 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.
list-sum.pf, line 16: Check passed.
list-sum.pf, line 17: Check passed.
list-sum.pf, line 18: Check passed.
list-sum.pf, line 19: Check passed.
All 4 tests passed!
9.1.12 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests. The purpose statement should describe what the function computes (not how it does the computation) and should mention the given inputs and produced result. The test examples should cover corner cases and a typical case.
9.2 Exercises
Write a data definition for a list of female ancestors. Each value is the name, year of birth, and year of death. Write a function that computes the average age of a list of ancestors.
Write a data definition for a binary tree of parents. Each value is the name of the father and the name of the mother. Write a function that computes the most frequent name.