Design Recipes in C

The aim of the design recipes presented below is to help you to get from a textual problem statement to a well-organized and reliable solution. The design recipes are strongly inspired by How to Design Programs, Second Edition. The recipes presented below are available for different forms of data – from simple atomic data, like integer numbers, to arbitrary-length lists. The examples may seem simple and you may be tempted to solve them without following each of the steps presented below. We still urge you to follow these steps to learn and automate them and to be prepared to solve more complicated programming problems later on. Each step produces a well-defined intermediate product and makes it less likely that you stare at a blank screen, not knowing how to start the solution. The main steps in the design recipes are:

Problem statement
Data definition
Purpose statement and function header
Examples
Implementation
Test and revision

In the problem statement step the programmer analyzes the problem, extracts its essence and represents it as a textual description. The problem statement may already be given, but even then you need to analyze it to extract useful information.

The data definition step is about choosing the data types that your program uses to represent the real-world information from the problem statement. Example data types are integer numbers, text, and Boolean values (true, false). The data definition expresses how you wish to represent real-world information as data in the program. An example is real-world temperatures measured in degrees Celsius are represented as numbers in the program. Conversely, the data definition may state how to interpret data items found in the program. For example, the interpretation of the number is degrees Celsius. The following figure illustrates the mediating role of the data definition. (Note that these notions of data and information differ from others, e.g., in information theory.)

The purpose statement describes the effect of the program on the input data. We here focus on the program as an algorithm or function that takes some kind of input data and produces some kind of result data. An example purpose statement would be: “Takes a temperature value in degrees Celsius and returns the corresponding value in degrees Fahrenheit.” A closely related step is to come up with a descriptive name for the algorithm, as well as the names and types of the parameters (input) and the type of the result (output). In the C programming language this requires writing a function header in a specific way. An example header for a function that converts degrees Celsius to degrees Fahrenheit would be:

int celsius_to_fahrenheit(int celsius);

The function name (celsius_to_fahrenheit) is used to call this function, i.e., to apply it to a specific value. The parameter named celsius has type integer number. The result type is an integer number as well. This function maps an integer number to another integer number:

int -> int

The next step is to come up with examples of what you expect the algorithm (here: function) to do on a couple of inputs of your choice. For example, you may know that if you call celsius_to_fahrenheit with 0 °C, you expect a result of 32 °F (the conversion formula is °F = °C * 9 / 5 + 32):

Given: 0, expect: 32
Given: 10, expect: 50
Given -5, expect: 23
Given: 100, expect: 212

Coming up with these examples is trivial in this case, as the formula is known and can directly be written as one line in C, but in general it very much helps to mentally prepare the subsequent implementation. The examples will also serve as test cases later on. The examples should be chosen in such a way that they cover typical cases and boundary cases. More on this later.

With the function header and several examples the implementation step is greatly simplified. The implementation defines how to compute the result from the parameters. Depending on the complexity of the data type it might be helpful to first write down a template of the implementation. For simple data types and problems, the implementation can often be directly written down. For the above example the formula can directly be translated to a return statement in C.

int celsius_to_fahrenheit(int celsius) {
return celsius * 9 / 5 + 32;
}

The test and revision step involves checking whether the implementation produces the expected results. It also involves revising the function to correct or improve it. One (naive) option is to call the function on the examples, let the program print the function results on the console, and then manually check the output:

printiln(celsius_to_fahrenheit(0));   // output:  32
printiln(celsius_to_fahrenheit(10));  // output:  50
printiln(celsius_to_fahrenheit(-5));  // output:  23
printiln(celsius_to_fahrenheit(100)); // output: 212

The function printiln takes an integer number as its parameter, prints it to the console, and then prints a line break. The functions are applied inside-out, i.e., first celsius_to_fahrenheit is applied to 0, the result 32 is returned, and then printiln is applied to 32.

The manual solution is problematic, because the programmer has to remember to check – after each change of the function – whether the output still corresponds to the expected values. For the above implementation this is indeed the case as the console output of the program is:

32
50
23
212

Instead of manually checking whether the actual results match the expected results, a better option is to automatically check the examples. To this end, a check function is called with the actual result and the expected result. For integer values the test function is

check_expect_i(actual, expected);

where actual and expected are integer numbers (type int). The actual parameter is replaced by a function call on the value to test. For example, if function celsius_to_fahrenheit is given 0, we expect 32 as the result. The check becomes:

check_expect_i(celsius_to_fahrenheit(0), 32);

If actual is equal to expected, then the console output is:

check passed

If actual is not equal to expected (and the function erroneously produces 99 as its result), then the output is:

actual value 99 differs from expected value 32

The last and final sub-step is to revise the function. If the function fails a test case, the need for a revision is obvious. But even if the function is correct, there might be an opportunity to implement it more clearly or to give it a more descriptive name.

The test cases should always remain in the implementation file and not be deleted after the implementation. That way, later changes can still be checked against the test cases. In fact, the test cases should go in their own function, named by appending _test to the function name, e.g., celsius_to_fahrenheit_test:

static void celsius_to_fahrenheit_test(void) {
    // given 0, expect 32
    check_expect_i(celsius_to_fahrenheit(0), 32);
    // given 10, expect 50
    check_expect_i(celsius_to_fahrenheit(10), 50);
    // given -5, expect 23
    check_expect_i(celsius_to_fahrenheit(-5), 23);
    // given 100, expect 212
    check_expect_i(celsius_to_fahrenheit(100), 212);
}

The static keyword means that this test functions can only be called from within the same C file. The keyword void as parameter and return type means that the test function does not take any parameters and does not return any results. Its only purpose is to output, for each test case, whether it passed or failed. The // is a comment to human readers of the source code and does not have any effect in the program.

The functions printiln and check_expect_i are part of the Programming I C Library.

1.2 Programming I C Library

The Programming I C Library defines basic data types and functions to simplify C programming for beginners and to enable the step-by-step design approach outlined above. You can download the library with examples from this script and from the lecture.

1.3 Structure

The rest of this script is organized along increasingly complex kinds of data. In each week of the lecture we will cover one or two of these design recipes and apply them to the respective kinds of data. Elements of the C programming language will be introduced as necessary. This approach avoids the blocked presentation of all of the C language features in one place. It might sometimes be necessary to use certain language elements before explaining them in every detail. However, by the end of the lecture you should have a reasonably complete overview of the C language and – much more importantly – of a systematic approach to programming.

Learning to program – similar to learning a sport or a musical instrument – requires time and practice. Don’t let initial problems frustrate you. Learning to program is an excellent way to sharpen your analytical skills. Analyzing a problem, exploring solution possibilities, and evaluating a solution are skills that are necessary in many domains, not only in computer programming.

The general recipe presented above is adapted to different kinds of data. This script covers these specific recipes:

Recipe for Atomic Data
Recipe for Enumerations
Recipe for Intervals
Recipe for Itemizations
Recipe for Compound Data (Product Types)
Recipe for Variant Data (Sum Types)
Doing Things Repeatedly: Recipes for loops and recursion
Arrays: Fixed-Size Collections
Lists: Variable-Size Collections

2 Recipe for Atomic Data

Atomic data is data that cannot be broken down further. Examples are integer (int) and floating point numbers (double), Boolean values (bool), and strings (String). It could be argued that text strings are not atomic as they can be further subdivided into characters (char). Yet we treat strings as atomic here.

wages.c serves as the example for atomic data.

2.1 Steps

2.1.1 Problem statement

Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?

/*
Design a function that computes weekly wages with overtime from the number of hours worked. The hourly rate is 10 €/hour. Regular working time is 40 hours/week. Overtime is paid 150% of the normal rate of pay.
*/

2.1.2 Data definition

Write how the real-world information (also called domain information) should be represented in the program. Conversely, write how to interpret the data (e.g., a number) as real-world information (e.g., degrees Celsius). Write type definitions (typedef) if this clarifies things. Alternatively, you may decide to use existing types directly (e.g., int, double, bool, or String) and describe the interpretation in a comment.

Informal data definition (given as a comment):

// int represents degrees Celsius

The informal data definition is ignored by the compiler. In the program you have to remember that the value of type int actually represents degrees Celsius. Alternatively, you may introduce a synonym (e.g., Celsius) for an existing datatype (like int) and use that synonym instead of the existing data type. Choose the new type name so as to clarify the interpretation of a value in the program. In C new names for types are defined using the typedef keyword. Examples of formal data definitions of this kind are:

typedef int Celsius; // represents the temperature in degrees Celsius
typedef double MyType; // interpretation...
typedef String MyType; // interpretation...
typedef char MyType; // interpretation...

For the wages example, the formal data definition is:

typedef int Hours; // represents hours worked
typedef int Cents; // represents wage in cents

A downside of using type definitions like this is that they hide the actual data type that is used to represent the value. For example, Celsius is here defined in terms of integral numbers. It would also be reasonable to represent fractions of degrees (typedef double Celsius). The synonym Celsius does not tell what the underlying data type is.

2.1.3 Function signature

Write down the function signature (as a comment). The parameter types go left of the arrow (comma separated). The result type goes right of the arrow.

// Hours -> Cents

Equivalently, if using the actual data type:

// int -> int

2.1.4 Function name

Conceive a preliminary function name. This should ideally be a short non-abbreviated name. You may revise the name and find a better name in the last step.

hours_to_wages

2.1.5 Function header

Add the function signature to the name. Come up with preliminary parameter names. These should ideally be short non-abbreviated names. This completes the header of the function.

Cents hours_to_wages(Hours hours);

Equivalently, if using the actual data type:

int hours_to_wages(int hours);

2.1.6 Function stub

Write down the function stub, returning an arbitrary value from the range of the function. Check that the code compiles.

Cents hours_to_wages(Hours hours) {
return 0;
}

Equivalently, if using the actual data type:

int hours_to_wages(int hours) {
return 0;
}

2.1.7 Purpose statement

Write down a purpose statement (given as a comment).

// Computes the wage in cents given the number of hours worked.

2.1.8 Examples and expected results

Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)

Examples:

For 0 hours worked, expect 0 cents.
For 20 hours worked, expect 20 * 1000 cents.
For 39 hours worked, expect 39 * 1000 cents.
For 40 hours worked, expect 40 * 1000 cents.
For 41 hours worked, expect 40 * 1000 + 1 * 1500 cents.
For 45 hours worked, expect 40 * 1000 + 5 * 1500 cents.

Corresponding test cases in test function:

void hours_to_wages_test() {
    check_expect_i(hours_to_wages(0), 0);
    check_expect_i(hours_to_wages(20), 20 * 1000);
    check_expect_i(hours_to_wages(39), 39 * 1000);
    check_expect_i(hours_to_wages(40), 40 * 1000);
    check_expect_i(hours_to_wages(41), 40 * 1000 + 1 * 1500);
    check_expect_i(hours_to_wages(45), 40 * 1000 + 5 * 1500);
}

The check_expect_i function takes two arguments: The actual result and the expected result. Each check will report the line number on which it appears. This helps to locate failed tests in the source code. The first argument contains the call to the function that is to be tested. This evaluates to the return value of the call.

The test function is invoked from the main function. The main function is the entry point of a C program.

int main(void) {
hours_to_wages_test();
return 0; // return success value to operating system
}

Running the above tests on the stub produces:

wages.c, line 20: check passed
wages.c, line 21: Actual value 0 differs from expected value 20000.
wages.c, line 22: Actual value 0 differs from expected value 39000.
wages.c, line 23: Actual value 0 differs from expected value 40000.
wages.c, line 24: Actual value 0 differs from expected value 41500.
wages.c, line 25: Actual value 0 differs from expected value 47500.
5 of 6 tests failed.

2.1.9 Function body

Implement the function body. Put required helper functions on a wish list. These will be implemented later.

How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.

The implementation of the example function does not need helper functions. However, it needs an if-statement to decide whether the working time includes overtime.

// Computes the wage in cents given the number of hours worked.
Cents hours_to_wages(Hours hours) {
    if (hours <= 40) {
        return hours * 1000;
    } else {
        return 40 * 1000 + (hours - 40) * 1500;
    }
}

Equivalently, if using the actual data type:

// Computes the wage in cents given the number of hours worked.
int hours_to_wages(int hours) {
    if (hours <= 40) {
        return hours * 1000;
    } else {
        return 40 * 1000 + (hours - 40) * 1500;
    }
}

2.1.10 Testing

Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.

Running the tests on the implemented function produces:

wages.c, line 20: check passed
wages.c, line 21: check passed
wages.c, line 22: check passed
wages.c, line 23: check passed
wages.c, line 24: check passed
wages.c, line 25: check passed
All 6 tests passed!

2.1.11 Review and revise

Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests. The purpose statement should describe what the function computes (not how it does the computation) and should mention the given inputs and produced result. The test examples should cover corner cases and a typical case.

The complete source file wages.c now looks like this:

/*
Compile: make wages
Run: ./wages
*/

#include "base.h" // we use check_expect_i

/*
Design a function that computes weekly wages with overtime from the number of hours worked. The hourly rate is 10 €/hour. Regular working time is 40 hours/week. Overtime is paid 150% of the normal rate of pay.
*/

typedef int Hours; // represents hours worked
typedef int Cents; // represents wage in cents

// Hours -> Cents
Cents hours_to_wages(Hours hours); // function header

void hours_to_wages_test() {
    check_expect_i(hours_to_wages(0), 0);          // line 20
    check_expect_i(hours_to_wages(20), 20 * 1000); // line 21
    check_expect_i(hours_to_wages(39), 39 * 1000); // line 22
    check_expect_i(hours_to_wages(40), 40 * 1000); // line 23
    check_expect_i(hours_to_wages(41), 40 * 1000 + 1 * 1500);
    check_expect_i(hours_to_wages(45), 40 * 1000 + 5 * 1500);
}

// Computes the wage in cents given the number of hours worked.
Cents hours_to_wages(Hours hours) { // function implementation
    if (hours <= 40) {
        return hours * 1000;
    } else {
        return 40 * 1000 + (hours - 40) * 1500;
    }
}

int main(void) {
    hours_to_wages_test();
    return 0;
}

If you don’t like typedefs (to create synonyms of type names), you can omit them and use the underlying data type directly:

/*
Compile: make wages
Run: ./wages
*/

#include "base.h" // we use check_expect_i

/*
Design a function that computes weekly wages with overtime from the number of hours worked. The hourly rate is 10 €/hour. Regular working time is 40 hours/week. Overtime is paid 150% of the normal rate of pay.
*/

// int -> int
int hours_to_wages(int hours); // function header

void hours_to_wages_test() {
    check_expect_i(hours_to_wages(0), 0);          // line 20
    check_expect_i(hours_to_wages(20), 20 * 1000); // line 21
    check_expect_i(hours_to_wages(39), 39 * 1000); // line 22
    check_expect_i(hours_to_wages(40), 40 * 1000); // line 23
    check_expect_i(hours_to_wages(41), 40 * 1000 + 1 * 1500);
    check_expect_i(hours_to_wages(45), 40 * 1000 + 5 * 1500);
}

// Computes the wage in cents given the number of hours worked.
int hours_to_wages(int hours) { // function implementation
    if (hours <= 40) {
        return hours * 1000;
    } else {
        return 40 * 1000 + (hours - 40) * 1500;
    }
}

int main(void) {
    hours_to_wages_test();
    return 0;
}

Note that either the function header or the function implementation or both need to appear in the source file before a function can be called. Function headers are typically put in separate header files (like wages.h) and then included. In the above code, the function header of check_expect_i is located in file base.h from the Programming I C library. The #include line can be understood as the replacement of that line by the contents of the given file.

2.2 Generalizing the Function

The example function as written above contains some concrete bits of information, such as the work hours per week, the regular hourly rate, and the hourly rate for overtime. The weekly work hours are repeated multiple times, which contradicts the “Don’t Repeat Yourself” (DRY) principle. Such information should be represented as constants or as parameters to the function. What is represented as a constant and what is represented as a parameter depends on the needs of the application domain.

One possibility is to extract all information as constants:

const int WEEKLY_HOURS = 40; // regular work hours per week
const int HOURLY_RATE_REGULAR = 1000; // in cents
const int HOURLY_RATE_OVERTIME = 1500; // in cents

Cents hours_to_wages(Hours hours) {
    if (hours <= WEEKLY_HOURS) {
        return hours * HOURLY_RATE_REGULAR;
    } else {
        return WEEKLY_HOURS * HOURLY_RATE_REGULAR
               + (hours - WEEKLY_HOURS) * HOURLY_RATE_OVERTIME;
    }
}

Another possibility is to include all information as parameters:

Cents hours_to_wages(
    Hours weekly_hours,
    Cents hourly_rate_regular,
    Cents hourly_rate_overtime,
    Hours hours_worked)
{
    if (hours_worked <= weekly_hours) {
        return hours_worked * hourly_rate_regular;
    } else {
        return weekly_hours * hourly_rate_regular
               + (hours_worked - weekly_hours) *
                   hourly_rate_overtime;
    }
}

This makes the function more flexible, but also more difficult to use, because many arguments have to be provided. Typically only the hourly rate and hours worked change frequently (from worker to worker and for different weeks, respectively).

2.3 Exercises

Design a function has_overtime that computes whether or not the given number of work hours contains overtime.
Modify hours_to_wages such that the program stops if hours is negative. The function call exit(1); stops the program with error code 1.

3 Evaluating Functions Without Side-Effects

A function is like a cloze (a text with _____ gaps). To complete a cloze you fill in the right words. To call a function you fill in parameters of the right type. When the function

int hours_to_wages(int hours) {
    if (hours <= 40) {
        return hours * 10;
    } else {
        return 40 * 10 + (hours - 40) * 15;
    }
}

is called, the parameter hours is replaced by an actual integer number and then the body of the function is evaluated to an integer number. For example, the step by step evaluation of hours_to_wages(45) looks like this:

    hours_to_wages(45)
    Substitute the call with the function body, in which the parameter hours has been replaced with the argument value (45).
=>  if (45 <= 40) {
        return 45 * 10;
    } else {
        return 40 * 10 + (45 - 40) * 15;
    }
    Evaluate the condition (45 <= 40) of the if-statement (to false).
=>  if (false) {
        return 45 * 10;
    } else {
        return 40 * 10 + (45 - 40) * 15;
    }
    Because the condition is false, replace the evaluated if-statement with its else-part.
=>  return 40 * 10 + (45 - 40) * 15;
    Evaluate left multiplication operator.
=>  return 400 + (45 - 40) * 15;
    Evaluate sub-expression in parentheses.
=>  return 400 + 5 * 15;
    Evaluate right multiplication operator.
=>  return 400 + 75;
    Evaluate plus operator.
=>  return 475;
    Return the result value.
=>  475

For this particular function, the evaluation can be understood in analogy to simplifying an algebraic expression. Unfortunately, as we shall see, in C this is not always the case. Some functions have so-called side-effects, i.e., they modify some variable or they produce some output in addition to returning a value.

4 Recipe for Enumerations

An enumeration type can represent one of a fixed number of distinct values. Examples of information of this type include gender (male, female), music genres (pop, rock, classic, jazz, etc.), continents (Europe, Asia, Africa, etc.), and traffic light colors (red, green, yellow). Each individual enumerated value is a distinct value and does not carry any additional data. In C, such types can be represented as enumerations.

traffic_light.c serves as the example for enumerations.

4.1 Steps

4.1.1 Problem statement

/*
Design a function that returns the next color of a traffic light given the current color of the traffic light.
*/

4.1.2 Data definition

Write how the domain information should be represented in the program. Conversely, write how to interpret the data (e.g., a number) as real-world information (e.g., a traffic light state). Write a formal data definition as an enumeration.

enum TrafficLight {
    RED,
    GREEN,
    YELLOW
}; Forgetting the ; at the end of the enum definition is a syntax error.

This in effect creates three integer constants, RED with value 0, GREEN with value 1, and YELLOW with value 2. But these details are not essential right now.

4.1.3 Function signature

Write down the function signature (as a comment). The parameter types go left of the arrow (comma separated). The result type goes right of the arrow.

// enum TrafficLight -> enum TrafficLight

Remember, the problem statement says that we need a function that returns the next color of the traffic light, given its current color.

4.1.4 Function name

Conceive a preliminary function name. This should ideally be a short non-abbreviated name. You may revise the name and find a better name in the last step.

traffic_light_next

4.1.5 Function header

Add the function signature to the name. Come up with preliminary parameter names. These should ideally be short non-abbreviated names. This completes the header of the function.

enum TrafficLight traffic_light_next(enum TrafficLight tl);

4.1.6 Function stub

Write down the function stub, returning an arbitrary value from the range of the function. Check that the code compiles.

enum TrafficLight traffic_light_next(enum TrafficLight tl) {
return RED;
}

This function definition already compiles, but it gives us a traffic light that is constantly red – how frustrating.

4.1.7 Purpose statement

Write down a purpose statement (given as a comment).

// Produces the next color of a traffic light
// given the current color of the traffic light.

4.1.8 Examples and expected results

Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)

Examples:

If the traffic light is RED, expect GREEN as the next state.
If the traffic light is GREEN, expect YELLOW as the next state.
If the traffic light is YELLOW, expect RED as the next state.

Corresponding test cases in test function:

void traffic_light_next_test() {
    check_expect_i(traffic_light_next(RED), GREEN);
    check_expect_i(traffic_light_next(GREEN), YELLOW);
    check_expect_i(traffic_light_next(YELLOW), RED);
}

Because RED, GREEN, and YELLOW are actually integer constants, we can use the check_expect_i function that takes integer arguments (hence the _i).

The test function is invoked from the main function:

int main(void) {
traffic_light_next_test();
return 0;
}

Running the above tests on the stub (which always returns RED) produces:

traffic_light.c, line 18: Actual value 0 differs from expected value 1.
traffic_light.c, line 19: Actual value 0 differs from expected value 2.
traffic_light.c, line 20: check passed
2 of 3 tests failed.

The output shows that enumerations are represented as integer values. The first element of the TrafficLight enumeration (here RED) gets assigned value 0, the second element (here GREEN) gets value 1, and the third one (here YELLOW) gets value 2.

4.1.9 Function body

Implement the function body. Put required helper functions on a wish list. These will be implemented later.

The implementation of the example function does not need helper functions:

// Produces the next color of a traffic light
// given the current color of the traffic light.
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
    if (tl == RED) {
        return GREEN;
    } else if (tl == GREEN) {
        return YELLOW;
    } else if (tl == YELLOW) {
        return RED;
    }
    return RED; // this line should never be reached
}

The last return statement should never be reached, because all possible cases are covered above it. However, some C compilers cannot figure this out automatically and need a return statement at this point.

4.1.10 Testing

Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.

The complete file traffic_light.c now looks like this:

/*
Compile: make traffic_light
Run: ./traffic_light
*/

#include "base.h"

/*
Design a function that returns the next color of a traffic light
given the current color of the traffic light.
*/

enum TrafficLight {
    RED,
    GREEN,
    YELLOW
};

// enum TrafficLight -> enum TrafficLight
enum TrafficLight traffic_light_next(enum TrafficLight);

void traffic_light_next_test() {
    check_expect_i(traffic_light_next(RED), GREEN);    // line 18
    check_expect_i(traffic_light_next(GREEN), YELLOW); // line 19
    check_expect_i(traffic_light_next(YELLOW), RED);   // line 20
}

// Produces the next color of a traffic light
// given the current color of the traffic light.
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
    if (tl == RED) {
        return GREEN;
    } else if (tl == GREEN) {
        return YELLOW;
    } else if (tl == YELLOW) {
        return RED;
    }
    return RED;
}

int main(void) {
    traffic_light_next_test();
    return 0;
}

Running the tests on the implemented function produces:

traffic_light.c, line 18: check passed
traffic_light.c, line 19: check passed
traffic_light.c, line 20: check passed
All 3 tests passed!

4.1.11 Review and revise

In the given situation, the switch-statement is a good alternative to if-statements.

enum TrafficLight traffic_light_next(enum TrafficLight tl) {
    switch (tl) {
        case RED: return GREEN; // leave the function
        case GREEN: return YELLOW; // leave the function
        case YELLOW: return RED; // leave the function
    }
    return RED;
}

Since tl is an enumeration, many C compilers can automatically check whether the switch-statement handles all the cases of the enumeration. In C an enumeration value is just an integer. A careless programmer could set an invalid traffic light value (e.g., by calling traffic_light_next(-123)).

However, the switch-statement is dangerous. It falls through all conditions after the matching one, unless the case is terminated by a break-statement or a return-statement (the latter also terminates the function).

enum TrafficLight traffic_light_next(enum TrafficLight tl) {
    switch (tl) {
        case RED:
            ...
            break; <--- leave the switch-statement!
        case GREEN:
            ...
            break; <--- leave the switch-statement!
        case YELLOW:
            ...
            break; <--- leave the switch-statement!
    }
    // <--- the breaks jump to this point after the switch statement
    ...
    return ...;
}

Moreover, the cases in the switch-statement have to be constant integer values. Never put anything else there! If you are unsure, use if-statements!

Given this information, we can write a reusable template for functions that handle traffic lights:

... fn_for_traffic_light(enum TrafficLight tl) {
    switch (tl) {
        case RED:
            ...
            return...;
        case GREEN:
            ...
            return...;
        case YELLOW:
            ...
            return...;
    }
    return 0;
}

This template may be copied, pasted, and adapted for new functions that operate on this data type.

4.2 Type definitions and enumerations

If an enumeration is defined as

enum TrafficLight {
    RED,
    GREEN,
    YELLOW
};

it is necessary to use the keyword enum when declaring variables of this type. This can be avoided by adding the following type definition after the enum definition (or by combining enum declarations and type definitions).

typedef enum TrafficLight TrafficLight;

In general:

typedef <my complicated type> MyNewNameForComplicatedType;

Now you can simply write TrafficLight instead of enum TrafficLight. The name of the enumeration and the name of the defined type may be identical, as is the case here.

4.3 Exercises

Design a function continent_next that computes the next continent for a sailing ship. The route of the sailing ship is influenced by the wind direction. The next continent depends on the current continent and the wind direction (the direction from which the wind originates). Among others, the following relations hold: Europe –northwind–> Africa, Europe –westwind–> Asia, Asia –eastwind–> Europe. Complete these relations using a world map. To simplify the exercise assume that the Americas and Australia do not exist, and that if a ship reaches Antarctica it sinks. Assume that the only wind directions (directions from which the wind originates) are east, west, north, and south, and that they do not change during a single journey of a sailing ship from one continent to another. Use suitable enumerations.

5 Recipe for Intervals

When the domain information that needs to be represented in the program consists of one or more ranges of numbers, then the data definition is concerned with handling intervals. An example would be a taxation scheme in which there is no taxation for goods below 1000 €, moderate taxation between 1 k€ – 10 k€, and high taxation for goods above 10 k€. When dealing with intervals special care needs to be taken to treat the boundary cases correctly. In the example above, is a product of 1000 € still without taxation or does it fall in the next interval? Boundaries can be represented precisely in this way: [0, 1000), [1000, 10000), [10000, inf), where brackets [ and ] denote inclusive boundaries and parentheses ( and ) denote exclusive boundaries: 1000 does not belong to the interval [0, 1000), but 0 does.

Data definitions for intervals typically use enumerations to name each interval as well as constants to define the boundaries of the intervals.

tax.c serves as the example for intervals.

5.1 Steps

5.1.1 Problem statement

/*
A fictitious country has decided to introduce a three-stage sales tax. Cheap items below 1 k€ are not taxed. Goods of more than 10 k€ are taxed at 10%. Items in between are taxed at a rate of 5%. Give a data definition and define a function that computes the amount of tax for a given item price.
*/

5.1.2 Data definition

Define how the domain information should be represented in the program or, vice versa, how to interpret the data (e.g., a number) as real-world information (e.g., a tax rate). Name each interval in an enumeration.

enum TaxStage {
    NO_TAX,
    LOW_TAX,
    HIGH_TAX
};

A type definition is used to represent the currency. Domain knowledge is necessary to decide whether int is a suitable representation for the currency.

typedef int Euro; // int represents Euro

This typedef defines Euro a synonym for int. It is optional. Alternatively, you may use int directly.

Capture the interval boundaries as constants.

const Euro LOW_TAX_BOUNDARY = 1000; // interpret.: price in Euro
const Euro HIGH_TAX_BOUNDARY = 10000; // interpret.: price in Euro

5.1.3 Function signature

Write down the function signature (as a comment). The parameter types go left of the arrow (comma separated). The result type goes right of the arrow.

// Euro -> Euro

Equivalently, if using int directly:

// int -> int

5.1.4 Function name

Conceive a preliminary function name. This should ideally be a short non-abbreviated name. You may revise the name and find a better name in the last step.

sales_tax

5.1.5 Function header

Add the function signature to the name. Come up with preliminary parameter names. These should ideally be short non-abbreviated names. This completes the header of the function.

Euro sales_tax(Euro price);

Equivalently, if using int directly:

int sales_tax(int price);

5.1.6 Function stub

Write down the function stub, returning an arbitrary value from the range of the function. Check that the code compiles.

Euro sales_tax(Euro price) {
return 0;
}

Equivalently, if using int directly:

int sales_tax(int price) {
return 0;
}

5.1.7 Purpose statement

Write down a purpose statement (given as a comment). The purpose statement should describe what the function computes (not how it does that) and should mention the given inputs and produced result.

// Returns the amount of tax for the given price.

5.1.8 Examples and expected results

Write down examples with expected results in the test method. The test examples should cover corner cases (e.g., boundary values) and a typical case of each category (e.g. a value from the interior of an interval). Boolean functions should test positive and negative examples. You may already define constants for use in the implementation. Check that the code compiles. (Some tests will fail for the stub.)

Examples:

For a price of 0 € expect a sales tax of 0 €.
For a price of 537 € expect a sales tax of 0 €.
For a price of 1000 € expect a sales tax of 50 €.
For a price of 1282 € expect a sales tax of 64 €.
For a price of 10000 € expect a sales tax of 1000 €.
For a price of 12017 € expect a sales tax of 1202 €.

Corresponding test cases in test function:

int round_to_int(double d); // function header

static void sales_tax_test() {
    check_expect_i(sales_tax(0), 0);
    check_expect_i(sales_tax(537), 0);
    check_expect_i(sales_tax(1000),
                        round_to_int(1000 * 0.05));
    check_expect_i(sales_tax(1282),
                        round_to_int(1282 * 0.05));
    check_expect_i(sales_tax(10000),
                        round_to_int(10000 * 0.10));
    check_expect_i(sales_tax(12017),
                        round_to_int(12017 * 0.10));
}

The test cases need a helper function for rounding to whole Euros. In C at least the function header needs to appear before the point of usage. Thus we include the function header of round_to_int before sales_tax_test.

int round_to_int(double d) {
return (int)round(d); // round included from math.h
}

Note that the test cases cover each of the boundary values of the intervals, and at least one value from the interior of each interval.

5.1.9 Function body

Implement the function body. Put required helper functions on a wish list. These will be implemented later.

The implementation uses round_to_int as a helper function.

// Returns the amount of tax for the given price.
Euro sales_tax(Euro price) { // Euro stands for int
    if (0 <= price && price < 1000) { // NO_TAX interval
        return 0;
    } else if (1000 <= price && price < 10000) { // LOW_TAX interval
        return round_to_int(0.05 * price);
    } else if (price >= 10000) { // HIGH_TAX interval
        return round_to_int(0.10 * price);
    }
    // error: if this line is reached then price < 0
    printsln("sales_tax, error: negative price");
    exit(1); // terminate the program
}

The implementation reflects the structure of the data. As specified in the enumeration there are three cases: NO_TAX, LOW_TAX, and HIGH_TAX. These three cases correspond to three intervals: [0, 1000), [1000, 10000), and [1000, inf). For each interval there is an if-statement whose condition matches one of the intervals. When formulating the conditions care has to be taken to handle the boundary values correctly. In addition, there is an error case for negative prices. Since Euro is represented as int, a negative number could be provided, which could lead to an error. To avoid this problem, the currency could be represented as an unsigned int, but this would make it difficult to represent debt. Unsigned types have special problems. We will avoid unsigned types in this lecture.

5.1.10 Testing

Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.

int main(void) {
sales_tax_test();
return 0;
}

The given implementation satisfies all test examples:

tax.c, line 33: check passed
tax.c, line 34: check passed
tax.c, line 36: check passed
tax.c, line 38: check passed
tax.c, line 40: check passed
tax.c, line 42: check passed
All 6 tests passed!

5.1.11 Review and revise

Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests.

The above implementation should be improved. It still contains the raw interval boundaries and taxation rates. These should be stored as constants. Moreover, the conditions in the if-statements can be simplified because they are processed in sequence from the top.

const double LOW_TAX_RATE = 0.05;
const double HIGH_TAX_RATE = 0.10;

// Returns the amount of tax for the given price.
Euro sales_tax(Euro price) { // Euro stands for int
    if (price < 0) { // error
        printsln("sales_tax, error: negative price");
        exit(1); // terminate the program
    } else if (price < LOW_TAX_BOUNDARY) { // NO_TAX interval
        return 0;
    } else if (price < HIGH_TAX_BOUNDARY) { // LOW_TAX interval
        return round_to_int(LOW_TAX_RATE * price);
    } else { // HIGH_TAX interval
        return round_to_int(HIGH_TAX_RATE * price);
    }
}

Domain knowledge might suggest to include a constant for the first category as well. There could be a revision of the taxation system, in which the first interval of [0, 1000) Euro becomes subject to a small taxation rate as well. Moreover, an additional interval for higher-priced items could be introduced, which would require adding a case. It is up to the programmer’s judgment to decide how far the generalization should go, because this can come at a cost. E.g., introducing another constant for the first interval could increase the effort to read the code.

If we are satisfied with the function, we can write a reusable template, which can serve as a basis when implementing functions related to this taxation scheme in the future.

const double LOW_TAX_RATE = 0.05;
const double HIGH_TAX_RATE = 0.10;
... fn_for_tax(Euro price) { // Euro stands for int
    if (price < 0) { // error
        printsln("sales_tax, error: negative price");
        exit(1); // terminate the program
    } else if (price < LOW_TAX_BOUNDARY) { // NO_TAX interval
        ...
        return ...;
    } else if (price < HIGH_TAX_BOUNDARY) { // LOW_TAX interval
        ...
        return ...;
    } else { // HIGH_TAX interval
        ...
        return ...;
    }
}

This template can be adapted for new functions that operate on this data type.

5.2 Exercises

A taxation rate of 1.5% is introduced for items below 1000 €. Modify the sales_tax function to reflect this change.
A revision of the taxation system introduces a linearly increasing taxation rate for the middle interval (1 k€ – 10 k€). As before, the taxation rate is 5% at 1 k€ and 10% at 10 k€, but is linearly interpolated in between. Modify the sales_tax function to reflect this change.

6 Recipe for Itemizations

Itemizations mix aspects of enumerations and intervals. Like enumerations they denote data which represents different alternatives, but at least one of the alternatives holds additional data. An example is information about the position of a train in a tunnel: Either there is no train in the tunnel (case 1) or there is a train at a certain position in the tunnel (case 2). Case 2 has the train’s position as additional data.

Data definitions for itemizations typically use enumerations to name each alternative as well as additional data for some of the cases (e.g., a number to represent position). The additional data is given either in a struct or in a union.

move_train.c serves as the example for itemizations.

6.1 Steps

6.1.1 Problem statement

/*
A critical section of a railway track has a length of 10 km. Trains pass through the critical section in both directions. At most one train is allowed on this critical section at any one time. A control system is to be implemented that provides the position of a train in that section (if there is one) or an indication that there is no train in that section. Define a data definition for this information. Define a function that takes this data definition and advances the train's position by a given amount (in km). This may result in the train leaving the critical section.
*/

6.1.2 Data definition

Define how the domain information should be represented in the program or, vice versa, how to interpret the data (e.g., a number) as real-world information (e.g., a train position). Name each alternative in an enumeration.

enum TrainTag { // enumerates all possible alternatives
    T_ABSENT,
    T_PRESENT
};

struct Train { // has the alternative and the additional information
    enum TrainTag tag; // alternative
    // the distance from the start of the critical section (in km)
    double position; // additional information for T_PRESENT
}; Forgetting the ; at the end of the struct definition is a syntax error.

struct Train make_train_none() { // constructor function
    struct Train t = { T_ABSENT, 0.0 };
    return t;
}

struct Train make_train_at(double position) { // constructor function
    struct Train t = { T_PRESENT, position };
    return t;
}

A train can either be absent or present. If it is present, the position is given as a floating point number. Position 0.0 represents the start of the critical section, position 10.0 represents its end. The constructor functions simplify the creation of values of type struct Train. There is one constructor function for the case of no train in the critical section (make_train_none) and one constructor function for the case of a train at a particular position in the critical section (make_train_at).

6.1.3 Function signature

Write down the function signature (as a comment). The parameter types go left of the arrow (comma separated). The result type goes right of the arrow.

// struct Train, double -> struct Train

The function consumes two values (a train and an amount of movement). Positive amounts move the train towards the end of the critical section. Negative amounts move it towards the start. Note that this aspect is not captured in the data definition.

6.1.4 Function name

Conceive a preliminary function name. This should ideally be a short non-abbreviated name. You may revise the name and find a better name in the last step.

move_train

6.1.5 Function header

Add the function signature to the name. Come up with preliminary parameter names. These should ideally be short non-abbreviated names. This completes the header of the function.

struct Train move_train(struct Train train, double amount);

6.1.6 Function stub

Write down the function stub, returning an arbitrary value from the range of the function. Check that the code compiles.

struct Train move_train(struct Train train, double amount) {
return make_train_none();
}

6.1.7 Purpose statement

Write down a purpose statement (given as a comment). The purpose statement should describe what the function computes (not how it does that) and should mention the given inputs and produced result.

// Advances the train by the given amount. Considers the case
// that the train enters or leaves the critical section.

6.1.8 Examples and expected results

Unfortunately, there are only check functions for existing types, like int. The new type struct Train requires that we implement a specific check function for that type, which checks its components. The dot operator (.) allows accessing the components (tag and position) of the structure. To be able to locate failed tests in the source code, we use __LINE__ and __FILE__, which the compiler will replace by the line number and file name, respectively.

Trains are considered equal if their tags are equal and their positions are within a small range. EPSILON is a positive floating-point value close to zero, such as 0.00000001. It is used because not every real number (a mathematical entity) can be represented exactly as a floating-point value (a representation in the computer). For the check to pass, the actual value has to be within +/-EPSILON of the expected value.

// helper function for comparing trains
bool check_expect_train(int line, struct Train t, struct Train s) {
    bool tags_equal = base_check_expect_i(__FILE__, line,
                                           t.tag, s.tag);
    bool positions_within = base_check_within_d(__FILE__, line,
                              t.position, s.position, EPSILON);
    return tags_equal && positions_within;
}

The functions base_check_expect_i and base_check_within_d are included from base.h. The first checks integers for equality. The second checks whether two floating-point values are within +/-EPSILON of each other. Both functions print a message about the result (i.e., whether the check passed or failed).

The test examples handle several cases, such as the borders of the critical section, advancement in its interor, and entering and leaving the critical section.

Examples:

Advancing a train that is not in the critical section (make_train_none()) by 3.0 km results in a train that is still not in the critical section (make_train_none()).

Advancing a train that is at the start of the critical section (make_train_at(0.0)) by 0.0 km results in a train that is still at the start of the critical section (make_train_at(0.0)).

Advancing a train that is at position 1.0 km of the critical section (make_train_at(1.0)) by -1.0 km results in a train that is at the start of the critical section (make_train_at(0.0)).

etc.

Corresponding test cases in test function:

static void move_train_test() {
    // absent trains, moving an absent train has no effect
    struct Train actual = make_train_none(); // absent train
    actual = move_train(actual, 3.0); // move absent train by 3.0 km
    struct Train expected = make_train_none(); // absent train
    check_expect_train(__LINE__, actual, expected);
    // same as above, but as nested calls
    check_expect_train(__LINE__, move_train(make_train_none(), 3.0),
                                 make_train_none());

    // borders
    check_expect_train(__LINE__, move_train(make_train_at(0.0), 0.0),
                                 make_train_at(0.0));
    check_expect_train(__LINE__, move_train(make_train_at(10.0), 0.0),
                                 make_train_at(10.0));
    check_expect_train(__LINE__, move_train(make_train_at(1.0), -1.0),
                                 make_train_at(0.0));
    check_expect_train(__LINE__, move_train(make_train_at(9.0), 1.0),
                                 make_train_at(10.0));

    // interior (both before and after advance)
    check_expect_train(__LINE__, move_train(make_train_at(1.0), 0.0),
                                 make_train_at(1.0));
    check_expect_train(__LINE__, move_train(make_train_at(5.5), 1.5),
                                 make_train_at(7.0));
    check_expect_train(__LINE__, move_train(make_train_at(4.5), -1.0),
                                 make_train_at(3.5));

    // leaving the section
    check_expect_train(__LINE__, move_train(make_train_at(9.0), 1.1),
                                 make_train_none());
    check_expect_train(__LINE__, move_train(make_train_at(1.0), -1.1),
                                 make_train_none());

    // entering the section
    check_expect_train(__LINE__, move_train(make_train_at(-0.1), 0.1),
                                 make_train_at(0.0));
    check_expect_train(__LINE__, move_train(make_train_at(10.1), -0.1),
                                 make_train_at(10.0));
}

6.1.9 Function body

Implement the function body. Put required helper functions on a wish list. These will be implemented later.

We only need check_expect_train defined above.

// Advances the train by the given amount. Considers the case
// that the train enters or leaves the critical section.
struct Train move_train(struct Train train, double amount) {
    if (train.tag == T_ABSENT) {
        return make_train_none();
    } else if (train.tag == T_PRESENT) {
        double new_position = train.position + amount;
        if (new_position < 0.0 || new_position > 10.0) {
            return make_train_none();
        } else {
            return make_train_at(new_position);
        }
    }
    return train;
}

The implementation handles the two cases of the Train type. If a train is present, the new position is calculated. It is then checked whether the new position falls outside the critical section. If so, this corresponds to the absence of the train in the critical section. Otherwise a train representing presence at the new position is returned.

6.1.10 Testing

Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations. The function does not have side effects.

int main(void) {
move_train_test();
return 0;
}

6.1.11 Review and revise

Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests.

The above implementation should be improved. The start and end positions should be replaced by constants. Moreover, the if-statements can be replaced by a switch-statement.

const double START_POSITION = 0.0;
const double END_POSITION = 10.0;
// Advances the train by the given amount. Considers the case
// that the train enters or leaves the critical section.
struct Train move_train(struct Train train, double amount) {
    double new_position = train.position + amount;
    switch (train.tag) {
        case T_ABSENT:
            return make_train_none();
        case T_PRESENT:
            if (new_position < START_POSITION ||
                new_position > END_POSITION)
            {
                return make_train_none();
            } else {
                return make_train_at(new_position);
            }
    }
    return train;
}

If we are satisfied with the function, we can write a reusable template that can serve as a basis for future functions on this type.

const double START_POSITION = 0.0;
const double END_POSITION = 10.0;
... fn_for_train(struct Train train, ...) {
    switch (train.tag) {
        case T_ABSENT:
            ...
            return ...;
        case T_PRESENT:
            ...
            return ...;
    }
    return ...;
}

6.2 Exercises

Write a function that takes two values of type Train and tests whether they are in conflict with each other, i.e., whether they correspond to a state that is not allowed in the critical section.
Design a function that takes a value of type Train and a percentage value p and checks whether the train is located in the first p% of the critical section.

7 Recipe for Compound Data (Product Types)

Compound data types aggregate potentially different kinds of data into a whole. C uses structures (struct) to represent values that consist of multiple components. The components in turn can be atomic or themselves structured. As an example we use a 2-dimensional point with x- and y-coordinates as components. Compound data types are also called product types, because they form the Cartesian product of their components.

point.c serves as the example for compound data.

7.1 Steps

7.1.1 Problem statement

/*
Objects are located somewhere on a 2D plane. Design a function that computes the distance of the center of an object to the origin of the coordinate system.
*/

7.1.2 Data definition

Write how the domain information should be represented in the program, or, vice versa, write how to interpret the data (e.g., a pair of numbers) as real-world information (e.g., a point in 2D space). Write a formal data definition as a struct. Write a constructor for creating and initializing values of this type.

struct Point {
    double x; // represents the x-coordinate of a point
    double y; // represents the y-coordinate of a point
}; Forgetting the ; at the end of the struct definition is a syntax error.
// constructor function for Point
struct Point make_point(double x, double y) {
    struct Point p = { x, y };
    return p;
}

The constructor function simplifies the creation of values of type struct Point.

7.1.3 Function signature

Write down the function signature (as a comment). The parameter types go left of the arrow (comma separated). The result type goes right of the arrow.

// struct Point -> double

7.1.4 Function name

Conceive a preliminary function name. This should ideally be a short non-abbreviated name. You may revise the name and find a better name in the last step.

distance_to_origin

7.1.5 Function header

Add the function signature to the name. Come up with preliminary parameter names. These should ideally be short non-abbreviated names. This completes the header of the function.

double distance_to_origin(struct Point p);

7.1.6 Function stub

Write down the function stub, returning an arbitrary value from the range of the function. Check that the code compiles.

double distance_to_origin(struct Point p) {
return 0.0;
}

7.1.7 Purpose statement

Write down a purpose statement (given as a comment).

// Computes the distance from the given point
// to the origin of the coordinate system.

7.1.8 Examples and expected results

Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)

Examples:

For point (0,0) expect a distance to origin of 0.
For point (1,0) expect a distance to origin of 1.
For point (-1,0) expect a distance to origin of 1.
For point (3,4) expect a distance to origin of 5.
etc.

Corresponding test cases in test function:

static void distance_to_origin_test() {
    check_within_d(distance_to_origin(make_point(0, 0)),
        0.0, EPSILON);
    check_within_d(distance_to_origin(make_point(1, 0)),
        1.0, EPSILON);
    check_within_d(distance_to_origin(make_point(-1, 0)),
        1.0, EPSILON);
    check_within_d(distance_to_origin(make_point(0, 2)),
        2.0, EPSILON);
    check_within_d(distance_to_origin(make_point(0, -2)),
        2.0, EPSILON);
    check_within_d(distance_to_origin(make_point(3, 4)),
        5.0, EPSILON);
    check_within_d(distance_to_origin(make_point(3, -4)),
        5.0, EPSILON);
}

The test function shows multiple aspects. First, the constructor function make_point creates values of type Point. Second, because floating-point values are not exact, the check... function does not test for equality, but whether the function result is sufficiently close to the expected result. EPSILON is a constant of type double (e.g., 0.00000001) that specifies the allowed tolerance.

7.1.9 Function body

Implement the function body. Put required helper functions on a wish list. These will be implemented later.

The implementation of the example function does not need helper functions.

// Computes the distance from the given point
// to the origin of the coordinate system.
double distance_to_origin(struct Point p) {
// square root (sqrt) function header included from math.h
return sqrt(p.x * p.x + p.y * p.y);
}

The function inspects the components of the structure to compute the result value. The dot operator (.) is used to access a component value (e.g., x) of the compound value (p).

7.1.10 Testing

Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.

int main(void) {
distance_to_origin_test();
return 0;
}

7.1.11 Review and revise

7.2 Type definitions and structures

If a compound value is defined as

struct Point {
double x; // represents the x-coordinate of a point
double y; // represents the y-coordinate of a point
};

it is necessary to use the keyword struct to declare variables of this type. The constructor function for type Point is an example:

struct Point make_point(double x, double y) {
struct Point p = { x, y };
return p;
}

The keyword struct is used in the return type and in the type definition of local variable p. You can avoid this by adding the following type definition after the structure definition (or by combining structure declarations and type definitions).

typedef struct Point Point;

In general:

typedef <my complicated type> NewNameForComplicatedType;

Now you can simply write Point instead of struct Point. The name of the structure and the name of the defined type may be identical, as is the case here.

7.3 Exercises

Design a function distance_points that computes the distance between two points.
Design a function circles_overlap that tests whether two circles overlap. Define a suitable data definition for circles.
Design a function and a data definition for points that are represented in polar coordinates. Search online for the definition of polar coordinates.

8 Recipe for Variant Data (Sum Types)

Sometimes, data can take on one of different variants. An example are points that may be represented either in Euclidean coordinates (x, y) or in polar coordinates (theta, magnitude). In the latter case, magnitude specifies the distance from the origin of the coordinate system and theta specifies the direction with respect to the positive x-axis. So, (0°, 3) is equivalent to (3, 0) in Euclidean coordinates, (90°, 2) is equivalent to (0, 2), and (45°, sqrt(2)) is equivalent to (1, 1).

A data type that can represent one of different choices is called a variant, a (tagged) union, or a sum type. In C, such types are represented as unions. The above example can be modeled as a union with two alternatives: PolarPoint or EuclideanPoint.

point_euclid_polar.c serves as the example for variant data.

8.1 Steps

8.1.1 Problem statement

/*
Points on the 2D plane may be given either in Euclidean coordinates or in polar coordinates. Design a function that computes the distance of such a point to the origin of the coordinate system.
*/

8.1.2 Data definition

First, enumerate the different variants:

enum PointTag {
TPointEuclid, // tag for a point in Euclidean coordinates
TPointPolar // tag for a point in polar coordinates
};

Then, define the variant as a tagged union. The tag is necessary to distinguish beetween variants. The union will store one of the embedded structs. (Components of a union overlap in memory.)

struct Point {
    enum PointTag tag; // indicate which variant follows
    union {
        // either the Euclidean variant...
        struct {
            double x;
            double y;
        };
        // ... or the polar variant
        struct {
            double theta;
            double magnitude;
        };
    };
};

Finally, define constructor functions for each of the variants:

// constructor for PointEuclid
struct Point make_point_euclid(double x, double y) {
    struct Point p;
    p.tag = TPointEuclid;
    p.x = x;
    p.y = y;
    return p;
}

// constructor for PointPolar
struct Point make_point_polar(double t, double m) {
    struct Point p;
    p.tag = TPointPolar;
    p.theta = t;
    p.magnitude = m;
    return p;
}

The data definition above uses anonymous structs (structs without a name). Alternatively, you may define a named struct for each variant and then put them in a union, like so:

struct EuclideanPoint {
    double x;
    double y;
};

struct PolarPoint {
    double theta;
    double magnitude;
};

struct Point {
    enum PointTag tag; // indicate which variant follows
    union {
        // either the Euclidean variant...
        struct EuclideanPoint euclid;
        // ... or polar variant
        struct PolarPoint polar;
    };
};

struct Point make_point_euclid(double x, double y) {
    struct Point p;
    p.tag = TPointEuclid;
    p.euclid.x = x;
    p.euclid.y = y;
    return p;
}

struct Point make_point_polar(double t, double m) {
    struct Point p;
    p.tag = TPointPolar;
    p.polar.theta = t;
    p.polar.magnitude = m;
    return p;
}

The key difference between structures and unions is that a structure contains each of its members, whereas a union only contains only one of its members (the one last set).

The rest of this chapter uses the first data definition (anonymous structs).

8.1.3 Function signature

Write down the function signature (as a comment). The parameter types go left of the arrow (comma separated). The result type goes right of the arrow.

// struct Point -> double

8.1.4 Function name

Conceive a preliminary function name. This should ideally be a short non-abbreviated name. You may revise the name and find a better name in the last step.

distance_to_origin

8.1.5 Function header

Add the function signature to the name. Come up with preliminary parameter names. These should ideally be short non-abbreviated names. This completes the header of the function.

double distance_to_origin(struct Point p);

8.1.6 Function stub

Write down the function stub, returning an arbitrary value from the range of the function. Check that the code compiles.

double distance_to_origin(struct Point p) {
return 0.0;
}

8.1.7 Purpose statement

Write down a purpose statement (given as a comment).

// Computes the distance from the given point
// to the origin of the coordinate system.

8.1.8 Examples and expected results

Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)

static void distance_to_origin_test() {
    // test cases for polar variant
    check_within_d(
        distance_to_origin(make_point_polar(0.0, 0.0)),
        0.0, EPSILON);
    check_within_d(
         distance_to_origin(make_point_polar(0.0, 1.0)),
        1.0, EPSILON);
    check_within_d(
        distance_to_origin(make_point_polar(2.3, 2.0)),
        2.0, EPSILON);

    // test cases for Euclidean variant
    check_within_d(
        distance_to_origin(make_point_euclid(0.0, -2.0)),
        2.0, EPSILON);
    check_within_d(
        distance_to_origin(make_point_euclid(2.0, 0.0)),
        2.0, EPSILON);
    check_within_d(
        distance_to_origin(make_point_euclid(1.0, 1.0)),
        sqrt(2.0), EPSILON); // square root
}

It is important to generate test cases for each variant.

8.1.9 Function body

Implement the function body. Put required helper functions on a wish list. These will be implemented later.

The implementation of the example function does not need helper functions.

// Computes the distance from the given point
// to the origin of the coordinate system.
double distance_to_origin(struct Point p) {
    if (p.tag == TPointEuclid) {
        return sqrt(p.x * p.x + p.y * p.y); // square root
    } else if (p.tag == TPointPolar) {
        return p.magnitude;
    }
    return 0.0;
}

The function takes different strategies to compute the distance of the point to the root, depending on whether the point is given in Euclidean or in polar coordinates. The tag is used to distinguish these cases.

To distinguish between the variants, the switch-statement is a good alternative to if-statements.

double distance_to_origin(struct Point p) {
    switch (p.tag) {
        case TPointEuclid:
            return sqrt(p.x * p.x + p.y * p.y); // square root
        case TPointPolar:
            return p.magnitude;
    }
    return 0;
}

Since the tag is an enumeration, the compiler can check whether the switch-statement handles all the cases of the enumeration. In C an enumeration value is just an integer. A careless programmer could set an invalid tag value (e.g., -123). As long as only the constructors make_point_euclid and make_point_polar are used to create points, invalid tags are not an issue.

double distance_to_origin(struct Point p) {
    switch (p.tag) {
        case TPointEuclid:
            ...
            break; <--- leave the switch-statement!
        case TPointPolar:
            ...
            break; <--- leave the switch-statement!
    }
    // <--- the breaks jump to this point after the switch statement
    ...
    return ...;
}

Moreover, the cases in the switch-statement have to be constant integer values. Never put anything else there! If you are unsure, use if-statements!

Given this information, we can write a reusable template for functions that handle points:

... fn_for_points(struct Point p) {
    switch (p.tag) {
        case TPointEuclid:
            ...
            return... p.x... p.y...;
        case TPointPolar:
            ...
            return... p.theta... p.magnitude...;
    }
    return ...;
}

This template may be copied, pasted, and adapted for new functions that operate on this data type.

8.1.10 Testing

Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.

int main(void) {
distance_to_origin_test();
return 0;
}

8.1.11 Review and revise

8.2 Exercises

Write a data definition for a type that can represent one of miles or kilometers. Use this data definition to represent distances. Design a function convert_distance that converts a distance to the other unit.
Write a data definition for a type Shape that can represent one of a rectangle, a circle, or an equilateral triangle. Write a function that computes the area of a shape.

9 Doing Things Repeatedly

It is often necessary, to repeatedly execute some code. C provides multiple language constructs for doing so, such as the while loop, the for loop, and the do-while loop. Such language constructs are not strictly necessary. In principle repeated execution can also be achieved through recursion. This involves functions that call themselves, either directly or indirectly.

Repetition raises the question of termination, i.e., the question whether the repetition will eventually stop. Termination is controlled through a condition that is tested on each iteration or recursive call, respectively.

Repetition also requires thinking about boundary cases. It is important to pay attention to the very first and the very last iteration.

9.1 While Loop

While loops consist of the keyword while followed by a condition in parentheses (...) followed by the loop’s body, a block of statements in curly braces {...}. The body may instead be a single statement.

...
statement_0;
while (condition) {
    statement_1;
    ...
    statement_n;
}
statement_x
...

While loops first test whether the condition is true. If so, the loop’s body is executed. A single execution of the body is denoted as an iteration. If the condition is false, the while loop terminates and the first statement after the while loop (statement_x) is executed. After each iteration the condition is tested again. While the condition is true, the loop continues (this is why it is called a while loop). If the condition is initially false, the body is never executed.

For a loop to terminate, the condition has to be initially false or some variable has to change to make the condition false after a number of iterations. This changing variable needs to be tested in the loop’s condition. A typical pattern is to increase the variable with each iteration until some upper bound is reached, at which point the loop terminates. Here is a simple example:

int i = 0;
while (i < 3) {
printiln(i); // print the value of i, followed be a line break
i = i + 1;
}

Variable i is initialized to 0. The condition 0 < 3 is true, so the first iteration starts. The printiln(0) statement gets executed. The symbol 0 appears as output. i is set to 1. The first iteration ends. As a reminder, the assignment statement = proceeds as follows: The expression i + 1 on its right-hand side gets evaluated to 0 + 1 = 1. This value is then assigned to variable i. After the first iteration has finished, the condition is evaluated again, and so on.

The execution can be illustrated as follows. The assignment statement separates the steps.

int i = 0;
=> initialize variable i with value 0
while (0 < 3) {
    printi(0);    // outputs 0
    i = 0 + 1;
}
=> assign value 1 to variable i
while (1 < 3) {
    printi(1);    // outputs 1
    i = 1 + 1;
}
=> assign value 2 to variable i
while (2 < 3) {
    printi(2);    // outputs 2
    i = 2 + 1;
}
=> assign value 3 to variable i
while (3 < 3)     // false, block {...} not executed
                  // loop terminates

What happens if we remove the last statement of the loop body (i = i + 1;)? Let’s illustrate the initial iterations.

int i = 0;
=>
while (0 < 3) {
    printi(0);    // outputs 0
}
=>
while (0 < 3) {
    printi(0);    // outputs 0
}
=>
while (0 < 3) {
    printi(0);    // outputs 0
}
=>
while (0 < 3) {
    printi(0);    // outputs 0
}
=>
while (0 < 3) {
    printi(0);    // outputs 0
}
...

The state of i never changes. The same condition is tested over and over again. This is an example of the infamous infinite loop.

The following example prints the multiples of 3 starting from 30 and going down to 3. Please try to implement this yourself before reading on.

The solution is:

int i = 30;
while (i >= 3) {
printiln(i); // print the value of i, followed be a line break
i = i - 3;
}

We again show how the execution proceeds:

int i = 30;
while (30 >= 3) {
    printiln(30);    // outputs 30
    i = 30 - 3;
}
=>
while (27 >= 3) {
    printiln(27);    // outputs 27
    i = 27 - 3;
}
=>
while (24 >= 3) {
    printiln(24);    // outputs 24
    i = 24 - 3;
}
=>
while (21 >= 3) {
    printiln(21);    // outputs 21
    i = 21 - 3;
}
=>
while (18 >= 3) {
    printiln(18);    // outputs 18
    i = 18 - 3;
}
=>
while (15 >= 3) {
    printiln(15);    // outputs 15
    i = 15 - 3;
}
=>
while (12 >= 3) {
    printiln(12);    // outputs 12
    i = 12 - 3;
}
=>
while (9 >= 3) {
    printiln(9);    // outputs 9
    i = 9 - 3;
}
=>
while (6 >= 3) {
    printiln(6);    // outputs 6
    i = 6 - 3;
}
=>
while (3 >= 3) {
    printiln(3);    // outputs 3
    i = 3 - 3;
}
=>
while (0 >= 3)     // false, body not executed
                   // loop terminates

9.2 For Loop

The for loop is equivalent to the while loop, except that it integrates initialization, condition, and update all in its header:

for (initialization; condition; update) {
    statement_1;
    ...
    statement_n;
}

The while loop

int i = 30;
while (i >= 3) {
printiln(i);
i = i - 3;
}

is equivalent (except for the scope of i) to this for loop

for (int i = 30; i >= 3; i = i - 3) {
printiln(i);
}

It requires fewer lines of code, but one has to memorize where the initialization, run condition, and update statements go. for loops are very popular among C programmers (but not among functional programmers, but that’s another story).

9.3 Recursion

Looping constructs such as while and for loops are not strictly necessary to let a program do things repeatedly. Another possibility is to put the steps that need to be repeated into a function that calls itself a certain number of times.

For example, this while loop

int i = 0;
while (i < 3) {
printi(i);
i = i + 1;
}

may be replaced by this function call

while_loop(0);

and this function (which calls itself)

void while_loop(int i) {
    if (i < 3) {
        printi(i);
        while_loop(i + 1); // function calls itself
                           // with another argument
    }
}

Expanding the recursive calls leads to the following steps:

while_loop(0);
=> the call expands to the body of function while_loop with every occurrence of parameter i replaced by argument 0
if (0 < 3) {
    printi(0);
    while_loop(0 + 1);
}
=> condition 0 < 3 is true...
if (true) {
    printi(0);
    while_loop(0 + 1);
}
=> ... thus the if-statement expands to its then-block
printi(0);
while_loop(0 + 1);
=> the call expands to the body of function while_loop with i replaced by 1
printi(0);
if (1 < 3) {
    printi(1);
    while_loop(1 + 1);
}
=> condition 1 < 3 is true...
printi(0);
if (true) {
    printi(1);
    while_loop(1 + 1);
}
=> ... thus the if-statement expands to its then-block
printi(0);
printi(1);
while_loop(1 + 1);
=> the call expands to the body of function while_loop with i replaced by 2
printi(0);
printi(1);
if (2 < 3) {
    printi(2);
    while_loop(2 + 1);
}
=> condition 2 < 3 is true...
printi(0);
printi(1);
if (true) {
    printi(2);
    while_loop(2 + 1);
}
=> ... thus the if-statement expands to its then-block
printi(0);
printi(1);
printi(2);
while_loop(2 + 1);
=> the call expands to the body of function while_loop with i replaced by 3
printi(0);
printi(1);
printi(2);
if (3 < 3) {
    printi(2);
    while_loop(2 + 1);
}
=> condition 3 < 3 is false...
printi(0);
printi(1);
printi(2);
if (FALSE) {
    printi(2);
    while_loop(2 + 1);
}
=> ... so the if-statement disappears
printi(0);
printi(1);
printi(2);

The while loop and the recursive call above produce the same result, but are not necessarily equivalent in terms of efficiency. The above steps help to understand how the recursion proceeds. It is not an accurate depiction of what is going on internally. We will discuss these details later.

9.4 Termination and Boundaries

As mentioned above, iteration requires checking whether the loop eventually terminates. The loop terminates if its run condition gets false. This can only happen through changes in the loop body (or the condition was initially false; or other events modify state, which we do not consider here). You should check, whether the contents of the loop body (or function body in the case of recursion) have the potential to finally make the run condition false. Otherwise the iteration will never terminate. Moreover, it is important to check the values of the variables inside a loop body on the first iteration. For each variable that appears in the iteration condition, check its initial value. The condition gives you additional information: The condition is true when entering the loop. It is false immediately after the loop (this might be different if a break statement is used, which we discuss later).

int i = 30;
while (i >= 3) {
    // here: i >= 3     (because loop condition is true here)
    // here: i <= 30    (because of initialization and update of i)
    printiln(i);
    i = i - 3;
}
// here: i < 3    (because loop condition is false here)

Given this loop it is easy to see that in the first iteration the value of i is 30. This follows directly from i’s initialization. Obtaining i’s value in the last iteration is more difficult. However, the loop condition helps: It is true at the start of the loop’s body, i.e. i >= 3. Combining this fact with the knowledge about the initialization and the update statement, we know that at the start of the loop body the value of i must be a in the interval [3, 30].

It is very easy (and common) to make errors with loops, in which the loop variable is off by one (e.g., goes one step too far). Hence always think through the value in the first iteration and what is implied by the loop condition. To practice thinking about edge cases that occur at iteration boundaries we introduce the concept of an assertion.

int i = 30;
while (i >= 3) {
    assert(i >= 3);  // from loop condition being true here
    assert(i <= 30); // from initialization and update
    printiln(i);
    i = i - 3;
}
assert(i < 3); // from loop condition being false here

The assert call will silently pass if the condition is true, otherwise it will output the line number of the assertion and terminate the program. For example

assert(1 == 2); // assume that this is on line 123

outputs

Line 123: assertion violated

The difference between assertions and test cases (check_expect_...) is that assertions state some abstract invariant truth about the state of execution at some point, whereas test cases are concrete examples of input data and expected results. Moreover, a fulfilled assertion does not produce output, a violated one stops the program. A fulfilled test case produces positive output, a violated test case produces negative output but does not stop the program so that other test cases will be executed, too.

We are now ready to formulate a design recipe for loops. It works identically with for loops and while loops.

9.5 Recipe for While Loops

Start with the following template to express an iteration through while loops. The template and the following example are contained in while_30_3_squared.c.

Template for while loops:

#include "base.h"

int main(void) {
    // purpose
    int i = initial_value;
    while (condition_on_i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
        update i;
    }
    assert(!(condition_on_i)); // inverse of condition (! for "not")
    return 0;
}

Formulate a purpose statement for the iteration. Describe the range and order of values to be iterated over. If possible, name the first and the last value. Then describe the effect of statements on each iterated value or on the iterated values as a whole.

// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.

If necessary, change the name of the iteration variable i. If necessary, change the type of the iteration variable (e.g., to double). Fill in the initial value and the run condition according to the purpose statement. The run condition needs to appear in three places.

// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
int i = 30;
while (i >= 3) {
    assert(i >= 3);
    assert(condition_on_i_init_update);
    printiln(i); // for inspecting the iteration variable
    // statements ...i...;
    update i;
}
assert(!(i >= 3)); // inverse of condition (! for "not")

Now define the update statement for the iteration variable i. Think about how the second value of the iteration variable is generated from its initial value. Then think about how in general the next value of the iteration variable is produced from the previous one. The purpose statement of the example states that we need the multiples of 3. 30 is a multiple of 3. The next smaller multiple of 3 is 27. The next smaller multiple of 3 is 24. In general, if i is a multiple of 3, then (i - 3) is the next smaller multiple of 3. This leads to the update statement

i = i - 3;

Now the missing assertion condition can be completed. This condition follows from the initialization of the iteration variable and the update statement.

// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
int i = 30;
while (i >= 3) {
    assert(i >= 3);
    assert(i <= 30);
    printiln(i); // for inspecting the iteration variable
    // statements ...i...;
    i = i - 3;
}
assert(!(i >= 3)); // inverse of condition (not)

Now the iteration mechanics is complete. Compile the code and check the output. The output should correspond to the sequence described in the purpose statement. The example produces:

30
27
24
21
18
15
12
9
6
3

The purpose statement says that each iterated value should be squared. To achieve this replace statements ...i... from the template with the implementation. Outputting the square of each iterated value is simply

printiln(i * i); // square the iterated values

so adding this statement yields the solution.

// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
int i = 30;
while (i >= 3) {
    assert(i >= 3);
    assert(i <= 30);
    // printiln(i); // for inspecting the iteration variable
    printiln(i * i); // square the iterated values
    i = i - 3;
}
assert(!(i >= 3)); // inverse of condition (not)

and the output is:

900
729
576
441
324
225
144
81
36
9

9.6 Recipe for For Loops

The recipe described in the previous section also applies to for loops. This is the template for for loops. The template and example are contained in for_30_3_squared.c.

Template for for loops:

#include "base.h"

int main(void) {
    // purpose
    for (int i = initial_value; condition_on_i; update i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    return 0;
}

Since i is declared inside the loop header, it is not available outside the loop. The iteration variable may alternatively be declared outside the for loop :

int main(void) {
    // purpose
    int i = 0; // declaration outside of for loop
    for (i = initial_value; condition_on_i; update i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    assert(!(condition_on_i)); // inverse of condition (not)
    return 0;
}

9.7 Aggregation with Loops

Often, the purpose of iteration is to produce some aggregated value, such as a sum or an average. The following example shows how to compute the sum of the squares of the first ten positive integers.

Start with the template for while loops. The example is contained in sum_1_10_squared.c.

#include "base.h"

int main(void) {
    // purpose
    int i = initial_value;
    while (condition_on_i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
        update i;
    }
    assert(!(condition_on_i)); // inverse of condition (not)
    return 0;
}

Formulate a purpose statement:

// Iterates over the integers 1 to 10.
// Computes the sum of squares of these values.

Set the initial value (1), condition (i <= 10), update statement (i = i + 1;) and second assertion condition (i >= 1):

// Iterates over the integers 1 to 10.
// Computes the sum of squares of these values.
int i = 1;
while (i <= 10) {
    assert(i <= 10);
    assert(i >= 1);
    printiln(i); // for inspecting the iteration variable
    // statements ...i...;
    i = i + 1;
}
assert(!(i <= 10)); // inverse of condition (not)

Compile and check the output:

1
2
3
4
5
6
7
8
9
10

The sum is computed by adding up the squares of the iterated values one by one. The implementation needs an additional variable (sum) to hold the intermediary (and final) result of the sum.

// Iterates over the integers 1 to 10.
// Computes the sum of squares of these values.
int sum = 0; // <--- new variable for intermediary (and final) result
int i = 1;
while (i <= 10) {
    assert(i <= 10);
    assert(i >= 1);
    // printiln(i); // for inspecting the iteration variable
    sum = sum + i * i; // <--- adding the next squared value
    i = i + 1;
}
assert(!(i <= 10)); // inverse of condition (not)
prints("sum = "); // <--- output
printiln(sum);    // <--- output (value and newline)

The program produces

sum = 385

9.8 Recipe for Repetition through Recursion

Repetition through recursion is not very common in C. There are several reasons for this, one of which is efficiency. It is still useful to study this case, as it helps to understand algorithms on recursive data structures. The template for repetition through recursion is as follows. The template and example are contained in rec_30_3_squared.c.

Template for repetition through recursion (no return value):

#include "base.h"

// purpose
// Should only be called from repeat (or from itself).
void repeat_rec(int i) {
    if (condition_on_i) {
        assert(condition_on_i);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
        repeat_rec(update_i); // function calls itself
                              // with another argument
    }
}

void repeat(void) {
    repeat_rec(initial_value);
}

int main(void) {
    repeat();
    return 0;
}

Function repeat acts as an initialization function and is only called once. Function repeat_rec is designed to be called from repeat and calls itself recursively. This is a common pattern for recursive functions.

Reusing an example from above, we come to the following recursive formulation:

// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
// Should only be called from repeat (or from itself).
void repeat_rec(int i) {
    if (i >= 3) {
        assert(i >= 3);
        // printiln(i); // for inspecting the iteration variable
        printiln(i * i);
        repeat_rec(i - 3); // function calls itself
                           // with another argument
    }
}

void repeat(void) {
    repeat_rec(30);
}

int main(void) {
    repeat();
    return 0;
}

The recursive template is problematic. It defines a function, yet does not return a value. Its return type is void. If we wish to aggregate values during recursion and return the result, the following template is applicable.

Template for repetition through recursion (with return value):

// purpose
// Should only be called from repeat (or from itself).
int repeat_rec(int i) {
    if (condition) {
        assert(condition);
        // printiln(i); // for inspecting the iteration variable
        int contribution_of_i = ...;
        return conbribution_of_i <op> repeat_rec(update_value);
    } else {
        return initial_value_aggregation;
    }
}

int repeat(void) {
    return repeat_rec(initial_value);
}

int main(void) {
    printiln(repeat());
    return 0;
}

To make this concrete, let us modify an example from above (rec_sum_1_3_squared.c):

// Iterates over the integers 1 to 3.
// Computes the sum of squares of these values.
// Should only be called from sum_squares (or from itself).
int sum_squares_rec(int i) {
    if (i <= 3) {
        assert(i <= 3);
        // printiln(i); // for inspecting the iteration variable
        int contribution_of_i = i * i;
        return contribution_of_i + sum_squares_rec(i + 1);
    } else {
        return 0;
    }
}

int sum_squares(void) {
    return sum_squares_rec(1);
}

int main(void) {
    prints("sum = ");
    printiln(sum_squares());
    return 0;
}

An alternative is to make the accumulation of the values explicit by adding an accumulator as a second argument of the recursive function:

// Iterates over the integers 1 to 3.
// Computes the sum of squares of these values.
// Should only be called from sum_squares (or from itself).
int sum_squares_rec(int i, int acc) {
    if (i <= 3) {
        // printiln(i); // for inspecting the iteration variable
        return sum_squares_rec(i + 1, acc + i * i);
    } else {
        return acc;
    }
}

int sum_squares(void) {
    // initial values of iteration variable and accumulator
    return sum_squares_rec(1, 0);
}

This variant makes very clear how values are transferred from recursive call to recursive call. To clarify this point, we show the expansion of the recursive calls.

sum_squares();
=>
sum_squares_rec(1, 0);
=> the call expands to the function body with 1 for i and 0 for acc
if (1 <= 3) {
    return sum_squares_rec(1 + 1, 0 + 1 * 1);
} else {
    return 0;
}
=> 1 <= 3 is true, thus the if-statement expands to the then-case
return sum_squares_rec(1 + 1, 0 + 1 * 1);
=> the call expands to the function body with 2 for i and 1 for acc
if (2 <= 3) {
    return sum_squares_rec(2 + 1, 1 + 2 * 2);
} else {
    return 1;
}
=> 2 <= 3 is true, thus the if-statement expands to the then-case
return sum_squares_rec(2 + 1, 1 + 2 * 2);
=> the call expands to the function body with 3 for i and 5 for acc
if (3 <= 3) {
    return sum_squares_rec(3 + 1, 5 + 3 * 3);
} else {
    return 5;
}
=> 3 <= 3 is true, thus the if-statement expands to the then-case
return sum_squares_rec(3 + 1, 5 + 3 * 3);
=> the call expands to the function body with 4 for i and 14 for acc
if (4 <= 3) {
    return sum_squares_rec(4 + 1, 14 + 4 * 4);
} else {
    return 14;
}
=> 4 <= 3 is false, thus the if-statement expands to the else-case
return 14;
=> which finally yields the value 14
14

In some languages this variant is as efficient as non-recursive loops, because no information from the previous call has to be remembered. For example, gcc performs tail call optimization in this situation, if you use the compiler flag -O2.

The rest of this section is optional material.

Here are some sample timing results for n = 100’000 without option -O2:

Recursive, without accumulator:
sum = 1626540144
time: 0.674 ms

Recursive, with accumulator:
sum = 1626540144
time: 0.548 ms

Equivalent while loop:
sum = 1626540144
time: 0.249 ms

Here are some sample timing results for n = 100’000 with option -O2 to enable tail call optimization:

Recursive, without accumulator:
sum = 1626540144
time: 0.002 ms

Recursive, with accumulator:
sum = 1626540144
time: 0.002 ms

Equivalent while loop:
sum = 1626540144
time: 0.002 ms

9.9 Exercises

Write a while loop that computes the mean of all positive, even integers up to 100.
Write a recursive function with accumulator that computes the mean of all positive, even integers up to 100.
Three assertions are given with loops. Explain why these assertions hold.

10 Arrays: Fixed-Size Collections

Up to now we have only dealt with fixed-size data. These data were either atomic (like integers) or compound (like structures), but each individual item was explicitly named in the data type definition. This is inconvenient for representing collections of items of the same kind. An example would be a sequence of 24 temperature values in degrees Celsius, measured at each hour of the same day. This information can be represented as a fixed-size collection. When such a collection is created, we have to provide its size. Once created, the size does not change.

There are multiple possibilities for representing collections of items of the same type in a program. We start with arrays, which have a fixed number of elements. In the next chapter we will see lists, which are collections with a variable number of elements. The unifying characteristic of collections is that they treat a number of elements as a whole. Operations on such collections may then be applied to each element, which is a new form of abstraction.

Arrays store elements of a certain type. The elements can be retrieved by their index position. The first element has index 0, the second element has index 1, and so on. To simplify working with C arrays we introduce a set of functions. The documentation of these functions is located in arrays_lists.h, int_array.h, double_array.h, and string_array.h, respectively. Test examples and function implementations can be found in the corresponding .c files.

This chapter does not initially introduce new language concepts, but uses the provided array functions. Later we will discuss built-in C arrays, which pose some difficulties for beginners. Using this strategy we hope that you become familiar with working with fixed-size collections of data, before tackling details of the C language. The focus is on design recipes for processing fixed-size collections. For example, for a sequence of temperature values we might be interested in the average temperature, the minimum or the maximum temperature, or we might want to convert these values from degrees Celsius to degrees Fahrenheit.

The elements of arrays are typically processed using while and for loops, as introduced in the previous chapter.

10.1 Temperatures

Imagine that you take the temperatures outside the main building of Hannover University. You take measurements (in °C) at each full hour (the first one at 0:00 and the last one at 23:00). This is the result:

7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10, 11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4

This information can be written in a C program as (see array_temperatures.c):

/*
Compile: make array_temperatures
Run: ./array_temperatures
*/

#include "base.h"
#include "arrays_lists.h"

int main(void) {
    Array temps = ia_of_string( // create int array from string
        "7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
        "11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");
    ia_println(temps); // output the whole int array
    printiln(a_length(temps)); // output length of int array
    printiln(get(int, temps, 0)); // get first value of int array
    printiln(get(int, temps, 23)); // get last value of int array
    a_free(temps); // release memory of int array
    return 0;
}

Compiling and running this program yields this output:

[7 6 5 6 6 5 6 6 6 7 7 9 10 11 11 12 13 12 10 9 9 8 7 4]
24
7
4

Each individual temperature is represented as an integer. The collection of 24 temperatures is represented as an array of integers.

The function ia_of_string takes the sequence of temperatures as a string and produces a corresponding array. (The prefix ia_ stands for “integer array.”) Characters in the string that are not digits just separate numbers. The produced array is assigned to the temps variable. Function ia_println prints the array (first line of output), followed by a line break. Function a_length returns the number of elements in the array. Function printiln prints this value (i stands for “integer”, ln stands for “line break”) and produces the second line of output. Function get returns the array element of the given type at the given index position. The values at index positions 0 (first element) and 23 (last element) are output as the third and fourth line. Function a_free frees the memory used by the array. This function is not strictly necessary here. If an array is no longer needed, this function should be called in the way shown. We will discuss the details later.

Had we tried to access a non-existing index, e.g.,

printiln(get(int, temps, 24));

the output would have been

a_get: index 24 is out of range (array length: 24, allowed indices: 0..23)

Imagine we now wish to find out whether the array contains any freezing temperatures. To solve this problem we apply the design recipes for functions defined earlier and then focus on how to process the array in a loop.

Problem statement
The function to implement can be generalized a bit beyond arrays of 24 temperatures. We wish to write a function that is applicable to any array of integers, no matter how long and how to interpret the data. The function we want should just check, whether the array contains at least one negative value.
Data definition
The temperatures are defined as an array of integers. Each element represents a temperature in degrees Celsius. Each array index stands for a full hour, e.g., index 0 represents 0:00, index 6 represents 6:00.
Function signature
The function we look for takes an integer array and returns a Boolean value.
// Array(int) -> bool
Function name
contains_negative
Function header
bool contains_negative(Array array);
Function stub
bool contains_negative(Array array) {
return false;
}
Purpose statement
// Returns true if the array contains negative values,
// otherwise returns false.

Examples and expected results (test cases)

static void contains_negative_test(void) {
    Array array;

    array = ia_of_string("10, 20, 30");
    check_expect_b(contains_negative(array), false);
    a_free(array);

    array = ia_of_string("0, 0, 1, 1, 1");
    check_expect_b(contains_negative(array), false);
    a_free(array);

    array = ia_of_string("-1, -3");
    check_expect_b(contains_negative(array), true);
    a_free(array);

    array = ia_of_string("1, 3, -99");
    check_expect_b(contains_negative(array), true);
    a_free(array);

    array = ia_of_string("-1, 3, 99");
    check_expect_b(contains_negative(array), true);
    a_free(array);

    array = ia_of_string("");
    check_expect_b(contains_negative(array), false);
    a_free(array);
}

int main(void) {
    contains_negative_test();
    return 0;
}

At this stage the program compiles, but some of the tests will fail when running the program.

Function body

Now we implement the body of contains_negative. Clearly, we have to look at the individual values in the array to check for a negative one. We stop if we have found a negative value or looked at all of the values. This kind of task can be done with a for loop. We use the recipe from the previous chapter to start developing the algorithm.

// Returns true if the array contains negative values,
// otherwise returns false.
bool contains_negative(Array array) {
    // <from recipe for loops>
    // purpose
    for (int i = initial_value; condition_on_i; update_i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    return false;
}

The first step is to formulate the purpose statement of the iteration. It consists of two parts, the first referring to the iteration (what indices in what order), the second referring to the operation on each iterated element: (1) Describe the range and order of elements to be iterated over. (2) Describe the effect of the loop body on each iterated value.

// Iterates over the elements of the array (any order).
// For each array element, checks whether it is negative.
// If so returns true, otherwise continues
// until all of the elements have been inspected.
// If there is no negative value, answers false.

Iterating over the elements requires iterating over their indices in the array. In the temperature example, the allowed index positions are 0 (for the first element), 1 (for the second element), up to 23 (for the last element). It is a general convention in C (and in many other programming languages) to start counting array indices at 0. Thus, we replace initial_value with 0 and update_i with i = i + 1. The run condition depends on the length of the array. The last allowed index is 23 for a length of 24 or, in general, n - 1 for length n. condition_on_i thus becomes i <= n - 1 or equivalently i < n. Make sure you understand why the last two expressions are equivalent for integer variables. We are ready to take the next step of the recipe and put in values and expressions for these placeholders:

bool contains_negative(Array array) {
    // Iterate over the elements of the array (any order).
    // ...
    for (int i = 0; i < a_length(array); i = i + 1) {
        assert(i < a_length(array));
        assert(i >= 0);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    return false;
}

Now, compile the program and run it. For the above test cases, the output should be these indices and test results (possibly with different line numbers):

0
1
2
Line  24: check passed
0
1
2
3
4
Line  28: check passed
0
1
Line  32: Actual value false differs from expected value true.
0
1
2
Line  36: Actual value false differs from expected value true.
0
1
2
Line  40: Actual value false differs from expected value true.
Line  44: check passed

To simplify the discussion, we call contains_negative with the temps array given above. The indices are then

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

However, we are not interested in these indices per se, but in the array elements. To see them, replace

printiln(i); // for inspecting the iteration variable

with

printiln(get(int, array, i));

to obtain

7
6
5
6
6
5
6
6
6
7
7
9
10
11
11
12
13
12
10
9
9
8
7
4

Which are indeed the elements of the temps array.

To find out whether the array contains any negative values, we need to look at each array element in turn. We now focus our attention to the second part of the purpose statement, i.e., we check for each element whether it is smaller than zero and return true if so. The function becomes:

bool contains_negative(Array array) {
    // Iterate over the elements of the array (any order).
    // For each array element, check whether it is negative.
    // If so return true, otherwise continue
    // until all of the elements have been inspected.
    // If there is no negative value, answer false.
    for (int i = 0; i < a_length(array); i = i + 1) {
        assert(i < a_length(array));
        assert(i >= 0);
        int t = get(int, array, i);
        if (t < 0) {
            return true;
        }
    }
    return false;
}

Compiling and running yields:

false

Testing
Applying the test cases to the contains_negative yields the following output:
Line 24: check passed
Line 28: check passed
Line 32: check passed
Line 36: check passed
Line 40: check passed
Line 44: check passed

Review and revise

It is a good idea to call the a_length only once. Cleaning up the function we obtain:

bool contains_negative(Array array) {
    // Iterate over the elements of the array (any order).
    // For each array element, check whether it is negative.
    // If so return true, otherwise continue
    // until all of the elements have been inspected.
    // If there is no negative value, answer false.
    int n = a_length(array);
    for (int i = 0; i < n; i++) {
        assert(i < n);
        assert(i >= 0);
        if (get(int, array, i) < 0) {
            return true;
        }
    }
    return false;
}

10.2 Selecting Data from an Array

Imagine that we want to reduce the amount of data in the temperature array and only consider the data of every second hour, i.e., we only want the temperatures at 0:00, 2:00, 4:00, ..., 22:00. Let us apply the same recipe as above.

Problem statement
The function to implement needs to subsample the data and provide every second element of the original array. It should start with the first element.
Data definition
The information is represented in arrays with elements of type integer.
Function signature
The function we look for takes an integer array and returns an integer array.
// Array(int) -> Array(int)
Function name
every_other
Function header
Array every_other(Array array);
Function stub
Array every_other(Array array) {
return array(int, 0); // an empty array
}
Purpose statement
// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.

Examples and expected results (test cases)

static void every_other_test(void) {
    ia_check_expect(
        every_other(ia_of_string("1 2 3")),
                    ia_of_string("1 3"));
    ia_check_expect(
        every_other(ia_of_string("")),
                    ia_of_string(""));
    ia_check_expect(
        every_other(ia_of_string("-2")),
                    ia_of_string("-2"));
    ia_check_expect(
        every_other(ia_of_string("1 2")),
                    ia_of_string("1"));
    ia_check_expect(
        every_other(ia_of_string("1 2 3 4")),
                    ia_of_string("1 3"));
}

int main(void) {
    every_other_test();
    return 0;
}

The check function ia_check_expect compares two integer arrays, element by element. To pass, the lengths must be equal and for each index position the elements must be equal.

At this stage the program compiles, but some of the tests will fail when running the program. Note that no memory is released in these tests (a_free is never called) in order to improve readability and to focus on the essentials.

Function body

Now we implement the body of every_other. We again use the recipe for for loops from the previous chapter.

// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
    // <from recipe for loops>
    // purpose
    for (int i = initial_value; condition_on_i; update_i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    return array(int, 0); // an empty array
}

// Iterates over the even indices in the original order.
// Copy each element at an even index to the result array.

We are interested in the even indices, i.e., 0, 2, 4, etc. Thus, we replace initial_value with 0 and update_i with i = i + 2. The run condition depends on the length of the array: condition_on_i is replaced by i < a_length(array). The second assertion states that i is not negative and even. We are ready to take the next step of the recipe and put in values and expressions for these placeholders:

// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
    // Iterates over the even indices in the original order.
    // Copies each element at an even index to the result array.
    for (int i = 0; i < a_length(array); i = i + 2) {
        assert(i < a_length(array));
        assert(i >= 0 && (i % 2) == 0); // % is the modulo operator
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    return array(int, 0); // an empty array
}

Applying every_other to the temps array yields these indices:

0
2
4
6
8
10
12
14
16
18
20
22

To print the corresponding elements, replace

printiln(i); // for inspecting the iteration variable

with

printiln(get(int, array, i));

to obtain

7
5
6
6
6
7
10
11
13
10
9
7

These values should not be printed, but put into a result array. The lengh of the result array is (n + 1) / 2, where n is the length of the input array. The division operator / here denotes an integer division, which discards the fraction. Check that this expression yields the correct lengths for the test cases given above. The remaining task is to put the values read from even indices of array to successive indices of result. The transformation of index positions in the input array to index positiosn of the result array is simply the integer division by 2.

The function becomes:

// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
    int n = a_length(array); // length of input array
    int m = (n + 1) / 2; // length of result array
    Array result = array(int, m);
    // Iterate over the even indices in the original order.
    // Copy each element at an even index to the result array.
    for (int i = 0; i < a_length(array); i = i + 2) {
        assert(i < a_length(array));
        assert(i >= 0 && (i % 2) == 0);
        int x = get(int, array, i); // read from array
        set(int, result, i / 2, x); // write to result
    }
    return result;
}

int main(void) {
    Array temps = ia_of_string(
        "7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
        "11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");
    Array eo = every_other(temps);
    ia_println(eo);
    a_free(eo);
    a_free(temps);
    return 0;
}

Compiling and running yields:

[7 5 6 6 6 7 10 11 13 10 9 7]

Testing
Applying the test cases to the every_other yields the following output:
Line 181: check passed
Line 182: check passed
Line 183: check passed
Line 184: check passed
Line 185: check passed

Review and revise

A few final touches lead to this implementation:

// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
    int n = a_length(array); // length of input array
    int m = (n + 1) / 2; // length of result array
    Array result = array(int, m);
    // Iterate over the even indices in the original order.
    // Copy each element at an even index to the result array.
    for (int i = 0; i < n; i += 2) {
        assert(i < n);
        assert(i >= 0 && (i % 2) == 0);
        set(int, result, i / 2, get(int, array, i));
    }
    return result;
}

If we wish, we can now repeatedly half the data like this:

int main(void) {
    Array temps = ia_of_string(
        "7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
        "11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");

    Array a = every_other(temps);
    ia_println(a);

    while (a_length(a) > 1) {
        Array b = every_other(a);
        ia_println(b);
        a_free(a);
        a = b;
    }

    a_free(a);
    a_free(temps);
    return 0;
}

The output is

[7 5 6 6 6 7 10 11 13 10 9 7]
[7 6 6 10 13 9]
[7 6 13]
[7 13]
[7]

10.3 Reducing an Array to a Single Value

Your task is now to find the average temperature during selectable periods of the day. These periods could be morning, afternoon, evening, etc. They can be precisely defined by specifying the start hour and end hour of the chosen period. Computing the temperature from the input array and the period is a larger task that can be broken down into three subtasks:

Compute a subarray for the given period.
Compute the sum of the elements in the subarray.
Divide the sum by the number of elements in the subarray.

These three subtasks could conceivably also be written as a single function. Choosing a suitable granularity for functions is part of the design decisions that a programmer has to make. A function should be responsible for a single, well-defined task only. Defining smaller function often pays off later, when such functions can be used again. For example, a function for creating a subarray is probably useful in many future programming tasks. The same is true for a function that computes the sum of the elements in an array. The decision is up to the programmer. But here, we will implement these three functions separately. This example is contained in array_mean.c.

10.3.1 Compute a subarray

Problem statement
Compute a subarray of a given array.
Function signature
The function takes an array and information about which part copy. The latter could be specified with a start index and a length, or with a start index and an end index. We choose to use start index and end index and specify that the start index is inclusive and the end index is exclusive. This is a very common convention.
// Array(int), int, int -> Array(int)
Function name
subarray
Function header
Array subarray(Array array, int start_index, int end_index);
Function stub
Array subarray(Array array, int start_index, int end_index) {
return array(int, 0); // an empty array
}
Purpose statement
// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.

Examples and expected results (test cases)

static void subarray_test(void) {
    Array array, array2, sub;
    array = ia_of_string("1 2 3 4");

    sub = subarray(array, 0, a_length(array));
    // [0, len) should give the original array
    ia_check_expect(array, sub);
    a_free(sub);

    sub = subarray(array, -1, a_length(array) + 1);
    // [-1, len + 1) should also give the original array
    ia_check_expect(array, sub);
    a_free(sub);

    array2 = ia_of_string("2 3 4");
    // [1, len) should drop the first element
    sub = subarray(array, 1, a_length(array));
    ia_check_expect(array2, sub);
    a_free(array2);
    a_free(sub);

    array2 = ia_of_string("2 3");
    // [1, len - 1) should drop the first and last element
    sub = subarray(array, 1, a_length(array) - 1);
    ia_check_expect(array2, sub);
    a_free(array2);
    a_free(sub);

    array2 = ia_of_string("");
    // [1, 1) is an empty range
    sub = subarray(array, 1, 1);
    ia_check_expect(array2, sub);
    a_free(sub);

    sub = subarray(array, 2, 1);
    // [2, 1) is an empty range as well
    ia_check_expect(array2, sub);
    a_free(array2);
    a_free(sub);

    a_free(array);
}

int main(void) {
    subarray_test();
    return 0;
}

Again, the check function ia_check_expect compares two integer arrays, element by element.

Verify that the program compiles. Think about the examples. Do they cover the important cases? Add more examples if necessary.

Function body

We start with the recipe from the previous chapter.

// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
    // <from recipe for loops>
    // purpose
    for (int i = initial_value; condition_on_i; update_i) {
        assert(condition_on_i);
        assert(condition_on_i_init_update);
        printiln(i); // for inspecting the iteration variable
        // statements ...i...;
    }
    return array(int, 0);
}

// Iterate over the elements of the array from start_index
// (inclusive) to end_index (exclusive).
// Copy element array[start_index] to result[0].
// Copy element array[start_index + 1] to result[1].
// Copy element array[start_index + 2] to result[2].
// ...
// Copy element array[start_index + n] to result[n].
// Where n is end_index - start_index - 1.

Since there are two arrays involved (the input array and the result array) we need two indices. We call them i (for iterating over the input array) and j (for iterating over the result array). The initialization expression becomes int i = start_index, j = 0. The update expression update_i becomes i++, j++. The run condition becomes i < end_index. The end index is exclusive, thus we have to stop before it is reached. We modify printiln(i) to show both i and j. Filling in these gaps leads to:

// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
    // <from recipe for loops>
    // purpose
    for (int i = start_index, j = 0; i < end_index; i++, j++) {
        assert(i < end_index);
        assert(i >= start_index);
        printf("i = %d, j = %d\n", i, j);
        // statements ...i...;
    }
    return array(int, 0);
}

Now, compile the program and run it. For the above test cases, the output should be these indices and test results (likely except the actual line numbers):

i = 0, j = 0
i = 1, j = 1
i = 2, j = 2
i = 3, j = 3
Line 53: check_expect_int_array: Actual length 4 differs from expected length 0
i = -1, j = 0
i = 0, j = 1
i = 1, j = 2
i = 2, j = 3
i = 3, j = 4
i = 4, j = 5
Line 57: check_expect_int_array: Actual length 4 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
i = 3, j = 2
Line 62: check_expect_int_array: Actual length 3 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
Line 68: check_expect_int_array: Actual length 2 differs from expected length 0
Line 74: check passed
Line 78: check passed

There are some invalid indices, like -1. In general, you should check input arguments for allowed values. Let us first think about when to return an empty array. An empty array should be returned if

start_index >= end_index or
start_index >= a_length(array) or
end_index <= 0.

If none of these are given, we still need to check whether start_index < 0 and end_index > a_length(array). In any of these cases, we set the indices to the minimum and maximum allowed values, respectively. The result is:

// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
    if (start_index >= end_index ||
        start_index >= a_length(array) ||
        end_index <= 0)
    {
        return array(int, 0);
    }
    assert(start_index < end_index);
    assert(start_index < a_length(array));
    assert(end_index > 0);
    if (start_index < 0) start_index = 0;
    if (end_index > a_length(array)) end_index = a_length(array);
    for (int i = start_index, j = 0; i < end_index; i++, j++) {
        assert(i < end_index);
        assert(i >= start_index);
        printf("i = %d, j = %d\n", i, j);
        // statements ...i...;
    }
    return array(int, 0);
}

The output now only contains valid indices:

i = 0, j = 0
i = 1, j = 1
i = 2, j = 2
i = 3, j = 3
Line 53: check_expect_int_array: Actual length 4 differs from expected length 0
i = 0, j = 0
i = 1, j = 1
i = 2, j = 2
i = 3, j = 3
Line 57: check_expect_int_array: Actual length 4 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
i = 3, j = 2
Line 62: check_expect_int_array: Actual length 3 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
Line 68: check_expect_int_array: Actual length 2 differs from expected length 0
Line 74: check passed
Line 78: check passed

The remaining task is to create a result array of length end_index - start_index, to read from the input array and to copy to the output array:

// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
    if (start_index >= end_index ||
        start_index >= a_length(array) ||
        end_index <= 0)
    {
        return array(int, 0);
    }
    assert(start_index < end_index);
    assert(start_index < a_length(array));
    assert(end_index > 0);
    if (start_index < 0) start_index = 0;
    if (end_index > a_length(array)) end_index = a_length(array);
    int n = end_index - start_index;
    Array result = array(int, n);
    for (int i = start_index, j = 0; i < end_index; i++, j++) {
        assert(i < end_index);
        assert(i >= start_index);
        int value = get(int, array, i); // read from input array
        set(int, result, j, value); // write to output array
    }
    return result;
}

Testing
All test cases pass:
Line 53: check passed
Line 57: check passed
Line 62: check passed
Line 68: check passed
Line 74: check passed
Line 78: check passed
Review and revise
Check whether you find aspects in the code that can be simplified.

10.3.2 Compute the sum of the array elements

Computing the sum of the elements is simple enough that we omit the full recipe and show the final function. As an exercise, and without the solution in front of you, follow the steps of the recipe and check whether you arrive at a similar solution.

// Computes the sum of the elements in an array.
int ia_sum(Array array) {
    int sum = 0;
    for (int i = 0; i < a_length(array); i++) {
        sum = sum + get(int, array, i);
    }
    return sum;
}

10.3.3 Compute the average of the array elements

This function reuses the two functions just defined. The solution is easy. Just divide the sum by the number of elements in the list. The main challenge lies in handling empty arrays. In this case division is not defined. There are multiple strategies to handling this. The most drastic one is to end the program. Another one would be to define a special output type that can either represent the average value or nothing, depending on whether the input array is empty. We start with the drastic version:

// Computes the average of the elements in an array.
int ia_average(Array array) {
    int n = a_length(array);
    if (n == 0) {
        printsln("Impossible. Bye.");
        exit(1); // terminate the program
    }
    int sum = ia_sum(array);
    return sum / n;
}

10.3.4 Compute the average for a range of array elements

Problem statement
Time to get back to our original problem: Computing the average temperature of a selectable timespan over a day.
Function signature
The function takes an array and the start and end indices of the relevant part.
// Array(int), int, int -> int
Function name
subarray_average
Function header
int subarray_average(Array, int, int);
Parameter names
int subarray_average(Array array, int start_index, int end_index);
Function stub
int subarray_average(Array array, int start_index, int end_index) {
return 0;
}
Purpose statement
// Computes the average over the specified subarray.
// The start index is inclusive, the end index is exclusive.

Examples and expected results (test cases)

static void subarray_average_test(void) {
    Array array;
    array = ia_of_string("1 2 3 4");
    check_expect_i(subarray_average(array, 0, 1), 1);
    check_expect_i(subarray_average(array, 3, 4), 4);
    check_expect_i(subarray_average(array, 0, 4), 2);
    check_expect_i(subarray_average(array, 1, 4), 3);
    a_free(array);
}

int main(void) {
    subarray_average_test();
    return 0;
}

Verify that the program compiles. Think about the examples. Do they cover the important cases? Add more examples if necessary.

Function body

The function body is quite simple, because we can use the functions defined above.

// Computes the average over the specified subarray.
// The start index is inclusive, the end index is exclusive.
int subarray_average(Array array, int start_index, int end_index) {
    Array sub = subarray(array, start_index, end_index);
    int average = ia_average(sub);
    a_free(sub);
    return average;
}

Testing
All test cases pass:
Line 210: check_expect_int passed
Line 211: check_expect_int passed
Line 212: check_expect_int passed
Line 213: check_expect_int passed
Review and revise
The function seems to be in good shape.

Let’s query a few temperatures:

int main(void) {
    Array temps = ia_of_string(
        "7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
        "11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");

    prints("Average morning temperature: ");
    printiln(subarray_average(temps, 6, 10));

    prints("Average forenoon temperature: ");
    printiln(subarray_average(temps, 10, 13));

    prints("Average afternoon temperature: ");
    printiln(subarray_average(temps, 13, 17));

    prints("Average evening temperature: ");
    printiln(subarray_average(temps, 17, 22));

    return 0;
}

The output is:

Average morning temperature: 6
Average forenoon temperature: 8
Average afternoon temperature: 11
Average evening temperature: 9

10.4 Optional Results

The ia_average function given above just ends the execution if it is called with an empty array. This behavior is reasonable, but is not apparent from the signature of the function (Array(int) -> int).

// Computes the average of the elements in an integer array.
int ia_average(Array array) {
    int n = a_length(array);
    if (n == 0) {
        printsln("Impossible. Bye.");
        exit(1);
    }
    int sum = ia_sum(array);
    return sum / n;
}

There is no way to signal the exceptional condition of an empty array to the caller, because the return type is integer and any integer could be the result of an averaging operation.

The solution is to use an option type as the return type of the function. The option type signals to the programmer that this function may return nothing (when the array or range are empty). The signature of the function becomes

Array(int), int, int -> IntOption

// Array(int), int, int -> IntOption
// Computes the average over the specified subarray.
// The start index is inclusive, the end index is exclusive.
IntOption subarray_average2(Array array, int start_index, int end_index) {
        if (start_index >= end_index) {
                return make_int_none();
        }
    Array sub = subarray(array, start_index, end_index);
    int average = ia_average(sub);
    a_free(sub);
    return make_int_some(average);
}

If the selection is empty, then the function returns a special int-none value, which stands for the absence of a regular result. If the selection is not empty, it returns an int-some value, which encapsulates the computed result. The caller may now inspect the result and act according to these two cases:

IntOption io = subarray_average2(temps, 17, 17);
if (io.none) {
printsln("none");
} else {
printiln(io.some);
}

As you may have noticed, an IntOption is an example of an Itemization type (see Recipe for Itemizations). It could also be implemented as a Variant type (see Recipe for Variant Data (Sum Types)). It is defined in basedefs.h.

typedef struct IntOption {
bool none;
int some;
} IntOption;

The constructor functions for this type are defined in base.c as

IntOption make_int_none(void) {
    IntOption op = { true, 0 };
    return op;
}

IntOption make_int_some(int some) {
    IntOption op = { false, some };
    return op;
}

10.5 Filtering Arrays

The function every_other may alternatively be written by using a filter function. A filter function takes the array and a predicate function. A predicate function is Boolean-valued. The filter function calls the predicate function for each element of the array. The predicate function returns true for elements that should be kept and false for elements that should be filtered out.

bool even_index(int element, int index, int x) {
return (index % 2) == 0; // % is the modulo operator
}

Array every_other(Array array) {
return ia_filter(array, even_index, 0);
}

The filter function ia_filter expects a predicate function with signature

int int int -> bool

The first parameter is the element’s value. It is not used in this example. The second parameter is the elemen’s index in the array. The third parameter is the value that was supplied to the ia_filter function as the last parameter (0 in this example). It is also not used in this example.

In the call ia_filter(array, even_index, 0) the memory location of the predicate function even_index is provided to ia_filter, so that ia_filter can call even_index for each element.

10.6 All in One Function

This section is optional material.

A potential issue with subarray_average as defined above is that the subarray is allocated just to compute the average, only to be thrown away immediately afterwards.

int subarray_average(Array array, int start_index, int end_index) {
    Array sub = subarray(array, start_index, end_index);
    int average = ia_average(sub);
    a_free(sub); // <--- throw away
    return average;
}

Let us compare the run time of this function with an alternative function in which everything is contained in a single function and no temporary subarray is allocated. The test program is compiled with optimization switched on (gcc -O2...).

unsigned long time_averaging(int n) {
    Array array = ia_range(0, n); // fill with 0, 1, ..., n - 1
    unsigned long x = 0;
    for (int i = 1; i < n; i++) {
        x += subarray_average(array, 0, i);
    }
    a_free(array);
    return x;
}

int main(void) {
    time_function(   time_averaging(50000)   );
    return 0;
}

The timimg results (given in milliseconds) are:

Variant		Time (ms)
original, with assertions, get		4545
three functions, no assertions, get		1330
three functions, no assertions, subscripts		485
three functions, no assertions, memcpy		333
one function, no assertions, get		723
one function, no assertions, subscripts		111

This is the single-function version with subscripts:

int subarray_average(Array array, int start_index, int end_index) {
    int n = array->n;
    if (start_index >= end_index ||
        start_index >= n ||
        end_index <= 0)
    {
        printsln("Cannot do this. Bye.");
        exit(1);
    }
    if (start_index < 0) start_index = 0;
    if (end_index > n) end_index = n;
    n = end_index - start_index;
    int sum = 0;
    int *p = array->a;
    for (int i = start_index; i < end_index; i++) {
        sum += p[i];
    }
    return sum / n;
}

This shows that accessing array elements via subscripting [ ] is much faster than using function get. The variant with subscripts takes only 36% of the running time (485 ms) compared to the getter function (1330 ms). The benefit of the latter is that it checks array bounds on each access. Subscripting of C arrays does not perform bounds checking and allows accessing arbitrary memory locations beyond the limits of the array. Furthermore, the unnecessary creation of a temporary array triples the running time (333 ms vs 111 ms). The memcpy function quickly copies contiguous blocks of memory. Typically, for loop bodies that get executed very often, it is beneficial to optimize for performance.

10.7 Exercises

Write function that returns the smallest element of an integer array.
Write function that computes the sum of all values of an integer array that are larger than their immediate predecessor in the array.
Compute the number of zero crossings in an array. A zero crossing is a transition from negative to positive values, or vice versa.

11 Memory in C

11.1 Introduction

In C there are three different ways of handling memory:

Automatic storage (call stack)
Static storage
Dynamic storage (heap)

When a program ends, all of its memory is released. (A running program is a process. When the process terminates, the operating system releases all of the resources associated with that process.)

11.2 Automatic Storage

Automatic storage refers to the call stack. Non-static local variables and function parameters are stored on the call stack. They are automatically allocated when a block is entered and automatically destroyed when a block is left. Their lifetime is thus linked to the block in which they are declared. This allows for function recursion as space for the new stack frame is allocated on the call stack: Arguments are pushed on the stack and space for local variables is set aside on the stack. Moreover, the return address to the instruction after the call is stored on the call stack.

Automatic storage is efficient, has clear semantics, and supports function argument passing well. One issue is that the stack may overflow if it is not large enough to hold all of the stack frames for currently active functions. This can become a problem mainly for recursive calls.

11.3 Static Storage

Static storage refers to variables that are allocated at the file level or as static local variables. The lifetime of such variables equals the lifetime of the program. The size of static storage needs to be known at compile time. This kind of storage is suitable for long-lived and “global” entities.

Static storage reduces the need for explicitly declared function parameters. However, functions that rely on static storage are more difficult to test and debug as they do not only rely on their parameters. Their behavior cannot be understood from their parameters alone. They are not necessarily functions in the mathematical sense.

11.4 Dynamic Memory

Dynamic memory (also known as free store or dynamic heap) provides more flexibility than automatic and static storage. Dynamically allocated memory is accessed through pointers. Blocks of dynamic memory are allocated with malloc, xmalloc, calloc, and xcalloc. They are released with free. These functions are declared as:

void* malloc(int n);
void* xmalloc(int n);
void* calloc(int n, int size);
void* xcalloc(int n, int size);
void free(void *p);

malloc and xmalloc take the length in bytes of the memory block needed and return a pointer to the start of the new memory block. calloc and xcalloc take the number of elements and the element size in bytes to allocate a block of length number * size bytes. The c in calloc/xcalloc stands for clear – each byte of the allocated block is set to 0. Allocated blocks are simply cast to the pointer type needed and can be used until explicitly released with free.

With (x)malloc / (x)calloc and free the program explicitly manages the lifetime of a block of memory in the program. This is very flexible and allows to handle memory objects whose size is only known at runtime, e.g., because it depends on user input or on unknown amounts of data received over the network. However, a big problem associated with manual dynamic memory allocation is to correctly free allocated memory. If the program fails to release memory that is no longer needed, then the program has memory leaks. Because dynamic memory is only accessible via pointers, if a pointer variable that holds the only reference to an allocated block of memory is accidentally being overwritten, the block cannot be accessed again.

11.4.1 Dynamic Memory Example

Sometimes variable amounts of memory are required. The exact amount may depend on user input and may not be available at compile time. In such situations dynamic memory allocation is helpful. An example would be a function function String stars(int n), which produces a String of n ‘*’ characters. So for example, stars(2) produces “**” and stars(8) produces “********”. Dynamic memory allocation allows requesting memory as needed. For stars this is n + 1 characters: n repetitions of the ‘*’ character followed by a ‘\0’ character to terminate the String. The complete function including test cases looks like this (see stars.c):

/*
Compile: make stars
Run: ./stars
*/

#include "base.h"
#include "string.h"

// note: typedef char* String;

String stars(int n) {
    char *s = xmalloc(n + 1);
    // or: char *s = xcalloc(n + 1, 1);
    for (int i = 0; i < n; i++) {
        s[i] = '*';
    }
    s[n] = '\0';
    return s;
}

void stars_test(void) {
    String s;
    check_expect_s(s = stars(0), "");
    free(s);
    check_expect_s(s = stars(1), "*");
    free(s);
    check_expect_s(s = stars(2), "**");
    free(s);
    check_expect_s(s = stars(3), "***");
    free(s);
}

int main(void) {
    base_set_memory_check(true);
    stars_test();
    return 0;
}

11.4.2 malloc vs. xmalloc

malloc and calloc either return a pointer to the first byte of the allocated memory block or NULL. On modern operating systems and computers with huge amounts of memory, getting NULL when trying to allocate memory is a very rare case, but it nonetheless needs to be handled in production code. This makes the code harder to read and write. The typical pattern of checking for NULL is:

int* p;
if ((p = malloc(100 * sizeof(int))) == NULL) {
// handle error
}
// use allocated memory

In contrast, xmalloc and xcalloc exit the program if a request for dynamic memory cannot be fulfilled. These functions never return NULL, which obviates the need to check for this case. However, xmalloc and xcalloc are non-standard extensions. They are used in the Programming I C Library for debugging. They are defined to keep track of which memory blocks have been dynamically allocated and tell the programmer if they have not been correctly released when the program exits. These implementations of xmalloc and xcalloc should not be used in production code, because the program has no way of reacting to scarce memory resources gracefully (e.g. by saving the current state to a file). Several implementations of xmalloc and xcalloc allow for defining an error handling function that is called in the event of an out-of-memory error before exiting.

11.4.3 The “Heap”

The following shows a simplified model of how blocks of memory are dynamically allocated and deallocated on the heap (also called free store). The example assumes that the heap consists of 16 bytes of memory, all of which are initially free. The diagrams show the effects of differenct calls to malloc, xmalloc, and free on the heap.

Byte *p1 = xmalloc(4); // returns 1

Requesting 4 bytes in the initial state yields address 1. The block extends from address 1 to 4.

Byte *p2 = xmalloc(3); // returns 5

Requesting a block of 3 bytes yields address 5. This block extends from address 5 to 7.

Byte *p3 = xmalloc(2); // returns 8

Requesting a block of 2 bytes yields address 8. This block extends from address 8 to 9.

free(p2); // p2 == 5

The block from address 5 to 7 is now free to be allocated again.

Byte *p4 = xmalloc(4); // 10

Requesting a block of 4 bytes yields address 10. This block extends from address 10 to 13. (The free block from address 5 to 7 is not large enough.)

Byte *p5 = malloc(4); // returns NULL

Using malloc to tequest another block of 4 bytes yields NULL, because there is no contiguous block of 4 or more bytes available. xmalloc(4) would have exited the program.

free(p3); // p3 == 8

The block from address 5 to 9 is now free to be allocated again.

Byte *p6 = xmalloc(4); // returns 5

Again requesting a block of 4 bytes yields address 5, because the first contiguous block of 4 or more bytes starts at address 5. The newly allocated block extends from address 5 to 8.

Byte *p7 = xmalloc(2); // returns 14

Requesting a block of 2 bytes yields address 14, because the first contiguous block of 2 or more bytes starts at address 14. The newly allocated block extends from address 14 to 15.

12 Lists: Variable-Size Collections

Lists are collections of variable size. Elements can be added to and removed from lists as necessary. This is in contrast to arrays, whose size is specified on creation and does not change afterwards. Lists are very flexible and versatile. For example, lists allow for efficient insertion between existing elements. However, accessing an element at a given position is less efficient than for arrays. A chain of pointers has to be followed to get to the desired element. Like arrays, lists allow treating a number of elements as a whole, with operations that are applied to each element of the list.

The Programming I C Library introduces lists for elements of various types. The documentation of these lists is available online and in the header files, list.h, int_list.h, double_list.h, string_list.h, and pointer_list.h, respectively. Test examples and function implementations can be found in the corresponding .c files.

The implementation of linked lists typically relies on pointers and dynamic memory, as introduced in the previous chapter.

12.1 Filtering Strings

Often the final size of a collection is not known at the start of an algorithm. An example would be an algorithm that processes a collection of words and only keeps those words that contain a particular substring. Initially it is unclear what size the filtered collection will have. Allocating an array of the same size as the original word collection would be wasteful. A list is much more suitable in this case. Words on the input collection are inspected one at a time. If they contain the substring (or match some other relevant criterion) they are appended to the output list. The output list grows with each added element. A possible implementation of a function that solves this problem is derived in the following (see keep_strings.c). The include directives are:

#include "base.h"
#include "string.h"
#include "list.h"
#include "string_list.h"

Problem statement
The function should keep those words of a collection of words that contain a given substring.
Data definition
The input collection and filtered result collection are Lists with elements of type String.
Function signature
The function we look for takes a List with elements of type String and a substring that desired words need to contain. The function returns the List filtered words.
// List(String), String -> List(String)
Function name
keep_if_contains
Function header
List keep_if_contains(List list, String part)
Function stub
List keep_if_contains(List list, String part) {
return sl_create(); // empty String list
}
Purpose statement
// Returns a list of those words of the input list that contain part.

Examples and expected results (test cases)

static void keep_if_contains_test(void) {
    // split sentence into list of words
    List a = sl_split("Lists are collections of variable size.", ' ');
    List expected = sl_split("Lists collections variable size.", ' ');
    List actual = keep_if_contains(a, "i");
    // "i" is part of every word of this sentence, so keep all
    sl_check_expect(actual, expected);
    sl_free(actual); l_free(expected);

    expected = sl_split("are variable", ' ');
    actual = keep_if_contains(a, "ar");
    // "ar" is only contained in words "are" and "variable"
    sl_check_expect(actual, expected);
    sl_free(a); sl_free(actual); l_free(expected);
}

The function sl_split splits strings at the given character (here: the space character). The check function sl_check_expect checks for equality of two string lists. To be considered equal both string lists need to have the same length and the elements at each index position need to be equal.

At this stage the program compiles, but the tests will fail when running the program.

Function body

The first approach just follows the chain of list nodes to check each element. Unfortunately, this exposes the Node structure.

List keep_if_contains(List list, String part) {
    List result = sl_create();
    for (StringListNode *node = list->first; node != NULL; node = node->next) {
        if (s_contains(node->value, part)) {
            sl_append(result, node->value);
        }
    }
    return result;
}

The function s_contains returns true if (and only if) the string given as the second argument (part) is contained in the string that is given as the first argument. The function sl_append appends a string to a list of strings.

Testing
The test cases pass.
keep_strings.c, line 63: check passed
keep_strings.c, line 68: check passed
Both tests passed!

Review and revise

An Iterator may be used to process the List. An iterator stores the current iteration state. In the case of Lists this means that it stores a pointer to the next node. There is a constructor function to create an Iterator from a List (l_iterator), a function to check whether there are more elements (l_has_next), and a function to advance to the next element and return its value (sl_next). The next-value-function is specific to the type of the list: Its return type corresponds to the elmenent type of the list. There are next-value-functions for int-lists (il_next), double-lists (dl_next), String lists (sl_next), and general lists (l_next). The latter just returns a pointer to the beginning of the list element. The next-value functions change the state of the iterator, because they advance to the next node. Hence, these functions take the address of the iterator – they use the iterator as an in/out argument. Iterators are not dynamically allocated and hence do not have to be released.

List keep_if_contains(List list, String part) {
    List result = sl_create();
    ListIterator iter = l_iterator(list);
    while (l_has_next(iter)) {
        String s = sl_next(&iter);
        if (s_contains(s, part)) {
            sl_append(result, s);
        }
    }
    return result;
}

Because filtering arrays is such a common case, there is a function sl_filter that allows for formulating filtering criteria very concisely. It takes the list, the criterion, and an optional parameter. This leads to a third possibility to process the list.

static bool contains(String element, int index, String x) {
return s_contains(element, x);
}

List keep_if_contains(List list, String part) {
return sl_filter(list, contains, part);
}

Again, each list element that contains the given part is included in the result. The s_contains function cannot be used directly, because its signature does not match to what the sl_filter function expects (the index parameter is missing).

13 Style

Style and naming conventions can be a controversial issue. The primary goal of establishing naming conventions is to make code easier to read. Source code is typically more often read than written. Programs should be written as clearly and consistently as possible. Consistent use of naming conventions and consistent layout of source code is a sign of high-quality work. A few recommendations follow.

13.1 Naming Conventions

Use short, common, and non-abbreviated names.
Use abbreviations sparingly.
Use very short variable names only as local variables, and if their meaning is clear from the context.
Use i, j, k as loop indices.
Do not make the type name part of the variable name, e.g., center rather than centerPoint.
Use plural for collections of items, e.g., names rather than nameArray.
Use lower-case function, variable, and file names.
Use underscore as a word separator, e.g., hours_to_wages. Note that this is a C convention. Java often uses camel case (e.g., hoursToWages).
Use CAPITALIZED_WITH_UNDERSCORES for constants.
Use CamelCase with an upper-case first letter for custom-defined type names, e.g., TrackingOrder.
Don’t use a special prefix for pointers, e.g. simply list rather than pList
Avoid l as a variable name and O as a type name as they are easily confused with 1 and 0, respectively.

13.2 Coding Style

Properly indent code. This is extremely important for readability.
Consistently indent code. Do not mix tab characters and spaces. The typical tab width is 4 spaces. Using spaces avoids problems with different editor settings.
Write spaces around operators, e.g., 1 + 2 * 3 instead of 1+2*3.
Use empty lines to visually structure blocks of code.
Write an open brace as the last character of a line, not as the first character of the following line.
Always use braces.

https://www.doc.ic.ac.uk/lab/cplus/cstyle.html

http://users.ece.cmu.edu/~eno/coding/CCodingStandard.html

13.3 camelCase vs. under_score

Dave Binkley, Marcia Davis, Dawn Lawrie, Christopher Morrell. To CamelCase or Under_score. IEEE 17th International Conference on Program Comprehension, 2009. ICPC ’09. (IEEE): 158–167.

Abstract: "The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style."

Bonita Sharif and Jonathan I. Maletic. An Eye Tracking Study on camelCase and under_score Identifier Styles. IEEE 18th International Conference on Program Comprehension, 2010. ICPC ’10. (IEEE): 196–205.

Abstract: "An empirical study to determine if identifier-naming conventions (i.e., camelCase and under_score) affect code comprehension is presented. An eye tracker is used to capture quantitative data from human subjects during an experiment. The intent of this study is to replicate a previous study published at ICPC 2009 (Binkley et al.) that used a timed response test method to acquire data. The use of eye-tracking equipment gives additional insight and overcomes some limitations of traditional data gathering techniques. Similarities and differences between the two studies are discussed. One main difference is that subjects were trained mainly in the underscore style and were all programmers. While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly."

http://en.wikipedia.org/wiki/CamelCase#cite_note-38

14 Error Messages

It is a helpful skill to be able to interpret error messages from the compiler. The following gives examples of common gcc error messages and gives advice on how to interpret them.

14.1 Compiler Error Messages

14.1.1 Missing ;

Output:

overtime.c:14:29: expected ';' after top level declarator
const int WEEKLY_HOURS = 40
^

Description:

A ; is missing (line 14, column 29).

Solution:

Add a ; to the end of the line.

14.1.2 Missing function declaration

Output:

overtime.c: In function 'hours_to_overtime_test':
overtime.c:11:20: implicit declaration of function 'hours_to_oxertime'
check_expect_i(hours_to_oxertime(20), 0);
^

Description:

Unknown function name hours_to_oxertime (line 11, column 20).

Solution:

In this case: Correct the function name. In general: Insert a function declaration (function header) above the first call of the function.

14.1.3 Library missing, in wrong location, or not compiled

Output:

overtime.c: In function 'hours_to_overtime_test':
overtime.c:11: undefined reference to 'printiln'

Description:

No implementation found for function printiln (line 11).

Solution:

Put prog1lib in the right directory (see slides) and/or compile prog1lib using make.

14.1.4 No return value

Output:

noreturn.c:8:1: control may reach end of non-void function [-Wreturn-type]

Source code:

#include "base.h"

String greeting(int type) {
    if (type == 1) {
        return "Hello";
    }
    // to solve error: return "Moin";    <-- line 7
}   <-- line 8

int main(void) {
    printsln(greeting(1));
    printsln(greeting(2));
    return 0;
}

Description:

At least one execution path does not return a value of the specified type String (line 8).

Solution:

Return a value of the specified type on each possible execution path. Here: uncomment line 7: return "Moin";

1	Introduction
2	Recipe for Atomic Data
3	Evaluating Functions Without Side-Effects
4	Recipe for Enumerations
5	Recipe for Intervals
6	Recipe for Itemizations
7	Recipe for Compound Data (Product Types)
8	Recipe for Variant Data (Sum Types)
9	Doing Things Repeatedly
10	Arrays: Fixed-Size Collections
11	Memory in C
12	Lists: Variable-Size Collections
13	Style
14	Error Messages