Design Recipes in C
Michael Rohs, October 21, 2018
The following design recipes and some of the given code examples are inspired by Matthias Felleisen et al.: How to Design Programs, Second Edition, Joe Politz: Design Recipes for Pyret Functions, and Gregor Kiczales: Introduction to Systematic Program Design.
1 Introduction
1.1 From Problem Statement to Solution
The aim of the design recipes presented below is to help you to get from a textual problem statement to a well-organized and reliable solution. The design recipes are strongly inspired by How to Design Programs, Second Edition. The recipes presented below are available for different forms of data – from simple atomic data, like integer numbers, to arbitrary-length lists. The examples may seem simple and you may be tempted to solve them without following each of the steps presented below. We still urge you to follow these steps to learn and automate them and to be prepared to solve more complicated programming problems later on. Each step produces a well-defined intermediate product and makes it less likely that you stare at a blank screen, not knowing how to start the solution. The main steps in the design recipes are:
Problem statement
Data definition
Purpose statement and function header
Examples
Implementation
Test and revision
In the problem statement step the programmer analyzes the problem, extracts its essence and represents it as a textual description. The problem statement may already be given, but even then you need to analyze it to extract useful information.
The data definition step is about choosing the data types that your program uses to represent the real-world information from the problem statement. Example data types are integer numbers, text, and Boolean values (true, false). The data definition expresses how you wish to represent real-world information as data in the program. An example is real-world temperatures measured in degrees Celsius are represented as numbers in the program. Conversely, the data definition may state how to interpret data items found in the program. For example, the interpretation of the number is degrees Celsius. The following figure illustrates the mediating role of the data definition. (Note that these notions of data and information differ from others, e.g., in information theory.)
The purpose statement describes the effect of the program on the input data. We here focus on the program as an algorithm or function that takes some kind of input data and produces some kind of result data. An example purpose statement would be: “Takes a temperature value in degrees Celsius and returns the corresponding value in degrees Fahrenheit.” A closely related step is to come up with a descriptive name for the algorithm, as well as the names and types of the parameters (input) and the type of the result (output). In the C programming language this requires writing a function header in a specific way. An example header for a function that converts degrees Celsius to degrees Fahrenheit would be:
int celsius_to_fahrenheit(int celsius);
The function name (celsius_to_fahrenheit) is used to call this function, i.e., to apply it to a specific value. The parameter named celsius has type integer number. The result type is an integer number as well. This function maps an integer number to another integer number:
int -> int
The next step is to come up with examples of what you expect the algorithm (here: function) to do on a couple of inputs of your choice. For example, you may know that if you call celsius_to_fahrenheit with 0 °C, you expect a result of 32 °F (the conversion formula is °F = °C * 9 / 5 + 32):
Given: 0, expect: 32
Given: 10, expect: 50
Given -5, expect: 23
Given: 100, expect: 212
Coming up with these examples is trivial in this case, as the formula is known and can directly be written as one line in C, but in general it very much helps to mentally prepare the subsequent implementation. The examples will also serve as test cases later on. The examples should be chosen in such a way that they cover typical cases and boundary cases. More on this later.
With the function header and several examples the implementation step is greatly simplified. The implementation defines how to compute the result from the parameters. Depending on the complexity of the data type it might be helpful to first write down a template of the implementation. For simple data types and problems, the implementation can often be directly written down. For the above example the formula can directly be translated to a return statement in C.
int celsius_to_fahrenheit(int celsius) {
return celsius * 9 / 5 + 32;
}
The test and revision step involves checking whether the implementation produces the expected results. It also involves revising the function to correct or improve it. One (naive) option is to call the function on the examples, let the program print the function results on the console, and then manually check the output:
printiln(celsius_to_fahrenheit(0)); // output: 32
printiln(celsius_to_fahrenheit(10)); // output: 50
printiln(celsius_to_fahrenheit(-5)); // output: 23
printiln(celsius_to_fahrenheit(100)); // output: 212
The function printiln takes an integer number as its parameter, prints it to the console, and then prints a line break. The functions are applied inside-out, i.e., first celsius_to_fahrenheit is applied to 0, the result 32 is returned, and then printiln is applied to 32.
The manual solution is problematic, because the programmer has to remember to check – after each change of the function – whether the output still corresponds to the expected values. For the above implementation this is indeed the case as the console output of the program is:
32
50
23
212
check_expect_i(actual, expected);
check_expect_i(celsius_to_fahrenheit(0), 32);
check passed
actual value 99 differs from expected value 32
The last and final sub-step is to revise the function. If the function fails a test case, the need for a revision is obvious. But even if the function is correct, there might be an opportunity to implement it more clearly or to give it a more descriptive name.
The test cases should always remain in the implementation file and not be deleted after the implementation. That way, later changes can still be checked against the test cases. In fact, the test cases should go in their own function, named by appending _test to the function name, e.g., celsius_to_fahrenheit_test:
static void celsius_to_fahrenheit_test(void) {
// given 0, expect 32
check_expect_i(celsius_to_fahrenheit(0), 32);
// given 10, expect 50
check_expect_i(celsius_to_fahrenheit(10), 50);
// given -5, expect 23
check_expect_i(celsius_to_fahrenheit(-5), 23);
// given 100, expect 212
check_expect_i(celsius_to_fahrenheit(100), 212);
}
The static keyword means that this test functions can only be called from within the same C file. The keyword void as parameter and return type means that the test function does not take any parameters and does not return any results. Its only purpose is to output, for each test case, whether it passed or failed. The // is a comment to human readers of the source code and does not have any effect in the program.
The functions printiln and check_expect_i are part of the Programming I C Library.
1.2 Programming I C Library
The Programming I C Library defines basic data types and functions to simplify C programming for beginners and to enable the step-by-step design approach outlined above. You can download the library with examples from this script and from the lecture.
1.3 Structure
The rest of this script is organized along increasingly complex kinds of data. In each week of the lecture we will cover one or two of these design recipes and apply them to the respective kinds of data. Elements of the C programming language will be introduced as necessary. This approach avoids the blocked presentation of all of the C language features in one place. It might sometimes be necessary to use certain language elements before explaining them in every detail. However, by the end of the lecture you should have a reasonably complete overview of the C language and – much more importantly – of a systematic approach to programming.
Learning to program – similar to learning a sport or a musical instrument – requires time and practice. Don’t let initial problems frustrate you. Learning to program is an excellent way to sharpen your analytical skills. Analyzing a problem, exploring solution possibilities, and evaluating a solution are skills that are necessary in many domains, not only in computer programming.
The general recipe presented above is adapted to different kinds of data. This script covers these specific recipes:
Recipe for Atomic Data
Recipe for Enumerations
Recipe for Intervals
Recipe for Itemizations
Recipe for Compound Data (Product Types)
Recipe for Variant Data (Sum Types)
Doing Things Repeatedly: Recipes for loops and recursion
Arrays: Fixed-Size Collections
Lists: Variable-Size Collections
2 Recipe for Atomic Data
Atomic data is data that cannot be broken down further. Examples are integer (int) and floating point numbers (double), Boolean values (bool), and strings (String). It could be argued that text strings are not atomic as they can be further subdivided into characters (char). Yet we treat strings as atomic here.
wages.c serves as the example for atomic data.
2.1 Steps
2.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
/*
Design a function that computes weekly wages with overtime from the number of hours worked. The hourly rate is 10 €/hour. Regular working time is 40 hours/week. Overtime is paid 150% of the normal rate of pay.
*/
2.1.2 Data definition
Write how the real-world information (also called domain information) should be represented in the program. Conversely, write how to interpret the data (e.g., a number) as real-world information (e.g., degrees Celsius). Write type definitions (typedef) if this clarifies things. Alternatively, you may decide to use existing types directly (e.g., int, double, bool, or String) and describe the interpretation in a comment.
// int represents degrees Celsius
The informal data definition is ignored by the compiler. In the program you have to remember that the value of type int actually represents degrees Celsius. Alternatively, you may introduce a synonym (e.g., Celsius) for an existing datatype (like int) and use that synonym instead of the existing data type. Choose the new type name so as to clarify the interpretation of a value in the program. In C new names for types are defined using the typedef keyword. Examples of formal data definitions of this kind are:
typedef int Celsius; // represents the temperature in degrees Celsius
typedef double MyType; // interpretation...
typedef String MyType; // interpretation...
typedef char MyType; // interpretation...
For the wages example, the formal data definition is:
typedef int Hours; // represents hours worked
typedef int Cents; // represents wage in cents
A downside of using type definitions like this is that they hide the actual data type that is used to represent the value. For example, Celsius is here defined in terms of integral numbers. It would also be reasonable to represent fractions of degrees (typedef double Celsius). The synonym Celsius does not tell what the underlying data type is.
2.1.3 Function signature
// Hours -> Cents
// int -> int
2.1.4 Function name
hours_to_wages
2.1.5 Function header
Cents hours_to_wages(Hours hours);
int hours_to_wages(int hours);
2.1.6 Function stub
Cents hours_to_wages(Hours hours) {
return 0;
}
int hours_to_wages(int hours) {
return 0;
}
2.1.7 Purpose statement
Write down a purpose statement (given as a comment).
// Computes the wage in cents given the number of hours worked.
2.1.8 Examples and expected results
Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)
Examples:
For 0 hours worked, expect 0 cents.
For 20 hours worked, expect 20 * 1000 cents.
For 39 hours worked, expect 39 * 1000 cents.
For 40 hours worked, expect 40 * 1000 cents.
For 41 hours worked, expect 40 * 1000 + 1 * 1500 cents.
For 45 hours worked, expect 40 * 1000 + 5 * 1500 cents.
Corresponding test cases in test function:
void hours_to_wages_test() {
check_expect_i(hours_to_wages(0), 0);
check_expect_i(hours_to_wages(20), 20 * 1000);
check_expect_i(hours_to_wages(39), 39 * 1000);
check_expect_i(hours_to_wages(40), 40 * 1000);
check_expect_i(hours_to_wages(41), 40 * 1000 + 1 * 1500);
check_expect_i(hours_to_wages(45), 40 * 1000 + 5 * 1500);
}
The check_expect_i function takes two arguments: The actual result and the expected result. Each check will report the line number on which it appears. This helps to locate failed tests in the source code. The first argument contains the call to the function that is to be tested. This evaluates to the return value of the call.
The test function is invoked from the main function. The main function is the entry point of a C program.
int main(void) {
hours_to_wages_test();
return 0; // return success value to operating system
}
Running the above tests on the stub produces:
wages.c, line 20: check passed
wages.c, line 21: Actual value 0 differs from expected value 20000.
wages.c, line 22: Actual value 0 differs from expected value 39000.
wages.c, line 23: Actual value 0 differs from expected value 40000.
wages.c, line 24: Actual value 0 differs from expected value 41500.
wages.c, line 25: Actual value 0 differs from expected value 47500.
5 of 6 tests failed.
2.1.9 Function body
Implement the function body. Put required helper functions on a wish list. These will be implemented later.
How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.
The implementation of the example function does not need helper functions. However, it needs an if-statement to decide whether the working time includes overtime.
// Computes the wage in cents given the number of hours worked.
Cents hours_to_wages(Hours hours) {
if (hours <= 40) {
return hours * 1000;
} else {
return 40 * 1000 + (hours - 40) * 1500;
}
}
// Computes the wage in cents given the number of hours worked.
int hours_to_wages(int hours) {
if (hours <= 40) {
return hours * 1000;
} else {
return 40 * 1000 + (hours - 40) * 1500;
}
}
2.1.10 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.
Running the tests on the implemented function produces:
wages.c, line 20: check passed
wages.c, line 21: check passed
wages.c, line 22: check passed
wages.c, line 23: check passed
wages.c, line 24: check passed
wages.c, line 25: check passed
All 6 tests passed!
2.1.11 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests. The purpose statement should describe what the function computes (not how it does the computation) and should mention the given inputs and produced result. The test examples should cover corner cases and a typical case.
The complete source file wages.c now looks like this:
/*
Compile: make wages
Run: ./wages
*/
#include "base.h" // we use check_expect_i
/*
Design a function that computes weekly wages with overtime from the number of hours worked. The hourly rate is 10 €/hour. Regular working time is 40 hours/week. Overtime is paid 150% of the normal rate of pay.
*/
typedef int Hours; // represents hours worked
typedef int Cents; // represents wage in cents
// Hours -> Cents
Cents hours_to_wages(Hours hours); // function header
void hours_to_wages_test() {
check_expect_i(hours_to_wages(0), 0); // line 20
check_expect_i(hours_to_wages(20), 20 * 1000); // line 21
check_expect_i(hours_to_wages(39), 39 * 1000); // line 22
check_expect_i(hours_to_wages(40), 40 * 1000); // line 23
check_expect_i(hours_to_wages(41), 40 * 1000 + 1 * 1500);
check_expect_i(hours_to_wages(45), 40 * 1000 + 5 * 1500);
}
// Computes the wage in cents given the number of hours worked.
Cents hours_to_wages(Hours hours) { // function implementation
if (hours <= 40) {
return hours * 1000;
} else {
return 40 * 1000 + (hours - 40) * 1500;
}
}
int main(void) {
hours_to_wages_test();
return 0;
}
If you don’t like typedefs (to create synonyms of type names), you can omit them and use the underlying data type directly:
/*
Compile: make wages
Run: ./wages
*/
#include "base.h" // we use check_expect_i
/*
Design a function that computes weekly wages with overtime from the number of hours worked. The hourly rate is 10 €/hour. Regular working time is 40 hours/week. Overtime is paid 150% of the normal rate of pay.
*/
// int -> int
int hours_to_wages(int hours); // function header
void hours_to_wages_test() {
check_expect_i(hours_to_wages(0), 0); // line 20
check_expect_i(hours_to_wages(20), 20 * 1000); // line 21
check_expect_i(hours_to_wages(39), 39 * 1000); // line 22
check_expect_i(hours_to_wages(40), 40 * 1000); // line 23
check_expect_i(hours_to_wages(41), 40 * 1000 + 1 * 1500);
check_expect_i(hours_to_wages(45), 40 * 1000 + 5 * 1500);
}
// Computes the wage in cents given the number of hours worked.
int hours_to_wages(int hours) { // function implementation
if (hours <= 40) {
return hours * 1000;
} else {
return 40 * 1000 + (hours - 40) * 1500;
}
}
int main(void) {
hours_to_wages_test();
return 0;
}
Note that either the function header or the function implementation or both need to appear in the source file before a function can be called. Function headers are typically put in separate header files (like wages.h) and then included. In the above code, the function header of check_expect_i is located in file base.h from the Programming I C library. The #include line can be understood as the replacement of that line by the contents of the given file.
2.2 Generalizing the Function
The example function as written above contains some concrete bits of information, such as the work hours per week, the regular hourly rate, and the hourly rate for overtime. The weekly work hours are repeated multiple times, which contradicts the “Don’t Repeat Yourself” (DRY) principle. Such information should be represented as constants or as parameters to the function. What is represented as a constant and what is represented as a parameter depends on the needs of the application domain.
One possibility is to extract all information as constants:
const int WEEKLY_HOURS = 40; // regular work hours per week
const int HOURLY_RATE_REGULAR = 1000; // in cents
const int HOURLY_RATE_OVERTIME = 1500; // in cents
Cents hours_to_wages(Hours hours) {
if (hours <= WEEKLY_HOURS) {
return hours * HOURLY_RATE_REGULAR;
} else {
return WEEKLY_HOURS * HOURLY_RATE_REGULAR
+ (hours - WEEKLY_HOURS) * HOURLY_RATE_OVERTIME;
}
}
Another possibility is to include all information as parameters:
Cents hours_to_wages(
Hours weekly_hours,
Cents hourly_rate_regular,
Cents hourly_rate_overtime,
Hours hours_worked)
{
if (hours_worked <= weekly_hours) {
return hours_worked * hourly_rate_regular;
} else {
return weekly_hours * hourly_rate_regular
+ (hours_worked - weekly_hours) *
hourly_rate_overtime;
}
}
This makes the function more flexible, but also more difficult to use, because many arguments have to be provided. Typically only the hourly rate and hours worked change frequently (from worker to worker and for different weeks, respectively).
2.3 Exercises
Design a function has_overtime that computes whether or not the given number of work hours contains overtime.
Modify hours_to_wages such that the program stops if hours is negative. The function call exit(1); stops the program with error code 1.
3 Evaluating Functions Without Side-Effects
A function is like a cloze (a text with _____ gaps). To complete a cloze you fill in the right words. To call a function you fill in parameters of the right type. When the function
int hours_to_wages(int hours) {
if (hours <= 40) {
return hours * 10;
} else {
return 40 * 10 + (hours - 40) * 15;
}
}
is called, the parameter hours is replaced by an actual integer number and then the body of the function is evaluated to an integer number. For example, the step by step evaluation of hours_to_wages(45) looks like this:
hours_to_wages(45)
Substitute the call with the function body, in which the parameter hours has been replaced with the argument value (45).
=> if (45 <= 40) {
return 45 * 10;
} else {
return 40 * 10 + (45 - 40) * 15;
}
Evaluate the condition (45 <= 40) of the if-statement (to false).
=> if (false) {
return 45 * 10;
} else {
return 40 * 10 + (45 - 40) * 15;
}
Because the condition is false, replace the evaluated if-statement with its else-part.
=> return 40 * 10 + (45 - 40) * 15;
Evaluate left multiplication operator.
=> return 400 + (45 - 40) * 15;
Evaluate sub-expression in parentheses.
=> return 400 + 5 * 15;
Evaluate right multiplication operator.
=> return 400 + 75;
Evaluate plus operator.
=> return 475;
Return the result value.
=> 475
For this particular function, the evaluation can be understood in analogy to simplifying an algebraic expression. Unfortunately, as we shall see, in C this is not always the case. Some functions have so-called side-effects, i.e., they modify some variable or they produce some output in addition to returning a value.
4 Recipe for Enumerations
An enumeration type can represent one of a fixed number of distinct values. Examples of information of this type include gender (male, female), music genres (pop, rock, classic, jazz, etc.), continents (Europe, Asia, Africa, etc.), and traffic light colors (red, green, yellow). Each individual enumerated value is a distinct value and does not carry any additional data. In C, such types can be represented as enumerations.
traffic_light.c serves as the example for enumerations.
4.1 Steps
4.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
/*
Design a function that returns the next color of a traffic light given the current color of the traffic light.
*/
4.1.2 Data definition
Write how the domain information should be represented in the program. Conversely, write how to interpret the data (e.g., a number) as real-world information (e.g., a traffic light state). Write a formal data definition as an enumeration.
enum TrafficLight {
RED,
GREEN,
YELLOW
}; Forgetting the ; at the end of the enum definition is a syntax error.
This in effect creates three integer constants, RED with value 0, GREEN with value 1, and YELLOW with value 2. But these details are not essential right now.
4.1.3 Function signature
// enum TrafficLight -> enum TrafficLight
Remember, the problem statement says that we need a function that returns the next color of the traffic light, given its current color.
4.1.4 Function name
traffic_light_next
4.1.5 Function header
enum TrafficLight traffic_light_next(enum TrafficLight tl);
4.1.6 Function stub
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
return RED;
}
This function definition already compiles, but it gives us a traffic light that is constantly red – how frustrating.
4.1.7 Purpose statement
Write down a purpose statement (given as a comment).
// Produces the next color of a traffic light
// given the current color of the traffic light.
4.1.8 Examples and expected results
Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)
Examples:
If the traffic light is RED, expect GREEN as the next state.
If the traffic light is GREEN, expect YELLOW as the next state.
If the traffic light is YELLOW, expect RED as the next state.
Corresponding test cases in test function:
void traffic_light_next_test() {
check_expect_i(traffic_light_next(RED), GREEN);
check_expect_i(traffic_light_next(GREEN), YELLOW);
check_expect_i(traffic_light_next(YELLOW), RED);
}
Because RED, GREEN, and YELLOW are actually integer constants, we can use the check_expect_i function that takes integer arguments (hence the _i).
The test function is invoked from the main function:
int main(void) {
traffic_light_next_test();
return 0;
}
Running the above tests on the stub (which always returns RED) produces:
traffic_light.c, line 18: Actual value 0 differs from expected value 1.
traffic_light.c, line 19: Actual value 0 differs from expected value 2.
traffic_light.c, line 20: check passed
2 of 3 tests failed.
The output shows that enumerations are represented as integer values. The first element of the TrafficLight enumeration (here RED) gets assigned value 0, the second element (here GREEN) gets value 1, and the third one (here YELLOW) gets value 2.
4.1.9 Function body
Implement the function body. Put required helper functions on a wish list. These will be implemented later.
How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.
The implementation of the example function does not need helper functions:
// Produces the next color of a traffic light
// given the current color of the traffic light.
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
if (tl == RED) {
return GREEN;
} else if (tl == GREEN) {
return YELLOW;
} else if (tl == YELLOW) {
return RED;
}
return RED; // this line should never be reached
}
The last return statement should never be reached, because all possible cases are covered above it. However, some C compilers cannot figure this out automatically and need a return statement at this point.
4.1.10 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.
The complete file traffic_light.c now looks like this:
/*
Compile: make traffic_light
Run: ./traffic_light
*/
#include "base.h"
/*
Design a function that returns the next color of a traffic light
given the current color of the traffic light.
*/
enum TrafficLight {
RED,
GREEN,
YELLOW
};
// enum TrafficLight -> enum TrafficLight
enum TrafficLight traffic_light_next(enum TrafficLight);
void traffic_light_next_test() {
check_expect_i(traffic_light_next(RED), GREEN); // line 18
check_expect_i(traffic_light_next(GREEN), YELLOW); // line 19
check_expect_i(traffic_light_next(YELLOW), RED); // line 20
}
// Produces the next color of a traffic light
// given the current color of the traffic light.
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
if (tl == RED) {
return GREEN;
} else if (tl == GREEN) {
return YELLOW;
} else if (tl == YELLOW) {
return RED;
}
return RED;
}
int main(void) {
traffic_light_next_test();
return 0;
}
Running the tests on the implemented function produces:
traffic_light.c, line 18: check passed
traffic_light.c, line 19: check passed
traffic_light.c, line 20: check passed
All 3 tests passed!
4.1.11 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests. The purpose statement should describe what the function computes (not how it does the computation) and should mention the given inputs and produced result. The test examples should cover corner cases and a typical case.
In the given situation, the switch-statement is a good alternative to if-statements.
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
switch (tl) {
case RED: return GREEN; // leave the function
case GREEN: return YELLOW; // leave the function
case YELLOW: return RED; // leave the function
}
return RED;
}
Since tl is an enumeration, many C compilers can automatically check whether the switch-statement handles all the cases of the enumeration. In C an enumeration value is just an integer. A careless programmer could set an invalid traffic light value (e.g., by calling traffic_light_next(-123)).
However, the switch-statement is dangerous. It falls through all conditions after the matching one, unless the case is terminated by a break-statement or a return-statement (the latter also terminates the function).
enum TrafficLight traffic_light_next(enum TrafficLight tl) {
switch (tl) {
case RED:
...
break; <--- leave the switch-statement!
case GREEN:
...
break; <--- leave the switch-statement!
case YELLOW:
...
break; <--- leave the switch-statement!
}
// <--- the breaks jump to this point after the switch statement
...
return ...;
}
Moreover, the cases in the switch-statement have to be constant integer values. Never put anything else there! If you are unsure, use if-statements!
Given this information, we can write a reusable template for functions that handle traffic lights:
... fn_for_traffic_light(enum TrafficLight tl) {
switch (tl) {
case RED:
...
return...;
case GREEN:
...
return...;
case YELLOW:
...
return...;
}
return 0;
}
This template may be copied, pasted, and adapted for new functions that operate on this data type.
4.2 Type definitions and enumerations
enum TrafficLight {
RED,
GREEN,
YELLOW
};
typedef enum TrafficLight TrafficLight;
In general:
typedef <my complicated type> MyNewNameForComplicatedType;
Now you can simply write TrafficLight instead of enum TrafficLight. The name of the enumeration and the name of the defined type may be identical, as is the case here.
4.3 Exercises
Design a function continent_next that computes the next continent for a sailing ship. The route of the sailing ship is influenced by the wind direction. The next continent depends on the current continent and the wind direction (the direction from which the wind originates). Among others, the following relations hold: Europe –northwind–> Africa, Europe –westwind–> Asia, Asia –eastwind–> Europe. Complete these relations using a world map. To simplify the exercise assume that the Americas and Australia do not exist, and that if a ship reaches Antarctica it sinks. Assume that the only wind directions (directions from which the wind originates) are east, west, north, and south, and that they do not change during a single journey of a sailing ship from one continent to another. Use suitable enumerations.
5 Recipe for Intervals
When the domain information that needs to be represented in the program consists of one or more ranges of numbers, then the data definition is concerned with handling intervals. An example would be a taxation scheme in which there is no taxation for goods below 1000 €, moderate taxation between 1 k€ – 10 k€, and high taxation for goods above 10 k€. When dealing with intervals special care needs to be taken to treat the boundary cases correctly. In the example above, is a product of 1000 € still without taxation or does it fall in the next interval? Boundaries can be represented precisely in this way: [0, 1000), [1000, 10000), [10000, inf), where brackets [ and ] denote inclusive boundaries and parentheses ( and ) denote exclusive boundaries: 1000 does not belong to the interval [0, 1000), but 0 does.
Data definitions for intervals typically use enumerations to name each interval as well as constants to define the boundaries of the intervals.
tax.c serves as the example for intervals.
5.1 Steps
5.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
/*
A fictitious country has decided to introduce a three-stage sales tax. Cheap items below 1 k€ are not taxed. Goods of more than 10 k€ are taxed at 10%. Items in between are taxed at a rate of 5%. Give a data definition and define a function that computes the amount of tax for a given item price.
*/
5.1.2 Data definition
enum TaxStage {
NO_TAX,
LOW_TAX,
HIGH_TAX
};
typedef int Euro; // int represents Euro
const Euro LOW_TAX_BOUNDARY = 1000; // interpret.: price in Euro
const Euro HIGH_TAX_BOUNDARY = 10000; // interpret.: price in Euro
5.1.3 Function signature
// Euro -> Euro
// int -> int
5.1.4 Function name
sales_tax
5.1.5 Function header
Euro sales_tax(Euro price);
int sales_tax(int price);
5.1.6 Function stub
Euro sales_tax(Euro price) {
return 0;
}
int sales_tax(int price) {
return 0;
}
5.1.7 Purpose statement
Write down a purpose statement (given as a comment). The purpose statement should describe what the function computes (not how it does that) and should mention the given inputs and produced result.
// Returns the amount of tax for the given price.
5.1.8 Examples and expected results
Write down examples with expected results in the test method. The test examples should cover corner cases (e.g., boundary values) and a typical case of each category (e.g. a value from the interior of an interval). Boolean functions should test positive and negative examples. You may already define constants for use in the implementation. Check that the code compiles. (Some tests will fail for the stub.)
Examples:
For a price of 0 € expect a sales tax of 0 €.
For a price of 537 € expect a sales tax of 0 €.
For a price of 1000 € expect a sales tax of 50 €.
For a price of 1282 € expect a sales tax of 64 €.
For a price of 10000 € expect a sales tax of 1000 €.
For a price of 12017 € expect a sales tax of 1202 €.
Corresponding test cases in test function:
int round_to_int(double d); // function header
static void sales_tax_test() {
check_expect_i(sales_tax(0), 0);
check_expect_i(sales_tax(537), 0);
check_expect_i(sales_tax(1000),
round_to_int(1000 * 0.05));
check_expect_i(sales_tax(1282),
round_to_int(1282 * 0.05));
check_expect_i(sales_tax(10000),
round_to_int(10000 * 0.10));
check_expect_i(sales_tax(12017),
round_to_int(12017 * 0.10));
}
int round_to_int(double d) {
return (int)round(d); // round included from math.h
}
Note that the test cases cover each of the boundary values of the intervals, and at least one value from the interior of each interval.
5.1.9 Function body
Implement the function body. Put required helper functions on a wish list. These will be implemented later.
How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.
The implementation uses round_to_int as a helper function.
// Returns the amount of tax for the given price.
Euro sales_tax(Euro price) { // Euro stands for int
if (0 <= price && price < 1000) { // NO_TAX interval
return 0;
} else if (1000 <= price && price < 10000) { // LOW_TAX interval
return round_to_int(0.05 * price);
} else if (price >= 10000) { // HIGH_TAX interval
return round_to_int(0.10 * price);
}
// error: if this line is reached then price < 0
printsln("sales_tax, error: negative price");
exit(1); // terminate the program
}
The implementation reflects the structure of the data. As specified in the enumeration there are three cases: NO_TAX, LOW_TAX, and HIGH_TAX. These three cases correspond to three intervals: [0, 1000), [1000, 10000), and [1000, inf). For each interval there is an if-statement whose condition matches one of the intervals. When formulating the conditions care has to be taken to handle the boundary values correctly. In addition, there is an error case for negative prices. Since Euro is represented as int, a negative number could be provided, which could lead to an error. To avoid this problem, the currency could be represented as an unsigned int, but this would make it difficult to represent debt. Unsigned types have special problems. We will avoid unsigned types in this lecture.
5.1.10 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.
int main(void) {
sales_tax_test();
return 0;
}
The given implementation satisfies all test examples:
tax.c, line 33: check passed
tax.c, line 34: check passed
tax.c, line 36: check passed
tax.c, line 38: check passed
tax.c, line 40: check passed
tax.c, line 42: check passed
All 6 tests passed!
5.1.11 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests.
const double LOW_TAX_RATE = 0.05;
const double HIGH_TAX_RATE = 0.10;
// Returns the amount of tax for the given price.
Euro sales_tax(Euro price) { // Euro stands for int
if (price < 0) { // error
printsln("sales_tax, error: negative price");
exit(1); // terminate the program
} else if (price < LOW_TAX_BOUNDARY) { // NO_TAX interval
return 0;
} else if (price < HIGH_TAX_BOUNDARY) { // LOW_TAX interval
return round_to_int(LOW_TAX_RATE * price);
} else { // HIGH_TAX interval
return round_to_int(HIGH_TAX_RATE * price);
}
}
Domain knowledge might suggest to include a constant for the first category as well. There could be a revision of the taxation system, in which the first interval of [0, 1000) Euro becomes subject to a small taxation rate as well. Moreover, an additional interval for higher-priced items could be introduced, which would require adding a case. It is up to the programmer’s judgment to decide how far the generalization should go, because this can come at a cost. E.g., introducing another constant for the first interval could increase the effort to read the code.
If we are satisfied with the function, we can write a reusable template, which can serve as a basis when implementing functions related to this taxation scheme in the future.
const double LOW_TAX_RATE = 0.05;
const double HIGH_TAX_RATE = 0.10;
... fn_for_tax(Euro price) { // Euro stands for int
if (price < 0) { // error
printsln("sales_tax, error: negative price");
exit(1); // terminate the program
} else if (price < LOW_TAX_BOUNDARY) { // NO_TAX interval
...
return ...;
} else if (price < HIGH_TAX_BOUNDARY) { // LOW_TAX interval
...
return ...;
} else { // HIGH_TAX interval
...
return ...;
}
}
This template can be adapted for new functions that operate on this data type.
5.2 Exercises
A taxation rate of 1.5% is introduced for items below 1000 €. Modify the sales_tax function to reflect this change.
A revision of the taxation system introduces a linearly increasing taxation rate for the middle interval (1 k€ – 10 k€). As before, the taxation rate is 5% at 1 k€ and 10% at 10 k€, but is linearly interpolated in between. Modify the sales_tax function to reflect this change.
6 Recipe for Itemizations
Itemizations mix aspects of enumerations and intervals. Like enumerations they denote data which represents different alternatives, but at least one of the alternatives holds additional data. An example is information about the position of a train in a tunnel: Either there is no train in the tunnel (case 1) or there is a train at a certain position in the tunnel (case 2). Case 2 has the train’s position as additional data.
Data definitions for itemizations typically use enumerations to name each alternative as well as additional data for some of the cases (e.g., a number to represent position). The additional data is given either in a struct or in a union.
move_train.c serves as the example for itemizations.
6.1 Steps
6.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
/*
A critical section of a railway track has a length of 10 km. Trains pass through the critical section in both directions. At most one train is allowed on this critical section at any one time. A control system is to be implemented that provides the position of a train in that section (if there is one) or an indication that there is no train in that section. Define a data definition for this information. Define a function that takes this data definition and advances the train's position by a given amount (in km). This may result in the train leaving the critical section.
*/
6.1.2 Data definition
enum TrainTag { // enumerates all possible alternatives
T_ABSENT,
T_PRESENT
};
struct Train { // has the alternative and the additional information
enum TrainTag tag; // alternative
// the distance from the start of the critical section (in km)
double position; // additional information for T_PRESENT
}; Forgetting the ; at the end of the struct definition is a syntax error.
struct Train make_train_none() { // constructor function
struct Train t = { T_ABSENT, 0.0 };
return t;
}
struct Train make_train_at(double position) { // constructor function
struct Train t = { T_PRESENT, position };
return t;
}
A train can either be absent or present. If it is present, the position is given as a floating point number. Position 0.0 represents the start of the critical section, position 10.0 represents its end. The constructor functions simplify the creation of values of type struct Train. There is one constructor function for the case of no train in the critical section (make_train_none) and one constructor function for the case of a train at a particular position in the critical section (make_train_at).
6.1.3 Function signature
// struct Train, double -> struct Train
6.1.4 Function name
move_train
6.1.5 Function header
struct Train move_train(struct Train train, double amount);
6.1.6 Function stub
struct Train move_train(struct Train train, double amount) {
return make_train_none();
}
6.1.7 Purpose statement
Write down a purpose statement (given as a comment). The purpose statement should describe what the function computes (not how it does that) and should mention the given inputs and produced result.
// Advances the train by the given amount. Considers the case
// that the train enters or leaves the critical section.
6.1.8 Examples and expected results
Write down examples with expected results in the test method. The test examples should cover corner cases (e.g., boundary values) and a typical case of each category (e.g. a value from the interior of an interval). Boolean functions should test positive and negative examples. You may already define constants for use in the implementation. Check that the code compiles. (Some tests will fail for the stub.)
Unfortunately, there are only check functions for existing types, like int. The new type struct Train requires that we implement a specific check function for that type, which checks its components. The dot operator (.) allows accessing the components (tag and position) of the structure. To be able to locate failed tests in the source code, we use __LINE__ and __FILE__, which the compiler will replace by the line number and file name, respectively.
Trains are considered equal if their tags are equal and their positions are within a small range. EPSILON is a positive floating-point value close to zero, such as 0.00000001. It is used because not every real number (a mathematical entity) can be represented exactly as a floating-point value (a representation in the computer). For the check to pass, the actual value has to be within +/-EPSILON of the expected value.
// helper function for comparing trains
bool check_expect_train(int line, struct Train t, struct Train s) {
bool tags_equal = base_check_expect_i(__FILE__, line,
t.tag, s.tag);
bool positions_within = base_check_within_d(__FILE__, line,
t.position, s.position, EPSILON);
return tags_equal && positions_within;
}
The functions base_check_expect_i and base_check_within_d are included from base.h. The first checks integers for equality. The second checks whether two floating-point values are within +/-EPSILON of each other. Both functions print a message about the result (i.e., whether the check passed or failed).
The test examples handle several cases, such as the borders of the critical section, advancement in its interor, and entering and leaving the critical section.
Examples:
Advancing a train that is not in the critical section (make_train_none()) by 3.0 km results in a train that is still not in the critical section (make_train_none()).
Advancing a train that is at the start of the critical section (make_train_at(0.0)) by 0.0 km results in a train that is still at the start of the critical section (make_train_at(0.0)).
Advancing a train that is at position 1.0 km of the critical section (make_train_at(1.0)) by -1.0 km results in a train that is at the start of the critical section (make_train_at(0.0)).
etc.
Corresponding test cases in test function:
static void move_train_test() {
// absent trains, moving an absent train has no effect
struct Train actual = make_train_none(); // absent train
actual = move_train(actual, 3.0); // move absent train by 3.0 km
struct Train expected = make_train_none(); // absent train
check_expect_train(__LINE__, actual, expected);
// same as above, but as nested calls
check_expect_train(__LINE__, move_train(make_train_none(), 3.0),
make_train_none());
// borders
check_expect_train(__LINE__, move_train(make_train_at(0.0), 0.0),
make_train_at(0.0));
check_expect_train(__LINE__, move_train(make_train_at(10.0), 0.0),
make_train_at(10.0));
check_expect_train(__LINE__, move_train(make_train_at(1.0), -1.0),
make_train_at(0.0));
check_expect_train(__LINE__, move_train(make_train_at(9.0), 1.0),
make_train_at(10.0));
// interior (both before and after advance)
check_expect_train(__LINE__, move_train(make_train_at(1.0), 0.0),
make_train_at(1.0));
check_expect_train(__LINE__, move_train(make_train_at(5.5), 1.5),
make_train_at(7.0));
check_expect_train(__LINE__, move_train(make_train_at(4.5), -1.0),
make_train_at(3.5));
// leaving the section
check_expect_train(__LINE__, move_train(make_train_at(9.0), 1.1),
make_train_none());
check_expect_train(__LINE__, move_train(make_train_at(1.0), -1.1),
make_train_none());
// entering the section
check_expect_train(__LINE__, move_train(make_train_at(-0.1), 0.1),
make_train_at(0.0));
check_expect_train(__LINE__, move_train(make_train_at(10.1), -0.1),
make_train_at(10.0));
}
6.1.9 Function body
Implement the function body. Put required helper functions on a wish list. These will be implemented later.
How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.
We only need check_expect_train defined above.
// Advances the train by the given amount. Considers the case
// that the train enters or leaves the critical section.
struct Train move_train(struct Train train, double amount) {
if (train.tag == T_ABSENT) {
return make_train_none();
} else if (train.tag == T_PRESENT) {
double new_position = train.position + amount;
if (new_position < 0.0 || new_position > 10.0) {
return make_train_none();
} else {
return make_train_at(new_position);
}
}
return train;
}
The implementation handles the two cases of the Train type. If a train is present, the new position is calculated. It is then checked whether the new position falls outside the critical section. If so, this corresponds to the absence of the train in the critical section. Otherwise a train representing presence at the new position is returned.
6.1.10 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations. The function does not have side effects.
int main(void) {
move_train_test();
return 0;
}
6.1.11 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests.
The above implementation should be improved. The start and end positions should be replaced by constants. Moreover, the if-statements can be replaced by a switch-statement.
const double START_POSITION = 0.0;
const double END_POSITION = 10.0;
// Advances the train by the given amount. Considers the case
// that the train enters or leaves the critical section.
struct Train move_train(struct Train train, double amount) {
double new_position = train.position + amount;
switch (train.tag) {
case T_ABSENT:
return make_train_none();
case T_PRESENT:
if (new_position < START_POSITION ||
new_position > END_POSITION)
{
return make_train_none();
} else {
return make_train_at(new_position);
}
}
return train;
}
If we are satisfied with the function, we can write a reusable template that can serve as a basis for future functions on this type.
const double START_POSITION = 0.0;
const double END_POSITION = 10.0;
... fn_for_train(struct Train train, ...) {
switch (train.tag) {
case T_ABSENT:
...
return ...;
case T_PRESENT:
...
return ...;
}
return ...;
}
6.2 Exercises
Write a function that takes two values of type Train and tests whether they are in conflict with each other, i.e., whether they correspond to a state that is not allowed in the critical section.
Design a function that takes a value of type Train and a percentage value p and checks whether the train is located in the first p% of the critical section.
7 Recipe for Compound Data (Product Types)
Compound data types aggregate potentially different kinds of data into a whole. C uses structures (struct) to represent values that consist of multiple components. The components in turn can be atomic or themselves structured. As an example we use a 2-dimensional point with x- and y-coordinates as components. Compound data types are also called product types, because they form the Cartesian product of their components.
point.c serves as the example for compound data.
7.1 Steps
7.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
/*
Objects are located somewhere on a 2D plane. Design a function that computes the distance of the center of an object to the origin of the coordinate system.
*/
7.1.2 Data definition
Write how the domain information should be represented in the program, or, vice versa, write how to interpret the data (e.g., a pair of numbers) as real-world information (e.g., a point in 2D space). Write a formal data definition as a struct. Write a constructor for creating and initializing values of this type.
struct Point {
double x; // represents the x-coordinate of a point
double y; // represents the y-coordinate of a point
}; Forgetting the ; at the end of the struct definition is a syntax error.
// constructor function for Point
struct Point make_point(double x, double y) {
struct Point p = { x, y };
return p;
}
The constructor function simplifies the creation of values of type struct Point.
7.1.3 Function signature
// struct Point -> double
7.1.4 Function name
distance_to_origin
7.1.5 Function header
double distance_to_origin(struct Point p);
7.1.6 Function stub
double distance_to_origin(struct Point p) {
return 0.0;
}
7.1.7 Purpose statement
Write down a purpose statement (given as a comment).
// Computes the distance from the given point
// to the origin of the coordinate system.
7.1.8 Examples and expected results
Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)
Examples:
For point (0,0) expect a distance to origin of 0.
For point (1,0) expect a distance to origin of 1.
For point (-1,0) expect a distance to origin of 1.
For point (3,4) expect a distance to origin of 5.
etc.
Corresponding test cases in test function:
static void distance_to_origin_test() {
check_within_d(distance_to_origin(make_point(0, 0)),
0.0, EPSILON);
check_within_d(distance_to_origin(make_point(1, 0)),
1.0, EPSILON);
check_within_d(distance_to_origin(make_point(-1, 0)),
1.0, EPSILON);
check_within_d(distance_to_origin(make_point(0, 2)),
2.0, EPSILON);
check_within_d(distance_to_origin(make_point(0, -2)),
2.0, EPSILON);
check_within_d(distance_to_origin(make_point(3, 4)),
5.0, EPSILON);
check_within_d(distance_to_origin(make_point(3, -4)),
5.0, EPSILON);
}
The test function shows multiple aspects. First, the constructor function make_point creates values of type Point. Second, because floating-point values are not exact, the check... function does not test for equality, but whether the function result is sufficiently close to the expected result. EPSILON is a constant of type double (e.g., 0.00000001) that specifies the allowed tolerance.
7.1.9 Function body
Implement the function body. Put required helper functions on a wish list. These will be implemented later.
How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.
The implementation of the example function does not need helper functions.
// Computes the distance from the given point
// to the origin of the coordinate system.
double distance_to_origin(struct Point p) {
// square root (sqrt) function header included from math.h
return sqrt(p.x * p.x + p.y * p.y);
}
The function inspects the components of the structure to compute the result value. The dot operator (.) is used to access a component value (e.g., x) of the compound value (p).
7.1.10 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.
int main(void) {
distance_to_origin_test();
return 0;
}
7.1.11 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests. The purpose statement should describe what the function computes (not how it does the computation) and should mention the given inputs and produced result. The test examples should cover corner cases and a typical case.
7.2 Type definitions and structures
struct Point {
double x; // represents the x-coordinate of a point
double y; // represents the y-coordinate of a point
};
struct Point make_point(double x, double y) {
struct Point p = { x, y };
return p;
}
typedef struct Point Point;
In general:
typedef <my complicated type> NewNameForComplicatedType;
Now you can simply write Point instead of struct Point. The name of the structure and the name of the defined type may be identical, as is the case here.
7.3 Exercises
Design a function distance_points that computes the distance between two points.
Design a function circles_overlap that tests whether two circles overlap. Define a suitable data definition for circles.
Design a function and a data definition for points that are represented in polar coordinates. Search online for the definition of polar coordinates.
8 Recipe for Variant Data (Sum Types)
Sometimes, data can take on one of different variants. An example are points that may be represented either in Euclidean coordinates (x, y) or in polar coordinates (theta, magnitude). In the latter case, magnitude specifies the distance from the origin of the coordinate system and theta specifies the direction with respect to the positive x-axis. So, (0°, 3) is equivalent to (3, 0) in Euclidean coordinates, (90°, 2) is equivalent to (0, 2), and (45°, sqrt(2)) is equivalent to (1, 1).
A data type that can represent one of different choices is called a variant, a (tagged) union, or a sum type. In C, such types are represented as unions. The above example can be modeled as a union with two alternatives: PolarPoint or EuclideanPoint.
point_euclid_polar.c serves as the example for variant data.
8.1 Steps
8.1.1 Problem statement
Write down the problem statement as a comment. The problem statement should answer these questions: What information needs to be represented? What should the function (to be implemented) do with the data? What cases need to be considered?
/*
Points on the 2D plane may be given either in Euclidean coordinates or in polar coordinates. Design a function that computes the distance of such a point to the origin of the coordinate system.
*/
8.1.2 Data definition
Write how the domain information should be represented in the program, or, vice versa, write how to interpret the data (e.g., a pair of numbers) as real-world information (e.g., a point in 2D space). Write a formal data definition as a struct. Write a constructor for creating and initializing values of this type.
enum PointTag {
TPointEuclid, // tag for a point in Euclidean coordinates
TPointPolar // tag for a point in polar coordinates
};
struct Point {
enum PointTag tag; // indicate which variant follows
union {
// either the Euclidean variant...
struct {
double x;
double y;
};
// ... or the polar variant
struct {
double theta;
double magnitude;
};
};
};
// constructor for PointEuclid
struct Point make_point_euclid(double x, double y) {
struct Point p;
p.tag = TPointEuclid;
p.x = x;
p.y = y;
return p;
}
// constructor for PointPolar
struct Point make_point_polar(double t, double m) {
struct Point p;
p.tag = TPointPolar;
p.theta = t;
p.magnitude = m;
return p;
}
struct EuclideanPoint {
double x;
double y;
};
struct PolarPoint {
double theta;
double magnitude;
};
struct Point {
enum PointTag tag; // indicate which variant follows
union {
// either the Euclidean variant...
struct EuclideanPoint euclid;
// ... or polar variant
struct PolarPoint polar;
};
};
struct Point make_point_euclid(double x, double y) {
struct Point p;
p.tag = TPointEuclid;
p.euclid.x = x;
p.euclid.y = y;
return p;
}
struct Point make_point_polar(double t, double m) {
struct Point p;
p.tag = TPointPolar;
p.polar.theta = t;
p.polar.magnitude = m;
return p;
}
The key difference between structures and unions is that a structure contains each of its members, whereas a union only contains only one of its members (the one last set).
The rest of this chapter uses the first data definition (anonymous structs).
8.1.3 Function signature
// struct Point -> double
8.1.4 Function name
distance_to_origin
8.1.5 Function header
double distance_to_origin(struct Point p);
8.1.6 Function stub
double distance_to_origin(struct Point p) {
return 0.0;
}
8.1.7 Purpose statement
Write down a purpose statement (given as a comment).
// Computes the distance from the given point
// to the origin of the coordinate system.
8.1.8 Examples and expected results
Write down examples with expected results in the test method. Define any constants that the function needs. Check that the code compiles. (Some tests will fail for the stub.)
static void distance_to_origin_test() {
// test cases for polar variant
check_within_d(
distance_to_origin(make_point_polar(0.0, 0.0)),
0.0, EPSILON);
check_within_d(
distance_to_origin(make_point_polar(0.0, 1.0)),
1.0, EPSILON);
check_within_d(
distance_to_origin(make_point_polar(2.3, 2.0)),
2.0, EPSILON);
// test cases for Euclidean variant
check_within_d(
distance_to_origin(make_point_euclid(0.0, -2.0)),
2.0, EPSILON);
check_within_d(
distance_to_origin(make_point_euclid(2.0, 0.0)),
2.0, EPSILON);
check_within_d(
distance_to_origin(make_point_euclid(1.0, 1.0)),
sqrt(2.0), EPSILON); // square root
}
It is important to generate test cases for each variant.
8.1.9 Function body
Implement the function body. Put required helper functions on a wish list. These will be implemented later.
How to identify the need for a helper function: A function should perform one well-defined task. A change in domain or data type should be outsourced in a helper function. A reusable subtask should be outsourced in a helper function (Don’t Repeat Yourself, DRY). It is often helpful to design a stub for the helper functions. This way you can already compile the program.
The implementation of the example function does not need helper functions.
// Computes the distance from the given point
// to the origin of the coordinate system.
double distance_to_origin(struct Point p) {
if (p.tag == TPointEuclid) {
return sqrt(p.x * p.x + p.y * p.y); // square root
} else if (p.tag == TPointPolar) {
return p.magnitude;
}
return 0.0;
}
The function takes different strategies to compute the distance of the point to the root, depending on whether the point is given in Euclidean or in polar coordinates. The tag is used to distinguish these cases.
To distinguish between the variants, the switch-statement is a good alternative to if-statements.
double distance_to_origin(struct Point p) {
switch (p.tag) {
case TPointEuclid:
return sqrt(p.x * p.x + p.y * p.y); // square root
case TPointPolar:
return p.magnitude;
}
return 0;
}
Since the tag is an enumeration, the compiler can check whether the switch-statement handles all the cases of the enumeration. In C an enumeration value is just an integer. A careless programmer could set an invalid tag value (e.g., -123). As long as only the constructors make_point_euclid and make_point_polar are used to create points, invalid tags are not an issue.
However, the switch-statement is dangerous. It falls through all conditions after the matching one, unless the case is terminated by a break-statement or a return-statement (the latter also terminates the function).
double distance_to_origin(struct Point p) {
switch (p.tag) {
case TPointEuclid:
...
break; <--- leave the switch-statement!
case TPointPolar:
...
break; <--- leave the switch-statement!
}
// <--- the breaks jump to this point after the switch statement
...
return ...;
}
Moreover, the cases in the switch-statement have to be constant integer values. Never put anything else there! If you are unsure, use if-statements!
Given this information, we can write a reusable template for functions that handle points:
... fn_for_points(struct Point p) {
switch (p.tag) {
case TPointEuclid:
...
return... p.x... p.y...;
case TPointPolar:
...
return... p.theta... p.magnitude...;
}
return ...;
}
This template may be copied, pasted, and adapted for new functions that operate on this data type.
8.1.10 Testing
Check that the function body satisfies the tests. Correct the function body (and the tests). Look for opportunities to simplify the structure of the code. This typically requires multiple iterations.
int main(void) {
distance_to_origin_test();
return 0;
}
8.1.11 Review and revise
Review and revise the function name, the parameter names, and the purpose statement. Improve them if necessary. A design is not complete until it has a purpose statement and tests. The purpose statement should describe what the function computes (not how it does the computation) and should mention the given inputs and produced result. The test examples should cover corner cases and a typical case.
8.2 Exercises
Write a data definition for a type that can represent one of miles or kilometers. Use this data definition to represent distances. Design a function convert_distance that converts a distance to the other unit.
Write a data definition for a type Shape that can represent one of a rectangle, a circle, or an equilateral triangle. Write a function that computes the area of a shape.
9 Doing Things Repeatedly
It is often necessary, to repeatedly execute some code. C provides multiple language constructs for doing so, such as the while loop, the for loop, and the do-while loop. Such language constructs are not strictly necessary. In principle repeated execution can also be achieved through recursion. This involves functions that call themselves, either directly or indirectly.
Repetition raises the question of termination, i.e., the question whether the repetition will eventually stop. Termination is controlled through a condition that is tested on each iteration or recursive call, respectively.
Repetition also requires thinking about boundary cases. It is important to pay attention to the very first and the very last iteration.
9.1 While Loop
While loops consist of the keyword while followed by a condition in parentheses (...) followed by the loop’s body, a block of statements in curly braces {...}. The body may instead be a single statement.
...
statement_0;
while (condition) {
statement_1;
...
statement_n;
}
statement_x
...
While loops first test whether the condition is true. If so, the loop’s body is executed. A single execution of the body is denoted as an iteration. If the condition is false, the while loop terminates and the first statement after the while loop (statement_x) is executed. After each iteration the condition is tested again. While the condition is true, the loop continues (this is why it is called a while loop). If the condition is initially false, the body is never executed.
For a loop to terminate, the condition has to be initially false or some variable has to change to make the condition false after a number of iterations. This changing variable needs to be tested in the loop’s condition. A typical pattern is to increase the variable with each iteration until some upper bound is reached, at which point the loop terminates. Here is a simple example:
int i = 0;
while (i < 3) {
printiln(i); // print the value of i, followed be a line break
i = i + 1;
}
Variable i is initialized to 0. The condition 0 < 3 is true, so the first iteration starts. The printiln(0) statement gets executed. The symbol 0 appears as output. i is set to 1. The first iteration ends. As a reminder, the assignment statement = proceeds as follows: The expression i + 1 on its right-hand side gets evaluated to 0 + 1 = 1. This value is then assigned to variable i. After the first iteration has finished, the condition is evaluated again, and so on.
The execution can be illustrated as follows. The assignment statement separates the steps.
int i = 0;
=> initialize variable i with value 0
while (0 < 3) {
printi(0); // outputs 0
i = 0 + 1;
}
=> assign value 1 to variable i
while (1 < 3) {
printi(1); // outputs 1
i = 1 + 1;
}
=> assign value 2 to variable i
while (2 < 3) {
printi(2); // outputs 2
i = 2 + 1;
}
=> assign value 3 to variable i
while (3 < 3) // false, block {...} not executed
// loop terminates
What happens if we remove the last statement of the loop body (i = i + 1;)? Let’s illustrate the initial iterations.
int i = 0;
=>
while (0 < 3) {
printi(0); // outputs 0
}
=>
while (0 < 3) {
printi(0); // outputs 0
}
=>
while (0 < 3) {
printi(0); // outputs 0
}
=>
while (0 < 3) {
printi(0); // outputs 0
}
=>
while (0 < 3) {
printi(0); // outputs 0
}
...
The state of i never changes. The same condition is tested over and over again. This is an example of the infamous infinite loop.
The following example prints the multiples of 3 starting from 30 and going down to 3. Please try to implement this yourself before reading on.
The solution is:
int i = 30;
while (i >= 3) {
printiln(i); // print the value of i, followed be a line break
i = i - 3;
}
We again show how the execution proceeds:
int i = 30;
while (30 >= 3) {
printiln(30); // outputs 30
i = 30 - 3;
}
=>
while (27 >= 3) {
printiln(27); // outputs 27
i = 27 - 3;
}
=>
while (24 >= 3) {
printiln(24); // outputs 24
i = 24 - 3;
}
=>
while (21 >= 3) {
printiln(21); // outputs 21
i = 21 - 3;
}
=>
while (18 >= 3) {
printiln(18); // outputs 18
i = 18 - 3;
}
=>
while (15 >= 3) {
printiln(15); // outputs 15
i = 15 - 3;
}
=>
while (12 >= 3) {
printiln(12); // outputs 12
i = 12 - 3;
}
=>
while (9 >= 3) {
printiln(9); // outputs 9
i = 9 - 3;
}
=>
while (6 >= 3) {
printiln(6); // outputs 6
i = 6 - 3;
}
=>
while (3 >= 3) {
printiln(3); // outputs 3
i = 3 - 3;
}
=>
while (0 >= 3) // false, body not executed
// loop terminates
9.2 For Loop
for (initialization; condition; update) {
statement_1;
...
statement_n;
}
int i = 30;
while (i >= 3) {
printiln(i);
i = i - 3;
}
for (int i = 30; i >= 3; i = i - 3) {
printiln(i);
}
It requires fewer lines of code, but one has to memorize where the initialization, run condition, and update statements go. for loops are very popular among C programmers (but not among functional programmers, but that’s another story).
9.3 Recursion
Looping constructs such as while and for loops are not strictly necessary to let a program do things repeatedly. Another possibility is to put the steps that need to be repeated into a function that calls itself a certain number of times.
int i = 0;
while (i < 3) {
printi(i);
i = i + 1;
}
may be replaced by this function call
while_loop(0);
and this function (which calls itself)
void while_loop(int i) {
if (i < 3) {
printi(i);
while_loop(i + 1); // function calls itself
// with another argument
}
}
Expanding the recursive calls leads to the following steps:
while_loop(0);
=> the call expands to the body of function while_loop with every occurrence of parameter i replaced by argument 0
if (0 < 3) {
printi(0);
while_loop(0 + 1);
}
=> condition 0 < 3 is true...
if (true) {
printi(0);
while_loop(0 + 1);
}
=> ... thus the if-statement expands to its then-block
printi(0);
while_loop(0 + 1);
=> the call expands to the body of function while_loop with i replaced by 1
printi(0);
if (1 < 3) {
printi(1);
while_loop(1 + 1);
}
=> condition 1 < 3 is true...
printi(0);
if (true) {
printi(1);
while_loop(1 + 1);
}
=> ... thus the if-statement expands to its then-block
printi(0);
printi(1);
while_loop(1 + 1);
=> the call expands to the body of function while_loop with i replaced by 2
printi(0);
printi(1);
if (2 < 3) {
printi(2);
while_loop(2 + 1);
}
=> condition 2 < 3 is true...
printi(0);
printi(1);
if (true) {
printi(2);
while_loop(2 + 1);
}
=> ... thus the if-statement expands to its then-block
printi(0);
printi(1);
printi(2);
while_loop(2 + 1);
=> the call expands to the body of function while_loop with i replaced by 3
printi(0);
printi(1);
printi(2);
if (3 < 3) {
printi(2);
while_loop(2 + 1);
}
=> condition 3 < 3 is false...
printi(0);
printi(1);
printi(2);
if (FALSE) {
printi(2);
while_loop(2 + 1);
}
=> ... so the if-statement disappears
printi(0);
printi(1);
printi(2);
The while loop and the recursive call above produce the same result, but are not necessarily equivalent in terms of efficiency. The above steps help to understand how the recursion proceeds. It is not an accurate depiction of what is going on internally. We will discuss these details later.
9.4 Termination and Boundaries
As mentioned above, iteration requires checking whether the loop eventually terminates. The loop terminates if its run condition gets false. This can only happen through changes in the loop body (or the condition was initially false; or other events modify state, which we do not consider here). You should check, whether the contents of the loop body (or function body in the case of recursion) have the potential to finally make the run condition false. Otherwise the iteration will never terminate. Moreover, it is important to check the values of the variables inside a loop body on the first iteration. For each variable that appears in the iteration condition, check its initial value. The condition gives you additional information: The condition is true when entering the loop. It is false immediately after the loop (this might be different if a break statement is used, which we discuss later).
int i = 30;
while (i >= 3) {
// here: i >= 3 (because loop condition is true here)
// here: i <= 30 (because of initialization and update of i)
printiln(i);
i = i - 3;
}
// here: i < 3 (because loop condition is false here)
It is very easy (and common) to make errors with loops, in which the loop variable is off by one (e.g., goes one step too far). Hence always think through the value in the first iteration and what is implied by the loop condition. To practice thinking about edge cases that occur at iteration boundaries we introduce the concept of an assertion.
int i = 30;
while (i >= 3) {
assert(i >= 3); // from loop condition being true here
assert(i <= 30); // from initialization and update
printiln(i);
i = i - 3;
}
assert(i < 3); // from loop condition being false here
assert(1 == 2); // assume that this is on line 123
Line 123: assertion violated
The difference between assertions and test cases (check_expect_...) is that assertions state some abstract invariant truth about the state of execution at some point, whereas test cases are concrete examples of input data and expected results. Moreover, a fulfilled assertion does not produce output, a violated one stops the program. A fulfilled test case produces positive output, a violated test case produces negative output but does not stop the program so that other test cases will be executed, too.
We are now ready to formulate a design recipe for loops. It works identically with for loops and while loops.
9.5 Recipe for While Loops
Start with the following template to express an iteration through while loops. The template and the following example are contained in while_30_3_squared.c.
Template for while loops:
#include "base.h"
int main(void) {
// purpose
int i = initial_value;
while (condition_on_i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
update i;
}
assert(!(condition_on_i)); // inverse of condition (! for "not")
return 0;
}
Formulate a purpose statement for the iteration. Describe the range and order of values to be iterated over. If possible, name the first and the last value. Then describe the effect of statements on each iterated value or on the iterated values as a whole.
// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
If necessary, change the name of the iteration variable i. If necessary, change the type of the iteration variable (e.g., to double). Fill in the initial value and the run condition according to the purpose statement. The run condition needs to appear in three places.
// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
int i = 30;
while (i >= 3) {
assert(i >= 3);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
update i;
}
assert(!(i >= 3)); // inverse of condition (! for "not")
Now define the update statement for the iteration variable i. Think about how the second value of the iteration variable is generated from its initial value. Then think about how in general the next value of the iteration variable is produced from the previous one. The purpose statement of the example states that we need the multiples of 3. 30 is a multiple of 3. The next smaller multiple of 3 is 27. The next smaller multiple of 3 is 24. In general, if i is a multiple of 3, then (i - 3) is the next smaller multiple of 3. This leads to the update statement
i = i - 3;
// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
int i = 30;
while (i >= 3) {
assert(i >= 3);
assert(i <= 30);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
i = i - 3;
}
assert(!(i >= 3)); // inverse of condition (not)
30
27
24
21
18
15
12
9
6
3
printiln(i * i); // square the iterated values
// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
int i = 30;
while (i >= 3) {
assert(i >= 3);
assert(i <= 30);
// printiln(i); // for inspecting the iteration variable
printiln(i * i); // square the iterated values
i = i - 3;
}
assert(!(i >= 3)); // inverse of condition (not)
900
729
576
441
324
225
144
81
36
9
9.6 Recipe for For Loops
The recipe described in the previous section also applies to for loops. This is the template for for loops. The template and example are contained in for_30_3_squared.c.
Template for for loops:
#include "base.h"
int main(void) {
// purpose
for (int i = initial_value; condition_on_i; update i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
return 0;
}
int main(void) {
// purpose
int i = 0; // declaration outside of for loop
for (i = initial_value; condition_on_i; update i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
assert(!(condition_on_i)); // inverse of condition (not)
return 0;
}
9.7 Aggregation with Loops
Often, the purpose of iteration is to produce some aggregated value, such as a sum or an average. The following example shows how to compute the sum of the squares of the first ten positive integers.
#include "base.h"
int main(void) {
// purpose
int i = initial_value;
while (condition_on_i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
update i;
}
assert(!(condition_on_i)); // inverse of condition (not)
return 0;
}
// Iterates over the integers 1 to 10.
// Computes the sum of squares of these values.
// Iterates over the integers 1 to 10.
// Computes the sum of squares of these values.
int i = 1;
while (i <= 10) {
assert(i <= 10);
assert(i >= 1);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
i = i + 1;
}
assert(!(i <= 10)); // inverse of condition (not)
1
2
3
4
5
6
7
8
9
10
The sum is computed by adding up the squares of the iterated values one by one. The implementation needs an additional variable (sum) to hold the intermediary (and final) result of the sum.
// Iterates over the integers 1 to 10.
// Computes the sum of squares of these values.
int sum = 0; // <--- new variable for intermediary (and final) result
int i = 1;
while (i <= 10) {
assert(i <= 10);
assert(i >= 1);
// printiln(i); // for inspecting the iteration variable
sum = sum + i * i; // <--- adding the next squared value
i = i + 1;
}
assert(!(i <= 10)); // inverse of condition (not)
prints("sum = "); // <--- output
printiln(sum); // <--- output (value and newline)
sum = 385
9.8 Recipe for Repetition through Recursion
Repetition through recursion is not very common in C. There are several reasons for this, one of which is efficiency. It is still useful to study this case, as it helps to understand algorithms on recursive data structures. The template for repetition through recursion is as follows. The template and example are contained in rec_30_3_squared.c.
#include "base.h"
// purpose
// Should only be called from repeat (or from itself).
void repeat_rec(int i) {
if (condition_on_i) {
assert(condition_on_i);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
repeat_rec(update_i); // function calls itself
// with another argument
}
}
void repeat(void) {
repeat_rec(initial_value);
}
int main(void) {
repeat();
return 0;
}
Function repeat acts as an initialization function and is only called once. Function repeat_rec is designed to be called from repeat and calls itself recursively. This is a common pattern for recursive functions.
// Iterates over the multiples of 3, from 30 down to 3.
// Outputs the square of each of these values.
// Should only be called from repeat (or from itself).
void repeat_rec(int i) {
if (i >= 3) {
assert(i >= 3);
// printiln(i); // for inspecting the iteration variable
printiln(i * i);
repeat_rec(i - 3); // function calls itself
// with another argument
}
}
void repeat(void) {
repeat_rec(30);
}
int main(void) {
repeat();
return 0;
}
The recursive template is problematic. It defines a function, yet does not return a value. Its return type is void. If we wish to aggregate values during recursion and return the result, the following template is applicable.
// purpose
// Should only be called from repeat (or from itself).
int repeat_rec(int i) {
if (condition) {
assert(condition);
// printiln(i); // for inspecting the iteration variable
int contribution_of_i = ...;
return conbribution_of_i <op> repeat_rec(update_value);
} else {
return initial_value_aggregation;
}
}
int repeat(void) {
return repeat_rec(initial_value);
}
int main(void) {
printiln(repeat());
return 0;
}
To make this concrete, let us modify an example from above (rec_sum_1_3_squared.c):
// Iterates over the integers 1 to 3.
// Computes the sum of squares of these values.
// Should only be called from sum_squares (or from itself).
int sum_squares_rec(int i) {
if (i <= 3) {
assert(i <= 3);
// printiln(i); // for inspecting the iteration variable
int contribution_of_i = i * i;
return contribution_of_i + sum_squares_rec(i + 1);
} else {
return 0;
}
}
int sum_squares(void) {
return sum_squares_rec(1);
}
int main(void) {
prints("sum = ");
printiln(sum_squares());
return 0;
}
An alternative is to make the accumulation of the values explicit by adding an accumulator as a second argument of the recursive function:
// Iterates over the integers 1 to 3.
// Computes the sum of squares of these values.
// Should only be called from sum_squares (or from itself).
int sum_squares_rec(int i, int acc) {
if (i <= 3) {
// printiln(i); // for inspecting the iteration variable
return sum_squares_rec(i + 1, acc + i * i);
} else {
return acc;
}
}
int sum_squares(void) {
// initial values of iteration variable and accumulator
return sum_squares_rec(1, 0);
}
This variant makes very clear how values are transferred from recursive call to recursive call. To clarify this point, we show the expansion of the recursive calls.
sum_squares();
=>
sum_squares_rec(1, 0);
=> the call expands to the function body with 1 for i and 0 for acc
if (1 <= 3) {
return sum_squares_rec(1 + 1, 0 + 1 * 1);
} else {
return 0;
}
=> 1 <= 3 is true, thus the if-statement expands to the then-case
return sum_squares_rec(1 + 1, 0 + 1 * 1);
=> the call expands to the function body with 2 for i and 1 for acc
if (2 <= 3) {
return sum_squares_rec(2 + 1, 1 + 2 * 2);
} else {
return 1;
}
=> 2 <= 3 is true, thus the if-statement expands to the then-case
return sum_squares_rec(2 + 1, 1 + 2 * 2);
=> the call expands to the function body with 3 for i and 5 for acc
if (3 <= 3) {
return sum_squares_rec(3 + 1, 5 + 3 * 3);
} else {
return 5;
}
=> 3 <= 3 is true, thus the if-statement expands to the then-case
return sum_squares_rec(3 + 1, 5 + 3 * 3);
=> the call expands to the function body with 4 for i and 14 for acc
if (4 <= 3) {
return sum_squares_rec(4 + 1, 14 + 4 * 4);
} else {
return 14;
}
=> 4 <= 3 is false, thus the if-statement expands to the else-case
return 14;
=> which finally yields the value 14
14
In some languages this variant is as efficient as non-recursive loops, because no information from the previous call has to be remembered. For example, gcc performs tail call optimization in this situation, if you use the compiler flag -O2.
The rest of this section is optional material.
Recursive, without accumulator:
sum = 1626540144
time: 0.674 ms
Recursive, with accumulator:
sum = 1626540144
time: 0.548 ms
Equivalent while loop:
sum = 1626540144
time: 0.249 ms
Recursive, without accumulator:
sum = 1626540144
time: 0.002 ms
Recursive, with accumulator:
sum = 1626540144
time: 0.002 ms
Equivalent while loop:
sum = 1626540144
time: 0.002 ms
9.9 Exercises
Write a while loop that computes the mean of all positive, even integers up to 100.
Write a recursive function with accumulator that computes the mean of all positive, even integers up to 100.
Three assertions are given with loops. Explain why these assertions hold.
10 Arrays: Fixed-Size Collections
Up to now we have only dealt with fixed-size data. These data were either atomic (like integers) or compound (like structures), but each individual item was explicitly named in the data type definition. This is inconvenient for representing collections of items of the same kind. An example would be a sequence of 24 temperature values in degrees Celsius, measured at each hour of the same day. This information can be represented as a fixed-size collection. When such a collection is created, we have to provide its size. Once created, the size does not change.
There are multiple possibilities for representing collections of items of the same type in a program. We start with arrays, which have a fixed number of elements. In the next chapter we will see lists, which are collections with a variable number of elements. The unifying characteristic of collections is that they treat a number of elements as a whole. Operations on such collections may then be applied to each element, which is a new form of abstraction.
Arrays store elements of a certain type. The elements can be retrieved by their index position. The first element has index 0, the second element has index 1, and so on. To simplify working with C arrays we introduce a set of functions. The documentation of these functions is located in arrays_lists.h, int_array.h, double_array.h, and string_array.h, respectively. Test examples and function implementations can be found in the corresponding .c files.
This chapter does not initially introduce new language concepts, but uses the provided array functions. Later we will discuss built-in C arrays, which pose some difficulties for beginners. Using this strategy we hope that you become familiar with working with fixed-size collections of data, before tackling details of the C language. The focus is on design recipes for processing fixed-size collections. For example, for a sequence of temperature values we might be interested in the average temperature, the minimum or the maximum temperature, or we might want to convert these values from degrees Celsius to degrees Fahrenheit.
The elements of arrays are typically processed using while and for loops, as introduced in the previous chapter.
10.1 Temperatures
Imagine that you take the temperatures outside the main building of Hannover University. You take measurements (in °C) at each full hour (the first one at 0:00 and the last one at 23:00). This is the result:
7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10, 11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4
/*
Compile: make array_temperatures
Run: ./array_temperatures
*/
#include "base.h"
#include "arrays_lists.h"
int main(void) {
Array temps = ia_of_string( // create int array from string
"7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
"11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");
ia_println(temps); // output the whole int array
printiln(a_length(temps)); // output length of int array
printiln(get(int, temps, 0)); // get first value of int array
printiln(get(int, temps, 23)); // get last value of int array
a_free(temps); // release memory of int array
return 0;
}
[7 6 5 6 6 5 6 6 6 7 7 9 10 11 11 12 13 12 10 9 9 8 7 4]
24
7
4
Each individual temperature is represented as an integer. The collection of 24 temperatures is represented as an array of integers.
The function ia_of_string takes the sequence of temperatures as a string and produces a corresponding array. (The prefix ia_ stands for “integer array.”) Characters in the string that are not digits just separate numbers. The produced array is assigned to the temps variable. Function ia_println prints the array (first line of output), followed by a line break. Function a_length returns the number of elements in the array. Function printiln prints this value (i stands for “integer”, ln stands for “line break”) and produces the second line of output. Function get returns the array element of the given type at the given index position. The values at index positions 0 (first element) and 23 (last element) are output as the third and fourth line. Function a_free frees the memory used by the array. This function is not strictly necessary here. If an array is no longer needed, this function should be called in the way shown. We will discuss the details later.
printiln(get(int, temps, 24));
a_get: index 24 is out of range (array length: 24, allowed indices: 0..23)
Imagine we now wish to find out whether the array contains any freezing temperatures. To solve this problem we apply the design recipes for functions defined earlier and then focus on how to process the array in a loop.
Problem statement
The function to implement can be generalized a bit beyond arrays of 24 temperatures. We wish to write a function that is applicable to any array of integers, no matter how long and how to interpret the data. The function we want should just check, whether the array contains at least one negative value.
Data definition
The temperatures are defined as an array of integers. Each element represents a temperature in degrees Celsius. Each array index stands for a full hour, e.g., index 0 represents 0:00, index 6 represents 6:00.
Function signature
The function we look for takes an integer array and returns a Boolean value.// Array(int) -> bool
Function name
contains_negative
Function header
bool contains_negative(Array array);
Function stub
bool contains_negative(Array array) {
return false;
}
Purpose statement
// Returns true if the array contains negative values,
// otherwise returns false.
Examples and expected results (test cases)
static void contains_negative_test(void) {
Array array;
array = ia_of_string("10, 20, 30");
check_expect_b(contains_negative(array), false);
a_free(array);
array = ia_of_string("0, 0, 1, 1, 1");
check_expect_b(contains_negative(array), false);
a_free(array);
array = ia_of_string("-1, -3");
check_expect_b(contains_negative(array), true);
a_free(array);
array = ia_of_string("1, 3, -99");
check_expect_b(contains_negative(array), true);
a_free(array);
array = ia_of_string("-1, 3, 99");
check_expect_b(contains_negative(array), true);
a_free(array);
array = ia_of_string("");
check_expect_b(contains_negative(array), false);
a_free(array);
}
int main(void) {
contains_negative_test();
return 0;
}
At this stage the program compiles, but some of the tests will fail when running the program.
Function body
Now we implement the body of contains_negative. Clearly, we have to look at the individual values in the array to check for a negative one. We stop if we have found a negative value or looked at all of the values. This kind of task can be done with a for loop. We use the recipe from the previous chapter to start developing the algorithm.// Returns true if the array contains negative values,
// otherwise returns false.
bool contains_negative(Array array) {
// <from recipe for loops>
// purpose
for (int i = initial_value; condition_on_i; update_i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
return false;
}
The first step is to formulate the purpose statement of the iteration. It consists of two parts, the first referring to the iteration (what indices in what order), the second referring to the operation on each iterated element: (1) Describe the range and order of elements to be iterated over. (2) Describe the effect of the loop body on each iterated value.// Iterates over the elements of the array (any order).
// For each array element, checks whether it is negative.
// If so returns true, otherwise continues
// until all of the elements have been inspected.
// If there is no negative value, answers false.
Iterating over the elements requires iterating over their indices in the array. In the temperature example, the allowed index positions are 0 (for the first element), 1 (for the second element), up to 23 (for the last element). It is a general convention in C (and in many other programming languages) to start counting array indices at 0. Thus, we replace initial_value with 0 and update_i with i = i + 1. The run condition depends on the length of the array. The last allowed index is 23 for a length of 24 or, in general, n - 1 for length n. condition_on_i thus becomes i <= n - 1 or equivalently i < n. Make sure you understand why the last two expressions are equivalent for integer variables. We are ready to take the next step of the recipe and put in values and expressions for these placeholders:bool contains_negative(Array array) {
// Iterate over the elements of the array (any order).
// ...
for (int i = 0; i < a_length(array); i = i + 1) {
assert(i < a_length(array));
assert(i >= 0);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
return false;
}
Now, compile the program and run it. For the above test cases, the output should be these indices and test results (possibly with different line numbers):0
1
2
Line 24: check passed
0
1
2
3
4
Line 28: check passed
0
1
Line 32: Actual value false differs from expected value true.
0
1
2
Line 36: Actual value false differs from expected value true.
0
1
2
Line 40: Actual value false differs from expected value true.
Line 44: check passed
To simplify the discussion, we call contains_negative with the temps array given above. The indices are then0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
However, we are not interested in these indices per se, but in the array elements. To see them, replaceprintiln(i); // for inspecting the iteration variable
withprintiln(get(int, array, i));
to obtain7
6
5
6
6
5
6
6
6
7
7
9
10
11
11
12
13
12
10
9
9
8
7
4
Which are indeed the elements of the temps array.To find out whether the array contains any negative values, we need to look at each array element in turn. We now focus our attention to the second part of the purpose statement, i.e., we check for each element whether it is smaller than zero and return true if so. The function becomes:bool contains_negative(Array array) {
// Iterate over the elements of the array (any order).
// For each array element, check whether it is negative.
// If so return true, otherwise continue
// until all of the elements have been inspected.
// If there is no negative value, answer false.
for (int i = 0; i < a_length(array); i = i + 1) {
assert(i < a_length(array));
assert(i >= 0);
int t = get(int, array, i);
if (t < 0) {
return true;
}
}
return false;
}
Compiling and running yields:false
Testing
Applying the test cases to the contains_negative yields the following output:Line 24: check passed
Line 28: check passed
Line 32: check passed
Line 36: check passed
Line 40: check passed
Line 44: check passed
Review and revise
It is a good idea to call the a_length only once. Cleaning up the function we obtain:
bool contains_negative(Array array) {
// Iterate over the elements of the array (any order).
// For each array element, check whether it is negative.
// If so return true, otherwise continue
// until all of the elements have been inspected.
// If there is no negative value, answer false.
int n = a_length(array);
for (int i = 0; i < n; i++) {
assert(i < n);
assert(i >= 0);
if (get(int, array, i) < 0) {
return true;
}
}
return false;
}
10.2 Selecting Data from an Array
Imagine that we want to reduce the amount of data in the temperature array and only consider the data of every second hour, i.e., we only want the temperatures at 0:00, 2:00, 4:00, ..., 22:00. Let us apply the same recipe as above.
Problem statement
The function to implement needs to subsample the data and provide every second element of the original array. It should start with the first element.
Data definition
The information is represented in arrays with elements of type integer.
Function signature
The function we look for takes an integer array and returns an integer array.// Array(int) -> Array(int)
- Function name
every_other
- Function header
Array every_other(Array array);
Function stub
Array every_other(Array array) {
return array(int, 0); // an empty array
}
Purpose statement
// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Examples and expected results (test cases)
static void every_other_test(void) {
ia_check_expect(
every_other(ia_of_string("1 2 3")),
ia_of_string("1 3"));
ia_check_expect(
every_other(ia_of_string("")),
ia_of_string(""));
ia_check_expect(
every_other(ia_of_string("-2")),
ia_of_string("-2"));
ia_check_expect(
every_other(ia_of_string("1 2")),
ia_of_string("1"));
ia_check_expect(
every_other(ia_of_string("1 2 3 4")),
ia_of_string("1 3"));
}
int main(void) {
every_other_test();
return 0;
}
The check function ia_check_expect compares two integer arrays, element by element. To pass, the lengths must be equal and for each index position the elements must be equal.
At this stage the program compiles, but some of the tests will fail when running the program. Note that no memory is released in these tests (a_free is never called) in order to improve readability and to focus on the essentials.
Function body
Now we implement the body of every_other. We again use the recipe for for loops from the previous chapter.// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
// <from recipe for loops>
// purpose
for (int i = initial_value; condition_on_i; update_i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
return array(int, 0); // an empty array
}
The first step is to formulate the purpose statement of the iteration. It consists of two parts, the first referring to the iteration (what indices in what order), the second referring to the operation on each iterated element: (1) Describe the range and order of elements to be iterated over. (2) Describe the effect of the loop body on each iterated value.// Iterates over the even indices in the original order.
// Copy each element at an even index to the result array.
We are interested in the even indices, i.e., 0, 2, 4, etc. Thus, we replace initial_value with 0 and update_i with i = i + 2. The run condition depends on the length of the array: condition_on_i is replaced by i < a_length(array). The second assertion states that i is not negative and even. We are ready to take the next step of the recipe and put in values and expressions for these placeholders:// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
// Iterates over the even indices in the original order.
// Copies each element at an even index to the result array.
for (int i = 0; i < a_length(array); i = i + 2) {
assert(i < a_length(array));
assert(i >= 0 && (i % 2) == 0); // % is the modulo operator
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
return array(int, 0); // an empty array
}
Applying every_other to the temps array yields these indices:0
2
4
6
8
10
12
14
16
18
20
22
To print the corresponding elements, replaceprintiln(i); // for inspecting the iteration variable
withprintiln(get(int, array, i));
to obtain7
5
6
6
6
7
10
11
13
10
9
7
These values should not be printed, but put into a result array. The lengh of the result array is (n + 1) / 2, where n is the length of the input array. The division operator / here denotes an integer division, which discards the fraction. Check that this expression yields the correct lengths for the test cases given above. The remaining task is to put the values read from even indices of array to successive indices of result. The transformation of index positions in the input array to index positiosn of the result array is simply the integer division by 2.
The function becomes:// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
int n = a_length(array); // length of input array
int m = (n + 1) / 2; // length of result array
Array result = array(int, m);
// Iterate over the even indices in the original order.
// Copy each element at an even index to the result array.
for (int i = 0; i < a_length(array); i = i + 2) {
assert(i < a_length(array));
assert(i >= 0 && (i % 2) == 0);
int x = get(int, array, i); // read from array
set(int, result, i / 2, x); // write to result
}
return result;
}
int main(void) {
Array temps = ia_of_string(
"7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
"11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");
Array eo = every_other(temps);
ia_println(eo);
a_free(eo);
a_free(temps);
return 0;
}
Compiling and running yields:[7 5 6 6 6 7 10 11 13 10 9 7]
Testing
Applying the test cases to the every_other yields the following output:Line 181: check passed
Line 182: check passed
Line 183: check passed
Line 184: check passed
Line 185: check passed
Review and revise
A few final touches lead to this implementation:
// Returns an array that contains every other element of
// the provided array, i.e., the result array contains
// the elements at the even indices of the provided array.
Array every_other(Array array) {
int n = a_length(array); // length of input array
int m = (n + 1) / 2; // length of result array
Array result = array(int, m);
// Iterate over the even indices in the original order.
// Copy each element at an even index to the result array.
for (int i = 0; i < n; i += 2) {
assert(i < n);
assert(i >= 0 && (i % 2) == 0);
set(int, result, i / 2, get(int, array, i));
}
return result;
}
int main(void) {
Array temps = ia_of_string(
"7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
"11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");
Array a = every_other(temps);
ia_println(a);
while (a_length(a) > 1) {
Array b = every_other(a);
ia_println(b);
a_free(a);
a = b;
}
a_free(a);
a_free(temps);
return 0;
}
[7 5 6 6 6 7 10 11 13 10 9 7]
[7 6 6 10 13 9]
[7 6 13]
[7 13]
[7]
10.3 Reducing an Array to a Single Value
Your task is now to find the average temperature during selectable periods of the day. These periods could be morning, afternoon, evening, etc. They can be precisely defined by specifying the start hour and end hour of the chosen period. Computing the temperature from the input array and the period is a larger task that can be broken down into three subtasks:
Compute a subarray for the given period.
Compute the sum of the elements in the subarray.
Divide the sum by the number of elements in the subarray.
These three subtasks could conceivably also be written as a single function. Choosing a suitable granularity for functions is part of the design decisions that a programmer has to make. A function should be responsible for a single, well-defined task only. Defining smaller function often pays off later, when such functions can be used again. For example, a function for creating a subarray is probably useful in many future programming tasks. The same is true for a function that computes the sum of the elements in an array. The decision is up to the programmer. But here, we will implement these three functions separately. This example is contained in array_mean.c.
10.3.1 Compute a subarray
Problem statement
Compute a subarray of a given array.
Function signature
The function takes an array and information about which part copy. The latter could be specified with a start index and a length, or with a start index and an end index. We choose to use start index and end index and specify that the start index is inclusive and the end index is exclusive. This is a very common convention.// Array(int), int, int -> Array(int)
Function name
subarray
- Function header
Array subarray(Array array, int start_index, int end_index);
Function stub
Array subarray(Array array, int start_index, int end_index) {
return array(int, 0); // an empty array
}
Purpose statement
// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Examples and expected results (test cases)
static void subarray_test(void) {
Array array, array2, sub;
array = ia_of_string("1 2 3 4");
sub = subarray(array, 0, a_length(array));
// [0, len) should give the original array
ia_check_expect(array, sub);
a_free(sub);
sub = subarray(array, -1, a_length(array) + 1);
// [-1, len + 1) should also give the original array
ia_check_expect(array, sub);
a_free(sub);
array2 = ia_of_string("2 3 4");
// [1, len) should drop the first element
sub = subarray(array, 1, a_length(array));
ia_check_expect(array2, sub);
a_free(array2);
a_free(sub);
array2 = ia_of_string("2 3");
// [1, len - 1) should drop the first and last element
sub = subarray(array, 1, a_length(array) - 1);
ia_check_expect(array2, sub);
a_free(array2);
a_free(sub);
array2 = ia_of_string("");
// [1, 1) is an empty range
sub = subarray(array, 1, 1);
ia_check_expect(array2, sub);
a_free(sub);
sub = subarray(array, 2, 1);
// [2, 1) is an empty range as well
ia_check_expect(array2, sub);
a_free(array2);
a_free(sub);
a_free(array);
}
int main(void) {
subarray_test();
return 0;
}
Again, the check function ia_check_expect compares two integer arrays, element by element.
Verify that the program compiles. Think about the examples. Do they cover the important cases? Add more examples if necessary.
Function body
We start with the recipe from the previous chapter.// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
// <from recipe for loops>
// purpose
for (int i = initial_value; condition_on_i; update_i) {
assert(condition_on_i);
assert(condition_on_i_init_update);
printiln(i); // for inspecting the iteration variable
// statements ...i...;
}
return array(int, 0);
}
The first step is to formulate the purpose statement of the iteration. It consists of two parts, the first referring to the iteration (what indices in what order), the second referring to the operation on each iterated element: (1) Describe the range and order of elements to be iterated over. (2) Describe the effect of the loop body on each iterated value.// Iterate over the elements of the array from start_index
// (inclusive) to end_index (exclusive).
// Copy element array[start_index] to result[0].
// Copy element array[start_index + 1] to result[1].
// Copy element array[start_index + 2] to result[2].
// ...
// Copy element array[start_index + n] to result[n].
// Where n is end_index - start_index - 1.
Since there are two arrays involved (the input array and the result array) we need two indices. We call them i (for iterating over the input array) and j (for iterating over the result array). The initialization expression becomes int i = start_index, j = 0. The update expression update_i becomes i++, j++. The run condition becomes i < end_index. The end index is exclusive, thus we have to stop before it is reached. We modify printiln(i) to show both i and j. Filling in these gaps leads to:// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
// <from recipe for loops>
// purpose
for (int i = start_index, j = 0; i < end_index; i++, j++) {
assert(i < end_index);
assert(i >= start_index);
printf("i = %d, j = %d\n", i, j);
// statements ...i...;
}
return array(int, 0);
}
Now, compile the program and run it. For the above test cases, the output should be these indices and test results (likely except the actual line numbers):i = 0, j = 0
i = 1, j = 1
i = 2, j = 2
i = 3, j = 3
Line 53: check_expect_int_array: Actual length 4 differs from expected length 0
i = -1, j = 0
i = 0, j = 1
i = 1, j = 2
i = 2, j = 3
i = 3, j = 4
i = 4, j = 5
Line 57: check_expect_int_array: Actual length 4 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
i = 3, j = 2
Line 62: check_expect_int_array: Actual length 3 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
Line 68: check_expect_int_array: Actual length 2 differs from expected length 0
Line 74: check passed
Line 78: check passed
There are some invalid indices, like -1. In general, you should check input arguments for allowed values. Let us first think about when to return an empty array. An empty array should be returned ifstart_index >= end_index or
start_index >= a_length(array) or
end_index <= 0.
If none of these are given, we still need to check whether start_index < 0 and end_index > a_length(array). In any of these cases, we set the indices to the minimum and maximum allowed values, respectively. The result is:// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
if (start_index >= end_index ||
start_index >= a_length(array) ||
end_index <= 0)
{
return array(int, 0);
}
assert(start_index < end_index);
assert(start_index < a_length(array));
assert(end_index > 0);
if (start_index < 0) start_index = 0;
if (end_index > a_length(array)) end_index = a_length(array);
for (int i = start_index, j = 0; i < end_index; i++, j++) {
assert(i < end_index);
assert(i >= start_index);
printf("i = %d, j = %d\n", i, j);
// statements ...i...;
}
return array(int, 0);
}
The output now only contains valid indices:i = 0, j = 0
i = 1, j = 1
i = 2, j = 2
i = 3, j = 3
Line 53: check_expect_int_array: Actual length 4 differs from expected length 0
i = 0, j = 0
i = 1, j = 1
i = 2, j = 2
i = 3, j = 3
Line 57: check_expect_int_array: Actual length 4 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
i = 3, j = 2
Line 62: check_expect_int_array: Actual length 3 differs from expected length 0
i = 1, j = 0
i = 2, j = 1
Line 68: check_expect_int_array: Actual length 2 differs from expected length 0
Line 74: check passed
Line 78: check passed
The remaining task is to create a result array of length end_index - start_index, to read from the input array and to copy to the output array:// Creates a new array consisting of array[start_index, end_index).
// start_index is inclusive, end_index is exclusive.
Array subarray(Array array, int start_index, int end_index) {
if (start_index >= end_index ||
start_index >= a_length(array) ||
end_index <= 0)
{
return array(int, 0);
}
assert(start_index < end_index);
assert(start_index < a_length(array));
assert(end_index > 0);
if (start_index < 0) start_index = 0;
if (end_index > a_length(array)) end_index = a_length(array);
int n = end_index - start_index;
Array result = array(int, n);
for (int i = start_index, j = 0; i < end_index; i++, j++) {
assert(i < end_index);
assert(i >= start_index);
int value = get(int, array, i); // read from input array
set(int, result, j, value); // write to output array
}
return result;
}
Testing
All test cases pass:Line 53: check passed
Line 57: check passed
Line 62: check passed
Line 68: check passed
Line 74: check passed
Line 78: check passed
Review and revise
Check whether you find aspects in the code that can be simplified.
10.3.2 Compute the sum of the array elements
Computing the sum of the elements is simple enough that we omit the full recipe and show the final function. As an exercise, and without the solution in front of you, follow the steps of the recipe and check whether you arrive at a similar solution.
// Computes the sum of the elements in an array.
int ia_sum(Array array) {
int sum = 0;
for (int i = 0; i < a_length(array); i++) {
sum = sum + get(int, array, i);
}
return sum;
}
10.3.3 Compute the average of the array elements
This function reuses the two functions just defined. The solution is easy. Just divide the sum by the number of elements in the list. The main challenge lies in handling empty arrays. In this case division is not defined. There are multiple strategies to handling this. The most drastic one is to end the program. Another one would be to define a special output type that can either represent the average value or nothing, depending on whether the input array is empty. We start with the drastic version:
// Computes the average of the elements in an array.
int ia_average(Array array) {
int n = a_length(array);
if (n == 0) {
printsln("Impossible. Bye.");
exit(1); // terminate the program
}
int sum = ia_sum(array);
return sum / n;
}
10.3.4 Compute the average for a range of array elements
Problem statement
Time to get back to our original problem: Computing the average temperature of a selectable timespan over a day.
Function signature
The function takes an array and the start and end indices of the relevant part.// Array(int), int, int -> int
Function name
subarray_average
Function header
int subarray_average(Array, int, int);
Parameter names
int subarray_average(Array array, int start_index, int end_index);
Function stub
int subarray_average(Array array, int start_index, int end_index) {
return 0;
}
Purpose statement
// Computes the average over the specified subarray.
// The start index is inclusive, the end index is exclusive.
Examples and expected results (test cases)
static void subarray_average_test(void) {
Array array;
array = ia_of_string("1 2 3 4");
check_expect_i(subarray_average(array, 0, 1), 1);
check_expect_i(subarray_average(array, 3, 4), 4);
check_expect_i(subarray_average(array, 0, 4), 2);
check_expect_i(subarray_average(array, 1, 4), 3);
a_free(array);
}
int main(void) {
subarray_average_test();
return 0;
}
Verify that the program compiles. Think about the examples. Do they cover the important cases? Add more examples if necessary.
Function body
The function body is quite simple, because we can use the functions defined above.// Computes the average over the specified subarray.
// The start index is inclusive, the end index is exclusive.
int subarray_average(Array array, int start_index, int end_index) {
Array sub = subarray(array, start_index, end_index);
int average = ia_average(sub);
a_free(sub);
return average;
}
Testing
All test cases pass:Line 210: check_expect_int passed
Line 211: check_expect_int passed
Line 212: check_expect_int passed
Line 213: check_expect_int passed
Review and revise
The function seems to be in good shape.
int main(void) {
Array temps = ia_of_string(
"7, 6, 5, 6, 6, 5, 6, 6, 6, 7, 7, 9, 10,"
"11, 11, 12, 13, 12, 10, 9, 9, 8, 7, 4");
prints("Average morning temperature: ");
printiln(subarray_average(temps, 6, 10));
prints("Average forenoon temperature: ");
printiln(subarray_average(temps, 10, 13));
prints("Average afternoon temperature: ");
printiln(subarray_average(temps, 13, 17));
prints("Average evening temperature: ");
printiln(subarray_average(temps, 17, 22));
return 0;
}
Average morning temperature: 6
Average forenoon temperature: 8
Average afternoon temperature: 11
Average evening temperature: 9
10.4 Optional Results
The ia_average function given above just ends the execution if it is called with an empty array. This behavior is reasonable, but is not apparent from the signature of the function (Array(int) -> int).
// Computes the average of the elements in an integer array.
int ia_average(Array array) {
int n = a_length(array);
if (n == 0) {
printsln("Impossible. Bye.");
exit(1);
}
int sum = ia_sum(array);
return sum / n;
}
There is no way to signal the exceptional condition of an empty array to the caller, because the return type is integer and any integer could be the result of an averaging operation.
The solution is to use an option type as the return type of the function. The option type signals to the programmer that this function may return nothing (when the array or range are empty). The signature of the function becomes
Array(int), int, int -> IntOption
// Array(int), int, int -> IntOption
// Computes the average over the specified subarray.
// The start index is inclusive, the end index is exclusive.
IntOption subarray_average2(Array array, int start_index, int end_index) {
if (start_index >= end_index) {
return make_int_none();
}
Array sub = subarray(array, start_index, end_index);
int average = ia_average(sub);
a_free(sub);
return make_int_some(average);
}
IntOption io = subarray_average2(temps, 17, 17);
if (io.none) {
printsln("none");
} else {
printiln(io.some);
}
As you may have noticed, an IntOption is an example of an Itemization type (see Recipe for Itemizations). It could also be implemented as a Variant type (see Recipe for Variant Data (Sum Types)). It is defined in basedefs.h.
typedef struct IntOption {
bool none;
int some;
} IntOption;
The constructor functions for this type are defined in base.c as
IntOption make_int_none(void) {
IntOption op = { true, 0 };
return op;
}
IntOption make_int_some(int some) {
IntOption op = { false, some };
return op;
}
10.5 Filtering Arrays
The function every_other may alternatively be written by using a filter function. A filter function takes the array and a predicate function. A predicate function is Boolean-valued. The filter function calls the predicate function for each element of the array. The predicate function returns true for elements that should be kept and false for elements that should be filtered out.
bool even_index(int element, int index, int x) {
return (index % 2) == 0; // % is the modulo operator
}
Array every_other(Array array) {
return ia_filter(array, even_index, 0);
}
int int int -> bool
In the call ia_filter(array, even_index, 0) the memory location of the predicate function even_index is provided to ia_filter, so that ia_filter can call even_index for each element.
10.6 All in One Function
This section is optional material.
A potential issue with subarray_average as defined above is that the subarray is allocated just to compute the average, only to be thrown away immediately afterwards.
int subarray_average(Array array, int start_index, int end_index) {
Array sub = subarray(array, start_index, end_index);
int average = ia_average(sub);
a_free(sub); // <--- throw away
return average;
}
unsigned long time_averaging(int n) {
Array array = ia_range(0, n); // fill with 0, 1, ..., n - 1
unsigned long x = 0;
for (int i = 1; i < n; i++) {
x += subarray_average(array, 0, i);
}
a_free(array);
return x;
}
int main(void) {
time_function( time_averaging(50000) );
return 0;
}
The timimg results (given in milliseconds) are:
Variant |
| Time (ms) |
original, with assertions, get |
| 4545 |
three functions, no assertions, get |
| 1330 |
three functions, no assertions, subscripts |
| 485 |
three functions, no assertions, memcpy |
| 333 |
one function, no assertions, get |
| 723 |
one function, no assertions, subscripts |
| 111 |
int subarray_average(Array array, int start_index, int end_index) {
int n = array->n;
if (start_index >= end_index ||
start_index >= n ||
end_index <= 0)
{
printsln("Cannot do this. Bye.");
exit(1);
}
if (start_index < 0) start_index = 0;
if (end_index > n) end_index = n;
n = end_index - start_index;
int sum = 0;
int *p = array->a;
for (int i = start_index; i < end_index; i++) {
sum += p[i];
}
return sum / n;
}
This shows that accessing array elements via subscripting [ ] is much faster than using function get. The variant with subscripts takes only 36% of the running time (485 ms) compared to the getter function (1330 ms). The benefit of the latter is that it checks array bounds on each access. Subscripting of C arrays does not perform bounds checking and allows accessing arbitrary memory locations beyond the limits of the array. Furthermore, the unnecessary creation of a temporary array triples the running time (333 ms vs 111 ms). The memcpy function quickly copies contiguous blocks of memory. Typically, for loop bodies that get executed very often, it is beneficial to optimize for performance.
10.7 Exercises
Write function that returns the smallest element of an integer array.
Write function that computes the sum of all values of an integer array that are larger than their immediate predecessor in the array.
Compute the number of zero crossings in an array. A zero crossing is a transition from negative to positive values, or vice versa.
11 Memory in C
11.1 Introduction
Automatic storage (call stack)
Static storage
Dynamic storage (heap)
When a program ends, all of its memory is released. (A running program is a process. When the process terminates, the operating system releases all of the resources associated with that process.)
11.2 Automatic Storage
Automatic storage refers to the call stack. Non-static local variables and function parameters are stored on the call stack. They are automatically allocated when a block is entered and automatically destroyed when a block is left. Their lifetime is thus linked to the block in which they are declared. This allows for function recursion as space for the new stack frame is allocated on the call stack: Arguments are pushed on the stack and space for local variables is set aside on the stack. Moreover, the return address to the instruction after the call is stored on the call stack.
Automatic storage is efficient, has clear semantics, and supports function argument passing well. One issue is that the stack may overflow if it is not large enough to hold all of the stack frames for currently active functions. This can become a problem mainly for recursive calls.
11.3 Static Storage
Static storage refers to variables that are allocated at the file level or as static local variables. The lifetime of such variables equals the lifetime of the program. The size of static storage needs to be known at compile time. This kind of storage is suitable for long-lived and “global” entities.
Static storage reduces the need for explicitly declared function parameters. However, functions that rely on static storage are more difficult to test and debug as they do not only rely on their parameters. Their behavior cannot be understood from their parameters alone. They are not necessarily functions in the mathematical sense.
11.4 Dynamic Memory
Dynamic memory (also known as free store or dynamic heap) provides more flexibility than automatic and static storage. Dynamically allocated memory is accessed through pointers. Blocks of dynamic memory are allocated with malloc, xmalloc, calloc, and xcalloc. They are released with free. These functions are declared as:
void* malloc(int n);
void* xmalloc(int n);
void* calloc(int n, int size);
void* xcalloc(int n, int size);
void free(void *p);
malloc and xmalloc take the length in bytes of the memory block needed and return a pointer to the start of the new memory block. calloc and xcalloc take the number of elements and the element size in bytes to allocate a block of length number * size bytes. The c in calloc/xcalloc stands for clear – each byte of the allocated block is set to 0. Allocated blocks are simply cast to the pointer type needed and can be used until explicitly released with free.
With (x)malloc / (x)calloc and free the program explicitly manages the lifetime of a block of memory in the program. This is very flexible and allows to handle memory objects whose size is only known at runtime, e.g., because it depends on user input or on unknown amounts of data received over the network. However, a big problem associated with manual dynamic memory allocation is to correctly free allocated memory. If the program fails to release memory that is no longer needed, then the program has memory leaks. Because dynamic memory is only accessible via pointers, if a pointer variable that holds the only reference to an allocated block of memory is accidentally being overwritten, the block cannot be accessed again.
11.4.1 Dynamic Memory Example
Sometimes variable amounts of memory are required. The exact amount may depend on user input and may not be available at compile time. In such situations dynamic memory allocation is helpful. An example would be a function function String stars(int n), which produces a String of n ‘*’ characters. So for example, stars(2) produces “**” and stars(8) produces “********”. Dynamic memory allocation allows requesting memory as needed. For stars this is n + 1 characters: n repetitions of the ‘*’ character followed by a ‘\0’ character to terminate the String. The complete function including test cases looks like this (see stars.c):
/*
Compile: make stars
Run: ./stars
*/
#include "base.h"
#include "string.h"
// note: typedef char* String;
String stars(int n) {
char *s = xmalloc(n + 1);
// or: char *s = xcalloc(n + 1, 1);
for (int i = 0; i < n; i++) {
s[i] = '*';
}
s[n] = '\0';
return s;
}
void stars_test(void) {
String s;
check_expect_s(s = stars(0), "");
free(s);
check_expect_s(s = stars(1), "*");
free(s);
check_expect_s(s = stars(2), "**");
free(s);
check_expect_s(s = stars(3), "***");
free(s);
}
int main(void) {
base_set_memory_check(true);
stars_test();
return 0;
}
11.4.2 malloc vs. xmalloc
int* p;
if ((p = malloc(100 * sizeof(int))) == NULL) {
// handle error
}
// use allocated memory
In contrast, xmalloc and xcalloc exit the program if a request for dynamic memory cannot be fulfilled. These functions never return NULL, which obviates the need to check for this case. However, xmalloc and xcalloc are non-standard extensions. They are used in the Programming I C Library for debugging. They are defined to keep track of which memory blocks have been dynamically allocated and tell the programmer if they have not been correctly released when the program exits. These implementations of xmalloc and xcalloc should not be used in production code, because the program has no way of reacting to scarce memory resources gracefully (e.g. by saving the current state to a file). Several implementations of xmalloc and xcalloc allow for defining an error handling function that is called in the event of an out-of-memory error before exiting.
11.4.3 The “Heap”
The following shows a simplified model of how blocks of memory are dynamically allocated and deallocated on the heap (also called free store). The example assumes that the heap consists of 16 bytes of memory, all of which are initially free. The diagrams show the effects of differenct calls to malloc, xmalloc, and free on the heap.
Byte *p1 = xmalloc(4); // returns 1
Requesting 4 bytes in the initial state yields address 1. The block extends from address 1 to 4.
Byte *p2 = xmalloc(3); // returns 5
Requesting a block of 3 bytes yields address 5. This block extends from address 5 to 7.
Byte *p3 = xmalloc(2); // returns 8
Requesting a block of 2 bytes yields address 8. This block extends from address 8 to 9.
free(p2); // p2 == 5
The block from address 5 to 7 is now free to be allocated again.
Byte *p4 = xmalloc(4); // 10
Requesting a block of 4 bytes yields address 10. This block extends from address 10 to 13. (The free block from address 5 to 7 is not large enough.)
Byte *p5 = malloc(4); // returns NULL
Using malloc to tequest another block of 4 bytes yields NULL, because there is no contiguous block of 4 or more bytes available. xmalloc(4) would have exited the program.
free(p3); // p3 == 8
The block from address 5 to 9 is now free to be allocated again.
Byte *p6 = xmalloc(4); // returns 5
Again requesting a block of 4 bytes yields address 5, because the first contiguous block of 4 or more bytes starts at address 5. The newly allocated block extends from address 5 to 8.
Byte *p7 = xmalloc(2); // returns 14
Requesting a block of 2 bytes yields address 14, because the first contiguous block of 2 or more bytes starts at address 14. The newly allocated block extends from address 14 to 15.
12 Lists: Variable-Size Collections
Lists are collections of variable size. Elements can be added to and removed from lists as necessary. This is in contrast to arrays, whose size is specified on creation and does not change afterwards. Lists are very flexible and versatile. For example, lists allow for efficient insertion between existing elements. However, accessing an element at a given position is less efficient than for arrays. A chain of pointers has to be followed to get to the desired element. Like arrays, lists allow treating a number of elements as a whole, with operations that are applied to each element of the list.
The Programming I C Library introduces lists for elements of various types. The documentation of these lists is available online and in the header files, list.h, int_list.h, double_list.h, string_list.h, and pointer_list.h, respectively. Test examples and function implementations can be found in the corresponding .c files.
The implementation of linked lists typically relies on pointers and dynamic memory, as introduced in the previous chapter.
12.1 Filtering Strings
Often the final size of a collection is not known at the start of an algorithm. An example would be an algorithm that processes a collection of words and only keeps those words that contain a particular substring. Initially it is unclear what size the filtered collection will have. Allocating an array of the same size as the original word collection would be wasteful. A list is much more suitable in this case. Words on the input collection are inspected one at a time. If they contain the substring (or match some other relevant criterion) they are appended to the output list. The output list grows with each added element. A possible implementation of a function that solves this problem is derived in the following (see keep_strings.c). The include directives are:
#include "base.h"
#include "string.h"
#include "list.h"
#include "string_list.h"
Problem statement
The function should keep those words of a collection of words that contain a given substring.
Data definition
The input collection and filtered result collection are Lists with elements of type String.
Function signature
The function we look for takes a List with elements of type String and a substring that desired words need to contain. The function returns the List filtered words.// List(String), String -> List(String)
Function name
keep_if_contains
Function header
List keep_if_contains(List list, String part)
Function stub
List keep_if_contains(List list, String part) {
return sl_create(); // empty String list
}
Purpose statement
// Returns a list of those words of the input list that contain part.
Examples and expected results (test cases)
static void keep_if_contains_test(void) {
// split sentence into list of words
List a = sl_split("Lists are collections of variable size.", ' ');
List expected = sl_split("Lists collections variable size.", ' ');
List actual = keep_if_contains(a, "i");
// "i" is part of every word of this sentence, so keep all
sl_check_expect(actual, expected);
sl_free(actual); l_free(expected);
expected = sl_split("are variable", ' ');
actual = keep_if_contains(a, "ar");
// "ar" is only contained in words "are" and "variable"
sl_check_expect(actual, expected);
sl_free(a); sl_free(actual); l_free(expected);
}
The function sl_split splits strings at the given character (here: the space character). The check function sl_check_expect checks for equality of two string lists. To be considered equal both string lists need to have the same length and the elements at each index position need to be equal.
At this stage the program compiles, but the tests will fail when running the program.
Function body
The first approach just follows the chain of list nodes to check each element. Unfortunately, this exposes the Node structure.List keep_if_contains(List list, String part) {
List result = sl_create();
for (StringListNode *node = list->first; node != NULL; node = node->next) {
if (s_contains(node->value, part)) {
sl_append(result, node->value);
}
}
return result;
}
The function s_contains returns true if (and only if) the string given as the second argument (part) is contained in the string that is given as the first argument. The function sl_append appends a string to a list of strings.
Testing
The test cases pass.keep_strings.c, line 63: check passed
keep_strings.c, line 68: check passed
Both tests passed!
Review and revise
An Iterator may be used to process the List. An iterator stores the current iteration state. In the case of Lists this means that it stores a pointer to the next node. There is a constructor function to create an Iterator from a List (l_iterator), a function to check whether there are more elements (l_has_next), and a function to advance to the next element and return its value (sl_next). The next-value-function is specific to the type of the list: Its return type corresponds to the elmenent type of the list. There are next-value-functions for int-lists (il_next), double-lists (dl_next), String lists (sl_next), and general lists (l_next). The latter just returns a pointer to the beginning of the list element. The next-value functions change the state of the iterator, because they advance to the next node. Hence, these functions take the address of the iterator – they use the iterator as an in/out argument. Iterators are not dynamically allocated and hence do not have to be released.
List keep_if_contains(List list, String part) {
List result = sl_create();
ListIterator iter = l_iterator(list);
while (l_has_next(iter)) {
String s = sl_next(&iter);
if (s_contains(s, part)) {
sl_append(result, s);
}
}
return result;
}
Because filtering arrays is such a common case, there is a function sl_filter that allows for formulating filtering criteria very concisely. It takes the list, the criterion, and an optional parameter. This leads to a third possibility to process the list.
static bool contains(String element, int index, String x) {
return s_contains(element, x);
}
List keep_if_contains(List list, String part) {
return sl_filter(list, contains, part);
}
Again, each list element that contains the given part is included in the result. The s_contains function cannot be used directly, because its signature does not match to what the sl_filter function expects (the index parameter is missing).
13 Style
Style and naming conventions can be a controversial issue. The primary goal of establishing naming conventions is to make code easier to read. Source code is typically more often read than written. Programs should be written as clearly and consistently as possible. Consistent use of naming conventions and consistent layout of source code is a sign of high-quality work. A few recommendations follow.
13.1 Naming Conventions
Use short, common, and non-abbreviated names.
Use abbreviations sparingly.
Use very short variable names only as local variables, and if their meaning is clear from the context.
Use i, j, k as loop indices.
Do not make the type name part of the variable name, e.g., center rather than centerPoint.
Use plural for collections of items, e.g., names rather than nameArray.
Use lower-case function, variable, and file names.
Use underscore as a word separator, e.g., hours_to_wages. Note that this is a C convention. Java often uses camel case (e.g., hoursToWages).
Use CAPITALIZED_WITH_UNDERSCORES for constants.
Use CamelCase with an upper-case first letter for custom-defined type names, e.g., TrackingOrder.
Don’t use a special prefix for pointers, e.g. simply list rather than pList
Avoid l as a variable name and O as a type name as they are easily confused with 1 and 0, respectively.
13.2 Coding Style
Properly indent code. This is extremely important for readability.
Consistently indent code. Do not mix tab characters and spaces. The typical tab width is 4 spaces. Using spaces avoids problems with different editor settings.
Write spaces around operators, e.g., 1 + 2 * 3 instead of 1+2*3.
Use empty lines to visually structure blocks of code.
Write an open brace as the last character of a line, not as the first character of the following line.
Always use braces.
https://www.doc.ic.ac.uk/lab/cplus/cstyle.html
http://users.ece.cmu.edu/~eno/coding/CCodingStandard.html
See also: GTK+ coding convention
13.3 camelCase vs. under_score
Dave Binkley, Marcia Davis, Dawn Lawrie, Christopher Morrell. To CamelCase or Under_score. IEEE 17th International Conference on Program Comprehension, 2009. ICPC ’09. (IEEE): 158–167.
Abstract: "The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style."
Bonita Sharif and Jonathan I. Maletic. An Eye Tracking Study on camelCase and under_score Identifier Styles. IEEE 18th International Conference on Program Comprehension, 2010. ICPC ’10. (IEEE): 196–205.
Abstract: "An empirical study to determine if identifier-naming conventions (i.e., camelCase and under_score) affect code comprehension is presented. An eye tracker is used to capture quantitative data from human subjects during an experiment. The intent of this study is to replicate a previous study published at ICPC 2009 (Binkley et al.) that used a timed response test method to acquire data. The use of eye-tracking equipment gives additional insight and overcomes some limitations of traditional data gathering techniques. Similarities and differences between the two studies are discussed. One main difference is that subjects were trained mainly in the underscore style and were all programmers. While results indicate no difference in accuracy between the two styles, subjects recognize identifiers in the underscore style more quickly."
http://en.wikipedia.org/wiki/CamelCase#cite_note-38
14 Error Messages
It is a helpful skill to be able to interpret error messages from the compiler. The following gives examples of common gcc error messages and gives advice on how to interpret them.
14.1 Compiler Error Messages
14.1.1 Missing ;
overtime.c:14:29: expected ';' after top level declarator
const int WEEKLY_HOURS = 40
^
14.1.2 Missing function declaration
overtime.c: In function 'hours_to_overtime_test':
overtime.c:11:20: implicit declaration of function 'hours_to_oxertime'
check_expect_i(hours_to_oxertime(20), 0);
^
14.1.3 Library missing, in wrong location, or not compiled
overtime.c: In function 'hours_to_overtime_test':
overtime.c:11: undefined reference to 'printiln'
14.1.4 No return value
noreturn.c:8:1: control may reach end of non-void function [-Wreturn-type]
#include "base.h"
String greeting(int type) {
if (type == 1) {
return "Hello";
}
// to solve error: return "Moin"; <-- line 7
} <-- line 8
int main(void) {
printsln(greeting(1));
printsln(greeting(2));
return 0;
}