One Grumpy Code Monkey

Tuesday, January 18, 2011

C++: Stream iterators and containers

I've been writing C++ code for several years now (since 2005), but I haven't really been writing idiomatic C++; it always looked more like C code with a personality disorder. Once I saw examples of how the standard containers were used in working code, I glommed onto the pattern pretty quickly; after all, code like


for (std::vector<Foo>::iterator it = foo.begin(); it != foo.end(); ++it)
  do_something_with(*it);

was pretty easy to grasp. I understood that iterators were an abstraction for pointers, yeah, but I never really grasped how fundamental they were to writing C++.

Several months ago, I had a ~~neurological event~~ mental breakthrough, and it was inspired by stream iterators, something I had never heard of until recently. You could use an iterator to iterate over an input stream as though it were like any other container. Furthermore, you could use algorithms such as std::copy to load the contents of a stream into a container, like so:


std::copy(std::istream_iterator(in), 
          std::istream_iterator(), 
          std::inserter(container, container.end()));

So I womped up this little example that reads a sequence of id-name pairs into a std::map container, then writes the contents of the container back to standard output.


#include <iostream>
#include <iterator>
#include <algorithm>
#include <sstream>
#include <string>
#include <map>

/**
 * Define a record type for the istream template parameter.
 */
struct Record
{
  int            m_id;
  std::string    m_name;

  Record() {}
  Record(int id, std::string name) : m_id(id), m_name(name) {}
  Record(const Record& rec) : m_id(rec.m_id), m_name(rec.m_name) {}

  /**
   * Cast operator; converts contents of this record to a std::pair
   * suitable for inserting into a map container
   */
  operator std::pair<const int, std::string>() const { return std::make_pair(m_id, m_name); }
};

/**
 * Converts a map entry to a formatted string
 */
std::string recordToString(std::pair<const int, std::string> it)
{
  std::stringstream tmp;
  tmp << "{ id: " << it->first << "; name: " << it->second << "}";
  return str;
}

/**
 * Main function
 */
int main(void)
{
  /** 
   * Set up the input stream.  For this example we'll use a string stream for
   * input.
   */
  std::string data = "1 foo 2 bar 3 bletch 4 blurga";
  std::stringstream instream;
  instream.str(data);

  /**
   * Target map container
   */
  std::map<int, std::string> vmap;

  /**
   * Copy data from the stream into the container.
   */
  std::copy(std::istream_iterator<Record>(instream),  // initially points to beginning of stream
            std::istream_iterator<Record>(),          // points "nowhere"
            std::inserter(vmap, vmap.end()));         // creates an output iterator
                                                      // pointing to the end of the 
                                                      // map

  /**
   * Write the contents of the container back out to standard output.  Since 
   * std::ostream_iterator cannot directly format parameters of type
   * std::pair<U,V>, we use the std::transform algorithm to call the
   * recordToString function on each element of the map and pass the resulting
   * string to the output iterator for standard output.  
   */
  std::transform(vmap.begin(), 
                 vmap.end(), 
                 std::ostream_iterator<std::string>(std::cout, " "), 
                 recordToString);
  std::cout << std::endl;

  return 0;
}

Thursday, November 04, 2010

Pointers in C

One of the biggest hurdles for people learning C is learning about pointers.

Conceptually, pointers are simple beasties: the value of a pointer expression is the location (address) of another object in memory. What tends to trip people up are the following:

C (and C++) pointer declaration and expression syntax, especially with complex pointer types;
Passing pointers as arguments to functions;
The affect of the base type on pointer arithmetic;
Using multiple levels of indirection; and
The behavior of array expressions with respect to pointers.

Pointer declarations and expressions

C declaration syntax is expression-centric, rather than object-centric; another way of describing it is that C follows a "declaration mimics use" paradigm. In short, the form of a C declaration should closely match the form of an expression of that type in an executable statement. For example, if we have a pointer to an integer and we want to access that integer value, we dereference the pointer using the unary * operator, like so:


x = *p;

The type of the expression *p is int, so the declaration of the pointer p is written as


int *p;

In the declaration above, int is the type specifier and *p is the declarator. The declarator introduces the name of the item being declared, along with additional type information not provided by the type specifier. The int-ness of p is provided by the type specifier int, while the pointer-ness is provided by the * in the declarator. Note that in executable code, whitespace doesn't matter:


x = * p;

will be parsed as x = (*p);. Similarly, whitespace doesn't matter in the declaration, so any of the following are legal:


int *p;
int* p;
int * p;

In all cases, the * operator is part of the declarator, so all of the above variations parse as


int (*p);

This means that in a declaration like


int* a, b;

only a is declared as a pointer to int; b is declared as a regular int.

Similarly, if we have an array of pointers to integers, and we want to access a particular integer value, we subscript the array and dereference the result, like so:


x = *ap[i];

Again, the type of the expression *ap[i] is int, so the declaration of the array is written as


int *ap[N];

Similar to the simple pointer declaration above, the int-ness of ap is specified by the type specifier, while the array-ness and pointer-ness are specified by the declarator. Since the subscript operator has a higher precendence than the dereference operator, the expression *ap[i] is parsed as *(ap[i]). Suppose that instead of an array of pointers to integers, we had a pointer to an array of integers; to access a particular integer value, we'd write


x = (*pa)[i];

because we have to dereference the pointer to the array before applying the subscript operator. By now, it should be clear that the declaration for a pointer to an array of integers would be


int (*pa)[N];

Pointer expressions and declarations can get arbitrarily complex: you can have arrays of pointers, pointers to arrays, pointers to arrays of pointers, pointers to functions, pointers to functions returning pointers, arrays of pointers to functions returning pointers to arrays, pointers to pointers, pointers to pointers to pointers, etc. Here are the basic rules for any type T:


T *p;      // p is a pointer to T
T *a[N];   // a is an N-element array of pointers to T
T (*a)[N]; // a is a pointer to an N-element array of T
T *f();    // f is a function returning a pointer to T
T (*f)();  // f is a pointer to a function returning T

As mentioned above, the * operator is the dereference operator; given any expression p of type T * (pointer to T), the value of the
expression *p is the value of the thing being pointed to and its type is T. The inverse of the * operator is the unary &, or address-of operator. Given any lvalue expression e of type T, the value of the expression &e is the address of e and the type of the expression is T *.

Given the following declarations


int i = 1;
int *p = &i;

and the following memory map:

Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
i	0x80C01230	0x00	0x00	0x00	0x01
p	0x80C01234	0x80	0xC0	0x12	0x30

then the following are all true:

Expression	Value	Type
i	0x00000001	int
&i	0x80C01230	int * // pointer to int
p	0x80C01230	int * // pointer to int
*p	0x00000001	int
&p	0x80C01234	int ** // pointer to pointer to int

Note the wrinkle introduced with the last expression &p; we have a pointer to pointer to int (again, the type of an expression &e is T *, and in this case T is int *). It's possible (in some cases unavoidable) to have multiple levels of indirection with pointers.

Pointers and function arguments

One of the primary reasons we need to use pointers in C is to allow functions to modify their arguments. C passes all function arguments by value; the formal argument is a distinct object in memory from the actual argument and receives a copy of the value in the actual argument. For example, given the code


void swap(int x, int y)
{
  int tmp = x;
  x = y;
  y = tmp;
}

int main(void)
{
  int a = 5, b = 10; 
  printf("before: a = %d, b = %d\n", a, b);
  swap(a, b);
  printf("after: a = %d, b = %d\n", a, b);
  return 0;
}

the formal parameters x and y are distinct objects in memory from a and b, so the changes to x and y in the swap function are not reflected in a and b. Again, a hypothetical memory map may help:

At the beginning of swap
Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
a	0x80C01230	0x00	0x00	0x00	0x05
b	0x80C01234	0x00	0x00	0x00	0x0A
x	0x8800C000	0x00	0x00	0x00	0x05
y	0x8800C004	0x00	0x00	0x00	0x0A

At the end of swap
Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
a	0x80C01230	0x00	0x00	0x00	0x05
b	0x80C01234	0x00	0x00	0x00	0x0A
x	0x8800C000	0x00	0x00	0x00	0x0A
y	0x8800C004	0x00	0x00	0x00	0x05

Modifying the contents of address 0x8800C000, or x, has no effect on the contents of 0x80C01230, or a.

Since we want the swap function to modify the values of a and b in main, we must pass pointers to those objects so that the function may access those memory locations:


void swap(int *x, int *y)
{
  int tmp = *x;
  *x = *y;
  *y = tmp;
}

int main(void)
{
  int a = 5, b = 10;
  printf("before: a = %d, b = %d\n", a, b);
  swap(&a, &b);
  printf("after: a = %d, b = %d\n", a, b);
  return 0;
}

Instead of passing a and b to swap, we pass the expressions &a and &b, which are pointer values. Similarly, instead of declaring swap's parameters x and y as integers, we declare them as pointers to integers.

The memory map now looks like this:

At the beginning of swap
Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
a	0x80C01230	0x00	0x00	0x00	0x05
b	0x80C01234	0x00	0x00	0x00	0x0A
x	0x8800C000	0x80	0xC0	0x12	0x30
y	0x8800C004	0x80	0xC0	0x12	0x34

At the end of swap
Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
a	0x80C01230	0x00	0x00	0x00	0x0A
b	0x80C01234	0x00	0x00	0x00	0x05
x	0x8800C000	0x80	0xC0	0x12	0x30
y	0x8800C004	0x80	0xC0	0x12	0x34

The value of the parameter x is the address of a, or 0x80C01230; thus, the expression *x refers to the same area of memory as a, so assigning to the expression *x in swap has the same effect as assigning to the expression a in main.

Basically, if you want a function to modify something in its caller, you must pass a pointer to that thing.

Types and pointer arithmetic

When you index into an array using the [] operator or apply the ++ or -- operators to an array expression, you are performing pointer arithmetic. The result of the operation depends on the base type of the pointer. When you advance a pointer value by 1, that's in terms of the size of the base type; if the base type is 1 byte wide, you advance 1 byte. If the base type is 4 bytes wide, you advance 4 bytes.

Array indexing is implemented in terms of pointer arithmetic; when you write the expression a[i], it is evaluated as *(a + i) (remembering that i indicates a the number of elements past a, not the number of bytes; if the base type of a is N bytes wide, then the result of a + i is i * N bytes from a).

Note that this introduces two quirks of C syntax. The first is that array indexing is commutative; if a[i] is a legal expression, then so is i[a] (a + i == i + a), and they give the same result. The second is that we may apply the subscript operator to a pointer expression as well as an array expression (subject to certain restrictions). We'll get into this a little more in the section on pointers and arrays.

Multiple levels of indirection

There are a number of occasions when you have to deal with pointers to pointers. One common one is when you want a function to modify a pointer value. It's similar to the swap issue above, but instead of wanting the function to modify an integer in main, we want it to modify a pointer to integer:


  void swap(int **x, int **y)
  {
    int *tmp = *x;
    *x = *y;
    *y = tmp;
  }

  int main(void)
  {
    int a = 5;
    int b = 10;
    int *pa = &a;
    int *pb = &b;

    printf("before: a = %d, b = %d, pa = %p, pb = %p\n", a, b, (void *) pa, (void *) pb);
    swap(&pa, &pb);
    printf("after:  a = %d, b = %d, pa = %p, pb = %p\n", a, b, (void *) pa, (void *) pb);
    return 0;
  }

At the beginning of swap
Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
a	0x80C01230	0x00	0x00	0x00	0x05
b	0x80C01234	0x00	0x00	0x00	0x0A
pa	0x80C01238	0x80	0xC0	0x12	0x30
pb	0x80C0123C	0x80	0xC0	0x12	0x34
x	0x8800C004	0x80	0xC0	0x12	0x38
y	0x8800C008	0x80	0xC0	0x12	0x3C

At the end of swap
Item	Address	Contents
Item	Address	0x00	0x01	0x02	0x03
a	0x80C01230	0x00	0x00	0x00	0x05
b	0x80C01234	0x00	0x00	0x00	0x0A
pa	0x80C01238	0x80	0xC0	0x12	0x34
pb	0x80C0123C	0x80	0xC0	0x12	0x30
x	0x8800C004	0x80	0xC0	0x12	0x38
y	0x8800C008	0x80	0xC0	0x12	0x3C

This example was kind of artificial; we'll see some more realistic scenarios in the next section.

Arrays and pointers

There is a close correspondence between arrays and pointers in C, as described in the section above on pointer arithmetic. This falls out of C's treatment of array expressions.

Except when it is the operand of the sizeof or unary & operator, or it is a string literal being used to initialize another array in a declaration, an expression with type "N-element array of T" will have its type implicitly converted ("decay") to type "pointer to T", and its value will be the address of the first element in the array.

This behavior is responsible for more grief among C programmers than just about any other aspect of the language. When you pass an array argument as a parameter to a function such as printf(), what actually gets passed is a pointer value, not an array.

For example:


#include <math.h>
#include <time.h>

void randomize(int *arr, size_t elements)
{
  size_t i;
  for (i = 0; i < elements; i++)
    arr[i] = rand();
}

int main(void)
{
  int values[20];
  size_t i;

  randomize(values, 20);
  for (i = 0; i < 20; i++)
    printf("values[%2d] = %d\n", i, values[i]);

  return 0;
}

Note that the type of values in main is int [20] (20-element array of int), but the parameter arr in randomize is type int *. When the compiler sees the array expression values in the call to randomize, it implicitly converts the type from an array of integer to a pointer to integer. Thus the declaration of arr as a pointer to int in randomize, rather than an array.

In an effort to be confusing, C allows you to declare the parameter using array syntax, such as


void randomize(int arr[], size_t elements)

or even


void randomize(int arr[20], size_t elements)

However, in both cases, the declaration is interpreted as int *arr. Note that this is only true in the context of a function parameter declaration.

As mentioned above, one of the exceptions to the conversion rule is when the array expression is the operand of the unary & operator; if the type of an expression e is "N-element array of T", then the type of the expression &e is "pointer to N-element array of T", or T(*)[N]. The following tables show the types of various expressions involving arrays:

Declaration: T a[N];
Expression	Type	Decays to
a	T [N]	T *
&a	T (*)[N]
*a	T
a[i]	T

Declaration: T a[N][M];
Expression	Type	Decays to
a	T [N][M]	T (*)[M]
&a	T (*)[N][M]
*a	T [M]	T *
a[i]	T [M]	T *
&a[i]	T (*)[M]
*a[i]	T
a[i][j]	T

Declaration: T a[N][M][L];
Expression	Type	Decays to
a	T [N][M][L]	T (*)[M][L]
&a	T (*)[N][M][L]
*a	T [M][L]	T (*)[L]
a[i]	T [M][L]	T (*)[L]
&a[i]	T (*)[M][L]
*a[i]	T [M]	T *
a[i][j]	T [M]	T *
&a[i][j]	T (*)[M]
*a[i][j]	T
a[i][j][k]	T

The pattern for higher-dimensional arrays should be clear from here on out.

Monday, May 10, 2010

This is a test

This is a test of Google's code prettyprinter:

#include <stdio.h>

int main(void)
{
  printf("This is a test of the Google prettyprinter\n");
  return 0;
}

Okay. Having to use entity codes for the angle brackets is annoying as hell. Must still be missing some magic element.

Wednesday, November 14, 2007

Arrays in C: Part 1

C's treatment of arrays is a little different from most other languages, causing no end of grief to people who are new to the language, particularly when it comes to passing arrays as parameters.

An array is a contiguous block of memory sized to hold N elements of a particular type (except for void, incomplete, or function types). Arrays are 0-origin, meaning that elements of an N-element array are numbered 0..N-1. Example:

int arr[10];
...
for (i = 0; i < 9; i++)
arr[i] = ...;

In C (and most C-derived languages), multidimensional arrays are treated as arrays of arrays, e.g.:

#define ROWS ...
#define COLS ...
int table[ROWS][COLS];

Again, this describes a contiguous chunk of memory sized to hold ROWS * COLS integers.

Simple so far, right?

Well, here's where it stops being simple. When an array identifier appears in most contexts, its type is converted from "array of T" to "pointer to T", and its value is set to point to the first element in the array. The only times this conversion doesn't happen is when the array identifier is an operand of either the address-of (&) or sizeof operators. When you hear someone say something like "arrays are just pointers" or "arrays and pointers are the same thing," they're garbling this concept. Arrays are not pointers, but the type of the array identifier is converted to a pointer in most contexts.

Also, array subscripting is done in terms of pointer arithmetic. The expression a[i] is functionally equivalent to *(a+i); that is, i is an offset from the base address a (this is why array indices go from 0 to N-1 instead of from 1 to N). It also means that array indexing is commutative; a[i] == i[a], although you'll rarely see the second form in code that isn't deliberately obfuscated.

Arrays are lvalues, but non-modifiable. In short, that means the following is illegal:

int a[10], b[10];
a = b;

You cannot assign a new value to an array object.

Tuesday, November 13, 2007

Playing with CSS

I'm adding some style elements for presenting source code. Here's what I have so far:

/**
* Welcome to the canonical First C Program
* that every larval programmer writes.
*/
#include <stdio.h>

int main(void)
{
printf("This is a test of the Emergency Broadcast System\n");
return 0;
}

I've had to widen the content part of the template to accommodate.

Monday, October 29, 2007

Fun with source control

So we're using Rational ClearCase for version control where I work, and for the most part it's a good tool, but I would really really like it if the tool made it more obvious which files were under version control and which were not, because I wound up breaking a build over the weekend after not adding one critical file to source control.

So, new rule -- I will always have at least two views. One will be for development, the other strictly for checking the build. Because modern technology is supposed to reduce the amount of work you have to do.

In other news, Gregor bit through the mouse cord on my computer last night. Whee.

Labels: coding, Monday

Saturday, October 20, 2007

The new new kid

The menagerie expands yet again.

Meet Gregor:

Your basic yellow dog. He is named after Gregor Mendel, since his genetics are ... complicated. His mom is part Golden Retriever. After that, it's a mystery. He's a water-loving dog (at least he'll play the hell out of the water in his water dish). Several people have claimed they see Lab or Chow in there. My sister-in-law the vet simply says, "he's a very mixed breed," and that we really won't know what all he has in him until he's six months old or so.

We picked him up Friday afternoon from a person in our raw food coop. She had adopted a dog from the local animal shelter who somehow missed getting spayed. Several weeks later she found she'd gotten a 7 for 1 deal. Gregor's litter is a mixed bag, with at least two separate fathers.

His first night was fairly uneventful (two accidents, but only because the sleepy apes were slow to recognize the signs of Full Puppy Syndrome). He's still fairly timid, which suits the cats just fine. Their reactions have ranged from total indifference, to cautious surveillance, to, well, this:

Lewis spent the entire night staring at Gregor. Thankfully he was less interested today. Gregor's been swiped at a couple of times and shook it off fairly well.

Both my wife and I grew up with dogs, but this is our first time raising a puppy. It's...different.

Labels: pets