One of the biggest hurdles for people learning C is learning about pointers.
Conceptually, pointers are simple beasties: the value of a pointer expression is the location (address) of another object in memory. What tends to trip people up are the following:
- C (and C++) pointer declaration and expression syntax, especially with complex pointer types;
- Passing pointers as arguments to functions;
- The affect of the base type on pointer arithmetic;
- Using multiple levels of indirection; and
- The behavior of array expressions with respect to pointers.
Pointer declarations and expressionsC declaration syntax is
expression-centric, rather than object-centric; another way of describing it is that C follows a "declaration mimics use" paradigm. In short, the form of a C declaration should closely match the form of an expression of that type in an executable statement. For example, if we have a pointer to an integer and we want to access that integer value, we
dereference the pointer using the unary
* operator, like so:
x = *p;
The type of the
expression *p is
int, so the declaration of the pointer
p is written as
int *p;
In the declaration above,
int is the
type specifier and
*p is the
declarator. The declarator introduces the name of the item being declared, along with additional type information not provided by the type specifier. The int-ness of
p is provided by the type specifier
int, while the pointer-ness is provided by the
* in the declarator. Note that in executable code, whitespace doesn't matter:
x = * p;
will be parsed as
x = (*p);. Similarly, whitespace doesn't matter in the declaration, so any of the following are legal:
int *p;
int* p;
int * p;
In all cases, the
* operator is part of the declarator, so all of the above variations parse as
int (*p);
This means that in a declaration like
int* a, b;
only
a is declared as a pointer to
int;
b is declared as a regular
int.
Similarly, if we have an array of pointers to integers, and we want to access a particular integer value, we subscript the array and dereference the result, like so:
x = *ap[i];
Again, the type of the
expression *ap[i] is
int, so the declaration of the array is written as
int *ap[N];
Similar to the simple pointer declaration above, the int-ness of
ap is specified by the type specifier, while the array-ness and pointer-ness are specified by the declarator. Since the subscript operator has a higher precendence than the dereference operator, the expression
*ap[i] is parsed as
*(ap[i]). Suppose that instead of an array of pointers to integers, we had a pointer to an array of integers; to access a particular integer value, we'd write
x = (*pa)[i];
because we have to dereference the pointer to the array
before applying the subscript operator. By now, it should be clear that the declaration for a pointer to an array of integers would be
int (*pa)[N];
Pointer expressions and declarations can get arbitrarily complex: you can have arrays of pointers, pointers to arrays, pointers to arrays of pointers, pointers to functions, pointers to functions returning pointers, arrays of pointers to functions returning pointers to arrays, pointers to pointers, pointers to pointers to pointers, etc. Here are the basic rules for any type
T:
T *p; // p is a pointer to T
T *a[N]; // a is an N-element array of pointers to T
T (*a)[N]; // a is a pointer to an N-element array of T
T *f(); // f is a function returning a pointer to T
T (*f)(); // f is a pointer to a function returning T
As mentioned above, the
* operator is the dereference operator; given any expression
p of type
T * (pointer to
T), the value of the
expression
*p is the value of the thing being pointed to and its type is
T. The inverse of the
* operator is the unary
&, or
address-of operator. Given any lvalue expression
e of type
T, the value of the expression
&e is the
address of
e and the type of the expression is
T *.
Given the following declarations
int i = 1;
int *p = &i;
and the following memory map:
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
i |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x01 |
p |
0x80C01234 |
0x80 |
0xC0 |
0x12 |
0x30 |
then the following are all true:
Expression |
Value |
Type |
---|
i |
0x00000001 |
int |
&i |
0x80C01230 |
int * // pointer to int |
p |
0x80C01230 |
int * // pointer to int |
*p |
0x00000001 |
int |
&p |
0x80C01234 |
int ** // pointer to pointer to int |
Note the wrinkle introduced with the last expression
&p; we have a pointer to pointer to
int (again, the type of an expression
&e is
T *, and in this case
T is
int *). It's possible (in some cases unavoidable) to have multiple levels of indirection with pointers.
Pointers and function argumentsOne of the primary reasons we need to use pointers in C is to allow functions to modify their arguments. C passes all function arguments by value; the formal argument is a distinct object in memory from the actual argument and receives a
copy of the value in the actual argument. For example, given the code
void swap(int x, int y)
{
int tmp = x;
x = y;
y = tmp;
}
int main(void)
{
int a = 5, b = 10;
printf("before: a = %d, b = %d\n", a, b);
swap(a, b);
printf("after: a = %d, b = %d\n", a, b);
return 0;
}
the formal parameters
x and
y are distinct objects in memory from
a and
b, so the changes to
x and
y in the
swap function are not reflected in
a and
b. Again, a hypothetical memory map may help:
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
a |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x05 |
b |
0x80C01234 |
0x00 |
0x00 |
0x00 |
0x0A |
x |
0x8800C000 |
0x00 |
0x00 |
0x00 |
0x05 |
y |
0x8800C004 |
0x00 |
0x00 |
0x00 |
0x0A |
At the beginning of swap
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
a |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x05 |
b |
0x80C01234 |
0x00 |
0x00 |
0x00 |
0x0A |
x |
0x8800C000 |
0x00 |
0x00 |
0x00 |
0x0A |
y |
0x8800C004 |
0x00 |
0x00 |
0x00 |
0x05 |
At the end of swap
Modifying the contents of address
0x8800C000, or
x, has no effect on the contents of
0x80C01230, or
a.
Since we want the
swap function to modify the values of
a and
b in
main, we must pass
pointers to those objects so that the function may access those memory locations:
void swap(int *x, int *y)
{
int tmp = *x;
*x = *y;
*y = tmp;
}
int main(void)
{
int a = 5, b = 10;
printf("before: a = %d, b = %d\n", a, b);
swap(&a, &b);
printf("after: a = %d, b = %d\n", a, b);
return 0;
}
Instead of passing
a and
b to
swap, we pass the expressions
&a and
&b, which are pointer values. Similarly, instead of declaring
swap's parameters
x and
y as integers, we declare them as pointers to integers.
The memory map now looks like this:
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
a |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x05 |
b |
0x80C01234 |
0x00 |
0x00 |
0x00 |
0x0A |
x |
0x8800C000 |
0x80 |
0xC0 |
0x12 |
0x30 |
y |
0x8800C004 |
0x80 |
0xC0 |
0x12 |
0x34 |
At the beginning of swap
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
a |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x0A |
b |
0x80C01234 |
0x00 |
0x00 |
0x00 |
0x05 |
x |
0x8800C000 |
0x80 |
0xC0 |
0x12 |
0x30 |
y |
0x8800C004 |
0x80 |
0xC0 |
0x12 |
0x34 |
At the end of swap
The value of the parameter
x is the address of
a, or
0x80C01230; thus, the expression
*x refers to the same area of memory as
a, so assigning to the expression
*x in
swap has the same effect as assigning to the expression
a in
main.
Basically, if you want a function to modify something in its caller, you must pass a
pointer to that thing.
Types and pointer arithmeticWhen you index into an array using the
[] operator or apply the
++ or
-- operators to an array expression, you are performing pointer arithmetic. The result of the operation depends on the base type of the pointer. When you advance a pointer value by 1, that's in terms of the size of the base type; if the base type is 1 byte wide, you advance 1 byte. If the base type is 4 bytes wide, you advance 4 bytes.
Array indexing is implemented in terms of pointer arithmetic; when you write the expression
a[i], it is evaluated as
*(a + i) (remembering that
i indicates a the number of
elements past
a, not the number of bytes; if the base type of
a is N bytes wide, then the result of
a + i is
i * N bytes from
a).
Note that this introduces two quirks of C syntax. The first is that array indexing is commutative; if
a[i] is a legal expression, then so is
i[a] (
a + i == i + a), and they give the same result. The second is that we may apply the subscript operator to a
pointer expression as well as an array expression (subject to certain restrictions). We'll get into this a little more in the section on pointers and arrays.
Multiple levels of indirectionThere are a number of occasions when you have to deal with pointers to pointers. One common one is when you want a function to modify a
pointer value. It's similar to the
swap issue above, but instead of wanting the function to modify an integer in
main, we want it to modify a pointer to integer:
void swap(int **x, int **y)
{
int *tmp = *x;
*x = *y;
*y = tmp;
}
int main(void)
{
int a = 5;
int b = 10;
int *pa = &a;
int *pb = &b;
printf("before: a = %d, b = %d, pa = %p, pb = %p\n", a, b, (void *) pa, (void *) pb);
swap(&pa, &pb);
printf("after: a = %d, b = %d, pa = %p, pb = %p\n", a, b, (void *) pa, (void *) pb);
return 0;
}
At the beginning of swap
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
a |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x05 |
b |
0x80C01234 |
0x00 |
0x00 |
0x00 |
0x0A |
pa |
0x80C01238 |
0x80 |
0xC0 |
0x12 |
0x30 |
pb |
0x80C0123C |
0x80 |
0xC0 |
0x12 |
0x34 |
x |
0x8800C004 |
0x80 |
0xC0 |
0x12 |
0x38 |
y |
0x8800C008 |
0x80 |
0xC0 |
0x12 |
0x3C |
At the end of swap
Item |
Address |
Contents |
---|
0x00 |
0x01 |
0x02 |
0x03 |
---|
a |
0x80C01230 |
0x00 |
0x00 |
0x00 |
0x05 |
b |
0x80C01234 |
0x00 |
0x00 |
0x00 |
0x0A |
pa |
0x80C01238 |
0x80 |
0xC0 |
0x12 |
0x34 |
pb |
0x80C0123C |
0x80 |
0xC0 |
0x12 |
0x30 |
x |
0x8800C004 |
0x80 |
0xC0 |
0x12 |
0x38 |
y |
0x8800C008 |
0x80 |
0xC0 |
0x12 |
0x3C |
This example was kind of artificial; we'll see some more realistic scenarios in the next section.
Arrays and pointersThere is a close correspondence between arrays and pointers in C, as described in the section above on pointer arithmetic. This falls out of C's treatment of array expressions.
Except when it is the operand of the
sizeof or unary
& operator, or it is a string literal being used to initialize another array in a declaration, an expression with type "N-element array of
T" will have its type implicitly converted ("decay") to type "pointer to
T", and its value will be the address of the first element in the array.
This behavior is responsible for more grief among C programmers than just about any other aspect of the language. When you pass an array argument as a parameter to a function such as
printf(), what actually gets passed is a
pointer value, not an array.
For example:
#include <math.h>
#include <time.h>
void randomize(int *arr, size_t elements)
{
size_t i;
for (i = 0; i < elements; i++)
arr[i] = rand();
}
int main(void)
{
int values[20];
size_t i;
randomize(values, 20);
for (i = 0; i < 20; i++)
printf("values[%2d] = %d\n", i, values[i]);
return 0;
}
Note that the type of
values in main is
int [20] (20-element array of
int), but the parameter
arr in
randomize is type
int *. When the compiler sees the array expression
values in the call to randomize, it implicitly converts the type from an array of integer to a pointer to integer. Thus the declaration of
arr as a pointer to int in
randomize, rather than an array.
In an effort to be confusing, C allows you to declare the parameter using array syntax, such as
void randomize(int arr[], size_t elements)
or even
void randomize(int arr[20], size_t elements)
However, in both cases, the declaration is interpreted as
int *arr. Note that this is
only true in the context of a function parameter declaration.
As mentioned above, one of the exceptions to the conversion rule is when the array expression is the operand of the unary
& operator; if the type of an expression
e is "N-element array of
T", then the type of the expression
&e is "pointer to N-element array of
T", or
T(*)[N]. The following tables show the types of various expressions involving arrays:
Declaration: T a[N];
Expression |
Type |
Decays to |
---|
a |
T [N] |
T * |
&a |
T (*)[N] |
|
*a |
T |
|
a[i] |
T |
|
Declaration: T a[N][M];
Expression |
Type |
Decays to |
---|
a |
T [N][M] |
T (*)[M] |
&a |
T (*)[N][M] |
|
*a |
T [M] |
T * |
a[i] |
T [M] |
T * |
&a[i] |
T (*)[M] |
|
*a[i] |
T |
|
a[i][j] |
T |
|
Declaration: T a[N][M][L];
Expression |
Type |
Decays to |
---|
a |
T [N][M][L] |
T (*)[M][L] |
&a |
T (*)[N][M][L] |
|
*a |
T [M][L] |
T (*)[L] |
a[i] |
T [M][L] |
T (*)[L] |
&a[i] |
T (*)[M][L] |
|
*a[i] |
T [M] |
T * |
a[i][j] |
T [M] |
T * |
&a[i][j] |
T (*)[M] |
|
*a[i][j] |
T |
|
a[i][j][k] |
T |
|
The pattern for higher-dimensional arrays should be clear from here on out.