Pointers

Given a storage cell, there are two values associated with that storage cell: its contents and its address. So given an integer variable declaration   int ii;,   ii is associated with an integer storage cell. The expression ii refers to the contents of that cell and the expression &ii refers to the address of that cell. The indirection operator * accesses the contents of the storage cell pointed to by an address, so *(&ii) refers to the same thing as ii. Pointer variables contain address values.
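A minimal sketch of the two values tied to one storage cell. The function name cell_demo is hypothetical and exists only for this illustration: it prints the contents and the address of ii, writes through a pointer holding &ii, and shows that *(&ii) names the same cell as ii.

```c
#include <stdio.h>

/* cell_demo is a hypothetical name used only for this illustration. */
int cell_demo(void)
{
	int  ii = 42;			/* contents of the cell: 42 */
	int *pp = &ii;			/* pp holds the address of the cell */

	printf("ii = %d, &ii = %p\n", ii, (void *) &ii);
	*pp = 7;			/* writing through the address changes ii */
	printf("*(&ii) = %d\n", *(&ii));	/* *(&ii) is the same cell as ii */
	return ii;
}
```

The printed address varies from run to run; the returned value is 7 because *pp and ii are the same cell.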

It is important to understand how the symbols & and * are used in C. The symbol & as a unary operator is the address-of operator (as a binary operator it is the bit-wise and operator). The address-of operator may only be applied to expressions that designate a storage cell, such as variables, array elements, and struct members; it cannot be applied to a constant such as 5. The resulting value is the address of that storage cell. If the type of the variable is X, the type of the result is pointer to X.

In expressions, the symbol * as a unary operator means pointer indirection (as a binary operator it means multiplication). The indirection operator can be applied to any expression whose type is pointer to X. The result is the value of type X in the storage cell that the pointer operand points to.

In a declaration statement, the symbol * adds "pointer to" to the type being declared, so   int *pp;   declares pp as a pointer to int. Note that the symbol & is NOT used in declarations.

A generic pointer (a pointer that can be converted to or from any other object pointer type without a cast) is declared as follows.

	void	* ptr;
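A short sketch of a generic pointer in use. The function name void_ptr_demo is hypothetical. The same void pointer holds first an int address and then a double address, and each time it is converted back to a specific pointer type before the cell is accessed, since a void pointer itself cannot be dereferenced.

```c
/* void_ptr_demo is a hypothetical name used only for this illustration. */
int void_ptr_demo(void)
{
	int     ii = 5;
	double  dd = 2.5;
	void   *ptr;

	ptr = &ii;			/* int * converts to void * implicitly */
	int *ip = ptr;			/* ...and back again; C needs no cast */

	ptr = &dd;			/* the same ptr can now hold a double * */
	double *dp = ptr;

	/* *ptr is not allowed: a void pointer must be converted first */
	return *ip + (int) (*dp * 2);	/* 5 + 5 */
}
```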

Collections: Structs and Arrays

C has two types of collections. Structs are heterogeneous collections and arrays are homogeneous collections.
	struct nn
		{	
		int	ii;	    ss:	+-------+-------+-------+
		char	cc;		|ii:	|cc:	|ff:	|
		float	ff;		+-------+-------+-------+
		} ss;	

	struct nn s2;		    s2:	+-------+-------+-------+
					|ii:	|cc:	|ff:	|
					+-------+-------+-------+
			


	int	a1[3];		    a1:	******	+-------+-------+-------+
					*  --*->|0:	|1:	|2:	|
					******	+-------+-------+-------+
					   
	struct	{
		int	i1;	    a2:	+-------+-------+-------+
		int	i2;		|i1:	|i2:	|i3:	|
		int	i3;		+-------+-------+-------+
		} a2;

	int	a2d[2][3];	    a2d:******	******	+-------+-------+-------+
					*  --*->*  --*->|0:	|1:	|2:	|
					******	******	+-------+-------+-------+
						*  --*->|0:	|1:	|2:	|
						******	+-------+-------+-------+
Assuming an int is 4 bytes, a char is 1 byte, and a float is 4 bytes, ss is a 9-byte storage cell containing 3 elements. (Actually, most compilers allocate more space for ss, leaving unused padding after the char so that ff begins on a suitably aligned address, typically a multiple of 4; sizeof(ss) is then usually 12.) The optional identifier nn is a structure tag that names the structure type. Additional struct variables can then be defined using just the tag, as in   struct nn s2;  .
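To see the padding your own compiler chooses, a sketch like the following prints the offset of each member with the standard offsetof macro. The function name show_nn_layout is hypothetical, and the numbers it prints are implementation-defined, so the only safe claims are that ii starts at offset 0 and the total is at least the sum of the member sizes.

```c
#include <stddef.h>	/* offsetof */
#include <stdio.h>

struct nn
	{
	int	ii;
	char	cc;
	float	ff;
	};

/* show_nn_layout is hypothetical; the padding it reveals is chosen by the
   compiler, so the printed numbers vary between platforms. */
size_t show_nn_layout(void)
{
	printf("ii at %zu, cc at %zu, ff at %zu, total %zu bytes\n",
	       offsetof(struct nn, ii), offsetof(struct nn, cc),
	       offsetof(struct nn, ff), sizeof(struct nn));
	return sizeof(struct nn);
}
```

On a typical platform with 4-byte int and float this reports ff at offset 8 and a total of 12 bytes, but that is not guaranteed.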

The array declaration creates two separate entities. First, there is an array of 3 int elements, which would occupy 12 bytes of memory. Second, the identifier a1 is NOT the array itself; it is a pointer value containing the address of the actual array. In this document, this pointer value will be referred to as an array reference. Furthermore, a1 is not a normal storage cell; it is a compile-time constant. As a consequence, a1 may not be assigned a value and memory does not have to be allocated for it (although memory is allocated for the array itself). Since a1 behaves like a constant, one might expect taking its address to be an error, just as taking the address of the constant 5 is an error. Standard C nevertheless allows it: &a1 yields the same address as a1, but with a different type, pointer to array of 3 int (int (*)[3]) rather than pointer to int. Likewise, sizeof a1 is the size of the whole array (12 bytes here), not the size of a pointer. (In the C standard's terms, an array name is converted to a pointer to its first element except when it is the operand of sizeof or &.) IMHO, taking the address of an array reference constant should have been disallowed; it only adds to the confusion.
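The following sketch checks the claims above on a real compiler. The function name array_name_demo is hypothetical. It shows that a1 yields the address of a1[0], that &a1 is legal and has the same address value with a different type, and that sizeof treats a1 as the whole array rather than as a pointer.

```c
#include <stdio.h>

int a1[3];

/* array_name_demo is hypothetical. Returns 1 if the checks below hold. */
int array_name_demo(void)
{
	int *first = a1;		/* a1 yields the address of a1[0] */
	int (*whole)[3] = &a1;		/* legal: pointer to array of 3 int */

	/* a1 = first;  would not compile: a1 cannot be assigned */
	printf("sizeof a1 = %zu, sizeof first = %zu\n", sizeof a1, sizeof first);
	return first == &a1[0] && (void *) whole == (void *) first;
}
```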

Notice how collections a1 and a2 are treated differently even though they are both collections of 3 integers. Identifier a1 is a pointer (constant), an array reference. On the other hand, a2 is the 12-byte collection itself. Array identifiers are treated differently than any other identifiers, and the inconsistency causes much confusion. Whenever arrays are involved, programmers inexperienced in C should take the time to carefully draw a diagram of the storage cells, their relationships, and their data types.

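Multi-dimensional arrays are implemented by creating arrays of arrays, stored contiguously in row-major order: a2d occupies 6 consecutive ints, and a2d[1][0] immediately follows a2d[0][2]. Like a1, the row references a2d[0] and a2d[1] are constants, not stored pointers. In the example above, a2d and a2d[0] contain the same address, that of a2d[0][0].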
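A small sketch that checks the layout of a2d. The function name row_gap is hypothetical: it measures, in int-sized cells, the distance from the start of row 0 to the start of row 1, which is 3 when the rows are contiguous.

```c
#include <stddef.h>	/* ptrdiff_t */

int a2d[2][3];

/* row_gap is hypothetical: how many int-sized cells separate the start of
   row 1 from the start of row 0. Contiguous rows give 3. */
ptrdiff_t row_gap(void)
{
	char *row0 = (char *) &a2d[0][0];
	char *row1 = (char *) &a2d[1][0];
	return (row1 - row0) / (ptrdiff_t) sizeof(int);
}
```

The same program can confirm that a2d, a2d[0], and &a2d[0][0] all convert to the same address, and that sizeof a2d is 6 ints.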

Pointers and Arrays

Note that variables declared to be pointers and variables declared to be arrays both contain addresses. Operators that work on pointers also work on array references. So the following two code segments are functionally equivalent.
	// First segment
	int aa[10];
	int *p1;
	int *p2;
	int ii;

	p1 = &ii;
	p2 = aa;
	aa[5] = 20;
	p2[7] = 30;
	*p1 = 7;

	// Second segment
	int aa[10];
	int *p1;
	int *p2;
	int ii;

	p1 = &ii;
	p2 = aa;
	*(aa+5) = 20;
	*(p2+7) = 30;
	p1[0] = 7;

Perhaps another way of thinking of the programs above is that each has TWO arrays of integers. The first array contains ten integers and is pointed to by aa. The second array is an array of one integer and is pointed to by the expression &ii. Although there is no variable that stores the ten-integer array (aa is only a reference to it), the variable ii does contain the one-integer array.
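A runnable sketch of the two segments above, mixing both notations in one function so the effect of each assignment can be checked. The function name equivalence_demo is hypothetical; it returns 1 when index notation and pointer notation reach the same cells, including treating ii as a one-element array through p1.

```c
/* equivalence_demo is hypothetical. Returns 1 if both notations agree. */
int equivalence_demo(void)
{
	int  aa[10] = {0};
	int *p1;
	int *p2;
	int  ii = 0;

	p1 = &ii;
	p2 = aa;
	aa[5] = 20;		/* index notation ... */
	*(p2 + 7) = 30;		/* ... pointer notation, same as p2[7] */
	p1[0] = 7;		/* ii treated as a one-element array */

	/* aa[5] and *(aa+5) name one cell; p2[7] and aa[7] name another */
	return aa[5] == *(aa + 5) && aa[7] == 30 && ii == 7;
}
```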

Consider the program spat1.c. Draw a diagram of the variables in the program. Determine the data type of all the values to be printed. What errors or warnings would you expect when the program is compiled? If all errors are removed, what output would you expect from the corrected program?

Given the similarities between array references and pointers, how does one make sense of them?

  1. In situations where you need an array, declare and use arrays just as you would in Java.
    	int ii;
    	int aa[10];
    	for (ii=0; ii<10; ii++)
    		aa[ii] = ii;
  2. In programs that require data types other than the primitive types, Java programmers define classes and use object references. In Java, object references are essentially pointers. So if in Java you have a node class, node variables store pointers to node objects. In C, the pointer nature of the reference is explicit (pp->value is shorthand for (*pp).value).
    	typedef struct nn { int value;  struct nn *next; } node;
    	node	*top;
    	node	*pp;
    	pp = top;
    	while (pp != NULL)
    		{
    		printf ("%d\n", (*pp).value);
    		pp = (*pp).next;
    		}
    
  3. When you need a copy of an array, you must copy the array.
    	int	a1[10];
    	int	a2[10];
    	int	ii;
    	//  Code that assigns values to a1.
    	for (ii=0; ii<10; ii++)
    		a2[ii] = a1[ii];
    	// Note: a2 = a1 would not compile; a2 is a constant!  Even if a2 were
    	// not a constant, you would be assigning the pointer, not copying the array.
    
  4. When you want an alias for an array, you may use a pointer. This is seldom appropriate with a 1-dimensional array but does become useful for multi-dimensional arrays.
    	int	quiz[30][10];	// 30 students, 10 quiz grades each
    	int	*best;		// A dynamically assigned array reference.
    	int 	ii;
    	// Set best to grades of student with best grade on quiz 0.
    	best = quiz[0];
    	for (ii=1; ii<30; ii++)
    		if (quiz[ii][0] > best[0])
    			best = quiz[ii];
    	// Note best is an alias for some quiz[ii], so if best[1]
    	// got changed, quiz[ii][1] would also change.
    
  5. If the size of an array is not known until run time, a pointer must be used as the array reference. The pointer is assigned the address of dynamically allocated storage for the array.
    	int	*aa;	// Really an array reference but the pointer value is assigned dynamically.
    	int	size;
    	int	ii;
    	// Code that sets size.
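    	aa = malloc (size * sizeof(int));	// Returns a pointer to size int-sized storage cells (needs <stdlib.h>).
    	if (aa == NULL)
    		exit (1);			// Allocation failed.
    	for (ii=0; ii<size; ii++)
    		aa[ii] = ii;
    	free (aa);			// Release the storage when done.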
    
  6. In multi-dimensional arrays where the rows have different lengths, an array of pointers must be used. Consider the following example, where we need a 2-d array to store 3 types of grades (hws, quizzes, attendance) with a different number of scores for each type of grade.
    	int	*(grade[3]);
    	grade[0] = malloc (HWS_CNT * sizeof(int));
    	grade[1] = malloc (QUIZ_CNT * sizeof(int));
    	grade[2] = malloc (ATT_CNT * sizeof(int));
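Items 5 and 6 can be sketched as two functions. The names make_array and make_grades are hypothetical, and so are the values given to HWS_CNT, QUIZ_CNT, and ATT_CNT, which the notes leave unspecified. This version uses calloc (which zero-fills) for the grade rows instead of malloc, and it checks every allocation, freeing any rows already allocated if a later one fails.

```c
#include <stdlib.h>

/* Hypothetical counts standing in for HWS_CNT, QUIZ_CNT, and ATT_CNT. */
enum { HWS_CNT = 5, QUIZ_CNT = 10, ATT_CNT = 30 };

/* Item 5: allocate an array whose size is known only at run time. */
int *make_array(int size)
{
	int *aa = malloc((size_t) size * sizeof(int));
	if (aa == NULL)
		return NULL;		/* allocation failed */
	for (int ii = 0; ii < size; ii++)
		aa[ii] = ii;
	return aa;			/* caller must free(aa) */
}

/* Item 6: one row per grade type, each row with its own length.
   Returns 1 on success, 0 (with nothing left allocated) on failure. */
int make_grades(int *(grade[3]))
{
	static const int counts[3] = { HWS_CNT, QUIZ_CNT, ATT_CNT };

	for (int tt = 0; tt < 3; tt++)
		{
		grade[tt] = calloc((size_t) counts[tt], sizeof(int));	/* zero-filled */
		if (grade[tt] == NULL)
			{
			while (tt-- > 0)	/* undo the rows already allocated */
				free(grade[tt]);
			return 0;
			}
		}
	return 1;
}
```

After a successful call, grade[1][QUIZ_CNT - 1] is a valid element while grade[0] has only HWS_CNT elements; each row must be freed separately.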
    

Typedef

The typedef statement creates an alias for another type name. The typedef statement has the same form as a variable declaration, except the keyword typedef precedes the declaration and the type name is placed where the variable name would be.
	typedef	int	age;
	typedef	char	line[80];
After the definitions above, the following pairs are equivalent.
	int v1;			age v1;
	int *v2;		age *v2;
	char v3[80];		line v3;
	char v4[10][80];	line v4[10];
	char (*v5)[80];		line *v5;
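A sketch using the line typedef from above. The function name store_line is hypothetical. Its parameter line v4[] means char (*v4)[80], and the local v5 is declared as line *, matching the last pair in the table.

```c
#include <string.h>

typedef int  age;
typedef char line[80];

/* store_line is hypothetical. The parameter line v4[] means char (*v4)[80]. */
size_t store_line(line v4[], int row, char const *text)
{
	line *v5 = &v4[row];			/* same as char (*v5)[80] */

	strncpy(*v5, text, sizeof(line) - 1);
	(*v5)[sizeof(line) - 1] = '\0';		/* strncpy may not add '\0' */
	return strlen(v4[row]);			/* v4[row] and *v5 are one line */
}
```

With   line buf[10];   sizeof buf is 800, exactly as for   char buf[10][80];  .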

Array Parameters

When arrays are declared as formal parameters, they work like dynamic arrays: the reference is passed as the parameter, not the array itself. The formal parameter serves as an alias for the actual argument. Thus it can be declared to be a pointer, just as array aliases were declared as pointers in the examples above. The formal parameter is stored in a memory location, so the compiler does not make the parameter a constant. The compiler allows either of the following notations to indicate an array formal parameter; they are functionally identical.
	void ff(int arg[])
	void ff(int *arg)
Perhaps the former is more readable, but the latter is more commonly used and more accurately describes the alias nature.

Note that, just as in Java, if the contents of an array formal parameter are changed in a function, the contents of the array actual argument are also changed. Declaring the parameter with const makes the compiler reject such changes.

	void ff(int const arg[])
	void ff(int const * arg)
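A sketch of both cases. The names fill and sum_ints are hypothetical. fill changes the caller's array through its parameter; sum_ints declares its parameter const, so an assignment such as arg[0] = 0; inside it would be a compile error.

```c
/* fill and sum_ints are hypothetical names used only for this illustration. */

/* int arg[] and int *arg mean the same thing here. */
void fill(int arg[], int n, int value)
{
	for (int ii = 0; ii < n; ii++)
		arg[ii] = value;	/* changes the caller's array */
}

/* With const, an assignment such as arg[0] = 0; would be rejected. */
int sum_ints(int const *arg, int n)
{
	int total = 0;
	for (int ii = 0; ii < n; ii++)
		total += arg[ii];
	return total;
}
```

Note that the function cannot find the array's length on its own, so n is passed alongside the reference.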

If a 2-dimensional array is passed into a function, it is necessary to specify all but the first dimension in the declaration. If the parameter is a 10x30 array, all the following are acceptable.

	void ff(int arg[][30])
	void ff(int arg[10][30])
	void ff(int (*arg)[30])
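The following are not acceptable for a 10x30 array argument. To compute the location of arg[ii] for an index value other than 0, the compiler must know how many elements are in each arg[ii], and these declarations do not provide that information. (The first two declare arg as a pointer to int pointers, a different type from a pointer to an array of 30 ints; the third is not a legal declaration.)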
	void ff(int *(arg[10]))
	void ff(int **arg)
	void ff(int arg[][])
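A sketch of passing a 10x30 array using two of the acceptable forms. ROWS, COLS, fill_grid, and sum_grid are hypothetical names. Only COLS appears in the parameter types, because that is what the compiler needs to locate arg[ii][jj].

```c
#define ROWS 10
#define COLS 30

/* fill_grid and sum_grid are hypothetical. int (*arg)[COLS] and
   int arg[][COLS] declare the same parameter type. */
void fill_grid(int (*arg)[COLS], int rows)
{
	for (int ii = 0; ii < rows; ii++)
		for (int jj = 0; jj < COLS; jj++)
			arg[ii][jj] = ii;	/* store the row number */
}

long sum_grid(int arg[][COLS], int rows)
{
	long total = 0;
	for (int ii = 0; ii < rows; ii++)
		for (int jj = 0; jj < COLS; jj++)
			total += arg[ii][jj];
	return total;
}
```

After fill_grid, each row ii holds 30 copies of ii, so sum_grid returns 30 * (0 + 1 + ... + 9) = 1350.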

C-style strings

Strings in C are arrays of characters. The literal "abc\n" defines a 5-element array: element 0 contains the character 'a', element 1 the character 'b', element 2 the character 'c', element 3 the character '\n', AND element 4 the end-of-string character '\0'. The literal is a character array reference (a pointer constant) pointing to the first element of the 5-element array. Since objects in Java are references, C-style strings work much like Java strings, except that in C the end-of-string character must be included in the array, since C does not keep track of array size.
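A sketch that checks the layout of "abc\n" described above. The function name literal_demo is hypothetical. sizeof counts all 5 elements including the '\0', while strlen stops at the '\0' and reports 4.

```c
#include <string.h>

/* literal_demo is hypothetical. Returns 1 if the layout described above holds. */
int literal_demo(void)
{
	char const *s = "abc\n";	/* s points to element 0 */

	return sizeof("abc\n") == 5	/* 4 visible chars + '\0' */
	    && s[2] == 'c' && s[3] == '\n' && s[4] == '\0'
	    && strlen(s) == 4;		/* strlen stops at '\0' */
}
```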
	char	*s1;
	char	*s2;
	char	s3[9];
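	int const s3_SZ = (sizeof(s3) / sizeof(char));	// Number of elements in s3
	s1 = "abc";
	s2 = s1;			// s2 points to the same string as s1
	strcpy(s3, s1);			// Careful, what if s1 is bigger than s3?  (needs <string.h>)
	strncpy(s3, s2, 9);		// Better, but the size 9 is hard-coded
	strncpy(s3, s2, s3_SZ - 1);	// Better still, but strncpy does not add '\0'
	s3[s3_SZ - 1] = '\0';		// when the source fills the buffer, so add it
	strncpy(s3, "123", s3_SZ);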
Consider the second parameter of main. It is an array of strings. But since a string is an array of characters, the parameter is effectively a 2-d array of characters. However, it cannot be declared as   char args[][]   because all but the first dimension must be given and each string can be a different length. As was discussed above, when the rows of a 2-d array have different lengths, an array of pointers is required:   char *(args[])  . Since a parameter is being declared, the following declaration is equivalent:   char **args  .
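A sketch of walking main's second parameter. The function name print_args is hypothetical; main would call it as print_args(argc, argv). Each args[ii] points to its own string, so each one is measured separately with strlen.

```c
#include <stdio.h>
#include <string.h>

/* print_args is hypothetical. char **args could equally be written
   char *(args[]). Returns the total length of all the strings. */
size_t print_args(int count, char **args)
{
	size_t total = 0;

	for (int ii = 0; ii < count; ii++)
		{
		printf("args[%d] = \"%s\" (%zu chars)\n", ii, args[ii], strlen(args[ii]));
		total += strlen(args[ii]);	/* each string has its own length */
		}
	return total;
}
```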

See string(3) for useful string functions.

	int strcmp(char const *str1, char const *str2);
	char *strcpy(char *str1, char const *str2);
	size_t strlen(char const *str);
	char *strcat(char *str1, char const *str2);
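A sketch combining these functions with the bounded-copy idiom from the strncpy example above. The names copy_str and string_demo are hypothetical. copy_str always leaves its destination terminated, truncating the source if necessary; string_demo then appends with strcat, leaving enough room in a 9-char buffer.

```c
#include <string.h>

/* copy_str is hypothetical: copies src into dst (capacity dst_sz) and
   always leaves dst terminated, truncating if necessary. */
void copy_str(char *dst, size_t dst_sz, char const *src)
{
	if (dst_sz == 0)
		return;
	strncpy(dst, src, dst_sz - 1);
	dst[dst_sz - 1] = '\0';
}

/* string_demo is hypothetical: builds "abcdef" in a 9-char buffer. */
size_t string_demo(char out[9])
{
	copy_str(out, 9, "abc");
	strcat(out, "def");		/* 6 chars + '\0' fit in 9 */
	return strlen(out);
}
```

strcat does no bounds checking of its own, so the caller must make sure the combined string fits.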

void

The keyword void is used for the following.