Fundamentals of String Manipulation in C: Part 1

We’ve already seen arrays, which are chains of contiguous pointers to variables, and we’ve seen how we use them. There are arrays available of all data types, including integers of all lengths, floats, doubles and chars. A char array could be used to hold a series of characters corresponding to a word or sentence, as such:

#include <stdio.h>

int main(void)
{
    char word[] = {'f', 'o', 'o', 'b', 'a', 'r'};
    int i;
    for (i = 0; i < 6; i++)
	printf("%c", word[i]);
    printf("\n");
    return 0; /* Recall that main() returns this as an error code */
}

The array is defined with a series of characters, six in total, and the for loop goes through each character in sequence and prints it to the screen. This is, however, a cumbersome procedure and requires knowledge of the length of each individual character array. Because the manipulation of text in C is so common, there is a simpler, less cumbersome way of declaring and using arrays of characters, as illustrated below:

#include <stdio.h>

int main(void)
{
    /* Note that there are no curly brackets, and quotation marks are
    * used. */
    char word[] = "foobar";
    int i;
    for (i = 0; i < 6; i++)
	printf("%c", word[i]);
    printf("\n");
    return 0;
}

The format of the character array in this example is known as a string. Indeed, we’ve seen them before many times without explicitly referring to them as such; the printf() function takes arguments which are strings. Indeed, we can change the printf() arguments in this example to make it easier and less cumbersome:

#include <stdio.h>

int main(void)
{
    char word[] = "foobar";
    printf("%s\n", word);
    return 0;
}

This is much simpler to read, less dependent on knowing the length of the string and less error-prone. The question is, however, “How does printf() know when to stop printing?” The answer is that the word[] string isn’t just declared with the six characters in “foobar”, but also with a so-called null character, which is denoted using “”. Internally, printf() goes through the characters in the string until it reaches the null character, whereby the function terminates and returns.

There is another function in <stdio.h> which acts like printf() with a %s format specifier, and which is easier to remember and use. puts() takes a single argument, a character array, and prints it to the screen with a newline character at the end. This is illustrated below:

#include <stdio.h>

int main(void)
{
    char word[] = "foobar";
    puts(foobar);
    return 0;
}

As you can see, this is a cleaner way of printing strings to the screen than printf(), albeit without the flexibility. String printing is commonplace in text-based systems with C programming, and so puts() is often useful.

Now that we have a string, we might want to add it to another string. This can be done by declaring a larger character array which is large enough to accommodate all of the characters of both strings, and then copying the contents of both strings together. This can be illustrated with the following example:

#include <stdio.h>

int main(void)
{
    char word_1[] = "foobar";
    char word_2[] = "baz";
    char combined[10];
    int i = 0; j = 0;

    /* Continue going through the string until '' is reached */
    while (word_1[i] != '')
    {
	combined[i] = word_1[i];
	i++; /* Incrementing i goes to the next slot in memory for
	* both arrays */
    }
    /* i is now equal to the first empty element in combined */

    /* Starting again with a new counter variable */
    while (word_2[i] != '')
    {
	combined[i+j] = word_2[j];
	j++;
    }
    combined[i+j] = ''; 
    printf("%s\n", combined);
    return 0;
}

This will print out a string with the contents “foobarbaz”. The problem here, though, is that as above, with the definition of the character array using individual characters, is that this is complicated, cumbersome and error-prone. For this reason, several functions have been defined in the C standard library under the header file, <string.h>, which more cleanly deal with the manipulation of strings. The following program demonstrates some of them:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char word_1[] = "foobar";
    char word_2[] = "baz";
    char combined[10];

    strcpy(combined, word_1);
    strcat(combined, word_2);
    puts(combined);
    return 0;
}

As you can see, this is a far cleaner way of defining these operations, without any need for explicit counters or anything else which would make the program more error-prone. Both strcpy() and strcat()take their arguments in the following form:
strcpy(destination, source);
strcat(destination, source);

strcpy() (standing for string copy) is used to copy the contents of one string to another, including the ” null character. strcat() (standing for string concatenate) is used to append the contents of one string onto the end of another; the word “concatenate” is merely a fancy way of saying “link

together”, in this case referring to joining two pieces of text together.

There are a number of other useful functions defined in <string.h> which are of note. strcmp() takes two strings and compares the characters in them. If the characters are exactly the same, the function returns 0; otherwise, it returns a positive or negative value depending on the difference in ASCII values between the first character that disagrees in both strings. strlen() returns the length of a string. Both functions are illustrated below:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char word_1[]="hello";
    char word_2[]="world";
    char word_3[]="concatenate";

    /* Note the different values returned each time. */
    printf("The difference between word_1 and word_2 is %d\n",
    strcmp(word_1, word_2));

    printf("The difference between word_2 and word_1 is %d\n",
    strcmp(word_2, word_1));

    printf("The length of word_3 is %d\n", strlen(word_3));

    return 0;
}

This program returns the following results:

The difference between word_1 and word_2 is -15
The difference between word_2 and word_1 is 15
The length of word_3 is 11

This is consistent with the difference between ‘h’ and ‘w’ in the ASCII table; this is also consistent with the length of the word “concatenate”. These functions can therefore be used usefully when comparing text, which is a common task in the C programming language.

Using scanf(), we can write our own text into a character array as a string; however, this does not have any security against the characters overflowing the string and causing a segmentation fault. The use of scanf() to write into a string is illustrated below:

#include <stdio.h>

int main(void)
{
    char string[21]; /* 20 characters plus the null character. */

    /* Note that we don't use the & dereference operator here. */
    scanf("%s", string);
    printf("%s", string);
    return 0;
}

This will work successfully unless the characters entered are greater than 20; this will cause a segmentation fault and cause the program to halt unexpectedly. As with printf() and puts(), there is a function defined for entering characters specifically into a string called gets(). gets() is, however, considered dangerous and replacements are available in many extended C standard libraries; however, gets() is defined in ISO C90 and C99 and is available on all implementations of Standard C. It is illustrated below:

#include <stdio.h>

int main(void)
{
    char string[21];
    gets(string);
    puts(string);
    return 0;
}

Another thing to note is that strings, unlike character arrays defined with discrete characters, are not modifiable. Assignment statements cannot be used on a string or its elements. A function must therefore be used on the string in order to modify it.

Finally, there are a number of functions which are defined in <stdlib.h> which can be used to convert a string of characters with numerical values (i.e. ‘0’ to ‘9’) into integers, floats or doubles. atoi(), for example, (standing for ASCII to integer) converts a string consisting of such values, e.g. “1234” into an integer value. Other similar functions are also contained in <stdlib.h> and would be described in a reference of the C standard library (such as Appendix B of The C Programming Language (Kernighan & Ritchie, 2nd Edition, 1988)).

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: