Fundamentals of String Manipulation in C: Part 2

Having discussed most of the important string functions in the previous part, only a few others are of particular note. We saw gets() previously, which reads a whole line of input much as scanf("%s", string) reads a single word. While it performs the job it is asked to do, it is regarded as dangerous: it has no way to limit how much it reads, so it is prone to buffer overflow, and it was removed from the language in the C11 standard. A safer function exists in the C standard library in <stdio.h>.

fgets() is the counterpart of gets() designed to work on file input. Unlike gets(), it lets you specify a maximum number of characters to be taken in, which prevents the buffer overflow that can occur with gets(). fgets() is called with three arguments: the string where input will be stored; the maximum number of characters to store, including the terminating null character, so at most one fewer character is actually read; and either stdin, the standard input stream, or another pointer of the type FILE *. File pointers will be discussed later; for now, we are only interested in stdin.

An example of the use of fgets() is illustrated below:

#include <stdio.h>

int main(void)
{
    char string[30];
    printf("Please enter a string: ");
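    /* sizeof keeps the limit in step with the array; fgets() returns NULL on failure */
    if (fgets(string, sizeof string, stdin) != NULL)
        puts(string);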
    return 0;
}

One peculiar difference between fgets() and gets() is that fgets() stores the newline character that ends the line, provided it fits in the buffer, while gets() discards it. This matters when using strcat() to concatenate two strings entered from the standard input stream with fgets(): the first string will still end in a newline.
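A common way to deal with the stored newline is to cut the string off at that point before using it. The following is only a minimal sketch of the idea; the helper name strip_newline() is hypothetical, and it relies on strcspn() from <string.h> to find the position of the first newline.

```c
#include <string.h>

/* Hypothetical helper: end the string at the first '\n', if there is one. */
static void strip_newline(char *s)
{
    /* strcspn() gives the length of the leading part of s containing no '\n';
       if s has no newline, this is the index of the existing null character */
    s[strcspn(s, "\n")] = '\0';
}
```

Calling strip_newline(string) straight after a successful fgets() leaves a string that can be passed to strcat() without a line break in the middle of the result.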

Previously, we also saw the strcmp() function for comparing two strings. A related function, strstr() (for "string string"), can be used to find an instance of a string of smaller or equal size within another string. It takes two arguments, both of them strings: first the string being searched, then the string being searched for. It returns a pointer to the first instance of the second string within the first, or NULL if there is no such instance. The following program demonstrates strstr() in action.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char string[] = "yellow dinosaurs eat snow reluctantly";
    char *p;
    int index;

    /* Looking for the location of "eat" in string[] */
    p = strstr(string, "eat");
    /* Finding the element within the array where "eat" begins */
    index = p - string;
    printf("The string \"eat\" begins at index %d of string[]\n", index);

    return 0;
}

This returns the following:

The string "eat" begins at index 17 of string[]
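The program above can subtract pointers safely only because "eat" is known to be present. When the search string might be absent, strstr() returns NULL, and the result has to be checked before any arithmetic is done on it. A minimal sketch of a guarded lookup follows; the helper name find_index() and its convention of returning -1 for "not found" are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: index of the first match, or -1 if needle is absent. */
static long find_index(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);

    if (p == NULL)
        return -1;               /* no match, so no pointer subtraction */
    return (long)(p - haystack); /* distance from the start of haystack */
}
```

A caller can then test for a negative result instead of risking a subtraction on a null pointer.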

Searching for multiple instances of a string using strstr() requires a slight modification of our program. We can do this by keeping an index, the position in the array where the last instance of the string was found, and calling strstr() again from the next position in memory (i.e. the pointer string + index + 1). The following example illustrates this:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char long_string[] = "The C programming language was invented in the \n"
    "early 1970s by the computer scientist, Dennis Ritchie, who was then \n"
    "working at Bell Labs in New Jersey, which had just removed its \n"
    "support from the Multics project.\n";
    int index = -1, count = 0, i;
    char *p;

    printf("%s", long_string);

    /* Each search resumes at i, one past the start of the previous match;
       the loop ends as soon as strstr() returns NULL */
    for (i = 0; (p = strstr(long_string + i, "in")) != NULL; i = index + 1) {
        index = (int)(p - long_string);
        ++count;
    }
    printf("\nThe string \"in\" has been located in the string %d times\n", count);
    return 0;
}

This example returns the following:

The C programming language was invented in the 
early 1970s by the computer scientist, Dennis Ritchie, who was then 
working at Bell Labs in New Jersey, which had just removed its 
support from the Multics project.

The string "in" has been located in the string 5 times

One thing to notice about this program is that the counter variable does not advance by 1 on every pass of the loop but jumps according to the value of index, so each search resumes just past the previous match. The loop continues only as long as strstr() still finds an instance of the string being searched for.
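The same counting idea can be packaged as a reusable function. This is a sketch under stated assumptions, not a definitive implementation: the name count_occurrences() is hypothetical, overlapping matches are counted (so "aa" occurs three times in "aaaa"), and an empty search string is treated as occurring zero times.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count occurrences of needle in haystack, overlaps included. */
static size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    const char *p = haystack;

    if (*needle == '\0')
        return 0;   /* an empty needle would otherwise match at every position */

    while ((p = strstr(p, needle)) != NULL) {
        ++count;
        ++p;        /* resume one character past the start of this match */
    }
    return count;
}
```

To count only non-overlapping matches, the pointer could instead be advanced by strlen(needle) after each match.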

In the last tutorial, we demonstrated atoi(), a function which converts a string of decimal digits into an integer. It was mentioned at the time that other functions of the same type exist. atof(), for instance, converts a string consisting of digits, an optional exponent and at most one radix point into a double; atol() operates like atoi(), except that it returns a long int. On platforms where int and long int have the same size, atol() gives the same results as atoi(), but where long int is wider, as on older compilers with 2-byte ints or on many 64-bit Unix-like systems, the two functions differ in the range of values they can return.

Simple implementations of atoi() and atof() are discussed in The C Programming Language (Kernighan & Ritchie, 2nd Edition, 1988). More flexible functions of this type exist, such as strtod(), strtol() and strtoul(), which can also report where conversion stopped and whether the value was out of range; their details can be found in any good C reference material.
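To give a flavour of the extra control strtol() offers over atoi(), here is a minimal sketch of a checked conversion. The helper name parse_long() and its return convention (1 for success, 0 for failure) are assumptions made for illustration; strtol(), errno and ERANGE are standard.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: store the converted value in *out and return 1,
   or return 0 if s is not a complete, in-range decimal integer. */
static int parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                    /* so a later ERANGE is known to be ours */
    value = strtol(s, &end, 10);  /* end is set to where conversion stopped */

    if (end == s)                 /* no digits were consumed */
        return 0;
    if (errno == ERANGE)          /* value does not fit in a long */
        return 0;
    if (*end != '\0')             /* unconverted characters follow the number */
        return 0;

    *out = value;
    return 1;
}
```

atoi() gives no way to tell these failure cases apart from a valid result, which is the main reason the strto family is usually preferred for input that has not been validated.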
