Fundamentals of ASCII File Input/Output in C

Since the beginning of digital computing, people have recognised the need for persistent data which can be quickly loaded into a computer or saved from it. Konrad Zuse’s Z3, an electromechanical computer completed in 1941, read its instructions from punched 35mm film stock; similarly, Charles Babbage’s unbuilt Analytical Engine was to use punched cards like those used in Jacquard looms.

When C was first developed at Bell Labs, it was designed as a systems programming language, a more portable and friendly alternative to assembly language. File input and output features were therefore an important part of the language, and are handled in a consistent fashion by a series of functions in the C Standard Library. In the ISO C90 and C99 standards for C, the file input/output functions are declared in <stdio.h>.

The most elementary operations which one may wish to perform on files are to open them, allowing them to be read from and written to, and to close them when those operations have completed. In order to do this, we need some way of representing a file. For this purpose, C provides a type called FILE, which holds the information needed to manage an open file, and we work with it through a file pointer of type FILE *. The FILE type is defined in <stdio.h>, but its internals vary between implementations and need not be understood in order to perform file input/output operations. It suffices to say that the type facilitates these operations.

Once we have declared a file pointer within our program, we can use it to open a file. The function used for this task is fopen(), which is called in the following fashion:

fp = fopen("foo.file", "r");

where fp is the name of the file pointer, the first argument of fopen() is a string containing the name of the file to be opened, and the second argument is the mode, a string specifying what level of access to the file is requested. In this case, the mode is "r", allowing read-only access.

If the file can be opened, fopen() returns a valid file pointer and the program may continue. However, if a file cannot be opened, e.g. because the user lacks permissions for the file, the fopen() function returns the value NULL, which can be tested for in order to report the error and exit gracefully. This will be demonstrated below.

As most operating systems have a limit on the number of files that can be open at any one time, and as we will want any buffered output to be flushed to the file, it is a good idea to close a file when we are done with it. Just as C provides fopen() for the purposes of opening a file, it provides the function fclose() for closing a file, which is called in the following manner:

fclose(fp);

where fp is again the name of the file pointer. Every file which is still open when a program terminates normally is closed automatically; it is still a good idea to call fclose() manually on each file as a matter of habit.

The use of both the fopen() and fclose() functions is illustrated below:

#include <stdio.h>

int main(void)
{
    FILE *fp;

    /* Checking for errors! */
    if ((fp = fopen("foo.txt", "r")) == NULL) {
        printf("Error: File cannot be opened\n");
        return 1;
    } else {
        printf("File successfully opened\n");
        fclose(fp);
    }
    return 0;
}

Now that we have these elementary operations, we will want to actually manipulate the files we have opened. Two simple operations which act character by character, like getchar() and putchar(), are fputc() and fgetc(), which insert a character into a file and retrieve a character from a file respectively. These functions can be used to copy the contents of one file to another, for instance:

#include <stdio.h>
#define FILENAME 256

int main(void)
{
    FILE *ifp, *ofp;
    int c; /* int, not char, so that EOF is distinct from every character */
    char input[FILENAME], output[FILENAME];

    printf("Enter the name of the input file: ");
    scanf("%255s", input); /* width keeps the name within FILENAME bytes */

    if ((ifp = fopen(input, "r")) == NULL) {
        puts("Error: input file invalid");
        return -1;
    }

    printf("Enter the name of the output file: ");
    scanf("%255s", output);

    if ((ofp = fopen(output, "w")) == NULL) {
        puts("Error: output file invalid");
        fclose(ifp); /* do not leave the input file open on failure */
        return -1;
    }

    /* Copy one character at a time until the end of the input file */
    while ((c = fgetc(ifp)) != EOF) {
        fputc(c, ofp);
    }
    fclose(ifp);
    fclose(ofp);

    return 0;
}

This set of functions has its uses, but is somewhat limited in scope. Only a single character is read or written at a time, an issue which we addressed with string functions previously. In addition, fputc() and fgetc() deal only in individual characters of ASCII text rather than in numerical values.

We will address the issue of strings first. Just as there are functions in the C standard library for getting strings from the standard input and printing them to the standard output, there are functions for getting strings from files and printing strings to them. fgets(), a function which we briefly saw when dealing with strings previously, and fputs() are the file equivalents of the gets() and puts() functions, all of which are found in <stdio.h>. Unlike gets(), which cannot limit how much it reads and was removed from the language in C11, fgets() takes the size of the buffer as an argument.

We can illustrate these functions with a program which gets the contents of a file and prints them to the screen. Several programs which do something similar exist in Unix. The closest to this program is the cat program called with a single input file; we will therefore call this program “meow”.

#include <stdio.h>
#define MAXCHARS 81

int main(int argc, char **argv)
{
    FILE *ifp;
    char line[MAXCHARS];

    if (argc != 2) {
        puts("Usage: meow <filename>");
        return -1;
    } else if ((ifp = fopen(argv[1], "r")) == NULL) {
        puts("Error: Input file invalid");
        return -2;
    } else {
        while ((fgets(line, MAXCHARS, ifp)) != NULL) {
            fputs(line, stdout);
        }
        fclose(ifp);
    }
    return 0;
}

Note the use of fputs() in this program rather than puts() or printf(). fgets() keeps the newline character at the end of each line it reads, and puts() appends another, so the output would not faithfully reproduce the input file; printf(), if given the line as its format string, would misinterpret any percent signs in the file, as these are of significance to the printf() function. The fputs() function writes the string exactly as it is stored, and so prints the contents of text files more faithfully than either alternative.

fputs() isn’t just good for printing file contents to the screen; it is also useful for printing strings to files. With this function, we could implement a very simple text editor, similar in concept to the ed text editor which was once the standard Unix line editor. Our editor, which I will call “eddie”, does not have anywhere near the complexity or feature set of even the ed text editor, yet it will serve adequately as an example. Indeed, it is not really accurate to compare our text editor with ed; it is closer to the functionality of the cat program called with the > operator, which writes whatever is typed at the keyboard to a file.

#include <stdio.h>
#include <string.h> /* for strcmp() */
#define MAXCHARS 81

int main(int argc, char *argv[])
{
    FILE *ofp;
    char line[MAXCHARS];

    if (argc != 2) {
        puts("Usage: eddie <filename>");
        return -1;
    } else if ((ofp = fopen(argv[1], "w")) == NULL) {
        puts("Error: Output file invalid");
        return -1;
    } else {
        /* Stop at end of input or at a line consisting of a single full stop */
        while ((fgets(line, MAXCHARS, stdin)) != NULL && strcmp(line, ".\n") != 0) {
            fputs(line, ofp);
        }
        fclose(ofp);
    }
    return 0;
}

Note the strcmp() comparison made in the while loop. A similar comparison is made in the input mode of the actual ed text editor, checking if the line being entered is a full stop followed by a newline character. A line consisting of nothing but a full stop is relatively rare in text editing, whether it be in writing source code or in writing human-language texts, and serves as an adequate sentinel to mark the end of the text.

Now that the file string functions have been demonstrated, we can move on to formatted input and output. I mentioned above that one of the limitations of the fgetc() and fputc() functions was that they could only work with ASCII text; the same is true of fgets() and fputs(). Often, we want to take in the contents of a file in integer or floating-point form, allowing us to create the likes of database programs.

This is where the fprintf() and fscanf() functions come into play. As the names suggest, these functions are equivalent to the printf() and scanf() functions used for the standard output and input streams. They allow us to treat incoming or outgoing data as numerical values as well as ASCII characters, converting between the text in the file and the values in memory. (C99’s Boolean type has no conversion specifier of its own; Boolean values can be printed as integers with %d.) This gives us a greater deal of flexibility, particularly when it comes to data processing applications and the like.

The following functions, from a Little Man Simulator whose source code is available under the GNU General Public License, demonstrate loading the contents of a file into memory and saving the contents of memory to a file. While these functions are regrettably not particularly illuminating outside of their original context, they do adequately demonstrate the use of the fscanf() and fprintf() functions. The constants FILE_LENGTH and MAILBOXES are defined elsewhere in that program.

/* Function: load_file
Prompts the user to enter a filename, then loads the contents of this file
into the program as instructions/data contents of the mailbox array.
Arguments: mailbox[], an array of short integers.
Returns: a short integer equal to -1 or 0. */
short load_file (short mailbox[])
{
    FILE *fp;
    short i = 0;
    int digits;
    char filename[FILE_LENGTH];

    printf("Enter a valid filename: ");
    scanf("%s", filename);

    /* If file won't open, i.e. fp == NULL, exit with error state; otherwise,
    open file and use fscanf to enter the digits into the mailboxes. */
    if ((fp = fopen(filename, "r")) == NULL)
        return -1;
    else {
        /* fscanf returns 1 for each integer read; stop at EOF or bad input */
        while (i < MAILBOXES && (fscanf(fp, "%d", &digits)) == 1) {
            mailbox[i] = digits;
            ++i;
        }
    }
    fclose(fp);
    return 0;
}

In this first function, a filename is obtained and passed to the fopen() function. The file contains a set of integers of no more than three digits each; each integer is read into a variable named “digits” before being placed into the corresponding mailbox slot. This continues until the i variable reaches the number of mailboxes (typically 100 in a standard Little Man implementation) or until the end of the file is reached.

/* Function: save_file
Prompts the user to enter a filename, then saves the instructions/data
contents of the mailbox array as contents of this file.
Arguments: mailbox[], an array of short integers.
Returns: a short integer equal to -1 or 0. */
short save_file (short mailbox[])
{
    FILE *fp;
    short i;
    char filename[FILE_LENGTH];

    printf("Enter a valid filename: ");
    scanf("%s", filename);

    /* If file won't open, i.e. fp == NULL, exit with error state; otherwise,
    open file and save the contents of memory to it. */
    if ((fp = fopen(filename, "w+")) == NULL)
        return -1;
    else {
        for (i = 0; i < MAILBOXES; i++) {
            fprintf(fp, "%d\n", mailbox[i]);
        }
    }
    fclose(fp);
    return 0;
}

This corresponding function takes the contents of memory and, for each mailbox slot, prints an integer of at most three digits on its own line in the file specified by the user. Empty mailbox slots are printed as zeroes to the output file.

It should be noted that all of these functions only operate on ASCII text files, even with the formatted input. Binary file input/output will be explained subsequently, but the functions are contained within <stdio.h> just as with the ASCII file input/output functions.
