Fundamentals of Binary Input/Output in C

Previously, we examined some of the operations which C allows you to do with files in the form of ASCII text, and how the use of these operations could greatly increase the flexibility of the programs that you can write in C. With these functions, we can take our first steps towards writing a text editor, a file concatenator, a database program or many other useful utilities. However, being able to deal only with ASCII text has its limitations; we are unable, for instance, to operate on image, sound or formatted text files.

In order to expand our programs further, we must delve into the world of the binary file operations which exist in the C programming language. As with the plain-text operations, the binary operations are defined in the <stdio.h> header file. The fopen() function includes special modes for reading from or writing to a file in binary mode, which looks something like this:

fp = fopen("foo.bin", "rb");

Note the addition of the “b” flag to the mode specification. On some operating systems, notably Windows, a stream opened without it is treated as text, and the C library may translate line endings as data passes through, which corrupts binary data handled by block input/output operations such as fread() and fwrite(). In a POSIX-compliant system (such as Unix, Linux, Mac OS X, et cetera), the flag has no effect, but for maximum cross-compatibility, it should be included anyway.

While we’re discussing opening files, there is a function included in the C standard library which closes and reopens a file, allowing one to change the read-write permissions which the file is opened with. This function is named freopen(), and is called in the following fashion:

freopen("foo.file", "r+b", fp);

where “foo.file” is the name of the file in question, the mode is as in fopen() and fp is the file pointer or stream which the file is to be associated with. This can be used to change the permissions of a file from read-only to read-write, to append, to write-only or to change the access from plain-text mode to binary mode. It is more traditionally used to change the streams that the standard input/output streams are associated with, particularly in systems where stdin, stdout and stderr cannot be closed manually.

Now that we have a file open in binary mode, we may perform some binary operations on it. The fread() and fwrite() functions take more arguments than the plain-text functions we have seen so far, but these functions have greater flexibility as a result. We will begin with fread(), which takes in a number of items in binary mode. It is called in the following fashion:

fread(buffer, size_z, number, fp);

where buffer represents the variable or array where the things read by the function are stored, size_z represents the size of the objects to be taken in (for example, sizeof(char) for single bytes, or sizeof(short) for 2-byte short integers), number represents the number of objects to be read in, and fp represents the file pointer or stream which the data is to be read from.

The following simple program takes in a number of integers from a file with the following hexadecimal contents, written in the Intel x86-compatible little-endian format:

0A 00 00 00 14 00 00 00 1E 00 00 00 28 00 00 00 32 00 00 00
37 00 00 00 3C 00 00 00 41 00 00 00 46 00 00 00 4B 00 00 00

If read into a text editor like Emacs, this file is represented by the following meaningless string:

^@^@^@^T^@^@^@^^^@^@^@(^@^@^@2^@^@^@7^@^@^@<^@^@^@A^@^@^@F^@^@^@K^@^@^@

However, we can read this into a C program in binary format and get some meaning out of it.

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int bar[10];
    int i;

    fp = fopen("foo.bin", "rb");
    if (fp == NULL) {
        puts("Error: Input file cannot be read");
        return -1;
    }
    else {
        fread(bar, sizeof(int), 10, fp);
        for (i = 0; i < 10; i++) {
            printf("%d ", bar[i]);
        }
        putchar('\n');
    }
    fclose(fp);
    return 0;
}

This program prints out the following:

10 20 30 40 50 55 60 65 70 75

In any machine with 32-bit int variables and little-endian byte order, the same results will apply. However, this code isn’t particularly portable, and will give different results on different computers. For instance, running this code on a big-endian computer which uses the IBM POWER architecture, such as a seventh-generation games console like the Xbox 360, will have a radically different result from the one we get with x86 processors, because the four bytes of each integer are interpreted in the opposite order. This has to be borne in mind when using binary files on different computers.

Just as we may wish to both read and write plain-text files using C’s functions, we may wish to do the same with binary files. We may perform these writing operations using the fwrite() function, which takes similar arguments to fread(). A call to this function is illustrated below:

fwrite(buffer, size_z, number, fp)

where again, buffer refers to the variable or array where the values are stored, size_z refers to the size of the values, number refers to the number of values to be written, and fp refers to the file pointer or stream where the binary is to be written. The following program calculates the first ten powers of two, stores them in an integer array and writes these values to a file.

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int powers_two[10];
    int i;

    powers_two[0] = 1;
    for (i = 1; i < 10; i++) {
        powers_two[i] = powers_two[i-1] * 2;
    }

    fp = fopen("bar.bin", "wb");
    if (fp == NULL) {
        puts("Error: Output file cannot be opened");
        return -1;
    } else {
        fwrite(powers_two, sizeof(int), 10, fp);
    }
    fclose(fp);
    return 0;
}

In my little-endian AMD x86_64 machine, the binary values written to the bar.bin file have the following hexadecimal values:

01 00 00 00
02 00 00 00
04 00 00 00
08 00 00 00
10 00 00 00
20 00 00 00
40 00 00 00
80 00 00 00
00 01 00 00
00 02 00 00

Converted into decimal, we get the values 1, 2, 4, 8, 16, 32, 64, 128, 256 and 512, as expected. Again, these values are almost meaningless in ASCII; most of the byte values correspond to control characters, and the string of characters is of little interest. We can, however, read in the binary data as we did before and perform calculations on the values as standard integers.

Now that we have our basic block input/output functions, we can start investigating some of the other file access functions which we can use on our file pointers. There are a considerable number of these functions defined in <stdio.h>, not all of which are of immediate interest, but which have their own utility in a C program.

One of the functions which is of interest is the rewind() function, which returns the file position to the beginning of the file, clearing the end-of-file and error flags in the process. One might liken this to rewinding a tape; this would also be of use when reading a sound file in a music player, which is one type of program which works on binary files. The rewind() function is illustrated below, reading in integers from the bar.bin file defined in the last program:

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int foo[10];
    int i;

    if ((fp = fopen("bar.bin", "rb")) == NULL) {
        puts("Error: Input file invalid");
        return -1;
    } else {
        for (i = 0; i < 10; i+=5) {
            fread(&foo[i], sizeof(int), 5, fp);
            rewind(fp);
        }
    }

    fclose(fp);

    for (i = 0; i < 10; i++) {
        printf("%d ", foo[i]);
    }
    putchar('\n');

    return 0;
}

This program opens the bar.bin file in binary format, then reads values into the foo array in two blocks of five integers. Note the use of the & address-of operator; because five integers are read at a time, we need to pass the address of the specific element of the array which we wish to start reading into. Otherwise, the first five elements of the array would be read into twice, and the others would be left uninitialised. When we run this program, we get the following results:

1 2 4 8 16 1 2 4 8 16

The first five powers of two have been read in to the array twice, with the file rewinding on each occasion. If we were to read into a larger array, the function would start reading from the first value in bar.bin, 01 00 00 00.

Given that we have a function which rewinds the file fully, we might want to prove to ourselves explicitly that the rewind() function has actually rewound the file position back to the start, and to find the file position before the rewind() function is called. To do this, we can use the ftell() function, which returns a long integer value giving the current file position. We can expand our previous program with calls to the ftell() function contained within our for loop:

#include <stdio.h>

int main(void)
{
    FILE *fp;
    int foo[10];
    int i;

    if ((fp = fopen("bar.bin", "rb")) == NULL) {
        puts("Error: Input file invalid");
        return -1;
    } else {
        for (i = 0; i < 10; i+=5) {
            fread(&foo[i], sizeof(int), 5, fp);
            printf("%ld\n", ftell(fp)); /* Calling ftell() */
            rewind(fp);
            printf("%ld\n", ftell(fp)); /* And calling it again */
        }
    }

    fclose(fp);

    for (i = 0; i < 10; i++) {
        printf("%d ", foo[i]);
    }
    putchar('\n');

    return 0;
}

The results from this program are:

20
0
20
0
1 2 4 8 16 1 2 4 8 16

Using this, we can see that the file position was at the twenty-first (noting that as usual in C, we start counting from zero) byte or the start of the sixth integer position in the file before the rewind() function was called, and that the file position returns to the first byte after the rewind() function is called. We have verified that the rewind() function does, in fact, work as we expect it to.

It is clear, though, that the rewind() function is rather limited in its scope; it can only return to the beginning of the file. A music player that could only rewind to the start of a song would be considered terribly limited, and just as with a modern digital music player, we can seek out a specific byte within a file and operate from that position. This is where the fseek() function comes into play, which works somewhat like rewind(), but with a lot more flexibility. fseek() is called in the following fashion:

fseek(fp, offset, origin)

where fp is the file pointer for which we wish to change the file position, offset is the number of bytes we want to move the file position, given a certain origin, which is equal to the defined value SEEK_SET for the beginning of the file, SEEK_CUR for the current file position and SEEK_END for the end of the file. For this function, we might want to define a larger binary input file, so that we can see the full extent of the fseek() function. The following set of hexadecimal values was saved to the baz.bin file:

01 01 00 00 01 02 00 00 01 04 00 00 01 08 00 00 01 10 00 00 01 20 00 00 01 40
00 00 01 80 00 00 01 00 01 00 01 00 02 00 01 00 04 00 01 00 08 00 01 00 10 00
01 00 20 00 01 00 40 00 01 00 80 00 01 00 00 01 01 00 00 02 01 00 00 04 01 00
00 08 01 00 00 10 01 00 00 20 01 00 00 40 01 00 00 80 02 01 00 00 02 02 00 00
02 04 00 00 02 08 00 00 02 10 00 00 02 20 00 00 02 40 00 00 02 80 00 00 02 00
01 00 02 00 02 00 02 00 04 00 02 00 08 00 02 00 10 00 02 00 20 00 02 00 40 00
02 00 80 00 02 00 00 01 02 00 00 02 02 00 00 04 02 00 00 08 02 00 00 10 02 00
00 20 02 00 00 40 02 00 00 80

The following program declares an integer array of size 48. On each pass through the for loop, it reads one integer from the current file position, then moves the file position to i bytes past the start of the file, so that successive reads overlap.

#include <stdio.h>
#define ARRAYSIZE 48

int main(void)
{
    FILE *fp;
    int values[ARRAYSIZE];
    int i, j;

    if ((fp = fopen("baz.bin", "rb")) == NULL) {
        puts("Error: Input file invalid");
        return -1;
    } else {
        for (i = 0; i < ARRAYSIZE; i++) {
            fread(&values[i], sizeof(int), 1, fp);
            fseek(fp, i, SEEK_SET);
        }
        fclose(fp);
    }

    for (i = 0; i < ARRAYSIZE; i++) {
        printf("%d ", values[i]);
    }
    putchar('\n');

    return 0;
}

This returns the following:

257 257 16777217 33619968 131328 513 16777218 67174400 262400 1025 16777220
134283264 524544 2049 16777224 268500992 1048832 4097 16777232 536936448
2097408 8193 16777248 1073807360 4194560 16385 16777280 -2147418112 8388864
32769 16777344 65536 16777472 65537 16777472 65537 33554688 131073 16777728
65538 67109120 262145 16778240 65540 134217984 524289 16779264 65544

All of these values are somewhat closely related to powers of two, as the series of values which we placed into the baz.bin file would suggest. Nevertheless, this is not an incredibly exciting program, nor are the programs we defined before. We are simply working on a set of integers, but binary files can be more interesting.

Executable files are a sort of binary file, although with modern computer architectures – and for that matter, even for older, more consistent architectures – working on these executables in unadorned hexadecimal is difficult. Other types of files that are in binary format are the likes of MP3 music files, MPEG movies and various formats of image files.

One of the simplest image formats is the uncompressed BMP bitmap format defined by Microsoft. The mandatory components of a BMP file are the 14-byte file header, which stores general information about the file, a DIB header of variable size which contains more detailed information about the bitmap image, and a pixel array, which consists of blocks of bytes which encode the red, green and blue colour values of the pixels, along with an optional transparency (or alpha) value.

Using this information, we can write a simple application which reads in a BMP file, then does something with this file, such as inverting the values of the colours. The application then writes this information to a separate file, including the header and DIB header. The following program takes an input file, which has been defined in this instance as having a 40-byte DIB header corresponding to one of the seven BMP header types that exist. The filename of the input file can be taken in as a command-line argument, or defined within the program itself.

/* Image manipulation: Colour inverter */
/* Based on the work of Leo Tilson @ Dublin Institute of Technology,
   modifications made by Richard Kiernan */

#include <stdio.h>
#include <stdlib.h>
#define CHAR_MAX 2560

int main(int argc, char **argv)
{
    FILE *input_image;
    FILE *output_image;

    unsigned char header[14]; /* Stores the header of the file */
    unsigned char info[40]; /* Stores the DIB header of the file */
    unsigned char *image; /* Once malloc() is called, stores the image data
                           * for the file */

    unsigned int num_read; /* For error checking */

    unsigned int width; /* All derived from the DIB header */
    unsigned int row_length;
    unsigned int height;
    unsigned int im_size; 

    unsigned int row; /* Two counter variables */
    unsigned int pixel; 

    unsigned char blue; /* Stores the blue byte for each pixel */
    unsigned char green; /* Stores the green byte for each pixel */
    unsigned char red; /* Stores the red byte for each pixel */

    unsigned char *lineptr; /* We'll use this to go through each pixel */

    char filename[CHAR_MAX];

    if (argc > 2) {
        printf("Usage: inverter [filename]\n");
        exit(1);
    } else if (argc == 1) {
        printf("Please enter a filename: ");
        scanf("%s", filename);
    }

    input_image = fopen(argc == 2 ? argv[1] : filename, "rb");
    output_image = fopen("output.bmp", "wb");

    if ((input_image == NULL) || (output_image == NULL)) {
        printf("Error: Failed to open an image file\n");
    } else {
        printf("Image opened successfully\n");

        /* Retrieve the header from the input image */
        num_read = fread(header, sizeof(char), 14, input_image);
        if (num_read != 14)
            exit(2); /* Not a valid BMP file! No point in continuing. */

        /* Retrieve the bigger info block from the input image */
        num_read = fread(info, sizeof(char), 40, input_image);
        if (num_read != 40)
            exit(3); /* Not the type of BMP file we're looking for. */

        /* Using the info block to tell us necessary information about the
         * file, like width, height, et cetera. */
        width = info[4] + info[5] * 256 + info[6] * 256 * 256 + info[7] * 256
            * 256 * 256;
        height = info[8] + info[9] * 256 + info[10] * 256 * 256 + info[11] * 256
            * 256 * 256;
        im_size = info[20] + info[21] * 256 + info[22] * 256 * 256 + info[23]
            * 256 * 256 * 256;

        row_length = im_size / height;

        /* Now, allocate memory for the image and read it in. */

        image = (unsigned char *) malloc(im_size);
        if (image == NULL)
            exit(4); /* Allocation failed; no point in continuing */

        num_read = fread(image, sizeof(char), im_size, input_image);
        if (num_read != im_size)
            exit(5); /* Some sort of error in reading in the file. */

        /* Now to invert the image's colours */
        for (row = 0; row < height; row++) {
            /* Define the pointer position for this row */
            /* This is the first blue byte on the row */
            lineptr = image + row * row_length;

            for (pixel = 0; pixel < width; pixel++) {           
                /* Define the current colours */
                blue = *lineptr;
                green = *(lineptr + 1);
                red = *(lineptr + 2);

                /* Invert the colours */
                *lineptr = 255 - blue ;
                *(lineptr + 1) = 255 - green;
                *(lineptr + 2) = 255 - red;

                lineptr += 3; /* Go to next pixel */
            }
        }

        /* Write this to the output file */
        fwrite(header, sizeof(char), 14, output_image);
        fwrite(info, sizeof(char), 40, output_image);
        fwrite(image, sizeof(char), im_size, output_image);

        free(image);
        fclose(input_image);
        fclose(output_image);
    }

    return 0;
}

It is important to note that this is not the most elegant way to write this program, nor is it the fastest. This program could be improved by using pointers rather than the array notation used here, for instance. However, this does demonstrate the sort of program which can be written as soon as you have access to operations that work on files at a binary level.

Using this program as a framework, we could also perform such tasks as changing the colours to greyscale, saturating the colours or other functions that we might see in a fully-featured image manipulation program. Similarly, knowing the details of a music file format would allow us to write programs which could manipulate that type of file.

A final function which is of interest in the context of operations on files is the remove() function. As the name suggests, it is used to remove a file from storage as long as no other filenames are linked to the file. This function allows us to create a simple program along the lines of the Unix rm command, which we will call rm_file.

#include <stdio.h>

int main(int argc, char *argv[])
{
    if (argc == 2) {
        if (remove(argv[1]) != 0) {
            printf("Error: File %s could not be deleted\n", argv[1]);
        } else {
            printf("File %s successfully deleted\n", argv[1]);
        }
    } else {
        printf("Usage: rm_file [filename]\n");
    }
    return 0;
}

As persistent data is such an important idea in computer programming, it is useful to have a series of functions defined in the language you are using that easily operate on that persistent data, and between plain-text and binary input/output operations in C, you have a set of functions which have been tried and tested. These functions allow us to build basic applications, like our basic “text editor”, or the colour inverter program defined above, that can form parts of a more fully-featured system.

A Small Collection of FizzBuzz Solutions

Author’s Note: This is just a small collection of solutions in various programming languages that I’m learning in order to solve the FizzBuzz problem. None of the programs are particularly sophisticated, and the use of the modulo operator or remainder function in the various languages makes these programs slow. The Scheme implementation of this problem is especially messy, because of the need to use the (newline) function for each operation, and my use of iterative recursion rather than the (do) construct.

C:

#include <stdio.h>

int main(void)
{
    int i;

    for (i = 1; i <= 100; i++) {
        if (i % 3 == 0)
            printf("Fizz");
        if (i % 5 == 0)
            printf("Buzz");
        if (i % 3 && i % 5)
            printf("%d", i);
        putchar('\n');
    }
    return 0;
}

C – using embedded ternary operators:

#include <stdio.h>

int main(void)
{
    int i;

    for (i = 1; i <= 100; i++) {
        i % 3 == 0 ? (i % 5 == 0 ? printf("FizzBuzz\n") : printf("Fizz\n")) :
            (i % 5 == 0 ? printf("Buzz\n") : printf("%d\n", i));
    }
    return 0;
}

Python:

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

Scheme:

(define (fizzbuzz)
  (define (counter n)
    (cond ((> n 100) 'done)
          ((and (zero? (remainder n 3))
                (zero? (remainder n 5)))
                (display "FizzBuzz") (newline)
                (counter (1+ n)))
          ((zero? (remainder n 3))
                (display "Fizz") (newline)
                (counter (1+ n)))
          ((zero? (remainder n 5))
                (display "Buzz") (newline)
                (counter (1+ n)))
          (else (display n) (newline)
                (counter (1+ n)))))
  (counter 1))

FORTRAN 77:

c FORTRAN 77 compliant "FizzBuzz" procedure      
      PROGRAM FZZBZZ
      INTEGER I
      DO 10, I=1,100
         CALL COUNT(I)
 10   CONTINUE
      END

      SUBROUTINE COUNT(I)
      INTEGER I
      IF (MOD(I, 3) .EQ. 0 .AND. MOD(I, 5) .EQ. 0) THEN
         WRITE (*,*) 'FizzBuzz'
      ELSE IF (MOD(I, 3) .EQ. 0) THEN
         WRITE (*,*) 'Fizz'
      ELSE IF (MOD(I, 5) .EQ. 0) THEN
         WRITE (*,*) 'Buzz'
      ELSE
         WRITE (*,*) I
      END IF
      END

Fundamentals of ASCII File Input/Output in C

Since the beginning of digital computing, people have seen the necessity for persistent data which can be quickly loaded into or saved from a computer. Konrad Zuse’s Z3 electromechanical computer, completed in 1941, supported instruction loading from punched 35mm film stock; similarly, Charles Babbage’s unbuilt Analytical Engine was to support punch cards similar to those used in Jacquard looms.

When C was first developed in Bell Labs, it was designed with the purpose of being a systems programming language, a more portable and friendly alternative to assembly language. File input and output features were therefore an important part of the language, and are dealt with in a consistent fashion by a series of functions contained within the C Standard Library. In the ISO C90 and C99 standards for C, the file input/output functions are contained within <stdio.h>.

The most elementary operations which one may wish to do with files are to open them, allowing them to be read from and written to, and to close them when those operations have completed. In order to do this, we need some way of representing a file. For this purpose, C provides us with a special type of structure called a file pointer, denoted with the type FILE *. The specifics of the FILE * type are defined in <stdio.h>, but understanding them is not important for understanding file input/output operations. It merely suffices to say that the type facilitates these operations.

Once we have a file pointer within our program, we can use it to open a file. The function used for this task is fopen(), which is called in the following fashion:

fp = fopen("foo.file", "r");

where fp is the name of the file pointer, the first argument of fopen() is a string containing the name of the file to be opened and the second argument is the mode, a string containing directives to the operating system regarding what level of access is to be allowed to the file. In this case, the mode is “r“, allowing read-only access.

If the file can be opened, the file pointer is updated and the program may continue. However, if a file cannot be opened, e.g. because the user lacks permissions for the file, the fopen() function returns the value NULL, which can be used to set up error reporting and functions. This will be demonstrated below.

As most operating systems have a limit on the number of files that can be open at any one time, and as we may want to clear a buffer which a certain function is operating on, it is a good idea to close a file when we are done with it. Just as C provides fopen() for the purposes of opening a file, it provides the function fclose() for closing a file, which is called in the following manner:

fclose(fp);

where fp is again the name of the file pointer. fclose() is automatically called on every file pointer which is still open at the end of a program; it is still a good idea to call it manually on each file as a matter of habit.

The use of both the fopen() and fclose() functions is illustrated below:

#include <stdio.h>

int main(void)
{
    FILE *fp;

    /* Checking for errors! */
    if ((fp = fopen("foo.txt", "r")) == NULL) {
        printf("Error: File cannot be opened\n");
        return 1;
    } else {
        printf("File successfully opened\n");
        fclose(fp);
    }
    return 0;
}

Now that we have these elementary operations, we will want to actually manipulate the files we have opened. Two simple operations which act character by character, like getchar() and putchar(), are fputc() and fgetc(), which insert a character into a file and retrieve a character from a file respectively. These functions can be used to copy the contents of one file to another, for instance:

#include <stdio.h>
#define FILENAME 256

int main(void)
{
    FILE *ifp, *ofp;
    int c; /* int rather than char, so that EOF is distinct from any data */
    char input[FILENAME], output[FILENAME];

    printf("Enter the name of the input file: ");
    scanf("%s", input);

    if ((ifp = fopen(input, "r")) == NULL) {
        puts("Error: input file invalid");
        return -1;
    }

    printf("Enter the name of the output file: ");
    scanf("%s", output);

    if ((ofp = fopen(output, "w")) == NULL) {
        puts("Error: output file invalid");
        return -1;
    }

    while ((c = fgetc(ifp)) != EOF) {
        fputc(c, ofp);
    }
    fclose(ifp);
    fclose(ofp);

    return 0;
}

This set of functions has its uses, but is somewhat limited in its scope. Only a single character is read or written at a time, an issue which we addressed with string functions previously. In addition, all of the input and output performed by the fputc() and fgetc() functions involves ASCII text.

We will address the issue of strings first. Just as there are functions in the C standard library for getting strings from the standard input and printing them to the standard output, there are functions for getting and printing strings to and from files. fgets(), a function which we briefly saw when dealing with strings previously, and fputs() are the file equivalents of the gets() and puts() functions, which are likewise declared in <stdio.h>.

We can illustrate a function which gets the contents of a file and prints it to the screen. Several programs which perform functions similar to this exist in Unix. The closest to this program is the cat program called with a single input file; we will therefore call this program “meow“.

#include <stdio.h>
#define MAXCHARS 81

int main(int argc, char **argv)
{
    FILE *ifp;
    char line[MAXCHARS];

    if (argc != 2) {
        puts("Usage: meow [filename]");
        return -1;
    } else if ((ifp = fopen(argv[1], "r")) == NULL) {
        puts("Error: Input file invalid");
        return -2;
    } else {
        while ((fgets(line, MAXCHARS, ifp)) != NULL) {
            fputs(line, stdout);
        }
        fclose(ifp);
    }
    return 0;
}

Note the use of fputs() in this program rather than puts() or printf(). puts() prints another newline character for every line, which does not print the statements faithfully as they are contained in the input files; printf() similarly contains problems with printing files with percent signs, as these are of significance to the printf() function. The fputs() function prints the contents of all text files more faithfully than either of the other functions we may use.

fputs() isn’t just good for printing file contents to the screen; it is also useful for printing strings to files. With this function, we could implement a very simple text editor, similar in concept to the ed text editor which was once the standard Unix line editor. Our editor, which I will call “eddie“, does not have anywhere near the complexity or feature set of even the ed text editor, yet it will serve adequately as an example. Indeed, it is not really accurate to compare our text editor with ed; it is closer to the functionality of the cat program called with the > operator, which redirects its output to a file.

#include <stdio.h>
#include <string.h>
#define MAXCHARS 81

int main(int argc, char *argv[])
{
    FILE *ofp;
    char line[MAXCHARS];

    if (argc != 2) {
        puts("Usage: eddie [filename]");
        return -1;
    } else if ((ofp = fopen(argv[1], "w")) == NULL) {
        puts("Error: Output file invalid");
        return -1;
    } else {
        while ((fgets(line, MAXCHARS, stdin)) != NULL && strcmp(line, ".\n") != 0) {
            fputs(line, ofp);
        }
        fclose(ofp);
    }
    return 0;
}

Note the strcmp() comparison made in the while loop. A similar comparison is made in the editing mode of the actual ed text editor, checking if the string being entered is a full stop followed by a newline character. This combination of characters is relatively rare in text editing, whether it be in writing source code or in writing human-language texts, and serves as an adequate delimiter to mark the end of a text file.

Now that the file string functions have been demonstrated, we can move onto formatted input. I mentioned above that one of the limitations of the fgets() and fputs() functions was that they could only work with ASCII text input. Often, we want to take in the contents of a file in an integer or floating-point form, allowing us to create the likes of database programs.

This is where the fprintf() and fscanf() functions come into play. As the names suggest, these functions are equivalent to the printf() and scanf() functions used for the standard output and input streams. They allow us to address incoming or outgoing data in numerical (and Boolean, in C99) form, as well as ASCII characters. This gives us a greater deal of flexibility, particularly when it comes to data processing applications and the like.

The following functions from a Little Man Simulator whose source code is available under the GNU General Public License demonstrate the loading of a file into memory and saving it to a file. While these functions are regrettably not particularly illuminating outside of their original context, they do adequately demonstrate the use of the fscanf() and fprintf() functions.

/* Function: load_file
Prompts the user to enter a filename, then loads the contents of this file
into the program as instructions/data contents of the mailbox array.
Arguments: mailbox[], an array of short integers.
Returns: a short integer equal to -1 or 0. */
short load_file (short mailbox[])
{
    FILE *fp;
    short i = 0;
    int digits;
    char filename[FILE_LENGTH];

    printf("Enter a valid filename: ");
    scanf("%s", filename);

    /* If file won't open, i.e. fp == NULL, exit with error state; otherwise,
    open file and use fscanf to enter the digits into the mailboxes. */
    if ((fp = fopen(filename, "r")) == NULL)
        return -1;
    else {
        while (i < MAILBOXES && fscanf(fp, "%d", &digits) == 1) {
            mailbox[i] = digits;
            ++i;
        }
    }
    fclose(fp);
    return 0;
}

In this first function, a filename is obtained, which is used for the operation of the fopen() function. The contents of the file, which contains a set of integers which have no greater than three significant digits, are taken and each integer is placed into a variable named “digits”, before being placed into a corresponding mailbox slot. This continues until the i variable reaches the number of mailboxes (typically 100 in a standard Little Man implementation) or until the end of the file is reached.

/* Function: save_file
Prompts the user to enter a filename, then saves the instructions/data
contents of the mailbox array as contents of this file.
Arguments: mailbox[], an array of short integers.
Returns: a short integer equal to -1 or 0. */
short save_file (short mailbox[])
{
    FILE *fp;
    short i;
    char filename[FILE_LENGTH];

    printf("Enter a valid filename: ");
    scanf("%s", filename);

    /* If file won't open, i.e. fp == NULL, exit with error state; otherwise,
    open file and save the contents of memory to it. */
    if ((fp = fopen(filename, "w+")) == NULL)
        return -1;
    else {
        for (i = 0; i < MAILBOXES; i++) {
            fprintf(fp, "%d\n", mailbox[i]);
        }
    }
    fclose(fp);
    return 0;
}

This corresponding function takes the contents of memory and for each mailbox slot, prints an integer of at most three significant digits to the file specified by the user. Empty mailbox slots are printed as zeroes to the output file.

It should be noted that all of these functions only operate on ASCII text files, even with the formatted input. Binary file input/output will be explained subsequently, but the functions are contained within <stdio.h> just as with the ASCII file input/output functions.