Historical Operating Systems: Version 7 Unix

The story of Unix is one of the most interesting in computing history, a story of how a little pet project from one computer researcher in AT&T’s Bell Labs spread to become one of the pillars on which modern computer systems are built. All stories, of course, must begin somewhere, and the story of Unix begins in 1969, shortly after Bell Labs dropped out of the Multics project, fearing the “second-system effect” that had caused the likes of IBM’s OS/360 series of operating systems to be delivered late and bloated. In the wake of this, one of the Multics researchers, Ken Thompson, decided to port a game that he had written for the Multics system to something more cost-effective than the General Electric GE-645 mainframe that the game originally ran on.

In the process of porting the game, Space Travel, to a little-used PDP-7 minicomputer belonging to Bell Labs, Thompson and another former Multics researcher, Dennis Ritchie, ended up implementing an operating system which took influence from Multics while trying to avoid some of its pitfalls. This was the first version of what was originally called Unics, named by Brian Kernighan as an allusion to Multics; it featured some very modern features for an operating system of that era, let alone one implemented on a cheap minicomputer. Among the features of the initial Unics system were a fully user-accessible pre-emptive multitasking system, a hierarchical file system and a command-line interpreter. As research into Unics continued, more features were included, making it into a truly multi-purpose operating system with multi-user capability, and Unics was renamed Unix.

Such features were possible on a computer with limited resources because of a certain minimalism in the operating system design, which relied on the modularity of several small utility programs to provide capabilities then uncommon even in mainframe operating systems. AT&T agreed to allow Thompson, Ritchie and the other Unix researchers to port the operating system to a more powerful PDP-11/20 minicomputer in exchange for the development of a typesetting system for the Unix system; it was on this platform that the operating system was rewritten in Ritchie’s C programming language, thus becoming one of the first operating systems to be written in a high-level language. C owed – and still owes – its success to much the same principles as Unix, namely a minimalistic, modular flexibility that belies its simplicity and allows complex techniques to be performed with simple components.

Under the terms of a consent decree settling an anti-trust lawsuit against the Bell System, AT&T was prohibited from commercialising any computer technology it developed; in fact, it was obliged to license its technology to any organisation which wished to use it. Therefore, Unix was distributed to universities, commercial firms and the United States government under a licence which included the Unix source code along with the binaries; this gave many computer science students a look at the innards of an operating system in a way that was impossible with many other operating systems. The implementation of Unix in a high-level language like C allowed it to be reimplemented on systems other than the original PDP-11, a demonstration of portability formerly unknown in the world of computing. The availability of the source code also allowed Unix to be readily studied, modified and extended, leading to the continuing development of new versions even as AT&T was finally allowed to commercialise it in 1982. In 1983, both Ken Thompson and Dennis Ritchie received the ACM Turing Award, the highest honour in computer science, for their joint development of Unix and of C, an award well-deserved for an operating system to which many modern operating systems, such as Mac OS X, Linux and Solaris, owe their lineage.

Version 7 Unix, developed in 1979, was the last version of Research Unix to see wide distribution; its influence is still felt in all Unix and Unix-like operating systems today. Version 7 was the first version of Unix to include the Bourne shell, which succeeded the Thompson shell found in previous versions and allowed greater programmability in a manner resembling ALGOL 68; it also included several new tools such as awk, yacc, Make and the first instance of the malloc() routine now included in the C standard library. All of this made for a powerful programming environment by the standards of the late 1970s.

Back in 2002, Caldera International released all of the Unix versions up to Version 7 and its 32-bit VAX port, Unix 32V, under a free software licence allowing free use of all of the source code, as well as distributing the original disc images. As I always explore the environments of any operating system I write an article about, I used one of these disc images with a PDP-11/45 simulator on SIMH with 256KB of memory. The first thing I noticed when I booted the simulator into Version 7 Unix was how usable it was by modern standards. OK, the shell lacks any of the modern trappings like command history, aliases or even a display of the current directory on the command prompt (that’s what the pwd command is for!), the text editor is the infamously terse and cryptic ed and the C compiler uses the historical, pre-ISO standard K&R dialect of C, but the operating system still shares enough features with a modern Linux or Unix command line for me to use my previous knowledge to reasonable effect.

[Screenshot: Version 7 Unix source code]

Version 7 Unix – back when programmers could do more with 256KB of memory than some modern programmers can do with 4 gigabytes.

The basic utilities of the early Research Unix versions did seem to require you to be a programmer of some sort to get any real use out of them; programming tools on the disc images of Version 7 Unix include compilers for C and FORTRAN 77, an interpreter for a variant of BASIC, the Ratfor preprocessor for FORTRAN and an assembler for native code. yacc, lex and other additional programming tools round out the Unix programming environment.

Editing your source files requires the use of ed, a simple line editor which can still be found in modern Unix and Linux systems, but which is seldom used, having been displaced by the likes of vim, GNU Emacs and GNU nano. The terse, cryptic syntax of ed was once infamous; almost all commands are entered using a single alphanumeric character for the command, plus some numbers and symbols as arguments, while the editor itself has two modes, similar to vi, except that there is no way of telling them apart by sight. Like many of the early Unix utility programs, ed was designed for the limitations of teletypes; in this case, it really shows.
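
To give a flavour of just how terse ed is, here is roughly what a minimal session might look like, typing in and compiling a small C program; this is a sketch rather than an exact transcript, and the number printed by the w command (the byte count of the file written) will vary with the contents:

$ ed
a
main()
{
	printf("hello, world\n");
}
.
w hello.c
38
q
$ cc hello.c
$ a.out
hello, world

Everything between the a (append) command and the lone full stop is inserted into the buffer; w writes the buffer out and q quits, with no prompt at any stage to tell you which mode you are in.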

As the expansion of Unix was justified partly by the promise of a typesetting system, it should be no surprise that some of the other tools on Unix were developed with typesetting in mind. The likes of troff and nroff were designed for typesetting on a C/A/T phototypesetter, a machine which allowed typesetting without expensive equipment like Monotype or Linotype hot metal typesetters. By 1979, the C/A/T typesetter was becoming obsolete, but Brian Kernighan had not yet completed his device-independent version of troff by the time that Version 7 Unix was released; the version of troff included in Version 7 Unix was one written in C by troff's original author, Joe Ossanna.

Not all of Version 7 Unix’s programs were serious in nature, just as not all of a modern desktop operating system’s programs are serious. As Unix was originally designed as an environment for porting a game onto a new computer, it is to be expected that Unix has a few games on it. The games included a chess game, a game of Hangman and the famous Hunt the Wumpus – with the source code full of goto statements!

[Screenshot: Hunt the Wumpus]

Killed the Wumpus on the first try!

A fair amount of the source code for these programs is available as-is on the disc image I used, including a lot of the utility programs, a few of the games and the mathematical libraries. Comparing these bits of source code with the likes of the GNU Coreutils, the modern descendants of the old Unix programs, one notices that the Version 7 Unix utilities are a lot more sparse – although one might argue that they are more elegant – than the GNU utilities. The GNU echo utility in version 8.15 of the Coreutils is 275 lines long and covers circumstances such as interpreting hexadecimal escape sequences; the Version 7 Unix echo command is barely 20 lines long and has a single flag controlling whether it should print a newline at the end of its output. One may argue that the GNU echo command is far more flexible, and it is, but one might also argue that the Version 7 Unix echo command more closely resembles the original intent of Unix. Such arguments begin “holy wars”, though, and as I don’t really have a strong enough grasp on the utility of such commands to truly judge them, I’ll leave the argument there.

What is clear, though, is that Version 7 Unix looks modern and familiar enough to clearly be the ancestor of many of today’s operating systems. It may not have a flashy graphical user interface like modern Unix and Linux variants, but when you get down to the guts of these modern operating systems, you find something that looks very like the Research Unix systems that the likes of Ken Thompson, Dennis Ritchie and Brian Kernighan were programming on over thirty years ago. The code is different, especially in the GNU-derived operating systems, where it was realised that to replace Unix, you must first replicate it perfectly, but the utilities have the same style of usage.

Even more of an influence on the world of computing was the C programming language that underpins not only Version 7 Unix, but almost every serious operating system still in use; by underpinning Unix, C proved itself a serious contender in the systems programming field at a time when operating systems implemented in high-level languages were limited to mainframes. As system resources have grown, C’s minimalistic modularity and flexibility have proven equal to the task of scaling up to modern computer systems. There truly is no better memorial for Dennis Ritchie than the language he invented back in 1972, and there will be no better memorial for Ken Thompson than the operating system which changed the world of computing utterly.

Functions and Pointers in C: A Brief Guide

Author’s Note: Yet another piece of filler. Basically, I’ve been doing a lot of programming recently, and I wrote this guide as an aid for some people. It’s not particularly long, nor does it contain much insight, but it represents the most substantial piece of writing I could conjure forth this week.

Any C program that you see will contain at least one function – main() – and, unless it is very simple, more than one. Functions are a way in C to segment a program into several discrete pieces. In almost all cases, a C program could be written entirely in main() with the only calls to external functions being ones in the C standard library, but the program would be a complicated mess, difficult to read and even more difficult to modify, with little hope of reusing any of its components.

A C function is written in two parts: the function declaration, a statement of intent as to the components of the function; and the function definition, which contains the body of the function. The declaration takes the form of a function prototype, which is usually placed before main(), although it may also legally appear inside any function that calls it.

A function declaration looks something like this:

int square(int i);

Note the semicolon at the end; the declaration looks similar to declaring a variable. This declares a function named square() which takes an integer argument and returns an integer value. Any use of square() which does not agree with this function declaration is an error.
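
For instance, with this declaration in scope, a call with the wrong number of arguments will be rejected at compile time:

square(3, 4); /* error: square() is declared to take one argument, not two */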

Once we have the function declaration, we can write the function definition, which looks like this:

int square(int n)
{
    return n * n;
}

At this point, we can further discuss the arguments and return type of this function. The argument variable, n, is internal to this function; it cannot be accessed by any other function and is separate from any other variables which may happen to be called “n” in other functions. If the function is called with the value of a variable from another function, the argument is not the same as that variable; it is another variable entirely which just happens to have been initialised with that variable’s value.

This can be demonstrated with the following simple program:

#include <stdio.h>

int square(int i); /* declaration, so that main() may call square() */

int main(void)
{
    int a = 12;
    printf("%d\n", a);
    printf("%d\n", square(a));
    printf("%d\n", a);
    return 0;
}

int square(int i)
{
    i = 5;
    return i * i;
}

This program has the result:

12
25
12

Even though square() was called with a variable, the function has no access to this variable during its operation. The value of the variable, a, is taken and put into the variable, i, while the program is executing the square function. Changes to i do not affect the corresponding variable in main().

If one actually wants to change the variable, a, in main() by way of the square function, there are two ways to achieve this. The first is to reassign the variable, a, with the return value of square() in main(), as such:

int main(void)
{
    ...
    a = square(a);
    ...
}

The second is to use a pointer. Before I address this, the question must be asked: why use pointers in the first place for anything? Firstly, their judicious use can make a program more compact and quicker at the machine-code level, particularly with array referencing. Secondly, they are necessary for dynamic memory allocation (i.e. malloc() and its brethren). Thirdly, they are necessary to simulate call by reference, by passing a variable’s address into a function. With that in mind, we’ll continue to their use in functions.

Using a pointer in the above program would necessitate a few changes to the existing function declaration, the definition and the call in main().

#include <stdio.h>

int square(int *i); /* Notice the use of the asterisk beside i */

int main(void)
{
    int a = 12;
    printf("%d\n", a);
    /* & is the address-of operator; it passes the address of a to
     * square() rather than its value.
     */
    printf("%d\n", square(&a));
    printf("%d\n", a);
    return 0;
}

int square(int *i)
{
    /* The asterisk is the dereference operator; *i reads or changes
     * the value stored at the address held by the pointer i.
     */
    *i = 5;
    return (*i) * (*i);
}

Using this notation, the result is:

12
25
5

The reason for this is that unlike the variable i, which stored a copy of the value passed into the function, the pointer i, dereferenced within the function as *i, stores the address of an already existing variable. The variable i occupied a separate address from the variable a in main(); the pointer i likewise occupies its own address, but because it contains the address of a, it can be used to perform operations on that variable from another function.

Another use of pointers is in reference to arrays. Indeed, arrays and pointers are very closely linked: an array’s elements are contiguous in memory, and the array’s name behaves in most expressions like a pointer to its first element. These two expressions are therefore equivalent:

array[1]
*(array+1)

The pointer notation was traditionally quicker than the array notation, as it mapped more directly onto the machine code the compiler produced, but it is often more difficult to understand; with modern compilers the two usually compile to identical code, so using array notation won’t normally cost the program any speed.
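
A minimal sketch demonstrating the equivalence: both expressions below refer to the same element of the array.

#include <stdio.h>

int main(void)
{
    int array[3] = { 10, 20, 30 };

    printf("%d\n", array[1]);   /* array notation: prints 20 */
    printf("%d\n", *(array+1)); /* pointer notation: prints 20 */
    return 0;
}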

One of the other uses of pointers is in dynamic memory allocation using the likes of malloc(). A call to malloc() looks something like this:

/* Use malloc() to create an array of 1000 int variables;
 * malloc() is declared in <stdlib.h>.
 */

int *array = (int *) malloc(1000 * sizeof(int));

The syntax of malloc() looks unfriendly, particularly due to the cast to a typed pointer placed before it. This is because malloc() returns something called a pointer to void (i.e. void *), a generic pointer which can be converted to any typed object pointer, and to which any typed object pointer can be converted. In C this conversion happens implicitly, so the cast is not strictly required (it is in C++); writing it out simply makes the intent explicit. The generic void * is also the safe intermediary when the type of a pointer must be changed: casting directly between unrelated pointer types is permitted by the compiler, but dereferencing the result is only safe when the underlying types are compatible.
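
To round this off, here is a short sketch of how a typical malloc() call fits into a complete program, with the error check and the matching free() that real code should include:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Allocate an array of 1000 int variables; malloc() returns
     * NULL if the memory cannot be provided.
     */
    int *array = (int *) malloc(1000 * sizeof(int));
    if (array == NULL) {
        fprintf(stderr, "out of memory\n");
        return 1;
    }

    array[0] = 42;             /* use it like an ordinary array */
    printf("%d\n", array[0]);

    free(array);               /* release the memory when finished */
    return 0;
}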