"CookBook" toplevel

At present I intend to cover the six languages (PHP, perl, CMU Common Lisp, C, C++, and java) which were demonstrated at three steps beyond CGI Hello World (proper decoding of contents of HTML form and consequent processing and output appropriate to the specific contents of the submitted form).

Table of contents:

Chapter 1: Data types part 1 (commonly used as literals)
Chapter 2: Syntax for program structure
- Function/method/infix calls/operations (including some essentials)
- Blocks (including declarations and go-to)
- Function/method definitions (including main)
- Overall structure of a program
- Decisions (branches) and auto-repeated units (loops)
Chapter 3: Data-processing operators/functions/methods for various data types (general plan here / skeleton started 2007.Feb.01 here, most c/c++ operators included as of Feb.04, all done Feb.12 and starting with c libraries, ctype.h halfway done Feb.13)
Chapter 4: Control mechanisms and programming paradigms (may eventually include: decisions & loops, mapping across collections, enumerations/iterators & continuations/streams & pipes & filters, generic functions & operator/function overloading, instance methods, classes & interfaces & inheritance, RPC & RMI, etc.) (started 2007.Feb.08)

Note that I intend to cover the following classes of software features:

Internal data-processing operations that would be usable with virtually any type of application
Use of standard I/O in console applications
WebServer (a.k.a. "ServerSide Web") applications, running under CGI or as PHP script, for generating pure text-only HTML
Distributed processing, including threads, controlling sub-processes, sockets/streams

I do not plan to cover any of the following, simply because I have no convenient access to any development environment for any of these:

GUI applications
Any embedding of non-textual elements such as images or sound in Web pages
Other ServerSide Web platforms such as JSP, ASP, J2EE
HTML output specific to any particular ClientSide technology such as cellphones with WAP, any Web browser other than lynx
ClientSide Web applications such as JavaScript or applets or plug-ins

Other sources:

Anything I haven't yet included might be found here in much terser form but still perhaps useful if you then use Google to find the documentation for the keyword cited there.
Also, the original perl cookbook sourcecode and partial translations to several other languages can be found here, but I noticed some mistakes in my casual browsing of the CommonLisp section, so you shouldn't trust this as accurate.
This site contains complete applications in up to appx. 40 different languages, some of which test basic language support such as threads or graphics, others of which demonstrate the programmer's ability to express a complete algorithm or application in the language.
This site shows how to write 'Hello World' and a few other simple applications, in up to nearly 130 different languages.

Caveat: For the most part the version of C that I'm describing is the standard that ISO adopted in 1990, minus anything that didn't survive the new 1999 standard. I'll try to specially flag any library functions or other features that are new with the new standard ISO adopted in 1999. For Common Lisp I'm pretty much restricting my descriptions to Guy Steele's 1984 book "Common Lisp The Language", minus any functions which have been removed since then, with nothing new since then included, although if I become aware of any additions that are especially useful for the purpose of this document I might include them on an ad hoc basis. For Java, I'm generally describing version 1.2.2 because that's the only version available to me, but some enhancements of versions 1.3 and 1.4 might slip in because that's the only online documentation available (except the new version 1.5 which is so radically different as to be beyond any reason). As for C++, I'm generally relying on the textbook "Learning C++, third edition" by Eric Nagler, which implies in the preface that it conforms to "ISO/IEC 14882 ...". As for perl and PHP, suggestions as to any official standard are welcome.

An integer, in mathematics, is a whole number, such as the counting numbers 1, 2, 3, etc., the additive identity 0 (zero), and the additive inverses of the counting numbers -1, -2, -3, etc. Most languages support some subset of these mathematical integers as data within programs written in those languages. I'll discuss some of the peculiarities of these subsets later.

A computer program is usually written using a programming language, which is a way of representing data values (called literals) and operations upon those data values (called algorithms) within a textual syntax, usually US-ASCII although some languages also support expression of programs using other character sets. All the examples presented here, including programs synthesized automatically, will use US-ASCII. Below are Web pages containing or pointing at samples of simple programs in the languages I'll be discussing in this cookbook. Note that PHP is specifically designed as an embedded language in Web pages to provide dynamic content (similar to JSP or ASP), and AFAIK cannot be used in any other way (except by emulating a Web environment somehow). All the other languages can be used either to generate Web content or as standalone applications in a standard-input/output environment such as Unix/Linux or in a DOS window. Java also includes built-in support for GUI applications (which won't be discussed in this cookbook), and the other languages (except PHP) can be used to program GUI applications with addition of add-on libraries.

If you are a beginner to computer programming, all those samples of programs will probably look totally mysterious. They are here just to give you a basic idea that there's this syntax called "source" (for a particular programming language, different syntax for each different programming language), and the file itself is nothing more than an ordinary text file (not a word-processing document -- Use NotePad or other text editor, not Word, to create and edit such source files!), which contains a mixture of words and/or numbers and/or punctuation, and to show you that the syntax varies between different languages, some easier for you to understand than others, depending on your past experience. But you'll note each of these source listings contains some version of the phrase "Hello, World!", because the task each program does is simply to type out (on stdout), or display (on your Web browser), that phrase, when the program is invoked.

Later when you understand one or more of these programming languages, you might come back here and take another look, and those samples will suddenly look trivial rather than mysterious.

So now that you know that integers are mathematical objects, and that source programs are files of text, which are *not* normally treated as if mathematical integers nor any other kind of number (although each character is coded internally as a 7-bit binary integer, that's what US-ASCII is after all, but that internal character code is *not* the integer you want in your program most of the time), how do you include integers in your source program, not the internal codes for each character in US-ASCII, but arbitrary integers you really want??

That's what literals are, a way to specify data values, such as integers, directly in your source program, in a user-friendly way. For example, if you want the numeric value 42 (forty-two) in your program, you simply type the '4' (four) digit followed by the '2' (two) digit, with no other characters between them, i.e. you type exactly "42" (without the quotes) in your program. Note that you must have some kind of separation between such a literal and any other alphanumeric text in your program, as well as certain punctuation characters such as "." (called "period" in American English, "full-stop" in British English). For example, if you say "42 86" (without quotes) that's the number 42 followed by the number 86 (which is valid syntax in some languages, such as lisp), but if you run it all together as "4286" (without quotes), that's a single number, four thousand two hundred and eighty six, and if you run it together with a word such as "print42" (without quotes) that's something entirely different which we won't discuss until later, and you can probably guess that "42.86" (without quotes) won't give you two separate integers either.

Note that I had to keep saying "without quotes" because if you put quotes around that text you'll get something entirely different in your program, a string, which we'll discuss later.
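To make the difference between the characters '4' and '2' and the integer 42 concrete, here is a minimal sketch in c of the conversion that every compiler or interpreter has to perform on a run of digits. The name digits_to_int is my own invention for illustration (it is not a standard library function), and the sketch assumes the value is small enough to fit in an int.

```c
#include <stddef.h>

/* Hypothetical helper (not a standard function): convert a run of
   decimal digit characters into the integer value that the run denotes.
   Stops at the first character that is not a digit.
   Assumes the value fits in an int. */
int digits_to_int(const char *text)
{
    int value = 0;
    size_t i;

    for (i = 0; text[i] >= '0' && text[i] <= '9'; i++) {
        /* text[i] is a character code (52 for '4' in US-ASCII);
           subtracting '0' turns it into the digit's numeric value. */
        value = value * 10 + (text[i] - '0');
    }
    return value;
}
```

Given the text 42 86 this stops at the space and yields 42, given 4286 it yields the single number 4286, and given print42 it yields 0 because the text doesn't start with a digit at all.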

I mentioned earlier that different programming languages allow different subsets of the integers. Now I'll discuss these practical matters more specifically for each language:

lisp has the most general implementation of integers. You can use any sequence of numeric decimal digits ('0' thru '9', no hexadecimal extensions such as 'A' thru 'F'), no matter how long, without any intervening non-digit characters (be careful not to accidentally include any line breaks), and lisp will accept the whole sequence as a single literal integer, so long as there's enough memory to store both the original string and the consequent big integer (and perhaps some additional temporary storage) simultaneously during the conversion process from textual source input to internal big-integer representation. The exactly correct integer value will be stored, not some rounded-off or truncated value. You can then use that internal integer value in the same way you'd use any other integer value, such as in arithmetic calculations. So long as your operating system allows lines in text files to be of unlimited length, you can include such literal integer values in any lisp source file. So long as your operating system allows lines on keyboard input to be of unlimited length (and handles visual line break in some user-friendly manner without inserting line-breaks into your input as seen by the lisp read-eval-print interpreter), you can manually enter such very-long literal integers directly from the keyboard.

c has the most restrictive subset of integers, namely only specific sizes (numbers of bits, in two's-complement representation on virtually all current machines) that will fit in various sizes of machine words. No check is necessarily made if your calculations overflow this limit, and even if any such check is made, there's no way within the confines of the language to express the mathematically correct value, so aborting the program may be the best the runtime system can do (c has no exceptions to throw). But it might simply wrap around, usually to a value of the opposite sign (positive vs. negative) from the correct sign. For signed integers the standard calls this undefined behaviour, so beware!! If you try to use a literal which exceeds the size limit, the compiler may silently wrap around, or may issue a compiler diagnostic. Within this extreme limit, literal integer values have the same syntax as in lisp. There are add-on libraries available which implement big integers, but there's no way to express such integer values as literals. You have to build such a large integer by calculating it from smaller integers or by conversion from a string (see later).
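Rather than guessing the size limit, you can ask the compiler. What follows is only a sketch: the function names are mine, not standard ones, and it uses the long long type, which is new with the 1999 standard. The limits INT_MIN, INT_MAX and UINT_MAX do come from the standard header limits.h, but their actual values vary from system to system.

```c
#include <limits.h>

/* Hypothetical helper: report whether a wide value is representable as
   a plain int on this particular compiler. */
int fits_in_int(long long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}

/* Unsigned arithmetic is defined to wrap around modulo UINT_MAX + 1,
   so this addition is safe to demonstrate. The same overflow on a
   signed int would be undefined behaviour. */
unsigned int wrapping_add(unsigned int a, unsigned int b)
{
    return a + b;
}
```

For instance wrapping_add(UINT_MAX, 1u) gives 0, which is the well-defined unsigned cousin of the wrap-around described above.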

c++ has essentially the same restrictions as c. As with c, support for big integers comes only from add-on libraries (the standard c++ library doesn't include a big-integer type), and such values can't be expressed as literals. You have to build such a large integer by calculating it from smaller integers or by conversion from a string (see later).

java has the same restrictions as c for built-in data types and literals to represent them. Java has a well-supported and well-documented class for big integers, which is shipped with every java installation, which has a well-defined constructor for converting a string representation of a mathematical integer into a BigInteger object. Of course such objects can't be used as literals. But on the other hand, an expression that invokes a BigInteger constructor can be used most places a literal might be used, so it's not really much of a handicap that they aren't true literals directly as BigIntegers. Note however that the BigInteger class is not available in the environment of most java applets within Web pages, because they aren't considered part of the core language.

perl doesn't actually have an integer data type. Instead it has a generic numeric data type which slides between integers and floats. Specifically, experiments I performed show that integers from 0 thru 999999999999999 are treated as integers (both internally and on printing), while integers from 1000000000000000 beyond are treated as floats. But all the way up to 2**53 = 9007199254740992 the floating point representation has enough significant bits to hold the entire integer without loss of information, it's just that you can't see the low-order digit when such a number is printed. Beyond that point, only the 53 high-order bits are retained internally, the remaining low-order bits discarded (with last kept bit as-is or bumped up by 1 depending on roundoff). On input, up to 250 digits are allowed, converting to internal floats, with 53 significant bits. But if you enter 251 numeric characters in a row, intending an integer literal with such a large value, instead the input parser aborts with "Number too long". But you can compute a floating point value a short ways beyond that point. The syntax for input, except for the 250-digit limit in literals, is the same as lisp.
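The 2**53 boundary isn't special to perl. It comes from the floating-point format, so it can be checked from c too. This sketch assumes IEEE 754 doubles with 53 significant bits (which is what nearly all current hardware provides) and assumes perl's floats are ordinary c doubles underneath. The function name is hypothetical, and long long is a 1999-standard type.

```c
/* Hypothetical helper: does an integer survive a round trip through a
   double? Assumes IEEE 754 doubles with a 53-bit significand.
   Only meaningful for values well inside the range of long long. */
int survives_double(long long n)
{
    /* volatile forces the value to be stored as a real double rather
       than kept in a wider register */
    volatile double as_float = (double)n;

    return (long long)as_float == n;
}
```

With this, 9007199254740992 (which is 2**53) survives, but 9007199254740993 comes back as its even neighbour, because the lowest bit had to be discarded.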

PHP implements integers much the same way that perl does. But there's no arbitrary restriction on input to 250 digits in a literal. Instead you can make literals as long as the internal representation will support. If you input or compute a value that exceeds a 308-digit integer, it gets changed to infinity instead. But it has the same problem representing exact integers as perl, for the same convert-to-float reason.
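The switch to infinity can also be imitated in c, on the assumption that PHP's floats are ordinary IEEE 754 doubles, whose largest finite value (DBL_MAX in float.h) is roughly 1.8 times ten to the 308th power. Both function names below are made up for this sketch, which builds a long digit string and parses it with the standard strtod function.

```c
#include <float.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build the text "1" followed by `zeros` zero
   digits and parse it as a double. `zeros` must be between 0 and 400
   so that the text fits the buffer; otherwise 0.0 is returned. */
double one_followed_by_zeros(int zeros)
{
    char digits[402];

    if (zeros < 0 || zeros > 400) {
        return 0.0;
    }
    digits[0] = '1';
    memset(digits + 1, '0', (size_t)zeros);
    digits[zeros + 1] = '\0';
    return strtod(digits, NULL);
}

/* A value larger than DBL_MAX can only be infinity. */
int is_infinite(double value)
{
    return value > DBL_MAX;
}
```

On such a system a 1 followed by 308 zeros is still finite, while a 1 followed by 309 zeros comes back as infinity.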

So now that we have an idea how to enter integer values as literals in program source in six languages, all basically the same syntax (contiguous sequence of decimal digits, separated from other text by some sort of delimiter such as whitespace), how do we know our literal really got into the computer as an integer value? By printing it back out!

In lisp, printing out what you entered is no problem. That's what happens by default in an interactive environment. (And it even happens, inappropriately, in a CGI environment, unless you take special measures to prevent it.) You start up lisp, and it waits for you to type a line of input, then it converts what you typed into an internal value, performs what's called "evaluation" on that value (which doesn't do anything if what you entered is a literal, which is why they are called "literals", because the result is literally the same as what you entered), then prints out that resultant value (your literal back out at you). This is called the read-eval-print loop, because over and over endlessly it reads your input, evaluates it, and prints out the result.

The trick with lisp is to get it started in the first place. If you're on Unix, you simply type the name of the runnable lisp interpreter, which might be just "lisp", or might be a more specific vendor name such as "cmucl" or "sbcl" etc. If you're on RedHat Linux, you need to first open a terminal window, which you can probably get from the START menu, or from a special button in the bar at the bottom of the screen, then it's just like on Unix. If you're on Microsoft Windows, you need to open a DOS window, which on modern versions of Windows you can get from the START menu by selecting COMMAND or CMD or something like that, then it's mostly like on Unix or Linux. If you're using an old Macintosh with Allegro Common Lisp, you double-click on the icon, which puts you in a two-window mode, with a read-eval-print loop in the bottom window and an emacs-like editor in the upper window. If you're using a newer Macintosh with System 10 which is like Linux, I don't know, I've never used one. If somebody has such a Mac with some useful version of lisp with read-eval-print installed, please tell me so that I can include that info here.

Now once you somehow get the lisp "interpreter" started, it'll probably print out some banner identifying what version it is, then it'll enter the read-eval-print loop. So you just type an integer literal on one line, for example 42, and then press ENTER or RETURN, whatever the key is on your computer for finishing lines of input, and lisp will print that number 42 on a new line and then wait for further input. So suppose you enter two numbers, with a space between them, such as "42 69" (without the quotes), on a single line of input? Lisp will read the 42, and print it out on a new line. Then lisp will read the 69 and print it out on another new line. Then lisp will see you haven't typed any more input, so lisp will wait for new input. If you have access to lisp, you should try that right now.
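If you don't have lisp handy, the behaviour just described can be imitated in c, at least for integer literals. This is only a toy sketch of one pass through a read-eval-print loop, not how any real lisp is implemented. The function name echo_integers is hypothetical, the results are collected in a buffer instead of being printed to the screen, and snprintf is a 1999-standard library function.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper imitating one pass of a read-eval-print loop that
   only understands integer literals: read each whitespace-separated
   integer from `line` and append it to `out` on a line of its own.
   Returns how many integers were echoed. */
int echo_integers(const char *line, char *out, size_t out_size)
{
    int count = 0;
    size_t used = 0;
    const char *cursor = line;
    char *end;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';
    for (;;) {
        long value = strtol(cursor, &end, 10);   /* "read" */
        int written;

        if (end == cursor) {
            break;                               /* nothing more to read */
        }
        /* "eval" of a literal is the value itself, so just "print" it */
        written = snprintf(out + used, out_size - used, "%ld\n", value);
        if (written < 0 || (size_t)written >= out_size - used) {
            out[used] = '\0';                    /* drop a partial line */
            break;
        }
        used += (size_t)written;
        count++;
        cursor = end;
    }
    return count;
}
```

Called on the line 42 69, it fills the buffer with 42 on one line and 69 on the next, and reports that it echoed two integers.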

With perl, you can likewise try single inputs interactively, but you must explicitly tell perl to print the value. There's no read-eval-print that does that automatically. Also you can do it directly from the command line, rather than first starting up an interpreter then talking to it. You type the name of the perl interpreter (usually just "perl"), then a space, then the flag -e, then another space, then an apostrophe (single quote), then one line of perl source followed by a semicolon, then another apostrophe (single quote), and finally you press ENTER or RETURN. For example, here's how to verify that the integer literal 42 got in OK:

But note especially: perl doesn't output a newline after printing the number 42. It just leaves the cursor hanging after the number, and on Unix at least the shell prompt for next input is run right after it on the same line. More on that shortly below.

Perl allows you to put more than one perl statement on a single line of input. Just be sure to have a semicolon after each individual statement. For example, to input the literal 42 and print it, then input the literal 69 and print it, you can say this:

...Again, there's no automatic newline after each output, so the 42 and 69 run together with each other as well as with the following shell prompt. To force a line break (in the output) at any point, include this statement: print "\n";. For example, to print 42 on a line by itself, not run together with the next shell prompt, do this:

...and to print 42 on one line and 69 on another line, and move to yet another line for the next shell prompt, do this:

You can also put parentheses around the value to be printed. You might as well try this now, because using parens in this way will be necessary later on when printing the values of more complicated expressions. (It's also the syntax that is required in several other languages.)

For each of the next three languages (c c++ and java), you must create an actual source file containing an entire program, not just one line of sourcecode to be interpreted, then you must compile that program (into an executable with c or c++, into a bytecode classfile with java), and finally you must run that compiled program. You need to know how to do this on your system before you can do any experiments with literals, so I suggest you go learn how to do it now! See the simple hello world samples here, and check documentation on your particular system how to create source files using a text editor (such as NotePad), and how to compile them, and how to run them. As soon as you know how to do that for the hello world programs in whichever languages interest you, proceed below.

Here's a minimal c program to verify that the literal 42 got in as an integer. You don't need to understand anything about the source program except where the 42 sits in this template. This is only two lines long.

Try it! Copy those two lines to a text file, with name ending with ".c", such as test.c, then try compiling it just as you did with hello.c earlier, then try running it the same way. You should see 42 printed out, without any line break after it, so it's run into the next shell prompt.

...then compile that file. I tried that, but the compiler gave me the diagnostic "warning: decimal constant is so large that it is unsigned", and then when I tried running the compiled program, it printed the value 1084227585 instead of the correct mathematical value as would happen in lisp. You should try playing around with the size of such literals to see what the limit is with the version of c you have there.
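One plausible explanation for a mangled value like that (an assumption on my part, I haven't looked inside the compiler) is that only the low-order 32 bits of the too-long literal were kept. The hypothetical function below reduces a string of decimal digits in exactly that way, so you can predict what such a compiler would end up with for any literal you care to try.

```c
/* Hypothetical helper: the value of a decimal digit string reduced
   modulo 2**32, i.e. what is left if only the low-order 32 bits of a
   too-long literal are kept. Stops at the first non-digit. */
unsigned long low_32_bits_of_decimal(const char *digits)
{
    unsigned long value = 0;
    const unsigned long mask = 0xFFFFFFFFUL;

    for (; *digits >= '0' && *digits <= '9'; digits++) {
        /* unsigned arithmetic wraps harmlessly; the mask keeps 32 bits
           even where unsigned long is wider than that */
        value = (value * 10 + (unsigned long)(*digits - '0')) & mask;
    }
    return value;
}
```

For example the digits 4294967338 (which is 2**32 plus 42) reduce to plain 42.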

...Try compiling that as-is, and then running it. Then try changing the 42 to a really long literal. For example:

...When I tried that, I got two diagnostics: "integer constant out of range" and "warning: decimal integer constant is so large that it is unsigned". Also, they caused the compiler to not produce any compiled output file, so I can't "just try it" to see what mangled integer value it used as I could with the c program.

Here's an equivalent program in java. Note that we have to be careful to match the name of the source file with the name of the class it will compile to, or else we'll get all confused. I've named my source file "lit.java", the "lit" part from the literals we're demonstrating. Accordingly the name of the class has been set up as "lit":

The commands for compiling and running are standardized everywhere, so I'll tell you the commands instead of making you research them as you had to do for c and c++ programs.

As with the perl (without printing \n) and c and c++ programs, the number 42 is run together with the next shell prompt. I could have fixed that by using println instead of print, but I wanted to make the java program as nearly isomorphic to the c and c++ programs as possible. So anyway, what happens if we change the 42 to be a really long literal? For example

...When I tried that, I got the following error diagnostic: "Integer literal out of range. Decimal int literals must be in the range -2147483648 to 2147483647." That is considerably more informative than the c or c++ diagnostic, because it tells me precisely what range my literal failed to be in, so I don't have to experiment to try to guess that range. As with c++, the compiler failed to produce any compiled output, so I can't "just try it" as I could with c.

With PHP, you can't enter single lines of code interactively as with lisp and perl, nor can you compile and run standalone applications as with c c++ and java. Instead you need to set up a full Web page on a Web server which is enabled for PHP. Assuming you've already done this for the simple Hello World program, all you need to do now is modify that PHP file or create another one like it to demonstrate entering literals and then displaying the value so you can see it got in correctly. For the moment we won't bother trying to make this a proper HTML file. The bare minimum needed in the file is

After you have that installed on your Web site and working (it should display a screen with just the number 42 on it), I encourage you to edit the literal 42 to be other values, including numbers so large that they get converted to approximate floating-point values, so you'll get a feel for use of integer literals in PHP.

Let's return to perl again. Previously we used perl only in single-line-at-shell-prompt mode. It's also possible to store perl scripts in a file as if they were executable programs, and then run the perl interpreter on them. This allows programs that are more than just one line long. On Unix and Linux systems, the first line of the file should tell where to find the perl interpreter. For example on this Unix system where I'm working, that first line says:

...Then the rest of the file consists of lines of perl source program. For example the second line might say:

Once you have such a perl script file (let's say it's called try1.perl), you can invoke it in two ways: You can call up the perl interpreter explicitly and pass it the name of the perl script file:

...or you can invoke the script directly from the shell as if it were a shell script or an executable program:

Strings, as literal data if possible

The second data type we'll discuss is strings, which are finite sequences of characters. We'll discuss how to use literals to produce string values inside your programs. But wait, you may ask, the entire source file is already a sequence of characters. Why can't we just use that? Because the source file is parsed for syntax particular to the programming language, and what is produced is not one long string simply reciting your whole source program verbatim. Instead, as we saw in the previous section, the digits 4 and 2 in sequence represent the numeric value (integer) forty-two, not the character sequence (string).

So then how do you get sequences of characters from your source file treated just as sequences of characters, not as special syntax such as for numbers or commands to print or display? You put quote marks around the part you want quoted. In lisp c c++ and java, you use double quotes, for example "42" is the string of two characters. In perl and php you can use either kind of quote mark: single quotes, thus '42', give you exactly the characters you typed, while double quotes also work but additionally interpret things like \n and variable names inside the string.

But what if you want to include the quote mark inside a quoted string? You put a backslant immediately before it. For example, here's a string literal (in lisp c c++ or java) containing quote marks around a word: "She said \"hello\" to him."

But what if you want a backslant itself to be one of the characters inside a string? To do that, you use two backslants in sequence. For example, if you wanted to show how to write a fraction forwards and backwards, the string literal saying that might read "2/3 = 3\\2".

If you stick to quoting alphanumeric text, spaces between words, commas and periods and question marks as punctuation, and an occasional quotemark or backslant, that's all you need to know about the syntax of strings. But you have to be careful about a few other special characters that have special meanings inside strings, which are different for the various programming languages. For example, inside strings in c c++ and java, and inside double-quoted strings in perl and php (all languages except lisp), a backslant followed by the letter n is converted to a newline character. (Inside single-quoted strings in perl and php it stays as two separate characters.) In lisp, you can actually include newlines and other characters directly within the string-literal notation. (But that's not recommended except in format strings, see the format function which will eventually be described later in chapter 3.)
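One way to convince yourself that the backslants in the source aren't stored in the string is to count the characters that actually end up in it. Here is a small sketch in c using the string literals from the paragraphs above. The variable names and the stored_length function are just labels I made up, and strlen is the standard library function that does the counting.

```c
#include <string.h>

/* The string literals discussed in the text above. */
static const char QUOTED[]    = "She said \"hello\" to him.";
static const char BACKSLANT[] = "2/3 = 3\\2";
static const char NEWLINE[]   = "\n";

/* Number of characters actually stored for a string literal; the
   backslants used for escaping in the source are not counted. */
size_t stored_length(const char *text)
{
    return strlen(text);
}
```

The first string has 24 characters (two of them quote marks), the second has 9 (one of them a backslant), and the third has just 1, the newline.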

So now we print out strings to verify they got in correctly, just as we did with integers earlier. But there's one difference in lisp: There are two ways to print out strings, verbatim as the characters which are inside them, or converted back into string-literal syntax so that they can be read back in later as strings. When you're talking to the read-eval-print loop, the default is to print out as string literal syntax to be readable back in. In all the other languages, when you print (in c c++ java perl) or echo (in php), you get just the actual characters of the string printed, not the full syntax of the string literal.

So here's an interactive lisp session, where input to lisp reader is highlighted while output from lisp printer is plaintext:

...Hmm, that's not very revealing, except where two different strings were entered on the same line of input and lisp printed them back out on separate lines. As I said, the default is to print out results converted back to source format, in this case the same string literal syntax you typed in. You can't see what exactly the characters in the string are. But here's a trick, you won't understand until later, but you can just use it for now. After lisp has printed out a string in full string-literal format, say next:

...That converts the last output (which must have been a string) into a list of explicit characters one by one, so you can see each character explicitly. It even uses names for all-white characters such as Space, so you can be sure there was a space instead of a tab there. If you had a tab in your string, it'd show you that instead. Here's another trick you can use, without really understanding it for now, to force lisp to print just the exact characters in a string, without the extra syntax for string literals:

...After each princ input, the first line of output is from the princ, which prints the short way, just the characters in the string, while the second line of output is the normal printing of the read-eval-print loop which converts the string into literal syntax.
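The difference between the two ways of printing a string can be imitated in c, where printing always gives you the short way. The sketch below converts a string back into string-literal syntax, roughly what the read-eval-print loop does for its second line of output. The function name as_string_literal is hypothetical, and I'm assuming (as in Common Lisp) that quote marks and backslants are the only characters that need a backslant put in front of them.

```c
#include <stddef.h>

/* Hypothetical helper: write `text` into `out` the way a read-eval-print
   loop would show it, i.e. wrapped in double quotes with a backslant in
   front of every embedded quote mark or backslant.
   Returns the number of characters written (excluding the final NUL),
   or 0 if `out` is too small. */
size_t as_string_literal(const char *text, char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    if (out_size < 3) {
        return 0;                 /* no room for even "" plus the NUL */
    }
    out[used++] = '"';
    for (i = 0; text[i] != '\0'; i++) {
        /* worst case still needed: backslant + character +
           closing quote + NUL */
        if (used + 4 > out_size) {
            out[0] = '\0';
            return 0;
        }
        if (text[i] == '"' || text[i] == '\\') {
            out[used++] = '\\';
        }
        out[used++] = text[i];
    }
    out[used++] = '"';
    out[used] = '\0';
    return used;
}
```

So the five characters of hello come back as seven characters, the extra two being the surrounding quote marks.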

In c c++ java or perl, you may have noticed the string literals in the Hello World programs you ran earlier. Here are listings of condensed versions of those programs one after another:

By now you should be able to tell just by looking which language is which. Notice how \n is used inside each string to include a newline character. (We didn't do that earlier in the section on integer literals, we just let the 42 run together with the following shell prompt most of the time, because I didn't feel like confusing you by doing something you didn't yet understand if it wasn't really necessary. Although I did break that rule once when 42 and 69 ran together and I showed you a trick for breaking them onto separate lines. Now you should understand how that trick worked, by printing a string with a newline in it.)

So now that you know how to enter two kinds of literals into your programs, integers and strings, how about we write a program that combines the two in a slightly nontrivial way? We'll have a program tell a short story, using integer literals every place a number turns up, otherwise using string literals. First the complete program in c:

...Notice to output integers we had to use a trick, which you won't understand for a while. That's OK. Just think of that as a template into which you can put whatever string literals and integer literals you want to have read in then printed.
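For readers who want a peek behind the trick: in c, mixing strings and integers in output is normally done with the printf family of functions, where %d stands for the next integer and %s for the next string. The sketch below is not the story program itself. The sentence is one I made up purely for illustration, the function name is hypothetical, and I've used snprintf (a 1999-standard function) so the text lands in a buffer instead of on the screen.

```c
#include <stdio.h>

/* Hypothetical example sentence (not the story from the listing above):
   builds one line of text from string literals and integer literals.
   %d is replaced by the next int argument, %s by the next string.
   Returns the value of snprintf, the length of the finished text. */
int format_sentence(char *out, size_t out_size)
{
    return snprintf(out, out_size, "%s %d %s %d %s\n",
                    "I bought", 42, "apples and ate", 3, "of them.");
}
```

With a big enough buffer, the result is the single line "I bought 42 apples and ate 3 of them." followed by a newline.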

...As before, we didn't have to do anything special when we print integers. You probably could have done this translation from c to java yourself. Here's the program converted to perl next:

Note that with PHP, because we're generating HTML output, we can't just output a newline to go to the next line, we need to include the HTML command for breaking the line, namely <br>. You can run that program remotely if you'd like. (Yes, I really did actually try all the programs I listed above, and for the PHP program the only way to try it was to install it as a Web-accessible page, so there it was already, so I might as well tell you where to find it.)

We come to lisp last because up to now we've typed in a single statement of source code at a time, or maybe two on one line, and the read-eval-print loop always showed the output on a separate line for each input statement. But here we really do want all the pieces of text (from strings and from integers) run together. It just won't do to print each on a separate line. So we have to use what for now has to be treated as a "trick" for collecting a whole bunch of little statements into one big statement that gets executed all at one time. This can be done in various ways, but I choose to use PROG, the so-called "program" feature of lisp. There's an old lisp joke that without PROG you can't write programs in lisp. It's not true, but PROG sure is one handy way to group together a bunch of statements to force them to be read in as a single unit and executed all in one sequence without any automatic line breaks between them. So at this point you should just treat this as a template to be filled in with any sequence of lisp statements you'd like. Also, we'll use the PRINC trick to force every output of a string to be just the characters inside the string without the quotemarks that would make string literals. Also for consistency, to make this a template you can substitute any literals into, integer or string or other types we might discuss later, I've used the PRINC trick for both strings and integers even though it isn't really needed for integers. (But there's another more subtle reason why we must explicitly tell it to print each literal. Later you'll understand why. Since PRINC is the only way I've shown you to explicitly print something, we might as well use it for integers here.) So here is the "program" in lisp:

If you have lisp available where you are, I suggest you copy and paste that whole batch as a single unit from this Web page to your read-eval-print loop, rather than try to manually key it in or copy&paste line by line. Alternately you can put that whole "program" into a text file, which can be named anything you want, but I suggest "coo.lisp", then from the read-eval-print loop you say this:

...That will cause the entire (PROG ...) group to be read in and then executed just as if you had typed or copied&pasted the whole thing directly into the read-eval-print loop.

Floating-point numbers

The only other data type which is available as literals in all six programming languages is floating point numbers. These are used as approximations to real numbers which aren't always integers. They use a sort of scientific notation. Ordinary scientific notation is a decimal fraction times a power of ten, or just a decimal fraction by itself if the particular value is neither large enough nor tiny enough to warrant the full syntax with power of ten. In these computer programming languages, literals for floating point values look almost like scientific notation, decimal fraction times power of ten, but upon input they are converted to a binary form of scientific notation, a binary fraction times a power of two. This means the value inside the machine is almost never exactly equal mathematically to what you thought you had typed in.

For example, if you type in a literal of 0.1, i.e. 1/10, what you might really get inside the machine is exactly 13421773/134217728. What, that's absurd!! But that's what you actually get, at least in lisp, because that's the closest exact binary fraction to the decimal fraction you typed in. In lisp it's easy to check things like that, because there's a function that converts floating-point values into their exact rational values. (I did that just now, which is where I came up with that horrendous fraction.) Note that denominator is exactly 2**27. (How did I figure that out, you may ask? I asked lisp to tell me the base-2 logarithm of that huge denominator, which came out almost exactly 27, so I guessed it should have been exactly 27. Then to check that guess, I asked lisp to compute 2**27 exactly and I then visually compared the two numbers.)

In the other programming languages the same sort of conversion of decimal fraction input to internal binary fraction occurs, but it's much more difficult to figure out what value you're actually getting. Hmm, I suppose you could try multiplying by various powers of 2 until you get an integer. Let me try that in c right now ... yeah, that worked. I wrote a program loop that did exactly that, and the first one that came out an exact integer was: 0.1 * 2**27 = 13421773.000000. Now where did I see those numbers before, the power of two, and that numerator? Hey, both lisp and gnu c use exactly the same internal representation for single-precision floating-point values! I wonder if that's because IEEE standardized the internal representation of floating point values, and both CMUCL and gnu c are following the IEEE standard? I bet java and c++ are following the same standard. I'm not so confident about perl and PHP, but you never know. It's not worth checking it out right now, so I'll leave it as an unresolved question.

In addition to error introduced on input (due to conversion from decimal fraction to binary fraction, except in rare cases where exact conversion is possible, such as 0.5 which is exactly 1/2 i.e. 1 * 2**(-1) internally, and any exact integer value that's not too large), there's also more error introduced almost every time an arithmetic calculation is performed on floating-point values, and finally there's more error introduced almost every time a floating point value is output, due to conversion from internal binary floating-point representation to printable decimal fraction notation or decimal scientific notation.

In several of these programming languages, there are actually two different internal floating-point representations, single precision and double precision. As you may guess, double precision carries approximately twice as many significant digits of accuracy as single precision. Accordingly each language has slightly different notations for literals of the two precisions, and correspondingly at least lisp has two slightly different notations when such values are printed out. I prefer to avoid using floating-point values most of the time, because of the accumulating errors, so I'm not currently knowledgeable as to the different formats for the two precisions in any of the languages except lisp. Perhaps some reader of this tutorial who is expert at one of the other five languages will please enlighten me as to the details. Meanwhile I'll leave the details omitted here.

Rational numbers

The only one of the programming languages that implements rational numbers as a primitive data type, or as a type of literal, is lisp. The syntax for a rational number is simply a fraction, like 2/3 or 22/7, or -1/4. On input, lisp automatically checks to make sure the denominator isn't zero (signals an error), and divides out any common factor (reduces the fraction to lowest terms), before storing the value internally. All arithmetic with rational numbers (and integers) is exact, no error introduced at any point. Output likewise is exact.

Other literal types, and linked-lists

That's the last of the data types that are commonly used as literals within program source. Lisp allows several other data types to be used as literals, such as vectors (one-dimensional arrays), and complex numbers (real and imaginary part, each an integer or rational or floating-point approximation to a real number), and character objects (those #\Space etc. you saw earlier when I did a trick to show the individual characters in a string), but in practice software is hardly ever written with them as literals. Instead, some function builds such a data object at run time, given some specification in terms of integers and/or strings plus specification of the algorithm to build the object, or some declaration causes such data structures to be built at load time. So I'll wait until we're dealing with functions that compute values, including building complex data objects, and declarations that allocate variables and constants, such as arrays, before I discuss the rest of the data types.

Lisp has one other data type that is very often expressed explicitly in source code, but it's not a literal, it needs to be quoted whenever it's intended to be used verbatim rather than executed as part of an algorithm. These are standard pairs (implemented as specialized pairs of value-holding cells, such a pair called a "CONS cell" in lisp jargon, represented as dotted pairs), and lists (implemented as linked-lists of standard pairs, and represented as sequences of separate objects enclosed in parentheses) including nested lists. Here's the representation of a standard pair whose left half is the integer 42 and whose right half is the string "Hello World":

Here's a list of three elements, the floating-point approximation to 0.1, the string "Mary had a little lamb", and a list of two elements which are the integers 3 and 7:

...Remember, as I said but need to emphasize: These are not literals, so don't try typing them just like that into the lisp read-eval-print loop unless you like to see error diagnostics. If you really do want to play around with typing in nested structures of standard pairs and seeing how they print out, compressed as much as possible into list notation, put an apostrophe (single quote mark) in front of the expression. For example, type in this:

...and the read-eval-print loop should type out the nested list I showed you a few lines above here.

By the way, did you notice that the lisp program templates I showed earlier are in fact lists much like those? Yes, lists are the format of all lisp expressions/programs except for the most trivial. We'll get into that topic in the next section when we cover function calls, and later when we cover control structures, both of which are expressed as lists (usually nested) in lisp.

Function/method/infix calls/operations/operators/expressions

No matter what language you're using, there must be some simple operations already defined for you to just use, or you have no way to get started doing anything. We already discussed literals, which are expressions which simply express themselves as values in a program. All the rest of the values generated by your program will be computed by means of some function or method or infix operator that was already defined, using as input data those literals you learned how to type above. (For the moment, these functions/methods/operators will all have been set up by somebody else earlier. Later we'll learn how you can set up your own new functions and/or methods and/or new meanings for infix operators, which you can then use just like the built-in ones.)

All the languages except lisp have infix operators, usually for simple arithmetic operations such as addition or multiplication, and simple comparisons such as less-than or equal, and logical combinations such as and or inclusive-or, but also for string concatenation in java, and regular-expression matching in perl. (Lisp does all these operations via function calls which will be discussed next below.) The syntax for using infix operators is simple: First you write the first expression which produces a value feeding in, then you write the infix operator such as + (for addition) or * (for multiplication), finally you write the second expression which produces a value feeding in. For example, starting with two integer literals 42 and 29 feeding in, to add them you say 42+29. That same expression works in php, perl, c, c++, or java. The value produced by that expression is then ...

...2+9 is 11, write 1 and carry the 1, 4+2 is 6, plus the carry is 7, so write down 7 and carry 0, no more digits, and last carry was 0, so we're done. So the resultant value of the expression, the sum of those two numbers, is

...That's how you might do it on paper using decimal arithmetic. What really happens inside the computer is that those literals are converted to binary on input, so 42 decimal becomes 101010 binary, and 29 decimal becomes 11101 binary, and then when the addition operation is performed it does:

...then later when you ask the computer to print out that value in decimal, it's converted back into decimal notation. Anyway, if you just write a simple expression that performs a single arithmetic operation on two numbers, that's what happens.

Now what happens if you try to combine several operations in a single expression? For example, suppose you want to multiply 42 by 2 and then add the result to 69. You nest the expressions. You write 42*2 to do the multiplication, and you nest that expression inside an addition expression, like this 42*2+69. But how does the computer know that's supposed to mean multiply first then add, instead of add 2+69 then multiply 42 by that? Because of what's commonly called operator precedence. Mixing addition and multiplication is assumed to have multiplication nested inside addition, so that multiplication is performed first, then addition is performed later using the result(s) from the multiplication. So in that case, 42*2 gives 84, then that's added to 69 to get 153.

But what if you really want to add first then multiply the result(s) of the addition? You use parentheses to group the parts that should be computed first, then after everything inside the parentheses is done the arithmetic outside the parentheses is performed on the results. For example 42*(2+69) will add 2+69, to get 71, then multiply 42*71 to get ... 2982. Is that right? (Of course all the numbers inside the computer are in binary, and the arithmetic is done in binary. You see decimal only when you ask the computer to print out the result.)

Now these languages (php, perl, c, c++, java) have a lot more infix operators than just + for addition and * for multiplication. There's also subtraction and division, and bit-shifting (in binary of course), and bitwise operations (using the binary representation of course), and several more. Those languages also have some unary operators, such as taking the negative (additive inverse) of a number, which use this same operator-precedence way of treating the syntax, except that instead of two sub-expressions separated by the infix operator, you have a prefix operator followed by a single sub-expression.

Infix (and unary) operators have one more complication: If you chain several operators of the same precedence together, like 3+5+7 or 12/2/3, does the computer take that to mean start from the leftmost operator, perform that operation, then work that result into the operation to the right, like most of your pocket calculators do, or does the computer take that to mean start at the rightmost operator, perform that operation, then work the result back into operators to the left? That's what left or right associativity determines, which is different for some operators, usually from left to right, but the other direction in some cases. Note that some actual structure of the language beyond simple operations on values, for example conditionals (discussed later in the decision/loop section), is likewise caught up in operator precedence. It's really quite a mess that most programmers can't memorize so they keep a reference page handy that lists all the operators in their favorite language. Or they use parentheses whenever they don't know the precedence, to force the operations to be done in the desired order.

Here's one of several attempts to present the reference table of operator precedence and associativity online: (WikiPedia) I like it because it lists the several languages, not just c and c++, which share this same standard of using infix and unary operators. However it doesn't format well on the Web browser I'm using here, so I would prefer some other table that formats better. Still looking down the list of Google results (searched for "c operator preference table") to see if I can find one that's legible here, but meanwhile I especially like the remark "Experimental results showing that developers have poor knowledge of binary operator precedence.", but the link is to a PDF file which I can't view here, sigh. I'd like to see it!! Ah, here's the first Google search result that's legible here (iedu.com), and another that formats really well (ssuet.edu.pk).

After infix operators, the second way to invoke pre-programmed operations is by calling functions. (Static methods in java and c++ are essentially just functions, so everything I say here applies to them just as well.) The syntax for calling a function in all languages except lisp is operation name followed by open parens followed by arguments separated by commas followed by close parens. For example, in c, printf is a function that takes a string as an argument. (It actually takes a format-specification string followed by additional arguments that are related to special tokens within the format-specification string. But if you just give it a string without any special tokens within it, what I say here happens:) For example, you might write this statement in c (the semicolon at the end is not part of the function call, it's because just about every statement in c must be followed by a semicolon):

...So what does that function-call do? It calls printf, passing it the string as an argument. But what does it do??? Well when printf is called, it prints out the string on your terminal or display. Actually it transmits the characters of the string to standard output, and then if you're at a shell those characters are transmitted to your terminal or display. Then having done its job, printf returns to wherever in your program you called it from, so the program can continue doing whatever is next in sequence. So now you know part of that hello world program you tried earlier. You still don't know how this statement earlier worked:

...Basically the %d in the format-specification string says there'll be a numeric value as the next argument, which is to be converted to decimal (hence the d) notation before printing (transmitting to standard output actually). For now that's all you need to know about the details of printf and its format specification string. You can use the %d token plus a following integer value to print the decimal representation of that integer value, and you can use a string directly as the first and only argument to printf if you want to print a string (assuming that string doesn't have any percent characters in it).

In the other languages which have this syntax for function calls (php, perl, c++, java), only the name of the function and some subtleties of its semantics are different. The basic form is the same. Here's a table of equivalents in these languages:

Note that in php and perl, you don't need the parentheses for such simple values, you can say just:

Now we come to lisp which has a totally different syntax for function calls, namely evaluated lists. First you type in a list of items, that is an open parenthesis, then the several items in sequence, separated by whitespace (no commas are required nor allowed!), then a closing parenthesis. No semicolon is needed at the end of a statement. The closing parenthesis is enough to indicate the list is done. This list gets converted to internal form, which is a linked list. Then when the evaluator gets it (in the read-eval-print loop of course!), it looks at the first element in the list to see whether it's the name of a function, or something else. If it's the name of a function, then the remaining elements of the list are evaluated likewise, and upon return from all those evaluations, the evaluated results are passed to the function. Then whatever that function returns is passed back to whoever caused this evaluation to happen in the first place. In the case of the read-eval-print loop, the return value is then printed.

Here's a simple example of a function call, an arithmetic operation, adding two numbers. Each number is given as a literal.

...Ah, you've seen that particular addition example several times before, but using infix notation in all the other languages. But like I said above, lisp doesn't have infix operators. All arithmetic is done by function calls. This is your first example of that (except for princ, which however did output, not arithmetic, so this is your first example of calling an arithmetic function in lisp). So what happens if you type that to the read-eval-print loop? First that whole expression is read (parsed) from that syntax into internal form as a linked list, something like this (ptr means a pointer, arrow shows where it points to):

...It's not really as gigantic as it looks in that picture. Each of those boxes is typically a single word of memory. In some implementations, both the NIL and the small numbers are actually stored inside those words, rather than pointed at from them, so it might actually be more like this in memory:

...In any case, after read parses your input expression and builds that linked list inside memory, eval takes over. First it checks the first element of that list, sees it's + which is a symbol (discussed later), notices that symbol is the name of a function, so eval will now treat the entire linked-list as a function-call. So next the arguments will be evaluated. 42 is a literal, so it evaluates to itself, namely 42. Likewise 29 is a literal, which evaluates to itself, namely 29. Finally those evaluated arguments are passed to the function, which adds the numbers (remember they're in binary internally), and returns the sum of them, which would be 71 in decimal, but it's really all in binary internally. eval gets that value and passes it right back to the read-eval-print loop, which passes it to print. So what does print do with it? Converts it to decimal notation and prints it out (transmits it through standard output to your terminal or display), where you finally see 71 printed out or displayed.

Now a more complicated example, nested arithmetic, just like we did with nested usage of infix operators in those other languages:

...This is so much simpler than those other languages. You're always using parentheses, so there's no worry over operator precedence. You just nest your levels of parentheses the same as you want your operations to nest. Here's what eval does with that:

All that same complicated stuff happens inside the other languages, but there this theoretical way of looking at what happens is obscured, both by the extra complication of infix operators and operator precedence, and by the fact that c and c++ and java always compile such expressions as 42*2+69 into a sequence of inline machine-language instructions instead of actually calling functions. (That is also why you get only fixed-precision numbers using machine arithmetic. In case of overflow there is no way whatsoever to provide the mathematically correct answer: in c and c++, overflow of signed integers is "undefined behaviour", while unsigned integers in c and c++, and integer arithmetic in java, silently wrap around. In any case, you get wrong results instead of larger numeric values using as much space as needed, as happens in lisp.) And it's all hidden from you because the parse tree (the nested lists in lisp) is present only in the compiler environment (in c, c++, and java anyway). It isn't accessible for you to see yourself except in lisp.

I'll give a few more examples of useful functions in lisp, showing both the syntax (again to drum in the point) and the semantics of these particular functions (because you'll need them often). First a function we saw as an unexplained mystery in examples earlier, namely princ. You recall the expression:

...for example? Now you know what's happening with that expression. The first element of that list is the name of a function, so the remaining element in the list is evaluated, but it's a literal so it evaluates to itself, then princ is called, which prints the characters of the string without the literal syntax (quotes around it), just the raw sequence of characters in the string. This is analogous to echo in PHP, print in perl, or System.out.print in java. If you want to print a string with the quotes around it (and backslash-escapes for special characters within it), i.e. just like the print part of the read-eval-print loop prints a string, except you want it printed under your control from inside your program instead of only from the top level of the read-eval-print loop, you do it with prin1. For example:

...That wouldn't be particularly useful of course. That's just to show the difference between princ and prin1. There's one more function I used in the cookie-jars "program" earlier: terpri, which doesn't take any arguments. It just ends the current line by moving the cursor to the start of the next line. (It transmits a system newline character through standard output, which your terminal or display device interprets as moving down to start of next line, and scrolling the window to make room for a new line if necessary.) Since it takes no arguments, it looks like this:

(Technical note you won't need for a while, but to satisfy any nitpickers reading this: Each of those functions princ, prin1, and terpri takes an extra optional argument that specifies the stream, which is used when you don't want to write to standard output, but to some other stream instead, such as to a file you're writing, or to a sub-process, or to a network stream, etc.)

Another note that might actually interest you already: You may have noticed that printf in c and c++ supports a format-specification string. This allows a mix of verbatim string characters and converted values of integers etc. to be output in a single function call. Lisp has a similar function called format. You'll be learning the details of these functions later. I'm not including these complications in this section because the purpose here is to introduce the function-calling mechanism in these six languages and give a few examples of really basic and easy-to-use functions (arithmetic and output) you'll need in the simple programming examples in the rest of this chapter.

In lisp, the mechanism for defining and calling functions has an additional feature not present in the other languages as covered here: Keyword arguments. When defining a function (see later section), you can specify one or more keyword arguments, which are optional when the function is called. Some built-in functions have keyword arguments, such as a function for searching a string for a character that might be there, where you can use a keyword argument to tell it where to start and where to end if you don't want the whole string searched. To call a function with a keyword argument, you say :keyword (the actual keyword, such as :START, not the word keyword), followed by the value for that parameter to the function. These pairs can be in any sequence, but all must be after the required arguments.

In the descriptions of the built-in functions which take keyword arguments, there is notation for indicating which keyword parameters are allowed. The notation is something like this:

The meaning of that is that arg1 arg2 thru argz are each required, but then any combination of the keywords may be used, each a pair of the keyword itself and the corresponding value. For example, here's a description of a function's calling conventions in that notation:

where there are no required arguments, only keyword arguments, so all of the following forms of call are syntactically valid:

There are two wonderful uses for keyword parameters/arguments: When you have a function that has extra features not often used, so you don't want to require callers to supply them every time your function is called, and due to rare use the other programmer is unlikely to remember the sequence of them anyway, but is likely to remember their names. Also when you have a function already used in several places, but you want to add a new feature to it which is compatible with all the old features, and you don't want to require all callers to change their code to include the new arguments. You supply default values to make the function work just like before if the new keywords aren't supplied.

Just for fun, let's call that character-in-string function I mentioned. It's called position, and it tells you the index within the string where the first instance of the target character appears. Let's put several copies of that same character in the string so we can see how the keyword arguments affect its behaviour. Emphasized text below is what is typed into the read-eval-print loop, while normal text is what lisp types back out and my comments after semicolon:

There are three other keyword arguments available for position, so it's a good thing you need to specify only the ones you're actually using instead of all six every time, or your code would be a mess.

Stub: Instance-method calls, in c++ and java. (Lisp does this by the same syntax as function calling.)

A function is really a lambda expression in essence. A lambda expression is the formal way, in the lambda calculus, to specify a function. When you call a function by name, such as (square -5), what you're really doing is using the symbol square to look up the corresponding lambda expression (lambda (x) (* x x)), and then applying that lambda expression, or its compiled equivalent, to the value -5. But in lisp it's possible to skip the step of defining a symbol to have a function definition and then calling it by that name, instead just apply the lambda expression directly. If the first element in a form is a lambda expression, that's what happens. For example:

...A lambda expression used in that way is an example of an "anonymous function", a function without a name, a function that just is.

If you want to generate an anonymous function anywhere else, you need to wrap (function ...) around it. Note that if you wrap (function ...) around a symbol, lisp will retrieve the function associated with that symbol. Most commonly, explicit anonymous functions are used when you need to pass a function to some other function, such as APPLY or FUNCALL, or one of the mapping operations such as MAPCAR, or if you want to store an anonymous function inside a list or other container. For example, suppose you want to define three functions that do different things with a numeric argument, and then call one of them at random. You make a list of the three functions, and store it in a variable, like this:

...then you set up the value you want to use as argument to one of those functions, for example:

...Either of those will work. funcall takes each argument separately, whereas apply takes a list of all arguments. Of course you can avoid all those SETQs (except the very first) by nesting the value and index in a single form to evaluate, like this:

Stub: Calling a function which is the value of a variable, in lisp (done already above) and c (instead of calling a function specified explicitly in the code, i.e. by name or lambda expression).

Stub: Constructing a function-call form (a list whose first element is a function name or lambda expression), either by explicit code or by macro expansion, and then calling eval to evaluate the form, thereby in effect calling the function, in lisp.

Blocks

In all the programming languages except lisp, there's a strict distinction between statements and expressions. Statements can occur only at the statement level, which includes sequential steps in a block (to be introduced shortly), or at the very top level of a script (in perl or PHP only), and nested within certain statement-level syntax, such as the action-taken part of conditionals and repeat-loops (to be introduced in a later section). Expressions can occur only as the right-hand side of an assignment (to be introduced later in this section), or as an index of an array access (a type of function call, although in every language except lisp it has a different syntax we haven't discussed yet), or as an argument to a function call, or in the condition-to-test part of a conditional or repeat-loop, or nested inside other expressions. There are a few kinds of expression which are also allowed at the statement level, such as function calls and assignment expressions (including arithmetic operators that side-effect one of the operands).

In lisp there is no such distinction between statement and expression. Every statement returns a value, even if it's just NIL as the default when there's no useful value to return, so every statement can be used as an expression. (Technical note: It's possible to return more than one value, in which case only the first value is passed upward if the expression is nested inside another expression, specifically to pass an argument to a function, or to assign a value to a variable. It's possible to return no values at all, in which case a value of NIL is made up if the expression is nested inside another expression, same meaning. The only way to capture multiple values that have been returned from some function you called is to use a special form that deals with multiple values. These will be discussed later in a special section on multiple values. This of course applies only to lisp.) Even blocks and function-definitions return values in lisp. Contrariwise, every return value can be ignored, so a function call that would normally be considered an expression can be used instead as a statement. In particular, where in other languages there is totally different syntax for a decision between two different statements to execute (an if statement), and a decision between two different values to return (an if expression), in lisp these are handled by exactly the same syntax.

So when we speak of "statements" below, in all languages except lisp we mean specifically those forms of syntax that are allowed at the statement level, whereas in lisp we mean any form whatsoever which happens at the moment to be used at the statement level.

A block is a special notation (in all languages except lisp) or form (in lisp) that lists a sequence of statements that will be executed in sequence just as listed in the block. In all the languages except lisp, the special notation is open curly brace, followed by the statements in sequence (with semicolon at the end of each), followed by close curly brace. For example, if you look back to that cookies-in-jars program from earlier, you may notice the c and java versions have all the printing statements in such a block. Here's another, shorter, block, in c, to illustrate the point again:

...Notice in the first statement, 3+5 is just part of the string of text being printed verbatim, whereas in the second statement 3+5 is an infix-operator expression to actually add the literal integers 3 and 5 to produce a new integer value, which is then passed to printf to be converted to decimal notation and printed. The third statement is just to put a period (British "full stop") at the end of the sentence, and then output a line-break. This block could appear anywhere a statement could appear in a c program, such as the body of a function definition (discussed later), or the to-execute part of a conditional or loop, or simply nested as a statement within another block. This block (or any block) could not appear where expressions are required, such as an argument being passed to a function, or nested inside an infix-operator expression such as arithmetic, or as the right-hand side of an assignment, etc. Note that in all languages except lisp, blocks don't return values, so they can't be used where values are required. (Other languages than lisp don't automatically make up a default value such as NIL when a value is expected.)
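As a rough illustration of the three statements just described, here is a minimal sketch in c. The wording of the printed sentence and the wrapper function name print_sum_sentence are assumptions made for this sketch; the braces of the function body serve as the block.

```c
#include <stdio.h>

/* Hypothetical wrapper: its body is the block being illustrated. */
void print_sum_sentence(void)
{
    printf("The value of 3+5 is ");  /* here 3+5 is just text inside the string */
    printf("%d", 3+5);               /* here 3+5 is evaluated, and the result is printed in decimal */
    printf(".\n");                   /* full stop, then a line-break */
}
```

Calling print_sum_sentence() runs the three statements in the order listed.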

In lisp, blocks do return values, so they can be used as expressions nested inside other expressions (usually as arguments to function-call forms or assignments). Partly for that reason, in lisp there are several different kinds of blocks: PROGN always returns the value of the very last expression in the block, after discarding the return values of all the other statements in the block. PROG1 saves the value from the very first expression, continues with the rest of the statements in the block, discarding all their return values, and then fetches that saved value from the first expression and finally returns it. PROG by default returns NIL, but allows a return from anywhere within the block, with whatever value is specified at the point of the return statement. PROG also allows declaration of local variables (to be discussed near the end of this section), and allows go-to statements that jump to labels within the block. This makes PROG most like blocks in other languages. The main difference is that a RETURN statement within a PROG returns a value from the PROG itself, not from the toplevel function where the PROG is nested in the function definition. But most commonly a PROG is used at the very top level of a function definition, where returning a value from the PROG in effect returns that value from the function, thereby working exactly like RETURN statements in other languages.

...The PROGN form returns the last value, namely 153. The PROG1 form returns the first value, namely 111. The first PROG form returns NIL. The second PROG form returns 2898, but the compiler probably issues a warning because the last statement cannot ever be executed. (In CMUCL the warning message says "Deleting unreachable code.") Notice that in the PROGN and PROG1 forms, right after the first element in the list which is that special-form name, we continue immediately with the statements to be evaluated in sequence. But in the PROG, there's a () between the PROG name and the first of the statements to execute. You recall I said that a PROG allows local variables to be declared. That's where it's done, in the first sub-list after the PROG, before the actual statements to execute. In this case we aren't declaring any local variables so we have an empty list at that point.

I've been hinting at declaring local variables within blocks. Here's where I really talk about it! In all the languages except lisp, there's a special syntax for declaring variables, and that declaration must state what type of data the variable is to hold, such as int (word-sized integer), or char (byte-sized integer capable of holding a single character), etc. In lisp you just list all the names of the local variables you want to declare, and that's all you need. Each variable can be used for any type of data you want. (Note: You can include type-declarations next in a PROG block, which restrict the type of data which a variable can hold, and thereby allow the compiler to generate more efficient code, but I won't discuss any of that in this "Cookbook".)

Let's suppose you want to declare a variable called ch which will be used to hold a character, and another variable called n1 which will be used to hold an integer. Here's how to do it in c and then in lisp:

c++ and java are basically done the same way as c. I don't know about PHP or perl, need a proofreader to help me here.

So what's the value of declaring local variables? So that you can assign values to them which are private to your one block here (and to any other blocks nested within them). So how do you assign values to local variables? With assignment statements!! Here's the syntax in c (same in all languages except lisp), and then in lisp:

As you see, in c and the other languages (except lisp), you use an infix assignment operator. In lisp you use a special form, a form whose first element is the special operator SETQ. Remember I said the evaluator looks at the first element of the list to see whether it's the name of a function, or "something else". If it's the name of a function, all the rest of the elements are evaluated, and then their return values are passed as arguments to the function. But SETQ (and PROGN PROG1 and PROG earlier) are not names of functions, they are names of special operators, which herald special forms rather than function calls. So how does SETQ work? It ignores the second element for the moment, and evaluates the third element, in this case 42. The value from that is then assigned as the new value of the local variable named by the second element, n1 here. Whatever value that variable might have had before is discarded, replaced by the new value.

So why bother setting a value on a variable?? So that it can later be retrieved to be used in later calculations! So how do we retrieve the current value of a variable? Remember we had literals, which evaluated to themselves, and various kinds of complicated expressions and statements which did other things? But the only place we saw a word by itself (not text within a string) was either as the name of a function as the first element of a list to be evaluated (or in non-lisp languages in a special function-call syntax), or as the variable to get a new value, in an assignment statement? Well if you just use the name of a variable all by itself anywhere an expression is needed, what happens is that the current value of that variable is retrieved and returned as the value of that expression. So the following four blocks, first pair in c, then second pair in lisp, all produce the same output:

...In the first block of each pair, we pass the integer value of the literal 42 directly to printf or prin1 to be printed. In the second block of each pair, we assign 42 as the value of the local variable n1, then pick up that value from the variable n1 and pass it to printf or prin1.
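The c half of that comparison can be sketched as follows. The two function names are hypothetical and exist only to give each block a home; the variable n1 and the value 42 come from the text.

```c
#include <stdio.h>

/* First block of the pair: the literal goes straight to printf. */
void print_literal(void)
{
    printf("%d", 42);
}

/* Second block of the pair: declare, assign, then retrieve. */
void print_via_variable(void)
{
    int n1;            /* declaration: n1 is local to this block */
    n1 = 42;           /* assignment with the infix = operator */
    printf("%d", n1);  /* the bare name n1 yields its current value */
}
```

Both functions print the same two characters, 42.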

At this point you know how to write a complete program in lisp: simply write a PROG, declaring whatever variables you need, writing whatever sequence of statements you want executed (evaluated), and feed it to the read-eval-print loop. For the other languages, you need only a tiny bit more to complete a program. In c or c++ or java you need to define a function called main (see next section). In java you also need to embed that function definition inside a class definition. In perl you really don't need to even set up a block, you can just string a bunch of toplevel statements into a script. But you probably want the header line that tells what interpreter to use. In PHP you need to wrap your script inside meta-HTML which identifies it as a PHP script, and of course the whole thing must be inside a Web page that is accessible for remote access, and which is enabled for PHP processing. The section after the next formally covers these issues and gives examples of each.

Function/method definitions (including main)

Functions are defined in c as follows: Say the type of value the function will return (or void if the function doesn't return any value), then the name of the function you want to define (no special keyword is needed to introduce it), then an open parenthesis, then declarations of the formal parameters if any separated by commas, then a close parenthesis, then a block (open-brace statements close-brace) which is the body of the function. Example:

...Later when that name is used where an expression is allowed, the formal parameters will be temporarily set up to have the actual arguments as their values, then the body will be executed, then the temporary bindings of the formal parameters will be discarded. Note that copies of the actual arguments are passed. There's no way in c or java to have a formal parameter, which is like a local variable, share the same location in memory as the place where the original value came from, even if the original value came from a variable. So it's possible to change the value of the in-effect local variable (formal parameter) without affecting the value of the original variable whose value had been passed. But note that if a pointer (not discussed yet, but you probably get the basic idea already just from the word) to something, a variable or a data structure, is passed as an argument to a function, inside that function code can follow through the pointer and modify whatever memory is there or nearby. So by that means only it's possible for calling a function to side-effect a variable or data-structure in the calling code.
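To make the copy-versus-pointer point concrete, here is a minimal sketch in c. The function names bump_copy and bump_through are hypothetical, and the pointer notation (int *p, *p, and &x at the call) is used ahead of its proper introduction.

```c
/* Receives a copy of the caller's value; changing n changes only the copy. */
void bump_copy(int n)
{
    n = n + 1;
}

/* Receives a pointer; following it changes the caller's own variable. */
void bump_through(int *p)
{
    *p = *p + 1;
}
```

If the caller has int x holding 5, then after bump_copy(x) the variable x still holds 5, whereas after bump_through(&x) it holds 6.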

As you see in the example above, there's a return statement. This is how you return a value to the calling program. It's also how you get out of the body of the function. The return statement provides both services at the same time, set up the return value, and actually return to the caller. In the example, the return statement was the last statement in the body, so actually returning to the caller was somewhat redundant with falling to the end anyway. But if the return statement were to be placed earlier, the function would return right then, never executing any later statements in the block. It wouldn't be of much value to always return from the middle, but there might be a condition to test (see later section) where the function returns immediately sometimes and later other times.
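The two placements of return can be sketched in c like this. The function sq follows the name used elsewhere in this section; magnitude is a hypothetical function, and its if test borrows syntax that is only covered in a later section.

```c
/* return as the last statement: sets the value and leaves the function. */
int sq(int n)
{
    return n * n;
}

/* return in the middle: for a non-negative n the function leaves
   right away, and the final statement is never reached. */
int magnitude(int n)
{
    if (n >= 0)
        return n;
    return -n;
}
```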

In c, each function definition is supposed to come before the first time it is called. That's because the compiler works forward through the file checking each function-call to make sure the function is declared (i.e. at least a prototype showing return type and formal parameters), and generates an error message immediately when it sees an attempt to call a function that isn't declared. It can't guess that you're calling a function you will be defining later in the file. Most of the time you actually define each function before using it. But if you really must call a function before you define it, for example if you have two functions that call each other (recursively probably), there's a way around this limitation: You write just a declaration for any function that needs to be called before it's defined. You write the type of return value, then the name of the function, then an open parenthesis, then the formal parameter declarations separated by commas, and then the close parenthesis, all just like with a real function definition, but then you just put a semicolon instead of a block of statements. For example, if g1 wants to call g2 but g1 is defined before g2, you can write:

...For the moment you probably won't need to know that, and IMO it's a pain to try to keep the declaration matching the actual function definition when developing brand-new code, so I prefer unless absolutely necessary to define every function before it's called.
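A minimal sketch of such a declaration, using the names g1 and g2 from the text. The bodies are invented purely for illustration: the two functions call each other, counting down to zero.

```c
int g2(int n);   /* declaration only: a semicolon where the body would be */

/* g1 is defined first, yet may call g2 because g2 is already declared. */
int g1(int n)
{
    if (n <= 0)
        return 0;
    return 1 + g2(n - 1);
}

/* The real definition of g2; it must match the declaration above. */
int g2(int n)
{
    if (n <= 0)
        return 0;
    return 1 + g1(n - 1);
}
```

With these bodies, g1(4) works its way through g2(3), g1(2), g2(1) and g1(0), and returns 4.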

In java you define functions just like in c, except before the type of return value you must say whether the function is public or private, and whether it's static (a regular function) or an instance method (which we won't discuss just yet). Public means the function can be called from other files (classes) that are loaded at the same time, whereas private means it's accessible only from other functions and methods within the same file (class). Example:

In lisp, to define a function, you write a list with the first element being the special-form symbol DEFUN, then the second element being the name of the function you want to define, then the third element being the list of formal parameters (no type-declarations here), then all the rest of the elements are statements, executed in sequence, with the value of the last statement being passed back as the return value of the function just like with a PROGN block. (This is often called an implicit PROGN. Many kinds of special forms in lisp have implicit PROGNs. You'll see them popping up all over the place when we get to the descriptions of the various special forms such as COND and WHEN and UNLESS and LET etc.) Example:

...Note that unlike the other languages, you don't need an explicit return statement, you just need an expression that produces a value as the last statement in the implicit PROGN. RETURN is needed only inside a PROG. (Technical note that beginners can ignore: PROG is actually a macro that expands into a nesting of a TAGBODY inside a LET which in turn is inside a BLOCK, not the same kind of blocks we've been discussing before, more like just a binding of the name of returned values, which is NIL for PROG blocks.)

So what if you really do want a PROG block as the body of a function definition, so you can translate some c or java code directly across to lisp line-by-line? Well nothing's stopping you from having a PROG be the one and only statement of the implicit PROGN which is the body of a function definition. A RETURN from the PROG returns a value from the PROG, but the PROG is the only hence the last statement in the body of the function definition, so that return value is passed right through as the return value of the function, so you get an effect just as if the RETURN statement returned from the function. Here's a translation of our C program above:

In perl, to define a function, you write the reserved word sub, then the name of the function, then open-brace, then the statements in the body of the function, then close-brace. Note that you don't specify any formal parameters. Instead, whenever a function is called, the actual arguments are collected into an array, and it's passed to the function using the name _. Inside the body you say $_[0] to fetch the first argument, $_[1] to fetch the second argument, etc. To find out how many arguments have been passed, use @_ in a scalar context, for example scalar(@_). Normally at the start of the body of the function you set up local variables for each parameter, so that later you can refer to them by those meaningful names instead of by their parameter numbers.

Note that if you just assign values to variables without declaring them as local, you will be assigning values to global variables by that name, which is poor programming practice almost all the time. To declare a variable as local and assign a value to it, you put my() around the $name of the variable, and put that whole expression on the left side of an assignment. For example:

Note that you don't actually need the return statement in a function definition. The last "statement" in the block (body) can be a simple expression, which is automatically returned, somewhat like an implicit PROGN in a DEFUN form in lisp.

Note also that you don't need a separate statement to fetch the value of each parameter and assign to a local variable. You can do a destructuring assignment like this:

...but then you're using global variables. Combining the explicit declaration of local variables with the destructuring assignment, we have:

In PHP, you define a function by saying the reserved word function, then the name of function you want to define, then an open parenthesis, then the formal $parameters separated by commas, then close parenthesis, then open brace, then statements in body, then close brace. For example:

Now back to lisp for that super-duper feature that I mentioned earlier, which isn't available in any of the other languages: Keyword arguments! When defining a function, after the normal (required) formal parameters, you say &key followed by lists of two elements, each of the form (keyword defaultValue). If your function is later called with a keyword specified, the caller-supplied value is used, otherwise the defaultValue is used. As I mentioned earlier, there are two good uses for keyword arguments: When there are several optional features easily remembered by name but only a few used most of the time, and when you want to add a new feature to a function that is already in service without disrupting already-existing calls to how it was previously defined. I'll illustrate adding a new feature below.

Here's an example of the original definition of a function to compute the square of a number:

Now suppose later we want to add a feature of acting "stupid", computing the square of a negative number as the negative of the square. We don't want to write a whole new function, we just want the old function to sometimes act stupid. So we add a keyword parameter:

...so now if the stupid parameter is supplied and set to any true value, and also n is negative, then the negative square is returned, otherwise the normal square is returned. Thus:

Overall structure of a program

In c and c++, you write a program by defining a function called main, and any other functions it may need to call that aren't in libraries, and you also set up the compiler to know where the header files for the libraries are, so that the compiler can find the function prototypes so it knows how many and what type arguments each library function takes, and so that the compiler can tell the loader where to find the actual library files to load with your program (the header files contain such loader instructions). The usual sequence is to first include whatever header files are needed (these files will be incorporated into your program to be compiled just as if they were really part of the program you wrote), then make any global variable declarations you need (hopefully very very few!!), then do all your function definitions, ending with the definition of main. For example:

stdio.h is the header file for stdio which is the standard I/O (input/output) library, containing such things as printf. Except for our continuing fudge of two cases of printf (printing a string verbatim, and printing the decimal conversion of just a single integer by itself), you should by now fully understand everything in that sample program. When we cover the full description of printf later (or see that link above), you'll understand how to condense all those multiple printf statements into this single statement:

...but that's a little advanced for this section which is just to show you how to put together a complete program out of the pieces that were described earlier.

With java, you never have to specify any header files, but you do have to specify any classes that contain functions/methods that you will call that aren't considered part of the core java language. But if you are using only those core-language classes, you won't even have to do that. The most common package you might really want to use outside the core language is java.util, which you will need to declare at the top of your source file if you use any classes in it. (I'll discuss that topic later since you won't need it for the kind of simple programs you'll be writing at first.)

But the one thing you will need to do, no matter how simple your java program is, which you didn't need to do with c, is to put all your functions, including main, inside a class that you declare/define. You should make the first part of the name of your source file be the same as the name of your class. For example, if your class is called "Squ", the source file should be called "Squ.java". Here's the above c program translated to java:

...Notice that the entire body of a class definition has braces around it just as if it were a block of statements. Otherwise, all the rest of what you see in that example should make total sense. So now you know everything you need to write java programs.

Technical note to satisfy experts. Beginners may skip this paragraph if they wish. I haven't discussed instance methods at all. But in fact we've been using a couple of instance methods all along. Here's how that line of code

...really works: System is a class within the core language library which I mentioned above. out is a class-level variable (i.e. a class global) within that class. So System.out in the program fetches the value of that variable, which is an object of class PrintStream. That class has several methods called print. The method actually called depends on the type of argument passed to it. In this case, we are passing an object of class String. So the method called is print(java.lang.String), which sends all the characters of the string to the stream, in this case standard output. The next line of the program says

...which works exactly like the above except that n is a variable of type int, so the method that gets called is print(int), which generates the string representation of the integer, i.e. the decimal notation for the integer, and sends it to the stream, i.e. to standard output.

A program in c++ is organized exactly like a program in c, i.e. including header files, declaring any globals you need, and defining all your functions, including main. The tiny differences are that the standard-library header files don't have the ".h" suffix in c++, and the I/O library you use works totally differently from the corresponding c library. Here's the c program translated to c++:

lisp is a whole different world! Instead of being forced to write standalone programs or scripts or HTML pages with embedded scriptlets, you are given an interactive environment, a read-eval-print loop. Every time you write a function, and get it working properly, that function can be used as an almost-toplevel program any time you want. You just write a form that calls it, and that simple form is your "program". At no point do you suddenly have a complete program when you didn't a moment before. So if you want to write an interactive application using lisp, the application isn't a program or a script as it is in the other languages, it's a function.

...Good, I got it right. Now to build my function definition around that single line of code:

...Good, it works. Next I want to translate the function showsq into lisp. First I translate each line separately. Because I used the same variable name n, I already have a test value set up there. I check it just to be sure:

...Working, next to translate: printf(".\n"); which requires two lines of code in lisp the way I've been doing up to now:

...Both working, that's the last line of code to translate in that block. Next to collect all those lines of code into a PROG block and see if it works as a unit. This will be my first chance to verify that spacing between words and numbers is correct, because we'll be generating all the output during a single read-eval-print loop instead of doing each operation on a separate line as above. So here it goes all at once:

Next I wrap a function definition around the whole block, and make sure I declare my local variable of the PROG block:

...Working fine. Now to wrap the function definition around it. No local variables to declare in the PROG block.

So now I have available, any time I want, the low-level function sq, the master utility showsq which I can call interactively with any argument I want, and the toplevel canned script main. (Actually when I'm programming for myself, my current toplevel test script is usually called q, because (q) is a lot easier to type than (main). It's only in c c++ and java where the toplevel program absolutely must be called main.)

Decisions (branches) and auto-repeated units (loops)

One of the major features of computers, and languages to program them, is the ability of the computer to make decisions in real time as a program runs, doing one thing if the data is in a certain range and doing another thing if the data is in another range, and the ability of a human to write computer software that tells the computer precisely what algorithm to use for making such decisions. The simplest kind of such live decision is a two-way branch. Some test is performed on some data, resulting in a true or false result, and then one thing is done for true and another thing is done for false. Note two things are going on here, the test to make a true/false value (called a "boolean"), and using that true/false value to make a decision which thing to do. For my example, I'm going to modify the function sq in the above example so that it returns the stupid square. If the argument is negative, it returns the negative of the square, otherwise it returns the correct square. So if the number n is negative, we must compute -n*n, whereas if it's positive we compute n*n as before. (If it's zero, either way gives the same result so we can do either calculation in that case, it doesn't matter, but we must decide which way to do it. I'll decide that if it's zero we'll use the positive calculation.) Here is that one expression in the various programming languages:

So if you edit the sample program, replacing the argument to return in the c c++ or java programs, or replacing the body of the function definition in lisp, it'll do that. (Of course I tried all of those before writing this paragraph!)
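In c, the edited function might look like the following minimal sketch, which uses the condition ? trueValue : falseValue expression as the argument to return. The parentheses are added here only for readability.

```c
/* "Stupid" square: the negative of the square for a negative n,
   the ordinary square otherwise (zero takes the ordinary branch). */
int sq(int n)
{
    return (n < 0) ? -(n * n) : n * n;
}
```

With this definition sq(-3) gives -9, while sq(3) still gives 9.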

By the way, that same syntax (if condition trueDoThis falseDoThis) works in lisp everywhere, statement for effect or expression for value. But in c or c++ or java the above syntax works only for expressions, not for statements. If you want a 2-way decision for statements, the syntax is if (condition) { trueDoThis; } else { falseDoThis; } Many programmers have trouble remembering that condition ? trueDoThis : falseDoThis syntax, so they pull the whole thing out to the statement level where the syntax is easier to remember. Instead of the expression decision like we did above, namely:

...Doesn't that look ugly having to repeat the return even though it's the same in both cases?
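For comparison, a minimal c sketch of the statement-level form, with the decision pulled out into if and else and the return written once in each branch:

```c
/* Same "stupid" square, decided at the statement level. */
int sq(int n)
{
    if (n < 0) {
        return -(n * n);   /* negative argument: negative of the square */
    } else {
        return n * n;      /* zero or positive: the ordinary square */
    }
}
```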

Next we're going to construct a program loop using a block with a label and a go-to. This is the crude way, but at the machine level this is exactly the logic that is used to make program loops, and this is the easiest way to explain it the first time around. (Later we'll refactor it using a higher-level primitive for loops.) We'll change the main function to run a loop from -3 to +5, calling showsq for each of the integers in that interval, in sequence starting at -3. We'll start by setting a variable to have the value -3, then call showsq. Then we'll add 1 to that variable's value. If it's still less than or equal to 5, we'll go back to the showsq step again, otherwise we'll return. I'll show the main functions for lisp and c only. You should be able to figure it out for c++ and java easily enough. Lisp first:

...OK, I'll admit it, I was a little rusty and needed a couple attempts to get it right, but there it is working now! Either program, the lisp or c version, produces this output (remember we're still using the dumb square version of the sq function):
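For the c side, the label-and-go-to logic described above can be sketched as follows. Several things here are assumptions: the loop is written as a helper function named goto_loop rather than as main, it returns the final value of the counter so the stopping point can be checked, the variable is named num, and the wording printed by showsq is invented.

```c
#include <stdio.h>

/* The "stupid" square from earlier in this section. */
int sq(int n)
{
    return (n < 0) ? -(n * n) : n * n;
}

/* Print one sentence for n; the wording is an assumption. */
void showsq(int n)
{
    printf("The square of %d is %d.\n", n, sq(n));
}

/* Stand-in for main: loop from -3 to +5 using a label and a go-to. */
int goto_loop(void)
{
    int num;
    num = -3;          /* start of the interval */
again:                 /* label: the go-to below jumps back to here */
    showsq(num);
    num = num + 1;     /* step to the next integer */
    if (num <= 5)
        goto again;    /* still in range, so repeat */
    return num;        /* 6 when the loop is finished */
}
```

With that wording, a call to goto_loop() prints nine lines, from "The square of -3 is -9." through "The square of 5 is 25.", and returns 6.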

Most programming languages have a special syntax for setting up loops, which avoids the use of go-to. Both c (and c++ and java), and lisp, have at least two each (in lisp it's done via a special form). Here's that c program converted each way:

...Note in the do form the sub-expression (> num 5) specifies the exit condition rather than the keep-looping condition, the opposite of the condition in each of the c programs. Note also that in the loop form, the test is implicit in the keyword upto and following last value. (In that respect, lisp's loop form is similar to the way fortran and algol specified loops.)
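On the c side, two go-to-free forms are while and for. The minimal sketch below assumes those are the two forms meant. The function names, the variable name num, the calls counter (added only so the result can be checked), and the wording printed by showsq are all hypothetical. In both forms num <= 5 is the keep-looping condition.

```c
#include <stdio.h>

/* The "stupid" square from earlier in this section. */
int sq(int n)
{
    return (n < 0) ? -(n * n) : n * n;
}

/* Print one sentence for n; the wording is an assumption. */
void showsq(int n)
{
    printf("The square of %d is %d.\n", n, sq(n));
}

/* while: the condition is tested before each pass through the body. */
int loop_with_while(void)
{
    int num;
    int calls;
    num = -3;
    calls = 0;
    while (num <= 5) {
        showsq(num);
        calls = calls + 1;
        num = num + 1;
    }
    return calls;      /* number of times showsq was called */
}

/* for: start value, keep-looping test and step all sit on one line. */
int loop_with_for(void)
{
    int num;
    int calls;
    calls = 0;
    for (num = -3; num <= 5; num = num + 1) {
        showsq(num);
        calls = calls + 1;
    }
    return calls;
}
```

Each function calls showsq nine times, once for every integer from -3 to 5.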

Multi-programming-language "cookbook", toplevel

Integers (whole numbers), as literal data to get you started