Multi-programming-language "cookbook", matrix (Chapter 3)

(Purpose and scope of this document)
(Suggestions for use of this document)

Generic or multi-type:
Simple data types:
[Packages]
and [Symbols] [Strings]
[Classes]
[Sequences] (including integer indexes)
[Structures/Records]
General-purpose containers (other than linked-lists):
Input/Output:

Samples of code (Just started 2007.Feb.10, Web-accessible demos of examples of how to accomplish various tasks in all six languages, after HTML FORM contents have already been decoded. Currently contains only one task: Validating that a particular form field contains a (string) representation of an integer, and if so then converting to actual integer value, and if successful then counting past that value just to show we have an actual integer there.)

(Link back to main file)

 

 

 


Purpose and scope of this document

The primary purpose of this document is to provide a set of answers for questions of the general form "in language L, how can a program specify conversion of data-type T1 to data-type T2" for all reasonable combinations of two data types. This idea was inspired by a question of that form which appeared in a newsgroup, specifically how to find the ASCII character code for a given character. (original question: <1hslw8p.x4kuq71t0cwqoN%wrf3@stablecross.com> / complaint that Google search didn't find the needed info, which actually triggered my realization of a need for this 'matrix': <1hslyqe.188ewmp1wjd3gqN%wrf3@stablecross.com>) I've chosen to provide this "matrix" (as if the two data-types were row- and column-headers in a rectangular table) for six (at present when I wrote this in 2007) programming languages, namely the six (c, c++, java, Common Lisp, perl, PHP) for which I know how to implement demonstrations via CGI and have such facilities available on this Unix shell ISP, so that I can in fact eventually provide sample source code and runnable demonstrations of each transformation that I describe. (Update 2008.March: Flying Thunder was added to the CGI-capable languages, so eventually I should include it here too.) At each cell in this matrix, all six languages are compared as to how to program that one transformation in each. For the near future, I'm going to restrict my coverage to those such data-type transformations which are available as a built-in primitive (operator or function or method etc.) in at least one of these languages. I'm starting with those which are available in C (about 80% done as of Feb.25), then when that's done I plan to extend my coverage to C++, then Common Lisp, then java (up to J2SE version 1.4 only), then to perl and PHP.

The secondary purpose of this document more generally is to document every commonly-available and generally-useful data-processing task which is available as a built-in primitive (operator or function or method etc.) in at least one of these languages. Somebody who doesn't yet write computer software might thereby get a preliminary "feel" for which tasks are already programmed and immediately accessible by a single line of code compared to which tasks require loading a third-party library or writing an algorithm from scratch yourself. This information could serve to entice somebody into starting to write software, or allow somebody to assess whether a proposed project will be easy (just paste together a few existing functions or methods) or hard (write ten thousand new lines of code from scratch), and hence whether it's worth the novice's energy to even start such a project. This information could also help somebody choose an appropriate programming language for a new project, if such a choice is allowed (by the funding agency or supervisor/boss etc.)

To do (rough draft here): Add disclaimer that for C I want first to describe everything available from the very start (K&R), and then try to include new features that have been added or changed via the C89 and C99 standards which haven't necessarily all gotten incorporated into the particular version of C that you may be using.

 

 

 


Suggestions for how to use this document

If you know the two primary data types you want to work with or convert between, find one of them as a primary heading in the matrix, then the other as a sub-heading after "and". If there's just a stub for the sub-heading, tell me and I'll expedite filling in that section. If there's already a link, click on it, and you'll get to a section that details functions/operators/methods that coordinate those two data types somehow.

If you know the name for a function in one language, which performs some task of interest, and you want to learn how to perform the equivalent task in another language, use the local-search feature of your Web browser to directly search for the name you already know. For a binary operator, surround the search term with spaces. For a unary operator, I'm not sure how to find it, maybe just browse the section that deals with the two datatypes, searching for that operator character within that section.

For the more basic info introducing these various data types in the various languages (chapter 1), and how to write programs in these languages (chapter 2), backtrack to the main file.

 

 

 


Booleans

Most languages don't have an explicit boolean data type, so they fake boolean by mapping true/false values into a subset of values of some other data type.

Logical negation: Given boolean b1, compute its logical inverse.
Logical AND: Given boolean b1,b2, compute their logical conjunction.
Logical OR: Given boolean b1,b2, compute their logical disjunction.

 

 

 


Numbers

Unary minus: Given number n1, compute its additive inverse.
Binary plus: Given numbers n1,n2, compute their sum.
Assignment plus: Given numeric variable n1, and number n2, replace the value of n1 by the sum of the old value of n1 and n2.
Binary minus: Given numbers n1,n2, compute n2 subtracted from n1.
Assignment minus: Given numeric variable n1, and number n2, replace the value of n1 by the difference of the old value of n1 minus n2.
Multiply: Given numbers n1,n2, compute their product.
Assignment multiply: Given numeric variable n1, and number n2, replace the value of n1 by the product of the old value of n1 times n2.
Divide: Given numbers n1,n2, compute quotient of n1 divided by n2.

Divide-floor, divide-ceiling, divide-truncate, divide-round: (depends on type and sign of numbers, see Floats or Integers)

Remainder (modulus): (depends on type and sign of numbers, see Floats or Integers)

 

 

 


Integers

Pre-increment: Given an integer variable n1, increase its value by one, and also return the result.
Post-increment: Given an integer variable, save the old value, increase the actual value by one, return the saved old value.
Pre-decrement: Given an integer variable n1, decrease its value by one, and also return the result.
Post-decrement: Given an integer variable, save the old value, decrease the actual value by one, return the saved old value.
Divide (exact result): Given integers n1,n2, divide n1 by n2, producing the exact integer or rational result depending on whether n1 is or is not an exact multiple of n2.
Divide-floor: Given integers n1,n2, in effect compute rational number which is the exact quotient of n1 divided by n2, but if it's not an integer then take instead the nearest integer that is smaller (closer to negative infinity), but actually produce that effect more efficiently without generating an intermediate rational result.
Divide-ceiling: Given integers n1,n2, in effect compute rational number which is the exact quotient of n1 divided by n2, but if it's not an integer then take instead the nearest integer that is larger (closer to positive infinity), but actually produce that effect more efficiently without generating an intermediate rational result.
Divide-truncate: Given integers n1,n2, in effect compute rational number which is the exact quotient of n1 divided by n2, but if it's not an integer then take instead the nearest integer that is smaller in absolute value (closer to zero), but actually produce that effect more efficiently without generating an intermediate rational result.
Divide-truncate assignment: Given integer variable n1, integer n2, compute the divide-truncate quotient of n1 divided by n2 as just described, then assign that as the new value of n1.
Divide-round: Given integers n1,n2, in effect compute rational number which is the exact quotient of n1 divided by n2, but if it's not an integer then take instead the nearest integer in either direction, but actually produce that effect more efficiently without generating an intermediate rational result.
Modulus: Given two integers n1,n2, divide n1 by n2, using the floor of the quotient, and return only the remainder, thus the return value is in the interval [0, n2) if n2 is positive, in the interval (n2, 0] if n2 is negative.
Remainder: Given two integers n1,n2, divide n1 by n2, using the truncate of the quotient, and return only the remainder, thus the return value is in the interval (-abs(n2), 0] if n1 is negative, in the interval [0, abs(n2)) if n1 is positive.
Remainder assignment: Given integer variable n1, integer n2, compute remainder using old value of n1 and n2, store result back in variable n1.
Quotient and remainder: Given integers n1,n2, divide n1 by n2, returning both quotient and remainder.
Pseudo-random numbers:

A complete table-of-contents for Common Lisp data types, constants, and functions dealing with Numbers (not just Integers) is here.

 

 

 


Floats

Divide: Given f1,f2 floats, compute f1 divided by f2 as closely as the floating-point representation will allow.
Divide assignment: Given f1 floating-point variable, f2 float, change the value of f1 to be the quotient of the old value of f1 divided by f2 as closely as the floating-point representation will allow.

Modulus: Stub (lisp)

Remainder: Stub (lisp)

Exponential, logarithm, square root, and related functions:
Trigonometric functions: Most of the languages have the usual functions for sine, cosine and tangent, while some also have cotangent, secant and cosecant. Details are too much for me to include here, so I'm referring you to documentation elsewhere:
Inverse trigonometric functions: Most of the languages have the usual functions for arcsine, arccosine and arctangent, while some also have arccotangent, arcsecant and arccosecant, and some also have the two-argument arc-tangent which uses the signs of the x,y values to uniquely determine the quadrant. Details are too much for me to include here, so I'm referring you to documentation elsewhere:
Hyperbolic functions, and inverse hyperbolic functions:

A complete table-of-contents for Common Lisp data types, constants, and functions dealing with Numbers (not just Floats) is here.

 

 

 


Pointers

In lisp and java, just about every object is handled via a pointer, but these pointers are invisible to the user/programmer. A very few data types are handled immediately, and which these are is implementation dependent in lisp. Furthermore, in lisp, even within a single formal data type, such as integer, some values are handled immediately whereas others are full-fledged objects in memory handled indirectly via pointers. For example, in most lisp systems, small integers are immediate while all other integers are "bignum" (similar to java "BigInteger" class) objects, and conversion between the two types is automatic as circumstances warrant (for example, a small integer calculation that "overflows" automatically builds a "bignum" to hold the complete result, while any "bignum" calculation whose result happens to be within the small integer range produces an immediate instead of indirect-pointer value). In java by comparison, immediate values and full-fledged objects are kept quite distinct, and any value that can exist as an immediate value can in fact exist in either form, and the programmer must explicitly call a method to convert between immediate numbers and number objects. For details about the conversion (in java) between immediate objects and "wrapper" object classes, see the specific type, such as [integer] or [float].

In the other languages (c, c++, etc.), by comparison, the user is given raw pointers to play with, and the slightest mistake in the program can corrupt unrelated places in memory that happen to be near the intended place, which can either corrupt data or totally destroy a functioning program. The rest of this section, and other sections relating pointers to other data types, deal with this topic of working with raw pointers.

Indirect-fetch (dereference): Given a pointer p1, get whatever data is located at the memory location pointed at by p1.
Indirect-store: Given a value val, and a pointer p1, store that value into the memory location pointed at by p1.
Pre-increment: Given a pointer p1, increase the pointer by one unit of data, and return the new pointer value.
Pre-decrement: Given a pointer p1, decrease the pointer by one unit of data, and return the new pointer value.
Post-increment: Given a pointer p1, save the old value, increase the pointer by one unit of data, and return the saved old value.
Post-decrement: Given a pointer p1, save the old value, decrease the pointer by one unit of data, and return the saved old value.

 

 

 


Integers and Pointers together

Multiple advance pointer (value): Given pointer p1, and positive integer n1, return new pointer which is p1 incremented by n1 times the size of one data unit.
Multiple backup pointer (value): Given pointer p1, and positive integer n1, return new pointer which is p1 decremented by n1 times the size of one data unit.
Multiple increment pointer in-place: Given pointer variable p1, and positive integer n1, modify p1 in-place to contain the old value of p1 incremented by n1 times the size of one data unit.
Multiple decrement pointer in-place: Given pointer variable p1, and positive integer n1, modify p1 in-place to contain the old value of p1 decremented by n1 times the size of one data unit.
Compute #dataitems offset: Given two pointers p1,p2 of the same type, where both point within the same array or other allocated block of memory and are separated by an exact multiple of that type of data, compute the offset from p2 to p1 as multiple of data item size.

 

 

 


Integers as bitmasks

Inside the machine, in several of these languages, integers are implemented in two's complement notation. The individual bits of this representation can be manipulated as if they formed a bit vector (1-d array), with 1 bits representing values of true and 0 bits representing values of false. The rightmost bit corresponds to the numeric value 1 and successively leftward bits correspond to successively larger powers of 2, except the leftmost bit (of signed integers only) which is the 2's-complement sign bit. The operations are:

Bitwise (1's) complement: Given integer bitmask n1, compute bitwise complement, which means every 1 bit becomes 0 bit and vice versa.
Bitwise AND: Given integer bitmasks n1,n2, compute bitwise logical conjunction.
Bitwise AND assignment: Given integer bitmask variable n1, integer bitmask n2, compute bitwise logical conjunction, store result back into n1.
Bitwise inclusive OR: Given integer bitmasks n1,n2, compute bitwise logical disjunction.
Bitwise inclusive OR assignment: Given integer bitmask variable n1, integer bitmask n2, compute bitwise logical disjunction, store back into n1.
Bitwise exclusive OR (XOR): Given integer bitmasks n1,n2, compute bitwise logical XOR.
Bitwise exclusive OR (XOR) assignment: Given integer bitmask variable n1, integer bitmask n2, compute bitwise logical XOR, store result back into n1.
Shift-left: Given integer n1 treated as bitmask, and small positive integer n2 treated as integer, compute new value obtained by shifting n1 leftward by n2 bits, using zero bits to fill the n2 vacated positions at the rightmost end.
Shift-left assignment: Given integer variable n1 treated as bitmask, and small positive integer n2 treated as integer, compute new value obtained by shifting n1 leftward by n2 bits, using zero bits to fill the n2 vacated positions at the rightmost end, store result back into n1.
Shift-right: Given integer n1 treated as bitmask, and small positive integer n2 treated as integer, compute new value obtained by shifting n1 rightward by n2 bits, discarding the n2 bits that run off the right end.
Shift-right assignment: Given integer variable n1 treated as bitmask, and small positive integer n2 treated as integer, compute new value obtained by shifting n1 rightward by n2 bits, discarding the n2 bits that run off the right end, store result back into n1.

lisp -- Numbers treated as bit vectors are treated as if there were an infinite number of copies of the leftmost bit (the sign bit) extending forever to the left. This is important to understand when combining bitwise operations earlier above and shifting operations immediately above. Non-negative numbers as bitmasks can represent the characteristic function of a finite set. Negative numbers as bitmasks then represent the characteristic function of an infinite set whose complement is finite. Shifting then represents Hilbert's Hotel room-shifting operations.

 

 

 


Numbers and Booleans together

Less-than: Given two numbers n1,n2, compare them, return true iff n1 is less than n2.
Less-than chain: Given three or more numbers n1,n2,...,ny,nz, compare adjacent pairs, return true iff every pair satisfies the less-than relation.
Less-than or equal: Given two numbers n1,n2, compare them, return true iff n1 is less than or equal to n2.
Greater-than: Given two numbers n1,n2, compare them, return true iff n1 is greater than n2.
Greater-than or equal: Given two numbers n1,n2, compare them, return true iff n1 is greater than or equal to n2.
Equal: Given two numbers n1,n2, compare them, return true iff n1 is equal to n2.
Not-equal: Given two numbers n1,n2, compare them, return true iff n1 is not equal to n2.
No pair equal: Given three or more numbers n1,n2,...,ny,nz, return true iff they are all distinct, i.e. no equal pairs anywhere in the set.

 

 

 


Special features of compiler and loader dealing with pointers

Obtain pointer to variable: Given a variable v1, look up in the compiler's memory map to see where the loader will set it up, and generate a pointer to that location, of the type type*, where type is the type of that variable.
Obtain size of a variable or datatype: Given variable v1, or datatype dt, look up the size (in bytes) of that type of object and return it as an integer.

 

 

 


Pointers and Booleans together

Less-than: Given pointers p1,p2, compare them, return true iff p1 points to memory location which is before memory location pointed at by p2.
Less-than or equal: Given pointers p1,p2, compare them, return true iff p1 points to memory location which is before or same as memory location pointed at by p2.
Greater-than: Given pointers p1,p2, compare them, return true iff p1 points to memory location which is after memory location pointed at by p2.
Greater-than or equal: Given pointers p1,p2, compare them, return true iff p1 points to memory location which is after or same as memory location pointed at by p2.
Equal: Given pointers p1,p2, compare them, return true iff p1 points to exactly the same memory location as does p2.
Not-equal: Given pointers p1,p2, compare them, return true iff p1 points to any different memory location from where p2 points.

 

 

 


Assignment and temporary binding

Store value in global and/or already-declared variable: Given name of variable var, and expression exp which computes a value val, store that val into var, replacing whatever value might have been there previously.
Parallel binding: Given simple variable names var1 var2 ... varz, and corresponding expressions for their initial values exp1 exp2 ... expz, temporarily bind those values to those variables, all in parallel (compute all the values first, then do all the bindings in one batch), then execute forms form1 form2 ... formz in that context, then return the value from formz.
Sequential nested binding: Given simple variable names var1 var2 ... varz, and corresponding expressions for their initial values exp1 exp2 ... expz, each of which (except the first) may include references to any of the earlier variable names, temporarily bind those values to those variables, sequentially: First evaluate exp1 in the original context, then bind var1 to that value. Next in the enhanced context with that new binding, evaluate exp2, then bind var2 to that value, etc. sequentially until varz has been bound to the value from expz. Then execute forms form1 form2 ... formz in that context, then return the value from formz.

 

 

 


Scope resolution

Namespaces are handled so very differently in these languages (Common lisp / c++ / java) that there's virtually no commonality that can be collected together into a single kind of action, so I'll simply describe each mechanism separately. By comparison, c perl and PHP have only a single namespace, no scope resolution required.

Common lisp has packages, in which symbols are located. Each symbol in turn may have a function and/or a variable and/or other properties attached to it. Each symbol also has a link back to the package in which it is primarily interned, or a null link if it's non-interned. The package name is a simple name, generally following the syntax of the local name of a symbol. These symbols and packages are actual objects accessible to your program at runtime, not just symbols used by the compiler and loader, so you can explore them interactively from the read-eval-print loop if you wish.
The following cases apply:
The following packages are standard in every Common Lisp implementation:

C++ has namespaces which are somewhat analogous to Common Lisp's packages. But they are purely a compiler feature which is communicated to the loader (to provide correct linkage): there's no such thing as a symbol at runtime, hence no package-like namespace at runtime, and there's no such thing as a keyword package and no such thing as a non-interned symbol. However there is such a thing as a "global" namespace. Also a sub-namespace is created by each struct, creating a hierarchy of namespaces if the struct is itself inside a namespace.
The following cases apply:
The following namespace is standard in most c++ implementations:

Java has a whole hierarchy (tree) of namespaces, where the leaf nodes are called classes and the next layer above them are called packages. One major branch of this tree has the name java at the top and contains all packages and classes directly defined by Sun Microsystems for inclusion in the java language as they define it. There's another branch of the tree for each major vendor or industry group that provides add-ons to what Sun provides. Finally there's a private branch for users to define their own local classes. There's a system variable called CLASSPATH which defines where all the roots of all the trees can be found on the local filesystem. When you specify the fully qualified name of a class, it'll match any directory path from any of these starting points, so it's a good idea not to make your own private sub-directory from your personal starting point that exactly matches one of the toplevel names defined by Sun or other public sources. Within each directory in the trees, jar files may substitute for actual filesystem sub-directories, so you need a class browser, not just shell commands, to explore deep into these hierarchies.

Java uses a period (British "full stop") to separate naming levels within the package hierarchy, and to separate the package name from the class name, and to separate the class name from either a static method name or a static member name, and to separate a reference to an instance from a method name (static or instance). The following combinations are possible: The following public packages are most important to know about:

There's pretty good online documentation for public java packages, and the classes and interfaces within each such package, and the constants and static variables and methods within each such class (and virtual methods within each such interface), a multi-level document organized in exactly that way, for example: version 1.3 / version 1.4.2

 

 

 


Type conversion

Safe conversion: Silently convert from a low-precision or small-range datatype to a higher-precision or wider-range datatype as needed:
Unsafe conversion (casting): Convert in the reverse direction of that chain, discarding low-order or high-order bits to cram the wide data into a narrow spot if it doesn't exactly fit, or discarding the fractional part of a non-integer, to obtain an "equivalent" (cough cough) value of the new data type. The expression produces a value of the wider type, while you need to generate a value of newtype which is narrower:

 

 

 


Arrays (in general, or specifically multi-dimensional)

Allocating static arrays: At the global level, the array is created at the time the program is started, and disposed at the time the program exits. Within any block, such as the body of a function definition, the array is created anew each time the block is entered (such as when the function is called), and disposed each time control passes out of the block (such as when the function returns). Given the desired type of elements, the desired name of the symbol used to reference the array, and the desired numbers n1 n2 ... nz of elements along each dimension, allocate the array.
Allocating dynamic arrays: You can make a new array any time you want, and get rid of it any time you want. It's a "first-class citizen", just like numbers, able to be passed around by reference, attached to multiple places such as variables or fields in containers. Given the desired numbers n1 n2 ... nz of elements along each dimension, allocate the array.
Array indexing: Given an array already allocated, of dimension k, and integer indexes i1, i2, ... ik, reference the element at that position within the array.

 

 

 


Dynamic allocation of memory and objects

c:
c++:
In the other languages we don't allocate a block of memory and later copy data into it to build our desired object. Instead we create a new object occupying newly-allocated memory all in one operation:

 

 

 


Structures/Records

A structure, or record, is a compact organization of different types of data, as opposed to an array which is a regular repetition of exactly the same type of data (possibly a generic pointer) without variation. To specify the organization of a structure/record, instead of specifying a single type and then saying how many copies are required along each axis, the type of each component must be explicitly stated individually. Furthermore the organization of a structure/record can be nested, whereby sub-structures are part of the overall structure being defined. These sub-structures may be copies of previously defined structures, or layers of structure being defined at the same time as the overall structural organization. The way that structures are defined is sufficiently different in the various languages, and sufficiently complicated in each, with not much the same between language families, that I'm treating them in separate paragraphs here:

c has several variant syntaxes:
c++ similar to c:

Note that when directly nesting structures, the large structure is actually a very large structure which physically contains the inner structures within it. There is no way that two different instances of the same large structure can share any part of their insides, simply because memory in a computer is contiguous and it's impossible for the rest of the larger structure before or after the shared component to be in two different places at the same time. If you want shared structures between different large records, you use a pointer instead of direct physical nesting. Then you can have two instances of the large structure (not so large as before), each containing a field which points to exactly the same instance of the smaller structure virtually inside both of the larger structures. Of course the way you reference a separate object only pointed-at from inside the main object is different from how you'd reference a sub-object truly inside the larger object. Also since you are allocating the main object and the sub-object separately, you have a little more work to do there.

In lisp you don't have the option of physically nesting sub-structures within a larger structure, so you always use pointers from the large record to the included record. Since lisp has a garbage collector to automatically get rid of anything no longer accessible to your program, you don't have to go around chasing pointers to get rid of all the items linked from the main structure, so this is much more practical than in c (or even c++). To define a record organization and type simultaneously:

In java there aren't structures formally defined in the language. Instead, to emulate such, you define classes which have only instance variables (fields), no static (global) variables, and no methods whatsoever. This can get messy, because each class generates a separate object file, and to make compilation automatic as needed each class should be defined in a separate source file, so your directory can get huge if you do this a lot.

Instantiation/allocation of structures is simple enough that I'll do all the languages together as usual in this matrix document:
Referring to an entire structure:
Referring to a slot (field) within a structure:

 

 

 


Characters and Booleans

String-character? Given character ch, return a true value iff the character is of the special type that can be included as an element in a string.
Is character lower case? Given character ch, return a true value if it's a lower-case letter, a false value otherwise.
Is character upper case? Given character ch, return a true value if it's an upper-case letter, a false value otherwise.
Is character upper/lower convertible? Given character ch, return a true value if it's an upper-case letter convertible to lower case, or vice versa, a false value otherwise.
Is character alphabetic? Given character ch, return a true value if it's alphabetic, a false value otherwise.
Is character a decimal digit? Given character ch, return a true value if that character represents a digit in the decimal number system, a false value otherwise.
Is character alphanumeric? Given character ch, return a true value if that character is alphanumeric (letter or digit), a false value otherwise.
Is character a hexadecimal digit? Given character ch, return a true value if that character is a digit in the standard hexadecimal system (0..9, then A..F or a..f), a false value otherwise.
Is character a digit per some base? Given character ch, and integer base in the range 2 thru 36, return a true value if that character is a digit in that particular base (an extension of the standard hexadecimal system: 0..9, then A..F..Z or a..f..z as needed), a false value otherwise.
Is character punctuation? Given character ch, return a true value if that character is punctuation (printing, but neither alphanumeric nor space), a false value otherwise.
Is character whitespace? Given character ch, return a true value if that character is whitespace, a false value otherwise.
Is character blank? Given character ch, return a true value if that character is blank (space or tab), a false value otherwise.
Is character graphic? Given character ch, return a true value if that character is graphic (has a glyph associated with it), a false value otherwise.
Is character printing? Given character ch, return a true value if that character is printing (graphic or the space character, but not tab), a false value otherwise.
Is character a control? Given character ch, return a true value if that character is a control (anything not printing, including tab), a false value otherwise.
Is character a standard US/UK ASCII character? Given character ch, return a true value if that character is in the standard US/UK 7-bit ASCII character set, a false value otherwise.
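Most of the tests above map directly onto the standard c predicates in <ctype.h> (islower, isupper, isalpha, isdigit, isalnum, isxdigit, ispunct, isspace, isblank, isgraph, isprint, iscntrl). Standard c has no "digit per some base" test and no portable "is ASCII" test, so here is a minimal sketch of those two. The names is_digit_in_base and is_us_ascii are my own for illustration, not library functions, and the lookup-table approach is just one way to do it (it avoids assuming that the letters are contiguous in the character encoding).

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Digits in order of weight: '0' is 0, 'a' is 10, 'z' is 35. */
static const char DIGITS[] = "0123456789abcdefghijklmnopqrstuvwxyz";

/* Hypothetical helper: true if ch is a digit in the given base (2..36).
   Upper-case and lower-case letters are both accepted. */
bool is_digit_in_base(int ch, int base)
{
    assert(base >= 2 && base <= 36);
    if (ch <= 0 || ch > UCHAR_MAX)
        return false;               /* also rejects NUL, which strchr would "find" */
    const char *p = strchr(DIGITS, tolower(ch));
    return p != NULL && (p - DIGITS) < base;
}

/* Hypothetical helper: true if ch is in the 7-bit US/UK ASCII range. */
bool is_us_ascii(int ch)
{
    return ch >= 0 && ch <= 127;
}
```

For example, is_digit_in_base('F', 16) is true and is_digit_in_base('8', 8) is false. When calling the <ctype.h> predicates on a plain char, cast it to unsigned char first, because passing a negative value (other than EOF) is undefined.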

In c, characters (declared as type char) are simply very short integers (8 or 9 bits) considered as if characters, whereas in Common Lisp, characters are a whole separate kind of data, runtime-distinguishable from any integer (although deep inside the character object there is of course the ASCII code for the character). Accordingly in c you compare the ASCII numeric values of characters just by directly comparing them as integers, whereas in Common Lisp you call special functions for comparing them (or convert to their ASCII values and compare as integers, which is more work, so why bother?). (More about characters in these various languages in the toplevel Characters section.)

Character equality: Given two characters ch1 and ch2, return a true value if they are exactly the same character, otherwise a false value.
Character inequality: Given two characters ch1 and ch2, return a false value if they are exactly the same character, otherwise a true value.
Character less-than: Given two characters ch1 and ch2, return a true value if ch1 comes before ch2, false otherwise.
Character greater-than: Given two characters ch1 and ch2, return a true value if ch1 comes after ch2, false otherwise.
Character less-than-or-equal: Given two characters ch1 and ch2, return a true value if ch1 comes before ch2, or if ch1 is the same character as ch2, false otherwise (if ch1 comes after ch2).
Character greater-than-or-equal: Given two characters ch1 and ch2, return a true value if ch1 comes after ch2, or if ch1 is the same character as ch2, false otherwise (if ch1 comes before ch2).

In lisp, case-insensitive comparisons of characters are performed using a special character ordering where the corresponding upper-case and lower-case letters are equated pairwise. Whether this is done by simply mapping all upper-case characters to lower-case before comparing, or vice versa, or by using a totally different scheme for ordering characters, is implementation dependent. These comparisons are used mostly to compare alphabetic words, where all implementations produce the same results.

Case-insensitive character equality: Given two characters ch1 and ch2, return a true value if they are the same character, ignoring distinctions between case (upper/lower), otherwise a false value.
Case-insensitive character inequality: Given two characters ch1 and ch2, return a false value if they are the same character, ignoring distinctions between case (upper/lower), otherwise a true value.
Case-insensitive character less-than: Given two characters ch1 and ch2, return a true value if ch1 comes before ch2, ignoring distinctions between case (upper/lower), false otherwise.
Case-insensitive character greater-than: Given two characters ch1 and ch2, return a true value if ch1 comes after ch2, ignoring distinctions between case (upper/lower), false otherwise.
Case-insensitive character less-than-or-equal: Given two characters ch1 and ch2, return a true value if ch1 comes before ch2, or if ch1 is the same character as ch2, ignoring distinctions between case (upper/lower), false otherwise (if ch1 comes after ch2, ignoring case).
Case-insensitive character greater-than-or-equal: Given two characters ch1 and ch2, return a true value if ch1 comes after ch2, or if ch1 is the same character as ch2, ignoring distinctions between case (upper/lower), false otherwise (if ch1 comes before ch2, ignoring case).
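C has no built-in case-insensitive character comparison, so you have to pick a mapping yourself. Here is a minimal sketch that maps both characters to lower case with the standard tolower and then compares the results as integers. The name char_compare_nocase is my own for illustration, and mapping to lower case (rather than upper case) is an arbitrary choice of exactly the kind that lisp leaves implementation dependent.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: three-way comparison ignoring case.
   Returns -1 if ch1 sorts before ch2, +1 if after, 0 if they match. */
int char_compare_nocase(char ch1, char ch2)
{
    /* Cast to unsigned char first: tolower is undefined for negative values. */
    int a = tolower((unsigned char)ch1);
    int b = tolower((unsigned char)ch2);
    return (a > b) - (a < b);
}

/* Usage: each of the six comparisons is a single test on the result. */
void char_compare_nocase_examples(void)
{
    assert(char_compare_nocase('a', 'A') == 0);   /* equality */
    assert(char_compare_nocase('a', 'B') < 0);    /* less-than */
    assert(char_compare_nocase('Z', 'y') >= 0);   /* greater-than-or-equal */
}
```

The choice of mapping matters only for non-letters. On an ASCII system the underscore sits between the upper-case and lower-case letters, so this lower-casing version puts '_' before every letter, whereas an upper-casing version would put it after every letter.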

 

 

 


Characters

In c, characters (declared as type char) are simply very short integers (8 or 9 bits) considered as if characters, whereas in Common Lisp, characters are a whole separate kind of data, runtime-distinguishable from any integer (although deep inside the character object there is of course the ASCII code for the character). Accordingly in Common Lisp you can intermix characters and numbers in containers such as linked-lists and arrays, and later when you retrieve such an object the type system will tell you which kind of object it is. In c, by comparison, you can't ever intermix (*) characters (as type char) and integers, because they occupy different amounts of storage, which must be known at compile time. If you expand characters to more bits to occupy the same amount of storage as some type of integer, and intermix them with true integers, there's no way to later tell them apart.

(*) By "intermix" I mean like in an array or linked-list or other uniform-contents container. Of course you can define structures that have special slots occupied by characters and other slots occupied by integers, but that's not what I'm talking about. I'm referring only to containers where at compile time it isn't yet known which elements will contain characters and which will contain integers.

(Some older versions of lisp, such as MacLisp and Emacs-lisp, didn't have characters either, and used integers instead, much like c. But the only dialect of lisp covered in this "cookbook/matrix" document is Common Lisp, so not to worry.)

In java, you can have either a character or an integer as a full-fledged object, whose type can be checked at run time, but you can also have primitive types, where you must declare the type at compile time and can't intermix different types at all in a container. (See the [Characters and Integers] section for how to convert from one type to the other in lisp or java.)

Perl doesn't have characters either, but instead of using integers it uses single-character strings as stand-ins for character objects.

Convert to lower case: Given character ch which is an upper case letter, return the lower-case equivalent.
Convert to upper case: Given character ch which is a lower case letter, return the upper-case equivalent.
Convert to US-ASCII: If ch is a character not in the US/UK ASCII character set, force it into that character set by simply zeroing out all bits except the seven low-order bits.
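In c the two case conversions are the standard tolower and toupper from <ctype.h>, and forcing a value into the 7-bit range is just a bit mask. Here is a minimal sketch. The wrapper names (to_lower_char, to_upper_char, force_us_ascii) are my own for illustration. Some systems also provide a toascii function, but it isn't part of standard c, so the sketch uses the mask directly.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical wrappers around the standard <ctype.h> conversions.
   The unsigned char cast avoids undefined behavior for negative chars. */
char to_lower_char(char ch)
{
    return (char)tolower((unsigned char)ch);
}

char to_upper_char(char ch)
{
    return (char)toupper((unsigned char)ch);
}

/* Zero out all bits except the seven low-order bits. */
int force_us_ascii(int ch)
{
    return ch & 0x7F;
}

void case_conversion_examples(void)
{
    assert(to_lower_char('Q') == 'q');
    assert(to_upper_char('q') == 'Q');
    assert(to_lower_char('5') == '5');      /* non-letters pass through unchanged */
    assert(force_us_ascii(0xC1) == 0x41);   /* 0xC1 & 0x7F */
}
```

Note that tolower and toupper return their argument unchanged when there is no corresponding letter, so it's safe to call them on any character without first testing its case.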

 

 

 


Characters and Strings

In lisp, a string is internally a vector (one-dimensional array), allowing all the usual array operations on it (see the sections [Arrays] and [Arrays and Integers] for details), but is also considered to be a sequence, allowing all the usual sequence operations on it (see the sections [Sequences] and [Sequences and Integers] for details). This section deals only with functions that work only with strings in regard to their character elements, except where a more general sequence function serves as an equivalent of a string-specific function in another language.

In c, there's no string type in the first place. One-dimensional arrays of characters, containing non-zero bytes terminated by a zero (NUL) byte, serve as "strings". See the sections [Arrays and Integers] and [Strings] for relevant information.

The rest of this section deals with advanced relationships between strings and characters, beyond simply indexing elements within character arrays which is covered in [Strings and Integers] and [Vectors and Integers].

Get name of character: Given character ch, if the character has a name, return that name, otherwise return a false value.
Get character with given name: Given string str, if the string is the name of some character, return that character, otherwise return a false value.

 

 

 


Characters and Integers

In c, there's no such data type as character truly distinct from integers. Instead, character literals are really integers, and any sufficiently small non-negative integer can be treated as a character when printing (see printf for details of how to achieve that effect). Conversion happens implicitly whenever the other type is needed, such as across an assignment, or when a character is fed into an arithmetic function/operator. Consequently no special functions are needed to convert back and forth.

In lisp, integers and characters are two completely different data types, so conversion functions are necessary. See the section Characters for why this design decision is sometimes better.

Get numeric code for given character: Given character ch, return the corresponding numeric code, per the system's standard character encoding (usually ASCII, but might be EBCDIC on some systems).
Get character for given numeric code: Given integer n, if it's the code for a character, per the system's standard character encoding (usually ASCII, but might be EBCDIC on some systems), then return that character.
Convert weight of digit to character representation of digit: Given integer base (in range 2 to 36), and integer n (in range 0 to base-1), return the character representing that digit.
Is character a valid digit? If so, what weight? Given integer base (in range 2 to 36), and character ch, if it's a valid digit in that base then return its numeric value (weight), otherwise return a false result.
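In c the first two conversions need no function at all (int code = ch; and char ch = code; convert implicitly), but the two digit conversions have no standard equivalent. Here is a minimal sketch of them. The names digit_to_char and char_to_digit are my own for illustration, and returning -1 as the "false result" is an assumption of this sketch, since c has no separate false value that couldn't be confused with the weight 0.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Digit characters indexed by weight: '0' is 0, 'A' is 10, 'Z' is 35. */
static const char DIGIT_CHARS[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Hypothetical helper: weight (0..base-1) to the character for that digit. */
char digit_to_char(int weight, int base)
{
    assert(base >= 2 && base <= 36);
    assert(weight >= 0 && weight < base);
    return DIGIT_CHARS[weight];
}

/* Hypothetical helper: character to its weight in the given base,
   or -1 if ch is not a valid digit in that base. Either case is accepted. */
int char_to_digit(char ch, int base)
{
    assert(base >= 2 && base <= 36);
    if (ch == '\0')
        return -1;                  /* strchr would otherwise match the terminator */
    const char *p = strchr(DIGIT_CHARS, toupper((unsigned char)ch));
    if (p == NULL)
        return -1;
    int weight = (int)(p - DIGIT_CHARS);
    return weight < base ? weight : -1;
}
```

For example, digit_to_char(11, 16) gives 'B', and char_to_digit('b', 16) gives 11, while char_to_digit('8', 8) gives -1.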

 

 

 


Characters and anything

Coerce to character if possible: Given anything foo, if it can be coerced to a character, return that character, otherwise return a false result:

 

 

 


Strings and Integers

Parse integer: Given a string str containing the representation of an integer in some base, return the integer numeric value:

Safe parsing of integer, validation of input from user or other unsafe source in six languages.
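As a minimal sketch of what validated parsing looks like in c, here is a wrapper around the standard strtol that rejects empty input, trailing junk, and out-of-range values. The name parse_long and the bool-plus-output-parameter interface are my own choices for illustration, not anything standard. Note that strtol itself skips leading whitespace and accepts an optional sign, and this sketch keeps that behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole of str as an integer in the given
   base (0, or 2..36, as accepted by strtol). On success store the value
   in *out and return true; otherwise leave *out alone and return false. */
bool parse_long(const char *str, int base, long *out)
{
    assert(str != NULL && out != NULL);
    char *end;
    errno = 0;
    long value = strtol(str, &end, base);
    if (end == str)
        return false;               /* no digits were consumed */
    if (*end != '\0')
        return false;               /* trailing characters after the number */
    if (errno == ERANGE)
        return false;               /* value doesn't fit in a long */
    *out = value;
    return true;
}
```

For example, parse_long("42", 10, &n) succeeds with n set to 42, whereas parse_long("12abc", 10, &n) and parse_long("", 10, &n) both fail.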

 
Length of string: Given string str, return its effective length.
Copy string (overwriting, exact length): Given string strfrom, mutable byte vector strto, and size telling how many characters to copy, copy exactly that number of characters from strfrom, overwriting the initial segment of strto.
Concatenate strings (overwriting, exact length): Given string strfrom, mutable byte vector strto, and size telling how many characters to copy, copy exactly that number of characters from strfrom, overwriting starting at the effective end of strto.
Compare strings lexicographically: Given strings str1,str2, return -1 or +1 depending on the direction of the first difference (or, if one string exactly matches a prefix of the other, on which string is longer), or 0 if all corresponding characters are the same and the lengths are the same.
Compare prefixes of strings lexicographically: Given strings str1,str2, and size of portion to compare, return -1 or +1 depending on the direction of the first difference, or 0 if all corresponding characters are the same.
Find character in string: Given string str, and character ch, find the first instance of ch as an element of the string.
Find character in string, searching backwards: Given string str, and character ch, find the last instance of ch as an element of the string.
Find substring in string: Given strings needle,haystack, find the first location where needle exactly matches a substring of haystack.
Skip over particular characters in string: Given strings str,bag, find the first character of str which is not any one of the characters in bag.
Skip until first particular character in string: Given strings str,bag, find the first character of str which is any of the characters in bag.
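In c these tasks correspond to the standard <string.h> functions strlen, strncpy, strncat, strcmp, strncmp, strchr, strrchr, strstr, strspn and strcspn. Two details differ from the descriptions above: strcmp and strncmp only promise a negative, zero, or positive result (not exactly -1 or +1), and the search functions return pointers rather than integer positions. Here is a minimal sketch of thin wrappers that give the integer results. All the wrapper names are my own for illustration, and returning -1 for "not found" is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Normalize strcmp's result (any negative or positive int) to -1, 0 or +1. */
int compare_strings(const char *str1, const char *str2)
{
    assert(str1 != NULL && str2 != NULL);
    int r = strcmp(str1, str2);
    return (r > 0) - (r < 0);
}

/* Index of the first ch in str, or -1 if absent. */
long find_char(const char *str, char ch)
{
    const char *p = strchr(str, ch);
    return p != NULL ? (long)(p - str) : -1;
}

/* Index of the last ch in str, or -1 if absent. */
long find_char_last(const char *str, char ch)
{
    const char *p = strrchr(str, ch);
    return p != NULL ? (long)(p - str) : -1;
}

/* Index where needle first occurs within haystack, or -1 if absent. */
long find_substring(const char *haystack, const char *needle)
{
    const char *p = strstr(haystack, needle);
    return p != NULL ? (long)(p - haystack) : -1;
}

/* Index of the first character of str that is NOT in bag. */
size_t skip_chars(const char *str, const char *bag)
{
    return strspn(str, bag);
}

/* Index of the first character of str that IS in bag. */
size_t skip_until(const char *str, const char *bag)
{
    return strcspn(str, bag);
}
```

For skip_chars and skip_until, when no such character exists the result is the length of str, which is the index of the terminating NUL byte.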

 

 

 


Strings and Floats

Parse float: Given string str containing the representation of a floating-point number, parse it, return the float (numeric) value:
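In c the standard function for this is strtod. Here is a minimal sketch of a validating wrapper in the same style as for integers. The name parse_double and its interface are my own for illustration. Treating every ERANGE report as a failure is a choice of this sketch: strtod reports overflow that way, and may also report underflow that way, which some programs would prefer to accept as zero.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse the whole of str as a floating-point number.
   On success store the value in *out and return true; otherwise return false. */
bool parse_double(const char *str, double *out)
{
    assert(str != NULL && out != NULL);
    char *end;
    errno = 0;
    double value = strtod(str, &end);
    if (end == str)
        return false;               /* nothing that looks like a number */
    if (*end != '\0')
        return false;               /* trailing characters after the number */
    if (errno == ERANGE)
        return false;               /* out of range for a double */
    *out = value;
    return true;
}
```

For example, parse_double("-1e3", &x) succeeds with x set to -1000.0, whereas parse_double("1.5x", &x) fails.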

 

 

 


Strings

Strings in c are nothing more than byte vectors (1-d arrays) whose individual elements contain character codes (ASCII on most systems, EBCDIC on some) for non-NUL characters, followed by a single NUL byte just after the end of the string, followed by junk from there to the end of the allocated vector. These can exist in three forms. String literals aren't allowed to be modified in either contents or length. Static declared allocations can be modified per both contents and length, but only so long as the effective length plus the extra NUL byte doesn't exceed the total allocation size. Warning: no runtime check for such out-of-bounds overwrite happens, and that's a common cause of trashing memory and/or violating security. Dynamic allocations are similar to static declared allocations, except that realloc might be able to change the total allocation size, or might make a copy elsewhere.
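Because c does no runtime bounds check, the caller has to carry the allocation size around and check it by hand. Here is a minimal sketch of a copy that never writes past the end of the destination. The name copy_bounded is my own for illustration, and silently truncating (while reporting it through the return value) is just one possible policy; refusing to copy at all would be another.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: copy src into dst, which has room for dst_size bytes
   in total (including the NUL). The result is always NUL-terminated.
   Returns true if all of src fit, false if it had to be truncated. */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t len = strlen(src);
    bool fits = len < dst_size;                 /* need len bytes plus the NUL */
    size_t n = fits ? len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

With char buf[6], copying "hello" fits exactly (five characters plus the NUL), whereas copying "hello, world" returns false and leaves "hello" in buf.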

Strings in Common Lisp are all allocated objects. These can exist as string literals (which aren't allowed to be modified in either contents or length), or as runtime-constructed objects (which can be modified by per-character overwriting but the length is constant, unless the string is created with a fill pointer, and even then the length can't exceed the allocated size unless the string was also created adjustable). See [Vectors] for details about fill pointer and adjustable. All array indexing, including indexing within strings, is fully checked against index out of bounds, signalling an exception when that happens, thereby preventing trashing of memory.

Strings in java are all allocated objects, and are all totally immutable regardless of whether they are string literals or constructed at runtime. But StringBuffers act like strings that can be expanded, and otherwise modified, after creation. Despite the commonality of primitive data types between c and java, arrays of characters are not used as strings in java, although arrays of integers might at times be used to hold buffers of raw/numeric data which might contain character codes.

The functions defined below, and in other sections relating Strings to other data types, reflect these implementation differences between the several languages.

Copy string (overwriting): Given string strfrom, and mutable byte vector strto, copy strfrom to overwrite the initial segment of strto.
Copy string (allocate new): Given string strfrom, allocate a new copy of it, return pointer to the copy.
Concatenate strings (overwriting): Given string strfrom, and mutable byte vector strto, copy strfrom to overwrite strto starting past the effective end of strto.
Concatenate strings (allocating new): Given strings str1,str2, allocate a new string which is the concatenation of them.
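In c the two overwriting forms are the standard strcpy and strcat. The two allocating forms have no function in the c standard I'm covering here (many systems offer strdup for the copy, but that comes from POSIX), so here is a minimal sketch of both built from malloc and memcpy. The names copy_string_new and concat_new are my own for illustration. The caller owns the result and must free it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a new copy of src.
   Returns NULL if the allocation fails. */
char *copy_string_new(const char *src)
{
    assert(src != NULL);
    size_t len = strlen(src);
    char *copy = malloc(len + 1);           /* +1 for the NUL byte */
    if (copy != NULL)
        memcpy(copy, src, len + 1);         /* copies the NUL too */
    return copy;
}

/* Hypothetical helper: allocate a new string holding str1 followed by str2.
   Returns NULL if the allocation fails. */
char *concat_new(const char *str1, const char *str2)
{
    assert(str1 != NULL && str2 != NULL);
    size_t len1 = strlen(str1);
    size_t len2 = strlen(str2);
    char *result = malloc(len1 + len2 + 1);
    if (result == NULL)
        return NULL;
    memcpy(result, str1, len1);
    memcpy(result + len1, str2, len2 + 1);  /* copies str2's NUL too */
    return result;
}
```

Because the total size is computed before anything is written, these allocating forms can't overrun a buffer the way strcpy and strcat can.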

 

 

 


Vectors and Integers

Copy block of bytes (overwriting, exact length): Given pointers ptrfrom and ptrto, and size telling how many bytes to copy, copy exactly that number of bytes starting from wherever ptrfrom points, overwriting starting wherever ptrto points.
Copy block of bytes (overwriting, exact length, or to delimiter): Given pointers ptrfrom and ptrto, integer c, and size telling how many bytes to copy, copy exactly that number of bytes starting from wherever ptrfrom points, overwriting starting wherever ptrto points, except stop early if a byte matching c is encountered.
Fill block of bytes (overwriting, exact length): Given pointer ptrto, integer c, and size telling how many bytes to fill, write copies of c repeatedly, filling the block starting wherever ptrto points, for a total of size copies.
Fill block of bytes with zero (overwriting, exact length): Given pointer ptrto, and size telling how many bytes to fill, write zero bytes repeatedly, filling the block starting wherever ptrto points, for a total of size bytes.
Compare blocks of bytes lexicographically: Given pointers ptr1,ptr2, and size telling how many bytes to compare, return -1 or +1 depending on the direction of the first difference, or 0 if all corresponding byte-pairs are the same.
Find byte in block: Given pointer ptr, byte by, and size telling how many bytes to search, find the first byte matching by within memory starting at ptr.
Find sub-block in block: Given pointer needle, needle_len telling how many bytes to try to match, pointer haystack, haystack_len telling how many bytes total to search within, find the first sub-block of haystack exactly matching needle.
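In standard c, copying, filling, comparing and byte search are memcpy, memset, memcmp and memchr from <string.h>. Finding a sub-block has no standard function (some systems offer memmem as an extension), so here is a minimal sketch of one, plus a comparison normalized to -1, 0 or +1, since memcmp only promises the sign. The names find_block and compare_blocks are my own for illustration, and the straightforward scan shown is not tuned for speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the first sub-block of haystack that
   exactly matches needle, or NULL if there is none. */
const void *find_block(const void *haystack, size_t haystack_len,
                       const void *needle, size_t needle_len)
{
    assert(haystack != NULL && needle != NULL);
    const unsigned char *h = haystack;
    const unsigned char *n = needle;
    if (needle_len == 0)
        return haystack;                    /* empty needle matches at the start */
    if (needle_len > haystack_len)
        return NULL;
    size_t last = haystack_len - needle_len;    /* last possible start offset */
    for (size_t i = 0; i <= last; i++) {
        if (h[i] == n[0] && memcmp(h + i, n, needle_len) == 0)
            return h + i;
    }
    return NULL;
}

/* Normalize memcmp's result (any negative or positive int) to -1, 0 or +1. */
int compare_blocks(const void *ptr1, const void *ptr2, size_t size)
{
    int r = memcmp(ptr1, ptr2, size);
    return (r > 0) - (r < 0);
}
```

Unlike the string functions, these treat a zero byte as ordinary data, so they work on blocks containing embedded NUL bytes.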

 

 

 


Floats and Integers

Test to distinguish positive or negative infinity: Given float x, return -1 if x represents negative infinity, 1 if x represents positive infinity, and 0 otherwise.
Split out power of 2 from float: Given float x, multiply or divide by a power of 2 such that the result is at least 1/2 but less than 1, return that scaled float and the exponent of 2 that was used.
Scale float by specified power of 2: Given float x, and integer exponent, return the value x * (2**exponent).
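In c the last two tasks are the standard frexp and ldexp from <math.h>. For the infinity test, the standard isinf macro (C99) only promises a nonzero result for either infinity, although some libraries do return -1 or +1, so here is a minimal sketch that adds the sign explicitly. The names infinity_sign, split_power_of_2 and scale_by_power_of_2 are my own for illustration.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: -1 for negative infinity, +1 for positive infinity,
   0 for anything else (including NaN). */
int infinity_sign(double x)
{
    if (!isinf(x))
        return 0;
    return x > 0 ? 1 : -1;
}

/* Returns the scaled value, with magnitude in [1/2, 1) for finite nonzero x,
   and stores the exponent, so that x == result * 2**(*exponent). */
double split_power_of_2(double x, int *exponent)
{
    assert(exponent != NULL);
    return frexp(x, exponent);
}

/* Returns x * 2**exponent. */
double scale_by_power_of_2(double x, int exponent)
{
    return ldexp(x, exponent);
}
```

For example, splitting 8.0 gives 0.5 with exponent 4, and scaling 0.75 by exponent 3 gives 6.0. Since c can return only one value, the exponent comes back through a pointer argument.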

 

 

 


Floats and Booleans

 

 

 


how to contact me

Copyright 2007 by Robert Elton Maas, all rights reserved