List Operations

Mathscript is derived from LISP. It shares with LISP a faculty that makes it especially well adapted to create powerful simulations. That is, its ability to create complex and rich structures dynamically. This it does through the manipulation of lists. In this section, we discuss a few of the list-manipulation functions that are available to MathScript, and we discuss the special conventions for manipulating lists.

For lists in general, the relevant Commands and Functions are:

In what follows, it will be important to distinguish between the names of things and the things themselves. An object in Mathscript, say a Matrix, may have a name. And again, it may not. For example, if M and N are the names of matrices of the same dimension, it means that, upon evaluation, the value[M] and value{N] are matrix objects. It is also true that value[M+N] is a matrix object, but would be incorrect to say that M+N is the name of that object. In Lisp, and in Mathscript, the relationship between name and object is different from the relationship between those things in Mathematics. In Mathscript, the name is not identical to the value. There always intervenes the process of evaluation which replaces the name by the object that the name represents.

In Mathscript, atoms are the most elementary units of data. A variable is a name that evaluates to itself. Decimal numbers, strings, vectors, matrices, constants, and pointers also evaluate to themselves. These items in the language that evaluate to themselves are called Atoms. All other items in the language (functions, non-atomic expressions, rational numbers, equations, and Lists) are composite objects, constructed in one way or other, from Lists. It will be useful in this section, therefore, to distinguish between Atoms and Lists.

The atom False plays a unique role. It is the name of the Empty List: list().

Now the other type of basic data structure in MathScript, the List, is most simply defined recursively:

A binary tree is a pair (a:b) with the property that a and b are either atoms or binary trees. A picture of the binary tree (a:b)

is: *

/ \

a b

We shall be concerned only with finite binary trees. This means that only a finite number of atoms appear in the tree. Now symbols such as (a:b) are called 'dotted pairs'. We use the semicolon instead of the period in this implementation to avoid confusion with decimal points.

Actually, the term 'tree' indicates that there is no 'turning back' in this structure, and is strictly speaking a misnomer. More general directed graphs with two exit edges from each node are possible but we shall not discuss these yet.

Now finite binary trees, while they can be read and written by the system are not used very much. Instead, a third type of data structure, called a List will dominate. Still it should be kept in mind that MathScript represents nearly everything to itself as either atom or binary tree. We shall use the word item to refer to an entity which is either an atom or a list.

We discuss now the internal representation of Lists. First of all, a list is an ordered finite sequence of items enclosed in parentheses: (item1 item2 item3 .... itemN) where each item is either an atom or a list. Again we require that a finite number of atoms appear among all the sublists of the list. Notice the 'recursive' style of definition. This is very common and natural in MathScript in light of the fact that its most basic data objects are best defined in terms of themselves. Again we postpone until later the discussion of lists that 'turn back' such as 'circular' lists.

Externally, this list would be entered into Mathscript by evaluating:

list( arg1, arg2, ..., argN )

where: value[arg1] = item1, value[arg2] = item2, ..., value [argN] = itemN

Now a list such as the one above can be constructed as binary tree by making use of the special atom False in the following way. The value of the atom False, value[False], is the empty List, ( ), which Lisp and MathScript print as NIL.

*

/ \

item1 *

/ \

item2 *

/ \

( item1 item2 item3 ... itemN ) ==> item3 .

.

.

*

/ \

itemN ()

This is in fact the convention used internally to represent lists. The system distinguishes between the left and right branch at each node. Then a binary tree with the property that each right terminal node is the empty list is a list. Note that the items above may themselves be lists.

These then are the data structures on which we'll focus: atoms and lists. We mentioned that atoms have values and that for literal atoms those values can be changed. Let us see now how this happens.

If, at a command line, you type: Print Foo, then press Enter, you will receive an error message saying that the name Foo is not recognized. (If on the other hand, you type Print R

you will not receive this message, because MathScript makes all single-letter names, with the exception of E and D, variables). Thus MathScript cannot evaluate Foo. Suppose now we want to give Foo the value 1.

Type: Make Foo 1

If you type Print Foo now, you will see 1 displayed. We shall call the result of evaluation either of an atom or of a list the value of that item.

Now suppose you type:

make a list(1,2,3).

Then you will have created a list object. It is represented internally as:

*

/ \

1 *

/ \

2 *

/ \

3 ()

or as (1 2 3).

This list is an object, and it is the value of a. That is, whenever a is evaluated, it is replaced by this list. Now since the list is an object, it ought to be possible to find its value! A simple way to do this is to type, at a command line,

evaluate(a). The effect of this is to evaluate a first in order to find the actual argument to the evaluate( * ) function, then to apply evaluate to the result. Unsurprisingly, evaluate simply causes another evaluation. Thus evaluate gets to take the value of the object (1 2 3). The result is the error: Function not defined!

Let us try to understand what has happened. When MathScript evaluates a list

such as (1 2 3) , it always (that is, without exception) interprets the value of the first item in the list as a function to be applied to some list of arguments.

This rule is unequivocal. When MathScript evaluates an expression such as

( Fn a1 a2 ... aN ), where Fn is the name of a function, the list of arguments is the list

( value[a1] value[a2] .... value[aN] ). Note the 'recursive' character of MathScript evaluation:

to evaluate the list, MathScript must first evaluate the variables a1...aN which

may themselves be lists, and so on...

MathScript introduces a wrinkle which will be discussed later. Fn may be

an atom or a list, and will actually be evaluated first, over and over again

until one of the resulting values is something which can be applied to a

list of arguments.

At this point, although perhaps it's not obvious why, we will mention that the order of evaluation of these items is important. And they are evaluated from left to right: first a1, then a2, and so on. Then the effect of the evaluation of (Fn a1 a2 ... aN) will be the application of function value[Fn] to the list (value[a1] value[a2] ... value[aN]).

To review briefly what has been said so far, there are two basic data structures with which we shall work: atoms and lists. Dotted pairs, or binary trees will generally not be used here. Atoms and lists have values. The values of atoms (with the exception of numerical atoms)

can be assigned using the make command. Other methods for assigning values to atoms

will be discussed later. Lists cannot be assigned values. In either case, the values are obtained on evaluation. Finally, when MathScript evaluates a list in the form ( f b1 b2 b3 ... bN ) it evaluates it by first evaluating f, and continuing to evaluate the results until it finds a primitive system function object, or a user-defined function object. At this point, it computes up a list of arguments for the function object and applies the function to the list of arguments.

The conclusion is that in general one should avoid evaluating list objects in MathScript. It's fine to evaluate the name of a list object, but that is not the same thing as evaluating the list object itself.

Two things are clear. First, lists are very versatile structures, and it is desirable to work with them because of their versatility. Second, it is dangerous to evaluate lists because of Lisp's evaluation protocol. In light of these considerations, we will plot a route, using MathScript functions, that avoids list evaluation but that still leads to many of the fruits of Lisp programming.

Suppose we consider the following general problem. We would like to construct lists and given those lists, select items from them, say the first, the second, and so on. We would also like to be able to select the last item on the list. Finally, suppose that we would like

to rearrange the items in the list. For simplicity, let us say that we want a way to reverse a general list, that is, place the items in the opposite order. These are straightforward problems, and much of value about MathScript programming can be learned by solving them one at a time.

Let us start with the problem of creating a list, say a list of names:

(sally jeremy michael lisa emily). How does one create such a list?

If you type

list(sally, jeremy, michael, lisa, emily)

at a command line and enter it, MathScript dutifully attempts to apply

the function list to the arguments:

( value[Sally] , value[Jeremy] , value[Michael], value[Lisa] , value[Emily])

Since the atom Sally has not been given a value, it returns an error message.

The answer is first to make those names variables:

make variable Sally;

make variable Jeremy;

make variable Michael;

make variable Lisa;

make variable Emily;

Then if you type, at a command line:

list(sally, jeremy, michael, lisa, emily)

the value of the variable ANSWER will be the list:

(sally jeremy michael lisa emily) The effect of the list function is to assemble its evaluated arguments into a list.

Now a better way to do this would be to type:

make friends list(sally, jeremy, michael, lisa, emily)

Then the value of friends would be (sally jeremy michael lisa emily)

and friends can be used later as the name of the list.

Suppose now the value of friends is a list, such as the one above. How

could we extract the first element? MathScript supplies a special function for the purpose. The function is called first.

First takes a single argument which must be a nonempty list. Thus the result of evaluating first(mylist) is the item (atom or list) which is the first item of the list: value[mylist]. Thus if you have entered

make friends list(sally, jeremy, michael, lisa, emily)

then type:

print first(friends)

MathScript prints SALLY. This is fine, and obviously very useful. Now try entering first(Sally). You get the error message that CAR was applied to an atom. CAR is Lisp's name for First. And First must only be applied to a nonempty list.

This sort of error occurs frequently for reasons which we'll see later. First is an example of a 'selector' operation, it selects an item in a list.

There is a 'dual' selecter which we must discuss. Recall that the list Friends above actually is represented internally as the binary tree:

*

/ \

value[ Friends ] = Sally *

/ \

Jeremy *

/ \

Michael *

/ \

Lisa *

/ \

Emily ()

First(friends) is just the atom obtained by choosing the left branch at the top node of the tree. If instead, the right branch is chosen, we would have

*

/ \

Jeremy *

/ \

Michael *

/ \

Lisa *

/ \

Emily ()

that is, the List (not atom): (Jeremy Michael Lisa Emily). The MathScript function which selects the right branch is called Rest. Lisp calls it: Cdr (after contents of the decrement register). So

Rest(Friends) has value (Jeremy Michael Lisa Emily). It is in this sense that

Rest is dual to First. But notice that Rest must also have a nonempty list as

argument (otherwise it returns an error), and that Rest always returns

as value a list while First might return a list, or it might return an atom.

While First and Rest are the basic selectors, it is possible to generate

in principle infinitely selecters from them. Suppose for example we wanted

a way to get the second element of a list. One way to do it would be to

take advantage of MathScript's talent for composing functions. Thus the value of

First (Rest (Friends)) is the application of First to (Jeremy Michael Lisa Emily), which

is Jeremy. In general, if a list L has at least two elements, the effect

of First (Rest( L)) is to select the second item. This brings us to an

interesting point. Suppose Z has value (Jeremy). What is the value of

First (Rest (Z)) ? Well Rest(Z) is the list consisting of every item in Z

following Jeremy... But there are no items following Jeremy, and so this

list, Rest (Z), is empty! In terms of trees, the value of Z has the

representation:

*

/ \

Jeremy ()

Thus, in light of our earlier remarks, Rest(Z) has the value ( ). It has the same value as the atom False: the empty list. The empty list is considered to be an atom, and this point must be kept in mind, because not taking it into account can lead easily to bugs in MathScript programs. Now since Rest (Z) has the value NIL, (and this is what MathScript prints if Rest (Z) is entered), it is clear that First (Rest (Z)) will return the error that First is applied to an atom.

The compositions of First and Rest occur so frequently that special selectors are available as shortcuts. These are Firstlist and Restlist.

Firstlist( lis, n) returns the list consisting of the first value[n] items of the list value[lis].

For example, Firstlist(lis,1) is the same as First(lis). Firstlist(friends,3) has value:

(Sally Jeremy Michael).

Restlist( lis, n) returns the list consisting of the items after the value[n] item of the list value[lis]. For example, Restlist(lis,1) is the same as Rest(lis). Restlist(friends,3) has value:

(Lisa, Emily).

We have answered a few of the questions that opened this section. A few remain. Suppose for example we wanted a function which would return the last element of a list regardless of the length of the list. A little reflection shows that all we have done so far is inadequate for the

problem because the selectors choose specific pieces, and the position of the last element of a list is not known a priori, since the length of the list is not known. Now we may build such a function recursively, and this opens up a fascinating realm of possibilities, for which the use of lists is well suited.

First of all, we could, as we have seen above, get the last element of Friends by entering First( Restlist(friends, 4)). And in general, we could accomplish the same in a non-recursive way with the length function:

First ( Restlist (lis, length(lis) - 1) )

But let's explore another way. We'll define a program that calls itself to accomplish what we want. Programs that cause themselves to be called while they are executing are called recursive. While anything that can be done recursively can also be done non-recursively (and many should!) recursive programs often have a simplicity and elegance that one only finds otherwise in Mathematical proofs. And they are as delightful to discover as a theorem. Finally, for many real-world, heavy-duty applications, such as parsing, the only conceivable programming solution is a recursive one.

Let us represent the general list whose last item we want by the atom Y. We shall mean for the value of Y to be this list, so we think of Y in this context as the name of the list. In this case, Y is a variable, just as we thought of friends as a constant. Now let us think through the strategy for getting the last item in Y. There are several cases:

CASE 1: Value[Y] = the empty list. In this case there is no

last element, in fact the list has no elements. Our program

should return an error message of some sort. We

shall handle this problem a bit later.

CASE 2: Value[Y] is a list with precisely one item such as

(thisitem) or ((this is an item)) or ( 123 ). In this

case, the program should return First (Y), the first,

hence the last item of the list. Note that when First (Y)

is evaluated, Y will be evaluated first, returning the list,

and then First will be applied to the list, as desired.

CASE 3: Value[Y] has more than one item, for example the list

( 0 (0) (0 (0)) ) which contains 3 items. In this case

the problem is too large to handle. We might try to

simplify it by handing our function Rest ( Y) and asking

for the last element of that! The key observation here

is that the last item in the list which is the value of

Rest (Y) is the last item in the list which is the value

of Y in case the value of Y has at least two elements.

These cases exhaust the possibilities. The strategy of testing and

handling simple cases, and passing simplified versions of more difficult

cases back to the original function is very common in MathScript. It is called

Recursion and the example sketched above is the simplest sort of recursion

called tail recursion. Let us see how this strategy might be implemented.

Now given Y, it is possible to test easily which of the three cases above applies.

If value[Y] is the empty list, then we are in CASE 1.

If value[Rest (Y)] is the empty list, then we are in CASE 2

Else we are in CASE 3.

Let us create the program Last that will take a single list argument, and return the last item in that list. The format is:

Program Last (li X) { <body > }

We leave open the question, for now, of what appears in the < body > above.

MathScript will store the definition of the program Last, and whenever it evaluates

a statement of the form Last(Y) where the value of Y is some list, it will return the last item of that list.

Recall the strategy we outlined for defining Last.

When Last (Y) is evaluated, where value[Y] is some list, say L,

  1. Last should first check whether value[L ] = empty list. In this case, it should return

an error message.

  1. If value[L ] is not the empty list, then Last should check whether L has just one item,

that is, whether Rest (Y) evaluates to the empty list. In this case, Last should

return First (Y), the first and last item of L.

  1. Otherwise, Last gives up this round. But it removes the first item of L

and hands the rest of L to a copy of itself. Then the value of this

expression will be the value of Last (Rest (Y)). This recursion must end

because eventually the successive Rests will reduce to a list with a single

element and then CASE 2 will take over.

How do we test whether a list is empty? Simply use the MathScript test Function Empty.

Test Functions like Empty, IsNumber, IsAtom, and relations such as ==, <, >, <= , >= , and not

are indispensable for programs like this one.

Let the body of Last be the if statement in which the parameter X is

assumed to evaluate to the list L = value[Y] when the statement Last(Y) is made.

Program Last (li X) {

if empty(X) then {print "Error!" } else

{ if empty(Rest(X)) then {return First(X)} else { Last(Rest(X)) } }

}

This looks as though it should work. The only problem is that it doesn't compile. The compiler complains that Last is an unknown term. That is because the program hasn't been created before the compiler comes to the internal mention of Last.

The way around this problem, as for all recursive program definitions, is to declare to the Compiler that Last is a function name in a Local Statement:

Program Last (li X) {

local (fn last) {

if empty(X) then {print "Error!" } else

{ if empty(Rest(X)) then {return First(X)} else { Last(Rest(X)) } }

}

}

Now, the compiler is happy. And if you type

print last(friends)

the program prints: Emily.

And if you try

print last(list())

the program prints

Error!

Next, we discuss a few other operations for manipulating lists.

The Append Function is useful for joining lists together. Actually Append takes any number of arguments and joins their values, which must be lists, to form a single long list.

Append ( list(1, 2 ,3), list(4,5), list(6) )

returns (1 2 3 4 5 6) then.

A point to be made here is that the last argument to Append may evaluate to the empty list, but the first arguments may not. If it does, the system complains that it is an atom. Append builds new list structure on each call. Thus

Make MyCopy Append( MyList, False )

Creates a copy of MyList.

A function which behaves like Append but does not build copies is Concatenate.

Concatenate changes the list pointers to return its value. In general it is faster than APPEND. It can take any number of arguments, all of which must evaluate to lists (the last may be the empty list)).

Given two lists (A B) and (F G)

(1) (2)

* *

/ \ / \

A * F *

/ \ / \

B () G ()

Suppose that value[X] = (A B) and value[Y] = (F G) then

Concatenate(X,Y) creates the list (A B F G) by replacing the first pointer to ( ) with the dotted

pointer to node (2).

(1) - - -> (2)

* | *

/ \ | / \

A * | F *

/ \ | / \

B () - - G ()

The call does the surgery, and what is returned is the new list:

(1)

*

/ \

A *

/ \ = (A B F G)

B *

/ \

F *

/ \

G ()

Now when this is done, the value of X is (A B F G) since it returns the list pointed to by node (1). And this has been changed! Y has not been altered at all. The action is similar

when a longer list of lists is concatenated. This action has the effect of modifying all items in the

system which contained pointers to node (1), the original X.

Thus Concatenate is a 'dangerous' function, to be used with care. To see

the possibilities try

make y list (H,J)

make z concatenate( y, y);

You have created a Circular List called Z. If you attempted to print Z, you would crash the

system because MathScript would try to print this infinite list.

(H J H J H J H J H J H J ..... ). Nevertheless circular lists such as this are very useful in some applications.

While we are talking about 'dangerous' list-manipulating functions, we mention two others: Replacefirst and Replacerest.

Replacefirst surgically replaces the first item in a list with a new item. For example Given the lists:

(1) (2)

* *

/ \ / \

A * F *

/ \ / \

B () G ()

with value[X] = (A B) and value[Y] = (F G) then

Replacefirst(X,Y)

has the effect of replacing the First item of the list X with the item Y.

The result is to change the value of X to ( ( F G) B )

(1)

*

/ \

* *

/ \ / \

F * B ()

/ \

G ()

Given the lists: value[X] = (A B) and value[Y] = (F G)

(1) (2)

* *

/ \ / \

A * F *

/ \ / \

B () G ()

Replacerest(X,Y)

has the effect of replacing the Rest of the list X with the list Y. It does this by changing pointers:

(1) ------->(2)

* | *

/ \ | / \

A * -------- F *

/ \ / \

B () G ()

The second argument to Replacerest must be a list. The call causes the value of X to be (A F G)

(1)

*

/ \

A *

/ \

F *

/ \

G ()

The next function we discuss, Node, is the most basic list creation primitive. Node is not dangerous. It takes two arguments, and given

Node(U,V) here is what it does.

Suppose x = value[U] and y = value[V] are either atoms or lists. These are represented by pointers to nodes.

x: * y: *

/ \ / \

... ... .. ..

Node( U, V ) returns z where z is the following:

z: *

/ \

* *

x y

For example,

Node(1,2)

returns then the dotted pair (1:2).

Node( 1, list(2) ) returns the list (1 2). In general, if the second argument to Node is a list, the return value will be a list. In fact, if value[L] is a list, then Node(W, L) is the list obtained by

enlarging the list value[L] by attaching w = value{W) to the front. There is a similar function Adjoin that adds an item to a list:

Adjoin( W , L) attaches value[W] to the list value[L] only if value[W] is not already an element at the top level of value[L]. Thus,

Adjoin( 1, list(2,3,4, list(1,2)) ) returns (1 2 3 4 ( 1 2) ), but

Adjoin( 2, list(2,3,4) ) returns (2 3 4)

The reader is invited to experiment with Node. It is the basic tool used by MathScript for the creation of new structure. Try to decide why Node does not have the same effect as Append.

It happens at times that a function returns a list, and then it is desired to apply a function of a single argument to each of the items in that list, and to return the list of results.

For example, suppose value[X] = (1 2 3)

And suppose you had defined the squaring function:

make sqr(x) x^2;

Then the Map function lets you apply sqr to each of the elements in the list, and it returns the list result. Thus,

Map( sqr, X ) returns (1 4 9)

The final functions that we discuss here are Lookup and Member. Lookup is designed to work with Association lists. An Association List is a list of lists of the form:

Value [ AList ] = ( (key1 ,...) (key2, ...) ... (keyn, ...) )

The first item in each sublist is a key, a variable, or expression. The remaining item(s) in each sublist are the objects to be associated with the key. If you are interested in looking up the items associated at any time with a key, you may call:

Lookup ( key, Alist )

then if value[key] is one of the key1, key2, ..., keyn, the entire associated sublist is returned, otherwise the empty list is returned.

Member has a simpler job. If value[L] is a list, then use member to learn if some item is in value[L] at the top level. For example, if key is an arbitrary item (atom or list),

Member ( key, L) returns the empty list if value[key] is not value[L] at the top level. If it is in value[L] then it returns the list, starting with the first occurrence of value[key] in value[L], and continuimg to the end of value[L].