Level 2: About SML

type – a set of elements having common computational characteristics

Slides covering this material can be found here.

Primitive Types and Values

Calculator isolated on white

Numerical calculators provide buttons for a collection of common operations such as plus (+), minus (-), times (*), and divide (/). Programming languages are similar and typically have a collection of values for which they provide a predefined set of operations which include plus, minus, times, and divide.

Values which are “native” to a programming language are called primitive values. There are two sets of primitive values that we are particularly interested in at this time: integers and strings.

From a mathematical perspective, the set of integers is infinite. In contrast, for all programming languages the set of integers is (large but) finite. In SML, there are 231 integer values. Half of these values (230) form the set of non-negative integers (note that this means there are 230-1 positive integers) and the other half form the set of negative integers.

In SML, the unary minus sign is denoted (∼). A negative integer is formed by putting the unary minus sign in front of a non-negative integer. Normally, people would write -1, -27, and -42 to denote the numbers minus 1, minus 27 and minus 42. In SML, we write ∼1, ∼27, and ∼42.

The set of integer values for the language SML is shown below.

{ ∼230, …, ∼1, 0, 1, …, 230-1 }

In SML, this set of integer values is referred to as the type int. We say: The number 5 is a value of type int. By this we mean that 5 is an element of the type (i.e., set) int.

Strings

Antique Typewriter

A string is a sequence of zero or more characters enclosed in double quotes. Examples of strings include: “hello world”, “this is a string”, and program 3″. The empty string consists of zero characters and is denoted “”.

For now, we will limit the characters used in strings to those on a standard keyboard (think of the characters corresponding to keys on an old fashioned typewriter). The only exception being that we will forbid the backslash (\) and double-quote (“) symbols from occurring as a character in a string.

In SML, this set of values is referred to as the type string. We say: The string “hello” is a value of type string. By this we mean that “hello” is an element of the type (i.e., set) string.

Primitive Operators

At this stage, we will discuss the primitive operators listed in the Table below.

Operator    SML Symbol    Example
addition + 11 + 4 = 15
subtraction 11 – 4 = 7
multiplication * 11 * 4 = 44
integer division div 11 div 4 = 2
modulus mod 11 mod 4 = 3

It is assumed that the reader is familiar with addition, subtraction, and multiplication. However, it is worth discussing the semantics (i.e., meaning) of the integer division and modulus operators in a bit more detail. The result of an integer division operation is called a quotient and represents the integer portion of the result of dividing one integer by another. The result of the modulus operation is called the remainder and represents the integer amount that remains when integer division stops. The div and mod operators can be formally defined as follows:

x div y = a ⇔ 0 ≤ x – a*y ∧ x – a*y < y

x mod y = b ⇔ ∃ a : a*y + b = x ∧ 0 ≤ b ∧ b < y

 

One thing worth noting about integer operations is that they are closed. This means that performing an integer operation on two integers will produce a result that also belongs to the set of integers (i.e., the result is also an integer). In contrast, the real-valued division (e.g., floating point division) operation is not closed over the set of integers. For example, 1/4 = 0.25 which is not an integer. While on the other hand, both 1 div 4 = 0 and 1 mod 4 = 1 are integer values.

In practice, we will consider the integer operations on the SML type int (i.e., the set int) to be closed. Technically speaking, this is not true. For example, multiplying two very large values of type int can produce a value larger than 230-1 which is the largest value in int. However, the magnitudes of numbers we will be working with are “small” enough that integer operations will produce results that belong to the type int.

Integer-based Arithmetic Expressions and their Evaluation

Abacus

An integer-based arithmetic expression is an expression consisting of one or more integer values and variables (discussed in an upcoming section) together with zero or more (binary) integer operations (e.g., 1 + 2 * 3 – 4 div 5 mod 6). The evaluation of an integer-based arithmetic expression consists of a sequence of integer operations. A binary operation reduces (or simplifies) two integers and one operator to a single integer (e.g., 3+5 ⇒ 8). Evaluation is complete when the entire expression is reduced to a single integer, at which point, no further reduction is possible. In the context of evaluation, expressions consisting of a single integer are considered to be in normal form.

Expression    Normal Form   
1+2+3 6
1+2*3 7
1-2-3 ~4

In SML, the evaluation of integer-based arithmetic expressions proceeds as it does in regular math, with a preference given to left-to-right evaluation when choices are available. For example, there are two possibilities for evaluating the expression 1+2+3.

  • Option 1 – Left-to-right: 1+2+3 ⇒ 3+3 ⇒ 6
  • Option 2 – Right-to-left: 1+2+3 ⇒ 1+5 ⇒ 6.

Note that when evaluating an expression, multiplication, division, and modulus operations are performed before addition and subtraction operations. This is because multiplication, division, and modulus have a higher precedence than addition and subtraction.

While some expressions, like 1+2+3, can be evaluated either from right-to-left or from left-to-right, other expressions, like 1-2-3, must be evaluated from left-to-right. Specifically, expressions involving the division, modulus, and subtraction operators must be evaluated from left-to-right. This is because these operators are left associative.

The following table shows the precedence and associativity rules for integer operators in SML. These are the rules that SML uses when evaluating integer-based arithmetic expressions.

Operator    Precedence    Associativity
+, – 6 left associative
*, div, mod 7 left associative

As shown below, parenthesis can be used to alter the evaluation order of an expression.

  • 2+3*4 ⇒ 2+12 ⇒ 14
  • 2+(3*4) ⇒ 2+12 ⇒ 14
  • (2+3)*4 ⇒ 5*4 ⇒ 20

One way to think of this is that parenthesis create sub-expressions within larger expression. For example, in the expression 3*(4+5), contains the sub-expression (4+5).

In SML, as well as most modern programming languages, sub-expressions are evaluated in the order in which they are encountered. Consider the evaluation of the following expression which contains two sub-expressions.

2*(3+4)*(5+3) ⇒ 2*7*(5+3) ⇒ 14*(5+3) ⇒ 14*8 ⇒ 112

 

In the examples we have seen so far, the evaluation of expressions containing sub-expressions is relatively simple. However, sub-expressions can be nested within other sub-expressions thereby creating an expressions with complex structures.

(2+3)*(2+(3+4)*5) ⇒ 5*(2+(3+4)*5) ⇒ ⇒ 5*(2+7*5) ⇒ 5*(2+35) ⇒ 5*37 ⇒ 185

Tuples

In SML, a tuple value is a comma-separated sequence of two or more values enclosed in parenthesis. The type of a tuple value is an asterisk separated sequence consisting of the types of the values that make up the tuple.

Tuple Value    Type
(5,6) int*int
(5,6,7) int*int*int
(5,”hello”) int*string

A special case occurs when a “tuple” contains fewer than two values. In the case where the tuple contains zero values, we write (). This is a special value, whose type, in SML, is called unit. A tuple consisting of one value, is considered to be a parenthesized expression.

Tuple Value    Type
() unit
(5) int

Tuple Expressions and their Evaluation

A tuple expression is a comma-separated sequence of expressions enclosed in parenthesis. The evaluation of a tuple expression is accomplished by evaluating the expressions in the tuple from left-to-right.

Tuple Expression Evaluation Sequence
(5+2,4+6) (5+2,4+6) ⇒ (7,4+6) ⇒ (7,10)
(5*(2+3),(2+3)+(4+5)) (5*(2+3),(2+3)+(4+5)) ⇒ (5*5,(2+3)+(4+5)) ⇒ (25,(2+3)+(4+5)) ⇒ (25,5+(4+5)) ⇒ (25,5+9) ⇒ (25,14)

Variables, Names, and Identifiers

In SML, a variable is a language construct which can be associated a value and a type. In this section, we will not focus on the specifics of how a variable can be associated with a value. Instead, we will explore the ideas underlying variables. In books on programming languages, the terms variable, name, and identifier (id) are often used loosely in an interchangeable manner. Conceptually, a variable represents a place where values can be stored. An identifier is how the value in a place can be referenced within a program. In SML, a value always has a type associated with it.

Variable Name/Id Value Type
x x 5 int

In Bricklayer Level 2, we are interested in declaring functions. Part of a function declaration involves giving a name to the function. The understanding of a function can be significantly improved by naming it appropriately. In order to do this, it helps to have an understanding of the rules constraining the choice of names (aka identifiers).

We will define an identifier as a sequence of characters beginning with an alphabetic character (e.g., a-z or A-Z) followed by zero or more alphanumeric characters (e.g., a-z, A-Z, or 0-9) or underscores. Note that this definition of identifier is a subset of the set of legal SML identifiers.

Tokens and Spaces

Opening an old book.At a low level, a novel can be described as a sequence of characters. At a higher level, a novel could be described in terms of a sequence of words, sentences, paragraphs, or chapters.

Though some humans have remarkable reading abilities, the normal way to read a novel is sequentially – one word at a time, from left-to-right, top-to-bottom, one page after another. In order to accomplish this, we must be able to distinguish words (among other things). To recognize a word, it is necessary to know where the word begins and ends. In natural languages (such as English), spacing and punctuation is used to enable the reader to distinguish words. In computer terminology (i.e., nerdspeak) the grouping of characters into words is known as lexical analysis.

Whether words are separated from one another by one space or more than one space is irrelevant. Furthermore, whether a sentence begins on one page and ends on another page, or begins and ends on the same page is inconsequential to the meaning of the sentence. However, removing all spacing between words can change their meaning as shown in the example below.

They have come to get her.versusThey have come together.

 

Similar issues arise in computer programs. To the computer, a Bricklayer program is a sequence of characters stored in a file having a dot-bl extension. One of the first steps a computer does to prepare a program for execution is called lexical analysis. In lexical analysis, the sequence of characters representing the program to be executed is translated into a sequence of tokens, which is the computer programming version of a word (i.e., a token is to a computer program what a word is to a novel).

White space (e.g., blanks and newlines) is one of the primary ways for a programmer to separate one token from another. It is always safe to put one (or more) spaces between tokens.

The Bricklayer function call put2D_2x4_BLUE(0,2) contains the following six tokens.

Token Comment
put2D_2x4_BLUE Bricklayer function name
( reserved symbol – left parenthesis
0 value of type int
, reserved symbol – comma
2 value of type int
) reserved symbol – right parenthesis

In the table below there are four versions of the function call put2D_2x4_BLUE(0,2). To the computer all four calls mean exactly the same thing.

put2D_2x4_BLUE(0,2)
put2D_2x4_BLUE   (0,2)
put2D_2x4_BLUE   (0,  2)
put2D_2x4_BLUE   (0,  2    )

Note that white space (e.g., blanks and newlines) can be used to format code to make it more understandable. Formatting is an important component of good coding.