Relationale: A Brief Overview

I. Introduction

Relationale is a domain-specific language implementing relational algebra atop a general-purpose host language. Its features include a static type system, type inference, lexical scope, and delayed evaluation. By keeping its values in the same type system as the host language, it avoids any type system impedence mismatches. A proof-of-concept is being developed in Python.

II. Type System

A familiar convention is followed to distinguish types from values at the lexical level: names of types begin with an uppercase letter, and names of values begin with a lowercase letter. Thus, the type for strings is String, but a string variable might be called string.

A. Scalars

A scalar is a single, atomic value. Scalars are representations of objects from the host language, and their data type corresponds to the class in the host language of which the scalar is an instance. To expose a host object to the Relationale type system, a chaperone class is defined for each type of scalar, which defines how to perform basic operations on values of that type such as testing for equality, comparison, construction, indexing, etc.

There are four scalar types guaranteed to be present in Relationale: Bool, Int, Float and String; in Python, these correspond to the bool, int, float and str classes respectively, and complete chaperones for these classes are provided by Relationale. Furthering their special treatment in the language, there exists a literal sytnax for each of these core types:

# Bools are represented by the keywords true and false
i_am_a_liar := true
king_is_bald := false

# Ints and Floats are denoted as expected
answer := 42
pi := 3.141
n_a := 6.022e23

# String literals have a syntax similar to in C
my_string := "this is a string\nwith an embedded newline"

For other scalar types, there is a generalized syntax for constructing new values. This is accomplished by passing zero or more core scalar arguments to a type constructor as follows:

address := Email("foo@bar.com")
some_date := Date(2006, 10, 30)
meeting_time := Time(11, 30)
nothing := None()

Above, address would be a scalar of type Email; some_date a value of type Date; meeting_time a value of type Time; and nothing a value of a type denoting nothingness, None.

Constructors may be used to convert values from one type to another. For example, we may convert a string to an integer using the Int constructor, and then to a floating-point value using the Float constructor:

str_value := "853142"
int_value := Int(str_value)
float_value := Float(int_value)

If construction fails (for example, if the string passed to Int is malformed), a constructor throws a runtime error.

Because Relationale has a static type system and performs type inference, given the following code,

address := Email("foo@bar.com")
copy := address

Relationale can infer that copy is also a scalar of type Email. In the event the type inference engine is unable to handle your expression, you can use the type qualification operator, :: as discussed below. Scalar types are denoted by the names of their constructors.

B. Tuples

A tuple is a collection of slots, or mappings of names to scalar values, and is similar to what is known as an associative array or a hash in other languages. The following is an example of constructing a tuple describing an employee:

joe := (emp_name: "Joe Smith", dept_id: "Finance", emp_id: 1053,
hire_date: Date(2005, 10, 1))

The order in which the name/value pairs are listed is not significant. Tuples may have no name/value pairs at all, but the usefulness of empty tuples is questionable. Variables may be given as slot values when constructing tuple literally, and variable bindings will not collide with slot names since slot names are "quoted:"

emp_name := "Bob Roberts"
bobs_hire_date := Date(2001, 4, 12)

# emp_name will be substituted only on the value side of the first slot
bob := (emp_name: emp_name, dept_id: "Management", emp_id: 812,
hire_date: bobs_hire_date)

Like all Relationale values, all tuples have corresponding types. And while the type inference algorithm is smart enough that you'll probably never have to type a tuple yourself, the type of all the above tuples would be denoted as follows:

(emp_name :: String, dept_id :: String, emp_id :: Int, hire_date :: Date)

This is called a type signature, and specifically here a tuple signature.

C. Relations

The workhorse of Relationale, a relation is a set of zero or more tuples, all of the same type. Its literal syntax is a list of tuples separated by commas, enclosed by curly braces:

employees := { (emp_name: "Alice Johnson", dept_id: "Sales", emp_id: 1210,
hire_date: Date(2006, 4, 18)),
joe, bob, linda }

Note that a relation is in fact a set and not a multiset (as is a common source of confusion), meaning that duplicates are ignored and order is insignificant. This means the following two relations are in fact equal to each other:

r1 := { joe, joe, joe, joe, bob, bob, joe, bob, joe }
r2 := { bob, joe }

The type of a relation is expectedly based on the type of tuples it may contain. The above relations would share the following type:

{ (emp_name :: String, dept_id :: String, emp_id :: Int, hire_date :: Date)* }

Here you may read the asterisk as you would the Kleene star in regular expressions: a relation of this type may consist of zero or more tuples of the enclosed type. In most cases, the type system will correctly infer the type of relations, but there are times when you must type relations explicitly, for example when constructing empty relations.

D. Promises

A promises is a way to delay the evaluation of an expression, and is written by enclosing an expression in square brackets. For example, the following code assigns a promised expression to the variable get_candidates. While the expression says to return all the items in the applicants relation which are not also in the rejects relation, it isn't actually evaluated at this point.

get_candidates := [applicants - rejects]

Clearly a promise by itself is not very useful. What is needed is a way to "unbox" the expression and evaluate it as required. This is accomplished by the force keyword:

current_candidates := force get_candidates

As can be seen, the value of a forced promise depends on the state of the environment when it is being forced. Separate invocations of a promise will generally evalulate to different values.

Promises may delay only expressions, not statements. This is because only expressions carry types, and a promise, like any other Relationale value, must have a type. The signature of a promise, then, is the type of the delayed expression enclosed in square brackets. Depending on the type of the applicants and rejects relations, then, get_candidates might have a type of:

[{ (name :: String, app_date :: Date, app_id :: Int)* }]

The forced expression would have as its type the unboxed contents of the promise type.

1. Constraints

Promises offer an extremely convenient and straightforward way of expressing constraints. Relationale provides an assert function which takes a promise of type [Bool] as an argument. All asserted promises must evaluate to true after a statement is executed, or the statement is rolled back and an error is raised. The following example asserts what would be called a foreign key constraint in SQL. It states that there must be no values of dept_id in the employees relation (the foreign key) that are not also in the departments relation (the primary key):

dept_id_constraint := [size((employees -> (dept_id)) - (departments -> (dept_id))) = 0]
assert(dept_id_constraint)

E. Views

Suppose we had a set of organization members, and we commonly want to separate the adults from the children. One way to do this would be to create two promises, forcing them whenever we want the values (the syntax for | will be discussed later):

get_children := [members | [age < 18]]
get_adults := [members | [age >= 18]]

# Later on...

children := force get_children
adults := force get_adults

This would be less than ideal though, since we must remember to force the promises whenever we want the values currently in sync with the members relation. A better way would be to have special promises that force themselves whenever they are named. These self-forcing promises are called views, and are constructed with the [:=] operator as follows:

children [:=] members | [age < 18]
adults [:=] members | [age >= 18]

# Or even better:
adults [:=] members - children

Now, children and adults may be treated as relations, and their values will automatically reflect the contents of the members relation. (Updates to views are not planned for the initial version of the language.) As such, the following statements would be semantically equivalent:

boys := (members | [age < 18]) | [gender = Gender("Male")]
boys := children | [gender = Gender("Male")]

Views have a type signature very similar to promises: they're identical except for a trailing ! indicating that the view is self-forcing. Assuming a simple type for the members relation, the type signature of both the children and adult views above is:

[{ (id :: Int, name :: String, age :: Int)* }]!

III. Operators

A. Relational Operators

These operators are derived from relational algebra. Each returns a new relation, satisfying the closure property of the relational algebra. All relational operators, with the exception of project, have a variant for compound assignment; for the sake of brevity, an example will be given only for union.

1. Union

Given two relations r1 and r2 of the same type, the union operation

r1 + r2

returns a new relation of the same type as the arguments by combining the tuples from both operands. No duplicate tuples will be present. This may be used to add new tuples to a relation, as in the following example:

employees := employees + { alice, bob }

As with the other relational operators (except ->), this statement may be abbreviated with a compound assignment operator as follows:

employees += { alice, bob }

2. Intersection

The following expression evaluates to the tuples common to both r1 and r2:

r1 & r2

3. Difference

This expression evaluates to all tuples in r1 which are not also in r2:

r1 - r2

4. Symmetric Difference

Symmetric difference is the set-theoretic equivalent to exclusive or, and so the following expression evaluates to all tuples in either r1 or r2, but not both:

r1 ^ r2

5. Cartesian Product

A Cartesian product of two relations is another relation populated by tuples built by combining every tuple in the first with every tuple in second. For example, given the following relations:

r1 := { (letter: "a"), (letter: "b") }
r2 := { (number: 1), (number: 2), (number: 3) }

the Cartesian product, denoted by

r1 * r2

would be equal to:

{ (letter: "a", number: 1),
  (letter: "a", number: 2),
  (letter: "a", number: 3),
  (letter: "b", number: 1),
  (letter: "b", number: 2),
  (letter: "b", number: 3) }

Note that r1 and r2 may have no duplicate slots.

6. Natural Join

The natural join of two relations r1 and r2, denoted by

r1 @ r2

is the relation containing all combinations of tuples in r1 and r2, with all common slots equal. Thus for the following relations:

employees := { (emp_name: "Josh", emp_id: 1, dept_id: 1),
               (emp_name: "Sue", emp_id: 2, dept_id: 1),
               (emp_name: "Bob", emp_id: 3, dept_id: 2) }
departments := { (dept_id: 1, dept_name: "Marketing"),
                 (dept_id: 2, dept_name: "Sales") }

Their natural join would be equal to:

{ (emp_name: "Josh", emp_id: 1, dept_id: 1, dept_name: "Marketing"),
(emp_name: "Sue", emp_id: 2, dept_id: 1, dept_name: "Marketing"),
(emp_name: "Bob", emp_id: 3, dept_id: 2, dept_name: "Sales") }

The relations being joined must have at least one common slot.

7. Project and Rename

Project and rename, often implemented as two distinct operations in many axiomatizations of the relational algebra, are implemented by the same operator in Relationale:

employees -> (emp_name as name, ...)

This expression evaluates to a new relation with the same contents of employees, but with the emp_name slot renamed to name. (The ellipsis says to leave the remaining slots alone.) Using this projection template, one may also eliminate slots, as the following example demonstrates:

employees -> (emp_id)

This evalutes to a relation populated with only the emp_id slot of employees.

8. Filter (a.k.a. Restrict, Select)

The filter operator selectively extracts tuples from a relation into a new relation. It takes as its right-hand side a promise of type [Bool]. For each tuple, if the promise evaluates to true, it is retained; if false, it is discarded:

employees | [dept_id = 2]

The environment of the promise is populated by the names of the slots of the relational expression on the left-hand side. If this creates a collision with a desired binding, one may qualify the relation:

employees as e | [e.dept_id = 2]

9. Update

people := people << [(age: age + 1)]

Update takes a promise that must evaluate to a tuple whose type is a subset of the updated relation's tuple's type. The promise is executed once per tuple in the relation (with the names of the relation bound in the promise's namespace exactly like filter), and the resulting tuple is used to set the values of the relation on the left-hand side of the << operator. If the tuple is has fewer slots than the relation being updated, only the given slots are changed.

Because this operator takes a promise, relations can be updated in terms of themselves. Like filter, it accepts name qualifications when populating the environment of the promise. Further, updates may be abbreviated with the compound assignment operator <<= as follows:

people <<= [(age: age + 1)]

10. Outer Join

There is no operator for outer join in Relationale. Instead, it is implemented as a function that takes three arguments: the two relations being joined and a placeholder value to mark missing values. (The placeholder is necessary since as Relationale has no equivalent to SQL's NULL.) For example:

result := outer_join(a, b, None())

C. Type Operators

Because Relationale performs type inference, you will seldom need to declare explicitly the types of your values. However, there are times when the type inference algorithm is unable to infer the intended type—a common example would be the creation of an empty relation. Two type operators help solve this: type qualification, ::, and type declaration, ::=.

Suppose we wish to create a relation that will contain employees, but have nobody to enter yet. We might say:

employees ::= { (emp_id :: Int, emp_name :: String)* }
employees := {}

This would be semantically equivalent to the inline type qualification:

employees := {} :: { (emp_id :: Int, emp_name :: String)* }

While you may declare and qualify any variable you like, the type engine is usually smart enough to infer correctly its type (excepting the case of empty relations). Providing an incompatible type will cause the type engine to raise an exception.

D. Comparison Operators

Scalar arguments; which are defined for a scalar type is determined by its chaperone.