Archive

Archive for January, 2012

Another Example of Range Termination

Continuing on from my previous post on Range Termination in Go, I’d just like to present another example I’ve come across on how it can be helpful, especially with declarative programming. The following code is for a Graph data structure I’m working on in Scala (available on my github). At the moment, without range termination, a definition for the boolean function, `isDirectedForest`, looks like this:

def isDirectedForest: Boolean =
  if ( this is cyclic )
    return false
  else {
    for ( v <- BFS over this from lastVertex )
      if ( ( this indegreeOf v ) > 1 )
        return false
    return true
  }

It works, but there are several problems. First, the most obvious, is that it’s not immediately clear what the function is actually doing. Secondly, there are several exit points; clean coding style encourages having only one exit point for each function. Third, for the loop, this is the only way we can actually achieve early termination (sorry, I refuse to use `break`). The range termination syntax I presented in the previous post can be used here to give a much more elegant definition for this function:

def isDirectedForest: Boolean = {
  var maxIndegreeIs1 = true
  if ( this isNot cyclic )
    for ( v <- BFS over this from lastVertex; maxIndegreeIs1 )
      maxIndegreeIs1 = ( this indegreeOf v ) <= 1
  return ( this isNot cyclic ) && maxIndegreeIs1
}

Which is a much more declarative definition of the function – we’re not defining it so much as we are describing what it does. The final (and only) return statement succinctly tells us what makes a graph a directed forest – it is such if the graph (this) is acyclic and the maximum indegree of any vertex is 1.

Binary Hash Tree

I’m not sure if this exists already, but it’s a straightforward data structure for maintaining a set of objects that don’t have any natural ordering, that is, they can’t be sorted and therefore can’t be put in a binary tree under normal circumstances. Take for instance a universe of furniture types, F, which contains all the different types of furniture in existence, and you want to maintain a subset of this universe, F = { f | f ∈ U }. The typical options would be linear array, hash table, or a tree to maintain the set. Lookup in the linear array would have to be performed using sequential sort, since there’s no natural ordering of the elements, so no sorting, so no binary sort, so lookup in the linear array has and O(n). A hash table would give a decided improvement initially, but since there will be so many members the initial gains will be quickly forgotten as clustering and long buckets (again unsortable) give rise to O(n) complexity once again. A tree is the next option, but the question is how to distribute the elements.

Elements of a balanced binary hash tree, with their hash subscripted to the right

The binary hash tree is simply a binary tree that has the hash of the element as its key. Theoretically, all hash codes of elements of a universe are unique, so theoretically no collisions can occur in this tree, as we are not mapping the hash codes to a domain, as would be the case in a hash table (in practice, we are bounded by the number of bits we use to hold the hash code). The data structure is very straightforward, and can be enhanced by balancing the tree to give O(log2(n)) time. We can also use this hash code “ordering” to sort the collection and then perform a binary search through the array (which is essentially the same as searching the binary tree, just using a linear data structure), which will give us O(log2(n)) again. Note that we can’t use this structure as a bucket for hashing, as all elements in a bucket have the same hash code already.

Range Termination in Go

I’ve had an simple idea for a while that I’d like to see implemented in modern languages – a terminating condition for iteration statements. Instead of breaking out of a loop with an `if` and `break` (cousin to `goto`, and just as bad in my opinion), include a terminating condition in the loop definition. I haven’t seen this feature implemented in any mainstream languages so far, and it seems to me to be a handy one to have. Take, for example, sequential sort with early termination in Java, using an iterator:

public static<T> boolean collectionContains( Collection<T> ts, final T e ) {
    int i = 0;
    for ( T t : ts ) {
        if ( e == t ) {
            break;
        }
        i++;
    }
    return ts.size() == i;
}

I would prefer to be able to write something like the following:

public static<T> boolean collectionContains( Collection<T> ts, final T e ) {
    int i = 0;
    for ( T t : ts; e == t ) {
        i++;
    }
    return ts.size() == i;
}

A very simple concept, but I feel it tidies up the code dramatically, and is a feature I feel they should have included when they first introduced the extended for loop.

I wrote a small hack to introduce the feature into the current Go compiler. I’ve only recently looked into learning Go, but it’s philosophy appeals to me and it’s thinking seems to lend itself to the idea I’m presenting here. The following are changes I’ve made to the Go compiler to add on the range termination:

In go/src/cmd/go.y (the lexer and parser), line 604, in rule 'for_header':
}
|   range_stmt ';' osimple_stmt
    {
        // osimple_stmt is the early termination condition
        $1->ntest = $3;
    }
|   range_stmt

This simply checks if a semi-colon appeared after the range, and if it does, adds the conditional that appears after it to the range node (range nodes don’t have the `ntest` member populated initially). Like I say, this is only a hack, if I were to actually implement this change I wouldn’t modify the range node after it was created, I would instead modify it during its  construction in the range_stmt rule. Anyway, moving on:

In go/src/cmd/range.c (typechecking for ranges), line 99, in function 'void walkrange(Node *n)':
    Node *fn, *tmp;
    Node *ntest;
    NodeList *body, *init;

This simply adds the `ntest` node for storing the current contents of `n->ntest` that we created above, as the contents of `n->ntest` are overwritten in the `switch` further on in the function.

Line 131:
    }

    ntest = N;
    if (n->ntest) {
        typecheck(&n->ntest, Erv);
        ntest = n->ntest;
    }

    switch(t->etype) {

Here we initialise `ntest` to n->ntest if it exists, after we check its validity.

Line 160:
        n->ntest = nod(OLT, hv1, hn);
        if (ntest != N) {
            typecheck(&n->ntest, Erv);
            n->ntest = nod(OANDAND, n->ntest, ntest);
        }
        n->nincr = nod(OASOP, hv1, nodintconst(1));

In the first switch statement, in the case that we are iterating over an array, `n->ntest` is overwritten and the terminating condition is set to be when the index is greater than or equal to the size of the array. So to add our extra terminating condition (if there is one), we first check the validity of the existing `n->ntest` (this would be carried out at the bottom of the function, but since I’m modifying the node I’ll do it here as well) and then “AND” it with our terminating condition (stored in `ntest`). The approach is the same for map and string, but not for channel, as the channel case doesn’t actually modify `n->ntest`, so we needn’t bother with it. Again, this is only a quick hack to implement the feature.

Line 191:
        n->ntest = nod(ONE, nod(OINDEX, hit, nodintconst(0)), nodnil());
if (ntest != N) {
typecheck(&n->ntest, Erv);
n->ntest = nod(OANDAND, n->ntest, ntest);
}

Line 251:
        n->ntest = nod(ONE, hv1, nodintconst(0));
        if (ntest != N) {
            n->ntest = nod(OANDAND, n->ntest, ntest);
            typecheck(&n->ntest, Erv);
        }
        n->ntest->ninit = list(list1(nod(OAS, ohv1, hv1)), a);

Now all that’s left to do is compile the project with `go/src/all.bash`. If you set your `$GOROOT` to the `go` directory you were working in you should have a new `6g` that will compile for loops with range termination. For instance:

package main

import "fmt"

func main() {
        m := map[string]float64{"0":0.0, "1":1.0, "pi":3.1415, "4":4.0}
        for k, v := range m; v != 1.0 {
                fmt.Printf( "key %s, value %g\n", k, v )
        }
        s := "abcde"
        for i, c := range s; c != 'c' {
                fmt.Printf( "\tfound value %q at %d\n", c, i )
        }
        var a [5]string
        a[0] = "a"
        a[1] = "b"
        a[2] = "c"
        a[3] = "d"
        a[4] = "e"
        for i, c := range a; c != "d" {
                fmt.Printf( "found value %q at %d\n", c, i )
        }
}
Output:
key pi, value 3.1415
key 4, value 4
key 1, value 1
found value 'a' at 0
found value 'b' at 1
found value 'c' at 2
found value "a" at 0
found value "b" at 1
found value "c" at 2
found value "d" at 3
Categories: Code, Go