A piece of intertube about the Clojure programming language, algorithms and artificial intelligence.

Sunday, November 8, 2009

Tail recursion and function composition

The chapter four of the ANSI Common Lisp book has an interesting code to insert a node in a binary search tree (BST). The code, ported to Clojure, is:

(defstruct node :elt :l :r)

(defn bst-insert [bst x comp]
  (if (nil? bst)
    (struct node x)
    (let [elt (:elt bst)]
      (if (= x elt)
        bst
        (if (comp x elt)
          (struct node
                  elt
                  (bst-insert (:l bst) x comp)
                  (:r bst))
          (struct node
                  elt
                  (:l bst)
                  (bst-insert (:r bst) x comp)))))))


The function is non-destructive. Creating a BST can be done like this:

(def nums (reduce (fn [acc x] (bst-insert acc x <)) nil [5 8 4 2 1 9 6 7 3]))
Obviously the function does not use a tail recursion. The value of the recursive call to bst-insert is used by the struct function. How could we transform it to have a tail recursion? We need to store on a stack the information gained while doing the recursion: the branches we have visited and the branches taken. We store the information while we go from the root to the leaf. We can then unstack and reconstruct the tree from the leave to the root. This is done in the bst-insert-tc function:
(defn bst-insert-tc [bst x comp]
  (loop [bst bst
         acc '()]
    (if (nil? bst)
      (loop [tree (struct node x)
             stack acc]
        (let [[lr elt branch] (first stack)]
          (if (empty? stack)
            tree
            (if (= lr :l)
              (recur (struct node elt branch tree) (rest stack))
              (recur (struct node elt tree branch) (rest stack))))))
      (let [elt (:elt bst)]
        (if (= x elt)
          bst
          (if (comp x elt)
            (recur (:l bst) (cons [:r elt (:r bst)] acc))
            (recur (:r bst) (cons [:l elt (:l bst)] acc))))))))

This approach works but is a bit naive and the second loop can be simplified by using reduce:
(defn bst-insert-tc2 [bst x comp]
  (loop [bst bst
         acc '()]
    (if (nil? bst)
      (reduce (fn [tree [lr elt branch]]
                (if (= lr :l)
                  (struct node elt branch tree)
                  (struct node elt tree branch)))
              (struct node x)
              acc)
      (let [elt (:elt bst)]
        (if (= x elt)
          bst
          (if (comp x elt)
            (recur (:l bst) (cons [:r elt (:r bst)] acc))
            (recur (:r bst) (cons [:l elt (:l bst)] acc))))))))
Once I had the first tail recursive version, I asked my friend Yogi how he would transform the bst-insert code into a tail recursive version. He did find a really elegant solution, in Haskell. Here is the idea: instead of stacking information about the tree and then performing at the end calls to reconstruct the tree in reverse order, we can build a function constructing a new tree as we go from the root to the leaf. When we are on a node, what we do is something like that (if the element to be inserted is smaller than the node's element):
(struct node elt l (:r bst))
The problem is, when we are doing this operation, we do not know the value of l. l is a tree to be construct, and we can know its value only by doing a new recursive call. Since we do not know the value of it at this time, we will created an anonymous function that will be able to take this value as an argument later, when the value will be known. We thus construct anonymous functions at each level of the recursion, composing them into a unique function with comp. Each anonymous function will return the appropriate tree for the left or right branches. At the end of the recursion, we know all the arguments. The last argument is the value of the element to be inserted. We can directly call the composed function which will create the whole tree. We do not even need to unstack anything:
(defn bst-insert-tc3 [bst x cmp]
  (loop [bst bst
         f identity]
    (if (nil? bst)
      (f (struct node x))
      (let [elt (:elt bst)]
        (if (= x elt)
          bst
          (if (cmp x elt)
            (recur (:l bst) (comp f (fn [l] (struct node elt l (:r bst)))))
            (recur (:r bst) (comp f (fn [r] (struct node elt (:l bst) r))))))))))
The code shows how functional programming can be both elegant and efficient.
The full code for the BST is available here

Saturday, October 24, 2009

Dijkstra's algorithm in Clojure

The Dijkstra's algorithm is an efficient algorithm to compute the shortest path between two nodes of a directed graph with positives costs (distances). In this post we rely loosely on the pseudo-code of Wikipedia and on the nice illustrated example of the french version to develop this algorithm in Clojure. The functions in the clojure.zip namespace will not be used.

We represent the graph with a map. Keys are the nodes of the graph, values are another map mapping each child to its distance to its parent.

Thus, the representation for this graph:
graph
is:
{:A {:B 85, :C 217 }, :B { :C 25 }, :C {} }

Symbols (:A, :B etc.) are used to represent the nodes, but any type would be fine.

We don't want our algorithms to rely on a particular graph structure so access to the graph is made via helper functions. The distance function return the distance between two nodes and the children function return a sequence containing the children for a given node.

(defn distance [net nodesrc nodedst]
  ((net nodesrc) nodedst))

(defn children [net node]
  (keys (net node)))

In the Dijkstra's algorithm, nodes' distances to the initial node are progressively updated and the predecessor of a node, for its known shortest distance, is updated. We take the same approach but we don't store distances with a value of infinity, i.e. the distances to the initial node for nodes not yet explored are not kept. We call "root" the initial node in our code, and "rdistance" the distance to the root for a given node.

The rdistances are stored in a sorted map, rdists. Each rdistance is mapped to another map mapping each node to their predecessor for this rdistance. This allow to efficiently implement the
u := vertex in Q with smallest dist[]
operation describe in the pseudo-code: we just need to take the first element of the map. Once the smallest rdistance is found, we removed it from the map to save space. How do we keep the rdistance of a node if rdistances are removed from the rdists map? Predecessors and rdistances are kept together in the preds map. Rdists is only used for newly discovered rdistances. Note that with this scheme we don't need a Q set like in the pseudo-code, rdists acting like a one. When rdists is empty, that mean we have explored all reachable nodes.

When the node with the smallest rdistance has been found, we need to update the rdistances of its children. This is done in the update-rdists function.

(defn- update-rdists [rdists preds net node dist children distance]
  "return [rdists preds] updated"
  (reduce (fn [acc x]
            (let [curdist (+ dist (distance net node x))
                  prevdist (second (preds x))
                  nrdists (first acc)
                  npreds (second acc)]
              (if (nil? prevdist)
                [(add-rdist nrdists x node curdist) (assoc npreds x [node curdist])]
                (if (< curdist prevdist)
                  [(add-rdist nrdists x node curdist prevdist) 
                   (assoc npreds x [node curdist])]
                  [nrdists npreds]))))
          [rdists preds]
          (children net node)))
When true, the condition (nil? prevdist) means that the rdistance for the currently examined node is not known, i.e. infinite. The reduce function allows us to "iterate" on each child of the node with the smallest rdistance, and to update the values of rdists and preds. They are stored together in a vector, and pass via the accumulator between each iteration.

Uncommenting the printf function in the code allows us to follow the steps of the algorithm.

minnode = :A preds = {:E [:A 173], :C [:A 217], :B [:A 85], :A [:A 0]} rdists ={85 {:B :A}, 173 {:E :A}, 217 {:C :A}}
For instance, the previous line means that the node with the smallest rdistance was A, then its children' rdistances were updated. The rdistance from :B is 85 and its predecessor is A, the one from :E is 173 etc. In this case rdists and preds convey nearly the same information, this is because its one of the first iteration of the algorithm. Things change after when the exploration progress.

Please feel free to criticize and improve the algorithm. One idea could be to use the memoize function to cach the result of the dijkstra function.

If you know a good test suite for graph algorithms, please post a comment.

Full code is available here

Followers