Transforming an XML Tree with Scala Partial Functions

In my last blog, I outlined how I found the Scala XML library a pleasant solution for unpleasant XML format conversion jobs. In those jobs, I had to completely transform the document from one grammar to another.

When you need to make small tweaks to a document, the library a bit more of a hassle. This page by Burak Emir, the author of the Scala XML library, states: “The Scala XML API takes a functional approach to representing data, eschewing imperative updates where possible. Since nodes as used by the library are immutable, updating an XML tree can a bit verbose, as the XML tree has to be copied.” A verbose example follows.

Here is what I needed to do. Whenever I had a <div class="example"><p></p></div>, I had to replace it with the actual file name, with each line preceded by a line number.

That part is simple:

def getExample(node: Node) =    
  <ol>{io.Source.fromFile(new File((node \ "p").toString)).getLines().map(
    w => <li><pre>{w} </pre></li>)}</ol>

But how can you say “Do this for all <div class="example">, and leave the rest alone?”

In a functional program, you need to copy the tree, so I figured I should write a universal transformer method.

 * Transforms all descendants matching a predicate.
 * n a node
 * pred the predicate to match
 * trans the transformation to apply to matching descendants
def transformIf(n: Node, pred: (Node)=>Boolean, trans: (Node)=>Node): Node = 
  if (pred(n)) trans(n) else
    n match { 
      case e: Elem => 
        if (e.descendant.exists(pred)) 
          e.copy(e.prefix, e.label, e.attributes, e.scope, 
  , pred, trans))) 
        else e
      case _ => n 

The if (e.descendant.exists(pred)) part isn't strictly necessary. I just wanted to reuse nodes when there was no need for rewriting.

This solved my immediate problem.

It turned out that I needed to change some other nodes as well. I could have done two transforms, or rewritten my method to take a sequence of (predicate, transformer) pairs. But then I remembered something about partial functions in the actor library.

This blog brought me up to speed. A case expression { case ... => ...; case ... => ... } can be converted to a PartialFunction. There are methods for checking whether a value is covered by one of the cases, and for applying the function. In other words, I could trivially extend my previous method to partial functions:

def transform(n: Node, pf: PartialFunction[Node, Node]) =
  transformIf(n, pf.isDefinedAt(_), pf.apply(_));  

Burak Emir explains how one can write case statements that check conditions with attributes. This is what it looks like.

transform(doc.docElem, {
    case node @ <div>{_*}</div> if   
      node.attribute("class").getOrElse("").toString == "example" => getExample(node)
    // Other cases go here
    case ... => ...

It reads quite nicely. When you have a div whose class attribute is example, call the getExample method.

Eat your heart out, Java!

There is a larger message here. Consider again the task described in this blog, i.e. replacing <div class="example"><p></p></div> with <ol><li><pre>each line in that file</pre></li></ol>? Yes, I could program it in Java, but the thought makes my skin crawl.

A while ago, I resolved to use Scala for all my little processing tasks so that I would get to know it over time. It was painful at first—tasks that I know I could have completed easily in Java took some research and definitely took me out of my comfort zone. But over time, this has paid off. I can now easily do tasks in Scala that I would never have attempted in Java.