CS 152 - Lecture 7

Cay S. Horstmann

Language Translation Process

Lexical Analysis

Notes on White Space

Parsing

Code Generation

Code Optimization

Regular Expressions

Convenience Notation

Regex Examples

The Scala Regex Library

Lab

Step 1: Regular Expressions

Write down regular expressions for

Step 2: Check Your Regexes

Use the Scala Regex.findAllIn method to check your work from Step 1 with the strings

What results do you get?
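As a warm-up, here is a minimal sketch of how findAllIn can be used to inspect matches. The pattern and the test string are made up for illustration and are not among the lab's answers; the object name FindAllDemo and the helper matches are likewise hypothetical.

```scala
import scala.util.matching.Regex

object FindAllDemo {
  // Hypothetical pattern for illustration only:
  // an optionally signed run of ASCII digits.
  val integer: Regex = """[+-]?[0-9]+""".r

  // findAllIn returns an iterator over the non-overlapping matches;
  // converting it to a List makes the result easy to inspect.
  def matches(pattern: Regex, s: String): List[String] =
    pattern.findAllIn(s).toList
}
```

Calling FindAllDemo.matches(FindAllDemo.integer, "x = -12 + 7") should yield List(-12, 7): the lone + is skipped because no digit follows it directly.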

Tip: In Java, it is unpleasant to deal with strings containing backslashes and quotes. For example, the regular expression ([^"\\]|\\.)* must be written as the Java string "([^\"\\\\]|\\\\.)*". Scala has an alternate way of specifying strings: when a string is enclosed in """...""", backslashes and quotes inside it are taken literally and need no escaping. (Of course, you can't have a """ inside.) For example, """([^"\\]|\\.)*""".
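To see that the two notations from the tip denote the same string, one can compare them directly. This is only a small check; the object name RawStringDemo and its members are made up for illustration.

```scala
object RawStringDemo {
  // Java-style literal: every backslash and every quote must be escaped.
  val escaped: String = "([^\"\\\\]|\\\\.)*"

  // Triple-quoted literal: backslashes and quotes are taken literally.
  val raw: String = """([^"\\]|\\.)*"""

  // Both literals should denote the same sequence of characters.
  val same: Boolean = escaped == raw
}
```

RawStringDemo.same evaluates to true, so either form can be turned into a Regex with .r; the triple-quoted one is simply easier to read.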

Step 3: A Simple Lexer

In a lexer, we specify a set of patterns that are tried in sequence. For example, a simple language might have the following token types:

Note that the order matters. We want if, def, val recognized as reserved words, not identifiers.
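The effect of the ordering can be sketched with two of the patterns used in this lab. The helper classify and the labels "reserved" and "identifier" are hypothetical names chosen for this illustration; findPrefixOf is the Regex method that matches only at the start of the input.

```scala
import scala.util.matching.Regex

object OrderDemo {
  val reserved: Regex = "if|def|val".r
  val identifier: Regex = """\p{L}(\p{L}|\p{N}|_)*""".r

  // Returns the label of the first pattern (in list order)
  // that matches at the start of the input, if any.
  def classify(input: String, labelled: List[(String, Regex)]): Option[String] =
    labelled.collectFirst {
      case (label, re) if re.findPrefixOf(input).isDefined => label
    }
}
```

With the list List("reserved" -> reserved, "identifier" -> identifier), the input "if(x<0)" is classified as reserved. With the two entries swapped, the same input is classified as identifier, because the identifier pattern also matches the prefix if.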

In this step, write a function firstMatch(input : String, patterns : List[Regex]): String that tries the regular expressions in order and returns the first match found at the start of the input string, or null if none of them matches there. For example,

val patterns = List("if|def|val".r, """\p{L}(\p{L}|\p{N}|_)*""".r,
   """[+-]?\p{N}+""".r, "[+*/%<=>-]".r, "[(){};]".r, """\p{Z}+""".r
)
val input = "if(x<0) 0 else root(x);"
firstMatch(input, patterns)
String : if
firstMatch(input.substring(2), patterns)
String : (

What is the code for your function?

Hint: This is simple recursion. If the list of patterns is empty, return null. If the first regex matches at the start of the input, return the match; otherwise call firstMatch(input, patterns.tail).
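One possible shape of this recursion is sketched below. It is not the only solution; it assumes that a match must start at the beginning of the input, which is why it uses Regex.findPrefixOf, and the enclosing object name FirstMatchSketch is made up for illustration.

```scala
import scala.util.matching.Regex

object FirstMatchSketch {
  // The patterns from the example in this step.
  val patterns: List[Regex] = List("if|def|val".r, """\p{L}(\p{L}|\p{N}|_)*""".r,
    """[+-]?\p{N}+""".r, "[+*/%<=>-]".r, "[(){};]".r, """\p{Z}+""".r)

  def firstMatch(input: String, patterns: List[Regex]): String =
    if (patterns.isEmpty) null // no pattern left to try
    else patterns.head.findPrefixOf(input) match {
      case Some(m) => m                                  // first pattern matches at the start
      case None    => firstMatch(input, patterns.tail)   // try the remaining patterns
    }
}
```

With these definitions, firstMatch("if(x<0) 0 else root(x);", patterns) yields if, and the same call on the input with its first two characters removed yields (.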

Step 4: Complete the Lexer

Write a function tokens(input : String, patterns : List[Regex]) : List[String] that returns a list of matching tokens. For example,

tokens(input, patterns)
List[String] = List(if, (, x, <, 0, ), , 0, , else, , root, (, x, ), ;, )

That's again simple recursion. If the input is empty, return the empty list. Otherwise, get the first match by calling firstMatch. If it's null, return the empty list. Otherwise, return the first match prepended to the result of recursively calling tokens(input.substring(first.length), patterns).

What is the code of your function?
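The recursion described in this step might look as follows. This is a sketch under the same assumption as before, namely that firstMatch only matches at the start of the input; it repeats a possible firstMatch so that the block is self-contained, and the object name TokensSketch is made up for illustration.

```scala
import scala.util.matching.Regex

object TokensSketch {
  // The patterns from Step 3.
  val patterns: List[Regex] = List("if|def|val".r, """\p{L}(\p{L}|\p{N}|_)*""".r,
    """[+-]?\p{N}+""".r, "[+*/%<=>-]".r, "[(){};]".r, """\p{Z}+""".r)

  // A possible firstMatch: match at the start of the input, or null.
  def firstMatch(input: String, patterns: List[Regex]): String =
    if (patterns.isEmpty) null
    else patterns.head.findPrefixOf(input) match {
      case Some(m) => m
      case None    => firstMatch(input, patterns.tail)
    }

  def tokens(input: String, patterns: List[Regex]): List[String] =
    if (input.isEmpty) Nil // nothing left to scan
    else {
      val first = firstMatch(input, patterns)
      if (first == null) Nil // no pattern matches: stop
      else first :: tokens(input.substring(first.length), patterns)
    }
}
```

For the input "if(x<0) 0 else root(x);" this sketch produces 16 tokens. Three of them consist of a single space, which is why some entries look empty when the list is printed.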