Tuesday 30 August 2011

A little Lua DSL for generating CSS


Lua has always been a good data description language, and this makes it well suited to internal DSLs, that is, domain-specific languages which are implemented in the language itself.

Consider the task of generating CSS. It is a very powerful notation, but not a complete language; there are no variables, functions or control structures. A common strategy is to generate the CSS from a template. The CSS for Blogger sites is embedded in an XML file which parameterizes the colours and widths involved. Like most XML formats, it is clumsy and not exactly friendly to work with.

Apart from the parameterization problem, I personally find it hard to remember the formats of some common CSS properties and wanted a notation that fitted my head better.

This is what we will be working towards initially:

 css "ul li" {
     list_style = "none inside"
 }

Lua provides convenient syntactical sugar for function calls with a single argument that is either a string or a table constructor; the parentheses may be omitted. So the implementation is straightforward; css is a function which is given a selector and returns a function which processes the 'spec' table.

 function css (selector)
     out:write(selector..' ')
     return function (spec)
         out:write '{\n'
         for prop, val in pairs(spec) do
             prop = prop:gsub('_','-')
             out:write('\t'..prop..': '..val..';\n')
         end
         out:write '}\n'
     end
 end

Here out can be any object which supports the write method, so that out = io.stdout is a good start for testing. There is a little massaging of the property names so that we can use underscores instead of hyphens, but otherwise it's a fairly literal translation.

As a special case, we'd like to write margin or padding using these forms:

 margin = 5;
 margin = {left='0.5em',right='0.5em'}

Sizes are either numbers or strings; if they are numbers they are explicitly converted into pixel values. The second declaration translates into two CSS properties:

 margin-left: 0.5em;
 margin-right: 0.5em;

so the strategy will be to go through the 'spec' (table of Lua key/value pairs) and expand these.

 function process_margin (tbl,spec,side)
    if type(spec) ~= 'table' then
       tbl['margin'..side_str(side)] = size_str(spec)
    elseif has_sides(spec) then
       for side, sspec in pairs(spec) do
          process_margin(tbl,sspec,side)
       end
    else
       error('must contain only top, left, right and bottom keys')
    end
 end

size_str will take a number like 5 and return '5px', pass strings through and otherwise throw an error. side_str('left') expands to '-left', and side_str(nil) expands to ''. The other auxiliary function has_sides is only true if every key of the spec table is one of the side names; in this case the function is applied recursively to each of the values. A simple generalization also covers the similar padding case.

Similar recursive logic gives us this notation for border:

 border = true; -- default border
 border = {width=2}; -- border with width 2px
 border = {left=true}; -- default border on left side only
 border = {left={color='#DDD'}} -- left border with given colour

Apart from being more explicit, this format is easier to parameterize; the sizes are all numbers, and the colours are separate strings.

Currently, the selectors are still plain strings. That is fine enough, but we can do better:

 css.h2 {
    color = '#999',
    border = {bottom=true}
 }
 css.id.left {
    float = 'left',
    width = leftm,
    border = {right = {color = '#AAA'}}
 }
 css.id.left.ul {
    list_style = 'none inside',
    padding = {left = 0}
 }

Here the word id is special; it will translate as '#'. But how to make css.id.left generate the selector string '#left'?

Chains of properties can be easily converted into a set of operations with the 'dot builder' pattern.

 obj = setmetatable({},{
     __index = function(self,key)
         self[#self+1] = key
         return self
     end
 })

The __index metamethod only fires if the object does not contain the key. This is exactly what we want here, because the object is just used as an array, and has no key/value associations. Each 'lookup' merely returns the object, so it is executed purely for the side-effect of updating the array.

 > = obj.fred.alice.jane
 table: 0x00376d80
 > for i = 1,#obj do print(obj[i]) end
 fred
 alice
 jane
 > = table.concat(obj,',')
 fred,alice,jane

With a little special handling of id and class, this gives us the desired notation.

Flexible layouts can be easily generated dynamically using this DSL:

 require 'css'
 width = 500
 lmargin = 50
 leftm = 150
 gap = 10
 fore = '#FFFFFF'
 back = '#000000'
 left_fore = '#FFFFFF'
 left_back = '#000000'
 css.body {
    width = width,
    margin = {left=lmargin},
    color = fore,
    background_color = back
 }
 css.id.left {
    float = 'left',
    width = leftm,
    color = left_fore,
    background_color = left_back
 }
 css.id.content {
    margin = {left=leftm+gap},
 }
 print(tostring(css))  -- could write to file, etc

I present this more as an example of how flexible Lua DSLs can be, rather than a working solution. Mostly there's a good reason to keep CSS files separate from code, but at the least this provides a flexible way to generate that CSS.

With Lua web frameworks such as Orbiter, this can be directly used for interactive style modification/customization.

The full source for css.lua is available here

Thursday 11 August 2011

Pretty (and Lightweight) Code Highlighting with Lua

When I write articles, I like to write text in a plain editor, with no formatting options available. So cute on-line editors are not my cup of tea; they do not scale to more than a few paragraphs.

Blogger does allow you to bypass the cute but irritating editor and work directly with HTML, which can be pasted straight in. However, blank lines are treated as paragraph breaks, which is inconvenient if you are generating the HTML with some preprocessor.

Markdown is a good way to write text with hyperlinks, emphasis and maybe the occasional list; it has been designed to be a good publishing format and deliberately does not give you too many options. Any indented block is rendered as-is as <pre><code>. However, I am also a programmer and like source examples to look reasonably pretty. So feeding this blog presented a technical challenge.

One way to get syntax highlighting is using client-side JavaScript like Alex Gorbatchev's SyntaxHighlighter. You can include this in Blogger, and the result is indeed good looking. The approach works by marking up code samples so that the highlighter can scan the code and re-arrange the DOM on the fly.

The first downside is that this involves a fair amount of extra typing for each snippet:

 <script type='syntaxhighlighter' class='brush: c#; wrap-lines: false'>
 foreach (var str in new List<string> { "Hello", "World" })
 {
    Console.Write(str);
    Console.Write(" ");
 }
 </script>

The second one is that pulling in all this extra JavaScript can lead to an appreciable lag; see, for instance, the Lua snippets site. So a better solution is to generate the pretty HTML up front; a few hundred extra bytes of markup will render practically instantly.

So this felt like a job for Captain Scripting.

The difference between a script and a program has been endlessly debated; the classic position is often called Ousterhout's Dichotomy where the world is divided into system and scripting languages. In this view, the ideal script is just 'glue' that flexibly connects parts written in some other language, which is an extension of the classic Unix Way.

It's ultimately a scale thing, not a language thing; a script is a small program that scratches a single itch, often for a single person. By this measure you can do scripting in any language (although it can get tedious with C because it's too low level). Dynamic languages do have a big advantage for shorter programs, which is one reason why any serious programmer should have one of them in their toolkit.

Lua is my favourite little language, and a fair amount of the open-source work I've done over the last few years has been a response to the often-quoted remark "Lua does not come with batteries". In many ways, Lua is the C of dynamic languages: compact and efficient. This is more than an analogy, since the authors of Lua base the core functionality of the language around the abstract platform provided by ISO C. They see it as the job of the community to provide libraries, and generally you can find them for most needs. If you prefer the batteries-included approach there is Lua for Windows; LuaRocks is a packaging tool, like Ruby's Gems; either can easily provide the prerequisites for this script: the markdown and penlight packages.

Niklas Frykholm's markdown.lua can be used as a command-line Markdown-to-HTML converter, but is most useful as a library. This is the basic engine needed by our script.

 require 'pl'
 require 'markdown'
 local text = utils.readfile('article.md')
 text = markdown(text)
 utils.writefile('article.html',text)

Here I'm leaning on Penlight to do some of the grunt work. You will notice some of the hallmarks of the scripting spirit; no error checking. That's cool for our purposes at first; error handling can always be added later.

The entertaining part is syntax highlighting. Penlight provides a lexical scanner which can read source and break it up into classified tokens like 'string', 'keyword', etc.

 local spans = {keyword=true,number=true,string=true,comment=true}
 function prettify (code)
    local res = List()
    res:append '<pre>\n'
    local tok = lexer.lua(code,{},{})
    local t,val = tok()
    if not t then return nil,"empty file" end
    while t do
       val = escape(val)
       if spans[t] then
          res:append(span(t,val))
       else
          res:append(val)
       end
       t,val = tok()
    end
    return res:join ()
 end

The scanner tok is a function which returns two things, the token type and its string value. We find the types to be highlighted by looking them up in the table spans. (This is an example of the 'pythonic' style that Penlight enables by providing a List class.)

This generates the HTML spans:

 local function span(t,val)
    return ('<span class="%s">%s</span>'):format(t,val)
 end

That is, we just assume that the CSS contains classes that give the token names a particular colour, etc.

 .keyword {font-weight: bold; color: #6666AA; }
 .number  { color: #AA6666; }
 .string  { color: #8888AA; }
 .comment { color: #666600; }

Finally we have to escape things like < as &lt;:

 local escaped_chars = {
    ['&'] = '&amp;',
    ['<'] = '&lt;',
    ['>'] = '&gt;',
 }
 local function escape(str)
    return (str:gsub('[&<>]',escaped_chars))
 end

Lua's string.gsub is a marvelous function, here mapping any special characters to their HTML representations using a lookup table.

All that remains to be done is to identify the indented blocks and convert them using prettify. That's straightforward but tedious.

The final bit of massaging happens to the generated HTML after Markdown processing has taken place: normally blank lines mean nothing in HTML, but Blogger is making up its own rules here. So we scrub out extra lines after paragraphs and code blocks:

 function remove_spurious_lines (txt)
    return txt:gsub('</p>%s*','</p>\n'):gsub('</pre>%s*','</pre>\n')
 end

And that's basically it; prettify.lua is passed the article name (without the .md extension) and the language to use, and writes out a file name.html in a form that can be directly pasted into Blogger. If there is a third argument, then an HTML document with inlined style is generated, which is useful for previewing.

To see the colours, you have to modify your blog's CSS, but this is straightforward; just skip down until you see the CSS and paste in the above CSS snippet. (I found a pre { font-weight: bold } helped readability; the rest is a matter of taste.) This technique will of course work with other blog engines that allow direct HTML entry.

The moral of the story: batteries are important. Having the right tools around makes it easier to do the job without too much copy-and-pasting from the World Wide Scratchpad.

The final script is available here.

Wednesday 10 August 2011

Enjoying Java (Again)

Fun with Java seems like an unlikely combination these days. All the cool kids have moved to dynamic languages, or to more exciting JVM languages like Scala. Meanwhile the old dog plods along, not really learning any new tricks, doing a corporate gig as the COBOL of the 21st Century.

I have a fair amount of stuff prototyped with LuaJava, which had grown into a hairy blob which needed rewriting - the point at which exploratory programming breaks down and you have to move from scripting to programming. This does not automatically mean changing from a dynamic language, of course: 'scripting language' is pejorative and somehow implies that doing better isn't possible given the language. But here I had to make the adult choice between a language which my peers did not know and one which would be a better bet for maintainability. But I was determined to have fun in the process.

It is perfectly possible to enjoy Java programming, especially if you use a more dynamic way of thinking. Certain sacred cows will be inconvenienced, of course. The heart of the matter is how to be sure that a program is correct; the static types perspective is that the compiler should catch as many problems as possible, whereas the dynamic perspective is that errors will happen anyway, so make sure that they happen as soon as possible. A language like Java allows solutions along the whole continuum between these positions.

The verbosity of Java is an issue, but can often be worked around creatively. Several features introduced in Java 1.5 are helpful. (Some of these clearly came about because of competition from C#.) The first is methods that can take a variable number of arguments of a type:

 static Object[] A(Object... objects) {
     return objects;
 }

This is useful and also succinctly expresses how the varargs mechanism works; it is just syntactical sugar for passing an array. Now we can do cool things like this without the distraction of the syntactical overhead:

 Object[] ls = A("hello",2.3,A(true,2,"what"));

Auto-boxing (another feature 'inspired' by C#) makes these list expressions work as expected. The result is in fact rather close to what a Python or Lua programmer would recognize as data.
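To see what auto-boxing actually puts into such a list, here is a minimal sketch. The class name ListLiterals and the helper show are my own for illustration; only A comes from the text above.

```java
import java.util.Arrays;

class ListLiterals {
    // Same helper as above: varargs hands us the argument array directly.
    static Object[] A(Object... objects) {
        return objects;
    }

    // Render a (possibly nested) list built with A, for inspection.
    static String show(Object[] list) {
        return Arrays.deepToString(list);
    }
}
```

With this, show(A("hello", 2.3, A(true, 2, "what"))) gives [hello, 2.3, [true, 2, what]]; the 2.3 has been boxed as a Double, the 2 as an Integer and true as a Boolean.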

The Collection Literals proposal would certainly be useful, but appears not to have made it into Java 7. (In any case, this proposal only applies to immutable lists, maps, etc.)

Here is an easy map constructor:

 static Map M (Object... objects) {
     Map map = new HashMap();
     for (int i = 0; i < objects.length; i += 2) {
         map.put(objects[i], objects[i+1]);
     }
     return map;
 }
 ...
 Map<String,Integer> map = M("one",1,"two",2,"three",3);

Eclipse does not approve of this code, and expresses itself with yellow ink; we have lost some compile-time guarantees and gained something in return. Is the increased expressiveness worth the loss in type-safety? It depends how you evaluate the cost; a dynamic-language person is used to type uncertainty, and immediately thinks about error checking and testability. No seasoned programmer regards a correct compilation as anything but the first step to correct code. The dynamic strategy is to gain expressivity and flexibility, losing some static type safety, and try to fail as soon as possible in the development cycle. That is, 'late binding' is best if the binding isn't too late.
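One way to make the binding less late is to check the pairs while the map is being built. This is only a sketch of that idea, not code from the project: the class name CheckedMaps and the two Class parameters are assumptions of mine.

```java
import java.util.HashMap;
import java.util.Map;

class CheckedMaps {
    // Hypothetical variant of M: the caller names the key and value types,
    // and every pair is checked as it goes into the map.
    static <K, V> Map<K, V> M(Class<K> keyType, Class<V> valueType, Object... objects) {
        if (objects.length % 2 != 0) {
            throw new IllegalArgumentException("need key/value pairs");
        }
        Map<K, V> map = new HashMap<K, V>();
        for (int i = 0; i < objects.length; i += 2) {
            // Class.cast throws ClassCastException here, at construction,
            // rather than later when a value is read back out of the map.
            map.put(keyType.cast(objects[i]), valueType.cast(objects[i + 1]));
        }
        return map;
    }
}
```

A call like M(String.class, Integer.class, "one", 1, "two", "2") now fails immediately, whereas the raw version would only fail when somebody tried to use "2" as an Integer. The price is two extra arguments, which is the usual trade between notation and checking.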

Many would argue that the type-safety guarantees that Java can provide are hopelessly inadequate anyway (any variable of object type can be null for instance.) So they seek languages where ultimately it is impossible to write incorrect programs (at the cost of making it impossible to write non-trivial programs, but I digress.)

There are many applications for which Java reflection was clearly intended. There is an irritating amount of detail involved in writing command-line programs in any language, and the following is one way of approaching the problem. The immediate inspiration was lapp, a framework for writing command-line Lua scripts; the idea was that parameters could be declared as having specific types, e.g. files-for-reading, and they would then become directly available for use, with the framework closing any open files at the end.

The introduction of annotations (what C# calls attributes) was another useful innovation that came with 1.5 (again, arguably under pressure from C#).

 @Help("Demonstrating a simple command-line framework")
 public class SimpleCommand extends Commandlet {
     @Help("scale factor")
     public double scale = 1.0;
     @Help("adds two numbers together and scales")
     public double add(double x, double y) {
         return (x+y)*scale;
     }
     public static void main(String[] args) {
         new SimpleCommand().go(args);
     }
 }
 $> java SimpleCommand --help
 Demonstrating a simple command-line framework
 Flags:
 -scale: scale factor
 Commands:
 add:    adds two numbers together and scales
 $> java SimpleCommand add 1.2 4.2
 5.4
 $> java SimpleCommand -scale 2 add 1.2 4.2
 10.8

Commandlet is a straightforward little framework (about 500 lines, which is micro by Java standards) that handles some pesky details. From reflection, it knows what types a command expects and converts the parameters accordingly. (I'm not a fan of prescriptive frameworks, so it is not necessary to use @Help annotations; they are just used to provide help, rather like Python doc strings.)
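The core of that reflection step can be sketched in a few lines. This is not the Commandlet source; MiniDispatch, convert, call and help are names invented for the illustration, and the Help annotation here is a stand-in for the real one. The one detail that matters is RUNTIME retention, without which reflection cannot see the annotation.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Stand-in for @Help; RUNTIME retention keeps it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@interface Help {
    String value();
}

class MiniDispatch {
    @Help("adds two numbers together")
    public double add(double x, double y) {
        return x + y;
    }

    // Convert one string argument to the type the method declares.
    static Object convert(String s, Class<?> type) {
        if (type == double.class) return Double.parseDouble(s);
        if (type == int.class) return Integer.parseInt(s);
        if (type == String.class) return s;
        throw new IllegalArgumentException("no conversion for " + type.getName());
    }

    // Find a public method by name and argument count, convert, and invoke.
    static Object call(Object target, String name, String... args) {
        for (Method m : target.getClass().getMethods()) {
            Class<?>[] types = m.getParameterTypes();
            if (!m.getName().equals(name) || types.length != args.length) continue;
            Object[] values = new Object[args.length];
            for (int i = 0; i < args.length; i++) {
                values[i] = convert(args[i], types[i]);
            }
            try {
                return m.invoke(target, values);
            } catch (Exception e) {
                throw new IllegalStateException("call failed: " + name, e);
            }
        }
        throw new IllegalArgumentException("unknown command " + name);
    }

    // Look up the help text attached to a command, or null if there is none.
    static String help(Object target, String name) {
        for (Method m : target.getClass().getMethods()) {
            Help h = m.getAnnotation(Help.class);
            if (m.getName().equals(name) && h != null) return h.value();
        }
        return null;
    }
}
```

Here call(new MiniDispatch(), "add", "1.5", "2.5") converts both strings to double because that is what add declares, and returns a boxed 4.0.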

Naturally, after I did this I discovered that cliche does something very similar, except for writing interactive shells. This is more of a library - you do not derive your classes from a framework class - which is arguably a more flexible design.

The only tricky bit of reflection magic needed was to support variable argument lists:

 @Help("takes a variable number of doubles")
 public void sum(double... vals) {
     double sum = 0.0;
     for (double x : vals) {
         sum += x;
     }
     System.out.println ("sum was " + sum);
 }

Invoking this method by reflection involves passing a double[] array, so for vararg methods any extra parameters are collected and converted into a primitive array of doubles.

 if (method.isVarArgs()) {
   Class<?> type = types[nargs].getComponentType();
   Object varargs = Array.newInstance(type, parms.length - nargs);
   for (int i = nargs, j = 0; i < parms.length; ++i,++j) {
      Object res = convertString(parms[i],type);
      Array.set(varargs,j,res);
   }
   values.add(varargs);
 }
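For anyone who wants to try the varargs trick in isolation, here is a self-contained sketch. It assumes a method whose only parameter is double..., and the names VarArgsCall and callSum are made up for the example.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Method;

class VarArgsCall {
    public static double sum(double... vals) {
        double total = 0.0;
        for (double x : vals) {
            total += x;
        }
        return total;
    }

    // Pack string arguments into the primitive array that sum really takes,
    // then invoke it reflectively.
    static double callSum(String... parms) {
        try {
            // To reflection, double... is just a double[] parameter.
            Method method = VarArgsCall.class.getMethod("sum", double[].class);
            if (!method.isVarArgs()) {
                throw new IllegalStateException("expected a varargs method");
            }
            Class<?> type = method.getParameterTypes()[0].getComponentType();
            Object varargs = Array.newInstance(type, parms.length);
            for (int i = 0; i < parms.length; i++) {
                Array.setDouble(varargs, i, Double.parseDouble(parms[i]));
            }
            // varargs is typed as Object, so it is passed as ONE argument
            // rather than being spread out as invoke's own argument list.
            return (Double) method.invoke(null, varargs);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The subtle point is the last one: if the array were an Object[] it would be taken as the whole argument list of invoke, which is a classic source of confusing "wrong number of arguments" errors.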

Being one of those people who can never remember how to read a text file in Java, I've made BufferedReader a known type:

 @Help("read a file and trim each line")
 public void read(BufferedReader file) throws IOException {
     String line = file.readLine();
     while (line != null) {
         System.out.println(line.trim());
         line = file.readLine();
     }
 }

Parameters of type int, double, String, BufferedReader and PrintStream are known, and others can be added. Say I have this method:

 public void dec (byte[] arr) {
     for (byte b : arr) {
         System.out.print(b+" ");
     }
     System.out.println();
 }

Then defining how byte[] is to be read in can be done like so:

 @Converter
 public byte[] toByteArray(String s) {
     byte[] res = new byte[s.length()/2];
     for (int i = 0, j = 0; i < s.length(); i += 2, j++) {
         String hex = s.substring(i,i+2);
         res[j] = (byte)Short.parseShort(hex, 16);
     }
     return res;
 }
 $> java bytearr dec AF03EE
 -81 3 -18

The strategy is simple: if a parameter type is unknown, then look at all public methods marked with Converter and match against their return types. It's interesting to contrast this convention with the approach to converters taken by Cliche which is more of a classic Java solution (define a new interface and create an anonymous class implementing it).
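A rough sketch of that lookup, under the assumption that a converter is a public method taking a single String. The Converter annotation below is a stand-in for the real one, and ConverterLookup and convert are illustrative names, not part of Commandlet.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;

// Stand-in marker annotation; it must be retained at runtime to be found.
@Retention(RetentionPolicy.RUNTIME)
@interface Converter {
}

class ConverterLookup {
    @Converter
    public byte[] toByteArray(String s) {
        byte[] res = new byte[s.length() / 2];
        for (int i = 0, j = 0; i + 1 < s.length(); i += 2, j++) {
            res[j] = (byte) Short.parseShort(s.substring(i, i + 2), 16);
        }
        return res;
    }

    // Search the public methods for a marked converter whose return type
    // is the parameter type we do not know how to read.
    static Object convert(Object target, String s, Class<?> wanted) {
        for (Method m : target.getClass().getMethods()) {
            Class<?>[] types = m.getParameterTypes();
            boolean takesString = types.length == 1 && types[0] == String.class;
            if (m.isAnnotationPresent(Converter.class)
                    && takesString && m.getReturnType() == wanted) {
                try {
                    return m.invoke(target, s);
                } catch (Exception e) {
                    throw new IllegalStateException("converter failed", e);
                }
            }
        }
        throw new IllegalArgumentException("no converter for " + wanted.getName());
    }
}
```

Note that the method name never appears in the search; only the annotation and the return type matter, which is what makes this a convention rather than an interface.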

Similarly, you gain control over output by defining 'stringifiers':

 public byte[] test() { return new byte[] {0x4E,0x3C,0x02}; }
 @Stringifier
 public String asString(byte[] arr) {
     StringBuffer sb = new StringBuffer();
     for (int i = 0; i < arr.length; i++) {
         sb.append(String.format("%02X",arr[i]));
     }
     return sb.toString();
 }
 ...
 $> java bytearr test
 4E3C02

Here it's the first argument type that must match the output type, and the return value must be a string. As with Converters, the actual name of the method is not important.

Usually the idea is to turn data into something more readable for humans, but it's also useful to generate standard formats that other programs can easily parse. JSON is a popular representation of data that fits well with the first theme of this article, which is discovering expressive notations. The elegant little function J makes creating dynamic structured data almost as natural as it is in JavaScript or Python:

 public Json result(int a, int b) {
     return J("one",A(a,10*a),"two",A(b,10*b));
 }
 $> java JDataTest result 10 20
 {
  "two":[
   "20",
   "200"
   ],
  "one":[
   "10",
   "100"
   ]
  }

The class Json is derived from a map from strings to objects, so J can be expressed like this in a way that makes the potential errors clearer.

 static void error (String message) throws IllegalArgumentException {
     throw new IllegalArgumentException(message);
 }
 static Json J(Object... objects) throws IllegalArgumentException {
     Json res = new Json();
     if (objects.length % 2 != 0)
         error("need key/value pairs");
     for (int i = 0; i < objects.length; i += 2) {
         if (! (objects[i] instanceof String))
             error("keys must be strings");
         res.put((String) objects[i], objects[i+1]);
     }
     return res;
 }

Json also contains a method for generating an optionally pretty-printed text representation of this structure (useful, but not in itself particularly interesting code); it gives us the stringifier.

 @Stringifier
 public String json(Json map) {
     return Json.asText(map," ");
 }
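To make the pieces concrete, here is a compact stand-in that can be run on its own. It is not the real Json class: MiniJson is a hypothetical name, it keeps insertion order by extending LinkedHashMap, it writes numbers unquoted, and its string escaping only covers quotes and backslashes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MiniJson extends LinkedHashMap<String, Object> {
    static Object[] A(Object... objects) {
        return objects;
    }

    static MiniJson J(Object... objects) {
        if (objects.length % 2 != 0) {
            throw new IllegalArgumentException("need key/value pairs");
        }
        MiniJson res = new MiniJson();
        for (int i = 0; i < objects.length; i += 2) {
            if (!(objects[i] instanceof String)) {
                throw new IllegalArgumentException("keys must be strings");
            }
            res.put((String) objects[i], objects[i + 1]);
        }
        return res;
    }

    // Compact (not pretty-printed) text form of maps, arrays and scalars.
    static String asText(Object value) {
        StringBuilder sb = new StringBuilder();
        write(sb, value);
        return sb.toString();
    }

    private static void write(StringBuilder sb, Object value) {
        if (value instanceof Map) {
            sb.append('{');
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep);
                write(sb, String.valueOf(e.getKey()));
                sb.append(':');
                write(sb, e.getValue());
                sep = ",";
            }
            sb.append('}');
        } else if (value instanceof Object[]) {
            sb.append('[');
            String sep = "";
            for (Object item : (Object[]) value) {
                sb.append(sep);
                write(sb, item);
                sep = ",";
            }
            sb.append(']');
        } else if (value instanceof String) {
            // Simplified escaping: control characters are not handled.
            String s = ((String) value).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(s).append('"');
        } else {
            sb.append(String.valueOf(value)); // numbers, booleans, null
        }
    }
}
```

With these definitions, asText(J("one", A(1, 10), "two", A(2, 20))) yields {"one":[1,10],"two":[2,20]}.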

There was a discussion on Reddit recently about how JSON would make a good interchange format between command-line programs. We can go further, and do a Java shell that works like PowerShell; commands pass actual data between each other, which is then put out as text by various adapters. That is, in this line the piped 'programs' are classes which will pass data directly to each other - here baz is either a custom adapter, or one gets the default output in JSON.

 $> foo | bar | baz

And (perhaps) that will be something to amuse me on a rainy Sunday ...

In the meantime, at work I wanted to show a concrete example of a Web service that could allow our server speaking CORBA to communicate with a Django web server. Corbaphobia is a well-known phenomenon in the distributed programming universe and I wanted to make all this cool remote functionality available in a simpler format. JSON seemed a better match than XML for our data. So I realized that it would actually be fairly straightforward to derive a little web server from Commandlet and in fact it took less than 120 lines, thanks to classes like URLDecoder. A URL like 'find?name=alice&age=16' is turned into 'find -name alice -age 16' and then interpreted by Commandlet, with all such commands returning Json data with the above stringifier. The commands take no explicit parameters, which are instead passed as flags. Such servers must therefore define a resetFields method so they can clear out the fields before each new request.
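The URL rewriting itself is the easy part. A sketch of how it might go, assuming the query string uses the usual key=value pairs separated by '&'; UrlToArgs, toArgs and decode are names chosen for the example and are not taken from MiniJsonServer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

class UrlToArgs {
    // Turn "find?name=alice&age=16" into [find, -name, alice, -age, 16].
    static List<String> toArgs(String request) {
        List<String> args = new ArrayList<String>();
        int q = request.indexOf('?');
        args.add(q < 0 ? request : request.substring(0, q));  // the command
        if (q < 0 || q == request.length() - 1) {
            return args;                                      // no query part
        }
        for (String pair : request.substring(q + 1).split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            args.add("-" + decode(key));                      // flag name
            if (eq >= 0) {
                args.add(decode(pair.substring(eq + 1)));     // flag value
            }
        }
        return args;
    }

    // URLDecoder handles %XX escapes and '+' for spaces.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Returning a list rather than a single string avoids having to re-quote values that contain spaces, such as name=alice%20smith.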

Making such a simple web service interface to existing remote services turns out to be very straightforward, mostly due to the expressive power of the J function above.

 public String ssid;
 public int did;
 public Json getSingleDataRange() {
     if (ssid == null) return error("must provide 'ssid'");
     long sid = parseSid(ssid);
     if (sid == 0) return error("cannot parse "+ssid);
     if (did == 0) return error("must provide 'did'");
     Results results = remote.getSingleDataRange(sid, did, getTimeSpan(sid));
     // have to explicitly box the primitive arrays to serialize out to JSON
     int len = results.times.length;
     Object[] times = new Object[len];
     Object[] values = new Object[len];
     for (int i = 0; i < len; ++i) {
         times[i] = results.times[i];
         values[i] = results.values[i];
     }
     return J("times",times,"values",values);
 }

The error strategy is simple: don't worry about 404 or 500, just return some data that contains an error message.

 private Json error(String msg) {
     return J("error",msg);
 }

The test client was 33 lines of Python, another case of the right tool for the job, really just a matter of exercising standard Python libraries like httplib, json, pprint and urllib.

MiniJsonServer is what the Agile people so charmingly call a spike solution, i.e. what we less sophisticated hackers call an 'exploratory hack'. It is a hack because it transforms URLs into equivalent command-line parameters, and relies on the local state of the object to pass parameters.

We can do better than that, by annotating the arguments of the exposed methods in a style similar to that used by Cliche:

 public Json getSingleDataRange(
   @Param("ssid sensor id") String ssid,
   @Param("did detector id") int did,
   @Param("t1 start time") Time t1,
   @Param("t2 end time") Time t2
 ) {
 ...
 }

Then the parameters can be passed by name and meaningful errors generated semi-automatically. A good side-effect is that the parameters are now documented as well as named, and our little server can respond to a 'help' command with a useful summary of the server's functionality. (As the author of Cliche says, it is straightforward to re-arrange the JavaDoc of a function into this reflection-friendly form).

In this application there is freedom to specify times in various formats (not everyone likes to speak Java milliseconds-since-epoch). We support that by defining an internal time wrapper type called Time and making its conversion depend on a flag, tfmt.

There is always the temptation to push further; it's straightforward for MiniJsonServer to work as a more general local webserver. Some of this functionality fits the intended application, e.g. when the associated data from a sensor is some common binary format like an image, a request for data should return the data with that specific 'image/jpg' content type. (Encoding images in JSON strikes me as the kind of universal-language madness that got XML such a bad name.) But as I already have the Orbiter project for developing little local HTTP servers, this would be taking the fun too far.

Having a wide range of applications is good exercise for a framework, no matter how dinky. I've done a simple Swing console that integrates with Commandlet. The main method of such commandlets looks like this:

 public static void main(String[] args) {
     TestConsole me = new TestConsole();
     Console c = new Console("testing console","? ",me);
     me.go(c);
     c.setVisible(true);
 }

where Console is the Swing console window. It is relatively stupid and focussed on its job (which is the mark of a good class in my opinion) and implements the Displayable interface that provides a single method display. In turn, Commandlet implements an Evaluator interface providing eval. For maximum flexibility, Commandlet also implements Displayable and provides a default console implementation; Console implements Evaluator so that direct subclasses can use it directly. This is all standard Java thinking and decouples the engine from the interface pretty well.

In summary, apart from having some fun and wanting to talk about it, I wanted to show that more expressive notations in Java can be invented, if you are prepared to sacrifice some correctness. This trade-off should be done as a conscious engineering decision, and shouldn't be decided solely using abstract rules. After all, guidelines are not commandments.

The example code is available here