Showing posts with label Eta. Show all posts
Showing posts with label Eta. Show all posts

Monday, 2 August 2010

Eta, Part II - Syntax (part I)

Many people have pointed out that language designers tend to obsess over syntax far too much and that their time would be better spent thinking about the semantics of their languages. Some (usually those who are either more academically inclined or old lispers) go so far as claiming that syntax is ultimately irrelevant, since a) which syntax someone prefers is largely a matter of taste anyways and b) every syntax becomes 'natural' after sufficient exposure.
Well, this topic has been discussed thoroughly, and I will only add to it to the extent that I am going to justify my own design decisions on the matter.
On the most abstract level a program can be thought of as a nested structure of operations being applied to sub-units which again consist of operations being applied to sub-sub-units, and so forth. Within a compiler this structure is usually represented as a so-called AST (abstract syntax tree).
If we print out an AST in parenthesized polish notation we would essentially end up with Lisp's syntax. This very elegant idea has a couple of advantages - it is extremely simple, easy to parse and totally generic (note though that the oft-heralded homoiconicity of Lisps is a red herring in my opinion - in every language that I know of it would not be difficult to represent a program's AST in the language itself).
On the other hand - at least for me - this genericity makes programs more difficult to read, especially at a glance, since it lacks redundancy. In Lisp the only carriers of information about the structure of a program are the names of operations and the nesting structure. In most main-stream languages however syntax is used as an additional redundant channel of communication. This redundancy makes it much easier to quickly grasp the structure of a piece of source code.
Have a look at this bit of C for example:

 3 struct Point
 4     {
 5     float x, y;
 6     };
 7 
 8 float point_dist(Point p1, Point p2)
 9     {
10     float dx = p2.x-p1.x, dy = p2.y-p1.y;
11     
12     return sqrt(dx*dx + dy*dy);
13     }

We can see that the same basic functionality is provided by very different syntactical elements depending on the context. The separation of terms for example is done by whitespace (top-level), ',' (declarations) and ';' (statements). Grouping is done by '()' (arithmetics, actually not shown in this example), operator precedence (arithmetics) and '{}' (statements). The application of an operation to arguments is expressed either in infix notation (arithmetics), prefix with '()' (function call), plain prefix (flow control keywords) or implicitly (declarations).
Of course this mess is far removed from the theoretical purity of Lisp's S-expressions. However it allows us to very quickly distinguish between different kinds of operations and different kinds of lists of terms. Looking for a declaration - spot names separated by whitespace, looking for function calls - find name + '()', and so on.
Redundancy therefore clearly serves to support readability (or "glanceability"). Too much of it on the other hand will certainly have an opposite effect. The optimal syntax will consequently add just enough redundancy to improve readability. (side note: There is also useless redundancy - Pascal is a lot more redundant than C, however mostly due to the fact that it uses keywords instead of punctuation and longer keywords. In my opinion this reduces readability. A similar argument could be made for Java.)
To maximize the effect of syntax it is also important that there is as little ambiguity in the correspondence between syntactic elements and semantic structure as possible. A nice counter-example is provided by C++. By "overloading" old syntax it becomes a lot harder to read (quickly) than C.
In Eta I wanted the overall look to stay somewhere in the vicinity of a traditional curly-brace language. At the same time I wanted it to be as simple and regular as possible while defining an unambiguous relationship between syntactic elements and semantics. (side note: This sounds a lot more goal-oriented than it was. Actually it took me quite a while to find out that these were the goals I was aiming for.)
This post is already long enough however, therefore I will postpone the details of Eta's syntax to the next post. As a small teaser the example from above rewritten in Eta:

1 Point @ type : (x @ float, y @ float)
2 
3 point_dist(p1 @ Point, p2 @ Point) @ float :
4     {
5     dx @ float : p2.x-p1.x
6     dy @ float : p2.y-p1.y
7     
8     <- sqrt` dx*dx + dy*dy
9     }

Thursday, 22 July 2010

Eta

In the last two weeks what began as a small attempt at writing a work-saving template system for individual-based simulations turned into my first sort-of somewhat working (but not so work-saving) compiler for Eta (or η).
Eta is a project I have been working on/thinking about since a couple of years already. It started off with my increasing dislike for all the syntactic and semantic warts of the bloated mess that is C++. At the time I was desperately looking for an alternative, however everything I found that promised enough performance was either immature or only slightly less warty and bloated (sidenote: I think D is a much better language than C++ and I really hope it catches on. Still, it's far from being a good language IMHO.).
So I did what everybody who should instead really, really work on his PhD thesis (and I'm *not* talking about a PhD in Computer Science) would do - I started thinking about how to design my own language.
As others have said language design tends to be more successful when it tries to scratch a personal itch than when it attempts to solve other people's problems. In my case I really wanted a language that would make it easier and more fun to implement the simulations I am working on. That means the language had to be
  • fast. It *does* make a difference whether I have to wait 5 days or 8 days for a set of simulations to finish.
  • strictly and statically typed. As I said before the major difficulty when writing simulations is that errors are often silent. Every bit of static guarantee the compiler can give helps.
  • interoperable with C/C++. I'm not going to reimplement all the libraries I use.
  • expressive. For reasons of efficiency dynamic operations are often out in simulations. Therefore similar redundant patterns start to show up at lots of different places. E.g. if I add a new trait to an individual it has to be read, set, initialized, mutated, written to the data file, read from a config file, etc. Some of it can be alleviated by some advanced template wizardry but in the end I sooner or later usually fall back to external code generation. My ideal language would have built-in compile-time macros to solve this problem.
  • clear and unambiguous. One problem with C++ is that understanding what *exactly* a particular piece of code does can be non-trivial. Apart from syntactic idiosyncrasies things like silent shadowing of globals and implicit conversion rules make it necessary to be aware of a big amount of context to understand local semantics. This is especially a problem for simulations since (due to lack of external tests) code review plays an essential role in ensuring their correctness. A good language should therefore reduce the amount of context necessary to understand a piece of code as much as possible.
In addition I wanted a *nice* language. This is of course to a large degree a matter of taste, but for me an aesthetically pleasing language has to be concise - I hate the wordiness of Java or Pascal. Also I really like operators, in my opinion they greatly increase readability (I'm constantly tempted by unicode, but until now issues with inputting non-ascii characters have kept me from going that way. If only everybody had APL keyboards...). Furthermore I want a language to be as orthogonal as possible - everything should be the result of the combination of a few simple rules. I really dislike ad-hoc rules just to achieve some convenient effect or pointless duplication of functionality (why does D for example need templates, CTFE, macros"string mixins", traits and conditional compilation?).
Apart from these general principles I had a couple of specific technical ideas about mechanisms I wanted to include in the language. I will leave the details on that, on how I implemented Eta and on how the language looks like currently to the next post, however.