Last week, in order to avoid working on more important things, I started a small project I had wanted to do for quite a while. As mentioned before, I am really not happy with C++ as a language to write simulations in. Unfortunately, according to the Great Language Shootout, all the nice languages are way too slow to be seriously considered.
However - as aptly explained on the shootout page - comparing the speed of programming languages, or even the speed of implementations of programming languages, is not very meaningful. Results vary widely depending on the application area.
Therefore the only reasonable thing to do is to implement a benchmark that is representative of the kind of program one is interested in, in all candidate languages. Which is exactly what I have started to do.
I have defined a reference model and implemented it in (very, very basic) C++ and (C++ish) D - and in CUDA, which strictly speaking doesn't belong here. More on that in a later blog post, hopefully; I'm really bad at actually writing posts that I have announced before...
To my great delight it turned out that the D program ran only about 10% longer than my basic C++ version. In principle this is a small enough loss in speed to be compensated for by the nice improvements D offers over C++. I was really disappointed, though (you can see how much I would like to abandon C++), when I found out a bit later that PGO (profile-guided optimization) gives my C++ program another 20% boost. Add to that the fact that 64-bit D is still some way off and that value types are second-class citizens in D, and I'm not really sure whether I will do my next project in D or not. In any case it was nice to see the progress they have made. I am optimistic that my C++ days are numbered...
As for the shootout - I will try to find the time to add "proper" implementations in D and C++ (which hopefully will not perform much differently). I hope I will also be able to cover other interesting or up-and-coming languages, such as OCaml or BitC. If any one of my three readers feels willing and able to help out - just head over to the project page, read the model definition and hack away. I will be happy to post code/results.
Showing posts with label programming language.
Sunday, 31 October 2010
Monday, 2 August 2010
Eta, Part II - Syntax (part I)
Many people have pointed out that language designers tend to obsess over syntax far too much and that their time would be better spent thinking about the semantics of their languages. Some (usually those who are either more academically inclined or old Lispers) go so far as to claim that syntax is ultimately irrelevant, since a) which syntax someone prefers is largely a matter of taste anyway and b) every syntax becomes 'natural' after sufficient exposure.
Well, this topic has been discussed thoroughly, and I will add to it only insofar as to justify my own design decisions on the matter.
At the most abstract level, a program can be thought of as a nested structure of operations applied to sub-units, which again consist of operations applied to sub-sub-units, and so forth. Within a compiler this structure is usually represented as a so-called AST (abstract syntax tree).
If we print out an AST in parenthesized Polish notation, we essentially end up with Lisp's syntax. This very elegant idea has a couple of advantages - it is extremely simple, easy to parse and totally generic (note, though, that the oft-heralded homoiconicity of Lisps is a red herring in my opinion - in every language that I know of it would not be difficult to represent a program's AST in the language itself).
On the other hand - at least for me - this genericity makes programs more difficult to read, especially at a glance, since it lacks redundancy. In Lisp the only carriers of information about the structure of a program are the names of operations and the nesting structure. In most mainstream languages, however, syntax is used as an additional, redundant channel of communication. This redundancy makes it much easier to quickly grasp the structure of a piece of source code.
Have a look at this bit of C for example:
#include <math.h>

typedef struct
{
    float x, y;
} Point;

float point_dist(Point p1, Point p2)
{
    float dx = p2.x-p1.x, dy = p2.y-p1.y;

    return sqrt(dx*dx + dy*dy);
}
We can see that the same basic functionality is provided by very different syntactic elements depending on the context. The separation of terms, for example, is done by whitespace (top level), ',' (declarations) and ';' (statements). Grouping is done by '()' (arithmetic, not actually shown in this example), operator precedence (arithmetic) and '{}' (statements). The application of an operation to arguments is expressed either in infix notation (arithmetic), prefix with '()' (function calls), plain prefix (flow-control keywords) or implicitly (declarations).
Of course this mess is far removed from the theoretical purity of Lisp's S-expressions. However, it allows us to very quickly distinguish between different kinds of operations and different kinds of lists of terms. Looking for a declaration? Spot names separated by whitespace. Looking for a function call? Find a name followed by '()'. And so on.
Redundancy therefore clearly serves to support readability (or "glanceability"). Too much of it, on the other hand, will certainly have the opposite effect. The optimal syntax will consequently add just enough redundancy to improve readability. (Side note: there is also useless redundancy - Pascal is a lot more redundant than C, but mostly because it uses keywords instead of punctuation, and longer keywords at that. In my opinion this reduces readability. A similar argument could be made for Java.)
To maximize the effect of syntax it is also important that there is as little ambiguity as possible in the correspondence between syntactic elements and semantic structure. A nice counterexample is provided by C++: by "overloading" old syntax with new meanings it becomes a lot harder to read (quickly) than C.
In Eta I wanted the overall look to stay somewhere in the vicinity of a traditional curly-brace language. At the same time I wanted it to be as simple and regular as possible while defining an unambiguous relationship between syntactic elements and semantics. (Side note: this sounds a lot more goal-oriented than it was. It actually took me quite a while to find out that these were the goals I was aiming for.)
This post is already long enough, however, so I will postpone the details of Eta's syntax to the next post. As a small teaser, here is the example from above rewritten in Eta:
Point @ type : (x @ float, y @ float)

point_dist(p1 @ Point, p2 @ Point) @ float :
{
    dx @ float : p2.x-p1.x
    dy @ float : p2.y-p1.y

    <- sqrt` dx*dx + dy*dy
}
Thursday, 22 July 2010
Eta
Over the last two weeks, what began as a small attempt at writing a work-saving template system for individual-based simulations turned into my first sort-of, somewhat working (but not so work-saving) compiler for Eta (or η).
Eta is a project I have been working on and thinking about for a couple of years now. It started off with my increasing dislike for all the syntactic and semantic warts of the bloated mess that is C++. At the time I was desperately looking for an alternative; however, everything I found that promised enough performance was either immature or only slightly less warty and bloated. (Side note: I think D is a much better language than C++ and I really hope it catches on. Still, it's far from being a good language IMHO.)
So I did what everybody who should instead really, really be working on his PhD thesis (and I'm *not* talking about a PhD in Computer Science) would do - I started thinking about how to design my own language.
As others have said, language design tends to be more successful when it tries to scratch a personal itch than when it attempts to solve other people's problems. In my case I really wanted a language that would make it easier and more fun to implement the simulations I am working on. That means the language had to be:
- fast. It *does* make a difference whether I have to wait 5 days or 8 days for a set of simulations to finish.
- strictly and statically typed. As I said before, the major difficulty when writing simulations is that errors are often silent. Every bit of static guarantee the compiler can give helps.
- interoperable with C/C++. I'm not going to reimplement all the libraries I use.
- expressive. For reasons of efficiency, dynamic operations are often ruled out in simulations. Therefore similar, redundant patterns start to show up in lots of different places. E.g. if I add a new trait to an individual it has to be read, set, initialized, mutated, written to the data file, read from a config file, etc. Some of this can be alleviated by advanced template wizardry, but in the end I usually fall back to external code generation sooner or later. My ideal language would have built-in compile-time macros to solve this problem.
- clear and unambiguous. One problem with C++ is that understanding what *exactly* a particular piece of code does can be non-trivial. Apart from syntactic idiosyncrasies, things like silent shadowing of globals and implicit conversion rules make it necessary to be aware of a large amount of context in order to understand local semantics. This is especially a problem for simulations, since (due to the lack of external tests) code review plays an essential role in ensuring their correctness. A good language should therefore reduce the amount of context necessary to understand a piece of code as much as possible.
Apart from these general principles I had a couple of specific technical ideas about mechanisms I wanted to include in the language. I will leave the details on that, on how I implemented Eta and on what the language currently looks like to the next post, however.