These days retro-minimalism in desktop computing has become quite fashionable. People play old console classics, use minimalistic tiling window managers and many people go back to using text editors that their parents already grew up with.
I have always found this trend a bit weird, especially with respect to text editors. Why on earth should I throw 20 years of CUA, a mousable GUI and the convenience of not having to constantly switch between different modes overboard and start to use vi?
I myself learned to program with QuickBasic. Although Windows (I think some version 2.0 or 3.0) *was* running on my computer I did not use it since it was notoriously unstable and stole away precious RAM from my individual-based simulations (yes, this is what I did when I was 16...). Also the DOS IDE QuickBasic came with was actually quite nice. The built-in editor implemented a standard CUA interface, therefore that is what I became used to.
I was accordingly shocked when I started university and had to learn vi. Horrible non-intuitive key bindings, a modal interface, no menus - I was disgusted. Out of necessity I learned enough vi to get along (we weren't even allowed to use X terminals for the first two years), but I never liked it. At home I continued to use QuickBasic and QuickC, or, when I started to switch to DJGPP (it allowed programming for protected mode which meant I could use the whole *8MB* of RAM without jumping through hoops) I just went with the simple DOS edit command and later with rhide.
Later, after switching to Linux I luckily stumbled upon NEdit. NEdit actually made me happy - it was fast, featureful and extremely configurable and scriptable. Unfortunately - being based on Motif - NEdit was also firmly rooted in the ancient past of Unix UI technology. Since there also was no sign of ongoing development I finally with great regret abandoned it (together with WindowMaker and my iBook) a couple of years ago.
Since then I have tried every CUA editor and IDE I could get hold of, but none of them really satisfied me. Having replaced my iBook with a MacBook I decided to try OS X for a while, which didn't make the text editor situation any easier. I didn't manage to warm up to XCode, and Smultron, although nice, felt a bit too locked in (just in case I wanted to ditch OS X again), so I stuck with jEdit for a while. After my annoyance with OS X and my MacBook had grown strong enough to go back to Linux (on a nice Lenovo R61) I decided to give kate and kwrite a try, which I have used since then.
A couple of weeks ago, after encountering another weird, annoying bug in kwrite and an extensive round of checking all the Linux text editors I know of, I finally had enough. I decided to give vi a second chance.
And to my own utter surprise this time around I actually liked it. Vim really excels at configurability, speed, syntax highlighting and multi-window editing (which, due to its bugginess, is a constant pain in kate). What is more, I didn't find it difficult at all to memorize the odd key combinations (though having a cheat sheet pasted to the wall doesn't hurt) and somehow they just didn't feel nearly as unintuitive as during previous encounters. Even the constant switching between insert and command mode, which has always been my biggest issue with vi, hasn't started to annoy me yet.
I could imagine that it's an age thing. It took me nearly 30 years to start to like olives, capers and anchovies and now I love them (visiting Sicily helped...).
Maybe vi really is an acquired taste and I just needed to pass 35 to finally learn to appreciate it.
P.S.: While ditching kate/kwrite I also changed (back) to E17 and modified all color schemes to bright on dark. Could be that I'm just following fashion after all...
Tuesday, 9 November 2010
Sunday, 31 October 2010
D is finally (nearly) there
Last week, in order to avoid working on more important things, I started a small project I had wanted to do for quite a while. As mentioned before I am really not happy with C++ as a language to write simulations in. Unfortunately, according to the Great Language Shootout all the nice languages are way too slow to be seriously considered.
However - as aptly explained on the shootout page - comparing the speed of programming languages and even the speed of implementations of programming languages is not very meaningful. Results will vary widely dependent on application area.
Therefore the only reasonable thing to do is to implement, in all candidate languages, a benchmark that is representative of the kind of program one is interested in. Which is exactly what I have started to do.
I have defined a reference model and implemented it in (very, very basic) C++ and (C++ish) D, and also in CUDA, which strictly speaking doesn't belong here (more on that in a later blog post - hopefully; I'm really bad at actually writing posts that I have announced before...).
To my great delight it turned out that the D program ran only about 10% longer than my basic C++ version. In principle this is clearly a small enough loss in speed to be compensated for by the nice improvements D offers over C++. I was really disappointed though (you can see how much I would like to abandon C++) when I found out a bit later that PGO (profile-guided optimization) gives my C++ program another 20% boost. Add to that the fact that 64-bit D is still some way off and that value types are second-class citizens in D and I'm not really sure whether I will do my next project in D or not. In any case it was nice to see the progress they have made. I am optimistic that my C++ days are numbered...
As for the shootout - I will try to find the time to add "proper" implementations in D and C++ (which hopefully will not perform much differently). I hope I will also be able to cover other interesting or up-and-coming languages, such as OCaml or BitC. If any one of my three readers feels willing and able to help out - just head over to the project page, read the model definition and hack away. I will be happy to post code/results.
Thursday, 28 October 2010
Expensive data
As a theoretical biologist a lot of my research involves burning a *lot* of CPU time on computer simulations of evolving animal populations.
Usually I run a program for thousands of generations, each of which consists of hundreds to thousands of time steps during each of which hundreds of individuals interact with each other and their environment. This has to be replicated a dozen or so times with different seeds of the random number generator. The whole thing then has to be repeated for each combination of parameters I'm interested in.
To give an idea of the scale: on a typical node of my university's cluster (newish multi-core Opterons), running 10k generations (which has been argued to be far too few) of the simulation I am currently working on takes about 3-5 hours. One standard sweep of the parameter space has 64 parameter combinations (which leaves out so many fascinating possibilities that it hurts) times 10 replicates each, thus 640 runs (each such set of 640 runs produces 4-5 GB of data, by the way).
In a typical project I tend to rewrite and change the simulation program many times, first of all to find bugs but then also as part of an iterative process where I create the program and run it, look at the results, think about them, adjust my opinion about the model/question/approach, change the program, run it, etc.
For the latest incarnation of my current project (the 4th or 5th major one) I have now done 25 of the above-mentioned parameter sets. That means for just one part of the project I have already used more than 7 CPU-years and produced more than 100 GB of data. And that is not going to be the end of it by a long way...
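The totals above are easy to check with a few lines of arithmetic. What follows is only a minimal sketch in C: the function names are made up for illustration, and it assumes that each run occupies one CPU for the quoted 3-5 hours and that a year has 365 days.

```c
#include <assert.h>

/* Hours in a (non-leap) year, used to convert CPU-hours to CPU-years. */
#define HOURS_PER_YEAR (24.0 * 365.0)

/* Number of runs in one sweep: parameter combinations times replicates. */
int runs_per_sweep(int combinations, int replicates)
{
    assert(combinations > 0 && replicates > 0);
    return combinations * replicates;
}

/* Total CPU time in years for a number of sweeps.
   Assumption: every run keeps exactly one CPU busy for hours_per_run. */
double cpu_years(int sweeps, int runs, double hours_per_run)
{
    assert(sweeps >= 0 && runs >= 0 && hours_per_run >= 0.0);
    return sweeps * runs * hours_per_run / HOURS_PER_YEAR;
}

/* Total data volume in GB, given the volume produced by one sweep. */
double data_gb(int sweeps, double gb_per_sweep)
{
    assert(sweeps >= 0 && gb_per_sweep >= 0.0);
    return sweeps * gb_per_sweep;
}
```

With the figures from the text, `runs_per_sweep(64, 10)` is 640, `cpu_years(25, 640, 4.0)` comes out at roughly 7.3 and `data_gb(25, 4.0)` at 100, which fits the quoted totals if a run takes about 4 hours on average.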
Sunday, 24 October 2010
Hiking in movies
One of the things I love about The Hobbit and The Lord of the Rings is that while reading those books you can really *feel* what it is like to travel. In particular you can feel what it is like to travel on foot.
I have done some hiking myself and at least for me there is a special magic to exploring a landscape on your own two feet that is not evoked by any other form of travel. I'm not sure what it is - it includes moments like reaching the highest point of a pass and finally being able to see the valley on the other side, or looking back after an hour of walking and realizing the distance you have covered - but there is much more to it and I can't easily describe it in a few sentences.
Whatever it is, I think Tolkien managed to convey it quite well in his novels (actually it is said that Tolkien himself loved to take long walks through the English countryside). In Peter Jackson's movies, on the other hand, it is missing almost entirely. There is certainly no lack of trying - we see great panoramas of landscapes, helicopter flights through snow-covered mountains; we follow the heroes as they walk through brush, moorland, grassland, forest and all other kinds of temperate biome you could imagine. We see them walking, stumbling and climbing.
But still - at least for me this always looks like actors dropped in a scenic landscape (which of course it is) - that is, slightly soulless and artificial. I don't even think the movies are bad in general; given the economic constraints (mass appeal required to get back the gigantic investment) I think they are even close to a best-case scenario. But in this particular aspect they fail almost completely.
But now comes the funny thing. The other day I looked at some video clips we made the last time we went hiking in the Peak District with the kids. Nothing special really - greyish weather, us, sheep, some hills. But there it was - even in these short amateurish clips, made with a cheap flip camera, I found the "spirit of hiking" was clearly recognizable!
Now, the really interesting question is of course: why is that so? The uninteresting answer would be that my personal experience (having been on that hike myself) colours my perception and that for everybody else the videos would be just as soulless as the movie scenes mentioned above. This is perfectly possible of course, but I find it much more interesting to imagine that there is more to it than just that.
Here are a number of factors that I think might be responsible:
perfection
Pictures in Hollywood movies are perfect and glossy, my clips aren't. Perfection creates distance and a feeling of artificiality.
objective camera
In the movies we see the heroes stoically marching through New Zealand's Best Of. Since the heroes as well as said Best Of (and especially the combination of the two) cost a lot of money and supposedly are what the viewers want to see, the camera is quite busy putting them in the best light. Most of the time therefore we either see the landscape at a wide angle with the group of people somewhere in the middle or we see the latter from the front or the side passing the camera's position. That means we get an uninvolved (helicopter-equipped) spectator's view of what's going on, which is how we would experience a landscape when sitting in a car or a train (or a heli), but *not* how we experience it when walking through it.
speed
When a movie wants to show us that a car for instance is moving very fast, we usually see it approaching the camera's position (very fast, and preferably at a turn), passing it and then moving away. The camera usually stays fixed at its position and only turns to keep the car in focus.
When filming people walking, in contrast, directors seem to think that walking per se is far too boring an activity to keep the viewer's attention. Therefore the camera compensates by moving around the person: approaching them from the back, passing them, approaching them from the front, circling them, etc. This all makes for a busy picture, but it does *not* convey the feeling of slowness that defines walking. I think that essentially walking is usually shown as an activity, while in reality it is more of a state.
So, after all this - what do we see in my clip? We see a slow pan of the (greyish, wet, sheep-dotted) landscape. Then a group of people overtakes the camera and slowly (at walking speed, that is) moves down a small hill, climbs a fence gate and disappears down a path. As I said, really nothing special and more than anything proof of my utter lack of cinematographic ability. Still (for me) it perfectly conveys the slowness and smallness of people moving through a landscape on foot.
Monday, 2 August 2010
Eta, Part II - Syntax (part I)
Many people have pointed out that language designers tend to obsess over syntax far too much and that their time would be better spent thinking about the semantics of their languages. Some (usually those who are either more academically inclined or old lispers) go so far as to claim that syntax is ultimately irrelevant, since a) which syntax someone prefers is largely a matter of taste anyway and b) every syntax becomes 'natural' after sufficient exposure.
Well, this topic has been discussed thoroughly, and I will only add to it to the extent that I am going to justify my own design decisions on the matter.
On the most abstract level a program can be thought of as a nested structure of operations being applied to sub-units which again consist of operations being applied to sub-sub-units, and so forth. Within a compiler this structure is usually represented as a so-called AST (abstract syntax tree).
If we print out an AST in parenthesized Polish notation we essentially end up with Lisp's syntax. This very elegant idea has a couple of advantages - it is extremely simple, easy to parse and totally generic (note though that the oft-heralded homoiconicity of Lisps is a red herring in my opinion - in every language that I know of it would not be difficult to represent a program's AST in the language itself).
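To make the "AST printed in parenthesized Polish notation" idea concrete, here is a small sketch in C. It is only an illustration under my own assumptions: the `Node` type, the two-child limit and all function names are hypothetical and are not taken from any real compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A node is an operation name applied to zero or more child nodes.
   Leaves (variables, constants) simply have no children. */
typedef struct Node {
    const char *op;
    int n_children;
    const struct Node *children[2];   /* two is enough for this sketch */
} Node;

/* Appends s to the string in buf without exceeding the capacity cap. */
static void append(char *buf, size_t cap, const char *s)
{
    size_t len = strlen(buf);
    if (len + 1 < cap)
        strncat(buf, s, cap - len - 1);
}

/* Appends the tree rooted at n to buf in parenthesized prefix notation. */
void print_prefix(const Node *n, char *buf, size_t cap)
{
    assert(n != NULL && buf != NULL);
    if (n->n_children == 0) {         /* leaf: just the name */
        append(buf, cap, n->op);
        return;
    }
    append(buf, cap, "(");
    append(buf, cap, n->op);
    for (int i = 0; i < n->n_children; i++) {
        append(buf, cap, " ");
        print_prefix(n->children[i], buf, cap);
    }
    append(buf, cap, ")");
}

/* Builds the tree for sqrt(dx*dx + dy*dy) by hand and prints it into buf. */
void example(char *buf, size_t cap)
{
    Node dx   = {"dx", 0, {NULL, NULL}};
    Node dy   = {"dy", 0, {NULL, NULL}};
    Node dx2  = {"*", 2, {&dx, &dx}};
    Node dy2  = {"*", 2, {&dy, &dy}};
    Node sum  = {"+", 2, {&dx2, &dy2}};
    Node root = {"sqrt", 1, {&sum, NULL}};

    assert(cap > 0);
    buf[0] = '\0';
    print_prefix(&root, buf, cap);
}
```

Calling `example` with a buffer of 64 characters leaves `(sqrt (+ (* dx dx) (* dy dy)))` in it: the nesting of the parentheses is the nesting of the tree, and nothing else carries structural information.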
On the other hand - at least for me - this genericity makes programs more difficult to read, especially at a glance, since it lacks redundancy. In Lisp the only carriers of information about the structure of a program are the names of operations and the nesting structure. In most mainstream languages however syntax is used as an additional redundant channel of communication. This redundancy makes it much easier to quickly grasp the structure of a piece of source code.
Have a look at this bit of C for example:
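#include <math.h>

/* typedef so that plain "Point" is a valid type name in C */
typedef struct Point
{
    float x, y;
} Point;

/* Euclidean distance between two points */
float point_dist(Point p1, Point p2)
{
    float dx = p2.x-p1.x, dy = p2.y-p1.y;

    return sqrt(dx*dx + dy*dy);
}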
We can see that the same basic functionality is provided by very different syntactical elements depending on the context. The separation of terms for example is done by whitespace (top-level), ',' (declarations) and ';' (statements). Grouping is done by '()' (arithmetic, actually not shown in this example), operator precedence (arithmetic) and '{}' (statements). The application of an operation to arguments is expressed either in infix notation (arithmetic), prefix with '()' (function call), plain prefix (flow control keywords) or implicitly (declarations).
Of course this mess is far removed from the theoretical purity of Lisp's S-expressions. However it allows us to very quickly distinguish between different kinds of operations and different kinds of lists of terms. Looking for a declaration? Spot names separated by whitespace. Looking for function calls? Find name + '()', and so on.
Redundancy therefore clearly serves to support readability (or "glanceability"). Too much of it on the other hand will certainly have the opposite effect. The optimal syntax will consequently add just enough redundancy to improve readability. (side note: There is also useless redundancy - Pascal is a lot more redundant than C, but mostly because it uses keywords instead of punctuation, and longer keywords at that. In my opinion this reduces readability. A similar argument could be made for Java.)
To maximize the effect of syntax it is also important that there is as little ambiguity in the correspondence between syntactic elements and semantic structure as possible. A nice counter-example is provided by C++. Because it "overloads" old syntax, it is a lot harder to read (quickly) than C.
In Eta I wanted the overall look to stay somewhere in the vicinity of a traditional curly-brace language. At the same time I wanted it to be as simple and regular as possible while defining an unambiguous relationship between syntactic elements and semantics. (side note: This sounds a lot more goal-oriented than it was. Actually it took me quite a while to find out that these were the goals I was aiming for.)
This post is already long enough, however, so I will postpone the details of Eta's syntax to the next post. As a small teaser, here is the example from above rewritten in Eta:
Point @ type : (x @ float, y @ float)

point_dist(p1 @ Point, p2 @ Point) @ float :
{
    dx @ float : p2.x-p1.x
    dy @ float : p2.y-p1.y

    <- sqrt` dx*dx + dy*dy
}
Thursday, 22 July 2010
Eta
In the last two weeks what began as a small attempt at writing a work-saving template system for individual-based simulations turned into my first sort-of somewhat working (but not so work-saving) compiler for Eta (or η).
Eta is a project I have been working on/thinking about for a couple of years now. It started off with my increasing dislike for all the syntactic and semantic warts of the bloated mess that is C++. At the time I was desperately looking for an alternative, but everything I found that promised enough performance was either immature or only slightly less warty and bloated (sidenote: I think D is a much better language than C++ and I really hope it catches on. Still, it's far from being a good language IMHO.).
So I did what everybody who should instead really, really work on his PhD thesis (and I'm *not* talking about a PhD in Computer Science) would do - I started thinking about how to design my own language.
As others have said, language design tends to be more successful when it tries to scratch a personal itch than when it attempts to solve other people's problems. In my case I really wanted a language that would make it easier and more fun to implement the simulations I am working on. That means the language had to be
- fast. It *does* make a difference whether I have to wait 5 days or 8 days for a set of simulations to finish.
- strictly and statically typed. As I said before the major difficulty when writing simulations is that errors are often silent. Every bit of static guarantee the compiler can give helps.
- interoperable with C/C++. I'm not going to reimplement all the libraries I use.
- expressive. For reasons of efficiency dynamic operations are often out in simulations. Therefore similar redundant patterns start to show up at lots of different places. E.g. if I add a new trait to an individual it has to be read, set, initialized, mutated, written to the data file, read from a config file, etc. Some of it can be alleviated by some advanced template wizardry but in the end I sooner or later usually fall back to external code generation. My ideal language would have built-in compile-time macros to solve this problem.
- clear and unambiguous. One problem with C++ is that understanding what *exactly* a particular piece of code does can be non-trivial. Apart from syntactic idiosyncrasies, things like silent shadowing of globals and implicit conversion rules make it necessary to be aware of a large amount of context to understand local semantics. This is especially a problem for simulations since (due to the lack of external tests) code review plays an essential role in ensuring their correctness. A good language should therefore reduce the amount of context necessary to understand a piece of code as much as possible.
Apart from these general principles I had a couple of specific technical ideas about mechanisms I wanted to include in the language. However, I will leave the details on that, on how I implemented Eta and on what the language currently looks like to the next post.
Friday, 18 June 2010
gender roles in a nutshell
For quite a while now I have been planning to write a blog post about gender issues and why I think it might be that we are heading towards a "matriarchy" (heading, mind you, and really rather slowly) - all totally speculative and hypothetical of course. This would fit rather nicely with the Zeitgeist: It seems it is (regrettably) becoming fashionable again to attribute socioeconomic gender differences to (hypothetical) biological differences - and to say so publicly (see Dr Isis' nice take on a recent high-profile case). Accordingly there's lots of discussion about gender issues in the blogosphere.
Last week however reality caught up with my comfortable theoretical stance. Within three days my two oldest children managed to demonstrate to me in a nutshell what the discussion is about.
First my older son (2 1/2 years) decided one morning that he wanted to wear a skirt that day. As such this wouldn't be something to write home about - if he wants to wear a skirt, so be it. However the thing occurred on a weekday, so the kids had to go to nursery. Our kids' nursery is really nice (and definitely much nicer than the one they had been to before) but the women (yes, it's only women...) in my son's group are not exactly the brightest and most open-minded people on the planet. Also, my son is easily embarrassed and is having sort of a hard time at the moment anyway... So, in the end we actually managed to talk him out of it without (hopefully) making it too obvious... But I'm not proud of it.
Then, a couple of days later, I had this slightly surreal conversation with my daughter (4 1/2):
she - You know, I don't like it that there are girls...
me (baffled, because she tried to literally translate the English expression into German and I wasn't sure what she meant) - Huh? What do you mean?
she - Because boys are best!
me - What? But that's total nonsense!
she - And I only like penises.
I'm not sure how serious this is/she was. It is absolutely possible that she was channeling a book or a movie (which she must have read/seen at nursery...) or some other kid at the nursery. Still, it has me slightly worried.
There we have it, the problems of gender roles neatly presented by my two pre-school-age children. Boys have to follow stereotypes. Girls think they are worth less. Apart from throwing my hands up in dismay, I don't know what to do...
Sunday, 30 May 2010
Planet Shapes or The Bliss of Pointless Nerdy Hobbies
A couple of weeks ago I finally found a (tentative) solution for a problem that I have been mulling over for nearly 20 years. The stubbornness of this problem was only rivaled by its absolute pointlessness and obscurity. Let me elaborate.
RPG obsession
When I was 14 or 15 a friend of mine asked me an innocent question which turned out to have a huge impact on my life - he invited me to join his MERP group. I am not entirely sure whether I had already read The Lord of the Rings at that time but I definitely had read The Hobbit, so although I totally did *not* get the concept of role-playing games I was happy to give it a try.
This triggered an addiction-like fascination for role-playing games, Fantasy and Sci-Fi which only subsided years later. I spent countless hours reading F&SF novels (many of them crappy and most of them crappily translated) and playing MERP, Midgard, Warhammer and RoleMaster (sometimes three to four times a week). Luckily school in Germany usually ends at 13:00, therefore the number of school hours I ended up sacrificing for my hobby was limited enough to let me get through school more or less successfully.
Reinventing
One peculiar thing about me being obsessed with something is that I usually very quickly begin to feel unsatisfied with the way things are. After a while I start to think 'this is really not good, I could easily do this much better'. In many cases this is of course a blatant overestimation of my abilities (or an underestimation of the difficulty of the problem), but usually I end up spending many fun hours on some creative activity and I always learn a lot from it.
Anyways, in the case of the RPG obsession I of course immediately started wanting to design my own RPG system (I think I showed up at my third session of MERP with a new, "improved" character sheet, which was much more complicated and much less useful than the regular one) and my own fantasy world. Since starting small was never my thing this fantasy world of course had to have a complete history, its own biology, its own geology, ... you get the drift.
The problem
One problem I encountered early on was that I would have liked my hand-drawn maps to be an *exact* representation of my invented reality. I really hated the whole idea of having to live with a sub-optimal 2-dimensional projection of a 3-dimensional curved surface. There are some standard solutions to this problem - besides pragmatically accepting the imperfection of maps - such as ignoring it (usually done by fantasy authors), or assuming the world is a disk. However I wanted my world to be plausible with as few alterations to real-world physics as possible. I thought about many different solutions, from the dumb to the downright bizarre, but none of them was even close to satisfying. In the end I had to leave the problem unsolved.
The solution
Until three weeks ago, when I found this. After having believed for many years that the only way a planet's shape can change with increasing angular momentum was to become an increasingly flat oblate spheroid I learned to my utter astonishment that there are at least two other (slightly bizarre) possible shapes. Neither of them would solve my problem completely, but both would greatly alleviate it. With a cigar-shaped planet the inaccuracies of the map are quite small (as long as you stay away from the ends of the cigar), but rotation speed and gravity should still occur in earth-like combinations. Outlandish but physically plausible - a great solution! Now I just have to find a way to actually calculate which combinations of rotation, density, mass and shape allow for earth-like conditions...
Parting words
As usual for me with these projects, after an initial time of furious activity my interest in my RPG world somewhat tapered off back then and the project never reached anything resembling completion (it reincarnated a while later as the setting for my own horribly complicated version of the board game Civilization). Luckily I am nowadays a bit wiser than when I was young and don't let myself be depressed by another unfinished project. I rather see fiddling around with one of them as an aimless, entertaining activity which I pursue for its own sake, not in order to produce something great.
And - as in this case - I usually learn something which is really obscure and pointless but fascinating.
Tuesday, 4 May 2010
C++ sucks for simulations
Today I once again had to realize that C/C++ is really a bad language to use for numerical simulations. In two heavily scrutinized (and often used) pieces of code I found two simple, yet far-reaching bugs which would immediately have been spotted by the compiler or the run-time system, respectively, had I used a decent language.
I implemented the first version of the simulation I am currently (again) working on about six years ago. Since then I made countless changes and refinements, although the general structure remained the same (yay structured programming). In particular the core functions of the model class (which implement what the model actually does) I tend to revise quite often in an iterative process of checking results, trying to understand them and coming up with new ways to test whether what I come up with is what's actually happening.
The stable and general part of the code I am producing I usually pull into a private library (which is available from sourceforge but so much a work in progress that I won't link to it) after some time.
One of the bugs occurred in the core model, the other one in the library, both in pieces of code I have checked and re-checked dozens of times. Both bugs were really, really stupid.
I will keep the story of how I actually found these bugs in the end for another blog post. Suffice it to say that one of them is a Heisenbug, i.e. it had no effect in the debug version of the program.
Bug 1:
1 const int sym_a = a.symmetry();
2 const int sym_b = b.symmetry();
3 enum {sym_random = 0, sym_noRoles = 1};
4
5 const bool roleA = sym_a == sym_noRoles ? true : rng(2);
6 const bool roleB = sym_b == sym_noRoles ? true :
7 (sym_a == sym_random ? !sym_a : rng(2));
Without going into too much detail, the basic idea behind this code is that two individuals a and b have to negotiate their "role" (roleA and roleB) in the conflict. How this negotiation happens depends on the mode of role assignment each individual prefers (sym_a and sym_b). In one particular mode (sym_random) b automatically has the opposite role of a. Therefore in line 7 it should be !roleA instead of !sym_a.
I know, it's just a typo and a stupid one at that (although the code shown here is slimmed down quite a bit, in the original version the mistake is a bit more difficult to spot). Still, one reason it can go unnoticed is that C++ happily lets me convert between enum, int and bool without so much as a cringe (BTW, the fact that I have to use int for sym_a and sym_b instead of an enum is due to another quirk of C++ - enums don't play well with IO).
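For comparison, here is a minimal sketch of how scoped enumerations (`enum class`, available since C++11) would turn this typo into a compile error. The names `Symmetry`, `Role`, `Roles`, `opposite` and `negotiate` are assumptions made for this example, and the coin flip merely stands in for `rng(2)`; this is not the original simulation code.

```cpp
#include <random>

// Hypothetical types standing in for the int/bool values of the original.
enum class Symmetry { random = 0, noRoles = 1 };
enum class Role { first, second };

// Scoped enums do not convert implicitly to bool or int, so flipping a
// role has to go through this explicit helper.
constexpr Role opposite(Role r)
{
    return r == Role::first ? Role::second : Role::first;
}

struct Roles { Role a; Role b; };

template <typename Rng>
Roles negotiate(Symmetry symA, Symmetry symB, Rng& rng)
{
    std::bernoulli_distribution coin(0.5);
    auto randomRole = [&] { return coin(rng) ? Role::first : Role::second; };

    const Role roleA = symA == Symmetry::noRoles ? Role::first : randomRole();
    // Writing `!symA` here does not compile, and neither does
    // `opposite(symA)`, because Symmetry does not convert to Role.
    const Role roleB = symB == Symmetry::noRoles ? Role::first
                     : symA == Symmetry::random  ? opposite(roleA)
                                                 : randomRole();
    return {roleA, roleB};
}
```

Called as `negotiate(Symmetry::random, Symmetry::random, rng)` with a `std::mt19937 rng`, the two returned roles are always opposite. The IO quirk remains, though: reading or writing a scoped enum still requires an explicit conversion.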
Bug 2:
/** Gives true with a probability of p. */
bool choice(float p)
{
    return (*this)() < Type((this->getMax()) * p);
}
Ok, this one is slightly embarrassing. In a misguided attempt at optimisation I decided to rely on the input being valid (i.e. between 0 and 1). Which was sort of fine since in the original version I had two debug asserts checking for validity. Unfortunately it turned out that if I switched on SSE* code generation in gcc, values of p==1.0 (which passed the asserts) led to silent overflows in the multiplication so that the function always returned false. After I finally found it, the bug was easily fixed by bypassing the comparison for values <=0 and >=1.
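The fix described above might look roughly like the following sketch. The surrounding `Rng` class, its `std::mt19937` engine and the 32-bit `Type` are assumptions made to keep the example self-contained; only `choice`, `getMax` and `Type` appear in the original snippet.

```cpp
#include <cstdint>
#include <random>

// Hypothetical stand-in for the generator class from the library.
class Rng {
public:
    using Type = std::uint32_t;

    explicit Rng(Type seed) : engine_(seed) {}

    Type operator()() { return static_cast<Type>(engine_()); }
    Type getMax() const { return static_cast<Type>(std::mt19937::max()); }

    /** Gives true with a probability of p. */
    bool choice(float p)
    {
        // Handle the boundaries before doing any arithmetic. getMax() * p
        // is computed in float, and for p == 1 the product can round up to
        // a value that no longer fits into Type, so the cast is undefined.
        if (!(p > 0.0f)) return false;  // p <= 0 (this also catches NaN)
        if (p >= 1.0f) return true;
        return (*this)() < Type(getMax() * p);
    }

private:
    std::mt19937 engine_;
};
```

With the two early returns the cast is only ever applied to products strictly below the maximum, so the result no longer depends on how the compiler generates the float-to-integer conversion.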
These two examples demonstrate two problems of C++ which make numerical programming significantly more error-prone than necessary. Implicit conversion and silent arithmetic errors are far from the most common sources of bugs in my programs. However, when they occur, the resulting bugs are among the hardest to find.
Unfortunately all languages that would offer enough static and dynamic checking to avoid these types of bugs come with a strong efficiency penalty (and, yes, 50% longer runtime *is* too much). It seems for now we will just have to make do with a bad language.
Friday, 5 February 2010
Who cares about lack of evidence
One of the common strategies of creationists and proponents of 'Intelligent Design' to discredit the theory of (Darwinian) evolution is to point out the supposed lack of evidence the theory has. Aside from the obvious point that the overwhelming majority of the supposed falsifications these people have brought forth in the past ranged from misunderstandings to utter nonsense, this claim always bothered me on a more fundamental level.
I think I now understand why. Even if they were right and all the little pieces of evidence that have been collected over the years were just plain wrong - it would not really matter that much. The assumption that it would shows a deep misunderstanding of the way science works. Let me explain why.
First we have to make clear what we are talking about. The "theory of evolution" creationists et al. make so much fuss about is essentially the assumption of modern biology that evolutionary processes (in the wider sense, i.e. including drift, gene flow and extinction) are solely responsible for the emergence of the diversity of life on earth from a single ancestral life form. (Many creationists confuse that with theories on the origin of life, but that is a different matter.)
I assume the goal of anti-evolutionists is to discredit evolutionary theory and replace it with something more to their liking, such as the "theory" of Intelligent Design, which involves some divine intervention.
Digression I: kinds of evolution
Creationists like to distinguish between micro-evolution (changes in a population over time) and macro-evolution (appearance and disappearance of species), claiming that the two are qualitatively different phenomena. Although this distinction is not part of mainstream biology we will keep it here just for the sake of the argument.
Anti-evolutionists usually (nowadays) have no problems with biology's understanding of "micro-evolution". The thing they hate and that the whole debate is about is how "macro-evolution" is explained by evolutionary theory.
In science, theories that explain something reasonably well are kept around until either they are proven invalid or a theory that gives a better explanation comes around. If anti-evolutionists want us to abandon the theory of evolution they therefore either have to prove it invalid or provide something better.
In short, proving the theory of evolution invalid could be done by showing that the theory is logically inconsistent (i.e. that evolution a priori can not happen) or that it contradicts more fundamental laws of e.g. physics or chemistry (i.e. that evolution can not happen in our universe or on our planet).
Digression II: kinds of theories
I think what most people have in mind when they talk about a "theory" is something like Newtonian mechanics or relativity. A clear-cut, simple (as opposed to complex) mathematical model that describes some part of the "inner workings" of our world.
The theory of evolution is very different. It consists of three statements which in very simplified form look like this:
- If there is heritable variation between individuals in a population which matters for their chances of survival or reproduction then evolution will take place, which ultimately can also lead to the split of populations into distinct species.
- The physiology and ecology of actual biological individuals provides reproduction/survival-relevant heritable variation so that actual populations can evolve.
- The process of evolution has occurred in the past and is (solely) responsible for the diversity of life on earth.
Epistemologically these are very different kinds of statements.
The first part is essentially a theory about a specific type of emergence in complex systems. It describes how, given certain conditions concerning the elements of a system, certain mechanisms lead to a specific behaviour of the system. In this sense the first part is entirely a logical statement and has nothing to do with reality.
The second part is the assumption that the mentioned conditions can occur in our world and that the mentioned mechanisms are compatible with the laws of physics/chemistry/etc.
The third part is the statement that these processes have actually occurred in the past and are responsible for the appearance of a certain aspect of the world as we see it.
'Lack of evidence', however (even if it applied), does not disprove the theory of evolution; at best it weakens its explanatory success.
Digression III: kinds of being wrong
I usually tend towards a rather constructivist point of view but just for the sake of the argument let us for a moment assume there is some objective describable reality with respect to which our theories can actually be wrong.
The theory of evolution (and similar theories about the history of complex systems) can be wrong in three ways:
- It can be logically inconsistent, i.e. the assumed conditions do not lead to the described processes happening or the assumed mechanisms have a different outcome.
- It can be inconsistent with the laws of physics, i.e. although evolution might happen the way we describe it, it can not do so in our world since the preconditions can never be met or the mechanisms can not take place.
- It can be historically inaccurate. This means that although evolution could happen in our world it did not do so in the past or at least not on a sufficient scale to actually produce the diversity of life as we know it.
That only leaves the second option - coming up with a better theory. As with every other topic many clever people have spent their lives thinking about what constitutes a good theory. The current mainstream version goes something like this:
A good theory has to be logically consistent and able to explain the phenomenon in question. It has to be falsifiable - a theory that can not possibly be proven wrong belongs to faith and not to science. Given two good theories, the more parsimonious, i.e. the one needing fewer assumptions is considered better.
The assumption of the existence of a supreme being with limitless power does not strike me as particularly parsimonious, not to mention the fact that its alleged unpredictability makes every theory based on its behaviour by definition unfalsifiable.
Therefore, even *if* creationists et al. were right concerning the lack of evidence for evolution, I have to say, given the rather lousy alternatives, I will stick with it. Feel free to prove me wrong.