Tuesday, 4 May 2010

C++ sucks for simulations

Today I once again had to realize that C/C++ is really a bad language to use for numerical simulations. In two heavily scrutinized (and often used) pieces of code I found two simple, yet far-reaching bugs which would immediately have been spotted by the compiler or the run-time system, respectively, had I used a decent language.
I implemented the first version of the simulation I am currently (again) working on about six years ago. Since then I made countless changes and refinements, although the general structure remained the same (yay structured programming). In particular the core functions of the model class (which implement what the model actually does) I tend to revise quite often in an iterative process of checking results, trying to understand them and coming up with new ways to test whether what I come up with is what's actually happening.
The stable and general part of the code I am producing I usually pull into a private library (which is available from sourceforge but so much a work in progress that I won't link to it) after some time.
One of the bugs occurred in the core model, the other one in the library, both in pieces of code I have checked and re-checked dozens of times. Both bugs were really, really stupid.
I will keep the story of how I actually found these bugs in the end for another blog post, suffice it to say that one of them is a Heisenbug, i.e. it had no effect in the debug version of the program.

Bug 1:
1 const int sym_a = a.symmetry();
2 const int sym_b = b.symmetry();
3 enum {sym_random = 0, sym_noRoles = 1};
4 
5 const bool roleA = sym_a == sym_noRoles ? true : rng(2);
6 const bool roleB = sym_b == sym_noRoles ? true : 
7  (sym_a == sym_random ? !sym_a : rng(2));

Without going into too much detail, the basic idea behind this code is that two individuals a and b have to negotiate their "role" (roleA and roleB) in the conflict. How this negotiation happens depends on the mode of role assignment each individual prefers (sym_a and sym_b). In one particular mode (sym_random) b automatically has the opposite role of a. Therefore in line 7 it should be roleA instead of sym_a.
I know, it's just a typo and a stupid one at that (although the code shown here is slimmed down quite a bit, in the original version the mistake is a bit more difficult to spot). Still, one reason it can go unnoticed is that C++ happily let's me convert between enum, int and bool without as much as a cringe (BTW, the fact that I have to use int for sym_a and sym_b instead of an enum is due to another quirk of C++ - enums don't play well with IO).

Bug 2:

1 /** Gives true with a probability of p. */
2 bool choice(float p)
3  {
4  return (*this)() < Type((this->getMax()) * p);
5  }

Ok, this one is slightly embarassing. In a misguided attempt at optimisation I decided to rely on the input being valid (i.e. between 0 and 1). Which was sort of fine since in the original version I had two debug asserts checking for validity. Unfortunately it turned out that if I switched on SSE* code generation in gcc values of p==1.0 (which passed the asserts) lead to silent overflows in the multiplication so that the function always returned false. After I finally found it the bug was easily fixed by bypassing the comparison for values <=0 and >=1.

These two examples demonstrate two problems of C++ which make numerical programming significantly more error prone than necessary. Implicit conversion and silent arithmetic errors are by far not the most common sources of bugs in my programs. However if they occur the resulting bugs are among the hardest to find.
Unfortunately all languages that would offer enough static and dynamic checking to avoid these types of bugs come with a strong efficiency penalty (and, yes, 50% longer runtime *is* too much). It seems for now we will just have to make do with a bad language.

No comments:

Post a Comment