Tuesday, November 27, 2007

Error reporting & handling in libgeda

Apparently some people do read my blog: Peter Clifton and I met up for a drink at Borders in town this evening, and he recommended that I write more often. So here's this evening's contribution, on a slight refactor of libgeda error reporting.

It's been a while since I last blogged about gEDA development, and since then I've had the good fortune to be able to travel to Cambridge, MA and meet several of the other developers based around there. In the last couple of days I've started ramping up my involvement in preparation for having some time for serious development over the Christmas vacation.

So, to begin: in general, libgeda kills the process without warning far too often. This is a bad thing for a library to do; in particular, it's very bad to do it just because someone passed some bad data in. libgeda doesn't make any attempt to hide how its data structures are organised, and thus we have to assume that user code will make use of this knowledge to try and hack them directly.

My pondering of this problem over the last couple of days has led me to think up four basic rules to consider when working out how to handle errors occurring in libgeda code:

  1. If possible, succeed.
  2. If failure is inevitable, fail gracefully.
  3. If normal operation may result in failure, use GError.
  4. Assume that libgeda works.

Of course, some of these need a little explanation as to their interpretation and why they make sense.

Firstly, "If possible, succeed." This seems obvious, but actually has some subtlety. What I mean here is that if there's a sensible, clean way of carrying on despite a problem, you should do so. This only really applies to user-facing code, as code which can't be called from outside libgeda really should have had its inputs checked by the calling function already. However, since all of libgeda's code is user-facing -- we don't have any private headers or the like -- this point is rather moot. In addition, "succeeding" doesn't preclude printing messages to the effect of, "Someone's playing silly buggers but I'm going to try my best anyway." g_critical() should be used for this. One example would be when an unknown object type is encountered; at the moment, libgeda often kills the program by calling g_assert_not_reached(), when often it would be valid to continue by logging a critical message and then skipping over the offending object.
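As a rough sketch of the "log it and skip the offending object" idea, here is a small stand-alone example. It is not libgeda code: the SketchObject type, the type codes and sketch_process_objects() are made up for illustration, and sketch_critical() is only a stand-in for g_critical() so that the example compiles without GLib.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical type codes; libgeda's real object types differ. */
enum { SKETCH_OBJ_LINE = 'L', SKETCH_OBJ_BOX = 'B' };

/* Hypothetical object record. The type field may hold garbage if
 * user code has poked at the structure directly. */
typedef struct {
    int type;
} SketchObject;

/* Stand-in for g_critical(): complain loudly, but keep running. */
static void sketch_critical(const char *func, int type)
{
    fprintf(stderr, "CRITICAL: %s: skipping object with unknown type %d\n",
            func, type);
}

/* Walks the array and returns how many objects had a known type.
 * Unknown objects are reported and skipped rather than aborting. */
size_t sketch_process_objects(const SketchObject *objs, size_t n)
{
    size_t processed = 0;

    if (objs == NULL)
        return 0;

    for (size_t i = 0; i < n; i++) {
        switch (objs[i].type) {
        case SKETCH_OBJ_LINE:
        case SKETCH_OBJ_BOX:
            processed++;            /* real work would happen here */
            break;
        default:
            /* Where g_assert_not_reached() would have killed us. */
            sketch_critical(__func__, objs[i].type);
            break;
        }
    }
    return processed;
}
```

The caller still gets a useful result for the objects that were understood, and the log records that something odd was in the list.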

Secondly, "If failure is inevitable, fail gracefully." Failing gracefully means the failure has no dangerous side-effects. If possible, the system should be returned to its prior state. This often requires that user data be checked before doing anything destructive, possibly at the expense of some CPU time.
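One way to picture "check before doing anything destructive" is a setter that validates its input and builds the new value first, and only then throws the old one away. The sketch below is an illustration under my own assumptions: SketchNamed and sketch_set_name() are hypothetical, and treating an empty name as invalid is just a rule invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical object owning a heap-allocated name. */
typedef struct {
    char *name;
} SketchNamed;

/* Replaces obj->name with a copy of new_name.
 * Returns 1 on success. Returns 0 on failure, in which case obj is
 * left exactly as it was before the call. */
int sketch_set_name(SketchNamed *obj, const char *new_name)
{
    size_t len;
    char *copy;

    /* 1. Check everything before touching the existing state. */
    if (obj == NULL || new_name == NULL || new_name[0] == '\0')
        return 0;

    /* 2. Build the replacement; this step can still fail safely. */
    len = strlen(new_name) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return 0;
    memcpy(copy, new_name, len);

    /* 3. Only now do the destructive part, which cannot fail. */
    free(obj->name);
    obj->name = copy;
    return 1;
}
```

The extra checks cost a little time on every call, but a failed call never leaves the object half-updated.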

Thirdly, "If normal operation may result in failure, use GError." GError is a nice mechanism in that it allows errors to be ignored if necessary. A good example of when it should be used would be in code which reads and writes files; because GLib's file access code already uses GError extensively, this would not be hard to implement.
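The calling convention behind GError is an optional "error out-parameter": the function reports success or failure through its return value, and fills in the details only if the caller asked for them. The sketch below imitates that shape without depending on GLib, so SketchError, sketch_set_error() and sketch_file_readable() are all stand-ins of my own. Real libgeda code would use GError with g_set_error() and g_error_free() instead.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for GError: an error code plus a human-readable message. */
typedef struct {
    int  code;
    char message[256];
} SketchError;

/* Stand-in for g_set_error(). If the caller passed NULL for the
 * error location, the details are simply dropped. */
static void sketch_set_error(SketchError **error, int code, const char *path)
{
    SketchError *e;

    if (error == NULL)
        return;                     /* caller chose to ignore errors */

    e = malloc(sizeof *e);
    if (e == NULL)
        return;
    e->code = code;
    snprintf(e->message, sizeof e->message,
             "Could not open '%s': %s", path, strerror(code));
    *error = e;
}

/* Returns 1 if path can be opened for reading, 0 otherwise.
 * On failure, *error (if error is non-NULL) describes what happened
 * and must be released by the caller with free(). */
int sketch_file_readable(const char *path, SketchError **error)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL) {
        int saved_errno = errno;    /* capture before anything resets it */
        sketch_set_error(error, saved_errno, path);
        return 0;
    }
    fclose(fp);
    return 1;
}
```

A program like gschem could pass a real error location and show the message in a dialog, while a caller that doesn't care can pass NULL and just look at the return value.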

Finally, "Assume that libgeda works." Similarly to rule #1, what is meant here is that libgeda functions should check their own behaviour. If they do so, there is no need for libgeda functions which call them to check again -- it's a waste of CPU time and developer effort.

So, given the above points, when is it appropriate to use g_assert()? It would not be appropriate to use it to check the arguments passed to a function, but it would be valid to use it to check that the function has successfully done what it is supposed to before returning a result. For instance, when a complex algorithm is in use, putting some assertions in to make sure that the algorithm actually does what you think it does might be a very good idea.
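To make the distinction concrete, here is a deliberately tiny sketch: bad arguments get a graceful failure, while the assertion is reserved for checking the function's own result. SketchBox and sketch_box_normalise() are hypothetical, and the standard assert() stands in for g_assert() so the example builds without GLib. A real candidate for this treatment would be a much more complex algorithm than swapping two coordinates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical box described by two opposite corners. */
typedef struct {
    int x0, y0, x1, y1;
} SketchBox;

/* Reorders the corners so that x0 <= x1 and y0 <= y1.
 * Returns 1 on success, 0 if box is NULL. */
int sketch_box_normalise(SketchBox *box)
{
    int tmp;

    /* Argument check: caller error, so fail gracefully, no assert. */
    if (box == NULL)
        return 0;

    if (box->x0 > box->x1) {
        tmp = box->x0; box->x0 = box->x1; box->x1 = tmp;
    }
    if (box->y0 > box->y1) {
        tmp = box->y0; box->y0 = box->y1; box->y1 = tmp;
    }

    /* Postcondition: if this fires, the bug is in this function,
     * not in whatever the caller passed in. */
    assert(box->x0 <= box->x1);
    assert(box->y0 <= box->y1);
    return 1;
}
```

Because the function vouches for its own output, other libgeda code calling it has no reason to re-check the ordering.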

Of course, this has led to a number of action items in terms of libgeda refactoring:

  • Cleaning up uses of g_assert() in libgeda, and replacing them with g_critical() logging and graceful failure where possible. I've already made a start on this.
  • Moving all of the file handling code to use GError. This will allow gschem to show a message dialog when a file operation fails, rather than forcing the user to look at the log window or at the console.
  • In the medium term, moving all of libgeda to use the GLib logging API (of which g_critical() is part) will allow the development of a shiny new gschem log window which allows logging at different levels with colour coding of criticality. This would be nice.
  • Ideally, the error messages used for GErrors emitted from libgeda should be translated. This means libgeda will need to use gettext.

As usual, I'd be interested to hear people's thoughts on this. Please let me know on the -dev list or by e-mail.