Saturday, June 11, 2011

gEDA and Guile — dealing with deprecated libguile functions

This is the fourth in a series of blog posts on extensibility in gEDA using Guile Scheme.

  1. Finding Scheme API code in gEDA
  2. Compiling against multiple Guile versions
  3. Safe handling of non-local exits
  4. Dealing with deprecated libguile functions
  5. Checking arguments to Scheme functions in C
  6. How and when to use Scheme errors
  7. Reducing boilerplate with "snarfing macros"
  8. Opportunities to get involved

In this post I'll explain how to set up your environment to get warnings about use of deprecated libguile API, and describe some of the most problematic uses of deprecated API in gEDA.

Getting notified about deprecated API usage

Most gEDA applications use something like the following to disable deprecation warnings unless you specifically set your environment to request them:

if (getenv ("GUILE_WARN_DEPRECATED") == NULL)
  putenv ("GUILE_WARN_DEPRECATED=no");

The simplest way to enable deprecation warnings is, therefore, to add something like this to your shell rc file:

GUILE_WARN_DEPRECATED="detailed"
export GUILE_WARN_DEPRECATED

Now when you start gschem, you'll get a few warnings on stderr (depending on which version of Guile you've compiled against). Here's an example of what you see when starting gschem if you compile against Guile 2.0:

$ gschem
`(debug-enable 'debug)' is obsolete and has no effect. (1)
Remove it from your code.
SCM_STRING_CHARS is deprecated. See the manual for alternatives. (2)
SCM_SYMBOL_CHARS is deprecated. Use scm_symbol_to_string. (3)

Some of this is a problem, some of it isn't.

  1. We use (debug-enable 'debug) in system-gafrc to enable backtraces in Guile 1.8, but it does nothing in Guile 2.0. It would be nice to find a way to only call it if necessary, but it's not really a priority.
  2. SCM_STRING_CHARS() is used a lot in gEDA to obtain a pointer to the underlying character buffer of a Scheme string, and it's been deprecated since Guile 1.8.0. gEDA's reliance on this is a bit of a problem, unfortunately, and I'll explain why later.
  3. SCM_SYMBOL_CHARS() is similar to SCM_STRING_CHARS() (but for the string representation of a Scheme symbol), and has also been deprecated for a long time.

The problem with SCM_STRING_CHARS() and SCM_SYMBOL_CHARS()

So, why is using SCM_STRING_CHARS() a problem? The main issue is that using it assumes that Guile's internal string representation is the same as gEDA's, i.e. an array of char, and that Guile's internal string encoding is the same as gEDA's, i.e. UTF-8. Neither of these assumptions is reliable.

Conceptually, every string in Guile 2.0 is an array of Unicode code points, but there are two different internal representations:

  • If all the code points are in the range 0-255 inclusive, the code points are stored with one byte per code point, i.e. as Latin-1 or ISO-8859-1. This is not UTF-8.
  • If any of the code points is outside that range, the whole string is stored with four bytes per code point, i.e. as UTF-32. This is also not UTF-8.
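To make the mismatch concrete, here is a small plain-C sketch that builds these byte layouts by hand. It does not use libguile. It shows "café" as a one-byte-per-code-point (Latin-1) buffer next to the UTF-8 that gEDA expects, and "你好" as a four-byte-per-code-point buffer. The looks_like_utf8() helper is hypothetical: it is a deliberately minimal check written for this illustration and is not a gEDA or Guile function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "café" laid out like a Guile 2.0 narrow string: one byte per code
   point, i.e. Latin-1.  U+00E9 is the single byte 0xE9. */
static const unsigned char cafe_latin1[] = { 'c', 'a', 'f', 0xE9, 0 };

/* The same text in UTF-8, which is what gEDA expects: U+00E9 becomes
   the two bytes 0xC3 0xA9. */
static const unsigned char cafe_utf8[] = { 'c', 'a', 'f', 0xC3, 0xA9, 0 };

/* "你好" (U+4F60 U+597D) laid out like a wide string: four bytes per
   code point. */
static const uint32_t nihao_utf32[] = { 0x4F60, 0x597D, 0 };

/* Hypothetical, minimal structural UTF-8 check, for illustration only:
   each lead byte must be followed by the right number of 10xxxxxx
   continuation bytes.  It does not reject overlong forms or surrogates. */
static int
looks_like_utf8 (const unsigned char *s)
{
  while (*s != 0)
    {
      size_t extra;

      if (*s < 0x80)
        extra = 0;
      else if ((*s & 0xE0) == 0xC0)
        extra = 1;
      else if ((*s & 0xF0) == 0xE0)
        extra = 2;
      else if ((*s & 0xF8) == 0xF0)
        extra = 3;
      else
        return 0;               /* stray continuation or invalid byte */

      s++;
      for (size_t i = 0; i < extra; i++, s++)
        if ((*s & 0xC0) != 0x80)
          return 0;             /* also stops safely at the terminator */
    }
  return 1;
}
```

The check accepts cafe_utf8 and rejects cafe_latin1, because the lone 0xE9 looks like the start of a three-byte sequence. Reading nihao_utf32 as a char * string stops at its first zero byte, so strlen() sees at most 2 of its 8 bytes. The exact count depends on the machine's byte order.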

Additionally, Guile 2.0 introduces read-only strings and shared substrings (which borrow their parent's buffer and so are not null-terminated). SCM_STRING_CHARS() works with neither.

So how can we break a function func that takes a single string argument and uses SCM_STRING_CHARS()? Let me count the ways.

  1. We can pass it a string containing code points above 255. Since we target a worldwide user base these days, that's not particularly unlikely.

    (func "你好")
  2. We can pass it a shared substring.

    (func (substring/shared "foo bar" 0 3))
  3. We can pass it a read-only substring.

    (func (substring/read-only "foo bar" 0 3))
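The shared-substring case is easy to model in plain C. A shared substring borrows its parent's character buffer instead of copying it, so no terminating NUL sits where the substring ends. The struct below is a hypothetical stand-in written for illustration, not Guile's actual string layout.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a shared substring: it borrows the parent's
   character buffer and records an offset and a length.  Nothing is
   copied and no '\0' is written at the end. */
struct shared_substring
{
  const char *parent;
  size_t start;
  size_t len;
};

/* What a SCM_STRING_CHARS()-style accessor would hand back: a pointer
   into the parent buffer.  Reading it as a C string runs on to the
   parent's terminator. */
static const char *
substring_chars (const struct shared_substring *s)
{
  return s->parent + s->start;
}

/* Printing safely requires bounding the output by the recorded length. */
static int
print_substring (FILE *out, const struct shared_substring *s)
{
  return fprintf (out, "%.*s", (int) s->len, substring_chars (s));
}
```

For the substring { "foo bar", 0, 3 }, printf ("%s", substring_chars (&foo)) prints all of "foo bar", but print_substring() prints just "foo". Only code that carries the length around gets this right, and SCM_STRING_CHARS() does not carry it.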

SCM_SYMBOL_CHARS() shares all these problems. Both macros have been deprecated since Guile 1.8.0 precisely because the Guile developers wanted to be free to change Guile's original internal string representation to support Unicode fully. (In case you're wondering why they don't use UTF-8: Scheme requires a number of string operations that act on the nth character of a string. With a variable-width encoding like UTF-8, those operations would have to scan the string from the start instead of indexing directly into it, which would make them much slower.)
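The cost difference in that parenthetical is easy to see in a plain-C sketch. utf32_ref() and utf8_offset() are hypothetical helpers written for illustration, not Guile internals. With a fixed width per code point, the nth character is one array index. With UTF-8, you have to walk over every earlier character to find its byte offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed width (Latin-1 or UTF-32): the nth character is a single array
   index, so access takes constant time. */
static uint32_t
utf32_ref (const uint32_t *s, size_t n)
{
  return s[n];
}

/* UTF-8: characters are 1-4 bytes wide, so finding the byte offset of
   the nth character means stepping over every earlier character, which
   takes linear time.  Assumes s is valid, NUL-terminated UTF-8 holding
   at least n + 1 characters. */
static size_t
utf8_offset (const unsigned char *s, size_t n)
{
  size_t off = 0;

  while (n > 0)
    {
      off++;
      /* Skip continuation bytes (10xxxxxx) up to the next lead byte. */
      while ((s[off] & 0xC0) == 0x80)
        off++;
      n--;
    }
  return off;
}
```

In the UTF-8 encoding of "a你b", "你" takes three bytes, so character 2 ("b") starts at byte offset 4. In the UTF-32 array it is simply element 2.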

Replacing SCM_STRING_CHARS()

What's the alternative? Ideally, we'd use the rather handy scm_to_utf8_stringn() function, but that was only introduced in Guile 2.0 (along with Unicode support), so it's not an option. Instead, we have to rely on scm_to_locale_string(). The main difference between these functions and SCM_STRING_CHARS() is that they return a newly allocated copy of the string, which must be freed with free() (n.b. not g_free()).

Update: gEDA now provides scm_to_utf8_string() and scm_from_utf8_string() even when compiled against a Guile version that lacks them, so always use them unless you actually want to work with locale-encoded strings.

So suppose we started off with a version of myfunc() that uses SCM_STRING_CHARS():

#include <stdio.h>
#include <libguile.h>
void
myfunc (SCM arg)
{
  /* N.b. we should check that arg is in fact a string */
  printf ("%s", SCM_STRING_CHARS (arg));
}

It should be replaced by:

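#include <stdio.h>
#include <stdlib.h>
#include <libguile.h>
void
myfunc (SCM arg)
{
  char *arg_str;
  /* N.b. we should check that arg is in fact a string */
  arg_str = scm_to_utf8_string (arg);   /* newly allocated copy */
  printf ("%s", arg_str);
  free (arg_str);                       /* free(), not g_free() */
}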

In reality, you'll want to do something more complicated than just print the string. Remember that any libguile call you make between allocating arg_str and freeing it may throw a Scheme error and exit non-locally, which skips the free(). If you make more than trivial calls into libguile in that window, use a dynamic wind to make sure the buffer is properly cleaned up.

You should also be aware that because scm_to_utf8_string() returns a null-terminated string, it throws an error if the string contains #\nul characters. It can also throw an error if the string can't be converted to the requested encoding (which, for scm_to_locale_string(), depends on the current locale). Both cases introduce their own challenges.

Replacing SCM_SYMBOL_CHARS()

Replacing SCM_SYMBOL_CHARS() is similar to replacing SCM_STRING_CHARS(); simply use scm_symbol_to_string() to convert the symbol to a string, and then use scm_to_utf8_string() as before.

Conclusion

Library APIs are rarely deprecated without a good reason, and being aware of and proactive about updating deprecated API usage can help avoid some serious problems. Updating gEDA to remove the use of the SCM_STRING_CHARS() and SCM_SYMBOL_CHARS() macros is an important job. Nevertheless, it would still be quite accessible for someone less familiar with the gEDA code base, as it can be done by dealing with one function at a time.

In my next post, I will describe how to use SCM_ASSERT to check types of SCM arguments to functions that use libguile.
