Sunday, June 12, 2011

gEDA and Guile — checking arguments to Scheme functions in C

This is the fifth in a series of blog posts on extensibility in gEDA using Guile Scheme.

  1. Finding Scheme API code in gEDA
  2. Compiling against multiple Guile versions
  3. Safe handling of non-local exits
  4. Dealing with deprecated libguile functions
  5. Checking arguments to Scheme functions in C
  6. How and when to use Scheme errors
  7. Reducing boilerplate with "snarfing macros"
  8. Opportunities to get involved

In this post I'll explain how to use the SCM_ASSERT() macro to check arguments passed to Guile functions written in C using libguile.

Using SCM_ASSERT()

SCM_ASSERT() is a shorthand way to raise a Scheme wrong-type-arg error for a Scheme variable if a test fails. For example, if there is a function which takes a string as an argument, you might write it as:

SCM
myfunc (SCM str)
{
  SCM_ASSERT (scm_is_string (str), str, SCM_ARG1, "my-func");

  /* ... do things with str ... */

  return SCM_UNSPECIFIED;
}

All functions in gEDA which are exposed as Scheme procedures must use SCM_ASSERT() to thoroughly check the types of their arguments.

There are a few things to note:

  • The first parameter to SCM_ASSERT() is a test, and should evaluate to an integer, not a Scheme value. If the test evaluates to zero, an error is signalled.
  • If you need to use a test which evaluates to a Scheme value (e.g. scm_list_p()), don't compare it to SCM_BOOL_F or SCM_BOOL_T directly; use scm_is_true() or scm_is_false().
  • The second parameter is the variable which has the wrong type. Although the test might not be on it directly (e.g. if you are checking for a list of strings), the second parameter must always be the actual argument which was passed to myfunc.
  • The third parameter to SCM_ASSERT() indicates which argument to myfunc had the wrong type — and remember that this is 1-indexed. Although this parameter is just an integer, you should always use the SCM_ARG1, SCM_ARG2 etc. macros for arguments up to 7, since this makes the purpose of the parameter more readable.
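  • If you have a function that takes an arbitrary number of parameters, or the position of the argument is otherwise unknown, you should use the SCM_ARGn macro as the argument index; the error is then reported without an argument position.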
  • The final parameter is a char * string containing the name of the function as seen by Scheme ("my-func" in this case).

If you have a C helper function which is used by many similar Scheme functions, you can generate more useful error messages if you pass it the name of the function that's actually visible to Scheme code. For example:

void
helperfunc (SCM str, const char *funcname)
{
  SCM_ASSERT (scm_is_string (str), str, SCM_ARGn, funcname);

  /* ... do things ... */
}

void
myfunc (SCM str)
{
  helperfunc (str, "my-func");

  /* ... do more things ... */
}

Don't leak resources!

The most important thing to remember is that SCM_ASSERT() does not behave like most assertion mechanisms in C (g_assert() in GLib, for example): it doesn't end the program if it fails. It simply causes a non-local exit by raising an error.
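
To make the difference concrete, here is a rough analogy in plain C that uses setjmp() and longjmp() from the standard library. It is only a sketch of the control flow, not a description of how libguile implements errors, and CHECK_OR_THROW, leaky_function, call_with_handler and live_buffers are hypothetical names invented for this illustration.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-in for the place that an error unwinds to. */
static jmp_buf error_handler;

/* Counts buffers that have been allocated but not yet freed. */
int live_buffers = 0;

/* Hypothetical stand-in for SCM_ASSERT(): on failure it jumps away
 * instead of returning to the caller or aborting the program. */
#define CHECK_OR_THROW(test) \
  do { if (!(test)) longjmp (error_handler, 1); } while (0)

void
leaky_function (int arg_is_valid)
{
  char *buf = malloc (16);
  if (buf == NULL) return;
  live_buffers++;

  CHECK_OR_THROW (arg_is_valid);  /* may never return */

  free (buf);                     /* skipped when the check fails */
  live_buffers--;
}

/* Returns 1 if the call completed normally, 0 if it exited non-locally. */
int
call_with_handler (int arg_is_valid)
{
  if (setjmp (error_handler) != 0)
    return 0;                     /* arrived here via longjmp() */
  leaky_function (arg_is_valid);
  return 1;
}
```

Calling call_with_handler (0) returns 0 and the program carries on, but the buffer allocated inside leaky_function() is never freed.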

Because SCM_ASSERT() might not return, it's really important to be careful about managing resources when writing a Scheme function in C. For example, consider a function that needs to convert a Scheme list of strings into a GList of C strings. Here's a naive, broken implementation:

/* Don't do this -- it leaks memory */
void
process_string_list (SCM lst)
{
  GList *glst = NULL;
  SCM_ASSERT (scm_is_true (scm_list_p (lst)), lst, SCM_ARG1,
              "process-string-list");

  for (SCM iter = lst; iter != SCM_EOL; iter = SCM_CDR (iter)) {
    SCM str = SCM_CAR (iter);
    SCM_ASSERT (scm_is_string (str), lst, SCM_ARG1, "process-string-list");
    glst = g_list_prepend (glst, scm_to_utf8_string (str));
  }

  /* ... do something with glst ... */
}

If the SCM_ASSERT() inside the loop raises an error, the stack will unwind out of process_string_list() and the memory allocated in glst will be leaked. Instead, you could check the input, and then build the list:

void
process_string_list (SCM lst)
{
  GList *glst = NULL;
  SCM_ASSERT (scm_is_true (scm_list_p (lst)), lst, SCM_ARG1,
              "process-string-list");

  /* Check the input is actually a list of strings */
  for (SCM iter = lst; iter != SCM_EOL; iter = SCM_CDR (iter)) {
    SCM_ASSERT (scm_is_string (SCM_CAR (iter)), lst, SCM_ARG1, "process-string-list");
  }

  /* Convert to a GList */
  for (SCM iter = lst; iter != SCM_EOL; iter = SCM_CDR (iter)) {
    glst = g_list_prepend (glst, scm_to_utf8_string (SCM_CAR (iter)));
  }

  /* ... do something with glst ... */
}

This works, but it is inefficient because it iterates over the linked list twice. A more efficient approach is to use a dynamic wind context, as I discussed in a previous post in this series:

/* A function for freeing a GList containing strings */
static void
free_string_glist (void *data)
{
  /* data is the address of the caller's GList pointer */
  GList *glst = *((GList **) data);
  for (GList *iter = glst; iter != NULL; iter = g_list_next (iter)) {
    free (iter->data);
  }
  g_list_free (glst);
}

void
process_string_list (SCM lst)
{
  GList *glst = NULL;
  SCM_ASSERT (scm_is_true (scm_list_p (lst)), lst, SCM_ARG1,
              "process-string-list");

  /* Begin a dynamic extent, and register a handler to clean up on error */
  scm_dynwind_begin (0);
  scm_dynwind_unwind_handler (free_string_glist, (void *) &glst, 0);

  /* Convert to a GList, checking types */
  for (SCM iter = lst; iter != SCM_EOL; iter = SCM_CDR (iter)) {
    SCM str = SCM_CAR (iter);
    SCM_ASSERT (scm_is_string (str), lst, SCM_ARG1, "process-string-list");
    glst = g_list_prepend (glst, scm_to_utf8_string (str));
  }

  /* End dynamic extent */
  scm_dynwind_end ();

  /* ... do something with glst ... */
}

Update: Thanks to Ivan Stankovic for pointing out a bug in this example!
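
One detail of the dynamic wind version is easy to miss: because 0 is passed as the last argument to scm_dynwind_unwind_handler(), the handler runs only when the extent is left through a non-local exit, and not when scm_dynwind_end() is reached normally. The plain C sketch below models that behaviour with setjmp() and longjmp(). It is an analogy rather than libguile's implementation, it tracks a single handler where a real implementation would keep a stack, and register_unwind_handler, end_extent, throw_error, run_extent and cleanup_runs are all hypothetical names.

```c
#include <setjmp.h>
#include <stddef.h>

static jmp_buf error_handler;

/* The single pending cleanup action and its argument. */
static void (*pending_cleanup) (void *) = NULL;
static void *pending_data = NULL;

/* Counts how many times the cleanup handler has been called. */
int cleanup_runs = 0;

/* Hypothetical analogue of scm_dynwind_unwind_handler (fn, data, 0). */
static void
register_unwind_handler (void (*fn) (void *), void *data)
{
  pending_cleanup = fn;
  pending_data = data;
}

/* Hypothetical analogue of scm_dynwind_end(): a normal exit, so the
 * handler is dropped without being called. */
static void
end_extent (void)
{
  pending_cleanup = NULL;
  pending_data = NULL;
}

/* Hypothetical analogue of raising an error: run the handler, then jump. */
static void
throw_error (void)
{
  if (pending_cleanup != NULL) {
    void (*fn) (void *) = pending_cleanup;
    void *data = pending_data;
    end_extent ();
    fn (data);
  }
  longjmp (error_handler, 1);
}

static void
count_cleanup (void *data)
{
  (void) data;
  cleanup_runs++;
}

/* Returns 1 on normal completion, 0 if an error unwound the extent. */
int
run_extent (int arg_is_valid)
{
  if (setjmp (error_handler) != 0)
    return 0;
  register_unwind_handler (count_cleanup, NULL);
  if (!arg_is_valid)
    throw_error ();
  end_extent ();
  return 1;
}
```

In this model run_extent (1) finishes without calling the handler, which matches the real example: after scm_dynwind_end() the GList is still alive, and the code that follows remains responsible for freeing it.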

Unfortunately, there appear to be quite a few places in gEDA where resources could leak like this. Checking that every function which uses SCM_ASSERT() does so safely would be a really useful thing to do.

Conclusion

As you can tell, even doing something as conceptually simple as checking your arguments safely can be a tricky thing to get right when using libguile — another argument, if you needed one, for keeping the number of functions in gEDA's Scheme API as small as possible while still exposing all of the necessary functionality.

In my next post, I'll discuss how (and when!) in general to throw Scheme errors from C.
