gfront

An Objective-C to C translator using GObject

Background

GObject is used today primarily to write GNOME applications. It provides object-oriented facilities in the C language. Unfortunately, programming with the GObject libraries involves a steep learning curve, and much of the code one has to write is boilerplate. Furthermore, since C is such a low-level language, it does not offer the quick development time of high-level languages such as Java, Perl, or Python. To respond to these challenges, we decided to develop a translator that takes Objective-C code and turns it into C code that uses the GObject type system.

To date, the two other projects that have tackled similar problems are The GObject Builder and Vala. These projects translate a Java-like syntax and a C#-like syntax, respectively, into GObject code. We chose to translate from Objective-C because it is an elegant language and a strict superset of C, combining the system-level control of C with the ease of a high-level object-oriented language.

Objective-C

Objective-C is a strict superset of the C language. In practice, this means that any C program already written is a valid Objective-C program and can be extended to use Objective-C syntax. On top of C, Objective-C adds object-oriented functionality such as classes and inheritance, using a Smalltalk-derived syntax. Methods are termed messages in Objective-C. There are class and instance messages, which are equivalent to Java's static and instance methods respectively. Objective-C also adds categories, a feature that lets programmers add methods to existing classes without recompiling the original class or creating a derived type. Objective-C offers protocols, which are like interfaces in Java; these assure users of the code that certain methods are implemented by the class. Objective-C is also a dynamically typed, late-bound language, so a message can be sent to an object even if that message is not defined for that object. The object can then pass the message on to some other object, which may or may not be able to handle it. Additionally, the Apple/NeXTSTEP runtime for Objective-C uses reference counting for memory management. The Boehm conservative collector may be used with the Apple or GNU implementations.

The GObject Type System

The GObject type system serves as an object-oriented framework for C. One of GObject's primary design goals is to allow easy integration with object systems in other languages. It provides a generic type system that allows one to define a singly-inherited class structure, and it manages the creation, destruction, and copying of objects. It can also handle the memory management of defined classes using reference counting.

The GObject object system is similar to Java in that it is single inheritance only and methods are dispatched using offsets in a vtable. It is not as dynamic as Objective-C.
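
To make the vtable idea concrete, here is a minimal sketch in plain C that does not use the GObject headers. All names (Shape, ShapeClass, Square, call_area) are hypothetical and chosen only for illustration; real GObject code registers its types through the GObject type system rather than through the hand-written descriptors used here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical class descriptor: one function-pointer slot per method. */
struct ShapeClass {
    int (*area)(void *self);
};

/* Hypothetical instance struct: points at the descriptor of its class. */
struct Shape {
    const struct ShapeClass *klass;
    int side;
};

/* A derived instance embeds its parent as the first field. */
struct Square {
    struct Shape parent;
    int scale;
};

static int shape_area(void *self) {
    struct Shape *s = self;
    return s->side;
}

static int square_area(void *self) {
    struct Square *sq = self;
    return sq->parent.side * sq->parent.side * sq->scale;
}

static const struct ShapeClass shape_class  = { shape_area };
static const struct ShapeClass square_class = { square_area }; /* overrides area */

/* Dispatch: read the pointer from a fixed slot in the descriptor and call it. */
static int call_area(struct Shape *obj) {
    assert(obj != NULL && obj->klass != NULL);
    return obj->klass->area(obj);
}
```

Because the slot is read from the instance's own descriptor, calling call_area on a Square through a Shape pointer runs square_area, while the same call on a plain Shape runs shape_area.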

Each GObject class requires the declaration of two structures, an instance structure and a class descriptor structure. The first field of the instance structure is the parent type, or the GObject root class if there is no parent. The first field of the class descriptor is the class descriptor of the parent, or the GObject root descriptor if there is no parent. Including the parent structure allows for single inheritance.

Scanning and Parsing

Since Objective-C is a strict superset of C, we started with a copy of a C scanner written for Lex and a C grammar written for Yacc. We first modified the grammar to use Bison precedence rules for expressions and to build an abstract syntax tree for C using Objective-C classes that we defined. We then extended this grammar to include the subset of Objective-C that we would be using.

Unlike some other compilers, we defined our abstract syntax tree nodes to be very high-level and to closely match the Objective-C grammar. This aided us later in the translation phase, since for much of the C language we could simply print the original code verbatim.

We first added Objective-C syntax for interfaces, an example of which follows:

@interface className : inheritedClassName 
{
    @public         // Public instance variables
        int a;
    @protected      // Protected instance variables
        double b;
    @private        // Private instance variables
        char c;
}
// Declare an instance message
-(int) func1: (char) param1 : (int) param2;
// and a class message
+(char) func2: (int) param1;
@end

We also added implementation syntax, as in the following example:

@implementation className : inheritedClassName
// Define an instance message
-(int) func1: (char) param1 : (int) param2 
{
    //Do stuff...
    return 3;
}
// and a class message
+(char) func2: (int) param1 
{
    //Do stuff...
    return '\0';
}
@end

We implemented message passing in the grammar as well, which can be used as follows:

//Begin code snippet...
char ch = [className func2: 12];
className *cl = [[className alloc] init];
int a = [cl func1: ch : 1];

Type Checking

One of the benefits of using Objective-C is that its classes can inherit other classes' methods and fields. C has no notion of this, so to translate it we had to insert into each instance struct a parent struct that holds the fields of the parent class, which is the normal GObject mechanism. Since there can be many levels of inheritance, that parent struct can itself contain a parent struct, and so on. The following example shows how we handle this.

@interface superClass 
{
    int a;
}
@end

@interface class : superClass 
{   
    int b;
}
@end

//Is translated into...

struct superClass 
{
    int a;
};

struct class 
{
    struct superClass parent;
    int b;
};

In most object-oriented languages, when an instance wants to use a field inherited from another class, it simply uses the name of that field and the compiler calculates the appropriate offset. We, however, had to keep track of which parent holds that particular field so that, after translation, the C compiler can determine the proper offset in the struct. Such a piece of code looks like this before and after translation:

class cl;
cl.a = 5;

// This becomes...

struct class cl;
cl.parent.a = 5;
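
When the inherited field sits more than one level up, the translated access has to name every intermediate parent. The sketch below extends the structs above with a hypothetical third class, subClass, which is not part of the original example; the helper names set_a, get_a_via_root, and layout_ok are likewise assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct superClass { int a; };
struct class      { struct superClass parent; int b; };
/* Hypothetical third level, added only for this sketch. */
struct subClass   { struct class parent; int c; };

/* An Objective-C access to the inherited field a on a subClass instance
 * has to walk through both embedded parents after translation. */
static void set_a(struct subClass *sc, int value) {
    sc->parent.parent.a = value;
}

/* Because each parent is the first field, a pointer to the derived
 * struct can also be read as a pointer to the root struct. */
static int get_a_via_root(struct subClass *sc) {
    return ((struct superClass *)sc)->a;
}

/* Check the layout assumption: every parent sits at offset zero. */
static int layout_ok(void) {
    return offsetof(struct subClass, parent) == 0
        && offsetof(struct class, parent) == 0;
}
```

The translator therefore needs to know, for each field name, how many levels up it was declared, so it can emit the right number of .parent steps.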

Another benefit of using Objective-C is that it is dynamically typed, so many type checks happen at run time. C is not dynamically typed, so we must do many of those checks at translation time. We performed type synthesis on program expressions to determine the type of the instances that receive messages. This allows us to cast the class descriptor to the correct type to look up the method pointer.

Translation

As stated earlier, the C subset of Objective-C is straightforward to translate, since it can mostly be used as-is without modification. The Objective-C constructs require much more work. In Objective-C, the interface construct declares a class: its instance variables and its instance and class methods. These classes and their instance variables translate into C struct declarations for the instance and the class descriptor. The methods are translated into function declarations. Objective-C implementations are a bit more complex. The messages defined by an implementation are turned into regular C function definitions. Pointers to these functions are stored in the class descriptor, allowing for dynamic dispatch. Message expressions are translated into the appropriate function calls, with the receiver struct passed in as the instance pointer.
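
As a rough illustration of this mapping, the className interface from the Scanning and Parsing examples might come out looking something like the following. This is a sketch under assumptions, not gfront's actual output. It uses plain C stand-ins instead of the GObject headers, it omits the inheritedClassName parent for brevity, and the names classNameClass, className_func1, className_func2, className_class, send_func1, send_func2, and the klass field are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct className;

/* Class descriptor: one function pointer per declared message. */
struct classNameClass {
    int  (*func1)(struct className *self, char param1, int param2);
    char (*func2)(int param1);   /* class message: no instance pointer */
};

/* Instance struct: the instance variables plus an untyped
 * pointer to the class descriptor. */
struct className {
    const void *klass;
    int a;
    double b;
    char c;
};

/* Messages from the @implementation become ordinary C functions. */
static int className_func1(struct className *self, char param1, int param2) {
    (void)self; (void)param1; (void)param2;
    return 3;
}

static char className_func2(int param1) {
    (void)param1;
    return '\0';
}

static const struct classNameClass className_class = {
    className_func1, className_func2
};

/* [className func2: 12] */
static char send_func2(void) {
    return className_class.func2(12);
}

/* [cl func1: ch : 1]: the descriptor is cast to the type found for
 * the receiver, then the method pointer is looked up and called
 * with the receiver as the instance pointer. */
static int send_func1(struct className *cl, char ch) {
    assert(cl != NULL && cl->klass != NULL);
    return ((const struct classNameClass *)cl->klass)->func1(cl, ch, 1);
}
```

The cast in send_func1 is where the translation-time type information is needed: without knowing the receiver's class, the translator could not pick the descriptor type that contains the func1 slot.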