Ideas for Language Constructs Implementable by Translation to Readable C++
This is a collection of ideas for what one might do with technology that makes it easy to add language constructs to an extensible language (closely related to C++) that targets readable C++ source code. The C++ compatibility and readability constraints certainly restrict the language, but many interesting things can still be done, and the barrier to adoption is low, especially when the compiler is used merely as a one-off code generation wizard.
The surface syntax of the source language (the "C++ code I want written for me" specification language) varies between these examples. The source-to-source translation has been done manually, but the resulting C++ code has been verified to compile.
[and_or] "and" and "or" expressions of the same type as the subexpressions
- spec files: and_or.lsc
- generated files: and_or.hpp and_or.cpp
- test program: and_or/
In C++, the && and || operators always produce a boolean result, but sometimes it would be nice to have more Scheme-like "and" and "or" operators, such that "and" would yield either the value of the last subexpression (if all subexpressions are true) or 0, and "or" would yield either the value of the first true subexpression or 0.
This feature can be implemented through source-to-source translation in terms of C++ conditional (?:) expressions, if statements, and temporary variables (the latter to avoid repeating side effects).
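A minimal sketch of what the generated code might look like, assuming a hypothetical spec syntax in which "and" and "or" are written like function calls over int or pointer operands. The operand functions f, g and h, the g_calls counter and the wrapper function names are all made up for illustration. A temporary is introduced only where a value is both tested and yielded, so each subexpression is evaluated at most once.

```cpp
// Hypothetical operand functions; g_calls records how many were evaluated,
// to check that evaluation stops as early as Scheme's and/or would.
int g_calls = 0;
int f() { ++g_calls; return 3; }
int g() { ++g_calls; return 0; }
int h() { ++g_calls; return 7; }

// Spec: and(f(), h())
// Yields the last value if every operand is non-zero, otherwise 0.
// No temporary is needed: non-last operands are only tested, never yielded.
int and_f_h() {
  return f() ? h() : 0;
}

// Spec: and(f(), g(), h())
// Evaluation stops at g(), which yields 0, so h() is never called.
int and_f_g_h() {
  return f() ? (g() ? h() : 0) : 0;
}

// Spec: or(g(), f(), h())
// Yields the first non-zero value, otherwise 0. The temporaries keep each
// operand from being evaluated twice (once to test it, once to yield it).
int or_g_f_h() {
  int t1, t2;
  return (t1 = g()) ? t1 : ((t2 = f()) ? t2 : h());
}

// Spec: or(name, "anonymous")
// The result keeps the operand type (const char*), whereas
// name || "anonymous" would be a bool.
const char* name_or_default(const char* name) {
  const char* t1;
  return (t1 = name) ? t1 : "anonymous";
}
```

In a statement context the translator could equally emit nested if statements assigning to a result variable; the conditional-expression form has the advantage of fitting wherever the original expression appeared.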
[anon_class] anonymous classes
- spec files: anon_class.ext.cpp
- generated files: anon_class.hpp anon_class.cpp
- test program: anon_class/
In C++ it is common to use classes with nothing but pure virtual methods as callback interfaces, which clients implement in order to receive event notifications, for instance. The Symbian platform includes a large number of such interfaces, and these classes are known as M classes due to their naming convention (their names begin with M).
Java’s interface construct is used in the same way, but Java also supports anonymous classes, which make it more convenient to implement callbacks where required. This example explores the idea of adding anonymous class support to C++ through source-to-source translation.
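One way the translation could work, sketched under assumptions: the interface MEventObserver, the EventSource class and the generated name AnonClass_1 are hypothetical. The anonymous instance is placed on the stack, which assumes the observer need not outlive the enclosing scope. The local variable that the anonymous class body refers to is passed to a generated constructor, here by reference, much as Java anonymous classes may use final locals.

```cpp
// A Symbian-style M class: a callback interface with only pure virtual
// methods (hypothetical name and method).
class MEventObserver {
public:
  virtual ~MEventObserver() {}
  virtual void HandleEvent(int aEvent) = 0;
};

// Hypothetical event source that notifies a single observer.
class EventSource {
public:
  explicit EventSource(MEventObserver& aObserver) : iObserver(aObserver) {}
  void Fire(int aEvent) { iObserver.HandleEvent(aEvent); }
private:
  MEventObserver& iObserver;
};

// Spec (hypothetical Java-like syntax):
//   int total = 0;
//   EventSource source(MEventObserver() {
//     void HandleEvent(int aEvent) { total += aEvent; }
//   });
// Possible translation: the anonymous class gets a unique generated name,
// and the local variable it uses is captured through its constructor.
class AnonClass_1 : public MEventObserver {
public:
  explicit AnonClass_1(int& aTotal) : iTotal(aTotal) {}
  void HandleEvent(int aEvent) { iTotal += aEvent; }
private:
  int& iTotal;
};

int SumEvents() {
  int total = 0;
  AnonClass_1 anon1(total);  // replaces the anonymous class expression
  EventSource source(anon1);
  source.Fire(2);
  source.Fire(5);
  return total;
}
```

A local class defined inside SumEvents would also work in C++03 here. Lifting the class to namespace scope keeps the generated code closer to what one would write by hand.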
[func_obj] lambda expressions and closures
- spec files: func_obj.ext.cpp
- generated files: func_obj.hpp func_obj.cpp
- test program: func_obj/
This example basically follows the lambda expression surface syntax and semantics described in the N2550 specification, which is apparently to be adopted for C++0x. Given that the specification defines the “semantics of lambda expressions via translation to function objects”, it should be quite possible to systematically transform such expressions into valid C++ through source-to-source translation, allowing the construct to be used with older compilers that do not support C++0x.
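A sketch of the translation to function objects for two N2550-style lambdas, one capturing by reference and one capturing by copy. The generated class names Lambda_1 and Lambda_2 and the wrapper functions SumAll and Scale are hypothetical. Each capture becomes a data member initialized through the constructor, and the lambda body becomes operator().

```cpp
#include <algorithm>
#include <vector>

// Spec: [&sum](int x) { sum += x; }
// The by-reference capture becomes a reference member.
class Lambda_1 {
public:
  explicit Lambda_1(int& sum) : sum_(sum) {}
  // const, like the call operator of a non-mutable C++11 lambda; modifying
  // sum through the reference member is still allowed.
  void operator()(int x) const { sum_ += x; }
private:
  int& sum_;
};

// Spec: [factor](int x) { return x * factor; }
// The by-copy capture becomes a value member.
class Lambda_2 {
public:
  explicit Lambda_2(int factor) : factor_(factor) {}
  int operator()(int x) const { return x * factor_; }
private:
  int factor_;
};

int SumAll(const std::vector<int>& v) {
  int sum = 0;
  // Spec: std::for_each(v.begin(), v.end(), [&sum](int x) { sum += x; });
  std::for_each(v.begin(), v.end(), Lambda_1(sum));
  return sum;
}

std::vector<int> Scale(const std::vector<int>& v, int factor) {
  std::vector<int> out(v.size());
  // Spec: std::transform(v.begin(), v.end(), out.begin(),
  //                      [factor](int x) { return x * factor; });
  std::transform(v.begin(), v.end(), out.begin(), Lambda_2(factor));
  return out;
}
```

Because std::for_each takes its function object by value, the reference member is what lets the update to sum survive the copy.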
[lit_desc] implicit _LIT declaration within an expression
- spec files: lit_desc.ext.cpp
- generated files: lit_desc.hpp lit_desc.cpp
- test program: lit_desc/
This example concerns the _LIT construct, which is used in Symbian C++ to declare string (“descriptor”) literals. _LIT is a macro that produces a C++ declaration, and hence it cannot appear in an expression context. A source-to-source translator could lift such declarations to the nearest preceding declaration context.
(Symbian does have an alternative _L macro that allows a literal to appear in an expression context, but _LIT is preferred, as _L involves a performance penalty.)
More generally, allowing declarations within expressions would be powerful, particularly when coupled with a macro facility capable of local transformations.
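A sketch of the lifting step. It uses a hypothetical MY_LIT macro as a portable stand-in for _LIT, because the real macro needs the Symbian headers and declares a descriptor literal rather than a char array. Like _LIT, MY_LIT expands to a declaration, so the translator must move it out of the expression into the enclosing block and refer to the literal by name. The LIT(...) spec syntax and the function names are made up.

```cpp
#include <cstring>

// Hypothetical stand-in for Symbian's _LIT: it expands to a declaration,
// so it cannot appear inside an expression.
#define MY_LIT(name, s) static const char name[] = s

// Hypothetical function that consumes a literal.
std::size_t Length(const char* s) { return std::strlen(s); }

// Spec (hypothetical syntax), declaring the literal in expression position:
//   return Length(LIT(KGreeting, "hello"));
// Possible translation: the declaration is lifted to the nearest preceding
// declaration context, and the expression refers to the literal by name.
std::size_t GreetingLength() {
  MY_LIT(KGreeting, "hello");
  return Length(KGreeting);
}
```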
[member_init] member variable initialization with assignment syntax
- spec files: member_init.ext.cpp
- generated files: member_init.hpp member_init.cpp
- test program: member_init/
As your compiler may tell you, “ISO C++ forbids initialization of member” variables if they are instance variables (i.e., not static), so you must initialize them in the constructor instead. This probably does not seem attractive to those with a Java background, for instance.
In any case, C++ initializes instance variables in the order in which they are declared, so there should be little confusion if a source-to-source translator added the constructor member initializers automatically, letting the variable declaration itself include an assignment specifying the variable's initial value.
Similarly, for consistency, one might also allow static non-const variables to be initialized in the same manner as static const (integral) variables. All member variables could then be declared the same way, initial value and all.
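A sketch of the translation for a hypothetical Counter class. Each instance variable's initializer is added, in declaration order, to the member initializer list of every constructor that does not initialize that variable itself. The static non-const initializer moves to an out-of-class definition, which the translator would emit into the generated .cpp file. The class and its members are made up for illustration.

```cpp
#include <string>

// Spec (hypothetical syntax, not valid C++03):
//   class Counter {
//   public:
//     Counter() { ++created_; }
//     explicit Counter(int start) : count_(start) { ++created_; }
//     ...
//   private:
//     int count_ = 10;
//     std::string label_ = "clicks";
//     static int created_ = 0;
//   };
class Counter {
public:
  // Generated initializers for both members, in declaration order.
  Counter() : count_(10), label_("clicks") { ++created_; }
  // count_ keeps the user's initializer; label_ gets the generated one.
  explicit Counter(int start) : count_(start), label_("clicks") { ++created_; }

  int count() const { return count_; }
  const std::string& label() const { return label_; }
  static int created() { return created_; }

private:
  int count_;
  std::string label_;
  static int created_;
};

// Generated out-of-class definition carrying the static initializer.
int Counter::created_ = 0;
```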
[nested_anon_func] nested and anonymous functions
- spec files: nested_anon_func.lsc
- generated files: nested_anon_func.hpp nested_anon_func.cpp
- test program: nested_anon_func/
Nested and anonymous functions are not supported in C++, yet they can be handy when a function is referenced only in one particular context, where it may be desirable to define the function in that context alone.
Anonymous functions as such are implementable through source-to-source translation in a relatively straightforward way, as basically all that is required is to name the functions uniquely and lift them to the top level, where C++ does allow them.
This example considers the simple case where closures are not supported. To support closures, one would have to consider the lifetime of variables visible from enclosing scopes, possibly providing multiple alternative solutions depending on how memory is to be managed.
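A sketch of the lifting for the closure-free case. The spec syntax and all generated names (anon_func_1, CountEven_is_even and the prefixing scheme) are hypothetical. Each function is given a unique name and moved to namespace scope, and the original use site refers to it by that name.

```cpp
#include <algorithm>
#include <vector>

// Lifted functions go in an unnamed namespace so that the generated names
// stay local to this translation unit.
namespace {

// Spec (hypothetical syntax), an anonymous function passed inline:
//   std::sort(v.begin(), v.end(), fun (int a, int b) -> bool { return a > b; });
bool anon_func_1(int a, int b) { return a > b; }

// Spec: a function nested in CountEven:
//   int CountEven(const std::vector<int>& v) {
//     bool is_even(int x) { return x % 2 == 0; }
//     return std::count_if(v.begin(), v.end(), is_even);
//   }
// The enclosing function's name is prefixed to keep the lifted name unique.
bool CountEven_is_even(int x) { return x % 2 == 0; }

}  // namespace

void SortDescending(std::vector<int>& v) {
  std::sort(v.begin(), v.end(), anon_func_1);
}

int CountEven(const std::vector<int>& v) {
  return static_cast<int>(std::count_if(v.begin(), v.end(), CountEven_is_even));
}
```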
[pimpl] automatic hiding of class implementation
- spec files: pimpl.ext.cpp
- generated files: pimpl.hpp pimpl.cpp
- test program: pimpl/
Perhaps for future-proofing ABI compatibility, or just to hide implementation details, one often sees the application of the Pimpl idiom or some variation thereof. The idea is to separate at least the private instance data (or perhaps the entire implementation of a class interface) into a separate class whose definition is not given in public header files. Just a pointer to an instance of that class is kept in the “public” class, meaning that even if the implementation class changes, the size of the public class stays the same.
This approach has its benefits, but it entails more typing when done manually, which makes it a potential application for automation through source-to-source translation.
With the boilerplate coding automated, it probably makes sense to hide not only the instance data but also the private methods behind the opaque pointer, as is done in this example. This way one can see the whole implementation by looking at the implementation class (here Numbers::Impl) alone, as the public class is nothing but a wrapper.
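A condensed sketch of what the generated pair of files might contain, shown as one listing. Numbers::Impl is the name used in the text. Its members (Add, Sum, the private IsAcceptable helper and the vector of values) are hypothetical. Copying is disabled in C++03 style, since the public class owns its Impl through a raw pointer.

```cpp
#include <vector>

// ---- generated header (pimpl.hpp): no private data or methods visible ----
class Numbers {
public:
  Numbers();
  ~Numbers();
  void Add(int n);
  int Sum() const;

private:
  class Impl;   // defined only in the implementation file
  Impl* impl_;  // opaque pointer: sizeof(Numbers) does not depend on Impl

  // Copying disabled (declared, never defined), C++03 style.
  Numbers(const Numbers&);
  Numbers& operator=(const Numbers&);
};

// ---- generated implementation (pimpl.cpp) ----
// The whole implementation, including private methods, lives here.
class Numbers::Impl {
public:
  void Add(int n) {
    if (IsAcceptable(n)) values_.push_back(n);
  }
  int Sum() const {
    int total = 0;
    for (std::vector<int>::size_type i = 0; i < values_.size(); ++i)
      total += values_[i];
    return total;
  }

private:
  bool IsAcceptable(int n) const { return n >= 0; }  // hypothetical rule
  std::vector<int> values_;
};

// Generated forwarding wrappers.
Numbers::Numbers() : impl_(new Impl) {}
Numbers::~Numbers() { delete impl_; }
void Numbers::Add(int n) { impl_->Add(n); }
int Numbers::Sum() const { return impl_->Sum(); }
```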
[recur] explicit tail calls
In C++, not all compilers consistently perform tail call optimization where possible. If one cannot be sure that the optimization takes place, one may wish to avoid recursion altogether in cases where many repeated tail calls are possible. That is a shame, as other looping constructs may be less readable and more effort to write.
It is possible to implement a looping construct that has syntax similar to recursive function calls, and which translates to something that does not consume stack with every iteration.
This example gets its name from recur, which “is the only non-stack-consuming looping construct in Clojure”. The syntax used is that of Scheme’s named let.
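A sketch of the translation for a Fibonacci loop, using a made-up surface syntax in the style of Scheme’s named let. The recursive call in tail position becomes a simultaneous reassignment of the loop variables followed by a jump back to the top of the loop, so the stack does not grow with each iteration. Temporaries are needed because each new value must be computed from the old values, as in an actual call: assigning a = b before computing a + b would give the wrong result.

```cpp
// Spec (hypothetical syntax in the style of Scheme's named let):
//   let loop (k = n, a = 0, b = 1) {
//     if (k == 0) a else loop(k - 1, b, a + b)
//   }
// Assumes n >= 0, as the spec does.
long Fib(int n) {
  // The named-let bindings become ordinary locals.
  int k = n;
  long a = 0, b = 1;
  for (;;) {
    if (k == 0) return a;
    // loop(k - 1, b, a + b): compute all arguments first...
    int next_k = k - 1;
    long next_a = b;
    long next_b = a + b;
    // ...then rebind, and iterate instead of recursing.
    k = next_k;
    a = next_a;
    b = next_b;
  }
}
```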
[scope_exit] execute a statement at scope exit
- spec files: scope_exit.ext.cpp
- generated files: scope_exit.hpp scope_exit.cpp
- test program: scope_exit/
The idea here is to make it more convenient to use RAII to ensure that a particular operation is executed at scope exit. The source-to-source translator is left to define a class with a destructor and to instantiate it, so that the code you specify gets executed. The syntax used is scope exit followed by the statement to execute, which is similar to the scope(exit) construct of the D language.
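A sketch of the translation, assuming the hypothetical spec shown in the comment. The generated local class (its name, ScopeExit_1, is made up) holds references to the variables the statement uses, and its destructor runs the statement, so the statement executes on every path out of the scope, including the early return. The class is instantiated at the point where scope exit appeared, so it only fires if execution got that far.

```cpp
#include <string>

// Spec (hypothetical syntax):
//   int Work(std::string& log, bool early) {
//     log += "begin;";
//     scope exit log += "end;";
//     if (early) return 1;
//     log += "body;";
//     return 0;
//   }
int Work(std::string& log, bool early) {
  log += "begin;";

  // Generated for: scope exit log += "end;";
  struct ScopeExit_1 {
    std::string& log_;
    explicit ScopeExit_1(std::string& l) : log_(l) {}
    ~ScopeExit_1() { log_ += "end;"; }
  };
  ScopeExit_1 scope_exit_1(log);

  if (early) return 1;
  log += "body;";
  return 0;
}
```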
[two_phase] Symbian two-phase construction idiom
- spec files: two_phase.ext.cpp
- generated files: two_phase.hpp two_phase.cpp
- test program: two_phase/
Symbian has its own form of exceptions, called leaves. These are not allowed in constructors, so if initialization may leave, the object is not considered fully constructed after the constructor has been invoked. Rather, one must also invoke a second-phase method, named ConstructL by convention. Often, for convenience, a class includes NewL and/or NewLC static methods that invoke both the constructor and the ConstructL method.
Suppose one were to simply write a constructor and annotate it with attributes specifying whether the ctor is potentially leaving, and whether one wants NewL, NewLC, or both. This source-to-source transformation example explores that scenario.
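A portable approximation of the generated code for a constructor annotated as leaving and requesting NewL. Real Symbian output would allocate with new (ELeave) and protect the half-built object with CleanupStack::PushL and CleanupStack::Pop rather than try/catch, and NewLC would leave the object on the cleanup stack. None of that is modeled here, and a C++ exception stands in for a leave. The class CBuffer, its members and the attribute syntax are hypothetical.

```cpp
#include <stdexcept>
#include <vector>

// Spec (hypothetical attribute syntax):
//   class CBuffer {
//   public:
//     [leaving, NewL] CBuffer(int aSize) {
//       if (aSize < 0) leave;  // may leave
//       iData.resize(aSize);
//     }
//     ...
//   };
class CBuffer {
public:
  // Generated factory: runs both construction phases.
  static CBuffer* NewL(int aSize) {
    CBuffer* self = new CBuffer();  // first phase: the ctor itself cannot fail
    try {
      self->ConstructL(aSize);      // second phase: may fail
    } catch (...) {
      delete self;                  // don't leak a half-constructed object
      throw;
    }
    return self;
  }
  ~CBuffer() {}
  int Size() const { return static_cast<int>(iData.size()); }

private:
  CBuffer() {}  // generated trivial ctor; private so that NewL must be used
  // The annotated constructor's body moves here.
  void ConstructL(int aSize) {
    if (aSize < 0) throw std::invalid_argument("negative size");  // the "leave"
    iData.resize(aSize);
  }
  std::vector<char> iData;
};
```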