Patching the compiler: a case study


This section presents a case study of a patched compiler that addresses a minor shortcoming of YAFL's current implementation. The BOOLEAN constant denotations TRUE and FALSE are translated into 1 and 0 respectively in the intermediate C source code. However, C treats any non-zero value as true, and the translation leaves it entirely to the C compiler to produce exactly 1, rather than some other non-zero value, when a comparison expression holds.

The YAFL compiler naively translates the following YAFL statement:

  Flag := (i = j);

into:

  Flag = (i == j);

(This example is simplified, since YAFL identifiers are mangled before being translated into C; the basic pattern of the compilation scheme is nevertheless clear.) Strictly speaking, the ISO C standard requires the equality and relational operators to yield exactly 1 or 0, so a conforming C compiler stores only 1 or 0 in Flag. The YAFL translation, however, does not enforce this representation itself, and a C compiler that departed from the rule could store another non-zero value. This is harmless in most cases, but can be disastrous if one relies on TRUE always being represented by 1.

The purpose of this case study is to show how the compiler can be improved to ensure that TRUE is always represented by 1, without altering its internal engine at all. This can even be done without any access to its source code.

The original compiler uses a class Expression to describe binary and unary expressions. We first declare a class derived from Expression in a definition module:

DEFINITION MODULE MyPatch;

FROM YaflExpressions IMPORT Expression;
IMPORT YaflGC;

  CLASS SafeExpression;
    INHERITS Expression;
    REDEFINE METHOD GenerateCode(Output: YaflGC);
  END SafeExpression;

END MyPatch;

and in the corresponding implementation module:

IMPLEMENTATION MODULE MyPatch;

  FROM YaflPredefined IMPORT PredefItems;
  FROM YaflType IMPORT Type;

  CLASS SafeExpression;
    INHERITS Expression;

    REDEFINE METHOD GenerateCode(Output: YaflGC);
      VAR
        t: Type;
      BEGIN
      -------------------
      -- GetType is a method inherited from
      -- the base class Expression; it returns
      -- the type of the expression.
      -------------------
      t := GetType;
      ASSERT t <> VOID;
      IF (t.ArrayLevel = 0) AND
         (t.SimpleType = PredefItems.Boolean) AND
         (FirstExpr <> VOID) AND
         (SecondExpr <> VOID) AND
         (FirstExpr.GetType.SimpleType <> PredefItems.Boolean) AND
         (SecondExpr.GetType.SimpleType <> PredefItems.Boolean) THEN
        ------------------------
        -- If this is indeed a binary expression
        -- with non-boolean operands, and if its
        -- type is boolean with array level 0,
        -- then our alternate code generation
        -- scheme applies.
        ------------------------
        Output.LeftParent;
        -------------------------
        -- Now, call the original version of
        -- the code generation method we
        -- are redefining.
        -------------------------
        BASE(Output);
        Output.QuestionMark;
        Output.WriteChar ('1');
        Output.Colon;
        Output.WriteChar ('0');
        Output.RightParent;
      ELSE
        -------------------------
        -- If our expression is not a
        -- binary boolean expression,
        -- use the original code generation
        -- scheme.
        -------------------------
        BASE(Output);
      END;
    END GenerateCode;
  END SafeExpression;

END MyPatch;

Using this slightly modified form of the code generator, the YAFL statement:

  Flag := (i = j);

would be translated into:

  Flag = ((i == j)?1:0);

which meets our requirement of always using 1 for TRUE, rather than an arbitrary non-zero value.

This example is incomplete: the mechanism that tells the compiler to use our new SafeExpression class instead of its own Expression class is not described here, but it is a purely administrative process.

Besides requiring no alteration of the original source code, this mechanism has the major advantage of being extremely concise: very little new code had to be written. The parsing and analysis parts of the compilation scheme are inherited without any modification, and even the code generation pass relies on the version it redefines, through the pseudo-method BASE.

It is true, of course, that a careless redefinition of a code generation method, one that omits the explicit call to the BASE pseudo-method, may change the behaviour of the whole generated program, but this is the kind of risk one must be ready to take in exchange for the ultimate flexibility provided by YAFL's compiler support.