Cforall Design Issues


C9x Upgrades

Cforall is (nearly) a superset of C89. It should be a superset of C9x. A near-final draft of the proposed standard adds some features that affect Cforall.

New Types

C9x defines a new long long, and allows for additional implementation-defined integral types. It also provides a family of complex types: complex float, complex double, complex long double, and optionally imaginary float, imaginary double, and imaginary long double. These types require new overloadings of the built-in operators. (The complex types might only be accessible if the <complex.h> header has been included; if so, that header would contain the complex operator declarations.)

The complex operators are interesting because they mix types, instead of having the same type for all parameters and the return type.

imaginary float ?*?(imaginary float, float),
                ?*?(float, imaginary float);
float ?*?(imaginary float, imaginary float);
complex float ?*?(complex float, complex float);
complex float ?+?(float, imaginary float),
              ?+?(imaginary float, float);

Since the set of integer types is implementation-defined, the `usual arithmetic conversions' refer to the type's integer conversion rank (roughly, the size of the type), which is a relative measure. Cforall's overload resolution rules must be rewritten to get rid of the absolute `conversion cost' measure.

The restrict Qualifier

The restrict type qualifier aids optimizers by letting programmers tell compilers that certain pointers do not point to the same object. This could double the number of overloadings of operators on pointers; they must have overloadings for restrict-qualified pointers as well as for the other qualified versions of pointer types.

Standard Library

The header <stdbool.h> defines a bool type, which is a synonym for some other integer type (possibly an implementation-defined type). <inttypes.h> declares many typedefs for integer types having certain exact widths and minimal widths, the fastest integer types having certain minimal widths, an integer type large enough to hold an object pointer, and the largest integer type. The Cforall version of the headers might define the types using type declarations instead of typedef or #define.

Header <tgmath.h> defines type-generic macros for mathematical functions. For instance, in the C code `sqrt(v)' the argument v can have any floating point or complex type, and the result will have the same type. Cforall can do this cleanly using overloading. (I'd hate to do this as a true macro. I predict that this is the just the snout of the camel, and the whole camel of overloading will be in the C0x tent.)

Enumerated Types

C's enumeration constants have type int, not the enumerated type that declared them. C++ changed that, so that overload resolution does the sensible thing when a function name that is overloaded for int and an enumerated type is applied to an enumeration constant. Cforall should do the same.

Effects:

Characters

C's character literals have type int, not char. C++ changed that, so that overload resolution does the sensible thing when a function name that is overloaded for int and char is applied to 'x'. Cforall should do the same.

Reference Manual changes:

Standard Library

Cforall's version of the standard library could be slightly different from C's.

Conversions

I distrust implicit conversions, but the Rational type rational.cf is a case where they are useful. It provides mixed-mode arithmetic through polymorphic operators that convert their operands, using the convention that widen() is the name of the conversion. The approach has several defects.

  1. widen()'s result type doesn't depend on its argument type. In an expression like `r + widen(i)', it can be hard to figure out what i is being widened to, and why.

    This is a predictable consequence of allowing overloaded functions to differ only in their return type; it is also about as bad as it gets. The main reason for providing mixed-mode operators in rational.cf is to avoid forcing programmers to scatter widen() calls throughout their code.

  2. The widen/narrow convention may require a standard library of widen()/narrow() functions for the built-in types.
  3. The polymorphic widener
    forall(type T | int widen(T)) Rational widen();
    can create Rationals from other user-defined types, even those written later. But there is no way to define a matching polymorphic widener from Rational to other types. The obvious code is
    spec widenable_from_rat(type T) { 
        T widen(int); 
        T widen(unsigned); 
        T ?/?(T, T); 
    } 
    
    forall(type T | widenable_from_rat(T)) 
    T widen(Rational r) { 
        return widen(r.numerator)/widen(r.denominator); 
    }
    This fails because Cforall requires that the value of T be inferable from the argument of widen().
  4. Some innocent-looking uses of widen() will be ambiguous.
    void     f(double); 
    void     f(float);
    Rational r;
    f(7);               /* unambiguous (converts to float). */
    f(widen(r));        /* ambiguous: widen to double or float? */
    f((float)widen(r)); /* unambiguous. */
    widen() fails where implicit conversions succeed because "widen" functions do not have conversion costs.
  5. The polymorphic function definitions are trivial and repetitive. A full set of operators would be quite large.
  6. Every function that should provide implicit conversion of its arguments must be rewritten to be (or have a wrapper function written that is) polymorphic and uses widen().
  7. Conversion of user-defined types must use widen() or narrow(); pre-defined types may use casts. I would prefer that user-defined and pre-defined types have identical capabilities.

An extension for user-definable conversions could solve these problems, without being much uglier than a boot. It must do two things.

I suggest adding (in the Cforall style of ugly operator identifiers) a cast operator `()?'. Cast expressions are rewritten as function calls and interpreted much like all the other operators, except that the expression has at most one valid interpretation whose type is the cast-to type.

T1 ()? (T2);	/* Explicit conversion from type T2 to T1. */
T2 v2;
... (T1)v2 ...	/* Calls the explicit conversion declared above. */

For mnemonic purposes, the `?' marks the position of the value operand of the cast. There is no `?' corresponding to the type name because it is not an argument to the function.

There would also be two keywords safe and unsafe (syntactically, they would be function-specifiers, like C9x inline) for declaring implicit conversions. The monomorphic `widen()' and `narrow()' functions from rational.cf would be replaced by

safe Rational ()? (int i) {
    Rational result = {i, 1};
    return result;
}

safe float ()? (Rational r) {
    return (float)r.numerator/(float)r.denominator;
}

safe double ()? (Rational r) {
    return (double)r.numerator/(double)r.denominator;
}

unsafe int ()? (Rational r) {
    return r.numerator/r.denominator;
    }

The more dangerous the conversion is, the more work it is to type in.

Unfortunately, `safe' and `unsafe' may be common identifiers in legacy C code.

Type Hierarchy

The type hierarchy would be inferred from the safe declarations: if `safe T1 ()? (T2)' exists, then T1 is `wider' than T2. A cycle of safe conversions is a static error. Make a directed graph using the safe conversions for edges; the conversion cost between two types is the longest path between them in the graph. Conversion costs can change as safe conversions go in and out of scope.

void     f(double);	/* (1) */
void     f(float);	/* (2) */
void     g(Rational);
Rational r;
g(7);	/* implicit safe conversion from int to Rational. */
f(r);	/* unambiguous: call (2) after safe conversion to float. */

The pre-defined type hierarchy would be explained by a set of pre-defined safe functions, such as `safe double ()?(float)'. The conversions above insert Rational between int and float, and leave its relationship to long undefined.

Since the "usual arithmetic conversions" and the set of integral types is implementation-dependant in C9x, the set of predefined conversions will be partially implementation-defined in Cforall. The result must be much the same as C9x's notion of integer conversion rank, but will have a slightly finer grain because it will explain a particular implementation's preference for promoting signed types to unsigned (or vice versa) in the "usual arithmetic conversions".

Polymorphic Conversion Functions

The polymorphic widen() to Rational would now be written
forall(type T | safe int ()?(T))
safe Rational ()?(T nu m) {
    Rational result = {(int)num, 1};
    return result;
}

Rational is now `wider' than any type with a safe conversion to int. The cost of a whatever-to-Rational conversion is one greater than the cost of whatever-to-int, because of the int-to-Rational conversion in the hierarchy.

Since the target type appears in the cast expression, we can use a special type inference rule to allow polymorphic conversions from Rational.

spec widenable_from_rat(type T) { 
    safe T ()? (int); 
    safe T ()? (unsigned); 
    T ?/?(T, T); 
}

forall(type T | widenable_from_rat(T))
safe T ()? (Rational r) {
    return ((T)r.numerator) / ((T)r.denominator);
}

/* User-defined infinite precision integer type: */
extern type Longest | widenable_from_rat(Longest);
... (Longest)r ...	/* Infer T to be Longest */

(In fact, this polymorphic conversion could replace all three monomorphic conversion operators from Rational to floating point types above.)

The general case for a polymorphic conversion has this form:

forall(type-decls)
return-type ()?(parms);
... (target-type)exp ...

To perform type inference, we must unify return-type and target-type. I think the compiler can do that by doing exactly the same things it would do for

forall(type-decls)
return-type polyfunc(return-type,parms);
target-type dummy;
... polyfunc(dummy, exp) ...

Note that each polymorphic conversion adds an unbounded number of edges to the type hierarchy graph. Each new type could be linked into the hierarchy by an existing polymorphic conversion, and each new function could create a link by causing an existing type to satisfy an assertion. This is worrisome because the hierarchy must be statically acyclic. I see two possible solutions.

  1. Check the hierarchy for cycles after every declaration (in the worst case). Fortunately, the hierarchy is usually small.
  2. Imitate C++, where any number of things that could be errors are ignored if they are not used; for instance, ambiguous member overloadings are only reported if some code tries to use one of the members. Similarly, Cforall would allow a program to declare a cycle of safe conversions if it never needed to calculate the conversion cost of a conversion involving a cycle. Since explicit conversions eliminate the need for comparisons of conversion costs, a programmer could eliminate `cycle' errors by adding explicit conversions.

Chaining

If conversions from T1 to T2 and from T2 to T3 exist, should Cforall automatically chain them to form a T1-to-T3 conversion where needed? I think the answer is yes. The programmer could force such chaining by adding a polymorphic conversion:

forall(type T1, type T2 | T2 ()?(T1), type T3 | T3 ()?(T2))
T3 ()?(T1 t1) { return (T3)(T2)t1; }
so we save the Cforall implementor little work by answering "no".

A chain is as safe as the least safe conversion in it. There may be more than one possible chain connecting two types; the compiler should probably choose among them using the existing "best valid interpretation" rules from Reference Manual section 5.2.1.

Alternatives

One way to eliminate the safe and unsafe keywords is to code them into the operator identifiers. There would be three identifiers: `()?', `(safe)?', and `(unsafe)?'. The rewrite rule for cast expressions, and the Reference Manual's definition of the interpretation of an identifier, would become hairier. Also, is it an error to declare both `T1 ()?(T2)' and `T1 (safe)?(T2)'?

Another alternative removes the safe and unsafe keywords, and adds clauses to type declarations to position the declared type in the hierarchy: `type Rational < float > int'. Implicit conversion would be allowed between any types connected in the hierarchy. I don't see how to specify a place in the hierarchy for enum, struct, or pointer types. I also don't see how to do the polymorphic positioning (`Rational is wider than any type that is narrower than int') that a safe polymorphic `()?' does.

Inline Types

The Cforall version of <stddef.h> could declare size_t with a type declaration to allow overloading. There are two ways to do it.

static type size_t | integer(size_t) = unsigned long;
(Assuming integer is a spec that declares integer operations.)

This makes the implementation of size_t visible wherever <stddef.h> is included, so the two types will be inter-convertible. Operations on size_ts will be efficient, but there will be little gain in type safety. Also, if the program also declares `extern void* malloc(size_t);', then `malloc()' will have a larger scope than the type of its argument.

extern type size_t | integer(size_t);
This provides type safety, but operations like ?==? will be external function calls, and hence relatively inefficient.

C9x provides inline functions. Types and functions have very similar linkage semantics in Cforall (deliberately, since alloca()-style functions might be used to implement types). Perhaps types can be declared inline as well?

inline type size_t | integer(size_t) = unsigned long;

By analogy with C9x function semantics, this would declare size_t as a type with external linkage, and would preserve type safety, but would reveal the implementation to the compiler for optimization. The Cforall code for the standard library would contain

#include <stddef.h>
extern type size_t;

which tells the compiler to generate the data items or code that implement size_t.

In this case, the compiler should use the default (unsigned long) operations for all of the operations on size_t declared by integer(size_t). In most cases, some operations on a type must be redefined. I suggest that any declaration of an operation should override the default implementation. (This is a small change from current Cforall.) For instance, here is a counter type that is just like unsigned int except that decrementing it sticks at zero instead of wrapping around.

inline type counter | integer(counter) = unsigned;

/* Inline definition replaces the default implementation. */
inline counter ?=-?(counter* c, unsigned u) {
    if (u < c) c = (unsigned)c - u;
    else c = 0;
    return c;
}

/* Decrement is implemented in another translation unit. */
extern counter --?(counter*), ?--(counter*);

Syntactic Fantasies

Something about Cforall encourages me to play a `what-if' game: `if this sequence of tokens was legal, what would it mean?' Think of it as orthogonality at the syntactic level. Don't take any of this too seriously.

Type Expressions

Right now, the initializer in a type declaration must be a type name, which is the type equivalent of a constant. Initializers for variables can be expressions. Can type initializers be expressions? The only operator that seems to make sense for types is the conditional operator. The definition of the smallest integer type containing at least 16 bits would be

type int_least16_t = (SHORT_MAX >= 32767) ? short : int;

This would be an alternative to a preprocessor #if statement.

Compound Literals

C9x provides compound literals. (GNU C provides similar but slightly weaker `constructor expressions')

struct foo {
    int  f1;
    char f2;
}
... (struct foo){42, 'x'} ...
... (struct foo){.f2='x', .f1=42} ...

Compound literals are modifiable objects, with addresses. By contrast, cast expressions produce values. This matches my idea of the difference between conversions and constructors: conversions create values from values, constructors build objects from values.

Strangely, the compound literal syntax works for scalar types, too (at least, as I read the draft standard, and GCC 2.7.2.1 allows it). `(long)1' produces the value 1 with type long. `(long){1}' creates an anonymous long object, initialized to 1.

This could be shoe-horned into Cforall as multi-argument explicit `()?' operators with an lvalue-qualified result type. Some sort of magic would allocate the memory used for the object.

lvalue Rational ()?(Rational* magic, int numerator, unsigned denominator) {
    magic->numerator = numerator;
    magic->denominator = denominator;
    return *magic;
}
... (Rational){1,2} ... /* Rewritten as ()?(magic(), 1, 2) */

The two long operators mentioned above might be declared

long ()?(int);                /* conversion  */
lvalue long ()?(long*, int);  /* constructor */

In effect, we get C++ constructors with a C-like calling syntax. That means we would have to work out the semantics of constructors.

Doing compound literals to the nines requires adding keyword parameters to the language, to support `.f1=42' above, which probably isn't worth the trouble. I also don't see how to work array compound literals into the scheme.

... (float [3]){1.1, 2.2, 3.3} ...  /* Create 3-element array. */
... (float []){4.4, 5.5} ...        /* Create 2-element array. */

Member Operations

Consider allowing declarations of postfix `operators' that would provide `pseudo-members'.

type Complex = struct { float real, imaginary;};
float   ?.theta(Complex) { ... }
float   ?.rho(Complex) { ... };

Complex c, *cp;
float   f;
f = c.theta;  /* equivalent to f = ?.theta(c) */
f = cp->rho;  /* equivalent to f = (*cp).rho and hence to f = ?.rho((*cp)) */

`?.theta' and `?.rho' would provide a way to view complex numbers as structures having polar-coordinate members, even if they were implemented using ``real'' and ``imaginary'' members. The notational advantage of this scheme over two ordinary functions called rho and theta is small.

I can also imagine allowing `float ?.theta=?(Complex*,float)', which would allow assignment to the theta pseudo-member. (The function would have to alter the real and imaginary members of the Complex structure.) There is some overlap between member-assignment functions and lvalue functions.

Objects

Within the function

forall(type T | ops(T)) void f(T t)

the object t does not have a known type, but it does have a known set of operations. It is an object in the `object-oriented' sense as well as the C sense of the word. In Cforall such objects only exist within polymorphic functions. In object-oriented languages they are not so constrained.

Cforall could provide objects two ways.

Existential Types
forsome(type T | ops(T)) T t;
Dependant Types
struct {
    type T | ops(T);
    T val;
} t;

Either approach is more powerful than pure object orientation. Here are declarations of a pair of items where the item type is not known, but it is known that they have exactly the same type.

forsome(type T) struct {T left, right};
struct { type T; T left, right; };

Non-Type Inference

All inferred parameters in a forall clause are types. We could allow unsigned, and try inferring array sizes.

forall(unsigned n) int sum(int a[n]);
int costs[20];
...
int total = sum(costs); /* n inferred to be 20. */
Traditional C array parameter declarations would not be useful here, because they do not provide size information.
void f(int *a) {... sum(a) ...}  /* Error! */
void g(int a[]) {... sum(a) ...}  /* Error! */

A declaration with a size is still not really good enough.

void f(int a[12]) {... sum(a) ...} /* Dicey! */

In C, the argument passed to a does not have to be a 12-element array; it can be any size of array, or even a null pointer! Cforall with this extension would have to tighten this up. We could require that the compiler be able to statically determine that the argument is an array with at least 12 elements. I don't know how much code would break.

Nick Maclaren once argued (ftp://oozelum.csi.cam.ac.uk/dist/C9X.restrict, link broken as of 2007.11.7) that, for restrict-qualified array parameters, array size declarations should be respected. His reason is that, on modern computers where the CPU is vastly faster than the main memory, Fortran compilers have taken to prefetching array parameters. C compilers cannot safely perform the same optimization unless they can assume that the argument is a big-enough array. He wanted the C9X standard to say that passing a null pointer or an array with insufficient elements causes undefined behaviour. This would break less code, and preserve more bugs.

Cforall would also change the meaning of the cast `(int [20])a', which would no longer mean quite the same thing as `(int*)a'.

Structure Prototypes

This was proposed in February 1998 in comp.std.c by bill@cafe.net (Kaz Kylheku). The idea is that eliding member names in a structure prevents access to the members.
/* list.h */

	typedef struct list_t {
		unsigned long list_count;
		struct list_node_t *;
		struct list_node_t *;
	} list_t;

	extern unsigned long list_count(struct list_t *l);
	#define list_count(L) ((L)->list_count)

	typedef struct list_node_t {
		void *list_node_data;
		struct list_node_t *;
		struct list_node_t *;
	} list_node_t;

	#define list_node_set_data(L, D) ((L)->list_node_data = (D))
	#define list_node_get_data(L) ((L)->list_node_data)


	/* list.c */

	#include "list.h"

	struct list_t {
		unsigned long;
		struct list_node_t *head;
		struct list_node_t *tail;
	};

	struct list_node_t {
		void *;
		struct list_node_t *next;
		struct list_node_t *prev;
	};

Structure prototypes are complete types, so you can instantiate them; you just can't touch certain members without cheating. This has some advantages over C++ private fields: it doesn't pollute any name spaces, and you don't have to define who has private access in a language that does not have member functions.

On the down side, it interacts with Cforall's anonymous members and inheritance.