Cforall is (nearly) a superset of C89. It should be a superset of C9x. A near-final draft of the proposed standard adds some features that affect Cforall.
C9x defines a new long long,
and allows for additional implementation-defined integral types. It also
provides a family of complex types: complex float,
complex double, complex long double, and
optionally imaginary float, imaginary double, and
imaginary long double. These types require new overloadings
of the built-in operators. (The complex types might only be accessible if
the <complex.h> header has been included; if so, that
header would contain the complex operator declarations.)
The complex operators are interesting because they mix types, instead of having the same type for all parameters and the return type.
imaginary float ?*?(imaginary float, float),
?*?(float, imaginary float);
float ?*?(imaginary float, imaginary float);
complex float ?*?(complex float, complex float);
complex float ?+?(float, imaginary float),
?+?(imaginary float, float);
Since the set of integer types is implementation-defined, the `usual arithmetic conversions' refer to the type's integer conversion rank (roughly, the size of the type), which is a relative measure. Cforall's overload resolution rules must be rewritten to get rid of the absolute `conversion cost' measure.
restrict QualifierThe restrict type qualifier aids optimizers by letting
programmers tell compilers that certain pointers do not point to the same
object. This could double the number of overloadings of operators on
pointers; they must have overloadings for restrict-qualified
pointers as well as for the other qualified versions of pointer types.
The header <stdbool.h> defines a bool
type, which is a synonym for some other integer type (possibly an
implementation-defined type). <inttypes.h> declares
many typedefs for integer types having certain exact widths and minimal
widths, the fastest integer types having certain minimal widths, an integer
type large enough to hold an object pointer, and the largest integer type.
The Cforall version of the headers might define the types using
type declarations instead of
typedef or #define.
Header <tgmath.h> defines type-generic macros
for mathematical functions. For instance, in the C code
`sqrt(v)' the argument v can have any floating
point or complex type, and the result will have the same type.
Cforall can do this cleanly using overloading. (I'd hate to do this as a
true macro. I predict that this is the just the snout of the camel, and
the whole camel of overloading will be in the C0x tent.)
C's enumeration constants have type int, not the enumerated
type that declared them. C++ changed that, so that overload resolution
does the sensible thing when a function name that is overloaded for
int and an enumerated type is applied to an enumeration
constant. Cforall should do the same.
Effects:
enum color and enum fruit
could both define ORANGE.)
int), and
implicit (probably unsafe) conversions back.
int. Other
operations on enumeration values are handled via implicit
conversions.
C's character literals have type int, not
char. C++ changed that, so that overload resolution does the
sensible thing when a function name that is overloaded for int
and char is applied to 'x'. Cforall should do
the same.
Reference Manual changes:
'x')
and escape-sequence literals ('\n',
'\x30', '\30') have type
char, multi-character literals
('xy') have type int, and wide
literals (L'x') have type wchar_t.
Cforall's version of the standard library could be slightly different from C's.
size_t as
typedef or #define synonyms for existing
types. This makes it difficult to overload a function for
size_t and other integer-type arguments in portable
code, because the programmer doesn't know what size_t
is a synonym for.
Cforall could use type declarations for
size_t and the like. That would
allow overloading. It could also reduce the efficiency of
operations on size_t, and would break unportable C code
that `knows' what type size_t is.
Another reason to do this is to forbid nonsensical
operations. For instance, ISO C defines time_t to be an
arithmetic type, but defines no arithmetic properties beyond the
legality of `t == (time_t)-1'. We should remove the
other operations, or provide standard arithmetic functions based on
difftime(), etc., that will work.
bsearch() and
qsort() would improve type safety. Existing calls
would not be affected.
forall(type T)
T* bsearch(const T* key, const T* base, size_t nmemb,
int (*compar)(const T*, const T*));
forall(type T | int ?(T,T) | int ?==?(T,T) | int ?>?(T,T))
T* bsearch(const T* key, const T* base, size_t nmemb);
NULL should be defined as
(forall(type T) T*)0, not
(void*)0.
strchr().
char* strchr(char*, int); const char* strchr(const char*, int);
integer, arithmetic, ... .
arithmetic_base would define
?+=?, but not ?+?.
forall(type T | arithmetic_base(T))
T ?+?(T a, T b) { return a += b; }
I distrust implicit conversions, but the Rational type rational.cf is a case where they are
useful. It provides mixed-mode arithmetic through polymorphic operators
that convert their operands, using the convention that widen()
is the name of the conversion. The approach has several defects.
widen()'s result type doesn't depend on its argument
type. In an expression like `r + widen(i)', it can be
hard to figure out what i is being widened to, and why.
This is a predictable consequence of allowing overloaded
functions to differ only in their return type; it is also about as
bad as it gets. The main reason for providing mixed-mode operators
in rational.cf is to avoid forcing programmers to scatter
widen() calls throughout their code.
widen()/narrow() functions for the built-in types.
forall(type T | int widen(T)) Rational widen();can create Rationals from other user-defined types, even those written later. But there is no way to define a matching polymorphic widener from Rational to other types. The obvious code is
spec widenable_from_rat(type T) {
T widen(int);
T widen(unsigned);
T ?/?(T, T);
}
forall(type T | widenable_from_rat(T))
T widen(Rational r) {
return widen(r.numerator)/widen(r.denominator);
}
This fails because Cforall requires that the value of T be
inferable from the argument of widen().
widen() will be ambiguous.
void f(double); void f(float); Rational r; f(7); /* unambiguous (converts to float). */ f(widen(r)); /* ambiguous: widen to double or float? */ f((float)widen(r)); /* unambiguous. */
widen() fails where implicit conversions succeed
because "widen" functions do not have conversion costs.
widen().
widen() or
narrow(); pre-defined types may use casts. I would
prefer that user-defined and pre-defined types have identical
capabilities.
An extension for user-definable conversions could solve these problems, without being much uglier than a boot. It must do two things.
int to void*),
implicit and safe (int to long), or
implicit and unsafe (int to short).
I suggest adding (in the Cforall style of ugly operator
identifiers) a cast operator `()?'. Cast expressions are
rewritten as function calls and interpreted much like all the other
operators, except that the expression has at most one valid interpretation
whose type is the cast-to type.
T1 ()? (T2); /* Explicit conversion from type T2 to T1. */ T2 v2; ... (T1)v2 ... /* Calls the explicit conversion declared above. */
For mnemonic purposes, the `?' marks the position of the
value operand of the cast. There is no `?' corresponding to
the type name because it is not an argument to the function.
There would also be two keywords safe and
unsafe (syntactically, they would be
function-specifiers, like C9x inline) for declaring
implicit conversions. The monomorphic `widen()' and
`narrow()' functions from rational.cf would be
replaced by
safe Rational ()? (int i) {
Rational result = {i, 1};
return result;
}
safe float ()? (Rational r) {
return (float)r.numerator/(float)r.denominator;
}
safe double ()? (Rational r) {
return (double)r.numerator/(double)r.denominator;
}
unsafe int ()? (Rational r) {
return r.numerator/r.denominator;
}
The more dangerous the conversion is, the more work it is to type in.
Unfortunately, `safe' and `unsafe' may be
common identifiers in legacy C code.
The type hierarchy would be inferred from the safe
declarations: if `safe T1 ()? (T2)' exists, then
T1 is `wider' than T2. A cycle of
safe conversions is a static error. Make a directed graph
using the safe conversions for edges; the conversion cost between two types
is the longest path between them in the graph. Conversion costs
can change as safe conversions go in and out of scope.
void f(double); /* (1) */ void f(float); /* (2) */ void g(Rational); Rational r; g(7); /* implicit safe conversion from int to Rational. */ f(r); /* unambiguous: call (2) after safe conversion to float. */
The pre-defined type hierarchy would be explained by a set of
pre-defined safe functions, such as `safe double
()?(float)'. The conversions above insert
Rational between int and float, and
leave its relationship to long undefined.
Since the "usual arithmetic conversions" and the set of integral types is implementation-dependant in C9x, the set of predefined conversions will be partially implementation-defined in Cforall. The result must be much the same as C9x's notion of integer conversion rank, but will have a slightly finer grain because it will explain a particular implementation's preference for promoting signed types to unsigned (or vice versa) in the "usual arithmetic conversions".
widen() to
Rational would now be written
forall(type T | safe int ()?(T))
safe Rational ()?(T nu m) {
Rational result = {(int)num, 1};
return result;
}
Rational is now `wider' than any type with a safe conversion to
int. The cost of a whatever-to-Rational
conversion is one greater than the cost of whatever-to-int,
because of the int-to-Rational conversion in the
hierarchy.
Since the target type appears in the cast expression, we can use a special
type inference rule to allow polymorphic conversions from
Rational.
spec widenable_from_rat(type T) {
safe T ()? (int);
safe T ()? (unsigned);
T ?/?(T, T);
}
forall(type T | widenable_from_rat(T))
safe T ()? (Rational r) {
return ((T)r.numerator) / ((T)r.denominator);
}
/* User-defined infinite precision integer type: */
extern type Longest | widenable_from_rat(Longest);
... (Longest)r ... /* Infer T to be Longest */
(In fact, this polymorphic conversion could replace all three monomorphic conversion operators from Rational to floating point types above.)
The general case for a polymorphic conversion has this form:
forall(type-decls) return-type ()?(parms); ... (target-type)exp ...
To perform type inference, we must unify return-type and target-type. I think the compiler can do that by doing exactly the same things it would do for
forall(type-decls) return-type polyfunc(return-type,parms); target-type dummy; ... polyfunc(dummy, exp) ...
Note that each polymorphic conversion adds an unbounded number of edges to the type hierarchy graph. Each new type could be linked into the hierarchy by an existing polymorphic conversion, and each new function could create a link by causing an existing type to satisfy an assertion. This is worrisome because the hierarchy must be statically acyclic. I see two possible solutions.
If conversions from T1 to T2 and from
T2 to T3 exist, should Cforall automatically
chain them to form a T1-to-T3 conversion where
needed? I think the answer is yes. The programmer could force such
chaining by adding a polymorphic conversion:
forall(type T1, type T2 | T2 ()?(T1), type T3 | T3 ()?(T2))
T3 ()?(T1 t1) { return (T3)(T2)t1; }
so we save the Cforall implementor little work by answering "no".
A chain is as safe as the least safe conversion in it. There may be more than one possible chain connecting two types; the compiler should probably choose among them using the existing "best valid interpretation" rules from Reference Manual section 5.2.1.
One way to eliminate the safe and unsafe
keywords is to code them into the operator identifiers. There would be
three identifiers: `()?', `(safe)?', and
`(unsafe)?'. The rewrite rule for cast expressions, and the
Reference Manual's definition of the interpretation of an identifier, would
become hairier. Also, is it an error to declare both `T1
()?(T2)' and `T1 (safe)?(T2)'?
Another alternative removes the safe and
unsafe keywords, and adds clauses to type declarations to
position the declared type in the hierarchy: `type Rational < float >
int'. Implicit conversion would be allowed between any types
connected in the hierarchy. I don't see how to specify a place in the
hierarchy for enum, struct, or pointer types. I
also don't see how to do the polymorphic positioning
(`Rational is wider than any type that is narrower than
int') that a safe polymorphic `()?' does.
The Cforall version of <stddef.h> could declare
size_t with a type declaration to
allow overloading. There are two ways to do it.
static type size_t | integer(size_t) = unsigned long;
integer is a spec that declares integer
operations.)
This makes the implementation of size_t visible
wherever <stddef.h> is included, so the two types
will be inter-convertible. Operations on size_ts will
be efficient, but there will be little gain in type safety. Also,
if the program also declares `extern void*
malloc(size_t);', then `malloc()' will have a
larger scope than the type of its argument.
extern type size_t | integer(size_t);
?==?
will be external function calls, and hence relatively inefficient.
C9x provides inline functions. Types and functions have very similar
linkage semantics in Cforall (deliberately, since
alloca()-style functions might be used to implement types).
Perhaps types can be declared inline as well?
inline type size_t | integer(size_t) = unsigned long;
By analogy with C9x function semantics, this would declare
size_t as a type with external linkage, and would preserve
type safety, but would reveal the implementation to the compiler for
optimization. The Cforall code for the standard library would contain
#include <stddef.h> extern type size_t;
which tells the compiler to generate the data items or code that
implement size_t.
In this case, the compiler should use the default (unsigned long)
operations for all of the operations on size_t declared by
integer(size_t). In most cases, some operations on a type
must be redefined. I suggest that any declaration of an operation should
override the default implementation. (This is a small change from current
Cforall.) For instance, here is a counter type that is just
like unsigned int except that decrementing it sticks at zero
instead of wrapping around.
inline type counter | integer(counter) = unsigned;
/* Inline definition replaces the default implementation. */
inline counter ?=-?(counter* c, unsigned u) {
if (u < c) c = (unsigned)c - u;
else c = 0;
return c;
}
/* Decrement is implemented in another translation unit. */
extern counter --?(counter*), ?--(counter*);
Something about Cforall encourages me to play a `what-if' game: `if this sequence of tokens was legal, what would it mean?' Think of it as orthogonality at the syntactic level. Don't take any of this too seriously.
Right now, the initializer in a type declaration must be a
type name, which is the type equivalent of a constant. Initializers for
variables can be expressions. Can type initializers be expressions? The
only operator that seems to make sense for types is the conditional
operator. The definition of the smallest integer type containing at least
16 bits would be
type int_least16_t = (SHORT_MAX >= 32767) ? short : int;
This would be an alternative to a preprocessor #if statement.
C9x provides compound literals. (GNU C provides similar but slightly weaker `constructor expressions')
struct foo {
int f1;
char f2;
}
... (struct foo){42, 'x'} ...
... (struct foo){.f2='x', .f1=42} ...
Compound literals are modifiable objects, with addresses. By contrast, cast expressions produce values. This matches my idea of the difference between conversions and constructors: conversions create values from values, constructors build objects from values.
Strangely, the compound literal syntax works for scalar types, too (at
least, as I read the draft standard, and GCC 2.7.2.1 allows it).
`(long)1' produces the value 1 with type long.
`(long){1}' creates an anonymous long object,
initialized to 1.
This could be shoe-horned into Cforall as multi-argument explicit
`()?' operators with an lvalue-qualified result
type. Some sort of magic would allocate the memory used for the object.
lvalue Rational ()?(Rational* magic, int numerator, unsigned denominator) {
magic->numerator = numerator;
magic->denominator = denominator;
return *magic;
}
... (Rational){1,2} ... /* Rewritten as ()?(magic(), 1, 2) */
The two long operators mentioned above might be declared
long ()?(int); /* conversion */ lvalue long ()?(long*, int); /* constructor */
In effect, we get C++ constructors with a C-like calling syntax. That means we would have to work out the semantics of constructors.
Doing compound literals to the nines requires adding keyword parameters to
the language, to support `.f1=42' above, which probably isn't
worth the trouble. I also don't see how to work array compound literals
into the scheme.
... (float [3]){1.1, 2.2, 3.3} ... /* Create 3-element array. */
... (float []){4.4, 5.5} ... /* Create 2-element array. */
Consider allowing declarations of postfix `operators' that would provide `pseudo-members'.
type Complex = struct { float real, imaginary;};
float ?.theta(Complex) { ... }
float ?.rho(Complex) { ... };
Complex c, *cp;
float f;
f = c.theta; /* equivalent to f = ?.theta(c) */
f = cp->rho; /* equivalent to f = (*cp).rho and hence to f = ?.rho((*cp)) */
`?.theta' and `?.rho' would provide a way to
view complex numbers as structures having polar-coordinate members, even if
they were implemented using ``real'' and ``imaginary'' members. The
notational advantage of this scheme over two ordinary functions called
rho and theta is small.
I can also imagine allowing
`float ?.theta=?(Complex*,float)',
which would allow assignment to the theta pseudo-member. (The
function would have to alter the real and imaginary members of the
Complex structure.) There is some overlap between
member-assignment functions and lvalue functions.
Within the function
forall(type T | ops(T)) void f(T t)
the object t does not have a known type, but it does have
a known set of operations. It is an object in the `object-oriented' sense
as well as the C sense of the word. In Cforall such objects only exist
within polymorphic functions. In object-oriented languages they are not so
constrained.
Cforall could provide objects two ways.
forsome(type T | ops(T)) T t;
struct {
type T | ops(T);
T val;
} t;
Either approach is more powerful than pure object orientation. Here are declarations of a pair of items where the item type is not known, but it is known that they have exactly the same type.
forsome(type T) struct {T left, right};
struct { type T; T left, right; };
All inferred parameters in a forall clause are types.
We could allow unsigned, and try inferring array sizes.
forall(unsigned n) int sum(int a[n]); int costs[20]; ... int total = sum(costs); /* n inferred to be 20. */Traditional C array parameter declarations would not be useful here, because they do not provide size information.
void f(int *a) {... sum(a) ...} /* Error! */
void g(int a[]) {... sum(a) ...} /* Error! */
A declaration with a size is still not really good enough.
void f(int a[12]) {... sum(a) ...} /* Dicey! */
In C, the argument passed
to a does not have to be a 12-element array; it can be any
size of array, or even a null pointer! Cforall with this extension would
have to tighten this up. We could require that the compiler be able to
statically determine that the argument is an array with at least 12
elements. I don't know how much code would break.
Nick Maclaren once argued
(ftp://oozelum.csi.cam.ac.uk/dist/C9X.restrict, link broken as of
2007.11.7) that, for restrict-qualified array parameters,
array size declarations should be respected. His reason is that, on
modern computers where the CPU is vastly faster than the main memory,
Fortran compilers have taken to prefetching array parameters. C compilers
cannot safely perform the same optimization unless they can assume that
the argument is a big-enough array. He wanted the C9X standard to say
that passing a null pointer or an array with insufficient elements causes
undefined behaviour. This would break less code, and preserve more bugs.
Cforall would also change the meaning of the cast
`(int [20])a', which would no longer mean quite the same
thing as `(int*)a'.
/* list.h */
typedef struct list_t {
unsigned long list_count;
struct list_node_t *;
struct list_node_t *;
} list_t;
extern unsigned long list_count(struct list_t *l);
#define list_count(L) ((L)->list_count)
typedef struct list_node_t {
void *list_node_data;
struct list_node_t *;
struct list_node_t *;
} list_node_t;
#define list_node_set_data(L, D) ((L)->list_node_data = (D))
#define list_node_get_data(L) ((L)->list_node_data)
/* list.c */
#include "list.h"
struct list_t {
unsigned long;
struct list_node_t *head;
struct list_node_t *tail;
};
struct list_node_t {
void *;
struct list_node_t *next;
struct list_node_t *prev;
};
Structure prototypes are complete types, so you can instantiate them; you
just can't touch certain members without cheating. This has some
advantages over C++ private fields: it doesn't pollute any
name spaces, and you don't have to define who has private access in a
language that does not have member functions.
On the down side, it interacts with Cforall's anonymous members and inheritance.