* and .
int i, *j, m[10];creates three variables each working with integer values but each having a different type. This syntax causes declaration errors like:
int * i, j;which means
int *i; int j;not
int *i; int *j;Thus the distribution of the declaration type over the variable list is complex. Nevertheless, this declaration syntax is extremely concise because, in practise, it is very common to have a group of related variables based on a basic type; these variables can all be declared in a single declaration using this syntax. As well, declaration errors resulting from incorrect distribution of type information occur infrequent and are detected by the type system. Finally, this declaration syntax is largely entrenched in the C style of programming. The second declaration problem is much more serious and results from the syntactic nesting used for specifying complex type information, e.g.:
char (*argv[])[]; // often confusingly written as: char **argv void (*(*p)[10])(int);The type is formed by nesting the most specific type information around the variable being declared and less specific type information is nested in layers around the more specific type information, using parenthesis to precisely control the nesting. Unfortunately, this syntax confuses even veteran C programmers and results in many errors. Furthermore, it precludes other desirable extensions like named or multiple return values from routines. There is no subtle fix for this problem because the basic idea is inappropriate for specifying declarations, which means that any fix cannot be upwards compatible with C. Other C-like programming languages do not have these declarations problems so clearly the problems can be fixed. The question is how to engineer a fix that has a reasonable chance of acceptance by the C community. First, the C declaration order type-information variable-list is retained for both technical and subjective reasons. The technical reason is that the Pascal/Ada style of variable-list type-information precludes variable initialization for simple one pass compilers, e.g., compare:
int i = 3, j = 4, k = 5;to:
i : integer = 3; j : integer = 4; k : integer = 5;The brevity of the former declaration follows the C philosophy. Both of the following alternative syntactic forms have problems:
i = 3, j = 4, k = 5 : integer; i, j, k : integer = 3, 4, 5;In the first, the compiler does not know the variable type until late in the declaration, which makes checking the validity of the initialized values difficult. In the second, the separation of the initialization values from the variables increases the potential for incorrect initialization. Finally, Pascal/Ada syntactic form deviates completely from current C syntax, making it problematic for C programmers. To handle the nested type problems the following new type, variable and routine syntax are introduced. The new style of declarations may appear together with existing C declarations in the same program block, but cannot be mixed within a specific declaration.
* is used to indicate a pointer, square brackets [] are used to represent an array, and parentheses () are used to indicate a routine declaration, which is identical to C declarations.
However, unlike C, the type declaration tokens are specified from left to right.
For instance, a variable x of type pointer to integer is defined as follows:
NEW OLD (C) * int x; int *x;Other examples are:
[20] int y; // array of 20 integers * [20] double z; // pointer to array of 20 doubles [20] * char w; // array of 20 pointers to char
Routine prototype declarations are specified with the return type enclosed in brackets, e.g.:
[ int ] ( int, * int, [] * char ) f; [ [] float f ] ( int i, double d ) g;The parameter types appear in the parenthesis (optional parameter names are allowed), and the return type appears after the parenthesis in square brackets (an optional return variable name is allowed). The parenthesis and brackets must always be specified simply to encourage readability, even when there are no parameters or value returned, as in:
[] () f; [] (int) f; [int] () f;Two differences from C are that an empty parameter list implies no parameters not an unknown parameter list (as in C++), and no return type implies no return value not an integer return value. The C keywords
extern and static, which declare externally defined and locally defined routines respectively, are supported:
extern [int] () f; static [int] (int) g;These keywords are also supported for variable declarations:
extern int v1; static * int v2;The new syntax can be used for declaring fields in a structure or union, e.g.:
NEW OLD (C)
struct s1 { struct s2 {
* int f1; int *f1;
* [* int] (int) f2; int *(*f2)(int);
}; };
union u1 { union u2 {
[5] int f1; int f1[5];
* int f2; int *f2;
}; };
static [ int x; ] foo( int y; char z1, z2; ) {
// routine body
}
First, the optional keyword static is equivalent to that in standard C.
Second, return values are named and the name is a local variable of the particular return type.
The value of the local return variable is automatically returned at routine termination.
Third, lists of declarations are separated by a semicolon instead of a comma, which allows parameter declarations to appear more like declarations and one type can declare several parameters.
An external declaration of the routine foo is written as follows:
extern [ int ] ( int, char, char ) foo;Note that the name of the return value is unnecessary.
return statement does not contain an expression, for example:
[int x;] bar() {
...
x = 0;
return;
}
When the return is encountered, the current value of x is returned to the caller.
`Falling off the bottom' of a routine is permitted, as in:
[int x;] foobar() {
x = 0;
}
In this case, the current value of x, i.e. 0, is returned to the calling routine just as if a return had been encountered.
Proof: in a distributed system, the target of an inter-system pointer can disappear forever
do C programmers really want GC forced upon them?
because destructors have side-effects, time of execution is important
separate task stacks (and possibly heaps) form independent allocation areas, which reduces unnecessary contention
Much of data structure and database research involves complex storage management, including allocation, storage layout, and managing free areas. Garbage Collection precludes much of this work because it eliminates control over storage management.
malloc/free exacerbates the problem but fundamentally storage management is difficult and problematic so attacking a single aspect does not solve the general problem.
Given the fundamental complexity of the problem, the best approach is to provide runtime tools to help understand the complex behaviour of the storage management, and in addition, introduce new programming language constructs that reduce heap usage.
Many storage management problems result from having to rewrite basic storage management algorithms for each data type and/or program. Rewriting inevitably introduces many errors. Fortunately, this problem is solvable with generic, generalized storage-management libraries, which have been extensively debugged (assuming the languages type system is significantly flexible to define them).
For example, manipulating variable-length strings generates a large number of intermediate string values.
Users should not be responsible for managing the storage for all of these values because it is tedious, error-prone and is not fundamental to the string abstraction.
Therefore, the string type must be able to manage its own intermediate string values.
It is possible to provide this level of storage management support if a data type has explicit control on the assignment and copying of all of its values (e.g. get and set routines), as in CLU.
Currently, C++ provides most of this capability but awkwardly.
Because there is no explicit control of all assignment and copying actions, it is necessary to create hidden intermediate types during certain operations to achieve the equivalent of get and set.
goto x; // disallowed with new rule because x is no longer visible
{
goto x;
x: 3;
{
goto x;
}
goto x;
}
goto x; // disallowed with new rule because x is no longer visible
Advantage: This restriction prevents branching into blocks, which ensures that all variables in a block are well formed during the block's execution, except for intra-block problems, such as:
goto x; // branch around initialization foo f( 3 ); x: f.mem(); // try to call member routine of uninitialized object
Disadvantage: Not backwards compatible, but only affects a form of branching that should seldom appear because of `structured programming'.
break and continue with a target label to support static multi-level exit [Buhr85], e.g.:
NEW OLD
x:for ( ... ) { for ( ... ) {
y:for ( ... ) { for ( ... ) {
z:for ( ... ) { for ( ... ) {
... break x; ... ... goto x; ...
... break y; ... ... goto y; ...
... break z; ... ... break; ...
} }
} } y:;
} } z:;
The inner most loop has three exit points, which cause termination of the three nested loops, respectively.
NEW OLD
x:for ( ... ) { x:for ( ... ) {
y:for ( ... ) { y:for ( ... ) {
z:for ( ... ) { for ( ... ) {
... continue x; ... ... goto x; ...
... continue y; ... ... goto y; ...
... continue z; ... ... continue; ...
} }
} }
} }
The inner most loop has three restart points, which cause the next loop iteration to begin, respectively.
A break or continue without a target label has the same meaning as the current control structure, i.e., apply to the current loop construct.
The target label must be directly associated with a for, while, do or switch statement.
When blocks are exited by a break or continue, all variables within the blocks are terminated.
break and continue are simply a goto restricted in the following ways:
Advantage:
This extension allows static multi-level exits without having to use the goto statement and ties the control flow to the target control structure rather than an arbitrary point in a program.
Furthermore, the location of the label at the beginning of the target control structure informs the reader of the program that a multi-level exit is occurring in the body of the control structure.
Having the label at the end of the control structure, which might be off the page or screen, is not as convenient.
[10,10,10] int a; // NEW int [10][10][10] a; // OLDFormal array definition:
void f( [,,] int a); // NEW void f( [10,,10] int a ); // NEW void f( int [10][][10] a ); // OLDSubscript:
a[1,2,3] = 3; // NEW a[1][2][3] = 3; // OLDNew arrays would have array descriptors, which are passed around to routines. In many cases, it should be possible to pass a new array to a routine expecting an old array, but not vice versa.
Disadvantage:
The major incompatibility is that int * no longer matches with int [], for example:
int main( int argc, char **argv ) ...must be written as:
int main( int argc, [] * [] char argv ) ...
A minor incompatibility is that a comma expression cannot appear directly in a subscript, so that a[1,2,3], which currently means a[3], must be written as a[(1,2,3)].
It is questionable if this a useful feature and it is doubtful many programs use it.
int size;
cin >> size;
int array[size + 1]; // already supported in g++
class foo {
const int size; // const not required
int array[size + 1];
public:
foo( int size ) : size(size) {} // initialization required
};
Multiple DSAs are allowed in a block or class.
In the latter example, size must be initialized when the class is instantiated.
The qualifier const implies this, but it does not seem essential to force the use of this qualifier as only the initialization is required;
without the const qualifier, size could be used as a variable in the class members.
Advantage: DSAs make use of implicit storage management on the stack for dynamically sized objects rather than explicit storage management on the heap, which reduces the potential for storage management errors. This extension is backwards compatible.
Second, modules provide a mechanism to collect interface information into units smaller than a UNIX file. That is to say, multiple module interfaces can appear in a single UNIX file.
Third, modules provide the ability to compile an interface so that is it unnecessary to reparse the interface as for textual substitution. This makes compiling programs that contain even a large number of interfaces efficient. This is similar to the same way that separate compilation improves the efficiency of the development of a large program.
* and .-> operator in C is that the priority of operators * and . are incorrect.
The priority of operator . is lower than the priority of operator *, which means *p.f executes as *(p.f) instead of (*p).f.
This behaviour is counter-intuitive to all but hard-core C programmers and is different from all modern imperative programming languages.
Instead of switching the priority of operators * and ., the operator -> was introduced to mean the two operators but performed with the correct priority, that is, p->f implies (*p).f.
Advantage:
This change corrects a long standing design error in C and would pave the way for the eventual removal of the unnecessary operator -> from the language.
Disadvantage:
This change is not backwards compatible.
However, it remains to be seen how many programs use the form *p.f to mean *(p.f).
Does this help with `smart' pointer?
int i; int &r = i; => int *const r = &i; r = 2; => *r = 2;Not particularly useful in this context except to give multiple names to the same storage. The problem is that references in C++ are severely restricted. Only one level of automatic de-reference is done, and a reference variable must be initialized on declaration. Useful for call by reference so that it is unnecessary to always have to prefix the parameter with
* and can use dot notion instead of -> notion.
string substr(string &s); string substr(string *s); s = ... s ... vs *s = ... *s ...
get and put, like CLU)
void.
Changes this to the C++ ...