A Cforall White Paper
Richard C. Bilson
Revised: 16 September 2001
The problem of overload resolution has been widely studied with respect to popular imperative programming languages such as Ada, C++, and Java. These languages allow overloading in different situations, which imposes different requirements on the resolution algorithm. Ada allows overloading based on argument and/or return types, but has no implicit conversions. C++ and Java only allow overloading based on argument types, but have a rich set of implicit conversions that can potentially be applied to actual parameters. Ada allows overloading of functions, operators, and enumerated constants; C++ allows overloading of functions and operators, and Java only allows overloading of functions.
Java and C++ algorithms for resolving overloads are insufficient for Ada, since the potential for overloading based on return type adds significant complexity. This lead to algorithms by Baker[1] and Cormack[2], which incorporate both arguments and return types. While Cforall is similar to Ada, in that it allows return-type overloading, these algorithms are insufficient since they do not take implicit conversions into account. In C++ or Java there can be multiple matches for a single use of a name, without the expression being ambiguous -- the selected interpretation is in some way the "nearest" one, or, to use the Java terminology, a "maximally specific" one. There needs to be some well-defined notion of distance, which is the "cost" in Cforall terminology. The cost is not necessarily an integer value, but it must be possible to add costs and compare them. This discussion looks at ways to extend the Baker and Cormack algorithms to incorporate this notion of cost.
The discussion of these algorithms assumes that operators in expressions have been converted to an equivalent functional form. In addition, these algorithms should be consistent in the case of overloaded variables and constants, if these are viewed as functions of no arguments returning a value whose type is the type of the constant or variable. Pseudocode for the Baker and Cormack algorithms is adapted from Fischer and LeBlanc[3].
Baker's algorithm is designed to be applied from the bottom up. It takes as its parameters the name of the overloaded function, and a list of possible interpretations for each actual parameter. The algorithm returns a list of possible interpretations for this use of the function. Since at this point the context of the expression is not known (that is, the constraints that exist on the return type), any interpretation of the name that is consistent with the possible interpretations of the actual parameters is returned.
Baker( this_name : name, args : list< set< tree > > ) : tree_list
result_trees : set< tree > = {}
for each interpretation (return_type, formals) corresponding to this_name
if formals.length == args.length then
new_tree : tree
new_tree.type = return_type
for i = 1 to formals.length
if there exists j in args[i] such that formals[i].type == j.type then
new_tree.child[i] = j
else
skip to next interpretation
(i.e. this interpretation is not consistent with the possible argument types)
end if
end for
if this interpretation is valid then
if there exists i in result_trees such that i.type == new_tree.type then
i.ambiguous = true
else
add new_tree to result_trees
end if
end if
end if
end for
for each tree in result_trees
if tree.ambiguous then
remove tree from result_trees
end if
end for
return result_trees
Baker's algorithm assumes there is an exact match between the types of formal and actual parameters. This assumption does not hold in the Cforall case -- here, there may be inexact and exact matches. An interpretation with an exact match on one parameter may have very distant matches on other parameters. For this reason, Algorithm 2 keeps track of the total cost of matching a given interpretation with the set of actuals. When two interpretations are valid for a set of actuals and have the same result type, it is only an ambiguity if the costs of those interpretations are the same. Otherwise, the cheapest interpretation for that result type can be chosen, and the rest discarded.
BakerDitchfield( this_name : name, args : list< set< tree > > ) : tree_list
result_trees : set< tree > = {}
for each interpretation (return_type, formals) corresponding to this_name
if formals.length == args.length then
new_tree : tree
new_tree.type = return_type
new_tree.cost = 0
for i = 1 to formals.length
if there exists j in args[i] such that j.type can be converted to formals[i].type then
find k in args[i] such that conversion_cost( k.type, formals[i].type ) + k.cost is minimized
new_tree.child[i] = k
new_tree.cost += k.cost
new_tree.cost += conversion_cost( k.type, formals[i].type )
else
skip to next interpretation
(i.e. this interpretation is not consistent with the possible argument types)
end if
end for
if this interpretation is valid then
if there exists i in result_trees such that i.type == new_tree.type then
if i.cost == new_tree.cost then
i.ambiguous = true
else if i.cost < new_tree.cost then
remove i from result_trees
add new_tree to result_trees
(else throw away new_tree)
end if
else
add new_tree to result_trees
end if
end if
end if
end for
for each tree in result_trees
if tree.ambiguous then
remove tree from result_trees
end if
end if
return result_trees
Cormack's algorithm is simpler, but it presupposes the existence of an untyped expression tree. It traverses the tree from the top down, so that it knows the context of each sub-expression. This algorithm only counts the number of valid interpretations of the expression, but it could easily be augmented to "decorate" the expression tree with types, if it is able to determine them unambiguously.
Cormack( target_type : type, this_node : tree ) : integer
solutions : integer = 0
for each interpretation (return_type, formals) corresponding to the name of this_node
if formals.length == this_node.num_children and return_type == target_type then
combos : integer = 1
for i = 1 to formals.length
combos *= Cormack( formals[i].type, this_node.child[i] )
end for
solutions += combos
end if
end for
return solutions
A variant of Cormack's algorithm for Cforall must consider not only the number of valid interpretations, but also the cost of those interpretations. Since the algorithm works from the top down, it can eliminate expensive interpretations immediately and so only return multiple valid interpretations if they have the same (minimal) cost.
CormackDitchfield( target_type : type, this_node : tree ) : cost, integer
min_cost : cost = Infinity
solutions : integer = 0
for each interpretation (return_type, formals) corresponding to the name of this_node
if formals.length == this_node.num_children then
this_cost : cost = 0
combos : integer = 1
for i = 1 to formals.length
(new_cost, new_count) = CormackDitchfield( formals[i].type, this_node.child[i] )
if new_count == 0 then
skip to next interpretation
else
this_cost += new_cost
combos *= new_count
end if
end for
this_cost += conversion_cost( return_type, target_type )
if this_cost < min_cost
min_cost = this_cost
solutions = combos
else if this_cost == min_cost
solution += combos
(else throw away this interpretation)
end if
end if
end for
return min_cost, solutions
[1] Baker, T. "A one-pass algorithm for overload resolution in Ada." ACM Transactions on Programming Languages and Systems 4(4): 601-14, 1982.
[2] Cormack, G. V. "An algorithm for the selection of overloaded functions in Ada." SIGPLAN Notices 16(2): 48-52, 1981.
[3] Fischer, Charles N. and Richard J. LeBlanc Jr. Crafting a Compiler with C. Benjamin/Cummings, 1991.
There are more Cforall white papers by Richard C. Bilson