Algorithms for Overload Resolution

A Cforall White Paper

Richard C. Bilson

Revised: 16 September 2001

Introduction

The problem of overload resolution has been widely studied with respect to popular imperative programming languages such as Ada, C++, and Java. These languages allow overloading in different situations, which imposes different requirements on the resolution algorithm. Ada allows overloading based on argument and/or return types, but has no implicit conversions. C++ and Java only allow overloading based on argument types, but have a rich set of implicit conversions that can potentially be applied to actual parameters. Ada allows overloading of functions, operators, and enumerated constants; C++ allows overloading of functions and operators, and Java only allows overloading of functions.

Java and C++ algorithms for resolving overloads are insufficient for Ada, since the potential for overloading based on return type adds significant complexity. This lead to algorithms by Baker[1] and Cormack[2], which incorporate both arguments and return types. While Cforall is similar to Ada, in that it allows return-type overloading, these algorithms are insufficient since they do not take implicit conversions into account. In C++ or Java there can be multiple matches for a single use of a name, without the expression being ambiguous -- the selected interpretation is in some way the "nearest" one, or, to use the Java terminology, a "maximally specific" one. There needs to be some well-defined notion of distance, which is the "cost" in Cforall terminology. The cost is not necessarily an integer value, but it must be possible to add costs and compare them. This discussion looks at ways to extend the Baker and Cormack algorithms to incorporate this notion of cost.

The discussion of these algorithms assumes that operators in expressions have been converted to an equivalent functional form. In addition, these algorithms should be consistent in the case of overloaded variables and constants, if these are viewed as functions of no arguments returning a value whose type is the type of the constant or variable. Pseudocode for the Baker and Cormack algorithms is adapted from Fischer and LeBlanc[3].

Bottom-up (Baker) Methods

Baker's algorithm is designed to be applied from the bottom up. It takes as its parameters the name of the overloaded function, and a list of possible interpretations for each actual parameter. The algorithm returns a list of possible interpretations for this use of the function. Since at this point the context of the expression is not known (that is, the constraints that exist on the return type), any interpretation of the name that is consistent with the possible interpretations of the actual parameters is returned.

Algorithm 1 (Baker)

Baker( this_name : name, args : list< set< tree > > ) : tree_list
  result_trees : set< tree > = {}
  for each interpretation (return_type, formals) corresponding to this_name
    if formals.length == args.length then
      new_tree : tree
      new_tree.type = return_type
      for i = 1 to formals.length
        if there exists j in args[i] such that formals[i].type == j.type then
          new_tree.child[i] = j
        else
          skip to next interpretation
          (i.e. this interpretation is not consistent with the possible argument types)
	end if
      end for
      if this interpretation is valid then
        if there exists i in result_trees such that i.type == new_tree.type then
          i.ambiguous = true
        else
          add new_tree to result_trees
	end if
      end if
    end if
  end for
  for each tree in result_trees
    if tree.ambiguous then
      remove tree from result_trees
    end if
  end for
  return result_trees

Baker's algorithm assumes there is an exact match between the types of formal and actual parameters. This assumption does not hold in the Cforall case -- here, there may be inexact and exact matches. An interpretation with an exact match on one parameter may have very distant matches on other parameters. For this reason, Algorithm 2 keeps track of the total cost of matching a given interpretation with the set of actuals. When two interpretations are valid for a set of actuals and have the same result type, it is only an ambiguity if the costs of those interpretations are the same. Otherwise, the cheapest interpretation for that result type can be chosen, and the rest discarded.

Algorithm 2

BakerDitchfield( this_name : name, args : list< set< tree > > ) : tree_list
  result_trees : set< tree > = {}
  for each interpretation (return_type, formals) corresponding to this_name
    if formals.length == args.length then
      new_tree : tree
      new_tree.type = return_type
      new_tree.cost = 0
      for i = 1 to formals.length
        if there exists j in args[i] such that j.type can be converted to formals[i].type then
          find k in args[i] such that conversion_cost( k.type, formals[i].type ) + k.cost is minimized
          new_tree.child[i] = k
          new_tree.cost += k.cost
          new_tree.cost += conversion_cost( k.type, formals[i].type )
        else
          skip to next interpretation
          (i.e. this interpretation is not consistent with the possible argument types)
	end if
      end for
      if this interpretation is valid then
        if there exists i in result_trees such that i.type == new_tree.type then
          if i.cost == new_tree.cost then
            i.ambiguous = true
          else if i.cost < new_tree.cost then
            remove i from result_trees
            add new_tree to result_trees
          (else throw away new_tree)
	  end if
        else
          add new_tree to result_trees
	end if
      end if
    end if
  end for
  for each tree in result_trees
    if tree.ambiguous then
      remove tree from result_trees
    end if
  end if
  return result_trees

Top-down (Cormack) Methods

Cormack's algorithm is simpler, but it presupposes the existence of an untyped expression tree. It traverses the tree from the top down, so that it knows the context of each sub-expression. This algorithm only counts the number of valid interpretations of the expression, but it could easily be augmented to "decorate" the expression tree with types, if it is able to determine them unambiguously.

Algorithm 3 (Cormack)

Cormack( target_type : type, this_node : tree ) : integer
  solutions : integer = 0
  for each interpretation (return_type, formals) corresponding to the name of this_node
    if formals.length == this_node.num_children and return_type == target_type then
      combos : integer = 1
      for i = 1 to formals.length
        combos *= Cormack( formals[i].type, this_node.child[i] )
      end for
      solutions += combos
    end if
  end for
  return solutions

Algorithm 4

A variant of Cormack's algorithm for Cforall must consider not only the number of valid interpretations, but also the cost of those interpretations. Since the algorithm works from the top down, it can eliminate expensive interpretations immediately and so only return multiple valid interpretations if they have the same (minimal) cost.

CormackDitchfield( target_type : type, this_node : tree ) : cost, integer
  min_cost : cost = Infinity
  solutions : integer = 0
  for each interpretation (return_type, formals) corresponding to the name of this_node
    if formals.length == this_node.num_children then
      this_cost : cost = 0
      combos : integer = 1
      for i = 1 to formals.length
        (new_cost, new_count) = CormackDitchfield( formals[i].type, this_node.child[i] )
        if new_count == 0 then
          skip to next interpretation
        else
          this_cost += new_cost
          combos *= new_count
        end if
      end for
      this_cost += conversion_cost( return_type, target_type )
      if this_cost < min_cost
        min_cost = this_cost
        solutions = combos
      else if this_cost == min_cost
        solution += combos
      (else throw away this interpretation)
      end if
    end if
  end for
  return min_cost, solutions

References

[1] Baker, T. "A one-pass algorithm for overload resolution in Ada." ACM Transactions on Programming Languages and Systems 4(4): 601-14, 1982.

[2] Cormack, G. V. "An algorithm for the selection of overloaded functions in Ada." SIGPLAN Notices 16(2): 48-52, 1981.

[3] Fischer, Charles N. and Richard J. LeBlanc Jr. Crafting a Compiler with C. Benjamin/Cummings, 1991.


There are more Cforall white papers by Richard C. Bilson