CS211 Final Project JavaDUCK

Due: 10:10 a.m. Tuesday 28 April, 1998 in lecture.

Important Notes:

Disclaimer ó It pays to increase your word power

We have tried to make this assignment as interesting as possible for you (and we hope that by the end of term you will agree that we succeeded). However, due to the nature of the assignment, a few new concepts from different areas of computer science will have to be introduced. That means we need a few "big words" that might scare you at first. There is no need to be alarmed! We are going to introduce those words only to establish a common vocabulary so that we can talk freely about the assignment. Gradually, everything relevant to the assignment will be explained thoroughly. We stress the word relevant because all those new concepts will be only touched upon, and you should not worry about having to learn lots of new material. Mostly, it will be things that you either already know or have good intuition about. Only now weíll have some formal definitions for them.

This project has been tested before being used in the course. No animals (except your TAs) were hurt during the experiments ... Max and Dan are recovering nicely.


Java is a relatively new language, but it is already being used widely in the development of large software systems. Java offers lots of useful features to developers, such as added security, getting rid of raw pointers, applet support, and others. In particular, Java's strong support for encapsulation and object-oriented programming is key to the development of large software systems. It allows developers to design and implement their projects at a very high level of abstraction, without having to worry constantly about lots of little implementation details.

Of course, we expect that no large program will be perfect (except perhaps those written by the teaching staff of CS211). Often, bugs will be found and customers will ask for new features after the initial development has finished. The task of fixing bugs and adding new features after the system has been delivered is called software maintenance. However, in industry, the programmers who write a system are usually not the ones who end up maintaining it!

Let's suppose a team has been assigned to maintain a large software product written is Java. To perform their job effectively, the team members must become familiar with the code quickly. However, wading through the actual Java source files is cumbersome and inefficient because the implementation details are usually less important than the higher level details, such as which methods a class provides, which interfaces it implements, etc.

What the system maintainers really need is a high-level view of the classes that can be extracted automatically from the source code. In fact, there already is such a tool that comes with Sun's standard JDK called javadoc. When you pass a Java source file to javadoc, it reads through the code and automatically creates a web page showing the various features of the class, omitting most of the details. The project web page contains a link to such a generated web page.

Your job will be to create a much simpler version of this kind of tool called JavaDUCK (Java DocUmenter of Code, oK?). Your program will read in a Java source class (i.e., a .java file), build up various data structures to represent the class and its subcomponents (i.e., its methods and variables), and then create an HTML file that formats this information in a visually pleasing manner. We will walk you through the major steps of the assignment, and we will also provide some of the parts for you. However, it is vital that you attend recitation or you will be lost.

To make life simpler for you, we will restrict the kinds of Java programs you have to consider (we'll explain as we go what this means). Also, you may assume that any Java source file passed to your program is legal. However, because your program won't be handling all of Java (e.g., you can ignore arrays as they are difficult to parse), the source code for your solution will likely violate some of these assumptions and won't be a good candidate for running through the system as test input. This is going to take you some time to understand, so it is important that you start early.

What you really have to do

We are going to make some simplifications about Java so that you don't get bogged down in too many esoteric details. For example, you may assume that the code you have to process will be completely legal, and will not have any interface definitions or arrays or initializing expressions in declarations or Ö. well, if we continued to describe in English what you can and cannot expect, then this document would get much longer and there would surely be confusing and ambiguous descriptions. For this reason, we will present a formal grammar for the files you must document. This has the advantage of letting us present a concise, unambiguous definition of your input files. It has the disadvantage that we have to explain to you how read the darned grammar!

What you need to do is read in files that conform to the grammar and produce an appropriate HTML file. Reading in a source file and figuring out its structure (what are the variable names, is this method abstract, etc.) is called parsing and generally requires some specialized techniques. Weíre going to do some of it for you and give you some ideas for the rest.

A grammar? Is that a mammal of some sort?

As promised, we are going to describe exactly what sort of input files your program can expect. Do not worry about any Java constructs not included in the grammar presented here. Do worry about all possible ways of conforming to the grammar. Keep in mind that the grammar is just formalizing what you already know and is making some simplifications.

The way we are going to write a grammar for Java is by specifying which keywords belong where. Anything appearing in [square brackets] is optional. Anything in ALL CAPS will mean an identifier, which is basically a class name or a variable name or a method name. Anything with an asterisk (*) next to it may appear any number of times (e.g., 0 times, 1 time, or 17 times).

We are ready to give the grammar:

[import LIBRARY;]*

[package PACKAGE_NAME;]

[public] [abstract] [final] class CLASS_NAME









Letís go through this declaration step-by-step. What this says is that a file will have some number of import statements, then either zero or one package statements, and then one class definition. Each import statement (if there are any) will consist of the keyword "import" followed by an identifier followed by a semicolon. The package statement (if there is one) will consist of the keyword "package" followed by an identifier followed by a semicolon.

The class declaration part can be described like this. It can have any combination of the keywords "public", "abstract", and "final", but those that are present will be in this order. Then it has the keyword "class". Then it has an identifier (the name of the class). Then it may or may not have the keyword "extends" followed by an identifier. Then it may or may not have the keyword "implements" followed by a comma-separated list of identifiers (names of interfaces). Study the implements line carefully and see how it uses the definitions of [] and * we gave above.

Then we are guaranteed to have a "{" and then the line "//Variables". Then we will have any number of variable declarations. Then we will have the line "//Methods". Then we will have any number of method declarations. Finally we will have a "}" and nothing else. Notice that we that our Java source code will have no comments except for the lines "//Variables" and "//Methods", and that all variable members come before all method members. This should make your life much easier.

We would be done except that we havenít yet given the rules for Variable_Declaration and Method_Declaration. So letís get it over with.


[public | protected | private] [static] [final] TYPE


The "|" used above is new. It basically means "one or the other but not both". So a variable declaration starts with "public", "private", "protected" or none of the above (because of the brackets). Remember that if none of them appears, the visibility level is "package". Then it either has static or doesnít. Then it either has final or doesnít. Then it has an identifier which is the type, then a comma-separated list of identifiers which are the variables. Finally it has a semicolon. Notice that for simplicity we will not allow initialization of variables when they are declared. So the declarations:

int i;

int j, k, l;

are both okay for our purposes, but the declaration:

int m=0;

is not (even though it is legal Java).


[public | protected | private] [static] [abstract] [final] [RETURN_TYPE] METHOD_NAME ([TYPE PARAM [, TYPE PARAM]*])


<Stuff we do not care about, except for curlies>


The first part (everything up to [RETURN_TYPE]) is very similar to variable declarations; we leave it to you to understand it. The RETURN_TYPE identifier is optional because constructors do not have return types. Since we are assuming the input code is correct, you can decide whether a method is a constructor by seeing if it has a RETURN_TYPE and a METHOD_NAME or just a METHOD_NAME. After METHOD_NAME, we will see a "(" and then zero or more parameters separated by commas, and then a ")". Then we will see the method body which starts with "{" and ends with a "}". We donít care much about what is in the method body. Weíll discuss this more in the next section.

Now Itís Time to Parse

We understood the grammar so we would know what our program must do. But we still donít know how to actually write such a program. Processing the code takes place over two steps.

The first step, called lexical analysis or scanning, is the process of breaking the input into tokens. Tokens are sequences of characters in the code that "mean something" in the programming language. For example, " int a,b; " would be broken up into tokens "int", "a", ",", "b", and ";". Each token would carry the underlying text (such as "int" or "a") and a constant that would specify its type (token "int" would have type INT, token "a" would have type IDENTIFIER).

Now for the good news: We have written the lexical analyzer for you! It can be downloaded from the project web page. It contains an example program and a description on how to use it (itís very simple to use, we promise). We will give you some tips on how to use the lexer in recitation. To understand the rest of this section, you just need to know that you can create an object of type Yylex (call this object lexAnalyzer) and this will have a method yylex() which returns the next token in a file. The types of tokens are defined in TokenType.java, which we will also provide for you.

The second step is called parsing. Basically, you will keep calling lexAnalyzer.yylex() and "do the right thing" with the token you get back. How do you know what the right thing is? Use the grammar! Everywhere you call lexAnalyzer.yylex() you should know what possible things you might get back. All you have to do is figure out what you actually get back, store the relevant information, and figure out what possible things you might get back next, and so on.

Hereís an example. Letís suppose weíve just gotten the keyword "class" back. That is, we just got a token whose type was CLASS. Next we know we will get a token of type IDENTIFIER whose text is the name of the class. Weíll have to store that name somewhere because the class name is an important thing for our HTML file. The possibilities for the type of the next token are EXTENDS, IMPLEMENTS, and LCURLY. (Why? Look at the grammar!). Weíll decide what to do based on which of the three it is. If itís EXTENDS, then the next token we read will be the parent class name, which weíll need to store somewhere. After that we can expect either IMPLEMENTS or LCURLY. And so on and so forth. As a general word of advice, try to take advantage of the structure of programs as much as possible. That means, for example, while you are processing the variables in a class, you should be inside a loop whose exit condition is that the next token is the special token "//Methods". Your program will have a large set of nested loops and if statements, as well as calls to interesting auxiliary methods. Your TAs will give you some advice on how to achieve this in recitation.

The * cases in the grammar are slightly more interesting. For example, you donít know how many interfaces (if any) a class may implement. You have to keep getting more (hint: sounds like a loop) as long as after an interface name you get a token of type COMMA and not of type LCURLY (again, look back at the grammar). And you have to keep all the names somewhere (hint: maybe a Vector).

Finally, remember that we donít care about method bodies, but we do need to figure out where they end (so we know where the next method, if there is one, starts). Hint: A method ends when the number of right curly brackets seen equals the number of left curly brackets seen. So just read all the tokens in the method body while keeping track of curly brackets.

Staying Organized

You need to write the Java code that does things like the previous two paragraphs describe. And you need to do it for the whole grammar. You will also need an organized way to store all the information youíre getting as you go along. What kind of information are we looking for? This is described in the next section.

What we suggest is writing storage classes, similar to the Component class you designed in assignment #3. For example, you may want to define a class called ClassInfo, an instance of which will represent one class from a given file. This class would probably have storage for a class name, a superclass name, interfaces, and all variables and methods. How do you store variables and methods? You may want to define classes VariableInfo and MethodInfo, similar to ClassInfo. This way you can organize storage of the information about your variables and methods. Since you are going to store both variables and methods in your class, it may be useful to define an abstract class MemberInfo that would contain things common to variables and methods.

Generating HTML Documentation

Parsing the input and building up the necessary data structures is almost all of the hard work. Now that you have all the information you need, you just need to generate an HTML file that presents the information in a visually-pleasing manner. The HTML file must conform to the following specifications:

Note: Only variables and methods listed in the file should be documented. That is, do not worry about inherited members (unless you do the extra credit, see below).

Examples of acceptable HTML files will be available on the course website. You are encouraged to create more exciting files as long as they conform to the specifications above. The basics of HTML have already been explained in recitation.

Implementation Specification

Basically you are on your own to create the class definitions and data structures you need to implement the project in an appropriate way. However, in order for us to test your program (and for the GUI described below to work), your implementation must have a class called Documenter. This class must have a method with the following name and form:

public static void documentClass (String classname) {...}

This file must read in the file classname + ".java" and produce the file classname + ".html" according to the specifications above. Everything else is up to you.

What You Need From Us

The code we will provide you will be minimal. We will provide the lexer as described above. We will also provide a simple graphical widget for specifying what class to document (a FileDialog like in assignment #1). Basically you type in class name foo and that means Documenter.documentClass("foo") will be executed.

In CodeWarrior, use "Java Application" as the project kind.


What to Hand In

The diskette should contain the following:

We will compile and run your program using CodeWarrior Pro 2.

We will call your Documenter.documentClass method, so this method must be called this and must do what it is supposed to (see above).

Working in Stages

This is a big project, much bigger than any of your previous assignments. If you attempt to do the whole thing and then start testing and debugging, then you will be probably get hopelessly lost. We recommend implementing your project in the following stages:

  1. Just generate documentation for files that have no variables or methods, no import statements, and no package statement. In terms of our grammar, your file will look like
  2. [public] [abstract] [final] class CLASSNAME

    [extends SUPERCLASS]

    [implements INTERFACE1 [, INTERFACE2]* ] {}

    This stage will probably take you the most time because you will have to figure out how to use the lexer, how to create an HTML file, and how to parse. Do not forget that a class may implement more than one interface.

  3. Now let your input files have import statements and package statements. This should be easy after stage one. You just have to ignore tokens until the class declaration starts. Remember that just because youíre allowing import/package statements, it doesnít mean that input files will always have them.
  4. Now add variable declarations. This will require more parsing, but it will be easier than what you did in stage 1. Remember, though, that the output needs to separate static and instance variables. Think about what data structures you want to create to hold the necessary information. Also remember to expect the "//Variables" line in the right place.
  5. Now add method declarations. The parsing will probably be more difficult, but by now you should have the hang of things. Separating static methods, constructors, and instance methods shouldnít be much harder than what you did in stage 3. You need to convey a lot of information about each method, so make sure your HTML file is easy for a human to understand. Also remember where to check for the "//Methods" line which separates variables and methods.
  6. If your thoughts havenít yet turned to barbecues, swimming holes, and warm evenings, try the extra credit. For the extra credit, first just look at the parent class (if available). Once you have that working, look at all available ancestor classes.

Extra Credit

Do not work on the extra credit until you have the assignment completed. If you did a good job with the assignment, then modifying it to complete the extra credit will not be more work than doing the extra credit originally.

For the extra credit, you must list all inherited variables and methods that can be discovered. Also if you can discover that a method overrides the method of an ancestor class then you must document this. More specifically:

Note: You may assume all of the Java files are correct. For example, it is illegal for a method in the child class to have the same name and parameters as a method in an ancestor class but a different return type. Again, you do not have to check for this sort of thing.