Java Import Statement Cleanup (New and Improved Version)

I posed this homework assignment in my Spring 2003 graduate programming language class. You will first find the statement of the assignment. My solution (which the students did not see until after the due date) is at the end of this article. If you are impatient and just want to clean your import statements, go to the end of this document and download the program.

In case you are curious . . . the students did rather well. The majority of the students independently produced correct solutions, and one of them found a subtle error in mine.

Background

In the Java programming language, the names of classes that are defined inside packages always start with the package name. For example, the class name

java.awt.Rectangle

starts with the name of the package java.awt. As a convenience to programmers, the import mechanism can be used to reference classes without the package name. For example, it is tedious to write

java.awt.Rectangle box = new java.awt.Rectangle(5, 10, 20, 30);

You can use the more convenient form

Rectangle box = new Rectangle(5, 10, 20, 30);

provided you import the class.

Classes in the same package are automatically imported, as are the classes in the java.lang package. For all other classes, you must supply an import statement to either import a specific class

import java.awt.Rectangle;

or to import all classes in a package, using the on demand notation

import java.awt.*;

Importing classes can lead to ambiguities. A class name is ambiguous if it occurs in two packages that are imported on demand. For example, suppose a program contains the imports

import java.awt.*;
import java.util.*;

The class name List is now ambiguous because there are two classes java.awt.List and java.util.List. You can resolve the ambiguity by adding a specific import of the class name:

import java.awt.*;
import java.util.*;
import java.util.List;

However, if you need to refer to both java.awt.List and java.util.List in the same source file, then you have crossed the limits of the import mechanism. You can use an import statement to shorten one of the names to List, but you need to reference the other by its full name whenever it occurs in the source text.
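
As a small illustration of that limit, the following sketch uses both classes in one file. The class name ListDemo and its two helper methods are made up for this example. The single-type import makes the plain name List mean java.util.List, so the AWT class has to be written out in full.

```java
import java.awt.*;
import java.util.*;
import java.util.List; // the single-type import decides what plain "List" means

class ListDemo {
    // Plain List is java.util.List because of the single-type import above.
    static List<String> names() {
        List<String> result = new ArrayList<>();
        result.add("Harry");
        return result;
    }

    // The AWT class has to be spelled out in full wherever it occurs.
    static Class<?> awtListClass() {
        return java.awt.List.class;
    }

    public static void main(String[] args) {
        System.out.println(names());                  // prints [Harry]
        System.out.println(awtListClass().getName()); // prints java.awt.List
    }
}
```

Removing the line import java.util.List; from this sketch should make the compiler reject the plain name List as ambiguous.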

The Problem

Ambiguities are unpleasant because they can arise over time, as libraries expand. For example, in JDK 1.1, there was no java.util.List class. Consider a program that imports java.awt.* and java.util.* and uses the name List as a shortened form of java.awt.List. That program compiles without errors under JDK 1.1 but fails to compile under Java 2, where the name List has become ambiguous.

Therefore, the use of import on demand is somewhat dangerous. However, single class importing can lead to long import lists that are tedious to manage, especially as code is moved from one class to another during development.

To illustrate this, consider the import list in one of my recent files.

import java.awt.*;
import java.awt.geom.*;
import java.io.*;
import java.util.*;

It turned out that I really needed

import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;

(Apparently, the need for importing java.io.* had gone away at some point during the program's evolution.)

Thus, a problem that Java programmers face is how to keep import lists up-to-date when programs change.

The Assignment

Your assignment is to write a Java program called ImportCleaner that reads a Java source file and prints a minimal set of single import statements for all classes that the program needs to import, such as

import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;

Sort the list alphabetically.
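
The output step by itself is simple. Here is a minimal sketch of it, assuming the fully qualified names of the needed classes have already been collected. The class ImportPrinter and its method importLines are hypothetical names, not part of the assignment. A TreeSet removes duplicates and sorts the names in natural String order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

class ImportPrinter {
    // Turns fully qualified class names into sorted, duplicate-free import lines.
    static List<String> importLines(Collection<String> classNames) {
        List<String> lines = new ArrayList<>();
        for (String name : new TreeSet<>(classNames)) {
            lines.add("import " + name + ";");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Unsorted input with one duplicate, as a name collector might produce it.
        List<String> needed = Arrays.asList(
            "java.util.ArrayList",
            "java.awt.geom.Point2D",
            "java.awt.Rectangle",
            "java.awt.Graphics2D",
            "java.awt.geom.Rectangle2D",
            "java.awt.Rectangle");
        for (String line : importLines(needed)) {
            System.out.println(line);
        }
    }
}
```

Natural String order puts uppercase letters before lowercase ones, so java.awt.Rectangle comes before java.awt.geom.Point2D, which matches the list shown above.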

You may assume that the source file is syntactically correct. You may further assume that all classes in the source file and all classes on which they depend have already been compiled and can be loaded.

You should use JavaCC to build a parser that parses the Java source file.

Deliverables

Tips

  1. Carefully study the Java language specification (JLS), in particular section 6.5.
  2. Start early. This is not a hard assignment but it takes some amount of time to understand all the subtleties of the Java specification.
  3. Modify the Java 1.1 grammar that comes as part of JavaCC.
  4. You cannot rely on naming conventions. Don't make any decisions based on lowercase/uppercase letters.
  5. Take advantage of the fact that you know that the source file is syntactically correct and that it has been compiled. That simplifies some of the scope rules, and it allows you to load any classes that the source file defines.
  6. Modify the parser to capture all imports, all names, and all name lists.
  7. Focus on the head of each name. That is, if you see a name x.y.z.w, only x needs to be imported.
  8. Once you capture all names and check whether the heads are matched by the imports, your only problem is false positives. There are situations where the head x of a name is a class name in one of the imported packages, but it actually refers to some other entity in the given source file.
  9. To avoid false positives, you need to keep track of block and class scopes. Keep a stack of scope objects. Push a new scope at the beginning of the block or class, and pop it off at the end.
  10. When you encounter a name, remember its scope. It will be popped off, but the reference in the name object keeps it alive. (Just like a Scheme closure...)
  11. A scope can contain the same name multiple times. For example, it is legal to have a local variable and a local class with the same name!
  12. Pay attention to the rules for resolving ambiguous names and type names in the JLS.
  13. Ambiguous names match variables and fields, but type names do not. For example,
    import javax.swing.*;
    public class Test
    {
        public static void main(String[] SwingConstants)
        {
            System.out.println(SwingConstants.length); // don't generate import
        }
    }

    import javax.swing.*;
    public class Test
    {
        public static void main(String[] SwingConstants)
        {
            class Foo implements SwingConstants {}; // generate import javax.swing.SwingConstants
            System.out.println(Foo.EAST);
        }
    }
  14. Local scope includes the public scope of supertypes. For example,
    import java.net.*;
    public class Test extends javax.print.DocFlavor
    {
        public Test() { super("", ""); }
        public static void main(String[] args)
        {
            System.out.println(URL.AUTOSENSE); // don't generate import: URL is the inherited DocFlavor.URL
        }
    }
  15. For simplicity, you may assume that all superclasses and superinterfaces are top-level (i.e. not inner) classes.
  16. To determine that a class is a public class of another package, use
    Class cl = Class.forName(...);
    if (Modifier.isPublic(cl.getModifiers())) ...
    This is important because Class.forName will load non-public classes (such as java.awt.Queue).
    Thanks to Bao-Hoi Nguyen for pointing out this issue!
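
Several of these tips can be combined into one rough sketch: take the head of a name, check whether an enclosing scope already declares it, and only then look for a public class in the packages imported on demand. This is not the solution from the JAR file. The classes Scope and HeadResolver and all their methods are hypothetical, the scope chain is a simplified stand-in for the stack described in tips 9 and 10, and inherited member types (tip 14) are not handled.

```java
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical scope object: each scope keeps a reference to its parent, so a
// name can hold on to its scope after that scope has been popped off the stack.
class Scope {
    final Scope parent;
    final Set<String> variables = new HashSet<>();
    final Set<String> types = new HashSet<>();

    Scope(Scope parent) { this.parent = parent; }

    boolean declaresVariable(String name) {
        for (Scope s = this; s != null; s = s.parent)
            if (s.variables.contains(name)) return true;
        return false;
    }

    boolean declaresType(String name) {
        for (Scope s = this; s != null; s = s.parent)
            if (s.types.contains(name)) return true;
        return false;
    }
}

class HeadResolver {
    // For a name such as x.y.z.w, only the head x can need an import.
    static String head(String name) {
        int dot = name.indexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    // Returns the full class name if pkg contains a public class with this
    // simple name, and null otherwise. The class is loaded without initializing it.
    static String publicClassIn(String pkg, String simpleName) {
        try {
            Class<?> cl = Class.forName(pkg + "." + simpleName, false,
                HeadResolver.class.getClassLoader());
            return Modifier.isPublic(cl.getModifiers()) ? cl.getName() : null;
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    // Returns the class to import for this name, or null if none is needed.
    // A variable shadows the head only when the name is not in a type context.
    static String neededImport(String name, boolean typeContext, Scope scope,
                               List<String> onDemandPackages) {
        String h = head(name);
        if (scope.declaresType(h)) return null;
        if (!typeContext && scope.declaresVariable(h)) return null;
        for (String pkg : onDemandPackages) {
            String found = publicClassIn(pkg, h);
            if (found != null) return found;
        }
        return null;
    }

    public static void main(String[] args) {
        // Mirrors the example from tip 13: a parameter named SwingConstants.
        Scope classScope = new Scope(null);
        Scope methodScope = new Scope(classScope);
        methodScope.variables.add("SwingConstants");
        List<String> packages = Arrays.asList("javax.swing");

        // Ambiguous name: the parameter wins, so no import is reported.
        System.out.println(neededImport("SwingConstants.length", false, methodScope, packages));
        // Type name: the parameter is ignored, so the Swing interface is reported.
        System.out.println(neededImport("SwingConstants", true, methodScope, packages));
    }
}
```

A real solution would also have to consult single-type imports, the current package, and java.lang before falling back on the on-demand packages.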

My Solution

  1. Download the file ImportCleaner.jar. With some browsers, you may need to right-click on the link and select a menu option such as "Save Link As".
  2. Save it in your favorite location.
  3. Open a command shell.
  4. Run java -jar your/favorite/location/ImportCleaner.jar options source file(s). Or, if you need to set a class path, run
    java -classpath ...:your/favorite/location/ImportCleaner.jar ImportCleaner options source file(s).

The program takes the following options:

Note:  The program saves the original source files, adding an extension .cleaned.

The program is distributed under the Sun Public License. (Because the program contains JavaCC, I don't think I can distribute it under the GPL.) Source code is contained in the JAR file.