Java Import Statement Cleanup (Pr

Cay S. Horstmann

NOTE: Several months after writing this article, I assigned a homework problem in my graduate programming language class that does a better job than this solution. The technique used in this article (i.e., analyzing class files) is interesting, but if you need to clean import statements, I suggest you use the new and improved cleaner.

Background

In the Java programming language, the names of classes that are defined inside packages always start with the package name. For example, the class name

java.awt.Rectangle

starts with the name of the package java.awt. As a convenience to programmers, the import mechanism can be used to reference classes without the package name. For example, it is tedious to write

java.awt.Rectangle box = new java.awt.Rectangle(5, 10, 20, 30);

You can use the more convenient form

Rectangle box = new Rectangle(5, 10, 20, 30);

provided you import the class.

Classes in the same package are automatically imported, as are the classes in the java.lang package. For all other classes, you must supply an import statement to either import a specific class

import java.awt.Rectangle;

or to import all classes in a package, using the wildcard notation

import java.awt.*;

Importing classes can lead to ambiguities. A class name is ambiguous if it occurs in two packages that are imported by wildcards. For example, suppose a program contains the imports

import java.awt.*;
import java.util.*;

The class name List is now ambiguous because there are two classes java.awt.List and java.util.List. You can resolve the ambiguity by adding a specific import of the class name:

import java.awt.*;
import java.util.*;
import java.util.List;

However, if you need to refer to both java.awt.List and java.util.List in the same source file, then you have crossed the limits of the import mechanism. You can use an import statement to shorten one of the names to List, but you need to reference the other by its full name whenever it occurs in the source text.
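A minimal sketch of that situation follows. The class name ListNames and its method are made up for illustration. java.util.List is imported by name, so the short name List refers to it, while java.awt.List has to be written out in full.

```java
import java.util.ArrayList;
import java.util.List; // the short name List now means java.util.List

public class ListNames {
    // Returns the fully qualified names that the two spellings resolve to.
    public static String[] resolvedNames() {
        List<String> utilList = new ArrayList<>();   // java.util.List, via the import
        Class<?> awtListType = java.awt.List.class;  // java.awt.List needs its full name
        utilList.add(List.class.getName());
        utilList.add(awtListType.getName());
        return utilList.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String name : resolvedNames()) {
            System.out.println(name);
        }
    }
}
```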

The Problem

Ambiguities are unpleasant because they can arise over time, as libraries expand. For example, in JDK 1.1, there was no java.util.List class. Consider a program that imports java.awt.* and java.util.* and uses the name List as a shortened form of java.awt.List. That program compiles without errors under JDK 1.1 but fails to compile in Java 2.

Therefore, the use of wildcards for imports is somewhat dangerous. However, importing each class can lead to long import lists that are tedious to manage, especially as code is moved from one class to another during development.

To illustrate this, consider the import list in one of my recent files.

import java.awt.*;
import java.awt.geom.*;
import java.io.*;
import java.util.*;

It turned out that I really needed

import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;

(Apparently, the need for importing java.io.* had gone away at some point during the program's evolution.)

Thus, a problem that Java programmers face is how to keep import lists up-to-date when programs change.

Potential Solutions

One time-honored way of checking import lists is to comment out one import at a time and recompile. If no compiler errors appear, the import was not needed. Naturally, that is tedious.

Another solution is to stop using import lists altogether and to reference the full class names at all times. Naturally, that too is tedious.

Several compilers emit lists of classes that they load as they compile a program. For example, if you run the compiler in the Sun J2SE SDK 1.4 with the -verbose option, you get a list such as

[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/Font.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/Graphics2D.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/Stroke.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/font/FontRenderContext.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/geom/Line2D.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/geom/Point2D.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/geom/Rectangle2D.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/util/ArrayList.class)]
[loading ./AbstractEdge.class]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/lang/Object.class)]
[loading ./Edge.class]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/io/Serializable.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/lang/Cloneable.class)]
[loading ./LineStyle.class]
[loading ./ArrowHead.class]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/lang/String.class)]
[checking SegmentedLineEdge]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/Graphics.class)]
[loading ./SerializableEnumeration.class]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/util/AbstractList.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/util/AbstractCollection.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/geom/Line2D$Double.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/Shape.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/text/CharacterIterator.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/lang/Comparable.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/lang/CharSequence.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/geom/Point2D$Double.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/awt/geom/RectangularShape.class)]
[loading /usr/local/j2sdk1.4.0/jre/lib/rt.jar(java/text/AttributedCharacterIterator.class)]

It would be an easy matter to write a script that transforms this output into a set of import statements. However, the output contains classes that don't actually need to be imported (such as CharSequence and AttributedCharacterIterator). These classes are loaded because some of the loaded classes depend on them. It is not clear (at least to me) how one can weed out the unneeded classes.
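For illustration, such a script might look like the following sketch. The class name VerboseToImports is hypothetical, and the pattern assumes the exact line format shown in the listing above (other compiler versions may format these lines differently). It keeps only classes loaded from an archive, maps nested classes to their outer class, and skips java.lang. It does nothing about the transitively loaded classes, so it shares the weakness just described.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class VerboseToImports {
    // Matches lines such as [loading /path/rt.jar(java/awt/Font.class)]
    // and captures the entry name without the .class suffix.
    private static final Pattern ENTRY =
        Pattern.compile("^\\[loading .*\\(([^()]+)\\.class\\)\\]$");

    public static Set<String> toImports(Iterable<String> lines) {
        Set<String> imports = new TreeSet<>(); // sorted, duplicates removed
        for (String line : lines) {
            Matcher m = ENTRY.matcher(line.trim());
            if (!m.matches()) continue; // local class files, [checking ...] lines
            String path = m.group(1);
            int dollar = path.indexOf('$');
            if (dollar >= 0) path = path.substring(0, dollar); // nested class: keep the outer class
            String name = path.replace('/', '.');
            int dot = name.lastIndexOf('.');
            if (dot < 0) continue; // default package: nothing to import
            if (name.substring(0, dot).equals("java.lang")) continue; // imported automatically
            imports.add("import " + name + ";");
        }
        return imports;
    }

    public static void main(String[] args) {
        // Reads the compiler's -verbose output from standard input.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        for (String statement : toImports(lines)) {
            System.out.println(statement);
        }
    }
}
```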

I used a different approach. I wrote a utility that harvests the class file. Unlike source files, class files never contain shortened class names. Even standard classes such as java.lang.String are referenced by their full names.

Class files contain the names of classes as well as field and method descriptors that contain class names (in an odd format, such as Ljava/lang/String; ). To harvest the class names, one must know the layout of the constant pool and the field and method descriptors. The class file format is well-documented--see the references at the end of this document--and only moderately complex.
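As a rough illustration of the constant pool part, the sketch below walks the pool and collects the names held by class entries. It is not the ImportCleaner source. The class name ConstantPoolClasses is made up, the tag numbers follow the published class file format, and the sketch ignores field and method descriptors entirely.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;

public class ConstantPoolClasses {
    // Returns the names stored in the CONSTANT_Class entries of a class file.
    public static Set<String> classNames(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
        data.readUnsignedShort(); // minor version
        data.readUnsignedShort(); // major version
        int count = data.readUnsignedShort(); // valid indexes are 1 .. count - 1
        String[] utf8 = new String[count];
        int[] nameIndex = new int[count]; // 0 means "not a class entry"
        for (int i = 1; i < count; i++) {
            int tag = data.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = data.readUTF(); break;                // Utf8
                case 7: nameIndex[i] = data.readUnsignedShort(); break; // Class
                case 5: case 6: data.readLong(); i++; break;            // Long, Double: two slots
                case 3: case 4: data.readInt(); break;                  // Integer, Float
                case 8: case 16: case 19: case 20:
                    data.readUnsignedShort(); break;                    // one 2-byte index
                case 9: case 10: case 11: case 12: case 17: case 18:
                    data.readInt(); break;                              // two 2-byte indexes
                case 15:
                    data.readUnsignedByte(); data.readUnsignedShort(); break; // MethodHandle
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        Set<String> names = new TreeSet<>();
        for (int i = 1; i < count; i++) {
            if (nameIndex[i] == 0) continue;
            String name = utf8[nameIndex[i]];
            if (name == null || name.startsWith("[")) continue; // skip array types
            names.add(name.replace('/', '.'));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            for (String name : classNames(in)) System.out.println(name);
        }
    }
}
```

The sketch stops reading once the constant pool ends, since the remainder of the file is not needed to list the class entries.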

The ImportCleaner Program

The ImportCleaner program parses one or more class files, harvests the class names, removes the unnecessary ones, sorts the remaining ones, and prints out a list of import statements to System.out.
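The remove, sort, and print steps can be sketched as follows. This is an assumption about what "unnecessary" means, based on the rules stated earlier (classes in java.lang and in the class's own package are imported automatically). The class ImportList and its method are hypothetical, not the ImportCleaner code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

public class ImportList {
    // Turns harvested class names into a sorted set of import statements.
    // ownPackage is the package of the class being cleaned ("" for none).
    public static Set<String> importsFor(Collection<String> classNames, String ownPackage) {
        Set<String> imports = new TreeSet<>(); // sorted, duplicates removed
        for (String name : classNames) {
            int lastDot = name.lastIndexOf('.');
            if (lastDot < 0) continue; // default package: cannot be imported
            String pkg = name.substring(0, lastDot);
            if (pkg.equals("java.lang")) continue; // imported automatically
            if (pkg.equals(ownPackage)) continue;  // same package: imported automatically
            imports.add("import " + name + ";");
        }
        return imports;
    }

    public static void main(String[] args) {
        Collection<String> harvested = Arrays.asList(
            "java.util.TreeSet", "java.lang.String", "java.io.File", "java.util.TreeSet");
        for (String statement : importsFor(harvested, "")) {
            System.out.println(statement);
        }
    }
}
```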

Since ImportCleaner parses class files, your source file must first be compiled (presumably with a less-than-optimal import statement set). Then run ImportCleaner on the class file, capture the output, and paste the import lines into your source file.

For example, to run ImportCleaner on its own class files, you use

java -jar import_cleaner.jar ImportCleaner

(You can find the ImportCleaner class files by unzipping import_cleaner.jar).

The result is this list of imports, printed to System.out:

import java.io.DataInput;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

Typically, your next step is to capture that list of imports and paste it into your source file.

If your source file contains multiple top-level classes, then you need to list all of them on the command line. For example,

java -jar import_cleaner.jar MyClass MyHelperClassThatIsDefinedInTheSameFile

However, inner classes are located automatically.

You can supply the name of a class in any of the following forms:

The ImportCleaner program strips off the suffixes and then looks for the file MyClass.class and all files of the form MyClass$*.class (for inner classes). 
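A file lookup of this kind could be sketched with a FilenameFilter, as below. The class ClassFileFinder is hypothetical, and the two suffixes it strips (.class and .java) are an assumption made for the example.

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Arrays;

public class ClassFileFinder {
    // Strips an assumed ".class" or ".java" suffix from a command-line argument.
    public static String baseName(String arg) {
        if (arg.endsWith(".class")) return arg.substring(0, arg.length() - ".class".length());
        if (arg.endsWith(".java")) return arg.substring(0, arg.length() - ".java".length());
        return arg;
    }

    // Finds MyClass.class and MyClass$*.class in the given directory.
    public static File[] classFiles(File dir, String arg) {
        final String base = baseName(arg);
        File[] found = dir.listFiles(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return name.equals(base + ".class")
                    || (name.startsWith(base + "$") && name.endsWith(".class"));
            }
        });
        if (found == null) return new File[0]; // not a directory, or an I/O error
        Arrays.sort(found);
        return found;
    }

    public static void main(String[] args) {
        for (File f : classFiles(new File("."), args[0])) {
            System.out.println(f.getName());
        }
    }
}
```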

If your class file is located in a package, you need to invoke ImportCleaner from the base directory. For example, if you use the package com.mycompany.myproject, invoke ImportCleaner from the directory that contains the com directory. You can supply the package name in either of the following forms:

Capturing the output is very easy if you use the shell mode in Emacs. Other good programming editors have similar features. Alternatively, you can redirect the output to a file:

java -jar import_cleaner.jar class(es) > outputfile

The program takes the following options:

Download and Installation

The usual.
  1. Download the file import_cleaner.jar. With some browsers, you may need to right-click on the link and select a menu option such as "Save Link As".
  2. Save it in your favorite location.
  3. Open a command shell.
  4. Run java -jar your/favorite/location/import_cleaner.jar options class(es)

License

The program is distributed under the GNU General Public License. Source code and a copy of the license are contained in the JAR file.

Limitations

Unfortunately, constants are not included in the class files, so this utility will miss them. Typical examples are:

BorderLayout.NORTH
Color.red

Also, the types of local variables are not included in the class file. This sounds like a big problem, but fortunately, the same class or interface name is often used in a method or field descriptor as well. To illustrate the issue, consider this code:

public void draw(Graphics2D g2)
{
   Stroke oldStroke = g2.getStroke();
   g2.setStroke(new BasicStroke());
   . . .
   g2.setStroke(oldStroke);
}

ImportCleaner includes java.awt.Graphics2D because it appears in the method signature of the processed class. It includes java.awt.BasicStroke because of the constructor call. But by default it won't include java.awt.Stroke, since the class file is not guaranteed to contain a reference to it.

A remedy is to recompile after pasting in the ImportCleaner output. Then look at the error messages and manually insert the missing imports. It sounds bad, but in actual practice it doesn't seem to be all that bothersome. 

Another remedy would be to use fully qualified class names in this situation.

You may also wish to use the -usecalls option. That option harvests method calls. In our example, that option finds the java.awt.Stroke class from the ()Ljava/awt/Stroke; method descriptor of the Graphics2D.getStroke method. Harvesting method descriptors is not the default because it can lead to spurious imports. (See the description of the -usecalls option for more information.)
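To make the descriptor format concrete, here is a small sketch that pulls class names out of a field or method descriptor. The class name DescriptorClasses is made up and this is not the ImportCleaner code. Note that it returns every parameter and return type in the descriptor, whether or not the source file names that type, which is presumably how harvesting the descriptors of called methods produces spurious imports.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorClasses {
    // Extracts class names from a descriptor such as ()Ljava/awt/Stroke;
    // Object types are written as L<name>; with slashes instead of dots.
    public static List<String> classNames(String descriptor) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < descriptor.length()) {
            if (descriptor.charAt(i) == 'L') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated type in " + descriptor);
                }
                names.add(descriptor.substring(i + 1, end).replace('/', '.'));
                i = end + 1;
            } else {
                i++; // '(', ')', '[' or a primitive code such as I, D, or V
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(classNames("()Ljava/awt/Stroke;"));
        System.out.println(classNames("(Ljava/lang/String;[[ILjava/util/List;)V"));
    }
}
```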

References