NOTE: Several months after writing
this article, I assigned a homework
problem in my graduate programming language class that does a
better job than this solution. The technique used in this article
(i.e., analyzing class files) is interesting, but if you need to
clean import statements, I suggest you use the new and improved cleaner.
In the Java programming language, the names of classes that are defined inside packages always start with the package name. For example, the class name java.awt.Rectangle
starts with the name of the package java.awt. As a
convenience to programmers, the import mechanism can be used
to reference classes without the package name. For example, it is
tedious to write
java.awt.Rectangle box = new java.awt.Rectangle(5, 10, 20, 30);
You can use the more convenient form
Rectangle box = new Rectangle(5, 10, 20, 30);
provided you import the class.
Classes in the same package are automatically imported, as are the
classes in the java.lang package. For all other classes, you
must supply an import statement, either to import a specific class
or to import all classes in a package, using the wildcard *.
Importing classes can lead to ambiguities. A class name is
ambiguous if it occurs in two packages that are imported by wildcards.
For example, suppose a program contains the imports
import java.awt.*;
import java.util.*;
The class name List is now ambiguous because there are two classes, java.awt.List and java.util.List. You can resolve the ambiguity by adding a specific import of the class name you intend, for example
import java.util.List;
However, if you need to refer to both java.awt.List and java.util.List
in the same source file, then you have crossed the limits of the import
mechanism. You can use an import statement to shorten one of
the names to List, but you need to reference the other by its
full name whenever it occurs in the source text.
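As a small illustration of that limit, here is a sketch of a source file that needs both classes. The class name TwoLists is made up for this example. The short name List is bound to java.util.List by a specific import, and java.awt.List is spelled out in full. Only the class literal of java.awt.List is used, so that no GUI component has to be constructed.

```java
import java.util.ArrayList;
import java.util.List; // from here on, the short name List means java.util.List

public class TwoLists {
    public static void main(String[] args) {
        // Short name, resolved through the specific import above
        List<String> names = new ArrayList<>();
        names.add("Rectangle");

        // java.awt.List has to be written out in full wherever it occurs.
        // Only the class literal is used, so no AWT component is created.
        Class<?> awtList = java.awt.List.class;

        System.out.println(names.size() + " name(s) in a java.util.List");
        System.out.println("Other class: " + awtList.getName());
    }
}
```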
Ambiguities are unpleasant because they can arise over time, as libraries expand. For example, in JDK 1.1, there was no java.util.List class. Consider a program that imports java.awt.* and java.util.* and uses the name List as a shortened form of java.awt.List. That program compiles without errors under JDK 1.1 but fails to compile in Java 2.
Therefore, the use of wildcards for imports is somewhat dangerous. However, importing each class can lead to long import lists that are tedious to manage, especially as code is moved from one class to another during development.
To illustrate this, consider the import list in one of my recent programs:
It turned out that I really needed
(Apparently, the need for importing java.io.* had gone away at some point during the program's evolution.)
Thus, a problem that Java programmers face is how to keep import lists up-to-date when programs change.
One time-honored way of checking import lists is to comment out one line at a time and see whether the compiler complains. Naturally, that is tedious.
Another solution is to stop using import lists altogether and reference the full class names at all times. Naturally, that too is tedious.
Several compilers emit lists of classes that they load as they compile a program. For example, if you run the compiler in the Sun J2SE SDK 1.4 with the -verbose option, you get a list such as
It would be an easy matter to write a script that transforms this output into a set of import statements. However, the output contains classes that don't actually need to be imported (such as CharSequence and AttributedCharacterIterator). These classes are loaded because some of the loaded classes depend on them. It is not clear (at least to me) how one can weed out the unneeded classes.
I used a different approach. I wrote a utility that harvests class names from the class file. Unlike source files, class files never contain shortened class names. Even standard classes such as java.lang.String are referenced by their full names.
Class files contain the names of classes as well as field and method descriptors that contain class names (in an odd format, such as Ljava/lang/String;). To harvest the class names, one must know the layout of the constant pool and the field and method descriptors. The class file format is well documented (see the references at the end of this document) and only moderately complex.
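To give a feeling for what "moderately complex" means, here is a minimal sketch that walks the constant pool of a class file and collects the names stored in its class entries. It is not the ImportCleaner source. The class name ConstantPoolClasses and the method classNames are made up for this illustration, and the tag numbers are the ones given in the Java Virtual Machine Specification. The names come back in the internal slash-separated form, and array types show up as descriptors such as [Ljava/lang/String;.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;

public class ConstantPoolClasses {
    /** Returns the internal names (such as java/lang/String) of all class entries. */
    public static Set<String> classNames(InputStream stream) throws IOException {
        DataInputStream in = new DataInputStream(stream);
        if (in.readInt() != 0xCAFEBABE) throw new IOException("Not a class file");
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort(); // entries are numbered 1 .. count - 1
        String[] utf8 = new String[count];
        int[] classNameIndex = new int[count];
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: // Utf8: same encoding that readUTF expects
                    utf8[i] = in.readUTF();
                    break;
                case 7: // Class: index of the Utf8 entry holding the name
                    classNameIndex[i] = in.readUnsignedShort();
                    break;
                case 8: case 16: case 19: case 20: // String, MethodType, Module, Package
                    in.readUnsignedShort();
                    break;
                case 15: // MethodHandle: one byte plus one index
                    in.readUnsignedByte();
                    in.readUnsignedShort();
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                    in.readInt(); // four-byte entries (Integer, Float, member refs, ...)
                    break;
                case 5: case 6: // Long and Double occupy two pool slots
                    in.readLong();
                    i++;
                    break;
                default:
                    throw new IOException("Unknown constant pool tag " + tag);
            }
        }
        Set<String> names = new TreeSet<>();
        for (int i = 1; i < count; i++) {
            if (classNameIndex[i] != 0) names.add(utf8[classNameIndex[i]]);
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Demonstration on the class file of java.lang.String
        try (InputStream in = String.class.getResourceAsStream("String.class")) {
            for (String name : classNames(in)) System.out.println(name.replace('/', '.'));
        }
    }
}
```

The sketch stops after the constant pool. Field and method descriptors live in Utf8 entries as well, but telling them apart from other strings requires reading the field and method tables that follow.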
The ImportCleaner program parses one or more class files, harvests the class names, removes the unnecessary ones, sorts the remaining ones, and prints out a list of import statements to System.out.
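The last three steps (remove, sort, print) can be sketched as follows. This is an illustration under assumptions, not the actual ImportCleaner code: the class ImportListSketch, the method importLines, and the ownPackage parameter are hypothetical names, and "unnecessary" is taken to mean classes in java.lang, classes in the processed class's own package, and classes in the unnamed package. Nested class names containing $ are not handled.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

public class ImportListSketch {
    /** Turns harvested class names into sorted, duplicate-free import lines. */
    public static List<String> importLines(Collection<String> classNames, String ownPackage) {
        TreeSet<String> sorted = new TreeSet<>(); // sorts and removes duplicates
        for (String name : classNames) {
            int dot = name.lastIndexOf('.');
            String pkg = dot < 0 ? "" : name.substring(0, dot);
            // Assumed filter: these classes never need an import statement
            if (pkg.isEmpty() || pkg.equals("java.lang") || pkg.equals(ownPackage)) continue;
            sorted.add(name);
        }
        List<String> lines = new ArrayList<>();
        for (String name : sorted) lines.add("import " + name + ";");
        return lines;
    }

    public static void main(String[] args) {
        List<String> harvested = List.of(
            "java.lang.String", "java.io.IOException",
            "java.awt.Rectangle", "java.awt.Rectangle", "com.example.Helper");
        for (String line : importLines(harvested, "com.example")) {
            System.out.println(line);
        }
        // prints:
        // import java.awt.Rectangle;
        // import java.io.IOException;
    }
}
```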
Since ImportCleaner parses class files, your source file must first be compiled (presumably with a less-than-optimal import statement set). Then run ImportCleaner on the class file, capture the output, and paste the import lines into your source file.
For example, to run ImportCleaner on its own class files, use
java -jar import_cleaner.jar ImportCleaner
(You can find the ImportCleaner class files by unzipping import_cleaner.jar.)
The result is this list of imports, printed to System.out:
Typically, your next step is to capture that list of imports and
paste it into your source file.
If your source file contains multiple top-level classes, then you need to list all of them on the command line. For example,
java -jar import_cleaner.jar MyClass MyHelperClassThatIsDefinedInTheSameFile
However, inner classes are located automatically.
You can supply the name of a class in any of the following forms:
The ImportCleaner program strips off the suffixes and then looks for the file MyClass.class and all files of the form MyClass$*.class (for inner classes).
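The lookup described above might be sketched like this. The names ClassFileFinder, baseName, matches, and find are made up for the illustration, and the two suffixes that get stripped (.class and .java) are an assumption on my part, not a statement about what ImportCleaner accepts.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClassFileFinder {
    /** Strips an assumed .class or .java suffix from a command-line argument. */
    public static String baseName(String arg) {
        if (arg.endsWith(".class")) return arg.substring(0, arg.length() - ".class".length());
        if (arg.endsWith(".java")) return arg.substring(0, arg.length() - ".java".length());
        return arg;
    }

    /** True for MyClass.class and for inner class files such as MyClass$1.class. */
    public static boolean matches(String base, String fileName) {
        return fileName.equals(base + ".class")
            || (fileName.startsWith(base + "$") && fileName.endsWith(".class"));
    }

    /** Lists the matching class files in the given directory, sorted by name. */
    public static List<String> find(File dir, String arg) {
        String base = baseName(arg);
        List<String> result = new ArrayList<>();
        String[] names = dir.list(); // null if dir is not a readable directory
        if (names != null) {
            Arrays.sort(names);
            for (String name : names) {
                if (matches(base, name)) result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String arg = args.length > 0 ? args[0] : "MyClass";
        System.out.println(find(new File("."), arg));
    }
}
```

Note that the $ test keeps MyClassHelper.class out of the result, since only files that start with MyClass$ count as inner classes of MyClass.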
If your class file is located in a package, you need to invoke ImportCleaner
from the base directory. For example, if you use the package com.mycompany.myproject,
invoke ImportCleaner from the directory that contains the
com directory. You can supply the package name in either of the following forms:
Capturing the output is very easy if you use the shell mode in Emacs. Other good programming editors have similar features. Alternatively, you can redirect the output to a file:
java -jar import_cleaner.jar class(es) > outputfile
The program takes the following options:
DataInputStream in = new DataInputStream(new FileInputStream(file));
Harvesting the method call yields the spurious import java.io.InputStream. Such spurious imports are generally harmless but unsightly.
The program is distributed under the GNU General Public License.
Source code and a copy of the license are contained in the JAR file.
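Unfortunately, references to compile-time constants are not included in the class files (the compiler copies the constant value into the class that uses it), so this utility will miss the classes that define them. Typical examples are: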
Also, the types of local variables are not included in the class file. This sounds like a big problem, but fortunately, the same class or interface name is often used in a method or field descriptor as well. To illustrate the issue, consider this code:
public void draw(Graphics2D g2)
{
   Stroke oldStroke = g2.getStroke();
   . . .
}
ImportCleaner includes java.awt.Graphics2D
because it appears in the method signature of the processed class. It
includes java.awt.BasicStroke because of the constructor
call. But by default it won't include java.awt.Stroke since
there is no guaranteed reference to it in the class file.
A remedy is to recompile after pasting in the ImportCleaner output. Then look at the error messages and manually insert the missing imports. It sounds bad, but in actual practice it doesn't seem to be all that bothersome.
Another remedy would be to use fully qualified class names in this situation.
You may also wish to use the -usecalls option. That option
harvests method calls. In our example, that option finds the java.awt.Stroke
class from the ()Ljava/awt/Stroke; method descriptor of the
Graphics2D.getStroke method. Harvesting method descriptors is not
the default because it can lead to spurious imports. (See the
description of the
-usecalls option for more information.)
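To make the descriptor format concrete, here is a small sketch that pulls the class names out of a field or method descriptor such as the one above. It is an illustration only, not the code behind -usecalls, and the names DescriptorNames and classNames are made up. The sketch relies on the fact that, outside a class name, the letter L in a descriptor always starts a class type that ends at the next semicolon.

```java
import java.util.ArrayList;
import java.util.List;

public class DescriptorNames {
    /** Returns the class names in a field or method descriptor, in dotted form. */
    public static List<String> classNames(String descriptor) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < descriptor.length()) {
            if (descriptor.charAt(i) == 'L') {
                int end = descriptor.indexOf(';', i);
                if (end < 0) break; // malformed descriptor, stop scanning
                names.add(descriptor.substring(i + 1, end).replace('/', '.'));
                i = end + 1; // continue after the semicolon
            } else {
                i++; // primitive type letter, array bracket, or parenthesis
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(classNames("()Ljava/awt/Stroke;"));
        // prints [java.awt.Stroke]
        System.out.println(classNames("(Ljava/awt/Graphics2D;I)V"));
        // prints [java.awt.Graphics2D]
    }
}
```

Applied to every method descriptor that a class calls, a scan like this also picks up parameter types the source never names, which is one way spurious imports can arise.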