CS46B Lab 3

Copyright © Cay S. Horstmann 2010 Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.

Instructions

Objectives

Learning Outcomes

This lab should be completed in two weeks.

A. Removing Directory Entries

  1. Start with this code and make it into a project lab3, following Step 1 of these instructions. You'll need this file again. If you don't have it any longer, download it as deptdir.txt onto your computer.

    Run the program (Run → Run Main Project from the main menu). In the file chooser, navigate to wherever you saved the deptdir.txt file, and click on Open.

    Look up the phone number of your instructor. (The key is Phone.)

    Then select Remove Entry.

    What happens? Why? (You'll need to read through the source code.)

  2. In the AddressBookDemo class, implement the removeEntry method, using the lookupEntry method for guidance.

    What is the code of your removeEntry method?

  3. Start the program again as described before and select Remove Entry.

    What happens? Why? (You'll need to read through the source code.)

Ok, we'll need to implement the remove method of ArrayListAddressBook. And then we'll need to test it. Chances are you won't get it right the first time.

In that case you'll run the program, select the directory, click Open, select Remove Entry, find it didn't work, run the program, select the directory, click Open, select Remove Entry, find it didn't work, run the program, select the directory, click Open, select Remove Entry, and on and on and on. That's a lot of clicks.

For now, we will not implement the remove method, but learn how not to click so much.

B. Launching a Terminal Window

If you need to repeat the same set of operations many times over, there is a better way: the command line interface. The command line interface is harder to use initially, but it gives you a lot of power.

Look at it this way. If you just want to snap vacation photos, you'll use a "point and shoot" camera. It requires no training and takes decent pictures.


But if you want your photos to be published in a magazine, you will need a professional camera. You will need to spend time learning rather technical concepts such as f-stops and shutter speeds.


In this lab, you will begin to learn the command shell. You will use the command shell throughout the semester for many tasks.

How to launch a command shell depends on your operating system:

You will get a window that looks somewhat like this. (The exact look depends on your operating system.)

Never seen one of these before? Congratulations—you have just reached level 2.

C. Viewing Directories

  1. You type commands into the terminal, and you get responses in plain text.

    After each command, you hit the ENTER key. Let's try it out. Type ls followed by ENTER. (ls stands for “list”.)

    What happens? If nothing interesting happens, try ls -a instead (that is, ls, a space, a hyphen immediately followed by a, and finally ENTER). This shows all files, including the “hidden” ones.

  2. When you first open a terminal window, you are in your home directory. Its location depends on the operating system and your user name. To see exactly what that directory is, type pwd followed by ENTER.

    What is your home directory?

  3. To see the contents of another directory, use ls followed by the directory name. Try listing the contents of your desktop directory. It is located at

    Mac OS X          /Users/firstname/Desktop
    Ubuntu Linux      /home/firstname/Desktop
    Windows Vista/7   /cygdrive/c/Users/Firstname/Desktop
    Windows XP        /cygdrive/c/Documents\ and\ Settings/Firstname\ Lastname/Desktop

    (Always use /cygdrive/c in Cygwin instead of C: because a colon has a special meaning in the shell. In the Windows XP path, note the \ before each space.)

    Here firstname is the first name of the user in lowercase (or whatever was supplied as the user name when your laptop was set up).

    When you type the directory name, try typing the /, then a character or two, then hit TAB. For example, to get to the desktop, try typing De and then the TAB key.

    What happens when you hit TAB?

  4. This autocompletion feature is similar to the Ctrl+Space feature in NetBeans, and it is a huge time saver. Experienced shell users almost never type a full file name but use TAB whenever possible. It is faster and it cuts down on errors.

    Next, let's find the lab3 NetBeans project. The details depend on where you unzipped the lab3 folder. If it's on the desktop, try a directory such as /Users/firstname/Desktop/lab3 or /cygdrive/c/Users/Firstname/Desktop/lab3. You aren't actually using the desktop, right? If you are organized, it's in some place such as /Users/firstname/cs46b/lab3 or /cygdrive/c/Users/firstname/cs46b/lab3. Ask for help if you can't locate it.

    In which directory is your project? What happens when you type ls followed by that directory?

  5. Another useful feature of the shell window is command line recall.

    Hit the ↑ and ↓ arrow keys a few times. What happens?

  6. Now use the ↑ and ↓ arrow keys to recall the command that lists the lab3 directory, then add /build/classes (without a space) after lab3.

    What files are listed in that directory?

    These are the compiled files of your application.

D. Copying Files

  1. The cp command copies one or more files. We will use its simplest form here. We want to copy the deptdir.txt file to the current directory.

    Where did you put the deptdir.txt file? Give the full path name.

  2. Now type

    cp -v /path/to/deptdir.txt .

    Replace /path/to with the actual path to deptdir.txt. (For me, that was /home/cay/Downloads/deptdir.txt.)

    Type a space and a period after the full path.

    The period indicates the current directory (the one that is printed by pwd). You are copying the file to the current directory.

    The -v indicates the “verbose” option, which gives you some feedback about what happens.

    What feedback did you get?

  3. Now type

    ls

    What happens? Why?

  4. We will soon be modifying our deptdir.txt file. Let's make a copy to be safe. Copy deptdir.txt to deptdir.original.txt in the current directory, using the cp command.

    What command did you use?

E. Running a Java Program

  1. Type javac -version followed by ENTER in the shell window.

    That's javac, not java.

    What happens? It might work or not. If you get a version number, what is it? If you get an error message, what is it?

  2. Under Windows, you might get a “command not found” error message. In that case, you need to use the full path name for the javac.exe program, which is a bit tedious. Type /cygdrive/c/Pr, then TAB to get it completed to /cygdrive/c/Program\ Files, then add /java/jdk and hit TAB again to get a completion somewhat like /cygdrive/c/Program\ Files/java/jdk1.7.0_01. If tab completion doesn't find it there, look instead in /cygdrive/c/Program\ Files\ \(x86\). (Note that there is a \ before every space and before every special symbol.) Once you have found the JDK directory, add /bin/java -version. The entire thing is something like /cygdrive/c/Program\ Files/java/jdk1.7.0_01/bin/java -version. Fortunately, you only need to enter it once since you can always recall it with the ↑ key.

    Now type java -classpath followed by the name of the full path to lab3/build/classes and AddressBookDemo. For example,

    java -classpath /home/cay/cs46b/lab3/build/classes AddressBookDemo

    Be careful with the letter case in AddressBookDemo.

    Don't type this from scratch. Hit ↑ until you find the ls command from Step 3. Add a space and AddressBookDemo. Then hit HOME (Ctrl+A on the Mac), erase the ls and add java -classpath. Hit ENTER when you are done.

    With Windows, you now run into an unhappy issue. You have to know which programs are Cygwin programs and which are not. The ls program is a Cygwin program and takes Unix style path names (starting with /cygdrive/c). The java program is not a Cygwin program, and it does not understand /cygdrive/c. Instead, you need to type a path such as C:/Users/cay/cs46b/lab3/build/classes. Technically, you are supposed to use backslashes (\), but as it happens, java will accept slashes. So, before hitting ENTER, make sure you replaced /cygdrive/c with C: on Windows.

    What happens when you run the AddressBookDemo?

  3. Don't bother entering a file name into the file chooser. Just hit Cancel. We'll work on that later.

    Now run the program again, using only two keystrokes.

    What are the keystrokes?

F. Installing a Text Editor

You need a professional text editor for this lab. Windows Notepad or Mac TextEdit will not suffice. Some students have a professional text editor that they like and know, but more often than not, it turns out that they don't actually know it all that well. So, even if you love your text editor, you'll need to install the one that we use in the lab. 

The professional text editor that we use in this lab is called Emacs. It is available on all major operating systems, so you won't have to learn another editor if you switch operating systems—which most professional programmers need to do at least occasionally. It is very powerful and has a huge set of extensions. 

“Emacs outshines all other editing software in approximately the same way that the noonday sun does the stars. It is not just bigger and brighter; it simply makes everything else vanish. ”—Neal Stephenson

If someone tells you that Emacs is old-fashioned and hard to use, don't listen—modern versions of Emacs have menus and dialogs and require no special training.

  1. Start Emacs.

    Select Options -> C-x/C-c/C-v Cut and Paste (CUA) and Options -> Save Options. At the bottom of the window, in a “status line”, Emacs tells you that it saved the options in a file.

    What is the name of this file?

  2. Select File -> Open File... from the menu and select the file named .emacs. Note the period (.) before the filename.

    If there is no such file, hit the following key combination instead:

    [Control-X][Control-F]~/.emacs[Enter]

    That is, hold down Control and X, let go of X but not of Control, type F, let go of both, then type the eight characters ~/.emacs and then hit the Enter key. (If you already have a ~/ after hitting [Control-X][Control-F], you can just enter the six characters .emacs and then Enter.)

    At the end, add the following lines:

    (prefer-coding-system 'utf-8-unix)
    (setq-default indent-tabs-mode nil)
    (setq require-final-newline t)

    Just copy from this page and paste it in with Ctrl+V.

    Select File -> Save from the menu.

    What does the status line at the bottom of the window say?

  3. Exit Emacs and start it again. Use File -> Open and open the file ~/.bash_profile. (On Ubuntu, the file is ~/.bashrc instead. If you are annoyed by this inconsistency, open a terminal, select Edit -> Profiles -> Default -> Edit -> Title and Command, then check “Run command as login shell”. Then the terminal reads ~/.bash_profile.)

    Note the dot (.) before the letter b. It is a part of the file name.

    The ~ character denotes the “home directory” of the current user.

    If you can't find the file, you need to select File -> Visit New File from the menu instead.

    What are the contents of the file?

  4. Open a terminal window. (Windows users: A Cygwin terminal.) A few letters will appear in the top left corner. These letters are called the “prompt”. Typically, they contain one or two lines of text, including the name of the current directory. Sometimes, they contain the user name.

    What is the prompt in your system?

    Close the window when you are done.

  5. At the bottom of the file that you opened in step 3, add a new line
    export PS1='\w\$ '

    exactly as it appears here, as a string of eighteen characters, including two spaces, two backslashes, two single quotes, and two uppercase letters.  Just copy/paste it.

    Save the file. Open another terminal window.

    What prompt do you get now?

    PS1 is an environment variable, a variable that controls the behavior of the shell. PS1 sets the prompt string. The \w sequence denotes the path to the current directory. The \$ sequence yields a $ for a non-administrator, a # for an administrator. (This is useful to alert you when you are logged in as an administrator.)

  6. Windows users only: We are now able to solve the problem raised in part E. When running the javac or java programs, some installations require you to specify the full path to the program, such as /cygdrive/c/Program\ Files\ \(x86\)/java/jdk1.7.0_02/bin/javac. (The exact path and version number may be different on your computer.) You can solve this problem by adding the path to the bin directory to the PATH environment variable. Add something like this to your .bash_profile file.
    export JAVA_HOME=/cygdrive/c/Program\ Files\ \(x86\)/java/jdk1.7.0_02
    export PATH=$JAVA_HOME/bin:$PATH

    Remember to edit the path and version number.

    Add the entry to .bash_profile. Save the file. Open a new command shell. Type javac.

    What output do you get?

    If you still get a message that the command cannot be found, see a lab assistant or your instructor.

  7. Windows users only: Let's do the same with emacs so you don't have to type emacs-*/bin/emacs to start it. Find the exact name of your Emacs directory and add something like this to your .bash_profile file.
    export PATH=$PATH:~/emacs-24.1/bin

    Remember to edit the  version number.

    Open a new command shell. Type emacs &.

    What happens?

G. Windows Users: Your New Home

If you use Mac OS X or Linux, skip this part.

  1. Your life is a lot easier if you use c:\Cygwin\home\yourname as your home. Here is how.

    First, find out where your browser downloads files. It's usually something like c:\Users\yourname\Downloads.

    Now open a Cygwin shell and issue the command

    ln -s /cygdrive/c/Users/yourname/Downloads downloads

    In the path name for the download directory, replace c: with /cygdrive/c and all backslashes (\) with forward slashes (/). If the path to your download directory contains spaces, add a \ before each space.

    As you type each path segment, type the first few characters, then the Tab key.  That triggers  autocompletion, which makes it easier to type the correct path name.

    Type

    ls downloads

    What files do you see?

    Do you see the files in your browser's download directory, such as setup.exe? If not, type rm downloads and try the ln -s command again.

  2. Did you see lab3.zip in your downloads directory from the previous step? If not, download this file again. It should now show up if you run ls downloads. Now run
    mv downloads/lab3.zip .
    jar xvf lab3.zip

    What happens?

  3. Type ls lab3. What files and directories do you find?
  4. Start NetBeans and make another project from the directory c:\Cygwin\home\yourname\lab3. Run the project once. Just cancel when the dialog comes up.

    Type ls lab3 again. What files and directories do you find?

  5. Type
    cd lab3
    java -classpath build/classes AddressBookDemo

    What happens?

    Note how you no longer had to type painful path names. Life is better in ~ than in /cygdrive/c/Users/yourname.

If you got to this point, you can stop and complete the lab in the second week.

H. Command Line Arguments

  1. Remember our goal: we don't want to click through the GUI when we are testing. Fortunately, the program will switch to a console UI if we specify the name of the file on the command line:

    java -classpath /path/to/classes AddressBookDemo deptdir.txt

    When main starts, args[0] is the first argument after the program name, i.e. deptdir.txt.

    If you've ever wondered what the String[] args is good for in public static void main(String[] args), now you know...

    How does the AddressBookDemo program switch between GUI and console mode?

  2. Now run
    java -classpath /path/to/classes AddressBookDemo

    Again, replace the /path/to with the actual path.

    What happens? Why?

  3. Ok, we need to supply a file name.

    java -classpath /path/to/classes AddressBookDemo deptdir.txt

    What happens?

    For now, type option 5.
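
The way command line arguments reach main can be sketched with a tiny program. This is only an illustration: ArgsDemo and its describe method are hypothetical names, not part of the lab3 project, and the sketch does not show how AddressBookDemo itself uses its arguments.

```java
public class ArgsDemo
{
   // Summarizes the arguments that the shell passed to the program.
   public static String describe(String[] args)
   {
      if (args.length == 0)
         return "no arguments";
      return "args[0] = " + args[0];
   }

   public static void main(String[] args)
   {
      System.out.println("Number of arguments: " + args.length);
      System.out.println(describe(args));
   }
}
```

After compiling, java ArgsDemo deptdir.txt prints a count of 1 followed by args[0] = deptdir.txt, and java ArgsDemo without anything after the class name reports no arguments.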

I. Input Redirection

Run the console program again (with two keystrokes :-))

Select option 2, then enter the name Horstmann. Then select option 5 to quit:

1: Add/Change Entry
2: Look Up Entry
3: Remove Entry
4: Save Directory
5: Exit
Enter command: 2
Enter name: Horstmann
Enter key: Phone
Value: (408) 924-5085
1: Add/Change Entry
2: Look Up Entry
3: Remove Entry
4: Save Directory
5: Exit
Enter command: 5

Now we don't even want to type the inputs! Let's make a file input.txt containing the four lines

2
Horstmann
Phone
5

Follow these instructions to make the file:

Mac OS X       touch input.txt
               open -e input.txt
Ubuntu Linux   gedit input.txt
Windows        Notepad input.txt

(Annoyingly, the Mac text editor refuses to open a file that doesn't exist. touch updates the time stamp of a file, but it also creates the file if it doesn't exist.)

Put in the four lines, save the file, and quit the text editor. Now run

java -classpath /path/to/classes AddressBookDemo deptdir.txt < input.txt

As always, remember to hit ↑ and just add the < input.txt to the end of the command line.

The < symbol means “read keystrokes from a file”, or, more formally “redirect System.in to a file”.

Now edit input.txt to add a lookup for Diaz.

  1. What is the content of input.txt now?

  2. What happens when you run the program with this input.txt?
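
From the program's point of view, input redirection is invisible. Here is a minimal sketch, assuming a hypothetical class StdinDemo that is not part of the lab3 project: it reads lines from System.in in the same way whether they come from the keyboard or from a file named after the < symbol.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class StdinDemo
{
   // Reads all lines from the given stream, whatever its source is.
   public static List<String> readLines(InputStream in)
   {
      List<String> lines = new ArrayList<>();
      Scanner scanner = new Scanner(in, "UTF-8");
      while (scanner.hasNextLine())
         lines.add(scanner.nextLine());
      return lines;
   }

   public static void main(String[] args)
   {
      // System.in is the keyboard unless the shell redirected it with <
      for (String line : readLines(System.in))
         System.out.println("Read: " + line);
   }
}
```

Running java StdinDemo < input.txt should echo each line of input.txt with the prefix Read: and then stop, because the end of the file ends the input.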

J. Output Redirection

Now we are ready to do some serious test automation. We want to test that adding an entry works correctly. Here is the plan:

  1. What input.txt file tests this scenario?

  2. Now let's capture the output. Run this command.

    java -classpath /path/to/classes AddressBookDemo deptdir.txt < input.txt > output.txt

    Remember to hit ↑ and just add the > output.txt to the end of the command line...

    What are the contents of output.txt? How do you know?

  3. Another way of checking contents of a file is the cat command. Type

    cat output.txt

    What happens?

  4. Here is a good reason for saving the output. It often happens that you make a change to a program, and you want to run a test case again to check that it still works. Save the output file:

    cp -v output.txt expected.txt

    What are the contents of expected.txt? How did you check it?

  5. Run the program again and capture its output in output.txt. Then compare the two:

    diff output.txt expected.txt

    The diff command compares two files and prints their differences. If there aren't any, it prints nothing.

    Run the AddressBookDemo program, and the diff command as described. What happens? Why?

  6. Change input.txt by adding a 4 before the last line, i.e.

    ...
    2
    Diaz
    Phone
    4
    5

    With the new input.txt, run the AddressBookDemo program and the diff command as described. Then do it again. What happens? Why?

    We'll do more test automation when we implement removal in the next lab.

  7. One last question:

    For each of these commands, give a one-sentence description of what it does: ls, pwd, cp, cat, diff.
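
The idea behind comparing output.txt with expected.txt can be sketched in Java. This is a much simpler check than the real diff command: it only reports the position of the first line that differs. CompareDemo and firstDifference are hypothetical names, and the sample lines in main are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class CompareDemo
{
   // Returns the index of the first differing line, or -1 if the lists are identical.
   public static int firstDifference(List<String> expected, List<String> actual)
   {
      int common = Math.min(expected.size(), actual.size());
      for (int i = 0; i < common; i++)
      {
         if (!expected.get(i).equals(actual.get(i)))
            return i;
      }
      // One list is a prefix of the other: they differ where the shorter one ends
      if (expected.size() != actual.size())
         return common;
      return -1;
   }

   public static void main(String[] args)
   {
      // Made-up sample lines, not actual program output
      List<String> expected = Arrays.asList("first line", "second line");
      List<String> same = Arrays.asList("first line", "second line");
      List<String> changed = Arrays.asList("first line", "another line");
      System.out.println(firstDifference(expected, same));    // prints -1
      System.out.println(firstDifference(expected, changed)); // prints 1
   }
}
```

Like diff, the check stays quiet (here, returns -1) when nothing has changed, which is what makes it useful for rerunning a test after every change to the program.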

K. Spying on Tabs, Newlines, and Character Encodings

  1. Look at this file in Notepad (Windows)/TextEdit (Mac)/gedit (Linux). What do you notice about the way that the code lines up?
  2. That looks terrible, right? The culprit is the tab character. Many programming editors insert tab characters to line up code. For example,
    if (x > y)
    {
       y = x;
    }

    is actually

    if (x > y)
    {
    \ty = x;
    }

    where \t denotes a tab.

    When the file is displayed, the tab is shown as some number of spaces. How many spaces? That's the problem—nobody agrees. NetBeans thinks it should be 4, Notepad thinks it should be 8.

    Here is how you can see the tabs. Load the file into Emacs and run

    Alt-X hexl-mode Enter

    That is, type Alt-X, then hexl-mode in the status line, then Enter.

    Never seen one of these? Congratulations, you've reached level 3.

    You are seeing the encoding of each byte in the file. To the right, you see the characters, and if you look carefully as you move the arrow keys, you can see how they correspond.

    For example, move the cursor on the lowercase b in public on the right hand side.  In the hex display, you will see 62. That's the code for b.

    What is the code for lowercase c? How do you know?

  3. Hexadecimal is like decimal, but for people with 16 fingers. It has extra digits A B C D E F with decimal values 10 11 12 13 14 15. And 62 isn't 6 x 10 + 2 but 6 x 16 + 2 or 98. It's used for showing byte values because it's more compact than decimal. The range 0 - 255 turns into the range 00 - FF in hex: FF is 15 x 16 + 15 = 255. You don't have to worry about the details. What matters is that the hex dump shows truthfully what is in the file, not what the editor wants you to see.

    Look for spaces (with code 20) and tabs (with code 09). What is the first line of code in which you see each?
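
The hexadecimal arithmetic described above can be checked with a few lines of Java. This is a minimal sketch; HexDemo, toHex, and fromHex are hypothetical helper names chosen for illustration.

```java
public class HexDemo
{
   // Formats a byte value (0 to 255) as two hex digits, as in a hex dump.
   public static String toHex(int byteValue)
   {
      return String.format("%02X", byteValue);
   }

   // Converts a string of hex digits to its decimal value.
   public static int fromHex(String hex)
   {
      return Integer.parseInt(hex, 16);
   }

   public static void main(String[] args)
   {
      System.out.println(fromHex("62"));        // 6 x 16 + 2, prints 98
      System.out.println((char) fromHex("62")); // prints b
      System.out.println(toHex(255));           // prints FF
      System.out.println(toHex('\t'));          // the tab character, prints 09
   }
}
```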

  4. You don't want tabs. In NetBeans, here is how you turn them off. Select Tools -> Options -> Editor -> Formatting and check Expand tabs to spaces.

    How do you turn off tabs in Emacs? Hint: Look into your ~/.emacs file.

    Don't use tabs. There is no advantage and only pain. I am not the only one who thinks this. Some people say that you should only use tabs and never spaces. In theory, that would work—it's the mixture of tabs and spaces that causes the problem. But how confident are you that you and your collaborators won't ever mix them? BTW, I didn't say “Don't use the Tab key”. The Tab key is fine. Just tell your editor to insert spaces when you press it.

  5. Now look at this file in Notepad in Windows. If you don't have Windows, peek at the laptop of someone who does. This file shows a different problem: line endings. In most operating systems, the end of a line is denoted by a single character, the newline with code 0A. In Windows, however, two characters are expected: a "carriage return" with code 0D and a newline.

    What's a carriage return?

    Remember these?

    In the olden days, you had to move the "carriage" back to the left of the paper, and then advance the paper one line. Or, you could not advance the paper and print over the same line multiple times, for example to  strike out characters.

    Just in case you ever need to run Windows on a typewriter, every line must end with 0D 0A.

    Look at ExhibitB.txt in hexl-mode. What do you see at the end of each line?

  6. Now you know why the lines didn't move back to the left in Notepad. You'd think that Microsoft could figure out how to fix this, but apparently not.

    And they are not the only culprit. Look at this file in hexl-mode. What do you see at the end of each line?

  7. Now try running
    sh ExhibitC.txt

    What happens?

  8. That's not what should be happening. The commands in the file should be executed. The shell gets confused by those extra 0D that it doesn't expect. You'd think that it could figure out how to ignore them, but it doesn't.

    To fix this, run

    dos2unix ExhibitC.txt

    (DOS is the precursor to Windows.)

    Now look at the file in hexl-mode again. (Close the old one and reload it.)

    What happens?

  9. Now run
    sh ExhibitC.txt

    What happens? Why does it work now?

    Always use Unix-style line endings. The Emacs configuration that I gave you takes care of that. And don't use Notepad.
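
The difference between the two line ending styles, and the essence of the conversion, can be sketched in Java. This is only an illustration under the assumption that the text is plain ASCII; the real dos2unix program handles more cases. LineEndingDemo, toUnix, and hex are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class LineEndingDemo
{
   // Replaces each Windows line ending (0D 0A) with a Unix line ending (0A).
   public static String toUnix(String text)
   {
      return text.replace("\r\n", "\n");
   }

   // Shows the bytes of an ASCII string as hex values, like hexl-mode does.
   public static String hex(String text)
   {
      StringBuilder result = new StringBuilder();
      for (byte b : text.getBytes(StandardCharsets.US_ASCII))
      {
         if (result.length() > 0)
            result.append(' ');
         result.append(String.format("%02X", b & 0xFF));
      }
      return result.toString();
   }

   public static void main(String[] args)
   {
      String windowsLine = "ls\r\n";
      System.out.println(hex(windowsLine));         // prints 6C 73 0D 0A
      System.out.println(hex(toUnix(windowsLine))); // prints 6C 73 0A
   }
}
```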

  10. How can you fix ExhibitB.txt so that Notepad won't choke?
  11. One byte can encode 256 different values. There are tens of thousands of different characters in the different alphabets used on our fair planet, so one needs to use more than one byte to encode all of them. Unfortunately, there are different encoding schemes. Generally, the most useful one is the so-called UTF-8 encoding. Here is a file that encodes San José in UTF-8.

    What is the UTF-8 encoding for é?

  12. Here is a file that encodes San José in ISO 8859-1, another popular encoding that can only represent 256 characters (the 128 ASCII characters and a selection of accented characters that are useful for Western European languages).

    What is the ISO 8859-1 encoding for é?

  13. Now type
    cat ExhibitD.txt
    cat ExhibitE.txt

    Which one looks correct? (This depends entirely on how your system is configured.)

  14. Why can't your system pick the correct encoding for each file?

    Always use UTF-8 for your files unless you have an ironclad reason not to do so. (“I didn't know” isn't such a reason.) The Emacs configuration that I gave you makes UTF-8 the preferred encoding. When you read a file in a Java program, always open the scanner with UTF-8: new Scanner(file, "UTF-8"). Otherwise, your program will use the character encoding of the grader's operating system, and you don't know what that is.
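
To see how one character turns into different bytes in different encodings, here is a minimal Java sketch. It uses a different letter than the exhibits (a lowercase u with umlaut, U+00FC), written as a Unicode escape so that the source file itself stays pure ASCII. EncodingDemo and hex are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo
{
   // Formats bytes as two-digit hex values separated by spaces, like a hex dump.
   public static String hex(byte[] bytes)
   {
      StringBuilder result = new StringBuilder();
      for (byte b : bytes)
      {
         if (result.length() > 0)
            result.append(' ');
         result.append(String.format("%02X", b & 0xFF));
      }
      return result.toString();
   }

   public static void main(String[] args)
   {
      String text = "\u00FC"; // lowercase u with umlaut
      byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
      byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
      System.out.println(hex(utf8));   // prints C3 BC
      System.out.println(hex(latin1)); // prints FC
      // Decoding UTF-8 bytes with the wrong encoding yields two characters, not one
      String garbled = new String(utf8, StandardCharsets.ISO_8859_1);
      System.out.println(garbled.length()); // prints 2
   }
}
```

The bytes alone do not say which encoding produced them, so a program that guesses wrong shows garbage instead of the intended letter.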

  15. Having sung the praises of UTF-8, it's not without pitfalls either. Open up the following file: test1.out. What is the first line?
  16. Now compile this program
    import java.util.*;
    
    public class Test
    {
       public static void main(String[] args)
       {
          Scanner in = new Scanner(System.in);
          String line = in.nextLine();
          if (line.startsWith(args[0]))
             System.out.println("match");
          else
             System.out.println("no match");
       }
    }

    What do you expect to happen when you run

    java Test Bahrain < test1.out
  17. What actually happens? (Get the program to run first—when you compile and run it in the same directory as the one containing test1.out, the program will run and print something.)
  18. Open up the test1.out file in Emacs with hexl-mode. What are the first five bytes?
  19. What letters are denoted by the fourth and fifth byte?
  20. The first three bytes are the “byte order mark” U+FEFF in UTF-8. This requires some explanation. There are 16-bit encodings of Unicode, where each character is encoded as a sequence of one or more 16-bit quantities (values between 0 and 65535). Each of them is in turn represented by two 8-bit bytes.
    xxxx xxxx | xxxx xxxx
    <-byte1->   <-byte0->

    There are two possible ways of saving these two bytes in a file: byte1 first and then byte0, or byte0 first and then byte1.

    Which one do you think is more reasonable?

  21. You are right, of course, but unfortunately, both ways occur in practice. To distinguish the two, the following clever scheme is used by 16-bit encodings in Unicode. Make the file start with FEFF, the byte order mark, which is required to be ignored. The flipped value FFFE is guaranteed not to be a valid Unicode character.

    Why does that help with reading a file with a 16-bit encoding?

  22. The first three bytes that you have seen are the UTF-8 encoding of the byte order mark. Why is a byte order mark not actually needed in a UTF-8 file?
  23. Nevertheless, there it is. Microsoft likes to put it into UTF-8 files as an indicator that they are, well, UTF-8. It's perfectly legal, and it's not a bad idea. Explain how this might help distinguish a UTF-8 file from a file encoded in, say, ISO 8859-1 or UTF-16.
  24. Why does the Java program fail? Modify Test.java so that the last line reads:
    System.out.printf("no match: %x\n" , Integer.valueOf(line.charAt(0)));

    Run

    java Test Bahrain < test1.out

    again. What happens?

  25. The Unicode standard requires that a program ignore the byte order mark at the beginning of a file. Java fails to ignore it. Check out this and this bug report. Vote for getting them fixed! What simple fix should Oracle make to the Scanner class?
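
Until the library handles this, a program can work around the problem itself. Here is a minimal sketch, assuming a hypothetical helper stripBom that is not part of the Java library. The sample bytes in main are made up for illustration (the three byte order mark bytes followed by ASCII text); they are not read from test1.out.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo
{
   // Removes a leading byte order mark (U+FEFF), if there is one.
   public static String stripBom(String line)
   {
      if (line.startsWith("\uFEFF"))
         return line.substring(1);
      return line;
   }

   public static void main(String[] args)
   {
      // Hypothetical sample: byte order mark EF BB BF, then ASCII letters
      byte[] bytes = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'B', 'a', 'h', 'r', 'a', 'i', 'n' };
      String line = new String(bytes, StandardCharsets.UTF_8);
      System.out.println(line.startsWith("Bahrain"));           // prints false
      System.out.println(stripBom(line).startsWith("Bahrain")); // prints true
   }
}
```

In a program like Test, one would apply stripBom to the first line only, since the byte order mark can occur only at the very beginning of a file.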