What I Learned Over the Weekend About HTTP

In the 13th edition of Core Java, the sample web sites that I used for form posts no longer exist, so I decided to provide my own, and also support file upload. But my file upload example failed mysteriously. Here is what I learned about HTTP and the standard Java HttpClient.

The HttpClient API

Ever since Java 1.0, the Java API has had a URLConnection class for making HTTP requests. It requires an oddball sequence of calls to make a POST request:

  1. Call url.openConnection() to get a connection
  2. Set any request properties on the connection
  3. Call setDoOutput(true) on the connection
  4. Write to the stream returned by getOutputStream()
  5. Close the output stream
  6. Read the response headers
  7. Read the response data from the connection's input stream
  8. In case of error, there is also an error stream
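
Spelled out in code, the eight steps above look roughly like the following. This is only a sketch: the class name UrlConnectionPost and the post helper are made up for illustration, the body is assumed to be UTF-8 text, and the error handling is the bare minimum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

class UrlConnectionPost
{
   // Hypothetical helper: posts an already encoded body and returns the response text
   static String post(URL url, String contentType, String body) throws IOException
   {
      URLConnection connection = url.openConnection();                      // Step 1
      connection.setRequestProperty("Content-Type", contentType);           // Step 2
      connection.setDoOutput(true);                                         // Step 3
      try (OutputStream out = connection.getOutputStream())                 // Steps 4 and 5
      {
         out.write(body.getBytes(StandardCharsets.UTF_8));
      }
      Map<String, List<String>> headers = connection.getHeaderFields();     // Step 6 (unused here)
      try (InputStream in = connection.getInputStream())                    // Step 7
      {
         return new String(in.readAllBytes(), StandardCharsets.UTF_8);
      }
      catch (IOException e)                                                 // Step 8
      {
         // For HTTP error responses, the body is in the error stream instead
         InputStream err = connection instanceof HttpURLConnection
            ? ((HttpURLConnection) connection).getErrorStream() : null;
         if (err == null) throw e;
         try (err)
         {
            throw new IOException(new String(err.readAllBytes(), StandardCharsets.UTF_8), e);
         }
      }
   }
}
```

Note that nothing in this sequence helps with producing the body string itself.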

To post a form or file upload, you have to manually encode the request body.

No wonder that many programmers use a more convenient library such as Apache HttpComponents.

In Java 11, the HttpClient class promised three improvements:

The modern HTTP client lives up to the first two promises. What about the third? Let's look at the API.

  1. Get a client:
    HttpClient client = HttpClient.newHttpClient();
    
  2. Make a request:
    HttpRequest request = HttpRequest.newBuilder()
       .uri(new URI(urlString))
       .header("Content-Type", "application/json")
       .POST(HttpRequest.BodyPublishers.ofString(jsonString))
       .build();
    
  3. Get the response:
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    
  4. Read the response:

    String bodyString = response.body();
    

All good and well, except, what's the deal with those body publishers and handlers???

The Body Publisher and Handler

The request content might not be in a string, but it could be in a file or an input stream or a byte array. That's where the BodyPublisher comes in. The BodyPublishers helper class lets you wrap a string, a file, an input stream, a byte array, or a sequence of byte arrays, or a sequence of body publishers, into a BodyPublisher.
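
Here is a small tour of those factory methods, as a sketch. The class name PublisherTour and the contentLengths method are invented for this illustration, and BodyPublishers.concat requires Java 16 or later. The sketch asks each publisher for its contentLength(), which is negative when the length is not known up front.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.net.http.HttpRequest.BodyPublisher;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PublisherTour
{
   // Hypothetical helper: maps each factory method name to the reported content length
   static Map<String, Long> contentLengths() throws IOException
   {
      byte[] bytes = { 1, 2, 3 };
      Path file = Files.createTempFile("upload", ".txt");
      try
      {
         Files.writeString(file, "Hello");
         var publishers = new LinkedHashMap<String, BodyPublisher>();
         publishers.put("ofString", BodyPublishers.ofString("Hello"));
         publishers.put("ofFile", BodyPublishers.ofFile(file));
         publishers.put("ofByteArray", BodyPublishers.ofByteArray(bytes));
         publishers.put("ofInputStream",
            BodyPublishers.ofInputStream(() -> new ByteArrayInputStream(bytes)));
         publishers.put("ofByteArrays", BodyPublishers.ofByteArrays(List.of(bytes, bytes)));
         // Java 16+: a sequence of body publishers
         publishers.put("concat", BodyPublishers.concat(
            BodyPublishers.ofString("Hello"), BodyPublishers.ofByteArray(bytes)));

         var lengths = new LinkedHashMap<String, Long>();
         publishers.forEach((name, publisher) -> lengths.put(name, publisher.contentLength()));
         return lengths;
      }
      finally
      {
         Files.deleteIfExists(file);
      }
   }
}
```

The string, file, and byte array publishers know their length. The input stream and ofByteArrays publishers report a negative value, since the client cannot tell the total size without consuming the data.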

What about form posts or file uploads? No, you still have to format those requests by hand.

In many years of web programming, I have exactly never had a request body that was an input stream or byte array. It's always been a JSON post, a classic form post or a file upload.

The Java API doesn't have support for JSON, perhaps because we are still waiting to do this efficiently with value objects. And it doesn't provide a BodyPublisher for form posts or file uploads either. I don't know why, and I added implementations in Core Java.

With body handlers, it's not so bad, but maybe strings could have been the default? A handler with a JSON result would also be nice, but the Java API doesn't yet have support for JSON.

The Core Java Examples

In the Java 17 edition of Core Java, I provided three examples:

  1. A classic form post to http://www.physics.csbsju.edu/cgi-bin/stats/t-test_paste.n.plot (now dead). As a matter of fact, I had a hard time finding such a site because public APIs with form posts have become very rare.
  2. A JSON post to http://www.reverso.net/WebReferences/WSAJAXInterface.asmx/TranslateWS (now dead). I am sure Reverso still has an API, but maybe now it has an API key or a fee or both. There aren't a lot of public APIs with JSON POST, since one can usually use a GET request for the kinds of queries that public APIs support.
  3. A file upload to a data URI generator that is blessedly still live. But who knows for how long?

So I decided to take matters into my own hands and added a simple service to the CodeCheck autograder. It runs a submitted Java program with given input and returns the output. You can submit the program and the input via JSON, a form post, or a file upload.

Fussy File Upload

It is easy enough to make a body publisher for the application/x-www-form-urlencoded content type:

   public static BodyPublisher ofFormData(Map<Object, Object> data) 
   {
      boolean first = true;
      var builder = new StringBuilder();
      for (Map.Entry<Object, Object> entry : data.entrySet()) 
      {
         if (first) first = false; else builder.append("&");
         builder.append(URLEncoder.encode(entry.getKey().toString(),
            StandardCharsets.UTF_8));
         builder.append("=");
         builder.append(URLEncoder.encode(entry.getValue().toString(), 
            StandardCharsets.UTF_8));
      }
      return BodyPublishers.ofString(builder.toString());
   }

Which makes it all the more surprising that the Java API doesn't do it.
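
To see what the encoded body actually looks like, it helps to produce it as a plain string. The following variant is a sketch with made-up names (FormEncodingDemo, encode). It applies the same URLEncoder calls with a stream instead of a loop and stops short of wrapping the result in a BodyPublisher.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

class FormEncodingDemo
{
   // Hypothetical helper: returns the application/x-www-form-urlencoded body as a string
   static String encode(Map<?, ?> data)
   {
      return data.entrySet().stream()
         .map(entry ->
            URLEncoder.encode(entry.getKey().toString(), StandardCharsets.UTF_8)
            + "="
            + URLEncoder.encode(entry.getValue().toString(), StandardCharsets.UTF_8))
         .collect(Collectors.joining("&"));
   }
}
```

For a map with the entries name → "Hello World" and expr → "a&b=c", in that order, encode yields name=Hello+World&expr=a%26b%3Dc. Spaces turn into plus signs, and the reserved characters & and = are percent-encoded so that they can't be mistaken for separators.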

Tangential note: The StandardCharsets.UTF_8 is still there since URLEncoder didn't get the memo on JEP 400: UTF-8 by Default.

File upload turned out to be fiddlier. I made a subtle mistake in the previous edition of Core Java, which that data URI service forgave. But not the framework that my CodeCheck server uses. It came back with

<h1>Bad Request</h1>
<p id="detail">
   For request 'POST /run' [Unexpected end of input]
</p>

What was bad? The server was coy about it. And it could not show the body of an incoming request that had a parse error.

Perhaps the server implementors had thought that malformed requests would be rare. They may have never contemplated that some HTTP client APIs force their users to handcraft the fiddly details of file upload.

What about the client API? Can it show the request body? Nope, only the headers. Comment in this StackOverflow post: “Bang up job, Oracle!”

There is a reason for this. Someone would have to subscribe to a flow and figure out how to turn the flow messages into a comprehensible log. That Flow abstraction is there for some technical optimization, not for the benefit of the API users. And perhaps the implementors found it tedious too, or they would have done the logging.

The next step in debugging is to use some simple local HTTP server that prints out the requests. Sadly not the JEP 408: Simple Web Server, which can only handle GET requests. The JavaScript universe has many choices, such as this one. Or you can just use netcat:

nc -kl 8888

I did that. And was very surprised about the results.

I used a BodyPublishers.ofByteArrays with a list of byte[] for the bits and pieces of the file upload protocol. The netcat output looked like this:

POST /run HTTP/1.1
Connection: Upgrade, HTTP2-Settings
Host: localhost:8888
HTTP2-Settings: AAEAAEAAAAIAAAABAAMAAABkAAQBAAAAAAUAAEAA
Transfer-encoding: chunked
Upgrade: h2c
User-Agent: Java-http-client/17.0.5
Content-Type: multipart/form-data; boundary=7693dafd05a2418c80cbb970ef8d8ec6

49
--7693dafd05a2418c80cbb970ef8d8ec6
Content-Disposition: form-data; name=
36
"Input"; filename="Input"
Content-Type: text/plain


1


2


49
--7693dafd05a2418c80cbb970ef8d8ec6
Content-Disposition: form-data; name=
4b
"HelloWorld.java"; filename="HelloWorld.java"
Content-Type: text/x-java


97
// Our first Java program

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, World!");
    }
}

2


26
--7693dafd05a2418c80cbb970ef8d8ec6--

There were the bits and pieces of the multipart form-data protocol, with its characteristic boundaries. But what about those numbers, pretty clearly the length of the fragments? I had never seen this before, but it is perfectly legitimate. It is called chunked transfer encoding.

Which explains why my implementation worked with some web servers. But why not with mine?
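
As an aside on the framing in the netcat dump: each chunk consists of its size in hexadecimal, CRLF, that many bytes of data, and another CRLF, with a chunk of size zero marking the end. The decoder below is a minimal sketch of that format, not production code. The names ChunkedDecoder and decode are made up, and chunk extensions and trailers are ignored.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ChunkedDecoder
{
   // Hypothetical helper: strips the chunk framing from a chunked message body
   static byte[] decode(byte[] raw)
   {
      var out = new ByteArrayOutputStream();
      int pos = 0;
      while (true)
      {
         int eol = indexOfCrlf(raw, pos);
         if (eol < 0) throw new IllegalArgumentException("Missing chunk size line");
         String sizeLine = new String(raw, pos, eol - pos, StandardCharsets.US_ASCII);
         int semicolon = sizeLine.indexOf(';'); // Chunk extensions, if any, are ignored
         if (semicolon >= 0) sizeLine = sizeLine.substring(0, semicolon);
         int size = Integer.parseInt(sizeLine.trim(), 16); // Sizes are hexadecimal
         pos = eol + 2;
         if (size == 0) return out.toByteArray(); // Last chunk; trailers are ignored
         if (pos + size + 2 > raw.length) throw new IllegalArgumentException("Truncated chunk");
         out.write(raw, pos, size);
         pos += size + 2; // Skip the data and the CRLF that follows it
      }
   }

   private static int indexOfCrlf(byte[] raw, int from)
   {
      for (int i = from; i + 1 < raw.length; i++)
         if (raw[i] == '\r' && raw[i + 1] == '\n') return i;
      return -1;
   }
}
```

Since the sizes are hexadecimal, the 49 at the start of the dump announces a chunk of 73 bytes, not 49.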

I lamented to my long-suffering wife that I tried everything, looked at every single boundary and counted every last newline. She was annoyed.

But wait. Newline? This is HTTP. One of the last places where lines end in CRLF. Sure enough, when I changed all my \n to \r\n, my server was happy.
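
For reference, here is a sketch of a multipart/form-data body with CRLF in all the right places. It is not the implementation from Core Java. The class MultipartBody and its methods are made up for this illustration, and the sketch assumes that names and file names contain no quotation marks or line breaks that would need escaping.

```java
import java.io.ByteArrayOutputStream;
import java.net.http.HttpRequest.BodyPublisher;
import java.net.http.HttpRequest.BodyPublishers;
import java.nio.charset.StandardCharsets;

class MultipartBody
{
   private static final String CRLF = "\r\n"; // Not "\n"

   private final String boundary;
   private final ByteArrayOutputStream out = new ByteArrayOutputStream();

   MultipartBody(String boundary) { this.boundary = boundary; }

   // Adds one file part: delimiter line, part headers, empty line, content, CRLF
   MultipartBody addFile(String name, String filename, String contentType, byte[] content)
   {
      write("--" + boundary + CRLF);
      write("Content-Disposition: form-data; name=\"" + name
         + "\"; filename=\"" + filename + "\"" + CRLF);
      write("Content-Type: " + contentType + CRLF);
      write(CRLF);
      out.writeBytes(content);
      write(CRLF);
      return this;
   }

   // The complete body, ending with the closing delimiter --boundary--
   byte[] toByteArray()
   {
      var result = new ByteArrayOutputStream();
      result.writeBytes(out.toByteArray());
      result.writeBytes(("--" + boundary + "--" + CRLF).getBytes(StandardCharsets.UTF_8));
      return result.toByteArray();
   }

   // Value for the Content-Type request header
   String contentType() { return "multipart/form-data; boundary=" + boundary; }

   BodyPublisher toPublisher() { return BodyPublishers.ofByteArray(toByteArray()); }

   private void write(String s) { out.writeBytes(s.getBytes(StandardCharsets.UTF_8)); }
}
```

A request would pass toPublisher() to POST and contentType() to the Content-Type header. Because the whole body sits in a single byte array, the publisher reports a known content length, so one would expect the client to send a Content-Length header rather than chunks. That costs memory for large files, which is the tradeoff of this simple approach.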

What Did I Learn Today?

  1. If you use the HttpClient to send and receive JSON, it's a minor nuisance that you have to deal with the string body publisher and handler.
  2. If you need to send an old-fashioned form post (application/x-www-form-urlencoded) or file upload (multipart/form-data), you need to manually encode the request body. What a pain. The Java 21 edition of Core Java shows you how.
  3. In general, if you need to debug HttpClient traffic, it is nice that you can log headers. As for request bodies, use an echo server, or simply netcat.
  4. With HTTP, remember CRLF. But why did I have to go deep into those weeds?
  5. And HTTP is getting so complex that debugging it turns into a nightmare. With HTTP/1.1, that chunked encoding. But at least it is text. HTTP/2 and HTTP/3? Forget it. That's why a client library can't just punt on fiddly parameter encodings.
  6. When you design an API, and you find yourself dragging in concepts such as Flow that are of no interest to API users, maybe you are on the wrong track? And if you make common use cases tedious, that's not so good either.
  7. Enable users to log everything!

I realize that this may come across as unkind to the Java API designers. But still, the good news is that there is a standard API in the platform. In other ecosystems, I have found myself pondering the relative merits of a number of half-baked and poorly supported libraries.
