Just Enough Developed Infrastructure

Java Servlets and Large, Large file Uploads: enter apache fileupload

Having examined the Rails alternatives for large file uploads, we turned to other options, and yes, we found one!
It's the Java FileUpload component from the Apache Commons project. It has the following features:

  • you can control the memory size within your servlet
  • you have direct access to the incoming stream without any temporary file
  • you have a streaming api for processing the different multipart streams

This document describes how to use:
  • the non-streaming API
  • the streaming version
  • the streaming version + combinations of writing FileIO using various buffers.

The basic example (non-streaming):

As described in http://commons.apache.org/fileupload/using.html, you can specify <yourMaxMemorySize>, <yourTempDirectory> and <yourMaxRequestSize>:
// Create a factory for disk-based file items
DiskFileItemFactory factory = new DiskFileItemFactory();
// Set factory constraints
factory.setSizeThreshold(yourMaxMemorySize);
factory.setRepository(yourTempDirectory);
// Create a new file upload handler
ServletFileUpload upload = new ServletFileUpload(factory);

// Set overall request size constraint
upload.setSizeMax(yourMaxRequestSize);
// Parse the request
List /* FileItem */ items = upload.parseRequest(request);


Processing the items is easy: iterate over them and check whether each one is a form field or a file part.
// Process the uploaded items
Iterator iter = items.iterator();
while (iter.hasNext()) {
    FileItem item = (FileItem) iter.next();
    if (item.isFormField()) {
        processFormField(item);
    } else {
        processUploadedFile(item);
    }
}
You access the file characteristics like this:
// Process a file upload
if (!item.isFormField()) {
     String fieldName = item.getFieldName();
     String fileName = item.getName();
     String contentType = item.getContentType();
     boolean isInMemory = item.isInMemory();
     long sizeInBytes = item.getSize();
    ...
}
In the default implementation of FileUpload, write() will attempt to rename the file to the specified destination if the data is already in a temporary file. If you prefer, you can read the stream directly instead.
// Process a file upload
if (writeToFile) {
    File uploadedFile = new File(...);
    item.write(uploadedFile);
} else {
    InputStream uploadedStream = item.getInputStream();
    ...
    uploadedStream.close();
}

The streaming API:
As described in http://commons.apache.org/fileupload/streaming.html, FileUpload provides a streaming API that lets your servlet handle the upload without it first being written to disk.
// Check that we have a file upload request
boolean isMultipart = ServletFileUpload.isMultipartContent(request);
Now we are ready to parse the request into its constituent items. Here's how we do it:
// Create a new file upload handler
ServletFileUpload upload = new ServletFileUpload();
// Parse the request
FileItemIterator iter = upload.getItemIterator(request);
while (iter.hasNext()) {
    FileItemStream item = iter.next();
    String name = item.getFieldName();
    InputStream stream = item.openStream();
    if (item.isFormField()) {
        System.out.println("Form field " + name + " with value " + Streams.asString(stream) + " detected.");
    } else {
        System.out.println("File field " + name + " with file name " + item.getName() + " detected.");
        // Process the input stream
        ...
    }
}
How to write it to disk most efficiently:
We're almost there. Now that we have the name and the stream, we can write the file to the correct place on disk, without temp files or overloaded memory!
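Before picking the place to write to, it may be worth cleaning up the name the client sent, since some clients send a full path rather than a bare file name. Below is a minimal sketch of that step using only the JDK. The class `UploadPaths` and its methods are hypothetical helpers for illustration, not part of commons-fileupload, and the upload directory is whatever you choose.

```java
import java.io.File;
import java.io.IOException;

final class UploadPaths {
    private UploadPaths() {}

    /** Strips any directory part (Unix or Windows separators) from a client-supplied name. */
    static String baseName(String clientName) {
        int slash = Math.max(clientName.lastIndexOf('/'), clientName.lastIndexOf('\\'));
        return clientName.substring(slash + 1);
    }

    /** Resolves the name inside uploadDir and rejects anything that would end up outside it. */
    static File resolveTarget(File uploadDir, String clientName) throws IOException {
        String name = baseName(clientName);
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            throw new IOException("Unusable file name: " + clientName);
        }
        File target = new File(uploadDir, name).getCanonicalFile();
        if (!uploadDir.getCanonicalFile().equals(target.getParentFile())) {
            throw new IOException("Target escapes upload directory: " + clientName);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption for the demo: uploads go to the JVM's temp directory.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(resolveTarget(dir, "C:\\Users\\me\\report.pdf").getName());
    }
}
```

In the streaming loop you would pass item.getName() as the client name and hand the resulting File to one of the write options.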
http://java.sun.com/docs/books/performance/1st_edition/html/JPIOPerformance.fm.html describes different ways to write your file to disk.
Option 1: The naive way: we take the input stream and write it byte by byte to an output stream.
FileOutputStream fout= new FileOutputStream (yourPathtowriteto);
int byte_;
while ((byte_=stream.read()) != -1)
{
    fout.write(byte_);
}
fout.close();
Option 2: We use buffered streams to do the job.
FileOutputStream fout= new FileOutputStream ( yourPathtowriteto );
BufferedOutputStream bout= new BufferedOutputStream (fout);
BufferedInputStream bin= new BufferedInputStream(stream);

int byte_;
while ((byte_ = bin.read()) != -1) {
    bout.write(byte_);
}
bout.close();
bin.close();
Option 3: Use a byte array instead of copying byte by byte. You can experiment with different buffer sizes to see the effect; the best value depends on your filesystem block size, your disk cache size and so on.


 FileOutputStream fout = new FileOutputStream(yourPathtowriteto);
 BufferedOutputStream bout = new BufferedOutputStream(fout);
 BufferedInputStream bin = new BufferedInputStream(stream);
 byte[] buf = new byte[2048];
 int bytesRead;
 // read() may fill only part of the buffer, so write just the bytes actually read
 while ((bytesRead = bin.read(buf)) != -1)
 {
     bout.write(buf, 0, bytesRead);
 }
 bout.close();
 bin.close();
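Whatever buffer size you pick, the count returned by read() decides how many bytes of the array are valid; writing the whole array on every pass would append stale bytes whenever a read comes back short. Here is a small self-contained sketch of such a copy loop, checked against in-memory streams. The class name `ChunkedCopy` and its `copy` method are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkedCopy {
    private ChunkedCopy() {}

    /** Copies in to out using a buffer of bufferSize bytes and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n); // only the n bytes just read are valid
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000]; // deliberately not a multiple of the buffer size
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 2048);
        System.out.println(copied + " bytes copied, sink holds " + sink.size());
    }
}
```

In the servlet, `in` would be the stream from item.openStream() and `out` a FileOutputStream on your target path.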
Option 4: Using a *static* byte array: to avoid reallocation, we create a final buffer and use synchronized to control access to it.
Again, experiment with the buffer size.
 static final int BUFF_SIZE = 100000;
 static final byte[] buffer = new byte[BUFF_SIZE];

 FileOutputStream fout = new FileOutputStream(yourPathtowriteto);
 while (true) {
     // The buffer is shared, so only one thread at a time may read into it and write from it
     synchronized (buffer) {
         int amountRead = stream.read(buffer);
         if (amountRead == -1) {
             break;
         }
         fout.write(buffer, 0, amountRead);
     }
 }
 fout.close();
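To actually experiment with the buffer size, a rough timing harness like the one below may help. It is only a sketch: the class `BufferSizeExperiment`, the 16 MB sample file and the list of sizes are arbitrary choices for illustration, and the numbers you get depend heavily on the OS file cache, so treat them as a relative indication at best.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BufferSizeExperiment {
    private BufferSizeExperiment() {}

    /** Copies src to dst with the given buffer size and returns the elapsed nanoseconds. */
    static long timeCopy(File src, File dst, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buf = new byte[bufferSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return System.nanoTime() - start;
    }

    /** Creates a temporary file of the given size, filled with zero bytes. */
    static File createSample(int sizeBytes) throws IOException {
        File f = File.createTempFile("upload-sample", ".bin");
        f.deleteOnExit();
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(new byte[sizeBytes]);
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File src = createSample(16 * 1024 * 1024); // arbitrary test size
        File dst = File.createTempFile("upload-copy", ".bin");
        dst.deleteOnExit();
        int[] sizes = {512, 2048, 8192, 65536, 100000};
        for (int size : sizes) {
            long nanos = timeCopy(src, dst, size);
            System.out.println(size + " bytes: " + (nanos / 1000000) + " ms");
        }
    }
}
```

Running each size several times and ignoring the first pass gives a fairer picture, since the first copy also pays for JVM warm-up.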
Controlling network I/O in your appserver
We have optimized our file writing, but we can also improve our network handling.
You can look at Grizzly, GlassFish or Jetty to use the NIO capabilities of Java.