Sunday, October 14, 2007

Complexities in Copying Files with Java

Copying a file sounds like a simple operation, but while working with Java I found there is quite a lot to say about it.

The most basic way to copy a file is to open two streams and copy the data one byte at a time. This is very inefficient, but to start from the beginning, here is that version:

try (FileInputStream fin = new FileInputStream(fromFile);
     FileOutputStream fout = new FileOutputStream(newFile)) {
    int aByte;
    // read() returns -1 at end of stream; 0 is a valid byte value
    while ((aByte = fin.read()) != -1) {
        fout.write(aByte);
    }
}
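For experimenting, the byte-by-byte copy can be wrapped in a small self-contained class. The class name `ByteCopy` and the method `copy` are hypothetical, chosen only for illustration. The loop ends on `-1`, which is the end-of-stream value of `read()`, rather than on `0`, because a zero byte is ordinary data in a binary file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper class around the byte-by-byte copy (illustration only).
class ByteCopy {

    // Copies fromFile to newFile with one read()/write() call per byte: correct but slow.
    static void copy(File fromFile, File newFile) throws IOException {
        // try-with-resources closes both streams, even if an exception is thrown
        try (FileInputStream fin = new FileInputStream(fromFile);
             FileOutputStream fout = new FileOutputStream(newFile)) {
            int aByte;
            // read() returns 0..255 for data and -1 only at end of stream
            while ((aByte = fin.read()) != -1) {
                fout.write(aByte);
            }
        }
    }
}
```

Calling `ByteCopy.copy(src, dst)` copies the whole file, including any zero bytes it contains.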

We can improve the above code by using a fixed-size buffer instead of copying byte by byte. The choice of buffer size affects performance: it should be neither too small nor too large.

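byte[] buffer = new byte[BUFFER_SIZE]; // BUFFER_SIZE is a constant, e.g. 8192
try (FileInputStream fin = new FileInputStream(fromFile);
     FileOutputStream fout = new FileOutputStream(newFile)) {
    int bytesRead;
    // read() may fill only part of the buffer, so write exactly bytesRead bytes
    while ((bytesRead = fin.read(buffer)) != -1) {
        fout.write(buffer, 0, bytesRead);
    }
}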
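A sketch of the buffered copy with the buffer size passed in as a parameter, so that different sizes can be compared on real files. The class `BufferedCopy`, the constant `DEFAULT_BUFFER_SIZE` and its 8 KB value are assumptions for illustration, not measured optimums.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper: buffered copy with a configurable buffer size.
class BufferedCopy {

    // Assumed starting point only; the best size depends on the disk and OS.
    static final int DEFAULT_BUFFER_SIZE = 8 * 1024;

    // Copies fromFile to newFile and returns the number of bytes copied.
    static long copy(File fromFile, File newFile, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0L;
        try (FileInputStream fin = new FileInputStream(fromFile);
             FileOutputStream fout = new FileOutputStream(newFile)) {
            int bytesRead;
            // read(byte[]) may fill only part of the buffer, so write exactly bytesRead bytes
            while ((bytesRead = fin.read(buffer)) != -1) {
                fout.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```

For example, `BufferedCopy.copy(src, dst, BufferedCopy.DEFAULT_BUFFER_SIZE)` returns the number of bytes copied; timing the call with a few different sizes is one way to pick a reasonable value for a given machine.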

The above code can be considered trouble-free, and it always works, but Java NIO makes a significant performance improvement possible. In the stream approach, the OS reads the file content into its own memory, the data is then copied into the Java buffer, and it follows the same path back to be written to the destination file. With NIO, we can transfer the data directly between OS buffers without copying it into a Java buffer. Copying into the OS buffers is usually performed by the hardware drivers, so this operation takes fewer CPU cycles. The code looks something like this:

try (FileChannel in = new FileInputStream(src.getAbsoluteFile()).getChannel();
     FileChannel out = new FileOutputStream(dst.getAbsoluteFile()).getChannel()) {
    long size = in.size();
    long bytesTransferred = 0L;
    // transferTo() may move fewer bytes than requested, so loop until all are written
    for (long bytesWritten = 0L; bytesWritten < size; bytesWritten += bytesTransferred) {
        bytesTransferred = in.transferTo(bytesWritten,
                Math.min(CHANNEL_TRANSFER_SIZE, size - bytesWritten), out);
    }
}
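A self-contained sketch of the NIO loop. The class `NioCopy` and the 8 MB value for `CHANNEL_TRANSFER_SIZE` are assumptions, since the value of the original constant is not given. Each `transferTo()` call asks for at most the remaining bytes, and the position advances by the count `transferTo()` actually returns, because it may transfer fewer bytes than requested.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

// Hypothetical helper: channel-to-channel copy using FileChannel.transferTo().
class NioCopy {

    // Assumed chunk size per transferTo() call (8 MB), for illustration only.
    static final long CHANNEL_TRANSFER_SIZE = 8L * 1024 * 1024;

    static void copy(File src, File dst, long chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Closing each channel also closes the stream it came from
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long size = in.size();
            long position = 0L;
            while (position < size) {
                long transferred = in.transferTo(position,
                        Math.min(chunkSize, size - position), out);
                if (transferred <= 0) {
                    // Guard against looping forever, e.g. if the source shrank meanwhile
                    throw new IOException("transferTo made no progress at position " + position);
                }
                position += transferred;
            }
        }
    }
}
```

Calling `NioCopy.copy(src, dst, NioCopy.CHANNEL_TRANSFER_SIZE)` copies the whole file; a smaller `chunkSize` simply means more `transferTo()` calls.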


The NIO version improves performance, but it introduces OS-specific dependencies. The OS resources that NIO relies on, such as the paged pool, are very limited. When multiple threads were copying big files, we easily ran out of these buffers on Windows Server 2003. While large files are being copied, Windows keeps allocating memory pages from the pool until usage reaches 160 MB (80% of 200 MB). Once this limit is reached, the Windows memory manager is activated and starts freeing the pool. If subsequent requests fill the remaining 20% before the memory manager finishes its cleanup, the copy fails with an "Insufficient system resources" exception (System error 1450).

Microsoft has an article on this subject: http://support.microsoft.com/kb/304101. According to this article, there are two ways to handle the problem:

1. Decrease the memory pool threshold, so that the memory manager starts its cleanup much earlier and sufficient pool memory remains available until the cleanup completes.
2. Set the memory pool size to unlimited, which gives the maximum possible pool memory.

In my experience with this problem, the first alternative worked.
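Tuning the pool as the article describes addresses the OS side. A different, application-level option, which is not from the Microsoft article and is not the fix described in this post, is to limit how many threads perform channel transfers at the same time, since the failure appeared when multiple threads copied big files concurrently. Below is a sketch using `java.util.concurrent.Semaphore`; the class name `ThrottledCopy`, the limit of 2 concurrent copies and the 8 MB chunk size are all assumptions, and whether throttling alone avoids error 1450 would have to be tested on the affected system.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.Semaphore;

// Hypothetical mitigation (not from the KB article): cap concurrent NIO copies.
class ThrottledCopy {

    // Assumed limit on simultaneous copies; a real value would need testing.
    private static final Semaphore PERMITS = new Semaphore(2);
    // Assumed chunk size per transferTo() call.
    private static final long CHUNK_SIZE = 8L * 1024 * 1024;

    static void copy(File src, File dst) throws IOException, InterruptedException {
        PERMITS.acquire(); // blocks while the maximum number of copies is already running
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dst).getChannel()) {
            long size = in.size();
            long position = 0L;
            while (position < size) {
                long n = in.transferTo(position, Math.min(CHUNK_SIZE, size - position), out);
                if (n <= 0) {
                    throw new IOException("transferTo made no progress at position " + position);
                }
                position += n;
            }
        } finally {
            PERMITS.release(); // always return the permit, even if the copy failed
        }
    }
}
```

Threads that call `ThrottledCopy.copy(src, dst)` beyond the limit simply wait for a permit instead of starting another transfer.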

