Friday, October 26, 2007

Concurrent DB operations

In one of the projects I worked on, there is a persistence layer that provides its API in the form of a createOrUpdate method. This method starts a transaction and verifies whether any object exists with the same business key. If one exists, it updates the existing object; otherwise it inserts a new one.
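The createOrUpdate pattern described above can be sketched with an in-memory map standing in for the database. The class and method names here are illustrative, not from the original project; the comment marks the window where the race happens.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the check-then-act createOrUpdate pattern, with a
// plain map standing in for the database table (names are illustrative).
class CreateOrUpdateDao {
    private final Map<String, String> table = new HashMap<String, String>();

    // Look up by business key; update if present, insert otherwise.
    // Between the lookup and the insert, another thread can slip in and
    // insert the same key -- the race discussed below.
    public String createOrUpdate(String businessKey, String value) {
        if (table.containsKey(businessKey)) {
            table.put(businessKey, value); // key already present: update
            return "updated";
        } else {
            table.put(businessKey, value); // key absent: insert
            return "inserted";
        }
    }
}
```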

While performing DB write operations like create or update from multiple threads, I found two kinds of issues:

  1. Phantom read. When multiple threads decide between insert and update by performing a DB lookup, two threads may both decide to insert; because of the other thread, the second insert fails with a constraint violation error.
  2. Transaction deadlock. If the transaction is relatively large and another transaction tries to insert the same object, the two can deadlock.

Setting the database transaction isolation level to serializable, or serializing access to these methods in the application (for example, by declaring them synchronized), solves the problem, but it defeats the purpose of using parallel threads.

Below are two workarounds for this issue:
  1. Re-submit the failed transaction after a random wait.
  2. Lock-and-write model.

The first one is very simple to implement. Whenever a transaction fails due to a recoverable error, wait for some random time and retry the transaction. Identifying the cases that can be retried is important for this solution: it does not make sense to retry every failure, only recoverable error cases such as constraint violations and deadlocks.
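A minimal sketch of this retry-with-random-wait approach, assuming the transaction is wrapped in a Callable. The helper name, the back-off bounds, and the recoverability check are illustrative placeholders; a real version would inspect SQLException error codes for constraint violations and deadlocks.

```java
import java.util.Random;
import java.util.concurrent.Callable;

// Hypothetical retry helper: re-runs a transaction after a random pause
// when it fails with a recoverable error.
class RetryingExecutor {
    private static final Random RANDOM = new Random();

    public static <T> T runWithRetry(Callable<T> tx, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return tx.call();
            } catch (Exception e) {
                // Only retry recoverable failures; rethrow everything else.
                if (attempt >= maxAttempts || !isRecoverable(e)) {
                    throw e;
                }
                Thread.sleep(50 + RANDOM.nextInt(200)); // random back-off
            }
        }
    }

    // Placeholder check; a real version would look at SQL error codes
    // for constraint-violation and deadlock conditions.
    static boolean isRecoverable(Exception e) {
        return e instanceof IllegalStateException;
    }
}
```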

In the second model, we identify the business key that makes the entity unique in the domain and track locks on this key. Before attempting a create/update transaction, every thread must obtain the lock on the business key; after completing the transaction, it must release the lock, which notifies the threads waiting for it. Locking at this granularity may not be possible with every database, so doing it in the application also keeps the code database-independent.
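The per-business-key lock tracking can be sketched with a lock registry like the one below. This is a hypothetical class, not the project's actual code; a production version would also need to evict locks for keys that are no longer in use.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of a registry that hands out one lock per business key
// (class and method names are illustrative).
class KeyLockRegistry {
    private final ConcurrentHashMap<String, ReentrantLock> locks =
            new ConcurrentHashMap<String, ReentrantLock>();

    // Obtain the lock for a business key before createOrUpdate;
    // threads contending on the same key queue here.
    public void lock(String businessKey) {
        locks.computeIfAbsent(businessKey, k -> new ReentrantLock()).lock();
    }

    // Release after the transaction completes; waiting threads wake up.
    public void unlock(String businessKey) {
        ReentrantLock lock = locks.get(businessKey);
        if (lock != null) {
            lock.unlock();
        }
    }

    // Visible for testing: whether the key's lock is currently held.
    public boolean isHeld(String businessKey) {
        ReentrantLock lock = locks.get(businessKey);
        return lock != null && lock.isLocked();
    }
}
```

A caller would wrap the transaction as `registry.lock(key); try { createOrUpdate(...); } finally { registry.unlock(key); }` so the lock is released even when the transaction fails.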

Sunday, October 14, 2007

Complexities in Copying Files with Java

Copying a file is a simple operation, but while working with Java I noticed there is enough to say about it.

The most basic form of file copying is to open two streams and copy byte-by-byte. Even though this code is very inefficient, it is shown here to cover the topic from the beginning.

FileInputStream fin = new FileInputStream(fromFile);
FileOutputStream fout = new FileOutputStream(newFile);
int aByte;
// read() returns -1 at end of stream; comparing with > 0 would
// wrongly stop at the first zero byte
while ((aByte = fin.read()) != -1) {
    fout.write(aByte);
}

We can enhance the above code snippet by using a buffer of fixed size instead of copying byte-by-byte. Choosing an appropriate buffer size affects performance: it should be neither too small nor too large.

FileInputStream fin = new FileInputStream(fromFile);
FileOutputStream fout = new FileOutputStream(newFile);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead;
while ((bytesRead = fin.read(buffer)) != -1) {
    fout.write(buffer, 0, bytesRead);
}
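For completeness, here is the buffered copy wrapped in a method with both streams closed in finally blocks, so they are released even if the copy fails partway. The helper name and the buffer size are my own choices for this sketch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Buffered file copy with the streams closed in finally blocks.
class FileCopier {
    private static final int BUFFER_SIZE = 8 * 1024; // 8 KB, a common default

    public static void copy(File fromFile, File newFile) throws IOException {
        FileInputStream fin = new FileInputStream(fromFile);
        try {
            FileOutputStream fout = new FileOutputStream(newFile);
            try {
                byte[] buffer = new byte[BUFFER_SIZE];
                int bytesRead;
                while ((bytesRead = fin.read(buffer)) != -1) {
                    fout.write(buffer, 0, bytesRead);
                }
            } finally {
                fout.close(); // close output even if the copy loop failed
            }
        } finally {
            fin.close(); // close input in all cases
        }
    }
}
```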

We can consider the above code snippet trouble-free, and it always works, but a much more significant performance improvement is possible by using Java NIO. In the above approach, the OS reads the file content into its own buffers, copies it into the Java buffer, and follows the same path back to write the file. Using NIO, we can transfer directly between the OS buffers without copying into a Java buffer. Copying into the OS buffers is usually performed by the hardware drivers, so the operation takes fewer CPU cycles. The code snippet looks something like this.

FileChannel in = new FileInputStream(src.getAbsoluteFile()).getChannel();
FileChannel out = new FileOutputStream(dst.getAbsoluteFile()).getChannel();
long size = in.size();
long bytesTransferred = 0L;
for (long bytesWritten = 0L; bytesWritten < size; bytesWritten += bytesTransferred) {
    bytesTransferred = in.transferTo(bytesWritten, CHANNEL_TRANSFER_SIZE, out);
}


The above code snippet improves performance thanks to NIO, but it introduces OS-specific dependencies. The OS resources that NIO uses, such as the paged pool, are very limited. With multiple threads copying big files, we easily ran out of buffers on Windows 2003. While copying large files, Windows memory pages keep accumulating from the pool until it reaches 160 MB (80% of 200 MB). Once this limit is reached, the Windows memory manager activates and frees up the pool. If the remaining 20% fills with subsequent requests before the memory manager completes cleanup, it causes an "Insufficient system resources" exception (system error 1450).

Microsoft has an article on this subject:
http://support.microsoft.com/kb/304101 . As per this article, there are two ways to handle the problem:

  1. Decrease the memory pool threshold so that the memory manager starts cleanup much earlier, leaving sufficient pool memory available before cleanup completes.
  2. Set the memory pool to unlimited, which gives the maximum possible pool memory.

During my experience with this problem, I found that the first alternative worked.