Friday, November 2, 2007

Classification of Software Applications – Customer View

Several classifications may already exist for software applications. Here I want to classify applications based on their customers, independent of the technology.

At a high level, software applications can be classified into the following business segments:

  1. End User Applications
  2. End User Service Provider Applications
  3. Enterprise Applications
  4. Enterprise Service Provider Applications

Applications in each segment have their own unique requirements. We cannot impress users of one segment with an application developed for another. In reality there may be slight overlap, but we can identify the sweet spot easily if we look at the feature set carefully. In this article I will try to explain each segment and its unique properties. These aspects matter not just for product marketing; product development also needs to understand them to build applications that best fit the requirements.

End user applications are meant to be installed, used, and maintained by the end users themselves. Good installation wizards and easy or zero maintenance are some of the key properties. The application should work in typical environments without requiring additional software to be installed. Some examples in this category are editors, games, and utilities like file compression tools.

End user service provider applications are usually web applications. One service provider hosts the application, and many users use the system concurrently. Unlike end user applications, installing and uninstalling the application is not very important, but other features like backup, staging, and restore play a critical role. These applications are required to support a huge number of concurrent users.

Enterprise applications are targeted for use in enterprise environments. To be used in an enterprise environment, an application needs to meet some mandatory requirements like authentication, role-based access, scalability, and performance. The application should integrate easily with existing infrastructure, so integration with external authentication systems and with existing management systems is what makes an application fit into enterprise environments.

For small enterprises, maintaining infrastructure is a burden, so they have started relying on service providers, with stringent requirements agreed in the form of SLAs. This adds special requirements to the applications running in this segment. Application scalability, performance, and user management play a critical role. Unavailability or performance issues may incur a huge loss to the service provider, so monitoring these applications is very important. Securing the privacy of individual users or enterprises is another unique property of applications in this space.

Friday, October 26, 2007

Concurrent DB operations

In one of the projects I worked on, there is a persistence layer that provides its API in the form of a createOrUpdate method. This method starts a transaction and verifies whether any object exists with the same business key. If one exists, it updates the existing object; otherwise it inserts a new one.

While performing DB update operations like create or insert from multiple threads, I found two kinds of issues:

  1. Phantom read. When multiple threads decide between insert and update by performing a DB lookup, two threads may both decide to insert, and the later insert then fails with a constraint violation error.
  2. Transaction deadlock. If the transaction is relatively large and another transaction tries to insert the same object, it leads to a deadlock.

Setting the database transaction isolation level to serializable, or declaring these methods as synchronized, solves the problem, but it defeats the purpose of having parallel threads.

Below are two workarounds for this issue:
  1. Re-submit the failed transaction after a random wait.
  2. Lock-and-write model.

The first one is very simple to implement. Whenever a transaction fails with a recoverable error, wait for some random time and retry the transaction. Identifying the cases that can be retried is important for this solution; it does not make sense to retry every failure. Only recoverable error cases like constraint violations and deadlocks should be retried.
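A minimal sketch of this retry approach in Java (the exception type, retry limit, and back-off bounds are illustrative assumptions, not from the original project):

```java
import java.util.Random;

// Illustrative marker for errors worth retrying (constraint violation, deadlock).
class RecoverableDbException extends Exception {
}

interface DbOperation {
    void run() throws RecoverableDbException;
}

class RetryingExecutor {
    private static final int MAX_RETRIES = 5;       // assumption
    private static final Random RANDOM = new Random();

    static void execute(DbOperation op)
            throws RecoverableDbException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                op.run();
                return; // success
            } catch (RecoverableDbException e) {
                if (attempt >= MAX_RETRIES) {
                    throw e; // give up after too many attempts
                }
                // Random wait before retrying, so colliding threads spread out.
                Thread.sleep(50 + RANDOM.nextInt(200));
            }
        }
    }
}
```

The random component in the sleep matters: if all colliding threads waited the same fixed time, they would simply collide again on the retry.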

In the second model, we need to identify the business key that makes the entity unique in the domain and track locks on this key. Before attempting a create/update transaction, every thread needs to obtain the lock on the business key. After completing the transaction, it needs to release the lock. Releasing the lock notifies the threads that are waiting for it. This kind of granular locking may not be possible with every database, so to keep the application database independent, this approach works better.
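The lock-and-write model can be sketched as an in-memory registry of locked business keys, coordinated with wait/notifyAll (the class and method names are illustrative, not from the original persistence layer):

```java
import java.util.HashSet;
import java.util.Set;

// Tracks which business keys are currently locked.
class BusinessKeyLocks {
    private final Set<String> lockedKeys = new HashSet<String>();

    // Block until this thread owns the lock for the given key.
    synchronized void lock(String businessKey) throws InterruptedException {
        while (!lockedKeys.add(businessKey)) {
            wait(); // another thread holds this key; wait for a release
        }
    }

    // Release the key and wake up waiting threads.
    synchronized void unlock(String businessKey) {
        lockedKeys.remove(businessKey);
        notifyAll();
    }
}
```

A createOrUpdate caller would wrap its transaction as lock(key); try { ... } finally { unlock(key); } so the lock is always released, even when the transaction fails.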

Sunday, October 14, 2007

Complexities in Copying Files with Java

Copying a file is a simple operation, but while working with Java I noticed there is plenty to say about it.

The most basic form of file copying is opening two streams and copying one byte at a time. Even though it is very inefficient, to start from the basics, here is that code snippet.

FileInputStream fin = new FileInputStream(fromFile);
FileOutputStream fout = new FileOutputStream(newFile);
int aByte;
// read() returns -1 at end of stream; comparing with > 0 would stop at the first zero byte
while ((aByte = fin.read()) != -1) {
    fout.write(aByte);
}
fin.close();
fout.close();

We can enhance the above snippet by using a buffer of fixed size instead of copying one byte at a time. Choosing an appropriate buffer size affects performance: it should be neither too small nor too large.

FileInputStream fin = new FileInputStream(fromFile);
FileOutputStream fout = new FileOutputStream(newFile);
int bytesRead;
byte[] buffer = new byte[BUFFER_SIZE];
while ((bytesRead = fin.read(buffer)) != -1) {
    fout.write(buffer, 0, bytesRead);
}
fin.close();
fout.close();

We can consider the above code snippet trouble free, and it always works, but a much more significant performance improvement is possible with Java NIO. In the above approach, the OS reads the file content into memory, then copies it into the Java buffer, and the write follows the same path back into the file. Using NIO, we can transfer directly between the OS buffers, without copying into a Java buffer. Copying into the OS buffers is usually performed by the hardware drivers, so this operation takes fewer CPU cycles. The code snippet looks something like this.

FileChannel in = new FileInputStream(src.getAbsoluteFile()).getChannel();
FileChannel out = new FileOutputStream(dst.getAbsoluteFile()).getChannel();
long size = in.size();
long bytesTransferred = 0L;
for (long bytesWritten = 0L; bytesWritten < size; bytesWritten += bytesTransferred) {
    bytesTransferred = in.transferTo(bytesWritten, CHANNEL_TRANSFER_SIZE, out);
}


The above code snippet improves performance thanks to NIO, but it introduces OS-specific dependencies. The OS resources NIO uses, such as the paged pool, are very limited. With multiple threads copying big files, we easily ran out of buffers on Windows 2003. While copying large files, Windows memory pages keep accumulating from the pool until they reach 160MB (80% of 200MB). Once this limit is reached, the Windows memory manager activates and frees up the pool. If the remaining 20% is quickly filled by subsequent requests before the memory manager finishes its cleanup, the result is an insufficient-system-resources exception (system error 1450).

Microsoft has an article on this subject:
http://support.microsoft.com/kb/304101. As per this article, there are two ways to handle the problem:

  1. Decrease the memory pool threshold so that the memory manager starts cleanup much earlier, leaving sufficient pool memory before the cleanup completes.
  2. Set the memory pool to unlimited, which gives the maximum possible pool memory.

In my experience with this problem, the first alternative worked.


Saturday, September 22, 2007

Three Dimensions for Decision Making

Decision making is not trivial in software development. In our software development environment we follow a systematic approach that helps us decide quickly. Even though decision making cannot be automated, this approach helps a lot in making, or justifying, a decision.

In enterprise software development, we need to deal with alternatives and trade-offs at every stage. We need to choose the best alternative, keeping the current situation and future direction in mind, to achieve customer satisfaction.

  • If we don’t make proper decisions about the usability of the product, it builds customer dissatisfaction with the product.
  • If we don’t make decisions in time, it may add significant delay and additional cost to project execution. Sometimes we may lose important customers as well.
  • If we don’t make decisions keeping the future direction of the product in mind, it may lead to major redesign and sometimes scrapping the existing code.

Making all decisions about all stages of the project beforehand is practically impossible, due to changing requirements or awareness of more alternatives during the course of development. Sometimes decisions deferred in one stage of development become significant during the next stage, and these decisions may ripple back into previous stages. The difficulty in decision making is more visible in distributed teams than in a single team working in one geographical location, due to the limited communication among the team.

In Innominds, we follow a systematic approach to handle any trade-off by analyzing the solution in three dimensions.

Dimension #1: Usability
Usability of a software solution is very important to reach customer satisfaction. We need to target the solution that gives the best usability for the product.

Dimension #2: Scalability
While choosing an alternative, we need to analyze whether the solution scales when the size of the problem increases. If we don’t consider realistic scalability requirements, the product may pass the quality check in our lab environments but fail at the very first customer.

Dimension #3: Time-To-Market
Time-To-Market is very important for any software product. To meet the time requirements it is acceptable to sacrifice the scalability aspect, but it is not recommended to sacrifice usability.

Sunday, August 26, 2007

Coding, Testing: Which is First?

I have seen some managers planning that Resource A will do the coding and Resource B will do the unit testing. In my opinion this is a total misconception about unit testing. Unit testing is not a task to complete; it is an integral part of coding. Unit testing without coding and coding without unit testing are both unproductive. In a test-driven environment, we need to design and plan our work so that after coding a few lines, we write test cases and verify the code we just completed.
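To make this micro-cycle concrete, here is a sketch in plain Java (a real project would typically use a framework like JUnit; the parsePort method is an invented example):

```java
// Step 1: code a few lines -- a small helper worth verifying immediately.
class ConfigParser {
    // Parses "host:port" and returns the port, or -1 if it is missing/invalid.
    static int parsePort(String address) {
        int colon = address.lastIndexOf(':');
        if (colon < 0 || colon == address.length() - 1) {
            return -1;
        }
        try {
            return Integer.parseInt(address.substring(colon + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }
}

// Step 2: verify the code just written, before moving on.
class ConfigParserTest {
    public static void main(String[] args) {
        assert ConfigParser.parsePort("localhost:8080") == 8080;
        assert ConfigParser.parsePort("localhost") == -1;
        assert ConfigParser.parsePort("localhost:") == -1;
    }
}
```

The point is the rhythm: the test for parsePort is written minutes after the method, while the edge cases are still fresh in mind.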

I have noticed some people coding and then testing directly from the UI. In this approach unit tests get lower priority next to the anxiety to do the final testing through the UI. Unit tests are considered just a distraction and additional work. This leads to poor code quality, and more time is spent on debugging during development and testing. Writing unit tests after all the coding increases development time, and there is a greater tendency to miss important cases. Again this leads to poor code quality. The bottom line is, if we don’t do it the right way, adding more time cannot improve quality.

It is important to understand the reasons behind the questions below:
Why do people tend to do integration testing before unit testing?
Why does it matter whether we write unit tests after completing the coding instead of testing while coding?

Sometimes people are not sure about what they are coding and whether it is going to work together with the rest of the system. So they focus directly on integration testing. This is understandable, because if in the end they find the solution is not going to work, they need to throw out all the code along with the unit tests. Instead of jumping straight into the coding work, it is necessary to do some design homework. Sometimes adding an integration test with stubs, and discussing the solution with peers, helps to improve the situation.

My answer to the second question is that it definitely matters how we write the unit tests. We can apply a generic principle: as the delay in getting feedback increases, its effectiveness decreases. Executing unit tests is a way of getting feedback from the code. We can write better test cases for code written a few minutes ago than for code written last week or last month; it also takes more time to recollect and understand older code. Taking immediate feedback saves debugging time during integration.

Tuesday, August 14, 2007

Code Coverage - Tool or Goal?

Code coverage cannot be a goal for any software development process. It only helps as a tool for developers and management.

When managers use code coverage, it is important to note that more code coverage does not give any guarantee about the correctness of the system. It cannot be used as an objective for the team, because it may drive the development team in the wrong direction of chasing coverage. Coverage-driven development is very dangerous: developers may focus on trivial branching scenarios instead of thinking from the use case and test scenario point of view. Management can use code coverage information to get a feel for how much more testing is required.

Developers are also not expected to treat code coverage as a goal. In test-driven development, we need to focus on the different scenarios and write the test cases. We can use code coverage information to discover missing test cases. Sometimes we expect a particular portion of the code to be covered already, but code coverage shows a gap. This indicates our test is not doing what we expected. Code coverage information helps to find bugs in test code.

In the development process, run the coding and testing cycles until we are satisfied with the required functionality. At the end, take the coverage report and analyze the effectiveness of the tests. We may get some hints about missing test cases. It is not important to cover every branch of a condition just for coverage's sake; add more tests only if it makes sense.

Test-driven development is driven by tests, and tests are driven by use cases and requirements. In this development process, code coverage is a tool to get the correct test cases. If we focus on code coverage, development time increases without any quality benefit. If we focus on test cases, the quality of the software increases and we get code coverage as a by-product.

Saturday, August 11, 2007

Escaping Readability?

In most programming languages, to put special characters in a string we need to escape them using a backslash. Most of the time it is not an issue at all. Even though we are used to it, I would like to highlight the difficulties with escaping from my experience.

There is a chance of making a mistake while escaping, which may lead to incorrect results. Sometimes we get a compilation problem, but sometimes the character following the backslash is escaped incorrectly. For example, if we forget to escape the backslash in “c:\backup”, the compiler will treat “\b” as a backspace character.

Due to escaping, we lose the readability of the string. We face this problem especially with regular expressions, because they also require escaping. Say we want to use \ as a character in a regular expression; we need to escape it as “\\”. While writing this expression in Java code, every backslash needs to be escaped again, so the regular expression becomes “\\\\”; otherwise either the regular expression compiler or the Java compiler cannot interpret it correctly. If we write a big regular expression with special characters, it is very hard to interpret the actual expression.
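A small Java illustration of this double escaping (the path and the split are just an example):

```java
public class EscapeDemo {
    public static void main(String[] args) {
        // In Java source, "c:\\backup\\logs" is the 14-character string c:\backup\logs
        String path = "c:\\backup\\logs";

        // To match one literal backslash, the regex engine needs \\ ,
        // and each of those backslashes must itself be escaped in Java: "\\\\"
        String[] parts = path.split("\\\\");

        assert parts.length == 3;
        assert parts[0].equals("c:");
        assert parts[1].equals("backup");
        assert parts[2].equals("logs");
    }
}
```

So a single conceptual character, the backslash, ends up written four times, which is exactly the readability problem described above.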

Even though we face these problems rarely, dealing with them causes a lot of inconvenience. The language itself could offer a capability to escape automatically using some special syntax, but changing language syntax is difficult, so there is no need to discuss language changes much. It is easier, however, to provide some kind of view transformation in editors. When we place the cursor near a quoted string, an info bubble could come up that shows the string without the escaping formalities and allows editing as well. Once we click outside, it looks like a normal string with escaping. This is just a basic thought for a solution; there may be better ways to address this problem.