Friday, March 23, 2012

Thread Pooling in Java - Intuitive overview. Part 1

You may want to check my post on basics of Java threads, http://karthikpresumes.blogspot.in/2013/02/java-multi-threading-basics.html

Today Let us talk about Thread Pooling. Before that, let me give you an intuitive idea of why pooling of threads needed.

Let us start with a trivial web server implementation. In this, main thread would keep on listening to incoming requests and process those messages according to their arrival. This is easy to implement; Good for single processor web-server, given tasks are CPU bound or intensive. Normally, servers would have multiple processors. So, in Quad-core machine, even CPU intensive tasks would be utilizing about 25% of entire system's capability, if service is single-threaded.

Simultaneously, N number of requests could be easily served in N-processor based web server. Now, let us make our trivial web server to run N threads simultaneously to improve the performance by N fold. Cool. This way of scaling(generally, it means increasing number of requests served, per unit time) of a service is known as Vertical Scaling.

Let us discuss, how this system shall be designed. Since the requests served are of CPU intensive, we know the optimal number of concurrent threads., a priori. So, the system shall be designed as given below.
  1. At most, only the given number of threads be running and not more than that.
  2. It should be having finite unbounded queue of pending tasks; This is a moot point, by the way. But we will believe eventual completion of a task is better than rejecting that.
  3. Already created threads shouldn't be killed or shouldn't die on its own after the completion of a task. Since creating threads are generally known to be costly.
  4. During task execution, if a thread happens to crash itself, thread pool must be intelligent enough to create one.
The above design is so generic and could be abstracted easily. Java's FixedThreadPool does exactly the same job. It has to be defined with number of threads; It has finitely unbounded queue(roughly, 4 gig entries could be waiting in this queue, in a 32 bit machine) for waiting tasks.

It could be created using below line of code.

ExecutorService threadExec = Executors.newFixedThreadPool(numThreads);
ExecutorService is an interface which has APIs for submitting a task to the pool and Shutting down, etc. We will discuss this and "Future" in the next part of this blog as it would be digressing if we start discussing that right now.

A complete test code could be found here. You can test with several parameters and see the power of thread pooling.

http://code.google.com/p/threadpool-tests-java/source/browse/FixedThreadPoolTest.java

Let us assume, our web server has to handle very simple requests which doesn't take much time to complete and involves huge of I/O activities - Files I/O, network activity like another web service to process. In this scenario, it is not good to limit the number of threads as most of the time would be spent on I/O and not on computing.

Main problem with this case is, determining the number of threads at most could run on a system. Even if we could get that parameter statistically, it is not good to hold those many threads running always. For instance, after some analysis, we come to know that there may be around 1000 threads needed, at most. If we go for FixedThreadPool, it is a waste of resources as we wouldn't get the peak traffic always. Since these tasks are asynchronous and probably short lived, getting a proper max bound on number of optimal threads wouldn't be always possible.

The system that could handle this scenario, shall be designed as follows.
  1. The system should create threads as and when needed.
  2. After a thread's task completion, it could wait for certain time and die if no other task is available.
  3. It shouldn't have any waiting tasks, rejection shall be preferred instead. Think about a SynchronousQueue in Java.

Java's CachedThreadPool is designed with the above-said design goals. It is perfectly great for short-lived, asynchronous tasks. Creating a CachedThreadPool and working with that is essentially tantamount to FixedThreadPool. So, the line below, doesn't need any further explanations.

ExecutorService threadExec = Executors.newCachedThreadPool();

And a test program to analyze this is,
http://code.google.com/p/threadpool-tests-java/source/browse/CachedThreadPoolTest.java
We will discuss the implementation details of Fixed and Cached Thread Pool in the next part.

Thanks for Reading.

2 comments:

  1. Hi Karthik,
    Nice Explanation. I have one question. How come decision of keeping pending tasks in queue or rejecting the incoming tasks is related to type of thread pool implementation: FixedThreadPool or CachedThreadPool. Is this not something depending upon business logic.

    May be i am missing something.

    ReplyDelete
  2. Hi Karthik,
    Thanks for nice explanation.
    I have one basic doubt here. I have following method in place which initialises ExecutorServices.

    List getAllData() {
    ExecutorService threadPoolExec = Executors.newFixedThreadPool(4);

    threadPoolExec.submit(new Object());
    threadPoolExec.submit(new Object());
    }

    I don't want to shut of the ExecutorServices, because i want to use the threads created already. If this method is called from 5 UIs at the same time. How the behavior will be? Will it create 5 instance of Executor?

    Please reply me. Thanks in advance.

    ReplyDelete