My thoughts

Tuesday, June 2, 2015

Lambda Java 8 - Basic and concise introduction

Introduction

Lambda expressions are a new feature in Java 8. They are an elegant replacement for anonymous classes that implement functional interfaces (interfaces with a single abstract method).

For instance, the Runnable interface has a single method, run().

With an anonymous class, a simple implementation would look like this:

Thread thread = new Thread(new Runnable() {
    @Override
    public void run() {
        // Your complex logic goes here ..
        System.out.println("Hello World !!");
    }
});
thread.start(); // start() runs run() on a new thread; calling run() directly would not

With a lambda, it would look like this:

Thread thread = new Thread(() -> {
    // Your complex logic goes here ..
    System.out.println("Hello World !!");
});
thread.start();

Can you see the reduced code and the improved clarity? Lambdas improve readability. To understand how lambdas perform, you can read this JVM Language Summit presentation:
http://wiki.jvmlangsummit.com/images/7/7b/Goetz-jvmls-lambda.pdf

You may wonder why we need a feature that just simplifies anonymous classes with a single method. To answer that, we need to understand what single-method interfaces are for. Generally, a single-method (functional) interface is used where functionality, that is, an implementation, has to be passed as a method argument. With the Runnable interface, we pass the code to be run by a new thread. The Java core libraries have a lot of functional interfaces, like Callable, Comparator, etc. If you look carefully at their implementations, they exist just to pass code to other methods.
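To make "passing code to other methods" concrete, here is a small sketch that sorts strings by length with a Comparator, once as an anonymous class and once as a lambda. The class and method names are illustrative, not from any library:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class SortByLength {
    // Pre-Java 8 style: an anonymous class whose only job is to carry compare().
    static List<String> sortAnonymous(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return copy;
    }

    // Java 8 style: the lambda is just the body of compare().
    static List<String> sortLambda(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort((a, b) -> Integer.compare(a.length(), b.length()));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Karthik", "Bob", "Anna");
        System.out.println(sortAnonymous(names)); // [Bob, Anna, Karthik]
        System.out.println(sortLambda(names));    // [Bob, Anna, Karthik]
    }
}
```

Comparator.comparingInt(String::length) would be an even shorter way to express the same comparator.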

Syntax of Lambda Expression

A comma-separated list of parameters, enclosed in parentheses. Parameters don't need data types, as the types can be inferred. This is followed by the arrow token, ->.

A body, consisting of either a single expression or a block of statements. For an expression, the runtime evaluates it and returns the result. A statement block has to be enclosed in curly braces, and a return statement has to be added if a value must be returned.

For instance, if x and y are the parameters, a simple lambda expression might look like this:

(x, y) -> {
    // Add needed code here.
    return x + y;
}
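A lambda always needs a target type, that is, a functional interface it is converted to. The sketch below assigns the (x, y) lambda to java.util.function.BinaryOperator<Integer> (my choice of target type) and shows the block and expression forms side by side:

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

class LambdaSyntax {
    // Block body: curly braces and an explicit return statement.
    static final BinaryOperator<Integer> ADD_BLOCK = (x, y) -> {
        int sum = x + y;
        return sum;
    };

    // Expression body: the expression's value is returned implicitly.
    static final BinaryOperator<Integer> ADD_EXPR = (x, y) -> x + y;

    // Parameter types may also be written out explicitly.
    static final BinaryOperator<Integer> ADD_TYPED = (Integer x, Integer y) -> x + y;

    // A single inferred parameter may drop the parentheses.
    static final Function<Integer, Integer> DOUBLE = x -> x * 2;

    public static void main(String[] args) {
        System.out.println(ADD_BLOCK.apply(2, 3)); // 5
        System.out.println(ADD_EXPR.apply(2, 3));  // 5
        System.out.println(ADD_TYPED.apply(2, 3)); // 5
        System.out.println(DOUBLE.apply(21));      // 42
    }
}
```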

Capturing variable in the enclosed scope

Lambda expressions can capture variables from the enclosing scope without the scoping or shadowing problems of anonymous classes. A lambda body is lexically scoped: a variable can't be declared in the lambda body (or as a lambda parameter) with the same name as a local variable already declared in the enclosing scope. Captured local variables must be final or effectively final, that is, never reassigned after initialization. In the following code snippet, the lambda uses the variable name, defined in the enclosing scope:

String name = "Karthik";
Thread thread = new Thread(() -> {
    // Your complex logic goes here ..
    System.out.println("Hello " + name + " !!");
});
thread.start();
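A small sketch of the effectively-final rule, using java.util.function.Supplier as the target type (my choice; any functional interface works). Returning the lambda from a method also shows that a captured value outlives the method that declared it:

```java
import java.util.function.Supplier;

class CaptureDemo {
    static Supplier<String> greeter(String prefix) {
        String name = "Karthik"; // effectively final: assigned once, never changed
        // Reassigning 'name' (or 'prefix') anywhere in this method would turn the
        // lambda below into a compile-time error.
        return () -> prefix + " " + name + " !!";
    }

    public static void main(String[] args) {
        Supplier<String> hello = greeter("Hello");
        System.out.println(hello.get()); // Hello Karthik !!
    }
}
```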

I hope this post gave you some basic ideas about lambdas.

For more information, please refer to:

https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html

Friday, May 22, 2015

Spring boot - Quickly bootstrap your web-service or web-app.

Nowadays, for quick web-service and web-app implementations, I have started using Spring Boot, and I wanted to share my experience. In this post, I cover just the basics of the Spring Boot framework.

Essentially, Spring Boot helps you to:

Get rid of manual Spring configuration, and especially of XML configuration files. This is a great boost for first-timers. :)
Create stand-alone or uber jars and run them instantly on production boxes, with the help of embedded servlet containers like Jetty, Tomcat, etc. Of course, you can opt for the container of your choice; the default is embedded Tomcat. This is again a great boost, as configuring embedded containers is generally tedious and time-consuming.
Get simplified Maven configuration with the set of Starter POMs. The Spring Boot website says it is opinionated, but in my view that is great, as the starters are created by experts in the field and have presumably been well tested for an optimal set of dependencies.
Make your app or service production-ready (just by adding the Spring Boot Actuator dependency), which provides various endpoints for health checks, metrics, etc. This is awesome, as it makes all your services look uniform when it comes to managing them. This reminds me of DRY (Don't Repeat Yourself) for API design :)
Simplify Spring annotations: for example, the @SpringBootApplication annotation effectively covers essential annotations like @Configuration, @ComponentScan, etc.

Let's get our hands dirty and experience Spring Boot.

Create a simple Maven project with your IDE and change your POM as given below:

This POM is enough for our RESTful service :) Now you can see how Spring Boot helps to reduce the Maven configuration to the bare minimum.

As you can see, the POM sets spring-boot-starter-parent as the parent POM of the project. This is an essential part of a Spring Boot project, as it provides the core dependency and plugin management for a standalone Spring-based web service. In the dependencies section, the needed dependencies are added; here we added the artifact spring-boot-starter-web. The real power of Spring Boot lies here: it auto-configures the project based on its dependencies. With spring-boot-starter-web, the needed Web MVC configuration is added automagically :) If you want the Velocity template engine for your views, you can simply add the dependency spring-boot-starter-velocity, and Boot will add the essential configuration for the Velocity engine. Cool, right?

To complete the service, you may want to add a controller like the one given below:

The code is pretty self-evident: it handles the endpoint /hello with a simple response, "Hello!".

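The main class should be placed in the right package; in our example, it has to be placed in the package in.blogspot.karthikpresumes. This matters because component scanning, by default, covers only the package of the @SpringBootApplication class and its sub-packages, so the controller has to live in that package or below it.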

The @SpringBootApplication annotation is very important: it is a convenience annotation that covers essential annotations like @Configuration, @EnableAutoConfiguration, @ComponentScan, etc.

The entire project is hosted here, https://github.com/karthikmit/SpringBootBlog.git

Main reference:

http://projects.spring.io/spring-boot/

Thanks for reading!

Wednesday, February 6, 2013

Java Threads - Synchronization

If you would like to look at my post on the basics of Java threads, here it is: http://karthikpresumes.blogspot.in/2013/02/java-multi-threading-basics.html

In this post, let us discuss the need for synchronization and the methods and techniques Java provides to achieve it.

Threads that belong to the same process share resources and address space with each other. Sharing the address space means that one thread can easily communicate with other threads through ordinary Java objects. This is not true for processes, as a process doesn't share its address space with other processes.

Let us assume a simple producer-consumer application: the producer runs in one thread and the consumer in another. The producer populates a queue that is shared with the consumer. This is very easy to implement with threads, as sharing the queue is as simple as holding a reference to it in both the producer and the consumer.

Producer Consumer With Threads
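A minimal sketch of this producer-consumer set-up, assuming integer items and a sentinel value of -1 to stop the consumer (both my own choices). It uses java.util.concurrent.LinkedBlockingQueue, which performs its own internal synchronization, so the synchronization discussed next is hidden inside the queue here rather than written out:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ProducerConsumer {
    private static final int POISON = -1; // sentinel: tells the consumer to stop

    // Producer puts 1..n on a shared queue; consumer takes them until POISON.
    static List<Integer> run(int n) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(); // thread-safe queue
        List<Integer> consumed = new ArrayList<>(); // touched only by the consumer

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    queue.put(i);
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    int item = queue.take(); // waits until an item is available
                    if (item == POISON) {
                        break;
                    }
                    consumed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join(), the consumer's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // [1, 2, 3, 4, 5]
    }
}
```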

This code shows how easy it is to design with threads. But this comes at the cost of synchronization. Even though I didn't do anything special for multi-threading in the code shown above, synchronization is needed because the Queue object is used by more than one thread. The ill effects of a multi-threaded app without synchronization are well illustrated by the following code.

Simple counter without synchronization
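A minimal sketch of such a counter, assuming two threads that each call increment() 100,000 times (illustrative numbers; UnsafeCounter is a hypothetical name):

```java
class UnsafeCounter {
    private int count = 0;

    // Not thread-safe: count++ is a read, an add and a write.
    public void increment() {
        count++;
    }

    public int getCount() {
        return count;
    }

    // Two threads each call increment() 'perThread' times on one shared counter.
    static int run(int perThread) {
        UnsafeCounter counter = new UnsafeCounter();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join(); // join() also makes the threads' writes visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        // Ideally 200000; lost updates often make it smaller.
        System.out.println(run(100_000));
    }
}
```

The printed value varies between runs and machines, so no single expected output is given.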

Can you guess what the result could be? On my system, I got the following result.

We can explain this behavior with the concept of thread interleaving. The core of the counter is the expression "count++". This isn't a single operation, as it seems; the CPU has to execute three sequential steps:

Fetch the value of count and store it locally
Increment the stored value
Write it back to the variable count

During these steps, a thread can be preempted at any point and another thread resumed. This is known as thread interleaving, and certain interleavings produce wrong behavior. Suppose one thread is preempted right after the second step, before it has written the new value back, and another thread starts. The second thread doesn't see the incremented value and increments the old value of count. One of the two increments is lost, which produces the wrong result.

OK. How do we solve this? By making the non-thread-safe method synchronized, as below.
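A sketch of the synchronized version, with the same hypothetical two-thread set-up (two threads, 100,000 increments each); apart from the keyword, nothing changes:

```java
class SynchronizedCounter {
    private int count = 0;

    // Only one thread at a time can hold this object's monitor,
    // so the three steps of count++ can no longer interleave.
    public synchronized void increment() {
        count++;
    }

    // Synchronized as well, so a reader sees the latest value.
    public synchronized int getCount() {
        return count;
    }

    static int run(int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000)); // 200000
    }
}
```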

Synchronized makes a method execute atomically: only one thread at a time runs it to completion. If a thread hasn't yet finished executing the synchronized method, other threads that call it have to wait. This is achieved by Java's object-level monitor (lock) mechanism. Whenever a thread calls a synchronized method, it acquires the object's monitor and releases it only at the end of the method call. Other threads that call this method, or any other synchronized method of the same object, have to wait until the lock is released. This essentially serializes the method calls.

If a method modifies the state of an object that is shared across threads, then that method must be synchronized. The trade-off of using synchronized methods is performance. If all methods are synchronized, there is little point in multi-threading, as execution is effectively single-threaded; in fact, performance decreases if synchronized is used heavily. So it is important either to move the code that modifies the state into separate methods, or to apply synchronization only to the right block of code inside a method, using a synchronized statement.
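The synchronized-statement variant can be sketched like this, assuming a hypothetical Inventory class whose add method does some thread-local preparation before touching the shared list. Only the list mutation is held under a private lock object:

```java
import java.util.ArrayList;
import java.util.List;

class Inventory {
    private final Object lock = new Object(); // private lock, invisible to callers
    private final List<String> items = new ArrayList<>();

    public void add(String rawItem) {
        // Work on local data needs no lock, so threads do it in parallel.
        String normalized = rawItem.trim().toLowerCase();
        // Only the mutation of shared state is serialized.
        synchronized (lock) {
            items.add(normalized);
        }
    }

    public int size() {
        synchronized (lock) {
            return items.size();
        }
    }

    // Starts 'threads' threads that each add 'perThread' items, then returns the size.
    static int fillConcurrently(int threads, int perThread) {
        Inventory inventory = new Inventory();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    inventory.add("  Item-" + i + " ");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return inventory.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 1000)); // 4000
    }
}
```

Using a private lock object instead of synchronized (this) keeps outside code from accidentally contending for the same monitor.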

OK. What if a static method uses the synchronized keyword? Then the class-level lock (the monitor of the Class object) is used instead of the object-level lock. So, a synchronized method is the easiest way of handling thread interleaving issues.

Synchronization not only solves the thread interleaving issue, it also establishes a happens-before relationship: it makes the changes made to the state visible to other threads. Under the Java memory model, there is no guarantee that a write made by one thread will be visible to another thread without such a relationship. Have you ever heard of volatile variables? They exist exactly to solve this visibility issue: a write to a volatile variable is visible to every thread that subsequently reads the same variable. This may cost a bit of performance, as the compiler and CPU can't apply certain optimizations, such as keeping the value in a register or reordering the access. Memory barriers are the low-level technique that solves visibility issues; synchronization relies on them, and volatile and atomic variables are other ways of getting them.
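A classic use of volatile is a stop flag read by a worker thread. This is a sketch under my own naming (VolatileStopFlag, stopsCleanly); without volatile, the JIT is allowed to read the flag once and spin forever:

```java
class VolatileStopFlag implements Runnable {
    // volatile: the write in stop() is guaranteed to be seen by run()'s loop.
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            // busy loop standing in for real work
        }
    }

    public void stop() {
        running = false;
    }

    // Starts a worker, lets it spin briefly, stops it, and reports whether it ended.
    static boolean stopsCleanly() {
        VolatileStopFlag worker = new VolatileStopFlag();
        Thread thread = new Thread(worker);
        thread.setDaemon(true); // never keeps the JVM alive, even if stop() failed
        thread.start();
        try {
            Thread.sleep(50);   // let the worker spin for a moment
            worker.stop();      // the volatile write the worker will observe
            thread.join(5_000); // wait at most 5 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(stopsCleanly()); // true
    }
}
```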

Some of the reasons for memory inconsistency or visibility issues:

The compiler and CPU may reorder execution steps as long as this doesn't change the result of a single-threaded program.
CPU caches don't always have to be in sync with main memory. On multi-processor machines, this may cause visibility issues.

If a thread wants its changes to be visible to other threads, it has to use some form of synchronization. The simple rule of thumb: whenever state is changed in a multi-threaded environment, keep that change synchronized.

One more important topic is atomic variables in Java. As discussed, the synchronized keyword makes sure that a method or block of code is executed atomically by threads. Atomically means that a thread finishes the entire set of steps before other threads start executing the method. This comes at a cost, as the thread has to acquire the lock, which under contention involves some kernel-level activity. There is a cheaper way to achieve atomic execution if the operation is simple, like an increment, a decrement, or a swap with another value. CPUs provide instructions for such atomic operations, like CAS (compare-and-swap), and Java has abstractions built on them, like AtomicLong. Let us rewrite the Counter class with AtomicLong.
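A sketch of the counter rewritten with java.util.concurrent.atomic.AtomicLong, assuming two threads with 100,000 increments each (AtomicCounter is a hypothetical name):

```java
import java.util.concurrent.atomic.AtomicLong;

class AtomicCounter {
    private final AtomicLong count = new AtomicLong();

    // incrementAndGet() is one atomic read-modify-write, so no synchronized needed.
    public long increment() {
        return count.incrementAndGet();
    }

    public long getCount() {
        return count.get();
    }

    static long run(int perThread) {
        AtomicCounter counter = new AtomicCounter();
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                counter.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        System.out.println(run(100_000)); // 200000
    }
}
```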

Now there is no need for a synchronized method, as the operation is already atomic thanks to AtomicLong.
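To see what compare-and-swap means in code, the increment can be written as an explicit retry loop with AtomicLong.compareAndSet. This only illustrates the idea; the JDK's own incrementAndGet may be implemented differently, for example with a single fetch-and-add instruction:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class CasIncrement {
    // Read, compute, then write only if nobody changed the value meanwhile.
    static long increment(AtomicLong value) {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next; // our write won
            }
            // Another thread got in between: loop and retry with a fresh value.
        }
    }

    // Runs 'threads' threads that each increment 'perThread' times.
    static long run(int threads, int perThread) {
        AtomicLong value = new AtomicLong();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment(value);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return value.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000)); // 40000
    }
}
```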

Thanks for reading.

I will come up with one more post to discuss multithreading further. Comments are welcome.

Tuesday, February 5, 2013

Java Multi-threading - Basics

A thread, in simple terms, is the basic unit of execution: it runs the code. It has its own stack and shares the heap and other resources with the other threads in the process. A process could also be considered a unit of execution, but threads are lightweight, as they share several things with the other threads in the process.

You may want to look at my blog post on processes: http://karthikpresumes.blogspot.in/2013/01/linux-processes-essentials-part1.html

Why Multi-threading

Any program may have to perform both CPU activity and IO activity during its execution. These activities are carried out by different hardware: the CPU and the IO devices respectively. If the program runs in a single thread, either the CPU or the IO device may be completely idle at times. Multi-threading helps in this scenario: IO on one thread doesn't necessarily hinder CPU activity on another. Moreover, computers nowadays have multiple cores/processors, so several threads can run simultaneously to improve performance, even when all of them are CPU-intensive. Multi-processing is another way to utilize the computing and IO resources effectively: every task is assigned to a process.

Multi-threading has certain problems that multi-processing doesn't have, like synchronization between threads, because processes run in their own separate address spaces and have their own sets of resources. But since threads are lightweight, it is better to use threads if tasks are short-lived.

Java threads

Java has a nice abstraction for threads in the Thread class, so creating a thread is as simple as instantiating a Thread object. A thread has to be given some code to run, right? Java has an abstraction for that too: the Runnable interface, which has only one method, "run". Have a look at the following snippet for thread creation.

We have implemented the "run" method of the Runnable interface and assigned that object as the "target" of the Thread object, thread. With that, the thread object is all set. To start the thread, we call its start method; a new thread is then spawned and the main thread continues. At some point, we may need to wait for the created thread to complete its task, right? To do so, we call the thread object's join method from the parent thread; join makes the calling thread wait until the thread it is called on terminates. Let us have a look at the full code for this.
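A minimal sketch of the steps described above (a Runnable target, start, then join), using a hypothetical HelloTask class:

```java
class HelloTask implements Runnable {
    // run() holds the code the new thread will execute.
    @Override
    public void run() {
        System.out.println("Hello from " + Thread.currentThread().getName());
    }
}

class ThreadBasics {
    // Creates a thread with HelloTask as its target, starts it and waits for it.
    static Thread.State startAndJoin() {
        Thread thread = new Thread(new HelloTask(), "worker-1");
        thread.start();    // spawns the new thread; this thread continues
        try {
            thread.join(); // block until worker-1 has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return thread.getState(); // TERMINATED once join() has returned
    }

    public static void main(String[] args) {
        System.out.println(startAndJoin());
    }
}
```

Running main prints "Hello from worker-1" followed by "TERMINATED"; the order is fixed because join() returns only after the worker has finished.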

I forgot to mention that there is one more way to create a thread: extend the Thread class and override its run method. But I would suggest the Runnable interface approach, which is cleaner and more readable.
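For comparison, a sketch of the subclassing approach, with a hypothetical HelloThread that records a message instead of printing it:

```java
class HelloThread extends Thread {
    private String message; // join() guarantees this write is visible to the caller

    HelloThread(String name) {
        super(name);
    }

    // Overriding run() directly instead of passing a Runnable target.
    @Override
    public void run() {
        message = "Hello from " + getName();
    }

    String getMessage() {
        return message;
    }

    static String runOnce(String name) {
        HelloThread thread = new HelloThread(name);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return thread.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(runOnce("t1")); // Hello from t1
    }
}
```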

Since multiple threads are running in the process, it is good to make sure the threads are always doing useful work. Think of a thread implementation that keeps polling for some information in a tight while loop. Won't it hurt overall system performance? In such scenarios, the thread may give up its execution so that the CPU can go ahead with other threads. For that, the thread may invoke the "sleep" method, which suspends the thread's execution for some time so that the scheduler can switch over to other threads. The sleep method takes a time argument that specifies how long the thread should be suspended. This period is approximate: it is not guaranteed that the thread resumes exactly after it, as that depends on the precision of system timers and on the scheduler.

A snippet of code to illustrate this:
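A sketch of a polling loop that sleeps between checks, assuming a hypothetical AtomicBoolean flag stands in for the information being polled and a 10 ms interval:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class PollingWorker implements Runnable {
    private final AtomicBoolean dataReady; // stand-in for "some information"
    private int polls = 0;                 // touched only by the worker thread

    PollingWorker(AtomicBoolean dataReady) {
        this.dataReady = dataReady;
    }

    @Override
    public void run() {
        while (!dataReady.get()) {
            polls++;
            try {
                Thread.sleep(10); // give up the CPU for roughly 10 ms per check
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return;
            }
        }
    }

    int getPolls() {
        return polls;
    }

    // Starts the worker, makes the data "ready" after ~100 ms and waits for it.
    static int pollUntilReady() {
        AtomicBoolean ready = new AtomicBoolean(false);
        PollingWorker worker = new PollingWorker(ready);
        Thread thread = new Thread(worker);
        thread.start();
        try {
            Thread.sleep(100);
            ready.set(true);
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return worker.getPolls(); // roughly 10, but sleep timing is approximate
    }

    public static void main(String[] args) {
        System.out.println("polls: " + pollUntilReady());
    }
}
```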

In this example, you can see a try-catch around the sleep method. Yes, sleep can throw InterruptedException, and join can be interrupted in the same way. You have to handle this gracefully in your application. In simple programs this exception is often ignored, but a safer habit is to treat it as a request to stop, or at least to restore the interrupt status with Thread.currentThread().interrupt(). Let us see an example with an interrupt from the main thread and its handling in the child thread.
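A sketch of the interrupt scenario: the main thread interrupts a child that is sleeping, and the child treats the InterruptedException as a request to finish. The names and the one-minute sleep are illustrative:

```java
import java.util.concurrent.atomic.AtomicBoolean;

class InterruptDemo {
    // Main thread interrupts a sleeping child; returns true if the child saw it.
    static boolean interruptSleepingChild() {
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread child = new Thread(() -> {
            try {
                Thread.sleep(60_000); // would sleep a minute if left alone
                System.out.println("Child woke up normally");
            } catch (InterruptedException e) {
                // Graceful handling: record it, clean up, and leave run().
                sawInterrupt.set(true);
                System.out.println("Child interrupted, exiting");
            }
        });
        child.setDaemon(true); // never keeps the JVM alive if something goes wrong
        child.start();
        child.interrupt();     // the main thread asks the child to stop
        try {
            child.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println(interruptSleepingChild()); // true
    }
}
```

If the interrupt arrives before the child has even reached sleep, the interrupt status is already set and sleep throws immediately, so the outcome is the same.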

You may want to look at Java synchronization and atomic variables, which are discussed in a different post:

http://karthikpresumes.blogspot.in/2013/02/java-threads-synchronization.html

Thanks for reading, friends.

Tuesday, January 22, 2013

Linux processes - Essentials - Part1

Introduction to Processes

In order to understand an OS process, we need to understand what a computer program is. The CPU sequentially executes a set of instructions stored in RAM, with the help of CPU registers and other collaborating systems like the IO and graphics units. Such a stored set of instructions is known as a program. A program is generally an executable file stored on disk; if you have ever used ls, that is a program. A process is a running instance of a program: if you run 10 ls commands in parallel, the system has 10 processes of that program. Let us try to understand processes in Linux further in this post.

Process related information - Linux

Every process has to be initiated or spawned by some other process in the system. When a process is spawned, a unique ID, known as the PID, is allocated to it, and its parent process is identified by the PPID. Run the command "ps -f" in your terminal and have a look at the columns. You can see PID and PPID along with other fields. I am listing the results from my terminal below.

karthikeyan@karthikeyan:~$ ps -f
UID        PID  PPID  C STIME TTY          TIME CMD
1000      6990  3032  0 14:49 pts/4    00:00:00 bash
1000      8800  6990  0 16:08 pts/4    00:00:00 nc -l 1234
1000      8803  6990  0 16:08 pts/4    00:00:00 ps -f

You can see the bash process, which is the shell. It is the parent of all the processes run from this shell. If you get inquisitive about parent processes, you might want to check the parent of bash as well (PPID 3032). For that, run "ps -ef"; the 'e' option shows every process in the system. From my terminal:

UID        PID  PPID  C STIME TTY          TIME CMD
1000      3032     1  0 11:38 ?        00:00:30 /usr/bin/gnome-terminal -x /bin/sh -c '/home/karthikeyan/Desktop/Link to idea.sh'
1000      3038  3032  0 11:38 ?        00:00:00 gnome-pty-helper
1000      3039  3032  0 11:38 pts/1    00:00:00 /bin/sh -c '/home/karthikeyan/Desktop/Link to idea.sh'
1000      5802  3032  0 13:33 pts/3    00:00:00 bash
1000      6990  3032  0 14:49 pts/4    00:00:00 bash

PID 3032, the parent process of bash, is the gnome-terminal process. Its parent, as shown below, is init (PID 1), and init itself is started by the Linux kernel.

UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 11:21 ?        00:00:00 /sbin/init
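The same parent chain can be walked from Java. This sketch assumes Java 9 or later, where java.lang.ProcessHandle exposes pid() and parent(); ProcessTree is a hypothetical name and the 64-step limit is just a defensive guard of mine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class ProcessTree {
    // Walks from this JVM's process up through its parents (PPIDs).
    static List<Long> ancestry() {
        List<Long> pids = new ArrayList<>();
        Optional<ProcessHandle> current = Optional.of(ProcessHandle.current());
        while (current.isPresent() && pids.size() < 64) {
            ProcessHandle handle = current.get();
            pids.add(handle.pid());
            if (handle.pid() <= 1) {
                break; // reached init (or PID 1 inside a container)
            }
            current = handle.parent(); // empty if the parent is unknown or gone
        }
        return pids;
    }

    public static void main(String[] args) {
        for (long pid : ancestry()) {
            String command = ProcessHandle.of(pid)
                    .flatMap(h -> h.info().command())
                    .orElse("?");
            System.out.println(pid + " " + command);
        }
    }
}
```

Run from a terminal, the printed chain should resemble what ps -ef shows: the JVM, then its shell, then the terminal, and so on up to init. Inside containers the chain may be shorter.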

Since every process has to be spawned by a parent, once a child exits, it has to be cleaned up (reaped) by its parent. Until then, the child is in the zombie state. In the zombie state, the process's resources are deallocated and only its entry in the process table remains. This entry is needed so that the parent can learn the child's exit status through the wait() system call. Once wait is executed for the zombie process, its entry in the process table is deleted and the child leaves the zombie state.
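Java has no direct wrapper for wait(); the closest analogue is Process.waitFor(), which blocks until the child terminates and returns its exit status. As far as I know, OpenJDK on Linux reaps children with waitpid() in a background thread, so a finished child does not linger as a zombie. A sketch, assuming a Unix-like system with sh on the PATH and Java 9+ for Process.pid() (ChildExitStatus is a hypothetical name):

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class ChildExitStatus {
    // Spawns `sh -c "exit <code>"` as a child process and collects its exit status.
    static int runChild(int code) {
        try {
            Process child = new ProcessBuilder("sh", "-c", "exit " + code).start();
            System.out.println("child pid = " + child.pid()
                    + ", parent pid = " + ProcessHandle.current().pid());
            return child.waitFor(); // blocks until the child terminates
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("exit status = " + runChild(3)); // exit status = 3
    }
}
```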

Zombie is the terminal state of a process; there are other states as well, listed below:

Running
This is the state of a process while it is executing. When it is preempted by the scheduler, it moves to the ready state and into the ready queue, to be scheduled later.

Suspended
If the process is waiting on IO or the network, it is in this state (ps reports it as sleeping, S or D).

Stopped
A process in this state has been stopped by another process, usually a debugger, or by a signal such as SIGSTOP. Once it is resumed, it goes back to the ready state.

Handy commands

In this section, we discuss some important commands related to Linux processes.

The most important command to remember is "ps aux". It gives almost all the important information about the processes in the system.

ps reports a snapshot of the current processes. In "aux", 'a' selects the processes of all users, 'u' selects the user-oriented format, and 'x' also includes processes that have no controlling terminal.

Since the output of "ps aux" is generally lengthy, grep or more may be used along with it.

Other important options with ps are:

-e Every process
-f Full format
-p PID List, useful for filtering on PIDs

You can look at the ps man page for other options.

Internals of Process

The structure task_struct, defined in linux/sched.h, represents a process. For every process, the kernel allocates one task_struct to store information about it, so analyzing this structure gives us very useful information. Since this is a huge topic, let us take only a simple overview in this post; I will cover other important details in a later post :)

Let us understand some important fields.

volatile long state;
This holds the state of the process; We have discussed states in this post.

int prio, static_prio, normal_prio;
unsigned int rt_priority;
These hold the priorities of the process.

struct mm_struct *mm, *active_mm;
This is a very important field: it holds the address space of the process. As you know, every process has its own address space, so one process can't accidentally write into another process's memory.

struct thread_struct thread;
This stores the architecture-specific CPU state of the process, such as saved registers.

struct fs_struct *fs;
struct files_struct *files;

These hold file-system information, such as the root and current directories, in the struct fs_struct pointed to by 'fs', and the table of open file descriptors in the struct files_struct pointed to by 'files'.

int exit_state;
int exit_code, exit_signal;

These hold the exit status of the process.

struct pid_link pids[PIDTYPE_MAX];

These link the task into the kernel's PID hash tables, which speed up finding the task_struct for a given PID. The following lists help in walking through processes.

struct list_head children;

This is a doubly linked list of the child processes spawned by this process.

struct list_head sibling;

This links the process into its parent's children list, that is, the doubly linked list of its siblings.

Saturday, December 8, 2012

Java Custom Annotations - Intuitive view

Java Custom Annotations - Intuitive understanding

An annotation, in simple terms, is a description of something, but not every description counts as an annotation in the programming world. Plain Java comments can't be considered annotations, but Javadoc could be. Formally, an annotation is metadata: data about other data. Metadata can be understood by the compiler or by a runtime like the JVM, but it is treated differently from normal data.

For instance, annotations on a Java class are a means to provide extra information about that class. But why is that useful? You may have used compile-time annotations in your code that give hints to the compiler, like @SuppressWarnings, @Deprecated, etc. Annotations are much more powerful than that and make certain great things possible. Let us start with an example and understand the power of annotations intuitively.

Let us think of a simple client-server protocol that exchanges messages over TCP. We define the messaging protocol as follows:

command + "END" + data

command -> defines the action to be performed
"END" -> delimiter for command message
data -> data on which action shall be performed

For example,
The server shall handle "echoENDHello World" by returning the data "Hello World", as echo is a command that does no operation on the data.
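A sketch of parsing this format in Java, assuming the message is split at the first occurrence of "END" (so a command can't itself contain "END"); Message is a hypothetical class:

```java
class Message {
    static final String DELIMITER = "END";

    final String command;
    final String data;

    Message(String command, String data) {
        this.command = command;
        this.data = data;
    }

    // Splits command + "END" + data at the FIRST occurrence of "END".
    static Message parse(String raw) {
        int idx = raw.indexOf(DELIMITER);
        if (idx < 0) {
            throw new IllegalArgumentException("No END delimiter in: " + raw);
        }
        return new Message(raw.substring(0, idx), raw.substring(idx + DELIMITER.length()));
    }

    public static void main(String[] args) {
        Message m = Message.parse("echoENDHello World");
        System.out.println(m.command); // echo
        System.out.println(m.data);    // Hello World
    }
}
```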

Now let us think about the server's implementation. All messages are received by a socket and sent to a message processor object for processing. The processor parses each message and, based on the command, selects the right handler to process it further.

The complete example is uploaded to GitHub: https://github.com/karthikmit/AnnotationsDemo

This is for demo purposes, so expect some TODOs here and there :)

For the Echo command, we may define an EchoHandler class that implements the Handler interface. We may also define a handler factory that returns an EchoHandler object, given the command parameter echo. This design is fairly loosely coupled: adding a new command needs a new handler class and some changes in the handler factory, but no changes in other components. Wait, what information does this handler factory hold? Is that dependency really needed? Yes, because a class implementing the Handler interface can't tell other components what it is for. There are several ways to inject this information into the class; think of a final string that says which command it can handle.

Annotations are a non-intrusive, programmatic way to express this sort of extra information about a class clearly. Non-intrusive means that annotations, on their own, do no harm to the host class. If we annotate each new handler with its command, the message processor can use that annotation during its handler discovery phase. Perfect decoupling is possible with annotations. Let us get into some details.

An annotation is actually a special kind of interface, and the Handler annotation could be defined as follows.
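A sketch of such a definition, assuming the annotation carries the command name in a value() element, together with a tiny annotated class and a reflective lookup (HandlerAnnotationDemo is hypothetical):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Target(ElementType.TYPE)           // may only be placed on classes and interfaces
@Retention(RetentionPolicy.RUNTIME) // kept in the class file and visible via reflection
@interface Handler {
    String value();                 // the command this class handles, e.g. "echo"
}

@Handler("echo")
class EchoMessageHandler {
}

class HandlerAnnotationDemo {
    // Returns the command declared on the class, or null if it isn't annotated.
    static String commandOf(Class<?> type) {
        Handler handler = type.getAnnotation(Handler.class);
        return handler == null ? null : handler.value();
    }

    public static void main(String[] args) {
        System.out.println(commandOf(EchoMessageHandler.class)); // echo
        System.out.println(commandOf(String.class));             // null
    }
}
```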

The @ in front of the interface keyword can be read as AT, for Annotation Type. Apart from this, an annotation definition looks much like a normal interface definition.
Target and Retention are meta-annotations, which "describe" annotations. In this example, Target says the annotation should be applied at the class level (method- and field-level annotations are also possible). Retention says the annotation should be available at run time; we need it at run time so that the message processor object can discover the command handlers.

Given that an annotation is an interface, we need to instantiate it somewhere, right? Let us check the Echo handler code to understand this.

Like a Java modifier, an annotation precedes the definition. In the above snippet, we annotate EchoMessageHandler with Handler and set its value to "echo". This can be thought of as an instantiation of the Handler annotation interface.

I need to digress a bit. Check the Override annotation in the above snippet, which annotates the "handle" method. It says that this method overrides (or implements) a method declared in a parent type; if that is not so, the compiler reports an error. Since Override has source retention, it is a compile-time construct only, and the runtime has no knowledge of it.

Handler discovery in the MessageProcessor class can be defined as follows:

Every Class object and every Method object has a getAnnotations method that returns an array of the annotations present on it. This can be used for handler discovery in this example.
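A self-contained sketch of annotation-based discovery. The MessageHandler interface name, the second "upper" command, and passing the candidate classes to the constructor are my assumptions; a real server might scan the classpath instead:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

@Target(ElementType.TYPE)
@Retention(RetentionPolicy.RUNTIME)
@interface Handler {
    String value(); // command name
}

interface MessageHandler {
    String handle(String data);
}

@Handler("echo")
class EchoMessageHandler implements MessageHandler {
    @Override
    public String handle(String data) {
        return data; // echo returns the data unchanged
    }
}

@Handler("upper") // hypothetical second command
class UpperMessageHandler implements MessageHandler {
    @Override
    public String handle(String data) {
        return data.toUpperCase();
    }
}

class MessageProcessor {
    private final Map<String, MessageHandler> handlers = new HashMap<>();

    // Discovery: inspect each candidate class's annotations.
    MessageProcessor(Class<?>... candidates) {
        for (Class<?> type : candidates) {
            for (Annotation annotation : type.getAnnotations()) {
                if (annotation instanceof Handler && MessageHandler.class.isAssignableFrom(type)) {
                    String command = ((Handler) annotation).value();
                    try {
                        handlers.put(command,
                                (MessageHandler) type.getDeclaredConstructor().newInstance());
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
                    }
                }
            }
        }
    }

    // Parses command + "END" + data and dispatches to the discovered handler.
    String process(String raw) {
        int idx = raw.indexOf("END");
        if (idx < 0) {
            throw new IllegalArgumentException("No END delimiter in: " + raw);
        }
        MessageHandler handler = handlers.get(raw.substring(0, idx));
        if (handler == null) {
            throw new IllegalArgumentException("Unknown command in: " + raw);
        }
        return handler.handle(raw.substring(idx + 3));
    }

    public static void main(String[] args) {
        MessageProcessor processor =
                new MessageProcessor(EchoMessageHandler.class, UpperMessageHandler.class);
        System.out.println(processor.process("echoENDHello World"));  // Hello World
        System.out.println(processor.process("upperENDHello World")); // HELLO WORLD
    }
}
```

Adding a command now means writing one annotated handler class and listing it; the processor itself stays unchanged.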

Tuesday, March 27, 2012

Thread Pooling in Java - Part 2 - Internals.

For the basics of Java threads, please check this post, http://karthikpresumes.blogspot.in/2013/02/java-multi-threading-basics.html

In the first part, we analyzed the need for fixed and cached thread pools.

http://karthikpresumes.blogspot.in/2012/03/thread-pooling-in-java-intuitive.html

Fixed thread pools have a fixed number of threads operating on an unbounded task queue.
Cached thread pools create as many threads as the tasks need at any time, reusing idle ones, and use a SynchronousQueue.

We saw use cases for each of these thread pools in the previous part. Now, what if a use case needs a mix of the above behaviors? For instance, behaving like a CachedThreadPool up to a fixed number of concurrent tasks.

Analyzing the implementations of these thread pools opens new doors for solving interesting variants of thread-pool problems.

Actually, creating either a fixed or a cached thread pool internally creates an instance of ThreadPoolExecutor with different parameters.
For instance, let us analyze the newFixedThreadPool call:

public static ExecutorService newFixedThreadPool(int nThreads) {
    return new ThreadPoolExecutor(nThreads, nThreads,
                                  0L, TimeUnit.MILLISECONDS,
                                  new LinkedBlockingQueue<Runnable>());
}

The declaration of ThreadPoolExecutor is,

public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue) { ...

Let us try to understand each and every parameter.

CorePoolSize: the number of threads to keep alive, once created, even when there are no tasks. In a fixed thread pool, it equals the maximum thread count, since we know the optimal number of threads and destroying/recreating threads hurts performance.

Maximum Pool Size: the maximum number of threads that can be created in the pool. If the number of running threads has reached corePoolSize and the queue of waiting tasks is full, a new thread can be created, provided Maximum Pool Size > Core Pool Size.

Keep Alive Time: if the number of threads exceeds corePoolSize and some of them stay idle for "keepAliveTime", those threads are terminated to save system resources. The next parameter is the unit for keepAliveTime.

BlockingQueue: the queue used to hold waiting tasks. For a fixed thread pool, it is unbounded. For a CachedThreadPool, it is a SynchronousQueue, which has no capacity: each task handed to it must be taken immediately by a thread, so no task is queued for later processing.

So, if we can statistically analyze the peak and average traffic of incoming tasks, we can come up with optimal values for the core pool size, max pool size and keepAliveTime, which makes our thread pool efficient and conservative with resources. :)
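As one possible answer to the mixed-behavior question, here is a sketch of a pool that grows up to a fixed number of threads and lets idle threads die, using ThreadPoolExecutor.allowCoreThreadTimeOut. ElasticFixedPool and the 30-second keep-alive are illustrative choices:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ElasticFixedPool {
    // At most maxThreads threads; extra tasks wait in an unbounded queue;
    // every idle thread (core ones included) exits after keepAliveSeconds.
    static ThreadPoolExecutor create(int maxThreads, long keepAliveSeconds) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,             // core == max, see the note below
                keepAliveSeconds, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true);          // requires keepAliveSeconds > 0
        return pool;
    }

    // Runs 'tasks' trivial tasks and returns the largest thread count ever reached.
    static int runTasks(int maxThreads, int tasks) {
        ThreadPoolExecutor pool = create(maxThreads, 30);
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::countDown);
        }
        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) {
        System.out.println("largest pool size: " + runTasks(4, 20)); // at most 4
    }
}
```

Why core == max? ThreadPoolExecutor only creates threads beyond the core size when the queue is full, and an unbounded queue is never full, so a configuration like core 1, max 4 with a LinkedBlockingQueue would never grow past one thread. Setting both to the same value and letting core threads time out gives the "grow up to N, then shrink when idle" behavior.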

To make the discussion complete, let us try to understand the implementation of ThreadPoolExecutor.

First, let us discuss what happens when ThreadPoolExecutor's execute method is called.

The algorithm behind execute is simple. If the number of threads is less than the core pool size, a new thread is spawned to handle the new task. Otherwise, the task is offered to the queue. If the queue is full, the algorithm checks whether an additional thread can be spawned, constrained by the max pool size; if not, the rejection handler is called.
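The algorithm can be observed with a small bounded configuration: core size 2, max size 4, and an ArrayBlockingQueue of capacity 2. This sketch (ExecuteWalkthrough is a hypothetical name) blocks every task on a latch, so the pool can't drain while the seven tasks are submitted:

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ExecuteWalkthrough {
    // Returns {poolSize, queuedTasks, rejectedTasks} after submitting 7 blocking tasks.
    static int[] run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 30, TimeUnit.SECONDS, new ArrayBlockingQueue<Runnable>(2));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // hold the worker thread until released
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int rejected = 0;
        for (int i = 0; i < 7; i++) {
            try {
                pool.execute(blocker);
            } catch (RejectedExecutionException e) {
                rejected++; // the default AbortPolicy throws
            }
        }
        int poolSize = pool.getPoolSize();
        int queued = pool.getQueue().size();
        release.countDown(); // let every task finish
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {poolSize, queued, rejected};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // [4, 2, 1]
    }
}
```

Tasks 1 and 2 get core threads, tasks 3 and 4 wait in the queue, tasks 5 and 6 get extra threads up to the max size, and task 7 finds everything full and is rejected.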

ThreadPoolExecutor holds a control state variable, ctl, an AtomicInteger that packs two pieces of information: the effective number of worker threads and the run state of the pool (RUNNING, SHUTDOWN, etc.). There are several utility functions around this variable.
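The packing can be sketched as below. The constants and helper names are modeled on OpenJDK 8's ThreadPoolExecutor source, where they are private (newer JDKs call the mask COUNT_MASK instead of CAPACITY), so treat CtlSketch as an illustration rather than the real class:

```java
import java.util.concurrent.atomic.AtomicInteger;

class CtlSketch {
    // High 3 bits hold the run state, low 29 bits hold the worker count.
    static final int COUNT_BITS = Integer.SIZE - 3;      // 29
    static final int CAPACITY   = (1 << COUNT_BITS) - 1; // mask for the low 29 bits

    static final int RUNNING    = -1 << COUNT_BITS;      // the only negative state
    static final int SHUTDOWN   =  0 << COUNT_BITS;
    static final int STOP       =  1 << COUNT_BITS;
    static final int TIDYING    =  2 << COUNT_BITS;
    static final int TERMINATED =  3 << COUNT_BITS;

    static int ctlOf(int runState, int workerCount) { return runState | workerCount; }
    static int runStateOf(int c)    { return c & ~CAPACITY; }
    static int workerCountOf(int c) { return c & CAPACITY; }
    static boolean isRunning(int c) { return c < SHUTDOWN; }

    // Adds one worker atomically; only the low (count) bits change.
    static boolean compareAndIncrementWorkerCount(AtomicInteger ctl, int expect) {
        return ctl.compareAndSet(expect, expect + 1);
    }

    public static void main(String[] args) {
        AtomicInteger ctl = new AtomicInteger(ctlOf(RUNNING, 0));
        compareAndIncrementWorkerCount(ctl, ctl.get());
        int c = ctl.get();
        System.out.println(workerCountOf(c));         // 1
        System.out.println(runStateOf(c) == RUNNING); // true
    }
}
```

Packing both fields into one int lets a single compareAndSet check the run state and update the worker count together, without a lock.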

Apart from this, there are several other pieces that support the main functionality, like termination of the thread pool, thread factories, etc. Those who are interested can dive into the source code for a complete understanding. I have tried my best to keep the information concise.

Thanks for Reading.