Monday, November 18, 2013

JMeter Tutorial - Writing a Test Plan

In this tutorial, we will write a very simple test plan.

Here are the specs of the Test Application we will run JMeter against:

  • large instance on EC2
  • Ubuntu LTS12.04
  • Tomcat 7
  • Java and Spring


Setting up JMeter on Windows

We will be setting up JMeter on Windows simply because it's easier. Read User Load Testing Simulation - Installing Apache JMeter on Windows 2012 Base.


Creating a Thread Group

Start JMeter by clicking on bin/ApacheJMeter.jar in your install location. You will see in the Tree hierarchy that it has a Test Plan item and a WorkBench item.

Right click on Test Plan -> Add -> Threads (Users) -> Thread Group

The Thread Group tells JMeter how many users and requests it should simulate.

In the Thread Group panel, fill in the following:

Name: Web Users
Number of Threads (users): 10
Ramp-Up Period (in seconds): 0
Loop Count: 5

The above will generate 5 requests for each of the 10 users. Total number of requests is 50.

The Ramp-Up Period defines the delay JMeter applies between starting each user. If the Ramp-Up Period is 10 seconds and the Number of Threads is 10, then the delay between starting each user is 1 second. Above, we have Ramp-Up Period = 0, meaning that all the users will start at the same time.


Add Default HTTP Request Information

Since most requests we are going to make will share some common properties (like IP and port), we will set up some default HTTP Request Information.

Right click on Web Users -> Add -> Config Element -> HTTP Request Defaults.

Fill in your server name (or IP) and port number.


Add Cookies

We will add HTTP Cookies.

Right click on Web Users -> Add -> Config Element -> HTTP Cookie Manager


Adding an HTTP Request

This is where you define the requests your simulated users will make. We will add a login request. You will need to figure out what your site's username and password parameters are (in my case, they are username and password).

Right click on Web Users -> Add -> Sampler -> HTTP Request

Fill in the following:

Name: Login
Path: /
Method: POST

In parameters, click Add.

Add Name=username, Value={your_username}
Add Name=password, Value={your_password}

You can add a GET request in a similar way.


Adding the Graph Results Listener

To observe response time, we can add the Graph Results Listener.

Right click on Web Users -> Add -> Listener -> Graph Results.


Running the Test Plan

In the menu bar, click on Run -> Start

Friday, November 15, 2013

CXFServlet OutOfMemoryError

I was working with uploading large files to my Spring application, and I encountered an OutOfMemoryError.

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at org.apache.cxf.io.CachedOutputStream.write(CachedOutputStream.java:461)
        at org.apache.cxf.helpers.IOUtils.copy(IOUtils.java:160)
        at org.apache.cxf.helpers.IOUtils.copy(IOUtils.java:104)
        at org.apache.cxf.attachment.AttachmentDataSource.cache(AttachmentDataSource.java:52)
        at org.apache.cxf.attachment.AttachmentDeserializer.cacheStreamedAttachments(AttachmentDeserializer.java:208)

If you are using CXFServlet, it lets you define the buffer memory threshold (attachment-memory-threshold) and the max upload size (attachment-max-size).
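Here's a sketch of how these can be set, assuming a Spring-configured JAX-RS endpoint (the values below are placeholders, in bytes):

<jaxrs:server address="/api">
    <jaxrs:properties>
        <entry key="attachment-directory" value="/tmp/cxf-attachments"/>
        <entry key="attachment-memory-threshold" value="102400"/>
        <entry key="attachment-max-size" value="104857600"/>
    </jaxrs:properties>
</jaxrs:server>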



If you are sure you have these set correctly, make sure the attachment-directory exists. If it does, make sure Tomcat has permission to write to it.

Munin not generating graphs - Make sure CRON job is running

I am currently using Ubuntu 12.04 on EC2.

If your munin master is not running, you should check if munin is set up as a CRON job.

List all the scheduled cron jobs:
crontab -l
If munin-cron is not set up, we will add it. Edit the crontab file
crontab -e
Let's make the munin master run every 5 minutes. Append the following to the end of the file:
*/5 * * * * /usr/bin/munin-cron
Let's make munin run.
sudo -u munin munin-cron

Monday, November 4, 2013

Enable Async Request Processing for Java Spring

Servlet 3 supports asynchronous request processing, which allows the requested operation to be performed on a separate thread, freeing up the thread that received the HTTP request.

In Spring, a fixed pool of threads is allocated for request processing. If you are running a long operation (such as uploading a large file, batch mailing, or big data analysis), the thread for that request stays blocked until the operation is finished.

A better approach is to return the request right away and let the long operation run in another thread.

In Spring 3.x, it is relatively easy with the @Async support.

Before I dive into the tutorial, here are my specs:
  • ubuntu 12.04
  • large EC2 instance
  • OpenJDK 1.7
  • Maven 3.0.4
  • Spring 3.2.4
  • CXFServlet 2.5.3 with JAX-RS (this is for the REST API)
This post will be about how to enable @Async in your spring project.


Install the latest version of Spring

At the time of this writing, the stable version is 3.2.4. The following shows how to add the Maven dependency in pom.xml.
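A minimal sketch of the dependency entry (assuming the spring-webmvc artifact; swap in the Spring modules your project uses):

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
    <version>3.2.4.RELEASE</version>
</dependency>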



Use Servlet 3.0 namespace

In web.xml, make sure you use the XML namespace for version 3.0.
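The web-app declaration should look roughly like this:

<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
             http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         version="3.0">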

 


Add Async Support in entry servlets in web.xml

If you are writing a normal Spring MVC web app, you will need to add async-supported to the dispatcher servlet. My project only uses the REST API, so I put the async-supported tag in the CXFServlet.
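A sketch of the CXFServlet entry with async support enabled:

<servlet>
    <servlet-name>CXFServlet</servlet-name>
    <servlet-class>org.apache.cxf.transport.servlet.CXFServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
    <async-supported>true</async-supported>
</servlet>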



Enable Async in application context

In applicationContext.xml, enable the executor and scheduler. If you don't know where applicationContext.xml is, find it from web.xml's contextConfigLocation context-param field.
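A sketch of the relevant pieces (the pool sizes are placeholders):

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:task="http://www.springframework.org/schema/task"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
           http://www.springframework.org/schema/task
           http://www.springframework.org/schema/task/spring-task-3.2.xsd">

    <task:annotation-driven executor="myExecutor" scheduler="myScheduler"/>
    <task:executor id="myExecutor" pool-size="5"/>
    <task:scheduler id="myScheduler" pool-size="10"/>
</beans>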


Make sure you have the task namespace as above.


Annotate a method as @Async

Below is a quick test of the @Async method. Notice the @Async annotation below.
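A minimal sketch (the service and method names are hypothetical):

import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class MailService {

    @Async
    public void sendBatchMail() {
        // with @Async, this runs on a thread from the task executor;
        // the caller returns immediately instead of waiting 5 seconds
        try {
            Thread.sleep(5000); // simulate a long operation
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("batch mail sent");
    }
}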


Try taking out the @Async annotation and executing the program. You should be able to observe the difference.

Wednesday, October 9, 2013

Elastic Search on EC2 - Install ES cluster on Amazon Linux AMI

We will install ElasticSearch (ES) on an EC2 instance.

Here are the specs:
  • Amazon Linux AMI 2013.09
  • Medium instance
  • 64-bit machine
  • Elastic Search 0.90.5
  • Spring MVC
  • Maven
Begin by launching an instance.  If you use a micro instance, you may see an out-of-memory error in /var/log/syslog.  If you are not sure how to launch an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.

For the security group, you will need to open the following ports:
  • 22 (SSH)
  • 9300 (ElasticSearch Transport)
  • 9200 (HTTP Testing)

Attach Two EBS drives

We will be using one for saving data and one for logging.  Create and attach two EBS drives in the AWS console.

You will have two volumes: /dev/xvdf and /dev/xvdg.  Let's format them using XFS.
sudo yum -y install xfsprogs xfsdump
sudo mkfs.xfs /dev/xvdf
sudo mkfs.xfs /dev/xvdg
Make the data drive /vol. Make the log drive /vol1.
vi /etc/fstab
Append the following:
/dev/xvdf /vol xfs noatime 0 0
/dev/xvdg /vol1 xfs noatime 0 0
Mount the drives
mkdir /vol
mkdir /vol1
mount /vol
mount /vol1
Read Amazon EC2 - Mounting a EBS drive for more information.

ssh into the instance
ssh -i {key} ubuntu@{ec2_public_address}

Update the machine
sudo yum -y update

Install Oracle Sun Java

In order to run ES efficiently, the JVM must be able to allocate a large virtual address space and perform garbage collection on large heaps without long pauses.  There are also reports online that OpenJDK does not perform as well as Oracle Java for ES.  Feel free to let me know in the comments below if this is not the case.

Download Java 7 from Oracle.

Put it in /usr/lib/jvm.

Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0

You should now have jdk1.7.0 inside /usr/lib/jvm

Set java, javac.
sudo /usr/sbin/alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo /usr/sbin/alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
Switch to the Sun Java:
sudo /usr/sbin/alternatives --config java
Check your java version.
java -version

Download and install ElasticSearch

Download ElasticSearch (Current version as of this writing is 0.90.5).
sudo su
mkdir /opt/tools
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
Install ElasticSearch Cloud AWS plugin.
cd elasticsearch-0.90.5
bin/plugin -install elasticsearch/elasticsearch-cloud-aws/1.15.0

Configuring ES

AWS can shut down your instances at any time.  If you are storing indexed data in ephemeral drives, you will lose all the data when all the instances are shut down.

There are two ways to persist data:
  • Store data in EBS via local gateway
  • Store data in S3 via S3 gateway
A restart of the nodes will recover data from the gateway. The EBS route is better for performance, while the S3 route is better for persistence [S3 is deprecated].

We will be setting up an ES cluster and using a local gateway. The S3 gateway is deprecated at the time of this writing.  The ES team has promised a new backup mechanism in the future.

vi /opt/tools/elasticsearch-0.90.5/config/elasticsearch.yml

cluster.name: mycluster
cloud:
    aws:
        access_key:
        secret_key:
        region: us-east-1
discovery:
    type: ec2

We have specified a cluster called "mycluster" above. You will need to input your AWS access keys and create an S3 bucket.

We also need to ensure the JVM does not swap by doing two things:

1) Locking the memory (find this setting inside elasticsearch.yml)
bootstrap.mlockall: true
2) Set ES_MIN_MEM and ES_MAX_MEM to the same value. It is also recommended to set them to half of the system's available RAM. We will set this in the ElasticSearch Service Wrapper later in the article.

Create the data and log paths.
mkdir -p /vol/elasticsearch/data
mkdir -p /vol1/elasticsearch/log
Set the data and log paths in /config/elasticsearch.yml
path.data: /vol/elasticsearch/data
path.logs: /vol1/elasticsearch/log
Let's edit config/logging.yml
vi /opt/tools/elasticsearch-0.90.5/config/logging.yml
Edit these settings and make sure these lines are uncommented and present:

logger:
  gateway: DEBUG
  org.apache: WARN
  discovery: TRACE


Testing the cluster
bin/elasticsearch -f
Browse to the ec2 address at port 9200
http://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:9200/
You should see the following:
{
  "ok" : true,
  "status" : 200,
  "name" : "Storm",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search" 
}


Installing ElasticSearch as a Service

We will be using the ElasticSearch Java Service Wrapper.

Download the service wrapper and move it to bin/service.
curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv *servicewrapper*/service /opt/tools/elasticsearch-0.90.5/bin
Make ElasticSearch start automatically when the system reboots.
bin/service/elasticsearch install
Make the ElasticSearch service a default command (we will call it es_service).
ln -s /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch /usr/bin/es_service
Start the service
es_service start
You should see:
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID:2503 

Tweaking the memory settings

There are three settings you want to care about:

  • ES_HEAP_SIZE
  • ES_MIN_MEM
  • ES_MAX_MEM
It is recommended to set ES_MIN_MEM to be the same as ES_MAX_MEM.  However, you can just set ES_HEAP_SIZE, as it will be assigned to both ES_MIN_MEM and ES_MAX_MEM.


We will be tweaking these settings in the service wrapper's elasticsearch.conf instead of elasticsearch's.

vi /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch.conf

set.default.ES_HEAP_SIZE=1024

There are a few things you need to be aware of.

  1. You need to leave some memory for the OS for non-elasticsearch operations. Try leaving at least half of the available memory.
  2. As a reference, use 1024 MB for every 1 million documents you are saving.
Restart the service.

Ubuntu EC2 - Install Sun Oracle Java

Download Java 7 from Oracle.

Put it in /usr/lib/jvm.

Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0

You should now have jdk1.7.0 inside /usr/lib/jvm

Set java, javac.
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
If you have more than one version of java, you can always switch them using
sudo update-alternatives --config java
Check your java version.
java -version

Thursday, October 3, 2013

ElasticSearch - Defining the Mapping Schema

The previous posts demonstrated how easy it is to index some words and retrieve them via the REST or Java API.  However, we never really talked about how to tweak the searches to fit our needs.

Consider a subject object with two properties like the following:
{
  "name":"The Old & New British English",
  "code":12345
}
Say we have a list of subjects like the above and we want to index and search subjects with the following requirement:
  1. search by exact subject name
  2. search with stop words removed, accent characters conversion
  3. search with some spelling mistakes allowed
  4. search with some words skipped
  5. search by exact code
Without specifying the mapping, ElasticSearch (ES) will use the standard analyzer.

Before we define the ES schema, let's get familiar with the following terms.

A mapping defines how properties (Ex. "name" and "code" properties above) are indexed and searched through analyzers and tokenizers.

An analyzer is a group of filters executed in order.
Reference: Analyzers

A filter is a function that transforms data (lowercase, stop-word removal, phonetics).
Reference: Token Filters

When we search or index the phrase "The Old & New British English", an analyzer breaks the phrase into words through tokenizers. Each word/token is then passed through a chain of token filters.  For example, a lowercase token filter normalizes the incoming words to lowercase.

For another explanation, refer to this post for a better understanding of analyzers.

The following defines a simple mapping with index=subjects, type=subject, and two properties (name, code).

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
     "subject":{
          "properties":{
            "name":{
              "type":"string"
            },
            "code":{
              "type":"string"
            }
          }
     }
  }
}'


1.) Search by exact subject name

This is very easy. We will make the "name" field not analyzed.

"subject":{
          "properties":{
            "name":{
              "type":"string"
              "index":"not_analyzed"
            }
         }

Let's populate the index.

curl -XPUT http://localhost:9200/subjects/subject/1 -d '
{
  "name":"The Old & New British English",
  "code":12345
}'

Try to do a search on the phrase "The Old & New British English"

curl -X GET "http://localhost:9200/subjects/_search?pretty=true" -d '{
    "query" : {
        "text" : { "name": "The Old & New British English" }
    }
}'

Now try searching with "the Old & New British English" or "Old & New British English". You will get no results. This is not very helpful, since most people won't search with exact case or the exact phrase.

Let's delete this mapping.

curl -X DELETE "http://localhost:9200/subjects"


2) Search with stop words removed, accent characters conversion

Let's use a new custom analyzer called "full_name".

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "type":"string",
              "analyzer":"full_name"
            }
          }
      }
  }
}'

To customize the way searches work, we need to tweak the analyzer settings.  The general form of the settings is as follows:

"settings":{
    "analysis":{
        "filter":{
        }
    },
    "analyzer":{
        "full_name":{
            "filter":[
            ],
            "type":"custom",
            "tokenizer":"standard"
        }
    }
}

We want "subject" to be searchable with stop words removed and normalized accent characters (so that the accent e can be searchable by by an 'e').

"settings":{
    "analysis":{
        "filter":{
        }
    },
    "analyzer":{
        "full_name":{
            "filter":[
                "standard",
                "lowercase",
                "asciifolding"
            ],
            "type":"custom",
            "tokenizer":"standard"
        }
    }
}

The lowercase filter normalizes token text to lowercase. Since an analyzer is used both at index time and at search time, the lowercase filter allows case-insensitive searches.

Let's push the schema to the ES cluster:

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "type":"string",
              analyzer:"full_name"
            }
          }
      }
  },
  "settings":{
    "analysis":{
      "analyzer":{
        "full_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "type":"custom",
          "tokenizer":"standard"
        }
      }
    }
  }
}'

Populate ES with "The Old & New British English".

Search for the following:
  • "The Old & New British English"
  • "old & new british english"
  • "british english"
  • "british hello english"
  • "engliah"

All of the above, except the last one, should return the result.


3) Search with some spelling mistakes allowed

To make the search work for "engliah", we need to use the edgeNGram filter.  edgeNGram takes two parameters: "min_gram" and "max_gram".

For the term "apple" with min_gram=3, max_gram=5, ES will index it with:
  • app
  • appl
  • apple
Let's try this.

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "type":"string",
              "analyzer":"partial_name"
            }
          }
      }
  },
  "settings":{
    "analysis":{
      "filter":{
        "name_ngrams": {
          "max_gram":10,
          "min_gram":2,
          "type": "edgeNGram"
        }
      },
      "analyzer":{
        "partial_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding",
            "name_ngrams"
          ],
          "type":"custom",
          "tokenizer":"standard"
        }
      }
    }
  }
}'

Use _analyze to check how the phrase will be indexed.

curl -X GET "http://localhost:9200/subjects/_analyze?analyzer=partial_name&pretty=true" -d 'The Old & New British English'

Try to search for the term "engliah".  You should see the result showing up.


4) Search with some words skipped

This already works as a result of 3) above.


5) Search by exact code

"subject":{
          "properties":{
            "code":{
              "type":"string"
              "index":"not_analyzed"
            }
         }

You can accomplish this with 1) or 2) above.  For the purpose of the exact search, if case-sensitivity is important to you, use 1); otherwise use 2).  I am opting for 1) above.


Putting all these together

To accommodate the different search formats, we need to specify "name" as a multi-field.

"subject":{
      "properties":{
        "name":{
          "fields":{
            "name":{
              "type":"string",
              "index":"not_analyzed"
            },
            "partial":{
                "type":"string",
                "search_analyzer":"full_name",
                "index_analyzer":"partial_name"
             }
          },
          "type":"multi_field"
        }
      }

You can access "name" by "name.name", or just "name".  This is the default field for "name" and it is defaulted to "full_name" - exact search.

You can access "partial" by "name.partial".  This is the NGram search (spelling mistakes allowed).  We are indexing the words with NGram variations, but using the exact term to search.

For example, consider a search for the term "app" within a data store with the following:
apples
appetizer
apes

If both search_analyzer and index_analyzer are using "partial_name", all three terms above will be returned.

If the search_analyzer is "full_name" and index_analyzer is "partial_name", then only "apples" and "appetizer" will be returned.  This is the desired case.

Now putting the mapping all together:

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "fields":{
                  "name":{
                      "type":"string",
                      "analyzer":"full_name"
                  },
                  "partial":{
                      "type":"string",
                      "search_analyzer":"full_name",
                      "index_analyzer":"partial_name"
                  }
              },
              "type":"multi_field"
            },
            "code":{
                "type":"string",
                "analyzer":"full_name"
            }
          }
      }
  },
  "settings":{
    "analysis":{
      "filter":{
        "name_ngrams": {
          "max_gram":10,
          "min_gram":2,
          "type": "edgeNGram"
        }
      },
      "analyzer":{
        "full_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "type":"custom",
          "tokenizer":"standard"
        },
        "partial_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding",
            "name_ngrams"
          ],
          "type":"custom",
          "tokenizer":"standard"
        }
      }
    }
  }
}'

Wednesday, October 2, 2013

ElasticSearch - Indexing via Java API

There are many ways to populate your data into the ElasticSearch data store. The most primitive way is to use the REST API with PUT or POST requests.

In this tutorial, we will be populating via the Java API. I have data in MySQL and my Web application is based on Spring.

Here's my setup:

  • Ubuntu 12.04 Amazon EC2
  • JDK 1.7
  • Spring 3.2
  • MySQL


Install ElasticSearch (ES) via Maven

Put the following into your pom.xml file.
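A sketch of the dependency entry for the version used in this post:

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>0.90.5</version>
</dependency>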

Make sure you have also installed the same version of ES on your server. Read How to Install ElasticSearch on EC2.

Let's create a search service called ElasticSearchService:

Interface:
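A minimal sketch (the method name is hypothetical):

public interface ElasticSearchService {
    void indexCategories();
}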



Implementation:

We will be using ElasticSearch's native Java API. We will connect to the ElasticSearch cluster using the Client object. Using an XContentBuilder, we can construct a JSON wrapper of the category objects. The category data is stored in MySQL and retrieved by the categoryDao object. Finally, an index request will put the data into the ES cluster.
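A minimal sketch of the implementation, assuming a Category entity and a categoryDao with a findAll() method (both names hypothetical):

import java.io.IOException;

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ElasticSearchServiceImpl implements ElasticSearchService {

    @Autowired
    private CategoryDao categoryDao;

    public void indexCategories() {
        // connect to the ES cluster over the transport protocol (port 9300)
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("localhost", 9300));
        try {
            for (Category category : categoryDao.findAll()) {
                // wrap the category as JSON
                XContentBuilder json = XContentFactory.jsonBuilder()
                        .startObject()
                        .field("name", category.getName())
                        .endObject();
                // index it under /categories/category/{id}
                client.prepareIndex("categories", "category", String.valueOf(category.getId()))
                        .setSource(json)
                        .execute()
                        .actionGet();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        } finally {
            client.close();
        }
    }
}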



Let's create the interface through which you can invoke the call.

Interface:



Implementation:

Monday, September 30, 2013

ElasticSearch Query - how to insert and retrieve search data

ElasticSearch uses HTTP Methods (ex. GET, POST, PUT, DELETE) to retrieve, save, and delete search data from its index.

For simplicity, we will use curl to demonstrate some usages. If you haven't done so already, start ElasticSearch in your terminal.


Adding a document

We will send an HTTP POST request to add the subject "sports" to an index. The request has the following form:
curl -XPOST "http://localhost:9200/{index}/{type}/{id}" -d '{"key0":  "value0", ... , "keyX": "valueX"}'
Example:
curl -XPOST "http://localhost:9200/subjects/subject/1" -d '{"name":  "sports",  "creator": {"first_name":"John", "last_name":"Smith"}}'

Retrieving the document

We can get the document back by sending a GET request.
curl -X GET "http://localhost:9200/subjects/_search?q=sports"
We can also use a POST request to query the above.
curl -X POST "http://localhost:9200/subjects/_search" -d '{
"query": {"term":{"name":"sports"}}
}'
Both of the above will give you the following:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"subjects","_type":"subject","_id":"1","_score":0.30685282, "_source" : {"name":  "sports"}}]}}
The _source field above holds the results for the query.

To search based on the nested properties (Ex. first_name, last_name), we can do the following:
curl -XGET "http://localhost:9200/subjects/_search?q=subject.creator.first_name:John"
curl -XGET "http://localhost:9200/subjects/subject/_search?q=creator.first_name:John"
curl -XGET "http://localhost:9200/subjects/subject/_search?q=subject.creator.first_name:John" 
All the above queries will return the same results.


Deleting the document

Similarly, we can delete the subjects index with a DELETE request.
curl -X DELETE "http://localhost:9200/subjects"

Creating Document with settings and mappings

If you want to adjust settings like the number of shards and replicas, you may find the following useful. The more shards you have, the better the indexing performance; the more replicas you have, the better the search performance.
curl -X PUT "http://localhost:9200/subjects" -d '
{"settings":{"index":{"number_of_shards":3, "number_of_replicas":2}}},
{"mappings":{"document": {
                             "properties": {
                                 "name" : {"type":string, "analyzer":"full_text"}
                             }
                         }
                       }
}'
The above creates an index called subjects. Each document in the index has a property called name.


Checking the Mapping
curl -X GET "http://localhost:9200/subjects/_mapping?pretty=true"
You should see
{
  "subjects" : { }
}
The pretty parameter above just formats the JSON result in a human readable format.

How to Install ElasticSearch on EC2

Search is not easy. There are a lot of things you need to consider.

At the software level,

Can a search query have spelling mistakes?
Should stop words (Ex. a, the) be filtered?
What about a phrase search given a non-exact phrase?

At the operations level,

Should the search be decoupled from the app machines?
Should the search be distributed? If so, how many shards and replicas should there be?

Doing a quick search would tell you that Apache Lucene is the industry standard. There are two popular abstractions on top of Lucene: Solr and ElasticSearch (ES).

There is a lot of debate on which one should be used. I chose ES because
  • it's distributed by design
  • it's easier to integrate with AWS EC2

The following post will show how to install ElasticSearch on your Linux machine (I like to use the Ubuntu 12.04 build from EC2).

Download elasticsearch from elasticsearch.org. Extract the files and put them into a folder of your choice (Ex. /opt/tools).
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
You can start elasticsearch by:
bin/elasticsearch -f
You may want to tweak the Xmx (max heap size for the JVM) and Xms (initial heap size for the JVM) values.
bin/elasticsearch -f -Xmx2g -Xms2g -Des.index.storage.type=memory -Des.max-open-files=true
You can also run it as a service using the script located in bin/service.

After you have started the service, visit "http://localhost:9200" in the browser. You should see the following:

{
  "ok" : true,
  "status" : 200,
  "name" : "Solitaire",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search"
}

Thursday, September 26, 2013

Java reading and writing file line by line

We will be using BufferedReader to read a structured file line by line and then using BufferedWriter to write it out.

The example takes some structured data, creates a MySQL insert statement for each dataset, and then outputs the result as a file.
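A minimal sketch, assuming tab-separated input lines of the form id<TAB>name and a hypothetical table person(id, name):

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class InsertStatementGenerator {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
             BufferedWriter writer = new BufferedWriter(new FileWriter("inserts.sql"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t");
                // one INSERT statement per dataset line
                writer.write("INSERT INTO person (id, name) VALUES ("
                        + parts[0] + ", '" + parts[1] + "');");
                writer.newLine();
            }
        }
    }
}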

Sunday, August 25, 2013

Uninstall NodeJS from MacOSX

Open your terminal.

Find where nodejs is installed by:
which node
In my case, it's in /usr/local/bin/node

Go to the folder that contains bin/node
cd /usr/local
Remove all node-related files
sudo rm -rf bin/node bin/node-waf include/node lib/node lib/pkgconfig/nodejs.pc share/man/man1/node.1

Sunday, August 18, 2013

FireFox OS Tutorial - Creating a Percent Calculator App

In this post, I will demonstrate how to build a Firefox OS app. From beginning to end, it took around half a day, but most of the time was spent on non-coding tasks like taking screenshots of the app and making the icons.

For the purpose of this post, we will build something very simple - the Percent Calculator.

Here are the tools and frameworks I used:


Percent Calculator

Here are some screenshots of the app:





Install the Firefox OS Simulator

Before we begin, be sure you have the latest version of the Firefox browser.

Download the Firefox OS Simulator as an add-on.

In the Firefox browser, click on the Firefox menu -> Web Developer -> Firefox OS Simulator.

This is your Firefox dashboard.


Toggle the Simulator button to "Running" as shown above.


Congratulations! You now have the simulator running. Play around with it to get a feel for how it works.

Creating the App Source Structure

Create the folder structure like the following:

root
->css
  ->app.css
->images
->js
  ->app.js
index.html
manifest.webapp

Download the minified versions of jQuery and jQuery Mobile and put them in the js folder above. You may also want to roll your own jQuery Mobile theme.

After you are done, add the links to the head section of the index.html


    < link rel="stylesheet" href="css/app.css">
    < link rel="stylesheet" href="css/mytheme.min.css" />
    < link rel="stylesheet" href="css/jquery.mobile.structure-1.3.2.min.css" />
    < script src="js/jquery-1.9.1.min.js">< /script>
    < script src="js/jquery.mobile-1.3.2.min.js">< /script>
    < script src="js/app.js">< /script>

app.css will store all the styles, while app.js will store all the logic. Note that it is very important to place all JavaScript code in files outside of index.html due to the Content Security Policy (CSP).

Here's the code for index.html so far:



css/borderless.min.css is the css file I created using the jQuery theme roller.

The Manifest File

manifest.webapp defines the app's metadata. You can specify version, description, icons, developer, permissions, language, etc.

Here's the sample manifest file:
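A sketch of what it might contain (the names and paths are placeholders):

{
  "version": "1.0",
  "name": "Percent Calculator",
  "description": "A simple percent calculator",
  "launch_path": "/index.html",
  "icons": {
    "30": "/images/icon30.png",
    "60": "/images/icon60.png"
  },
  "developer": {
    "name": "Your Name",
    "url": "http://example.com"
  },
  "default_locale": "en"
}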



You will want to bookmark the permissions page.

Coding the App

If you know HTML, CSS and Javascript, you should have no problem with this part. If you do not know anything about it, click here.

There will be three files you will be constantly working with:

index.html - holds your page markup
css/app.css - all your stylings
js/app.js - all the app logic

Here is the source code for the app (everything should be self-explanatory).

index.html


css/app.css



js/app.js



Creating the icons

Download the PSD file (Icon circle) at the bottom of this page. Open it in Photoshop and create a logo. You will want sizes of 30x30 and 60x60.

Specify these in the manifest.

Publish to the Market

When you are ready, zip everything inside the root folder. Log in to the Firefox Marketplace.

Test your zip file by uploading it to the app validator. Select App Type as Packaged.

You will probably see a bunch of CSP warnings. It is okay as long as the app is not a privileged or certified app.

When you are ready, submit it to the market. You will need to write a privacy policy as well.

Wednesday, July 17, 2013

Ansible EC2 - setting up Nginx, MySQL, php, git

In this post, we will write a playbook that sets up an EC2 machine with a fully working php environment.

Starting from a fresh machine with an attached EBS volume, we will do the following:

  1. Format the new ebs volume with XFS and mount it as /vol
  2. Install php, mysql and nginx
  3. Create a mysql user and create a database
  4. Copy the public and private keys onto the target machine
  5. Checkout a project from github

Begin by spinning up a fresh EC2 AMI and attaching an EBS volume to it. Read Ansible - how to launch EC2 instances and setup the php environment.


Format the new ebs volume with XFS and mount it as /vol

We will mount the new EBS volume /dev/xvdf as /vol and format it with XFS.

    - name: update machine with latest packages
      action: command yum -y update
    - name: install xfsprogs
      action: yum pkg=xfsprogs state=latest
    - name: format new volume
      filesystem: fstype=xfs dev=/dev/xvdf
    - name: edit fstab and mount the vol
      action: mount name={{mount_dir}} src=/dev/xvdf opts=noatime fstype=xfs state=mounted


Install php, mysql and nginx

    - name: install php
      action: yum pkg=php state=latest
    - name: install php-mysql
      action: yum pkg=php-mysql state=latest
    - name: install nginx
      action: yum pkg=nginx state=latest
    - name: ensure nginx is running
      action: service name=nginx state=started
    - name: install mysql server
      action: yum pkg=mysql-server state=latest
    - name: make sure mysql is running
      action: service name=mysqld state=started


Create a mysql user and a database

    - name: install python mysql
      action: yum pkg=MySQL-python state=latest
    - name: create database user
      action: mysql_user user=admin password=1234qwer priv=*.*:ALL state=present
    - name: create db
      action: mysql_db db=ansible state=present


Copy the public and private keys onto the target machine

We want the target machine to be able to do a git pull without username and password prompts.

mkdir ~/.ssh
ssh-keygen -t rsa -C "you@email.com"

You will see:
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Just press Enter on the above prompts.

Two files will be generated: id_rsa, id_rsa.pub

Log in to GitHub, then go to Account Settings -> SSH Keys.

Add a new key by giving it a name and pasting in the content of id_rsa.pub.

Test it by:
ssh -T git@github.com
Here are the Ansible tasks:

    - name: install git
      action: yum pkg=git state=latest
    - name: copy private key
      action: template src=~/.ssh/id_rsa dest=~/.ssh/id_rsa
    - name: copy public key
      action: template src=~/.ssh/id_rsa.pub dest=~/.ssh/id_rsa.pub


Checkout a project from github

    - name: git checkout source
      action: git repo=git@github.com:{your_git_repo}.git dest={{work_dir}} version=unstable


Full Ansible Playbook source:

Tuesday, July 16, 2013

Ansible - how to launch EC2 instances and setup the php environment

In this post, we will create a script that launches an instance in the EC2 cloud and installs php and nginx on it (installing httpd is very similar).

First you will need to set up Ansible.

If you are using ubuntu, read Install Ansible on ubuntu EC2.

If you are using a Mac, read Installing and Running Ansible on Mac OSX and pinging ec2 machines.

You must:
  • have python boto installed
  • set up the AWS access keys in the environment settings
Adding a host

We will use the ec2 module. It runs against localhost, so we will add a host entry.

vi /etc/ansible/hosts

Append the following:

localhost ansible_connection=local

Launching a micro instance
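A sketch of the playbook (the keypair and AMI id are placeholders, and the ec2 module arguments vary by Ansible version):

- name: create instances
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: launch a micro instance
      action: ec2 keypair={your_keypair} group=default instance_type=t1.micro image={your_ami_id} region=us-east-1 instance_tags='{"Name":"ansible"}' wait=yes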



Label this launch_playbook.yml

Execute the script.
ansible-playbook launch_playbook.yml
In your AWS EC2 console, you will see an instance named ansible. Each task is executed in sequence.

Now add this new host to the ansible hosts file and label it webservers.

vi /etc/ansible/hosts
[webservers]
{the_ip_of_ec2_instance_we_just_created} ansible_connection=ssh ansible_ssh_user=ec2-user ansible_ssh_private_key_file={path_to_aws_private_key}
You don't have to do the above. In fact, you can use the group name "ec2-servers" in the following script, but then the following script would need to be in the same file as the first script. I am just separating the files for easier configuration in the future.


Installing php, nginx, mysql
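A sketch of the playbook, using the same yum and service task style as the rest of this blog:

- name: configure instances
  hosts: webservers
  sudo: yes
  tasks:
    - name: install php
      action: yum pkg=php state=latest
    - name: install nginx
      action: yum pkg=nginx state=latest
    - name: ensure nginx is running
      action: service name=nginx state=started
    - name: install mysql server
      action: yum pkg=mysql-server state=latest
    - name: ensure mysql is running
      action: service name=mysqld state=started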

Label this configure_playbook.yml

Execute the script.
ansible-playbook configure_playbook.yml
Go to the public address of this instance. You should see the nginx welcome message.

Remember to terminate the instance when you finish, or it will keep incurring charges.

Install Ansible on ubuntu EC2

Begin by spinning up a new EC2 ubuntu instance.


Install Ansible and its dependencies
sudo apt-get install python-pip python-dev
sudo pip install ansible
sudo apt-get install python-boto 
Make sure the boto version is greater than 2.3.

To check boto version:
pip freeze | grep boto

Make the hosts file
sudo mkdir /etc/ansible
sudo touch /etc/ansible/hosts
Put the IPs of your machines in the hosts file.

Ex. [webservers] is a group name for the 2 IPs below.
[webservers]
255.255.255.255
111.111.111.111

Check the Playbook Settings

ansible-playbook playbook.yml --list-hosts

You will see the servers that the Playbook will run against:

  play #1 (create instances): host count=1
    localhost

  play #2 (configure instances): host count=0


Play the Playbook

ansible-playbook playbook.yml


AWS credentials

If you are going to use the ec2 module, you will need to set up the access keys in your environment.
vi ~/.bashrc
Append the following with your keys (you need to log in to your AWS console to get the access key pair):
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY}

Saturday, July 13, 2013

Installing and Running Ansible on Mac OSX and pinging ec2 machines

We will be installing Ansible from Git.


Install Ansible

Download ez_setup.py.

Install ez_setup, pip, jinja2 and ansible.
sudo python ez_setup.py
sudo easy_install pip
sudo pip install ansible
sudo pip install jinja2

Define your host file

Create the file /etc/ansible/hosts.

Put the IP of each machine you want to ping.

Example:
[appservers]
255.255.255.255 ansible_ssh_private_key_file={your_key_path}.pem  ansible_ssh_user=ec2-user
Change the IP to your EC2 instance's IP. The [appservers] is just a label for grouping. You may have servers grouped as web servers, app servers, db servers, etc.


Run Ansible

ansible all -m ping

You will see a response similar to the following if it's successful.
255.255.255.255 | success >> {
    "changed": false,
    "ping": "pong"
Let's execute a command on all the machines:

ansible all -a "/bin/echo hello"

You will see:
255.255.255.255 | success | rc=0 >>
hello

Saving the key in memory

If you don't specify the ansible_ssh_private_key_file and ansible_ssh_user attributes in the inventory file above, you can either 1) specify the key and user in the ansible command or 2) use ssh-agent.

1.) Explicitly specifying the user and key:
ansible all -m ping -u ec2-user --private-key={your_key}.pem
2.) Using ssh-agent and ssh-add
ssh-agent bash
ssh-add ~/.ssh/{your_key}.pem
Then you can ping the ec2 server like this:
ansible all -m ping -u ec2-user

Wednesday, July 10, 2013

AWS Elastic MapReduce - EMR MySQL DBInputFormat

In this post, we will build a MapReduce program as a JAR executable. To make this example more interesting than most other online posts, we will modify the common WordCount example to fetch from MySQL instead of a text file.

You will need to understand at least the basics of what the mapper and the reducer are to follow this post. You may want to read this from Apache.

We will use Maven to build the project. If you have no idea how to do this, read Building a JAR Executable with Maven and Spring. We will feed this JAR to Amazon Elastic MapReduce (EMR) and save the output in Amazon S3.

Here are the EMR supported Hadoop Versions. We will be using 1.0.3.


What we will do:

Assume we have a database called Company and there is a table called Employee with two columns: id and title.

We will count the number of employees with the same titles.

This is the same as the WordCount examples you see in other tutorials, except we are fetching the data from a database.


Install Hadoop Library

First, in your Java project, include the Maven dependency in the pom.xml file.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
</dependency>


The File Structure

The program will be very basic and contain the following files. The filenames should be self-explanatory.

Main.java
Map.java
Reduce.java


The mapred library VS the mapreduce library

When you read other hadoop examples online, you will see them using either the mapred or the mapreduce library. mapred is the older version, while mapreduce is the cleaner and newer version. To upgrade from mapred to mapreduce, read Hadoop - mapred VS mapreduce libraries.

This example will use the org.apache.hadoop.mapreduce library.


EmployeeRecord

We will need to serialize the object of our interest by implementing Writable and DBWritable as shown below.
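A sketch of the record class (the field names follow the Employee table above):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class EmployeeRecord implements Writable, DBWritable {

    private long id;
    private String title;

    // Writable: used when Hadoop moves the record between map and reduce
    public void readFields(DataInput in) throws IOException {
        this.id = in.readLong();
        this.title = Text.readString(in);
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(this.id);
        Text.writeString(out, this.title);
    }

    // DBWritable: used when reading the record from the database
    public void readFields(ResultSet result) throws SQLException {
        this.id = result.getLong(1);
        this.title = result.getString(2);
    }

    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setLong(1, this.id);
        stmt.setString(2, this.title);
    }

    public String getTitle() {
        return title;
    }
}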




The Mapper
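A sketch of the mapper; with DBInputFormat the input key is a LongWritable row id and the input value is our EmployeeRecord:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, EmployeeRecord, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, EmployeeRecord employee, Context context)
            throws IOException, InterruptedException {
        // emit (title, 1) for every employee row
        context.write(new Text(employee.getTitle()), one);
    }
}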




The Reducer
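A sketch of the reducer, summing the counts for each title:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text title, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : counts) {
            sum += count.get();
        }
        // emit (title, number of employees with that title)
        context.write(title, new IntWritable(sum));
    }
}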




Main.java

We will hook everything up. The steps are simple.

Create a Job.
Set output format.
Set input format.
Set Mapper class.
Set Reducer class.
Set input. (In our case, it will be from the database)
Set output.
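
A sketch of Main.java following those steps (the JDBC URL and credentials are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Main {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://{your_db_host}/Company", "{user}", "{password}");

        Job job = new Job(conf, "employee title count");
        job.setJarByClass(Main.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // input: the Employee table, via DBInputFormat
        job.setInputFormatClass(DBInputFormat.class);
        DBInputFormat.setInput(job, EmployeeRecord.class,
                "Employee", null, "id", new String[] { "id", "title" });

        // output: the location passed in as the first argument (an S3 path on EMR)
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        job.waitForCompletion(true);
    }
}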


Run the Job via the AWS EMR console

Compile the project and generate a self-contained JAR file. If you are using maven, read Building a JAR Executable with Maven and Spring.

Upload your JAR file to your s3 bucket.

In the AWS EMR console, specify the location of the JAR file.

JAR location: {your_bucket_name}/{jar_name}

Arguments: s3n://{your_bucket_name}/output

The program above takes in the output location as an argument.

Read AWS - Elastic Map Reduce Tutorial for more details on how to create a job flow in EMR.

If you encounter the mysql driver missing error, read Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver.

Tuesday, July 9, 2013

Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver

If you get the "ClassNotFoundException: com.mysql.jdbc.Driver" error while doing a JAR Elastic MapReduce, you will need to copy the mysql connector library into the hadoop/bin library.

The error will look like:

Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:184)
... 20 more

We can copy the mysql connector library to each of the machines by "bootstrapping".


1.) Get the MySQL connector library.

You can download it from the Maven repository.

Create a bucket on S3 and upload the SQL connector to this bucket.


2.) Writing a bootstrap bash file

Name this file bootstrap.sh. We will use the "hadoop fs" command to copy the connector from S3 to each machine.

Script:
#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib
Upload this script to the same bucket you created in the previous step.


 3.) Create a Job Flow

Log in to the AWS EMR console.

Click on create a job flow.

Fill in all the details including your JAR file.

At the last "bootstrap" step, select custom bootstrap action and put in the location of the bootstrap.sh script (ex. s3n://{my_bucket}/bootscript.sh).

Start the job flow and monitor the stderr and stdout. Everything should work.

Hadoop - mapred VS mapreduce libraries

When you start to work on Hadoop, you may find that there are two libraries (mapred VS mapreduce) used in a lot of online tutorials. Use the mapreduce library; mapred is the older version.

To upgrade to the mapreduce library, check out the following slideshows from Yahoo:

Monday, July 8, 2013

Amazon EMR - RDS DB Security Group

Log in to your AWS RDS console. Select Security Groups on the left sidebar.

Select your DB Security Group and click on the Edit button.

Add the following:

1.) EMR master

Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-master

2.) EMR slave

Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-slave

Sunday, June 30, 2013

MySQL - delete from a table with a subquery on the same table

Assume you have a table called story with columns id, content and status. 

Table schema:
CREATE TABLE `story` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content` longtext NOT NULL,
  `status` smallint(6) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
We want to delete all the records that have status 3 and id less than 6022.

You may do the following:
delete from story where id in (select id from story where status = 3 and id < 6022);
However, this does not work.
ERROR 1093 (HY000): You can't specify target table 'story' for update in FROM clause
Instead, you need to wrap the subquery in a derived table with an alias (called old_featured below).
delete from story where id in (select * from (select id from story where status = 3 and id < 6022) as old_featured);

Wednesday, June 26, 2013

Regex prepend and append

I was working on turning all the countries below into options in a select dropdown box. For the upcoming example, you can use Sublime or Notepad++.

The countries have the following form (country_value {tab_character} country_name):

Sample Input:
AD	Andorra
AE	United Arab Emirates
AF	Afghanistan
AG	Antigua and Barbuda
AI	Anguilla
AL	Albania
AM	Armenia
An option html element has the following form:
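Something like this, using the first sample country:

<option value="AD">Andorra</option>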



There are two things we need to do:

  1. trim all the whitespaces
  2. Replace with the option tag


1. Trim all the whitespaces

You can trim all the whitespaces by matching the following:
Pattern: ^[ \t]+|[ \t]+$
Replace with: (nothing)

2. Replace with the option tag

Match the line into two groups; you can use \1 to reference the first group (the first matching parenthesis below).
Pattern: (.*)\t(.*)
Replace with: 
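<option value="\1">\2</option>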

Thursday, June 20, 2013

Stopping Image Bandwidth Theft with .htaccess

A few days ago, my host notified me that my shared hosting account had been using more CPU resources than usual. I didn't see the email, and my site was suspended. I emailed my host and asked them to un-suspend my account so I could investigate the problem.

I checked Google Analytics and the traffic was normal. I then checked the bandwidth usage and found that the bandwidth was very high.

I dug deeper into the problem and discovered that some sites were hotlinking pictures on my domain.

Here's the hot-linking site:



In case you are wondering why the image is so familiar, I actually purchased it from ShutterStock.

So I had an idea - why not swap those pictures with an image that markets my site?

I used Photoshop and created this.


In case you are wondering, the banner goes to appgags.com.

In the .htaccess at the root of my site, I appended the following code:

RewriteCond %{HTTP_REFERER} !^http://(.+\.)?appsmylife\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png|jpg)$ /images/appgags_banner.jpg [L]

Here's the result:


Unfortunately, the image is stretched due to the way that website sized it.

Wednesday, June 19, 2013

Building a JAR Executable with Maven and Spring

I have a Spring MVC web project. I want to generate some XML files based on data stored in my database. This program will run once or a few times every day. As the data size grows, the program may take longer to run. The best way to do this is to write a JAR executable and feed it to Amazon Elastic MapReduce.

We will set up a small JAR project with Maven below.

First, generate a Maven project called maventest. If you do not have Maven set up, read Install Maven 3 on Amazon EC2 Ubuntu.
mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.mycompany.maventest -DartifactId=maventest
Resource files are located in src/main/resources/.

Spring requires a context xml file that defines how classes are instantiated. Create a file called applicationContext.xml in src/main/resources.

Paste the following into the applicationContext.xml file
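A minimal sketch of what it might contain (the scanned package follows the project layout described below):

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context-3.2.xsd">

    <context:component-scan base-package="com.mycompany.package2"/>
    <context:property-placeholder location="classpath:jdbc.properties"/>
</beans>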



Create a file called jdbc.properties in src/main/resources. We will be connecting to a MySQL database.
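A sketch with placeholder values (the exact keys must match what your data source bean expects):

jdbc.driverClassName=com.mysql.jdbc.Driver
jdbc.url=jdbc:mysql://localhost:3306/{your_database}
jdbc.username={your_username}
jdbc.password={your_password}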



My whole project is called myproject. The Spring MVC web project has the package name com.mycompany.package1 and artifactId mywebproject. This cron program will have the package name com.mycompany.package2 with artifactId cron.

We will use the maven-jar-plugin to define the location of our main class.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>com.mycompany.package2.App</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>

In order to package all the dependencies as one jar file, we will use maven-shade-plugin. Sometimes, spring.schemas and spring.handlers files can be overwritten. Examine the transformers tag below to see that the content of these files will be appended.
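A sketch of the plugin configuration with the appending transformers:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/spring.handlers</resource>
          </transformer>
          <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/spring.schemas</resource>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>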

We will now include all the dependencies in the pom.xml file.


Package the project (it's useful to understand the difference between compile, package, install and deploy):
mvn clean package
If the package command fails, it may complain that it can't find the com.mycompany.package1.mywebproject project.

You can run mvn install for mywebproject. This will install mywebproject into the local Maven repository, located under /Users/{your_name}/.m2. Remember to compile mywebproject as a JAR by setting it in the pom.xml. If you ever suspect that some classes are missing or the build is not quite correct, it's highly likely that the mywebproject in the local repository is not up to date.

The jar file is located in the target folder. Run the jar:
java -jar maventest-1.0-SNAPSHOT.jar

Unable to locate resource hector-core-1.1-3-SNAPSHOT.jar

If you ever get the following message regarding the hector client, you can install it manually by downloading the file from the hectorclient website.

Error message:

Unable to locate resource https://oss.sonatype.org/content/groups/public/org/hectorclient/hector-core/1.1-3-SNAPSHOT/hector-core-1.1-3-SNAPSHOT.jar

Installation command:

mvn install:install-file -DgroupId=org.hectorclient -DartifactId=hector-core -Dversion=1.1-3-SNAPSHOT -Dpackaging=jar -Dfile=~/Downloads/hector-core-1.1-3-20130112.031550-9.jar

Tuesday, June 18, 2013

Spring component-scan: scanning services from multiple base packages

In your applicationContext.xml, you can specify base packages for scanning by:


<context:component-scan base-package="com.mycompany.package1, com.mycompany.package2"/>

Maven - A sample pom.xml for java jar executable

The following provides a sample JAR executable.

It uses Hibernate, Spring and MySQL.

The class that contains the Main function is specified in the "mainClass" tag.

Sample pom.xml

< ?xml version="1.0"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany.generator</groupId>
  <artifactId>onix-generator</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>onix-generator</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <spring.version>3.2.1.RELEASE</spring.version>
    <jdbc.groupId>mysql</jdbc.groupId>
        <jdbc.artifactId>mysql-connector-java</jdbc.artifactId>
        <jdbc.version>5.1.14</jdbc.version>
        <hibernate.version>3.6.10.Final</hibernate.version>
        <javamail.version>1.4.1</javamail.version>
        <log4j.version>1.2.16</log4j.version>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <addClasspath>true</addClasspath>
              <mainClass>com.mycompany.generator.App</mainClass>
              <packageName>com.mycompany.generator</packageName>
            </manifest>
          </archive>
        </configuration>
      </plugin>
      <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <configuration>
      <source>1.7</source>
      <target>1.7</target>
      </configuration>
      </plugin>
    </plugins>
  </build>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>${jdbc.groupId}</groupId>
      <artifactId>${jdbc.artifactId}</artifactId>
      <version>${jdbc.version}</version>
    </dependency>
    <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4j.version}</version>
            <exclusions>
                <exclusion>
                    <artifactId>mail</artifactId>
                    <groupId>javax.mail</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jms</artifactId>
                    <groupId>javax.jms</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jmxtools</artifactId>
                    <groupId>com.sun.jdmk</groupId>
                </exclusion>
                <exclusion>
                    <artifactId>jmxri</artifactId>
                    <groupId>com.sun.jmx</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-orm</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-aspects</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context-support</artifactId>
            <version>${spring.version}</version>
        </dependency>
        <dependency>
        <groupId>com.mycompany.generator</groupId>
        <artifactId>epubserver</artifactId>
        <version>1.0-SNAPSHOT</version>
        <classifier>classes</classifier>
        </dependency>
  </dependencies>

</project>

You can generate an executable by running
mvn clean package
The jar file will be located in the target folder. You can run the executable by:
java -jar {name_of_jar}