Monday, November 18, 2013

JMeter Tutorial - Writing a Test Plan

In this tutorial, we will write a very simple test plan.

Here are the specs of the Test Application we will run JMeter against:

  • large instance on EC2
  • Ubuntu LTS12.04
  • Tomcat 7
  • Java and Spring


Setting up JMeter on Windows

We will be setting up JMeter on Windows simply because it's easier. Read User Load Testing Simulation - Installing Apache JMeter on Windows 2012 Base.


Creating a Thread Group

Start JMeter by double-clicking bin/ApacheJMeter.jar in your install location. You will see that the tree hierarchy has a Test Plan item and a WorkBench item.

Right click on Test Plan -> Add -> Threads (Users) -> Thread Group

The Thread Group tells JMeter how many users and requests it should simulate.

In the Thread Group panel, fill in the following:

Name: Web Users
Number of Threads (users): 10
Ramp-Up Period (in seconds): 0
Loop Count: 5

The above will generate 5 requests for each of the 10 users. Total number of requests is 50.

The Ramp-Up Period defines the time over which JMeter starts all the users. If the Ramp-Up Period is 10 seconds and the Number of Threads is 10, then the delay between each user starting is 1 second. Above, we set the Ramp-Up Period to 0, meaning all the users start at the same time.
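The arithmetic above can be sketched in a few lines of Java (a toy illustration, not JMeter code):

```java
public class RampUp {
    // Approximate delay between successive thread starts in JMeter:
    // the ramp-up period divided by the number of threads.
    static double delayBetweenUsers(double rampUpSeconds, int numThreads) {
        return rampUpSeconds / numThreads;
    }

    public static void main(String[] args) {
        System.out.println(delayBetweenUsers(10, 10)); // 1.0 second between users
        System.out.println(delayBetweenUsers(0, 10));  // 0.0 -> all users start at once
    }
}
```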


Add Default HTTP Request Information

Since most requests we are going to make will share some common properties (like IP and port), we will set up some default HTTP Request Information.

Right click on Web Users -> Add -> Config Element -> HTTP Request Defaults.

Fill in your server name (or IP) and port number.


Add Cookies

We will add an HTTP Cookie Manager so that each simulated user keeps its own cookies.

Right click on Web Users -> Add -> Config Element -> HTTP Cookie Manager


Adding an HTTP Request

This defines what your simulated users will request. We will add a login request. You will need to find out your site's username and password parameter names (in my case, they are username and password).

Right click on Web Users -> Add -> Sampler -> HTTP Request

Fill in the following:

Name: Login
Path: /
Method: POST

In parameters, click Add.

Add Name=username, Value={your_username}
Add Name=password, Value={your password}

You can add a GET request in a similar way.


Adding the Graph Results Listener

To observe response time, we can add the Graph Results Listener.

Right click on Web Users -> Add -> Listener -> Graph Results.


Running the Test Plan

In the menu bar, click on Run -> Start

Friday, November 15, 2013

CxfServlet OutOfMemoryError

I was working with uploading large files to my Spring application, and I encountered an OutOfMemoryError.

java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at org.apache.cxf.io.CachedOutputStream.write(CachedOutputStream.java:461)
        at org.apache.cxf.helpers.IOUtils.copy(IOUtils.java:160)
        at org.apache.cxf.helpers.IOUtils.copy(IOUtils.java:104)
        at org.apache.cxf.attachment.AttachmentDataSource.cache(AttachmentDataSource.java:52)
        at org.apache.cxf.attachment.AttachmentDeserializer.cacheStreamedAttachments(AttachmentDeserializer.java:208)

If you are using CxfServlet, it lets you define the in-memory buffer threshold (attachment-memory-threshold) and the maximum upload size (attachment-max-size).
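The original configuration snippet did not survive; as a sketch, these properties can be set as init-params on the CXFServlet entry in web.xml (the threshold, size, and directory values below are placeholders — check where your CXF version reads these properties and tune them to your setup):

```xml
<servlet>
    <servlet-name>CXFServlet</servlet-name>
    <servlet-class>org.apache.cxf.transport.servlet.CXFServlet</servlet-class>
    <!-- keep attachments up to ~100 KB in memory; spill larger ones to disk -->
    <init-param>
        <param-name>attachment-memory-threshold</param-name>
        <param-value>102400</param-value>
    </init-param>
    <!-- reject uploads larger than ~50 MB -->
    <init-param>
        <param-name>attachment-max-size</param-name>
        <param-value>52428800</param-value>
    </init-param>
    <!-- must exist and be writable by Tomcat -->
    <init-param>
        <param-name>attachment-directory</param-name>
        <param-value>/tmp/cxf-attachments</param-value>
    </init-param>
</servlet>
```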



If you are sure you have these set correctly, make sure the attachment-directory exists. If it does, make sure Tomcat has permission to write to it.

Munin not generating graphs - Make sure CRON job is running

I am currently using Ubuntu 12.04 on EC2.

If your munin master is not running, you should check if munin is set up as a CRON job.

List all the scheduled cron jobs:
crontab -l
If munin-cron is not set up, we will add it. Edit the munin user's crontab file:
sudo -u munin crontab -e
Let's make the munin master run every 5 minutes. Append the following to the end of the file:
*/5 * * * * /usr/bin/munin-cron
Now run munin once manually to verify:
sudo -u munin munin-cron

Monday, November 4, 2013

Enable Async Request Processing for Java Spring

Servlet 3 supports asynchronous request processing, which allows the requested operation to be performed on a separate thread, freeing up the container's request-processing thread.

In a servlet container, a fixed pool of threads is allocated for request processing. If you run a long operation (such as uploading a large file, batch mailing, or big-data analysis) inside a request, that thread stays blocked until the operation finishes.

A better approach is to return the response right away and let the long operation run on another thread.

In Spring 3.x, it is relatively easy with the @Async support.

Before I dive into the tutorial, here are my specs:
  • ubuntu 12.04
  • large EC2 instance
  • OpenJDK 1.7
  • Maven 3.0.4
  • Spring 3.2.4
  • CXFServlet 2.5.3 with JAX-RS (This is for Rest API)
This post will be about how to enable @Async in your spring project.


Install the latest version of Spring

At the time of this writing, the stable version is 3.2.4. The following shows how to add the maven dependency in pom.xml
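The dependency snippet is missing here; a minimal sketch for pom.xml might look like the following (the exact module list is an assumption — spring-context is the module that carries the task/@Async support):

```xml
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-context</artifactId>
    <version>3.2.4.RELEASE</version>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
    <version>3.2.4.RELEASE</version>
</dependency>
```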



Use Servlet 3.0 namespace

In web.xml, make sure you use the XML namespace for version 3.0.
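The original snippet is missing; the web-app root element for the Servlet 3.0 schema looks like this:

```xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
             http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         version="3.0">
    <!-- servlet declarations go here -->
</web-app>
```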

 


Add Async Support in entry servlets in web.xml

If you are writing a normal Spring MVC web app, you will need to add async-supported to the dispatcher servlet. My project only uses the REST API, so I put the async-supported tag in the CXFServlet declaration.
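As a sketch, the CXFServlet declaration with async enabled might look like this (servlet name and load order are placeholders):

```xml
<servlet>
    <servlet-name>CXFServlet</servlet-name>
    <servlet-class>org.apache.cxf.transport.servlet.CXFServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
    <!-- Servlet 3.0 flag that permits async processing on this servlet -->
    <async-supported>true</async-supported>
</servlet>
```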



Enable Async in application context

In applicationContext.xml, enable the executor and scheduler. If you don't know where applicationContext.xml is, check the contextConfigLocation context-param in web.xml.
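The original snippet is missing; a minimal applicationContext.xml enabling the executor and scheduler might look like this (bean ids and pool sizes are placeholders):

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:task="http://www.springframework.org/schema/task"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.2.xsd
           http://www.springframework.org/schema/task
           http://www.springframework.org/schema/task/spring-task-3.2.xsd">

    <!-- enables @Async / @Scheduled processing -->
    <task:annotation-driven executor="myExecutor" scheduler="myScheduler"/>
    <task:executor id="myExecutor" pool-size="5"/>
    <task:scheduler id="myScheduler" pool-size="10"/>
</beans>
```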


Make sure you have declared the task namespace in that file.


Annotate a method as @Async

Below is a quick test of the @Async method. Notice the @Async annotation below.
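The original code sample did not survive; a minimal sketch of an @Async method might look like the following (the MailService name and the five-second sleep are made up for illustration):

```java
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class MailService {

    // Called from another Spring bean; the caller returns immediately
    // while the body runs on a thread from the task executor.
    @Async
    public void sendBatchMail() throws InterruptedException {
        Thread.sleep(5000); // stand-in for a long operation
        System.out.println("done on " + Thread.currentThread().getName());
    }
}
```

Calling sendBatchMail() from another bean returns immediately; without @Async the call would block for the full five seconds.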


Try taking out the @Async annotation and running the program again. You should be able to observe the difference.

Wednesday, October 9, 2013

Elastic Search on EC2 - Install ES cluster on Amazon Linux AMI

We will install ElasticSearch (ES) on an EC2 instance.

Here's the specs:
  • Amazon Linux AMI 2013.09
  • Medium instance
  • 64-bit machine
  • Elastic Search 0.90.5
  • Spring MVC
  • Maven
Begin by launching an instance.  You may get an out of memory error in /var/log/syslog if you use a micro instance when you launch a machine.  If you are not sure how to launch an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.

For the security group, you will need to open the following ports:
  • 22 (SSH)
  • 9300 (ElasticSearch Transport)
  • 9200 (HTTP Testing)

Attach Two EBS drives

We will be using one for saving data and one for logging.  Create and attach two EBS drives in the AWS console.

You will have two volumes: /dev/xvdf and /dev/xvdg.  Let's format them using XFS.
sudo yum -y install xfsprogs xfsdump
sudo mkfs.xfs /dev/xvdf
sudo mkfs.xfs /dev/xvdg
Make the data drive /vol. Make the log drive /vol1.
vi /etc/fstab
Append the following:
/dev/xvdf /vol xfs noatime 0 0
/dev/xvdg /vol1 xfs noatime 0 0
Mount the drives
sudo mkdir /vol
sudo mkdir /vol1
sudo mount /vol
sudo mount /vol1
Read Amazon EC2 - Mounting a EBS drive for more information.

ssh into the instance (the default user on the Amazon Linux AMI is ec2-user)
ssh -i {key} ec2-user@{ec2_public_address}

Update the machine
sudo yum -y update

Install Oracle Sun Java

In order to run ES efficiently, the JVM must be able to allocate a large virtual address space and perform garbage collection on large heaps without long pauses.  There are also some reports online that OpenJDK does not perform as well as Oracle Java for ES.  Feel free to let me know in the comments below if this is not the case.

Download Java 7 from Oracle.

Put it in /usr/lib/jvm.

Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0

You should now have jdk1.7.0 inside /usr/lib/jvm

Set java, javac.
sudo /usr/sbin/alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo /usr/sbin/alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
Set to the Sun Java by:
sudo /usr/sbin/alternatives --config java
Check your java version.
java -version

Download and install ElasticSearch

Download ElasticSearch (Current version as of this writing is 0.90.5).
sudo su
mkdir /opt/tools
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
Install ElasticSearch Cloud AWS plugin.
cd elasticsearch-0.90.5
bin/plugin -install elasticsearch/elasticsearch-cloud-aws/1.15.0

Configuring ES

AWS can shut down your instances at any time.  If you are storing indexed data in ephemeral drives, you will lose all the data when all the instances are shut down.

There are two ways to persist data:
  • Store data in EBS via the local gateway
  • Store data in S3 via the S3 gateway
A restart of the nodes will recover data from the gateway. The EBS route is better for performance, while the S3 route is better for persistence (note that the S3 gateway is deprecated).

We will be setting up an ES cluster and use a local gateway, since the S3 gateway is deprecated at the time of this writing.  The ES team has promised a new backup mechanism in the future.

vi /opt/tools/elasticsearch-0.90.5/config/elasticsearch.yml

cluster.name: mycluster
cloud:
    aws:
        access_key:
        secret_key:
        region: us-east-1
discovery:
    type: ec2

We have specified a cluster called "mycluster" above. You will need to fill in your AWS access keys so the cloud-aws plugin can discover the other nodes in your account.

We also need to ensure the JVM does not swap by doing two things:

1) Locking the memory (find this setting inside elasticsearch.yml)
bootstrap.mlockall: true
2) Set ES_MIN_MEM and ES_MAX_MEM to the same value. It is also recommended to set them to half of the system's available RAM. We will set this in the ElasticSearch service wrapper later in the article.

Create the data and log paths.
mkdir -p /vol/elasticsearch/data
mkdir -p /vol1/elasticsearch/logs
Set the data and log paths in config/elasticsearch.yml
path.data: /vol/elasticsearch/data
path.logs: /vol1/elasticsearch/logs
Let's edit config/logging.yml
vi /opt/tools/elasticsearch-0.90.5/config/logging.yml
Edit these settings and make sure these lines are uncommented and present

logger:
  gateway: DEBUG
  org.apache: WARN
  discovery: TRACE


Testing the cluster
bin/elasticsearch -f
Browse to the ec2 address at port 9200
http://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:9200/
You should see the following:
{
  "ok" : true,
  "status" : 200,
  "name" : "Storm",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search" 
}


Installing ElasticSearch as a Service

We will be using the ElasticSearch Java Service Wrapper.

Download the service wrapper and move its service folder into bin/.
curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv elasticsearch-elasticsearch-servicewrapper-*/service /opt/tools/elasticsearch-0.90.5/bin
Make ElasticSearch start automatically when the system reboots.
bin/service/elasticsearch install
Make the ElasticSearch service a default command (we will call it es_service)
ln -s /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch /usr/bin/es_service
Start the service
es_service start
You should see:
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID:2503 

Tweaking the memory settings

There are three settings you will care about:

  • ES_HEAP_SIZE
  • ES_MIN_MEM
  • ES_MAX_MEM
It is recommended to set ES_MIN_MEM to be the same as ES_MAX_MEM.  However, you can just set ES_HEAP_SIZE as it will be assigned to both ES_MIN_MEM and ES_MAX_MEM.


We will be tweaking these settings in the service wrapper's elasticsearch.conf instead of elasticsearch's.

vi /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch.conf

set.default.ES_HEAP_SIZE=1024

There are a few things you need to be aware of.

  1. You need to leave some memory for the OS for non-ElasticSearch operations. Try leaving at least half of the available memory.
  2. As a rough reference, allow 1024 MB for every 1 million documents you are saving.
Restart the service.

Ubuntu EC2 - Install Sun Oracle Java

Download Java 7 from Oracle.

Put it in /usr/lib/jvm.

Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0

You should now have jdk1.7.0 inside /usr/lib/jvm

Set java, javac.
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
If you have more than one version of java, you can always switch them using
sudo update-alternatives --config java
Check your java version.
java -version

Thursday, October 3, 2013

ElasticSearch - Defining the Mapping Schema

The previous posts demonstrated how easy it is to index some words and retrieve them via the REST or Java API.  However, we never really talked about how to tweak the searches to fit our needs.

Consider a subject object with two properties like the following:
{
  "name":"The Old & New British English",
  "code":12345
}
Say we have a list of subjects like the above and we want to index and search subjects with the following requirement:
  1. search by exact subject name
  2. search with stop words removed, accent characters conversion
  3. search with some spelling mistakes allowed
  4. search with some words skipped
  5. search by exact code
Without specifying the mapping, ElasticSearch (ES) will use the standard analyzer.

Before we define the ES schema, let's get familiar with the following terms.

A mapping defines how properties (Ex. "name" and "code" properties above) are indexed and searched through analyzers and tokenizers.

An analyzer is a group of filters executed in-order.
Reference: Analyzers

A filter is a function that transforms data (lowercase, stop-word removal, phonetics).
Reference: Token Filters

When we index or search the phrase "The Old & New British English", an analyzer breaks the phrase into words through a tokenizer. Each word/token is then passed through a chain of token filters.  For example, the lowercase token filter normalizes incoming tokens to lower case.
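What the lowercase and asciifolding filters do to a single token can be sketched in plain Java using the standard Normalizer API (an illustration of the idea, not ES code):

```java
import java.text.Normalizer;

public class AsciiFold {
    // A rough sketch of lowercase + asciifolding: decompose accented
    // characters, strip the combining marks, then lowercase the result.
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(fold("Café"));    // cafe
        System.out.println(fold("British")); // british
    }
}
```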

For another explanation, refer to this post for a better understanding of analyzers.

The following defines a simple mapping with index=subjects, type=subject, and two properties (name, code).

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
    "subject":{
      "properties":{
        "name":{
          "type":"string"
        },
        "code":{
          "type":"string"
        }
      }
    }
  }
}'


1.) Search by exact subject name

This is very easy. We will make the "name" field not analyzed.

"subject":{
  "properties":{
    "name":{
      "type":"string",
      "index":"not_analyzed"
    }
  }
}

Let's populate the index.

curl -XPUT http://localhost:9200/subjects/subject/1 -d '
{
  "name":"The Old & New British English",
  "code":12345
}'

Try to do a search on the phrase "The Old & New British English"

curl -X GET "http://localhost:9200/subjects/_search?pretty=true" -d '{
    "query" : {
        "text" : { "name": "The Old & New British English" }
    }
}'

Now try searching with "the Old & New British English" or "Old & New British English". This is not very helpful, since most people won't type the exact case or the exact phrase.

Let's delete this mapping.

curl -X DELETE "http://localhost:9200/subjects"


2) Search with stop words removed, accent characters conversion

Let's use a new custom analyzer called "full_name".

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "type":"string",
              "analyzer":"full_name"
            }
          }
      }
  }
}'

To customize the way searches would work, we need to tweak the analyzer settings.  The general form of defining the settings is as follows:

"settings":{
    "analysis":{
        "filter":{
        },
        "analyzer":{
            "full_name":{
                "filter":[
                ],
                "type":"custom",
                "tokenizer":"standard"
            }
        }
    }
}

We want "name" to be searchable with stop words removed and accented characters normalized (so that an accented é can be found by searching with a plain 'e').

"settings":{
    "analysis":{
        "filter":{
        },
        "analyzer":{
            "full_name":{
                "filter":[
                    "standard",
                    "lowercase",
                    "asciifolding"
                ],
                "type":"custom",
                "tokenizer":"standard"
            }
        }
    }
}

The lowercase filter normalizes token text to lower case. Since an analyzer is used both at index time and at search time, the lowercase filter gives us case-insensitive searches.

Let's push the schema to the ES cluster:

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "type":"string",
              "analyzer":"full_name"
            }
          }
      }
  },
  "settings":{
    "analysis":{
      "analyzer":{
        "full_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "type":"custom",
          "tokenizer":"standard"
        }
      }
    }
  }
}'

Populate ES with "The Old & New British English".

Search for the following:
  • "The Old & New British English"
  • "old & new british english"
  • "british english"
  • "british hello english"
  • "engliah"

All of the above, except the last one, should return the result.


3) Search with some spelling mistakes allowed

To make the search work for "engliah", we need the edgeNGram filter.  edgeNGram takes two parameters: min_gram and max_gram.

For the term "apple" with min_gram=3, max_gram=5, ES will index it with:
  • app
  • appl
  • apple
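The expansion above can be sketched in a few lines of Java (an illustration of the filter's behavior, not ES code):

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGram {
    // Emit the edge n-grams of a token from minGram to maxGram characters,
    // mirroring what the edgeNGram token filter produces at index time.
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int max = Math.min(maxGram, token.length());
        for (int len = minGram; len <= max; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("apple", 3, 5)); // [app, appl, apple]
    }
}
```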
Let's try this.

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "type":"string",
              "analyzer":"partial_name"
            }
          }
      }
  },
  "settings":{
    "analysis":{
      "filter":{
        "name_ngrams": {
          "max_gram":10,
          "min_gram":2,
          "type": "edgeNGram"
        }
      },
      "analyzer":{
        "partial_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding",
            "name_ngrams"
          ],
          "type":"custom",
          "tokenizer":"standard"
        }
      }
    }
  }
}'

Use _analyze to check how the phrase will be indexed.

curl -X GET "http://localhost:9200/subjects/_analyze?analyzer=partial_name&pretty=true" -d 'The Old & New British English'

Try to search for the term "engliah".  You should see the result showing up.


4) Search with some words skipped

This already works thanks to 3) above.


5) Search by exact code

"subject":{
  "properties":{
    "code":{
      "type":"string",
      "index":"not_analyzed"
    }
  }
}

You can accomplish an exact search with 1) or 2) above.  If case-sensitivity is important to you, use 1); otherwise use 2).  I am opting for 1).


Putting all these together

To accommodate the different search formats, we need to define "name" as a multi-field.

"subject":{
  "properties":{
    "name":{
      "type":"multi_field",
      "fields":{
        "name":{
          "type":"string",
          "index":"not_analyzed"
        },
        "partial":{
          "type":"string",
          "search_analyzer":"full_name",
          "index_analyzer":"partial_name"
        }
      }
    }
  }
}

You can access "name" as "name.name", or just "name".  This is the default field for "name", and it performs the exact search.

You can access "partial" as "name.partial".  This is the NGram search (spelling mistakes allowed): we index the words with their NGram variations, but search with the exact term.

For example, consider a search for the term "app" within a data store with the following:
apples
appetizer
apes

If both search_analyzer and index_analyzer are using "partial_name", all three terms above will be returned.

If the search_analyzer is "full_name" and index_analyzer is "partial_name", then only "apples" and "appetizer" will be returned.  This is the desired case.

Now putting the mapping all together:

curl -X PUT "http://localhost:9200/subjects" -d '
{
  "mappings":{
      "subject":{
          "properties":{
            "name":{
              "fields":{
                  "name":{
                      "type":"string",
                      "analyzer":"full_name"
                  },
                  "partial":{
                      "type":"string",
                      "search_analyzer":"full_name",
                      "index_analyzer":"partial_name"
                  }
              }
            },
            "code":{
                "type":"string",
                "analyzer":"full_name"
            }
          }
      }
  },
  "settings":{
    "analysis":{
      "filter":{
        "name_ngrams": {
          "max_gram":10,
          "min_gram":2,
          "type": "edgeNGram"
        }
      },
      "analyzer":{
        "full_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding"
          ],
          "type":"custom",
          "tokenizer":"standard"
        },
        "partial_name":{
          "filter":[
            "standard",
            "lowercase",
            "asciifolding",
            "name_ngrams"
          ],
          "type":"custom",
          "tokenizer":"standard"
        }
      }
    }
  }
}'