Here are the specs:
- Amazon Linux AMI 2013.09
- Medium instance
- 64-bit machine
- ElasticSearch 0.90.5
- Spring MVC
- Maven
Begin by launching an instance. If you launch a micro instance, you may get an out of memory error in /var/log/syslog, which is why we use a medium instance here. If you are not sure how to launch an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.
For the security group, you will need to open the following ports:
- 22 (SSH)
- 9300 (ElasticSearch Transport)
- 9200 (HTTP Testing)
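Once the instance is up (and, later, once ElasticSearch is running), you can check from another machine that the security group rules took effect. The sketch below is one way to do it, assuming bash and the coreutils `timeout` command are available; `port_open` is a hypothetical helper name, not part of any AWS or ElasticSearch tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if a TCP connection to host:port can be opened.
# $1 = host, $2 = port. Relies on bash's /dev/tcp pseudo-device.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example (replace the placeholder with your instance address):
#   for p in 22 9200 9300; do
#       port_open {ec2_public_address} "$p" && echo "$p open" || echo "$p closed"
#   done
```

Ports 9200 and 9300 will only report as open after ElasticSearch has started, even if the security group is correct.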
Attach Two EBS drives
We will be using one for saving data and one for logging. Create and attach two EBS drives in the AWS console.
You will have two volumes: /dev/xvdf and /dev/xvdg. Let's format them using XFS.
yum -y install xfsprogs xfsdump
sudo mkfs.xfs /dev/xvdf
sudo mkfs.xfs /dev/xvdg
Make the data drive /vol. Make the log drive /vol1.
vi /etc/fstab
Append the following:
/dev/xvdf /vol xfs noatime 0 0
/dev/xvdg /vol1 xfs noatime 0 0
Mount the drives
mkdir /vol
mkdir /vol1
mount /vol
mount /vol1
Read Amazon EC2 - Mounting a EBS drive for more information.
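If you script this step instead of editing /etc/fstab by hand, it helps to make the append idempotent so that re-running the script does not duplicate entries. A minimal sketch, assuming bash and GNU grep; `ensure_fstab_line` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to an fstab-style file only if that
# exact line is not already present.
# $1 = file (normally /etc/fstab, which requires root), $2 = full entry
ensure_fstab_line() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root):
#   ensure_fstab_line /etc/fstab "/dev/xvdf /vol xfs noatime 0 0"
#   ensure_fstab_line /etc/fstab "/dev/xvdg /vol1 xfs noatime 0 0"
```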
SSH into the instance
The default user on the Amazon Linux AMI is ec2-user (ubuntu is the default on Ubuntu images).
ssh -i {key} ec2-user@{ec2_public_address}
Update the machine
sudo yum -y update
Install Oracle Sun Java
To run ES efficiently, the JVM must be able to allocate a large virtual address space and garbage-collect large heaps without long pauses. There are also reports online that OpenJDK does not perform as well as Oracle Java for ES. Feel free to let me know in the comments below if this is not the case.
Download Java 7 from Oracle.
Put it in /usr/lib/jvm.
Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0
You should now have jdk1.7.0 inside /usr/lib/jvm
Register java and javac with alternatives.
sudo /usr/sbin/alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo /usr/sbin/alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
Select the Oracle Java as the default:
sudo /usr/sbin/alternatives --config java
Check your java version.
java -version
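To confirm that the alternatives switch actually picked the Oracle build rather than OpenJDK, you can look at the runtime line that `java -version` prints. A small sketch, assuming the usual Oracle banner text "Java(TM) SE Runtime Environment"; `is_oracle_java` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the `java -version` output read on stdin
# comes from an Oracle build (OpenJDK prints "OpenJDK Runtime Environment").
is_oracle_java() {
    grep -q 'Java(TM) SE Runtime Environment'
}

# Example (java -version writes to stderr, hence 2>&1):
#   java -version 2>&1 | is_oracle_java && echo "Oracle Java is active"
```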
Download and install ElasticSearch
Download ElasticSearch (Current version as of this writing is 0.90.5).
sudo su
mkdir /opt/tools
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
Install ElasticSearch Cloud AWS plugin.
cd elasticsearch-0.90.5
bin/plugin -install elasticsearch/elasticsearch-cloud-aws/1.15.0
Configuring ES
AWS can shut down your instances at any time. If you store indexed data on ephemeral drives, you will lose all of it when all the instances are shut down.
There were two ways to persist data:
- Store data in EBS via local gateway
- Store data in S3 via S3 gateway
When the nodes restart, they recover their data from the gateway. The EBS route is better for performance, while the S3 route was better for persistence.
We will set up an ES cluster that uses the local gateway. The S3 gateway is deprecated at the time of this writing. The ES team has promised a new backup mechanism in the future.
vi /opt/tools/elasticsearch-0.90.5/config/elasticsearch.yml
cluster.name: mycluster
cloud:
    aws:
        access_key:
        secret_key:
        region: us-east-1
discovery:
    type: ec2
We have specified a cluster called "mycluster" above. You will need to fill in your AWS access key and secret key. An S3 bucket is only needed if you use the S3 gateway.
We also need to ensure the JVM does not swap by doing two things:
1) Locking the memory (find this setting inside elasticsearch.yml)
bootstrap.mlockall: true
2) Set ES_MIN_MEM and ES_MAX_MEM to the same value. It is also recommended to set them to half of the system's available RAM. We will set this in the ElasticSearch Service Wrapper later in the article.
Create the data and log paths.
mkdir -p /vol/elasticsearch/data
mkdir -p /vol1/elasticsearch/logs
Set the data and log paths in /config/elasticsearch.yml
path.data: /vol/elasticsearch/data
path.logs: /vol1/elasticsearch/logs
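A mismatch between the directories you created and the paths in elasticsearch.yml (for example log versus logs) is easy to miss. The sketch below reads the path.data and path.logs values from a config file and reports any that are not existing directories. It assumes bash, a sed that supports -E, and paths without spaces; `check_es_paths` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every path.data / path.logs value from an
# elasticsearch.yml that is not an existing directory.
# $1 = path to elasticsearch.yml; returns non-zero if anything is missing.
check_es_paths() {
    local config="$1" missing=0 dir
    # Strip the "path.data:" / "path.logs:" key and keep only the value.
    for dir in $(sed -nE 's/^path\.(data|logs):[[:space:]]*//p' "$config"); do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_es_paths /opt/tools/elasticsearch-0.90.5/config/elasticsearch.yml
```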
Let's edit config/logging.yml
vi /opt/tools/elasticsearch-0.90.5/config/logging.yml
Edit these settings and make sure these lines are uncommented and present:
logger:
  gateway: DEBUG
  org.apache: WARN
  discovery: TRACE
Testing the cluster
bin/elasticsearch -f
Browse to the ec2 address at port 9200
http://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:9200/
You should see the following:
{
  "ok" : true,
  "status" : 200,
  "name" : "Storm",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search"
}
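The root URL only shows that one node answers. To see whether EC2 discovery actually joined several nodes into "mycluster", you can ask the cluster health API and read its number_of_nodes field. A minimal sketch, assuming curl and grep on the node; `node_count` is a hypothetical helper name and the parsing is a plain text match, not a real JSON parser:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract number_of_nodes from a cluster health
# response read on stdin (works with and without ?pretty formatting).
node_count() {
    grep -o '"number_of_nodes"[[:space:]]*:[[:space:]]*[0-9][0-9]*' \
        | grep -o '[0-9][0-9]*$'
}

# Example, run on one of the nodes:
#   curl -s 'http://localhost:9200/_cluster/health?pretty' | node_count
```

If the number is lower than the number of instances you started, the discovery TRACE output in the logs is the first place to look.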
Installing ElasticSearch as a Service
We will be using the ElasticSearch Java Service Wrapper.
Download the service wrapper and move it to bin/service.
curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv *servicewrapper*/service /opt/tools/elasticsearch-0.90.5/bin
Make ElasticSearch start automatically when the system reboots.
bin/service/elasticsearch install
Make the ElasticSearch service a default command (we will call this es_service)
ln -s /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch /usr/bin/es_service
Start the service
es_service start
You should see:
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID:2503
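The PID line appears before ElasticSearch is ready to answer requests, so a script that continues right after es_service start may hit a closed port. One way to handle this is a small retry loop. This is a sketch assuming bash, seq, and curl; `wait_for` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it succeeds.
# $1 = number of attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_for() {
    local attempts="$1" delay="$2" i
    shift 2
    for i in $(seq 1 "$attempts"); do
        "$@" >/dev/null 2>&1 && return 0
        sleep "$delay"
    done
    return 1
}

# Example: wait up to about a minute for the HTTP port to answer.
#   wait_for 30 2 curl -sf http://localhost:9200/ && echo "ElasticSearch is up"
```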
Tweaking the memory settings
There will be three settings you want to care about:
- ES_HEAP_SIZE
- ES_MIN_MEM
- ES_MAX_MEM
It is recommended to set ES_MIN_MEM to be the same as ES_MAX_MEM. However, you can just set ES_HEAP_SIZE as it will be assigned to both ES_MIN_MEM and ES_MAX_MEM.
We will be tweaking these settings in the service wrapper's elasticsearch.conf instead of elasticsearch's.
vi /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch.conf
set.default.ES_HEAP_SIZE=1024
There are a few things you need to beware of.
- You need to leave some memory to the OS for non-ElasticSearch operations. Try leaving at least half of the available memory.
- As a rough reference, allow 1024 MB for every 1 million documents you are saving.
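The two rules of thumb above can be combined into a quick estimate for ES_HEAP_SIZE. The sketch below takes the smaller of "half of RAM" and "1024 MB per million documents". Taking the minimum of the two is my own assumption about how to reconcile them, and `suggest_heap_mb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a heap size in MB.
# $1 = total RAM in kB (the MemTotal value from /proc/meminfo)
# $2 = expected number of documents
suggest_heap_mb() {
    local half_ram_mb=$(( $1 / 1024 / 2 ))
    # 1024 MB for every started million documents (rounded up).
    local by_docs_mb=$(( ($2 + 999999) / 1000000 * 1024 ))
    # Assumption: never go above half of RAM, whatever the document count.
    if [ "$by_docs_mb" -lt "$half_ram_mb" ]; then
        echo "$by_docs_mb"
    else
        echo "$half_ram_mb"
    fi
}

# Example:
#   suggest_heap_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" 1000000
```

The printed number can be used as the value of set.default.ES_HEAP_SIZE in elasticsearch.conf.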
Restart the service.