Showing posts with label ec2. Show all posts

Thursday, July 2, 2015

Upgrading php5.4 to php5.5 in Amazon EC2

First, stop Apache, Nginx, and php-fpm if they are running.

List all the php 5.4 modules:

> yum list installed | grep php54

php54.x86_64                        5.4.21-1.46.amzn1              @amzn-updates
php54-bcmath.x86_64                 5.4.21-1.46.amzn1              @amzn-updates
php54-cli.x86_64                    5.4.21-1.46.amzn1              @amzn-updates
php54-common.x86_64                 5.4.21-1.46.amzn1              @amzn-updates
php54-devel.x86_64                  5.4.21-1.46.amzn1              @amzn-updates
php54-fpm.x86_64                    5.4.21-1.46.amzn1              @amzn-updates
php54-gd.x86_64                     5.4.21-1.46.amzn1              @amzn-updates
php54-intl.x86_64                   5.4.21-1.46.amzn1              @amzn-updates
php54-mbstring.x86_64               5.4.21-1.46.amzn1              @amzn-updates
php54-mcrypt.x86_64                 5.4.21-1.46.amzn1              @amzn-updates
php54-mysqlnd.x86_64                5.4.21-1.46.amzn1              @amzn-updates
php54-pdo.x86_64                    5.4.21-1.46.amzn1              @amzn-updates
php54-pecl-apc.x86_64               3.1.13-1.12.amzn1              @amzn-updates
php54-pecl-igbinary.x86_64          1.1.2-0.2.git3b8ab7e.6.amzn1   @amzn-updates
php54-pecl-memcache.x86_64          3.0.7-3.10.amzn1               @amzn-updates
php54-pecl-memcached.x86_64         2.1.0-1.5.amzn1                @amzn-updates
php54-pecl-xdebug.x86_64            2.2.1-1.6.amzn1                @amzn-updates
php54-process.x86_64                5.4.21-1.46.amzn1              @amzn-updates
php54-soap.x86_64                   5.4.21-1.46.amzn1              @amzn-updates
php54-xml.x86_64                    5.4.21-1.46.amzn1              @amzn-updates
php54-xmlrpc.x86_64                 5.4.21-1.46.amzn1              @amzn-updates

Remove all of them:

yum remove php54.x86_64 php54-bcmath.x86_64 php54-cli.x86_64 php54-common.x86_64 php54-devel.x86_64 php54-fpm.x86_64 php54-gd.x86_64 php54-intl.x86_64 php54-mbstring.x86_64 php54-mcrypt.x86_64 php54-mysqlnd.x86_64 php54-pdo.x86_64 php54-pecl-apc.x86_64 php54-pecl-igbinary.x86_64 php54-pecl-memcache.x86_64 php54-pecl-memcached.x86_64 php54-pecl-xdebug.x86_64 php54-process.x86_64 php54-soap.x86_64 php54-xml.x86_64 php54-xmlrpc.x86_64

Install php 5.5

yum install php55.x86_64 php55-bcmath.x86_64 php55-cli.x86_64 php55-common.x86_64 php55-devel.x86_64 php55-fpm.x86_64 php55-gd.x86_64 php55-intl.x86_64 php55-mbstring.x86_64 php55-mcrypt.x86_64 php55-mysqlnd.x86_64 php55-pdo.x86_64 php55-pecl-apc.x86_64 php55-pecl-igbinary.x86_64 php55-pecl-memcache.x86_64 php55-pecl-memcached.x86_64 php55-pecl-xdebug.x86_64 php55-process.x86_64 php55-soap.x86_64 php55-xml.x86_64 php55-xmlrpc.x86_64

You may need to tweak the php-fpm settings afterwards.
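Since the 5.5 package names differ from the 5.4 ones only in the version prefix, the install list can be derived mechanically from the yum list installed output. A minimal Python sketch of the rename, using a truncated sample of the package list above:

```python
# Derive the php55 install list from the installed php54 package names.
# The list below is a truncated sample of the `yum list installed` output above.
php54 = ["php54.x86_64", "php54-cli.x86_64", "php54-gd.x86_64"]
php55 = [p.replace("php54", "php55") for p in php54]
print(" ".join(php55))  # paste this after `yum install`
```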

Friday, November 15, 2013

Munin not generating graphs - Make sure CRON job is running

I am currently using Ubuntu 12.04 on EC2.

If your Munin master is not generating graphs, check whether munin-cron is set up as a cron job.

List all the scheduled cron jobs:
crontab -l
If munin-cron is not set up, we will add it. Edit the crontab file:
crontab -e
Let's make the Munin master run every 5 minutes. Append the following to the end of the file:
*/5 * * * * /usr/bin/munin-cron
Let's run munin-cron manually as the munin user:
sudo -u munin munin-cron
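If you script this step, it is worth making the append idempotent so re-running it does not schedule munin-cron twice. A small Python sketch of the idea, with a list standing in for the crontab -l output:

```python
# Idempotent append: only add the munin-cron entry if it is missing.
entry = "*/5 * * * * /usr/bin/munin-cron"
tab = ["0 0 * * * /usr/bin/logrotate"]   # stand-in for `crontab -l` output

for _ in range(2):       # running the step twice must not duplicate the entry
    if entry not in tab:
        tab.append(entry)

print(tab.count(entry))  # 1
```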

Wednesday, October 9, 2013

Elastic Search on EC2 - Install ES cluster on Amazon Linux AMI

We will install ElasticSearch (ES) on an EC2 instance.

Here are the specs:
  • Amazon Linux AMI 2013.09
  • Medium instance
  • 64-bit machine
  • Elastic Search 0.90.5
  • Spring MVC
  • Maven
Begin by launching an instance.  If you use a micro instance, you may see an out-of-memory error in /var/log/syslog.  If you are not sure how to launch an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.

For the security group, you will need to open the following ports:
  • 22 (SSH)
  • 9300 (ElasticSearch Transport)
  • 9200 (HTTP Testing)

Attach Two EBS drives

We will be using one for saving data and one for logging.  Create and attach two EBS drives in the AWS console.

You will have two volumes: /dev/xvdf and /dev/xvdg.  Let's format them with XFS.
sudo yum -y install xfsprogs xfsdump
sudo mkfs.xfs /dev/xvdf
sudo mkfs.xfs /dev/xvdg
Make the data drive /vol. Make the log drive /vol1.
vi /etc/fstab
Append the following:
/dev/xvdf /vol xfs noatime 0 0
/dev/xvdg /vol1 xfs noatime 0 0
Mount the drives
mkdir /vol
mkdir /vol1
mount /vol
mount /vol1
Read Amazon EC2 - Mounting a EBS drive for more information.

ssh into the instance (the default user on the Amazon Linux AMI is ec2-user)
ssh -i {key} ec2-user@{ec2_public_address}

Update the machine
sudo yum -y update

Install Oracle Sun Java

In order to run ES efficiently, the JVM must be able to allocate a large virtual address space and perform garbage collection on large heaps without pausing the JVM.  There are also some stories online suggesting that OpenJDK does not perform as well as Oracle Java for ES.  Feel free to let me know in the comments below if this is not the case.

Download Java 7 from Oracle.

Put it in /usr/lib/jvm.

Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0

You should now have jdk1.7.0 inside /usr/lib/jvm

Set java, javac.
sudo /usr/sbin/alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo /usr/sbin/alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
Set to the Sun Java by:
sudo /usr/sbin/alternatives --config java
Check your java version.
java -version

Download and install ElasticSearch

Download ElasticSearch (Current version as of this writing is 0.90.5).
sudo su
mkdir /opt/tools
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
Install ElasticSearch Cloud AWS plugin.
cd elasticsearch-0.90.5
bin/plugin -install elasticsearch/elasticsearch-cloud-aws/1.15.0

Configuring ES

AWS can shut down your instances at any time.  If you are storing indexed data in ephemeral drives, you will lose all the data when all the instances are shut down.

There are two ways to persist data:
  • Store data in EBS via local gateway
  • Store data in S3 via S3 gateway
A restart of the nodes would begin to recover data from the gateway. The EBS route is better for performance, while the S3 route is better for persistence [S3 is deprecated].

We will be setting up an ES cluster and use a local gateway. The S3 gateway is deprecated at the time of this writing, and the ES team has promised a new backup mechanism in the future.

vi /opt/tools/elasticsearch-0.90.5/config/elasticsearch.yml

cluster.name: mycluster
cloud:
    aws:
        access_key:
        secret_key:
        region: us-east-1
discovery:
    type: ec2

We have specified a cluster called "mycluster" above. You will need to fill in your AWS access keys; the plugin uses them for EC2 discovery.

We also need to ensure the JVM does not swap by doing two things:

1) Locking the memory (find this setting inside elasticsearch.yml)
bootstrap.mlockall: true
2) Set ES_MIN_MEM and ES_MAX_MEM to the same value. It is also recommended to set them to half of the system's available RAM. We will set this in the ElasticSearch Service Wrapper later in this article.

Create the data and log paths.
mkdir -p /vol/elasticsearch/data
mkdir -p /vol1/elasticsearch/logs
Set the data and log paths in config/elasticsearch.yml
path.data: /vol/elasticsearch/data
path.logs: /vol1/elasticsearch/logs
Let's edit config/logging.yml
vi /opt/tools/elasticsearch-0.90.5/config/logging.yml
Edit these settings, making sure the following lines are present and uncommented:

logger:
  gateway: DEBUG
  org.apache: WARN
  discovery: TRACE


Testing the cluster
bin/elasticsearch -f
Browse to the ec2 address at port 9200
http://ec2-XX-XXX-XXX-XXX.compute-1.amazonaws.com:9200/
You should see the following:
{
  "ok" : true,
  "status" : 200,
  "name" : "Storm",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search" 
}
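If you want to script this health check instead of eyeballing the browser, the reply is plain JSON. A sketch parsing the sample response above (fetching it over HTTP is left out so the snippet stays self-contained):

```python
import json

# Trimmed sample of the reply from the ES root endpoint shown above.
reply = '{"ok": true, "status": 200, "name": "Storm", "version": {"number": "0.90.5"}}'

info = json.loads(reply)
assert info["ok"] and info["status"] == 200   # the node is up
print(info["version"]["number"])              # 0.90.5
```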


Installing ElasticSearch as a Service

We will be using the ElasticSearch Java Service Wrapper.

Download the service wrapper and move its service folder into bin/service.
curl -L -k http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
mv *servicewrapper*/service /opt/tools/elasticsearch-0.90.5/bin
Make ElasticSearch start automatically when the system reboots.
bin/service/elasticsearch install
Make the ElasticSearch service a default command (we will call it es_service)
ln -s /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch /usr/bin/es_service
Start the service
es_service start
You should see:
Starting ElasticSearch...
Waiting for ElasticSearch......
running: PID:2503 

Tweaking the memory settings

There are three settings you want to care about:

  • ES_HEAP_SIZE
  • ES_MIN_MEM
  • ES_MAX_MEM
It is recommended to set ES_MIN_MEM to the same value as ES_MAX_MEM.  You can simply set ES_HEAP_SIZE, as it will be assigned to both ES_MIN_MEM and ES_MAX_MEM.


We will be tweaking these settings in the service wrapper's elasticsearch.conf instead of elasticsearch's.

vi /opt/tools/elasticsearch-0.90.5/bin/service/elasticsearch.conf

set.default.ES_HEAP_SIZE=1024

There are a few things to beware of.

  1. You need to leave some memory for the OS for non-ElasticSearch operations. Try leaving at least half of the available memory.
  2. As a reference, use 1024 MB for every 1 million documents you are saving.
Restart the service.
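Those two rules of thumb can be combined into a quick estimate. A sketch (the 1024 MB per million documents figure is the rough reference from above, not a hard rule):

```python
def heap_mb(doc_millions, total_ram_mb):
    """Rough ES heap estimate: 1024 MB per million documents,
    but never more than half the machine's RAM."""
    return min(1024 * doc_millions, total_ram_mb // 2)

print(heap_mb(3, 8192))   # 3 million docs on an 8 GB machine -> 3072
```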

Ubuntu EC2 - Install Sun Oracle Java

Download Java 7 from Oracle.

Put it in /usr/lib/jvm.

Extract and install it
tar -zxvf jdk-7u40-linux-x64.gz
Rename the folder from jdk1.7.0_40 to jdk1.7.0

You should now have jdk1.7.0 inside /usr/lib/jvm

Set java, javac.
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.7.0/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.7.0/bin/javac" 1
Correct the permissions.
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chown -R root:root /usr/lib/jvm/jdk1.7.0
If you have more than one version of java, you can always switch them using
sudo update-alternatives --config java
Check your java version.
java -version

Monday, September 30, 2013

ElasticSearch Query - how to insert and retrieve search data

ElasticSearch uses HTTP Methods (ex. GET, POST, PUT, DELETE) to retrieve, save, and delete search data from its index.

For simplicity, we will use curl to demonstrate some usages. If you haven't done so already, start ElasticSearch in your terminal.


Adding a document

We will send an HTTP POST request to add the subject "sports" to an index. The request has the following form:
curl -XPOST "http://localhost:9200/{index}/{type}/{id}" -d '{"key0":  "value0", ... , "keyX": "valueX"}'
Example:
curl -XPOST "http://localhost:9200/subjects/subject/1" -d '{"name":  "sports",  "creator": {"first_name":"John", "last_name":"Smith"}}'

Retrieving the document

We can get back the document by sending a GET request.
curl -X GET "http://localhost:9200/subjects/_search?q=sports"
We can also use a POST request to query the above.
curl -X POST "http://localhost:9200/subjects/_search" -d '{
"query": {"term":{"name":"sports"}}
}'
Both of the above will give you the following:
{"took":1,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":0.30685282,"hits":[{"_index":"subjects","_type":"subject","_id":"1","_score":0.30685282, "_source" : {"name":  "sports"}}]}}
The _source field above holds the results of the query.
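Pulling the documents out of that reply in code is just a matter of walking hits.hits. A sketch using a trimmed version of the sample response above:

```python
import json

# Trimmed sample of the search reply shown above.
reply = '{"took": 1, "hits": {"total": 1, "hits": [{"_id": "1", "_source": {"name": "sports"}}]}}'

docs = [hit["_source"] for hit in json.loads(reply)["hits"]["hits"]]
print(docs)   # [{'name': 'sports'}]
```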

To search based on the nested properties (Ex. first_name, last_name), we can do the following:
curl -XGET "http://localhost:9200/subjects/_search?q=subject.creator.first_name:John"
curl -XGET "http://localhost:9200/subjects/subject/_search?q=creator.first_name:John"
curl -XGET "http://localhost:9200/subjects/subject/_search?q=subject.creator.first_name:John" 
All the above queries will return the same results.
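When building such q= queries from code, it is safest to URL-encode the query string (the colon in particular). A sketch using Python's standard library:

```python
from urllib.parse import urlencode

# Build the nested-property search URL used above; urlencode escapes the colon.
base = "http://localhost:9200/subjects/subject/_search"
url = base + "?" + urlencode({"q": "creator.first_name:John"})
print(url)
```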


Deleting the index

Similarly, we can delete the whole subjects index with a DELETE request.
curl -X DELETE "http://localhost:9200/subjects"

Creating an index with settings and mappings

If you want to adjust settings like number of shards and replicas, you may find the following useful. The more shards you have, the better the indexing performance. The more replicas you have, the better the searching performance.
curl -X PUT "http://localhost:9200/subjects" -d '
{
  "settings": {"index": {"number_of_shards": 3, "number_of_replicas": 2}},
  "mappings": {
    "document": {
      "properties": {
        "name": {"type": "string", "analyzer": "full_text"}
      }
    }
  }
}'
The above created an index called subjects. Each document in the index has a property called name.
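Hand-written JSON bodies like this are easy to get wrong (unbalanced braces, unquoted strings); building the body in code and serializing it with json.dumps guarantees valid JSON. A sketch of the same settings and mappings:

```python
import json

# Same index settings and mappings as the curl command above.
body = {
    "settings": {"index": {"number_of_shards": 3, "number_of_replicas": 2}},
    "mappings": {
        "document": {
            "properties": {"name": {"type": "string", "analyzer": "full_text"}}
        }
    },
}

payload = json.dumps(body)          # pass this as the -d argument to curl
print(json.loads(payload) == body)  # True: the round trip is lossless
```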


Checking the Mapping
curl -X GET "http://localhost:9200/subjects/_mapping?pretty=true"
You should see
{
  "subjects" : { }
}
The pretty parameter above just formats the JSON result in a human readable format.

How to Install ElasticSearch on EC2

Search is not easy. There are a lot of things you need to consider.

At the software level,

Can a search query have spelling mistakes?
Should stop words (Ex. a, the) be filtered?
What about a phrase search given a non-exact phrase?

At the operational level,

Should the search be decoupled from the app machines?
Should the search be distributed? If so, how many shards and replicas should there be?

Doing a quick search would tell you that Apache Lucene is the industry standard. There are two popular abstractions on top of Lucene: Solr and ElasticSearch (ES).

There are a lot of debates on which one should be used. I chose ES because
  • it's distributed by design
  • it's easier to integrate with AWS EC2

The following post will talk about how you can install ElasticSearch on your Linux machine (I like to use the Ubuntu 12.04 build on EC2).

Download elasticsearch from elasticsearch.org. Extract the files and put them into a folder of your choice (Ex. /opt/tools).
cd /opt/tools
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.zip
unzip elasticsearch-0.90.5.zip
You can start elasticsearch by:
bin/elasticsearch -f
You may want to tweak the Xmx (the maximum heap size for the JVM) and Xms (the initial heap size for the JVM) values.
bin/elasticsearch -f -Xmx2g -Xms2g -Des.index.storage.type=memory -Des.max-open-files=true
You can also run it as a service using the script located in bin/service.

After you started your service, visit "http://localhost:9200" in the browser. You should see the following:

{
  "ok" : true,
  "status" : 200,
  "name" : "Solitaire",
  "version" : {
    "number" : "0.90.5",
    "build_hash" : "c8714e8e0620b62638f660f6144831792b9dedee",
    "build_timestamp" : "2013-09-17T12:50:20Z",
    "build_snapshot" : false,
    "lucene_version" : "4.4"
  },
  "tagline" : "You Know, for Search"
}

Wednesday, July 17, 2013

Ansible EC2 - setting up Nginx, MySQL, php, git

In this post, we will write a playbook that sets up an EC2 machine with a fully working PHP environment.

Starting from a fresh machine with an attached ebs volume, we will do the following:

  1. Format the new ebs volume with XFS and mount it as /vol
  2. Install php, mysql and nginx
  3. Create a mysql user and create a database
  4. Copy the public and private keys into the targeted machine
  5. Checkout a project from github

Begin by spinning up a fresh EC2 AMI and attaching an EBS volume to it. Read Ansible - how to launch EC2 instances and setup the php environment.


Format the new ebs volume with XFS and mount it as /vol

We will format the new EBS volume /dev/xvdf with XFS and mount it as /vol.

    - name: update machine with latest packages
      action: command yum -y update
    - name: install xfsprogs
      action: yum pkg=xfsprogs state=latest
    - name: format new volume
      filesystem: fstype=xfs dev=/dev/xvdf
    - name: edit fstab and mount the vol
      action: mount name={{mount_dir}} src=/dev/xvdf opts=noatime fstype=xfs state=mounted


Install php, mysql and nginx

    - name: install php
      action: yum pkg=php state=latest
    - name: install php-mysql
      action: yum pkg=php-mysql state=latest
    - name: install nginx
      action: yum pkg=nginx state=latest
    - name: ensure nginx is running
      action: service name=nginx state=started
    - name: install mysql server
      action: yum pkg=mysql-server state=latest
    - name: make sure mysql is running
      action: service name=mysqld state=started


Create a mysql user and a database

    - name: install python mysql
      action: yum pkg=MySQL-python state=latest
    - name: create database user
      action: mysql_user user=admin password=1234qwer priv=*.*:ALL state=present
    - name: create db
      action: mysql_db db=ansible state=present


Copy the public and private keys into the targeted machine

We want the target machine to be able to do a git pull without username and password prompts.

mkdir ~/.ssh
ssh-keygen -t rsa -C "you@email.com"

You will see:
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Just press Enter on the above prompts.

Two files will be generated: id_rsa, id_rsa.pub

Log in to Github and then Go to Account Settings -> SSH Keys

Add new key by giving it a name and pasting the content of id_rsa.pub

Test it by:
ssh -T git@github.com
Here are the Ansible tasks:

    - name: install git
      action: yum pkg=git state=latest
    - name: copy public key
      action: template src=~/.ssh/id_rsa.pub dest=~/.ssh/id_rsa.pub
    - name: copy private key
      action: template src=~/.ssh/id_rsa dest=~/.ssh/id_rsa


Checkout a project from github

    - name: git checkout source
      action: git repo=git@github.com:{your_git_repo}.git dest={{work_dir}} version=unstable


Full Ansible Playbook source:

Tuesday, July 16, 2013

Ansible - how to launch EC2 instances and setup the php environment

In this post, we will create a script that launches an instance in the EC2 cloud and installs php and nginx on it (installing httpd is very similar).

First, you will need to set up Ansible.

If you are using ubuntu, read Install Ansible on ubuntu EC2.

If you are using a Mac, read Installing and Running Ansible on Mac OSX and pinging ec2 machines.

You must:
  • have python boto installed
  • set up the AWS access keys in the environment settings
Adding a host

We will use the ec2 module. It runs against localhost, so we will add a host entry.

vi /etc/ansible/hosts

Append the following:

localhost ansible_connection=local

Launching a micro instance



Label this launch_playbook.yml

Execute the script.
ansible-playbook launch_playbook.yml
In your AWS EC2 console, you will see an instance named ansible. Each task is executed in sequence.

Now add this new host to the ansible hosts file and label it webservers.

vi /etc/ansible/hosts
[webservers]
{the_ip_of_ec2_instance_we_just_created} ansible_connection=ssh ansible_ssh_user=ec2-user ansible_ssh_private_key_file={path_to_aws_private_key}
You don't have to do the above. In fact, you can use the group name "ec2-servers" in the following script, but then it would need to be in the same file as the first script. I am separating the files for easier configuration in the future.


Installing php, nginx, mysql

Label this configure_playbook.yml

Execute the script.
ansible-playbook configure_playbook.yml
Go to the public address of this instance. You should see the nginx welcoming message.

Remember to terminate the instance when you finish, else it will incur charges.

Install Ansible on ubuntu EC2

Begin by spinning up a new EC2 Ubuntu instance.


Install Ansible and its dependencies
sudo apt-get install python-pip python-dev
sudo pip install ansible
sudo apt-get install python-boto 
Make sure the boto version is greater than 2.3.

To check boto version:
pip freeze | grep boto

Make the hosts file
sudo mkdir /etc/ansible
sudo touch /etc/ansible/hosts
Put the IPs of your machines in the hosts file.

Ex. [webservers] is a group name for the 2 IPs below.
[webservers]
255.255.255.255
111.111.111.111

Check the Playbook Settings

ansible-playbook playbook.yml --list-hosts

You will see the servers that the Playbook will run against:

  play #1 (create instances): host count=1
    localhost

  play #2 (configure instances): host count=0


Play the Playbook

ansible-playbook playbook.yml


AWS credentials

If you are going to use the ec2 module, you will need to set up the access keys in your environment.
vi ~/.bashrc
Append the following with your keys (you need to log in to your AWS console to get the access key pair)
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY}

Saturday, July 13, 2013

Installing and Running Ansible on Mac OSX and pinging ec2 machines

We will be installing Ansible from Git.


Install Ansible

Download ez_setup.py.

Install ez_setup, pip, jinja2 and ansible.
sudo python ez_setup.py
sudo easy_install pip
sudo pip install ansible
sudo pip install jinja2

Define your host file

Create the file /etc/ansible/hosts.

Put the IP of each machine you want to ping.

Example:
[appservers]
255.255.255.255 ansible_ssh_private_key_file={your_key_path}.pem  ansible_ssh_user=ec2-user
Change the IP to your EC2 instance's IP. The [appservers] is just a label for grouping. You may have servers grouped as web servers, app servers, db servers, etc.


Run Ansible

ansible all -m ping

You will see a response similar to the following if it's successful.
255.255.255.255 | success >> {
    "changed": false,
    "ping": "pong"
}
Let's execute a command on all the machines:

ansible all -a "/bin/echo hello"

You will see:
255.255.255.255 | success | rc=0 >>
hello

Saving the key in memory

If you don't specify the ansible_ssh_private_key_file and ansible_ssh_user attributes in the inventory file above, you can either 1) specify the key and user in the ansible command or 2) use ssh-agent.

1.) Explicitly specifying the user and key:
ansible all -m ping -u ec2-user --private-key={your_key}.pem
2.) Using ssh-agent and ssh-add
ssh-agent bash
ssh-add ~/.ssh/{your_key}.pem
Then you can ping the ec2 server like this:
ansible all -m ping -u ec2-user

Tuesday, April 16, 2013

Scaling Pinterest from 0 to 1 billion

The following is a very informative link sharing how Pinterest scaled from 0 to tens of billions of page views a month in about two years. Over the years, they tried different technologies and abandoned some.

Below are some key points:

  • an architecture is good when growth can be handled by adding more of the same stuff (machines)
  • when you push a technology to the limit, it will fail in its own special way
  • the stack used includes MySQL with sharding, Python, Amazon EC2, S3, Akamai, Elastic Load Balancer, memcache, and Redis


Notice that they dropped Cassandra and Rackspace.

Here's the link:

http://highscalability.com/blog/2013/4/15/scaling-pinterest-from-0-to-10s-of-billions-of-page-views-a.html?utm_source=feedly

Saturday, February 9, 2013

Micro Instance out of memory - add swap

I was trying to update my Symfony project, and I got the following while updating the database schema or assets:
Fatal error: Uncaught exception 'ErrorException' with message 'Warning: proc_open(): fork failed - Cannot allocate memory in
An Amazon EC2 t1.micro instance only has 613MB of RAM. That is not enough to run a lot of processes.

What I can do is either 1) switch to a small instance or 2) add a 1GB swap file on disk.


Here are the commands to add a 1GB swap


sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 34.1356 s, 31.5 MB/s

sudo /sbin/mkswap /var/swap.1
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=9cffd7c9-8ec6-4f6c-8eea-79aa3173a59a
sudo /sbin/swapon /var/swap.1
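The dd numbers above come straight from the arithmetic: with bs=1M, the count is just the swap size in megabytes. A sketch:

```python
def dd_count(swap_gb, block_mb=1):
    """Number of blocks `dd` must write for a swap file of swap_gb gigabytes."""
    return swap_gb * 1024 // block_mb

print(dd_count(1))   # 1024, matching `bs=1M count=1024` above
```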


To turn off the swap do the following:

sudo /sbin/swapoff /var/swap.1

Saturday, February 2, 2013

EC2 instance terminated from custom AMI

If your instance terminates every time you try to launch it from an AMI, the AMI is probably corrupted.

What you need to do is to recreate the AMI from the original instance.

Make sure all processes are shut down except the ssh daemon.

Run netstat -tupln

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      701/sshd
tcp6       0      0 :::22                   :::*                    LISTEN      701/sshd
udp        0      0 0.0.0.0:68              0.0.0.0:*                           491/dhclient3

Make sure only the above processes are running and then create the AMI.

How to deal with log files in AWS EC2

When you are launching your application for production, it's best to keep the logs in a separate drive.

What I typically do is mount 2 EBS volumes on an instance: one for source code, one for log files.

For convenience, you can mount the source code directory as /var/www and the log files directory as /var/log.

If you are looking for information on how to mount and format a volume, read Amazon EC2 - Mounting a EBS drive.

Tuesday, January 29, 2013

Setting up Cassandra Multi Nodes on Amazon EC2

Cassandra is a NoSQL database. It is designed to be launched in a cluster of machines, providing high availability and fault tolerance.

Before starting, make sure you scan through Node and Cluster Initialization Properties.


Important Node attributes:

cluster_name
All nodes in a cluster must have the same name.

commitlog_directory
DataStax recommends putting this on a separate disk partition (perhaps EBS).

data_file_directories
Stores the column family data.

partitioner
defaults to RandomPartitioner

rpc_address
set to 0.0.0.0 to listen on all configured interfaces

rpc_port
Port for Thrift server. Default is 9160.

saved_caches_directory
Location where column family key and row caches will be stored.

seeds
Nodes that contain information about the ring topology and obtain gossip information.

storage_port
Port for inter-node communication. Default is 7000.


Create a Large Instance

You will need to launch at least a large instance. See Cassandra Hardware for more details. If you use a smaller instance, you may not be able to use the /etc/init.d/cassandra start command, and you will see a JVM heap memory error.

Ideal Cassandra Instance Specs:

  • 32 GB RAM (Minimum 4GB)
  • 8-core cpu
  • 2 disks (one for CommitLogDirectory, one for DataFileDirectories)
  • RAID 0 for DataFileDirectories disk when disk capacity is 50% full
  • XFS file system
  • Minimum 3 replications (instances)


We will NOT be using EBS due to the bad I/O throughput and reliability. We will use the ephemeral volume instead. Click here for more details.


Security Group

Port      Description

Public Facing Ports
22        SSH port.
8888      OpsCenter website port.

Cassandra Inter-node Ports
1024+     JMX reconnection/loopback ports. See description for port 7199.
7000      Cassandra inter-node cluster communication.
7199      Cassandra JMX monitoring port. After the initial handshake, the JMX protocol requires that the client reconnects on a randomly chosen port (1024+).
9160      Cassandra client port (Thrift).

OpsCenter Ports
61620     OpsCenter monitoring port. The opscenterd daemon listens on this port for TCP traffic coming from the agent.
61621     OpsCenter agent port. The agents listen on this port for SSL traffic initiated by OpsCenter.

Create a Security Group with the settings above.

Port 22 and 8888 will be 0.0.0.0/0.
1024-65535 will be your group id. (Click on Details tab on a Security Group to check your group id)
All other ports will be your group id.


Mounting the ephemeral drive

We will begin by formatting the ephemeral drive with XFS.

Use fdisk -l to check which device is your ephemeral drive. It may come formatted with ext3 already.

umount /dev/xvdb
mkfs.xfs -f /dev/xvdb

vi /etc/fstab

Remove the original entry and put

/dev/xvdb /vol xfs noatime 0 0

sudo mkdir /vol
sudo mount /vol

You may also want to use RAID-0 to stripe a set of ephemeral volumes.


Install Oracle Sun Java

Do not use OpenJDK. Cassandra works only with Oracle Sun Java.

Download jdk-6u38-linux-x64.bin.

mkdir /usr/java/latest

Upload or wget the JDK in this folder.

chmod +x jdk-6u38-linux-x64.bin

sudo ./jdk-6u38-linux-x64.bin

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/java/latest/jdk1.6.0_38/bin/java" 1

sudo update-alternatives --set java /usr/java/latest/jdk1.6.0_38/bin/java


java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)

Make sure JNA is installed so that Linux does not swap out the JVM; performance can improve.

sudo apt-get install libjna-java

vi /etc/security/limits.conf

Add:

cassandra soft memlock unlimited
cassandra hard memlock unlimited


Install Cassandra

Begin by installing a single node Cassandra. Read Cassandra - installing on Ubuntu 12.04 Amazon EC2.

Make sure Cassandra is at version 1.2.x and cqlsh is 2.3.x
cassandra -version 
cqlsh --version
We want to save the data in the ephemeral drive. The mount point we created earlier is /vol

mkdir /vol/cassandra
mkdir /vol/cassandra/commitlog
mkdir /vol/cassandra/data
mkdir /vol/cassandra/saved_caches

chown cassandra:cassandra -R /vol/cassandra

vi /etc/cassandra/cassandra.yaml

Point these directories to the ones we created above.

  • commitlog_directory
  • data_file_directories
  • saved_caches_directory

Kill Cassandra if you started it with the cassandra -f command. We want to start it from init.d instead.

sudo /etc/init.d/cassandra start
sudo /etc/init.d/cassandra stop
sudo /etc/init.d/cassandra status

Use nodetool to check the status:
nodetool -h localhost -p 7199 ring
Reboot and check if it's running by running "netstat -tupln"

If it's not starting, check the log /var/log/cassandra/output.log

If it's complaining about oldClusterName != newClusterName, just remove everything in the data_file_directories.


Create a Cassandra AMI

We will be setting up a ring (multi-node Cassandra). Before you create an AMI, umount /dev/xvdb and comment out the xfs record in /etc/fstab. Otherwise you won't be able to ssh into the instances launched from this image.

Launch a second instance in another availability zone


Setting up a Cassandra Ring

Before you begin, make sure you have the following:

  • Cassandra on each node
  • a cluster name
  • IP of each node
  • seed nodes
  • snitches (EC2Snitch, EC2MultiRegionSnitch)
  • open required firewalls

A snitch determines which data centers and racks are written to and read from, and distributes replicas by grouping machines into data centers and racks.

For Ec2Snitch, a region is treated as a data center, and an availability zone is treated as a rack within a data center.
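The mapping can be pictured as follows. Note this is purely illustrative of the region-to-data-center, zone-to-rack idea described above; the names used here are hypothetical, not Cassandra's actual topology labels:

```python
def ec2_topology(availability_zone):
    """Illustrative split of an AZ name like 'us-east-1a' into the
    data-center and rack roles Ec2Snitch assigns (hypothetical naming)."""
    return {"data_center": availability_zone[:-1],   # region, e.g. us-east-1
            "rack": availability_zone[-1]}           # zone suffix, e.g. a

print(ec2_topology("us-east-1a"))
```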


Setting up Multi Data Center Cassandra Ring

We will begin by tweaking the first node.

vi /etc/cassandra/cassandra.yaml

Set the following:
cluster_name: my_cluster
initial_token: 0
Start Cassandra. If you face any problems starting it, delete all the files in commitlog_directory and data_file_directories.

We will now add a second node in a different availability zone (Ex. if the first node is in us-east-1a, put the second node in us-east-1d).

ssh into your second instance. Remember to mount the partition back. Use "df" to make sure /dev/xvdb is mounted.
umount /mnt
vi /etc/fstab
uncomment "/dev/xvdb /vol xfs noatime 0 0" and remove entries that are using /dev/xvdb if appropriate
mkfs.xfs -f /dev/xvdb
mount /vol
mkdir /vol/cassandra
mkdir /vol/cassandra/commitlog
mkdir /vol/cassandra/data
mkdir /vol/cassandra/saved_caches
chown cassandra:cassandra -R /vol/cassandra
Now edit /etc/cassandra/cassandra.yaml.

The following needs to be changed on all nodes:
  • seeds
  • rpc_address
  • listen_address
The following needs to be changed on the new node:
  • initial_token
  • auto_bootstrap

Seeds
Add the private IPs of all nodes
- seeds: "10.31.2.31,10.216.218.73"

RPC Address
The address that clients connect to


Listen Address
The address that nodes use to connect to each other

For the first node,
listen_address: 10.31.2.31
rpc_address: 10.31.2.31
For the second node,
listen_address: 10.216.218.73
rpc_address: 10.216.218.73

Initial token (skip to Virtual nodes if you are using Cassandra 1.2.x and above)
This is used for load balancing. The first node should have a value of zero. All other nodes will need to recalculate this value every time a new node joins the cluster.

Calculate this based on the number of nodes, using the small Python program below.

Create a file called token_generator.py. Paste the following in the file.
#!/usr/bin/python
# Evenly divide the 2**127 token space among the nodes (RandomPartitioner)
import sys

if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(raw_input("How many nodes are in your cluster? "))

for i in range(0, num):
    print 'node %d: %d' % (i, i * (2 ** 127) / num)
Make it executable.
chmod +x token_generator.py
Execute the program with the number of nodes as the first argument. In our case, it's 2.
./token_generator.py 2
The output should be similar to the following
node 0: 0
node 1: 85070591730234615865843651857942052864
Put 85070591730234615865843651857942052864 as the initial token for the second node.
initial_token: '85070591730234615865843651857942052864'
If you get DatabaseDescriptor.java (line 509) Fatal configuration error, you are probably using Cassandra 1.2.x.


Virtual nodes (Cassandra 1.2.x or above)
vnodes were introduced in 1.2.x.

Set num_tokens to 256 and leave initial_token empty.
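The relevant cassandra.yaml excerpt:

```
num_tokens: 256
# initial_token:    # leave unset/empty when using vnodes
```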


Auto bootstrapping
When a new node is added, the cluster will automatically migrate the correct range of data from existing nodes.

Do not set auto_bootstrap: true on a node that is also in the seed list.

After all the above setup, start both nodes. Then check if they are up.
nodetool status

PropertyFileSnitch

Set endpoint_snitch: PropertyFileSnitch

We will use PropertyFileSnitch to define our own data centers and racks.

We will use dc1 to represent data center 1 and rac1 to represent rack 1.

Create /etc/cassandra/cassandra-topology.properties on all nodes and place the following:

10.216.218.73=dc1:rac1
10.31.2.31=dc2:rac1
default=dc1:rac1


default=dc1:rac1 covers any node that joins but is not listed in the file.

Keep in mind that when creating our schema we will use NetworkTopologyStrategy with the dc and rac names defined above.

You may want to create an image again.


Testing the cluster replication 

Start Cassandra for both nodes:
service cassandra start
Check the status:


nodetool status

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  10.32.6.31        28.94 KB   256     48.2%  eab0379f-2ac6-408a-b6dc-0ad475337a28  rac1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  10.108.23.52      47.66 KB   256     51.8%  35f1f17c-84c1-4e10-83b5-857feba03f4d  rac1


On both nodes, try executing the following:

cqlsh 10.216.218.73
cqlsh 10.31.2.31

You should not have a problem connecting to both of these machines. Make sure you have the latest cqlsh (2.3.0 at the time of this post).

We will be executing a script. For a production machine, I would recommend setting up Git and pulling your code from GitHub.

Create a script called test.cql.

Paste the following:
create keyspace helloworld with replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1, 'dc2': 1};
use helloworld;
create table activity (
    activity_key int,
    activity_time timeuuid,
    activity_type varint,
    primary key (activity_key, activity_time)
)
with clustering order by (activity_time desc);
Execute your script by running:
cqlsh 10.216.218.73 -f test.cql
Check to see if the keyspace "helloworld" exists:
cqlsh 10.216.218.73
describe keyspaces;

Connecting your Application to Cassandra

In the Security Group for Cassandra, open up 9160 to your application security group id.

Check that the connection is okay with telnet:
telnet 10.31.2.31 9160
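If telnet isn't installed, bash's built-in /dev/tcp can perform the same check (a sketch; the helper name and the 2-second timeout are my own choices):

```shell
# check_port <host> <port> -- prints "open" if a TCP connection succeeds
check_port() {
    if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}

check_port 10.31.2.31 9160
```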

Friday, January 25, 2013

Setting up Lighttpd Load Balancer on EC2 Ubuntu

Lighttpd is an asynchronous server. Along with Nginx, Lighttpd is one of the fast servers designed to counter the C10k problem. If you want to set up Nginx, read Setting up Nginx on EC2 Ubuntu.

This tutorial will demonstrate how to use Lighttpd to load balance application servers.


Creating an EC2 Instance

In the AWS Management Console, begin by creating a t1.micro Ubuntu Server 12.04.1 LTS 64-bit. (If you don't know how to create an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.)

Here are some guidelines:
  • Uncheck Delete on Termination for the root volume
  • Add port 22, 80 and 443 to the Security Group, call it lighttpd.

Install Lighttpd

ssh -i {key} ubuntu@{your_ec2_public_address}

sudo apt-get update -y

sudo apt-get install -y lighttpd

Lighttpd should be running. To check its status, run

service lighttpd status

All the configuration files are located in /etc/lighttpd

To enable/disable a module
  • Use /usr/sbin/lighty-enable-mod and /usr/sbin/lighty-disable-mod
  • Or create a symbolic link from /etc/lighttpd/conf-available/{module} to /etc/lighttpd/conf-enabled/{module}
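The symlink approach can be sketched as a small helper (the directories are parameters so you can dry-run it anywhere; the function name is my own):

```shell
# enable_mod <available_dir> <enabled_dir> <module_file>
# mirrors what lighty-enable-mod does: symlink the config into conf-enabled
enable_mod() {
    ln -sf "$1/$3" "$2/$3"
}

# e.g. enable_mod /etc/lighttpd/conf-available /etc/lighttpd/conf-enabled 10-proxy.conf
```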
To load balance application servers, we will be using the 10-proxy.conf file as a template.

cd /etc/lighttpd/conf-available
cp 10-proxy.conf 11-proxy.conf
vi 11-proxy.conf

We are interested in the following two variables:
  • proxy.balance - choose from hash, round-robin or fair
  • proxy.server - put the servers you want to load balance to
For example:
proxy.balance     = "hash"
proxy.server     = ( "" =>
                     (
                       ( "host" => "10.204.199.85",
                         "port" => 80
                       ),
                       ( "host" => "10.202.111.140",
                         "port" => 80
                       )
                     )
                    )
The above settings will load balance to two other servers based on IP.

Restart the server.
service lighttpd restart
Test the server.

To check the status:
netstat -ntulp

Thursday, January 24, 2013

Setting up a Java Tomcat7 Production Server on Amazon EC2

This tutorial will demonstrate how to build a Tomcat7 server running a Java application on Amazon EC2.

Here are the tools we will set up:
  • Apache Tomcat7
  • Open JDK7
  • GitHub
  • Maven 3.0.4
  • MySQL

Creating an EC2 Instance

In the AWS Management Console, begin by creating a t1.micro Ubuntu Server 12.04.1 LTS 64-bit machine. (If you don't know how to create an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.)

Here are some guidelines:
  • Uncheck Delete on Termination for the root volume
  • Add port 22, 80, 443 to the Security Group.

Create an EBS volume

We will create a 20GB volume to store our Java code. The EBS volume will be formatted with XFS.


If the volume gets stuck attaching, keep restarting the EC2 instance until it attaches.


Configure the EC2 instance

ssh into the instance (ssh -i {key} ubuntu@{your_ec2_public_address})

sudo apt-get update -y

My mounting point for /dev/xvdf is called /vol.
cd /vol
mkdir src
mkdir webapps
mkdir war_backups

/vol/src is where we will place the application code. /vol/webapps is where we will deploy the WAR file. /vol/war_backups is for making war backups, as the name implies.


Deploying code from GitHub

Skip this if you are using other source control. The idea is that we will put the Java application code in the /vol/src folder.

sudo apt-get install git -y
mkdir /vol/src
cd /vol/src

git config --global user.name "your_name"
git config --global user.email "your_email"
git config --global github.user "your_github_login"
git clone ssh://git@github.com/username/repo.git

You will want to connect to GitHub over ssh rather than https: if you are building an image for auto-scaling, you don't want to enter a username and password every time. See Generating SSH Keys for more details.

Your project should be located in /vol/src/{your_project}


Set up the Tomcat7 server

Begin by installing OpenJDK 7. Read Install Java OpenJDK 7 on Amazon EC2 Ubuntu.

Run echo $JAVA_HOME to check that it's set.

Install Tomcat 7. Read Install Tomcat 7 on Amazon EC2 Ubuntu.

Remember to change the ports to 80 and 443, and set the root web directory as the Tomcat WAR root path.

Check http://{your_ec2_public_address} in your browser to make sure Tomcat7 is running.

Make sure Tomcat7 is still up after you reboot the machine.


Generating the war file

We will be using Maven to compile our Spring Java project. If you are using other build frameworks, skip this.

Read Install Maven 3 on Amazon EC2 Ubuntu.

Run "mvn --version" to make sure it's using OpenJDK 7 and running the latest version of Maven.

cd /vol/src/{your_project}
mvn clean install

A WAR file should be built.

Move this WAR file into the Tomcat webapps directory. If you are following this tutorial, it should be at /vol/webapps.

Remember to name this WAR file ROOT.war; it makes the load balancing mapping easier later.
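The backup-and-swap routine implied by the three directories can be sketched as a small shell helper (the function name and example build path are my own):

```shell
# deploy_war <new_war> <webapps_dir> <backups_dir>
# back up the live ROOT.war with a timestamp, then swap in the new build
deploy_war() {
    local war=$1 webapps=$2 backups=$3
    if [ -f "$webapps/ROOT.war" ]; then
        cp "$webapps/ROOT.war" "$backups/ROOT_$(date +%Y%m%d_%H%M%S).war"
    fi
    cp "$war" "$webapps/ROOT.war"
}

# e.g. deploy_war /vol/src/{your_project}/target/ROOT.war /vol/webapps /vol/war_backups
```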

Browse to check you can access the site.


Using Amazon SES as the SMTP email service

Using SES will increase the likelihood of email delivery. Read Using Amazon SES to send emails.

Recompile your project and test it.


Moving MySQL to Amazon RDS

If you are using MySQL, you should move to Amazon RDS, as it simplifies a lot of management and backup operations for you.

Read Using MySQL on Amazon RDS.

To interact with RDS through your EC2 instance, install MySQL Server or just the MySQL client interface.
sudo apt-get install -y mysql-server
Stop the local MySQL server. We won't be using it.
sudo /etc/init.d/mysql stop
Connect to your RDS instance
mysql -h {rds_public_address} -P 3306 -u{username} -p{password}
Do NOT use the following form; you will get an access denied error.
mysql -u{username} -p{password} -h {rds_public_address} -P 3306
Update the JDBC settings in your application, recompile and test it.


Load Balancing Tomcat7

If you are planning to run multiple instances, read Setting up Lighttpd Load Balancer on EC2 Ubuntu or Setting up Nginx on EC2 Ubuntu.

Wednesday, January 23, 2013

Amazon EC2 - remote host identification has changed


You may get the following message when you ssh into your EC2 machine:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
aa:c3:4d:d2:db:64:17:f0:b3:9c:77:d7:47:2f:31:ab.
Please contact your system administrator.
This can happen when you are associating your Elastic IP to another instance.

All you need to do is remove the known_hosts file:
rm ~/.ssh/known_hosts
Alternatively, remove only that host's entry with ssh-keygen -R {your_ec2_public_address}.

Tuesday, January 22, 2013

Setting up Nginx on EC2 Ubuntu

Nginx is a high performance Web server and a reverse proxy. It is one of the top servers that can counter the C10K problem. It can be used to load balance application servers and serve static assets.

There are many ways to set up your load balancers in AWS. Here are some examples:
  1. Elastic Load Balancer -> Application and Nginx on each server
  2. Three layers: Elastic Load Balancer -> Nginx servers (cache, load balancers) -> application servers
  3. Elastic Load Balancer -> application servers, with CloudFront serving static assets
  4. Nginx -> application servers
Instagram uses the 2nd approach above.

This tutorial will focus on setting up Nginx on a single EC2 instance, while load balancing the application servers.


Creating an EC2 Instance

In the AWS Management Console, begin by creating a t1.micro Ubuntu Server 12.04.1 LTS 64-bit. (If you don't know how to create an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.)

Here are some guidelines:
  • Uncheck Delete on Termination for the root volume
  • Add port 22, 80 and 443 to the Security Group, call it Nginx.

Installing Nginx

ssh into your instance.
ssh -i {your key} ubuntu@{your_ec2_public_address}
sudo apt-get update

sudo apt-get install -y nginx

Check the nginx version
nginx -v
If this is not the latest version, do the following:

sudo vi /etc/apt/sources.list

Add:
deb http://nginx.org/packages/ubuntu/ precise nginx
deb-src http://nginx.org/packages/ubuntu/ precise nginx 
sudo apt-get update

You will get:
W: GPG error: http://nginx.org precise Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY ABF5BD827BD9BF62
Add the public key:

wget http://nginx.org/packages/keys/nginx_signing.key
cat nginx_signing.key | sudo apt-key add -

sudo apt-get install nginx

You may get the following:

dpkg: error processing /var/cache/apt/archives/nginx_1.2.6-1~precise_amd64.deb (--unpack):
 trying to overwrite '/etc/logrotate.d/nginx', which is also in package nginx-common 1.1.19-1ubuntu0.1
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/nginx_1.2.6-1~precise_amd64.deb

sudo apt-get remove nginx-common
sudo apt-get install nginx

Check your version to make sure it's the latest version (nginx -v).

Make Nginx start on boot.
update-rc.d nginx defaults

Nginx Basic Commands

sudo service nginx start
sudo service nginx stop
sudo service nginx restart
sudo service nginx status


Checking the IP of your machine:
ifconfig eth0 | grep inet | awk '{ print $2 }'

Load balancing servers

We will have the Nginx server load balance across two backends (backend1.example.com and backend2.example.com), pinning each client to a server by IP with ip_hash.

Begin by creating a new virtual host configuration file.
cp /etc/nginx/sites-available/default /etc/nginx/sites-available/{domain}
Put the following into the file:


upstream domain {
    ip_hash;
    server backend1.example.com:8080;
    server backend2.example.com:8080;
}
server {
    listen 80;
    server_name domain;
    access_log /var/log/nginx/web_portal.access.log;
    location / {
            proxy_pass      http://domain/;
            proxy_next_upstream error timeout invalid_header http_500;
            proxy_connect_timeout 2;
            proxy_set_header        Host            $host;
            proxy_set_header        X-Real-IP       $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_intercept_errors on;
    }
}



Make sure the domain above matches your request domain.

It is very important to have the following two directives. They define what happens when a server is down: in this case, the client request is redirected to the next machine if a server does not respond within 2 secs.
proxy_next_upstream error timeout invalid_header http_500;
proxy_connect_timeout 2;
Check out proxy_read_timeout as well.

ip_hash always sends a client back to the same server, based on its IP.

Check the Nginx Wiki for more info.

Disable the default config.
rm /etc/nginx/sites-enabled/default
Enable the configuration by symbolic link to sites-enabled.
sudo ln -s /etc/nginx/sites-available/{domain} /etc/nginx/sites-enabled/{domain}
If Nginx doesn't seem to pick up on the configuration, make sure /etc/nginx/nginx.conf has the following within the http block:
include /etc/nginx/sites-enabled/*;
Restart the server.
service nginx restart
To deploy code without service interruption, read Nginx - How to deploy code without service disruption.



How to build a NodeJS AMI on EC2

This demo will provide guidelines on how to configure a NodeJS EC2 instance and create a NodeJS AMI on Ubuntu.

Specs:

Ubuntu Server 12.04.1 LTS 64-bit


Create a Ubuntu Server 12.04.1 LTS 64-bit t1.micro instance


Uncheck Delete on Termination for the EBS root disk.

Create a Security Group called Node JS Production (or anything you want).

Add port 22, 80, 443, 3000 to the Security Group. (I am adding port 3000 because I run the app from port 3000)

Launch the instance.

In the AWS Management Console, Volumes -> Create Volume.

Make the volume with
  • type = Standard
  • Size = 20GB
  • Availability Zone must match the EC2's Availability Zone
  • make the drive name xvdf

Attach this EBS volume to the EC2 instance we just created.

ssh into your instance.
Ex. ssh -i {key} ubuntu@{ec2-address}.compute-1.amazonaws.com
sudo apt-get update

We are going to format the xvdf with XFS file system. Refer to Amazon EC2 - Mounting a EBS drive.


Install NodeJS and other dependencies

sudo apt-get install -y nodejs npm

If you run "node --version", you will find the node version is 0.6.12. We want to use 0.8.18, since it's a lot faster.

sudo npm install -g n
sudo n 0.8.18

Now "sudo node --version" will show version 0.8.18 while "node --version" will show 0.6.12


Install Git and fetch your code (Optional)

sudo apt-get install git -y
mkdir /vol/src
cd /vol/src

git config --global user.name "your_name"
git config --global user.email "your_email"
git config --global github.user "your_github_login"
git clone ssh://git@github.com/username/repo.git

You will want to connect to GitHub over ssh rather than https: if you are building an image for auto-scaling, you don't want to enter a username and password every time. See Generating SSH Keys for more details.

Test your application by running

sudo node {your_app}


Making the NodeJS start on boot

To make a robust image, we want the NodeJS app to start on boot and respawn when it crashes. We will write a simple Upstart service. All service scripts are located in /etc/init.

Let's create the file /etc/init/{your_app_name}_service.conf

sudo vi /etc/init/{your_app_name}_service.conf

Put the following into the file:

#######################

#!upstart

description "my NodeJS server"
author      "Some Dude"

# start on startup
start on started networking
stop on shutdown

# Automatically Respawn:
respawn
respawn limit 5 60

script
    cd /vol/src/{your_app}
    exec sudo node /vol/src/{your_app}/app.js >> /vol/log/app_`date +"%Y%m%d_%H%M%S"`.log 2>&1
end script

post-start script
   # Optionally put a script here that will notify you that node has (re)started
   # /root/bin/hoptoad.sh "node.js has started!"
end script
#######################


Refer to upstart stanzas for more details about what each field means.

Create the directory to store NodeJS outputs:

sudo mkdir /vol/log

I have marked each log file with the start time of the app. You will probably want to change this to create logs daily.
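One way to get daily logs without touching the Upstart script is logrotate; a fragment like the following would work (the path /etc/logrotate.d/{your_app_name} is an assumption; copytruncate matters because the node process keeps its log file open):

```
/vol/log/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```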

To check if the services are running:

initctl list | grep {your_app_name}_service.conf

To start a service:

sudo service {your_app_name}_service.conf start

To stop a service:

sudo service {your_app_name}_service.conf stop


Now reboot your EC2 instance in the AWS console.

Test if your site is started.


Create a NodeJS AMI


In the AWS Management Console, click instances at the left sidebar.

Right click on the NodeJS instance created above and click on Create Image.

Fill in the image name. I like to name things systematically; if you are planning to write deploy scripts and do auto-scaling, it makes each image easy to identify. I use the following convention:

{namespace}_{what_is_it}_{date}

Ex. mycompany_blog_20130118

You will want the date because you may create an image every time you deploy new code.
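If you script your deploys, the convention is easy to generate (the helper name is my own):

```shell
# ami_name <namespace> <what_is_it> -- builds a name following the convention above
ami_name() {
    echo "${1}_${2}_$(date +%Y%m%d)"
}

ami_name mycompany blog
```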

Leave the other options as default, and click on Create Image.

On the left sidebar, click on AMIs under Images.

You can see the status of the AMI we just created.

You should launch an instance from this AMI and test that all the data is there.