Thursday, January 31, 2013

Cassandra Hector - May not be enough replicas present to handle consistency level.

If you get this error, make sure your datacenter and rack names match the ones you used when you created your keyspace.

Make sure everything is lowercase.

For example:

In /etc/cassandra/cassandra-topology.properties, change all to lowercase.
10.216.218.73=dc1:rac1
10.31.2.31=dc2:rac1
default=dc1:rac1
Then create the keyspace in lowercase.
create keyspace helloworld with replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1, 'dc2': 1};
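
To double-check, compare the datacenter and rack names the cluster actually reports with the ones in your keyspace definition (a quick check, using the IPs from this post):

nodetool status

Then, inside cqlsh (cqlsh 10.216.218.73):

describe keyspace helloworld;

The "Datacenter:" names printed by nodetool must match the names in the keyspace's replication map.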

Setting up Github on AWS

sudo apt-get install git -y

Choose a location to store your code. I like to put my code on a separate EBS for portability. In the following, I will use /vol for this location.

mkdir /vol/src
cd /vol/src

git config --global user.name "your_name"
git config --global user.email "your_email"
git config --global github.user "your_github_login"
git clone ssh://git@github.com/username/repo.git

You will want to establish the connection with GitHub over SSH rather than HTTPS: if you are building an image that can be used for auto-scaling, you don't want to enter a username and password every time. See Generating SSH Keys for more details.
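If you have not generated a key yet, a minimal sketch (assuming the default key location; add the public key to your GitHub account under Account Settings -> SSH Keys):

ssh-keygen -t rsa -C "your_email"    # generate a key pair
cat ~/.ssh/id_rsa.pub                # paste this public key into GitHub
ssh -T git@github.com                # test; GitHub should greet you by username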

Your project should be located in /vol/src/{your_project}.

Tuesday, January 29, 2013

Setting up Cassandra Multi Nodes on Amazon EC2

Cassandra is a NoSQL database. It is designed to be launched in a cluster of machines, providing high availability and fault tolerance.

Before starting, make sure you scan through Node and Cluster Initialization Properties.


Important Node attributes:

cluster_name
All nodes in a cluster must have the same name.

commitlog_directory
DataStax recommends putting this on a separate disk partition (perhaps an EBS volume).

data_file_directories
Stores the column family data.

partitioner
Defaults to RandomPartitioner.

rpc_address
Set to 0.0.0.0 to listen on all configured interfaces.

rpc_port
Port for Thrift server. Default is 9160.

saved_caches_directory
Location where column family key and row caches will be stored.

seeds
Nodes that hold information about the ring topology; new nodes contact them to obtain gossip information.

storage_port
Port for inter-node communication. Default is 7000.
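
For reference, here is roughly how these attributes look in /etc/cassandra/cassandra.yaml. The values below are examples taken from later in this post, not defaults:

cluster_name: 'my_cluster'
commitlog_directory: /vol/cassandra/commitlog
data_file_directories:
    - /vol/cassandra/data
saved_caches_directory: /vol/cassandra/saved_caches
partitioner: org.apache.cassandra.dht.RandomPartitioner
rpc_address: 0.0.0.0
rpc_port: 9160
storage_port: 7000
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.31.2.31,10.216.218.73"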


Create a Large Instance

You will need to launch at least a large instance. See Cassandra Hardware for more details. If you have a smaller instance, you may not be able to use the /etc/init.d/cassandra start command and will see a JVM heap memory error.

Ideal Cassandra Instance Specs:

  • 32 GB RAM (Minimum 4GB)
  • 8-core cpu
  • 2 disks (one for CommitLogDirectory, one for DataFileDirectories)
  • RAID 0 for DataFileDirectories disk when disk capacity is 50% full
  • XFS file system
  • Minimum of 3 replicas (instances)


We will NOT be using EBS due to its poor I/O throughput and reliability. We will use the ephemeral volume instead.


Security Group

Port      Description

Public Facing Ports
22        SSH port.
8888      OpsCenter website port.

Cassandra Inter-node Ports
1024+     JMX reconnection/loopback ports. See the description for port 7199.
7000      Cassandra inter-node cluster communication.
7199      Cassandra JMX monitoring port. After the initial handshake, the JMX protocol requires that the client reconnects on a randomly chosen port (1024+).
9160      Cassandra client port (Thrift).

OpsCenter Ports
61620     OpsCenter monitoring port. The opscenterd daemon listens on this port for TCP traffic coming from the agent.
61621     OpsCenter agent port. The agents listen on this port for SSL traffic initiated by OpsCenter.

Create a Security Group with the settings above.

Ports 22 and 8888 will be open to 0.0.0.0/0.
Ports 1024-65535 will be open to your security group ID. (Click on the Details tab of a Security Group to check your group ID.)
All other ports will be open to your security group ID.


Mounting the ephemeral drive

We will begin by formatting the ephemeral drive with XFS.

Use fdisk -l to check which device is your ephemeral drive. It may already be formatted with ext3.

umount /dev/xvdb
mkfs.xfs -f /dev/xvdb

vi /etc/fstab

Remove the original entry for /dev/xvdb and put:

/dev/xvdb /vol xfs noatime 0 0

sudo mkdir /vol
sudo mount /vol

You may also want to use RAID 0 to stripe a set of ephemeral volumes.
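
A sketch of striping two ephemeral volumes with mdadm (the device names /dev/xvdb and /dev/xvdc are assumptions; check what fdisk -l reports):

sudo apt-get install -y mdadm
# Build a RAID 0 array from the two ephemeral volumes
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
sudo mkfs.xfs -f /dev/md0
sudo mount /dev/md0 /vol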


Install Oracle Sun Java

Do not use OpenJDK. Cassandra works only with Oracle Sun Java.

Download jdk-6u38-linux-x64.bin.

sudo mkdir -p /usr/java/latest

Upload or wget the JDK into this folder. Then:

cd /usr/java/latest
chmod +x jdk-6u38-linux-x64.bin
sudo ./jdk-6u38-linux-x64.bin

sudo update-alternatives --install "/usr/bin/java" "java" "/usr/java/latest/jdk1.6.0_38/bin/java" 1

sudo update-alternatives --set java /usr/java/latest/jdk1.6.0_38/bin/java


java -version
java version "1.6.0_38"
Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)

Make sure JNA is installed. With JNA, Linux will not swap out the JVM, and performance can improve.

sudo apt-get install libjna-java

vi /etc/security/limits.conf

Add:

cassandra soft memlock unlimited
cassandra hard memlock unlimited


Install Cassandra

Begin by installing a single node Cassandra. Read Cassandra - installing on Ubuntu 12.04 Amazon EC2.

Make sure Cassandra is at version 1.2.x and cqlsh is at 2.3.x:

cassandra -version
cqlsh --version

We want to save the data on the ephemeral drive. The mount point we created earlier is /vol.

mkdir /vol/cassandra
mkdir /vol/cassandra/commitlog
mkdir /vol/cassandra/data
mkdir /vol/cassandra/saved_caches

chown cassandra:cassandra -R /vol/cassandra

vi /etc/cassandra/cassandra.yaml

Point these directories to the ones we created above.

  • commitlog_directory
  • data_file_directories
  • saved_caches_directory

Kill Cassandra if you started it with the cassandra -f command. We want to start it from init.d instead:

sudo /etc/init.d/cassandra start
sudo /etc/init.d/cassandra stop
sudo /etc/init.d/cassandra status

Use nodetool to check the status:

nodetool -h localhost -p 7199 ring

Reboot and check that Cassandra is running with "netstat -tupln".

If it's not starting, check the log /var/log/cassandra/output.log

If it's complaining about oldClusterName != newClusterName, just remove everything in the data_file_directories.
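
For example, with the directories used in this post:

sudo /etc/init.d/cassandra stop
sudo rm -rf /vol/cassandra/data/* /vol/cassandra/commitlog/*
sudo /etc/init.d/cassandra start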


Create a Cassandra AMI

We will be setting up a ring (multi-node Cassandra). Before you create an AMI, umount /dev/xvdb and comment out the xfs record in /etc/fstab. Otherwise you won't be able to SSH into instances launched from this image.

Launch a second instance in another availability zone.


Setting up a Cassandra Ring

Before you begin, make sure you have the following:

  • Cassandra on each node
  • a cluster name
  • IP of each node
  • seed nodes
  • snitches (Ec2Snitch, Ec2MultiRegionSnitch)
  • required firewall ports opened

A snitch determines which data centers and racks are written to and read from; it distributes replicas by grouping machines into data centers and racks.

For Ec2Snitch, a region is treated as a data center, and an availability zone is treated as a rack within a data center.


Setting up Multi Data Center Cassandra Ring

We will begin by tweaking the first node.

vi /etc/cassandra/cassandra.yaml

Set the following:

cluster_name: my_cluster
initial_token: 0

Start Cassandra. If you face any problems starting it, delete all the files in commitlog_directory and data_file_directories.

We will now add a second node in a different availability zone (e.g., if the first node is in us-east-1a, put the second one in us-east-1d).

ssh into your second instance. Remember to mount the partition back. Use "df" to make sure /dev/xvdb is mounted.

umount /mnt
vi /etc/fstab

Uncomment "/dev/xvdb /vol xfs noatime 0 0" and remove any other entries that use /dev/xvdb.

mkfs.xfs -f /dev/xvdb
mount /vol
mkdir /vol/cassandra
mkdir /vol/cassandra/commitlog
mkdir /vol/cassandra/data
mkdir /vol/cassandra/saved_caches
chown cassandra:cassandra -R /vol/cassandra
Now edit /etc/cassandra/cassandra.yaml.

The following needs to be changed on all nodes:
  • seeds
  • rpc_address
  • listen_address
The following needs to be changed on the new node:
  • initial_token
  • auto_bootstrap

Seeds
Add the private IPs of all nodes
- seeds: "10.31.2.31,10.216.218.73"

RPC Address
The address that clients connect to.


Listen Address
The address that nodes use to communicate with each other.

For the first node,
listen_address: 10.31.2.31
rpc_address: 10.31.2.31
For the second node,
listen_address: 10.216.218.73
rpc_address: 10.216.218.73

Initial token (skip to Virtual nodes if you are using Cassandra 1.2.x and above)
This is used for load balancing. The first node should have a value of zero. All other nodes will need to recalculate this value every time a new node joins the cluster.

Calculate this based on the number of nodes, using the Python program below.

Create a file called token_generator.py. Paste the following in the file.
#!/usr/bin/python
# Evenly divide the token ring (0 to 2**127) among the nodes.
import sys

if len(sys.argv) > 1:
    num = int(sys.argv[1])
else:
    num = int(raw_input("How many nodes are in your cluster? "))
for i in range(0, num):
    print 'node %d: %d' % (i, (i * (2 ** 127) / num))
Make it executable.

chmod +x token_generator.py

Execute the program with the number of nodes as the first argument. In our case, it's 2.

./token_generator.py 2

The output should be similar to the following:
node 0: 0
node 1: 85070591730234615865843651857942052864
Put 85070591730234615865843651857942052864 as the initial token for the second node.
initial_token: '85070591730234615865843651857942052864'
If you get DatabaseDescriptor.java (line 509) Fatal configuration error, you are probably using Cassandra 1.2.x.


Virtual nodes (Cassandra 1.2.x or above)
Virtual nodes (vnodes) were introduced in 1.2.x.

Set num_tokens to 256 and leave initial_token empty.


Auto bootstrapping
When a new node is added, the cluster will automatically migrate the correct range of data from existing nodes.

Do not both set auto_bootstrap: true on a node and include that node in the seed list.

After all of the above setup, start both nodes, then check that they are up:
nodetool status

PropertyFileSnitch

Set endpoint_snitch: PropertyFileSnitch

We will be using PropertyFileSnitch to define our data centers and racks.

We will use dc1 to represent data center 1 and rac1 to represent rack 1.

Create /etc/cassandra/cassandra-topology.properties on all nodes and place the following:

10.216.218.73=dc1:rac1
10.31.2.31=dc2:rac1
default=dc1:rac1


default=dc1:rac1 applies when a node joins the cluster but is not specified in the file.

Keep in mind that when creating our schema we will be using NetworkTopologyStrategy with the dc and rac names defined above.

You may want to create an image again.


Testing the cluster replication 

Start Cassandra on both nodes:

service cassandra start

Check the status:

nodetool status

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  10.32.6.31        28.94 KB   256     48.2%  eab0379f-2ac6-408a-b6dc-0ad475337a28  rac1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns   Host ID                               Rack
UN  10.108.23.52      47.66 KB   256     51.8%  35f1f17c-84c1-4e10-83b5-857feba03f4d  rac1


In both nodes, try executing the following:

cqlsh 10.216.218.73
cqlsh 10.31.2.31

You should not have a problem connecting to both of these machines. Make sure you have the latest cqlsh (2.3.0 at the time of this post).

We will be executing a script. For a production machine, I would recommend setting up Git and pulling your code from GitHub.

Create a script called test.cql.

Paste the following:
create keyspace helloworld with replication = {'class': 'NetworkTopologyStrategy', 'dc1': 1, 'dc2': 1};
use helloworld;
create table activity (
    activity_key int,
    activity_time timeuuid,
    activity_type varint,
    primary key (activity_key, activity_time)
)
with clustering order by (activity_time desc);
Execute your script by running:
cqlsh 10.216.218.73 -f test.cql
Check to see if the keyspace "helloworld" exists:
cqlsh 10.216.218.73
describe keyspaces;

Connecting your Application to Cassandra

In the Security Group for Cassandra, open up 9160 to your application security group id.

Check that the connection is okay with telnet:
telnet 10.31.2.31 9160

What is RAID?

RAID (redundant array of independent disks) is a storage technology that combines multiple disk components into a logical unit. Different RAID levels define different configurations to employ striping, mirroring, or parity.

RAID levels

RAID 0 - block-striped volume
  • splits data across 2 or more disks
  • no data redundancy
  • no parity
  • used to increase performance
  • used to create a large logical disk out of 2 or more physical ones
  • size = n x min(all drive sizes)

RAID 1 - mirror
  • exact copy of data on 2 or more disks
  • size = min(all drive sizes)
  • does not provide protection against data corruption due to viruses, accidental file changes or deletions, or any other data-specific changes
  • read speed up to n x disk speed
  • use independent disk controllers to increase speed

RAID 2 - bit-striped volume
  • stripes data at the bit level; uses Hamming code for error correction
  • no commercial applications of RAID 2 today

RAID 3
  • byte-level striping with a dedicated parity disk
  • cannot serve multiple requests simultaneously
  • not commonly used

RAID 4
  • block-level striping with a dedicated parity disk
  • not commonly used

RAID 5
  • block-level striping with parity data

RAID 6
  • block-level striping with two parity blocks
  • penalty on write operations

RAID 10 - stripe of mirrors
  • top-level RAID 0 array composed of 2 or more RAID 1 arrays

RAID 0+1 - mirror of stripes
  • top-level RAID 1 mirror composed of 2 or more RAID 0 stripe sets

Friday, January 25, 2013

Setting up Lighttpd Load Balancer on EC2 Ubuntu

Lighttpd is an asynchronous server. Along with Nginx, Lighttpd is one of the fast servers designed to counter the C10k problem. If you want to set up Nginx, read Setting up Nginx on EC2 Ubuntu.

This tutorial will demonstrate how to use Lighttpd to load balance application servers.


Creating an EC2 Instance

In the AWS Management Console, begin by creating a t1.micro Ubuntu Server 12.04.1 LTS 64-bit instance. (If you don't know how to create an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.)

Here are some guidelines:
  • Uncheck Delete on Termination for the root volume
  • Add ports 22, 80, and 443 to the Security Group; call it lighttpd.

Install Lighttpd

ssh -i {key} ubuntu@{your_ec2_public_address}

sudo apt-get update -y

sudo apt-get install -y lighttpd

Lighttpd should be running. To check its status, run

service lighttpd status

All the configuration files are located in /etc/lighttpd

To enable/disable a module
  • Use /usr/sbin/lighty-enable-mod and /usr/sbin/lighty-disable-mod
  • Or create a symbolic link from /etc/lighttpd/conf-available/{module} to /etc/lighttpd/conf-enabled/{module}
To load balance application servers, we will be using the 10-proxy.conf file as a template.

cd /etc/lighttpd/conf-available
cp 10-proxy.conf 11-proxy.conf
vi 11-proxy.conf

We are interested in the following two variables:
  • proxy.balance - choose from hash, round-robin or fair
  • proxy.server - put the servers you want to load balance to
For example:
proxy.balance     = "hash"
proxy.server     = ( "" =>
                     (
                       ( "host" => "10.204.199.85",
                         "port" => 80
                       ),
                       ( "host" => "10.202.111.140",
                         "port" => 80
                       )
                     )
                    )
The above settings will load balance to two other servers based on IP.

Restart the server.
service lighttpd restart
Test the server.

To check the status:
netstat -ntulp
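
To sanity-check the proxy from the instance itself, request a page and see which backend answers (assumes the backends are reachable from this machine):

curl -I http://localhost/
# or, if curl is not installed:
wget -S -O /dev/null http://localhost/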

Thursday, January 24, 2013

Setting up a Java Tomcat7 Production Server on Amazon EC2

This tutorial will demonstrate how to build a Tomcat7 server running a Java application on Amazon EC2.

Here are the tools we will set up:
  • Apache Tomcat7
  • Open JDK7
  • GitHub
  • Maven 3.0.4
  • MySQL

Creating an EC2 Instance

In the AWS Management Console, begin by creating a t1.micro Ubuntu Server 12.04.1 LTS 64-bit machine. (If you don't know how to create an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.)

Here are some guidelines:
  • Uncheck Delete on Termination for the root volume
  • Add port 22, 80, 443 to the Security Group.

Create an EBS volume

We will create a 20GB volume to store our Java code. The EBS volume will be formatted with XFS.


If the volume keeps getting stuck while attaching, restart the EC2 instance until it attaches.


Configure the EC2 instance

ssh into the instance (ssh -i {key} ubuntu@{your_ec2_public_address})

sudo apt-get update -y

My mounting point for /dev/xvdf is called /vol.
cd /vol
mkdir src
mkdir webapps
mkdir war_backups

/vol/src is where we will place the application code. /vol/webapps is where we will deploy the WAR file. /vol/war_backups is for making war backups, as the name implies.


Deploying code from GitHub

Skip this if you are using another source control system. The idea is that we will put the Java application code in the /vol/src folder.

sudo apt-get install git -y
mkdir /vol/src
cd /vol/src

git config --global user.name "your_name"
git config --global user.email "your_email"
git config --global github.user "your_github_login"
git clone ssh://git@github.com/username/repo.git

You will want to establish the connection with GitHub over SSH rather than HTTPS: if you are building an image that can be used for auto-scaling, you don't want to enter a username and password every time. See Generating SSH Keys for more details.

Your project should be located in /vol/src/{your_project}.


Set up the Tomcat7 server

Begin by installing OpenJDK 7. Read Install Java OpenJDK 7 on Amazon EC2 Ubuntu.

Run "echo $JAVA_HOME" to check that it is set.

Install Tomcat 7. Read Install Tomcat 7 on Amazon EC2 Ubuntu.

Remember to change the ports to 80 and 443, and to set the root web directory as the Tomcat WAR root path.

Check http://{your_ec2_public_address} in your browser to make sure Tomcat7 is running.

Make sure Tomcat7 is still up after you reboot the machine.


Generating the war file

We will be using Maven to compile our Spring Java project. If you are using other build frameworks, skip this.

Read Install Maven 3 on Amazon EC2 Ubuntu.

Run "mvn --version" to make sure it's using OpenJDK 7 and running the latest version of Maven.

cd /vol/src/{your_project}
mvn clean install

A WAR file should be built.

Move this WAR file into the Tomcat webapps directory. If you are following this tutorial, it should be at /vol/webapps.

Remember to name this WAR file ROOT.war; it makes the load-balancer mapping easier later.
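
A sketch of the deploy step following this tutorial's layout ({your_project} is a placeholder):

cd /vol/src/{your_project}
mvn clean install
# keep a dated backup, then deploy as ROOT.war
cp target/*.war /vol/war_backups/ROOT_$(date +%Y%m%d).war
cp target/*.war /vol/webapps/ROOT.war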

Browse to the site to check that you can access it.


Using Amazon SES as the SMTP email service

Using SES will increase the likelihood of email delivery. Read Using Amazon SES to send emails.

Recompile your project and test it.


Moving MySQL to Amazon RDS

If you are using MySQL, you should move to Amazon RDS, as it simplifies a lot of management and backup operations for you.

Read Using MySQL on Amazon RDS.

To interact with RDS through your EC2 instance, install MySQL Server or just the MySQL client interface.

sudo apt-get install -y mysql-server

Stop the local MySQL server. We won't be using it.

sudo /etc/init.d/mysql stop

Connect to your RDS instance:

mysql -h {rds_public_address} -P 3306 -u{username} -p{password}

Do NOT use the following form. You will get an access denied error.

mysql -u{username} -p{password} -h {rds_public_address} -P 3306
Update the JDBC settings in your application, recompile and test it.


Load Balancing Tomcat7

If you are planning to run multiple instances, read Setting up Lighttpd Load Balancer on EC2 Ubuntu or Setting up Nginx on EC2 Ubuntu.

Using Amazon SES to send emails

Amazon SES provides an easy way to send emails from your application. The strongest point of using SES is that it reduces the likelihood that your emails will be marked as spam by ISPs.

You will need your AWS Access Keys and the SMTP Credentials.


Getting your AWS Access Keys

If you want to send email directly by using the Amazon SES API or the AWS SDK, you will need to use these keys.

In the AWS console, click on My Account/Console -> Security Credentials.

Obtain your access key id and the secret access key. (Scroll to the section called Access Credentials).


Getting the SMTP Credentials

Go to the Amazon SES console.

Click Create My SMTP Credentials.

Enter a new name for the IAM user or just use the default. Click Create.

Make sure you download the credentials or record them somewhere. You will NOT be able to see them again.


Testing Email Sending

In the SES console, click Verified Senders on the left sidebar.

Add a few email addresses by clicking on Verify a New Email Address.

Test sending a few emails and make sure you get them. Both senders and receivers need to be verified.


Request Production Access

Fill in the form below. Usually, this process takes a day, so it's better to request production access as early as possible.

https://portal.aws.amazon.com/gp/aws/html-forms-controller/contactus/SESProductionAccess2011Q3


Configuring your SMTP settings in your application

In the SES console, click SMTP Settings on the left sidebar.

You will need to configure your application with the following settings (also the SMTP credentials we set up above).

Server Name: email-smtp.us-east-1.amazonaws.com
Port: 25, 465 or 587
Use Transport Layer Security (TLS): Yes
Authentication: Your SMTP credentials (see above)


For my setup, I use Spring's email provider configured with the above settings.
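
Before wiring up the application, you can verify connectivity and TLS from your instance (openssl ships with Ubuntu; type QUIT to exit):

openssl s_client -starttls smtp -connect email-smtp.us-east-1.amazonaws.com:587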

That's it. Test sending your emails through SES.

If you want to directly access the Amazon SES API, look at Using the Amazon SES API to Send Email.

Wednesday, January 23, 2013

Amazon EC2 - remote host identification has changed


You may get the following message when you ssh into your EC2 machine:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
aa:c3:4d:d2:db:64:17:f0:b3:9c:77:d7:47:2f:31:ab.
Please contact your system administrator.
This can happen when you are associating your Elastic IP to another instance.

All you need to do is remove the known_hosts file:
rm ~/.ssh/known_hosts 
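
If you would rather keep the rest of the file, you can remove just the entry for that address:

ssh-keygen -R {your_ec2_public_address}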

Tuesday, January 22, 2013

Setting up Nginx on EC2 Ubuntu

Nginx is a high performance web server and a reverse proxy. It is one of the top servers that can counter the C10K problem. It can be used to load balance application servers and serve static assets.

There are many ways to set up your load balancers in AWS. Here are some examples:
  1. Elastic Load Balancer -> Application and Nginx on each server
  2. Three layers: Elastic Load Balancer -> Nginx servers (cache, load balancers) -> Application servers
  3. Elastic Load Balancer -> Application servers; using CloudFront to serve assets
  4. Nginx -> Application servers
Instagram uses the 2nd approach above.

This tutorial will focus on setting up Nginx on a single EC2 instance, while load balancing the application servers.


Creating an EC2 Instance

In the AWS Management Console, begin by creating a t1.micro Ubuntu Server 12.04.1 LTS 64-bit instance. (If you don't know how to create an instance, read Amazon EC2 - Launching Ubuntu Server 12.04.1 LTS step by step guide.)

Here are some guidelines:
  • Uncheck Delete on Termination for the root volume
  • Add ports 22, 80, and 443 to the Security Group; call it Nginx.

Installing Nginx

ssh into your instance.
ssh -i {your key} ubuntu@{your_ec2_public_address}
sudo apt-get update

sudo apt-get install -y nginx

Check the nginx version
nginx -v
If this is not the latest version, do the following:

sudo vi /etc/apt/sources.list

Add:
deb http://nginx.org/packages/ubuntu/ precise nginx
deb-src http://nginx.org/packages/ubuntu/ precise nginx 
sudo apt-get update

You will get:
W: GPG error: http://nginx.org precise Release: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY ABF5BD827BD9BF62
Add the public key:

wget http://nginx.org/packages/keys/nginx_signing.key
cat nginx_signing.key | sudo apt-key add -

sudo apt-get install nginx

You may get the following:

dpkg: error processing /var/cache/apt/archives/nginx_1.2.6-1~precise_amd64.deb (--unpack):
 trying to overwrite '/etc/logrotate.d/nginx', which is also in package nginx-common 1.1.19-1ubuntu0.1
dpkg-deb: error: subprocess paste was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/nginx_1.2.6-1~precise_amd64.deb

sudo apt-get remove nginx-common
sudo apt-get install nginx

Check your version to make sure it's the latest version (nginx -v).

Make Nginx start on boot.
update-rc.d nginx defaults

Nginx Basic Commands

sudo service nginx start
sudo service nginx stop
sudo service nginx restart
sudo service nginx status


Checking the IP of your server:
ifconfig eth0 | grep inet | awk '{ print $2 }'

Load balancing servers

We will have the Nginx server load balance two servers (backend1.example.com and backend2.example.com).

Begin by creating a new virtual host configuration file.
cp /etc/nginx/sites-available/default /etc/nginx/sites-available/{domain}
Put the following into the file:


upstream domain {
    ip_hash;
    server backend1.example.com:8080;
    server backend2.example.com:8080;
}
server {
    listen 80;
    server_name domain;
    access_log /var/log/nginx/web_portal.access.log;
    location / {
            proxy_pass      http://domain/;
            proxy_next_upstream error timeout invalid_header http_500;
            proxy_connect_timeout 2;
            proxy_set_header        Host            $host;
            proxy_set_header        X-Real-IP       $remote_addr;
            proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_intercept_errors on;
    }
}



Make sure the domain above matches your request domain.

It is very important to have the following two attributes. They define what happens when a server is down. In this case, Nginx redirects the client request to the next machine if a server does not respond within 2 seconds.
proxy_next_upstream error timeout invalid_header http_500;
proxy_connect_timeout 2;
Check out proxy_read_timeout as well.

ip_hash will always send the client back to the same server based on the IP.

Check the Nginx Wiki for more info.

Disable the default config.
rm /etc/nginx/sites-enabled/default
Enable the configuration by creating a symbolic link in sites-enabled.
sudo ln -s /etc/nginx/sites-available/{domain} /etc/nginx/sites-enabled/{domain}
If Nginx doesn't seem to pick up on the configuration, make sure /etc/nginx/nginx.conf has the following within the http block:
include /etc/nginx/sites-enabled/*;
Restart the server.
service nginx restart
To deploy code without service interruption, read Nginx - How to deploy code without service disruption.



Nginx - How to deploy code without service disruption

Let's say you have the following upstream settings (Nginx load balancing two servers):
# /etc/nginx/sites-available/backend
upstream backend {
        server backend1.example.com:8080;
        server backend2.example.com:8080;
}
To update backend1.example.com, mark it as down,


# /etc/nginx/sites-available/backend
upstream backend {
        server backend1.example.com:8080 down;
        server backend2.example.com:8080;
}

> service nginx reload

Now update your code and reload the server configs.

> service nginx reload

Similarly, for the second server

upstream backend {
        server backend1.example.com:8080;
        server backend2.example.com:8080 down;
}

> service nginx reload

Update code on second server

> service nginx reload

The reload command does not restart the Nginx server.

How to build a NodeJS AMI on EC2

This demo will provide guidelines on how to configure a NodeJS EC2 instance and create a NodeJS AMI on Ubuntu.

Specs:

Ubuntu Server 12.04.1 LTS 64-bit


Create a Ubuntu Server 12.04.1 LTS 64-bit t1.micro instance


Uncheck delete on termination for the EBS root disk.

Create a Security Group called Node JS Production (or anything you want).

Add port 22, 80, 443, 3000 to the Security Group. (I am adding port 3000 because I run the app from port 3000)

Launch the instance.

In the AWS Management Console, Volumes -> Create Volume.

Make the volume with
  • type = Standard
  • Size = 20GB
  • Availability Zone must match the EC2's Availability Zone
  • set the device name to xvdf

Attach this EBS volume to the EC2 instance we just created.

ssh into your instance.
Ex. ssh -i {key} ubuntu@{ec2-address}.compute-1.amazonaws.com
sudo apt-get update

We are going to format /dev/xvdf with the XFS file system. Refer to Amazon EC2 - Mounting a EBS drive.


Install NodeJS and other dependencies

sudo apt-get install -y nodejs npm

If you run "node --version", you will find the node version is 0.6.12. We want to use 0.8.18, since it's a lot faster.

sudo npm install -g n
sudo n 0.8.18

Now "sudo node --version" will show version 0.8.18 while "node --version" will show 0.6.12


Install Git and fetch your code (Optional)

sudo apt-get install git -y
mkdir /vol/src
cd /vol/src

git config --global user.name "your_name"
git config --global user.email "your_email"
git config --global github.user "your_github_login"
git clone ssh://git@github.com/username/repo.git

You will want to establish the connection with GitHub over SSH rather than HTTPS: if you are building an image that can be used for auto-scaling, you don't want to enter a username and password every time. See Generating SSH Keys for more details.

Test your application by running

sudo node {your_app}


Making the NodeJS start on boot

To make a robust image, we want the NodeJS app to start on boot and respawn when crashed. We will write a simple service. All service scripts are located in /etc/init.

Let's create the file /etc/init/{your_app_name}_service.conf.

sudo vi /etc/init/{your_app_name}_service.conf

Put the following into the file:

#######################

#!upstart

description "my NodeJS server"
author      "Some Dude"

# start on startup
start on started networking
stop on shutdown

# Automatically Respawn:
respawn
respawn limit 5 60

script
    cd /vol/src/{your_app}
    exec sudo node /vol/src/{your_app}/app.js >> /vol/log/app_`date +"%Y%m%d_%H%M%S"`.log 2>&1
end script

post-start script
   # Optionally put a script here that will notify you node has (re)started
   # /root/bin/hoptoad.sh "node.js has started!"
end script
#######################


Refer to upstart stanzas for more details about what each field means.

Create the directory to store NodeJS outputs:

sudo mkdir /vol/log

I have marked each log file with the start time of the app. You will probably want to change this to create logs daily.

To check if the services are running:

initctl list | grep {your_app_name}_service

To start a service:

sudo service {your_app_name}_service start

To stop a service:

sudo service {your_app_name}_service stop


Now reboot your EC2 instance in the AWS console.

Test if your site is started.


Create a NodeJS AMI


In the AWS Management Console, click instances at the left sidebar.

Right click on the NodeJS instance created above and click on Create Image.

Fill in the image name. I like to name things in a convention that is systematic. If you are planning to write deploy scripts and do auto-scaling, it is easier to identify what an image is. I use the following convention:

{namespace}_{what_is_it}_{date}

Ex. mycompany_blog_20130118

You will want the date because you may create an image every time you deploy new code.

Leave the other options as default, and click on Create Image.

On the left sidebar, click on AMIs under Images.

You can see the status of the AMI we just created.

You should launch an instance from this AMI and test that all the data is there.

Monday, January 21, 2013

Switching NodeJS versions on Ubuntu

This tutorial will demo how you can have multiple versions of NodeJS and switch between them at any time.

Specs:
  • Ubuntu Server 12.04.1 LTS

Steps:

> sudo apt-get install nodejs npm

> sudo npm install -g n

> sudo n {version}
Ex. sudo n 0.8.18
If the version is not installed, the n command will automatically install it for you. If the version is installed, then it will simply switch to that version.

NodeJS default install location on Mac OSX

Both the node and node_modules are located in:
/usr/local/lib/

Sunday, January 20, 2013

Using Spring with Velocity and SiteMesh on Google App Engine

When working with web applications, it's often important to pick a view technology that you like but also has long term support. The following will demonstrate how to integrate Velocity and SiteMesh into Spring and Google App Engine.

Tools:

  • Spring MVC 3.2.0
  • Google App Engine SDK 1.7.4
  • velocity-1.7
  • SiteMesh 2.4.2


We will be using Apache Velocity since Spring recommends it (alternatively, you can use FreeMarker). We will also use SiteMesh because its decorator pattern for plugging in headers, footers, and other components is very convenient.

This tutorial will assume you have set up Spring on Google App Engine already. Read Running Spring 3.2 on Google App Engine for more details.

First download the following JARs and add them to /war/WEB-INF/lib/:

  • velocity-1.7.jar
  • velocity-1.7-dep.jar
  • sitemesh-2.4.2.jar
  • velocity-tools-view-2.0.jar
  • commons-collections-3.2.jar
  • commons-digester-2.1.jar
You may need to find some of the above elsewhere.

Add all these JARs to your build path. (Right click on the root project folder. Click Properties. Click Java Build Path. Click Libraries. Click Add JARs.)

In /war/WEB-INF/web.xml, add the following:

In your servlet.xml, replace your viewResolver with:

Create the file /war/WEB-INF/sitemesh.xml and add

Create the file /war/WEB-INF/decorators.xml and add

Create the layout file /war/decorators/layout.vm and add

Create a velocity file /war/velocity/hello.vm and put the text "Hello World" into it.

Create an empty file /war/WEB-INF/velocity.properties. If you don't do this, you may get an exception saying access denied for reading velocity.properties.

Create a controller called HelloController:

@Controller
public class HelloController {
    @RequestMapping(value = "/hello", method = RequestMethod.GET)
    public ModelAndView sayHello() {
        return new ModelAndView("hello");
    }
}

Start the server, browse to http://localhost:8888/hello, and check that you get the decorator content wrapping the Hello World message.


Friday, January 18, 2013

Running Spring 3.2 on Google App Engine

We will walk through the minimum to set up Spring MVC on Google App Engine.

Below are the specs that we are using:
  • Java 1.6
  • Spring MVC 3.2
  • Google App Engine SDK 1.7
  • Spring Tool Suite 3.1

Launch Spring Tool Suite.

In the Dashboard (one of the tabs), click Extensions. Find and install Google Plugin.

Create a Google App Engine Project. Do not check GWT.

First put the following Spring dependencies to /war/WEB-INF/lib
  • spring-aop-3.2.0.RELEASE.jar
  • spring-beans-3.2.0.RELEASE.jar
  • spring-context-3.2.0.RELEASE.jar
  • spring-context-support-3.2.0.RELEASE.jar
  • spring-core-3.2.0.RELEASE.jar
  • spring-expression-3.2.0.RELEASE.jar
  • spring-web-3.2.0.RELEASE.jar
  • spring-webmvc-3.2.0.RELEASE.jar

You will also need:
Right click on the root folder of the project. Click Properties. Click Java Build Path. Add all the jars above.

In /war/WEB-INF/web.xml, add the following:



Create /war/WEB-INF/spring-mvc-servlet.xml.

Replace with:


Create the folder /war/views.

Create the file /war/views/hello_world.jsp.

Create the file /src/com.yournamespace.controller.HelloWorldController.java

Put the following:


The above returns the hello_world.jsp.

Right click on the root project folder. Click Run as -> Web Application.

Using Amazon Route 53 to map a subdomain to an instance

In the AWS Management Console, go to the Route 53 page.

Click Create Hosted Zone. Fill in the domain name (subdomain in this case) and the comment.

Click Go To Record Sets on the top left navbar.
Do not create additional name server (NS) or start of authority (SOA) records in the Route 53 hosted zone, and do not delete the existing records.
Click on Create Record Set.

For the purpose of this demo, we will map the subdomain to an EC2 instance.

Select A record for Type.

Put the IP of the EC2 instance as value. If you don't have an Elastic IP, allocate one for your instance.

Click Create Record Set.

Take note of the four NS records.

In your DNS registrar, add the four NS records above to the subdomain.
Do not add a start of authority (SOA) record to the zone file for the parent domain. Because the subdomain will use Route 53, the DNS service for the parent domain is not the authority for the subdomain. 
If your DNS service automatically added an SOA record for the subdomain, delete the record for the subdomain. However, do not delete the SOA record for the parent domain.
Wait until the DNS change has propagated.

Use ping to check the IP the subdomain is mapping to.
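
For example ({your_subdomain} is a placeholder; dig is in the dnsutils package if it is not already installed):

ping {your_subdomain}
dig +short {your_subdomain}      # should print the Elastic IP you entered above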

Thursday, January 17, 2013

Running Wordpress on Amazon EC2


This article is about how to install Wordpress on Amazon EC2 with MySQL running on Amazon RDS.


Launch an EBS-backed AMI

In the EC2 console, click Launch and select Ubuntu Server 12.04.1 LTS 64-bit (AMI ID = ami-3d4ff254).
Use t1.micro.
Set delete on termination to false for the root device.
Set termination behaviour to Stop.
Add ports 22, 80, and 443 to the security group.


Install Software

sudo apt-get update
sudo apt-get install apache2 libapache2-mod-auth-mysql php5-mysql mysql-client libapache2-mod-php5

We are not going to install mysql-server, since we will be using RDS.


Use Amazon RDS as the database

If you want to set up your own MySQL database, you can do so. For the purpose of this tutorial, we will use Amazon RDS because it takes care of scaling, replication, and backups (to S3) with minimal effort.

Read Using MySQL on Amazon RDS to create a MySQL database.

After you created a database, note the database name, username, password, and endpoint address of the DB instance. The endpoint address will be like wordpress.a2ks0zoxdxq.us-east-1.rds.amazonaws.com.

You can ssh into your ec2 instance and run the mysql command to access the database.

mysql -h {endpoint_address} -P 3306 -u{username} -p{password}

Note that when I used a different syntax for the above mysql command, I kept getting an access denied error. Please use the syntax I specified above.


Download Wordpress

sudo mkdir /var/www
cd /var/www
wget http://wordpress.org/latest.tar.gz
tar -xzvf latest.tar.gz
rm latest.tar.gz
mv wordpress {name_of_your_blog}


Configure Wordpress

cd /var/www/{name_of_your_blog}
mv wp-config-sample.php wp-config.php
vi wp-config.php

Change these:

define('DB_NAME', 'database_name_here');
/** MySQL database username */
define('DB_USER', 'username_here');
/** MySQL database password */
define('DB_PASSWORD', 'password_here');
/** MySQL hostname */
define('DB_HOST', 'localhost');
/** Database Charset to use in creating database tables. */
define('DB_CHARSET', 'utf8');
/** The Database Collate type. Don't change this if in doubt. */
define('DB_COLLATE', '');

The DB_HOST will be the endpoint we specified above. Include the port :3306 as well.
Ex. wordpress.a2ks0zoxdxq.us-east-1.rds.amazonaws.com:3306

Generate keys for the following:
https://api.wordpress.org/secret-key/1.1/salt/

define('AUTH_KEY',         'put your unique phrase here');
define('SECURE_AUTH_KEY',  'put your unique phrase here');
define('LOGGED_IN_KEY',    'put your unique phrase here');
define('NONCE_KEY',        'put your unique phrase here');
define('AUTH_SALT',        'put your unique phrase here');
define('SECURE_AUTH_SALT', 'put your unique phrase here');
define('LOGGED_IN_SALT',   'put your unique phrase here');
define('NONCE_SALT',       'put your unique phrase here');

Configure Apache

cd /etc/apache2/sites-available
cp default wordpress
vi wordpress

Change DocumentRoot and Directory from /var/www/ to your blog's directory.

Change AllowOverride from none to all. (If you don't do this, you can't do pretty links in Wordpress.)


<Directory /var/www/{name_of_your_blog}/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride all
        Order allow,deny
        allow from all
</Directory>

Save the File.

a2dissite default
a2ensite wordpress

a2enmod rewrite

service apache2 reload

Launch the site

If you are starting a new Wordpress site, access the site from the browser, follow the on-screen instructions, and you are done.


Porting data from local MySQL to RDS MySQL

To export data:

mysqldump -u{username} -p{password} -h {host} {database} > backup.sql

To import data:

mysql -u{username} -p{password} -h {host} {database} < backup.sql


Creating a Wordpress EBS-backed AMI


The goal is to create a customized Wordpress image that can be quickly launched for scaling purposes.
When an instance is launched with this image, all services (Apache, MySQL, etc) should start on boot and resume normal operations.

Steps for creating a Wordpress AMI:
  1. Launch an EBS-backed AMI instance (MUST be EBS-backed, not instance store-backed)
  2. When the instance is running, install required software and load applications (Apache, Wordpress).
  3. Create an image from the instance

Note that if you attach new volumes, the new AMI will contain block device mapping information for those volumes. When you launch an instance with the new AMI, the instance will create those additional volumes.

1 & 2) Launch a running Wordpress Instance

Follow Running Wordpress on Amazon EC2 to launch a running instance with all Wordpress software installed.

3) Create an image from the instance

In the AWS Management Console, click instances at the left sidebar.

Right click on the Wordpress instance created above and click on Create Image.

Fill in the image name. I like to name things in a convention that is systematic. If you are planning to write deploy scripts and do auto-scaling, it is easier to identify what an image is. I use the following convention:

{namespace}_{what_is_it}_{date}

Ex. mycompany_blog_20130118

You will want the date because you may create an image every time you deploy new code.

Leave the other options as default, and click on Create Image.

On the left sidebar, click on AMIs under Images.

You can see the status of the AMI we just created.

You should launch an instance from this AMI and test that all the data is there.

Using MySQL on Amazon RDS

For an introduction on what is Amazon RDS, please refer to What is Amazon Relational Database Service (Amazon RDS).

AWS Free Tier includes 750hrs of Micro DB Instance each month for one year, 20GB of Storage, and 20GB for Backups with Amazon Relational Database Service (RDS).


Create a DB Security Group

In the RDS UI console, click on DB Security Groups on the left sidebar.

Click Create DB Security Group

Fill in the name and description. (Ex. Name=blog, Description=wordpress blog)

Select the newly created Security Group.

In the bottom half of the screen (the description tab), choose EC2 Security Group for the drop down box and select the desired EC2 Security Group.


Launch a RDS instance

In the RDS UI console, click on DB Instances on the left sidebar.

Click Launch a DB Instance.

Select MySQL Community Edition.

Fill in the database specs.

If you want 24/7 availability, you would want Multi-AZ deployment. But it would cost more.

For this parameter... do this:

License Model: Select the default, General-Public-License, to use the general license agreement for MySQL.
DB Engine Version: Select 5.5.20 to use the default version of MySQL. Note that RDS supports additional versions of MySQL.
DB Instance Class: Select db.m1.small to select a configuration that equates to 1.7 GB memory, 1 ECU (1 virtual core with 1 ECU), 64-bit platform, and moderate I/O capacity. For more information about the capacity of all the DB Instance class options, see Amazon Relational Database Service Features.
Multi-AZ Deployment: Select No to not request that your database be made available in multiple availability zones. For more information about multiple availability zones, see the RDS documentation.
Auto Minor Version Upgrade: Select Yes to enable your DB Instance to receive minor DB Engine version upgrades automatically when they become available.
Allocated Storage: Type 5 to allocate 5 GB of storage for your database. In some cases, allocating a higher amount of storage for your DB Instance than the size of your database can improve I/O performance. For more information about storage allocation, see Amazon Relational Database Service Features.
Use Provisioned IOPS: Leave the check box unselected. This option turns on Provisioned IOPS (I/O operations per second), a high-performance storage option in RDS that is optimized for I/O-intensive, transactional (OLTP) database workloads. For more information about high performance storage, see Provisioned IOPS.
DB Instance Identifier: Type a name for the DB Instance that is unique for your account in the region you selected. You may choose to add some intelligence to the name, such as including the region and DB Engine you selected, for example west2-mysql-instance1.
Master User Name: Type a name using alphanumeric characters that you will use as the master user name to log on to your DB Instance with all database privileges.
Master User Password: Type a password that contains 8 to 16 printable ASCII characters (excluding /, ", and @) for your master user password.

Click continue to Additional Configuration.

Fill in the database name. For Availability Zone, choose the same region that your EC2 instance is at. This will speed up the connection and also not induce any cross-region fees.

Select the DB Security Group that we created in the previous section.

Click continue to Management Options.

Enable Automatic Backups, and set the Backup Retention Period to your desired number of days. A retention period of 7 days means that each daily backup will be deleted after 7 days. That also means there will be 7 backups at any time.

Review the information in the next screen and launch the DB instance.

Wednesday, January 16, 2013

How to identify EBS backed VS instance backed root device

The EC2 console makes it very painful to know if an instance is EBS-backed or instance-backed.

We will use the command ec2-describe-images to check if the root device is EBS-backed after we launch an instance.


Steps:

Log in to your AWS console.

Launch an instance with your desired image (I like the Ubuntu Server 12.04.1 LTS 64-bit; its AMI ID is ami-3d4ff254, and it is EBS-backed).

At the AWS EC2 console, find the AMI ID of the instance you just launched (should be the third column in the table).

ssh into your instance.

Run ec2-describe-images (See Installing the Amazon EC2 API Tools).

> ec2-describe-images ami-3d4ff254 -H
Type ImageID Name Owner State Accessibility ProductCodes Architecture ImageType KernelId RamdiskId Platform RootDeviceType VirtualizationType Hypervisor
IMAGE ami-a29943cb 099720109477/ubuntu/images/ebs/ubuntu-precise-12.04-amd64-server-20120424 099720109477 available public x86_64 machine aki-825ea7eb ebs paravirtual xen
The "ebs" field near the end of the output shows the root device type.

Note that an EBS-backed instance does NOT mean an EBS-optimized instance. (An EBS-optimized instance is provisioned with dedicated throughput to EBS. The m1.large, m1.xlarge, and m2.4xlarge instance types are currently available as EBS-optimized instances.)

If there is an easier way that you can identify EBS-backed images in the UI console, please post below.

Amazon EC2 Root device - EBS-backed VS Instance store-backed

When launching an EC2 instance, it is very important to know if the root device is EBS-backed or instance store-backed. The major difference lies in what happens to the data when the instance is terminated.

In an EBS-backed root device, when the instance terminates, the data in the root volume would persist.

In an instance store-backed root device, when the instance terminates, the data would be gone.


Detailed Comparison:
Boot Time
  • EBS-backed: usually less than 1 minute
  • Instance store-backed: usually less than 5 minutes

Size Limit
  • EBS-backed: 1 TiB
  • Instance store-backed: 10 GiB

Root Device Volume
  • EBS-backed: Amazon EBS volume
  • Instance store-backed: instance store volume

Data Persistence
  • EBS-backed: data on Amazon EBS volumes persists after instance termination; you can also attach instance store volumes that don't persist after instance termination
  • Instance store-backed: data on instance store volumes persists only during the life of the instance; you can also attach Amazon EBS volumes that persist after instance termination

Upgrading
  • EBS-backed: the instance type, kernel, RAM disk, and user data can be changed while the instance is stopped
  • Instance store-backed: instance attributes are fixed for the life of an instance

Charges
  • EBS-backed: instance usage, Amazon EBS volume usage, and Amazon EBS snapshot charges for AMI storage
  • Instance store-backed: instance usage and Amazon S3 charges for AMI storage

AMI Creation/Bundling
  • EBS-backed: uses a single command/call
  • Instance store-backed: requires installation and use of AMI tools

Stopped State
  • EBS-backed: can be placed in a stopped state where the instance is not running, but the instance is persisted in Amazon EBS
  • Instance store-backed: cannot be in a stopped state; instances are running or terminated

Check the AMI basics guide for more details.

Friday, January 11, 2013

Backing up Database from EBS to S3


General Steps

  1. freeze the database (flush all tables with read lock)
  2. freeze the file system (xfs_freeze)
  3. take the snapshot
  4. unfreeze file system
  5. unfreeze database
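
A sketch of the sequence for MySQL on an XFS volume mounted at /vol (vol-xxxxxxxx is a placeholder volume ID; the read lock only holds while the mysql session stays open, so run the shell steps from inside it):

mysql> FLUSH TABLES WITH READ LOCK;
mysql> system sudo xfs_freeze -f /vol
mysql> system ec2-create-snapshot vol-xxxxxxxx
mysql> system sudo xfs_freeze -u /vol
mysql> UNLOCK TABLES;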


Use XFS on EBS and ephemeral volumes (fast to format and supports freezing)

~20MB/sec for a single stream transfer to S3

Snapshots are incremental, compressed and performed in the background by EBS

Snapshots can be corrupt; take them often

Thursday, January 10, 2013

What is Amazon Relational Database Service (Amazon RDS)?

Amazon Relational Database Service (Amazon RDS)
  • MySQL database on the cloud
  • automatically backs up the database

What you need to know:
  • operates from a small instance (64-bit platform with 1.7 GB of RAM and 1 Elastic Computing Unit (ECU)) up to a quadruple extra-large instance (64-bit platform with 68 GB of RAM and 26 ECUs).
  • need to tune RAM and storage
  • use Amazon CloudWatch to help tuning

Scaling
  • can be scaled in several dimensions: database storage size, database instance compute capacity, and the number of Read Replicas
  • can partition/shard to spread workload over multiple RDS instances
  • use MySQL’s built-in asynchronous replication to scale heavy workloads

Backups
  • 2 types: automated and user-initiated
  • automated backups are retained for up to 8 days; you can restore to any point from the start of the retention period to about 5 minutes before the current time
  • user-initiated snapshots are kept until explicitly deleted
  • backups are replicated across multiple availability zones within the same region

Multi-AZ deployment option

  • Synchronously replicate data across availability zones
  • When the DB fails, RDS will automatically fail over to the standby instance (takes ~3 minutes)

Database tuning factors

Database performance - scale up memory and cpu resources by choosing a larger instance

I/O performance and bandwidth - increase EBS volumes, use RAID 0 across multiple EBS

Use clustering, replication, multiple read slaves


Pricing - http://aws.amazon.com/rds/pricing/

Based on the DB instance hours (per hour), the amount of provisioned database storage (per GB-month and per million IO requests), additional backup storage (per GB-month), and Data Transfer in / out (per GB per month)


Anti-Patterns

index-and-query data - if you don't need joins or complex transactions, use SimpleDB

BLObs - use S3

Automatic elasticity - System admins need to configure RDS to achieve elasticity

Other databases - use EBS


Resources


Amazon Elastic Block Store (EBS) – http://aws.amazon.com/ebs
Amazon EC2 Instance Store Volumes – http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/
(see sections on Instance Types, Instance Storage, and Block Device Mapping)
Amazon Simple Storage Service (Amazon S3) – http://aws.amazon.com/s3
Amazon Simple Queue Service (Amazon SQS) – http://aws.amazon.com/sqs
Amazon SimpleDB – http://aws.amazon.com/simpledb
Running Databases on AWS – http://aws.amazon.com/running_databases
Amazon Relational Database Service (Amazon RDS) – http://aws.amazon.com/rds

What is Amazon SimpleDB?

Amazon SimpleDB

  • structured data (key/value pairs) in the cloud
  • highly available, scalable non-relational data store
  • eventually consistent reads (consistent read option available)
  • automatic, geo-redundant replication (within same region)
  • maximum of 10GB of storage per domain (for >10GB, you can partition your data across multiple SimpleDB domains)


Example usage:

Consider the situation when you need to store millions of images in the cloud. You can store these images in EBS or S3 and store the image locations in SimpleDB.


Pricing - http://aws.amazon.com/simpledb/pricing/


Based on data storage (per GB-month), data transfer (per GB-month), and machine hours (per month) associated with PUT and GET operations


Anti-Patterns

Prewritten application tied to a relational database - use RDS or EBS

Joins, complex transactions - use RDS or EBS

BLOb - use EBS or S3 and use SimpleDB to keep track of the locations

numeric data - use EBS or RDS

large dataset - use EBS or RDS

What is Amazon Simple Queue Service (Amazon SQS)?

Amazon Simple Queue Service (Amazon SQS)

  • reliable, highly scalable, hosted message queue for temporary storage and delivery of short (<= 64kB) text-based data
  • temporary data repository for messages waiting for processing
  • supports unlimited number of queues
  • supports unordered, at-least-once delivery of messages
  • message retention time from 1 hour to 14 days
  • can minimize use of temporary disk files


What is this used for?

  • glue between components that may work faster or slower than others
  • multi-step processing pipeline
  • for optimization



A single client can send or receive Amazon SQS messages at a rate of about 5 to 50 messages per second.


SQS Pricing - http://aws.amazon.com/sqs/pricing/

Based on number of requests (priced per 10,000 requests) and the amount of data transferred in and out (priced per GB per month)


Anti-Patterns

Binary or large data - use S3 or RDS to store the binary data and keep a pointer to it in SQS

Long term usage - use EBS, RDS

What is Amazon Simple Storage Service (Amazon S3)?

Amazon Simple Storage Service (Amazon S3)
  • highly scalable, durable, distributed object store
  • for mission-critical and primary data storage
  • geo redundant
  • stored objects can be 1 byte to 5TB
  • can store unlimited Objects
  • can sustain the concurrent loss of data in two facilities
  • designed for 99.999999999% (“11 nines”) durability per object and 99.99% availability over a one-year period

What can you store on S3?
  • static web content
  • video, photo
  • clickstream data
  • media transcoding
  • snapshot for EBS volumes
  • use Reduced Redundancy Storage (RRS) option for noncritical data (99.99% durability per object over a given year)

Pricing

Based on storage (per GB per month), data transfer in or out (per GB per month), and requests (per n thousand requests per month).

To upload or download large volume of data, you can use AWS Import/Export service.


Anti-Patterns

File system - S3 is a flat namespace, not a POSIX-compliant filesystem

Structured data with query - use SimpleDB to hold location (bucket name and key) of the object

Rapidly-changing data - use EBS or database

What is Amazon Local Instance Store volumes (ephemeral drives)?

Local Instance Store volumes (ephemeral drives)

  • temporary block-level storage (persists only during the lifetime of the associated EC2 instance)
  • 160GB to 1.7TB
  • used for scratch volume, RAM disk, continually changing data, buffers, caches, data replicated across a fleet of instances (load balanced pool of web servers)
  • very fast for sequential access
  • data persists across instance reboots, but NOT when the EC2 instance terminates or goes through a failure/restart cycle
  • no charge
  • fixed in size


Anti-Patterns

Persistent storage - use EBS or S3

Database storage - use EBS

Shared Storage - use EBS or S3

Snapshots - EBS

What is Amazon EC2 Elastic Block Storage (EBS)?


Amazon Elastic Block Storage (EBS)
  • off-instance NAS (Network Attached Storage)
  • size between 1GB and 1TB, allocated in 1GB increments
  • 1 EBS can only be attached to 1 EC2 instance
  • 1 EC2 instance can attach multiple EBS
  • for data that changes quickly and requires long-term persistence (e.g., file system, database)
  • EBS volumes with less than 20GB of modified data since the last snapshot are designed for between 99.5% - 99.9% annual durability
  • snapshots are available across availability zones
  • if an availability zone is unavailable, the EBS volume will not be available
  • snapshot EBS frequently
  • EBS is priced per GB-month of provisioned storage and per million I/O requests
  • EBS Snapshots are priced per GB-month of data stored, as well as per 1,000 PUT requests and per 10,000 GET requests when saving and loading snapshots

How to increase/decrease the size of an EBS volume
  • Quiesce the application or file system.
  • Snapshot your EBS volume’s data into Amazon S3 (using “Create Snapshot from Volume”).
  • Create a new EBS volume from the snapshot, but specify a larger size than the original volume.
  • Attach the new, larger volume to your Amazon EC2 instance.
  • Detach and delete the original EBS volume.
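
Roughly, with the EC2 API tools (the IDs, size, and zone are placeholders):

ec2-create-snapshot vol-aaaaaaaa
ec2-create-volume --snapshot snap-bbbbbbbb --size 40 -z us-east-1a
ec2-attach-volume vol-cccccccc -i i-dddddddd -d /dev/sdf
# for XFS, grow the file system to fill the new volume after mounting:
sudo xfs_growfs /vol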

Anti-Patterns

Temporary storage - use local (Ephemeral) volume

Highly-durable storage - S3 standard storage is designed for 99.999999999% annual durability per object

Static data or web content - use S3

Key-value pair, schema-less data - use SimpleDB

Tuesday, January 8, 2013

Spring MVC - remember me security.xml

Below is a reference for (Spring 3.0) remember-me settings in WEB-INF/security.xml.

There are 4 pieces of information you need to define:

1) Enable remember-me functionality. Make sure you supply a key. If you don't, it may not work.

<http auto-config="false" entry-point-ref="authenticationEntryPoint">
    <remember-me services-ref="myRememberMeServices" key="your_key"/>
</http>


2)  Define the remember-me service

<beans:bean id="myRememberMeServices"
class="org.springframework.security.web.authentication.rememberme.TokenBasedRememberMeServices" >
                    <beans:constructor-arg name="key" value="your_key"/>
                    <beans:constructor-arg name="userDetailsService" ref="userManager"/>
</beans:bean>


3) Define the remember-me filter

<beans:bean id="rememberMeFilter" class=
 "org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter">
  <beans:property name="rememberMeServices" ref="myRememberMeServices"/>
  <beans:property name="authenticationManager" ref="authenticationManager" />
</beans:bean>


4) Define the authentication provider

<beans:bean id="rememberMeAuthenticationProvider" class=
 "org.springframework.security.authentication.RememberMeAuthenticationProvider">
  <beans:property name="key" value="your_key"/>
</beans:bean>

Thursday, January 3, 2013

Authentication Types


http basic - username and password are sent in plain text along with the request; subject to every kind of attack

http digest - password is hashed with MD5; subject to man-in-the-middle attacks

wsse - username and password encryption; prevents man-in-the-middle attacks; no need for web server cookies; but it requires the server and client to know some form of the password (either the server holds the clear-text password, or the client has the hashed version of the password)

x.509 - public and private key

oauth2 - requires client id and client secret key

http basic via https - everything is encrypted