
Friday, July 24, 2015

Migrating Splunk indexed data

First, stop Splunk.

cd into your splunk/bin directory
./splunk stop

Create a new folder (ex. /mnt/splunk_data).

cp -rp splunk/var/lib/splunk/* /mnt/splunk_data/

Change SPLUNK_DB to point to /mnt/splunk_data.

vi splunk/etc/splunk-launch.conf

Find SPLUNK_DB in the file and change the path.

SPLUNK_DB=/mnt/splunk_data

You may also want to change the retention policy and the maximum index size (these settings live in indexes.conf).

# 30 days
frozenTimePeriodInSecs = 2592000

# 90 GB
maxTotalDataSizeMB = 90000

It's recommended to size the index using the following rule of thumb:

Total storage = daily average indexing rate x retention period (days) x 1/2

For example, indexing 1 GB/day with a 30-day retention needs about 1 x 30 x 1/2 = 15 GB.
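As a concrete reading of that rule of thumb, here is a tiny helper (the 1 GB/day figure is an assumed example, not a value from the original post):

```java
// Rule of thumb: total storage = daily indexing rate x retention period x 1/2,
// since indexed data ends up at roughly half the raw size on disk.
public class SizingExample {
    public static double totalStorageGb(double dailyRateGb, int retentionDays) {
        return dailyRateGb * retentionDays * 0.5;
    }

    public static void main(String[] args) {
        // e.g. 1 GB/day kept for 30 days -> about 15 GB of index storage
        System.out.println(totalStorageGb(1.0, 30));
    }
}
```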

Start Splunk.

./splunk start

To tune Splunk settings, check:
http://docs.splunk.com/Documentation/Splunk/4.3.1/Installation/CapacityplanningforalargerSplunkdeployment

Thursday, July 23, 2015

Install Splunk Forwarding and Receiving

We will be using Splunk Light.

Click on the menu icon at the upper right corner. Choose Data -> Receiving.

In Configure receiving, choose 9997 as the receiving port.

On your application instance, install the Splunk universal forwarder:

http://www.splunk.com/en_us/download/universal-forwarder.html

Extract it into the /opt/splunk_forwarder directory.

sudo ./splunk start
sudo ./splunk enable boot-start -user ec2-user

List all the forward servers:
./splunk list forward-server

Active forwards:
None
Configured but inactive forwards:
None

If it prompts you for username and password, use
username: admin
password: changeme

Add the receiving server to the forwarder:

./splunk add forward-server <receiving_server>:9997

Test the connection:
./splunk list forward-server

Active forwards:
None
Configured but inactive forwards:
<receiving_server>:9997

If it's not active, remember to add port 9997 to your security group.

Add data to monitor

./splunk add monitor <path_to_logs> -index main -sourcetype <sourcetype>

To list what's being monitored:

./splunk list monitor

Installing splunk on AWS

Begin by downloading Splunk Light here: http://www.splunk.com/en_us/download.html. You will probably need to register a Splunk account before it lets you download.

Upload Splunk to your ec2 instance using SCP. For example

scp -i <your_key.pem> splunklight-6.2.4-271043-Linux-i686.tgz ec2-user@<ec2_public_dns>:tmp

Above, I uploaded the Splunk tgz file to the tmp folder on my EC2 instance.

You will need to install glibc.i686 first.

yum -y install glibc.i686

Create a folder called /opt if it doesn't exist

Extract your tgz file inside /opt:

tar xvzf splunklight-6.2.4-271043-Linux-i686.tgz

The splunk executable is located in /opt/splunk/bin. cd into it.

Start splunk:

sudo ./splunk start --accept-license

Start splunk on boot:

sudo ./splunk enable boot-start -user ec2-user

You should be able to view Splunk's web interface on port 8000 of your EC2 public address.

Other useful commands:

./splunk stop
./splunk restart

Thursday, June 5, 2014

Setting up Wordpress on Elastic Beanstalk

Elastic Beanstalk is a service that automates scaling, load balancing, and deployment so you can concentrate on software development. In a way, it is very similar to Google App Engine.

In this article, we will walk through setting up WordPress on Elastic Beanstalk.


Configuring the Elastic Beanstalk Environment

First, log in to your Elastic Beanstalk console and click on Create a New Application.

Enter the Application Name and Description and click Next.

Click on "Create one now" to create a new environment.

Environment tier: Web Server 1.0
Predefined configuration: PHP
Environment type: Load balancing, autoscaling

Note for Environment tier, Web Server handles web requests (HTTP/S) while workers handle background processes.

Choose sample application for now

You will then be prompted for an Environment Name. Name it anything you like, as we will use a CNAME later.

For Additional Resources, check create an RDS DB instance with this environment.

In Configuration Details, select your EC2 key pair and leave the other details as they are. You can always change these later.

For RDS configuration, put 5GB for allocated storage. Input the username and password. Select Create snapshot and single availability zone.

Click Launch.

Once it's launched, click Configuration on the left sidebar, then Software Configuration. Set PARAM1 to production, staging, or something else.


Installing Wordpress

We need different wp-config settings for local development and for Elastic Beanstalk. Let's define the local config in local-config.php.

Set up WordPress on your local computer.

In wp-config.php, replace the database configs with the following:

if ( file_exists( dirname( __FILE__ ) . '/local-config.php' ) ) {
    define( 'WP_LOCAL_DEV', true );
    include( dirname( __FILE__ ) . '/local-config.php' );
} else {
    define( 'WP_LOCAL_DEV', false );
    define('WP_HOME','');
    define('WP_SITEURL','');
    define('DB_NAME', 'database');
    define('DB_USER', 'username');
    define('DB_PASSWORD', 'password');
    define('DB_HOST', 'localhost');
}

Fill in DB_NAME, DB_USER, DB_PASSWORD, and DB_HOST above with the appropriate settings.

Create a file called local-config.php in the same directory as wp-config.php.

Put in the following with your local database information.

<?php
    define('WP_HOME','');
    define('WP_SITEURL','');
    define('DB_NAME', '');
    define('DB_USER', '');
    define('DB_PASSWORD', 'root');
    define('DB_HOST', '');


Install AWS ElasticBeanstalk Tools and AWSDevTools

Download the tools from http://aws.amazon.com/code/6752709412171743

Read this: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/GettingStarted.GetSetup-devtools.html

Check to make sure eb is running properly:

eb --version

Configure your ElasticBeanstalk Git settings:

git aws.config

Reading Credentials from C:\Users\Kenneth\.elasticbeanstalk\aws_credential_file.

The file does not exist.  You can supply credentials by creating the file or editing .elasticbeanstalk/config to reference a different file.
The credential file should have the following format:

AWSAccessKeyId=your key
AWSSecretKey=your secret

AWS Access Key:
AWS Secret Key:
AWS Region [default to us-east-1]: (Check this in your Elastic Beanstalk console)
AWS Elastic Beanstalk Application: <put in the application name you created above>
AWS Elastic Beanstalk Environment: <put in the environment name you created above>

Check if Elastic Beanstalk can detect your app:

eb status --verbose

Now deploy your application:

git aws.push

Remember to use Route 53 to map your domain to Elastic Beanstalk.

http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customdomains.html

After you point the domain to the Elastic Load Balancer, the site may not load in the browser right away. Do not panic; give it 15 minutes and it will come up.

Wednesday, May 14, 2014

AWS s3 - The specified bucket is not valid.

If you receive the message "The specified bucket is not valid." while trying to enable website hosting, make sure your bucket name adheres to the following:

- Should not contain uppercase characters
- Should not contain underscores (_)
- Should be between 3 and 63 characters long
- Should not end with a dash
- Cannot contain two adjacent periods
- Cannot contain dashes next to periods (e.g., "my-.bucket.com" and "my.-bucket" are invalid)
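As a quick illustration (not Amazon's official validation logic), those rules can be sketched as a small check:

```java
// Rough sketch of the S3 bucket naming rules listed above.
public class BucketNameCheck {
    public static boolean isValid(String name) {
        if (name.length() < 3 || name.length() > 63) return false;
        if (!name.equals(name.toLowerCase())) return false;           // no uppercase
        if (name.contains("_")) return false;                         // no underscores
        if (name.endsWith("-")) return false;                         // must not end with a dash
        if (name.contains("..")) return false;                        // no adjacent periods
        if (name.contains("-.") || name.contains(".-")) return false; // no dash next to a period
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("my.bucket.com"));  // a valid name
        System.out.println(isValid("my-.bucket.com")); // dash next to period -> invalid
    }
}
```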

Wednesday, July 17, 2013

Ansible EC2 - setting up Nginx, MySQL, PHP, Git

In this post, we will write a playbook that sets up an EC2 machine with a fully working PHP environment.

Starting from a fresh machine with an attached EBS volume, we will do the following:

  1. Format the new ebs volume with XFS and mount it as /vol
  2. Install php, mysql and nginx
  3. Create a mysql user and create a database
  4. Copy the public and private keys into the targeted machine
  5. Checkout a project from github

Begin by spinning up a fresh EC2 instance and attaching an EBS volume to it. Read Ansible - how to launch EC2 instances and setup the php environment.


Format the new ebs volume with XFS and mount it as /vol

We will mount the new EBS volume /dev/xvdf as /vol and format it with XFS.

    - name: update machine with latest packages
      action: command yum -y update
    - name: install xfsprogs
      action: yum pkg=xfsprogs state=latest
    - name: format new volume
      filesystem: fstype=xfs dev=/dev/xvdf
    - name: edit fstab and mount the vol
      action: mount name={{mount_dir}} src=/dev/xvdf opts=noatime fstype=xfs state=mounted


Install php, mysql and nginx

    - name: install php
      action: yum pkg=php state=latest
    - name: install php-mysql
      action: yum pkg=php-mysql state=latest
    - name: install nginx
      action: yum pkg=nginx state=latest
    - name: ensure nginx is running
      action: service name=nginx state=started
    - name: install mysql server
      action: yum pkg=mysql-server state=latest
    - name: make sure mysql is running
      action: service name=mysqld state=started


Create a mysql user and a database

    - name: install python mysql
      action: yum pkg=MySQL-python state=latest
    - name: create database user
      action: mysql_user user=admin password=1234qwer priv=*.*:ALL state=present
    - name: create db
      action: mysql_db db=ansible state=present


Copy the public and private keys into the targeted machine

We want the target machine to be able to do a git pull without username and password prompts.

mkdir ~/.ssh
ssh-keygen -t rsa -C "you@email.com"

You will see:
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Just press Enter on the above prompts.

Two files will be generated: id_rsa, id_rsa.pub

Log in to GitHub, then go to Account Settings -> SSH Keys.

Add new key by giving it a name and pasting the content of id_rsa.pub

Test it by:
ssh -T git@github.com
Here are the Ansible tasks:

    - name: install git
      action: yum pkg=git state=latest
    - name: copy public key
      action: copy src=~/.ssh/id_rsa.pub dest=~/.ssh/id_rsa.pub
    - name: copy private key
      action: copy src=~/.ssh/id_rsa dest=~/.ssh/id_rsa mode=0600


Checkout a project from github

    - name: git checkout source
      action: git repo=ssh://git@github.com:{your_git_repo}.git dest={{work_dir}} version=unstable


Full Ansible Playbook source:

Wednesday, July 10, 2013

AWS Elastic MapReduce - EMR MySQL DBInputFormat

In this post, we will build a MapReduce program as a JAR executable. To make this example more interesting than most of the other online posts out there, we will modify the common WordCount example to fetch from MySQL instead of a text file.

You will need to understand at least the basics of the mapper and the reducer to follow this post. You may want to read this from Apache.

We will use Maven to build the project. If you have no idea how to do this, read Building a JAR Executable with Maven and Spring. We will feed this JAR via the Amazon Elastic MapReduce (EMR) and save the output in Amazon S3.

Here are the EMR supported Hadoop Versions. We will be using 1.0.3.


What we will do:

Assume we have a database called Company and there is a table called Employee with two columns: id and title.

We will count the number of employees with the same titles.

This is the same as the WordCount examples you see in other tutorials, except we are fetching the input from a database.


Install Hadoop Library

First, in your Java project, include the following dependency in the pom.xml file.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.0.3</version>
</dependency>


The File Structure

The program will be very basic and contain the following files. The filenames should be self-explanatory.

Main.java
Map.java
Reduce.java


The mapred library VS the mapreduce library

When you are reading other hadoop examples online, you will see them using either the mapred or the mapreduce library. mapred is the older version, while mapreduce is the cleaner and newer version. To upgrade from mapred to mapreduce, read Hadoop - mapred VS mapreduce libraries.

This example will use the org.apache.hadoop.mapreduce library.


EmployeeRecord

We will need to serialize the object of our interest by implementing Writable and DBWritable, as shown below.
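The embedded snippet did not survive in this copy of the post, so here is a minimal sketch of such a class. The field and accessor names are assumptions based on the id/title columns described above, and it needs the hadoop-core 1.0.3 dependency from the pom on the classpath:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// Maps one row of the Employee table (id, title) to a Java object.
public class EmployeeRecord implements Writable, DBWritable {
    private long id;
    private String title;

    // Hadoop serialization (used when records move between tasks).
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(title);
    }

    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        title = in.readUTF();
    }

    // JDBC serialization (used by DBInputFormat when reading rows).
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setLong(1, id);
        stmt.setString(2, title);
    }

    public void readFields(ResultSet rs) throws SQLException {
        id = rs.getLong("id");
        title = rs.getString("title");
    }

    public String getTitle() { return title; }
}
```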




The Mapper
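The embedded mapper code is also missing from this copy; a sketch of what Map.java might contain, emitting (title, 1) for each EmployeeRecord just like WordCount emits (word, 1):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// DBInputFormat hands the mapper (LongWritable row id, EmployeeRecord row) pairs.
public class Map extends Mapper<LongWritable, EmployeeRecord, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text title = new Text();

    @Override
    protected void map(LongWritable key, EmployeeRecord value, Context context)
            throws IOException, InterruptedException {
        // Emit (title, 1) so the reducer can count employees per title.
        title.set(value.getTitle());
        context.write(title, ONE);
    }
}
```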




The Reducer
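The embedded reducer is missing too; a sketch of Reduce.java, summing the 1s emitted for each title:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives (title, [1, 1, ...]) and writes (title, count).
public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```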




Main.java

We will hook everything up. The steps are simple:

Create a Job.
Set output format.
Set input format.
Set Mapper class.
Set Reducer class.
Set input. (In our case, it will be from the database)
Set output.
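The steps above might be wired together roughly like this (the JDBC URL, credentials, and column names are placeholders, not values from the original post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Main {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder connection settings -- point these at your MySQL host.
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://your-db-host:3306/Company", "user", "password");

        Job job = new Job(conf, "employee title count");
        job.setJarByClass(Main.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input: rows of the Employee table, deserialized into EmployeeRecord.
        // setInput also registers DBInputFormat as the job's input format.
        DBInputFormat.setInput(job, EmployeeRecord.class,
                "Employee",      // table
                null,            // conditions
                "id",            // order by
                "id", "title");  // fields

        // Output: the first argument (the s3n:// path passed in the EMR console).
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```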


Run the Job via the AWS EMR console

Compile the project and generate a self-contained JAR file. If you are using maven, read Building a JAR Executable with Maven and Spring.

Upload your JAR file to your s3 bucket.

In the AWS EMR console, specify the location of the JAR file.

JAR location: {your_bucket_name}/{jar_name}

Arguments: s3n://{your_bucket_name}/output

The program above takes in the output location as an argument.

Read AWS - Elastic Map Reduce Tutorial for more details on how to create a job flow in EMR.

If you encounter the mysql driver missing error, read Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver.

Tuesday, July 9, 2013

Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver

If you get the "ClassNotFoundException: com.mysql.jdbc.Driver" error while running a JAR job on Elastic MapReduce, you will need to copy the MySQL connector library into Hadoop's lib directory on each node.

The error will look like:

Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:184)
... 20 more

We can copy the mysql connector library to each of the machines by "bootstrapping".


1.) Get the MySQL connector library.

You can download it from the Maven repository.

Create a bucket on S3 and upload the SQL connector to this bucket.


2.) Writing a bootstrap bash file

Name this file bootstrap.sh. We will use the "hadoop fs" command to copy the connector from S3 to each machine.

Script:

#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib

Upload this script to the same bucket you created in the previous step.


 3.) Create a Job Flow

Log in to the AWS EMR console.

Click on create a job flow.

Fill in all the details including your JAR file.

At the last "bootstrap" step, select custom bootstrap action and put in the location of the bootstrap.sh script (ex. s3n://{my_bucket}/bootstrap.sh).

Start the job flow and monitor the stderr and stdout. Everything should work.

Monday, July 8, 2013

Amazon EMR - RDS DB Security Group

Log in to your AWS RDS console. Select Security Groups on the left sidebar.

Select your DB Security Group and click on the Edit button.

Add the following:

1.) EMR master

Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-master

2.) EMR slave

Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-slave

Friday, July 5, 2013

Monday, June 17, 2013

AWS - Elastic Map Reduce Tutorial

MapReduce has become a very common technique for parallel computing.

Let's say you have a database table with username and description columns, and you want to replace the HTML tags in the description column with empty spaces. If the table holds petabytes of data, it will take forever for a single machine to do this job.

MapReduce works by distributing the job among multiple machines. Each machine processes a different slice of the dataset in parallel, and the outputs are then aggregated. A job that might take days on one machine can finish in minutes.

In this tutorial, we will experiment with Amazon's Elastic MapReduce.

Let's get started.


Create a S3 bucket

Elastic MapReduce uses S3 to store its input and output. We will first create a bucket.

Log in to your Amazon S3 console. Create a bucket, say my_map_reduce_data. Amazon S3 bucket names must be unique across all Amazon S3 buckets, so it's best to prefix yours with your company name.


Create input data

Let's create a text file and put some random data into it. We will create a MapReduce function to count word frequencies.

Ex.
apple apple orange orange orange
pear pear pear pear pear pear pear pineapple pineapple

Label this file input.txt.

Create a folder inside my_map_reduce_data called input and upload input.txt into it.


Implementing the mapper function

Download the following file and save it as wordSplitter.py

https://s3.amazonaws.com/elasticmapreduce/samples/wordcount/wordSplitter.py

It's a script that reads the input file line by line and prints the number of occurrences of each distinct word in that line.

Upload wordSplitter.py to my_map_reduce_data


Launch the Elastic MapReduce Cluster

Sign in to the Elastic MapReduce Console.

Click on Create New Job Flow.

Give the Job Flow Name WordSplitter.

Choose Amazon Distribution for the Hadoop version.

Choose Streaming as the job flow type. With streaming, you can write the mapper and reducer scripts in any of the following languages: Ruby, Perl, Python, PHP, R, Bash, or C++.

Click Continue.


Input, output locations

Fill in the following:
Input Location: my_map_reduce_data/input
Output Location: my_map_reduce_data/output
Mapper: my_map_reduce_data/wordSplitter.py
Reducer: aggregate

Click Continue.


Configure EC2 Instances

Leave the options as they are.

Click Continue.


Advanced Options

If you want to ssh into the master node, specify the Amazon EC2 Key Pair.

For the Amazon S3 Log Path, put my_map_reduce_data/logs

Check Yes for Enable debugging. It will create an index of the log files in Amazon SimpleDB.

Leave the other boxes as NO.


BootStrap Actions

Proceed without bootstrap actions. Bootstrap actions allow additional software to be installed on the cluster nodes before MapReduce processes any data.


Review the information and start the job. You should see the job being started.

You can monitor the status of the nodes in the MapReduce Web console.

Check the output folder in S3 after the job is completed.

Remember to delete all buckets to avoid being charged.

Thursday, May 30, 2013

AWS Auto Scaling Part 3 - Auto Scaling Based on Demand

In this post, we will go through an example that auto scales your group based on CPU utilization. Alternatively, you can also make your group scale in or out based on other metrics like memory usage, IO throughput, etc.

We will set up an auto scaling group with elastic load balancing: scale out by adding one instance when CPU utilization is above 80% for 10 minutes, and scale in by removing one instance when CPU utilization is below 40% for 10 minutes.

There will be 6 things we need to do:
  1. create a launch configuration
  2. create an auto scaling group
  3. create a scale out policy
  4. create a scale in policy
  5. create an alarm attached to the scale out policy
  6. create an alarm attached to the scale in policy

Create a Launch Configuration

We will create a launch configuration called NodeJS.
as-create-launch-config NodeJS --image-id ami-87acc4ee --instance-type m1.large --block-device-mapping="/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdg=snap-15356ee8, /dev/sdb=ephemeral0" --group your_security_group --key your_key_pair 
This is the same as what we had at the end of Part 2 - Auto Scaling Based on Fixed Number of Instances.


Create an Auto Scaling Group 

We will create an auto scaling group called NodeJSGroup.
as-create-auto-scaling-group NodeJSGroup --launch-configuration NodeJS --availability-zones us-east-1d --min-size 1 --max-size 5 --tag "k=name, v=AsNodeJSProd, p=true" --load-balancers TestNodeBalancer
We have added an elastic load balancer called TestNodeBalancer. You can create a load balancer in the AWS EC2 console.

Check all your available load balancers:
elb-describe-lbs --headers
This is the same as what we had at the end of Part 2 - Auto Scaling Based on Fixed Number of Instances.


Create a Scale out Policy

We will create a scale out policy such that a new instance will be added to the auto scaling group when the CPU utilization reaches 80% or above for over 10 mins. We will label this policy NodeJSScaleOutPolicy.

as-put-scaling-policy NodeJSScaleOutPolicy --auto-scaling-group NodeJSGroup --adjustment=1 --type ChangeInCapacity --cooldown 300
Take note of the Amazon Resource Name (ARN) in the response. You will need it to set up the alarm.
arn:aws:autoscaling:us-east-1:240591131275:scalingPolicy:f0037fff-0949-4123-8887-f6c7064b8253:autoScalingGroupName/NodeJSGroup:policyName/NodeJSScaleOutPolicy
The --cooldown 300 parameter means there must be a 300 second gap before the policy can be applied again.

You can check the policy with:
as-describe-policies

Create a Scale in Policy

We will create a scale in policy such that an instance will be removed from the auto scaling group when the CPU utilization reaches 40% or below for over 10 mins.
as-put-scaling-policy NodeJSScaleInPolicy --auto-scaling-group NodeJSGroup --adjustment=-1 --type ChangeInCapacity --cooldown 300

Install CloudWatch API

We will need to associate the alarms with the scale in and scale out policies. To do that, we need to install the CloudWatch Command Line Tools.


Associate Alarms with Policies

Make an alarm called NodeJSHighAlarm and associate it with NodeJSScaleOutPolicy.
mon-put-metric-alarm NodeJSHighAlarm --comparison-operator GreaterThanThreshold --evaluation-periods 1 --metric-name CPUUtilization --namespace "AWS/EC2" --period 600 --statistic Average --threshold 80 --alarm-actions arn:aws:autoscaling:us-east-1:240591131275:scalingPolicy:f0037fff-0949-4123-8887-f6c7064b8253:autoScalingGroupName/NodeJSGroup:policyName/NodeJSScaleOutPolicy --dimensions "AutoScalingGroupName=NodeJSGroup"

Make another alarm called NodeJSLowAlarm and associate it with NodeJSScaleInPolicy.
mon-put-metric-alarm NodeJSLowAlarm --comparison-operator LessThanThreshold --evaluation-periods 1 --metric-name CPUUtilization --namespace "AWS/EC2" --period 600 --statistic Average --threshold 40 --alarm-actions arn:aws:autoscaling:us-east-1:240591131275:scalingPolicy:631fd578-9517-42e2-a424-8b1ed8dd0874:autoScalingGroupName/NodeJSGroup:policyName/NodeJSScaleInPolicy  --dimensions "AutoScalingGroupName=NodeJSGroup"
Check the status of the alarms
mon-describe-alarms --headers 

Wednesday, May 29, 2013

AWS Command Line Resources

http://aws.amazon.com/developertools

AWS Auto Scaling Part 2 - Auto Scaling Based on Fixed Number of Instances

In Part 1 - AWS Auto Scaling Part 1 - Configuring Auto Scaling Command Line Tools, we spun up a new Ubuntu machine and installed the auto scaling command line tools.

We will create two things
  • a launch configuration (defines what AMI to be launched)
  • an auto scaling group (defines the number of instances to be launched, etc)

Creating a Launch Configuration

We will create a launch configuration. The general syntax is:

as-create-launch-config <name> --image-id <ami_id> --instance-type <instance_type>

Choose an AMI of your choice:
as-create-launch-config NodeJS --image-id ami-111111 --instance-type t1.micro
Check the launch configuration
as-describe-launch-configs --headers
Note: In the AWS EC2 Console, you can create an AMI by right-clicking one of your EC2 instances and click Create Image.


Creating an Auto Scaling Group

The auto scaling group takes the following as its parameters:
  • name for the group
  • a launch configuration
  • one or more availability zones
  • a minimum group size
  • a maximum group size

We will create a group called NodeJSGroup. It will launch the NodeJS configuration we created above. We will use us-east-1d as the region and we want to spin 1 instance.
as-create-auto-scaling-group NodeJSGroup --launch-configuration NodeJS --availability-zones us-east-1d --min-size 1 --max-size 1
Check the status of the group by:
as-describe-auto-scaling-groups --headers
Check the health of the auto scaling instances:
as-describe-auto-scaling-instances --headers 
You should see the health of the launched instances. If you don't see any, check the activity log
as-describe-scaling-activities

Deleting launch configurations and auto scaling groups

We will first remove all the instances from the Auto Scaling Group NodeJSGroup. Then we will delete the launch config and the group.

First update the group setting to terminate all the instances.
as-update-auto-scaling-group NodeJSGroup --min-size 0 --max-size 0
Now delete the group and the launch config.
as-delete-auto-scaling-group NodeJSGroup
as-delete-launch-config NodeJS

A more complicated example (with device mappings and security groups)

The launch configuration above doesn't take security groups and block device mappings into account. We will create a more complicated example below.

To check the block device mappings of an AMI, you will need to install the EC2 API Tools and use the ec2-describe-images command.

ssh connect to the machine you want to use auto-scaling on.

Run either of the following commands.
ec2-describe-images -o self
ec2-describe-images
You will get something like the following

IMAGE ami-17acc4ee 140591131275/nodejs-production-20130522 240591131275 available private x86_64 machine aki-125ea7eb ebs paravirtual xen
BLOCKDEVICEMAPPING EBS /dev/sda1 snap-1f356ee2 8 true standard
BLOCKDEVICEMAPPING EBS /dev/sdf snap-18356ee5 20 false standard
BLOCKDEVICEMAPPING EBS /dev/sdg snap-15356ee8 20 false standard
BLOCKDEVICEMAPPING EPHEMERAL /dev/sdb ephemeral0
For the above, the block-device-mapping will be
block-device-mapping=/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdg=snap-15356ee8, /dev/sdb=ephemeral0
Find the instance's security group in the AWS EC2 console
--group security_group
You will need to specify a key pair to ssh into this instance as well
--key key_pair
The whole command will be:
as-create-launch-config NodeJS --image-id ami-17acc4ee --instance-type m1.large --block-device-mapping="/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdg=snap-15356ee8, /dev/sdb=ephemeral0" --group security_group --key key_pair
Now create the auto scaling group.
as-create-auto-scaling-group NodeJSGroup --launch-configuration NodeJS --availability-zones us-east-1d --min-size 1 --max-size 5 --tag "k=Name, v=AsNodeJSProd, p=true"
The tag will propagate to all the instances. If you don't specify a tag, your instances will have no human readable names. Read more about tags here. In k=Name, Name must be capitalized, or you won't see the human readable names in your AWS EC2 console.

Check the status by using the following commands
as-describe-launch-configs
as-describe-auto-scaling-groups
as-describe-auto-scaling-instances
You should have one machine launched.

Keep in mind that in some newer instances, the volume names are different. For example,
/xvda1 = /sda1
/xvdb = /sdb
/xvdf = /sdf
/xvdg = /sdg
When using the as-create-launch-config command, use the device names returned from ec2-describe-images even if the actual volumes have different names.

For device-block-mapping, you can specify different kinds of volumes with different sizes. Read Block Device Mapping for more information.

Now let's shut down the auto scaling group, otherwise you will keep paying for the running instance.
as-update-auto-scaling-group NodeJSGroup --min-size 0 --max-size 0
as-delete-auto-scaling-group NodeJSGroup
as-delete-launch-config NodeJS
In the next post, we will go into auto scaling based on metrics like CPU utilizations via the CloudWatch API.

Friday, May 24, 2013

AWS Auto Scaling Part 1 - Configuring Auto Scaling Command Line Tools

In this post, we will experiment with Amazon's auto scaling service.

We will first begin by installing the Auto Scaling Command Line Tools in a new Ubuntu machine.

Connect to your machine by ssh.


Download and Unzip the Auto Scaling Command Line Tools

mkdir /opt/tools
cd /opt/tools
wget http://ec2-downloads.s3.amazonaws.com/AutoScaling-2011-01-01.zip

sudo apt-get install unzip
unzip AutoScaling-2011-01-01.zip


Install Java

Read Install Java OpenJDK 7 on Amazon EC2 Ubuntu.


Setting the environment variables

In your ~/.bashrc file, append the following lines to the end of the file.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export PATH=$JAVA_HOME/bin:$PATH
Set the AWS_AUTO_SCALING_HOME to the location where you unzipped the command line tools.

export AWS_AUTO_SCALING_HOME=/opt/tools/AutoScaling-1.0.61.2
export PATH=$PATH:$AWS_AUTO_SCALING_HOME/bin

Install the Security Credentials

Go to the AWS security console.

Scroll to the Access Credentials section.

Note down an active pair of Access Key ID and Secret Access Key (Click show to see the secret access key).
vi /opt/tools/AutoScaling-1.0.61.2/credential-file-path.template
Paste your keys:
AWSAccessKeyId=
AWSSecretKey=
Append to ~/.bashrc
export AWS_CREDENTIAL_FILE=/opt/tools/AutoScaling-1.0.61.2/credential-file-path.template

Setting the Auto Scaling Region

By default, the auto scaling region is us-east-1.

If you want to use a different region, you need to change this value. Note down the region endpoint here: Regions and Endpoints.
vi ~/.bashrc
export AWS_AUTO_SCALING_URL=https://autoscaling.us-east-1.amazonaws.com

Test your configuration

as-cmd

You should see a panel of commands like the following:

Command Name                                Description
------------                                -----------
as-create-auto-scaling-group                Create a new Auto Scaling group.
as-create-launch-config                     Creates a new launch configuration.
as-create-or-update-tags                    Create or update tags.
as-delete-auto-scaling-group                Deletes the specified Auto Scaling group.
as-delete-launch-config                     Deletes the specified launch configuration.
as-delete-notification-configuration        Deletes the specified notification configuration.
as-delete-policy                            Deletes the specified policy.
as-delete-scheduled-action                  Deletes the specified scheduled action.
as-delete-tags                              Delete the specified tags
as-describe-adjustment-types                Describes all policy adjustment types.
as-describe-auto-scaling-groups             Describes the specified Auto Scaling groups.
as-describe-auto-scaling-instances          Describes the specified Auto Scaling instances.
as-describe-auto-scaling-notification-types Describes all Auto Scaling notification types.
as-describe-launch-configs                  Describes the specified launch configurations.
as-describe-metric-collection-types         Describes all metric colle... metric granularity types.
as-describe-notification-configurations     Describes all notification...given Auto Scaling groups.
as-describe-policies                        Describes the specified policies.
as-describe-process-types                   Describes all Auto Scaling process types.
as-describe-scaling-activities              Describes a set of activit...ties belonging to a group.
as-describe-scheduled-actions               Describes the specified scheduled actions.
as-describe-tags                            Describes tags
as-describe-termination-policy-types        Describes all Auto Scaling termination policy types.
as-disable-metrics-collection               Disables collection of Auto Scaling group metrics.
as-enable-metrics-collection                Enables collection of Auto Scaling group metrics.
as-execute-policy                           Executes the specified policy.
as-put-notification-configuration           Creates or replaces notifi...or the Auto Scaling group.
as-put-scaling-policy                       Creates or updates an Auto Scaling policy.
as-put-scheduled-update-group-action        Creates or updates a scheduled update group action.
as-resume-processes                         Resumes all suspended Auto... given Auto Scaling group.
as-set-desired-capacity                     Sets the desired capacity of the Auto Scaling group.
as-set-instance-health                      Sets the health of the instance.
as-suspend-processes                        Suspends all Auto Scaling ... given Auto Scaling group.
as-terminate-instance-in-auto-scaling-group Terminates a given instance.
as-update-auto-scaling-group                Updates the specified Auto Scaling group.
help
version                                     Prints the version of the CLI tool and the API.

    For help on a specific command, type 'command-name --help'

In Part 2, we will go through an example that launches an instance via an Auto Scaling group.

Wednesday, April 17, 2013

Using Munin to monitor EC2 instances on Amazon

After playing around with CloudWatch, I find the interface very confusing to use. The biggest problem is that EC2 instances are described by AMI image ID rather than by my pre-defined machine tag name (let me know in the comments below if you have figured out how to change this).

I was looking at a few monitoring tools (Nagios, Cacti, Munin, Zabbix) and decided to try out Munin. The biggest motivator for me is that Instagram also uses Munin.

Let's begin by spinning up an Ubuntu instance as the Munin master.


Installing Munin Master and Munin Nodes:

Install munin and munin-node
apt-get install munin
apt-get install munin-node
Install apache (for viewing reports from the Web)
apt-get install apache2
For all the instances you want to monitor, install Munin Node.
apt-get install munin-node
For these node instances, we will edit munin-node.conf
vi /etc/munin/munin-node.conf
Change the host_name. Name it something descriptive so you will know which machine this is; the master node will report using this name.
host_name {api1.monetize24hours.com}
Change allow from
allow ^127\.0\.0\.1$
to
allow ^.*$
This allows any internal IP to connect. Since AWS-assigned private addresses can change over time, it's easier to allow all. Do NOT set it to the instance's external address or you will be charged for data transfer. Make sure all the machines are behind a firewall.
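If you would rather not open munin-node to everything, it also supports CIDR-based rules via the cidr_allow directive. A sketch for munin-node.conf, assuming your instances live in a 10.0.0.0/8 private range (adjust to your own network):

```
cidr_allow 10.0.0.0/8
```

This keeps the node reachable from any internal address without matching on exact IPs.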

Restart the Munin node.
/etc/init.d/munin-node restart
Repeat the settings above for all the Munin nodes.

Now, on the master Munin node, edit /etc/munin/munin.conf. Search for

[localhost.localdomain]
    address 127.0.0.1
    use_node_name yes

Change it to

[api1.monetize24hours.com]
    address ip-00-000-000-000.ec2.internal

This value must match the host_name you defined on the Munin node above. The address is the EC2 private address of the Munin node. This is how the Munin master aggregates and reports the data.


Showing Data on Webpages

Make sure the Munin master can connect to your Munin nodes.
telnet {private_ec2_address} 4949
Port 4949 is used for Munin inter-node communication.

If it doesn't connect, add port 4949 for the Munin node's security group.

You can find the Munin master's security group name by clicking on the Security Group and checking the Details tab. It looks something like "sg-e0000000".

Now edit /etc/munin/munin.conf to tweak the log and graph generation directories.
bdir   /var/lib/munin
htmldir /var/www/munin
logdir /var/log/munin
rundir  /var/run/munin
Change the above directories as desired. Create them if they don't exist, and make sure you set the appropriate permissions on them.

Wait 5 to 10 minutes for the Munin cron job to gather data.

Access the graphs by
{public_ec2_address}/munin
You will want to secure the webpages so no one else can access them. Restrict access either by IP or by username and password.

We will use .htaccess in the following example.
htpasswd -c /etc/apache2/.munin_htpasswd admin
Create /var/www/munin/.htaccess, and put the following:
AuthUserFile /etc/apache2/.munin_htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
Edit /etc/apache2/sites-available/default.

Change AllowOverride None to AllowOverride All

Restart apache.
service apache2 restart

Tuesday, April 16, 2013

Using Amazon CloudWatch Command Line Tool to record metrics

Introduction

Amazon CloudWatch provides seamless monitoring of AWS resources such as EC2 instances, RDS instances, and EBS volumes, based on metrics like CPU utilization, data transfer, and disk usage.

There are two types of monitoring: basic and detailed. Basic Monitoring reports at a five-minute frequency. Detailed Monitoring reports at a one-minute frequency and also lets you aggregate metrics by AMI ID and instance type.

Monitoring data is retained for two weeks, even if your instance is terminated.

Below is a mapping from resources to example metrics (for instance, CloudWatch tracks the request count and latency of an Elastic Load Balancer):

ELB - request count, latency
EBS - read/write latency
RDS - freeable memory, available storage space
SQS - number of messages sent and received

You can also send custom metrics to CloudWatch by using the Put API.

You can view your stats in the AWS Management Console.

For overall status of all AWS services, check AWS Service Health Dashboard.


Setting up Amazon CloudWatch Command Line Tool

Spin up an EC2 instance. (Skip this if you are using your home computer.)

Begin by downloading the CloudWatch CLI Tool.
mkdir /opt/tools/aws
cd /opt/tools/aws
wget http://ec2-downloads.s3.amazonaws.com/CloudWatch-2010-08-01.zip
Install zip and unzip the package.
sudo apt-get install zip
unzip CloudWatch-2010-08-01.zip 
Check if you have Java installed:
java -version
If not, read Install Java OpenJDK 7 on Amazon EC2 Ubuntu.

Set AWS_CLOUDWATCH_HOME path in ~/.bashrc
export AWS_CLOUDWATCH_HOME=/opt/tools/aws/CloudWatch-1.0.13.4
export PATH=$PATH:$AWS_CLOUDWATCH_HOME/bin
Enter your AWS Access Key ID and Secret Access Key in the file $AWS_CLOUDWATCH_HOME/bin/credential-file-path.template. You can find your credentials in the AWS Management Console.

AWSAccessKeyId=
AWSSecretKey=

chmod 600 credential-file-path.template

Rename credential-file-path.template to something else (ex. aws_credentials)

Move this file somewhere central, since other services may use it as well. For instance, move it to /opt/tools/aws.

Add this entry to ~/.bashrc
export AWS_CREDENTIAL_FILE=/opt/tools/aws/aws_credentials
Update ~/.bashrc.
source ~/.bashrc
Test the tool:
mon-cmd
You should see the following:
Command Name                       Description
------------                       -----------
help
mon-delete-alarms                  Delete alarms
mon-describe-alarm-history         Describe alarm history
mon-describe-alarms                Describe alarms fully.
mon-describe-alarms-for-metric     Describe all alarms associated with a single metric
mon-disable-alarm-actions          Disable all actions for a given alarm
mon-enable-alarm-actions           Enable all actions for a given alarm
mon-get-stats                      Get metric statistics
mon-list-metrics                   List user's metrics
mon-put-data                       Put metric data
mon-put-metric-alarm               Create a new alarm or update an existing one
mon-set-alarm-state                Manually set the state of an alarm
version                            Prints the version of the CLI tool and the API.

Publish data points to CloudWatch

CloudWatch allows you to publish data points via PUT requests. CloudWatch only accepts data with UTC timestamps that fall within the past two weeks (older data is not retained).
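The -t flag used below expects an ISO 8601 timestamp in UTC. One way to generate a current timestamp in that format (a sketch using Python's standard library):

```python
from datetime import datetime, timezone

# mon-put-data's -t flag wants UTC in the form 2013-04-16T20:30:00Z.
ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
print(ts)
```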

In this example, you will feed CloudWatch with some custom data points.

Execute the following data sets, but substitute the dates below with times within the past few hours.

Set A (4 data points):
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 87 -u Milliseconds
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 51 -u Milliseconds
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 125 -u Milliseconds
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 235 -u Milliseconds
Set B (Instead of sending individual data points, send sum, min, max and sample count):
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T21:30:00Z -s "Sum=577,Minimum=65,Maximum=189,SampleCount=5" -u Milliseconds 
Set C:
mon-put-data -m RequestLatency -n "Test001" -s "Sum=806,Minimum=47,Maximum=328,SampleCount=6" -u Milliseconds
The above represent latency data spread over three hours. Just think of them as some sample data points.

Let's get the data summary:
mon-get-stats -m RequestLatency -n "Test001" -s "Average" --start-time 2013-04-16T19:30:00Z --headers
Results:

Time                 Average             Unit
2013-04-16 20:30:00  124.5               Milliseconds
2013-04-16 21:30:00  115.4               Milliseconds
2013-04-16 22:29:00  134.33333333333334  Milliseconds
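These averages follow directly from the data published above; a quick check of the arithmetic (sets B and C were sent as aggregates, so their averages are Sum divided by SampleCount):

```python
# Set A was published as four individual points.
set_a = [87, 51, 125, 235]
avg_a = sum(set_a) / len(set_a)

# Sets B and C were published as aggregates: average = Sum / SampleCount.
avg_b = 577 / 5
avg_c = 806 / 6

print(avg_a, avg_b, avg_c)  # 124.5 115.4 134.33333333333334
```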
You can also see the Visual Representation in the AWS Management Console.

Login to CloudWatch.

Click on Metrics in the Left Panel. Select Test001 in the "Viewing" Dropdown box.

You can also create alarms based on this metric.

This concludes the tutorial. If you are interested in more advanced tools, check out this post - Using Munin to monitor EC2 instances on Amazon.

Monday, February 25, 2013

AWS Java - Securing S3 content using query string authentication

Amazon S3 is a highly available and durable hosting environment that can serve websites, images, and large files. Sometimes, you may want to secure your content so only you or your authenticated users can access it. This becomes even more important when it's paid content.

This post covers using query string authentication to make content available for a specified period of time.

Specs:
  • Java 1.7
  • Eclipse Juno

Before you begin, make sure you have all the AWS Eclipse tools ready. Read Using Java AWS SDK to upload files to Amazon S3 for how to install the AWS SDK tool and a basic guide on how to upload, delete and retrieve files on S3.

Signing the request requires the following structure:

Authorization = "AWS" + " " + AWSAccessKeyId + ":" + Signature;

Signature = Base64( HMAC-SHA1( YourSecretAccessKeyID, UTF-8-Encoding-Of( StringToSign ) ) );

StringToSign = HTTP-Verb + "\n" +
 Content-MD5 + "\n" +
 Content-Type + "\n" +
 Date + "\n" +
 CanonicalizedAmzHeaders +
 CanonicalizedResource;

CanonicalizedResource = [ "/" + Bucket ] +
 <HTTP-Request-URI, from the protocol name up to the query string> +
 [ sub-resource, if present. For example "?acl", "?location", "?logging", or "?torrent" ];

CanonicalizedAmzHeaders = <the x-amz-* headers, lowercased, sorted lexicographically, each as "name:value\n">

For query string authentication, the Date field in StringToSign is replaced by the Expires parameter (an epoch timestamp), and the signature is passed as a query parameter instead of an Authorization header.

Assuming that you have either read the post above or implemented upload yourself, upload a file to your Amazon S3 account.

In the AWS Management Console, set the file's ACL permissions to your administrative account only (by default, it should be already, if you didn't programmatically change the ACL permission).

We will implement the following function called getS3Url().
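To make the mechanics concrete, here is a minimal sketch of the query-string signing scheme in Python, where the Expires timestamp takes the place of Date in the StringToSign. The bucket, key, and credentials are illustrative placeholders; in real Java code, the SDK's generatePresignedUrl method does this for you.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

def get_s3_url(bucket, key, access_key, secret_key, expires_in=3600):
    """Build a time-limited S3 URL using query-string authentication."""
    expires = int(time.time()) + expires_in
    # In query-string auth, Expires replaces the Date field of StringToSign.
    string_to_sign = "GET\n\n\n{}\n/{}/{}".format(expires, bucket, key)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(),
                      hashlib.sha1).digest()
    # Base64-encode the HMAC-SHA1 digest, then URL-encode it for the query string.
    signature = quote(base64.b64encode(digest).decode(), safe="")
    return ("https://{}.s3.amazonaws.com/{}"
            "?AWSAccessKeyId={}&Expires={}&Signature={}").format(
                bucket, key, access_key, expires, signature)

url = get_s3_url("my-bucket", "report.pdf", "AKIDEXAMPLE", "secretExample")
print(url)
```

Anyone with this URL can fetch the object until the Expires time passes, after which S3 returns an AccessDenied error.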




We have set the expiration date to one hour later. You will see the following expiration message after an hour:


<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>8ECB67C2458CE483</RequestId>
  <HostId>
  vL6wXNOkvYlpHXbvvlG1SGhy3q/+Ocb3guXtyaDZjmEu24Z4XQpwjfmNAvM+SViz
  </HostId>
</Error>


Thursday, February 21, 2013

Using Java AWS SDK to upload files to Amazon S3

Amazon S3 is highly available and durable storage suitable for storing large files that do not change frequently. This post will focus on how to upload files programmatically via the Java Amazon SDK. For an introduction to S3, read What is Amazon Simple Storage Service (Amazon S3)?

My specs:
  • Eclipse Juno
  • SpringMVC 3.1.x
  • Maven 3.0.x

Install AWS Toolkit

In eclipse, click on help in the menu bar and then "Install New Software".

In the "Work with:" input box, put "http://aws.amazon.com/eclipse" and click Add...

Check on the AWS Toolkit for Eclipse and click Yes to install all the tools.

In the Eclipse toolbar, you will see a red cube icon. Click on the down arrow next to this icon. Click Preference.

Fill in your Access Key ID and Secret Access Key. Give it an Account Name (Ex. use your email). You can find your keys in the Amazon Management Console (My Account/Console -> Security Credentials). Click on Apply and OK.

In the Eclipse menu bar, click on Window -> Preferences. Expand the AWS Toolkit. Right click on your key. Click Select Private Key File. Associate it with your private key. Click OK.

Click on the down arrow next to the Amazon cube icon. Select Show AWS Explorer View. You should be able to see the Amazon S3 service and all your related buckets (if you have any).


Download and Install the AWS SDK for Java

You can download it here. Click on the AWS SDK for Java button.

Extract the file. Code Samples are located in /samples.

If you are using Maven, you can add the AWS SDK as a dependency in the pom.xml file.


<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk</artifactId>
    <version>1.3.32</version>
</dependency>


Choose the version you want here.

Alternatively, you can just add it as a library (Right Click on the project -> Java Build Path -> Libraries -> Add External JARs).


Running the default AWS Sample Apps

We will begin by setting up a sample project that you can check out how S3 works.

Click on the down arrow next to the Amazon icon.

Select New AWS Java Project.

Give a Project name.

Select your account.

Select Amazon S3 Sample, Amazon S3 Transfer Progress Sample, and AWS Console Application. Click Next.

Expand the newly created project. Left click on the AwsConsoleApp.java. In the Eclipse menu bar, click on Run -> Run.

You should see output like the following:


===========================================
Welcome to the AWS Java SDK!
===========================================
You have access to 3 Availability Zones.
You have 14 Amazon EC2 instance(s) running.
You have 0 Amazon SimpleDB domain(s) containing a total of 0 items.
You have 8 Amazon S3 bucket(s), containing 71841 objects with a total size of 224551364 bytes.



If you run the S3Sample.java, you will get the following:


===========================================
Getting Started with Amazon S3
===========================================

Creating bucket my-first-s3-bucket-39065c55-2ee5-413a-9de1-6814dbb253c1

Listing buckets
 - my-first-s3-bucket-39065c55-2ee5-413a-9de1-6814dbb253c1

Uploading a new object to S3 from a file

Downloading an object
Content-Type: text/plain
    abcdefghijklmnopqrstuvwxyz
    01234567890112345678901234
    !@#$%^&*()-=[]{};':',.<>/?
    01234567890112345678901234
    abcdefghijklmnopqrstuvwxyz

Listing objects
 - MyObjectKey  (size = 135)

Deleting an object

Deleting bucket my-first-s3-bucket-39065c55-2ee5-413a-9de1-6814dbb253c1


Integrate the S3 SDK

To begin, you need to have the file AwsCredentials.properties at the root of your class path. You can copy the one generated for the sample project into your project's class path, or create one with the following content:

secretKey=
accessKey=


Create an authenticated S3 object:

AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());

Objects in S3 are stored in buckets. Each bucket name is globally unique; you cannot create a bucket with a name that another user has already taken. Each bucket contains key-value pairs that you can define any way you want.


Create a bucket:

String bucketName = "my-s3-bucket-" + UUID.randomUUID();
s3.createBucket(bucketName);

For readability, I have skipped the exception handling; I will come back to it at the end. The name of the bucket must conform to all the DNS naming rules. I usually name mine using my domain name.


Delete a bucket:

s3.deleteBucket(bucketName);


List all buckets:

for (Bucket bucket : s3.listBuckets()) {
    System.out.println(" - " + bucket.getName());
}


Save an object in a bucket:

String key = "myObjectKey";

PutObjectRequest putObject = new PutObjectRequest(bucketName, key, myFile);
s3.putObject(putObject);

myFile above is an instance of java.io.File.


Delete an object:

s3.deleteObject(bucketName, key);


Get/Download an object:

String key = "myObjectKey";
GetObjectRequest getObject = new GetObjectRequest(bucketName, key);
S3Object object = s3.getObject(getObject);


List objects by prefix:

ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
                    .withBucketName(bucketName)
                    .withPrefix("My"));
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
    System.out.println(" - " + objectSummary.getKey() + "  " +
                                   "(size = " + objectSummary.getSize() + ")");
}


Uploading large files

Use TransferManager whenever possible. It makes use of S3 multipart uploads to achieve enhanced throughput, performance, and reliability. It uses multiple threads to upload multiple parts of a single upload at once.

AWSCredentials myCredentials = new BasicAWSCredentials(...);
TransferManager tx = new TransferManager(myCredentials);
Upload myUpload = tx.upload(myBucket, myFile.getName(), myFile);

while (!myUpload.isDone()) {
    System.out.println("Transfer: " + myUpload.getDescription());
    System.out.println("  - State: " + myUpload.getState());
    System.out.println("  - Progress: " + myUpload.getProgress().getBytesTransfered());
    // Do work while we wait for our upload to complete...
    Thread.sleep(500);
}


Exceptions

Whenever you call any of the AWS APIs, you should surround the calls with try and catch clauses like the following:

try {
    // AWS requests here

} catch (AmazonServiceException ase) {
    System.out.println("Caught an AmazonServiceException, which means your request made it "
            + "to Amazon S3, but was rejected with an error response for some reason.");
    System.out.println("Error Message:    " + ase.getMessage());
    System.out.println("HTTP Status Code: " + ase.getStatusCode());
    System.out.println("AWS Error Code:   " + ase.getErrorCode());
    System.out.println("Error Type:       " + ase.getErrorType());
    System.out.println("Request ID:       " + ase.getRequestId());
} catch (AmazonClientException ace) {
    System.out.println("Caught an AmazonClientException, which means the client encountered "
            + "a serious internal problem while trying to communicate with S3, "
            + "such as not being able to access the network.");
    System.out.println("Error Message: " + ace.getMessage());
}


If you are interested in securing your S3 contents for your authenticated users only, check out AWS Java - Securing S3 content using query string authentication.

Saturday, February 9, 2013

Micro Instance out of memory - add swap

I was trying to update my Symfony project and got the following error while updating the database schema or assets:
Fatal error: Uncaught exception 'ErrorException' with message 'Warning: proc_open(): fork failed - Cannot allocate memory in
An Amazon EC2 t1.micro instance only has 613 MB of RAM, which is not enough to run many processes at once.

What I can do is either 1) switch to a small instance or 2) add a 1GB swap file on disk.


Here are the commands to add a 1GB swap file:


sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 34.1356 s, 31.5 MB/s

sudo /sbin/mkswap /var/swap.1
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=9cffd7c9-8ec6-4f6c-8eea-79aa3173a59a
sudo /sbin/swapon /var/swap.1


To turn off the swap do the following:

sudo /sbin/swapoff /var/swap.1
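
To keep the swap active across reboots, you can also add an entry to /etc/fstab (a sketch; the path must match the swap file created above):

```
/var/swap.1 none swap sw 0 0
```

Otherwise the swap will need to be re-enabled with swapon after every reboot.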