First, stop Splunk.
cd into your splunk/bin directory
./splunk stop
Create a new folder (ex. /mnt/splunk_data).
cp -rp splunk/var/lib/splunk/* /mnt/splunk_data/
Change SPLUNK_DB to point to /mnt/splunk_data.
vi splunk/etc/splunk-launch.conf
Find SPLUNK_DB in the file and change the path.
SPLUNK_DB=/mnt/splunk_data
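If you prefer a non-interactive edit over vi, sed can rewrite the setting. This sketch rehearses the substitution on a temporary copy rather than your real splunk-launch.conf; point it at the real file once you're happy with the result.

```shell
# Rehearse the SPLUNK_DB edit on a throwaway copy of splunk-launch.conf.
conf=$(mktemp)
printf 'SPLUNK_HOME=/opt/splunk\nSPLUNK_DB=$SPLUNK_HOME/var/lib/splunk\n' > "$conf"

# Replace the whole SPLUNK_DB line with the new data path.
sed -i 's|^SPLUNK_DB=.*|SPLUNK_DB=/mnt/splunk_data|' "$conf"

grep '^SPLUNK_DB=' "$conf"   # prints SPLUNK_DB=/mnt/splunk_data
```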
You may also want to change the retention policy and the maximum index size. These settings go in indexes.conf:
# 30 days
frozenTimePeriodInSecs = 2592000
# 90 GB
maxTotalDataSizeMB = 90000
It's recommended to size the volume using the following formula:
Total storage = daily average indexing rate x retention period x 1/2 (for example, 1 GB/day x 30 days x 1/2 = 15 GB)
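As a sanity check, the formula can be run as shell arithmetic. The daily rate and retention below are hypothetical figures chosen to reproduce the 15 GB example:

```shell
# Hypothetical inputs: 1 GB indexed per day, 30-day retention.
daily_gb=1
retention_days=30

# Raw data compresses to roughly half its size once indexed, hence the 1/2.
total_gb=$(( daily_gb * retention_days / 2 ))

echo "Total storage: ${total_gb} GB"
echo "maxTotalDataSizeMB = $(( total_gb * 1000 ))"
```

With these inputs it prints `Total storage: 15 GB` and `maxTotalDataSizeMB = 15000`, matching the figure above.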
Start Splunk.
./splunk start
To tune Splunk settings, check:
http://docs.splunk.com/Documentation/Splunk/4.3.1/Installation/CapacityplanningforalargerSplunkdeployment
Friday, July 24, 2015
Thursday, July 23, 2015
Install Splunk Forwarding and Receiving
We will be using Splunk Light.
Click on the menu icon at the upper right corner. Choose Data -> Receiving.
In Configure receiving, choose 9997 as the receiving port.
In your application instance, install the universal splunk forwarder.
http://www.splunk.com/en_us/download/universal-forwarder.html
Extract it and put it in the /opt/splunk_forwarder directory.
sudo ./splunk start
sudo ./splunk enable boot-start -user ec2-user
List all the forward servers:
./splunk list forward-server
Active forwards:
None
Configured but inactive forwards:
None
If it prompts you for username and password, use
username: admin
password: changeme
Add the receiving server to the forwarder:
./splunk add forward-server <receiving_server>:9997
Test the connection:
./splunk list forward-server
Active forwards:
None
Configured but inactive forwards:
<receiving_server>:9997
If it's not active, remember to add port 9997 to your security group.
Add data to monitor
./splunk add monitor <path_to_monitor> -index main -sourcetype <sourcetype>
To list what's being monitored:
./splunk list monitor
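If you want to script the active/inactive check, the output of `./splunk list forward-server` is easy to parse. This is a sketch against a hypothetical captured sample, not live output:

```shell
# Hypothetical sample of `./splunk list forward-server` output.
output='Active forwards:
    splunk.example.com:9997
Configured but inactive forwards:
    None'

# Look for the receiver port inside the "Active forwards" section only.
if printf '%s\n' "$output" | sed -n '/Active forwards:/,/Configured/p' | grep -q ':9997'; then
  status=active
else
  status=inactive
fi
echo "$status"   # prints active for this sample
```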
Installing splunk on AWS
Begin by downloading Splunk Light here: http://www.splunk.com/en_us/download.html. You will probably need to register a Splunk account before it lets you download it.
Upload Splunk to your ec2 instance using SCP. For example
scp -i <your_key.pem> splunklight-6.2.4-271043-Linux-i686.tgz ec2-user@<ec2_public_dns>:tmp
In the above, I uploaded the Splunk tgz file to a tmp folder in my ec2 instance.
You will need to install glibc.i686 first.
yum -y install glibc.i686
Create a folder called /opt if it doesn't exist
Extract your tgz file inside opt
tar xvzf splunklight-6.2.4-271043-Linux-i686.tgz
The splunk executable is located in /opt/splunk/bin. cd into it.
Start splunk:
sudo ./splunk start --accept-license
Start splunk on boot:
sudo ./splunk enable boot-start -user ec2-user
You should be able to view Splunk's web interface at port 8000 of your ec2 public address.
Other useful commands:
./splunk stop
./splunk restart
Thursday, June 5, 2014
Setting up Wordpress on Elastic Beanstalk
Elastic Beanstalk is a service that automates scaling, load balancing, and deploying applications so you can concentrate solely on software development. In a way, it is very similar to Google App Engine.
In this article, we will visit how we can set up Wordpress on Elastic Beanstalk.
Configuring the Elastic Beanstalk Environment
First, log into your Elastic Beanstalk console and click on Create a New Application.
Enter the Application Name and Description and click Next.
Click on "Create one now" to create a new environment.
Environment tier: Web Server 1.0
Predefined configuration: PHP
Environment type: Load balancing, autoscaling
Note for Environment tier, Web Server handles web requests (HTTP/S) while workers handle background processes.
Choose sample application for now
You will then be prompted to input an Environment Name. Name it anything you like, as we will use a CNAME later.
For Additional Resources, check "Create an RDS DB instance with this environment".
In Configuration Details, select your EC2 key pair and leave the other details as they are. You can always change these later.
For RDS configuration, put 5GB for allocated storage. Input the username and password. Select Create snapshot and single availability zone.
Click Launch.
Once it's launched, click Configuration on the left sidebar, then Software Configuration. In PARAM1, set it to production, staging, or another environment name.
Installing Wordpress
We need different wp-config files for local development and Elastic Beanstalk. Let's define the local config as local-config.php.
Set up WordPress on your local computer.
In wp-config, replace the database configs with the following:
if ( file_exists( dirname( __FILE__ ) . '/local-config.php' ) ) {
define( 'WP_LOCAL_DEV', true );
include( dirname( __FILE__ ) . '/local-config.php' );
} else {
define( 'WP_LOCAL_DEV', false );
define('WP_HOME','');
define('WP_SITEURL','');
define('DB_NAME', 'database');
define('DB_USER', 'username');
define('DB_PASSWORD', 'password');
define('DB_HOST', 'localhost');
}
Fill in the above db_name, db_user, db_password, db_host with the appropriate settings.
Create a file called local-config.php in the same directory as wp-config.php.
Put in the following with your local database information.
<?php
define('WP_HOME','');
define('WP_SITEURL','');
define('DB_NAME', '');
define('DB_USER', '');
define('DB_PASSWORD', 'root');
define('DB_HOST', '');
Install AWS ElasticBeanstalk Tools and AWSDevTools
Download from http://aws.amazon.com/code/6752709412171743
Read this: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/GettingStarted.GetSetup-devtools.html
Check to make sure eb is running properly:
eb --version
Configure your ElasticBeanstalk Git settings:
git aws.config
Reading Credentials from C:\Users\Kenneth\.elasticbeanstalk\aws_credential_file.
The file does not exist. You can supply credentials by creating the file or editing .elasticbeanstalk/config to reference a different file.
The credential file should have the following format:
AWSAccessKeyId=your key
AWSSecretKey=your secret
AWS Access Key:
AWS Secret Key:
AWS Region [default to us-east-1]: (Check this in your Elastic Beanstalk console)
AWS Elastic Beanstalk Application: <put in the application name you created above>
AWS Elastic Beanstalk Environment: <put in the environment name you created above>
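If you'd rather create the credential file by hand than answer the prompts, it is just the two lines shown above. This sketch writes it to a temporary directory standing in for ~/.elasticbeanstalk; the key values are placeholders:

```shell
# Temp directory stands in for ~/.elasticbeanstalk in this rehearsal.
dir=$(mktemp -d)

# Placeholder keys - substitute your real access key and secret.
cat > "$dir/aws_credential_file" <<'EOF'
AWSAccessKeyId=your key
AWSSecretKey=your secret
EOF

# Keep the credentials private.
chmod 600 "$dir/aws_credential_file"

wc -l < "$dir/aws_credential_file"   # prints 2
```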
Check if Elastic Beanstalk can detect your app:
eb status --verbose
Now deploy your application:
git aws.push
Remember to use Route53 to map to Elastic Beanstalk.
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customdomains.html
After you point the domain to the Elastic Load Balancer, the site may not load in the browser right away. Do not panic; give it 15 minutes and it will come up.
Wednesday, May 14, 2014
AWS s3 - The specified bucket is not valid.
If you receive the message "The specified bucket is not valid." while trying to enable website hosting, make sure your bucket name adheres to the following rules:
- Should not contain uppercase characters
- Should not contain underscores (_)
- Should be between 3 and 63 characters long
- Should not end with a dash
- Cannot contain two adjacent periods
- Cannot contain dashes next to periods (e.g., "my-.bucket.com" and "my.-bucket" are invalid)
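The rules above can be collapsed into a single shell check. This is a sketch covering only the listed rules, not Amazon's full validation:

```shell
# Validate a bucket name against the rules listed above (sketch only).
valid_bucket() {
  local b=$1
  # Length must be between 3 and 63 characters.
  [ "${#b}" -ge 3 ] && [ "${#b}" -le 63 ] || return 1
  case $b in
    *[A-Z]*|*_*)    return 1 ;;  # no uppercase, no underscores
    *-)             return 1 ;;  # must not end with a dash
    *..*|*-.*|*.-*) return 1 ;;  # no adjacent periods, no dash next to a period
  esac
  return 0
}

valid_bucket "my-bucket.com"    && echo "my-bucket.com: valid"
valid_bucket "my-.bucket.com"   || echo "my-.bucket.com: invalid"
```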
Wednesday, July 17, 2013
Ansible EC2 - setting up Nginx, MySQL, PHP, git
In this post, we will write a playbook that sets up an EC2 machine as a fully working PHP environment.
Starting from a fresh machine with an attached ebs volume, we will do the following:
- Format the new ebs volume with XFS and mount it as /vol
- Install php, mysql and nginx
- Create a mysql user and create a database
- Copy the public and private keys into the targeted machine
- Checkout a project from github
Begin by spinning up a fresh EC2 instance and attaching an ebs volume to it. Read Ansible - how to launch EC2 instances and setup the php environment.
Format the new ebs volume with XFS and mount it as /vol
We will mount the new ebs volume /dev/xvdf as /vol and format it with XFS
- name: update machine with latest packages
action: command yum -y update
- name: install xfsprogs
action: yum pkg=xfsprogs state=latest
- name: format new volume
filesystem: fstype=xfs dev=/dev/xvdf
- name: edit fstab and mount the vol
action: mount name={{mount_dir}} src=/dev/xvdf opts=noatime fstype=xfs state=mounted
Install php, mysql and nginx
- name: install php
action: yum pkg=php state=latest
- name: install php-mysql
action: yum pkg=php-mysql state=latest
- name: install nginx
action: yum pkg=nginx state=latest
- name: ensure nginx is running
action: service name=nginx state=started
- name: install mysql server
action: yum pkg=mysql-server state=latest
- name: make sure mysql is running
action: service name=mysqld state=started
Create a mysql user and a database
- name: install python mysql
action: yum pkg=MySQL-python state=latest
- name: create database user
action: mysql_user user=admin password=1234qwer priv=*.*:ALL state=present
- name: create db
action: mysql_db db=ansible state=present
Copy the public and private keys into the targeted machine
We want the target machine to be able to do a git pull without username and password prompts.
mkdir ~/.ssh
ssh-keygen -t rsa -C "you@email.com"
You will see:
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Just press Enter on both prompts.
Two files will be generated: id_rsa, id_rsa.pub
Log in to Github and then Go to Account Settings -> SSH Keys
Add new key by giving it a name and pasting the content of id_rsa.pub
Test it by:
ssh -T git@github.com
Here are the Ansible tasks:
- name: install git
action: yum pkg=git state=latest
- name: copy public key
action: template src=~/.ssh/id_rsa.pub dest=~/.ssh/id_rsa.pub
- name: copy private key
action: template src=~/.ssh/id_rsa dest=~/.ssh/id_rsa mode=0600
Checkout a project from github
- name: git checkout source
action: git repo=git@github.com:{your_git_repo}.git dest={{work_dir}} version=unstable
Wednesday, July 10, 2013
AWS Elastic MapReduce - EMR MySQL DBInputFormat
In this post, we will build a MapReduce program as a JAR executable. To make this example more interesting than most of the other online posts out there, we will modify the common WordCount example to fetch from MySQL instead of a text file.
You will need to understand at least the basics of what the mapper and the reducer are to follow this post. You may want to read this from Apache.
We will use Maven to build the project. If you have no idea how to do this, read Building a JAR Executable with Maven and Spring. We will feed this JAR via the Amazon Elastic MapReduce (EMR) and save the output in Amazon S3.
Here are the EMR supported Hadoop Versions. We will be using 1.0.3.
What we will do:
Assume we have a database called Company and there is a table called Employee with two columns: id and title.
We will count the number of employees with the same titles.
This is the same as the WordCount examples you see in other tutorials, but we fetch the data from a database.
Install Hadoop Library
First, in your Java project, include the Hadoop dependency in the pom.xml file.
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.0.3</version>
</dependency>
The File Structure
The program will be very basic and contain the following files. The filenames should be self-explanatory.
Main.java
Map.java
Reduce.java
The mapred library VS the mapreduce library
When you are reading other hadoop examples online, you will see them using either the mapred or the mapreduce library. mapred is the older version, while mapreduce is the cleaner and newer version. To upgrade from mapred to mapreduce, read Hadoop - mapred VS mapreduce libraries.
This example will use the org.apache.hadoop.mapreduce library.
EmployeeRecord
We will need to serialize the object of our interest by implementing Writable and DBWritable as shown below.
The Mapper
The Reducer
Main.java
We will hook everything up. The steps are simple.
Create a Job.
Set output format.
Set input format.
Set Mapper class.
Set Reducer class.
Set input. (In our case, it will be from the database)
Set output.
Run the Job via the AWS EMR console
Compile the project and generate a self-contained JAR file. If you are using maven, read Building a JAR Executable with Maven and Spring.
Upload your JAR file to your s3 bucket.
In the AWS EMR console, specify the location of the JAR file.
JAR location: {your_bucket_name}/{jar_name}
Arguments: s3n://{your_bucket_name}/output
The program above takes in the output location as an argument.
Read AWS - Elastic Map Reduce Tutorial for more details on how to create a job flow in EMR.
If you encounter the mysql driver missing error, read Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver.
Tuesday, July 9, 2013
Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver
If you get the "ClassNotFoundException: com.mysql.jdbc.Driver" error while running a JAR on Elastic MapReduce, you will need to copy the mysql connector library into $HADOOP_HOME/lib on each node.
The error will look like:
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:184)
... 20 more
We can copy the mysql connector library to each of the machines by "bootstrapping".
1.) Get the MySQL connector library.
You can download it from the Maven repository.
Create a bucket on S3 and upload the MySQL connector to this bucket.
2.) Writing a bootstrap bash file
Name this file bootstrap.sh. We will use the "hadoop fs" command to copy the connector from S3 to each machine.
Script:
#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib
Upload this script to the same bucket you created in the previous step.
3.) Create a Job Flow
Log in to the AWS EMR console.
Click on create a job flow.
Fill in all the details including your JAR file.
At the last "bootstrap" step, select custom bootstrap action and put in the location of the bootstrap.sh script (ex. s3n://{my_bucket}/bootscript.sh).
Start the job flow and monitor the stderr and stdout. Everything should work.
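It's worth syntax-checking bootstrap.sh locally before uploading it. Here the script body is written out and checked with bash -n; the bucket and jar names are this post's examples:

```shell
# Write out the bootstrap script exactly as it will run on each node.
cat > /tmp/bootstrap.sh <<'EOF'
#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib
EOF

# -n parses the script without executing it, catching shell syntax errors.
bash -n /tmp/bootstrap.sh && echo "syntax ok"
```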
Monday, July 8, 2013
Amazon EMR - RDS DB Security Group
Log in to your AWS RDS console. Select Security Groups on the left sidebar.
Select your DB Security Group and click on the Edit button.
Add the following:
1.) EMR master
Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-master
2.) EMR slave
Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-slave
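The same two entries can be added with the AWS CLI instead of the console, assuming the classic `aws rds authorize-db-security-group-ingress` command (DB security groups predate VPC security groups). The DB security group name and account id below are placeholders; the echo makes this a dry run that only prints the commands.

```shell
# Dry run: print the CLI command for each EMR security group.
# Remove the echo to actually execute against your account.
cmds=$(for group in ElasticMapReduce-master ElasticMapReduce-slave; do
  echo aws rds authorize-db-security-group-ingress \
    --db-security-group-name my-db-group \
    --ec2-security-group-name "$group" \
    --ec2-security-group-owner-id 111122223333
done)
printf '%s\n' "$cmds"
```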
Friday, July 5, 2013
Amazon Elastic MapReduce (EMR) - Unsupported major.minor version 51.0
Amazon Elastic MapReduce (EMR) does not support Java 1.7; compile your JAR with Java 1.6 instead (class file version 51.0 corresponds to Java 7).
Monday, June 17, 2013
AWS - Elastic Map Reduce Tutorial
MapReduce has become a very common technique for parallel computing.
Let's say you have a database table with username and description columns in it, and you want to replace the html tags in the description column with empty spaces. If the database holds petabytes of data, it will take forever for a single machine to do this job.
MapReduce works by distributing the job among multiple machines. Each machine processes a different dataset in parallel, and the outputs are then aggregated. A job that might take days on one machine can finish in minutes.
In this tutorial, we will experiment with Amazon's Elastic MapReduce.
Let's get started.
Create a S3 bucket
Elastic MapReduce uses S3 to store its input and output. We will first create a bucket.
Log into your Amazon Web S3 console. Create a bucket, say my_map_reduce_data. Amazon S3 bucket names need to be unique across all Amazon S3 buckets. It's best to prefix it with your company name.
Create input data
Let's create a text file and put some random data into it. We will create a MapReduce function to count word frequencies.
Ex.
apple apple orange orange orange
pear pear pear pear pear pear pear pineapple pineapple
Label this file input.txt.
Create a folder inside my_map_reduce_data and call it input.
Implementing the mapper function
Download the following file and save it as wordSplitter.py
https://s3.amazonaws.com/elasticmapreduce/samples/wordcount/wordSplitter.py
It's a script that reads the input file line by line and prints the number of occurrences of each distinct word in that line.
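The combined effect of wordSplitter.py and the aggregate reducer can be sketched as a plain shell pipeline over the sample input above. This is only an illustration of the output to expect, not the actual script:

```shell
# Split the sample input into one word per line, then count duplicates.
counts=$(printf 'apple apple orange orange orange\npear pear pear pear pear pear pear pineapple pineapple\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | awk '{print $2, $1}')   # reorder to "word count"

printf '%s\n' "$counts"
```

For the sample data this prints `apple 2`, `orange 3`, `pear 7`, and `pineapple 2`.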
Upload wordSplitter.py to my_map_reduce_data
Launch the Elastic MapReduce Cluster
Sign in to the Elastic MapReduce Console.
Click on Create New Job Flow.
Give the Job Flow Name WordSplitter.
Choose Amazon Distribution for the Hadoop version.
Choose Streaming as the job flow type. You can write the mapper and reducer scripts in any of the following languages: Ruby, Perl, Python, PHP, R, Bash, or C++.
Click Continue.
Input, output locations
Fill in the following:
Input Location: my_map_reduce_data/input
Output Location: my_map_reduce_data/output
Mapper: my_map_reduce_data/wordSplitter.py
Reducer: aggregate
Click Continue.
Configure EC2 Instances
Leave the options as they are.
Click Continue.
Advanced Options
If you want to ssh into the master node, specify the Amazon EC2 Key Pair.
For the Amazon S3 Log Path, put my_map_reduce_data/logs
Check Yes for Enable debugging. It will create an index of the log files in Amazon SimpleDB.
Leave the other boxes as NO.
BootStrap Actions
Proceed without bootstrap actions. Bootstrap actions allow additional software to be installed on the cluster nodes before MapReduce processes any data.
Review the information and start the job. You should see the job being started.
You can monitor the status of the nodes in the MapReduce Web console.
Check the output folder in S3 after the job is completed.
Remember to delete all buckets afterwards to avoid being charged.
Thursday, May 30, 2013
AWS Auto Scaling Part 3 - Auto Scaling Based on Demand
In this post, we will walk through an example that auto scales your group based on CPU utilization. Alternatively, you can make your group scale in or out based on other metrics like memory usage, IO throughput, etc.
We will set up an auto scaling group with elastic load balancing. It will scale out by adding one instance when the CPU utilization is above 80% for 10 minutes, and scale in by removing one instance when the CPU utilization is below 40% for 10 minutes.
There will be 6 things we need to do:
- create a launch configuration
- create an auto scaling group
- create a scale out policy
- create a scale in policy
- create an alarm attached to the scale out policy
- create an alarm attached to the scale in policy
Create a Launch Configuration
We will create a launch configuration called NodeJS.
as-create-launch-config NodeJS --image-id ami-87acc4ee --instance-type m1.large --block-device-mapping="/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdg=snap-15356ee8, /dev/sdb=ephemeral0" --group your_security_group --key your_key_pair
This is the same as what we have at the end of Part 2 - Auto Scaling Based on Fixed Number of Instances.
Create an Auto Scaling Group
We will create an auto scaling group called NodeJSGroup.
as-create-auto-scaling-group NodeJSGroup --launch-configuration NodeJS --availability-zones us-east-1d --min-size 1 --max-size 5 --tag "k=name, v=AsNodeJSProd, p=true" --load-balancers TestNodeBalancer
We have added an elastic load balancer called TestNodeBalancer. You can create a load balancer in the AWS EC2 console.
Check all your available load balancers:
elb-describe-lbs --headers
This is the same as what we have at the end of Part 2 - Auto Scaling Based on Fixed Number of Instances.
Create a Scale out Policy
We will create a scale out policy such that a new instance will be added to the auto scaling group when the CPU utilization reaches 80% or above for over 10 mins. We will label this policy NodeJSScaleOutPolicy.
as-put-scaling-policy NodeJSScaleOutPolicy --auto-scaling-group NodeJSGroup --adjustment=1 --type ChangeInCapacity --cooldown 300
Take note of the Amazon Resource Name (ARN) in the response. You will need it to set up the alarm.
arn:aws:autoscaling:us-east-1:240591131275:scalingPolicy:f0037fff-0949-4123-8887-f6c7064b8253:autoScalingGroupName/NodeJSGroup:policyName/NodeJSScaleOutPolicy
The --cooldown 300 parameter means there must be a 300-second gap before the policy can be applied again.
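The cooldown gate can be illustrated with a tiny sketch (hypothetical pseudologic, not part of the AWS tools):

```python
# Illustration of a 300-second cooldown: a policy trigger is ignored
# unless at least `cooldown` seconds have passed since the last firing.
COOLDOWN = 300

def should_apply(last_applied_at, now, cooldown=COOLDOWN):
    """Return True if the policy may fire again at `now` (both in seconds)."""
    return last_applied_at is None or (now - last_applied_at) >= cooldown

print(should_apply(None, 0))   # True: never fired before
print(should_apply(0, 120))    # False: only 120s have elapsed
print(should_apply(0, 300))    # True: the full cooldown has elapsed
```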
You can check the policy by
as-describe-policies
Create a Scale in Policy
We will create a scale in policy such that an instance will be removed from the auto scaling group when the CPU utilization reaches 40% or below for over 10 mins.
as-put-scaling-policy NodeJSScaleInPolicy --auto-scaling-group NodeJSGroup --adjustment=-1 --type ChangeInCapacity --cooldown 300
Install CloudWatch API
We will need to associate the alarms with the scale in and scale out policies. To do that, we need to install the CloudWatch Command Line Tools.
Associate Alarms with Policies
Make an alarm called NodeJSHighAlarm and associate it with NodeJSScaleOutPolicy.
mon-put-metric-alarm NodeJSHighAlarm --comparison-operator GreaterThanThreshold --evaluation-periods 1 --metric-name CPUUtilization --namespace "AWS/EC2" --period 600 --statistic Average --threshold 80 --alarm-actions arn:aws:autoscaling:us-east-1:240591131275:scalingPolicy:f0037fff-0949-4123-8887-f6c7064b8253:autoScalingGroupName/NodeJSGroup:policyName/NodeJSScaleOutPolicy --dimensions "AutoScalingGroupName=NodeJSGroup"
Make another alarm called NodeJSLowAlarm and associate it with NodeJSScaleInPolicy.
mon-put-metric-alarm NodeJSLowAlarm --comparison-operator LessThanThreshold --evaluation-periods 1 --metric-name CPUUtilization --namespace "AWS/EC2" --period 600 --statistic Average --threshold 40 --alarm-actions arn:aws:autoscaling:us-east-1:240591131275:scalingPolicy:631fd578-9517-42e2-a424-8b1ed8dd0874:autoScalingGroupName/NodeJSGroup:policyName/NodeJSScaleInPolicy --dimensions "AutoScalingGroupName=NodeJSGroup"
Check the status of the alarms:
mon-describe-alarms --headers
Wednesday, May 29, 2013
AWS Command Line Resources
http://aws.amazon.com/developertools
AWS Auto Scaling Part 2 - Auto Scaling Based on Fixed Number of Instances
In part 1 - AWS Auto Scaling Part 1 - Configuring Auto Scaling Command Line Tools, we spun up a new Ubuntu machine and installed the auto scaling command line tools.
We will create two things:
- a launch configuration (defines which AMI to launch)
- an auto scaling group (defines the number of instances to launch, etc.)
Creating a Launch Configuration
We will create a launch configuration.
as-create-launch-config
Choose an AMI of your choice:
as-create-launch-config NodeJS --image-id ami-111111 --instance-type t1.micro
Check the launch configuration:
as-describe-launch-configs --headers
Note: In the AWS EC2 Console, you can create an AMI by right-clicking one of your EC2 instances and clicking Create Image.
Creating an Auto Scaling Group
The auto scaling group takes the following as its parameters:
- name for the group
- a launch configuration
- one or more availability zones
- a minimum group size
- a maximum group size
We will create a group called NodeJSGroup. It will launch the NodeJS configuration we created above. We will use us-east-1d as the availability zone and we want to spin up 1 instance.
as-create-auto-scaling-group NodeJSGroup --launch-configuration NodeJS --availability-zones us-east-1d --min-size 1 --max-size 1
Check the status of the group:
as-describe-auto-scaling-groups --headers
Check the health of the auto scaling instances:
as-describe-auto-scaling-instances --headers
You should see the health of the launched instances. If you don't see any, check the activity log:
as-describe-scaling-activities
Deleting launch configurations and auto scaling groups
We will first remove all the instances from the Auto Scaling Group NodeJSGroup. Then we will delete the launch config and the group.
First update the group setting to terminate all the instances.
as-update-auto-scaling-group NodeJSGroup --min-size 0 --max-size 0
Now delete the group and the launch config.
as-delete-auto-scaling-group NodeJSGroup
as-delete-launch-config NodeJS
A more complicated example (with device mapping, security group, and key pair)
The launch configuration above doesn't take the security groups and block device mappings into account. We will create a more complicated example below.
To check the device-block-mappings of an AMI, you will need to install the EC2 API Tools and use the ec2-describe-images command.
Connect via ssh to the machine you want to use auto scaling on.
Run either of the following commands.
ec2-describe-images -o self
ec2-describe-images
You will get something like the following:
IMAGE ami-17acc4ee 140591131275/nodejs-production-20130522 240591131275 available private x86_64 machine aki-125ea7eb ebs paravirtual xen
For the above, the block device mapping will be:
BLOCKDEVICEMAPPING EBS /dev/sda1 snap-1f356ee2 8 true standard
BLOCKDEVICEMAPPING EBS /dev/sdf snap-18356ee5 20 false standard
BLOCKDEVICEMAPPING EBS /dev/sdg snap-15356ee8 20 false standard
BLOCKDEVICEMAPPING EPHEMERAL /dev/sdb ephemeral0
block-device-mapping=/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdg=snap-15356ee8, /dev/sdb=ephemeral0
Find the instance's security group in the AWS EC2 console:
--group security_group
You will need to specify a key pair to ssh into this instance as well:
--key key_pair
The whole command will be:
as-create-launch-config NodeJS --image-id ami-17acc4ee --instance-type m1.large --block-device-mapping="/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdg=snap-15356ee8, /dev/sdb=ephemeral0" --group security_group --key key_pair
Now create the auto scaling group.
as-create-auto-scaling-group NodeJSGroup --launch-configuration NodeJS --availability-zones us-east-1d --min-size 1 --max-size 5 --tag "k=Name, v=AsNodeJSProd, p=true"
The tag name will propagate to all the instances. If you don't specify a tag, your instances will have no human readable names. Read more about tags here. k=Name must have Name capitalized, else you won't see the human readable names in your AWS EC2 console.
Check the status by using the following commands
as-describe-launch-configs
You should have one machine launched.
as-describe-auto-scaling-groups
as-describe-auto-scaling-instances
Keep in mind that in some newer instances, the volume names are different. For example,
/xvda1 = /sda1
/xvdb = /sdb
/xvdf = /sdf
/xvdg = /sdg
When using the as-create-launch-config command, use the names returned from ec2-describe-images even if the actual volumes have different names.
For the block device mapping, you can specify different kinds of volumes with different sizes. Read Block Device Mapping for more information.
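As an illustration of the mapping string's structure, here is a hypothetical parser for the "device=source" pairs used above (not part of the AWS tools):

```python
# Hypothetical parser for the --block-device-mapping value: comma-separated
# "device=source" pairs, where source is a snapshot ID or an ephemeral store.
def parse_block_device_mapping(mapping):
    result = {}
    for pair in mapping.split(","):
        device, _, source = pair.strip().partition("=")
        result[device] = source
    return result

mapping = "/dev/sda1=snap-1f356ee2, /dev/sdf=snap-18356ee5, /dev/sdb=ephemeral0"
print(parse_block_device_mapping(mapping))
# {'/dev/sda1': 'snap-1f356ee2', '/dev/sdf': 'snap-18356ee5', '/dev/sdb': 'ephemeral0'}
```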
Now let's stop the auto scaling group; otherwise you will keep paying for the running instances.
as-update-auto-scaling-group NodeJSGroup --min-size 0 --max-size 0
In the next post, we will go into auto scaling based on metrics like CPU utilization via the CloudWatch API.
as-delete-auto-scaling-group NodeJSGroup
as-delete-launch-config NodeJS
Friday, May 24, 2013
AWS Auto Scaling Part 1 - Configuring Auto Scaling Command Line Tools
In this post, we will experiment with Amazon's auto scaling service.
In Part 2, we will go through an example that will launch an instance via an auto scaling group.
We will first begin by installing the Auto Scaling Command Line Tools in a new Ubuntu machine.
Connect to your machine by ssh.
Download and Unzip the Auto Scaling Command Line Tools
mkdir /opt/tools
cd /opt/tools
wget http://ec2-downloads.s3.amazonaws.com/AutoScaling-2011-01-01.zip
sudo apt-get install unzip
unzip AutoScaling-2011-01-01.zip
Install Java
Read Install Java OpenJDK 7 on Amazon EC2 Ubuntu.
Setting the environment variables
In your ~/.bashrc file, append the following lines to the end of the file.
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export PATH=$JAVA_HOME/bin:$PATH
Set AWS_AUTO_SCALING_HOME to the location where you unzipped the command line tools.
export AWS_AUTO_SCALING_HOME=/opt/tools/AutoScaling-1.0.61.2
export PATH=$PATH:$AWS_AUTO_SCALING_HOME/bin
Install the Security Credentials
Go to the AWS security console.
Scroll to the Access Credentials section.
Note down an active pair of Access Key ID and Secret Access Key (Click show to see the secret access key).
vi /opt/tools/AutoScaling-1.0.61.2/credential-file-path.template
Paste your keys:
AWSAccessKeyId=
AWSSecretKey=
Append to ~/.bashrc:
export AWS_CREDENTIAL_FILE=/opt/tools/AutoScaling-1.0.61.2/credential-file-path.template
Setting the Auto Scaling Region
By default, the auto scaling region is us-east-1.
If you want to use a different region, you need to change the endpoint to match. Note down the region endpoint here: Regions and Endpoints.
vi ~/.bashrc
export AWS_AUTO_SCALING_URL=https://autoscaling.us-east-1.amazonaws.com
Test your configuration
as-cmd
You should see a panel of commands like the following:
Command Name Description
------------ -----------
as-create-auto-scaling-group Create a new Auto Scaling group.
as-create-launch-config Creates a new launch configuration.
as-create-or-update-tags Create or update tags.
as-delete-auto-scaling-group Deletes the specified Auto Scaling group.
as-delete-launch-config Deletes the specified launch configuration.
as-delete-notification-configuration Deletes the specified notification configuration.
as-delete-policy Deletes the specified policy.
as-delete-scheduled-action Deletes the specified scheduled action.
as-delete-tags Delete the specified tags
as-describe-adjustment-types Describes all policy adjustment types.
as-describe-auto-scaling-groups Describes the specified Auto Scaling groups.
as-describe-auto-scaling-instances Describes the specified Auto Scaling instances.
as-describe-auto-scaling-notification-types Describes all Auto Scaling notification types.
as-describe-launch-configs Describes the specified launch configurations.
as-describe-metric-collection-types Describes all metric colle... metric granularity types.
as-describe-notification-configurations Describes all notification...given Auto Scaling groups.
as-describe-policies Describes the specified policies.
as-describe-process-types Describes all Auto Scaling process types.
as-describe-scaling-activities Describes a set of activit...ties belonging to a group.
as-describe-scheduled-actions Describes the specified scheduled actions.
as-describe-tags Describes tags
as-describe-termination-policy-types Describes all Auto Scaling termination policy types.
as-disable-metrics-collection Disables collection of Auto Scaling group metrics.
as-enable-metrics-collection Enables collection of Auto Scaling group metrics.
as-execute-policy Executes the specified policy.
as-put-notification-configuration Creates or replaces notifi...or the Auto Scaling group.
as-put-scaling-policy Creates or updates an Auto Scaling policy.
as-put-scheduled-update-group-action Creates or updates a scheduled update group action.
as-resume-processes Resumes all suspended Auto... given Auto Scaling group.
as-set-desired-capacity Sets the desired capacity of the Auto Scaling group.
as-set-instance-health Sets the health of the instance.
as-suspend-processes Suspends all Auto Scaling ... given Auto Scaling group.
as-terminate-instance-in-auto-scaling-group Terminates a given instance.
as-update-auto-scaling-group Updates the specified Auto Scaling group.
help
version Prints the version of the CLI tool and the API.
For help on a specific command, type '<command> --help'.
Wednesday, April 17, 2013
Using Munin to monitor EC2 instances on Amazon
After playing around with CloudWatch, I find the interface very confusing to use. The biggest problem is that EC2 instances are described by AMI image ID rather than by my pre-defined machine tag names (let me know in the comments below if you can figure out how to change this).
I was looking at a few monitoring tools (Nagios, Cacti, Munin, Zabbix) and decided to try out Munin. The biggest motivator for me is that Instagram is also using Munin.
Let's begin by spinning up an Ubuntu instance as the Munin master.
Installing Munin Master and Munin Nodes:
Install munin and munin-node
apt-get install munin
apt-get install munin-node
Install apache (for viewing reports from the Web)
apt-get install apache2
For all the instances you want to monitor, install Munin Node.
apt-get install munin-node
For these node instances, we will edit munin-node.conf:
vi /etc/munin/munin-node.conf
Change the host_name. Name this to be something descriptive so you will know what this machine is. The master node will report using this name.
host_name {api1.monetize24hours.com}
Change allow from
allow ^127\.0\.0\.1$
to
allow ^.*$
This allows all internal IPs to connect. Since AWS elastic addresses change all the time, it's better to allow all. Do NOT set it to the instance's external address, else you will be charged for data transfer. Make sure all the machines are behind a firewall.
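The allow directive takes a regular expression matched against the connecting address; the difference between the two patterns can be checked directly (a quick standalone illustration, not Munin code):

```python
import re

# Munin's "allow" directive is a regex matched against the connecting IP.
loopback_only = re.compile(r"^127\.0\.0\.1$")
allow_all = re.compile(r"^.*$")

print(bool(loopback_only.match("127.0.0.1")))  # True
print(bool(loopback_only.match("10.0.0.12")))  # False
print(bool(allow_all.match("10.0.0.12")))      # True
```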
Restart the Munin node.
/etc/init.d/munin-node restart
Repeat the settings above for all the Munin nodes.
Now on the Munin master node, edit /etc/munin/munin.conf. Search for
[localhost.localdomain]
address 127.0.0.1
use_node_name yes
Change it to
[api1.monetize24hours.com]
address ip-00-000-000-000.ec2.internal
This value must match the host name you defined in the Munin node above. The address is the ec2 private address of the Munin node. This is how Munin master will aggregate and report the data.
Showing Data on Webpages
Make sure the Munin master can connect to your Munin nodes.
telnet {private_ec2_address} 4949
Port 4949 is used for Munin inter-node communication.
If it doesn't connect, add port 4949 to the Munin node's security group.
You can find the Munin master's security group name by clicking on the Security Group and checking the Details tab. It looks something like "sg-e0000000".
Now edit /etc/munin/munin.conf to tweak the log and graph generation directories.
bdir /var/lib/munin
htmldir /var/www/munin
logdir /var/log/munin
rundir /var/run/munin
Change the above directories. Create them if they don't exist. Make sure you set the appropriate permissions for the directories.
Wait for 5 to 10 minutes. The Perl cron will gather data.
Access the graphs by
{public_ec2_address}/munin
You will want to secure the webpages so no one else can access them. Either secure them by IP or by username and password.
We will use .htaccess in the following example.
htpasswd -c /etc/apache2/.munin_htpasswd admin
Create /var/www/munin/.htaccess and put the following:
AuthUserFile /etc/apache2/.munin_htpasswd
AuthGroupFile /dev/null
AuthName EnterPassword
AuthType Basic
Require valid-user
Edit /etc/apache2/sites-available/default.
Change AllowOverride None to AllowOverride All
Restart apache.
service apache2 restart
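As an aside on what the .htaccess protection above does on the wire: the browser sends an Authorization header containing the base64-encoded credentials. A quick sketch (the credentials below are made up):

```python
import base64

# HTTP Basic Auth: the browser sends "Authorization: Basic base64(user:pass)".
username, password = "admin", "s3cret"  # hypothetical credentials
token = base64.b64encode(f"{username}:{password}".encode()).decode()
print(f"Authorization: Basic {token}")
```

Note that base64 is an encoding, not encryption, which is why serving such pages over plain HTTP to the public internet is risky.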
Tuesday, April 16, 2013
Using Amazon CloudWatch Command Line Tool to record metrics
Introduction
Amazon CloudWatch provides seamless integration for monitoring AWS resources like EC2 instances, RDS instances, EBS volumes, etc based on CPU utilization, data transfer and disk usage.
There are two types of monitoring: basic and detailed. Basic Monitoring reports at a five-minute frequency. Detailed Monitoring reports at a one-minute frequency and also aggregates data by AMI ID and instance type.
Monitoring data is retained for two weeks, even if your instance is terminated.
Below is the resource-to-metrics mapping (for example, CloudWatch tracks the request count and latency of Elastic Load Balancers):
ELB - request count, latency
EBS - read/write latency
RDS - freeable memory, available storage space
SQS - number of messages sent and received
You can also send custom metrics to CloudWatch by using the Put API.
You can view your stats in the AWS Management Console.
For overall status of all AWS services, check AWS Service Health Dashboard.
Setting up Amazon CloudWatch Command Line Tool
Spin up an EC2 instance. (Skip this if you are using your home computer.)
Begin by downloading the CloudWatch CLI Tool.
mkdir /opt/tools/aws
cd /opt/tools/aws
wget http://ec2-downloads.s3.amazonaws.com/CloudWatch-2010-08-01.zip
Install zip and unzip the package.
sudo apt-get install zip
unzip CloudWatch-2010-08-01.zip
Check if you have Java installed:
java -version
If not, read Install Java OpenJDK 7 on Amazon EC2 Ubuntu.
Set the AWS_CLOUDWATCH_HOME path in ~/.bashrc:
export AWS_CLOUDWATCH_HOME=/opt/tools/aws/CloudWatch-1.0.13.4
export PATH=$PATH:$AWS_CLOUDWATCH_HOME/bin
Enter your AWS Access Key ID and Secret Access Key in the file $AWS_CLOUDWATCH_HOME/bin/credential-file-path.template. You can find your credentials in the AWS Management Console.
AWSAccessKeyId=
AWSSecretKey=
chmod 600 credential-file-path.template
Rename credential-file-path.template to something else (ex. aws_credentials)
Move this file to somewhere else. You may be using this in some other service. For instance, move to /opt/tools/aws.
Add this entry to ~/.bashrc
export AWS_CREDENTIAL_FILE=/opt/tools/aws/aws_credentials
Reload ~/.bashrc:
source ~/.bashrc
Test the tool:
mon-cmd
You should see the following:
Command Name Description
------------ -----------
help
mon-delete-alarms Delete alarms
mon-describe-alarm-history Describe alarm history
mon-describe-alarms Describe alarms fully.
mon-describe-alarms-for-metric Describe all alarms associated with a single metric
mon-disable-alarm-actions Disable all actions for a given alarm
mon-enable-alarm-actions Enable all actions for a given alarm
mon-get-stats Get metric statistics
mon-list-metrics List user's metrics
mon-put-data Put metric data
mon-put-metric-alarm Create a new alarm or update an existing one
mon-set-alarm-state Manually set the state of an alarm
version Prints the version of the CLI tool and the API.
Publish data points to CloudWatch
CloudWatch allows you to publish data points via PUT requests. Timestamps must be in UTC and within the past two weeks (only data from the past two weeks is retained).
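The UTC/two-week acceptance rule can be sketched as a quick validity check (hypothetical helper, not part of the CloudWatch tools):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical check mirroring the rule above: a data point needs a
# timezone-aware UTC timestamp no older than two weeks.
def is_accepted(ts, now):
    return ts.tzinfo is not None and (now - ts) <= timedelta(weeks=2)

now = datetime(2013, 4, 16, 22, 0, tzinfo=timezone.utc)
print(is_accepted(datetime(2013, 4, 16, 20, 30, tzinfo=timezone.utc), now))  # True
print(is_accepted(datetime(2013, 3, 1, tzinfo=timezone.utc), now))           # False
```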
In this example, you will feed CloudWatch with some custom data points.
Execute the following data sets. But substitute the date below to be within a few hours before.
Set A (4 data points):
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 87 -u Milliseconds
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 51 -u Milliseconds
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 125 -u Milliseconds
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T20:30:00Z -v 235 -u Milliseconds
Set B (instead of sending individual data points, send the sum, min, max, and sample count):
mon-put-data -m RequestLatency -n "Test001" -t 2013-04-16T21:30:00Z -s "Sum=577,Minimum=65,Maximum=189,SampleCount=5" -u Milliseconds
Set C:
mon-put-data -m RequestLatency -n "Test001" -s "Sum=806,Minimum=47,Maximum=328,SampleCount=6" -u Milliseconds
These are latency data points spanning about three hours. Just think of them as sample data.
Let's get the data summary:
mon-get-stats -m RequestLatency -n "Test001" -s "Average" --start-time 2013-04-16T19:30:00Z --headers
Results:
Time                 Average             Unit
2013-04-16 20:30:00  124.5               Milliseconds
2013-04-16 21:30:00  115.4               Milliseconds
2013-04-16 22:29:00  134.33333333333334  Milliseconds
You can also see the visual representation in the AWS Management Console.
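The averages in the results table follow directly from the three data sets above; a quick check of the arithmetic (plain Python, not a CloudWatch call):

```python
# Recompute the per-hour averages CloudWatch reported for the data above.
set_a = [87, 51, 125, 235]              # four individual points at 20:30
set_b = {"Sum": 577, "SampleCount": 5}  # aggregated put at 21:30
set_c = {"Sum": 806, "SampleCount": 6}  # aggregated put, current hour

print(sum(set_a) / len(set_a))              # 124.5
print(set_b["Sum"] / set_b["SampleCount"])  # 115.4
print(set_c["Sum"] / set_c["SampleCount"])  # 134.33333333333334
```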
Login to CloudWatch.
Click on Metrics in the Left Panel. Select Test001 in the "Viewing" Dropdown box.
You can also create alarms based on this metric.
This concludes the tutorial. If you are interested in more advanced tools, check out this post - Using Munin to monitor EC2 instances on Amazon.
Monday, February 25, 2013
AWS Java - Securing S3 content using query string authentication
Amazon S3 is a highly available and durable hosting environment that can let you serve websites, images, and large files. Sometimes, you may want to secure your content so only you or your authenticated users can access it. This becomes more important when it's paid content.
This post is about using query string authentication to make the content to be available for a specified period of time.
Specs:
- Java 1.7
- Eclipse Juno
Before you begin, make sure you have all the AWS Eclipse tools ready. Read Using Java AWS SDK to upload files to Amazon S3 for how to install the AWS SDK tool and a basic guide on how to upload, delete and retrieve files on S3.
Signing the request will require the following structure:
Authorization = "AWS" + " " + AWSAccessKeyId + ":" + Signature;
Signature = Base64( HMAC-SHA1( YourSecretAccessKeyID, UTF-8-Encoding-Of( StringToSign ) ) );
StringToSign = HTTP-Verb + "\n" +
    Content-MD5 + "\n" +
    Content-Type + "\n" +
    Date + "\n" +
    CanonicalizedAmzHeaders +
    CanonicalizedResource;
CanonicalizedResource = [ "/" + Bucket ] +
    [ sub-resource, if present. For example "?acl", "?location", "?logging", or "?torrent" ];
CanonicalizedAmzHeaders =
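A minimal sketch of the signature arithmetic in Python (the secret key, bucket, object name, and expiry timestamp below are made up; the post itself builds this via the Java SDK):

```python
import base64
import hashlib
import hmac

# Hypothetical values. For query string authentication, the Date slot
# of StringToSign carries the Expires timestamp instead of a Date header.
secret_key = b"EXAMPLE-SECRET-KEY"
string_to_sign = "GET\n\n\n1366147200\n/mybucket/myfile.pdf"

digest = hmac.new(secret_key, string_to_sign.encode("utf-8"), hashlib.sha1).digest()
signature = base64.b64encode(digest).decode("ascii")
print(signature)  # a 28-character base64-encoded HMAC-SHA1 signature
```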
Assuming that you either have read the post above or you have implemented upload, upload a file to your Amazon S3 account.
In the AWS Management console, set the file's ACL permissions to your administrative account only. (By default, it should already be, if you didn't programmatically change the ACL permissions.)
We will implement the following function called getS3Url().
We have set the expiration date to be one hour later. You would see the following expiration message an hour later.
<Error>
  <Code>AccessDenied</Code>
  <Message>Access Denied</Message>
  <RequestId>8ECB67C2458CE483</RequestId>
  <HostId>vL6wXNOkvYlpHXbvvlG1SGhy3q/+Ocb3guXtyaDZjmEu24Z4XQpwjfmNAvM+SViz</HostId>
</Error>
Thursday, February 21, 2013
Using Java AWS SDK to upload files to Amazon S3
Amazon S3 is a highly available and durable storage service suitable for storing large files that do not change frequently. This post will focus on how to upload files programmatically via the Java Amazon SDK. For an introduction to S3, read What is Amazon Simple Storage Service (Amazon S3)?
My specs:
- Eclipse Juno
- SpringMVC 3.1.x
- Maven 3.0.x
Install AWS Toolkit
In eclipse, click on help in the menu bar and then "Install New Software".
In the "Work with:" input box, put "http://aws.amazon.com/eclipse" and click Add...
Check on the AWS Toolkit for Eclipse and click Yes to install all the tools.
In the Eclipse toolbar, you will see a red cube icon. Click on the down arrow next to this icon. Click Preference.
Fill in your Access Key ID and Secret Access Key. Give it an Account Name (Ex. use your email). You can find your keys in the Amazon Management Console (My Account/Console -> Security Credentials). Click on Apply and OK.
In the Eclipse menu bar, click on Window -> Preferences. Expand the AWS Toolkit. Right click on your key. Click Select Private Key File. Associate it with your private key. Click OK.
Click on the down arrow next to the Amazon cube icon. Select Show AWS Explorer View. You should be able to see the Amazon S3 service and all your related buckets (if you have any).
Download and Install the AWS SDK for Java
You can download it here. Click on the AWS SDK for Java button.
Extract the file. Code Samples are located in /samples.
If you are using Maven, you can add the AWS SDK as a dependency in the pom.xml file.
< dependency>
< groupId>com.amazonaws</ groupId>
< artifactId>aws-java-sdk</ artifactId>
< version>1.3.32</ version>
< /dependency>
Choose the version you want here.
Alternatively, you can just add it as a library (Right Click on the project -> Java Build Path -> Libraries -> Add External JARs).
Running the default AWS Sample Apps
We will begin by setting up a sample project that you can check out how S3 works.
Click on the down arrow next to the Amazon icon.
Select New AWS Java Project.
Give a Project name.
Select your account.
Select Amazon S3 Sample, Amazon S3 Transfer Progress Sample, and AWS Console Application. Click Next.
Expand the newly created project. Left click on the AwsConsoleApp.java. In the Eclipse menu bar, click on Run -> Run.
You should see output like the following:
If you run the S3Sample.java, you will get the following:
AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
String bucketName = "my-s3-bucket-" + UUID.randomUUID();
s3.createBucket(bucketName);
s3.deleteBucket(bucketName);
Exceptions
Whenever you call any of the AWS API, you should surround the calls with try and catch clauses like the following:
try{
// AWS requests here
If you interested in securing your S3 contents for your authenticated users only, check out AWS Java - Securing S3 content using query string authentication.
My specs:
- Eclipse Juno
- SpringMVC 3.1.x
- Maven 3.0.x
Install AWS Toolkit
In Eclipse, click Help in the menu bar and then "Install New Software...".
In the "Work with:" input box, enter "http://aws.amazon.com/eclipse" and click Add...
Check AWS Toolkit for Eclipse and click Yes to install all the tools.
In the Eclipse toolbar, you will see a red cube icon. Click on the down arrow next to this icon. Click Preference.
Fill in your Access Key ID and Secret Access Key, and give the account a name (e.g. your email). You can find your keys in the AWS Management Console (My Account/Console -> Security Credentials). Click Apply and OK.
In the Eclipse menu bar, click on Window -> Preferences. Expand the AWS Toolkit. Right click on your key. Click Select Private Key File. Associate it with your private key. Click OK.
Click on the down arrow next to the Amazon cube icon. Select Show AWS Explorer View. You should be able to see the Amazon S3 service and all your related buckets (if you have any).
Download and Install the AWS SDK for Java
You can download it here. Click on the AWS SDK for Java button.
Extract the file. Code Samples are located in /samples.
If you are using Maven, you can add the AWS SDK as a dependency in the pom.xml file.
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.3.32</version>
</dependency>
Choose the version you want here.
Alternatively, you can just add it as a library (Right Click on the project -> Java Build Path -> Libraries -> Add External JARs).
Running the default AWS Sample Apps
We will begin by setting up a sample project so you can see how S3 works.
Click on the down arrow next to the Amazon icon.
Select New AWS Java Project.
Give a Project name.
Select your account.
Select Amazon S3 Sample, Amazon S3 Transfer Progress Sample, and AWS Console Application. Click Next.
Expand the newly created project and select AwsConsoleApp.java. In the Eclipse menu bar, click Run -> Run.
You should see output like the following:
===========================================
Welcome to the AWS Java SDK!
===========================================
You have access to 3 Availability Zones.
You have 14 Amazon EC2 instance(s) running.
You have 0 Amazon SimpleDB domain(s) containing a total of 0 items.
You have 8 Amazon S3 bucket(s), containing 71841 objects with a total size of 224551364 bytes.
If you run the S3Sample.java, you will get the following:
===========================================
Getting Started with Amazon S3
===========================================
Creating bucket my-first-s3-bucket-39065c55-2ee5-413a-9de1-6814dbb253c1
Listing buckets
- my-first-s3-bucket-39065c55-2ee5-413a-9de1-6814dbb253c1
Uploading a new object to S3 from a file
Downloading an object
Content-Type: text/plain
abcdefghijklmnopqrstuvwxyz
01234567890112345678901234
!@#$%^&*()-=[]{};':',.<>/?
01234567890112345678901234
abcdefghijklmnopqrstuvwxyz
Listing objects
- MyObjectKey (size = 135)
Deleting an object
Deleting bucket my-first-s3-bucket-39065c55-2ee5-413a-9de1-6814dbb253c1
Integrate the S3 SDK
To begin, you need the file AwsCredentials.properties at the root of your classpath. You can copy the one generated for the sample project, or create one with the following content:
secretKey=
accessKey=
Create an authenticated S3 object:
AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
Objects in S3 are stored in buckets. Each bucket name is globally unique: you cannot create a bucket with a name another user has already taken. Each bucket holds key/value pairs that you can define however you want.
Create a bucket:
String bucketName = "my-s3-bucket-" + UUID.randomUUID();
s3.createBucket(bucketName);
For readability, I have skipped the exception handling here; I will come back to it at the end. The bucket name must conform to DNS naming rules. I usually name mine after my domain name.
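As a rough illustration, a simplified check for DNS-compliant names could look like this (this sketch covers only the basic length and character rules; the full S3 restrictions also forbid IP-address-style names, consecutive periods, and more):

```java
// Simplified sketch of an S3 DNS-compliant bucket name check.
// Not the full rule set; intended for illustration only.
public class BucketNameCheck {
    public static boolean isDnsCompliant(String name) {
        // 3-63 characters of lowercase letters, digits, hyphens, and
        // periods; must start and end with a letter or digit.
        return name.matches("[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]");
    }

    public static void main(String[] args) {
        System.out.println(isDnsCompliant("downloads.example.com")); // true
        System.out.println(isDnsCompliant("MyBucket"));              // false: uppercase
    }
}
```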
Delete a bucket:
s3.deleteBucket(bucketName);
List all buckets:
for (Bucket bucket : s3.listBuckets()) {
System.out.println(" - " + bucket.getName());
}
Save an object in a bucket:
String key = "myObjectKey";
PutObjectRequest putObject = new PutObjectRequest(bucketName, key, myFile);
s3.putObject(putObject);
Here, myFile is a java.io.File.
Delete an object:
s3.deleteObject(bucketName, key);
Get/Download an object:
String key = "myObjectKey";
GetObjectRequest getObject = new GetObjectRequest(bucketName, key);
S3Object object = s3.getObject(getObject);
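The S3Object returned by getObject() exposes its data as an InputStream via getObjectContent(). A small helper to read that stream into a String might look like the following (readStream is a hypothetical helper name; only getObjectContent() comes from the SDK):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

public class StreamUtil {
    // Reads an InputStream (e.g. the one from object.getObjectContent())
    // line by line into a String. Close S3 object streams promptly, or
    // the underlying HTTP connection stays open.
    public static String readStream(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Usage would then be: String content = StreamUtil.readStream(object.getObjectContent());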
List objects by prefix:
ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
.withBucketName(bucketName)
.withPrefix("My"));
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
System.out.println(" - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() + ")");
}
Uploading large files
Use TransferManager whenever possible. It makes use of S3 multipart uploads to achieve enhanced throughput, performance, and reliability. It uses multiple threads to upload multiple parts of a single upload at once.
AWSCredentials myCredentials = new BasicAWSCredentials(...);
TransferManager tx = new TransferManager(myCredentials);
Upload myUpload = tx.upload(myBucket, myFile.getName(), myFile);
while (!myUpload.isDone()) {
System.out.println("Transfer: " + myUpload.getDescription());
System.out.println(" - State: " + myUpload.getState());
System.out.println(" - Progress: " + myUpload.getProgress().getBytesTransfered());
// Do work while we wait for our upload to complete...
Thread.sleep(500);
}
Exceptions
Whenever you call any of the AWS APIs, you should surround the calls with try and catch clauses like the following:
try{
// AWS requests here
} catch (AmazonServiceException ase) {
System.out.println("Caught an AmazonServiceException, which means your request made it "
+ "to Amazon S3, but was rejected with an error response for some reason.");
System.out.println("Error Message: " + ase.getMessage());
System.out.println("HTTP Status Code: " + ase.getStatusCode());
System.out.println("AWS Error Code: " + ase.getErrorCode());
System.out.println("Error Type: " + ase.getErrorType());
System.out.println("Request ID: " + ase.getRequestId());
} catch (AmazonClientException ace) {
System.out.println("Caught an AmazonClientException, which means the client encountered "
+ "a serious internal problem while trying to communicate with S3, "
+ "such as not being able to access the network.");
System.out.println("Error Message: " + ace.getMessage());
}
If you are interested in securing your S3 content for authenticated users only, check out AWS Java - Securing S3 content using query string authentication.
Saturday, February 9, 2013
Micro Instance out of memory - add swap
While updating my Symfony project's database schema and assets, I got the following error:
Fatal error: Uncaught exception 'ErrorException' with message 'Warning: proc_open(): fork failed - Cannot allocate memory'
An Amazon EC2 t1.micro instance has only 613 MB of RAM, which is not enough to run many processes at once.
My options are to 1) switch to a small instance or 2) add a 1 GB swap file on disk.
Here are the commands to add the swap:
sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 34.1356 s, 31.5 MB/s
sudo /sbin/mkswap /var/swap.1
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=9cffd7c9-8ec6-4f6c-8eea-79aa3173a59a
sudo /sbin/swapon /var/swap.1
To turn off the swap do the following:
sudo /sbin/swapoff /var/swap.1
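The swap file above will not survive a reboot. Assuming you want it re-enabled automatically at boot, you can add an entry for it to /etc/fstab (the path must match your swap file):

```
# /etc/fstab -- enable the swap file at boot
/var/swap.1   swap   swap   defaults   0   0
```

After editing /etc/fstab, `sudo swapon -a` enables all listed swap areas, and `free -m` shows the new swap total.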