Wednesday, July 17, 2013

Ansible EC2 - setting up Nginx, MySQL, php, git

In this post, we will write a playbook that sets up an EC2 machine with a fully working php environment.

Starting from a fresh machine with an attached ebs volume, we will do the following:

  1. Format the new ebs volume with XFS and mount it as /vol
  2. Install php, mysql and nginx
  3. Create a mysql user and create a database
  4. Copy the public and private keys into the targeted machine
  5. Checkout a project from github

Begin by spinning up a fresh EC2 instance and attaching an ebs volume to it. Read Ansible - how to launch EC2 instances and setup the php environment.


Format the new ebs volume with XFS and mount it as /vol

We will format the new ebs volume /dev/xvdf with XFS and mount it as /vol.

    - name: update machine with latest packages
      action: command yum -y update
    - name: install xfsprogs
      action: yum pkg=xfsprogs state=latest
    - name: format new volume
      filesystem: fstype=xfs dev=/dev/xvdf
    - name: edit fstab and mount the vol
      action: mount name={{mount_dir}} src=/dev/xvdf opts=noatime fstype=xfs state=mounted


Install php, mysql and nginx

    - name: install php
      action: yum pkg=php state=latest
    - name: install php-mysql
      action: yum pkg=php-mysql state=latest
    - name: install nginx
      action: yum pkg=nginx state=latest
    - name: ensure nginx is running
      action: service name=nginx state=started
    - name: install mysql server
      action: yum pkg=mysql-server state=latest
    - name: make sure mysql is running
      action: service name=mysqld state=started


Create a mysql user and a database

    - name: install python mysql
      action: yum pkg=MySQL-python state=latest
    - name: create database user
      action: mysql_user user=admin password=1234qwer priv=*.*:ALL state=present
    - name: create db
      action: mysql_db db=ansible state=present


Copy the public and private keys into the targeted machine

We want the target machine to be able to do a git pull without username and password prompts.

mkdir ~/.ssh
ssh-keygen -t rsa -C "you@email.com"

You will see:
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Just press Enter on the above prompts.

Two files will be generated: id_rsa (the private key) and id_rsa.pub (the public key)

Log in to GitHub, then go to Account Settings -> SSH Keys

Add a new key by giving it a name and pasting in the contents of id_rsa.pub

Test it by:
ssh -T git@github.com
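If the key is set up correctly, GitHub replies with a greeting similar to:
Hi {your_username}! You've successfully authenticated, but GitHub does not provide shell access.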
Here are the Ansible tasks:

    - name: install git
      action: yum pkg=git state=latest
    - name: copy private key
      action: template src=~/.ssh/id_rsa dest=~/.ssh/id_rsa mode=0600
    - name: copy public key
      action: template src=~/.ssh/id_rsa.pub dest=~/.ssh/id_rsa.pub


Checkout a project from github

    - name: git checkout source
      action: git repo=git@github.com:{your_git_repo}.git dest={{work_dir}} version=unstable


Full Ansible Playbook source:
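A minimal sketch of how the tasks above could fit together into one playbook. The webservers group, the ec2-user/sudo settings, and the mount_dir and work_dir values are assumptions you should adjust to your own setup:

---
- hosts: webservers
  user: ec2-user
  sudo: yes
  vars:
    mount_dir: /vol       # where the xfs volume gets mounted
    work_dir: /vol/www    # where the git project gets checked out (assumed path)
  tasks:
    - name: update machine with latest packages
      action: command yum -y update
    - name: install xfsprogs
      action: yum pkg=xfsprogs state=latest
    - name: format new volume
      filesystem: fstype=xfs dev=/dev/xvdf
    - name: edit fstab and mount the vol
      action: mount name={{mount_dir}} src=/dev/xvdf opts=noatime fstype=xfs state=mounted
    # ... followed by the php/nginx/mysql, mysql user/db, ssh key, and git checkout tasks from the sections above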

Tuesday, July 16, 2013

Ansible - how to launch EC2 instances and setup the php environment

In this post, we will create a script that launches an instance in the EC2 cloud and installs php and nginx on it (installing httpd would be very similar).

First, you will need to set up Ansible.

If you are using ubuntu, read Install Ansible on ubuntu EC2.

If you are using a Mac, read Installing and Running Ansible on Mac OSX and pinging ec2 machines.

You must:
  • have python boto installed
  • set up the AWS access keys in the environment settings

Adding a host

We will use the ec2 module. It runs against localhost, so we will add a host entry.

vi /etc/ansible/hosts

Append the following:

localhost ansible_connection=local

Launching a micro instance



Save the following playbook as launch_playbook.yml.
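A minimal sketch is shown below; the key pair, security group, and AMI id are placeholders, and the exact ec2 module parameters can differ slightly between Ansible versions, so check ansible-doc ec2:

---
- name: create instances
  hosts: localhost
  gather_facts: false
  tasks:
    - name: launch a micro instance
      action: ec2 keypair={your_keypair} group={your_security_group} instance_type=t1.micro image={your_ami_id} region=us-east-1 wait=yes instance_tags='{"Name":"ansible"}'
      register: ec2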

Execute the script.
ansible-playbook launch_playbook.yml
In your AWS EC2 console, you will see an instance named ansible. Each task is executed in sequence.

Now add this new host to the Ansible hosts file under a group labeled webservers.

vi /etc/ansible/hosts
[webservers]
{the_ip_of_ec2_instance_we_just_created} ansible_connection=ssh ansible_ssh_user=ec2-user ansible_ssh_private_key_file={path_to_aws_private_key}
You don't have to do the above. You could instead use the group name "ec2-servers" in the following script, but then that script would have to live in the same file as the first one. I am separating the files to make future configuration easier.


Installing php, nginx, mysql

Save the following playbook as configure_playbook.yml.
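A minimal sketch, reusing the yum and service tasks from the previous post and assuming the ec2-user login with sudo:

---
- name: configure instances
  hosts: webservers
  user: ec2-user
  sudo: yes
  tasks:
    - name: install nginx
      action: yum pkg=nginx state=latest
    - name: ensure nginx is running
      action: service name=nginx state=started
    - name: install php
      action: yum pkg=php state=latest
    - name: install php-mysql
      action: yum pkg=php-mysql state=latest
    - name: install mysql server
      action: yum pkg=mysql-server state=latest
    - name: make sure mysql is running
      action: service name=mysqld state=started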

Execute the script.
ansible-playbook configure_playbook.yml
Go to the public address of this instance in a browser. You should see the nginx welcome page.

Remember to terminate the instance when you are finished, or it will keep incurring charges.

Install Ansible on ubuntu EC2

Begin by spinning up a new Ubuntu EC2 instance.


Install Ansible and its dependencies
sudo apt-get install python-pip python-dev
sudo pip install ansible
sudo apt-get install python-boto 
Make sure the boto version is greater than 2.3.

To check boto version:
pip freeze | grep boto

Make the hosts file
sudo mkdir /etc/ansible
sudo touch /etc/ansible/hosts
Put the IPs of your machines in the hosts file.

Ex. [webservers] is a group name for the 2 IPs below.
[webservers]
255.255.255.255
111.111.111.111

Check the Playbook Settings

ansible-playbook playbook.yml --list-hosts

You will see the servers that the Playbook will run against:

  play #1 (create instances): host count=1
    localhost

  play #2 (configure instances): host count=0


Play the Playbook

ansible-playbook playbook.yml


AWS credentials

If you are going to use the ec2 module, you will need to set up the access keys in your environment.
vi ~/.bashrc
Append the following, substituting your own keys (log in to your AWS console to get the access key pair):
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY}
export AWS_SECRET_ACCESS_KEY=${AWS_SECRET_KEY}

Saturday, July 13, 2013

Installing and Running Ansible on Mac OSX and pinging ec2 machines

We will install Ansible and its dependencies using pip.


Install Ansible

Download ez_setup.py.

Install ez_setup, pip, jinja2 and ansible.
sudo python ez_setup.py
sudo easy_install pip
sudo pip install ansible
sudo pip install jinja2

Define your host file

Create the file /etc/ansible/hosts.

Put the IP of each machine you want to ping.

Example:
[appservers]
255.255.255.255 ansible_ssh_private_key_file={your_key_path}.pem  ansible_ssh_user=ec2-user
Change the IP to your EC2 instance's IP. [appservers] is just a label for grouping. You may have servers grouped as web servers, app servers, db servers, etc.


Run Ansible

ansible all -m ping

You will see a response similar to the following if it's successful.
255.255.255.255 | success >> {
    "changed": false,
    "ping": "pong"
}
Let's execute a command on all the machines:

ansible all -a "/bin/echo hello"

You will see:
255.255.255.255 | success | rc=0 >>
hello

Saving the key in memory

If you don't specify the ansible_ssh_private_key_file and ansible_ssh_user attributes in the inventory file above, you can either 1.) specify the key and user in the ansible command or 2.) use ssh-agent.

1.) Explicitly specifying the user and key:
ansible all -m ping -u ec2-user --private-key={your_key}.pem
2.) Using ssh-agent and ssh-add
ssh-agent bash
ssh-add ~/.ssh/{your_key}.pem
Then you can ping the ec2 server like this:
ansible all -m ping -u ec2-user

Wednesday, July 10, 2013

AWS Elastic MapReduce - EMR MySQL DBInputFormat

In this post, we will build a MapReduce program as a JAR executable. To make this example more interesting than most of the other online posts out there, we will modify the common WordCount example to fetch from MySQL instead of a text file.

You will need to understand at least the basics of what the mapper and the reducer do to follow this post. You may want to read this from Apache.

We will use Maven to build the project. If you have no idea how to do this, read Building a JAR Executable with Maven and Spring. We will feed this JAR via the Amazon Elastic MapReduce (EMR) and save the output in Amazon S3.

Here are the EMR supported Hadoop Versions. We will be using 1.0.3.


What we will do:

Assume we have a database called Company and there is a table called Employee with two columns: id and title.

We will count the number of employees with the same titles.

This is the same as the WordCount example you see in other tutorials, except that we fetch the input from a database.


Install Hadoop Library

First, include the Hadoop library in your Java project's pom.xml file.

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <version>1.0.3</version>
</dependency>


The File Structure

The program will be very basic and contain the following files. The filenames should be self-explanatory.

Main.java
Map.java
Reduce.java


The mapred library VS the mapreduce library

When you are reading other hadoop examples online, you will see them using either the mapred or the mapreduce library. mapred is the older version, while mapreduce is the cleaner and newer version. To upgrade from mapred to mapreduce, read Hadoop - mapred VS mapreduce libraries.

This example will use the org.apache.hadoop.mapreduce library.


EmployeeRecord

We will need to serialize the object of our interest by implementing Writable and DBWritable as shown below.
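A minimal sketch of what EmployeeRecord could look like; the fields follow the id and title columns of the Employee table:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class EmployeeRecord implements Writable, DBWritable {
    private long id;
    private String title;

    // Writable: how Hadoop serializes the record between nodes
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        title = in.readUTF();
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(title);
    }

    // DBWritable: how the record maps to the Employee table columns
    public void readFields(ResultSet resultSet) throws SQLException {
        id = resultSet.getLong("id");
        title = resultSet.getString("title");
    }

    public void write(PreparedStatement statement) throws SQLException {
        statement.setLong(1, id);
        statement.setString(2, title);
    }

    public String getTitle() {
        return title;
    }
}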




The Mapper
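A minimal sketch of the mapper: for every EmployeeRecord read from MySQL it emits the pair (title, 1), mirroring the classic WordCount mapper:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, EmployeeRecord, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text title = new Text();

    @Override
    protected void map(LongWritable key, EmployeeRecord value, Context context)
            throws IOException, InterruptedException {
        // DBInputFormat hands us a row id as the key and an EmployeeRecord as the value
        title.set(value.getTitle());
        context.write(title, ONE);
    }
}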




The Reducer
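A minimal sketch of the reducer, summing the ones emitted for each title:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}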




Main.java

We will hook everything up in Main.java. The steps are simple, and a sketch follows the list below.

Create a Job.
Set output format.
Set input format.
Set Mapper class.
Set Reducer class.
Set input. (In our case, it will be from the database)
Set output.
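
A minimal sketch of Main.java; the JDBC host, user, and password are placeholders you must replace with your own MySQL connection details:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Main {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // JDBC connection details for the Company database (placeholders)
        DBConfiguration.configureDB(conf,
                "com.mysql.jdbc.Driver",
                "jdbc:mysql://{your_db_host}:3306/Company",
                "{db_user}", "{db_password}");

        Job job = new Job(conf, "employee title count");
        job.setJarByClass(Main.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input: rows of the Employee table, ordered by id
        job.setInputFormatClass(DBInputFormat.class);
        DBInputFormat.setInput(job, EmployeeRecord.class,
                "Employee", null, "id", new String[] { "id", "title" });

        // Output: the location passed as the first argument (an s3n:// path on EMR)
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}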


Run the Job via the AWS EMR console

Compile the project and generate a self-contained JAR file. If you are using maven, read Building a JAR Executable with Maven and Spring.

Upload your JAR file to your s3 bucket.

In the AWS EMR console, specify the location of the JAR file.

JAR location: {your_bucket_name}/{jar_name}

Arguments: s3n://{your_bucket_name}/output

The program above takes in the output location as an argument.

Read AWS - Elastic Map Reduce Tutorial for more details on how to create a job flow in EMR.

If you encounter the mysql driver missing error, read Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver.

Tuesday, July 9, 2013

Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver

If you get the "ClassNotFoundException: com.mysql.jdbc.Driver" error while running a JAR job on Elastic MapReduce, you will need to copy the mysql connector library into Hadoop's lib directory on each node.

The error will look like:

Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:184)
... 20 more

We can copy the mysql connector library to each of the machines by "bootstrapping".


1.) Get the MySQL connector library.

You can download it from the Maven repository.

Create a bucket on S3 and upload the MySQL connector JAR to this bucket.


2.) Writing a bootstrap bash file

Name this file bootstrap.sh. We will use the "hadoop fs" command to copy the connector from S3 to each machine.

Script:
#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib
Upload this script to the same bucket you created in the previous step.


 3.) Create a Job Flow

Log in to the AWS EMR console.

Click on create a job flow.

Fill in all the details including your JAR file.

At the last "bootstrap" step, select custom bootstrap action and put in the location of the bootstrap.sh script (ex. s3n://{my_bucket}/bootstrap.sh).

Start the job flow and monitor the stderr and stdout. Everything should work.

Hadoop - mapred VS mapreduce libraries

When you start to work with Hadoop, you may notice that online tutorials use one of two libraries: mapred or mapreduce. Use the mapreduce library; mapred is the older version.

To upgrade to the mapreduce library, check out the following slideshows from Yahoo:

Monday, July 8, 2013

Amazon EMR - RDS DB Security Group

Log in to your AWS RDS console. Select Security Groups on the left sidebar.

Select your DB Security Group and click on the Edit button.

Add the following:

1.) EMR master

Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-master

2.) EMR slave

Connection Type = EC2 Security Group
EC2 Security Group = ElasticMapReduce-slave

Friday, July 5, 2013