
Tuesday, July 9, 2013

Amazon Elastic MapReduce (EMR) ClassNotFoundException: com.mysql.jdbc.Driver

If you get a "ClassNotFoundException: com.mysql.jdbc.Driver" error while running a custom JAR job on Elastic MapReduce, you need to copy the MySQL connector library into Hadoop's lib directory on every node in the cluster.

The error will look like:

Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:169)
at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:184)
... 20 more
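The root cause is easy to reproduce with plain Java: Hadoop's DBConfiguration ultimately calls Class.forName() on the configured driver class, and if mysql-connector-java is not on the task's classpath the lookup throws. A minimal sketch (the class and method names below are illustrative, not part of Hadoop):

```java
public class DriverCheck {
    // Returns true when the given JDBC driver class is loadable from the
    // classpath, mirroring what DBConfiguration.getConnection() does before
    // opening a connection.
    static boolean driverOnClasspath(String driverClass) {
        try {
            Class.forName(driverClass);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Without mysql-connector-java on the classpath this prints false,
        // which is exactly the condition behind the stack trace above.
        System.out.println(driverOnClasspath("com.mysql.jdbc.Driver"));
    }
}
```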

We can copy the MySQL connector library to each machine in the cluster with a bootstrap action.


1.) Get the MySQL connector library.

You can download it from the Maven repository.

Create a bucket on S3 and upload the MySQL connector JAR to this bucket.


2.) Write a bootstrap bash file

Name this file bootstrap.sh. We will use the "hadoop fs" command to copy the connector from S3 to each machine.

Script:
#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib
Upload this script to the same bucket you created in the previous step.


3.) Create a Job Flow

Log in to the AWS EMR console.

Click on create a job flow.

Fill in all the details including your JAR file.

At the last "bootstrap" step, select "custom bootstrap action" and enter the location of the bootstrap.sh script (e.g. s3n://{my_bucket}/bootstrap.sh).

Start the job flow and monitor stderr and stdout. If the bootstrap action ran correctly, the job should no longer throw the ClassNotFoundException.

Hadoop - mapred vs. mapreduce libraries

When you start working with Hadoop, you will find that online tutorials use one of two APIs: org.apache.hadoop.mapred and org.apache.hadoop.mapreduce. Use the mapreduce library; mapred is the older API.
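The practical difference shows up in how a mapper is written: the newer API extends the abstract org.apache.hadoop.mapreduce.Mapper class and emits output through a Context object, whereas the old org.apache.hadoop.mapred API had you implement a Mapper interface and emit through an OutputCollector. A sketch of the new-API style (this requires the Hadoop client library on the classpath to compile; the class name is illustrative):

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
// New API: org.apache.hadoop.mapreduce (abstract class + Context).
// Old API: org.apache.hadoop.mapred (Mapper interface + OutputCollector).
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit (token, 1) for each whitespace-separated token in the line.
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE); // new API writes via Context
            }
        }
    }
}
```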

To upgrade to the mapreduce library, check out the following slideshows from Yahoo: