The error will look like:
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.hadoop.mapreduce.lib.db.DBConfiguration.getConnection(DBConfiguration.java:148)
    at org.apache.hadoop.mapreduce.lib.db.DBInputFormat.getConnection(DBInputFormat.java:184)
    ... 20 more
The MySQL JDBC driver is not on the classpath of the EMR machines, so the Class.forName() call inside DBConfiguration fails. We can fix this by copying the MySQL connector library to each of the machines by "bootstrapping".
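If you want to confirm the driver really is missing, you can SSH into a node and look in Hadoop's lib directory. This assumes a Hadoop 1-style layout where $HADOOP_HOME/lib is on the task classpath, as in the bootstrap script below:

ls $HADOOP_HOME/lib | grep -i mysql    # no output means the connector JAR is absent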
1.) Get the MySQL connector library.
You can download it from the Maven Central repository.
Create a bucket on S3 and upload the MySQL connector JAR to this bucket.
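For example, with the AWS CLI installed and configured, the download and upload look like this (the bucket name matches the one used in the bootstrap script below; substitute your own):

# Download the connector JAR from Maven Central
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.25/mysql-connector-java-5.1.25.jar
# Create an S3 bucket and upload the JAR to it
aws s3 mb s3://wundrbooks-emr-dev
aws s3 cp mysql-connector-java-5.1.25.jar s3://wundrbooks-emr-dev/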
2.) Write a bootstrap bash file
Name this file bootstrap.sh. We will use the "hadoop fs" command to copy the connector from S3 to each machine.
Script:
#!/bin/bash
hadoop fs -copyToLocal s3n://wundrbooks-emr-dev/mysql-connector-java-5.1.25.jar $HADOOP_HOME/lib
Upload this script to the same bucket you created in the previous step.
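Again, if you are using the AWS CLI rather than the S3 console, the upload is a single command:

aws s3 cp bootstrap.sh s3://wundrbooks-emr-dev/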
3.) Create a Job Flow
Log in to the AWS EMR console.
Click "Create a job flow".
Fill in all the details, including your JAR file.
At the last "bootstrap" step, select the custom bootstrap action and enter the location of the bootstrap.sh script (e.g. s3n://{my_bucket}/bootstrap.sh).
Start the job flow and monitor stderr and stdout. Everything should now work.
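If you would rather script the whole thing than click through the console, a rough sketch using the modern AWS CLI equivalent of a job flow looks like the following. The cluster name, release label, instance settings, and job JAR path are placeholder assumptions; substitute your own values:

aws emr create-cluster \
    --name mysql-connector-demo \
    --release-label emr-5.36.0 \
    --use-default-roles \
    --instance-type m4.large \
    --instance-count 3 \
    --bootstrap-actions Path=s3://wundrbooks-emr-dev/bootstrap.sh,Name=CopyMySQLConnector \
    --steps Type=CUSTOM_JAR,Name=MyJob,Jar=s3://wundrbooks-emr-dev/my-job.jar \
    --auto-terminate

Note that on newer EMR releases the bootstrap script itself may need adjusting, since s3n:// URIs and $HADOOP_HOME/lib are Hadoop 1-era conventions.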