When submitting a Java or Scala program, everything works fine. When submitting a Python program, it gets to the ACCEPTED state and then stalls; it eventually times out without ever being picked up to run. Is this interface only for Java/Scala programs/jobs, or should it be able to submit PySpark/Python jobs as well?
I am trying to invoke the pi.py sample program that comes with Spark 1.6.0.
Below is the Java program that I am testing with. I'm new to Spark, so apologies for any "newbie" errors.
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;
// import org.apache.log4j.Logger;
/**
This class submits a SparkPi job to YARN from a Java client (as opposed
to submitting a Spark job from the shell command line using spark-submit).
To accomplish submitting a Spark job from a Java client, we use
the org.apache.spark.deploy.yarn.Client class described below:
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--primary-py-file A main Python file
--arg ARG Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores per executor (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--driver-cores NUM Number of cores used by the driver (Default: 1).
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
Example of how to call this program:
export SPARK_HOME="/Users/mparsian/spark-1.6.0"
java -DSPARK_HOME="$SPARK_HOME" org.dataalgorithms.client.SubmitSparkPiToYARNFromJavaCode 10
*/
public class SubmitSparkPiToYARNFromJavaCode {
public static void main(String[] args) throws Exception {
long startTime = System.currentTimeMillis();
// this is passed to SparkPi program
//THE_LOGGER.info("Slices Passed=" + args[0]);
String slices = args[0];
// String slices = "10";
//
// String SPARK_HOME = System.getProperty("SPARK_HOME");
String SPARK_HOME = "/opt/spark/spark-1.6.0";
// THE_LOGGER.info("SPARK_HOME=" + SPARK_HOME);
//
pi(SPARK_HOME, slices); // ... the code being measured ...
//
long elapsedTime = System.currentTimeMillis() - startTime;
// THE_LOGGER.info("elapsedTime (millis)=" + elapsedTime);
}
static void pi(String SPARK_HOME, String slices) throws Exception {
//
String[] args = new String[]{
"--name",
"Submit-SparkPi-To-Yarn",
//
"--driver-memory",
"512MB",
//
"--jar",
SPARK_HOME + "/examples/target/spark-examples_2.11-1.6.0.jar",
//
"--class",
"org.apache.spark.examples.JavaSparkPi",
// argument 1 to my Spark program
"--arg",
slices,
// argument 2 to my Spark program (helper argument to create a proper JavaSparkContext object)
"--arg",
"yarn-cluster"
};
Configuration config = new Configuration();
//
System.setProperty("SPARK_YARN_MODE", "true");
//
SparkConf sparkConf = new SparkConf();
ClientArguments clientArgs = new ClientArguments(args, sparkConf);
Client client = new Client(clientArgs, config, sparkConf);
client.run();
// done!
}
}
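For reference, here is roughly what I am attempting for the Python case. It is the same pattern as the class above, but swapping --jar/--class for the --primary-py-file and --py-files options shown in the usage text; the pi.py path is from my local Spark 1.6.0 install, so treat it as a placeholder:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;

public class SubmitPySparkPiToYARNFromJavaCode {

    // builds the yarn.Client argument list for a Python application
    static String[] buildArgs(String sparkHome) {
        return new String[]{
            "--name",
            "Submit-PySparkPi-To-Yarn",
            // the main Python file, per the --primary-py-file option above
            "--primary-py-file",
            sparkHome + "/examples/src/main/python/pi.py",
            // also place it on the PYTHONPATH, per --py-files
            "--py-files",
            sparkHome + "/examples/src/main/python/pi.py",
            // argument passed through to pi.py (number of slices)
            "--arg",
            "10"
        };
    }

    public static void main(String[] args) throws Exception {
        String SPARK_HOME = "/opt/spark/spark-1.6.0"; // placeholder: adjust to your install
        System.setProperty("SPARK_YARN_MODE", "true");
        SparkConf sparkConf = new SparkConf();
        Configuration config = new Configuration();
        ClientArguments clientArgs = new ClientArguments(buildArgs(SPARK_HOME), sparkConf);
        Client client = new Client(clientArgs, config, sparkConf);
        client.run();
    }
}
```

My understanding (which may be wrong) is that in yarn-cluster mode the Client is supposed to pick up --primary-py-file and run it without a --class argument, so I have left that out here.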
Thanks,
-Scott