When submitting a Java or Scala program, everything works fine. When submitting a Python program, it gets to the ACCEPTED state and then stalls; it eventually times out without ever being picked up to run. Is this interface only for Java/Scala programs/jobs, or should it be able to submit PySpark/Python jobs as well?
I am trying to invoke the pi.py sample program that comes with Spark 1.6.0.
Below is the Java program that I am testing with. I'm new to Spark, so apologies for any "newbie" errors.
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;
// import org.apache.log4j.Logger;
/**
This class submits a SparkPi job to YARN from a Java client (as opposed
to submitting a Spark job from the shell command line using spark-submit).
To accomplish submitting a Spark job from a Java client, we use
the org.apache.spark.deploy.yarn.Client class described below:
Usage: org.apache.spark.deploy.yarn.Client [options]
Options:
--jar JAR_PATH Path to your application's JAR file (required in yarn-cluster mode)
--class CLASS_NAME Name of your application's main class (required)
--primary-py-file A main Python file
--arg ARG Argument to be passed to your application's main class.
Multiple invocations are possible, each will be passed in order.
--num-executors NUM Number of executors to start (Default: 2)
--executor-cores NUM Number of cores per executor (Default: 1).
--driver-memory MEM Memory for driver (e.g. 1000M, 2G) (Default: 512 Mb)
--driver-cores NUM Number of cores used by the driver (Default: 1).
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
--name NAME The name of your application (Default: Spark)
--queue QUEUE The hadoop queue to use for allocation requests (Default: 'default')
--addJars jars Comma separated list of local jars that want SparkContext.addJar to work with.
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.
--files files Comma separated list of files to be distributed with the job.
--archives archives Comma separated list of archives to be distributed with the job.
Example of how to call this program:
export SPARK_HOME="/Users/mparsian/spark-1.6.0"
java -DSPARK_HOME="$SPARK_HOME" org.dataalgorithms.client.SubmitSparkPiToYARNFromJavaCode 10
*/
public class SubmitSparkPiToYARNFromJavaCode {
public static void main(String[] args) throws Exception {
long startTime = System.currentTimeMillis();
// this is passed to SparkPi program
//THE_LOGGER.info("Slices Passed=" + args[0]);
String slices = args[0];
// String slices = "10";
//
// String SPARK_HOME = System.getProperty("SPARK_HOME");
String SPARK_HOME = "/opt/spark/spark-1.6.0";
// THE_LOGGER.info("SPARK_HOME=" + SPARK_HOME);
//
pi(SPARK_HOME, slices); // ... the code being measured ...
//
long elapsedTime = System.currentTimeMillis() - startTime;
// THE_LOGGER.info("elapsedTime (millis)=" + elapsedTime);
}
static void pi(String SPARK_HOME, String slices) throws Exception {
//
String[] args = new String[]{
"--name",
"Submit-SparkPi-To-Yarn",
//
"--driver-memory",
"512MB",
//
"--jar",
SPARK_HOME + "/examples/target/spark-examples_2.11-1.6.0.jar",
//
"--class",
"org.apache.spark.examples.JavaSparkPi",
// argument 1 to my Spark program
"--arg",
slices,
// argument 2 to my Spark program (helper argument to create a proper JavaSparkContext object)
"--arg",
"yarn-cluster"
};
Configuration config = new Configuration();
//
System.setProperty("SPARK_YARN_MODE", "true");
//
SparkConf sparkConf = new SparkConf();
ClientArguments clientArgs = new ClientArguments(args, sparkConf);
Client client = new Client(clientArgs, config, sparkConf);
client.run();
// done!
}
}
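For reference, here is roughly what I am attempting for the Python case. It is the same pattern as the class above, but swapping --jar/--class for the --primary-py-file and --py-files options shown in the usage text; the pi.py path is from my local Spark 1.6.0 install, so treat it as a placeholder:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.deploy.yarn.Client;
import org.apache.spark.deploy.yarn.ClientArguments;
import org.apache.hadoop.conf.Configuration;

public class SubmitPySparkPiToYARNFromJavaCode {

    // builds the yarn.Client argument list for a Python application
    static String[] buildArgs(String sparkHome) {
        return new String[]{
            "--name",
            "Submit-PySparkPi-To-Yarn",
            // the main Python file, per the --primary-py-file option above
            "--primary-py-file",
            sparkHome + "/examples/src/main/python/pi.py",
            // also place it on the PYTHONPATH, per --py-files
            "--py-files",
            sparkHome + "/examples/src/main/python/pi.py",
            // argument passed through to pi.py (number of slices)
            "--arg",
            "10"
        };
    }

    public static void main(String[] args) throws Exception {
        String SPARK_HOME = "/opt/spark/spark-1.6.0"; // placeholder: adjust to your install
        System.setProperty("SPARK_YARN_MODE", "true");
        SparkConf sparkConf = new SparkConf();
        Configuration config = new Configuration();
        ClientArguments clientArgs = new ClientArguments(buildArgs(SPARK_HOME), sparkConf);
        Client client = new Client(clientArgs, config, sparkConf);
        client.run();
    }
}
```

My understanding (which may be wrong) is that in yarn-cluster mode the Client is supposed to pick up --primary-py-file and run it without a --class argument, so I have left that out here.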
Thanks,
-Scott