Wherobots Job Submission Guide¶
Overview: Understanding Wherobots Job Submission¶
The Wherobots Job Submission functionality enables users to efficiently configure, initiate, and monitor computational tasks. Here's a summary of its capabilities:
- Easily set up and initiate computational tasks using the platform's intuitive interface.
- Configurable task parameters including runtime environments, driver settings, and advanced configurations.
- Real-time monitoring of job status and execution details.
This feature provides a seamless approach to managing and executing jobs on the Wherobots platform.
Job Dashboard Access¶
- Accessible via the "Jobs" tab on the left navigation pane.
- Provides a list of all created jobs and their associated management options.
Creating a New Job¶
- To initiate a new job, click the "New Job" button. Here's a summary of the configurations available:
Configuration | Description
---|---
Name | Unique name for your job.
Schedule | Run schedule in cron format (UTC timezone). A cron generator can help build the expression.
**Tasks** |
Name | Name for the task.
File Path | Browse to select the desired file or script.
Main Class | For JAR files, specify the main class for execution.
Runtime & Version | Select the runtime environment and version.
Driver Configuration | Define the driver type, instance, and executor count.
**Advanced Configuration** |
Executor Defaults | If unspecified, the executor Type and Instance match the Driver, and Disk Size defaults to 20 GB each.
Optional Parameters | Set parameters such as disk size, timeout, retries, environment variables, arguments, and Spark configurations.
Libraries | Attach necessary libraries.
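The Schedule field uses standard 5-field cron syntax (minute, hour, day of month, month, day of week), evaluated in UTC. As a sketch of how such an expression is assembled, this small helper (`daily_cron` is illustrative, not part of Wherobots) builds one for a daily run at a given UTC time:

```python
def daily_cron(hhmm: str) -> str:
    """Build a 5-field cron expression (minute hour day month weekday)
    that fires once per day at the given UTC time, e.g. "03:30"."""
    hour, minute = hhmm.split(":")
    # Cron field order is: minute, hour, day-of-month, month, day-of-week.
    return f"{int(minute)} {int(hour)} * * *"

print(daily_cron("03:30"))  # -> 30 3 * * *
```

For example, a job scheduled with `30 3 * * *` runs every day at 03:30 UTC.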
Configurations and Libraries¶
Spark Configuration¶
When creating a job, users have the option to specify Spark configurations. These configurations give users more granular control over Spark's behavior during job execution. Remember, these configurations need to be specified explicitly when creating a job.
Example for Accessing AWS S3 Buckets: To allow Spark to access and read files from an AWS S3 bucket, you can specify the AWS credentials provider in JSON format:

```json
{
  "spark.hadoop.fs.s3a.bucket.<YOUR_BUCKET_NAME>.aws.credentials.provider": "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"
}
```

Replace `<YOUR_BUCKET_NAME>` with the name of your S3 bucket. This configuration uses `AnonymousAWSCredentialsProvider`, which means no AWS credentials are required; ensure your bucket permissions allow this type of access.
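If several jobs read public buckets, generating this configuration programmatically helps avoid typos in the long per-bucket key. A minimal sketch (the helper name and the bucket `my-public-data` are illustrative, not part of any Wherobots API):

```python
import json

ANON_PROVIDER = "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider"

def anonymous_bucket_config(bucket: str) -> dict:
    """Return the per-bucket S3A setting that lets Spark read the given
    bucket without AWS credentials (the bucket must allow public reads)."""
    key = f"spark.hadoop.fs.s3a.bucket.{bucket}.aws.credentials.provider"
    return {key: ANON_PROVIDER}

# Generate the JSON to paste into the job's Spark Configuration field.
print(json.dumps(anonymous_bucket_config("my-public-data"), indent=2))
```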
Working with Libraries¶
Wherobots supports adding external libraries to a job. These can be PyPI packages, wheel files, or JAR files.

- PyPI: Specify the name of the package and its version. For example, to include the popular data manipulation library `pandas`, you would add:
    - Library Name: `pandas`
    - Library Version: `2.1.1`
- Wheel Files & JAR Files:
    - To include a specific library as a wheel file or a JAR file, upload it directly using the browse function in the Libraries section.
    - Example: If you have a custom library named `my_custom_library.jar`, you can upload it directly; the same applies to a Python wheel file named `my_python_lib.whl`. Ensure these files are accessible and compatible with your selected runtime.
- Maven Libraries:
    - Wherobots allows users to integrate libraries from Maven repositories directly through the Spark Configuration section.
    - To add a Maven library, specify it in the Spark configuration using the `spark.jars.packages` property.

Example: To add Apache Commons Lang, input the following into the Spark Configuration:

```json
{
  "spark.jars.packages": "org.apache.commons:commons-lang3:3.12.0"
}
```
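Spark's `spark.jars.packages` property accepts a comma-separated list of `groupId:artifactId:version` coordinates, so several Maven libraries can share a single entry. A sketch of assembling that value (the helper function is illustrative, not a Wherobots API):

```python
import json

def maven_packages_config(coordinates: list[str]) -> dict:
    """Build a spark.jars.packages entry from Maven coordinates.
    Spark expects a comma-separated list of groupId:artifactId:version."""
    for coord in coordinates:
        # Each coordinate must have exactly three colon-separated parts.
        assert coord.count(":") == 2, f"bad Maven coordinate: {coord}"
    return {"spark.jars.packages": ",".join(coordinates)}

print(json.dumps(
    maven_packages_config(["org.apache.commons:commons-lang3:3.12.0"]),
    indent=2,
))
```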
File Paths and Directories¶
When adding files or scripts for your tasks, ensure that they are located in either the `data/shared` directory or the `data/customer-<customer_id>` directory. These directories are structured for organized data storage and efficient access.
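A simple client-side pre-flight check for this convention might look like the following sketch (illustrative only; the `customer_id` argument stands in for your actual customer ID):

```python
def is_allowed_job_path(path: str, customer_id: str) -> bool:
    """Return True if a task file path sits under one of the two
    permitted directories: data/shared or data/customer-<customer_id>."""
    allowed_prefixes = ("data/shared/", f"data/customer-{customer_id}/")
    return path.startswith(allowed_prefixes)

print(is_allowed_job_path("data/shared/my_task.py", "acme-42"))  # True
print(is_allowed_job_path("tmp/my_task.py", "acme-42"))          # False
```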
Job Creation Confirmation¶
Upon defining the necessary configurations, users can confirm and create the job using the "Create Job" button.
Job Execution¶
- Once a job is created, its details can be viewed from the "Job Details" page.
- The job can be executed by selecting the "Run Job" button.
Monitoring Job Progress¶
Accessing Job run output¶
- From the main "Jobs" dashboard, locate the specific job you wish to monitor.
- Click on the "Job ID" to access the detailed page of the job. This takes you to the "JOB RUN OUTPUT" section, offering a comprehensive view of the job's configuration and status.
Job Run Output¶
- This section captures the real-time output generated during the job's execution.
- Detailed logs, including timestamped messages and system-related outputs, offer insights into the job's operation.
- A "Wrap output" toggle switches between wrapped and unwrapped views of the output, optimizing readability based on preference.
- In case of any issues or errors during execution, detailed traces will be provided in this section, aiding in troubleshooting.
By monitoring the job details, users gain insight into the job's operation, performance, and potential issues, supporting efficient execution.
View System Events¶
- This section captures the system events generated during the job's execution.
- If a job fails or hangs with no log output, expand this tab to check for messages such as quota-exceeded errors.