
Spark Learn

Description

The Spark KNN Learn process type lets you configure custom Spark K-Nearest Neighbors Learn code and productionize it in a Syntasa Composer workflow. Once the process is configured, Syntasa manages the learning on a scheduled basis. The code can be imported in two ways:

  1. Paste into a text editor window
  2. Upload a file with the code

After the code is placed in the text editor, the output locations of the process type need to be specified.

Once the process is configured and tested, it can be deployed to production and scheduled for Syntasa to run.

Spark_Learn_Process.webp

Process Configuration

There are two screens that need configuring for this process type.

  1. Parameters
  2. Output

Parameters

Spark_Learn_Parameters.webp

The Parameters screen is where the custom code is imported, either by pasting it into the text editor or by uploading a file.

File Upload

  1. Click the paperclip icon
  2. A file browser window will appear
  3. Select the file with the code to be imported
  4. Click Open
  5. The contents of the file will be placed in the text editor window
  6. The file name will be displayed just below the "File Upload" heading

Output

Spark_Learn_Output.webp

The Output screen is where the table name, display name, and model name are defined, along with the option to "Load to BQ" when using Google Cloud Platform or "Load to Redshift" when using Amazon Web Services. This process type has three outputs:

  1. learning_metrics, which help in understanding the model's performance
  2. feature_importance
  3. model

Expected Output

The expected outputs of this process type are the model, which is stored in the "Base Path", and the learning_metrics and feature_importance outputs, which are stored in the "Location". Both settings are found on the Output screen. Loading to BQ or Redshift makes the learning metrics and feature importance easier to query.
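
To make the three outputs concrete, the following is a minimal sketch of what a learn step might produce. It is not Syntasa's API and not Spark code: it uses plain Python so that it runs on its own, and the toy data, the file names (model.json, learning_metrics.json, feature_importance.json), and the function names are all assumptions made for illustration. Because K-Nearest Neighbors has no built-in feature importances, the sketch substitutes a simple permutation-style measure (the drop in accuracy when one feature column is rotated).

```python
import json
import math
import os
import tempfile
from collections import Counter

# Hypothetical toy data: each row is ([x0, x1], label).
# x0 separates the two classes; x1 is noise.
TRAIN = [
    ([1.0, 5.0], 0), ([1.2, 1.0], 0), ([0.8, 3.0], 0), ([1.1, 4.0], 0),
    ([3.0, 5.1], 1), ([3.2, 0.9], 1), ([2.9, 3.1], 1), ([3.1, 4.2], 1),
]
TEST = [([0.9, 4.5], 0), ([1.3, 1.5], 0), ([3.3, 4.4], 1), ([2.8, 1.4], 1)]
FEATURES = ["x0", "x1"]
K = 3


def predict(point, train=TRAIN, k=K):
    """Majority vote among the k training rows nearest to point."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def permutation_importance(rows):
    """Accuracy drop when one feature column is rotated by one row.

    Stand-in technique: KNN itself does not expose feature importances.
    """
    baseline = accuracy(rows)
    result = {}
    for j, name in enumerate(FEATURES):
        column = [x[j] for x, _ in rows]
        rotated = column[-1:] + column[:-1]  # deterministic "shuffle"
        permuted = [
            (x[:j] + [value] + x[j + 1:], y)
            for (x, y), value in zip(rows, rotated)
        ]
        result[name] = baseline - accuracy(permuted)
    return result


def run_learn(base_path, location):
    """Write the model under base_path and both tables under location."""
    outputs = {
        # For KNN, the "model" is the training data plus the chosen k.
        os.path.join(base_path, "model.json"): {"k": K, "train": TRAIN},
        os.path.join(location, "learning_metrics.json"): {
            "accuracy": accuracy(TEST)
        },
        os.path.join(location, "feature_importance.json"):
            permutation_importance(TEST),
    }
    for path, payload in outputs.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            json.dump(payload, handle)
    return outputs


if __name__ == "__main__":
    # Hypothetical stand-ins for the "Base Path" and "Location" settings.
    root = tempfile.mkdtemp()
    written = run_learn(os.path.join(root, "base_path"),
                        os.path.join(root, "location"))
    for path, payload in written.items():
        print(path, payload)
```

In this sketch the model goes to one directory and the two metric outputs to another, mirroring the split between "Base Path" and "Location" on the Output screen. In a real process the custom Spark code and the platform decide the actual formats and paths.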