Custom Application Configuration

This document contains a description of files that can or should be present in an Application Workbench. Note that instead of preparing the files manually you can also use a dedicated application creation wizard that is available on the EPISODES platform.

Application Definition (Required)

A JSON file named appDefinition.json. It specifies the inputs and outputs of the application and how the application should be run.

Options

scriptLanguage - name of the scripting language in which the application is written. Currently only one language is available:
- MATLAB - MATLAB programming language. This applies also to Octave applications
executableScriptName - name of the application executable file
inputFiles (Optional*) - input files that are needed by the application. For each input file defined here, the application panel will show a button to choose the location of that file. When the application is run, the input files will be read into MATLAB variables and passed as arguments to the main executable function in the order in which they appear here. Each input file is defined by:
- dataType - data type of the input (see Data Types section below)
- typeLabel (Optional*) - a name distinguishing this input file from others. It allows multiple input files with the same data type to be defined. Each input file with a different type label will be treated as a separate input and passed to the main function as a separate argument
- multiplicity - how many inputs of that type are expected (written as a string value). It defines the minimum required and the maximum allowed number of inputs. The portal checks whether the user has selected the correct number of input files and, if not, does not allow the application to be run. Depending on the multiplicity, the input(s) might be passed to the main function in a different way (see the Handling custom multiplicity in input files section). The possible values are:
  - ? - zero or one. The input is optional, there can be at most one input file
  - * - zero or more. The input is optional and there is no upper limit of how many input files there can be
  - + - one or more. At least one file is required and there is no upper limit of how many input files there can be
  - n - exactly n input files are expected - no more, no less
  - [n, m] - there should be at least n input files and at most m input files
inputParameters (Optional*) - simple input parameters that should be entered by the user. For each parameter defined here, the application panel will show an input form where the user can enter the value of the parameter. When the application is run, the parameters' values will be passed as arguments to the main executable function in the order in which they appear here. The parameter arguments are added after the arguments specified by input files (see the Executable Script section). When adding a parameter you have to specify:
- type - the kind of value the field should hold; this also determines the type of form field displayed to the user (see Figure 1). The available options are:
  - TEXT - a text value
  - BOOLEAN - a true or false value
  - DOUBLE - a real number value
  - INTEGER - an integer value
  - TIME - a date value
- name (Optional) - a name that will be displayed next to the form field. If not specified, the value of the 'type' option will be displayed instead (see Figure 1). If you are using the wizard offered by the platform (see the Creating New Application guide), the name of the parameter variable in the script will automatically be inserted here.

Figure 1. Visualization of different types of input parameters in an application form. (TODO: screenshot with the date picker visible)

outputs - list of outputs generated by the application. The outputs defined here should be returned as output variables by the main executable function. After the function is run each of these outputs will be saved as a mat file and returned as a result of the application. Each output is defined by:
- dataType - data type that this output file should have (see Data Types section)
- fileName - name of the file under which this output should be saved
- isReturnedValue - (applies to MATLAB and Octave applications) value informing whether the variable is additionally declared as an output of the function that is executed. In that case, the value is automatically packed into a file in .mat format and setting the fileFormat property is not required. The default value is false.
- fileFormat - format of the produced file (see File Formats section)
requiredTools - list of programs or tools that are needed to run the application. At least one value should be present: the script interpreter. Please contact us if you need any other tools or toolboxes, or if you require a specific version of the MATLAB or Octave interpreter. Available options are:
- matlab - the MATLAB interpreter. By default one of these versions will be used: 2019b or 2021a (state as of October 2021, the versions might be updated in the future).
- matlab-signal_processing_toolbox - signal processing toolbox. Usable only with MATLAB
- matlab-image_processing_toolbox - image processing toolbox. Usable only with MATLAB
- octave - the Octave interpreter. By default one of these versions will be used: 3.8.1 or 4.2.1 (state as of October 2021, the versions might be updated in the future)
requiredComputationResources (Optional) - a map defining what computational resources should be available to the application. Available options are:
- COMPUTATION_TIME - maximum time that an application needs for the computation (in minutes). If not specified it defaults to 30 minutes
- MEMORY - maximum memory that an application needs for the computation (in gigabytes). If not specified it defaults to 2GB
- CPU_COUNT - number of processors that should be available to the application. If not specified it defaults to 1

* If a field is marked as 'Optional' it means that it doesn't have to be present in the file
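The multiplicity values listed above can be read as a (minimum, maximum) pair. The following Python sketch shows one way to interpret them when checking a definition offline. The function names are hypothetical, and the exact spelling accepted by the portal for the range form (assumed here to be a string such as "[2, 5]") is an assumption.

```python
import re


def multiplicity_bounds(spec):
    """Return (minimum, maximum); maximum is None when there is no upper limit."""
    spec = spec.strip()
    symbols = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    if spec in symbols:
        return symbols[spec]
    # "n": exactly n input files.
    if re.fullmatch(r"\d+", spec):
        return (int(spec), int(spec))
    # "[n, m]": at least n and at most m input files (assumed spelling).
    match = re.fullmatch(r"\[\s*(\d+)\s*,\s*(\d+)\s*\]", spec)
    if match:
        low, high = int(match.group(1)), int(match.group(2))
        if low <= high:
            return (low, high)
    raise ValueError(f"unrecognised multiplicity: {spec!r}")


def is_allowed_count(spec, count):
    """Check whether `count` selected files satisfy the multiplicity."""
    low, high = multiplicity_bounds(spec)
    return count >= low and (high is None or count <= high)


print(multiplicity_bounds("[2, 5]"))
print(is_allowed_count("+", 0))
```

Here "+" with zero selected files is rejected, which corresponds to the portal refusing to run the application until at least one file is chosen.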

Sample file

{
  "scriptLanguage":"MATLAB",
  "executableScriptName":"sampleScriptName.m",
  "inputFiles":[
    {
      "dataType":"double_vector",
      "multiplicity":"1",
      "typeLabel": "label"
    }
  ],
  "inputParameters":[
    "DOUBLE",
    "TEXT"
  ],
  "outputs":[
    {
      "dataType":"double_vector",
      "fileName":"output.mat",
      "isReturnedValue":false,
      "fileFormat":"MAT"
    }
  ],
  "requiredComputationResources":{
    "COMPUTATION_TIME":5
  },
  "requiredTools":[
    "octave"
  ]
}
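Before committing an appDefinition.json, it can be useful to run a quick offline check. The sketch below, in Python, treats every top-level field that is not marked as Optional above as required and fills in the documented resource defaults. The helper names are hypothetical and the check is far less thorough than the portal's own validation.

```python
import json

# Top-level fields that are not marked as Optional in the options list.
REQUIRED_KEYS = ("scriptLanguage", "executableScriptName", "outputs", "requiredTools")
# Documented defaults: 30 minutes, 2 GB, 1 processor.
RESOURCE_DEFAULTS = {"COMPUTATION_TIME": 30, "MEMORY": 2, "CPU_COUNT": 1}


def check_definition(definition):
    """Return a list of human-readable problems (empty when none are found)."""
    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in definition]
    if "requiredTools" in definition and not definition["requiredTools"]:
        problems.append("requiredTools is empty; the script interpreter should be listed")
    return problems


def effective_resources(definition):
    """Merge the requested resources over the documented defaults."""
    requested = definition.get("requiredComputationResources", {})
    return {**RESOURCE_DEFAULTS, **requested}


definition = json.loads("""
{
  "scriptLanguage": "MATLAB",
  "executableScriptName": "sampleScriptName.m",
  "outputs": [{"dataType": "double_vector", "fileName": "output.mat", "fileFormat": "MAT"}],
  "requiredComputationResources": {"COMPUTATION_TIME": 5},
  "requiredTools": ["octave"]
}
""")
print(check_definition(definition))
print(effective_resources(definition))
```

In practice the JSON text would be read from the appDefinition.json file in the repository rather than from an inline string.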

Data Types

The data type field that is present in input and output definitions must contain the name of one of the predefined data types. It is important that the name is written in exactly the same way as in the predefined data type. You can check the possible data types in the upload file form in the platform, or you can contact us. The most common types are:

double_vector - a vector of real number values
time_vector - a vector of date values
integer_vector - a vector of integer values
boolean_vector - a vector of boolean values
string_vector - a vector of text values
image_data - an image file
text_data - a text file
catalog - a seismic catalog
ground_motion_catalog - a ground motion catalog
miniseed - a miniSEED file (SEED file that contains only data records)
fullseed - a full-SEED file (SEED file that contains both data records and station information)
dataless - a dataless SEED (contains only station information)
seed - denotes a type that can be either a fullseed or a miniseed. Should be used when it does not matter if the seed file contains the dataless information or not
sac - time series data written in a SAC format
injection_rate - file containing injection rate
water_level - file containing water level

File Formats

The file format field that is present in the output definition must contain the name of one of the predefined file formats. It is important that the name is written in exactly the same way as in the predefined format. The file format should also match the data type specified for the output, as not every format is supported for every data type - e.g. a seismic catalog cannot be written to (and later read from) a PNG file. You can check the possible combinations of data types and file formats in the upload file form in the platform, or you can contact us. The most common formats are:

MAT - for Matlab/Octave data files
MINISEED - for miniSEED files (SEED files that contain only data records)
SEED - for full-SEED files (SEED files that contain both data records and station information)
DATALESS - for dataless SEED files (the files contain only station information)
SAC - for time series data written in a SAC format
PLAIN_TEXT - for files in any text format (note that, as DATALESS is also a text format, this can also be applied to DATALESS files)
PNG - for image files in PNG format
JPG - for image files in JPG format

Executable Script (Required)

The main script of the application, which serves as its entry point. The name of the script file should be the same as executableScriptName defined in the application definition file, and the script should contain a function that has the same name as the file. The main function should have the same number of inputs as defined in the application definition file, and one output variable for each output that is declared there with isReturnedValue set to true. The names of the input and output variables are irrelevant. For example, if the application definition contains:

"executableScriptName": "sampleFunctionName.m",
"inputParameters": ["TEXT", "DOUBLE"],
"inputFiles": [{ "dataType" : "integer_vector", "multiplicity" : "1"},  { "dataType" : "string_vector", "multiplicity" : "1"}],
"outputs": [{ "dataType" : "double_vector", "fileName" : "output1.mat", "isReturnedValue" : true}, { "dataType" : "boolean_vector", "fileName" : "output2.mat", "isReturnedValue" : true}, { "dataType" : "image_data", "fileName" : "plot.png", "fileFormat" : "PNG"} ]

the executable script could look like this:

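function [outputDoubleValues, outputBooleanValues] = sampleFunctionName(intValues, stringValues, text, multiplicator)
  % Returned outputs: saved after the run as output1.mat and output2.mat
  outputDoubleValues = double(intValues) * multiplicator;
  outputBooleanValues = strcmp(stringValues, text);
  % Third output: the script writes plot.png itself, so it is not a return value
  plot(outputDoubleValues);
  print('-dpng', 'plot.png');
end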

The order of inputs and outputs corresponds to the order in which they are defined in the application definition file, with the note that inputFiles variables come before inputParameters variables. For the above example the order of variables would be as follows:

intValues - corresponds to the input file with type 'integer_vector'
stringValues - corresponds to the input file with type 'string_vector'
text - corresponds to the input parameter with type TEXT
multiplicator - corresponds to the input parameter with type DOUBLE
outputDoubleValues - corresponds to the output with type 'double_vector'
outputBooleanValues - corresponds to the output with type 'boolean_vector'

Additionally, the script produces another file, the image plot.png. It is not declared in the function outputs, but it has to be declared in the application definition file.
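The ordering rule above can be checked offline with a small helper. The following is a minimal sketch in Python, not part of the platform: the function name expected_signature is hypothetical, and it assumes that inputParameters are written as plain type strings, as in the samples in this document.

```python
def expected_signature(definition):
    """List the expected function inputs and returned outputs, in order."""
    # Input files come first, in the order in which they are defined.
    inputs = ["file:" + f["dataType"] for f in definition.get("inputFiles", [])]
    # Input parameters follow (assumed here to be plain type strings).
    inputs += ["parameter:" + p for p in definition.get("inputParameters", [])]
    # Only outputs flagged with isReturnedValue are returned by the function.
    outputs = [
        o["fileName"]
        for o in definition.get("outputs", [])
        if o.get("isReturnedValue", False)
    ]
    return inputs, outputs


example = {
    "executableScriptName": "sampleFunctionName.m",
    "inputParameters": ["TEXT", "DOUBLE"],
    "inputFiles": [
        {"dataType": "integer_vector", "multiplicity": "1"},
        {"dataType": "string_vector", "multiplicity": "1"},
    ],
    "outputs": [
        {"dataType": "double_vector", "fileName": "output1.mat", "isReturnedValue": True},
        {"dataType": "boolean_vector", "fileName": "output2.mat", "isReturnedValue": True},
        {"dataType": "image_data", "fileName": "plot.png", "fileFormat": "PNG"},
    ],
}

inputs, outputs = expected_signature(example)
print(inputs)
print(outputs)
```

For the example definition this lists four inputs (two files, then two parameters) and two returned outputs, which matches the signature of sampleFunctionName.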

Handling custom multiplicity in input files

Depending on the multiplicity, the input(s) might be passed to the main function in a different way. In the example above, all input files had multiplicity defined as "1" (the default). In that case, each file is read into a MATLAB variable, which is passed directly as an argument to the main function. If the multiplicity is set to any other value, all the files provided by the user for this input are read into MATLAB variables and packed into a single cell array before being passed as an argument to the main function. For example, if the function has one input of type double_vector with multiplicity set to "*" and there are 3 input files, the function will receive a cell array containing three double vectors. If, however, no input files are present, an empty cell array will be passed. The variables in the cell array are not guaranteed to be in any specific order. This is by design: inputs of the same data type and non-singular multiplicity are indistinguishable from each other. If you need several inputs with the same data type and need to be able to distinguish between them, use typeLabel instead.
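The packing rule can be modelled in a few lines. This is only an illustrative sketch in Python of the behaviour described above, not the platform's code: pack_argument is a hypothetical name, and a Python list stands in for the MATLAB cell array.

```python
def pack_argument(multiplicity, loaded_files):
    """Model how the loaded files for one input reach the main function."""
    if multiplicity.strip() == "1":
        # Exactly one file: its content is passed directly.
        (single,) = loaded_files
        return single
    # Any other multiplicity: all contents are packed into one container
    # (a cell array in MATLAB), possibly empty, in no guaranteed order.
    return list(loaded_files)


print(pack_argument("1", [[1.0, 2.0]]))
print(pack_argument("*", [[1.0], [2.0, 3.0], [4.0]]))
print(pack_argument("*", []))
```

Because the order inside the container is not guaranteed, code that consumes such an input should not rely on element positions.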

Application Description (Optional)

A JSON file named appDescription.json. It defines how the application will be described on the Applications page of the EPISODES platform and how the workspace directory will be named when the application is created. This file is optional: if it is not present, the required fields (shortName and name) will be filled with the name of the repository, and all other fields will be left empty.

Options

shortName (Required) - short name of the application. The short name will be used to create the application directory within the workspace. Including special characters in the application name is not recommended, as some of them are not allowed
lastUpdate - the date when the application was last updated, written in the format "dd-MM-yyyy HH:mm:ss"
references - links to additional resources that can be displayed with the application (e.g. user guide). It is a map where the key will be the text of the link and the value will be its web address (URL)
categories - categories to which the application can be assigned. It is advisable to use the categories already present within the portal. However, you can also add your own
keywords - keywords that match the application. If relevant, use the keywords already defined within the portal, but add also your own, to describe the application best
name (Required) - full name of the application, that will be visible on the application list. May contain spaces.
description - longer description of the application. May contain HTML tags, including links. May also include the input or output type name written as ${D__data_type}, where "data_type" is the data type of the input or output file - this will be filled with a full name of this data type that will be automatically translated into an appropriate language (if that translation is available)
license - license used for publishing the application. If you wish your application to be open to other users, include a license here. You might also include it as a LICENCE.txt file in your code repository
author - author of the application, may contain your name, institution and/or project affiliation.
citation - citations (articles) related to the application (e.g. the algorithms it uses). A generic citation for the portal and its application part is already added to the citations, so include here only citations related solely to this application.
inputsDescription - description of the application inputs. May contain HTML tags, including links. By default it should include a listing of the types of inputs, but adding a more detailed description is advisable. Input types should be written as ${D__data_type}, where "data_type" is the data type of the input file
resultsDescription - description of the application results. May contain HTML tags, including links. By default it should include a listing of the types of outputs, but adding a more detailed description is advisable. Output types should be written as ${D__data_type}, where "data_type" is the data type of the output file
computationalCharacteristic - description of the features of the application that might have impact on its computation time or other resources consumption.
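The lastUpdate pattern "dd-MM-yyyy HH:mm:ss" corresponds to "%d-%m-%Y %H:%M:%S" in Python's strftime notation. A minimal sketch for producing or checking such a value when the file is generated by a script (the helper names are hypothetical):

```python
from datetime import datetime

# Python equivalent of the documented pattern dd-MM-yyyy HH:mm:ss.
LAST_UPDATE_FORMAT = "%d-%m-%Y %H:%M:%S"


def format_last_update(moment):
    """Render a datetime in the form expected by the lastUpdate field."""
    return moment.strftime(LAST_UPDATE_FORMAT)


def parse_last_update(text):
    """Parse a lastUpdate value; raises ValueError when it does not match."""
    return datetime.strptime(text, LAST_UPDATE_FORMAT)


print(format_last_update(datetime(2021, 2, 28, 20, 50, 12)))
print(parse_last_update("28-02-2021 20:50:12"))
```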

Sample file

{
  "shortName":"AppDir",
  "references":{
    "User Guide":"https://docs.cyfronet.pl/pages/viewpage.action?pageId=somepage"
  },
  "lastUpdate":"28-02-2021 20:50:12",
  "categories":[
    "Category"
  ],
  "keywords":[
    "keyword1",
    "keyword2"
  ],
  "name":"Application name",
  "description":"Longer description containing some ${D__data_types} and <a href='#app:AppId'>links</a>.",
  "license":"Sample licence",
  "author":"Sample author",
  "citation":"Sample citations",
  "inputsDescription":"${D__data_type}",
  "resultsDescription":"${D__other_data_type}",
  "computationalCharacteristic":"Description of computational characteristic"
}
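Since the ${D__data_type} placeholders must name existing data types, it can help to list the types a description file refers to and compare them by eye with the data types used in the application definition. The Python sketch below does only the listing part. The function name is hypothetical, and it assumes that data type names consist of letters, digits and underscores.

```python
import re

# Matches ${D__some_type} and captures "some_type" (assumed character set).
PLACEHOLDER = re.compile(r"\$\{D__([A-Za-z0-9_]+)\}")


def referenced_data_types(description):
    """Collect the data type names used in placeholders, in order of first use."""
    found = []
    for field in ("description", "inputsDescription", "resultsDescription"):
        for name in PLACEHOLDER.findall(description.get(field, "")):
            if name not in found:
                found.append(name)
    return found


sample = {
    "description": "Longer description containing some ${D__data_types} and <a href='#app:AppId'>links</a>.",
    "inputsDescription": "${D__data_type}",
    "resultsDescription": "${D__other_data_type}",
}
print(referenced_data_types(sample))
```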

Other Files

You are free to add any other files to the repository if you wish. You can, for example, include additional script files to help better organize the code, or you can add some supplementary files like a readme or a license file.