This document contains instructions for running the CSV to GDF converter application within the EPISODES Platform. The application converts a CSV (Comma-Separated Values) file into a GDF (Generic Data Format) file with the *.mat extension.

To obtain more general information about working with applications within the Platform, see the Applications Quick Start Guide.

CATEGORY Converters

KEYWORDS Format conversion

CITATION If you use the results or visualizations retrieved from this application in a publication, then you must cite the data source as follows:
Orlecka-Sikora, B., Lasocki, S., Kocot, J. et al. (2020) An open data infrastructure for the study of anthropogenic hazards linked to georesource exploitation. Sci Data 7, 89, doi: 10.1038/s41597-020-0429-3.

Requirements for input file

The input file to be converted to the GDF file format should be a valid CSV file with values separated by commas (','); see also the CSV format specifications, e.g., https://tools.ietf.org/html/rfc4180. Its content should be compatible with one of the GDF file types that contain date/time information and one or more parameter values (the matching GDF types are: data types containing one parameter and date and data types containing parameters and date)*. The file may or may not contain a header line, which defines the names of the columns in the GDF file; if it does not, specify this in the form (see the File options section). Numbers within the file should use the English locale. Text that contains special characters (including the comma, which is used to separate values) should be enclosed in double quotes.

(*) More GDF types are planned to be supported in the future.
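
The requirements above can be checked before uploading a file. The following is a minimal sketch in Python using only the standard csv module; the function name check_csv and the sample text are illustrative assumptions and are not part of the Platform. It reads the header line and reports rows whose number of fields differs from the first row.

```python
import csv
import io


def check_csv(text, has_header=True):
    """Return (column_names, problems) for CSV text.

    problems lists (row_number, field_count) for every row whose
    number of fields differs from that of the first row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    expected = len(rows[0])
    names = rows[0] if has_header else []
    problems = [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]
    return names, problems


# Hypothetical sample: the last row lacks the delimiting comma.
sample = (
    'Date,Water_Level\n'
    '"2020 Jan 01 00:00:00.000",12.89\n'
    '"2020 Jan 02 00:00:00.000",\n'   # missing value kept as an empty string
    '"2020 Jan 03 00:00:00.000"\n'    # only one field in this row
)
names, problems = check_csv(sample)
print(names)     # ['Date', 'Water_Level']
print(problems)  # [(4, 1)]
```

A row with a missing value but the correct number of commas (row 3 in the sample) passes the check, while a row with a missing delimiter is reported.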

Example input file

The following excerpt is an example of a valid input CSV file containing data for the water level GDF type. It defines two columns with three entries: the first column holds the time information and the second a parameter (water level) value. If a value is not provided, put an empty string in its place, so that the number of delimiting commas is the same in every row. It is recommended that the column names match the names of the fields in the GDF file type that is going to be generated (for reference, see the GDF types); however, they can also be assigned at a later stage (see the GDF field properties section). The time can be written either as a single text column or divided into several columns (e.g. Year, Month, Day, Time)*. Any format can be used, but it has to be specified later within the application form (see the GDF field properties section).

Date,Water_Level
"2020 Jan 01 00:00:00.000",12.89
"2020 Jan 02 00:00:00.000",10.42
"2020 Jan 03 00:00:00.000",11.36

Excerpt 1. Sample content of a CSV input file.

(*) Available from version 2.54
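
A file of this shape can also be produced programmatically. The sketch below is one possible way to do it with Python's standard csv module, assuming the data are available as (date text, value) pairs; the variable names are illustrative. The writer emits an empty field for a missing value (None) and adds double quotes only where a field contains special characters, so the dates are not quoted here as they are in Excerpt 1.

```python
import csv
import io

# Illustrative data; None marks a value that was not provided.
rows = [
    ("2020 Jan 01 00:00:00.000", 12.89),
    ("2020 Jan 02 00:00:00.000", None),
    ("2020 Jan 03 00:00:00.000", 11.36),
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(["Date", "Water_Level"])  # header line with column names
for date_text, level in rows:
    # csv.writer writes None as an empty field, keeping the comma count stable.
    writer.writerow([date_text, level])

csv_text = buffer.getvalue()
print(csv_text)
```

Every line of the result contains exactly one comma, including the line with the missing water level value.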

Input file specification

The application requires a single file of type Catalog in CSV; see the previous section for the input file requirements.


Figure 1. Application input file specification.

Filling form values

File options

Choose from the drop-down list (marked with (1) in Figure 2) the type of output file that is expected. Descriptions of the types can be found in the list of data types containing one parameter and date and the list of data types containing parameters and date. If your content does not match any of these types, but still contains date/time information and one or more correlated parameters, use the GDF with time-correlated parameters type. For other data, use the generic type (Generic GDF Data); however, this option is not recommended, as not all GDF types are supported yet.

In the middle field (marked with (2) in Figure 2), you can indicate whether the selected input file has a header. If it does, the headers will appear in the table (see next section) and the system will automatically try to match them to a file type. The matching is based on the header names: if they are the same as the names of the fields of one of the supported GDF file types, the default content type, description, unit and display format (see Figure 3) will be filled in automatically, as will the Result type (marked with (1) in Figure 2). If the CSV file does not contain headers, column names will be generated by the system and can be changed later (see next section).

Optionally, a longer description of the file's contents can be added to the resulting file (marked with (3) in Figure 2).

Figure 2. Application form with GDF file options specification.

GDF field properties

The next part of the application form is generated from the specific input file: the column names from the header line (if it was present) are displayed in the first column (see Figure 3). If they match existing GDF type field names, their default values will already be filled in. If this is not the case, change the value of the field in the New name column (marked with (1) in Figure 3). The standard field names will be suggested as you start typing, and if you choose one of them, the rest of the columns will be filled with default values. If your data does not match any of the standard fields, the other columns will have to be filled in manually. Take care also to set the Result type (marked with (1) in Figure 2) correctly; see the previous section. Any of the columns can also be excluded from the resulting GDF file using the box next to the column name (marked with (2) in Figure 3).

To read the file content correctly, the column content type must be set (marked with (3) in Figure 3), so that the program knows how to interpret the values in that column. If the content type is Date and time, the time format must also be set (marked with (4) in Figure 3) to allow the time fields to be read correctly. It is also possible to specify a time group, in case the date and time information is split across different columns, or to add a description or unit. Additionally, a display format may be added to specify a custom mode of display (e.g. engineering notation) with the help of a wizard (accessible with the icon marked with (4) in Figure 3, with options visible in Figure 4).

Figure 3. Application form with most important elements marked.

Figure 4. Wizard used for display format specification
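
To illustrate the idea behind a time group, the sketch below joins date and time information that is split across several columns into a single timestamp. It is only an illustration in Python: the column names Year, Month, Day and Time, the function name combine_time_group, and the strptime pattern are assumptions, and the way the application itself combines the columns is not described here.

```python
from datetime import datetime


def combine_time_group(row, columns=("Year", "Month", "Day", "Time")):
    """Join the listed columns of one CSV row and parse them as a timestamp."""
    text = " ".join(row[name] for name in columns)
    # Assumed layout: numeric year, month and day, then HH:MM:SS.
    return datetime.strptime(text, "%Y %m %d %H:%M:%S")


# Hypothetical row with the time split into four columns.
row = {
    "Year": "2020",
    "Month": "01",
    "Day": "02",
    "Time": "00:00:00",
    "Water_Level": "10.42",
}
timestamp = combine_time_group(row)
print(timestamp.isoformat())  # 2020-01-02T00:00:00
```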

Produced output

The result file will have the same name as the input, with only the extension changed to .mat, and it will be visualized within the application outputs. 

Figure 5. Result file visualization
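
The naming rule can be expressed in a few lines of Python, which may be useful when locating results in scripts. This is a sketch only; the file name water_level.csv and the function name are hypothetical.

```python
from pathlib import Path


def expected_output_name(input_name):
    """Return the input file name with its extension replaced by .mat."""
    return Path(input_name).with_suffix(".mat").name


print(expected_output_name("water_level.csv"))  # water_level.mat
```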

Troubleshooting

Field format

The most common errors encountered when running the application are caused by an incorrect specification of the input format: either the format of the time field or the type of the field. If the value in the date/time field does not match the time format from the form (field marked with (4) in Figure 3), the application fails with an error stating that it cannot parse the field and showing the content of the field on which the parsing failed; this is illustrated in Figure 6. In this example the time field has the value 2010 Jan 01 00:00:00.000 (the same format as in the sample input file), so the format of the field is YYYY MMM dd HH:mm:ss.SSS, with the month written as a three-letter abbreviation. However, the format column within the application form was set to YYYY-MMM-dd HH:mm:ss.SSS (a value in this format would look like 2014-Jan-01 00:05:02.017), so the system could not read the value. It is also important that all the dates/times within one CSV column have the same format.

Figure 6. Application error in case of incorrect time format specification.
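
A column can be checked for a consistent date/time format before the file is submitted. The following Python sketch does this with the standard datetime module. Note that it uses Python's own strptime directives, not the pattern syntax of the application form: the correspondence between YYYY MMM dd HH:mm:ss.SSS and "%Y %b %d %H:%M:%S.%f" is an assumption made for illustration, and %b expects English month abbreviations in the default locale. The function name first_unparsable is hypothetical.

```python
from datetime import datetime


def first_unparsable(values, fmt):
    """Return (position, value) of the first value that does not match fmt,
    or None if every value can be parsed."""
    for position, value in enumerate(values, start=1):
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            return position, value
    return None


dates = [
    "2020 Jan 01 00:00:00.000",
    "2020 Jan 02 00:00:00.000",
    "2020 Jan 03 00:00:00.000",
]

# Format with spaces matches the values.
print(first_unparsable(dates, "%Y %b %d %H:%M:%S.%f"))  # None
# Format with hyphens does not, so the first value is reported.
print(first_unparsable(dates, "%Y-%b-%d %H:%M:%S.%f"))  # (1, '2020 Jan 01 00:00:00.000')
```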

Field names

If the names of the fields in the resulting GDF file (New name column in Figure 3) do not match the chosen Result type (marked with (1) in Figure 2), a corresponding message will be displayed (see Figure 7). If the file is generated nevertheless, it will probably be displayed correctly; however, its further use within the Platform's applications might result in unexpected errors.


Figure 7. Information displayed if the GDF field names do not match the selected GDF result type
