
This document contains instructions for running the CSV to GDF converter application within the EPISODES Platform. The application converts a CSV (Comma-Separated Values) file into a GDF (Generic Data Format) file with the *.mat extension.

For more general information about working with applications within the Platform, see the Applications Quick Start Guide.


CATEGORY Converters

KEYWORDS Format conversion

CITATION If you use the results or visualizations retrieved from this application in a publication, then you must cite the data source as follows:
Orlecka-Sikora, B., Lasocki, S., Kocot, J. et al. (2020) An open data infrastructure for the study of anthropogenic hazards linked to georesource exploitation. Sci Data 7, 89, doi: 10.1038/s41597-020-0429-3.

Requirements for input file

The input file to be converted to the GDF file format should be a valid CSV file containing date/time information and one or more parameter values (the matching GDF types are: data types containing one parameter and date and data types containing parameters and date)*, with values separated by commas (',') - see also CSV format specifications, e.g., https://tools.ietf.org/html/rfc4180. The file may or may not contain a header line, which defines the names of the columns in the GDF file (if it does not, specify this in the form - see the Options specification section). Numbers within the file should use the English locale (a period as the decimal separator). Text that contains special characters (including the comma, which is used to separate values) should be enclosed in double quotes.

(*) More GDF types are planned to be supported in the future.
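The requirements above can be illustrated with a minimal Python sketch that writes such a file using the standard csv module. This is only one possible way to prepare the input, not part of the application itself. The Comment column and the row values are hypothetical and serve only to show how a missing value and a text value containing a comma are written.

```python
import csv
import io

# Hypothetical data: the Comment column is an illustration only.
HEADER = ["Date", "Water_Level", "Comment"]
ROWS = [
    ("2020 Jan 01 00:00:00.000", 12.89, None),
    ("2020 Jan 02 00:00:00.000", None, "gauge offline, no reading"),
]


def to_csv_text(header, rows):
    """Serialize rows as comma-separated text with minimal quoting."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL encloses a field in double quotes only when it contains
    # the delimiter, a quote character or a line break.
    writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # A missing value becomes an empty field, so every line keeps
        # the same number of delimiting commas.
        writer.writerow(["" if value is None else value for value in row])
    return buffer.getvalue()


print(to_csv_text(HEADER, ROWS))
```

The csv module writes floating-point numbers with a period as the decimal separator regardless of the system locale, which satisfies the English locale requirement.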

Example input file

The following excerpt provides an example of the contents of a valid input CSV file containing data for the water level GDF type. It defines two columns with three entries - the first column is the time information, the second a parameter (water level) value. If some of the values are not provided, an empty string should be put in their place (so that the number of delimiting commas is the same in every line). It is recommended that the column names match the names of the fields in the GDF file type that is going to be generated (for reference see the GDF types), however, they can also be assigned at a later stage (see the GDF field properties section). The time can be written either as a single text column or divided into several columns (e.g. Year, Month, Day, Time)*. Any format can be used, however, it has to be specified later within the application form (see the GDF field properties section).

Code Block
Date,Water_Level
"2020 Jan 01 00:00:00.000",12.89
"2020 Jan 02 00:00:00.000",10.42
"2020 Jan 03 00:00:00.000",11.36

Excerpt 1. Sample content of a CSV input file.


(*) Available from version 2.54
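Before uploading a file, it may be useful to check that every line has the same number of fields as the header, as required above. The following Python sketch shows one way to do this with the standard csv module, using the content of Excerpt 1. The function name is hypothetical and the check is not performed this way by the application itself.

```python
import csv
import io

# Content of Excerpt 1.
SAMPLE = '''Date,Water_Level
"2020 Jan 01 00:00:00.000",12.89
"2020 Jan 02 00:00:00.000",10.42
"2020 Jan 03 00:00:00.000",11.36
'''


def check_column_counts(text):
    """Return (header, problems).

    problems lists (line_number, field_count) for every row whose number
    of fields differs from the number of fields in the header line.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    problems = [(reader.line_num, len(row))
                for row in reader if len(row) != len(header)]
    return header, problems


header, problems = check_column_counts(SAMPLE)
print(header)    # column names taken from the header line
print(problems)  # an empty list means all rows have the expected length
```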

Input file specification

The application requires a single file of type Catalog in CSV - see the previous section for the input file requirements.


Figure 1. Application input file specification.

Options specification

Filling form values

File options

Choose the expected type of the output file from the drop-down list (marked with (1) on Figure 2). The description of types can be found at list of data types containing one parameter and date and list of data types containing parameters and date. If your content does not match any of these types, but also contains date/time information and one or more correlated parameters, use the GDF with time-correlated parameters. In case of other data, use the generic type (Generic GDF Data), however, this option is not recommended, as not all the GDF types are supported yet.

In the middle field (marked with (2) on Figure 2), you can indicate whether the selected input file has a header. If it does, the headers will appear in the table (see next section) and the system will automatically try to match them to a correct file type. The matching is based on the header names - if they are the same as the names of the fields of one of the supported GDF file types, the default content type, description, unit and display format (see Figure 3) will be automatically filled, as well as the Result type (marked with (1) in Figure 2). If the CSV file does not contain headers, new values will be generated by the system and can be changed later (see next section).
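The idea of matching by header names can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Platform's actual matching code: the type names and field lists in KNOWN_TYPES are hypothetical (the real definitions are given in the lists of GDF data types), and the comparison is assumed to be exact and case-sensitive.

```python
# Hypothetical type definitions, used for illustration only.
KNOWN_TYPES = {
    "Water level": ["Date", "Water_Level"],
    "Example two-parameter type": ["Date", "Param_A", "Param_B"],
}


def guess_result_type(header, known_types=KNOWN_TYPES):
    """Return the first type whose field names equal the header names.

    Returns None when no type matches, in which case the properties
    would have to be filled in manually.
    """
    header_names = set(header)
    for type_name, fields in known_types.items():
        if set(fields) == header_names:
            return type_name
    return None


print(guess_result_type(["Date", "Water_Level"]))
print(guess_result_type(["Time", "Level"]))
```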

In addition, a description of the file's contents can be added to the resulting file (marked with (3) on Figure 2).

Figure 2. Application form with GDF file options specification.

GDF field properties

The next part of the application form is generated based on the specific input file - the column names from the header line (if it was present) are displayed in the first column (see Figure 3). If they match existing GDF type field names, they will have their default values already filled. If this is not the case, change the value of the field in the New name column (marked with (1) in Figure 3) - the standard field names will be suggested as you start typing, and if you choose one of them, the rest of the columns will be filled with default values. If your data does not match any of the standard fields, the other columns will have to be filled manually. Take care also to set the Result type (marked with (1) in Figure 2) correctly - see the previous section. Any of the columns can also be excluded from the result GDF file using the box next to the column name (marked with (2) in Figure 3).

To read the file content correctly, it is required to set the column content type (marked with (3) in Figure 3), so that the program knows how to interpret the subsequent values. If the content type is Date and time, it is also required to set the time format (marked with (4) in Figure 3) to allow correct reading of the time fields. It is also possible to specify a time group, in case the date and time information is split into different columns, or to add a description or unit. Additionally, a display format might be added to specify a custom mode of display (e.g. engineering notation), with the help of a wizard (accessible with the icon marked with (4) in Figure 3, with options visible in Figure 4).

If the column name from the file header is defined among the GDF types column names, all of the above properties will already be filled with defaults by the system (rows from Date to Water_level in Figure 2). For other columns, the properties have to be specified by the user.

Figure 3. Application form with most important elements marked.

Figure 4. Wizard used for display format specification.

Produced output

The result file will have the same name as the input, with only the extension changed to .mat, and it will be visualized within the application outputs. 

Figure 5. Result file visualization.
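The naming rule for the result file (same name, extension changed to .mat) can be expressed as a short Python sketch. The function and the sample file names are hypothetical and serve only to show which output name to expect for a given input.

```python
from pathlib import PurePath


def output_name(input_name):
    """Return the input file name with its extension replaced by .mat."""
    return PurePath(input_name).with_suffix(".mat").name


print(output_name("water_level.csv"))
print(output_name("station.2020.csv"))
```

Only the last extension is replaced, so a name containing additional dots keeps everything before the final one.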

Troubleshooting

Field format

The most common errors when running the application are caused by an incorrect specification of the input format - either the format of the time field or the type of a field is wrong. If the value in the date/time field does not match the time format from the form (field marked with (3) in Figure 3), the application fails with an error stating that it cannot parse the field, and shows the content of the field on which the parsing failed - this is illustrated in Figure 6. In this example the time field has the value 2010 Jan 01 00:00:00.000 (the same format as in the sample input file), therefore, the format of the field is YYYY MMM dd HH:mm:ss.SSS. However, the format column within the application form was set to YYYY-MMM-dd HH:mm:ss.SSS (with the month written as a three-letter abbreviation - in this format the date value would have to look like 2014-Jan-01 00:05:02.017), so the system could not read the value. It is also important that all the dates/times within one CSV column have the same format.

Figure 6. Application error in case of incorrect time format specification.
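A mismatch of this kind can be detected before running the application by trying to parse the time values locally. The sketch below uses Python's datetime.strptime. Note that Python uses its own directives, so the patterns are an assumed translation of the formats discussed above (%Y %b %d %H:%M:%S.%f standing in for YYYY MMM dd HH:mm:ss.SSS), and the function names are hypothetical. The %b directive expects English month abbreviations only when the default (C or English) locale is in use.

```python
from datetime import datetime

# Assumed Python equivalents of the two formats discussed above.
SPACE_FORMAT = "%Y %b %d %H:%M:%S.%f"   # e.g. 2020 Jan 01 00:00:00.000
DASH_FORMAT = "%Y-%b-%d %H:%M:%S.%f"    # e.g. 2014-Jan-01 00:05:02.017


def parses(value, fmt):
    """Return True if value can be read with the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def find_bad_values(values, fmt):
    """Return the values of one column that do not match the format."""
    return [value for value in values if not parses(value, fmt)]


column = ["2020 Jan 01 00:00:00.000", "2020 Jan 02 00:00:00.000"]
print(find_bad_values(column, SPACE_FORMAT))  # matching format
print(find_bad_values(column, DASH_FORMAT))   # mismatching format
```

Running find_bad_values over a whole column also reveals rows whose format differs from the rest of the column.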

Field names

If the names of the fields in the resulting GDF file (New name column in Figure 3) do not match the chosen Result type (marked with (1) in Figure 2), a corresponding message will be displayed (see Figure 7). If the file is generated nevertheless, it will probably be displayed correctly, however, its further use within the Platform's applications might result in unexpected errors.

Figure 7. Information displayed if the GDF field names do not match the selected GDF result type
