[vtk-developers] CSV file reader/writer improvement plans

Aron Helser aron.helser at kitware.com
Mon Jul 24 11:18:00 EDT 2017


We have used vtkDelimitedTextReader as the first stage of a CSV data
import tool for a government customer, and they are open to moving the tool
into an open-source repository. Unfortunately, the project is stalled right
now while funding is being considered.

Some thoughts inline:
- The tool is a 'cleaner', so it's designed to take poorly formed CSV files
and let the user make decisions about fixing the data, although there is
also a non-interactive path that simply drops rows with problem data. The
final result is a VTK table written to an HDF5 file, so this project
doesn't use vtkDelimitedTextWriter. It already has fairly good test
coverage.


On Sun, Jul 23, 2017 at 4:29 PM, Andras Lasso <lasso at queensu.ca> wrote:

> Hi all,
>
> What do people usually use for reading/writing CSV files?
>
> vtkDelimitedTextReader and vtkDelimitedTextWriter work for very specific
> cases, but have many important limitations:
>
> Reading:
>
>    - No way to specify column types. There are some heuristics that can
>    sometimes guess numeric column types, but they are not usable in general
>    (for example, a numeric column may be empty, or a double column may
>    happen to contain only integer values in a specific file).
>
- We read everything as strings and then use vtkVariant to test whether it
can be converted to int/double. We do use a heuristic to choose column data
types.
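To make that limitation concrete, here is a minimal standalone C++ sketch of
this kind of heuristic. It is not the vtkDelimitedTextReader code (which goes
through vtkVariant); it uses std::strtoll/std::strtod instead, and the names
ColumnType and InferColumnType are hypothetical. Note how a double column that
happens to hold only integers comes out as Integer, and an all-empty column
falls back to String, which is why explicit column types would help.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>
#include <vector>

// Candidate column types, from most to least specific (hypothetical names).
enum class ColumnType { Integer, Double, String };

// True if the whole string parses as a base-10 integer with no trailing junk.
bool IsInteger(const std::string& s) {
  if (s.empty()) return false;
  char* end = nullptr;
  errno = 0;
  std::strtoll(s.c_str(), &end, 10);
  return errno == 0 && *end == '\0';
}

// True if the whole string parses as a floating-point number.
bool IsDouble(const std::string& s) {
  if (s.empty()) return false;
  char* end = nullptr;
  errno = 0;
  std::strtod(s.c_str(), &end);
  return errno == 0 && *end == '\0';
}

// Picks the narrowest type that every non-empty cell converts to.
// Empty cells are skipped, so an all-empty column stays String.
ColumnType InferColumnType(const std::vector<std::string>& cells) {
  bool allInt = true;
  bool allDouble = true;
  bool sawValue = false;
  for (const std::string& c : cells) {
    if (c.empty()) continue;
    sawValue = true;
    if (!IsInteger(c)) allInt = false;
    if (!IsDouble(c)) allDouble = false;
  }
  if (!sawValue) return ColumnType::String;
  if (allInt) return ColumnType::Integer;
  if (allDouble) return ColumnType::Double;
  return ColumnType::String;
}
```

For example, a column holding "1", "2" and an empty cell is inferred as
Integer, even if the file format intends it to be a double column.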


>
>    - All rows must contain exactly the same number of columns.
>
If a row has too few entries, it is filled with empty values, at least for
string input.
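A rough sketch of that padding step, assuming rows have already been split
into fields. PadRows is a hypothetical name, and padding to the widest row
(rather than to the header's column count) is an assumption, not necessarily
what vtkDelimitedTextReader does.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Pads every row with empty strings so that all rows have as many fields as
// the widest row. Returns the resulting column count.
std::size_t PadRows(std::vector<Row>& rows) {
  std::size_t width = 0;
  for (const Row& r : rows) width = std::max(width, r.size());
  for (Row& r : rows) r.resize(width);  // new elements are empty strings
  return width;
}
```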

>
>    - Columns that don’t have names cannot be read.
>
- I believe vtkDelimitedTextReader will read columns without a name, but
they can't be used in calculations or exported until they have a name;
assigning a default name in your code might be enough.
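For example, a minimal sketch that fills in missing header names before the
columns are used. The "Column N" scheme and the function name are
hypothetical; with a vtkTable, the same idea would be applied to each column
array through vtkAbstractArray's GetName()/SetName().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces empty header names with "Column N" (1-based position).
// Hypothetical naming scheme; this sketch does not check for clashes with
// names that already exist in the header.
void AssignDefaultColumnNames(std::vector<std::string>& names) {
  for (std::size_t i = 0; i < names.size(); ++i) {
    if (names[i].empty()) {
      names[i] = "Column " + std::to_string(i + 1);
    }
  }
}
```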


>
>    - If a field value contains the field separator or string separator
>    character, then the file is parsed incorrectly.
>
- The purpose of the string separator is to allow the field separator to
appear inside a field. If you've found this not to work, it's a bug that
can be submitted to VTK!
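For reference, a minimal sketch of quote-aware splitting of a single record,
showing how the string separator protects field separators inside a field.
It also reads a doubled string separator ("") inside a quoted field as one
literal quote, following RFC 4180 conventions. This is a standalone
illustration with a hypothetical function name, not the vtkDelimitedTextReader
parser, and it assumes there are no line breaks inside fields.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits one record into fields. Text between string separators is taken
// literally, so the field separator may appear inside a quoted field.
std::vector<std::string> SplitRecord(const std::string& line,
                                     char fieldSep = ',', char quote = '"') {
  std::vector<std::string> fields;
  std::string current;
  bool inQuotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (inQuotes) {
      if (c == quote) {
        if (i + 1 < line.size() && line[i + 1] == quote) {
          current += quote;  // "" inside quotes is an escaped quote
          ++i;
        } else {
          inQuotes = false;  // closing string separator
        }
      } else {
        current += c;  // field separators are literal here
      }
    } else if (c == quote) {
      inQuotes = true;  // opening string separator
    } else if (c == fieldSep) {
      fields.push_back(current);
      current.clear();
    } else {
      current += c;
    }
  }
  fields.push_back(current);  // last field
  return fields;
}
```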


>
> Writing:
>
>    - Cannot specify the number of digits for writing floating-point numbers
>    (currently something like 6 digits is hardcoded).
>    - Cannot write field values that contain string separator characters
>    (no escaping of " by "" is performed).
>    - If a field value may contain the field separator character, then all
>    values must be written with string separators, which makes the file
>    very hard to read and edit (normally, string separators are only added
>    when needed).
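A minimal sketch of the writer behavior described above, following RFC 4180
conventions: quote a field only when it contains the field separator, the
string separator, or a line break; double any embedded quote; and let the
caller choose the number of significant digits for floating-point values.
The helper names are hypothetical and this is not the vtkDelimitedTextWriter
API. (For comparison, C++ streams default to 6 significant digits.)

```cpp
#include <sstream>
#include <string>

// Quotes a field only when needed; embedded quotes are escaped by doubling.
std::string EscapeField(const std::string& value, char fieldSep = ',',
                        char quote = '"') {
  const std::string special{fieldSep, quote, '\n', '\r'};
  if (value.find_first_of(special) == std::string::npos) {
    return value;  // no quoting needed, keeps the file easy to read
  }
  std::string out(1, quote);
  for (char c : value) {
    if (c == quote) out += quote;  // escape " as ""
    out += c;
  }
  out += quote;
  return out;
}

// Formats a floating-point value with a caller-chosen number of
// significant digits.
std::string FormatDouble(double v, int significantDigits) {
  std::ostringstream os;
  os.precision(significantDigits);
  os << v;
  return os.str();
}
```

With this approach, "plain" is written unchanged while a value containing a
comma is wrapped in string separators, so only the fields that need quoting
get it.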
>
> Answers to these questions would help us plan and decide whether to
> implement solutions to these by improving existing VTK classes, or only in
> our application:
>
>    - Is there any plan (or work in progress) to address these limitations?
>    - If we implemented these features, would they be welcome in VTK?
>
- Yes, I think so!

>
>    - Would storing metadata (column type, format specifier, default
>    value, etc.) in a schema .csv file next to the data file (such as this:
>    <https://github.com/Slicer/Slicer/blob/master/Libs/MRML/Core/Testing/TestData/table.schema.csv>)
>    be considered a good solution? Are there any other standards or best
>    practices for storing a CSV schema?
>    - Does anyone else need these features, and could anyone contribute
>    some time to development or writing tests?
>
> Andras