[vtk-developers] CSV file reader/writer improvement plans
Aron Helser
aron.helser at kitware.com
Mon Jul 24 11:18:00 EDT 2017
We have used the vtkDelimitedTextReader as the first stage for a CSV data
import tool for a government customer, and they are open to moving the tool
into an open-source repository. Unfortunately, the project is stalled right
now as funding is considered.
Some thoughts inline:
- The tool is a 'cleaner', so it's designed to take poorly-formed CSV files
and let the user make decisions about fixing the data, although there is a
non-interactive path that simply drops rows that have problem data. The
final result is a VTK table written to an HDF5 file, so this project
doesn't use the vtkDelimitedTextWriter. It has pretty good testing written
already.
On Sun, Jul 23, 2017 at 4:29 PM, Andras Lasso <lasso at queensu.ca> wrote:
> Hi all,
>
>
>
> What do people usually use for reading/writing CSV files?
>
>
>
> vtkDelimitedTextReader and vtkDelimitedTextWriter work for very specific
> cases, but have many important limitations:
>
>
>
> Reading:
>
> - No way to specify column types. There are some heuristics that can
> sometimes guess numeric column types, but they are not usable in general
> (for example, a numeric column may be empty or a double column may happen
> to contain only integer values in a specific file).
>
We read everything as strings and then use vtkVariant to test if it can
be converted to int/double. We do use a heuristic to choose column data
types.
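
To make the trade-off concrete, here is a minimal sketch of this kind of
"read as strings, then test convertibility" heuristic. It is plain Python
for illustration only, not the vtkDelimitedTextReader implementation, and
the function name guess_column_type is hypothetical.

```python
def guess_column_type(values):
    """Guess int, float, or str for a column of raw string cells."""
    # Empty cells carry no type information, so they are ignored.
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return str  # an all-empty column gives no evidence of a numeric type
    # Try the narrowest type first; fall back when any cell fails to convert.
    for candidate in (int, float):
        try:
            for v in non_empty:
                candidate(v)
            return candidate
        except ValueError:
            continue
    return str


print(guess_column_type(["1", "2", ""]))   # every non-empty cell parses as int
print(guess_column_type(["1", "2.5"]))     # "2.5" fails int(), so float is tried
```

Note that a column meant to hold doubles but containing only "1" and "2"
in one particular file is guessed as int, and an empty column falls back
to string, which are exactly the cases raised above.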
>
> - All rows must contain exactly the same number of columns.
>
If a row has too few entries, it is filled with empty values, at least for
string input.
>
> - Columns that don’t have names cannot be read.
>
I believe vtkDelimitedTextReader will read columns without a name, but
they can't be used in calculations or exported until they have a name.
Assigning a default name in your code might be enough.
>
> - If a field value contains field separator or string separator
> character then the file is parsed incorrectly.
>
The purpose of the string separator is to allow the field separator to
appear inside a field. If you've found this not to work, it's a bug that
can be submitted to VTK!
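
For reference, the expected behavior can be sketched with Python's
standard csv module (not VTK): a field wrapped in the string separator may
contain the field separator, and a doubled string separator inside it
stands for one literal quote character. The sample data is made up.

```python
import csv
import io

# One header row, then a row whose fields contain a comma and quotes.
text = 'name,note\n"Smith, J.","said ""hi"""\n'

# csv.reader defaults: delimiter=',', quotechar='"', doublequote=True.
rows = list(csv.reader(io.StringIO(text)))

for row in rows:
    print(row)
```

A reader that splits the second line into more than two fields is not
honoring the string separator.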
>
>
>
>
> Writing:
>
> - Cannot specify number of digits for writing floating-point numbers
> (currently something like 6 digits is hardcoded).
> - Cannot write field values that contain string separator characters
> (no escaping of " by "" is performed)
> - If a field value may contain the field separator character, then all
> values must be written with string separators, which makes the file
> very hard to read and edit (normally string separators are only added when
> needed)
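
As a point of comparison, the three writing behaviors asked for above
(configurable float precision, escaping " as "", and quoting only when
needed) can be sketched with Python's standard csv module. This is an
illustration, not vtkDelimitedTextWriter; write_rows and its float_format
parameter are hypothetical names.

```python
import csv
import io


def write_rows(rows, float_format=".17g"):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only if it contains the delimiter,
    # the quote character, or a line break; quotes are escaped by doubling.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for row in rows:
        writer.writerow(
            [format(c, float_format) if isinstance(c, float) else c for c in row]
        )
    return buf.getvalue()


print(write_rows([["plain", 'say "hi"', "a,b", 0.5]]))
print(write_rows([[1.0 / 3.0]], float_format=".3g"))
```

With 17 significant digits a double survives a write/read round trip;
a shorter format trades that for readability.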
>
>
>
> Answers to these questions would help us in planning/deciding whether we
> implement a solution for these by improving existing VTK classes, or just
> in our application:
>
> - Is there any plan (or work in progress) to address these limitations?
> - If we implemented these features, would they be welcome in VTK?
>
Yes, I think so!
>
> - Would storing metadata (column type, format specifier, default
> value, etc.) in a schema .csv file next to the data file (such as this:
> https://github.com/Slicer/Slicer/blob/master/Libs/MRML/Core/Testing/TestData/table.schema.csv)
> be considered a good solution? Are there any other standards/best
> practices for storing csv schema?
> - Would anyone else need these features and could contribute some time
> for development or writing tests?
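
To illustrate the sidecar-schema idea, here is a minimal sketch of reading
a data file against a schema file. The schema layout used here
(columnName, type, defaultValue) is an assumption made for the example and
is not necessarily the layout of the Slicer file linked above; load_schema
and read_typed are hypothetical helpers.

```python
import csv
import io

# Assumed mapping from schema type names to Python converters.
CASTS = {"int": int, "double": float, "string": str}


def load_schema(schema_text):
    """Map each column name to a (converter, default string) pair."""
    reader = csv.DictReader(io.StringIO(schema_text))
    return {r["columnName"]: (CASTS[r["type"]], r["defaultValue"]) for r in reader}


def read_typed(data_text, schema):
    """Read CSV rows, converting each cell as the schema declares."""
    typed = []
    for row in csv.DictReader(io.StringIO(data_text)):
        out = {}
        for name, raw in row.items():
            # Columns missing from the schema stay as strings.
            cast, default = schema.get(name, (str, ""))
            # Empty or missing cells take the schema's default value.
            out[name] = cast(default if raw in ("", None) else raw)
        typed.append(out)
    return typed


schema_text = "columnName,type,defaultValue\nid,int,0\nweight,double,0.0\nlabel,string,\n"
data_text = "id,weight,label\n1,70,a\n2,,b\n"
table = read_typed(data_text, load_schema(schema_text))
print(table)
```

Because the type comes from the schema, the weight column is read as
double even though the only value present in this file looks like an
integer, and the empty cell gets the declared default.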
>
>
>
> Andras
>
>
>