[Paraview] Visualizing large data set of 2-d+time point data

Murphy, John T. jtmurphy at anl.gov
Fri Jul 22 10:38:37 EDT 2011


Hello,

I am working with some simulation models that generate large data sets that I would like to visualize using ParaView. The simulation consists of very large numbers of elements called 'agents', each having a number of invariant attributes and a location in 2-D space that varies through time (discrete time steps). Conceptually the data are then:

Agents' Descriptions:
AgentID   Type   ExtendedType   NetworkMembership   Etc… (other attributes TBD)
Agent1    1      5              17                  x..
Agent2    2      7              14                  x..
…



Agents' Locations:
AgentID   Time   X       Y
Agent1    1      100     100
Agent2    1      100     90
Agent1    2      100.3   100.5
Agent2    2      99.8    90.2
…


As the simulation progresses the agents move in relation to one another; this movement is what I need to visualize. I also need to be able to restrict the view and/or color the agents by type, extended type, and network membership, as well as other attributes I continue to develop.

I am very open to suggestions about how to visualize this, but I have two specific approaches in mind. In the first ('A'), all the data are displayed at once; time is used as the 3rd dimension, so the motion of agents in 2-D space plays out along the z axis. In the second ('B'), a 2-D view is all that is needed, but the agents' motion through time is animated.

I have been able to achieve both of these already, but only with some difficulty; I am wondering if there is a better way to do each one. I have a number of questions that I hope you can help me with.

For reference, the approach I am using now is:

- Data are written from the simulation to a CSV file with columns = AgentId, Time, X, Y, Z, Type, ExtendedType, (other…).

- If 'A' is to be used, all data are written to a single file; if 'B', data are written to multiple files with numeric suffixes, one per time step.

- ParaView can open the file(s) and import the data; if 'B', the multiple files are treated as a time series.

- I apply the 'Table To Points' filter; if 'A' I use X = X, Y = Y, and Z = Time; if 'B', I use X = X, Y = Y, and Z = Z; 'Z' in my data set is a dummy column that has zero as its only value.

- The results of the Table To Points filter can be displayed. I can select agents' color by 'Type' or 'Extended Type'. For option 'A' I scale the Z-axis by some multiplier, so that z-values are proportionally large compared to x and y values (Z is generally 1 - 1,000, while x and y are on the order of 0 – 10,000; the image is clearer when 'Z' is stretched and the points begin to appear as a long tube).

My questions are:

I) The first approach requires all of the data to be loaded as a single file; the second relies on the data being broken into multiple files named xx.1, xx.2, etc., with each file representing a timestep. I would like to know if there is a way to load the data in one format but arrive at either or both visualizations. (This would save me from having to output two sets of files for a given simulation.)
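
To make the two layouts concrete, the following is a minimal sketch (plain Python, outside ParaView) of deriving the per-timestep pieces from the single combined file. The function names split_by_time and write_series are hypothetical, and the column name "Time" and the xx.N naming are taken from the description above; this is not a ParaView feature, only an illustration of the conversion.

```python
import csv
import io
from collections import defaultdict


def split_by_time(csv_text, time_column="Time"):
    """Group the rows of one combined CSV by time step.

    Returns a dict mapping each time value (as a string) to CSV text
    that contains the original header plus only that step's rows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames
    rows_by_time = defaultdict(list)
    for row in reader:
        rows_by_time[row[time_column]].append(row)

    chunks = {}
    for time_value, rows in rows_by_time.items():
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        chunks[time_value] = buffer.getvalue()
    return chunks


def write_series(chunks, prefix="xx"):
    """Write each chunk to '<prefix>.<time>' (hypothetical naming, as in xx.1, xx.2)."""
    for time_value, text in chunks.items():
        with open(f"{prefix}.{time_value}", "w", newline="") as handle:
            handle.write(text)
```

With this kind of post-processing the simulation would only need to write the combined file; whether ParaView can do the equivalent split internally is exactly the question.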

II) Currently I also save, and load, the data as a 'flat file' in CSV format. One implication of this is that all of the agent attributes are reproduced in each line of the data output, so:

AgentID   Time   X       Y       Type   ExtendedType
Agent1    1      100     100     1      5
Agent2    1      100     90      2      7
Agent1    2      100.3   100.5   1      5
Agent2    2      99.8    90.2    2      7

This means that the 'attribute' information, which does not vary through time, is nevertheless duplicated in every row for a given agent. I'm more familiar with databases, so an approach where I can put the attributes in one table and 'join' them as needed would be nice; is one available in ParaView?
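
For clarity, the kind of 'join' meant here can be sketched in plain Python as below. This only illustrates the relational operation on the two conceptual tables shown earlier; the function name join_attributes is hypothetical, and nothing here is a ParaView call.

```python
import csv
import io


def join_attributes(locations_csv, attributes_csv, key="AgentID"):
    """Attach each agent's time-invariant attributes to its location rows.

    Both arguments are CSV text with a header row. The result is a list of
    dicts, one per location row, extended with the attribute columns.
    A location row whose agent has no attribute row raises KeyError.
    """
    attributes = {
        row[key]: row for row in csv.DictReader(io.StringIO(attributes_csv))
    }
    joined = []
    for location in csv.DictReader(io.StringIO(locations_csv)):
        merged = dict(location)
        extra = attributes[location[key]]
        # Copy every attribute column except the join key itself.
        merged.update({name: value for name, value in extra.items() if name != key})
        joined.append(merged)
    return joined
```

The attribute table is stored once and looked up per row, so the invariant columns never have to be written out at every time step.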

III) My current strategy is to load the data from a CSV file and then use the 'Table To Points' filter to convert it to data that ParaView can represent. For approach 'B' this expects a Z value, so I have a dummy column in my data set for "Z", the value of which is always zero. (For approach 'A', of course, the Z axis is 'Time'.) After this the only depiction I can render is 3-D, even though I only need 2-D. This seems inelegant and I'd like to avoid it; is this possible?

IV) In approach 'A', each row of the data has an agent ID; this makes it theoretically possible to follow one agent's path through time. However, the approach that I am using now treats each row of the data set as a separate data point, independent of all the others. I would like to create something that shows the tracks of individual agents through time, perhaps by drawing a line from the agent's location at t1 to its location at t2, etc. Is this possible?
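
As a rough sketch of what 'tracks' means here: group the rows by agent ID, sort each group by time, and connect consecutive points into one polyline per agent, with time on the z axis as in approach 'A'. The example below does this in plain Python and emits the result in the legacy VTK polydata text format, which ParaView can open. The function names build_tracks and tracks_to_vtk are hypothetical; this is one possible preprocessing step, not a description of a built-in ParaView filter.

```python
from collections import defaultdict


def build_tracks(rows):
    """Group (agent_id, time, x, y) rows into per-agent paths sorted by time."""
    tracks = defaultdict(list)
    for agent_id, time, x, y in rows:
        tracks[agent_id].append((time, x, y))
    for points in tracks.values():
        points.sort(key=lambda point: point[0])
    return dict(tracks)


def tracks_to_vtk(tracks):
    """Render tracks as legacy VTK polydata text, one polyline per agent.

    Each point is written as (x, y, time), so time runs along the z axis.
    """
    points = []
    lines = []
    for agent_id in sorted(tracks):
        start = len(points)
        for time, x, y in tracks[agent_id]:
            points.append((x, y, time))
        # A polyline is the list of indices of its points, in order.
        lines.append(list(range(start, len(points))))

    out = [
        "# vtk DataFile Version 3.0",
        "agent tracks",
        "ASCII",
        "DATASET POLYDATA",
        f"POINTS {len(points)} float",
    ]
    out += [f"{x} {y} {z}" for x, y, z in points]
    # The LINES size field counts every index plus one length entry per line.
    size = sum(len(line) + 1 for line in lines)
    out.append(f"LINES {len(lines)} {size}")
    out += [" ".join(str(v) for v in [len(line)] + line) for line in lines]
    return "\n".join(out) + "\n"
```

Writing the returned text to a file with a .vtk extension would give one line per agent; per-agent attributes such as Type are left out of this sketch.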

V) Finally, I will not be able to use a CSV format as the file sizes grow larger. I would like to use a format amenable to high-performance computing and parallel file writing; HDF5 and NetCDF are two options that seem readily available and workable. However, my understanding is that each of these defines a file format that can be specialized to accommodate different data structures. I also believe that ParaView can read HDF5 and NetCDF files, but I'm not clear whether it can read only certain common data structures written in them (such as 'HDF5 Image files') or whether it can read any HDF5 file and accommodate whatever data structure it contains. Is there some standard file format that I should use? Or will I need to develop a special-purpose structure to store my data in, say, an HDF5 file, and possibly develop a custom ParaView reader to accommodate it? Or is XDMF what is needed?

I am new to ParaView and to large-scale visualization; if there are existing ways to do what I need, please help me by directing me to them. It may be that I don't know the correct vocabulary for finding the data structures and procedures that I need; I believe the data I'm working with are termed 'particle data'? Any clarification or help is very much appreciated.

Thanks in advance,
John

--
John T. Murphy
Computational Postdoctoral Fellow
Decision and Information Sciences and
Argonne Leadership Computing Facility
Argonne National Laboratory
jtmurphy at anl.gov