[Paraview] [EXTERNAL] Re: Color legend and log scaling

Scott, W Alan wascott at sandia.gov
Tue Jul 1 18:23:29 EDT 2014


With regard to data spanning more than 4 orders of magnitude: this is handled by making q user selectable.

With regards to setting q to include most data, using a q at all is user selectable.  If you turn it off, Rescale to Data Range would just set min and max to include all data.  If data included 0, which is illegal, you have to do something.  Setting the min to be q orders of magnitude smaller than max is better than setting it to 1!  Then, if that wasn't what you wanted, you could always reset it manually.  My goal is to create a smarter way of automatically dealing with min and max for log scaling.
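To make the rescale rule above concrete, here is a minimal Python sketch. It is not ParaView code: the function name, its arguments, and the reading of "clamp" as "never go below max*10^-q" are assumptions made for illustration only.

```python
import math


def log_color_range(values, q=4, clamp=True):
    """Return (lo, hi) for a log-scaled color map.

    Hypothetical helper for illustration, not part of ParaView.
    """
    finite = [v for v in values if math.isfinite(v)]
    positives = [v for v in finite if v > 0]
    if not positives:
        raise ValueError("log scaling needs at least one positive finite value")

    hi = max(positives)
    floor = hi * 10.0 ** (-q)  # q orders of magnitude below the maximum
    data_min = min(finite)

    if data_min <= 0:
        # Zero or negative data cannot sit on a log axis, so fall back to
        # the floor rather than to an arbitrary minimum of 1.
        return floor, hi
    if clamp:
        # Assumed reading of "clamp": never go lower than the floor.
        return max(data_min, floor), hi
    # q turned off: plain Rescale to Data Range.
    return data_min, hi
```

For data such as [0.0, 2.0, 1000.0] with q = 4, this gives a range of roughly 0.1 to 1000 instead of 1 to 1000.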

With regard to data that is an "invalid value", this is an orthogonal problem.  Invalid data (e.g., NaNs) is already handled, and maybe we should expand this to include huge positive and negative numbers (1e38 and -1e38).  I believe it is still outside of this discussion.

Regarding negative values, my bad - I was not clear.  A vertical normal log scale bar, going from 1e8 to 1e4, may have minor labels (working down) of 1e8, 1e7, 1e6, 1e5, and 1e4.  I.e., each tick towards the top represents a change of 10 million units, and each tick towards the bottom represents a change of a thousand units.  Now, for the negative case, this would be reversed.  The color bar remains colored as before (for default color map, red on top, blue on bottom).  Since more positive numbers are always on top, we would run as follows (working down the minor labels): -1e4, -1e5, -1e6, -1e7 and -1e8.  Units change by the thousands at the top of the color legend, and millions towards the bottom.
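The label ordering described above can be sketched in a few lines of Python. The helper name is hypothetical, and it only handles ranges that are entirely positive or entirely negative.

```python
import math


def decade_ticks_top_down(lo, hi):
    """List decade tick values from the top of a vertical legend to the bottom.

    Hypothetical helper for illustration. The most positive value is always
    placed at the top, for positive and negative ranges alike.
    """
    if lo > 0 and hi > 0:
        small, large, sign = lo, hi, 1.0
    elif lo < 0 and hi < 0:
        small, large, sign = -hi, -lo, -1.0  # work with magnitudes
    else:
        raise ValueError("range must be all positive or all negative")

    # Small tolerance guards against log10 landing just off an integer.
    e_min = math.ceil(math.log10(small) - 1e-9)
    e_max = math.floor(math.log10(large) + 1e-9)
    exponents = list(range(e_min, e_max + 1))

    if sign > 0:
        exponents.reverse()  # positive: largest magnitude on top
    # Negative: smallest magnitude (least negative) stays on top.
    return [sign * 10.0 ** e for e in exponents]


print(decade_ticks_top_down(1e4, 1e8))
print(decade_ticks_top_down(-1e8, -1e4))
```

The first call lists 1e8 down to 1e4, and the second lists -1e4 down to -1e8, matching the two orderings in the paragraph above.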

Once again, you could always manually set your min and max.  But, I am arguing that what we currently do - set min to 1, is arbitrary, crude and wrong.

Alan



-----Original Message-----
From: David Thompson [mailto:david.thompson at kitware.com] 
Sent: Tuesday, July 01, 2014 3:44 PM
To: Scott, W Alan
Cc: paraview at paraview.org
Subject: [EXTERNAL] Re: [Paraview] Color legend and log scaling

Hi Alan,

> I would like to propose a few changes in our log scaling algorithm for painting a dataset by a variable.   I discussed this with Utkarsh, and he asked that I bounce it off the e-mail list.  So, here goes.
>  
> Currently, when a user log scales a variable, if all data is positive, ParaView just uses the normal min and max.  There are times when this is not proper - for instance when looking at the temperature or density of material in a supernova, or velocity of outbound gas.  Another example is large data, with noise around zero.  I would like to propose that we have a user selectable option to set the minimum at maximum*10^-q, where q is user defined but defaults to 4.  In other words, the minimum would be set to 1*10^-4 of what the maximum is.

Avoiding a "window" around 0 in the initial view sounds good to me. However, I can imagine some cases where the data spans more than 4 orders of magnitude. One thing I've seen (debatably bad, but something ParaView must deal with) is simulations/datasets that use large numbers (e.g., 1e+38) to mark invalid values. (The LDAV climate data does this.) Showing a plot with the initial view set to [1e34,1e38] would not be useful, since it would only show invalid values. Another is chemical reaction simulations where concentrations span much more than 4 orders of magnitude (I've seen some span 11 or 12 orders of magnitude, but 5 or 6 can be common).

What about choosing q to ensure that a significant fraction (say 90%?) of the data is actually on-screen? It is not terribly hard (even in parallel) to extract a fixed-size sample that approximates a histogram to within a few percent. We could use that to determine where the bulk of the data resides and ensure that the q-value does not leave more than 10% offscreen.
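As a rough sketch of that idea, the Python below picks the smallest integer q that keeps a requested fraction of a sample on-screen. It works on a plain in-memory list rather than a parallel histogram, and the function name, the defaults, and the sample values are all made up for illustration.

```python
import math


def choose_q(sample, coverage=0.9, q_min=1, q_max=30):
    """Smallest integer q such that at least `coverage` of the positive
    sample values lie in [max * 10**-q, max].

    Hypothetical helper for illustration; `sample` stands in for the
    fixed-size sample mentioned above.
    """
    positives = [v for v in sample if v > 0 and math.isfinite(v)]
    if not positives:
        raise ValueError("sample has no positive finite values")

    hi = max(positives)
    needed = coverage * len(positives)
    for q in range(q_min, q_max + 1):
        floor = hi * 10.0 ** (-q)
        on_screen = sum(1 for v in positives if v >= floor)
        if on_screen >= needed:
            return q
    return q_max  # give up widening; caller may want to warn


# Made-up sample: most values within 2 decades, a few much smaller.
sample = [1.0, 0.5, 0.3, 0.2, 0.05, 0.03, 0.02, 3e-6, 2e-6, 5e-12]
print(choose_q(sample, coverage=0.85))
```

With this sample, a fixed q of 4 would show only 7 of the 10 values, so the search widens the range until 9 of them are on-screen.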

> If all of the user's data is negative, ParaView grumbles, and then seg faults using a current master git pull - not optimal behavior.  In PV 4.1, it just sets min and max to 0.  I would like to propose that ParaView calculate the log of the data, as follows:  Index= -(log(abs(Var))).  Then, just draw the color legend as normal - for instance, red at top, white in the middle and blue at bottom.  Tick marks will be the reverse of positive log scaling - with the dense numbers, more negative numbers at the bottom and less dense, less negative numbers at the top.

I'm not sure I understand this, especially your use of "dense". It sounds like you have a particular dataset in mind where the probability density is low near zero. Are you saying you want the color scale to be different in the case of data that is all negative numbers? Or that log plots in general should have colors reversed?

> The problem arises with data that spans positive and negative numbers.  Since the log of 0 is undefined (it tends to negative infinity), we have to deal with very small numbers in a special way.  I propose that we find maxVal = max(maximum, abs(minimum)).  Then, we set the color bar to run from maxVal to -maxVal.  We log scale the top half of the color legend, running from maxVal to maxVal*10^-4, and we reverse log scale the bottom half of the color legend, running from -maxVal*10^-4 to -maxVal.  We calculate this negative range the same as the all negative data section above.  All data between maxVal*10^-4 and -maxVal*10^-4 would remain white by default, or user selectable black.

I like the idea of having a custom color for the range [-maxVal*10^-q, maxVal*10^-q] and I think I understand and agree with the simplified bounds/ticks for the case where the data crosses zero -- at least when it firmly crosses the origin. However, the chemistry simulations I mentioned above would occasionally have very slightly negative concentrations. Physically they can never be negative but floating-point precision meant that some were. In this case, we might have a distribution of concentrations that went from -1e-16 to 1 with the bulk in the range [1e-10,1]. Using the algorithm above for the axes would assign half of the color palette to the range [-1,-1e-4], half to [1e-4,1] and draw a significant fraction of the data as "black". If instead the histogram were used to choose q, then we might be able to decide that all of the "naughty" data were outliers and select a log color-scale of [1e-10,1].
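To illustrate the concern, here is a small Python sketch of the two-sided mapping as I read the proposal. The function name and the convention of returning a position in [0, 1] (0 at the bottom of the legend, 1 at the top, None for the band around zero) are assumptions for illustration, not ParaView behavior.

```python
import math


def two_sided_log_position(value, max_val, q=4):
    """Map a value to a position on a two-sided log color bar.

    Hypothetical helper for illustration. The top half covers
    [max_val * 10**-q, max_val], the bottom half covers
    [-max_val, -max_val * 10**-q], and anything in between returns None,
    meaning "paint with the special color" (white or black).
    """
    if max_val <= 0 or q <= 0:
        raise ValueError("max_val and q must be positive")

    floor = max_val * 10.0 ** (-q)
    magnitude = abs(value)
    if magnitude < floor:
        return None  # inside the band around zero

    # 0 at the floor, 1 at max_val; clamp values beyond max_val.
    t = math.log10(magnitude / floor) / q
    t = min(max(t, 0.0), 1.0)
    return 0.5 + 0.5 * t if value > 0 else 0.5 - 0.5 * t


# The chemistry case above: data from -1e-16 to 1, so maxVal = 1.
max_val = max(1.0, abs(-1e-16))
for v in (1.0, 1e-2, 1e-7, -1e-16):
    print(v, two_sided_log_position(v, max_val))
```

With q = 4, a concentration of 1e-7 falls in the band around zero and gets the special color, even though it sits inside the bulk range [1e-10,1], and the whole bottom half of the legend is spent on values that do not occur.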

	David

> User selectable functionality would be as follows:
> 	* to allow/ not allow negative numbers (default allow)
> 	* to be able to change the q exponent (i.e., 4 above) (default 4)
> 	* to be able to change the color used to paint values that are too small (default white)
> 	* to clamp minimum to some number (such as q == 4 above).  (default on).
>  
> Thoughts?
>  
> Alan
>  
>  


