[Insight-users] how to use itk::RegularExpressionSeriesFileNames

Wed Jun 3 17:28:49 EDT 2009

Hi Darren,

Thanks for pointing this out.

Yes, the documentation doesn't fully specify the behavior
of this regular expression engine.

The actual functionality is provided by the kwsys library in

                Insight/Utilities/kwsys

See the files

       RegularExpression.cxx
       RegularExpression.hxx.in

You will notice that the documentation for

    itkRegularExpressionSeriesFileNames

was extracted from the documentation in

       RegularExpression.hxx.in

This is a custom regular expression engine,
based on code that (at some point) was developed
at Texas Instruments.

If you want more details about the history of this code
we could track the sources of the files in kwsys.

     Regards,

           Luis

----------------------------------------
On Wed, Jun 3, 2009 at 2:59 PM, Darren Weber
<darren.weber.lists at gmail.com> wrote:

>
> Thanks, Bill.
>
> It's not clear from the header what the regex engine is.  Is it a custom
> regex or does it include a common regex library? e.g.:
> http://www.gnu.org/s/libc/manual/html_node/Pattern-Matching.html
> http://www.gnu.org/s/libc/manual/html_node/Regular-Expressions.html
> http://www.pcre.org/
>
> If the class uses a regex library, the documentation could point to online
> resources that define the regex language.  Unfortunately, there are subtle
> differences among regex libraries and it can be difficult to debug a regex
> without extended documentation and examples.
>
> http://www.regular-expressions.info/reference.html
> http://www.regular-expressions.info/refadv.html
> http://www.regular-expressions.info/refext.html
> http://www.regular-expressions.info/refflavors.html
>
> Now I understand that the subMatch is a component of the regex.  (Why
> didn't I understand that from the description?)  The header comment makes it
> clear that this is
>
> /** The index of the submatch that will be used to sort the matches. */
>
> May I suggest that this phrase be included in the description?
>
> So the sub-regex must be defined within the regex using the () notation.
> For example, the test contains:
>
>   fit->SetRegularExpression("[^.]*.(.*)");
>   fit->SetSubMatch(1);
>
> It appears that the regex engine doesn't require escapes for [] and ().  In
> this example, when the SetSubMatch method is called with the numeric
> argument, it refers to the sub-regex pattern within "(.*)", which looks like
> it might be the filename extension (bmp, gif, png, tif, etc.).  However, the
> prior . is not escaped, so it's unclear whether it matches any character
> ('.') or a period char ('\.' in most regex engines).
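One way to probe the escaped versus unescaped '.' question is a small stand-alone check. The sketch below is only an approximation: it assumes std::regex (ECMAScript grammar) as a stand-in for the kwsys engine, which may differ in details, and the helper name subMatchOf is hypothetical, not part of ITK.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not an ITK or kwsys function).
// Returns the text captured by submatch `index` (1 = first parenthesized
// group), or "<no match>" if the pattern matches nowhere in `name`.
// ASSUMPTION: std::regex stands in for the kwsys regex engine.
std::string subMatchOf(const std::string& pattern,
                       const std::string& name,
                       std::size_t index)
{
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(name, m, re) || index >= m.size())
  {
    return "<no match>";
  }
  return m[index].str();
}
```

With this stand-in, subMatchOf("[^.]*.(.*)", "image.png", 1) gives "png", but the same pattern also matches "imageXpng" (with an empty submatch), which suggests the unescaped '.' is acting as "any character". Writing the pattern as "[^.]*\\.(.*)" in C++ source makes the period literal, and "imageXpng" then no longer matches.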
>
> So the SetSubMatch method will always take a numeric argument, the index of
> the sub-regex (starting at 1, not 0).  For example, the following should be
> designed to exclude any files that begin with any number of '.' chars, then
> the file name is split into two sub-regex patterns to capture the file name
> and the file extension (assuming the file only has one '.' char in it to
> separate these parts of the full file name).  The period char '.' is not
> part of either sub-regex (unless the full file name has more than one).
>
>   fit->SetRegularExpression("[^.]*(.*)\\.(.*)$");
>   fit->SetSubMatch(2);
>
> - [^.]* matches zero or more characters other than '.' at the beginning of
> the string (inside [ ] the '.' is literal, and the leading ^ negates the set).
> - (.*)\.(.*)$ matches patterns like "abcdef.xyz" at the end of a string,
> sub 1 is "abcdef", sub 2 is "xyz".
>
> In this pattern, the second subexpression should be the file extension,
> without a '.' char.  (Although the effect of this regex may depend on how
> greedy the .* pattern is.)  The '\.' prior to the second sub-expression is
> used to escape the usual meaning of the '.' char to match any char, so that
> the file name can be split into the file name and its extension (assuming
> the full file name has only one '.' char in it).
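The greediness question can be checked by listing every submatch for a given name. This is a rough sketch, again assuming std::regex (ECMAScript grammar) as a stand-in for the kwsys engine; allSubMatches is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not an ITK or kwsys function).
// Returns all submatches of the first match (index 0 = whole match),
// or an empty vector when the pattern does not match.
// ASSUMPTION: std::regex stands in for the kwsys regex engine.
std::vector<std::string> allSubMatches(const std::string& pattern,
                                       const std::string& name)
{
  std::vector<std::string> out;
  const std::regex re(pattern);
  std::smatch m;
  if (std::regex_search(name, m, re))
  {
    for (const auto& sub : m)
    {
      out.push_back(sub.str());
    }
  }
  return out;
}
```

With this stand-in, the fragment "(.*)\\.(.*)$" splits "abcdef.xyz" into "abcdef" and "xyz". The full pattern "[^.]*(.*)\\.(.*)$" leaves submatch 1 empty for the same name, because the leading [^.]* has already consumed "abcdef"; for "a.b.c" it yields ".b" and "c". So sorting on submatch 2 looks safe for the extension, while submatch 1 may not hold the base name.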
>
> May I suggest a couple of features?
>
> First, the SetSubMatch method could take an array of arguments.  It could
> be possible to sort on more than one sub-regex, with the sort precedence
> based on the values in the array.  In the example above, a call like the
> following would sort first by the file extension, then by the file name.
>
>   unsigned int sub[2] = {2, 1};
>   fit->SetSubMatch(sub);
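Until something like that exists in the class, the multi-key ordering could be prototyped outside ITK. The following is a minimal sketch under stated assumptions: sortBySubMatches is a hypothetical free function, std::regex (ECMAScript grammar) stands in for the kwsys engine, and names that do not match the pattern are simply dropped.

```cpp
#include <algorithm>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical free function (not part of ITK).
// Sorts `names` by the submatches listed in `order`, first entry first.
// ASSUMPTION: std::regex stands in for the kwsys regex engine.
std::vector<std::string> sortBySubMatches(const std::vector<std::string>& names,
                                          const std::string& pattern,
                                          const std::vector<std::size_t>& order)
{
  typedef std::pair<std::vector<std::string>, std::string> Keyed;

  const std::regex re(pattern);
  std::vector<Keyed> keyed;
  for (const std::string& name : names)
  {
    std::smatch m;
    if (!std::regex_search(name, m, re))
    {
      continue; // names that do not match are dropped
    }
    // Build the sort key from the requested submatches, in precedence order.
    std::vector<std::string> key;
    for (std::size_t i : order)
    {
      key.push_back(i < m.size() ? m[i].str() : std::string());
    }
    keyed.push_back(Keyed(key, name));
  }

  // Vectors of strings compare lexicographically, key by key.
  std::stable_sort(keyed.begin(), keyed.end(),
                   [](const Keyed& a, const Keyed& b) { return a.first < b.first; });

  std::vector<std::string> sorted;
  for (const Keyed& k : keyed)
  {
    sorted.push_back(k.second);
  }
  return sorted;
}
```

For example, calling it on {"b.png", "a.tif", "a.png", "c.bmp"} with the pattern "(.*)\\.(.*)$" and order {2, 1} sorts by extension first and then by base name. The comparison here is purely alphabetic; a numeric sort would need the keys converted to numbers first.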
>
> Second, the class might include a convenience method for debugging, to
> print the file names.  The method might adapt some of the code in the test.
> Perhaps call it PrintFileNames, PrintFileNamesSortedAlpha,
> PrintFileNamesSortedNumeric.
>
> Thanks again!
>
> Take care,
> Darren
>
>
>
>
>
> On Wed, Jun 3, 2009 at 5:06 AM, Bill Lorensen <bill.lorensen at gmail.com> wrote:
>
>> The test Testing/Code/IO/itkRegularExpressionSeriesFileNamesTest.cxx
>> shows how to sort and print the results.
>>
>> On Wed, Jun 3, 2009 at 12:59 AM, Darren Weber
>> <darren.weber.lists at gmail.com> wrote:
>> >
>> > The software guide and Examples/IO/ImageSeriesReadWrite2.cxx provide some
>> > explanation of how to work with itk::RegularExpressionSeriesFileNames.
>> >
>> > However, I have not been able to find information on how to specify the
>> > regex and sort command line arguments for it.  The file name regex might
>> > be compatible with grep or sed, or some other regex engine?  What is the
>> > sort input, is it another regex?
>> >
>> > What is the best way to debug the regex and sort inputs?  What is the
>> > easiest way to get a std::cout list of the files after they are found and
>> > sorted?
>> >
>> > Thanks in advance,
>> > Darren
>> >
>> >
>> > PS,
>> >
>> > Detailed Description
>> >
>> > Generate an ordered sequence of filenames that match a regular expression.
>> >
>> > This class generates an ordered sequence of files whose filenames match a
>> > regular expression.  [What is the regex library?]  The file names are
>> > sorted using a sub expression match selected by SubMatch.  [What does this
>> > mean?]  Regular expressions are a powerful, compact mechanism for parsing
>> > strings.  Expressions consist of the following metacharacters:
>> >
>> > ^ Matches at beginning of a line
>> >
>> > $ Matches at end of a line
>> >
>> > . Matches any single character
>> >
>> > [ ] Matches any character(s) inside the brackets
>> >
>> > [^ ] Matches any character(s) not inside the brackets
>> >
>> > - Matches any character in range on either side of a dash
>> >
>> > * Matches preceding pattern zero or more times
>> >
>> > + Matches preceding pattern one or more times
>> >
>> > ? Matches preceding pattern zero or once only
>> >
>> > () Saves a matched expression and uses it in a later match
>> >
>> > Note that more than one of these metacharacters can be used in a single
>> > regular expression in order to create complex search patterns. For
>> > example, the pattern [^ab1-9] says to match any character sequence that
>> > does not begin with the characters "ab" followed by numbers in the series
>> > one through nine.
>> >
>> > Definition at line 72 of file itkRegularExpressionSeriesFileNames.h.
>> >
>> > _____________________________________
>> > Powered by www.kitware.com
>> >
>> > Visit other Kitware open-source projects at
>> > http://www.kitware.com/opensource/opensource.html
>> >
>> > Please keep messages on-topic and check the ITK FAQ at:
>> > http://www.itk.org/Wiki/ITK_FAQ
>> >
>> > Follow this link to subscribe/unsubscribe:
>> > http://www.itk.org/mailman/listinfo/insight-users
>> >
>> >
>>
>
>