ID: 0013806
Project: CMake
Category: CMake
View Status: public
Date Submitted: 2012-12-20 08:52
Last Update: 2016-06-10 14:31
Reporter: Andreas Mohr
Assigned To: Kitware Robot
Priority: none
Severity: text
Reproducibility: always
Status: closed
Resolution: moved
Platform: PC
OS: Linux
OS Version: RHEL5
Product Version: CMake 2.8.10.2
Target Version:
Fixed in Version:
Summary: 0013806: list(SORT) produces unavoidable data corruption (likely root cause: improper semi-colon string *payload* handling in CMake)
Description: I just wanted to extend a test case for my work on a new ENVIRONMENT option to add_custom_command() (adding many "interesting" env var key/value tests for escaping of XML, batch, shell, ... specials),
and ended up realizing that any escaped semi-colon string content will end up getting broken, hard, by CMake.

The test provided below will produce the following output:

$ cmake .
tlist pre-sort:
hi;there
cruel;world

tlist post-sort:
cruel
world
hi
there

extra_escaped_list pre-sort:
hi;there
cruel;world

extra_escaped_list post-sort:
cruel
world
hi
there

extra_escaped_list_i_mean_it pre-sort:
hi\\
there
cruel\\
world

extra_escaped_list_i_mean_it post-sort:
cruel\\
hi\\
there
world

-- Configuring done
-- Generating done



This shows that any SORT of the list will cause unsurvivable data corruption, which is a major, INSURMOUNTABLE problem (as the futile attempts at backslash-escaping in the test case below illustrate) when attempting to compile a list of patterns for escape tests towards rather unrelated system components.

This wide-spread data corruption by CMake core layers should be fixed quickly (we're now at 2.8.x, finally having reached good usability on the very large number of systems that CMake supports, where I really wouldn't have expected such data corruption issues to have remained).
I'd deem data corruption bugs to be of the near-highest level on the priority scale (with perhaps actual security issues topping it), thereby assigning priority urgent.
(I did read "[sldev] Semicolons and CMake" https://lists.secondlife.com/pipermail/sldev/2009-April/013502.html, and have to admit I walked away unconvinced)

This should possibly be handled by introducing a new case to the CMake policy mechanism, to preserve the (reportedly quite important in this case) bug-for-bug compat in older code.

If it isn't possible to fix this problem cleanly (or e.g. not in a single evolution step), then one should think of other possibilities to be able to work around the currently unavoidable data corruption. One way might be to introduce a special CMAKE_ESCAPE_* variable which when inserted marks content in a special manner to ensure proper handling.
Or possibly one could add new cmake_escape_*() function helpers rather than resorting to often unclean global-variable-based handling.
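
Until something like that exists, a user-level approximation of such helpers can be sketched as follows. The function names encode_semicolons() and decode_semicolons() and the "<SEMI>" token are hypothetical, not part of CMake, and the sketch assumes the token never occurs in real data. Each element is encoded before it enters the list, so list(SORT) never sees a ';' inside an element.

```cmake
# Hypothetical helpers (not CMake built-ins): swap ';' for a placeholder
# so that list commands never see a semicolon inside an element.
set(SEMI_TOKEN "<SEMI>")  # assumption: this token never appears in real data

function(encode_semicolons out_var value)
  string(REPLACE ";" "${SEMI_TOKEN}" encoded "${value}")
  set(${out_var} "${encoded}" PARENT_SCOPE)
endfunction()

function(decode_semicolons out_var value)
  string(REPLACE "${SEMI_TOKEN}" ";" decoded "${value}")
  set(${out_var} "${decoded}" PARENT_SCOPE)
endfunction()

# Build the list from encoded elements only.
set(tlist "")
encode_semicolons(item "hi;there")
list(APPEND tlist "${item}")
encode_semicolons(item "cruel;world")
list(APPEND tlist "${item}")

list(SORT tlist)  # operates on two elements, neither containing ';'

# Decode each element only at the point of use, and keep it quoted.
foreach(item IN LISTS tlist)
  decode_semicolons(plain "${item}")
  message("${plain}")
endforeach()
```

The decoded value is only safe as a quoted string; putting it back into a list variable would reintroduce the original problem.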

Thanks!
Steps To Reproduce:
cmake_minimum_required(VERSION 2.8)

project(list_escape_semicolon_test NONE)

function(show_list _title _list)
  message("${_title}:")
  foreach(elem_ ${_list})
    message("${elem_}")
  endforeach(elem_ ${_list})
  message("")
endfunction(show_list _title _list)

function(process_list _name _list)
  show_list("${_name} pre-sort" "${_list}")
  list(SORT _list)
  show_list("${_name} post-sort" "${_list}")
endfunction(process_list _name _list)

function(test_list_sort_escaping)
  set(tlist "")
  list(APPEND tlist "hi\;there")
  list(APPEND tlist "cruel\;world")
  process_list(tlist "${tlist}")
  set(tlist "")
  list(APPEND tlist "hi\\;there")
  list(APPEND tlist "cruel\\;world")
  process_list(extra_escaped_list "${tlist}")
  set(escape_string "\\\\")
  set(tlist "")
  list(APPEND tlist "hi${escape_string};there")
  list(APPEND tlist "cruel${escape_string};world")
  process_list(extra_escaped_list_i_mean_it "${tlist}")
endfunction(test_list_sort_escaping)

test_list_sort_escaping()
Additional Information: That's now the second (and unrelated) time in about two weeks that I stumbled (and fell) over this (the first time being reading in a file(STRINGS) with semi-colon payload and iterating over elements of the resulting list).
Tags: No tags attached.

  Notes
(0031931)
Brad King (manager)
2012-12-20 09:23

CMake is not a general-purpose data processing language. Semicolons are simply not supported in list values very well (nor are square brackets).

Semicolon-separated lists were originally created to handle lists of source files e.g.

 set(srcs a.c b.c)
 add_executable(a ${srcs})

so the implementation, created in the early days, did not take use cases beyond that into account. Back then CMake was used only for our own projects so we didn't need anything more robust.

Fixing this will require at least the following:

1. Teach list expansion parsing to handle escapes correctly. Replace '\\' with '\' and '\;' with ';' without dividing. There is already a partial implementation of this, but it doesn't quite work right.

2. Teach list construction to generate escapes correctly. Replace '\' with '\\' and ';' with '\;' in values. Even if this is fixed in the C++ list construction cases there will still be projects that make their own lists via string manipulation whose behavior would be changed by step 1.

3. I'm not sure what to do about the []-nesting cases.

4. I'm not sure what to do about backward compatibility. At least a policy will be needed, but it could be quite difficult to get right.
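
The gap between items 1 and 2 can be probed with a short script, given here as a sketch rather than a definitive test. It counts elements before and after list(SORT), using the same values as the reporter's first test case. Going by the output in the report, one would expect 2 elements before the sort and 4 after on CMake 2.8.10.2, because expansion keeps '\;' together while the rebuilt list is joined without re-escaping. Other versions have not been verified here.

```cmake
# Probe sketch: run with `cmake -P probe.cmake` (file name is illustrative).
set(tlist "")
list(APPEND tlist "hi\;there")     # stored with the backslash: hi\;there
list(APPEND tlist "cruel\;world")

list(LENGTH tlist len_before)      # list expansion does not divide at '\;'
list(SORT tlist)                   # rebuilds the stored value from the elements
list(LENGTH tlist len_after)

message("before sort: ${len_before} elements")
message("after sort:  ${len_after} elements")
message("stored value after sort: ${tlist}")
```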
(0031938)
Brad King (manager)
2012-12-21 08:45

Re 0013806:0031931: Having thought about this for a day my conclusion is that this is not worth fixing. The behavior has been this way for over 10 years and projects have gotten by with it and many may now depend on it. It's not even clear what the proper behavior would be if this were fixed because there are ambiguities. For example, does

 set(x "1;2;3")

store a list of three elements, a list of one element containing semicolons, or a non-list string value? (This question is rhetorical.)
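
For readers who want to see what CMake does today with such a value, here is a minimal sketch (the variable names are illustrative). Both forms of set() store the same string, and list(LENGTH) reports three elements for each, so the stored value alone cannot distinguish the three readings.

```cmake
# Sketch: run with `cmake -P ambiguity.cmake` (file name is illustrative).
set(x "1;2;3")   # one quoted argument that contains semicolons
set(y 1 2 3)     # three unquoted arguments, joined with ';' by set()

list(LENGTH x x_len)
list(LENGTH y y_len)

if("${x}" STREQUAL "${y}")
  message("x and y hold the same string: ${x}")
endif()
message("x has ${x_len} elements, y has ${y_len} elements")
```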

Of course list behavior could have been implemented more carefully from the beginning but it is too late to change behavior this fundamental to the language. Data processing is not a design goal.

I think the most we can do is improve documentation to warn about cases involving ';' inside values and nested inside '[]'.
(0042177)
Kitware Robot (administrator)
2016-06-10 14:28

Resolving issue as `moved`.

This issue tracker is no longer used. Further discussion of this issue may take place in the current CMake Issues page linked in the banner at the top of this page.

 Issue History
Date Modified Username Field Change
2012-12-20 08:52 Andreas Mohr New Issue
2012-12-20 09:23 Brad King Note Added: 0031931
2012-12-21 08:45 Brad King Note Added: 0031938
2012-12-21 08:45 Brad King Priority urgent => none
2012-12-21 08:45 Brad King Severity major => text
2012-12-21 08:45 Brad King Status new => backlog
2016-06-10 14:28 Kitware Robot Note Added: 0042177
2016-06-10 14:28 Kitware Robot Status backlog => resolved
2016-06-10 14:28 Kitware Robot Resolution open => moved
2016-06-10 14:28 Kitware Robot Assigned To => Kitware Robot
2016-06-10 14:31 Kitware Robot Status resolved => closed

