MantisBT - CMake
View Issue Details
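ID: 0015891
Project: CMake
Category: CMake
View Status: public
Date Submitted: 2015-12-21 15:21
Last Update: 2016-05-02 08:30
Reporter: Ben Boeckel
Assigned To: Clinton Stimpson
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: won't fix
Product Version: CMake 3.4.1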
 
0015891: CMake extracts non-standard filenames incorrectly from tarballs
Attached is a tarball which contains a file named "Appendix A \xc2\xa0 An Introduction to Preprocessor Metaprogramming.html" (the hex bytes are a UTF-8 non-breaking space). "cmake -E tar xzf" extracts the file improperly, mangling its name. CMake then cannot delete the mangled file.

Using msys2's tar (from git-bash) to extract the tarball works as expected (the filename looks fine and cmake can delete it).
No tags attached.
Attached file: bad-filename.tar.gz (170) 2015-12-21 15:21
https://public.kitware.com/Bug/file/5595/bad-filename.tar.gz
Issue History
2015-12-21 15:21 | Ben Boeckel | New Issue
2015-12-21 15:21 | Ben Boeckel | File Added: bad-filename.tar.gz
2015-12-21 15:59 | Ben Boeckel | Note Added: 0040018
2015-12-24 08:48 | Clinton Stimpson | Note Added: 0040025
2015-12-24 10:06 | Clinton Stimpson | Note Added: 0040026
2016-01-01 20:40 | Clinton Stimpson | Note Added: 0040036
2016-01-01 20:40 | Clinton Stimpson | Status: new => resolved
2016-01-01 20:40 | Clinton Stimpson | Resolution: open => won't fix
2016-01-01 20:40 | Clinton Stimpson | Assigned To: => Clinton Stimpson
2016-05-02 08:30 | Robert Maynard | Note Added: 0040990
2016-05-02 08:30 | Robert Maynard | Status: resolved => closed

Notes
(0040018)
Ben Boeckel   
2015-12-21 15:59   
When git-bash's tar extracts it, it is put on disk as \xc2\xa0, but when CMake extracts it, it becomes \xc2\x00\xa0\x00 (I think). Messing around with iconv, I can generate: \xc3\x82\xc2\xa0 which appears to be a surrogate pair.
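
The byte sequences quoted in this note can be reproduced with a short Python sketch. It assumes the mangling comes from reading each UTF-8 byte as one character of a single-byte code page; Latin-1 is used here only because it reproduces the bytes quoted above, not because the actual code page involved is known. Under that assumption, \xc3\x82\xc2\xa0 looks like doubly encoded UTF-8 rather than a surrogate pair (surrogates are UTF-16 code units in the range D800-DFFF).

```python
# Bytes of the non-breaking space as stored in the tarball (from the report).
raw = b"\xc2\xa0"

# Correct reading: decode as UTF-8, giving the single code point U+00A0.
correct = raw.decode("utf-8")

# Assumed misreading: each byte becomes one character of a single-byte
# code page. Latin-1 is an illustrative choice, not a confirmed cause.
misread = raw.decode("latin-1")

# Windows stores filenames as UTF-16; show the little-endian bytes of each.
correct_utf16 = correct.encode("utf-16-le")
misread_utf16 = misread.encode("utf-16-le")

# Re-encoding the misread string as UTF-8 yields the doubly encoded form.
double_encoded = misread.encode("utf-8")

print("correct UTF-16LE:", correct_utf16)
print("misread UTF-16LE:", misread_utf16)
print("double-encoded UTF-8:", double_encoded)
```

The misread UTF-16LE bytes match the \xc2\x00\xa0\x00 guess above, and the doubly encoded form matches the iconv result.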
(0040025)
Clinton Stimpson   
2015-12-24 08:48   
Which platform is this on?
(0040026)
Clinton Stimpson   
2015-12-24 10:06   
I see this problem on Windows.

The string for the filename doesn't make it to the CMake side; inside libarchive it is treated as being encoded in an OEM code page. Perhaps we need to look into updating libarchive.

My 7-zip had the exact same problem as CMake, until I upgraded my 7-zip.
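
As a rough illustration of what an OEM code page interpretation does to the filename, the following Python sketch decodes the name's UTF-8 bytes as code page 437. The choice of cp437 and the helper name `misread_as_code_page` are assumptions for illustration; the OEM code page on the affected machine is not stated in this report.

```python
def misread_as_code_page(name: str, code_page: str) -> str:
    """Decode a name's UTF-8 bytes as if they were in a single-byte code page."""
    return name.encode("utf-8").decode(code_page)


# Filename from the report, with the non-breaking space written as U+00A0.
name = "Appendix A \u00a0 An Introduction to Preprocessor Metaprogramming.html"

# cp437 is an assumed OEM code page, chosen only as an example.
mangled = misread_as_code_page(name, "cp437")

print(repr(name))
print(repr(mangled))
```

The two UTF-8 bytes of the non-breaking space become two separate characters, so the mangled name is one character longer than the original.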
(0040036)
Clinton Stimpson   
2016-01-01 20:40   
The format of the attached .tar file is an old format where encoding is undefined, or rather, the encoding is determined by the environment of the machine that created it.

If you want a defined encoding, you need to switch to another tar format, such as POSIX tar, where filenames in a .tar file are UTF-8.

If I take the attached .tar file, and re-create it using
$ tar --format=posix -cf ....
then take that new file over to Windows where I do cmake -E tar zxf ..., then I have no problem.

In summary, if you want to use non-ASCII filenames in .tar, use a more recent standard such as POSIX tar.
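
For anyone producing such archives from a script, here is a minimal sketch of the same idea using Python's standard tarfile module, whose PAX_FORMAT writes the POSIX.1-2001 (pax) format. It is a stand-in for the tar --format=posix command above, not the exact procedure used there; the payload is a placeholder and the archive is built in memory.

```python
import io
import tarfile

# Filename from the report, with the non-breaking space written as U+00A0.
name = "Appendix A \u00a0 An Introduction to Preprocessor Metaprogramming.html"
payload = b"<html></html>"  # placeholder content, not the original file

# Write an in-memory archive in POSIX (pax) format; the non-ASCII name is
# stored as UTF-8 in a pax extended header.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as archive:
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    archive.addfile(info, io.BytesIO(payload))

raw = buffer.getvalue()

# Read the archive back and collect the member names.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as archive:
    extracted_names = archive.getnames()

print(extracted_names == [name])
```

Because the name travels in the extended header with a defined encoding, a reader does not have to guess a code page.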
(0040990)
Robert Maynard   
2016-05-02 08:30   
Closing resolved issues that have not been updated in more than 4 months.