ID: 0015891
Project: CMake
Category: CMake
View Status: public
Date Submitted: 2015-12-21 15:21
Last Update: 2016-05-02 08:30
Reporter: Ben Boeckel
Assigned To: Clinton Stimpson
Priority: normal
Severity: minor
Reproducibility: have not tried
Status: closed
Resolution: won't fix
Platform:
OS:
OS Version:
Product Version: CMake 3.4.1
Target Version:
Fixed in Version:
Summary: 0015891: CMake extracts non-standard filenames incorrectly from tarballs
Description: Attached is a tarball which contains a file "Appendix A \xc2\xa0 An Introduction to Preprocessor Metaprogramming.html" (the hex is a UTF-8 non-breaking space). "cmake -E tar xzf" extracts the file improperly, mangling its name. CMake then cannot delete the mangled file.

Using msys2's tar (from git-bash) to extract the tarball works as expected (the filename looks fine and CMake can delete it).
Tags: No tags attached.
Attached Files: bad-filename.tar.gz (170 bytes) 2015-12-21 15:21

 Relationships

  Notes
(0040018)
Ben Boeckel (developer)
2015-12-21 15:59

When git-bash's tar extracts it, it is put on disk as \xc2\xa0, but when CMake extracts it, it becomes \xc2\x00\xa0\x00 (I think). Messing around with iconv, I can generate: \xc3\x82\xc2\xa0 which appears to be a surrogate pair.
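
The byte sequences quoted above are consistent with one particular reading, sketched here in Python as an assumption rather than a confirmed diagnosis: each byte of the UTF-8 name is taken to be a whole character (Latin-1 is used below only as a stand-in for whatever code page was actually applied), and the resulting two-character string is then re-encoded. Under that reading, \xc2\x00\xa0\x00 would be the UTF-16LE form and \xc3\x82\xc2\xa0 a double UTF-8 encoding of U+00C2 U+00A0, not a surrogate pair.

```python
# U+00A0 NO-BREAK SPACE, the character in the attached file's name.
nbsp = "\u00a0"

# What a UTF-8 aware tar puts on disk.
utf8 = nbsp.encode("utf-8")               # b'\xc2\xa0'

# Assumption: the two bytes are misread as two single-byte characters.
# Latin-1 maps every byte to the code point with the same value.
misread = utf8.decode("latin-1")          # 'Â' followed by a no-break space

# Writing that two-character string out as wide (UTF-16LE) characters.
as_utf16le = misread.encode("utf-16-le")  # b'\xc2\x00\xa0\x00'

# Writing the same string out as UTF-8 a second time.
as_utf8_again = misread.encode("utf-8")   # b'\xc3\x82\xc2\xa0'

print(utf8, as_utf16le, as_utf8_again)
```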
(0040025)
Clinton Stimpson (developer)
2015-12-24 08:48

Which platform is this on?
(0040026)
Clinton Stimpson (developer)
2015-12-24 10:06

I see this problem on Windows.

The string for the filename doesn't make it to the CMake side and is treated as being in an OEM code page by libarchive. Perhaps we need to look into updating libarchive.

My 7-zip had the exact same problem as CMake, until I upgraded my 7-zip.
(0040036)
Clinton Stimpson (developer)
2016-01-01 20:40

The format of the attached .tar file is an old format where the encoding is undefined, or rather, the encoding is determined by the environment of the machine that created it.

If you want a defined encoding, you need to switch to another tar format, such as posix tar, where filenames in a .tar file are UTF-8.

If I take the attached .tar file, and re-create it using
$ tar --format=posix -cf ....
then take that new file over to Windows where I do cmake -E tar zxf ..., then I have no problem.

In summary, if you want to use non-ASCII filenames in .tar, use a more recent standard such as posix tar.
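
The difference between the two formats can be reproduced without CMake. The following is a minimal sketch using Python's standard tarfile module, not the libarchive code path that CMake uses. Two things in it are assumptions made for illustration: USTAR_FORMAT stands in for the "old format" of the attachment, and cp437 stands in for the OEM code page of the extracting machine.

```python
import io
import tarfile

NAME = "Appendix A \u00a0 An Introduction to Preprocessor Metaprogramming.html"

def make_tar(fmt):
    """Build an in-memory tar holding one empty file called NAME."""
    buf = io.BytesIO()
    # encoding="utf-8" plays the role of the creating machine's environment.
    with tarfile.open(fileobj=buf, mode="w", format=fmt, encoding="utf-8") as tf:
        tf.addfile(tarfile.TarInfo(NAME))
    return buf.getvalue()

def names_seen(data, encoding):
    """List member names as a reader that assumes `encoding` would see them."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r", encoding=encoding) as tf:
        return tf.getnames()

legacy = make_tar(tarfile.USTAR_FORMAT)  # raw name bytes, no declared encoding
posix = make_tar(tarfile.PAX_FORMAT)     # name kept in a UTF-8 pax header

# Read both archives as a machine with a different code page would.
legacy_names = names_seen(legacy, "cp437")
posix_names = names_seen(posix, "cp437")

print(ascii(legacy_names))  # the no-break space comes back as two other characters
print(ascii(posix_names))   # the original name survives
```

In the legacy archive the reader has to guess how the name bytes were encoded, so a wrong guess mangles the name. In the posix (pax) archive the name travels in an extended header that is defined to be UTF-8, so the reader's own code page does not matter.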
(0040990)
Robert Maynard (manager)
2016-05-02 08:30

Closing resolved issues that have not been updated in more than 4 months.

 Issue History
Date Modified Username Field Change
2015-12-21 15:21 Ben Boeckel New Issue
2015-12-21 15:21 Ben Boeckel File Added: bad-filename.tar.gz
2015-12-21 15:59 Ben Boeckel Note Added: 0040018
2015-12-24 08:48 Clinton Stimpson Note Added: 0040025
2015-12-24 10:06 Clinton Stimpson Note Added: 0040026
2016-01-01 20:40 Clinton Stimpson Note Added: 0040036
2016-01-01 20:40 Clinton Stimpson Status new => resolved
2016-01-01 20:40 Clinton Stimpson Resolution open => won't fix
2016-01-01 20:40 Clinton Stimpson Assigned To => Clinton Stimpson
2016-05-02 08:30 Robert Maynard Note Added: 0040990
2016-05-02 08:30 Robert Maynard Status resolved => closed

