[Bug 268189] BSD tar incorectly encode UTF-8 sequences
Date: Tue, 06 Dec 2022 08:22:58 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268189

            Bug ID: 268189
           Summary: BSD tar incorectly encode UTF-8 sequences
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Many People
          Priority: ---
         Component: bin
          Assignee: bugs@FreeBSD.org
          Reporter: aeder@list.ru

BSD tar incorrectly encodes UTF-8 sequences.

How to repeat:

Create two directories with the following (UTF-8) names:

d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b8 cc 86
d0 bf d0 be d0 bb d0 b5 d0 b2 d0 be d0 b9

("полевой" and "полевой"). They look exactly the same, but they are actually different names. The difference is that the sequence 'd0 b9' encodes the single Cyrillic character 'й', whereas 'd0 b8 cc 86' encodes two characters: Cyrillic 'и' followed by a combining diacritic that I can't enter here.

Such directories or files can be created, but when they are archived with BSD tar, the second name is replaced by the first. Adding the --posix option or setting LC_ALL=C doesn't help. GNU tar handles such files correctly, as separate files/directories.

I think at least --posix (or some other option) should make it possible to COMPLETELY disable all filename encoding/decoding operations.

The problem also occurs in 12.3-RELEASE, but seems to be absent in the 10.x releases.

--
You are receiving this mail because:
You are the assignee for the bug.
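
The two byte sequences in the report can be inspected with a short Python sketch. It shows that they decode to different strings that become equal under Unicode canonical normalization (NFC/NFD). This would be consistent with the described symptom if the archiver normalizes file names, but that is an assumption here and not something the report confirms. The names `DECOMPOSED`, `PRECOMPOSED`, and `describe` are illustrative only.

```python
import unicodedata

# Byte sequences copied from the report (spaces are ignored by fromhex).
DECOMPOSED = bytes.fromhex("d0bf d0be d0bb d0b5 d0b2 d0be d0b8 cc86")
PRECOMPOSED = bytes.fromhex("d0bf d0be d0bb d0b5 d0b2 d0be d0b9")


def describe(raw):
    """Return one 'U+XXXX NAME' entry per code point in a UTF-8 byte string."""
    text = raw.decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


a = DECOMPOSED.decode("utf-8")
b = PRECOMPOSED.decode("utf-8")

# The last code points differ: 'и' + combining mark versus a single 'й'.
print(describe(DECOMPOSED)[-2:])
print(describe(PRECOMPOSED)[-1:])

# Byte-wise and code-point-wise the names are distinct ...
print("raw strings equal:", a == b)

# ... but canonical normalization maps each form onto the other.
print("equal after NFC:", unicodedata.normalize("NFC", a) == b)
print("equal after NFD:", unicodedata.normalize("NFD", b) == a)
```

A tool that compares or stores names only after such normalization would treat both directories as one entry, whereas a tool that handles names as opaque bytes would keep them apart.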