Re: shell script for removing unprintable characters in file names
- In reply to: Per olof Ljungmark : "shell script for removing unprintable characters in file names"
Date: Sat, 30 Nov 2024 20:06:15 UTC
On 11/30/24 07:33, Per olof Ljungmark wrote:
> Hi,
>
> I am tasked with recovering hundreds or more files created with unknown
> OSs and have unknown characters in the name, replaced with a '?'.
>
> Like file?nam?.???
>
> Please, if you have such a script can you post or email it? Replacing
> the unknown character with anything, like '-' or '_' using whatever
> shell, sh, bash or csh.
>
> Thanks a lot!
>
> Per

The first thing to understand is that traditional Unix file names may be composed of any 8-bit byte except the directory path delimiter (forward slash '/', hexadecimal 0x2f) and the C programming language string terminator (NUL, hexadecimal 0x00).

Create a test file with a space and some control characters (tab and newline) in the file name:

2024-11-30 11:40:47 dpchrist@laalaa ~/sandbox/rename
$ cat /etc/debian_version ; uname -a
11.11
Linux laalaa 5.10.0-33-amd64 #1 SMP Debian 5.10.226-1 (2024-10-03) x86_64 GNU/Linux

2024-11-30 11:41:04 dpchrist@laalaa ~/sandbox/rename
$ perl -e 'open FH, ">", "space tab\tnewline\n"; print FH "hello, world!\n"; close FH'

2024-11-30 11:41:55 dpchrist@laalaa ~/sandbox/rename
$ ls
'space tab'$'\t''newline'$'\n'

2024-11-30 11:42:09 dpchrist@laalaa ~/sandbox/rename
$ ls | hexdump
00000000 73 70 61 63 65 20 74 61 62 09 6e 65 77 6c 69 6e |space tab.newlin|
00000010 65 0a 0a |e..|
00000013

2024-11-30 11:42:31 dpchrist@laalaa ~/sandbox/rename
$ cat 'space tab'$'\t''newline'$'\n'
hello, world!
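The original question asked for a plain shell script that substitutes a fixed character. A minimal POSIX sh sketch of that approach follows; it is not part of the original reply, and the function name sanitize_dir is hypothetical. It assumes that "unprintable" means any byte outside printable ASCII (0x20 through 0x7e in the C locale, so tab, newline and every byte 0x80 and above), and it replaces each such byte with '_'.

```sh
# Hypothetical helper (not from the reply above): rename every entry of
# directory $1 whose name contains bytes outside printable ASCII, replacing
# each such byte with '_'. Change '_' in the tr command to use another character.
sanitize_dir() {
    dir=$1
    # The three patterns cover ordinary names and dot-files (but not . or ..).
    for path in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # An unmatched pattern stays literal; skip anything that does not exist.
        [ -e "$path" ] || [ -L "$path" ] || continue
        old=${path##*/}
        # In the C locale [:print:] is exactly the bytes 0x20-0x7e. Newlines are
        # replaced too, so command substitution has nothing to strip.
        new=$(printf '%s' "$old" | LC_ALL=C tr -c '[:print:]' '_')
        [ "$new" = "$old" ] && continue
        # The mapping is lossy, so never overwrite an existing name.
        if [ -e "$dir/$new" ] || [ -L "$dir/$new" ]; then
            printf 'skipping %s: target exists\n' "$new" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}
```

For example, after pasting the function into sh, `sanitize_dir /path/to/recovered` would rename the entries of that (placeholder) directory. Because 'a<TAB>b' and 'a<NEWLINE>b' both become 'a_b', the sketch reports and skips a collision instead of silently replacing one file with another.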
Perl and the URI::Escape module can be used to replace problematic characters with percent-encoded hexadecimal escape codes (%XX):

2024-11-30 11:47:11 dpchrist@laalaa ~/sandbox/rename
$ perl -v | head -n 2 | tail -n 1
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi

2024-11-30 11:48:11 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape -e 'print $URI::Escape::VERSION, $/'
5.08

2024-11-30 11:49:25 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape=uri_escape -e 'foreach (@ARGV) {$in=$_; $out=uri_escape($_); rename($in, $out) or die "failed to rename $in"}' *

2024-11-30 11:50:54 dpchrist@laalaa ~/sandbox/rename
$ ls
space%20tab%09newline%0A

By encoding problematic characters, rather than replacing them with a constant placeholder character ('?', '_', etc.), information is preserved and the process is reversible:

2024-11-30 11:50:57 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape=uri_unescape -e 'foreach (@ARGV) {$in=$_; $out=uri_unescape($_); rename($in, $out) or die "failed to rename $in"}' *

2024-11-30 11:51:07 dpchrist@laalaa ~/sandbox/rename
$ ls
'space tab'$'\t''newline'$'\n'

HTH,

David

p.s. I tried rename(1), but out of the box it has problems with newlines. Writing a pair of one-liners was faster than trying to work around rename(1):

https://manpages.debian.org/bullseye/rename/rename.1.en.html
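Where Perl or URI::Escape is not available, the uri_escape one-liner above can be approximated in POSIX sh with od(1) and awk(1). This is a sketch under assumptions, not part of the original reply: the helper names pct_encode and encode_dir are hypothetical, and the escaped set is assumed to be every byte outside the RFC 3986 unreserved characters [A-Za-z0-9._~-], which is intended to mirror uri_escape's default. For the test file above it should produce the same space%20tab%09newline%0A name.

```sh
# Hypothetical helper (not from the reply above): print $1 with every byte
# outside [A-Za-z0-9._~-] written as %XX (uppercase hex), like uri_escape.
pct_encode() {
    # od prints each byte as an unsigned decimal number; awk re-emits the
    # unreserved ASCII bytes unchanged and everything else as %XX.
    printf '%s' "$1" | od -An -v -tu1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                c = $i + 0
                if ((c >= 48 && c <= 57) || (c >= 65 && c <= 90) ||
                    (c >= 97 && c <= 122) || c == 45 || c == 46 ||
                    c == 95 || c == 126)
                    printf "%c", c
                else
                    printf "%%%02X", c
            }
        }'
}

# Hypothetical helper: percent-encode the names of all entries in directory $1.
encode_dir() {
    dir=$1
    for path in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # An unmatched pattern stays literal; skip anything that does not exist.
        [ -e "$path" ] || [ -L "$path" ] || continue
        old=${path##*/}
        new=$(pct_encode "$old")
        [ "$new" = "$old" ] && continue
        # Perl's rename() replaces an existing target; refuse to do that here.
        if [ -e "$dir/$new" ] || [ -L "$dir/$new" ]; then
            printf 'skipping %s: target exists\n' "$new" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}
```

Because the output is meant to follow uri_escape's conventions (including '%' itself becoming %25), the uri_unescape one-liner above should be able to reverse it. The one deliberate difference is that this sketch reports and skips a file whose encoded name already exists instead of overwriting it.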