Re: shell script for removing unprintable characters in file names

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Sat, 30 Nov 2024 20:06:15 UTC
On 11/30/24 07:33, Per olof Ljungmark wrote:
> Hi,
> 
> I am tasked with recovering hundreds or more files created with unknown 
> OSs and have unknown characters in the name, replaced with a '?'.
> 
> Like file?nam?.???
> 
> Please, if you have such a script can you post or email it? Replacing 
> the unknown character with anything, like '-' or '_' using whatever 
> shell, sh, bash or csh.
> 
> Thanks a lot!
> 
> Per


The first thing to understand is that a traditional Unix file name may 
be composed of any 8-bit characters except the directory path delimiter 
(forward slash '/', hexadecimal 0x2f) and the C programming language 
string terminator (NUL, hexadecimal 0x00).
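
Since almost any byte is legal in a name, a first step is often just 
finding the names that need attention.  A minimal sketch, assuming bash; 
the helper name has_unprintable is hypothetical, and "unprintable" is 
taken here to mean any byte outside printable ASCII:

```shell
# Hypothetical helper: succeed if the name contains any byte outside
# printable ASCII (control characters, DEL, or bytes >= 0x80).
has_unprintable() {
    local LC_ALL=C    # assumption: compare byte by byte, ASCII classes only
    case $1 in
        *[![:print:]]*) return 0 ;;
        *)              return 1 ;;
    esac
}
```

Looping over a glob (for f in *) and calling has_unprintable "$f" avoids 
parsing ls output, which cannot represent a newline inside a name 
unambiguously.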


Create a test file with some control characters in the file name:

2024-11-30 11:40:47 dpchrist@laalaa ~/sandbox/rename
$ cat /etc/debian_version ; uname -a
11.11
Linux laalaa 5.10.0-33-amd64 #1 SMP Debian 5.10.226-1 (2024-10-03) x86_64 GNU/Linux

2024-11-30 11:41:04 dpchrist@laalaa ~/sandbox/rename
$ perl -e 'open FH, ">", "space tab\tnewline\n"; print FH "hello, world!\n"; close FH'

2024-11-30 11:41:55 dpchrist@laalaa ~/sandbox/rename
$ ls
'space tab'$'\t''newline'$'\n'

2024-11-30 11:42:09 dpchrist@laalaa ~/sandbox/rename
$ ls | hexdump -C
00000000  73 70 61 63 65 20 74 61  62 09 6e 65 77 6c 69 6e  |space tab.newlin|
00000010  65 0a 0a                                          |e..|
00000013

2024-11-30 11:42:31 dpchrist@laalaa ~/sandbox/rename
$ cat 'space tab'$'\t''newline'$'\n'
hello, world!


Perl and the URI::Escape module can be used to replace problematic 
characters with percent-hexadecimal escape codes:

2024-11-30 11:47:11 dpchrist@laalaa ~/sandbox/rename
$ perl -v | head -n 2 | tail -n 1
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi

2024-11-30 11:48:11 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape -e 'print $URI::Escape::VERSION, $/'
5.08

2024-11-30 11:49:25 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape=uri_escape -e 'foreach (@ARGV) {$in=$_; $out=uri_escape($_); rename($in, $out) or die "failed to rename $in"}' *

2024-11-30 11:50:54 dpchrist@laalaa ~/sandbox/rename
$ ls
space%20tab%09newline%0A
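
Where Perl or URI::Escape is not available, the same idea can be 
sketched in bash alone.  This is an approximation, not the module's 
implementation: the function name pct_encode is hypothetical, and it 
assumes the same unreserved set that uri_escape leaves alone by default 
(letters, digits, '-', '.', '_' and '~'):

```shell
# Hypothetical helper: percent-encode every byte of $1 that is not an
# unreserved character, printing the result on stdout.
pct_encode() {
    local LC_ALL=C                 # assumption: index and match byte by byte
    local s=$1 out= c n i
    for (( i = 0; i < ${#s}; i++ )); do
        c=${s:i:1}
        case $c in
            [A-Za-z0-9._~-]) out+=$c ;;          # safe byte, copy as-is
            *)
                printf -v n '%d' "'$c"           # numeric value of the byte
                # Mask to a single byte as a precaution for bytes >= 0x80.
                printf -v c '%%%02X' "$(( n & 0xFF ))"
                out+=$c
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

Because the encoded name never contains a newline, capturing the output 
with $(pct_encode "$f") is safe, and the result can be passed to mv 
inside a loop over a glob.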


By encoding problematic characters, rather than replacing them with a 
constant placeholder character ('?', '_', etc.), information is 
preserved and the process is reversible:

2024-11-30 11:50:57 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape=uri_unescape -e 'foreach (@ARGV) {$in=$_; $out=uri_unescape($_); rename($in, $out) or die "failed to rename $in"}' *

2024-11-30 11:51:07 dpchrist@laalaa ~/sandbox/rename
$ ls
'space tab'$'\t''newline'$'\n'
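
The reverse direction can also be sketched in bash.  The name pct_decode 
is hypothetical, and returning the result in the REPLY variable is a 
design assumption: command substitution strips trailing newlines, which 
would silently corrupt a decoded name that ends in one, as the test file 
above does.

```shell
# Hypothetical helper: decode %XX sequences in $1 and leave the result
# in the global variable REPLY (not on stdout, see the note above).
pct_decode() {
    local LC_ALL=C                 # assumption: treat the string as bytes
    local s=$1 out= i=0 c hex
    while (( i < ${#s} )); do
        c=${s:i:1}
        hex=${s:i+1:2}
        if [[ $c == % && $hex == [0-9A-Fa-f][0-9A-Fa-f] ]]; then
            printf -v c "\\x$hex"  # turn the two hex digits into one byte
            (( i += 3 ))
        else
            (( i += 1 ))           # ordinary byte, or a stray '%'
        fi
        out+=$c
    done
    REPLY=$out
}
```

A stray '%' that is not followed by two hex digits is copied through 
unchanged rather than treated as an error.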


HTH,

David


p.s.  I tried rename(1), but it has problems with newlines out of the 
box.  Writing a pair of one-liners was faster than trying to work around 
rename(1):

https://manpages.debian.org/bullseye/rename/rename.1.en.html
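
If the lossy placeholder replacement from the original question is 
acceptable, a newline-safe loop can be sketched in bash without 
rename(1).  The function names sanitize_name and sanitize_dir are 
hypothetical, '_' is an arbitrary choice of placeholder, and the 
mapping is not reversible:

```shell
# Hypothetical helper: replace every byte outside printable ASCII with '_'.
sanitize_name() {
    local LC_ALL=C                 # assumption: byte-wise, ASCII classes
    printf '%s\n' "${1//[![:print:]]/_}"
}

# Hypothetical helper: rename the entries of directory $1 in place.
# The glob skips dotfiles, and an existing target is never overwritten.
sanitize_dir() {
    local dir=$1 path old new
    for path in "$dir"/*; do
        [ -e "$path" ] || continue           # glob matched nothing
        old=${path##*/}
        new=$(sanitize_name "$old")
        [ "$old" = "$new" ] && continue      # nothing to change
        if [ -e "$dir/$new" ]; then
            printf 'skipping, target exists: %s\n' "$new" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}
```

Two different broken names can map to the same sanitized name, which is 
why the sketch skips a rename when the target already exists instead of 
overwriting it.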