From: David Christensen
Date: Sat, 30 Nov 2024 12:06:15 -0800
To: questions@freebsd.org
Subject: Re: shell script for removing unprintable characters in file names
In-Reply-To: <95966dac-9d93-401f-9948-5fcb224a1e1f@nethead.se>
List-Archive: https://lists.freebsd.org/archives/freebsd-questions

On 11/30/24 07:33, Per olof Ljungmark wrote:
> Hi,
>
> I am tasked with recovering hundreds or more files created with unknown
> OSs and have unknown characters in the name, replaced with a '?'.
>
> Like file?nam?.???
>
> Please, if you have such a script can you post or email it? Replacing
> the unknown character with anything, like '-' or '_' using whatever
> shell, sh, bash or csh.
>
> Thanks a lot!
>
> Per

The first thing to understand is that traditional Unix file names may
be composed of any 8-bit characters except the directory path delimiter
(forward slash '/', hexadecimal 0x2f) and the C programming language
string terminator (NUL, hexadecimal 0x00).

Create a test file with some control characters in the file name:

2024-11-30 11:40:47 dpchrist@laalaa ~/sandbox/rename
$ cat /etc/debian_version ; uname -a
11.11
Linux laalaa 5.10.0-33-amd64 #1 SMP Debian 5.10.226-1 (2024-10-03) x86_64 GNU/Linux

2024-11-30 11:41:04 dpchrist@laalaa ~/sandbox/rename
$ perl -e 'open FH, ">", "space tab\tnewline\n"; print FH "hello, world!\n"; close FH'

2024-11-30 11:41:55 dpchrist@laalaa ~/sandbox/rename
$ ls
'space tab'$'\t''newline'$'\n'

2024-11-30 11:42:09 dpchrist@laalaa ~/sandbox/rename
$ ls | hexdump
00000000  73 70 61 63 65 20 74 61  62 09 6e 65 77 6c 69 6e  |space tab.newlin|
00000010  65 0a 0a                                          |e..|
00000013

2024-11-30 11:42:31 dpchrist@laalaa ~/sandbox/rename
$ cat 'space tab'$'\t''newline'$'\n'
hello, world!

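Before renaming anything, it can help to see which names are actually
affected. What follows is a minimal POSIX sh sketch, not a tested tool.
It assumes that "unprintable" means any byte outside printable ASCII in
the C locale, so the bytes of valid UTF-8 names are flagged as well.
The function name list_unprintable is made up for this example.

```shell
#!/bin/sh
# Sketch only: list entries in a directory whose names contain bytes
# outside printable ASCII. Assumes the C locale, so every byte above
# 0x7e (including the bytes of valid UTF-8) counts as unprintable.
LC_ALL=C
export LC_ALL

# list_unprintable is a hypothetical helper, not a standard utility.
# Names starting with '.' are not matched by the glob and are skipped.
list_unprintable() {
    dir=$1
    for path in "$dir"/*; do
        # Skip the literal pattern left behind by an unmatched glob.
        [ -e "$path" ] || [ -L "$path" ] || continue
        name=${path##*/}
        case $name in
            *[![:print:]]*)
                # Show the name with each offending byte as '?'.
                printf '%s' "$name" | tr -c '[:print:]' '?'
                printf '\n'
                ;;
        esac
    done
}

# With no argument, inspect the current directory.
list_unprintable "${1:-.}"
```

The sketch only reads the directory, so it is safe to run first. For
the test file above it should print the name with the tab and the
trailing newline each shown as '?'.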
Perl and the URI::Escape module can be used to replace problematic
characters with percent-hexadecimal escape codes:

2024-11-30 11:47:11 dpchrist@laalaa ~/sandbox/rename
$ perl -v | head -n 2 | tail -n 1
This is perl 5, version 32, subversion 1 (v5.32.1) built for x86_64-linux-gnu-thread-multi

2024-11-30 11:48:11 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape -e 'print $URI::Escape::VERSION, $/'
5.08

2024-11-30 11:49:25 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape=uri_escape -e 'foreach (@ARGV) {$in=$_; $out=uri_escape($_); rename($in, $out) or die "failed to rename $in"}' *

2024-11-30 11:50:54 dpchrist@laalaa ~/sandbox/rename
$ ls
space%20tab%09newline%0A

By encoding problematic characters, rather than replacing them with a
constant placeholder character ('?', '_', etc.), information is
preserved and the process is reversible:

2024-11-30 11:50:57 dpchrist@laalaa ~/sandbox/rename
$ perl -mURI::Escape=uri_unescape -e 'foreach (@ARGV) {$in=$_; $out=uri_unescape($_); rename($in, $out) or die "failed to rename $in"}' *

2024-11-30 11:51:07 dpchrist@laalaa ~/sandbox/rename
$ ls
'space tab'$'\t''newline'$'\n'

HTH,

David

p.s. I tried rename(1), but it has problems with newlines out of the
box. Writing a pair of one-liners was faster than trying to work around
rename(1):

https://manpages.debian.org/bullseye/rename/rename.1.en.html
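
If a lossy replacement with '_' is acceptable, as in the original
request, something like the following POSIX sh sketch might do. It is
an illustration under assumptions, not a tested tool: it works in the C
locale, so every byte outside printable ASCII (including the bytes of
valid UTF-8 names) becomes '_', and the renaming cannot be undone. The
function names sanitize_name and sanitize_dir are made up for this
example.

```shell
#!/bin/sh
# Sketch only: replace every byte outside printable ASCII in file
# names with '_'. This is lossy, unlike percent-encoding.
LC_ALL=C
export LC_ALL

# sanitize_name and sanitize_dir are hypothetical helper names.
sanitize_name() {
    # tr turns every non-printable byte, newlines included, into '_'.
    printf '%s' "$1" | tr -c '[:print:]' '_'
}

# Rename the entries of one directory (not recursive; names starting
# with '.' are not matched by the glob and are left alone).
sanitize_dir() {
    dir=$1
    for old in "$dir"/*; do
        # Skip the literal pattern left behind by an unmatched glob.
        [ -e "$old" ] || [ -L "$old" ] || continue
        name=${old##*/}
        new=$(sanitize_name "$name")
        if [ "$name" = "$new" ]; then
            continue
        fi
        # Two different names can map to the same result; never overwrite.
        if [ -e "$dir/$new" ] || [ -L "$dir/$new" ]; then
            printf 'skipped, target exists: %s\n' "$new" >&2
            continue
        fi
        mv -- "$old" "$dir/$new"
    done
}

# Rename only when a directory is given explicitly.
if [ "$#" -gt 0 ]; then
    sanitize_dir "$1"
fi
```

Saved under a name of your choosing and run as "sh script.sh ." in a
copy of the directory, it would rename the test file above to
space tab_newline_ (the space is printable and is kept). Try it on a
copy first, since different original names can collapse to the same
result.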