Date: Tue, 21 Feb 2023 11:14:21 +0100
From: Andreas Kusalananda Kähäri
To: Sysadmin Lists
Cc: Freebsd Questions
Subject: Re: BSD-awk print() Behavior

On Tue, Feb 21, 2023 at 01:24:41AM +0100, Sysadmin Lists wrote:
> Trying to wrap my head around what BSD awk is doing here. Although the behavior
> is unwanted for this exercise, it seems like a possibly useful feature or hack
> for future projects. Either way I'd like to understand what's going on.
>
> I extracted a list of URLs from my browser's history sql file, and when
> iterating over the list with awk got some strange results.
>
> file_1 has the sql-extracted URLs, and file_2 is a copy-paste of that file's
> contents using vim's yank-and-paste.
>
> $ cat file_{1,2}
> https://github.com/
> https://github.com/
> https://github.com/
> https://github.com/
>
> $ diff file_{1,2}
> 1,2c1,2
> < https://github.com/
> < https://github.com/
> ---
> > https://github.com/
> > https://github.com/
>
> $ awk '{ print $0 " abc " }' file_{1,2}
>  abc ://github.com/
>  abc ://github.com/
> https://github.com/ abc
> https://github.com/ abc

file_1 is a DOS text file, while file_2 is a Unix text file. A DOS text file ends each line with a carriage return followed by a newline (CRLF). Tools that expect Unix text treat only the newline as the line terminator, so an extra carriage-return character is left at the end of each line. That carriage return becomes part of $0 in the awk code. When it is printed to the terminal, it moves the cursor back to the start of the line, and " abc " then overwrites the first characters of the URL. That is the effect you are seeing.

This has nothing to do with awk's print keyword.
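The following is a minimal sketch, not the only possible fix. It assumes a POSIX shell with awk, od and tr available. It rebuilds two hypothetical stand-ins for file_1 and file_2 with printf, uses od -c to reveal the hidden carriage return, and then strips a trailing carriage return from $0 with sub() before printing:

```shell
#!/bin/sh
# Sketch only: the sample files are hypothetical stand-ins for file_1/file_2.
set -e
tmp=$(mktemp -d)
cd "$tmp"

# file_1 mimics the DOS text file (CRLF line endings),
# file_2 mimics the Unix text file (LF only).
printf 'https://github.com/\r\nhttps://github.com/\r\n' > file_1
printf 'https://github.com/\nhttps://github.com/\n' > file_2

# od -c prints every byte; in file_1 each "\n" is preceded by "\r".
od -c file_1

# Remove a trailing carriage return from $0 (if any) before using it,
# so both files print the same way.
awk '{ sub(/\r$/, ""); print $0 " abc " }' file_1 file_2 > out.txt
cat out.txt
```

Converting the file once (for example with `tr -d '\r' < file_1 > file_1.unix`) works as well. Doing the stripping inside awk, as above, leaves the input file untouched and copes with a mix of DOS and Unix input.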
You would get similarly strange results if you simply pasted the data side by side:

    $ paste file_{1,2}
    https://https://github.com/
    https://https://github.com/

Here, "https://github.com/" is first printed from the DOS text file, after which the carriage return moves the cursor back to the start of the line. Then paste outputs a tab character, which "steps over" the first eight characters that had already been output ("https://"). Finally it outputs "https://github.com/" from the Unix text file on top of the rest.

>
> The sql-extracted URLs cause awk's print() to replace the front of the string
> with text following $0. file_2 does not. I used vim's `:set list' option to
> view hidden chars, but there's no apparent difference between the two --
> although `diff' clearly thinks so. Both files show this when `list' is set:
>
> https://github.com/$
> https://github.com/$

Yes, because Vim automatically recognizes DOS text files and treats each CRLF as an ordinary line ending, so the carriage returns are not displayed. I'm assuming that while editing file_1 in Vim, you see "[dos]" at the bottom of the screen?

>
>
> Here's more background if needed:
[cut]

-- 
Andreas (Kusalananda) Kähäri
SciLifeLab, NBIS, ICM
Uppsala University, Sweden