From nobody Tue May 30 12:57:18 2023
Date: Tue, 30 May 2023 12:57:18 GMT
Message-Id: <202305301257.34UCvIVZ000038@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Christos Margiolis
Subject: git: c4f7198f47c1 - main - split(1): auto-extend suffix length if required
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
List-Help:
List-Post:
List-Subscribe:
List-Unsubscribe:
Sender: owner-dev-commits-src-all@freebsd.org
X-BeenThere:
 dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: christos
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: c4f7198f47c15eece849d06e8fdd1fb46ed43bba
Auto-Submitted: auto-generated
X-ThisMailContainsUnwantedMimeParts: N

The branch main has been updated by christos:

URL: https://cgit.FreeBSD.org/src/commit/?id=c4f7198f47c15eece849d06e8fdd1fb46ed43bba

commit c4f7198f47c15eece849d06e8fdd1fb46ed43bba
Author:     Jan Schaumann
AuthorDate: 2023-05-30 12:55:38 +0000
Commit:     Christos Margiolis
CommitDate: 2023-05-30 12:55:38 +0000

    split(1): auto-extend suffix length if required

    If the input cannot be split into the number of files resulting from the
    default suffix length, automatically extend the suffix length rather
    than bailing out with 'too many files'.

    Suffixes are extended such that the resulting files continue to sort
    lexically and "cat *" would reproduce the input.

    For example, splitting a 1M lines file into (default) 1000 lines per
    file would yield files named 'xaa', 'xab', ..., 'xyy', 'xyz', 'xzaaa',
    'xzaab', ..., 'xzanl'.

    If '-a' is specified, the suffix length is not auto-extended.

    This behavior matches GNU split(1) since around version 8.16.

    Reviewed by:    christos
    Approved by:    kevans
    Differential Revision:  https://reviews.freebsd.org/D38279
---
 usr.bin/split/split.1 |  8 ++++++--
 usr.bin/split/split.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/usr.bin/split/split.1 b/usr.bin/split/split.1
index 14ea2eec8dad..ee7c3d412db4 100644
--- a/usr.bin/split/split.1
+++ b/usr.bin/split/split.1
@@ -28,7 +28,7 @@
 .\" @(#)split.1 8.3 (Berkeley) 4/16/94
 .\" $FreeBSD$
 .\"
-.Dd April 18, 2023
+.Dd May 26, 2023
 .Dt SPLIT 1
 .Os
 .Sh NAME
@@ -151,7 +151,11 @@ characters in the range
 .Dq Li a Ns - Ns Li z .
 If
 .Fl a
-is not specified, two letters are used as the suffix.
+is not specified, two letters are used as the initial suffix.
+If the output does not fit into the resulting number of files and the
+.Fl d
+flag is not specified, then the suffix length is automatically extended as
+needed such that all output files continue to sort in lexical order.
 .Pp
 If the
 .Ar prefix
diff --git a/usr.bin/split/split.c b/usr.bin/split/split.c
index 5d6cbe138d38..769567b28325 100644
--- a/usr.bin/split/split.c
+++ b/usr.bin/split/split.c
@@ -75,6 +75,7 @@ static regex_t	 rgx;
 static int	 pflag;
 static bool	 dflag;
 static long	 sufflen = 2;		/* File name suffix length. */
+static int	 autosfx = 1;		/* Whether to auto-extend the suffix length. */
 
 static void newfile(void);
 static void split1(void);
@@ -116,6 +117,7 @@ main(int argc, char **argv)
 			if ((sufflen = strtol(optarg, &ep, 10)) <= 0 || *ep)
 				errx(EX_USAGE,
 				    "%s: illegal suffix length", optarg);
+			autosfx = 0;
 			break;
 		case 'b':		/* Byte count. */
 			errno = 0;
@@ -366,6 +368,35 @@ newfile(void)
 	}
 	pattlen = end - beg + 1;
 
+	/*
+	 * If '-a' is not specified, then we automatically expand the
+	 * suffix length to accommodate splitting all input.  We do this
+	 * by moving the suffix pointer (fpnt) forward and incrementing
+	 * sufflen by one, thereby yielding an additional two characters
+	 * and allowing all output files to sort such that 'cat *' yields
+	 * the input in order.  I.e., the order is '... xyy xyz xzaaa
+	 * xzaab ... xzyzy, xzyzz, xzzaaaa, xzzaaab' and so on.
+	 */
+	if (!dflag && autosfx && (fpnt[0] == 'y') &&
+			strspn(fpnt+1, "z") == strlen(fpnt+1)) {
+		fpnt = fname + strlen(fname) - sufflen;
+		fpnt[sufflen + 2] = '\0';
+		fpnt[0] = end;
+		fpnt[1] = beg;
+
+		/*  Basename | Suffix
+		 *  before:
+		 *  x        | yz
+		 *  after:
+		 *  xz       | a.. */
+		fpnt++;
+		sufflen++;
+
+		/* Reset so we start back at all 'a's in our extended suffix. */
+		tfnum = 0;
+		fnum = 0;
+	}
+
 	/* maxfiles = pattlen^sufflen, but don't use libm. */
 	for (maxfiles = 1, i = 0; i < sufflen; i++)
 		if (LONG_MAX / pattlen < maxfiles)
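
The naming scheme described in the commit message can also be read as a direct mapping from a file index to a file name. The following is a minimal sketch of that mapping, not the code in split.c. The helper name autosfx_name() is hypothetical, and the sketch assumes the default starting suffix length of 2, the a-z alphabet, and that neither -a nor -d is given. Each suffix length contributes names whose first suffix letter is 'a' through 'y'. Once those are used up, one 'z' joins the base name and the suffix grows by one letter.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of split.c): store the name of the n-th
 * output file, counting from 0, in out.  Assumes a starting suffix length
 * of 2 and the alphabet 'a'..'z'.  Returns 0 on success, or -1 if the
 * name does not fit in outlen bytes.
 */
static int
autosfx_name(unsigned long n, const char *prefix, char *out, size_t outlen)
{
	const unsigned long pattlen = 26;
	size_t sufflen = 2, nz = 0, plen, i;
	unsigned long cap;

	assert(prefix != NULL && out != NULL);
	plen = strlen(prefix);

	for (;;) {
		/* Names at this length: first letter 'a'..'y', rest free. */
		cap = pattlen - 1;
		for (i = 1; i < sufflen && cap <= ULONG_MAX / pattlen; i++)
			cap *= pattlen;
		/* An early exit above means cap exceeds any possible n. */
		if (i < sufflen || n < cap)
			break;
		n -= cap;
		nz++;		/* one more 'z' joins the base name */
		sufflen++;	/* and the suffix grows by one letter */
	}

	if (plen + nz + sufflen + 1 > outlen)
		return (-1);
	memcpy(out, prefix, plen);
	memset(out + plen, 'z', nz);
	out[plen + nz + sufflen] = '\0';
	/* Write n as a base-26 number, least significant letter last. */
	for (i = sufflen; i > 0; i--) {
		out[plen + nz + i - 1] = (char)('a' + n % pattlen);
		n /= pattlen;
	}
	return (0);
}
```

With the prefix "x", index 649 maps to 'xyz', index 650 to 'xzaaa' and index 999 to 'xzanl'. This matches the 1M-line example in the commit message, where 1000 files are produced.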