Date: Wed, 25 Jan 2023 07:46:31 GMT
Message-Id: <202301250746.30P7kVla012760@gitrepo.freebsd.org>
To: ports-committers@FreeBSD.org, dev-commits-ports-all@FreeBSD.org, dev-commits-ports-main@FreeBSD.org
From: Yuri Victorovich
Subject: git: 6d884b207aab - main - textproc/py-sentencepiece: New port: Unsupervised text tokenizer for Neural Network-based text generation
List-Id: Commits to the main branch of the FreeBSD ports repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-ports-main
Sender: owner-dev-commits-ports-main@freebsd.org
X-BeenThere: dev-commits-ports-main@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: yuri
X-Git-Repository: ports
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 6d884b207aab2373494bbd713278a80474a58601
Auto-Submitted: auto-generated
X-ThisMailContainsUnwantedMimeParts: N

The branch main has been updated by yuri:

URL: https://cgit.FreeBSD.org/ports/commit/?id=6d884b207aab2373494bbd713278a80474a58601

commit 6d884b207aab2373494bbd713278a80474a58601
Author:     Yuri Victorovich
AuthorDate: 2023-01-25 07:45:57 +0000
Commit:     Yuri Victorovich
CommitDate: 2023-01-25 07:45:57 +0000

    textproc/py-sentencepiece: New port: Unsupervised text tokenizer for Neural Network-based text generation
---
 textproc/Makefile                   |  1 +
 textproc/py-sentencepiece/Makefile  | 26 ++++++++++++++++++++++++++
 textproc/py-sentencepiece/distinfo  |  3 +++
 textproc/py-sentencepiece/pkg-descr |  7 +++++++
 4 files changed, 37 insertions(+)

diff --git a/textproc/Makefile b/textproc/Makefile
index e2d0e0ea9521..3d52828e2e12 100644
--- a/textproc/Makefile
+++ b/textproc/Makefile
@@ -1496,6 +1496,7 @@
     SUBDIR += py-rst2html5
     SUBDIR += py-rstfmt
     SUBDIR += py-scour
+    SUBDIR += py-sentencepiece
     SUBDIR += py-simplebayes
     SUBDIR += py-smartypants
     SUBDIR += py-snowballstemmer
diff --git a/textproc/py-sentencepiece/Makefile b/textproc/py-sentencepiece/Makefile
new file mode 100644
index 000000000000..fe1b9cfd4ba7
--- /dev/null
+++ b/textproc/py-sentencepiece/Makefile
@@ -0,0 +1,26 @@
+PORTNAME=	sentencepiece
+DISTVERSIONPREFIX=	v
+DISTVERSION=	0.1.97
+CATEGORIES=	textproc # machine-learning
+PKGNAMEPREFIX=	${PYTHON_PKGNAMEPREFIX}
+
+MAINTAINER=	yuri@FreeBSD.org
+COMMENT=	Unsupervised text tokenizer for Neural Network-based text generation
+WWW=	https://github.com/google/sentencepiece
+
+LICENSE=	APACHE20
+LICENSE_FILE=	${WRKSRC}/../LICENSE
+
+LIB_DEPENDS=	libsentencepiece.so:textproc/sentencepiece
+
+USES=	compiler:c++17-lang pkgconfig python
+USE_PYTHON=	distutils autoplist pytest
+
+USE_GITHUB=	yes
+GH_ACCOUNT=	google
+
+WRKSRC_SUBDIR=	python
+
+TEST_ENV=	${MAKE_ENV} PYTHONPATH=${STAGEDIR}${PYTHONPREFIX_SITELIBDIR}
+
+.include
diff --git a/textproc/py-sentencepiece/distinfo b/textproc/py-sentencepiece/distinfo
new file mode 100644
index 000000000000..c29dc9430710
--- /dev/null
+++ b/textproc/py-sentencepiece/distinfo
@@ -0,0 +1,3 @@
+TIMESTAMP = 1673860778
+SHA256 (google-sentencepiece-v0.1.97_GH0.tar.gz) = 41c3a07f315e3ac87605460c8bb8d739955bc8e7f478caec4017ef9b7d78669b
+SIZE (google-sentencepiece-v0.1.97_GH0.tar.gz) = 11945436
diff --git a/textproc/py-sentencepiece/pkg-descr b/textproc/py-sentencepiece/pkg-descr
new file mode 100644
index 000000000000..62b7de5f4ece
--- /dev/null
+++ b/textproc/py-sentencepiece/pkg-descr
@@ -0,0 +1,7 @@
+SentencePiece is an unsupervised text tokenizer and detokenizer mainly for
+Neural Network-based text generation systems where the vocabulary size is
+predetermined prior to the neural model training. SentencePiece implements
+subword units (e.g., byte-pair-encoding (BPE)) and unigram language model
+with the extension of direct training from raw sentences. SentencePiece
+allows us to make a purely end-to-end system that does not depend on
+language-specific pre/postprocessing.
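
Note on the pkg-descr above: it mentions byte-pair-encoding (BPE) subword units without saying how they are derived. The following is a toy sketch of the general BPE idea only (repeatedly merge the most frequent adjacent symbol pair). It is not SentencePiece's implementation or API; the function names learn_bpe and apply_bpe are hypothetical, and the lexicographic tie-breaking is an assumption made so the result is deterministic.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Each distinct word starts as a tuple of single characters, with its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically (an assumption).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(list(symbols), best))] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment `word` by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lower", "lowest"], 2)
    print(rules)
    print(apply_bpe("lowly", rules))
```

The number of merges bounds how many new subword symbols are created, which is how a BPE vocabulary size can be fixed before model training, as the description says.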