Message-ID: <2a48e519-84ef-4346-80e9-a06b79a7b542@FreeBSD.org>
Date: Fri, 3 May 2024 12:22:32 -0700
List-Id: Commit messages for the main branch of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-main
X-BeenThere: dev-commits-src-main@freebsd.org
Sender: owner-dev-commits-src-main@FreeBSD.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: git: a8089ea5aee5 - main - nvmfd: A simple userspace daemon for the NVMe over Fabrics controller
Content-Language: en-US
From: John Baldwin
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
References: <202405030016.4430GFq8080425@gitrepo.freebsd.org>
In-Reply-To:
 <202405030016.4430GFq8080425@gitrepo.freebsd.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 5/2/24 5:16 PM, John Baldwin wrote:
> The branch main has been updated by jhb:
>
> URL: https://cgit.FreeBSD.org/src/commit/?id=a8089ea5aee578e08acab2438e82fc9a9ae50ed8
>
> commit a8089ea5aee578e08acab2438e82fc9a9ae50ed8
> Author:     John Baldwin
> AuthorDate: 2024-05-02 23:35:40 +0000
> Commit:     John Baldwin
> CommitDate: 2024-05-02 23:38:39 +0000
>
>     nvmfd: A simple userspace daemon for the NVMe over Fabrics controller

I'm sure there are some subtle bugs I've missed somewhere, but I have
tested the host and controller against each other (both userspace and
kernel) as well as against Linux.

Some of the patches Warner approved in Phab he specifically noted as
being looked over, but not in detail due to the size, etc. I kind of
think we might want a separate tag for those types of reviews. In GDB,
we use an 'Acked-by' tag to mean that a commit is approved but has not
had the detailed technical review that 'Reviewed-by' implies. If we had
such a tag here, some of these commits probably would have used
Acked-by instead of Reviewed-by.

Here are some initial notes on using NVMeoF. They might be a good
candidate for the handbook. (If we don't yet have notes on iSCSI for
the handbook we should add those as well; maybe I will get around to
that too.)

# Overview

NVMe over Fabrics supports access to remote block storage devices as
NVMe namespaces across a network connection, similar to using iSCSI to
access remote block storage devices as SCSI LUNs. FreeBSD includes
support for accessing remote namespaces via a host driver as well as
support for exporting local storage devices as namespaces to remote
hosts.

NVMe over Fabrics supports multiple transport layers including
Fibre Channel, RDMA (over both iWARP and RoCE), and TCP. FreeBSD
currently includes support only for the TCP transport.
Enabling support requires loading a kernel module for the transport to
use in addition to the host or controller module. The TCP transport is
provided by `nvmf_tcp.ko`.

# Host (Initiator)

The fabrics host on FreeBSD exposes remote controllers as `nvmeX`
new-bus devices similar to PCI-express NVMe controllers. Remote
namespaces are exposed as `ndaX` disk devices via CAM. The fabrics
host driver does not support the `nvd` disk driver.

## Discovery Service

NVMe over Fabrics defines a discovery service. A discovery controller
exports a log page enumerating a set of one or more controllers. Each
log page entry contains the type of a controller (I/O or discovery) as
well as the transport type and transport-specific address. For the TCP
transport the address includes the IP address and TCP port number.

nvmecontrol(8) supports a `discover` command to query the log page from
a discovery controller.

Example 1: The Discovery Log Page from a Linux Controller

```
# nvmecontrol discover ubuntu:4420
Discovery
=========
Entry 01
========
Transport type:       TCP
Address family:       AF_INET
Subsystem type:       NVMe
SQ flow control:      optional
Secure Channel:       Not specified
Port ID:              1
Controller ID:        Dynamic
Max Admin SQ Size:    32
Sub NQN:              nvme-test-target
Transport address:    10.0.0.118
Service identifier:   4420
Security Type:        None
```

## Connecting To an I/O Controller

nvmecontrol(8) supports a `connect` command to establish an association
with a remote controller. Once the association is established, it is
handed off to the in-kernel host, which creates a new `nvmeX` device.
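
As a rough illustration of how the discovery log page fields relate to
`connect` arguments, the sketch below parses output in the format of
Example 1 and prints one `nvmecontrol connect` command line per entry
(it only prints; it does not run anything). The helper name
`discover_to_connect` is hypothetical and not part of nvmecontrol(8).
The sketch assumes the field labels shown in Example 1, assumes the
`address:port subnqn` argument form used by the `connect` command in
Example 2 below, and assumes an IPv4 address or hostname.

```shell
#!/bin/sh
# Sketch only: discover_to_connect is a hypothetical helper name.
# Reads discovery log page text (in the format of Example 1) on stdin and
# prints one "nvmecontrol connect" command line per log page entry.
discover_to_connect() {
	awk '
	# Strip the "Label:" prefix and surrounding blanks from a line.
	# Only the first colon is treated as the separator, so NQNs that
	# themselves contain colons are preserved.
	function field_value(line) {
		sub(/^[^:]*:[ \t]*/, "", line)
		sub(/[ \t]+$/, "", line)
		return line
	}
	# Leading blanks are tolerated in case the real output is indented.
	/^[ \t]*Sub NQN:/            { nqn = field_value($0) }
	/^[ \t]*Transport address:/  { addr = field_value($0) }
	/^[ \t]*Service identifier:/ {
		# Assumes the service identifier comes after the NQN and
		# address within an entry, as in Example 1.
		port = field_value($0)
		if (nqn != "" && addr != "")
			printf "nvmecontrol connect %s:%s %s\n", addr, port, nqn
		nqn = ""; addr = ""
	}'
}
```

It could be used as `nvmecontrol discover ubuntu:4420 | discover_to_connect`
to review the resulting command lines before running any of them.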

Example 2: Connecting to an I/O Controller

```
# kldload nvmf nvmf_tcp
# nvmecontrol connect ubuntu:4420 nvme-test-target
```

This results in the following lines in dmesg:

```
nvme0:
nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0:
nda0: Serial Number 843bf4f791f9cdb03d8b
nda0: nvme version 1.3
nda0: 1024MB (2097152 512 byte sectors)
```

The new `nvme0` device can now be used with other nvmecontrol(8)
commands such as `identify`, just like a PCI-express controller.

Example 3: Identify a Remote I/O Controller

```
# nvmecontrol identify nvme0
Controller Capabilities/Features
================================
...
Model Number:                Linux
Firmware Version:            5.15.0-8
...

Fabrics Attributes
==================
I/O Command Capsule Size:    16448 bytes
I/O Response Capsule Size:   16 bytes
In Capsule Data Offset:      0 bytes
Controller Model:            Dynamic
Max SGL Descriptors:         1
Disconnect of I/O Queues:    Not Supported
```

The `nda0` disk device can be used like any other NVMe disk device.

## Connecting via Discovery

nvmecontrol(8)'s `connect-all` command fetches the discovery log page
from the specified discovery controller and creates an association for
each log page entry.

## Disconnecting

nvmecontrol(8)'s `disconnect` command detaches the namespaces from a
remote controller and destroys the association.

Example 4: Disconnecting From a Remote I/O Controller

```
# nvmecontrol disconnect nvme0
```

The `disconnect-all` command destroys associations with all remote
controllers.

## Reconnecting

If a connection is interrupted (for example, if the TCP connections
die), the association is torn down (all queues are disconnected), but
the `nvmeX` device is left in a quiesced state. Any pending I/O
requests for remote namespaces are left pending as well. In this
state, the `reconnect` command can be used to establish a new
association and resume operation with the remote controller.
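
Because pending I/O stays queued until a new association exists, one
might want to retry `reconnect` for a while after an outage. The sketch
below is a generic retry wrapper, not anything nvmecontrol(8) provides.
The helper name `retry_cmd` is hypothetical, and using it this way
assumes that `nvmecontrol reconnect` exits with a non-zero status when
it fails to establish the association.

```shell
#!/bin/sh
# Sketch only: retry_cmd is a hypothetical helper.
# Usage: retry_cmd ATTEMPTS DELAY_SECONDS command [args...]
# Runs the command until it succeeds or ATTEMPTS tries have failed.
retry_cmd() {
	attempts=$1
	delay=$2
	shift 2
	i=1
	while [ "$i" -le "$attempts" ]; do
		# Stop as soon as the command reports success.
		if "$@"; then
			return 0
		fi
		# Wait between tries, but not after the final one.
		if [ "$i" -lt "$attempts" ]; then
			sleep "$delay"
		fi
		i=$((i + 1))
	done
	return 1
}
```

With the invocation from Example 5 below, this might be run as
`retry_cmd 5 10 nvmecontrol reconnect nvme0 ubuntu:4420 nvme-test-target`.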

Example 5: Reconnecting to a Remote I/O Controller

```
# nvmecontrol reconnect nvme0 ubuntu:4420 nvme-test-target
```

# Controller (Target)

The fabrics controller on FreeBSD exposes local block devices as NVMe
namespaces to remote hosts. The controller support on FreeBSD includes
a userland implementation of a discovery controller as well as an
in-kernel I/O controller. Similar to the existing iSCSI target in
FreeBSD, the in-kernel I/O controller uses CAM's target layer (ctl(4)).

Block devices are created by adding ctl(4) LUNs via ctladm(8). The
discovery service and initial handling of I/O controller connections
are managed by the nvmfd(8) daemon.

Example 6: Exporting a Local ZFS Volume

```
# kldload nvmft nvmf_tcp
# ctladm create -b block -o file=/dev/zvol/bhyve/iscsi
LUN created successfully
backend:       block
device type:   0
LUN size:      4294967296 bytes
blocksize      512 bytes
LUN ID:        0
Serial Number: MYSERIAL0000
Device ID:     MYDEVID0000
# nvmfd -F -p 4420 -n nqn.2001-03.com.chelsio:frodo0 -K
```

Open associations can be listed via `ctladm nvlist` and can be
disconnected via `ctladm nvterminate`.

Eventually NVMe support should be added to ctld(8) by merging nvmfd(8)
into ctld(8).

-- 
John Baldwin