Date: Wed, 6 Nov 2024 15:17:41 GMT
Message-Id: <202411061517.4A6FHfYx043132@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Mark Johnston
Subject: git: d438b4ef0cfc - main - gve: Add DQO RDA support
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
X-Git-Committer: markj
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: d438b4ef0cfc6986b93d0754f49ebf3ead50f269

The branch main has been updated by markj:

URL: https://cgit.FreeBSD.org/src/commit/?id=d438b4ef0cfc6986b93d0754f49ebf3ead50f269

commit d438b4ef0cfc6986b93d0754f49ebf3ead50f269
Author:     Shailend Chand
AuthorDate: 2024-11-05 19:38:29 +0000
Commit:     Mark Johnston
CommitDate: 2024-11-06 15:06:41 +0000

gve: Add DQO RDA support

DQO is the descriptor format for our next generation virtual NIC.
It is necessary to make full use of the hardware bandwidth on many
newer GCP VM shapes.

One major change with DQO from its predecessor GQI is that it uses
dual descriptor rings for both TX and RX queues.

The TX path uses a descriptor ring to send descriptors to HW, and
receives packet completion events on a TX completion ring.

The RX path posts buffers to HW using an RX descriptor ring and
receives incoming packets on an RX completion ring.

In GQI-QPL, the hardware could not access arbitrary regions of guest
memory, which is why there was a pre-negotiated bounce buffer
(QPL: Queue Page List).

DQO-RDA has no such limitation. "RDA" is in contrast to QPL and stands
for "Raw DMA Addressing", which just means that HW does not need a
fixed bounce buffer and can DMA arbitrary regions of guest memory.

A subsequent patch will introduce the DQO-QPL datapath that uses the
same descriptor format as in this patch, but will have a fixed bounce
buffer.
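Because DQO completions may arrive out of order on a separate ring, the driver cannot infer which packet finished from ring position alone. A minimal, single-threaded C sketch of one way to consume such a ring follows. It assumes the convention suggested by this patch's `cur_gen_bit` field ("Gets flipped on every cycle of the compl ring") and `compl_tag` field ("The TX completion for this packet will contain this tag"): an entry whose generation bit still equals the driver's current value has not been written yet, and the tag, not the ring slot, identifies the packet. All names here (`compl_ring`, `nic_writer`, `nic_post`, `poll_completions`) and the 4-entry ring are hypothetical illustrations, not the driver's actual API; `nic_post` merely simulates the NIC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4   /* power of two, so RING_SIZE - 1 works as a mask */
#define NUM_PENDING 8 /* size of the hypothetical pending-packet table */

/* Hypothetical completion entry: the tag the driver put in the
 * descriptor, plus the generation bit stamped by the NIC. */
struct compl_desc {
	uint16_t tag;
	uint8_t generation;
};

struct compl_ring {
	struct compl_desc desc[RING_SIZE]; /* zero-initialized at setup */
	uint32_t head;                     /* next entry the driver reads */
	uint8_t cur_gen_bit;               /* generation of not-yet-written entries */
};

/* Simulated NIC: stamps entries with the opposite of the driver's
 * "stale" generation and flips it whenever its write index wraps. */
struct nic_writer {
	uint32_t tail;
	uint8_t gen;
};

static void
nic_post(struct compl_ring *r, struct nic_writer *w, uint16_t tag)
{
	r->desc[w->tail].tag = tag;
	r->desc[w->tail].generation = w->gen;
	w->tail = (w->tail + 1) & (RING_SIZE - 1);
	if (w->tail == 0)
		w->gen ^= 1;
}

/* Driver: consume every newly written completion and release the
 * packet each one names. Returns the number of entries consumed. */
static int
poll_completions(struct compl_ring *r, bool pending[NUM_PENDING])
{
	int n = 0;

	for (;;) {
		const struct compl_desc *d = &r->desc[r->head];

		if (d->generation == r->cur_gen_bit)
			break; /* not written yet on this pass of the ring */
		/* Real DMA code needs a read barrier here before reading
		 * the rest of the entry; omitted in this simulation. */
		assert(d->tag < NUM_PENDING);
		pending[d->tag] = false; /* the tag, not ring order, names the packet */
		n++;
		r->head = (r->head + 1) & (RING_SIZE - 1);
		if (r->head == 0)
			r->cur_gen_bit ^= 1; /* flip on every cycle of the ring */
	}
	return (n);
}
```

To drive the sketch, zero the ring, start the simulated NIC with `gen = 1` (so first-pass entries differ from the zeroed ring's generation 0), post tags in any order, and call `poll_completions`. It assumes no more than `RING_SIZE` completions are ever outstanding, since `nic_post` does not check for overrun.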
Signed-off-by: Shailend Chand Reviewed-by: markj MFC-after: 2 weeks Differential Revision: https://reviews.freebsd.org/D46690 --- share/man/man4/gve.4 | 53 +++- sys/conf/files | 2 + sys/dev/gve/gve.h | 213 +++++++++++-- sys/dev/gve/gve_adminq.c | 123 +++++++- sys/dev/gve/gve_adminq.h | 55 +++- sys/dev/gve/gve_dqo.h | 306 ++++++++++++++++++ sys/dev/gve/gve_main.c | 67 +++- sys/dev/gve/gve_plat.h | 3 + sys/dev/gve/gve_rx.c | 110 +++++-- sys/dev/gve/gve_rx_dqo.c | 633 +++++++++++++++++++++++++++++++++++++ sys/dev/gve/gve_sysctl.c | 60 +++- sys/dev/gve/gve_tx.c | 139 ++++++--- sys/dev/gve/gve_tx_dqo.c | 793 +++++++++++++++++++++++++++++++++++++++++++++++ sys/dev/gve/gve_utils.c | 46 ++- sys/modules/gve/Makefile | 12 +- 15 files changed, 2471 insertions(+), 144 deletions(-) diff --git a/share/man/man4/gve.4 b/share/man/man4/gve.4 index a674d6b64803..d42e4ae293a7 100644 --- a/share/man/man4/gve.4 +++ b/share/man/man4/gve.4 @@ -1,6 +1,6 @@ .\" SPDX-License-Identifier: BSD-3-Clause .\" -.\" Copyright (c) 2023 Google LLC +.\" Copyright (c) 2023-2024 Google LLC .\" .\" Redistribution and use in source and binary forms, with or without modification, .\" are permitted provided that the following conditions are met: @@ -26,7 +26,7 @@ .\" ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS .\" SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -.Dd April 26, 2023 +.Dd October 14, 2024 .Dt GVE 4 .Os .Sh NAME @@ -192,16 +192,61 @@ These two messages correspond to the NIC alerting the driver to link state chang .Pp Apart from these messages, the driver exposes per-queue packet and error counters as sysctl nodes. Global (across queues) counters can be read using -.Xr netstat 8 . +.Xr netstat 1 . +.Sh SYSCTL VARIABLES +.Nm +exposes the following +.Xr sysctl 8 +variables: +.Bl -tag -width indent +.It Va hw.gve.driver_version +The driver version. +This is read-only. 
+.It Va hw.gve.queue_format +The queue format in use. +This is read-only. +.It Va hw.gve.disable_hw_lro +Setting this boot-time tunable to 1 disables Large Receive Offload (LRO) in the NIC. +The default value is 0, which means hardware LRO is enabled by default. +The software LRO stack in the kernel is always used. +This sysctl variable needs to be set before loading the driver, using +.Xr loader.conf 5 . +.El .Sh LIMITATIONS .Nm does not support the transmission of VLAN-tagged packets. All VLAN-tagged traffic is dropped. +.Sh QUEUE FORMATS +.Nm +features different datapath modes called queue formats: +.Pp +.Bl -bullet -compact +.It +GQI_QPL: "QPL" stands for "Queue Page List" and refers to the fact that +hardware expects a fixed bounce buffer and cannot access arbitrary memory. +GQI is the older descriptor format. +The G in "GQI" refers to an older generation of hardware, and the "QI" +stands for "Queue In-order" referring to the fact that the NIC sends +Tx and Rx completions in the same order as the one in which the corresponding +descriptors were posted by the driver. +.It +DQO_RDA: DQO is the descriptor format required to take full advantage of +next generation VM shapes. +"RDA" stands for "Raw DMA Addressing" and refers to the fact that hardware +can work with DMA-ed packets and does not expect them to be copied into or +out of a fixed bounce buffer. +The D in "DQO" refers to a newer generation of hardware, and the "QO" +stands for "Queue Out-of-order" referring to the fact that the NIC might +send Tx and Rx completions in an order different from the one in which +the corresponding descriptors were posted by the driver. +.El .Sh SUPPORT Please email gvnic-drivers@google.com with the specifics of the issue encountered. 
.Sh SEE ALSO +.Xr netstat 1 , +.Xr loader.conf 5 , .Xr ifconfig 8 , -.Xr netstat 8 +.Xr sysctl 8 .Sh HISTORY The .Nm diff --git a/sys/conf/files b/sys/conf/files index d04e75be3793..7bf2cffe8b09 100644 --- a/sys/conf/files +++ b/sys/conf/files @@ -1732,8 +1732,10 @@ dev/gve/gve_adminq.c optional gve dev/gve/gve_main.c optional gve dev/gve/gve_qpl.c optional gve dev/gve/gve_rx.c optional gve +dev/gve/gve_rx_dqo.c optional gve dev/gve/gve_sysctl.c optional gve dev/gve/gve_tx.c optional gve +dev/gve/gve_tx_dqo.c optional gve dev/gve/gve_utils.c optional gve dev/goldfish/goldfish_rtc.c optional goldfish_rtc fdt dev/gpio/acpi_gpiobus.c optional acpi gpio diff --git a/sys/dev/gve/gve.h b/sys/dev/gve/gve.h index c446199dff2d..98f1139c6bc2 100644 --- a/sys/dev/gve/gve.h +++ b/sys/dev/gve/gve.h @@ -1,7 +1,7 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * - * Copyright (c) 2023 Google LLC + * Copyright (c) 2023-2024 Google LLC * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: @@ -53,6 +53,9 @@ /* Each RX bounce buffer page can fit two packet buffers. */ #define GVE_DEFAULT_RX_BUFFER_OFFSET (PAGE_SIZE / 2) +/* PTYPEs are always 10 bits. */ +#define GVE_NUM_PTYPES 1024 + /* * Number of descriptors per queue page list. 
* Page count AKA QPL size can be derived by dividing the number of elements in @@ -224,30 +227,61 @@ struct gve_rxq_stats { counter_u64_t rx_frag_copy_cnt; counter_u64_t rx_dropped_pkt_desc_err; counter_u64_t rx_dropped_pkt_mbuf_alloc_fail; + counter_u64_t rx_mbuf_dmamap_err; + counter_u64_t rx_mbuf_mclget_null; }; #define NUM_RX_STATS (sizeof(struct gve_rxq_stats) / sizeof(counter_u64_t)) +struct gve_rx_buf_dqo { + struct mbuf *mbuf; + bus_dmamap_t dmamap; + uint64_t addr; + bool mapped; + SLIST_ENTRY(gve_rx_buf_dqo) slist_entry; +}; + /* power-of-2 sized receive ring */ struct gve_rx_ring { struct gve_ring_com com; struct gve_dma_handle desc_ring_mem; - struct gve_dma_handle data_ring_mem; - - /* accessed in the receive hot path */ - struct { - struct gve_rx_desc *desc_ring; - union gve_rx_data_slot *data_ring; - struct gve_rx_slot_page_info *page_info; - - struct gve_rx_ctx ctx; - struct lro_ctrl lro; - uint8_t seq_no; /* helps traverse the descriptor ring */ - uint32_t cnt; /* free-running total number of completed packets */ - uint32_t fill_cnt; /* free-running total number of descs and buffs posted */ - uint32_t mask; /* masks the cnt and fill_cnt to the size of the ring */ - struct gve_rxq_stats stats; - } __aligned(CACHE_LINE_SIZE); + uint32_t cnt; /* free-running total number of completed packets */ + uint32_t fill_cnt; /* free-running total number of descs and buffs posted */ + + union { + /* GQI-only fields */ + struct { + struct gve_dma_handle data_ring_mem; + + /* accessed in the GQ receive hot path */ + struct gve_rx_desc *desc_ring; + union gve_rx_data_slot *data_ring; + struct gve_rx_slot_page_info *page_info; + uint32_t mask; /* masks the cnt and fill_cnt to the size of the ring */ + uint8_t seq_no; /* helps traverse the descriptor ring */ + }; + + /* DQO-only fields */ + struct { + struct gve_dma_handle compl_ring_mem; + + struct gve_rx_compl_desc_dqo *compl_ring; + struct gve_rx_desc_dqo *desc_ring; + struct gve_rx_buf_dqo *bufs; /* Parking place 
for posted buffers */ + bus_dma_tag_t buf_dmatag; /* To dmamap posted mbufs with */ + + uint32_t buf_cnt; /* Size of the bufs array */ + uint32_t mask; /* One less than the sizes of the desc and compl rings */ + uint32_t head; /* The index at which to post the next buffer at */ + uint32_t tail; /* The index at which to receive the next compl at */ + uint8_t cur_gen_bit; /* Gets flipped on every cycle of the compl ring */ + SLIST_HEAD(, gve_rx_buf_dqo) free_bufs; + } dqo; + }; + + struct lro_ctrl lro; + struct gve_rx_ctx ctx; + struct gve_rxq_stats stats; } __aligned(CACHE_LINE_SIZE); @@ -277,11 +311,26 @@ struct gve_txq_stats { counter_u64_t tx_dropped_pkt; counter_u64_t tx_dropped_pkt_nospace_device; counter_u64_t tx_dropped_pkt_nospace_bufring; + counter_u64_t tx_delayed_pkt_nospace_descring; + counter_u64_t tx_delayed_pkt_nospace_compring; + counter_u64_t tx_delayed_pkt_tsoerr; counter_u64_t tx_dropped_pkt_vlan; + counter_u64_t tx_mbuf_collapse; + counter_u64_t tx_mbuf_defrag; + counter_u64_t tx_mbuf_defrag_err; + counter_u64_t tx_mbuf_dmamap_enomem_err; + counter_u64_t tx_mbuf_dmamap_err; }; #define NUM_TX_STATS (sizeof(struct gve_txq_stats) / sizeof(counter_u64_t)) +struct gve_tx_pending_pkt_dqo { + struct mbuf *mbuf; + bus_dmamap_t dmamap; + uint8_t state; /* the gve_packet_state enum */ + int next; /* To chain the free_pending_pkts lists */ +}; + /* power-of-2 sized transmit ring */ struct gve_tx_ring { struct gve_ring_com com; @@ -290,22 +339,95 @@ struct gve_tx_ring { struct task xmit_task; struct taskqueue *xmit_tq; - /* accessed in the transmit hot path */ - struct { - union gve_tx_desc *desc_ring; - struct gve_tx_buffer_state *info; - struct buf_ring *br; + /* Accessed when writing descriptors */ + struct buf_ring *br; + struct mtx ring_mtx; + + uint32_t req; /* free-running total number of packets written to the nic */ + uint32_t done; /* free-running total number of completed packets */ + + union { + /* GQI specific stuff */ + struct { + union 
gve_tx_desc *desc_ring; + struct gve_tx_buffer_state *info; + + struct gve_tx_fifo fifo; + + uint32_t mask; /* masks the req and done to the size of the ring */ + }; + + /* DQO specific stuff */ + struct { + struct gve_dma_handle compl_ring_mem; + + /* Accessed when writing descriptors */ + struct { + union gve_tx_desc_dqo *desc_ring; + uint32_t desc_mask; /* masks head and tail to the size of desc_ring */ + uint32_t desc_head; /* last desc read by NIC, cached value of hw_tx_head */ + uint32_t desc_tail; /* last desc written by driver */ + uint32_t last_re_idx; /* desc which last had "report event" set */ + + /* + * The head index of a singly linked list containing pending packet objects + * to park mbufs till the NIC sends completions. Once this list is depleted, + * the "_prd" suffixed producer list, grown by the completion taskqueue, + * is stolen. + */ + int32_t free_pending_pkts_csm; + + bus_dma_tag_t buf_dmatag; /* DMA params for mapping Tx mbufs */ + } __aligned(CACHE_LINE_SIZE); + + /* Accessed when processing completions */ + struct { + struct gve_tx_compl_desc_dqo *compl_ring; + uint32_t compl_mask; /* masks head to the size of compl_ring */ + uint32_t compl_head; /* last completion read by driver */ + uint8_t cur_gen_bit; /* NIC flips a bit on every pass */ + uint32_t hw_tx_head; /* last desc read by NIC */ + + /* + * The completion taskqueue moves pending-packet objects to this + * list after freeing the mbuf. The "_prd" denotes that this is + * a producer list. The transmit taskqueue steals this list once + * its consumer list, with the "_csm" suffix, is depleted.
+ */ + int32_t free_pending_pkts_prd; + } __aligned(CACHE_LINE_SIZE); + + /* Accessed by both the completion and xmit loops */ + struct { + /* completion tags index into this array */ + struct gve_tx_pending_pkt_dqo *pending_pkts; + uint16_t num_pending_pkts; + } __aligned(CACHE_LINE_SIZE); + } dqo; + }; + struct gve_txq_stats stats; +} __aligned(CACHE_LINE_SIZE); - struct gve_tx_fifo fifo; - struct mtx ring_mtx; +enum gve_packet_state { + /* + * Packet does not yet have a dmamap created. + * This should always be zero since state is not explicitly initialized. + */ + GVE_PACKET_STATE_UNALLOCATED, + /* Packet has a dmamap and is in free list, available to be allocated. */ + GVE_PACKET_STATE_FREE, + /* Packet is expecting a regular data completion */ + GVE_PACKET_STATE_PENDING_DATA_COMPL, +}; - uint32_t req; /* free-running total number of packets written to the nic */ - uint32_t done; /* free-running total number of completed packets */ - uint32_t mask; /* masks the req and done to the size of the ring */ - struct gve_txq_stats stats; - } __aligned(CACHE_LINE_SIZE); +struct gve_ptype { + uint8_t l3_type; /* `gve_l3_type` in gve_adminq.h */ + uint8_t l4_type; /* `gve_l4_type` in gve_adminq.h */ +}; -} __aligned(CACHE_LINE_SIZE); +struct gve_ptype_lut { + struct gve_ptype ptypes[GVE_NUM_PTYPES]; +}; struct gve_priv { if_t ifp; @@ -348,6 +470,8 @@ struct gve_priv { struct gve_tx_ring *tx; struct gve_rx_ring *rx; + struct gve_ptype_lut *ptype_lut_dqo; + /* * Admin queue - see gve_adminq.h * Since AQ cmds do not run in steady state, 32 bit counters suffice @@ -370,6 +494,7 @@ struct gve_priv { uint32_t adminq_dcfg_device_resources_cnt; uint32_t adminq_set_driver_parameter_cnt; uint32_t adminq_verify_driver_compatibility_cnt; + uint32_t adminq_get_ptype_map_cnt; uint32_t interface_up_cnt; uint32_t interface_down_cnt; @@ -400,6 +525,12 @@ gve_clear_state_flag(struct gve_priv *priv, int pos) BIT_CLR_ATOMIC(GVE_NUM_STATE_FLAGS, pos, &priv->state_flags); } +static inline 
bool +gve_is_gqi(struct gve_priv *priv) +{ + return (priv->queue_format == GVE_GQI_QPL_FORMAT); +} + /* Defined in gve_main.c */ void gve_schedule_reset(struct gve_priv *priv); @@ -407,6 +538,7 @@ void gve_schedule_reset(struct gve_priv *priv); uint32_t gve_reg_bar_read_4(struct gve_priv *priv, bus_size_t offset); void gve_reg_bar_write_4(struct gve_priv *priv, bus_size_t offset, uint32_t val); void gve_db_bar_write_4(struct gve_priv *priv, bus_size_t offset, uint32_t val); +void gve_db_bar_dqo_write_4(struct gve_priv *priv, bus_size_t offset, uint32_t val); /* QPL (Queue Page List) functions defined in gve_qpl.c */ int gve_alloc_qpls(struct gve_priv *priv); @@ -425,6 +557,14 @@ void gve_qflush(if_t ifp); void gve_xmit_tq(void *arg, int pending); void gve_tx_cleanup_tq(void *arg, int pending); +/* TX functions defined in gve_tx_dqo.c */ +int gve_tx_alloc_ring_dqo(struct gve_priv *priv, int i); +void gve_tx_free_ring_dqo(struct gve_priv *priv, int i); +void gve_clear_tx_ring_dqo(struct gve_priv *priv, int i); +int gve_tx_intr_dqo(void *arg); +int gve_xmit_dqo(struct gve_tx_ring *tx, struct mbuf **mbuf_ptr); +void gve_tx_cleanup_tq_dqo(void *arg, int pending); + /* RX functions defined in gve_rx.c */ int gve_alloc_rx_rings(struct gve_priv *priv); void gve_free_rx_rings(struct gve_priv *priv); @@ -433,6 +573,14 @@ int gve_destroy_rx_rings(struct gve_priv *priv); int gve_rx_intr(void *arg); void gve_rx_cleanup_tq(void *arg, int pending); +/* RX functions defined in gve_rx_dqo.c */ +int gve_rx_alloc_ring_dqo(struct gve_priv *priv, int i); +void gve_rx_free_ring_dqo(struct gve_priv *priv, int i); +void gve_rx_prefill_buffers_dqo(struct gve_rx_ring *rx); +void gve_clear_rx_ring_dqo(struct gve_priv *priv, int i); +int gve_rx_intr_dqo(void *arg); +void gve_rx_cleanup_tq_dqo(void *arg, int pending); + /* DMA functions defined in gve_utils.c */ int gve_dma_alloc_coherent(struct gve_priv *priv, int size, int align, struct gve_dma_handle *dma); @@ -447,7 +595,10 @@ int 
gve_alloc_irqs(struct gve_priv *priv); void gve_unmask_all_queue_irqs(struct gve_priv *priv); void gve_mask_all_queue_irqs(struct gve_priv *priv); -/* Systcl functions defined in gve_sysctl.c*/ +/* Systcl functions defined in gve_sysctl.c */ +extern bool gve_disable_hw_lro; +extern char gve_queue_format[8]; +extern char gve_version[8]; void gve_setup_sysctl(struct gve_priv *priv); void gve_accum_stats(struct gve_priv *priv, uint64_t *rpackets, uint64_t *rbytes, uint64_t *rx_dropped_pkt, uint64_t *tpackets, diff --git a/sys/dev/gve/gve_adminq.c b/sys/dev/gve/gve_adminq.c index 3c332607ebd4..7865b979888b 100644 --- a/sys/dev/gve/gve_adminq.c +++ b/sys/dev/gve/gve_adminq.c @@ -1,7 +1,7 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * - * Copyright (c) 2023 Google LLC + * Copyright (c) 2023-2024 Google LLC * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: @@ -57,6 +57,7 @@ void gve_parse_device_option(struct gve_priv *priv, struct gve_device_descriptor *device_descriptor, struct gve_device_option *option, struct gve_device_option_gqi_qpl **dev_op_gqi_qpl, + struct gve_device_option_dqo_rda **dev_op_dqo_rda, struct gve_device_option_jumbo_frames **dev_op_jumbo_frames) { uint32_t req_feat_mask = be32toh(option->required_features_mask); @@ -85,6 +86,23 @@ void gve_parse_device_option(struct gve_priv *priv, *dev_op_gqi_qpl = (void *)(option + 1); break; + case GVE_DEV_OPT_ID_DQO_RDA: + if (option_length < sizeof(**dev_op_dqo_rda) || + req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA) { + device_printf(priv->dev, GVE_DEVICE_OPTION_ERROR_FMT, + "DQO RDA", (int)sizeof(**dev_op_dqo_rda), + GVE_DEV_OPT_REQ_FEAT_MASK_DQO_RDA, + option_length, req_feat_mask); + break; + } + + if (option_length > sizeof(**dev_op_dqo_rda)) { + device_printf(priv->dev, GVE_DEVICE_OPTION_TOO_BIG_FMT, + "DQO RDA"); + } + *dev_op_dqo_rda = (void *)(option + 1); + break; + case 
GVE_DEV_OPT_ID_JUMBO_FRAMES: if (option_length < sizeof(**dev_op_jumbo_frames) || req_feat_mask != GVE_DEV_OPT_REQ_FEAT_MASK_JUMBO_FRAMES) { @@ -117,6 +135,7 @@ static int gve_process_device_options(struct gve_priv *priv, struct gve_device_descriptor *descriptor, struct gve_device_option_gqi_qpl **dev_op_gqi_qpl, + struct gve_device_option_dqo_rda **dev_op_dqo_rda, struct gve_device_option_jumbo_frames **dev_op_jumbo_frames) { char *desc_end = (char *)descriptor + be16toh(descriptor->total_length); @@ -130,12 +149,12 @@ gve_process_device_options(struct gve_priv *priv, if ((char *)(dev_opt + 1) > desc_end || (char *)(dev_opt + 1) + be16toh(dev_opt->option_length) > desc_end) { device_printf(priv->dev, - "options exceed device_descriptor's total length.\n"); + "options exceed device descriptor's total length.\n"); return (EINVAL); } gve_parse_device_option(priv, descriptor, dev_opt, - dev_op_gqi_qpl, dev_op_jumbo_frames); + dev_op_gqi_qpl, dev_op_dqo_rda, dev_op_jumbo_frames); dev_opt = (void *)((char *)(dev_opt + 1) + be16toh(dev_opt->option_length)); } @@ -221,16 +240,35 @@ gve_adminq_create_rx_queue(struct gve_priv *priv, uint32_t queue_index) cmd.opcode = htobe32(GVE_ADMINQ_CREATE_RX_QUEUE); cmd.create_rx_queue = (struct gve_adminq_create_rx_queue) { .queue_id = htobe32(queue_index), - .index = htobe32(queue_index), .ntfy_id = htobe32(rx->com.ntfy_id), .queue_resources_addr = htobe64(qres_dma->bus_addr), - .rx_desc_ring_addr = htobe64(rx->desc_ring_mem.bus_addr), - .rx_data_ring_addr = htobe64(rx->data_ring_mem.bus_addr), - .queue_page_list_id = htobe32((rx->com.qpl)->id), .rx_ring_size = htobe16(priv->rx_desc_cnt), .packet_buffer_size = htobe16(GVE_DEFAULT_RX_BUFFER_SIZE), }; + if (gve_is_gqi(priv)) { + cmd.create_rx_queue.rx_desc_ring_addr = + htobe64(rx->desc_ring_mem.bus_addr); + cmd.create_rx_queue.rx_data_ring_addr = + htobe64(rx->data_ring_mem.bus_addr); + cmd.create_rx_queue.index = + htobe32(queue_index); + cmd.create_rx_queue.queue_page_list_id = + 
htobe32((rx->com.qpl)->id); + } else { + cmd.create_rx_queue.queue_page_list_id = + htobe32(GVE_RAW_ADDRESSING_QPL_ID); + cmd.create_rx_queue.rx_desc_ring_addr = + htobe64(rx->dqo.compl_ring_mem.bus_addr); + cmd.create_rx_queue.rx_data_ring_addr = + htobe64(rx->desc_ring_mem.bus_addr); + cmd.create_rx_queue.rx_buff_ring_size = + htobe16(priv->rx_desc_cnt); + cmd.create_rx_queue.enable_rsc = + !!((if_getcapenable(priv->ifp) & IFCAP_LRO) && + !gve_disable_hw_lro); + } + return (gve_adminq_execute_cmd(priv, &cmd)); } @@ -272,11 +310,21 @@ gve_adminq_create_tx_queue(struct gve_priv *priv, uint32_t queue_index) .queue_id = htobe32(queue_index), .queue_resources_addr = htobe64(qres_dma->bus_addr), .tx_ring_addr = htobe64(tx->desc_ring_mem.bus_addr), - .queue_page_list_id = htobe32((tx->com.qpl)->id), .ntfy_id = htobe32(tx->com.ntfy_id), .tx_ring_size = htobe16(priv->tx_desc_cnt), }; + if (gve_is_gqi(priv)) { + cmd.create_tx_queue.queue_page_list_id = + htobe32((tx->com.qpl)->id); + } else { + cmd.create_tx_queue.queue_page_list_id = + htobe32(GVE_RAW_ADDRESSING_QPL_ID); + cmd.create_tx_queue.tx_comp_ring_addr = + htobe64(tx->dqo.compl_ring_mem.bus_addr); + cmd.create_tx_queue.tx_comp_ring_size = + htobe16(priv->tx_desc_cnt); + } return (gve_adminq_execute_cmd(priv, &cmd)); } @@ -338,6 +386,7 @@ gve_adminq_describe_device(struct gve_priv *priv) struct gve_device_descriptor *desc; struct gve_dma_handle desc_mem; struct gve_device_option_gqi_qpl *dev_op_gqi_qpl = NULL; + struct gve_device_option_dqo_rda *dev_op_dqo_rda = NULL; struct gve_device_option_jumbo_frames *dev_op_jumbo_frames = NULL; uint32_t supported_features_mask = 0; int rc; @@ -366,12 +415,24 @@ gve_adminq_describe_device(struct gve_priv *priv) bus_dmamap_sync(desc_mem.tag, desc_mem.map, BUS_DMASYNC_POSTREAD); - rc = gve_process_device_options(priv, desc, &dev_op_gqi_qpl, + rc = gve_process_device_options(priv, desc, + &dev_op_gqi_qpl, &dev_op_dqo_rda, &dev_op_jumbo_frames); if (rc != 0) goto 
free_device_descriptor; - if (dev_op_gqi_qpl != NULL) { + if (dev_op_dqo_rda != NULL) { + snprintf(gve_queue_format, sizeof(gve_queue_format), + "%s", "DQO RDA"); + priv->queue_format = GVE_DQO_RDA_FORMAT; + supported_features_mask = be32toh( + dev_op_dqo_rda->supported_features_mask); + if (bootverbose) + device_printf(priv->dev, + "Driver is running with DQO RDA queue format.\n"); + } else if (dev_op_gqi_qpl != NULL) { + snprintf(gve_queue_format, sizeof(gve_queue_format), + "%s", "GQI QPL"); priv->queue_format = GVE_GQI_QPL_FORMAT; supported_features_mask = be32toh( dev_op_gqi_qpl->supported_features_mask); @@ -380,7 +441,7 @@ gve_adminq_describe_device(struct gve_priv *priv) "Driver is running with GQI QPL queue format.\n"); } else { device_printf(priv->dev, "No compatible queue formats\n"); - rc = (EINVAL); + rc = EINVAL; goto free_device_descriptor; } @@ -506,6 +567,41 @@ gve_adminq_verify_driver_compatibility(struct gve_priv *priv, return (gve_adminq_execute_cmd(priv, &aq_cmd)); } +int +gve_adminq_get_ptype_map_dqo(struct gve_priv *priv, + struct gve_ptype_lut *ptype_lut_dqo) +{ + struct gve_adminq_command aq_cmd = (struct gve_adminq_command){}; + struct gve_ptype_map *ptype_map; + struct gve_dma_handle dma; + int err = 0; + int i; + + err = gve_dma_alloc_coherent(priv, sizeof(*ptype_map), PAGE_SIZE, &dma); + if (err) + return (err); + ptype_map = dma.cpu_addr; + + aq_cmd.opcode = htobe32(GVE_ADMINQ_GET_PTYPE_MAP); + aq_cmd.get_ptype_map = (struct gve_adminq_get_ptype_map) { + .ptype_map_len = htobe64(sizeof(*ptype_map)), + .ptype_map_addr = htobe64(dma.bus_addr), + }; + + err = gve_adminq_execute_cmd(priv, &aq_cmd); + if (err) + goto err; + + /* Populate ptype_lut_dqo. 
*/ + for (i = 0; i < GVE_NUM_PTYPES; i++) { + ptype_lut_dqo->ptypes[i].l3_type = ptype_map->ptypes[i].l3_type; + ptype_lut_dqo->ptypes[i].l4_type = ptype_map->ptypes[i].l4_type; + } +err: + gve_dma_free_coherent(&dma); + return (err); +} + int gve_adminq_alloc(struct gve_priv *priv) { @@ -543,6 +639,7 @@ gve_adminq_alloc(struct gve_priv *priv) priv->adminq_destroy_rx_queue_cnt = 0; priv->adminq_dcfg_device_resources_cnt = 0; priv->adminq_set_driver_parameter_cnt = 0; + priv->adminq_get_ptype_map_cnt = 0; gve_reg_bar_write_4(priv, GVE_REG_ADMINQ_ADDR, priv->adminq_bus_addr / ADMINQ_SIZE); @@ -772,6 +869,10 @@ gve_adminq_issue_cmd(struct gve_priv *priv, struct gve_adminq_command *cmd_orig) priv->adminq_verify_driver_compatibility_cnt++; break; + case GVE_ADMINQ_GET_PTYPE_MAP: + priv->adminq_get_ptype_map_cnt++; + break; + default: device_printf(priv->dev, "Unknown AQ command opcode %d\n", opcode); } diff --git a/sys/dev/gve/gve_adminq.h b/sys/dev/gve/gve_adminq.h index 5923e5f353d1..b5d512331d42 100644 --- a/sys/dev/gve/gve_adminq.h +++ b/sys/dev/gve/gve_adminq.h @@ -1,7 +1,7 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * - * Copyright (c) 2023 Google LLC + * Copyright (c) 2023-2024 Google LLC * * Redistribution and use in source and binary forms, with or without modification, * are permitted provided that the following conditions are met: @@ -137,9 +137,11 @@ _Static_assert(sizeof(struct gve_device_option_gqi_qpl) == 4, struct gve_device_option_dqo_rda { __be32 supported_features_mask; + __be16 tx_comp_ring_entries; + __be16 rx_buff_ring_entries; }; -_Static_assert(sizeof(struct gve_device_option_dqo_rda) == 4, +_Static_assert(sizeof(struct gve_device_option_dqo_rda) == 8, "gve: bad admin queue struct length"); struct gve_device_option_modify_ring { @@ -196,7 +198,6 @@ enum gve_driver_capability { gve_driver_capability_gqi_rda = 1, gve_driver_capability_dqo_qpl = 2, /* reserved for future use */ gve_driver_capability_dqo_rda = 3, - 
gve_driver_capability_alt_miss_compl = 4,
 };
 
 #define GVE_CAP1(a) BIT((int) a)
@@ -209,7 +210,9 @@ enum gve_driver_capability {
  * Only a few bits (as shown in `gve_driver_compatibility`) are currently
  * defined. The rest are reserved for future use.
  */
-#define GVE_DRIVER_CAPABILITY_FLAGS1 (GVE_CAP1(gve_driver_capability_gqi_qpl))
+#define GVE_DRIVER_CAPABILITY_FLAGS1 \
+    (GVE_CAP1(gve_driver_capability_gqi_qpl) | \
+    GVE_CAP1(gve_driver_capability_dqo_rda))
 #define GVE_DRIVER_CAPABILITY_FLAGS2 0x0
 #define GVE_DRIVER_CAPABILITY_FLAGS3 0x0
 #define GVE_DRIVER_CAPABILITY_FLAGS4 0x0
@@ -282,6 +285,8 @@ struct gve_adminq_create_tx_queue {
 _Static_assert(sizeof(struct gve_adminq_create_tx_queue) == 48,
     "gve: bad admin queue struct length");
 
+#define GVE_RAW_ADDRESSING_QPL_ID 0xFFFFFFFF
+
 struct gve_adminq_create_rx_queue {
 	__be32 queue_id;
 	__be32 index;
@@ -352,6 +357,23 @@ struct stats {
 _Static_assert(sizeof(struct stats) == 16,
     "gve: bad admin queue struct length");
 
+/* These are control path types for PTYPE which are the same as the data path
+ * types.
+ */
+struct gve_ptype_entry {
+	uint8_t l3_type;
+	uint8_t l4_type;
+};
+
+struct gve_ptype_map {
+	struct gve_ptype_entry ptypes[1 << 10]; /* PTYPES are always 10 bits. */
+};
+
+struct gve_adminq_get_ptype_map {
+	__be64 ptype_map_len;
+	__be64 ptype_map_addr;
+};
+
 struct gve_adminq_command {
 	__be32 opcode;
 	__be32 status;
@@ -368,6 +390,7 @@ struct gve_adminq_command {
 		struct gve_adminq_set_driver_parameter set_driver_param;
 		struct gve_adminq_verify_driver_compatibility
 		    verify_driver_compatibility;
+		struct gve_adminq_get_ptype_map get_ptype_map;
 		uint8_t reserved[56];
 	};
 };
@@ -375,6 +398,24 @@ struct gve_adminq_command {
 _Static_assert(sizeof(struct gve_adminq_command) == 64,
     "gve: bad admin queue struct length");
 
+enum gve_l3_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L3_TYPE_UNKNOWN = 0,
+	GVE_L3_TYPE_OTHER,
+	GVE_L3_TYPE_IPV4,
+	GVE_L3_TYPE_IPV6,
+};
+
+enum gve_l4_type {
+	/* Must be zero so zero initialized LUT is unknown. */
+	GVE_L4_TYPE_UNKNOWN = 0,
+	GVE_L4_TYPE_OTHER,
+	GVE_L4_TYPE_TCP,
+	GVE_L4_TYPE_UDP,
+	GVE_L4_TYPE_ICMP,
+	GVE_L4_TYPE_SCTP,
+};
+
 int gve_adminq_create_rx_queues(struct gve_priv *priv, uint32_t num_queues);
 int gve_adminq_create_tx_queues(struct gve_priv *priv, uint32_t num_queues);
 int gve_adminq_destroy_tx_queues(struct gve_priv *priv, uint32_t num_queues);
@@ -387,8 +428,10 @@ int gve_adminq_configure_device_resources(struct gve_priv *priv);
 int gve_adminq_deconfigure_device_resources(struct gve_priv *priv);
 void gve_release_adminq(struct gve_priv *priv);
 int gve_adminq_register_page_list(struct gve_priv *priv,
-	struct gve_queue_page_list *qpl);
+    struct gve_queue_page_list *qpl);
 int gve_adminq_unregister_page_list(struct gve_priv *priv, uint32_t page_list_id);
 int gve_adminq_verify_driver_compatibility(struct gve_priv *priv,
-	uint64_t driver_info_len, vm_paddr_t driver_info_addr);
+    uint64_t driver_info_len, vm_paddr_t driver_info_addr);
+int gve_adminq_get_ptype_map_dqo(struct gve_priv *priv,
+    struct gve_ptype_lut *ptype_lut);
 #endif /* _GVE_AQ_H_ */
diff --git a/sys/dev/gve/gve_dqo.h b/sys/dev/gve/gve_dqo.h
new file mode 100644
index 000000000000..5f3f36d2245f
--- /dev/null
+++ b/sys/dev/gve/gve_dqo.h
@@ -0,0 +1,306 @@
+/*-
+ * SPDX-License-Identifier: BSD-3-Clause
+ *
+ * Copyright (c) 2024 Google LLC
+ *
+ * Redistribution and use in source and binary forms, with or without modification,
+ * are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright notice, this
+ *    list of conditions and the following disclaimer.
+ *
+ * 2. Redistributions in binary form must reproduce the above copyright notice,
+ *    this list of conditions and the following disclaimer in the documentation
+ *    and/or other materials provided with the distribution.
+ *
+ * 3. Neither the name of the copyright holder nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software without
+ *    specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
+ * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ * ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/* GVE DQO Descriptor formats */
+
+#ifndef _GVE_DESC_DQO_H_
+#define _GVE_DESC_DQO_H_
+
+#include "gve_plat.h"
+
+#define GVE_ITR_ENABLE_BIT_DQO BIT(0)
+#define GVE_ITR_NO_UPDATE_DQO (3 << 3)
+#define GVE_ITR_INTERVAL_DQO_SHIFT 5
+#define GVE_ITR_INTERVAL_DQO_MASK ((1 << 12) - 1)
+#define GVE_TX_IRQ_RATELIMIT_US_DQO 50
+#define GVE_RX_IRQ_RATELIMIT_US_DQO 20
+
+#define GVE_TX_MAX_HDR_SIZE_DQO 255
+#define GVE_TX_MIN_TSO_MSS_DQO 88
+
+/*
+ * Ringing the doorbell too often can hurt performance.
+ *
+ * HW requires this value to be at least 8.
+ */
+#define GVE_RX_BUF_THRESH_DQO 32
+
+/*
+ * Start dropping RX fragments if at least these many
+ * buffers cannot be posted to the NIC.
+ */
+#define GVE_RX_DQO_MIN_PENDING_BUFS 32
+
+/* Basic TX descriptor (DTYPE 0x0C) */
+struct gve_tx_pkt_desc_dqo {
+	__le64 buf_addr;
+
+	/* Must be GVE_TX_PKT_DESC_DTYPE_DQO (0xc) */
+	uint8_t dtype:5;
+
+	/* Denotes the last descriptor of a packet. */
+	uint8_t end_of_packet:1;
+	uint8_t checksum_offload_enable:1;
+
+	/* If set, will generate a descriptor completion for this descriptor. */
+	uint8_t report_event:1;
+	uint8_t reserved0;
+	__le16 reserved1;
+
+	/* The TX completion for this packet will contain this tag. */
+	__le16 compl_tag;
+	uint16_t buf_size:14;
+	uint16_t reserved2:2;
+} __packed;
+_Static_assert(sizeof(struct gve_tx_pkt_desc_dqo) == 16,
+    "gve: bad dqo desc struct length");
+
+#define GVE_TX_PKT_DESC_DTYPE_DQO 0xc
+
+/*
+ * Maximum number of data descriptors allowed per packet, or per-TSO segment.
+ */
+#define GVE_TX_MAX_DATA_DESCS_DQO 10
+#define GVE_TX_MAX_BUF_SIZE_DQO ((16 * 1024) - 1)
+#define GVE_TSO_MAXSIZE_DQO IP_MAXPACKET
+
+_Static_assert(GVE_TX_MAX_BUF_SIZE_DQO * GVE_TX_MAX_DATA_DESCS_DQO >=
+    GVE_TSO_MAXSIZE_DQO,
+    "gve: bad tso parameters");
+
+/*
+ * "report_event" on TX packet descriptors may only be reported on the last
+ * descriptor of a TX packet, and they must be spaced apart with at least this
+ * value.
+ */
+#define GVE_TX_MIN_RE_INTERVAL 32
+
+struct gve_tx_context_cmd_dtype {
+	uint8_t dtype:5;
+	uint8_t tso:1;
+	uint8_t reserved1:2;
+	uint8_t reserved2;
+};
+
+_Static_assert(sizeof(struct gve_tx_context_cmd_dtype) == 2,
+    "gve: bad dqo desc struct length");
+
+/*
+ * TX Native TSO Context DTYPE (0x05)
+ *
+ * "flex" fields allow the driver to send additional packet context to HW.
+ */
+struct gve_tx_tso_context_desc_dqo {
+	/* The L4 payload bytes that should be segmented. */
+	uint32_t tso_total_len:24;
+	uint32_t flex10:8;
+
+	/* Max segment size in TSO excluding headers. */
+	uint16_t mss:14;
+	uint16_t reserved:2;
+
+	uint8_t header_len; /* Header length to use for TSO offload */
+	uint8_t flex11;
+	struct gve_tx_context_cmd_dtype cmd_dtype;
+	uint8_t flex0;
+	uint8_t flex5;
+	uint8_t flex6;
+	uint8_t flex7;
+	uint8_t flex8;
+	uint8_t flex9;
+} __packed;
+_Static_assert(sizeof(struct gve_tx_tso_context_desc_dqo) == 16,
+    "gve: bad dqo desc struct length");
+
+#define GVE_TX_TSO_CTX_DESC_DTYPE_DQO 0x5
+
+/* General context descriptor for sending metadata. */
*** 2431 LINES SKIPPED ***
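The `gve_ptype_map` added to the admin queue carries one `{l3_type, l4_type}` pair for each 10-bit packet type. The `gve_l3_type` and `gve_l4_type` enums reserve 0 for UNKNOWN, so a zero-initialized lookup table reports every unpopulated ptype as unknown. The C sketch that follows illustrates only that lookup pattern. `example_ptype_lut`, `example_ptype_set()` and `example_ptype_lookup()` are hypothetical names; they are not the driver's `struct gve_ptype_lut` or its receive path. Masking the ptype to 10 bits on lookup is also an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Copies of the enums from the diff; UNKNOWN must stay 0. */
enum gve_l3_type {
	GVE_L3_TYPE_UNKNOWN = 0,
	GVE_L3_TYPE_OTHER,
	GVE_L3_TYPE_IPV4,
	GVE_L3_TYPE_IPV6,
};

enum gve_l4_type {
	GVE_L4_TYPE_UNKNOWN = 0,
	GVE_L4_TYPE_OTHER,
	GVE_L4_TYPE_TCP,
	GVE_L4_TYPE_UDP,
	GVE_L4_TYPE_ICMP,
	GVE_L4_TYPE_SCTP,
};

/* PTYPEs are always 10 bits, so the table has 1 << 10 entries. */
#define EXAMPLE_PTYPE_BITS 10
#define EXAMPLE_PTYPE_MASK ((1u << EXAMPLE_PTYPE_BITS) - 1)

/* Same shape as struct gve_ptype_entry in the diff. */
struct example_ptype_entry {
	uint8_t l3_type;
	uint8_t l4_type;
};

/*
 * Hypothetical host-side table.  Objects with static storage duration are
 * zero-initialized, so every entry starts out as {UNKNOWN, UNKNOWN}.
 */
static struct example_ptype_entry example_ptype_lut[1 << EXAMPLE_PTYPE_BITS];

/* Hypothetical: record what the device map says about one ptype. */
static void
example_ptype_set(uint16_t ptype, enum gve_l3_type l3, enum gve_l4_type l4)
{
	/* Entries taken from the device map are expected to fit in 10 bits. */
	assert(ptype <= EXAMPLE_PTYPE_MASK);
	example_ptype_lut[ptype].l3_type = (uint8_t)l3;
	example_ptype_lut[ptype].l4_type = (uint8_t)l4;
}

/* Hypothetical: classify a received packet by its ptype. */
static struct example_ptype_entry
example_ptype_lookup(uint16_t ptype)
{
	/* Masking keeps any out-of-range value inside the 1024-entry table. */
	return (example_ptype_lut[ptype & EXAMPLE_PTYPE_MASK]);
}
```

Because UNKNOWN is 0, a ptype that the device never described falls through to "no classification" without an explicit initialization pass over the table. That is the property the comment on both enums protects.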