svn commit: r451882 - in head/emulators/xen-kernel: . files

Roger Pau Monné royger at FreeBSD.org
Thu Oct 12 15:02:32 UTC 2017


Author: royger (src committer)
Date: Thu Oct 12 15:02:30 2017
New Revision: 451882
URL: https://svnweb.freebsd.org/changeset/ports/451882

Log:
  xen-kernel: apply XSA-{237..244}
  
  Approved by:	bapt (implicit)
  MFH:		2017Q4
  Sponsored by:	Citrix Systems R&D

Added:
  head/emulators/xen-kernel/files/0001-x86-dont-allow-MSI-pIRQ-mapping-on-unowned-device.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0001-x86-limit-linear-page-table-use-to-a-single-level.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0002-x86-enforce-proper-privilege-when-mapping-pIRQ-s.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0002-x86-mm-Disable-PV-linear-pagetables-by-default.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0003-x86-MSI-disallow-redundant-enabling.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0004-x86-IRQ-conditionally-preserve-irq-pirq-mapping-on-error.patch   (contents, props changed)
  head/emulators/xen-kernel/files/0005-x86-FLASK-fix-unmap-domain-IRQ-XSM-hook.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa238.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa239.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa241-4.8.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa242-4.9.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa243-4.7.patch   (contents, props changed)
  head/emulators/xen-kernel/files/xsa244-4.7.patch   (contents, props changed)
Modified:
  head/emulators/xen-kernel/Makefile

Modified: head/emulators/xen-kernel/Makefile
==============================================================================
--- head/emulators/xen-kernel/Makefile	Thu Oct 12 14:44:18 2017	(r451881)
+++ head/emulators/xen-kernel/Makefile	Thu Oct 12 15:02:30 2017	(r451882)
@@ -2,7 +2,7 @@
 
 PORTNAME=	xen
 PORTVERSION=	4.7.2
-PORTREVISION=	5
+PORTREVISION=	6
 CATEGORIES=	emulators
 MASTER_SITES=	http://downloads.xenproject.org/release/xen/${PORTVERSION}/
 PKGNAMESUFFIX=	-kernel
@@ -67,7 +67,20 @@ EXTRA_PATCHES=	${FILESDIR}/0001-xen-logdirty-prevent-p
 		${FILESDIR}/xsa231-4.7.patch:-p1 \
 		${FILESDIR}/xsa232.patch:-p1 \
 		${FILESDIR}/xsa233.patch:-p1 \
-		${FILESDIR}/xsa234-4.8.patch:-p1
+		${FILESDIR}/xsa234-4.8.patch:-p1 \
+		${FILESDIR}/0001-x86-dont-allow-MSI-pIRQ-mapping-on-unowned-device.patch:-p1 \
+		${FILESDIR}/0002-x86-enforce-proper-privilege-when-mapping-pIRQ-s.patch:-p1 \
+		${FILESDIR}/0003-x86-MSI-disallow-redundant-enabling.patch:-p1 \
+		${FILESDIR}/0004-x86-IRQ-conditionally-preserve-irq-pirq-mapping-on-error.patch:-p1 \
+		${FILESDIR}/0005-x86-FLASK-fix-unmap-domain-IRQ-XSM-hook.patch:-p1 \
+		${FILESDIR}/xsa238.patch:-p1 \
+		${FILESDIR}/xsa239.patch:-p1 \
+		${FILESDIR}/0001-x86-limit-linear-page-table-use-to-a-single-level.patch:-p1 \
+		${FILESDIR}/0002-x86-mm-Disable-PV-linear-pagetables-by-default.patch:-p1 \
+		${FILESDIR}/xsa241-4.8.patch:-p1 \
+		${FILESDIR}/xsa242-4.9.patch:-p1 \
+		${FILESDIR}/xsa243-4.7.patch:-p1 \
+		${FILESDIR}/xsa244-4.7.patch:-p1
 
 .include <bsd.port.options.mk>
 

Added: head/emulators/xen-kernel/files/0001-x86-dont-allow-MSI-pIRQ-mapping-on-unowned-device.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0001-x86-dont-allow-MSI-pIRQ-mapping-on-unowned-device.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,27 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: x86: don't allow MSI pIRQ mapping on unowned device
+
+MSI setup should be permitted only for existing devices owned by the
+respective guest (the operation may still be carried out by the domain
+controlling that guest).
+
+This is part of XSA-237.
+
+Reported-by: HW42 <hw42 at ipsumj.de>
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
+
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -1964,7 +1964,10 @@ int map_domain_pirq(
+         if ( !cpu_has_apic )
+             goto done;
+ 
+-        pdev = pci_get_pdev(msi->seg, msi->bus, msi->devfn);
++        pdev = pci_get_pdev_by_domain(d, msi->seg, msi->bus, msi->devfn);
++        if ( !pdev )
++            goto done;
++
+         ret = pci_enable_msi(msi, &msi_desc);
+         if ( ret )
+         {
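
Note on the patch above: its effect can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions, not Xen
code. The struct layouts, the flat device table, and the helper names
get_pdev() and get_pdev_by_domain() are hypothetical stand-ins for
pci_get_pdev() and pci_get_pdev_by_domain(). It shows why a lookup filtered
by owner yields NULL for a device the guest does not own, which is what lets
map_domain_pirq() bail out early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for Xen's domain and PCI device types. */
struct domain {
    int domain_id;
};

struct pci_dev {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    const struct domain *owner;
};

/* Unfiltered lookup: finds the device no matter which domain owns it. */
static const struct pci_dev *get_pdev(const struct pci_dev *devs, size_t n,
                                      uint16_t seg, uint8_t bus, uint8_t devfn)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].seg == seg && devs[i].bus == bus && devs[i].devfn == devfn)
            return &devs[i];
    return NULL;
}

/* Owner-filtered lookup: a device owned by another domain looks absent. */
static const struct pci_dev *get_pdev_by_domain(const struct domain *d,
                                                const struct pci_dev *devs,
                                                size_t n, uint16_t seg,
                                                uint8_t bus, uint8_t devfn)
{
    const struct pci_dev *pdev;

    assert(d != NULL);
    pdev = get_pdev(devs, n, seg, bus, devfn);
    return (pdev != NULL && pdev->owner == d) ? pdev : NULL;
}
```

With the unfiltered variant, a caller naming another domain's device would
still get a valid pointer back; the filtered variant turns that case into
the same "no such device" path as a nonexistent device.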

Added: head/emulators/xen-kernel/files/0001-x86-limit-linear-page-table-use-to-a-single-level.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0001-x86-limit-linear-page-table-use-to-a-single-level.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,494 @@
+From ea7513a3e3f28cfec59dda6e128b6b4968685762 Mon Sep 17 00:00:00 2001
+From: Jan Beulich <jbeulich at suse.com>
+Date: Thu, 28 Sep 2017 15:17:27 +0100
+Subject: [PATCH 1/2] x86: limit linear page table use to a single level
+
+That's the only way that they're meant to be used. Without such a
+restriction arbitrarily long chains of same-level page tables can be
+built, tearing down of which may then cause arbitrarily deep recursion,
+causing a stack overflow. To facilitate this restriction, a counter is
+being introduced to track both the number of same-level entries in a
+page table as well as the number of uses of a page table in another
+same-level one (counting into positive and negative direction
+respectively, utilizing the fact that both counts can't be non-zero at
+the same time).
+
+Note that the added accounting introduces a restriction on the number
+of times a page can be used in other same-level page tables - more than
+32k of such uses are no longer possible.
+
+Note also that some put_page_and_type[_preemptible]() calls are
+replaced with open-coded equivalents.  This seemed preferrable to
+adding "parent_table" to the matrix of functions.
+
+Note further that cross-domain same-level page table references are no
+longer permitted (they probably never should have been).
+
+This is XSA-240.
+
+Reported-by: Jann Horn <jannh at google.com>
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Signed-off-by: George Dunlap <george.dunlap at citrix.com>
+---
+ xen/arch/x86/domain.c        |   1 +
+ xen/arch/x86/mm.c            | 171 ++++++++++++++++++++++++++++++++++++++-----
+ xen/include/asm-x86/domain.h |   2 +
+ xen/include/asm-x86/mm.h     |  25 +++++--
+ 4 files changed, 175 insertions(+), 24 deletions(-)
+
+diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
+index 452748dd5b..44ed2ccd0a 100644
+--- a/xen/arch/x86/domain.c
++++ b/xen/arch/x86/domain.c
+@@ -1237,6 +1237,7 @@ int arch_set_info_guest(
+                 case -EINTR:
+                     rc = -ERESTART;
+                 case -ERESTART:
++                    v->arch.old_guest_ptpg = NULL;
+                     v->arch.old_guest_table =
+                         pagetable_get_page(v->arch.guest_table);
+                     v->arch.guest_table = pagetable_null();
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index e97ecccd93..e81a461b91 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -732,6 +732,61 @@ static void put_data_page(
+         put_page(page);
+ }
+ 
++static bool_t inc_linear_entries(struct page_info *pg)
++{
++    typeof(pg->linear_pt_count) nc = read_atomic(&pg->linear_pt_count), oc;
++
++    do {
++        /*
++         * The check below checks for the "linear use" count being non-zero
++         * as well as overflow.  Signed integer overflow is undefined behavior
++         * according to the C spec.  However, as long as linear_pt_count is
++         * smaller in size than 'int', the arithmetic operation of the
++         * increment below won't overflow; rather the result will be truncated
++         * when stored.  Ensure that this is always true.
++         */
++        BUILD_BUG_ON(sizeof(nc) >= sizeof(int));
++        oc = nc++;
++        if ( nc <= 0 )
++            return 0;
++        nc = cmpxchg(&pg->linear_pt_count, oc, nc);
++    } while ( oc != nc );
++
++    return 1;
++}
++
++static void dec_linear_entries(struct page_info *pg)
++{
++    typeof(pg->linear_pt_count) oc;
++
++    oc = arch_fetch_and_add(&pg->linear_pt_count, -1);
++    ASSERT(oc > 0);
++}
++
++static bool_t inc_linear_uses(struct page_info *pg)
++{
++    typeof(pg->linear_pt_count) nc = read_atomic(&pg->linear_pt_count), oc;
++
++    do {
++        /* See the respective comment in inc_linear_entries(). */
++        BUILD_BUG_ON(sizeof(nc) >= sizeof(int));
++        oc = nc--;
++        if ( nc >= 0 )
++            return 0;
++        nc = cmpxchg(&pg->linear_pt_count, oc, nc);
++    } while ( oc != nc );
++
++    return 1;
++}
++
++static void dec_linear_uses(struct page_info *pg)
++{
++    typeof(pg->linear_pt_count) oc;
++
++    oc = arch_fetch_and_add(&pg->linear_pt_count, 1);
++    ASSERT(oc < 0);
++}
++
+ /*
+  * We allow root tables to map each other (a.k.a. linear page tables). It
+  * needs some special care with reference counts and access permissions:
+@@ -761,15 +816,35 @@ get_##level##_linear_pagetable(                                             \
+                                                                             \
+     if ( (pfn = level##e_get_pfn(pde)) != pde_pfn )                         \
+     {                                                                       \
++        struct page_info *ptpg = mfn_to_page(pde_pfn);                      \
++                                                                            \
++        /* Make sure the page table belongs to the correct domain. */       \
++        if ( unlikely(page_get_owner(ptpg) != d) )                          \
++            return 0;                                                       \
++                                                                            \
+         /* Make sure the mapped frame belongs to the correct domain. */     \
+         if ( unlikely(!get_page_from_pagenr(pfn, d)) )                      \
+             return 0;                                                       \
+                                                                             \
+         /*                                                                  \
+-         * Ensure that the mapped frame is an already-validated page table. \
++         * Ensure that the mapped frame is an already-validated page table  \
++         * and is not itself having linear entries, as well as that the     \
++         * containing page table is not iself in use as a linear page table \
++         * elsewhere.                                                       \
+          * If so, atomically increment the count (checking for overflow).   \
+          */                                                                 \
+         page = mfn_to_page(pfn);                                            \
++        if ( !inc_linear_entries(ptpg) )                                    \
++        {                                                                   \
++            put_page(page);                                                 \
++            return 0;                                                       \
++        }                                                                   \
++        if ( !inc_linear_uses(page) )                                       \
++        {                                                                   \
++            dec_linear_entries(ptpg);                                       \
++            put_page(page);                                                 \
++            return 0;                                                       \
++        }                                                                   \
+         y = page->u.inuse.type_info;                                        \
+         do {                                                                \
+             x = y;                                                          \
+@@ -777,6 +852,8 @@ get_##level##_linear_pagetable(                                             \
+                  unlikely((x & (PGT_type_mask|PGT_validated)) !=            \
+                           (PGT_##level##_page_table|PGT_validated)) )       \
+             {                                                               \
++                dec_linear_uses(page);                                      \
++                dec_linear_entries(ptpg);                                   \
+                 put_page(page);                                             \
+                 return 0;                                                   \
+             }                                                               \
+@@ -1201,6 +1278,9 @@ get_page_from_l4e(
+             l3e_remove_flags((pl3e), _PAGE_USER|_PAGE_RW|_PAGE_ACCESSED);   \
+     } while ( 0 )
+ 
++static int _put_page_type(struct page_info *page, bool_t preemptible,
++                          struct page_info *ptpg);
++
+ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
+ {
+     unsigned long     pfn = l1e_get_pfn(l1e);
+@@ -1270,17 +1350,22 @@ static int put_page_from_l2e(l2_pgentry_t l2e, unsigned long pfn)
+     if ( l2e_get_flags(l2e) & _PAGE_PSE )
+         put_superpage(l2e_get_pfn(l2e));
+     else
+-        put_page_and_type(l2e_get_page(l2e));
++    {
++        struct page_info *pg = l2e_get_page(l2e);
++        int rc = _put_page_type(pg, 0, mfn_to_page(pfn));
++
++        ASSERT(!rc);
++        put_page(pg);
++    }
+ 
+     return 0;
+ }
+ 
+-static int __put_page_type(struct page_info *, int preemptible);
+-
+ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+                              int partial, bool_t defer)
+ {
+     struct page_info *pg;
++    int rc;
+ 
+     if ( !(l3e_get_flags(l3e) & _PAGE_PRESENT) || (l3e_get_pfn(l3e) == pfn) )
+         return 1;
+@@ -1303,21 +1388,28 @@ static int put_page_from_l3e(l3_pgentry_t l3e, unsigned long pfn,
+     if ( unlikely(partial > 0) )
+     {
+         ASSERT(!defer);
+-        return __put_page_type(pg, 1);
++        return _put_page_type(pg, 1, mfn_to_page(pfn));
+     }
+ 
+     if ( defer )
+     {
++        current->arch.old_guest_ptpg = mfn_to_page(pfn);
+         current->arch.old_guest_table = pg;
+         return 0;
+     }
+ 
+-    return put_page_and_type_preemptible(pg);
++    rc = _put_page_type(pg, 1, mfn_to_page(pfn));
++    if ( likely(!rc) )
++        put_page(pg);
++
++    return rc;
+ }
+ 
+ static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+                              int partial, bool_t defer)
+ {
++    int rc = 1;
++
+     if ( (l4e_get_flags(l4e) & _PAGE_PRESENT) && 
+          (l4e_get_pfn(l4e) != pfn) )
+     {
+@@ -1326,18 +1418,22 @@ static int put_page_from_l4e(l4_pgentry_t l4e, unsigned long pfn,
+         if ( unlikely(partial > 0) )
+         {
+             ASSERT(!defer);
+-            return __put_page_type(pg, 1);
++            return _put_page_type(pg, 1, mfn_to_page(pfn));
+         }
+ 
+         if ( defer )
+         {
++            current->arch.old_guest_ptpg = mfn_to_page(pfn);
+             current->arch.old_guest_table = pg;
+             return 0;
+         }
+ 
+-        return put_page_and_type_preemptible(pg);
++        rc = _put_page_type(pg, 1, mfn_to_page(pfn));
++        if ( likely(!rc) )
++            put_page(pg);
+     }
+-    return 1;
++
++    return rc;
+ }
+ 
+ static int alloc_l1_table(struct page_info *page)
+@@ -1535,6 +1631,7 @@ static int alloc_l3_table(struct page_info *page)
+         {
+             page->nr_validated_ptes = i;
+             page->partial_pte = 0;
++            current->arch.old_guest_ptpg = NULL;
+             current->arch.old_guest_table = page;
+         }
+         while ( i-- > 0 )
+@@ -1627,6 +1724,7 @@ static int alloc_l4_table(struct page_info *page)
+                 {
+                     if ( current->arch.old_guest_table )
+                         page->nr_validated_ptes++;
++                    current->arch.old_guest_ptpg = NULL;
+                     current->arch.old_guest_table = page;
+                 }
+             }
+@@ -2369,14 +2467,20 @@ int free_page_type(struct page_info *pag
+ }
+ 
+ 
+-static int __put_final_page_type(
+-    struct page_info *page, unsigned long type, int preemptible)
++static int _put_final_page_type(struct page_info *page, unsigned long type,
++                                bool_t preemptible, struct page_info *ptpg)
+ {
+     int rc = free_page_type(page, type, preemptible);
+ 
+     /* No need for atomic update of type_info here: noone else updates it. */
+     if ( rc == 0 )
+     {
++        if ( ptpg && PGT_type_equal(type, ptpg->u.inuse.type_info) )
++        {
++            dec_linear_uses(page);
++            dec_linear_entries(ptpg);
++        }
++        ASSERT(!page->linear_pt_count || page_get_owner(page)->is_dying);
+         /*
+          * Record TLB information for flush later. We do not stamp page tables
+          * when running in shadow mode:
+@@ -2412,8 +2516,8 @@ static int __put_final_page_type(
+ }
+ 
+ 
+-static int __put_page_type(struct page_info *page,
+-                           int preemptible)
++static int _put_page_type(struct page_info *page, bool_t preemptible,
++                          struct page_info *ptpg)
+ {
+     unsigned long nx, x, y = page->u.inuse.type_info;
+     int rc = 0;
+@@ -2440,12 +2544,28 @@ static int __put_page_type(struct page_info *page,
+                                            x, nx)) != x) )
+                     continue;
+                 /* We cleared the 'valid bit' so we do the clean up. */
+-                rc = __put_final_page_type(page, x, preemptible);
++                rc = _put_final_page_type(page, x, preemptible, ptpg);
++                ptpg = NULL;
+                 if ( x & PGT_partial )
+                     put_page(page);
+                 break;
+             }
+ 
++            if ( ptpg && PGT_type_equal(x, ptpg->u.inuse.type_info) )
++            {
++                /*
++                 * page_set_tlbflush_timestamp() accesses the same union
++                 * linear_pt_count lives in. Unvalidated page table pages,
++                 * however, should occur during domain destruction only
++                 * anyway.  Updating of linear_pt_count luckily is not
++                 * necessary anymore for a dying domain.
++                 */
++                ASSERT(page_get_owner(page)->is_dying);
++                ASSERT(page->linear_pt_count < 0);
++                ASSERT(ptpg->linear_pt_count > 0);
++                ptpg = NULL;
++            }
++
+             /*
+              * Record TLB information for flush later. We do not stamp page
+              * tables when running in shadow mode:
+@@ -2465,6 +2585,13 @@ static int __put_page_type(struct page_info *page,
+             return -EINTR;
+     }
+ 
++    if ( ptpg && PGT_type_equal(x, ptpg->u.inuse.type_info) )
++    {
++        ASSERT(!rc);
++        dec_linear_uses(page);
++        dec_linear_entries(ptpg);
++    }
++
+     return rc;
+ }
+ 
+@@ -2599,6 +2726,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
+             page->nr_validated_ptes = 0;
+             page->partial_pte = 0;
+         }
++        page->linear_pt_count = 0;
+         rc = alloc_page_type(page, type, preemptible);
+     }
+ 
+@@ -2610,7 +2738,7 @@ static int __get_page_type(struct page_info *page, unsigned long type,
+ 
+ void put_page_type(struct page_info *page)
+ {
+-    int rc = __put_page_type(page, 0);
++    int rc = _put_page_type(page, 0, NULL);
+     ASSERT(rc == 0);
+     (void)rc;
+ }
+@@ -2626,7 +2754,7 @@ int get_page_type(struct page_info *page, unsigned long type)
+ 
+ int put_page_type_preemptible(struct page_info *page)
+ {
+-    return __put_page_type(page, 1);
++    return _put_page_type(page, 1, NULL);
+ }
+ 
+ int get_page_type_preemptible(struct page_info *page, unsigned long type)
+@@ -2832,11 +2960,14 @@ int put_old_guest_table(struct vcpu *v)
+     if ( !v->arch.old_guest_table )
+         return 0;
+ 
+-    switch ( rc = put_page_and_type_preemptible(v->arch.old_guest_table) )
++    switch ( rc = _put_page_type(v->arch.old_guest_table, 1,
++                                 v->arch.old_guest_ptpg) )
+     {
+     case -EINTR:
+     case -ERESTART:
+         return -ERESTART;
++    case 0:
++        put_page(v->arch.old_guest_table);
+     }
+ 
+     v->arch.old_guest_table = NULL;
+@@ -2993,6 +3124,7 @@ int new_guest_cr3(unsigned long mfn)
+                 rc = -ERESTART;
+                 /* fallthrough */
+             case -ERESTART:
++                curr->arch.old_guest_ptpg = NULL;
+                 curr->arch.old_guest_table = page;
+                 break;
+             default:
+@@ -3260,7 +3392,10 @@ long do_mmuext_op(
+                     if ( type == PGT_l1_page_table )
+                         put_page_and_type(page);
+                     else
++                    {
++                        curr->arch.old_guest_ptpg = NULL;
+                         curr->arch.old_guest_table = page;
++                    }
+                 }
+             }
+ 
+@@ -3293,6 +3428,7 @@ long do_mmuext_op(
+             {
+             case -EINTR:
+             case -ERESTART:
++                curr->arch.old_guest_ptpg = NULL;
+                 curr->arch.old_guest_table = page;
+                 rc = 0;
+                 break;
+@@ -3371,6 +3507,7 @@ long do_mmuext_op(
+                         rc = -ERESTART;
+                         /* fallthrough */
+                     case -ERESTART:
++                        curr->arch.old_guest_ptpg = NULL;
+                         curr->arch.old_guest_table = page;
+                         break;
+                     default:
+diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
+index 165e533ab3..5ef761be8b 100644
+--- a/xen/include/asm-x86/domain.h
++++ b/xen/include/asm-x86/domain.h
+@@ -529,6 +529,8 @@ struct arch_vcpu
+     pagetable_t guest_table_user;       /* (MFN) x86/64 user-space pagetable */
+     pagetable_t guest_table;            /* (MFN) guest notion of cr3 */
+     struct page_info *old_guest_table;  /* partially destructed pagetable */
++    struct page_info *old_guest_ptpg;   /* containing page table of the */
++                                        /* former, if any */
+     /* guest_table holds a ref to the page, and also a type-count unless
+      * shadow refcounts are in use */
+     pagetable_t shadow_table[4];        /* (MFN) shadow(s) of guest */
+diff --git a/xen/include/asm-x86/mm.h b/xen/include/asm-x86/mm.h
+index a30e76db1e..905c7971f2 100644
+--- a/xen/include/asm-x86/mm.h
++++ b/xen/include/asm-x86/mm.h
+@@ -125,11 +125,11 @@ struct page_info
+         u32 tlbflush_timestamp;
+ 
+         /*
+-         * When PGT_partial is true then this field is valid and indicates
+-         * that PTEs in the range [0, @nr_validated_ptes) have been validated.
+-         * An extra page reference must be acquired (or not dropped) whenever
+-         * PGT_partial gets set, and it must be dropped when the flag gets
+-         * cleared. This is so that a get() leaving a page in partially
++         * When PGT_partial is true then the first two fields are valid and
++         * indicate that PTEs in the range [0, @nr_validated_ptes) have been
++         * validated. An extra page reference must be acquired (or not dropped)
++         * whenever PGT_partial gets set, and it must be dropped when the flag
++         * gets cleared. This is so that a get() leaving a page in partially
+          * validated state (where the caller would drop the reference acquired
+          * due to the getting of the type [apparently] failing [-ERESTART])
+          * would not accidentally result in a page left with zero general
+@@ -153,10 +153,18 @@ struct page_info
+          * put_page_from_lNe() (due to the apparent failure), and hence it
+          * must be dropped when the put operation is resumed (and completes),
+          * but it must not be acquired if picking up the page for validation.
++         *
++         * The 3rd field, @linear_pt_count, indicates
++         * - by a positive value, how many same-level page table entries a page
++         *   table has,
++         * - by a negative value, in how many same-level page tables a page is
++         *   in use.
+          */
+         struct {
+-            u16 nr_validated_ptes;
+-            s8 partial_pte;
++            u16 nr_validated_ptes:PAGETABLE_ORDER + 1;
++            u16 :16 - PAGETABLE_ORDER - 1 - 2;
++            s16 partial_pte:2;
++            s16 linear_pt_count;
+         };
+ 
+         /*
+@@ -207,6 +215,9 @@ struct page_info
+ #define PGT_count_width   PG_shift(9)
+ #define PGT_count_mask    ((1UL<<PGT_count_width)-1)
+ 
++/* Are the 'type mask' bits identical? */
++#define PGT_type_equal(x, y) (!(((x) ^ (y)) & PGT_type_mask))
++
+  /* Cleared when the owning guest 'frees' this page. */
+ #define _PGC_allocated    PG_shift(1)
+ #define PGC_allocated     PG_mask(1, 1)
+-- 
+2.14.1
+
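
Note on the patch above: the two-directional counter it introduces can be
sketched outside of Xen as follows. This is a minimal model, assuming C11
atomics in place of Xen's cmpxchg() and arch_fetch_and_add(); struct page is
a hypothetical stand-in that carries only the linear_pt_count field. Unlike
the patch, which relies on truncation when the incremented value is stored
back into the 16-bit field, the sketch tests the limits explicitly before
updating. The accepted range should be the same.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the one field of struct page_info used here. */
struct page {
    /* > 0: same-level entries this table holds; < 0: uses in other tables. */
    _Atomic int16_t linear_pt_count;
};

/* Count one more same-level entry in pg; refuse if pg is itself in use. */
static bool inc_linear_entries(struct page *pg)
{
    int16_t oc = atomic_load(&pg->linear_pt_count);

    do {
        if (oc < 0 || oc == INT16_MAX)
            return false;
        /* On failure the CAS reloads oc, so the check above is repeated. */
    } while (!atomic_compare_exchange_weak(&pg->linear_pt_count, &oc,
                                           (int16_t)(oc + 1)));
    return true;
}

static void dec_linear_entries(struct page *pg)
{
    int16_t oc = atomic_fetch_sub(&pg->linear_pt_count, 1);

    assert(oc > 0);
    (void)oc;
}

/* Count one more use of pg in another table; refuse if pg holds entries. */
static bool inc_linear_uses(struct page *pg)
{
    int16_t oc = atomic_load(&pg->linear_pt_count);

    do {
        if (oc > 0 || oc == INT16_MIN)
            return false;
    } while (!atomic_compare_exchange_weak(&pg->linear_pt_count, &oc,
                                           (int16_t)(oc - 1)));
    return true;
}

static void dec_linear_uses(struct page *pg)
{
    int16_t oc = atomic_fetch_add(&pg->linear_pt_count, 1);

    assert(oc < 0);
    (void)oc;
}
```

Because a page can only ever be on one side of zero, a table that contains
linear entries cannot itself be referenced from another same-level table,
and vice versa. That is what limits chains to a single level.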

Added: head/emulators/xen-kernel/files/0002-x86-enforce-proper-privilege-when-mapping-pIRQ-s.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0002-x86-enforce-proper-privilege-when-mapping-pIRQ-s.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,66 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: x86: enforce proper privilege when (un)mapping pIRQ-s
+
+(Un)mapping of IRQs, just like other RESOURCE__ADD* / RESOURCE__REMOVE*
+actions (in FLASK terms) should be XSM_DM_PRIV rather than XSM_TARGET.
+This in turn requires bypassing the XSM check in physdev_unmap_pirq()
+for the HVM emuirq case just like is being done in physdev_map_pirq().
+The primary goal security wise, however, is to no longer allow HVM
+guests, by specifying their own domain ID instead of DOMID_SELF, to
+enter code paths intended for PV guest and the control domains of HVM
+guests only.
+
+This is part of XSA-237.
+
+Reported-by: HW42 <hw42 at ipsumj.de>
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Reviewed-by: George Dunlap <george.dunlap at citrix.com>
+
+--- a/xen/arch/x86/physdev.c
++++ b/xen/arch/x86/physdev.c
+@@ -110,7 +110,7 @@ int physdev_map_pirq(domid_t domid, int
+     if ( d == NULL )
+         return -ESRCH;
+ 
+-    ret = xsm_map_domain_pirq(XSM_TARGET, d);
++    ret = xsm_map_domain_pirq(XSM_DM_PRIV, d);
+     if ( ret )
+         goto free_domain;
+ 
+@@ -255,13 +255,14 @@ int physdev_map_pirq(domid_t domid, int
+ int physdev_unmap_pirq(domid_t domid, int pirq)
+ {
+     struct domain *d;
+-    int ret;
++    int ret = 0;
+ 
+     d = rcu_lock_domain_by_any_id(domid);
+     if ( d == NULL )
+         return -ESRCH;
+ 
+-    ret = xsm_unmap_domain_pirq(XSM_TARGET, d);
++    if ( domid != DOMID_SELF || !is_hvm_domain(d) )
++        ret = xsm_unmap_domain_pirq(XSM_DM_PRIV, d);
+     if ( ret )
+         goto free_domain;
+ 
+--- a/xen/include/xsm/dummy.h
++++ b/xen/include/xsm/dummy.h
+@@ -453,7 +453,7 @@ static XSM_INLINE char *xsm_show_irq_sid
+ 
+ static XSM_INLINE int xsm_map_domain_pirq(XSM_DEFAULT_ARG struct domain *d)
+ {
+-    XSM_ASSERT_ACTION(XSM_TARGET);
++    XSM_ASSERT_ACTION(XSM_DM_PRIV);
+     return xsm_default_action(action, current->domain, d);
+ }
+ 
+@@ -465,7 +465,7 @@ static XSM_INLINE int xsm_map_domain_irq
+ 
+ static XSM_INLINE int xsm_unmap_domain_pirq(XSM_DEFAULT_ARG struct domain *d)
+ {
+-    XSM_ASSERT_ACTION(XSM_TARGET);
++    XSM_ASSERT_ACTION(XSM_DM_PRIV);
+     return xsm_default_action(action, current->domain, d);
+ }
+ 
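
Note on the patch above: the difference between the two privilege classes
can be modelled in a few lines. This is a simplified sketch of the dummy
(non-FLASK) policy only, written from the description in the commit message.
The struct domain layout and the function default_action() are hypothetical,
and the real xsm_default_action() handles more cases than the two shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the two classes discussed in the patch; the real enum has more. */
typedef enum { XSM_TARGET, XSM_DM_PRIV } xsm_default_t;

/* Hypothetical, simplified stand-in for Xen's struct domain. */
struct domain {
    bool is_privileged;           /* control domain */
    const struct domain *target;  /* guest this domain is device model for */
};

static int default_action(xsm_default_t action, const struct domain *src,
                          const struct domain *target)
{
    assert(src != NULL);

    switch (action) {
    case XSM_TARGET:
        /* A domain may always act on itself. */
        if (src == target)
            return 0;
        /* fall through */
    case XSM_DM_PRIV:
        /* Otherwise only the target's device model or a privileged domain. */
        if (target != NULL && src->target == target)
            return 0;
        return src->is_privileged ? 0 : -EPERM;
    }
    return -EPERM;
}
```

Under XSM_TARGET a guest that passes its own domain ID is accepted by the
"acting on itself" rule. Under XSM_DM_PRIV that shortcut is gone, which is
why physdev_unmap_pirq() needs the explicit DOMID_SELF bypass for HVM guests
shown in the hunk above.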

Added: head/emulators/xen-kernel/files/0002-x86-mm-Disable-PV-linear-pagetables-by-default.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0002-x86-mm-Disable-PV-linear-pagetables-by-default.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,82 @@
+From 9a4b34729f1bb92eea1e1efe52e6face9f0b17ae Mon Sep 17 00:00:00 2001
+From: George Dunlap <george.dunlap at citrix.com>
+Date: Fri, 22 Sep 2017 11:46:55 +0100
+Subject: [PATCH 2/2] x86/mm: Disable PV linear pagetables by default
+
+Allowing pagetables to point to other pagetables of the same level
+(often called 'linear pagetables') has been included in Xen since its
+inception.  But it is not used by the most common PV guests (Linux,
+NetBSD, minios), and has been the source of a number of subtle
+reference-counting bugs.
+
+Add a command-line option to control whether PV linear pagetables are
+allowed (disabled by default).
+
+Reported-by: Jann Horn <jannh at google.com>
+Signed-off-by: George Dunlap <george.dunlap at citrix.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
+---
+Changes since v2:
+- s/_/-/; in command-line option
+- Added __read_mostly
+---
+ docs/misc/xen-command-line.markdown | 15 +++++++++++++++
+ xen/arch/x86/mm.c                   |  9 +++++++++
+ 2 files changed, 24 insertions(+)
+
+diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
+index 73f5265fc6..061aff5edc 100644
+--- a/docs/misc/xen-command-line.markdown
++++ b/docs/misc/xen-command-line.markdown
+@@ -1280,6 +1280,21 @@ The following resources are available:
+     CDP, one COS will corespond two CBMs other than one with CAT, due to the
+     sum of CBMs is fixed, that means actual `cos_max` in use will automatically
+     reduce to half when CDP is enabled.
++
++### pv-linear-pt
++> `= <boolean>`
++
++> Default: `false`
++
++Allow PV guests to have pagetable entries pointing to other pagetables
++of the same level (i.e., allowing L2 PTEs to point to other L2 pages).
++This technique is often called "linear pagetables", and is sometimes
++used to allow operating systems a simple way to consistently map the
++current process's pagetables into its own virtual address space.
++
++None of the most common PV operating systems (Linux, NetBSD, MiniOS)
++use this technique, but there may be custom operating systems which
++do.
+ 
+ ### reboot
+ > `= t[riple] | k[bd] | a[cpi] | p[ci] | P[ower] | e[fi] | n[o] [, [w]arm | [c]old]`
+diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
+index e81a461b91..f748d4a221 100644
+--- a/xen/arch/x86/mm.c
++++ b/xen/arch/x86/mm.c
+@@ -799,6 +799,9 @@ static void dec_linear_uses(struct page_info *pg)
+  *     frame if it is mapped by a different root table. This is sufficient and
+  *     also necessary to allow validation of a root table mapping itself.
+  */
++static bool_t __read_mostly pv_linear_pt_enable = 0;
++boolean_param("pv-linear-pt", pv_linear_pt_enable);
++
+ #define define_get_linear_pagetable(level)                                  \
+ static int                                                                  \
+ get_##level##_linear_pagetable(                                             \
+@@ -808,6 +811,12 @@ get_##level##_linear_pagetable(                                             \
+     struct page_info *page;                                                 \
+     unsigned long pfn;                                                      \
+                                                                             \
++    if ( !pv_linear_pt_enable )                                             \
++    {                                                                       \
++        MEM_LOG("Attempt to create linear p.t. (feature disabled)");        \
++        return 0;                                                           \
++    }                                                                       \
++                                                                            \
+     if ( (level##e_get_flags(pde) & _PAGE_RW) )                             \
+     {                                                                       \
+         MEM_LOG("Attempt to create linear p.t. with write perms");          \
+-- 
+2.14.1
+

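Note on the pv-linear-pt patch above: the gate it adds to get_##level##_linear_pagetable() can be modelled outside the hypervisor. This is a minimal sketch under stated assumptions, not Xen code. `linear_pt_enabled`, `PTE_RW`, `set_linear_pt()` and `try_linear_pagetable()` are hypothetical stand-ins for pv_linear_pt_enable, _PAGE_RW, the boolean_param() handling and the macro-generated function. The option parser is deliberately simplified and may not accept every spelling the real one does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PTE_RW 0x2u  /* hypothetical stand-in for _PAGE_RW */

/* Stand-in for pv_linear_pt_enable; the patch makes "off" the default. */
static bool linear_pt_enabled = false;

/* Simplified model of a boolean command line option (assumption). */
static int set_linear_pt(const char *value)
{
    assert(value != NULL);
    if (!strcmp(value, "true") || !strcmp(value, "1")) {
        linear_pt_enabled = true;
        return 0;
    }
    if (!strcmp(value, "false") || !strcmp(value, "0")) {
        linear_pt_enabled = false;
        return 0;
    }
    return -1;  /* unrecognised value: leave the setting unchanged */
}

/* Returns 1 if the entry may be considered further, 0 if it is refused. */
static int try_linear_pagetable(uint32_t flags)
{
    if (!linear_pt_enabled)
        return 0;  /* feature disabled: refuse before any other check */
    if (flags & PTE_RW)
        return 0;  /* writable linear mappings are refused as before */
    return 1;
}
```

With the default setting every request is refused. Only after the option is switched on does the pre-existing write-permission check become the deciding factor.
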
Added: head/emulators/xen-kernel/files/0003-x86-MSI-disallow-redundant-enabling.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0003-x86-MSI-disallow-redundant-enabling.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,55 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: x86/MSI: disallow redundant enabling
+
+At the moment, Xen attempts to allow redundant enabling of MSI by
+having pci_enable_msi() return 0, and point to the existing MSI
+descriptor, when the msi already exists.
+
+Unfortunately, if subsequent errors are encountered, the cleanup
+paths assume pci_enable_msi() had done full initialization, and
+hence undo everything that was assumed to be done by that
+function without also undoing other setup that would normally
+occur only after that function was called (in map_domain_pirq()
+itself).
+
+Rather than try to make the redundant enabling case work properly, just
+forbid it entirely by having pci_enable_msi() return -EEXIST when MSI
+is already set up.
+
+This is part of XSA-237.
+
+Reported-by: HW42 <hw42 at ipsumj.de>
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
+Reviewed-by: George Dunlap <george.dunlap at citrix.com>
+
+--- a/xen/arch/x86/msi.c
++++ b/xen/arch/x86/msi.c
+@@ -1050,11 +1050,10 @@ static int __pci_enable_msi(struct msi_i
+     old_desc = find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSI);
+     if ( old_desc )
+     {
+-        printk(XENLOG_WARNING "irq %d already mapped to MSI on %04x:%02x:%02x.%u\n",
++        printk(XENLOG_ERR "irq %d already mapped to MSI on %04x:%02x:%02x.%u\n",
+                msi->irq, msi->seg, msi->bus,
+                PCI_SLOT(msi->devfn), PCI_FUNC(msi->devfn));
+-        *desc = old_desc;
+-        return 0;
++        return -EEXIST;
+     }
+ 
+     old_desc = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX);
+@@ -1118,11 +1117,10 @@ static int __pci_enable_msix(struct msi_
+     old_desc = find_msi_entry(pdev, msi->irq, PCI_CAP_ID_MSIX);
+     if ( old_desc )
+     {
+-        printk(XENLOG_WARNING "irq %d already mapped to MSI-X on %04x:%02x:%02x.%u\n",
++        printk(XENLOG_ERR "irq %d already mapped to MSI-X on %04x:%02x:%02x.%u\n",
+                msi->irq, msi->seg, msi->bus,
+                PCI_SLOT(msi->devfn), PCI_FUNC(msi->devfn));
+-        *desc = old_desc;
+-        return 0;
++        return -EEXIST;
+     }
+ 
+     old_desc = find_msi_entry(pdev, -1, PCI_CAP_ID_MSI);

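Note on the MSI patch above: the behavioural change is that a redundant enable now fails with -EEXIST instead of handing back the existing descriptor. The following is a minimal sketch of that contract, assuming a small fixed table. `struct msi_entry`, `find_entry()` and `enable_msi()` are hypothetical names for illustration, and -ENOSPC is an arbitrary choice for the table-full case rather than anything taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ENTRIES 4  /* hypothetical table size for the sketch */

struct msi_entry {
    int irq;
    int in_use;
};

static struct msi_entry table[MAX_ENTRIES];

/* Look up an already enabled entry for this irq, or return NULL. */
static struct msi_entry *find_entry(int irq)
{
    for (size_t i = 0; i < MAX_ENTRIES; i++)
        if (table[i].in_use && table[i].irq == irq)
            return &table[i];
    return NULL;
}

/* Returns 0 and sets *desc on success; *desc is untouched on failure. */
static int enable_msi(int irq, struct msi_entry **desc)
{
    assert(desc != NULL);

    /* Redundant enabling is rejected outright, as in the patch. */
    if (find_entry(irq))
        return -EEXIST;

    for (size_t i = 0; i < MAX_ENTRIES; i++) {
        if (!table[i].in_use) {
            table[i].irq = irq;
            table[i].in_use = 1;
            *desc = &table[i];
            return 0;
        }
    }
    return -ENOSPC;
}
```

Because a failed call never publishes a descriptor, the caller's error path cannot mistake a pre-existing entry for one it has just set up.
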
Added: head/emulators/xen-kernel/files/0004-x86-IRQ-conditionally-preserve-irq-pirq-mapping-on-error.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0004-x86-IRQ-conditionally-preserve-irq-pirq-mapping-on-error.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,124 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: x86/IRQ: conditionally preserve irq <-> pirq mapping on map error paths
+
+Mappings that had been set up before should not be torn down when
+handling unrelated errors.
+
+This is part of XSA-237.
+
+Reported-by: HW42 <hw42 at ipsumj.de>
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Reviewed-by: George Dunlap <george.dunlap at citrix.com>
+
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -1252,7 +1252,8 @@ static int prepare_domain_irq_pirq(struc
+         return -ENOMEM;
+     }
+     *pinfo = info;
+-    return 0;
++
++    return !!err;
+ }
+ 
+ static void set_domain_irq_pirq(struct domain *d, int irq, struct pirq *pirq)
+@@ -1295,7 +1296,10 @@ int init_domain_irq_mapping(struct domai
+             continue;
+         err = prepare_domain_irq_pirq(d, i, i, &info);
+         if ( err )
++        {
++            ASSERT(err < 0);
+             break;
++        }
+         set_domain_irq_pirq(d, i, info);
+     }
+ 
+@@ -1903,6 +1907,7 @@ int map_domain_pirq(
+     struct pirq *info;
+     struct irq_desc *desc;
+     unsigned long flags;
++    DECLARE_BITMAP(prepared, MAX_MSI_IRQS) = {};
+ 
+     ASSERT(spin_is_locked(&d->event_lock));
+ 
+@@ -1946,8 +1951,10 @@ int map_domain_pirq(
+     }
+ 
+     ret = prepare_domain_irq_pirq(d, irq, pirq, &info);
+-    if ( ret )
++    if ( ret < 0 )
+         goto revoke;
++    if ( !ret )
++        __set_bit(0, prepared);
+ 
+     desc = irq_to_desc(irq);
+ 
+@@ -2019,8 +2026,10 @@ int map_domain_pirq(
+             irq = create_irq(NUMA_NO_NODE);
+             ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
+                            : irq;
+-            if ( ret )
++            if ( ret < 0 )
+                 break;
++            if ( !ret )
++                __set_bit(nr, prepared);
+             msi_desc[nr].irq = irq;
+ 
+             if ( irq_permit_access(d, irq) != 0 )
+@@ -2053,15 +2062,15 @@ int map_domain_pirq(
+                 desc->msi_desc = NULL;
+                 spin_unlock_irqrestore(&desc->lock, flags);
+             }
+-            while ( nr-- )
++            while ( nr )
+             {
+                 if ( irq >= 0 && irq_deny_access(d, irq) )
+                     printk(XENLOG_G_ERR
+                            "dom%d: could not revoke access to IRQ%d (pirq %d)\n",
+                            d->domain_id, irq, pirq);
+-                if ( info )
++                if ( info && test_bit(nr, prepared) )
+                     cleanup_domain_irq_pirq(d, irq, info);
+-                info = pirq_info(d, pirq + nr);
++                info = pirq_info(d, pirq + --nr);
+                 irq = info->arch.irq;
+             }
+             msi_desc->irq = -1;
+@@ -2077,12 +2086,14 @@ int map_domain_pirq(
+         spin_lock_irqsave(&desc->lock, flags);
+         set_domain_irq_pirq(d, irq, info);
+         spin_unlock_irqrestore(&desc->lock, flags);
++        ret = 0;
+     }
+ 
+ done:
+     if ( ret )
+     {
+-        cleanup_domain_irq_pirq(d, irq, info);
++        if ( test_bit(0, prepared) )
++            cleanup_domain_irq_pirq(d, irq, info);
+  revoke:
+         if ( irq_deny_access(d, irq) )
+             printk(XENLOG_G_ERR
+--- a/xen/arch/x86/physdev.c
++++ b/xen/arch/x86/physdev.c
+@@ -185,7 +185,7 @@ int physdev_map_pirq(domid_t domid, int
+         }
+         else if ( type == MAP_PIRQ_TYPE_MULTI_MSI )
+         {
+-            if ( msi->entry_nr <= 0 || msi->entry_nr > 32 )
++            if ( msi->entry_nr <= 0 || msi->entry_nr > MAX_MSI_IRQS )
+                 ret = -EDOM;
+             else if ( msi->entry_nr != 1 && !iommu_intremap )
+                 ret = -EOPNOTSUPP;
+--- a/xen/include/asm-x86/msi.h
++++ b/xen/include/asm-x86/msi.h
+@@ -55,6 +55,8 @@
+ /* MAX fixed pages reserved for mapping MSIX tables. */
+ #define FIX_MSIX_MAX_PAGES              512
+ 
++#define MAX_MSI_IRQS 32 /* limited by MSI capability struct properties */
++
+ struct msi_info {
+     u16 seg;
+     u8 bus;

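Note on the IRQ patch above: it records in a bitmap which mappings the current call actually created, and the error path tears down only those. The sketch below models the idea with a plain 32-bit mask, assuming a three-way return convention like the one the patch gives prepare_domain_irq_pirq() (negative on error, 0 when newly set up, 1 when the mapping already existed). `slot_owner`, `prepare_slot()` and `map_range()` are hypothetical names, and the specific error codes are choices made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_SLOTS 32  /* mirrors the 32-entry limit named MAX_MSI_IRQS */

static int slot_owner[MAX_SLOTS];  /* 0 = free, otherwise an owner id */

/* <0: error, 0: newly prepared by this call, 1: already mapped for owner */
static int prepare_slot(int slot, int owner)
{
    if (slot < 0 || slot >= MAX_SLOTS)
        return -EINVAL;
    if (slot_owner[slot] == owner)
        return 1;
    if (slot_owner[slot] != 0)
        return -EBUSY;
    slot_owner[slot] = owner;
    return 0;
}

/* Map count consecutive slots; on failure undo only what this call did. */
static int map_range(int first, int count, int owner)
{
    uint32_t prepared = 0;  /* bit nr set => slot first+nr is ours to undo */
    int nr, ret = 0;

    assert(owner != 0);
    if (count <= 0 || count > MAX_SLOTS)
        return -EDOM;

    for (nr = 0; nr < count; nr++) {
        ret = prepare_slot(first + nr, owner);
        if (ret < 0)
            break;
        if (ret == 0)
            prepared |= UINT32_C(1) << nr;
    }
    if (ret >= 0)
        return 0;

    /* Error path: pre-existing mappings (bit clear) are preserved. */
    while (nr--)
        if (prepared & (UINT32_C(1) << nr))
            slot_owner[first + nr] = 0;
    return ret;
}
```

A slot that was already mapped before the call survives a failed call unchanged, which is the property the patch subject describes as conditionally preserving the mapping.
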
Added: head/emulators/xen-kernel/files/0005-x86-FLASK-fix-unmap-domain-IRQ-XSM-hook.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/0005-x86-FLASK-fix-unmap-domain-IRQ-XSM-hook.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,37 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: x86/FLASK: fix unmap-domain-IRQ XSM hook
+
+The caller and the FLASK implementation of xsm_unmap_domain_irq()
+disagreed about what the "data" argument points to in the MSI case:
+Change both sides to pass/take a PCI device.
+
+This is part of XSA-237.
+
+Signed-off-by: Jan Beulich <jbeulich at suse.com>
+Reviewed-by: Andrew Cooper <andrew.cooper3 at citrix.com>
+
+--- a/xen/arch/x86/irq.c
++++ b/xen/arch/x86/irq.c
+@@ -2144,7 +2144,8 @@ int unmap_domain_pirq(struct domain *d,
+         nr = msi_desc->msi.nvec;
+     }
+ 
+-    ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq, msi_desc);
++    ret = xsm_unmap_domain_irq(XSM_HOOK, d, irq,
++                               msi_desc ? msi_desc->dev : NULL);
+     if ( ret )
+         goto done;
+ 
+--- a/xen/xsm/flask/hooks.c
++++ b/xen/xsm/flask/hooks.c
+@@ -915,8 +915,8 @@ static int flask_unmap_domain_msi (struc
+                                    u32 *sid, struct avc_audit_data *ad)
+ {
+ #ifdef CONFIG_HAS_PCI
+-    struct msi_info *msi = data;
+-    u32 machine_bdf = (msi->seg << 16) | (msi->bus << 8) | msi->devfn;
++    const struct pci_dev *pdev = data;
++    u32 machine_bdf = (pdev->seg << 16) | (pdev->bus << 8) | pdev->devfn;
+ 
+     AVC_AUDIT_DATA_INIT(ad, DEV);
+     ad->device = machine_bdf;

Added: head/emulators/xen-kernel/files/xsa238.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/xsa238.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,45 @@
+From cdc2887076b19b39fab9faec495082586f3113df Mon Sep 17 00:00:00 2001
+From: XenProject Security Team <security at xenproject.org>
+Date: Tue, 5 Sep 2017 13:41:37 +0200
+Subject: x86/ioreq server: correctly handle bogus
+ XEN_DMOP_{,un}map_io_range_to_ioreq_server arguments
+
+Misbehaving device model can pass incorrect XEN_DMOP_map/
+unmap_io_range_to_ioreq_server arguments, namely end < start when
+specifying address range. When this happens we hit ASSERT(s <= e) in
+rangeset_contains_range()/rangeset_overlaps_range() with debug builds.
+Production builds will not trap right away but may misbehave later
+while handling such bogus ranges.
+
+This is XSA-238.
+
+Signed-off-by: Vitaly Kuznetsov <vkuznets at redhat.com>
+Reviewed-by: Jan Beulich <jbeulich at suse.com>
+---
+ xen/arch/x86/hvm/ioreq.c | 6 ++++++
+ 1 file changed, 6 insertions(+)
+
+diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
+index b2a8b0e986..8c8bf1f0ec 100644
+--- a/xen/arch/x86/hvm/ioreq.c
++++ b/xen/arch/x86/hvm/ioreq.c
+@@ -820,6 +820,9 @@ int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+     struct hvm_ioreq_server *s;
+     int rc;
+ 
++    if ( start > end )
++        return -EINVAL;
++
+     spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+ 
+     rc = -ENOENT;
+@@ -872,6 +875,9 @@ int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+     struct hvm_ioreq_server *s;
+     int rc;
+ 
++    if ( start > end )
++        return -EINVAL;
++
+     spin_lock_recursive(&d->arch.hvm_domain.ioreq_server.lock);
+ 
+     rc = -ENOENT;

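Note on the XSA-238 patch above: the fix is an early start > end check ahead of code that asserts s <= e. The following minimal sketch shows why the order matters, assuming a single allowed range. `struct range`, `range_contains()` and `check_io_range()` are hypothetical names, and the -EPERM result for an out-of-range request is an assumption of the sketch, not something the patch returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct range {
    uint64_t s;  /* inclusive start */
    uint64_t e;  /* inclusive end */
};

/* Like the rangeset helpers named in the patch, this assumes s <= e. */
static bool range_contains(const struct range *r, uint64_t s, uint64_t e)
{
    assert(s <= e);  /* a reversed range would trap here in a debug build */
    return r->s <= s && e <= r->e;
}

/* Validate caller-supplied bounds before they reach the helper. */
static int check_io_range(const struct range *allowed,
                          uint64_t start, uint64_t end)
{
    if (start > end)
        return -EINVAL;  /* reject bogus input up front, as the patch does */
    return range_contains(allowed, start, end) ? 0 : -EPERM;
}
```

With the early return in place, a reversed range is reported as an ordinary error and never reaches the assertion.
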
Added: head/emulators/xen-kernel/files/xsa239.patch
==============================================================================
--- /dev/null	00:00:00 1970	(empty, because file is newly added)
+++ head/emulators/xen-kernel/files/xsa239.patch	Thu Oct 12 15:02:30 2017	(r451882)
@@ -0,0 +1,46 @@
+From: Jan Beulich <jbeulich at suse.com>
+Subject: x86/HVM: prefill partially used variable on emulation paths
+
+Certain handlers ignore the access size (vioapic_write() being the
+example this was found with), perhaps leading to subsequent reads
+seeing data that wasn't actually written by the guest. For
+consistency and extra safety also do this on the read path of
+hvm_process_io_intercept(), even if this doesn't directly affect what
+guests get to see, as we've supposedly already dealt with read handlers
+leaving data completely uninitialized.

*** DIFF OUTPUT TRUNCATED AT 1000 LINES ***

