Re: [PATCH] Experimental vchiq and bcm2835_audio support for arm64

From: Marco Devesas Campos <devesas.campos_at_gmail.com>
Date: Wed, 11 May 2022 19:16:14 UTC
Hi Warner, and List,

(retrying as the original doesn't seem to have made it to the list)

So, quite clearly, this ended up not being a two-day job...

On the other hand, the vchiq code now not only works with the
bcm2835_audio driver but should also be on par, feature-wise, with
the existing 32-bit code.

To wit, the patch below:

   * updates vchiq and bcm2835_audio to work on 64-bit Pis

   * implements compat_freebsd32 ioctl calls so that 32-bit apps,
     including omxplayer*, can run on a 64-bit system

   * fixes a few panics, stalls, busy waits and data corruptions
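For readers unfamiliar with compat_freebsd32 ioctls: a 32-bit process passes structures whose pointer fields are only 32 bits wide, so the kernel needs a parallel "32" layout of each structure and must widen every pointer before handing the request to the native code. A minimal sketch of that widening step follows. The struct and function names are hypothetical stand-ins, not the actual definitions in vchiq_ioctl.h; ptr_in() mirrors the idea behind FreeBSD's PTRIN() macro.

```c
#include <stdint.h>

/* Hypothetical native request: on arm64 the pointer is 8 bytes wide. */
struct element {
	const void *data;
	unsigned int size;
};

/*
 * Hypothetical 32-bit layout of the same request, as a 32-bit process
 * lays it out in memory: the pointer travels as a uint32_t.
 */
struct element32 {
	uint32_t data;
	uint32_t size;
};

/* Widen a 32-bit user pointer value to a native pointer. */
static void *
ptr_in(uint32_t v)
{
	return ((void *)(uintptr_t)v);
}

/* Convert one 32-bit element into its native form. */
static void
element_from_32(const struct element32 *in, struct element *out)
{
	out->data = ptr_in(in->data);
	out->size = in->size;
}
```

In a real handler, the compat ioctl command (with its own encoded size) selects the 32-bit structure, and each converted copy is then passed to the same code path the native ioctl uses.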

In the process of debugging, I also got the userland utilities
to work on arm64, and if this gets accepted for inclusion I'll update
the port.

As for the remaining issues: audio playback stalls intermittently after
a day or two of inactivity**, and running vchiq_test -p in parallel
with audio or video playback results in stuttering.

Anyway, the output of git format-patch is below.

Best,
Marco

* although the Pi 4 needs a special 32-bit version, not the one
from ports; I'll get that out if there's interest

** the workaround is to use the sysctl to change the output
destination and then change it back




From 89f464839efca9483eabca454db0d78495e2f4ac Mon Sep 17 00:00:00 2001
From: Marco Devesas Campos <devesas.campos@gmail.com>
Date: Wed, 11 May 2022 15:19:41 +0100
Subject: [PATCH] arm64: Add support for vchiq and bcm2835_audio (plus some
  fixes)

Add 64-bit support to vchiq:
     * update fields to the appropriate fixed bit-size variants
       (everywhere [cf. e.g., ref:sizes and ref:sizes2])
     * update printfs to account for said sizes (everywhere)
     * update printfs to account for the different pointer size (everywhere)
     * refer to event semaphores (whose references live in memory shared
       with the 32-bit VC) by offset instead of by pointer [ref:sems]
     * dsb() is dsb(sy) on arm64 (vchiq_{core.c,core.h,kmod.c}) [ref:dsb]
     * comment out some unneeded code in parse_rx_slots around
       VCHIQ_MSG_BULK_RX (cf. [ref:deadcode])
     * adapt remote_event_signal to arm64 caching behaviours (vchiq_kmod.c)
     * refactor synchronization around remote_event_signal, forcing a
       wmb to be on the safe side, which makes it look more like what Linux
       does [ref:sync] (vchiq_{core,kmod}.c) and makes a comment in
       vchiq_core.c accurate (it wasn't before)
     * add a few more syncs to be on the safe side (vchiq_2835_arm.c)

     * use arm64 dcache invalidation mechanisms (vchiq_2835_arm.c)
     * explicitly invalidate pages on arm64 after a bulk read (vchiq_2835_arm.c)
     * support bulk transfers on the RPi 4 (aka "long address space"
       transfers) by hard-coding their VC offset (0) and their different
       bit-shift [ref:longbulk] (vchiq_2835_arm.c)
     * refactor a loop that re-tested a loop-invariant condition on every
       iteration (vchiq_2835_arm.c)
     * use the correct (hard-coded) cache-line size on arm64 (vchiq_2835_arm.c)

     * rework the handling of chipset "features" to account for the
       extra behaviours of 64-bit chipsets (vchiq_kmod.c)
     * add compat_freebsd32 ioctls and their datatypes
       (vchiq_arm.c, vchiq_ioctl.h)
     * add sysctls (log, arm_log) to control debug output (vchiq_kmod.c)

     * add example kernel config (GENERIC-VCHIQ)
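To illustrate the "long address space" bulk item above: as we read create_pagelist in this patch, each pagelist entry packs a run of physically contiguous pages into a uint32_t, with the page address in the high bits and the run length (number of extra pages) in the low bits. In long-bulk mode the address is shifted right by 4 and only the low 8 bits hold the run. A minimal sketch of that packing, assuming 4 KiB pages; encode_run(), decode_addr(), decode_run() and max_run() are hypothetical helpers that mirror the patch's _PG_BLOCK() macro, not driver code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Low bits reserved for the run count in each mode. */
static uint64_t
run_ceil_of(bool long_bulk_space)
{
	return (long_bulk_space ? 0x100 : SKETCH_PAGE_SIZE);
}

/* Right shift applied to the page address in each mode. */
static unsigned int
rshift_of(bool long_bulk_space)
{
	return (long_bulk_space ? 4 : 0);
}

/*
 * Pack a page-aligned address (a VC bus address normally, a raw
 * physical address in long-bulk mode) and a run count into one entry.
 */
static uint32_t
encode_run(uint64_t page_addr, unsigned int run, bool long_bulk_space)
{
	uint64_t ceil = run_ceil_of(long_bulk_space);

	return ((uint32_t)(((page_addr >> rshift_of(long_bulk_space)) &
	    ~(ceil - 1)) + run));
}

/* Recover the page address of the first page in the run. */
static uint64_t
decode_addr(uint32_t entry, bool long_bulk_space)
{
	uint64_t ceil = run_ceil_of(long_bulk_space);

	return (((uint64_t)entry & ~(ceil - 1)) << rshift_of(long_bulk_space));
}

/* Recover the number of extra pages in the run. */
static unsigned int
decode_run(uint32_t entry, bool long_bulk_space)
{
	return ((unsigned int)(entry & (run_ceil_of(long_bulk_space) - 1)));
}

/* Largest run count that still fits in the low bits. */
static unsigned int
max_run(bool long_bulk_space)
{
	return ((unsigned int)(run_ceil_of(long_bulk_space) - 1));
}
```

The trade-off this encoding makes: with the 4-bit shift, a 32-bit entry can name page addresses below 2^36 (64 GiB), but a single entry then covers at most 256 pages instead of 4096.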

Fixes:
     * rework error handling in create_pagelist, avoiding a potential
       panic from calling free() on memory that had been allocated with
       bus_dmamem_alloc, a potential NULL dereference, and a leak when
       pinning pages fails (vchiq_2835_arm.c)
     * fix a misunderstanding of the behaviour of cv_wait_sig that led to
       uninterruptible looping (vchi_bsd.c)
     * implement detection of fatal signals (vchi_bsd.c)
     * fix a mix-up with the name of a variable, introduced by #a0b8746,
       that could lead to a panic when closing the cdev file
       (vchiq_arm.c)
     * release the user connection when destroying the cdevpriv, and stop
       user processes from sharing connection data, which led to stalls
       and data corruption (vchiq_arm.c)
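The create_pagelist fix follows the usual kernel goto-unwind idiom: each acquisition step has a matching label, and an error jumps to the label that releases exactly what has been acquired so far, falling through the remaining labels in reverse order. A self-contained sketch of the idiom, with counters standing in for the DMA memory, the loaded DMA map and the held pages; setup(), teardown() and the counters are hypothetical, and the real function does much more work between the steps.

```c
#include <errno.h>

/* Flags standing in for resources that must be released on error. */
static int mem_held, map_loaded, pages_held;

/*
 * Acquire three resources in order.  fail_at (1..3) simulates a failure
 * at that step; 0 means no failure.  On error, everything acquired so
 * far is released in reverse order by falling through the labels.
 */
static int
setup(int fail_at)
{
	int err = 0;

	if (fail_at == 1) { err = -ENOMEM; goto failed_alloc; }
	mem_held = 1;		/* cf. bus_dmamem_alloc */

	if (fail_at == 2) { err = -ENOMEM; goto failed_load; }
	map_loaded = 1;		/* cf. bus_dmamap_load */

	if (fail_at == 3) { err = -ENOMEM; goto failed_hold; }
	pages_held = 1;		/* cf. holding the user pages */

	return (0);

failed_hold:
	map_loaded = 0;		/* cf. bus_dmamap_unload */
failed_load:
	mem_held = 0;		/* cf. bus_dmamem_free */
failed_alloc:
	return (err);
}

/* Release everything after a successful setup(). */
static void
teardown(void)
{
	pages_held = 0;
	map_loaded = 0;
	mem_held = 0;
}
```

Because each label undoes only the step before it, adding a new acquisition step means adding one label in the right place rather than duplicating cleanup code in every error branch.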

Update bcm2835_audio to work on 64-bit systems:
     * update VC audio fields (vc_vchi_audioserv_defs.h, bcm2835_audio.c)
     * repurpose the hitherto unused `callback` field to carry the upper
       32 bits of the 64-bit channel pointer, with `cookie` carrying the
       lower 32 bits (bcm2835_audio.c)
     * (hopefully) make the code that shifts data to the VC more robust
       (bcm2835_audio.c)
     * add a sysctl to control the amount of debugging info output by
       bcm2835_audio.c
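The pointer split used for the write/complete messages can be sketched portably as below. Like the patch, it assumes the VC echoes both 32-bit fields back unchanged in the VC_AUDIO_MSG_TYPE_COMPLETE message. split_ptr() and join_ptr() are hypothetical helpers; the patch itself open-codes the shifts under #if defined(__aarch64__).

```c
#include <stdint.h>

/* Split a host pointer into the two 32-bit message fields. */
static void
split_ptr(const void *p, uint32_t *hi, uint32_t *lo)
{
	uintptr_t v = (uintptr_t)p;

	/*
	 * Shift in 64-bit arithmetic so this is well defined even when
	 * uintptr_t is only 32 bits wide (hi is then always 0).
	 */
	*hi = (uint32_t)((uint64_t)v >> 32);
	*lo = (uint32_t)(v & 0xffffffffu);
}

/* Rebuild the host pointer from the two fields echoed back by the VC. */
static void *
join_ptr(uint32_t hi, uint32_t lo)
{
	return ((void *)(uintptr_t)(((uint64_t)hi << 32) | lo));
}
```

Doing the shift in uint64_t arithmetic removes the need for an #ifdef: on a 32-bit build the upper half is simply 0, which matches the old behaviour of leaving `callback` unused.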

Tested on the Zero, Zero 2 and 4+ with the ping, functional, bulk
and control vchiq_test tests, and with omxplayer

     [ref:dsb]: https://github.com/raspberrypi/linux/commit/35b7ebda57affcfd3616d39d5a727a4495b31123
     [ref:sems]: https://github.com/raspberrypi/linux/commit/24a4262afb10907fce3cdbc3ae336fcf4cdaece5
     [ref:sizes]: https://github.com/raspberrypi/linux/commit/e64568b8ea6c04e747e432c17ce2452652075216
     [ref:sizes2]: https://github.com/raspberrypi/linux/commit/f9bee6dd24addfa00c2c8d50c25b73efbfbb28ba
     [ref:deadcode]: https://github.com/raspberrypi/linux/commit/14f4d72fb799a9b3170a45ab80d4a3ddad541960
     [ref:sync]: https://github.com/raspberrypi/linux/commit/51c071265079319583e4c6e8c61e09660300d0bf
     [ref:longbulk]: https://github.com/raspberrypi/linux/commit/37f6f19a83722c9b866cecb5e455b2e16e5bbc6b
---
  sys/arm/broadcom/bcm2835/bcm2835_audio.c      | 248 +++++++--
  .../broadcom/bcm2835/vc_vchi_audioserv_defs.h |   8 +-
  sys/arm64/conf/GENERIC-VCHIQ                  |  23 +
  sys/contrib/vchiq/interface/compat/vchi_bsd.c |  12 +-
  .../interface/vchiq_arm/vchiq_2835_arm.c      | 159 +++++-
  .../vchiq/interface/vchiq_arm/vchiq_arm.c     | 494 +++++++++++++-----
  .../vchiq/interface/vchiq_arm/vchiq_core.c    | 285 +++++-----
  .../vchiq/interface/vchiq_arm/vchiq_core.h    |  11 +-
  .../vchiq/interface/vchiq_arm/vchiq_ioctl.h   | 121 +++++
  .../interface/vchiq_arm/vchiq_kern_lib.c      |   8 +-
  .../vchiq/interface/vchiq_arm/vchiq_kmod.c    |  76 ++-
  .../interface/vchiq_arm/vchiq_pagelist.h      |   8 +-
  .../vchiq/interface/vchiq_arm/vchiq_shim.c    |   4 +-
  13 files changed, 1100 insertions(+), 357 deletions(-)
  create mode 100644 sys/arm64/conf/GENERIC-VCHIQ

diff --git a/sys/arm/broadcom/bcm2835/bcm2835_audio.c b/sys/arm/broadcom/bcm2835/bcm2835_audio.c
index 36b1dc86535b..8d978bc20f85 100644
--- a/sys/arm/broadcom/bcm2835/bcm2835_audio.c
+++ b/sys/arm/broadcom/bcm2835/bcm2835_audio.c
@@ -27,6 +27,10 @@
  #include "opt_snd.h"
  #endif
  
+/*
+    For the PRIu64 identifier
+*/
+#include <machine/_inttypes.h>
  #include <dev/sound/pcm/sound.h>
  #include <dev/sound/chip.h>
  
@@ -116,6 +120,12 @@ struct bcm2835_audio_chinfo {
  	uint64_t retrieved_samples;
  	uint64_t underruns;
  	int starved;
+	struct bcm_log_vars {
+		unsigned int bsize ;
+		int slept_for_lack_of_space ;
+	} log_vars;
+#define DEFAULT_LOG_VALUES \
+	((struct bcm_log_vars) { .bsize = 0 , .slept_for_lack_of_space = 0 })
  };
  
  struct bcm2835_audio_info {
@@ -135,6 +145,7 @@ struct bcm2835_audio_info {
  
  	uint32_t flags_pending;
  
+	int verbose_trace;
  	/* Worker thread state */
  	int worker_state;
  };
@@ -143,6 +154,35 @@ struct bcm2835_audio_info {
  #define BCM2835_AUDIO_LOCKED(sc)	mtx_assert(&(sc)->lock, MA_OWNED)
  #define BCM2835_AUDIO_UNLOCK(sc)	mtx_unlock(&(sc)->lock)
  
+/* things that really have to be reported */
+#define REPORT_ERROR(sc,...) \
+	do{ device_printf((sc)->dev,__VA_ARGS__); }while(0)
+/* things that shouldn't clobber the output */
+#define INFORM_THAT(sc,...) \
+	do { \
+		if(sc->verbose_trace>0){ \
+			device_printf((sc)->dev,__VA_ARGS__); \
+		} \
+	}while(0)
+/* things that might clobber the output */
+#define WARN_THAT(sc,...) \
+	do { \
+		if(sc->verbose_trace>1){ \
+			device_printf((sc)->dev,__VA_ARGS__); \
+		} \
+	}while(0)
+/* things that are expected to (will) clobber the output */
+#define TRACE(sc,...) \
+	do { \
+		if(sc->verbose_trace>2){ \
+			device_printf((sc)->dev,__VA_ARGS__); \
+		} \
+	}while(0)
+
+/* Useful for circular buffer calcs */
+#define MOD_DIFF(front,rear,mod) (((mod) + (front) - (rear)) % (mod))
+
+
  static const char *
  dest_description(uint32_t dest)
  {
@@ -216,10 +256,21 @@ bcm2835_audio_callback(void *param, const VCHI_CALLBACK_REASON_T reason, void *m
  			    m.type);
  		}
  	} else if (m.type == VC_AUDIO_MSG_TYPE_COMPLETE) {
-		struct bcm2835_audio_chinfo *ch = m.u.complete.cookie;
+	  unsigned int signaled = 0;
+		struct bcm2835_audio_chinfo *ch ;
+#if defined(__aarch64__)
+		ch = (void *) ((((size_t)m.u.complete.callback) << 32)
+				| ((size_t)m.u.complete.cookie));
+#else
+		ch = (void *) (m.u.complete.cookie);
+#endif
+
  
  		int count = m.u.complete.count & 0xffff;
  		int perr = (m.u.complete.count & (1U << 30)) != 0;
+
+		TRACE(sc,"in:: count:0x%x perr:%d\n",m.u.complete.count,perr);
+
  		ch->callbacks++;
  		if (perr)
  			ch->underruns++;
@@ -239,18 +290,41 @@ bcm2835_audio_callback(void *param, const VCHI_CALLBACK_REASON_T reason, void *m
  					device_printf(sc->dev, "available_space == %d, count = %d, perr=%d\n",
  					    ch->available_space, count, perr);
  					device_printf(sc->dev,
-					    "retrieved_samples = %lld, submitted_samples = %lld\n",
+					    "retrieved_samples = %"PRIu64", submitted_samples = %"PRIu64"\n",
  					    ch->retrieved_samples, ch->submitted_samples);
  				}
-				ch->available_space += count;
-				ch->retrieved_samples += count;
  			}
-			if (perr || (ch->available_space >= VCHIQ_AUDIO_PACKET_SIZE))
-				cv_signal(&sc->worker_cv);
+			ch->available_space += count;
+			ch->retrieved_samples += count;
+			/*
+			 *  XXXMDC
+			 *  Experimental: if VC says it's empty, believe it
+			 *  Has to come after the usual adjustments
+			 */
+			if(perr){
+				ch->available_space = VCHIQ_AUDIO_BUFFER_SIZE;
+				perr = ch->retrieved_samples; // shd be != 0
+			}
+
+			if ((ch->available_space >= 1*VCHIQ_AUDIO_PACKET_SIZE)){
+					cv_signal(&sc->worker_cv);
+				signaled = 1;
+			}
  		}
  		BCM2835_AUDIO_UNLOCK(sc);
+		if(perr){
+			WARN_THAT(sc,
+				"VC starved; reported %u for a total of %u\n"
+				"worker %s\n" ,
+			 	count,perr,
+				(signaled ? "signaled": "not signaled")
+			);
+		}
  	} else
-		printf("%s: unknown m.type: %d\n", __func__, m.type);
+		WARN_THAT(sc,
+			"%s: unknown m.type: %d\n",
+			__func__, m.type
+		);
  }
  
  /* VCHIQ stuff */
@@ -262,13 +336,13 @@ bcm2835_audio_init(struct bcm2835_audio_info *sc)
  	/* Initialize and create a VCHI connection */
  	status = vchi_initialise(&sc->vchi_instance);
  	if (status != 0) {
-		printf("vchi_initialise failed: %d\n", status);
+		REPORT_ERROR(sc,"vchi_initialise failed: %d\n", status);
  		return;
  	}
  
  	status = vchi_connect(NULL, 0, sc->vchi_instance);
  	if (status != 0) {
-		printf("vchi_connect failed: %d\n", status);
+		REPORT_ERROR(sc,"vchi_connect failed: %d\n", status);
  		return;
  	}
  
@@ -300,7 +374,7 @@ bcm2835_audio_release(struct bcm2835_audio_info *sc)
  	if (sc->vchi_handle != VCHIQ_SERVICE_HANDLE_INVALID) {
  		success = vchi_service_close(sc->vchi_handle);
  		if (success != 0)
-			printf("vchi_service_close failed: %d\n", success);
+			REPORT_ERROR(sc,"vchi_service_close failed: %d\n", success);
  		vchi_service_release(sc->vchi_handle);
  		sc->vchi_handle = VCHIQ_SERVICE_HANDLE_INVALID;
  	}
@@ -330,7 +404,10 @@ bcm2835_audio_start(struct bcm2835_audio_chinfo *ch)
  		    &m, sizeof m, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  
  		if (ret != 0)
-			printf("%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
+			REPORT_ERROR(sc,
+				"%s: vchi_msg_queue failed (err %d)\n",
+				__func__, ret
+			);
  	}
  }
  
@@ -345,11 +422,15 @@ bcm2835_audio_stop(struct bcm2835_audio_chinfo *ch)
  		m.type = VC_AUDIO_MSG_TYPE_STOP;
  		m.u.stop.draining = 0;
  
+		INFORM_THAT(sc,"sending stop\n");
  		ret = vchi_msg_queue(sc->vchi_handle,
  		    &m, sizeof m, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  
  		if (ret != 0)
-			printf("%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
+			REPORT_ERROR(sc,
+				"%s: vchi_msg_queue failed (err %d)\n",
+				__func__, ret
+			);
  	}
  }
  
@@ -365,7 +446,10 @@ bcm2835_audio_open(struct bcm2835_audio_info *sc)
  		    &m, sizeof m, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  
  		if (ret != 0)
-			printf("%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
+			REPORT_ERROR(sc,
+				"%s: vchi_msg_queue failed (err %d)\n",
+				__func__, ret
+			);
  	}
  }
  
@@ -387,7 +471,10 @@ bcm2835_audio_update_controls(struct bcm2835_audio_info *sc, uint32_t volume, ui
  		    &m, sizeof m, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  
  		if (ret != 0)
-			printf("%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
+			REPORT_ERROR(sc,
+				"%s: vchi_msg_queue failed (err %d)\n",
+				__func__, ret
+			);
  	}
  }
  
@@ -407,7 +494,10 @@ bcm2835_audio_update_params(struct bcm2835_audio_info *sc, uint32_t fmt, uint32_
  		    &m, sizeof m, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  
  		if (ret != 0)
-			printf("%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
+			REPORT_ERROR(sc,
+				"%s: vchi_msg_queue failed (err %d)\n",
+				__func__, ret
+			);
  	}
  }
  
@@ -415,18 +505,25 @@ static bool
  bcm2835_audio_buffer_should_sleep(struct bcm2835_audio_chinfo *ch)
  {
  
+	ch->log_vars.slept_for_lack_of_space = 0;
  	if (ch->playback_state != PLAYBACK_PLAYING)
  		return (true);
  
  	/* Not enough data */
-	if (sndbuf_getready(ch->buffer) < VCHIQ_AUDIO_PACKET_SIZE) {
-		printf("starve\n");
+	/* XXXMDC Take unsubmitted stuff into account */
+	if (sndbuf_getready(ch->buffer)
+			- MOD_DIFF(
+				ch->unsubmittedptr,
+				sndbuf_getreadyptr(ch->buffer),
+				sndbuf_getsize(ch->buffer)
+			) < VCHIQ_AUDIO_PACKET_SIZE) {
  		ch->starved++;
  		return (true);
  	}
  
  	/* Not enough free space */
  	if (ch->available_space < VCHIQ_AUDIO_PACKET_SIZE) {
+		ch->log_vars.slept_for_lack_of_space = 1;
  		return (true);
  	}
  
@@ -447,22 +544,27 @@ bcm2835_audio_write_samples(struct bcm2835_audio_chinfo *ch, void *buf, uint32_t
  	m.type = VC_AUDIO_MSG_TYPE_WRITE;
  	m.u.write.count = count;
  	m.u.write.max_packet = VCHIQ_AUDIO_PACKET_SIZE;
-	m.u.write.callback = NULL;
-	m.u.write.cookie = ch;
+#if defined(__aarch64__)
+	m.u.write.callback = (uint32_t)(((size_t) ch) >> 32) & 0xffffffff;
+	m.u.write.cookie = (uint32_t)(((size_t) ch) & 0xffffffff);
+#else
+	m.u.write.callback = (uint32_t) NULL;
+	m.u.write.cookie = (uint32_t) ch;
+#endif
  	m.u.write.silence = 0;
  
  	ret = vchi_msg_queue(sc->vchi_handle,
  	    &m, sizeof m, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  
  	if (ret != 0)
-		printf("%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
+		REPORT_ERROR(sc,"%s: vchi_msg_queue failed (err %d)\n", __func__, ret);
  
  	while (count > 0) {
  		int bytes = MIN((int)m.u.write.max_packet, (int)count);
  		ret = vchi_msg_queue(sc->vchi_handle,
  		    buf, bytes, VCHI_FLAGS_BLOCK_UNTIL_QUEUED, NULL);
  		if (ret != 0)
-			printf("%s: vchi_msg_queue failed: %d\n",
+			REPORT_ERROR(sc,"%s: vchi_msg_queue failed: %d\n",
  			    __func__, ret);
  		buf = (char *)buf + bytes;
  		count -= bytes;
@@ -494,6 +596,10 @@ bcm2835_audio_worker(void *data)
  		while ((sc->flags_pending == 0) &&
  		    bcm2835_audio_buffer_should_sleep(ch)) {
  			cv_wait_sig(&sc->worker_cv, &sc->lock);
+			if((sc-> flags_pending == 0)
+			    && ch->log_vars.slept_for_lack_of_space) {
+				TRACE(sc,"slept for lack of space\n");
+			}
  		}
  		flags = sc->flags_pending;
  		/* Clear pending flags */
@@ -520,16 +626,32 @@ bcm2835_audio_worker(void *data)
  			BCM2835_AUDIO_LOCK(sc);
  			bcm2835_audio_reset_channel(&sc->pch);
  			ch->playback_state = PLAYBACK_IDLE;
+			long sub_total = ch->submitted_samples;
+			long retd = ch->retrieved_samples;
  			BCM2835_AUDIO_UNLOCK(sc);
+			INFORM_THAT(sc,
+				"stopped audio. submitted a total of %lu "
+				"having been acked %lu\n",
+				sub_total, retd
+			);
  			continue;
  		}
  
  		/* Requested to start playback */
  		if ((flags & AUDIO_PLAY) &&
  		    (ch->playback_state == PLAYBACK_IDLE)) {
+			INFORM_THAT(sc,
+				"starting audio\n"
+			);
+			unsigned int bsize = sndbuf_getsize(ch->buffer);
  			BCM2835_AUDIO_LOCK(sc);
  			ch->playback_state = PLAYBACK_PLAYING;
+			ch->log_vars.bsize = bsize;
  			BCM2835_AUDIO_UNLOCK(sc);
+			INFORM_THAT(sc,
+				"buffer size is %u\n",
+				bsize
+			);	
  			bcm2835_audio_start(ch);
  		}
  
@@ -538,20 +660,69 @@ bcm2835_audio_worker(void *data)
  
  		if (sndbuf_getready(ch->buffer) == 0)
  			continue;
-
-		count = sndbuf_getready(ch->buffer);
+		uint32_t i_count;
+
+		/* XXXMDC Take unsubmitted stuff into account */
+		count
+		= i_count
+		= sndbuf_getready(ch->buffer)
+			- MOD_DIFF(
+				ch->unsubmittedptr,
+				sndbuf_getreadyptr(ch->buffer),
+				sndbuf_getsize(ch->buffer)
+			);
  		size = sndbuf_getsize(ch->buffer);
-		readyptr = sndbuf_getreadyptr(ch->buffer);
+		readyptr = ch->unsubmittedptr;
  
+		int size_changed=0;
+		unsigned int available;
  		BCM2835_AUDIO_LOCK(sc);
-		if (readyptr + count > size)
+		if(size != ch->log_vars.bsize){
+			ch->log_vars.bsize = size;
+			size_changed = 1;
+		}
+		available = ch->available_space;
+		/*
+		 *  XXXMDC
+		 *
+		 *  On arm64, got into situations where 
+		 *  readyptr was less than a packet away
+		 *  from the end of the buffer, which led
+		 *  to count being set to 0 and, inexorably, starvation.
+		 *  Code below tries to take that into account.
+		 *  The problem might have been fixed with some of the
+		 *  other changes that were made in the meantime,
+		 *  but for now this works fine.
+		 */
+		if (readyptr + count > size){
  			count = size - readyptr;
-		count = min(count, ch->available_space);
-		count -= (count % VCHIQ_AUDIO_PACKET_SIZE);
+		}
+		if(count > ch->available_space){
+			count = ch->available_space;
+			count -= (count % VCHIQ_AUDIO_PACKET_SIZE);
+		}else if (count > VCHIQ_AUDIO_PACKET_SIZE){
+			count -= (count % VCHIQ_AUDIO_PACKET_SIZE);
+		}else if (size > count + readyptr) {
+			count = 0;
+		}
  		BCM2835_AUDIO_UNLOCK(sc);
-
-		if (count < VCHIQ_AUDIO_PACKET_SIZE)
+	if(count % VCHIQ_AUDIO_PACKET_SIZE != 0){
+	  WARN_THAT(sc,
+	 	"count: %u  initial count: %u  "
+	        "size: %u  readyptr: %u  available: %u"
+		"\n",
+		count,i_count,size,readyptr, available);
+	}
+	if(size_changed) INFORM_THAT(sc,"bsize changed to %u\n",size);
+		
+		if (count == 0){
+			WARN_THAT(sc,
+				"not enough room for a packet: count %d,"
+				" i_count %d, rptr %d, size %d\n",
+				count, i_count, readyptr, size
+			);
  			continue;
+		}
  
  		buf = (uint8_t*)sndbuf_getbuf(ch->buffer) + readyptr;
  
@@ -560,8 +731,17 @@ bcm2835_audio_worker(void *data)
  		ch->unsubmittedptr = (ch->unsubmittedptr + count) % sndbuf_getsize(ch->buffer);
  		ch->available_space -= count;
  		ch->submitted_samples += count;
+		long sub = count;
+		long sub_total = ch->submitted_samples;
+		long retd = ch->retrieved_samples;
  		KASSERT(ch->available_space >= 0, ("ch->available_space == %d\n", ch->available_space));
  		BCM2835_AUDIO_UNLOCK(sc);
+
+	TRACE(sc,
+		"submitted %lu for a total of %lu having been acked %lu; "
+		"rptr %d, had %u available \n",
+		sub, sub_total, retd, readyptr, available);
+
  	}
  
  	BCM2835_AUDIO_LOCK(sc);
@@ -580,7 +760,9 @@ bcm2835_audio_create_worker(struct bcm2835_audio_info *sc)
  	sc->worker_state = WORKER_RUNNING;
  	if (kproc_create(bcm2835_audio_worker, (void*)sc, &newp, 0, 0,
  	    "bcm2835_audio_worker") != 0) {
-		printf("failed to create bcm2835_audio_worker\n");
+		REPORT_ERROR(sc,
+			"failed to create bcm2835_audio_worker\n"
+		);
  	}
  }
  
@@ -613,6 +795,8 @@ bcmchan_init(kobj_t obj, void *devinfo, struct snd_dbuf *b, struct pcm_channel *
  		return NULL;
  	}
  
+	ch->log_vars = DEFAULT_LOG_VALUES;
+
  	BCM2835_AUDIO_LOCK(sc);
  	bcm2835_worker_update_params(sc);
  	BCM2835_AUDIO_UNLOCK(sc);
@@ -833,6 +1017,9 @@ vchi_audio_sysctl_init(struct bcm2835_audio_info *sc)
  	SYSCTL_ADD_INT(ctx, tree, OID_AUTO, "starved",
  			CTLFLAG_RD, &sc->pch.starved,
  			sc->pch.starved, "number of starved conditions");
+	SYSCTL_ADD_INT(ctx, tree, OID_AUTO, "trace",
+			CTLFLAG_RW, &sc->verbose_trace,
+			sc->verbose_trace, "enable tracing of transfers");
  }
  
  static void
@@ -864,6 +1051,7 @@ bcm2835_audio_delayed_init(void *xsc)
  	bcm2835_audio_open(sc);
  	sc->volume = 75;
  	sc->dest = DEST_AUTO;
+	sc->verbose_trace = 0;
  
      	if (mixer_init(sc->dev, &bcmmixer_class, sc)) {
  		device_printf(sc->dev, "mixer_init failed\n");
diff --git a/sys/arm/broadcom/bcm2835/vc_vchi_audioserv_defs.h b/sys/arm/broadcom/bcm2835/vc_vchi_audioserv_defs.h
index 143c54385916..04292df1c261 100644
--- a/sys/arm/broadcom/bcm2835/vc_vchi_audioserv_defs.h
+++ b/sys/arm/broadcom/bcm2835/vc_vchi_audioserv_defs.h
@@ -114,8 +114,8 @@ typedef struct
  typedef struct
  {
  	uint32_t count; /* in bytes */
-	void *callback;
-	void *cookie;
+	uint32_t callback;
+	uint32_t cookie;
  	uint16_t silence;
  	uint16_t max_packet;
  } VC_AUDIO_WRITE_T;
@@ -131,8 +131,8 @@ typedef struct
  typedef struct
  {
  	int32_t count;  /* Success value */
-	void *callback;
-	void *cookie;
+	uint32_t callback;
+	uint32_t cookie;
  } VC_AUDIO_COMPLETE_T;
  
  /* Message header for all messages in HOST->VC direction */
diff --git a/sys/arm64/conf/GENERIC-VCHIQ b/sys/arm64/conf/GENERIC-VCHIQ
new file mode 100644
index 000000000000..422ed425894c
--- /dev/null
+++ b/sys/arm64/conf/GENERIC-VCHIQ
@@ -0,0 +1,23 @@
+#
+# GENERIC-VCHIQ
+#
+# Custom kernel for arm64 plus VCHIQ
+#
+# $FreeBSD$
+
+#NO_UNIVERSE
+
+include		GENERIC
+ident		GENERIC-VCHIQ
+
+device vchiq
+
+# If you want to have any chance of compiling this in a RPI Zero 2
+# uncomment the stuff below
+
+# nomakeoptions DEBUG
+# nomakeoptions WITH_CTF
+# nooptions DDB_CTF
+# makeoptions MALLOC_PRODUCTION=1
+
+
diff --git a/sys/contrib/vchiq/interface/compat/vchi_bsd.c b/sys/contrib/vchiq/interface/compat/vchi_bsd.c
index f831880f5e13..e039992036aa 100644
--- a/sys/contrib/vchiq/interface/compat/vchi_bsd.c
+++ b/sys/contrib/vchiq/interface/compat/vchi_bsd.c
@@ -341,7 +341,6 @@ down_interruptible(struct semaphore *s)
  	int ret ;
  
  	ret = 0;
-
  	mtx_lock(&s->mtx);
  
  	while (s->value == 0) {
@@ -349,13 +348,11 @@ down_interruptible(struct semaphore *s)
  		ret = cv_wait_sig(&s->cv, &s->mtx);
  		s->waiters--;
  
-		if (ret == EINTR) {
+		/* XXXMDC As per its semaphore.c, linux can only return EINTR */
+		if (ret) {
  			mtx_unlock(&s->mtx);
-			return (-EINTR);
+			return -EINTR;
  		}
-
-		if (ret == ERESTART)
-			continue;
  	}
  
  	s->value--;
@@ -442,8 +439,7 @@ flush_signals(VCHIQ_THREAD_T thr)
  int
  fatal_signal_pending(VCHIQ_THREAD_T thr)
  {
-	printf("Implement ME: %s\n", __func__);
-	return (0);
+	return (curproc_sigkilled());
  }
  
  /*
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_2835_arm.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_2835_arm.c
index 279aacd0880a..7a48ad9d21b6 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_2835_arm.c
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_2835_arm.c
@@ -65,9 +65,24 @@ MALLOC_DEFINE(M_VCPAGELIST, "vcpagelist", "VideoCore pagelist memory");
  
  #define MAX_FRAGMENTS (VCHIQ_NUM_CURRENT_BULKS * 2)
  
+/*
+ *  XXXMDC
+ * Do this less ad-hoc-y -- e.g.
+ * https://github.com/raspberrypi/linux/commit/c683db8860a80562a2bb5b451d77b3e471d24f36
+ */
+#if defined(__aarch64__)
+int g_cache_line_size = 64;
+#else
  int g_cache_line_size = 32;
+#endif
  static int g_fragment_size;
  
+unsigned int g_long_bulk_space = 0;
+#define VM_PAGE_TO_VC_BULK_PAGE(x) (\
+	g_long_bulk_space ? VM_PAGE_TO_PHYS(x)\
+		 : PHYS_TO_VCBUS(VM_PAGE_TO_PHYS(x))\
+)
+
  typedef struct vchiq_2835_state_struct {
     int inited;
     VCHIQ_ARM_STATE_T arm_state;
@@ -113,6 +128,59 @@ vchiq_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int err)
  	*addr = PHYS_TO_VCBUS(segs[0].ds_addr);
  }
  
+#if defined(__aarch64__) /* See comment in free_pagelist */
+static int
+invalidate_cachelines_in_range_of_ppage(
+	vm_page_t p,
+	size_t offset,
+	size_t count
+)
+{
+	if(offset + count > PAGE_SIZE){ return EINVAL; }
+        uint8_t *dst = (uint8_t*)pmap_quick_enter_page(p);
+        if (!dst){
+                return ENOMEM;
+	}
+	cpu_dcache_inv_range((vm_offset_t)dst + offset, count);
+	pmap_quick_remove_page((vm_offset_t)dst);
+	return 0;
+}
+
+/* XXXMDC bulk instead of loading and invalidating single pages? */
+static void
+invalidate_cachelines_in_range_of_ppage_seq(
+	vm_page_t *p,
+	size_t start,
+	size_t count
+)
+{
+	if(start >= PAGE_SIZE) goto invalid_input;	
+
+#define _NEXT_AT(x,_m) (((x)+((_m)-1)) & ~((_m)-1))   /* for power of two m */
+	size_t offset = _NEXT_AT(start,g_cache_line_size);
+#undef _NEXT_AT
+	count = (offset < start + count) ? count - (offset - start) : 0;
+	offset = offset & (PAGE_SIZE - 1);
+	for(
+		size_t done = 0;
+		count > done;
+		p++, done += PAGE_SIZE - offset, offset = 0
+	){
+		size_t in_page = PAGE_SIZE - offset;
+		size_t todo = (count-done > in_page) ? in_page : count-done;
+		int e =	invalidate_cachelines_in_range_of_ppage(*p, offset, todo);
+		if(e != 0)
+			goto problem_in_loop;
+	}
+	return;
+
+problem_in_loop:
+invalid_input:
+	WARN_ON(1);
+	return;
+}
+#endif
+
  static int
  copyout_page(vm_page_t p, size_t offset, void *kaddr, size_t size)
  {
@@ -171,7 +239,7 @@ vchiq_platform_init(VCHIQ_STATE_T *state)
  		goto failed_load;
  	}
  
-	WARN_ON(((int)g_slot_mem & (PAGE_SIZE - 1)) != 0);
+	WARN_ON(((size_t)g_slot_mem & (PAGE_SIZE - 1)) != 0);
  
  	vchiq_slot_zero = vchiq_init_slots(g_slot_mem, g_slot_mem_size);
  	if (!vchiq_slot_zero) {
@@ -204,8 +272,8 @@ vchiq_platform_init(VCHIQ_STATE_T *state)
  	bcm_mbox_write(BCM2835_MBOX_CHAN_VCHIQ, (unsigned int)g_slot_phys);
  
  	vchiq_log_info(vchiq_arm_log_level,
-		"vchiq_init - done (slots %x, phys %x)",
-		(unsigned int)vchiq_slot_zero, g_slot_phys);
+		"vchiq_init - done (slots %zx, phys %zx)",
+		(size_t)vchiq_slot_zero, g_slot_phys);
  
     vchiq_call_connected_callbacks();
  
@@ -393,13 +461,14 @@ pagelist_page_free(vm_page_t pp)
  ** from increased speed as a result.
  */
  
+
  static int
  create_pagelist(char __user *buf, size_t count, unsigned short type,
  	struct proc *p, BULKINFO_T *bi)
  {
  	PAGELIST_T *pagelist;
  	vm_page_t* pages;
-	unsigned long *addrs;
+	uint32_t *addrs;
  	unsigned int num_pages, i;
  	vm_offset_t offset;
  	int pagelist_size;
@@ -436,7 +505,7 @@ create_pagelist(char __user *buf, size_t count, unsigned short type,
  
  	err = bus_dmamem_alloc(bi->pagelist_dma_tag, (void **)&pagelist,
  	    BUS_DMA_COHERENT | BUS_DMA_WAITOK, &bi->pagelist_dma_map);
-	if (err) {
+	if (err || !pagelist) {
  		vchiq_log_error(vchiq_core_log_level, "Unable to allocate pagelist memory");
  		err = -ENOMEM;
  		goto failed_alloc;
@@ -449,14 +518,12 @@ create_pagelist(char __user *buf, size_t count, unsigned short type,
  	if (err) {
  		vchiq_log_error(vchiq_core_log_level, "cannot load DMA map for pagelist memory");
  		err = -ENOMEM;
+		bi->pagelist = pagelist;
  		goto failed_load;
  	}
  
  	vchiq_log_trace(vchiq_arm_log_level,
-		"create_pagelist - %x (%d bytes @%p)", (unsigned int)pagelist, count, buf);
-
-	if (!pagelist)
-		return -ENOMEM;
+		"create_pagelist - %zx (%zu bytes @%p)", (size_t)pagelist, count, buf);
  
  	addrs = pagelist->addrs;
  	pages = (vm_page_t*)(addrs + num_pages);
@@ -467,8 +534,9 @@ create_pagelist(char __user *buf, size_t count, unsigned short type,
  
  	if (actual_pages != num_pages) {
  		vm_page_unhold_pages(pages, actual_pages);
-		free(pagelist, M_VCPAGELIST);
-		return (-ENOMEM);
+		err = -ENOMEM;
+		bi->pagelist = pagelist;
+		goto failed_hold;
  	}
  
  	pagelist->length = count;
@@ -477,27 +545,28 @@ create_pagelist(char __user *buf, size_t count, unsigned short type,
  
  	/* Group the pages into runs of contiguous pages */
  
-	base_addr = (void *)PHYS_TO_VCBUS(VM_PAGE_TO_PHYS(pages[0]));
+	size_t run_ceil = g_long_bulk_space ? 0x100 : PAGE_SIZE;
+	unsigned int pg_addr_rshift = g_long_bulk_space ? 4 : 0;
+	base_addr = (void *) VM_PAGE_TO_VC_BULK_PAGE(pages[0]);
  	next_addr = base_addr + PAGE_SIZE;
  	addridx = 0;
  	run = 0;
-
+#define _PG_BLOCK(base,run) \
+		((((size_t) (base)) >> pg_addr_rshift) & ~(run_ceil-1)) + (run)
  	for (i = 1; i < num_pages; i++) {
-		addr = (void *)PHYS_TO_VCBUS(VM_PAGE_TO_PHYS(pages[i]));
-		if ((addr == next_addr) && (run < (PAGE_SIZE - 1))) {
+		addr = (void *)VM_PAGE_TO_VC_BULK_PAGE(pages[i]);
+		if ((addr == next_addr) && (run < run_ceil - 1)) {
  			next_addr += PAGE_SIZE;
  			run++;
  		} else {
-			addrs[addridx] = (unsigned long)base_addr + run;
-			addridx++;
+			addrs[addridx++] = (uint32_t) _PG_BLOCK(base_addr,run);
  			base_addr = addr;
  			next_addr = addr + PAGE_SIZE;
  			run = 0;
  		}
  	}
-
-	addrs[addridx] = (unsigned long)base_addr + run;
-	addridx++;
+	addrs[addridx++] = _PG_BLOCK(base_addr, run);
+#undef _PG_BLOCK
  
  	/* Partial cache lines (fragments) require special measures */
  	if ((type == PAGELIST_READ) &&
@@ -519,12 +588,24 @@ create_pagelist(char __user *buf, size_t count, unsigned short type,
  		g_free_fragments = *(char **) g_free_fragments;
  		up(&g_free_fragments_mutex);
  		pagelist->type =
-			 PAGELIST_READ_WITH_FRAGMENTS + 
-			 (fragments - g_fragments_base)/g_fragment_size;
+			 PAGELIST_READ_WITH_FRAGMENTS 
+			 + (fragments - g_fragments_base)/g_fragment_size;
+#if defined(__aarch64__)
+		 bus_dmamap_sync(bcm_slots_dma_tag, bcm_slots_dma_map, BUS_DMASYNC_PREREAD);
+#endif
  	}
  
+#if defined(__aarch64__)
+	if(type == PAGELIST_READ){ 
+		cpu_dcache_wbinv_range((vm_offset_t)buf,count);
+	}else{
+		cpu_dcache_wb_range((vm_offset_t)buf,count);
+	}
+	dsb(sy);
+#else
  	pa = pmap_extract(PCPU_GET(curpmap), (vm_offset_t)buf);
  	dcache_wbinv_poc((vm_offset_t)buf, pa, count);
+#endif
  
  	bus_dmamap_sync(bi->pagelist_dma_tag, bi->pagelist_dma_map, BUS_DMASYNC_PREWRITE);
  
@@ -532,6 +613,8 @@ create_pagelist(char __user *buf, size_t count, unsigned short type,
  
  	return 0;
  
+failed_hold:
+	bus_dmamap_unload(bi->pagelist_dma_tag,bi->pagelist_dma_map);
  failed_load:
  	bus_dmamem_free(bi->pagelist_dma_tag, bi->pagelist, bi->pagelist_dma_map);
  failed_alloc:
@@ -550,7 +633,7 @@ free_pagelist(BULKINFO_T *bi, int actual)
  	pagelist = bi->pagelist;
  
  	vchiq_log_trace(vchiq_arm_log_level,
-		"free_pagelist - %x, %d (%lu bytes @%p)", (unsigned int)pagelist, actual, pagelist->length, bi->buf);
+		"free_pagelist - %zx, %d (%u bytes @%p)", (size_t)pagelist, actual, pagelist->length, bi->buf);
  
  	num_pages =
  		(pagelist->length + pagelist->offset + PAGE_SIZE - 1) /
@@ -558,6 +641,27 @@ free_pagelist(BULKINFO_T *bi, int actual)
  
  	pages = (vm_page_t*)(pagelist->addrs + num_pages);
  
+#if defined(__aarch64__)
+	/*
+         * On arm64, even if the user keeps their end of the bargain
+	 * -- do NOT touch the buffers sent to VC -- but reads around the
+	 * pagelist after the invalidation above, the arm might preemptively
+	 * load (and validate) cache lines for areas inside the page list,
+	 * so we must invalidate them again.
+	 *
+	 * The functional test does it and without this it doesn't pass.
+	 *
+	 * XXXMDC might it be enough to invalidate a couple of pages at
+	 * the ends of the page list?
+	 */
+	if(pagelist->type >= PAGELIST_READ && actual > 0)
+		invalidate_cachelines_in_range_of_ppage_seq(
+			pages,
+			pagelist->offset,
+			actual
+		);	
+#endif
+
  	/* Deal with any partial cache lines (fragments) */
  	if (pagelist->type >= PAGELIST_READ_WITH_FRAGMENTS) {
  		char *fragments = g_fragments_base +
@@ -594,13 +698,18 @@ free_pagelist(BULKINFO_T *bi, int actual)
  		up(&g_free_fragments_sema);
  	}
  
-	for (i = 0; i < num_pages; i++) {
-		if (pagelist->type != PAGELIST_WRITE) {
+	if (pagelist->type != PAGELIST_WRITE) {
+		for (i = 0; i < num_pages; i++) {
  			vm_page_dirty(pages[i]);
  			pagelist_page_free(pages[i]);
  		}
  	}
  
+#if defined(__aarch64__)
+	/* XXXMDC necessary? */
+	dsb(sy);
+#endif
+
  	bus_dmamap_unload(bi->pagelist_dma_tag, bi->pagelist_dma_map);
  	bus_dmamem_free(bi->pagelist_dma_tag, bi->pagelist, bi->pagelist_dma_map);
  	bus_dma_tag_destroy(bi->pagelist_dma_tag);
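The vchiq_arm.c changes that follow use the same pattern for every COMPAT_FREEBSD32 branch. The 32-bit argument block is copied in, and its fields are then widened (pointers via uintptr_t) into the native struct before the shared code path runs. The standalone sketch below shows that pattern in isolation. The struct names, field set and command numbers are hypothetical stand-ins, loosely modelled on VCHIQ_QUEUE_MESSAGE_T and VCHIQ_QUEUE_MESSAGE32_T. They are not the real definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical element type, standing in for VCHIQ_ELEMENT_T. */
struct element {
	const void *data;
	unsigned int size;
};

/* Native argument block: 'elements' is a full-width pointer. */
struct queue_msg {
	unsigned int handle;
	unsigned int count;
	const struct element *elements;
};

/* Layout a 32-bit process hands in: the user pointer is 32 bits wide. */
struct queue_msg32 {
	uint32_t handle;
	uint32_t count;
	uint32_t elements;
};

static_assert(sizeof(struct queue_msg32) == 12, "32-bit layout changed");

/*
 * Hypothetical command numbers; the real ones are built with
 * _IOW()/_IOWR(), which encode sizeof() of the argument type.
 */
#define QUEUE_MESSAGE	1u
#define QUEUE_MESSAGE32	2u

/* Widen a 32-bit block; the pointer is zero-extended via uintptr_t. */
static void
queue_msg_from32(struct queue_msg *dst, const void *arg)
{
	struct queue_msg32 a32;

	memcpy(&a32, arg, sizeof(a32));
	dst->handle = a32.handle;
	dst->count = a32.count;
	dst->elements = (const struct element *)(uintptr_t)a32.elements;
}

/* The "fork": pick the copy-in path from the command number. */
static int
queue_msg_copyin(unsigned long cmd, const void *arg, struct queue_msg *dst)
{
	switch (cmd) {
	case QUEUE_MESSAGE32:
		queue_msg_from32(dst, arg);
		return 0;
	case QUEUE_MESSAGE:
		memcpy(dst, arg, sizeof(*dst));
		return 0;
	default:
		return -1;
	}
}
```

A single switch can tell the two layouts apart because the ioctl command encodes the argument size. A 32-bit caller's request therefore arrives with a different cmd value than the native one. In the patch, the element array that 'elements' points to is then widened the same way, one VCHIQ_ELEMENT32_T at a time. This sketch leaves that step out.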
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_arm.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_arm.c
index 763cd9ce9417..bfcff315a543 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_arm.c
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_arm.c
@@ -386,7 +386,7 @@ static void
  user_service_free(void *userdata)
  {
  	USER_SERVICE_T *user_service = userdata;
-	
+
  	_sema_destroy(&user_service->insert_event);
  	_sema_destroy(&user_service->remove_event);
  
@@ -410,7 +410,7 @@ static void close_delivered(USER_SERVICE_T *user_service)
  
  		/* Wake the user-thread blocked in close_ or remove_service */
  		up(&user_service->close_event);
- 
+
  		user_service->close_pending = 0;
  	}
  }
@@ -442,12 +442,23 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  #define	_IOC_TYPE(x)	IOCGROUP(x)
  
  	vchiq_log_trace(vchiq_arm_log_level,
-		 "vchiq_ioctl - instance %x, cmd %s, arg %p",
-		(unsigned int)instance,
+		 "vchiq_ioctl - instance %zx, cmd %s, arg %p",
+		(size_t)instance,
  		((_IOC_TYPE(cmd) == VCHIQ_IOC_MAGIC) &&
  		(_IOC_NR(cmd) <= VCHIQ_IOC_MAX)) ?
  		ioctl_names[_IOC_NR(cmd)] : "<invalid>", arg);
  
+#ifdef COMPAT_FREEBSD32
+/* A fork in the road to freebsd32 compatibility */
+#define _CF32_FORK(compat_c,native_c)\
+	{ \
+		int _____dont_call_your_vars_this = 0;\
+		switch(cmd){_CF32_CASE {_____dont_call_your_vars_this = 1;} break;} \
+		if(_____dont_call_your_vars_this) { compat_c } else { native_c } \
+	}
+#else
+#define _CF32_FORK(compat_c,native_c) { native_c }
+#endif
  	switch (cmd) {
  	case VCHIQ_IOC_SHUTDOWN:
  		if (!instance->connected)
@@ -496,13 +507,32 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  				"vchiq: could not connect: %d", status);
  		break;
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+	case VCHIQ_IOC_CREATE_SERVICE32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_CREATE_SERVICE: {
  		VCHIQ_CREATE_SERVICE_T args;
  		USER_SERVICE_T *user_service = NULL;
  		void *userdata;
  		int srvstate;
  
+_CF32_FORK(
+		VCHIQ_CREATE_SERVICE32_T args32;
+		memcpy(&args32, (const void*)arg, sizeof(args32));
+		args.params.fourcc = args32.params.fourcc;
+/* XXXMDC not actually used? overwritten straight away */
+		args.params.callback = (VCHIQ_CALLBACK_T)(uintptr_t) args32.params.callback;
+		args.params.userdata = (void*)(uintptr_t)args32.params.userdata;
+		args.params.version = args32.params.version;
+		args.params.version_min = args32.params.version_min;
+		args.is_open = args32.is_open;
+		args.is_vchi = args32.is_vchi;
+		args.handle  = args32.handle;
+,
  		memcpy(&args, (const void*)arg, sizeof(args));
+)
  
  		user_service = kmalloc(sizeof(USER_SERVICE_T), GFP_KERNEL);
  		if (!user_service) {
@@ -558,15 +588,22 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  					break;
  				}
  			}
-
  #ifdef VCHIQ_IOCTL_DEBUG
  			printf("%s: [CREATE SERVICE] handle = %08x\n", __func__, service->handle);
  #endif
+_CF32_FORK(
+			memcpy((void *)
+				&(((VCHIQ_CREATE_SERVICE32_T*)
+					arg)->handle),
+				(const void *)&service->handle,
+				sizeof(service->handle));
+,
  			memcpy((void *)
  				&(((VCHIQ_CREATE_SERVICE_T*)
  					arg)->handle),
  				(const void *)&service->handle,
  				sizeof(service->handle));
+)
  
  			service = NULL;
  		} else {
@@ -574,6 +611,7 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			kfree(user_service);
  		}
  	} break;
+#undef _CF32_CASE
  
  	case VCHIQ_IOC_CLOSE_SERVICE: {
  		VCHIQ_SERVICE_HANDLE_T handle;
@@ -673,9 +711,22 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			ret = -EINVAL;
  	} break;
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+		case VCHIQ_IOC_QUEUE_MESSAGE32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_QUEUE_MESSAGE: {
  		VCHIQ_QUEUE_MESSAGE_T args;
+_CF32_FORK(
+		VCHIQ_QUEUE_MESSAGE32_T args32;
+		memcpy(&args32, (const void*)arg, sizeof(args32));
+		args.handle = args32.handle;
+		args.count = args32.count;
+		args.elements = (VCHIQ_ELEMENT_T *)(uintptr_t)args32.elements;
+,
  		memcpy(&args, (const void*)arg, sizeof(args));
+)
  
  #ifdef VCHIQ_IOCTL_DEBUG
  		printf("%s: [QUEUE MESSAGE] handle = %08x\n", __func__, args.handle);
@@ -686,8 +737,22 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  		if ((service != NULL) && (args.count <= MAX_ELEMENTS)) {
  			/* Copy elements into kernel space */
  			VCHIQ_ELEMENT_T elements[MAX_ELEMENTS];
-			if (copy_from_user(elements, args.elements,
-				args.count * sizeof(VCHIQ_ELEMENT_T)) == 0)
+			long cp_ret;
+_CF32_FORK(
+			VCHIQ_ELEMENT32_T elements32[MAX_ELEMENTS];
+			cp_ret = copy_from_user(elements32, args.elements,
+					args.count * sizeof(VCHIQ_ELEMENT32_T));
+			for(int i=0;cp_ret == 0 && i < args.count;++i){
+				elements[i].data =
+					(void *)(uintptr_t)elements32[i].data;
+				elements[i].size = elements32[i].size;
+			}
+
+,
+			cp_ret = copy_from_user(elements, args.elements,
+				args.count * sizeof(VCHIQ_ELEMENT_T));
+)
+			if (cp_ret == 0)
  				status = vchiq_queue_message
  					(args.handle,
  					elements, args.count);
@@ -697,16 +762,37 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			ret = -EINVAL;
  		}
  	} break;
+#undef _CF32_CASE
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+		case VCHIQ_IOC_QUEUE_BULK_TRANSMIT32: \
+		case VCHIQ_IOC_QUEUE_BULK_RECEIVE32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_QUEUE_BULK_TRANSMIT:
  	case VCHIQ_IOC_QUEUE_BULK_RECEIVE: {
  		VCHIQ_QUEUE_BULK_TRANSFER_T args;
+
  		struct bulk_waiter_node *waiter = NULL;
  		VCHIQ_BULK_DIR_T dir =
-			(cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT) ?
+			(cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT) || (cmd == VCHIQ_IOC_QUEUE_BULK_TRANSMIT32)?
  			VCHIQ_BULK_TRANSMIT : VCHIQ_BULK_RECEIVE;
  
+_CF32_FORK(
+		VCHIQ_QUEUE_BULK_TRANSFER32_T args32;
+		memcpy(&args32, (const void*)arg, sizeof(args32));
+		/* Parens needed: the initializer's commas would split the macro args */
+		args = ((VCHIQ_QUEUE_BULK_TRANSFER_T) {
+			.handle = args32.handle,
+			.data = (void *)(uintptr_t) args32.data,
+			.size = args32.size,
+			.userdata = (void *)(uintptr_t) args32.userdata,
+			.mode = args32.mode,
+		});
+,
  		memcpy(&args, (const void*)arg, sizeof(args));
+)
  
  		service = find_service_for_instance(instance, args.handle);
  		if (!service) {
@@ -734,7 +820,6 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  					list_del(pos);
  					break;
  				}
-
  			}
  			lmutex_unlock(&instance->bulk_waiter_list_mutex);
  			if (!waiter) {
@@ -745,10 +830,11 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  				break;
  			}
  			vchiq_log_info(vchiq_arm_log_level,
-				"found bulk_waiter %x for pid %d",
-				(unsigned int)waiter, current->p_pid);
+				"found bulk_waiter %zx for pid %d",
+				(size_t)waiter, current->p_pid);
  			args.userdata = &waiter->bulk_waiter;
  		}
+
  		status = vchiq_bulk_transfer
  			(args.handle,
  			 VCHI_MEM_HANDLE_INVALID,
@@ -776,17 +862,31 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			list_add(&waiter->list, &instance->bulk_waiter_list);
  			lmutex_unlock(&instance->bulk_waiter_list_mutex);
  			vchiq_log_info(vchiq_arm_log_level,
-				"saved bulk_waiter %x for pid %d",
-				(unsigned int)waiter, current->p_pid);
+				"saved bulk_waiter %zx for pid %d",
+				(size_t)waiter, current->p_pid);
  
+_CF32_FORK(
+			memcpy((void *)
+				&(((VCHIQ_QUEUE_BULK_TRANSFER32_T *)
+					arg)->mode),
+				(const void *)&mode_waiting,
+				sizeof(mode_waiting));
+,
  			memcpy((void *)
  				&(((VCHIQ_QUEUE_BULK_TRANSFER_T *)
  					arg)->mode),
  				(const void *)&mode_waiting,
  				sizeof(mode_waiting));
+)
  		}
  	} break;
+#undef _CF32_CASE
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+		case VCHIQ_IOC_AWAIT_COMPLETION32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_AWAIT_COMPLETION: {
  		VCHIQ_AWAIT_COMPLETION_T args;
  		int count = 0;
@@ -797,7 +897,17 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			break;
  		}
  
+_CF32_FORK(
+		VCHIQ_AWAIT_COMPLETION32_T args32;
+                memcpy(&args32, (const void*)arg, sizeof(args32));
+		args.count = args32.count;
+		args.buf = (VCHIQ_COMPLETION_DATA_T *)(uintptr_t)args32.buf;
+		args.msgbufsize = args32.msgbufsize;
+		args.msgbufcount = args32.msgbufcount;
+		args.msgbufs = (void **)(uintptr_t)args32.msgbufs;
+,
                  memcpy(&args, (const void*)arg, sizeof(args));
+)
  
  		lmutex_lock(&instance->completion_mutex);
  
@@ -860,9 +970,9 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  					if (args.msgbufsize < msglen) {
  						vchiq_log_error(
  							vchiq_arm_log_level,
-							"header %x: msgbufsize"
+							"header %zx: msgbufsize"
  							" %x < msglen %x",
-							(unsigned int)header,
+							(size_t)header,
  							args.msgbufsize,
  							msglen);
  						WARN(1, "invalid message "
@@ -877,6 +987,19 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  						break;
  					/* Get the pointer from user space */
  					msgbufcount--;
+_CF32_FORK(
+					uint32_t *msgbufs32 = (uint32_t *) args.msgbufs;
+					uint32_t msgbuf32 = 0;
+					if (copy_from_user(&msgbuf32,
+						(const uint32_t __user *)
+						&msgbufs32[msgbufcount],
+						sizeof(msgbuf32)) != 0) {
+						if (count == 0)
+							ret = -EFAULT;
+						break;
+					}
+					msgbuf = (void __user *)(uintptr_t)msgbuf32;
+,
  					if (copy_from_user(&msgbuf,
  						(const void __user *)
  						&args.msgbufs[msgbufcount],
@@ -885,6 +1008,7 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  							ret = -EFAULT;
  						break;
  					}
+)
  
  					/* Copy the message to user space */
  					if (copy_to_user(msgbuf, header,
@@ -908,7 +1032,26 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  					VCHIQ_SERVICE_CLOSED) &&
  					!instance->use_close_delivered)
  					unlock_service(service1);
-
+_CF32_FORK(
+				VCHIQ_COMPLETION_DATA32_T comp32 = {0};
+				comp32.reason
+				= (uint32_t)(size_t) completion->reason;
+				comp32.service_userdata
+				= (uint32_t)(size_t) completion->service_userdata;
+				comp32.bulk_userdata
+				= (uint32_t)(size_t) completion->bulk_userdata;
+				comp32.header = (uint32_t)(size_t)completion->header;
+
+				VCHIQ_COMPLETION_DATA32_T __user *buf_loc;
+				buf_loc = (VCHIQ_COMPLETION_DATA32_T __user *) args.buf;
+				buf_loc += count;
+				if (copy_to_user(buf_loc, &comp32,
+					sizeof(comp32)) != 0) {
+					if (ret == 0)
+						ret = -EFAULT;
+					break;
+				}
+,
  				if (copy_to_user((void __user *)(
  					(size_t)args.buf +
  					count * sizeof(VCHIQ_COMPLETION_DATA_T)),
@@ -918,6 +1061,7 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  							ret = -EFAULT;
  					break;
  				}
+)
  
  				/* Ensure that the above copy has completed
  				** before advancing the remove pointer. */
@@ -927,18 +1071,33 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			}
  
  			if (msgbufcount != args.msgbufcount) {
+_CF32_FORK(
+				memcpy(
+					(void __user *)
+					&((VCHIQ_AWAIT_COMPLETION32_T *)arg)->
+						msgbufcount,
+					&msgbufcount,
+					sizeof(msgbufcount));
+,
  				memcpy((void __user *)
  					&((VCHIQ_AWAIT_COMPLETION_T *)arg)->
  						msgbufcount,
  					&msgbufcount,
  					sizeof(msgbufcount));
+)
  			}
  
  			 if (count != args.count)
  			 {
+_CF32_FORK(
+				memcpy((void __user *)
+					&((VCHIQ_AWAIT_COMPLETION32_T *)arg)->count,
+					&count, sizeof(count));
+,
  				memcpy((void __user *)
  					&((VCHIQ_AWAIT_COMPLETION_T *)arg)->count,
  					&count, sizeof(count));
+)
  			}
  		}
  
@@ -947,9 +1106,9 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  
  		if ((ret == 0) && instance->closing)
  			ret = -ENOTCONN;
-		/* 
+		/*
  		 * XXXBSD: ioctl return codes are not negative as in linux, so
-		 * we can not indicate success with positive number of passed 
+		 * we can not indicate success with positive number of passed
  		 * messages
  		 */
  		if (ret > 0)
@@ -958,14 +1117,29 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  		lmutex_unlock(&instance->completion_mutex);
  		DEBUG_TRACE(AWAIT_COMPLETION_LINE);
  	} break;
+#undef _CF32_CASE
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+		case VCHIQ_IOC_DEQUEUE_MESSAGE32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_DEQUEUE_MESSAGE: {
  		VCHIQ_DEQUEUE_MESSAGE_T args;
  		USER_SERVICE_T *user_service;
  		VCHIQ_HEADER_T *header;
  
  		DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
+_CF32_FORK(
+		VCHIQ_DEQUEUE_MESSAGE32_T args32;
+		memcpy(&args32, (const void*)arg, sizeof(args32));
+		args.handle = args32.handle;
+		args.blocking = args32.blocking;
+		args.bufsize = args32.bufsize;
+		args.buf = (void *)(uintptr_t)args32.buf;
+,
  		memcpy(&args, (const void*)arg, sizeof(args));
+)
  		service = find_service_for_instance(instance, args.handle);
  		if (!service) {
  			ret = -EINVAL;
@@ -1022,8 +1196,19 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  				header->data,
  				header->size) == 0)) {
  				args.bufsize = header->size;
+_CF32_FORK(
+				VCHIQ_DEQUEUE_MESSAGE32_T args32;
+				args32.handle = args.handle;
+				args32.blocking = args.blocking;
+				args32.bufsize = args.bufsize;
+				args32.buf = (uintptr_t)(void *)args.buf;
+
+				memcpy((void *)arg, &args32,
+				    sizeof(args32));
+,
  				memcpy((void *)arg, &args,
  				    sizeof(args));
+)
  				vchiq_release_message(
  					service->handle,
  					header);
@@ -1031,14 +1216,15 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  				ret = -EFAULT;
  		} else {
  			vchiq_log_error(vchiq_arm_log_level,
-				"header %x: bufsize %x < size %x",
-				(unsigned int)header, args.bufsize,
+				"header %zx: bufsize %x < size %x",
+				(size_t)header, args.bufsize,
  				header->size);
  			WARN(1, "invalid size\n");
  			ret = -EMSGSIZE;
  		}
  		DEBUG_TRACE(DEQUEUE_MESSAGE_LINE);
  	} break;
+#undef _CF32_CASE
  
  	case VCHIQ_IOC_GET_CLIENT_ID: {
  		VCHIQ_SERVICE_HANDLE_T handle;
@@ -1048,11 +1234,24 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  		ret = vchiq_get_client_id(handle);
  	} break;
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+		case VCHIQ_IOC_GET_CONFIG32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_GET_CONFIG: {
  		VCHIQ_GET_CONFIG_T args;
  		VCHIQ_CONFIG_T config;
-
+_CF32_FORK(
+		VCHIQ_GET_CONFIG32_T args32;
+
+		memcpy(&args32, (const void*)arg, sizeof(args32));
+		args.config_size = args32.config_size;
+		args.pconfig = (VCHIQ_CONFIG_T *)
+			(uintptr_t)args32.pconfig;
+,
  		memcpy(&args, (const void*)arg, sizeof(args));
+)
  		if (args.config_size > sizeof(config)) {
  			ret = -EINVAL;
  			break;
@@ -1066,6 +1265,7 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  			}
  		}
  	} break;
+#undef _CF32_CASE
  
  	case VCHIQ_IOC_SET_SERVICE_OPTION: {
  		VCHIQ_SET_SERVICE_OPTION_T args;
@@ -1082,18 +1282,31 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  				args.handle, args.option, args.value);
  	} break;
  
+#ifdef COMPAT_FREEBSD32
+#define _CF32_CASE \
+		case VCHIQ_IOC_DUMP_PHYS_MEM32:
+	_CF32_CASE
+#endif
  	case VCHIQ_IOC_DUMP_PHYS_MEM: {
  		VCHIQ_DUMP_MEM_T  args;
  
+_CF32_FORK(
+		VCHIQ_DUMP_MEM32_T args32;
+		memcpy(&args32, (const void*)arg, sizeof(args32));
+		args.virt_addr = (void *)(uintptr_t)args32.virt_addr;
+		args.num_bytes = (size_t)args32.num_bytes;
+,
  		memcpy(&args, (const void*)arg, sizeof(args));
+)
  		printf("IMPLEMENT ME: %s:%d\n", __FILE__, __LINE__);
  #if 0
  		dump_phys_mem(args.virt_addr, args.num_bytes);
  #endif
  	} break;
+#undef _CF32_CASE
  
  	case VCHIQ_IOC_LIB_VERSION: {
-		unsigned int lib_version = (unsigned int)arg;
+		size_t lib_version = (size_t)arg;
  
  		if (lib_version < VCHIQ_VERSION_MIN)
  			ret = -EINVAL;
@@ -1119,6 +1332,7 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  		ret = -ENOTTY;
  		break;
  	}
+#undef _CF32_FORK
  
  	if (service)
  		unlock_service(service);
@@ -1155,18 +1369,14 @@ vchiq_ioctl(struct cdev *cdev, u_long cmd, caddr_t arg, int fflag,
  	return ret;
  }
  
-static void
-instance_dtr(void *data)
-{
  
-	kfree(data);
-}
  
  /****************************************************************************
  *
  *   vchiq_open
  *
  ***************************************************************************/
+static void instance_dtr(void *data);
  
  static int
  vchiq_open(struct cdev *dev, int flags, int fmt __unused, struct thread *td)
@@ -1206,7 +1416,7 @@ vchiq_open(struct cdev *dev, int flags, int fmt __unused, struct thread *td)
  		INIT_LIST_HEAD(&instance->bulk_waiter_list);
  
  		devfs_set_cdevpriv(instance, instance_dtr);
-	} 
+	}
  	else {
  		vchiq_log_error(vchiq_arm_log_level,
  			"Unknown minor device");
@@ -1222,143 +1432,151 @@ vchiq_open(struct cdev *dev, int flags, int fmt __unused, struct thread *td)
  *
  ***************************************************************************/
  
+
  static int
-vchiq_close(struct cdev *dev, int flags __unused, int fmt __unused,
-                struct thread *td)
+_vchiq_close_instance(VCHIQ_INSTANCE_T instance)
  {
  	int ret = 0;
-	if (1) {
-		VCHIQ_INSTANCE_T instance;
-		VCHIQ_STATE_T *state = vchiq_get_state();
-		VCHIQ_SERVICE_T *service;
-		int i;
-
-		if ((ret = devfs_get_cdevpriv((void**)&instance))) {
-			printf("devfs_get_cdevpriv failed: error %d\n", ret);
-			return (ret);
-		}
-
-		vchiq_log_info(vchiq_arm_log_level,
-			"vchiq_release: instance=%lx",
-			(unsigned long)instance);
-
-		if (!state) {
-			ret = -EPERM;
-			goto out;
-		}
+	VCHIQ_STATE_T *state = vchiq_get_state();
+	VCHIQ_SERVICE_T *service;
+	int i;
  
-		/* Ensure videocore is awake to allow termination. */
-		vchiq_use_internal(instance->state, NULL,
-				USE_TYPE_VCHIQ);
+	vchiq_log_info(vchiq_arm_log_level,
+		"vchiq_release: instance=%lx",
+		(unsigned long)instance);
  
-		lmutex_lock(&instance->completion_mutex);
+	if (!state) {
+		ret = -EPERM;
+		goto out;
+	}
  
-		/* Wake the completion thread and ask it to exit */
-		instance->closing = 1;
-		up(&instance->insert_event);
+	/* Ensure videocore is awake to allow termination. */
+	vchiq_use_internal(instance->state, NULL,
+			USE_TYPE_VCHIQ);
  
-		lmutex_unlock(&instance->completion_mutex);
+	lmutex_lock(&instance->completion_mutex);
  
-		/* Wake the slot handler if the completion queue is full. */
-		up(&instance->remove_event);
+	/* Wake the completion thread and ask it to exit */
+	instance->closing = 1;
+	up(&instance->insert_event);
  
-		/* Mark all services for termination... */
-		i = 0;
-		while ((service = next_service_by_instance(state, instance,
-			&i)) !=	NULL) {
-			USER_SERVICE_T *user_service = service->base.userdata;
+	lmutex_unlock(&instance->completion_mutex);
  
-			/* Wake the slot handler if the msg queue is full. */
-			up(&user_service->remove_event);
+	/* Wake the slot handler if the completion queue is full. */
+	up(&instance->remove_event);
  
-			vchiq_terminate_service_internal(service);
-			unlock_service(service);
-		}
+	/* Mark all services for termination... */
+	i = 0;
+	while ((service = next_service_by_instance(state, instance,
+		&i)) !=	NULL) {
+		USER_SERVICE_T *user_service = service->base.userdata;
  
-		/* ...and wait for them to die */
-		i = 0;
-		while ((service = next_service_by_instance(state, instance, &i))
-			!= NULL) {
-			USER_SERVICE_T *user_service = service->base.userdata;
+		/* Wake the slot handler if the msg queue is full. */
+		up(&user_service->remove_event);
  
-			down(&service->remove_event);
+		vchiq_terminate_service_internal(service);
+		unlock_service(service);
+	}
  
-			BUG_ON(service->srvstate != VCHIQ_SRVSTATE_FREE);
+	/* ...and wait for them to die */
+	i = 0;
+	while ((service = next_service_by_instance(state, instance, &i))
+		!= NULL) {
+		USER_SERVICE_T *user_service = service->base.userdata;
  
-			spin_lock(&msg_queue_spinlock);
+		down(&service->remove_event);
  
-			while (user_service->msg_remove !=
-				user_service->msg_insert) {
-				VCHIQ_HEADER_T *header = user_service->
-					msg_queue[user_service->msg_remove &
-						(MSG_QUEUE_SIZE - 1)];
-				user_service->msg_remove++;
-				spin_unlock(&msg_queue_spinlock);
+		BUG_ON(service->srvstate != VCHIQ_SRVSTATE_FREE);
  
-				if (header)
-					vchiq_release_message(
-						service->handle,
-						header);
-				spin_lock(&msg_queue_spinlock);
-			}
+		spin_lock(&msg_queue_spinlock);
  
+		while (user_service->msg_remove !=
+			user_service->msg_insert) {
+			VCHIQ_HEADER_T *header = user_service->
+				msg_queue[user_service->msg_remove &
+					(MSG_QUEUE_SIZE - 1)];
+			user_service->msg_remove++;
  			spin_unlock(&msg_queue_spinlock);
  
-			unlock_service(service);
+			if (header)
+				vchiq_release_message(
+					service->handle,
+					header);
+			spin_lock(&msg_queue_spinlock);
  		}
  
-		/* Release any closed services */
-		while (instance->completion_remove !=
-			instance->completion_insert) {
-			VCHIQ_COMPLETION_DATA_T *completion;
-			VCHIQ_SERVICE_T *service1;
-			completion = &instance->completions[
-				instance->completion_remove &
-				(MAX_COMPLETIONS - 1)];
-			service1 = completion->service_userdata;
-			if (completion->reason == VCHIQ_SERVICE_CLOSED)
-			{
-				USER_SERVICE_T *user_service =
-					service->base.userdata;
-
-				/* Wake any blocked user-thread */
-				if (instance->use_close_delivered)
-					up(&user_service->close_event);
-				unlock_service(service1);
-			}
-			instance->completion_remove++;
-		}
+		spin_unlock(&msg_queue_spinlock);
  
-		/* Release the PEER service count. */
-		vchiq_release_internal(instance->state, NULL);
+		unlock_service(service);
+	}
  
+	/* Release any closed services */
+	while (instance->completion_remove !=
+		instance->completion_insert) {
+		VCHIQ_COMPLETION_DATA_T *completion;
+		VCHIQ_SERVICE_T *service;
+		completion = &instance->completions[
+			instance->completion_remove &
+			(MAX_COMPLETIONS - 1)];
+		service = completion->service_userdata;
+		if (completion->reason == VCHIQ_SERVICE_CLOSED)
  		{
-			struct list_head *pos, *next;
-			list_for_each_safe(pos, next,
-				&instance->bulk_waiter_list) {
-				struct bulk_waiter_node *waiter;
-				waiter = list_entry(pos,
-					struct bulk_waiter_node,
-					list);
-				list_del(pos);
-				vchiq_log_info(vchiq_arm_log_level,
-					"bulk_waiter - cleaned up %x "
-					"for pid %d",
-					(unsigned int)waiter, waiter->pid);
-		                _sema_destroy(&waiter->bulk_waiter.event);
-				kfree(waiter);
-			}
-		}
+			USER_SERVICE_T *user_service =
+				service->base.userdata;
  
+			/* Wake any blocked user-thread */
+			if (instance->use_close_delivered)
+				up(&user_service->close_event);
+
+			unlock_service(service);
+		}
+		instance->completion_remove++;
  	}
-	else {
-		vchiq_log_error(vchiq_arm_log_level,
-			"Unknown minor device");
-		ret = -ENXIO;
+
+	/* Release the PEER service count. */
+	vchiq_release_internal(instance->state, NULL);
+
+	{
+		struct list_head *pos, *next;
+		list_for_each_safe(pos, next,
+			&instance->bulk_waiter_list) {
+			struct bulk_waiter_node *waiter;
+			waiter = list_entry(pos,
+				struct bulk_waiter_node,
+				list);
+			list_del(pos);
+			vchiq_log_info(vchiq_arm_log_level,
+				"bulk_waiter - cleaned up %zx "
+				"for pid %d",
+				(size_t)waiter, waiter->pid);
+			_sema_destroy(&waiter->bulk_waiter.event);
+			kfree(waiter);
+		}
  	}
  
  out:
  	return ret;
+
+}
+
+static void
+instance_dtr(void *data)
+{
+	VCHIQ_INSTANCE_T instance = data;
+	_vchiq_close_instance(instance);
+	kfree(data);
+}
+
+static int
+vchiq_close(struct cdev *dev, int flags __unused, int fmt __unused,
+                struct thread *td)
+{
+
+	/* XXXMDC per-open state lives in cdevpriv; instance_dtr() frees it */
+	/* XXXMDC d_close only runs on the last close of the device */
+
+	return(0);
+
  }
  
  /****************************************************************************
@@ -1435,9 +1653,9 @@ vchiq_dump_platform_instances(void *dump_context)
  			instance = service->instance;
  			if (instance && !instance->mark) {
  				len = snprintf(buf, sizeof(buf),
-					"Instance %x: pid %d,%s completions "
+					"Instance %zx: pid %d,%s completions "
  						"%d/%d",
-					(unsigned int)instance, instance->pid,
+					(size_t)instance, instance->pid,
  					instance->connected ? " connected, " :
  						"",
  					instance->completion_insert -
@@ -1465,8 +1683,8 @@ vchiq_dump_platform_service_state(void *dump_context, VCHIQ_SERVICE_T *service)
  	char buf[80];
  	int len;
  
-	len = snprintf(buf, sizeof(buf), "  instance %x",
-		(unsigned int)service->instance);
+	len = snprintf(buf, sizeof(buf), "  instance %zx",
+		(size_t)service->instance);
  
  	if ((service->base.callback == service_callback) &&
  		user_service->is_vchi) {
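The vchiq_core.c changes that follow stop storing a semaphore pointer in each REMOTE_EVENT_T. That structure is shared with the 32-bit VideoCore, so it cannot hold a 64-bit pointer. Instead it stores an offset, which ACTUAL_EVENT_SEM_ADDR() adds to a base VCHIQ_STATE_T pointer. The sketch below shows a minimal encode/decode pair. It assumes the offset is taken relative to the state structure passed as ref; the code that stores the offset is not part of this excerpt. All type and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the host-side semaphore type. */
struct sema {
	int count;
};

/*
 * Event record as laid out in memory shared with a 32-bit peer: every
 * field is fixed-width, so 'event' holds an offset, not a pointer.
 */
struct remote_event {
	int32_t armed;
	int32_t fired;
	uint32_t event;		/* byte offset of the semaphore from the base */
};

/* The shared layout must be the same size on 32- and 64-bit hosts. */
static_assert(sizeof(struct remote_event) == 12, "shared layout changed");

/* Hypothetical host-side state that owns the semaphores. */
struct state {
	struct sema trigger_sema;
	struct sema recycle_sema;
	struct remote_event trigger;
};

/* Store the semaphore as an offset relative to 'ref'. */
static void
event_bind(struct state *ref, struct remote_event *ev, struct sema *s)
{
	ev->event = (uint32_t)((uintptr_t)s - (uintptr_t)ref);
}

/* Recover the semaphore, as ACTUAL_EVENT_SEM_ADDR(ref, ev->event) does. */
static struct sema *
event_sema(struct state *ref, const struct remote_event *ev)
{
	return (struct sema *)((uintptr_t)ref + (uintptr_t)ev->event);
}
```

Offsets keep the shared record 32-bit clean wherever the kernel maps the state structure. The cost is that every remote_event_*() helper now needs the base pointer, which is why they all gain a ref/state argument.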
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.c
index 2e30dd7dc3de..80a3a531e8b5 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.c
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.c
@@ -31,6 +31,9 @@
   * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   */
  
+/* For the PRIu64 format identifier */
+#include <machine/_inttypes.h>
+
  #include "vchiq_core.h"
  #include "vchiq_killable.h"
  
@@ -392,9 +395,9 @@ make_service_callback(VCHIQ_SERVICE_T *service, VCHIQ_REASON_T reason,
  	VCHIQ_HEADER_T *header, void *bulk_userdata)
  {
  	VCHIQ_STATUS_T status;
-	vchiq_log_trace(vchiq_core_log_level, "%d: callback:%d (%s, %x, %x)",
+	vchiq_log_trace(vchiq_core_log_level, "%d: callback:%d (%s, %tx, %tx)",
  		service->state->id, service->localport, reason_names[reason],
-		(unsigned int)header, (unsigned int)bulk_userdata);
+		(size_t)header, (size_t)bulk_userdata);
  	status = service->base.callback(reason, header, service->handle,
  		bulk_userdata);
  	if (status == VCHIQ_ERROR) {
@@ -417,13 +420,15 @@ vchiq_set_conn_state(VCHIQ_STATE_T *state, VCHIQ_CONNSTATE_T newstate)
  	vchiq_platform_conn_state_changed(state, oldstate, newstate);
  }
  
+#define ACTUAL_EVENT_SEM_ADDR(ref,offset)\
+	((struct semaphore *)(((size_t) ref) + ((size_t) offset)))
  static inline void
-remote_event_create(REMOTE_EVENT_T *event)
+remote_event_create(VCHIQ_STATE_T *ref, REMOTE_EVENT_T *event)
  {
  	event->armed = 0;
  	/* Don't clear the 'fired' flag because it may already have been set
  	** by the other side. */
-	_sema_init(event->event, 0);
+	_sema_init(ACTUAL_EVENT_SEM_ADDR(ref,event->event), 0);
  }
  
  __unused static inline void
@@ -433,13 +438,18 @@ remote_event_destroy(REMOTE_EVENT_T *event)
  }
  
  static inline int
-remote_event_wait(REMOTE_EVENT_T *event)
+remote_event_wait(VCHIQ_STATE_T *ref, REMOTE_EVENT_T *event)
  {
  	if (!event->fired) {
  		event->armed = 1;
+#if defined(__aarch64__)
+		dsb(sy);
+#else
  		dsb();
+#endif
+
  		if (!event->fired) {
-			if (down_interruptible(event->event) != 0) {
+			if (down_interruptible(ACTUAL_EVENT_SEM_ADDR(ref,event->event)) != 0) {
  				event->armed = 0;
  				return 0;
  			}
@@ -453,26 +463,32 @@ remote_event_wait(REMOTE_EVENT_T *event)
  }
  
  static inline void
-remote_event_signal_local(REMOTE_EVENT_T *event)
+remote_event_signal_local(VCHIQ_STATE_T *ref, REMOTE_EVENT_T *event)
  {
+/*
+ * Mirror
+ * https://github.com/raspberrypi/linux/commit/a50c4c9a65779ca835746b5fd79d3d5278afbdbe
+ * for extra safety
+ */
+	event->fired = 1;
  	event->armed = 0;
-	up(event->event);
+	up(ACTUAL_EVENT_SEM_ADDR(ref,event->event));
  }
  
  static inline void
-remote_event_poll(REMOTE_EVENT_T *event)
+remote_event_poll(VCHIQ_STATE_T *ref, REMOTE_EVENT_T *event)
  {
  	if (event->fired && event->armed)
-		remote_event_signal_local(event);
+		remote_event_signal_local(ref, event);
  }
  
  void
  remote_event_pollall(VCHIQ_STATE_T *state)
  {
-	remote_event_poll(&state->local->sync_trigger);
-	remote_event_poll(&state->local->sync_release);
-	remote_event_poll(&state->local->trigger);
-	remote_event_poll(&state->local->recycle);
+	remote_event_poll(state, &state->local->sync_trigger);
+	remote_event_poll(state, &state->local->sync_release);
+	remote_event_poll(state, &state->local->trigger);
+	remote_event_poll(state, &state->local->recycle);
  }
  
  /* Round up message sizes so that any space at the end of a slot is always big
@@ -553,7 +569,7 @@ request_poll(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service, int poll_type)
  	wmb();
  
  	/* ... and ensure the slot handler runs. */
-	remote_event_signal_local(&state->local->trigger);
+	remote_event_signal_local(state, &state->local->trigger);
  }
  
  /* Called from queue_message, by the slot handler and application threads,
@@ -640,8 +656,8 @@ process_free_queue(VCHIQ_STATE_T *state)
  
  		rmb();
  
-		vchiq_log_trace(vchiq_core_log_level, "%d: pfq %d=%x %x %x",
-			state->id, slot_index, (unsigned int)data,
+		vchiq_log_trace(vchiq_core_log_level, "%d: pfq %d=%tx %x %x",
+			state->id, slot_index, (size_t)data,
  			local->slot_queue_recycle, slot_queue_available);
  
  		/* Initialise the bitmask for services which have used this
@@ -675,13 +691,13 @@ process_free_queue(VCHIQ_STATE_T *state)
  					vchiq_log_error(vchiq_core_log_level,
  						"service %d "
  						"message_use_count=%d "
-						"(header %x, msgid %x, "
+						"(header %tx, msgid %x, "
  						"header->msgid %x, "
  						"header->size %x)",
  						port,
  						service_quota->
  							message_use_count,
-						(unsigned int)header, msgid,
+						(size_t)header, msgid,
  						header->msgid,
  						header->size);
  					WARN(1, "invalid message use count\n");
@@ -704,24 +720,24 @@ process_free_queue(VCHIQ_STATE_T *state)
  						up(&service_quota->quota_event);
  						vchiq_log_trace(
  							vchiq_core_log_level,
-							"%d: pfq:%d %x@%x - "
+							"%d: pfq:%d %x@%tx - "
  							"slot_use->%d",
  							state->id, port,
  							header->size,
-							(unsigned int)header,
+							(size_t)header,
  							count - 1);
  					} else {
  						vchiq_log_error(
  							vchiq_core_log_level,
  								"service %d "
  								"slot_use_count"
-								"=%d (header %x"
+								"=%d (header %tx"
  								", msgid %x, "
  								"header->msgid"
  								" %x, header->"
  								"size %x)",
  							port, count,
-							(unsigned int)header,
+							(size_t)header,
  							msgid,
  							header->msgid,
  							header->size);
@@ -735,9 +751,9 @@ process_free_queue(VCHIQ_STATE_T *state)
  			pos += calc_stride(header->size);
  			if (pos > VCHIQ_SLOT_SIZE) {
  				vchiq_log_error(vchiq_core_log_level,
-					"pfq - pos %x: header %x, msgid %x, "
+					"pfq - pos %x: header %tx, msgid %x, "
  					"header->msgid %x, header->size %x",
-					pos, (unsigned int)header, msgid,
+					pos, (size_t)header, msgid,
  					header->msgid, header->size);
  				WARN(1, "invalid slot position\n");
  			}
@@ -885,17 +901,16 @@ queue_message(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service,
  		int slot_use_count;
  
  		vchiq_log_info(vchiq_core_log_level,
-			"%d: qm %s@%x,%x (%d->%d)",
+			"%d: qm %s@%tx,%x (%d->%d)",
  			state->id,
  			msg_type_str(VCHIQ_MSG_TYPE(msgid)),
-			(unsigned int)header, size,
+			(size_t)header, size,
  			VCHIQ_MSG_SRCPORT(msgid),
  			VCHIQ_MSG_DSTPORT(msgid));
  
  		BUG_ON(!service);
  		BUG_ON((flags & (QMFLAGS_NO_MUTEX_LOCK |
  				 QMFLAGS_NO_MUTEX_UNLOCK)) != 0);
-
  		for (i = 0, pos = 0; i < (unsigned int)count;
  			pos += elements[i++].size)
  			if (elements[i].size) {
@@ -951,9 +966,9 @@ queue_message(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service,
  		VCHIQ_SERVICE_STATS_ADD(service, ctrl_tx_bytes, size);
  	} else {
  		vchiq_log_info(vchiq_core_log_level,
-			"%d: qm %s@%x,%x (%d->%d)", state->id,
+			"%d: qm %s@%zx,%x (%d->%d)", state->id,
  			msg_type_str(VCHIQ_MSG_TYPE(msgid)),
-			(unsigned int)header, size,
+			(size_t)header, size,
  			VCHIQ_MSG_SRCPORT(msgid),
  			VCHIQ_MSG_DSTPORT(msgid));
  		if (size != 0) {
@@ -1017,7 +1032,7 @@ queue_message_sync(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service,
  		(lmutex_lock_interruptible(&state->sync_mutex) != 0))
  		return VCHIQ_RETRY;
  
-	remote_event_wait(&local->sync_release);
+	remote_event_wait(state, &local->sync_release);
  
  	rmb();
  
@@ -1036,9 +1051,9 @@ queue_message_sync(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service,
  		int i, pos;
  
  		vchiq_log_info(vchiq_sync_log_level,
-			"%d: qms %s@%x,%x (%d->%d)", state->id,
+			"%d: qms %s@%zx,%x (%d->%d)", state->id,
  			msg_type_str(VCHIQ_MSG_TYPE(msgid)),
-			(unsigned int)header, size,
+			(size_t)header, size,
  			VCHIQ_MSG_SRCPORT(msgid),
  			VCHIQ_MSG_DSTPORT(msgid));
  
@@ -1065,9 +1080,9 @@ queue_message_sync(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service,
  		VCHIQ_SERVICE_STATS_ADD(service, ctrl_tx_bytes, size);
  	} else {
  		vchiq_log_info(vchiq_sync_log_level,
-			"%d: qms %s@%x,%x (%d->%d)", state->id,
+			"%d: qms %s@%zx,%x (%d->%d)", state->id,
  			msg_type_str(VCHIQ_MSG_TYPE(msgid)),
-			(unsigned int)header, size,
+			(size_t)header, size,
  			VCHIQ_MSG_SRCPORT(msgid),
  			VCHIQ_MSG_DSTPORT(msgid));
  		if (size != 0) {
@@ -1098,9 +1113,6 @@ queue_message_sync(VCHIQ_STATE_T *state, VCHIQ_SERVICE_T *service,
  			size);
  	}
  
-	/* Make sure the new header is visible to the peer. */
-	wmb();
-
  	remote_event_signal(&state->remote->sync_trigger);
  
  	if (VCHIQ_MSG_TYPE(msgid) != VCHIQ_MSG_PAUSE)
@@ -1368,26 +1380,26 @@ resolve_bulks(VCHIQ_SERVICE_T *service, VCHIQ_BULK_QUEUE_T *queue)
  				"Send Bulk to" : "Recv Bulk from";
  			if (bulk->actual != VCHIQ_BULK_ACTUAL_ABORTED)
  				vchiq_log_info(SRVTRACE_LEVEL(service),
-					"%s %c%c%c%c d:%d len:%d %x<->%x",
+					"%s %c%c%c%c d:%d len:%d %zx<->%zx",
  					header,
  					VCHIQ_FOURCC_AS_4CHARS(
  						service->base.fourcc),
  					service->remoteport,
  					bulk->size,
-					(unsigned int)bulk->data,
-					(unsigned int)bulk->remote_data);
+					(size_t)bulk->data,
+					(size_t)bulk->remote_data);
  			else
  				vchiq_log_info(SRVTRACE_LEVEL(service),
  					"%s %c%c%c%c d:%d ABORTED - tx len:%d,"
-					" rx len:%d %x<->%x",
+					" rx len:%d %zx<->%zx",
  					header,
  					VCHIQ_FOURCC_AS_4CHARS(
  						service->base.fourcc),
  					service->remoteport,
  					bulk->size,
  					bulk->remote_size,
-					(unsigned int)bulk->data,
-					(unsigned int)bulk->remote_data);
+					(size_t)bulk->data,
+					(size_t)bulk->remote_data);
  		}
  
  		vchiq_complete_bulk(bulk);
@@ -1522,8 +1534,8 @@ parse_open(VCHIQ_STATE_T *state, VCHIQ_HEADER_T *header)
  
  		fourcc = payload->fourcc;
  		vchiq_log_info(vchiq_core_log_level,
-			"%d: prs OPEN@%x (%d->'%c%c%c%c')",
-			state->id, (unsigned int)header,
+			"%d: prs OPEN@%zx (%d->'%c%c%c%c')",
+			state->id, (size_t)header,
  			localport,
  			VCHIQ_FOURCC_AS_4CHARS(fourcc));
  
@@ -1661,7 +1673,7 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  
  		header = (VCHIQ_HEADER_T *)(state->rx_data +
  			(state->rx_pos & VCHIQ_SLOT_MASK));
-		DEBUG_VALUE(PARSE_HEADER, (int)header);
+		DEBUG_VALUE(PARSE_HEADER, (size_t)header);
  		msgid = header->msgid;
  		DEBUG_VALUE(PARSE_MSGID, msgid);
  		size = header->size;
@@ -1695,20 +1707,20 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  					remoteport);
  				if (service)
  					vchiq_log_warning(vchiq_core_log_level,
-						"%d: prs %s@%x (%d->%d) - "
+						"%d: prs %s@%zx (%d->%d) - "
  						"found connected service %d",
  						state->id, msg_type_str(type),
-						(unsigned int)header,
+						(size_t)header,
  						remoteport, localport,
  						service->localport);
  			}
  
  			if (!service) {
  				vchiq_log_error(vchiq_core_log_level,
-					"%d: prs %s@%x (%d->%d) - "
+					"%d: prs %s@%zx (%d->%d) - "
  					"invalid/closed service %d",
  					state->id, msg_type_str(type),
-					(unsigned int)header,
+					(size_t)header,
  					remoteport, localport, localport);
  				goto skip_message;
  			}
@@ -1734,12 +1746,12 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  					min(16, size));
  		}
  
-		if (((unsigned int)header & VCHIQ_SLOT_MASK) + calc_stride(size)
+		if (((size_t)header & VCHIQ_SLOT_MASK) + calc_stride(size)
  			> VCHIQ_SLOT_SIZE) {
  			vchiq_log_error(vchiq_core_log_level,
-				"header %x (msgid %x) - size %x too big for "
+				"header %zx (msgid %x) - size %x too big for "
  				"slot",
-				(unsigned int)header, (unsigned int)msgid,
+				(size_t)header, (unsigned int)msgid,
  				(unsigned int)size);
  			WARN(1, "oversized for slot\n");
  		}
@@ -1758,8 +1770,8 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  				service->peer_version = payload->version;
  			}
  			vchiq_log_info(vchiq_core_log_level,
-				"%d: prs OPENACK@%x,%x (%d->%d) v:%d",
-				state->id, (unsigned int)header, size,
+				"%d: prs OPENACK@%zx,%x (%d->%d) v:%d",
+				state->id, (size_t)header, size,
  				remoteport, localport, service->peer_version);
  			if (service->srvstate ==
  				VCHIQ_SRVSTATE_OPENING) {
@@ -1776,8 +1788,8 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  			WARN_ON(size != 0); /* There should be no data */
  
  			vchiq_log_info(vchiq_core_log_level,
-				"%d: prs CLOSE@%x (%d->%d)",
-				state->id, (unsigned int)header,
+				"%d: prs CLOSE@%zx (%d->%d)",
+				state->id, (size_t)header,
  				remoteport, localport);
  
  			mark_service_closing_internal(service, 1);
@@ -1794,8 +1806,8 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  			break;
  		case VCHIQ_MSG_DATA:
  			vchiq_log_info(vchiq_core_log_level,
-				"%d: prs DATA@%x,%x (%d->%d)",
-				state->id, (unsigned int)header, size,
+				"%d: prs DATA@%zx,%x (%d->%d)",
+				state->id, (size_t)header, size,
  				remoteport, localport);
  
  			if ((service->remoteport == remoteport)
@@ -1819,14 +1831,23 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  			break;
  		case VCHIQ_MSG_CONNECT:
  			vchiq_log_info(vchiq_core_log_level,
-				"%d: prs CONNECT@%x",
-				state->id, (unsigned int)header);
+				"%d: prs CONNECT@%zx",
+				state->id, (size_t)header);
  			state->version_common = ((VCHIQ_SLOT_ZERO_T *)
  						 state->slot_data)->version;
  			up(&state->connect);
  			break;
+/*
+ * XXXMDC Apparently nothing uses this (see
+ * https://github.com/raspberrypi/linux/commit/14f4d72fb799a9b3170a45ab80d4a3ddad541960),
+ * but removing the is_master code paths is a separate job.
+ */
  		case VCHIQ_MSG_BULK_RX:
-		case VCHIQ_MSG_BULK_TX: {
+		case VCHIQ_MSG_BULK_TX:
+			WARN_ON(1);
+			break;
+#if 0
+		{
  			VCHIQ_BULK_QUEUE_T *queue;
  			WARN_ON(!state->is_master);
  			queue = (type == VCHIQ_MSG_BULK_RX) ?
@@ -1854,12 +1875,12 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  				wmb();
  
  				vchiq_log_info(vchiq_core_log_level,
-					"%d: prs %s@%x (%d->%d) %x@%x",
+					"%d: prs %s@%zx (%d->%d) %x@%zx",
  					state->id, msg_type_str(type),
-					(unsigned int)header,
+					(size_t)header,
  					remoteport, localport,
  					bulk->remote_size,
-					(unsigned int)bulk->remote_data);
+					(size_t)bulk->remote_data);
  
  				queue->remote_insert++;
  
@@ -1888,9 +1909,11 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  				lmutex_unlock(&service->bulk_mutex);
  				if (resolved)
  					notify_bulks(service, queue,
-						1/*retry_poll*/);
+						1 /* retry_poll */
+						);
  			}
-		} break;
+		}
+#endif
  		case VCHIQ_MSG_BULK_RX_DONE:
  		case VCHIQ_MSG_BULK_TX_DONE:
  			WARN_ON(state->is_master);
@@ -1912,10 +1935,10 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  				if ((int)(queue->remote_insert -
  					queue->local_insert) >= 0) {
  					vchiq_log_error(vchiq_core_log_level,
-						"%d: prs %s@%x (%d->%d) "
+						"%d: prs %s@%zx (%d->%d) "
  						"unexpected (ri=%d,li=%d)",
  						state->id, msg_type_str(type),
-						(unsigned int)header,
+						(size_t)header,
  						remoteport, localport,
  						queue->remote_insert,
  						queue->local_insert);
@@ -1932,11 +1955,11 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  				queue->remote_insert++;
  
  				vchiq_log_info(vchiq_core_log_level,
-					"%d: prs %s@%x (%d->%d) %x@%x",
+					"%d: prs %s@%zx (%d->%d) %x@%zx",
  					state->id, msg_type_str(type),
-					(unsigned int)header,
+					(size_t)header,
  					remoteport, localport,
-					bulk->actual, (unsigned int)bulk->data);
+					bulk->actual, (size_t)bulk->data);
  
  				vchiq_log_trace(vchiq_core_log_level,
  					"%d: prs:%d %cx li=%x ri=%x p=%x",
@@ -1958,14 +1981,14 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  			break;
  		case VCHIQ_MSG_PADDING:
  			vchiq_log_trace(vchiq_core_log_level,
-				"%d: prs PADDING@%x,%x",
-				state->id, (unsigned int)header, size);
+				"%d: prs PADDING@%zx,%x",
+				state->id, (size_t)header, size);
  			break;
  		case VCHIQ_MSG_PAUSE:
  			/* If initiated, signal the application thread */
  			vchiq_log_trace(vchiq_core_log_level,
-				"%d: prs PAUSE@%x,%x",
-				state->id, (unsigned int)header, size);
+				"%d: prs PAUSE@%zx,%x",
+				state->id, (size_t)header, size);
  			if (state->conn_state == VCHIQ_CONNSTATE_PAUSED) {
  				vchiq_log_error(vchiq_core_log_level,
  					"%d: PAUSE received in state PAUSED",
@@ -1988,8 +2011,8 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  			break;
  		case VCHIQ_MSG_RESUME:
  			vchiq_log_trace(vchiq_core_log_level,
-				"%d: prs RESUME@%x,%x",
-				state->id, (unsigned int)header, size);
+				"%d: prs RESUME@%zx,%x",
+				state->id, (size_t)header, size);
  			/* Release the slot mutex */
  			lmutex_unlock(&state->slot_mutex);
  			if (state->is_master)
@@ -2010,8 +2033,8 @@ parse_rx_slots(VCHIQ_STATE_T *state)
  
  		default:
  			vchiq_log_error(vchiq_core_log_level,
-				"%d: prs invalid msgid %x@%x,%x",
-				state->id, msgid, (unsigned int)header, size);
+				"%d: prs invalid msgid %x@%zx,%x",
+				state->id, msgid, (size_t)header, size);
  			WARN(1, "invalid message\n");
  			break;
  		}
@@ -2051,7 +2074,7 @@ slot_handler_func(void *v)
  	while (1) {
  		DEBUG_COUNT(SLOT_HANDLER_COUNT);
  		DEBUG_TRACE(SLOT_HANDLER_LINE);
-		remote_event_wait(&local->trigger);
+		remote_event_wait(state, &local->trigger);
  
  		rmb();
  
@@ -2141,8 +2164,7 @@ recycle_func(void *v)
  	VCHIQ_SHARED_STATE_T *local = state->local;
  
  	while (1) {
-		remote_event_wait(&local->recycle);
-
+		remote_event_wait(state, &local->recycle);
  		process_free_queue(state);
  	}
  	return 0;
@@ -2165,7 +2187,7 @@ sync_func(void *v)
  		int type;
  		unsigned int localport, remoteport;
  
-		remote_event_wait(&local->sync_trigger);
+		remote_event_wait(state, &local->sync_trigger);
  
  		rmb();
  
@@ -2179,10 +2201,10 @@ sync_func(void *v)
  
  		if (!service) {
  			vchiq_log_error(vchiq_sync_log_level,
-				"%d: sf %s@%x (%d->%d) - "
+				"%d: sf %s@%zx (%d->%d) - "
  				"invalid/closed service %d",
  				state->id, msg_type_str(type),
-				(unsigned int)header,
+				(size_t)header,
  				remoteport, localport, localport);
  			release_message_sync(state, header);
  			continue;
@@ -2213,8 +2235,8 @@ sync_func(void *v)
  				service->peer_version = payload->version;
  			}
  			vchiq_log_info(vchiq_sync_log_level,
-				"%d: sf OPENACK@%x,%x (%d->%d) v:%d",
-				state->id, (unsigned int)header, size,
+				"%d: sf OPENACK@%zx,%x (%d->%d) v:%d",
+				state->id, (size_t)header, size,
  				remoteport, localport, service->peer_version);
  			if (service->srvstate == VCHIQ_SRVSTATE_OPENING) {
  				service->remoteport = remoteport;
@@ -2228,8 +2250,8 @@ sync_func(void *v)
  
  		case VCHIQ_MSG_DATA:
  			vchiq_log_trace(vchiq_sync_log_level,
-				"%d: sf DATA@%x,%x (%d->%d)",
-				state->id, (unsigned int)header, size,
+				"%d: sf DATA@%zx,%x (%d->%d)",
+				state->id, (size_t)header, size,
  				remoteport, localport);
  
  			if ((service->remoteport == remoteport) &&
@@ -2248,8 +2270,8 @@ sync_func(void *v)
  
  		default:
  			vchiq_log_error(vchiq_sync_log_level,
-				"%d: sf unexpected msgid %x@%x,%x",
-				state->id, msgid, (unsigned int)header, size);
+				"%d: sf unexpected msgid %x@%zx,%x",
+				state->id, msgid, (size_t)header, size);
  			release_message_sync(state, header);
  			break;
  		}
@@ -2282,7 +2304,7 @@ get_conn_state_name(VCHIQ_CONNSTATE_T conn_state)
  VCHIQ_SLOT_ZERO_T *
  vchiq_init_slots(void *mem_base, int mem_size)
  {
-	int mem_align = (VCHIQ_SLOT_SIZE - (int)mem_base) & VCHIQ_SLOT_MASK;
+	int mem_align = (int)((VCHIQ_SLOT_SIZE - (uintptr_t)mem_base) & VCHIQ_SLOT_MASK);
  	VCHIQ_SLOT_ZERO_T *slot_zero =
  		(VCHIQ_SLOT_ZERO_T *)((char *)mem_base + mem_align);
  	int num_slots = (mem_size - mem_align)/VCHIQ_SLOT_SIZE;
@@ -2334,8 +2356,8 @@ vchiq_init_state(VCHIQ_STATE_T *state, VCHIQ_SLOT_ZERO_T *slot_zero,
  	if (slot_zero->magic != VCHIQ_MAGIC) {
  		vchiq_loud_error_header();
  		vchiq_loud_error("Invalid VCHIQ magic value found.");
-		vchiq_loud_error("slot_zero=%x: magic=%x (expected %x)",
-			(unsigned int)slot_zero, slot_zero->magic, VCHIQ_MAGIC);
+		vchiq_loud_error("slot_zero=%zx: magic=%x (expected %x)",
+			(size_t)slot_zero, slot_zero->magic, VCHIQ_MAGIC);
  		vchiq_loud_error_footer();
  		return VCHIQ_ERROR;
  	}
@@ -2348,9 +2370,9 @@ vchiq_init_state(VCHIQ_STATE_T *state, VCHIQ_SLOT_ZERO_T *slot_zero,
  	if (slot_zero->version < VCHIQ_VERSION_MIN) {
  		vchiq_loud_error_header();
  		vchiq_loud_error("Incompatible VCHIQ versions found.");
-		vchiq_loud_error("slot_zero=%x: VideoCore version=%d "
+		vchiq_loud_error("slot_zero=%zx: VideoCore version=%d "
  			"(minimum %d)",
-			(unsigned int)slot_zero, slot_zero->version,
+			(size_t)slot_zero, slot_zero->version,
  			VCHIQ_VERSION_MIN);
  		vchiq_loud_error("Restart with a newer VideoCore image.");
  		vchiq_loud_error_footer();
@@ -2360,9 +2382,9 @@ vchiq_init_state(VCHIQ_STATE_T *state, VCHIQ_SLOT_ZERO_T *slot_zero,
  	if (VCHIQ_VERSION < slot_zero->version_min) {
  		vchiq_loud_error_header();
  		vchiq_loud_error("Incompatible VCHIQ versions found.");
-		vchiq_loud_error("slot_zero=%x: version=%d (VideoCore "
+		vchiq_loud_error("slot_zero=%zx: version=%d (VideoCore "
  			"minimum %d)",
-			(unsigned int)slot_zero, VCHIQ_VERSION,
+			(size_t)slot_zero, VCHIQ_VERSION,
  			slot_zero->version_min);
  		vchiq_loud_error("Restart with a newer kernel.");
  		vchiq_loud_error_footer();
@@ -2375,25 +2397,25 @@ vchiq_init_state(VCHIQ_STATE_T *state, VCHIQ_SLOT_ZERO_T *slot_zero,
  		 (slot_zero->max_slots_per_side != VCHIQ_MAX_SLOTS_PER_SIDE)) {
  		vchiq_loud_error_header();
  		if (slot_zero->slot_zero_size != sizeof(VCHIQ_SLOT_ZERO_T))
-			vchiq_loud_error("slot_zero=%x: slot_zero_size=%x "
+			vchiq_loud_error("slot_zero=%zx: slot_zero_size=%x "
  				"(expected %zx)",
-				(unsigned int)slot_zero,
+				(size_t)slot_zero,
  				slot_zero->slot_zero_size,
  				sizeof(VCHIQ_SLOT_ZERO_T));
  		if (slot_zero->slot_size != VCHIQ_SLOT_SIZE)
-			vchiq_loud_error("slot_zero=%x: slot_size=%d "
+			vchiq_loud_error("slot_zero=%zx: slot_size=%d "
  				"(expected %d",
-				(unsigned int)slot_zero, slot_zero->slot_size,
+				(size_t)slot_zero, slot_zero->slot_size,
  				VCHIQ_SLOT_SIZE);
  		if (slot_zero->max_slots != VCHIQ_MAX_SLOTS)
-			vchiq_loud_error("slot_zero=%x: max_slots=%d "
+			vchiq_loud_error("slot_zero=%zx: max_slots=%d "
  				"(expected %d)",
-				(unsigned int)slot_zero, slot_zero->max_slots,
+				(size_t)slot_zero, slot_zero->max_slots,
  				VCHIQ_MAX_SLOTS);
  		if (slot_zero->max_slots_per_side != VCHIQ_MAX_SLOTS_PER_SIDE)
-			vchiq_loud_error("slot_zero=%x: max_slots_per_side=%d "
+			vchiq_loud_error("slot_zero=%zx: max_slots_per_side=%d "
  				"(expected %d)",
-				(unsigned int)slot_zero,
+				(size_t)slot_zero,
  				slot_zero->max_slots_per_side,
  				VCHIQ_MAX_SLOTS_PER_SIDE);
  		vchiq_loud_error_footer();
@@ -2478,24 +2500,24 @@ vchiq_init_state(VCHIQ_STATE_T *state, VCHIQ_SLOT_ZERO_T *slot_zero,
  	state->data_use_count = 0;
  	state->data_quota = state->slot_queue_available - 1;
  
-	local->trigger.event = &state->trigger_event;
-	remote_event_create(&local->trigger);
+	local->trigger.event = offsetof(VCHIQ_STATE_T, trigger_event);
+	remote_event_create(state, &local->trigger);
  	local->tx_pos = 0;
  
-	local->recycle.event = &state->recycle_event;
-	remote_event_create(&local->recycle);
+	local->recycle.event = offsetof(VCHIQ_STATE_T, recycle_event);
+	remote_event_create(state, &local->recycle);
  	local->slot_queue_recycle = state->slot_queue_available;
  
-	local->sync_trigger.event = &state->sync_trigger_event;
-	remote_event_create(&local->sync_trigger);
+	local->sync_trigger.event = offsetof(VCHIQ_STATE_T, sync_trigger_event);
+	remote_event_create(state, &local->sync_trigger);
  
-	local->sync_release.event = &state->sync_release_event;
-	remote_event_create(&local->sync_release);
+	local->sync_release.event = offsetof(VCHIQ_STATE_T, sync_release_event);
+	remote_event_create(state, &local->sync_release);
  
  	/* At start-of-day, the slot is empty and available */
  	((VCHIQ_HEADER_T *)SLOT_DATA_FROM_INDEX(state, local->slot_sync))->msgid
  		= VCHIQ_MSGID_PADDING;
-	remote_event_signal_local(&local->sync_release);
+	remote_event_signal_local(state, &local->sync_release);
  
  	local->debug[DEBUG_ENTRIES] = DEBUG_MAX;
  
@@ -2775,18 +2797,18 @@ release_service_messages(VCHIQ_SERVICE_T *service)
  				if ((port == service->localport) &&
  					(msgid & VCHIQ_MSGID_CLAIMED)) {
  					vchiq_log_info(vchiq_core_log_level,
-						"  fsi - hdr %x",
-						(unsigned int)header);
+						"  fsi - hdr %zx",
+						(size_t)header);
  					release_slot(state, slot_info, header,
  						NULL);
  				}
  				pos += calc_stride(header->size);
  				if (pos > VCHIQ_SLOT_SIZE) {
  					vchiq_log_error(vchiq_core_log_level,
-						"fsi - pos %x: header %x, "
+						"fsi - pos %x: header %zx, "
  						"msgid %x, header->msgid %x, "
  						"header->size %x",
-						pos, (unsigned int)header,
+						pos, (size_t)header,
  						msgid, header->msgid,
  						header->size);
  					WARN(1, "invalid slot position\n");
@@ -3360,10 +3382,10 @@ vchiq_bulk_transfer(VCHIQ_SERVICE_HANDLE_T handle,
  	wmb();
  
  	vchiq_log_info(vchiq_core_log_level,
-		"%d: bt (%d->%d) %cx %x@%x %x",
+		"%d: bt (%d->%d) %cx %x@%zx %zx",
  		state->id,
  		service->localport, service->remoteport, dir_char,
-		size, (unsigned int)bulk->data, (unsigned int)userdata);
+		size, (size_t)bulk->data, (size_t)userdata);
  
  	/* The slot mutex must be held when the service is being closed, so
  	   claim it here to ensure that isn't happening */
@@ -3382,7 +3404,7 @@ vchiq_bulk_transfer(VCHIQ_SERVICE_HANDLE_T handle,
  				(dir == VCHIQ_BULK_TRANSMIT) ?
  				VCHIQ_POLL_TXNOTIFY : VCHIQ_POLL_RXNOTIFY);
  	} else {
-		int payload[2] = { (int)bulk->data, bulk->size };
+		uint32_t payload[2] = { (uint32_t)(uintptr_t)bulk->data, bulk->size };
  		VCHIQ_ELEMENT_T element = { payload, sizeof(payload) };
  
  		status = queue_message(state, NULL,
@@ -3526,7 +3548,6 @@ static void
  release_message_sync(VCHIQ_STATE_T *state, VCHIQ_HEADER_T *header)
  {
  	header->msgid = VCHIQ_MSGID_PADDING;
-	wmb();
  	remote_event_signal(&state->remote->sync_release);
  }
  
@@ -3710,12 +3731,12 @@ vchiq_dump_state(void *dump_context, VCHIQ_STATE_T *state)
  	vchiq_dump(dump_context, buf, len + 1);
  
  	len = snprintf(buf, sizeof(buf),
-		"  tx_pos=%x(@%x), rx_pos=%x(@%x)",
+		"  tx_pos=%x(@%zx), rx_pos=%x(@%zx)",
  		state->local->tx_pos,
-		(uint32_t)state->tx_data +
+		(size_t)state->tx_data +
  			(state->local_tx_pos & VCHIQ_SLOT_MASK),
  		state->rx_pos,
-		(uint32_t)state->rx_data +
+		(size_t)state->rx_data +
  			(state->rx_pos & VCHIQ_SLOT_MASK));
  	vchiq_dump(dump_context, buf, len + 1);
  
@@ -3817,8 +3838,8 @@ vchiq_dump_service_state(void *dump_context, VCHIQ_SERVICE_T *service)
  			vchiq_dump(dump_context, buf, len + 1);
  
  			len = snprintf(buf, sizeof(buf),
-				"  Ctrl: tx_count=%d, tx_bytes=%llu, "
-				"rx_count=%d, rx_bytes=%llu",
+				"  Ctrl: tx_count=%d, tx_bytes=%" PRIu64 ", "
+				"rx_count=%d, rx_bytes=%" PRIu64,
  				service->stats.ctrl_tx_count,
  				service->stats.ctrl_tx_bytes,
  				service->stats.ctrl_rx_count,
@@ -3826,8 +3847,8 @@ vchiq_dump_service_state(void *dump_context, VCHIQ_SERVICE_T *service)
  			vchiq_dump(dump_context, buf, len + 1);
  
  			len = snprintf(buf, sizeof(buf),
-				"  Bulk: tx_count=%d, tx_bytes=%llu, "
-				"rx_count=%d, rx_bytes=%llu",
+				"  Bulk: tx_count=%d, tx_bytes=%" PRIu64 ", "
+				"rx_count=%d, rx_bytes=%" PRIu64,
  				service->stats.bulk_tx_count,
  				service->stats.bulk_tx_bytes,
  				service->stats.bulk_rx_count,
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h
index 38ede407f4f4..4e3f41203bc4 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_core.h
@@ -184,12 +184,21 @@ enum {
  #if VCHIQ_ENABLE_DEBUG
  
  #define DEBUG_INITIALISE(local) int *debug_ptr = (local)->debug;
+#if defined(__aarch64__)
+#define DEBUG_TRACE(d) \
+	do { debug_ptr[DEBUG_ ## d] = __LINE__; dsb(sy); } while (0)
+#define DEBUG_VALUE(d, v) \
+	do { debug_ptr[DEBUG_ ## d] = (v); dsb(sy); } while (0)
+#define DEBUG_COUNT(d) \
+	do { debug_ptr[DEBUG_ ## d]++; dsb(sy); } while (0)
+#else
  #define DEBUG_TRACE(d) \
  	do { debug_ptr[DEBUG_ ## d] = __LINE__; dsb(); } while (0)
  #define DEBUG_VALUE(d, v) \
  	do { debug_ptr[DEBUG_ ## d] = (v); dsb(); } while (0)
  #define DEBUG_COUNT(d) \
  	do { debug_ptr[DEBUG_ ## d]++; dsb(); } while (0)
+#endif
  
  #else /* VCHIQ_ENABLE_DEBUG */
  
@@ -265,7 +274,7 @@ typedef struct vchiq_bulk_queue_struct {
  typedef struct remote_event_struct {
  	int armed;
  	int fired;
-	struct semaphore *event;
+	uint32_t event;
  } REMOTE_EVENT_T;
  
  typedef struct opaque_platform_state_t *VCHIQ_PLATFORM_STATE_T;
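REMOTE_EVENT_T lives in memory shared with the 32-bit VideoCore, so a 64-bit `struct semaphore *` no longer fits its layout. The patch instead stores the semaphore's byte offset within VCHIQ_STATE_T (set with offsetof() in vchiq_init_state()) and passes `state` to the remote_event_*() helpers. Those helper bodies are not part of this hunk, so the resolution step below is only a sketch of the idea, with simplified, hypothetical structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct sema {
	int count;
};

struct state {
	int id;
	struct sema trigger_event;
	struct sema recycle_event;
};

/* Shared with the 32-bit peer: every field has a fixed 32-bit width. */
struct remote_event {
	int armed;
	int fired;
	uint32_t event;		/* byte offset of the semaphore in struct state */
};

static void
event_init(struct remote_event *ev, uint32_t offset)
{
	ev->armed = 0;
	ev->fired = 0;
	ev->event = offset;
}

/* Turn the stored offset back into the ARM-local semaphore. */
static struct sema *
event_sema(struct state *st, const struct remote_event *ev)
{
	assert(ev->event + sizeof(struct sema) <= sizeof(*st));
	return ((struct sema *)((char *)st + ev->event));
}
```

Because the shared struct only holds an offset, its size is the same 12 bytes on arm and arm64, and the peer never sees an ARM virtual address.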
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_ioctl.h b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_ioctl.h
index 617479eff136..90348ca4b0d0 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_ioctl.h
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_ioctl.h
@@ -127,4 +127,125 @@ typedef struct {
  #define VCHIQ_IOC_CLOSE_DELIVERED      _IO(VCHIQ_IOC_MAGIC,   17)
  #define VCHIQ_IOC_MAX                  17
  
+
+/*
+ * COMPAT_FREEBSD32
+ */
+
+typedef struct {
+	unsigned int config_size;
+	/*VCHIQ_CONFIG_T * */ uint32_t pconfig;
+} VCHIQ_GET_CONFIG32_T;
+
+typedef struct {
+	unsigned int handle;
+	/*void * */ uint32_t data;
+	unsigned int size;
+	/*void * */ uint32_t userdata;
+	VCHIQ_BULK_MODE_T mode;
+} VCHIQ_QUEUE_BULK_TRANSFER32_T;
+
+typedef struct {
+	unsigned int handle;
+	unsigned int count;
+	const /*VCHIQ_ELEMENT_T * */ uint32_t elements;
+} VCHIQ_QUEUE_MESSAGE32_T;
+
+typedef struct {
+	unsigned int handle;
+	int blocking;
+	unsigned int bufsize;
+	/*void * */ uint32_t buf;
+} VCHIQ_DEQUEUE_MESSAGE32_T;
+
+typedef struct {
+	/*void * */ uint32_t virt_addr;
+	/*size_t*/  uint32_t num_bytes;
+} VCHIQ_DUMP_MEM32_T;
+
+typedef struct {
+	VCHIQ_REASON_T reason;
+	/* VCHIQ_HEADER_T * */ uint32_t header;
+	/* void * */ uint32_t service_userdata;
+	/* void * */ uint32_t bulk_userdata;
+} VCHIQ_COMPLETION_DATA32_T;
+
+typedef struct {
+	unsigned int count;
+	/* VCHIQ_COMPLETION_DATA32_T * */ uint32_t buf;
+	unsigned int msgbufsize;
+	unsigned int msgbufcount; /* IN/OUT */
+	/* void ** */ uint32_t msgbufs;
+} VCHIQ_AWAIT_COMPLETION32_T;
+
+typedef struct vchiq_service_params32_struct {
+	int fourcc;
+	/* VCHIQ_CALLBACK_T */ uint32_t  callback;
+	/*void * */ uint32_t userdata;
+	short version;       /* Increment for non-trivial changes */
+	short version_min;   /* Update for incompatible changes */
+} VCHIQ_SERVICE_PARAMS32_T;
+
+typedef struct {
+	VCHIQ_SERVICE_PARAMS32_T params;
+	int is_open;
+	int is_vchi;
+	unsigned int handle;       /* OUT */
+} VCHIQ_CREATE_SERVICE32_T;
+
+typedef struct {
+	const /*void * */ uint32_t data;
+	unsigned int size;
+} VCHIQ_ELEMENT32_T;
+
+
+#define VCHIQ_IOC_GET_CONFIG32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_GET_CONFIG, \
+		VCHIQ_GET_CONFIG32_T \
+	)
+
+#define VCHIQ_IOC_QUEUE_BULK_TRANSMIT32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_QUEUE_BULK_TRANSMIT, \
+		VCHIQ_QUEUE_BULK_TRANSFER32_T \
+	)
+
+#define VCHIQ_IOC_QUEUE_BULK_RECEIVE32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_QUEUE_BULK_RECEIVE, \
+		VCHIQ_QUEUE_BULK_TRANSFER32_T \
+	)
+
+#define VCHIQ_IOC_QUEUE_MESSAGE32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_QUEUE_MESSAGE, \
+		VCHIQ_QUEUE_MESSAGE32_T \
+	)
+
+#define VCHIQ_IOC_DEQUEUE_MESSAGE32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_DEQUEUE_MESSAGE, \
+		VCHIQ_DEQUEUE_MESSAGE32_T \
+	)
+
+#define VCHIQ_IOC_DUMP_PHYS_MEM32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_DUMP_PHYS_MEM, \
+		VCHIQ_DUMP_MEM32_T \
+	)
+
+#define VCHIQ_IOC_AWAIT_COMPLETION32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_AWAIT_COMPLETION, \
+		VCHIQ_AWAIT_COMPLETION32_T \
+	)
+
+#define VCHIQ_IOC_CREATE_SERVICE32 \
+	_IOC_NEWTYPE( \
+		VCHIQ_IOC_CREATE_SERVICE, \
+		VCHIQ_CREATE_SERVICE32_T \
+	)
+
+
  #endif
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kern_lib.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kern_lib.c
index 1f849a09d854..22b988dcf436 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kern_lib.c
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kern_lib.c
@@ -151,9 +151,9 @@ VCHIQ_STATUS_T vchiq_shutdown(VCHIQ_INSTANCE_T instance)
  					list);
  			list_del(pos);
  			vchiq_log_info(vchiq_arm_log_level,
-					"bulk_waiter - cleaned up %x "
+					"bulk_waiter - cleaned up %zx "
  					"for pid %d",
-					(unsigned int)waiter, waiter->pid);
+					(size_t)waiter, waiter->pid);
  			_sema_destroy(&waiter->bulk_waiter.event);
  
  			kfree(waiter);
@@ -454,8 +454,8 @@ vchiq_blocking_bulk_transfer(VCHIQ_SERVICE_HANDLE_T handle, void *data,
  		list_add(&waiter->list, &instance->bulk_waiter_list);
  		lmutex_unlock(&instance->bulk_waiter_list_mutex);
  		vchiq_log_info(vchiq_arm_log_level,
-				"saved bulk_waiter %x for pid %d",
-				(unsigned int)waiter, current->p_pid);
+				"saved bulk_waiter %zx for pid %d",
+				(size_t)waiter, current->p_pid);
  	}
  
  	return status;
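Most of the vchiq_core.c and vchiq_kern_lib.c changes above follow two patterns: convert a pointer to an integer that is as wide as the pointer before doing arithmetic on it (as in vchiq_init_slots()), and print pointers with a length modifier that matches the cast. A minimal userland sketch of both, assuming 4 KiB slots; SLOT_SIZE, SLOT_MASK, slot_align_padding() and format_ptr() are hypothetical names, not part of vchiq.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for VCHIQ_SLOT_SIZE / VCHIQ_SLOT_MASK. */
#define SLOT_SIZE	4096u
#define SLOT_MASK	(SLOT_SIZE - 1)

/*
 * Padding needed to bring base up to the next slot boundary, the same
 * arithmetic as vchiq_init_slots(). Working on uintptr_t keeps every
 * address bit on LP64 and makes the unsigned wrap-around well defined.
 */
static size_t
slot_align_padding(const void *base)
{
	uintptr_t addr;

	assert(base != NULL);
	addr = (uintptr_t)base;
	return ((size_t)((SLOT_SIZE - addr) & SLOT_MASK));
}

/*
 * Print a pointer as hex without truncation: convert through uintptr_t
 * and use the matching PRIxPTR length modifier.
 */
static int
format_ptr(char *buf, size_t len, const void *p)
{
	return (snprintf(buf, len, "%" PRIxPTR, (uintptr_t)p));
}
```

In the kernel the equivalent is `%zx` with a `(size_t)` cast, or simply `%p`, both of which FreeBSD's printf(9) accepts.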
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c
index 5b47377735f1..5c7cf9035413 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c
@@ -47,7 +47,11 @@ __FBSDID("$FreeBSD$");
  #include <dev/ofw/ofw_bus_subr.h>
  
  #include <machine/bus.h>
+/* XXXMDC Is this necessary at all? */
+#if defined(__aarch64__)
+#else
  #include <machine/fdt.h>
+#endif
  
  #include "vchiq_arm.h"
  #include "vchiq_2835.h"
@@ -78,13 +82,31 @@ struct bcm_vchiq_softc {
  
  static struct bcm_vchiq_softc *bcm_vchiq_sc = NULL;
  
-#define	BSD_DTB			1
-#define	UPSTREAM_DTB		2
+
+#define CONFIG_INVALID 0
+#define CONFIG_VALID (1 << 0)
+#define BSD_REG_ADDRS (1 << 1)
+#define LONG_BULK_SPACE (1 << 2)
+
+/*
+ * Also controls the use of the standard VC address offset for bulk data DMA
+ * (normal bulks use that offset; bulks for long address spaces use physical
+ * page addresses)
+ */
+extern unsigned int g_long_bulk_space;
+
+
+/*
+ * XXXMDC
+ * The man page for ofw_bus_is_compatible describes ``features''
+ * as ``can be used''. Here we treat them as ``must be used''.
+ */
+
  static struct ofw_compat_data compat_data[] = {
-	{"broadcom,bcm2835-vchiq",	BSD_DTB},
-	{"brcm,bcm2835-vchiq",		UPSTREAM_DTB},
-	{"brcm,bcm2711-vchiq",		UPSTREAM_DTB},
-	{NULL,				0}
+	{"broadcom,bcm2835-vchiq",	BSD_REG_ADDRS | CONFIG_VALID},
+	{"brcm,bcm2835-vchiq",		CONFIG_VALID},
+	{"brcm,bcm2711-vchiq",		LONG_BULK_SPACE | CONFIG_VALID},
+	{NULL,				CONFIG_INVALID}
  };
  
  #define	vchiq_read_4(reg)		\
@@ -119,13 +141,23 @@ bcm_vchiq_intr(void *arg)
  void
  remote_event_signal(REMOTE_EVENT_T *event)
  {
-	event->fired = 1;
  
+	wmb();
+
+	event->fired = 1;
  	/* The test on the next line also ensures the write on the previous line
  		has completed */
+	/* XXXMDC: on arm64 that does not seem to hold, hence the dsb(sy) below */
+#if defined(__aarch64__)
+	dsb(sy);
+#endif
  	if (event->armed) {
  		/* trigger vc interrupt */
+#if defined(__aarch64__)
+		dsb(sy);
+#else
  		dsb();
+#endif
  		vchiq_write_4(0x48, 0);
  	}
  }
@@ -134,13 +166,17 @@ static int
  bcm_vchiq_probe(device_t dev)
  {
  
-	if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == 0)
+	if ((ofw_bus_search_compatible(dev, compat_data)->ocd_data & CONFIG_VALID) == 0)
  		return (ENXIO);
  
  	device_set_desc(dev, "BCM2835 VCHIQ");
  	return (BUS_PROBE_DEFAULT);
  }
  
+/* debug_sysctl */
+extern int vchiq_core_log_level;
+extern int vchiq_arm_log_level;
+
  static int
  bcm_vchiq_attach(device_t dev)
  {
@@ -168,14 +204,36 @@ bcm_vchiq_attach(device_t dev)
  		return (ENXIO);
  	}
  
-	if (ofw_bus_search_compatible(dev, compat_data)->ocd_data == UPSTREAM_DTB)
+	uintptr_t dev_compat_d = ofw_bus_search_compatible(dev, compat_data)->ocd_data;
+	/* XXXMDC: shouldn't happen (checked for in probe)--but, for symmetry */
+	if ((dev_compat_d & CONFIG_VALID) == 0) {
+		device_printf(dev, "attempting to attach using invalid config.\n");
+		bus_release_resource(dev, SYS_RES_IRQ, rid, sc->irq_res);
+		return (EINVAL);
+	}
+	if ((dev_compat_d & BSD_REG_ADDRS) == 0)
  		sc->regs_offset = -0x40;
+	if (dev_compat_d & LONG_BULK_SPACE)
+		g_long_bulk_space = 1;
  
  	node = ofw_bus_get_node(dev);
  	if ((OF_getencprop(node, "cache-line-size", &cell, sizeof(cell))) > 0)
  		g_cache_line_size = cell;
  
  	vchiq_core_initialize();
+
+	/* debug_sysctl */
+	struct sysctl_ctx_list *ctx_l = device_get_sysctl_ctx(dev);
+	struct sysctl_oid *tree_node = device_get_sysctl_tree(dev);
+	struct sysctl_oid_list *tree = SYSCTL_CHILDREN(tree_node);
+	SYSCTL_ADD_INT(
+		ctx_l, tree, OID_AUTO, "log", CTLFLAG_RW,
+		&vchiq_core_log_level, vchiq_core_log_level, "log level"
+	);
+	SYSCTL_ADD_INT(
+		ctx_l, tree, OID_AUTO, "arm_log", CTLFLAG_RW,
+		&vchiq_arm_log_level, vchiq_arm_log_level, "arm log level"
+	);
  
  	/* Setup and enable the timer */
  	if (bus_setup_intr(dev, sc->irq_res, INTR_TYPE_MISC | INTR_MPSAFE,
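The compat_data table now carries feature bits instead of the old BSD_DTB/UPSTREAM_DTB values, and probe and attach test individual bits. The decoding can be sketched on its own as below; struct vchiq_cfg and decode_compat() are hypothetical, while the flag values and the -0x40 register offset come from the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as in the patch, parenthesized for use in expressions. */
#define CONFIG_INVALID	0
#define CONFIG_VALID	(1 << 0)
#define BSD_REG_ADDRS	(1 << 1)
#define LONG_BULK_SPACE	(1 << 2)

/* Hypothetical summary of what attach derives from the compat flags. */
struct vchiq_cfg {
	int valid;
	int regs_offset;	/* -0x40 unless the BSD DTB register layout is used */
	int long_bulk_space;	/* bcm2711: bulk DMA uses physical page addresses */
};

static struct vchiq_cfg
decode_compat(uintptr_t flags)
{
	struct vchiq_cfg cfg = { 0, 0, 0 };

	/* Only the three known bits may be set. */
	assert((flags & ~(uintptr_t)(CONFIG_VALID | BSD_REG_ADDRS |
	    LONG_BULK_SPACE)) == 0);
	if ((flags & CONFIG_VALID) == 0)
		return (cfg);
	cfg.valid = 1;
	if ((flags & BSD_REG_ADDRS) == 0)
		cfg.regs_offset = -0x40;
	if (flags & LONG_BULK_SPACE)
		cfg.long_bulk_space = 1;
	return (cfg);
}
```

Without the parentheses, an expression such as `LONG_BULK_SPACE * 2` would expand to `1 << 2 * 2`, which is 16 rather than 8.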
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_pagelist.h b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_pagelist.h
index 72c362464cc2..d1cb9f1e1658 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_pagelist.h
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_pagelist.h
@@ -42,10 +42,10 @@
  #define PAGELIST_READ_WITH_FRAGMENTS 2
  
  typedef struct pagelist_struct {
-	unsigned long length;
-	unsigned short type;
-	unsigned short offset;
-	unsigned long addrs[1];	/* N.B. 12 LSBs hold the number of following
+	uint32_t length;
+	uint16_t type;
+	uint16_t offset;
+	uint32_t addrs[1];	/* N.B. 12 LSBs hold the number of following
  				   pages at consecutive addresses. */
  } PAGELIST_T;
  
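With fixed-width fields, PAGELIST_T keeps the layout the VideoCore expects on both 32- and 64-bit kernels: 12 bytes up to and including addrs[0]. With `unsigned long` it would grow to 24 bytes on arm64 (8 + 2 + 2, 4 bytes of padding, then an 8-byte address). The sketch pins that down with _Static_assert and shows one way to fill addrs[] following the header comment's convention (page-aligned address in the upper bits, number of following contiguous pages in the 12 LSBs). pack_addrs() is a hypothetical helper, not the driver's code, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the patched PAGELIST_T. */
typedef struct pagelist_struct {
	uint32_t length;
	uint16_t type;
	uint16_t offset;
	uint32_t addrs[1];
} PAGELIST_T;

_Static_assert(sizeof(PAGELIST_T) == 12, "PAGELIST_T must match the VC layout");
_Static_assert(offsetof(PAGELIST_T, addrs) == 8, "addrs must start at byte 8");

#define PAGE_SHIFT_4K	12
#define RUN_MASK	((1u << PAGE_SHIFT_4K) - 1)

/*
 * Pack page-aligned bus addresses into pagelist entries, folding each
 * page that directly follows the previous run into that run's count.
 * Returns the number of entries written to out.
 */
static unsigned int
pack_addrs(uint32_t *out, const uint32_t *pages, unsigned int npages)
{
	unsigned int i, n = 0;

	for (i = 0; i < npages; i++) {
		assert((pages[i] & RUN_MASK) == 0);
		if (n > 0) {
			uint32_t base = out[n - 1] & ~RUN_MASK;
			uint32_t run = out[n - 1] & RUN_MASK;

			if (run < RUN_MASK &&
			    pages[i] == base + ((run + 1) << PAGE_SHIFT_4K)) {
				out[n - 1]++;	/* one more following page */
				continue;
			}
		}
		out[n++] = pages[i];	/* start a new run */
	}
	return (n);
}
```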
diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_shim.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_shim.c
index cc8ef2e071f8..f33c545cea45 100644
--- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_shim.c
+++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_shim.c
@@ -398,7 +398,7 @@ EXPORT_SYMBOL(vchi_msg_queuev);
   ***********************************************************/
  int32_t vchi_held_msg_release(VCHI_HELD_MSG_T *message)
  {
-	vchiq_release_message((VCHIQ_SERVICE_HANDLE_T)message->service,
+	vchiq_release_message((VCHIQ_SERVICE_HANDLE_T)(size_t)message->service,
  		(VCHIQ_HEADER_T *)message->message);
  
  	return 0;
@@ -444,7 +444,7 @@ int32_t vchi_msg_hold(VCHI_SERVICE_HANDLE_T handle,
  	*msg_size = header->size;
  
  	message_handle->service =
-		(struct opaque_vchi_service_t *)service->handle;
+		(struct opaque_vchi_service_t *)(unsigned long)service->handle;
  	message_handle->message = header;
  
  	return 0;
-- 
2.32.0




On Mon, Feb 28, 2022 at 12:42:24PM -0700, Warner Losh wrote:
>On Mon, Feb 28, 2022, 12:36 PM Marco Devesas Campos <
>devesas.campos@gmail.com> wrote:
>
>> Entirely right, Ronald — thanks for catching it!
>>
>
>Oops
>
>> Warner, can I send you a consolidated patch later in the week? What’s the
>> best way to submit it?
>>
>
>Git format-patch is likely best.
>
>Warner
>
>
>> Best,
>> Marco
>>
>> On 28 Feb 2022, at 19:26, Ronald Klop <ronald-lists@klop.ws> wrote:
>>
>> On Sun, 27 Feb 2022 17:41:25 +0100, Warner Losh <imp@bsdimp.com> wrote:
>>
>>
>>
>> On Sun, Feb 27, 2022 at 8:44 AM Marco Devesas Campos <
>> devesas.campos@gmail.com> wrote:
>>
>>> Hi, List
>>>
>>> On the back of Ronald Klop's comments (thanks!), I went and got myself an
>>> RPI 4 and it turns out all that was needed was adding the right dtb
>>> reference and it all works (seemingly) fine (incremental patch attached).
>>>
>>
>> I've committed the patch below. If it turns out we need more, we can
>> always augment.
>>
>>
>>
>> Hi Marco, Warner,
>>
>> Isn't the patch from
>> https://lists.freebsd.org/archives/freebsd-arm/2022-February/000949.html
>> needed as well? You mention that the patch below is an incremental one.
>>
>> Regards,
>> Ronald.
>>
>>
>>
>>
>>
>>
>>
>> Warner
>>
>>
>>> One of the potential projects highlighted in the latest call for proposals
>>> was precisely getting HDMI audio output working on 64-bit Pis, viz. the
>>> 400s. If anyone who voted for that reads this list, it would be nice to get
>>> some input on the patches.
>>>
>>> Best,
>>> Marco
>>>
>>>
>>>
>>> diff --git a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c
>>> index dc18678b99a3..344267ff0c1c 100644
>>> --- a/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c
>>> +++ b/sys/contrib/vchiq/interface/vchiq_arm/vchiq_kmod.c
>>> @@ -83,6 +83,7 @@ static struct bcm_vchiq_softc *bcm_vchiq_sc = NULL;
>>>  static struct ofw_compat_data compat_data[] = {
>>>         {"broadcom,bcm2835-vchiq",      BSD_DTB},
>>>         {"brcm,bcm2835-vchiq",          UPSTREAM_DTB},
>>> +       {"brcm,bcm2711-vchiq",          UPSTREAM_DTB},
>>>         {NULL,                          0}
>>>  };
>>>
>>>
>>>
>>> > On 8 Feb 2022, at 08:49, Ronald Klop <ronald-lists@klop.ws> wrote:
>>> >
>>> > From: Ronald Klop <ronald-lists@klop.ws>
>>> > Date: Monday, 7 February 2022 21:05
>>> > To: Marco Devesas Campos <devesas.campos@gmail.com>, freebsd-arm@freebsd.org
>>> > Subject: Re: [PATCH] Experimental vchiq and bcm2835_audio support for arm64
>>> >
>>> > On 2/6/22 14:46, Marco Devesas Campos wrote:
>>> > > Hi Ronald,
>>> > >
>>> > > Thanks so much for trying the patch out.
>>> > >
>>> > >> On 6 Feb 2022, at 13:05, Ronald Klop <ronald-lists@klop.ws> wrote:
>>> > >>
>>> > >> Hi,
>>> > >>
>>> > >> I compiled this on an RPI4 + 14-CURRENT. It boots, but I see no
>>> > >> difference in available devices.
>>> > >> I can try booting it on an RPI3B+ another time.
>>> > >
>>> > > I *think* the GPU/VC in the RPI-4 is a very different beast from the
>>> > > others. I'll look into it, but if you could give it a try on the 3+
>>> > > I'd be much obliged.
>>> > >
>>> > >>
>>> > >> What would be the expected outcome? Where should I look (or listen)?
>>> > >>
>>> > >
>>> > > You should see something like
>>> > >
>>> > >    vchiq0: <BCM2835 VCHIQ> mem 0x7e00b840-0x7e00b87b irq 54 on simplebus0
>>> > >    vchiq: local ver 8 (min 3), remote ver 8.
>>> > >    pcm0: <VCHIQ audio> on vchiq0
>>> > >
>>> > > in your dmesg output.
>>> > >
>>> > > The file /dev/vchiq should exist, as well as the following sysctls (I'm
>>> > > assuming no other audio devices are attached):
>>> > >
>>> > >    % sysctl dev.pcm
>>> > >    dev.pcm.0.trace: 0
>>> > >    ...
>>> > >    dev.pcm.0.dest: 0
>>> > >    ...
>>> > >    dev.pcm.0.%parent: vchiq0
>>> > >    ...
>>> > >    dev.pcm.0.%driver: pcm
>>> > >    dev.pcm.0.%desc: VCHIQ audio
>>> > >    …
>>> > >
>>> > > Then if you `cat < /dev/random > /dev/dsp` you should hear some static
>>> > > coming out of whatever is connected to HDMI (maybe headphones too?
>>> > > Otherwise try setting `sysctl dev.pcm.0.dest=1`.)
>>> > >
>>> > > Best,
>>> > > Marco
>>> >
>>> >
>>> > Hi,
>>> >
>>> > Booted the patched 14-CURRENT on the RPI3B+.
>>> >
>>> > dmesg diff:
>>> > +vchiq0: <BCM2835 VCHIQ> mem 0x7e00b840-0x7e00b87b irq 54 on simplebus0
>>> > +vchiq: local ver 8 (min 3), remote ver 8.
>>> > +pcm0: <VCHIQ audio> on vchiq0
>>> >
>>> > [root@rpi3 ~]# cat /dev/sndstat
>>> > Installed devices:
>>> > pcm0: <VCHIQ audio> (play) default
>>> > No devices installed from userspace.
>>> >
>>> > [root@rpi3 ~]# sysctl dev.pcm
>>> > dev.pcm.0.trace: 0
>>> > dev.pcm.0.starved: 0
>>> > dev.pcm.0.freebuffer: 40000
>>> > dev.pcm.0.underruns: 0
>>> > dev.pcm.0.retrieved: 0
>>> > dev.pcm.0.submitted: 0
>>> > dev.pcm.0.callbacks: 0
>>> > dev.pcm.0.dest: 0
>>> > dev.pcm.0.mode: 3
>>> > dev.pcm.0.bitperfect: 0
>>> > dev.pcm.0.buffersize: 0
>>> > dev.pcm.0.play.vchanformat: s16le:2.0
>>> > dev.pcm.0.play.vchanrate: 48000
>>> > dev.pcm.0.play.vchanmode: fixed
>>> > dev.pcm.0.play.vchans: 1
>>> > dev.pcm.0.%parent: vchiq0
>>> > dev.pcm.0.%pnpinfo:
>>> > dev.pcm.0.%location:
>>> > dev.pcm.0.%driver: pcm
>>> > dev.pcm.0.%desc: VCHIQ audio
>>> > dev.pcm.%parent:
>>> >
>>> >
>>> > To play some audio I need to find some headphones first. :-)
>>> >
>>> > Ronald.
>>> >
>>> >
>>> >
>>> > Good morning,
>>> >
>>> > Found headphones with a cable in the attic. Plugged them into the audio
>>> > jack and played an mp3. Amazing!
>>> >
>>> > Regards,
>>> > Ronald.
>>> >
>>>
>>>
>>>
>>
>>
>>
>>