Under qemu-aarch64-static "wc /dev/null" gets "Unsupported ancillary data: 1/0" from a sendmsg attempt: because of wrong cmsg_len type in target_cmsghdr

Mark Millard marklmi at yahoo.com
Thu Jan 3 09:25:35 UTC 2019


[This note follows the investigation sequence,
ending with the important conclusions.]

My test context here is poudriere-devel bulk -i in an
amd64->aarch64 context.

Running wc /dev/null (or wc //dev/null) produces:

# wc /dev/null
Unsupported ancillary data: 1/0

and then hangs until I type ^C to get back to a prompt.


Here is what ktrace/kdump shows for the process, from just before the
hang through the point where I hit ^C to end it:

. . .
 98475 101033 qemu-aarch64-static 0.000340 CALL  sigprocmask[340](SIG_BLOCK,0x7ffffffe3c80,0x7ffffffe3d80)
 98475 101033 qemu-aarch64-static 0.000003 RET   sigprocmask[340] 0
 98475 101033 qemu-aarch64-static 0.000001 CALL  pselect[522](0x6,0,0x7ffffffe3fb0,0,0,0x7ffffffe3d80)
 98475 101033 qemu-aarch64-static 0.000001 RET   pselect[522] 1
 98475 101033 qemu-aarch64-static 0.000001 CALL  sigprocmask[340](SIG_SETMASK,0x7ffffffe3c80,0)
 98475 101033 qemu-aarch64-static 0.000001 RET   sigprocmask[340] 0
 98475 101033 qemu-aarch64-static 0.000042 CALL  write[4](0x2,0x7ffffffe3480,0x20)
 98475 101033 qemu-aarch64-static 0.000036 GIO   fd 2 wrote 32 bytes
       "Unsupported ancillary data: 1/0
       "
 98475 101033 qemu-aarch64-static 0.000003 RET   write[4] 32/0x20
 98475 101033 qemu-aarch64-static 0.000001 CALL  sendmsg[28](0x5,0x7ffffffe3c28,0)
 98475 101033 qemu-aarch64-static 0.000003 RET   sendmsg[28] -1 errno 22 Invalid argument
 98475 101033 qemu-aarch64-static 0.000184 CALL  close[6](0x3)
 98475 101033 qemu-aarch64-static 0.000040 RET   close[6] 0
 98475 101033 qemu-aarch64-static 0.000017 CALL  close[6](0x7)
 98475 101033 qemu-aarch64-static 0.000005 RET   close[6] 0
 98475 101033 qemu-aarch64-static 0.000002 CALL  sigprocmask[340](SIG_BLOCK,0x7ffffffe3c80,0x7ffffffe3d80)
 98475 101033 qemu-aarch64-static 0.000001 RET   sigprocmask[340] 0
 98475 101033 qemu-aarch64-static 0.000001 CALL  pselect[522](0x6,0x7ffffffe3dd0,0,0,0,0x7ffffffe3d80)
 98475 101539 qemu-aarch64-static 0.000089 RET   nanosleep[240] 0
 98475 101539 qemu-aarch64-static 0.000042 CALL  _umtx_op[454](0x86101f008,UMTX_OP_WAIT_UINT_PRIVATE,0,0,0)
 98475 101033 qemu-aarch64-static 15.845396 RET   pselect[522] -1 errno 4 Interrupted system call

Note the message generated by qemu-aarch64-static and the later:
sendmsg[28] -1 errno 22 Invalid argument

The qemu-*-static code that wrote the message is from
t2h_freebsd_cmsg and is:

        if ((cmsg->cmsg_level == TARGET_SOL_SOCKET) &&
            (cmsg->cmsg_type == SCM_RIGHTS)) {
            int *fd = (int *)data;
            int *target_fd = (int *)target_data;
            int i, numfds = len / sizeof(int);

            for (i = 0; i < numfds; i++) {
                fd[i] = tswap32(target_fd[i]);
            }
        } else if ((cmsg->cmsg_level == TARGET_SOL_SOCKET) &&
            (cmsg->cmsg_type == SCM_TIMESTAMP) &&
            (len == sizeof(struct timeval)))  {
            /* copy struct timeval to host */
            struct timeval *tv = (struct timeval *)data;
            struct target_freebsd_timeval *target_tv =
                (struct target_freebsd_timeval *)target_data;
            __get_user(tv->tv_sec, &target_tv->tv_sec);
            __get_user(tv->tv_usec, &target_tv->tv_usec);
        } else {
            gemu_log("Unsupported ancillary data: %d/%d\n",
                cmsg->cmsg_level, cmsg->cmsg_type);
            memcpy(data, target_data, len);
        }
 
It turns out that the qemu-*-static code has:

struct target_cmsghdr {
    abi_long    cmsg_len;
    int32_t     cmsg_level;
    int32_t     cmsg_type;
};

where, as inspected on amd64, target_cmsghdr has:

(gdb) p/d sizeof(struct target_cmsghdr)
$2 = 16
(gdb) p/d sizeof(((struct target_cmsghdr *)0)->cmsg_len) 
$5 = 8
(gdb) p/d &((struct target_cmsghdr *)0)->cmsg_level
$4 = 8
(gdb) p/d &((struct target_cmsghdr *)0)->cmsg_type 
$1 = 12

which does not match the amd64 or aarch64 native:

struct cmsghdr {
        socklen_t       cmsg_len;               /* data byte count, including hdr */
        int             cmsg_level;             /* originating protocol */
        int             cmsg_type;              /* protocol-specific type */
/* followed by  u_char  cmsg_data[]; */
};                       

because the cmsghdr's cmsg_len is smaller, even on a 64-bit architecture:

(gdb) p/d sizeof(((struct cmsghdr *)0)->cmsg_len)
$6 = 4

/usr/include/arpa/inet.h:typedef	__socklen_t	socklen_t;
/usr/include/netinet/in.h:typedef	__socklen_t	socklen_t;
/usr/include/netinet6/in6.h:typedef	__socklen_t	socklen_t;
/usr/include/sys/_types.h:typedef	__uint32_t	__socklen_t;
/usr/include/sys/socket.h:typedef	__socklen_t	socklen_t;
. . .
/usr/include/netdb.h:typedef	__socklen_t	socklen_t;

so abi_long does not match socklen_t for 64-bit architectures.

So code such as in t2h_freebsd_cmsg:

        cmsg->cmsg_level = tswap32(target_cmsg->cmsg_level);
        cmsg->cmsg_type = tswap32(target_cmsg->cmsg_type);

is not using the correct target offsets when (for example) aarch64 is
the target whose data it is extracting: it reads cmsg_level from
offset 8 and cmsg_type from offset 12, rather than from offsets 4 and 8.
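The mismatch appears to explain the "1/0" in the message. The following
is only a minimal sketch, not qemu code: it lays out a header the way
the native struct cmsghdr would and then reads it back through a struct
shaped like target_cmsghdr. The names wide_cmsghdr, native_cmsghdr,
misread and misread_native_header are hypothetical. It assumes that
abi_long is 64 bits for this target, that SOL_SOCKET is 0xffff and
SCM_RIGHTS is 1 (the FreeBSD values), and that the data starts at
offset 16 with zero-filled padding before it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Shaped like the target_cmsghdr quoted above.
 * Assumption: abi_long is 64 bits wide for an aarch64 target. */
struct wide_cmsghdr {
    int64_t cmsg_len;
    int32_t cmsg_level;
    int32_t cmsg_type;
};

/* Shaped like the native cmsghdr: cmsg_len is a 32-bit socklen_t. */
struct native_cmsghdr {
    uint32_t cmsg_len;
    int32_t  cmsg_level;
    int32_t  cmsg_type;
};

/* Assumed FreeBSD values, renamed to avoid clashing with system headers. */
enum { DEMO_SOL_SOCKET = 0xffff, DEMO_SCM_RIGHTS = 1 };

/* What a reader using the wide layout believes level and type to be. */
struct misread {
    int32_t level;
    int32_t type;
};

static struct misread misread_native_header(void)
{
    unsigned char buf[24];
    struct native_cmsghdr native;
    struct wide_cmsghdr wide;
    struct misread out;
    int32_t fd = 3;                 /* arbitrary descriptor for the demo */

    /* Assumption: padding between header and data is zero-filled. */
    memset(buf, 0, sizeof(buf));

    /* Native 12-byte header, data assumed to start at offset 16. */
    native.cmsg_len = 16 + (uint32_t)sizeof(fd);
    native.cmsg_level = DEMO_SOL_SOCKET;
    native.cmsg_type = DEMO_SCM_RIGHTS;
    memcpy(buf, &native, sizeof(native));
    memcpy(buf + 16, &fd, sizeof(fd));

    /* Read the same bytes back through the 16-byte layout. */
    memcpy(&wide, buf, sizeof(wide));
    out.level = wide.cmsg_level;    /* bytes 8..11: the native cmsg_type */
    out.type = wide.cmsg_type;      /* bytes 12..15: the padding */
    return out;
}
```

Under those assumptions misread_native_header() returns level 1 and
type 0: the real SCM_RIGHTS type is taken for the level, and the
padding is taken for the type, so neither the SCM_RIGHTS nor the
SCM_TIMESTAMP branch is selected.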

For comparison, the native struct cmsghdr on a 64-bit architecture:

(gdb) p/d sizeof(struct cmsghdr)
$1 = 12
(gdb) p/d &((struct cmsghdr *)0)->cmsg_level
$2 = 4
(gdb) p/d &((struct cmsghdr *)0)->cmsg_type 
$3 = 8
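As an untested illustration only, not a proposed patch, the sketch below
checks what a header with a 32-bit cmsg_len would look like against the
native numbers above (size 12, offsets 4 and 8). The names
candidate_cmsghdr, wide_cmsghdr and the two helper functions are
hypothetical, and int64_t stands in for abi_long on the assumption that
abi_long is 64 bits for this target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shaped like the current target_cmsghdr.
 * Assumption: abi_long is 64 bits wide for an aarch64 target. */
struct wide_cmsghdr {
    int64_t cmsg_len;
    int32_t cmsg_level;
    int32_t cmsg_type;
};

/* Hypothetical alternative: a 32-bit cmsg_len, like socklen_t. */
struct candidate_cmsghdr {
    uint32_t cmsg_len;
    int32_t  cmsg_level;
    int32_t  cmsg_type;
};

/* Returns 1 when the candidate layout has the native size and offsets. */
static int candidate_matches_native(void)
{
    return sizeof(struct candidate_cmsghdr) == 12 &&
           offsetof(struct candidate_cmsghdr, cmsg_level) == 4 &&
           offsetof(struct candidate_cmsghdr, cmsg_type) == 8;
}

/* Returns 1 when the wide layout has the native size and offsets. */
static int wide_matches_native(void)
{
    return sizeof(struct wide_cmsghdr) == 12 &&
           offsetof(struct wide_cmsghdr, cmsg_level) == 4 &&
           offsetof(struct wide_cmsghdr, cmsg_type) == 8;
}
```

Here candidate_matches_native() should return 1 and
wide_matches_native() should return 0. Whether a change of this shape
is appropriate for 32-bit targets, or for the code that computes
alignment and lengths around the header, is not something this sketch
addresses.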


I do not yet have a tested change.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


