Subject: Re: VDSO on amd64
From: David Chisnall
To: freebsd-current@freebsd.org
Date: Thu, 25 Nov 2021 09:35:53 +0000

Great news!

Note that your example of throwing an exception from a signal handler works because the signal is delivered during a system call. The compiler generates correct unwind tables for calls because any call may throw. If you did something like a division by zero to get a SIGFPE, or a null-pointer dereference to get a SIGSEGV, then the throw would probably not work (or, rather, it would be delivered to the right place but might corrupt some register state). Neither clang nor GCC enables these non-call exceptions by default (GCC provides them behind -fnon-call-exceptions).

This mechanism is more useful for Java VMs and similar runtimes. Some Linux-based implementations (including Android) use it to avoid explicit null-pointer checks in Java, relying on the fault from a null dereference instead.

The VDSO mechanism in Linux is also used for providing some syscall implementations.
In particular, it handles getting the current approximate time and getting the current CPU (either by reading from the VDSO's data section or by doing a real syscall, without userspace knowing which). It also provides the syscall stub that is used for the kernel transition for all 'real' syscalls. This doesn't matter so much on amd64, but on i386 it lets them select between int 80h, syscall or sysenter, depending on what the hardware supports.

A few questions about future plans:

- Do you have plans to extend the VDSO to provide system call entry points and fast-path syscalls? It would be really nice if we could move all of the libsyscalls bits into the VDSO, so that any compartmentalisation mechanism that wanted to interpose on syscalls just needed to provide a replacement for the VDSO.

- It looks as if the Linux VDSO mechanism (for the Linux ABI emulation) isn't yet using this. Do you plan on moving it over?

- I can't quite tell from kern_sharedpage.c (this file has almost no comments): is the userspace mapping of the VDSO randomised? This has been done on Linux for a while, because the VDSO is an incredibly high-value target for code-reuse attacks (it can do system calls, and it can restore the entire register state from the contents of an on-stack buffer if you can jump into it).

David

On 25/11/2021 02:36, Konstantin Belousov wrote:
> I have mostly finished the implementation of a "proper" vdso for amd64
> native binaries, both 64bit and 32bit. The vdso wraps signal trampolines
> into a real dynamic shared object, which is prelinked into the dynamically
> linked image.
>
> The main (and in fact, now the only) reason for wrapping trampolines
> into the vdso is to provide proper unwind annotation for the signal frame,
> without a need to teach each unwinder about special frame types. In
> reality, most of them are already aware of our signal trampolines,
> since there is no other way to walk over them except to match the
> instruction sequence in the frame.
> Also, we provide the sysctl
> kern.proc.sigtramp, which reports the location of the trampoline.
>
> So this patch should not make much difference for e.g. gdb or lldb.
> On the other hand, I noted that the llvm13 unwinder with the vdso is able
> to catch exceptions thrown from the signal handler, which was a surprise
> to me. Corresponding test code is available at
> https://gist.github.com/b886401fcc92dc37b49316eaf0e871ca
>
> Another advantage for us is that having a vdso allows us to change the
> trampoline code without breaking unwinders.
>
> The vdsos for both the 64bit and 32bit ABIs are put into the existing
> shared page. This means that the total size of both objects should be
> below 4k, and some more space needs to be left available for stuff like
> timehands and fxrng. Using linker tricks, which is where most of the
> complexity in this patch lies, I was able to reduce the size of the
> objects below 1.5k. I believe some more space saving could be achieved,
> but I stopped there for now. Or we might extend the shared region object
> to two pages, if the current situation appears to be too tight.
>
> The implementation can be found at https://reviews.freebsd.org/D32960
>
> Signal delivery for old i386 elf (freebsd 4.x) and a.out binaries has
> not been tested yet.
>
> Your reviews, testing, and any other form of feedback are welcome.
> The work was sponsored by The FreeBSD Foundation.
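To illustrate the point that userspace cannot tell whether the current time comes from the VDSO's data section or from a real syscall, here is a minimal C sketch. monotonic_ns_libc() calls clock_gettime(2) and lets libc pick the path (on Linux via the VDSO, on FreeBSD via the timehands in the shared page when the timecounter allows it), while monotonic_ns_syscall() forces a kernel entry through syscall(2) with SYS_clock_gettime. The helper names, including avg_call_ns(), are hypothetical, and the sketch assumes a platform that still exposes syscall(2), which Linux and FreeBSD do but POSIX does not require.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* glibc: expose syscall() in <unistd.h> */
#endif
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Convert a timespec to a single nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000 + ts->tv_nsec;
}

/* Let libc choose: this may be answered entirely in userspace from the
 * VDSO / shared-page data, or it may fall back to a real syscall. */
int64_t monotonic_ns_libc(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return ts_to_ns(&ts);
}

/* Always enter the kernel, bypassing any userspace fast path. */
int64_t monotonic_ns_syscall(void)
{
    struct timespec ts;
    if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return ts_to_ns(&ts);
}

/* Rough average cost of one call to fn, measured with the fast clock. */
double avg_call_ns(int64_t (*fn)(void), int iterations)
{
    int64_t start = monotonic_ns_libc();
    for (int i = 0; i < iterations; i++)
        (void)fn();
    return (double)(monotonic_ns_libc() - start) / iterations;
}
```

Both functions read CLOCK_MONOTONIC, so a value read through the fast path after one read through the syscall should never be smaller. Comparing avg_call_ns(monotonic_ns_libc, 1000000) with avg_call_ns(monotonic_ns_syscall, 1000000) typically shows the userspace path to be much cheaper, though the exact numbers depend on the hardware and the timecounter in use.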
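On the randomisation question, one way to inspect this on Linux is to ask the kernel where it mapped the VDSO and confirm that it really is an ELF shared object. This sketch is Linux-specific: getauxval(3) with AT_SYSINFO_EHDR (glibc 2.16 and later, also musl) returns the address of the VDSO's ELF header. The helper names find_vdso() and vdso_is_shared_object() are hypothetical. FreeBSD reads the auxiliary vector with elf_aux_info(3) instead; the sketch makes no assumption about which entry, if any, advertises the new FreeBSD vdso.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Pick the ELF header layout matching this process's pointer size. */
#if UINTPTR_MAX > 0xffffffffu
typedef Elf64_Ehdr vdso_ehdr;
#else
typedef Elf32_Ehdr vdso_ehdr;
#endif

/* Address of the VDSO's ELF header as reported by the kernel in the
 * auxiliary vector, or NULL if the kernel did not map a VDSO. */
const vdso_ehdr *find_vdso(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);
    return base != 0 ? (const vdso_ehdr *)(uintptr_t)base : NULL;
}

/* Nonzero if the mapping starts with the ELF magic and is a shared object. */
int vdso_is_shared_object(const vdso_ehdr *eh)
{
    return eh != NULL
        && memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0
        && eh->e_type == ET_DYN;
}
```

Printing (void *)find_vdso() from two separate runs should show different addresses on Linux with ASLR enabled. The ET_DYN check reflects that the VDSO is a real dynamic shared object rather than a bare blob of code, which is what lets unwinders and debuggers read its symbols and unwind tables like any other library.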