Re: amd64 syscall ABI (vs. Darwin)
- In reply to: Konstantin Belousov : "Re: amd64 syscall ABI (vs. Darwin)"
Date: Mon, 17 Jan 2022 23:14:55 UTC
> On 17 Jan 2022, at 23:51, Konstantin Belousov <kostikbel@gmail.com> wrote:
>
> On Mon, Jan 17, 2022 at 10:31:09PM +0000, Damian's Proton Mail wrote:
>
>>> On 17 Jan 2022, at 14:38, Konstantin Belousov <kostikbel@gmail.com> wrote:
>>
>>> Look at the sys/amd64/amd64/exceptions.S. The fast_syscall entry point
>>> is where we receive control after the syscall instruction.
>>
>> A lot of new things in there for me, but the flow is clear. I was able to find corresponding logic in XNU’s sources too. Earlier I said:
>>
>>> At a first glance Darwin approach seems more optimal
>>
>> But it’s instead the opposite/no difference at all, as in Darwin, they explicitly restore/set all registers, including callee saved r12-r15.
>>
>> Explicitly preserving registers would prevent kernel data leakage too. Doing so in FreeBSD would also be an ABI compatible change I think, since users shouldn’t rely on values in those registers.
>> I’m curious if you see any obvious pros/cons with either approach, or is it just a more arbitrary implementation choice?
>
> We preserve everything on syscall entry, it is the SYSCALL instruction
> behavior that makes it look somewhat convoluted. I suggest you to read
> the SDM description of the SYSCALL instruction to understand the registers
> manipulations on entry.
>
> On the other hand, on the fast syscall return, we indeed not restore
> everything. If you want to restore full frame, use PCB_FULL_IRET pcb
> flag to request iretq return path.
>
>> Not that I’d propose changing the ABI though, I also want my toy project to work as a plug-in kernel module.
>> I guess the only other option to emulate Darwin's behaviour would be to intercept syscalls in userspace somehow first and manually preserve the register values?
>
> To emulate Darwin, you would need specific ABI personality (sysent) in the
> kernel, which would also provide sv_syscall_ret method. The method can
> do whatever is needed to the return frame, and set PCB_FULL_IRET to indicate
> that kernel should load it into CPU GPR file as is.
>
> BTW, does Darwin use SYSCALL instruction for syscall entry on amd64?

Yes, it also uses SYSCALL. Also rax/rdx for return values and the carry bit to indicate errors. Even the syscall numbers are similar. They use different masks to distinguish BSD/Mach syscalls, but the effective BSD syscall numbers seem to be the same so far.

So I already had sysent hooks, and PCB_FULL_IRET works indeed, thanks!
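
For reference, a minimal sketch of the BSD/Mach mask split mentioned above. It assumes the layout I recall from XNU's osfmk/mach/i386/syscall_sw.h (syscall class in bits 24-31 of the value passed in rax, class 1 for Mach traps, class 2 for BSD syscalls); please check the constants against the XNU sources before relying on them. The DARWIN_* names and the helper functions are hypothetical, made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout (from memory of XNU's syscall_sw.h, not verified here):
 * bits 24-31 of the syscall number hold the class, the low 24 bits hold
 * the number within that class.
 */
#define DARWIN_SYSCALL_CLASS_SHIFT  24
#define DARWIN_SYSCALL_CLASS_MASK   (0xFFu << DARWIN_SYSCALL_CLASS_SHIFT)
#define DARWIN_SYSCALL_NUMBER_MASK  (~DARWIN_SYSCALL_CLASS_MASK)

#define DARWIN_SYSCALL_CLASS_MACH   1u  /* Mach traps */
#define DARWIN_SYSCALL_CLASS_UNIX   2u  /* BSD syscalls */

/* Extract the class field from the value userland placed in rax. */
static inline uint32_t
darwin_syscall_class(uint32_t rax)
{
	return ((rax & DARWIN_SYSCALL_CLASS_MASK) >> DARWIN_SYSCALL_CLASS_SHIFT);
}

/* Strip the class bits, leaving the effective syscall number. */
static inline uint32_t
darwin_syscall_number(uint32_t rax)
{
	return (rax & DARWIN_SYSCALL_NUMBER_MASK);
}

/* True if the value names a BSD-class syscall rather than a Mach trap. */
static inline bool
darwin_is_bsd_syscall(uint32_t rax)
{
	return (darwin_syscall_class(rax) == DARWIN_SYSCALL_CLASS_UNIX);
}
```

Under these assumptions a Darwin binary calling write(2) loads 0x2000004 into rax; stripping the class leaves 4, which is also the FreeBSD number for write.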