Unresponsive jails issues
Bjoern A. Zeeb
bzeeb-lists at lists.zabbadoz.net
Mon May 16 13:08:44 UTC 2016
> On 16 May 2016, at 12:55 , Grzegorz Junka <list1 at gjunka.com> wrote:
>
> I have a server running 13 jails for various system services. Recently I added two jails to run simple Go applications for testing. Each application opens a network socket, and nginx, which runs in another jail, balances requests across them round-robin. I mention this because it may be related, though not necessarily, because the problem was happening before I added them.
>
> The problem is that every 2-3 days the jails on my server stop responding. "jexec jailname tcsh" hangs forever, and "service jail stop jailname" hangs forever as well. "top" doesn't show anything suspicious. I can log in to the main server through SSH fine. I don't log in to the jails through SSH, so I can't check that, but they do seem to stop responding, because the services running in them stop too (e.g. web server, imap, ...). I tried to "kill -9" the hung "jexec" process, but that doesn't work.
>
> My first question is what evidence should I gather when that happens so that I can investigate the issue later on after the server is restarted?
>
> And the second question, any idea why that might be happening in the first place?
>
> I am running FreeBSD 10.3 AMD64 updated from 10.2 a couple of weeks ago.
If you can log into the base system and issue commands there, try to see what procstat -k reports for the various jailed processes. You could also check the WCHAN column of ps axl and see if anything suspicious shows up.
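
As a rough sketch of how that output could be saved for later inspection, something along these lines might work when run from the base system before the reboot. The function name collect_jail_evidence and the default directory /var/tmp/jail-hang are made up for illustration; the commands are only the ones mentioned above, plus jls to map jail IDs to names.

```shell
#!/bin/sh
# Sketch only: save the suggested diagnostics to files before rebooting.
# The function name and the default output directory are hypothetical.

collect_jail_evidence() {
    outdir="${1:-/var/tmp/jail-hang}"
    mkdir -p "$outdir" || return 1

    # Full process list; the WCHAN column shows what each process sleeps on.
    ps axl > "$outdir/ps-axl.txt" 2>&1 || true

    # Kernel stacks of all processes (-k stacks, -a all processes).
    # procstat is FreeBSD-specific, so skip it where it is not installed.
    if command -v procstat >/dev/null 2>&1; then
        procstat -k -a > "$outdir/procstat-k.txt" 2>&1 || true
    fi

    # Jail list, so JIDs can be matched to jail names afterwards.
    if command -v jls >/dev/null 2>&1; then
        jls > "$outdir/jls.txt" 2>&1 || true
    fi

    # Print where the files were written.
    echo "$outdir"
}
```

Calling collect_jail_evidence with no argument writes to the default directory; passing a path as the first argument writes there instead. Processes from the hung jails that sit in the same wait channel across several runs would be the first thing to look at.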
/bz
More information about the freebsd-jail mailing list