[Bug 276985] crash in LinuxKPI/drm
- In reply to: bugzilla-noreply_a_freebsd.org: "[Bug 276985] Crash in scheduler __curthread"
Date: Tue, 03 Sep 2024 12:53:45 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276985

--- Comment #34 from Olivier Certner <olce@FreeBSD.org> ---
(In reply to Tomasz "CeDeROM" CEDRO from comment #30)
> Thanks Olivier. This is my last message in this thread.

I hope not, and I hope you will answer the question I asked you in comment #29.

> I am happy that your setup forks fine, and for your friends, but I would not ship a product like this knowing it does not work for me nor for some people. I just got used to comfort of a release being always rock solid.

I'm really amazed that this is your view of FreeBSD, especially of its graphics stack, after being around for so long. You must have forgotten the early 2010s when, IIRC, you had to stick with cards more than 5 years old to get any acceleration.

Of course, everyone presumably tries their best to ship flawless (or at least functional) software. In practice, however, hardware support can be very difficult (and some hardware is especially... uncooperative), and the graphics stack in particular is a beast of its own (I'm just a newbie here, gradually learning). It isn't even developed in-house but imported from Linux, which requires gradually implementing the Linux KPI, and AFAIU we are undermanned for that task *alone*. Add to that the fact that even Linux DRM drivers are occasionally buggy (formerly more often than not, though the situation seems to have improved in the past few years; perhaps my view is biased as I avoid very recent hardware), and the sheer number of card models, even those based on the same chips. It's simply impossible to test all combinations. I can safely bet that if all amdgpu-supported GPUs really had been broken by the latest FreeBSD stacks, as you were implying, those stacks wouldn't have been released as is, or they would have received a lot more attention.

> Thanks for pointing out its a different issue.

I also responded on that point here, but to be crystal clear, I was referring to what you posted in bug #278212, which seems clearly out of place. By contrast, in your stacks, I see a correspondence with earlier stacks posted here (bug #276985), one by feh@ and one by Vlad, where the crash happens in linux_rcu_cleaner_func() (and there's another one on Reddit). So there may be something in common; I just don't know at this time (perhaps someone else has a hint). If you intend to post more traces/dumps or more explanations of your scenarios, it would indeed be wise to open a new bug.

> Also thank you for pointing out I can work with 5.10, 5.15, and 6.10 on 14 release what is not possible on 13 (some documentation on this would be nice).

It's a very useful possibility indeed. It should be documented, but it can also be inferred by looking at the available ports and the contents of their Makefiles. However, I've heard that the plan going forward is to integrate DRM back into base. If that happens, such substitution will unfortunately no longer be possible.

> If I knew what the problem is and how to fix it I would send patches not crash logs.

Sending crash logs is not the problem I'm pointing out. Crash logs are welcome. Not being able to fix the problem isn't an issue per se either. The problem is where you attached logs and posted comments, and the need to describe things factually, from the scenario of your interactions with the computer to the problem you're experiencing, with details on the hardware and software (versions in particular) used, without conflating or extrapolating. In complex matters like this, it is also invaluable to be able to re-test with slight changes in order to spot differences in behavior. These are areas where you can actually make progress and help.

> Btw there is no need to use offensive aggressive and arrogant language (i.e.
> "you are not willing to test", "you're alone", "spreading FUD by over-generalizing your own case", etc).

"not willing to test" was my feeling, although when I first wrote it I used "willing" in a broader sense than the one you understood. "you're alone" simply re-uses your own words. And "spreading FUD by over-generalizing your own case", as I already explained, just describes a factual reality. I certainly did not intend to be offensive or arrogant, and I don't think I was. I've just been factual, and firm about some principles without which everybody loses time, trust, etc. On the contrary, proclaiming in multiple, loosely related bugs that 14 with DRM is generally unsuitable for production use is what is aggressive and, more importantly, as I showed, wrong and unproductive. There is no denying that there are still problems (I already wrote a rough summary above), and my aim is precisely to nail them down so that we can get rid of them, if possible.

> This is not a constructive and motivating language that I am used for to see here.

On the contrary, my messages are (or at least aim to be) very constructive, and it is exactly in this direction that I'm trying to steer you. [As a side note, concerning the second part of your sentence, given how long you've been around, I don't think you can possibly believe what you wrote. There has certainly been a lot of abusive language in the project, especially in the old days. I do not endorse such language and, perhaps contrary to what you seem to believe, I have not engaged in it here.]

> But I get the point. (...)

Great.

> I am just a bit scared to do this on a production machine you can imagine that.

Of course. But what will you do if your production machine happens to break? Don't you have another very similar or identical setup more or less ready to replace it, especially if downtime is a concern? Such a setup would also be useful for testing things without disturbing your main machine.

> My last question - if you are the drm module maintainer / developer - would it be possible to mark following ports with incremental numbers like 510, 515, 610, not 61 for a newest release please (61 < 515)?

I'm not (at least for now). I understand that, e.g., a name like 'drm-601-kmod' would have been more satisfying, but does this have any relevance in practice? Are you scripting things to automatically update your DRM modules depending on the inferred version (in which case there are probably other means)? And, as mentioned above, I think the plan is for these ports to disappear, so I doubt there will be any change in this area.

Before trying to set up a new machine, could you please answer my very simple question from comment #29: Where did the packages you used to install the DRM modules come from? Did you build them yourself, were they official packages, or ...? If you didn't build them yourself, the first thing to try would be to do exactly that and see whether the problems persist.

--
You are receiving this mail because:
You are the assignee for the bug.