Re: llvm19 lld issue

From: mmel@freebsd.org <mmel_at_FreeBSD.org>
Date: Fri, 15 Nov 2024 08:40:44 UTC

On 14.11.2024 22:01, Dimitry Andric wrote:
> On 14 Nov 2024, at 13:44, Michal Meloun <mmel@FreeBSD.org> wrote:
>>
>> While searching for the cause of armv7 kernel corruption after updating to llvm19 lld, I came across an interesting problem.
>>
>> - The linker script does not list all generated sections. Specifically, the data sections created by the linker set are not listed.
>>
>> - The linker can place these orphaned sections in any location (OK, with some restrictions). See https://maskray.me/blog/2024-06-02-understanding-orphan-sections.
>>
>> - Creating symbols outside a section is fragile and subject to error; the linker may place an orphaned section between the symbol definition and the following section.
>>
>> We ran into this problem many years ago, see https://github.com/freebsd/freebsd-src/commit/6e764e36da019837d90e3b4b712871ee4442637a. Unfortunately, we didn't fix it completely then, and we have to address the same corruption again.
>>
>> I think we should be strict in this area and use '--orphan-handling=error' for kernel linking. However, I'm not sure we can handle linker sets gracefully.
>>
>> Any comments, contrary opinion or better solution ? Does anyone know how to properly list all linker sets (mainly but not only 'set_<foo>_set') in linker script and which section is appropriate for them ? .rodata?
> 
> I tried adding --orphan-handling=error, and on buildkernel (even for amd64) I get pretty soon:
> 
> --- all_subdir_accf_data ---
> ld: error: accf_data.o:(.data) is being placed in '.data'
> ld: error: accf_data.o:(set_modmetadata_set) is being placed in 'set_modmetadata_set'
> ld: error: accf_data.o:(set_sysinit_set) is being placed in 'set_sysinit_set'
> ld: error: accf_data.o:(.debug_loc) is being placed in '.debug_loc'
> ld: error: accf_data.o:(.debug_abbrev) is being placed in '.debug_abbrev'
> ld: error: accf_data.o:(.debug_info) is being placed in '.debug_info'
> ld: error: accf_data.o:(.debug_ranges) is being placed in '.debug_ranges'
> ld: error: accf_data.o:(.debug_str) is being placed in '.debug_str'
> ld: error: accf_data.o:(.comment) is being placed in '.comment'
> ld: error: accf_data.o:(.debug_frame) is being placed in '.debug_frame'
> ld: error: accf_data.o:(.debug_line) is being placed in '.debug_line'
> ld: error: accf_data.o:(.llvm_addrsig) is being placed in '.llvm_addrsig'
> ld: error: accf_data.o:(.SUNW_ctf) is being placed in '.SUNW_ctf'
> ld: error: <internal>:(.note.gnu.build-id) is being placed in '.note.gnu.build-id'
> ld: error: <internal>:(.note.GNU-stack) is being placed in '.note.GNU-stack'
> ld: error: <internal>:(.symtab) is being placed in '.symtab'
> ld: error: <internal>:(.shstrtab) is being placed in '.shstrtab'
> ld: error: <internal>:(.strtab) is being placed in '.strtab'
> --- all_subdir_aic7xxx ---
> --- all_subdir_aic7xxx/ahc ---
> --- machine ---
> machine -> /home/dim/src/freebsd/src/sys/amd64/include
> --- all_subdir_accf_data ---
> *** [accf_data.ko.full] Error code 1
> 
> 
> Not sure if those are all really orphaned, though?
> 
> -Dimitry
> 
Most of them are not orphaned, and I think they should be placed 
explicitly. Annoying as it is, we should probably keep a list of the 
sections used in the kernel (one list is sufficient for all 
architectures) and include it in the ldscripts for the individual 
architectures (it's about 24 lines now).

After discussion with jrtc27 (thanks a lot for your patience), I think 
we have only three options besides explicitly listing all kernel sections:

1) Leave the ldscripts as they are, but prefix each <foo>_start symbol 
with a guard, i.e. an explicit assignment to the location counter 
('. = .' or ALIGN()).

2) Move all <foo>_start/end symbols that are currently defined outside 
a section into the appropriate sections.

3) Add the linker option '--orphan-handling=error' and declare/discard 
all compiler-generated sections.
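
For illustration, the difference between the current layout and options 
1 and 2 might look like the ldscript fragments below. This is only a 
sketch: the section name .example, the symbols __example_start and 
__example_end, and the alignment value are hypothetical and not taken 
from our ldscripts. The three fragments are alternatives, each meant to 
sit inside SECTIONS { }.

```
/* Current, fragile form: the symbol is assigned outside any output
   section, so an orphan section may be placed between the symbol
   and .example. */
__example_start = .;
.example : { *(.example) }
__example_end = .;

/* Option 1: guard the symbol with an explicit assignment to the
   location counter (". = .;" or ". = ALIGN(...);"). */
. = ALIGN(4);
__example_start = .;
.example : { *(.example) }

/* Option 2: define the symbols inside the output section they
   describe, so nothing can end up between a symbol and the
   section contents. */
.example : {
	__example_start = .;
	*(.example)
	__example_end = .;
}
```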


I definitely don't like option 1. It's too fragile and depends on 
poorly defined linker behavior.

Option 2 is easy and robust and, together with explicit placement of 
all kernel sections, seems sufficient.

In my opinion, we can combine options 2 and 3 to get the most 
robust solution.
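
A rough sketch of what option 3 would add on top of option 2, assuming 
the usual ld script syntax. The selection of sections is only an 
example, based on the section names in the errors Dimitry posted; 
whether each of them should be kept or discarded is still open. Linker 
sets are deliberately left out, since how to list them properly is the 
unanswered question.

```
/* Link with: --orphan-handling=error */
SECTIONS
{
	/* ... loadable kernel sections, start/end symbols inside ... */

	/* Non-allocated sections that are kept must now be declared. */
	.comment 0 : { *(.comment) }
	.debug_info 0 : { *(.debug_info) }
	.debug_abbrev 0 : { *(.debug_abbrev) }
	.debug_line 0 : { *(.debug_line) }
	.debug_str 0 : { *(.debug_str) }
	.symtab 0 : { *(.symtab) }
	.strtab 0 : { *(.strtab) }
	.shstrtab 0 : { *(.shstrtab) }

	/* Sections we do not want in the output are discarded. */
	/DISCARD/ : { *(.note.GNU-stack) }
}
```

With this in place, any input section that is neither declared nor 
discarded becomes a link error instead of being placed silently.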

Another problem is that an explicit list of kernel sections could 
make modules built outside the tree interact badly with the linker 
script.


What is your preference?

Michal