kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?
Mark Millard
marklmi at yahoo.com
Mon Jun 10 23:29:27 UTC 2019
[Forcing an appropriate large .sbss alignment was not enough
to avoid the clang-based problem for *sp++ related environ
code in _init_tls .]
On 2019-Jun-10, at 12:20, Mark Millard <marklmi at yahoo.com> wrote:
> [I decided to compare some readelf information from some
> other architectures. I was surprised by some of it. But
> .bss seems to be forced to start with a large alignment
> to avoid such issues as I originally traced.]
>
> On 2019-Jun-10, at 11:24, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [Looks like Conrad M. is partially confirming that my trace of
>> the issue is reasonable.]
>>
>> On 2019-Jun-10, at 07:37, Conrad Meyer <cem at freebsd.org> wrote:
>>
>>> Hi Mark,
>>>
>>> On Sun, Jun 9, 2019 at 11:17 PM Mark Millard via freebsd-hackers
>>> <freebsd-hackers at freebsd.org> wrote:
>>>> ...
>>>> vm_pager_get_pages uses vm_page_zero_invalid
>>>> to "Zero out partially filled data".
>>>>
>>>> But vm_page_zero_invalid does not zero every "invalid"
>>>> byte but works in terms of units of DEV_BSIZE :
>>>> ...
>>>> The comment indicates that areas of "sub-DEV_BSIZE"
>>>> should have been handled previously by
>>>> vm_page_set_validclean .
>>>
>>> Or another VM routine, yes (e.g., vm_page_set_valid_range). The valid
>>> and dirty bitmasks in vm_page only have a single bit per DEV_BSIZE
>>> region, so care must be taken when marking any sub-DEV_BSIZE region as
>>> valid to zero out the rest of the DEV_BSIZE region. This is part of
>>> the VM page contract. I'm not sure it's related to the BSS, though.
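To make the one-bit-per-DEV_BSIZE point concrete, here is a minimal user-space sketch in C. It is only modeled on the idea described above, not taken from the kernel: the sk_ names are hypothetical, DEV_BSIZE is assumed to be 512, and a 4 KiB page is assumed. It computes which valid bits cover a byte range and how many bytes after the end of the data sit in a chunk that is marked valid anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 512-byte DEV_BSIZE, 4 KiB page. */
#define SK_DEV_BSHIFT	9
#define SK_DEV_BSIZE	(1u << SK_DEV_BSHIFT)
#define SK_PAGE_SIZE	4096u

/*
 * Hypothetical helper: mask with one bit per DEV_BSIZE chunk touched
 * by the bytes [base, base + size) of a single page.
 */
static uint32_t
sk_valid_bits(unsigned base, unsigned size)
{
	unsigned first_bit, last_bit;

	assert(size > 0 && base + size <= SK_PAGE_SIZE);
	first_bit = base >> SK_DEV_BSHIFT;
	last_bit = (base + size - 1) >> SK_DEV_BSHIFT;
	return ((2u << last_bit) - (1u << first_bit));
}

/*
 * Hypothetical helper: number of bytes between `end` and the end of
 * the DEV_BSIZE chunk that holds byte end - 1.  These bytes belong to
 * a chunk whose valid bit is set even though no data was put there,
 * so zeroing only the chunks whose bit is clear never reaches them.
 */
static unsigned
sk_unbacked_tail(unsigned end)
{
	assert(end > 0 && end <= SK_PAGE_SIZE);
	return ((SK_DEV_BSIZE - (end & (SK_DEV_BSIZE - 1))) &
	    (SK_DEV_BSIZE - 1));
}
```

For an illustrative end offset of 0x848 within the page, sk_valid_bits(0, 0x848) gives 0x1f (chunks 0 through 4) and sk_unbacked_tail(0x848) gives 0x1b8: the bytes 0x848 through 0x9ff are in a "valid" chunk and would need to be zeroed by whoever marked the range valid.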
>>
>> Yea, I had written from what I'd seen in __elfN(load_section):
>>
>> QUOTE
>> __elfN(load_section) uses vm_imgact_map_page
>> to set up for its copyout. This appears to be
>> how the FileSiz (not including .sbss or .bss)
>> vs. MemSiz (including .sbss and .bss) is
>> handled (attempted?).
>> END QUOTE
>>
>> The copyout only copies through the last byte for filesz
>> but the vm_imgact_map_page does not zero out all the
>> bytes after that on that page:
>>
>> /*
>> * We have to get the remaining bit of the file into the first part
>> * of the oversized map segment. This is normally because the .data
>> * segment in the file is extended to provide bss. It's a neat idea
>> * to try and save a page, but it's a pain in the behind to implement.
>> */
>> copy_len = filsz == 0 ? 0 : (offset + filsz) - trunc_page(offset +
>> filsz);
>> map_addr = trunc_page((vm_offset_t)vmaddr + filsz);
>> map_len = round_page((vm_offset_t)vmaddr + memsz) - map_addr;
>> . . .
>> if (copy_len != 0) {
>> sf = vm_imgact_map_page(object, offset + filsz);
>> if (sf == NULL)
>> return (EIO);
>>
>> /* send the page fragment to user space */
>> off = trunc_page(offset + filsz) - trunc_page(offset + filsz);
>> error = copyout((caddr_t)sf_buf_kva(sf) + off,
>> (caddr_t)map_addr, copy_len);
>> vm_imgact_unmap_page(sf);
>> if (error != 0)
>> return (error);
>> }
>>
>> I looked into the details of the DEV_BSIZE code after sending
>> the original message and so realized that the /sbin/init
>> readelf material I had provided was a good example of the
>> issue, unless I'd missed something.
>>
>>>> So, if, say, char**environ ends up at the start of .sbss
>>>> consistently, does environ always end up zeroed independently
>>>> of FileSz for the PT_LOAD that spans them?
>>>
>>> It is required to be zeroed, yes. If not, there is a bug. If FileSz
>>> covers BSS, that's a bug in the linker. Either the trailing bytes of
>>> the corresponding page in the executable should be zero (wasteful; on
>>> amd64 ".comment" is packed in there instead), or the linker/loader
>>> must zero them at initialization. I'm not familiar with the
>>> particular details here, but if you are interested I would suggest
>>> looking at __elfN(load_section) in sys/kern/imgact_elf.c.
>>
>> I had looked at it some, see the material around the earlier quote
>> above.
>>
>>>> The following is not necessarily an example of problematical
>>>> figures but is just for showing an example structure of what
>>>> FileSiz covers vs. MemSiz for PT_LOAD's that involve .sbss
>>>> and .bss :
>>>> ...
>>>
>>> Your 2nd LOAD phdr's FileSiz matches up exactly with Segment .sbss
>>> Offset minus Segment .tdata Offset, i.e., none of the FileSiz
>>> corresponds to the (s)bss regions. (Good! At least the static linker
>>> part looks sane.) That said, the boundary is not page-aligned and the
>>> section alignment requirement is much lower than page_size, so the
>>> beginning of bss will share a file page with some data. Something
>>> should zero it at image activation.
>>
>> And, so far, I've not found anything in _start or before that does
>> zero any "sub-DEV_BSIZE" part after FileSz for the PT_LOAD in
>> question.
>>
>> Thanks for checking my trace of the issue. It is good to have some
>> confirmation that I'd not missed something.
>>
>>> (Tangent: sbss/bss probably do not need to be RWE on PPC! On amd64,
>>> init has three LOAD segments rather than two: one for rodata (R), one
>>> for .text, .init, etc (RX); and one for .data (RW).)
>>
>> Yea, the section header flags indicate just WA for .sbss and .bss (but
>> WAX for .got).
>>
>> But this is more general: for example, the beginning of .rodata
>> (not executable) shares the tail part of a page with .fini
>> (executable) in the example. .got has executable code but is in
>> the middle of sections that do not. For something like /sbin/init it
>> is so small that the middle of a page can be the only part that is
>> executable, as in the example. (It is not forced onto its own page.)
>>
>> The form of .got used is also writable: WAX for section header flags.
>
>
>
> amd64's /sbin/init :
>
> There are 9 program headers, starting at offset 64
>
> Program Headers:
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R 0x8
> LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R 0x1000
> LOAD 0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
> LOAD 0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW 0x1000
> TLS 0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R 0x10
> GNU_RELRO 0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R 0x1
> GNU_EH_FRAME 0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R 0x4
> GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
> NOTE 0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R 0x4
>
> Section to Segment mapping:
> Segment Sections...
> 00
> 01 .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame
> 02 .text .init .fini .plt
> 03 .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss
> 04 .tdata .tbss
> 05 .tdata .tbss .ctors .dtors .jcr .init_array .fini_array
> 06 .eh_frame_hdr
> 07
> 08 .note.tag
> There are 27 section headers, starting at offset 0x157938:
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> [ 0] NULL 0000000000000000 000000 000000 00 0 0 0
> [ 1] .note.tag NOTE 0000000000200238 000238 000048 00 A 0 0 4
> [ 2] .rela.plt RELA 0000000000200280 000280 000030 18 AI 0 11 8
> [ 3] .rodata PROGBITS 00000000002002c0 0002c0 01afb0 00 AMS 0 0 64
> [ 4] .eh_frame_hdr PROGBITS 000000000021b270 01b270 00504c 00 A 0 0 4
> [ 5] .eh_frame PROGBITS 00000000002202c0 0202c0 019bd4 00 A 0 0 8
> [ 6] .text PROGBITS 000000000023a000 03a000 0e8dfc 00 AX 0 0 16
> [ 7] .init PROGBITS 0000000000322dfc 122dfc 00000e 00 AX 0 0 4
> [ 8] .fini PROGBITS 0000000000322e0c 122e0c 00000e 00 AX 0 0 4
> [ 9] .plt PROGBITS 0000000000322e20 122e20 000020 00 AX 0 0 16
> [10] .data PROGBITS 0000000000323000 123000 003a80 00 WA 0 0 16
> [11] .got.plt PROGBITS 0000000000326a80 126a80 000010 00 WA 0 0 8
> [12] .tdata PROGBITS 0000000000327000 127000 001800 00 WAT 0 0 16
> [13] .tbss NOBITS 0000000000328800 128800 000020 00 WAT 0 0 8
> [14] .ctors PROGBITS 0000000000328800 128800 000010 00 WA 0 0 8
> [15] .dtors PROGBITS 0000000000328810 128810 000010 00 WA 0 0 8
> [16] .jcr PROGBITS 0000000000328820 128820 000008 00 WA 0 0 8
> [17] .init_array INIT_ARRAY 0000000000328828 128828 000018 00 WA 0 0 8
> [18] .fini_array FINI_ARRAY 0000000000328840 128840 000008 00 WA 0 0 8
> [19] .bss NOBITS 0000000000329000 128848 2321d9 00 WA 0 0 64
> [20] .comment PROGBITS 0000000000000000 128848 0074d4 01 MS 0 0 1
> [21] .gnu.warning.mkte PROGBITS 0000000000000000 12fd1c 000043 00 0 0 1
> [22] .gnu.warning.f_pr PROGBITS 0000000000000000 12fd5f 000043 00 0 0 1
> [23] .gnu_debuglink PROGBITS 0000000000000000 1478b0 000010 00 0 0 1
> [24] .shstrtab STRTAB 0000000000000000 1478c0 0000f1 00 0 0 1
> [25] .symtab SYMTAB 0000000000000000 12fda8 017b08 18 26 1707 8
> [26] .strtab STRTAB 0000000000000000 1479b1 00ff84 00 0 0 1
>
> Note that there is space after .fini_array+8 before .bss starts
> with a sizable alignment. The MemSiz for 03 does span .bss .
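As a cross-check of the amd64 figures, the copy_len / map_addr / map_len arithmetic from the __elfN(load_section) excerpt quoted earlier can be replayed in user space. This is only a sketch: the sk_ names are hypothetical stand-ins for trunc_page and round_page, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE	0x1000ULL	/* assumption: 4 KiB pages */

/* Hypothetical stand-ins for the kernel's trunc_page/round_page. */
static uint64_t
sk_trunc_page(uint64_t x)
{
	return (x & ~(SK_PAGE_SIZE - 1));
}

static uint64_t
sk_round_page(uint64_t x)
{
	return ((x + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1));
}

struct sk_tail {
	uint64_t copy_len;	/* file bytes copied into the last file page */
	uint64_t map_addr;	/* start of the region holding that fragment */
	uint64_t map_len;	/* length of that region through MemSiz */
};

/* Same three expressions as in the quoted excerpt. */
static struct sk_tail
sk_segment_tail(uint64_t offset, uint64_t vmaddr, uint64_t filsz,
    uint64_t memsz)
{
	struct sk_tail t;

	assert(memsz >= filsz);
	t.copy_len = filsz == 0 ? 0 :
	    (offset + filsz) - sk_trunc_page(offset + filsz);
	t.map_addr = sk_trunc_page(vmaddr + filsz);
	t.map_len = sk_round_page(vmaddr + memsz) - t.map_addr;
	return (t);
}
```

With the RW LOAD line above, sk_segment_tail(0x123000, 0x323000, 0x5848, 0x2381d9) gives copy_len 0x848, map_addr 0x328000 and map_len 0x234000. The 0x848-byte fragment therefore lands on the page at 0x328000, while .bss starts one page later at 0x329000.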
>
> armv7's /sbin/init is different about MemSiz spanning .bss:
>
> Program Headers:
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> PHDR 0x000034 0x00010034 0x00010034 0x00120 0x00120 R 0x4
> LOAD 0x000000 0x00010000 0x00010000 0x10674 0x10674 R 0x1000
> LOAD 0x011000 0x00021000 0x00021000 0xe9c54 0xe9c54 R E 0x1000
> LOAD 0x0fb000 0x0010b000 0x0010b000 0x03b88 0x30ccd RW 0x1000
> TLS 0x0fe000 0x0010e000 0x0010e000 0x00b60 0x00b70 R 0x20
> GNU_RELRO 0x0fe000 0x0010e000 0x0010e000 0x00b88 0x00b88 R 0x1
> GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0
> NOTE 0x000154 0x00010154 0x00010154 0x00064 0x00064 R 0x4
> ARM_EXIDX 0x0001b8 0x000101b8 0x000101b8 0x00220 0x00220 R 0x4
>
> (NOTE: 0x0010b000+0x30ccd==0x13BCCD . Compare this to the later .bss
> Addr of 0x10f000.)
>
> Section to Segment mapping:
> Segment Sections...
> 00
> 01 .note.tag .ARM.exidx .rodata .ARM.extab
> 02 .text .init .fini
> 03 .data .tdata .tbss .jcr .init_array .fini_array .got .bss
> 04 .tdata .tbss
> 05 .tdata .tbss .jcr .init_array .fini_array .got
> 06
> 07 .note.tag
> 08 .ARM.exidx
> There are 24 section headers, starting at offset 0x12be3c:
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> [ 0] NULL 00000000 000000 000000 00 0 0 0
> [ 1] .note.tag NOTE 00010154 000154 000064 00 A 0 0 4
> [ 2] .ARM.exidx ARM_EXIDX 000101b8 0001b8 000220 00 A 5 0 4
> [ 3] .rodata PROGBITS 00010400 000400 01022c 00 AMS 0 0 64
> [ 4] .ARM.extab PROGBITS 0002062c 01062c 000048 00 A 0 0 4
> [ 5] .text PROGBITS 00021000 011000 0e9c14 00 AX 0 0 128
> [ 6] .init PROGBITS 0010ac20 0fac20 000014 00 AX 0 0 16
> [ 7] .fini PROGBITS 0010ac40 0fac40 000014 00 AX 0 0 16
> [ 8] .data PROGBITS 0010b000 0fb000 002734 00 WA 0 0 8
> [ 9] .tdata PROGBITS 0010e000 0fe000 000b60 00 WAT 0 0 16
> [10] .tbss NOBITS 0010eb60 0feb60 000010 00 WAT 0 0 4
> [11] .jcr PROGBITS 0010eb60 0feb60 000000 00 WA 0 0 4
> [12] .init_array INIT_ARRAY 0010eb60 0feb60 000008 00 WA 0 0 4
> [13] .fini_array FINI_ARRAY 0010eb68 0feb68 000004 00 WA 0 0 4
> [14] .got PROGBITS 0010eb6c 0feb6c 00001c 00 WA 0 0 4
> [15] .bss NOBITS 0010f000 0feb88 02cccd 00 WA 0 0 64
> [16] .comment PROGBITS 00000000 0feb88 0074b6 01 MS 0 0 1
> [17] .ARM.attributes ARM_ATTRIBUTES 00000000 10603e 00004f 00 0 0 1
> [18] .gnu.warning.mkte PROGBITS 00000000 10608d 000043 00 0 0 1
> [19] .gnu.warning.f_pr PROGBITS 00000000 1060d0 000043 00 0 0 1
> [20] .gnu_debuglink PROGBITS 00000000 11b314 000010 00 0 0 1
> [21] .shstrtab STRTAB 00000000 11b324 0000e3 00 0 0 1
> [22] .symtab SYMTAB 00000000 106114 015200 10 23 3063 4
> [23] .strtab STRTAB 00000000 11b407 010a32 00 0 0 1
>
> Note that there is space after .got+0x1c before .bss starts
> with a sizable alignment. The MemSiz for 03 does *not* span
> .bss , unlike for amd64 (and the rest).
>
>
> aarch64's /sbin/init is similar to amd64 instead of armv7:
>
> Program Headers:
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x0001c0 0x0001c0 R 0x8
> LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x01624f 0x01624f R 0x10000
> LOAD 0x020000 0x0000000000220000 0x0000000000220000 0x0dd354 0x0dd354 R E 0x10000
> LOAD 0x100000 0x0000000000300000 0x0000000000300000 0x011840 0x252111 RW 0x10000
> TLS 0x110000 0x0000000000310000 0x0000000000310000 0x001800 0x001820 R 0x40
> GNU_RELRO 0x110000 0x0000000000310000 0x0000000000310000 0x001840 0x001840 R 0x1
> GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
> NOTE 0x000200 0x0000000000200200 0x0000000000200200 0x000048 0x000048 R 0x4
>
> Section to Segment mapping:
> Segment Sections...
> 00
> 01 .note.tag .rodata
> 02 .text .init .fini
> 03 .data .tdata .tbss .jcr .init_array .fini_array .got .bss
> 04 .tdata .tbss
> 05 .tdata .tbss .jcr .init_array .fini_array .got
> 06
> 07 .note.tag
> There are 21 section headers, starting at offset 0x14b6f0:
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> [ 0] NULL 0000000000000000 000000 000000 00 0 0 0
> [ 1] .note.tag NOTE 0000000000200200 000200 000048 00 A 0 0 4
> [ 2] .rodata PROGBITS 0000000000200280 000280 015fcf 00 AMS 0 0 64
> [ 3] .text PROGBITS 0000000000220000 020000 0dd31c 00 AX 0 0 64
> [ 4] .init PROGBITS 00000000002fd320 0fd320 000014 00 AX 0 0 16
> [ 5] .fini PROGBITS 00000000002fd340 0fd340 000014 00 AX 0 0 16
> [ 6] .data PROGBITS 0000000000300000 100000 003a20 00 WA 0 0 16
> [ 7] .tdata PROGBITS 0000000000310000 110000 001800 00 WAT 0 0 16
> [ 8] .tbss NOBITS 0000000000311800 111800 000020 00 WAT 0 0 8
> [ 9] .jcr PROGBITS 0000000000311800 111800 000000 00 WA 0 0 8
> [10] .init_array INIT_ARRAY 0000000000311800 111800 000018 00 WA 0 0 8
> [11] .fini_array FINI_ARRAY 0000000000311818 111818 000008 00 WA 0 0 8
> [12] .got PROGBITS 0000000000311820 111820 000020 00 WA 0 0 8
> [13] .bss NOBITS 0000000000320000 111840 232111 00 WA 0 0 64
> [14] .comment PROGBITS 0000000000000000 111840 007191 01 MS 0 0 1
> [15] .gnu.warning.mkte PROGBITS 0000000000000000 1189d1 000043 00 0 0 1
> [16] .gnu.warning.f_pr PROGBITS 0000000000000000 118a14 000043 00 0 0 1
> [17] .gnu_debuglink PROGBITS 0000000000000000 13b7f8 000010 00 0 0 1
> [18] .shstrtab STRTAB 0000000000000000 13b808 0000bd 00 0 0 1
> [19] .symtab SYMTAB 0000000000000000 118a58 022da0 18 20 3621 8
> [20] .strtab STRTAB 0000000000000000 13b8c5 00fe2b 00 0 0 1
>
> Note that there is space after .got+0x20 before .bss starts
> with a sizable alignment. The MemSiz for 03 does span
> .bss , like for amd64 (and all but armv7).
>
> powerpc64's /sbin/init is similar to amd64 as well:
>
> Program Headers:
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> PHDR 0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R 0x8
> LOAD 0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R 0x1000
> LOAD 0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
> LOAD 0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW 0x1000
> TLS 0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R 0x10
> GNU_RELRO 0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R 0x1
> GNU_EH_FRAME 0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R 0x4
> GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
> NOTE 0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R 0x4
>
> Section to Segment mapping:
> Segment Sections...
> 00
> 01 .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame
> 02 .text .init .fini .plt
> 03 .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss
> 04 .tdata .tbss
> 05 .tdata .tbss .ctors .dtors .jcr .init_array .fini_array
> 06 .eh_frame_hdr
> 07
> 08 .note.tag
> There are 27 section headers, starting at offset 0x157938:
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> [ 0] NULL 0000000000000000 000000 000000 00 0 0 0
> [ 1] .note.tag NOTE 0000000000200238 000238 000048 00 A 0 0 4
> [ 2] .rela.plt RELA 0000000000200280 000280 000030 18 AI 0 11 8
> [ 3] .rodata PROGBITS 00000000002002c0 0002c0 01afb0 00 AMS 0 0 64
> [ 4] .eh_frame_hdr PROGBITS 000000000021b270 01b270 00504c 00 A 0 0 4
> [ 5] .eh_frame PROGBITS 00000000002202c0 0202c0 019bd4 00 A 0 0 8
> [ 6] .text PROGBITS 000000000023a000 03a000 0e8dfc 00 AX 0 0 16
> [ 7] .init PROGBITS 0000000000322dfc 122dfc 00000e 00 AX 0 0 4
> [ 8] .fini PROGBITS 0000000000322e0c 122e0c 00000e 00 AX 0 0 4
> [ 9] .plt PROGBITS 0000000000322e20 122e20 000020 00 AX 0 0 16
> [10] .data PROGBITS 0000000000323000 123000 003a80 00 WA 0 0 16
> [11] .got.plt PROGBITS 0000000000326a80 126a80 000010 00 WA 0 0 8
> [12] .tdata PROGBITS 0000000000327000 127000 001800 00 WAT 0 0 16
> [13] .tbss NOBITS 0000000000328800 128800 000020 00 WAT 0 0 8
> [14] .ctors PROGBITS 0000000000328800 128800 000010 00 WA 0 0 8
> [15] .dtors PROGBITS 0000000000328810 128810 000010 00 WA 0 0 8
> [16] .jcr PROGBITS 0000000000328820 128820 000008 00 WA 0 0 8
> [17] .init_array INIT_ARRAY 0000000000328828 128828 000018 00 WA 0 0 8
> [18] .fini_array FINI_ARRAY 0000000000328840 128840 000008 00 WA 0 0 8
> [19] .bss NOBITS 0000000000329000 128848 2321d9 00 WA 0 0 64
> [20] .comment PROGBITS 0000000000000000 128848 0074d4 01 MS 0 0 1
> [21] .gnu.warning.mkte PROGBITS 0000000000000000 12fd1c 000043 00 0 0 1
> [22] .gnu.warning.f_pr PROGBITS 0000000000000000 12fd5f 000043 00 0 0 1
> [23] .gnu_debuglink PROGBITS 0000000000000000 1478b0 000010 00 0 0 1
> [24] .shstrtab STRTAB 0000000000000000 1478c0 0000f1 00 0 0 1
> [25] .symtab SYMTAB 0000000000000000 12fda8 017b08 18 26 1707 8
> [26] .strtab STRTAB 0000000000000000 1479b1 00ff84 00 0 0 1
>
>
> Note that there is space after .fini_array+8 before .bss starts
> with a sizable alignment. The MemSiz for 03 does span
> .bss , like for amd64 (and all but armv7).
I temporarily forced my 32-bit powerpc /sbin/init to have:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
. . .
[16] .got PROGBITS 0193845c 12845c 000010 04 WAX 0 0 4
[17] .sbss NOBITS 01939000 12846c 0000b0 00 WA 0 0 4
[18] .bss NOBITS 019390c0 12846c 02cc48 00 WA 0 0 64
. . .
It was not enough to avoid the problems I've elsewhere
reported for *sp++ getting SIGSEGV ( environ related
activity in _init_tls ).
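For reference, a small user-space C sketch of the check that the forced layout is meant to satisfy: whether the last byte of one section and the first byte of the next fall on the same page. The sk_ names are hypothetical and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PAGE_MASK	0xfffu	/* assumption: 4 KiB pages */

/*
 * Hypothetical helper: true if the last byte of section A and the
 * first byte of section B are on the same page.
 */
static bool
sk_shares_page(uint32_t a_addr, uint32_t a_size, uint32_t b_addr)
{
	uint32_t a_last;

	assert(a_size > 0);
	a_last = a_addr + a_size - 1;
	return ((a_last & ~SK_PAGE_MASK) == (b_addr & ~SK_PAGE_MASK));
}
```

With the section headers above, sk_shares_page(0x0193845c, 0x10, 0x01939000) is false: .got ends at 0x0193846c on the page at 0x01938000 and .sbss starts on the next page. So in this forced layout .sbss no longer shares a page with the end of .got, yet the problem was still present.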
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)