kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?

Mark Millard marklmi at yahoo.com
Mon Jun 10 19:20:38 UTC 2019


[I decided to compare some readelf information from some
other architectures. Some of it surprised me. But in each
case .bss seems to be forced to start with a large alignment
(on a fresh page), which avoids the issue that I originally
traced.]

On 2019-Jun-10, at 11:24, Mark Millard <marklmi at yahoo.com> wrote:

> [It looks like Conrad M. is partially confirming that my trace
> of the issue is reasonable.]
> 
> On 2019-Jun-10, at 07:37, Conrad Meyer <cem at freebsd.org> wrote:
> 
>> Hi Mark,
>> 
>> On Sun, Jun 9, 2019 at 11:17 PM Mark Millard via freebsd-hackers
>> <freebsd-hackers at freebsd.org> wrote:
>>> ...
>>> vm_pager_get_pages uses vm_page_zero_invalid
>>> to "Zero out partially filled data".
>>> 
>>> But vm_page_zero_invalid does not zero every "invalid"
>>> byte but works in terms of units of DEV_BSIZE :
>>> ...
>>> The comment indicates that areas of "sub-DEV_BSIZE"
>>> should have been handled previously by
>>> vm_page_set_validclean .
>> 
>> Or another VM routine, yes (e.g., vm_page_set_valid_range).  The valid
>> and dirty bitmasks in vm_page only have a single bit per DEV_BSIZE
>> region, so care must be taken when marking any sub-DEV_BSIZE region as
>> valid to zero out the rest of the DEV_BSIZE region.  This is part of
>> the VM page contract.  I'm not sure it's related to the BSS, though.
> 
> Yea, I had written from what I'd seen in __elfN(load_section):
> 
> QUOTE
> __elfN(load_section) uses vm_imgact_map_page
> to set up for its copyout. This appears to be
> how the FileSiz (not including .sbss or .bss)
> vs. MemSiz (including .sbss and .bss) is
> handled (attempted?).
> END QUOTE
> 
> The copyout only copies through the last byte for filesz
> but the vm_imgact_map_page does not zero out all the
> bytes after that on that page:
> 
>        /*
>         * We have to get the remaining bit of the file into the first part
>         * of the oversized map segment.  This is normally because the .data
>         * segment in the file is extended to provide bss.  It's a neat idea
>         * to try and save a page, but it's a pain in the behind to implement.
>         */
>        copy_len = filsz == 0 ? 0 : (offset + filsz) - trunc_page(offset +
>            filsz);
>        map_addr = trunc_page((vm_offset_t)vmaddr + filsz);
>        map_len = round_page((vm_offset_t)vmaddr + memsz) - map_addr;
> . . .
>        if (copy_len != 0) {
>                sf = vm_imgact_map_page(object, offset + filsz);
>                if (sf == NULL)
>                        return (EIO);
> 
>                /* send the page fragment to user space */
>                off = trunc_page(offset + filsz) - trunc_page(offset + filsz);
>                error = copyout((caddr_t)sf_buf_kva(sf) + off,
>                    (caddr_t)map_addr, copy_len);
>                vm_imgact_unmap_page(sf);
>                if (error != 0)
>                        return (error);
>        }
> 
> I looked into the details of the DEV_BSIZE code after sending
> the original message and so realized that the /sbin/init readelf
> material I had provided was a good example of the issue, unless
> I had missed something.
> 
>>> So, if, say, char**environ ends up at the start of .sbss
>>> consistently, does environ always end up zeroed independently
>>> of FileSz for the PT_LOAD that spans them?
>> 
>> It is required to be zeroed, yes.  If not, there is a bug.  If FileSz
>> covers BSS, that's a bug in the linker.  Either the trailing bytes of
>> the corresponding page in the executable should be zero (wasteful; on
>> amd64 ".comment" is packed in there instead), or the linker/loader
>> must zero them at initialization.  I'm not familiar with the
>> particular details here, but if you are interested I would suggest
>> looking at __elfN(load_section) in sys/kern/imgact_elf.c.
> 
> I had looked at it some, see the material around the earlier quote
> above.
> 
>>> The following is not necessarily an example of problematical
>>> figures but is just for showing an example structure of what
>>> FileSiz covers vs. MemSiz for PT_LOAD's that involve .sbss
>>> and .bss :
>>> ...
>> 
>> Your 2nd LOAD phdr's FileSiz matches up exactly with Segment .sbss
>> Offset minus Segment .tdata Offset, i.e., none of the FileSiz
>> corresponds to the (s)bss regions.  (Good!  At least the static linker
>> part looks sane.)  That said, the boundary is not page-aligned and the
>> section alignment requirement is much lower than page_size, so the
>> beginning of bss will share a file page with some data.  Something
>> should zero it at image activation.
> 
> And, so far, I've not found anything in _start or before that does
> zero any "sub-DEV_BSIZE" part after FileSz for the PT_LOAD in
> question.
> 
> Thanks for checking my trace of the issue. It is good to have some
> confirmation that I'd not missed something.
> 
>> (Tangent: sbss/bss probably do not need to be RWE on PPC!  On amd64,
>> init has three LOAD segments rather than two: one for rodata (R), one
>> for .text, .init, etc (RX); and one for .data (RW).)
> 
> Yea, the section header flags indicate just WA for .sbss and .bss (but
> WAX for .got).
> 
> But the issue is more general: for example, the beginning of .rodata
> (not executable) shares the tail part of a page with .fini
> (executable) in the example. .got has executable code but is in
> the middle of sections that do not. For something like /sbin/init it
> is so small that the middle of a page can be the only part that is
> executable, as in the example. (It is not forced onto its own page.)
> 
> The form of .got used is also writable: WAX for section header flags.
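
The DEV_BSIZE granularity that Conrad describes in the quoted text
above can be sketched in user space. This is only a model, not the
kernel's code: the MODEL_ and model_ names are hypothetical, and the
512-byte DEV_BSIZE and 4 KiB page size are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_DEV_BSHIFT 9                        /* assumption: DEV_BSIZE == 512 */
#define MODEL_DEV_BSIZE  (1u << MODEL_DEV_BSHIFT)
#define MODEL_PAGE_SIZE  4096u                    /* assumption: 4 KiB page */

/*
 * One bit per DEV_BSIZE chunk that the byte range [base, base + size)
 * touches.  A chunk that is only partly covered still gets its bit set.
 */
uint32_t
model_valid_bits(unsigned base, unsigned size)
{
	unsigned first_bit, last_bit;

	assert(base + size <= MODEL_PAGE_SIZE);
	if (size == 0)
		return (0);
	first_bit = base >> MODEL_DEV_BSHIFT;
	last_bit = (base + size - 1) >> MODEL_DEV_BSHIFT;
	return (((uint32_t)2 << last_bit) - ((uint32_t)1 << first_bit));
}

/*
 * Number of bytes after base + size that sit in the same DEV_BSIZE
 * chunk as the last byte of the range.  Their chunk's bit is set, so
 * a pass that only zeroes chunks with a clear bit would skip them.
 */
unsigned
model_sub_bsize_tail(unsigned base, unsigned size)
{
	unsigned end = base + size;

	assert(end <= MODEL_PAGE_SIZE);
	if (size == 0)
		return (0);
	return ((MODEL_DEV_BSIZE - (end & (MODEL_DEV_BSIZE - 1))) &
	    (MODEL_DEV_BSIZE - 1));
}
```

For a hypothetical page in which only the first 0x848 bytes were
marked valid, model_valid_bits(0, 0x848) is 0x1f (chunks 0 through 4)
and model_sub_bsize_tail(0, 0x848) is 0x1b8. Those 0x1b8 bytes are the
"sub-DEV_BSIZE" area that would have to be zeroed explicitly.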



amd64's /sbin/init :

There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R   0x8
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R   0x1000
  LOAD           0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
  LOAD           0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW  0x1000
  TLS            0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R   0x10
  GNU_RELRO      0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R   0x1
  GNU_EH_FRAME   0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame 
   02     .text .init .fini .plt 
   03     .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss 
   04     .tdata .tbss 
   05     .tdata .tbss .ctors .dtors .jcr .init_array .fini_array 
   06     .eh_frame_hdr 
   07     
   08     .note.tag 
There are 27 section headers, starting at offset 0x157938:

Section Headers:
  [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            0000000000200238 000238 000048 00   A  0   0  4
  [ 2] .rela.plt         RELA            0000000000200280 000280 000030 18  AI  0  11  8
  [ 3] .rodata           PROGBITS        00000000002002c0 0002c0 01afb0 00 AMS  0   0 64
  [ 4] .eh_frame_hdr     PROGBITS        000000000021b270 01b270 00504c 00   A  0   0  4
  [ 5] .eh_frame         PROGBITS        00000000002202c0 0202c0 019bd4 00   A  0   0  8
  [ 6] .text             PROGBITS        000000000023a000 03a000 0e8dfc 00  AX  0   0 16
  [ 7] .init             PROGBITS        0000000000322dfc 122dfc 00000e 00  AX  0   0  4
  [ 8] .fini             PROGBITS        0000000000322e0c 122e0c 00000e 00  AX  0   0  4
  [ 9] .plt              PROGBITS        0000000000322e20 122e20 000020 00  AX  0   0 16
  [10] .data             PROGBITS        0000000000323000 123000 003a80 00  WA  0   0 16
  [11] .got.plt          PROGBITS        0000000000326a80 126a80 000010 00  WA  0   0  8
  [12] .tdata            PROGBITS        0000000000327000 127000 001800 00 WAT  0   0 16
  [13] .tbss             NOBITS          0000000000328800 128800 000020 00 WAT  0   0  8
  [14] .ctors            PROGBITS        0000000000328800 128800 000010 00  WA  0   0  8
  [15] .dtors            PROGBITS        0000000000328810 128810 000010 00  WA  0   0  8
  [16] .jcr              PROGBITS        0000000000328820 128820 000008 00  WA  0   0  8
  [17] .init_array       INIT_ARRAY      0000000000328828 128828 000018 00  WA  0   0  8
  [18] .fini_array       FINI_ARRAY      0000000000328840 128840 000008 00  WA  0   0  8
  [19] .bss              NOBITS          0000000000329000 128848 2321d9 00  WA  0   0 64
  [20] .comment          PROGBITS        0000000000000000 128848 0074d4 01  MS  0   0  1
  [21] .gnu.warning.mkte PROGBITS        0000000000000000 12fd1c 000043 00      0   0  1
  [22] .gnu.warning.f_pr PROGBITS        0000000000000000 12fd5f 000043 00      0   0  1
  [23] .gnu_debuglink    PROGBITS        0000000000000000 1478b0 000010 00      0   0  1
  [24] .shstrtab         STRTAB          0000000000000000 1478c0 0000f1 00      0   0  1
  [25] .symtab           SYMTAB          0000000000000000 12fda8 017b08 18     26 1707  8
  [26] .strtab           STRTAB          0000000000000000 1479b1 00ff84 00      0   0  1

Note that there is space after .fini_array+8 before .bss starts
with a sizable alignment. The MemSiz for 03 does span .bss .
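
The copy_len / map_addr / map_len arithmetic quoted earlier from
__elfN(load_section) can be applied to the amd64 RW LOAD figures
above. The following is a user-space sketch of that arithmetic only:
the model_ names and the struct are hypothetical, and a 4 KiB page
size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000u	/* assumption: 4 KiB pages */

struct tail_layout {
	uint64_t copy_len;	/* file bytes copied into the shared page */
	uint64_t map_addr;	/* page-aligned address that receives them */
	uint64_t map_len;	/* bytes from map_addr to the rounded segment end */
};

static inline uint64_t
model_trunc_page(uint64_t x)
{
	return (x & ~(uint64_t)(MODEL_PAGE_SIZE - 1));
}

static inline uint64_t
model_round_page(uint64_t x)
{
	return (model_trunc_page(x + MODEL_PAGE_SIZE - 1));
}

/* Same three expressions as in the quoted load_section fragment. */
struct tail_layout
model_tail_layout(uint64_t offset, uint64_t vmaddr, uint64_t filsz,
    uint64_t memsz)
{
	struct tail_layout t;

	assert(memsz >= filsz);
	t.copy_len = filsz == 0 ? 0 :
	    (offset + filsz) - model_trunc_page(offset + filsz);
	t.map_addr = model_trunc_page(vmaddr + filsz);
	t.map_len = model_round_page(vmaddr + memsz) - t.map_addr;
	return (t);
}
```

With Offset 0x123000, VirtAddr 0x323000, FileSiz 0x5848 and MemSiz
0x2381d9, model_tail_layout() gives copy_len 0x848, map_addr 0x328000
and map_len 0x234000. So the copyout would fill 0x328000 up to
0x328848, and the rest of that page, up to the .bss Addr of 0x329000,
holds no section in this layout.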

armv7's /sbin/init is different about MemSiz spanning .bss:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00010034 0x00010034 0x00120 0x00120 R   0x4
  LOAD           0x000000 0x00010000 0x00010000 0x10674 0x10674 R   0x1000
  LOAD           0x011000 0x00021000 0x00021000 0xe9c54 0xe9c54 R E 0x1000
  LOAD           0x0fb000 0x0010b000 0x0010b000 0x03b88 0x30ccd RW  0x1000
  TLS            0x0fe000 0x0010e000 0x0010e000 0x00b60 0x00b70 R   0x20
  GNU_RELRO      0x0fe000 0x0010e000 0x0010e000 0x00b88 0x00b88 R   0x1
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0
  NOTE           0x000154 0x00010154 0x00010154 0x00064 0x00064 R   0x4
  ARM_EXIDX      0x0001b8 0x000101b8 0x000101b8 0x00220 0x00220 R   0x4

(NOTE: 0x0010b000+0x30ccd==0x13BCCD . Compare this to the later .bss
Addr of 0x10f000.)

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .note.tag .ARM.exidx .rodata .ARM.extab 
   02     .text .init .fini 
   03     .data .tdata .tbss .jcr .init_array .fini_array .got .bss 
   04     .tdata .tbss 
   05     .tdata .tbss .jcr .init_array .fini_array .got 
   06     
   07     .note.tag 
   08     .ARM.exidx 
There are 24 section headers, starting at offset 0x12be3c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            00010154 000154 000064 00   A  0   0  4
  [ 2] .ARM.exidx        ARM_EXIDX       000101b8 0001b8 000220 00   A  5   0  4
  [ 3] .rodata           PROGBITS        00010400 000400 01022c 00 AMS  0   0 64
  [ 4] .ARM.extab        PROGBITS        0002062c 01062c 000048 00   A  0   0  4
  [ 5] .text             PROGBITS        00021000 011000 0e9c14 00  AX  0   0 128
  [ 6] .init             PROGBITS        0010ac20 0fac20 000014 00  AX  0   0 16
  [ 7] .fini             PROGBITS        0010ac40 0fac40 000014 00  AX  0   0 16
  [ 8] .data             PROGBITS        0010b000 0fb000 002734 00  WA  0   0  8
  [ 9] .tdata            PROGBITS        0010e000 0fe000 000b60 00 WAT  0   0 16
  [10] .tbss             NOBITS          0010eb60 0feb60 000010 00 WAT  0   0  4
  [11] .jcr              PROGBITS        0010eb60 0feb60 000000 00  WA  0   0  4
  [12] .init_array       INIT_ARRAY      0010eb60 0feb60 000008 00  WA  0   0  4
  [13] .fini_array       FINI_ARRAY      0010eb68 0feb68 000004 00  WA  0   0  4
  [14] .got              PROGBITS        0010eb6c 0feb6c 00001c 00  WA  0   0  4
  [15] .bss              NOBITS          0010f000 0feb88 02cccd 00  WA  0   0 64
  [16] .comment          PROGBITS        00000000 0feb88 0074b6 01  MS  0   0  1
  [17] .ARM.attributes   ARM_ATTRIBUTES  00000000 10603e 00004f 00      0   0  1
  [18] .gnu.warning.mkte PROGBITS        00000000 10608d 000043 00      0   0  1
  [19] .gnu.warning.f_pr PROGBITS        00000000 1060d0 000043 00      0   0  1
  [20] .gnu_debuglink    PROGBITS        00000000 11b314 000010 00      0   0  1
  [21] .shstrtab         STRTAB          00000000 11b324 0000e3 00      0   0  1
  [22] .symtab           SYMTAB          00000000 106114 015200 10     23 3063  4
  [23] .strtab           STRTAB          00000000 11b407 010a32 00      0   0  1

Note that there is space after .got+0x1c before .bss starts
with a sizable alignment. The MemSiz for 03 does *not* span
.bss , unlike for amd64 (and the rest).


aarch64's /sbin/init is similar to amd64 instead of armv7:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001c0 0x0001c0 R   0x8
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x01624f 0x01624f R   0x10000
  LOAD           0x020000 0x0000000000220000 0x0000000000220000 0x0dd354 0x0dd354 R E 0x10000
  LOAD           0x100000 0x0000000000300000 0x0000000000300000 0x011840 0x252111 RW  0x10000
  TLS            0x110000 0x0000000000310000 0x0000000000310000 0x001800 0x001820 R   0x40
  GNU_RELRO      0x110000 0x0000000000310000 0x0000000000310000 0x001840 0x001840 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x000200 0x0000000000200200 0x0000000000200200 0x000048 0x000048 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .note.tag .rodata 
   02     .text .init .fini 
   03     .data .tdata .tbss .jcr .init_array .fini_array .got .bss 
   04     .tdata .tbss 
   05     .tdata .tbss .jcr .init_array .fini_array .got 
   06     
   07     .note.tag 
There are 21 section headers, starting at offset 0x14b6f0:

Section Headers:
  [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            0000000000200200 000200 000048 00   A  0   0  4
  [ 2] .rodata           PROGBITS        0000000000200280 000280 015fcf 00 AMS  0   0 64
  [ 3] .text             PROGBITS        0000000000220000 020000 0dd31c 00  AX  0   0 64
  [ 4] .init             PROGBITS        00000000002fd320 0fd320 000014 00  AX  0   0 16
  [ 5] .fini             PROGBITS        00000000002fd340 0fd340 000014 00  AX  0   0 16
  [ 6] .data             PROGBITS        0000000000300000 100000 003a20 00  WA  0   0 16
  [ 7] .tdata            PROGBITS        0000000000310000 110000 001800 00 WAT  0   0 16
  [ 8] .tbss             NOBITS          0000000000311800 111800 000020 00 WAT  0   0  8
  [ 9] .jcr              PROGBITS        0000000000311800 111800 000000 00  WA  0   0  8
  [10] .init_array       INIT_ARRAY      0000000000311800 111800 000018 00  WA  0   0  8
  [11] .fini_array       FINI_ARRAY      0000000000311818 111818 000008 00  WA  0   0  8
  [12] .got              PROGBITS        0000000000311820 111820 000020 00  WA  0   0  8
  [13] .bss              NOBITS          0000000000320000 111840 232111 00  WA  0   0 64
  [14] .comment          PROGBITS        0000000000000000 111840 007191 01  MS  0   0  1
  [15] .gnu.warning.mkte PROGBITS        0000000000000000 1189d1 000043 00      0   0  1
  [16] .gnu.warning.f_pr PROGBITS        0000000000000000 118a14 000043 00      0   0  1
  [17] .gnu_debuglink    PROGBITS        0000000000000000 13b7f8 000010 00      0   0  1
  [18] .shstrtab         STRTAB          0000000000000000 13b808 0000bd 00      0   0  1
  [19] .symtab           SYMTAB          0000000000000000 118a58 022da0 18     20 3621  8
  [20] .strtab           STRTAB          0000000000000000 13b8c5 00fe2b 00      0   0  1

Note that there is space after .got+0x20 before .bss starts
with a sizable alignment. The MemSiz for 03 does span
.bss , like for amd64 (and all but armv7).

powerpc64's /sbin/init is similar to amd64 as well:

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x0001f8 0x0001f8 R   0x8
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x039e94 0x039e94 R   0x1000
  LOAD           0x03a000 0x000000000023a000 0x000000000023a000 0x0e8e40 0x0e8e40 R E 0x1000
  LOAD           0x123000 0x0000000000323000 0x0000000000323000 0x005848 0x2381d9 RW  0x1000
  TLS            0x127000 0x0000000000327000 0x0000000000327000 0x001800 0x001820 R   0x10
  GNU_RELRO      0x127000 0x0000000000327000 0x0000000000327000 0x001848 0x001848 R   0x1
  GNU_EH_FRAME   0x01b270 0x000000000021b270 0x000000000021b270 0x00504c 0x00504c R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x000238 0x0000000000200238 0x0000000000200238 0x000048 0x000048 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00     
   01     .note.tag .rela.plt .rodata .eh_frame_hdr .eh_frame 
   02     .text .init .fini .plt 
   03     .data .got.plt .tdata .tbss .ctors .dtors .jcr .init_array .fini_array .bss 
   04     .tdata .tbss 
   05     .tdata .tbss .ctors .dtors .jcr .init_array .fini_array 
   06     .eh_frame_hdr 
   07     
   08     .note.tag 
There are 27 section headers, starting at offset 0x157938:

Section Headers:
  [Nr] Name              Type            Addr             Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .note.tag         NOTE            0000000000200238 000238 000048 00   A  0   0  4
  [ 2] .rela.plt         RELA            0000000000200280 000280 000030 18  AI  0  11  8
  [ 3] .rodata           PROGBITS        00000000002002c0 0002c0 01afb0 00 AMS  0   0 64
  [ 4] .eh_frame_hdr     PROGBITS        000000000021b270 01b270 00504c 00   A  0   0  4
  [ 5] .eh_frame         PROGBITS        00000000002202c0 0202c0 019bd4 00   A  0   0  8
  [ 6] .text             PROGBITS        000000000023a000 03a000 0e8dfc 00  AX  0   0 16
  [ 7] .init             PROGBITS        0000000000322dfc 122dfc 00000e 00  AX  0   0  4
  [ 8] .fini             PROGBITS        0000000000322e0c 122e0c 00000e 00  AX  0   0  4
  [ 9] .plt              PROGBITS        0000000000322e20 122e20 000020 00  AX  0   0 16
  [10] .data             PROGBITS        0000000000323000 123000 003a80 00  WA  0   0 16
  [11] .got.plt          PROGBITS        0000000000326a80 126a80 000010 00  WA  0   0  8
  [12] .tdata            PROGBITS        0000000000327000 127000 001800 00 WAT  0   0 16
  [13] .tbss             NOBITS          0000000000328800 128800 000020 00 WAT  0   0  8
  [14] .ctors            PROGBITS        0000000000328800 128800 000010 00  WA  0   0  8
  [15] .dtors            PROGBITS        0000000000328810 128810 000010 00  WA  0   0  8
  [16] .jcr              PROGBITS        0000000000328820 128820 000008 00  WA  0   0  8
  [17] .init_array       INIT_ARRAY      0000000000328828 128828 000018 00  WA  0   0  8
  [18] .fini_array       FINI_ARRAY      0000000000328840 128840 000008 00  WA  0   0  8
  [19] .bss              NOBITS          0000000000329000 128848 2321d9 00  WA  0   0 64
  [20] .comment          PROGBITS        0000000000000000 128848 0074d4 01  MS  0   0  1
  [21] .gnu.warning.mkte PROGBITS        0000000000000000 12fd1c 000043 00      0   0  1
  [22] .gnu.warning.f_pr PROGBITS        0000000000000000 12fd5f 000043 00      0   0  1
  [23] .gnu_debuglink    PROGBITS        0000000000000000 1478b0 000010 00      0   0  1
  [24] .shstrtab         STRTAB          0000000000000000 1478c0 0000f1 00      0   0  1
  [25] .symtab           SYMTAB          0000000000000000 12fda8 017b08 18     26 1707  8
  [26] .strtab           STRTAB          0000000000000000 1479b1 00ff84 00      0   0  1


Note that there is space after .fini_array+8 before .bss starts
with a sizable alignment. The MemSiz for 03 does span
.bss , like for amd64 (and all but armv7).
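
The "space before .bss" notes above can be checked with a little
arithmetic over the RW LOAD lines and .bss Addr values as listed. The
following is a sketch under that assumption: the struct, the table and
the function names are hypothetical, and the numbers are copied from
the readelf output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rw_load {
	const char *arch;
	uint64_t vaddr;		/* VirtAddr of the RW LOAD */
	uint64_t filesz;	/* FileSiz of the RW LOAD */
	uint64_t seg_align;	/* Align of the RW LOAD */
	uint64_t bss_addr;	/* Addr of .bss */
};

/* Figures as listed in the readelf output above. */
static const struct rw_load listed_segments[] = {
	{ "amd64",     0x323000, 0x005848, 0x1000,  0x329000 },
	{ "armv7",     0x10b000, 0x003b88, 0x1000,  0x10f000 },
	{ "aarch64",   0x300000, 0x011840, 0x10000, 0x320000 },
	{ "powerpc64", 0x323000, 0x005848, 0x1000,  0x329000 },
};

/* Bytes between the end of the file-backed data and the start of .bss. */
uint64_t
gap_before_bss(const struct rw_load *s)
{
	uint64_t file_end = s->vaddr + s->filesz;

	assert(s->bss_addr >= file_end);
	return (s->bss_addr - file_end);
}

/* Nonzero when .bss starts on a multiple of the segment's Align. */
int
bss_starts_aligned(const struct rw_load *s)
{
	return (s->bss_addr % s->seg_align == 0);
}
```

For the listed figures the gaps are 0x7b8 (amd64 and powerpc64), 0x478
(armv7) and 0xe7c0 (aarch64), and bss_starts_aligned() is true for
each entry. So in these layouts the page that holds the last
file-backed bytes holds no .bss bytes.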

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)


