I'm assuming you're building this with gcc -m32 -nostartfiles segment-bounds.S or similar, so you have a 32-bit dynamic binary. (You don't need -m32 if you're actually using a 32-bit system, but most people who want to test this will have 64-bit systems.)
My 64-bit Ubuntu 15.10 system gives slightly different numbers from your program for a few things, but the overall pattern of behaviour is the same. (A different kernel, or just ASLR, would explain this. The brk address varies wildly, for example, with values like 0x9354001 or 0x82a8001.)
1) Why is my program starting at address 0x8048190 instead of 0x8048000?
If you build a static binary, your _start will be at 0x8048000.
We can see from readelf -a a.out that 0x8048190 is the start of the .text section (0x80481f0 in my build, shown below). But it isn't at the start of the text segment that's mapped to a page. (Pages are 4096B, and Linux requires mappings to be aligned on 4096B boundaries of file position, so with the file laid out this way, it wouldn't be possible for execve to map _start to the start of a page. I think the Off column is position within the file.)
Presumably the other sections in the text segment before the .text section hold read-only data that's needed by the dynamic linker, so it makes sense to have them mapped into memory in the same page.
## part of readelf -a output
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.gnu.build-i NOTE            08048128 000128 000024 00   A  0   0  4
  [ 3] .gnu.hash         GNU_HASH        0804814c 00014c 000018 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048164 000164 000020 10   A  5   1  4
  [ 5] .dynstr           STRTAB          08048184 000184 00001c 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          080481a0 0001a0 000004 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         080481a4 0001a4 000020 00   A  5   1  4
  [ 8] .rel.plt          REL             080481c4 0001c4 000008 08  AI  4   9  4
  [ 9] .plt              PROGBITS        080481d0 0001d0 000020 04  AX  0   0 16
  [10] .text             PROGBITS        080481f0 0001f0 0000ad 00  AX  0   0  1   ########## The .text section
  [11] .eh_frame         PROGBITS        080482a0 0002a0 000000 00   A  0   0  4
  [12] .dynamic          DYNAMIC         08049f60 000f60 0000a0 08  WA  5   0  4
  [13] .got.plt          PROGBITS        0804a000 001000 000010 04  WA  0   0  4
  [14] .data             PROGBITS        0804a010 001010 0000d4 00  WA  0   0  1
  [15] .bss              NOBITS          0804a0e8 0010e4 0002f4 00  WA  0   0  8
  [16] .shstrtab         STRTAB          00000000 0010e4 0000a2 00      0   0  1
  [17] .symtab           SYMTAB          00000000 001188 0002b0 10     18  38  4
  [18] .strtab           STRTAB          00000000 001438 000123 00      0   0  1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
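To make the alignment argument for question 1 concrete, here is a small Python sketch of the page arithmetic, using the .text row from my readelf output above. The helper names page_base and page_offset are mine, purely for illustration; the only facts assumed are the 4096B page size and the Addr/Off values in the table.

```python
PAGE_SIZE = 4096  # x86 page size, as used throughout this answer


def page_base(addr):
    """Round an address (or file offset) down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


def page_offset(addr):
    """Return the position of an address (or file offset) within its page."""
    return addr & (PAGE_SIZE - 1)


# .text row from the readelf output above: Addr 080481f0, Off 0001f0
text_addr = 0x080481F0
text_off = 0x0001F0

print(hex(page_base(text_addr)))    # 0x8048000
print(hex(page_offset(text_addr)))  # 0x1f0

# The offset within the page is the same in the file and in memory,
# which is what a page-aligned file mapping gives you.
print(page_offset(text_addr) == page_offset(text_off))  # True
```

Since .text sits 0x1f0 bytes into the file, a page-granular mapping of that file can only put it 0x1f0 bytes into a page as well, not at the page's first byte.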
2) Why is there a gap between the end of the text section and the start of the data section?
Why not? They have to be in different segments of the executable, so mapped to different pages. (Text is read-only and executable, and can be MAP_SHARED. Data is read-write and has to be MAP_PRIVATE. BTW, in Linux the default is for data to also be executable.)
Leaving a gap makes room for the dynamic linker to map the text segment of shared libraries next to the text of the executable. It also means an out-of-bounds array index into the data section is more likely to segfault. (Earlier and noisier failure is always easier to debug.)
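As a rough check on the "different pages" point, this Python sketch takes two addresses from my readelf output above (the end of the last read-only section and the start of the first writable one) and shows the size of the gap and the pages they land in. The variable names and the page_number helper are just for illustration, and your own build will have different addresses.

```python
PAGE_SIZE = 4096  # x86 page size


def page_number(addr):
    """Return the index of the page that contains addr."""
    return addr // PAGE_SIZE


# From the readelf output above:
text_end = 0x080482A0    # .eh_frame Addr + Size (its Size is 0)
data_start = 0x08049F60  # .dynamic Addr, the first section flagged W

gap = data_start - text_end
print(hex(gap))                      # 0x1cc0
print(hex(page_number(text_end)))    # 0x8048
print(hex(page_number(data_start)))  # 0x8049
```

The read-only sections and the writable ones never share a page, so each page can be mapped with a single set of permissions.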
3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?
That's interesting. They're in the bss, but IDK why the current position isn't affected by .lcomm labels. Probably they go in a different subsection before linking, since you used .lcomm instead of .comm. If I use .skip or .zero to reserve space, I get the results you expected:
.section .bss
start_bss:
#.lcomm buffer, 500
#.lcomm buffer2, 250
buffer: .skip 500
buffer2: .skip 250
end_bss:
.lcomm puts things in the BSS even if you don't switch to that section. i.e. it doesn't care what the current section is, and maybe doesn't care about or affect what the current position in the .bss section is. TL;DR: when you switch to the .bss section manually, use .zero or .skip, not .comm or .lcomm.
4) If the system break point is at 0x83b4001, why I get the segmentation fault earlier at 0x804a000?
That tells us that there are unmapped pages between the text segment and the brk. (Your loop starts with ebx = $start_text, so it faults on the first unmapped page after the text segment.) Besides the hole in virtual address space between text and data, there are probably also other holes beyond the data segment.
Memory protection has page granularity (4096B), so the first address to fault will always be the first byte of a page.
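Here is a toy Python model of that, not a real memory probe: it walks upward from a start address over a set of mapped pages and reports where the first fault would be. The function name first_fault and the particular set of mapped pages are hypothetical, chosen only so the result lines up with the 0x804a000 from the question; your real layout is whatever /proc/PID/maps says.

```python
PAGE_SIZE = 4096  # memory protection granularity


def first_fault(start, mapped_pages):
    """Simulate reading upward from start; return the first address that faults.

    mapped_pages is a set of page numbers (address // PAGE_SIZE) that are mapped.
    """
    page = start // PAGE_SIZE
    if page not in mapped_pages:
        return start             # the very first access already faults
    while page in mapped_pages:  # every byte of a mapped page is readable
        page += 1
    return page * PAGE_SIZE      # first byte of the first unmapped page


# HYPOTHETICAL layout: assume only the pages at 0x8048000 and 0x8049000 are mapped.
mapped = {0x8048000 // PAGE_SIZE, 0x8049000 // PAGE_SIZE}

print(hex(first_fault(0x8048190, mapped)))  # 0x804a000
```

Whatever the start address is, as long as it's in a mapped page, the result is a multiple of 4096.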