Welcome toVigges Developer Community-Open, Learning,Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
689 views
in Technique[技术] by (71.8m points)

assembly - gdb behaves differently for symbols in the .bss, vs. symbols in .data

I recently started learning assembly language for the Intel x86-64 architecture using YASM. While solving one of the tasks suggested in a book (by Ray Seyfarth) I came to following problem:

When I place some characters into a buffer in the .bss section, I still see an empty string while debugging it in gdb. Placing characters into a buffer in the .data section shows up as expected in gdb.

segment .bss
result  resb    75
buf resw    100
usage   resq    1

    segment .data
str_test    db 0, 0, 0, 0

    segment .text
    global main
main:
    mov rbx, 'A'
    mov [buf], rbx          ; LINE - 1 STILL GET EMPTY STRING AFTER THAT INSTRUCTION
    mov [str_test], rbx     ; LINE - 2 PLACES CHARACTER NICELY. 
    ret

In gdb I get:

  • after LINE 1: x/s &buf, result - 0x7ffff7dd2740 <buf>: ""

  • after LINE 2: x/s &str_test, result - 0x601030: "A"

It looks like &buf isn't evaluating to the correct address, so it still sees all-zeros. 0x7ffff7dd2740 isn't in the BSS of the process being debugged, according to its /proc/PID/maps, so that makes no sense. Why does &buf evaluate to the wrong address, but &str_test evaluates to the right address? Neither are "global" symbols, but we did build with debug info.

Tested with GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10 on x86-64 Ubuntu 15.10.

I'm building with

yasm -felf64 -Worphan-labels -gdwarf2 buf-test.asm
gcc -g buf-test.o -o buf-test

nm on the executable shows the correct symbol addresses:

$ nm -n  buf-test     # numeric sort, heavily edited to omit symbols from glibc
...
0000000000601028 D __data_start
0000000000601038 d str_test
... 
000000000060103c B __bss_start
0000000000601040 b result
000000000060108b b buf
0000000000601153 b usage

(editor's note: I rewrote a lot of the question because the weirdness is in gdb's behaviour, not the OP's asm!).

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

glibc includes a symbol named buf, as well.

(gdb) info variables ^buf$
All variables matching regular expression "^buf$":

File strerror.c:
static char *buf;

Non-debugging symbols:
0x000000000060108b  buf            <-- this is our buf
0x00007ffff7dd6400  buf            <-- this is glibc's buf

gdb happens to choose the symbol from glibc over the symbol from the executable. This is why ptype buf shows char *.

Using a different name for the buffer avoids the problem, and so does a global buf to make it a global symbol. You also wouldn't have a problem if you wrote a stand-alone program that didn't link libc (i.e. define _start and make an exit system call instead of running a ret)


Note that 0x00007ffff7dd6400 (address of buf on my system; different from yours) is not actually a stack address. It visually looks like a stack address, but it's not: it has a different number of f digits after the 7. Sorry for that confusion in comments and an earlier edit of the question.

Shared libraries are also loaded near the top of the low 47 bits of virtual address space, near where the stack is mapped. They're position-independent, but a library's BSS space has to be in the right place relative to its code. Checking /proc/PID/maps again more carefully, gdb's &buf is in fact in the rwx block of anonymous memory (not mapped to any file) right next to the mapping for libc-2.21.so.

7ffff7a0f000-7ffff7bcf000 r-xp 00000000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7bcf000-7ffff7dcf000 ---p 001c0000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dcf000-7ffff7dd3000 r-xp 001c0000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd3000-7ffff7dd5000 rwxp 001c4000 09:7f 17031175       /lib/x86_64-linux-gnu/libc-2.21.so
7ffff7dd5000-7ffff7dd9000 rwxp 00000000 00:00 0        <--- &buf is in this mapping
...
7ffffffdd000-7ffffffff000 rwxp 00000000 00:00 0              [stack]     <---- more FFs before the first non-FF than in &buf.

A normal call instruction with a rel32 encoding can't reach a library function, but it doesn't need to because GNU/Linux shared libraries have to support symbol interposition, so calls to library functions actually jump to the PLT, where an indirect jmp (with a pointer from the GOT) goes to the final destination.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to Vigges Developer Community for programmer and developer-Open, Learning and Share
...