Monday, April 27, 2015

How to figure out segmentation fault (segfault)

Well, sometimes we do run into segfaults on Linux/Unix systems. We definitely do not like them, as they are hard to debug.

(1) What is a segfault?

A segmentation fault (often shortened to segfault) or access violation is a fault raised by hardware with memory protection, notifying an operating system (OS) about a memory access violation. In short, your program tries to access memory which it is not supposed to access.

(2) Where can I find the info about a segfault?

The most straightforward way is to look in the kernel log (/var/log/kern.log) or the system log (/var/log/syslog). A typical entry looks like:
Apr 27 18:17:55 prod-util-c01 kernel: [32427315.749998] your-program[39902]: segfault at fffffffffffffff3 ip 000000000073442c sp 00007fa141a8b460 error 5 in your-program[400000+1bc0000]
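where you can find:
the hostname "prod-util-c01" (followed by "kernel", the source of the message);
the program name "your-program" and its process ID "39902";
the memory address the program tried to access, "fffffffffffffff3";
the Instruction Pointer (ip) "000000000073442c", which is the address of the instruction that caused the fault;
the Stack Pointer (sp) "00007fa141a8b460";
the error code "5": this is the architectural error code for page faults, so its meaning is architecture specific. The codes are documented in arch/*/mm/fault.c in the kernel source. On x86, bit 0 is set for a protection violation (clear if the page was not present), bit 1 is set for a write (clear for a read), and bit 2 is set for an access from user mode, so 5 means a user-mode read that hit a protection violation;
the mapping "your-program[400000+1bc0000]": the object that contains the faulting instruction, with its base address "400000" and its size "1bc0000".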
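To make the error code easier to read, here is a minimal sketch that turns its low three bits into words. It assumes the x86 bit layout from arch/x86/mm/fault.c; the function name describe_x86_fault is made up for this example, and other architectures use different encodings.

```cpp
#include <cassert>
#include <string>

// Describe the low three bits of an x86 page-fault error code, i.e. the
// number printed after "error" in the kernel's segfault log line.
// Assumption: x86 layout (bit 0 = protection violation, bit 1 = write,
// bit 2 = user mode). Higher bits are ignored in this sketch.
std::string describe_x86_fault(unsigned long code) {
    std::string out;
    out += (code & 4) ? "user-mode " : "kernel-mode ";
    out += (code & 2) ? "write" : "read";
    out += (code & 1) ? ", protection violation" : ", page not present";
    return out;
}
```

For the log line above, describe_x86_fault(5) gives "user-mode read, protection violation". Keep in mind that the kernel prints the code in hexadecimal, so "error 14" means 0x14.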

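Note: if the segfault happened in a dynamic library (*.so), the last field names the library instead of the program, and you need to subtract its base address (the first number in the brackets) from the ip to get the offset of the instruction inside the library. With the numbers above that would be 000000000073442c - 400000 = 33442c.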
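The subtraction is easy to get wrong by hand, so here is a rough sketch that pulls the fields out of a log line and computes the offset. The struct and function names (SegfaultInfo, parse_segfault_line, offset_in_mapping) are hypothetical, and the parser assumes the line looks exactly like the example above, with all numbers in hexadecimal.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Fields of one "segfault at ..." kernel log entry (hypothetical helper type).
struct SegfaultInfo {
    unsigned long long addr = 0, ip = 0, sp = 0, error = 0, base = 0, size = 0;
    std::string object;  // program or library named after "in"
};

// Parse the part of the line that starts at "segfault at".
// Returns false if the line does not match the assumed layout.
bool parse_segfault_line(const std::string& line, SegfaultInfo& info) {
    std::size_t pos = line.find("segfault at ");
    if (pos == std::string::npos) return false;
    char name[256];
    int n = std::sscanf(line.c_str() + pos,
        "segfault at %llx ip %llx sp %llx error %llx in %255[^[][%llx+%llx]",
        &info.addr, &info.ip, &info.sp, &info.error,
        name, &info.base, &info.size);
    if (n != 7) return false;
    info.object = name;
    return true;
}

// Offset of the faulting instruction from the base of its mapping.
unsigned long long offset_in_mapping(const SegfaultInfo& info) {
    return info.ip - info.base;
}
```

Feeding it the example line gives an offset of 0x33442c. That offset is the number to look up when the faulting object is a shared library; for a program loaded at a fixed address, like the one in the example, the ip from the log can be used directly.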

(3) How to debug it?

Debugging is the hard part, but still possible. :)
First of all, you'd better compile your program with "-g -O0" to add symbol info and disable optimization. If you cannot do that, it is still possible to locate the bug, but definitely harder.

(3.1) Use objdump

objdump -S your-program > your-program.objdump.txt
which will generate a text file containing the assembly code with its memory addresses, interleaved with your C++ source code (if you compiled your program with "-g").
Search the file for the instruction pointer address (73442c in the example) to locate the code which caused the segfault. From there you can look at the enclosing function and search for the places that call it.

(3.2) Use core dump and gdb

(3.2.1) Enable core dump

To enable core dumps on Ubuntu 12.04, first run:
ulimit -c
to see the current maximum size of a core dump (bash reports it in 1024-byte blocks). If the printed value is 0, core dumps are currently disabled. You can then change the limit to a suitable value, e.g., about 10GB:
ulimit -c 10000000
Another thing you should take care of is the core dump pattern. Ubuntu 12.04 pipes core dump files to Apport (Ubuntu's crash reporting system) via /proc/sys/kernel/core_pattern by default. If Apport discovers that the program in question is not one it should be reporting crashes for (which you can see happening in /var/log/apport.log), it falls back to simulating the default kernel behaviour of putting a core file in the cwd (this is done in the script /usr/share/apport/apport).
There is an easy temporary workaround for this by running:
sudo service apport stop
which should change /proc/sys/kernel/core_pattern from the apport pipe to just "core". You can check the current pattern with: cat /proc/sys/kernel/core_pattern
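If you cannot control the shell that starts the program, the core size limit can also be raised from inside the program itself. The following is a minimal sketch using the POSIX getrlimit/setrlimit calls; the function name enable_core_dumps is made up for this example. It only raises the soft limit up to the hard limit, so it will not help if the hard limit is 0, and it does not touch core_pattern.

```cpp
#include <cassert>
#include <sys/resource.h>

// Raise the soft core-dump size limit of the current process to its hard
// limit. RLIMIT_CORE is expressed in bytes. Returns true on success.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        return false;  // could not read the current limit
    }
    lim.rlim_cur = lim.rlim_max;  // soft limit may always go up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}
```

Calling enable_core_dumps() early in main() has roughly the same effect for that one process as running ulimit -c with the hard limit in the shell before starting it.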

(3.2.2) Debug with coredump and gdb

If you managed to enable core dumps on your machine and got a core dump when the segfault happened, you have a good option: load the core dump into gdb with the following command:
gdb your-program your-core-dump
You can run backtrace (or bt) to show the call stack at the moment the segfault happened. You can do "print variable-name" to see the value of a variable. For other commands provided by gdb, please refer to its documentation.
To print the first N elements of a std::vector (myVector) built with GCC's libstdc++, do:
print *(myVector._M_impl._M_start)@N
