Introduction
In this tutorial, we will delve into the Linux strace command and discover its capabilities in tracing and monitoring system calls initiated by a running process. The strace command is a powerful tool that provides valuable insights into the inner workings of programs, making it essential for effective debugging and troubleshooting. We will begin with an introduction to the strace command, then explore tracing system calls, and finally, learn how to use strace for debugging processes. By the end of this tutorial, you'll be equipped with the skills to effectively utilize the strace command in your Linux system administration and development tasks.
Introduction to strace Command
In this section, we will explore the strace command, a vital Linux tool for tracing and monitoring the system calls made by a process. System calls are the interface between a process and the operating system kernel, so understanding them is crucial for efficient debugging and problem resolution as a system administrator.
Let's begin by installing the strace package:
sudo apt-get update
sudo apt-get install -y strace
Example output:
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
libunwind8
Suggested packages:
fakeroot
The following NEW packages will be installed:
libunwind8 strace
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 292 kB of archives.
After this operation, 1,054 kB of additional disk space will be used.
...
Now, let's use the strace command to trace a basic program. We will use the ls command as an example:
strace ls
Example output:
execve("/usr/bin/ls", ["ls"], 0x7ffee4f7a0f0 /* 23 vars */) = 0
brk(NULL) = 0x55b7d6c23000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
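The output reveals the sequence of system calls invoked by the ls command, including execve to start the program, brk to query the current end of the heap (brk(NULL) only reads the program break and does not allocate memory by itself), access to check whether /etc/ld.so.preload is readable (the file does not exist here, hence ENOENT), and openat to open the dynamic linker cache file.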
Analyzing the strace output provides insights into a program's interaction with the operating system, which is incredibly beneficial for debugging and understanding program execution.
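Each line of the output above follows the same pattern: the system call name, its arguments in parentheses, an equals sign, and the return value, optionally followed by an error name and description. As a minimal sketch of that structure, the following Python snippet splits one line into its parts. The function name parse_strace_line and the regular expression are illustrative assumptions, not part of strace, and they only handle simple, complete lines like the ones shown here.

```python
import re

# Matches simple, complete strace lines of the form: name(args) = result [note]
LINE_RE = re.compile(
    r'^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<ret>\S+)(?:\s+(?P<note>.*))?$'
)

def parse_strace_line(line):
    """Split one strace line into name, args, return value, and trailing note.

    Returns None for lines that are not system calls (for example "+++ exited with 0 +++").
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "name": match.group("name"),
        "args": match.group("args"),
        "ret": match.group("ret"),
        "note": match.group("note") or "",
    }

# Sample line copied from the ls trace above.
sample = 'access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)'
parsed = parse_strace_line(sample)
print(parsed["name"], parsed["ret"], parsed["note"])
```

Here the return value -1 together with the note ENOENT shows that the call failed because the file does not exist, which is normal for /etc/ld.so.preload on most systems.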
Tracing System Calls with strace
In this part, we will explore in greater detail the usage of the strace command for tracing system calls made by a running process on your Linux system.
Let's begin by creating a simple Python script that will serve as our tracing target:
cat > ~/project/example.py << EOF
import time
print("Hello, World!")
time.sleep(5)
EOF
Now, let's trace the execution of this script using strace:
strace python ~/project/example.py
Example output:
execve("/usr/bin/python", ["python", "/home/labex/project/example.py"], 0x7ffee4f7a0f0 /* 23 vars */) = 0
brk(NULL) = 0x55b7d6c23000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
write(1, "Hello, World!\n", 14) = 14
time(NULL) = 1618304400
nanosleep({5, 0}, NULL) = 0
exit_group(0) = ?
+++ exited with 0 +++
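The output displays the system calls triggered by the Python script. This includes execve, which executes the Python interpreter, write for printing the "Hello, World!" message, time for retrieving the current time, and nanosleep to pause the script's execution for 5 seconds. The exact calls vary with the Python and C library versions, so on your system the sleep may show up as a different call, such as clock_nanosleep or select.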
By analyzing this strace output, you can see how your program interacts with the operating system and spot failing calls or performance limitations.
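One quick way to spot potential issues in a long trace is to list only the calls that failed. A failed system call is reported with a return value of -1 followed by the error name. The sketch below scans trace text for such lines; the helper name failed_calls and the sample text (copied from the output above) are illustrative assumptions, and the pattern only covers simple single-line entries.

```python
import re

# Sample lines copied from the trace above.
TRACE = """\
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
write(1, "Hello, World!\\n", 14) = 14
exit_group(0) = ?
+++ exited with 0 +++
"""

# A failed call ends with "= -1 ERRNAME (description)".
FAILED_RE = re.compile(r'^(\w+)\(.*\)\s+=\s+-1\s+([A-Z0-9]+)\b')

def failed_calls(trace_text):
    """Return (syscall, error_name) pairs for calls that returned -1."""
    failures = []
    for line in trace_text.splitlines():
        match = FAILED_RE.match(line)
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures

print(failed_calls(TRACE))  # → [('access', 'ENOENT')]
```

Not every failure is a bug: the ENOENT from access on /etc/ld.so.preload is expected. The list is a starting point for deciding which failures deserve a closer look.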
Consider another example. This time, we will trace the execution of the ls command, enhanced with additional options:
strace -c ls -l ~/project
Example output:
total 4
-rw-r--r-- 1 labex labex 59 Apr 12 13:33 example.py
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.45    0.000005           5         1           execve
 27.27    0.000003           3         1           brk
  9.09    0.000001           1         1           access
  9.09    0.000001           1         1           openat
  9.09    0.000001           1         1           close
  0.00    0.000000           0         4           read
  0.00    0.000000           0         2           fstat
  0.00    0.000000           0         1           mmap
  0.00    0.000000           0         1           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         2           ioctl
  0.00    0.000000           0         1           statfs
  0.00    0.000000           0         1           access
  0.00    0.000000           0         2           newfstatat
  0.00    0.000000           0         2           close
------ ----------- ----------- --------- --------- ----------------
100.00    0.000011                    22           total
In this instance, we used the -c option to generate a summary of the system calls executed by the ls command. The listing from ls appears first, followed by the strace report. For each system call, the report shows the proportion of the total traced time, the time in seconds, the average microseconds per call, the number of calls, and the number of errors encountered.
This detailed information proves valuable in pinpointing performance bottlenecks or gaining deeper insights into a program's behavior.
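The "% time" column is each call's share of the total time in the "seconds" column. As a small worked example, the sketch below recomputes the percentages from the nonzero rows of the summary above, with the times expressed in whole microseconds. The function name percent_time is a hypothetical helper, not something strace provides.

```python
def percent_time(usecs_by_call):
    """Return each call's share of the total traced time, in percent."""
    total = sum(usecs_by_call.values())
    return {
        name: round(100 * usecs / total, 2)
        for name, usecs in usecs_by_call.items()
    }

# Microseconds per system call, taken from the "seconds" column above
# (only the rows with a nonzero time).
usecs = {"execve": 5, "brk": 3, "access": 1, "openat": 1, "close": 1}

shares = percent_time(usecs)
for name, pct in shares.items():
    print(f"{pct:6.2f} {name}")
```

With a total of 11 microseconds, execve accounts for 5/11 of the time, which matches the 45.45 shown in the report. With totals this small the percentages say little about real performance; the column becomes more useful for longer-running programs.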
Debugging Processes with strace
In this segment, we will explore the application of the strace command for debugging running processes and identifying possible problems within the Linux operating system.
To begin, let's create a simple C program that we can use for debugging:
cat > ~/project/example.c << EOF
#include <stdio.h>
#include <unistd.h>
int main(void) {
    printf("Hello, World!\n");
    sleep(5);
    return 0;
}
EOF
Now, let's compile the program and run it under strace:
gcc -o ~/project/example ~/project/example.c
strace ~/project/example
Example output:
execve("/home/labex/project/example", ["/home/labex/project/example"], 0x7ffee4f7a0f0 /* 23 vars */) = 0
brk(NULL) = 0x55b7d6c23000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
write(1, "Hello, World!\n", 14) = 14
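clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=5, tv_nsec=0}, 0x7ffee4f79f60) = 0
exit_group(0) = ?
+++ exited with 0 +++
The output details the sequence of system calls initiated by the C program. Notable calls include execve for starting the program, write to output "Hello, World!", and clock_nanosleep to pause execution for 5 seconds. Note that sleep itself is a C library function, not a system call, so strace shows the system call that implements it. Depending on the C library version, this is clock_nanosleep or nanosleep.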
Suppose we need to troubleshoot an issue with the program. For instance, if the program isn't writing the expected output to a file, we can restrict the trace to file-related system calls to diagnose the situation:
strace -e trace=file ~/project/example
Example output:
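execve("/home/labex/project/example", ["/home/labex/project/example"], 0x7ffee4f7a0f0 /* 23 vars */) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
...
+++ exited with 0 +++
With trace=file, strace reports only system calls that take a file name as an argument, so write, the sleep, and exit_group are filtered out. The calls that remain all belong to program startup: execve runs the binary, and the dynamic loader looks up shared libraries. The program itself never opens a file. Its only output is the write to file descriptor 1 (standard output) seen in the full trace, so the expected file is missing because the program never attempts to create it, not because a file operation failed.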
By using strace to trace either particular system calls or the complete system call activity, you can often pinpoint the source of problems in your programs and debug them more efficiently. This is an invaluable skill for any system administrator.
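The strace options used in this tutorial can be combined, and strace can also attach to a process that is already running with -p. As a hedged sketch, the helper below assembles strace command lines from a few common options: -f (follow child processes), -e trace= (filter by system call or class), -o (write the trace to a file), and -p (attach by process ID). The function strace_argv, its parameters, the PID 1234, and the file name trace.log are all illustrative assumptions. The snippet only builds and prints the commands; it does not run them.

```python
import shlex

def strace_argv(command=None, pid=None, trace=None, output=None, follow_forks=False):
    """Build an strace argument list for either a new command or a running PID."""
    if (command is None) == (pid is None):
        raise ValueError("give exactly one of command or pid")
    argv = ["strace"]
    if follow_forks:
        argv.append("-f")                   # also trace child processes
    if trace:
        argv += ["-e", f"trace={trace}"]    # e.g. "file" or "openat,write"
    if output:
        argv += ["-o", output]              # write the trace to a file
    if pid is not None:
        argv += ["-p", str(pid)]            # attach to a running process
    else:
        argv += list(command)               # start and trace a new command
    return argv

# Trace only file-related calls of a new command.
print(shlex.join(strace_argv(command=["ls", "-l"], trace="file")))

# Attach to a hypothetical running process and log selected calls to a file.
print(shlex.join(strace_argv(pid=1234, trace="openat,write", output="trace.log")))
```

Attaching to a process you do not own, or on systems with restricted ptrace settings, typically requires running strace with sudo.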
Summary
In this tutorial, we explored the Linux strace command, which lets us trace and monitor the system calls made by a process. We began by introducing and installing strace. Next, we traced the system calls made by the simple ls command to see how a program interacts with the operating system. We then traced a simple Python script and used the -c option to summarize system call activity. Finally, we compiled a small C program and narrowed the trace to file-related calls to help diagnose a problem. Analyzing strace output in this way helps you understand program behavior and debug issues, which is a key skill for any system administrator managing Linux systems.