Troubleshooting and Debugging
Troubleshooting and Debugging
Troubleshooting : the process of identifying, analyzing and solving problems
Debugging : the process of identifying, analyzing and removing bugs in a system
Troubleshooting is for infrastructure
tcpdump, wireshark
strace, ltrace
ps, top etc
Debugging : is for Software application
Debugger : follows the code line by line , inspect changes in variable assignments, interrupt the program when a specific condition is met and more
Steps to solve any issue
Getting Information to understand the problem
Isolate and finding the root cause
Performing the necessary remediation
Document what we do
The different things we tested to try
Figure out the root cause.
The steps we took to fix the issue.
strace : to trace system calls made by the program
o file.strace : to save output of trace to a file
Main steps to solving any issue ?
Isolating the root cause is super important
Load average : amount of time the process is busy in a minute
load average 1 means it was busy for whole min.
shouldn’t be above the amount of process in computer
Reproduction Case
Read the logs
Linux
/var/log/syslog
.xsession-errors
MacOS : Library/Logs/
Windows : EventViewer
Our solution dont come up by wandering about things, we have to look at information to plug things into our knowledge graph. looking at error messages or documentation
Get a reproduction, try to isolate the problem
Understanding the root cause is super important
iostat :
iotop :
vmstat :
ionice :
iftop :
rsync -bwlimit
trickle
ab
nice & renice
time
kill command
Like isolating causes, understanding error messages, adding logging information, and generating new ideas for possible failures.
First apply intermittent solution, then apply the full solution
Last updated