Unix Debugging Tips

Dæmons | Files | Logging | Network | Permissions | Process Tracing

The many flavors of Unix operating systems complicate system diagnosis and debugging. Resources like the Rosetta Stone for Unix help catalog the differences. If you are new to Unix administration, supplement the vendor-supplied documentation with a general handbook such as the Unix System Administration Handbook or the Linux Administration Handbook.

Establish baselines: debug existing systems to learn how they work under normal operation, and investigate new systems and products the same way. Write scripts to check for common problems, and run them automatically from a monitoring service such as Nagios. Set up profiling to trend usage over time and spot slow performance; many tools based on the Round Robin Database (RRD) format collect and display trending information.
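Here is a minimal sketch of such a check script. It follows the Nagios plugin convention: print one status line, then exit 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The function names and the default thresholds of 80% and 90% are illustrative assumptions. Disk usage is read from POSIX `df -P`.

```shell
#!/bin/sh
# Minimal Nagios-style disk check (a sketch; thresholds are assumptions).
# Exit codes follow the Nagios plugin convention:
#   0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN

# classify_usage PERCENT WARN CRIT: print a state word and return its code
classify_usage() {
  if [ "$1" -ge "$3" ]; then
    echo CRITICAL; return 2
  elif [ "$1" -ge "$2" ]; then
    echo WARNING; return 1
  fi
  echo OK
  return 0
}

# check_disk [MOUNTPOINT] [WARN] [CRIT]: report usage of the filesystem
# holding MOUNTPOINT (default /)
check_disk() {
  mount=${1:-/} warn=${2:-80} crit=${3:-90}
  # df -P prints one line per filesystem; field 5 is the "NN%" capacity
  pct=$(df -P "$mount" 2>/dev/null | awk 'NR == 2 { sub(/%$/, "", $5); print $5 }')
  if [ -z "$pct" ]; then
    echo "DISK UNKNOWN - cannot read usage for $mount"
    return 3
  fi
  rc=0
  state=$(classify_usage "$pct" "$warn" "$crit") || rc=$?
  echo "DISK $state - $mount is ${pct}% full"
  return "$rc"
}
```

A monitoring service would run something like `check_disk /var 85 95` and act on the exit status.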

See also shell tips and debugging notes. Unix system configuration can be unified through a configuration tool such as CFEngine. Linux Troubleshooting covers information specific to Linux.

Dæmons

Find where dæmons are launched from, how to toggle their startup status, and whether they are actually running properly. All of this varies by system, by release, and by site. Places to look include the init.d or rc.d directories under /etc, or perhaps under /sbin. Tools like lsof can show which files and ports a process has open.

Older BSD systems use /etc/rc and /etc/rc.conf, along with *.local files for local configuration, possibly also under /usr/local/etc. Newer BSD systems may use the "next generation" rc.d system (rcNG).

Apple systems use both /etc/rc and a custom SystemStarter that reads startup scripts from /System/Library/StartupItems or /Library/StartupItems. Check /etc/hostconfig for startup toggles.

Dæmons might also be started from /etc/inittab or crond, or by any number of other custom mechanisms, such as daemontools.

On systems that do not randomize process IDs, sorting running processes by PID may show the order in which early processes were started. On Mac OS X, for instance, /sbin/init is launched as process one, followed by OS X-specific processes such as /sbin/mach_init and netinfod.

$ ps awxo pid,command | sort -n | head -10
PID COMMAND
1 /sbin/init
2 /sbin/mach_init
84 kextd
86 /usr/sbin/configd
87 /usr/sbin/diskarbitrationd
92 /usr/sbin/notifyd
117 netinfod -s local
119 update
121 /System/Library/CoreServices/coreservicesd

Files

Problems can also hide inside files. For example, a login may fail because the user's shell is set to "/bin/sh " (with a trailing space) instead of /bin/sh. Only the latter is permitted by /etc/shells.

$ </etc/passwd grep ^root | awk -F: '{print $NF}' | od -bc
0000000 057 142 151 156 057 163 150 040 012
/ b i n / s h \n
0000011
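To catch this class of problem across all accounts, one rough approach is to compare every login shell in /etc/passwd against the entries in /etc/shells. The check below prints each mismatched shell between angle brackets, so trailing whitespace becomes visible. The function name is hypothetical, and both file paths are parameters so the check can be tried on copies first. Expect it to also flag system accounts whose shell is deliberately something like /usr/sbin/nologin. For a quick manual look instead, `cat -vet /etc/passwd` marks every line end with `$`.

```shell
#!/bin/sh
# check_shells [PASSWD] [SHELLS]: report accounts whose login shell is not
# listed verbatim in the shells file. Each shell is printed between <...>
# so stray spaces or tabs stand out. Returns 1 if anything was reported.
check_shells() {
  passwd=${1:-/etc/passwd} shells=${2:-/etc/shells}
  awk -F: '
    # First file: remember each listed shell, skipping comments and blanks
    FILENAME == ARGV[1] { if ($0 !~ /^#/ && $0 != "") ok[$0] = 1; next }
    # Second file: field 7 is the login shell (empty means the default)
    $7 != "" && !($7 in ok) { printf "%s: <%s>\n", $1, $7; bad = 1 }
    END { exit bad }
  ' "$shells" "$passwd"
}
```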

Useful tools for debugging file contents include file(1), od(1) (or the newer hexdump(1), which I have yet to figure out), and strings(1).

$ file index.xml
index.xml: UTF-8 Unicode English text, with very long lines
$ head -1 index.xml | od -bc
0000000 357 273 277 074 077 170 155 154 040 166 145 162 163 151 157 156
357 273 277 < ? x m l v e r s i o n
0000020 075 042 061 056 060 042 040 145 156 143 157 144 151 156 147 075
= " 1 . 0 " e n c o d i n g =
0000040 042 165 164 146 055 070 042 077 076 012
" u t f - 8 " ? > \n
0000052
$ sudo strings /var/log/wtmp | grep -v tty | sort | uniq
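In the index.xml output, the first three bytes (octal 357 273 277, hex EF BB BF) are a UTF-8 byte order mark. Some programs mishandle it. For example, the kernel will not recognize a `#!` line as a script header if a BOM comes before it. Below is a minimal sketch for detecting and stripping the mark. The function names are hypothetical. It relies on `head -c`, which is not in POSIX but is available in the GNU, BSD, and BusyBox userlands.

```shell
#!/bin/sh
# has_utf8_bom FILE: succeed if FILE starts with the bytes EF BB BF
has_utf8_bom() {
  # od -An -tx1 prints bytes as hex without offsets; tr drops the spacing
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \t\n')" = "efbbbf" ]
}

# strip_utf8_bom FILE: write FILE to stdout without a leading BOM
strip_utf8_bom() {
  if has_utf8_bom "$1"; then
    tail -c +4 "$1"   # output from the 4th byte onward
  else
    cat "$1"
  fi
}
```

For example, `strip_utf8_bom index.xml > index.clean.xml` would write a BOM-free copy.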

Some systems have a /proc filesystem that contains interesting data. For example, under Linux one may want to know the environment settings of a particular process, in this case the running shell itself. The superuser could use (or abuse) this to check whether a user's process has a particular environment variable set.

$ tr \\000 \\n < /proc/$$/environ
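To check a single variable rather than reading the whole list, the same /proc file can be filtered by name. The helper below is hypothetical and Linux-specific. Like the command above, it needs superuser rights to read another user's process. Note that /proc/PID/environ reflects the environment the process was started with. Changes the process makes later (a shell `export`, for instance) are usually not visible there.

```shell
#!/bin/sh
# proc_getenv PID NAME: print the value of NAME from the environment that
# process PID was started with (Linux /proc only)
proc_getenv() {
  # environ holds NUL-separated NAME=value pairs: turn NULs into newlines,
  # then print whatever follows "NAME=" on the matching line
  tr '\000' '\n' < "/proc/$1/environ" | sed -n "s/^$2=//p"
}
```

For example, `sudo sh -c '. ./helpers.sh; proc_getenv 1432 LD_LIBRARY_PATH'` would check a dæmon's library path. Here `helpers.sh` is a hypothetical file holding the function.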

Logging

Logging may help debug problems, assuming logs are being generated and properly stored and analysed. Applications may write their own log files (check their configuration files, or use lsof to look for open log files) or send messages via syslog(3). For the stock syslogd on Unix, check /etc/syslog.conf for logging rules, and ensure that syslogd is running.
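When an application's configuration does not say where it logs, lsof can list the regular files a running process holds open for writing, which are usually its logs. The sketch below assumes lsof's default column layout (`COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME`) and file names without spaces. The function names are hypothetical.

```shell
#!/bin/sh
# writable_files: filter lsof output on stdin down to regular files that
# are open for writing ("w") or read/write ("u"), one path per line
writable_files() {
  awk 'NR > 1 && $5 == "REG" && $4 ~ /^[0-9]+[wu]/ { print $NF }' | sort -u
}

# open_logs PID: list files a running process has open for writing
open_logs() {
  # -n and -P skip host and port name lookups, which keeps lsof fast
  lsof -nP -p "$1" | writable_files
}
```

Run it as the superuser (for example `sudo sh -c '. ./helpers.sh; open_logs 1432'`, with `helpers.sh` again hypothetical) so lsof can see other users' processes.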

Consolidating logs into a single file or database with syslog(3) or better protocols can ease debugging by avoiding the need to consult many different logfiles and log formats. More information on setting up logging and analyzing logfiles is available.

Network

To debug network protocols, you may need low-level tools such as tcpdump, tcpflow, or ethereal. First determine whether the expected traffic is even reaching the systems in question (run tcpdump on them). Then move on to the contents of the traffic (tcpflow or ethereal).
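The two stages might be wrapped in a pair of shell functions like these. The functions are hypothetical conveniences. The interface name and filter are whatever fits the site, and the filter words are passed unchanged to tcpdump or tcpflow, both of which accept pcap filter expressions.

```shell
#!/bin/sh
# Stage one: is the traffic reaching this host at all?
# see_traffic IFACE FILTER...   e.g.  see_traffic eth0 port 389
see_traffic() {
  if [ $# -lt 2 ]; then
    echo "usage: see_traffic IFACE FILTER..." >&2
    return 2
  fi
  iface=$1; shift
  # -n: do not resolve addresses or ports to names; -c 20: stop after 20 packets
  sudo tcpdump -n -c 20 -i "$iface" "$@"
}

# Stage two: what does the conversation contain?
# see_payload IFACE FILTER...
see_payload() {
  if [ $# -lt 2 ]; then
    echo "usage: see_payload IFACE FILTER..." >&2
    return 2
  fi
  iface=$1; shift
  # -c: print each flow's data to the terminal instead of writing files
  sudo tcpflow -c -i "$iface" "$@"
}
```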

Tools such as ping and traceroute may also help, though a firewall can easily be configured to block them while letting other traffic through. More comprehensive scanning can be done with nmap.
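Below is a sketch that runs the three tools in order, cheapest first, against a single host. The function name and the default port list are assumptions. Because a firewall may drop ICMP, the nmap step uses `-Pn`. This skips nmap's own ping-based host discovery and probes the ports anyway.

```shell
#!/bin/sh
# probe_host HOST [PORTS]: basic reachability checks, cheapest first
probe_host() {
  if [ $# -lt 1 ]; then
    echo "usage: probe_host HOST [PORTS]" >&2
    return 2
  fi
  host=$1 ports=${2:-22,80,443}
  # ICMP echo; a failure here may only mean ICMP is filtered
  ping -c 3 "$host"
  # Route to the host; -n prints hop addresses without reverse DNS lookups
  traceroute -n "$host"
  # TCP port scan; -Pn treats the host as up even if it ignores pings
  nmap -Pn -p "$ports" "$host"
}
```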

The following list shows example uses of network debugging commands.

Permissions

Permission problems are very common, and often hard to debug when the cause is a parent directory rather than the file being complained about. The parsepath utility helps here: it lists the permissions of a file along with those of every parent directory.

$ parsepath /etc/mail/certs/host.cert
% /etc/mail/certs/host.cert
d 1775 root:admin /
l 0755 root:admin /etc -> private/etc
d 0755 root:wheel /etc/mail
d 0755 root:wheel /etc/mail/certs
f 0644 root:wheel /etc/mail/certs/host.cert
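Where parsepath is not installed, you can approximate it with standard tools by walking from the root down to the file and running `ls -ld` on each component. On Linux, `namei -l` from util-linux does much the same. The function name below is hypothetical, and unlike parsepath it does not evaluate access for a given user.

```shell
#!/bin/sh
# path_perms PATH: run ls -ld on each component of PATH, from the top down,
# so a restrictive parent directory stands out
path_perms() {
  p=$1 list=
  # Prepend PATH and then each parent, so the list ends up root-first
  while :; do
    list="$p
$list"
    case $p in /|.) break ;; esac
    p=$(dirname "$p")
  done
  # Print from the root (or .) down to the file itself
  printf '%s' "$list" | while IFS= read -r component; do
    ls -ld -- "$component"
  done
}
```

For example, `path_perms /etc/mail/certs/host.cert` prints lines for `/`, `/etc`, `/etc/mail`, `/etc/mail/certs`, and the certificate itself.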

parsepath can also test whether a user or group has access to a file, including access to the file's parent directories. For example, when debugging a Common Gateway Interface (CGI) script, use parsepath to check whether the web server user apache can both read and execute the script.

$ parsepath user=apache +rx /var/www/cgi-bin/test-cgi
! unix-other +x fails: f 0644 root:admin /var/www/cgi-bin/test-cgi
$ sudo chmod +x /var/www/cgi-bin/test-cgi
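Without parsepath, a quick alternative is to let the kernel answer by running test(1) as the user in question. Because the kernel resolves the whole path for that user, search permission on parent directories is checked too. The sketch below assumes sudo is configured, and the function name is hypothetical. Unlike parsepath, it does not say which component failed. Checking as root is not very informative, since root bypasses most permission checks.

```shell
#!/bin/sh
# can_access USER MODES FILE: test whether USER can r/w/x FILE, e.g.
#   can_access apache rx /var/www/cgi-bin/test-cgi
can_access() {
  user=$1 modes=$2 file=$3 rc=0
  # Run the tests directly if USER is the current user, otherwise via sudo
  if [ "$user" = "$(id -un)" ]; then run=; else run="sudo -u $user"; fi
  for m in r w x; do
    case $modes in
      *"$m"*)
        # $run is unquoted on purpose so "sudo -u USER" splits into words
        if ! $run test "-$m" "$file"; then
          echo "$user: -$m fails on $file"
          rc=1
        fi
        ;;
    esac
  done
  return "$rc"
}
```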

Non-standard Access Control List (ACL) features of some filesystems, such as OpenAFS, or of enhanced security packages may complicate permissions checking. For example, a process running under systrace on OpenBSD might be denied access to a file it would otherwise have rights to. Applications on Linux may run afoul of SELinux policies. If SELinux is causing problems, look into the audit2allow and chcon commands.
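SELinux denials are logged as AVC records, typically in /var/log/audit/audit.log. `ausearch -m avc -ts recent` shows recent ones, and audit2allow reads the same records to propose policy changes. A small filter for summarizing denials by hand might look like the one below. It assumes the usual record layout (`avc:  denied  { perms } for ... comm=... name=... tcontext=... tclass=...`). The function name is hypothetical.

```shell
#!/bin/sh
# avc_summary: read audit log lines on stdin and print one line per SELinux
# denial: program, denied permissions, target name, class, target context
avc_summary() {
  awk '/avc: +denied/ {
    perms = $0
    sub(/.*[{] */, "", perms); sub(/ *[}].*/, "", perms)
    comm = name = tcon = tclass = "?"
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^comm=/)     comm   = substr($i, 6)
      if ($i ~ /^name=/)     name   = substr($i, 6)
      if ($i ~ /^tcontext=/) tcon   = substr($i, 10)
      if ($i ~ /^tclass=/)   tclass = substr($i, 8)
    }
    gsub(/"/, "", comm); gsub(/"/, "", name)
    print comm ": " perms " on " name " (" tclass ", " tcon ")"
  }'
}
```

For example, `sudo avc_summary < /var/log/audit/audit.log | sort | uniq -c` would count repeated denials. A wrong file context shown in the output can then be fixed with chcon, or reset to the policy default with restorecon.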

Process Tracing

Berkeley Software Distribution (BSD) | Linux

Process tracing is invaluable for debugging, or when setting up complicated applications such as OpenLDAP, but the available tools vary by Unix flavor. Also be sure to trace dæmons while they are working correctly, to learn what normal behaviour looks like. If an application later fails, its trace can then be compared against a known-good one.
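Comparing a known-good trace with a failing one is easier once run-specific noise is removed. Below is a minimal sketch for strace-style text. It masks hexadecimal addresses, which change from run to run, and then hands both files to diff. The function names are hypothetical. Timestamps, PIDs, and file descriptor numbers may need similar masking.

```shell
#!/bin/sh
# normalize_trace FILE: mask hex addresses so separate runs line up
normalize_trace() {
  sed 's/0x[0-9a-f][0-9a-f]*/0xADDR/g' "$1"
}

# diff_traces GOOD BAD: show where a failing trace departs from a good one;
# returns diff's status (0 same, 1 different, 2 trouble)
diff_traces() {
  tmp=$(mktemp -d) || return 2
  normalize_trace "$1" > "$tmp/good"
  normalize_trace "$2" > "$tmp/bad"
  rc=0
  diff "$tmp/good" "$tmp/bad" || rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```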

I still need better tools to quickly parse process trace output for common patterns: permission denied errors, file names, and similar.
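One lightweight option for strace output is to keep only the system calls that failed and print the error next to the file name involved. Below is a sketch. It assumes the usual strace line shape, `call(..., "path", ...) = -1 ERRNO (message)`, without timestamps. Lines may be prefixed by `[pid N]` or by a bare PID, as `-f` produces. The function name is hypothetical. ENOENT results are noisy because programs routinely probe search paths, so the output is sorted by frequency.

```shell
#!/bin/sh
# trace_failures: read strace output on stdin; for each failed call whose
# arguments include a quoted string, print "count ERRNO call string",
# most frequent first (the first quoted argument is usually the path)
trace_failures() {
  sed -e 's/^\[pid  *[0-9]*] //' -e 's/^[0-9][0-9]*  *//' |
    sed -n 's/^\([a-z0-9_]*\)([^"]*"\([^"]*\)".*= -1 \(E[A-Z0-9]*\) .*/\3 \1 \2/p' |
    sort | uniq -c | sort -rn
}
```

Usage might look like `strace -f -o trace.out some-command; trace_failures < trace.out | grep -v ENOENT`.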

Berkeley Software Distribution (BSD)

BSD systems include FreeBSD, Mac OS X, and OpenBSD, among others. BSD systems should have ktrace(1) for process tracing, and kdump(1) to parse the output from ktrace.

When learning a new application, trace it to understand its behaviour better and to find the important files the program looks for. For instance, ktrace reveals that the cfagent command of cfengine uses many files under /var/cfengine by default.

$ sudo ktrace /usr/local/sbin/cfagent
$ sudo kdump -f ktrace.out > data
$ grep /var/cfengine data | sort | uniq
9265 cfagent NAMI "/var/cfengine"
9265 cfagent NAMI "/var/cfengine/cf.preconf"
9265 cfagent NAMI "/var/cfengine/cf_learning.db"
9265 cfagent NAMI "/var/cfengine/inputs/cfagent.conf"
9265 cfagent NAMI "/var/cfengine/inputs/cfrc"
9265 cfagent NAMI "/var/cfengine/inputs/update.conf"
9265 cfagent NAMI "/var/cfengine/modules"
9265 cfagent NAMI "/var/cfengine/ppkeys"
9265 cfagent NAMI "/var/cfengine/ppkeys/localhost.priv"
9265 cfagent NAMI "/var/cfengine/randseed"
9265 cfagent NAMI "/var/cfengine/rpc_in"
9265 cfagent NAMI "/var/cfengine/rpc_out"
9265 cfagent NAMI "/var/cfengine/state"
9265 cfagent NAMI "/var/cfengine/state/__db.cf_state.db"
9265 cfagent NAMI "/var/cfengine/state/cf_procs"
9265 cfagent NAMI "/var/cfengine/state/cf_state.db"
9265 cfagent NAMI "/var/cfengine/state/env_data"
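The grep above lists every path cfagent looked up, but not which lookups failed. kdump reports the result of each call on a RET line that follows the call's NAMI line, so pairing the two yields a list of missing or unreadable files. Below is a sketch in awk. It assumes kdump's default layout (PID, command name, record type, then details) and command names without spaces. The function name is hypothetical.

```shell
#!/bin/sh
# kdump_failures: read kdump output on stdin and print each path whose
# call returned an error, as "path: message (syscall)"
kdump_failures() {
  awk '
    { pid = $1 }
    # Remember the most recent path looked up by each process
    $3 == "NAMI" {
      p = $0
      sub(/^[^"]*"/, "", p)   # drop everything up to the opening quote
      sub(/" *$/, "", p)      # drop the closing quote
      last[pid] = p
      next
    }
    # A RET line ends the call; report it if it carries an errno
    $3 == "RET" && (pid in last) {
      for (i = 4; i <= NF; i++)
        if ($i == "errno") break
      if (i + 2 <= NF) {
        msg = $(i + 2)
        for (j = i + 3; j <= NF; j++) msg = msg " " $j
        printf "%s: %s (%s)\n", last[pid], msg, $4
      }
      delete last[pid]
    }
  '
}
```

With the trace above, `kdump -f ktrace.out | kdump_failures` would list, for example, any of the /var/cfengine files that did not exist.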

Linux

Use strace(1) on Linux systems to trace system calls and signals.

$ strace /bin/echo test

# strace -o /root/named.log -p 1432

For applications like xinetd(8), running strace is complicated because xinetd forks off copies of various programs. The solution is to use the -o option to write to log files and -ff to follow forks. With both options, strace writes each process's trace to its own file, named from the -o argument plus the process ID.

$ ps awx | grep xinet[d]
20749 ? S 0:00 xinetd -stayalive -pidfile /var/run/xinetd.pid
$ sudo -H strace -F -ff -o/root/xinetd.log -p 20749
Process 20749 attached - interrupt to quit
Process 20813 attached
[ Process PID=20813 runs in 32 bit mode. ]
PANIC: attached pid 20813 exited
[ Process PID=20749 runs in 64 bit mode. ]
Process 20749 detached

Details should be in the /root/xinetd.log.20813 logfile.

To log only file-related system calls, pass -e trace=file to strace. This is handy for debugging vendor installer scripts, since it shows which files they read, write, and look for.