Sysadmin triage

June 8th, 2009

Back when I was a professional sysadmin (now I just do it for fun) I came up with a few simple tests to perform on misbehaving hosts. These tests are obvious and easy to run, but they’re worth remembering because we’re too often tempted to look for complex solutions to problems that only look complex at first. It’s humbling how often what looks like a complex software issue isn’t complex at all.

So when things go wrong, before reverting the last change, before breaking out gdb and strace, and before tweaking your software on your production host, spend 5 minutes running through these quick, simple tests. There’s a high likelihood that you’ll solve your problem quickly. (I’m sure there are other tests you can do; these are the ones burnt into my mind.)

#1 – Disk space

Don’t laugh: running out of disk space can cause you pain in so many ways it’ll make your head spin. Check all partitions, including /tmp, /var/tmp and /var. Running out of space in /tmp means applications won’t be able to write temporary files which, depending on the app, may make them behave very strangely. /var is used for many things, including logging in /var/log. Not being able to log will make some software cry like a baby: it may crash and you’ll have no idea why, because the reason certainly won’t be in the log file. Databases like MySQL don’t like having no room to write in /var/lib/mysql, so don’t be surprised if you get some db corruption. With MySQL, you may still be able to start the database and even connect to it with the mysql client, which can lead you to look elsewhere, but checking disk space will take you seconds.

dkam@vihko:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.4G  7.1G  1.8G  80% /
udev                   10M  116K  9.9M   2% /dev
shm                   128M     0  128M   0% /dev/shm

Don’t forget to check inodes too. Running out of inodes can cause the same issues as running out of disk space but is less obvious. Checking for it is just as easy, though:

dkam@vihko:~$ df -ih
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sda1               1.2M    445K    772K   37% /
udev                     32K    1.1K     31K    4% /dev
shm                      32K       1     32K    1% /dev/shm
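If you’d rather not eyeball the output, something like the following could flag the filesystems that are nearly full. It’s a rough sketch, not a monitoring tool: flag_full is a name made up for this example, the 90% default is an arbitrary choice, and it assumes GNU df’s -P output format, where the fifth column is the percentage used and the sixth is the mount point (mount points containing spaces will confuse it).

```shell
# Print "mountpoint use%" for every filesystem at or above a threshold.
# Reads df -P style output on stdin, so the same function works for
# disk space (df -P) and for inodes (df -Pi).
# flag_full and the default of 90 are illustrative choices only.
flag_full() {
  threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)          # strip the trailing percent sign
    if (use + 0 >= t) print $6, $5
  }'
}

# Usage:
#   df -P  | flag_full 90    # disk space
#   df -Pi | flag_full 90    # inodes
```

No output means nothing is at or over the threshold.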

#2 – DNS resolution

DNS resolution problems can cause your system and applications to hang or time out in very strange ways.

Some applications log the name of inbound network connections by performing reverse lookups. If no name server is available, these connections may start to take a long time to establish, as the software waits for the resolver to time out. If only the first listed name server has failed, the delay may vary in length, but will probably be around 15 seconds. If you see weird lags or delays, check your name servers. This can happen when you’re trying to ssh into the host: if you’re getting delays connecting via ssh, check DNS. Likewise, if your software connects to external databases, for example, and is configured to address them by name, you’ll see these timeouts.

This one can be tricky because some software caches name resolution and some local resolvers cache too, meaning you’ll see delays or timeouts sometimes, but not consistently.

Name lookups should take under a second, preferably in the low hundreds of milliseconds.

dkam@vihko:~$ time host www.apple.com
www.apple.com is an alias for www.apple.com.akadns.net.
www.apple.com.akadns.net has address 17.251.200.32

real	0m0.132s
user	0m0.000s
sys	0m0.000s
dkam@vihko:~$ time host www.apple.com
www.apple.com is an alias for www.apple.com.akadns.net.
www.apple.com.akadns.net has address 17.251.200.32

real	0m0.011s
user	0m0.000s
sys	0m0.000s

You can see that in the second run, the name server had cached the value and returned much faster.

It also pays to check each nameserver listed in /etc/resolv.conf:

dig www.google.com @208.78.97.155

Naturally, replace 208.78.97.155 with your name server’s IP.

Check “man resolv.conf” for more information.
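Checking every listed name server by hand gets tedious, so here is one way the loop might look. Treat it as a sketch under a few assumptions: list_nameservers and check_nameservers are names invented for this example, www.example.com is just a placeholder lookup, and dig needs to be installed. The +time and +tries options stop a dead server from stalling the loop for long.

```shell
# Print the nameserver addresses from a resolv.conf-style file
# (defaults to /etc/resolv.conf). Hypothetical helper name.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Query each configured nameserver in turn and show how long it took.
# The lookup name defaults to a placeholder; pass your own as $1.
check_nameservers() {
  name="${1:-www.example.com}"
  for ns in $(list_nameservers); do
    echo "== $ns"
    # dig prints a ";; Query time: N msec" line on success
    dig +time=2 +tries=1 "$name" "@$ns" | grep -E 'Query time|timed out'
  done
}

# Usage:
#   check_nameservers www.google.com
```

A server that answers slowly or not at all stands out straight away, even when the first listed server is healthy and hides the problem most of the time.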

#3 – Ulimits

The most common ulimit that I’ve come across is the maximum number of open files, but you may see others, including max user processes. This one is generally obvious if the software is running as a regular user: when you try to log in as that user you will see error messages about being unable to allocate resources. Editing a file or trying to read a man page will error out if you’re at the maximum number of open files. Network connections count as open files too, so you may not be able to open network connections either.

The more likely scenario is that the software is running as a different user, one that people don’t log in as. Try logging in as that user, or use su - to switch to it. If you can’t, or you can but the user can’t open files, check the ulimits. In Bash, try “ulimit -a” to view your limits. Different OSs limit these values in different ways, so check your OS documentation for details.
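On Linux you can also look at a running process directly instead of logging in as its user. The sketch below is Linux-only and makes some assumptions: fd_usage is a made-up name, it relies on /proc/<pid>/limits and /proc/<pid>/fd being present, and you need to be root or the process owner to read them.

```shell
# Show how many file descriptors a process has open against its soft limit.
# Linux-only sketch; fd_usage is a hypothetical helper name.
fd_usage() {
  pid="$1"
  # The "Max open files" line reads: Max open files <soft> <hard> files
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  # One entry per open descriptor
  used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  echo "pid $pid: $used of $limit file descriptors in use"
}

# Usage (current shell shown; substitute the pid of the misbehaving daemon):
#   fd_usage $$
```

If the two numbers are close, the open files limit is a likely suspect.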

#4 – /dev/random

This one is a little esoteric and pretty unlikely, but /dev/random is used for lots of reasons. The most common use that you may have problems with is logging in to software that uses something like CRAM-MD5. Random data is used as part of the authentication process, and when there’s not enough random data, logging in will be slow or may time out completely. Most software should probably fall back to using /dev/urandom. You can time how long it takes to read random data like this (the example reads 10 kB from /dev/urandom; point if= at /dev/random to test the blocking device):

dkam@vihko:~$ dd if=/dev/urandom of=/dev/null bs=1K count=10
10+0 records in
10+0 records out
10240 bytes (10 kB) copied, 0.002011 s, 5.1 MB/s
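For a quicker yes/no answer, a couple of small helpers along these lines might do. This is a Linux-flavoured sketch with assumptions spelled out: entropy_avail and random_read_ok are invented names, the first reads the kernel’s entropy estimate from /proc, and the second relies on timeout from GNU coreutils. Newer Linux kernels rarely block on /dev/random once booted, so this is mostly of interest on older systems.

```shell
# Linux-only: the kernel's current estimate of available entropy, in bits.
entropy_avail() {
  cat /proc/sys/kernel/random/entropy_avail
}

# Succeeds if 64 bytes can be read from the device within 5 seconds.
# Defaults to /dev/random; the 64 byte and 5 second figures are arbitrary.
random_read_ok() {
  timeout 5 dd if="${1:-/dev/random}" of=/dev/null bs=1 count=64 2>/dev/null
}

# Usage:
#   entropy_avail
#   random_read_ok || echo "/dev/random is blocking"
```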

#5 – Permissions

Generally this will only bite you when you’ve made changes or updated software – check that config files are readable, data directories are read/writable and that executables are executable.
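The check is only meaningful when run as the user the software runs as, so start a shell as that user first. A small helper like this one could then report what that user can actually do with each path. perm_report is a name made up for this sketch, and the paths in the usage comment are placeholders.

```shell
# Report which of read/write/execute the current user has on a path.
# Prints something like "rw- /path/to/file". Hypothetical helper name.
perm_report() {
  p="$1"
  r=-; w=-; x=-
  if [ -r "$p" ]; then r=r; fi
  if [ -w "$p" ]; then w=w; fi
  if [ -x "$p" ]; then x=x; fi
  echo "$r$w$x $p"
}

# Usage (placeholder paths; substitute your own config, data dir and binary):
#   for p in /etc/myapp.conf /var/lib/myapp /usr/local/bin/myapp; do
#     perm_report "$p"
#   done
```

Unlike reading ls -l output, this tests effective access, so group membership and ownership are taken into account.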

dkam Geeky, SysAdmin

Weekend racing.

May 5th, 2009

Went out to race my RC car on the weekend. It broke inside two laps. See Marshall Banana’s posts for pics / details.

dkam Uncategorized

Flying Fun

March 29th, 2009

If you’re ever looking for a fun flying toy, I can confidently recommend the Syma S026. It flies around for about 5 minutes before needing a charge. With two sets of dual blades, you can rotate it left and right and fly forwards and backwards. Up and down too, naturally. It’s tough enough that multiple crashes didn’t hurt it at all and it’s small enough that it doesn’t actually damage anything if you run into stuff. Well, avoid flowers and stuff precariously balanced. Candles are also best put away. You understand.

Update: Try eBay for even better prices! (Oh wait, $19.95 shipping!?!)

dkam Geeky

Virus detection. Awesome.

February 26th, 2009

This is from 2006 – wonder what they’ve done since.

dkam Uncategorized

White Zombie – Electric car

February 22nd, 2009

Twitter + Jabber = Jitter?

February 9th, 2009

I’ve been playing with Twitter lately – I created a Booko Twitter account to reserve the account name while I consider using it. I’ve got my own Twitter account and I had a bit of a play around with it, but honestly using Twitter via the web seems like a drag. Yet another page to watch. Plus, I subscribed to the MelbTransport guy’s page and all I could think was “Can’t I filter this to show me only the updates I’m interested in?” – apparently no, you can’t. 

I had a look at the applications out there to manage your tweets but all I could think was “Man, another app to run, another distraction.” I’ve already got email and IM; I don’t really want another app bouncing in the Dock to tell me someone’s posted a message. So I got to thinking, maybe there’s a way to get Twitter messages sent to me via IM? I had a brief look around but didn’t immediately find anything suitable. A quick Google, however, netted two interesting Ruby Gems, twitter and xmpp4r-simple, which give you a nice Ruby interface to Twitter and Jabber. So, after a couple of hours of hacking around, getting my Twitter account temporarily rate limited and creating Jabber accounts, I’ve got a very simple Twitter <-> Jabber gateway going.

It will post tweets to your Jabber account, and you can reply! Your reply will get posted to Twitter. As an added bonus, I added filtering so I can see only what I want from the MelbTransport guy’s updates. You can easily add your own filters in there; hopefully it’s pretty straightforward.

Now, I know this isn’t beautiful, elegant Ruby code – feel free to leave constructive criticism in the comments.

#!/usr/bin/env ruby

require 'rubygems'
require 'twitter'
require 'xmpp4r-simple'
require 'benchmark'

jabber_user="sendinguser@jabber.org.au"
jabber_pass=""

$receiving_jabber = "receivinguser@jabber.org.au"

twitter_user="twitteruser"
twitter_pass=""

jabber = twitter = nil

cj = Benchmark.realtime {jabber = Jabber::Simple.new(jabber_user, jabber_pass)}
puts "Connecting to Jabber: #{cj}"

ct = Benchmark.realtime {twitter = Twitter::Base.new(twitter_user, twitter_pass) }
puts "Connecting to Twitter: #{ct}"

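# Decide whether a status is worth forwarding; yields to the block if so.
# MelbTransport updates only pass when they mention the lines I care about.
def filters(status)
  if status.user.name == "MelbTransport"
    yield if status.text =~ /Craigieburn|Broadmeadows|Upfield/
  else
    yield
  end
end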

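# Forward any tweets we haven't already handled to the receiving Jabber account.
# The tweets hash remembers which status ids have been seen.
def get_tweets(twitter, tweets, jabber)
  begin
    # The timeline comes back newest first; reverse it so IM shows oldest first.
    twitter.timeline.reverse.each do |s|
      if tweets[s.id].nil?
        filters(s) { jabber.deliver($receiving_jabber, "#{s.user.name} says: #{s.text}") }
        # Mark as handled even if filtered out, so it isn't re-checked next pass.
        tweets[s.id] = "Sent"
      end
    end
  rescue Twitter::CantConnect
    # Back off for two minutes, then try the whole fetch again.
    puts "Can't connect. Sleeping."
    sleep 120
    retry
  end
end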

# Post any chat messages received over Jabber to Twitter.
def post_tweets(twitter, jabber)
  jabber.received_messages { |msg| twitter.post(msg.body) if msg.type == :chat }
end

def main(twitter, jabber)
  tweets = {}
  while true
    puts "Action!"
    get_tweets(twitter, tweets, jabber)
    post_tweets(twitter, jabber)
    sleep 60
  end
end

main(twitter, jabber)

Edits: Reversed the order of the timeline to match how tweets should show up in IM (i.e. oldest at the top, newest at the bottom).

dkam Uncategorized

Science & Morality discussed by Sam Harris

February 9th, 2009

Truism of the day

February 5th, 2009

“Broken gets fixed. Shoddy lasts forever”.  I’ve heard this phrased differently in different places and it’s too often true.  

via DesignAday – Truism.

dkam Uncategorized

Storm by Tim Minchin

January 31st, 2009

1981 Internet story

January 30th, 2009