Chris Adams

If you use the standard Python logging library, the syslog handler doesn't work by default on OS X 10.5. The problem is that SyslogHandler assumes that it can send to localhost:514. In 10.5 Apple disabled network syslog by default, which is a good security measure but it means that previously working code will no longer work (rdar://5871746). The solution is simple - if you used to do logging.handlers.SysLogHandler(address) you now need something like this:

if sys.platform == "darwin":
	# Apple made 10.5 more secure by disabling network syslog:
	address = "/var/run/syslog"
else:
	address = ('localhost', 514)
				
syslog = logging.handlers.SysLogHandler(address)
2008-4-17 08:16

A great example of how bad customer service works: Comcast cut David Winer's service for exceeding a secret transfer limit. He calls and is immediately confronted with a hostile legal department convinced that customers should pay for service they're not allowed to use. Hilarity ensues.

The interesting question: will it be too late by the time the cable companies realize they don't have a monopoly any more? Bandwidth already has competition and now more and enough companies are starting to offer their content on iTunes, Hulu, etc. that the cable company's $60/mo is starting to look rather excessive…

2008-4-11 09:11

Finally, a blog-meme I can related to. Here's what I do all day at home and at work - numbers which I'm sure have changed dramatically since I started living in TextMate a few years ago:

chris@home:~ $ history|awk '{print $2}'|sort|uniq -c|sort -rn|head
 166 python
  52 ls
  47 cd
  37 ssh
  34 svn
  16 pydoc
  16 mv
  16 essh
  15 vim
  12 cat
cadams@work:~ $ history|awk '{print $2}'|sort|uniq -c|sort -rn|head
 93 sudo
 56 ls
 53 cat
 42 man
 41 cd
 25 svn
 16 vim
 14 grep
  9 top
  8 rm

For years the tech press has denigrated Apple users as being shallow flash-seekers. This is an amusing bit of projection given the key role they play in the hype-cycle, selling shiny-but-useless crap to anyone unfortunate enough to trust them. It's getting a bit annoying watching yet another repetition of the "iPhone killer" article which completely misses why the iPhone was successful. Here's Gizmodo breathlessly hyping the Samsung Instinct (“At Last, a Decent iPhone Competitor”):

Samsung's Instinct may be the best stab at the coveted title of iPhone killah this CTIA. The 3.1-inch touchscreen phone has localized haptic feedback, plus three hard navigation keys. … The Instinct rocks EV-DO Rev A and GPS, in both cases besting what's in the fruit phone. And then content and app wise, Sprint's own wares are basically swapped in for Apple's. … On top of that, its customizable homescreen is amazing

Here's Gizmodo on the Instinct today pointing out that this is the kind of flash-without-substance everyone likes to accuse Apple of selling:

The iPhone-challenging visual voicemail, for instance, ain't quite live. Plus, it locked up when I was messing around with the music store, and needed a hard reset for the more money shot voice command features, which still didn't quite work (or finding a McDonald's is just too much). And the web browser doesn't, um, touch mobile Safari, at least not in its present state.

The thing which almost always gets ignored by tech writers is that iPhone has never been popular with people like the Gizmodo writers because it basically has one new feature (the multitouch interface); it otherwise ignominiously loses a feature–matrix comparison: no 3g, small screen, poor battery life, restricted bluetooth, no MMS, no third-party apps, no flash cards, etc. Everything on the feature list has been done for years by cheaper, less restricted competitors. Logically, this means that the iPhone sucks and the people buying it are brain-washed fashionistas, right?

The catch, of course, is that feature matrixes only matter to marketing people and tech journalists who want to become marketing people. Everyone else tends to care more about how well a few basic features actually work - and that's where the iPhone shines, just as with the similarly disparaged iPod before it:

I've been cell-phone only since 1999 or so and I've used a fair number of phones during that period. The iPhone is the first phone I've had which can honestly claim to have a usable web browser. It's also the first one other than the Palm devices to have an address book or calendar which are usable with more than a dozen entries. It's the first one which lets me dial my fiancé with less than a dozen clicks. It's the first phone I haven't needed to reset because it locked up or got unusably slow.

Note how 3G didn't enter into the picture? It's not really necessary, any more than broadband was a prerequisite for the Internet to enter mainstream life. The trick was that Apple actually paid attention to usability and wrote software which does things like transferring email in the background so you don't have to wait - an advanced 1980s technique which could save Motorola if they copy it quickly (the SLVR my iPhone replaced couldn't scroll text anywhere near as fast as my old PC XT).

Update:Here's another example from The Inquirer:

The Instinct was partly designed by Sprint based on Java software and will have features that the iPhone lacks such as the ability to use EV-DO Rev A, the fastest cellular broadband technology available on the Sprint and Verizon Wireless networks.


This will make it a lot faster than the Iphone. It will also be smaller and feature a GPS chip

In theory, many phones are faster than the iPhone. In practice, the network engineers' hard work has been squandered by the poorly designed browsers shipped on just about everything else. The Instinct may avoid this trend but it's likely it will be yet another 3G phone which is slower than the iPhone if you browse on the phone but faster if you tether it to a laptop and use a real browser. This, incidentally, is another example of another over-emphasized iPhone critique: you can't tether a laptop but I've never needed to, either. A few hardcore road-warriors would need a 3G card but most people can find a WiFi hotspot if they need more than web+email.

Yahoo has included a rich table control in YUI. I've been using a simple ImpTable class for years but decided to convert it to a YUI DataTable. The goal is simplicity: ImpTable now allows you to create a DataTable using as few as two lines of PHP while still allowing you to customize it extensively using CSS and JavaScript to take advantage of DataTable's rich set of features.

Here's an example of an ImpTable created when my database wrapper class (ImpDB) is profiling database activity: ImpTable reporting database queries

ImpTable tries to make the process of including a YUI DataTable as simple as possible, including automatically loading YUI if your page does not already use it. If you're already using YUI, I strongly recommend loading the DataTable and DataSource modules along with your other components to avoid version mismatches.

You can download ImpTable directly or check out a copy of the Subversion repository which contains it: svn co http://svn.improbable.org/ImpUtils/branches/trunk ImpUtils

For details including usage of xmlHttpRequest data and scripting please visit the Demonstrations.

2008-3-14 17:03

I noticed Kenny Tilton's recent Solve the First Problem post on reddit a few days ago. It's a really simple rule: when faced with a complicated problem, start with the first thing you can fix — the others often turn out to be side-effects. This is one of my favorite "debugging magic" techniques - over the years I've run into many problems which seemed far more intimidating than they actually were because the bug spewed out hundreds of errors or the assumption that the bug must be a mistake on their part because the program in question must be far too widely used to have a bug like that.

Today I fixed a problem which has been annoying us for awhile using what I think of as the zero-th law of debugging: if you can't find the problem, your first bug is poor observability.

A little background: we've been working on a more scalable upgrade to our LDAP server infrastructure. Since we're a Debian shop the process was really simple: clone our base Debian VM, apt-get install slapd, run cfengine, a quick slapcat/slapadd and we're up and running 15 minutes later with an LDAP master & a couple slaves (have I mentioned lately how much I love having invested in automation)? A little load-testing later and I'm ready to unleash this thing on our internal group — and then the guy down the hall mentions that he can't login…

After a quick look, this is even more confusing - the problem appears to be account specific but the LDAP records are very similar and copied from the current production server. Exhaustive comparisons between the two accounts turn up nothing. Soon one of the other admins discovers another interesting wrinkle: the problem only happens if pam_ldap uses SSL or STARTTLS - insecure connections work correctly for all accounts! Contributing to the confusion are the error messages insinuating that the problem is the LDAP server:

2008-03-11T10:30:58+07:00 su[17545]: pam_ldap: ldap_search_s Can't contact LDAP server 
2008-03-11T10:31:01+07:00 su[17545]: pam_ldap: ldap_simple_bind Can't contact LDAP server 
2008-03-11T10:31:03+07:00 su[17545]: pam_authenticate: Authentication service cannot retrieve authentication info. 

A quick check with debug 65535 in /etc/pam_ldap.conf shows that the LDAP server is sending back all of the requested data:

ldap_read: want=44954, got=16376
  0000:  82 af 97 04 26 75 69 64  3d 6a 62 2c 6f 75 3d 50   ....&uid=jb,ou=P  
  0010:  65 6f 70 6c 65 2c 64 63  3d 73 6e 6c 2c 64 63 3d   eople,dc=snl,dc=  
  0020:  73 61 6c 6b 2c 64 63 3d  65 64 75 30 82 af 6b 30   salk,dc=edu0..k0  
  0030:  47 04 0b 6f 62 6a 65 63  74 43 6c 61 73 73 31 38   G..objectClass18  
  0040:  04 0c 70 6f 73 69 78 41  63 63 6f 75 6e 74 04 0d   ..posixAccount..  
  0050:  73 68 61 64 6f 77 41 63  63 6f 75 6e 74 04 0d 69   shadowAccount..i  
  0060:  6e 65 74 4f 72 67 50 65  72 73 6f 6e 04 0a 61 70   netOrgPerson..ap  
  0070:  70 6c 65 2d 75 73 65 72  30 0b 04 03 75 69 64 31   ple-user0...uid1  
  0080:  04 04 02 6a 62 30 13 04  09 75 69 64 4e 75 6d 62   ...jb0...uidNumb  
  0090:  65 72 31 06 04 04 31 30  30 32 30 1b 04 0d 68 6f   er1...10020...ho  
  00a0:  6d 65 44 69 72 65 63 74  6f 72 79 31 0a 04 08 2f   meDirectory1.../  
  00b0:  68 6f 6d 65 2f 6a 62 30  18 04 0a 6c 6f 67 69 6e   home/jb0...login  
  00c0:  53 68 65 6c 6c 31 0a 04  08 2f 62 69 6e 2f 63 73   Shell1.../bin/cs  
  00d0:  68 30 10 04 02 63 6e 31  0a 04 08 4a 6f 65 20 42   h0...cn1...Joe B  
…
ldap_read: want=28578 error=Success
ber_get_next failed.
ldap_err2string

After about the fifth time comparing strace results the real problem dawns on me: the main thing making this hard is because pam_ldap and OpenLDAP have pretty rough debugging code which is really only used by the developers and so we're stuck guessing at what might be happening. This has been an inconvenience for years and I realized that I've spent more time scrutinizing high debug-level dumps and strace output than it would take to add proper logging to pam_ldap - and anyway, how many times can you not fix a mocking "error=Success" telling you that something clobbered a useful errno before you could see it?

A quick apt-get source libpam-ldap later and I had a huge pile of innovatively-indented code to slog through (thank FSM for indent). The general process was to find each function in pam_ldap.c and add syslog() at each point where an error can cause an early return. I also cleaned up a number of the error messages and made sure that the important ones included the actual error codes since it was frequently the case that a meaningful low-level error was turned into an uninformative return -1 before it was logged. The resulting works-on-my-system tested patch is part of Debian bug report #470509.

Armed with my shiny new syslog information I quickly wound up in _get_user_info() where ldap_search_s() is used to retrieve the user's information. A quick apt-get source libldap2 and not-so-quick patch to add better logging later and I finally had a real error message:

SSL_read() failed: gnutls_read() returned -59: GnuTLS internal error

Huh? Although that does kind of mesh with that clue in the big dump above: ldap_read only received part of the data it needed (suspiciously close to a 16KB chunk in fact). A quick check confirms that the problem is size related - both of the failing accounts have jpegPhoto attributes which are larger than the jpegPhotos for the working accounts. A quick test later and we've confirmed that size is the problem by telling ldap_search_s() to return only attribute names and not the values:

  rc = ldap_search_s (session->ld, ssd->base, ssd->scope, filter, NULL, 1, &res); 
  rc = ldap_search_s (session->ld, ssd->base, ssd->scope, filter, NULL, 0, &res); 

This isn't quite the approach I ended up putting in the patch (which queries for a single attribute) and ultimately this led to a second bug report () for the OpenLDAP client library since ldap_search_s() should never fail in this fashion. Since this post has already grown a bit large, though, I'm going to abbreviate those details in favor of the ever-popular

  1. OpenLDAP's logging is an operational cost: libldap is used all over the place and every single user has to live with logging which does not help isolate common operational problems and requires far more knowledge of the library internals than most users are going to have (or want).
  2. The saving grace was that everything involved is open source software: OSS is certainly not immune to unpleasant bugs and poor observability but I've run into similar bugs from all of the major commercial software vendors. The difference here is that Open Source means that my systems are patched a day after shaking off my "traditional ISV support"-induced apathy, leading to the final and most important point:
  3. I was stupid not to do this earlier: time spent improving observability is really time borrowed from the future. It saved me some time now which I otherwise would have spent in single-stepping in gdb but while that may cost me an “Extreme Sysadmin™” t-shirt it's a good investment in my future sanity. This class of problems has been widely discussed in the log-analysis community and something which most sysadmins (myself included) tend to view as more intractable than they really are. Fixing something the right way really isn't as time-consuming as it may seem at first glance and it sure beats scrutinizing chicken entrails the next time something breaks.
2008-3-19 18:19

I just finished a training meeting about good habits for scripting. I've seen plenty of "Scripting Intro" pages but I've been a bit disappointed to see that many of them head straight into Bash syntax nuances without discussing design, testing, or usability. At some level this seems to be due to the fact that shell-scripting and, to a lesser extent, non-compiled languages isn't considered Real Programming™ — darkly amusing when you consider how much of the important data in the world passes through a script at some point.

Rather than reinvent the syntax-guide wheel, I wanted to spend more of my time discussing good software engineering: architecture, testing, logging, and documentation. It's a fairly language-agnostic guide with examples in bash, Perl and Python and tries to encourage using smarter tools rather collecting cool-but-unmaintainable shell hacks. I'm planning to follow up with occasional examples discussing specific programs and the problems which they solve — I'd appreciate any feedback you might have.

Taking Scripting Seriously

In a perfect world software would gracefully network transitions. Unfortunately my users have encountered a fair number of things which don't always handle things like a laptop moving from ethernet to WiFi, a DHCP server taking awhile to respond, etc. While many programs have at least reached the point of eventually timing out and retrying it would be nice to automatically restart something as soon as the system network configuration changes. This is unfortunately system-specific and frequently required some hackish approach involving tail -f or equivalent to watch a log file, which is slow and tends to break on upgrades.

OS X has a nice way to query the current system configuration and receive event notifications when things change: the SystemConfiguration Framework (Technical Note TN1145: Living in a Dynamic TCP/IP Environment is also of interest). You can explore this using the scutil command-line tool - in the example below, I've looked at the list of available events and chosen to watch for power-state changes, receiving a notice when I unplugged the power cable from my laptop:

chris@Enceladus:~ $ scutil
> list
  subKey [0] = Plugin:IPConfiguration
  subKey [1] = Plugin:InterfaceNamer
  subKey [2] = Setup:
  subKey [3] = Setup:/
  subKey [4] = Setup:/Network/Global/IPv4
  subKey [5] = Setup:/Network/HostNames
…
  subKey [21] = State:/IOKit/PowerManagement/CurrentSettings
  subKey [22] = State:/IOKit/PowerSources/InternalBattery-0
…
> n.add State:/IOKit/PowerSources/InternalBattery-0
> n.watch
> notification callback (store address = 0x1036c0).
  changed key [0] = State:/IOKit/PowerSources/InternalBattery-0
notification callback (store address = 0x1036c0).
  changed key [0] = State:/IOKit/PowerSources/InternalBattery-0

This is pretty cool stuff but I'd like to do something smarter than scripting a copy of scutil. I could write an Objective-C application but OS X 10.5 included the very handy PyObjC 2.0 which allows access to most of the native APIs directly from within Python. James Reynolds posted a message to the MacEnterprise mailing list which prompted me to stop procrastinating and actually write some code. A little poking around later and I have a Python script which is ready for me to add whatever custom actions I want to take when the network state changes - the version below is abbreviated so you'll want to download the full watch-network-config.py for your own use:

from Cocoa import *
from SystemConfiguration import *

def handleNetworkConfigChange(store, changedKeys, info):
	print "Global network configuration changed: ", changedKeys
	# Kick a change-intolerant service in the head here

store = SCDynamicStoreCreate(None, "global-network-watcher", handleNetworkConfigChange, None)

SCDynamicStoreSetNotificationKeys(store, None, [ 'State:/Network/Global/IPv4', 'State:/Network/Global/IPv6' ])

CFRunLoopAddSource(CFRunLoopGetCurrent(), SCDynamicStoreCreateRunLoopSource(None, store, 0), kCFRunLoopCommonModes)
CFRunLoopRun()

Geoff Franks took the time to have the event handler use a dictionary so you can listen for multiple events and run a specific command for each one; I added a little syslog support and am releasing this version as a replacement for the widely-used Kicker which was removed in 10.5: Download kicker-replacement

Fun lessons from the trenches: in versions of OS X prior to 10.5 there were several nasty bugs due to lookupd and DirectoryService not having real timeouts: we have some rigs which use DHCP on our public network and static IPs on a private experiment network. When the system booted the private interface didn't need to wait for a DHCP lease and thus came up slightly faster than the public interface — this should have been harmless except that DirectoryService immediately attempted to connect to our LDAP server which isn't reachable on the private network and network timeout values aren't actually used in any version prior to 10.5, causing network accounts and NFS mounts to be unavailable until someone manually killed DirectoryService!

Older Entries