This is another post about perfectionism vs. practicality. (It contains many lists.)
This is a quickly-thrown-together pile of vaguely useful information. As you can see, it contains:
- The currently-playing (and/or last-played) piece of music from the Office Playlist, via our Last.fm account (to which the office Spotify instance posts);
- The last few tickets that have been closed in our Jira (i.e. what it is that people have recently done);
- Quick and dirty mean response times for our main APIs, from Graphite (just enough information to see how stuff is going or when people are hammering them with heavy requests);
- The count of unresolved PagerDuty incidents (red if there’s a new one, yellow if they’re all ack’d);
- The number of open support tickets in Jira, shown sorted by urgency with warning colours if they get close to our SLA;
- A list of failing/unstable builds from Jenkins, with a count of those that are fully broken (and accompanying red colour);
- Counts of critical and warning alerts (grouped by stage vs. production) from Sensu.
I have been strangely unhappy with some of its behaviour, and didn’t really want to post about it because it’s “not totally ideal”. But that reluctance is itself a good example of my occasionally excessive, sense-free perfectionism acting as an “anti-pattern”: the flaws may or may not be real, but they haven’t in any way prevented the dashboard from being really useful!
the good news
Getting this thing up on the screens has had a number of noticeable effects as a result of things simply being visible:
- Showing up all the build failures (at one point there were many) has prompted a lot of spring cleaning of abandoned projects and fixes to persistent test failures.
- General interest in (or at least awareness of) response times has increased massively, leading to a better shared understanding of the issues that affect them.
- It adds a drop of extra satisfaction to the act of closing a ticket, improving our collective compliance metrics.
- There is no longer any need to open Spotify to find out exactly who created the noises currently sledgehammering the creative flow.
Dashing is a very effective framework with which to get a useful dashboard up quickly, and I wouldn’t hesitate to recommend it.
- You write Ruby modules that poll external services and mangle the results into JSONable hashes for sending to the dashboard; Dashing provides everything in between:
- It runs your polling jobs every n seconds for you.
- It provides a method to serialize the results and send them to as many connected front-ends as you like over persistent EventStream connections.
- It simplifies the normally-painstaking initial decision of how to lay out widgets by providing a fixed pixel grid and some basic CSS.
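To make that concrete, here is a minimal sketch of the polling-and-mangling half of a job. The widget name, field names, and the `fetch_jenkins_jobs` helper are illustrative assumptions, not code from our dashboard; only the commented-out `SCHEDULER`/`send_event` pattern reflects Dashing’s actual job style.

```ruby
# Illustrative transform: turn a raw list of Jenkins jobs into a
# JSONable hash for a Dashing widget. The 'color'/'name' keys follow
# Jenkins' JSON API; the output shape is just an example.
def failing_builds(jobs)
  failing = jobs.select { |j| j['color'] == 'red' }
  {
    items: failing.map { |j| { label: j['name'] } },
    count: failing.size
  }
end

# Inside a real Dashing job file this would be scheduled, roughly:
#   SCHEDULER.every '60s' do
#     send_event('builds', failing_builds(fetch_jenkins_jobs))
#   end
```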
There are still some things that really need improving.
- I hate the “Last updated” lines; they don’t do their job very well. When Graphite falls over (which it does slightly more often than might reasonably be expected), it’s usually a few minutes before anyone realises the response times have been the same for a while. The dashboard should actively warn about absent events (help us monitor our monitoring!) and the line should just go away when things are working.
- There’s no decent exception handling in the Ruby jobs by default; a slightly messy stack trace just gets printed on the server. It’d be nice to actually push errors out to the clients for better visibility (along with warnings about the absence of updates).
- I really, really hate Batman and all its “data bindings” (it may be an even worse idea than Angular…) and would much rather just write templates normally; its filters are just another unnecessary abstraction, more useless knowledge to gather. I’ll probably replace it with Jade (sorry, YAML-haters).
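As a sketch of what I mean by the first two points, here is a little wrapper (entirely hypothetical, not Dashing’s API) that rescues job failures into a payload that could be pushed to clients, and remembers when each source last succeeded so that silence can be flagged instead of silently showing stale numbers:

```ruby
# Hypothetical wrapper around a polling job: on success, record the
# time and return the data; on failure, return an error payload that
# could be sent to the front-ends instead of a server-side stack trace.
class WatchedJob
  attr_reader :last_success

  def initialize(max_age)
    @max_age = max_age  # seconds of silence before we call it stale
    @last_success = nil
  end

  def poll(now = Time.now)
    data = yield
    @last_success = now
    { status: 'ok', data: data }
  rescue StandardError => e
    { status: 'error', message: e.message }
  end

  # True when the source has been quiet longer than max_age.
  def stale?(now = Time.now)
    @last_success.nil? || (now - @last_success) > @max_age
  end
end
```

A front-end could then render the error message or a loud “gone quiet” warning in place of the widget body, which is what the passive “Last updated” line fails to do.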
Hopefully once this stuff is sorted we’ll have a GitHub release that might actually be some use to someone. Expect an update soon! And as usual, don’t hesitate to make bird noises at us if you feel like it.