we write about the things we build and the things we consume
Oliver Hall

2014 in review: atlas

As always at MetaBroadcast towers, it’s been a busy year, particularly around our video and audio metadata platform, Atlas. In case you’ve not managed to keep up, here’s a run-down of some of the highlights from the past year of Atlas!

administering atlas

One of the big areas of improvement has been Atlas Admin, the administration tool for data sources and API keys. Originally, this was an internal tool for us to modify permissions on API keys for users, enabling or disabling data sources as required. Over the past year, we’ve rewritten this tool from the ground up, enabling users to log in directly via Twitter, Google or GitHub, administer their own keys, and request access to sources. This required a host of new features around applying for access to data sources, and the result is now growing into a more mature tool. It enables us not only to administer applications using Atlas but also, with the addition of a data source wish-list, to monitor which data sources users are most interested in accessing, and it allows them to register interest in new features.

We’ve also improved Atlas Admin as an internal administration tool by adding features such as usage statistics (for monitoring how often a particular key is used, and for what calls), which opens up the opportunity to help our users by suggesting improvements to how they use our APIs. There has additionally been some initial work supporting outbound feed administration, providing monitoring and retry facilities for certain feeds to our clients. On a similar theme, we’ve started to add some tools to assist editorial and scheduling staff, such as the ability to pick selected content, add related links to segments of shows, and add simulcast blackout restrictions.

All in all, it’s been a busy year for Atlas Admin, and that’s but the tip of the iceberg. Let’s take a look at another area in which there’s been a lot of work over the past year: supporting systems.

big is (not necessarily) better

It’s a well-known fact that systems change as they mature, and Atlas is no exception. Back in the day, it all used to run in a single JVM. We then split the system into two, moving the front-end API servers to an AWS elastic auto-scaling group, and the ingest and feed systems to a separate host. Currently, many of our ingest processes run in the same JVM, but as we add more data sources, that box naturally ends up performing more tasks at once, which is not ideal! As a result, when we add a new source of data, we now tend towards building separate services that then POST their data into Atlas. The first of these was Coyote, built by the able hands of both Liam and Oana to ingest various data from YouTube.
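To make the "separate service that POSTs into Atlas" idea a little more concrete, here is a minimal Python sketch of what the sending side of such an ingester might look like. The endpoint URL, the `apiKey` query parameter and the payload fields are all assumptions made up for illustration; this post does not describe Atlas's actual write API.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical endpoint: a placeholder, not a real Atlas URL.
ATLAS_WRITE_URL = "https://atlas.example.invalid/content"


def build_ingest_request(item, api_key, url=ATLAS_WRITE_URL):
    """Build (but do not send) a POST request carrying one content item as JSON.

    The `apiKey` query parameter and the shape of `item` are assumptions.
    """
    body = json.dumps(item).encode("utf-8")
    query = urlencode({"apiKey": api_key})
    return urllib.request.Request(
        f"{url}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending means the ingester's mapping logic can be exercised without touching the network, which suits a service that lives outside the main JVM.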

Ultimately, we’d like to split out all of our existing ingest systems in this way, as it allows for much more flexible resource allocation, and separation of concerns, but these things take time. For the time being, several new sources of data have been built out as separate systems over the past year. Let’s take a look at a couple…

First up is Barney—our Vimeo ingester. This was built up by our current intern, Jamie, to ingest videos and playlists from Vimeo, to augment our expanding collection of non-broadcast video metadata sources held in Atlas.

Jamie has also done some fine work decoding video streams in order to obtain actual start times for broadcast shows, based upon the signals sent down the ol’ trusty TV aerials. It wasn’t the easiest project, but has resulted in some awesome output, as well as a couple of Raspberry Pis sitting in the office, attached to the TV sockets. Hardware in the office? Who’d’ve dreamt it?!

Our other intern, Oana (now departed back to the wonderful world of academia), also did some excellent work over the summer on Delphi (also known variously as Equivmon, Equivlog, Equivilog, and numerous other titles), a monitoring system for equivalence. As some of you are probably aware, Atlas is able to automatically match data from one source to another through a process we call equivalence. It almost always gives great results, but every now and then (as with all automated systems) it mismatches, and we have to give it a helping hand. In those circumstances, we used to have to do a large amount of debugging through the API and the databases to figure out the current situation and improve our algorithms to correct it. Now, we have a helpful interface displaying the information we need: the equivalence assertions that Atlas has made about a piece of content, as well as historical data about past links and related metrics. Awesomes!

front-end shininess!

We’ve swelled the ranks of our front-end team considerably this year, with the addition of Dan, Jason and Steve ensuring that the user-facing parts of Atlas get as much love as the backend Java and infrastructure. In particular, Steve has spent a good chunk of time tidying up the Atlas home page, as well as writing a ‘Now-Next’ widget to showcase what you can do with the data from Atlas.

ingesting all the things…

With all these extra new systems and other work, it’s easy to forget the work that’s been going on around Atlas itself. As always, a ton of new data sources have been added in the last year, including Getty, Global Images, and more.

…and then outputting them again

We’ve also managed to fit in an extensive rewrite of a number of our output feeds, including our regular uploads to RadioPlayer, and our TVAnytime feeds. As part of this work, we’ve also managed to come up with a general task-based upload system, which we hope to migrate many of our feeds to, as it should make monitoring and administration much easier in future. It also paves the way for our output feeds to be split out from our other processing systems somewhere down the line.
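As a rough illustration of why a task-based approach helps with monitoring and retries, here is a small Python sketch. It is not the actual upload system: the `UploadTask` fields, the `Status` values and the retry policy are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Status(Enum):
    PENDING = "pending"
    UPLOADED = "uploaded"
    FAILED = "failed"


@dataclass
class UploadTask:
    """One unit of upload work, carrying its own history for monitoring."""
    feed: str
    payload: str
    status: Status = Status.PENDING
    attempts: int = 0
    errors: List[str] = field(default_factory=list)


def run_task(task: UploadTask, upload: Callable[[str], None], max_attempts: int = 3) -> UploadTask:
    """Try the upload up to max_attempts times, recording every failure on the task."""
    while task.attempts < max_attempts:
        task.attempts += 1
        try:
            upload(task.payload)
        except Exception as exc:  # record the error and try again
            task.errors.append(str(exc))
            continue
        task.status = Status.UPLOADED
        return task
    task.status = Status.FAILED
    return task
```

Because each task records its own status, attempt count and errors, an admin tool only needs to list tasks to show what succeeded, what failed and what can be retried.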

new things, shiny things!

As always, along with all the new ingests and outputs, there’ve been a bunch of new features added to Atlas itself. We’re gradually trying to make sure these go into the new version, Deer, but there’s still been a smattering of additions to the current version, Owl. These include adding support for Events, such as football matches or motorsports, as well as integrating reviews data into Atlas.

We’ve also improved support on the Atlas model for deep linking locations. As mentioned the other week, deep linking is really important to providing a great catchup experience across different devices, so we’re really happy to have added support for this into our on-demand model.

Another nifty feature we’ve managed to integrate this year is statistics around audience demographics for a piece of content. Put another way, this means that, for a given piece of content, we can provide information about overall viewing figures, as well as breakdowns by different demographics.

infrastructural

There’ve also been a number of changes in terms of the infrastructure supporting Atlas over the past year, particularly around our databases. As long-time readers will probably be aware, the upcoming version of Atlas is backed by Cassandra, but our current iteration uses MongoDB. We’ve really started coming up against the limits of what MongoDB can do for us, but have been tweaking our setup to take best advantage of both replica sets and the storage options that Amazon offer through AWS.

We’ve been experimenting with spot instances for our Mongo secondary nodes. While we’d never rely exclusively on spot instances, putting in a bid for one is generally a win-win: you get an extra node to share your read load (in our case, we put all of our read load on the secondaries, while writes naturally go to the primary), and because it’s a spot node, you pay well under the on-demand price. These are really handy for getting a bunch of extra capacity without spending lots of money (although you have to be OK with the fact that they may go away should you be out-priced!).
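For anyone wanting to try a similar read/write split, one way to express it is through the `replicaSet` and `readPreference` options of a MongoDB connection string. The Python sketch below only builds such a string. The host names, replica set name and database name are placeholders, and the choice of `secondaryPreferred` (rather than strict `secondary`) is our assumption for a setup where spot nodes may disappear; it is not necessarily how Atlas is configured.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, database, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string that routes reads according to read_preference.

    Writes always go to the primary regardless of read preference.
    """
    options = urlencode({"replicaSet": replica_set, "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/{database}?{options}"


# Placeholder hosts: in practice one of these might be a spot instance.
uri = replica_set_uri(
    ["db1.example.invalid:27017", "db2.example.invalid:27017"],
    replica_set="rs0",
    database="atlas",
)
```

With `secondaryPreferred`, reads fall back to the primary if no secondary is available, which is a reasonable safety net when some secondaries can be reclaimed at short notice.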

The other change we’ve started to make is to move to SSD-backed EBS volumes across the board. We’ve generally been reluctant to use EBS too much, given its track record in early years. But SSD EBS volumes, with tactical placement (i.e. not placing all your eggs in one EBS basket), give excellent, predictable performance. We’ve noticed considerable improvements since putting these in place, and are thinking of rolling them out across our other MongoDB clusters as well!

oh deer…

We’ve been gently progressing on Deer, although for various reasons, progress has not been as swift as we might’ve hoped. Our current aim is to release in Q1 2015, although we don’t want to tie ourselves to a date at this moment—it’ll be ready when it’s ready! However, we have added support for Segments on Content, and made a start on a Meta API (an API documenting the APIs :)), that generates its output from annotations directly on the Java model. It means we can have full docs for the API without having to maintain a separate set of documentation! We’re also in the process of building an updated version 4 API explorer based upon this, so that we can offer an improved introduction to the ins-and-outs of Deer.
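The idea of generating API documentation from annotations on the model can be sketched in a few lines. The example below is a Python analogue only (the real Meta API works from annotations on the Java model), using dataclass field metadata in place of Java annotations. The `Segment` fields and the `documented` helper are hypothetical, invented for illustration.

```python
from dataclasses import dataclass, field, fields


def documented(description, **kwargs):
    """Declare a field whose description lives next to the field itself."""
    return field(metadata={"description": description}, **kwargs)


@dataclass
class Segment:
    # Hypothetical fields; the real Deer model may look quite different.
    title: str = documented("Human-readable title of the segment")
    offset_seconds: int = documented("Start offset within the parent content", default=0)


def describe(model):
    """Generate a documentation structure from the metadata declared on the model."""
    return {
        "model": model.__name__,
        "fields": [
            {
                "name": f.name,
                "type": f.type.__name__ if isinstance(f.type, type) else str(f.type),
                "description": f.metadata.get("description", ""),
            }
            for f in fields(model)
        ],
    }
```

Calling `describe(Segment)` returns a dictionary that could be serialised straight to JSON, so the documentation cannot drift from the model it describes.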

It’s certainly been a busy year for Atlas, but hopefully we’ve added a bunch of really useful features. Next year will, I’m sure, bring yet more, and hopefully we will have more news on Deer. In the meantime, if you would like to read more about Atlas, I would recommend having a browse through our A-Z of all things Atlas. All that remains is to wish everyone a Happy Christmas, and an Atlas-y new year!
