Chris Jackson

the long road to atlas

On Friday we were pleased to launch Atlas, the video and audio index.

Atlas, like URIplay before it, has been a long-term side-project for us, alongside the exciting video and audio discovery services we have built on top of it. Slowly but surely we’ve built a great resource for building video and audio services. This is a good time to talk about how we got to this point.

2007—an rdf media resolution database

URIplay started in 2007. Already there was a proliferation of ways to watch video, with more on the horizon. Numerous websites, different services for mobile phones, more for set-top boxes, and many complex commercial and rights arrangements. How would a viewer find the shows they were interested in? We hoped to help.

Lee Denison built a first prototype version in the final months of 2007. Spotting the importance of the problem, BBC R&D sponsored some of this work. We realised that the challenge was to integrate data from across the web, so we decided to try out some new semantic web technologies. Lee built an engine that matched and made sense of RDF data, persisting it in MySQL. Meanwhile, I helped out by translating data from lots of sources into RDF, as grist for Lee’s mill.

Lesson learned: the idea seems sound = we’re definitely working towards a valuable service

2008—live translation

By early 2008 we had an impressive demo. For a single TV episode, we could switch seamlessly between iPlayer, iTunes, and other sources, serving a PC, mobile phone, and TV with different video formats, and even handling content as it moved between rights windows. It worked great for the content of a broadcaster or two, but it wasn’t going to scale to the web. We were already crawling the full BBC site, but crawling YouTube and hundreds of other sites around the web didn’t seem very practical.

Enter live translation: Rather than answering queries from a database, could we simply translate data on remote services into URIplay format, on the fly? A couple of experiments later, we had a first implementation. And it worked pretty well; so well that we were able to use this approach to power the BBC channel within Totem, the Ubuntu media player. Suddenly 8m desktops had access to data from URIplay.

During the final part of 2008 we used the same instance of URIplay to investigate a whole new angle: social context. We had the privilege to work with the BBC R&D Prototyping team again on the Social Media Guide. User activity generated lists of programme URIs, and URIplay seamlessly translated them into playable locations, across devices, rights windows, and content from several broadcasters plus YouTube. The prototype was never distributed beyond the BBC, but we understand that it informed future work on integrating social activity with broadcast content. The latest beta version of the BBC iPlayer now contains some similar functions to the Social Media Guide.

Lesson learned: we need to make this work at web scale = scaling, scaling, scaling for the web

2009—expansions and experiments

A new year, and two meaty new challenges:

Our focus until this point had been on finding places where you could play a single item. But that on its own was starting to look like a rather specialised requirement. A TV episode makes no sense without its parent programme brand. Furthermore, people expect to be able to browse content in familiar ways. Genres and channels, and all kinds of other lists of content would need to be properly represented.

We’d also spotted a further issue: a lot of data we wanted to use was orphaned. Perhaps the classic example of orphaned data is a podcast, which typically has a URL for the actual podcast feed, and URLs for each of the audio files. If an episode of a podcast is good, someone might send you a link to that audio file. But there’s no way to get back to the full podcast from the audio file.

Similarly, until recently the BBC published the list of items available in its iPlayer video service in a different place from the main metadata, and it was hard to guess where that list would be for a particular item. To avoid screen-scraping, URIplay would need to link the availability list to the main metadata.
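To make the orphaned-data problem concrete, here is a minimal sketch of the podcast case in Python. It is an illustration only, not how URIplay actually did it: the feed URLs, the `FEEDS` table and both function names are hypothetical. The idea is simply that once the feeds have been read, a reverse index lets you get from an audio file back to the podcast it belongs to.

```python
from collections import defaultdict

# Hypothetical sample data: podcast feed URL -> audio file URLs listed in it.
FEEDS = {
    "http://example.org/shows/science/feed.xml": [
        "http://example.org/audio/science-001.mp3",
        "http://example.org/audio/science-002.mp3",
    ],
    "http://example.org/shows/history/feed.xml": [
        "http://example.org/audio/history-001.mp3",
    ],
}


def build_reverse_index(feeds):
    """Map each audio URL back to the set of feeds that list it."""
    index = defaultdict(set)
    for feed_url, audio_urls in feeds.items():
        for audio_url in audio_urls:
            index[audio_url].add(feed_url)
    return index


def find_parent_feeds(index, audio_url):
    """Return the feeds listing this audio file, sorted; empty if still orphaned."""
    return sorted(index.get(audio_url, ()))


if __name__ == "__main__":
    index = build_reverse_index(FEEDS)
    print(find_parent_feeds(index, "http://example.org/audio/science-002.mp3"))
    print(find_parent_feeds(index, "http://example.org/audio/unknown.mp3"))
```

The same shape would apply to the iPlayer case: treat the availability list as the "feed" and each programme identifier as the "audio file", and the index links the two sources together.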

In response to these issues, we undertook a major restructure of the URIplay codebase, effectively merging live translation and database functions. This meant that we could provide lists of items, and link orphaned data. We were lucky that Robert Chatley chose to join us from Google as tech lead for this work.

To test the new framework we also introduced a lot more adaptors. After a couple of months’ work we were able to resolve links to all the major online services. And we started outputting in a wide range of formats. In June I was invited to speak at the Open Video Conference in New York, where we launched another prototype: amplus.tv, a service for video tastemakers. Using URIplay, amplus.tv was able to make sense of content from all the major online video sites, allowing users to build their own channel of content that could be played on a variety of devices, take it away as an RSS feed, and embed it in websites. We hope to bring back a production version of amplus.tv soon.

During the last half of 2009, the new version of URIplay also powered Test Tube Telly, an experiment in content aggregation and social TV, which we built with the support of 4iP, Channel 4’s digital innovation fund. This service included content from Channel 4’s own 4oD and from YouTube, as well as links to BBC content.

Through these efforts, we quickly realised that URIplay was missing many of the features needed to power a world-class video website. Over the course of a few months we added better support for genres, channels, and parental guidance. Within the Test Tube Telly codebase, John Ayres added a Lucene index for search, an impressive system for filtering, paging, caching and rapidly displaying lists of content, and even integration with live streams of Twitter messages.

Slowly we wound the Test Tube Telly features back into URIplay, then started the long process of testing and refining them. We knew they were going to be amazingly useful to anyone building a video or audio service, but they had to be ready, and properly solid before launch.

Lesson learned: content resolution + lists + index = winning combination

2010—enter atlas

In early 2010 we launched our 5th major service on top of URIplay. watchsomething.tv is currently operated as a pilot. It produces smart feeds of content you might like, based on subscriptions and moods.

It was refreshingly quick to build watchsomething.tv, because almost every page was fed directly from a single query on URIplay. That was encouraging! We felt that we were almost ready to announce our features.

But a new problem lurked: MySQL simply wasn’t coping. We were asking it to write huge volumes of tweets, while simultaneously answering sophisticated queries from watchsomething.tv and our other services. As the database filled up, URIplay slowed down. We had a great application, but the wrong storage.

Ben Smith arrived in March to a new challenge. As Ben put it: “you want a huge database, that can be queried on almost any field, and you want those queries to be very fast indeed”.

Ben and John spent a few weeks testing almost every storage system known to man. Eventually we settled on MongoDB, started rewriting the queries, and tuned the code to run fast. Results were impressive. Finally we could do predictive search, and execute complex queries quickly. Watchsomething.tv was converted, and started responding fast. Finally, we were ready.

As we look back on the work so far, it’s clear that we now have a very different and much more mature system than those first versions in 2007 and 2008. A new name was in order. In Greek mythology, Atlas is the titan who carries the heavens on his shoulders, though he is often depicted as carrying the world itself. We’re not certain our Atlas is up to that kind of load yet, but it’s certainly getting there. We hope you have fun playing with it.

Lesson learned: TBC = we hope you will actively help write the story

Find out more and get involved by following this blog, @mb_atlas, the Google Group and Atlas on GitHub. If you want to use Atlas, then http://docs.atlasapi.org/ is the place to start. Do let us know how you get on!

It will be clear why this didn’t quite fit in at a particular point above, and why, nevertheless, I cannot leave it out: Mirona Iliescu has been a tremendous help throughout these years, whether it was building the site of URIplay, designing the majority of the services we built on top of it, or ensuring that a large range of web adapters were contributing to it.
