we write about the things we build and the things we consume
Here are the Atlas release notes for the week to 18th June.
The following items have been deployed to production:
- IDs on content – this is a big deal
- These IDs are now stable and will carry across to 4.0
- Stronger series matching in VOD equivalence
The following items have been deployed to staging, for a future production release:
- Multiple channel logos, including transparent logos for light and dark backgrounds
all ends with beginnings
As previously advertised, Atlas has been getting the kind of makeover that will bring it to an end, and a fresh beginning, all at once. Currently, our beloved global video & audio index is incarnated in two wonderful creatures: the wise 3.0 Owl we will soon let rest, and the shiny 4.0 Deer we will soon let roam the world.
what keeps the planet spinning
For those using Atlas, 4.0 features have become available pretty much every week for a while now, culminating in recent weeks with work towards a big milestone: IDs. And while all that has been detailed on the blog and on the Atlas list as it happened, new users and old users alike might appreciate a reminder of the bigger picture.
the force of love beginning
How handy, then, to have a wonderful vehicle known as MetaTalks, which allows us to meet face to face for exactly this type of conversation! Today we are taking MetaTalks outside the office for the first time, venturing to a nice local pub, The Perseverance, and mixing things lovingly with our monthly MetaBeers.
we’ve come too far, to give up who we are
So if you’re into Atlas, be it old Owl or new Deer that takes your fancy, join us from 5:30pm this evening. The talk will start at 6pm, once we’re all comfortably ready, with a cold drink in hand and a promise of more—we’re buying! It’s a new format, in a new pub, about a new Atlas. What’s not to like?
so let’s raise the bar, and our cups to the stars
Our speaker this evening is Fred, who very likely has the most intimate knowledge of bringing up Atlas with his bare hands, a couple of times over. Next time we’ll invite Jonathan to tell you his side of the story. Until then, come raise a glass to Atlas Deer tonight, and see what we have in store for you.
So it is with Storm, as Trident has emerged to provide more sophisticated semantics for defining a distributed computation topology. Whereas Storm originally worked in terms of the nuts and bolts (or rather Spouts and Bolts in official terminology) needed to construct a flow of data across a cluster, Trident talks in terms of the flow of data - Streams - that pass through the system.
trident has a point
Describing the problem this way allows Trident to provide two strong enhancements. Firstly, data can be collected into batches that flow more efficiently between workers and can have robust 'exactly once' computation requirements applied. Secondly, Trident can optimise the topology for us, replacing unnecessary queues or network hops with direct method invocation.
The benefits are clear, and we have gladly adopted Trident as the means by which we create Storm Topologies.
However, the disadvantage of this sort of shift is that documentation and community knowledge are split. Early adopters, wikis, example projects and other vital sources of information are torn between the original methods and the layers of enhancement that come to replace them. Newcomers to the project are as likely to discover the 'old way' of performing a given task as the new, better way. This can lead to confusion and frustration as older code that is no longer under active development languishes and degrades.
With that in mind, our experience using JmsSpouts in Storm has followed a predictable curve. Written as a plain Storm Spout, the adapter allows JMS queues to be attached to Storm topologies for processing. Under Trident, however, the spout is automatically wrapped by a RichSpoutBatchExecutor, which takes the individual tuples emitted by the wrapped spout and collects them together to fit Trident's batched processing model. Unfortunately we found that the combination of the two classes was unstable and led to batches of messages being failed.
Happily, as an open project, we're in a position to contribute, and at the time of writing a pull request has been submitted that provides a Trident implementation of the original JmsSpout. Where the original class might be used like this:
JmsSpout jmsSpout = new JmsSpout();
The new class can be substituted relatively easily:
TridentJmsSpout jmsSpout = new TridentJmsSpout();
all this can be yours
The replacement class combines both the retrieval of messages from a JMS queue (such as ActiveMQ) and batching of those messages, along with acknowledgement for messages that are not AUTO_ACKNOWLEDGED. There is plenty of debug logging for checking on the behaviour of messages entering a topology, including the ability to name Spouts so that debug messages can be traced. Testing so far appears to show that the Spout performs robustly, and with direct control over batching and acknowledgement, the class can act as a basis for more sophisticated fail/retry semantics should these be needed.
We hope this is a useful contribution to the Storm community and will help those building Trident topologies from the start.
I've been working on a project recently that needed a feature where updates performed by the user are saved in the background to Atlas. The nature of this web application is a little unusual in that the updates performed will usually consist of lots of little updates rather than large sections of the data being changed at once. It's an HTML5 application so fortunately we have a few more options to enhance the user experience than the old days of the web.
work with a local copy
Originally the application worked directly with Atlas and each change was sent as it happened. This turned out to be a bit slow as even delays of a couple of seconds while data got transmitted across the network could add up and spoil the fluidity of the user experience. So the approach taken was to get the application to make the changes to a local copy of the data then periodically send that local copy to Atlas.
To enable this to work, heavy use is made of HTML5 sessionStorage. This lets us save the data we need locally, but it gets automatically deleted when the user closes their browser and ends the session. When an item is to be worked with in the application, it is read from Atlas and put into session storage.
One slight issue is that the local and session storage objects just store strings as values. Anything more complicated, like a programme from Atlas, cannot be stored directly as it is a complex object. Fortunately this is relatively easy to get round by storing the object in JSON format. This one-liner takes care of saving a programme previously retrieved from Atlas into the item variable:
To retrieve the item from storage and put it back into a format we can work with you can decode the stored JSON value like this:
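The save and retrieve steps above can be sketched together as follows. This is only a sketch: the storage key programme and the sample object are illustrative (the real names aren't shown in the post), and a small in-memory stand-in lets the code run outside a browser, where the real sessionStorage would be used.

```javascript
// In-memory stand-in for the browser's sessionStorage, so the sketch
// can run outside a browser; in the application the real API is used.
if (typeof sessionStorage === 'undefined') {
  const backing = {};
  globalThis.sessionStorage = {
    setItem(key, value) { backing[key] = String(value); },
    getItem(key) { return key in backing ? backing[key] : null; },
  };
}

// A programme previously retrieved from Atlas (illustrative data).
let item = { title: 'Example Programme', topics: ['history'] };

// Save: storage values are plain strings, so serialise the object to JSON.
sessionStorage.setItem('programme', JSON.stringify(item));

// Retrieve: parse the stored JSON back into an object we can work with.
item = JSON.parse(sessionStorage.getItem('programme'));
```

Because the round trip goes through a string, anything that JSON cannot represent (functions, for instance) is dropped, which is fine for plain data objects like these.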
save in the background
Now that we have the application working with a copy of the data stored in the browser rather than remotely, changes and updates to it can be performed very quickly, but it is still important to save the changes remotely as quickly as possible. Part of the nature of the application is that many small changes get made to the data. To save bandwidth and time, it makes sense to roll up a few of these changes and write them to Atlas at once. In this case the individual changes are very quick to execute, so performing them, rolling up the changes and writing them to Atlas can all take place within a few seconds.
The first part of the jigsaw is to write a function that will take the item, transform it into the right format and POST it to Atlas. This involves removing any objects only used locally and then rendering the item in JSON format. The call to save the item to Atlas is actually set up whenever a local version is stored. The trick here is to use a timeout function to slightly delay the save by a second. If another local save is made in that time, the timeout function is cancelled and set up again. So you end up with something like this:
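A minimal sketch of that delayed-save pattern. The names here (saveLocally, saveToAtlas, and the view object's methods) are illustrative guesses rather than the application's real ones, and small stubs stand in for the storage and the Atlas call so the sketch is self-contained.

```javascript
// In-memory stand-in for sessionStorage when running outside a browser.
if (typeof sessionStorage === 'undefined') {
  const backing = {};
  globalThis.sessionStorage = {
    setItem(key, value) { backing[key] = String(value); },
    getItem(key) { return key in backing ? backing[key] : null; },
  };
}

// Stub for the view layer; names are illustrative.
const view = {
  showSavingNotification() { /* show a "saving" indicator to the user */ },
  hideSavingNotification() { /* cleared by the success callback */ },
};

function saveToAtlas(item) {
  // In the real application this strips local-only fields, renders the
  // item as JSON and POSTs it to Atlas, hiding the notification in the
  // success callback.
  view.hideSavingNotification();
}

let saveTimeout = null;

function saveLocally(item) {
  sessionStorage.setItem('programme', JSON.stringify(item));
  view.showSavingNotification();
  // Delay the remote save by a second; another local save within that
  // second cancels the pending save and schedules a fresh one.
  if (saveTimeout !== null) {
    clearTimeout(saveTimeout);
  }
  saveTimeout = setTimeout(() => saveToAtlas(item), 1000);
}
```

With this in place, a burst of quick edits results in a single POST to Atlas a second after the last one, rather than one request per edit.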
You will notice that there is also a call to view.showSavingNotification(), which shows a visual indication to the user that a background save is taking place. It is cleared by the callback function used on a successful save.
The final step is to put in a maximum number of times that a background save can be delayed. To do this, a maximum and a count of the number of defers are defined on the programme object:
Now the code becomes:
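One way that might look, assuming the two fields are called maxDefers and deferCount (the post doesn't give the real names) and using illustrative stubs for the storage and the Atlas call so the sketch runs on its own:

```javascript
// In-memory stand-in for sessionStorage when running outside a browser.
if (typeof sessionStorage === 'undefined') {
  const backing = {};
  globalThis.sessionStorage = {
    setItem(key, value) { backing[key] = String(value); },
    getItem(key) { return key in backing ? backing[key] : null; },
  };
}

// Counting stub standing in for the real POST to Atlas.
let savesToAtlas = 0;
function saveToAtlas(item) {
  savesToAtlas += 1;
}

let saveTimeout = null;

function saveLocally(item) {
  sessionStorage.setItem('programme', JSON.stringify(item));
  if (saveTimeout !== null) {
    // A save was already pending: cancel it and count the deferral.
    clearTimeout(saveTimeout);
    saveTimeout = null;
    item.deferCount += 1;
  }
  if (item.deferCount >= item.maxDefers) {
    // Deferred too many times: save immediately instead of waiting again.
    item.deferCount = 0;
    saveToAtlas(item);
  } else {
    saveTimeout = setTimeout(() => {
      saveTimeout = null;
      item.deferCount = 0;
      saveToAtlas(item);
    }, 1000);
  }
}
```

The limit guarantees that under a constant stream of edits the data still reaches Atlas at a bounded interval, instead of the save being pushed back indefinitely.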
So this seems to do a nice job of background saving and gives us a flexible way to work with our data without having to make the user wait for network requests. It also shows how HTML5 technology and modern browsers can be used to enhance an application and this technique could even be modified to provide an offline mode.
And we’re done!
Yes, it’s hard to believe but ABC-IP, our project with BBC R&D to automate the linking of broadcast content and archives, is a wrap. We’ve been working on it over the last two years, indeed I’ve been working on it since I started at MetaBroadcast, so there’s plenty to look back on.
the final quarter
The final quarter was about tidying things up, stabilising components and generally preparing the best of what we’ve built for productisation and commercialisation.
This meant we did two things:
new tellytopic homepage
We rebuilt the Tellytopic homepage to ensure that the content displayed is always the most high value content for the current schedule period. This replaces the previous homepage, which focused on content broadcast the previous evening.
The new design is centred around three things:
- Top content for the people most popular now
- Top content for the places most popular now
- A list of the topics currently most popular on Twitter that provide further onward journeys through topic pages
Additionally, there is a menu to switch between content sources. This defaults to all sources, but can also be used to restrict the content displayed to that from a single provider.
This new homepage provides a useful tool to discover the archive content that is currently most relevant, and hence most valuable, to a large cross section of the audience.
moving everything to atlas 4.0
We also worked on finalising the enhanced support for topics in Atlas 4.0. This has been achieved by migrating both People Match and Tellytopic to the new API, in order to prove the new functionality works as expected.
During the course of this, many issues with the new API have been fixed, and many small features have been added that make the kind of requests needed easier to make. This 'dogfooding' has been invaluable in shaking out the new functionality, and topics in Atlas 4.0 are now complete.
As a further benefit, both People Match and Tellytopic are running against the latest version of the Atlas API and are able to take advantage of the improvements that this brings. These include better results, by combining and sorting data server side, as well as significantly improved performance.
a look back at what we achieved
But that was just the last three months. Over the course of the last two years, we’ve built an amazing toolset for automated linking. Here are the highlights.
- Atlas data support: Extended Atlas to support the ingest of many new data sources, including the BBC World Service Archive. Also added new HTTP POST functionality that makes it significantly easier for future content providers to integrate with Atlas.
- Atlas topic support: The Atlas data model was extended to include topics from multiple sources that have varying relationships to content. Topics describe what programmes are about, or what subjects audiences ascribe to the programmes they watch.
- Magpie: A service that provides both topic extraction and Twitter hashtag generation for programmes in Atlas. It reads programmes from Atlas and writes back equivalent items with automatically generated topics and hashtags. This means that programmes that do not have topics in the source data can be included in topic-based discovery mechanisms, and tweets about those programmes can be found on Twitter.
- Twitter topics: This component performs realtime topic extraction from tweets about programmes on Twitter. This enables a realtime link to be made, as a programme is being broadcast, between the content of the programme and what audiences are saying about it.
- Topic recommendations: Used to generate the topics related to a topic, to display as similar or related topics. Works by finding the topics that most often co-occur with other topics on programmes, taking into account both topic weightings and topic frequency.
- Programme recommendations: Used to generate the programmes related to a programme. Works by finding the programmes that share the most co-occurring topics, taking into account both topic weightings and topic frequency.
- Zeitgeist: A service that analyses programme topics in Atlas and provides popular topics and programme recommendations based on current topics. It determines what is currently topical by looking at popular programme topics and popular Twitter topics in the current schedule period.
- ALSO: ALSO (Audience Loves Something Other) is a powerful visualisation of audiences from the conversations they have on Twitter. For each tracked programme, it shows two things: what the audience talks about during the programme, and what the audience for that programme talks about at other times.
- People Match: A browser extension that augments any existing website with links to related content in any archive that has been indexed in Atlas. It works by extracting topics from the web page the user is currently visiting, querying Atlas for matching content, and displaying the results to the user.
- Tellytopic: An end-user prototype that enables browsing programme archives by topic. It includes programmes from the BBC, World Service Archive, Channel 4, ITV and Channel 5 and provides various methods for navigating between them using topics from each of the sources developed during the project. Methods for programme discovery include a Zeitgeist-powered homepage that shows content for the topics that are currently popular, as well as topic pages, similar topics and programme recommendations.
Of course, ABC-IP was an R&D project, so we still need to do work on each of these components to get them to a production-ready standard. This work is underway now and we expect the first users to go live over the next quarter.
So, what was our big learning from ABC-IP? Well, quite simply: topics. Structured data for what a programme is about.
Topics are becoming increasingly important for automated linking and more granular recommendation. These are areas we’re already exploring further in other projects, so expect to hear us continue to talk about them over the next few months and beyond.
We’ve certainly built a lot of things over the course of this project, but the work we’ve done puts us in a fantastic place to now start working with commercial partners on automated content linking. Do get in touch if you’d like to see more and discuss further.