Tom McAdam

what’s it all about, atlas? upcoming changes to topics

Earlier in the year we implemented the first version of topics in Atlas, which describe what a programme or segment is about. We’ve done a lot more with them since then, and we’re planning to release these improvements to production next week. The way topics work has changed slightly, so if you’re already using them, please read on to see whether you’ll be affected.

share and share alike

Previously, the namespace of a topic did two jobs: it identified the namespace the topic belongs to, and it also expressed how the topic relates to the content. For example, a programme about Charles Mingus would have topic ccPh with namespace dbpedia. If we wanted to express that people watching a show tweeted about him, then another, separate topic was created with a namespace of twitter.

This is obviously problematic if you’d like to locate all content about Charles Mingus, irrespective of how the content relates to him, since http://atlas.metabroadcast.com/topics/ccPh/content.json would not include content where people have tweeted about him.

So we’ll now have a single topic for Charles Mingus, and we’ve added a new relationship attribute (more on that shortly). There are no structural changes to the API for the move to a single topic per publisher, but there will be changes in the data returned:-

  • We’ll be removing existing topics and re-importing them, so topic IDs will change, and topics won’t be returned for a period while the re-import runs
  • Topic namespaces will now represent solely the namespace within the scope of the publisher of the topic

new attributes

We’ve added a couple of new attributes on the topic node in the API. Firstly, relationship, which can have one of the following values:-

relationship              meaning
about                     The content is about this topic
twitter:audience          People tweeted about this topic during a broadcast of the content
twitter:audience-related  People who tweeted about this content also tweeted about this topic
transcription             This topic was derived from what was said in the content
transcription:subtitles   This topic was derived from subtitles on the piece of content
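
If you’re consuming these values in client code, it may help to pin them down in one place. The following is a minimal Python sketch, not part of Atlas itself: the Relationship enum and the parse_relationship helper are hypothetical names, and the only values assumed are the five listed above. Unrecognised values return None rather than raising, on the assumption that more relationships could be added later.

```python
from enum import Enum
from typing import Optional


class Relationship(Enum):
    """The relationship values listed in this post (hypothetical client-side type)."""
    ABOUT = "about"
    TWITTER_AUDIENCE = "twitter:audience"
    TWITTER_AUDIENCE_RELATED = "twitter:audience-related"
    TRANSCRIPTION = "transcription"
    TRANSCRIPTION_SUBTITLES = "transcription:subtitles"


def parse_relationship(value: str) -> Optional[Relationship]:
    """Return the matching Relationship, or None if the value isn't one listed above."""
    try:
        return Relationship(value)
    except ValueError:
        return None
```

For example, parse_relationship("twitter:audience") gives Relationship.TWITTER_AUDIENCE, while a value that isn’t in the list gives None.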

We’ve also added a publisher attribute on topic references, which tells you who asserted the relationship between the content and the topic. It will be returned if you specify the publisher annotation. This is particularly useful if you’re using equivalence merging, where the topic references on a piece of content can come from more than one publisher.

Let’s look at an example where a MetaBroadcast system has decided an episode of The Archers was about badgers:-


{
    "contents": [
        {
            ....
            "topics": [
                {
                    "publisher": { "country": "ALL", "key": "metabroadcast.com", "name": "MetaBroadcast" },
                    "relationship": "about",
                    "supervised": false,
                    "topic": {
                        "aliases": [],
                        "id": "dbQr",
                        "namespace": "dbpedia",
                        "publisher": {
                            "country": "ALL",
                            "key": "dbpedia.org",
                            "name": "DBpedia"
                        },
                        "same_as": [],
                        "title": "Badger",
                        "type": "subject",
                        "uri": "http://stage.atlas.mbst.tv/topics/ddNb",
                        "value": "http://dbpedia.org/resource/Badger"
                    },
                    "weighting": 0.55
                }
            ],
            "type": "episode",
            "uri": "http://www.bbc.co.uk/programmes/b01m5jw1"
        }
    ]
}
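
To show how the new attributes might be used on the client side, here’s a rough Python sketch that picks out topic references with a given relationship and reports who asserted them. It works on a trimmed copy of the response above (the elided fields are simply left out), and topics_with_relationship is a hypothetical helper name, not something Atlas provides. Since publisher is only returned when the publisher annotation is specified, the sketch falls back to "unknown" when it’s missing.

```python
import json

# Trimmed copy of the example response above; elided fields are omitted.
RESPONSE_TEXT = """
{
    "contents": [
        {
            "topics": [
                {
                    "publisher": {"country": "ALL", "key": "metabroadcast.com", "name": "MetaBroadcast"},
                    "relationship": "about",
                    "supervised": false,
                    "topic": {
                        "namespace": "dbpedia",
                        "title": "Badger",
                        "type": "subject",
                        "value": "http://dbpedia.org/resource/Badger"
                    },
                    "weighting": 0.55
                }
            ],
            "type": "episode",
            "uri": "http://www.bbc.co.uk/programmes/b01m5jw1"
        }
    ]
}
"""


def topics_with_relationship(response, relationship):
    """Yield (content_uri, topic_ref) pairs whose relationship matches."""
    for content in response.get("contents", []):
        for ref in content.get("topics", []):
            if ref.get("relationship") == relationship:
                yield content.get("uri"), ref


response = json.loads(RESPONSE_TEXT)
for content_uri, ref in topics_with_relationship(response, "about"):
    # "publisher" is only present if the publisher annotation was requested.
    asserted_by = ref.get("publisher", {}).get("key", "unknown")
    print(content_uri, ref["topic"]["title"], ref["weighting"], asserted_by)
```

Filtering on the relationship of the topic reference, rather than on the namespace of the topic, is the point of the change: the topic itself stays the same however the content relates to it.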

We’ll be posting updates to the mailing list as we roll out this change, so please do get in touch there if you’ve any questions.
