Continuing our Atlas A-Z series, M is for model. At the core of any API is its data model, and this is equally true of Atlas. Because Atlas is a hub of media metadata from many disparate sources, handling the quirks of each source in a consistent way is especially important.
We already talked about the content model earlier in this series, where we bombarded you with class diagrams to give you an idea of the relationships between types in Atlas. The post about equivalence also gave you an insight into how we deal with data from different sources.
Flexibility is key to the Atlas model. Sure, we have an object hierarchy with relationships between types, but these are all optional, and objects can be referenced in many ways. This is important so that we can cater for the nuances, and sometimes sparseness, of every dataset. Let’s look at programme relationships as an example. A brand may contain a bunch of one-off programmes that have no relationship to each other. It may have the more common structure of a brand containing series, which themselves contain episodes. But what about Black Mirror? Each series consists of three programmes, each bearing no relationship to the others. That’s fine: it’s a brand containing series, each containing one-off items, rather than episodes.
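To make those three shapes concrete, here is a minimal Python sketch. It is not the Atlas implementation: the `Content` class, its field names, and the placeholder titles (apart from Black Mirror) are all assumptions made for illustration. The only point is that one loosely typed node with optional relationships can hold all three structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Content:
    # Hypothetical shape, not the real Atlas classes: only `type` is mandatory,
    # and every relationship is optional.
    type: str  # e.g. "brand", "series", "episode" or "item"
    title: Optional[str] = None
    # Excluded from repr/compare so parent <-> child links do not recurse.
    parent: Optional["Content"] = field(default=None, repr=False, compare=False)
    children: List["Content"] = field(default_factory=list)

    def add(self, child: "Content") -> "Content":
        """Attach a child to this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


# 1. A brand holding unrelated one-off programmes directly (placeholder titles).
strand = Content("brand", "Some Film Strand")
strand.add(Content("item", "One-off A"))
strand.add(Content("item", "One-off B"))

# 2. The common shape: brand -> series -> episodes (placeholder titles).
drama = Content("brand", "Some Drama")
drama_s1 = drama.add(Content("series", "Series 1"))
drama_s1.add(Content("episode", "Episode 1"))

# 3. Black Mirror: brand -> series -> one-off items rather than episodes.
black_mirror = Content("brand", "Black Mirror")
bm_s1 = black_mirror.add(Content("series", "Series 1"))
for n in range(1, 4):
    bm_s1.add(Content("item", f"Programme {n}"))
```

Nothing in the sketch forces a series to contain episodes, or a brand to contain series, which is the flexibility described above.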
The model is sparse, too, so very few fields are required. In fact, in Atlas 3.0, only a URI and a type are needed to save something. In Atlas 4, we’ve reduced this further: only an entity’s type is needed.
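As a rough illustration of how little is required, the check below encodes the rule just described. The dictionary keys `uri` and `type`, the version labels, and the `missing_fields` helper are assumptions for this sketch, not Atlas’s actual field names or validation code.

```python
# Hypothetical minimum field sets, following the rule described above.
REQUIRED_FIELDS = {
    "3.0": ("uri", "type"),
    "4": ("type",),
}


def missing_fields(record: dict, version: str) -> list:
    """Return the required fields that are absent or empty in `record`."""
    return [name for name in REQUIRED_FIELDS[version] if not record.get(name)]


bare_item = {"type": "item"}
missing_in_3 = missing_fields(bare_item, "3.0")  # the URI is still needed here
missing_in_4 = missing_fields(bare_item, "4")    # nothing else is needed here
```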
Equally important is consistency across data sources. We normalise data from each source into our common data model. As you may recall, we match records depicting the same piece of content by a process called equivalence. This matched data is merged on the API depending on the data sources an API key has access to. Through the Atlas administration website, an API key with access to the Press Association and BBC data may be configured such that images are taken from the PA if present, but fall back to the BBC if not. Because data is modelled consistently across all data sources, this can happen seamlessly: someone using data from the API need not care about the provenance of a particular piece of data, since it will be consistent wherever it came from. Enabling or disabling a data source for an API key is therefore just a small configuration change in the Atlas administration website.
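The PA-then-BBC image fallback can be sketched as a simple precedence lookup. This is only an illustration under assumptions: the `pick_field` function, the source keys `pa` and `bbc`, the record layout, and the sample values are all invented here, and the real merge in Atlas may well work differently.

```python
from typing import Dict, List, Optional


def pick_field(
    name: str,
    equivalents: Dict[str, dict],
    precedence: List[str],
) -> Optional[str]:
    """Return the first non-empty value of `name`, trying sources in order.

    `precedence` stands in for an API key's configuration: sources that are
    not listed are treated as disabled for that key.
    """
    for source in precedence:
        record = equivalents.get(source)
        if record and record.get(name):
            return record[name]
    return None


# Equivalent records for one programme, already normalised to the same shape.
# The values are made-up sample data.
equivalents = {
    "pa": {"title": "Example Programme"},
    "bbc": {"title": "Example Programme", "image": "bbc-image.jpg"},
}

# The PA record has no image, so the lookup falls back to the BBC.
image = pick_field("image", equivalents, ["pa", "bbc"])

# Disabling the BBC for a key is just a shorter precedence list.
image_pa_only = pick_field("image", equivalents, ["pa"])
```

Because every source is normalised to the same shape, the lookup never needs source-specific logic; changing the list is the whole configuration change.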
The A-Z series so far
Since we’re now half-way through our A-Z series, I thought it would be a good time to recap on the previous posts:
- A is for Atlas
- B is for Broadcasts
- C is for Content
- D is for Denormalization
- E is for Equivalence
- F is for Feeds
- G is for Genres
- H is for Historical Data
- I is for IDs
- J is for Json
- K is for… well, this is embarrassing, isn’t it? In putting together this list, I noticed that poor K had been left out. Let us know if there’s an Atlas topic beginning with K that you’d like to hear about and we’ll fill in the gap soon!
- L is for Locations