Adam Millican

the atlas model

I’ve recently been working on a model change in Atlas, and have had to learn a lot about how the model actually works, so I decided to write a blog post on how data flows through Atlas in all its different stages.

atlas owl initial input

The model is first defined in Atlas owl (owl was the initial version of Atlas; deer came later) as a simple model that ingesters can create instances of and pass to Atlas. Within Atlas, the model transformers turn the simple model into a complex model, which the ContentWriteExecutor then writes to the database. The ContentWriteExecutor also handles the logic that merges updates to content with existing content before writing. Going the other way, model simplifiers take the complex version of the model and turn it back into the simple form.
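To make the simple/complex round trip more concrete, here is a minimal sketch in Python. Atlas itself is written in Java, and every name below (`SimpleItem`, `ComplexItem`, `to_complex`, `to_simple`, `merge_for_write`) is a hypothetical stand-in rather than a real Atlas class. The merge policy is also an assumption: the update wins, except where it leaves a field empty.

```python
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical stand-ins: the real Atlas classes are Java and far richer.
@dataclass
class SimpleItem:
    """Flat shape an ingester would build and hand to Atlas."""
    uri: str
    title: Optional[str] = None
    genres: list = field(default_factory=list)


@dataclass
class ComplexItem:
    """Richer internal shape that gets written to the database."""
    canonical_uri: str
    title: Optional[str] = None
    genres: frozenset = frozenset()


def to_complex(simple: SimpleItem) -> ComplexItem:
    """Plays the role of a model transformer (simple -> complex)."""
    return ComplexItem(simple.uri, simple.title, frozenset(simple.genres))


def to_simple(item: ComplexItem) -> SimpleItem:
    """Plays the role of a model simplifier (complex -> simple)."""
    return SimpleItem(item.canonical_uri, item.title, sorted(item.genres))


def merge_for_write(existing: Optional[ComplexItem], update: ComplexItem) -> ComplexItem:
    """Assumed merge policy: the update wins, but fields it leaves
    empty keep whatever is already stored."""
    if existing is None:
        return update
    return ComplexItem(
        canonical_uri=existing.canonical_uri,
        title=update.title if update.title is not None else existing.title,
        genres=update.genres or existing.genres,
    )
```

In this sketch a sparse update does not wipe out stored data, which is the kind of behaviour merge-before-write logic usually exists to guarantee.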

deer model

Once content is written to owl, it is transferred to deer and needs to be transformed into deer’s version of the model. This is done by the LegacyContentTransformer classes.
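One way to picture this step is a set of per-type transformers, chosen by the type of the legacy content. The sketch below is Python rather than Atlas's Java, and `OwlItem`, `OwlBrand`, `DeerContent` and `transform_legacy` are hypothetical names with made-up fields. It only illustrates the shape of the idea, not how deer's model really differs from owl's.

```python
from dataclasses import dataclass
from functools import singledispatch
from typing import Optional


# Hypothetical owl-side types.
@dataclass
class OwlItem:
    uri: str
    title: Optional[str]
    publisher: str


@dataclass
class OwlBrand:
    uri: str
    title: Optional[str]
    publisher: str
    child_uris: list


# Hypothetical deer-side type.
@dataclass
class DeerContent:
    kind: str
    uri: str
    title: Optional[str]
    source: str
    child_uris: tuple = ()


@singledispatch
def transform_legacy(content) -> DeerContent:
    """Fallback: fail loudly for types with no registered transformer."""
    raise TypeError(f"no legacy transformer for {type(content).__name__}")


@transform_legacy.register(OwlItem)
def _transform_item(content: OwlItem) -> DeerContent:
    return DeerContent("item", content.uri, content.title, content.publisher)


@transform_legacy.register(OwlBrand)
def _transform_brand(content: OwlBrand) -> DeerContent:
    return DeerContent(
        "brand", content.uri, content.title, content.publisher,
        tuple(content.child_uris),
    )
```

Dispatching on type keeps each transformer small, and an unsupported type fails at the boundary instead of producing half-converted content.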

protocol buffers and serialization

Once transformed to deer’s version of the model, the content needs to be serialized using protocol buffers and the ContentDeserializationVisitor. The content is also transformed into a UDT (user-defined type) version of the model and serialized into the CQL content store (CQL is the Cassandra Query Language, used to talk to the Cassandra database), from which it can be deserialized again later.
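The visitor-based round trip can be sketched as follows. To keep the example self-contained, JSON bytes stand in for protocol buffers, so this is not the wire format Atlas uses. `Item`, `Brand`, `SerializationVisitor`, `serialize` and `deserialize` are hypothetical names written in Python, not Atlas's Java classes.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    uri: str
    title: Optional[str] = None

    def accept(self, visitor):
        # Double dispatch: the content picks the visitor method for its type.
        return visitor.visit_item(self)


@dataclass
class Brand:
    uri: str
    title: Optional[str] = None
    child_uris: tuple = ()

    def accept(self, visitor):
        return visitor.visit_brand(self)


class SerializationVisitor:
    """Builds one plain message (a dict here) per content type."""

    def visit_item(self, item: Item) -> dict:
        return {"type": "item", "uri": item.uri, "title": item.title}

    def visit_brand(self, brand: Brand) -> dict:
        return {"type": "brand", "uri": brand.uri, "title": brand.title,
                "child_uris": list(brand.child_uris)}


def serialize(content) -> bytes:
    """JSON stands in for protocol buffers in this sketch."""
    message = content.accept(SerializationVisitor())
    return json.dumps(message, sort_keys=True).encode("utf-8")


def deserialize(data: bytes):
    """Rebuild the right content type from the stored type tag."""
    message = json.loads(data.decode("utf-8"))
    kind = message["type"]
    if kind == "item":
        return Item(message["uri"], message["title"])
    if kind == "brand":
        return Brand(message["uri"], message["title"], tuple(message["child_uris"]))
    raise ValueError(f"unknown content type: {kind}")
```

The useful property to check in any such scheme is that deserializing what was serialized gives back an equal object for every content type.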

Figure 1. Diagram of the Atlas model.

equivalence merging

Some of our sources provide incomplete entries, or one source has information in certain fields and another source in different ones. For this reason, when outputting data, we sometimes want to merge equivalent content to provide a more complete data object. Both owl and deer have equivalence merging logic that checks content for empty fields and fills the gaps with information from other equivalated sources. In some circumstances both equivalated pieces of content contain useful yet different information, and this is also merged before output.
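A minimal sketch of that gap-filling idea, in Python, might look like the following. `Content` and `merge_equivalents` are hypothetical names, and the rules are assumptions made for illustration: empty scalar fields are taken from the first equivalent that has a value, and genre sets are unioned. They are not Atlas's actual merge rules.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Content:
    source: str
    title: Optional[str] = None
    description: Optional[str] = None
    genres: frozenset = frozenset()


def merge_equivalents(chosen: Content, equivalents: list) -> Content:
    """Return a copy of `chosen` with gaps filled from its equivalents.

    Assumed policy: `chosen` keeps any value it already has, empty
    fields take the first non-empty value found, and genres are unioned.
    """
    merged = replace(chosen)  # shallow copy, so the input stays untouched
    for other in equivalents:
        if merged.title is None:
            merged.title = other.title
        if merged.description is None:
            merged.description = other.description
        merged.genres = merged.genres | other.genres
    return merged
```

For example, merging a record that has only a title with an equivalent that has only a description yields a single object carrying both, while each stored record stays as it was.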

summary

I’ve explained how data gets into the model in owl, is transferred to deer, is transformed in various ways, and is merged on output. That is the general content pipeline through Atlas at a basic level.

