Fred van den Driessche

re-mapping atlas

Much like any project, Atlas has evolved a fair amount over the years. It can currently retrieve metadata from a number of different sources in a number of different formats. Some sources are crawled regularly for changes, while others can be accessed on-the-fly to pull in data as requested (as Liam mentioned yesterday). As Sergio wrote a couple of weeks ago, we are in the process of moving our persistence implementation from MongoDB to Apache Cassandra. Any retrieved data is available in many different formats and views, and Atlas can even go full circle and upload transformed data to remote locations.

a road less travelled

The organic growth Atlas has gone through can make it a relatively daunting code-base for the uninitiated to approach. Its functionality is currently split across seven different git repositories on GitHub. This flies in the face of our aim to make Atlas an open-source project that is a cinch to get up and running and really easy to contribute to. To tackle these issues, we’re planning a bit of re-organisation in the not-too-distant future.

Some of the changes we’re considering include:
  • Consolidate the core features of Atlas. As with any project of this kind, at Atlas’ heart sits its data model, along with the interfaces defining methods for accessing and manipulating that model. Part of the work will involve separating these important parts of the codebase more clearly from the rest, which should help us control model versioning and get newcomers started quickly.
  • Make Atlas persistence less tightly coupled. One thing we learned while switching from MongoDB to Cassandra is just how closely aligned to its persistence layer Atlas had become. While it’s hard to stop some features of the underlying database leaking into the rest of the project, we want Atlas to be backed by almost any persistence system, be that Cassandra, MongoDB or straight onto the filesystem if that’s how you want to roll. We want to make it easy to plug in your own storage sub-system.
  • Focus on remote-site adapters. Adapters let you get your own data into Atlas for its full indexing goodness. As with persistence, we want creating your own adapter to be really easy. There is functionality common to all the current adapters in the main Atlas repository that we want to pull into a single place, and we intend to hide many of the standard adapter tasks behind the persistence interface. A ‘thicker’, more fully-featured persistence layer will reduce the complexity of the individual adapters. This should help everyone, whether they’re writing an on-the-fly adapter or a regularly scheduled one, and whether it ingests schedule data, deltas or full datasets.
  • Refactor the model. We’ve been working on cleaning up Atlas’ core data model, which has grown rapidly over the last year. Our aim is to make it more uniform in design and more consistent in use. The simpler and less surprising the model, the more straightforward creating and writing data will become.
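To make the pluggable-persistence and ‘thicker’ persistence-layer ideas a little more concrete, here is a minimal Java sketch. Every name in it (Item, ContentStore, WriteResult, InMemoryContentStore, ExampleAdapter) is hypothetical and not part of Atlas’ actual API, and the model is reduced to a URI and a title. What it shows is the shape: the adapter depends only on an interface, and the decision about whether incoming data is new, changed or unchanged lives once in the store rather than being repeated in every adapter. A Cassandra-, MongoDB- or filesystem-backed class would implement the same interface. It assumes Java 16+ for records.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class PluggablePersistenceSketch {
    public static void main(String[] args) {
        ContentStore store = new InMemoryContentStore();
        ExampleAdapter adapter = new ExampleAdapter(store);
        System.out.println(adapter.ingest(Map.of("http://example.com/a", "A")));
        // prints {CREATED=1}
        System.out.println(adapter.ingest(
                Map.of("http://example.com/a", "A", "http://example.com/b", "B")));
        // prints {CREATED=1, UNCHANGED=1}
    }
}

// Hypothetical, heavily simplified model type (not Atlas' real model).
record Item(String uri, String title) {}

// Outcome of a write, reported by the store so adapters need not work it out.
enum WriteResult { CREATED, UPDATED, UNCHANGED }

// Hypothetical persistence interface: the rest of the code sees only this,
// never Cassandra- or MongoDB-specific types.
interface ContentStore {
    Optional<Item> find(String uri);

    // "Thick" write: the store itself compares against what it already holds.
    WriteResult write(Item item);
}

// One possible backend; other backends would implement the same interface.
class InMemoryContentStore implements ContentStore {
    private final Map<String, Item> items = new ConcurrentHashMap<>();

    @Override
    public Optional<Item> find(String uri) {
        return Optional.ofNullable(items.get(uri));
    }

    // synchronized so the compare-then-write step is atomic
    @Override
    public synchronized WriteResult write(Item item) {
        Item existing = items.get(item.uri());
        if (item.equals(existing)) {
            return WriteResult.UNCHANGED;
        }
        items.put(item.uri(), item);
        return existing == null ? WriteResult.CREATED : WriteResult.UPDATED;
    }
}

// Hypothetical adapter: it only maps remote data into the model and
// hands each item to whatever ContentStore it was given.
class ExampleAdapter {
    private final ContentStore store;

    ExampleAdapter(ContentStore store) {
        this.store = store;
    }

    // remoteRecords stands in for parsed remote data: URI -> title.
    Map<WriteResult, Integer> ingest(Map<String, String> remoteRecords) {
        Map<WriteResult, Integer> counts = new EnumMap<>(WriteResult.class);
        remoteRecords.forEach((uri, title) ->
                counts.merge(store.write(new Item(uri, title)), 1, Integer::sum));
        return counts;
    }
}
```

Swapping backends then means constructing the adapter with a different ContentStore; the adapter code itself does not change.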

uncharted territory

The changes above are just a few of those on the table. Later on we may take this approach further and make Atlas less monolithic. One concept we’re considering is spinning out each of the existing remote-site adapters into a separate project; these discrete adapters could then be loaded as needed at runtime. Along with the new architecture that Tom has been talking about, we expect this more modular structure to give Atlas a more flexible space in which to grow, and to let separate components advance independently of each other.
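One common way to load separately packaged components at runtime on the JVM is the standard java.util.ServiceLoader: each adapter jar lists its implementation class in a META-INF/services/ file named after the fully-qualified service interface, and the loader instantiates whatever providers it finds on the classpath. The sketch below assumes that approach purely for illustration; RemoteSiteAdapter, AdapterRegistry and DummyAdapter are hypothetical names, not existing Atlas types. Real classpath providers would also need to be public classes with a public no-argument constructor.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.TreeMap;

public class AdapterLoadingSketch {
    public static void main(String[] args) {
        AdapterRegistry registry = AdapterRegistry.discover();
        // No adapter jars are on the classpath in this standalone file, so
        // discovery finds nothing; register a stand-in by hand instead.
        registry.register(new DummyAdapter());
        System.out.println(registry.sources()); // prints [dummy]
    }
}

// Hypothetical contract that each separately packaged adapter would implement.
interface RemoteSiteAdapter {
    String sourceName();

    List<String> fetchUris();
}

// Hypothetical registry keyed by source name.
class AdapterRegistry {
    private final Map<String, RemoteSiteAdapter> adapters = new TreeMap<>();

    // Instantiates every provider listed in a META-INF/services file
    // for RemoteSiteAdapter that is visible on the classpath.
    static AdapterRegistry discover() {
        AdapterRegistry registry = new AdapterRegistry();
        for (RemoteSiteAdapter adapter : ServiceLoader.load(RemoteSiteAdapter.class)) {
            registry.register(adapter);
        }
        return registry;
    }

    void register(RemoteSiteAdapter adapter) {
        adapters.put(adapter.sourceName(), adapter);
    }

    // Source names in sorted order (TreeMap keys).
    List<String> sources() {
        return List.copyOf(adapters.keySet());
    }

    Optional<RemoteSiteAdapter> get(String source) {
        return Optional.ofNullable(adapters.get(source));
    }
}

// Stand-in adapter used only for the demo.
class DummyAdapter implements RemoteSiteAdapter {
    @Override
    public String sourceName() {
        return "dummy";
    }

    @Override
    public List<String> fetchUris() {
        return List.of("http://example.com/dummy/1");
    }
}
```

Because the registry only knows about the interface, adding a source under this scheme would mean putting another adapter jar on the classpath rather than changing the core project.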

If you have any questions, please feel free to drop us a line on the Atlas mailing list, or if you want to contribute, get involved over on GitHub.
