Tim Spurling

storing and conserving artifacts

the good old days…

Back in the old days, everything was very simple in the world of binary artifact repositories acting as Maven backends.

  • You had your released versions, named after their version number.
  • You had “future releases”, i.e. builds of in-development code, named something ending in -SNAPSHOT.

If you wanted a real version of something, you’d go to your repo and ask for com.whoever.thing:thing:2.1.17.

If you wanted the latest code that was built and pushed, for example for your stage server, you’d ask for com.whoever.thing:thing:2.1.18-SNAPSHOT, or even com.whoever.thing:thing:STAGE-SNAPSHOT.

Then, at some point, wisdom struck and ruined everything—someone involved in Maven’s development decided that builds were supposed to be consistent and repeatable. Obviously, anything depending on one of these snapshots cannot be either. Someone might come along, completely rewrite a library, and push it with the same name. You’d have literally no idea where, when (or by what…) a snapshot was built.

the present (i.e. post-2010)

The solution is “unique snapshot naming”—version strings like 2.1.18-SNAPSHOT are replaced with something along the lines of 2.1.18-20140401.053004-173.

This means that when building something dependent on a library snapshot, you can see from the downloaded file exactly when it was built (the timestamp, 20140401.053004) and also get some idea of sequence: the tail of the name, 173, is a “build number”, a generated number that (…theoretically…) increments with each build.
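To make the naming scheme concrete, here is a minimal Python sketch that splits a unique snapshot version into its three parts. The function and class names are made up for illustration, and it assumes the version always has the shape base-yyyyMMdd.HHmmss-buildNumber shown above.

```python
import re
from datetime import datetime
from typing import NamedTuple


class UniqueSnapshot(NamedTuple):
    base_version: str      # e.g. "2.1.18"
    built_at: datetime     # naive datetime; assumption: the stamp is in UTC
    build_number: int      # e.g. 173


# <base version>-<yyyyMMdd.HHmmss>-<build number>
_UNIQUE_VERSION = re.compile(
    r"^(?P<base>.+)-(?P<stamp>\d{8}\.\d{6})-(?P<build>\d+)$"
)


def parse_unique_snapshot(version: str) -> UniqueSnapshot:
    """Split a timestamped snapshot version into base, timestamp and build number."""
    match = _UNIQUE_VERSION.match(version)
    if match is None:
        raise ValueError(f"not a unique snapshot version: {version!r}")
    built_at = datetime.strptime(match.group("stamp"), "%Y%m%d.%H%M%S")
    return UniqueSnapshot(match.group("base"), built_at, int(match.group("build")))


if __name__ == "__main__":
    print(parse_unique_snapshot("2.1.18-20140401.053004-173"))
```

A plain 2.1.18-SNAPSHOT string is rejected with a ValueError, which is a handy way to tell the two schemes apart.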

When fetching dependencies, the Maven client then uses maven-metadata.xml files generated by the backend to find out the latest available version, and only fetches the file if the number has changed. This is an improvement over the old scheme where supplying the -U option would, if I remember correctly, cause it to always redownload all snapshots all the time.
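As a rough illustration of that lookup, the sketch below reads a version-level maven-metadata.xml and works out the filename of the latest timestamped build. The sample metadata is hand-written to match the example version above rather than copied from a real repository, and the function name is hypothetical. It also assumes the file carries no XML namespace, and it ignores the per-extension snapshotVersions entries that newer metadata can contain.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, assumed to resemble what a repository serves for
# com.whoever.thing:thing:2.1.18-SNAPSHOT under unique snapshot naming.
SAMPLE_METADATA = """<metadata>
  <groupId>com.whoever.thing</groupId>
  <artifactId>thing</artifactId>
  <version>2.1.18-SNAPSHOT</version>
  <versioning>
    <snapshot>
      <timestamp>20140401.053004</timestamp>
      <buildNumber>173</buildNumber>
    </snapshot>
    <lastUpdated>20140401053004</lastUpdated>
  </versioning>
</metadata>"""


def latest_snapshot_filename(metadata_xml: str, extension: str = "jar") -> str:
    """Build the timestamped filename that the metadata points at."""
    root = ET.fromstring(metadata_xml)
    artifact_id = root.findtext("artifactId")
    version = root.findtext("version")
    timestamp = root.findtext("versioning/snapshot/timestamp")
    build_number = root.findtext("versioning/snapshot/buildNumber")
    if None in (artifact_id, version, timestamp, build_number):
        raise ValueError("metadata has no unique snapshot information")
    # Swap the -SNAPSHOT suffix for <timestamp>-<buildNumber>.
    suffix = "-SNAPSHOT"
    base = version[: -len(suffix)] if version.endswith(suffix) else version
    return f"{artifact_id}-{base}-{timestamp}-{build_number}.{extension}"


if __name__ == "__main__":
    print(latest_snapshot_filename(SAMPLE_METADATA))
```

This is exactly the kind of parsing a deploy script has to do under unique naming, so it is worth seeing how much of it there is.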

This, by the way, is the only way to do numbering if you want dependency resolution to work properly in Maven 3. Since Maven 2 was end-of-lifed in February 2014, there’s not really any choice any more.

living (partly) in the past

While this is obviously a much better scheme for library versioning, we don’t really like it for deployable binaries. With the old non-unique numbering, deploy scripts can be really simple—there’s no need to make awkward guesses about how to parse maven-metadata.xml files, or worse, do a full Maven install just to use dependency:copy. All you have to do is consistently push a STAGE-SNAPSHOT build, and then deploy the binary to your stage server by HTTP GETting /deploy-snapshots-local/com/whoever/thing/thing/STAGE-SNAPSHOT/thing-STAGE-SNAPSHOT.war every time.
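For comparison, this is roughly all a deploy script needs under non-unique naming. It is a sketch, not our actual script: the function names are made up, and http://repo.example.com is a placeholder for wherever your repository lives. The path layout is the standard Maven one used in the URL above.

```python
import shutil
import urllib.request

# Placeholder host, purely illustrative; substitute your own repository.
REPO_BASE = "http://repo.example.com"


def snapshot_path(repo: str, group_id: str, artifact_id: str,
                  version: str, extension: str) -> str:
    """Path of a non-unique snapshot: the filename never changes between builds."""
    group_path = group_id.replace(".", "/")
    return (f"/{repo}/{group_path}/{artifact_id}/{version}/"
            f"{artifact_id}-{version}.{extension}")


def fetch(base_url: str, path: str, destination: str) -> None:
    """Plain HTTP GET of the artifact, streamed to a local file."""
    with urllib.request.urlopen(base_url + path) as response, \
            open(destination, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    path = snapshot_path("deploy-snapshots-local", "com.whoever.thing",
                         "thing", "STAGE-SNAPSHOT", "war")
    print(path)
    # fetch(REPO_BASE, path, "thing.war")  # uncomment against a real repository
```

No metadata parsing and no Maven install: the URL is a pure function of the coordinates.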

Luckily Artifactory (which is what we are using) does still give you this option.

[Screenshot of local repository config: snapshot version behavior dropdown]

There are three choices:

  • Unique: Artifactory expects timestamped builds, and generates metadata normally based on them.
  • Non-unique: Artifactory accepts timestamped builds, but automatically renames them back to -SNAPSHOT to simulate the old school.
  • Deployer: Artifactory just uses the maven-metadata.xml file the deploying Maven client gives it (cool), but then screws it up totally when something causes it to regenerate, leaving out the proper versioning info. Not recommended.

So for our purposes, we’ve configured three different repositories:

  • libs-snapshots-local: Used for library JARs and configured with unique, so that multiple build servers may consistently resolve the latest version.
  • public-snapshots-local: Same as libs but publicly accessible, to serve the library components of the open-source Atlas, and our other common libs that they depend on.
  • deploy-snapshots-local: For WARs and other executables (including the Atlas ones), using non-unique to keep our deploy scripts simple.

In each POM, we include a section like this to tell the Maven deploy plugin where to upload builds:

<distributionManagement>
  <repository>
    <id>metabroadcast-mvn</id>
    <name>MetaBroadcast Library Releases Repo</name>
    <url>dav:http://mvn.metabroadcast.com:80/libs-releases-local</url>
  </repository>
  <snapshotRepository>
    <id>metabroadcast-mvn</id>
    <name>MetaBroadcast Library Snapshots Repo</name>
    <url>dav:http://mvn.metabroadcast.com:80/libs-snapshots-local</url>
  </snapshotRepository>
</distributionManagement>

work in progress

This all works pretty well for the most part, but there are some things I’m not entirely happy with…

switching strategies

If in the course of an experiment you switch repeatedly from unique to non-unique and vice versa, you might end up with a mix of timestamped and untimestamped artifacts, meaning the most recent build may no longer be the one that ends up in the metadata. It’s necessary to manually check for this, and delete any versions that are screwing things up.
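The manual check can be partly automated. The sketch below takes the filenames found in one version directory and reports whether timestamped and plain -SNAPSHOT artifacts are sitting side by side. The function names are hypothetical, and the patterns are a simplification: they ignore checksum files and artifacts with classifiers such as -sources.

```python
import re
from typing import Iterable, List, Tuple

# thing-2.1.18-20140401.053004-173.jar
_TIMESTAMPED = re.compile(r"-\d{8}\.\d{6}-\d+\.[A-Za-z0-9]+$")
# thing-2.1.18-SNAPSHOT.jar
_PLAIN = re.compile(r"-SNAPSHOT\.[A-Za-z0-9]+$")


def classify(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (timestamped, plain) artifact names; anything else is ignored."""
    names = list(filenames)
    timestamped = sorted(n for n in names if _TIMESTAMPED.search(n))
    plain = sorted(n for n in names if _PLAIN.search(n))
    return timestamped, plain


def has_mixed_naming(filenames: Iterable[str]) -> bool:
    """True if one version directory holds both naming schemes."""
    timestamped, plain = classify(filenames)
    return bool(timestamped) and bool(plain)


if __name__ == "__main__":
    listing = [
        "maven-metadata.xml",
        "thing-2.1.18-20140401.053004-173.jar",
        "thing-2.1.18-SNAPSHOT.jar",
    ]
    print(has_mixed_naming(listing))
```

How you obtain the directory listing (filesystem, HTTP index page, repository API) is left open here.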

pom problems

Since each build is individually responsible for declaring which repository it belongs in, all developers must be aware of the meaning of the <distributionManagement> section. In practice, hardly anyone can be bothered to understand anything in a POM. This has led to some confusion, with new projects ending up in the wrong repo and needing to be moved (and have their versioning corrected) later.
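A cheap guard against this kind of mistake is a check that compares a POM's packaging with the snapshot repository it declares. The sketch below assumes a simple convention based on the three repositories described earlier (WARs go to deploy-snapshots-local, JARs to one of the two library repos); the mapping and function names are illustrative. It only looks at the POM it is given, so a distributionManagement section inherited from a parent POM would not be seen.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed convention: packaging -> acceptable snapshot repositories.
EXPECTED_REPOS = {
    "war": {"deploy-snapshots-local"},
    "jar": {"libs-snapshots-local", "public-snapshots-local"},
}

# Sample POM with the mistake described above: a WAR pointed at the libs repo.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <artifactId>thing</artifactId>
  <packaging>war</packaging>
  <distributionManagement>
    <snapshotRepository>
      <id>metabroadcast-mvn</id>
      <url>dav:http://mvn.metabroadcast.com:80/libs-snapshots-local</url>
    </snapshotRepository>
  </distributionManagement>
</project>"""


def _strip_namespaces(root: ET.Element) -> None:
    """Drop the {namespace} prefix so plain tag names can be used in paths."""
    for element in root.iter():
        if isinstance(element.tag, str) and "}" in element.tag:
            element.tag = element.tag.split("}", 1)[1]


def snapshot_repo_problem(pom_xml: str) -> Optional[str]:
    """Return a description of the mismatch, or None if the POM looks fine."""
    root = ET.fromstring(pom_xml)
    _strip_namespaces(root)
    packaging = (root.findtext("packaging") or "jar").strip()  # Maven defaults to jar
    expected = EXPECTED_REPOS.get(packaging)
    if expected is None:
        return None  # no rule for this packaging type
    url = (root.findtext("distributionManagement/snapshotRepository/url") or "").strip()
    if not url:
        return "no snapshotRepository url declared"
    actual = url.rstrip("/").rsplit("/", 1)[-1]
    if actual not in expected:
        wanted = " or ".join(sorted(expected))
        return f"{packaging} project deploys snapshots to {actual}, expected {wanted}"
    return None


if __name__ == "__main__":
    print(snapshot_repo_problem(SAMPLE_POM))
```

Something like this could run as a build step, so that a new project in the wrong repo is flagged before its first deploy rather than after.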

One solution to this might be to use our own archetypes, for consistent library or web-app project structures. Of course, there will still be scope for variation and the mistakes that inevitably follow.

build numbers

One thing that can be annoying is that Artifactory selects the “latest” version for the generated metadata based on the build number, not the timestamp. This implies that build numbers must be consistently and logically generated. Ours are not. They seem to be guessed by the Maven client based on the one from the previous snapshot’s metadata. This led to further confusion when switching from unique to non-unique and back again, as new builds’ numbers started again from 1, and old builds numbered in the teens started being returned instead.
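The mismatch is easy to reproduce on paper. In the sketch below, the two snapshot records are made-up data imitating the situation just described: an old build numbered 14 and a newer one whose numbering restarted at 1. The selection functions are illustrative and are not Artifactory's actual code.

```python
from typing import Iterable, NamedTuple


class Snapshot(NamedTuple):
    timestamp: str      # yyyyMMdd.HHmmss, as in the unique version string
    build_number: int


def latest_by_build_number(snapshots: Iterable[Snapshot]) -> Snapshot:
    """Selection by build number, the behaviour described above."""
    return max(snapshots, key=lambda s: s.build_number)


def latest_by_timestamp(snapshots: Iterable[Snapshot]) -> Snapshot:
    """Selection by timestamp; the fixed-width format sorts correctly as text."""
    return max(snapshots, key=lambda s: s.timestamp)


if __name__ == "__main__":
    # Hypothetical history: numbering restarted after a strategy switch.
    history = [
        Snapshot("20140320.101500", 14),  # older build, high number
        Snapshot("20140401.053004", 1),   # newest build, number reset to 1
    ]
    print("by build number:", latest_by_build_number(history))
    print("by timestamp:   ", latest_by_timestamp(history))
```

The two functions disagree on this data, and the build-number one picks the stale artifact, which is why consistent build numbers matter so much here.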

ATTENTION PLZ: If anyone reading should happen to have advice on getting Jenkins to pass build numbers into mvn deploy calls (with minimal per-build config), it would be massively appreciated!

While these few problems are unfixed, I’m periodically running a (slightly over-the-top) Ruby script to validate the generated metadata against the stored POMs and warn us in advance of any inconsistencies. (This could probably be fixed up for release if there’s any interest…)

Thanks for reading and please let me know any wise thoughts, wise reader! Cheers 🙂
