Thomas Maroulis

dynamic scripting in elasticsearch

One of the tools we use to power our APIs here at MetaBroadcast is an ElasticSearch (ES) search cluster. ES lets us query our data in a wide variety of ways, complementing our Cassandra database by overcoming its query limitations. It does this through a very expressive Domain Specific Language (DSL) for constructing almost arbitrary queries in JSON, which you then post to the cluster through its REST API to get the desired results.

However, the operative word for the purposes of this article is "almost". Continuing a theme we started recently, it is important to note the difference between good software and magic. ES falls firmly in the first category, which means there is a limit to what you can do with it. Sometimes you will have a business case that its DSL cannot quite express. Fortunately, for exactly these cases, ES supports dynamic scripting in its queries. In this article we will go through how to handle one such use case, sorting documents in a slightly unorthodox fashion, using ES’ scripting capability.

the problem

Let’s say we have some documents like the following stored in ES:
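As a minimal sketch, one such document could be written as the Python dict below. Only the names source, data, someSource, someOtherSource and the value baz come from the text; the key names id and content and the value bar are assumptions made for illustration.

```python
import json

# Hypothetical document. The "id" and "content" key names and the value "bar"
# are illustrative assumptions; the source names and "baz" come from the article.
document = {
    "id": "doc-1",
    "content": [
        {"source": "someSource", "data": "bar"},
        {"source": "someOtherSource", "data": "baz"},
    ],
}

# Show the document as the JSON that would be stored in ES
print(json.dumps(document, indent=2))
```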

These documents each contain a list of content items, each of which has a source field and a data field. Once we have selected which documents to return, we want to sort them on the data field, choosing which item’s data to use according to a precedence defined on the source field. So in the above example, if someOtherSource has a higher precedence than someSource, then we want to use the value baz when sorting this particular document. The tricky bit is that not every source will be present in every document, so if the highest-precedence source is absent we have to fall back to the next one, and so forth.

the script

To accomplish this we are going to write a simple Groovy script that collects the values of the data fields from all content, ignores nulls and then uses a source precedence mapping to select and return one of them. The precedence is a very basic mapping from a source value to an integer, where a higher number means a higher priority. The script would look like this:
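As an illustration of the selection logic only, here is a minimal sketch in Python rather than Groovy. It is a model of the steps described above, not the script ES would run; the function name select_sort_value and the content key layout are assumptions.

```python
def select_sort_value(content, precedence):
    """Return the data value belonging to the highest-precedence source.

    Basic version: assumes every source is in the mapping and that at least
    one entry has a non-null data value.
    """
    # Ignore entries whose data field is null or missing
    candidates = [entry for entry in content if entry.get("data") is not None]
    # A higher number in the mapping means a higher priority
    best = max(candidates, key=lambda entry: precedence[entry["source"]])
    return best["data"]


precedence = {"someSource": 1, "someOtherSource": 2}

content = [
    {"source": "someSource", "data": "bar"},
    {"source": "someOtherSource", "data": "baz"},
]
print(select_sort_value(content, precedence))  # baz

# When the preferred source has no data, the next one in line is used
content_with_gap = [
    {"source": "someSource", "data": "bar"},
    {"source": "someOtherSource", "data": None},
]
print(select_sort_value(content_with_gap, precedence))  # bar
```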

Note that this is a basic version of the script. For production use you would need to handle nested fields, the possibility that a certain source might be absent from the precedence mapping, the possibility that none of the values of data might be set and other such edge cases. It is sufficient though for the purposes of this tutorial.
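To give a feel for two of those edge cases, the Python sketch below models a more defensive selection. It is an assumption-laden illustration, not the production script: the name select_sort_value_safe, the default parameter and the choice to rank unknown sources below all known ones are all hypothetical, and nested fields are not covered.

```python
# Sources missing from the mapping rank below every known source (a design
# choice made for this sketch, not something the article prescribes).
UNKNOWN_PRECEDENCE = float("-inf")


def select_sort_value_safe(content, precedence, default=""):
    """Pick the data value of the highest-precedence source, tolerating gaps."""
    entries = content or []
    # Ignore entries whose data field is null or missing
    candidates = [entry for entry in entries if entry.get("data") is not None]
    if not candidates:
        # No data value is set anywhere: fall back to a fixed default
        return default
    best = max(
        candidates,
        key=lambda entry: precedence.get(entry.get("source"), UNKNOWN_PRECEDENCE),
    )
    return best["data"]


precedence = {"someSource": 1, "someOtherSource": 2}

# A known source beats one that is absent from the mapping
mixed = [
    {"source": "unmappedSource", "data": "qux"},
    {"source": "someSource", "data": "bar"},
]
print(select_sort_value_safe(mixed, precedence))  # bar

# No usable data at all yields the default
print(repr(select_sort_value_safe([], precedence)))  # ''
```

Returning a fixed default keeps the sort well defined for documents without any data, at the cost of grouping them all together at one end of the results.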

the query

To use this script in an actual ES search query you would need to add something like the following in the sort section of the query:
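A rough sketch of such a search body, built as a Python dict and printed as JSON, is given below. The key names follow the _script sort syntax of the ES 1.x era as best I recall it, so check them against the documentation for your version; the script name sort_by_precedence and the precedence parameter are hypothetical.

```python
import json

# Assumed shape of a search request with a script-based sort.
search_body = {
    "query": {"match_all": {}},
    "sort": {
        "_script": {
            # Hypothetical script name
            "script": "sort_by_precedence",
            "lang": "groovy",
            # The script returns a string value such as "baz"
            "type": "string",
            # Hypothetical parameter carrying the source precedence mapping
            "params": {"precedence": {"someSource": 1, "someOtherSource": 2}},
            "order": "asc",
        }
    },
}

# This JSON is what would be posted to the cluster's search endpoint
print(json.dumps(search_body, indent=2))
```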

Note the value of script in the above JSON. This is where you declare the script that you want to run. Here you have a couple of options.

One of them is to save the script as a file with a .groovy extension in the scripts directory of each of your ES servers and then reference it by its name. The ES servers periodically scan that directory and compile any scripts they find in there so when they receive a query referencing one of those scripts they will have already compiled it and be ready to execute it.

Another option is to inline the script code as the value of script. This, however, has potential security implications when you are using a language like Groovy that is not sandboxed when executed by ES, so this option is disabled by default. Before using it you should check whether you are inadvertently introducing a vulnerability. If you are certain that you are not, you will need to add an option to your ES’ YAML configuration to enable inlining of scripts.

For details on the security implications of inlining scripts, on how to enable it, and on the other features that ES dynamic scripting supports, such as support for other languages, see the scripting section of the ES documentation.

I hope you enjoyed this and/or found it useful. See you next time.
