I joined MetaBroadcast as part of a litter of three, so I wasn’t alone in getting used to a company that employs a Happiness Officer, lets you start work at midday if you so wish, and has strong ties with a cake shop. Needless to say, I’m loving my time here, and this blog post is to talk about my first month.
In the last month I’ve worked on two main projects: the first was building an ingester, and the second was improving our equivalence. I’m going to talk about equivalence, as it’s more interesting to me and it’s what I’m working on now.
Equivalence optimisation has the potential to maximise the accuracy and precision of our content matching code, and it’s a fascinatingly complex problem. Content attributes need to be compared using various algorithms to produce data about the metadata we’re handling (meta-metadata?), and a scoring system over that data needs to be calibrated to maximise the chance of matching content correctly while minimising the chance of matching it incorrectly.
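To make that concrete, here’s a minimal sketch of the idea: compare a couple of attributes of two pieces of content, then combine the per-attribute similarities into one weighted score checked against a calibrated threshold. All the attribute names, weights, and the threshold here are illustrative assumptions, not our actual equivalence code.

```python
from difflib import SequenceMatcher

def title_score(a, b):
    # Fuzzy string similarity in [0, 1]
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def duration_score(a_secs, b_secs, tolerance=60):
    # 1.0 for identical durations, falling off linearly within the tolerance
    diff = abs(a_secs - b_secs)
    return max(0.0, 1.0 - diff / tolerance)

# Hypothetical weights and threshold; in practice these are what you calibrate
# to trade off false matches against missed matches.
WEIGHTS = {"title": 0.7, "duration": 0.3}
THRESHOLD = 0.8

def equivalence_score(a, b):
    scores = {
        "title": title_score(a["title"], b["title"]),
        "duration": duration_score(a["duration"], b["duration"]),
    }
    return sum(WEIGHTS[k] * v for k, v in scores.items())

a = {"title": "Doctor Who", "duration": 3600}
b = {"title": "Doctor who", "duration": 3610}
print(equivalence_score(a, b) >= THRESHOLD)  # prints True for this close pair
```

Raising the threshold makes incorrect matches rarer at the cost of missing some correct ones; the calibration problem is choosing where on that trade-off to sit.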
One challenge is how to actually analyse the results: if we had a reliable way of checking whether matches are correct, we’d be using it as the matching algorithm itself, and then we’d have nothing left to check it with! It’s important to mention that our equivalence is already pretty good, and that I’m just working on possible ways of improving it even more.
While the simple application of algorithms is a practical way to solve the problem pretty well, machine learning could be used to bring about further improvements in the future.
If we had a cheap way of reliably labelling some content pairs as correctly or incorrectly matched, it would be straightforward to use those labels as training data, build a classifier with a data-mining library, and then run that classifier over all of our content, matching it with measurably good accuracy.
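A sketch of that pipeline might look like the following. I’m using scikit-learn as a stand-in for “a data-mining library”, and the feature vectors (per-attribute similarity scores, labelled 1 for a correct match and 0 for an incorrect one) are entirely made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Each row is a candidate pair of content, described by hypothetical
# per-attribute similarities: [title_similarity, duration_similarity, same_broadcaster]
training_features = [
    [0.95, 0.99, 1.0],  # labelled: correctly matched pair
    [0.90, 0.80, 1.0],  # labelled: correctly matched pair
    [0.30, 0.10, 0.0],  # labelled: incorrectly matched pair
    [0.20, 0.50, 0.0],  # labelled: incorrectly matched pair
]
training_labels = [1, 1, 0, 0]

# Train a simple classifier on the labelled pairs
clf = DecisionTreeClassifier(random_state=0)
clf.fit(training_features, training_labels)

# Run the trained classifier over new candidate pairs
candidates = [[0.92, 0.95, 1.0], [0.15, 0.30, 0.0]]
print(clf.predict(candidates))
```

The appeal is that accuracy becomes measurable: hold back some labelled pairs as a test set and you get a number for how often the classifier matches correctly, which is exactly what’s hard to get for hand-tuned scoring.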
Along the same line of thinking, this classification problem would also suit unsupervised learning, which has been tried in the past. It’s difficult to implement well, though, and even more difficult to determine whether the clusters it finds really correspond to equivalent content, or merely to some other obscure pattern that the learning algorithm happened to notice more easily.
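The unsupervised version of the idea, in a minimal hypothetical sketch: cluster the similarity vectors of candidate pairs without any labels, and hope the clusters separate equivalent from non-equivalent content. The data and the choice of k-means are illustrative assumptions.

```python
from sklearn.cluster import KMeans

# Hypothetical per-pair similarity vectors: [title_similarity, duration_similarity]
pairs = [
    [0.95, 0.99],  # plausibly equivalent
    [0.90, 0.85],
    [0.10, 0.20],  # plausibly not equivalent
    [0.05, 0.15],
]

# Cluster into two groups with no labels at all
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pairs)
labels = km.labels_

# On toy data this cleanly separated, the high-similarity pairs share one
# cluster and the low-similarity pairs the other...
print(labels[0] == labels[1] and labels[2] == labels[3] and labels[0] != labels[2])
```

...but nothing guarantees the clusters mean “equivalent” and “not equivalent”. That’s the verification problem from above all over again: the algorithm gives you groups, not an explanation of what the groups are.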
These methods are something to think about for the future, although we already use some machine learning in checking equivalence. At the moment I’m building a tool to test a range of approaches to simple problems, while attempting to make small improvements to current equivalence using simpler algorithms.
but also fun
It’s definitely worth mentioning that I don’t work alone. In my first month here I have integrated with a highly efficient and productive team of experts, through our Slack usage, Happy Hour updates and demos, Metaversaries, Foodie Fridays, an Away Day, and a #Metabeers. Through all the cake, bubbly, games, beers, food, cake, boat trips, planetarium visits, meals out, coffee, socialising, and cake, we also managed to do some work!