Improving Your Search Relevance Algorithm With Human Curated Data

In our previous post, we outlined the difference between Click Data and Human Curated Data metrics. Today, we’ll dig into the deeper use cases that require human curated data in particular.

Why You Need Human Curated Data

Tapping individual contributors to evaluate search results gives you explicit relevance judgments, which are a higher-quality metric to optimize for than clicks alone. For example, Etsy turned to CrowdFlower (now Appen) to help them solve for brand affinity: they wanted the products that most aligned with the Etsy brand (the most “Etsy-ness,” if you will) to appear first in their search results. This is a problem that required human judgment. Due to the nature of Etsy’s platform, typical click data doesn’t suffice. One reason is simply that Etsy is fun to browse: if a user clicks from page to page of search results, it doesn’t mean they can’t find what they’re looking for; it may just mean they’re enjoying the browsing. That’s where human curated data comes in. Etsy used our services to build a better filtered search, taking the burden of labeling products off its independent sellers and instead tapping Appen contributors for the job. With an ecosystem of more than 40 million products, this was no small task.

When setting up your relevance scoring system for human curated data, we recommend that you first have individual contributors score your current search algorithm as-is to establish a baseline. Then make changes based on the metrics that are right for you and your site, and re-test the query-result pairings the new algorithm produces on the same random set of queries you used for the baseline. That way, you can tell whether your new algorithm is an improvement or whether you should make further changes.
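As a rough illustration of that baseline-then-compare workflow, here is a minimal Python sketch. The data layout, the 1–5 scale, and the item names are assumptions for the example, not a prescribed format or an Appen API.

```python
# Hypothetical sketch: compare a baseline algorithm against a new one using
# contributor relevance judgments collected on the same set of queries.
# The (query, result_id, score) layout and 1-5 scale are assumptions.

from statistics import mean

def average_relevance(judgments):
    """Mean contributor score across all judged query-result pairs."""
    return mean(score for _, _, score in judgments)

# Judgments for the current (baseline) algorithm.
baseline_judgments = [
    ("handmade mug", "item-101", 4),
    ("handmade mug", "item-102", 2),
    ("linen dress", "item-310", 3),
]

# Judgments for the candidate algorithm, on the same queries.
new_algo_judgments = [
    ("handmade mug", "item-101", 5),
    ("handmade mug", "item-205", 4),
    ("linen dress", "item-310", 3),
]

baseline_score = average_relevance(baseline_judgments)
new_score = average_relevance(new_algo_judgments)
print(f"baseline: {baseline_score:.2f}, new algorithm: {new_score:.2f}")
# If the new score is meaningfully higher on the same queries, the change is
# likely an improvement; if not, iterate and re-test.
```

In practice you would compare more than a simple mean, but the key point is that both algorithms are judged on the same random query set so the numbers are directly comparable.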

Ways Contributors Can Improve Your Algorithm:

  • Score Query-Result Pairs: One of the most effective ways to use contributors is to have them score query-result pairs to measure relevance. To establish this metric, you design a numerical scale (typically our customers create a 2-, 3-, or 5-point scale), which contributors use to score each query-result pairing. This gives you a high-level view of how well your search relevance algorithm is performing, as well as a number to try to beat during later relevance testing; see the sketch after this list for one way to aggregate those scores.
  • Additional Tagging: Item metadata can significantly increase search relevance. Leveraging contributors on their own or in tandem with automated, machine learning-enabled tagging can fill a product database with new tags quickly.
  • Data Cleaning and Product Categorization: Product databases get messy. Manufacturers may use different wording for similar products; different distributors can describe or title identical products in different ways; and sometimes you may simply have several images associated with one product with no real way of knowing which is best. Contributors can easily reconcile these discrepancies.
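To make the query-result scoring step concrete, here is a small sketch of how contributor scores on a 5-point scale might be aggregated per query-result pair. The input layout and the 3.0 review threshold are illustrative assumptions, not part of any specific tool.

```python
# Illustrative sketch: aggregate contributor scores (1-5 scale) per
# query-result pair and flag pairs that fall below a chosen threshold.
# The raw_judgments layout and the 3.0 cutoff are assumptions.

from collections import defaultdict

raw_judgments = [
    # (query, result_id, contributor_score)
    ("ceramic vase", "item-17", 5),
    ("ceramic vase", "item-17", 4),
    ("ceramic vase", "item-88", 2),
    ("ceramic vase", "item-88", 1),
]

# Group all contributor scores by query-result pair.
scores = defaultdict(list)
for query, result_id, score in raw_judgments:
    scores[(query, result_id)].append(score)

# Average each pair and flag low-relevance pairings for review.
for (query, result_id), values in scores.items():
    avg = sum(values) / len(values)
    flag = "REVIEW" if avg < 3.0 else "OK"
    print(f"{query!r} -> {result_id}: avg {avg:.1f} [{flag}]")
```

The aggregate of these per-pair scores is the baseline number your new algorithm has to beat in the re-test described above.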