How to Build a Successful Task for the Crowd

I built my first job on Mechanical Turk more than seven years ago. I immediately saw the power of the crowd, but the process was difficult enough that I started CrowdFlower, now Appen, to give people like me a better experience and more accurate results. Since then I've created thousands of jobs myself and looked at countless microtasks built by our customers. Many get great results and live up to the potential of the crowd. But often, when I look at a job on our platform, I can tell immediately that the results aren't going to be much better than that first project I ran so many years ago.

The good news is that most of the mistakes users make can be avoided by following a few intuitive rules. I put together a list of my tips for building a successful task for the crowd. I also included some sage advice from a few of my fellow crowdsourcing colleagues, including Sid Viswanathan, who built the company CardMunch entirely through microtasking and sold it to LinkedIn; Omar Alonso, a Microsoft researcher who has written as many articles on crowdsourcing best practices as anyone; and Praveen Paritosh, a thought leader on crowdsourcing at Google. If you have more suggestions, please let me know.

Here are 12 essential steps to make the most of your crowdsourced job:

1) Ask yourself: Is my job actually possible?

I'm amazed how many people overlook this step. Think about whether you have provided enough information for someone you've never met to complete the task. What makes a job possible? If you are collecting information, it must be available online and not behind a login. If you ask someone to find 10 photos of a building, there must actually be a building and there must be photos available. If you're uploading information relevant to the job from a spreadsheet, all of the fields should be filled in.
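To make the spreadsheet check concrete, here is a minimal sketch of that kind of pre-flight validation. The function name and column layout are made up for illustration; it simply flags rows with empty fields before you upload your data:

```python
import csv

def find_incomplete_rows(path):
    """Return (row_number, missing_columns) for every row with an empty field."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # Start counting at 2 because row 1 of the file is the header.
        for row_num, row in enumerate(reader, start=2):
            missing = [col for col, val in row.items() if not (val or "").strip()]
            if missing:
                problems.append((row_num, missing))
    return problems
```

Running a check like this before launch is much cheaper than discovering mid-job that contributors can't complete units because the data they need is missing.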
It's not rocket science: before you start your job, stop and really think through the issues that might make it impossible for someone who knows nothing about your project to complete the task you want to set up.

2) Ask your friend: Is my job actually possible?

It's always good to find someone who hasn't looked at your job, show it to them, and see how they perform on it. This will catch 90 percent of the basic problems we see. We offer a preview and an internal interface to make it easy for Appen job designers to do this.

3) Ask the crowd: Is my job actually possible?

Get feedback on your job. Appen has a group of long-time contributors that we use to test new kinds of jobs and gather feedback. This lets the people who will actually be working on your job review your instructions and setup and make sure they understand what you are trying to accomplish. The results can be extremely enlightening.

4) Give Immediate Feedback

When I started CrowdFlower, I had no idea how important it would be to give contributors real-time feedback. With an address verification job, for example, we see massive quality improvements by adding a simple address validator that warns contributors if their input looks obviously invalid.

Gold Standard Data, or just Gold, is another good way to give instant feedback. Gold units are pre-labeled units with known answers that are regularly inserted throughout your job. They are then used to test and track contributor performance, so that only the contributors demonstrating competency in your job are allowed to submit judgments. Back in the earliest days of crowdsourcing, it was really hard to hide Gold inside a job so that contributors could see errors in their work in real time. Now Appen provides intuitive tools that make the process of creating Gold easy.
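The Gold mechanism can be sketched roughly as follows. This is an illustrative toy, not Appen's implementation: the unit ids, answers, and the 0.7 accuracy threshold are all made-up values.

```python
import random

# Hypothetical gold set: unit id -> known correct answer.
GOLD = {"unit-7": "valid", "unit-12": "invalid"}
MIN_ACCURACY = 0.7  # arbitrary threshold for trusting a contributor

def build_page(work_units, golds_per_page=1):
    """Hide a few gold units among real work units on a page of tasks."""
    page = list(work_units) + random.sample(list(GOLD), golds_per_page)
    random.shuffle(page)
    return page

def gold_accuracy(answers):
    """Fraction of the hidden gold units the contributor answered correctly."""
    scored = [(uid, ans) for uid, ans in answers.items() if uid in GOLD]
    if not scored:
        return None  # contributor hasn't seen any gold yet
    return sum(GOLD[uid] == ans for uid, ans in scored) / len(scored)

def is_trusted(answers):
    """Only contributors who keep passing gold may submit judgments."""
    acc = gold_accuracy(answers)
    return acc is None or acc >= MIN_ACCURACY
```

Because contributors can't tell gold units apart from real work, their gold accuracy is a running estimate of their accuracy on everything else, which is what makes real-time feedback possible.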
While it's true that some jobs, like essay writing, probably require a multistep validation process rather than traditional Gold, we've seen great results even with tasks that you might think would be incompatible with Gold. With a little creativity, jobs like translation, search relevance, and surveys really benefit from hidden Gold. That's a whole new blog post topic.

5) Make sure your Gold is working.

We find that nothing frustrates workers more than doing good work and getting the wrong feedback. So if you're using Gold, which you should, check back periodically to see which Gold questions contributors are getting wrong and what you can learn from their mistakes. The fix might be as simple as rewording a question, or you might find that two answers to a Gold question both make sense and correct it easily.

6) Make a Nice Interface

You don't have to be a user experience (UX) expert to make a reasonably good-looking task. We've integrated our platform with Bootstrap and designed a whole language, which we call Custom Markup Language (CML), to help you lay out good-looking jobs. This is especially important with more complicated tasks. Often the best results come from looking at similar jobs and copying their interfaces.

7) Limit Access to the Right Contributors

Narrowing your crowd down to Appen's trusted contributors and adding skills tests on the Appen platform can dramatically improve results. Give a little thought to who is likely to do the best work on your specific tasks. For example, if you're running a task in German, consider restricting access to IP addresses in Germany. You'll be glad you did.

8) Start Your Job Small

If you're running a big job with massive amounts of data, start with a limited set of data, look at the results, and iterate on your job setup before you run the rest of your data. This should be obvious, but it's easy to get lazy and assume you've got things right. Appen makes this easy by giving you the option of starting with just a fraction of your data.
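As a rough sketch of the start-small idea, assuming your units live in a plain Python list, a pilot batch might be drawn like this (the 5 percent fraction and fixed seed are arbitrary choices, not platform defaults):

```python
import random

def pilot_sample(rows, fraction=0.05, seed=0):
    """Draw a small, reproducible pilot batch before launching the full job."""
    k = max(1, int(len(rows) * fraction))  # always pilot at least one unit
    return random.Random(seed).sample(rows, k)
```

Fixing the seed keeps the pilot batch reproducible, so you can iterate on instructions and Gold against the same units and compare results fairly between runs.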
9) Think About Scale

Some words of wisdom from Sid Viswanathan: Make sure you have sufficient scale to justify crowdsourcing. People often come to me and say they have 500 tasks that take about 2 minutes each; that's under 17 hours of work in total. My reaction in many cases is to just hire a couple of people and bang through it, because that will probably get the job done faster. So I like to ask a couple of questions: (1) Is this a one-time thing, or do you anticipate recurring jobs? (2) Is getting the results time-sensitive or not? There is no definitive minimum threshold to justify using a crowd; however, I have suggested to many folks that they save the time of designing a crowd job if it's a one-time thing and not much work. Designing a good crowd job from scratch, depending on the complexity, is time-intensive, so you have to weigh what makes the most sense.

10) Decide How Much Work You Want to Do

Crowdsourcing takes work, so decide how much of your own time you want to invest. Praveen Paritosh puts it another way: "There is the initial phase of checking feasibility, building and refining the interface and the task; and most importantly, refining the gold (the synthesis of the crowd is often better than the gold, which totally undermines the feedback process!). So, my advice to someone beginning would be to know about these upfront costs and the involvement in terms of tools as well as subject matter expertise. Most work is spent in getting the task rolling reliably, after which one might be able to be hands-off." The more work you do to refine a job and the more feedback you get, the better your results will be, but this process takes time. Depending on the scale and cost of a job, at some point it may be better to accept some inefficiency in the job and just get it done.

And two more tips from Omar Alonso:

11) Ask for Comments

"Always have an empty box for comments – you will be amazed at the responses you receive."

12) Keep Your Workers Happy

"Always keep your workers happy. You'll run a whole lot more tasks.
Don't piss off the people who are helping you."

There are now hundreds of research papers written about how to get good results from the various platform providers, including Appen. Regardless of the route you take, the simple steps covered above will help your job be successful.