Perhaps the most succinct summary of the relationship between artificial intelligence (AI) and data is this: an AI model is only as good as the data it was trained on. Preparing training data is a time-consuming process that requires a significant investment of money and people to annotate the data. While companies often focus on the money, it’s the people who deserve our attention.
Your work as data annotators has a tremendous impact on AI projects. Your contributions provide ground-truth accuracy in labeling and, when sourced globally, a diversity of perspectives that helps create less biased AI. Achieving inclusive AI—that is, AI that works for everyone—wouldn’t be possible without your efforts.
The Role of Data Annotators
The work of a data annotator starts in the early stages of a model build, during a step known as training data preparation. Before being processed by an algorithm, training data must be collected, cleaned, and annotated. Annotation refers to the application of labels and tags to raw data. These tags identify key features relevant to the decisions the machine will be tasked to make (e.g., Is there a human in this image? What does this receipt say? Is this the right context for this ad? Is this search result relevant?).
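To make this concrete, here is a minimal sketch of what a single annotation record might look like for a hypothetical image-tagging task. The schema and field names are illustrative only, not a standard format, but they show the basic idea: each raw data item is paired with the question asked and the label an annotator applied.

```python
# Minimal, illustrative sketch of an annotation record for a hypothetical
# image-tagging task (field names are not a standard schema).
from dataclasses import dataclass

@dataclass
class Annotation:
    item_id: str       # identifier of the raw data item (e.g., an image file)
    question: str      # the question the annotator was asked
    label: str         # the tag or answer the annotator applied
    annotator_id: str  # who provided the label

annotations = [
    Annotation("img_0001.jpg", "Is there a human in this image?", "yes", "annotator_17"),
    Annotation("img_0002.jpg", "Is there a human in this image?", "no", "annotator_03"),
]

# Downstream, these labeled examples become the ground truth the model trains on.
for a in annotations:
    print(f"{a.item_id}: {a.label}")
```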
Data annotators undertake one of the most critical parts of AI development. The accuracy of their labels directly impacts the accuracy of the machine’s future predictions. A machine trained on poorly labeled data will make errors, produce low-confidence predictions, and ultimately fail to work effectively.
The implications of incorrect labels can be enormous, depending on the use case. Imagine a machine that has learned to identify a particular heart condition from a set of radiology images. If the data annotators mislabeled several images in the training data, the machine could easily miss a diagnosis, jeopardizing a patient’s health.
The social ramifications of poor data annotation are profound, and inaccurate labeling can also degrade many everyday interactions between people and AI. Finance, retail, and other major industries rely on AI for a variety of transactions, and AI that doesn’t make accurate predictions leads to poor customer experiences and lost business revenue. The work of data annotators therefore has a widespread impact on companies around the world that use AI.
Reducing Bias Through Global Sourcing
Data annotators don’t just provide accurate labels. They also play an important role in mitigating bias. Building inclusive AI is mission-critical: after all, AI that doesn’t work for everyone ultimately doesn’t work. With companies increasingly operating globally and serving a diverse customer base, inclusivity is more important than ever.
AI training data prepared by humans can reflect their biases, which undermines an algorithm’s objectivity. If you train a machine to identify which images contain a nurse, for instance, but include very few images of male nurses in your training dataset, that machine will under-select images with male nurses. In other words, it won’t be very accurate, and its mistakes may well come across as offensive. Similarly, if you’re building a speech recognition device but only collect speech data from one demographic, your model will struggle to work well for any other demographic, since speech patterns vary widely from group to group.
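As a rough illustration of how this kind of imbalance can be caught before training, the short sketch below tallies how each demographic group is represented among the positive examples in a dataset. The data and the metadata fields are entirely hypothetical; the point is simply that a heavily skewed count is an early warning sign.

```python
from collections import Counter

# Hypothetical metadata for a "nurse vs. not nurse" image dataset; the
# 'gender' field is illustrative of metadata an annotation team might record.
training_examples = [
    {"label": "nurse", "gender": "female"},
    {"label": "nurse", "gender": "female"},
    {"label": "nurse", "gender": "female"},
    {"label": "nurse", "gender": "male"},
    {"label": "not_nurse", "gender": "male"},
    {"label": "not_nurse", "gender": "female"},
]

# Count how each group is represented among the positive ("nurse") examples.
nurse_counts = Counter(ex["gender"] for ex in training_examples if ex["label"] == "nurse")
total = sum(nurse_counts.values())
for group, count in nurse_counts.items():
    print(f"{group}: {count}/{total} ({count / total:.0%})")

# A heavily skewed distribution here suggests the model may under-perform
# (or under-select) for the under-represented group.
```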
You often don’t know what you don’t know, which is a big problem when it comes to bias, and it applies to training data, too. Addressing this bias requires including diverse perspectives from the start. Luckily, companies that leverage a crowd of data annotators can now source contributors at a global scale. Access to a global crowd brings in diverse ideas, opinions, and values. These perspectives become reflected in the training data and in the AI solution itself, leading to a final product that’s less biased and more functional for everyone.
A diverse crowd of annotators also gives companies access to a variety of languages and geographies, so building a model that works in both Romania and Canada is no longer out of reach. Given that data annotation can be accomplished virtually, novel opportunities are now available for people in more remote locations to participate in the labeling process, providing local knowledge and expertise for underserved languages. This naturally expands the reach of organizations looking to scale to a larger customer base, giving more people across the world access to new (and sometimes much needed) technology. The globalization of the AI economy provides a platform for data annotators to drive accessibility and inclusion, ultimately creating opportunities for people from all walks of life to participate in the technological advancements of our time.
Undoubtedly, data annotators are the foundation of successful AI. You bring much-needed diversity that maximizes accuracy and reduces bias. AI practitioners have a responsibility to source diverse annotators for their projects. And they have options: they can work with a third-party data provider like Appen to gain instant access to a global crowd of annotators, or they can seek out a large variety of contractors to fulfill this requirement. In-house annotation is also a possibility, but most organizations don’t have a sufficiently large pool of unique individuals, like our crowd, to accomplish the massive effort that data preparation involves.
Whichever option they choose, they need to seek out annotators that represent their end users. If their product will launch globally, they must hire annotators like you from around the world. If their product is intended for only one geographic location, they’ll still want to ensure the annotators represent all demographics present in that location. Essentially, the use case will determine who needs to label the data for the model.
It’s important to remember that annotators deserve fair treatment for their work. Whether program managers are hiring contractors or leveraging a crowd, they need to pay annotators fairly, provide channels for feedback, protect their personal information, and support their overall well-being (for more on this, see Appen’s Crowd Code of Ethics). Annotators are, unfortunately, often overlooked in the AI value chain, but we believe they (you) deserve special acknowledgement for their invaluable contributions. With the efforts of diverse annotators behind AI initiatives, we’re one step closer to building AI that’s truly inclusive.