Is it Magic, or is it Data?

How Santa uses Data to Read Holiday Letters

All around the world, kids and adults alike are writing letters about the gifts they want for the holiday season. Some of these letters are given to friends or family; others are mailed to the person they know as Santa Claus. The exact number of languages that Santa can speak is unknown, but speculation says it could be anywhere from 7 to 84+. There are more than 7,100 spoken languages, which is a lot for just one person to know. Santa doesn’t just enlist the help of his elves to translate all these letters; he’s secretly using AI to ensure everyone gets their letter read. From using optical character recognition (OCR) to leveraging natural language processing (NLP), Santa knows data is key to keeping the holidays magical.

Step 1: Upload the letter with OCR

Despite modern technology, Santa still receives a variety of letters by mail, including old-fashioned handwritten ones. Before he can start adding the present requests to his master toy database, he needs to get each letter uploaded into a computer and translated into a language he speaks. Manually typing the 32,000+ letters he gets daily is time-consuming. To speed up this process, Santa leverages OCR.

Optical character recognition is a branch of computer vision that processes images of text and converts that text into machine-readable form. In this case, the images of text are scanned letters. Once a letter is scanned, the OCR program uses intelligent character recognition to identify each handwritten character on the page and convert it to a character code, such as ASCII, that can be used for further processing. The program then makes a final sweep to find and correct any errors. The result is a digital file ready to be translated into a language Santa speaks.
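To make the idea concrete, here is a minimal sketch of character recognition by template matching. It is a toy illustration, not how Santa's (or any production) OCR system works: the 3x3 bitmaps, the `TEMPLATES` table, and the function names are all assumptions made up for this example, and the final error-correction sweep is not shown.

```python
# Toy OCR: match tiny 3x3 bitmaps against known glyph templates.
# The templates and the 3x3 size are illustrative assumptions only.
TEMPLATES = {
    "T": ("###", ".#.", ".#."),
    "O": ("###", "#.#", "###"),
    "Y": ("#.#", ".#.", ".#."),
}

def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def recognize_glyph(bitmap):
    """Return the template character closest to the scanned bitmap."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(TEMPLATES[ch], bitmap))

def recognize_word(bitmaps):
    """Convert a sequence of glyph bitmaps to text and its ASCII codes."""
    text = "".join(recognize_glyph(b) for b in bitmaps)
    return text, [ord(ch) for ch in text]

# A "scanned" word with one smudged pixel in the first glyph.
scanned = [
    ("###", ".#.", "##."),  # noisy T
    ("###", "#.#", "###"),  # O
    ("#.#", ".#.", ".#."),  # Y
]
text, codes = recognize_word(scanned)
print(text, codes)  # TOY [84, 79, 89]
```

Even with a smudged pixel, the closest template still wins, which is the same basic idea that lets a trained model cope with messy handwriting.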

For OCR to be successful, it needs to be trained on data. The machine learning model behind it is trained on datasets that pair images of text with the text they contain. This helps ensure that the text on the letters is identified correctly. It takes a lot of high-quality data to train these models. We have our own handwriting recognition dataset and a suite of specialized tools for OCR that we can customize to your needs to help jumpstart your OCR-related projects.

Step 2: Translate with NLP

Now that all the letters are uploaded, it’s time to translate them into another language. To ensure that the letters are translated correctly, Santa uses a translation program that leverages natural language processing. Santa doesn’t use just any translation program because many produce only a literal, word-for-word translation, which can be wrong. Literal translations ignore context, including the fact that words with the same spelling can have different meanings or even different pronunciations.
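A small sketch shows why word-for-word translation goes wrong. This is not a real NLP model: the two lookup tables below are hypothetical stand-ins for what a trained system learns from data, and the phrase lookup is only a simple way to show context changing the result.

```python
# Hypothetical mini-dictionaries (English to Spanish); real systems learn
# these mappings from large amounts of translated text.
WORD_TABLE = {"a": "un", "baseball": "béisbol", "bat": "murciélago"}
PHRASE_TABLE = {("baseball", "bat"): "bate de béisbol"}

def literal_translate(sentence):
    """Translate word by word, ignoring context."""
    return " ".join(WORD_TABLE.get(w, w) for w in sentence.lower().split())

def phrase_translate(sentence):
    """Prefer known two-word phrases, then fall back to single words."""
    words = sentence.lower().split()
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in PHRASE_TABLE:
            out.append(PHRASE_TABLE[pair])
            i += 2
        else:
            out.append(WORD_TABLE.get(words[i], words[i]))
            i += 1
    return " ".join(out)

request = "a baseball bat"
print(literal_translate(request))  # un béisbol murciélago
print(phrase_translate(request))   # un bate de béisbol
```

The literal version asks Santa for the flying animal; the context-aware version asks for the sporting equipment the child actually wanted.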

Thanks to NLP, not only are Santa’s letters translated, but the right gifts for each recipient can now be entered into his master toy database. The first part of his gift-giving journey is done, and it’s up to the elves to create and source the presents to be delivered.

The data behind this is simple: words translated from one language to another are fed into a machine learning algorithm. It’s not just single words that are added, but sentences and paragraphs too. This way, the model learns from context, and literal translations are a thing of the past. If you’re in need of NLP, leverage our Appen Data Annotation Platform for accurate translations in one or several of the more than 235 languages we work in.

Step 3: Present Delivery Guidance Powered by Synthetic Data

It’s the big night and time for Santa to deliver gifts. According to Forbes, he has exactly 0.0003 seconds per household to deliver all the presents in time. With such a tight schedule, Santa can’t afford to get his directions wrong. He’ll be using a top-notch GPS program (in conjunction with his elite team of reindeer) to make sure no house is missed. Now, Santa happens to live in a part of the world not many have been to, and he also has to fly his sleigh to areas with little map data. With the help of synthetic data, Santa’s GPS has a truly complete map of the world.

Synthetic data is ideal for Santa because it’s artificially created rather than captured from real life. Originally, training data had to be collected for every possible scenario to accurately train AI models, in this case his GPS. If a scenario (for Santa, the mapping of part of the world) had not occurred or been captured, there was no data. Because synthetic data was used to complete Santa’s map of the sky for safe sleigh travel around the world, he’s able to deliver all the presents accurately and on time each season.
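As a rough sketch of the gap-filling idea, the snippet below adds artificially generated samples to any map cell that has no real data. Everything here is an assumption for illustration: the grid layout, the `fill_coverage_gaps` name, and the uniform random points, which stand in for the far richer simulated data a real synthetic data pipeline would produce.

```python
import random

def fill_coverage_gaps(real_samples, all_cells, per_cell=3, seed=0):
    """Add artificially generated samples for cells with no real data.

    real_samples maps a (row, col) grid cell to a list of (x, y) points.
    Synthetic points are drawn uniformly inside each uncovered cell.
    """
    rng = random.Random(seed)
    # Keep every real point and tag it so the two sources stay distinguishable.
    dataset = {cell: [(p, "real") for p in pts] for cell, pts in real_samples.items()}
    for row, col in all_cells:
        if real_samples.get((row, col)):
            continue  # real data already covers this cell
        dataset[(row, col)] = [
            ((col + rng.random(), row + rng.random()), "synthetic")
            for _ in range(per_cell)
        ]
    return dataset

# A 2x2 map where only one cell was ever surveyed.
cells = [(r, c) for r in range(2) for c in range(2)]
surveyed = {(0, 0): [(0.5, 0.5)]}
complete_map = fill_coverage_gaps(surveyed, cells)
for cell in cells:
    print(cell, [source for _, source in complete_map[cell]])
```

Tagging each point as real or synthetic keeps the two sources separate, so the real observations are never overwritten and the generated ones can be reviewed or replaced later.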

We partnered with Mindtech, the leader in synthetic data, this year. Our partnership allows us to provide data for those hard-to-come-by edge cases that occur in projects, create data that is free from PII, make inclusive datasets, and more!

AI and Data: It’s Not Just for Santa

With AI by his side, Santa is guaranteed to have a successful career as the world’s beloved gift-giver. Of course, Santa needs a break from his hard work the rest of the year.

If you need help with gift-giving and can’t wait for Santa to show up, you can read our article on Unwrapping the Intelligence Behind Smart Gifts to learn why data should be the gift you give. Have a tricky person to shop for? Check out the section on Uncommon Goods in AI Brings People Together.

This season, don’t forget to track Santa in real time via Google to follow along on his gift-giving journey. From all of us at Appen, Happy Holidays!

 
