Top 5 Predictions About the Future of AI and Data

For the past few years, overarching goals with AI have been aspirational. As this technology moves out of the realm of literature and academia and is applied to real-world problems and bent toward real-world solutions, there has to be a reckoning with its true capabilities and use cases. For AI, 2021 was a year of over-promising.

While 2021 may have been underwhelming against those aspirational promises, it was a time of foundation building for AI. It established a base structure that can be built upon and refined to make AI more responsible, efficient, and cost-effective. 2022 is the year to learn from past mistakes and build a better world of AI technology.

Below, you’ll find our top 5 predictions about the future of AI and why we think these changes are imperative to the overall success of AI technology. If you want even more predictions about the future of AI, you can download our 2021 State of AI and Machine Learning Report.

1. Responsible AI Goes From Aspiration to a Foundational Requirement

In 2021, the AI industry had an all-talk, no-walk problem. While you could read dozens of think pieces and thought leadership articles about responsible AI in 2021 (including our own World Economic Forum Agenda blog post), adoption of responsible AI principles was low. According to the Appen 2021 State of AI report, only 41% of technologists and 33% of business leaders expressed concern for AI ethics.

In 2022, the stakes get higher, and businesses will begin to recognize that responsible AI leads to better business outcomes. Business leaders will catch up to technologists in understanding the importance of responsible AI, and they'll begin to see how the up-front investment pays off for their business.

When responsible AI principles are properly implemented, they protect a business’s brand and ensure that the AI project works as expected. Entering 2022, we also have a well-established and thoroughly reviewed set of responsible AI principles. They include:

  • Unbiased data
  • Fair treatment of data collectors and labelers
  • The need for AI projects to promote social good and prevent social harm

As business leaders catch up to technologists in recognizing the importance of responsible AI, governments aren't far behind. Governments are beginning to recognize the potential harm that can come from irresponsible AI, and with that recognition will come regulation. Just as happened with data privacy, when private industry doesn't limit its harm to society, governments step in with regulations that force businesses into ethical and responsible AI.

Another bellwether for the implementation of responsible AI comes from Gartner, which projects that by 2023 all personnel hired for AI development will need to demonstrate expertise in responsible AI.

2. Data for the AI Lifecycle Becomes Critical for AI Programs

Recent statistics and trends show that AI programs are maturing and that AI is present nearly everywhere, powering business operations and shaping product development. According to the Appen 2021 State of AI report, AI budgets increased over the last year, a sign that business leaders recognize they must invest in AI to ensure success.

One of the key takeaways from 2021 is that businesses, even those with mature AI and data science teams, are struggling with data. What businesses are realizing is just how much data is needed for AI model development, training, and retraining. Because so much data is needed for a successful AI lifecycle, many businesses are choosing to partner with external training data providers to deploy and update AI projects at scale.

The fact that a majority of organizations are partnering with external data providers shows the challenge of continuous data sourcing, preparation, evaluation, and production. AI projects need more data, delivered faster, than ever before, and that can only be achieved through automation, especially around data sourcing and preparation.

This need for data will shift in 2022. Companies will still need just as much data, but a new discipline, data for the AI lifecycle, will emerge. It will focus on the tools and best practices that enable businesses to manage the entire AI lifecycle, from data acquisition through data versioning all the way to model retraining.
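To make managing data across the lifecycle a little more concrete, here is a minimal sketch of one small piece of it: recording a content-hashed manifest of a dataset snapshot so a retraining run can reference exactly the data it was built on. The directory layout, file names, and manifest format are illustrative assumptions, not a description of any particular tool; dedicated data-versioning platforms go much further.

```python
# Minimal sketch: version a dataset snapshot with content hashes so later
# retraining runs can verify they use exactly the same files.
import hashlib
import json
from pathlib import Path

def version_dataset(path: Path) -> dict:
    """Return a manifest entry: file name, size in bytes, and SHA-256 content hash."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"file": path.name, "bytes": path.stat().st_size, "sha256": digest}

# Hypothetical usage: snapshot every CSV in a data directory into a manifest file.
manifest = [version_dataset(p) for p in sorted(Path("data/").glob("*.csv"))]
Path("manifest.json").write_text(json.dumps(manifest, indent=2))
```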

3. Rise of Synthetic Data

As more and more data is needed to satisfy data-hungry AI programs and model retraining, the industry is going to see new ways for businesses to acquire data. While an external data partner is often the only way today to acquire data at the speed these companies need, another solution is on the horizon.

Generative AI can create synthetic data, which can be used to train AI models. While synthetic data accounts for only about 1% of the data on the market today, Gartner believes that generative AI will account for 10% of all data produced by 2025. Currently, generative AI is being used to address key challenges such as generating 3D worlds for AR/VR and producing training data for autonomous vehicles.

Gartner also forecasts that by 2024, the use of synthetic data will halve the volume of real data needed for machine learning. Synthetic data complements and accelerates data acquisition because it needs less processing, security handling, and labeling than real-world data, which is subject to responsible AI principles.

In 2022, you can expect many more businesses to experiment with synthetic data and train machine learning models on it. Generative AI models can learn the structure of existing data and generate new examples, which is cost-effective and improves efficiency for businesses. With these benefits, it's obvious why many businesses are excited about generative AI and synthetic data, and as more companies experiment with and implement them, we'll see new use cases developed over the next few years.
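As a toy illustration of the idea, here is a minimal sketch of one very simple form of synthetic data: fitting a generative model to a small real sample and drawing extra rows to augment a training set. The column semantics, sample sizes, and choice of a Gaussian mixture are illustrative assumptions; production generative pipelines (simulators for AR/VR scenes or driving scenarios, GAN-style models, and so on) are far more sophisticated.

```python
# Minimal sketch: fit a simple generative model to real rows, then sample
# synthetic rows to augment the training set.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(seed=42)
# Hypothetical real sample: 500 rows of two numeric features, e.g. [age, rating].
real_data = rng.normal(loc=[50.0, 3.5], scale=[10.0, 0.8], size=(500, 2))

# Fit a small Gaussian mixture as the "generator" and draw 2,000 synthetic rows.
generator = GaussianMixture(n_components=3, random_state=42).fit(real_data)
synthetic_data, _ = generator.sample(n_samples=2_000)

# Combine real and synthetic rows into one larger training set.
training_data = np.vstack([real_data, synthetic_data])
print(training_data.shape)  # (2500, 2): real rows plus synthetic augmentation
```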

4. Acceleration of Internal Efficiency Use Cases

Some great news for the industry: AI budgets are on the rise, according to the Appen 2021 State of AI report. 74% of respondents reported AI budgets of over $500k, and 67% of business leaders say their AI projects have “shown meaningful ROI.”

As budgets grow and the variety of use cases expands, it's not surprising that the most popular use case, cited by 62% of respondents, is supporting internal operations. The next most common use cases follow the same pattern of using AI to make internal operations more efficient:

  • 55% looking to improve their understanding of corporate data
  • 54% looking to improve productivity and efficiency of internal business processes

As companies shift toward using AI and machine learning models to improve internal efficiency in 2022, they'll face an important data challenge: they need to know how data moves through their organization and what happens to it along that journey. As companies make this realization, they'll need to make two moves:

  1. They will need to focus more attention on deploying platforms that let them eliminate data silos and centrally manage data.
  2. They will need to work internally or with partners to develop strategies for managing data throughout the entire AI lifecycle.

If your organization can take these two steps, your AI initiatives will be more effective and efficient.

5. Model Evaluation and Tuning Becomes Mainstream

A realization has begun to slowly spread through the AI technology community: building a machine learning model isn't a one-and-done exercise. The model needs regular evaluation, tuning, and retraining. In 2022, this awareness will become common knowledge.

Machine learning models are dynamic; they can't just be deployed and left to their own devices. Just like a car that needs its alignment adjusted regularly, machine learning models can develop drift over time, and that drift makes their results less and less accurate. Machine learning models must be reviewed and updated based on their ongoing results and on any changes to infrastructure, data sources, and business models.
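As a concrete illustration of what checking for drift can look like, here is a minimal sketch that compares the distribution of a single numeric feature in production traffic against its training-time reference using a two-sample Kolmogorov-Smirnov test. The feature values, the 0.05 significance threshold, and the alerting step are illustrative assumptions; real monitoring typically covers many features, label quality, and model performance as well.

```python
# Minimal sketch: flag feature drift by comparing live values against the
# training-time reference distribution for one feature.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, live: np.ndarray, alpha: float = 0.05) -> bool:
    """Return True if the live sample differs significantly from the reference."""
    statistic, p_value = ks_2samp(reference, live)
    return p_value < alpha

# Hypothetical usage: reference values come from the training set,
# live values come from recent production logs (here, simulated with a shifted mean).
rng = np.random.default_rng(seed=0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)
live = rng.normal(loc=0.4, scale=1.0, size=5_000)

if detect_drift(reference, live):
    print("Drift detected: schedule model evaluation and retraining.")
```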

According to our report, the knowledge that machine learning models must be regularly reviewed and updated took a huge leap in 2021. We found:

  • 87% of organizations update their models at least quarterly, up from 80% last year
  • 57% update their models at least monthly
  • 91% of large organizations update their models at least quarterly
  • Organizations that use external data providers are most likely to update their models at least monthly.

As more and more companies put machine learning models in place, they'll begin to realize that models can't just be left alone once launched. As they use their models, they'll implement protocols for monitoring drift and for regular tuning.

While the adoption of AI technology and machine learning models has become widespread, that’s just the first step. Now, it’s important for companies to lean on outside data partners and education sources to learn how to manage and improve their use of AI and machine learning.

As part of AI's growing up, we're seeing a shift from talking about responsible AI to actually implementing responsible AI programs. Within that shift, companies are recognizing the critical role of data. Recognizing how important data is to the success of AI projects leads to the use of outside data partners to acquire data across the entire lifecycle, and to the use of synthetic data that is more cost-effective and secure. Additionally, companies are realizing that one of the best ways they can use AI tools is to improve their own internal processes, and that those models can't just be left alone after deployment but need regular updating and tuning.

If you’re interested in more information about the state of AI and machine learning in the coming years, be sure to read our 2021 State of AI and Machine Learning Report which goes into detail about all of these changes and many more.
