Summary

A successful data strategy is the foundation for a successful AI strategy. If you don’t plan for the many underlying data requirements of AI, you’ll run into data problems, which will inevitably lead to AI problems.


What makes an AI strategy successful? A stockpile of GPUs? The world’s top data scientists? A bottomless budget? These are all good things, but no AI project will succeed without a successful data strategy. A recent Pure Storage study showed that 81% of organizations are unprepared for the data demands and energy requirements of AI, and that 73% of organizations that have adopted AI will need data management upgrades.

AI is proving disruptive to many enterprise IT infrastructures, starting with how data is managed and moved. Let’s take a look at what’s behind these challenges to a successful AI strategy, and why overcoming them now, while many AI projects are beginning to scale, will translate into success.

7 Truths for AI’s Underlying Data Requirements

An AI-friendly strategy should include curating data to prepare training data sets, making data accessible to the training and inference infrastructure, and integrating new AI tools and applications. All of this has to be done in a cost-effective and highly automated manner, with high levels of security, governance, availability, and portability.

If a successful AI strategy is really just a successful data strategy at heart, data problems will inevitably lead to AI problems. Here’s what you need to nail an enterprise data strategy that’s the right match for big AI plans.

1. There’s No Such Thing as Old or Cold Data

AI’s enormous appetite for data has put an end to the idea of old or “dormant” data. In theory, all of an organization’s data—even cold data—has the potential to yield insights or improve a model. That means the reams of data once relegated to repositories now have to be secure, mobile, and available on demand. If your organization’s data strategy hasn’t yet addressed these and other data requirements, the AI strategy will be on shaky ground.

2. Data Residency Is Critical—and Complex

While many AI processes run in the cloud, innovative enterprises are investing to deploy AI at scale on premises. But managing an end-to-end AI deployment at scale, and its data, is a complex undertaking.

The data curation stage requires managing silos of tens to hundreds of geographically distributed operational databases and unstructured data repositories, each with its own performance requirements and management challenges. The training, inference, and tracking steps further require storage systems to deliver performance, easy orchestration, and cost efficiency.

3. Prepare to Meet the Computational and Storage Challenges of AI Training

Training a model is compute-intensive and iterative. Requests for new data, new sources, new workflows, and new objectives are part of the process, and AI and infrastructure teams are still expected to deliver workflows into production quickly. Without high storage throughput, data access becomes the bottleneck: GPUs sit idle waiting for the data they need to perform the intricate calculations required to train deep neural networks.
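To make that bottleneck concrete, here is a minimal, framework-agnostic Python sketch of how a team might check whether a training loop is I/O-bound. The fetch_batch and train_step functions are hypothetical stand-ins for a real dataloader and training step, with sleeps simulating their latencies; only the timing bookkeeping carries over to a real system.

```python
import time

def fetch_batch():
    # Stand-in for reading a batch from storage; in practice this is a
    # dataloader pulling samples over the network or from local disk.
    time.sleep(0.02)  # simulate 20 ms of I/O
    return [0.0] * 1024

def train_step(batch):
    # Stand-in for the forward/backward pass on the accelerator.
    time.sleep(0.005)  # simulate 5 ms of compute
    return sum(batch)

io_time = compute_time = 0.0
for _ in range(100):
    t0 = time.perf_counter()
    batch = fetch_batch()
    t1 = time.perf_counter()
    train_step(batch)
    t2 = time.perf_counter()
    io_time += t1 - t0
    compute_time += t2 - t1

# A ratio well above 1 means the accelerators are starved: storage
# throughput, not compute, is limiting training speed.
print(f"I/O-to-compute ratio: {io_time / compute_time:.2f}")
```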

AI adoption requires an agile storage platform that supports evolving demands such as data parallelism, where the data set is distributed across different nodes and operations are performed in batches, and model parallelism, where the model itself is sharded across nodes and trained on the same data set in parallel. Both methods require a platform that can provide extremely high throughput at scale while distributing data load based on priority and resource efficiency.
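The two schemes are easy to conflate, so here is an illustrative sketch in plain Python, with lists standing in for tensors and worker processes, showing what gets sharded in each case:

```python
def split_batch(batch, n_workers):
    """Data parallelism: each worker gets a shard of the batch and runs
    the full model; gradients are averaged across workers afterwards."""
    size = len(batch) // n_workers
    return [batch[i * size:(i + 1) * size] for i in range(n_workers)]

def split_model(layers, n_workers):
    """Model parallelism: each worker holds a shard of the model's layers,
    and all workers see the same batch, passing activations along."""
    size = len(layers) // n_workers
    return [layers[i * size:(i + 1) * size] for i in range(n_workers)]

batch = list(range(8))                      # 8 samples
layers = ["embed", "attn", "mlp", "head"]   # 4 model layers

print(split_batch(batch, n_workers=4))   # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(split_model(layers, n_workers=2))  # [['embed', 'attn'], ['mlp', 'head']]
```

Either way, shards live on different nodes, which is why the storage platform must sustain many parallel, high-throughput streams rather than one fast one.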

This data platform should also employ best-practice data security and offer native integration with Kubernetes. It should allow data scientists and machine learning engineers to access storage, vector databases, and machine learning services in a self-service fashion, accelerating model training and deployment.

4. Inference Is Where Data Performance Counts

Delivering AI inference—the application of trained machine learning models to new, unseen data to derive meaningful predictions or decisions—needs to be done in milliseconds. The output of the inference process may be used by several applications, business services, and workflows, with thousands or even millions of users. The inference process also needs extremely fast I/O operations and high throughput.
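As a rough illustration of what “milliseconds” means in practice, the sketch below measures median and tail (p99) latency over a hypothetical infer function; a real deployment would time calls to the model server instead, but the bookkeeping is the same. Tail latency matters because, with millions of users, even the 99th percentile is hit constantly.

```python
import random
import statistics
import time

def infer(request):
    # Hypothetical stand-in for applying a trained model to one request;
    # the random sleep simulates 1-4 ms of model and I/O work.
    time.sleep(random.uniform(0.001, 0.004))
    return request * 2

latencies = []
for i in range(1000):
    t0 = time.perf_counter()
    infer(i)
    latencies.append((time.perf_counter() - t0) * 1000)  # milliseconds

latencies.sort()
p50 = statistics.median(latencies)
p99 = latencies[int(len(latencies) * 0.99)]
print(f"p50: {p50:.1f} ms, p99: {p99:.1f} ms")
```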

While training data can be distributed geographically, AI inference data can be generated from edge or remote locations in real time. During the inference step, both the data source and data type can become complex. For example, enterprises may have to manage real-time camera data from videos or images, manual processes, and workflows. They may have a GPU cluster in one of their data centers, but the data source might be remote. To handle such scenarios, enterprises need not only smart orchestration and automated workflows but also the ability to move data efficiently.

Related Reading: “The Role of Data Storage in Accelerating Time to Insights” by Bernard Marr

5. Make Room for AI and Data Growth

Most generative AI projects start with a few GPUs and the required storage. As the adoption of AI grows and data volumes expand, the infrastructure needs to scale through the addition of more GPUs and storage. Data scientists are enhancing large language models with custom, proprietary data using retrieval-augmented generation (RAG), which lets organizations accelerate GenAI use cases whose outputs are more current and domain-specific to their needs. The challenge: RAG can expand data storage needs by as much as 10x.
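The exact multiplier depends on chunking strategy, embedding width, replication, and index type. As a back-of-the-envelope illustration, here is a sketch of how embeddings and index structures inflate the footprint of raw text; every figure below is an assumption for illustration, not a property of any particular RAG stack, and overlapping chunks, multiple indexes, and replicas push the multiplier higher still.

```python
# Illustrative assumptions (not properties of any specific RAG stack):
chunk_chars = 2000        # ~500 tokens of raw text per chunk
embedding_dims = 1536     # a common dense-embedding width
bytes_per_float = 4       # float32 precision
index_overhead = 1.5      # assumed multiplier for ANN index structures

raw_bytes = chunk_chars   # ~1 byte per character of source text
vector_bytes = embedding_dims * bytes_per_float

total = (raw_bytes + vector_bytes) * index_overhead
print(f"raw chunk: {raw_bytes} B")
print(f"with embedding + index: {total:.0f} B "
      f"({total / raw_bytes:.1f}x the raw text)")
```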

At scale, the data footprint grows, data sources multiply, and data becomes distributed. This growth and sprawl forces the integration of multiple systems, where resources can be underutilized, workflows can be manual, and security exposures can increase. Tuning and upgrading the storage every time a change is made to the overall environment is a long and painful exercise, and maintaining uptime across multiple, disparate systems is a challenge of its own.

A well-designed, efficient, and end-to-end AI infrastructure should offer predictable performance, easy management, reliability, and lower power and space consumption.

6. Provision for the Continuous AI Evolution

Enterprises want their AI infrastructure investments to last for years. Driven by the growing number of new AI models, more powerful GPUs, new tools and frameworks, and growth in data, requirements for an AI stack continue to evolve. Organizations need to future-proof AI investments with a data storage platform that can scale performance and capacity on demand, in right-size increments, without downtime and disruptions.

7. Give AI Space to Accelerate

Understand the nuances of storing data for AI compared with data stored for other purposes. Many of the same value propositions for storage in IT apply, but a few areas deserve special focus.

For example, the ability to scale performance and capacity independently as the AI environment grows is valuable for IT, so engineers don’t have to change or replace systems and infrastructure. Scaling should be non-disruptive, seamlessly increasing capacity and performance as data loads grow. AI engineers and data scientists benefit from being able to accelerate model training and inference without interruption, shortening the turnaround times for AI workflows and results. Reconfigurations and data moves are disruptive, consuming staff time and jeopardizing innovation schedules.

Pure Storage: The Last AI Storage Platform You’ll Ever Need

Data-driven insights are being democratized by AI, large language models, and generative AI. To leverage this in a sustainable way, data must be democratized, too. Organizations serious about AI strategies need an end-to-end AI platform that delivers from ingest to inference and beyond.

Tuning and upgrading the storage every time a change is made to the overall environment doesn’t have to be a long and painful exercise. With Pure Storage, organizations can add GPUs to their compute farm with confidence, knowing that they can upgrade their storage non-disruptively, without tuning or tweaking to meet different AI workload profiles.

Pure Storage® Evergreen//One™ subscription services provide enterprises with choices for consuming and deploying storage, even delivering as-a-service experiences for traditional CAPEX storage. Evergreen subscription offerings span on-premises, hybrid, and storage-as-a-service (STaaS) models, so organizations can upgrade non-disruptively for 10+ years.

Download the Analyst Report: “Setting Direction for Enterprise AI Infrastructure”

For customers, the value of a data storage platform goes beyond the ability to store and retrieve data; these broader criteria belong in any evaluation. For AI, many of the same value propositions for storage in IT apply, but the areas above deserve additional focus.