
Understanding and Implementing AI Workflows for Optimal Training and Deployment

Author: Brian Letort, Head of Data Office and Platform AI, Digital Realty

Artificial intelligence (AI) has the potential to revolutionize business operations and change the trajectory of innovation across industries, and across the globe.

Over 75% of the total annual impact of generative AI will occur specifically in sales, marketing, product R&D, software engineering, and customer operations. But these opportunities can only be realized if AI is thoughtfully and strategically executed.

Implementing AI is a structured process, one that requires CTOs, CDOs and CIOs alike to understand new technology requirements, rethink their current infrastructure strategy, and evaluate legacy systems. For AI to be truly impactful, IT leaders must build a foundation that meets the development and deployment needs of today’s and tomorrow’s AI models.

To do so, IT leaders need to strategically consider the stages of AI workflows. These workflows involve three distinct ways of working with data: data aggregation, training, and inference. IT leaders should consider how each stage works and which foundational elements it needs to run smoothly.

Three stages of AI workflows and how they impact IT infrastructure

Stage 1: Data aggregation

AI is only as good as the data it's trained on. Data aggregation begins with a comprehensive approach to gathering, cleansing, organizing, and storing data so that it can be used to train and inform an enterprise AI application.

In order for this pre-processing stage to work as intended, IT leaders should follow these best practices:

  1. Identify data sources: Large organisations produce and collect a lot of data: customer and employee data, operational data, financial data, network logs, intranet searches, and IoT sensor readings. For AI to answer questions about an organisation accurately, it needs a comprehensive picture of that organisation’s environment. The first step is locating where those data sources are.
  2. Gather data into a central system: Next, IT leaders need a system that gathers data from its disparate sources, databases, and endpoints into a centralised hub or repository. This could mean adopting an enterprise resource planning (ERP) system or a data lake that is connected to the various sources through direct database connections, application programming interfaces (APIs), or other means.
  3. Clean and organise the data: Even after the data has been gathered into a centralised spot, it may not be uniformly formatted. It can arrive as structured (specific and uniform, as from a database), semi-structured (organised, but without a fixed schema, such as JSON or XML), or unstructured (free-form content such as documents, images, or audio). IT leaders should implement data hygiene practices that include putting data into standardised formats and naming conventions, scanning for errors, and eliminating outdated or duplicated data. This step can be both time-consuming and costly.
  4. Establish data governance: Finally, establish a repeatable process that governs data management, including how the data will be protected and who will have access to it. This ensures that data quality stays consistent over time. Given the size of the data collected, it's best practice to process data at the edge, close to where it is generated, to mitigate the impacts of Data Gravity (the tendency of large datasets to become difficult and costly to move).
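
The cleaning step above can be illustrated with a minimal Python sketch. It is not a prescribed implementation: the record layout, field names, and accepted date formats are all hypothetical, chosen only to show standardised naming, error scanning, and de-duplication in one pass.

```python
from datetime import datetime

# Hypothetical raw records from two source systems with inconsistent conventions.
RAW_RECORDS = [
    {"Customer ID": " 001 ", "Email": "Ana@Example.com", "Signup": "2023-01-15"},
    {"customer_id": "001", "email": "ana@example.com ", "signup": "15/01/2023"},
    {"customer_id": "002", "email": "li@example.com", "signup": "2023-02-01"},
    {"customer_id": "003", "email": "", "signup": "not a date"},
]

# Assumed input date formats; a real pipeline would list its own.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_key(key):
    """Apply one naming convention to field names: lowercase snake_case."""
    return key.strip().lower().replace(" ", "_")


def parse_date(value):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Standardise fields, set aside invalid rows, and drop duplicates."""
    seen = set()
    cleaned, rejected = [], []
    for raw in records:
        row = {normalise_key(k): v.strip() for k, v in raw.items()}
        row["email"] = row["email"].lower()
        row["signup"] = parse_date(row["signup"])
        # Scan for errors: a missing email or an unparseable date rejects the row.
        if not row["email"] or row["signup"] is None:
            rejected.append(row)
            continue
        # Eliminate duplicates on the standardised identity fields.
        identity = (row["customer_id"], row["email"])
        if identity in seen:
            continue
        seen.add(identity)
        cleaned.append(row)
    return cleaned, rejected


cleaned, rejected = clean(RAW_RECORDS)
print(len(cleaned), "clean rows;", len(rejected), "rejected for review")
```

Rejected rows are kept rather than silently discarded, so they can be reviewed under whatever governance process the organisation establishes.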

Stage 2: Training

Though enterprises will most likely use foundational models that they augment with their own data (rather than creating and training their own models), it is important to understand this stage in AI development.

Organised and clean datasets are a prerequisite for training an enterprise AI model. Typically, an enterprise will use a foundational model, like OpenAI's GPT-4 model, and then train or "prompt-tune" it further with internal data so that the model's knowledge is both broad and domain-specific. This training workflow can look like the following:

  1. Choose the right model: Not every AI model performs the same or will be suited to an organisation’s specific data. First, IT leaders should outline the use case for AI adoption: Data analysis? Customer service? Operational efficiency? This will guide them in choosing the right AI architecture, algorithm, and parameters for their business needs.
  2. Start with a task: Next, teach the new AI model to perform a specific task using a training dataset from the data aggregation stage. This will allow the AI model to start to recognise patterns and relationships in the data. As it begins to learn, those training the AI model can begin tuning its parameters to make it more precise and effective.
  3. Test and evaluate: Continue to test the model to strengthen and deepen its knowledge. Expand testing to additional datasets and see how it adapts, tuning as needed. Evaluate its performance throughout the testing process to ensure that it performs as expected and meets targets for accuracy, recall, precision, and other relevant evaluation metrics.
  4. Validate the performance: Finally, assess the model's performance on a new validation dataset that it has not seen before to see how well it has learned. Use the results to make final parameter adjustments and prepare the AI model for enterprise use.
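
To make the evaluation metrics named above concrete, here is a minimal sketch that computes accuracy, precision, and recall for a binary classifier and compares them with target thresholds. The labels, predictions, and threshold values are hypothetical; a real project would set its own targets and would likely use an established library instead of hand-written functions.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Precision: of everything predicted positive, how much was right.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of everything actually positive, how much was found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def failing_metrics(metrics, thresholds):
    """Return the names of metrics that fall below their minimum target."""
    return [name for name, minimum in thresholds.items() if metrics[name] < minimum]


# Hypothetical validation labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical targets; choose values that reflect the business use case.
thresholds = {"accuracy": 0.70, "precision": 0.70, "recall": 0.80}

metrics = evaluate(y_true, y_pred)
print(metrics)
print("Below target:", failing_metrics(metrics, thresholds))
```

In this made-up example all three metrics come out at 0.75, so only recall misses its target, which would point to further tuning before the model moves on to deployment.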

Stage 3: Inference

Now is the time to put AI to work! At this stage, the AI model is ready to be launched across applications and use cases, where it will use its inference capabilities to make predictions or decisions on new sets of data.

However, don't simply deploy AI and let it work unchecked. Establish a process for continuous validation to ensure that the AI model stays accurate and delivers actionable results, and in particular to guard against “hallucinations” (plausible-sounding but incorrect outputs) that could have a business impact. This practice of monitoring a model in production is commonly known as model observability.

For example, Harvard Business Review details how Morgan Stanley has “a set of 400 ‘golden questions’” with known answers, which it uses as an ongoing AI accuracy test. If there’s any change to how the AI answers those golden questions, the team knows it needs to re-tune the model.
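
A golden-question check of this kind can be sketched in a few lines of Python. This is an illustration only, not a description of how Morgan Stanley implements its test: the questions, answers, and the `stub_model` function are hypothetical, and the exact-match comparison is a simplifying assumption (free-text answers would usually need a more tolerant comparison or human review).

```python
def normalise(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def check_golden_questions(ask_model, golden_set):
    """Return (question, expected, actual) for every answer that has changed."""
    failures = []
    for question, expected in golden_set.items():
        actual = ask_model(question)
        if normalise(actual) != normalise(expected):
            failures.append((question, expected, actual))
    return failures


# Hypothetical golden set: questions whose correct answers are already known.
GOLDEN_SET = {
    "What is the standard support response time?": "4 business hours",
    "Which region hosts the primary data lake?": "EU West",
}


def stub_model(question):
    """Stand-in for a call to the deployed model (hypothetical)."""
    canned = {
        "What is the standard support response time?": "4 Business Hours",
        "Which region hosts the primary data lake?": "US East",
    }
    return canned.get(question, "")


failures = check_golden_questions(stub_model, GOLDEN_SET)
for question, expected, actual in failures:
    print(f"DRIFT: {question!r} expected {expected!r}, got {actual!r}")
if failures:
    print(f"{len(failures)} of {len(GOLDEN_SET)} golden answers changed; review the model.")
```

Running a check like this on a schedule, and after every model or data update, turns continuous validation into a repeatable test instead of an occasional manual review.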

Navigating AI workflow challenges

Ideally, the process of implementing enterprise AI is a smooth one, from data strategy to training to rollout. However, IT leaders should be aware of the following challenges as they do so.

  • Poor data quality: AI is only as good as the data it's trained on, and data that's unstructured or unformatted, or even a lack of data from across the organisation, won't allow the AI model to learn or function as needed. According to IBM’s “Global AI Adoption Index 2023,” “too much data complexity” is the second-largest barrier to successful AI adoption. This is why it's important to put a data management plan in place first, as outlined above.
  • Lack of hardware: Running AI quickly and efficiently requires computational hardware that can meet its demands, and a lack of it will hurt performance and speed. High-performance computing (HPC) hardware can support enterprise AI needs, particularly in the latency-sensitive inference stage, by powering simulation and modelling environments through parallel processing. It also takes up a fraction of the footprint of legacy hardware.
  • Weak infrastructure: In addition to hardware, AI needs the right infrastructure to support it. This includes sufficient power density, which for AI can be five to ten times higher than that of other systems running on legacy infrastructure. With this increase in compute density, AI workloads also require specialised cooling in the data centre.
  • Soaring costs: AI investments can be costly to an organisation, both in development and in the underlying infrastructure and hardware. One way to make AI more cost-effective is to ensure that AI deployments and projects align with business goals and that AI use cases will benefit the organisation's growth and impact. Proofs of concept (POCs) are often used to demonstrate quantifiable value before embarking on a project.
  • Complex IT operations: AI workflows introduce many new IT operational considerations, including a strategic approach to data management and interconnectivity, time commitment for training, and commensurate hardware and infrastructure. They also call for a strategic approach to team management, ensuring that there’s expertise on staff to manage new AI roll-outs. This matters all the more because, according to the IBM report cited above, “limited AI skills, expertise, or knowledge” is the biggest barrier to successful AI adoption.
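
To put the power density point above in concrete terms, the short sketch below applies the five-to-ten-times multiplier to a baseline rack. The 8 kW baseline and the 20-rack count are hypothetical figures chosen purely for illustration, not measurements from any facility.

```python
def ai_rack_power_range(baseline_kw_per_rack, low_multiplier=5, high_multiplier=10):
    """Return the (low, high) per-rack power estimate in kW for AI workloads."""
    return (baseline_kw_per_rack * low_multiplier,
            baseline_kw_per_rack * high_multiplier)


# Hypothetical inputs: a legacy rack drawing 8 kW, and a 20-rack AI deployment.
baseline_kw = 8
racks = 20

low_kw, high_kw = ai_rack_power_range(baseline_kw)
print(f"Per rack: {low_kw} to {high_kw} kW")
print(f"Across {racks} racks: {low_kw * racks} to {high_kw * racks} kW")
```

Even with these assumed numbers, the estimate shows why power delivery and cooling have to be planned together with the hardware rather than after it.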

Building the foundation of enterprise AI

There’s enormous opportunity for enterprise AI today, but it can only be realised if AI is successfully implemented. It starts with building a strategic data aggregation program to collect, process, and store data efficiently. Then, use that data to train the AI model so it becomes familiar with domain-specific knowledge, patterns, and statistical relationships. Finally, ensure there’s a plan for evaluating performance and tracking metrics once the model is deployed and in real-world use.

These three stages of AI workflows provide the foundation upon which enterprises will build innovation.

Looking to learn how to successfully implement AI to bring your organisation into the future? Download “AI for IT Leaders: Deploying a Future-Proof IT Infrastructure” today.
