
RoundTower Blog

How to Start Your AI Journey? The Role a Parallel File System Plays in Scaling Up

The competitive stakes are high in the race for artificial intelligence (AI) innovations. If you haven’t already started your AI journey, beginning now is the key to keeping pace with the competition. Any journey to an important destination requires advance preparation, but ultimately you just have to begin. So how do you get going?

No matter what your AI initiative covers, whether autonomous driving, cures for disease, or vaccines for life-changing pandemics like COVID-19, making your AI setup effective and competitive requires a solid foundation of data science talent within your organization. It also requires three essential elements that work together in harmony to arm your data scientists with the tools they need for the quick wins they want.

What are these essential elements? 

(1) Compute accelerators, like NVIDIA® GPUs
(2) A fast network, think NVIDIA® Mellanox® or Arista
(3) A modern parallel file system to manage the data, like WekaIO™


These elements can be depicted as a triangle, with all sides fitting together and sitting firmly on your organization’s foundation of talent. (See Figure 1.)


Figure 1. Essential elements in a competitive AI setup

Where AI Meets HPC
Let’s step back from the triangle for a moment and look at the changing landscape. Historically, high-performance computing (HPC) and AI were two distinct markets, but the two are now converging. HPC had traditionally been called the “lunatic fringe,” reflecting the relatively small number of large organizations on the edges of enterprise computing that led crazy-big research projects and used enormous data clusters. These days, AI, machine learning (ML), and deep learning have become HPC in the enterprise. It’s speculated that by 2022 the commercial mainstream market will be in full production for AI/ML. Admittedly, many organizations are just in the budding phases of their initiatives, but most will be putting real resources and energy behind their AI efforts by 2022, and that’s just around the corner.

GPUs Are the Workhorses of Computing
To elaborate upon the first essential element in our triangle, let’s talk about what’s happening in HPC and AI enterprises now. The modern buyer journey is starting to involve investing in compute acceleration technologies, like GPUs, which makes perfect sense. AI needs a powerful compute infrastructure to explore, extract, and examine the data to gain deep insights and deliver breakthrough results, and GPUs are at the heart of modern supercomputing.


Figure 2. GPUs are the workhorses of graphics processing

As the quintessential workhorses and multitaskers, GPUs easily manage the most complex data sets in AI workloads. That’s where NVIDIA comes in with its GPUs and its revolutionary GPUDirect Storage® (GDS), an IO protocol developed by NVIDIA to accelerate IO between your storage and the GPU server nodes. GDS is a key feature of NVIDIA® Magnum IO, addressing potential storage IO bottlenecks for AI, ML, and HPC workloads. In fact, GDS can be seen as a force multiplier for complete IO acceleration. (See “NVIDIA GPUDirect Storage: Accelerating The Data Path to the GPU.”)

Bigger, Better, Faster, Stronger
Our second essential element is a fast network. Data centers carry heavy loads as they try to keep up with the growth that’s necessary to stay competitive in the world of AI. Everything is getting bigger: application size, data size, cluster size, compute size, and more. The networking portion is arguably the most difficult.


Figure 3. Fast network speed is key to AI success

Nevertheless, networking continues to improve, with high-speed, low-latency solutions replacing aging Fibre Channel and Ethernet links to speed data transfers across your network to your servers and storage systems. We now have 100 Gb/s and 200 Gb/s networking, and that’s where Mellanox lives. Plus, we hear more and more about long-range plans to create datacenter-scale computing architectures in which the network will become part of the computing fabric. WekaIO and its partners have built strong relationships to support each other in meeting the need for networking speed.
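
To put 100-gigabit and 200-gigabit links in perspective, here is a back-of-the-envelope sketch in Python. It assumes ideal line rate on a single link with no protocol overhead, so real transfers will be slower. The function name and the 10 TB dataset size are illustrative assumptions, not figures from any vendor.

```python
def transfer_seconds(dataset_bytes: float, link_gbits_per_sec: float) -> float:
    """Ideal time to move a dataset over one link, ignoring protocol overhead."""
    bytes_per_sec = link_gbits_per_sec * 1e9 / 8  # gigabits/s -> bytes/s
    return dataset_bytes / bytes_per_sec


DATASET_BYTES = 10e12  # hypothetical 10 TB training set (assumption)

for gbits in (100, 200):
    secs = transfer_seconds(DATASET_BYTES, gbits)
    print(f"{gbits} Gb/s link: {secs:.0f} s")
# 100 Gb/s link: 800 s
# 200 Gb/s link: 400 s
```

Doubling the link speed halves the best-case transfer time, but only if the storage behind the link can actually feed it at that rate.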

WekaFS™ for the Win
If you have great networking, and you have powerhouse compute acceleration with workhorse GPUs, you might think you’re set. Think again. A modern file system, our third essential element, is required to get the most out of the other two elements in our triangle. When companies put their GPU technology into production, often they haven’t considered the ability of their storage infrastructures to support their data-hungry beasts. GPUs sit idle because legacy storage infrastructures can't get the data to the application servers fast enough. Yes, organizations are spending billions of dollars on their IT infrastructures, but some are implementing outdated technologies, adding more of the same—that which they’ve bought for years—and with legacy storage file systems layered on top, there’s a bottleneck. (See “10 Reasons a Modern Parallel File System Can Solve Big Storage Problems.”)
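
The idle-GPU problem can be sketched with a very simple model: if storage delivers less data per second than the GPUs can consume, the GPUs are busy for only that fraction of the time. The Python below is a minimal illustration under that single assumption. It ignores caching, prefetching, and compute-bound phases, and every number in it is hypothetical rather than a measurement of any product.

```python
def gpu_utilization(storage_gb_per_sec: float, gpu_demand_gb_per_sec: float) -> float:
    """Fraction of time GPUs stay busy when storage throughput is the only limit."""
    if gpu_demand_gb_per_sec <= 0:
        raise ValueError("GPU demand must be positive")
    return min(1.0, storage_gb_per_sec / gpu_demand_gb_per_sec)


# Hypothetical server: 8 GPUs, each able to consume 2 GB/s of training data.
DEMAND_GB_PER_SEC = 8 * 2.0

for label, storage_gb_per_sec in (("slower storage", 4.0), ("faster storage", 20.0)):
    util = gpu_utilization(storage_gb_per_sec, DEMAND_GB_PER_SEC)
    print(f"{label}: GPUs busy {util:.0%} of the time")
# slower storage: GPUs busy 25% of the time
# faster storage: GPUs busy 100% of the time
```

In this toy model, the slower storage tier leaves the GPUs waiting on data three quarters of the time, which is the bottleneck described above.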


Figure 4. WekaFS breaks the bottleneck imposed by legacy storage

Admittedly, some legacy products are fine within standard swim lanes, but modern workloads require a high-performance, scalable parallel file system that solves today’s biggest storage problems and accelerates modern IO-intensive workloads. For Global 2000 customers that actively work in AI and ML at scale, WekaFS™ is the only file system to consider because it breaks the bottleneck imposed by legacy storage file systems. Weka touches the revenue-generating enterprise applications, providing first-to-market competitive advantages and moving our customers’ top line by reducing their time to market.

“GPU performance has continued to grow, data movement becomes increasingly important, and WekaIO has pioneered an impressive modern parallel file system that delivers important capabilities to accelerate AI and workloads at scale.”

– Jeff Herbst, vice president of business development at NVIDIA

Let’s face it. If AI is mainstream by 2022, no organization can afford to ignore the bottleneck of a legacy storage file system when it needs access to an extreme amount of data at a high rate of speed. WekaIO’s “world’s fastest file system” message resonates with customers because WekaFS delivers performance 10x to 50x faster than anything else on the market, so it will undoubtedly resonate with future AI enterprise markets as they go to production and as they scale. Moreover, as WekaFS matures to include enterprise data management services, the enterprise can “have its cake and eat it too.” Customers get their fast performance and their required features. It’s a win-win.

Effectively Arm Data Scientists for Success 
The triangle is a strong architectural element that has been used since ancient Greek times. Its simple yet powerful design provides a strong structure when built upon a solid foundation. For our discussion, it helps to illustrate any organization’s need to employ three essential elements when embarking on an AI journey and designing an architecture for success. If you are just beginning your AI journey, or if you are looking for a high-performance file system that effectively arms your data scientists with the tools they need for the quick wins they want, contact WekaIO at www.Weka.IO for a no-cost demo.

About WekaIO

Weka exists to solve the big storage problems for our customers, which we believe requires a totally new approach to manage the scale and performance of emerging applications. Organizations and enterprises that utilize legacy storage systems that are not optimized for today’s accelerated datacenters will be hindered in their ability to extract value from their data. We are committed to helping our customers maximize the investment in high-powered IT infrastructure, such as GPU and FPGA-enabled servers, so they can innovate faster and solve previously unsolvable problems.

Weka does all of this while providing best-in-universe throughput of 160 GB per second, as shown here: https://www.youtube.com/watch?v=GAZP1NcdWMo&t=5s
website: www.weka.io