The Original Tech Aunties
Welcome to the Tech Aunties podcast, where we're bringing you industry context and vision from myself, Angelia McFarland, and Gina Rosenthal. On each episode, we share our marketing and technology industry experiences along with our guests. Listen to us as we explain the past so you can have context to understand and create your own version of the future. So let's get into it.
Nice to see you again, Angelia, and nice to see you again, Tony.
Hi. Howdy, howdy.
Today we're going to pick your brain some more about the computing needs of AI, but also the impact on the environment of those computing needs. But what I want to do first is introduce you again, just so our audience knows who we're talking to. Our guest this episode is Tony Foster. He is a Senior Principal Technical Marketing Engineer at Dell Technologies. He's also an Adjunct Professor of Technology at Kansas State University — or as we like to call him, the Wonder Nerd. He also describes himself as the VDI, EUC, and GPU fanatic bringing deep learning, machine learning, AI, and HPC to the virtual world. Did I get it right again?
You got it right again. And again, it's a mouthful. Let's just stick with Wonder Nerd and not try to get all the other parts in there. Keep it high level.
I like it. So last time we had you on, we had you explain what AI is — because it's such a big topic. Some people are scared of it, and there are organizations where marketers are being pushed to call everything AI when it may not be. Hopefully we gave them some vocabulary and understanding. Today, what we wanted to dig into is the architecture needed to support one of these AI systems — whether it's machine learning, deep learning, or whatever — and also the carbon footprint of these things, the environmental impact, and other things we should be aware of. Are you up for that?
I'm up for that. Let's go.
Wait — I didn't ask Angelia. Angelia, are you up for that?
I'm up. I'm ready. Let's do it.
Sorry about that, Angelia. So let's level set with architecture. AI isn't some magical, futuristic thing that only a select few can understand. It's a technical domain, with smart technical people building the algorithms and equally smart people architecting the computers those algorithms run on. So what does that architecture actually look like?
Sure. The architecture for AI is not that far off from what you'd expect for any high-performance computing environment. You need very powerful processors — and specifically for AI, you need GPUs. Graphics Processing Units. These are very good at doing the kind of math that AI requires — lots and lots of parallel calculations happening simultaneously. The more GPUs you can throw at a problem, the faster you can train a model and the faster you can run inference — which is actually using the trained model. You also need significant amounts of memory — both system memory and GPU memory. The bigger the model, the more memory you need. And you need very fast, very large storage. You're dealing with massive data sets, and you need to be able to feed data to those GPUs fast enough to keep them busy. If you're starving your GPUs of data, you're wasting money.
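To make that concrete, here's a minimal sketch of the kind of parallel math Tony is describing, assuming PyTorch and an NVIDIA GPU (neither is named in the episode); the matrix multiply below is the core operation behind both training and inference.

```python
# A minimal sketch (assuming PyTorch is installed) of the parallel math
# AI workloads lean on: a large matrix multiply.
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

# On the CPU, the multiply runs across a handful of cores.
start = time.perf_counter()
c = a @ b
print(f"CPU matmul: {time.perf_counter() - start:.3f}s")

# On a GPU, thousands of cores attack the same calculation in parallel.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.to("cuda"), b.to("cuda")
    torch.cuda.synchronize()   # wait for the transfer to finish
    start = time.perf_counter()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()   # wait for the kernel to finish
    print(f"GPU matmul: {time.perf_counter() - start:.3f}s")
```

Feeding those tensors to the GPU fast enough is exactly the storage and throughput point above: idle GPU cores are wasted money.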
These are neural networks, programs written by a new type of developer who can do the really hard math problems that even I don't understand. But you can talk to one of these developers, these data scientists, and figure out what kind of architecture you need underneath, which is what Tony is describing.
And I'm not saying you need to go bleeding edge or have the latest and greatest to run your environment. But you definitely want to be running on newer equipment.
I would slightly disagree.
Oh, I'm a marketer. Let's hear this.
The reason I would disagree is because the technology moves so fast. Any time someone asks me what laptop to get, I tell them: if you have the money, get the best that's out there. The technology is moving so fast that if you buy something less powerful, the software will outpace the hardware within two years. And I think that's what an architect would say too.
That's kind of like where I was going too. You can't build these kinds of programs and you can't architect with throwaway machines. You've got to get what you need. And your architect needs to be talking to the data scientists to find out what's required — and that's not even that hard, because most neural networks have a map of what hardware you need to run them on.
Correct. And it's not difficult. What I'm saying is: you can run it on 20 systems today, or you can take an opportunity loss and wait for the next generation where maybe that shrinks down to 15 or 10 systems. What's that opportunity loss for you on your AI program? If you can run it on 20 systems today, go for it. If you can run it on 15 or 10 systems tomorrow and save on systems and operating costs — is it worth the wait? Those are the questions every organization has to answer.
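That opportunity-loss question lends itself to back-of-the-envelope math. In the sketch below, every cost figure is a hypothetical placeholder (only the 20-versus-15-system scenario comes from the conversation); plug in your own numbers.

```python
# Back-of-the-envelope sketch of the opportunity-loss question.
# All dollar figures are hypothetical placeholders, not real pricing.
server_cost = 50_000   # hypothetical cost per AI server (USD)
annual_opex = 8_000    # hypothetical power/cooling/support per server, per year

buy_now = 20 * (server_cost + annual_opex)     # 20 systems today
wait_a_gen = 15 * (server_cost + annual_opex)  # 15 systems next generation

savings = buy_now - wait_a_gen
print(f"Hardware plus first-year opex saved by waiting: ${savings:,}")
# The question the episode raises: is that saving worth more or less
# than what your AI program could have delivered while you waited?
```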
I think we also didn't really talk about networking. Tony, I don't think most of our audience needs to get into the nitty gritty of fast networking, but they do need a clear understanding of where they need throughput.
Absolutely. With AI, you need the throughput. That is one place you absolutely cannot skimp on your architecture. You can't go to a big box store and buy a switch and expect to run your AI on a one-gig network. You can't even really do it on a ten-gig network anymore. You've got to have a good solid network — because you're moving a lot of data around. If your AI spans more than one system, you're going to be pulling data in and out of your data lake and sending it to multiple systems, and those systems are going to be talking with each other. All of that traffic is large. Today's modern data centers are about 25-gig networks at the bottom end. You'll see 40, 80, 100-gig networks. And 100 gig is where a lot of AI runs right now. But 200 gig is fast approaching, and 400 gig is what the big supercomputer systems are now using — because that becomes the bottleneck. The systems are delivering answers in sub-seconds, and if they can't move data across the network fast enough, that's what increases your time to discovery.
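As a rough illustration of why those link speeds matter, consider moving a hypothetical 10 TB training set (80 terabits) across a single link, ignoring protocol overhead:

```latex
t = \frac{\text{data}}{\text{link speed}}:\qquad
\frac{80\,\text{Tb}}{10\,\text{Gb/s}} \approx 8000\,\text{s} \;(\approx 2.2\,\text{h}),\quad
\frac{80\,\text{Tb}}{100\,\text{Gb/s}} \approx 800\,\text{s} \;(\approx 13\,\text{min}),\quad
\frac{80\,\text{Tb}}{400\,\text{Gb/s}} = 200\,\text{s}
```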
Absolutely. Look at me coming through with the tech.
That was a really great description, Tony — it kind of included all the other components. This is a lot of work. And a lot of companies now have ESG — Environmental, Social, and Governance — requirements they have to worry about. So we're going to get into some of the social aspects in later episodes. But what about the carbon footprint? What kind of environmental impact are we talking about? And does it vary at different stages of a neural network's life cycle?
It does vary across life cycles. With AI, you typically have three areas. You have the development or training stage — and that's the biggest draw on power, resources, and everything. You're running this nonstop: training, tweaking, running again, training, tweaking, running again, until you get the results you want. That uses a lot of power, a lot of resources, a lot of data. And water for cooling, depending on how you're cooling your data center. Unless you're lucky enough to be somewhere cool year-round where you can just open the vents and let the heat out.
Water for cooling. Yes.
Then next you have the validation phase. That tends to be shorter — it can still be as intense as training, but not quite. You take a new set of data and just make sure it returns valid results from the trained model. You think you have the right model — now let's run new stuff through it and confirm it's returning what the training data says it should return. And then you have the actual usage phase — where everything's trained. You're not pumping new training data into it. You're just taking new requests, putting those in, and running them. That's the least intensive phase of the model's life cycle. And a trained model can actually run on something about the size of a deck of cards. I'm holding up a Jetson Nano right now — it's about the size of a deck of cards and it has a GPU on it. A lot of models, once they've been trained and optimized, can run on something that small. And they become very energy efficient at that point.
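Those three phases map directly onto a standard workflow. Here's a minimal sketch with a toy PyTorch model; real workloads differ mainly in scale.

```python
# Minimal sketch of the three life-cycle phases described above,
# using a toy PyTorch model.
import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # toy stand-in for a real network
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x_train, y_train = torch.randn(1000, 8), torch.randn(1000, 1)
x_val, y_val = torch.randn(200, 8), torch.randn(200, 1)

# 1. Training: the power-hungry phase -- run, tweak, run again.
for epoch in range(100):
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

# 2. Validation: shorter -- new data through the trained model,
#    no weight updates, just confirming it returns valid results.
with torch.no_grad():
    val_loss = loss_fn(model(x_val), y_val)
    print(f"validation loss: {val_loss.item():.4f}")

# 3. Usage (inference): the lightest phase -- a single forward pass,
#    small enough for a device like a Jetson Nano once optimized.
with torch.no_grad():
    prediction = model(torch.randn(1, 8))
```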
They have to be small like that because they put them in autonomous cars. I wanted to talk about the CO2 emission estimates for different trained models. Training GPT-3 is estimated to have produced 502 tons of CO2 emissions, compared with about 110 pounds of CO2 for one person traveling from New York to San Francisco. Is that the kind of impact you're seeing with training models in general?
It's similar — and I don't have exact numbers, because I'm not an emissions expert. Everything I say on this topic is very generalized. But yes, they can be significant. And they can also be lower, depending on how you do things. When you look at the carbon impact of training models, there's actually more to it than just how much power it took to power the servers. You also have to figure in cooling costs, the power used for the network switches, the power for storage. All of these things come into play. And so if you do liquid cooling in your data center — where all your servers have liquid cooling packs that plug into ports on the rack and go out to a giant water condenser — you're actually more efficient, because it is a direct heat transfer as opposed to your processors giving off heat into the air in the server room.
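A rough sketch of that accounting is below, using the industry's Power Usage Effectiveness (PUE) metric, which folds cooling, networking, and other overhead into the total; every input is a hypothetical placeholder, not a measurement from the episode.

```python
# Rough sketch of total-emissions accounting: server power alone isn't
# the whole story. PUE multiplies IT energy to cover cooling, network
# switches, and storage. All inputs are hypothetical placeholders.
gpu_hours = 100_000        # hypothetical total GPU-hours for a training run
watts_per_gpu = 400        # hypothetical average draw per GPU
pue = 1.6                  # typical air-cooled; liquid cooling can get nearer 1.1
grid_kg_co2_per_kwh = 0.4  # hypothetical grid carbon intensity

it_kwh = gpu_hours * watts_per_gpu / 1000
total_kwh = it_kwh * pue   # add cooling/network/storage overhead
tons_co2 = total_kwh * grid_kg_co2_per_kwh / 1000
print(f"~{tons_co2:,.1f} metric tons CO2")  # ~25.6 t with these inputs
```

The liquid-cooling point shows up directly in that math: drop the PUE input and the total emissions drop with it.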
Otherwise you're going to be wasting energy, and your processes will take longer to run. That's what developers and architects have always wanted: hardware performant enough that their applications work correctly. An architect will ask the developer what they need, what they expect to happen, how much processing and disk space. It's no different with AI; you have that same conversation with the data scientists.
I think these last couple of episodes with you have been really good: plain talk for people who need to work in this new space, where a lot of folks are getting acquainted with things that feel emerging even if they're not. Tony, one more time, tell everybody where they can follow you and find you, because you always post really interesting stuff on these topics.
You can find me on LinkedIn at linkedin.com/in/wondernerd. You can find me on Twitter at @wonder_nerd. You can find me at wondernerd.net — that's my website. Or if you're on the K-State campus, you can find me wandering around the computer labs.
That's so cool. All right, Angelia — this was great. Another one in the can.
Thank you, Tony.
You're more than welcome.
Thank you for joining us today on the Tech Aunties podcast. If you have a topic you would like us to cover, please connect with us on LinkedIn and Instagram. You can also find this episode and others at Tech Aunties dot com. Until next time, y'all be sweet.