Supercomputer Trains GPT-4-Sized Model with Only 8% of Its GPUs, Achieving Impressive Results, US

The world’s most powerful supercomputer, Frontier, has achieved impressive results by training a language model with just over 8% of its GPUs. The supercomputer, located at Oak Ridge National Laboratory, used 3,072 of its AMD Instinct MI250X GPUs to train an AI system at the trillion-parameter scale. It also used 1,024 GPUs (about 2.7% of its total) to train a 175-billion-parameter model, the same size as OpenAI’s GPT-3.

To achieve these results, the researchers had to overcome several challenges. Each MI250X GPU had only 64GB of VRAM, far short of the at least 14TB of memory required to train the trillion-parameter model. The researchers therefore had to split the model across many GPUs. This required combining several forms of parallelism and tuning them to keep communication overhead low and the GPUs busy.
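
The 14TB figure works out to 14 bytes of model state per parameter for a trillion-parameter model. The sketch below takes that per-parameter cost as an assumption (the article does not break it down into weights, gradients, and optimizer state), uses decimal units for simplicity, and computes a lower bound on how many 64GB GPUs would be needed just to hold that state for each model size mentioned in the article.

```python
import math

# Assumption: ~14 bytes of model state per parameter, which is what the
# article's 14TB figure implies for a 1-trillion-parameter model.
# Activations and communication buffers are ignored here.
BYTES_PER_PARAM = 14
GPU_MEMORY_BYTES = 64 * 10**9  # 64GB per GPU (decimal units)


def model_state_bytes(num_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Memory for the model state at the assumed per-parameter cost."""
    return num_params * bytes_per_param


def min_gpus_to_hold(num_params: int) -> int:
    """Lower bound on GPUs needed if the model state is split evenly."""
    return math.ceil(model_state_bytes(num_params) / GPU_MEMORY_BYTES)


for name, params in [("22B", 22 * 10**9), ("175B", 175 * 10**9), ("1T", 10**12)]:
    tb = model_state_bytes(params) / 10**12
    print(f"{name}: {tb:.2f} TB of model state, at least {min_gpus_to_hold(params)} GPUs")
```

For the trillion-parameter model this gives at least 219 GPUs. The actual run used far more than this bound, which ignores activation memory and says nothing about how long training would take.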

Large language models are typically trained on specialized servers with far more GPUs. The researchers instead set out to test whether a supercomputer could do the job more efficiently. They combined tensor parallelism (splitting individual layers across GPUs), pipeline parallelism (assigning consecutive groups of layers to different GPUs), and data parallelism (running copies of the model on different batches of data) to optimize training and significantly reduce the time required.
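
When the three forms of parallelism are combined, every GPU gets a coordinate along each axis, and the product of the three degrees must equal the GPU count. The sketch below shows one possible layout, in which tensor-parallel ranks vary fastest so that each tensor-parallel group, which communicates most often, occupies consecutive ranks. The class name and the degrees in the example (8-way tensor, 16-way pipeline, 24-way data parallelism for 3,072 GPUs) are hypothetical and are not the configuration the researchers reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    tp: int  # tensor-parallel degree: splits each layer's matrices
    pp: int  # pipeline-parallel degree: splits the layer stack into stages
    dp: int  # data-parallel degree: model replicas fed different batches

    @property
    def world_size(self) -> int:
        return self.tp * self.pp * self.dp

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Map a global rank to (dp, pp, tp) coordinates.

        Tensor-parallel rank varies fastest and data-parallel rank slowest.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} out of range")
        tp_rank = rank % self.tp
        pp_rank = (rank // self.tp) % self.pp
        dp_rank = rank // (self.tp * self.pp)
        return dp_rank, pp_rank, tp_rank

    def tensor_group(self, rank: int) -> list[int]:
        """Ranks that jointly hold one pipeline stage's layers."""
        base = rank - rank % self.tp
        return list(range(base, base + self.tp))

    def data_group(self, rank: int) -> list[int]:
        """Ranks holding the same model shard in different replicas;
        these average their gradients with each other."""
        offset = rank % (self.tp * self.pp)
        return list(range(offset, self.world_size, self.tp * self.pp))


# Hypothetical degrees chosen only so that 8 * 16 * 24 = 3,072 GPUs.
layout = ParallelLayout(tp=8, pp=16, dp=24)
print(layout.world_size)              # 3072
print(layout.coords(1000))            # (7, 13, 0)
print(layout.tensor_group(1000))      # [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
print(len(layout.data_group(1000)))   # 24
```

Real frameworks choose their own rank ordering and degrees based on the network topology; the point here is only that the three degrees multiply to the total GPU count.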

The models reached a substantial fraction of the GPUs’ peak throughput: 38.38% (73.5 TFLOPS per GPU) for a 22-billion-parameter model, 36.14% (69.2 TFLOPS) for the 175-billion-parameter model, and 31.96% (61.2 TFLOPS) for the 1-trillion-parameter model. The researchers also reported 100% weak scaling efficiency and strong scaling efficiencies of 89.93% and 87.05% for the 175-billion- and 1-trillion-parameter models, respectively.
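
The percentages and TFLOPS figures are consistent with each other if each GPU’s peak is about 191.5 TFLOPS (73.5 / 0.3838 ≈ 191.5), which the sketch below takes as an assumption derived from the article’s own numbers. It also shows the usual definitions of strong scaling (fixed total workload) and weak scaling (workload grows with GPU count) efficiency. The step times in the scaling example are made up for illustration and are not the researchers’ measurements.

```python
# Assumption: per-GPU peak implied by the article's figures (73.5 / 0.3838).
PEAK_TFLOPS_PER_GPU = 191.5


def fraction_of_peak(achieved_tflops: float, peak: float = PEAK_TFLOPS_PER_GPU) -> float:
    """Achieved per-GPU throughput as a fraction of peak."""
    return achieved_tflops / peak


def strong_scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Fixed total workload: ideally, time falls in proportion to GPU count."""
    return (t_base * n_base) / (t_n * n)


def weak_scaling_efficiency(t_base: float, t_n: float) -> float:
    """Workload per GPU held fixed: ideally, time per step stays constant."""
    return t_base / t_n


for model, tflops in [("22B", 73.5), ("175B", 69.2), ("1T", 61.2)]:
    print(f"{model}: {fraction_of_peak(tflops):.2%} of peak")
# prints 38.38%, 36.14% and 31.96% of peak

# Illustrative (made-up) step times in seconds, not measured values.
print(f"{strong_scaling_efficiency(t_base=6.0, n_base=1024, t_n=2.5, n=3072):.2%}")  # 80.00%
print(f"{weak_scaling_efficiency(t_base=4.0, t_n=4.4):.2%}")  # 90.91%
```

The article does not say which GPU count the strong-scaling figures were measured against, so the baseline here is arbitrary; tripling the GPUs while cutting step time from 6.0 s to 2.5 s (rather than the ideal 2.0 s) gives 80% efficiency.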

While the researchers openly shared details of the computing resources and techniques they used, they did not say how long the training runs took.

By leveraging the power and architecture of the Frontier supercomputer, the scientists showcased the potential of achieving faster and more efficient training of large language models. This accomplishment challenges the conventional approach of training these models on specialized servers with a higher GPU count.