During testing, a recently released large language model (LLM) appeared to recognize that it was being evaluated and commented on the relevance of the information it was processing. This led to speculation that the response could be an example of metacognition, an understanding of one's own thought processes. While the episode sparked conversation about AI's potential for self-awareness, the real story lies in the model's sheer scale, an example of the new capabilities that emerge as LLMs become larger.
As models grow, so do their emergent abilities and their costs, which are now reaching astronomical figures. Just as the semiconductor industry has consolidated around a handful of companies able to afford the latest multi-billion-dollar chip fabrication plants, the AI field may soon be dominated by only the largest tech giants, and their partners, able to foot the bill for developing the latest foundation LLMs like GPT-4 and Claude 3.
The cost to train these latest models, which have matched and, in some cases, surpassed human-level performance, is skyrocketing. Training costs for the most recent models approach $200 million, threatening to transform the industry landscape.
If this exponential performance growth continues, AI capabilities will advance rapidly, but so will the cost of achieving them. Anthropic is among the leaders in building language models and chatbots. At least as far as benchmark results show, its flagship Claude 3 is arguably the current leader in performance. Like GPT-4, it is considered a foundation model, pre-trained on a diverse and extensive range of data to develop a broad understanding of language, concepts and patterns.
Company co-founder and CEO Dario Amodei recently discussed the cost of training these models, putting the cost of training Claude 3 at around $100 million. He added that the models in training now, to be introduced later in 2024 or early 2025, are “closer in cost to a billion dollars.”
To understand these rising costs, we need to look at the ever-increasing complexity of the models. Each new generation has more parameters, enabling more sophisticated understanding and query execution, and requires more training data and greater amounts of computing resources. Amodei believes that by 2025 or 2026, the cost to train the latest models will reach $5 billion to $10 billion, which would put building these foundation LLMs out of reach for all but the largest companies and their partners.
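To see how such figures arise, consider a rough back-of-envelope sketch (not from the article; every number below is an illustrative assumption) using the common approximation that training requires roughly 6 × parameters × training tokens floating-point operations:

```python
# Rough back-of-envelope estimate of training cost, for illustration only.
# It uses the common ~6 * N * D approximation for training FLOPs
# (N = parameter count, D = training tokens). The GPU throughput and hourly
# price below are assumptions, not figures reported for any real model.

def estimated_training_cost(params: float, tokens: float,
                            flops_per_gpu_second: float = 3e14,  # assumed effective throughput per GPU
                            cost_per_gpu_hour: float = 2.0) -> float:
    """Return a rough dollar estimate for a single training run."""
    total_flops = 6 * params * tokens                      # approximate training compute
    gpu_hours = total_flops / flops_per_gpu_second / 3600  # compute time in GPU-hours
    return gpu_hours * cost_per_gpu_hour

# Hypothetical example: a 1-trillion-parameter model trained on 10 trillion tokens.
print(f"${estimated_training_cost(1e12, 1e13):,.0f}")  # roughly $111 million with these assumptions
```

Under these assumptions a single run already lands in the low hundreds of millions of dollars, and because the compute grows multiplicatively with parameter count and dataset size, each new generation pushes the bill higher.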
AI is following the semiconductor industry
In this way, the AI industry is following a similar path to the semiconductor industry. In the latter part of the 20th century, most semiconductor companies designed and built their own chips. As the industry followed Moore’s Law — the concept that described the exponential rate of chip performance improvement — the costs for each new generation of equipment and fabrication plants to produce the semiconductors grew commensurately.
As a result, many companies eventually chose to outsource the manufacturing of their products. AMD is a good example. The company had manufactured its own leading semiconductors but decided in 2008 to spin off its fabrication plants, also known as fabs, to reduce costs.
Because of the capital costs involved, only three semiconductor companies today are building state-of-the-art fabs using the latest process node technologies: TSMC, Intel and Samsung. TSMC recently said it would cost about $20 billion to build a new fab to produce state-of-the-art semiconductors. Many companies, including Apple, Nvidia, Qualcomm and AMD, outsource their product manufacturing to these fabs.
Implications for AI — LLMs and SLMs
The impact of these increased costs varies across the AI landscape, as not every application requires the latest and most powerful LLM. That is true for semiconductors too. For example, in a computer the central processing unit (CPU) is often made using the latest high-end semiconductor technology. However, it is surrounded by other chips for memory or networking that run at slower speeds, meaning that they do not need to be built using the fastest or most powerful technology.
The AI analogy here is the many smaller LLM alternatives that have appeared, such as Mistral and Llama 3, which offer a few billion parameters instead of the more than a trillion thought to be in GPT-4. Microsoft recently released its own small language model (SLM), Phi-3. As reported by The Verge, it contains 3.8 billion parameters and was trained on a dataset smaller than those used for LLMs like GPT-4.
The smaller size and training dataset help contain costs, even though these models may not offer the same level of performance as their larger counterparts. In this way, SLMs are much like the chips in a computer that support the CPU.
Nevertheless, smaller models may be right for certain applications, especially those that do not need complete knowledge across multiple data domains. For example, an SLM can be fine-tuned on company-specific data and jargon to provide accurate and personalized responses to customer queries. Or, one could be trained on data from a specific industry or market segment and used to generate comprehensive, tailored research reports and answers to queries.
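As a minimal sketch of how such fine-tuning might look in practice, the snippet below adapts a small open model to a hypothetical file of company support conversations using Hugging Face's transformers library. The model name, data file and hyperparameters are illustrative assumptions rather than a prescribed recipe, and in practice parameter-efficient methods such as LoRA are often used to keep memory and cost down.

```python
# A minimal fine-tuning sketch for a small language model, for illustration only.
# The model name, data file and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "microsoft/Phi-3-mini-4k-instruct"  # assumed small open model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL file of {"text": ...} records containing company-specific
# terminology and past support conversations.
data = load_dataset("json", data_files="company_support_qa.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = data.map(tokenize, batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="phi3-support-tuned",
        per_device_train_batch_size=2,
        num_train_epochs=1,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    # Causal language modeling: predict the next token, no masking.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```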
As Rowan Curran, a senior AI analyst at Forrester Research, said recently about the different language model options, “You don’t need a sportscar all the time. Sometimes you need a minivan or a pickup truck. It is not going to be one broad class of models that everyone is using for all use cases.”
Having fewer players adds risk
Just as rising costs have historically restricted the number of companies capable of building high-end semiconductors, similar economic pressures now shape the landscape of large language model development. These escalating costs threaten to limit AI innovation to a few dominant players, potentially stifling broader creative solutions and reducing diversity in the field. High entry barriers could prevent startups and smaller firms from contributing to AI development, thereby narrowing the range of ideas and applications.
To counterbalance this trend, the industry must support smaller, specialized language models that, like essential components in a broader system, provide critical and efficient capabilities for niche applications. Promoting open-source projects and collaborative efforts is crucial to democratizing AI development, enabling a more extensive range of participants to influence this evolving technology. By fostering an inclusive environment now, we can help ensure that the future of AI is characterized by broad access and equitable opportunities for innovation, maximizing benefits across global communities.
Gary Grossman is EVP of technology practice at Edelman and global lead of the Edelman AI Center of Excellence.