China’s cheap, open AI model DeepSeek thrills scientists

DeepSeek website seen on an iPhone screen. Chinese firm DeepSeek debuted a version of its large language model last year. Credit: Koshiro K/Alamy

A Chinese-built large language model called DeepSeek-R1 is thrilling scientists as an affordable and open rival to ‘reasoning’ models such as OpenAI’s o1.

These models generate responses step-by-step, in a process analogous to human reasoning. This makes them more adept than earlier language models at solving scientific problems, and means they could be useful in research. Initial tests of R1, released on 20 January, show that its performance on certain tasks in chemistry, mathematics and coding is on a par with that of o1 — which wowed researchers when it was released by OpenAI in September.

“This is wild and totally unexpected,” Elvis Saravia, an artificial intelligence (AI) researcher and co-founder of the UK-based AI consulting firm DAIR.AI, wrote on X.

R1 stands out for another reason. DeepSeek, the start-up in Hangzhou that built the model, has released it as ‘open-weight’, meaning that researchers can study and build on the algorithm. Published under an MIT licence, the model can be freely reused but is not considered fully open source, because its training data have not been made available.

“The openness of DeepSeek is quite remarkable,” says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany. By comparison, o1 and other models built by OpenAI in San Francisco, California, including its latest effort, o3, are “essentially black boxes”, he says.


DeepSeek hasn’t released the full cost of training R1, but it is charging people using its interface around one-thirtieth of what o1 costs to run. The firm has also created mini ‘distilled’ versions of R1 to allow researchers with limited computing power to play with the model. An “experiment that cost more than £300 [US$370] with o1, cost less than $10 with R1,” says Krenn. “This is a dramatic difference which will certainly play a role in its future adoption.”

Challenge models

R1 is part of a boom in Chinese large language models (LLMs). Spun off from a hedge fund, DeepSeek emerged from relative obscurity last month when it released a chatbot called V3, which outperformed major rivals despite being built on a shoestring budget. Experts estimate that it cost around $6 million to rent the hardware needed to train the model, compared with upwards of $60 million for Meta’s Llama 3.1 405B, which used 11 times the computing resources.

Part of the buzz around DeepSeek is that it has succeeded in making R1 despite US export controls that limit Chinese firms’ access to the best computer chips designed for AI processing. “The fact that it comes out of China shows that being efficient with your resources matters more than compute scale alone,” says François Chollet, an AI researcher in Seattle, Washington.

DeepSeek’s progress suggests that “the perceived lead [that the] US once had has narrowed significantly”, Alvin Wang Graylin, a technology specialist in Bellevue, Washington, who works at the Taiwan-based immersive technology firm HTC, wrote on X. “The two countries need to pursue a collaborative approach to building advanced AI vs continuing on the current no-win arms-race approach.”

Chain of thought

LLMs train on billions of samples of text, snipping them into word-parts, called tokens, and learning patterns in the data. These associations allow the model to predict subsequent tokens in a sentence. But LLMs are prone to inventing facts, a phenomenon called hallucination, and often struggle to reason through problems.
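As a rough illustration of what "predicting subsequent tokens" means, here is a minimal sketch in Python. It is a toy, not how DeepSeek-R1 or any real LLM is built: it assumes whole words as tokens (real models use sub-word pieces), it learns only which token most often follows another in a three-sentence made-up corpus, and the function names `tokenize`, `train_bigram` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Toy tokenizer (assumption): lowercase and split on whitespace.
    # Real LLMs split text into sub-word tokens instead.
    return text.lower().split()


def train_bigram(corpus):
    # For every token, count which tokens follow it in the corpus.
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = tokenize(sentence)
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts, token):
    # Return the most frequently observed follower, or None if the
    # token never appeared with a follower in the training text.
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training text, for illustration only.
corpus = [
    "the model predicts the next token",
    "the model learns patterns in text",
    "the model predicts tokens one at a time",
]
counts = train_bigram(corpus)
print(predict_next(counts, "model"))
```

A model this simple can only echo patterns it has counted, which hints at why pattern-matching alone can produce fluent but unfounded text.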


