Thursday, November 13, 2025

China's DeepSeek unveils new AI model with unique features


Although there are still questions regarding the architecture’s effectiveness and security, the most recent experimental model from Chinese startup DeepSeek promises to increase productivity and improve AI’s ability to process massive amounts of data at a fraction of the cost.

Last year, DeepSeek caused a stir in Silicon Valley when it released its first model, R1, without fanfare, proving that large language models (LLMs) could be trained quickly, cheaply, and on less sophisticated hardware.

On Monday, the company released DeepSeek-V3.2-Exp, an experimental version of its current model, DeepSeek-V3.1-Terminus, that advances its goal of making AI systems more efficient, according to a post on the AI platform Hugging Face.

Adina Yakefu, the Chinese community lead at Hugging Face, said that "DeepSeek V3.2 keeps the emphasis on cost-cutting, efficiency, and open-source sharing." Thanks to a new feature called DSA (DeepSeek Sparse Attention), the AI can handle longer documents and conversations more effectively. It also cuts the AI's operating costs in half compared with the previous version.

The Futurum Group's vice president and practice lead for AI, Nick Patience, said: "It's important because it should make the model faster and more cost-effective to use without a noticeable drop in performance. By making powerful AI more accessible to developers, researchers, and smaller businesses, this could lead to a wave of new and creative applications."

The advantages and disadvantages of paying less attention

An AI model bases its decisions on both its training data and new information, such as a prompt. Suppose an airline is trying to find the best route from point A to point B: many routes exist, but not all of them are feasible, and eliminating the less practical ones can yield significant savings in time, fuel, and ultimately cost. Whereas earlier models examined every piece of data available to them, sparse attention considers only the data the model judges important for the task at hand.
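To make the mechanism concrete, below is a minimal, illustrative sketch of one common form of sparse attention, in which each query attends only to its top-k highest-scoring tokens instead of the whole context. The function name, the top-k selection rule, and all parameters are assumptions for illustration; DeepSeek has not published DSA in this form, and its actual selection mechanism is more sophisticated.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def topk_sparse_attention(q, K, V, k=4):
    """Toy sparse attention: score every token, keep only the k
    highest-scoring ones, and ignore the rest. A hypothetical
    stand-in for the 'cut out what looks unimportant' idea."""
    scores = K @ q / np.sqrt(q.shape[0])  # similarity of the query to each token
    top = np.argsort(scores)[-k:]         # indices of the k most relevant tokens
    weights = softmax(scores[top])        # normalise over the kept tokens only
    return weights @ V[top]               # weighted sum over the sparse subset

# A 1,000-token context, but only 4 tokens contribute to the output.
rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((1000, 64))
V = rng.standard_normal((1000, 64))
print(topk_sparse_attention(q, K, V).shape)  # (64,)
```

The saving is exactly the one in the airline analogy: the weighted sum runs over 4 rows instead of 1,000, at the risk that a discarded token was actually the important one.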

"Basically, you cut out things that you think are not important," said Ekaterina Almasque, managing partner and cofounder of the newly launched venture capital firm BlankPage Capital.

Because it uses fewer resources, sparse attention makes AI more scalable and efficient. However, since there is no oversight of how or why the model discards input, there is a risk that the approach makes models less dependable.

“The truth is, they have lost a lot of nuances,” said Almasque, a Graphcore investor and an early backer of Dataiku and Darktrace. “The real question then becomes, did they have the right mechanism to exclude irrelevant data, or is there a mechanism that excludes really important data, in which case the outcome will be much less relevant?”

Compared with competitors or conventional architectures, the investor said, this might not be "the optimal one or the safest" AI model to deploy, a shortcoming that could be particularly problematic for AI safety and inclusivity.

DeepSeek claims that the experimental model's performance is comparable to that of V3.1-Terminus. Despite reports that a bubble is forming, AI remains at the centre of a geopolitical contest, with China and the United States vying for supremacy. Yakefu noted that DeepSeek's models work "right out of the box" with Chinese-made AI processors such as Ascend and Cambricon, meaning they can run locally on domestic hardware without any additional configuration.

She added that DeepSeek supplied the actual computer code and tools needed to implement the experimental approach. "This suggests that others can construct their own enhancements and gain knowledge from it," she said.
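As a rough sketch of what that shipped code enables, the open weights can be loaded like any other Hugging Face checkpoint. The repository id below matches DeepSeek's public release, but the surrounding setup is an assumption about a typical transformers-library workflow, and the full model is far too large for most single machines.

```python
# Hypothetical minimal usage via Hugging Face transformers; the flags and
# hardware requirements are assumptions, not DeepSeek's documented setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3.2-Exp"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the release ships custom modelling code with the weights
    device_map="auto",       # spread layers across whatever accelerators are present
)

inputs = tokenizer("Explain sparse attention in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```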

But according to Almasque, the technology itself might not be defensible. "The approach is not very new," she said, noting that the industry has been "talking about sparse models since 2015" and that DeepSeek's technology, being open source, cannot be patented. DeepSeek's competitive advantage, she added, must lie in how it decides which data to include.

The company acknowledges that V3.2-Exp is a “midway step towards our next-generation architecture,” according to the Hugging Face post.

Patience observed, “This is DeepSeek’s value proposition all over: efficiency is becoming as important as raw power.”

Yakefu added: "DeepSeek is adopting a long-term strategy to sustain community involvement in their advancement. Affordable, reliable, and efficient options will always be preferred by consumers."
