DeepSeek Launches Vision Mode with Visual Primitives Framework for Spatial Reasoning


According to Beating monitoring, DeepSeek has officially launched Vision Mode on both its web and app platforms. The feature offers deep scene analysis, spatial reasoning, and the ability to convert UI screenshots directly into structured HTML code.

The new vision capability is built on DeepSeek's "Thinking with Visual Primitives" research framework, co-developed with researchers from Peking University and Tsinghua University. The approach targets the spatial reasoning gaps of existing vision-language models by treating coordinate points and bounding boxes as core units of thought, so that the model can tie its reasoning to explicit locations in the image during inference.

The foundational academic paper was briefly released on April 30 but was withdrawn by DeepSeek on May 1. Vision Mode currently accepts image input only: it does not support video or audio, and it cannot generate images.
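To make the idea of "points and boxes as thinking units" more concrete, here is a minimal, hypothetical sketch. The withdrawn paper's actual format is not described in this report, so everything below is an assumption: axis-aligned boxes in pixel coordinates, a reasoning `Step` that carries one spatial reference, and a simple center-based `left_of` relation. None of these names come from DeepSeek; they only illustrate what it means for each reasoning step to be grounded in a concrete region of the image.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Point:
    # A coordinate point in pixel space (hypothetical representation).
    x: float
    y: float


@dataclass(frozen=True)
class Box:
    # Axis-aligned bounding box: (x1, y1) is top-left, (x2, y2) is bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float

    def center(self) -> Point:
        return Point((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    def contains(self, p: Point) -> bool:
        return self.x1 <= p.x <= self.x2 and self.y1 <= p.y <= self.y2

    def area(self) -> float:
        # Clamp to zero so degenerate (non-overlapping) boxes have no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two boxes; 0.0 when they do not overlap.
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1),
                min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def left_of(a: Box, b: Box) -> bool:
    # Hypothetical spatial relation: compare horizontal box centers.
    return a.center().x < b.center().x


@dataclass
class Step:
    # One reasoning step, grounded in an explicit spatial primitive.
    text: str
    ref: Union[Point, Box]


# Illustrative trace: each step points at a region before a conclusion is drawn.
cup = Box(40, 120, 100, 200)
laptop = Box(150, 80, 400, 260)
trace: List[Step] = [
    Step("Locate the cup", cup),
    Step("Locate the laptop", laptop),
]
answer = "left" if left_of(cup, laptop) else "right"
```

The design choice being illustrated is that the final answer is computed from the referenced coordinates rather than from free-form text alone, which is the kind of explicit spatial grounding the framework is described as adding.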

Disclaimer: The information on this page may come from third-party sources and is for reference only. It does not represent the views or opinions of Gate and does not constitute any financial, investment, or legal advice. Virtual asset trading involves high risk. Please do not rely solely on the information on this page when making decisions. For details, see the Disclaimer.