Google's Vision Banana: A Unified Vision Model Outperforms Task-Specific Models in Segmentation and 3D Geometry

Gate News message, April 23: Google researchers, including Kaiming He and Saining Xie, published a paper introducing Vision Banana, a general-purpose vision understanding model created through lightweight instruction fine-tuning of the company’s Nano Banana Pro (Gemini 3 Pro Image) image generation model. The key innovation is to represent the output of every vision task as an RGB image, so that segmentation, depth estimation, and surface normal prediction are all performed through image generation, without task-specific architectures or loss functions.
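The article does not say how Vision Banana maps each task's output to RGB. As a rough illustration only, the sketch below uses two common conventions: surface normals with components in [-1, 1] mapped linearly to 8-bit RGB, and segmentation classes painted with a fixed color palette and decoded by snapping each generated pixel to the nearest palette color. The function names and the three-entry `PALETTE` are hypothetical, not taken from the paper.

```python
import math

# Hypothetical encodings for illustration only; the article does not describe
# the exact RGB mappings Vision Banana uses.


def normal_to_rgb(normal):
    """Map a unit surface normal (nx, ny, nz), each in [-1, 1], to 8-bit RGB."""
    return tuple(round((c + 1.0) / 2.0 * 255) for c in normal)


def rgb_to_normal(rgb):
    """Invert normal_to_rgb, then renormalize to unit length."""
    v = [p / 255 * 2.0 - 1.0 for p in rgb]
    length = math.sqrt(sum(c * c for c in v)) or 1.0  # guard against a zero vector
    return tuple(c / length for c in v)


# Hypothetical class-id -> color palette for a segmentation task.
PALETTE = {0: (128, 64, 128), 1: (220, 20, 60), 2: (70, 130, 180)}


def labels_to_rgb(labels):
    """Paint a 2D grid of class ids as a 2D grid of RGB tuples."""
    return [[PALETTE[k] for k in row] for row in labels]


def rgb_to_labels(image):
    """Decode a (possibly noisy) generated image by snapping each pixel
    to the class whose palette color is nearest in squared RGB distance."""
    def nearest(pixel):
        return min(
            PALETTE,
            key=lambda k: sum((a - b) ** 2 for a, b in zip(pixel, PALETTE[k])),
        )
    return [[nearest(px) for px in row] for row in image]


if __name__ == "__main__":
    print(normal_to_rgb((0.0, 0.0, 1.0)))  # (128, 128, 255): normal facing the camera
    print(rgb_to_labels([[(125, 60, 131), (215, 25, 58)]]))  # [[0, 1]]
```

Snapping to the nearest palette color is one simple way to tolerate the small color errors an image generator introduces; a production system might decode more robustly.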

In semantic segmentation, Vision Banana outperformed the specialized model SAM 3 by 4.7 percentage points on Cityscapes, and in referring expression segmentation it surpassed SAM 3 Agent. However, it lagged behind SAM 3 in instance segmentation. On 3D tasks, its metric depth estimation reached an average accuracy of 0.929 across four standard datasets, exceeding Depth Anything V3’s 0.918, despite using only synthetic data, with no real depth data, and without requiring camera parameters at inference. Its surface normal estimation achieved state-of-the-art results on three indoor benchmarks.
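The article reports depth results only as an "average accuracy" and does not name the metric. Monocular depth benchmarks commonly report δ1 accuracy: the fraction of valid pixels whose predicted depth lies within a factor of 1.25 of the ground truth. Assuming that is the metric here, the sketch below computes it for one dataset and then takes an unweighted mean across datasets; both the metric and the averaging scheme are assumptions, and the function names are hypothetical.

```python
def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold.

    pred, gt: flat sequences of depths (e.g. in meters). Pixels with
    non-positive ground truth, often used to mark missing depth, are skipped;
    non-positive predictions on valid pixels count as misses.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not valid:
        raise ValueError("no valid ground-truth pixels")
    hits = sum(1 for p, g in valid if p > 0 and max(p / g, g / p) < threshold)
    return hits / len(valid)


def mean_over_datasets(scores):
    """Unweighted mean of per-dataset scores (an assumed averaging scheme)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: three of the four valid pixels are within 1.25x of ground truth;
    # the last pixel has gt == 0 and is ignored.
    pred = [1.0, 2.0, 3.0, 4.0, 9.0]
    gt = [1.1, 2.0, 5.0, 4.2, 0.0]
    print(delta1_accuracy(pred, gt))  # 0.75
```

The toy numbers are illustrative only and are not drawn from any benchmark.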

Fine-tuning mixed a small amount of vision-task data into the original image generation training data, which preserved the model’s generation capabilities: in generation quality tests, it performed on par with the original Nano Banana Pro. The paper argues that image generation pretraining plays the same role in vision that text generation pretraining plays in language: by learning to generate images, a model acquires the internal representations needed to understand them, and instruction fine-tuning merely unlocks this capability.
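The article gives no mixing ratio or sampling scheme for combining vision-task data with the original generation data. A minimal sketch of one common approach, assuming a single hypothetical `vision_fraction` parameter: each training example is drawn from the small vision-task pool with that probability and from the generation pool otherwise.

```python
import random


def sample_mixture(gen_pool, vision_pool, vision_fraction, n, seed=0):
    """Draw n training examples, each from vision_pool with probability
    vision_fraction and from gen_pool otherwise (sampling with replacement).

    vision_fraction is a hypothetical knob; the article only says the amount
    of vision-task data was small.
    """
    if not 0.0 <= vision_fraction <= 1.0:
        raise ValueError("vision_fraction must be in [0, 1]")
    rng = random.Random(seed)  # seeded so draws are reproducible
    return [
        rng.choice(vision_pool) if rng.random() < vision_fraction else rng.choice(gen_pool)
        for _ in range(n)
    ]


if __name__ == "__main__":
    batch = sample_mixture(["gen"], ["vision"], vision_fraction=0.05, n=10_000)
    print(batch.count("vision") / len(batch))  # roughly 0.05
```

Keeping the vision share small mirrors the article's description of "minimal" vision-task data; the actual proportion used is not reported.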
