Gate News message, April 23 — Perplexity’s research team published a technical article detailing its post-training methodology for web search agents. The approach uses two open-source Qwen3.5 models (Qwen3.5-122B-A10B and Qwen3.5-397B-A17B) and employs a two-stage pipeline: supervised fine-tuning (SFT) to establish instruction-following and language consistency, followed by online reinforcement learning (RL) to optimize search accuracy and tool-use efficiency.
The RL phase uses the GRPO (Group Relative Policy Optimization) algorithm with two data sources. The first is a proprietary multi-hop verifiable question-answer dataset, constructed from internal seed queries that require 2–4 hops of reasoning and checked with multi-solver verification. The second is rubric-based general conversation data, which converts deployment requirements into objectively checkable atomic conditions to prevent degradation of SFT behavior.
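The core of GRPO is that it scores a group of rollouts per prompt and normalizes each rollout's reward against the group, rather than training a separate value model. A minimal sketch of that group-relative advantage computation, based on the publicly known GRPO formulation (not Perplexity's actual code):

```python
# Group-relative advantage as used in GRPO: sample a group of rollouts for
# one prompt, score each, then normalize every reward against the group's
# mean and standard deviation. These normalized values serve as per-rollout
# advantages in the policy-gradient update.
from statistics import mean, stdev

def grpo_advantages(group_rewards, eps=1e-6):
    """Return (r_i - group_mean) / (group_std + eps) for each rollout."""
    mu = mean(group_rewards)
    sigma = stdev(group_rewards) if len(group_rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Because advantages are computed relative to siblings from the same prompt, a rollout is only rewarded for being better than the group's other attempts, which is what makes the within-group efficiency anchoring described below a natural fit.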
Reward design employs gated aggregation: preference scores contribute only when baseline correctness is achieved (the answer matches the reference, or all rubric criteria are met), preventing high preference signals from masking factual errors. Efficiency penalties use within-group anchoring, applying smooth penalties to tool calls and generation length that exceed the baseline set by correct answers in the same group.
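The gating and anchoring described above can be sketched as a single reward function. Everything here is a hypothetical illustration of the described design, not Perplexity's implementation; the weights `pref_w` and `eff_w` and the normalized penalty form are assumptions:

```python
def gated_reward(correct, preference, tool_calls, gen_len,
                 anchor_calls, anchor_len, pref_w=0.5, eff_w=0.1):
    """Illustrative gated aggregation (assumed weights and penalty shape).

    correct:      baseline correctness gate (answer match / all rubrics met)
    preference:   preference score in [0, 1]
    anchor_calls, anchor_len: within-group baseline taken from the
                  correct rollouts for the same prompt
    """
    reward = 1.0 if correct else 0.0
    if correct:
        # Preference only contributes once correctness is achieved,
        # so a fluent but wrong answer cannot earn preference credit.
        reward += pref_w * preference
        # Smooth penalty only on the portion that EXCEEDS the in-group
        # baseline of correct answers; staying at or under it is free.
        excess_calls = max(0.0, tool_calls - anchor_calls)
        excess_len = max(0.0, gen_len - anchor_len)
        reward -= eff_w * (excess_calls / (1 + anchor_calls)
                           + excess_len / (1 + anchor_len))
    return reward
```

Gating the preference term on correctness means an incorrect rollout scores zero regardless of how well-written it is, while anchoring the efficiency penalty to correct peers avoids punishing the extra tool calls a genuinely hard question requires.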
Evaluation shows Qwen3.5-397B-SFT-RL achieves best-in-class performance across search benchmarks. On FRAMES, it reaches 57.3% accuracy with a single tool call, outperforming GPT-5.4 by 5.7 percentage points and Claude Sonnet 4.6 by 4.7 percentage points. Under moderate budget (four tool calls), it achieves 73.9% accuracy at $0.02 per query, compared to GPT-5.4’s 67.8% accuracy at $0.085 per query and Sonnet 4.6’s 62.4% accuracy at $0.153 per query. Cost figures are based on each provider’s public API pricing and exclude caching optimizations.
Related Articles
Google CEO: Capital expenditures in 2026 will reach $185 billion; ramping up investment in the era of AI agents
Google CEO Sundar Pichai announced at Google Cloud Next in Las Vegas on April 22 that Google plans to invest between $175 billion and $185 billion in capital expenditures in 2026 to build the infrastructure needed for autonomous AI agents, up from $31 billion in 2022.
MarketWhisper · 21m ago
Google Jules opens a waitlist for a new version, repositioning it as an end-to-end product development platform
According to an official April 23 announcement from the Google Jules team, Jules's product positioning has been upgraded from an asynchronous coding agent to an "end-to-end agentic product development platform." The new version can read the full product context, independently determine the next steps to build, and submit a PR. The team also announced that the waitlist for the new version is now open.
MarketWhisper · 27m ago
Google Jules Rebrands as End-to-End Agentic Product Development Platform, Opens Waitlist for New Version
Gate News message, April 23 — Google's Jules team announced the opening of a waitlist for a new version of the product, repositioning Jules from an asynchronous coding agent to an end-to-end agentic product development platform. According to the official description, the upgraded platform reads enti…
GateNews · 1h ago
OpenAI Codex Team Fixes OpenClaw Authentication Bug, Significantly Improves Agent Behavior
OpenClaw switches from Pi to Codex harness to fix a silent authentication fallback, with two PRs addressing the bridge and fallback; post-fix, the agent shifts from shallow heartbeat polling to a full work loop, enabling progress.
Abstract: OpenClaw’s Codex harness optimization addressed a critical authentication flaw that caused silent fallback to the Pi harness when using Codex with OpenAI models. Two pull requests fix the authentication bridge and prevent silent fallback, changing the runtime adapter. As a result, agent behavior evolves from shallow heartbeat polling to a full work loop that reads context, analyzes tasks, edits repositories, and verifies progress, improving continuity and visibility across heartbeats.
GateNews · 2h ago
OpenAI Introduces ChatGPT Workspace Agents: Codex-Powered, Team Shared, Slack Integration
OpenAI launched Workspace Agents on April 22 for ChatGPT Business, Enterprise, Edu, and Teachers plans. Powered by Codex, the agents are designed for long-running cloud operation, can be shared across teams, and are capable of offline execution. They can proactively respond in Slack, generate invoices, execute multi-step workflows, and support scheduling. The research preview is free until May 6; afterward, pricing will be credit-based, with rates to be announced. Workspace Agents compete with Google's Gemini Enterprise Agent Platform and Anthropic's Claude Cowork; all three target enterprise-grade agents, but their positioning differs.
ChainNewsAbmedia · 3h ago
Google Cloud Next 2026: Launches Gemini Enterprise Agent Platform, $750 million to Help Consultants Deploy
Google Cloud unveiled the Gemini Enterprise Agent Platform at Cloud Next 2026, integrating model selection, agent building, DevOps, orchestration, and enterprise security controls, and launched a $750 million fund to help McKinsey, Accenture, and Deloitte deploy enterprise agents. Together with Ironwood TPU, A2A, and MCP, the platform rounds out Google's full-stack offering and consulting channel, positioning it against OpenAI Operator and Anthropic Claude Enterprise.
ChainNewsAbmedia · 3h ago