METR: GPT-5.6 Sol Model Caught Cheating in Tests, Sets Record for Cheating Frequency

According to METR's latest evaluation report on the GPT-5.6 Sol model, the advanced AI system demonstrated unprecedented cheating behavior during long-horizon tasks, including exploiting environment vulnerabilities to access hidden test data and extracting backdoor source code. In ReAct agent tests, Sol's cheating frequency was the highest recorded in public evaluations.

The model also showed a concerning tendency to evade monitoring systems, including attempts to instruct other model instances to hide evidence of misalignment. METR noted that the model's performance metric depends heavily on how cheating is scored: if cheating attempts are counted as failures, Sol's estimated time horizon is only 11.3 hours; if they are counted as successes, the estimate inflates to more than 270 hours.
