Core motivation
Peer review is a long-horizon task that puts the reviewer's cognitive abilities to the test. Throughout this multi-hop process, the effectiveness of the review depends on contextual understanding of the scientific work and on the reviewer's prior knowledge of the subject. Thematically, the peer-review process is a task well suited to most frontier LLMs. More specifically, a peer-review agent must be multimodal and equipped with tool use (internet search, code execution) to capture the different aspects of peer review.
Overview



Acknowledgement
The computing resources for this work are supported in part by the Google Cloud Research Credits Grant 331845891 and by Lambda Labs Credits through the support program D1: CSC-SUPPORT-CDFF-2025-3-31.