Lately there has been a surge of models, recipes, and software libraries capable of doing deep research. What constitutes a deep-research task really depends on who you ask, but it is undeniable that any deep-research query is i) agentic, ii) long-horizon, iii) large-scale information seeking, and iv) an information-consumption workflow.
Deep-research agents can be used for various search directives, but they scour information at considerable depth and gather the context from everything they crawl into a final answer that, hopefully, yields valuable insights [1]. Inherently, this is a huge time- and effort-saving exercise if the report generated at the end is of high quality.
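To make that workflow concrete, here is a minimal sketch of the search-deep-then-synthesize loop described above. Everything in it is hypothetical: `search` and `summarize` are stand-in stubs, not any real agent framework's API; a real agent would call a search backend and an LLM, and would let the model decide each follow-up query.

```python
def search(query: str) -> list[str]:
    """Stub: return snippets for a query (a real agent would hit a search API)."""
    return [f"snippet about {query}"]

def summarize(context: list[str], question: str) -> str:
    """Stub: condense the gathered context into a report (a real agent would call an LLM)."""
    return f"Report on '{question}' built from {len(context)} snippets."

def deep_research(question: str, max_rounds: int = 3) -> str:
    """Iteratively search and accumulate context, then write a final answer."""
    context: list[str] = []
    query = question
    for _ in range(max_rounds):
        context.extend(search(query))
        # A real agent would derive the next query from what it has read so far;
        # here we just tack on a follow-up marker.
        query = f"{question} (follow-up)"
    return summarize(context, question)

print(deep_research("open deep research models"))
```

The point of the sketch is only the shape: many rounds of gathering, one round of consumption into a report.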
Specialized models
The training behind DR Tulu is quite interesting because it is the first open-weight model post-trained using a new rubrics framework, or RLER as they describe it in their GitHub repo. To be honest, the repository, blog, and README on the DR Tulu Hugging Face page provide such extensive information about the model and its evaluation that it is not worth my repeating it here. That said, I feel the demo mentioned here gives you a sneak peek into the final document quality. One of the cooler things on the demo page is the set of answers under the section “SimpleQA”: people typically expect deep-research agents to produce incredibly verbose answers, when the purpose can sometimes be channeled towards rolling out terse answers while still searching deeply.
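As I understand it, the core idea of a rubric-style reward is to score a rollout against weighted criteria rather than a single gold answer. The toy below is my own illustration, not DR Tulu's actual reward function (which lives in their repo and is considerably more involved); the naive substring matching and the example rubric are both made up.

```python
def rubric_score(answer: str, rubric: list[tuple[str, float]]) -> float:
    """Score an answer as the weighted fraction of rubric criteria it satisfies.
    Each criterion is (required_phrase, weight); matching is naive substring search."""
    total = sum(weight for _, weight in rubric)
    earned = sum(weight for phrase, weight in rubric
                 if phrase.lower() in answer.lower())
    return earned / total if total else 0.0

# Hypothetical rubric: mention transformers (heavily weighted) and limitations.
rubric = [("cites a source", 1.0), ("transformer", 2.0), ("limitations", 1.0)]
print(rubric_score("Transformers have known limitations.", rubric))  # 0.75
```

A score like this can then serve as the reward signal in an RL loop, which is the part the RLER framework actually contributes.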
Fig 1. Open deep-research model list, as shown on the DR Tulu web announcement page. Learn more
For those interested in quickly training a deep-research model similar to DR Tulu, the instructions here highlight that testing the setup with Qwen-3 0.6B on a single GPU (assuming an H100) is possible. A key feature of the:
Fig 2. Tongyi DeepResearch agentic model benchmark results on several search benchmarks. More available
Fig 3. Input to output pipeline for a scientific world model involved in sophisticated autonomous discovery. More about Kosmos (https://edisonscientific.com/articles/announcing-kosmos)
Structurally, are they different though?
One would be interested to know the structural differences between each of the options mentioned above, because the appeal of a deep-research
Personally, I feel
As a researcher, the biggest cognitive boost I can receive is a reliable co-scientist capable of understanding my workflows for consumption (web), knowledge updates (memory), and selective recall of consumed information (skills) at frequent/infrequent intervals, such that I can play a productive role in
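The three components named above can be sketched as a tiny class, purely to show how they might compose. All names here (`CoScientist`, `consume`, `recall`, the `grep` skill) are hypothetical, not any existing system's API.

```python
from typing import Callable

class CoScientist:
    """Toy co-scientist: web consumption feeds memory; skills selectively recall it."""

    def __init__(self) -> None:
        self.memory: dict[str, str] = {}                 # knowledge updates
        self.skills: dict[str, Callable] = {}            # selective-recall routines

    def consume(self, url: str, content: str) -> None:
        """Web consumption: ingest a page into memory, keyed by its URL."""
        self.memory[url] = content

    def register_skill(self, name: str, fn: Callable) -> None:
        self.skills[name] = fn

    def recall(self, skill: str, query: str) -> str:
        """Selective recall: apply a named skill over everything consumed so far."""
        return self.skills[skill](self.memory, query)

cs = CoScientist()
cs.consume("https://example.org/paper", "attention is all you need")
cs.register_skill("grep", lambda mem, q: ", ".join(u for u, c in mem.items() if q in c))
print(cs.recall("grep", "attention"))  # https://example.org/paper
```

The separation matters: consumption can run continuously in the background, while recall stays cheap and on demand.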