Tinker, smol-RL and QDoRA (Part 2)

Part 2 goes live on Feb 28, 2026. This is a short preface to Part 2. The full write-up will be published on Feb 28, 2026. If you want the context from Part 1 first, start here: Tinker, smol-RL and QDoRA. TL;DR: In Part 1, I framed reproducibility as a practical problem in modern LLM work, not just a philosophical ideal. Even with greedy decoding and fixed seeds, determinism can break in subtle ways: GPU type, numerical precision, kernel choices, and non-deterministic log-probs in MoE models all conspire to make “run it again” less reliable than we like to admit. That led to a simple question: do we need a better abstraction layer so that the model ops complexity is hidden but the critical knobs for determinism are explicit and repeatable? ...

February 11, 2026 · 2 min · Akhil Pandey

Tinker, smol-RL and QDoRA

Update (Feb 11, 2026): Part 2 will land on Feb 20, 2026 with the smol-RL and QDoRA experiments. Preview: Tinker, smol-RL and QDoRA (Part 2). Reproducibility is a bedrock of scientific progress. An underlying principle that governed my doctoral research [1] in applying representational learning to understand reproducibility was the idea that “Reproducibility is a bedrock of scientific progress” [2]. Naturally, seeing Thinky talk about non-determinism and reframe the discussion around reproducibility in large language models made me realize that reproducibility has become an ideal that fewer and fewer researchers, engineers, and hobbyists truly believe in across any scientific project. ...

January 2, 2026 · 9 min · Akhil Pandey

What is it about these Deep research models lately?

Lately there has been a surge of models, recipes and software libraries capable of doing deep research. What constitutes a deep-research task really depends on the person you’re asking, but it’s undeniable that any deep-research query is an i.) agentic, ii.) long-horizon, iii.) large-scale information-seeking and iv.) information-consumption workflow. Deep-research agents can be used for various search directives, but in every case they scour the available information at considerable depth and gather the context of all the crawled information into a final answer that hopefully gives valuable insights [1]. Inherently, this is a huge time- and effort-saving exercise if the report generated in the end is of high quality. ...

November 24, 2025 · 8 min · Akhil Pandey

Something new, hello "notes"

Context: Sustaining the habit of writing is incredibly hard. I really like looking at my notes and keeping the mechanical part of my brain active to ensure my recall capacity is fully functional. Although it seems pointless to others, I’ve always considered my daily notes a great source of comfort, because I know there is one place I can reliably go to fetch information that I’ve come across while tinkering with something random. That said, why do this publicly? Well, after watching a lot of people raise their eyebrows when they look at my Obsidian notes, and after being told more than once to make my notes public, I gave in to the idea that there is some public utility to my notes. But I guess more importantly, there is a greater benefit for me in being able to fetch important information immediately when I need it (+1 if it’s on the web). ...

November 13, 2025 · 3 min · Akhil Pandey