I was experimenting with how local, learnable routers can reduce token overhead and lower costs, and decided to publish a post about it. The post shows how to delegate tool calls to a PyTorch-based learner, with examples of how to integrate it into a DSPy pipeline. Feedback welcome!
Thanks for the informative and inspiring post! This is definitely cool, and I can imagine it being very useful.
However, I do want to mention that the "recommended" flow these days isn't to separate out a tool request the way you have, i.e. asking an LLM to route a tool, extracting that, running the tool, passing output back to the LLM, and so on. Instead, you simply pass the tool definitions, prompt, and structured-output expectations, and let the LLM (and your caller library) manage the tool-use loop.
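For concreteness, here's a minimal sketch of that loop. The `llm` callable, tool names, and message shapes are assumptions following the common chat-completions convention, not any specific library's exact API:

```python
import json

# Hypothetical local tool; in practice this would be a real function.
def get_weather(city: str) -> str:
    return f"sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Tool schema passed with every request so the model itself decides
# when (and with what arguments) to call a tool.
TOOL_DEFS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_tool_loop(llm, messages, max_steps=5):
    """Let the LLM drive tool use: call it, execute any requested
    tools, append results to history, repeat until it answers in
    plain text. `llm` is an assumed callable wrapping your client."""
    for _ in range(max_steps):
        reply = llm(messages, TOOL_DEFS)
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]  # final answer, loop ends
        for call in calls:
            fn = TOOLS[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": fn(**args),
            })
    raise RuntimeError("tool loop did not terminate")
```

The point is that routing, extraction, and re-prompting all live inside one loop the model was post-trained to drive, rather than being split into separate steps.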
That’s how these modern LLMs are trained in post-training, and so I suspect it’s likely you’ll get different (and potentially worse?) results in trying to subvert this with a small, local model.
Letting the LLM do this comes with all the downsides you mentioned, but it is also more likely to be in-distribution, and it makes composing multiple tool calls easier.
Anyway, thanks for sharing! I'd love to see evals on a task comparing the result when an LLM is involved in tool selection versus when it is handed tool output only - if I'm wrong about quality degradation, then there's a lot to like about your local tool routing.
great point, appreciate the comment. totally agree with your framing, though i think there’s still a gap in how tool use is handled today.
quick note: it doesn’t have to be an rnn. i’ve got a follow-up example coming that uses a transformer-style ToolController with self attention, more expressive routing, etc.
but here’s the thing: when you rely on few-shot bootstrapping the LLM, you never actually update the model’s priors. even after 100k tool calls, you’re still stuck in the same polluted context window, and it’s all stateless.
this gets worse fast with more than 3–4 tool calls, especially when there’s branching logic (e.g., if api1 > 5, go left, else right).
what this approach offers is backprop through tool calls: you can tune prompts and update priors across the full workflow, end to end. i’m trying to develop this intuition a bit more, and would love feedback.
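to make that concrete, here's a toy sketch of what a learnable router could look like. names like `ToolController` follow the post's terminology, but the architecture and training step below are illustrative assumptions, not the post's actual code:

```python
import torch
import torch.nn as nn

class ToolController(nn.Module):
    """Tiny GRU-based router: embeds a token sequence and predicts
    which tool to call. Illustrative sketch, not the post's code."""
    def __init__(self, vocab_size, n_tools, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_tools)

    def forward(self, token_ids):          # (batch, seq_len)
        x = self.embed(token_ids)
        _, h = self.rnn(x)                 # h: (1, batch, hidden)
        return self.head(h[-1])            # tool logits

# Because tool selection is a differentiable module, routing errors
# produce gradients and the router's priors actually update --
# unlike few-shot prompting, which stays stateless.
router = ToolController(vocab_size=1000, n_tools=4)
opt = torch.optim.Adam(router.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 12))   # toy batch of queries
labels = torch.randint(0, 4, (8,))         # correct tool per query
loss = nn.functional.cross_entropy(router(tokens), labels)
opt.zero_grad(); loss.backward(); opt.step()
```

swapping the GRU for self-attention (the transformer-style version mentioned above) only changes the encoder; the trainable routing head and the backprop loop stay the same.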
thanks for the suggestion on the eval — will post that comparison soon.
That’s cool, I’d love to see the advanced ToolController when it’s available!
Great points about not updating priors. I also thought about it a bit more and realized that there’s a way you can largely mitigate the out-of-distribution inference requests after local tool selection, if you wanted to.
The tool use loop in an inference framework builds up a history of each interaction and sends it along with each subsequent request. You could create “synthetic history”: send the LLM a history containing the prompt, your local tool selection masquerading as though the LLM generated it, and the tool response. This would be in-distribution but still rely on your local tool routing.
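Something like this sketch, assuming the common chat-completions message shape (field names and the helper itself are my assumptions; adapt to whatever client library you use):

```python
import json

def synthetic_history(prompt, tool_name, tool_args, tool_output,
                      call_id="local_0"):
    """Build a message history where a locally routed tool call
    masquerades as an assistant-generated one, so the follow-up
    LLM request looks like a normal tool-use turn."""
    return [
        {"role": "user", "content": prompt},
        {  # pretend the LLM itself chose this tool
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                "function": {"name": tool_name,
                             "arguments": json.dumps(tool_args)},
            }],
        },
        {"role": "tool", "tool_call_id": call_id,
         "content": tool_output},
    ]
```

You'd then send this history (plus the tool definitions) as the next request, and the LLM just sees an ordinary tool-use trace it has to continue.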
If this works well enough, then I think your approach is very powerful once you’ve decided on a task and set of tools and are able to commit to training on that. Definitely want to try this myself now.
Looking forward to seeing more! I take it your substack is the best place to follow along?
I think this is a creative approach. I wonder how the success rates for that little RNN compare to those of the primary LLM, especially for complex queries or complex tool calls. At some point you have to scale that network up large enough to get better results, and eventually you've come back around to where you might as well use an LLM. A similar approach with potentially better results (depending on the application) could be to use that same dataset to fine-tune a small language model. It'd be interesting to see some success-rate comparisons.
thank you, appreciate the comment! that's a great point -- as I'm developing this intuition, I'm designing an eval that compares the OpenAI example there, a tool call using a simple RNN, and one that uses an encoder model. would love more feedback (on blog / X etc.) when I post.