I was experimenting with how local, learnable routers can reduce token overhead and lower costs, and decided to publish a post about it. The post shows how to delegate tool calls to a PyTorch-based learner, with examples of how to integrate it into a DSPy pipeline. Feedback welcome!
Thanks for the informative and inspiring post! This is definitely cool, and I can imagine it being very useful.
However, I do want to mention that the "recommended" flow these days isn't to separate out a tool request the way you have, i.e. asking an LLM to route a tool, extracting that, running the tool, passing output back to the LLM, and so on. Instead, you simply pass the tool definitions, prompt, and structured-output expectations, and let the LLM (and your caller library) manage the tool-use loop.
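For concreteness, here's a minimal sketch of that loop. The `llm` callable, tool names, and message shapes are assumptions following the common chat-completions convention, not any specific library's exact API:

```python
import json

# Hypothetical local tool; in practice this would be a real function.
def get_weather(city: str) -> str:
    return f"sunny in {city}"

TOOLS = {"get_weather": get_weather}

# Tool schema passed with every request so the model itself decides
# when (and with what arguments) to call a tool.
TOOL_DEFS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def run_tool_loop(llm, messages, max_steps=5):
    """Let the LLM drive tool use: call it, execute any requested
    tools, append results to history, repeat until it answers in
    plain text. `llm` is an assumed callable wrapping your client."""
    for _ in range(max_steps):
        reply = llm(messages, TOOL_DEFS)
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]  # final answer, loop ends
        for call in calls:
            fn = TOOLS[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": fn(**args),
            })
    raise RuntimeError("tool loop did not terminate")
```

The point is that routing, extraction, and re-prompting all live inside one loop the model was post-trained to drive, rather than being split into separate steps.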
That’s how these modern LLMs are trained in post-training, and so I suspect it’s likely you’ll get different (and potentially worse?) results in trying to subvert this with a small, local model.
Letting the LLM do this comes with all the downsides you mentioned, but it is also more likely to be in-distribution, and it makes composing multiple tool calls easier.
Anyway, thanks for sharing! I'd love to see evals on a task comparing the result when an LLM is involved in tool selection versus when it is handed tool output only - if I'm wrong about quality degradation, then there's a lot to like about your local tool routing.
great point, appreciate the comment. totally agree with your framing, though i think there’s still a gap in how tool use is handled today.
quick note: it doesn’t have to be an rnn. i’ve got a follow-up example coming that uses a transformer-style ToolController with self attention, more expressive routing, etc.
but here’s the thing: when you rely on few-shot bootstrapping the LLM, you never actually update the model’s priors. even after 100k tool calls, you’re still stuck in the same polluted context window, and it’s all stateless.
this gets worse fast with more than 3–4 tool calls, especially when there’s branching logic (e.g., if api1 > 5, go left, else right).
what this approach offers is backprop through tool calls: you can tune prompts and update priors across the full workflow, end to end. i’m trying to develop this intuition a bit more, and would love feedback.
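to make that concrete, here's a toy sketch of what a learnable router could look like. names like `ToolController` follow the post's terminology, but the architecture and training step below are illustrative assumptions, not the post's actual code:

```python
import torch
import torch.nn as nn

class ToolController(nn.Module):
    """Tiny GRU-based router: embeds a token sequence and predicts
    which tool to call. Illustrative sketch, not the post's code."""
    def __init__(self, vocab_size, n_tools, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_tools)

    def forward(self, token_ids):          # (batch, seq_len)
        x = self.embed(token_ids)
        _, h = self.rnn(x)                 # h: (1, batch, hidden)
        return self.head(h[-1])            # tool logits

# Because tool selection is a differentiable module, routing errors
# produce gradients and the router's priors actually update --
# unlike few-shot prompting, which stays stateless.
router = ToolController(vocab_size=1000, n_tools=4)
opt = torch.optim.Adam(router.parameters(), lr=1e-3)
tokens = torch.randint(0, 1000, (8, 12))   # toy batch of queries
labels = torch.randint(0, 4, (8,))         # correct tool per query
loss = nn.functional.cross_entropy(router(tokens), labels)
opt.zero_grad(); loss.backward(); opt.step()
```

swapping the GRU for self-attention (the transformer-style version mentioned above) only changes the encoder; the trainable routing head and the backprop loop stay the same.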
thanks for the suggestion on the eval — will post that comparison soon.
That’s cool, I’d love to see the advanced ToolController when it’s available!
Great points about not updating priors. I also thought about it a bit more and realized that there’s a way you can largely mitigate the out-of-distribution inference requests after local tool selection, if you wanted to.
The tool use loop in an inference framework builds up a history of each interaction and sends it along with each subsequent request. You could create “synthetic history”: send the LLM a history containing the prompt, your local tool selection masquerading as though the LLM generated it, and the tool response. This would be in-distribution but still rely on your local tool routing.
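Something like this sketch, assuming the common chat-completions message shape (field names and the helper itself are my assumptions; adapt to whatever client library you use):

```python
import json

def synthetic_history(prompt, tool_name, tool_args, tool_output,
                      call_id="local_0"):
    """Build a message history where a locally routed tool call
    masquerades as an assistant-generated one, so the follow-up
    LLM request looks like a normal tool-use turn."""
    return [
        {"role": "user", "content": prompt},
        {  # pretend the LLM itself chose this tool
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                "function": {"name": tool_name,
                             "arguments": json.dumps(tool_args)},
            }],
        },
        {"role": "tool", "tool_call_id": call_id,
         "content": tool_output},
    ]
```

You'd then send this history (plus the tool definitions) as the next request, and the LLM just sees an ordinary tool-use trace it has to continue.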
If this works well enough, then I think your approach is very powerful once you’ve decided on a task and set of tools and are able to commit to training on that. Definitely want to try this myself now.
Looking forward to seeing more! I take it your substack is the best place to follow along?
I think this is a creative approach. I wonder how the success rates for that little RNN compare to those of the primary LLM, especially for complex queries or complex tool calls. At some point you have to scale that network up large enough to get better results, and eventually you've come back around to where you might as well use an LLM. A similar approach with potentially better results (depending on the application) could be to use that same dataset to fine-tune a small language model. It'd be interesting to see some success-rate comparisons.
thank you, appreciate the comment! that's a great point -- as I'm developing this intuition, I'm designing an eval that compares the OpenAI example there, a tool call using a simple RNN, and one that uses an encoder model. would love more feedback (on blog / X etc.) when I post.