I just have it run CLI commands directly, usually with its own limited credentia...

I just have it run CLI commands directly, usually with its own limited credentials and with me reviewing what its going to call outside of a small list of whitelisted commands. It'll then often do a good job composing things to filter using jq and other tools.

For things I have them do a good bit I've written out some basic skills with example usages of how to use those tools. I've also told it in its AGENTS.md to review man pages and issue --help if it isn't confident in how to use a tool.

In a way, imagine you're needing to teach a halfway technically competent person how to use your desired tool. Write a short, concise document about how to use the command. Include the common flags and options you might want it to use, give it some example output. If you see it making the same mistakes over and over, update the skill. Once you've got that skill ironed out, it can be very good at using the tool and understanding its outputs. You can even ask the agent assistance in writing the skill, and suggest it updates the skill when it has trouble doing things you've asked it to do.

One other thing I do, for agents I'm using to debug things I'll tell it in its AGENTS.md that it is only around for fact finding and investigations, that it should not modify environments or do things that change state. It can make recommendations and ask for me to choose to make changes, but never attempt to make any calls which may mutate the environment. Obviously, just asking it to do so doesn't mean it will never do it, but so far I haven't had it actually attempt to do things outside of what I've asked. But I'm also very picky about letting it reach out to things I don't completely control, as context poisoning is a good way to get burned.

And when its hopping in to try and diagnose an issue, give it context as to what you know about the environment. Give it some documentation. If you've got a coworker telling you about what they're seeing, feed that in as well. Imagine if you had someone just telling you "the system is down, fix it!" versus "when I go to this page on this site, it takes too long to load and often ends up giving me a 503 error". Which would you be more successful at rapidly finding a solution for?