@0xfran
One problem I'm finding with agents is that often they are "too smart" for their own good.
Or more precisely, they like to "use intelligence" more than needed.
You'll build a simple tool for them, and instead of using it they'll do something way more complex, and botch it.
The script has guardrails, so its result is deterministic. Their intelligence has none. As a result, mistakes are significantly more common.
Similarly, you might tell your agent to never touch certain files (e.g. contracts) and then ask it to write tests. You'll then see all tests passing, but when merging you'll notice it changed the contracts to make the tests pass, rather than the other way around.
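One way to catch the "edited the protected files" case before merging is a deterministic check on the list of changed paths. What follows is a minimal sketch, not any particular tool's implementation. The `contracts/*` pattern, the `main` base branch, and the function names are all assumptions made for illustration.

```python
import subprocess
from fnmatch import fnmatch

# Hypothetical protected patterns; adjust to the repo's actual layout.
PROTECTED = ["contracts/*"]


def protected_changes(changed_paths, patterns=PROTECTED):
    """Return the changed paths that match any protected pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]


def changed_files(base="main"):
    """List files changed relative to `base` (assumed branch name) via git."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `protected_changes(changed_files())` in CI or a pre-merge hook and failing on a non-empty result would flag the change no matter what the agent decided to do. Note that `fnmatch` lets `*` match across `/`, so `contracts/*` also covers nested files.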
This kind of thing seems common, and I'm not sure how to fix it in general.
But I'm building something to fix it specifically in the git environment. Will release soon.