BC
October 7, 2025

The approach of shortening context by using a separate “Shortening LLM” introduces another LLM call, which adds latency and potential errors. Testing similar patterns locally shows that the summarizer often omits details that seem irrelevant at step N but become important at step N+5. The recall-then-precision strategy sounds promising but needs extensive test data that mimics real multi-step workflows, which most teams lack.

The “can a human understand the tool” benchmark for tool descriptions fails when tools have complex state dependencies or side effects. A human reading isolated function signatures doesn’t realize that calling tool A invalidates cached results from tool B or that certain parameter combinations trigger rate limits. These system interactions aren’t visible in docstrings.
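One way to make such interactions visible is to declare them as data next to each tool, so the runtime can enforce them and the description shown to the agent can mention them. The following is only a minimal sketch: `ToolSpec`, `ToolRuntime`, the `invalidates` field, and the tool names are hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical tool descriptor; the fields are illustrative assumptions."""
    name: str
    description: str
    # Names of tools whose cached results become stale after this tool runs.
    invalidates: tuple = ()


@dataclass
class ToolRuntime:
    """Tracks cached tool results and applies declared invalidations."""
    specs: dict
    cache: dict = field(default_factory=dict)

    def record_call(self, name: str, result: str) -> list:
        """Store a tool result and return the cached entries it invalidated."""
        spec = self.specs[name]
        dropped = [t for t in spec.invalidates if t in self.cache]
        for stale in dropped:
            del self.cache[stale]
        self.cache[name] = result
        return dropped

    def describe(self, name: str) -> str:
        """Build the description shown to the agent, including side effects."""
        spec = self.specs[name]
        if not spec.invalidates:
            return spec.description
        targets = ", ".join(spec.invalidates)
        return f"{spec.description} Side effect: invalidates cached results of {targets}."


# Example with two made-up tools: an update that makes a cached listing stale.
specs = {
    "list_orders": ToolSpec("list_orders", "List orders for a customer."),
    "update_order": ToolSpec(
        "update_order", "Modify an order.", invalidates=("list_orders",)
    ),
}
runtime = ToolRuntime(specs)
runtime.record_call("list_orders", "3 orders")
stale = runtime.record_call("update_order", "ok")  # "list_orders" is dropped
```

This does not address rate limits triggered by parameter combinations, and it only helps for dependencies someone has thought to declare.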
The recommendation to prefer specific tools over generic ones leads to tool proliferation, which itself creates a context problem. “Get sorted list of documents by date for customer ID” is more specific than “fetch from database,” but this approach results in dozens of similar narrow-purpose tools. Testing locally shows that agents have more difficulty choosing among more than 20 similar tools than using five flexible ones correctly.

The error-handling examples assume deterministic failure modes. In practice, network errors are intermittent, rate limits vary by endpoint, and “sleep 10 seconds then retry” works until it doesn’t. Effective error recovery requires the agent to reason about retry strategies, backoff timing, and when to escalate to humans. That guidance doesn’t fit neatly into error messages.
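As a rough illustration of the gap between a fixed “sleep 10 seconds” hint and an actual retry policy, here is a minimal sketch of capped exponential backoff with jitter and an explicit escalation path. `TransientError`, `EscalateToHuman`, `call_with_backoff`, and the default values are all hypothetical; a real policy would likely need per-endpoint settings.

```python
import random
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical error for failures worth retrying (timeouts, rate limits)."""

    def __init__(self, message: str, retry_after: Optional[float] = None):
        super().__init__(message)
        # Server-suggested wait in seconds, if the endpoint provided one.
        self.retry_after = retry_after


class EscalateToHuman(Exception):
    """Raised when automated retries are exhausted."""


def call_with_backoff(
    fn: Callable[[], object],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
):
    """Call fn, retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError as err:
            if attempt == max_attempts - 1:
                raise EscalateToHuman(
                    f"gave up after {max_attempts} attempts: {err}"
                ) from err
            # The cap doubles each attempt; jitter picks a point below it.
            cap = min(max_delay, base_delay * 2 ** attempt)
            delay = rng() * cap
            # Never wait less than the server asked for.
            if err.retry_after is not None:
                delay = max(delay, err.retry_after)
            sleep(delay)
```

The `sleep` and `rng` parameters are injectable only so the behavior can be tested without real waiting. Even with a wrapper like this, deciding which errors count as transient and when escalation is appropriate remains a judgment the error message alone cannot carry.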