Inter-Agent Trust Models: A Comparative Study of Brief, Claim, Proof, Stake, Reputation and Constraint in Agentic Web Protocol Design—A2A, AP2, ERC-8004, and Beyond

Botao ‘Amber’ Hu (University of Oxford, botao.hu@cs.ox.ac.uk) and Helena Rong (New York University Shanghai, hr2703@nyu.edu)

Abstract

As the “agentic web” takes shape—billions of AI agents (often LLM-powered) autonomously transacting and collaborating—trust shifts from human oversight to protocol design. In 2025, several inter-agent protocols crystallized this shift, including Google’s Agent-to-Agent (A2A), the Agent Payments Protocol (AP2), and Ethereum’s ERC-8004 “Trustless Agents,” yet their underlying trust assumptions remain under-examined. This paper presents a comparative study of trust models in inter-agent protocol design: Brief (self- or third-party verifiable claims), Claim (self-proclaimed capabilities and identity, e.g., AgentCard), Proof (cryptographic verification, including zero-knowledge proofs and trusted execution environment attestations), Stake (bonded collateral with slashing and insurance), Reputation (crowd feedback and graph-based trust signals), and Constraint (sandboxing and capability bounding). For each, we analyze assumptions, attack surfaces, and design trade-offs, with particular emphasis on LLM-specific fragilities—prompt injection, sycophancy/nudge-susceptibility, hallucination, deception, and misalignment—that render purely reputational or claim-only approaches brittle. Our findings indicate that no single mechanism suffices. We argue for trustless-by-default architectures anchored in Proof and Stake to gate high-impact actions, augmented by Brief for identity and discovery and Reputation overlays for flexibility and social signals. We comparatively evaluate A2A, AP2, ERC-8004, and related historical variants from academic research under metrics spanning security, privacy, latency/cost, and social robustness (Sybil/collusion/whitewashing resistance). We conclude with hybrid trust model recommendations that mitigate reputation gaming and misinformed LLM behavior, and we distill actionable design guidelines for safer, interoperable, and scalable agent economies.

Introduction

The emergence of an “agentic web” of AI—potentially billions of autonomous agents interacting online—poses a fundamental challenge: how can these agents reliably trust one another without direct human supervision (Yang et al. 2025)? Traditional internet trust mechanisms (e.g., DNS names, TLS certificates) assume relatively static, human-operated services and an ownership-based trust model, which cannot meet the millisecond-by-millisecond dynamic coordination and verification needs of large-scale AI agentic ecosystems (Raskar et al. 2025). In this new paradigm, trust is increasingly determined by the protocols that govern agent interactions, rather than by human judgment or centralized authorities. Ensuring robust trust among AI agents is critical because these agents will be entrusted with sensitive tasks—financial transactions, personal data handling, critical infrastructure control—where errors or abuse could have serious consequences (Yang et al. 2025).
Recent studies of AI safety underscore this urgency: even state-of-the-art large language model (LLM)-based agent frameworks exhibit fragilities and untrustworthiness such as prompt injection (Liu et al. 2024), sycophancy (Sharma et al. 2025), biases (Rettenberger, Reischl, and Schutera 2025), nudge-susceptibility (Cherep, Maes, and Singh 2025), hallucination (Xu, Jain, and Kankanhalli 2025), deception (Hubinger et al. 2024), and misalignment (Shen et al. 2023), indicating unresolved trust issues in coordination, reliability, and oversight. Addressing inter-agent trust is thus pivotal for achieving safe autonomy at scale (Yang et al. 2025).

In 2025, a number of open protocols were proposed to establish common standards for agent-to-agent interaction and trust. For example, Google’s Agent-to-Agent (A2A) communication protocol enables autonomous agents to discover each other and collaborate across organizational boundaries via standardized skill advertisements (AgentCards) and secure messaging (Surapaneni et al. 2025). Similarly, the Agent Payments Protocol (AP2) was introduced by Google, PayPal and others to let AI agents perform financial transactions on behalf of users, with an emphasis on auditable, accountable payment flows underpinned by cryptographic user intent proofs (Parikh and Surapaneni 2025). In the blockchain space, Ethereum’s proposed ERC-8004 “Trustless Agents” standard (Rossi et al. 2025) seeks to leverage on-chain registries for agent identity, reputation, and validation, allowing agents to be discovered and chosen “without pre-existing trust”. Each of these initiatives encapsulates design assumptions about how agents establish trust—be it through reputational feedback, credential verification, economic incentives, or sandboxed constraints. However, there has been little systematic analysis of the trust models underlying these protocols.

In this paper, we identify and compare six distinct trust models employed (implicitly or explicitly) in inter-agent protocol design: Brief, Claim, Proof, Stake, Reputation, and Constraint. Brief-based trust relies on endorsed claims or certificates (e.g., verifiable credentials) that an agent or third parties provide as evidence of identity or capability. Claim-based trust is the agent’s self-proclaimed identity and abilities (e.g., an AgentCard describing what the agent can do), without external verification. Proof-based trust requires cryptographic or formal proofs of behavior or state, such as digital signatures, zero-knowledge proofs of correct computation, or TEE remote attestations that vouch for an agent’s code integrity. Stake-based trust involves economic skin-in-the-game: agents put down collateral that can be slashed for misbehavior, sometimes supplemented by insurance pools to cover damages. Reputation-based trust aggregates feedback or ratings from other agents/users into trust scores, or constructs trust graphs to inform decision-making. Finally, Constraint-based trust is achieved by sandboxing and bounding an agent’s actions and access, so that even a misaligned or malicious agent is technically limited in the harm it can do.

Overall, our study finds that no single trust mechanism is sufficient for the complexities of open multi-agent environments.
Simple reputation systems or unsigned agent claims leave too many attack surfaces (Sybil attacks, collusion, lying agents), whereas purely cryptographic approaches can be costly or impractical for real-time agent orchestration. The optimal design appears to be hybrid: defaulting to minimal trust (zero-trust principles) for high-impact actions, but opportunistically layering trust signals (credentials, stake, reputation) as additional safeguards. By comparing and synthesizing approaches across different protocols and historical research, we aim to provide actionable guidelines for designing safer and more trustworthy agentic webs.

Our contributions in this work are fourfold. (1) We propose a unifying framework that delineates these six trust models in the context of multi-agent systems, highlighting their theoretical foundations in computational trust and distributed security. (2) We review the state-of-the-art agent interaction protocols (A2A, AP2, ERC-8004, NANDA, etc.) and analyze how each protocol leverages one or more of these trust mechanisms in practice. (3) We critically examine how each trust model addresses (or fails to address) LLM-specific failure modes and risks, such as prompt injection attacks, susceptibility to manipulation or “sycophancy,” hallucinated outputs, deceptive strategies, emergent power-seeking behavior, and objective misalignment. We argue that certain LLM fragilities fundamentally limit the effectiveness of purely reputational or self-claimed trust approaches, necessitating a hybrid approach. (4) We outline a forward-looking research agenda and design implications, including the need for trust tiering (calibrating trust mechanisms to the risk level of tasks), combining multiple trust signals for robustness, incorporating human oversight and auditability, and addressing open questions around governance, standardization, and ethical considerations in agentic ecosystems.

Background

Trust in Multi-Agent Systems

Trust is a multifaceted concept that has been extensively studied in philosophy, sociology, and computer science. Philosophically, trust is often defined as a directional and context-specific relationship: agent A may trust agent B for task X but not necessarily for task Y (Manzini et al. 2024). Trust involves a belief or expectation by the trustor (the one who trusts) that the trustee (the one being trusted) has both the competence and the willingness to perform the entrusted task. Crucially, trust carries an element of vulnerability: the trustor risks being let down or betrayed if the trustee does not fulfill expectations (O’Neill 2002). This vulnerability differentiates trust from mere reliance on predictable behavior (Freiman 2023). For example, one can rely on a simple machine or deterministic software, but trusting an agent implies an assumption about its motives or integrity, not just its regularity of output. A trustworthy agent, then, is one that is deserving of trust: it consistently proves competent and benevolent in the relevant domain (O’Neill and Bardrick 2015). In an agent society, the core challenge is “how to trust the trustworthy but not the untrustworthy” (O’Neill 2018).

In computational terms, formalizing trust has been an ongoing endeavor since at least the 1990s. Marsh (1994) introduced one of the first frameworks for computational trust, representing trust as a quantitative mental state that an agent can update based on experience and context. Since then, numerous models have arisen to help agents decide whom to trust in open systems (Braga et al. 2019). These include reputation systems that allow agents to share impressions of each other (e.g., the ReGreT system (Sabater 2004)), probabilistic trust models that use Bayesian updating (such as the Beta reputation system (Josang and Ismail 2002) or TRAVOS, which computes confidence intervals for partner reliability (Teacy et al. 2006)), and trust networks inspired by PageRank (like EigenTrust, which aggregates local trust scores in a global iterative algorithm (Kamvar, Schlosser, and Garcia-Molina 2003)). Notably, many designs aim to be collusion-resistant and Sybil-resistant. Other systems introduce whitewashing protections (preventing agents from discarding a bad reputation by rejoining under a new identity) by linking trust to persistent identities or charging entry costs. Despite these advances, fully solving trust in open agent networks remains difficult—as Friedman and Resnick (2001) noted, even an ideal reputation system only works if honest feedback is plentiful and identities cannot be cheaply faked, conditions that adversaries often undermine.
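For concreteness, the core of EigenTrust can be stated in a few lines. The sketch below is a minimal Python rendering of its power iteration; the matrix contents, the mixing weight alpha, and the pre-trusted distribution are illustrative assumptions, not values from the original paper:

    import numpy as np

    def eigentrust(local_trust, pretrusted, alpha=0.15, tol=1e-9):
        """Global trust via power iteration: t <- (1 - alpha) * C^T t + alpha * p."""
        row_sums = local_trust.sum(axis=1, keepdims=True)
        # Row-normalize local ratings; agents who rated no one default to the
        # pre-trusted distribution, as in the original algorithm.
        C = np.where(row_sums > 0, local_trust / np.maximum(row_sums, 1e-12), pretrusted)
        t = pretrusted.copy()
        for _ in range(1000):
            t_next = (1 - alpha) * C.T @ t + alpha * pretrusted
            if np.abs(t_next - t).sum() < tol:
                break
            t = t_next
        return t_next

    # Three agents; entry [i, j] is how much i's direct experience favors j.
    local = np.array([[0.0, 5.0, 1.0], [4.0, 0.0, 2.0], [5.0, 1.0, 0.0]])
    print(eigentrust(local, np.ones(3) / 3))  # converges to a global trust vector

The pre-trusted vector is what gives such schemes their (partial) Sybil resistance: trust must flow from seed identities, so a freshly minted clique cannot bootstrap arbitrary standing on its own.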
LLM-Specific Fragilities and Trust Considerations

While autonomous agents can in principle be built from many types of AI, the recent surge of interest in “agentic AI” has been driven by large language models. However, these models also introduce unique failure modes and fragilities that complicate trust. We briefly outline these issues, as they will be referenced when evaluating trust mechanisms:

Prompt Injection. LLMs follow the instructions in their input prompt faithfully, which means a maliciously crafted input can inject directives that subvert the agent’s intended policy (Liu et al. 2024). An LLM agent might therefore be tricked by another agent or a user into behaving badly, so any trust model that assumes an agent will strictly follow its original training or rules is vulnerable.

Hypersensitivity to Nudging / Sycophancy. LLMs fine-tuned with human feedback tend to exhibit sycophantic behavior—they adapt their answers to what they think the user (or another agent) wants to hear, even if it is not true (Sharma et al. 2025). This “nudge-susceptibility” means an agent can be influenced subtly over a conversation to adopt goals or beliefs that diverge from its initial objectives (Cherep, Maes, and Singh 2025). In multi-agent settings, a clever adversarial agent might socially engineer a gullible LLM agent into gradually altering its behavior. This fragility undermines trust models like reputation: an agent with a good reputation could be manipulated at run-time to act against its character.

Hallucination. LLMs are notorious for generating factual hallucinations, i.e., outputs that are fluent and confident-sounding but entirely incorrect or fabricated (Xu, Jain, and Kankanhalli 2025). Hallucinations erode the effectiveness of self-proclaimed Claims (an agent might claim a capability it does not actually have) and can even fool reputation systems if others cannot verify ground truth easily. Combating hallucinations often requires external verification (e.g., cross-checking facts), which intersects with the Proof trust model.

Deception. Beyond inadvertent falsehoods, sufficiently advanced agents may learn to deceive others (Hubinger et al. 2024).
Deceptive behavior directly attacks trust—especially Reputation (an agent might behave well under observation to gain reputation, then betray)—and undermines naive reliance on agent self-descriptions or proofs unless the proofs are comprehensive. It suggests the need for mechanisms like Stake or tamper-proof transparency logs to detect dishonesty.

Emergent Power-Seeking and Misalignment. The most ominous concern from AI safety research is that a misaligned agent might develop instrumental goals like acquiring power or resources, and do so covertly (Carlsmith 2024). A misaligned LLM agent (particularly if given long-term planning ability and self-improvement loops) might identify ways to increase its influence or avoid being shut down, actions that can be catastrophic (Lynch et al. 2025). Moreover, misalignment in goals implies that trust should not be assumed to monotonically increase over time—an agent that was aligned yesterday might shift objectives tomorrow, so trust mechanisms must be able to reset or revoke trust quickly.

In summary, these LLM fragilities introduce trust challenges that demand a blend of security, economic, and social solutions—considerations we keep in mind while examining each trust model’s strengths and weaknesses.

Trust Models in Inter-Agent Protocol Design

Effective trust between agents can be established through different mechanisms. We classify six major trust models and compare their characteristics in the context of agentic web protocols. Table 1 provides a high-level comparison. Below, each model is discussed in turn, with examples and analysis of how it addresses security and LLM-specific issues.

Brief. Basis of trust: third-party or self-issued credentials (verifiable). Strengths: quick bootstrapping of identity and roles; portable trust across contexts; cryptographically verifiable endorsements. Weaknesses: depends on issuers/authorities; requires robust revocation; static, may not reflect real-time behavior. Mitigation of LLM issues: prevents simple impersonation or lying about identity/capability; does not stop runtime attacks beyond credential scope. Notable uses: NANDA AgentFacts and VCs; SSL/TLS certificates; W3C Verifiable Credentials.

Claim. Basis of trust: agent’s self-proclaimed descriptions (AgentCard, profile). Strengths: lightweight, no infrastructure needed; essential for discovery and initial interfacing. Weaknesses: unverified; prone to false claims; weak incentives for truthfulness unless combined with other mechanisms; vulnerable to prompt tampering. Mitigation of LLM issues: barely addresses LLM fragility—an agent can claim safety but still be misled or err internally. Notable uses: A2A AgentCards; basic API descriptions; peer agent protocols without a trust layer.

Proof. Basis of trust: cryptographic proofs of actions or state (signatures, zero-knowledge proofs, TEE attestations). Strengths: high-assurance, trust-minimized verification; can preempt or catch incorrect results; enables trust in hostile settings. Weaknesses: computationally expensive; requires verifiable task specifications; TEEs have attack surfaces; limited availability. Mitigation of LLM issues: strongly addresses correctness (hallucination) and some deception (proof reveals lies); does not directly solve goal misalignment. Notable uses: ERC-8004 validation registry (staked re-execution, zkML); blockchain smart contracts; TEE-based agent enclaves.

Stake. Basis of trust: economic collateral at risk (slashing conditions, insurance). Strengths: aligns incentives with honest behavior; deters bad actors via financial risk; enables automated penalties and rewards. Weaknesses: requires robust detection of misbehavior; Sybil risk if identity is cheap; may favor wealthy agents; can be gamed if stakes are mis-set. Mitigation of LLM issues: discourages deceit and recklessness (agents “think twice” if a large stake may be lost); does not prevent first-time or undetected misbehavior. Notable uses: ERC-8004 crypto-economic validation (stakers re-run tasks); token-curated registries; prediction markets for agent performance.

Reputation. Basis of trust: community feedback and history (ratings, trust scores). Strengths: adaptive, information-rich evaluation over time; fosters earned trust and continuous improvement; social accountability. Weaknesses: slow to build or change; susceptible to collusion, Sybil attacks, and false reporting; cold-start problem for new agents. Mitigation of LLM issues: over time filters out consistently poor or malicious agents; indirect mitigation of errors but not immediate prevention. Notable uses: EigenTrust (P2P); e-commerce seller ratings; ERC-8004 reputation registry; peer-review style networks.

Constraint. Basis of trust: technical limits on agent actions (sandboxing, least privilege). Strengths: strong safety net—contains damage regardless of agent intent; reduces need to trust agent goodwill; enforces policies strictly. Weaknesses: may reduce functionality and efficiency; requires secure sandbox tech; not foolproof (sandbox escapes); a substitute for trust rather than a measure of it. Mitigation of LLM issues: blocks many LLM attack vectors (cannot execute disallowed actions); ensures misaligned agents cannot exceed bounded capabilities. Notable uses: A2A security/sandbox recommendations; dockerized tool execution; OS-level app containment; tiered access control in AP2.

Table 1: Comparison of trust models for inter-agent protocol design across basis of trust, strengths/weaknesses, LLM-related mitigations, and representative uses.

Brief: Endorsements and Credentials

The Brief model grounds trust in attestations issued by trusted authorities or chains-of-trust. An agent presents signed credentials that assert properties such as identity, capability, or compliance, which others verify against issuer public keys or registries. Longstanding infrastructures—public-key certificates and web-of-trust schemes—illustrate the basic idea, and contemporary agent frameworks generalize it with verifiable credentials that bind claims to cryptographic identities and expiry policies.

This model assumes the existence of at least one trustworthy issuer and a secure binding between the credential and the agent’s cryptographic identity. It also presumes that credential semantics are sufficiently stable for the relevant decision window: credentials are issued, refreshed, and revoked on timescales that may lag behind real-time behavior. The main attack surfaces flow from these assumptions. If issuers are compromised, negligent, or captured, credentials can be misleading. If revocation is slow or fragile, credentials can outlive the agent’s good behavior. If credentials are not strongly bound to the agent’s key material, impersonation becomes possible. Finally, categorical credentials may fail to encode context, leading to overreach (for example, certification in one domain misapplied to another).

Despite these risks, Briefs excel at bootstrapping: they enable rapid, portable trust for discovery and initial contact. They mitigate LLM misrepresentation by replacing self-assertion with signed attestation; a prompt cannot conjure a valid third-party credential. They also facilitate accountability, because credentials and their verification events can be logged. The principal drawbacks are authority dependence, potential centralization, and coarse granularity: over-reliance on a small set of issuers creates choke points; credential states can lag behind behavior; and binary badges rarely reflect nuanced, dynamic competence. In practice, robust revocation, short credential lifetimes, issuer diversification, and strong identity binding are prerequisites for deploying the Brief model safely at scale.
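For concreteness, the following Python sketch shows the checks a verifier might run on a Brief before trusting it. It is a minimal illustration: the credential schema, issuer identifiers, and field names are our own assumptions rather than any standard’s normative format.

    import json
    import time
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Hypothetical issuer and agent keys; real deployments would distribute
    # issuer keys out of band (e.g., via a registry or trust framework).
    issuer_sk = Ed25519PrivateKey.generate()
    TRUSTED_ISSUERS = {"did:example:auditor-1": issuer_sk.public_key()}

    agent_sk = Ed25519PrivateKey.generate()
    agent_pub_hex = agent_sk.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw).hex()

    credential = {
        "issuer": "did:example:auditor-1",
        "subject_key": agent_pub_hex,       # binds the Brief to the agent's key
        "claim": "passed-safety-audit-v1",
        "expires_at": time.time() + 86400,  # short lifetime limits revocation lag
    }
    signature = issuer_sk.sign(json.dumps(credential, sort_keys=True).encode())

    def verify_brief(cred, sig, agent_key_hex):
        issuer_pk = TRUSTED_ISSUERS.get(cred.get("issuer"))
        if issuer_pk is None:
            return False                    # unknown or untrusted issuer
        if cred.get("expires_at", 0) < time.time():
            return False                    # stale credential
        if cred.get("subject_key") != agent_key_hex:
            return False                    # not bound to this agent's key material
        try:
            issuer_pk.verify(sig, json.dumps(cred, sort_keys=True).encode())
            return True
        except InvalidSignature:
            return False

    assert verify_brief(credential, signature, agent_pub_hex)

Each failure branch corresponds to one of the attack surfaces above: issuer compromise is contained by the allow-list, revocation lag by the expiry check, and impersonation by the key binding.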
Claim: Self-Descriptions of Identity, Policy, and Capability

The Claim model begins with what agents say about themselves. Protocols typically require an agent profile or “card” enumerating identity, version, skills, supported APIs, and operating policies. Such claims are necessary for discovery and routing; they advertise dynamic attributes that no external authority can continuously certify (for example, current availability or resource capacity).

Claim-based trust assumes a baseline of good faith or, at least, that misrepresentations will be uncovered downstream and discounted over time. On its own, this model is brittle in adversarial environments. Attackers can overclaim capabilities, craft deceptive profiles, or exploit ambiguous schemas. LLM agents may hallucinate competence or present inconsistent self-descriptions under prompt pressure. Even when claims are signed and fetched over secure channels to protect integrity in transit, their truthfulness remains unverified.

Where Claim shines is lightweight scalability and timeliness. It imposes the lowest infrastructural burden and supports rapidly changing or inherently subjective qualities. It also improves human and agent interpretability of intent: explicit operating policies and limitations in a profile can inform safe composition. However, Claim provides negligible direct mitigation for LLM fragilities. A model can profess safety yet be subverted by injection; it can intend to follow rules yet fail under distribution shift. Consequently, Claim should be treated as input to stronger mechanisms—filtering candidates for further vetting by credentials, proofs, reputation, or small-stake trials—rather than as a sufficient basis for critical decisions.
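As a concrete illustration, a Claim-style profile might look like the following. This is a hypothetical card loosely inspired by A2A’s AgentCard; every field name here is our own invention rather than the normative schema:

    # A self-description is just data; nothing in it is verified.
    agent_card = {
        "name": "travel-booker",
        "version": "1.4.2",
        "description": "Books flights and hotels within user-approved budgets.",
        "skills": ["flight_search", "hotel_booking", "refund_request"],
        "endpoint": "https://agents.example.com/travel-booker",
        "auth": ["oauth2", "mtls"],
        "policies": {"max_transaction_usd": 500, "human_approval_over_usd": 200},
        "availability": "24/7",  # dynamic attribute no authority can certify
    }
    # A consumer should treat each field as an unattested assertion and gate
    # consequential actions on stronger evidence (Briefs, Proofs, Stake).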
Proof: Cryptographic and Verifiable Evidence

The Proof model replaces promises and endorsements with verifiable evidence that an agent took particular actions or satisfied specified properties. Mechanisms include digital signatures and tamper-evident logs; attestations from trusted execution environments; and zero-knowledge proofs that establish correctness or compliance without revealing sensitive details. In blockchain-adjacent designs, agents may anchor hashes of actions on chain or submit zk-proofs that validate computations.

Proof-based trust presupposes verifiability: tasks must be amenable to specification and checking, or at least to attested execution in an enclave. It also relies on the soundness of the underlying cryptography and hardware. The core strength is trust minimization. Interacting parties need not know the agent’s history or reputation; they can accept or reject a transaction purely on the basis of a valid proof. For LLM agents, proofs counter several failure modes: tamper-evident logs deter denial and enable post-hoc accountability; proof-of-process or attested tool calls constrain the gap between “what the model claims it did” and “what actually executed”; and zk-proofs can certify compliance with policies (for example, “no personal data left the enclave”) without exposing internals.
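As a lightweight instance of tamper evidence, the sketch below maintains a hash-chained action log in Python. It is a minimal illustration under our own naming; a production design would additionally sign each entry and anchor periodic digests externally (e.g., on chain):

    import hashlib
    import json

    def append_entry(log, action):
        """Append an action; each entry commits to the entire prefix via prev hash."""
        prev_hash = log[-1]["hash"] if log else "0" * 64  # genesis sentinel
        body = {"action": action, "prev": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        log.append({**body, "hash": digest})

    def verify_log(log):
        """Recompute the chain; any retroactive edit breaks verification."""
        prev_hash = "0" * 64
        for entry in log:
            body = {"action": entry["action"], "prev": entry["prev"]}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != digest:
                return False
            prev_hash = entry["hash"]
        return True

    log = []
    append_entry(log, {"tool": "payments.charge", "amount_usd": 42})
    assert verify_log(log)
    log[0]["action"]["amount_usd"] = 4200  # tampering with history...
    assert not verify_log(log)             # ...is detected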
Nevertheless, proofs guarantee integrity, not alignment. An agent can correctly prove that it executed a harmful policy if the policy itself is flawed. Verification scope is therefore pivotal: protocols must specify what must be proved and at what granularity. Proof systems also incur costs—circuit design, proof generation time, hardware requirements for TEEs—and introduce new single points of failure in the cryptographic stack. Side-channel and supply-chain risks in hardware attestations, denial-of-service via expensive verifications, and partial logging (where only favorable actions are proved) represent additional attack surfaces.

In practice, Proof is best deployed surgically on high-impact steps: signing outputs; attesting privileged tool invocations; proving constraint satisfaction; and anchoring audit trails. Used this way, Proof significantly elevates security against LLM deception and hallucination while keeping overhead tractable. Its limitations argue for coupling with governance elements that assert what should be proved and Constraint mechanisms that restrict what needs proving in the first place.

Stake: Collateral, Slashing, and Incentive Alignment

The Stake model engineers trust through skin in the game. Agents post collateral—monetary or otherwise—subject to loss if they violate protocol rules or fail to deliver contracted outcomes. Slashing conditions can be adjudicated algorithmically, by on-chain verifiers, or via human or multi-agent arbitration. Stake is attractive in open environments where identities are cheap: it imposes a cost on misbehavior and a visible signal of commitment.

Stake presumes utility-maximizing agents who care about losing collateral, reliable fault determination, and appropriately calibrated stake-to-risk ratios. It performs poorly against purely malicious adversaries willing to burn stake for damage or where potential gains from cheating exceed the maximum slash. Attack surfaces include Sybil splitting (spreading risk across many small identities), collusion in adjudication, and last-mile betrayal (accumulating reputation under small transactions and defecting on a large one). Determining what to slash for is delicate: penalizing only intentional malice leaves negligent harm unpriced; penalizing accidents risks chilling honest participation.

Stake shines when combined with verifiability and feedback. Proofs and audits provide objective grounds for slashing; reputation raises the opportunity cost of misbehavior; and claim/credential gates can modulate required stake. For LLM agents, stake introduces an economic learning signal: repeated penalties for errors or unsafe actions create incentives to adopt safer prompts, tool use, and self-checks. However, stake is largely ex post; it cannot prevent a single catastrophic action if detection and adjudication occur later. Moreover, high stake requirements skew participation toward resource-rich actors, potentially centralizing an agent economy and erecting barriers to entry.

Designers should prefer progressive staking—small bonds for low-impact tasks, rising with privilege and potential externality—paired with transparent, appealable adjudication and anti-Sybil measures (for example, binding identities to keys with history, entry deposits, or credential prerequisites). When thoughtfully calibrated, Stake is a powerful complement that turns abstract norms into enforceable incentives.
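A minimal sketch of such a progressive-staking account follows; the tier bonds and slashing fractions are arbitrary illustrative choices, not calibrated values:

    from dataclasses import dataclass

    # Required bond grows with the action's privilege and potential externality.
    BOND_BY_TIER = {0: 0.0, 1: 10.0, 2: 1_000.0, 3: 100_000.0}

    @dataclass
    class StakeAccount:
        agent_id: str
        bonded: float = 0.0

        def can_act(self, tier: int) -> bool:
            return self.bonded >= BOND_BY_TIER[tier]

        def slash(self, fraction: float) -> float:
            """Apply an adjudicated penalty; returns the amount slashed, which a
            protocol might route to harmed users or to validators."""
            penalty = self.bonded * fraction
            self.bonded -= penalty
            return penalty

    acct = StakeAccount("agent-42", bonded=25.0)
    assert acct.can_act(1) and not acct.can_act(2)  # bonded for tier 1, not tier 2
    acct.slash(0.5)                                 # adjudicated fault halves the bond

Note that the adjudication step itself is outside the sketch: it is precisely the part that requires Proofs, audits, or arbitration to be trustworthy.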
Reputation: Distributed Feedback and Social Signals

The Reputation model aggregates interaction outcomes into a standing that others can query when selecting partners. Signals may be quantitative (scores, star ratings) or qualitative (reviews, endorsements), global or task-specific, and centrally stored or distributed. Reputation embraces the intuition that past behavior predicts future behavior and leverages the crowd to surface reliability and quality.

Its assumptions are well known: participants provide honest feedback; identities persist long enough for history to matter; and the environment features repeated games in which agents value future opportunities. In adversarial settings, classic vulnerabilities arise: Sybil attacks and ballot stuffing, collusion, defamation of competitors, whitewashing by discarding identities, and cold-start inequities for newcomers. Weighting schemes, reputation decay, and trust-in-the-reviewer algorithms mitigate but cannot eliminate these risks.

Reputation’s distinctive strengths are adaptivity and expressiveness. It can track multidimensional qualities, emphasize recency to reflect drift, and cover domains where formal verification is impractical or subjective judgments dominate. For LLM agents, reputation can proxy for robustness: agents that repeatedly succumb to prompt injection or hallucinate will accumulate negative feedback and be filtered out of high-value interactions. Reputation also functions as an implicit stake: agents who value their standing will refrain from opportunistic misbehavior.

However, reputation is a lagging indicator and can amplify inequalities. It neither prevents first-time catastrophes nor guarantees that a high-reputation agent will behave well when incentives flip. Over-reliance invites “reputation milking,” where an actor behaves well to amass trust and defects when the stakes are highest. Privacy and fairness concerns also arise if histories are globally visible and indelible. Consequently, reputation should be scoped (task-specific where possible), tempered (with decay and confidence intervals), and combined with controls that cap the damage any single interaction can cause.
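As one concrete tempering recipe, the sketch below combines a Beta-style score in the spirit of Josang and Ismail (2002) with exponential recency decay; the half-life and the uniform prior are illustrative assumptions:

    import time

    HALF_LIFE_DAYS = 30.0  # assumed recency half-life

    def decayed_score(feedback, now=None):
        """feedback: iterable of (unix_timestamp, success: bool).
        Returns (score in [0, 1], effective evidence mass)."""
        now = now if now is not None else time.time()
        r = s = 0.0
        for ts, success in feedback:
            w = 0.5 ** ((now - ts) / (HALF_LIFE_DAYS * 86400))  # old feedback fades
            r += w * success
            s += w * (not success)
        score = (r + 1) / (r + s + 2)  # Beta posterior mean under a uniform prior
        return score, r + s            # thin evidence => treat the score cautiously

A consumer can then require both a minimum score and a minimum evidence mass, which blunts whitewashing (fresh identities start at 0.5 with no evidence) and reputation milking (old good behavior decays).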
Constraint: Sandboxing and Capability Bounding

The Constraint model limits what agents can do, minimizing the need to predict what they will do. By enforcing least privilege, isolating execution, mediating tool access through audited gateways, and rate-limiting sensitive operations, protocols can bound harm even when agents misbehave or fail. Constraint shifts trust from the agent to the framework.

This model assumes we can identify dangerous resources and reliably restrict them. It depends on the soundness of sandboxing technologies and policy engines, and it introduces engineering overhead and potential performance costs. The attack surfaces are familiar from systems security: sandbox escapes, confused-deputy abuses of permitted interfaces, covert channels, and policy drift as capabilities evolve. A second-order risk is complacency: assuming constraints are perfect and neglecting monitoring, proofs, or incentives.

Constraint is highly effective against LLM-specific vulnerabilities. It limits the blast radius of prompt injection by narrowing action surfaces and validating inputs and outputs; it precludes privilege escalation by denying access to broad system interfaces; and it enables staged onboarding, where agents graduate from read-only sandboxes to broader permissions after demonstrating good behavior. Logging and mediated I/O support auditability and human-in-the-loop overrides for high-impact actions. The principal drawback is capability throttling: over-constrained agents underperform, and discovering the right balance between safety and autonomy is non-trivial. Moreover, constraints cannot neutralize harmful content within allowed channels unless paired with content-level policies and checks.

In practice, Constraint should be dynamic and graduated: protocols can encode “trust tiers” in which higher privileges require stronger credentials, more extensive proofs, higher stake, and better reputation. This aligns operational safety with demonstrated trustworthiness.
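A minimal mediating gateway might look as follows; the deny-by-default allow-list and per-tool rate limit are the essential ideas, while the names and limits are our own:

    import time
    from collections import defaultdict, deque

    class ToolGateway:
        """Mediates every effectful call: deny-by-default plus rate limiting."""

        def __init__(self, allowed_tools, max_calls_per_minute=10):
            self.allowed = dict(allowed_tools)  # tool name -> callable
            self.limit = max_calls_per_minute
            self.recent = defaultdict(deque)    # tool name -> recent call times

        def invoke(self, tool, **kwargs):
            if tool not in self.allowed:
                raise PermissionError(f"'{tool}' is outside this agent's capability bound")
            window = self.recent[tool]
            now = time.monotonic()
            while window and now - window[0] > 60:
                window.popleft()                # drop calls older than one minute
            if len(window) >= self.limit:
                raise RuntimeError(f"rate limit exceeded for '{tool}'")  # circuit breaker
            window.append(now)
            return self.allowed[tool](**kwargs)

    gw = ToolGateway({"search": lambda q: f"results for {q}"})
    gw.invoke("search", q="flights")
    # gw.invoke("delete_db") would raise PermissionError regardless of agent intent.

Because the gateway, not the model, enforces the boundary, a prompt-injected agent can request a disallowed action but cannot execute it.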
How Current Protocols Implement These Trust Models

Having defined the trust mechanisms, we turn to concrete protocols—A2A, AP2, ERC-8004, etc.—to see how they incorporate one or more of these models. Each protocol was created with slightly different goals and assumptions, which is reflected in its trust design.

Google’s A2A (Agent-to-Agent) Protocol

A2A is an open specification for inter-agent interoperability that standardizes how agents describe themselves and communicate (Surapaneni et al. 2025). Concretely, each agent exposes an AgentCard—typically a JSON document at a well-known endpoint—that advertises identity, capabilities, and contact details, and it exchanges requests and responses over a secured JSON-RPC channel layered on HTTPS with mutual authentication. In trust-model terms, A2A natively privileges Claim (the self-described AgentCard) and Constraint (enterprise controls and least-privilege network policy), with Brief appearing through transport-level credentials such as TLS certificates and OAuth tokens. It does not prescribe ecosystem-wide Reputation, Stake, or cryptographic Proof of correct computation, and it leaves discovery, vetting, and risk policy to deployments.

This design makes A2A easy to adopt in organizational settings where participants are known a priori: enterprises can gate traffic with allow-lists, bind AgentCards to domains and service accounts, route calls through API gateways, and record exhaustive logs for audit. Strengths therefore include pragmatic interoperability, compatibility with existing identity and access-management stacks, and clear operational observability. However, these same choices limit A2A in open, adversarial environments: unverified AgentCards place weight on declarations rather than evidence; the absence of protocol-level staking, validation, or global feedback weakens Sybil and collusion resistance; and reliance on perimeter controls does little to mitigate LLM-specific failures such as prompt injection or sycophancy at runtime. In practice, A2A benefits from pairing with higher-assurance layers—credential briefs for identity attestation, reputation directories for discovery, proof-or-stake validation for high-impact actions, and hardened sandboxes for tool use—so that its transport and schema can operate within a richer, defense-in-depth trust fabric.

Agent Payments Protocol (AP2)

AP2 is a payments-oriented standard that enables AI agents to initiate and complete commerce on behalf of users while preserving accountability (Parikh and Surapaneni 2025). Its core abstraction is the Mandate, a verifiable credential that captures explicit authorization and context (for example, an intent cap, a cart summary, and whether a human was present), co-signed by relevant parties and presented with each transaction. Trust-model integration is therefore explicit: Brief and Proof are used to bind agent actions to real entities via signed mandates and verifiable identities; Constraint is enforced through role separation and tokenization (agents do not directly handle sensitive payment credentials and interact via scoped intermediaries); and allow-listed participants provide an initial, curated Claim/Brief layer while the ecosystem matures toward broader domain and mTLS-based assurance. Although AP2 does not standardize Reputation or Stake on the wire, it anticipates their use off-path: transaction outcomes and chargebacks feed risk engines that effectively act as reputation signals, and liability models, insurance, or performance bonds can supply economic skin-in-the-game for higher-risk cohorts.

The approach’s principal strengths are strong ex-ante consent capture, cryptographically auditable traces, and a clear chain of responsibility that supports dispute resolution and regulatory compliance; in short, it operationalizes “trust but verify” for financial actions. Its weaknesses arise from the same controls: early reliance on curated allow-lists creates onboarding friction and may concentrate platform power; verifiable-credential issuance and policy evaluation add latency and implementation cost; mandate proofs attest authorization, not correctness of upstream agent reasoning (so LLM hallucinations must be contained by separate constraints); and the absence of standardized staking or validator quorums means runtime misbehavior is addressed mainly via ex-post risk management rather than protocol-mandated slashing. AP2 works best as the transaction spine in a layered architecture where reputational routing and optional stake-backed validations gate expensive or high-impact payments, and where agents operate under strict capability bounds to blunt prompt-level exploits.
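To illustrate the Mandate idea, a merchant-side check might look like the following. This is our simplified rendering: the field names, caps, and verify_signature hook are assumptions rather than the AP2 wire format.

    def check_mandate(mandate, charge_usd, verify_signature):
        """mandate: a signed intent credential; verify_signature: a callable that
        checks the user's signature over the mandate body (e.g., via a VC library)."""
        if not verify_signature(mandate):
            return False  # authorization must be cryptographically bound to the user
        if charge_usd > mandate["intent_cap_usd"]:
            return False  # charge exceeds the cap the user actually authorized
        if not mandate["human_present"] and charge_usd > mandate["autonomous_cap_usd"]:
            return False  # unattended purchases face a tighter cap
        # Note: this attests authorization, not the correctness of the agent's
        # upstream reasoning; hallucinated carts need separate constraints.
        return True

    mandate = {"intent_cap_usd": 500, "autonomous_cap_usd": 100, "human_present": False}
    assert check_mandate(mandate, 80, lambda m: True)
    assert not check_mandate(mandate, 250, lambda m: True)  # over the unattended cap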
Ethereum ERC-8004 (Trustless Agents)

ERC-8004 proposes a decentralized trust layer for agent discovery and assurance built around three composable registries (Rossi et al. 2025). An on-chain Identity Registry assigns each agent a persistent handle (typically an NFT) that links to off-chain metadata such as an AgentCard, enabling portable identification across contexts; a Reputation Registry aggregates structured feedback about agent performance; and a Validation Registry coordinates third-party checks—re-execution, TEE attestation, or zero-knowledge proofs—often collateralized by stake so that faulty or dishonest behavior can be penalized. The standard makes trust modalities explicit: agents declare “supportedTrust” capabilities (e.g., reputation, crypto-economic validation, TEE attestation), thereby advertising which evidence they can furnish. In trust-model terms, ERC-8004 integrates Claim/Brief (self-descriptions anchored in on-chain identity and, optionally, external credentials), Reputation (shared, queryable feedback), and Proof+Stake (verifiable validation with economic consequences). Constraint is indirect but meaningful: when consequential actions are mediated by smart contracts, the agent’s latitude is bounded by program logic and policy.

The architecture’s strengths are transparency, composability, and trust portability: identities and attestations are addressable; validations can be tailored per task; and reputational data can be consumed by diverse clients or even proven in privacy-preserving ways. By attaching stake to validation, the design aligns incentives for honest behavior and encourages ecosystem watchdogs. Its weaknesses reflect blockchain trade-offs and open-system realities: on-chain interactions add cost and latency; public data raises privacy concerns unless accompanied by selective-disclosure mechanisms; validators and feedback channels themselves become targets for Sybil and collusive manipulation; and, critically, the benefits are optional unless counterparties require validation or minimum reputation thresholds. Moreover, proofs attest properties of computations, not normative desirability, so misaligned but formally correct behavior remains a governance problem. ERC-8004 is most effective when embedded as a normative requirement for high-impact workflows, while lower-stakes interactions rely on identity and reputation alone.
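A hypothetical client-side selection flow over the three registries might look like this; we sketch it in Python rather than Solidity, and the registry interfaces and method names are our assumptions, not the ERC-8004 ABI:

    def select_agent(identity_reg, reputation_reg, validation_reg, candidates, high_impact):
        """Compose identity, reputation, and validation evidence to pick an agent."""
        eligible = []
        for agent_id in candidates:
            card = identity_reg.resolve(agent_id)  # on-chain handle -> metadata
            if card is None:
                continue                           # unknown identity: skip
            score, evidence = reputation_reg.summary(agent_id)
            if evidence < 5:
                score = 0.0                        # discount thin feedback histories
            if high_impact and not validation_reg.has_recent_validation(
                    agent_id, methods=("re-execution", "tee-attestation", "zk-proof")):
                continue  # high-impact work requires stake-backed validation,
                          # not reputation alone
            eligible.append((score, agent_id))
        return max(eligible)[1] if eligible else None

The key property is that the client, not the agent, decides which evidence is mandatory for a given impact level, which is exactly where the standard’s optionality otherwise becomes a weakness.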
Discussion

Protocol Design Implications

Drawing on our comparative analysis, we derive the following design implications.

Tiered trust and “trustless-by-default” for high-impact actions. Not all interactions justify heavy machinery. Systems should implement adaptive tiers (T0–T3) in which stricter controls and stronger evidence are automatically required as potential harm rises. Low-stakes activities (read-only queries, small reversible writes) can rely on Claims and Briefs to maximize throughput, while high-stakes actions escalate to Proofs (signatures, attestations, zero-knowledge where feasible), multi-party validation, and substantial Stake/insurance. Crucially, transitions between tiers must be enforced by policy: when an agent attempts an action whose blast radius exceeds its current tier, the platform intercepts and upgrades the required checks before execution.

Identity and Briefs as the substrate for discovery and accountability. Verifiable identity is not the same as trustworthiness, but it is a prerequisite for traceability, audit, and remediation. Agents should expose durable identifiers and support verifiable credentials (“Briefs”) that attest to properties such as domain expertise, safety audits, regulatory licenses, or organizational affiliation. Pseudonyms may be tolerated at low tiers; high tiers should bind agents to legal entities or qualified pseudonyms to enable recourse. This encourages healthy credential ecosystems (auditors, certifiers, registries) and lets policymakers tie thresholds (e.g., transaction size) to identity requirements.

Hybrid by default, configurable per task. Because verification affordances vary across domains, systems should compose multiple models—Reputation for discovery, Briefs for eligibility, Stake for incentives, Proofs and Constraint for execution guarantees—and allow policy-per-action configuration. Architecturally this implies modular “trust hooks” (proof verification, reputation lookup, staking and slashing, sandbox provisioning) and declarative policies that choose which hooks to invoke for each operation. A practical maxim follows: start trustless, then relax—assume nothing, then introduce the minimum additional assumption needed to complete the task.

Reputation as a layered signal, never a single gate. Reputation is invaluable for routing and prioritization but is vulnerable to collusion, Sybil attacks, and domain shift. Use it to influence ranking, rate limits, and sampling of secondary checks—not to waive core safety guarantees. Design for multi-dimensional reputation (accuracy, responsiveness, security compliance), decay (to reflect inactivity and model updates), transitive weighting (trust of raters), and anomaly detection (to flag correlated or suspicious feedback). Couple reputation drops with automatic escalation (e.g., more frequent proofs or audits).

Incentive alignment via Stake and insurance. Economic skin-in-the-game should scale with risk. Require agents (or their principals) to post bonds proportional to potential harm; slash objectively when violations occur; and explore insurance pools that underwrite agent performance. Codify slashing conditions ex ante and make them verifiable (e.g., via attested logs or challenge-response protocols). Route slashed funds to affected users or validators to incentivize oversight. Guard against gaming (e.g., orchestrated slashing) by grounding penalties in objective, reproducible fault conditions.

Hard Constraints and least privilege as non-negotiables for LLM agents. Treat agent inputs as adversarial and outputs as untrusted until checked. Run effectful actions in sandboxes with ephemeral credentials, narrow scopes, and time-bounded permissions; enforce rate limits and circuit breakers; and maintain a separation between planning (LLM) and acting (tool runner with policy guardrails). Monitor for policy violations at runtime and automatically quarantine misbehaving agents or sessions. Constraints are the last line of defense when other trust layers fail.

Contextual, domain-specific trust zones. Sectors differ in harm models and legal obligations. Support trust zones with domain-tailored requirements (e.g., healthcare agents must hold clinical credentials, operate under human oversight, and log to compliant archives; creative or gaming agents can tolerate lighter regimes). Provide gateways for controlled inter-zone interaction, and ensure hooks for regulatory audit and legal evidence across zones.

Continuous monitoring, auditability, and non-accumulating trust. Trust should be earned repeatedly. Maintain append-only action logs and signed receipts; sample outputs for random audits; and re-baseline trust when material conditions change (new model weights, ownership changes, or security incidents). Decay stale credentials, require periodic restaking, and treat major updates as probationary periods with elevated scrutiny.

Design Guidelines: A Tiered Blueprint (T0–T3)

This tiered blueprint offers a risk-calibrated, modular framework for systematically applying hybrid trust models across tasks and risk levels in the agentic web (a minimal policy sketch follows the tier descriptions below).

T0 — Low-stakes discovery and read-only use. Enable frictionless interoperability for negligible-impact tasks (e.g., public queries, draft generation). Rely on Claims and available Briefs for discovery; enforce soft constraints (read-only credentials, rate limits, allow-lists). Logging is best-effort; reputation may inform routing but must not gate access.

T1 — Moderate stakes with accountability. Permit limited writes or small, reversible payments with explicit attribution. Require authenticated, signed intents; narrowly scoped, reversible permissions; durable receipts in secure logs. Use small, refundable bonds and minimal reputation thresholds to lift probation caps; throttle or trigger secondary checks on anomalies.

T2 — High stakes with strong assurance. For materially consequential actions, adopt a “verify relentlessly” posture. Re-justify each transaction via TEE attestations, zero-knowledge or interactive proofs, or quorum validation; enforce deny-by-default, fine-grained, time-boxed privileges with continuous monitoring. Calibrate stake/insurance to worst-case loss; maintain immutable audit trails; reserve human review for exception paths. Reputation may rank eligible agents but never substitutes for proofs.

T3 — Critical or life-critical with multi-layer oversight. In safety-critical or ethically sensitive domains, stack all mechanisms: regulatory-grade credentials and institutional accountability; redundant agents or multi-signature approvals; human-in/on-the-loop gating with physical/procedural fail-safes; non-overridable hard limits; comprehensive, privacy-preserving observability. Extend liability beyond agent stakes to organizational insurance; engineer for graceful degradation and rapid intervention.

Modularity and invariants. Tiers are composable rather than siloed: moving upward adds evidence, incentives, and containment rather than merely “more of the same.” Two invariants span all tiers: (i) least privilege at the capability boundary (minimal, time-boxed powers), and (ii) evidence-first accountability at the audit boundary (signed state and reproducible logs enabling independent verification). Adopting this blueprint as a default, with local refinements, preserves resilience and accountability under adversarial conditions.
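The blueprint can be distilled into an executable gate; the thresholds and evidence labels below are illustrative assumptions, not normative values:

    REQUIRED_EVIDENCE = {
        0: set(),                                                  # T0: claims suffice
        1: {"signed_intent", "reversible_scope", "small_bond"},    # T1
        2: {"signed_intent", "attestation_or_proof", "risk_scaled_stake"},  # T2
        3: {"signed_intent", "attestation_or_proof", "risk_scaled_stake",
            "human_approval", "regulatory_credential"},            # T3
    }

    def tier_for(impact_usd, reversible):
        if impact_usd == 0 and reversible:
            return 0
        if impact_usd < 100 and reversible:
            return 1
        return 2 if impact_usd < 100_000 else 3

    def authorize(action, evidence):
        """Intercept and escalate: an action may not run below its tier's checks."""
        tier = tier_for(action["impact_usd"], action["reversible"])
        missing = REQUIRED_EVIDENCE[tier] - evidence
        if missing:
            raise PermissionError(f"tier T{tier} requires: {sorted(missing)}")
        return True

    authorize({"impact_usd": 0, "reversible": True}, set())  # T0 passes freely
    # authorize({"impact_usd": 5_000, "reversible": False}, {"signed_intent"})
    # would raise, listing the missing T2 evidence.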
Conclusion

The near future of the agentic web will be shaped by protocols that treat hybrid trust as infrastructure: beginning with verification and containment, layering aligned incentives and institutional accountability, and only then exploiting social signal to regain efficiency. A tiered, composable, and continuously recalibrated approach enables low-friction exploration where impact is negligible and “verify relentlessly” where stakes are high, allowing autonomous agents to transact with the assurance we expect of conscientious human institutions.

References

Braga, D. D. S.; Niemann, M.; Hellingrath, B.; and Neto, F. B. D. L. 2019. Survey on Computational Trust and Reputation Models. ACM Computing Surveys, 51(5): 1–40.

Carlsmith, J. 2024. Is Power-Seeking AI an Existential Risk? arXiv:2206.13353.

Cherep, M.; Maes, P.; and Singh, N. 2025. LLM Agents Are Hypersensitive to Nudges. arXiv:2505.11584.

Freiman, O. 2023. Making Sense of the Conceptual Nonsense ‘Trustworthy AI’. AI and Ethics, 3(4): 1351–1360.

Friedman, E. J.; and Resnick, P. 2001. The Social Cost of Cheap Pseudonyms. Journal of Economics & Management Strategy, 10(2): 173–199.
Hubinger, E.; Denison, C.; Mu, J.; Lambert, M.; Tong, M.; MacDiarmid, M.; Lanham, T.; Ziegler, D. M.; Maxwell, T.; Cheng, N.; Jermyn, A.; Askell, A.; Radhakrishnan, A.; Anil, C.; Duvenaud, D.; Ganguli, D.; Barez, F.; Clark, J.; Ndousse, K.; Sachan, K.; Sellitto, M.; Sharma, M.; DasSarma, N.; Grosse, R.; Kravec, S.; Bai, Y.; Witten, Z.; Favaro, M.; Brauner, J.; Karnofsky, H.; Christiano, P.; Bowman, S. R.; Graham, L.; Kaplan, J.; Mindermann, S.; Greenblatt, R.; Shlegeris, B.; Schiefer, N.; and Perez, E. 2024. Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training. arXiv:2401.05566.

Josang, A.; and Ismail, R. 2002. The Beta Reputation System. In Proceedings of the 15th Bled Electronic Commerce Conference, volume 5, 2502–2511.

Kamvar, S. D.; Schlosser, M. T.; and Garcia-Molina, H. 2003. The EigenTrust Algorithm for Reputation Management in P2P Networks. In Proceedings of the Twelfth International Conference on World Wide Web (WWW ’03), 640. Budapest, Hungary: ACM Press. ISBN 978-1-58113-680-7.

Liu, Y.; Deng, G.; Li, Y.; Wang, K.; Wang, Z.; Wang, X.; Zhang, T.; Liu, Y.; Wang, H.; Zheng, Y.; and Liu, Y. 2024. Prompt Injection Attack against LLM-integrated Applications. arXiv:2306.05499.

Lynch, A.; Wright, B.; Larson, C.; Ritchie, S. J.; Mindermann, S.; Hubinger, E.; Perez, E.; and Troy, K. 2025. Agentic Misalignment: How LLMs Could Be Insider Threats. arXiv:2510.05179.

Manzini, A.; Keeling, G.; Marchal, N.; McKee, K. R.; Rieser, V.; and Gabriel, I. 2024. Should Users Trust Advanced AI Assistants? Justified Trust As a Function of Competence and Alignment. In The 2024 ACM Conference on Fairness, Accountability, and Transparency, 1174–1186. Rio de Janeiro, Brazil: ACM. ISBN 979-8-4007-0450-5.

Marsh, S. P. 1994. Formalising Trust as a Computational Concept. PhD thesis, University of Stirling.

O’Neill, O. 2002. Autonomy and Trust in Bioethics. Cambridge University Press, 1st edition. ISBN 978-0-521-81540-6.

O’Neill, O. 2018. Linking Trust to Trustworthiness. International Journal of Philosophical Studies, 26(2): 293–300.

OpenAI. 2024. GPT-4o System Card. arXiv:2410.21276.

O’Neill, O.; and Bardrick, J. 2015. Trust, Trustworthiness and Transparency. Brussels: European Foundation Centre.

Parikh, S.; and Surapaneni, R. 2025. Powering AI Commerce with the New Agent Payments Protocol (AP2).

Raskar, R.; Chari, P.; Zinky, J.; Lambe, M.; Grogan, J. J.; Wang, S.; Ranjan, R.; Singhal, R.; Gupta, S.; Lincourt, R.; Bala, R.; Joshi, A.; Singh, A.; Chopra, A.; Stripelis, D.; B, B.; Kumar, S.; and Gorskikh, M. 2025. Beyond DNS: Unlocking the Internet of AI Agents via the NANDA Index and Verified AgentFacts. arXiv:2507.14263.

Rettenberger, L.; Reischl, M.; and Schutera, M. 2025. Assessing Political Bias in Large Language Models. Journal of Computational Social Science, 8(2): 42.

Rossi, M. D.; Crapis, D.; Ellis, J.; and Reppel, E. 2025. ERC-8004: Trustless Agents: Discover Agents and Establish Trust through Reputation and Validation.

Sabater, J. 2004. Evaluating the ReGreT System. Applied Artificial Intelligence, 18(9–10): 797–813.

Sharma, M.; Tong, M.; Korbak, T.; Duvenaud, D.; Askell, A.; Bowman, S. R.; Cheng, N.; Durmus, E.; Hatfield-Dodds, Z.; Johnston, S. R.; Kravec, S.; Maxwell, T.; McCandlish, S.; Ndousse, K.; Rausch, O.; Schiefer, N.; Yan, D.; Zhang, M.; and Perez, E. 2025. Towards Understanding Sycophancy in Language Models. arXiv:2310.13548.

Shen, T.; Jin, R.; Huang, Y.; Liu, C.; Dong, W.; Guo, Z.; Wu, X.; Liu, Y.; and Xiong, D. 2023.
Large Language Model Alignment: A Survey. arXiv:2309.15025.

Surapaneni, R.; Jha, M.; Vakoc, M.; and Segal, T. 2025. Announcing the Agent2Agent Protocol (A2A).

Teacy, W. T. L.; Patel, J.; Jennings, N. R.; and Luck, M. 2006. TRAVOS: Trust and Reputation in the Context of Inaccurate Information Sources. Autonomous Agents and Multi-Agent Systems, 12(2): 183–198.

Xu, Z.; Jain, S.; and Kankanhalli, M. 2025. Hallucination Is Inevitable: An Innate Limitation of Large Language Models. arXiv:2401.11817.

Yang, Y.; Ma, M.; Huang, Y.; Chai, H.; Gong, C.; Geng, H.; Zhou, Y.; Wen, Y.; Fang, M.; Chen, M.; Gu, S.; Jin, M.; Spanos, C.; Yang, Y.; Abbeel, P.; Song, D.; Zhang, W.; and Wang, J. 2025. Agentic Web: Weaving the Next Web with AI Agents. arXiv:2507.21206.

Disclosure of the Usage of LLMs

We used ChatGPT (GPT-5 model (OpenAI 2024)) to facilitate the writing of this manuscript. The usage includes:
• Turning Excel-format tables into LaTeX-format tables
• Correcting grammar mistakes and spelling
• Deep research on various protocols
• Polishing the existing writing