SoK: Security of Autonomous LLM Agents in Agentic Commerce

Qian’ang Mao∗, Jiaxin Wang∗, Ya Liu∗, Li Zhu∗, Cong Ma†‡, Jiaqi Yan∗
∗Nanjing University †Southern University of Science and Technology ‡City University of Hong Kong

arXiv:2604.15367v1 [cs.CR] 15 Apr 2026

Abstract—Autonomous large language model (LLM) agents such as OpenClaw are pushing agentic commerce from human-supervised assistance toward machine actors that can negotiate, purchase services, manage digital assets, and execute transactions across on-chain and off-chain environments. Protocols such as the Trustless Agents standard (ERC-8004), Agent Payments Protocol (AP2), the HTTP 402-based payment protocol (x402), Agent Commerce Protocol (ACP), the Agentic Commerce standard (ERC-8183), and Machine Payments Protocol (MPP) enable this transition, but they also create an attack surface that existing security frameworks do not capture well. This Systematization of Knowledge (SoK) develops a unified security framework for autonomous LLM agents in commerce and finance. We organize threats along five dimensions: agent integrity, transaction authorization, inter-agent trust, market manipulation, and regulatory compliance. From a systematically curated public corpus of academic papers, protocol documents, industry reports, and incident evidence, we derive 12 cross-layer attack vectors and show how failures propagate from reasoning and tooling layers into custody, settlement, market harm, and compliance exposure. We then propose a layered defense architecture addressing authorization gaps left by current agent-payment protocols. Overall, our analysis shows that securing agentic commerce is inherently a cross-layer problem that requires coordinated controls across LLM safety, protocol design, identity, market structure, and regulation. We conclude with a research roadmap and a benchmark agenda for secure autonomous commerce.
Index Terms—Large Language Models, Autonomous Agents, Financial Security, Agentic Commerce, Blockchain, Prompt Injection, Machine-to-Machine Payments

1. Introduction

The financial industry has long been at the forefront of adopting computational automation, from early algorithmic trading systems [1], [2] to modern high-frequency trading platforms. However, a new paradigm is emerging that fundamentally alters the relationship between artificial intelligence (AI) and financial decision-making: fully autonomous large language model (LLM)-based agents that operate without continuous human oversight, control their own digital wallets or payment credentials, and execute financial transactions independently [3]–[5].

Unlike traditional automated trading systems that follow pre-programmed rules, these agents leverage the reasoning, planning, and natural language understanding capabilities of large language models to interpret market conditions, negotiate with counterparties (including other agents), and adapt their strategies in real time [6]–[8]. Projects such as OpenClaw [3] (formerly Clawdbot) exemplify this trend, providing open-source frameworks for deploying LLM agents that can autonomously manage cryptocurrency portfolios, execute decentralized finance (DeFi) trades, and interact with smart contracts on Ethereum and other blockchains [4], [9].

This shift toward full autonomy is accelerated by the emergence of machine-to-machine payment protocols. Ethereum’s Trustless Agents standard (ERC-8004) enables agents to hold and transfer tokens through standardized smart contract interfaces [10]. The Agent Payments Protocol (AP2) provides a framework for authenticated, verifiable payments between autonomous agents [11]. The HTTP 402-based payment protocol (x402) embeds payment capabilities directly into HTTP requests, enabling agents to pay for API calls, data feeds, and computational resources without human authorization [10].
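This pay-and-retry pattern is simple enough to sketch. The fragment below is illustrative only: the JSON fields and the receipt header name are our own placeholders, not normative x402 identifiers.

```python
# Illustrative sketch of x402-style payment handling, assuming a 402 response
# whose body carries machine-readable payment instructions as JSON. Field and
# header names here are placeholders, not normative x402 identifiers.
import json

def handle_response(status: int, body: bytes, wallet):
    """Decide how an agent proceeds after an HTTP response.

    Returns ("done", payload) for a 2xx response, or ("retry", headers)
    after autonomously paying a 402 Payment Required demand.
    """
    if 200 <= status < 300:
        return ("done", body)
    if status == 402:
        # The 402 body carries machine-readable payment instructions.
        instructions = json.loads(body)
        receipt = wallet.pay(instructions["payTo"], instructions["amount"])
        # Retry the request with proof of payment attached.
        return ("retry", {"X-Payment-Receipt": receipt})
    raise RuntimeError(f"unexpected status {status}")

class StubWallet:
    """Stand-in wallet that records payments and issues a receipt string."""
    def __init__(self):
        self.paid = []

    def pay(self, pay_to: str, amount: str) -> str:
        self.paid.append((pay_to, amount))
        return f"receipt:{pay_to}:{amount}"
```

A real deployment would, of course, check the demanded amount against a spending policy before calling `wallet.pay`; §4.2 returns to exactly this authorization gap.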
Tempo’s deployment of the Machine Payments Protocol (MPP) extends this model with a rail-agnostic challenge–credential–receipt flow over HTTP 402, supporting both one-time charges and session-based pay-as-you-go channels for APIs, Model Context Protocol (MCP) tools, and streamed services [12], [13]. Together, these protocols form the infrastructure of an emerging agentic economy [14], [15] in which billions of dollars may flow through agent-mediated channels with minimal human oversight.

Despite the rapid growth of this ecosystem, security research remains fragmented across several disconnected communities. The LLM security community focuses on prompt injection, jailbreaking, and alignment [16], but often treats financial applications as merely another use case without accounting for the unique properties of financial systems (irreversibility of transactions, regulatory requirements, systemic risk). The blockchain security community addresses smart contract vulnerabilities and DeFi exploits [17] but has yet to grapple with the implications of LLM-controlled digital wallets and payment credentials. The financial technology (FinTech) research community examines AI-driven trading strategies and investment management applications [18]–[20] but largely assumes human oversight as a given. The multi-agent systems community has a rich history of studying agent-mediated commerce [21]–[23], but its frameworks predate the capabilities and vulnerabilities of LLM-based agents.

This fragmentation creates dangerous blind spots. An autonomous financial agent is simultaneously an LLM (vulnerable to prompt injection), a blockchain or payment-network actor (subject to settlement and execution risks), a financial intermediary (bound by regulatory requirements), and a participant in a multi-agent ecosystem (susceptible to strategic manipulation by other agents).
No existing security framework addresses this full stack of concerns. A compromise at any layer, whether it is a prompt injection that triggers an unauthorized trade, a malicious integration that drains an agent’s digital wallet, or a coordinated attack by adversarial agents that manipulates market prices, can have cascading consequences that propagate across layers [17].

This paper makes the following contributions:

• Unified Threat Taxonomy. We present a comprehensive taxonomy of security threats specific to autonomous LLM agents in financial automation, organized across five dimensions: agent integrity, transaction authorization, inter-agent trust, market manipulation, and regulatory compliance (§4).
• Cross-Layer Analysis. We analyze how vulnerabilities at one layer (e.g., prompt injection at the LLM layer) propagate into harm at another (e.g., unauthorized token transfers at the blockchain layer), and we characterize 12 cross-layer attack vectors together with their adversary preconditions and mitigations (§5).
• Protocol Security Assessment. We assess emerging agent-payment protocols (ERC-8004, AP2, x402, MPP) from the perspective of autonomous deployment, identifying protocol-level weaknesses that are manageable in human-operated settings but dangerous in autonomous ones (§4).
• Defense Framework. We propose a compact layered defense architecture spanning prompt hardening, payment authorization, tool provenance, decentralized identity, and market-level safeguards, and we relate its coverage to the threat taxonomy (§5.4).
• Corpus-Grounded Synthesis. We assemble and analyze a systematically curated public corpus spanning academic papers, protocol documents, industry reports, and incident evidence, and use it to ground the threat taxonomy, protocol assessment, and comparative analysis.

2. Background and Terminology

2.1.
Defining Autonomous Financial Agents

The term “AI agent” has been applied to systems ranging from simple chatbots to sophisticated multi-step planners [24], [25]. We adopt a precise definition tailored to the financial domain:

An autonomous financial agent is a software system powered by one or more large language models that (1) maintains persistent state including financial assets and payment instruments such as digital wallets, accounts, or delegated payment credentials, (2) independently plans and executes financial transactions, (3) operates without requiring per-transaction human approval, and (4) interacts with external systems including blockchains, payment networks, exchanges, and other agents.

This definition intentionally uses digital wallets as an umbrella term covering on-chain wallets, custodial stored-value accounts, and delegated payment credentials, rather than only crypto-native custody. It excludes AI-assisted trading tools that require human confirmation (which we term co-pilot systems [20], [26]), traditional algorithmic trading bots that follow fixed rules (which we term programmatic traders [1]), and LLM-based chatbots that provide financial advice but cannot execute transactions (which we term advisory agents [20], [27]).

2.2. Key Systems and Frameworks

2.2.1. OpenClaw and Clawdbot. OpenClaw [3] (formerly known as Clawdbot) is an open-source framework that enables the deployment of fully autonomous LLM agents with blockchain-based digital wallet capabilities. It provides a modular architecture in which LLM reasoning is connected to blockchain transaction execution through a plugin system. OpenClaw exemplifies the “agent-as-wallet-holder” paradigm in which the LLM directly controls private keys or has delegated signing authority [5], [17].

2.2.2. ERC-8004. ERC-8004 is an Ethereum standard that defines a smart contract interface for agent-controlled token operations [10].
Unlike traditional ERC-20 token transfers that assume a human signer, ERC-8004 introduces agent identity verification, spending limits, and revocation mechanisms designed for machine-to-machine interactions. The standard enables smart contracts to distinguish between human-initiated and agent-initiated transactions and to apply different authorization policies accordingly.

2.2.3. Agent Payments Protocol (AP2). AP2 [11] is a protocol layer built on top of existing payment rails (both blockchain and traditional) that provides standardized mechanisms for agent-to-agent payment negotiation, execution, and settlement. AP2 introduces the concept of payment intents, machine-readable descriptions of desired payment outcomes that agents can negotiate over before committing to a transaction.

2.2.4. Virtuals Protocol and Agent Commerce Protocol (ACP-Commerce). Terminology note. We disambiguate three overloaded acronyms throughout this paper. ACP as used in this paper refers exclusively to the Agent Commerce Protocol by Virtuals Protocol [28], a settlement and escrow coordination layer (we call this ACP-Commerce when disambiguation is needed). Unrelated concurrent work uses “ACP” for an Agent Control Protocol, a deterministic pre-action authorization fabric [29]; we denote that protocol ACP-Control. Similarly, PDR in this paper means Payment Delivery Receipt (a post-settlement cryptographic proof of payment completion, as formalized in [30]); unrelated literature uses PDR for Policy Decision Record. We use “PDR” exclusively in the payment-delivery sense throughout.

Virtuals Protocol [31] is a decentralized infrastructure platform built on Base (an Ethereum Layer 2) that enables the creation, co-ownership, tokenization, and monetization of autonomous AI agents.
Its cognitive engine, the GAME (Generative Agents with Modular Execution) framework [32], provides a modular decision-making architecture separating task planning from execution.

Central to Virtuals is the Agent Commerce Protocol (ACP) [28], a standardized coordination and settlement layer for agent-to-agent commerce. ACP operates in four phases: (1) negotiation, where agents agree on terms and produce a cryptographically signed Proof of Agreement; (2) transaction, where payments and deliverables are held in escrow; (3) evaluation, where specialized evaluator agents assess whether deliverables meet terms; and (4) settlement, where funds are released or returned based on evaluation. This protocol introduces a novel trust primitive, the evaluator agent, that enables trust in subjective or non-deterministic tasks but simultaneously introduces a new attack surface if the evaluator itself is compromised.

2.2.5. ERC-8183: Agentic Commerce Standard. Building on ACP’s operational experience, Crapis et al. proposed ERC-8183 [33] in March 2026 as an Ethereum standard for trustless commercial transactions between AI agents. ERC-8183 defines a core “Job” primitive with three roles (Client, Provider, Evaluator) and a state machine (Open → Funded → Submitted → Terminal). The standard is extensible via hooks, optional smart contracts for custom logic such as milestone payments, bidding, and reputation updates, and integrates with ERC-8004 for portable on-chain reputation. The proposal was motivated by what the ERC-8183 authors describe as over $3M in agent-to-agent transactions observed on the Virtuals/ACP platform without any escrow or verification mechanism [33], a figure that is unaudited but directionally indicative of the scale of unprotected commerce.

2.2.6. x402 Protocol. The x402 protocol, initiated by Coinbase and analyzed in the agentic-commerce context by [10], embeds payment capabilities into the HTTP protocol itself.
When an agent makes an HTTP request to a resource that requires payment, the server responds with a 402 Payment Required status code along with machine-readable payment instructions. The agent can then autonomously fulfill the payment and retry the request. This protocol enables seamless pay-per-use access to APIs, data feeds, and computational services without pre-established billing relationships.

2.2.7. Tempo and Machine Payments Protocol (MPP). Tempo is a payments-first blockchain optimized for low-cost stablecoin settlement and inline machine payments [12]. On top of Tempo, the Machine Payments Protocol (MPP) is an open standard co-authored by Stripe and Tempo that standardizes a challenge–credential–receipt flow over HTTP 402 and extends it to MCP transports [13], [34]. On Tempo, MPP supports both charge intents for one-time payments and session intents that open escrow-backed channels and use off-chain signed vouchers for near-zero-latency pay-as-you-go billing [12], [13]. This design makes MPP especially relevant for monetized APIs, MCP tool invocations, and streamed AI services. Adjacent protocol proposals are already exploring privacy-preserving settlement and explicit human-override semantics for agent commerce, as illustrated by AESP [35]. At the implementation layer, these payment flows also intersect with lower-level signing primitives such as typed structured-data signing for wallet- or credential-bound intents and signed HTTP request binding [36], [37].

These contemporary protocols extend a much older line of agent-mediated payment research, including secure delegated payment schemes for software and mobile agents [38], [39]. What is new in the current setting is not autonomous payment itself, but the combination of autonomous payment with open-ended LLM reasoning, untrusted tool use, and natural-language attack surfaces.

2.2.8. Model Context Protocol (MCP).
MCP [40], [41] is an open protocol that standardizes how LLM agents interact with external tools, data sources, and services. In the financial context, MCP serves as the primary interface through which agents access market data, execute trades, and invoke smart contract functions. MCP’s security properties, or the lack thereof, directly impact the security of financial operations conducted through it.

2.2.9. Protocol Deployment Status and Maturity. We explicitly qualify the deployment status of the agent payment protocols analyzed in this SoK, as of early 2026, to distinguish deployed behavior from proposed features. ERC-8004 and ERC-8183 are Ethereum Improvement Proposals in draft/community review status with limited on-chain deployment. AP2 is a research proposal [11] with no widely adopted reference implementation. x402 has early-adopter deployment by Coinbase and a growing ecosystem of MCP-compatible payment middleware, but is not yet standardized. MPP is co-authored by Stripe and Tempo and has a live Tempo mainnet deployment with documented API support [12], [13]; it is the most operationally mature of the group. ACP/ERC-8183 is deployed on Virtuals Protocol’s platform but remains Virtuals-specific. Our security analysis throughout the paper covers both deployed behavior (where independently verifiable) and proposed features (where explicitly noted). Claims about security properties of deployed behavior are grounded in protocol specifications and public chain data; claims about proposed features are explicitly speculative. Protocol maturity is discussed comparatively in §5.

2.3. Levels of Agent Autonomy

Drawing on prior discussions of agent autonomy and principal-agent dynamics [42], [43], we distinguish four levels of agent autonomy in financial operations:

• Level 0 (Advisory): The agent analyzes data and provides recommendations; all actions are taken by humans [20], [27].
• Level 1 (Supervised): The agent can propose and execute pre-approved transaction types within strict limits; humans approve exceptions [6], [9], [26].
• Level 2 (Delegated): The agent independently executes a broad range of transactions within policy constraints; humans review periodically [6].
• Level 3 (Fully Autonomous): The agent independently manages a portfolio or financial operation with no per-transaction human oversight; humans set high-level goals and constraints only [3], [4].

This paper primarily concerns the security challenges of Level 2 and Level 3 agents, as these levels introduce qualitatively new risks that do not exist in supervised settings.

2.4. Scope and Boundaries

Our analysis focuses on security threats that arise specifically from the intersection of agent autonomy and financial operations. We deliberately exclude:

• Generic LLM vulnerabilities (e.g., hallucination, bias) except where they have specific financial security implications.
• Traditional financial risks (e.g., market risk, credit risk) except where agent autonomy fundamentally changes their character.
• Regulatory compliance in isolation except where autonomous agent behavior creates novel compliance challenges.

Our primary focus is blockchain-based and API-based agentic commerce (MCP, x402, MPP, ACP), reflecting the current frontier of autonomous agent deployment with real-asset exposure. Traditional payment rails and cross-chain operations are noted as important but out-of-scope extensions and are summarized briefly in §5.5.

3. Methodology

3.1. Literature Collection

Our systematization draws on literature from five intersecting research communities: (1) LLM security and alignment, (2) autonomous agent architectures, (3) blockchain and DeFi security, (4) financial technology and algorithmic trading, and (5) multi-agent systems and mechanism design.
We searched Google Scholar and Web of Science using 23 phrases spanning agentic-commerce core terms, autonomous payments and payment protocols, delegation and authorization, Model Context Protocol security, prompt injection and agent security, Web3 custody, financial LLMs, autonomous trading, and historical agent-mediated e-commerce. Because protocol specifications, regulatory materials, industry reports, and implementation documents central to agentic-commerce security are unevenly indexed in scholarly databases, we supplemented the database search with backward snowballing and targeted inclusion of these non-traditional sources.

3.2. Selection Criteria

We included works that directly address autonomous financial agents, agent-payment or settlement protocols, attacks and defenses relevant to autonomous execution, empirical evidence of agent behavior in financial settings, or theoretical foundations for trust, identity, and authorization. We excluded papers on generic LLM capability without financial-security relevance, traditional algorithmic trading without agent autonomy, and non-technical policy discussion without concrete security content. We also retained foundational multi-agent commerce work where it directly informs today’s agentic-commerce threat model [21], [22], [44], [45].

3.3. Selection Process and Evidence Base

We followed a PRISMA-style selection flow. The database stage yielded 1,373 records, which were reduced to 1,237 candidates after DOI- and title-level deduplication. We then screened titles, abstracts, and full texts against the inclusion and exclusion criteria. Database retrieval contributed 37 works to the current public corpus, 105 additional works were retained through backward snowballing and targeted inclusion of protocol documents, regulatory texts, industry reports, implementation notes, and other poorly indexed materials, and the remaining 1,192 database-originated candidates were screened out.
The resulting public corpus forms the evidentiary basis for the analysis in this paper. For the released 30-row blinded replication set, two independent coders assigned source and target layers. On the 17 rows where both coders provided non-empty source and target labels, agreement was κ = 0.850 for source-layer labels, κ = 0.833 for target-layer labels, and κ = 0.871 for the joint ordered (source, target) pair.

Table 1 lists the currently released per-vector supporting works in the public corpus, distinguishing direct instantiation (confirmed incident, PoC, or experimental demonstration) from derived support (used as conceptual precursor or cross-paper synthesis support in the released mapping). Vectors with fewer released direct-support papers (R2I, M2A, C2E) are more speculative; we mark these explicitly throughout §5 and flag them as priorities for future empirical work.

TABLE 1. PER-VECTOR RELEASED SUPPORTING WORKS IN THE CURRENT PUBLIC CORPUS.
P2T (Prompt-to-Transaction). Direct: Greshake et al. [16]; Acharya [17]; Nieper-Wisskirchen et al. [46]. Derived: no released derived-support evidence in the current snapshot.
T2R (Tool-to-Reasoning). Direct: Model Context Protocol [41]; Maloyan and Namiot [47]; Zhang et al. [48]. Derived: no released derived-support evidence in the current snapshot.
A2M (Agent-to-Market). Direct: Allouah et al. [49]; Kapoor et al. [50]. Derived: Liu et al. [11]; Chung and Honavar [51]; de Paula et al. [52]; Cai et al. [53].
T2T (Tool-to-Transaction). Direct: Acharya [17]; Shittu [54]; Ruan et al. [55]. Derived: Deng et al. [56].
P2K (Prompt-to-Key). Direct: Acharya [17]. Derived: Steinberger [3]; Luo et al. [4]; Rizinski and Trajanov [5].
C2E (Collusion-to-Escrow). Direct: Virtuals Protocol [28]; Crapis et al. [33]. Derived: Liu et al. [11]; Yu et al. [57]; de Witt [58]; Hu and Rong [59].
O2P (Oracle-to-Position). Direct: Moreno [60]; Assis et al. [61]. Derived: Nabar and Shroff [62]; Kim et al. [63].
N2C (Neg-to-Compliance). Direct: Faysal et al. [64]. Derived: Liu et al. [11]; Allouah et al. [49]; Hornuf et al. [65].
I2M (Identity-to-Market). Direct: Xu et al. [66]. Derived: no released derived-support evidence in the current snapshot.
S2I (Supply-to-Integrity). Direct: Model Context Protocol [41]; Ruan et al. [55]. Derived: Maloyan and Namiot [47]; Zhang et al. [48].
M2A (Model-to-Authorization). Direct: Hirano et al. [67]. Derived: Zhu et al. [68]; Banerjee et al. [69]; Konstantinidis et al. [70].
R2I (Reg-to-Integrity). Direct: Faysal et al. [64]. Derived: Shukanayev [71]; Hornuf et al. [65]; Bain and Subirana [72].

3.4. Cross-Layer Vector Derivation

The 12 cross-layer attack vectors in §5 were derived through a structured three-step process. Step 1 (Layer identification): we enumerated the main layers of autonomous financial agent systems, including reasoning, tools, custody, inter-agent protocols, settlement, oracles, identity, and compliance. Step 2 (Pairwise analysis): for each ordered layer pair (Li, Lj) we asked whether compromise at Li could induce harm at Lj while bypassing Lj’s own defenses. Step 3 (Consolidation): attack paths with direct evidence were retained, while recurring but previously unnamed paths were generalized into cross-layer vectors.

Historically, this derivation also benefits from older agent-commerce literature that predates LLMs but already exposed the relevant trust, delegation, and protocol-design problems. Early platform-mediated agents and web-commerce systems established the centralized trust model [73], [74]; later multi-agent finance work explored negotiation, contract-net coordination, and reputation attribution in ways that remain directly relevant to evaluator-mediated and protocol-mediated commerce today [75]–[77]. Programmatic-agent research also made the trade-off between predictability and flexibility explicit well before modern LLM agents [78]–[82].

3.5. Analysis Dimensions

Our review uses two complementary coding layers.
First, each retained work is assigned to a primary topical bucket, with an optional secondary bucket when a work substantively bridges communities: C0 = background, legal, benchmark, and framing sources that inform the review but do not fit cleanly into one of the five technical communities; C1 = LLM security and alignment; C2 = autonomous agent architectures; C3 = blockchain and DeFi security; C4 = financial technology and algorithmic trading; and C5 = multi-agent systems and mechanism design. Second, the SoK synthesis itself is organized around the paper’s five security dimensions: D1 = agent integrity, D2 = transaction authorization, D3 = inter-agent trust, D4 = market manipulation, and D5 = regulatory compliance. Each retained work receives one primary synthesis-dimension assignment and, when the work materially spans multiple parts of the framework, additional dimension flags.

Our final public corpus spans 1994–2026. We use topical buckets (C0–C5) to track which research communities each source comes from, and synthesis dimensions (D1–D5) to organize how each source contributes to the security framework.

3.6. Positioning Against Related LLM-Agent Security Surveys

Recent LLM-agent surveys cover prompt injection, tool misuse, multi-agent trust, and runtime control in domain-general settings [57], [58], [83]. Adjacent survey and systems work spans commerce-oriented agentic AI adoption [84], [85], zero-trust and cross-domain agent security [86], [87], communication- and protocol-layer defenses for agent networks [88], [89], IAM- or trust-fabric-style authorization layers [90], [91], and architectural views of an on-chain agent economy [92].
Our scope is narrower but operationally deeper: we focus on financial irreversibility, custody, settlement integrity, evaluator-mediated commerce, market manipulation, and compliance exposure, and we connect these threats directly to deployed or emerging agent-payment protocols such as ERC-8004, AP2, x402, MPP, and ACP. This finance-specific framing also changes how otherwise generic controls are interpreted. Safety benchmarks such as RiskyBench and quit-style loss-limiting behavior map naturally to authorization and circuit-breaker questions in financial agents [93], [94]. Deterministic pre-action policy engines such as ACP-Control and OAP can gate tool use before execution [29], [95], while inter-agent trust taxonomies and wallet- and credential-security analyses clarify which assumptions belong to identity, stake, proof, or custody rather than to the LLM alone [59], [96].

[Figure 1. Five-dimensional threat taxonomy for autonomous financial agents. D1: Agent Integrity (model poisoning, prompt injection, tool compromise); D2: Transaction Authorization (key management, spending bounds, intent verification); D3: Inter-Agent Trust (identity spoofing, collusion, protocol manipulation); D4: Market Manipulation (agent-driven manipulation, adversarial herding, flash attacks); D5: Regulatory Compliance (KYC/AML gaps, liability attribution, audit trails).]

4. Systematization Framework

We propose a five-dimensional threat taxonomy for autonomous financial agents, illustrated in Figure 1. Each dimension captures a distinct category of security concern that arises from the intersection of LLM-based autonomy and financial operations.

4.1. Dimension 1: Agent Integrity

Agent integrity concerns whether the agent’s decision-making process has been compromised, causing it to deviate from its intended financial objectives.
This dimension is unique to LLM-based agents because the natural language interface that enables their flexibility also creates attack surfaces that do not exist in traditional automated systems.

4.1.1. Prompt Injection Attacks. Prompt injection remains the most direct threat to agent integrity [16], [46], [97]. Recent work shows that these attacks can be optimized automatically at small data budgets and can still be difficult to reconstruct cleanly after the fact when investigating compromised agent workflows [97], [98]. In financial contexts, prompt injection takes on heightened severity because successful attacks can trigger irreversible financial transactions. We identify three categories of prompt injection specific to financial agents:

Direct injection via data feeds. Financial agents consume market data, news feeds, and social media signals as inputs to their decision-making. An adversary can embed malicious instructions in these data sources. For example, a crafted news article or social media post containing hidden prompt injection text could instruct an agent to execute a specific trade [16]. Unlike generic prompt injection, financial injection can be economically motivated because the attacker profits from the manipulated trade.

Injection through smart contract metadata. On-chain data, including token names, contract descriptions, and transaction memos, can carry prompt injection payloads. When an agent reads on-chain data to inform its decisions, these payloads can alter its behavior [17]. The Ethereum Name Service (ENS) and token metadata fields are particularly vulnerable vectors.

Injection via inter-agent communication. In multi-agent settings, one agent’s output becomes another agent’s input. A compromised or adversarial agent can embed prompt injection attacks in its negotiation messages, trade proposals, or status updates [49], [99].
This creates the possibility of cascading compromises in which a single compromised agent subverts an entire network of agents.

4.1.2. Model Poisoning and Backdoors. Financial agents that undergo fine-tuning on financial data [68]–[70] are vulnerable to data poisoning attacks. An adversary who can influence the training data can embed backdoor triggers that cause specific financial behaviors (e.g., always buying a particular token when a certain phrase appears in market data). The challenge is amplified by the opacity of LLM decision-making: a poisoned agent may perform normally on most inputs while executing adversarial trades on trigger inputs [67].

4.1.3. Tool and Plugin Compromise. Autonomous financial agents rely on external tools accessed through protocols like MCP [40], [41] for market data, trade execution, and portfolio analysis. A compromised tool can feed the agent false information (e.g., incorrect prices) or execute transactions that differ from what the agent requested. The security of the agent is therefore bounded by the security of its weakest tool integration [17], [41]. Recent defense proposals attempt to attribute tool invocations back to the causally responsible prompt context so that suspicious calls can be blocked before execution [100].

4.1.4. Memory Injection and Persistent Store Poisoning. Long-running financial agents maintain persistent memory stores, such as vector databases, key-value caches, or episodic memory logs, that inform future reasoning. Memory injection attacks corrupt this persistent state to influence the agent’s future decisions without requiring continuous prompt injection [83], [101]–[103]. An adversary who can write to a memory store (e.g., by submitting a malicious transaction whose metadata is indexed, or by injecting content that the agent retrieves via RAG) can implant false “learned experiences” that bias the agent toward adversary-preferred trades in future sessions.
Unlike prompt injection, which acts in the current context window, memory injection persists across sessions and may be difficult to detect because the agent’s behavior changes gradually. We extend the D1 taxonomy to include three memory-specific attack sub-types:

(1) Write-path poisoning, in which an adversary injects adversarial content through any channel that the agent records to memory (on-chain data ingestion, inter-agent messages, tool outputs); mitigation requires write-access controls that restrict which data sources can populate the agent’s long-term memory, ideally enforced by content provenance tags analogous to inter-agent origin tagging (see §5).

(2) Retrieval manipulation, in which adversarially crafted embeddings can cause malicious memories to be preferentially retrieved for target query contexts if the agent uses semantic search (e.g., a vector database) for memory retrieval; mitigation requires adversarial-robustness testing of the embedding model and retrieval pipeline.

(3) Memory staleness exploitation, in which out-of-date memories can cause agents to act on superseded policy parameters or market conditions; mitigation requires TTL-bounded memory entries with mandatory refresh for high-value decisions.

Defense: cryptographic provenance tags on all memory writes recording the source, timestamp, and a hash of the originating context; memory verifiers that reject writes from untrusted sources; and periodic memory audits comparing stored experiences against independently verifiable on-chain records. Recent multi-agent designs also explore privacy-preserving shared-memory layers with explicit trust weighting as a mitigation direction, rather than treating memory as an opaque global scratchpad [104].

4.2. Dimension 2: Transaction Authorization

Transaction authorization concerns whether the agent’s financial actions are properly bounded and verifiable.
This dimension addresses the fundamental question: how do we ensure that an autonomous agent executes only the transactions it is authorized to perform?

4.2.1. Credential and Key Management for Autonomous Agents. When an agent holds digital-wallet credentials, payment account secrets, or cryptographic private keys, as in OpenClaw’s architecture [3], credential security becomes critical. Traditional account and key management assumes a human user who can recognize and resist social engineering attacks. An LLM agent, by contrast, can be prompt-injected into revealing or misusing credentials [17]. Multi-signature schemes, scoped API tokens, and hardware security modules (HSMs) can limit exposure, but they introduce latency that may be incompatible with high-frequency financial operations.

4.2.2. Spending Policies and Bounds. ERC-8004 [10] introduces on-chain spending limits and allowance mechanisms for agent-controlled wallets. However, the granularity of these policies is a design challenge. Overly permissive policies enable unauthorized large transactions; overly restrictive policies impede legitimate agent operations. Prior work on agent authorization in enterprise systems [105], [106] provides relevant frameworks, but these must be adapted for the adversarial environment of public blockchains. Foundational capability-oriented delegation mechanisms, from macaroons and object-capability designs to newer semantic task-to-scope matching and workflow-scoped agent credentials, point to the same design rule: agent permissions should be attenuated, contextual, and bound to a narrow task rather than to a standing broad wallet or account authority [107]–[110].

4.2.3. Intent Verification. A key challenge in agent-mediated finance is verifying that the agent’s intent, as formed by LLM reasoning, matches its action, as encoded in a payment instruction or blockchain transaction [17].
An agent may reason correctly about a trade but produce an incorrect transaction due to encoding errors, unit conversion mistakes, or manipulation of intermediate representations. AP2’s payment intent mechanism [11] partially addresses this by separating intent declaration from execution, enabling pre-execution verification. MPP adds a complementary transport-layer safeguard: servers can bind a payment challenge to a specific request body using content digests, reducing the risk that a valid credential is replayed against a modified API call [13].

4.3. Dimension 3: Inter-Agent Trust

As the agentic economy grows, agents increasingly interact with other agents rather than with humans [14], [15], [28]. This introduces trust challenges that have no direct parallel in human-operated systems.

4.3.1. Agent Identity and Authentication. In a multi-agent marketplace, how does one agent verify the identity and trustworthiness of another? Current systems often rely on cryptographic signatures or account-bound credentials, especially wallet addresses in on-chain settings, but these do not convey information about the agent’s capabilities, authorization level, or operating policies [17], [71]. An adversarial agent can create multiple identities (a Sybil attack) to manipulate reputation systems or conduct wash trading [66].

4.3.2. Negotiation Integrity. Agent-to-agent negotiation, a core function in agentic commerce [45], [51], [52], [111], is vulnerable to manipulation when one party is an LLM. Traditional negotiation protocols assume rational, self-interested actors; LLM agents may be susceptible to persuasion, anchoring effects, or adversarial framing that exploits their language understanding [49], [50]. An adversary can craft negotiation messages that exploit LLM biases to obtain unfavorable terms for the victim agent.

4.3.3. Collusion and Coordination Attacks.
Multiple adversarial agents can coordinate to manipulate market prices, conduct pump-and-dump schemes, or corner markets [60], [112]. The ability of LLM agents to communicate in natural language makes collusion harder to detect than in traditional algorithmic settings, where coordination requires explicit protocol-level mechanisms. Multi-agent systems research has studied coalition formation [111], [113], but not in the context of LLM-driven strategic behavior.

4.3.4. Cascading Compromise via Prompt Infection. A distinctive threat in LLM-based multi-agent networks is self-replicating prompt injection [114], where a single compromised agent embeds adversarial instructions in its outgoing messages that cause recipient agents to replicate the injection onward. Prompt Infection [114] demonstrates experimentally that such infections can propagate through multi-agent networks from a single entry point, analogous to biological contagion. In a financial agent network, this mechanism could disseminate malicious trading instructions across the network before any single agent’s safety checks trigger, causing coordinated unauthorized transactions at scale. We therefore recommend incorporating inter-agent message origin tagging and stricter sanitization of agent-sourced content as baseline security requirements for multi-agent financial deployments.

4.4. Dimension 4: Market Manipulation

Autonomous agents introduce novel market manipulation risks that extend beyond traditional concerns about algorithmic manipulation.

4.4.1. Agent-Driven Market Manipulation. An adversary who controls one or more autonomous agents can use them to conduct market manipulation at machine speed [9], [53], [115]. Unlike human manipulators, agent-based schemes can operate continuously, adapt to market responses in real time, and coordinate across multiple markets simultaneously.
Recent evidence also suggests that agentic trading systems can drift into manipulation-like behavior even when profit maximization, rather than explicit market abuse, is the nominal objective [115]. The combination of LLM reasoning (for strategy adaptation) and automated execution (for speed) creates a more potent manipulation capability than either alone.

4.4.2. Adversarial Herding. Because many LLM agents are built on similar foundation models and trained on overlapping data, they may exhibit correlated behavior, a form of “herding” that can amplify market movements [62]. An adversary who understands these correlations can craft market signals (e.g., fake news, manipulated sentiment indicators) designed to trigger correlated responses across multiple agents, causing flash crashes or artificial price spikes [63], [67]. A critical property of adversarial herding is that per-agent authorization controls cannot prevent it: each agent’s individual action may be within its authorized scope, yet the aggregate effect is harmful. Mitigations must therefore operate at the market and regulatory levels, including portfolio-level circuit breakers, model diversity mandates for large agent fleets, and market-wide circuit breakers at the exchange or protocol level [62].

4.4.3. Sandwich Attacks on Agent Transactions. In DeFi settings, autonomous agents that broadcast pending transactions are vulnerable to sandwich attacks, where an adversary front-runs the agent’s transaction to manipulate the price and then back-runs it to profit from the price impact [61]. While sandwich attacks exist in traditional DeFi, autonomous agents are particularly vulnerable because they may lack the real-time monitoring and evasive capabilities of specialized MEV (Maximal Extractable Value) bots.
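The simplest self-protection an agent can encode against sandwiching is a slippage floor on its own swaps: execution reverts unless the realized output meets a minimum derived from the quote. A minimal sketch; the quote, bound, and revert check are illustrative, and on-chain routers typically accept an analogous minimum-output parameter in calldata:

```python
from decimal import Decimal

def min_output_with_slippage(quoted_out: Decimal, max_slippage_bps: int) -> Decimal:
    """Minimum acceptable output for a swap, given the quoted amount and a
    slippage bound in basis points. Execution below this floor should revert."""
    return quoted_out * (Decimal(10_000) - Decimal(max_slippage_bps)) / Decimal(10_000)

def sandwich_would_revert(executed_out: Decimal, floor: Decimal) -> bool:
    """A front-run that pushes realized output below the floor makes the
    transaction revert instead of completing at the manipulated price."""
    return executed_out < floor

# Quoted 1000 USDC out; the agent tolerates at most 50 bps (0.5%) slippage,
# giving a floor of 995 USDC.
floor = min_output_with_slippage(Decimal(1000), 50)
```

Choosing `max_slippage_bps` is exactly the calibration trade-off discussed below: tighter bounds shrink the attacker's extractable margin but increase reverts in volatile markets.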
Established DeFi-native mitigations that autonomous agents can adopt include: private order flow via MEV-protected RPC endpoints (e.g., Flashbots Protect, MEV Blocker) that route transactions directly to block builders without public mempool exposure; orderflow auctions (OFAs) that allow agents to auction exclusive transaction rights to searchers who return MEV rebates rather than extracting them adversarially; and slippage-bounded transactions that set a tight maximum acceptable price impact, causing the transaction to revert if a sandwich attack inflates cost beyond the bound. For autonomous agents, the key implementation challenge is incorporating these defenses into the agent’s transaction submission pipeline without introducing latency that degrades strategy performance. We provide concrete trade-off guidance. Private RPC endpoints (Flashbots Protect, MEV Blocker) add no submission latency relative to public RPC but introduce a routing delay of 50–200 ms to next-block inclusion due to builder propagation; for strategies where execution timing within a block is non-critical (e.g., DCA, rebalancing), this is acceptable. OFAs add a 300–800 ms auction window before the transaction is committed to a builder, making them unsuitable for latency-sensitive arbitrage but appropriate for large, price-impact-sensitive trades where MEV rebates offset the latency cost. Slippage bounds add no latency but require careful calibration: too tight a bound causes excessive transaction reverts in volatile markets, while too loose a bound leaves room for adversarial extraction; the bound must balance execution success rate against adversarial profitability.

4.5. Dimension 5: Regulatory Compliance

The deployment of fully autonomous financial agents raises profound regulatory questions that have direct security implications.

4.5.1. KYC/AML for Non-Human Actors. Current Know Your Customer (KYC) and Anti-Money Laundering (AML) frameworks assume human account holders [65], [71].
When an autonomous agent controls a digital wallet or payment account and transacts independently, who is the “customer”? The agent’s deployer? The model provider? The framework developer? This ambiguity can be exploited by adversaries to launder money through chains of agent-mediated transactions that obscure the ultimate beneficial owner [64], [116]. In the United States, FinCEN’s Bank Secrecy Act (BSA) regulations require money services businesses (MSBs) to file Currency Transaction Reports (CTRs) for cash transactions exceeding $10,000 and Suspicious Activity Reports (SARs) for suspicious transactions of $5,000 or more [117]. Autonomous agents conducting financial transactions at scale could constitute unregistered MSBs, and the N2C attack vector (§5) specifically exploits BSA structuring prohibitions. In the European Union, the Markets in Crypto-Assets (MiCA) Regulation (Regulation 2023/1114) establishes licensing requirements for crypto-asset service providers that would apply to platforms hosting autonomous agent crypto-wallets [118]. The FATF’s 2021 and 2023 guidance on Virtual Assets and Virtual Asset Service Providers (VASPs) explicitly addresses algorithmic entities and requires member states to extend VASP AML obligations to entities that facilitate VA transfers on behalf of customers [119], a definition that arguably encompasses autonomous agent infrastructure operators. Mapping these specific obligations to our taxonomy: D5 (regulatory compliance) intersects with D2 (transaction authorization) through SAR filing requirements that would mandate real-time detection of structuring patterns (N2C attacks), and with D3 (inter-agent trust) through VASP travel rule requirements for transmitting beneficiary information in agent-to-agent payments.

4.5.2. VASP Classification and Principal Liability.
A fundamental compliance question is whether an autonomous agent itself constitutes a Virtual Asset Service Provider (VASP) or whether its deployer bears VASP obligations. FATF Guidance (2023) [119] defines a VASP as an entity that conducts VA transfers as a business on behalf of others. Under this definition, the infrastructure operator, not the individual agent instance, is the liable VASP, bearing KYC/AML obligations for all agents on its platform. Practically, this implies a three-tier principal model: (a) the deployer must complete KYC at agent deployment time, binding their legal identity to the agent’s DID; (b) the infrastructure operator maintains the AML monitoring stack (N2C detectors, CTR/SAR pipelines); and (c) the agent instance is an automated execution process, not itself a regulated entity. This model avoids regulatory uncertainty: no jurisdiction currently licenses AI agents as financial institutions.

4.5.3. Liability Attribution. When an autonomous agent causes financial harm through a security breach, market manipulation, or an erroneous trade, determining liability is challenging [42], [72]. The multi-layered architecture of agent systems (model provider, framework developer, tool provider, deployer) creates a diffusion of responsibility that adversaries can exploit. Without clear liability frameworks, there is insufficient incentive for any single party to invest in comprehensive security [65], [71].

4.5.4. Audit Trail Requirements. Financial regulations typically require detailed audit trails of all transactions [120]. For autonomous agents, this requires logging not just the transactions themselves but also the LLM reasoning that led to them. Current LLM architectures make this challenging: the mapping from input context to output action is opaque, and agents may process thousands of data points to arrive at a single trading decision.
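One engineering response is a hash-chained decision log that binds each transaction to a digest of the reasoning context that produced it, so post-hoc tampering is detectable even though the reasoning itself remains opaque. A minimal sketch; the record schema is our illustration, not a regulatory format:

```python
import hashlib
import json
import time

class DecisionLog:
    """Append-only, hash-chained log linking each transaction to the
    reasoning context that produced it (field names are illustrative)."""

    def __init__(self):
        self._entries = []
        self._head = "0" * 64  # genesis hash

    def append(self, tx_id: str, context_digest: str, rationale: str) -> dict:
        entry = {
            "tx_id": tx_id,
            "context_digest": context_digest,  # hash of the full input context
            "rationale": rationale,            # model-produced explanation
            "ts": time.time(),
            "prev": self._head,                # chain to the previous entry
        }
        self._head = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        stored = {**entry, "hash": self._head}
        self._entries.append(stored)
        return stored

    def verify(self) -> bool:
        # Recompute the chain; any in-place edit breaks every later hash.
        prev = "0" * 64
        for e in self._entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            if body["prev"] != prev:
                return False
            prev = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if prev != e["hash"]:
                return False
        return True
```

Anchoring the periodic head hash on-chain (or in a write-once store) would make the log tamper-evident against the operator as well, not only against external modification.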
The x402 protocol [10] provides some audit trail capability through its payment metadata, and MPP extends this with explicit challenges, credentials, receipts, and optional request-body digests [13]; however, these protocol traces are still insufficient for full regulatory compliance.

4.5.5. Obligation-to-Control Mapping. Financial-agent compliance translates abstract obligations into engineering controls. In practice, AML/CFT and market-abuse requirements imply beneficial-owner binding, cumulative exposure monitoring, Travel Rule-style provenance exchange for agent-to-agent transfers, and tamper-evident audit logs [64], [65], [120]. This makes D5 inseparable from the rest of the framework: authorization limits help block structuring, identity and settlement metadata support provenance exchange, and integrity-protected logs are needed when agent decisions are later audited.

4.6. Incident Lessons

Representative incidents and constructed scenarios converge on three lessons. First, attacks are usually cross-dimensional: token-metadata injection links D1 to D2, compromised tools link reasoning to market harm, and evaluator manipulation links D3 to settlement. Second, platform-level failures can be systemic, as illustrated by the Virtuals launch vulnerability, where a single infrastructure flaw could have affected every agent using the same launch path [54]. Third, the most damaging campaigns are often cumulative rather than spectacular: subtle oracle drift, repeated negotiation exploitation, or under-threshold structuring can remain locally plausible while producing significant portfolio or compliance harm over time. Lifecycle-oriented analysis of OpenClaw reinforces this point by showing how initialization, memory, inference, and execution stages create compounding attack opportunities rather than isolated bugs [56]. These observations motivate the layered controls summarized in §5.4.

5.
Comparative Analysis

In this section, we compare existing approaches to securing autonomous financial agents across the dimensions of our taxonomy, analyze their trade-offs, and identify cross-layer attack vectors.

5.1. Defense Approaches by Dimension

5.1.1. Agent Integrity Defenses. Table 2 summarizes the principal defense categories for agent integrity.

TABLE 2. COMPARISON OF AGENT INTEGRITY DEFENSE APPROACHES.
Approach | Mechanism | Coverage | Overhead
Input sanitization | Filter malicious prompts from data feeds before model ingestion | Direct injection | Low
Instruction hierarchy | Privilege separation between system and user/data prompts | Direct & indirect injection | Medium
Output validation | Verify proposed actions against policy before execution | All integrity threats | High
Redundant reasoning | Cross-check decisions with multiple independent LLM instances | Model poisoning; subtle injection | Very high
Formal verification | Prove bounded safety properties of the agent pipeline | Broad but partial | Infeasible

Input sanitization [16] is the most straightforward defense, filtering potentially malicious content from data feeds before it reaches the agent’s context. However, in financial applications, aggressive filtering risks removing legitimate market signals. A news headline about a “crash” might be filtered as a potential injection vector when it is in fact critical market information.

Instruction hierarchy approaches [16], [49] establish privilege levels where system-level instructions (e.g., “never transfer more than 1 ETH per transaction”) cannot be overridden by data-level content. While effective against many injection attacks, these approaches face challenges when agents must reason about user-provided financial objectives that necessarily interact with system constraints.

Output validation [17] interposes a verification layer between the agent’s reasoning and its actions, checking proposed transactions against policy constraints before execution. This is the most robust single defense but introduces latency that can be costly in fast-moving financial markets. The validation layer itself must be secured against bypass [41].

Redundant reasoning [99], [121] uses multiple independent LLM instances to cross-check financial decisions, similar to multi-factor authentication but applied to AI reasoning. While effective at catching individual model failures, this approach multiplies computational costs and still fails if all instances share the same vulnerability (e.g., a common training data bias).

Runtime verification and capability bounding. Financial agents benefit from an independent control layer between reasoning and tool/action execution. ZTRV-style checks validate that each action remains bound to the current workflow context before execution [122], while Agent-Sentry constrains action sequences using execution provenance and capability graphs [123]. These mechanisms complement output validation and payment authorization: the runtime verifier can reject replayed or context-drifted actions before they reach custody, while the custody layer still enforces transaction scope. Independent reproduction in financial agentic settings (with irreversible on-chain transactions and finance-specific attack scenarios) remains an open experimental challenge that we identify in §5.5. These controls complement the Layer 1 (prompt hardening) and Layer 2 (reasoning verification) proposals in our defense architecture (§5.4).
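As a concrete instance of the output-validation pattern, a pre-execution policy check can sit between the model’s proposed action and the signing layer. A minimal sketch; the policy fields and transaction schema are illustrative, not taken from ERC-8004 or any cited system:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpendingPolicy:
    max_per_tx_wei: int           # per-transaction cap, in wei
    allowed_recipients: frozenset  # allow-listed counterparty addresses

def validate_action(tx: dict, policy: SpendingPolicy) -> tuple:
    """Output-validation layer: check a proposed transaction against policy
    constraints before it ever reaches the signing layer."""
    if tx.get("to") not in policy.allowed_recipients:
        return False, "recipient not on allow-list"
    if int(tx.get("value_wei", 0)) > policy.max_per_tx_wei:
        return False, "per-transaction cap exceeded"
    return True, "ok"
```

Because the check runs outside the model, a prompt-injected plan still has to pass it; the residual risk is the bypass problem noted above, which is why the validator must live in a separately secured process or contract.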
Tool selection integrity. The tool-selection stage is an independent attack surface: compromised registries, misleading tool descriptions, or forged provenance metadata can redirect an agent toward malicious tools before any invocation occurs. This motivates pre-invocation verification of tool provenance and description integrity, not only post-invocation output validation.

5.1.2. Transaction Authorization Defenses. The design space for transaction authorization defenses spans a spectrum from fully on-chain enforcement to fully off-chain policy engines.

Smart contract guardrails. ERC-8004 [10] enables on-chain enforcement of spending limits, per-transaction caps, and time-locked operations. These guardrails are tamper-resistant (enforced by consensus) but inflexible, because modifying policies requires on-chain transactions with associated gas costs and latency. Recent work has explored programmable spending policies that combine on-chain enforcement with off-chain configuration [10].

Multi-signature and threshold schemes. Requiring multiple signatures for high-value transactions provides strong authorization guarantees [17]. In multi-agent settings, this can be implemented as requiring agreement among multiple independent agents before executing a trade. However, this approach assumes that the signers are truly independent; if they share the same LLM backbone, a universal attack might compromise all of them simultaneously.

Intent-action verification. AP2’s payment intent mechanism [11] enables pre-execution verification by separating the declaration of intent from its execution. A verifier can confirm that the intended action matches the proposed transaction before it is submitted to the blockchain. This approach is particularly valuable for complex transactions involving multiple steps or cross-chain operations.
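The intent-action pattern can be sketched as committing the declared terms to a digest and re-checking the encoded transaction field by field before submission. The message schema below is our illustration; AP2’s actual mandate and intent formats differ:

```python
import hashlib
import json

INTENT_FIELDS = ("asset", "side", "amount", "counterparty")  # illustrative schema

def declare_intent(asset: str, side: str, amount: int, counterparty: str) -> dict:
    """Commit the declared terms to a digest before any transaction encoding."""
    intent = {"asset": asset, "side": side,
              "amount": amount, "counterparty": counterparty}
    digest = hashlib.sha256(json.dumps(intent, sort_keys=True).encode()).hexdigest()
    return {**intent, "digest": digest}

def verify_intent(intent: dict, tx: dict) -> bool:
    """Pre-execution check: the encoded transaction must match the declared
    intent field by field, catching unit errors or tampered parameters."""
    terms = {k: intent[k] for k in INTENT_FIELDS}
    digest = hashlib.sha256(json.dumps(terms, sort_keys=True).encode()).hexdigest()
    if digest != intent["digest"]:
        return False  # the intent record itself was modified after declaration
    return all(tx.get(k) == intent[k] for k in INTENT_FIELDS)
```

The digest makes the declared terms immutable once recorded, so a unit-conversion bug or a tool that rewrites the amount between reasoning and encoding fails the check rather than executing.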
MPP provides a related transport-layer defense through challenge–credential–receipt verification and digest-bound requests, allowing servers to ensure that the paid request is the same request that is ultimately executed [13].

5.1.3. Inter-Agent Trust Defenses.

Decentralized identity (DID). Agent identity can be anchored in decentralized identity systems that provide verifiable credentials about an agent’s capabilities, authorization level, and operating history [17]. However, DID systems currently lack standardized credential types for autonomous agents, and the process of issuing credentials to non-human entities raises unresolved governance questions.

Reputation systems. On-chain reputation systems track agents’ transaction histories and compute trust scores [124], [125]. These systems face the cold-start problem (new agents have no reputation) and are vulnerable to reputation farming and wash trading by adversarial agents [66].

Escrow and atomic settlement. Payment protocols like AP2 support escrow mechanisms where funds are locked in a smart contract until both parties confirm transaction completion [11]. Virtuals Protocol’s ACP extends this with an evaluator agent model, where a third-party agent assesses deliverable quality before releasing escrow funds [28]. While this reduces the need for mutual trust between transacting agents, it introduces a new trust assumption on the evaluator, because a compromised evaluator can systematically approve fraudulent deliverables or reject legitimate ones, enabling the C2E attack vector described in §5. ERC-8183 formalizes this pattern with on-chain state machines and extensible hooks [33].

The security of evaluator agents in ACP/ERC-8183 warrants explicit threat modeling.
We identify four concrete evaluator attack scenarios: (1) direct bribery, where a provider agent compensates an evaluator out-of-band to approve a fraudulent deliverable; (2) Sybil evaluator clusters, where an attacker deploys many evaluator identities to influence the evaluator selection pool; (3) evaluator–provider collusion, where the same controlling party operates both roles and systematically manipulates escrow outcomes; and (4) adversarial evaluator substitution, where an attacker front-runs evaluator-assignment transactions on-chain to insert a malicious evaluator. Mitigations include: bonded evaluators with slashing for misconduct; Byzantine fault-tolerant committee sizing with VRF-based selection to prevent front-running; TEE-backed cryptographic independence attestation; and on-chain statistical anomaly monitoring to flag evaluators with abnormal approval patterns [33]. Atomic settlement ensures that multi-step transactions either complete entirely or revert entirely, preventing partial execution attacks.

5.2. Cross-Layer Attack Vectors

A critical finding of our analysis is that the most dangerous attacks on autonomous financial agents exploit cross-layer interactions, where a vulnerability at one layer triggers a cascading failure at another. We identify and characterize all 12 cross-layer attack vectors below; Table 3 provides a concise overview with adversary preconditions and layer paths. Three distinctions matter most operationally. T2R vs. T2T: T2R corrupts reasoning through false data, while T2T corrupts execution after correct reasoning; the former is mitigated by provenance checks and cross-validation, the latter by end-to-end intent binding. T2R vs. O2P: T2R is usually acute and transaction-local, whereas O2P is chronic and cumulative, requiring longitudinal monitoring rather than single-trade anomaly detection. P2T vs.
P2K: P2T induces a new unauthorized action; P2K coerces signing itself and therefore requires a hard separation between cognition and custody. Across the 12 vectors, the most immediate deployment risks are P2T, T2R, T2T, and S2I, because they convert public inputs, tools, or dependencies into directly authorized financial actions [16], [41]. C2E, O2P, and N2C are slower-burn but often harder to detect because harm accumulates over time. R2I and M2A remain more speculative in the current corpus and should be treated as early-warning categories rather than equally grounded threats. Recent protocol and deployment analyses sharpen these distinctions. MCP-specific studies point to capability-attestation gaps, unsafe trust propagation, and over-privileged tool wrappers as concrete precursors to T2R-style failure [47], [48]. Supply-chain exploitation work likewise shows that poisoned dependencies and prompt templates can bypass otherwise sound reasoning-layer defenses, which is why S2I belongs in the top deployment tier rather than being treated as a generic software risk [55].

TABLE 3. SUMMARY OF 12 CROSS-LAYER ATTACK VECTORS FOR AUTONOMOUS FINANCIAL AGENTS.
ID | Name | Layer Path | Core Mechanism
P2T | Prompt-to-Transaction | LLM → Blockchain | Injected prompt triggers signed tx
T2R | Tool-to-Reasoning | Tool → Reasoning | False data poisons decision
A2M | Agent-to-Market | Inter-agent → Market | LLM bias exploited in negotiation
R2I | Reg-to-Integrity | Compliance → Market | Regulatory gap enables laundering
T2T | Tool-to-Transaction | Tool → Blockchain | Tool modifies tx params post-reasoning
P2K | Prompt-to-Key | LLM → Custody | Injection bypasses key custody boundary
M2A | Model-to-Authorization | Model → Authorization | Backdoor defeats spending policy check
C2E | Collusion-to-Escrow | Multi-agent → Settlement | Colluding evaluators drain escrow
O2P | Oracle-to-Position | Oracle → Portfolio | Cumulative drift via subtle feed manipulation
I2M | Identity-to-Market | Reputation → Market | Sybil trust enables coordinated manipulation
N2C | Neg-to-Compliance | Protocol → Compliance | Structuring payments evades AML threshold
S2I | Supply-to-Integrity | Supply chain → All | Backdoored plugin silently alters transactions

5.3. Comparative Assessment of Protocols and Interfaces

Table 4 compares representative protocols and execution interfaces used by autonomous financial agents. No single protocol or execution interface covers all five dimensions. Payment and commerce protocols such as ERC-8004, AP2, x402, MPP, ACP, and ERC-8183 improve authorization, settlement, or inter-agent coordination, while MCP contributes tool-access control and auditability as an execution interface rather than a payment protocol. These mechanisms therefore remain complementary rather than sufficient: none of them by itself addresses LLM-layer compromise, long-horizon market manipulation, and regulatory compliance simultaneously. Framework-level agent systems are discussed elsewhere in the paper but are not co-scored here because they sit at a different abstraction layer from the protocols and interfaces compared here.
The marketplace side is similarly incomplete: emerging agent marketplaces and commerce layers promise discovery and settlement, but they still inherit unresolved problems around evaluator governance, listing integrity, and dispute resolution that earlier e-commerce work already warned about in human-mediated settings [21], [73], [126]. Finance amplifies these weaknesses because rankings, escrow release, and reputation can all be monetized directly.

5.4. Layered Defense Architecture

The core design implication of our comparison is defense in depth across the full execution path:

TABLE 4. SECURITY PROPERTY COMPARISON OF REPRESENTATIVE AGENT-COMMERCE PROTOCOLS AND EXECUTION INTERFACES. ‘N/A’ DENOTES A DIMENSION OUTSIDE THE ARTIFACT’S INTENDED SCOPE.
Protocol / Interface | Agent Integrity | Transaction Authorization | Inter-Agent Trust | Market Manipulation | Regulatory Compliance
Virtuals/ACP [28] | GAME framework | Escrow settlement | Evaluator agents | Token caps | On-chain logs
ERC-8183 [33] | N/A (standard) | Job state machine | 3-role trust model | Hook-based | Reputation
ERC-8004 [10] | N/A (protocol level) | On-chain guardrails | Token-based ID | Spending caps | Audit logs
AP2 [11] | N/A (protocol level) | Intent verification | Payment attestation | N/A | Payment logs
x402 [10] | N/A (protocol level) | Per-request auth | HTTP-level auth | Rate limiting | Request logs
MPP [12], [13] | N/A (protocol level) | Digest-bound auth | Session escrow | N/A | Receipts + logs
MCP [40] | Tool sandboxing | Permission model | Server auth | N/A | Tool call logs

Layer 1: Prompt and Tool Hygiene. Sanitize external inputs, tag agent-originated content, and verify tool provenance before invocation so that public data and registry metadata cannot silently steer action selection [16], [41].

Layer 2: Verified Execution Context.
Use output validation, runtime context binding, and capability graphs so that a locally plausible plan still has to match the current workflow, counterparty, and permitted action sequence before execution [122], [123].

Layer 3: Payment Authorization and Custody. Separate cognition from custody, enforce scoped spending policies at the signing or credential layer, and bind payment or transaction parameters end to end using mechanisms such as ERC-8004 limits, AP2-style intents, and x402/MPP request binding [10], [11], [13].

Layer 4: Inter-Agent Trust Controls. Require authenticated agent identity, stake- or reputation-backed evaluator selection, and anomaly monitoring for collusion or Sybil behavior in escrow-mediated commerce [28], [33].

Layer 5: Market and Compliance Monitoring. Add circuit breakers, cumulative position-drift detection, exposure aggregation, and tamper-evident audit trails so that slow-burn manipulation and compliance abuse become visible before losses compound [64], [120].

5.5. Open Problems and Research Agenda

Four research priorities follow from this condensed analysis. First, financial-agent benchmarks remain weak: existing agent-security and financial-LLM testbeds do not yet jointly capture irreversible execution, inter-agent settlement, and cumulative manipulation, which is why finance-specific benchmark harnesses are still needed [127]–[129]. Second, long-horizon monitoring remains immature: O2P, N2C, and correlated-agent failures are cumulative and require metrics that work over days or weeks rather than per transaction [4], [5]. Third, inter-agent trust still lacks a mature governance layer, especially around evaluator selection, cryptographic identity, and anti-collusion enforcement. Fourth, traditional payment rails and cross-chain deployments remain underexplored even though they introduce different reversibility, compliance, and custody assumptions from purely on-chain systems [65].
More specifically, general agent benchmarks such as AgentDojo and ASB still omit the finance-specific attack classes emphasized here, while newer financial testbeds such as TraderBench and CAIA improve adversarial realism but still do not fully integrate inter-agent trust and compliance vectors [130]–[133]. Financial LLM benchmarks in other languages and markets also broaden evaluation coverage but remain focused on capability rather than adversarial security [134]–[136]. Theoretical work on verifiable reasoning, red-teaming, and longitudinal monitoring therefore remains directly relevant to this agenda [137]–[139].

Finally, some deployment trade-offs remain structurally hard rather than merely under-engineered. Portfolio agents can often tolerate stricter authorization and slower review loops than spot traders [140], and adaptive policy tuning may partially reconcile autonomy with control [141]; but these mitigations do not remove the need for hard custody boundaries and market-level monitoring.

6. Conclusion

This paper provides a systematic account of the security challenges facing fully autonomous LLM agents in financial settings. As agentic finance matures through frameworks such as OpenClaw, payment and coordination protocols such as ERC-8004, AP2, x402, ACP, and MPP, and the broader convergence of LLMs with decentralized finance, these challenges will become more consequential rather than less.

The central result of this SoK is that autonomous financial-agent security is fundamentally a cross-layer problem. Threats often originate in reasoning, tools, identity, or inter-agent interaction, but the resulting harm appears in custody, settlement, markets, or compliance. Our five-dimensional taxonomy provides a structured way to analyze this space, and the corpus synthesis shows that point defenses are not enough: secure deployment requires coordinated controls across agent integrity, authorization, trust, market structure, and regulation.
Several conclusions follow. First, agentic-finance risk is not simply the sum of traditional financial security and generic LLM security; it arises from their interaction under conditions of financial irreversibility and reduced human oversight [7], [16]. Second, no existing system or protocol currently offers end-to-end coverage of this threat surface, which motivates the layered defense architecture developed in this paper. Third, common assumptions such as “a human can intervene,” “prompt injection is a localized bug,” or “on-chain finality implies correctness” do not hold for autonomous agents. Fourth, systemic risk from model and protocol homogeneity remains underappreciated: when many agents share the same foundation model or execution stack, a single exploit can propagate into market-wide disruption [60].

The protocols being designed now will shape the infrastructure through which autonomous agents transact real value. Securing that infrastructure is a prerequisite for responsible deployment of autonomous AI in finance. We hope that the framework, corpus, attack taxonomy, defense architecture, and research agenda developed here provide a useful foundation for that effort.

References

[1] A. Greenwald and P. Stone, “Autonomous bidding agents in the trading agent competition,” IEEE Internet Computing, 2001.
[2] M. He and N. R. Jennings, “SouthamptonTAC: Designing a successful trading agent,” in Proceedings of the Fifteenth European Conference on Artificial Intelligence. IOS Press, 2002, pp. 8–12. [Online]. Available: https://eprints.soton.ac.uk/252101/
[3] P. Steinberger, “openclaw/openclaw: Your own personal AI assistant. any OS. any platform. the lobster way.” GitHub repository, accessed: 2026-03-31. [Online]. Available: https://github.com/openclaw/openclaw
[4] Y. Luo, Y. Feng, J. Xu, P. Tasca, and Y. Liu, “LLM-powered multi-agent system for automated crypto portfolio management,” arXiv preprint arXiv:2501.00826, 2025.
[5] M. Rizinski and D. Trajanov, “AI agents in finance and fintech: A scientific review of agent-based systems, applications, and future horizons,” Computers, Materials & Continua, vol. 86, no. 1, pp. 1–34, 2026.
[6] H. Ding, Y. Li, J. Wang, H. Chen, D. Guo, and Y. Zhang, “Large language model agent in financial trading: A survey,” arXiv preprint arXiv:2408.06361, 2024.
[7] Y. Nie, Y. Kong, X. Dong, J. M. Mulvey, H. V. Poor, Q. Wen, and S. Zohren, “A survey of large language models for financial applications: Progress, prospects and challenges,” arXiv preprint arXiv:2406.11903, 2024.
[8] J. Lee, N. Stevens, and S. C. Han, “Large language models in finance (FinLLMs),” Neural Computing and Applications, vol. 37, no. 30, pp. 24853–24867, 2025.
[9] Y. Xiao, E. Sun, T. Chen, F. Wu, D. Luo, and W. Wang, “Trading-R1: Financial trading with LLM reasoning via reinforcement learning,” arXiv preprint arXiv:2509.11420, 2025.
[10] M. Goenka, T. Pathak, and S. Asthana, “TessPay: Verify-then-pay infrastructure for trusted agentic commerce,” arXiv preprint arXiv:2602.00213, 2026.
[11] X. Liu, S. Gu, and D. Song, “AgenticPay: A multi-agent LLM negotiation system for buyer-seller transactions,” arXiv preprint arXiv:2602.06008, 2026.
[12] Tempo, “Machine payments,” Tempo Documentation, accessed: 2026-03-31. [Online]. Available: https://docs.tempo.xyz/learn/tempo/machine-payments
[13] Tempo and Stripe, “Protocol overview,” Machine Payments Protocol documentation, accessed: 2026-03-31. [Online]. Available: https://mpp.dev/protocol
[14] D. G. W. Birch and D. Gamble, “Agentic commerce and payments: Exploring the implications of robots paying robots,” Journal of Payments Strategy & Systems, 2025.
[15] Y. Zhang, B. Pan, M. Zhu, J. Pei, and L. Zhao, “Agentic commerce: A survey of how AI agents are reshaping commerce,” TechRxiv, 2026.
[16] K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection,” in Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. ACM, 2023, pp. 79–90.
[17] V. Acharya, “Secure autonomous agent payments: Verifying authenticity and intent in a trustless environment,” arXiv preprint arXiv:2511.15712, 2025.
[18] B. Cao, S. Wang, X. Lin, X. Wu, H. Zhang, L. M. Ni, and J. Guo, “From deep learning to LLMs: A survey of AI in quantitative investment,” arXiv preprint arXiv:2503.21422, 2025.
[19] H. Zhao, Z. Liu, Z. Wu, Y. Li, T. Yang, P. Shu, S. Xu, H. Dai, L. Zhao, H. Jiang, Y. Pan, J. Chen, Y. Zhou, Z. Zhang, R. Sun, G. Mai, N. Liu, and T. Liu, “Revolutionizing finance with LLMs: An overview of applications and insights,” arXiv preprint arXiv:2401.11641, 2024.
[20] Y. Kong, Y. Nie, X. Dong, J. M. Mulvey, H. V. Poor, Q. Wen, and S. Zohren, “Large language models for financial and investment management: Applications and benchmarks,” The Journal of Portfolio Management, 2024.
[21] R. H. Guttman, A. G. Moukas, and P. Maes, “Agent-mediated electronic commerce: A survey,” The Knowledge Engineering Review, 1998.
[22] M. He, “On agent-mediated electronic commerce,” IEEE Transactions on Knowledge and Data Engineering, 2003.
[23] C. Sierra, “Agent-mediated electronic commerce,” Autonomous Agents and Multi-Agent Systems, 2004.
[24] P. Maes, “Agents that reduce work and information overload,” Communications of the ACM, 1994.
[25] M. H. Jarrahi and P. Ritala, “Rethinking AI agents: A principal-agent perspective,” California Management Review, 2025. [Online]. Available: https://cmr.berkeley.edu/2025/07/rethinking-ai-agents-a-principal-agent-perspective/
[26] F. Adedoyin, “Human-centred AI in FinTech: Developing a user experience (UX) research point of view (PoV) playbook,” arXiv preprint arXiv:2506.15325, 2025.
[27] S. Kim, S. Kim, Y. Kim, J. Park, S. Kim, M. Kim, C. H. Sung, J. Hong, and Y. Lee, “LLMs analyzing the analysts: Do BERT and GPT extract more value from financial analyst reports?” in Proceedings of the 4th ACM International Conference on AI in Finance. ACM, 2023, pp. 383–391.
[28] Virtuals Protocol, “Technical deep dive,” Virtuals Protocol Whitepaper, Agent Commerce Protocol (ACP); accessed: 2026-03-31. [Online]. Available: https://whitepaper.virtuals.io/about-virtuals/agent-commerce-protocol-acp/technical-deep-dive
[29] M. Fernandez, “Agent control protocol: Admission control for agent actions,” arXiv preprint arXiv:2603.18829, 2026.
[30] S. Alqithami, “Autonomous agents on blockchains: Standards, execution models, and trust boundaries,” arXiv preprint arXiv:2601.04583, 2026.
[31] Virtuals Protocol, “About Virtuals Protocol,” Virtuals Protocol Whitepaper, accessed: 2026-03-31. [Online]. Available: https://whitepaper.virtuals.io
[32] Virtuals Protocol, “GAME framework,” Virtuals Protocol Whitepaper, accessed: 2026-03-31. [Online]. Available: https://whitepaper.virtuals.io/builders-hub/game-framework
[33] D. Crapis, B. Lim, W. Tay, and Z. Chooi, “ERC-8183: Agentic commerce,” Ethereum Improvement Proposal, 2026, created: 2026-02-25. [Online]. Available: https://eips.ethereum.org/EIPS/eip-8183
[34] J. Weinstein and S. Kaliski, “Introducing the machine payments protocol,” Stripe Blog, 2026, published: 2026-03-18. [Online]. Available: https://stripe.com/blog/machine-payments-protocol
[35] J. S. Wang, “AESP: A human-sovereign economic protocol for AI agents with privacy-preserving settlement,” arXiv preprint arXiv:2603.00318, 2026.
[36] R. Bloemen, L. Logvinov, and J. Evans, “EIP-712: Typed structured data hashing and signing,” Ethereum Improvement Proposal 712, 2017. [Online]. Available: https://eips.ethereum.org/EIPS/eip-712
[37] M. Thomson and A. Backman, “HTTP message signatures,” RFC 9421, 2024. [Online]. Available: https://www.rfc-editor.org/rfc/rfc9421
[38] X. Pang, K.-L. Tan, Y. Wang, and J. Ren, “A secure agent-mediated payment protocol,” in Information and Communications Security. Springer Berlin Heidelberg, 2002, pp. 422–433.
[39] Y. Wang and V. Varadharajan, “A mobile autonomous agent-based secure payment protocol supporting multiple payments,” in Proceedings of the 2005 IEEE/WIC/ACM International Conference on Intelligent Agent Technology. IEEE Computer Society, 2005, pp. 88–94.
[40] Model Context Protocol, “What is the model context protocol (MCP)?” Documentation, accessed: 2026-03-31. [Online]. Available: https://modelcontextprotocol.io/docs/getting-started/intro
[41] Model Context Protocol, “Security best practices,” Documentation, accessed: 2026-03-31. [Online]. Available: https://modelcontextprotocol.io/docs/tutorials/security/security_best_practices
[42] V. Stocker and W. Lehr, “Principal-agent dynamics and digital (platform) economics in the age of agentic AI,” Network Law Review, 2025, published: 2025-09-29. [Online]. Available: https://www.networklawreview.org/stocker-lehr-ai/
[43] S. Kapoor and A. Narayanan, “AI agents that matter,” arXiv preprint arXiv:2407.01502, 2024.
[44] R. H. Guttman and P. Maes, “Agent-mediated integrative negotiation for retail electronic commerce,” in Agent Mediated Electronic Commerce. Springer Berlin Heidelberg, 1999.
[45] T. Sandholm, “Automated negotiation,” Communications of the ACM, vol. 42, no. 3, pp. 84–85, 1999.
[46] D. Nieper-Wisskirchen, P. Singh, S. Gupta, and J. Chang, “Security threat modeling for emerging AI-agent protocols: A comparative analysis of MCP, A2A, Agora, and ANP,” arXiv preprint arXiv:2602.11327, 2026.
[47] N. Maloyan and D. Namiot, “Breaking the protocol: Security analysis of the model context protocol specification and prompt injection vulnerabilities in tool-integrated LLM agents,” arXiv preprint arXiv:2601.17549, 2026.
[48] H. Zhang, Y. Nian, and Y. Zhao, “Agent audit: A security analysis system for LLM agent applications,” arXiv preprint arXiv:2603.22853, 2026.
[49] A. Allouah, O. Besbes, J. D. Figueroa, Y. Kanoria, and A. Kumar, “What is your AI agent buying? Evaluation, biases, model dependence, and emerging implications of agentic e-commerce,” in Proceedings of the ACM Web Conference 2026. ACM, 2026, pp. 8697–8700.
[50] S. Kapoor, N. Kolt, and D. Lazar, “Build agent advocates, not platform agents,” arXiv preprint arXiv:2505.04345, 2025.
[51] M. Chung and V. Honavar, “A negotiation model in agent-mediated electronic commerce,” in Proceedings International Symposium on Multimedia Software Engineering. IEEE Computer Society, 2000, pp. 403–410.
[52] G. E. de Paula, F. S. Ramos, and G. L. Ramalho, “Bilateral negotiation model for agent-mediated electronic commerce,” 2001.
[53] T. Cai, G. Li, N. Han, C. Huang, Z. Wang, C. Zeng, Y. Wang, J. Zhou, H. Zhang, Q. Chen, Y. Pan, S. Wang, and W. Wang, “FinDebate: Multi-agent collaborative intelligence for financial analysis,” in Proceedings of the 10th Workshop on Financial Technology and Natural Language Processing, 2025, pp. 268–282.
[54] H. Shittu, “Virtuals protocol fixes critical bug, rewards security researcher,” Cryptonews, 2025, last updated: 2025-01-03. [Online]. Available: https://cryptonews.com/news/virtuals-protocol-fixes-critical-bug-rewards-security-researcher/
[55] Y. Ruan, H. Dong, A. Wang, S. Pitis, Y. Zhou, J. Ba, Y. Dubois, C. J. Maddison, and T. Hashimoto, “Identifying the risks of LM agents with an LM-emulated sandbox,” arXiv preprint arXiv:2309.15817, 2023.
[56] X. Deng, Y. Zhang, J. Wu, J. Bai, S. Yi, Z. Zou, Y. Xiao, R. Qiu, J. Ma, J. Chen, X. Du, X. Yang, S. Cui, C. Meng, W. Wang, J. Song, K. Xu, and Q. Li, “Taming OpenClaw: Security analysis and mitigation of autonomous LLM agent threats,” arXiv preprint arXiv:2603.11619, 2026.
[57] M. Yu, F. Meng, X. Zhou, S. Wang, J. Mao, L. Pan, T. Chen, K. Wang, X. Li, Y. Zhang, B. An, and Q. Wen, “A survey on trustworthy LLM agents: Threats and countermeasures,” in Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. ACM, 2025, pp. 6216–6226.
[58] C. Schroeder de Witt, “Open challenges in multi-agent security: Towards secure systems of interacting AI agents,” arXiv preprint arXiv:2505.02077, 2025.
[59] B. A. Hu and H. Rong, “Inter-agent trust models: A comparative study of brief, claim, proof, stake, reputation and constraint in agentic web protocol design-A2A, AP2, ERC-8004, and beyond,” arXiv preprint arXiv:2511.03434, 2025.
[60] A. Moreno, “Predicting stock price trends using language models to extract the sentiment from analyst reports,” Economics Letters, 2025.
[61] G. Assis, D. Vianna, G. L. Pappa, A. Plastino, W. Meira Jr, A. S. da Silva, and A. Paes, “Analysis of material facts on financial assets: A generative AI approach,” in Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing. Association for Computational Linguistics, 2024, pp. 103–118. [Online]. Available: https://aclanthology.org/2024.finnlp-1.11/
[62] O. Nabar and G. Shroff, “Conservative predictions on noisy data,” in 4th ACM International Conference on AI in Finance, 2023.
[63] S. Kim, J. Hong, and Y. Lee, “A GANs-based approach for stock price anomaly detection and investment risk management,” in Proceedings of the 4th ACM International Conference on AI in Finance. ACM, 2023, pp. 1–9.
[64] M. E. Faysal, W. Feng, and E. Mony, “Agentic commerce: A unified multi-retrieval framework for high-fidelity e-commerce chatbots,” Journal of Computer Science and Artificial Intelligence, 2026.
[65] L. Hornuf, D. Streich, and N. Töllich, “Making GenAI smarter: Evidence from a portfolio allocation experiment,” SSRN Electronic Journal, 2025.
[66] P. Xu, J. Gao, and H. Guo, “A deceit-tolerant negotiation model for agent mediated electronic commerce,” in 2005 International Conference on Machine Learning and Cybernetics. IEEE, 2005.
[67] M. Hirano, K. Minami, and K. Imajo, “Adversarial deep hedging: Learning to hedge without price process modeling,” in Proceedings of the 4th ACM International Conference on AI in Finance. ACM, 2023, pp. 19–26.
[68] S. Zhu, H. Leung, X. Wang, J. Wei, and H. Xu, “When FinTech meets privacy: Securing financial LLMs with differential private fine-tuning,” in 2025 IEEE International Performance, Computing, and Communications Conference. IEEE, 2025, pp. 1–6.
[69] N. Banerjee, A. Sarkar, S. Chakraborty, S. Ghosh, and S. K. Naskar, “Fine-tuning language models for predicting the impact of events associated to financial news articles,” in Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing. Association for Computational Linguistics, 2024, pp. 244–247. [Online]. Available: https://aclanthology.org/2024.finnlp-1.25/
[70] T. Konstantinidis, G. Iacovides, M. Xu, T. G. Constantinides, and D. Mandic, “FinLlama: Financial sentiment classification for algorithmic trading applications,” arXiv preprint arXiv:2403.12285, 2024.
[71] D. Shukanayev, “Who pays when the agent fails? Liability frameworks for autonomous payment systems in a fragmented regulatory landscape,” SSRN Electronic Journal, 2025.
[72] M. Bain and B. Subirana, “Legalising autonomous shopping agent processes,” Computer Law & Security Report, 2003.
[73] H. S. Nwana, J. Rosenschein, T. Sandholm, C. Sierra, P. Maes, and R. Guttmann, “Agent-mediated electronic commerce,” in Proceedings of the Second International Conference on Autonomous Agents (AGENTS ’98). ACM Press, 1998.
[74] A. Moukas, G. Zacharia, R. Guttman, and P. Maes, “Agent-mediated electronic commerce: An MIT Media Laboratory perspective,” International Journal of Electronic Commerce, 2000.
[75] C. Sierra and F. Dignum, “Agent-mediated electronic commerce: Scientific and technological roadmap,” 2001.
[76] B. Gâteau, D. Khadraoui, O. Boissier, and E. Dubois, “Contract model for agent mediated electronic commerce,” in Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems. IEEE Computer Society, 2004, pp. 1454–1455.
[77] W. Wang and I. Benbasat, “Attributions of trust in decision support technologies: A study of recommendation agents for e-commerce,” Journal of Management Information Systems, 2008.
[78] W.-P. Lee, C.-H. Liu, and C.-C. Lu, “Intelligent agent-based systems for personalized recommendations in internet commerce,” Expert Systems with Applications, vol. 22, no. 4, pp. 275–284, 2002.
[79] F. Hua and S.-U. Guan, “Agents and payment systems in e-commerce,” 2001.
[80] M. Ma, “Agents in e-commerce,” Communications of the ACM, 1999.
[81] W. H. Redmond, “The potential impact of artificial shopping agents in e-commerce markets,” Journal of Interactive Marketing, 2002.
[82] P. R. Wurman, M. P. Wellman, and W. E. Walsh, “The Michigan Internet AuctionBot: A configurable auction server for human and software agents,” in Proceedings of the Second International Conference on Autonomous Agents. ACM Press, 1998, pp. 301–308.
[83] M. A. Ferrag, N. Tihanyi, D. Hamouda, L. Maglaras, A. Lakas, and M. Debbah, “From prompt injections to protocol exploits: Threats in LLM-powered AI agents workflows,” ICT Express, vol. 12, no. 2, pp. 353–383, 2026.
[84] S. Balaskas, “From recommendations to delegation: A systematic review mapping agentic AI in e-commerce and its consumer effects,” Information, vol. 17, no. 3, p. 222, 2026.
[85] S. Brohi, Q.-u.-a. Mastoi, N. Z. Jhanjhi, and T. R. Pillai, “A research landscape of agentic AI and large language models: Applications, challenges and future directions,” Algorithms, vol. 18, no. 8, p. 499, 2025.
[86] Y. Liu, R. Zhang, H. Luo, Y. Lin, G. Sun, D. Niyato, H. Du, Z. Xiong, Y. Wen, A. Jamalipour, D. I. Kim, and P. Zhang, “Secure multi-LLM agentic AI and agentification for edge general intelligence by zero-trust: A survey,” arXiv preprint arXiv:2508.19870, 2025.
[87] R. Ko, J. Jeong, S. Zheng, C. Xiao, T.-W. Kim, M. Onizuka, and W.-Y. Shin, “Seven security challenges that must be solved in cross-domain multi-agent LLM systems,” arXiv preprint arXiv:2505.23847, 2025.
[88] D. Kong, S. Lin, Z. Xu, Z. Wang, M. Li, Y. Li, Y. Zhang, H. Peng, X. Chen, Z. Sha, Y. Li, C. Lin, X. Wang, X. Liu, N. Zhang, C. Chen, C. Wu, M. K. Khan, and M. Han, “A survey of LLM-driven AI agent communication: Protocols, security risks, and defense countermeasures,” arXiv preprint arXiv:2506.19676, 2025.
[89] Y. Louck, A. Stulman, and A. Dvir, “Security analysis of agentic AI communication protocols: A comparative evaluation,” arXiv preprint arXiv:2511.03841, 2025.
[90] K. Huang, Y. Mehmood, H. Atta, J. Huang, M. Z. Baig, and S. B. Balija, “Fortifying the agentic web: A unified zero-trust architecture against logic-layer threats,” arXiv preprint arXiv:2508.12259, 2025.
[91] G. Syros, A. Suri, J. Ginesin, C. Nita-Rotaru, and A. Oprea, “SAGA: A security architecture for governing AI agentic systems,” arXiv preprint arXiv:2504.21034, 2025.
[92] M. Xu, “The agent economy: A blockchain-based foundation for autonomous AI agents,” arXiv preprint arXiv:2602.14219, 2026.
[93] J. Zheng, Y. Luo, J. Xu, B. Liu, Y. Chen, C. Cui, G. Deng, C. Lu, X. Wang, A. Zhang, and T.-S. Chua, “Risky-Bench: Probing agentic safety risks under real-world deployment,” arXiv preprint arXiv:2602.03100, 2026.
[94] V. K. Bonagiri, P. Kumaragurum, K. Nguyen, and B. Plaut, “Check yourself before you wreck yourself: Selectively quitting improves LLM agent safety,” arXiv preprint arXiv:2510.16492, 2025.
[95] U. Uchibeke, “Before the tool call: Deterministic pre-action authorization for autonomous AI agents,” arXiv preprint arXiv:2603.20953, 2026.
[96] Y. Erinle, Y. Kethepalli, Y. Feng, and J. Xu, “SoK: Design, vulnerabilities, and security measures of cryptocurrency wallets,” Computer Networks, vol. 273, p. 111691, 2025.
[97] X. Liu, Z. Yu, Y. Zhang, N. Zhang, and C. Xiao, “Automatic and universal prompt injection attacks against large language models,” arXiv preprint arXiv:2403.04957, 2024.
[98] M. Chernyshev, Z. Baig, and R. Doss, “[Short paper] Forensic analysis of indirect prompt injection attacks on LLM agents,” in 2024 IEEE 6th International Conference on Trust, Privacy and Security in Intelligent Systems, and Applications (TPS-ISA). IEEE, 2024, pp. 409–411.
[99] X. Zhang, Y. Lu, and D. Lee, “GuruAgents: Emulating wise investors with prompt-guided LLM agents,” arXiv preprint arXiv:2510.01664, 2025.
[100] Y. He, H. Zhu, Y. Li, S. Shao, H. Yao, Z. Liu, and Z. Qin, “AttriGuard: Defeating indirect prompt injection in LLM agents via causal attribution of tool invocations,” arXiv preprint arXiv:2603.10749, 2026.
[101] A. S. Patlan, P. Sheng, S. A. Hebbar, P. Mittal, and P. Viswanath, “Real AI agents with fake memories: Fatal context manipulation attacks on Web3 agents,” arXiv preprint arXiv:2503.16248, 2025.
[102] S. S. Srivastava and H. He, “MemoryGraft: Persistent compromise of LLM agents via poisoned experience retrieval,” arXiv preprint arXiv:2512.16962, 2025.
[103] B. D. Sunil, I. Sinha, P. Maheshwari, S. Todmal, S. Mallik, and S. Mishra, “Memory poisoning attack and defense on memory based LLM-agents,” arXiv preprint arXiv:2601.05504, 2026.
[104] V. P. Bhardwaj, “SuperLocalMemory: Privacy-preserving multi-agent memory with Bayesian trust defense against memory poisoning,” arXiv preprint arXiv:2603.02240, 2026.
[105] M. P. Papazoglou, “Agent-oriented technology in support of e-business,” Communications of the ACM, 2001.
[106] I. R. Kerr, “Ensuring the success of contract formation in agent-mediated electronic commerce,” Electronic Commerce Research, 2001.
[107] A. Birgisson, J. G. Politz, Ú. Erlingsson, A. Taly, M. Vrable, and M. Lentczner, “Macaroons: Cookies with contextual caveats for decentralized authorization in the cloud,” in Proceedings of the Network and Distributed System Security Symposium (NDSS). Internet Society, 2014. [Online]. Available: https://www.ndss-symposium.org/ndss2014/ndss-2014-programme/macaroons-cookies-contextual-caveats-decentralized-authorization-cloud/
[108] M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006, foundational object-capability security model. [Online]. Available: http://erights.org/talks/thesis/markm-thesis.pdf
[109] M. El Helou, C. Troiani, B. Ryder, J. Diaconu, H. Muyal, and M. Yannuzzi, “Delegated authorization for agents constrained to semantic task-to-scope matching,” arXiv preprint arXiv:2510.26702, 2025.
[110] A. Goswami, “Agentic JWT: A secure delegation protocol for autonomous AI agents,” arXiv preprint arXiv:2509.13597, 2025.
[111] T. Sandholm, “Agents in electronic commerce: Component technologies for automated negotiation and coalition formation,” Autonomous Agents and Multi-Agent Systems, 2000.
[112] J. Pei, S. Vadlamannati, L.-K. Huang, D. Preotiuc-Pietro, and X. Hua, “Modeling and detecting company risks from news,” in Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), 2024.
[113] T. V. Solodukha, O. A. Sosnovskiy, and B. A. Zhelezko, “Multi-agent systems for e-commerce,” Prace Naukowe Uniwersytetu Ekonomicznego we Wrocławiu, no. 133, pp. 131–140, 2010. [Online]. Available: https://dbc.wroc.pl/Content/32060/PDF/Solodukha_Multi-agent_Systems_for_E-commerce_2010.pdf
[114] D. Lee and M. Tiwari, “Prompt infection: LLM-to-LLM prompt injection within multi-agent systems,” arXiv preprint arXiv:2410.07283, 2024.
[115] D. Byrd, “The accidental pump and dump: When agentic AI meets autonomous trading,” in Proceedings of the 6th ACM International Conference on AI in Finance. ACM, 2025, pp. 88–95.
[116] A. D. Hacini, M. Benabdelouahad, I. Abassi, S. Houhou, A. Boulmerka, and N. Farhi, “LLM-assisted financial fraud detection with reinforcement learning,” Algorithms, vol. 18, no. 12, p. 792, 2025.
[117] Financial Crimes Enforcement Network, “Application of FinCEN’s regulations to certain business models involving convertible virtual currencies,” U.S. Department of the Treasury, Guidance FIN-2019-G001, 2019. [Online]. Available: https://www.fincen.gov/sites/default/files/2019-05/FinCEN%20Guidance%20CVC%20FINAL%20508.pdf
[118] European Parliament and Council, “Regulation (EU) 2023/1114 on markets in crypto-assets (MiCA),” Official Journal of the European Union, vol. L 150, pp. 40–205, 2023. [Online]. Available: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32023R1114
[119] Financial Action Task Force, “Targeted update on implementation of the FATF standards on virtual assets and virtual asset service providers,” FATF, Tech. Rep., 2023. [Online]. Available: https://www.fatf-gafi.org/en/publications/Fatfrecommendations/targeted-update-virtual-assets-vasps-2023.html
[120] G. D’Agostino, A. Rocci, and C. Reed, “Capturing analysts’ questioning strategies in earnings calls via a question cornering score (QCS),” in Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, 2024, pp. 107–118. [Online]. Available: https://aclanthology.org/2024.finnlp-2.10/
[121] Y. Cao, Z. Chen, Z. Cui, Z. Deng, Y. He, J. Huang, Y. Jiang, D. Li, H. Li, R. Liu, K. Subbalakshmi, J. Suchow, Q. Xie, G. Xiong, Z. Xu, Z. Yao, Y. Yu, and D. Zhang, “FinCon: A synthesized LLM multi-agent system with conceptual verbal reinforcement for enhanced financial decision making,” in Advances in Neural Information Processing Systems 37. Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024, pp. 137010–137045.
[122] Q. Lan, A. Kaul, S. Jones, and S. Westrum, “Zero-trust runtime verification for agentic payment protocols: Mitigating replay and context-binding failures in AP2,” arXiv preprint arXiv:2602.06345, 2026.
[123] R. Sequeira, S. Damianakis, U. Iqbal, and K. Psounis, “AgentSentry: Bounding LLM agents via execution provenance,” arXiv preprint arXiv:2603.22868, 2026.
[124] S. X. Komiak and I. Benbasat, “Understanding customer trust in agent-mediated electronic commerce, web-mediated electronic commerce, and traditional commerce,” Information Technology and Management, 2004.
[125] W. Wang and I. Benbasat, “Recommendation agents for electronic commerce: Effects of explanation facilities on trusting beliefs,” Journal of Management Information Systems, 2007.
[126] T. de Rosen, “From visibility to eligibility in the age of agentic commerce,” SSRN Electronic Journal, 2025.
[127] Q. Xie, J. Huang, D. Li, Z. Chen, R. Xiang, M. Xiao, Y. Yu, V. Somasundaram, K. Yang, C. Yuan, Z. Luo, Z. Liu, Y. He, Y. Jiang, H. Li, D. Feng, X.-Y. Liu, B. Wang, H. Wang, Y. Lai, J. Suchow, A. Lopez-Lira, M. Peng, and S. Ananiadou, “FinNLP-AgentScen-2024 shared task: Financial challenges in large language models - FinLLMs,” in Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, 2024, pp. 119–126. [Online]. Available: https://aclanthology.org/2024.finnlp-2.11/
[128] G. Son, H. Jeon, C. Hwang, and H. Jung, “KRX bench: Automating financial benchmark creation via large language models,” in Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing. Association for Computational Linguistics, 2024, pp. 10–20. [Online]. Available: https://aclanthology.org/2024.finnlp-1.2/
[129] Z. Yang, R. Li, Q. Qiang, J. Wang, F. Lou, M. Li, D. Cheng, R. Xu, H. Lian, S. Zhang, X. Liang, X. Huang, Z. Wei, Z. Liu, X. Guo, H. Wang, R. Chen, and L. Zhang, “FinVault: Benchmarking financial agent safety in execution-grounded environments,” arXiv preprint arXiv:2601.07853, 2026.
[130] M. Balunovic, L. Beurer-Kellner, E. Debenedetti, M. Fischer, F. Tramèr, and J. Zhang, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Advances in Neural Information Processing Systems 37. Neural Information Processing Systems Foundation, Inc. (NeurIPS), 2024, pp. 82895–82920.
[131] H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” arXiv preprint arXiv:2410.02644, 2024.
[132] X. Yuan, H. Xu, S. Xu, C. Zou, and J. Xiong, “TraderBench: How robust are AI agents in adversarial capital markets?” arXiv preprint arXiv:2603.00285, 2026.
[133] Z. Dai, Z. Peng, Z. Cheng, and R. Y. Li, “When hallucination costs millions: Benchmarking AI agents in high-stakes adversarial financial markets,” arXiv preprint arXiv:2510.00332, 2025.
[134] M. Hirano, “Construction of a Japanese financial benchmark for large language models,” in Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing. Association for Computational Linguistics, 2024, pp. 1–9. [Online]. Available: https://aclanthology.org/2024.finnlp-1.1/
[135] L. Xu, L. Zhu, Y. Wu, and H. Xue, “SuperCLUE-Fin: Graded fine-grained analysis of Chinese LLMs on diverse financial tasks and applications,” arXiv preprint arXiv:2404.19063, 2024.
[136] M. Lee and L.-K. Soon, ““Finance Wizard” at the FinLLM challenge task: Financial text summarization,” in Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, 2024, pp. 153–158. [Online]. Available: https://aclanthology.org/2024.finnlp-2.16/
[137] W. W. Li, H. Kim, M. Cucuringu, and T. Ma, “Can LLM-based financial investing strategies outperform the market in long run?” arXiv preprint arXiv:2505.07078, 2025.
[138] L. Guo, J. Sanz-Cruzado, and R. McCreadie, “University of Glasgow at the FinLLM challenge task: Adapting Llama for financial news abstractive summarization,” in Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, 2024, pp. 127–132. [Online]. Available: https://aclanthology.org/2024.finnlp-2.12/
[139] M. M. Dong, T. C. Stratopoulos, and V. X. Wang, “A scoping review of ChatGPT research in accounting and finance,” International Journal of Accounting Information Systems, vol. 55, p. 100715, 2024.
[140] D. Ramírez, J.-M. Peña, F. Suárez, O. Larré, and A. Cifuentes, “A machine learning plus-features based approach for optimal asset allocation,” in Proceedings of the 4th ACM International Conference on AI in Finance. ACM, 2023, pp. 549–556.
[141] T. J. Norman, D. H. Sleeman, and N. Chapman, “Adaptive brokering in agent-mediated electronic commerce,” 2004.