SAKSHI: Decentralized AI Platforms

Suma Bhat1,3∗, Canhui Chen2, Zerui Cheng2, Zhixuan Fang2, Ashwin Hebbar1, Sreeram Kannan5, Ranvir Rana4, Peiyao Sheng3, Himanshu Tyagi4, Pramod Viswanath1,4, Xuechao Wang6

1 Princeton University; 2 Tsinghua University; 3 University of Illinois Urbana-Champaign; 4 Witness Chain; 5 Eigen Layer; 6 HKUST

August 1, 2023

arXiv:2307.16562v1 [cs.CR] 31 Jul 2023

∗Authors are listed alphabetically. Correspondence to: {hebbar, pramodv}@princeton.edu

Abstract

Large AI models (e.g., Dall-E, GPT4) have electrified the scientific, technological and societal landscape through their superhuman capabilities. These services are offered largely in a traditional web2.0 format (e.g., OpenAI’s GPT4 service). As more large AI models proliferate (personalizing and specializing to a variety of domains), there is a tremendous need for a neutral platform that allows AI models to be hosted and clients to receive AI services efficiently, yet in a trust-free, incentive-compatible, Byzantine-resistant manner. In this paper we propose SAKSHI, a trust-free decentralized platform specifically suited for AI services. The key design principle of SAKSHI is the separation of the data path (where AI query and service is managed) and the control path (where routers and compute and storage hosts are managed) from the transaction path (where the metering and billing of services are managed over a blockchain). This separation is enabled by a “proof of inference” layer which provides cryptographic resistance against a variety of misbehaviors, including poor AI service, nonpayment for service, and copying of AI models. This is joint work between multiple universities (Princeton University, University of Illinois at Urbana-Champaign, Tsinghua University, HKUST) and two startup companies (Witness Chain and Eigen Layer).

1 Introduction

Era of AI.
Artificial Intelligence (AI) has been steadily making progress on a variety of tasks (household tasks by vacuuming robots [1, 2]; playing games such as Chess and Go [3, 4, 5] at superhuman levels; scientific discovery via protein folding predictions [6, 7]; medical progress via drug discovery [8, 9, 10]), but it has broken through the barrier of general intelligence in recent months with the emergence of a new family of generative deep learning models. GPT4 [11, 12] is the prototypical application capturing the world’s attention, at a tremendous energy price. GPT4 has superhuman mastery over natural language and can comprehend complex ideas, exhibiting proficiency in a myriad of domains such as medicine, law, accounting, computer programming, music, and more. Moreover, GPT4 is capable of effectively leveraging external tools such as search engines, calculators, and APIs to complete tasks with minimal instructions and no demonstrations, showcasing its remarkable ability to adapt and learn from external resources. Such progress portends AI’s forthcoming dominance in mediating (and, in several situations, replacing) human interactions, and suggests that AI will be the dominant energy-consuming activity for years to come.

Large Generative AI Models. A class of AI models that is largely representative of this progress is generative AI, which creates content that resembles human-generated content. These models have attracted considerable interest and popularity due to their impressive capabilities in generating high-quality, realistic images, text, video and music. For instance, large language models (LLMs) like ChatGPT [13], Bard [14], and LLaMA [15] attain impressive performance on a wide array of tasks and are being integrated into products such as search engines [16], coding assistants [17] and productivity tools in Google Docs [18].
Further, text-to-image models like StableDiffusion [19], MidJourney [20], and Flamingo [21], text-to-music models like MusicLM [22], and text-to-video models like Make-a-Video [23] have shown the immense potential of large multimodal generative AI models. As large generative AI models continue to evolve, we will witness the emergence of numerous fine-tuned and instruction-tuned models catering to specific use cases (e.g., healthcare, finance, law). While models grow rapidly, Amazon and Nvidia report that AI inference tasks account for up to 90% of the computational resources in AI systems and are in much more frequent demand than AI model training tasks [24]. In this white paper, we mainly focus on AI inference tasks, but the flexibility of our layered architecture design also allows for a market for model training.

Current model: Centralized inference. The dominant way of serving these large models is through public inference APIs [25, 26, 27], offered by the dominant platform companies of today’s economy. For example, the OpenAI API allows users to query models like ChatGPT and DALL-E over a web interface. Although this is a relatively user-friendly option, it is susceptible to the deleterious side effect of centralization: monopolization. Apart from the rent-seeking aspect of the centralized nature of the service offering, privacy implications loom large: the human interactions mediated by generative AI models are vastly more personal and intrusive than web browsing and search queries. Addressing the grand challenge of AI computation via the design of decentralized and programmable platforms is the goal of this paper.

Proposed model: Decentralized Inference. In this paper, we propose to decentralize AI inference across servers provided by consumer devices at the grid edge. Decentralized inference can reduce communication and energy costs by leveraging local computation capabilities.
This is made possible by utilizing energy-efficient devices located at the edge, which could potentially be powered by renewable energy sources. Crucially, the energy overhead of running large data centers is largely reduced, simultaneously opening an opportunity to democratize AI while limiting its ecological footprint. Such a decentralized platform would also enable the deployment of a library of large customized models in a scalable manner: users can host in-demand customized models on this decentralized cloud and earn appropriate rewards.

Our decentralized AI platform, SAKSHI, is populated by a host of different agents: AI service providers, AI clients, and storage and compute hosting nodes. A carefully designed incentive fabric stitches the different agents together into an efficient, trustworthy, and economically fruitful AI platform. Our design of SAKSHI is best visualized in terms of a layered architecture (analogous to network stacks). The layers are enumerated below and visualized in Figure 1.

1. Service layer. This is the path where the query and response (AI inference) are managed. The goal is to have high throughput and low latency, enabling a user journey similar to a standard web2-like service, with the underlying resources (storage, computation) and economic transactions managed in a decentralized and trustless manner.

2. Control layer. This is the path where networking and compute/storage load balancing actions are managed. The decentralized AI models are hosted at multiple locations connected via a (potentially peer-to-peer) network, and our decentralized design borrows from classical web2 content delivery network designs (e.g., Akamai) while also managing the economic transactions in a decentralized and trustless manner.

3. Transaction layer. This is the path where billing and metering are conducted. The key is to have this outside the data path and visible to a broader audience (e.g., via commitments on blockchains).
Importantly, this layer is trust-free, crucially enabled by Witness Chain’s transaction layer service (originally designed for decentralized 5G wireless networks [28], but now naturally repurposed for decentralized AI services).

4. Proof layer. Any disputes over metering and billing are handled here. These proofs also provide resistance to unauthorized usage (e.g., just copying) of AI models. This layer sits outside both the data path and the transaction path. It allows the formulation of novel research questions (at the intersection of large AI models, cryptography and security). We highlight three such key questions: (i) Proof of Inference, where the proof of computation of a specific (deep learning) AI model can be verified; (ii) Proof of ownership, fine-tuning and watermarking, where the proof of downstream modification to an AI model can be verified; (iii) Proof of service delivery, where the proof of the delivery of an AI service can be verified at customizable granularities. These dispute resolutions naturally feed into a reputation system (leading to positive incentives for salutary behavior) or crypto-economic security via slashing (negative incentives; see next layer). This new research, outlined in detail in this paper, is joint work between multiple universities (Princeton University, University of Illinois at Urbana-Champaign, Tsinghua University, HKUST) and two blockchain startups, Witness Chain and Eigen Layer.

Figure 1: The six layer architecture for Web3.0 services

5. Economic layer. So far, the transactions can be handled purely via fiat without the need for a token. This layer explores the benefits of having a token to incentivize participants, both in the transient and long-term stages, and the corresponding economic benefits therein. → Eigen Layer integration and ideas.

6. Marketplace. Compositional AI services, in a single atomic transaction, are naturally enabled.
The common data shared on the blockchain leads to the creation of a decentralized marketplace for AI services. Supply and demand allow the efficient discovery of prices. This layer is optional in the first version.

2 Architecture of Decentralized AI Service

2.1 Requirements

We now describe a specific architecture based on the general six layer architecture outlined in the last section, allowing SAKSHI to be concrete. Our decentralized AI service is designed to enable an open marketplace for AI models where any user can access inference service offered by multiple, untrusted AI service suppliers. Our goal is to ensure that the user is guaranteed a good quality of service and the suppliers get a fair payment for their service. There are several challenges that can hinder the bootstrapping and growth of such a decentralized service:

1. Individual suppliers may not be able to attract enough clients;

2. The supplier may not apply a good model and may return low-quality results;

3. The client may not pay after getting the service.

Each of these challenges is addressed by our decentralized AI service model:

1. We allow an aggregator to collectively offer service on behalf of multiple suppliers. The aggregator and suppliers engage in an SLA implemented as a smart contract to ensure that each gets a fair share of the revenue.

2. We have a proof system for quality of AI services to ensure that suppliers provide the promised quality of service. The proof is implemented through a challenge-response setup executed using a decentralized pool of challenger nodes.

3. We have smart contracts and payment channels to implement a scalable and reliable payment service for the suppliers. This is supported by an objective dispute resolution mechanism to ensure that suppliers get paid if they deliver service.

Figure 2: SAKSHI decentralized AI service architecture. Components shown: client interface, client contract and supply contract (service payment channels), aggregator, servers, router, marketplace, PoInference, and rewards, arranged across the marketplace, control, service, transaction, proof, and economics layers.

2.2 The six layer architecture with Witness Chain

These functionalities of SAKSHI are enabled using the architecture in Figure 2. At the top is the marketplace, a decentralized two-sided platform for buying and selling AI services. A client (user) comes to our marketplace and places an order to access inference service from an aggregator. Both agree on an SLA which contains terms for quality of service and payments.

Next comes the service layer, which provides the APIs for clients to make inference requests to the aggregators. Each request is passed to a matching supplier server using a router deployed as a part of the control layer. Both the service and control layers are reminiscent of standard web 2.0 services with multiple servers, with the caveat that the supplier servers can now be hosted by different entities with their own business incentives and without any pre-existing reputation. These servers are bound to an SLA between them and the aggregator.

All the SLAs that govern the service-payment rules between different parties are deployed as smart contracts as a part of the transaction layer, a decentralization middleware provided by Witness Chain [29]. The Witness Chain transaction layer not only hosts and provides interfaces for the SLA smart contracts, but also provides state channels to maintain the payment and service state for the interacting client, aggregator and supplier. Furthermore, it provides a dispute resolution framework to ensure that the client completes the payment after availing the service.

Finally, a proof layer deploys an appropriate Proof of Inference to ensure that the suppliers are using the models agreed upon in the SLA.
The challenge and verification for this proof are executed by a pool of challengers, called Witnesses, provided by Witness Chain. These proofs interact with the transaction layer to ensure the service quality promised in the SLA. The Witness Chain challenger nodes executing these proofs are incentivized by Witness Chain using a part of the service payment. Witness Chain, in turn, provides a programmable layer for choosing the challenger nodes, which can be used to specify how decentralized the challenger pool should be and how well-provisioned each challenger node needs to be.

A detailed description of each layer is provided in Section 3; the interactions discussed above are depicted at a high level in Figure 3 below.

2.3 The economic layer with Eigen Layer

All entities in the above ecosystem are incentivized to do their job fairly because of the economics underlying the SLA and the incentive system for the challengers. Often, each new blockchain ecosystem launches its own token to provide this cryptoeconomic security. However, this new token may not gain the necessary volume and spread to enforce reasonable security in the early stages, resulting in a failure to bootstrap the ecosystem.

This problem was recently addressed by Eigen Layer [30], which provides a framework for using Ethereum cryptoeconomic security by engaging Ethereum validators. Witness Chain integrates with Eigen Layer and uses Eigen Layer operators as challengers to extend Ethereum security to the decentralized AI marketplace. The challengers running the Proof of Inference, the ultimate root of trust in service quality, would have staked/restaked ETH using Eigen Layer. Witness Chain deploys an additional proof of custody [29] to ensure that these challengers are being diligent in their job, lest their stake be slashed. Putting the restaking framework of Eigen Layer together with the proof of diligence/custody by Witness Chain provides a comprehensive economic security layer for SAKSHI.
Figure 3: Various steps in using SAKSHI. Initiation phase: the client signs an SLA with the aggregator, and the aggregator signs SLAs with servers. Service usage phase: (1) API call, (2) request router to match a server, (3) assign server, (4) input/output exchange, with service payment. Dispute phase: (1) raise dispute, (2) post interactive commitments, (3) resolve dispute, via the transaction/proof layer contracts.

3 Detailed Description of Each Layer

3.1 Service layer

The service layer provides the infrastructure for ML inference queries and is responsible for committing service information to the proof layer. This layer is equivalent to a Web2 server-client architecture with some modifications to support the proof framework. An instantiation of this layer creates a connection between a client and a server to exchange data and makes the server’s compute available through agreed-upon inference APIs. The service layer works in conjunction with the other layers in the infrastructure, as depicted in Figure 4 and described below:

Figure 4: Service layer overview. Labeled steps: (1) assign server, (2) server ID and client ID, (3) handshake, (4) process request and service commitments, (5) payments.

Server assignment: The client requests the control layer to assign a server for an AI model, and the control layer notifies the client of the server’s ID and address. It also notifies the server of an incoming connection from the client.

Service exchange: The client establishes a connection with the server using the address provided by the control layer. Both server and client verify through the transaction layer whether an SLA path exists between them through a common aggregator; if such a path exists, both parties implicitly agree on the trade.
The client sends inference requests using the server’s API endpoint; the client signs the request for use in dispute resolution if the need arises. The server processes the requests and sends the output data back to the client as the response; the server might submit a commitment to the delivered response on a data availability (DA) layer at a later stage if the need arises for dispute resolution. For each unit of inference served (a single API request), the server expects a micropayment as dictated by its SLA. A request is made to the transaction layer, which then sends payments from the client to the aggregator and from the aggregator to the server. The server proceeds to serve the subsequent request from the client only if the payment for the previous request has been processed.

Service dispute witnesses: The data exchanged in the service layer is used as a witness in case a payment dispute arises, such as a client not paying for the AI inference service delivered. The signed inference requests, the output data committed to a DA layer, and the previously exchanged micropayment are used for dispute resolution, as discussed in detail in the following sections on the Transaction and Proof layers.

3.2 Control Layer

Figure 5: Control layer overview. Components shown: router, client interface, servers, service layer, transaction layer, and proof layer (PoInference; Witnesses with PoLocation and PoBackhaul). Labeled steps: (0) maintain server state, (1) update SLAs, (2) matching request, (3) match client-server.

The control layer is responsible for matching clients and servers. This layer consists of a set of routers, each of which maintains the state of all servers subscribed to it. A router performs load balancing by allocating client requests to servers so as to optimize a cost measured in latency, compute cost, and compliance with SLAs. Servers can subscribe to a router of their choice, and clients can select a router of their choice.
The control layer works in conjunction with the other layers, as depicted in Figure 5 and described below:

Server state maintenance: The router maintains a server network state consisting of the following non-exhaustive set of variables:

• Server model capacity: The set of AI models that the server can compute inference on

• Server hardware capacity: The compute capacity of each server

• Server request load: The number of clients the server is currently connected to at the service layer

• Server location: Verified server location from the proof layer

Some of these variables require the router to trust the server’s claims; these are used as soft constraints in routing. Other variables, such as location, are verified through the proof layer; these can be used as hard constraints, such as geo-restricting the inference compute.

SLA state maintenance: The router maintains the state of SLAs signed at the transaction layer between clients and aggregators and between aggregators and servers, so that it can match clients to servers that share a common aggregator. The router watches the transaction layer contracts for events that register or de-register SLAs.

Client-server matching: The client submits a request specifying the type of server it would like to be matched to; this request consists of parameters such as model ID, location boundary, server uptime, etc. The router runs a matching logic to select the server best suited for that model at that time by utilizing the server state and the SLA state. The router then notifies the service layer to establish a connection between the client and the server, and notifies the transaction layer to anticipate payments through their common aggregator.

Note on fairness: A malicious router can route requests unfairly, leading to a loss in revenue for some servers; if a server sees such behavior, it will migrate to another router that provides better revenue through fair routing. This market dynamic facilitates fairness in routing.
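The client-server matching step above can be sketched in a few lines. This is a minimal illustration, not the router's actual logic: the `ServerState` fields, the linear cost function and its weights, and the example server and aggregator names are all assumptions made for the sketch. Model support, verified region, and a shared aggregator are treated as hard constraints, while self-reported latency, price, and load enter only through the soft cost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerState:
    """Hypothetical per-server record kept by a router."""
    server_id: str
    models: set          # server model capacity
    region: str          # location, assumed already verified by the proof layer
    active_clients: int  # current request load
    latency_ms: float    # self-reported, so used only as a soft constraint
    price: float         # per-request price from the aggregator-server SLA
    aggregators: set     # aggregators this server has signed SLAs with


def match(servers, client_aggregators, model_id, allowed_regions,
          w_latency=1.0, w_price=1.0, w_load=1.0) -> Optional[ServerState]:
    """Return the lowest-cost server satisfying all hard constraints, or None."""
    candidates = [
        s for s in servers
        if model_id in s.models                  # hard: can serve the model
        and s.region in allowed_regions          # hard: geo-restriction
        and s.aggregators & client_aggregators   # hard: shared aggregator (SLA path)
    ]
    if not candidates:
        return None
    # Soft constraints are folded into one illustrative linear cost.
    return min(candidates, key=lambda s: (w_latency * s.latency_ms
                                          + w_price * s.price
                                          + w_load * s.active_clients))


servers = [
    ServerState("s1", {"llama-13b"}, "EU", 2, 40.0, 1.0, {"agg1"}),
    ServerState("s2", {"llama-13b"}, "US", 0, 10.0, 0.5, {"agg1"}),  # wrong region
    ServerState("s3", {"llama-13b"}, "EU", 0, 30.0, 1.0, {"agg2"}),  # no shared aggregator
    ServerState("s4", {"llama-13b"}, "EU", 5, 20.0, 2.0, {"agg1"}),
]
best = match(servers, {"agg1"}, "llama-13b", {"EU"})
print(best.server_id if best else None)
```

A real router would also have to weigh SLA compliance history and refresh the state from transaction-layer events; those parts are omitted here.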
3.3 Transaction Layer

The transaction layer is responsible for payment to servers and intermediaries for delivering their service.

3.3.1 Necessity of an integrated transaction layer

Decentralized platforms generate supply by incentivizing and compensating an extensive network of parties, termed suppliers. The platform can be considered a marketplace for the service supply chain, with service flowing from suppliers (servers) to intermediaries and finally to consumers, and compensation flowing the other way. A compensation system is, therefore, a critical part of a decentralized service-oriented platform.

Compensation for providing services is already an integral part of existing centralized platforms such as Uber, AirBnB, and Amazon; however, the billing systems used for their decentralized counterparts need to be composable with the trustless and programmable service framework that decentralized platforms enable. Decentralized platforms need the billing system to support automated, smart contract-initiated dispute resolution and high-speed disbursement of funds, as we will see next.

The transaction layer incorporates the web3 equivalent of a billing system. It ties the billing of a service to a Service Level Agreement (SLA) that codifies the terms of service and payment, and ensures that metering for the SLA is consistent with the service delivered.

3.3.2 Scalability solutions

Decentralized AI platforms cannot rely on the assumption of trust between a server and a client, since either party may be too small to be bound by the principles of reputation maintenance or legal agreements. Thus, they need to be constantly in consensus about the amount of inference service delivered and the payment for such service. A requirement is that this consensus must be achieved per delivery of an inference service unit, i.e., per query.
All parties involved in service delivery must agree on the service delivered and settle payment for that service at frequent intervals. This requirement necessitates a high-throughput, low-latency payment system. The consensus literature is rich in solutions to scale payments, ranging from sharding, rollups, and sidechains to payment channels. Our payment system should ideally satisfy the following properties:

• High throughput of payments

• Low latency between payment initiation and confirmation

• Throughput that scales with the number of supply-side or demand-side participants

• Payment per service delivery is not public information and may only be shared between the supplier, consumer, and the chosen intermediaries.

State channels and payment channels satisfy all the above requirements. Modeling a decentralized AI platform, we observe that a single client will interact with multiple servers to query different models and will use different suppliers for inter-session privacy. Managing a state channel across multiple servers does not scale. Hence we choose a payment channel approach to build the transaction layer’s payment system. We will have a payment channel between a client and an aggregator intermediary and another between the aggregator intermediary and the server, enabled by SLA chaining. Figure 6 depicts the interaction of transaction layer components with other layers, with details on the architecture below:

Figure 6: Transaction layer overview. Components shown: client contract and supply contract (service payment channels), SLA manager, and the service, control, proof, and marketplace layers. Labeled steps: (1) match client-aggregator and aggregator-supplier, (2) maintain SLA state, (3) measurements, (4) micropayments, (5) resolve inference disputes, (6) periodic commitment.

3.3.3 Architecture overview

The transaction layer encompasses SLAs that any two parties agree on, an SLA manager that converts service measurements to payments using the SLA, SLA clients running on the machines of both parties and fetching data from the measurement gateway, and a blockchain wrapper for posting transactions. These components are described in detail below:

Service contracts: Service contracts consist of two components: an SLA that both transacting parties agree on, and a unidirectional payment channel with funds flowing from the service consumer to the supplier. For the AI platform there exist two consumer-supplier pairs: (i) Client - Aggregator and (ii) Aggregator - Server. The SLA is codified based on the SLA4OpenAPI standard [31] and maps service usage to a payment. SLAs for AI applications map (model type, input size, output size) to a token payment amount. The unidirectional payment channel is set up with an escrow from the consuming party to the supplying party and sets the terms for delegating payment keys to an intermediary SLA manager.

SLA manager: End clients can either run a codebase that signs micropayments themselves or delegate this to an application running in the cloud: the SLA manager. The SLA manager receives signed measurements from the SLA clients of the consumer and the supplier and converts them to an appropriate payment amount by signing a micropayment and sending funds on the payment channel on behalf of the consumer.

SLA client and measurement gateway: The SLA client and measurement gateway are components that run on the end devices of the consumer and supplier. The measurement gateway interprets the service messages and converts them into service units. For AI applications, these would be the model requested, input size, and output size. The SLA client fetches this information from the measurement gateway, signs it with the key codified in the service contract, and sends it to the SLA manager; optionally, the SLA client (on the consumer end) can convert the measurement to a micropayment itself and forward it to the supplier.
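To make the measurement-to-payment conversion concrete, the sketch below shows one way an SLA manager might price a service unit. Everything here is illustrative: the rate table, the linear pricing formula, and the rule that both parties' measurements must agree before a micropayment is signed are assumptions, not part of the SLA4OpenAPI standard or the Witness Chain implementation. Signature handling is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Service unit reported by a measurement gateway (hypothetical fields)."""
    model: str
    input_size: int
    output_size: int


# Hypothetical SLA price table, in the smallest token unit:
# model -> (base fee, price per input unit, price per output unit)
SLA_RATES = {"llama-13b": (100, 2, 5)}


def payment_due(m: Measurement, rates=SLA_RATES) -> int:
    """Map (model type, input size, output size) to a token payment amount."""
    base, per_in, per_out = rates[m.model]
    return base + per_in * m.input_size + per_out * m.output_size


def reconcile(consumer_m: Measurement, supplier_m: Measurement) -> int:
    """Return the micropayment amount only if both measurements agree.

    Requiring exact agreement is an assumption of this sketch; a mismatch
    would be handed to the dispute resolution path instead of being paid.
    """
    if consumer_m != supplier_m:
        raise ValueError("measurement mismatch: escalate to dispute resolution")
    return payment_due(consumer_m)


unit = Measurement("llama-13b", input_size=50, output_size=20)
print(reconcile(unit, unit))
```

Integer token units are used so that repeated micropayments accumulate without rounding drift.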
Blockchain wrapper: This component runs on the SLA manager and the SLA client. It is responsible for broadcasting and listening to on-chain transactions such as payment channel start, termination, and dispute messages. The blockchain wrapper is compatible with multiple blockchains such as Ethereum, Polygon, Solana, and all EVM-compatible rollups.

3.3.4 Dispute-compatibility

SAKSHI uses a post-service payment model, so payment disputes can emerge when a supplier claims non-receipt of payment for a service unit (a single AI inference). The associated micropayment can serve as a proof of payment to resolve such disputes. Micropayments in unidirectional payment channels typically consist of a signed commitment to the total payable amount. To make these payment channels dispute-compatible, we need to augment them with additional parameters. Firstly, the micropayment should include a unique ‘requestID’ that corresponds to the disputed inference. Secondly, it should contain the hash of the preceding micropayment, which can be validated using a nonce (a counter incremented with each successive micropayment). To resolve a payment dispute raised by the server, the payer can commit the associated micropayment. Additionally, the preceding micropayment must also be committed, in order to calculate the amount payable for the disputed service unit as the difference between the two committed totals. Depending on who is deemed to be correct, the dispute can be settled on-chain from the existing balance in the payment channel. Our dispute resolution protocol also addresses other scenarios, such as disputes raised by a malicious server without providing service, and inconsistent micropayment commitments. Figure 7 depicts an example flow of utilizing payment channel commitments for service dispute resolution.
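The augmented micropayment described above can be sketched as a small hash-linked record. This is an illustration under stated assumptions rather than the SAKSHI wire format: the field names, the JSON-plus-SHA-256 digest, and the all-zero genesis hash are hypothetical, and signatures (which a real channel would require on every micropayment) are left out to keep the example self-contained.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional

GENESIS = "0" * 64  # hypothetical placeholder hash for the first micropayment


@dataclass(frozen=True)
class Micropayment:
    request_id: str     # identifies the inference this payment covers
    nonce: int          # counter incremented with each micropayment
    total_payable: int  # cumulative amount owed on the channel so far
    prev_hash: str      # digest of the preceding micropayment

    def digest(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def next_payment(prev: Optional[Micropayment], request_id: str, amount: int) -> Micropayment:
    """Extend the chain with a payment of `amount` for `request_id`."""
    if prev is None:
        return Micropayment(request_id, 0, amount, GENESIS)
    return Micropayment(request_id, prev.nonce + 1,
                        prev.total_payable + amount, prev.digest())


def amount_paid_for(disputed: Micropayment, preceding: Optional[Micropayment]) -> int:
    """Validate the link between two committed micropayments and return
    the amount paid for the disputed service unit."""
    if preceding is None:
        if disputed.nonce != 0 or disputed.prev_hash != GENESIS:
            raise ValueError("preceding micropayment missing")
        return disputed.total_payable
    if disputed.nonce != preceding.nonce + 1:
        raise ValueError("nonce does not follow the preceding micropayment")
    if disputed.prev_hash != preceding.digest():
        raise ValueError("hash link to preceding micropayment is broken")
    if disputed.total_payable < preceding.total_payable:
        raise ValueError("cumulative total decreased")
    return disputed.total_payable - preceding.total_payable


p0 = next_payment(None, "req-1", 300)
p1 = next_payment(p0, "req-2", 450)
print(amount_paid_for(p1, p0))
```

Committing both `p1` and `p0` lets a dispute contract recompute the payment for `req-2` without seeing any earlier micropayments, which keeps the rest of the payment history private to the channel participants.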
3.4 Proof Layer

The proof layer, operating outside the data and transaction paths, provides a way to resolve various disputes in SAKSHI, utilizing blockchains as an immutable and trusted medium to read and write service states. A variety of disputes can arise in the AI service, and “proof” systems that provide cryptographic resolution mechanisms address the corresponding issues. In this paper, we focus on two categories of proofs, each responding to a different type of dispute.

• Proof of Inference, a proof of correct computation on a prescribed (and open) AI model, mediates disputes about correct inference;

• Proof of Model-ownership, a proof of how closely two AI models are related to each other and whether one AI model is a clone or a fine-tuned version of the other, mediates potential disputes related to intellectual property held by the owner of an AI model.

Figure 7: Utilizing transaction layer payments for service dispute resolution. Panels: ideal path, service dispute, dispute resolution.

Figure 8: Proof layer overview. Components shown: dispute resolution contract, witnesses, server network, client interface, data availability, and the service, transaction, and economics layers, connected by requests, micropayments, and incentives.

Figure 8 depicts the interaction of the dispute resolution contract in the proof layer with the rest of the platform layers. A detailed description of the individual proofs follows.

3.4.1 Proof of Inference

A crucial aspect of decentralized inference platforms is the presence of incentives that encourage honest participation in the protocol while discouraging malicious actors. An essential component of this incentive design is addressing the problem of provably verifying computations executed by untrusted servers. Various design choices are available to enable such a proof of inference, with several emerging research directions.

One such line of research involves the application of zero-knowledge proofs (ZKP) to verify AI model execution [32].
However, this approach is extremely computationally intensive, necessitating concessions such as quantization, which 15 === PAGE 16 === leads to lower accuracy. Furthermore, generating ZKPs for modern, large-scale generative AI models is currently impractical. An alternative strategy is to adopt an optimistic approach. In this scheme, the server commits the hash of the generated output, and the system assumes the off-chain inference to be accurate. If a participant (“challenger”) doubts the inference’s correctness, they can contest its validity by submitting a fraud proof. This proof can be generated using a verification oracle that can re-run the model and determine the accuracy of the server’s or challenger’s claim. However, since these oracle nodes may have limited computational capabilities, recomputing the entire neural network forward pass is prohibitively expensive and inefficient. To address this issue, we propose a method inspired by the bisection scheme employed in the optimistic rollup Arbitrum [33]. A key observation is that AI models can be viewed as a sequence of functions, such as layers in a neural network. f(x) = y → fn(fn−1(fn−2(...f2(f1(x))...))) = y When there is a discrepancy between the outputs of a server and a challenger, we can employ an interactive bisection scheme to identify a single function—the first layer in the AI model where the outputs of the two parties differ. By im- plementing this system, oracle nodes only need to compute and verify a single layer of the network, significantly reducing costs and making the verification of extremely large models feasible. Indeed, deterministic AI inference is a prereq- uisite for such schemes, which is attainable by fixing the random state. We illustrate our ModelBisection algorithm in Figure 9, that identifies the earliest layer of the AI model where the inputs align for both parties, but the resulting outputs diverge, while minimizing the number of interactive steps in- volved. 
In the case of a sequential model (Figure 9, left), one can use a form of binary search: if the output of a queried layer (typically the midpoint) is inconsistent between the parties, we recursively bisect the first half of the node sequence. Otherwise, we eliminate the first half and recursively bisect the second half of the sequence. Each bisection step eliminates half of the remaining candidates for the faulty layer. After a logarithmic number of iterations, we locate a layer whose input is consistent, yet for which the parties produce differing outputs.

However, the computations within an AI model are in general not simply sequential but rather form a Directed Acyclic Graph (DAG) structure. Consequently, the bisection mechanism used for sequential networks cannot be directly applied to AI models. We demonstrate our approach, ModelBisection, on an Inception block of GoogLeNet [34], as depicted in Figure 9 (right). Suppose we select the node $n_1$ = L2.2 in the DAG for output verification. Both parties compute and share the intermediate output of layer L2.2. If the outputs are equal, we prune all ancestor nodes of this node in the DAG from consideration (as their outputs would have to be consistent). If, however, the outputs differ, we eliminate all non-ancestor nodes of this node in the DAG (since the first divergence must then lie at this node or among its ancestors). We keep track of the identified consistent and inconsistent nodes, and continue this process until we reach a single layer where the inputs are consistent between the parties, but the outputs differ.
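The pruning procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the DAG is given as a map from each node to its parents, the interactive comparison is abstracted as a callback `outputs_agree`, the final output is assumed to be disputed, and agreement on a node is assumed to imply agreement on all of its ancestors. The query node is chosen as the one that splits the remaining candidates most evenly, counting a node together with its ancestors (an interpretation made here so that a consistent node is itself pruned). The Inception wiring below is hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # node -> list of parent nodes


def closed_ancestors(parents: Graph, node: str) -> Set[str]:
    """Return `node` together with every node it transitively depends on."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def model_bisection(parents: Graph,
                    outputs_agree: Callable[[str], bool]) -> Tuple[str, int]:
    """Locate a node whose inputs agree but whose output differs.

    Returns the node and the number of interactive queries used.
    """
    candidates = set(parents)
    rounds = 0
    while len(candidates) > 1:
        n = len(candidates)
        sizes = {x: len(closed_ancestors(parents, x) & candidates)
                 for x in candidates}
        # Most balanced split; ties are broken by name for determinism.
        query = max(sorted(candidates),
                    key=lambda x: min(sizes[x], n - sizes[x]))
        upstream = closed_ancestors(parents, query)
        rounds += 1
        if outputs_agree(query):
            candidates -= upstream   # consistent: prune node and ancestors
        else:
            candidates &= upstream   # inconsistent: prune non-ancestors
    return candidates.pop(), rounds


# Sequential model with 8 layers; the parties diverge from layer L5 onward.
chain = {f"L{i}": ([f"L{i - 1}"] if i > 1 else []) for i in range(1, 9)}
chain_node, chain_rounds = model_bisection(
    chain, lambda name: int(name[1:]) < 5)

# Inception-like DAG (hypothetical wiring); the fault is injected at L3.2.
inception = {
    "L1": [],
    "L2.1": ["L1"], "L2.2": ["L1"], "L2.3": ["L1"], "L2.4": ["L1"],
    "L3.1": ["L2.2"], "L3.2": ["L2.3"], "L3.3": ["L2.4"],
    "L4": ["L2.1", "L3.1", "L3.2", "L3.3"],
}
diverged = {"L3.2", "L4"}
dag_node, dag_rounds = model_bisection(
    inception, lambda name: name not in diverged)

print(chain_node, chain_rounds)
print(dag_node, dag_rounds)
```

On the 8-layer chain the sketch reduces to ordinary binary search and needs three queries; on the DAG only the single identified layer would have to be recomputed by an oracle.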
We employ a greedy strategy to select the node in the digraph such that it is split in the most balanced way. We choose the node which maximizes $\min\{|x|, n - |x|\}$, where $|x|$ is the number of ancestors of node $x$, and $n$ is the total number of nodes in the current digraph. This score can be interpreted as the least number of nodes that would be eliminated as potential candidates for the first point of divergence when $x$ is queried; maximizing it therefore keeps the number of ModelBisection rounds small.

[Figure 9: Model bisection. Step 1: convert the AI model into a DAG (examples: a feedforward NN built from Linear, BatchNorm and ReLU layers; a GoogLeNet Inception module with base, 1x1 and 5x5 convolutions, average pooling and concatenation). Step 2: ModelBisection first step (check the midpoint layer in the sequential case, L2.2 in the DAG case); a consistent output prunes ancestors, an inconsistent output prunes non-ancestors. Repeat until a layer is found whose inputs are consistent and whose output is inconsistent. Legend: node with consistent output; node with inconsistent output; unchecked node.]

It is noteworthy that even in large foundation models, the ModelBisection approach can pinpoint a single layer of divergence in a very small number of iterations. For example, in the case of the 13 billion parameter LLaMA model [15], fewer than ten iterations suffice. Finally, we observe that the bisection subroutine bears similarity to the one provided by Git in git bisect, which aids in identifying the first faulty commit in the DAG of commits and merges.

3.4.2 Proof of Model Ownership

A decentralized AI marketplace comprises three main entities: model owners who collect datasets and train or finetune AI models, compute-rich servers, and end-users.
As opposed to current open-source model hosting solutions, decentralized marketplaces can incentivize model creators by rewarding them with a percentage of the inference fee when their models are utilized. However, such an incentive design is susceptible to model copying attacks, where a malicious actor can copy, slightly modify, and profit from the hosted models at the expense of the model creators. Therefore, a robust mechanism for model ownership resolution becomes a crucial prerequisite for decentralized AI marketplaces.

One promising approach to a proof of model ownership is to embed a watermark in the neural network during the training phase. To be effective, a DNN watermarking scheme must fulfill several criteria. It should be functionality-preserving, meaning the watermark embedding must not impact model performance. The watermark must be robust, that is, extractable from any transformed model (e.g., after weight scaling or finetuning). Additionally, a watermarked model should remain indistinguishable from a non-watermarked model to potential adversaries. Moreover, a watermark must be resistant to ambiguity attacks, i.e., false claims of the existence of a different watermark.

Various watermarking schemes have been proposed in the research literature. Parameter encoding methods [35, 36, 37] integrate a watermark directly into the model’s parameters. For classification models, an alternative method is backdooring: incorrect labels are assigned to the examples in a trigger set, and the model’s behavior on this set serves as the watermark [38, 39]. Additionally, task-specific and model-specific watermarking methods have been proposed [40, 41, 42, 43]. Nonetheless, the robustness of existing methods against model copying has been questioned by recent attacks [44, 45, 46], highlighting an unresolved research challenge.
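To make the backdooring idea above concrete, the following toy sketch checks whether a classifier reproduces the deliberately incorrect labels of a secret trigger set. It is a simplified illustration under stated assumptions, not one of the cited schemes: the models, the trigger set, the 0.9 threshold, and the names `trigger_match_rate` and `carries_watermark` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Input = Tuple[float, ...]
Classifier = Callable[[Input], int]
TriggerSet = Sequence[Tuple[Input, int]]


def trigger_match_rate(model: Classifier, trigger_set: TriggerSet) -> float:
    """Fraction of trigger inputs on which the model outputs the secret label."""
    hits = sum(1 for x, label in trigger_set if model(x) == label)
    return hits / len(trigger_set)


def carries_watermark(model: Classifier, trigger_set: TriggerSet,
                      threshold: float = 0.9) -> bool:
    """Decide ownership by thresholding the match rate (threshold is assumed)."""
    return trigger_match_rate(model, trigger_set) >= threshold


def clean_model(x: Input) -> int:
    """Toy classifier: class 1 if the features sum to a positive value."""
    return int(sum(x) > 0)


# Trigger inputs paired with labels that a clean model would NOT produce.
trigger_set: TriggerSet = [
    ((1.0, 2.0), 0),
    ((-1.0, -2.0), 1),
    ((3.0, 0.5), 0),
    ((-0.5, -4.0), 1),
]
memorized = {x: label for x, label in trigger_set}


def watermarked_model(x: Input) -> int:
    """Same behavior as the clean model except on the memorized triggers."""
    return memorized.get(x, clean_model(x))


print(carries_watermark(watermarked_model, trigger_set))
print(carries_watermark(clean_model, trigger_set))
```

Because verification reveals the trigger examples to whoever runs it, a check of this kind would have to be performed by a party trusted not to leak them.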
Notably, in most watermark extraction algorithms, information about the watermark location or the trigger examples is revealed during the verification process. This knowledge facilitates watermark removal and ambiguity attacks. Therefore, in our system a trusted judge is required to resolve model ownership disputes. Model creators must embed watermarks in their models and post a commitment to the watermark on the blockchain. The judge must be able to verify the existence of watermarks using the extraction algorithm, which may be task- and model-specific. Such a proof of model ownership can make it infeasible to profit from stolen models within the decentralized marketplace. However, it does not prevent an adversary from copying a model and using it outside this system (e.g., via a black-box API). Such acts can be deterred by licensing the model’s use only in this marketplace, and resorting to legal means if necessary.

3.5 Summary

Proofs of inference and ownership are two examples of a broader family of protocols providing Byzantine resistance in SAKSHI. Even here, we have worked more to describe the problems than the solutions; this is a call to arms to the scientific community. As the platform evolves and participation rises, the attack space could also expand, opening the door for new and different kinds of proof systems (e.g., proof of custody; proof of infrastructure hosting the AI models).

References

[1] iRobot. Roomba robot vacuums. https://www.irobot.com/en_US/roomba.html. Accessed: 2023-03-23.
[2] Boston Dynamics. The most dynamic humanoid robot. https://www.bostondynamics.com/atlas. Accessed: 2023-02-01.
[3] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017.
[4] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, 2017.
[5] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play. Science, 362(6419):1140–1144, 2018.
[6] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583–589, 2021.
[7] Richard Evans, Michael O’Neill, Alexander Pritzel, Natasha Antropova, Andrew Senior, Tim Green, Augustin Žídek, Russ Bates, Sam Blackwell, Jason Yim, et al. Protein complex prediction with alphafold-multimer. BioRxiv, pages 2021–10, 2021.
[8] Jonas Boström, Dean G Brown, Robert J Young, and György M Keserü. Expanding the medicinal chemistry synthetic toolbox. Nature Reviews Drug Discovery, 17(10):709–727, 2018.
[9] Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, and Philip M Kim. Fast and flexible protein design using deep graph neural networks. Cell systems, 11(4):402–411, 2020.
[10] Petra Schneider, W Patrick Walters, Alleyn T Plowright, Norman Sieroka, Jennifer Listgarten, Robert A Goodnow Jr, Jasmin Fisher, Johanna M Jansen, José S Duca, Thomas S Rush, et al. Rethinking drug design in the artificial intelligence era. Nature Reviews Drug Discovery, 19(5):353–364, 2020.
[11] OpenAI. Gpt-4 technical report, 2023.
[12] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al.
Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023.
[13] Introducing chatgpt, 2022. Retrieved March 14, 2023, from https://openai.com/blog/chatgpt.
[14] Google. BARD. https://blog.google/technology/ai/bard-google-ai-search-updates/.
[15] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[16] Yusuf Mehdi. Reinventing search with a new ai-powered microsoft bing and edge, your copilot for the web. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-w
[17] Github CoPilot. Your ai pair programmer is leveling up. https://github.com/features/preview/copilot-x, 2023. Accessed: 2023-03-24.
[18] Google Cloud. The next generation of ai for developers and google workspace. https://blog.google/technology/ai/ai-developers-google-cloud-workspace/, 2023. Accessed: 2023-03-24.
[19] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
[20] Midjourney. https://www.midjourney.com. Accessed: 2023-03-23.
[21] Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, et al. Flamingo: a visual language model for few-shot learning. arXiv preprint arXiv:2204.14198, 2022.
[22] Andrea Agostinelli, Timo I Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, et al. Musiclm: Generating music from text. arXiv preprint arXiv:2301.11325, 2023.
[23] Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792, 2022.
[24] Joseph McDonald, Baolin Li, Nathan Frey, Devesh Tiwari, Vijay Gadepally, and Siddharth Samsi. Great power, great responsibility: Recommendations for reducing energy for training language models. In Findings of the Association for Computational Linguistics: NAACL 2022, pages 1962–1970, 2022.
[25] OpenAI. Transforming work and creativity with ai. https://openai.com/product. Accessed: 2023-03-23.
[26] Forefront. Powerful language models a click away. https://forefront.ai/. Accessed: 2023-03-23.
[27] AI21 Labs. When machines become thought partners. https://ai21.com/. Accessed: 2023-03-23.
[28] SVR Anand, Serhat Arslan, Rajat Chopra, Sachin Katti, Milind Kumar Vaddiraju, Ranvir Rana, Peiyao Sheng, Himanshu Tyagi, and Pramod Viswanath. Trust-free service measurement and payments for decentralized cellular networks. In Proceedings of the 21st ACM Workshop on Hot Topics in Networks, pages 68–75, 2022.
[29] Witness Chain team. Witness chain. https://www.witnesschain.com/. Accessed: 2023-07-16.
[30] Eigenlayer. https://www.eigenlayer.xyz/. Accessed: 2023-07-17.
[31] Sla4oai-specification. https://github.com/isa-group/SLA4OAI-Specification, 2022.
[32] Daniel Kang, Tatsunori Hashimoto, Ion Stoica, and Yi Sun. Scaling up trustless dnn inference with zero-knowledge proofs. arXiv preprint arXiv:2210.08674, 2022.
[33] Harry Kalodner, Steven Goldfeder, Xiaoqi Chen, S Matthew Weinberg, and Edward W Felten. Arbitrum: Scalable, private smart contracts. In 27th USENIX Security Symposium (USENIX Security 18), pages 1353–1370, 2018.
[34] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions.
In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9, 2015.
[35] Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin’ichi Satoh. Embedding watermarks into deep neural networks. In Proceedings of the 2017 ACM on international conference on multimedia retrieval, pages 269–277, 2017.
[36] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. Deepsigns: An end-to-end watermarking framework for ownership protection of deep neural networks. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 485–497, 2019.
[37] Lixin Fan, Kam Woh Ng, and Chee Seng Chan. Rethinking deep neural network ownership verification: Embedding passports to defeat ambiguity attacks. Advances in neural information processing systems, 32, 2019.
[38] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph Keshet. Turning your weakness into a strength: Watermarking deep neural networks by backdooring. In 27th USENIX Security Symposium (USENIX Security 18), pages 1615–1631, 2018.
[39] Sebastian Szyller, Buse Gul Atli, Samuel Marchal, and N Asokan. Dawn: Dynamic adversarial watermarking of neural networks. In Proceedings of the 29th ACM International Conference on Multimedia, pages 4417–4425, 2021.
[40] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. arXiv preprint arXiv:2303.15435, 2023.
[41] Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Ngai-Man Cheung, and Min Lin. A recipe for watermarking diffusion models. arXiv preprint arXiv:2303.10137, 2023.
[42] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for language models. arXiv preprint arXiv:2306.09194, 2023.
[43] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers, and Tom Goldstein. A watermark for large language models.
arXiv preprint arXiv:2301.10226, 2023.
[44] Nils Lukas, Edward Jiang, Xinda Li, and Florian Kerschbaum. Sok: How robust is image classification deep neural network watermarking? In 2022 IEEE Symposium on Security and Privacy (SP), pages 787–804. IEEE, 2022.
[45] Yifan Yan, Xudong Pan, Mi Zhang, and Min Yang. Rethinking white-box watermarks on deep learning models under neural structural obfuscation. In 32nd USENIX Security Symposium (USENIX Security 23), 2023.
[46] Jian Liu, Rui Zhang, Sebastian Szyller, Kui Ren, and N Asokan. False claims against model ownership resolution. arXiv preprint arXiv:2304.06607, 2023.