=== PAGE 1 ===
SAKSHI: Decentralized AI Platforms
Suma Bhat1,3∗
Canhui Chen2
Zerui Cheng2
Zhixuan Fang2
Ashwin Hebbar1
Sreeram Kannan5
Ranvir Rana4
Peiyao Sheng3
Himanshu Tyagi4
Pramod Viswanath1,4
Xuechao Wang6
1 Princeton University
2 Tsinghua University
3 University of Illinois Urbana-Champaign
4 Witness Chain
5 Eigen Layer
6 HKUST
August 1, 2023
Abstract
Large AI models (e.g., Dall-E, GPT4) have electrified the scientific,
technological and societal landscape through their superhuman capabili-
ties. These services are offered largely in a traditional web2.0 format (e.g.,
OpenAI’s GPT4 service). As more large AI models proliferate (person-
alizing and specializing to a variety of domains), there is a tremendous
need to have a neutral trust-free platform that allows the hosting of AI
models, clients receiving AI services efficiently, yet in a trust-free, incen-
tive compatible, Byzantine behavior resistant manner. In this paper we
propose SAKSHI, a trust-free decentralized platform specifically suited for
AI services. The key design principles of SAKSHI are the separation of the
data path (where AI query and service is managed) and the control path
(where routers and compute and storage hosts are managed) from the
transaction path (where the metering and billing of services are managed
over a blockchain). This separation is enabled by a “proof of inference”
layer which provides cryptographic resistance against a variety of misbehaviors,
including poor AI service, nonpayment for service, and copying of AI
models. This is joint work between multiple universities (Princeton University,
University of Illinois at Urbana-Champaign, Tsinghua University,
HKUST) and two startup companies (Witness Chain and Eigen Layer).
∗Authors are listed alphabetically.
Correspondence to: {hebbar, pramodv}@princeton.edu
arXiv:2307.16562v1  [cs.CR]  31 Jul 2023


1 Introduction
Era of AI. Artificial Intelligence (AI) has been steadily making progress on a
variety of tasks (household tasks by vacuuming robots [1, 2], playing games –
Chess, Go [3, 4, 5] – at superhuman levels, scientific discovery via protein
folding predictions [6, 7], medical progress by drug discoveries [8, 9, 10]), but has
broken through the barrier of general intelligence in recent months with the
emergence of a new family of generative deep learning models – GPT4 [11, 12]
is the prototypical application capturing the world’s attention, at a tremen-
dous energy price. GPT4 has super-human mastery over natural language, and
can comprehend complex ideas, exhibiting proficiency in a myriad of domains
such as medicine, law, accounting, computer programming, music, and more.
Moreover, GPT4 is capable of effectively leveraging external tools such as search
engines, calculators, and APIs to complete tasks with minimal instructions and
no demonstrations, showcasing its remarkable ability to adapt and learn from
external resources. Such progress portends AI’s forthcoming dominance in medi-
ating (and replacing under several situations) human interactions, and promises
AI to be the dominant energy consuming activity for years to come.
Large Generative AI Models. An AI model that is largely representative of
the class is generative AI, which creates content that resembles human-generated
ones. These models have attracted considerable interest and popularity due to
their impressive capabilities in generating high-quality, realistic images, text,
video and music. For instance, large language models (LLMs) like ChatGPT
[13], Bard [14], and LLaMA [15] attain impressive performance on a wide ar-
ray of tasks and are being integrated in products such as search engines [16],
coding assistants [17] and productivity tools in Google Docs [18].
Further,
text-to-image models like StableDiffusion [19], MidJourney [20], Flamingo [21],
text-to-music models like MusicLM, [22] and text-to-video models like Make-
a-Video [23] have shown the immense potential of large multimodal generative
AI models. As large generative AI models continue to evolve, we will witness
the emergence of numerous fine-tuned and instruction-tuned models catering to
specific use cases (e.g., healthcare, finance, law). Whilst models grow rapidly,
Amazon and Nvidia report that AI inference tasks particularly account for up
to 90% of the computational resource in AI systems, which are much more fre-
quently demanded than AI model training tasks [24]. In this white paper, we
mainly focus on the AI inference tasks, but the flexibility of our layer architec-
ture design allows the market for model training as well.
Current model: Centralized inference. The dominant way of serving
these large models is through public inference APIs [25, 26, 27], offered by
the dominant platform companies of today’s economy. For example, the OpenAI
API allows users to query models like ChatGPT and DALL-E over a web
interface. Although this is a relatively user-friendly option, it is susceptible to
the deleterious side-effect of centralization: monopolization. Apart from the
rent-seeking aspect of the centralized nature of the service offering, privacy
implications loom large: the human interactions mediated by generative AI models
are vastly more personal and intrusive than web browsing and search queries.
Addressing the grand challenge of AI computation via the design of decentral-
ized and programmable platforms is the goal of this paper.
Proposed model:
Decentralized Inference.
In this paper, we propose
to decentralize AI inference across servers provided by consumer devices at
the grid edge. Decentralized inference can reduce communication and energy
costs by leveraging local computation capabilities. This is made possible by
utilizing energy-efficient devices located at the edge, which could potentially be
powered by renewable energy sources. Crucially, the energy overhead of running
large data-centers is largely reduced, simultaneously opening an opportunity to
democratize AI whilst limiting its ecological footprint.
Such a decentralized
platform would also enable the deployment of a library of large customized
models in a scalable manner - users can host in-demand customized models on
this decentralized cloud, and earn appropriate rewards.
Our decentralized AI platform, SAKSHI, is populated by a host of different
agents: AI service providers, AI clients, storage and compute hosting nodes.
A carefully designed incentive fabric stitches the different agents together into
an efficient, trustworthy, and economically fruitful AI platform.
Our design
of SAKSHI is best visualized in terms of a layered architecture (analogous to
network stacks). The layers are enumerated below and visualized in Figure 1.
1. Service layer. This is the path where the query and response (AI infer-
ence) are managed. The goal is to have high throughput and low latency
– the goal is to enable user journey similar to a standard web2-like ser-
vice, with the underlying resources (storage, computation) and economic
transaction managed in a decentralized and trustless manner.
2. Control layer. This is the path where networking and compute/storage
load balancing actions are managed.
The decentralized AI models are
hosted at multiple locations connected via a (potentially peer to peer)
network, and our decentralized design borrows from classical web2 con-
tent delivery network designs (e.g., Akamai) while managing the economic
transaction also in a decentralized and trustless manner.
3. Transaction layer.
This is the path where billing and metering are
conducted. The key is to have this outside the data path and visible to
a broader audience (e.g., via commitments on blockchains). Importantly
this is trust free crucially enabled via Witness Chain’s transaction layer
service (originally designed for decentralized 5G wireless networks [28],
but now naturally repurposed for decentralized AI services).
4. Proof layer. Any dispute in terms of metering and billing is handled
here. These proofs also provide resistance to unauthorized usage (e.g., just
copying) of AI models. This is definitely outside the data path, but also
outside the transaction path. This layer allows the formulation of novel
research questions (at the intersection of large AI models, cryptography
and security). We highlight three such key questions: (i) Proof of Inference
– where the proof of computation of a specific (deep learning) AI
model can be verified; (ii) Proof of ownership, fine-tuning and watermarking
– where the proof of downstream modification to an AI model can be
verified; (iii) Proof of service delivery – where the proof of the delivery of
an AI service can be verified at customizable granularities. These dispute
resolutions naturally feed into a reputation system (leading to positive
incentives for salutary behavior) or crypto economic security via slashing
(negative incentives; see next layer). This new research, outlined in detail
in this paper, is joint work between multiple universities (Princeton University,
University of Illinois at Urbana-Champaign, Tsinghua University,
HKUST), and two blockchain startups Witness Chain and Eigen Layer.

Figure 1: The six layer architecture for Web3.0 services

5. Economic layer. So far, the transactions can be handled purely via fiat
without the need for a token. This layer explores the benefits of having
a token to incentivize participants, both in the transient and long term
stages and the corresponding economic benefits therein.
→Eigenlayer
integration and ideas.
6. Marketplace. Compositional AI services, in a single atomic transaction,
are naturally enabled. The common data shared on the blockchain leads
to the creation of a decentralized marketplace for AI services. The supply
and demand allows the efficient discovery of prices. Optional in the first
version.
2 Architecture of Decentralized AI Service
2.1 Requirements
We now describe a specific architecture based on the general six layer architec-
ture outlined in the last section, allowing SAKSHI to be concrete. Our decentral-
ized AI service is designed to enable an open marketplace for AI models where
any user can access inference service offered by multiple, untrusted AI service
suppliers. Our goal is to ensure that the user is guaranteed a good quality of
service and the suppliers get a fair payment for their service.
There are several challenges that can hinder bootstrapping and growth of
such a decentralized service:
1. Individual suppliers may not be able to attract enough clients;
2. The supplier may not apply a good model and return low quality results;
3. The client may not pay after getting the service.
Each of these challenges is addressed by our decentralized AI service model:
[Figure 2: SAKSHI decentralized AI service architecture. Components shown:
client interface, client contract (service payment channel), aggregator,
servers, supply contract (service payment channel), router, marketplace,
PoInference, and rewards, arranged across the marketplace, control, service,
transaction, proof, and economics layers. The marketplace selects the
aggregator and the router.]
1. We allow an aggregator to collectively offer service on behalf of multiple
suppliers. The aggregator and suppliers engage in an SLA implemented
as a smart contract to ensure that each gets a fair share of the revenue.
2. We have a proof system for quality of AI services to ensure that suppli-
ers provide the promised quality of service.
The proof is implemented
through a challenge-response setup executed using a decentralized pool of
challenger nodes.
3. We have smart contracts and payment channels to implement scalable and
reliable payment service for the suppliers. This will be supported by an
objective dispute resolution mechanism to ensure that suppliers can get
paid if they deliver service.
2.2 The six layer architecture with Witness Chain
These functionalities of SAKSHI are enabled using the architecture in Figure 2.
At the top is the marketplace, a decentralized two-sided platform for buying
and selling AI services. A client (user) comes to our marketplace and places
an order to access inference service from an aggregator. Both agree on an SLA
which contains terms for quality of service and payments.
Next comes the service layer that provides the APIs for clients to make
inference requests to the aggregators. This request is appropriately passed to a
matching supplier server using a router deployed as a part of the control layer.
Both service and control layer are reminiscent of standard web 2.0 services with
multiple servers, with the caveat that the supplier servers can now be hosted
by different entities with their own business incentives and without any pre-
existing reputation. These servers are bound to an SLA between them and the
aggregator.
All the SLAs that govern the service-payment rules between different par-
ties are deployed as smart contracts as a part of the transaction layer, a de-
centralization middleware provided by Witness Chain [29]. The Witness Chain
transaction layer not only hosts and provides interfaces for the SLA smart con-
tracts, but also provides state channels to maintain the payment and service
state for interacting client, aggregator and supplier. Furthermore, it provides a
dispute resolution framework to ensure that the client completes the payment
after availing the service.
Finally, a proof layer deploys an appropriate Proof of Inference to ensure
that the suppliers are using models agreed upon in the SLA. This challenge
and verification for this proof is executed by a pool of challengers, Witnesses,
provided by Witness Chain. These proofs interact with the transaction layer
to ensure service quality promised in the SLA. The Witness Chain challenger
nodes executing these proofs are incentivised by Witness Chain using a part of
service payment. Witness Chain, in turn, provides a programmable layer for
choosing the challenger nodes which can be used to specify how decentralized
the challenger pool should be and how well-provisioned each challenger node
needs to be.
A detailed description of each layer is provided in Section 3; the interactions
discussed above are depicted at a high level in Figure 3 below.
2.3 The economic layer with Eigen Layer
All entities in the above ecosystem are incentivized to do their job fairly be-
cause of the economics underlying the SLA and the incentive system for the
challengers. Often, each new blockchain ecosystem launches its own token to
provide this cryptoeconomic security. However, this new token may not gain the
necessary volume and spread to enforce reasonable security in the early stages,
resulting in failure to bootstrap for the ecosystem.
This problem was solved recently by Eigen Layer [30] which provides a frame-
work for using Ethereum cryptoeconomic security by engaging Ethereum valida-
tors. Witness Chain integrates with Eigen Layer and uses Eigen Layer operators
as challengers to extend Ethereum security to the decentralized AI marketplace.
The challengers running the Proof of Inference, the ultimate root of trust in ser-
vice quality, would have staked/restaked Eth using Eigen Layer. Witness Chain
deploys an additional proof of custody [29] to ensure that these challengers are
being diligent in their job, lest their stake be slashed. Putting the restaking
framework of Eigen Layer together with the proof of diligence/custody by Wit-
ness Chain provides a comprehensive economic security layer for SAKSHI.
[Figure 3: Various steps in using SAKSHI.
Initiation phase: the client signs an SLA with the aggregator; the aggregator
signs SLAs with the servers.
Service usage phase: 1. API call (client to service interface); 2. Request
router to match a server; 3. Assign server; 4. Input/output exchange; service
payment.
Dispute phase: 1. Raise dispute; 2. Post interactive commitments; 3. Resolve
dispute (through the transaction/proof layer contracts).]
[Figure 4: Service layer overview. Components shown: client interface, server,
router (control layer), PoInference and data availability (proof layer),
transaction layer, and marketplace. Labeled steps: 1. Assign server;
2. Server ID / Client ID; 3. Handshake; 4. Process request; 4. Service
commitments; 5. Payments.]
3 Detailed Description of Each Layer
3.1 Service layer
The service layer enables the infrastructure for ML inference queries and is re-
sponsible for committing service information to the proof layer. This layer is
equivalent to a Web2 server-client architecture with some modifications to sup-
port the proof framework. An instantiation of this layer creates a connection
between a client and a server to exchange data and makes the server’s com-
pute available through agreed-upon Inference APIs. The service layer works in
conjunction with other layers in the infrastructure as depicted in Figure 4 and
described below:
Server Assignment: The client requests the control layer to assign a server
for an AI model, and the control layer notifies the client of the server’s ID and
address. It also notifies the server of an incoming connection from the client.
Service exchange: The client establishes a connection with the server using
the address provided by the control layer. Both server and client verify through
the transaction layer whether an SLA path exists between them through a common
aggregator; if such a path exists, both parties implicitly agree on the trade. The
client sends inference requests using the server’s API endpoint; the client signs
each request for use in dispute resolution if the need arises. The server processes
the requests and sends the output data back to the client as the response; the
server may later submit a commitment to the delivered response on a data
availability (DA) layer if the need arises for dispute resolution. For each unit of
inference served (a single API request), the server expects a micropayment
as dictated by its SLA. A request is made to the transaction layer, which then
sends payments from the client to the aggregator and from the aggregator to
the server. The server proceeds to serve the subsequent request from the client
only if the payment for the previous request has been processed.

[Figure 5: Control layer overview. Components shown: router, client interface
and servers (service layer), PoInference and Witnesses (PoLocation,
PoBackhaul) (proof layer), and transaction layer. Labeled steps: 0. Maintain
server state; 1. Update SLAs; 2. Matching request; 3. Match client-server.]

Service dispute witnesses: The data exchanged in the service layer is used
as a witness in case a payment dispute arises, such as a client not paying for
the AI inference service delivered. The signed inference requests, output data
committed to a DA layer, and the previous exchanged micropayment will be
used for dispute resolution, as discussed in detail in the following sections on
the Transaction and Proof layers.
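
As a rough illustration of the pay-per-request gating described under Service exchange above, the following minimal sketch models a server that refuses the next request until all previously served requests are covered by the cumulative payment it has received. The class name `Server`, the flat `price_per_request`, and the placeholder `run_model` are assumptions made for this example and are not part of SAKSHI; signatures and the transaction layer itself are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy server that serves a request only after the previous one is paid."""
    price_per_request: int                    # assumed flat SLA price, in tokens
    paid_total: int = 0                       # cumulative payment received so far
    served: int = 0                           # number of requests served so far
    log: list = field(default_factory=list)   # (request_id, output) kept for disputes

    def receive_payment(self, cumulative_amount: int) -> None:
        # Unidirectional channel: the committed total may only grow.
        if cumulative_amount < self.paid_total:
            raise ValueError("cumulative payment cannot decrease")
        self.paid_total = cumulative_amount

    def handle(self, request_id: str, payload: str) -> str:
        # Every previously served request must be paid before serving the next.
        if self.paid_total < self.served * self.price_per_request:
            raise PermissionError("previous request is unpaid")
        output = run_model(payload)
        self.served += 1
        self.log.append((request_id, output))
        return output


def run_model(payload: str) -> str:
    """Placeholder for the actual inference call (hypothetical)."""
    return payload.upper()
```

In this sketch the first request is served on credit, which mirrors the post-service payment model; the retained `log` stands in for the data that would serve as a dispute witness.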
3.2 Control Layer
The control layer is responsible for matching clients and servers. This layer
consists of a set of routers, each of which maintains the state of all servers
subscribed to it. A router performs load balancing by allocating client requests
to servers so as to optimize a cost measured in latency, compute cost, and
compliance with SLAs. Servers can subscribe to a router of their choice, and
clients can select a router of their choice. The control layer works in
conjunction with other layers as depicted in Figure 5 and described below:
Server state maintenance: The router maintains a server network state consisting
of the following non-exhaustive set of variables:
• Server model capacity: The set of AI models that the server can compute
inference on
• Server hardware capacity: The compute capacity of each server
• Server request load: The number of clients the server is currently con-
nected to at the service layer
• Server location: Verified server location from the proof layer
Some of these variables require the router to trust the server’s claims; these
will be used for soft constraints in routing. Other variables, such as location,
will be verified through the proof layer; these can be used for hard constraints
such as geo-restricting the inference compute.
SLA state maintenance: The router maintains the state of SLAs signed at
the transaction layer between client-aggregators and aggregator-servers so that
it can match clients to servers that share a common aggregator. The router
watches the transaction layer contracts for events to register or de-register SLAs.
Client-server matching: The client submits a request specifying the type of
server it would like to be matched to - this request consists of parameters such
as model id, location boundary, server uptime, etc. The router runs a matching
logic to select a server best suited for that model at that time by utilizing the
server state and the SLA state. The router then notifies the service layer to
establish a connection between the client and the servers and the transaction
layer to anticipate payments through their common aggregator.
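
The matching step can be sketched as a filter followed by a cost minimization. The example below is only one possible reading of the description above: the field names, the choice of model support, verified region, and a shared aggregator as hard constraints, and the linear cost `latency_ms + load_weight * load` are all assumptions made for illustration, not the router logic used by SAKSHI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServerState:
    server_id: str
    models: frozenset        # model ids the server claims to support
    region: str              # location verified via the proof layer
    load: int                # number of currently connected clients
    latency_ms: float        # estimated latency to the client
    aggregators: frozenset   # aggregators the server has an SLA with


def match_server(servers, model_id, allowed_regions, client_aggregators,
                 load_weight=10.0) -> Optional[ServerState]:
    """Return the lowest-cost server satisfying all hard constraints, or None."""
    candidates = [
        s for s in servers
        if model_id in s.models                   # server must host the model
        and s.region in allowed_regions           # hard geo constraint
        and s.aggregators & client_aggregators    # an SLA path must exist
    ]
    if not candidates:
        return None
    # Soft constraints are folded into a single illustrative cost.
    return min(candidates, key=lambda s: s.latency_ms + load_weight * s.load)
```

A real router would also weigh compute cost and SLA compliance, which the text lists as part of the cost but does not specify further.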
Note on fairness: A malicious router can unfairly route requests leading to
a loss in revenue for some servers; if a server sees such behavior, it will migrate
to another router that provides better revenue by providing fair routing. This
market dynamic facilitates fairness in routing.
3.3 Transaction Layer
The transaction layer is responsible for payment to servers and intermediaries
for delivering their service.
3.3.1 Necessity of an integrated transaction layer
Decentralized platforms generate supply by incentivizing and compensating an
extensive network of parties - termed suppliers. The platform can be considered
a marketplace for the service supply chain, with service flowing from suppliers
(servers) to intermediaries and finally to consumers and compensation flowing
the other way. A compensation system is, therefore, a critical part of a decen-
tralized service-oriented platform.
Compensation for providing services is already an integral part of existing
centralized platforms such as Uber, AirBnB, and Amazon; however, the billing
systems used for their decentralized counterpart need to be composable with
the trustless and programmable service framework that decentralized platforms
enable. Decentralized platforms need the billing system to support automated
smart contract-initiated dispute resolution and high-speed dispersion of funds,
as we will see next. The transaction layer incorporates the web3 equivalent of a
billing system. The transaction layer ties the billing of a service with a Service
Level Agreement (SLA) that codifies the terms of service and payment, and
ensures that metering for the SLA is consistent with the service delivered.
3.3.2 Scalability solutions
Decentralized AI platforms cannot rely on the assumption of trust between a
server and a client since either party may be too small to be bound by the
principles of reputation maintenance or legal agreements. Thus, they need to
be constantly in consensus about the amount of inference service delivered and
payment for such service. A requirement for achieving this consensus is that
it must be achieved per delivery of an inference service unit - a query.
All
parties involved in service delivery must agree on the service delivered and
settle payment for that service delivered at frequent intervals. This requirement
necessitates a high throughput, low latency payment system.
Consensus literature is rich in solutions to scale payments, ranging from
sharding, rollups, and sidechains to payment channels. Our payment system
should ideally satisfy the following properties:
• High throughput of payments
• Low latency between payment initiation and confirmation
• Scale throughput with the number of supply or demand side participants
• Payment per service delivery is not public information and may only be
shared between the supplier, consumer, and the chosen intermediaries.
State channels and payment channels satisfy all the above requirements.
Modeling a decentralized AI platform, we observe that a single client will inter-
act with multiple servers to query for different models and use different suppliers
for inter-session privacy. The requirement for managing a state channel across
multiple servers is not scalable. Hence we choose a payment channel approach
to build the transaction layer’s payment system. We will have a payment chan-
nel between a client and an aggregator intermediary and another between the
aggregator intermediary and server, enabled by SLA chaining. Figure 6 depicts
the interaction of transaction layer components with other layers, with details
on the architecture below:
3.3.3 Architecture overview
[Figure 6: Transaction layer overview. Components shown: client contract
(service payment channel), supply contract (service payment channel), SLA
manager, and the service, control, proof, and marketplace layers. Labeled
steps: 1. Match client-aggregator; 1. Match aggregator-supplier; 2. Maintain
SLA state; 3. Measurements; 4. Micropayments; 5. Resolve inference disputes;
6. Periodic commitment.]

The transaction layer encompasses SLAs that any two parties agree on, an
SLA manager that converts service measurements to payments using the SLA, SLA
clients running on the machines of both parties that fetch data from the
measurement gateway, and a blockchain wrapper for posting transactions. These
components are described in detail below:
Service contracts: Service contracts consist of two components: an SLA that
both the transacting parties agree on and a unidirectional payment channel with
funds flowing from the service consumer to the supplier. For the AI platform
there exist two consumer-supplier pairs: (i) Client - Aggregator and (ii)
Aggregator - Server. The SLA is codified based on the SLA4OpenAPI standard [31] and
maps service usage to a payment. SLAs for AI applications map (model type,
input size, output size) to a token payment amount. The unidirectional payment
channel is set up with an escrow from the consuming party to the supplying party
and sets the terms of delegation of payment keys to an intermediary SLA manager.
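
To make the mapping from (model type, input size, output size) to a token amount concrete, here is a minimal sketch under assumed terms. The rate structure, the model names, and the prices are invented for illustration; the example does not follow the SLA4OpenAPI document format and is not the pricing scheme of any actual SLA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    base: int              # flat fee per request, in smallest token units
    per_input_unit: int    # fee per unit of input size
    per_output_unit: int   # fee per unit of output size


# Hypothetical rate table; model names and prices are made up for illustration.
SLA_RATES = {
    "text-small": Rate(base=10, per_input_unit=1, per_output_unit=2),
    "image-gen": Rate(base=500, per_input_unit=1, per_output_unit=0),
}


def payment_due(model_type: str, input_size: int, output_size: int) -> int:
    """Map one metered service unit to a token amount under the SLA."""
    if input_size < 0 or output_size < 0:
        raise ValueError("sizes must be non-negative")
    rate = SLA_RATES[model_type]  # KeyError if the model is not covered by the SLA
    return (rate.base
            + rate.per_input_unit * input_size
            + rate.per_output_unit * output_size)
```

Using integer token units avoids rounding disagreements between the two parties, which matters when both sides must independently arrive at the same amount.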
SLA manager: End clients can either run a codebase that signs micropayments
themselves or delegate this task to an application running on the cloud, the SLA
manager. The SLA manager receives signed measurements from the SLA clients of
the consumer and the supplier and converts them to an appropriate payment amount
by signing a micropayment and sending funds on the payment channel on behalf
of the consumer.
SLA client and measurement gateway: SLA client and measurement gate-
way are components that run on the end devices of the consumer and supplier.
The measurement gateway interprets the service messages and converts them
into service units. For AI applications, these would be the model requested,
input size, and output size. The SLA client fetches this information from the
measurement gateway, signs it with the key codified in the service contract, and
sends it to the SLA manager; optionally, the SLA client (on the consumer end)
can convert the measurement to a micropayment itself and forward it to
the supplier.
Blockchain wrapper: This component runs on the SLA manager and SLA
client. It is responsible for broadcasting and listening to on-chain transactions
such as payment channel start, termination, and dispute messages. The
blockchain wrapper is compatible with multiple blockchains such as Ethereum,
Polygon, Solana, and all EVM-compatible rollups.
3.3.4 Dispute-compatibility
SAKSHI utilizes a post-service payment model. Payment disputes can emerge
when a supplier claims non-receipt of payment for a service unit (a single AI
inference). The associated micropayment can serve as a proof of payment to
resolve such disputes. Micropayments in unidirectional payment channels typically
consist of a signed commitment of the total payable amount. To make
these payment channels dispute-compatible, we need to augment them
with additional parameters. Firstly, the micropayment should include a unique
‘requestID’ that corresponds to the disputed inference. Secondly, it should con-
tain the hash of the preceding micropayment, which can be validated using a
nonce - a counter incremented with each successive micropayment. To resolve
a payment dispute raised by the server, the payer can commit the associated
micropayment. Additionally, the preceding micropayment must also be commit-
ted, to calculate the amount payable for the disputed service unit. Depending
on who is deemed to be correct, the dispute can be settled on-chain from the
existing balance in the payment channel. Our dispute resolution protocol also
addresses other scenarios, such as disputes raised by a malicious server with-
out providing service, and inconsistent micropayment commitments. Figure 7
depicts an example flow of utilizing payment channel commitments for service
dispute resolution.
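
The augmented micropayment described above can be sketched as a small hash-chained record. This is an illustration under stated assumptions: the field names, the JSON-plus-SHA-256 encoding, and the helper functions are hypothetical, and signatures are deliberately omitted, so the sketch covers only the chaining and the per-unit amount calculation, not the actual SAKSHI message format.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Micropayment:
    request_id: str    # identifies the inference this payment covers
    nonce: int         # counter incremented with each micropayment
    total: int         # cumulative amount payable on the channel
    prev_hash: str     # hash of the preceding micropayment ("" for the first)

    def digest(self) -> str:
        encoded = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(encoded).hexdigest()


def next_payment(prev: Optional[Micropayment], request_id: str,
                 amount: int) -> Micropayment:
    """Build the micropayment that follows `prev` and pays `amount`."""
    if prev is None:
        return Micropayment(request_id, 0, amount, "")
    return Micropayment(request_id, prev.nonce + 1,
                        prev.total + amount, prev.digest())


def amount_for_disputed_unit(prev: Optional[Micropayment],
                             current: Micropayment, request_id: str) -> int:
    """Check the committed pair and return the amount paid for the request."""
    if current.request_id != request_id:
        raise ValueError("micropayment does not cover the disputed request")
    if prev is None:
        if current.nonce != 0 or current.prev_hash != "":
            raise ValueError("missing preceding micropayment")
        return current.total
    if current.nonce != prev.nonce + 1 or current.prev_hash != prev.digest():
        raise ValueError("micropayments are not consecutive")
    # Totals are cumulative, so the difference is the disputed unit's payment.
    return current.total - prev.total
```

Because each micropayment commits to a cumulative total, committing the disputed micropayment together with its predecessor is enough to recover the amount paid for that single service unit.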
3.4 Proof Layer
The proof layer, operating outside the data and transaction paths, provides a
way to resolve various disputes in SAKSHI, utilizing blockchains as an immutable
and trusted medium to read and write service states. A variety of disputes can
arise in the AI service, and “proof” systems that provide cryptographic resolution
mechanisms address the corresponding issues. In this paper, we focus on two
categories of proofs, each responding to different types of disputes.
• Proof of Inference, a proof of correct computation on a prescribed (and
open) AI model, mediates disputes of correct inference;
• Proof of Model-ownership, a proof of how closely two AI models are related
to each other and whether one AI model is a clone or a fine-tuned version
of the other, mediates potential disputes related to intellectual property
held by the owner of an AI model.
[Figure 7: Utilizing transaction layer payments for service dispute resolution.
Panels: Ideal path; Service dispute; Dispute resolution.]

[Figure 8: Proof layer overview. Components shown: service layer (client
interface, server network), transaction layer, economics layer, data
availability, witnesses, and the dispute resolution contract. Labeled flows:
requests, micropayments, incentives.]
Figure 8 depicts the interaction of the dispute resolution contract in the
proof layer with the rest of the platform layers. A detailed description of the
individual proof follows.
3.4.1 Proof of Inference
A crucial aspect of decentralized inference platforms is the presence of incen-
tives that encourage honest participation in the protocol while discouraging
malicious actors. An essential component of this incentive design is addressing
the problem of provably verifying computations executed by untrusted servers.
Various design choices are available to enable such proof of inference, with sev-
eral emerging research directions.
One such line of research involves the application of zero-knowledge proofs
(ZKP) to verify AI model execution [32]. However, this approach is extremely
computationally intensive, necessitating concessions such as quantization, which
leads to lower accuracy. Furthermore, generating ZKPs for modern, large-scale
generative AI models is currently impractical.
An alternative strategy is to adopt an optimistic approach. In this scheme,
the server commits the hash of the generated output, and the system assumes
the off-chain inference to be accurate. If a participant (“challenger”) doubts the
inference’s correctness, they can contest its validity by submitting a fraud proof.
This proof can be generated using a verification oracle that can re-run the model
and determine the accuracy of the server’s or challenger’s claim. However, since
these oracle nodes may have limited computational capabilities, recomputing
the entire neural network forward pass is prohibitively expensive and inefficient.
To address this issue, we propose a method inspired by the bisection scheme
employed in the optimistic rollup Arbitrum [33]. A key observation is that AI
models can be viewed as a sequence of functions, such as layers in a neural
network.
$$f(x) = y \quad\longrightarrow\quad f_n(f_{n-1}(f_{n-2}(\cdots f_2(f_1(x))\cdots))) = y$$
When there is a discrepancy between the outputs of a server and a challenger,
we can employ an interactive bisection scheme to identify a single function: the
first layer in the AI model where the outputs of the two parties differ. With
this scheme, oracle nodes only need to compute and verify a single layer of the
network, which significantly reduces costs and makes the verification of
extremely large models feasible. Note that deterministic AI inference is a
prerequisite for such schemes; it is attainable by fixing the random state.
Figure 9 illustrates our ModelBisection algorithm, which identifies the
earliest layer of the AI model where the inputs align for both parties but the
resulting outputs diverge, while minimizing the number of interactive steps
involved. In the case of a sequential model (left), one can use a form of binary
search: if the output of a queried layer (typically the midpoint) is inconsistent
between the parties, we recursively bisect the first half of the node sequence.
Otherwise, we eliminate the first half and recursively bisect the second half of
the sequence. Each bisection step eliminates half of the remaining candidates
for the faulty layer. After a logarithmic number of iterations, we locate a layer
whose input is consistent, yet the parties produce differing outputs.
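The sequential case can be sketched as a plain binary search. This is a minimal illustration, not the authors' implementation: the function name, the `outputs_agree` callback (standing in for one interactive round in which both parties reveal the output of a layer), and the 16-layer example are assumptions.

```python
from typing import Callable, Tuple


def first_divergent_layer(
    n_layers: int, outputs_agree: Callable[[int], bool]
) -> Tuple[int, int]:
    """Locate the first layer (numbered 1..n_layers) whose output differs.

    Invariant: the output of layer `lo` agrees (lo = 0 denotes the shared
    input x) and the output of layer `hi` disagrees (initially the final
    output, which is what the dispute is about).
    Returns (layer, number_of_interactive_rounds).
    """
    lo, hi = 0, n_layers
    rounds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        rounds += 1
        if outputs_agree(mid):
            lo = mid  # first half is consistent: eliminate it
        else:
            hi = mid  # divergence is at or before mid: keep first half
    return hi, rounds


# Hypothetical dispute: the parties' outputs differ from layer 11 onward.
layer, rounds = first_divergent_layer(16, lambda i: i < 11)
print(layer, rounds)
```

On termination, layer `hi - 1` agrees and layer `hi` disagrees, so the oracle only needs to re-execute layer `hi` on the agreed input.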
However, the computations within an AI model are in general not simply
sequential but rather form a Directed Acyclic Graph (DAG). Consequently, the
bisection mechanism used for sequential networks cannot be directly applied
to such models. We demonstrate our approach, ModelBisection, on an Inception
block of GoogLeNet [34], as depicted in Figure 9 (right). Suppose we select the
node n1 = L2.2 in the DAG for output verification. Both parties compute and
share the intermediate output of layer L2.2. If the outputs are equal, we prune
all ancestor nodes of this node in the DAG from consideration (as their outputs
would have to be consistent). If, however, the outputs differ, we eliminate
all non-ancestor nodes of this node in the DAG (since the first inconsistent
output must occur at this node or at one of its ancestors). We keep track of
the identified consistent and inconsistent nodes, and continue this process
until we reach a single layer where the inputs are consistent between the
parties, but the outputs differ. We employ


=== PAGE 17 ===
Figure 9: Model bisection
(Step 1 converts the AI model into a DAG: a feedforward NN of Linear,
BatchNorm and ReLU layers becomes the chain L1, L2, ..., Ln, and a GoogLeNet
Inception module becomes a DAG with nodes L1, L2.1 to L2.4, L3.1 to L3.3 and
L4. Step 2 shows the first ModelBisection step, checking L_{n/2} in the chain
and L2.2 in the DAG: a consistent output prunes ancestors, an inconsistent
output prunes non-ancestors. The step is repeated until a layer is found whose
inputs are consistent and whose output is inconsistent. Legend: node with
consistent output, node with inconsistent output, unchecked node.)


=== PAGE 18 ===
a greedy strategy to select the node that splits the digraph in the most
balanced way. We choose the node x that maximizes min{|x|, n − |x|}, where
|x| is the number of ancestors of node x, and n is the total number of nodes in
the current digraph. This score can be interpreted as the least number of nodes
that would be eliminated as potential candidates for the first point of
divergence when x is queried; maximizing it keeps the number of ModelBisection
rounds small. It is noteworthy that even in large foundation models, the
ModelBisection approach can pinpoint a single layer of divergence in a very
small number of iterations. For example, in the case of the 13 billion
parameter LLaMA model [15], fewer than ten iterations suffice. Finally, we
observe that the bisection subroutine bears similarity to the one used by Git
in git bisect, which aids in identifying the first faulty entry in the DAG of
commits and merges.
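The DAG variant with the greedy selection rule can be sketched as below. This is a hedged illustration rather than the authors' code. The assumptions are: |x| is counted as the node x together with its ancestors that are still candidates; a consistent answer prunes x and its ancestors; a disagreement at a node propagates to all of its descendants; and the edge list of the example graph is hypothetical, only loosely modeled on the Inception block in Figure 9.

```python
from typing import Callable, Dict, List, Set, Tuple


def ancestors(parents: Dict[str, List[str]], node: str) -> Set[str]:
    """All strict ancestors of `node`, where parents[v] lists the inputs of v."""
    seen: Set[str] = set()
    stack = list(parents[node])
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents[p])
    return seen


def model_bisection(
    parents: Dict[str, List[str]], outputs_agree: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return (first divergent node, number of interactive rounds)."""
    closure = {v: ancestors(parents, v) | {v} for v in parents}
    candidates = set(parents)
    rounds = 0
    while len(candidates) > 1:
        n = len(candidates)

        def score(v: str) -> int:
            a = len(closure[v] & candidates)  # pruned if v is consistent
            return min(a, n - a)              # n - a pruned if inconsistent

        query = max(sorted(candidates), key=score)  # greedy balanced split
        rounds += 1
        if outputs_agree(query):
            candidates -= closure[query]      # prune node and its ancestors
        else:
            candidates &= closure[query]      # prune all non-ancestors
    return candidates.pop(), rounds


# Hypothetical edges for an Inception-like block.
parents = {
    "L1": [],
    "L2.1": ["L1"], "L2.2": ["L1"], "L2.3": ["L1"], "L2.4": ["L1"],
    "L3.1": ["L2.2"], "L3.2": ["L2.3"], "L3.3": ["L2.4"],
    "L4": ["L2.1", "L3.1", "L3.2", "L3.3"],
}
disagreeing = {"L3.2", "L4"}  # fault injected at L3.2, visible downstream
node, rounds = model_bisection(parents, lambda v: v not in disagreeing)
print(node, rounds)
```

A source node among the remaining candidates always has a score of at least one, so every round removes at least one candidate and the loop terminates.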
3.4.2
Proof of Model ownership
A decentralized AI marketplace comprises three main entities: model owners
who collect datasets and train or finetune AI models, compute-rich servers, and
end-users. In contrast to current open-source model hosting solutions,
decentralized marketplaces can incentivize model creators by rewarding them
with a percentage of the inference fee when their models are utilized. However,
such an incentive design is susceptible to model copying attacks, where a
malicious actor can copy, slightly modify, and profit from the hosted models at
the cost of the model creators. Therefore, a robust mechanism for model
ownership resolution becomes a crucial prerequisite for decentralized AI
marketplaces.
One promising approach to proof of model ownership is to embed a watermark
in the neural network during the training phase. To be effective, a DNN
watermarking scheme must fulfill several criteria. It should be
functionality-preserving, meaning the watermark embedding must not impact
model performance. The watermark must be robust, i.e., extractable from any
transformed version of the model (e.g., after weight scaling or finetuning).
Additionally, a watermarked model should remain indistinguishable from a
non-watermarked model to potential adversaries. Moreover, a watermark must be
resistant to ambiguity attacks, i.e., false claims of the existence of a
different watermark.
Various watermarking schemes have been proposed in the research literature.
Parameter encoding methods [35, 36, 37] integrate a watermark directly into
the model’s parameters. For classification models, an alternative method is
backdooring, which assigns incorrect labels to examples in a trigger set; the
model’s behavior on this set then serves as a watermark [38, 39]. Additionally,
task-specific and model-specific watermarking methods have been proposed
[40, 41, 42, 43]. Nonetheless, the robustness of existing methods against model
copying has been questioned by recent attacks [44, 45, 46], highlighting an
unresolved research challenge.
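To make the parameter-encoding idea concrete, here is a toy sketch, loosely in the spirit of such schemes and not a reproduction of any cited method. The assumptions are: a secret random projection matrix (`key`) maps a weight vector to bits by the sign of each projection, and embedding is done by gradient steps on a logistic loss applied on its own. In real training this term would be added to the task loss, so functionality preservation and robustness are not demonstrated here. All names and sizes are hypothetical.

```python
import math
import random
from typing import List


def extract_bits(weights: List[float], key: List[List[float]]) -> List[int]:
    """Read one bit per key row: 1 if the projection <row, weights> is positive."""
    return [1 if sum(k * w for k, w in zip(row, weights)) > 0 else 0 for row in key]


def embed_bits(
    weights: List[float],
    key: List[List[float]],
    bits: List[int],
    lr: float = 0.05,
    epochs: int = 300,
) -> List[float]:
    """Nudge the weights until the secret key reads out `bits`."""
    w = list(weights)
    for _ in range(epochs):
        for row, b in zip(key, bits):
            z = sum(k * x for k, x in zip(row, w))
            grad = 1.0 / (1.0 + math.exp(-z)) - b  # d(logistic loss)/dz
            for i, k in enumerate(row):
                w[i] -= lr * grad * k
    return w


rng = random.Random(0)
weights = [rng.gauss(0, 0.1) for _ in range(32)]                # toy "model"
key = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(8)]  # secret key
bits = [1, 0, 1, 1, 0, 0, 1, 0]                                 # watermark
marked = embed_bits(weights, key, bits)
print(extract_bits(marked, key) == bits)
```

Without the key, the marked weights carry no obvious marker, which is the property that verification later puts at risk by revealing the key.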
Notably, in most watermark extraction algorithms, information about the
watermark location or the trigger examples is revealed during the verification
process. This knowledge facilitates watermark removal and ambiguity
attacks. Therefore, in our system a trusted judge is required to resolve model


=== PAGE 19 ===
ownership disputes. Model creators must embed watermarks in their models
and post a commitment to the watermark on the blockchain. The judge must
be able to verify the existence of watermarks using the extraction algorithm,
which may be task- and model-specific. Such a proof of model ownership can
ensure that profiting from stolen models within the decentralized marketplace
is infeasible. However, it does not prevent an adversary from copying a model
and using it outside this system (e.g., via a black-box API). Such acts can be
deterred by licensing the model’s use only in this marketplace, and resorting to
legal means if necessary.
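The commitment step can be illustrated with a salted hash. This is a sketch under assumptions: the text does not specify a commitment scheme, so SHA-256 over a random nonce followed by the watermark bytes is a stand-in, and the function names are hypothetical. The judge would additionally run the watermark extraction algorithm on the disputed model, which is not shown here.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def commit_watermark(watermark: bytes) -> Tuple[str, bytes]:
    """Return (digest to post on-chain, nonce kept secret by the model creator)."""
    nonce = secrets.token_bytes(32)  # fixed length, so nonce + watermark is unambiguous
    digest = hashlib.sha256(nonce + watermark).hexdigest()
    return digest, nonce


def judge_verifies(onchain_digest: str, watermark: bytes, nonce: bytes) -> bool:
    """Judge checks that the revealed watermark opens the on-chain commitment."""
    recomputed = hashlib.sha256(nonce + watermark).hexdigest()
    return hmac.compare_digest(recomputed, onchain_digest)


digest, nonce = commit_watermark(b"secret watermark key")
print(judge_verifies(digest, b"secret watermark key", nonce))
print(judge_verifies(digest, b"someone else's claim", nonce))
```

Because the digest is posted before any dispute, a later claimant cannot pick a watermark after the fact; the random nonce keeps the digest from leaking the watermark itself.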
3.5
Summary
Proofs of inference and ownership are two examples of a broader family of
protocols providing Byzantine resistance in SAKSHI. Even here, we have worked
more to describe the problems than the solutions, as a call to arms to
the scientific community. As the platform evolves and participation rises, the
attack space could also expand, opening the door for new and different kinds
of proof systems (e.g., proof of custody; proof of infrastructure hosting the AI
models).
References
[1] iRobot. Roomba robot vacuums.
https://www.irobot.com/en_US/roomba.html. Accessed: 2023-03-23.
[2] Boston Dynamics. The most dynamic humanoid robot.
https://www.bostondynamics.com/atlas. Accessed: 2023-02-01.
[3] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou,
Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Ku-
maran, Thore Graepel, et al. Mastering chess and shogi by self-play with a
general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815,
2017.
[4] David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou,
Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai,
Adrian Bolton, et al. Mastering the game of go without human knowledge.
Nature, 550(7676):354–359, 2017.
[5] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou,
Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran,
Thore Graepel, et al. A general reinforcement learning algorithm that
masters chess, shogi, and go through self-play. Science,
362(6419):1140–1144, 2018.


=== PAGE 20 ===
[6] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael
Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates,
Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure
prediction with alphafold. Nature, 596(7873):583–589, 2021.
[7] Richard Evans, Michael O’Neill, Alexander Pritzel, Natasha Antropova,
Andrew Senior, Tim Green, Augustin Žídek, Russ Bates, Sam Blackwell,
Jason Yim, et al. Protein complex prediction with alphafold-multimer.
BioRxiv, pages 2021–10, 2021.
[8] Jonas Boström, Dean G Brown, Robert J Young, and György M Keserü.
Expanding the medicinal chemistry synthetic toolbox. Nature Reviews
Drug Discovery, 17(10):709–727, 2018.
[9] Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba,
and Philip M Kim. Fast and flexible protein design using deep graph neural
networks. Cell systems, 11(4):402–411, 2020.
[10] Petra Schneider, W Patrick Walters, Alleyn T Plowright, Norman Sieroka,
Jennifer Listgarten, Robert A Goodnow Jr, Jasmin Fisher, Johanna M
Jansen, José S Duca, Thomas S Rush, et al. Rethinking drug design in the
artificial intelligence era. Nature Reviews Drug Discovery, 19(5):353–364,
2020.
[11] OpenAI. Gpt-4 technical report, 2023.
[12] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke,
Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott
Lundberg, et al. Sparks of artificial general intelligence: Early experiments
with gpt-4. arXiv preprint arXiv:2303.12712, 2023.
[13] Introducing chatgpt, 2022. Retrieved March 14, 2023, from
https://openai.com/blog/chatgpt.
[14] Google. BARD.
https://blog.google/technology/ai/bard-google-ai-search-updates/.
[15] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet,
Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric
Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation
language models. arXiv preprint arXiv:2302.13971, 2023.
[16] Yusuf Mehdi. Reinventing search with a new ai-powered microsoft bing and
edge, your copilot for the web.
https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-w
[17] Github CoPilot. Your ai pair programmer is leveling up.
https://github.com/features/preview/copilot-x, 2023. Accessed: 2023-03-24.


=== PAGE 21 ===
[18] Google Cloud. The next generation of ai for developers and google
workspace.
https://blog.google/technology/ai/ai-developers-google-cloud-workspace/,
2023. Accessed: 2023-03-24.
[19] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and
Björn Ommer. High-resolution image synthesis with latent diffusion models.
In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, pages 10684–10695, 2022.
[20] Midjourney. https://www.midjourney.com. Accessed: 2023-03-23.
[21] Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain
Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm
Reynolds, et al. Flamingo: a visual language model for few-shot learning.
arXiv preprint arXiv:2204.14198, 2022.
[22] Andrea Agostinelli, Timo I Denk, Zalán Borsos, Jesse Engel, Mauro
Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts,
Marco Tagliasacchi, et al. Musiclm: Generating music from text. arXiv
preprint arXiv:2301.11325, 2023.
[23] Uriel Singer, Adam Polyak, Thomas Hayes, Xi Yin, Jie An, Songyang
Zhang, Qiyuan Hu, Harry Yang, Oron Ashual, Oran Gafni, et al. Make-
a-video: Text-to-video generation without text-video data. arXiv preprint
arXiv:2209.14792, 2022.
[24] Joseph McDonald, Baolin Li, Nathan Frey, Devesh Tiwari, Vijay Gadepally,
and Siddharth Samsi. Great power, great responsibility: Recommendations
for reducing energy for training language models. In Findings of the As-
sociation for Computational Linguistics: NAACL 2022, pages 1962–1970,
2022.
[25] OpenAI. Transforming work and creativity with ai.
https://openai.com/product. Accessed: 2023-03-23.
[26] Forefront. Powerful language models a click away.
https://forefront.ai/. Accessed: 2023-03-23.
[27] AI21 Labs. When machines become thought partners. https://ai21.com/.
Accessed: 2023-03-23.
[28] SVR Anand, Serhat Arslan, Rajat Chopra, Sachin Katti, Milind Kumar
Vaddiraju, Ranvir Rana, Peiyao Sheng, Himanshu Tyagi, and Pramod
Viswanath. Trust-free service measurement and payments for decentral-
ized cellular networks. In Proceedings of the 21st ACM Workshop on Hot
Topics in Networks, pages 68–75, 2022.
[29] Witness Chain team. Witness chain. https://www.witnesschain.com/.
Accessed: 2023-07-16.


=== PAGE 22 ===
[30] Eigenlayer. https://www.eigenlayer.xyz/. Accessed: 2023-07-17.
[31] Sla4oai-specification.
https://github.com/isa-group/SLA4OAI-Specification, 2022.
[32] Daniel Kang, Tatsunori Hashimoto, Ion Stoica, and Yi Sun.
Scaling
up trustless dnn inference with zero-knowledge proofs.
arXiv preprint
arXiv:2210.08674, 2022.
[33] Harry Kalodner, Steven Goldfeder, Xiaoqi Chen, S Matthew Weinberg, and
Edward W Felten. Arbitrum: Scalable, private smart contracts. In 27th
USENIX Security Symposium (USENIX Security 18), pages 1353–1370, 2018.
[34] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed,
Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Ra-
binovich.
Going deeper with convolutions.
In Proceedings of the IEEE
conference on computer vision and pattern recognition, pages 1–9, 2015.
[35] Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, and Shin’ichi Satoh. Em-
bedding watermarks into deep neural networks. In Proceedings of the 2017
ACM on international conference on multimedia retrieval, pages 269–277,
2017.
[36] Bita Darvish Rouhani, Huili Chen, and Farinaz Koushanfar. Deepsigns:
An end-to-end watermarking framework for ownership protection of deep
neural networks. In Proceedings of the Twenty-Fourth International Confer-
ence on Architectural Support for Programming Languages and Operating
Systems, pages 485–497, 2019.
[37] Lixin Fan, Kam Woh Ng, and Chee Seng Chan. Rethinking deep neural
network ownership verification: Embedding passports to defeat ambiguity
attacks. Advances in neural information processing systems, 32, 2019.
[38] Yossi Adi, Carsten Baum, Moustapha Cisse, Benny Pinkas, and Joseph
Keshet. Turning your weakness into a strength: Watermarking deep neural
networks by backdooring. In 27th USENIX Security Symposium (USENIX
Security 18), pages 1615–1631, 2018.
[39] Sebastian Szyller, Buse Gul Atli, Samuel Marchal, and N Asokan. Dawn:
Dynamic adversarial watermarking of neural networks. In Proceedings of
the 29th ACM International Conference on Multimedia, pages 4417–4425,
2021.
[40] Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, and
Teddy Furon. The stable signature: Rooting watermarks in latent diffusion
models. arXiv preprint arXiv:2303.15435, 2023.


=== PAGE 23 ===
[41] Yunqing Zhao, Tianyu Pang, Chao Du, Xiao Yang, Ngai-Man Cheung,
and Min Lin. A recipe for watermarking diffusion models. arXiv preprint
arXiv:2303.10137, 2023.
[42] Miranda Christ, Sam Gunn, and Or Zamir. Undetectable watermarks for
language models. arXiv preprint arXiv:2306.09194, 2023.
[43] John Kirchenbauer, Jonas Geiping, Yuxin Wen, Jonathan Katz, Ian Miers,
and Tom Goldstein. A watermark for large language models. arXiv preprint
arXiv:2301.10226, 2023.
[44] Nils Lukas, Edward Jiang, Xinda Li, and Florian Kerschbaum. Sok: How
robust is image classification deep neural network watermarking? In 2022
IEEE Symposium on Security and Privacy (SP), pages 787–804. IEEE,
2022.
[45] Yifan Yan, Xudong Pan, Mi Zhang, and Min Yang. Rethinking white-box
watermarks on deep learning models under neural structural obfuscation.
In 32nd USENIX Security Symposium (USENIX Security 23), 2023.
[46] Jian Liu, Rui Zhang, Sebastian Szyller, Kui Ren, and N Asokan. False
claims against model ownership resolution. arXiv preprint
arXiv:2304.06607, 2023.