AI inference,
ferried blind.

A blind, end-to-end-encrypted marketplace for AI inference. Sell GPU capacity, buy GPU capacity, pay per token in sats. The gateway introduces a rider to a boat, takes its coin, and ferries sealed cargo — it can prove it got paid and prove nothing else.

identity ahp_ tokens · auth.nuts.services / rails cashu · lightning (L402) · prepaid / backends ollama (local)

01 how it works

Two binaries and a relay.

The catch — the good catch — is that the gateway is blind. Prompt and reply are encrypted end-to-end between the two clients. Charon ferries sealed cargo.

Provider

You sell capacity.

Run a tiny client next to Ollama. Pick which models to sell and what they cost per million tokens. The client dials out — no open ports, no public IP — and waits for sealed work.

// next to your Ollama

Consumer

You buy capacity.

Run an OpenAI-compatible endpoint on localhost. Point any agent at it — Claude Code, Cline, Aider, anything. Charon finds a provider, pays the invoice, runs the encrypted handshake, streams the reply.

// localhost :8088/v1

Gateway

We relay nothing.

Introduces rider to boat. Verifies the NUTS ahp_ token. Settles the coin at the mint. Forwards the sealed payload. Cannot read your prompt. Cannot route it. Cannot replay it.

// charon.nuts.services

02 quick start

Sell or buy in five lines.

Provider config goes next to your Ollama. Consumer config goes into the agent that’s spending tokens. Both clients authenticate to the gateway with a NUTS ahp_ token from auth.nuts.services.

provider.toml// you sell

# pick which models to ferry, and your fare
name                    = "qwen2.5-coder:32b"
price_msat_per_mtok_in  = 200000     # 200 sat / 1M in
price_msat_per_mtok_out = 600000     # 600 sat / 1M out

ollama_url              = "http://localhost:11434"
auth_token              = "$NUTS_AHP_TOKEN"

under [[models]] in the provider config no open ports

consumer.toml// you buy

# point any OpenAI client at the local endpoint
type           = "openai"
base_url       = "http://localhost:8088/v1"
models         = ["qwen2.5-coder:32b"]

auth_token     = "$NUTS_AHP_TOKEN"
max_msat_call  = 10000           # cap the fare per request

a block under [providers.charon] in the agent config cap your fare

03 payment rails

Pay the ferryman, three ways.

Pricing is on the cap you set — you know the fare before you board. Pair end-to-end encryption with ecash and the ferryman goes doubly blind: ecash hides who paid, encryption hides what was asked.

Cashu ecash

default

Bearer tokens swapped at the mint. Doubly blind: gateway gets a coin with no name on it. Change returned if you overpay.

header X-Cashu

Lightning (L402)

native

402 Payment Required + macaroon. Standard Lightning invoice in the challenge; macaroon proves you paid on the retry.

header Authorization: L402 …

Prepaid balance

batch

Top up once, draw down per call. For agent fleets or long-running jobs that don’t want a handshake per request.

scope ahp_ · per-token

04 architecture

The gateway sees a coin, not the cargo.

Both clients authenticate to the gateway with a NUTS ahp_ token — the same token that already gates Grub and Shivvr. The token gates the connection, anchors the encryption key, and carries reputation. One key, three jobs.

   your agent                         Charon                       someone's Ollama
  ┌────────────┐   localhost   ┌──────────────────┐   relay    ┌──────────────────┐
  │ Claude     │──────────────►│  consumer client │◄══════════►│  provider client │
  │ Code / any │   plain       │  pay · encrypt   │  GATEWAY   │  decrypt · serve │
  │ OpenAI app │               └──────────────────┘  (blind)   └────────┬─────────┘
  └────────────┘                                                        │
                  ── prompt + reply: end-to-end encrypted ──       ┌─────▼─────┐
                  ── gateway sees a coin, not the cargo ──         │  Ollama   │
                                                                  └───────────┘

05 what we don’t do

Honest about the leaks.

If the threat model includes the box doing inference, this isn’t enough on its own. Confidential compute (TEEs) is the same path our hosted mint walks.

The gateway can’t read your traffic. The provider can — they decrypt to run the model. A blind relay still sees shape of traffic: chunk sizes, timing. We think that’s an acceptable leak for what you get.

The code is yours. Fork the gateway, run your own, ferry your own dead. Reputation travels with your identity, not with our server, so leaving costs you nothing.

06 support

Fund the ferry.

Charon is independent and open source. If it’s useful to you, chip in — proceeds go straight to developing the project: the gateway, the clients, the mint, the dashboard.

Bitcoin · on-chain

3QPwAQKmq4nSgM4SA1JM5nByVWKiQaAAVk

Scan with any Bitcoin wallet, or send to the address above. Thank you for keeping the boat afloat. ●

AI inference,ferried blind.