
How we run OpenCode in the cloud with E2B and Convex

February 23, 2026 · 8 min read

Codecloud runs OpenCode agents in the cloud. Each run often clones a private GitHub repo and needs full filesystem access, so every run gets its own isolated environment (filesystem, processes, network, credentials), and that environment must be destroyed when the run ends. The best solution we found is E2B, which provides Firecracker-based VMs with a convenient, programmatic API.

Why E2B

E2B gives you an ephemeral sandbox instance that spins up almost instantly and lets you run commands or read/write files via their SDK. Sandboxes can stay alive for up to 24 hours, and you can configure them with the resources you need (e.g. vCPUs and RAM).

This made it the perfect choice for running isolated OpenCode instances with robust security guarantees.

E2B also recently started offering sandbox images with OpenCode pre-installed, which is a convenient way to get up and running quickly.

How sandboxing works

The reason E2B sandboxes can boot in milliseconds comes down to what's running under the hood: Firecracker, an open-source virtual machine monitor built by AWS for Lambda and Fargate.

Unlike traditional hypervisors that emulate an entire computer (BIOS, PCI bus, USB controllers, and all), Firecracker strips virtualization down to the bare minimum. It only emulates the handful of devices a Linux guest actually needs: a network card, a block device, a serial console, and not much else. The result is a VM that boots in around 125ms with less than 5 MB of memory overhead per instance.

The speed comes from the codebase being intentionally tiny: around 50k lines of Rust, compared to the roughly 2 million lines of C in QEMU. Less code means a smaller attack surface, faster startup, and fewer things that can go wrong.

Why not containers?

Container-based sandboxes like Daytona take a different approach: Docker containers with persistent workspaces. They can achieve even faster cold starts (sub-90ms) by sharing the host kernel, and they work well when you control the code being executed and want state to persist between sessions. If you're deploying agents just for your own organization against your own repos, containers are a solid choice.

The tradeoff is the isolation boundary. Containers share the host kernel, so a kernel exploit in one container can compromise everything else on the machine. Firecracker microVMs get their own kernel, their own memory space, and their own filesystem. The isolation boundary is the hardware virtualization layer (KVM), not a set of Linux namespaces.

For a multi-tenant system where each sandbox is cloning someone's private repo and running arbitrary code from an LLM, we need that stronger boundary.

This is the direction the industry is moving in general. The major cloud providers have all shifted their serverless control planes toward hardware-enforced isolation. For AI agent infrastructure specifically, Firecracker and gVisor (Google's user-space kernel, used by Modal) are the two dominant approaches for running untrusted code.

Beyond isolation, E2B's sandbox lifecycle primitives (create, connect, kill), command and file APIs, and private networking model meant we could build the control plane without managing VMs ourselves.

Anatomy of a run

When a codecloud run starts, we:

  1. Create a fresh E2B sandbox
  2. Mint a temporary, scoped GitHub token for the user
  3. Clone the repo and check out the target branch
  4. Start opencode serve inside the sandbox
  5. Start a relay process that streams events out of the sandbox (explained below)
  6. Stream agent output to various destinations (our Convex database, webhooks, Linear)
  7. Listen for a done signal and destroy the sandbox
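The steps above can be sketched end to end. This is an illustrative outline, not our production code: the `SandboxLike` interface mirrors the E2B SDK surface we use (commands.run, kill), while `mintScopedGitHubToken`, `startRelay`, and `waitForDone` are hypothetical helpers injected as dependencies.

```typescript
// Minimal interface mirroring the E2B SDK surface this flow uses (illustrative).
interface SandboxLike {
  commands: { run(cmd: string, opts?: { background?: boolean }): Promise<void> };
  kill(): Promise<void>;
}

// Hypothetical dependencies, injected so the flow reads top to bottom.
interface RunDeps {
  createSandbox(): Promise<SandboxLike>;
  mintScopedGitHubToken(userId: string): Promise<string>;
  startRelay(sandbox: SandboxLike): Promise<void>;
  waitForDone(runId: string): Promise<void>; // resolves on the done signal
}

// Orchestrate one run: create, clone, serve, relay, wait, destroy.
async function executeRun(
  deps: RunDeps,
  userId: string,
  repo: string,
  branch: string,
  runId: string,
) {
  const sandbox = await deps.createSandbox();
  try {
    const token = await deps.mintScopedGitHubToken(userId);
    // Simplified: embedding the token in the clone URL for illustration.
    await sandbox.commands.run(
      `git clone https://x-access-token:${token}@github.com/${repo} workspace && ` +
        `cd workspace && git checkout ${branch}`,
    );
    await sandbox.commands.run("opencode serve", { background: true });
    await deps.startRelay(sandbox);
    await deps.waitForDone(runId);
  } finally {
    await sandbox.kill(); // the sandbox never outlives the run
  }
}
```

The `finally` block is the important part: whatever happens mid-run, the sandbox is destroyed.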

Private networking

Sandboxes run in secure mode: processes inside the sandbox can reach the internet (GitHub, provider APIs, package registries), but all connections to the sandbox must go through E2B's proxy and require a traffic access token:

const client = createOpencodeClient({
  baseUrl: `https://${sandbox.getHost(port)}`,
  fetch: (request) => {
    const headers = new Headers(request.headers);
    headers.set("e2b-traffic-access-token", sandbox.trafficAccessToken);
    return fetch(new Request(request, { headers }));
  },
});

Streaming events past the 10-minute Convex limit

Our backend runs on Convex, and Convex actions have a 10-minute limit. So if we connect to the sandbox from a Convex action, we have 10 minutes to complete the agent run. For complex agent workloads, runs can take much longer than that!

Our first approach was to keep a long-lived Convex action connected to the sandbox and stream OpenCode events in real time. Just before the 10-minute timeout, we'd schedule a new Convex action to continue streaming.

This works, but there is a non-zero chance of missing events while the new Convex action boots up, and some of those events are important (for example session.idle, which signals run completion).

Worse, if the Convex action itself crashes or gets cancelled (e.g. by a deployment) and a new one never gets scheduled, we miss the rest of the run entirely.

The solution was to reverse the flow of events: instead of Convex pulling events from the sandbox, a relay script inside the sandbox pushes events out. The relay subscribes to OpenCode's event stream locally and sends them to a webhook on our backend. Our backend only handles short webhook requests instead of a long-running streaming action.

The relay script is straightforward: connect to OpenCode's SSE event stream, translate each event into something our backend understands, and POST it to our webhook endpoint. Here's a simplified version:

// Relay script running inside the sandbox
const response = await fetch(`${opencodeUrl}/event`, {
  headers: { Accept: "text/event-stream" },
});

// readSSEEvents: a small helper that parses the SSE stream into events (omitted here)
for await (const event of readSSEEvents(response)) {
  // Translate OpenCode events into webhook payloads
  const payload = event.type === "message.part.delta"
    ? { type: "delta", delta: event.properties.delta }
    : { type: "activity", reason: event.type };

  // Push to our Convex backend webhook
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ runId, token, events: [payload] }),
  });
}

In practice there's more going on (batching events, debouncing text deltas, retrying failed posts, heartbeats, crash reporting) but the core idea is this simple loop. The relay runs for as long as the sandbox lives, and our backend only ever handles short-lived HTTP requests.
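As one example of that batching, consecutive text deltas can be coalesced into a single payload before posting, so one webhook request carries many small chunks. This is a sketch of the idea (the `RelayPayload` shape matches the simplified loop above), not our exact implementation:

```typescript
type RelayPayload =
  | { type: "delta"; delta: string }
  | { type: "activity"; reason: string };

// Coalesce adjacent text deltas into one payload, while non-delta
// events keep their position in the stream.
function batchPayloads(payloads: RelayPayload[]): RelayPayload[] {
  const batched: RelayPayload[] = [];
  for (const p of payloads) {
    const last = batched[batched.length - 1];
    if (p.type === "delta" && last?.type === "delta") {
      last.delta += p.delta; // merge into the previous delta
    } else {
      batched.push({ ...p }); // copy so inputs are never mutated
    }
  }
  return batched;
}
```

Ordering matters here: an activity event between two deltas splits the batch, so the backend still sees events in the order they happened.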

Each webhook request is secured with a token only valid for the duration of the run.
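One way to implement such a per-run token is an HMAC over the run ID plus an expiry, so the webhook endpoint can verify requests statelessly. A sketch using Node's crypto module; the secret handling and token layout here are assumptions, not our exact scheme:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Token layout (assumed): "<expiryMs>.<hmac-sha256(runId:expiryMs)>"
function mintRunToken(runId: string, secret: string, ttlMs: number, now = Date.now()): string {
  const expiry = String(now + ttlMs);
  const sig = createHmac("sha256", secret).update(`${runId}:${expiry}`).digest("hex");
  return `${expiry}.${sig}`;
}

function verifyRunToken(runId: string, token: string, secret: string, now = Date.now()): boolean {
  const [expiry, sig] = token.split(".");
  if (!expiry || !sig || Number(expiry) < now) return false; // malformed or expired
  const expected = createHmac("sha256", secret).update(`${runId}:${expiry}`).digest("hex");
  // Constant-time comparison to avoid timing side channels.
  return sig.length === expected.length &&
    timingSafeEqual(Buffer.from(sig), Buffer.from(expected));
}
```

Because the expiry is baked into the signature, the token expires on its own when the run's time window closes; nothing needs to be revoked.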

Getting the relay into the sandbox

The relay script lives in our codebase as a normal TypeScript file. At build time, we transpile it to JavaScript and wrap the output in a single exported string constant. The generated file looks something like:

// convex/agent/relay/opencodeStreamRelay.generated.ts
// Auto-generated — do not edit directly
export const OPENCODE_STREAM_RELAY_SCRIPT = "import fs from \"node:fs/promises\"; ...";

This means the entire relay is just a string that ships with our Convex backend.
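The generator itself is little more than escaping the transpiled output into a module. Assuming the relay has already been transpiled to a JavaScript string, the wrapping step can look like this (the constant name is from the generated file above; the function name is illustrative):

```typescript
// Wrap transpiled relay JS in a TypeScript module that exports it as a
// string constant. JSON.stringify handles all quoting and escaping, so
// the emitted file is always valid source.
function wrapRelayAsModule(transpiledJs: string): string {
  return [
    "// Auto-generated — do not edit directly",
    `export const OPENCODE_STREAM_RELAY_SCRIPT = ${JSON.stringify(transpiledJs)};`,
    "",
  ].join("\n");
}
```

The round-trip property is the whole point: parsing the emitted string literal gives back the exact relay source, byte for byte.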

You might be thinking: why not just build a new sandbox image whenever the relay script changes? That would be a fine option too, but the relay script is small and has changed frequently as we improved its reliability. Skipping a sandbox-image deployment pipeline for every small relay change saved real time and effort, and it has worked out well.

When a run starts, we write that string into the sandbox as a .mjs file alongside a JSON config containing the webhook URL and auth token. Then we launch it as a background process:

// Write the relay script and config into the sandbox
await sandbox.files.write("/tmp/relay.mjs", OPENCODE_STREAM_RELAY_SCRIPT);
await sandbox.files.write("/tmp/relay-config.json", JSON.stringify({
  runId,
  sessionId,
  opencodeBaseUrl: "http://127.0.0.1:3000",
  webhookUrl,
  webhookToken,
}));

// Start it in the background so it doesn't block the run
await sandbox.commands.run("node /tmp/relay.mjs /tmp/relay-config.json", {
  background: true,
});

After launching, we wait a couple of seconds and then verify the process is actually alive. We check that it shows up in pgrep, scan the log file for fatal errors, and confirm the startup webhook reached our backend. If any of those checks fail, the run gets aborted early instead of silently running without event streaming.
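The decision itself (given pgrep output and a log tail, did the relay survive startup?) is a pure predicate, which makes it easy to test. A hypothetical sketch; the exact error markers we scan for differ in practice:

```typescript
// Decide whether the relay survived startup, given the stdout of
// `pgrep -f relay.mjs` and the tail of the relay's log file.
function relayLooksHealthy(pgrepStdout: string, logTail: string): boolean {
  // pgrep prints one PID per line when a match exists.
  const hasProcess = pgrepStdout
    .trim()
    .split("\n")
    .some((line) => /^\d+$/.test(line.trim()));
  // Assumed fatal markers for illustration.
  const hasFatalError = /FATAL|Unhandled/.test(logTail);
  return hasProcess && !hasFatalError;
}
```

The caller runs pgrep and reads the log inside the sandbox, then aborts the run if this returns false.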

Streaming data to the client

[Screenshot: Codecloud streaming the agent's reasoning tokens in real time]

Getting events from the sandbox to our backend is only half the problem: we also need to push those updates to the Codecloud UI, so a user can watch the agent's reasoning as it happens in real time.

For a cloud-hosted coding agent, seeing the thinking tokens stream in live is essential for usability; otherwise you'd be staring at a blank screen for a long time before any output appears.

This is where Convex really shines. Convex queries are reactive by default: when data changes on the backend, every client subscribed to that query gets the update automatically.

We use the persistent text streaming component, which gives us a simple API for appending text chunks on the backend and reading them reactively on the client. When a delta event arrives from the relay webhook, we call addChunk to append the text to the stream. On the client side, a single useQuery call subscribes to the stream body and automatically re-renders as new chunks arrive!

Resuming runs with follow-up messages

E2B offers persistent sandboxes that survive between sessions, but we deliberately don't use them. Why? Well, each sandbox contains a full copy of the Git repo for a customer. We don't want this data hanging around for a second longer than necessary!

We'd rather have full control over what data is retained and for how long, so every sandbox gets destroyed when the run finishes.

This creates a problem for follow-up messages though: if a user says "actually, can you also update the tests?" after a run completes, we need to spin up a brand new sandbox with a fresh instance of the agent. The agent starts with zero context and has to figure everything out from scratch.

We could simply store and replay the user and assistant messages from the conversation, but that misses all the internal state: tool calls, file contents the agent read, reasoning context, and other metadata that shaped its decisions. Replaying also takes much longer and wastes a lot of tokens in the process.

Session export to the rescue

Luckily, OpenCode stores its session state in SQLite and supports opencode export and opencode import commands for serializing and restoring a full session. We use this to bridge the gap between disposable sandboxes and stateful conversations.

When a run finishes, before the sandbox is destroyed, we export the full session via the OpenCode SDK. This gives us a JSON blob containing the session metadata and every message (including tool calls and their results). We upload this to Convex file storage, and save the storage ID on the run document.

// Export session state before destroying the sandbox
const exportJson = await exportSessionViaSdk(client, run.sessionId);

// Upload to Convex file storage
const uploadUrl = await ctx.runMutation(internal.runs.generateSessionUploadUrl);
await fetch(uploadUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: exportJson,
});

When a follow-up message comes in, we download the stored export, spin up a fresh sandbox, and import the session before the OpenCode server starts. This has to happen before the server boots to avoid SQLite concurrency issues.

// Write the exported JSON into the new sandbox
await sandbox.files.write("/tmp/session-export.json", sessionExportJson);

// Import into OpenCode's SQLite database (before server starts)
await runCommand(sandbox, "opencode import /tmp/session-export.json", {
  timeoutMs: 30_000,
  cwd: "/home/user/workspace",
});

The agent then picks up exactly where it left off, with full context of every file it read, every tool call it made, and every decision it reasoned through. From the user's perspective, it feels like one continuous conversation even though the underlying sandbox was completely rebuilt.

Reliability lessons

Most of our debugging time went into edge cases, not architecture. Here are a few that bit us:

1. Background processes block the parent command

E2B's sandbox.commands.run() doesn't just wait for the shell to finish. It waits for every descendant process to exit. So if you start a background server with nohup server &, the .run() call blocks until that server dies too.

To get around this, we used the background: true parameter from the E2B SDK, which runs the sandbox command in the background.

2. You need a watchdog, but it's tricky to get right

E2B sandboxes can live for a long time, so it's essential to know that the OpenCode process inside it is still running and healthy. If it isn't, we want to kill the run as soon as possible and notify the user.

To achieve this we built a watchdog that runs every 30 seconds and kills runs after a period of inactivity. The watchdog re-schedules itself to run again in 30 seconds so we don't need a long-lived Convex action.
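The watchdog's core decision is pure: given when we last saw activity for a run, is it stale? The re-scheduling then happens via Convex's scheduler. The threshold below is illustrative, not our production value:

```typescript
const INACTIVITY_LIMIT_MS = 5 * 60_000; // illustrative threshold

// Pure staleness check: kill the run if nothing (deltas, heartbeats)
// has been observed within the inactivity window.
function shouldKillRun(
  lastActivityAtMs: number,
  nowMs: number,
  limitMs = INACTIVITY_LIMIT_MS,
): boolean {
  return nowMs - lastActivityAtMs > limitMs;
}

// Inside the Convex watchdog action, the loop is closed by re-scheduling:
//   await ctx.scheduler.runAfter(30_000, internal.watchdog.check, {});
// so no single action ever runs longer than one 30-second tick.
```

Keeping the check pure means the only stateful parts are reading the run's last-activity timestamp and scheduling the next tick.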

3. LLM silence doesn't mean the agent is stuck

For most runs, we can rely on LLM output as an activity indicator. But we saw that more complex problems can take 10-15 minutes of reasoning before the LLM produces any output. So how do we know the sandbox is still running and hasn't stalled?

We frequently hit cases where the watchdog killed a run that was still working. Our fix is a process monitor inside the sandbox that watches OpenCode's log output and checks that the process is still running and healthy, then sends heartbeats to our Convex webhook so the watchdog knows the run is alive.

4. Let the agent handle commits and PRs

We initially handled the commit and PR process programmatically. When a run finished, we'd ask the agent for a commit message, then call sandbox.git.commit() and create the PR through the GitHub API ourselves.

This quickly gets complicated. Users typically want the full power of a coding agent, including rebasing branches and resolving conflicts. If you tag @codecloud in a PR and ask it to fix conflicts for you, that's difficult to handle programmatically.

We ended up giving the agent a tightly scoped GitHub token and the gh CLI instead. It's a lot more powerful and more in line with what people expect from a coding agent.

5. OpenCode can consume a lot of memory

Early on, we used the base sandbox size: 2 vCPUs and 512 MB of memory. Given we're just calling LLMs, that felt like plenty.

It turns out that on larger repos and more complex changes, OpenCode can easily use 3 or 4 GB of RAM. We noticed this when the OpenCode process inside the sandbox would randomly die on specific runs.

Thankfully E2B lets you configure sandboxes with more resources, so we now have the option of running a larger instance if your workflow requires it.

More about Codecloud

Codecloud lets you run coding agents like OpenCode and Claude Code against your GitHub repos without having to worry about infrastructure or state management.

We also have a handy Linear and GitHub integration that lets you assign issues to @codecloud and get automated PRs.

If you're looking to build coding agent automation for your company, check out codecloud.dev.
