Realtime Avatar

Web playback

Play a streaming turn on a canvas with AvatarPlayer.

AvatarPlayer (from realtime-avatar/browser) decodes the avatar mux stream and renders it to a <canvas> with an audio-clocked loop, so audio and video stay in sync. It handles JPEG/I420 decode, buffering, and frame dropping for you.
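
To make "audio-clocked" concrete, here is a minimal sketch of how a render loop might choose which frame to draw from the current audio time, dropping frames that are already late. It is not AvatarPlayer's actual implementation: the TimedFrame shape, the ptsMs field, and the pickFrame helper are hypothetical names used only for illustration.

```typescript
// Hypothetical frame shape; AvatarPlayer's real internals are not documented here.
interface TimedFrame {
  ptsMs: number; // presentation timestamp, in milliseconds
}

interface FramePick<F> {
  frame: F | null; // frame to draw now, or null if none is due yet
  dropped: number; // late frames skipped to catch up with the audio clock
}

/**
 * Pick the frame to draw for the current audio time.
 * `queue` must be sorted by ptsMs ascending; frames that are due are removed from it.
 */
function pickFrame<F extends TimedFrame>(queue: F[], audioTimeMs: number): FramePick<F> {
  let frame: F | null = null;
  let dropped = 0;
  for (;;) {
    const next = queue[0];
    // Stop at the first frame the audio clock has not reached yet.
    if (next === undefined || next.ptsMs > audioTimeMs) break;
    if (frame !== null) dropped++; // the older due frame is superseded
    frame = next;
    queue.shift();
  }
  return { frame, dropped };
}
```

In a browser, audioTimeMs would typically be derived from the Web Audio clock (AudioContext.currentTime is in seconds), so video follows audio rather than a wall-clock timer.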

Minimal player

import { RealtimeAvatarClient } from "realtime-avatar";
import { AvatarPlayer } from "realtime-avatar/browser";

const client = RealtimeAvatarClient.webProxy(); // browser-safe; key stays server-side

const player = new AvatarPlayer();
player.attach(document.querySelector("canvas")!);

// Any clickable element works; this assumes the page contains a <button>.
const playButton = document.querySelector("button")!;

// Unlock audio inside a user gesture (browser autoplay policy):
playButton.addEventListener("click", async () => {
  await player.unlock(); // resume audio while the gesture is still active
  await client.prepare({ avatar_id: "ava_…" });
  const stream = await client.turn({ avatar_id: "ava_…", mode: "speak_text", text: "Hi!" });
  await player.play(stream); // resolves once playout has fully drained
});

Browsers start audio suspended until a user gesture. Call player.unlock() from a click/tap before the first turn, or the avatar will appear to play silently.
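
If the page has no dedicated play button, one option is to unlock on the first gesture anywhere on the page. The sketch below assumes only that the player has an unlock() method returning a promise, as in the example above; the unlockOnFirstGesture helper and the Unlockable type are hypothetical and not part of the SDK.

```typescript
// Structural type: only the unlock() method described above is assumed.
interface Unlockable {
  unlock(): Promise<void>;
}

/**
 * Call player.unlock() on the first user gesture seen on `target`.
 * Returns a function that removes the listeners if no gesture has happened yet.
 */
function unlockOnFirstGesture(
  target: EventTarget,
  player: Unlockable,
  events: string[] = ["click", "keydown"],
): () => void {
  function cleanup(): void {
    for (const name of events) target.removeEventListener(name, onGesture);
  }
  function onGesture(): void {
    cleanup(); // one gesture is enough
    // Start unlocking synchronously, while the gesture is still active.
    void player.unlock().catch((err) => console.warn("audio unlock failed", err));
  }
  for (const name of events) target.addEventListener(name, onGesture);
  return cleanup;
}
```

A typical call would be unlockOnFirstGesture(document, player) at startup. Because unlock() is described as safe to call repeatedly, this can coexist with an explicit unlock in a button handler.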

React

For React apps, start with useRealtimeAvatar — it hides WebSocket setup, GPU prepare, audio-clocked canvas playback, optional WHEP/SFU preconnect, keepalives, and HTTP fallback:

import { useRealtimeAvatar, RealtimeAvatarView } from "realtime-avatar/react";

function Avatar() {
  const avatar = useRealtimeAvatar({
    avatarId: "ava_…",
    mode: "interactive", // default: fastest 1:1 path; "livestream" preconnects WHEP/SFU
  });
  return (
    <>
      <RealtimeAvatarView avatar={avatar} style={{ aspectRatio: "9 / 16" }} />
      <button
        disabled={!avatar.ready || avatar.busy}
        onClick={() => void avatar.speak("Hello!")}
      >
        Speak
      </button>
    </>
  );
}

The controller exposes ready/busy/error for UI state, chat(text, { history }) for LLM turns, stop() to cancel a turn while keeping the warm socket, and metrics/transport/mediaState for debugging. Set app-wide defaults once with RealtimeAvatarProvider. The lower-level useAvatarSession + AvatarStage remain available for advanced control.
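
As an illustration of driving UI from ready/busy/error, the following sketch maps those flags to a button label and disabled state. The AvatarStatus type is an assumed subset of the controller (the exact type of error is a guess), and speakButtonState, along with the label strings, is a hypothetical helper rather than part of the SDK.

```typescript
// Assumed subset of the controller: ready/busy appear in the example above;
// the type of `error` is a guess made for illustration.
interface AvatarStatus {
  ready: boolean;
  busy: boolean;
  error?: unknown;
}

interface SpeakButtonState {
  label: string;
  disabled: boolean;
}

/** Derive the speak button's label and disabled flag from controller state. */
function speakButtonState(status: AvatarStatus): SpeakButtonState {
  if (status.error) return { label: "Retry", disabled: false }; // surface failures first
  if (!status.ready) return { label: "Connecting…", disabled: true }; // session not warm yet
  if (status.busy) return { label: "Speaking…", disabled: true }; // a turn is in flight
  return { label: "Speak", disabled: false };
}
```

Inside a component, the result could be spread onto the button, for example disabled={speakButtonState(avatar).disabled}, assuming the controller returned by useRealtimeAvatar matches the shape above.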

Lifecycle

  • attach(canvas) — bind (or rebind) the canvas.
  • unlock() — resume the audio context from a gesture; safe to call repeatedly.
  • play(stream) — resolves only after playout fully drains (no end-of-turn freeze).
  • stop() / dispose() — stop the current turn / release the audio context on unmount.
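
The lifecycle above can be sketched as a single helper that plays one turn and always cleans up. PlayerLike is a structural stand-in for the four methods listed, not the SDK's own type; the return types of attach, stop, and dispose are assumptions, and playOnce is a hypothetical name.

```typescript
// Structural stand-in for the lifecycle methods listed above.
// Return types of attach/stop/dispose are assumptions.
interface PlayerLike<Canvas, Stream> {
  attach(canvas: Canvas): void;
  unlock(): Promise<void>;
  play(stream: Stream): Promise<void>;
  stop(): void;
  dispose(): void;
}

/** Play a single turn, then release the player whether or not playback succeeded. */
async function playOnce<Canvas, Stream>(
  player: PlayerLike<Canvas, Stream>,
  canvas: Canvas,
  stream: Stream,
): Promise<void> {
  player.attach(canvas);
  try {
    await player.unlock(); // call playOnce from a click/tap handler in a browser
    await player.play(stream); // resolves only after playout fully drains
  } catch (err) {
    player.stop(); // cancel whatever is still playing
    throw err;
  } finally {
    player.dispose(); // release the audio context
  }
}
```

A long-lived page would more likely keep one player across many turns and call dispose() only on unmount; disposing after each turn here just keeps the sketch self-contained.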
