Live streaming synchronization across thousands of viewers is a complex challenge, particularly when dealing with mixed environments involving multiple formats (HLS and DASH) and diverse players with unique buffer behaviors and ABR logic. The objective is clear: achieve frame-level, or at least sub-second, synchronization in a scalable manner for thousands or even millions of viewers.
This project presents an open implementation that leverages the latest standards, CMCD v2 (Common Media Client Data) and CMSD (Common Media Server Data), to establish a robust and scalable synchronization framework. By combining a common clock reference with a bidirectional exchange of timing information, we let players autonomously adjust their playback rate to meet a global latency target.
To synchronize players, we first need a shared understanding of time across different streaming protocols.
HLS: By default, HLS plays live content by simply fetching the latest segments. To enable synchronization, we rely on the #EXT-X-PROGRAM-DATE-TIME tag. This optional tag provides an absolute timestamp (e.g., 2024-03-31T06:53:56.994Z) that maps media segments to wall-clock time.
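To make the HLS mapping concrete, here is a minimal Python sketch that walks a media playlist and derives a wall-clock start time for each segment from #EXT-X-PROGRAM-DATE-TIME and the #EXTINF durations. The function name and the sample playlist are hypothetical, and the parser ignores discontinuities and other tags a real player would have to handle.

```python
from datetime import datetime, timedelta

def segment_wall_clock_times(playlist: str):
    """Return (uri, start_time) pairs for each segment in a media playlist."""
    current = None   # wall-clock start of the next segment, once a date tag is seen
    duration = 0.0   # duration (seconds) announced by the latest #EXTINF
    result = []
    for line in playlist.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-PROGRAM-DATE-TIME:"):
            stamp = line.split(":", 1)[1]
            # fromisoformat() in older Python versions does not accept a trailing "Z"
            current = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        elif line.startswith("#EXTINF:"):
            duration = float(line.split(":", 1)[1].split(",")[0])
        elif line and not line.startswith("#"):
            # A non-tag, non-empty line is a segment URI
            result.append((line, current))
            if current is not None:
                current += timedelta(seconds=duration)
    return result

# Hypothetical two-segment playlist, for illustration only
SAMPLE = """#EXTM3U
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:100
#EXT-X-PROGRAM-DATE-TIME:2024-03-31T06:53:56.994Z
#EXTINF:4.000,
seg100.ts
#EXTINF:4.000,
seg101.ts
"""

times = segment_wall_clock_times(SAMPLE)
```

Only the first segment carries an explicit timestamp here; later segments inherit theirs by adding the preceding durations.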
MPEG-DASH: DASH segments usually have inherent time references by design. Using SegmentTemplate and UTCTiming, the player can calculate the exact elapsed time since the availabilityStartTime. By fetching the current UTC time and knowing the segment duration, the player can precisely determine the current segment number and its corresponding time.
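The DASH calculation can be sketched in a few lines of Python. This is a simplified illustration that assumes a single period, fixed-duration segments, and a clock already corrected via UTCTiming; it ignores presentationTimeOffset, timescale conversion, and availability offsets. The function name and parameters are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

def current_segment(now, availability_start, segment_duration_s, start_number=1):
    """Return (segment_number, segment_start) for the segment containing `now`."""
    elapsed = (now - availability_start).total_seconds()
    index = math.floor(elapsed / segment_duration_s)
    number = start_number + index
    segment_start = availability_start + timedelta(seconds=index * segment_duration_s)
    return number, segment_start

# Illustrative values: stream started at 06:00:00 UTC, 4-second segments
ast = datetime(2024, 3, 31, 6, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 3, 31, 6, 0, 10, tzinfo=timezone.utc)
number, start = current_segment(now, ast, 4.0)
```

Note that the segment containing the current time is typically still being produced, so a player would usually request an earlier, fully available one.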
By packaging both HLS and DASH with the same clock reference, we establish the foundation for synchronized playback.
The core of this framework is the exchange of timing information between the player and the server (CDN or Edge). We utilize a request/response pattern to share the state of the playback and the desired target.
Client-Side: CMCD v2
The player reports its status to the server using CMCD v2. Crucially, it sends the Playhead Time (pt) key, which expresses the current point in the stream as milliseconds since the UNIX epoch (e.g., pt=188782018). This allows the server to know exactly where every user is in the stream.
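As a rough illustration, the sketch below attaches a CMCD payload carrying pt to a segment request as a URL-encoded query argument. It assumes the query-argument transport and key ordering familiar from CMCD v1; the helper name, the example URL, and the choice of accompanying keys (sid, v) are assumptions, so the exact v2 encoding should be checked against the specification.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def with_cmcd(url: str, playhead_ms: int, session_id: str) -> str:
    """Append a CMCD query argument reporting the playhead time (pt)."""
    pairs = {
        "pt": str(int(playhead_ms)),   # playhead time in milliseconds
        "sid": f'"{session_id}"',      # string values are quoted
        "v": "2",                      # assumed version marker
    }
    # Keys are emitted in alphabetical order, comma separated
    payload = ",".join(f"{key}={pairs[key]}" for key in sorted(pairs))
    extra = "CMCD=" + quote(payload, safe="")
    parts = urlsplit(url)
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))

request_url = with_cmcd("https://cdn.example.com/live/seg100.m4s", 188782018, "abc")
```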
Server-Side: CMSD
The server responds with custom CMSD keys to guide the player.
com.svta-clock: The current server time, on the same millisecond timeline as pt (e.g., 188784567).
com.svta-latencyTarget: The global target latency in milliseconds (e.g., 5000 for a 5-second delay).
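On the player side, reading these two keys could look like the following Python sketch. It assumes the keys arrive as comma-separated key=value pairs in a CMSD response header; which header carries them is not stated here, so the sketch only takes the header value as input. The parser is deliberately naive (it would break on quoted strings containing commas) and the function names are hypothetical.

```python
def parse_cmsd(header_value: str) -> dict:
    """Split a CMSD-style header value into a {key: raw_value} dictionary."""
    values = {}
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, raw = item.partition("=")
        values[key.strip()] = raw.strip().strip('"')
    return values

def sync_hints(header_value: str):
    """Return (server_clock_ms, latency_target_ms) from the custom keys."""
    fields = parse_cmsd(header_value)
    return int(fields["com.svta-clock"]), int(fields["com.svta-latencyTarget"])

clock_ms, target_ms = sync_hints("com.svta-clock=188784567, com.svta-latencyTarget=5000")
```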
Once the player receives the target latency and the server clock via CMSD, it executes a client-side logic loop:
Sync: The player synchronizes with the Reference Clock provided by the server.
Calculate: It calculates its current live latency as the difference between the synchronized clock and its playhead time.
Adjust:
If the latency is too high, the playback rate is increased (e.g., 1.10x) to catch up.
If the latency is too low (the buffer is at risk of running dry), the playback rate is decreased (e.g., 0.90x) to drift back.
The final decision always rests with the player, ensuring smooth playback and preventing buffer starvation.
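The loop above can be sketched as a small decision function. The 1.10x and 0.90x rates come from the examples in the text; the tolerance window, the minimum-buffer guard, and all function and parameter names are assumptions added for illustration, not the project's actual logic.

```python
def live_latency_ms(clock_ms: int, playhead_ms: int) -> int:
    """Latency is how far the playhead trails the reference clock."""
    return clock_ms - playhead_ms

def choose_rate(latency_ms, target_ms, tolerance_ms=100,
                buffer_s=None, min_buffer_s=1.0):
    """Pick a playback rate that nudges latency toward the target."""
    error = latency_ms - target_ms
    if abs(error) <= tolerance_ms:
        return 1.0            # close enough: play at normal speed
    if error > 0:
        # Too far behind live. Speeding up drains the buffer, so the
        # player declines when the buffer is already thin (assumed guard).
        if buffer_s is not None and buffer_s < min_buffer_s:
            return 1.0
        return 1.10
    return 0.90               # too close to live: slow down and drift back

# Using the example values from the text
latency = live_latency_ms(188784567, 188782018)
rate = choose_rate(latency, 5000)
```

With these numbers the player is about 2.5 seconds behind the clock against a 5-second target, so it slows down. A real player would also account for the network delay between the server stamping its clock and the response arriving.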
We have deployed a set of demos to showcase this framework in action across different player implementations. These demos connect to a live simulation to demonstrate the synchronization logic.