Developer Jun 21, 2022

How Does a Truly Interactive Live Streaming System Work?

Recently, Inke, one of the largest live streaming platforms, changed its name to Inkeverse and shifted its business toward the metaverse. Launched in May 2015, Inke has been a live streaming pioneer and a leader in driving the growth of Internet entertainment and social networking.

Nowadays, live streaming has become part of everyday life for a great many people. The latest trend is taking it into the metaverse, where you can stream from a 3D virtual studio.

If you are working on a live streaming-related product, you need to understand how an interactive live streaming system works at a high level. We will go through the key points to help you grasp them quickly.

High-Level Architecture of the System

Live streaming is the process of broadcasting live video to an audience. An interactive live streaming system allows you to interact with your audience members or co-hosts through real-time video.

Typically, an interactive live streaming system consists of several core components: a client-side media engine, an RTC network, a CDN, and an instant messaging module.

The media engine handles all the client-side media processing, including capturing, pre-processing, encoding, transmission, decoding, and rendering. The RTC network takes care of real-time data communications and acceleration. The CDN is responsible for high-concurrency distribution.

The pre-processing module allows media data to be processed before encoding to add special effects such as beautification or avatars. The instant messaging module can be used for text chat, virtual gifting, delivering live quiz content, and other purposes.

Large-Scale Live Streaming

When a user starts live streaming, the client-side media engine captures the stream and transmits it to a nearby edge node of the RTC network. Within the RTC network, a dispatch center and a cluster of media servers accelerate the stream and then relay it to the CDN for high-concurrency broadcasting. When an audience member subscribes to the published stream, the client-side media engine (this time, on the subscriber side) pulls the stream from a nearby edge node of the CDN.

ZEGOCLOUD uses a private UDP-based transmission protocol to accelerate stream delivery across the RTC network. Before the stream is forwarded to the CDN, it has to be converted to RTMP, since the CDN supports RTMP but won’t recognize a private transmission protocol. Supporting a large audience relies on the CDN because it can achieve large-scale distribution at a lower cost.
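As a rough illustration of what "a nearby edge node" can mean in practice, the sketch below picks the edge node with the lowest measured round-trip time. The `EdgeNode` type, the node names, and the RTT figures are all hypothetical; the article does not say how ZEGOCLOUD actually chooses edge nodes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EdgeNode:
    name: str      # hypothetical node identifier
    rtt_ms: float  # hypothetical round-trip time measured by the client


def pick_nearest_edge(nodes: List[EdgeNode]) -> EdgeNode:
    """Return the candidate edge node with the lowest measured RTT."""
    if not nodes:
        raise ValueError("no edge nodes available")
    return min(nodes, key=lambda node: node.rtt_ms)


# Example with made-up measurements.
candidates = [
    EdgeNode("edge-singapore", 48.0),
    EdgeNode("edge-tokyo", 95.0),
    EdgeNode("edge-frankfurt", 210.0),
]
nearest = pick_nearest_edge(candidates)
print(nearest.name)
```

A real system would likely also weigh node load and packet loss, but the same "measure, then pick the best candidate" shape applies to both the publishing side (RTC edge) and the subscribing side (CDN edge).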

ZEGOCLOUD offers a standard live streaming solution with an end-to-end latency of about 1 to 3 seconds for broadcasting. If you want lower latency, ZEGOCLOUD offers a premium solution that can help you achieve end-to-end latency of less than 1 second for broadcasting. In this case, ZEGOCLOUD uses its self-built RTC network, MSDN, to accelerate data transmission for both the host and audience.

Interactive Live Streaming: Interacting with Others in Real Time

In an interactive live streaming session, you can co-host a show with other hosts or interact with invited audience members through audio and video in real time. In this scenario, the host, co-hosts, or audience members participating in the real-time interaction will subscribe to each other’s stream from the RTC network rather than the CDN to achieve ultra-low-latency communications.
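The rule for where a participant pulls streams from can be summarized in a few lines. This is only a sketch of the routing described in this article; the `Role` enum, the `pull_source` function, and the `low_latency_plan` flag are illustrative names, not part of any ZEGOCLOUD SDK.

```python
from enum import Enum


class Role(Enum):
    HOST = "host"
    CO_HOST = "co-host"
    INVITED_AUDIENCE = "invited audience"  # audience member taking part in the interaction
    VIEWER = "viewer"                      # audience member who only watches


def pull_source(role: Role, low_latency_plan: bool = False) -> str:
    """Return which network a participant subscribes from.

    Anyone taking part in the real-time interaction pulls from the RTC
    network. Plain viewers pull from the CDN, unless the sub-second
    (premium) option is used, in which case they also pull from the RTC network.
    """
    if role is not Role.VIEWER or low_latency_plan:
        return "RTC"
    return "CDN"


print(pull_source(Role.CO_HOST))                       # RTC
print(pull_source(Role.VIEWER))                        # CDN
print(pull_source(Role.VIEWER, low_latency_plan=True)) # RTC
```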

There are three parts to the stream journey: stream publishing, stream acceleration, and stream subscription and playback. Publishing a stream from a host to the RTC network works much the same way as in the basic live streaming scenario discussed above. To accelerate transmission, the dispatch center intelligently selects an optimized route for the stream, and the media servers on that route relay the stream to an edge node of the RTC network near a co-host or an audience member, who subscribes to and fetches the stream for playback.
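The article does not describe how the dispatch center chooses a route. One common way to model the problem is a shortest-path search over the media servers, with link latency as the edge weight. The sketch below uses Dijkstra's algorithm as a stand-in; the server names and latencies are invented for illustration.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical topology: graph[a][b] is the one-way latency in ms from a to b.
Graph = Dict[str, Dict[str, float]]


def lowest_latency_route(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm: return (total latency, list of hops) from src to dst."""
    best = {src: 0.0}          # lowest known latency to each node
    prev: Dict[str, str] = {}  # predecessor on the best known path
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, latency in graph.get(node, {}).items():
            new_cost = cost + latency
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best:
        raise ValueError(f"no route from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return best[dst], path


links: Graph = {
    "edge-host": {"relay-a": 20.0, "relay-b": 35.0},
    "relay-a": {"relay-b": 10.0, "edge-cohost": 60.0},
    "relay-b": {"edge-cohost": 25.0},
}
total_ms, route = lowest_latency_route(links, "edge-host", "edge-cohost")
print(total_ms, route)
```

With these made-up numbers, the two-relay path (20 + 10 + 25 ms) beats both more direct alternatives, which is why a dispatch step can be worthwhile even though it adds hops.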

Stream Recording on the Cloud or On-Premises

Social platforms and educational institutions have a strong demand for recording live streams for on-demand playback. There are different ways to do the recording. As streams are normally distributed through a CDN, you can perform the recording on the CDN as well (cloud recording); you can deploy the recording service on your on-premises servers (on-premises recording); or streams can be recorded and saved locally on end-user devices (local recording).

  • Cloud recording

The streams are transmitted to the CDN and then recorded and saved in common media formats like MP4 or FLV.

  • On-premises recording

The streams are pulled from ZEGOCLOUD’s RTC network (MSDN) and recorded on your on-premises servers by using ZEGOCLOUD’s on-premises recording SDK.

  • Local recording

The streams are recorded and saved locally on end-user devices by calling the related APIs of ZEGOCLOUD’s live streaming SDK.

For both cloud recording and on-premises recording, you can choose to record all the streams of a live streaming session separately or mix them into a single stream and then record the mixed stream.
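The difference between the two options is easiest to see as a list of recording tasks. The sketch below is a hypothetical planner, not ZEGOCLOUD's recording API: the function name, the task dictionary shape, and the output file names are assumptions made for illustration.

```python
from typing import Dict, List


def plan_recordings(stream_ids: List[str], mix: bool = False) -> List[Dict]:
    """Build one recording task per stream, or a single task for the mixed stream."""
    if not stream_ids:
        return []
    if mix:
        # All streams are mixed into one stream, which is recorded once.
        return [{"inputs": list(stream_ids), "output": "session_mixed.mp4"}]
    # Each stream is recorded separately into its own file.
    return [{"inputs": [sid], "output": f"{sid}.mp4"} for sid in stream_ids]


session = ["host", "cohost_1", "cohost_2"]
separate_tasks = plan_recordings(session)
mixed_tasks = plan_recordings(session, mix=True)
print(len(separate_tasks), len(mixed_tasks))
```

Separate recordings keep every participant's stream available for later editing, while a mixed recording yields one file that is ready for playback as-is.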

Instant Messaging for Virtual Gifts, Likes, and More

Features such as virtual gifts and likes can be used in various use cases to enhance user engagement significantly. The instant messages used for these features are rich structured messages supported by message templates that allow you to fill in icons, text, and themed layouts.

Let’s use ZEGOCLOUD’s in-app messaging feature as an example. It supports custom messages that can be used for sending virtual gifts, likes, and more. Once a virtual gift is sent, a virtual gifting message is broadcast in the room with text and colorful icons to notify the users about the event. You can use instant messaging to implement likes and other features in the same way.
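To make the idea of a structured custom message concrete, here is a minimal sketch of a gift message encoded as JSON and turned into a room notice on the receiving side. The field names (`type`, `sender`, `gift`, `count`) and both function names are assumptions for illustration; they are not the message schema of ZEGOCLOUD's in-app messaging.

```python
import json
from typing import Optional


def build_gift_message(sender_id: str, gift_id: str, count: int = 1) -> str:
    """Serialize a hypothetical virtual-gift event as a JSON custom message."""
    if count < 1:
        raise ValueError("count must be at least 1")
    payload = {"type": "gift", "sender": sender_id, "gift": gift_id, "count": count}
    return json.dumps(payload)


def render_notice(raw_message: str) -> Optional[str]:
    """Turn a received custom message into the text shown in the room."""
    message = json.loads(raw_message)
    if message.get("type") != "gift":
        return None  # other message types (likes, chat, ...) are handled elsewhere
    return f'{message["sender"]} sent {message["count"]} x {message["gift"]}'


raw = build_gift_message("alice", "rose", count=3)
print(render_notice(raw))
```

A "like" could reuse the same envelope with a different `type` value, which is what makes one messaging channel enough for several engagement features.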

AI-Powered Effects like Stickers and Beautification

These AI-powered audio and video effects can create a lot of fun for users and serve as effective tools to boost user interactions. You can apply audio effects to make your voice sound nicer, change your voice to a baby voice, and so on. You can also apply video effects to enhance your look, add makeup, put AR stickers on your head, and more.

All these effects must be applied to the raw audio or video data at the sender end, after capture and before encoding. You obtain the raw audio or video data before encoding, feed it to AI-powered add-on modules to add the desired audio and video effects, and then send the processed data back to the SDK for encoding.
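The capture, effect, encode order can be sketched as a small processing chain. Everything below is a toy model: a frame is just a list of 8-bit brightness samples, `brighten` stands in for an AI-powered effect, and `fake_encode` stands in for the SDK's encoder. None of these names come from a real SDK.

```python
from typing import Callable, List

Frame = List[int]                 # toy raw frame: 8-bit brightness samples
Effect = Callable[[Frame], Frame]


def brighten(frame: Frame, amount: int = 20) -> Frame:
    """Stand-in for an effect module: raise brightness, clamped to 255."""
    return [min(255, pixel + amount) for pixel in frame]


def fake_encode(frame: Frame) -> bytes:
    """Stand-in for the encoder: pack samples into bytes."""
    return bytes(frame)


def process_before_encode(raw: Frame, effects: List[Effect],
                          encode: Callable[[Frame], bytes]) -> bytes:
    """Apply each effect to the raw frame in order, then hand the result to the encoder."""
    frame = raw
    for effect in effects:
        frame = effect(frame)
    return encode(frame)


captured = [0, 100, 250]
encoded = process_before_encode(captured, [brighten], fake_encode)
print(list(encoded))
```

The key point is the ordering: once a frame has been encoded, effects can no longer be applied cheaply, so the hook has to sit between capture and encoding.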

The live streaming solution must provide interfaces for outputting raw media data, or the AI-powered add-on modules cannot work. For example, browsers such as Google Chrome have historically offered only limited interfaces for raw media data output, which makes it difficult for AI-powered add-on modules to work with real-time audio or video calls in the browser.

ZEGOCLOUD’s solutions adopt a very open and developer-friendly policy and provide various interfaces for customization. ZEGOCLOUD’s solutions, including live streaming, allow you to obtain raw audio or video data to apply various AI-powered effects before encoding.

A live streaming system is a comprehensive solution in which various technologies work together seamlessly. If you are working on a live streaming platform and wonder whether there are new ways to improve user engagement or bring live streaming into a metaverse, please contact us and speak with an expert to discover more.
