Products Mar 18, 2022

Pushing the limits of technology, ZEGOCLOUD launches innovative online KTV real-time chorus solution

It is joyful to sing karaoke with friends in a KTV club (KTV refers to Karaoke MTV, which is popular in Asian countries). However, the Covid-19 pandemic has caused an abrupt shutdown of KTV clubs, and many people have turned to online social and entertainment activities to spend their leisure time.

Online KTV has evolved rapidly: from solo singing recorded offline and uploaded, to online "quasi-chorus" in which several people sing in turns, to the most recent online real-time chorus that replicates the offline karaoke chorus experience. Continuous technological breakthroughs have dramatically improved online KTV in many areas, including real-time responsiveness and interactivity. This rapid evolution reflects users’ strong desire for “in time” and “authentic” companionship through real-time interaction and shared experiences in their online social lives, and singing songs together is a popular choice.

To satisfy this common need, ZEGOCLOUD has been pushing the limits of technology and has launched an innovative online KTV real-time chorus solution that brings the offline chorus experience online.

1. Adding value for clients with improved user engagement

Dozens of clients have adopted ZEGOCLOUD’s chorus solution, and all of them have seen increases in user engagement metrics. For example, after adopting ZEGOCLOUD’s online real-time chorus solution, one sizable social platform saw a 3% increase in its DAU (daily active users) and a 10% increase in the number of concurrent rooms.

2. Delivering an authentic real-time online karaoke experience to end-users

Currently, the common chorus solutions on the market only allow users to sing in turns. Let's call this kind of solution “online sing-in-turns”. The key difference between an online sing-in-turns solution and ZEGOCLOUD's online real-time chorus solution is that the latter allows users to sing at the same time online and hear one another’s voices in real time, fully replicating the experience of an offline chorus.

With an online real-time chorus solution, singers can hear and interact with one another in real-time, and the accompaniment is synchronized across all participants.

With an online sing-in-turns chorus solution, the lead singer sings first, and then the sub singers take turns singing the song. The accompaniment is mixed into the main stream at the lead singer's end. In essence, this is not a real chorus. To list a few of its disadvantages:

1) Bad experience for the lead singer. The lead singer cannot hear the sub singer’s singing, although the sub singer can hear the lead singer, and the audience can hear all singers properly.

2) Limit on the number of singers. It is very difficult to have an online chorus of 3 or more people, because accumulated latency makes the technical solution much more complicated and challenging to implement.

In order to resolve the above two limitations, and not to compromise the experiences of the sub singer and audience, ZEGOCLOUD engineered an online real-time chorus solution, which allows all singers to hear and interact with one another in real-time. Also, the solution eliminates the limit on the number of participating singers. To achieve these advantages, ZEGOCLOUD has re-factored the online chorus solution, and conquered a number of technology challenges, which will be laid out in detail in the following sections.

3. The technological challenges that ZEGOCLOUD has overcome to build the online real-time chorus solution

With years of experience in engineering and technology, ZEGOCLOUD has made a number of key technical breakthroughs:

1) End-to-end latency reached an ultra-low level of 70ms

According to research reports, an online chorus feels free of perceivable latency only when the end-to-end latency is 80ms or below. ZEGOCLOUD has gone below this threshold and reached 70ms, which means sampled music and voice data can be transmitted from one user's end to another without perceivable delay. In other words, ZEGOCLOUD’s solution makes online real-time chorus achievable.

We have broken down the whole transmission process into 6 consecutive parts: sampling, pre-processing, encoding, transmission over MSDN (ZEGOCLOUD’s proprietary acceleration network), decoding, and rendering. We then made lowering latency the first priority for each part. This includes optimizing sampling and rendering, optimizing pre-processing and post-processing, adopting the encoding and decoding solutions with the lowest latency, and adjusting the streaming control strategies.

By squeezing latency out of every single part, ZEGOCLOUD managed to reach an ultra-low end-to-end latency of 70ms, ensuring an outstanding user experience in online chorus.
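To make the idea of a per-part latency budget concrete, here is a minimal Python sketch. The six stage names come from the breakdown above, but the per-stage millisecond figures are purely hypothetical placeholders chosen to add up to 70ms. They are not measurements published by ZEGOCLOUD.

```python
# Hypothetical per-stage latencies in milliseconds. These figures are
# illustrative assumptions only, not ZEGOCLOUD measurements.
STAGES_MS = {
    "sampling": 10,
    "pre-processing": 5,
    "encoding": 10,
    "transmission": 30,
    "decoding": 5,
    "rendering": 10,
}

# Perception threshold cited in the article.
PERCEPTION_THRESHOLD_MS = 80


def end_to_end_latency(stages):
    """Sum the latency of the consecutive pipeline stages."""
    return sum(stages.values())


def largest_contributor(stages):
    """Return the name of the stage that adds the most latency."""
    return max(stages, key=stages.get)


total = end_to_end_latency(STAGES_MS)
print(f"total = {total} ms, within threshold: {total <= PERCEPTION_THRESHOLD_MS}")
print("optimize first:", largest_contributor(STAGES_MS))
```

Because the stages run one after another, their latencies add up, which is why every part has to be tuned rather than only the network hop.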

2) Precise synchronization of accompaniment across all participants

To replicate the experience of offline KTV chorus where users can hear the accompaniment at the same time, ZEGOCLOUD adopted the strategy of playing the accompaniment at each user-end locally without having to depend on other participants. So all participants can sing along with the accompaniment. However, another challenge arises: how to make the playback of accompaniment synchronized across all participants?

To conquer this challenge, ZEGOCLOUD uses NTP (Network Time Protocol) servers for precise time synchronization, so that the accompaniment is played back in sync at every user's end. Meanwhile, to eliminate the accompaniment loading time in the media player on the user side, ZEGOCLOUD loads the accompaniment in advance so that the player is ready to start the background music at the synchronized time.
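The following Python sketch illustrates how clock synchronization can drive a shared start time. It uses the standard NTP clock-offset estimate computed from four timestamps. The function names and the example timestamps are assumptions made for illustration, and this is not ZEGOCLOUD's actual implementation.

```python
def clock_offset_ms(t0, t1, t2, t3):
    """Standard NTP offset estimate (server clock minus local clock).

    t0: request sent      (local clock, ms)
    t1: request received  (server clock, ms)
    t2: reply sent        (server clock, ms)
    t3: reply received    (local clock, ms)
    """
    return ((t1 - t0) + (t2 - t3)) / 2


def local_start_time_ms(shared_start_ms, offset_ms):
    """Convert a start time agreed on the server clock to the local clock."""
    return shared_start_ms - offset_ms


def playback_position_ms(local_now_ms, shared_start_ms, offset_ms):
    """Position in the accompaniment that should be playing right now.

    A negative value means the shared start time has not been reached yet.
    """
    return (local_now_ms + offset_ms) - shared_start_ms


# Hypothetical exchange: the local clock runs 500 ms behind the server
# and each network direction takes 20 ms.
offset = clock_offset_ms(t0=100_000, t1=100_520, t2=100_530, t3=100_050)
print("offset:", offset)
print("start locally at:", local_start_time_ms(200_000, offset))
print("position now:", playback_position_ms(199_700, 200_000, offset))
```

With the accompaniment already preloaded, each device only needs to start its player when its local clock reaches the converted start time, or seek to the computed position if it joins late.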

3) Multiple singer streams are mixed in synchronization at the server end

Besides crafting a good choral experience for singing users, ZEGOCLOUD considers it equally important to ensure a great experience for the audience, who should hear a smooth chorus with all the singers’ voices and the accompaniment blended in synchronization. To achieve this, ZEGOCLOUD mixes the streams on the server side with precise timing synchronization. Three key aspects are taken care of in this regard: 1) streams are mixed into one single downlink stream to keep the bandwidth required by the audience low; 2) the singing users’ vocal tracks and the background music are kept in sync at a fixed pace; 3) the synchronization is done through precise calibration with NTP clocks. In the end, each audience member only needs to subscribe to a single downlink stream to hear the chorus, and the bandwidth required is low enough for most users.
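As a rough illustration of timestamp-aligned mixing, the Python sketch below groups audio frames from several streams by their synchronized timestamp and sums them into one output stream. The frame layout, the 20ms frame grid, and the 16-bit clipping range are assumptions made for this example. The article does not describe ZEGOCLOUD's mixing server at this level of detail.

```python
from collections import defaultdict


def mix_streams(streams, frame_ms=20, limit=32767):
    """Mix several timestamped streams into one.

    streams: dict mapping a stream name to a list of
             (synced_timestamp_ms, [int samples]) frames.
    Returns a list of (slot_ms, [mixed samples]) ordered by time.
    """
    buckets = defaultdict(list)
    for frames in streams.values():
        for ts, samples in frames:
            # Snap each frame to the shared frame grid so that frames
            # captured at (almost) the same moment land in the same slot.
            slot = ts - ts % frame_ms
            buckets[slot].append(samples)

    mixed = []
    for slot in sorted(buckets):
        # Sum the samples position by position, then clip to 16-bit range.
        summed = [sum(column) for column in zip(*buckets[slot])]
        clipped = [max(-limit - 1, min(limit, s)) for s in summed]
        mixed.append((slot, clipped))
    return mixed


# Hypothetical input: two singers whose frames carry synced timestamps.
streams = {
    "lead": [(1000, [100, 200]), (1020, [3000, 400])],
    "sub": [(1003, [10, 20]), (1021, [30000, 30000])],
}
print(mix_streams(streams))
```

The audience then receives only the mixed output, so its downlink carries one stream regardless of how many singers contribute.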

4) Support for more than 3 participants

The number of singing users participating in a chorus can be increased easily. With the well-designed architecture of ZEGOCLOUD’s solution, every singing user is independent and does not rely on others for the accompaniment or for a vocal track to serve as the base for stream mixing. Besides, the audience hears the chorus by subscribing to only a single combined stream and is not affected by the dynamics of each individual vocal track.
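A small Python sketch shows why a single combined stream keeps the audience side independent of the number of singers. The 64 kbps per-stream bitrate is a hypothetical figure used only for the arithmetic.

```python
# Hypothetical bitrate of one audio stream, for illustration only.
STREAM_KBPS = 64


def audience_downlink_kbps(num_singers, server_side_mix):
    """Downlink bandwidth one audience member needs to hear the chorus."""
    streams = 1 if server_side_mix else num_singers
    return streams * STREAM_KBPS


for singers in (2, 3, 5):
    print(
        singers,
        "singers:",
        audience_downlink_kbps(singers, server_side_mix=True),
        "kbps mixed vs",
        audience_downlink_kbps(singers, server_side_mix=False),
        "kbps unmixed",
    )
```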

5) Ultra-low latency for in-ear monitoring

With in-ear monitoring switched on, singing users can get nicer feedback of their own voice and accompaniment at ultra-low latency.

4. We have the right online chorus solution for every scenario

Based on the requirements of various online KTV business scenarios, you can select the right solution to deliver the best user experience.

1) Pay-per-singing chorus

Users pay for the opportunity to sing a chorus with the lead singer. The paying user can select a song, then connect and interact with the lead singer.

Core requirement: To guarantee the experience of the paying user (i.e., the sub singer, singing a song together with the lead singer) as the first priority, and then the experience of the audience.

Suitable solution: Online sing-in-turns chorus. With this solution, from the sub singer’s perspective, the downlink stream contains a synced mixture of the lead singer’s vocal track and the accompaniment, pulled from the lead singer’s end. The sub singer initiates a stream-mixing task, which will be executed at the server side to produce a synced mixture stream, containing both the lead singer’s and the sub singer’s vocal tracks and also the accompaniment. Thus, the audience will hear a synced chorus.

2) Online real-time chorus

Multiple users can interact in a KTV room and initiate a virtual chorus for fun. This scenario is designed to thrill users with social interactions.

Core requirement: To allow multiple users to sing at the same time and hear each other.

Suitable solution: Online real-time chorus. With this solution, ZEGOCLOUD's ultra-low latency real-time voice/video technologies ensure that participants interact with each other smoothly in real time, making online choral singing an enjoyable experience. The solution is easy to integrate, and it supports 3 or more participants in a chorus, replicating the offline KTV experience to a large extent.

3) Online-Merge-Offline (OMO) cross-room chorus

In this scenario, two groups of users in two different KTV rooms, for example a traditional offline KTV room and a mini-KTV box in a shopping mall, can pick a song and sing a real-time chorus together across the two rooms.

Core requirement: To allow users from two rooms to connect, interact in real time, and sing a chorus together.

Suitable solution: Online real-time chorus. ZEGOCLOUD helps break down the distance barriers with its ultra-low latency real-time voice/video technologies, making online interactions truly real-time and authentic.

5. A few recommended use cases to cover maximal users with low entry barriers

The following are a few typical use cases:

1) Solo: a user who obtained a singing slot can select a song for solo singing. The user has to wait in a queue until his/her turn comes and the accompaniment starts to play before he/she can start singing.

2) Snatch-to-sing: a user can choose to join a room from a list of rooms of different song categories according to his/her preference. The song to be played is chosen randomly, and users in the room can try to snatch a chance to become a singer, and then solo a part of the song.

3) Sing-in-turns: a whole song is divided into several parts, which are distributed among several users. The lead singer sings the first part, and the other users then sing their own parts in turn.

ZEGOCLOUD's innovative online real-time chorus solution stands out as a significant technological breakthrough and raises the technology benchmark to a new level. Nevertheless, this cutting-edge solution places demands on the capability of end-user devices and networks. Not every user has a smartphone and internet connection powerful enough for online real-time chorus, and internet entertainment should reach as many users as possible regardless of the capability of their smartphones or internet connections. In this regard, we recommend that you consider the three use cases mentioned above: “Solo”, “Snatch-to-sing”, and “Sing-in-turns”. These use cases lower the entry barriers for users as far as possible, enabling your platform to serve the widest possible audience.

Looking forward, as smartphones and network connections continue to improve, the potential of ZEGOCLOUD’s online real-time chorus solution will be fully realized and will enable more viral apps.
