How does Emerstream Live Streaming work?

A blog post explaining how we deliver our Live Streaming Service.

Technology is constantly evolving, and this is most certainly evident when looking at Web Standards. The majority of the web is built using HTML, a markup language which has come a long way since its creation in 1993 and is currently on "Version 5", or HTML5. HTML5, together with the wider family of modern web standards, brought about major changes in functionality which we now take for granted: things like Geolocation, better storage functionality and, most importantly in our case, Web Real-Time Communication (WebRTC).

WebRTC is a relatively new standard which is still actively being developed and improved upon, but is now fully released and available in all major browsers. WebRTC enables the secure peer-to-peer communication of audio, video and data. In its simplest terms, it allows two devices to share audio and video between themselves in a mutually agreed format, with the data exchanged directly. This allows for minimal latency (the amount of time it takes for data to be received after it has been sent).

For the purposes of this explanation, the sender of data will be referred to as the "Caller" and the receiver of data as the "Handler"; the instance of sharing data will be referred to as a "Stream".

Emerstream's Live Stream functionality builds upon the WebRTC standard, but involves a Media Server, for reasons I will explain shortly.

A session within Emerstream begins when the Handler starts a Stream on the Emerstream Dashboard in their browser, usually by quoting a Call Log reference such as a CAD number, and supplying Emerstream with a mobile number for the Caller. We then issue the Caller with a unique secure link via text message, inviting them to share their device's camera, microphone and (if requested by the Handler) GPS location data. Once accepted, the device's preinstalled default browser (such as Safari on iOS or Chrome on Android) opens and the Caller sees their camera feed. A secure connection is established between the Caller's mobile device and our Media Server, and the audio and video immediately begin to be recorded. The Handler's browser is notified of this connection and, rather than connecting directly to the Caller's device, requests a copy of the Stream from the Media Server. The Handler can now see and hear exactly what the Caller can see and hear.
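The lifecycle described above can be sketched as a small simulation. Everything here — `MediaServer`, `start_stream`, the link format — is illustrative naming for this post, not Emerstream's actual API:

```python
import secrets


class MediaServer:
    """Hypothetical stand-in for Emerstream's Media Server."""

    def __init__(self):
        self.recordings = {}  # stream_id -> list of captured frames
        self.viewers = {}     # stream_id -> list of subscribers

    def ingest(self, stream_id, frame):
        # Footage is recorded server-side the moment it arrives.
        self.recordings.setdefault(stream_id, []).append(frame)

    def subscribe(self, stream_id, viewer):
        # Each viewer gets their own copy from the server,
        # never a direct connection to the Caller's device.
        self.viewers.setdefault(stream_id, []).append(viewer)


def start_stream(cad_number, caller_mobile):
    """Handler starts a Stream; the Caller receives a unique secure link."""
    stream_id = f"{cad_number}-{secrets.token_urlsafe(8)}"
    invite_link = f"https://example.invalid/join/{stream_id}"  # sent via SMS
    return stream_id, invite_link


# Handler opens a Stream against an illustrative CAD reference.
server = MediaServer()
stream_id, link = start_stream("CAD-12345", "+447700900000")

# Caller taps the link, grants camera/mic access, and frames flow in.
server.ingest(stream_id, "frame-001")

# Handler (and any invited viewers) pull their copy from the server.
server.subscribe(stream_id, "handler")
```

The key design point the sketch captures is that recording and viewing are both server-side concerns: the Caller's device only ever talks to the Media Server.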

Both the Caller and Handler can exchange text messages directly inside the Stream, and the Caller's GPS data (if requested) is also provided and mapped in real time, with an accuracy of typically around 10 metres.

The Handler has the ability to invite others to view the Stream by issuing secured links, which can be restricted to allow only internal connections (other employees from the same organisation), external secured connections (shared with other organisations who use Emerstream), or single-use share links (for when the end user doesn't have access to an Emerstream account). These invited viewers then request their own copy of the Stream, with the ability to play back earlier sections of the Stream from the recording.
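The three link restrictions can be expressed as a simple access check. This is a sketch of the idea only — the names and rules below are my own illustration, not Emerstream's real authorisation logic:

```python
from enum import Enum


class LinkScope(Enum):
    INTERNAL = "internal"      # same organisation only
    EXTERNAL = "external"      # any organisation using Emerstream
    SINGLE_USE = "single_use"  # one-off link, no account needed


def may_view(scope, viewer_org, stream_org, viewer_has_account, link_used):
    """Illustrative access check for a share link."""
    if scope is LinkScope.INTERNAL:
        # Must be a logged-in member of the same organisation.
        return viewer_has_account and viewer_org == stream_org
    if scope is LinkScope.EXTERNAL:
        # Any authenticated Emerstream user, regardless of organisation.
        return viewer_has_account
    if scope is LinkScope.SINGLE_USE:
        # No account required, but the link is valid exactly once.
        return not link_used
    return False
```

For example, an internal link shared outside the issuing organisation is refused, while a single-use link works for an account-less viewer until it has been redeemed.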

Once the Stream is ended, whether by the Handler or the Caller, all parties are disconnected. The recording, however, is retained subject to the organisation's retention policies and can be accessed in line with the organisation's storage and playback policies.

As mentioned earlier, Emerstream uses a Media Server to ingest and process stream data rather than using the typical peer-to-peer model, for three main reasons:

  1. Stream quality and bitrate. The bitrate of a video is the number of bits per second transmitted along a digital network, and it is limited by the bandwidth (connection speed) available to the Caller's device. As the service will predominantly rely on a cellular (3G, 4G, 5G) connection, the connection speed can vary dramatically. Let's say the Caller has access to 1Mbps (one megabit transferred every second). When the Caller connects directly (peer-to-peer) with the Handler, they can send the full 1Mbps to the Handler, which provides a good quality stream. However, if the stream is also shared with the first responder attending the incident, a separate peer-to-peer connection is established for each recipient. The Caller is now sending their video to both the Handler and the first responder, but still only has 1Mbps available, so that uplink is split between the two recipients: 500Kbps to the Handler and 500Kbps to the first responder, essentially halving the video quality. In reality, a Stream of any interest may be shared with, and viewed by, countless interested parties within the organisation and beyond. Establishing a direct connection with each recipient causes the available bandwidth per viewer to shrink with every connection, eventually resulting in an unwatchable stream. Using a Media Server completely removes this effect. The Caller only ever creates one connection, to the Media Server, using their full available bandwidth (1Mbps), and each recipient then connects to the Media Server for their own copy. Our Media Servers are backed by a huge amount of processing power and bandwidth of up to 25Gbps, so for recipients to see a drop in quality from the original, in bandwidth terms, over 25,000 of them would need to be connected simultaneously.
  2. Reliable Stream recording. If a Stream ran directly between two browsers, any recording of the footage would need to be captured and stored by either the Caller's device or the Handler's browser, and there are pros and cons to both. In favour of recording on the Caller's device, the entirety of what is seen would be captured, with the recording starting straight away rather than only once the Handler begins to receive the footage. However, we have very little control over the Caller's device, and there are various privacy and storage concerns. Recording in the Handler's browser improves on the storage and control concerns, at the cost of potentially losing the first few seconds of the incident as seen by the Caller. Both methods would also require an upload period to send the footage to cloud storage after the Stream has ended, which is not ideal. Although HTML5 brought massive improvements over the previous HTML standard, in-browser recording alone proved too unreliable for our needs. When the Caller connects to the Media Server, however, the footage is instantly recorded to storage behind the scenes, with little to no impact on the actual Stream.
  3. Added functionality. By passing footage through a Media Server, we can introduce greater functionality and analytics in real time. The Media Server is able to run, when activated, facial recognition, content moderation and speech transcription on the Stream, with the ability to add further functionality with ease.
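The bandwidth arithmetic in point 1 can be checked with a few lines of Python. The 1Mbps uplink and 25Gbps server capacity are the figures from the text; the simple "split the uplink evenly" and "fan out until server capacity is exhausted" models are deliberate simplifications:

```python
def p2p_kbps_per_viewer(uplink_kbps, viewers):
    """Peer-to-peer: the Caller's uplink is split across every viewer."""
    return uplink_kbps / viewers


def server_kbps_per_viewer(uplink_kbps, viewers, server_capacity_kbps):
    """Media Server: the Caller uploads once at full quality; the server
    fans out copies until its own bandwidth is exhausted."""
    if viewers * uplink_kbps <= server_capacity_kbps:
        return uplink_kbps  # every viewer gets the full-quality stream
    return server_capacity_kbps / viewers


UPLINK_KBPS = 1_000       # 1 Mbps caller uplink
SERVER_KBPS = 25_000_000  # 25 Gbps server capacity

# Peer-to-peer: a second viewer already halves the quality.
two_viewers_p2p = p2p_kbps_per_viewer(UPLINK_KBPS, 2)

# Via the Media Server: both viewers still get the full 1 Mbps.
two_viewers_srv = server_kbps_per_viewer(UPLINK_KBPS, 2, SERVER_KBPS)

# Quality only starts to drop past this many simultaneous viewers.
break_even = SERVER_KBPS // UPLINK_KBPS
```

Under this model `two_viewers_p2p` is 500kbps, `two_viewers_srv` is the full 1,000kbps, and `break_even` is 25,000 viewers, matching the figure in point 1.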

NB: 1Gbps is equivalent to 1,000Mbps or 1,000,000Kbps.