Cluster

One of Ventuz's key features is the ability to run on a cluster of machines. When the target number of outputs or the target resolution exceeds what a single machine can deliver, additional render nodes can be connected via a network and their outputs synchronized by Ventuz to deliver a consistent experience. A 360-degree projection consisting of 36 projectors and a display wall consisting of 16 HD displays are just two examples of highly successful presentations achieved using the Ventuz cluster support.

In a Cluster, every machine gets the whole Ventuz Scene. Based on the Render Setup, every machine decides which part of the Scene it has to render. Cluster communication is established over a network, distributing a common time basis and other cluster-relevant information among the machines.

Although the Cluster Support is a mature and frequently used technology, a word of caution is in order. Multi-machine setups inherently introduce an additional set of potential problems. Due to the recent trend of adding more and more outputs to graphics cards, many scenarios that would only have been possible with a cluster not so long ago can now be handled by a single machine. As a rule of thumb, if it is possible to use a single machine by spending money on a bigger graphics board (assuming the processing power suffices to render the scene in question), do it. It will avoid a lot of the problems mentioned in the discussion below.

Hardware Setup

A cluster by definition consists of multiple networked machines, each running an instance of Ventuz. While a dedicated master/control machine is not strictly necessary, most clusters include one for various reasons (as a control panel for the presenter, for maintenance operations and so on).

All machines that act as rendering clients should have identical hardware components, graphics card driver versions and so on. In addition, the displays/projectors connected to their outputs should be identical. While it is possible to use mixed setups, this often introduces artifacts that are particularly hard to detect/fix.

All machines should be connected to a dedicated, fast network. Again, although it is possible to have a cluster connected to a network used by other machines, the additional traffic can disrupt the time-critical synchronization process between the Ventuz machines.

Finally, each machine requires a copy of Ventuz Presenter (see here for reasons why not to use the Ventuz Designer), the appropriate license, and a copy of the Ventuz Presentation (VPR) to be run. Depending on the requirements, additional synchronization hardware may be necessary; this is covered in a separate section further below.

Cluster Clock

The basic rendering synchronization between Ventuz machines is achieved via the so-called Cluster Clock. Each Ventuz instance has its own internal clock that counts the number of rendered frames and uses the frame rate set in the AV Configuration (see global FPS) to estimate the current time. All time-based nodes (e.g. the Mover Node), animations and so on use the clock to update their values to match the frame being rendered.

In order to synchronize the renderings of the individual machines, the clock information has to be synchronized. This is done by having the machines broadcast their time via UDP packets on the network, which is activated in the Ventuz Configuration. Each machine has to be assigned a group ID (distinguishing machines belonging to one cluster from those of another cluster) and a unique machine ID. A machine will both broadcast its time and listen for packets coming from other machines with the same group ID. If it receives a packet from a machine with a smaller machine ID than its own, it will set its own clock to the value received from that machine and stop broadcasting its own time. If there are multiple such packets, the one coming from the machine with the lowest machine ID is used.
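The selection rule described above can be sketched in a few lines. This is only an illustration of the "lowest machine ID wins" logic, not the actual Ventuz implementation: the `ClockPacket` structure, its field names and the `select_clock` function are hypothetical, and the real packet layout is not described here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class ClockPacket:
    """Hypothetical clock message; the real packet layout is an assumption."""
    group_id: int
    machine_id: int
    clock_value: float  # sender's clock in seconds


def select_clock(own_group: int, own_id: int, own_clock: float,
                 received: Iterable[ClockPacket]) -> Tuple[float, Optional[int]]:
    """Return (clock value to use, ID of the timing master or None for own clock)."""
    # Only packets from the same group and a smaller machine ID can override us.
    candidates = [p for p in received
                  if p.group_id == own_group and p.machine_id < own_id]
    if not candidates:
        # No machine with a smaller ID was heard: keep (and keep broadcasting) own clock.
        return own_clock, None
    # Several candidates: the lowest machine ID wins.
    master = min(candidates, key=lambda p: p.machine_id)
    return master.clock_value, master.machine_id


packets = [
    ClockPacket(group_id=1, machine_id=3, clock_value=10.02),
    ClockPacket(group_id=1, machine_id=1, clock_value=10.00),
    ClockPacket(group_id=2, machine_id=0, clock_value=99.00),  # other cluster, ignored
]
print(select_clock(own_group=1, own_id=2, own_clock=10.01, received=packets))
```

In this example, machine 02 in group 01 adopts the clock of machine 01, while machine 01 itself would find no candidate with a smaller ID and keep its own clock.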

The value of a machine's clock can be seen in the Performance Statistics overlay as soon as Cluster Support is enabled.
The Clock row shows the following values:
0013672 is the number of rendered frames.
0d 00h 03m 47.87s is how long this Ventuz instance has been running.
Cluster ID 01.02 means that this machine has the Group ID 01 and the (Machine) ID 02.
clock from 01.01 means that the Cluster Clock is taken from the machine with ID 01 in the same group. If the machine uses its own clock, gray text color is used. If the machine uses another machine as timing master, green text color is used.
skip shows the number of skipped (dropped) frames and dup shows the number of frames rendered with the same clock value (no clock increase was performed compared to the previous frame).
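The sample values above are consistent with a frame-based clock running at 60 frames per second: 13672 frames divided by 60 fps gives about 227.87 seconds. The following minimal sketch reproduces that conversion. The function name `format_clock` and the assumption of 60 fps are illustrative only; the actual frame rate is whatever is set in the AV Configuration.

```python
def format_clock(frames: int, fps: float) -> str:
    """Convert a rendered-frame count into a 'Dd HHh MMm SS.ss s' style duration."""
    # Work in whole hundredths of a second to avoid rounding artifacts like "60.00s".
    centiseconds = round(frames * 100 / fps)
    total_seconds, cs = divmod(centiseconds, 100)
    total_minutes, s = divmod(total_seconds, 60)
    total_hours, m = divmod(total_minutes, 60)
    d, h = divmod(total_hours, 24)
    return f"{d}d {h:02d}h {m:02d}m {s:02d}.{cs:02d}s"


# Assumed frame rate of 60 fps; prints "0d 00h 03m 47.87s"
print(format_clock(13672, 60))
```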

When there is a tight timing synchronization between the display outputs, the font will be green. When the Clock UDP packet coming from the master does not reach a client in time, the font will change to gray and often change back to green a few frames later. In the basic cluster setup, this happens quite frequently because the displays are not synchronized to present their content at the same time, so one machine might start rendering a frame slightly before another. As long as the interruption only lasts a frame or two, the visual result may still be acceptable, as the machine will continue using its own internal clock to generate the correct time for the next frame.

Synchronization Hardware

In a cluster environment you will have to think about synchronizing the outputs. Especially in the case of video playback, even a single frame of delay between outputs may be unacceptable. For those scenarios, additional hardware is required: so-called sync boards. Popular examples are the AMD FirePro S400 Synchronization Board and the NVIDIA Quadro G-Sync Boards.

Genlock

In the case of Genlock, the timing signal comes from an arbitrary external source. Genlock is often used to synchronize a camera with a display that is being filmed, in order to avoid flickering. In a studio environment, there is usually one time source (called the house clock) which generates one common timing signal for all devices in the environment. The sync board forces the display to change its content exactly when the timing signal fires. When all machines in a cluster are connected to the same Genlock signal, they all render their content independently but are then forced to wait for the Genlock signal to present their result. The slight shift mentioned above is therefore avoided and the timing synchronization improves considerably.

Framelock

A similar synchronization is provided by establishing a Framelock. Instead of using a dedicated time source, one of the machines becomes the timing master and its timing signal is patched through to the other machines via the sync boards. On what basis the master performs its swap behavior is independent of the Framelock. The master can for example be bound to a Genlock signal.

Swap Sync

That, however, is not the whole story, as machines may still not render exactly the same frame despite Cluster Clock and Genlock. Imagine two machines rendering a presentation at 60 Hz, which leaves roughly 16.7 ms per frame. For one reason or another, one machine takes just under that time to render and the other just over it. The first one is ready in time for the Genlock signal and will present its content; the other will still show the old content and only update at the next Genlock signal. In effect, an animation will run smoothly on one machine and stall for a frame on the other.
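The effect can be reproduced with a small, deliberately simplified model. It assumes that each machine starts rendering its next frame at the sync pulse on which it swapped, that render times are constant, and that a frame is identified by the pulse index at which its rendering started. The function `displayed_frames` and these assumptions are illustrative and not taken from Ventuz.

```python
def displayed_frames(render_ms: float, pulses: int, refresh_hz: float = 60.0):
    """For each sync pulse, return which frame is on screen.

    A frame is labeled by the pulse index at which its rendering started;
    None means nothing has been presented yet.
    """
    period = 1000.0 / refresh_hz   # about 16.67 ms at 60 Hz
    shown = []
    on_screen = None               # frame currently displayed
    in_flight = 0                  # frame currently being rendered
    ready_at = render_ms           # time at which the in-flight frame is finished
    for k in range(1, pulses + 1):
        t = k * period
        if ready_at <= t:
            # Frame finished before the pulse: swap and start the next frame.
            on_screen = in_flight
            in_flight = k
            ready_at = t + render_ms
        # Otherwise the pulse is missed and the old content stays on screen.
        shown.append(on_screen)
    return shown


fast = displayed_frames(render_ms=16.0, pulses=6)  # just inside the frame budget
slow = displayed_frames(render_ms=17.0, pulses=6)  # just outside the frame budget
print(fast)  # [0, 1, 2, 3, 4, 5]
print(slow)  # [None, 0, 0, 2, 2, 4]
```

Although both machines swap exactly on the sync pulse, the slower one repeats every frame once and shows different content than the faster one on most pulses.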

A synchronization that prevents this has to work on a deeper level. Instead of enforcing a certain time to change the display content, it has to enforce that no display changes its content before all machines are ready to do so. This is what Swap Sync does. Where Genlock/Framelock work on the driver level of the graphics card, Swap Sync requires support by both the driver and the application.
This also means that the synchronization boards are configured in the graphics card driver, while Swap Sync needs to be enabled in the AV Configuration.
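Conceptually, Swap Sync behaves like a barrier that every machine has to reach before any of them may present its frame. The sketch below imitates this with threads on a single computer, using `threading.Barrier` from the Python standard library as a stand-in for the swap barrier. It is only an analogy under that assumption: the real mechanism lives in the graphics driver and sync hardware, and the names `run_cluster` and `machine` are hypothetical.

```python
import threading
import time


def run_cluster(render_seconds, frames):
    """Simulate machines that present a frame only once all machines have rendered it."""
    barrier = threading.Barrier(len(render_seconds))  # stand-in for the swap barrier
    log = []                       # (machine index, frame index) in presentation order
    log_lock = threading.Lock()

    def machine(index, render_time):
        for frame in range(frames):
            time.sleep(render_time)  # pretend to render the frame
            barrier.wait()           # block until every machine is ready to swap
            with log_lock:
                log.append((index, frame))  # "present" the frame

    threads = [threading.Thread(target=machine, args=(i, t))
               for i, t in enumerate(render_seconds)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log


# One fast and one slow machine; the fast one is held back by the barrier.
log = run_cluster([0.001, 0.003], frames=5)
frame_order = [frame for _, frame in log]
print(frame_order == sorted(frame_order))
```

Because no machine can pass the barrier for the next frame before all machines have presented the current one, the cluster as a whole runs at the speed of its slowest member but never shows mixed frames.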

For further reading, see How To Cluster Rendering.

Real Life Situation

Which synchronization technique is required depends on the quality requirements of the specific situation the presentation is running in:

Genlock: A Genlock connection is required when the rendering cluster has to be synchronized with non-Ventuz devices. The most common example is a video camera that is supposed to film the output. To avoid the camera taking a picture while the display changes its content, the camera and the rendering cluster have to use a common time source.
Framelock with Swap Sync: This is the highest level of quality achievable. All render slaves act as one machine and wait for each other before starting to render a new frame. Even when the scene drops frames, the output will be consistent.
Framelock/Genlock without Swap Sync: As long as no machine drops a frame (i.e. each machine manages to validate and render the scene within one refresh interval) and the network manages to deliver the Cluster Clock messages in time, either of these will produce a perfect synchronization. When a machine drops a frame, its output will be inconsistent with the rest of the cluster until it re-synchronizes itself via the Cluster Clock.
Cluster Clock only: Since the point in time when a display refreshes its content is independent of the other displays, the rendering loops of the Ventuz instances will not be aligned. Each machine individually decides when to start rendering a new frame and will use the cluster clock time available at that point. In all probability, a machine will therefore use a clock time that is either too early or too late, as it is highly unlikely that it will start exactly when the cluster clock master starts the new frame. Whether the shift between the individual outputs becomes noticeable to the audience depends heavily on the scene.