HowTo: Cluster Rendering


Introduction

If you have a large Ventuz Scene which spans multiple screens, you can try to use one PC with a single graphics card and span all outputs together into one large desktop (render output). Due to some architectural changes, this is currently only possible on Windows 7 with certain graphics card drivers, so make sure your chosen graphics card supports spanning. The main advantage of this solution is that all screens are guaranteed to be in sync.

If your setup requires more than one machine, synchronizing animation and video playback becomes a problem. You can use OpenSoundControl (OSC) to start your animations/videos at the same time, but since all PCs run on different time-bases, the animations/videos will drift and get out of sync. To avoid this, Ventuz provides a Cluster Clock, which ensures that all machines connected to the same cluster receive identical timing.
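To get a feeling for the magnitude of this drift, consider a rough back-of-the-envelope sketch. The 50 ppm oscillator tolerance below is an assumed, typical value, not a Ventuz specification:

# Rough drift estimate for two free-running machines (illustrative only).
ppm_difference = 50e-6      # assumed oscillator mismatch: 50 parts per million
frame_rate = 60.0           # frames per second
seconds = 10 * 60           # a ten-minute show

drift_seconds = seconds * ppm_difference    # 0.03 s
drift_frames = drift_seconds * frame_rate   # 1.8 frames

print(f"After {seconds} s the outputs are {drift_seconds * 1000:.0f} ms apart, "
      f"i.e. about {drift_frames:.1f} frames at {frame_rate:.0f} Hz.")

So even two well-matched machines can be a visible frame or two apart after a few minutes, which is why a shared clock is needed.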

This page describes the necessary technical steps to create a compelling multi-display / multi-projector Ventuz presentation with a cluster of machines.

Building a Display Wall

The goal of this section is to set up a cluster wall of 9 displays arranged as a rhombus. The cluster should consist of three machines with three outputs each. We will assume for this example that the displays are bezel-less. Each display has a resolution of 1920 x 1080. The final wall with Ventuz content on it could look like this:


Creating a Render Setup

First of all, you have to create the appropriate Render Setup configuration. Open the Ventuz Configuration Editor and select the participating machines.

If the machines for the cluster are already available in the network, it is important that they use the same non-zero Group ID and that each machine has its own Machine ID, starting from 1 for the first machine, 2 for the second, etc. If a machine uses multiple outputs, you have to configure the graphics driver to treat them as one logical output. NVIDIA calls this Mosaic technology, AMD calls it Eyefinity. Log onto those machines and use the appropriate tools of the graphics vendor (e.g. NVIDIA Control Panel or AMD Catalyst Control Center).

After the machines have been pre-configured correctly, select the group in the Machine Selector and add a new Render Setup by clicking the Plus button. Choose a sensible name and click OK. This opens the Render Setup Editor with the display configuration that is currently active in the graphics driver of the connected machine or group of machines. The initial Render Setup we need is a 3x3 display wall consisting of three machines with 3x1 displays each.


If you want to create a Setup without being connected to the real cluster machines, launch the Create New Setup dialog and select the Advanced tab. Displays Total must be set to 3x3; fill in the correct resolution and increase the Num. of Machines to three! Make sure that Displays per Machine is set to 3x1 (an arrangement of 1x3 would be possible but would not give the optimal result).
The display positions of the upper and lower machines have to be modified to get the required arrangement. Do not forget to save the final result!

The Visual and Logical Resolution per machine is limited to 16384 x 16384!



Save the new Render Setup as the active configuration in the Ventuz Configuration Editor. Every time Ventuz starts, it will automatically use this configuration for rendering! Depending on the ID of the machine, the correct part of the setup will be rendered. ID 0 will render the Cluster Preview. The Machine ID can be adjusted on-the-fly via Live Options.

Deleting displays or display fragments in the Display Editor will not change the basic physical configuration of a Render Setup. The deleted display just won't receive the Ventuz scene content and keeps the default content, which is in most cases just black. It is not guaranteed that this content is initialized in memory, however!
E.g. a 4x1 display setup with the right display deleted does NOT result in the same Render Setup configuration as a new 3x1 display setup! The initial Render Setup must always match the physical display configuration (AMD Eyefinity or NVIDIA Mosaic) of the according machine.

Note that changing and saving the active Render Setup configuration on a Machine with running Ventuz will instantly affect the render output!


Working with Render Setups in Ventuz Designer

The scene from the wall will look as follows in the Renderer window of Ventuz Designer. As Ventuz can only render rectangular areas, it applies a mask in the preview modes (Cluster and Machine Preview) to mark the rendered content that will not be visible in the final Production mode when running in the Ventuz Runtime.


Render Setup Modes

There are three different rendering modes for Render Setup rendering available: Production, Machine Preview and Cluster Preview.

The last two modes are pretty handy if you want to see whether your scene is well prepared for multi-machine rendering or not.
The rendering mode can be changed in the Stage Editor by clicking on the display arrangement preview at the bottom of the Stage Editor. Click on a single highlighted machine or on the border of the whole cluster to switch between Machine and Cluster Preview. To switch to Production mode, use SHIFT + left mouse click.


As the rendering mode is linked to the Machine ID, it can also be changed via the ID drop-down box of the Stage Editor.
To get an aspect-correct rendering, make sure that the Format settings in the Project Properties match the aspect of the whole Cluster. The Stage Editor provides a shortcut to adapt the Format to the active Cluster configuration: just click on the cross in the toolbar on top.

Render Setup sensitive Nodes

With the new Render Setup technique in Ventuz 4, the content of a scene created in Ventuz 3 usually does not need to be adapted if it is going to be displayed on a multi-machine cluster. Of course there are exceptions to the rule!
The content aspect, and thus the visible region, might need some modifications.
There is also a handful of nodes which need a closer examination when it comes to multi-machine rendering.
These are Overlay Rectangle, Viewport and both RenderTarget nodes. All these nodes have a ScreenAligned property which tells them how to handle content in a Cluster environment.

First of all, we take a look at the Overlay node and how it behaves with different ScreenAligned settings in this Render Setup configuration. A simple scene containing only a Texture in front of an Overlay Rectangle will demonstrate the differences.


The left image shows the full Cluster Preview with one texture spanning all displays. The image in the middle shows the preview of the upper machine with the Overlay node's ScreenAligned mode disabled. The complete texture is visible over the whole machine output. This does NOT match the upper machine's content in the Cluster Preview. The right image shows the machine with the ScreenAligned property enabled. In this case the machine only displays the part of the texture defined by the Render Setup configuration.
In other words: the Overlay node in ScreenAligned mode performs a per-machine segmentation of cluster-wide textures. If you already have pre-segmented, machine-individual textures, the ScreenAligned property must be disabled.
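Conceptually, the per-machine segmentation is a texture-coordinate crop. The following sketch is not Ventuz code; the rectangles are made up for illustration:

# Illustration of ScreenAligned segmentation (not the Ventuz API).
# A machine samples the part of a cluster-wide texture that its output
# covers within the cluster bounding box, expressed as normalized UVs.
def machine_uv_rect(machine, cluster):
    """machine and cluster are (x, y, width, height) in cluster pixels."""
    mx, my, mw, mh = machine
    cx, cy, cw, ch = cluster
    u0 = (mx - cx) / cw
    v0 = (my - cy) / ch
    return (u0, v0, u0 + mw / cw, v0 + mh / ch)

# Middle machine of a hypothetical 5760 x 3240 cluster bounding box:
print(machine_uv_rect((0, 1080, 5760, 1080), (0, 0, 5760, 3240)))
# -> roughly (0.0, 0.333, 1.0, 0.667): the middle third of the texture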

It's the same behaviour with the Viewport node, as it can apply a fullscreen texture too! The Viewport node has some output properties which provide information about the Visual Resolution of the whole Cluster and of the Machine on which Ventuz is currently running. The properties in the ClusterViewport category provide information about the rectangular pixel resolution (bounding box) and aspect of the whole Cluster. The MachineViewport properties provide the corresponding information for the current machine only. In the case of the Render Setup from this example, the ClusterViewport properties of all machines would have the same values, while the middle machine would have different MachineViewport properties than the upper and lower machines.

The RenderTarget node also needs a closer look. In most cases RenderTargets are used to apply fullscreen/-scene post-processing. In multi-machine environments this can be done in two ways:

  1. Render the whole cluster scene to a RenderTarget with ScreenAligned mode disabled; do post-processing like glow or similar; apply the final result to an Overlay node in ScreenAligned mode. This method is inefficient for several reasons. In most cases the RenderTarget's Size property is set to Screen or Viewport, and these values represent the Visual Resolution of a single machine. Rendering the whole Cluster content to a single RenderTarget will therefore degrade the visual quality due to down-scaling. This also degrades the multisampling quality! Additionally, this method renders parts of the scene which are not needed on the current machine. The Overlay Rectangle will cut out and display the correct part of the RenderTarget texture, but the result will look blurred, with bad multisampling.
  2. Render the whole cluster scene to a RenderTarget with ScreenAligned mode enabled; do post-processing...; apply the final result to an Overlay node with ScreenAligned mode disabled. This is the better way because here only the content of the current machine is rendered to the RenderTarget, without any loss of quality.

If the content of the RenderTarget is applied to a 3D geometry which may be visible anywhere in the scene, the ScreenAligned property must be disabled. An example of this usage could be the creation of small Next and Previous Slide previews in a Slideshow scene.

In addition to the Viewport node, the SystemID node provides Cluster- and Machine-specific information, mainly about render size and position. The ID of the machine is also provided and can be used to build 'machine-aware' scenes.

If you encounter rendering performance problems with high output resolutions, try to reduce the Multisampling quality of the Output Configuration and the RenderTargets.


Multitouch and Interaction

Under certain circumstances all Interaction nodes work without problems in a Multi-Machine setup. There are a few conditions which have to be fulfilled:

All machines of the same Cluster need to receive the same input signals, like TUIO. The coordinate space of the input must be mapped to the Cluster coordinate space: the top-left corner of the Cluster bounding box is [0;0] and the bottom-right is [1;1]. Every single machine takes the input signal and transforms it into its local coordinate space.
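As a rough illustration of this mapping (not Ventuz code; the rectangles are hypothetical), each machine can transform a cluster-normalized coordinate into its own local space like this:

# Mapping a cluster-normalized input (e.g. TUIO) into one machine's
# local coordinate space (illustration only).
def to_machine_space(u, v, machine, cluster):
    """u, v in [0;1] over the cluster bounding box;
    machine and cluster are (x, y, width, height) in pixels."""
    cx, cy, cw, ch = cluster
    mx, my, mw, mh = machine
    px, py = cx + u * cw, cy + v * ch         # cluster pixel position
    return ((px - mx) / mw, (py - my) / mh)   # machine-local [0;1]

# A touch in the cluster centre, seen by the middle machine of a
# hypothetical 5760 x 3240 wall:
print(to_machine_space(0.5, 0.5, (0, 1080, 5760, 1080), (0, 0, 5760, 3240)))
# -> (0.5, 0.5): the centre of that machine's own output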
Note that a Cluster display-wall built up of Windows Touch screens will currently not interpret the touch inputs correctly!

The Web Browser node can also be used on a multi-machine wall to create one huge browser. There are also some limitations which must be considered:

  1. Every Web Browser node renders the whole web page, even if most parts of the page are not visible on a machine.
  2. The browser page size is currently limited to 4096 x 4096 pixels. Thus a Cluster with a higher resolution will display a scaled-up web page.
  3. Loading and presenting the web content is not synchronized across the machines.

Math Effects

Under unfavorable conditions, some of the Math Effects nodes might cause issues. In an environment where the machines do not render simultaneously, the Modulo operation on one machine may, for example, have reached the boundary value while another machine, running 1 or 2 frames behind, has not reached it yet. Depending on the scene, this might be a more or less serious problem. Please note that this is not a major issue when it comes to cluster rendering, but keep in mind that such issues might exist, and think of them if unexpected behaviour appears in your scene despite faultless logic.
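A minimal sketch of the effect (illustrative numbers only): if machine B renders one frame behind machine A, a frame-based Modulo briefly disagrees between the two machines at the wrap-around:

# Frame-based Modulo on two machines that are one frame apart.
period = 100
frame_a = 200           # machine A has just wrapped: 200 % 100 == 0
frame_b = frame_a - 1   # machine B is one frame behind: 199 % 100 == 99

print(frame_a % period, frame_b % period)  # 0 vs. 99
# Any logic like "value == 0" is briefly true on one machine only.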

Setting up the Hardware

Make sure that every machine that should participate in a Cluster rendering has the same Ventuz Version installed and that the Ventuz Machine Service is running!

A Ventuz Cluster setup will only work correctly in Ventuz Runtime. If you are running from inside Ventuz Designer, delays, artifacts and synchronization issues can occur. Thus ensure that all presentations and shows are run from within Ventuz Runtime, not Ventuz Designer!


Cluster Clock and Cluster ID

In order to set up a synchronized Cluster environment, you need the Cluster Feature license. Setting the Group ID in the Ventuz Configuration Editor to a value greater than 0 will automatically enable the Cluster Clock synchronization via network.

Make sure that all machines which are supposed to use the same cluster clock have the same Group ID (must be greater than zero!) and a different Machine ID applied. The machine with the lowest Machine ID within a group will serve as the Cluster Clock Master. The algorithm used for interchanging the cluster clock information is simple and transparent: every machine sends out its current time right after it has finished rendering a frame but before the vertical retrace of the graphics card has occurred. As soon as the vertical retrace releases the renderer from its wait state, every machine receives the clock information from all other machines, including its own. Every machine simply selects the clock from the lowest Machine ID it has received and adds a value of one to enter the next rendering cycle. This makes the lowest ID the Master, but if the Master stops rendering or is disconnected from the network, another machine will become the Master on-the-fly!
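The selection rule can be sketched in a few lines. This is a conceptual illustration of the algorithm described above, not the actual Ventuz implementation:

# Conceptual sketch of the Cluster Clock master selection.
# received: machine_id -> clock value, as collected after the vertical
# retrace, including this machine's own broadcast.
def next_clock(received):
    master_id = min(received)       # the lowest Machine ID acts as Master
    return received[master_id] + 1  # advance into the next rendering cycle

# If machine 1 disappears, machine 2 takes over on-the-fly:
print(next_clock({1: 5000, 2: 5000, 3: 4999}))  # 5001 (master: 1)
print(next_clock({2: 5001, 3: 5001}))           # 5002 (master: 2)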

For synchronized Clusters in DirectX Output mode it is important to set Prevent D3D Queuing to ON in the AV Configuration!


To achieve a properly working Cluster Clock it is crucial that:

  1. You are using machines with identical hardware.
  2. You are using the same scene on all machines (use the SystemID node to apply machine-specific settings).
  3. Your scene is running properly at the maximum frame-rate.
  4. You are using identical project and graphics card settings (a different anti-aliasing setting will bring the clock out of sync).
  5. Bezel or Overlap settings are disabled in the graphics card driver, as this feature is achieved by the Render Setup rendering in Ventuz.
  6. All machines are connected to the same network.

Once the cluster is set up properly, all timing-related nodes are affected by the Cluster timing. Some nodes, like Scroll Text, Counter or Ticker, can be configured to work either frame-based or time-based. In order to sync those nodes via the Cluster Clock, the Progress property has to be set to time-based.

When activating the statistics in your render window, you will notice some additional information:


Cluster ID 01.02 means that this machine has the Group ID 01 and the Machine ID 02. The following number shows the current value of the cluster clock, and "clock from 01.01" indicates that the current clock master has Machine ID 01.
On a slave system the type of the cluster clock is colored green, on a master system it is white.


A font color change on a slave system from green to white indicates that the slave has lost its master clock and is running independently. Make sure that your slave systems show constant green text without any flickering between green and white.

There is one unique feature for Nvidia in synchronized cluster setups. When using an Nvidia framelock setup with Swap Sync enabled, it is possible to get the timing for the Cluster Clock from the synchronization board. Since this is not based on a network protocol, it is even more precise. In this case the digits that represent the clock master in the statistics show FL (Framelock). Read more about synchronizing a cluster below.

Whereas Movers in Absolute mode run exactly in sync no matter when the scene was started, nodes which need to be triggered (e.g. a Mover in OneShot mode) are supposed to receive the trigger signal simultaneously. This can be achieved by using e.g. OSC multicasting.

Remoting

As soon as multiple Ventuz machines are running in a Cluster, the Remoting Interface can be used to synchronize the execution of commands on all related machines.

Network multicast cluster setup

You should generally try to run the Cluster Clock on a separate network with a separate switch (or hub). Since every PC in your Ventuz Cluster broadcasts the Cluster Clock every sixtieth of a second into your network, which can result in a substantial amount of traffic (e.g. eight machines already produce 480 clock packets per second), it is a good idea to detach the Cluster network from the rest of your network.

Multicast messaging with multiple network interface cards

You must assign the lowest metric to the network interface card (NIC) which is connected to the cluster clock network. Use the route command in the cmd shell. Make sure that you run the cmd shell as Administrator (right-click "cmd prompt" and choose Run as Administrator).

route change 224.0.0.0 mask 240.0.0.0 <IP of network card here> metric 25

If you want this change to persist, add -p; otherwise it will be lost with the next reboot.
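For example, assuming a hypothetical NIC address of 192.168.10.5, the persistent variant would look like this:

route -p change 224.0.0.0 mask 240.0.0.0 192.168.10.5 metric 25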

Synchronization Hardware

In a cluster environment you will have to think about synchronizing the outputs. Especially in the case of video playback, having even a single frame of delay between outputs may be unacceptable. For those scenarios, additional hardware is required, so-called sync boards. Popular examples are the AMD FirePro S400 Synchronization Board and the NVIDIA Quadro G-Sync boards. Some information about different synchronization scenarios can be found on the Cluster page.

Setup

To synchronize your cluster outputs, install a synchronization board in every machine in your cluster, following the manufacturer's instructions.

Continue by connecting the boards. Every sync board has one BNC connector for an external timing source and two RJ-45 connectors to connect the sync boards among each other. Be aware that the RJ-45 connectors are only for connections between the sync boards. Do not connect them to a regular Ethernet network, as this can damage the boards.
As you can see in the image below, the connection between the sync boards is independent of the Ethernet connection.


To synchronize the cluster, one machine acts as the timing server, the others as clients. For Framelock applications, the timing server uses an internal signal as timing source. For Genlock applications, connect an external timing source, e.g. the house sync, to the timing server via the BNC connector as seen in the image above.

The machine that acts as the timing server uses both of its RJ-45 connectors as outputs to distribute the timing signal. A client uses one as input and the other one as output, so the signal can be passed on to another client. It is recommended to split the clients into two equal groups and arrange each group as a daisy chain connected to one of the two timing server outputs.


Next, configure the timing master and clients in the graphics card driver. See your graphics card manual for details. Below is an example screenshot of a four-display machine set up as timing master, configured in AMD Catalyst Control Center. The timing signal comes from an external sync generator.


For detailed information on how to set up the synchronization boards, refer to the manufacturer's documentation.


Enable Swap Sync

So far we have aligned the displays' refresh rates to a common time source, be it internal or external, to avoid tearing or flickering. This was done in the graphics card configuration. As described above, Swap Sync requires support by the application, Ventuz in this case.
To enable Swap Sync, go to the DirectX Output section of the AV Configuration.


Be aware that Swap Sync only makes sense, and only works, if the synchronization boards are configured correctly.

Note that Swap Sync only works in exclusive fullscreen mode!


Our experiences with AMD Boards

Our experiences with Nvidia Boards

Check List

Unfortunately, the list of things that can go wrong while setting up a cluster is quite large. The following - incomplete - list can be used to avoid common problems:

When changing the number of displays attached to the graphics card or their cabling, make sure to reboot the machine. Displays are not as hot-pluggable as one might think.