A neural dynamic model for the perceptual grounding of spatial and movement relations

Setting up the software environment

The model introduced in the dissertation can be simulated with the software framework cedar.

  1. Install cedar on your computer. The easiest way is to use a precompiled version, which is available for Windows, MacOS, and Linux. Download the cedar version for your operating system and uncompress it. If you already have a precompiled version of cedar on your computer, please make sure that it was downloaded after February 19, 2018; earlier versions will not work. The following video takes you through installing and running cedar.

  2. Download the zipped configuration files and video dataset and extract them.
  3. Start the graphical user interface of cedar by executing the file cedar.app in the folder cedar.

  4. Load the configuration file through the "File -> Open file..." dialog by selecting the file grounding_relations.json (this may take a bit of time).

  5. Simulate the model for a few seconds by clicking the start button (toolbar at the top), then pause the simulation again (same button), and reset the architecture (button to the right of it). This is currently required to initialize some buggy elements in cedar. We are working on fixing this issue.

  6. Open the plot widget for the architecture through the "Windows -> Architecture widgets" dialog. Select "architecture plot".

  7. Open the "Boost control widget" (toolbar at the top, ). There you can give input to the model by checking boxes.

Simulating the model

You can now simulate the model. Start the simulation by clicking the play button (toolbar at the top).

Note: If, at any point, you observe oscillations or otherwise erratic behavior of the model, your computer may not be able to keep up with the processing demands of the simulation. Please try decreasing the simulation speed (toolbar at the top).

Grounding a phrase with a single object

In this example, the architecture grounds the phrase "the red object" in a video (file: 2.00a.mp4) that shows a red and a green ball, both of which are stationary.

  1. Reset the architecture to remove any previous activation by clicking the reset button.
  2. To enter the phrase, activate the boosts "target RED" and "GROUND OBJECT".
  3. Observe that nodes in the box "target processes" become active.
  4. They activate processes that bring the color red into the attentional foreground, as can be seen by a peak forming in the "color" field (a simplified sketch of this peak formation follows this list).
  5. The "spatial attention" field forms a peak at the position of the red object.
  6. The "target" field also forms a peak at the position of the red object - the phrase "the red object" has now been grounded.
  7. The activation of the nodes of all involved processes (general processes, target processes) returns below the threshold of zero.
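
The peak formation referred to above is the basic mechanism of dynamic neural fields: a localized input combined with local excitation and broader inhibition produces a self-stabilized peak. The following Python sketch only illustrates this generic field dynamics with made-up parameters; it is not part of cedar and does not reproduce the tuning of the model.

    # Simplified one-dimensional dynamic neural field (Amari dynamics).
    # A localized input (e.g., salience of the color "red" along a hue
    # dimension) drives the field above the output threshold of zero,
    # and lateral interaction stabilizes a peak at that location.
    # All parameters are made up for illustration.
    import numpy as np

    def sigmoid(u, beta=4.0):
        return 1.0 / (1.0 + np.exp(-beta * u))

    def gaussian(x, mu, sigma):
        return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

    size = 100                    # number of field sites (e.g., hue values)
    x = np.arange(size)
    h = -5.0                      # resting level, below the threshold of zero
    tau = 10.0                    # time constant of the field dynamics
    u = np.full(size, h)          # field activation, starts at resting level

    # Interaction kernel: local excitation, broader inhibition.
    kernel = 8.0 * gaussian(x, (size - 1) / 2, 3.0) - 3.0 * gaussian(x, (size - 1) / 2, 10.0)

    # Localized input, e.g., for the color "red" at site 30.
    s = 6.0 * gaussian(x, 30.0, 3.0)

    # Euler integration of tau * du/dt = -u + h + s + kernel * sigmoid(u).
    for _ in range(500):
        interaction = np.convolve(sigmoid(u), kernel, mode="same")
        u += (-u + h + s + interaction) / tau

    print("peak position:", int(np.argmax(u)), "peak activation:", round(float(u.max()), 2))
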
Grounding a phrase with a spatial relation

In this example, the architecture grounds the phrase "the red object to the left of the green object" in a video (file: 2.00a.mp4) that shows a red and a green ball, both of which are stationary.

  1. Reset the architecture to remove any previous activation by clicking the reset button.
  2. To enter the phrase, activate the boosts "target RED", "reference GREEN", and "spatial LEFT".
  3. Notice that nodes that represent these concepts become active in the left-most column of plots.
  4. Start the grounding process by activating the boost "GROUND RELATION" in the boost control widget.
  5. Observe that nodes in the boxes "target processes" and "spatial processes" become active first.
  6. They activate processes that bring the color red into the attentional foreground, as can be seen by a peak forming in the "color" field.
  7. They also activate processes that bring the spatial template of the concept "to the left of" into the "spatial relation CoS" field and the "spatial relation CoD" field.
  8. The "spatial attention" field and the "target" field both form a peak at the position of the red object.
  9. The nodes in the box "target processes" turn off and nodes in the box "reference processes" become active.
  10. They activate processes that bring the color green into the attentional foreground, as can be seen by a peak forming in the "color" field.
  11. The "spatial attention" field and the "reference" field both form a peak at the position of the green object.
  12. At the same time, the position of the target object is projected into the "spatial relation CoS" and "spatial relation CoD" fields.
  13. The bump input overlaps with the spatial template of the relation "to the left of" in the "spatial relation CoS" field and forms a peak there (see the sketch after this list).
  14. The phrase "the red object to the left of the green object" is now grounded by the activated nodes (left-most column, memory nodes) and peaks in the "target" field, "reference" field, and "spatial relation CoS" field.
  15. The activation of the nodes of all involved processes returns below the threshold of zero.
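
The overlap referred to in step 13 can be pictured as expressing the target position in a coordinate frame centered on the reference object and weighting it with a template for "to the left of". The sketch below illustrates this idea in Python with made-up positions and parameters; the actual model realizes it with coupled neural fields rather than explicit array operations.

    # Simplified illustration of matching a target position against the
    # spatial template for "to the left of", expressed relative to the
    # reference object. Object positions and parameters are made up.
    import numpy as np

    coords = np.arange(-160, 161)            # relative pixel offsets
    xx, yy = np.meshgrid(coords, coords)     # x to the right, y downward

    reference = np.array([400, 240])         # assumed position of the green ball
    target = np.array([250, 240])            # assumed position of the red ball

    # Target bump in a coordinate frame centered on the reference object.
    relative = target - reference
    target_bump = np.exp(-((xx - relative[0]) ** 2 + (yy - relative[1]) ** 2) / (2 * 15.0 ** 2))

    # "To the left of" template: close to 1 for positions left of the reference.
    left_template = 1.0 / (1.0 + np.exp(xx / 20.0))

    # The product plays the role of the input to the "spatial relation CoS"
    # field: a peak can only form where bump and template overlap.
    match = (target_bump * left_template).max()
    print("match strength for 'to the left of':", round(float(match), 2))
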
Generating a phrase about a spatial relation

In this example, the architecture generates a phrase that describes a stationary scene of objects, where a red ball is to the left of a green ball.

  1. Reset the architecture to remove any previous activation by clicking the reset button.
  2. In the boost control widget, activate the boost "DESCRIBE".
  3. Notice that none of the memory nodes (left-most column) are active.
  4. Observe that nodes in the boxes "target processes" and "spatial processes" become active first.
  5. They activate processes that bring the spatial templates of all relational concepts into the "spatial relation CoS" field and the "spatial relation CoD" field.
  6. They also activate processes that bring one of the objects into the attentional foreground, as can be seen by a peak forming in the "spatial attention" field and the "target" field.
  7. The color of that object is read out in the "color" field.
  8. This activates the "target color memory" node for the color.
  9. The nodes in the box "target processes" turn off and nodes in the box "reference processes" become active.
  10. They activate processes that bring the other object into the attentional foreground, both in the "spatial attention" field and the "reference" field.
  11. At the same time, the position of the target object is projected into the "spatial relation CoS" and "spatial relation CoD" fields.
  12. The bump input overlaps most with the spatial template of one of the relations (e.g., "to the left of") in the "spatial relation CoS" field and forms a peak there (see the selection sketch after this list).
  13. This activates the "spatial relation memory" node for that relation.
  14. It also leads to the color of the reference object being read out in the "color" field; this activates the "reference color memory" node for that color.
  15. The scene has now been described by the phrase "a red object to the left of a green object" or "a green object to the right of a red object". The result is visible both in the activated memory nodes (left-most column) and in the peaks in the "target" field, "reference" field, and "spatial relation CoS" field.
  16. The activation of the nodes of all involved processes returns below the threshold of zero; only the memory nodes remain active.
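
In contrast to the grounding example, no relation is specified here, so all spatial templates receive the relative target bump and the one that overlaps most determines which "spatial relation memory" node becomes active. The following Python sketch again uses made-up templates and positions to illustrate this selection; the model itself implements the competition through neural field dynamics, not an explicit argmax.

    # Simplified illustration of selecting the best-matching relation when
    # generating a phrase: all templates are compared against the relative
    # target bump, and the strongest overlap wins. Templates and positions
    # are made up for illustration.
    import numpy as np

    coords = np.arange(-160, 161)
    xx, yy = np.meshgrid(coords, coords)     # x to the right, y downward

    def soft_step(z, slope=20.0):
        return 1.0 / (1.0 + np.exp(z / slope))

    templates = {
        "to the left of": soft_step(xx),
        "to the right of": soft_step(-xx),
        "above": soft_step(yy),
        "below": soft_step(-yy),
    }

    relative = np.array([-150, 0])           # target relative to the reference
    target_bump = np.exp(-((xx - relative[0]) ** 2 + (yy - relative[1]) ** 2) / (2 * 15.0 ** 2))

    matches = {name: float((tpl * target_bump).max()) for name, tpl in templates.items()}
    best = max(matches, key=matches.get)
    print(matches)
    print("selected relation:", best)        # expected: "to the left of"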

About the video dataset

The video dataset that was used as input to the model consists of 82 short video clips that show colored balls moving on a white background. All videos are recorded at a resolution of 640x480 pixels and 60 frames per second and are compressed with the H.264/MPEG-4 AVC video codec. Most videos are only a couple of seconds long; all are shorter than 30 s.
The video dataset is licensed under Creative Commons Attribution 4.0.

Please note that the model is tuned to this video set only. Using other video input will almost certainly require retuning the model to account for different color tones, object sizes, and movement speeds.
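
As a rough first check before attempting such retuning, one could verify that a new video at least matches the format of the dataset and inspect its color content. The following Python sketch uses OpenCV for this; the file name is taken from the examples above, but the HSV threshold values are made-up placeholders that would need adjustment.

    # Rough check of a video file against the dataset format using OpenCV.
    # The HSV range for "red" below is a made-up placeholder.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("2.00a.mp4")
    width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)
    height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(f"resolution: {width:.0f}x{height:.0f}, frame rate: {fps:.1f}")  # dataset: 640x480 at 60

    ok, frame = cap.read()
    if ok:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        red_mask = cv2.inRange(hsv, (0, 100, 100), (10, 255, 255))
        print("red pixels in first frame:", int(np.count_nonzero(red_mask)))
    cap.release()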