Thursday 19 November 2020

Randomisations - modelling spatial structure

In the previous post, I described how the rand_structured algorithm works.  If you don't feel like reading that post right now, then a short description is that it uses a filling algorithm to randomly allocate the labels in a basedata onto the groups of a new basedata.  The end result is a random assemblage in each group where, by default, each group has the same number of labels as it has in the original basedata.  A swapping algorithm is then used to allocate any labels that could not be allocated by the filling process, usually because every unfilled group already contained them.
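For readers who prefer code, the filling step can be sketched in a few lines of Python.  This is only an illustration and not the Biodiverse implementation (which is written in Perl): the function name fill_randomly and its arguments are made up for this example, labels are processed in sorted order purely to keep the sketch reproducible, and the swapping step is left out, so any leftovers are simply reported.

```python
import random


def fill_randomly(label_counts, group_richness, rng):
    """Randomly fill groups with labels, respecting per-group richness.

    label_counts:   label -> number of groups it should occur in
    group_richness: group -> maximum number of labels it may hold
    Returns (assemblages, unallocated), where unallocated maps a label
    to the number of incidences that could not be placed.
    """
    assemblages = {group: set() for group in group_richness}
    unallocated = {}
    for label in sorted(label_counts):
        for _ in range(label_counts[label]):
            # Groups that are not yet full and do not already hold this label.
            open_groups = [
                g for g in sorted(assemblages)
                if len(assemblages[g]) < group_richness[g]
                and label not in assemblages[g]
            ]
            if not open_groups:
                # A swapping step would deal with these in the full algorithm.
                unallocated[label] = unallocated.get(label, 0) + 1
                continue
            assemblages[rng.choice(open_groups)].add(label)
    return assemblages, unallocated


if __name__ == "__main__":
    counts = {"sp_a": 2, "sp_b": 2, "sp_c": 1}
    richness = {"g1": 2, "g2": 2, "g3": 1}
    print(fill_randomly(counts, richness, random.Random(42)))
```

The leftovers dictionary is what the swapping algorithm would then work through to reach the richness targets.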

The filling algorithm has some nice advantages over other methods that also match the richness patterns, such as the independent swaps algorithm.  The main one is that filling algorithms are easily generalised to model spatial processes in their allocation, for example random walks and diffusion processes.

The videos below show some of these spatially structured randomisations at work, using Acacia tetragonophylla incidences from Mishler et al. (2014).  Note that the shade of red is used only to indicate when in the sequence a cell was allocated to, with lighter shades being earlier and darker later.

The first video shows a random walk model, in which a seed location is chosen for the first allocation.  The next location is a randomly selected neighbour of the current location, and the process repeats.  If a location has no available neighbours (all are full or already contain this label) then the walk backtracks until it finds a location with a neighbour it can allocate to (backtracking can be from the last allocated location, the first, or a randomly selected one).  If there are no possible allocations then it will try a new seed location.  One can also set a probability to randomly reseed even if the window has not been fully allocated to.
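The random walk described above can be sketched as follows.  This is a simplified illustration under some assumptions of my own for this example, not the Biodiverse code: it allocates a single label on a grid of (x, y) cells, uses the four rook neighbours, only backtracks from the last allocated cell, and the names random_walk_allocate and reseed_prob are hypothetical.

```python
import random


def free_neighbours(cell, available):
    """Rook-adjacent cells that can still accept the label."""
    x, y = cell
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in candidates if c in available]


def random_walk_allocate(available, n_target, rng, reseed_prob=0.0):
    """Allocate one label to up to n_target cells using a random walk.

    available: cells that are not full and do not yet hold the label.
    Returns the allocated cells in allocation order.
    """
    available = set(available)  # work on a copy
    allocated = []
    path = []  # stack of visited cells, used for backtracking
    while len(allocated) < n_target and available:
        if not path or rng.random() < reseed_prob:
            # Seed (or reseed) at a random available cell and start a new path.
            current = rng.choice(sorted(available))
            path = []
        else:
            # Backtrack from the last allocated cell until one has a free neighbour.
            while path and not free_neighbours(path[-1], available):
                path.pop()
            if not path:
                continue  # nowhere to go, so reseed on the next pass
            current = rng.choice(free_neighbours(path[-1], available))
        available.discard(current)
        allocated.append(current)
        path.append(current)
    return allocated


if __name__ == "__main__":
    grid = {(x, y) for x in range(10) for y in range(10)}
    print(random_walk_allocate(grid, 15, random.Random(42)))
```

In a full randomisation the available set would shrink as groups fill up across labels, which is what forces the backtracking and reseeding.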

The second uses a diffusion process.  This also starts from a seed location, but it considers the neighbours of all of its current locations for the next allocation.  As with the random walk, this process is constrained by the filled and previously allocated groups, and will reseed if necessary or at random.  No backtracking is needed because there is no "path" being followed.
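A diffusion process of this kind can be sketched by keeping track of a frontier, which is the set of available cells adjacent to anything already allocated.  As before this is an illustrative sketch and not the Biodiverse implementation: the function name is hypothetical, it handles one label on a grid with rook neighbours, it picks uniformly from the frontier, and it only reseeds when the frontier is empty (random reseeding is omitted).

```python
import random


def diffusion_allocate(available, n_target, rng):
    """Allocate one label to up to n_target cells by diffusion from a seed.

    available: cells that are not full and do not yet hold the label.
    Returns the allocated cells in allocation order.
    """
    available = set(available)  # work on a copy
    allocated = []
    frontier = set()  # available cells next to any allocated cell
    while len(allocated) < n_target and available:
        if frontier:
            current = rng.choice(sorted(frontier))
        else:
            # First seed, or a reseed because the frontier is exhausted.
            current = rng.choice(sorted(available))
        available.discard(current)
        frontier.discard(current)
        allocated.append(current)
        # Grow the frontier with the free neighbours of the new cell.
        x, y = current
        for cell in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if cell in available:
                frontier.add(cell)
    return allocated


if __name__ == "__main__":
    grid = {(x, y) for x in range(10) for y in range(10)}
    print(diffusion_allocate(grid, 15, random.Random(42)))
```

Because the next cell can come from anywhere on the frontier, the result tends to be a more compact blob than the random walk produces.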

The final model is a proximity allocation process.  A seed location is chosen, and a spatial condition is then used to select a window of neighbours.  The example here uses a circle with a radius of three cells, but any supported condition could be used, such as predefined regions represented using a shapefile.  Once all groups in the window have been allocated to, excluding those that are already full, a new seed location is chosen.
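The proximity allocation can be sketched in the same style.  Again this is only an illustration with assumptions of my own: the function name and the radius argument are hypothetical, the window is a simple circle measured in cell units, and cells within the window are allocated in random order.

```python
import random


def proximity_allocate(available, n_target, rng, radius=3):
    """Allocate one label to up to n_target cells, one circular window at a time.

    available: cells that are not full and do not yet hold the label.
    Returns the allocated cells in allocation order.
    """
    available = set(available)  # work on a copy
    allocated = []
    while len(allocated) < n_target and available:
        # Choose a seed, then collect the available cells within the circle.
        sx, sy = rng.choice(sorted(available))
        window = [
            c for c in sorted(available)
            if (c[0] - sx) ** 2 + (c[1] - sy) ** 2 <= radius ** 2
        ]
        rng.shuffle(window)
        # Allocate to the whole window (seed included) before reseeding.
        for cell in window:
            if len(allocated) >= n_target:
                break
            available.discard(cell)
            allocated.append(cell)
    return allocated


if __name__ == "__main__":
    grid = {(x, y) for x in range(20) for y in range(20)}
    print(proximity_allocate(grid, 40, random.Random(42)))
```

Swapping the circle test for some other membership test (for example, whether a cell falls in the same predefined region as the seed) would give the other kinds of window mentioned above.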

As with the main rand_structured algorithm, any unallocated incidences are handled using a swapping algorithm (see details in the previous post).  This ensures richness targets are met, at the cost of some loss of spatial contiguity in the randomly generated distributions.  

The outliers for this proximity allocation randomisation are due to the swapping algorithm used to reach the per-group richness targets.

The above randomisations are all implemented under the rand_spatially_structured randomisation, for which parameters can be set to model each approach.  However, setting the parameters can be complicated and error-prone, so there are two variants, rand_diffusion and rand_random_walk, that restrict the set of arguments needed and set some useful defaults.  Otherwise they are just wrappers around rand_spatially_structured (which is itself implemented in Biodiverse as a particular case of the rand_structured algorithm).  There is currently no wrapper for the proximity allocation, but one could be added if needed.  The next few screenshots show the parameter settings.

The options to control the spatially structured randomisations are marked with the red and blue arrows.  The spatial condition (marked by the red arrow) determines which neighbours of the considered group will be allocated to next.  This example uses a square, but it could be any of the supported conditions.  The other parameters (marked with the blue arrow) control reseeding, backtracking, and allocation order.  

The rand_diffusion parameters are almost identical to those for rand_spatially_structured, as the former is just a special case of the latter and this simplifies setting it up.   

As with the rand_diffusion parameters, and for the same reason, the parameters for rand_random_walk are also almost identical to those for rand_spatially_structured.   

As an aside, if you want a good introduction to the general process of dynamic spatial modelling, then have a look at O'Sullivan and Perry (2013).  There is also a good introduction to the limits of the complete spatial randomness model in O'Sullivan and Unwin (2010).  

To sum up, there is more to randomisations than just randomly shuffling the data around.  Biodiversity patterns are normally spatially structured, so finding patterns that are significantly different from those that are completely random is often not much of a challenge.  Adding constraints such as matching species richness patterns is very useful, but one can go further and introduce even more spatial structure, making the tests "harder to pass" (an example is in Laffan and Crisp, 2003).  As noted by Gotelli (2001), trying multiple hammers is a useful thing, and to quote that paper: "The advantage of null models is that they provide flexibility and specificity that cannot often be obtained with conventional statistical analyses".  Using multiple approaches might be regarded as a fishing expedition, but it does allow one to generate a distribution of models and thus deeper understanding of the patterns one is analysing.  And fish are nutritious.

Shawn Laffan


For more details about Biodiverse, see

To see what else Biodiverse has been used for, see

You can also join the Biodiverse-users mailing list at
