Tuesday 23 June 2015

Better control of randomisations

Randomisations are used in Biodiverse to assess whether the analysis results are more extreme than expected given some null model.  Below is a short summary of the existing randomisations, then a description of some of the new functionality in version 1.0_001 to allow greater control by the user.

Currently there are three null models that can be used, with more in the planning stages.

  1. In the rand_nochange case, the groups and labels are held constant.  This is not a randomisation so far as the basedata is concerned, but it is useful when one wants to randomise the label and/or group properties, or the trees, while holding everything else constant.  This allows one to disentangle the effects of the spatial data from those of the trees or properties.
  2. The rand_csr_by_group option takes the contents of each group and randomly assigns them to some other group.  If one is working with biotic data then it is analogous to a case where the set of assemblages across the observed groups is held constant, but these assemblages are located randomly.
  3. The rand_structured randomisation allocates labels randomly across the landscape, but keeps the label (e.g. species) richness of each randomly generated group constant within some specified tolerances.  If the richness_multiplier and richness_addition parameters are set to 1 and 0 respectively, then the randomised basedatas will have exactly the same richness patterns as the observed.  This is the randomisation used in the CANAPE analyses.  
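
As a rough illustration of the second and third null models, here is a minimal Python sketch.  It is not Biodiverse code (Biodiverse itself is written in Perl).  The function names, the dictionary layout (group name to set of labels) and the exact way the richness target is derived from richness_multiplier and richness_addition are assumptions made for the example.

```python
import random


def csr_by_group(basedata, seed=None):
    """Hypothetical helper: shuffle whole assemblages among groups.

    basedata maps a group name to the set of labels found in it.
    The set of assemblages is unchanged; only their locations move.
    """
    rng = random.Random(seed)
    groups = sorted(basedata)
    assemblages = [basedata[g] for g in groups]
    rng.shuffle(assemblages)
    return dict(zip(groups, assemblages))


def richness_target(observed_richness, richness_multiplier=1.0, richness_addition=0):
    """Assumed form of the richness tolerance used by a structured randomisation.

    With the defaults (1 and 0) the target equals the observed richness.
    """
    return int(observed_richness * richness_multiplier + richness_addition)


observed = {
    "cell_a": {"Genus:sp1", "Genus:sp2"},
    "cell_b": {"Genus:sp2"},
    "cell_c": {"Genus:sp1", "Genus:sp3", "Genus:sp4"},
}
randomised = csr_by_group(observed, seed=1)
```

In this sketch an assemblage can land back in its original group by chance, which may differ from what Biodiverse does.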

The new functionality, available in version 1.01, allows the user to control which subsets of the basedata are randomised.

Keep some locations constant

In the first case, one can specify a condition that determines which groups (locations) are randomised.  Groups that do not satisfy the condition retain their observed labels, while the labels in the remaining groups are randomly re-assigned using whichever randomisation function is chosen.

Randomise within regions

In the second case, one can specify a spatial condition which forces the randomisations to be applied within subsets.  For example, if one specifies a shapefile condition such as sp_point_in_same_poly() then the labels within each polygon will be randomised such that they stay within that polygon.  This allows one to, for example, keep any labels (e.g. taxa) within the biome within which they are found while still randomly locating them.  If taxa span biomes then the sets within each biome are randomised independently.
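
The idea of randomising within regions can be sketched in Python as follows.  This is only an illustration under stated assumptions: the region lookup (region_of) stands in for whatever the spatial condition returns, the function name is hypothetical, and a simple shuffle of assemblages is used in place of the chosen Biodiverse randomisation function.

```python
import random
from collections import defaultdict


def randomise_within_regions(basedata, region_of, seed=None):
    """Hypothetical helper: shuffle assemblages among groups, region by region.

    basedata maps group name -> set of labels.
    region_of maps group name -> region identifier (e.g. a biome polygon).
    """
    rng = random.Random(seed)

    # Collect the groups that fall in each region
    by_region = defaultdict(list)
    for group in sorted(basedata):
        by_region[region_of[group]].append(group)

    # Each region is randomised independently of the others
    result = {}
    for groups in by_region.values():
        assemblages = [basedata[g] for g in groups]
        rng.shuffle(assemblages)
        result.update(zip(groups, assemblages))
    return result


observed = {"c1": {"sp1"}, "c2": {"sp2"}, "c3": {"sp3"}, "c4": {"sp4"}}
biome = {"c1": "arid", "c2": "arid", "c3": "wet", "c4": "wet"}
randomised = randomise_within_regions(observed, biome, seed=42)
```

Whatever the seed, sp1 and sp2 stay in the "arid" groups and sp3 and sp4 stay in the "wet" groups.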

This process is best applied using a spatial condition which is non-overlapping, e.g. sp_block() or sp_point_in_same_poly().  If one uses a condition such as sp_circle(), in which groups are in multiple neighbour sets, then the system simply assigns each group to the first neighbour set in which it is found.  (The system processes groups in alphabetical sort order.)  This is probably not what is wanted in most cases.
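
To show why overlapping conditions give awkward subsets, here is a small Python sketch of one possible reading of the "first neighbour set" rule.  The function name and data layout are hypothetical, and the real Biodiverse logic may differ in its details.

```python
def assign_to_first_set(neighbour_sets):
    """Hypothetical helper: give each group to the first set that contains it.

    neighbour_sets maps a centre group name -> list of groups in its
    neighbourhood.  Centres are processed in alphabetical order.
    """
    assignment = {}
    for centre in sorted(neighbour_sets):
        for group in neighbour_sets[centre]:
            # Keep the earliest assignment, ignore later overlaps
            assignment.setdefault(group, centre)
    return assignment


# Three overlapping neighbourhoods, as a circle-type condition might produce
overlapping = {
    "a": ["a", "b"],
    "b": ["a", "b", "c"],
    "c": ["b", "c"],
}
subsets = assign_to_first_set(overlapping)
```

Here group "b" ends up in the set centred on "a", and group "c" in the set centred on "b", so the resulting subsets depend on name order rather than on anything ecological.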

Keep some labels constant

Finally, one can hold one or more of the labels constant.  An example use for this is if you have a large tree and want to keep one clade's distribution constant while randomising everything else.

These are specified as lists, with one label per line.  If you have a large number of labels, or simply want to avoid typing species names (a good thing to avoid if you have species ending in aea and/or eae) then you can copy and paste from one of the popups when you control-click on a cell.  You can also copy the selected set from the view labels tab (also new in 1.0_001).
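
A minimal Python sketch of holding labels constant, using a list with one label per line, might look like the following.  The function name and data layout are assumptions, and a simple shuffle of the remaining assemblages stands in for the chosen randomisation function.

```python
import random


def randomise_except(basedata, labels_to_keep, seed=None):
    """Hypothetical helper: keep some labels in place, shuffle the rest.

    basedata maps group name -> set of labels.
    """
    rng = random.Random(seed)
    keep = set(labels_to_keep)
    groups = sorted(basedata)

    # Strip the constant labels, then shuffle what is left among groups
    movable = [basedata[g] - keep for g in groups]
    rng.shuffle(movable)

    # Put the constant labels back where they were observed
    return {g: (basedata[g] & keep) | m for g, m in zip(groups, movable)}


# One label per line, as pasted into the list
label_list = """Genus:sp11
Genus:sp12
Genus:sp13"""
keep = [line.strip() for line in label_list.splitlines() if line.strip()]

observed = {
    "c1": {"Genus:sp11", "Genus:sp1"},
    "c2": {"Genus:sp2"},
    "c3": {"Genus:sp12", "Genus:sp3"},
}
randomised = randomise_except(observed, keep, seed=7)
```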

In the example above, labels (species) will be randomised within the biomes in which their group is found.  Only those groups (cells) whose y-coordinate is less than 1,650,000 will be randomised.  Anything greater than (north of) that will be held constant.  Finally, Genus:sp11, Genus:sp12 and Genus:sp13 will also not be randomised.  All other labels will be.  This example uses the rand_structured randomisation, so the label ranges and group richness scores will be exactly the same across all randomisation iterations.
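
The y-coordinate rule in this example can be sketched in Python as a simple partition of the groups.  The coordinates below are made up for illustration, the function name is hypothetical, and treating a y-coordinate of exactly 1,650,000 as "held constant" is an assumption.

```python
Y_LIMIT = 1_650_000


def split_groups(coords, y_limit=Y_LIMIT):
    """Hypothetical helper: decide which groups are randomised.

    coords maps group name -> (x, y) coordinate.
    Returns (groups to randomise, groups held constant).
    """
    to_randomise = sorted(g for g, (_, y) in coords.items() if y < y_limit)
    held_constant = sorted(g for g, (_, y) in coords.items() if y >= y_limit)
    return to_randomise, held_constant


# Illustrative coordinates only
coords = {
    "c1": (100_000, 1_600_000),
    "c2": (100_000, 1_700_000),
    "c3": (200_000, 1_649_999),
}
to_randomise, held_constant = split_groups(coords)
```

Only the groups in to_randomise would then be passed to the chosen randomisation; the others keep their observed labels.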

Shawn Laffan, 23-Jun-2015

For more details about Biodiverse, see http://purl.org/biodiverse

For the full list of changes in the 1.0 series see https://purl.org/biodiverse/wiki/ReleaseNotes#version-101

To see what else Biodiverse has been used for, see https://purl.org/biodiverse/wiki/PublicationsList

You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users 
