Tuesday, 30 June 2026

Biodiverse 5.99_002 development release

A new development release of Biodiverse (version 5.99_002) is now available.

This is the second development release leading to version 6.  

Versions for Windows and Mac are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

This version includes a few minor stability updates, bug fixes and performance improvements when compared with 5.99_001.  Basedatas with groups defined using a single text axis are now plotted as a grid, greatly improving visualisations in the GUI.

The list of changes is summarised at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-5xx

For the full list of issues and changes leading to the 6.0 release, see https://github.com/shawnlaffan/biodiverse/milestone/23


A set of links to the new documentation system is at https://biogeospatial.github.io/ 

----

Shawn Laffan

30-Jun-2026 


For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/  

For a list of some of the analyses Biodiverse has been used for, see https://biogeospatial.github.io/biodiverse-publication-list/ 

You can also join the Biodiverse-users mailing list at https://groups.google.com/group/Biodiverse-users or start a discussion at https://github.com/shawnlaffan/biodiverse/discussions 


Range polygons

Biodiverse version 6 contains new spatial conditions and visualisation options to work with polygons of label and tree node ranges.  

Three polygon types are supported: convex hulls, concave hulls and circumcircles.  Convex hulls and circumcircles are best avoided when modelling geographic ranges because they are sensitive to outliers, do not model arcuate shapes such as ranges that follow coastlines, and in the circumcircle case are gross generalisations.  They are also not constrained by other factors, so a terrestrial taxon range can easily span oceans.  However, they do represent useful models of regions that can be used in randomisations to define things such as the set of locations with which to swap labels, or to define dispersal extents in spatially constrained randomisations.  
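To make the outlier sensitivity concrete, here is a minimal Python sketch.  It is not Biodiverse code.  The coordinates are made up, and the hull is built with a standard monotone chain routine rather than whatever Biodiverse uses internally.  It compares the convex hull area for a tight cluster of cell centroids with the area after one distant record is added.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for a simple polygon."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical cell centroids for one label: a tight cluster...
cluster = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
# ...and the same cluster plus a single distant record
with_outlier = cluster + [(10, 10)]

area_cluster = polygon_area(convex_hull(cluster))
area_with_outlier = polygon_area(convex_hull(with_outlier))
print(area_cluster, area_with_outlier)
```

In this toy case one extra record increases the hull area tenfold, and most of the new area contains no records at all.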

When used as a spatial condition, the range polygons can be applied either for a single label at a time, for the labels subtending a tree node, or for all labels in a group's assemblage.  In each of the latter two aggregate cases, the system uses the union of the component polygons rather than the polygons spanning the component groups.  This produces multipart polygons that allow for gaps in the distributions.  For example, an internal node in a tree might have some tips on one side of a continent and others on the opposite side, or even across continents.  

The spatial conditions also support arguments for the concave hulls to allow holes and to define the degree of concavity (a value of 1 matches the convex hull, 0 is maximum concavity).  

Pictures are better than text, so here are some examples of the visualisations.  The first few screen shots below are for the Labels tab but they also apply to Spatial Outputs. 

Assemblage range polygons can be added via the Map menu. 


Tree range polygons are added through the Tree menu.  


Check the boxes to select which ones you want.  You can also set the map or tree highlights to be the same, and set the values to match the other highlighting.  This saves a few mouse clicks and windows.  

Now when you hover on a cell, the selected polygons are shown.  In this case it is the circumcircle and convex hull for each label in a group in the south west of Western Australia.



And for contrast, here are the circumcircle and convex hull unions for the same assemblage.  (It also shows why one should be cautious about using convex hulls to model taxon ranges).  

The same process applies to tree nodes (branches) when they are hovered over.  


The polygons for an ancestral branch of the previous plot.  



And the concave hulls.  The shapes can get pretty weird, which is why one would normally use a concavity parameter greater than zero.  


This plot shows both that the visualisations work in the spatial outputs tab, and also why one generally wants to use the union of the polygons when many labels are being plotted.  

The plots above are all static but the system is dynamic.  The set of polygons changes as you hover over a different branch or cell.  If you want to hold a plot constant then right click on the branch or cell and Biodiverse will stop updating things.  Remember to right click in the same pane again to bring back the dynamic updates.  

An important point is that the spatial conditions use the cell centroids to define the polygons, whereas the visualisations use the cell polygons.  This can lead to slightly different results in some cases.  We can add an option to use polygons in the spatial conditions if there is a need.  

As noted at the start, this functionality will be in Version 6.  If you want to try it now then it is also part of the 5.99_001 development release.  


----

Shawn Laffan

30-Jun-2026 





Friday, 19 June 2026

Trees: visualise results from other spatial outputs

For a long time Biodiverse has allowed the user to visualise additional values on the phylogenetic tree in a spatial analysis tab.  These include turnover of the branches in two neighbour sets, and indices related to tree branches such as the weights used in a phylogenetic endemism calculation (another example is in this post).

This is very useful but before version 5 was limited to the set of lists in the spatial output being viewed.  

From version 5 of Biodiverse you can plot list results from any spatial output across all basedatas in your project.  A big advantage of this is that you can run one analysis, for example a randomisation to generate a CANAPE output.  Later you can run a calculation to see what the relative contribution of each clade in the tree is to each analysis window, without having to rerun the whole analysis to see the new list.  This can be in a clone of the basedata so the randomisations won't be out of synch across a basedata's outputs (Biodiverse warns about this).  

This is currently implemented as a menu option below the tree and map plots.  Unfortunately this means it is not as obvious as it could be, and this is something still being worked on.  There have also been minor changes since v5 was released, but only to how the selected list names are shown.

The screenshots below show it in operation for a very simple analysis that uses every cell in the basedata.  This allows every branch in the tree to be coloured, which works better as a demonstration.


The menu is at the lower right of the options below the map and tree plots.  The exact location depends on your screen size.  

Users can select from any list across all spatial outputs across all basedatas in the project.  In this case it is the PE weights in an analysis called Acacia_spatial0 in a basedata called Acacia1 (no, these are not informative names).

And the tree branches are coloured as requested.  

The lists can also be categorical outputs.  These are the results for a Range Weighted Branch Length Differences (RWiBaLD) analysis.  More details for that are in Mishler et al. 2026.

And that's pretty much it for the description.  More of the theory is discussed in the posts linked to above.  


----

Shawn Laffan

19-Jun-2026 




Biodiverse 5.99_001 development release

A new development release of Biodiverse (version 5.99_001) is now available.  

This is the first development release leading to version 6.  

Versions for Windows and Mac are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

This version includes the ability to visualise label and tree branch ranges as polygons, as well as many computational efficiency improvements and GUI updates.  The list of changes is summarised at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-5xx

For the full list of issues and changes leading to the 6.0 release, see https://github.com/shawnlaffan/biodiverse/milestone/23


Much of the documentation has now also been ported to a quarto book system.  This is much more readable than the wiki system that was previously used.  

A set of links is at https://biogeospatial.github.io/ 


----

Shawn Laffan

19-Jun-2026 




Tuesday, 4 November 2025

Biodiverse version 5.0 has been released

Biodiverse version 5.0 has now been released.  

Versions for Windows and Mac are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

For the full list of issues and changes leading to the 5.0 release, see https://github.com/shawnlaffan/biodiverse/milestone/18

This version includes a complete rebuild of the plotting engine (the maps, trees and matrices), as well as many computational efficiency improvements.  The list of changes is summarised at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-50

Version 5.0 contains 1120 source code commits across 199 files.


----

Shawn Laffan

04-Nov-2025




Tuesday, 5 November 2024

Randomisations: Curveball algorithm now in Biodiverse

Biodiverse supports a range of randomisations to assess significance of analysis results.  Most use cases in the published literature use the rand_structured algorithm, which is explained in this post, but several common algorithms are supported.  

One of the design principles of Biodiverse is to give the user choice.  To that end, the curveball algorithm is available from version 5.  

The publication describing Curveball is Strona et al. (2014).  The name is derived from a baseball card trading pastime popular in North America.  

The curveball algorithm is applied to a data set of items (species, genera, words, or some other set of identifiers).  In the common biodiversity case this is a sites by species matrix, transformed to a list of lists, e.g. a list of site lists, where each site list comprises its species (or vice versa).  These lists can be considered as sets.  At each iteration, two lists (sets of items) are randomly selected.  Any items found in both sets are ignored.  The rest can be swapped between the two sets, with the number swapped limited by the smaller number of unique items in the two sets to ensure after swapping that each set retains the same number of items it started with.  As an example, consider the case where set 1 has ten items, set 2 has eight, and there are six common items found in both lists.  This means two items can be swapped between the two lists.

The general formula for the number of possible swaps at an iteration is (min (|A|,|B|) - |A ∩ B|), where A and B are the two sets being considered, and the pipes || denote the lengths of the sets (the numbers of items they contain).   If one prefers to think in terms of dissimilarity measures where a is the number of shared items, b the number unique to set 1 and c the number unique to set 2, then the formula is (min (b,c)).  Purely as an aside, this is also part of the denominator in Simpson's dissimilarity index.  
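The worked example above can be sketched in a few lines of Python.  This is an illustration only and not the Biodiverse implementation.  The function names and site contents are made up, and the exchange uses the pooled reshuffle approach, in which the items unique to either set are pooled, shuffled and dealt back so each set keeps its original size.

```python
import random


def possible_swaps(set_a, set_b):
    """min(|A|, |B|) - |A n B|, the most items that can change sides."""
    return min(len(set_a), len(set_b)) - len(set_a & set_b)


def curveball_trade(set_a, set_b, rng):
    """One curveball iteration on a pair of sets.

    Shared items are left alone.  Items unique to either set are pooled,
    shuffled and dealt back so both sets keep their original sizes.
    """
    shared = set_a & set_b
    only_a = set_a - shared
    only_b = set_b - shared
    pool = sorted(only_a | only_b)  # sorted so a seeded rng is reproducible
    rng.shuffle(pool)
    new_a = shared | set(pool[:len(only_a)])
    new_b = shared | set(pool[len(only_a):])
    return new_a, new_b


# Set 1 has ten items, set 2 has eight, and six are common to both
shared_items = {"s1", "s2", "s3", "s4", "s5", "s6"}
site_1 = shared_items | {"a1", "a2", "a3", "a4"}
site_2 = shared_items | {"b1", "b2"}

n_possible = possible_swaps(site_1, site_2)
new_1, new_2 = curveball_trade(site_1, site_2, random.Random(1))
print(n_possible, len(new_1), len(new_2))
```

Because the pool is dealt back at random, the number of items that actually change sides in one iteration can be anything from zero up to the limit given by the formula.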

The curveball algorithm is related to the independent swaps algorithm.  The chief advantage of curveball over independent swaps is that, because it swaps as many items as it can at each iteration, it converges on a randomised result much faster.  Curveball also avoids the main pitfall of the independent swaps algorithm where a pair can be selected that cannot be swapped, thus "wasting" an iteration (swap attempt).  

Curveball does, however, have the same issue that independent swaps has in that the user needs to specify the number of iterations over which swaps will be attempted.  Too few and the resulting matrix will not be sufficiently random.  Too many and time will be "wasted".  This is addressed in Biodiverse by optionally tracking which of the original matrix entries have been swapped, and stopping when all have been done (the stop_on_all_swapped parameter).  This has some overhead in the tracking but generally this should be balanced by the time saved by running fewer iterations overall.  For those interested, the default number of swaps is the same as for the independent swaps algorithm, which is twice the number of non-zero matrix entries (twice the sum of the lengths of all lists).
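The stopping rule can be sketched as follows, again in Python and again as an illustration rather than the Biodiverse code.  The bookkeeping is an assumption: each original (site, item) entry is treated as swapped once the item has left that site at least once.  The default iteration limit is twice the number of non-zero entries, as described above, and also acts as a fallback because some entries (for example an item found in every site) can never be swapped.

```python
import random


def curveball(site_lists, rng, max_iters=None, stop_on_all_swapped=True):
    """Run curveball trades over a dict of site -> list of items."""
    sites = {name: set(items) for name, items in site_lists.items()}
    n_entries = sum(len(items) for items in sites.values())
    if max_iters is None:
        max_iters = 2 * n_entries  # twice the non-zero matrix entries

    # Original entries that have not yet been moved (assumed bookkeeping)
    unswapped = {(name, item) for name, items in sites.items() for item in items}
    names = sorted(sites)
    iters_run = 0

    for _ in range(max_iters):
        if stop_on_all_swapped and not unswapped:
            break
        iters_run += 1
        s1, s2 = rng.sample(names, 2)
        a, b = sites[s1], sites[s2]
        only_a, only_b = a - b, b - a
        pool = sorted(only_a | only_b)
        rng.shuffle(pool)
        new_a = (a & b) | set(pool[:len(only_a)])
        new_b = (a & b) | set(pool[len(only_a):])
        # Any item that left its site counts as a swapped entry
        unswapped -= {(s1, item) for item in a - new_a}
        unswapped -= {(s2, item) for item in b - new_b}
        sites[s1], sites[s2] = new_a, new_b

    return sites, iters_run


# Hypothetical sites by species data, stored as a list of lists
data = {
    "site1": ["a", "b", "c"],
    "site2": ["c", "d"],
    "site3": ["a", "e", "f", "g"],
}
randomised, iters_run = curveball(data, random.Random(42))
print(iters_run, {name: sorted(items) for name, items in randomised.items()})
```

Site richness and the number of sites each item occurs in are both preserved, which is the constraint curveball shares with independent swaps.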

Accessing the curveball algorithm in Biodiverse is the same as for any of the randomisations.  Open the Randomisation tab, select rand_curveball as the randomise function, select the number of randomisation iterations and any other algorithm specific parameters, then press Go (see image below).  The results are in the same format as always (e.g. see here, here and here).

Since it is just another algorithm, all the common options are available (another new change in version 5 is that more options are available across all algorithms in the GUI - see issue 946).  Users can define regions that are randomised separately before reassembly for analysis, including some that are not to be randomised.  One can also add some of the randomised results to the project to inspect them.

In terms of speed, curveball is faster than rand_structured.  This is largely due to there being less book-keeping required.  However, as with independent swaps, curveball can only be applied on a per-cell basis.  It does not extend to spatially structured randomisations like rand_structured does (one could ensure swap candidates come from within some local neighbourhood, but this is a different model to something like a diffusion process or a random walk.  Update 20241109: This has been implemented and will be available in V5).

All that is needed to run the curveball algorithm is to choose rand_curveball as the "Randomise function".  Other parameters are set as usual.


And that's pretty much it for the description.  If you want to read more randomisation related blog posts then check out the posts tagged with the randomisation label.  


----

Shawn Laffan

05-Nov-2024




Monday, 4 November 2024

Plotting indices with divergent colour schemes

Many diversity indices have numerical distributions that are divergent, i.e. they are centred on some value and the interesting bit is the magnitude of the differences away from that value.  A simple example is z-scores, where the data are centred on a value of zero and the values indicate how many standard deviations above or below the expected value the input data are.   These have been plotted using a divergent scheme since version 4.1, as described here.

However, one can also have indices that are simple differences, and also ratios where 1 is the centre of the distribution, and values of 1/2 and 2 are the same magnitude difference from the centre.  The relative phylogenetic diversity and endemism indices are examples of the latter.  

From version 5, Biodiverse plots difference and ratio indices using a divergent colour scheme.    These use the same colour range as the z-scores but plotted along a continuous scale instead of as ordinal classes.  

The colouring happens automatically based on metadata stored with the indices (incidentally, much of the GUI is built using this metadata).  

Colours are also scaled so the most extreme "high" colour is equivalent to the most extreme "low" colour, i.e. if the range of difference values is -5 to 1 then the colours are assigned to the range -5 to 5, and the same for -1 to 5.  This is also accounted for when the data are log scaled or percentile trimmed to de-emphasise extreme values.  
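The symmetric scaling for difference indices amounts to the following.  This is a small Python sketch with a made-up function name, not the Biodiverse code.

```python
def divergent_limits_difference(min_val, max_val, centre=0.0):
    """Colour limits that are symmetric about the centre of the distribution."""
    extent = max(abs(min_val - centre), abs(max_val - centre))
    return centre - extent, centre + extent


limits_a = divergent_limits_difference(-5, 1)
limits_b = divergent_limits_difference(-1, 5)
print(limits_a, limits_b)
```

Both input ranges give the same colour limits, so a given colour always represents the same magnitude of difference from zero.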

A useful point to note is that the colour schemes can be flipped, so if one prefers blue as extreme positive values then this can be done under the Map menu at the left of the display.  

An example is below to compare the old behaviour with the new.  


Prior to version 5, ratio data were plotted using the same colour scheme as any other data, making it difficult to interpret the relative magnitude of the index values across cells.  These are the Relative Phylogenetic Diversity results for the Acacia data set of Mishler et al. (2014), scaled to emphasise the inner 90% of the distribution (i.e. the upper 5% are assigned the same colour, so too the lower 5%).  This is the interval [0.406, 0.896], which means red cells include ratios less than 1.  This is not ideal.  Compare with the next figure.    




The same data as in the previous figure, but now using a divergent colour scheme.  Biodiverse knows this is a ratio index, so assigns colours accordingly.  Red cells have ratios exceeding 1, blue cells less than 1.  Ratios close to 1 are in yellow.  The colours are assigned to the interval [0.406,2.463], where 2.463=1/0.406.  This means one can be sure red cells have ratios exceeding 1, and there is less chance of misinterpreting the results.  
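For ratio indices the same idea applies on a log scale, since 1/2 and 2 are the same distance from 1.  A minimal Python sketch, assuming the limits are simply mirrored around 1 (the function name is made up and this is not the Biodiverse code):

```python
import math


def divergent_limits_ratio(min_val, max_val):
    """Colour limits symmetric about 1 on a log scale.  Inputs must be positive."""
    extent = max(abs(math.log(min_val)), abs(math.log(max_val)))
    return math.exp(-extent), math.exp(extent)


low, high = divergent_limits_ratio(0.406, 0.896)
print(round(low, 3), round(high, 3))
```

With the interval from the earlier figure the lower limit stays at 0.406 and the upper limit becomes its reciprocal.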





It is not shown here, but the metadata is also stored for tree-based indices so divergent colours are assigned to the tree branches where appropriate.  More details about that process are in this post.  


----

Shawn Laffan

04-Nov-2024

