Biodiverse analysis software

Trees: visualise results from other spatial outputs

2026-06-19T12:34:18.954+10:00

For a long time Biodiverse has allowed the user to visualise additional values on the phylogenetic tree in a spatial analysis tab. These include the turnover of branches between two neighbour sets, and indices related to tree branches such as the weights used in a phylogenetic endemism calculation (another example is in this post).

This is very useful, but before version 5 it was limited to the set of lists in the spatial output being viewed.

From version 5 of Biodiverse you can plot list results from any spatial output across all basedatas in your project. A big advantage of this is that you can run one analysis, for example a randomisation to generate a CANAPE output. Later you can run a calculation to see what the relative contribution of each clade in the tree is to each analysis window, without having to rerun the whole analysis to see the new list. This can be in a clone of the basedata so the randomisations won't be out of sync across a basedata's outputs (Biodiverse warns about this).

This is currently implemented as a menu option below the tree and map plots. Unfortunately this means it is not as obvious as it could be, and this is something still being worked on. There have also been minor changes since v5 was released, but only to how the selected list names are shown.

The screenshots below show it in operation for a very simple analysis that uses every cell in the basedata. This allows every branch in the tree to be coloured, which works better as a demonstration.

The menu is at the lower right of the options below the map and tree plots. The exact location depends on your screen size.

Users can select from any list across all spatial outputs across all basedatas in the project. In this case it is the PE weights in an analysis called Acacia_spatial0 in a basedata called Acacia1 (no, these are not informative names).

And the tree branches are coloured as requested.

The lists can also be categorical outputs. These are the results for a Range Weighted Branch Length Differences (RWiBaLD) analysis. More details for that are in Mishler et al. 2026.

And that's pretty much it for the description. More of the theory is discussed in the posts linked to above.

----

Shawn Laffan

19-Jun-2026

For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/

For a list of some of the analyses Biodiverse has been used for, see https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList

You can also join the Biodiverse-users mailing list at https://groups.google.com/group/Biodiverse-users or start a discussion at https://github.com/shawnlaffan/biodiverse/discussions

Biodiverse 5.99_001 development release

2026-06-19T11:15:49.147+10:00

A new development release of Biodiverse (version 5.99_001) is now available.

This is the first development release leading to version 6.

Versions for Windows and Mac are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

This version includes the ability to visualise label and tree branch ranges as polygons, as well as many computational efficiency improvements and GUI updates. The list of changes is summarised at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-5xx

For the full list of issues and changes leading to the 6.0 release, see https://github.com/shawnlaffan/biodiverse/milestone/23

Much of the documentation has now also been ported to a quarto book system. This is much more readable than the wiki system that was previously used.

A set of links is at https://biogeospatial.github.io/

----

Shawn Laffan

19-Jun-2026

For a list of some of the analyses Biodiverse has been used for, see https://biogeospatial.github.io/biodiverse-publication-list/

Biodiverse version 5.0 has been released

2025-11-04T16:13:00.006+11:00

Biodiverse version 5.0 has now been released.

Versions for Windows and Mac are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

For the full list of issues and changes leading to the 5.0 release, see https://github.com/shawnlaffan/biodiverse/milestone/18

This version includes a complete rebuild of the plotting engine (the maps, trees and matrices), as well as many computational efficiency improvements. The list of changes is summarised at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-50

Version 5.0 contains 1120 source code commits across 199 files.

----

Shawn Laffan

04-Nov-2025

Randomisations: Curveball algorithm now in Biodiverse

2024-11-05T15:41:00.003+11:00

Biodiverse supports a range of randomisations to assess significance of analysis results. Most use cases in the published literature use the rand_structured algorithm, which is explained in this post, but several common algorithms are supported.

One of the design principles of Biodiverse is to give the user choice. To that end, the curveball algorithm is available from version 5.

The publication describing Curveball is Strona et al. (2014). The name is derived from a baseball card trading pastime popular in North America.

The curveball algorithm is applied to a data set of items (species, genera, words, or some other set of identifiers). In the common biodiversity case this is a sites by species matrix, transformed to a list of lists, e.g. a list of site lists, where each site list comprises its species (or vice versa). These lists can be considered as sets. At each iteration, two lists (sets of items) are randomly selected. Any items found in both sets are ignored. The rest can be swapped between the two sets, with the number swapped limited by the smaller number of unique items in the two sets to ensure after swapping that each set retains the same number of items it started with. As an example, consider the case where set 1 has ten items, set 2 has eight, and there are six common items found in both lists. This means two items can be swapped between the two lists.

The general formula for the number of possible swaps at an iteration is (min (|A|,|B|) - |A ∩ B|), where A and B are the two sets being considered, and the pipes || denote the lengths of the sets (the numbers of items they contain). If one prefers to think in terms of dissimilarity measures where a is the number of shared items, b the number unique to set 1 and c the number unique to set 2, then the formula is (min (b,c)). Purely as an aside, this is also part of the denominator in Simpson's dissimilarity index.
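The trade step and the swap limit described above can be sketched in a few lines of Python. This is a minimal illustration of the pair-wise trade as described in this post, not the Biodiverse implementation; the function names max_swaps and curveball_trade are hypothetical and chosen only for this example.

```python
import random


def max_swaps(a, b):
    """Upper limit on the number of items that can change sets:
    min(|A|, |B|) - |A intersect B|."""
    return min(len(a), len(b)) - len(a & b)


def curveball_trade(a, b, rng):
    """One trade between two sets (hypothetical helper, for illustration).

    Shared items are left alone. Items unique to either set are pooled,
    shuffled and dealt back out so each set keeps its original size.
    """
    shared = a & b
    only_a = sorted(a - b)   # sorted so results are reproducible for a given seed
    only_b = sorted(b - a)
    pool = only_a + only_b
    rng.shuffle(pool)
    new_a = shared | set(pool[:len(only_a)])
    new_b = shared | set(pool[len(only_a):])
    return new_a, new_b


# The worked example from the text: ten items, eight items, six in common.
set1 = set(range(10))       # items 0..9
set2 = set(range(4, 12))    # items 4..11, so 4..9 are shared
new1, new2 = curveball_trade(set1, set2, random.Random(1))
```

Because the unique items are shuffled and dealt back, the number that actually change sets in any one trade varies between zero and the limit given by max_swaps.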

The curveball algorithm is related to the independent swaps algorithm. The chief advantage of curveball over independent swaps is that, because it swaps as many items as it can at each iteration, it converges on a randomised result much faster. Curveball also avoids the main pitfall of the independent swaps algorithm where a pair can be selected that cannot be swapped, thus "wasting" an iteration (swap attempt).

Curveball does, however, have the same issue that independent swaps has in that the user needs to specify the number of iterations over which swaps will be attempted. Too few and the resulting matrix will not be sufficiently random. Too many and time will be "wasted". This is addressed in Biodiverse by optionally tracking which of the original matrix entries have been swapped, and stopping when all have been done (the stop_on_all_swapped parameter). This has some overhead in the tracking but generally this should be balanced by the time saved by running fewer iterations overall. For those interested, the default number of swaps is the same as for the independent swaps algorithm, which is twice the number of non-zero matrix entries (twice the sum of the lengths of all lists).
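As a rough sketch of how the iteration limit and the early stopping rule could fit together, the Python below runs repeated trades over a list of sets. It is an assumption-laden illustration rather than Biodiverse's code: the function name curveball_randomise and the n_iter and seed arguments are hypothetical, and only the stop_on_all_swapped name and the default of twice the number of non-zero entries come from the text above.

```python
import random


def curveball_randomise(lists, n_iter=None, stop_on_all_swapped=True, seed=None):
    """Randomise a list of item lists using repeated curveball trades.

    Hypothetical sketch. Needs at least two lists.
    """
    rng = random.Random(seed)
    sets = [set(x) for x in lists]
    if n_iter is None:
        # Default from the text: twice the number of non-zero matrix entries.
        n_iter = 2 * sum(len(s) for s in sets)
    # Original (list index, item) entries that have not yet been moved.
    unswapped = {(i, item) for i, s in enumerate(sets) for item in s}

    for _ in range(n_iter):
        i, j = rng.sample(range(len(sets)), 2)
        a, b = sets[i], sets[j]
        only_a, only_b = sorted(a - b), sorted(b - a)
        if not only_a or not only_b:
            continue  # nothing can be traded between this pair
        pool = only_a + only_b
        rng.shuffle(pool)
        shared = a & b
        sets[i] = shared | set(pool[:len(only_a)])
        sets[j] = shared | set(pool[len(only_a):])
        # Entries that left their list in this trade count as swapped.
        unswapped -= {(i, x) for x in a - sets[i]}
        unswapped -= {(j, x) for x in b - sets[j]}
        if stop_on_all_swapped and not unswapped:
            break
    return sets


sites = [["a", "b", "c"], ["b", "d"], ["a", "e", "f", "g"], ["c", "d", "e"]]
randomised = curveball_randomise(sites, seed=42)
```

Some entries can never be swapped, for example an item that occurs in every list, so in this sketch the iteration limit still applies as a backstop even when tracking is switched on.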

Accessing the curveball algorithm in Biodiverse is the same as for any of the randomisations. Open the Randomisation tab, select rand_curveball as the randomise function, select the number of randomisation iterations and any other algorithm specific parameters, then press Go (see image below). The results are in the same format as always (e.g. see here, here and here).

Since it is just another algorithm, all the common options are available (another new change in version 5 is that more options are available across all algorithms in the GUI - see issue 946). Users can define regions that are randomised separately before reassembly for analysis, including some that are not to be randomised. One can also add some of the randomised results to the project to inspect them.

In terms of speed, curveball is faster than rand_structured. This is largely due to there being less book-keeping required. However, as with independent swaps, curveball can only be applied on a per-cell basis. It does not extend to spatially structured randomisations like rand_structured does (one could ensure swap candidates come from within some local neighbourhood, but this is a different model to something like a diffusion process or a random walk. Update 20241109: This has been implemented and will be available in V5).

All that is needed to run the curveball algorithm is to choose rand_curveball as the "Randomise function". Other parameters are set as usual.

And that's pretty much it for the description. If you want to read more randomisation related blog posts then check out the posts tagged with the randomisation label.

----

Shawn Laffan

05-Nov-2024

Plotting indices with divergent colour schemes

2024-11-04T15:04:00.002+11:00

Many diversity indices have numerical distributions that are divergent, i.e. they are centred on some value and the interesting bit is the magnitude of the differences away from that value. A simple example is z-scores, where the data are centred on a value of zero and the values indicate how many standard deviations above or below the expected value the input data are. These have been plotted using a divergent scheme since version 4.1, as described here.

However, one can also have indices that are simple differences, and also ratios where 1 is the centre of the distribution, and values of 1/2 and 2 are the same magnitude difference from the centre. The relative phylogenetic diversity and endemism indices are examples of the latter.

From version 5, Biodiverse plots difference and ratio indices using a divergent colour scheme. These use the same colour range as the z-scores but plotted along a continuous scale instead of as ordinal classes.

The colouring happens automatically based on metadata stored with the indices (incidentally, much of the GUI is built using this metadata).

Colours are also scaled so the most extreme "high" colour is equivalent to the most extreme "low" colour, i.e. if the range of difference values is -5 to 1 then the colours are assigned to the range -5 to 5, and the same for -1 to 5. This is also accounted for when the data are log scaled or percentile trimmed to de-emphasise extreme values.
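The symmetric scaling can be illustrated with a short Python sketch. The two function names are hypothetical and this is not how Biodiverse is implemented internally; it only shows the arithmetic. Differences are mirrored around zero, while ratios are mirrored around one on a log scale so that 1/2 and 2 sit the same distance from the centre.

```python
import math


def difference_limits(lo, hi):
    """Colour range for a difference index, symmetric about zero."""
    m = max(abs(lo), abs(hi))
    return (-m, m)


def ratio_limits(lo, hi):
    """Colour range for a ratio index, symmetric about one on a log scale.

    Assumes both bounds are positive.
    """
    m = max(abs(math.log(lo)), abs(math.log(hi)))
    return (math.exp(-m), math.exp(m))


diff_a = difference_limits(-5, 1)   # mirrored to -5..5
diff_b = difference_limits(-1, 5)   # also -5..5
ratio_lo, ratio_hi = ratio_limits(0.406, 0.896)
```

For the ratio case the upper limit is the reciprocal of the lower one, so an observed interval of 0.406 to 0.896 is stretched to roughly 0.406 to 2.463.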

A useful point to note is that the colour schemes can be flipped, so if one prefers blue as extreme positive values then this can be done under the Map menu at the left of the display.

An example is below to compare the old behaviour with the new.

Prior to version 5, ratio data were plotted using the same colour scheme as any other data, making it difficult to interpret the relative magnitude of the index values across cells. These are the Relative Phylogenetic Diversity results for the Acacia data set of Mishler et al. (2014), scaled to emphasise the inner 90% of the distribution (i.e. the upper 5% are assigned the same colour, so too the lower 5%). This is the interval [0.406, 0.896], which means red cells include ratios <1 which is not ideal. Compare with the next figure.

The same data as in the previous figure, but now using a divergent colour scheme. Biodiverse knows this is a ratio index, so assigns colours accordingly. Red cells have ratios exceeding 1, blue cells less than 1. Ratios close to 1 are in yellow. The colours are assigned to the interval [0.406,2.463], where 2.463=1/0.406. This means one can be sure red cells have ratios exceeding 1, and there is less chance of misinterpreting the results.

It is not shown here, but the metadata is also stored for tree-based indices so divergent colours are assigned to the tree branches where appropriate. More details about that process are in this post.

----

Shawn Laffan

04-Nov-2024

GUI: Polygon overlays (and underlays)

2024-11-04T13:30:00.003+11:00

Since its first release, Biodiverse has supported plotting of polygon and polyline feature class data (from shapefiles). The support is very basic given users can only plot the outlines of polygons, even though the colours could be changed.

This has worked well overall, but there are times when the linework from the feature data gets in the way of the cells being plotted. There are also times when it is useful to plot polygons as solid fills instead of just as the outline. From version 5 of Biodiverse it is possible to do just this.

The process is relatively simple. If a polygon overlay is loaded then it is listed twice in the selection window, once for lines and once for solid fill (with no outline). The default choice is polylines, which is the current behaviour. Users then have the option of plotting one overlay above or below the cells.

Colours can be assigned in the usual way. In this next selection window, the polygon data will be displayed below the cells using a grey colour (grey is quite useful as it does not visually dominate when coloured cells are used).

Polygon data are displayed as a solid grey fill, under the cells. In this case it makes it more obvious where there are unsampled regions. (Cell outlines have also been turned off using the map menu).

Other uses for polygon overlays are in plotting ocean polygons over terrestrial cells to cover over parts of cells that are in the sea (and vice versa for marine data).

There is no doubt more work to be done, for example plotting more than one layer at a time, but it is a useful improvement. If more complex plotting is needed then this is when it is best to leverage the power of GIS software.

----

Shawn Laffan

04-Nov-2024

Publications using Biodiverse in 2023

2024-02-29T09:19:00.001+11:00

2024 is moving quickly, so here is a list of publications from 2023 that used Biodiverse.

If you want to see the full list (211 at the time of writing), then go to https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList

For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/

Aragón-Parada, J., Carrillo-Reyes, P., Rodríguez, A., Munguía-Lino, G., Salinas-Rodríguez, M. M. and De-Nova, J. A. (2023) Spatial phylogenetics of the flora in the Sierra Madre del Sur, Mexico: Evolutionary puzzles in tropical mountains. Journal of Biogeography, 50, 1679-1691.
Copilaș-Ciocianu, D., Sidorov, D., & Šidagytė-Copilas, E. (2023). Global distribution and diversity of alien Ponto-Caspian amphipods. Biological Invasions, 25, 179-195.
de Pedro, D., Ceccarelli, F.S., Vandame, R. et al. (2023) Congruence between species richness and phylogenetic diversity in North America for the bee genus Diadasia (Hymenoptera: Apidae). Biodiversity and Conservation, 32, 4445–4459.
Dlamini, W.M.D. and Loffler, L. (2023). Tree Species Diversity and Richness Patterns Reveal High Priority Areas for Conservation in Eswatini. In: Dhyani, S., Adhikari, D., Dasgupta, R., Kadaverugu, R. (eds) Ecosystem and Species Habitat Modeling for Conservation and Restoration. Springer, Singapore.
Erst, A.S., Baasanmunkh, S., Tsegmed, Z. et al. (2023) Hotspot and conservation gap analysis of endemic vascular plants in the Altai Mountain Country based on a new global conservation assessment. Global Ecology and Conservation, 47, e02647
Fernandes, N.B.G., Moraes, A.M. and Milward-de-Azevedo, M.A. (2023) Diversity of the Passiflora L. in the Serra do Mar ecoregion and the relationships with environmental gradients, South and Southeast, Brazil. Acta Botanica Brasilica, 37, e20220314.
Flores-Argüelles, A., López-Ferrari, A.R., & Espejo-Serna, A. (2023). Geographic distribution and endemism of Bromeliaceae from the Western Sierra-Coast region of Jalisco, Mexico. Botanical Sciences, 101, 527-543.
Flores-Tolentino, M. et al. (2023). Delimitación geográfica y florística de la provincia fisiográfica de la Depresión del Balsas, México, con énfasis en el bosque tropical estacionalmente seco. Revista mexicana de biodiversidad, 94, e944985.
Francisco-Gutiérrez, A., Eduardo Ruiz-Sanchez, E. and Lira-Noriega, A. (2023) Biogeography and conservation assessments of the species of Lamourouxia (Orobanchaceae). Acta Botanica Mexicana 130: e2213.
González-Orozco, C.E. (2023) Unveiling evolutionary cradles and museums of flowering plants in a neotropical biodiversity hotspot. Royal Society Open Science, 10, 230917.
González-Orozco, C.E., Diaz-Giraldo, R.A. and Rodriguez-Castañeda, C. (2023) An early warning for better planning of agricultural expansion and biodiversity conservation in the Orinoco high plains of Colombia. Frontiers in Sustainable Food Systems, 7.
González-Orozco, C., Osorio-Guarín, J., & Yockteng, R. (2023). Phylogenetic diversity of cacao (Theobroma cacao L.) genotypes in Colombia. Plant Genetic Resources, 20, 203-214.
González-Orozco, C.E. & Parra-Quijano, M. (2023) Comparing species and evolutionary diversity metrics to inform conservation. Diversity and Distributions, 29, 224-231.
González-Orozco, C. E., Reyes-Herrera, P. H., Sosa, C. C., Torres, R. T., Manrique-Carpintero, N. C., Lasso-Paredes, Z., Cerón-Souza, I. and Yockteng, R. (in press). Wild relatives of potato (Solanum L. sec. Petota) poorly sampled and unprotected in Colombia. Crop Science.
Guo, WY., Serra-Diaz, J.M., Eiserhardt, W.L. et al. (2023) Climate change and land use threaten global hotspots of phylogenetic endemism for trees. Nature Communications, 14, 6950.
Mardones, D. and Scherson, R.A. (2023) Hotspots within a hotspot: evolutionary measures unveil interesting biogeographic patterns in threatened coastal forests in Chile. Botanical Journal of the Linnean Society, 202, 433–448.
McCurry, M.R., Park, T., Coombs, E.J. Hart, L.J., Laffan, S. (2023) Latitudinal gradients in the skull shape and assemblage structure of delphinoid cetaceans. Biological Journal of the Linnean Society, 138, 470-480.
Miller, J.T., Prentice, E., Bui, E.N., Knerr, N., Mishler, B.D., Schmidt-Lebuhn, A.N., González-Orozco, C.E., Laffan, S. W. (2023). Banksia (Proteaceae) contains less phylogenetic diversity than expected in Southwestern Australia. Journal of Systematics and Evolution, 61, 957-966.
Molina-Paniagua, M.E., Alves de Melo, P.H., Ramírez-Barahona, S., Monro, A.K., Burelo-Ramos, C.M., Gómez-Domínguez, H., et al. (2023) How diverse are the mountain karst forests of Mexico? PLoS ONE 18, e0292352.
Nicolau, G.K. and Edwards, S. (2023) Diversity and endemism of Southern African Gekkonids linked with the escarpment has implications for conservation priorities. Diversity, 15, 306.
Ortiz-Brunel J.P., Ochoterena H., Moore M.J., Aragón-Parada J., Flores J., Munguía-Lino G., Rodríguez A., Salinas-Rodríguez M.M. and Flores-Olvera H. (2023) Patterns of Richness and Endemism in the Gypsicolous Flora of Mexico. Diversity, 15, 522.
Ramírez-Verdugo, P., Tapia, A., Forest, F. and Scherson, R.A. (2023) Evolutionary diversity of the endemic genera of the vascular flora of Chile and its implications for conservation. PLoS ONE 18(7): e0287957. https://doi.org/10.1371/journal.pone.0287957
Ruiz-Sanchez, E., Munguía-Lino, G., Pianissola, E.M., Ely, F. and Clark, L.G. (2023) Richness and endemism in Chusquea subg. Swallenochloa (Poaceae), a Neotropical subgenus adapted to temperate conditions. Phytotaxa, 609, 180-194.
Villaseñor, J. L., Ortiz, E., & Hernández-Flores, M. M. (2023). The vascular plant species endemic or nearly endemic to Puebla, Mexico. Botanical Sciences, 101, 1207-1221.
Wang, C., Zhu, S., Jiang, X., Chen, S., Xiao, Y., Zhao, Y., Yan, Y. and Wen, Y. (2023) Spatio-temporal variation of species richness and phylogenetic diversity patterns for spring ephemeral plants in northern China. Global Ecology and Conservation, 48, e02752.
Ye, C. et al. (2023) Geographical distribution and conservation strategy of national key protected wild plants of China. iScience, 26, 107364.
Zhang, H., Chen, S.-C., Bonser, S.P., Hitchcock, T., & Moles, A.T. (2023). Factors that shape large-scale gradients in clonality. Journal of Biogeography, 50, 827-837
Zhou, R., Ci, X., Hu, J., Zhang, X., Cao, G., Xiao, J., Liu, Z., Li, L., Thornhill, A.H., Conran, J.G. and Li, J. (2023) Transitional areas of vegetation as biodiversity hotspots evidenced by multifaceted biodiversity analysis of a dominant group in Chinese evergreen broad-leaved forests. Ecological Indicators, 147, 110001

Shawn Laffan

29-Feb-2024

Map side menu: The tree plot controls are now a separate submenu, and some new features

2024-02-03T16:14:00.000+11:00

From version 5 of Biodiverse the tree plot controls in the left side menu are now their own submenu. This greatly simplifies the interface.

The displayed tree can now be exported, including the colours used when plotting the tree. Previously the colours were not stored so this was not possible. To export the colours corresponding to a specific cell, right click on that cell to fix the colouring in place. This stops any further updates until another cell is clicked on. The interface itself is unchanged, including the options to export the colours and an RGB geotiff of the spatial plot.

In addition, there are several new plotting options that allow one to plot using equal and range weighted branch lengths.

It is now possible to plot the tree using normal branch lengths, depth, equal branch lengths and range weighted branch lengths.

The equal branch length tree is the alternate tree in the CANAPE protocol

The range weighted tree can be used to understand how PE works.

The range weighted equal branch length tree can be used to understand the RPE index used in CANAPE.

----

Shawn Laffan

03-Feb-2024

Tree panels: colour the tree using any list from spatial outputs across the project

2024-02-03T15:24:00.011+11:00

It has long been possible to colour the tree branches in the spatial tab. However, it was only possible to use a list from the spatial analysis being plotted.

From version 5 the interface has been changed to enable selection of any list from spatial outputs across the project. Where before the system had a simple drop down list, it is now a menu with submenus for each basedata and then each of its spatial outputs.

The tree colour list selection is now a menu that allows users to choose any list across all spatial outputs in the project

As with most widgets in the GUI, the menu entries are described in the tooltip. That text is duplicated below.

The first (default) option shows the paths connecting the labels in the neighbour sets used for the analysis. When there is one such set all branches are coloured blue. When there are two such sets blue denotes branches only in the first set, red denotes those only in the second set, and black denotes those in both. From these one can see the turnover of branches between the groups (cells) in each neighbour set.
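The colour rule for the default option can be written out as a small Python sketch. The function name branch_colours and the use of plain sets of branch names are assumptions made for illustration; only the colour meanings come from the tooltip text.

```python
def branch_colours(set1_branches, set2_branches=None):
    """Map each branch name to a colour following the tooltip's rule.

    Hypothetical helper: one neighbour set gives all blue; with two sets,
    blue is set 1 only, red is set 2 only and black is both.
    """
    set1 = set(set1_branches)
    if set2_branches is None:
        return {b: "blue" for b in set1}
    set2 = set(set2_branches)
    colours = {}
    for b in set1 | set2:
        if b in set1 and b in set2:
            colours[b] = "black"
        elif b in set1:
            colours[b] = "blue"
        else:
            colours[b] = "red"
    return colours


one_set = branch_colours({"n1", "n2"})
two_sets = branch_colours({"n1", "n2", "n3"}, {"n3", "n4"})
```

Branches that are in neither set are not in the returned mapping, which corresponds to them keeping the default colour.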

The next set of menu options are list indices in the spatial output that belongs to this tab. The remainder are lists across other spatial outputs in the project, organised by their basedata objects. These are in the same order as in the Outputs tab. Basedatas and outputs with no list indices are not shown.

If a branch is not in the list then it is highlighted using a default colour (usually black). If the selected output has no labels that are also on the tree then no highlighting is done (all branches remain black).

Right clicking on a group (cell) fixes the highlighting in place, stopping changes to the branch colouring as the mouse is hovered over other groups. This allows the tree to be exported with the current colouring (another new option in version 5).

----

Shawn Laffan

03-Feb-2024

Trimming basedatas has been generalised

2024-02-03T14:59:00.002+11:00

It has long been possible to trim the basedata labels to keep only those that match either the selected tree or selected matrix.

From Version 5 (actually 4.99_002 if you like development versions) it is possible to trim using a different basedata. The interface has also been generalised in the process.

There's not much to it, so here are some screenshots to demonstrate the process.

Generalised trimming is accessed from the basedata menu

It has the usual interface where one can specify a new name. "Trimming a clone" ensures it operates on a copy. "Delete matching" allows one to invert the trim, i.e. to keep only the labels that do not match.

Any of the basedatas, trees or matrices in the project can be selected to use as the label source.

----

Shawn Laffan

03-Feb-2024

Biodiverse now calculates the CANAPE super class

2023-12-02T14:30:00.001+11:00

Since version 4.3, Biodiverse has calculated and plotted the CANAPE results when the relevant calculations have been run.

However, it did not calculate the super class when first implemented. Now it does.

From version 5, Biodiverse calculates all CANAPE classes when a randomisation is run for an analysis that includes phylogenetic endemism and relative phylogenetic endemism.

Note that the CANAPE classes are only updated after at least one randomisation iteration has been run. If you have an existing randomisation then you can run one more iteration to trigger the calculation. Otherwise you can run a new randomisation with the same settings. This should not take long for most analyses, assuming they are consistent with the sizes of data sets in existing publications.

If you are wondering why it was not plotted in the first place, it was largely because the plotting system needed some re-engineering to allow for additional legend labels. This was done when the z-score and p-rank plotting was implemented, a little while after the initial CANAPE plotting.

----

Shawn Laffan

02-Dec-2023

Biodiverse 4.3 has been released

2023-05-01T12:38:00.003+10:00

Biodiverse version 4.3 has now been released.

Versions for Windows, Mac and Linux (Ubuntu) are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

This release contains a small number of bug fixes and improved functionality.

For the full list of issues and changes leading to the 4.3 release, see https://github.com/shawnlaffan/biodiverse/milestone/21

Main changes:

GUI:

z-score plotting has been fixed (colours were reversed). Issue 857.

Randomisations

The p-rank calculations now generate ranks for all defined values. The GUI also now colours the values, similar to the z-scores. Issue 856. More details in the blog post.

Spatial conditions

The sp_points_in_same_poly_shape condition is now faster when any points do not intersect any polygons. See commit 3ca2703.

----

Shawn Laffan

01-May-2023

Changes to randomisation results - the p-rank data

2023-04-27T13:35:00.000+10:00

Randomisations in Biodiverse produce a range of outputs. These are kept in a range of lists, differing by name (see the help system).

One of the lists that is generated is the p-ranks. This is essentially the same as the P_ values in the main randomisation lists, but the low values account for ties so one can be sure the values represent the relative ranking of the observed value against those generated from the randomised data. For example, the significance of a low value should account for any ties.

The p-ranks were implemented a few versions ago and are detailed in this blog post. Due to how the plotting was set up at the time, only values in the outer 10% of the distribution were retained. This helped understand which groups contained significant results without a major update to the display system but in the end was probably confusing. Now that the z-score plotting has been implemented the system has the infrastructure to handle the full range of values.

So what has changed?

Two things: the calculation of values and how they are plotted.

Note that the set of cells that can be regarded as significant using the standard alpha threshold of 0.05 for high or low values is unchanged. All that has changed is the number of cells with defined values and how they are displayed in the GUI.

The calculation

Put simply, all values are now retained. Any "P_" value of 0.5 or less now accounts for the number of ties. Expressed as pseudocode it is:

if P_index > 0.5
    p_rank = P_index
else
    p_rank = (C_index + T_index) / Q_index

where "index" is whichever index is being compared at the time.
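The same rule can be written as a minimal Python sketch. The function and argument names are illustrative only and are not part of Biodiverse; `p`, `c`, `t` and `q` stand for the P_, C_, T_ and Q_ values of a single index, and the handling of a zero `q` is an assumption.

```python
def p_rank(p, c, t, q):
    """Return the p-rank for one index, following the pseudocode above.

    p, c, t and q are the P_, C_, T_ and Q_ values for that index.
    """
    if q <= 0:
        # Assumption: with no valid comparisons the rank is undefined.
        return None
    if p > 0.5:
        # High values are used as they are.
        return p
    # Low values include the ties so the rank is not understated.
    return (c + t) / q


# A cell where the observed value exceeded 10 of 1000 random values
# and tied with another 15 of them.
print(p_rank(p=0.01, c=10, t=15, q=1000))
```

In this example the ties lift the rank from 0.01 to 0.025, which matters when the value is compared against a threshold.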

This makes post-hoc calculation of compound indices like CANAPE easier (although remember that Biodiverse now does that for you).

The display

The addition of the z-score plotting means the plotting infrastructure is already in place, so it was not too difficult to re-use it to display percentile classes instead. This is applied to the p-rank lists by default.

Compare the two plots below and consider which is easier to work with.

The p-rank plotting in Biodiverse version 4.2 and earlier works, but it is difficult to see which cells are in specific percentile bands. For example, which of these cells is in the outer 5%?

Indices in the p-rank lists are now plotted as percentile classes. Compare with the plot above.
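For readers who export the p-rank values and want to reproduce a banded display elsewhere, a rough Python sketch of classifying values into percentile classes is shown here. The band edges (0.01, 0.025, 0.05 and their upper-tail mirrors) are assumptions chosen for illustration; the classes used by the Biodiverse GUI may differ.

```python
def percentile_class(p_rank):
    """Assign a p-rank (0..1) to a percentile band.

    The band edges are hypothetical, not taken from Biodiverse.
    """
    if p_rank is None:
        return "undefined"
    # Lower tail, most extreme band first.
    for limit in (0.01, 0.025, 0.05):
        if p_rank < limit:
            return f"<{limit}"
    # Upper tail, most extreme band first.
    for limit in (0.99, 0.975, 0.95):
        if p_rank > limit:
            return f">{limit}"
    return "not significant"


for value in (0.005, 0.03, 0.5, 0.98):
    print(value, percentile_class(value))
```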

As with other plots, the coloured cells can be exported as RGB geotiffs to display in a GIS or other plotting system.

----

Shawn Laffan

27-Apr-2023

Publications using Biodiverse in 2022

2023-04-02T21:17:00.000+10:00

2023 is now in full swing, so here is a list of publications from 2022 that used Biodiverse.

If you want to see the full list (183 at the time of writing), then go to https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList

Amaral, D.T., Bonatelli, I.A.S., Romeiro-Brito, M., Moraes, E.M. and Franco, F.F. (2022) Spatial patterns of evolutionary diversity in Cactaceae show low ecological representation within protected areas. Biological Conservation, 273, 109677.
Ávila-González, H., González-Gallegos, J.G., Munguía-Lino, G. & Castro-Castro, A. (2022) The genus Sisyrinchium (Iridaceae) in Sierra Madre Occidental, Mexico: A new species, richness and distribution. Systematic Botany, 47, 319-334.
Carter, B. E., Misiewicz, T. M. & Mishler, B. D. (2022). Spatial phylogenetic patterns in the North American moss flora are shaped by history and climate. Journal of Biogeography, 49, 1327-1338.
Chen, K., Khine, P.K., Yang, Z. and Schneider, H. (2022) Historical plant records enlighten the conservation efforts of ferns and Lycophytes’ diversity in tropical China. Journal for Nature Conservation, 68, 126197.
Contreras-Medina, R., García-Martínez, A. I., Ramírez-Martínez, J. C., Espinosa, D., Balam-Narváez, R., and Luna-Vega, I. (2021). Biogeographic analysis of ferns and lycophytes in Oaxaca: A Mexican beta-diverse area. Botanical Sciences, 100, 204-222.
Fernandes, N.B.G., de Menezes Yazbeck, G. & Milward-de-Azevedo, M.A. (2022) Taxonomic diversity of Passifloraceae sensu stricto along altitudinal gradient and on Serra dos Órgãos mountain slopes in southeastern Brazil. Rodriguésia, 73, e00702021.
Gosper C.R., Percy-Bower J.M., Byrne M., Llorens T.M. & Yates C.J. (2022) Distribution, Biogeography and Characteristics of the Threatened and Data-Deficient Flora in the Southwest Australian Floristic Region. Diversity, 14, 493.
Griffiths, D. (2022). Do the drivers and levels of isolation in fish faunas differ across Atlantic and Pacific drainages in the Americas? Journal of Biogeography, 49, 930-941.
Gutiérrez-Rodríguez, B.E., Guevara, R., Angulo, D.F. et al. (2022) Ecological niches, endemism and conservation of the species in Selenicereus (Hylocereeae, Cactaceae). Brazilian Journal of Botany, 45, 1149–1160.
Gutiérrez–Rodríguez, B.E., Vásquez–Cruz, M. and Sosa, V. (2022) Phylogenetic endemism of the orchids of Megamexico reveals complementary areas for conservation. Plant Diversity, 44, 351-359.
Kong, H., Condamine, F.L., Yang, L., Harris, A.J., Feng, C., Wen, F. and Kang, M. (2022) Phylogenomic and macroevolutionary evidence for an explosive radiation of a plant genus in the Miocene. Systematic Biology, 71, 589–609.
Moreira-Muñoz, A., Palchetti, V.A., Morales-Fierro, V., Duval, V.S., Allesch-Villalobos, R., & González-Orozco, C.E. (2022) Diversity and Conservation Gap Analysis of the Solanaceae of Southern South America. Frontiers in Plant Science, 13.
Murillo-Pérez, G., Rodríguez, A., Sánchez-Carbajal, D., Ruiz-Sanchez, E., Carrillo-Reyes, P., Munguía-Lino, G. (2022) Spatial distribution of species richness and endemism of Solanum (Solanaceae) in Mexico. Phytotaxa, 558, 147–177.
Olivares-Juárez, M.I., Burgos-Hernández, M. and Santiago-Alvarádo, M. (2022) Patterns of Species Richness and Distribution of the Genus Laelia s.l. vs. Laelia s.s. (Laeliinae: Epidendroideae: Orchidaceae) in Mexico: Taxonomic Contribution and Conservation Implications. Plants, 11:2742.
Paz, A., Silva, A.S. & Carnaval, A. (2022) A framework for near-real time monitoring of diversity patterns based on indirect remote sensing, with an application in the Brazilian Atlantic rainforest. PeerJ, 10:e13534.
Rivera-Martínez, R., Ramírez-Morillo, I.M., De-Nova, José A., Carnevali, G., Pinzón, J.P., Romero-Soler, K.J. & Raigoza, N. (2022) Spatial phylogenetics in Hechtioideae (Bromeliaceae) reveals recent diversification and dispersal. Botanical Sciences, 100, 692-709.
Silva, D.C., Oliveira, H.F.M., Zangrandi, P.L. and Domingos, F.M.C.B. (2022) Flying Over Amazonian Waters: The Role of Rivers on the Distribution and Endemism Patterns of Neotropical Bats. Frontiers in Ecology and Evolution, 10:774083.
Wang, Q., Huang, J., Zang, R., Li, Z. and El-Kassaby, Y. A. (2022). Centres of neo- and paleo-endemism for Chinese woody flora and their environmental features. Biological Conservation, 276, 109817.
Yang, X., Qin, F., Xue, T., Xia, C., Gadagkar, S. R., & Yu, S. (2022). Insights into plant biodiversity conservation in large river valleys in China: A spatial analysis of species and phylogenetic diversity. Ecology and Evolution, 12, e8940.
Yang, X., Zhang, W., Qin, F., et al. (2022). Biodiversity priority areas and conservation strategies for seed plants in China. Frontiers in Plant Science, 13, 962609.
Zhang, W., Bussmann, R.W., Li, J., Liu, B., Xue, T., Yang, X., Qin, F., Liu, H. and Yu, S. (2022) Biodiversity hotspots and conservation efficiency of a large drainage basin: Distribution patterns of species richness and conservation gaps analysis in the Yangtze River Basin, China. Conservation Science and Practice, 4, e12653.
Zhang, Y., Qian, L., Chen, X., Sun, L., Sun, H. and Chen, J. (2022) Diversity patterns of cushion plants on the Qinghai-Tibet Plateau: a basic study for future conservation efforts on alpine ecosystems. Plant Diversity, 44, 231-242.
Zhang, X.X, Ye, J.F., Laffan, S.W., Mishler, B.D., Thornhill, A.H., Lu, L.M. et al. (2022) Spatial phylogenetics of the Chinese angiosperm flora provides insights into endemism and conservation. Journal of Integrative Plant Biology, 64, 105-117.
Zhao, R., Xu, S., Song, P., Zhou, X, Zhang, Y. and Yuan, Y. (2022) Distribution patterns of medicinal plant diversity and their conservation in the Qinghai-Tibet Plateau. Biodiversity Science, 30, 21385.

Shawn Laffan

02-Apr-2023

Biodiverse version 4.2 has been released

2023-03-29T12:06:00.003+11:00

Biodiverse version 4.2 has now been released.

Versions for Windows, Mac and Linux (Ubuntu) are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

This release contains a small number of bug fixes and improved functionality. For the full list of issues and changes leading to the 4.2 release, see https://github.com/shawnlaffan/biodiverse/milestone/20

Main changes:

GUI
- Branch highlighting in the View Labels tab works again. This was broken in version 4.1. Issue #850.
Data imports
- Raster imports now include the band labels if defined in multiband files. Issue #852.
- Importing a raster now works when the nodata value is NaN. Issue #851.

----

Shawn Laffan

29-Mar-2023

Biodiverse version 4.1 has been released

2023-02-07T15:26:00.000+11:00

We are pleased to announce the release of Biodiverse version 4.1.

Versions for Windows, Mac and Linux (Ubuntu) are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

Version 4.1 represents five issues closed across 96 source code commits.

Highlights of the changes since version 4.0 are at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-41, and the related blog posts can be accessed via https://biodiverse-analysis-software.blogspot.com/search/label/Version41

A more detailed listing of the closed issues is at https://github.com/shawnlaffan/biodiverse/milestone/19?closed=1

The main user-visible change is that z-score indices are now plotted with a divergent colour scale based on z-score significance thresholds. More details are in this blog post.

----

Shawn Laffan

07-Feb-2023

Plotting z-score indices and randomisation results

2023-02-07T15:23:00.003+11:00

From version 4.1, Biodiverse will plot indices it knows are z-scores using a divergent colour scheme, with values classified into intervals (adapted from the ArcGIS implementation). This makes it much easier to see which locations are potentially significant given the expected values.

This process applies to indices like the Net Relatedness Index and Net Taxon Index, all of the Gi* indices such as for group properties and label properties (more on such analyses here), as well as the z-scores generated by randomisation analyses. It also applies to branches of a cluster dendrogram when indices have been calculated for each node/branch.
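As a rough illustration of classifying z-scores into intervals, the Python sketch below uses the breaks 1.65, 1.96 and 2.58 (the usual two-tailed 90%, 95% and 99% thresholds, which are also the ones ArcGIS uses for its hot spot bins). That Biodiverse uses exactly these breaks is an assumption, and the function name is illustrative only.

```python
# Assumed interval breaks: two-tailed 90%, 95% and 99% thresholds.
Z_BREAKS = (1.65, 1.96, 2.58)


def z_class(z):
    """Return a signed class from -3 to 3, with 0 meaning not significant."""
    if z is None:
        return None
    # Count how many thresholds the absolute z-score exceeds.
    level = sum(abs(z) > b for b in Z_BREAKS)
    return level if z > 0 else -level


for z in (-3.1, -1.7, 0.4, 2.0, 2.9):
    print(z, z_class(z))
```

A divergent colour scheme then only needs seven colours, one per class, with a neutral colour for class 0.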

You can export the coloured images to geotiff in the same way as for any data set.

There is not much more to it than that, so here are some images of what it looks like for a spatial analysis using the Acacia data set of Mishler et al. (2014).

The Net Relatedness Index

Z-scores for Phylogenetic Diversity after a spatial randomisation process

Net Relatedness Index calculated for the groups (cells) under each branch of a cluster analysis. Coloured cells are associated with the dendrogram branches that intersect the blue slider bar.

The spatial distribution of PD significance (left) with branches occurring in a cell in south-west Western Australia (black dot) coloured by clade score significance against the same randomisation process.

----

Shawn Laffan

07-Feb-2023

Biodiverse version 4.0 has been released

2022-11-26T09:24:00.002+11:00

We are pleased to announce the release of Biodiverse version 4.0.

Versions for Windows, Mac and Linux (Ubuntu) are available and can be accessed via https://github.com/shawnlaffan/biodiverse/wiki/Downloads

Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation

Version 4.0 represents 52 issues closed across 752 source code commits. 260 files have been changed.

Highlights of the changes since version 3.1 are at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-40, and the related blog posts can be accessed via https://biodiverse-analysis-software.blogspot.com/search/label/Version4

A more detailed listing of the closed issues is at https://github.com/shawnlaffan/biodiverse/milestone/17?closed=1

-----

Shawn Laffan

26-Nov-2022

Export cluster groups to shapefile

2022-11-25T15:19:00.001+11:00

Biodiverse Version 4 allows users to export their cluster analyses using the same grouping process as is used to colour the branches.

This can be convenient to reconstruct the clusters in a GIS or other graphics system.

One issue is that only the cluster polygons (or points) are exported. If you want to attach data from the clusters then you can export them to delimited text using the Table Grouped method (with the same grouping parameters) and use a database join to attach them to the shapefile. The main reason for this is that shapefiles have a limit of 10 characters for field names, and many indices in Biodiverse exceed this (as well as sometimes containing characters other than letters, numbers and the underscore).
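To show why the field name limit is awkward, here is a small Python sketch that maps index names to names a shapefile attribute table will accept. It is not something Biodiverse does; the function is hypothetical, and the default of 10 reflects the usable length of a dBASE field name (the 11-byte name slot includes a terminator).

```python
import re


def dbf_field_names(index_names, max_len=10):
    """Map index names to unique, shapefile-safe field names."""
    used = set()
    mapping = {}
    for name in index_names:
        # Replace anything other than letters, digits and underscores.
        clean = re.sub(r"[^A-Za-z0-9_]", "_", name)
        candidate = clean[:max_len]
        n = 1
        # Truncation can cause collisions, so append a counter if needed.
        while candidate.upper() in used:
            suffix = str(n)
            candidate = clean[: max_len - len(suffix)] + suffix
            n += 1
        used.add(candidate.upper())
        mapping[name] = candidate
    return mapping


names = ["PHYLO_RPE2", "PHYLO_RPE_NULL2", "PHYLO_RPE_DIFF2", "ENDW_CWE"]
for original, short in dbf_field_names(names).items():
    print(original, "->", short)
```

The shortened names quickly become hard to interpret, which is one reason a join against a delimited text export is the simpler route.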

Another point to be aware of is that each group (cell) is a separate polygon so use a dissolve to merge them if you want to remove the internal boundaries.

Pictures are better than words so here are some screenshots.

An example cluster analysis, in this case with six clusters coloured.

The export option is in the usual place. It can also be accessed through the outputs tab.

In this case the export is set to use six clusters to match the display, but you can choose whatever you like. Other options include selecting by depth or by distance from the root (by length or depth).

And here we have a plot of the clusters. The colours differ but the clusters themselves are the same (and one can always update the colours).

If you want to use the grouped clusters in a spatial condition then it is easier to do so directly - see more details here.

If you just want to replicate the display then it is better to export the spatial data to an RGB geotiff and the tree to nexus with the colours embedded - see geotiff details here and the tree details here.

--------

Shawn Laffan

25-Nov-2022

Trees: Merge single-child branches with their children

2022-11-25T14:20:00.001+11:00

When Biodiverse is used to trim a tree to a subset of branches, for example to match the selected BaseData object, any branch with no remaining descendants is removed from the tree. All other branches are retained.

What this means is that some internal branches (nodes) can be left with only one child branch (node). These can be referred to as single-child nodes, or knuckles. Retaining such nodes can be useful if some of the structure of the original tree needs to be kept, for example to indicate that there is phylogenetic data but that it has been removed from the tree. The counter to this is that most phylogenetic trees are samples and so are likely to be missing many branches anyway.

In the spirit of letting the user decide, Biodiverse version 4 supports the merger of internal branches with their children if they have only one child.

Names are important and, as in many systems, any node can be named in Biodiverse. In fact, all nodes have names, but internal nodes default to a number with three trailing underscores (so "1___", "35___" etc). This naming is what allows many of the branch and clade level indices to work, such as the phylogenetic endemism clade contributions and PD clade loss.

The general rule when merging is that the name of the merged node is whichever node had a non-default name to begin with. If both have non-default names then a child that is a terminal wins. Otherwise the parent name is used.
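The naming rule can be sketched in Python as follows. This is an illustration on a toy tree structure, not Biodiverse's code: the `Node` class and function names are hypothetical, summing the branch lengths on merger is an assumption, and so is using the parent name when both names are defaults.

```python
import re
from dataclasses import dataclass, field

# Default internal names are a number followed by three underscores.
DEFAULT_NAME = re.compile(r"^\d+___$")


@dataclass
class Node:
    name: str
    length: float = 1.0
    children: list = field(default_factory=list)


def is_default(name):
    return bool(DEFAULT_NAME.match(name))


def merged_name(parent, child):
    """Choose the name for a merged parent and single child."""
    p_default = is_default(parent.name)
    c_default = is_default(child.name)
    if p_default and not c_default:
        return child.name
    if c_default and not p_default:
        return parent.name
    if not p_default and not c_default and not child.children:
        return child.name  # both are named: a terminal child wins
    return parent.name


def merge_knuckles(node):
    """Merge every single-child node with its child, in place."""
    while len(node.children) == 1:
        child = node.children[0]
        node.name = merged_name(node, child)
        node.length += child.length  # assumption: lengths are summed
        node.children = child.children
    for child in node.children:
        merge_knuckles(child)
    return node


tree = Node("3___", 1.0, [
    Node("5___", 1.0, [Node("D")]),
    Node("cladeY", 1.0, [Node("4___", 1.0, [Node("F"), Node("G")])]),
])
merge_knuckles(tree)
print([(c.name, c.length) for c in tree.children])
```

In the toy tree the knuckle above terminal D takes the name D, while the named clade keeps its name when merged with a default-named child.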

The process is best demonstrated using images.

An example tree plotted using depth instead of length to show the individual branches. The black branches are not in the basedata.

The tree trimming interface includes the option to merge single child nodes. In this case it is not selected.

The black branches from the previous screenshot have been deleted but one can see several branches that appear twice as long as the others. These are actually pairs of branches.

Repeating the process above but this time merging the single child (knuckle) nodes.

In this case all the branches are the same length because all single child branches have been merged with their children.

The examples above all use the tree trimming process, but if you have a tree that already has knuckles or forget to merge them then you can also merge the nodes directly from the tree menu.

Direct access to the merging process.

--------

Shawn Laffan

25-Nov-2022

Biodiverse now calculates CANAPE for you

2022-10-25T16:41:00.001+11:00

The CANAPE protocol is one of the analyses Biodiverse is most commonly used for (see examples amongst the list of publications using Biodiverse).

The method, or protocol, was originally described in Mishler et al. (2014) and is conceptually simple. Run an analysis that includes phylogenetic endemism and relative phylogenetic endemism, run those through a randomisation, and then categorise the results based on the significance score of the indices. This process is described in more detail in previous posts here and here.

The main issue with the approach to date is that the CANAPE classes are determined outside of Biodiverse using systems like a GIS, R code or a spreadsheet. So while the process is conceptually simple, the actual implementation can all get a bit complex. Many users are not entirely sure which indices to pass through their functions, or even which lists to extract them from.

As of Version 4 Biodiverse now calculates it for you. This occurs automatically whenever an analysis has included the Phylogenetic Endemism and Relative Phylogenetic Endemism type 2 calculations. (If you want it sooner than version 4 then it is in the development release 3.99_005, which was current at the time of writing. See the downloads page for links).

Biodiverse now calculates the CANAPE scores when the requisite indices have been calculated, and a randomisation has been run. Like many of the posts on this blog, this example uses the Acacia data set from Mishler et al. (2014).

How does Biodiverse store the results?

The results are stored in a new list where the name is the randomisation output used followed by ">>CANAPE>>". So for a randomisation called "rand" you would see "rand>>CANAPE>>". The use of angle brackets might look a bit strange at first but makes the naming consistent with the other randomisation lists and simplifies the underlying code.

The CANAPE classes are stored in an index called CANAPE_CODE, with a numeric code indicating which of the categories a cell falls in. Currently this code is 0 for not significant, 1 for neo-endemism, 2 for palaeo-endemism and 3 for mixed endemism.
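For readers who want to see the logic behind the codes, the Python sketch below follows the CANAPE steps as they are commonly implemented after Mishler et al. (2014): a one-tailed test on the two PE terms, then a two-tailed test on RPE. The function and argument names are hypothetical, the thresholds are the conventional ones rather than values read from Biodiverse, and the inputs are assumed to be randomisation rank scores between 0 and 1.

```python
def canape_code(p_pe_obs, p_pe_alt, p_rpe):
    """Return a CANAPE code from three randomisation rank scores.

    p_pe_obs: rank of PE on the observed tree (the RPE numerator)
    p_pe_alt: rank of PE on the comparison tree (the RPE denominator)
    p_rpe:    rank of the RPE ratio
    Codes: 0 not significant, 1 neo, 2 palaeo, 3 mixed.
    """
    if None in (p_pe_obs, p_pe_alt, p_rpe):
        return None
    # Step 1: one-tailed test, is either PE term significantly high?
    if p_pe_obs <= 0.95 and p_pe_alt <= 0.95:
        return 0
    # Step 2: two-tailed test on the RPE ratio.
    if p_rpe < 0.025:
        return 1  # neo-endemism
    if p_rpe > 0.975:
        return 2  # palaeo-endemism
    return 3      # mixed endemism


print(canape_code(0.97, 0.50, 0.01))  # neo
print(canape_code(0.50, 0.99, 0.99))  # palaeo
print(canape_code(0.97, 0.98, 0.50))  # mixed
```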

Biodiverse also provides individual indices for neo, palaeo and mixed in the event a user only wants to see which cells are in a specific class. For example one might want to run a cluster analysis using only neo-endemism cells following the process described here.

The same data as above but highlighting Palaeo-endemism cells in red. All other cells containing data are in blue.

Visualisation

A big advantage of generating CANAPE results within Biodiverse is that users can now explore the results using the functionality Biodiverse provides. As an example, the next screen shot shows an exploration of the contribution of each clade on the tree in relation to the analysis groups (cells) (see more details about that process here and here).

Each tree branch is coloured by the relative contribution of the clade subtending it to the PE score in the cell being hovered over (black dot in south-western WA). This allows an understanding of which clade is driving the PE scores, and thus CANAPE, in a cell. The visualisation process is explained in more detail here.

Displaying the results in other systems

If you then want to use the plots as part of a map then they can be exported to an RGB Geotiff. Details of how to do this are in another post but the next two screenshots show the start and end.

What about a different colour scheme?

The colour scheme used is from Mishler et al. (2014) where neo is red (new is hot), palaeo is blue (old is cold) and purple is between blue and red on a colour wheel.

If you prefer a different colour scheme then you can export the data as you normally would, for example as CSV files or as non-RGB geotiffs, and recreate the plot to your own tastes.
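As a starting point for recolouring exported data, the Python sketch below maps CANAPE_CODE values from a CSV export to RGB colours. The column name ELEMENT, the exact RGB values and the colour for non-significant cells are assumptions made for the example; only the code numbers and the red, blue and purple scheme come from the text above.

```python
import csv
import io

# Hypothetical palette: swap in whatever colours you prefer.
PALETTE = {
    0: (255, 255, 204),  # not significant (assumed colour)
    1: (255, 0, 0),      # neo: red
    2: (0, 0, 255),      # palaeo: blue
    3: (128, 0, 128),    # mixed: purple
}


def colour_cells(csv_text, palette=PALETTE):
    """Return {element: (r, g, b)} for rows with a defined CANAPE_CODE."""
    reader = csv.DictReader(io.StringIO(csv_text))
    colours = {}
    for row in reader:
        code = row["CANAPE_CODE"]
        if code == "":
            continue  # skip cells with no defined class
        colours[row["ELEMENT"]] = palette[int(code)]
    return colours


example = "ELEMENT,CANAPE_CODE\n50:50,1\n150:50,3\n250:50,\n"
print(colour_cells(example))
```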

Changing the colours within Biodiverse would be very useful and contributions are always welcome.

What about the Super class?

The system does not currently generate the Super class. It can be added if there is demand. (Edit: It was added for Biodiverse Version 5).

Do I have to run a new randomisation analysis to see the CANAPE list?

The CANAPE lists are generated at the end of any sequence of randomisations. If you already have a randomisation analysis then they can be created by running one additional iteration.

If you are concerned that your analysis is already at 999 iterations then all you lose is a bit of numeric neatness as there are now 1001 realisations in total instead of 1000 (one original plus all the random ones). This is unlikely to make any meaningful difference once that many iterations have been run.

--------

Shawn Laffan

25-Oct-2022

Biodiverse now calculates indices for the variation in phylogenetic distinctness

2022-07-10T21:06:00.000+10:00

Biodiverse has included calculations of indices from the phylocom system for several versions, specifically the Mean Phylogenetic Distance (MPD) and Mean Nearest Taxon Distance (MNTD). The MPD is the average of the pair-wise distances between tree tips in a sample, where the distances pass through all the shared ancestors below the most recent common ancestor. The MNTD is the average distance for each tip to its nearest tip in the sample.

There are many ways of slicing and dicing a sample, and one of the development principles of Biodiverse is to provide more details rather than less. Consequently there are also indices for the pair-wise root mean squared distance (RMSD), and the minimum and maximum distances between a sample of tips on a tree.

The min and max are simply the shortest and longest distances in the pairwise sample, so the distances between the most and least related pairs. The RMSD is the square root of the mean squared distance and is a measure of the variability in a sample. It is analogous to a standard deviation but where the expected value (the mean) is zero, and follows the same formulation as the Root Mean Squared Error except that a value of zero in RMSE means no error whereas in RMSD it means a zero distance between tips on the tree.
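The unweighted versions of these summaries can be sketched in a few lines of Python. The function and key names are illustrative rather than Biodiverse index names, and the pairwise distances are supplied through a hypothetical `dist(a, b)` function instead of being read from a tree.

```python
import itertools
import math


def pairwise_stats(tips, dist):
    """Unweighted pairwise summaries for a sample of tips.

    dist(a, b) returns the path distance between two tips.
    Returns None when there are fewer than two tips.
    """
    d = [dist(a, b) for a, b in itertools.combinations(tips, 2)]
    if not d:
        return None
    # Distance from each tip to its nearest other tip in the sample.
    nearest = [min(dist(a, b) for b in tips if b != a) for a in tips]
    return {
        "mpd": sum(d) / len(d),
        "mntd": sum(nearest) / len(nearest),
        "rmsd": math.sqrt(sum(x * x for x in d) / len(d)),
        "min": min(d),
        "max": max(d),
    }


# Toy distances: A and B are close relatives, C is distant from both.
toy = {frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0}
print(pairwise_stats(["A", "B", "C"], lambda a, b: toy[frozenset((a, b))]))
```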

However, the RMSD is not the variance and sometimes one is looking to see how a set of pair-wise distances is distributed around the mean. This is where the Variance becomes useful, as first described by Warwick and Clarke (2001).

Biodiverse version 4 includes indices for the variance of the pairwise distances. The index names are subject to change before the release but for now follow the pattern PMPD1_VARIANCE, PMPD2_VARIANCE and PMPD3_VARIANCE, where the 1, 2 and 3 indicate unweighted (each tip counts equally), locally range weighted (each tip counts once for each group it occurs in within the neighbourhood) and locally abundance weighted (using the number of samples of each tip in the neighbourhood). These are calculated by default when the relevant MPD and MNTD indices are requested.
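A minimal Python sketch of the unweighted case is given here. It assumes the population form of the variance (dividing by the number of pairs), which is not confirmed by the text, and the function name and the dictionary of distances are hypothetical.

```python
import itertools
import statistics


def pairwise_variance(tips, dist):
    """Population variance of the pairwise distances among tips.

    dist maps frozenset({a, b}) to the distance between tips a and b.
    Returns None for fewer than two tips, as there are no pairs.
    """
    d = [dist[frozenset(pair)] for pair in itertools.combinations(tips, 2)]
    if not d:
        return None
    return statistics.pvariance(d)


toy = {frozenset("AB"): 2.0, frozenset("AC"): 6.0, frozenset("BC"): 6.0}
print(pairwise_variance(["A"], toy))            # undefined: one tip
print(pairwise_variance(["A", "B"], toy))       # zero: only one pair
print(pairwise_variance(["A", "B", "C"], toy))  # spread around the mean
```

Under this form the variance equals the squared RMSD minus the squared MPD, so it measures the spread around the mean rather than around zero.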

The variance indices are calculated with the other MPD and MNTD indices.

Plotting is the same as for any index. Some cells are blank because values are undefined when the sample contains only one tip, and therefore no path between tips. Zero variances are where there are only two tips, and thus no variation.

This is just a plot of the mean for comparison.

But are the values significant?

A common approach to testing significance of the MPD and MNTD indices in the unweighted case is to use a resampling approach. For each sample this generates a distribution of possible values under random resampling of the same number of tips. More details are given in another blog post.

The unweighted pairwise variance is also assessed in this way, with the index name using NET_VPD. As with NRI and NTI, this is a z-score so values more extreme than +/-1.96 can be considered significantly higher or lower than expected.

The resampling approach uses the same code as for NRI and NTI, so the same sequence of resamples can be used across NRI, NTI and NET_VPD, although in Biodiverse version 4 this sharing only applies to NTI, and only for non-ultrametric trees, because an exact calculation is used for NRI with any tree and for NTI with ultrametric trees. This exact calculation avoids resampling and is much faster to run. (More details and references are in the same blog post referred to above.)
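The general idea of a resampling z-score can be sketched in Python as below. This is a generic illustration and not the Biodiverse implementation: the function names, the toy distances, the number of iterations and the sign convention (observed minus expected) are all assumptions.

```python
import itertools
import random
import statistics


def pairwise_variance(tips, dist):
    d = [dist[frozenset(pair)] for pair in itertools.combinations(tips, 2)]
    return statistics.pvariance(d) if d else None


def resampled_z_score(sample, all_tips, dist, n_iter=999, seed=1):
    """z-score of the observed variance against random draws of tips."""
    observed = pairwise_variance(sample, dist)
    if observed is None:
        return None
    rng = random.Random(seed)
    # Draw the same number of tips at random from the whole tree.
    expected = [
        pairwise_variance(rng.sample(all_tips, len(sample)), dist)
        for _ in range(n_iter)
    ]
    sd = statistics.pstdev(expected)
    if sd == 0:
        return None  # e.g. two tips: every variance is zero
    return (observed - statistics.fmean(expected)) / sd


# Toy distances from positions on a line (not a real tree).
positions = {"A": 0, "B": 1, "C": 3, "D": 7, "E": 15, "F": 31}
all_tips = sorted(positions)
dist = {
    frozenset((a, b)): abs(positions[a] - positions[b])
    for a, b in itertools.combinations(all_tips, 2)
}
print(resampled_z_score(["A", "B"], all_tips, dist))
print(resampled_z_score(["A", "B", "C"], all_tips, dist))
```

With two tips every resample has a single distance, so the standard deviation of the expected values is zero and the sketch returns an undefined result.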

The NET_VPD indices are also under the PhyloCom set. Users can calculate the NET_VPD as well as the expected values used in its calculation.

Values are z-scores. At least three tips are needed to calculate the z-score as standard deviations are always zero for two tips and thus the z-score is undefined.

Control-clicking on cells allows users to see the values for all indices that were calculated (within each output list, where SPATIAL_RESULTS is where most go).

Shawn Laffan

10-Jul-2022

Use clusters in spatial conditions

2022-05-02T17:53:00.000+10:00

Spatial conditions are a core part of Biodiverse

Most people seem to focus on using single cells for their analysis and trying to find the ideal cell size. This is missing much of the benefit of spatial analyses. You are not constrained to using single cells in isolation.

You can analyse regions around each focal location (processing group) using geometric shapes like circles. Varying the size of the window gives an understanding of the spatial scale of the patterns (the operational scale). However, there is no need to be geometric - you can use arbitrarily complex spatial conditions based on polygon features, proximity and/or matching text. See for example Laffan and Crisp (2003) and Laity et al. (2015).

You can also use cluster (and region grower) analyses to define your spatial windows. These allow you to let the data define the regions, with the calculations then applied giving you more understanding of the groupings that have been identified. Care needs to be taken with interpretation due to the risk of circularity, but that's not unusual. And sometimes you just want to understand something about the assemblage that falls under a node (branch). You might also be interested in the environmental properties associated with a cluster.

One issue with the cluster approach is that it can be difficult to use the branches in a spatial condition for a different analysis. Consider the case where one wants to spatially partition a randomisation so labels are kept within their associated clusters (for a given cluster cutoff). You could export the clusters to shapefile format, extract the relevant features to a new shapefile, and then use that in a new spatial condition. But that's a lot of work and not easy for people less familiar with geoprocessing and GIS.

From version 4 you can access the set of groups under a cluster analysis and use that to define spatial conditions (actually it is in the 3.99_003 development version). This can use any of the current cutting methods, so you can slice by distance from the tips, depth, or number of clusters from the root using the sp_points_in_same_cluster condition. You can also select individual branches (nodes) by name (sp_point_in_cluster).

Some snippets are below that can be copied into your spatial conditions windows. No screenshots this time, but I can add a new post if that is needed.

Note that the cluster analysis being referred to must be in the same basedata.

## sp_points_in_same_cluster examples

# Try to use the highest four clusters from the root.
# Note that the next highest number will be used
# if four is not possible, e.g. there might be five
# siblings below the root. Fewer will be returned
# if the tree has insufficient tips.
sp_points_in_same_cluster (
    output       => "some_cluster_output",
    num_clusters => 4,
)

# Cut the tree at a distance of 0.25 from the tips
sp_points_in_same_cluster (
    output          => "some_cluster_output",
    target_distance => 0.25,
)

# Cut the tree at a depth of 3 from the root.
# The root is depth 1.
sp_points_in_same_cluster (
    output          => "some_cluster_output",
    target_distance => 3,
    group_by_depth  => 1,
)

# Select four clusters below a specified node
sp_points_in_same_cluster (
    output       => "some_cluster_output",
    num_clusters => 4,
    from_node    => '118___', # use the node's name
)

# target_distance is ignored if num_clusters is set
# so this is the same as the first example
sp_points_in_same_cluster (
    output          => "some_cluster_output",
    num_clusters    => 4,
    target_distance => 0.25,
)

## sp_point_in_cluster examples

# This will select any element that is a terminal in the cluster output.
# It is useful when the cluster analysis was run under
# a definition query to reduce the number of elements clustered,
# and you want the same set of elements.
sp_point_in_cluster (
output       => "some_cluster_output",
)

# Now specify a cluster within the output
sp_point_in_cluster (
output       => "some_cluster_output",
from_node    => '118___', # use the node's name
)

# Specify an element to check instead of the current
# processing element.
sp_point_in_cluster (
output       => "some_cluster_output",
from_node    => '118___', # use the node's name
element      => '123:456', # specify an element to check
)
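As a rough illustration of what the same-cluster condition achieves, the Python sketch below treats the result of cutting a tree as a simple lookup from element name to cluster. It is not Biodiverse code. The function name, the dictionary layout, and the choice to return false for elements that are not in the tree are all assumptions made for the example.

```python
def in_same_cluster(cluster_of, processing_element, neighbour_element):
    """Return True when both elements fall in the same cluster.

    cluster_of is a hypothetical dict mapping element names (e.g. '123:456')
    to the name of the cluster they belong to after the tree has been cut.
    Elements missing from the dict are assumed never to match.
    """
    target = cluster_of.get(processing_element)
    if target is None:
        return False
    return cluster_of.get(neighbour_element) == target


# Three elements split across two clusters (made-up values).
cluster_of = {"1:1": "A", "1:2": "A", "2:1": "B"}
neighbours = [e for e in cluster_of if in_same_cluster(cluster_of, "1:1", e)]
print(neighbours)
```

Used as the spatial partition for a randomisation, a condition like this keeps labels from being swapped between clusters.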

Shawn Laffan

02-May-2022

For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/

To see what else Biodiverse has been used for, see https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList

You can also join the Biodiverse-users mailing list at https://groups.google.com/group/Biodiverse-users

Importing group properties directly from rasters

2022-05-02T13:29:00.002+10:00

What environmental conditions relate to my biodiversity patterns?

Often one wants to understand which environmental conditions are associated with the taxonomic, phylogenetic and/or trait data. Examples include edaphic and climatic variables, and publications doing so include Bickford and Laffan (2006), González-Orozco et al. (2013), González-Orozco et al. (2014a), González-Orozco et al. (2014a), Nagalingum et al. (2015) and Bein et al. (2020).

Such data are typically obtained as rasters, with spatial resolutions often of the order of hundreds of metres. This is in contrast to the resolution typically used for Biodiverse analyses (tens to hundreds of kilometres).

Up until now this has been something of a complex process. The raster data need to be aggregated to the same resolution as the Biodiverse data, and aligned as part of that process. Some sort of summary statistic needs to be calculated for each cell, usually the mean. Then the data need to be converted to a CSV format with coordinates that exactly match the Basedata group labels so they can be attached as group properties using the import process. The latter can be done by importing the rasters as their own basedatas, running numeric label statistics, exporting the results to CSV format and then attaching from there. Still not simple, and not easy when there are tens of rasters to process.
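The aggregation step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Biodiverse does it: the function name is made up, the coarse grid is assumed to have its origin at (0, 0), and each fine raster cell is represented by its centroid coordinates and value.

```python
import math
from collections import defaultdict
from statistics import mean


def aggregate_to_cells(points, cell_size):
    """Average fine-resolution values within coarser square cells.

    points is an iterable of (x, y, value) tuples, one per raster cell
    centroid. Returns a dict keyed by the (x, y) centre of each coarse
    cell, with the mean of the values whose centroids fall inside it.
    """
    buckets = defaultdict(list)
    for x, y, value in points:
        # Index of the coarse cell that contains this centroid.
        ix = math.floor(x / cell_size)
        iy = math.floor(y / cell_size)
        buckets[(ix, iy)].append(value)

    return {
        ((ix + 0.5) * cell_size, (iy + 0.5) * cell_size): mean(values)
        for (ix, iy), values in buckets.items()
    }


# Three fine cells aggregated to cells of size 50 (made-up values).
fine_cells = [(10, 10, 1.0), (20, 30, 3.0), (60, 10, 5.0)]
print(aggregate_to_cells(fine_cells, 50))
```

The coarse cell centres then need to be written out in exactly the same form as the basedata group names before the values can be attached, which is the fiddly part of the manual process.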

Now it is much easier

This process is greatly simplified in Biodiverse version 4, with early access via the 3.99_003 development release. (Access to releases is via the downloads page).

A set of rasters can be selected, imported and attached. Biodiverse takes care of all the spatial matching and runs the summary statistics. As a bonus, the imported data can also be attached to the project in the event the user wants to run other analyses on them.

Currently there is support for the mean, standard deviation, min, max etc. If there is demand for other statistics like the median or inter-quartile range then these can be added.

Any raster data supported by GDAL can be imported. Development has used geotiffs as they are the most common. The process could probably also be generalised to support other file formats like CSV and shapefile. It depends on demand and developer time.

The key criteria for the raster data are that they must be in the same coordinate system as your basedata and they must represent continuous data (i.e. not be numerical categories). The latter point is important because the group property analyses do not work with nominal/categorical values. If you need to summarise categorical data then use an indicator approach where each class is represented by its own raster, and that raster has values of 1 for where that class occurs, and zero elsewhere.
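The indicator approach can be sketched as follows. This is a minimal Python example using nested lists in place of real rasters. The function name is hypothetical and no-data handling is ignored. In practice you would do this in your GIS or raster library of choice and save one raster per class.

```python
def indicator_layers(grid):
    """Split a categorical grid into one 0/1 indicator grid per class.

    grid is a list of rows, each a list of class codes. Returns a dict
    mapping each class code to a grid of the same shape holding 1 where
    that class occurs and 0 elsewhere.
    """
    classes = sorted({value for row in grid for value in row})
    return {
        cls: [[1 if value == cls else 0 for value in row] for row in grid]
        for cls in classes
    }


# A 2x2 grid with three classes (made-up values).
layers = indicator_layers([[1, 2], [2, 3]])
for cls, layer in layers.items():
    print(cls, layer)
```

A useful side effect is that the mean of an indicator raster within a basedata cell is the proportion of that cell's raster cells occupied by the class.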

How it works

Some screenshots are probably the best means of showing the process.

In these examples I import two data sets from WorldClim at a 5 arc minute resolution, the Annual Mean Temperature and Mean Diurnal Range. These are just the first two of the Bioclim layers provided by WorldClim. The data have been projected into a Lambert Conic Conformal coordinate system to match the basedata being used (the example data that come with Biodiverse) and have been cropped to the Australian extent.

Annual rainfall from WorldClim2 for Australia, using a Lambert Conic Conformal projection. Brown is low, blue is high.

The data are going to be attached to the example data that come with Biodiverse.

The process is accessed via the Basedata menu.

Rasters are selected from a folder at the same time as the options. In this case the mean and standard deviation stats will be attached as properties to the selected basedata, and the intermediate basedatas will be added to the project so they can be visualised and/or analysed further.

The process provides some general feedback when it completes (successfully or otherwise).

The outputs tab shows the intermediate basedatas have been added. Each contains a spatial analysis that was used to calculate the statistics.

The property data cannot be visualised directly (yet). To explore them without using an analysis you need to open the View Labels window for the basedata they were attached to and control click on a cell using your mouse.

The popup window shows the properties for the cell that was clicked on (you will need to change the list being shown to be Properties).

The group properties can be analysed in a spatial or cluster analysis. Look for the calculations starting with "Group properties" under the Element Properties set. In this case the analyses follow those in the publications linked at the top of this post, calculating summary stats and Gi* hotspot stats for each branch in a cluster tree.

And here is a visualisation of the Gi* hotspot stat for branches cut at 0.4744 from the tips (you can slide the blue line to change this value). The interpretation depends on your significance threshold but Gi* scores are z-scores so, for a two-tailed test where values could be high or low, values above 1.96 are hotspots at alpha=0.05, while those below -1.96 are coldspots.
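For thresholds other than alpha=0.05, the critical z-score can be calculated from the standard normal distribution. The Python sketch below does this with the standard library and classifies a score accordingly. The function names and class labels are made up for the example, and it assumes the two-tailed test described above.

```python
from statistics import NormalDist


def gistar_threshold(alpha=0.05):
    """Critical z-score for a two-tailed test at the given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def classify_gistar(z, alpha=0.05):
    """Label a Gi* z-score as a hotspot, coldspot or neither."""
    threshold = gistar_threshold(alpha)
    if z > threshold:
        return "hotspot"
    if z < -threshold:
        return "coldspot"
    return "not significant"


# Made-up scores for three branches.
for score in (2.3, -2.3, 1.0):
    print(score, classify_gistar(score))
```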

And here are the same clusters but this time coloured by the mean stat across all groups in the sample. (The naming scheme results in lots of "means").

And here is an example of the imported raster data (diurnal range) that were used to generate the group properties.

This image demonstrates what can happen when coarse resolution data are used. The 5 arc minute resolution translates to approximately 18 km when projected. The cells in the basedata containing the species observations are 50 km. The system uses raster cell centroid coordinates to allocate their values to a basedata cell and there are clearly alignment offsets here. There are many sources of finer resolution data you can use.

Shawn Laffan

02-May-2022

Publications using Biodiverse in 2021

2022-03-12T11:24:00.001+11:00

2022 is now in full swing, so here is a list of publications from 2021 that used Biodiverse.

If you want to see the full list (155 at the time of writing), then go to https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList

Shawn Laffan
12-Mar-2022

Anguiano-Constante, M.A., Dean, E., Starbuck, T., Rodríguez, A. and Munguía-Lino, G. (2021) Diversity, species richness distribution and centers of endemism of Lycianthes (Capsiceae, Solanaceae) in Mexico. Phytotaxa, 514, 39-60.

Bharti, D.K., Edgecombe, G.D., Karanth, K.P. and Joshi, J. (2021) Spatial patterns of phylogenetic diversity and endemism in the Western Ghats, India: A case study using ancient predatory arthropods. Ecology and Evolution, 11, 16499-16513.

Camacho, G.P., Loss, A.C., Fisher, B.L., Blaimer, B.B. (2021) Spatial phylogenomics of acrobat ants in Madagascar—Mountains function as cradles for recent diversity and endemism. Journal of Biogeography, 48, 1706-1719.

Cheikh Albassatneh, M., Escudero, M., Monnet, A‐C., et al. (2021) Spatial patterns of genus‐level phylogenetic endemism in the tree flora of Mediterranean Europe. Diversity and Distributions, 27, 913–928.

Earl, C., Belitz, M.W., Laffan, S.W., Barve, V., Barve, N., Soltis, D.E., Allen, J.M., Soltis, P.S., Mishler, B.D., Kawahara, A.Y., & Guralnick, R. (2021) Spatial phylogenetics of butterflies in relation to environmental drivers and angiosperm diversity across North America. iScience, 102239.

Flores-Tolentino M., Beltrán-Rodríguez L., Morales-Linares J., et al. (2021) Biogeographic regionalization by spatial and environmental components: Numerical proposal. PLoS ONE 16, e0253152.

Furtado, S.G. and Menini Neto, L. (2021) What is the role of topographic heterogeneity and climate on the distribution and conservation of vascular epiphytes in the Brazilian Atlantic Forest? Biodiversity and Conservation, 30, 1415–1431.

Garcia-Rodriguez, A., Luna-Vega, I., Yáñez-Ordóñez, O., Ramírez-Martínez, J.C., Espinosa, D., and Contreras-Medina, R. (2021). Patrones de Distribución de las Abejas del Bosque Mesófilo de Montaña de la Sierra Madre Oriental, México. Southwestern Entomologist, 46, 1021-1036.

González-Orozco, C.E. (2021) Biogeographical regionalisation of Colombia: a revised area taxonomy. Phytotaxa, 484, 3.

González-Orozco, C.E. (2021) Regiones biogeográficas del género Cinchona L. (Rubiaceae- Cinchoneae). Revista Novedades Colombianas, 16, 135-156.

González-Orozco, C. E., Sosa, C. C., Thornhill, A. H., and Laffan, S. W. (2021). Phylogenetic diversity and conservation of crop wild relatives in Colombia. Evolutionary Applications, 14, 2603-2617.

Gosper, C.R., Coates, D.J., Hopper, S.D., Byrne, M., Yates, C.J. (2021) The role of landscape history in the distribution and conservation of threatened flora in the Southwest Australian Floristic Region. Biological Journal of the Linnean Society, 133, 394–410.

Hammer, T.A., Renton, M., Mucina, L. and Thiele, K. (2021) Arid Australia as a source of plant diversity: the origin and climatic evolution of Ptilotus (Amaranthaceae). Australian Systematic Botany, 34, 570-586.

Hao, T., Elith, J., Guillera-Arroita, G., Lahoz-Monfort, J. J., & May, T. W. (2021). Enhancing repository fungal data for biogeographic analyses. Fungal Ecology, 53, 101097.

Kougioumoutzis, K., Kokkoris, I.P., Panitsa, M., Kallimanis, A., Strid, A., and Dimopoulos, P. (2021) Plant Endemism Centres and Biodiversity Hotspots in Greece. Biology, 10, 72.

Murali, G., Gumbs, R., Meiri, S. and Roll, U. (2021) Global determinants and conservation of evolutionary and geographic rarity in land vertebrates. Science Advances, 7, eabe5582.

Ortiz-Brunel, J.P., Munguía-Lino, G., Castro-Castro, A. and Rodríguez, A. (2021) Biogeographic analysis of the American genus Echeandia (Agavoideae: Asparagaceae). Revista Mexicana de Biodiversidad 92, e923739.

Paz, A., Brown, J.L., Cordeiro, C.L.O., Aguirre‐Santoro, J., Assis, C., Amaro, R.C., Raposo do Amaral, F., Bochorny, T., Bacci, L.F., Caddah, M.K., d’Horta, F., Kaehler, M., Lyra, M., Grohmann, C.H., Reginato, M., Silva‐Brandão, K.L., Freitas, A.V.L., Goldenberg, R., Lohmann, L.G., Michelangeli, F.A., Miyaki, C., Rodrigues, M.T., Silva, T.S. and Carnaval, A.C. (2021) Environmental correlates of taxonomic and phylogenetic diversity in the Atlantic Forest. Journal of Biogeography, 48, 1377-1391.

Pereira, L.C., Chautems, A. and Menini Neto, L. (2021) Biogeography and Conservation of Gesneriaceae in the Serra da Mantiqueira, Southeastern Region of Brazil. Brazilian Journal of Botany, 44, 239–248.

Pinedo-Escatel, J.A., Aragón-Parada, J., Dietrich, C.H., Moya-Raygoza, G., Zahniser, J.N. and Portillo, L. (2021) Biogeographical evaluation and conservation assessment of arboreal leafhoppers in the Mexican Transition Zone biodiversity hotspot. Diversity and Distributions, 27, 1051-1065.

Suissa, J.S., Sundue, M.A. and Testo, W.L. (2021) Mountains, climate and niche heterogeneity explain global patterns of fern diversity. Journal of Biogeography, 48, 1296-1308.

Yang, X., Liu, B., Bussman, R.W., Guan, X., et al. (2021) Integrated plant diversity hotspots and long-term stable conservation strategies in the unique karst area of southern China under global climate change. Forest Ecology and Management, 498, 119540.

Xu, M.‐Z., Yang, L.‐H., Kong, H.‐H., Wen, F. and Kang, M. (2021) Congruent spatial patterns of species richness and phylogenetic diversity in karst flora: the case study of Primulina (Gesneriaceae). Journal of Systematics and Evolution, 59, 251-261.

Xue, T., Gadagkar, S.H., Albright, T.P., Yang, X., Li, J., Xia, C., Wu, J., and Yu, S. (2021) Prioritizing conservation of biodiversity in an alpine region: Distribution pattern and conservation status of seed plants in the Qinghai-Tibetan Plateau. Global Ecology and Conservation, 32, e01885.

Zhang, Y., Chen, J. and Sun, H. (2021) Alpine speciation and morphological innovations: revelations from a species-rich genus in the Northern Hemisphere. AoB PLANTS, 13, 3, plab018.

Zhang, Y., Qian, L., Spalink, D., Sun, L., Chen, J. and Sun, H. (2021) Spatial phylogenetics of two topographic extremes of the Hengduan Mountains in southwestern China and its implications for biodiversity conservation. Plant Diversity, 43, 181-191.

Zhu, Z-X, Harris, A.J., Nizamani, M.M., Thornhill, A.H., Scherson, R.A. and Wang, H-F. (2021) Spatial phylogenetics of the native woody plant species in Hainan, China. Ecology and Evolution, 11, 2100-2109.