Tuesday 5 November 2024

Randomisations: Curveball algorithm now in Biodiverse

Biodiverse supports a range of randomisations to assess significance of analysis results.  Most use cases in the published literature use the rand_structured algorithm, which is explained in this post, but several common algorithms are supported.  

One of the design principles of Biodiverse is to give the user choice.  To that end, the curveball algorithm is available from version 5.  

The publication describing Curveball is Strona et al. (2014).  The name is derived from a baseball card trading pastime popular in North America.  

The curveball algorithm is applied to a data set of items (species, genera, words, or some other set of identifiers).  In the common biodiversity case this is a sites by species matrix, transformed to a list of lists, e.g. a list of site lists, where each site list comprises its species (or vice versa).  These lists can be considered as sets.  At each iteration, two lists (sets of items) are randomly selected.  Any items found in both sets are ignored.  The rest can be swapped between the two sets, with the number swapped limited by the smaller of the two counts of unique items, which ensures that after swapping each set retains the same number of items it started with.  As an example, consider the case where set 1 has ten items, set 2 has eight, and six items are common to both.  Set 1 then has four unique items and set 2 has two, so two items can be swapped between the two sets.
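A single swap step, as described above, can be sketched in a few lines of Python.  This is only an illustration that follows the text of this post; it is not the Biodiverse code, and the function name curveball_trade is made up for the example.

```python
import random


def curveball_trade(set_a, set_b, rng):
    """Swap as many non-shared items as possible between two sets, in place.

    Hypothetical helper for illustration only (not part of Biodiverse).
    Returns the number of items moved in each direction.
    """
    # Items found in both sets are ignored; only the unique ones can move.
    only_a = sorted(set_a - set_b)
    only_b = sorted(set_b - set_a)
    # The smaller count of unique items limits the swap, so set sizes are kept.
    n_swaps = min(len(only_a), len(only_b))
    out_of_a = rng.sample(only_a, n_swaps)
    out_of_b = rng.sample(only_b, n_swaps)
    set_a.difference_update(out_of_a)
    set_a.update(out_of_b)
    set_b.difference_update(out_of_b)
    set_b.update(out_of_a)
    return n_swaps


# The example from the text: ten items, eight items, six shared.
set_1 = set(range(10))
set_2 = set(range(6)) | {10, 11}
moved = curveball_trade(set_1, set_2, random.Random(1))
print(moved, len(set_1), len(set_2))  # prints "2 10 8"
```

Which two of the four unique items leave set 1 depends on the random number generator, but both sets always keep their original sizes.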

The general formula for the number of possible swaps at an iteration is min(|A|, |B|) - |A ∩ B|, where A and B are the two sets being considered, and the vertical bars denote the sizes of the sets (the numbers of items they contain).  If one prefers to think in terms of dissimilarity measures where a is the number of shared items, b the number unique to set 1 and c the number unique to set 2, then the formula is min(b, c).  Purely as an aside, this term also appears in Simpson's dissimilarity index, min(b, c) / (a + min(b, c)).
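For those who like to check such things, the two forms of the formula can be compared directly.  The short sketch below uses hypothetical function names and the ten/eight/six example from above; the Simpson calculation is included only to show where the min(b, c) term sits.

```python
def max_swaps(set_a, set_b):
    """Set form: min(|A|, |B|) - |A intersect B|."""
    return min(len(set_a), len(set_b)) - len(set_a & set_b)


def max_swaps_abc(set_a, set_b):
    """Dissimilarity form: min(b, c), the smaller count of unique items."""
    b = len(set_a - set_b)
    c = len(set_b - set_a)
    return min(b, c)


def simpson_dissimilarity(set_a, set_b):
    """Simpson's dissimilarity, min(b, c) / (a + min(b, c))."""
    a = len(set_a & set_b)
    m = max_swaps_abc(set_a, set_b)
    # Two empty sets have no defined dissimilarity; return 0 by convention here.
    return m / (a + m) if (a + m) > 0 else 0.0


set_1 = set(range(10))            # ten items
set_2 = set(range(6)) | {10, 11}  # eight items, six shared with set_1
print(max_swaps(set_1, set_2), max_swaps_abc(set_1, set_2))  # prints "2 2"
print(simpson_dissimilarity(set_1, set_2))                   # prints "0.25"
```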

The curveball algorithm is related to the independent swaps algorithm.  The chief advantage of curveball over independent swaps is that, because it swaps as many items as it can at each iteration, it converges on a randomised result much faster.  Curveball also avoids the main pitfall of the independent swaps algorithm where a pair can be selected that cannot be swapped, thus "wasting" an iteration (swap attempt).  

Curveball does, however, have the same issue that independent swaps has in that the user needs to specify the number of iterations over which swaps will be attempted.  Too few and the resulting matrix will not be sufficiently random.  Too many and time will be "wasted".  This is addressed in Biodiverse by optionally tracking which of the original matrix entries have been swapped, and stopping when all have been done (the stop_on_all_swapped parameter).  This has some overhead in the tracking but generally this should be balanced by the time saved by running fewer iterations overall.  For those interested, the default number of swaps is the same as for the independent swaps algorithm, which is twice the number of non-zero matrix entries (twice the sum of the lengths of all lists).
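The stopping rule can be sketched as follows.  This is a minimal illustration under stated assumptions, not the Biodiverse implementation: the function name and arguments are hypothetical, the default is read here as a cap on swap attempts, and an entry counts as "swapped" once it has moved out of its original list.

```python
import random


def curveball_randomise(lists, max_attempts=None, stop_on_all_swapped=True, seed=None):
    """Illustrative curveball loop (hypothetical, not the Biodiverse code).

    Returns the randomised sets and the number of attempts used.
    """
    rng = random.Random(seed)
    sets = [set(items) for items in lists]
    if len(sets) < 2:
        return sets, 0
    if max_attempts is None:
        # Assumed default: twice the sum of the lengths of all lists.
        max_attempts = 2 * sum(len(s) for s in sets)
    # Original (list index, item) entries that have not yet been moved.
    unswapped = {(i, item) for i, s in enumerate(sets) for item in s}
    attempts = 0
    while attempts < max_attempts:
        if stop_on_all_swapped and not unswapped:
            break  # every original entry has moved at least once
        attempts += 1
        i, j = rng.sample(range(len(sets)), 2)
        only_i = sorted(sets[i] - sets[j])
        only_j = sorted(sets[j] - sets[i])
        n = min(len(only_i), len(only_j))
        if n == 0:
            continue  # one set is contained in the other, nothing to trade
        out_i = rng.sample(only_i, n)
        out_j = rng.sample(only_j, n)
        sets[i].difference_update(out_i)
        sets[i].update(out_j)
        sets[j].difference_update(out_j)
        sets[j].update(out_i)
        # Discarding entries that are not tracked (already moved) is harmless.
        unswapped.difference_update((i, item) for item in out_i)
        unswapped.difference_update((j, item) for item in out_j)
    return sets, attempts


sites = [["a", "b", "c"], ["b", "d"], ["a", "e", "f", "g"], ["c"]]
randomised, used = curveball_randomise(sites, seed=42)
print([len(s) for s in randomised], used)
```

Some entries can never move (for example an item that occurs in every list), so in this sketch the attempt cap is still needed as a backstop even when tracking is on.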

Accessing the curveball algorithm in Biodiverse is the same as for any of the randomisations.  Open the Randomisation tab, select rand_curveball as the randomise function, select the number of randomisation iterations and any other algorithm specific parameters, then press Go (see image below).  The results are in the same format as always (e.g. see here, here and here).

Since it is just another algorithm, all the common options are available (another new change in version 5 is that more options are available across all algorithms in the GUI - see issue 946).  Users can define regions that are randomised separately before reassembly for analysis, including some that are not to be randomised.  One can also add some of the randomised results to the project to inspect them.

In terms of speed, curveball is faster than rand_structured.  This is largely due to there being less book-keeping required.  However, as with independent swaps, curveball can only be applied on a per-cell basis.  It does not extend to spatially structured randomisations like rand_structured does (one could ensure swap candidates come from within some local neighbourhood, but this is a different model to something like a diffusion process or a random walk).

All that is needed to run the curveball algorithm is to choose rand_curveball as the "Randomise function".  Other parameters are set as usual.


And that's pretty much it for the description.  If you want to read more randomisation related blog posts then check out the posts tagged with the randomisation label.  


----

Shawn Laffan

05-Nov-2024


For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/  


For a list of some of the analyses Biodiverse has been used for, see https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList 


You can also join the Biodiverse-users mailing list at https://groups.google.com/group/Biodiverse-users or start a discussion at https://github.com/shawnlaffan/biodiverse/discussions 


Monday 4 November 2024

Plotting indices with divergent colour schemes

Many diversity indices have numerical distributions that are divergent, i.e. they are centred on some value and the interesting bit is the magnitude of the differences away from that value.  A simple example is z-scores, where the data are centred on a value of zero and the values indicate how many standard deviations above or below the expected value the input data are.  These have been plotted using a divergent scheme since version 4.1, as described here.

However, one can also have indices that are simple differences, and also ratios where 1 is the centre of the distribution, and values of 1/2 and 2 are the same magnitude difference from the centre.  The relative phylogenetic diversity and endemism indices are examples of the latter.  

From version 5, Biodiverse plots difference and ratio indices using a divergent colour scheme.    These use the same colour range as the z-scores but plotted along a continuous scale instead of as ordinal classes.  

The colouring happens automatically based on metadata stored with the indices (incidentally, much of the GUI is built using this metadata).  

Colours are also scaled so the most extreme "high" colour is equivalent to the most extreme "low" colour, i.e. if the range of difference values is -5 to 1 then the colours are assigned to the range -5 to 5, and the same for -1 to 5.  This is also accounted for when the data are log scaled or percentile trimmed to de-emphasise extreme values.  
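The symmetric scaling for difference indices amounts to the following.  This is a sketch of the arithmetic only, with a hypothetical function name; it says nothing about how Biodiverse implements it internally.

```python
def divergent_range(lo, hi, centre=0.0):
    """Expand (lo, hi) so it is symmetric about the centre value.

    Hypothetical helper: the larger of the two distances from the centre
    is used on both sides, so the extreme colours are equivalent.
    """
    extent = max(abs(lo - centre), abs(hi - centre))
    return (centre - extent, centre + extent)


print(divergent_range(-5, 1))  # prints "(-5.0, 5.0)"
print(divergent_range(-1, 5))  # prints "(-5.0, 5.0)"
```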

A useful point to note is that the colour schemes can be flipped, so if one prefers blue as extreme positive values then this can be done under the Map menu at the left of the display.  

An example is below to compare the old behaviour with the new.  


Prior to version 5, ratio data were plotted using the same colour scheme as any other data, making it difficult to interpret the relative magnitude of the index values across cells.  These are the Relative Phylogenetic Diversity results for the Acacia data set of Mishler et al. (2014), scaled to emphasise the inner 90% of the distribution (i.e. the upper 5% are assigned the same colour, so too the lower 5%).  This is the interval [0.406, 0.896], which means red cells include ratios <1 which is not ideal.  Compare with the next figure.    




The same data as in the previous figure, but now using a divergent colour scheme.  Biodiverse knows this is a ratio index, so assigns colours accordingly.  Red cells have ratios exceeding 1, blue cells less than 1.  Ratios close to 1 are in yellow.  The colours are assigned to the interval [0.406,2.463], where 2.463=1/0.406.  This means one can be sure red cells have ratios exceeding 1, and there is less chance of misinterpreting the results.  
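For ratio indices the same idea applies on a log scale, since 1/2 and 2 are the same distance from 1.  The sketch below reproduces the interval quoted above from the trimmed data range; the function names are hypothetical and the mapping to a position along the colour ramp is an assumption made for illustration.

```python
import math


def divergent_ratio_range(lo, hi):
    """Expand a range of positive ratios so it is symmetric about 1 in log space."""
    extent = max(abs(math.log(lo)), abs(math.log(hi)))
    return (math.exp(-extent), math.exp(extent))


def ramp_position(ratio, lo, hi):
    """Position of a ratio along the colour ramp, 0 (low end) to 1 (high end).

    Assumes a linear mapping of log(ratio), with a ratio of 1 at the middle.
    """
    extent = max(abs(math.log(lo)), abs(math.log(hi)))
    if extent == 0:
        return 0.5  # all values equal 1, so everything sits at the centre
    pos = 0.5 + math.log(ratio) / (2 * extent)
    return min(1.0, max(0.0, pos))  # clamp trimmed extremes to the ramp ends


low, high = divergent_ratio_range(0.406, 0.896)
print(round(low, 3), round(high, 3))     # prints "0.406 2.463"
print(ramp_position(1.0, 0.406, 0.896))  # prints "0.5"
```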





It is not shown here, but the metadata is also stored for tree-based indices so divergent colours are assigned to the tree branches where appropriate.  More details about that process are in this post.  


----

Shawn Laffan

04-Nov-2024




GUI: Polygon overlays (and underlays)

Since its first release, Biodiverse has supported plotting of polygon and polyline feature class data (from shapefiles).  The support has been very basic, given users could only plot the outlines of polygons, although the colours could be changed.  

This has worked well overall, but there are times when the linework from the feature data gets in the way of the cells being plotted.  There are also times when it is useful to plot polygons as solid fills instead of just as the outline.  From version 5 of Biodiverse it is possible to do just this.  

The process is relatively simple.  If a polygon overlay is loaded then it is listed twice in the selection window, once for lines and once for solid fill (with no outline).  The default choice is polylines, which is the current behaviour.  Users then have the option of plotting one overlay above or below the cells.



Colours can be assigned in the usual way.  In this next selection window, the polygon data will be displayed below the cells using a grey colour (grey is quite useful as it does not visually dominate when coloured cells are used).  




Polygon data are displayed as a solid grey fill, under the cells.  In this case it makes it more obvious where there are unsampled regions.  (Cell outlines have also been turned off using the Map menu.)


Other uses for polygon overlays are in plotting ocean polygons over terrestrial cells to cover over parts of cells that are in the sea (and vice versa for marine data).  


There is no doubt more work to be done, for example plotting more than one layer at a time, but it is a useful improvement.  If more complex plotting is needed then this is when it is best to leverage the power of GIS software.  


----

Shawn Laffan

04-Nov-2024




Thursday 29 February 2024

Publications using Biodiverse in 2023

2024 is moving quickly, so here is a list of publications from 2023 that used Biodiverse.

If you want to see the full list (211 at the time of writing), then go to https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList


For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/


  • Aragón-Parada, J., Carrillo-Reyes, P., Rodríguez, A., Munguía-Lino, G., Salinas-Rodríguez, M. M. and De-Nova, J. A. (2023) Spatial phylogenetics of the flora in the Sierra Madre del Sur, Mexico: Evolutionary puzzles in tropical mountains. Journal of Biogeography, 50, 1679-1691.

  • Copilaș-Ciocianu, D., Sidorov, D., & Šidagytė-Copilas, E. (2023). Global distribution and diversity of alien Ponto-Caspian amphipods. Biological Invasions, 25, 179-195.

  • de Pedro, D., Ceccarelli, F.S., Vandame, R. et al. (2023) Congruence between species richness and phylogenetic diversity in North America for the bee genus Diadasia (Hymenoptera: Apidae). Biodiversity and Conservation, 32, 4445–4459.

  • Dlamini, W.M.D. and Loffler, L. (2023). Tree Species Diversity and Richness Patterns Reveal High Priority Areas for Conservation in Eswatini. In: Dhyani, S., Adhikari, D., Dasgupta, R., Kadaverugu, R. (eds) Ecosystem and Species Habitat Modeling for Conservation and Restoration. Springer, Singapore.

  • Erst, A.S., Baasanmunkh, S., Tsegmed, Z. et al. (2023) Hotspot and conservation gap analysis of endemic vascular plants in the Altai Mountain Country based on a new global conservation assessment. Global Ecology and Conservation, 47, e02647

  • Fernandes, N.B.G., Moraes, A.M. and Milward-de-Azevedo, M.A. (2023) Diversity of the Passiflora L. in the Serra do Mar ecoregion and the relationships with environmental gradients, South and Southeast, Brazil. Acta Botanica Brasilica, 37, e20220314.

  • Flores-Argüelles, A., López-Ferrari, A.R., & Espejo-Serna, A. (2023). Geographic distribution and endemism of Bromeliaceae from the Western Sierra-Coast region of Jalisco, Mexico. Botanical Sciences, 101, 527-543.

  • Flores-Tolentino, M. et al. (2023). Delimitación geográfica y florística de la provincia fisiográfica de la Depresión del Balsas, México, con énfasis en el bosque tropical estacionalmente seco. Revista mexicana de biodiversidad, 94, e944985.

  • Francisco-Gutiérrez, A., Ruiz-Sanchez, E. and Lira-Noriega, A. (2023) Biogeography and conservation assessments of the species of Lamourouxia (Orobanchaceae). Acta Botanica Mexicana, 130, e2213.

  • González-Orozco, C.E. (2023) Unveiling evolutionary cradles and museums of flowering plants in a neotropical biodiversity hotspot. Royal Society Open Science, 10, 230917.

  • González-Orozco, C.E., Diaz-Giraldo, R.A. and Rodriguez-Castañeda, C. (2023) An early warning for better planning of agricultural expansion and biodiversity conservation in the Orinoco high plains of Colombia. Frontiers in Sustainable Food Systems, 7.

  • González-Orozco, C., Osorio-Guarín, J., & Yockteng, R. (2023). Phylogenetic diversity of cacao (Theobroma cacao L.) genotypes in Colombia. Plant Genetic Resources, 20, 203-214.

  • González-Orozco, C.E. & Parra-Quijano, M. (2023) Comparing species and evolutionary diversity metrics to inform conservation. Diversity and Distributions, 29, 224-231.

  • González-Orozco, C. E., Reyes-Herrera, P. H., Sosa, C. C., Torres, R. T., Manrique-Carpintero, N. C., Lasso-Paredes, Z., Cerón-Souza, I. and Yockteng, R. (in press). Wild relatives of potato (Solanum L. sec. Petota) poorly sampled and unprotected in Colombia. Crop Science.

  • Guo, WY., Serra-Diaz, J.M., Eiserhardt, W.L. et al. (2023) Climate change and land use threaten global hotspots of phylogenetic endemism for trees. Nature Communications, 14, 6950.

  • Mardones, D. and Scherson, R.A. (2023) Hotspots within a hotspot: evolutionary measures unveil interesting biogeographic patterns in threatened coastal forests in Chile. Botanical Journal of the Linnean Society, 202, 433–448.

  • McCurry, M.R., Park, T., Coombs, E.J., Hart, L.J., Laffan, S. (2023) Latitudinal gradients in the skull shape and assemblage structure of delphinoid cetaceans. Biological Journal of the Linnean Society, 138, 470-480.

  • Miller, J.T., Prentice, E., Bui, E.N., Knerr, N., Mishler, B.D., Schmidt-Lebuhn, A.N., González-Orozco, C.E., Laffan, S. W. (2023). Banksia (Proteaceae) contains less phylogenetic diversity than expected in Southwestern Australia. Journal of Systematics and Evolution, 61, 957-966.

  • Molina-Paniagua, M.E., Alves de Melo, P.H., Ramírez-Barahona, S., Monro, A.K., Burelo-Ramos, C.M., Gómez-Domínguez, H., et al. (2023) How diverse are the mountain karst forests of Mexico? PLoS ONE 18, e0292352.

  • Nicolau, G.K. and Edwards, S. (2023) Diversity and endemism of Southern African Gekkonids linked with the escarpment has implications for conservation priorities. Diversity, 15, 306.

  • Ortiz-Brunel J.P., Ochoterena H., Moore M.J., Aragón-Parada J., Flores J., Munguía-Lino G., Rodríguez A., Salinas-Rodríguez M.M. and Flores-Olvera H. (2023) Patterns of Richness and Endemism in the Gypsicolous Flora of Mexico. Diversity, 15, 522.

  • Ramírez-Verdugo, P., Tapia, A., Forest, F. and Scherson, R.A. (2023) Evolutionary diversity of the endemic genera of the vascular flora of Chile and its implications for conservation. PLoS ONE 18(7): e0287957. https://doi.org/10.1371/journal.pone.0287957

  • Ruiz-Sanchez, E., Munguía-Lino, G., Pianissola, E.M., Ely, F. and Clark, L.G. (2023) Richness and endemism in Chusquea subg. Swallenochloa (Poaceae), a Neotropical subgenus adapted to temperate conditions. Phytotaxa, 609, 180-194.

  • Villaseñor, J. L., Ortiz, E., & Hernández-Flores, M. M. (2023). The vascular plant species endemic or nearly endemic to Puebla, Mexico. Botanical Sciences, 101, 1207-1221.

  • Wang, C., Zhu, S., Jiang, X., Chen, S., Xiao, Y., Zhao, Y., Yan, Y. and Wen, Y. (2023) Spatio-temporal variation of species richness and phylogenetic diversity patterns for spring ephemeral plants in northern China. Global Ecology and Conservation, 48, e02752.

  • Ye, C. et al. (2023) Geographical distribution and conservation strategy of national key protected wild plants of China. iScience, 26, 107364.

  • Zhang, H., Chen, S.-C., Bonser, S.P., Hitchcock, T., & Moles, A.T. (2023). Factors that shape large-scale gradients in clonality. Journal of Biogeography, 50, 827-837

  • Zhou, R., Ci, X., Hu, J., Zhang, X., Cao, G., Xiao, J., Liu, Z., Li, L., Thornhill, A.H., Conran, J.G. and Li, J. (2023) Transitional areas of vegetation as biodiversity hotspots evidenced by multifaceted biodiversity analysis of a dominant group in Chinese evergreen broad-leaved forests. Ecological Indicators, 147, 110001


Shawn Laffan

29-Feb-2024



Saturday 3 February 2024

Map side menu: The tree plot controls are now a separate submenu, and some new features

From version 5 of Biodiverse the tree plot controls in the left side menu are now their own submenu.  This greatly simplifies the interface.

The displayed tree can now be exported, including the colours used when plotting the tree.  Previously the colours were not stored so this was not possible.  To export the colours corresponding to a specific cell, right click on that cell to fix the colouring in place.  This stops any further updates until another cell is clicked on.  The interface itself is unchanged, including the options to export the colours and an RGB geotiff of the spatial plot.  

In addition, there are several new plotting options that allow one to plot using equal and range weighted branch lengths.  

  

It is now possible to plot the tree using normal branch lengths, depth and also equal branch length and range weighted.  

The equal branch length tree is the alternate tree in the CANAPE protocol.

The range weighted tree can be used to understand how PE works.  

The range weighted equal branch length tree can be used to understand the RPE index used in CANAPE.   

----

Shawn Laffan

03-Feb-2024




Tree panels: colour the tree using any list from spatial outputs across the project

It has long been possible to colour the tree branches in the spatial tab.  However, it was only possible to use a list from the spatial analysis being plotted.  

From version 5 the interface has been changed to enable selection of any list from spatial outputs across the project.  Where before the system had a simple drop down list, it is now a menu with submenus for each basedata and then each of its spatial outputs. 


The tree colour list selection is now a menu that allows users to choose any list across all spatial outputs in the project 

As with most widgets in the GUI, the menu entries are described in the tooltip.  That text is duplicated below.  

The first (default) option shows the paths connecting the labels in the neighbour sets used for the analysis. When there is one such set all branches are coloured blue. When there are two such sets blue denotes branches only in the first set, red denotes those only in the second set, and black denotes those in both. From these one can see the turnover of branches between the groups (cells) in each neighbour set.
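The colour rule for the default option can be written as a small function.  This is a sketch of the logic as described in the tooltip text, using hypothetical names; the inputs are assumed to be the sets of branch names on the paths for each neighbour set.

```python
def branch_colours(branches_set1, branches_set2=None):
    """Assign a colour name to each branch on the neighbour set paths.

    Hypothetical helper for illustration only (not part of Biodiverse).
    """
    set1 = set(branches_set1)
    if not branches_set2:
        # One neighbour set: all of its branches are blue.
        return {branch: "blue" for branch in set1}
    set2 = set(branches_set2)
    colours = {}
    for branch in set1 | set2:
        if branch in set1 and branch in set2:
            colours[branch] = "black"  # shared by both neighbour sets
        elif branch in set1:
            colours[branch] = "blue"   # only in the first set
        else:
            colours[branch] = "red"    # only in the second set
    return colours


colours = branch_colours({"n1", "n2", "n3"}, {"n3", "n4"})
print(sorted(colours.items()))
# prints "[('n1', 'blue'), ('n2', 'blue'), ('n3', 'black'), ('n4', 'red')]"
```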

The next set of menu options are list indices in the spatial output that belongs to this tab.  The remainder are lists across other spatial outputs in the project, organised by their basedata objects.  These are in the same order as in the Outputs tab.  Basedatas and outputs with no list indices are not shown.

If a branch is not in the list then it is highlighted using a default colour (usually black).  If the selected output has no labels that are also on the tree then no highlighting is done (all branches remain black).

Right clicking on a group (cell) fixes the highlighting in place, stopping changes to the branch colouring as the mouse is hovered over other groups.  This allows the tree to be exported with the current colouring (another new option in version 5).

----

Shawn Laffan

03-Feb-2024




Trimming basedatas has been generalised

It has long been possible to trim the basedata labels to keep only those that match either the selected tree or selected matrix.  

From version 5 (actually 4.99_002 if you like development versions) it is possible to trim using a different basedata.  The interface has also been generalised in the process.    

There's not much to it, so here are some screenshots to demonstrate the process.  

Generalised trimming is accessed from the basedata menu





It has the usual interface where one can specify a new name.  "Trimming a clone" ensures it operates on a copy.  "Delete matching" allows one to invert the trim, i.e. if one wants to keep only the labels that do not match.




Any of the basedatas, trees or matrices in the project can be selected to use as the label source.  
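The effect of the two options can be sketched with plain label lists.  The function below is hypothetical and only mimics the behaviour described above; it returns a new list rather than modifying its input, in the spirit of trimming a clone.

```python
def trim_labels(basedata_labels, source_labels, delete_matching=False):
    """Return the labels kept after trimming against a label source.

    Hypothetical helper for illustration only (not part of Biodiverse).
    By default only labels found in the source are kept; with
    delete_matching=True the trim is inverted.
    """
    source = set(source_labels)
    if delete_matching:
        return [label for label in basedata_labels if label not in source]
    return [label for label in basedata_labels if label in source]


labels = ["Genus:sp1", "Genus:sp2", "Genus:sp3"]
tree_tips = ["Genus:sp1", "Genus:sp3", "Genus:sp9"]
print(trim_labels(labels, tree_tips))                        # prints "['Genus:sp1', 'Genus:sp3']"
print(trim_labels(labels, tree_tips, delete_matching=True))  # prints "['Genus:sp2']"
```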


----

Shawn Laffan

03-Feb-2024

