Monday, 8 April 2019

Reduce the spatial resolution of your data



A longstanding wish-list item for Biodiverse is the ability to decrease the resolution of the cells (groups).  For example, you might have imported your data at a 50 km resolution but want to see what happens to the analysis results when the data are aggregated to 100 km.

This has just been implemented for version 3.  If you are impatient then you can try it in the 2.99_003 development release.  The download link is at the end of this post (and please provide feedback if you do try it).

Some screenshots will probably help explain things.  A few more details follow them.




The original data (100,000 units on a side) with some lines from a shapefile overlaid. 


The interface is accessed through the BaseData menu, and generates a new BaseData object.


The interface allows control over the new name, the new resolution and the new origin.



In this example, the new cells will be 200,000 units on each side, with the cells aligning with an origin coordinate at (100,000, 100,000).  For these data it means that, if the cells span the coordinate (0,0), then it will be the centre of a cell.




And the new data at the reduced resolution (200,000 units on a side), again with the lines overlaid so they can be cross-referenced with the original data.  Note how the sample counts for each label are the same, but the variety scores (the number of groups each label is found in) are now smaller.


Nothing prevents the use of rectangular groups, except perhaps a cultural preference for square cells.
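The effect on sample counts and variety scores can be illustrated with a small Python sketch. It is not Biodiverse's own code. It assumes group coordinates are cell centres, as the (0,0) example implies, and that each cell includes its lower edge but not its upper edge. It also stores a basedata as a hypothetical dictionary that maps each cell centre to label sample counts. Cell sizes and origins are given per axis, so rectangular cells work in the same way.

```python
import math
from collections import defaultdict


def coarser_centre(coord, cell_size, origin):
    """Centre of the coarser cell containing coord along one axis.

    Assumes cells are half-open intervals [origin + k*size, origin + (k+1)*size).
    """
    k = math.floor((coord - origin) / cell_size)
    return origin + (k + 0.5) * cell_size


def aggregate(groups, cell_sizes, origins):
    """Aggregate {(x, y): {label: sample_count}} to coarser cells.

    cell_sizes and origins are per-axis tuples, so rectangular cells also work.
    """
    new_groups = defaultdict(lambda: defaultdict(int))
    for coords, labels in groups.items():
        key = tuple(
            coarser_centre(c, size, orig)
            for c, size, orig in zip(coords, cell_sizes, origins)
        )
        for label, count in labels.items():
            new_groups[key][label] += count
    return {key: dict(labels) for key, labels in new_groups.items()}


def label_stats(groups):
    """Per-label sample counts and variety (number of groups occupied)."""
    samples, variety = defaultdict(int), defaultdict(int)
    for labels in groups.values():
        for label, count in labels.items():
            samples[label] += count
            variety[label] += 1
    return dict(samples), dict(variety)


# Four hypothetical 100,000-unit cells, identified by their centres
fine = {
    (-50_000, -50_000): {"sp1": 2, "sp2": 1},
    (50_000, -50_000): {"sp1": 3},
    (-50_000, 50_000): {"sp1": 1, "sp2": 4},
    (50_000, 50_000): {"sp2": 2},
}
# Settings from the example: 200,000-unit cells aligned to (100,000, 100,000)
coarse = aggregate(fine, (200_000, 200_000), (100_000, 100_000))

print(list(coarse))         # one coarse cell, centred on (0, 0)
print(label_stats(fine))    # each label occupies 3 cells
print(label_stats(coarse))  # same sample counts, each label now in 1 cell
```

All four fine cells fall into the coarse cell centred on (0,0). The totals per label are unchanged, but each label now occupies one group instead of three.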

A few extra details:

The new cell sizes can only ever be multiples of the current cell sizes.  The data used in Biodiverse are commonly observations that have been aggregated to the groups (cells), and the original distributions are not kept.  There is no good way of disaggregating the data to restore the original distributions.  This means you can aggregate to coarser units, but not go the other way.

The origins must also be multiples of the cell sizes, for the same reason.

The GUI will snap any values to multiples of the cell size if users try to enter other values.
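The snapping rule can be sketched as follows. This sketch assumes that values are rounded to the nearest multiple of the current cell size, that a new cell size cannot fall below the current one, and that the current origin is 0. The post does not describe the exact rounding the GUI uses, and snap_to_multiple is a hypothetical name.

```python
def snap_to_multiple(value, step, minimum=None):
    """Round value to the nearest multiple of step, optionally enforcing a minimum.

    Python's round() sends exact ties to the even multiple.
    """
    snapped = round(value / step) * step
    if minimum is not None and snapped < minimum:
        snapped = minimum
    return snapped


current_size = 100_000
# New cell sizes are snapped and cannot be smaller than the current size
print(snap_to_multiple(230_000, current_size, minimum=current_size))  # 200000
print(snap_to_multiple(30_000, current_size, minimum=current_size))   # 100000
# Origins are snapped to multiples of the current cell size (current origin assumed 0)
print(snap_to_multiple(140_000, current_size))  # 100000
```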

Text axes cannot be aggregated (although it could be done if there were a need and a good way of doing so).

Axes with a cell size of zero, e.g. where data are used as points, can be aggregated to any coarser resolution.  The system generates a default for each such axis using the extent of that axis divided by 20 (so by default there will be 20 cells along each axis).
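The default can be written in a couple of lines. The sketch below implements only the rule stated above, which is the axis extent divided by 20. The coordinate values are made up for illustration.

```python
def default_cell_size(values, n_cells=20):
    """Default cell size for an axis with a cell size of zero.

    Follows the rule described above: the axis extent divided by 20.
    """
    extent = max(values) - min(values)
    return extent / n_cells


# Hypothetical x-coordinates of point records
xs = [1_000, 5_000, 3_000, 9_000]
print(default_cell_size(xs))  # 400.0
```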


Shawn Laffan
06-April-2019

If you want to try this out before version 3 is released then the 2.99_003 release can be accessed through the downloads page at https://github.com/shawnlaffan/biodiverse/wiki/Downloads

--------

For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse


To see what Biodiverse has been used for, see https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList


You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users or follow the google plus page:  https://plus.google.com/+BiodiverseSoftware


Tuesday, 2 April 2019

Publications using Biodiverse in 2018

We are well into 2019, so here is a list of publications that used Biodiverse in 2018. 

If you want to see the full list (100 at the time of writing), then go to https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList 

For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/ 

Shawn Laffan
02-Apr-2019

Bloomfield, N.J., Knerr, N. and Encinas-Viso, F. (2018) A comparison of network and clustering methods to detect biogeographical regions. Ecography, 41, 1-10.

Carta, A., Pierini, B., Roma-Marzio, F., Bedini, G. and Peruzzi, L. (2018) Phylogenetic measures of biodiversity uncover pteridophyte centres of diversity and hotspots in Tuscany. Plant Biosystems, 152, 831-839.

Dalrymple, R.L., Kemp, D.J., Laffan, S.W., White, T.E., Flores-Moreno, H., Hemmings, F.A., Hitchcock, T.D. and Moles, A.T. (2018) Abiotic and biotic predictors of macroecological patterns in bird and butterfly coloration. Ecological Monographs, 88, 204-224.

Di Virgilio, G., Wardell-Johnson, G.W., Robinson, T.P., Temple-Smith, D. and Hesforde, J. (2018) Characterising fine-scale variation in plant species richness and endemism across topographically complex, semi-arid landscapes. Journal of Arid Environments, 156, 59-68.

Elliott, M.J., Knerr, N.J. and Schmidt-Lebuhn, A.N. (2018) Choice between phylogram and chronogram can have a dramatic impact on the location of phylogenetic diversity hotspots. Journal of Biogeography, 45, 2190-2201.

Guedes, T.B., Sawaya, R.J., Zizka, A., Laffan, S.W., Faurby, S., Pyron, A., Bérnils, R.S., Jansen, M., Passos, P., Prudente, A.L.C., Cisneros-Heredia, D.F., Braz, H.B., Nogueira, C.d.C. and Antonelli, A. (2018) Patterns, biases and prospects in the distribution and diversity of Neotropical snakes. Global Ecology and Biogeography, 27, 14-21.

Laffan, S.W. (2018) Phylogeny-based measurements at global and regional scales. In R. Scherson and D. Faith (Eds.), Phylogenetic Diversity: Applications and Challenges in Biodiversity Science (pp. 111-129). Springer.

Link-Pérez, M.A. and Laffan, S.W. (2018) Fern and lycophyte diversity in the Pacific Northwest: Patterns and predictors. Journal of Systematics and Evolution, 56, 498-522.

López-Aguirre, C., Archer, M., Hand, S.J. and Laffan, S.W. (2018) Phylogenetic diversity, types of endemism and the evolutionary history of New World bats. Ecography, 41, 1955-1966.

Miu, I.V., Chisamera, G.B., Popescu, V.D., Iosif, R., Nita, A., Manolache, S., Gavril, V.D., Cobzaru, I. and Rozylowicz, L. (2018) Conservation priorities for terrestrial mammals in Dobrogea Region, Romania. ZooKeys, 792, 133-158.

Montaño-Arias, G., Luna-Vega, I., Morrone, J.J. and Espinosa, D. (2018) Biogeographical identity of the Mesoamerican dominion with emphasis on seasonally dry tropical forests. Phytotaxa, 376, 277-290.

Orsenigo, S. et al. (2018) Red Listing plants under full national responsibility: Extinction risk and threats in the vascular flora endemic to Italy. Biological Conservation, 224, 213-222.

Sosa, V., De-Nova, J.A. and Vásquez-Cruz, M. (2018) Evolutionary history of the flora of Mexico: Dry forests cradles and museums of endemism. Journal of Systematics and Evolution, 56, 523-536.

Spalink, D., Pender, J., Escudero, M., Hipp, A.L., Roalson, E.H., Starr, J.R., Waterway, M.J., Bohs, L. and Sytsma, K.J. (2018) The spatial structure of phylogenetic and functional diversity in the United States and Canada: An example using the sedge family (Cyperaceae). Journal of Systematics and Evolution, 56, 449-465.

Spalink, D. et al. (2018) Spatial phylogenetics reveals evolutionary constraints on the assembly of a large regional flora. American Journal of Botany, 105, 1938-1950.

Yap, J.-Y.S., Rossetto, M., Costion, C., et al. (2018) Filters of floristic exchange: How traits and climate shape the rain forest invasion of Sahul from Sunda. Journal of Biogeography, 45, 838-847.

Zhang, H., Bonser, S.P., Chen, S.-C., Hitchcock, T. and Moles, A.T. (2018) Is the proportion of clonal species higher at higher latitudes in Australia? Austral Ecology, 43, 69-75.


Thursday, 28 February 2019

Using the Run Exclusions dialogue

One of the processes users are often interested in is filtering their data after they have been imported.  This is often simpler than filtering a large table of input data into multiple versions for different purposes, especially if the input file is something like a 10 GB CSV file.


In Biodiverse this can be done using the Run Exclusions dialogue, accessed via the Basedata menu.

The main principle of this dialogue is to allow the user to select some set of properties, of either labels or groups, and then remove them from the basedata.  A simple example might be to remove all groups containing five or fewer labels, or labels with ranges exceeding some threshold.  More complex queries can also be specified using text matching (labels) or spatial conditions (for groups).
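The label-range case can be sketched as follows. The sketch represents each group as a set of labels, so a label's range is the number of groups it occurs in. The structure and the function name are hypothetical, and the dialogue itself offers many more conditions.

```python
from collections import Counter


def exclude_wide_ranging_labels(groups, max_range):
    """Remove labels that occur in more than max_range groups.

    groups maps a group name to a set of labels.
    Returns the trimmed groups and the set of deleted labels.
    """
    ranges = Counter(label for labels in groups.values() for label in labels)
    drop = {label for label, rng in ranges.items() if rng > max_range}
    trimmed = {name: labels - drop for name, labels in groups.items()}
    return trimmed, drop


groups = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b"},
    "g3": {"a"},
}
trimmed, dropped = exclude_wide_ranging_labels(groups, max_range=2)
print(sorted(dropped))  # ['a']
```

Note that g3 is left with no labels. Whether empty groups are then removed is a separate decision that this sketch does not make.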

One can also delete labels using the selection menu in the View Labels tab (see this previous post), but this does not apply to groups (unless you transpose the basedata so you can treat the groups as labels, and then transpose them back).

An important point to note is that the Run Exclusions dialogue does not trigger updates in any open View Labels tabs.  If you delete labels or groups from a basedata then you will need to close and re-open any View Labels tabs for that basedata to see the changes.

It is also worth noting that the aggregation of labels to groups is a one-way process.  The original input records cannot be recovered from a basedata object unless a cell size of zero is used (for numeric axis data) or there is only one record per text axis (when text axes are used).  It would be impractical to store the 10 GB of data from the example at the top of this post.


The rest of this post is just a set of examples.  The data are the example data provided with Biodiverse.  There are also details in the help system at https://github.com/shawnlaffan/biodiverse/wiki/SampleSession#excluding-data



The Exclusions dialogue is accessed through the Basedata menu.  Exclusions will apply to the currently selected Basedata object.  

View Labels tab showing the original distribution of data.  

This example uses a definition query to delete all groups whose centroid falls inside polygons where the state field has a value of 'Tas'.   

The system gives feedback about how much was deleted.  

Note the removal of most cells in Tasmania.  The reason not all cells are removed is that the centroid of some groups falls outside the polygon.  An overlap condition will be added one day, but until then the shapefile needs to be modified to catch these cases.
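The reason some Tasmanian cells survive can be shown with a standard ray-casting point-in-polygon test applied to cell centroids. This is a textbook technique used here for illustration, not necessarily the test Biodiverse runs. The polygon and the 1 x 1 cells are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices without holes."""
    inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i - 1], polygon[i]
        # Count crossings of a ray running from (x, y) towards +x
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


# A hypothetical polygon and three 1 x 1 cells identified by their centroids
polygon = [(0, 0), (2.4, 0), (2.4, 2), (0, 2)]
centroids = [(0.5, 0.5), (1.5, 1.5), (2.5, 1.5)]

to_delete = [c for c in centroids if point_in_polygon(c[0], c[1], polygon)]
print(to_delete)  # [(0.5, 0.5), (1.5, 1.5)]
```

The cell centred on (2.5, 1.5) spans x from 2 to 3, so 40% of it lies inside the polygon. It is still kept, because its centroid falls outside.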

This example will remove all groups with fewer than 6 species, or where the sample redundancy score is less than 0.1.

The feedback (based on a fresh copy of the basedata).

Not much is left in this case...
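The two conditions in the richness and redundancy example can be expressed as a simple filter. The sketch assumes that sample redundancy is calculated as 1 - (richness / sample count), which is zero when every label has exactly one sample. The data structure and the function names are hypothetical.

```python
def redundancy(label_counts):
    """Sample redundancy, assumed here to be 1 - richness / sample count."""
    samples = sum(label_counts.values())
    if samples == 0:
        return 0.0
    return 1 - len(label_counts) / samples


def groups_to_delete(groups, min_richness=6, min_redundancy=0.1):
    """Groups with fewer than min_richness labels OR redundancy below min_redundancy."""
    return sorted(
        name for name, counts in groups.items()
        if len(counts) < min_richness or redundancy(counts) < min_redundancy
    )


# Hypothetical groups mapping each label to its sample count
groups = {
    "well_sampled": {f"sp{i}": 3 for i in range(8)},    # 8 labels, 24 samples
    "poorly_sampled": {f"sp{i}": 1 for i in range(8)},  # 8 labels, 8 samples
    "species_poor": {"sp1": 10, "sp2": 10},             # 2 labels, 20 samples
}
print(groups_to_delete(groups))  # ['poorly_sampled', 'species_poor']
```

Because the conditions are joined with "or", a group only needs to fail one of them to be removed. This is why so little is left in the screenshot.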

This (highly contrived) regular expression will delete any label ending in sp followed by exactly one digit and then a 2.  It will delete Genus:sp12, Genus:sp22 up to Genus:sp92, but not labels like Genus:sp2, Genus:sp83 or Genus:sp222.  (Ignore the cursor between the 2 and the $.)

And the feedback.  Only two labels are deleted, as the only matching labels in this basedata are Genus:sp12 and Genus:sp22.
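The expression itself only appears in the screenshot, so the pattern below, sp\d2$, is a reconstruction from the description rather than a copy. The syntax is the same in Perl-compatible regex engines and in Python's re module, which this sketch uses to check which labels it would match.

```python
import re

# Reconstructed pattern: "sp", exactly one digit, then a 2 at the end of the label
pattern = re.compile(r"sp\d2$")

labels = ["Genus:sp12", "Genus:sp22", "Genus:sp92", "Genus:sp2", "Genus:sp83", "Genus:sp222"]
to_delete = [label for label in labels if pattern.search(label)]
print(to_delete)  # ['Genus:sp12', 'Genus:sp22', 'Genus:sp92']
```

Genus:sp222 is not matched because, after "sp", one digit and a 2, the $ anchor requires the end of the label, but a third character remains.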

That's about it, really.  If you have questions then they can be posted to the Biodiverse user group, or to the google plus page (until Google shuts that service down).

Shawn Laffan
28-Feb-2019





Saturday, 22 December 2018

Visualise matrices of spatial turnover

A long-standing feature in Biodiverse is the ability to interactively visualise spatial patterns of turnover.  This was described in Laffan (2011), but without detailed instructions for how to generate the data.

This blog post provides a few more details, as well as a video showing how to visualise and export the data.  Exported data can be used in, for example, NMDS analyses to relate turnover to environmental patterns (see González-Orozco et al. 2013 and González-Orozco et al. 2014).

One thing to note is that if you have a phylogeny selected then you can view which branches of the tree are shared or differ between the index and neighbour group (cell).  This is evident in the video, and is described in an earlier blog post: https://biodiverse-analysis-software.blogspot.com/2014/10/new-tree-plots-in-biodiverse.html 


1.  Generate the data


If you only want to build the matrix (or matrices) then select this option. 

2.  View the matrices

[Update 2019-01-29]  The matrix can be viewed by opening it from the Outputs tab.  The default display from the cluster analysis is the dendrogram and its associated spatial plot.

Notice the regions that become evident depending on which cells are selected.  For example, cells in South-West Western Australia are strongly related to other cells in that region, cells in Northern Australia are related across part of the Top End, while those in Tasmania extend into Victoria.
   


3.  Export the data



The data are exported via the Export menu on the left.  This can also be done using the export button in the Outputs tab when the matrix is selected.

Currently only delimited text is supported, but you can choose whether to use normal, sparse or GDM compatible output formats.  Hovering over the options gives more details about what they do.  
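The difference between a full ("normal") layout and a sparse layout can be sketched with Python's csv module. This is only a guess at the general shapes: normal is treated as a square matrix, and sparse as one row per pair of cells. The exact columns Biodiverse writes, and its GDM-compatible format, are not reproduced here, and the cell names and values are hypothetical.

```python
import csv
import io


def to_sparse_rows(matrix):
    """One row per stored pair: element1, element2, value."""
    return [[e1, e2, value] for (e1, e2), value in sorted(matrix.items())]


def to_normal_rows(matrix):
    """Full square layout; pairs with no stored value (e.g. the diagonal) are blank."""
    elements = sorted({e for pair in matrix for e in pair})
    rows = [[""] + elements]
    for r in elements:
        rows.append([r] + [matrix.get((r, c), matrix.get((c, r), "")) for c in elements])
    return rows


def to_csv(rows):
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical turnover values between three cells (stored once per pair)
turnover = {("cell_a", "cell_b"): 0.25, ("cell_a", "cell_c"): 0.8, ("cell_b", "cell_c"): 0.6}
print(to_csv(to_sparse_rows(turnover)))
print(to_csv(to_normal_rows(turnover)))
```

The sparse form grows with the number of pairs actually stored. The normal form always has one row and one column per cell, which can become very large for big data sets.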


Shawn Laffan

22-Dec-2018





Thursday, 13 December 2018

Import polygon and polyline data

The short summary

As of Biodiverse version 3, you can directly import polygon and polyline data from GIS feature data sets.  If you want to try it before v3 is released then it is in the current development release.

The more detailed explanation:

Ever since development started on Biodiverse it has been able to import spatial data as point records from delimited text files (e.g. CSV format).  The ability to import raster data was later added, as well as the capacity to import point records from more sources (spreadsheets and shapefiles).

However, taxon distribution records are also frequently provided in the form of polygon range maps.  One commonly used example is the IUCN Red List data, but there are many other sources.

The way to import such data using Biodiverse 2.1 and earlier is to process the data outside Biodiverse so they can be represented as points or tables.  This is done by intersecting them with a fishnet of polygons (also called a vector grid) that aligns with the cells that will be used in Biodiverse.  Once intersected, they can be converted to points, or their coordinates added to the attribute tables, using centroid calculations.  This is what was done in López-Aguirre et al. (2018), for example.

The fishnet approach is relatively simple if one is familiar with GIS operations, but is not something that should be done by hand when numerous taxa are to be analysed or different coordinate origins are being tried.  In such cases one can script the process, but for many this can be yet another thing to learn, and not something that is done in a hurry to meet a short deadline.  (Note that scripting is a very useful skill to have, and is portable beyond the language du jour one might first learn).

With some recent changes to the Biodiverse codebase, importation of polygon data is automated and part of the standard Biodiverse data import process.  As an added bonus, polyline data are also supported, so if you have data such as crustacean presences along stream segments then they can also be imported.

As another bonus, if you have a mix of point, line and polygon data then they can all be imported in one pass, providing they all have the fields or attributes you select.  If not then the system will throw an error.

The geometry attributes available for selection are :shape_x, :shape_y, :shape_z, :shape_m, :shape_area and :shape_length.  Not all files have all attributes.  Point files have neither :shape_area nor :shape_length, polyline files do not have :shape_area, and polygon files do not have :shape_length.  Many files do not have :shape_z or :shape_m axes; these are for 3D shapefiles or those with time measures.

A worked example

A worked example is probably the best way to show how to use it.  Those familiar with the process of importing data will note that it is almost the same as the current process, which is quite convenient to say the least.  

Some example data.  The polygons have no specific meaning.

As usual, select the data set (or data sets) to import.  Make sure you select Shapefile as the Format.  

This step is identical to the spreadsheet and delimited text imports.

Select the fields or attributes as appropriate.  The attributes that are visible (:shape_x, :shape_m, :shape_length, :shape_area etc) depend on properties of the first file selected. 

And here is the file imported.  There is only one taxon label in this data set, so there is not much more to show, but once imported the data are analysed like any other.

How does it work?

It is essentially an automation of the process described above.

First, a fishnet grid of polygons is generated to match the cell size of the BaseData object being imported into.
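Generating the fishnet amounts to laying out axis-aligned squares whose edges fall on multiples of the cell size. The sketch below stores each cell as an (xmin, ymin, xmax, ymax) tuple keyed by its centre. It is a stand-in for the GDAL-based code Biodiverse uses, and the function name is hypothetical.

```python
import math


def fishnet(extent, cell_size, origin=(0, 0)):
    """Square cells covering extent = (xmin, ymin, xmax, ymax).

    Cells are aligned so their edges fall on origin + k * cell_size.
    Returns {cell centre: (xmin, ymin, xmax, ymax)}.
    """
    xmin, ymin, xmax, ymax = extent
    ox, oy = origin
    cols = range(math.floor((xmin - ox) / cell_size), math.floor((xmax - ox) / cell_size) + 1)
    rows = range(math.floor((ymin - oy) / cell_size), math.floor((ymax - oy) / cell_size) + 1)
    cells = {}
    for i in cols:
        for j in rows:
            x0, y0 = ox + i * cell_size, oy + j * cell_size
            centre = (x0 + cell_size / 2, y0 + cell_size / 2)
            cells[centre] = (x0, y0, x0 + cell_size, y0 + cell_size)
    return cells


# Hypothetical extent of the features being imported, with 100-unit cells
grid = fishnet((0, 0, 250, 150), 100)
print(len(grid))  # 6
```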

There are then two ways of handling the data.

The default approach is to treat the polygon and polyline data as presence-only, so a taxon is recorded as present in any fishnet polygon that its feature data intersect with.  This is by far the fastest approach as the system can stop checking and return true as soon as it finds an intersection.
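The presence-only test can be sketched in pure Python for simple polygons without holes. A cell counts as occupied if a polygon vertex falls in the cell, a cell corner falls in the polygon, or their edges cross. This illustrates the idea only and is not the GDAL/GEOS code Biodiverse uses. All names and coordinates are hypothetical. The use of any() reflects the "stop at the first intersection" behaviour described above.

```python
def _point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices without holes."""
    inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i - 1], polygon[i]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    """True if c (already known to be collinear with a-b) lies within a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def _segments_intersect(p1, p2, q1, q2):
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def polygon_intersects_rect(polygon, rect):
    """Presence-only test: returns True as soon as any intersection is found."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    # any() stops at the first hit, so occupied cells usually need few checks
    if any(xmin <= x <= xmax and ymin <= y <= ymax for x, y in polygon):
        return True
    if any(_point_in_polygon(x, y, polygon) for x, y in corners):
        return True
    return any(
        _segments_intersect(polygon[i - 1], polygon[i], corners[j - 1], corners[j])
        for i in range(len(polygon))
        for j in range(len(corners))
    )


# A hypothetical taxon range polygon and four 100 x 100 fishnet cells
taxon = [(10, 10), (140, 10), (10, 140)]
cells = {
    (50, 50): (0, 0, 100, 100),
    (150, 50): (100, 0, 200, 100),
    (50, 150): (0, 100, 100, 200),
    (150, 150): (100, 100, 200, 200),
}
present = [centre for centre, rect in cells.items() if polygon_intersects_rect(taxon, rect)]
print(present)  # [(50, 50), (150, 50), (50, 150)]
```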

The second approach is to calculate a new data set that is the intersection of the input data set and the fishnet data set (imagine using the fishnet polygons as a cookie cutter on the taxon polygons).  This process can be substantially slower, as the system must iterate over the polygon or polyline vertices, identify where they intersect, and then cut them as appropriate.  However, if you need the additional information then so be it.  That said, this approach is only used if the area or length of the intersecting features is needed, for example if they are to be added as group properties or used for the sample counts.
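The cookie-cutter step can be illustrated with Sutherland-Hodgman clipping against each rectangular cell, followed by the shoelace formula for area. This textbook technique is swapped in for illustration and is not the GEOS intersection Biodiverse calls. It handles simple polygons clipped by rectangles. For concave polygons it can leave zero-width slivers, but these do not change the area. The polygon and the cells are hypothetical.

```python
def clip_to_rect(polygon, rect):
    """Sutherland-Hodgman clipping of a polygon by an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect

    def at_x(p, q, x):  # point where segment p-q crosses the vertical line x
        t = (x - p[0]) / (q[0] - p[0])
        return (x, p[1] + t * (q[1] - p[1]))

    def at_y(p, q, y):  # point where segment p-q crosses the horizontal line y
        t = (y - p[1]) / (q[1] - p[1])
        return (p[0] + t * (q[0] - p[0]), y)

    edges = [
        (lambda p: p[0] >= xmin, lambda p, q: at_x(p, q, xmin)),
        (lambda p: p[0] <= xmax, lambda p, q: at_x(p, q, xmax)),
        (lambda p: p[1] >= ymin, lambda p, q: at_y(p, q, ymin)),
        (lambda p: p[1] <= ymax, lambda p, q: at_y(p, q, ymax)),
    ]
    points = list(polygon)
    for inside, crossing in edges:
        clipped = []
        for i in range(len(points)):
            prev, cur = points[i - 1], points[i]
            if inside(cur):
                if not inside(prev):
                    clipped.append(crossing(prev, cur))
                clipped.append(cur)
            elif inside(prev):
                clipped.append(crossing(prev, cur))
        points = clipped
        if not points:
            break
    return points


def area(polygon):
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i - 1], polygon[i]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


taxon = [(10, 10), (140, 10), (10, 140)]  # hypothetical range polygon, area 8450
cells = [(0, 0, 100, 100), (100, 0, 200, 100), (0, 100, 100, 200), (100, 100, 200, 200)]
areas = [area(clip_to_rect(taxon, rect)) for rect in cells]
print([round(a, 6) for a in areas])  # [6850.0, 800.0, 800.0, 0.0]
```

The per-cell areas sum to the area of the original polygon. This kind of value can be used as a group property or as a sample count.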

The underlying processing is all done using the GDAL and GEOS libraries, so some of the operations will be familiar to some users as there are interfaces for Python and R, amongst other languages.

Spatial indexes 

Both approaches use spatial indexes to speed up the calculations.  As an example of the difference this makes, one data set used in testing took 9 minutes without the index, and 70 seconds with it.  For comparison, the presence-only test takes a few seconds (with the index).  It is worth noting that spatial indexes have long been used in Biodiverse to speed up processing, albeit using a different approach.

Note that, even with the spatial index, large and complex polygons will take longer to import than simple polygons.  Multipart polygons can also sometimes take longer than single part, especially if the envelope of the features (the bounding rectangle) is very large.  This is because Biodiverse needs to check all the fishnet polygons within the envelope, so if most of the fishnet polygons do not overlap the taxon polygon then most of these checks are redundant.  If there is a need then further optimisations for the above issues can be looked for. 

[Update 22-12-2018 - several optimisations have since been implemented to address the above issues and will be available in the 2.99_002 development release.]
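The envelope effect described above is easy to quantify for a regular fishnet. The cells under any bounding rectangle can be found by arithmetic rather than by searching. The sketch compares the candidate cells for the envelope of a whole two-part feature with those for the envelopes of its individual parts. Checking parts separately is only one possible optimisation, and is not necessarily what was implemented. The names and coordinates are hypothetical.

```python
import math


def candidate_cells(envelope, cell_size, origin=(0, 0)):
    """Index pairs of fishnet cells overlapping an envelope (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = envelope
    ox, oy = origin
    cols = range(math.floor((xmin - ox) / cell_size), math.floor((xmax - ox) / cell_size) + 1)
    rows = range(math.floor((ymin - oy) / cell_size), math.floor((ymax - oy) / cell_size) + 1)
    return [(i, j) for i in cols for j in rows]


def envelope_of(parts):
    """Bounding rectangle of one or more lists of (x, y) vertices."""
    xs = [x for part in parts for x, _ in part]
    ys = [y for part in parts for _, y in part]
    return min(xs), min(ys), max(xs), max(ys)


# Two small hypothetical islands at opposite corners of a 1,000 x 1,000 extent
part_a = [(10, 10), (40, 10), (40, 40), (10, 40)]
part_b = [(960, 960), (990, 960), (990, 990), (960, 990)]

whole = candidate_cells(envelope_of([part_a, part_b]), 100)
per_part = [candidate_cells(envelope_of([p]), 100) for p in (part_a, part_b)]
print(len(whole), [len(c) for c in per_part])  # 100 [1, 1]
```

With the whole-feature envelope, 100 cells must be tested even though only two can be occupied. The per-part envelopes cut this to two.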


You can also just use the attribute table

There are some occasions when you only want the data from the attribute table.  If you don't use any of the geometry fields (:shape_x, :shape_area etc.) then Biodiverse will treat the table in the same way that it imports delimited text and spreadsheet data.  This means that each record in the table is the same as a row in a spreadsheet or line in a text file.

An example of when this might be useful is if you have data summarised across biomes or other regions and are not interested in analysing the data spatially, e.g. you only want to calculate Phylogenetic Diversity for the biome level assemblages and not at every location in the biome.




Shawn Laffan
10-Dec-2018




For the full list of changes in the 2.99 series (leading to version 3) see https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-299 (for all issues addressed or being targeted to fix for version 3, see https://github.com/shawnlaffan/biodiverse/milestone/15 ).




Thursday, 23 August 2018

What's new in version 2.1?

Version 2.1 has just been released.

It provides a small number of updates and improvements over the version 2.0 release.


Highlights are:
  • GUI
    • The label list in the view labels tab is now correctly updated when multiple labels are deleted. Issue #700
    • The user-defined colours in the cluster tab use a 13 colour palette by default (it was 9). Issue #688
  • Exports
  • Randomisations
    • The structured randomisations are faster for larger data sets. Issue #685
  • Tree trimming
    • Tree trimming has been sped up for large trees. Issue #679
    • The trim trees tool has the option to trim to the last common ancestor, thereby removing a dangling root node. Issue #670

For the full list of issues and changes in the 2.1 release, see https://github.com/shawnlaffan/biodiverse/issues?utf8=%E2%9C%93&q=milestone%3ARelease_2.1+



To see the full list of open issues or to report a bug or enhancement request, see https://github.com/shawnlaffan/biodiverse/issues







Cluster analyses - export coloured GeoTIFF of the map to match the tree branch colours

One of the features of Biodiverse that gets the most positive feedback is the ability to colour the branches of a cluster dendrogram and then have the map show the same colours.  This makes it very easy to see where a selected cluster is, and how the clusters map spatially.

Up until now, however, it has been difficult to replicate the display without resorting to screenshots and their attendant issues with resolution (and sometimes apparent JPEG compression).

While one has been able to export the colours with the tree since version 2 was released, there was no option to match the spatial data.  With the release of version 2.1, that is now an option for the Nexus tree exports: there is a new option to export a GeoTIFF file with associated colourmaps that can be used in GIS software.

Some images will show it best.  Below are steps to export the data and then to display it in ArcMap or in QGIS.

Note that the process works for continuous as well as discrete colour schemes.  Whatever colours were last displayed are what will be used.

This is also the process used to generate Figure 6 in Link-Pérez & Laffan (in press).

Exporting the data



An example cluster dendrogram with the associated spatial data.  It is easy to see the spatial distribution of the various clusters.  

The Nexus format is needed to export the coloured tree and GeoTIFF.

Be sure to select the GeoTIFF option to get the spatial data.  If you want the tree colours as well then check that option too ("Export colours").


Displaying it in ArcMap

You need only add the raster to a data frame, as ArcMap looks for the colourmap automatically (as does ArcGIS Pro if you are using that).

ArcGIS will automatically see the colourmap file and display the colours as they were in Biodiverse.  

You can use FigTree to export the coloured tree to a PNG.  

...and display this tree in an ArcMap layout.  Note that the resolution can be an issue and you also have to convert the background to be white instead of transparent, but it often works well enough and images can be resampled to a higher resolution using most editing packages.  


Displaying it in QGIS

The QGIS process involves some manual loading of files via the layer properties dialogue.


In the Style section, set the Render type to be Singleband pseudocolor, then choose the folder icon to "load color map from file".  For a nexus file called example.nex, the colourmap file will be called example.nex.txt.   

And there it is, a display with the same colours as in Biodiverse.  



Shawn Laffan
23-Aug-2018

