Thursday, 13 December 2018

Import polygon and polyline data

The short summary

As of Biodiverse version 3, you can directly import polygon and polyline data from GIS feature data sets.  If you want to try it before v3 is released then it is in the current development release.

The more detailed explanation:

Ever since development started on Biodiverse it has been able to import spatial data as point records from delimited text files (e.g. CSV format).  The ability to import raster data was later added, as well as the capacity to import point records from more sources (spreadsheets and shapefiles).

However, taxon distribution records are also frequently provided in the form of polygon range maps.  One commonly used example is the IUCN Red List data, but there are many other sources.

The way to import such data using Biodiverse 2.1 and earlier is to process the data outside Biodiverse so they can be represented as points or tables.  This is done by intersecting them with a fishnet of polygons (also called a vector grid) that aligns with the cells that will be used in Biodiverse.  Once intersected, they can be converted to points, or their coordinates added to the attribute tables, using centroid calculations.  This is what was done in López-Aguirre et al. (2018), for example.

The fishnet approach is relatively simple if one is familiar with GIS operations, but is not something that should be done by hand when numerous taxa are to be analysed or different coordinate origins are being tried.  In such cases one can script the process, but for many this can be yet another thing to learn, and not something that is done in a hurry to meet a short deadline.  (Note that scripting is a very useful skill to have, and is portable beyond the language du jour one might first learn).
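As a rough illustration of the alignment part of such a script, the sketch below snaps a coordinate to the centre of the grid cell that contains it, for a given cell size and origin.  It is plain Python rather than anything Biodiverse provides, and the function name, cell size and coordinates are all made up for the example.

```python
import math

def cell_centre(x, y, cell_size, origin=(0.0, 0.0)):
    """Return the centre of the square grid cell containing (x, y).

    Hypothetical helper for illustration; not part of Biodiverse.
    """
    col = math.floor((x - origin[0]) / cell_size)
    row = math.floor((y - origin[1]) / cell_size)
    return (origin[0] + (col + 0.5) * cell_size,
            origin[1] + (row + 0.5) * cell_size)

# The same record falls in a different cell when the origin is shifted.
centre_a = cell_centre(123456.0, 7654321.0, 50000.0)
centre_b = cell_centre(123456.0, 7654321.0, 50000.0, origin=(25000.0, 25000.0))
print(centre_a, centre_b)
```

Trying a different origin then only means changing one argument, which is the sort of repetition that is tedious by hand and trivial in a script.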

With some recent changes to the Biodiverse codebase, importation of polygon data is automated and part of the standard Biodiverse data import process.  As an added bonus, polyline data are also supported, so if you have data such as for crustacean presences along stream segments then they can also be imported.

As another bonus, if you have a mix of point, line and polygon data then they can all be imported in one pass, providing they all have the fields or attributes you select.  If not then the system will throw an error.

The geometry attributes that are available to select from are :shape_x, :shape_y, :shape_z, :shape_m, :shape_area and :shape_length.  Not all files have all attributes.  Point files do not have a :shape_area or :shape_length, polyline files do not have :shape_area, and polygon files do not have :shape_length.  Many files do not have :shape_z or :shape_m axes - these are for 3D shapefiles or those with time measures.

A worked example

A worked example is probably the best way to show how to use it.  Those familiar with the process of importing data will note that it is almost the same as the current process, which is quite convenient to say the least.  

Some example data.  The polygons have no specific meaning.

As usual, select the data set (or data sets) to import.  Make sure you select Shapefile as the Format.  

This step is identical to the spreadsheet and delimited text imports.

Select the fields or attributes as appropriate.  The attributes that are visible (:shape_x, :shape_m, :shape_length, :shape_area etc) depend on properties of the first file selected. 

And here is the file imported.  There is only one taxon label in this data set, so there is not much more to show, but once imported the data are analysed like any other.

How does it work?

It is essentially an automation of the process described above.

First, a fishnet grid of polygons is generated to match the cell size of the BaseData object being imported into.

There are then two ways of handling the data.

The default approach is to treat the polygon and polyline data as presence-only, so a taxon is recorded as present in any fishnet polygon that its feature data intersect with.  This is by far the fastest approach as the system can stop checking and return true as soon as it finds an intersection.
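The early exit can be sketched in a few lines of Python.  This is only an illustration for polyline data on a tiny made-up grid: it uses a standard slab (Liang-Barsky style) segment-rectangle test rather than the GDAL and GEOS calls Biodiverse relies on, and the function names are hypothetical.  Polygons would also need a check for cells lying wholly inside the polygon, which is omitted here.

```python
def segment_hits_cell(p, q, cell):
    """True if the segment p-q touches the rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = cell
    t0, t1 = 0.0, 1.0  # portion of the segment inside the slabs checked so far
    for start, delta, lo, hi in ((p[0], q[0] - p[0], xmin, xmax),
                                 (p[1], q[1] - p[1], ymin, ymax)):
        if delta == 0:
            if start < lo or start > hi:
                return False  # parallel to this slab and outside it
        else:
            ta, tb = (lo - start) / delta, (hi - start) / delta
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True

def polyline_present(vertices, cell):
    # any() returns as soon as one segment intersects the cell
    return any(segment_hits_cell(p, q, cell)
               for p, q in zip(vertices, vertices[1:]))

# A hypothetical stream segment on a 3 x 3 grid of unit cells.
stream = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5)]
cells = {(c, r): (c, r, c + 1, r + 1) for c in range(3) for r in range(3)}
present = {key for key, cell in cells.items() if polyline_present(stream, cell)}
print(sorted(present))
```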

The second approach is to calculate a new data set that is the intersection of the input data set and the fishnet data set (imagine using the fishnet polygons as a cookie cutter on the taxon polygons).  This process can be substantially slower, as the system must iterate over the polygon or polyline vertices, identify where they intersect, and then cut them as appropriate. However, if you need the additional information then so be it.  That said, this approach is only used if the area or length of the intersecting features is needed, for example if they are to be added as group properties or used for the sample counts.
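To give a feel for the cookie cutter step, here is a minimal sketch that clips a polygon to a single rectangular cell and measures the area that remains.  It swaps in the textbook Sutherland-Hodgman algorithm and the shoelace formula purely for illustration; Biodiverse itself delegates this work to GDAL and GEOS, and the function names and test triangle below are assumptions.

```python
def clip_to_cell(polygon, cell):
    """Clip a simple polygon (list of (x, y)) to a rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = cell
    # Each boundary is (axis, bound, sign): a point is kept when
    # sign * (coordinate - bound) >= 0.
    boundaries = ((0, xmin, 1), (0, xmax, -1), (1, ymin, 1), (1, ymax, -1))
    pts = list(polygon)
    for axis, bound, sign in boundaries:
        if not pts:
            break
        clipped = []
        prev = pts[-1]
        for cur in pts:
            prev_in = sign * (prev[axis] - bound) >= 0
            cur_in = sign * (cur[axis] - bound) >= 0
            if prev_in != cur_in:
                # The edge crosses this boundary, so add the crossing point.
                t = (bound - prev[axis]) / (cur[axis] - prev[axis])
                clipped.append((prev[0] + t * (cur[0] - prev[0]),
                                prev[1] + t * (cur[1] - prev[1])))
            if cur_in:
                clipped.append(cur)
            prev = cur
        pts = clipped
    return pts

def polygon_area(pts):
    """Shoelace formula; returns 0.0 for an empty vertex list."""
    pairs = zip(pts, pts[1:] + pts[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs)) / 2.0

triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
areas = {cell: polygon_area(clip_to_cell(triangle, cell))
         for cell in [(0, 0, 1, 1), (1, 1, 2, 2), (2, 2, 3, 3)]}
print(areas)
```

Every vertex has to be visited for every cell boundary, which hints at why this is slower than a presence test that can stop at the first hit.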

The underlying processing is all done using the GDAL and GEOS libraries, so some of the operations will be familiar to some users as there are interfaces for Python and R, amongst other languages.

Spatial indexes 

Both approaches use spatial indexes to speed up the calculations.  As an example of the difference this makes, one data set used in testing took 9 minutes without the index, and 70 seconds with it.  For comparison, testing for presence only takes a few seconds (with the index).  It is worth noting that spatial indexes have long been used in Biodiverse to speed up processing, albeit using a different approach.

Note that, even with the spatial index, large and complex polygons will take longer to import than simple polygons.  Multipart polygons can also sometimes take longer than single part, especially if the envelope of the features (the bounding rectangle) is very large.  This is because Biodiverse needs to check all the fishnet polygons within the envelope, so if most of the fishnet polygons do not overlap the taxon polygon then most of these checks are redundant.  If there is a need then further optimisations for the above issues can be looked for.
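The envelope effect is easy to demonstrate with a small sketch.  The Python below is not how Biodiverse builds or queries its index; it simply counts the grid cells that fall inside the bounding rectangle of a made-up two-part feature, assuming unit cells and an origin at (0, 0).  The per-part count is shown only for comparison.

```python
import math

def envelope(parts):
    """Bounding rectangle (xmin, ymin, xmax, ymax) of all vertices in all parts."""
    xs = [x for part in parts for x, _ in part]
    ys = [y for part in parts for _, y in part]
    return min(xs), min(ys), max(xs), max(ys)

def candidate_cells(env, cell_size):
    """Column/row indices of every grid cell overlapping the envelope."""
    xmin, ymin, xmax, ymax = env
    cols = range(math.floor(xmin / cell_size), math.floor(xmax / cell_size) + 1)
    rows = range(math.floor(ymin / cell_size), math.floor(ymax / cell_size) + 1)
    return [(c, r) for c in cols for r in rows]

# Two small squares in opposite corners of a 10 x 10 region.
parts = [
    [(0.2, 0.2), (0.8, 0.2), (0.8, 0.8), (0.2, 0.8)],
    [(9.2, 9.2), (9.8, 9.2), (9.8, 9.8), (9.2, 9.8)],
]
whole = candidate_cells(envelope(parts), 1.0)
per_part = [cell for part in parts
            for cell in candidate_cells(envelope([part]), 1.0)]
print(len(whole), len(per_part))
```

Here the envelope of the whole feature covers 100 cells even though the two parts only touch one cell each.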

You can also just use the attribute table

There are some occasions when you only want the data from the attribute table.  If you don't use any of the geometry fields (:shape_x, :shape_area etc.) then Biodiverse will treat the table in the same way that it imports delimited text and spreadsheet data.  This means that each record in the table is the same as a row in a spreadsheet or line in a text file.

An example of when this might be useful is if you have data summarised across biomes or other regions and are not interested in analysing the data spatially, e.g. you only want to calculate Phylogenetic Diversity for the biome level assemblages and not at every location in the biome.




Shawn Laffan
10-Dec-2018



For more details about Biodiverse, see http://shawnlaffan.github.io/biodiverse/  

For the full list of changes in the 2.99 series (leading to version 3) see https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-299 (for all issues addressed or being targeted to fix for version 3, see https://github.com/shawnlaffan/biodiverse/milestone/15 ).


To see what Biodiverse has been used for, see https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList


You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users or follow the google plus page:  https://plus.google.com/+BiodiverseSoftware


Thursday, 23 August 2018

What's new in version 2.1?

Version 2.1 has just been released.

It provides a small number of updates and improvements over the version 2.0 release.


Highlights are:
  • GUI
    • The label list in the view labels tab is now correctly updated when multiple labels are deleted. Issue #700
    • The user defined colours in the cluster tab use a 13 colour palette by default (it was 9). Issue #688
  • Exports
  • Randomisations
    • The structured randomisations are faster for larger data sets. Issue #685
  • Tree trimming
    • Tree trimming has been sped up for large trees. Issue #679
    • The trim trees tool has the option to trim to the last common ancestor, thereby removing a dangling root node. Issue #670

For the full list of issues and changes in the 2.1 release, see https://github.com/shawnlaffan/biodiverse/issues?utf8=%E2%9C%93&q=milestone%3ARelease_2.1+



To see the full list of open issues or to report a bug or enhancement request, see https://github.com/shawnlaffan/biodiverse/issues




For more details about Biodiverse, see http://purl.org/biodiverse


To see what Biodiverse has been used for, see https://purl.org/biodiverse/wiki/PublicationsList

You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users or follow the google plus page: https://plus.google.com/+BiodiverseSoftware



Cluster analyses - export coloured GeoTIFF of the map to match the tree branch colours

One of the features of Biodiverse that gets the most positive feedback is the ability to colour the branches of a cluster dendrogram and then have the map show the same colours.  This makes it very easy to see where a selected cluster is, and how the clusters map spatially.

Up until now, however, it has been difficult to replicate the display without resorting to screenshots and their attendant issues with resolution (and sometimes apparent JPEG compression).

While one has been able to export the colours with the tree since version 2 was released, there was no option to match the spatial data.  With the release of version 2.1, that is now an option for the Nexus tree exports - there is a new option to export a geotiff file with associated colourmaps that can be used in GIS software.

Some images will show it best.  Below are steps to export the data and then to display it in ArcMap or in QGIS.

Note that the process works for continuous as well as discrete colour schemes.  Whatever colours were last displayed are what will be used.

This is also the process used to generate Figure 6 in Link-Pérez & Laffan (in press).

Exporting the data



An example cluster dendrogram with the associated spatial data.  It is easy to see the spatial distribution of the various clusters.  

The Nexus format is needed to export the coloured tree and geotiff.

Be sure to select the geotiff option to get the spatial data.  If you want the tree colours as well then check that option too ("Export colours").


Displaying it in ArcMap

You need only add the raster to a data frame, as ArcMap looks for the colourmap automatically (as does ArcGIS Pro if you are using that).

ArcGIS will automatically see the colourmap file and display the colours as they were in Biodiverse.  

You can use FigTree to export the coloured tree to a PNG.  

...and display this tree in an ArcMap layout.  Note that the resolution can be an issue and you also have to convert the background to be white instead of transparent, but it often works well enough and images can be resampled to a higher resolution using most editing packages.  


Displaying it in QGIS

The QGIS process involves some manual loading of files via the layer properties dialogue.


In the Style section, set the Render type to be Singleband pseudocolor, then choose the folder icon to "load color map from file".  For a nexus file called example.nex, the colourmap file will be called example.nex.txt.   

And there it is, a display with the same colours as in Biodiverse.  



Shawn Laffan
23-Aug-2018




Wednesday, 22 August 2018

Analysing trait data

Biodiverse is probably best known for its ability to link phylogenies to spatial data, but that's only a subset of what it can do.  It can also attach properties to each label, for example species traits like average height, seed type, locomotion method or growth form. This has been available for several versions.


Examples using label properties include colour across birds, butterflies and flowers (Dalrymple et al. 2015, 2018), plant longevity (Zhang et al. 2018), fleshiness (Chen et al. 2017), spinescence (Tindall et al. 2017) and fruit fleshiness (Rossetto et al. 2015).

The analysis of group property data has also proven useful, for example when summarising environmental conditions in bioregionalisation analyses, as in González-Orozco et al. (2013, 2014a, 2014b).  In these cases one assigns a value to each group (cell) to describe, for example, the mean grain size across its area.  At the moment the matching system uses exact matches on element (cell) names, so these need to be set up for the data to be correctly attached.  This can be done by importing the environmental data into Biodiverse, one layer at a time, at the same resolution and origin as the species data, and then exporting to delimited text.  The exported file will then have element names that match exactly when imported into Biodiverse as group properties.

And as a nice example that one does not need to work only with species and cells, the data in Stephenson et al. (2015) represent a spatio-temporal data set of larval herring size classes.  These are first analysed on a per-year basis across all locations, then on a per-location basis across time periods using group properties.  See the supplementary material of that article for more detailed steps. 


So how does one analyse such data using Biodiverse?  The overview is simple - one attaches the data to a BaseData object, then one analyses it.  Examples are below, followed by some other considerations like deletion, but a few concepts need to be given first.

  1. In Biodiverse, trait data are called "properties".  This is a deliberately generic term, as there is no reason why one could not analyse non-biological phenomena using the system.  (And we aim to be generic when developing Biodiverse, after all the computer only sees numbers - it is the user who defines the analyses and interprets the results).
  2. If you are analysing trait data then you want to assign and analyse Label Property data.  This is because one can also analyse Group Properties (see below).  
  3. Both Labels and Groups are called Elements (also a more generic term), so the relevant calculations are under the "Element Properties" section of the calculation lists.  

Attaching data


This is probably best demonstrated using images of the steps with captions.

If your label property data do not match exactly, then you can use the remapping tools introduced in Version 2.

  
The data need to be in a delimited text (e.g. CSV format) file.  One column should match the element (label or group) name, while any number of others can be properties.  

The menu option is under the Basedata menu.  

One then chooses if the properties to be assigned are for groups or labels.  In this example labels are selected.  
The file selection is the usual process.
Choose the field delimiter (usually a comma) and quote character.   
This is the important bit.  Make sure the column with the element names in it is specified as Input_element. (It does not need to be called ELEMENT).

Make sure that any column containing property data is specified as Property.  Any column with Ignore is ignored by the system.  

Once run, you are told how many labels or groups had properties assigned.  If there were fewer than you would expect then check the column you specified as Input_element contains matching items.  Be careful with quotes, and remember that spreadsheet programs can do odd things with your data when they import them, so use a text editor to be certain.  
Any label properties will now be shown as additional columns in the list in the View Labels tab.
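If fewer elements are matched than expected, a quick check outside Biodiverse can show which names are the culprits.  The sketch below is a hedged example only: the column names, the property values and the set of label names are invented, and in practice the label names would come from an export of your own BaseData.

```python
import csv
import io

def unmatched_elements(csv_text, element_column, known_names):
    """Return the element names in the property table that have no exact match."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[element_column] for row in reader
            if row[element_column] not in known_names]

# Hypothetical label names and property file contents.
labels_in_basedata = {"Genus:sp1", "Genus:sp2", "Genus:sp3"}
property_file = (
    "Label,HEIGHT\n"
    "Genus:sp1,1.2\n"
    "Genus:sp2 ,3.4\n"   # trailing space
    "genus:sp3,5.6\n"    # different case
)
problems = unmatched_elements(property_file, "Label", labels_in_basedata)
print([repr(name) for name in problems])
```

Printing with repr makes stray spaces and quotes visible, which is the same reason for checking the file in a text editor.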

Analysing properties


Analysis of properties follows the usual process.  The example below is for a spatial analysis, but similar selections apply in the cluster and region grower analyses. 

The most important point to understand is that the results are stored as lists, and not as single scalar indices.  This makes it easier to organise the results across arbitrarily named properties. 

The property calculations are under the Element Properties groups in the calculations lists.  

Select the list containing the desired indices.  If only property calculations have been chosen then the SPATIAL_RESULTS list will be empty.  (Note that the menu has been cropped by the screenshot in this case).  

Now choose the index to be displayed.  Many of the calculations name their indices by combining the property name (as a prefix) with a suffix for the calculation, hence the repetition of suffixes in this example.  

Be sure to select the desired list when exporting.



Attaching ranges and abundances 


One can also attach the label ranges and abundances as label properties, as per the next two screenshots.  Be aware, though, that these are not dynamically updated.  If you add or delete groups then these will need to be updated (unless, of course, you want the old values).


Label ranges and abundances can be attached as properties

Once attached, label ranges and abundances are displayed and treated just like any other property.  

It's a bit of a kludge, but if you want to attach the cell richness and sample counts for groups, then you can transpose the basedata to create a new basedata object, so the old groups are labels and the old labels are now groups.  Then attach the label ranges and abundances (remember the labels in the new object are the groups in the previous object) and transpose the data again to get back to the original structure.  This process works because the way data are stored in Biodiverse can be treated as a matrix, where the rows are the groups and the columns are the labels (consistent with many other related implementations).  The label ranges are the counts of the non-zero column entries, and the group richness scores are the counts of the non-zero row entries.  If one transposes the data then the column summations of the transposed data are on the old rows, and the same applies for the row summations. 
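The matrix view of the transposition trick can be checked with a toy example.  The Python below uses a made-up three group by two label table of sample counts and is not tied to any Biodiverse data structure.

```python
# Rows are groups, columns are labels; values are sample counts.
groups = ["cell_a", "cell_b", "cell_c"]
labels = ["sp1", "sp2"]
samples = [
    [2, 0],
    [1, 3],
    [0, 4],
]

def nonzero_per_column(matrix):
    """Count the non-zero entries in each column."""
    return [sum(1 for value in column if value != 0) for column in zip(*matrix)]

def sum_per_column(matrix):
    """Sum the entries in each column."""
    return [sum(column) for column in zip(*matrix)]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

label_ranges = nonzero_per_column(samples)
label_abundances = sum_per_column(samples)

# After transposing, the old groups are the columns, so the same
# column-wise functions now give group richness and sample counts.
flipped = transpose(samples)
group_richness = nonzero_per_column(flipped)
group_sample_counts = sum_per_column(flipped)

print(dict(zip(labels, label_ranges)), dict(zip(groups, group_richness)))
```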

Can I use categories directly?

Not yet.

An implementation detail is that the property data need to be numeric, so if you have nominal classes like "gravity" or "ballistic" for seed dispersal, then you need to code them as one column each, with a value of 1 for when the trait applies, and 0 if not.  If there are unknown values then they can be left as blank and they will be ignored in the analyses.  One day we will handle categorical data.
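The recoding can be scripted rather than done by hand.  The sketch below is one possible way to do it in Python, using invented labels and the seed dispersal classes mentioned above; the function name and the "Label" column heading are assumptions, and unknown values are written as blanks.

```python
import csv
import sys

def code_categories(classes_by_label, missing=""):
    """Turn one nominal class per label into one 0/1 column per class."""
    categories = sorted({c for c in classes_by_label.values() if c != missing})
    header = ["Label"] + categories
    rows = []
    for label, value in classes_by_label.items():
        if value == missing:
            # Unknown class: leave every column blank.
            rows.append([label] + [""] * len(categories))
        else:
            rows.append([label] + [1 if value == c else 0 for c in categories])
    return header, rows

# Hypothetical input: the third label has no known dispersal mode.
dispersal = {"Genus:sp1": "gravity", "Genus:sp2": "ballistic", "Genus:sp3": ""}
header, rows = code_categories(dispersal)

writer = csv.writer(sys.stdout)
writer.writerow(header)
writer.writerows(rows)
```

The output is a delimited text table that can then be attached using the label property import described earlier.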


Can I delete some of the properties after importation?  

Yes, and this has been possible since version 2 was released.

While the property import interface does not yet support column ranges (each column needs to be selected manually, which gets tedious...), one can still import more columns than are needed through inattention or because the process is automated in some way (or both). 

The deletion interface needs work, but is at least functional. There is one tab for label properties, and one for group properties. In either tab, select properties and schedule them to be deleted across all groups or labels, or choose labels and groups (elements) that will have all their property values cleared.   Nothing is actually deleted until the Apply button is pressed, and entries can be deselected.


Property deletion is accessed via the Basedata menu



The current interface needs work, but one can select rows and then schedule those entries for deletion.  In this example, SOMEPROP1 and SOMEPROP2 will be removed from all groups, while labels Genus:sp3 to Genus:sp10 will have all their property values removed.  


It is also not possible to delete properties from a BaseData that contains outputs, even if those outputs do not use the properties.  If there is a need to do so then please raise an issue using the issue tracker.  Until it is supported, you can use the Basedata > Duplicate without outputs menu option to create a new BaseData with the same labels, groups and properties, but without any of the analysis outputs attached.  Then delete the non-required properties. 



Shawn Laffan
22-Aug-2018




Monday, 19 February 2018

Version 2.0 | updated Mac version available for download

It turned out there was an issue with the version 2.0 release for Macs, as the build system missed some files.  The end result was that text files could not be imported. 

This has now been fixed. 

If you downloaded Biodiverse for the Mac prior to 19-Feb-2018, then please download the updated version.

https://github.com/shawnlaffan/biodiverse/wiki/Downloads



Shawn Laffan
19-Feb-2018


Thursday, 11 January 2018

Publications using Biodiverse in 2017

Now that 2018 has begun, here is a list of publications that used Biodiverse in 2017. 

If you want to see the full list (79 at the time of writing), then go to https://github.com/shawnlaffan/biodiverse/wiki/PublicationsList 

For more details about Biodiverse, see http://purl.org/biodiverse 

Shawn Laffan
11-Jan-2018



Baldwin, B.G., Thornhill, A.H., Freyman, W.A., Ackerly, D.D., Kling, M.M., Morueta-Holme, N. and Mishler, B.D. (2017) Species richness and endemism in the native flora of California. American Journal of Botany, 104, 487-501.

Bui, E.N., Thornhill, A.H., Gonzalez-Orozco, C.E., Knerr, N. and Miller, J.T. (2017) Climate and geochemistry as drivers of eucalypt diversification in Australia. Geobiology, 15, 427-440.

Cassis, G., Laffan, S.W. and Ebach, M.C. (2017) Biodiversity and bioregionalisation perspectives on the historical biogeography of Australia. In Ebach, M.C (ed) Handbook of Australasian Biogeography, ch 1, pp 1-16.

Chen, S-C, Cornwell, W.K., Zhang, H-X and Moles, A.T. (2017) Plants show more flesh in the tropics: variation in fruit type along latitudinal and climatic gradients. Ecography, 40, 531–538.

Di Virgilio, G., Laffan, S.W., Nielsen, S.V. and Chapple, D.G. (2017) Does range-restricted evolutionary history predict extinction risk? A case study in lizards. Journal of Biogeography, 44, 605-614.

Heenan, P.B., Millar, T.R., Smissen, R.D., McGlone, M.S. and Wilton, A.D. (2017) Phylogenetic measures of neo- and palaeo-endemism in the indigenous vascular flora of the New Zealand archipelago. Australian Systematic Botany, 30, 124-133

Millar, T.R., Heenan, P.B., Wilton, A.D., Smissen, R.D. and Breitwieser, I. (2017) Spatial distribution of species, genus and phylogenetic endemism in the vascular flora of New Zealand, and implications for conservation. Australian Systematic Botany, 30, 134-147

Prentice, E., Knerr, N., Schmidt-Lebuhn, A.N., González-Orozco, C.E., Bui, E.N., Laffan, S. and Miller, J.T. (2017) Do soil and climate properties drive biogeography of the Australian proteaceae? Plant and Soil, 417, 317-329.

Santos, A.P.B, Bitencourt, C. and Rapini, A. (2017) Distribution patterns of Kielmeyera (Calophyllaceae): the Rio Doce basin emerges as a confluent area between the northern and southern Atlantic Forest. Neotropical Biodiversity, 3, 1-9.

Scherson, R.A., Thornhill, A.H., Urbina-Casanova, R., Freyman, W.A., Pliscoff, P.A. and Mishler, B.D. (2017) Spatial phylogenetics of the vascular flora of Chile. Molecular Phylogenetics and Evolution, 112, 88-95.

Thornhill, A.H., Baldwin, B.G., Freyman, W.A., Nosratinia, S., Kling, M.M., Morueta-Holme, N., Madsen, T.P., Ackerly, D.D. and Mishler, B.D. (2017) Spatial phylogenetics of the native California flora. BMC Biology 15, 96.

Tindall, M.L., Thomson, F.J., Laffan, S.W., and Moles, A.T. (2017) Is there a latitudinal gradient in the proportion of species with spinescence? Journal of Plant Ecology, 10, 294-300.

Yu, F., Skidmore, A.K., Wang, T., Huang, J., Ma, K and Groen, T.A. (2017) Rhododendron diversity patterns and priority conservation areas in China. Diversity and Distributions, 23, 1143–1156.



Wednesday, 15 November 2017

Biodiverse version 2.0 released

Biodiverse version 2.0 is now available.  It can be accessed from https://github.com/shawnlaffan/biodiverse/wiki/Downloads


The summary of changes since version 1.1 is at https://github.com/shawnlaffan/biodiverse/wiki/ReleaseNotes#version-20

Blog posts describing much of the new functionality can be accessed at http://biodiverse-analysis-software.blogspot.com.au/



The available binary versions work only on 64 bit computers, but Windows, Linux and Mac operating systems are all supported.


Installation instructions are at https://github.com/shawnlaffan/biodiverse/wiki/Installation




Please report any errors or suggested improvements.  You can use the email list at https://groups.google.com/forum/#!forum/biodiverse-users or the project issue tracker at https://github.com/shawnlaffan/biodiverse/issues



Shawn Laffan
15-Nov-2017