Saturday 7 May 2016

Biodiverse now includes species richness estimation indices

One of the new features in the upcoming version 2 release of Biodiverse will be the ability to calculate indices of species richness estimation, as well as their associated confidence intervals and useful metadata.

If you are impatient and want to try them now, you can use a development release from the 1.99 series:  https://purl.org/biodiverse/wiki/Downloads#development-release

The indices included are Chao1, Chao2, the Abundance Coverage Estimator (ACE) and the Incidence Coverage Estimator (ICE).  Chao1 and ACE are abundance based and use the label sample counts as the abundances, while Chao2 and ICE are incidence based and use the number of groups (cells) occupied by each label in the sample as the incidences.  More detailed explanations of these indices and references are given in the help pages for the EstimateS software.  That site also includes formulae in its appendices B and C.
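As a rough illustration of the difference between the abundance and incidence inputs, here is a minimal Python sketch (Biodiverse itself is written in Perl, and this is not its code).  The function names and the example data are made up for illustration.  The point estimate uses the classic Chao formula, switching to the bias-corrected form when there are no doubletons, and it leaves out the sample-size correction factors that some implementations apply, so it will not necessarily reproduce the Biodiverse or SpadeR values.

```python
from collections import Counter


def label_abundances(groups):
    """Sum the sample counts of each label across all groups (Chao1/ACE input)."""
    totals = Counter()
    for labels in groups.values():
        for label, count in labels.items():
            totals[label] += count
    return totals


def label_incidences(groups):
    """Count the groups in which each label occurs (Chao2/ICE input)."""
    occupied = Counter()
    for labels in groups.values():
        for label, count in labels.items():
            if count > 0:
                occupied[label] += 1
    return occupied


def chao_estimate(counts):
    """Classic Chao estimate; bias-corrected form when there are no doubletons."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons/uniques and doubletons/duplicates
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


# Hypothetical sample: three groups (cells), four labels (species).
groups = {
    "g1": {"a": 1, "b": 2},
    "g2": {"a": 1, "c": 1},
    "g3": {"b": 3, "d": 1},
}

chao1 = chao_estimate(label_abundances(groups).values())  # 6.0
chao2 = chao_estimate(label_incidences(groups).values())  # 5.0
```

The same four labels give different estimates because label "a" is a doubleton by abundance (two samples) and also by incidence (two groups), while label "b" has five samples but occupies only two groups.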

Four species richness estimation calculations are now in Biodiverse.  

Example results for the Acacia data set used in Mishler et al. (2014).  

Links with SpadeR and EstimateS

The calculations have been calibrated to match the SpadeR package for R, cross-referencing with EstimateS as needed.  For those wondering about reproducible results, a test-driven development approach was used.  In this approach, the results from SpadeR for a given input data set were set as tests in Biodiverse, and the Biodiverse code was then checked and updated until it reproduced the expected values.  The tests remain in place so we can readily identify if a change in another part of Biodiverse affects these calculations.

There are different formulae for the Chao variance and confidence intervals when the sample is missing either singletons or doubletons (species with only one or two samples/incidences).  In these cases Biodiverse follows the logic given in the EstimateS documentation.  The CHAO1_META and CHAO2_META result lists record which formulae were used for the estimate (CHAO_FORMULA index), the variance (VARIANCE_FORMULA index) and confidence interval formula (CI_FORMULA index), with the numbers corresponding to those given in the EstimateS documentation.
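For the common case where the sample has both singletons and doubletons, the classic Chao1 variance and the usual log-normal confidence interval for Chao estimators can be sketched as below.  This is an illustration in Python, not the Biodiverse code: the function names are hypothetical, sample-size correction factors are left out, and the cases with no singletons or no doubletons use different formulae that are not shown here.

```python
import math


def chao1_variance_classic(f1, f2):
    """Variance of the classic Chao1 estimate; only valid when f1 > 0 and f2 > 0."""
    if f1 <= 0 or f2 <= 0:
        raise ValueError("this form needs both singletons and doubletons")
    r = f1 / f2
    return f2 * (0.5 * r**2 + r**3 + 0.25 * r**4)


def chao_log_normal_ci(s_obs, s_est, variance, z=1.96):
    """Log-normal confidence interval, which never drops below s_obs."""
    t = s_est - s_obs  # estimated number of unseen species
    if t <= 0 or variance <= 0:
        raise ValueError("this form needs a positive unseen-species estimate")
    k = math.exp(z * math.sqrt(math.log(1 + variance / t**2)))
    return s_obs + t / k, s_obs + t * k


# Hypothetical sample: 7 observed species, 3 singletons, 2 doubletons.
s_obs, f1, f2 = 7, 3, 2
s_est = s_obs + f1 * f1 / (2 * f2)
variance = chao1_variance_classic(f1, f2)
lower, upper = chao_log_normal_ci(s_obs, s_est, variance)
```

The interval is asymmetric around the estimate, with a lower bound that stays above the observed richness.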

As with EstimateS, Biodiverse falls back to using Chao1 or Chao2 in cases where ACE or ICE, respectively, cannot be calculated.  These cases are broader than in EstimateS and include:
  1. Where all rare species are singletons/uniques.
  2. Where none of the species are singletons/uniques.
  3. Where none of the species are rare/infrequent.

Biodiverse returns an undefined value for the ACE and ICE richness estimates when:
  1. There are no species.
  2. All the samples are uniques/singletons.
  3. (For ICE only) There is only one group in the neighbour sets (this avoids a divide by zero error in the sample size correction).

The calculation of ICE in SpadeR includes a correction for the number of sample units (groups).  In Biodiverse this is calculated as the number of non-empty groups.  This makes no difference for most users since they do not have empty groups.  Where there are such groups they are ignored, as they are by all the other indices that use labels.  This follows the logic that empties are unsampled, as opposed to sampled but empty.
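The fallback and undefined rules listed above can be sketched for ACE as follows.  This is a Python illustration under stated assumptions rather than the Biodiverse implementation: the function names are hypothetical, None stands in for an undefined value, the order in which the conditions are checked is my own choice, and the ACE formula is the standard one with a rarity threshold of 10.  The ICE-only rule about a single group is not covered.

```python
from collections import Counter


def chao1(counts):
    """Classic Chao1; bias-corrected form when there are no doubletons."""
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]
    if f2 > 0:
        return len(counts) + f1 * f1 / (2 * f2)
    return len(counts) + f1 * (f1 - 1) / 2


def ace_estimate(abundances, rare_threshold=10):
    """ACE richness estimate with the undefined and fallback cases."""
    counts = [c for c in abundances if c > 0]

    # Undefined cases: no species, or every species is a singleton.
    if not counts or all(c == 1 for c in counts):
        return None

    rare = [c for c in counts if c <= rare_threshold]
    f1 = sum(1 for c in rare if c == 1)

    # Fallback cases: no rare species, no singletons,
    # or all rare species are singletons.
    if not rare or f1 == 0 or f1 == len(rare):
        return chao1(counts)

    s_rare = len(rare)
    s_abund = len(counts) - s_rare
    n_rare = sum(rare)
    coverage = 1 - f1 / n_rare  # sample coverage of the rare species
    sum_sq = sum(i * (i - 1) * f for i, f in Counter(rare).items())
    gamma_sq = max(s_rare / coverage * sum_sq / (n_rare * (n_rare - 1)) - 1, 0)
    return s_abund + s_rare / coverage + f1 / coverage * gamma_sq
```

For example, ace_estimate([1, 1, 20]) falls back to Chao1 because both rare species are singletons, while ace_estimate([1, 1, 1]) returns None.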

Does having a second neighbour set affect the results? 

In Biodiverse, all the species richness estimation indices are calculated using the union of the two neighbour sets.  The results will therefore be the same whether you have one neighbour set specifying an analysis window (e.g. sp_circle(radius => 500000)) or two neighbour sets that result in the same overall set of groups (e.g. neighbour set 1 is sp_self_only() and neighbour set 2 is sp_circle(radius => 500000)).  (If you are unsure what the term "neighbour set" means in Biodiverse, it is the set of groups used in a calculation.  Usually they are contiguous sets of groups around the processing group, but they can be arbitrarily complex.  More details are given in the spatial conditions help.)
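The point about the union can be shown with plain sets.  This is only a toy Python sketch: the group names are invented, and the sets stand in for whatever groups the spatial conditions would actually select.

```python
def groups_used(neighbour_set_1, neighbour_set_2=frozenset()):
    """Return the union of the neighbour sets, as used by the estimators."""
    return set(neighbour_set_1) | set(neighbour_set_2)


# Hypothetical groups selected by a circular window around group "g1".
window = {"g1", "g2", "g3", "g4"}

# One neighbour set: the window on its own.
one_set = groups_used(window)

# Two neighbour sets: the processing group only, then the window.
two_sets = groups_used({"g1"}, window)

same_groups = one_set == two_sets  # True
```

Because the processing group is already inside the window, adding it as a separate first neighbour set does not change the groups used.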

Future changes

The ACE and ICE indices currently use 10 as the threshold for rarity.  In a future version of Biodiverse this will be controllable by the user, but we need to change some of the GUI infrastructure first.  

Lists of which species were rare in the samples can also be returned if there is a need.

The improved Chao indices described in Chiu et al. (2014) are also listed for implementation under issue #592.  The Jackknife indices could also be added if there is a need for them.  


I want to try it now

As noted above, these indices will be in the forthcoming version 2 release of Biodiverse, but at the time of writing they are available in the 1.99_002 development release.  https://purl.org/biodiverse/wiki/Downloads#development-release


Shawn Laffan, 07-May-2016


For more details about Biodiverse, see http://purl.org/biodiverse

For the full list of changes in the 2.0 release see https://purl.org/biodiverse/wiki/ReleaseNotes#version-2

To see what else Biodiverse has been used for, see https://purl.org/biodiverse/wiki/PublicationsList

You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users 

