Monday 29 August 2016

New, more efficient file format

Users of Biodiverse will perhaps be familiar with what is called the "native" format for basedata, trees, matrices and projects.  These are the .bds, .bts, .bms and .bps files that are created when you save these objects.

The reality is that the "native" format is just a serialisation format: the various parts of the Perl data structures that make up an object (e.g. a tree) are converted to a form that can be written to disk and re-read at a later date, possibly on another computer.

While the format we have been using (called Storable) is stable and has done a good job over the years, a newer, more efficient format called Sereal is now available.  Version 2 of Biodiverse will use this new format by default.

The main reason for shifting to the Sereal format is efficiency: saving files is faster, and the file sizes are smaller.  See details here: http://blog.booking.com/sereal-a-binary-data-serialization-format.html 

These size and speed improvements will not be very noticeable for small files, but they add up when one is working with tens of thousands of groups (e.g. cells) and thousands of labels (e.g. species) across hundreds of spatial and cluster analyses.  A quick experiment with such a data set reduced the file size from ~1.6GB to ~750MB, and the time taken to save to file from 30s to 12s.  The file load times were about the same for both formats at ~20s.  (Admittedly this was not a very scientific experiment, but the results were consistent across multiple runs.)
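To put those figures in relative terms, here is a small sketch of the arithmetic.  It assumes 1.6GB is roughly 1600MB, and the variable and function names are purely illustrative.

```python
# Figures quoted in the experiment above; names are illustrative only.
size_before_mb = 1600  # ~1.6GB in the old (Storable) format, taken as 1600MB
size_after_mb = 750    # ~750MB in the new (Sereal) format
save_before_s = 30     # seconds to save in the old format
save_after_s = 12      # seconds to save in the new format


def reduction(before, after):
    """Return the fractional reduction going from `before` to `after`."""
    return (before - after) / before


size_reduction = reduction(size_before_mb, size_after_mb)
save_reduction = reduction(save_before_s, save_after_s)

print(f"file size: {size_reduction:.0%} smaller")
print(f"save time: {save_reduction:.0%} shorter")
```

On these rough numbers the file is a little over half the size and the save takes 60% less time.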

What do users need to be aware of?  The main thing is that files saved in the new format are not backwards compatible: Biodiverse version 1.1 or earlier cannot open files that version 2 saves in its default (Sereal) format.  However, the "save as" dialogues have an option to save to the old format, so you can maintain compatibility with older versions if you are in a mixed environment.

Also, any file in the old format that is loaded into Biodiverse version 2 will still be saved using the old format unless the user explicitly saves it to the new format.
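If you are in a mixed environment and are not sure which format a given file is in, the first few bytes usually give it away.  The sketch below is not part of Biodiverse.  It relies on the magic bytes that Storable writes at the start of its files ("pst0") and that Sereal writes at the start of its documents ("=srl", or "=\xF3rl" from protocol version 3), and it assumes that Biodiverse files are plain Storable or Sereal streams with no extra wrapper.  The function names are hypothetical, and format_for_save simply restates the rule described above.

```python
# Magic bytes defined by the Storable and Sereal formats themselves.
# ASSUMPTION: Biodiverse files begin with these bytes (no extra wrapper).
STORABLE_FILE_MAGIC = b"pst0"
SEREAL_MAGICS = (b"=srl", b"=\xf3rl")  # protocol v1-2, and v3 onwards


def sniff_format(header: bytes) -> str:
    """Guess the serialisation format from the leading bytes of a file."""
    if header.startswith(STORABLE_FILE_MAGIC):
        return "storable"
    if header.startswith(SEREAL_MAGICS):
        return "sereal"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first four bytes of a file and guess its format."""
    with open(path, "rb") as fh:
        return sniff_format(fh.read(4))


def format_for_save(loaded_format=None, explicit_choice=None):
    """Hypothetical helper restating the version 2 save rule.

    An explicit choice in the "save as" dialogue always wins.  Otherwise a
    file loaded in the old format stays in the old format, and anything
    else gets the new default.
    """
    if explicit_choice is not None:
        return explicit_choice
    if loaded_format == "storable":
        return "storable"
    return "sereal"
```

For example, sniff_file("my_data.bds") (a made-up file name) would return "storable" for a file saved by version 1.1, under the assumption stated above.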

If you want to test the new file format then it will be available in the 1.99_004 development release which should be coming out within the next week.


Shawn Laffan
29-Aug-2016


For more details about Biodiverse, see http://purl.org/biodiverse 

For the full list of changes in the 1.99 series (leading to version 2) see https://purl.org/biodiverse/wiki/ReleaseNotes 

To see what else Biodiverse has been used for, see https://purl.org/biodiverse/wiki/PublicationsList


You can also join the Biodiverse-users mailing list at http://groups.google.com/group/Biodiverse-users 
