Vesper: Visualising Species Archives

Graham, M., Kennedy, J. (2014). Vesper: Visualising Species Archives. Ecological Informatics, 24, 132-147.


ISSN: 1574-9541

Abstract

Vesper (Visual Exploration of SPEcies-referenced Repositories) is a tool that visualises Darwin Core Archive (DwC-A) datasets. It aims to reduce the time and effort biologists spend ascertaining the quality of the data they are generating or using. Currently, DwC-A quality checking is limited to table outputs of data ‘existence’ and compliance with DwC-A format guidelines via the online Darwin Core Archive validator and reader. While these tools thoroughly examine the presence of data, and the correctness of data structure against the DwC-A schema, they do not give any insight into the underlying quality of the data itself.

Built on top of the D3 JavaScript library, Vesper analyses and displays DwC-A datasets in three fundamental dimensions (taxonomic, geographic and temporal), with a visualisation dedicated to each of these aspects of the data. By viewing a dataset’s composition in these dimensions, a data consumer can judge whether it is suitable for the tasks or analyses they have in mind, while a data provider can identify where a dataset they have constructed falls short in terms of data quality, for example whether it contains obviously incorrect data such as the classic longitude inversion that places North American specimens in China. A further visualisation of the taxonomic dimension can reveal the subtaxa distribution of reference taxonomies, while a simple table shows whether certain data types are present for each record, giving an overall data ‘existence’ profile for the dataset. Selections made within one visualisation are linked to the other visualisations of that dataset, making it possible to discover whether data quality issues are restricted to identifiable sub-portions of the dataset.
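The longitude inversion described above can also be illustrated outside a visualisation. The following is a minimal sketch, not Vesper's own method: it flags records that fall outside an expected region but would fall inside it if the sign of the longitude were flipped. The bounding box values, the function names and the dict-based record layout are assumptions made for illustration; `decimalLatitude` and `decimalLongitude` are standard Darwin Core terms.

```python
# Hypothetical, deliberately rough bounding box for North America:
# (min_lon, min_lat, max_lon, max_lat). An assumption for illustration only.
NORTH_AMERICA = (-170.0, 5.0, -50.0, 85.0)


def in_box(lon, lat, box):
    """Return True if the point lies inside the bounding box."""
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def suspect_longitude_inversions(records, box=NORTH_AMERICA):
    """Return records that are outside the expected box but would be
    inside it if the longitude sign were flipped."""
    suspects = []
    for rec in records:
        lon = float(rec["decimalLongitude"])
        lat = float(rec["decimalLatitude"])
        if not in_box(lon, lat, box) and in_box(-lon, lat, box):
            suspects.append(rec)
    return suspects


records = [
    {"id": "a", "decimalLatitude": 40.0, "decimalLongitude": -105.0},  # Colorado
    {"id": "b", "decimalLatitude": 40.0, "decimalLongitude": 105.0},   # lands in China
    {"id": "c", "decimalLatitude": 50.0, "decimalLongitude": 10.0},    # Europe, not an inversion
]
flagged = suspect_longitude_inversions(records)
```

A check like this only suggests candidates for review; a record flagged here may still be correct if the dataset legitimately spans both regions.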

Vesper can handle client-side datasets of a million entities within a browser by judicious use of data filtering, as many of the data types within individual records are not necessary to judge the geographic, temporal or taxonomic distribution and extent of a dataset. Thus, many of the more verbose fields in the file can simply be passed over during an initial data decompression stage. Furthermore, it can provide limited name and structure matching of a dataset against DwC-A packaged reference taxonomies to indicate data quality relative to sources outside the archive. A selection of annotated example scenarios shows how Vesper can reveal data quality issues in DwC-A archives.
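The field filtering idea can be sketched as follows. This is an illustrative example under stated assumptions, not Vesper's implementation (which runs in the browser): it assumes a tab-delimited core data file whose first row is a header of Darwin Core term names, whereas a real archive describes its columns in `meta.xml`. The function name `read_selected_fields` and the choice of fields to keep are hypothetical; the term names themselves are standard Darwin Core.

```python
import csv
import io

# Fields assumed sufficient for taxonomic, geographic and temporal views.
KEEP = ("taxonID", "scientificName", "decimalLatitude",
        "decimalLongitude", "eventDate")


def read_selected_fields(handle, keep=KEEP, delimiter="\t"):
    """Yield one dict per record containing only the wanted columns,
    so verbose fields are never retained in memory."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    # Work out once which column positions are worth keeping.
    wanted = [(i, name) for i, name in enumerate(header) if name in keep]
    for row in reader:
        yield {name: row[i] for i, name in wanted if i < len(row)}


sample = (
    "taxonID\tscientificName\toccurrenceRemarks\t"
    "decimalLatitude\tdecimalLongitude\teventDate\n"
    "1\tPuma concolor\tA long free-text note that is not needed\t"
    "40.0\t-105.0\t1999-05-01\n"
)
rows = list(read_selected_fields(io.StringIO(sample)))
```

Because the function is a generator, records are processed one at a time and the skipped columns are discarded as each row is read.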

Authors

Jessie Kennedy
Dean of Research and Innovation
j.kennedy@napier.ac.uk
+44 131 455 2772

Areas of Expertise

Information Visualisation
Information visualisation is the use of enhanced Graphical User Interfaces (GUIs) to communicate and interact with complex data sets such as social networks, multiattribute tables, and financial data. Pages of text and numbers are not the most effective way to communicate or understand information.