The overall work plan is described by the figure. Taxonomists producing traditional manuscripts can send electronic copies of the final (i.e. accepted for publication) manuscript, which the WP6 team will mark up into an XML format. This file will then be mined for taxonomic information, which will be stored in a data warehouse. Note that the XML repository is not directly accessible to users, and the manuscripts cannot be reconstructed from it, so they are not being published on the web.
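The mining step described above can be sketched as follows. This is a minimal illustration only: the element names (`treatment`, `taxonName`, and so on) and the flat record layout are assumptions for the example, not the project's actual schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment of a marked-up manuscript. The tag names are
# illustrative stand-ins, not the WP6 markup schema.
SAMPLE = """
<treatment>
  <taxonName rank="species">
    <genus>Papilio</genus>
    <species>machaon</species>
    <author>Linnaeus, 1758</author>
  </taxonName>
  <distribution>Palaearctic</distribution>
</treatment>
"""

def mine_treatment(xml_text):
    """Extract a flat record suitable for loading into a data warehouse."""
    root = ET.fromstring(xml_text)
    name = root.find("taxonName")
    return {
        "rank": name.get("rank"),
        "genus": name.findtext("genus"),
        "species": name.findtext("species"),
        "author": name.findtext("author"),
        "distribution": root.findtext("distribution"),
    }

record = mine_treatment(SAMPLE)
```

The point of the sketch is the one-way nature of the process: only selected data elements are extracted and warehoused, which is why the original manuscript cannot be reconstructed from the repository.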
The data warehouse will hold collated and structured data, accessible via a web site either independently or as part of the scratchpads. The data warehouse will also support an export facility (a web services client) for machine-to-machine communication. By this route, the information in the warehouse will be accessible to the tools being built by WP5 in the cyber-platform.
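The machine-to-machine export might work along these lines. The record fields and the JSON payload shape are hypothetical, chosen only to illustrate how warehouse data could be packaged for consumption by the WP5 tools.

```python
import json

# Hypothetical warehouse rows; the field names are illustrative only.
WAREHOUSE = [
    {"taxon": "Papilio machaon", "rank": "species", "status": "accepted"},
    {"taxon": "Papilio sphyrus", "rank": "species", "status": "synonym"},
]

def export_records(records, status=None):
    """Return a JSON payload such as the export facility might deliver
    to a consuming tool; filtering by status stands in for a query."""
    selected = [r for r in records if status is None or r["status"] == status]
    return json.dumps({"count": len(selected), "records": selected})

payload = export_records(WAREHOUSE, status="accepted")
```

A structured, self-describing payload like this is what makes the warehouse usable by other machines rather than only by people browsing the web site.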
The output from the cyber-platform will take the form either of refinements and corrections to the data warehouse, or of new revisionary work that will go through some form of quality control (e.g. peer review for traditional publication) before being formalised for capture through the XML process, thus completing the cycle.
Other database servers, e.g. GBIF, are expected to be granted access to the XML repository for the purpose of data mining. Other public databases, including GBIF, will in turn be used as sources of information to assist in analysing the XML documents and importing the data into the warehouse. This is part of the quality-control mechanism, which seeks to normalise the data (i.e. to remove ambiguous data elements).
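The normalisation idea can be sketched as below. The reference list here is a local stand-in for a lookup against a public database such as GBIF; the canonical forms and matching rule are assumptions for the example.

```python
# A toy normalisation step: incoming name strings are matched against a
# reference list (standing in for a query to a public database such as
# GBIF) so that variant spellings collapse to one canonical form.
REFERENCE = {
    "papilio machaon": "Papilio machaon Linnaeus, 1758",
}

def normalise(raw_name):
    """Return the canonical form if the name is known; None signals that
    the record is ambiguous and needs quality-control attention."""
    key = " ".join(raw_name.lower().split())  # collapse case and spacing
    return REFERENCE.get(key)

canonical = normalise("  Papilio   MACHAON ")
```

Names that fail to match are exactly the "ambiguous data elements" the quality-control mechanism is meant to catch before they enter the warehouse.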