Workshop Organizers: Dr David Roberts (Natural History Museum), Dr Soraya Villalba (Royal Botanic Gardens, Kew), Dr William Baker (Royal Botanic Gardens, Kew), Dr Jane Smith (Natural History Museum), Melinda Trudgen (Royal Botanic Gardens, Kew), Dr Simon Mayo (Royal Botanic Gardens, Kew).
Speakers: Dave Roberts, EDIT WP6 leader; Donat Agosti, taxonomist; Willi Egloff, copyright lawyer; Naomi Korn, IP consultant; Vishwas Chavan, GBIF
Rapporteurs: Vladimir Blagoderov, Irina Brake, Simon Mayo, Eckhard von Raab-Straube, Simon Rycroft, Lisa Walley
Under the paradigm of journal publication, authors’ intellectual property (IP) has traditionally been respected by citation and copyright has been handled by publishers. Few authors, however, have understood these procedures in detail. Moving taxonomy onto Web 2.0 demands that we clarify these relationships and presents entirely new problems in managing authors’ rights. Following the EDIT Forward Scoping Group Report (Taxonomy in Europe in the 21st Century) we expect that taxonomy will move from an “artisanal” to an “industrial” scale of production, which means that many individuals will deliver components to be assembled into a finished treatment.
This context lends urgency to finding a solution for these problems and the workshop aims to identify those issues that may inhibit the progress of using web tools to facilitate collaborative working, i.e. the potential Intellectual Property (IP) obstacles to achieving EDIT’s goals.
Traditional taxonomic procedures have produced more than 90 million printed pages of information covering descriptions and the relationships between organisms. The rate of production has followed a power law, so by far the greater part of this is covered by copyright and only a tiny fraction is electronically accessible. There is no central index to this information and it is dispersed into libraries across the world, leading to a generally very poor level of accessibility and contributing directly to the 'taxonomic impediment'.
These data, descriptions and statements of relationship, are factual and are not, therefore, 'original work' in the sense of copyright so there is no legal barrier to the work being abstracted for re-presentation. Copyright in this case only prohibits the reproduction of the work 'as it is'. It is an almost inviolate convention that the original author be credited with the description and relationship statements or parts thereof.
Opinion differs on the extent to which copyright is a major barrier, according to national culture and legal systems. In general some level of risk assessment is necessary and a sensible way forward for EDIT is to formulate guidelines that taxonomists and their institutions can use to implement standard procedures.
Printed publications are structured to be read by humans not machines, so technical barriers exist to solving the access problem. XML schemas, such as TaxonX and TaXMLit, can be applied to human-readable text to indicate to computers which data element is which, for instance what is a taxon name, what is a taxonomic authority and what is a description. Such mark-up makes the text available to computer searching and retrieval. Tools to add these marks to the text are being made available, such as the GoldenGATE project.
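As a rough illustration of what such mark-up makes possible, the sketch below pulls names, authorities and descriptions out of a small marked-up fragment. The element names (`treatment`, `name`, `authority`, `description`) and the sample text are hypothetical placeholders chosen for readability; they are not taken from the TaxonX or TaXMLit schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical mark-up: these element names are illustrative only and are
# NOT the real TaxonX or TaXMLit vocabulary.
SAMPLE = """<treatment>
  <nomenclature>
    <name>Aus bus</name> <authority>Smith, 1900</authority>
  </nomenclature>
  <description>Body length 3 mm; wings hyaline.</description>
</treatment>"""


def extract_elements(xml_text):
    """Return the text of each marked-up data element, grouped by kind."""
    root = ET.fromstring(xml_text)

    def texts(tag):
        # Collect the stripped text of every element with the given tag.
        return [e.text.strip() for e in root.iter(tag) if e.text]

    return {
        "names": texts("name"),
        "authorities": texts("authority"),
        "descriptions": texts("description"),
    }


if __name__ == "__main__":
    for kind, values in extract_elements(SAMPLE).items():
        print(kind, values)
```

Once the elements are tagged, a search for a taxon name no longer has to guess which words in a paragraph are the name and which are the authority.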
There is no reasonable doubt that large homogeneous data sets are of greater value in modern science than small heterogeneous sets and these tools present the potential to create a very large, searchable information source directly from the existing literature. Agosti pointed out that an NLM/TaxonX module is in production, and PLoS ONE, BMC and Zootaxa are working on implementing taxonomy-specific mark-up in their production XML. Technical constraints mean that taxon-xml might be added as additional material rather than being integral to the publication.
Technical barriers are the least likely to be problematic in the medium and long term since there is so much innovation under way in the new collaborative Web 2.0 environment. Taxonomic institutions rather than individual taxonomists will have to bear the burden of adopting and maintaining technical infrastructures to make a global “Taxonomic Grid” feasible. This is a political and financial problem which a consortium like EDIT could help to resolve through concerted action, including major grant proposals.
Credit is used here to mean acknowledging the work someone has put into a resource, specifically here a piece of taxonomic information. The conventional way to give credit is by citation of the original work in subsequent derivative works. Traditionally both the original work and the derivative work would be published on paper and the citation index has been adopted as a convenient metric of credit, both for the author and their employer.
An immediate consequence of citing a web page is that the cited version should not change. In a dynamic web site designed to reflect current opinions, as envisaged in the EDIT project, this means that pages need to be archived so that following the citation leads the reader back to the same page that the person quoting the citation actually saw.
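The archiving requirement can be pictured with a minimal in-memory sketch in which every published revision of a page is kept and a citation names one specific revision. The class name `PageArchive`, the `page@vN` citation form and the sample page are assumptions made for illustration; they do not describe any existing EDIT system.

```python
class PageArchive:
    """Keeps every published revision so old citations stay resolvable."""

    def __init__(self):
        self._versions = {}  # page_id -> list of revisions, oldest first

    def publish(self, page_id, content):
        """Store a new revision and return a citation that pins it."""
        versions = self._versions.setdefault(page_id, [])
        versions.append(content)
        return f"{page_id}@v{len(versions)}"

    def current(self, page_id):
        """Return the latest revision, i.e. what a casual visitor sees."""
        return self._versions[page_id][-1]

    def resolve(self, citation):
        """Return exactly the revision that the citing author saw."""
        page_id, _, version = citation.rpartition("@v")
        return self._versions[page_id][int(version) - 1]


if __name__ == "__main__":
    archive = PageArchive()
    cited = archive.publish("aus-bus", "Aus bus: wings hyaline.")
    archive.publish("aus-bus", "Aus bus: wings hyaline, 3 mm long.")
    print(archive.resolve(cited))      # the revision that was cited
    print(archive.current("aus-bus"))  # the page as it stands now
```

The design choice is that a citation never points at "the page" but at a numbered revision, so the live page can keep changing without invalidating earlier citations.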
The clear identification of data contributors is a statement of responsibility and lends authority to a data set or taxonomic statement and it generates kudos for the authors. We wish to measure kudos because it provides evidence for status and can be used for career advancement.
The measurement of kudos is related to the level of use to which a datum is put. Use can be measured in the number of times a page is read, the number of times it is downloaded, the number of times it is cited (both in traditional publishing and on other web sites) and the number of times it is used in other derivative works. All of these measures are technically straightforward but are contingent on common standards for making the attribution information accessible to subsequent users and the general adoption of systems to allow "deep citation", as described by Chavan, which would attach metadata to a datum that remain accessible no matter how many times it has been copied and re-used. The Exif image metadata format is an example of a means of storing machine-readable copyright information and permissions for images mounted on the web. An obvious candidate to hold and resolve such metadata would be LSIDs.
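One way to picture attribution that travels with a datum is the sketch below, in which every copy keeps the original attribution record and use is tallied per creator. This is only an illustration of the idea, not Chavan's proposal as presented; the class names, the `urn:example:` identifier and the licence string are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attribution:
    creator: str
    identifier: str  # hypothetical persistent identifier (an LSID could sit here)
    licence: str


@dataclass(frozen=True)
class Datum:
    value: str
    attribution: Attribution


def reuse(datum, new_value=None):
    """Copy a datum into a derivative work, keeping its attribution attached."""
    value = datum.value if new_value is None else new_value
    return replace(datum, value=value)


def credit_tally(data):
    """Count how many times each creator's data appear across the given uses."""
    return Counter(d.attribution.creator for d in data)


if __name__ == "__main__":
    original = Datum(
        "Aus bus: wings hyaline.",
        Attribution("A. Author", "urn:example:datum:1", "example licence"),
    )
    copy_one = reuse(original)
    copy_two = reuse(copy_one, "Aus bus has hyaline wings.")
    print(credit_tally([original, copy_one, copy_two]))
```

Because the attribution record is immutable and copied along with the value, a copy of a copy still credits the original creator.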
As a matter of principle, EDIT databases are managed repositories and do not own the rights to the data they contain. The ownership of the data remains with the data creator but such data should have a clear licence statement specifying the manner of use to which they may be put.
A clear data use policy is highly desirable to give confidence to data contributors. It is not yet clear how such a policy applies to globally available information in the context of national legislation and therefore what level of protection licences really provide.
It is not at all clear how we should manage differently licensed materials within a single web page. If the web page is dynamic, then the licence should be accessible to the software building the page. For internal resources we need to determine whether there are any real conflicts and to determine the desired outcomes for both contributors and their employers. These issues are newly emerging and few people are in a position to engage in a policy discussion as yet. It will be of interest to find out whether a conflict exists between the desires of contributors and their employers. For external content, it is simply not feasible to expect the software engineer to have sought permission to use all resources that a dynamic page might recover.
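As a sketch of what making the licence accessible to the page-building software could mean in practice, the example below assembles a page from components that each carry their own licence statement and sets aside any component that has none. The `Component` structure and the rule of excluding unlicensed items are assumptions for illustration only; the workshop did not agree a policy on this point.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Component:
    text: str
    creator: str
    licence: Optional[str]  # None means no licence statement was supplied


def build_page(components: List[Component]) -> Tuple[str, List[Component]]:
    """Assemble a page, showing the creator and licence beside each component.

    Components without a licence statement are returned separately so that
    a person, not the software, can decide what to do with them.
    """
    included = [c for c in components if c.licence]
    excluded = [c for c in components if not c.licence]
    lines = [f"{c.text} [{c.creator}; {c.licence}]" for c in included]
    return "\n".join(lines), excluded


if __name__ == "__main__":
    page, needs_review = build_page([
        Component("Description of Aus bus.", "A. Author", "example licence A"),
        Component("Distribution map.", "B. Mapper", "example licence B"),
        Component("Habitat photograph.", "C. Photographer", None),
    ])
    print(page)
    print("Needs review:", [c.text for c in needs_review])
```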
There are different contractual issues relating to institutional contributions and potentially third parties, such as funding bodies. How is this represented downstream: ownership? control of use? attribution?
We look forward to receiving requests for commercial use of taxonomic data and having to deal with royalty payments! Current practice reserves all commercial rights for the data originator although we are not aware of a widespread mechanism to tag data elements with a rights statement.
One of the fundamental arguments for publishing on paper is long-term persistence. As we move to a web-based environment the question of long-term accessibility will need to be addressed. This may be a problem unique to taxonomy, which needs to find and use information from the time of Linnaeus onwards.
For the scientists who produce the data, credit for authors and originators matters much more than copyright. Communication of scientific results should prevail over commercial considerations.
There are a number of changes that speakers forecast would either take place or would increase in importance. It would be highly desirable for the Codes to allow e-publications. Social networks are becoming widely accepted and examples are beginning to appear in taxonomy, such as the Scratchpads. Most importantly, taxonomists need to be prepared to share their data in the way that happens in other areas, for instance physics and molecular biology.
The key social barrier is recognition for scientific work made available publicly. Technically, developing new crediting systems poses few problems, but the barrier is the current publication paradigm which is based on hypothesis-driven arguments and a particular, strictly defined citation system. Clear principles and methodologies regarding recognised publication of scientific data through the internet must be enunciated and accepted by the scientific community. Developing a workable peer review system for web-delivered data and publications is an indispensable element.
EDIT will prepare prototype copyright policies for institutions which can be modified to suit local requirements.
The practice of academic research will be the only motivation to change law and build up infrastructure.
This workshop initiated a debate that could not be resolved within a single day. It is clear that the management of taxonomic data within the current rights framework is complicated and uncertain. Much as taxonomists would like clear and simple guidance on how to stay within the rules of copyright, this is not possible, largely because the copyright framework evolved before the digital revolution and is not keeping pace with current capability and practice.
The reality of the situation is that copyright management is a managerial judgement call. The majority of copyrighted taxonomic material is out of print and therefore not available for sale, so the amount of commercial damage that might be caused by extracting descriptive and nomenclatural information for re-structuring into a web page is negligible. There is, however, at least in the UK, a climate of fear that a breach of rights legislation may have dire consequences, possibly engendered by the strategy adopted by the music industry and the rules around photocopying enforced in libraries. It is not easy to see how this has come about because taxonomists are doing now what they have always done, but using tools that allow them to do it on a rather larger scale and publishing the outcome outside the commercial publishers.
The barriers to moving taxonomy to the web are now seen to be primarily sociological rather than technical. The crucial change that needs to be made is for employers of taxonomists to give career value to those contributing content to taxonomic web services. In this vein, grant-awarding bodies need to support data gathering with greater enthusiasm than they have already shown for infrastructure proposals.
Donat Agosti has also written about this meeting on his blog.
| Donat Agosti | Science Consultant, Switzerland |
| Natasha Ali | Royal Botanic Gardens Kew |
| Anta | MNHN Paris |
| Bill Baker | Royal Botanic Gardens, Kew |
| Alberto Ballerio | |
| Christine Barker | Royal Botanic Gardens, Kew |
| Laurence Bénichou | Muséum national d'Histoire naturelle |
| Vladimir Blagoderov | Natural History Museum, London |
| Irina Brake | Natural History Museum, London |
| Vishwas Chavan | GBIF |
| John Dickie | Royal Botanic Gardens Kew |
| Willi Egloff | Advocomplex, Switzerland |
| Régine Fabri | National Botanic Garden of Belgium |
| Thomas Haevermans | Muséum National d'Histoire Naturelle, Paris |
| Anna Haigh | Royal Botanic Gardens Kew |
| Ralf Hand | Botanic Garden and Botanical Museum Berlin-Dahlem |
| Nicole Hanquart | National Botanic Garden of Belgium |
| Craig Hilton-Taylor | IUCN |
| John Jackson | Natural History Museum, London |
| Thomas Janssen | Research Institute Senckenberg |
| Yde de Jong | Zoological Museum Amsterdam |
| Dr Stephen L. Jury | The University of Reading |
| Naomi Korn | Naomi Korn Copyright Consultancy |
| Gwilym Lewis | Royal Botanic Gardens Kew |
| Rien van der Linden | Centraalbureau voor Schimmelcultures (CBS), Utrecht, NL |
| Chris Lyal | The Natural History Museum, London |
| Patricia Malcolm-Tompkins | Royal Botanic Gardens, Kew |
| Julien Marmayou | Muséum national d'Histoire naturelle |
| Simon Mayo | Royal Botanic Gardens Kew |
| Dr Gianfranco Novarino | The Natural History Museum, London |
| Rupert Osborn | Royal Botanic Gardens Kew |
| Alan Paton | Royal Botanic Gardens Kew |
| Sarah Phillips | Royal Botanic Gardens, Kew |
| Bob Press | The Natural History Museum, London |
| Eckhard von Raab-Straube | Botanic Garden and Botanical Museum Berlin-Dahlem |
| Dave Roberts | The Natural History Museum, London |
| Simon Rycroft | The Natural History Museum, London |
| Bernard Scaife | The Natural History Museum, London |
| Monika Shaffer-Fehre | Royal Botanic Gardens, Kew |
| Jane Smith | The Natural History Museum, London |
| Vince Smith | Natural History Museum, London |
| Melinda Trudgen | Royal Botanic Gardens Kew |
| Soraya Villalba | Royal Botanic Gardens Kew |
| Lisa Walley | The Natural History Museum, London |
| Julius Welby | The Natural History Museum, London |
| Sue Zmarzty | Royal Botanic Gardens Kew |