http://nature.com/cgi-taf/DynaPage.taf/… n6884/full/417017a_fs.html

Commentary
Nature 417, 17-19 (02 May 2002); doi:10.1038/417017a

Challenges for taxonomy

H. CHARLES J. GODFRAY

H. Charles J. Godfray is at the NERC Centre for Population Biology, Department of Biological Sciences, Imperial College at Silwood Park, Ascot, Berkshire SL5 7PY, UK.

The discipline will have to reinvent itself if it is to survive and flourish.

Taxonomy, the classification of living things, has its origins in ancient Greece and in its modern form dates back nearly 250 years, to when Linnaeus introduced the binomial classification still used today. Linnaeus, of course, hugely underestimated the number of plants and animals on Earth. As subsequent workers began to describe more and more species, often in ignorance of each other's work, the resulting confusion and chaos threatened to destroy the whole enterprise while still in its infancy. In today's jargon, we might call this the first bioinformatics crisis. Using the tools then available, nineteenth-century taxonomists solved this crisis in a brilliant way that has served the subject well since then. They invented a complex set of rules that determine how a species should be named and associated with a type specimen; how generic and higher taxonomic categories should be handled; and how conflicts over the application of names should be resolved. All these rules revolved around publications in books and scientific journals, and their descendants form the current codes of zoological and botanical nomenclature. But today much of taxonomy is perceived to be facing a new crisis: a lack of prestige and resources that is crippling the continuing cataloguing of biodiversity. In the United Kingdom, a Parliamentary Select Committee is currently conducting an enquiry into the health of the subject for the second time in 10 years, and similar concerns are being expressed around the world.
In this article I shall first explore why descriptive taxonomy is in such straits (in contrast, its sister subject, phylogenetic taxonomy, is flourishing). Then, after this essentially negative exercise, I will argue that taxonomy can prosper again, but only if it reinvents itself as a twenty-first-century information science. It needs to adopt some of the solutions that molecular biologists have developed to cope with the second bioinformatics crisis: the huge explosion of sequence, genomic, proteomic and other molecular data.

The problem

Why can't descriptive taxonomy attract large-scale funds in the same way as other big programmes like the Human Genome Project or the Sloan Digital Sky Survey? All three projects are enabling science: not in themselves generating new ideas or testing hypotheses, but allowing many new areas of research to be opened up. One reason is that taxonomists lack clearly achievable goals that are both realistic and relevant. Of course it would be great to describe every species of organism on Earth, but we are still monumentally uncertain as to how many species there are (probably somewhere between 4 million and 10 million); this goal is just not realistic at present. There are various projects aimed at listing, for example, all the valid described species of animal in Europe, or butterflies on Earth (see Box 1). These aims are eminently achievable and very worthwhile, but the results are like raw, unannotated DNA sequences: unexciting and of relatively little value in themselves to non-specialists. Taxonomists need to agree on deliverable projects that will receive wide support across the biological and environmental sciences, and attract public interest. A second problem is part of the legacy of more than 200 years of systematics.
Many taxonomists spend most of their career trying to interpret the work of nineteenth-century systematicists: deconstructing their often inadequate published descriptions, or scouring the world's museums for type material that is often in very poor condition. A depressing fraction of published systematic research concerns these issues. In some taxonomic groups the past acts as a dead weight on the subject, the complex synonymy and scattered type material deterring anyone from attempting a modern revision. As Frank-Thorsten Krell pointed out in Correspondence (Nature 415, 957; 2002), "original descriptions have to be referred to for ever, independent of the paper's quality". The problems do not always lie in the past. Even today, many species are being described poorly in isolated publications, with no attempt to relate a new taxon to existing species and classifications. Many of these 'new' species will have been described before, so sorting out the mess will be the headache of the next generation of taxonomists. It is not surprising if funding bodies view much of what taxonomists do as poor value for money. One of the astonishing things about being a scientist at this particular time in history is the vast amount of information that is available, essentially free, via one's desktop computer. I can download the sequences of millions of genes, the positions of countless stars. Yet, with a few wonderful exceptions, the quantity of taxonomic information available on the web is pitiful, and what is present (typically simple lists) is of little use to non-taxonomists. But surely taxonomy is made for the web: it is an information-rich subject, often requiring copious illustrations. At present, the output of much taxonomy is expensive printed monographs, or papers in low-circulation journals available only in specialized libraries. These are not attractive 'deliverables' for major research funders. 
Two models of taxonomy

The taxonomy of a group of organisms does not reside in a single publication or a single institution, but instead is an ill-defined integral of the accumulated literature on that group. The literature is bound together and cross-references itself using the venerable rules of taxonomy encapsulated in the codes. But this is not the only way to organize a taxonomy. The taxonomy of a particular group could reside in one place and be administered by a single organization. It could be self-contained and require reference to no other sources. My main argument is that to address the problems outlined above, and for taxonomy to flourish now and in the future, it has to move from the first to the second model: from having a distributed to a unitary organization. Such a massive task could only be accomplished group by group, as resources became available. I believe a number of things would then follow. First, the only logical way to organize a unitary taxonomy and to make it widely available is on the web. The web is currently used, if used at all, as an adjunct to the distributed, printed taxonomy, but I think it should replace it. Second, the core of taxonomy is a description of each species and a means of distinguishing among them; to this core has been added the exercise of resolving their evolutionary relationships. I believe that taxonomy needs to expand to include other aspects of the species' biology, to become an information science that curates our accumulated knowledge of that species in the way a gene annotation in a genome database organizes our knowledge of a particular protein. Third, I think it is essential that the unitary taxonomy of different groups evolves from the present taxonomy. We must preserve the achievements of 250 years of distributed taxonomy, dispensing with the bad legacy of the past but retaining the good. To illustrate how this could be done I shall sketch one possible way a unitary taxonomy might be achieved.
I am not a professional taxonomist and am under no illusion that what follows will be the best or even a viable model, but I hope it will bring out the issues involved.

A unitary taxonomy

Introduce as a formal taxonomic procedure the 'first web revision'. This would be a revision of a major group of organisms to a standard decided on by the International Commission on Zoological Nomenclature, or the International Botanical Congress, or equivalent body (let's just call it the international committee). The revision would include a traditional description of each taxon and the location of type material. It might also include material not currently required in a formal description, for example keys and, for many groups, photographs or other illustrations. For some organisms a gene sequence might be required. It would also include a treatment of existing known synonyms to preserve contact with the older literature. This draft first web revision would be placed on the web for comments from the community, then after changes have been made in response, it would become the unitary taxonomy of the group. What would this mean? First, from this time onwards all future work on the group need refer only to the set of species in the first web revision and then later to those in the 'nth (that is, current) web revision'. The taxonomy of the group is thus at a stroke liberated from nineteenth-century descriptions and potentially undiscovered synonyms. If I think I have discovered a new species I need only to check that it is not already in the web revision. So what happens if I describe a new species and then someone discovers that Linnaeus or someone had already described it in an overlooked work? Well, that interesting nugget of historical information can be added to the species' web page, but the name doesn't change. What happens if I want to lump, split or add species, or revise their higher classification?
Then I submit a revision that is mounted on the web for refereeing and comment. If, as a result, it is accepted, it becomes incorporated into the current ((n+1)th) web revision. At any one time there is just a single current web revision to which people refer, linked to all previous revisions (which are maintained on the web, so that in future I can easily see what was understood by species x in year y). A major difference between this way of doing taxonomy and the status quo is that a unitary taxonomy needs administration: both the physical implementation on servers and networks, and the intellectual administration of the current web revision. One virtue of the present system is that if no one is interested in a group's taxonomy it can quietly slumber in the library. But the collections and type material that underpin distributed taxonomies do require administration, which is currently undertaken by our great museums and herbaria. Nearly all these organizations are enthusiastically embracing modern web technologies. Hosting web revisions is something I see as a logical extension of their moves towards becoming, in part, modern information storehouses. It is absolutely clear, however, that they need more money in order to do this. They might also undertake the intellectual administration of the web revision (the refereeing and editing), although they would probably devolve this to committees drawn from a wider constituency (the equivalent of a journal's editorial board). However it worked, standards would need to be set and monitored by the international committee, who would also determine which institute houses which taxonomy, and would prevent duplication of effort.

Advantages

I believe that what I have described is evolutionary rather than revolutionary in that it preserves the hard-won successes of current taxonomy while dispensing with the historical baggage.
It is also evolutionary in that groups would move to the new unitary taxonomy as resources became available. It would set a series of achievable targets that could be used to spur major funding initiatives, for example the first web revision of mosquitoes, reptiles or plants (and I hope Nature or Science might celebrate these milestones as they do completed genome sequences). I believe that major government and private research funders would consider construction and maintenance of a unitary taxonomy — universally accessible, and the foundation of all future work on the group — much more attractive to support than taxonomy as presently practised. It might also attract new sources of funding. It surely isn't impossible that a major company might sponsor the web revision of, say, the Lepidoptera (butterflies and moths); and if it wants to put its logo on the site, then why not? The web revision would become an information hub, both through its contents and through its links to other sites. Links to molecular databases will facilitate the increasing usefulness of molecular techniques in species identification. There are already exciting web-based phylogenetic projects (see Box 1) that aim ultimately to build a phylogeny of all living organisms; clearly, one would build in reciprocal links to these sites. Today, a reference to a species in a scientific article usually gives just the scientific name and possibly the authority, but seldom refers (or gives credit) to the taxonomic revision upon which the identification is based. As increasing numbers of journals go electronic, the mention of a species can more and more easily be linked to its position in the current web revision. Were the status of the species to change, the link would take you to the contemporary web revision and then forward to the current conception of the taxon. These links could also be used to produce a much-needed, fair 'citation count' for taxonomists. 
Finally, as an increasing amount of the scientific literature becomes available online through projects such as JSTOR (http://www.jstor.org/), one can imagine links between a species description and important early papers on its taxonomy and biology, again maintaining links with the good legacy of distributed taxonomy. Many taxonomic works are very hard for non-specialists to use, sometimes because of real difficulties in telling many species apart, but more often because of the telegraphic jargon and lack of illustration imposed on taxonomists by the expense of publication in print. The web has far fewer constraints, and provides the space needed for taxonomists to be understood. Taxonomy often pays insufficient attention to its 'end users', the ecologists, conservationists, pest managers and amateur naturalists who need or want to identify animals and plants. I hope that, overlaid on the current web revision, there would be higher-level information, the equivalent of the regional field guides and floras used by field workers. For many, this 'entry level' would be all that is required, but where needed the user could burrow deeper, right through to the primary taxonomic sources. Today, few people would seriously think about taking a computer into the field as a substitute for a field guide, but that will undoubtedly change and taxonomists should be ready. Finally, the taxonomy should be available free (without access charges) to anyone who can log onto the Internet. This will raise the profile of taxonomy and increase the number of people who actually use the fruits of taxonomic research. Longer-term positive benefits will be for a new, young generation of naturalists, stalking their prey using digital cameras, downloading their captures into PCs, then identifying them over the web — exposing them to taxonomy as an active discipline, at the heart of modern biology. 
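
The revision scheme and the dated links described above can be made concrete with a small data model. What follows is only a minimal sketch under stated assumptions, not a proposed implementation: the class names `WebRevision` and `UnitaryTaxonomy`, the `replaced` mapping used to record lumping and splitting, the placeholder species names and all dates are hypothetical and do not come from any existing system. It shows a single current revision linked to every earlier one, and a species mention in an article being resolved first to the revision that was current when the article appeared and then forward to the current conception of the taxon.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class WebRevision:
    """One accepted state of a group's taxonomy (the 'nth web revision')."""
    number: int
    accepted: date        # date on which this revision became current
    species: frozenset    # names valid in this revision
    replaced: dict = field(default_factory=dict)  # old name -> names superseding it


class UnitaryTaxonomy:
    """Hypothetical store holding every revision of one group, oldest first."""

    def __init__(self, accepted, species):
        # The 'first web revision' of the group.
        self.revisions = [WebRevision(1, accepted, frozenset(species))]

    @property
    def current(self):
        return self.revisions[-1]

    def accept(self, accepted, species, replaced=None):
        """Add a refereed submission as revision n+1; earlier revisions are kept."""
        if accepted < self.current.accepted:
            raise ValueError("revisions must be accepted in chronological order")
        rev = WebRevision(self.current.number + 1, accepted,
                          frozenset(species), dict(replaced or {}))
        self.revisions.append(rev)
        return rev

    def in_force(self, on):
        """Return the revision that was current on the given date."""
        earlier = [r for r in self.revisions if r.accepted <= on]
        if not earlier:
            raise LookupError("no web revision existed on that date")
        return earlier[-1]

    def resolve(self, name, cited_on):
        """Map a name cited on a date to its contemporary revision and current names."""
        start = self.in_force(cited_on)
        if name not in start.species:
            raise KeyError(f"{name!r} is not in web revision {start.number}")
        names = {name}
        # Revision numbers are sequential from 1, so this slice holds only later revisions.
        for rev in self.revisions[start.number:]:
            names = {new for old in names for new in rev.replaced.get(old, [old])}
        return start.number, sorted(names)


# Placeholder names and dates, for illustration only.
taxonomy = UnitaryTaxonomy(date(2003, 1, 1), {"Aus bus", "Aus cus"})
# Revision 2 lumps 'Aus cus' into 'Aus bus'.
taxonomy.accept(date(2005, 1, 1), {"Aus bus"}, {"Aus cus": ["Aus bus"]})
# Revision 3 splits 'Aus bus' into two species.
taxonomy.accept(date(2008, 1, 1), {"Aus bus", "Aus dus"},
                {"Aus bus": ["Aus bus", "Aus dus"]})

print(taxonomy.resolve("Aus cus", date(2004, 6, 1)))
# (1, ['Aus bus', 'Aus dus'])
```

Because old revisions are never overwritten, the question of what was understood by species x in year y becomes a lookup rather than a literature search. Counting how often each revision is the target of such a lookup would be one possible basis for the citation count mentioned above.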
Disadvantages

One disadvantage of a unitary taxonomy is the requirement for more administration, with its attendant costs. My assertion is that the advantages of a unitary taxonomy will prime sufficient new funds to counterbalance this, but if I'm wrong the project fails. There are also considerable technological challenges in developing the web software to support the taxonomies. A possible criticism is that the proposal is top-down, at variance with the individualistic tradition of taxonomy. Would one clique be able to impose its view of how a group is classified? The international committee would be empowered to set standards, but rejected contributions to a group's taxonomy should also be stored on the web. Even if they are not incorporated in the current web revision they can at least influence future scholarship and research. An important issue is the degree to which a treatment should be 'complete' before it is a candidate for a first web revision. Could a series of intractable species complexes requiring detailed research delay completion of a revision? The ideal solution would be to commission new taxonomic research to sort out these problems, but if this is not possible I would favour a category of 'provisional taxon', where the need for further study is clearly highlighted. After all, the heterochromatin-rich gaps in the human genome sequence did not delay the announcement of its 'completion'. Is a web-based taxonomy as permanent as a paper-based one, and are people without computers disenfranchised, especially those in less wealthy countries? I believe the first is a non-issue; there is not (as far as I know) a paper back-up to the human genome database, and the international committee would set rigid standards for archiving and backup.
Access is a much more important matter, but very many more people are at present disenfranchised by their inability to get to a specialist library, or to order a reprint, or even by being unaware that certain literature exists. The web-based taxonomy must be completely downloadable so that even continuous access to the Internet is not essential, and, if all else fails, a paper copy could be printed. It might spread the geographical distribution of taxonomic activity if some sites were hosted by developing countries with strengths in computing, such as India.

Conclusions

I find that the commonest reaction of taxonomists to these ideas is the worry that it is an attempted technological fix that distracts attention from what they (and I) perceive to be the overwhelmingly critical issue: the lack of people and resources devoted to descriptive taxonomy. The counter-argument is that the technological fix is not an end in itself; it is the means of making grassroots taxonomy more accessible and useful, and thus attracting people and funds into the field. But is such a root-and-branch change in the culture of taxonomy really needed? Although there is near-universal agreement about the current depressed state of descriptive taxonomy, wouldn't more funding alone solve the problem? I think not: indeed, descriptive taxonomy might disappear completely for 'difficult' groups such as many insects and nematodes. Just as Moore's law says that microprocessor power doubles every 18 months, there must be a parallel law that says DNA sequencing power increases geometrically. In 10 or 20 years' time it will be simpler to take an individual organism and get enough sequence data to assign it to a 'sequence cluster' (equivalent to species) than to key it down using traditional methods, let alone describe it as new.
Just as bacterial taxonomy is now nearly all sequence-based, a new way of classifying insects, nematodes and perhaps even many plants and fish might evolve that is totally divorced from current taxonomy, a point also made forcibly by Robert May, president of Britain's Royal Society. Would the death of large swathes of present-day systematics matter? Yes it would, because we would be throwing away so much of what we have learned in the past 250 years about the planet's biota, a lot of which we would then have to relearn. But unless taxonomy is unitary, web-based and able to accommodate these radical new ways of doing biology, I fear it will be sidelined. The rigidity built into the current rules and codes of taxonomy (which include prohibition of purely electronic description) is part of their success, and changes should not be made lightly. But I suspect these rules are now a brake on progress, imprisoning the subject in outdated methodologies, and rendering it difficult or impossible to attract the major funds needed to reverse its slow decline. Surely it is time to experiment: time for the international taxonomic community to come together and countenance a unitary web revision of one or a few major groups of organisms (and to work out exactly how a unitary taxonomy should operate). This venture must be sanctioned and supported by the existing international committees, or no serious taxonomist will waste his or her time on it; no institution will administer it; and no agency will fund it. If successful, it will change how taxonomy is done for ever; if it fails it would not be difficult to revert to the status quo ante. There is everything to gain and little to lose.

Acknowledgements. I am grateful to the many taxonomists and other biologists who have debated these issues with me.
----------------------------------------------------------------
Box 1: Taxonomy on the web
http://nature.com/nature/journal/…

The current codes of zoological and botanical nomenclature do not allow original descriptions to be made purely on the web, but nevertheless there is a substantial amount of taxonomy on the Internet. The Natural History Portal of the Natural History Museum in London (http://www.nhm.ac.uk/portal/index.html) provides an excellent entry into these resources, which include such sites as the International Plant Names Index (http://www.ipni.org/), which covers all higher plants; the ant database (http://www.antbase.org/) featured recently in Nature's News section (416, 115; 2002); and the Tree of Life project (http://tolweb.org/tree/), a database of phylogenies. The most common data available are catalogues of species names and lists of museum specimens, although some identification keys and other information-rich sites are becoming available. An ambitious project led by Species 2000 (http://www.sp2000.org/) and the Integrated Taxonomic Information System (http://www.itis.usda.gov/) aims to catalogue the world's biota, and these sites themselves also link to the Global Biodiversity Information Facility (http://www.gbif.org/), intended to be a general clearing house for biodiversity information. Finally, the All Species Foundation (http://www.all-species.org/) has set itself the goal of making an inventory of all species on Earth in the next 25 years.
----------------------------------------------------------------
© 2002 Nature Publishing Group