PBDB Data Service: Record identifiers and record numbers

PBDB Data Service 1.2 v2 > General documentation > Record identifiers and record numbers

DESCRIPTION

This page describes the syntax and usage of record identifiers in this data service. Every record in the database has a unique identifier, and in most cases the first column of a query result contains the identifier of each record returned. Some of the other fields hold the identifiers of associated records, such as references, parent taxa, authorizers, and so on. These identifiers can in turn be used as parameter values, i.e. to query for associated records or to later fetch the updated state of an individual record.

OLD STYLE NUMERIC IDENTIFIERS

For most of the history of this project, each record in the database was identified by a numeric key unique within its table. The names of these key fields end in _no, as in occurrence_no or taxon_no. If you wish, you can still use these numbers as parameter values in the current version of the data service. Responses expressed in the PBDB vocabulary show these plain numbers by default, for backward compatibility with old downloads.

EXTENDED IDENTIFIER SYNTAX

The problem with these numeric identifiers is that they don't specify what type of object they refer to. So the value 1027 could refer to a taxon, an occurrence, a specimen, or even an image. This is a problem for several different reasons, not least of which is that the data service has no way to warn you if you accidentally cut and paste from the wrong column, and paste in an occurrence number where you should be putting a taxon number.

To solve this and other problems, we have defined a new identifier syntax composed of a record type together with a record number. For example, the taxonomic name "Felidae" has the following identifier:

txn:41045

The record number is the same as it always was, but this identifier specifies that it refers to a taxon and nothing else.

By default, responses in the compact vocabulary (which is used for JSON responses) include these extended identifiers. Lists of references in the RIS format use them as well. You can turn the extended identifiers on or off explicitly in any query by including extids=yes or extids=no.

CONTINUITY WITH CLASSIC

The identifying number of each individual record is exactly the same whether you retrieve the record through this data service or through PBDB Classic. The record numbers from old downloads that you may have kept will still be the same. Each record type has a particular field name in the PBDB vocabulary which is used to hold the identifying number of records of this type. These are listed in the table below.

RECORD TYPES

The database consists of the following types of records:

Type	Example	Classic field name	Description
txn	txn:285777	taxon_no	A taxonomic name record. An identifier of type txn always selects the currently accepted variant of the corresponding taxonomic name.
txn	var:285777	taxon_no	A taxonomic name record. An identifier of type var selects the exact name corresponding to the identifier, whether or not it is the currently accepted variant.
opn	opn:489175	opinion_no	A taxonomic opinion record.
occ	occ:1054042	occurrence_no	A fossil occurrence record. An identifier of type occ selects the most recent identification of the corresponding fossil occurrence.
occ	rei:12664	reid_no	A fossil occurrence record. An identifier of type rei selects a specific identification of the corresponding fossil occurrence, whether or not it is the most recent. To select the original identification of a specified occurrence, add the parameter idtype=orig.
spm	spm:77971	specimen_no	A fossil specimen record.
mea	mea:156352	measurement_no	A fossil measurement record.
col	col:128551	collection_no	A fossil collection record.
int	int:84	interval_no	A geologic time interval record.
tsc	tsc:1	n/a	A geologic time scale record.
clu	clu:305970253	n/a	A geographic summary cluster record.
php	php:1432	n/a	A Phylopic taxon image.
ref	ref:56433	reference_no	A bibliographic reference record.
prs	prs:55	person_no	A database contributor record. Currently, these are not directly retrievable through the data service. However, you can find out the authorizer and enterer of any particular record by adding the output block ent.

RECORD VARIANTS

One of the advantages of the new extended identifier syntax is that we can handle records with multiple variants in a consistant way that is not possible with purely numeric identifiers. For example, some taxonomic names have two or more variants (recombinations, misspellings, etc.). If you specify an identifier of type txn, you will get the currently accepted variant of that taxonomic name. But if you specify an identifier of type var, you will get the variant explicitly identified by that record number. This kind of distinction would be much more clumsy to implement with purely numeric identifiers. A similar situation obtains with occurrences and re-identifications (see above).

SITE PREFIX

In order to facilitate the combining of PBDB data with data from other databases, this data service also accepts identifiers that are prefixed with the name of the website. The following three variants of each identifier are accepted:

txn:41045
pbdb:txn:41045
paleobiodb.org:txn:41045

You can use these as parameter values interchangeably with the basic identifier syntax. This will allow the future definition of multi-site identifier schemes using identifiers that are unique across all of the sites together. But for now, you can ignore this and simply use first of these three forms.

This service is provided by the Paleobiology Database, hosted by the Department of Geoscience at the University of Wisconsin-Madison.

If you have questions about this data service, or wish to report a bug, please contact the database administrator at admin@paleobiodb.org