International standards and resources

The International Union of Pure and Applied Chemistry (IUPAC) and the International Union of Biochemistry and Molecular Biology (IUBMB) have established the IUPAC–IUBMB Joint Commission on Biochemical Nomenclature, and the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) to oversee nomenclature for biochemistry (the ‘White Book’), including hormones and enzymes.

The Society for In Vitro Biology provides some advice on terminology associated with cell, tissue and organ culture, molecular biology and molecular genetics.

There are no international standards for nucleic acid terminology, but the Human Genome Variation Society (HGVS) has been the principal proponent for a standardised nomenclature system to describe specific variants within human genes. It has published recommendations.

Australian conventions and resources

Australian scientists follow the international standards, including HGVS recommendations.

The following organisations provide general information on biotechnology:

Reminder. See Acronyms and initialisms for information on use of well-known abbreviations for biochemical terms.

Proteins

Caution! Protein naming is very complex. Consult a specialist if you are not sure how to present a name!

Apart from the standardised system for naming enzymes (see next section), there is no standardised naming system for proteins. A protein family is a group of evolutionarily related proteins, usually related to a gene family.

Protein families are sometimes grouped together into larger groups called superfamilies, based on structure and function. Other groupings of proteins are clans, folds and divisions. Use lower case (apart from descriptors) for all protein names at all levels of classification:

serum albumin     actin     fibrinogen     flavoprotein-like superfamily     L-isoleucine dioxygenase

See also Genes and gene products.

Enzymes

International standards and resources

The Swissprot database ENZYME, an enzyme nomenclature database, describes each type of characterised enzyme for which an Enzyme Commission (EC) number has been provided.

The NC-IUBMB provides information on nomenclature of enzymes.

ExplorEnz (developed at Trinity College Dublin) provides a portal to the data of the IUBMB enzyme nomenclature list.

Enzymes are constantly being discovered and named. To avoid ambiguity, a nomenclature was developed to standardise the names of enzymes. Each enzyme has 3 identifiers: an EC number, a recommended name and a systematic name. 

The EC numerical nomenclature classifies enzymes based on reactions, identifying groups of enzymes catalysing similar reactions. Seven categories are recognised:

  • EC 1 – oxidoreductases
  • EC 2 – transferases
  • EC 3 – hydrolases
  • EC 4 – lyases
  • EC 5 – isomerases
  • EC 6 – ligases
  • EC 7 – translocases.

Separate EC and the category number with a space. Additional numbers, separated by full stops, further define the reaction type, and specific metabolites and cofactors involved:

EC 3.5.1.2
3 indicates: hydrolase
5 indicates: acting on carbon-nitrogen bonds, other than peptide bonds
1 indicates: in linear amides
2 gives: glutaminase
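The format described above can be checked mechanically. The following is a minimal Python sketch, not an official tool: the names `EC_CLASSES`, `parse_ec_number` and `ec_class_name` are made up for illustration, and the pattern assumes a complete 4-level EC number with exactly one space after EC.

```python
import re

# Top-level EC categories as listed in this section.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}

# "EC", one space, then 4 numbers separated by full stops (assumed format).
EC_PATTERN = re.compile(r"EC (\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_ec_number(text):
    """Return the 4 levels of an EC number as a tuple of integers."""
    match = EC_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a well-formed EC number: {text!r}")
    levels = tuple(int(part) for part in match.groups())
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown EC category: {levels[0]}")
    return levels


def ec_class_name(text):
    """Return the name of the top-level category for an EC number."""
    return EC_CLASSES[parse_ec_number(text)[0]]
```

For example, `parse_ec_number("EC 3.5.1.2")` gives `(3, 5, 1, 2)` and `ec_class_name("EC 3.5.1.2")` gives `"hydrolases"`, while `"EC3.5.1.2"` (no space) is rejected.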

The recommended name is usually the one that is in common, everyday use; it is usually formed by adding the suffix -ase to the name of the enzyme’s substrate:

EC number: EC 3.5.1.2
Recommended (or accepted) name: glutaminase

A systematic name is used to prevent ambiguity; it provides a brief chemical description of the reaction the enzyme catalyses. It is composed of the name of the substrate(s) followed by a word ending in -ase that specifies the type of reaction catalysed:

Systematic name: L-glutamine amidohydrolase
Reaction catalysed: L-glutamine + H2O <=> L-glutamate + NH3

Use the recommended common name assigned by the NC-IUBMB. If possible, also use the EC number.

Amino acids and peptides

In text, use the common (trivial) names for amino acids (e.g. glycine, alanine, tryptophan).

The standard 3-letter symbols (e.g. Gly, Ala, Trp), with an initial capital letter, can be used in tables, figures and peptide sequences. They can also be used to indicate a residue number in a protein sequence (e.g. Ala-87 for the alanine residue at position 87 in a protein sequence). The 1-letter symbols (e.g. G, A, W) can also be used.

Unless otherwise specified, these symbols represent the L-configuration of the chiral amino acids – that is, the form that occurs in ‘higher’ organisms.

Peptide (including polypeptide) sequences are usually written using the 3-letter symbols for the amino acids, joined by hyphens:

Lys-Ala-Val-Glu-Ala-Phe

Use the 1-letter symbols to show a comparison between 2 sequences. It is important to use a font (such as Courier New) that uses the same horizontal space for each letter:

GDVEKGKKIFIMKCS          
GFSAGDSKKGAKLFK
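Converting between the hyphenated 3-letter form and the 1-letter form is a simple lookup. The sketch below is illustrative only: it covers the 20 standard amino acids, and the function names `to_one_letter` and `to_three_letter` are assumptions rather than part of any standard library.

```python
# 3-letter to 1-letter symbols for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(peptide):
    """Convert a hyphenated 3-letter sequence to 1-letter symbols."""
    # A symbol outside the table raises KeyError.
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


def to_three_letter(sequence):
    """Convert 1-letter symbols to a hyphenated 3-letter sequence."""
    return "-".join(ONE_TO_THREE[symbol] for symbol in sequence)
```

For example, `to_one_letter("Lys-Ala-Val-Glu-Ala-Phe")` gives `"KAVEAF"`, and `to_three_letter("GDVEK")` gives `"Gly-Asp-Val-Glu-Lys"`.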

Nucleic acids

The abbreviation DNA (deoxyribonucleic acid) is so well known that it is usually only necessary to spell it out in very technical texts. RNA (ribonucleic acid) is less well known and so, depending on the audience and context, may need to be spelt out at first use.

Bases

Four bases make up DNA: guanine, cytosine, adenine and thymine. These are abbreviated using the capital letters G, C, A and T. In RNA, the thymine (T) is replaced by uracil (U). Base abbreviations are roman.

Nucleotide sequences

By convention, nucleotide sequences are presented in the 5-prime (5′) to 3-prime (3′) direction. For a single sequence, insert a space after every 5 or 10 symbols:

AATGC CGACC TGTTT AACGA CTAAG TTCCC
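The spacing convention can be applied automatically. This is a minimal Python sketch; the function name `group_sequence` is hypothetical, and restricting the group size to 5 or 10 simply mirrors the convention stated above.

```python
def group_sequence(sequence, size=5):
    """Insert a space after every `size` symbols of a sequence."""
    if size not in (5, 10):
        raise ValueError("group size should be 5 or 10")
    # Remove any existing whitespace before regrouping.
    compact = "".join(sequence.split())
    groups = [compact[i:i + size] for i in range(0, len(compact), size)]
    return " ".join(groups)
```

For example, `group_sequence("AATGCCGACCTGTTTAACGACTAAGTTCCC")` gives `"AATGC CGACC TGTTT AACGA CTAAG TTCCC"`, and passing `size=10` gives `"AATGCCGACC TGTTTAACGA CTAAGTTCCC"`.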

If it is necessary to indicate the 5′ and 3′ ends, use a hyphen to link these to the sequence:

5′-UAGCU AACCC UUUUA GGGUC-3′

Caution! The single prime symbol (Unicode U+2032) cannot be replaced with an apostrophe or with a single quotation mark (see Single prime symbol).
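When sequences are prepared by script, the prime symbol can be inserted by its code point rather than typed. The following Python sketch is an illustration only: `label_ends` and `fix_primes` are hypothetical helper names, and `fix_primes` makes the simplifying assumption that any apostrophe or single quotation mark directly after a 5 or 3 was meant to be a prime.

```python
import re

PRIME = "\u2032"  # single prime symbol, Unicode U+2032

# An apostrophe or single quotation mark directly after a 5 or 3.
# Assumption: in sequence labels this is always a mistyped prime.
_WRONG_PRIME = re.compile("(?<=[53])['\u2018\u2019]")


def label_ends(sequence):
    """Link the 5-prime and 3-prime labels to a sequence with hyphens."""
    return f"5{PRIME}-{sequence}-3{PRIME}"


def fix_primes(text):
    """Replace mistyped primes after 5 or 3 with the prime symbol."""
    return _WRONG_PRIME.sub(PRIME, text)
```

For example, `label_ends("UAGCU AACCC UUUUA GGGUC")` produces the labelled sequence shown above. Running `fix_primes` over ordinary prose could alter genuine apostrophes after a 5 or 3, so it is best limited to sequence labels.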

When aligning homologous sequences, use a font (such as Courier New) that uses the same horizontal space for each letter and punctuation mark, and insert hyphens to maintain the alignment:

AATGCCGACCTGTTTAACGACTAAGTTCCC
AAGGCGGACTTGTTACA---CTATTTTCCC

Other conventions and abbreviations

  • bp – base pairs. Use for DNA sequences with fewer than 1,000 base pairs:

The length of DNA sequenced is 325 bp.

  • kbp – kilobase pairs (1,000 base pairs); generally shortened to kilobases (kb):

The sequence is 1.2 kb long.

  • Mbp – megabase pairs (1 million base pairs); generally shortened to megabases (Mb). Use for long sequences or to indicate the total number of bases. Similarly, use gigabase pairs (Gbp), generally shortened to gigabases (Gb), for 1,000 million base pairs:

The human genome contains about 3,000 Mb, or 3 Gb, of DNA.

  • cM – centimorgans. Use to describe the genetic distance between 2 loci on a chromosome. Note the capital M.
  • DNA and RNA descriptors. Use lower-case letters for descriptors of DNA and RNA:

cDNA [complementary]     ssDNA [single-stranded]     dsDNA [double-stranded]     mtDNA [mitochondrial]     rRNA [ribosomal]     mRNA [messenger]     nDNA [nuclear]     tRNA [transfer]
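The length abbreviations in the list above can be applied by a small helper. This Python sketch is illustrative: the name `format_length` is hypothetical, and switching units at each factor of 1,000 is an assumption, because the guidance only states that bp is used below 1,000 base pairs.

```python
def format_length(base_pairs):
    """Format a sequence length using bp, kb, Mb or Gb."""
    if base_pairs < 1_000:
        return f"{base_pairs} bp"
    # Assumed thresholds: change unit at each factor of 1,000.
    for unit, scale in (("Gb", 1_000_000_000), ("Mb", 1_000_000), ("kb", 1_000)):
        if base_pairs >= scale:
            # ":g" drops trailing zeros, so 1.20 becomes 1.2 and 3.0 becomes 3.
            return f"{base_pairs / scale:g} {unit}"
```

For example, `format_length(325)` gives `"325 bp"`, `format_length(1200)` gives `"1.2 kb"`, and `format_length(3_000_000_000)` gives `"3 Gb"`.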

Hormones

Give hormone names in full at first use and then abbreviate, using initial capitals for the generic component of the hormone and lower case for the species designation, if applicable:

follicle stimulating hormone (FSH)     growth hormone (GH)     human growth hormone (hGH)     bovine growth hormone (bGH)