A revised nomenclature for transcribed human endogenous retroviral loci
© Mayer et al; licensee BioMed Central Ltd. 2011
Received: 11 February 2011
Accepted: 4 May 2011
Published: 4 May 2011
Endogenous retroviruses (ERVs) and ERV-like sequences comprise 8% of the human genome. A hitherto unknown proportion of ERV loci are transcribed and thus contribute to the human transcriptome. A small proportion of these loci encode functional proteins. As the role of ERVs in normal and diseased biological processes is not yet established, transcribed ERV loci are of particular interest. As more transcribed ERV loci are likely to be identified in the near future, the development of a systematic nomenclature is important to ensure that all information on each locus can be easily retrieved.
Here we present a revised nomenclature of transcribed human endogenous retroviral loci that sorts loci into groups based on Repbase classifications. Each symbol is of the format ERV + group symbol + unique number. Group symbols are based on a mixture of Repbase designations and well-supported symbols used in the literature. The presented guidelines will allow newly identified loci to be easily incorporated into the scheme.
The naming system will be employed by the HUGO Gene Nomenclature Committee for naming transcribed human ERV loci. We hope that the system will help to clarify one aspect of the sometimes confusing nomenclature for human endogenous retroviruses. The presented system may also be employed for naming transcribed human non-ERV repeat loci.
Human endogenous retroviruses (ERVs) are remnants of past infections by exogenous retroviruses. Proviruses formed by numerous distinct exogenous retroviruses in the germline genome could be inherited by subsequent generations. About 8% of the human genome consists of sequences that are potentially of retroviral origin and are distributed in about 700,000 different loci. In addition to proviruses, these sequences include solitary long terminal repeats (LTRs), nonretroviral sequences flanked by LTRs that may not be directly derived from infectious retroviruses and sequences similar to LTRs. ERVs and related sequences are thus part of the repetitive portions of the human genome, which comprise about 45% of the human genome mass, including mobile DNA such as L1, Alu and SVA elements.
Detailed analysis of the human genome sequence by wet-lab and bioinformatics approaches resulted in the definition of ERV groups, with the number depending on the methods used for defining groups: 31 groups were defined by Sperber et al. and Blomberg et al., 42 groups were defined by Mager and Medstrand, 30 groups were defined by Gifford and Tristem and several hundred human ERV and LTR families were defined by Repbase.
Almost all human ERV loci no longer encode former retroviral proteins because of their ancient incorporation into the host genome and thus accumulation of nonsense mutations. Many loci are missing large proviral portions, and most loci have been reduced to so-called solitary LTRs by homologous recombination between proviral LTRs. For more detailed information on human ERVs, we refer interested readers to recent reviews on the topic and the references therein [7–10].
While protein coding capacity is very limited, many human ERV loci are still transcribed, with transcription usually initiated by promoter sequences within the proviral LTRs. Obviously, mutations within LTRs have not yet rendered all LTRs in the human genome defective. In principle, promoters in flanking, non-ERV sequences may also contribute to transcription of those loci. Probably every human tissue and cell type, diseased or not, contains ERV transcripts [11, 12]. More than a single ERV group is usually found transcribed, and patterns of transcribed ERV groups differ between tissue and cell types. Transcription of ERV loci is thus regulated in some way. While expression of ERV sequences has been associated with a number of human diseases, such as germ cell tumours, melanoma and multiple sclerosis, the involvement of ERVs in human diseases remains to be elucidated. On the other hand, some ERV loci very likely provide important biological functions, such as the syncytin and syncytin 2 loci, referred to herein as ERVW-1 and ERVFRD-1, respectively. Other loci harbouring only partial open reading frames, such as a recently characterized HERV-W locus on chromosome Xq22.3 (ERVW-2), may likewise produce partial retroviral proteins with potential biological functions. It is therefore of particular interest which ERV loci actually contribute to the human transcriptome.
Recent studies have identified transcribed ERV loci in normal and diseased human cells and tissues by assigning ERV cDNA sequences to individual loci in the human reference genome sequence, employing characteristic nucleotide differences between individual loci of a given ERV group. Many more transcribed ERV loci are likely to be identified in future studies. It is therefore necessary to introduce a nomenclature for transcribed human ERV sequences.
Table 1. Nomenclature for transcribed human endogenous retrovirus loci
Representative GenBank accession numbers
endogenous retrovirus group K, member 1
endogenous retrovirus group K, member 2
endogenous retrovirus group K, member 3
endogenous retrovirus group K, member 4
c3_C, ERVK4, HERV-K(I)
endogenous retrovirus group K, member 5
endogenous retrovirus group K, member 6
c7_A, ERVK6, HERV-K108, HERV-K(HML-2.HOM), envK2, HERV-K(C7)
FN806837, AY371029, X82271, AF080233
endogenous retrovirus group K, member 7
c1_B, ERVK7, HERV-K102
FN806827, EF153338, S46404, DQ069911
endogenous retrovirus group K, member 8
c8_A, ERVK8, HERV-K115, envK6
endogenous retrovirus group K, member 9
c6_A, HERV-K109, envK4
FN806836, AF080234, AY371030
endogenous retrovirus group K, member 10
FN806835, AF080231, CN345079
endogenous retrovirus group K, member 11
c3_E, N8.4, HML-2A
FN806833, AF080232, AF080229, U87590
endogenous retrovirus group K, member 12
endogenous retrovirus group K, member 13
endogenous retrovirus group K, member 14
endogenous retrovirus group K, member 15
endogenous retrovirus group K, member 16
FN806841, EF543114, U87587
endogenous retrovirus group K, member 17
endogenous retrovirus group K, member 18
endogenous retrovirus group K, member 19
P1.8, HERV-K(C19), envK3
endogenous retrovirus group K, member 20
endogenous retrovirus group K, member 21
endogenous retrovirus group K, member 22
endogenous retrovirus group K, member 23
endogenous retrovirus group K, member 24
FN806848, AU124350, AA580921, AW812040
endogenous retrovirus group K, member 25
FN806843, CF227268, AW818206
endogenous retrovirus group K3, member 1
AK054868, BC010118, BC011670
endogenous retrovirus group K3, member 2
AK027828, AK096726, CR591084
endogenous retrovirus group K3, member 3
endogenous retrovirus group K3, member 4
endogenous retrovirus group K3, member 5
endogenous retrovirus group K3, member 6
endogenous retrovirus group K3, member 7
endogenous retrovirus group K3, member 8
endogenous retrovirus group FRD, member 1
HERV-FRD, envFRD, ERVFRDE1, syncytin 2
AK075092, AK123938, AY358244
endogenous retrovirus group FRD, member 2
endogenous retrovirus group 3, member 1
endogenous retrovirus group 3, member 2
endogenous retrovirus group PABLB, member 1
BQ012865, CF529244, AI189490
endogenous retrovirus group FC1, member 1
endogenous retrovirus group W, member 1
ERVWE1, syncytin 1, enverin, envW, HERV-W-ENV, HERV-7q, HERV7Q
BG012022, AF208161, BX391741, BX365066
endogenous retrovirus group W, member 2
endogenous retrovirus group W, member 3
endogenous retrovirus group W, member 4
endogenous retrovirus group W, member 5
endogenous retrovirus group W, member 6
endogenous retrovirus group S71, member 1
CN288807, BQ932595, BQ941761
endogenous retrovirus group S71, member 2
endogenous retrovirus group FH21, member 1
endogenous retrovirus group 48, member 1
C21orf105, HERV-F (type b)
endogenous retrovirus group E, member 1
endogenous retrovirus group E, member 2
endogenous retrovirus group E, member 3
endogenous retrovirus group E, member 4
endogenous retrovirus group V, member 1
AK056776, BC104018, BC104019
endogenous retrovirus group V, member 2
AI434519, CA417098, DA863698
endogenous retrovirus group I, member 1
AK124340, AK124077, CR614956
endogenous retrovirus group 18, member 1
endogenous retrovirus group MER61, member 1
endogenous retrovirus group H, member 1
endogenous retrovirus group H, member 2
endogenous retrovirus group H, member 3
endogenous retrovirus group H, member 4
endogenous retrovirus group H, member 5
endogenous retrovirus group H, member 6
endogenous retrovirus group H, member 7
endogenous retrovirus group 9, member 1
X15673, X15675, X57147
The Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) works under the auspices of HUGO and is the only worldwide authority that assigns standardised nomenclature to human genes. The HGNC has previously focused on approving nomenclature for protein-coding genes, pseudogenes, phenotypes and noncoding RNA. In the past, the committee has approved symbols for specific human ERVs only at the request of individual researchers. The symbols did not follow a systematic nomenclature: some symbols were of a simple format (for example, ERV1), some provided information on the group to which the ERV belonged (for example, ERVK2) and others included information on proteins encoded by the ERV (for example, ERVWE1 (endogenous retroviral family W, env(C7), member 1)). On reviewing the literature, it was clear that (1) many of the most frequently published loci were not represented by HGNC symbols, (2) by following more than one system, HGNC symbols were not serving the community, and (3) the nomenclature needed both updating and expansion.
HGNC editors curate relevant information for each gene that has approved nomenclature. In addition to approving a gene symbol and name for each transcribed human ERV, the HGNC records all known symbol aliases so that information on each gene can be retrieved using any known symbol. HGNC entries also include the chromosomal location of the ERV locus, links to GenBank, European Molecular Biology Laboratory (EMBL) and DNA Databank of Japan (DDBJ) sequence records and links to at least one PubMed reference. Where appropriate, links are also provided to annotation projects at both the genomic and proteomic levels. HGNC names are propagated to other major biological databases, such as Ensembl, UniProt and Entrez Gene. Therefore, this new nomenclature will provide a useful resource that is currently unavailable to the ERV community and other researchers concerned with ERVs.
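The alias-based retrieval described above can be illustrated with a small sketch. It is not the HGNC's actual database logic: the function names `build_alias_index` and `resolve` are hypothetical, and case-insensitive matching is an assumption made here for convenience. The two records use the approved symbols and aliases listed in Table 1 for the syncytin and syncytin 2 loci.

```python
# Illustrative only: a hypothetical alias index, not the HGNC implementation.
RECORDS = {
    "ERVW-1": ["ERVWE1", "syncytin 1", "enverin", "envW",
               "HERV-W-ENV", "HERV-7q", "HERV7Q"],
    "ERVFRD-1": ["HERV-FRD", "envFRD", "ERVFRDE1", "syncytin 2"],
}


def build_alias_index(records):
    """Map every approved symbol and alias to its approved symbol."""
    index = {}
    for approved, aliases in records.items():
        for name in (approved, *aliases):
            key = name.casefold()  # assumption: lookups ignore case
            if key in index and index[key] != approved:
                raise ValueError(f"alias {name!r} is ambiguous")
            index[key] = approved
    return index


def resolve(index, name):
    """Return the approved symbol for any known symbol or alias."""
    try:
        return index[name.strip().casefold()]
    except KeyError:
        raise KeyError(f"unknown symbol or alias: {name!r}") from None


ALIAS_INDEX = build_alias_index(RECORDS)
print(resolve(ALIAS_INDEX, "syncytin 2"))
print(resolve(ALIAS_INDEX, "ERVWE1"))
```

Rejecting an alias that points at two different approved symbols is a design choice of this sketch; it keeps every lookup unambiguous, which is the purpose of recording aliases against a single approved symbol.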
The primary definition of a gene used by the HGNC is "a DNA segment that contributes to phenotype/function". It is beyond the scope of this nomenclature effort to standardise the nomenclature of ERVs in general or to attempt to name every ERV element in the genome. As discussed above, there is evidence that some human ERVs encode functional proteins and that some encode transcripts and/or proteins which may be associated with disease, so the transcriptionally active loci come under the remit of the HGNC for naming. This category of ERVs represents most of the individual loci that have been published with individual names, so it is worth developing a standardised nomenclature for this subset. The three criteria for being accepted as a transcriptionally active ERV are as follows: (1) The ERV must be represented by an mRNA sequence in a public database, (2) the reported cDNA sequence must map unambiguously to the reference genome to allow identification and (3) the sequence must represent a viral gene rather than solely a solitary LTR. We acknowledge that there are sources of uncertainty. Many ERVs may be expressed at a low level, a "leakage" which can be hard to distinguish from perhaps more significant expression. Groups of recently integrated ERVs may be highly expressed, but their transcripts may be identical or almost identical and could be hard to map unambiguously. However, these difficulties should not prevent the naming of ERV loci which fulfil the criteria mentioned above. There is one symbol approved per ERV locus independently of how many viral genes the ERV may encode.
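As a rough illustration, the three acceptance criteria can be written as a simple predicate. This is a sketch under stated assumptions, not a procedure used by the HGNC: the `CandidateLocus` class and its field names are hypothetical, and "maps unambiguously" is modelled here as exactly one best-matching position in the reference genome.

```python
from dataclasses import dataclass


@dataclass
class CandidateLocus:
    """Hypothetical summary of the evidence for one candidate ERV locus."""
    has_public_mrna: bool       # criterion 1: mRNA sequence in a public database
    best_genome_hits: int       # criterion 2: equally good reference genome matches
    is_solitary_ltr_only: bool  # criterion 3: transcript covers only a solitary LTR


def meets_naming_criteria(locus):
    """Return True only if all three criteria from the text are satisfied."""
    maps_unambiguously = locus.best_genome_hits == 1  # assumption, see lead-in
    return (locus.has_public_mrna
            and maps_unambiguously
            and not locus.is_solitary_ltr_only)


# A transcript from a recently integrated group that matches several loci
# equally well fails criterion 2, as discussed in the text.
ambiguous = CandidateLocus(True, 3, False)
accepted = CandidateLocus(True, 1, False)
print(meets_naming_criteria(ambiguous), meets_naming_criteria(accepted))
```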
The nomenclature scheme described in this paper aims to be concise so that it is user-friendly. It also aims to be informative to researchers, including those who are less familiar with the field. To be informative, the nomenclature scheme is hierarchical, with each symbol beginning with the root symbol "ERV" so that the symbols are instantly recognisable and can be grouped together in searches. Note that many researchers have published papers using symbols beginning with "HERV", but it is against the guidelines of the HGNC ever to use H for "human" in symbols, mainly because this precludes the possibility of the nomenclature scheme's being extended to other species. Each ERV symbol, then, includes an identifier that represents the group to which the ERV belongs.
Comparison of Repbase group symbols with group symbols used in the nomenclature scheme presented herein
Repbase group symbol
Group symbol in new nomenclature scheme
Finally, each ERV within a particular group is uniquely identified by a number, for example, ERVK-1. Numbers are assigned consecutively within each group to make the nomenclature system expandable. The number is used to make each symbol unique and has no intrinsic meaning. ERVK-2 has merely been assigned the next number following ERVK-1, but this provides no information on the position of the ERVs within the genome or the order in which an ERV may have been published. The use of numerical identifiers keeps the symbols as short as possible to encourage widespread use by researchers. Newly identified transcribed loci will take the next available consecutive number for their particular group; for example, if a new transcribed ERVK locus is identified, it will take the symbol ERVK-26. Each symbol is accompanied by an expanded gene name which clearly and succinctly explains the derivation of the nomenclature; for example, the full name of ERVFRD-1 is "endogenous retrovirus group FRD, member 1".
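The symbol format (root "ERV" + group symbol + unique number) lends itself to a short sketch. The helper names below are hypothetical, and the regular expression assumes that group symbols consist of upper-case letters and digits and are always separated from the number by a hyphen, as in the examples ERVK-1, ERVW-2 and ERVFRD-1; that is a generalisation from those examples and not a formal HGNC specification.

```python
import re

# Assumed pattern: "ERV", a group symbol of capitals/digits, "-", a positive number.
SYMBOL_RE = re.compile(r"^ERV(?P<group>[A-Z0-9]+)-(?P<number>[1-9][0-9]*)$")


def parse_symbol(symbol):
    """Split a symbol such as 'ERVK-1' into its group ('K') and number (1)."""
    match = SYMBOL_RE.match(symbol)
    if match is None:
        raise ValueError(f"not a symbol of the form ERV<group>-<number>: {symbol!r}")
    return match.group("group"), int(match.group("number"))


def expanded_name(symbol):
    """Build the expanded gene name that accompanies a symbol."""
    group, number = parse_symbol(symbol)
    return f"endogenous retrovirus group {group}, member {number}"


def next_symbol(group, existing_symbols):
    """Return the symbol a newly identified locus of `group` would take."""
    used = [number for g, number in map(parse_symbol, existing_symbols) if g == group]
    return f"ERV{group}-{max(used, default=0) + 1}"


ervk_symbols = [f"ERVK-{i}" for i in range(1, 26)]  # ERVK-1 to ERVK-25, as in Table 1
print(expanded_name("ERVFRD-1"))
print(next_symbol("K", ervk_symbols))
```

Older symbols such as ERVK2 or HERV-K108 deliberately fail to parse, since they do not follow the revised format. Taking the highest existing number plus one is how this sketch models the "next available consecutive number"; because numbers carry no intrinsic meaning, nothing else about the locus is needed to assign one.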
We are aware that the proposed nomenclature scheme cannot encompass all conceivable (and sometimes known) unusual structures of ERV loci, such as hybrid loci consisting of different ERV groups and ERV insertions into existing ERV loci. HGNC, after conferring with researchers who submit newly identified transcribed loci, will decide whether or how to name such unique loci on a case-by-case basis. For example, the scheme will not incorporate ERV locus transcripts that are part of another gene's transcript, as these elements will not be considered separate loci.
Table 1 lists transcribed human ERVs that have been named according to the new nomenclature system. All ERVs in the table either have been published or have been annotated by the RefSeq project. An initial list was sent to a number of researchers in the field for their comments. The list was expanded as these researchers suggested more loci. Where no transcript sequence was available, authors were asked to submit representative sequences to the GenBank, EMBL and DDBJ databases. We encourage researchers to contact the HGNC if they know of further ERVs that can be included in the scheme.
Finally, although only human gene nomenclature is under the remit of the HGNC, we wish to mention that the naming system introduced here for transcribed ERVs could, in principle, also be applied to other, non-ERV repetitive sequences in the human genome, as well as to repetitive DNA in nonhuman species. Future research will probably reveal numerous transcribed repetitive DNA sequences in various species. Judged just from ERV designations in different species, a standardised naming system for transcribed repeat loci may be highly beneficial to avoid future confusion.
Research in the laboratory of JM is supported by Deutsche Forschungsgemeinschaft (DFG) and Homburger Forschungsförderungsprogramm (HOMFOR). The HGNC is funded by The Wellcome Trust (grant 081979/Z/07/Z) and the National Human Genome Research Institute (grant P41 HG03345). We also acknowledge those who provided extra information for the list of transcribed ERV loci: Yoshihiro Jinno, Finn Skou Pedersen, Benoit Barbeau, Christine Leib-Mösch and Patrick M Alliel.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.