A human endogenous retrovirus encoded protease potentially cleaves numerous cellular proteins

Background A considerable portion of the human genome derives from retroviruses inherited over millions of years. Human endogenous retroviruses (HERVs) are usually severely mutated, yet some coding-competent HERVs exist. The HERV-K(HML-2) group includes evolutionarily young proviruses that encode typical retroviral proteins. HERV-K(HML-2) has been implicated in various human diseases because transcription is often upregulated and some of its encoded proteins are known to affect cell biology. HERV-K(HML-2) Protease (Pro) has received little attention so far, although it is expressed in some disease contexts and other retroviral proteases are known to process cellular proteins. Results We set out to identify human cellular proteins that are substrates of HERV-K(HML-2) Pro employing a modified Terminal Amine Isotopic Labeling of Substrates (TAILS) procedure. Thousands of human proteins were identified by this assay as significantly processed by HERV-K(HML-2) Pro at both acidic and neutral pH. We confirmed cleavage of a majority of selected human proteins in vitro and in co-expression experiments in vivo. Sizes of processing products observed for some of the tested proteins coincided with product sizes predicted by TAILS. Processed proteins locate to various cellular compartments and participate in diverse, often disease-relevant cellular processes. A limited number of HERV-K(HML-2) reference and non-reference loci appears capable of encoding active Pro. Conclusions Our findings from an approach combining TAILS with experimental verification of candidate proteins in vitro and in cultured cells suggest that hundreds of cellular proteins are potential substrates of HERV-K(HML-2) Pro. It is therefore conceivable that even low-level expression of HERV-K(HML-2) Pro affects levels of a diverse array of proteins and thus has a functional impact on cell biology and possible relevance for human diseases. Further studies are indicated to elucidate effects of HERV-K(HML-2) Pro expression regarding human substrate proteins, cell biology, and disease. The latter also calls for studies on expression of specific HERV-K(HML-2) loci capable of encoding active Pro. Endogenous retrovirus-encoded Pro activity may also be relevant for disease development in species other than human. Electronic supplementary material The online version of this article (10.1186/s13100-019-0178-z) contains supplementary material, which is available to authorized users.


Background
Human endogenous retroviruses (HERVs), originating from past infections by exogenous retroviruses, and derived elements with some retroviral features, comprise about 8% of the human genome. HERVs affect the biology of the human genome in various ways, ranging from influences on transcription and splicing to biological effects of retrovirus-like proteins still encoded by some HERV groups. For instance, the envelope glycoprotein encoded by a provirus of the so-called HERV-W group was coopted to form the ERVW-1 (Syncytin-1) gene, whose protein product exerts important functions in human placenta development and functionality [1]. The HERV-K(HML-2) group, in short HML-2, includes a number of evolutionarily young proviruses, several of which are human-specific or even polymorphic in the human population [2]. Transcription of some HML-2 loci is upregulated in various human diseases with potential consequences due to the interaction of HML-2encoded proteins with other cellular proteins (for reviews, see [3][4][5]). For instance, certain types of testicular and ovarian germ cell tumors (GCTs), as well as melanoma and mammary carcinomas, display upregulated HML-2 transcription (reviewed in [6,7]). Upregulated HML-2 transcription could be observed in lesions considered precursors of testicular GCTs, so-called carcinoma in situ of the testis [8]. GCT patients suffering from GCT-types with HML-2 upregulation already show a strong humoral response against HML-2-encoded Gag and Env proteins at the time of tumor detection [9,10]. HML-2 encoded Env protein was recently shown to induce several transcription factors and to activate the cellular transformation-associated MAPK ERK1/2 pathway [11]. HML-2 Rec and Np9 proteins, encoded by spliced transcripts from the HML-2 env gene, were shown to interact with several human proteins, among them promyelocytic zinc finger protein (PLZF), testicular zinc finger protein (TZFP), Staufen-1, human small glutaminerich (hSGT), and ligand of Numb protein X (LNX). Rec expression disturbed germ cell development in mice and altered testis histology towards a carcinoma-like phenotype [12][13][14][15][16][17][18].
Retroviral genomes usually encode several catalytic proteins, among them aspartyl Protease (Pro). HML-2 also encodes Pro that, after self-processing from a Gag-Pro(−Pol) precursor translated through ribosomal frameshifts, cleaves retroviral HML-2 Gag protein into matrix, capsid and nucleocapsid domains, as is typical for other retroviral aspartyl proteases [19][20][21]. There is strong evidence that active HML-2 Pro is expressed at significant amounts and during longer periods of time, especially for GCT. HML-2-encoded retroviral particles budding from GCT cell lines have been detected. Large amounts of HML-2 Gag protein are present in GCT tissue and HML-2 Pro-cleaved Gag protein was demonstrated in GCT cell lines and especially tissue samples [10,22]. Bieda et al. [23] demonstrated mature HML-2-encoded retroviral particles budding from different GCT cell lines, immature non-budding retroviral particles, as well as cleaved Gag protein in those cell lines. Prokaryotic expression of a construct harboring HML-2 Gag-Pro ORFs results in self-processing of Pro from a Gag-Pro precursor [24], thus Pro is capable of self-processing independent of retroviral particle formation and budding.
Besides retroviral Gag protein, retroviral aspartyl proteases were found to cleave host cellular proteins. HIV Pro processes human Actin, Troponin C, Alzheimer amyloid precursor protein, and Pro-interleukin 1β in vivo. Purified HIV Pro processes Vimentin, Desmosin, and Glial fibrillary acidic protein, and Microtuble-associated proteins 1 and 2 in vitro (reviewed in [25]). Riviere et al. [26] reported processing of the precursor of NFkappa B by HIV-1 Pro during acute infection. Processing of Vimentin by proteases of Bovine Leukemia Virus, Mason-Pfizer Monkey Virus, and Myeloblastosis-Associated Virus was reported by Snásel et al. [27]. Shoeman et al. [28] reported cleavage of focal adhesion plaque proteins, including Fimbrin, Focal adhesion plaque kinase, Talin, Filamin, Spectrin and Fibronectin by HIV-1 and HIV-2 proteases. Devroe at al. [29] reported processing of human NDR1 and NDR2 serine-threonine kinases by HIV-1 Pro. More recently, more than 120 cellular substrates were reported to be processed by HIV-1 Pro in vitro by Impens et al. [30]. Thus, aspartyl proteases from diverse retroviruses appear able to degrade quite a number of host cellular proteins. Furthermore, such processing of cellular proteins by retroviral Pro can occur independent of retroviral budding. For instance, cleavage of procaspase 8 by HIV-1 Pro was observed during HIV-1 infection of T-cells and other cell types [31,32]. HIV-1 Pro was reported to cleave serinethreonine kinases RIPK1 and RIPK2 during HIV-1 infection of T-cell lines or primary activated CD4 + T cells ( [33], see references therein for additional examples). A significant amount of processing of HIV-1 Gag occurs in the cytoplasm of infected cells resulting in intracellular accumulation of appropriately processed HIV-1 Gag proteins [34]. For Mouse Mammary Tumor Virus (MMTV), a betaretrovirus closely related with HERV-K(HML-2), activation of Pro can occur before budding, and MMTV Gag protein is primarily found in the cytoplasm and traffics to intracellular membranes to initiate particle assembly. Similar observations were made for Human Foamy Virus [35][36][37]. Thus, retroviral Pro proteins are activated not only during maturation of retroviral particles.
There is evidence that such processing of cellular proteins by retroviral Pro is of biological relevance. Strack et al. [38] reported that apoptosis of HIV-infected cells was preceded by HIV Pro-mediated cleavage of Bcl-2. Cleavage of Procaspase 8 by HIV Pro in T-cells was followed by cellular events characteristic of apoptosis [31]. HIV Pro inducibly expressed in yeast caused cell lysis due to alterations in membrane permeability. Cell killing and lysis, specifically lysis by necrosis without signs of apoptosis, was observed in COS-7 cells following expression of HIV Pro [39]. Cleavage of EIF4G by several retroviral proteases profoundly inhibited capdependent translation [40]. Specific inhibition of HIV Pro reduced the extent of both necrosis and apoptosis in C8166 cells [41]. It was recently proposed that cleavage of RIPK1 by HIV-1 Pro might be one of several mechanisms by which HIV-1 counteracts host innate immune responses [33].
There is thus good evidence for cellular effects following expression of retroviral Protease. Although retroviral Protease is encoded in the human genome by HERV-K(HML-2) and expressed in the disease context, there is surprisingly little information as to potential functional relevance of HML-2 Pro expression. We therefore set out to identify human proteins processed by HML-2 Pro by employing specialized proteomics methods. Numerous human proteins were identified as substrates of HML-2 Pro. We further verified processing by HML-2 Pro for selected proteins in vitro and in vivo. Human proteins identified often exert various, often important cellular functions, and many of them are disease-relevant. The relevance of our findings for human disease is currently unknown, yet the sheer number of potentially disease-relevant proteins identified in our study as potential substrates of HML-2 Pro strongly argues for further specific analyses.

Optimization of HERV-K(HML-2) protease activity
We sought to identify human cellular proteins that are substrates of HERV-K(HML-2) Pro, employing a modified Terminal Amine Isotopic Labeling of Substrates (TAILS) protocol [42,43]. We first optimized HERV-K(HML-2) Protease activity prior to TAILS. We employed a cloned HML-2 Pro previously identified and shown to be enzymatically active [24]. Of note, the cloned Pro included self-processing sites and in-frame flanking sequence. HML-2 Pro was prokaryotically expressed and subsequently purified using a previously published protocol employing Pepstatin A, a specific inhibitor of retroviral aspartate proteinases, coupled to agarose beads [44]. In accordance with previous results, HML-2 Pro could be purified very efficiently and at relatively high yields (Fig. 1). As also observed before [44], HML-2 Pro self-processed from the precursor during the expression, purification, and renaturation steps (Additional file 2: Figure S1). We note that two different, enzymatically inactive mutants of HML-2 Pro (harboring mutations in catalytic motifs, see the Methods section) could not be purified due to inefficient binding to Pepstatin A-agarose (Additional file 2: Figure S1).
Previous studies of HML-2 and other retroviral Proteases employed differing buffer systems and pH conditions when measuring HML-2 Pro activity (for instance, see [44,45]). We therefore determined HML-2 Pro activity in various buffer systems using a fluorescent substrate previously shown to be processed by HIV Pro [46] and expected also to be processed by HML-2 Pro because of its very similar specificity profiles [47]. We found that HML-2 Pro displayed higher activity at conditions of high ionic strength. Higher concentrations of glycerol appeared to reduce HML-2 Pro activity (see the legend to Fig. 2), as did DMSO of 2% [v/v] and higher (not shown). Of further note, very similar HML-2 Pro activity was observed at different pH conditions for MES and PIPES-based buffer systems (not shown). A buffer composed of 100 mM MES and 1 M NaCl was chosen for lysis of HeLa cells and a PIPES-based buffer system was used for TAILS (see below). Further variation of reaction conditions between pH 5.5 and 8 established that HML-2 Pro was most active at pH 5.5 and somewhat less active at pH 6. Further reduced activity was seen for pH > 6, yet HML-2 Pro still displayed low activity at pH 8 (Fig. 2). In principle, these results are generally in Fig. 1 Purification of HERV-K(HML-2) Protease. A previously established method for purification of prokaryotically expressed HML-2 Pro was employed with minor modifications (see text). Samples were taken at various steps of the procedure, such as bacterial culture before induction ("pre-ind."), flow-through ("flowthr.") after binding of bacterial lysate to Pepstatin A-agarose, two wash fractions, and 4 elution fractions. Proteins were separated by SDS-PAGE in a 15% PAA-gel and visualized by staining with Coomassie Blue. Molecular mass of marker proteins (M) are indicated on the left. Purified, auto-processed HML-2 Pro migrates at approximately 12 kDa accord with previous findings (for instance, see [20]). As further addressed below, several cellular compartments have an acidic pH of 6 or less [48].
Subsequent TAILS experiments involved Pepstatin A as an inhibitor of HML-2 Pro activity. We therefore also established the molar ratio required to effectively inhibit HML-2 Pro. We found that 200 μM Pepstatin A efficiently inhibited HML-2 Pro present at 460 nM. Inhibition was even more pronounced when reactions were pre-incubated with Pepstatin A for 10 min before addition of fluorescent substrate (Fig. 2).
Identification of numerous human cellular proteins cleaved by HERV-K(HML-2) protease using TAILS Previous studies indicated that retroviral aspartate Proteases, including HIV Pro, can process not only retrovirusencoded proteins but also cellular proteins (see the Background section). We therefore were interested whether HML-2 Protease is also able to process human cellular proteins other than HML-2-encoded Gag protein. To do so, we employed a modified Terminal Amine Isotopic Labeling of Substrates (TAILS) procedure that identifies Protease-cleaved protein fragments by means of specific labeling and subsequent isolation of processed amine termini followed by mass-spectrometry [42,43]. We incubated HeLa cell total protein lysate with purified HML-2 Pro employing established reaction conditions with regard to salt concentration, pH, and molar ratio of Protease and Pepstatin A (see above). HML-2 Pro-generated cleavage sites were subsequently identified by Nterminomics [43] using the TAILS approach. As a negative selection technique, TAILS is suitable for the analysis of natively blocked (e.g. acetylated) and natively free N-termini. Since proteolysis generates free N-termini, we focused on these species. During the TAILS procedure, free N-termini are chemically dimethylated.
We performed TAILS experiments at pH 5.5 and pH 7. As for the experiment at pH 5.5, TAILS identified greater than 8500 native free or proteolytically generated N-termini in both replicates 1 and 2 (Fig. 3, Additional file 1: Tables S1a,b). As an initial filter to discern background proteolysis from HML-2 Prodependent cleavage events, we selected those cleavage events that were enriched at least 2-fold upon HML-2 Pro incubation. We observed 4370 cleavage events in replicate 1 and 2633 cleavage events in replicate 2. A variation in protease activity, as well as the different methodological processing steps, may contribute to this variance. Of those, 931 cleavage events were common to both replicates (Fig. 3, Additional file 1: Tables S1a,b) and those corresponded to 548 different human proteins. For proteins cleaved in both replicates with at least 2-fold enrichment, yet not necessarily cleaved in the same position within a protein, we identified 2024 and 1170 unique protein IDs in the two replicates, respectively. Combining both replicates, 809 different human proteins showed replicated evidence of cleavage by HML-2 Pro (Fig. 3b, Additional file 1: Tables S1a,b). As implied by the above numbers, several human proteins showed multiple cleavage events per protein (Fig. 3c).  Table S2).
Since protein degradation by HML-2 Pro may also occur in the cytoplasm or nucleoplasm at neutral pH rather than in acidic organelles, we also performed a TAILS experiment at pH 7. Overall, we observed fewer cleavage events, possibly due to lower enzymatic activity of HML-2 Pro at pH 7 ( Fig.3a, Additional file 1: Tables S1c,d). Nevertheless, greater than 3100 native free or proteolytically cleaved N-termini were identified for replicates 1 and 2, respectively, of which 1074 (replicate 1) and 514 (replicate 2) cleavage events were enriched greater than 1.5-fold upon HML-2 Pro incubation, with an overlap of 58 cleavage events. For the pH 7.0 assay, we chose a less stringent cutoff value of 1.5-fold-change due to the lower activity of HML-2 Pro at pH 7.0. For a lower protease activity the TAILS approach may miss more potential substrates at the more stringent cutoff value of 2. Though, at the lower cutoff value the potential candidates have to be considered more carefully and additional experiments like the in vitro experiments and the experiments with cultured cells are of greater value. At a cutoff value of 1.5-fold-change, 442 (replicate 1) and 369 (replicate 2) different human proteins were affected by HML-2 Pro incubation. Combining the latter experiments, a total of 154 different human proteins showed replicated evidence of cleavage by HML-2 Pro at pH 7 (Fig. 3b). Of note, four human proteins were identified only in the pH 7 TAILS experiment, though with relatively low to medium enrichment of processing products (TAGLN: 3.8-fold; MAP1B: 4.1-fold; KTN1: 1.7fold; EPB41L2: 1.6-fold).
Similar to the TAILS experiment at pH 5.5, we observed at pH 7 multiple cleavage events within several of human proteins enriched greater than 1.5-fold. For instance, there were 25 and 15 cleavage events in replicates 1 and 2, respectively, for HSP90AB1, 41 and 6 events for MYH6, 17 and 3 for ACTB, and 40 and 7 events for HSPA8 (Fig. 3c, Additional file 1: Table S2).
Combining all results, we identified 102 different human proteins cleaved by HML-2 Pro that were detected in all four TAILS experiments when applying 2-fold enrichment at pH 5.5 and 1.5-fold enrichment at pH 7 (Fig. 3b). We consider these findings to bear evidence of possible processing of human proteins by HML-2 Pro.

Involvement of human proteins cleaved by HML-2 protease in diverse cellular processes
We next used the Gene Ontology (GO) database [49,50] to identify biological properties of proteins identified by TAILS. Analysis of the 809 different human proteins common to the two pH 5.5 experiments indicated localization of proteins in diverse cellular compartments including cytosol, nucleus and membrane (Fig. 4a). Further GO term-based analysis of biological processes associated with the 809 human proteins showed their involvement in numerous biological processes, e.g. apoptosis, cell cycle regulation, DNA repair and replication, ion and nuclear transport (Fig. 4b). Moreover, intersection of the human genes corresponding to those 809 human proteins with genes included in the Catalogue Of Somatic Mutations In Cancer (COSMIC) database [53] identified 62 human genes/proteins in our dataset with an established relevance in oncology (Fig. 4b, Additional file 1: Table S3). Querying the Online Mendelian Inheritance in Man (OMIM) database [54] revealed genes for our dataset of 809 proteins to be associated with 265 different genetic disorder phenotypes, of which approximately 239 were described as inherited (Additional file 1: Table S4).

Verification of cleavage of human proteins by HERV-K(HML-2) protease in vitro
We next sought to verify in vitro cleavage by HML-2 Pro of proteins identified by TAILS experiments. We focused on substrate candidates that were enriched more than 2-fold upon active HML-2 incubation in both replicates of the TAILS experiment at pH 5.5. A recent study profiled amino acid specificities of HERV-K(HML-2) Pro at aa positions P6-P1 and P1'-P6' in, respectively, Nterminal and C-terminal direction with respect to the cleaved bond revealing, for instance, P1 as the major determinant of specificity and a preference for aromatic aa residues in P1 [47]. A subsequent profiling at pH 7 likewise revealed preferences for aromatic aa residues in P1 and aromatic and aliphatic aa residues in P1' (data not shown). We utilized these published findings to reduce the list of candidate proteins by filtering for peptides from cleavage events having F, G, Y or W in P1, and F, I, L, V or W in P1' (Fig. 3a). Furthermore, we selected proteins with a size compatible with an in vitro coupled transcription/translation system, and cellular localizations and biological functions based on associated GO terms. Eventually, we further analyzed 14 different human proteins (Table 1). We produced candidate proteins in vitro in a coupled transcription/translation system using either a radioactive label ( 35 S-methionine) or a C-terminal HA-tag. We then incubated equal amounts of each candidate protein with purified HML-2 Pro, including a control reaction without Pro and one with Pro enzymatic activity inhibited by presence of Pepstatin A. Reactions were then subjected to SDS-PAGE followed by phosphorimager or Western blot analysis depending on the protein label. Fig. 4 Gene Ontology term-based characteristics of human proteins identified as substrates of HERV-K(HML-2) Protease by TAILS. Selected cellular components (a) and biological processes (b) are depicted. Numbers were compiled using PANTHER (Protein ANalysis THrough Evolutionary Relationships) GO-Slim as provided at http://geneontology.org [51,52]. Numbers of proteins per category expected by chance are also given. Graph (b) also depicts in the bottom-most bar the overlap of proteins identified by TAILS with cancer-relevant genes as compiled by COSMIC (Catalogue Of Somatic Mutations In Cancer; https://cancer.sanger.ac.uk/cosmic) [53]. See Additional file 1: Table S3 for COSMIC cancer genes (See figure on previous page.) Fig. 3 Cleavage sites in human proteins identified as substrates of HERV-K(HML-2) Protease by TAILS. a. Results of filtering of cleavage sites observed by TAILS. Results for two experiments (rep1, rep2) performed at pH 5.5 and pH 7 are each depicted. Various filters were applied, such as greater than 1.5-fold or 2-fold (fc) enrichment for the observed cleavage event compared to controls and particular amino acids in P1 and P1' (see the paper text). Resulting numbers after applying the various filters are indicated by bars and by specific numbers when including P1 and P1'. b. Venn diagrams depicting overlap of cleavage sites and protein IDs in replicates (rep1, rep2) performed at pH 5.5 and pH 7. The overlap of protein IDs detected in all four experiments is depicted in the Venn diagram at the bottom. c. Numbers of cleavage sites in proteins identified as substrates of HERV-K(HML-2) Pro. Results are summarized for the replicate (rep1, rep2) TAILS experiments at pH 5.5 and pH 7. A single cleavage event was observed for the vast majority of proteins, fewer proteins were cleaved at more than one position, and a relatively small number of proteins were cleaved at up to 60 different positions within the particular protein. See Additional file 1: Table S2, for selected human proteins with multiple cleavage sites Table 1  n.t./+ a Gene/Protein as given by approved gene/protein symbols and full names. A representative protein ID is given. Selected functional characteristics as well as associated diseases ("AD") for respective genes/proteins were compiled from information provided by GeneCards [55]. Experimental verifications ("ver.") of processing of particular proteins by HML-2 Pro in vitro and in vivo is indicated as "+/+", no processing observed as "-", inconclusive evidence as "i" (see the text), n.t.: not tested in either in vitro or in vitro experiments Out of 14 different human proteins examined, we obtained evidence for processing by HML-2 Pro in vitro for 9 of those proteins. Evidence for processing included (i) a more or less reduced amount of full-length candidate protein compared to amounts of full-length protein in control reactions without Pro and with Pro plus Pepstatin A, (ii) presence of one or several additional protein bands in the reaction with Pro compared to the reaction without Pro, (iii) such additional protein bands also being present in the reaction with Pro plus Pepstatin A, yet at (much) lower amounts compared to the reaction with Pro. Different combinations of those criteria were observed in our verification experiments. In contrast, no or inconclusive evidence for processing by HML-2 Pro was obtained for 5 human proteins tested ( Fig. 5a and Additional file 2: Figure S2).
TAILS experiments also provided information for actual cleavage site positions in candidate proteins. We found for 6 of the 14 different human proteins tested that HML-2 Pro had produced additional protein bands coinciding with sizes of processing products predicted by cleavage sites identified by TAILS (Fig. 5b).

Verification of cleavage of human proteins by HERV-K(HML-2) protease in vivo
We also investigated candidate proteins for their ability to be processed in vivo. We selected proteins confirmed in vitro as substrates of HML-2 Pro, along with proteins identified by TAILS that were of functional interest and readily available to us as cloned cDNAs. We coexpressed in HEK293T cells epitope-tagged candidate proteins together with wild-type (enzymatically active) or mutant (inactive) HML-2 Pro, with or without an enhanced green fluorescent protein (EGFP) tag, and performed Western blot analysis with epitope-tag-specific antibodies. Expression of Pro was detected using either a polyclonal α-HML-2 Pro antibody [19], or an α-EGFP antibody (kindly provided by Gabriel Schlenstedt, University of Saarland). The α-Pro pAb detected proteins of sizes expected for self-processed and (unprocessed) precursor forms of both wild-type and mutant HML-2 Pro. The α-EGFP antibody detected proteins of sizes expected for EGFP-Pro precursor and the EGFP portion after auto-processing of Pro (Fig. 6a). Importantly, processing of HML-2 Pro from an EGFP-Pro fusion protein provides further strong experimental support for HML-2 Pro becoming active independent of retroviral particle formation and budding.
For many of the tested candidate proteins, we observed pronounced reduction of the amount of fulllength candidate protein, in some instances to below detection limits (Fig. 6a, b and Additional file 2: Figure S3). Notably, for some of the tested candidate proteins, additional products smaller in size than the full-length proteins were detected when co-expressing wild-type but not mutant HML-2 Pro, specifically for C15orf57-HA, HSP90-HA, MAP2K2-HA, FLAG-TRIM28, FLAG-RNASEH2A, and Myc-STUB1 (Fig. 6a, b). For three proteins, sizes of such additional products were very similar to sizes of cleaved protein products detected in the in vitro verification experiments. Specifically, a fragment of 15 kDa was seen for C15orf57-HA. A fragment of 60 kDa was also detected for HSP90 by both the anti-HA and the anti-HSP90 antibodies. The latter antibody also detected an approximately 50 kDa fragment of HSP90. A fragment of approximately 42 kDa was also detected for MAP2K2-HA (compare Figs. 5a, 6a, b). This suggests that HML-2 Pro processing of these candidate proteins in vivo reproduced the same (more or less stable) processing products as the in vitro reactions. Truncated protein products were also detected by Western blotting of additional candidate proteins tested only in vivo, specifically 62 kDa, 30 kDa, and 31 kDa bands for FLAG-TRIM28, FLAG-RNASEH2A, and Myc-STUB1, respectively (Fig. 6b). Thus HML-2 Pro-mediated processing of these proteins also appears to produce stable processing products.
Importantly, and similar to the in vitro verification experiments, the sizes of additional protein products observed coincided well with the sizes predicted by cleavage sites identified in TAILS experiments (Fig. 6c).
Degradation of candidate proteins is not due to HML-2 protease induced cell death When expressing HML-2 Pro in HEK293T and HeLa cells, we noted under the microscope cell death for a relatively small proportion of cells. The amount of cell death seemed higher for HeLa than for HEK293T cells. No such cell death was observed when expressing mutant HML-2 Pro. Cell death appeared reduced in the presence of 1 μM indinavir, a strong inhibitor of HIV Pro, and with less potency against HML-2 Pro in cell culture (not shown) [20,56].
We therefore quantified by FACS analysis the relative amount of cell death following HML-2 EGFP-Pro expression in HEK293T cells. We determined relative numbers of EGFP-, thus Pro-expressing cells at 5, 10, 24, 30, and 48 h after transient transfection with plasmids encoding either EGFP, EGFP-Pro-wt, or EGFP-Pro-mut. Approximately 60% of gated live cells expressing EGFP-Pro-wt or EGFP-Pro-mut were EGFP-positive up to 48 h post-transfection, indicating that only a minority of cells expressing HML-2 Pro are driven into cell death over the course of our expression experiments (Additional file 2: Figure S4).
HIV Pro has also been reported to induce apoptosis (see the Background section). Various cellular proteins are degraded during apoptosis due to activation of caspases [57]. We therefore asked whether observed cleavage of candidate proteins by HML-2 Pro could also be attributed to cleavage by caspases. We transiently expressed proteins HSPA90AA1-HA, MAP2K2-HA and C15orf57-HA in HEK293T cells and subsequently induced apoptosis by addition of Staurosporin at 2 μM. HEK293T cells harvested after 5 h did not show evidence of processing of Fig. 5 Verification of processing of human proteins by HERV-K(HML-2) Protease in vitro. Human candidate proteins were expressed in vitro using a coupled transcription/translation system. a. Results from protease incubations of various candidate proteins labeled with either 35 S-methionine or a C-terminal HA-tag (" 35 S" and "HA") are shown. Experiments included for each candidate protein a reaction without protease ("C"), one with protease ("+"), and one with protease and Pepstatin A ("+/P"). Reaction products were separated by SDS-PAGE in 10% PAA-gels and processed for phosphorimager analysis or HA-tag-specific Western blots depending on the label. Processing of full-length candidate proteins (indicated by an arrow) was evidenced by additional protein bands smaller than the respective full-length candidate protein (arrowheads) and/or a decrease in the amount of full-length candidate protein (see the Results section). One example of a candidate protein (PSMC4) without evidence of processing by HML-2 Pro is shown. b. Graphical depiction of candidate proteins confirmed to be processed by HML-2 Pro. The number of amino acids and corresponding molecular mass in kDa is indicated by scales at the top and by the line length for each protein. Positions of methionines and cleavage sites (grey and black arrowheads, respectively), as identified by TAILS in either one of the two replicate experiments at pH 5.5 (see the text), are indicated for each protein. Dashed lines indicate molecular masses of processing products observed experimentally for either 35 Smethionine (" 35 S")-or HA-tag ("HA")-labeled candidate proteins. Note that the latter label will only detect C-terminal processing products. Processing products were not indicated for the two HSP90A proteins because observed products were difficult to assign due to too many observed cleavage sites. Processing of PDIA3 protein was supported by reduction of the amount of full-length protein, though no smaller processing products could be observed. Note that C15orf57 migrated slower in gel electrophoresis than predicted by molecular mass. See Additional file 2: Figure S2 for additional evidence of processing of candidate proteins by HML-2 Pro candidate proteins due to apoptotic processes (Additional file 2: Figure S4). Importantly, a processing product of a size observed when co-expressed with HML-2 Pro was not visible (Fig. 6a, b). Of further note, in the case of HSP90AA1-HA and MAP2K2-HA co-expressed with HML-2 Pro, addition of a pan-caspase inhibitor (Q-VD, 25 μM) did not reduce the amount of processing product observed, but rather increased it slightly when compared to control cells expressing HML-2 Pro in the absence of Q-VD (Additional file 2: Figure S4).
Several HERV-K(HML-2) loci in the human genome potentially encode active protease We were interested in which HML-2 loci in the human genome may produce an active Protease when they are transcribed and translated in a retroviral fashion, that is, the Pro ORF is translated via ribosomal frameshift between the Gag and Pro ORFs. Therefore, we examined HML-2 locus sequences in the human reference genome sequence, as well as among HML-2 sequences previously reported as missing from the reference genome, for presence of Gag and Pro ORFs. We subsequently predicted sequences of encoded Pro proteins for HML-2 loci fulfilling those criteria (Fig. 7). We identified 6 different HML-2 loci in the human reference genome (3q27.2_ERVK-11; 5q33.3_ERVK-10; 6q14.1_ERVK-9; 7p22.1_ERVK-6; 8p23.1_ERVK-8 (K115); 12q14.1_ERVK-21) potentially capable of translating a Pro protein of canonical length. None of the corresponding protein sequences displayed amino acid alterations within the conserved catalytic DTG, FLAP and GRDLL motifs (Fig. 7). Of note, HML-2 locus 3q27.2_ERVK-11 displayed a fused Gag-Pro ORF extending approximately 700 aa in the N-terminal direction. Another HML-2 locus (22q11.21_ERVK-24) displayed a premature stop codon in the conserved GRDLL motif. Three out of four non-reference HML-2 sequences displayed full-length ORFs, yet one of them harbored a G → S change and another an I → V change within the FLAP-motif (Fig. 7).

Discussion
Retroviral aspartyl proteases are known to process various cellular proteins that are not directly correlated with or important for the retroviral replication cycle. Processing of such cellular proteins does not appear to play a major role during replication of exogenous retroviruses and may be regarded rather as cellular side effects of infections by exogenous retroviruses. However, processing of cellular proteins by retroviral proteases may be much more critical when the protease is encoded by endogenous retroviruses that are stable, vertically inherited components of a genome. In fact, the HERV-K(HML-2) group encodes active protease and HML-2 transcription and expression of HML-2 proteins has been reported to be upregulated in various human cancers, sometimes early in cancer development, such as in GCT carcinoma in situ [8]. Importantly, there is strong evidence that active HML-2 Pro is expressed in tumor cells and tumor-derived cell lines. HML-2 encoded retroviral particles budding from GCT cell lines were shown 25 years ago (for instance, see [22]). Large amounts of processed HML-2 Gag protein are present in GCT tissue and processed HML-2 (See figure on previous page.) Fig. 6 Verification of processing of human proteins by HERV-K(HML-2) Protease in vivo. Human candidate proteins and HML-2 Pro were coexpressed in HeLa cells in vivo and detected by Western blot using antibodies as indicated. For each blot, the leftmost lane is a control cotransfected with a plasmid encoding a candidate protein and either a GFP-encoding plasmid or empty phCMV, pcDNA6 myc/his B, or pcDNA5 FRT/TO vector, depending on GFP-Pro or (sole) Pro co-expressed in the experiment (see below). Candidate protein co-expressed with wild-type Pro (pro-wt) and mutant Pro (pro-mut) were loaded in lanes 2 and 3 each. Pro was expessed as either (sole) Pro or EGFP-Pro. Blots were probed with α-HA, α-GFP, α-Pro, or an α-HSP90 antibody as indicated. Full-length candidate protein and processing products are indicated by arrows and arrowheads, respectively (see below). A Representative results from control experiments co-expressing HSP90AA1 with either HML-2 Pro or EGFP-Pro. Relevant blot regions are shown. When expressing pro-wt and pro-mut, HML-2 Pro can be detected as approximately 18 kDa and 19 kDa protein bands representing self-processed and unprocessed products, respectively, Pro (a, bottom blot). When HML-2 Pro is expressed as EGFP-Pro-wt or EGFP-Pro-mut fusion protein, proteins of approximately 30 kDa and 47 kDa, representing processed and unprocessed EGFP(−Pro) can be detected with an α-GFP antibody (b, middle blot). Unprocessed EGFP-Pro(−mut) and self-processed Pro of approximately 50 kDa and 18 kDa, respectively, can be detected when using an α-Pro antibody (b, bottom blot; c). B. Selected Western blot results from co-expression of candidate proteins and HML-2 Pro. Candidate proteins were tagged with N-or C-terminal epitopes and detected with respective epitope-specific antibodies as indicated. Note the more or less complete reduction of amounts of full-length candidate protein (arrows), and sometimes processing products (arrow heads), in lanes with co-expressed HML-2 Pro. Note in panel Aa and Ab that the same processing product was detected for HSP90AA1 in Gag protein was furthermore demonstrated in GCT cell lines and tissue samples (for instance, see [10,22,23]). HML-2 Pro appears to become activated, and is thus present in cells, independent of budding of retroviral particles (see the Background section and below). The cellular consequences of expression of active HML-2 Pro are currently unknown. The disease relevance of HML-2 Pro is therefore unknown as well.
We employed a recently developed strategy, TAILS, for identification of human cellular proteins that are potential substrates of HML-2 Pro using purified HML-2 Pro and the proteome of HeLa cells as a model system. Our analysis identified a surprisingly high number-at least in the hundreds-of human proteins as potential substrates. A different positional proteomics approach recently identified more than 120 human proteins as processed by HIV-1 Pro in vitro [30]. Our experimental approach can be expected to be more sensitive and to thus identify more proteins than the approach employed in that study. Interestingly, 57 proteins identified in our study were also identified in that study [30] likely because of overall similar specificity profiles of HIV-1 and HML-2 Pro [47]. Methodologically our study comprised a broad TAILS approach followed by in vitro experiments and experiments in cultured cells of selected proteins. This study is a valuable example of this method combination for providing insight into potential substrates of yet under-investigated proteases.
The number of human proteins processed by HML-2 Pro in vivo is currently difficult to estimate with certainty. The two TAILS experiments at pH 5.5 identified approximately 4300 and 2600 cleavage events with at least 2-fold enrichment of a cleavage event of which 809 different human proteins were common to both experiments. Although HML-2 Pro displayed overall lower activity at pH 7, we still identified 500 to 1000 cleavage events with greater than 1.5-fold enrichment involving 154 different human proteins cleaved in both pH 7 experiments.
Furthermore, we verified processing by HML-2 Pro for 9 out of 14 (65%) human proteins in vitro. The great majority of the different human proteins examined in Fig. 7 Multiple alignment of amino acid sequences of Proteases potentially encoded by HERV-K(HML-2) loci. Because HML-2 Pro is translated via a ribosomal frameshift from the Gag ORF only HML-2 Pro sequences that also harbor a full-length Gag ORF are included. Note that other HML-2 loci may also encode protease in the case of translation bypassing Gag-Pro frameshifts. The HML-2 Pro ORF also encodes an upstream dUTPase. The C-terminal "last" dUTPase motif is included in the multiple alignment. Also indicated are a previously reported N-terminal auto-processing site for HML-2 Pro [44], and DTG, FLAP and GRDLL motifs conserved in retroviral aspartyl proteases. Note the early stop codons in two sequences that partially or entirely remove the GRDLL region. The HML-2 locus designations used here are a combination of two established naming systems; the first based on the location of HML-2 loci in chromosomal bands [58] and the second based on HUGO Gene Nomenclature Committee (HGNC)-approved designations of transcribed HML-2 loci [59]. HERV-K113 and the three bottom-most sequences are HML-2 sequences not present in the human reference genome [2,60]. Also note that locus chr3q27.2_ERVK-11 harbors a fused Gag-Pro ORF that extends approximately 700 aa in the N-terminal direction. Locus 7p22.1_ERVK-6 represents the protease sequence used for in vitro and in vivo experiments in this study vivo also showed evidence of processing by HML-2 Pro. Our selection of candidate proteins for in vitro testing involved a filter for certain amino acids in positions P1 and P1' of observed cleavage sites, and a specific molecular mass range due to technical limitations of the experimental in vitro transcription/translation system used for verification. Even when assuming favored amino acids in positions P1 and P1' to be required for cleavage by HML-2 Pro, one still has to consider several hundred human proteins as potentially processed by HML-2 Pro (see Fig. 3a). Furthermore, our TAILS analyses examined human proteins expressed in HeLa cells. Preparation of protein lysates likely involved systematic loss of some protein species because of inadequate lysis conditions for those proteins, thus very likely resulting in an incomplete sampling of the HeLa proteome. Our analysis likely also missed human proteins expressed at very low levels, or not at all, in HeLa cells. A recent study of NCI-60 cell lines identified~5600 human proteins as the core (cancer) proteome, another~5000 proteins showing a more distinct expression pattern between tissues, and~2000 proteins to be cell line-or tissue-specific and not part of the core proteome [61]. Therefore, TAILS experiments utilizing cell lines other than HeLa can be expected to identify a considerable number of additional proteins as (candidate) substrates of HML-2 Pro. We therefore hypothesize that even more human proteins than observed in our experiments are potential substrates of HML-2 Pro.
We verified processing by co-expressed HML-2 Pro for about two-thirds of selected candidate proteins in vitro and the great majority in vivo. For the latter, levels of processing ranged from slight to complete reduction of full-length candidate protein, sometimes accompanied by (more or less) stable presumed processing products. We conclude that observed reduction of full-length candidate proteins was not due to cell death (potentially apoptosis, see the Background section) triggered by HML-2 Pro and activation of caspases that then process candidate proteins. FACS analysis indicated that the majority of transfected cells were still alive at up to 48 h. Therefore, expressing HML-2 Pro does not inevitably cause cell death. Furthermore, apoptosis triggered by Staurosporin neither reduced amounts of full-length candidate protein nor generated smaller processing products, as is the case when expressing HML-2 Pro. It was furthermore reported previously that an HA-tag can be cleaved by caspase-3 and -7, causing loss of immunoreactivity for HA-tagged proteins [62]. We ruled out the possibility that observed loss of HA-tagged candidate proteins in our experiments is due to such HA-tag processing. First, our in vitro experiments demonstrated processing of candidate proteins by HML-2 Pro, the specificity of which was further demonstrated by reduced processing in the presence of Pepstatin A. Second, reduced levels of full-length protein were also observed for candidate proteins carrying epitope-tags other than HA. Third, FACS data show that the majority of HML-2 Pro expressing cells are still alive after > 30 h, thus apoptosis was not triggered in those cells. Fourth, induction of apoptosis by Staurosporin, together with activation of caspases (− 3 and − 7), did not reduce amounts of HA-tagged full-length candidate protein. Fifth, while cell death observed for HIV Pro was described as apoptosis [38,63], the specific mechanism through which HML-2 Pro-expressing cells die remains to be investigated. Our findings indicate that caspase-3 is present at only low amounts in HML-2-Pro-expressing cells (not shown).
Our findings strongly argue for HML-2 Pro being enzymatically active in vivo and further corroborate processing of human proteins by HML-2 Pro in TAILS experiments at pH 7. There is additional evidence for HML-2 Pro being enzymatically active in vivo as indicated, for instance, by processing of HML-2 Gag protein in vivo (see above). HIV-1 Pro has been detected in membranes, mitochondria and cytoplasm, and was also shown to be active in the cytoplasm [34,63]. Our analysis identified numbers of putative cleaved human proteins that localize to cytosol, membrane, mitochondria, and other organelles, based on GO-terms. We found EGFP-tagged mutant HML-2 Pro, that is unable to selfprocess from the EGFP-tag, to localize strongly to the nucleus, as well as in the cytoplasm of U2OS osteosarcoma cells and HEK293T cells when examined by fluorescence microscopy (Additional file 2: Figure S5). Likely, HML-2 Pro also localizes to the cytosol in cell types other than U2OS and HEK293T and thus could process proteins localizing to, or trafficking through, the cytosol. Whether HML-2 Pro also localizes to, and is active in, other cellular compartments remains to be investigated. HML-2 Pro likely would be enzymatically most active in compartments such as secretory granules, late endosomes and lysosomes for which pH 4.7 to 5.5 has been reported [48].
Human proteins identified as substrates of HML-2 Pro participate in a diverse array of cellular processes as assessed by GO-term analyses [49,50]. Our GO-term analyses served to compile biological information on proteins identified in our proteomics experiments. Approximately 5 times more cytosolic proteins and 2 times more nuclear proteins than expected by chance were identified in TAILS experiments. However, it currently appears unlikely that HML-2 Pro preferentially processes respective proteins. Human proteins identified as substrates of HML-2 Pro furthermore considerably overlap with cancer-relevant genes based on COSMIC (Catalogue Of Somatic Mutations In Cancer) [53] and with Mendelian disease phenotypes as revealed by OMIM [54] (Additional file 1: Tables S3, S4). HML-2 Pro expression might thus impact cell biology in various ways and contribute to disease by affecting one or more cellular processes.
HML-2 Gag-Pro precursor protein, from which Pro self-processes, is translated via an occasional ribosomal frameshift between Gag and Pro ORFs. Compared to Gag, lesser amounts of Pro are thus likely produced in cells. For the purposes of our validation experiments, we expressed HML-2 Pro from a subregion of the Pro ORF from which Pro self-processed. The actual amounts of enzymatically active Pro in cells and tissues expressing HML-2 are currently unknown. However, since HML-2 Pro is an enzyme, a relatively small amount of active Pro could have a significant impact on cell biology when expressed over long periods of time.
Preliminary data indicate that HML-2 Pro is detectable by a rabbit polyclonal anti-HML-2-Pro antibody [19] in cell lines known to overexpress HERV-K(HML-2) (Additional file 2: Figure S6). Furthermore, transient expression of EGFP-Pro-mut (Pro does not self-process from the precursor, see Fig. 6Ab) in such cell lines results in processing of the Pro portion. The amount of such processing can be reduced in the presence of HIV Pro inhibitor Indinavir (Additional file 2: Figure S6). It thus can be concluded that active HML-2 Pro that is present in those cells processes EGFP-Pro-mut.
Without further specific experiments, the cellular consequences deriving from processing of many human proteins by HML-2 Pro remain speculative. Our identification of proteins as potential substrates for HML-2 Pro processing lays groundwork for a number of specific experiments. Disease conditions involving known or suspected HERV-K(HML-2) misregulation or upregulation should be of greatest interest, including, for example, some cancers and amyotrophic lateral sclerosis [7,64]. To the best of our knowledge, a functional role for HML-2 Pro in such diseases has not yet been explored.
Expression of HML-2 Pro in disease conditions will depend on which HERV-K(HML-2) loci are transcribed as only a subset of HML-2 loci appears capable of producing active protease. Our analysis indicated 6 currently known reference and 1 non-reference HML-2 sequences potentially capable of producing active protease. Alleles affecting Gag and Pro ORFs were previously shown for locus 7p22.1_ERVK-6 [65], thus only certain alleles of that locus would encode active protease. Hitherto unidentified alleles of some other HML-2 loci likewise may possess proteasecoding capacity. It is also conceivable that some Pro ORFs are translated without Gag-Pro ORF ribosomal frameshifts or through translational starts within C-terminal Gag ORF portions. Frameshift-causing pseudoknot RNA structures may also influence protease-coding capacity of HML-2 loci. In any case, consideration of HML-2 Pro in a particular disease should include identification of HML-2 loci actually transcribed along with their protease coding capability. For instance, Pro-encoding HML-2 loci with a Gag ORF, specifically loci 3q27.2_ERVK-11, 5q33.3_ ERVK-10, 6q14.1_ERVK-9, 7p22.1_ERVK-6, and 8p23.1_ ERVK-8 (see Fig. 7), were previously identified as transcribed in GCT tissues and/or the GCT-derived cell line Tera-1 [66][67][68]. Loci 5q33.3_ERVK-10 and 7p22.1_ERVK-6 were identified in the context of amyotrophic lateral sclerosis, yet factual overexpression of HERV-K(HML-2) in ALS is currently debated [69][70][71][72].
We further note that our findings may also have implications for better understanding biological consequences of certain non-human endogenous retroviruses. For instance, endogenization of Koala endogenous retrovirus (KoRV) in Koalas (Phascolarctos cinereus) is ongoing, and KoRV-positive animals develop serious, life-threatening diseases, in particular malignant neoplasias [73]. The mechanism(s) of KoRV viral pathogenesis is poorly understood. One might hypothesize that disease-relevant Koala cellular proteins are processed by KoRV-encoded protease thus contributing to disease development.
Taken together, our findings for HERV-K(HML-2) Pro call for further experiments to better understand the relevance of endogenous retrovirus-encoded protease in health and disease in human and other species.

Conclusions
Retroviral proteases are known to process cellular proteins. While functionally less relevant in the case of expression of exogenous retroviruses, constitutive expression of protease encoded by an endogenous retrovirus is potentially more consequential if processing of cellular proteins affects cell physiology. Employing specialized proteomics technologies followed by additional experimental verification, we suggest that retroviral protease of diseaseassociated human endogenous retrovirus HERV-K(HML-2) processes numerous cellular proteins in vitro and in vivo, with many of those proteins known to be disease-relevant. Deregulated transcription of HERV-K(HML-2), as reported for various human diseases, could result in expression of HERV-K(HML-2) Protease and consequent processing of various cellular proteins with unknown physiological consequences and disease relevance. Our study provides an extensive list of human proteins potentially deserving further specialized investigations, especially relating to diseases characterized by deregulated HERV-K(HML-2) transcription. Disease-relevance of endogenous retrovirus-encoded protease may also be considered in non-human species.

Plasmid constructs for prokaryotic and eukaryotic protease expression
We generated plasmid constructs for prokaryotic expression of HERV-K(HML-2) protease (Pro). The coding region, including flanking sequence regions and selfprocessing sites of enzymatically active HERV-K(HML-2) Pro, as encoded by the previously described HERV-K(HML-2.HOM) provirus (nt 3277-3769; GenBank acc. no. AF074086.2) [24], was cloned in-frame into pET11d prokaryotic expression vector (Novagen). To do so, the particular region was amplified by PCR from a HERV-K(HML-2.HOM) provirus previously cloned in pBluescript [24]. The forward PCR primer added an NheI site and the reverse primer added a stop codon and a BamHI site to the PCR product. The PCR product was subcloned into pGEM T-Easy vector (Promega). The insert was released by an NheI/BamHI digest and cloned inframe into NheI/BamHI-digested pET11d plasmid (Novagen) giving rise to pET11dPro.
For eukaryotic Pro expression, nt 3415-3946 of HERV-K(HML-2.HOM) were amplified by PCR, with the forward primer adding a BamHI site, a spacer and a Kozak consensus sequence and the reverse primer adding a BamHI site. The PCR product was likewise subcloned into pGEM T-Easy vector, followed by release of the insert by a BamHI digest and cloning into a BamHIdigested phCMV eukaryotic expression vector, giving rise to phCMV-Pro-wt.
For the eukaryotic expression of EGFP-pro fusion protein nt 3415-3946 of HERV-K(HML-2.HOM) were amplified by PCR, with both forward and reverse primers adding a BamHI site each. The PCR product was subcloned into pGEM T-Easy vector, followed by release using BamHI and cloning into BamHI-digested pEGFP-C1 in frame with the EGFP ORF, giving rise to pEGFP-Pro-wt.
Note that the HERV-K(HML-2.HOM) Pro region used for generation of expression vectors included a known N-terminal auto-processing site [20], thus allowing for release of active HERV-K(HML-2.HOM) Pro from a precursor protein, e.g. EGFP-Pro.
Following the cloning strategies used for the design of wild-type Pro containing-plasmids, we also generated plasmids containing a mutated protease, specifically pET11dPro-mut, phCMV-Pro-mut, and pEGFP-Promut. Enzymatically inactive Pro variants were generated by PCR using Phusion polymerase (New England Biolabs) and wt-Pro in pGEM T-Easy vector as the template, followed by re-ligation of PCR products. One of the two PCR-primers introduced the desired mutation. Specifically, we generated a mutant with a D → N change in the conserved DTG motif and, only for the prokaryotic expression, another mutant with a R → K change in the GRDLL motif. Both mutants were previously shown to render HML-2 and HIV-1 protease inactive [44,74]. Plasmid constructs were verified by Sanger sequencing.
Plasmids for eukaryotic expression of epitope-tagged human cellular proteins presumably processed by HERV-K(HML-2) pro Full-length coding sequences for HSP90AA1, CIAPIN1, C15orf57, MAP2K2 and TUBA1A were obtained from GE Healthcare/Dharmacon and cloned into pcDNA3 with a human influenza hemagglutinin (HA) tag added during the cloning procedure. To do so, each full-length ORF was amplified by PCR. The forward PCR primer was the same as the one used for generation of PCR products for in vitro translation of proteins (see above). The reverse PCR primer added an HA-tag in frame at the ORF's 3′ end. The PCR product was cloned into pGEM T-Easy, released by a NotI digest and cloned into NotI-digested pcDNA3 vector. Clones were verified by Sanger sequencing.

Prokaryotic expression and purification of HERV-K(HML-2.HOM) protease
Expression and purification of HML-2 Pro followed a previously described protocol [44] with minor modifications.
In brief, Escherichia coli BL21(DE3) cells harboring plasmid pET11dPro (see above) were inoculated into 100 ml Luria-Bertani (LB Amp ) medium supplemented with ampicillin (100 μg/ml) and incubated overnight at 37°C. 20 ml of the overnight culture were then inoculated into 1 L of LB Amp medium and incubated at 37°C until A 600 = 0.6 was reached. Expression of HML-2 Pro was induced by addition of isopropyl-1-thio-β-D-galactopyranoside (Sigma) at a final concentration of 0.4 mM. After 3 h at 37°C bacterial cells were pelleted by centrifugation at 6800 g for 30 min at 4°C. Cells were resuspended in 50 ml of pre-cooled 5× TE buffer

Preparation of Hela total cell lysate
Human cervical adenocarcinoma (HeLa) cells were cultured at 37°C and 5% [v/v] CO 2 in Dulbecco's Modified Eagle's Medium supplemented with 10% [v/v] heat inactivated fetal calf serum, 50 μg/ml penicillin, and 50 μg/ ml streptomycin. A total of 1.4•10 8 cells grown to near confluence in eight 160 cm 2 tissue culture flasks were washed with 1× PBS and detached by trypsinization. Cells were collected in 20 ml 1× PBS, pelleted for 5 min at 250 g, resuspended in 0.5 ml of 5 mM MES, pH 6.0 supplemented with protease inhibitors (cOmplete,Mini, EDTA-free, Roche) at the recommended concentration, and subjected to lysis by three freeze-thaw cycles. The protein lysate was centrifuged at 4°C for 30 min. at 16, 100 g. The supernatant was stored in aliquots at − 80°C. Protein concentration was measured using the Biorad DC Protein Assay Kit.
Incubation of HeLa total cell lysate with purified HERV-K(HML-2) protease and subsequent TAILS analysis In a total reaction volume of 2 ml, we incubated 2 mg of HeLa proteins with purified HML-2 Pro (200 nM final concentration) in a buffer composed of 0.1 M PIPES, 1 M NaCl, and 2% [v/v] DMSO, pH 5.5 or pH 7. Two replicates were performed. Additional control reactions for each condition contained Pepstatin A at 200 μM that was concluded to effectively inhibit HML-2 Pro activity. All reactions were incubated for 75 min. at 37°C and stored at − 80°C until TAILS analysis (see below). TAILS was performed essentially as described previously [42,43], comparing HML-2 Pro-treated HeLa total cell lysate to control reactions for the two replicates performed at pH 5.5 and at pH 7. An Easy-LC 1000 coupled to a Q-Exactive plus mass spectrometer was used for LC-MS analysis. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE [79] partner repository (dataset identifiers PXD010159 and PXD013296).

In vitro translation of proteins presumably processed by HERV-K(HML-2) protease
The coding region of full-length protein was PCR-amplified from purified plasmid template DNA (see above). Forward primers were located at the start codon and included a 5′ extension consisting of a BamHI restriction site, a T7 promoter, a spacer and a Kozak consensus sequence for translation-initiation (5′-GGATCC|TAATA CGACTCACTATAGGG|AACAG|CCACCATG [cDNA candidate protein]-3′). The reverse primers added sequence encoding a human influenza hemagglutinin (HA) epitope tag and a stop codon (5′-TTA|AGCGTAATCTG-GAACATCGTATGGGTA[cDNA candidate protein]-3′) at the end to the PCR product's protein coding sequence. The standard PCR mix contained primers at a final concentration of 0.25 μM, 100 μM dNTP mix, 2.5 U Taq polymerase (Sigma), and 5 ng template DNA in a final reaction volume of 50 μl. PCR cycling conditions were as follows: 3 min. at 94°C; 30 cycles of 50 s. at 94°C, 50 s. at 56°C, 3 min at 72°C; and a final 10 min. at 72°C. PCR products directly served as template using a TnT T7 Quick Coupled Transcription/Translation System (Promega) following the manufacturer's recommendations. Briefly, 2.5 μl of a PCR reaction were added to 22 μl of TNT T7 PCR Quick Master Mix containing either 0.5 μl of HPLCpurified, translation-grade L-35 S-methionine (370 MBq, 10 mCi/ml; Hartmann Analytic, Braunschweig, Germany) or 0.5 μl of 1 mM "cold" methionine, incubated for 90 min. at 30°C and frozen at − 20°C immediately afterwards.

Incubation of candidate proteins with purified HERV-K(HML-2) protease in vitro
In vitro transcribed/translated radioactively or HA-tag-labeled candidate protein was incubated with purified HML-2 Pro to potentially confirm in vitro processing by HML-2 Pro. Briefly, 1 μl of the TNT® T7 in vitro transcription/ translation reaction was incubated with 400 nM purified HML-2 Pro in a buffer of 1 M NaCl, and 0.1 M PIPES pH 5.5, for 180 min. at 37°C in a final volume of 16 μl. Control reactions included Pepstatin A at 400 μM. The entire reaction was subjected to SDS-PAGE (see below).

SDS-PAGE and detection of labeled proteins
In the case of Western blots shown in Fig. 6a, b (top) and Additional file 2: Figure S3 (those with Coomassie staining), between 15 and 20 μg of each total protein sample, with equal amounts of each protein sample loaded per candidate protein examined, were subjected to reducing SDS-PAGE using a Bis-Tris buffer system. Protein lysates were mixed with 4× NuPAGE LDS Sample Buffer (Thermo Fisher Scientific) and DTT at 50 mM final concentration, denatured for 15 min. at 65°C, and briefly centrifuged. Protein samples were loaded and separated in 10% or 12% Bis-Tris polyacrylamide gels at 180 V in XCell SureLock™ Mini-Cells using NuPAGE MES SDS or MOPS SDS Running Buffer and optional NuPAGE Antioxidant.
Polyacrylamide gels with radiolabeled proteins were fixed for 30 min. in 50% [v/v] methanol/10% [v/v] acetic acid, then soaked in distilled water three times for 10 min each. Gels were dried for 2 h at 80°C under vacuum and subsequently exposed to a Storage Phosphor screen (Amersham Biosciences) at room temperature for 16 h. The screen was scanned using a Typhoon 9410 scanner (GE Healthcare).
Detection of cold proteins was done by Western blot. Following SDS-PAGE, proteins were transferred onto Hybond 0.2 μm PVDF membrane (Amersham/GE Healthcare) using a XCell II™ Blot module and NuPAGE Transfer Buffer in the presence of NuPAGE Antioxidant. Blot membranes were blocked in 1× TBS, 5% [w/v] nonfat dry milk for 1 h and incubated overnight at 4°C with an α-HA rat monoclonal antibody diluted 1:500 in 1× TBS/5% [w/ v] nonfat dry milk. Detection of proteins of interest employed antibodies specific for HA-tag, EGFP, and HML-2 Pro [19]. Secondary antibody incubation was done using peroxidase-coupled rabbit α-rat IgG (Sigma-Aldrich; A5795) or goat α-rabbit IgG (Sigma-Aldrich; A0545) each diluted at 1:5000, for 2 h at room temperature. α-HA rat monoclonal (clone 3F10) and rabbit α-rat antibodies were generously provided by Friedrich Grässer, Institute of Virology, University of Saarland. Signal detection was performed using SignalFire™ Elite ECL Reagent (Cell Signaling Technology) and Chemidoc™ Imaging System (Bio-Rad). Image analysis utilized ImageLab 5.2.1 software (Bio-Rad). Loading of equal protein amounts was verified by staining blot membranes with Coomassie Brilliant Blue after the ECL procedure.
Identification of HERV-K(HML-2) loci potentially encoding protease Reference and non-reference HERV-K(HML-2) locus sequences were analyzed for presence of pro ORFs. HML-2 pro is translated via a ribosomal frameshift between the HML-2 gag and pro ORFs. We therefore also analyzed for presence of a gag ORF in respective HML-2 sequences. Pro ORFs of HML-2 loci fulfilling criteria were translated in silico, multiply aligned, and further analyzed for presence of catalytic motifs conserved in retroviral aspartate proteases

Additional files
Additional file 1: Tables S1a-d. Results of TAILS analyses at pH 5.5 and pH 7 showing replicates 1 and 2 for each. Table S2. Selected human proteins with multiple cleavages by HERV-K(HML-2) protease. Table S3. Selected information from the Catalogue of Somatic Mutations in Cancer (COSMIC) for human genes for which encoded proteins were identified as substrates of HML-2 Pro. Table S4. Selected information from the Online Mendelian Inheritance in Man (OMIM) database for human genes for which encoded proteins were identified as substrates of HML-2 Pro. Table S5. Clone identifiers of cloned coding sequences of proteins investigated for processing by HERV-K(HML-2) Protease. (XLSX 1290 kb) Additional file 2: Figure S1. Self-processing of HERV-K(HML-2) Protease during purification. Figure S2. Additional examples of verifications of processing of human proteins by HERV-K(HML-2) Protease in vitro. Figure S3. Additional examples of verifications of processing of human proteins by HERV-K(HML-2) Protease in vivo and documentation of loading controls. Figure S4. Quantification of GFP-positive live cells and exclusion of processed protein products due to caspase activity. Figure S5. Localization of EGFP-Pro-mut in human osteosarcoma U2OS and HEK293T cells. Figure S6. Evidence for presence of HERV-K(HML-2) Protease in cell lines known to express HERV-K(HML-2) at relatively high levels. (PDF 5270 kb)