Soluble expression, purification and characterization of the full length IS2 Transposase
© Lewis et al; licensee BioMed Central Ltd. 2011
Received: 26 August 2011
Accepted: 27 October 2011
Published: 27 October 2011
The two-step transposition pathway of insertion sequences of the IS3 family, and several other families, involves first the formation of a branched figure-of-eight (F-8) structure by an asymmetric single strand cleavage at one optional donor end and joining to the flanking host DNA near the target end. Its conversion to a double stranded minicircle precedes the second insertional step, where both ends function as donors. In IS2, the left end which lacks donor function in Step I acquires it in Step II. The assembly of two intrinsically different protein-DNA complexes in these F-8 generating elements has been intuitively proposed, but a barrier to testing this hypothesis has been the difficulty of isolating a full length, soluble and active transposase that creates fully formed synaptic complexes in vitro with protein bound to both binding and catalytic domains of the ends. We address here a solution to expressing, purifying and structurally analyzing such a protein.
A soluble and active IS2 transposase derivative with GFP fused to its C-terminus functions as efficiently as the native protein in in vivo transposition assays. In vitro electrophoretic mobility shift assay data show that the partially purified protein prepared under native conditions binds very efficiently to cognate DNA, utilizing both N- and C-terminal residues. As a precursor to biophysical analyses of these complexes, a fluorescence-based random mutagenesis protocol was developed that enabled a structure-function analysis of the protein with good resolution at the secondary structure level. The results extend previous structure-function work on IS3 family transposases, identifying the binding domain as a three helix H + HTH bundle and explaining the function of an atypical leucine zipper-like motif in IS2. In addition gain- and loss-of-function mutations in the catalytic active site define its role in regional and global binding and identify functional signatures that are common to the three dimensional catalytic core motif of the retroviral integrase superfamily.
Intractably insoluble transposases, such as the IS2 transposase, prepared by solubilization protocols are often refractory to whole protein structure-function studies. The results described here have validated the use of GFP-tagging and fluorescence-based random mutagenesis in overcoming this limitation at the secondary structure level.
Transposition mechanisms, initially discovered in the IS3 family (see ) have been described as a two-step copy and paste pathway  which is now quite widespread and is found in several other families of insertion sequences, such as IS30, IS21 and IS256 [11–14]. In IS3 family members, IS911 [8, 15] and IS2 (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted), Step I occurs within a synaptic complex (SC) or transpososome (Figure 1c, SC I) that is formed when the TPase binds to the two ends. In general, however, in these circle-forming elements the first step involves a circularization process (Figure 1c) in which either end (optionally) is the substrate for an asymmetric cleavage reaction that leads to a donor-to-target intrastrand joining reaction near the other end to form a branched figure-of-eight (F-8) structure [6, 16–18] Host replication mechanisms  convert the F-8 into a covalently closed double stranded minicircle (Figure 1c, HR) with the abutted ends generally separated by one or more base pairs derived from the host DNA flanking the target end. These abutted ends constitute the minicircle junction (MCJ) at which a powerful promoter (Figure 1b, lower; Pjunc [19–21]) is assembled and generates the higher levels of TPase needed for the formation of the second synaptic complex (Figure 1c, SC II).
In SC II, the MCJ, a reactive junction, is the substrate for strand transfer reactions; it is cleaved at the abutted termini of IRR and IRL, creating 3'OH groups which permit both ends to function symmetrically as donors (Figure 1c, Step II). Thus it has been proposed that intrinsically different transpososomes must be assembled at each of the two steps [7, 8]. This is particularly true for IS2. Although both right and left ends in other IS3 family elements, such as IS911 , IS3  and IS150 , possess donor function in Step I reactions, in IS2 the right end is the exclusive donor and the left end the only functional target; this type of asymmetry has also been described for copies of IS256 in Tn4001 . In IS2, the left end has evolved through altered residues at positions 2 (creating a TA-3' terminal dinucleotide), 5 and 7 and an additional base pair at position 9 in its catalytic domain (Figure 1b, upper) to become a unique target which ensures accuracy of the joining reaction through the insertion of a single base pair between the abutted ends . This accuracy is essential for the formation of an MCJ with a mandatory 17 bp Pjunc spacer between the -10 Pribnow box and an outwardly reading -35 motif in the right end . Despite these changes in the catalytic domain of IRL which suppress donor function in Step I, IRL does possess the donor function  needed for strand transfer to the target site in the Step II SC.
IS3 family TPases have been identified as members of the TPase/retroviral integrase superfamily (referred to as RISF) of polynucleotidyl transferases [9, 24–27] and functional comparisons of their protein-DNA interactions with those of other RISF TPases should be useful. To date, a complete and comparative biophysical analysis of the protein-DNA interactions in fully formed Step I and Step II SCs with protein complexed to the protein binding and catalytic domains of the inverted repeats (IRs) has not been reported for any IS3 family member or other circle-forming elements, primarily due to the difficulty in isolating full length proteins capable of binding efficiently and generating fully formed complexes with the IRs [8, 28]. Partial footprints of the ends have however been carried out with cell-free extracts in IS2  and similar analyses carried out with the N-terminal half of the truncated protein have been reported for IS911 [8, 15, 17] and IS30 . In order to carry out a detailed biophysical study with fully formed complexes in IS2 it was first necessary to resolve the problem of the intractable insolubility of the TPase.
We report here a protocol utilizing a green fluorescent protein (GFPuv) tag that generates an IS2 TPase derivative that functions normally in vivo. We show for the first time that preparation under native conditions results in the recovery of a full length, soluble derivative that, when partially purified, binds very efficiently to cognate DNA sequences in vitro. This binding utilizes residues at both the N- and C-termini of the protein and is shown elsewhere to generate fully formed SCs with double stranded cognate IRR, IRL and MCJ sequences, with TPase bound to both the protein binding and catalytic domains of the ends (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted).
Distribution, from 23 recovered mutants, of 25 randomly induced mutations in the four domains of the IS2 OrfAB TPase
Purification of the IS2 TPase by conventional methods
Conventional methods for purifying active full length IS2 TPase under native conditions generated insoluble protein as inclusion bodies. Although standard solubilization protocols [35–37] and attempts at directed evolution  were unsuccessful, the protein was easily purified to homogeneity using denaturing protocols and refolded either on-column [39, 40] or in solution [41–43] in native buffers. In all cases, these TPase preparations bound very poorly to oligonucleotide substrates containing the cognate IRR DNA sequence in gel-retardation studies (for example see Figure 5a, lane 2).
Creation of an IS2orfAB::GFP fusion construct
Overexpression of the putative IS2OrfAB-GFP fusion protein
Electrophoretic mobility shift assays with IS2 OrfAB-GFP
Preparations of the OrfAB-GFP fusion protein purified to near homogeneity also bound poorly to cognate DNA sequences in gel retardation assays (Figure 5a, lane 3). Neither OrfA nor host factors, such as the bacterial histone-like protein, HU and integration host factor [45–47] enhanced binding efficiency (data not shown). On the other hand, the partially purified preparations of OrfAB-GFP shown in Figure 4a, lanes 2-4, generated results in which all of the DNA was driven into the complex (Figure 5b, lane 2). A similar result was obtained with the crude extract from the overexpression of pTW2orfAB::GFP (Figure 5b, lane 3). The multimeric nature of these complexes has been demonstrated in concurrent footprinting studies in which complexes similar to those shown in Figure 5b were created with MCJ DNA substrates containing abutted IRR and IRL ends. There, the protein binding domains and the catalytic domains of the two ends were protected along their entire lengths, suggesting that the complex consisted of at least a dimer (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted).
Fluorescence levels can be used to isolate IS2 TPase loss-of-function mutants leading to a structure-function analysis of the protein
We asked whether loss-of-function mutants of the IS2 TPase could be isolated as faster growing more brightly fluorescing colonies in order to test the idea that the low level of fluorescence of slow growing colonies with the pLL2522 plasmid might be due to the toxicity of the fusion protein, as well as to explore the possibility that we could obtain and analyze random mutations along the entire length of the protein. Random mutagenesis of IS2orfAB was accomplished with the PCR-based Genemorph II Random Mutagenesis kit (Stratagene, Santa Clara, CA) using very low, low and medium mutation rates. PCR products were cloned into the Eco RI/Nhe I sites of pGLO-ATG2 and the ligation products transformed into XL1blue cells (Stratagene). After 72 hours at 37°C, faster growing, more brightly fluorescing colonies were observed among a background of less intensely fluorescing colonies (Figure 3b). Recovery and analysis of the plasmids pLL2524-XXX (that is, 001-110) from these brighter fluorescing isolates (referred to here as GMF strains 1-110) showed that they carried mutations at frequencies which corresponded to the protocol-based mutation rates.
In vitro electrophoretic mobility shift assays, binding efficiencies and in vivo LacZ papillation assay-determined transposition frequencies from IS2 wild type and mutant (GMFa, d) isolates
Description of wild type and mutant GMF plasmids/strains.
Description or mutation
Domain-location of mutationsb
Total number of colonies
Number of trials (n)
Number of colonies with papillae
Total number of papillae
0.18 ± 0.16
1.25 ± 0.23
WT fusion protein
1.28 ± 0.09
H + HTH
1.31 ± 0.20
H + HTH
H + HTH
0.32 ± 0.17
H + HTH
0.23 ± 0.08
H + HTH
0.07 ± 0.07
H + HTH
H + HTH
0.27 ± 0.11
H + HTH
H + HTH
1.26 ± 0.18
0.13 ± 0.09
0.21 ± 0.15
0.09 ± 0.11
1.98 ± 0.21
0.20 ± 0.26
0.02 ± 0.05
Sequence analysis of the wild type IS2 TPase and secondary structure analysis of the IS3 family TPases
A PCOILS analysis for coiled coils [57, 58] predicted the presence of a coiled coil motif (Figure 9a) in the IS2 TPase between residues 73 and 100. Lei and Hu , using deletion derivatives of IS2 OrfA, showed that a sequence between residues 58 and 105 was responsible for dimerization and they as well as Haren et al. , predicted that the sequence between residues 73 and 100 of IS2 OrfA possessed an atypical heptad repeat showing some similarities to the canonical leucine zipper (LZ) of DNA binding proteins. In this study, however, a probe for the potential for a LZ within the first 120 residues of IS2 OrfAB was scored at a probability of zero using the 2ZIP server  even though the existence of a coiled coil domain between residues 73 and 100 was confirmed with a probability of 0.8 to 1.0. Here, we show through the use of loss-of-function point mutations how this sequence functions as an LZ-like motif and describe its role in the oligomerization, DNA binding and transposition properties of the IS2 TPase.
The alignment corresponding to IS2 residues 103 to 400 in Figure 7 matches that previously published for the IS3 family TPases and the retroviral integrases , as well as for the IS3, IS4 and IS6-family TPases and integrases from several retroelements residues 236-354 . The latter sequence, the CAS, is characterized by the presence of an invariant triad of catalytic carboxylases, the D, D(35)E motif [9, 27, 62, 63]. We asked what degree of correlation might exist between the aligned residues 101 to 400 in Figure 7 and a structure-based alignment of the sequences of the α helices and β strands generated by PSIPRED analysis in Figure 8; that is, how similar would these elements be in sequence and length in the IS3 family TPases and in the HIV-1 and Rous sarcoma virus (RSV) integrases.
Structure-based sequence alignments of residues corresponding to residues 236 to 398 in IS2 for IS3 family TPases and the HIV-1 and RSV integrases showed a series of five well-aligned α helices and five equally well-aligned β strands (Figure 10b), showing almost perfect conservation in their lengths, with high levels of identity (the presence of the same amino acid in at least 85% of the eight sequences) and high proportions of functionally conserved residues per element (approximately 50% in the β strands and 25% in the α helices). The significance of this in this study is that all but one of the eight random mutations recovered in this domain occurred at these conserved residues.
These α helices and β strands occur in a conserved order (Figure 10c) characteristic of the integrases and of the TPases with the DDE motif of two aspartates and a glutamate, for example, Mu , Tn5 and the IS1 family [65, 66]. In IS3 family TPases, α helices 2 and 3 in the integrases are present as a single helix (α helix 2) and it is interesting that remnants of α helix 2 of the integrases are seen in IS2 and IS407 but specifically in IS407, as two well-conserved residues in the first three amino acids of the single α helix (Figure 10b). In IS911 of the IS3 family, this group of tightly conserved elements has been proposed to be the putative CAS [2, 24, 34].
The three-dimensional structure of this unit, the catalytic core, has been demonstrated in several members of the TPase/RISF, including the TPases of the DDE family, such as Mu  and Tn5 , the integrases, such as HIV-1 [68–71] and the avian (ASV) and Rous (RSV) Sarcoma viruses [72, 73] and other nucleases, for example, RNase H1 [74, 75] and RuvC . For comprehensive reviews see [25, 26, 77]. This catalytic core is characterized by a five-stranded partially buried β sheet of mixed parallel and antiparallel elements with a polar face, with six α helices distributed on either side of it. The two aspartate residues of the DDE catalytic triad are located on adjacent strands of the β sheet (numbers 1 and 4) with the glutamate residue assigned to the closely located α helix 4 . We show here that randomly induced mutations in this putative catalytic core that affected residues other than the DDE alter the function of this motif in both positive and negative ways, identifying additional signatures characteristic of the catalytic core and supporting the intuitive contention that, in the IS3 family, it is organized and functions like the three-dimensional structure in the RISF; additional mutations also provide insights into its role in both the regional and the global binding strategies of the protein.
Effect of TPase mutations on TPase binding efficiencies and on in vivo transposition frequencies of IS2
Eleven of the twenty-five mutations (from the twenty-one single mutants and two double mutants) were within the putative binding domain, five were located in the coiled coil domain, eight in the putative CAS and one in the middle interval (Table 1; see also Figure 8 for an overview of the locations of these mutations within the secondary structures of the TPase). The binding efficiencies of the partially purified TPases of 22 of the mutant proteins were studied by EMSA (Figure 6) using a pair of annealed oligomers (50 bp in length) containing 41 bp of cognate DNA of the IRR . The substrate was labeled at the 5' end of the upper strand with γ32P (see Methods). A summary of the binding efficiencies together with results of in vivo transposition frequencies of all 23 mutants (determined from lacZ transposition assays) is shown in Table 2.
The putative binding domain
Nine mutants with substitutions in the putative binding domain are described in Table 2 (rows 4-12). Binding data are shown in the EMSA gel (Figure 6, yellow highlights). Proteins from three of the mutants, GMF isolates 28 (S44N), 29 (L58I) and 34 (R13H) (Figure 6a, lanes 7-9) formed no complexes, indicative of structural defects. The TPase from the double mutant, GMF 36 (R37Q/S44N- Figure 6b, lane 2), however, showed a partially restored, unstable, dissociated complex, absent in isolate 28 (S44N). Two GMF isolates, 9 (R50H; Figure 6a, lane 5) and 13 (S57G; Figure 6c, lane 2) also produced proteins which formed mostly dissociated complexes, likely indicative of deficiencies in binding reactions to the DNA substrate (see discussion). All six of the mutants with TPases completely defective or deficient in binding, (GMF isolates 9, 13, 28, 29, 34 and 36) had significantly reduced or no detectable levels of transposition (Table 2, rows 5-10). The remaining three mutants with substitutions in the putative binding domain, GMF isolates 4 (A42T; Figure 6c, lane 3), 37 (W49R) and 40 (V35L) (Figure 6b, lanes 3 and 5) showed marginal or no observable effects on binding efficiency. Two of these three mutants, GMF isolates 4 (A42T) and 40 (V35L) (Table 2, rows 4 & 12) had in vivo transposition frequencies (approximately 1.3) that were statistically comparable to those of the wild type controls, two versions of which, one fused to GFP (Table 2, row 3) and the other not (Table 2, row 2), showed identical transposition frequencies within experimental error.
The third mutant with little or no loss of binding efficiency, GMF 37, (W49R) was the single exception to the consistency in the relationship between binding efficiency and transposition frequency described above (Table 2, row 11). While this TPase derivative was quite proficient in binding to the substrate, the substitution completely abolished transposition. The apparent inconsistency in these properties of GMF 37 can be explained by the fact that W49 in IS2, which is one of the most highly conserved residues in the IS3 family (Figure 7 and ) and is also conserved in the homeodomain proteins , may play a more global role in effecting transposition. It may not simply be limited to a binding domain function and is not likely to be involved in DNA sequence recognition in helix 3 (see discussion).
The coiled coil motif
Five of the randomly induced mutations (in GMF isolates 6, 7, 18, 94 and 106) fell into the coiled coil segment (Table 2, rows 13-17 and Figure 10, blue highlights). Although isolate GMF 18 carries the double substitutions A42T+L97H, its phenotype, that is the loss of transposition and an unstable complex (Table 2, row 15; Figure 6a, lane 6) should be allocated to L97H, since analysis of the A42T mutation showed that the transposition frequency of the GMF 4 mutant and the binding efficiency of its protein are identical to those of the wild type. Another mutant, GMF 106 (L83V; see Figure 6d, lane 5), showed complete loss of binding proficiency and two others, GMF 6 (Q79L; Figure 6a, lane 4) and 7 (N94D; Figure 6c, lane 5) showed marked dissociation of their complexes in the EMSA gel. All five mutations effectively eliminated transposition (Table 2, rows 13-17).
The catalytic active site
Eight of the twenty-five mutations occurred in the proposed CAS of the protein (see GMF isolates 3, 22, 24, 31, 38, 68, 71 and 96 in Table 2, rows 18-25) and seven of them altered conserved residues (Figure 10b). EMSA gel reactions are shown in Figure 6 (green highlights). Three protein derivatives from GMF 22, 24 and 31 (A341P, L266P and V301M (Figure 6e, lanes 2-4) produced no complexes. Three others showed mostly dissociated complexes, GMF 3 (R291H; Figure 6a, lane 3), GMF 68 and 71 (H267D and E391K; Figure 6d, lanes 2-3). Two mutant derivatives with proficient binding reactions were GMF 38 (A341T; Figure 6b, lane 4) and GMF 96 (W237R; Figure 6f, lane 4); of these, the transposition frequency in the former was enhanced by about 50% and abolished in the latter (Table 2, rows 22 and 25). Transposition was eliminated in the six mutant derivatives with deficient or completely defective binding reactions (Table 2, rows 18-21 and 23-24). The locations of these substitutions in three α helices and three β strands of the CAS are shown in Figure 10b. Two of the eight substitutions altered residues conserved only in the IS3 family (R291H and V301M), one affected a non-conserved residue (H267D) and the remaining five substitutions resulted from alterations of residues conserved in the RISF.
The six TPase derivatives whose binding efficiencies were partially or completely reduced give some insight into the role of the putative catalytic core's contribution to both regional (catalytic domain) and global (catalytic and binding domains) binding of the TPase. Three mutations eliminated global binding, indicative of the structurally destabilizing effects of the substitutions. The A341P substitution located one residue from E342 of the DDE catalytic triad altered a residue at a position normally conserved for a hydrophobic amino acid in α helix 4 of the RISF. The presence of the helix-breaking proline had a devastating effect on binding and most of the DNA remained uncomplexed (Figure 6e, lane 2). Binding of the protein was completely eliminated in two other derivatives (Figure 6e, lanes 3-4). First, the L266P substitution occurred in β strand 3 where proline replaced a hydrophobic residue that is essentially conserved in the RISF; secondly, V301M changed another very hydrophobic residue that is conserved the IS3 family as either a valine or leucine in β strand 4 and is located adjacent to the second Asp of the DDE triad in the RISF (D306 in IS2).
EMSA gels of TPase derivatives with three other substitutions showed reactions in which unstable complexes were formed, suggestive of a reduction in the binding affinity of the CAS for its DNA contacts. R291H altered a positively charged residue in α helix 1, which is essentially invariant in the IS3 family, for one which readily assumes a neutral state (Figure 6a, lane 3). E391K substituted a basic residue for one which is essentially conserved as glutamate or glutamine in α helix 6 of the RISF. H267D substituted a negatively charged residue at a non-conserved position in β strand 3 (Figure 6d, lanes 2-3). The combined results from these six substitutions suggest that the catalytic core plays a role not only in binding to the catalytic domain of the end (unstable complexes) but that its integrity contributes to global binding proficiency of the full length protein (see Discussion).
Two mutations which did not affect binding proficiency provided insights into the role of β strand 1 and α helix 4 in facilitating the catalytic functions of the IS2 TPase (Table 2, rows 22 and 25). The 50% increase in transposition frequency of the mutant with the A341T mutation likely results from the substitution of a polar residue at this conserved hydrophobic position in the RISF, creating the potential for an additional specific or stochastic contact with the terminus possessing the CA-3' dinucleotide. The W237R mutation, located three residues from D240, a member of the catalytic triad, replaced a highly conserved aromatic residue in the RISF in β strand 1 with a basic amino acid and completely eliminated transposition without affecting the global binding proficiency. This substitution replaced a residue that is probably involved in positioning the DNA in the catalytic pocket , a change that did not affect the integrity of the β strand (see Discussion).
The middle interval
The V179L (GMF 101) substitution occurred in α helix M5 (Figure 10a). This change disrupted binding (Figure 6f, lane 5) and completely eliminated transposition (Table 2, row 26), a result which suggests that at least α helices M4-M6 of the middle region of the protein, which are aligned with the first three N-terminal helices of the integrase protein (IN), contribute to the overall structural and functional architecture needed to facilitate binding by the protein.
Rationale for soluble expression of the GFP-tagged IS2 TPase
GFP has been used widely as a reporter or biological marker , extensively in fusion constructs to determine the extent of solubility of target proteins, in protein folding assays and in directed evolution [44, 84]. Although its use as an agent to facilitate the soluble expression of proteins that misfold or aggregate when overproduced in Escherichia coli has been approached with caution , success has been reported for a plant actin . We reasoned that, given its robust solubility, it might be used to facilitate soluble expression of the intractably insoluble IS2 TPase under native conditions.
The full length fusion protein achieves very efficient binding to cognate DNA sequences
The inefficient binding to cognate DNA of full length native or GFP-tagged IS2 TPase, purified to homogeneity, contrasts starkly with the extremely efficient binding of the partially purified OrfAB-GFP utilizing residues at both the N- and C-termini of the TPase. In addition, footprinting studies reported elsewhere show that the protein binds to both the protein binding and catalytic domains of IRR, generating fully formed complexes (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted). In this study we have not explored in detail the reasons for this difference but reports of inefficient binding of full length TPases of insertion sequences are not uncommon. For example, in IS911 [8, 15] and in IS30 [28, 29], both of which transpose via the two-step circle-forming pathway, successful footprinting studies have only been conducted with truncated versions of the Tpase, which retain the DNA binding domain and lack the C-terminus. Inefficient binding was initially also reported for IS50 [87, 88] and in both IS50  and IS911  it has been proposed that this is due to interference of binding domain function by the C-terminus. Recently, a full length calmodulin-binding peptide fusion derivative of the IS256 TPase, which catalyzes circle formation in this element , was shown to bind to the ends, but it did so much less efficiently than N-terminal fragments containing the DNA binding domain, lending additional support to this hypothesis . Other reports of inefficient binding by recombinant TPases in both prokaryotic and eukaryotic transposons, such as IS903 , Tc1  and TAG1 , has led to the speculation that improper folding during the purification process may be the cause of inefficient binding. Our results with the partially purified IS2 TPase suggest that an unidentified component or, speculatively, even the presence of unspecific or IR DNA may be the agent which facilitates and/or maintains proper folding in these TPases.
The DNA binding domain of IS2 OrfAB consists of a three-helix bundle with a defined HTH motif
The location of three α helices, which might comprise the binding domain of the IS2 TPase at positions 13 to 26, 32 to 38 and 43 to 55, by the PHD secondary structure algorithm of PBIL  represents the best fit of our data (compare Figures 9b and 11). The only discrepancy is our decision to include residues 56 to 58 in helix 3 because substitutions S57G and L58I both negatively impact binding and transposition. L58I substitutes a residue whose most pronounced effect is its difficulty in adapting to an α helix conformation because of its branched β carbon for one which shows a distinct preference for being in α helices . The absence of complex formation (Figure 6c, lane 2) suggests that the substitution destabilized the α helix and likely the entire binding domain. We discuss the role that S57 plays in the recognition helix of an HTH motif below. These two substitutions suggest that residues 57 and 58 are within helix 3 or, less likely (given the potential role of S57 described below), are required for the stabilization of the helix. The R13H substitution completely abolished both binding and transposition (Figure 6a, lane 9) by replacing a polar, hydrophilic, positively charged residue that often has a structural role  with one which is less likely to carry a charge, making it likely that helix 1 plays an important role in the structural architecture responsible for binding the cognate DNA sequence in IS2. These data suggest that the binding domain includes all three helices and is comprised of residues 13 to 58 (Figure 11).
The HTH motif predicted by the HTH secondary structure analysis protocol of PBIL  also represents an excellent fit with our data. The motif includes residues M30 to K51 and is associated with helices 2 and 3 of the putative binding domain (compare Figures 9c and 11). The consensus sequence of Pabo and Sauer  which generally characterizes the HTH motif in prokaryotes supports the claim that it resides in helices 2 and 3 (Figure 11). When this consensus sequence [ho-G/A-(X)2]helix 1-[ho-G-ho-X]turn-[(X)3-I/L/V-...]helix2, is applied to residues M30 to L58, (where ho is a hydrophobic residue, and × is any residue) we see a very reasonable fit: [V35-A36-R37-Q38]helix 1-[H39-G40-V41-A42]turn-[A43-S44-Q45-L46....]helix2. The critical residues here (in bold) are, (i) the optional hydrophobics (ho), V35 in helix 1 and H39 and V41 in the turn (histidine has the potential to be buried like a hydrophobic ) and (ii) three conserved hydrophobics, A36 in helix 1, the invariant glycine (G40) in the second position of the turn (both weak hydrophobics) and L46 in helix 3 (Figure 11).
It is interesting that four of the nine randomly induced substitutions in the binding domain affected residues in this consensus sequence. A comparison of the effects of the S44N substitution and of the R37Q/S44N double replacement in helices 1 and 2 respectively of the proposed HTH motif gives some additional insight into the role of these two residues in the stabilization of the HTH motif. Since the drastic effect of S44N (no detectable binding and 80-85% reduction in the transposition frequency, Figure 6a, lane 7) is partially reversed by R37Q/S44N (about 60% and 65% reduction in binding and transposition frequency, respectively, Figure 6b, lane 2), we make the following assumptions: S44 and R37 are likely involved in interhelix H-bonding and contribute to stabilizing the HTH. In the S44N mutant derivative, arginine and asparagine are apparently not as effective in H-bonding, resulting in a destabilized motif. H-bonding by glutamine and asparagine in the double mutant, however, appears to be partially restored, most likely because of the increased capacity of this pair of amino acids to form H-bonds .
The fact that four of the seven mutations which disrupted binding occurred in the second helix of this HTH motif (Figure 11) supports the convention that it is the recognition helix. Two of these substitutions, R50H and S57G, help identify residues that are likely involved in making specific DNA contacts. The R50H substitution in the putative recognition helix produced a protein derivative which generated the partially dissociated complex in Figure 6a, lane 5 and completely eliminated transposition. In this case the positively charged arginine is replaced by an amino acid whose flexibility in shedding its proton allows it to readily assume a neutral state, making it less effective as a residue involved in binding to DNA sequences  and suggesting that R50 plays a pivotal role in recognizing its cognate DNA sequence. Because the IS2 transposition pathway requires separate binding events for each of the two steps, even a moderate reduction in binding would probably have a drastic effect in reducing transposition frequency, as seen with R50H. S57G substitutes a small residue without a side chain for a polar hydrophilic residue with a fairly reactive OH group, which is usually involved in forming hydrogen bonds. Since this residue is located in the putative recognition helix, a DNA-contact assignment to S57 could also explain the effect of this substitution in generating the dissociated complex in Figure 6c, lane 2.
Two substitutions, A42T and V35L, which produced little or no change in the wild type phenotype, lend additional support to our identification of the HTH based on the Pabo and Sauer predictions. Replacement of A42 in the four-residue turn with any small amino acid would probably have little effect on protein function (A42T; Figure 6c, lane 3); in addition, the replacement of the optional hydrophobic, V35, with leucine in the first helix of the HTH would not be expected to have a significantly negative effect (Figure 6b, lane 5) on HTH function (see Figure 11). These results confirm that in IS2, N-terminal helices 2 and 3 contain the HTH motif with a four-residue turn between them. Thus the IS2 binding domain consists of residues 13 to 26 which form helix 1, 32 to 38 form helix 2, (helix 1 of the HTH; Figure 11), 39 to 42 form the turn, and 43 to 58 form helix 3 (helix 2 of the HTH; Figure 11). The A42T mutation has an interesting phenotype in that it was selected as a bright colony (see the legend to Table 3) but is not toxic to the cell even though it is phenotypically a silent mutation. It is possible that its protein is produced in lower amounts or that the mutation has simply made the protein more soluble.
These results are in general accord with, and extend the work of, Prere et al. , Hu et al. , Lei and Hu  and Rousseau et al.  on IS3 family TPases. Hu et al. predicted the existence of an HTH motif in the IS2 TPase at residues 31 to 50 and Lei and Hu demonstrated the loss of binding capability experimentally for IS2 OrfA deletion derivatives lacking as few as the first 12 residues (likely destabilizing the formation of helix 1) and as many as 57 residues from the N-terminus. PSIPRED secondary structure analyses of the TPases of all other prototypes of the principal subgroups of the IS3 family show three helices whose positions are similar to those shown for IS2 (data not shown).
There is much evidence for multihelix binding domains which include at least one HTH motif in TPases. IS30, which transposes via a circle-forming pathway, possesses an N-terminal binding domain with two HTH motifs, one of which is a component of an H + HTH structure . The MuA Iβ and Iγ DNA-binding subdomains which form bipartite binding structures are composed of five and four α helices, respectively, each including an HTH motif [96, 97]. In the case of the Iβ subdomain of MuA, all five helices are involved in the interaction with the DNA. Similar results have been reported for the TPases Tc3  and the Tc1-like element Sleeping Beauty  whose multihelix structures with two HTH motifs are not dissimilar from those of the homeodomain family of helix-turn-helix DNA-binding proteins  or the paired DNA binding domain family .
The W49R substitution in the second and putative recognition helix of the HTH generated a protein with no negative effects on binding efficiency (Figure 6b, lane 3) but lacked any capacity for transposition (Table 2, row 11). Resolution of this apparent contradiction has led to the conclusion that W49 may not directly interact with the protein binding domains of IRR and IRL. Figure 7 shows that few residues in the N-terminal helix 3 (B α-3) in IS2, are conserved in IS3 family TPases. This is expected for the recognition helices of these motifs which have little identity in the sequences of their ends; on the contrary, W49 in IS2 however, corresponds to what has been described as one of the most highly conserved of all residues in the TPases of the IS3 family . The ability of the W49R mutation to disrupt transposition but not binding in IS2, (even when a charged hydrophilic residue is substituted for a highly hydrophobic one) suggests that the function of W49 may extend globally in the protein and is not confined to binding functions of the HTH motif.
A similar but not identical inconsistency in the relationship between binding efficiency and transposition was also observed with the equivalent W42 in IS911 . There, the W42F mutant derivative which produced little to no binding efficiency with a truncated OrfAB lacking the CAS, showed a strongly positive result for in vivo transposition in the presence of the CAS of the IS911 TPase. This suggested that the CAS somehow had the ability to compensate for the deficiency of the W42F substitution in facilitating binding.
Our results suggest that this conserved tryptophan in IS3 family TPases may be involved in interacting with the CAS of the protein, for example, by promoting the folding which allows that motif to be correctly positioned in binding to the catalytic domain of IRR. W49R may fail to communicate the level of accuracy in CAS binding (for example, by permitting a minor misfolding) that is needed to allow recombination, without affecting regional DNA binding. Evidence for extensive binding of the IS2 TPase to the catalytic domain of IRR (the donor end in this insertion sequence) has been shown in concurrent footprinting studies described elsewhere (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted) and the issue of the role of the CAS in global binding of the protein is addressed in this study in the discussion of CAS mutations which reduce binding efficiency.
The IS2 TPase possesses a LZ-like oligomerization motif at its N-terminus that facilitates binding to the ends of the element
The sequence of the coiled coil motif of the IS2 OrfAB TPase (residues 73-100; Figure 12a) differs in significant ways from that of the canonical LZ. Indeed when this sequence is tested on the 2-ZIP server (2zip.molgen.mpg.de/cgi-bin/2zip.pl;) a LZ is not predicted. In this study, all five substitutions in the coiled coil domain indicate that a LZ-like motif, whose function is required for binding and transposition, exists within residues 73 to 100 in the IS2 TPase.
We have aligned the four OrfAB LZ-like heptads in IS2 with corresponding sequences from prototype elements of the four other subgroups of the IS3 family (Figure 12b). Haren et al.  have, however, created a detailed alignment of putative LZ sequences from OrfA, involving 15 members of the five subgroups (IS2, IS3, IS51, IS150 and IS407) of the IS3 family and they have specifically demonstrated the presence of a canonical LZ motif with a four-heptad repeat in OrfAB of IS911 [30, 31]. These alignments reveal, however, that the putative IS2 LZ-like motif is the only sequence in which only two of the four d positions are occupied by leucine (L83 and L97) and that IS2 alone lacks the leucine residue at the d position of the first heptad (for example, see A76-; Figure 12b). However, three of the four hydrophobic residues at the a positions (L73, I80 and L87) are occupied by leucines or isoleucine. The fourth a position, N94, in the fourth heptad is the buried polar asparagine, which is essential for inter-subunit H-bonding in canonical LZ structures . Another significant difference between this putative IS2 LZ-like motif and the canonical LZ is the restriction of ionic (g/e' g'/e) stabilizing salt bridges to the third and fourth heptads (Figure 12c). It is possible, however, that weak non-ionic inter-subunit stabilizing interactions between the first and second heptads are brought about by the glutamine residues (Q79 and Q84) in the g and e positions of these two heptads. We propose, based on the analysis of all five mutations, that stabilization of a potential LZ-like structure (Figure 12c) would be brought about as follows: the N-terminal half of the structure would be relatively weakly stabilized by the concerted action of the d-located leucines at L83 in the second heptad, the a-located hydrophobics L73 and I80 and by hydrogen bonds at the g and e positions, Q79 and Q84, in the first and second heptads respectively. The C-terminal half of the motif, on the other hand would be more strongly stabilized by the d-located leucines at L97, the a-located asparagine (N94) whose buried hydrogen bonds contribute significantly to stabilization of the zipper (both in the fourth heptad) and the canonical ionic salt bridges generated by the g and e residues at E93 and K98 in the third and fourth heptads, respectively. Thus, L83V and L97H affected the canonical d-located leucines. The L83V substitution (Figure 6c, lane 5) completely abolished both binding and transposition, suggesting that substitution of the C-β branched valine residue destroyed the primary interaction for stabilization at the N-terminus and consequently the entire LZ-like motif. The phenotype of the Q79L substitution appears to have affected the weak g/e' g'/e inter-subunit stabilizing reactions at the N-terminal end of the zipper-like structure but, given that the primary stabilization interaction is still present, it produced a less drastic phenotypic change insofar as binding efficiency is concerned (Figure 6a, lane 4), compared to the replacement at Leu83 (L83V).
L97H, on the other hand, had a much less drastic effect on binding (Figure 6a, lane 6), although transposition was all but abolished. The L97H substitution destabilized the putative motif at its C-terminal end but the two other strong stabilization interactions described above appear to allow a level of oligomerization that permits unstable binding with minimal dissociation. Similarly, N94D altered the buried a-located asparagine residue required for stabilization of the zipper but the existence of the two remaining stabilization interactions at the C-terminus appears to have influenced the production of a phenotype similar to that of L97H (Figure 6c, lane 5).
The K89M substitution (Figure 12c) also abolished transposition completely and provides further evidence for a functional LZ-like motif. Its phenotype is consistent with the location of K89 at a c-located position, which is part of the solvent-exposed helical surface that must be occupied by a hydrophilic residue. A hydrophobic residue would disrupt the formation of that surface and subsequently abolish zipper function [103, 104].
The CAS of the TPase of IS2 and other IS3 family members share the functional properties of the three-dimensional catalytic core of the TPase/RISF
The eight substitutions, W237R, L266P, H267D, R291H, V301M, A341T, A341P and E391K (Table 2, rows 18-25) fell into 3 α helices and 3 β strands of the putative CAS (Figure 10b). Four of these (W237R, L266P, H267D and V301M) impacted the putative β sheet of the catalytic core and abolished transposition but only W237R had no effect on binding (Figure 6f, lane 4), a result that helps identify the function of W237 and of β strand 1 in the CAS. Two of the remaining four mutations, A341T and A341P, located adjacent to the third member of the catalytic triad, E342, affected a highly conserved hydrophobic residue in α helix 4 in the RISF, that is, V151 in HIV-1 (Figure 10b; see also ). A341T had no negative effect on binding efficiency (Figure 6b, lane 4) and enhanced the frequency of transposition by about 50% (Table 2, row 22), a result that also sheds light on the function of α helix 4 in the IS2 CAS. Substitutions were recovered in two other α helices, E391K in α helix 6 and R291H in α helix 1. These and H267D in β strand 3, which reduced but did not eliminate binding, helped identify residues and elements which likely function in binding the CAS to the catalytic domain.
The W237R and A341T substitutions eliminated and enhanced cleavage respectively, and provide strong evidence, based on the deduced function of the two WT residues, that the three-dimensional structure of the catalytic core of the IS2 TPase functions similarly to that in the RISF. W237R is highly conserved in β strand 1 of the RISF and aligns with W61 in HIV-1 and RSV. The location of this tryptophan, three residues from the first of the catalytic aspartates (D240 in IS2 and D 64 in HIV-1) on β strand 1, is consistent with its role, as shown from crosslinking studies with W61 of HIV-1 , in interacting with the 3' end of the DNA and positioning it within the catalytic pocket. The ability of W237R to eliminate transposition without affecting binding could then be explained by a similar role for W237.
The A341T substitution highlights the essential supporting role of residues adjacent to E342 in α helix 4, in the chemistry of cleavage and joining, and we draw this conclusion from the extent of conservation in this α helix in the RISF. For example, the co-crystal structure of the Tn5 TPase has shown that Y319, R322, K330 and K333, which flank E326 (the triad glutamic acid) in α helix 4, are involved in making specific contacts with the 3' and 5' ends (transferred and non-transferred strands) of the catalytic domain of the DNA . These four residues are aligned directly, in α helix 4 of IS2, with E336, N338, K346 and K349 (N338 and K349 are highly conserved residues), which flank E342  and presumably have the same function as their equivalents in Tn5. In addition, K346 and the conserved K349 in IS2 are aligned with K156 and K159 in HIV-1 integrase (Figure 10b). These two residues in IN have been shown to contact the DNA, with K159 directly interacting with the adenosine of the terminal CA-3' dinucleotide, where it is involved in orienting the DNA properly for cleavage . Earlier, van Gent et al.  had shown that a K159V substitution in HIV-1 significantly slowed the rate of integration without significantly reducing the amount of integration in an overnight incubation. Their implication was that this mutation reduced by one the number of residues flanking E152 (the triad glutamic acid) available for contact with the DNA and thus reduced the efficiency of interaction between the protein and the DNA. In addition, Calmels et al.  demonstrated in HIV-1 that 75% of the random mutations immediately flanking E152 that resulted in an increase in the amount of binding to a strand transfer substrate included a V151T mutation, the homologue of A341T in IS2. One can then account for the 50% increase in transposition of A341T, by assuming that enhanced interaction with the catalytic domain of IRR, due to an additional specific or stochastic DNA contact by the substituted threonine, produced the subsequent enhancement. This is likely the case, given its proximity to the four residues which putatively make contact with the catalytic domain of the IS2 IRR and its location adjacent to E342. These two results, with W237R and A341T on β strand 1 and α helix 4 respectively, suggest that the three-dimensional structures of these elements, and subsequently that of the catalytic core, are functionally similar to those of the RISF.
We have been able to differentiate between substitutions in the CAS which do not affect the binding efficiency of the protein, W237R or A341T, those which affected the structural integrity of the catalytic core and thus the entire protein, preventing any complex formation, A341P, L266P and V301M, (Figure 6e, lanes 2-4) and those which reduce binding efficiency of the CAS to the cognate DNA, such as H267D, R291H and E391K (Figure 6a, lane 1 and 6d lanes 2-3); these last three produced partially dissociated complexes identifying residues that are likely important binding contacts between the CAS and the catalytic domain. H267D replaced a basic residue with a negatively charged one at a non-conserved position on β strand 3. The enhanced level of substrate dissociation is in accord with reduced contact with the DNA. R291H substituted a weakly basic residue at a position occupied by a conserved arginine in four of the five subgroups in α helix 1 of the IS3 family. The substitution reduced binding efficiency, likely compromising the DNA anchoring function provided by Arg 291. E391K occurs in α helix 6, which is characterized by two highly conserved residues, proline (P389 in IS2) in RSV and the IS3 family and a glutamic acid or glutamine in the RISF; E391K in IS2 altered the latter and the replacement of the acidic residue with the basic lysine reduced the overall binding affinity to the DNA in the catalytic domain, without completely eliminating it. The phenotypes of these mutations (H267D, R291H and E391K) suggest that their wild type residues are critical contacts which facilitate the binding of the CAS to the catalytic domain of IRR.
On the other hand, A341P, the helix-breaking proline substitution in α helix 4, altered a conserved hydrophobic residue in the RISF, significantly reducing complex formation. L266P altered a conserved hydrophobic residue in β strand 3 of the RISF and V301M altered a very hydrophobic, conserved residue in the IS3 family in β strand 4, associated with the second aspartate of the catalytic triad (D306); both of these completely eliminated complex formation. The fact that all three of these substitutions replaced very hydrophobic residues and eliminated binding suggests that their principal effect was to disrupt the α helix or β strand, or the putative β sheet and thus the catalytic core, the integrity of which is clearly essential for proper folding of the full length protein and thus global binding.
These results underscore the importance that binding of the catalytic core to the CD plays in regional and global binding of the full length protein. On one level the W49R substitution in the recognition helix of the HTH apparently failed to coordinate the necessary level of accuracy of binding of the catalytic core to the DNA of the catalytic domain (most likely due to a minor folding impairment), eliminating transposition but nevertheless permitting global binding. However, a full length protein with a mutation of a single anchoring residue in its catalytic core, which may not alter the structural integrity of the protein, significantly impacts global binding, manifested by partial dissociation of the complex. From this we conclude that the binding reactions with wild type proteins shown in Figures 2 and 6, in which all of the DNA is driven into the complex, result from fully formed complexes in which both the DNA binding domain and the CAS of the protein are fully complexed to the ends. This conclusion is supported by data showing extensive protection of the protein binding and catalytic domains of IRR or of the abutted ends of the minicircle junction (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted). Impaired binding by either domain of the protein thus produces dissociation of the complex.
The integrity of a middle interval contributes to the binding capability of the IS2 TPase
The V179L substitution affects a hydrophobic residue that is functionally conserved in α helix M5 in the RISF (Figure 10a). Two of the three residues conserved in the IS3 family are also conserved in the RISF and V179L affected one of them. The disruption of binding and abolition of transposition in IS2 likely resulted from the replacement of the C-β branched valine, which affected the backbone of the α helix, distorting or disrupting it . The result suggests that at least α helices M4 to M6 of the middle interval of the protein, which align with good conservation with the first three α helices of IN, are critical to the functional architecture of the protein that relates to global binding to the cognate IS2 DNA.
These results validate the strategy of the GFP-tagged approach to obtaining, under native conditions, preparations of a full length, soluble, active protein like the IS2 TPase that is usually insoluble when prepared under native conditions and refractory to whole protein structure-function or biophysical studies when solubilized. This strategy has resulted, for the first time (among circle forming insertion sequences with a two-step transposition pathway), in the recovery of a full length protein which is capable of very efficient binding in vitro to cognate DNA and the formation of fully formed complexes (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted) involving residues at both the N- and C-termini of TPase. In addition the fluorescence-based random mutagenesis approach to exploring structure-function relationships has helped refine our understanding of those relationships in IS2 and the IS3 family TPases by teasing out residues that facilitate binding, oligomerization and (as they relate to the integrases) catalysis, as well as those that define possible interactions between structural motifs of the protein.
Bacterial strains and media
E. coli strain JM105 (New England Biolabs) was used for most procedures involving plasmid DNA preparation, cloning and the lacZ papillation assay. DNA transformation was carried out into supercompetent XL1 Blue cells (Stratagene Inc, Santa Clara, CA, USA) for reactions requiring cloning and overexpression of the fused orfAB and GFPuv genes in pLL2522. BL21(DE3)pLysS cells (Novagen-EMD4Biosciences, La Jolla, CA, USA) were used for over expression of the OrfAB-GFP fusion product cloned into the pTWIN2 vector (New England Biolabs).
Cultures were routinely grown in lysogeny broth (LB) media at 37°C, supplemented where necessary with carbenicillin (Cb, 50 μg/mL), kanamycin (Km, 40 μg/mL) or chloramphenicol (Cm, 20 μg/mL). For the overexpression of pGLO, pLL2522 and pLL2524-XXX (plasmids with the GMF mutations), cultures were grown at 28°C in 2x YT media supplemented with Cb and arabinose (6 mg/mL).
Plasmid DNA preparation was carried out using the standard alkaline lysis procedure of the Wizard DNA Purification System (Promega Corp., Madison, WI, USA) for in-labarotory protocols. The Pure Link HQ Miniplasmid Purification Kit (Invitrogen Corp., Carlsbad, CA, USA) was used in the preparation of DNA samples for outsourced sequencing reactions (see below).
Restriction endonuclease digestion was carried out with enzymes and buffers from New England Biolabs. Diagnostic gels were made with 0.8% Seakem agarose and preparative gels were made with 0.6% Seaplaque Low Melting Temperature agarose (Cambrex Corp., East Rutherford, NJ, USA). DNA was purified from preparative gels with Gelase (Epicentre Biotechnologies, Madison, WI, USA) following the manufacturer's instructions and concentrated in a Microcon-100 Filter Device (Millipore, Billerica, MA, USA) to a 50 μL volume. The solution was dried down to a pellet in a Savant SpeedVac DNA concentrator, resuspended in 12 μL ultrapure H2O and frozen at -20°C until use. Standard cloning procedures were as previously described .
Standard PCR and PCR-mediated in vitro site-directed mutagenesis were carried out with the Vent DNA polymerase (New England Biolabs) used in accordance with the manufacturer's instructions. The reaction protocols were as described earlier . PCR products were cleaned up with the Direct PCR Purification Buffer and the Wizard PCR Preps Resin (Promega Corp.).
Plasmid constructs and mutagenizing oligonucleotides
pGLO-ATG2 containing 3'-located Eco RI-Nhe I cloning sites (Figure 2a) was created by removing an Eco RI site located adjacent to the two stop codons (bold upper case) at the 3' end of GFPuv with the oligonucleotide (all mutagenizing sites in this section are in bold lower case) 5'GGATCATCAGGTACCGAGCg CGt ATTCATTA TTTGTAGAGCTCATCCATGCC3' and creating a new cassetting Eco R1 site upstream of the existing Nhe I site (in upper case, containing the first two codons of GFP) and destroying the ATG start codon at the 5' end of the gene, with the oligonucleotide 5'TCCCCTTCCCCGCTATGg ATCAGCTGAgaattc TTCTCCTTCTTAAAGTTAAA3'.
pLL2521HK (Figure 2d) containing an Eco RI-Nhe I cassetted orfAB gene was created in successive steps by removing the upstream Eco RI site in pLL18 (Figure 2b) with the oligonucleotide 5'AGACTATCACTTATCCGCGGAACAGTCTAGAGCTCcccctc ACTGGCCGTC3', placing Eco RI adjacent to the IS2 start codon (pLL2509A; Figure 2c) with the oligonuclotide 5'ACTAGTTTTTAGACCGTCATTGGAgaattc ATGATTGATGTGTTAGGGCC3', adding an Nhe I site and altering the adjacent stop codon at the 3' end of IS2 orfAB to create pLL2520 with the oligonucleotide 5'GGGCCCgcgctagc ACCGGTTATTTCCAGACATCTGTTATCACTTAACC3' and adding a 6X HIS tag downstream of the IS2 orfAB start codon (Figure 2d) with oligonucleotide 5'GTATGcatcatcatcatcatcatagcagatatctggtattgagtataagc ATTGATGTCTAAGGGCCGGAG3'Finally, in order to fuse the Eco R I-Kpn I cassetted orfAB-GFPuv fusion sequence (Figure 2e) to the Kmr reporter gene, a procedure needed for the creation of lacZ papillation assay constructs, a Kpn I site was added adjacent to and downstream of the Nhe I site (upper case lettering) in the sequence that connects orfAB to the Kmr gene. For this we used the primer 5'AACTGATCCAGGGCCCGggtacc AGCTAGC ACCAGTTATTTC3'.
pLL2522 was produced by cloning the cassetted Eco R1-Nhe I orfAB gene into pGLO-ATG2 (Figure 2e).
pUH2509, a construct used for lacZ papillation assays, containing IS2 with the frame fused orfAB gene from pLL18 (Figure 2b) was created as follows. IRL in pLL18 was deleted and the weak indigenous E-10 promoter (upper case lettering) conserved while adding a Sac II site to form pLL2509A (Figure 2c), into which the Xba I-Sac II cassetted lacZ gene could be cloned. We used the oligonucleotide 5'CCAGTGGAATTCGAGCTCTAGACTGTTccgcgg ATAAGTGATAGTCT TAATATTAGTTTTTTAGACTAGTCATTGG3'. lacZ was obtained from pLL135 . The 3' end of the gene was modified to add the necessary Sac II site, generating pLL135II using 5'GGTACCGGGGATCCgccg AGACATGATAAGATACATTGATGAGTTTGG3'. The 5' end of lacZ was modified to remove the lacUV5 promoter, to add an Xba I site as well as the IS2 IRL (upper case lettering) generating LL135IRLLZ. All three reading frames reading into the IRL sequence lacked stop codons. We used the oligonucleotide 5'ATGTTCTTTCCTCGAGtctaga TAGACTGGCCCCCTGAATCTCCAGACAACCAATATCACTTAATTAT TGCCGTAAGCCGTGGCCG3'. The Xba I-Sac II fragment from pLL135IRLLZ was cloned into pLL2509A to produce plasmid pUH2509, which contained a 6.4 kb version of IS2 consisting of (from 5' to 3'): IRL, the promoterless lacZ gene sequence, the orfAB sequence without functional left or right ends, the Kmr gene and IRR.
pUH2523, the construct containing the fused orfAB::GFPuv genes, used for lacZ papillation assays, was created as follows. (i) orfAB linked to the Kmr gene in pLL2521HK is cassetted within Eco RI and Kpn I restrictions sites (Figure 2d), so in order to add the Kmr reporter gene to the fused orfAB::GFP genes we replaced orfAB in pLL2521HK (Figure 2d) with the Eco RI-Kpn I cassetted orfAB::GFP sequence shown in Figure 2e, to create pLL2523. (ii) The lacZ papillation assay plasmid pUH2509 possesses an Spe I site downstream of the E-10 promoter of IS2orfAB and an Nru I site within the Kmr reporter gene, as do all constructs in which Kmr is present as a reporter gene (see, for example pLL2521HK in Figure 2d). The Spe I-Nru I fragment from pUH2509 was replaced by the corresponding fragment from pLL2523 to create pUH2523. Similarly, Spe I-Nru I fragments from pLL2524-XXX, plasmids containing mutated orfAB genes (see below), were used to create lacZ papillation plasmids pUH2524-XXX.
pUH2523ΔorfAB, the null mutation used as a control in lacZ papillation assays (Table 2, row 1), was created by deleting a 1743 bp fragment between two Mfe I restriction sites, 103 bp from the start of the IS2orfAB sequence and 156 bp from the end of the GFPuv sequence in pUH2523, followed by blunt ligation of the sites.
pTW2orfAB::GFP was created by cloning the fused orfAB::GFP genes into the pTWIN2 vector of the IMPACT system (Intein Mediated Purification with an Affinity Chitin-binding Tag; New England Biolabs) for the purposes of improving the purification of the fusion protein. The construct was cloned into the N-terminal multiple cloning site of the vector by first creating a Sbf I site close to the existing Eco RI site with 5'GGCATACATGAATTCCTCGAGGcctgcagg CTGCGTATCCGGTGACACC3' to accommodate the EcoR I/Sbf I cassetted orfAB::GFP sequence.
Creation and cloning of mutations in IS2 orfAB from a PCR-based random mutagenesis protocol
The GeneMorph II Random Mutagenesis Kit (Stratagene) was used to create mutations within orfAB in pLL2521HK (Figure 2d) using a 30-cycle PCR-based protocol. Primers were M13F (forward) and KmR1 (reverse; ). Mutations were generated at very low, low and medium rates (900 ng of target DNA within 3.6 μg of plasmid DNA; 500 ng of target within 2.0 μg of plasmid DNA; and 250 ng of target within 1.0 μg of plasmid DNA respectively). PCR products were cloned into the Eco RI-Nhe I sites of pGLO-ATG2, transformed into XL1-Blue Supercompetent cells and plated onto LB plus Cb plus arabinose agar. After 72 hours at 37°C, plates were examined for brightly fluorescing colonies among a background of less brightly fluorescing colonies. Plasmids from the brighter fluorescing clones carrying mutations in the orfAB sequence were identified as pLL2524-XXX where XXX stands for 001-110.
LacZ papillation assays
Papillation was best observed when pUH2509, pUH2523 or pUH2524-XXX plasmid DNA was transformed into JM105 cells. The DNA concentration was titrated to produce about 50 to 60 transformants per plating on to LB plus Km plus Cb plus arabinose agar. Plates were incubated in airtight bags to minimize drying. The numbers of papillae plateaued after 20 to 25 days at 37°C.
Preparation of the wild type and mutant OrfAB-GFP fusion proteins under native conditions
pLL2522 and other mutant plasmid DNA were transformed into XLI-Blue cells (Stratagene), plated on to LB plus Cb plus arabinose agar and incubated for 48 hours at 37°C. A single fluorescing colony was inoculated into 10.0 mL of similarly supplemented 2x YT broth and incubated overnight at 28°C. After centrifugation, the pellet was checked for fluorescence, washed in 3.0 mL Native Wash Buffer pH 8.0 (50 mM sodium phosphate monobasic monohydrate, 300 mM NaCl), resuspended in 3.0 mL Bug Buster Protein Extraction Reagent (Novagen-EMD4Biosciences) supplemented with 1.0 uL of Benzonase (Novagen-EMD4Biosciences) per 10.0 mL overnight (o/n) culture and 3.0 uL of Protease Arrest (Calbiochem-EMD4Biosciences La Jolla, CA, USA) per mL of lysate and nutated at 4°C for 30 minutes. If necessary, the suspension was subjected to a single round of freezing and thawing to complete lysis. The lysate was checked for bright fluorescence before and after centrifugation at 16,000 × g for 1 hour at 4°C.
6xHis-tag purification of the protein was achieved by gravity flow affinity chromatography using Ni-NTA agarose (Qiagen Valencia, CA, USA) under native conditions essentially following the manufacturer's instructions. The crude lysate was loaded on to a 1.0 mL bed of the nickel-charged resin in a 5.0 mL column and chromatographic separation followed with UV light. The protein bound as a tight brightly fluorescing band at the top of the column and remained bound through washings with 10 to 60 mM Imidazole when a slight dissociation of the band was observed. To circumvent continued dissociation, the band was eluted with 250 mM Imidazole and its progress through the column followed. Peak fractions (fluorometrically determined) were subjected to diagnostic 12% PAGE using Ac:Bis (30%:8%) polyacrylamide gels (Figure 4a). Fractions showing both the 74 kDa OrfAB-GFP and the 17 kDa OrfA proteins were pooled (approximately 700 uL), concentrated to about 75 uL in a YM-10 Microcon Centrifugal Filter Device (Millipore), dialyzed overnight in 300 mM NaCl, 50 mM tris(hydroxymethyl)amino methane (Tris-Cl), pH 8.0 and 1.5 mM dithiothreitol using Slide-A-Lyzer cassettes (Pierce/Thermo Scientific Rockford, IL, USA) and stored in 50% glycerol at -20°C. Concentrations of GFP in the sample shown in Figure 4a were measured with spectrophotometry at 280 nm and 397 nm while those of the wild type and mutant versions of the fused OrfAB-GFP proteins were measured at 397 nm. Comparative levels of fluorescence of GFP and the fusion proteins were measured fluorometrically and used to confirm the concentration data.
For the overexpression of the OrfAB-GFP fusion protein in the pTWIN2 derivative (IMPACT, New England Biolabs), plasmid pTWorfAB::GFP was transformed into BL21(DE3)pLysS cells. Single colonies were inoculated into 10 mL 2xYT plus Cb plus Cm and grown overnight at 37°C. Two milliliters of this starter culture was inoculated into 120 mL of the same medium (to establish an optical density (OD) of 0.2) and grown at 37°C to an OD of 0.8 when it was induced with 1.0 mM isopropyl β-D-1-thiogalactopyranoside and allowed to grow overnight at 16°C. The culture was lysed as described above and the cleared lysate loaded onto the chitin column. The protein was purified per the manufacturer's instructions with binding and elution monitored by UV light-induced fluorescence. Peak fractions were collected pooled and analyzed as described above, purified on ion exchange Q-sepharose columns (HiTrap Q XL, GE Healthcare) following the manufacturer's instructions, and concentrated, dialyzed and stored as described above.
Electrophoretic mobility shift assays
Annealed 50-mer oligonucleotides containing the 41 bp IRR sequence were used in all but one of the EMSA experiments (Figure 6a-e). The upper strand was labeled at the 5' end with γ32P-ATP. Primer A - upper strand (the IRR sequence is within the square brackets): 5'GGATCC[TTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCA]GCG3'. Primer B - lower strand: 5'CGC[TGGATTTGCCCCTATATTTCCAGACATCTGTTATCACTTAA]GGATCC3'.
Reactions shown in Figure 6f utilized annealed 87-mer oligonucleotides containing the IRR sequence. The top strand (primer A) was labeled at its 5' end with γ32P-ATP. Primer A - 5'GCTGACTTGACGGGACGGGGATCC[TTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCA]ATCGACCTGCAGGCATATAAGC3'. Primer B - 5'GCTTATATGCCTGCAGGTCGAT[TGGATTTGCCCCTATATTTCCAGACATCTGTTATCACTTAA]GGATCCCCGTCCCGTCAAGTCAGC3'.
5'-end labeling and annealing of the primers
A 20 μL labeling reaction contained 40 units of T4 polynucleotide kinase in 1X T4 polynucleotide kinase reaction buffer (New England Biolabs), 20 μM of the primer (upper strand) and 50 μCi of γ32P-ATP. The reaction was incubated at 37°C for 30 minutes and heat-killed at 90°C for 5 minutes. A 100-μL annealing reaction contained 10 ρmol and 13 ρmol of the labeled and unlabeled strands respectively, 20 mM Tris-Cl pH 8.0 and 100 mM NaCl. The reaction was placed in a boiling water bath, cooled to 65°C, held there for 15 minutes and allowed to cool to room temperature.
Binding of the TPase to its cognate DNA was carried out for 30 minutes at room temperature (20°C) in a 15-uL reaction mixture of 20 mM Tris-Cl pH 8.0, 1 mM ethylenediaminetetraacetic acid, 5.0 μg/mL calf thymus DNA, 10 nM of the radioactively labeled annealed primers and 0.13 μM of the partially purified preparation of the OrfAB-GFP fusion protein. Reactions were separated on 5% native polyacrylamide gels at 4°C for an average of 450 volt hours (Vhrs) (see Figure 6).
Secondary structure algorithms and protein alignment tools
The ExPASy SWISS PROT translation toolkit  of the Swiss Institute of Bioinformatics was used to translate DNA sequences from the prototypes of the principal subgroups of the IS3 family, that is, IS2, IS3, IS51, and IS407 plus IS911 of the IS3 subgroup and IS861 of the IS150 subgroup, into protein sequences. Similar translations were done for sequences of the HIV-1 and RSV integrases. The ClustalW2 multiple alignment tool  was used for the alignment of protein sequences in Figure 7. Structure-based alignments in Figure 10 were determined from the sequences shown in Figure 7, from published RSV and HIV-1 sequences [73, 109, 110] from the alignments of Fayet et al.  and Rezsohazy et al.  and from the PSIPRED secondary structure determinations for the members of the IS3 family sub-groups and the two integrases. In these aligned sequences, functionally conserved non-polar hydrophobic residues were identified as h1 when all sequences possessed only very hydrophobic residues (L, I, V, C, M, F or W) and h2 when less hydrophobic residues are present or the conserved residues are only found in fewer than 80% of the sequences. Three different algorithms were used for secondary structure predictions: the PSIPRED server, the PROF Secondary Structure Prediction Protocol  using the Bioinformatics Information toolkit of the Max Planck Institute for Developmental Biology and the PHD Secondary Structure Analysis Algorithm  from the secondary analysis prediction protocol of PBIL (pbil.univ-lyon.fr; ). A PCOILS algorithm for coiled coils from the Bioinformatics Information toolkit of the Max Planck Institute for Developmental Biology [57, 58] was used to predict the presence of a coiled coil motif and the 2ZIP server  from the same institution was used to predict the presence of a LZ within the coiled coil motif.
List of abbreviations
catalytic active site
electrophoretic mobility shift assay
green fluorescent protein
right and left inverted repeats
open reading frame
polymerase chain reaction
TPase/retroviral integrase superfamily
Rous sarcoma virus
We thank W Wong for technical assistance and NDF Grindley for useful discussions. This research was supported by US Public Health Service grant NIGMS/MBRS GMO8153 to LAL and a York College FDSP award 990110 to LAL.
- Chandler M, Mahillon J: Insertion sequences revisited. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: ASM Press, 305-366.View ArticleGoogle Scholar
- Rousseau P, Normand C, Loot C, Turlan C, Alazard R, Duval-Valentin G, Chandler M: Transposition of IS911. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: ASM Press, 367-383.View ArticleGoogle Scholar
- Polard P, Prere MF, Chandler M, Fayet O: Programmed translational frameshifting and initiation at an AUU codon in gene expression of bacterial insertion sequence IS911. J Mol Biol. 1991, 222: 465-477. 10.1016/0022-2836(91)90490-W.PubMedView ArticleGoogle Scholar
- Vogele K, Schwartz E, Welz C, Schiltz E, Rak B: High-level ribosomal frameshifting directs the synthesis of IS150 gene products. Nucleic Acids Res. 1991, 19: 4377-4385. 10.1093/nar/19.16.4377.PubMed CentralPubMedView ArticleGoogle Scholar
- Hu ST, Lee LC, Lei GS: Detection of an IS2-encoded 46-kilodalton protein capable of binding terminal repeats of IS2. J Bacteriol. 1996, 178: 5652-5659.PubMed CentralPubMedGoogle Scholar
- Lewis LA, Grindley ND: Two abundant intramolecular transposition products, resulting from reactions initiated at a single end, suggest that IS2 transposes by an unconventional pathway. Mol Microbiol. 1997, 25: 517-529. 10.1046/j.1365-2958.1997.4871848.x.PubMedView ArticleGoogle Scholar
- Lewis LA, Gadura N, Greene M, Saby R, Grindley ND: The basis of asymmetry in IS2 transposition. Mol Microbiol. 2001, 42: 887-901. 10.1046/j.1365-2958.2001.02662.x.PubMedView ArticleGoogle Scholar
- Normand C, Duval-Valentin G, Haren L, Chandler M: The terminal inverted repeats of IS911: requirements for synaptic complex assembly and activity. J Mol Biol. 2001, 308: 853-871. 10.1006/jmbi.2001.4641.PubMedView ArticleGoogle Scholar
- Polard P, Chandler M: Bacterial TPases and retroviral integrases. Mol Microbiol. 1995, 15: 13-23. 10.1111/j.1365-2958.1995.tb02217.x.PubMedView ArticleGoogle Scholar
- Duval-Valentin G, Marty-Cointin B, Chandler M: Requirement of IS911 replication before integration defines a new bacterial transposition pathway. Embo J. 2004, 23: 3897-3906. 10.1038/sj.emboj.7600395.PubMed CentralPubMedView ArticleGoogle Scholar
- Kiss J, Olasz F: Formation and transposition of the covalently closed IS30 circle: the relation between tandem dimers and monomeric circles. Mol Microbiol. 1999, 34: 37-52. 10.1046/j.1365-2958.1999.01567.x.PubMedView ArticleGoogle Scholar
- Loessner I, Dietrich K, Dittrich D, Hacker J, Ziebuhr W: TPase-dependent formation of circular IS256 derivatives in Staphylococcus epidermidis and Staphylococcus aureus. J Bacteriol. 2002, 184: 4709-4714. 10.1128/JB.184.17.4709-4714.2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Prudhomme M, Turlan C, Claverys JP, Chandler M: Diversity of Tn4001 transposition products: the flanking IS256 elements can form tandem dimers and IS circles. J Bacteriol. 2002, 184: 433-443. 10.1128/JB.184.2.433-443.2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Schmid S, Berger B, Haas D: Target joining of duplicated insertion sequence IS21 is assisted by IstB protein in vitro. J Bacteriol. 1999, 181: 2286-2289.PubMed CentralPubMedGoogle Scholar
- Rousseau P, Tardin C, Tolou N, Salome L, Chandler M: A model for the molecular organisation of the IS911 transpososome. Mob DNA. 2010, 1: 16-10.1186/1759-8753-1-16.PubMed CentralPubMedView ArticleGoogle Scholar
- Polard P, Chandler M: An in vivo TPase-catalyzed single-stranded DNA circularization reaction. Genes Dev. 1995, 9: 2846-2858. 10.1101/gad.9.22.2846.PubMedView ArticleGoogle Scholar
- Polard P, Ton-Hoang B, Haren L, Betermier M, Walczak R, Chandler M: IS911-mediated transpositional recombination in vitro. J Mol Biol. 1996, 264: 68-81. 10.1006/jmbi.1996.0624.PubMedView ArticleGoogle Scholar
- Szabo M, Kiss J, Nagy Z, Chandler M, Olasz F: Sub-terminal sequences modulating IS30 transposition in vivo and in vitro. J Mol Biol. 2008, 375: 337-352. 10.1016/j.jmb.2007.10.043.PubMedView ArticleGoogle Scholar
- Lewis LA, Cylin E, Lee HK, Saby R, Wong W, Grindley ND: The left end of IS2: a compromise between transpositional activity and an essential promoter function that regulates the transposition pathway. J Bacteriol. 2004, 186: 858-865. 10.1128/JB.186.3.858-865.2004.PubMed CentralPubMedView ArticleGoogle Scholar
- Szeverenyi I, Bodoky T, Olasz F: Isolation, characterization and transposition of an (IS2)2 intermediate. Mol Gen Genet. 1996, 251: 281-289.PubMedGoogle Scholar
- Ton-Hoang B, Betermier M, Polard P, Chandler M: Assembly of a strong promoter following IS911 circularization and the role of circles in transposition. Embo J. 1997, 16: 3357-3371. 10.1093/emboj/16.11.3357.PubMed CentralPubMedView ArticleGoogle Scholar
- Sekine Y, Aihara K, Ohtsubo E: Linearization and transposition of circular molecules of insertion sequence IS3. J Mol Biol. 1999, 294: 21-34. 10.1006/jmbi.1999.3181.PubMedView ArticleGoogle Scholar
- Haas M, Rak B: Escherichia coli insertion sequence IS150: transposition via circular and linear intermediates. J Bacteriol. 2002, 184: 5833-5841. 10.1128/JB.184.21.5833-5841.2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Haren L, Ton-Hoang B, Chandler M: Integrating DNA: TPases and retroviral integrases. Annu Rev Microbiol. 1999, 53: 245-281. 10.1146/annurev.micro.53.1.245.PubMedView ArticleGoogle Scholar
- Nowotny M: Retroviral integrase superfamily: the structural perspective. EMBO Rep. 2009, 10: 144-151. 10.1038/embor.2008.256.PubMed CentralPubMedView ArticleGoogle Scholar
- Rice PA, Baker TA: Comparative architecture of TPase and integrase complexes. Nat Struct Biol. 2001, 8: 302-307. 10.1038/86166.View ArticleGoogle Scholar
- Rowland SJ, Dyke KG: Tn552, a novel transposable element from Staphylococcus aureus. Mol Microbiol. 1990, 4: 961-975. 10.1111/j.1365-2958.1990.tb00669.x.PubMedView ArticleGoogle Scholar
- Nagy Z, Szabo M, Chandler M, Olasz F: Analysis of the N-terminal DNA binding domain of the IS30 TPase. Mol Microbiol. 2004, 54: 478-488. 10.1111/j.1365-2958.2004.04279.x.PubMedView ArticleGoogle Scholar
- Stalder R, Caspers P, Olasz F, Arber W: The N-terminal domain of the insertion sequence 30 TPase interacts specifically with the terminal inverted repeats of the element. J Biol Chem. 1990, 265: 3757-3762.PubMedGoogle Scholar
- Haren L, Normand C, Polard P, Alazard R, Chandler M: IS911 transposition is regulated by protein-protein interactions via a leucine zipper motif. J Mol Biol. 2000, 296: 757-768. 10.1006/jmbi.1999.3485.PubMedView ArticleGoogle Scholar
- Haren L, Polard P, Ton-Hoang B, Chandler M: Multiple oligomerisation domains in the IS911 TPase: a leucine zipper motif is essential for activity. J Mol Biol. 1998, 283: 29-41. 10.1006/jmbi.1998.2053.PubMedView ArticleGoogle Scholar
- Hu ST, Hwang JH, Lee LC, Lee CH, Li PL, Hsieh YC: Functional analysis of the 14 kDa protein of insertion sequence 2. J Mol Biol. 1994, 236: 503-513. 10.1006/jmbi.1994.1161.PubMedView ArticleGoogle Scholar
- Lei GS, Hu ST: Functional domains of the InsA protein of IS2. J Bacteriol. 1997, 179: 6238-6243.PubMed CentralPubMedGoogle Scholar
- Rousseau P, Gueguen E, Duval-Valentin G, Chandler M: The helix-turn-helix motif of bacterial insertion sequence IS911 TPase is required for DNA binding. Nucleic Acids Res. 2004, 32: 1335-1344. 10.1093/nar/gkh276.PubMed CentralPubMedView ArticleGoogle Scholar
- Barth S, Huhn M, Matthey B, Klimka A, Galinski EA, Engert A: Compatible-solute-supported periplasmic expression of functional recombinant proteins under stress conditions. Appl Environ Microbiol. 2000, 66: 1572-1579. 10.1128/AEM.66.4.1572-1579.2000.PubMed CentralPubMedView ArticleGoogle Scholar
- Davis GD, Elisee C, Newham DM, Harrison RG: New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol Bioeng. 1999, 65: 382-388. 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I.PubMedView ArticleGoogle Scholar
- Galloway CA, Sowden MP, Smith HC: Increasing the yield of soluble recombinant protein expressed in E. coli by induction during late log phase. Biotechniques. 2003, 34: 524-526. 528, 530PubMedGoogle Scholar
- Jenkins TM, Hickman AB, Dyda F, Ghirlando R, Davies DR, Craigie R: Catalytic domain of human immunodeficiency virus type 1 integrase: identification of a soluble mutant by systematic replacement of hydrophobic residues. Proc Natl Acad Sci USA. 1995, 92: 6057-6061. 10.1073/pnas.92.13.6057.PubMed CentralPubMedView ArticleGoogle Scholar
- Compaan DM, Ellington WR: Functional consequences of a gene duplication and fusion event in an arginine kinase. J Exp Biol. 2003, 206: 1545-1556. 10.1242/jeb.00299.PubMedView ArticleGoogle Scholar
- Stempfer G, Holl-Neugebauer B, Rudolph R: Improved refolding of an immobilized fusion protein. Nat Biotechnol. 1996, 14: 329-334. 10.1038/nbt0396-329.PubMedView ArticleGoogle Scholar
- Armstrong N, de Lencastre A, Gouaux E: A new protein folding screen: application to the ligand binding domains of a glutamate and kainate receptor and to lysozyme and carbonic anhydrase. Protein Sci. 1999, 8: 1475-1483. 10.1110/ps.8.7.1475.PubMed CentralPubMedView ArticleGoogle Scholar
- Chen GQ, Gouaux E: Overexpression of a glutamate receptor (GluR2) ligand binding domain in Escherichia coli: application of a novel protein folding screen. Proc Natl Acad Sci USA. 1997, 94: 13431-13436. 10.1073/pnas.94.25.13431.PubMed CentralPubMedView ArticleGoogle Scholar
- Rudolph R, Lilie H: In vitro folding of inclusion body proteins. Faseb J. 1996, 10: 49-56.PubMedGoogle Scholar
- Waldo GS, Standish BM, Berendzen J, Terwilliger TC: Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol. 1999, 17: 691-695. 10.1038/10904.PubMedView ArticleGoogle Scholar
- Chalmers R, Guhathakurta A, Benjamin H, Kleckner N: IHF modulation of Tn10 transposition: sensory transduction of supercoiling status via a proposed protein/DNA molecular spring. Cell. 1998, 93: 897-908. 10.1016/S0092-8674(00)81449-X.PubMedView ArticleGoogle Scholar
- Chalmers RM, Kleckner N: Tn10/IS10 TPase purification, activation, and in vitro reaction. J Biol Chem. 1994, 269: 8029-8035.PubMedGoogle Scholar
- Craig NL, Nash HA: E. coli integration host factor binds to specific sites in DNA. Cell. 1984, 39: 707-716. 10.1016/0092-8674(84)90478-1.PubMedView ArticleGoogle Scholar
- Krebs MP, Reznikoff WS: Use of a Tn5 derivative that creates lacZ translational fusions to obtain a transposition mutant. Gene. 1988, 63: 277-285. 10.1016/0378-1119(88)90531-8.PubMedView ArticleGoogle Scholar
- Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A: ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31: 3784-3788. 10.1093/nar/gkg563.PubMed CentralPubMedView ArticleGoogle Scholar
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal × version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.PubMedView ArticleGoogle Scholar
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics. 2000, 16: 404-405. 10.1093/bioinformatics/16.4.404.PubMedView ArticleGoogle Scholar
- Prere MF, Chandler M, Fayet O: Transposition in Shigella dysenteriae: isolation and analysis of IS911, a new member of the IS3 group of insertion sequences. J Bacteriol. 1990, 172: 4090-4099.PubMed CentralPubMedGoogle Scholar
- Ouali M, King RD: Cascaded multiple classifiers for secondary structure prediction. Protein Sci. 2000, 9: 1162-1176. 10.1110/ps.9.6.1162.PubMed CentralPubMedView ArticleGoogle Scholar
- Combet C, Blanchet C, Geourjon C, Deleage G: NPS@: network protein sequence analysis. Trends Biochem Sci. 2000, 25: 147-150. 10.1016/S0968-0004(99)01540-6.PubMedView ArticleGoogle Scholar
- Rost B, Sander C: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins. 1994, 19: 55-72. 10.1002/prot.340190108.PubMedView ArticleGoogle Scholar
- Dodd IB, Egan JB: Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucleic Acids Res. 1990, 18: 5019-5026. 10.1093/nar/18.17.5019.PubMed CentralPubMedView ArticleGoogle Scholar
- Lupas A: Coiled coils: new structures and new functions. Trends Biochem Sci. 1996, 21: 375-382.PubMedView ArticleGoogle Scholar
- Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science. 1991, 252: 1162-1164. 10.1126/science.252.5009.1162.PubMedView ArticleGoogle Scholar
- Bornberg-Bauer E, Rivals E, Vingron M: Computational approaches to identify leucine zippers. Nucleic Acids Res. 1998, 26: 2740-2746. 10.1093/nar/26.11.2740.PubMed CentralPubMedView ArticleGoogle Scholar
- Fayet O, Ramond P, Polard P, Prere MF, Chandler M: Functional similarities between retroviruses and the IS3 family of bacterial insertion sequences?. Mol Microbiol. 1990, 4: 1771-1777. 10.1111/j.1365-2958.1990.tb00555.x.PubMedView ArticleGoogle Scholar
- Rezsohazy R, Hallet B, Delcour J, Mahillon J: The IS4 family of insertion sequences: evidence for a conserved TPase motif. Mol Microbiol. 1993, 9: 1283-1295. 10.1111/j.1365-2958.1993.tb01258.x.PubMedView ArticleGoogle Scholar
- Katzman M, Mack JP, Skalka AM, Leis J: A covalent complex between retroviral integrase and nicked substrate DNA. Proc Natl Acad Sci USA. 1991, 88: 4695-4699. 10.1073/pnas.88.11.4695.PubMed CentralPubMedView ArticleGoogle Scholar
- Kulkosky J, Jones KS, Katz RA, Mack JP, Skalka AM: Residues critical for retroviral integrative recombination in a region that is highly conserved among retroviral/retrotransposon integrases and bacterial insertion sequence TPases. Mol Cell Biol. 1992, 12: 2331-2338.PubMed CentralPubMedView ArticleGoogle Scholar
- Rice P, Mizuuchi K: Structure of the bacteriophage Mu TPase core: a common structural motif for DNA transposition and retroviral integration. Cell. 1995, 82: 209-220. 10.1016/0092-8674(95)90308-9.PubMedView ArticleGoogle Scholar
- Ohta S, Tsuchida K, Choi S, Sekine Y, Shiga Y, Ohtsubo E: Presence of a characteristic D-D-E motif in IS1 TPase. J Bacteriol. 2002, 184: 6146-6154. 10.1128/JB.184.22.6146-6154.2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Ton-Hoang B, Turlan C, Chandler M: Functional domains of the IS1 TPase: analysis in vivo and in vitro. Mol Microbiol. 2004, 53: 1529-1543. 10.1111/j.1365-2958.2004.04223.x.PubMedView ArticleGoogle Scholar
- Davies DR, Goryshin IY, Reznikoff WS, Rayment I: Three-dimensional structure of the Tn5 synaptic complex transposition intermediate. Science. 2000, 289: 77-85. 10.1126/science.289.5476.77.PubMedView ArticleGoogle Scholar
- Chen JC, Krucinski J, Miercke LJ, Finer-Moore JS, Tang AH, Leavitt AD, Stroud RM: Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc Natl Acad Sci USA. 2000, 97: 8233-8238. 10.1073/pnas.150220297.PubMed CentralPubMedView ArticleGoogle Scholar
- Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR: Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science. 1994, 266: 1981-1986. 10.1126/science.7801124.PubMedView ArticleGoogle Scholar
- Goldgur Y, Dyda F, Hickman AB, Jenkins TM, Craigie R, Davies DR: Three new structures of the core domain of HIV-1 integrase: an active site that binds magnesium. Proc Natl Acad Sci USA. 1998, 95: 9150-9154. 10.1073/pnas.95.16.9150.PubMed CentralPubMedView ArticleGoogle Scholar
- Maignan S, Guilloteau JP, Zhou-Liu Q, Clement-Mella C, Mikol V: Crystal structures of the catalytic domain of HIV-1 integrase free and complexed with its metal cofactor: high level of similarity of the active site with other viral integrases. J Mol Biol. 1998, 282: 359-368. 10.1006/jmbi.1998.2002.PubMedView ArticleGoogle Scholar
- Bujacz G, Jaskolski M, Alexandratos J, Wlodawer A, Merkel G, Katz RA, Skalka AM: High-resolution structure of the catalytic domain of avian sarcoma virus integrase. J Mol Biol. 1995, 253: 333-346. 10.1006/jmbi.1995.0556.PubMedView ArticleGoogle Scholar
- Yang ZN, Mueser TC, Bushman FD, Hyde CC: Crystal structure of an active two-domain derivative of Rous sarcoma virus integrase. J Mol Biol. 2000, 296: 535-548. 10.1006/jmbi.1999.3463.PubMedView ArticleGoogle Scholar
- Katayanagi K, Miyagawa M, Matsushima M, Ishikawa M, Kanaya S, Ikehara M, Matsuzaki T, Morikawa K: Three-dimensional structure of ribonuclease H from E. coli. Nature. 1990, 347: 306-309. 10.1038/347306a0.PubMedView ArticleGoogle Scholar
- Yang W, Hendrickson WA, Crouch RJ, Satow Y: Structure of ribonuclease H phased at 2 A resolution by MAD analysis of the selenomethionyl protein. Science. 1990, 249: 1398-1405. 10.1126/science.2169648.PubMedView ArticleGoogle Scholar
- Ariyoshi M, Vassylyev DG, Iwasaki H, Nakamura H, Shinagawa H, Morikawa K: Atomic structure of the RuvC resolvase: a holliday junction-specific endonuclease from E. coli. Cell. 1994, 78: 1063-1072. 10.1016/0092-8674(94)90280-1.PubMedView ArticleGoogle Scholar
- Rice P, Craigie R, Davies DR: Retroviral integrases and their cousins. Curr Opin Struct Biol. 1996, 6: 76-83. 10.1016/S0959-440X(96)80098-4.PubMedView ArticleGoogle Scholar
- Lovell S, Goryshin IY, Reznikoff WR, Rayment I: Two-metal active site binding of a Tn5 TPase synaptic complex. Nat Struct Biol. 2002, 9: 278-281. 10.1038/nsb778.PubMedView ArticleGoogle Scholar
- Wintjens R, Rooman M: Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. J Mol Biol. 1996, 262: 294-313. 10.1006/jmbi.1996.0514.PubMedView ArticleGoogle Scholar
- Landschulz WH, Johnson PF, McKnight SL: The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science. 1988, 240: 1759-1764. 10.1126/science.3289117.PubMedView ArticleGoogle Scholar
- O'Shea EK, Klemm JD, Kim PS, Alber T: X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science. 1991, 254: 539-544. 10.1126/science.1948029.PubMedView ArticleGoogle Scholar
- Jenkins TM, Esposito D, Engelman A, Craigie R: Critical contacts between HIV-1 integrase and viral DNA identified by structure-based analysis and photo-crosslinking. Embo J. 1997, 16: 6849-6859. 10.1093/emboj/16.22.6849.PubMed CentralPubMedView ArticleGoogle Scholar
- Zimmer M: Green fluorescent protein (GFP): applications, structure, and related photophysical behavior. Chem Rev. 2002, 102: 759-781. 10.1021/cr010142r.PubMedView ArticleGoogle Scholar
- Gupta RD, Tawfik DS: Directed enzyme evolution via small and effective neutral drift libraries. Nat Methods. 2008, 5: 939-942. 10.1038/nmeth.1262.PubMedView ArticleGoogle Scholar
- Hanson DA, Ziegler SF: Fusion of green fluorescent protein to the C-terminus of granulysin alters its intracellular localization in comparison to the native molecule. J Negat Results Biomed. 2004, 3: 2-10.1186/1477-5751-3-2.PubMed CentralPubMedView ArticleGoogle Scholar
- Liu AX, Zhang SB, Xu XJ, Ren DT, Liu GQ: Soluble expression and characterization of a GFP-fused pea actin isoform (PEAc1). Cell Res. 2004, 14: 407-414. 10.1038/sj.cr.7290241.PubMedView ArticleGoogle Scholar
- Davies DR, Mahnke Braam L, Reznikoff WS, Rayment I: The three-dimensional structure of a Tn5 TPase-related protein determined to 2.9-A resolution. J Biol Chem. 1999, 274: 11904-11913. 10.1074/jbc.274.17.11904.PubMedView ArticleGoogle Scholar
- Wiegand TW, Reznikoff WS: Interaction of Tn5 TPase with the transposon termini. J Mol Biol. 1994, 235: 486-495. 10.1006/jmbi.1994.1008.PubMedView ArticleGoogle Scholar
- Hennig S, Ziebuhr W: Characterization of the TPase encoded by IS256, the prototype of a major family of bacterial insertion sequence elements. J Bacteriol. 2010, 192: 4153-4163. 10.1128/JB.00226-10.PubMed CentralPubMedView ArticleGoogle Scholar
- Derbyshire KM, Grindley ND: Binding of the IS903 TPase to its inverted repeat in vitro. Embo J. 1992, 11: 3449-3455.PubMed CentralPubMedGoogle Scholar
- Vos JC, van Luenen HG, Plasterk RH: Characterization of the Caenorhabditis elegans Tc1 TPase in vivo and in vitro. Genes Dev. 1993, 7: 1244-1253. 10.1101/gad.7.7a.1244.PubMedView ArticleGoogle Scholar
- Mack AM, Crawford NM: The Arabidopsis TAG1 TPase has an N-terminal zinc finger DNA binding domain that recognizes distinct subterminal motifs. Plant Cell. 2001, 13: 2319-2331.PubMed CentralPubMedView ArticleGoogle Scholar
- Betts MJ, Russell RB: Amino acid properties and consequences of substitutions. Bioinformatics for Geneticists. Edited by: Barnes MI, Gray IC. 2003, Chichester, UK: John Wiley and Sons Ltd., 289-316.View ArticleGoogle Scholar
- Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA. 1992, 89: 10915-10919. 10.1073/pnas.89.22.10915.PubMed CentralPubMedView ArticleGoogle Scholar
- Pabo CO, Sauer RT: Transcription factors: structural families and principles of DNA recognition. Annu Rev Biochem. 1992, 61: 1053-1095. 10.1146/annurev.bi.61.070192.005201.PubMedView ArticleGoogle Scholar
- Clubb RT, Schumacher S, Mizuuchi K, Gronenborn AM, Clore GM: Solution structure of the I gamma subdomain of the Mu end DNA-binding domain of phage Mu TPase. J Mol Biol. 1997, 273: 19-25. 10.1006/jmbi.1997.1312.PubMedView ArticleGoogle Scholar
- Schumacher S, Clubb RT, Cai M, Mizuuchi K, Clore GM, Gronenborn AM: Solution structure of the Mu end DNA-binding ibeta subdomain of phage Mu TPase: modular DNA recognition by two tethered domains. Embo J. 1997, 16: 7532-7541. 10.1093/emboj/16.24.7532.PubMed CentralPubMedView ArticleGoogle Scholar
- van Pouderoyen G, Ketting RF, Perrakis A, Plasterk RH, Sixma TK: Crystal structure of the specific DNA-binding domain of Tc3 TPase of C.elegans in complex with transposon DNA. Embo J. 1997, 16: 6044-6054. 10.1093/emboj/16.19.6044.PubMed CentralPubMedView ArticleGoogle Scholar
- Izsvak Z, Khare D, Behlke J, Heinemann U, Plasterk RH, Ivics Z: Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in Sleeping Beauty transposition. J Biol Chem. 2002, 277: 34581-34588. 10.1074/jbc.M204001200.PubMedView ArticleGoogle Scholar
- Kissinger CR, Liu BS, Martin-Blanco E, Kornberg TB, Pabo CO: Crystal structure of an engrailed homeodomain-DNA complex at 2.8 A resolution: a framework for understanding homeodomain-DNA interactions. Cell. 1990, 63: 579-590. 10.1016/0092-8674(90)90453-L.PubMedView ArticleGoogle Scholar
- Xu W, Rould MA, Jun S, Desplan C, Pabo CO: Crystal structure of a paired domain-DNA complex at 2.5 A resolution reveals structural basis for Pax developmental mutations. Cell. 1995, 80: 639-650. 10.1016/0092-8674(95)90518-9.PubMedView ArticleGoogle Scholar
- Gonzalez L, Woolfson DN, Alber T: Buried polar residues and structural specificity in the GCN4 leucine zipper. Nat Struct Biol. 1996, 3: 1011-1018. 10.1038/nsb1296-1011.PubMedView ArticleGoogle Scholar
- Graddis TJ, Myszka DG, Chaiken IM: Controlled formation of model homo- and heterodimer coiled coil polypeptides. Biochemistry. 1993, 32: 12664-12671. 10.1021/bi00210a015.PubMedView ArticleGoogle Scholar
- O'Shea EK, Lumb KJ, Kim PS: Peptide 'Velcro': design of a heterodimeric coiled coil. Curr Biol. 1993, 3: 658-667. 10.1016/0960-9822(93)90063-T.PubMedView ArticleGoogle Scholar
- Baker TA, Luo L: Identification of residues in the Mu TPase essential for catalysis. Proc Natl Acad Sci USA. 1994, 91: 6654-6658. 10.1073/pnas.91.14.6654.PubMed CentralPubMedView ArticleGoogle Scholar
- Esposito D, Craigie R: Sequence specificity of viral end DNA binding by HIV-1 integrase reveals critical regions for protein-DNA interaction. Embo J. 1998, 17: 5832-5843. 10.1093/emboj/17.19.5832.PubMed CentralPubMedView ArticleGoogle Scholar
- van Gent DC, Groeneger AA, Plasterk RH: Mutational analysis of the integrase protein of human immunodeficiency virus type 2. Proc Natl Acad Sci USA. 1992, 89: 9598-9602. 10.1073/pnas.89.20.9598.PubMed CentralPubMedView ArticleGoogle Scholar
- Calmels C, de Soultrait VR, Caumont A, Desjobert C, Faure A, Fournier M, Tarrago-Litvak L, Parissi V: Biochemical and random mutagenesis analysis of the region carrying the catalytic E152 amino acid of HIV-1 integrase. Nucleic Acids Res. 2004, 32: 1527-1538. 10.1093/nar/gkh298.PubMed CentralPubMedView ArticleGoogle Scholar
- Andrake MD, Skalka AM: Retroviral integrase, putting the pieces together. J Biol Chem. 1996, 271: 19633-19636. 10.1074/jbc.271.33.19633.PubMedView ArticleGoogle Scholar
- Valkov E, Gupta SS, Hare S, Helander A, Roversi P, McClure M, Cherepanov P: Functional and structural characterization of the integrase from the prototype foamy virus. Nucleic Acids Res. 2009, 37: 243-255. 10.1093/nar/gkn938.PubMed CentralPubMedView ArticleGoogle Scholar
- Obradovic Z, Peng K, Vucetic S, Radivojac P, Dunker AK: Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins. 2005, 61 (Suppl 7): 176-182.PubMedView ArticleGoogle Scholar
- Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN, Obradovic Z, Dunker AK: DisProt: the Database of Disordered Proteins. Nucleic Acids Res. 2007, 35: D786-793. 10.1093/nar/gkl893.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.