Program | Steps |
---|---|
blast_and_parse | • A tblastn search is done against NCBI’s ‘nr’ database with a set of representative group II intron ORF sequences as queries<br>• A list of unique, non-overlapping candidate hits is assembled, along with accession number and coordinates |
DNA_sequence_download | • The GenBank entry for each candidate DNA sequence is downloaded<br>• Candidates are separated by taxonomic classification, with bacterial and archaeal candidates proceeding to the next step by default |
create_storage | • Creates a FASTA file for each candidate’s DNA sequence<br>• Creates storable files containing information about each candidate, to be used in later programs |
filter_out_non_gpII_rts | • A blastx search of candidate sequences is done against a local database of known, categorized bacterial RT sequences; candidate RTs whose closest relatives are not group II introns are separated out |
find_intron_class | • A blastx search of candidate sequences is done against a local database of known and classified group II intron ORF sequences; based on the top matches, the ORF class is assigned, and the closest relative in the curated set is identified |
find_orf_domains | • A blastx alignment is done between a candidate sequence and a representative of the same class whose IEP has been mapped for the domains characteristic of group II introns<br>• The domains present for each IEP are tabulated, and the candidate is categorized as having complete domains or missing domains; candidate sequences with complete IEP domains continue to be analyzed |
find_orf | • A blastx alignment is done between each candidate sequence and its closest relative among curated group II introns<br>• From the alignment, it is decided whether the candidate sequence contains frame shifts, premature stops or other problems within its IEP<br>• If the ORF appears intact, then a predicted amino acid sequence is assigned |
find_intron_boundaries | • Information on possible boundary positions is acquired using class-specific HMM profiles of boundary sequences |
generate_rna_sequence | • Boundary sequence data are evaluated and the most probable intron boundaries are predicted, along with the complete sequence of the intron<br>• Candidates with ambiguous boundaries are noted |
group_candidates | • All ORF sequences assigned to a given class are aligned using ClustalW, and pair-wise distances are calculated using PROTDIST of the Phylip package<br>• Sequences differing by less than 0.061 units are assigned to a group of 95% identity<br>• For each group of 95% identity, the complete intron DNA sequence of each member is aligned using ClustalW |
select_prototypes | • For each group of 95% identity, one candidate sequence is selected as the prototype, or representative of the group |
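
The grouping rule in the group_candidates row (sequences differing by less than 0.061 distance units fall into one group of 95% identity) can be illustrated with a minimal Python sketch. The table gives only the threshold, so the single-linkage rule used here (any pair under the cutoff joins two groups) is an assumption, as are the function name `group_by_distance`, the dictionary input format, and the example sequence names and distances, which are made up for illustration and are not pipeline output.

```python
from itertools import combinations

THRESHOLD = 0.061  # distance cutoff stated in the table


def group_by_distance(names, distances, threshold=THRESHOLD):
    """Group sequences by single linkage (an assumed rule, not confirmed by the source).

    `distances` maps frozenset({a, b}) to a pairwise distance. This input
    format is hypothetical; the pipeline's own file formats are not shown here.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every pair that is closer than the threshold.
    for a, b in combinations(names, 2):
        if distances[frozenset((a, b))] < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Made-up example data: four ORFs, two close pairs.
names = ["orfA", "orfB", "orfC", "orfD"]
distances = {
    frozenset(("orfA", "orfB")): 0.020,
    frozenset(("orfA", "orfC")): 0.300,
    frozenset(("orfA", "orfD")): 0.310,
    frozenset(("orfB", "orfC")): 0.290,
    frozenset(("orfB", "orfD")): 0.305,
    frozenset(("orfC", "orfD")): 0.050,
}
groups = group_by_distance(names, distances)
print(groups)
```

With single linkage, a sequence joins a group if it is within the cutoff of any one member, so two members of the same group may themselves be farther apart than 0.061. A stricter rule, such as complete linkage, would give tighter groups; which rule the pipeline applies is not stated in the table.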