Fig. 4 | Mobile DNA

Fig. 4

From: RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats

RepEnTools analysis is reliable for Repeat Masker regions, excluding some Simple repeats.

A Using real data, RepEnTools-HISAT2 shows mapping efficiency on Repeat Masker (RMSK) annotated regions comparable to that of popular alternatives. Applying the MAPQ ≥ 40 criterion, we found a comparable number of “unique, high-quality” primary alignments. The real data in this analysis are the two biological replicates of HepG2 inputs; CIDOP data were not used, to avoid bias from experimental enrichment. Bars represent the average of n = 2 and whiskers the standard deviation. D: default. Data were processed as in Fig. 3D. See also Additional file 1: Fig. S4A.

B Using real data with RepEnTools-HISAT2, the fraction of insert size (IS) outliers, exceeding 2 × the maximum IS, is comparable to the best alternatives. STAR alignments always have zero (0) inserts at IS ≥ 2 × max. Data were processed as in Fig. 3F. See also Additional file 1: Fig. S4B.

C Using real experimental data (2 CIDOP + 2 Input), RepEnTools outputs enrichment scores that are highly reproducible within the same implementation (Galaxy or UNIX), as well as across platforms. RepEnTools is even more precise when the Simple repeats are not considered. Pairwise Pearson correlations (r) were calculated for independent, complete runs of RepEnTools, considering either all 15,745 REs in RMSK or only the 1,399 non-Simple repeat subfamilies. Each RepEnTools run processed all the datasets.

D Comparison of the average enrichment scores between two complete and independent runs of the Galaxy implementation demonstrates that RepEnTools reproduces well overall, while some Simple repeats are suboptimal for this type of analysis. Of the 15,745 REs in RMSK, 436 are outliers with ≥ 2.5% relative difference in average enrichment scores. This reproducibility error occurs overwhelmingly among a fraction (< 500) of the 14,346 Simple repeats and correlates with low read density/abundance. See also Additional file 1: Fig. S4C-D.
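
The reproducibility checks described for panels C and D (a pairwise Pearson correlation between two runs, and flagging repeats whose enrichment scores differ by ≥ 2.5%) can be illustrated with a minimal sketch. This is not the RepEnTools implementation. The function names, the repeat names, and the scores are hypothetical. The caption does not define "relative difference", so the symmetric form used here (absolute difference divided by the mean magnitude of the two scores) is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def relative_difference(a, b):
    """ASSUMED definition: |a - b| divided by the mean magnitude of a and b."""
    denom = (abs(a) + abs(b)) / 2
    if denom == 0:
        return 0.0  # both scores are zero, so there is no difference
    return abs(a - b) / denom


def flag_outliers(run_1, run_2, threshold=0.025):
    """Names of repeats whose scores differ by >= threshold between two runs."""
    return [
        name
        for name in run_1
        if name in run_2
        and relative_difference(run_1[name], run_2[name]) >= threshold
    ]


# Hypothetical enrichment scores from two independent runs (invented values).
run_1 = {"RE_1": 2.00, "RE_2": 1.50, "simple_1": 0.80, "simple_2": 1.20}
run_2 = {"RE_1": 2.01, "RE_2": 1.49, "simple_1": 0.90, "simple_2": 1.21}

shared = [name for name in run_1 if name in run_2]
r = pearson_r([run_1[n] for n in shared], [run_2[n] for n in shared])
outliers = flag_outliers(run_1, run_2)
print(f"Pearson r = {r:.4f}; outliers = {outliers}")
```

Restricting `shared` to a subset of names, for example only non-Simple repeat subfamilies, would mirror the two correlation settings compared in panel C.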
