Fig. 3 | Mobile DNA

Fig. 3

From: RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats


RepEnTools produces fast, efficient and reliable mapping of HTS reads on the human chm13v2 genome assembly.

A RepEnTools-HISAT2, the alignment programme employed in RepEnTools, is faster than a range of popular alternatives for ChIP-seq data alignment to a T2T assembly. On average, it requires 14 min to align one ChIP-seq dataset (1.1–1.3 × 10⁷ paired-end sequences). RepEnTools-HISAT2 uses optimised HISAT2 settings: a defined maximum fragment length, suppression of spliced alignment, and improved randomisation of multimapping reads. D: default settings. The settings are described in detail in Material and Methods. See also Additional file 1: Fig. S3A-B. The datasets (n = 4) originate from pulldown-enriched fragments (2 datasets) and input chromatin (2 datasets).

B RepEnTools-HISAT2 has low demands on CPU resources due to the optimised software architecture of HISAT2 [27].

C RepEnTools-HISAT2 has low memory requirements due to the memory minimisation strategies of HISAT2 [27].

D RepEnTools-HISAT2 generates a comparable number of primary alignments to popular alternatives using the same datasets. Application of the MAPQ ≥ 40 criterion shows a comparable number of “unique, high-quality” primary alignments. See also Additional file 1: Fig. S3C. For consistency, primary mapped read counts were reported by flagstat (SAMTools) [36] for all aligners and divided by the total number of primary reads in each BAM file. Filtering for primary reads with MAPQ ≥ 40 was done using SAMTools.

E All alignment algorithms produced insert sizes (IS) with comparable statistics for these very large datasets (> 10⁷ datapoints). Here, input datasets (n = 2) are shown beside the CIDOP datasets (n = 2) for each algorithm. IS was extracted from the TLEN of primary alignments in SAM files and plotted using MatPlotLib. The central line is the median, the box borders are the 25th and 75th percentiles, and the whiskers extend 1.5 times the inter-quartile range beyond the box (0.35th to 99.65th percentile in a normal distribution).

F Using RepEnTools-HISAT2, the fraction of IS outliers is comparable to the best alternatives. Among IS outliers, i.e. inserts that exceed 2 × the maximum fragment size reliably observed in the specific ChIP-seq library, discrepancies of more than an order of magnitude are seen. STAR alignments always have zero (0) inserts at IS ≥ 2 × max. See also Additional file 1: Fig. S3D.

The data presented here were generated using the two biological replicates of hUHRF1-TTD CIDOP and their respective inputs. The bar diagrams represent the average of n = 4 independent datasets and the whiskers show the standard deviation. Open circles show individual datapoints. All jobs were run on usegalaxy.eu, using m3.2xlarge (30 GB / 8 vCPUs / Intel Xeon E5-2670 v2 (Ivy Bridge/Sandy Bridge)) machines, except all STAR runs, which were allocated to m5d.4xlarge (64 GB / 16 vCPUs / Intel Xeon Platinum 8175) machines. All job metrics were retrieved from the individual usegalaxy.eu dataset details.
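The quantities behind panels D to F can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used SAMTools and MatPlotLib. It parses SAM text directly, counts primary and MAPQ ≥ 40 alignments, collects insert sizes from TLEN, and flags inserts at IS ≥ 2 × max. The records in `SAM_TEXT`, the function names, the choice to count each pair once via its positive TLEN, and the `max_fragment=1000` value are all assumptions made for illustration. A second helper checks the whisker percentiles quoted for a normal distribution.

```python
from statistics import NormalDist

# Hypothetical SAM records, not data from the study.
# Columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
SAM_TEXT = """\
@HD\tVN:1.6\tSO:coordinate
r1\t99\tchr1\t100\t60\t50M\t=\t250\t200\t*\t*
r1\t147\tchr1\t250\t60\t50M\t=\t100\t-200\t*\t*
r2\t99\tchr1\t500\t1\t50M\t=\t700\t250\t*\t*
r2\t147\tchr1\t700\t1\t50M\t=\t500\t-250\t*\t*
r3\t99\tchr1\t900\t60\t50M\t=\t3000\t2150\t*\t*
r3\t147\tchr1\t3000\t60\t50M\t=\t900\t-2150\t*\t*
r3\t355\tchr2\t40\t0\t50M\t=\t240\t250\t*\t*
r4\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*
"""

SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
UNMAPPED = 0x4         # SAM flag bit: read unmapped


def summarise_sam(sam_text, mapq_min=40, max_fragment=1000):
    """Count primary records and collect insert sizes from SAM text.

    max_fragment is a hypothetical library-specific maximum fragment size.
    """
    primary = mapped = high_quality = 0
    insert_sizes = []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        flag, mapq, tlen = int(fields[1]), int(fields[4]), int(fields[8])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep primary records only
        primary += 1
        if flag & UNMAPPED:
            continue
        mapped += 1
        if mapq >= mapq_min:
            high_quality += 1
        if tlen > 0:
            # Assumption: count each pair once, via the mate with positive TLEN.
            insert_sizes.append(tlen)
    outliers = sum(1 for size in insert_sizes if size >= 2 * max_fragment)
    return {
        "primary": primary,
        "mapped": mapped,
        "high_quality": high_quality,
        "insert_sizes": insert_sizes,
        "outliers": outliers,
    }


def whisker_percentiles(k=1.5):
    """Percentiles reached by box plot whiskers of k x IQR in a normal distribution."""
    nd = NormalDist()
    q3 = nd.inv_cdf(0.75)       # upper quartile of the standard normal
    upper = q3 + k * (2 * q3)   # the IQR is 2 * q3 by symmetry
    return 100 * nd.cdf(-upper), 100 * nd.cdf(upper)


if __name__ == "__main__":
    stats = summarise_sam(SAM_TEXT)
    print("fraction mapped:", stats["mapped"] / stats["primary"])
    print("fraction MAPQ >= 40:", stats["high_quality"] / stats["primary"])
    print("outlier fraction:", stats["outliers"] / len(stats["insert_sizes"]))
    print("whisker percentiles:", whisker_percentiles())
```

For real BAM files, a dedicated tool or library is the practical choice. The text parser here only makes the flag and TLEN logic explicit.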
