Table 6 Data analysis using TIPseqHunter (Timing: variable)

From: Transposon insertion profiling by sequencing (TIPseq) for mapping LINE-1 insertions in the human genome

TIPseqHunter uses genome assembly GRCh37 (hg19) and can be run with a Docker image or by using individual programs.

TIPseqHunter was developed in Java (version 7) and R (version 3.2), was tested under the Linux operating system, and is available for download at: https://github.com/fenyolab/TIPseqHunter

The Docker image for TIPseqHunter was developed with the stable version of Docker Community Edition (CE) and may work under any operating system capable of running Docker. However, we recommend Unix-like operating systems, such as Linux and Mac OS X. The Docker image is an alternative to the conventional TIPseqHunter program mentioned above. It is available from the Docker Hub registry (https://hub.docker.com/) and can be downloaded with the Docker client command: docker pull galantelab/tipseqhunter. For further details, see:

https://github.com/galantelab/tipseq_hunter/blob/master/README.md

Test data and the masked, bowtie-built reference genome are available for download at: http://openslice.fenyolab.org/data/tipseqhunter/test_data

Docker Prerequisite:

The Docker image runs as a container and executes exactly the same TIPseqHunter program. Neither downloading dependencies nor manually configuring the software used by TIPseqHunter is required. To run this container, you only need to install Docker.

For OS X: https://docs.docker.com/mac/started/

For Linux: https://docs.docker.com/linux/started/

For Windows: https://docs.docker.com/docker-for-windows/

TIPseqHunter Prerequisites:

1. At least 10GB of memory is needed if the number of sequenced read-pairs is greater than 20M.

2. Bowtie 2 alignment software (version 2.2.3 used for testing): http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

3. Samtools software (latest version): http://samtools.sourceforge.net/

4. Trimmomatic software (version 0.32 used for testing): http://www.usadellab.org/cms/?page=trimmomatic

5. Java packages: sam-1.112.jar, commons-math3-3.4.1.jar, jfreechart-1.0.14.jar, jcommon-1.0.17.jar, itextpdf-5.2.1.jar, biojava3-core-3.0.1.jar

6. R packages: pROC, ggplot2, caret, e1071
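Before running the conventional (non-Docker) program, it can help to confirm that the command-line prerequisites are reachable. The following is a minimal sketch, not part of TIPseqHunter: the helper name missing_tools is hypothetical, and the executable names in the example call (bowtie2, samtools, java, Rscript) are assumptions about a typical installation. Trimmomatic and the Java packages are distributed as .jar files and are not covered by this check.

```shell
# Report which of the given command-line tools cannot be found on PATH.
# Prints one missing tool name per line; prints nothing if all are present.
# (Hypothetical helper; not part of TIPseqHunter.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call (executable names are assumptions about a typical install):
# missing_tools bowtie2 samtools java Rscript
```

An empty result means every listed executable was found. It does not verify the versions listed above.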

Critical: The BAM file has to be generated by Bowtie 2 alignment and has to include the "XM" tag.
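One way to check the requirement above on an existing alignment is to look for the XM:i: optional field in the alignment records. The sketch below is an illustration under stated assumptions, not part of TIPseqHunter: the function name has_xm_tag is hypothetical, and it expects SAM-format text on standard input (for example, from samtools view).

```shell
# Read SAM-format alignment records on stdin and succeed (exit status 0)
# only if at least one non-header record carries an XM:i: optional field.
# (Hypothetical helper; not part of TIPseqHunter.)
has_xm_tag() {
    awk -F '\t' '
        /^@/ { next }                       # skip SAM header lines
        {
            for (i = 12; i <= NF; i++)      # optional fields start at column 12
                if ($i ~ /^XM:i:/) { found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

A possible use, with sample.bam as a placeholder file name: samtools view sample.bam | head -n 1000 | has_xm_tag && echo "XM tag present". Bowtie 2 writes XM:i: only for aligned reads, so a sample consisting solely of unaligned reads would report failure.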

Running TIPseqHunter:

(1) for quality control, alignment, feature selection, modeling, prediction:

./TIPseqHunterPipelineJar.sh fastq_path output_path fastq_r1 key_r1 key_r2 num_rp

Critical: Detailed information is provided in the TIPseqHunterPipelineJar.sh file. Some parameters need to be pre-set.

Parameters:

fastq_path: path of the fastq files (Note: this is the path only; the file name is not included)

output_path: path of the output files (Note: this is the path only; the file name is not included)

fastq_r1: read 1 file name of paired fastq files

key_r1: keyword that identifies the read-1 fastq file (e.g., "_1" is the keyword for the file CAGATC_1.fastq)

Critical: key has to be unique in the file name

key_r2: keyword that identifies the read-2 fastq file; it is swapped with the read-1 keyword to match the read-2 file to the read-1 file (e.g., "_2" is the keyword for the file CAGATC_2.fastq)

Critical: key has to be unique in the file name
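The key rules above can be checked before launching the pipeline. The following sketch is an illustration only and makes two assumptions: that "unique" means the key occurs exactly once in the file name, and that the read-2 file name is obtained by replacing the read-1 key with the read-2 key. The function names count_key and mate_name are hypothetical and are not part of TIPseqHunter.

```shell
# Count non-overlapping literal occurrences of a key within a file name.
# (Hypothetical helper; not part of TIPseqHunter.)
count_key() {   # usage: count_key NAME KEY
    awk -v s="$1" -v k="$2" 'BEGIN {
        n = 0
        if (k == "") { print 0; exit }      # an empty key never matches
        while ((p = index(s, k)) > 0) {
            n++
            s = substr(s, p + length(k))    # continue after this match
        }
        print n
    }'
}

# Derive the read-2 file name by swapping the read-1 key for the read-2 key.
# Fails (status 1) unless the read-1 key occurs exactly once in the name.
mate_name() {   # usage: mate_name FASTQ_R1 KEY_R1 KEY_R2
    if [ "$(count_key "$1" "$2")" -ne 1 ]; then
        echo "key '$2' does not occur exactly once in '$1'" >&2
        return 1
    fi
    awk -v s="$1" -v k1="$2" -v k2="$3" 'BEGIN {
        p = index(s, k1)
        print substr(s, 1, p - 1) k2 substr(s, p + length(k1))
    }'
}
```

For the file name used in the examples above, mate_name CAGATC_1.fastq _1 _2 prints CAGATC_2.fastq. A name such as sample_1_1.fastq is rejected because "_1" occurs twice.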

num_rp: the total number of read pairs in the paired fastq files (Note: this is the number of read pairs, i.e., the number of reads in either the read-1 file or the read-2 file, not the sum of both. This number is used for normalization.)
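The value of num_rp can be obtained by counting the records in one of the two fastq files. The sketch below assumes an uncompressed fastq file with exactly four lines per record (no wrapped sequence lines); the function name count_reads is hypothetical and is not part of TIPseqHunter.

```shell
# Count the records in an uncompressed FASTQ file, assuming the common
# layout of exactly four lines per record (no wrapped sequence lines).
# (Hypothetical helper; not part of TIPseqHunter.)
count_reads() {   # usage: count_reads FILE.fastq
    awk 'END { print int(NR / 4) }' "$1"
}
```

For example, num_rp=$(count_reads CAGATC_1.fastq) counts the reads in the read-1 file only, which equals the number of read pairs when both files are properly paired.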

(2) for somatic insertions:

TIPseqHunterPipelineJarSomatic.sh repred_path control_path repred_file control_file

Critical: Detailed information is provided in the TIPseqHunterPipelineJarSomatic.sh file. Some parameters need to be pre-set.

Parameters:

repred_path: path of the "model" folder under the output folder

control_path: path of the "TRLocator" folder under the output folder

repred_file: file with the suffix ".repred", generated from P11 in repred_path (Note: the file name should end with ".repred"; e.g., 302_T_GTCCGC.wsize100.regwsize1.minreads1.clip1.clipflk5.mindis150.FP.uniqgs.bed.csinfo.lm.l1hs.pred.txt.repred)

control_file: file with suffix ".bed" in control_path (Note: file name should be ending with ".bed".) (such as 302_N_GTGAAA.fastq.cleaned.fastq.pcsort.bam.w100.minreg1.mintag1