9. CUT&Tag/CUT&Run analysis#
This page describes how to analyze CUT&Tag/CUT&Run data with Churros. Churros includes SEACR . The sample scripts are also available at Churros GitHub site.
Note
churros.sif). Please add apptainer exec churros.sif before each command below.apptainer exec churros.sif download_genomedata.sh9.1. Get data#
Here we use CUT&Tag data from Kaya-Okur et al, Nat Protoc. 2020, which are paired-end.
mkdir -p fastq
for id in SRR12246717 SRR11074240 SRR11074254 SRR11074258 SRR11923224 SRR8754611
do
$sing pfastq-dump -t 4 -s $id -O fastq/ --gzip --split-files
done
download_genomedata.sh and build-index.sh.hg38 for genome build. See Appendix for the detail of genome build.mkdir -p log
build=hg38 # genome build
Ddir=Referencedata_$build # output directory
ncore=12 # number of CPUs
# download the genome
download_genomedata.sh -s $build $Ddir
# make Bismark index
build-index.sh -p $ncore bowtie2 $Ddir
9.2. Running Churros#
Several specific parameters must be specified for CUT&Tag analysis.
First, adapter trimming is highly recommended in CUT&Tag and can be enabled with the
--fastqtrimmingoption.Second, reads that are typically marked as PCR duplicates should not be filtered out because, in CUT&Tag, they may reflect high target specificity rather than PCR amplification artifacts. The
--keepdupoption opts out of the duplicate filtering step.Third, here we use the parameter set for mapping recommended by Kaya-Okur et al, Nat Protoc. 2020.
The resulting command is as follows.
build=hg38
Ddir=Referencedata_$build
ncore=12
# Use the following parameters for Bowtie2 mapping
btparam="--end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700"
churros -p $ncore --keepdup --fastqtrimming --mapparam "$btparam" \
samplelist.txt samplepairlist.txt $build $Ddir
9.3. Running SEACR#
SEACR is a peak-calling tool designed for CUT&Tag and CUT&RUN data.
Churros has churros_SEACR command for it.
churros_SEACR [Options] <samplelist> <samplepairlist> <Churrosdir> <genometable>
<samplelist>: List of samples
<samplepairlist>: List of sample pairs
<Churrosdir>: Churros output directory
<genometable>: Path to genome table file
Options:
-p <int>: Number of threads (default: 4)
This command executes the steps below for all samples included in samplepairlist:
Sort BAM files by read names (*.qname.bam).
Generates bedGraph files to be used as input for SEACR.
Plots the fragment length distribution.
If control samples are available, such as IgG, runs SEACR in both stringent and relaxed modes.
If control samples are not available, runs SEACR in non-control mode with q < 0.01.
After running churros above, you can run churros_SEACR as follows.
build=hg38
Ddir=Referencedata_$build
gt=$Ddir/genometable.txt
Churrosdir=Churros_result/$build
ncore=8
$sing churros_SEACR -p $ncore samplelist.txt samplepairlist.txt $Churrosdir $gt
The results are output in Churros_result/hg38/SEACR/.