6. Peakheatmap: Large-scale profile analysis#

Churros provides commands for clustering and visualizing large-scale peak profiles. These functions take a reference region (BED format) and cluster regions based on the overlap pattern of peaks or bigWig profiles. The sample scripts are also available at Churros GitHub site.

6.1. peakheatmap_binary: Binary comparison#

peakheatmap_binary outputs a binary matrix (output 1) representing the peak overlap for given genomic regions. The binary matrix is then formatted and sorted by the user-defined column (i.e., the filename of the selected marker) to generate the processed matrix (output 2) and plot the sorted heatmap (output 3). It then utilizes PCA followed by k-means clustering (or other clustering methods) to produce the clustered matrix (output 4) and the clustered heatmap (output 5).

The main usages are:

peakheatmap_binary <region> <directory>

The required parameters:

  • region: a BED file for regions of interest. Only the first 3 columns are used.

  • directory: a directory containing the peak files.

The optional parameters:

  • -k kcluster: number of clusters for clustered matrix and clustered heatmap. The default value is 3.

  • -s sortname: the filename of the selected marker in the directory above. This is used for the processed matrix and sorted heatmap.

  • -n normalize type: Normalization methods for continuous data, could be zscore or scale0to1. Default: zscore.

  • -m clustering method: minikmeans, kmeans, spectral, meanshift, dbscan, affinity

  • -l samplelabel: A .tsv table used to assign groups for each marker in the directory above. For example, it could look like this.

GATA3_ENCSR000EWV_rep1_peaks.narrowPeak protein
GATA3_ENCSR000EWV_rep2_peaks.narrowPeak protein
H3K27ac_ENCSR000EWR_rep1_peaks.narrowPeak       histone
H3K27ac_ENCSR000EWR_rep2_peaks.narrowPeak       histone
H3K27ac_ENCSR752UOD_rep2_peaks.narrowPeak       histone
H3K27ac_ENCSR752UOD_rep3_peaks.narrowPeak       histone

6.1.1. Example usage#

peakheatmap_binary -l samplelabel.tsv reference.bed ./peakdir/

This command takes as input a file representing regions of interest (reference.bed) and a peak directory (./peakdir/). We also assigned labels to the files in the ./peakdir/ directory.

Five output files are generated:

Output1_raw_matrix.tsv
Output2_sorted_matrix.tsv
Output3_sorted_heatmap.png
Output4_kmeans_matrix.tsv
Output5_kmeans_heatmap.png
Alternate

Fig. 6.1 Output5_kmeans_heatmap.png#

6.2. peakheatmap_quantitative: Quantitative comparison#

peakheatmap_quantitative calculates the read density of each ChIP samples at given genomic regions. After log transformation, z-score normalization (optional method is 0-to-1 scaling), and sorting, it generates the remaining outputs in the same manner as in binary mode.

peakheatmap_quantitative [Options] <genometable> <region> <bigwigs>
   <genometable>: genome table file (e.g., genometable.txt)
   <region>: BED file of regions to analyze
   <bigwigs>: bigWig files for comparison (should be quoted)
   Options:
      -k <int>: number of clusters (default: 3)
      -b <int>: bin size (default: 1000)
      -s <str>: sort name (default: "defaultUseFirstColumn")
      -n <str>: normalization type (default: "zscore")
      -l <str>: sample label TSV file
      -m <str>: clustering method (default: "minikmeans")
      -o <str>: output directory (default: "output_quantitative/")

6.2.1. Example usage#

bwdir="Churros_result/hg38/bigWig/TotalReadNormalized/""
bws=`ls $bwdir/*.100.bw`

peakheatmap_quantitative -b 1000 genometable.txt reference.bed "$bws"