Korea Bioinformation Center (KOBIC)

Analysis Pipelines

Whole-genome sequencing pipeline

Category: Bioinformatics > Whole-Genome-Sequencing
Last modified: 2025-05-20 11:37:40

#Whole Genome Sequencing #WGS #Genomics #Next Generation Sequencing #Precision Medicine #Clinical Genomics #noncoding genome #GATK #fastp #Cutadapt #BWA #SortSam #MarkDuplicates #CountBase #BaseRecalibrator #ApplyBQSR #HaplotypeCaller #somalier

The whole-genome sequencing (WGS) pipeline is a modular toolkit for processing WGS data. It takes a FASTQ file as input and produces haplotype-based variant calls, along with annotations and visualizations, following the GATK pipeline. First, raw reads in FASTQ format, with well-calibrated base error estimates, are mapped to the reference genome. The BWA mapping tool aligns the reads to the human reference genome, allowing up to two mismatches in 30-base seeds, and writes the alignments in the technology-independent SAM/BAM format. Next, duplicate fragments are marked and removed using Picard (http://picard.sourceforge.net), mapping quality is assessed and low-quality mapped reads are filtered out, and paired-read information is checked to ensure that mate-pair information is in sync between the reads of each pair. The initial alignments are then refined by local realignment, which also identifies suspicious regions. Using this information as a covariate, together with other technical covariates and known sites of variation, GATK base quality score recalibration (BQSR) is performed. Germline SNPs and indels are then called via local reassembly of haplotypes from the recalibrated and realigned BAM files. Finally, the pipeline provides Somalier, a tool for quickly assessing sample relatedness from sequencing data in BAM, CRAM, or VCF format.
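The order of the stages described above can be sketched as a small Python helper that only builds the command lines, one per stage. This is a minimal illustration under stated assumptions, not the pipeline's actual configuration: the function name `build_wgs_commands`, all file names, the read-group string, the thread count, and the use of `bwa mem` with paired-end FASTQ files are hypothetical choices, and the options shown are common ones for BWA, GATK4, and Somalier rather than the settings used on this platform. The mapping-quality filtering and local realignment steps are left out because the description does not name the tools used for them.

```python
import shlex
from pathlib import Path


def build_wgs_commands(sample, fastq1, fastq2, reference, known_sites,
                       somalier_sites, outdir="out", threads=4):
    """Return one argv list per stage, in execution order.

    Every path and parameter here is an illustrative assumption.
    """
    out = Path(outdir)
    raw_sam = str(out / f"{sample}.sam")
    sorted_bam = str(out / f"{sample}.sorted.bam")
    dedup_bam = str(out / f"{sample}.dedup.bam")
    metrics = str(out / f"{sample}.dup_metrics.txt")
    recal_table = str(out / f"{sample}.recal.table")
    recal_bam = str(out / f"{sample}.recal.bam")
    vcf = str(out / f"{sample}.vcf.gz")
    # bwa itself expands the literal "\t" sequences in the read-group string.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"

    return [
        # 1. Map reads to the reference (-o needs a reasonably recent bwa).
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", raw_sam, reference, fastq1, fastq2],
        # 2. Coordinate-sort the alignments and write BAM.
        ["gatk", "SortSam", "-I", raw_sam, "-O", sorted_bam,
         "--SORT_ORDER", "coordinate"],
        # 3. Mark duplicates; removing them is an assumption based on the text.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", metrics, "--REMOVE_DUPLICATES", "true"],
        # 4. Build the recalibration model from known variant sites.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        # 5. Apply the recalibration to the base qualities.
        ["gatk", "ApplyBQSR", "-R", reference, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # 6. Call germline SNPs and indels by local haplotype reassembly.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", recal_bam,
         "-O", vcf],
        # 7. Extract genotype sketches for Somalier relatedness checks.
        ["somalier", "extract", "-d", str(out / "somalier"),
         "--sites", somalier_sites, "-f", reference, recal_bam],
    ]


if __name__ == "__main__":
    commands = build_wgs_commands(
        "S1", "S1_R1.fastq.gz", "S1_R2.fastq.gz",
        "ref.fa", "known_sites.vcf.gz", "somalier_sites.vcf.gz")
    for argv in commands:
        print(shlex.join(argv))
```

Running the script only prints the commands. To execute them, each argv list could be passed to `subprocess.run(argv, check=True)` on a machine where the tools and an indexed reference are available. Keeping the stages as data makes it easy to check that each stage consumes the previous stage's output before anything is run.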

Version: 1.0
Last updated: 10 days ago