NBDC Research ID: hum0160.v2

 

SUMMARY

Aims: To investigate genomic alterations of Japanese esophageal squamous cell carcinomas

Methods: DNAs were extracted from esophageal squamous cell carcinoma tissues and paired non-cancer (normal) tissues. NGS libraries for whole-genome sequencing (WGS) were prepared using the TruSeq Nano DNA Low Throughput Library Prep Kit. Sequencing was performed on Illumina HiSeq 2000/2500/X Five.

Participants/Materials: DNAs extracted from cancer tissues and non-cancer tissues of Japanese esophageal squamous cell carcinoma patients.

 

Dataset ID | Type of Data | Criteria | Release Date
JGAS000155 | NGS (WGS) | Controlled-access (Type I) | 2019/05/28
JGAS000155 (Data addition) | bam/gvcf data of NGS (WGS) | Controlled-access (Type I) | 2021/07/13


*Data users need to submit an application for Using NBDC Human Data to access the Controlled-access Data.

 

MOLECULAR DATA

JGAS000155

Participants/Materials

esophageal squamous cell carcinoma (ICD10: C15): 20 cases

          cancer tissues: 20 samples

          paired non-cancer tissues: 20 samples

Targets: WGS
Target Loci for Capture Methods: -
Platform: Illumina [HiSeq 2000/2500/X Five]
Library Source: DNAs extracted from cancer tissues and paired non-cancer tissues from esophageal squamous cell carcinoma patients
Cell Lines: -
Library Construction (kit name): TruSeq Nano DNA Low Throughput Library Prep Kit
Fragmentation Methods: Ultrasonic fragmentation (Covaris)
Spot Type: Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers): 100-150 bp
QC

Data with low base quality and high GC content (%GC) were removed.
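The base-quality and GC criteria above can be illustrated with a minimal Python sketch. The dataset description gives no cut-offs. It also does not say whether filtering was applied per read, per lane or per sample. The thresholds `MIN_MEAN_QUAL` and `MAX_GC_FRACTION` and the per-read granularity below are therefore hypothetical, and Phred+33 quality encoding is assumed.

```python
# Hypothetical thresholds: the dataset description does not state the cut-offs.
MIN_MEAN_QUAL = 20.0   # minimum mean Phred quality per read (assumed)
MAX_GC_FRACTION = 0.6  # maximum GC fraction per read (assumed)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (N and other codes ignored)."""
    acgt = [b for b in seq.upper() if b in "ACGT"]
    if not acgt:
        return 0.0
    return sum(b in "GC" for b in acgt) / len(acgt)


def passes_read_qc(seq: str, quality: str) -> bool:
    """Keep a read only if its quality is high enough and its GC is not too high."""
    return mean_phred(quality) >= MIN_MEAN_QUAL and gc_fraction(seq) <= MAX_GC_FRACTION


def parse_fastq(lines):
    """Yield (name, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


# Usage: r2 has very low quality ('#' = Phred 2) and 100% GC, so it is dropped.
records = ["@r1", "ACGTACGTAC", "+", "IIIIIIIIII",
           "@r2", "GGGGGGGGCC", "+", "##########"]
kept = [name for name, s, q in parse_fastq(records) if passes_read_qc(s, q)]
print(kept)  # ['r1']
```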

Alignment:

Data meeting any of the following conditions were removed:

- Low mapping rate

- Different insert size

- Gender information mismatch between meta-data and genotype data

- Suspected sex chromosome aberration
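One common way to compare reported sex with genotype data is the X-chromosome inbreeding coefficient F. The sketch below is hypothetical and is not the method used for this dataset. F is computed with a method-of-moments estimator like the one PLINK's `--check-sex` reports. By PLINK's default thresholds, F below 0.2 is called female and F above 0.8 is called male. Treating an intermediate F as a suspected sex chromosome aberration is a simplification assumed here; a real check would also use Y-chromosome coverage. Function names are illustrative.

```python
from typing import Optional, Sequence


def x_inbreeding_f(genotypes: Sequence[int], alt_freqs: Sequence[float]) -> float:
    """Method-of-moments F over X-chromosome (non-PAR) sites for one sample.

    genotypes: alternate-allele counts (0, 1 or 2) at each site.
    alt_freqs: population alternate-allele frequency at the same sites.
    F = (observed homozygotes - expected) / (sites - expected).
    """
    n_sites = len(genotypes)
    observed_hom = sum(1 for g in genotypes if g != 1)
    expected_hom = sum(1.0 - 2.0 * p * (1.0 - p) for p in alt_freqs)
    denominator = n_sites - expected_hom
    if denominator == 0:
        raise ValueError("F is undefined when every site is monomorphic")
    return (observed_hom - expected_hom) / denominator


def infer_sex(f: float, female_max: float = 0.2, male_min: float = 0.8) -> Optional[str]:
    """PLINK-style calls: low F suggests two X copies, high F suggests one."""
    if f < female_max:
        return "female"
    if f > male_min:
        return "male"
    return None  # ambiguous


def sex_qc(reported: str, genotypes: Sequence[int], alt_freqs: Sequence[float]) -> str:
    inferred = infer_sex(x_inbreeding_f(genotypes, alt_freqs))
    if inferred is None:
        # Simplification assumed here: ambiguous F is treated as a possible aberration.
        return "suspected sex chromosome aberration"
    if inferred != reported:
        return "gender mismatch"
    return "pass"


freqs = [0.5] * 10
print(sex_qc("male", [0, 2] * 5, freqs))           # pass (F = 1.0)
print(sex_qc("male", [1] * 5 + [0] * 5, freqs))    # gender mismatch (F = 0.0)
print(sex_qc("female", [1] * 3 + [0] * 7, freqs))  # suspected sex chromosome aberration (F = 0.4)
```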

Genotyping:

Following GATK Best Practices, variants were filtered by Variant Quality Score Recalibration (VQSR) and then by the following criteria:

- DP/GQ (DP < 5, GQ < 20, DP > 60, GQ < 95)

- Heterozygosity (F>=0.05)

- Hardy-Weinberg equilibrium (p < 10^-6)

- Repeat and low-complexity regions
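The genotype- and variant-level filters listed above can be sketched in Python as follows. This is a hedged illustration, not the GEM Japan pipeline. Several choices are assumptions:

- How the four DP and GQ thresholds combine is not stated. The pairing used here (DP < 5 or GQ < 20; DP > 60 together with GQ < 95) is a guess.
- F is computed per variant as 1 minus observed over expected heterozygosity. A variant is flagged when F >= 0.05, reading the listed threshold literally.
- The Hardy-Weinberg test uses a 1-degree-of-freedom chi-square approximation. This is a simpler stand-in for the exact test many pipelines use.

The repeat and low-complexity filter is interval-based and is not shown here.

```python
import math


def mask_genotype(dp: int, gq: int) -> bool:
    """True if a genotype call should be set to missing.

    Thresholds are the listed ones; how the pairs combine is an assumption.
    """
    low_support = dp < 5 or gq < 20
    high_depth_low_confidence = dp > 60 and gq < 95
    return low_support or high_depth_low_confidence


def _alt_freq(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    n = n_hom_ref + n_het + n_hom_alt
    return (n_het + 2 * n_hom_alt) / (2 * n)


def het_f(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Per-variant F = 1 - observed / expected heterozygosity."""
    n = n_hom_ref + n_het + n_hom_alt
    p = _alt_freq(n_hom_ref, n_het, n_hom_alt)
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        return 0.0  # monomorphic site: F undefined, treated as 0 here
    return 1 - (n_het / n) / expected_het


def hwe_chisq_p(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """HWE p-value from a 1-df chi-square test (approximation to the exact test)."""
    n = n_hom_ref + n_het + n_hom_alt
    p = _alt_freq(n_hom_ref, n_het, n_hom_alt)
    q = 1 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def fails_variant_filters(n_hom_ref: int, n_het: int, n_hom_alt: int) -> bool:
    return (het_f(n_hom_ref, n_het, n_hom_alt) >= 0.05
            or hwe_chisq_p(n_hom_ref, n_het, n_hom_alt) < 1e-6)


print(mask_genotype(4, 99), mask_genotype(30, 50), mask_genotype(70, 90))  # True False True
print(fails_variant_filters(25, 50, 25))  # False: F = 0, HWE p = 1
print(fails_variant_filters(50, 0, 50))   # True: no heterozygotes at p = 0.5
```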

Principal Component Analysis (PCA):

PCA was performed together with individuals from the 1000 Genomes Project, and outliers from the Japanese cluster were removed.
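The outlier-removal step can be sketched as below. The sketch assumes principal-component coordinates have already been computed, for example with a PCA tool run on the study samples together with 1000 Genomes individuals. Two choices are hypothetical: the 1000 Genomes JPT (Japanese in Tokyo) samples as the reference cluster, and the 6-standard-deviation cut-off. The dataset description does not state the outlier criterion. Sample IDs are illustrative.

```python
from statistics import mean, stdev


def cluster_outliers(pcs, reference_ids, n_sd=6.0):
    """Return sample IDs lying more than n_sd standard deviations from the
    reference-cluster mean on any principal component.

    pcs: dict mapping sample ID -> tuple of PC coordinates (PC1, PC2, ...).
    reference_ids: samples defining the Japanese cluster (assumed: 1000 Genomes JPT).
    """
    n_pcs = len(next(iter(pcs.values())))
    centres, spreads = [], []
    for k in range(n_pcs):
        values = [pcs[s][k] for s in reference_ids]
        centres.append(mean(values))
        spreads.append(stdev(values))  # needs at least two reference samples
    return sorted(
        sample
        for sample, coords in pcs.items()
        if any(abs(c - m) > n_sd * sd for c, m, sd in zip(coords, centres, spreads))
    )


pcs = {
    "JPT_a": (0.0, 0.0), "JPT_b": (0.1, -0.1), "JPT_c": (-0.1, 0.1),
    "case01": (0.05, 0.0), "case02": (5.0, 0.0),
}
print(cluster_outliers(pcs, ["JPT_a", "JPT_b", "JPT_c"]))  # ['case02']
```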

 

After these filtering steps, variants located within the high-confidence regions (HighConfidenceRegion) defined by the Genome in a Bottle (GIAB) project were flagged.
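Flagging variants inside the GIAB high-confidence regions amounts to an interval lookup against a BED file. The sketch below assumes the BED file has non-overlapping (merged) intervals. It also assumes chromosome names match the hs37d5 reference (`1`, `2`, ..., without a `chr` prefix). The function names are hypothetical. BED coordinates are 0-based and half-open, while VCF `POS` is 1-based, so the position is shifted by one before the binary search.

```python
import bisect


def load_bed(lines):
    """Index BED intervals (0-based, half-open) per chromosome.

    Assumes intervals do not overlap, as in a merged high-confidence BED file.
    """
    by_chrom = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def in_regions(index, chrom, pos):
    """True if 1-based VCF position `pos` falls inside an indexed interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    i = bisect.bisect_right(starts, zero_based) - 1  # last interval starting <= pos
    return i >= 0 and zero_based < ends[i]


# Hypothetical intervals; "1\t100\t200" covers VCF positions 101-200.
high_conf = load_bed(["1\t100\t200", "1\t500\t600", "2\t0\t50"])
print([in_regions(high_conf, "1", p) for p in (100, 101, 200, 201)])
# [False, True, True, False]
```

Given a BED file of those regions, the same lookup could also serve the repeat and low-complexity filter listed under Genotyping.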

Deduplication: Picard 2.10.6
Local Re-alignment and Base Quality Score Recalibration: GATK 3.7
Mapping Methods: BWA-MEM 0.7.12
Mapping Quality: Reads with MAPQ < 20 were excluded during variant calling with GATK 3.7 HaplotypeCaller
Reference Genome Sequence: GRCh37/hg19 (hs37d5)
Coverage (Depth): HiSeq 2000/2500/X Five: 31.8x
Detecting Methods for Variation: GATK 3.7 HaplotypeCaller
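The MAPQ < 20 exclusion and the coverage figure above can be illustrated together with a crude sketch over SAM-format text. This is not how GATK 3.7 or the data provider computed the 31.8x coverage. The sketch skips unmapped, secondary and duplicate-flagged reads, plus reads below the MAPQ cut-off. It then divides the retained read bases by a genome length, ignoring soft clipping and overlapping mates. The function name and the tiny genome length in the usage example are hypothetical.

```python
SKIP_FLAGS = 0x4 | 0x100 | 0x400  # unmapped, secondary, PCR/optical duplicate


def mean_depth(sam_lines, genome_length: int, min_mapq: int = 20) -> float:
    """Crude mean depth from SAM text: retained read bases / genome length.

    Reads with MAPQ < min_mapq are excluded, mirroring the listed MAPQ < 20
    rule; soft clipping and overlapping mates are ignored, so this overestimates.
    """
    total_bases = 0
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, mapq, seq = int(fields[1]), int(fields[4]), fields[9]
        if flag & SKIP_FLAGS or mapq < min_mapq:
            continue
        total_bases += len(seq)
    return total_bases / genome_length


sam = [
    "@HD\tVN:1.6",
    "r1\t0\t1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",    # kept
    "r2\t0\t1\t200\t10\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",    # MAPQ 10 < 20
    "r3\t1024\t1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",  # duplicate
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",         # unmapped
]
print(mean_depth(sam, genome_length=100))  # 0.1
```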
SNV Numbers (after QC)

76,768,387 (Autosomal Chromosomes)

2,898,518 (X Chromosome)

INDEL Numbers (after QC)

10,202,908 (Autosomal Chromosomes)

410,435 (X Chromosome)

Japanese Genotype-phenotype Archive Dataset ID

JGAD000233 (fastq)

JGAD000405 (bam, vcf): Whole-genome sequencing data included in JGAD000117 were mapped to the GRCh37 reference genome sequence, and variants were detected following the GATK (Genome Analysis Toolkit) standards. This project is an initiative of the GEnome Medical alliance Japan (GEM Japan, GEM-J).

Total Data Volume: 3 TB (fastq) + 1.8 TB (bam, vcf)
Comments (Policies): NBDC policy

 

DATA PROVIDER

Principal Investigator: Hidewaki Nakagawa

Affiliation: RIKEN Center for Integrative Medical Sciences

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number

 

PUBLICATIONS

Title | DOI | Dataset ID
1
2

 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Research Title | Data in Use (Dataset ID) | Period of Data Use
Kengo Kinoshita | Tohoku Medical Megabank Organization | Construction of Japanese whole genome database | JGAD000233 | 2019/06/24-2022/03/31
Kouya Shiraishi | Division of Genome Biology, National Cancer Research Institute | Elucidation of immune-system networks between host and tumor based on genomic analysis | JGAD000233 | 2019/08/05-2023/03/31