NBDC Research ID: hum0184.v2

 

SUMMARY

Aims: Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system that fosters reconstruction after the Great East Japan Earthquake. These organizations are developing a biobank that integrates medical and genome information to support health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence the genomes of 4,000 individuals to construct a Japanese whole-genome reference panel.

Methods: Whole genome sequencing

Participants/Materials: 4,566 Japanese general residents

URL: https://jmorp.megabank.tohoku.ac.jp/

 

Dataset ID | Type of Data | Criteria | Release Date
JGAS000239 | NGS (WGS) | Controlled-access (Type II) | 2020/09/01
JGAS000239 (Dataset addition) | BAM/gVCF data of NGS (WGS) | Controlled-access (Type I) | 2022/02/18


*Data users must submit an Application for Using NBDC Human Data to access the controlled-access data.

 

MOLECULAR DATA

JGAS000239

Participants/Materials: 4,566 Japanese general residents
Targets WGS
Target Loci for Capture Methods -
Platform Illumina [HiSeq 2500, NovaSeq 6000]
Library Source DNA extracted from peripheral blood cells
Cell Lines -
Library Construction (kit name) TruSeq DNA PCR-Free Library Prep Kit
Fragmentation Methods Ultrasonic fragmentation (Covaris LE220)
Spot Type Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)

HiSeq 2500: 162 bp / 259 bp

NovaSeq 6000: 150 bp

Japanese Genotype-phenotype Archive Dataset ID

JGAD000338

JGAD000339

Total Data Volume

JGAD000338: 260 TB (fastq)

JGAD000339: 230 TB (bam, gvcf, vcf [ref: GRCh37/hg19 (hs37d5)])

Comments (Policies)

NBDC policy & hum0184 policy

Contact Information of ToMMo Supercomputer:

 

JGAS000239 (Dataset addition)

Participants/Materials: 4,566 Japanese general residents
Targets WGS
Source fastq files of JGAD000338
QC

Data with poor base quality and high GC content (%GC) were removed.
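The base-quality and GC-content screen above can be sketched as a per-sample summary over FASTQ records. This is an illustration only: it assumes Phred+33 quality encoding, and the names summarize_fastq and keep_sample and the cut-offs MIN_MEAN_QUALITY and MAX_GC_PERCENT are hypothetical, because the thresholds actually applied are not stated in this description.

```python
from typing import Iterable, Tuple

PHRED_OFFSET = 33        # Phred+33 (Sanger / Illumina 1.8+) quality encoding
MIN_MEAN_QUALITY = 30.0  # hypothetical cut-off, not taken from the source
MAX_GC_PERCENT = 45.0    # hypothetical cut-off, not taken from the source


def summarize_fastq(lines: Iterable[str]) -> Tuple[float, float]:
    """Return (mean base quality, %GC) over all 4-line records in a FASTQ stream."""
    total_q = n_qual = n_gc = n_acgt = 0
    it = iter(lines)
    # Group the stream into (header, sequence, '+', quality) records.
    for _header, seq, _plus, qual in zip(it, it, it, it):
        seq, qual = seq.strip().upper(), qual.strip()
        total_q += sum(ord(c) - PHRED_OFFSET for c in qual)
        n_qual += len(qual)
        n_gc += seq.count("G") + seq.count("C")
        n_acgt += sum(seq.count(b) for b in "ACGT")  # N bases are ignored for %GC
    mean_q = total_q / n_qual if n_qual else 0.0
    gc_percent = 100.0 * n_gc / n_acgt if n_acgt else 0.0
    return mean_q, gc_percent


def keep_sample(lines: Iterable[str]) -> bool:
    """Keep a sample only if its mean quality is high enough and its %GC is not too high."""
    mean_q, gc_percent = summarize_fastq(lines)
    return mean_q >= MIN_MEAN_QUALITY and gc_percent <= MAX_GC_PERCENT
```

Summarizing at the sample level rather than per read reflects the wording "data ... were removed"; a per-read or per-lane variant would use the same two measures.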

Alignment:

Data meeting any of the following conditions were removed:

- Low mapping rate

- Outlying insert size

- Mismatch between the sex recorded in the metadata and the sex inferred from genotype data

- Suspected sex chromosome aberration
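The sex-consistency criteria above are often checked with an X-chromosome inbreeding coefficient. The sketch below assumes that approach, using the method-of-moments F and the 0.2/0.8 thresholds that PLINK's --check-sex applies by default; it is not necessarily the method used for this dataset. The function names are hypothetical, and genotypes are assumed to be diploid calls at non-pseudoautosomal X sites.

```python
from typing import Optional, Sequence

FEMALE_MAX_F = 0.2  # assumed thresholds, borrowed from PLINK --check-sex defaults
MALE_MIN_F = 0.8


def x_inbreeding_f(genotypes: Sequence[Optional[int]], alt_freqs: Sequence[float]) -> float:
    """Method-of-moments F on non-PAR chrX sites: (O_hom - E_hom) / (N - E_hom).

    genotypes: alternate-allele counts (0, 1, 2) per site, or None if missing.
    alt_freqs: population alternate-allele frequency per site.
    """
    obs_hom = exp_hom = 0.0
    n = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing calls
        n += 1
        if g != 1:
            obs_hom += 1
        exp_hom += 1.0 - 2.0 * p * (1.0 - p)  # expected homozygosity under HWE
    if n == exp_hom:
        raise ValueError("no informative sites")
    return (obs_hom - exp_hom) / (n - exp_hom)


def sex_check(reported: str, genotypes, alt_freqs) -> str:
    """Return 'ok', 'mismatch' or 'ambiguous' for one sample."""
    f = x_inbreeding_f(genotypes, alt_freqs)
    if f < FEMALE_MAX_F:
        inferred = "female"
    elif f > MALE_MIN_F:
        inferred = "male"
    else:
        # Intermediate F: could warrant review for a sex chromosome aberration.
        return "ambiguous"
    return "ok" if inferred == reported else "mismatch"
```

Under this sketch a "mismatch" corresponds to the metadata/genotype inconsistency criterion, while "ambiguous" samples would be candidates for the suspected-aberration criterion.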

Genotyping:

Following GATK Best Practices, Variant Quality Score Recalibration (VQSR) was applied, followed by a variant filtering step using the criteria below:

- DP/GQ (DP < 5, GQ < 20, DP > 60, GQ < 95)

- Heterozygosity (F >= 0.05)

- Hardy-Weinberg equilibrium (p < 10^-6)

- Repeat and low-complexity regions
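For the Hardy-Weinberg criterion, a minimal sketch of the exact test of Wigginton, Cutler and Abecasis (2005) together with the p < 10^-6 exclusion rule is shown below. This assumes an exact test; the description does not say which HWE test, software or sample set was used, and the function names are hypothetical.

```python
def hwe_exact_p(n_het: int, n_hom1: int, n_hom2: int) -> float:
    """Hardy-Weinberg exact test p-value for one biallelic variant
    (recurrence of Wigginton, Cutler & Abecasis, 2005)."""
    n = n_het + n_hom1 + n_hom2              # genotyped individuals
    if n == 0:
        return 1.0
    hom_rare = min(n_hom1, n_hom2)
    n_rare = 2 * hom_rare + n_het            # copies of the rarer allele
    probs = [0.0] * (n_rare + 1)             # indexed by heterozygote count

    # Start near the most likely heterozygote count, with the same parity as n_rare.
    mid = n_rare * (2 * n - n_rare) // (2 * n)
    if (n_rare - mid) % 2:
        mid += 1
    probs[mid] = 1.0
    total = 1.0

    hets, homr = mid, (n_rare - mid) // 2
    homc = n - hets - homr
    while hets >= 2:                         # walk down: fewer heterozygotes
        probs[hets - 2] = probs[hets] * hets * (hets - 1) / (4.0 * (homr + 1) * (homc + 1))
        total += probs[hets - 2]
        hets, homr, homc = hets - 2, homr + 1, homc + 1

    hets, homr = mid, (n_rare - mid) // 2
    homc = n - hets - homr
    while hets <= n_rare - 2:                # walk up: more heterozygotes
        probs[hets + 2] = probs[hets] * 4.0 * homr * homc / ((hets + 2) * (hets + 1))
        total += probs[hets + 2]
        hets, homr, homc = hets + 2, homr - 1, homc - 1

    # Sum the (relative) probabilities of all configurations no more likely than observed.
    p = sum(pr for pr in probs if pr <= probs[n_het]) / total
    return min(1.0, p)


def fails_hwe(n_het: int, n_hom1: int, n_hom2: int, alpha: float = 1e-6) -> bool:
    """Exclusion rule from the list above: HWE p < 10^-6."""
    return hwe_exact_p(n_het, n_hom1, n_hom2) < alpha
```

For example, 50 heterozygotes with 25 of each homozygote is exactly at equilibrium and passes, whereas 0 heterozygotes with 50 of each homozygote fails.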

Principal Component Analysis (PCA):

PCA was performed together with individuals from the 1000 Genomes Project, and outliers from the Japanese cluster were removed.
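The PCA step can be sketched as follows, assuming a combined 0/1/2 dosage matrix of study samples and 1000 Genomes samples. The per-variant standardization, the median/MAD-based outlier rule, the cut-off n_sd = 6 and the function names are illustrative assumptions, not the documented ToMMo procedure.

```python
import numpy as np


def pca_scores(dosages: np.ndarray, n_components: int = 2) -> np.ndarray:
    """PC scores for an (n_samples, n_variants) matrix of 0/1/2 alt-allele dosages."""
    g = np.asarray(dosages, dtype=float)
    mean = g.mean(axis=0)
    p = mean / 2.0                              # alt-allele frequency per variant
    scale = np.sqrt(2.0 * p * (1.0 - p))        # binomial SD under HWE
    keep = scale > 0                            # drop monomorphic variants
    z = (g[:, keep] - mean[keep]) / scale[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def cluster_outliers(scores: np.ndarray, is_study: np.ndarray, n_sd: float = 6.0) -> np.ndarray:
    """Flag study samples more than n_sd robust SDs from the study-sample median on any PC."""
    ref = scores[is_study]
    med = np.median(ref, axis=0)
    robust_sd = 1.4826 * np.median(np.abs(ref - med), axis=0)  # MAD scaled to an SD
    far = np.any(np.abs(scores - med) > n_sd * robust_sd, axis=1)
    return far & is_study                       # reference samples are never flagged
```

The median and MAD are used instead of the mean and SD so that a few outlying samples cannot inflate the spread they are measured against.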

After these filtering steps, variants located in the high-confidence regions (HighConfidenceRegion) defined by the Genome in a Bottle (GIAB) project were flagged.
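Flagging variants against a region list such as a GIAB high-confidence BED file amounts to an interval lookup. The sketch below assumes merged (non-overlapping) BED intervals and chromosome names that match those in the VCF; it converts 1-based VCF positions to BED's 0-based, half-open coordinates. The function names are hypothetical.

```python
import bisect
from typing import Dict, Iterable, List, Tuple

Regions = Dict[str, Tuple[List[int], List[int]]]


def read_bed(lines: Iterable[str]) -> Regions:
    """Parse BED lines into per-chromosome sorted (starts, ends) lists.

    Assumes the intervals are already merged (non-overlapping).
    """
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        chrom, start, end = line.split()[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))
    regions: Regions = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        regions[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return regions


def in_regions(regions: Regions, chrom: str, pos: int) -> bool:
    """True if 1-based VCF position `pos` lies in a 0-based half-open BED interval."""
    if chrom not in regions:
        return False
    starts, ends = regions[chrom]
    pos0 = pos - 1                                # convert to a 0-based coordinate
    i = bisect.bisect_right(starts, pos0) - 1     # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]
```

Binary search keeps each lookup logarithmic in the number of intervals, which matters when tens of millions of variants are checked.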

Deduplication Picard 2.10.6
Indel Realignment and Base Quality Score Recalibration GATK 3.7
Mapping Methods BWA mem 0.7.12
Mapping Quality Reads with MAPQ < 20 were excluded at variant calling with GATK 3.7 HaplotypeCaller
Reference Genome Sequence GRCh37/hg19 (hs37d5)
Coverage (Depth) HiSeq 2500: 31.8x, NovaSeq 6000: 28.0x
Detecting Methods for Variation GATK 3.7 HaplotypeCaller
SNV Numbers (after QC)

76,768,387 (Autosomal Chromosomes)

2,898,518 (X Chromosome)

INDEL Numbers (after QC)

10,202,908 (Autosomal Chromosomes)

410,435 (X Chromosome)

Japanese Genotype-phenotype Archive Dataset ID JGAD000625: Whole-genome sequencing data included in JGAD000117 were mapped to the GRCh37 reference genome sequence, and variants were called according to GATK (Genome Analysis Toolkit) standards. This project is an initiative of the GEnome Medical alliance Japan (GEM Japan, GEM-J).
Total Data Volume 230 TB (bam, vcf)
Comments (Policies)

NBDC policy & hum0184 policy

Contact Information of ToMMo Supercomputer:

 
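The mapping-quality rule stated for JGAS000239 (Dataset addition), that reads with MAPQ < 20 were excluded at variant calling, can be illustrated on plain SAM text. GATK 3.7 HaplotypeCaller applies such read filters internally, so this is only a sketch of the rule, not of how the tool was run. The function name is hypothetical, and records with MAPQ 255 are skipped because the SAM specification uses 255 to mean that mapping quality is not available.

```python
from typing import Iterable, Iterator

MIN_MAPQ = 20             # threshold stated in the dataset description
MAPQ_UNAVAILABLE = 255    # SAM specification: 255 means mapping quality is not available


def passing_alignments(sam_lines: Iterable[str], min_mapq: int = MIN_MAPQ) -> Iterator[str]:
    """Yield SAM alignment lines whose MAPQ (5th tab-separated field) is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):      # header lines carry no MAPQ
            continue
        mapq = int(line.split("\t")[4])
        if mapq >= min_mapq and mapq != MAPQ_UNAVAILABLE:
            yield line
```

Usage: for an uncompressed SAM stream, `for aln in passing_alignments(open(path)):` keeps only alignments that would pass the stated threshold.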

DATA PROVIDER

Principal Investigator: Masayuki Yamamoto

Affiliation: Tohoku Medical Megabank Organization

Project / Group Name: Tohoku Medical Megabank Project

URL: https://www.megabank.tohoku.ac.jp/english/

Funds / Grants (Research Project Number):

Name | Title | Project Number
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) Special Account of the Great East Japan Earthquake Disaster Recovery | JP20km0105001
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Tohoku University) General Accounting | JP20km0105002
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Iwate Medical University) Special Account of the Great East Japan Earthquake Disaster Recovery | JP20km0105003
Japan Agency for Medical Research and Development (AMED) | Tohoku Medical Megabank Project (Iwate Medical University) General Accounting | JP20km0105004

 

PUBLICATIONS

Title | DOI | Dataset ID
3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome | doi: 10.1038/s41439-019-0059-5 | hum0015.v3.3.5kjpnv2.v1

 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use