NBDC Research ID: hum0174.v6

 

SUMMARY

Aims: To build a database of genomic structural variants in the Japanese population

Methods: We sequenced genomic DNA using PacBio, 10x Genomics, and Nanopore sequencing technologies and analyzed genomic structural variants.

Participants/Materials: Japanese (collected by Japanese B cell DNA bank)

 

Dataset ID | Type of Data | Criteria | Release Date
JGAS000173 | NGS (WGS): Sequence raw data, Structural Variants data for each sample | Controlled-access (Type I) | 2020/10/06
JGAS000173 (Data addition) | NGS (WGS) | Controlled-access (Type I) | 2020/11/27
JGAS000580 | NGS (WGS) | Controlled-access (Type I) | 2023/06/29
JGAS000286 | NGS (WGS): Sequence raw data, Structural Variants data for each sample | Controlled-access (Type I) | 2023/07/06
JGAS000505 | NGS (WGS): Sequence raw data, haplotype data for each sample | Controlled-access (Type I) | 2023/07/10
JGAS000596 | NGS (WGS) | Controlled-access (Type I) | 2023/12/28


* Data users must submit an Application for Using NBDC Human Data to access the controlled-access data.

 

MOLECULAR DATA

JGAS000173

Participants/Materials: Purified DNA from Japanese-origin B cell lines: 10 samples
Targets WGS
Target Loci for Capture Methods -
Platform

1. PacBio [Sequel]

2. 10x Genomics [Chromium Controller]

Library Source Purified DNA from Japanese-origin B cell lines
Cell Lines the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name)

1. the library prep. kit for SMRT sequencing by Pacific Biosciences

2. 10x Genomics Chromium system

Fragmentation Methods

1. Megaruptor, g-tube

2. None

Spot Type

1. Single-end

2. Paired-end

Read Length (without Barcodes, Adaptors, Primers, and Linkers)

1. 14000 bp

2. 151 bp

QC Methods

1. Qubit, Pulsed-field gel electrophoresis, TapeStation, Bioanalyzer

2. qPCR, Bioanalyzer

Mapping Methods

1. minimap2

2. longranger by 10X Genomics

Depth (average)

1. 29x

2. 19x

Structural Variants Detection Methods

1. Sniffles

2. longranger by 10X Genomics

Polymorphism Number (after QC)

1. 16,870 per sample

2. 11,700 per sample

Japanese Genotype-phenotype Archive Dataset ID JGAD000251
Total Data Volume 1 TB (fastq, bam [ref: unmapped], bed, vcf [ref: hg38])
Comments (Policies) NBDC policy
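
The polymorphism numbers above are per-sample tallies of structural variant records after QC. As a rough illustration of how such a tally might be reproduced from a per-sample VCF, the Python sketch below counts records by their SVTYPE INFO value. It assumes a VCF 4.x file in which each record carries an SVTYPE key (as Sniffles output typically does); the function name and the optional PASS-only filter are illustrative choices, not the QC procedure actually used for this dataset.

```python
from collections import Counter


def count_svs_by_type(vcf_lines, pass_only=False):
    """Tally structural-variant records in VCF text lines by SVTYPE.

    Assumes VCF 4.x: tab-separated data lines whose 7th column is FILTER and
    8th column is INFO; records lacking an SVTYPE key count as "UNKNOWN".
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank, meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        if pass_only and fields[6] not in ("PASS", "."):
            continue  # optionally keep only records that passed all filters
        svtype = "UNKNOWN"
        for entry in fields[7].split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE":
                svtype = value
                break
        counts[svtype] += 1
    return counts


# Toy VCF with made-up records (not data from this dataset).
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;SVLEN=-500\n"
    "chr1\t20000\tsv2\tN\t<INS>\t.\tPASS\tSVTYPE=INS;SVLEN=320\n"
    "chr2\t30000\tsv3\tN\t<DEL>\t.\tlowqual\tSVTYPE=DEL;SVLEN=-1200\n"
)

print(count_svs_by_type(EXAMPLE_VCF.splitlines()))
print(count_svs_by_type(EXAMPLE_VCF.splitlines(), pass_only=True))
```

Summing the values of the returned counter gives a single per-sample total comparable in spirit to the figures listed above.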

 

JGAS000173 (Data addition)

Participants/Materials: Purified DNA from Japanese-origin B cell lines: 11 samples
Targets WGS
Target Loci for Capture Methods -
Platform PacBio [Sequel]
Library Source Purified DNA from Japanese-origin B cell lines
Cell Lines the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name) the library prep. kit for SMRT sequencing by Pacific Biosciences
Fragmentation Methods Megaruptor, g-tube
Spot Type Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 14000 bp
Japanese Genotype-phenotype Archive Dataset ID JGAD000251
Total Data Volume 3.44 TB (bam)
Comments (Policies) NBDC policy

 

JGAS000580

Participants/Materials: Purified DNA from Japanese-origin B cell lines: 1 sample
Targets WGS
Target Loci for Capture Methods MHC, LRC, Chr1, SMN1/SMN2
Platform Nanopore [PromethION]
Library Source Purified DNA from Japanese-origin B cell lines
Cell Lines the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name) Ultra-Long DNA Sequencing Kit (SQK-ULK001)
Fragmentation Methods Transposase-based
Spot Type Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 56.2 kbp to 63.8 kbp (N50)
Mapping Methods minimap2 (v2.24) with "-x map-ont"
Mapping Quality -
Reference Genome Sequence T2T-CHM13v2.0
Coverage (Depth) 81x to 104x (median)
Japanese Genotype-phenotype Archive Dataset ID JGAD000706
Total Data Volume 1.4 GB (bam)
Comments (Policies) NBDC policy
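
For JGAS000580 the read length is reported as an N50 and the coverage as a median. For readers unfamiliar with these summaries, the Python sketch below shows the standard definitions: N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases, and the median depth is taken over per-position depth values. The inputs are toy numbers; exactly which reads or positions the submitters included when computing their figures is not stated here.

```python
import statistics


def n50(read_lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    Sort lengths in decreasing order and accumulate them; the N50 is the
    length at which the running total first reaches half of all bases.
    """
    lengths = sorted(read_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("read_lengths must contain at least one read")


def median_depth(per_position_depths):
    """Median of per-position read depths (e.g., one value per reference base)."""
    return statistics.median(per_position_depths)


# Toy inputs for illustration only.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))     # 8
print(median_depth([80, 104, 82, 95, 81]))  # 82
```

A median is less sensitive than a mean to a few very deep regions (for example, collapsed repeats), which is one plausible reason to report it for targeted loci.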

 

JGAS000286

Participants/Materials: Purified DNA from Japanese-origin B cell lines: 177 samples (CCS: 112 samples, CLR: 65 samples)

Targets WGS
Target Loci for Capture Methods -
Platform PacBio [Sequel, Sequel II]
Library Source Purified DNA from Japanese-origin B cell lines
Cell Lines the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name) the library prep. kit for SMRT sequencing by Pacific Biosciences
Fragmentation Methods Megaruptor, g-tube
Spot Type Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 14000 bp
QC Methods Qubit, Pulsed-field gel electrophoresis, TapeStation, Bioanalyzer
Mapping Methods minimap2
Depth (average)

CCS: 9.5x

CLR: 36x

SNV Call DeepVariant
SNV Haplotyping WhatsHap
Structural Variants Detection Methods pbsv
Diploid Assembly HiCanu
Japanese Genotype-phenotype Archive Dataset ID JGAD000392
Total Data Volume 31.8 TB (bam, vcf, fasta)
Comments (Policies) NBDC policy
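
Average depths such as the CCS 9.5x and CLR 36x listed above relate the total number of sequenced bases to the genome length. The Python sketch below shows this back-of-the-envelope relationship (expected mean depth = total read bases / genome length) and its inverse. The genome length of about 3.1 Gbp used in the example is an approximation for a human reference, not a value taken from this dataset, and depth measured from aligned reads may differ from this raw estimate.

```python
def mean_depth(read_lengths, genome_length):
    """Expected mean coverage: total bases in the reads divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sum(read_lengths) / genome_length


def bases_for_depth(target_depth, genome_length):
    """Total read bases needed to reach target_depth over genome_length."""
    return target_depth * genome_length


# Assumption: a human genome length of roughly 3.1 Gbp (approximate).
APPROX_HUMAN_GENOME = 3.1e9

# About 29.45 Gbp of reads would be needed for a 9.5x average depth.
print(bases_for_depth(9.5, APPROX_HUMAN_GENOME))
# Toy example: three reads totalling 60 bases over a 12-base "genome" give 5x.
print(mean_depth([10, 20, 30], 12))
```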

 

JGAS000505 / JGAS000596

Participants/Materials: Purified DNA from Japanese-origin B cell lines: 177 + 30 samples
Targets WGS
Target Loci for Capture Methods -
Platform PacBio [Sequel II]
Library Source Purified DNA from Japanese-origin B cell lines
Cell Lines the Health Science Research Resources Bank (HSRRB), the National Institutes of Biomedical Innovation, Health and Nutrition (NIBIOHN)
Library Construction (kit name) the library prep. kit for SMRT sequencing by Pacific Biosciences
Fragmentation Methods Megaruptor
Spot Type Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers) 14,949 bp
QC Methods Qubit, NanoDrop, TapeStation, Femto Pulse, Pulsed-field gel electrophoresis
Mapping Methods minimap2 (hg38-no_alt)
Depth (average) CCS: 9.06x
SNV Call DeepVariant
SNV Haplotyping WhatsHap
Structural Variants Detection Methods pbsv
Diploid Assembly HiCanu
Japanese Genotype-phenotype Archive Dataset ID

JGAD000622 (177 samples)

JGAD000725 (30 samples)

Total Data Volume

JGAD000622: 7.5 TB (bam/vcf/contig_fasta for 30 samples, fastq for 147 samples)

JGAD000725: 705 GB (fastq)

Comments (Policies) NBDC policy

 

DATA PROVIDER

Principal Investigator: Shinichi Morishita

Affiliation: Graduate School of Frontier Sciences, the University of Tokyo

Project / Group Name: -

Funds / Grants (Research Project Number):

Name | Title | Project Number
Advanced Genome Research and Bioinformatics Study to Facilitate Medical Innovation, Platform Program for Promotion of Genome Medicine, Japan Agency for Medical Research and Development (AMED) | Informatics for analyzing de novo human genome assemblies | JP16km0405204
Biobank - Construction and Utilization biobank for genomic medicine REalization, Japan Agency for Medical Research and Development (AMED) | Informatics for analyzing de novo human genome assemblies | JP21tm0424219

 

PUBLICATIONS

No. | Title | DOI | Dataset ID
1 | Rapid and ongoing evolution of repetitive sequence structures in human centromeres | doi: 10.1126/sciadv.abd9230 | JGAD000251
2 | JTK: targeted diploid genome assembler | doi: 10.1093/bioinformatics/btad398 | JGAD000706
3 | A landscape of complex tandem repeats within individual human genomes | doi: 10.1038/s41467-023-41262-1 | JGAD000392, JGAD000622

 

USERS (Controlled-access Data)

Principal Investigator | Affiliation | Country/Region | Research Title | Data in Use (Dataset ID) | Period of Data Use
Yuta Kochi | Department of Genomic Function and Diversity, Medical Research Institute, Tokyo Medical and Dental University | Japan | Genetic study of complex diseases through comprehensive analysis of functional genetic variations | JGAD000251 | 2023/04/10-2024/03/31
Yukinori Okada | Department of Statistical Genetics, Osaka University Graduate School of Medicine | Japan | Elucidation of disease etiology by trans-layer omics analysis | JGAD000251, JGAD000392, JGAD000622, JGAD000725 | 2024/03/05-2025/07/14