NBDC Research ID: hum0056.v1

SUMMARY

Aims: To provide genome and multi-layer omics data from a population-based cohort (Iwate Medical Megabank: IMM) to research communities.

Methods: Allele frequencies based on whole genome sequencing (WGS), methylation rates based on whole genome bisulfite sequencing (WGBS), and FPKM values based on RNA sequencing (RNA-seq) were obtained from 102 monocyte samples, 102 CD4+ T cell samples, and 94 neutrophil samples isolated from peripheral blood (197 individuals in total from the IMM cohort).

Participants/Materials: Monocytes and CD4+ T cells from 102 individuals each, and neutrophils from 94 individuals (197 individuals in total)

URL: http://imethyl.iwate-megabank.org/

Dataset ID	Type of Data	Criteria	Release Date
hum0056.v1.freq.v1	(1) Allele frequencies from WGS	Unrestricted-access	2018/03/30
hum0056.v1.ch3.v1	(2) Methylation rates at each CpG site from WGBS	Unrestricted-access	2018/03/30
hum0056.v1.fpkm.v1	(3) Average of FPKM values from RNA-seq	Unrestricted-access	2018/03/30

*Release Note

*When research results that include data downloaded from NHA/DRA are published or presented, the data user must cite the papers related to the data or mention them in the acknowledgments.

MOLECULAR DATA

hum0056.v1.freq.v1


Participants/Materials	IMM cohort: 197 individuals
Targets	WGS
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500/HiSeq X]
Library Source	DNAs extracted from 102 monocytes, 102 CD4+ T cells, and 94 neutrophils
Cell Lines	-
Library Construction (kit name)	TruSeq DNA PCR-Free HT Sample Prep Kit
Fragmentation Methods	Ultrasonic fragmentation (Covaris LE220)
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	162 bp (72 neutrophils: 150 bp)
QC	qMiSeq^*1
Deduplication	none
Calibration of re-alignment and base quality	none
Mapping Methods	Bowtie2 (version 2.1.0)
Reference Genome Sequence	GRCh37d5
Coverage (Depth)	40.9 ×
Detecting Methods for Variation	Bcftools software (ver. 0.1.17-dev)
Total Reads / Uniquely Mapped Reads	All cell types: 230,956,735,490 / 228,318,012,630; Monocytes: 58,859,522,366 / 57,737,359,722; CD4+ T cells: 58,964,563,094 / 57,845,976,317; Neutrophils: 113,132,650,030 / 112,734,676,591. *Mapped reads: a read that mapped to multiple locations was assigned to one of them at random.
Filtering Methods	minor allele count (MAC) > 1
SNP Numbers (after QC)	Monocytes: 8,129,415 SNPs; CD4+ T cells: 8,137,443 SNPs; Neutrophils: 8,175,808 SNPs
NBDC Dataset ID	hum0056.v1.freq.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	Monocytes: 403 MB (txt); CD4+ T cells: 403.5 MB (txt); Neutrophils: 409 MB (txt)
Comments (Policies)	NBDC policy

*1: Katsuoka et al., Analytical Biochemistry, 2014
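The read counts and the MAC filter in the table above can be illustrated with a short Python sketch. The first part only re-adds the per-cell-type counts listed in the table and compares them with the reported totals. The second part shows one common way to derive an allele frequency and a minor allele count (MAC) from genotype counts at a biallelic site; the function names, the genotype-count input, and the diploid assumption are illustrative and are not taken from the data provider's pipeline.

```python
# Read counts copied from the "Total Reads / Uniquely Mapped Reads" row:
# cell type -> (total reads, mapped reads).
READS = {
    "monocytes": (58_859_522_366, 57_737_359_722),
    "cd4_t_cells": (58_964_563_094, 57_845_976_317),
    "neutrophils": (113_132_650_030, 112_734_676_591),
}
REPORTED_TOTAL = (230_956_735_490, 228_318_012_630)


def summed_reads(reads):
    """Sum (total, mapped) read counts over all cell types."""
    total = sum(t for t, _ in reads.values())
    mapped = sum(m for _, m in reads.values())
    return total, mapped


def allele_stats(n_hom_ref, n_het, n_hom_alt):
    """Return (alt allele frequency, minor allele count) for a biallelic site.

    Assumption: diploid genotypes given as counts of individuals.
    This input layout is hypothetical, not the dataset's file format.
    """
    n_chrom = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt_count = n_het + 2 * n_hom_alt
    mac = min(alt_count, n_chrom - alt_count)
    return alt_count / n_chrom, mac


def passes_mac_filter(mac, threshold=1):
    """Keep a site only if MAC > threshold (the table states MAC > 1)."""
    return mac > threshold


if __name__ == "__main__":
    # Do the per-cell-type counts add up to the reported totals?
    print(summed_reads(READS) == REPORTED_TOTAL)
    # A site with two heterozygous carriers among 102 individuals.
    af, mac = allele_stats(n_hom_ref=100, n_het=2, n_hom_alt=0)
    print(round(af, 4), mac, passes_mac_filter(mac))
```

Under this reading of the filter, a variant seen on a single chromosome (MAC = 1) would be excluded, while one seen on two or more chromosomes would be kept.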

hum0056.v1.ch3.v1


Participants/Materials	IMM cohort: 197 individuals
Targets	WGBS
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	DNAs extracted from 102 monocytes, 102 CD4+ T cells, and 94 neutrophils
Cell Lines	-
Library Construction (kit name)	TruSeq DNA Methylation Kit
Fragmentation Methods	bisulfite conversion reaction
Spot Type	Paired-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	125 bp
QC	electrophoresis and qPCR
Deduplication	SAMtools (ver.0.1.19)
Calibration of re-alignment and base quality	none
Mapping Methods	NovoAlign (ver.3.02.08)
Reference Genome Sequence	GRCh37d5
Coverage (Depth)	Monocytes: 31.1 ± 1.6 ×; CD4+ T cells: 31.0 ± 1.6 ×; Neutrophils: 54.7 ± 1.6 ×
Detecting Methods for Variation	NovoMethyl (ver.3.02.08)
Total Reads / Uniquely Mapped Reads	Monocytes: 780,709,034 ± 45,934,514 / 624,432,868 ± 38,766,158; CD4+ T cells: 779,212,752 ± 40,833,955 / 667,934,331 ± 33,002,407; Neutrophils: 1,144,935,054 ± 33,512,764 / 994,992,101 ± 32,667,070
NBDC Dataset ID	hum0056.v1.ch3.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	Monocytes: 1.4 GB (txt); CD4+ T cells: 1.4 GB (txt); Neutrophils: 1.5 GB (txt)
Comments (Policies)	NBDC policy
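As a rough illustration of the quantities in the WGBS table above, the Python sketch below computes a per-CpG methylation rate as the fraction of reads supporting methylation, and the ratio of mean uniquely mapped reads to mean total reads per cell type. The function names and the read-count inputs are hypothetical; the source does not describe how NovoMethyl derives its rates, and a ratio of means is not the same as the mean of per-sample mapping rates.

```python
# Mean read counts copied from the "Total Reads / Uniquely Mapped Reads" row:
# cell type -> (mean total reads, mean uniquely mapped reads).
MEAN_READS = {
    "monocytes": (780_709_034, 624_432_868),
    "cd4_t_cells": (779_212_752, 667_934_331),
    "neutrophils": (1_144_935_054, 994_992_101),
}


def methylation_rate(n_methylated, n_unmethylated):
    """Fraction of reads supporting methylation at one CpG site.

    Returns None when the site has no coverage, so that uncovered
    sites are not reported as 0% methylated.
    """
    depth = n_methylated + n_unmethylated
    if depth == 0:
        return None
    return n_methylated / depth


def mapped_fraction(mean_total, mean_mapped):
    """Ratio of mean uniquely mapped reads to mean total reads."""
    return mean_mapped / mean_total


if __name__ == "__main__":
    # Hypothetical CpG site covered by 30 reads, 24 of them methylated.
    print(methylation_rate(24, 6))
    for cell_type, (total, mapped) in MEAN_READS.items():
        print(cell_type, round(mapped_fraction(total, mapped), 3))
```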

hum0056.v1.fpkm.v1


Participants/Materials	IMM cohort: 197 individuals
Targets	RNA-seq
Target Loci for Capture Methods	-
Platform	Illumina [HiSeq 2500]
Library Source	RNAs extracted from 102 monocytes, 102 CD4+ T cells, and 94 neutrophils
Cell Lines	-
Library Construction (kit name)	TruSeq RNA Library Preparation Kit v2
Fragmentation Methods	divalent cation under high temperature
Spot Type	Single-end
Read Length (without Barcodes, Adaptors, Primers, and Linkers)	125 bp
QC	electrophoresis and qPCR
Deduplication	none
Calibration of re-alignment and base quality	none
Mapping Methods	TopHat (ver. 2.0.13)
Reference Genome Sequence	GRCh37, Human GENCODE Gene Set (release 19)
Coverage (Depth)	-
Detecting Methods for Variation	cuffquant and cuffnorm
Total Reads / Uniquely Mapped Reads	Monocytes: 33,917,157 ± 3,153,528 / 27,390,039 ± 2,494,286; CD4+ T cells: 35,175,996 ± 1,275,575 / 27,506,624 ± 1,669,459; Neutrophils: 47,040,140 ± 6,289,540 / 43,241,139 ± 8,491,065
Gene Numbers	Monocytes: 16,282 genes; CD4+ T cells: 18,299 genes; Neutrophils: 14,534 genes
NBDC Dataset ID	hum0056.v1.fpkm.v1 (Click the Dataset ID to download the file) Dictionary file
Total Data Volume	Monocytes: 818 KB (txt); CD4+ T cells: 919 KB (txt); Neutrophils: 731 KB (txt)
Comments (Policies)	NBDC policy
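For readers unfamiliar with the unit used in this dataset, the Python sketch below shows the textbook definition of FPKM (fragments per kilobase of transcript per million mapped fragments) and a plain average across individuals. It is only an approximation of what the table describes: cuffquant and cuffnorm apply their own library-size normalization, so values computed this way would not be expected to match the released file exactly. The function names and the example numbers are hypothetical.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM for one gene in one sample.

    fragments: fragments assigned to the gene (for single-end data,
        one read counts as one fragment)
    gene_length_bp: gene or transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def mean_fpkm(values):
    """Arithmetic mean of per-individual FPKM values for one gene."""
    values = list(values)
    if not values:
        raise ValueError("at least one FPKM value is required")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical gene: 500 reads, 2,000 bp long, 25 million mapped reads.
    print(fpkm(500, 2_000, 25_000_000))
    # Hypothetical per-individual FPKM values for the same gene.
    print(mean_fpkm([8.0, 10.0, 12.0]))
```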

DATA PROVIDER

Principal Investigator: Atsushi Shimizu

Affiliation: Disaster Reconstruction Center, Iwate Medical University

Project / Group Name: Division of Biomedical Information Analysis, Iwate Tohoku Medical Megabank Organization

Funds / Grants (Research Project Number):

Name	Title	Project Number
Ministry of Education, Culture, Sports, Science and Technology Japan	Tohoku Medical Megabank Project
Japan Agency for Medical Research and Development (AMED)	Tohoku Medical Megabank Project (Iwate Medical University) Special Account of the Great East Japan Earthquake Disaster Recovery	JP18km0105003

PUBLICATIONS

	Title	DOI	Dataset ID
1	Genome-wide identification of inter-individually variable DNA methylation sites improves the efficacy of epigenetic association studies	doi: 10.1038/s41525-017-0016-5