SNPversity 2.0 Data Downloads
Release 2.0.80 (20 September 2024)
Overview
Below you will find detailed information about the datasets available for download, along with guidelines and best practices. For more information on how the datasets were generated see the 'Help' section.
How to Download
- Filtered datasets (Direct download): For the filtered datasets, click the link to the Federal Box account and identify the files to download. Check the boxes next to the files and click the download icon near the top of the page. Alternatively, click on the filename listed in one of the tables below, which is linked to the Box download page for that file.
- Unfiltered datasets (Request Access): Send an email to John Portwood with the subject "WGS file access request". Once your request is received, you will receive a link and temporary password via email. This link will be active for 48 hours.
Download Instructions
- Prepare Adequate Storage: Ensure you have sufficient storage space on your system for the download.
- Use a Reliable Internet Connection: A wired connection is recommended for faster and more stable downloads.
- Consider Using a Download Manager: For large files, a download manager can help manage the download process, allowing for pause, resume, and recovery of the download.
- Verify Data Integrity: After download, use checksums (provided in the tables below) to verify the integrity of the downloaded files.
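The integrity check in the last step can be scripted. The following is a minimal sketch, assuming the file has already been downloaded to the working directory; the local path is hypothetical, and the expected value is the one listed for chr10_high_coverage.vcf.gz in the first table below.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the value listed in a download table."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the file was saved.
    local_file = Path("chr10_high_coverage.vcf.gz")
    if local_file.exists():
        ok = matches_checksum(local_file, "0346e968e9cce790f3d5730602a7ce24")
        print("checksum OK" if ok else "checksum MISMATCH, download the file again")
```

Reading in chunks matters here because the larger files are several gigabytes and should not be loaded into memory at once.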
MaizeGDB 2024 High Coverage Datasets (VCFs)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
chr1_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 1. | 8.7 GB | 12 minutes / 1 hour 56 minutes | e50be4e8ec47629ee6ff28bddcd90df4 |
chr2_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 2. | 6.8 GB | 9 minutes / 1 hour 30 minutes | c1c8606a087b5a74940a01e92b1ed4b8 |
chr3_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 3. | 6.8 GB | 9 minutes / 1 hour 30 minutes | c721c6714bb75ad5e5f663fd10939289 |
chr4_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 4. | 6.9 GB | 9 minutes / 1 hour 32 minutes | 8d095b23f802435c4571aee661667e13 |
chr5_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 5. | 6.1 GB | 8 minutes / 1 hour 21 minutes | 88369ecfe6ec4e9eba2671417f365c75 |
chr6_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 6. | 5.0 GB | 7 minutes / 1 hour 6 minutes | c8e0e732b5c6bd0e6ede0552d95c9709 |
chr7_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 7. | 4.9 GB | 7 minutes / 1 hour 5 minutes | e179e8e767a511b0f5bec2e56c97ba38 |
chr8_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 8. | 5.2 GB | 7 minutes / 1 hour 9 minutes | 9dffa48a5993d35909db53a1017f1639 |
chr9_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 9. | 4.5 GB | 6 minutes / 1 hour | b21a8f266736e4a9e5f0554aa3ca51cc |
chr10_high_coverage.vcf.gz | High Coverage variants and metadata identified in chromosome 10. | 4.3 GB | 6 minutes / 57 minutes | 0346e968e9cce790f3d5730602a7ce24 |
*Download times are estimated and may vary.
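The page does not state how the estimates were computed. As a rough illustration, the listed times are consistent with dividing the file size by the link speed, assuming decimal units (1 GB = 8,000 megabits) and no protocol overhead. The function name below is hypothetical.

```python
def estimated_download_minutes(size_gb, link_mbps):
    """Estimate the transfer time in minutes for a file of size_gb gigabytes."""
    megabits = size_gb * 8 * 1000  # assumption: 1 GB = 1000 MB and 1 byte = 8 bits
    return megabits / link_mbps / 60


if __name__ == "__main__":
    # 8.7 GB is the listed size of chr1_high_coverage.vcf.gz.
    for speed in (100, 10):
        minutes = estimated_download_minutes(8.7, speed)
        print(f"{speed} Mbps: about {round(minutes)} minutes")
```

For the 8.7 GB file this gives about 12 minutes at 100 Mbps and about 116 minutes (1 hour 56 minutes) at 10 Mbps, in line with the first table row. Real transfers are usually slower than this idealized figure.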
MaizeGDB 2024 High Coverage Datasets (H5)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
maizegdb2024_chr1_HC.h5.gz | High Coverage variants and metadata identified in chromosome 1. | 4.8 GB | 6 minutes / 1 hour 4 minutes | f51b1e0e94ddbbb3f998ddab4f35edd1 |
maizegdb2024_chr2_HC.h5.gz | High Coverage variants and metadata identified in chromosome 2. | 3.9 GB | 5 minutes / 52 minutes | a5c3e387cee3fd7b59765f3c9e91fc63 |
maizegdb2024_chr3_HC.h5.gz | High Coverage variants and metadata identified in chromosome 3. | 3.8 GB | 5 minutes / 51 minutes | eeb8b872c6898d1a0f3fa76a92b9a384 |
maizegdb2024_chr4_HC.h5.gz | High Coverage variants and metadata identified in chromosome 4. | 4.0 GB | 5 minutes / 53 minutes | 31fdef77aea4ec2ec8ed4927a1f14016 |
maizegdb2024_chr5_HC.h5.gz | High Coverage variants and metadata identified in chromosome 5. | 3.4 GB | 5 minutes / 45 minutes | 37ba64fb3c72a8b434372226a046df9d |
maizegdb2024_chr6_HC.h5.gz | High Coverage variants and metadata identified in chromosome 6. | 2.7 GB | 4 minutes / 36 minutes | 56bd55bb6d7b1059b467e1eeb08e7c8e |
maizegdb2024_chr7_HC.h5.gz | High Coverage variants and metadata identified in chromosome 7. | 2.9 GB | 4 minutes / 39 minutes | 31be7f35f8ef34d475afd2caec760bd4 |
maizegdb2024_chr8_HC.h5.gz | High Coverage variants and metadata identified in chromosome 8. | 2.9 GB | 4 minutes / 39 minutes | 402392a3a7d6a20b3a3ab50c2b049e69 |
maizegdb2024_chr9_HC.h5.gz | High Coverage variants and metadata identified in chromosome 9. | 2.6 GB | 3 minutes / 35 minutes | f469e30c670baeb2229a41c49e6128e6 |
maizegdb2024_chr10_HC.h5.gz | High Coverage variants and metadata identified in chromosome 10. | 2.4 GB | 3 minutes / 32 minutes | 276a6c9755ddd13174b7dc91f188ead3 |
*Download times are estimated and may vary.
MaizeGDB 2024 High Quality Datasets (VCFs)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
chr1_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 1. | 3.3 GB | 4 minutes / 44 minutes | 6a0dd7f98bb40f83db73e5ab4c053637 |
chr2_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 2. | 2.6 GB | 3 minutes / 34 minutes | c18ee8a06a9b84b4a23c58a934ca12d8 |
chr3_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 3. | 2.5 GB | 3 minutes / 33 minutes | 548fea5782096148f31143036d07e839 |
chr4_high_Quality.vcf.gz | High Quality variants and metadata identified in chromosome 4. | 2.6 GB | 3 minutes / 35 minutes | fa8f40057c580ba0e80aaf178f1c4735 |
chr5_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 5. | 2.2 GB | 3 minutes / 29 minutes | 916a5cf7980e2a43d8bc9b6ef062e7be |
chr6_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 6. | 1.9 GB | 3 minutes / 25 minutes | da2e2fe9bea1f30b46ec95cd3677cee2 |
chr7_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 7. | 1.8 GB | 2 minutes / 24 minutes | 85cd9a910afa97ddd80cb2b8a04ef366 |
chr8_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 8. | 1.9 GB | 3 minutes / 25 minutes | 9edb56728c126a6332893ae88f608917 |
chr9_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 9. | 1.7 GB | 2 minutes / 23 minutes | 9560079d5adb7deaeb749121ffb10d70 |
chr10_high_quality.vcf.gz | High Quality variants and metadata identified in chromosome 10. | 1.6 GB | 2 minutes / 21 minutes | 72e8cec621ce020025eefbec47b54534 |
*Download times are estimated and may vary.
MaizeGDB 2024 High Quality Datasets (H5)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
maizegdb2024_chr1_HQ.h5.gz | High Quality variants and metadata identified in chromosome 1. | 2.1 GB | 3 minutes / 28 minutes | a915f198a8d9b7a227848279b0675fd1 |
maizegdb2024_chr2_HQ.h5.gz | High Quality variants and metadata identified in chromosome 2. | 1.7 GB | 2 minutes / 23 minutes | 3e1bcc41155dcbaa22a8081e618826cd |
maizegdb2024_chr3_HQ.h5.gz | High Quality variants and metadata identified in chromosome 3. | 1.6 GB | 2 minutes / 21 minutes | 7bffefa06a2ff00f004fc1ec6096b5d6 |
maizegdb2024_chr4_HQ.h5.gz | High Quality variants and metadata identified in chromosome 4. | 1.7 GB | 2 minutes / 23 minutes | 9221077671d77b6a3c8d2247c2c78931 |
maizegdb2024_chr5_HQ.h5.gz | High Quality variants and metadata identified in chromosome 5. | 1.4 GB | 2 minutes / 19 minutes | adfe23716e38d777c3be756ff6374001 |
maizegdb2024_chr6_HQ.h5.gz | High Quality variants and metadata identified in chromosome 6. | 1.2 GB | 2 minutes / 16 minutes | b0d073449e7b263c5897dda1329a5cf4 |
maizegdb2024_chr7_HQ.h5.gz | High Quality variants and metadata identified in chromosome 7. | 1.2 GB | 2 minutes / 16 minutes | 49037f5842003282bddccbd22755890f |
maizegdb2024_chr8_HQ.h5.gz | High Quality variants and metadata identified in chromosome 8. | 1.2 GB | 2 minutes / 16 minutes | 99d25904e4a5dea7175b061757066f25 |
maizegdb2024_chr9_HQ.h5.gz | High Quality variants and metadata identified in chromosome 9. | 1.1 GB | 1 minute / 15 minutes | 2a456f9f614191427733570715fe4f2d |
maizegdb2024_chr10_HQ.h5.gz | High Quality variants and metadata identified in chromosome 10. | 1.1 GB | 1 minute / 15 minutes | d6b3467f0612b76877b160ba9a7dbd32 |
*Download times are estimated and may vary.
MaizeGDB 2024 Raw Unfiltered Datasets (VCFs)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
maizegdb2024_chr1.vcf.gz | Full unfiltered variants and read data in chromosome 1. | 314 GB | 7 hours / 70 hours | 3d1c431ae016ccdaff844c44818bf419 |
maizegdb2024_chr2.vcf.gz | Full unfiltered variants and read data in chromosome 2. | 245 GB | 6 hours / 55 hours | 81ddb6d949aefef5fe332d481fa0c962 |
maizegdb2024_chr3.vcf.gz | Full unfiltered variants and read data in chromosome 3. | 241 GB | 5 hours / 54 hours | 031fcd97bcaf236d7f970803cdd73f0b |
maizegdb2024_chr4.vcf.gz | Full unfiltered variants and read data in chromosome 4. | 259 GB | 6 hours / 58 hours | 512b896ea1145b9798dd7177ce5f755c |
maizegdb2024_chr5.vcf.gz | Full unfiltered variants and read data in chromosome 5. | 224 GB | 5 hours / 50 hours | 03d78f4cb4f8f13aea67ec52d9f4971b |
maizegdb2024_chr6.vcf.gz | Full unfiltered variants and read data in chromosome 6. | 175 GB | 4 hours / 39 hours | 55834e34330db14d8c1e2998f4a9a45b |
maizegdb2024_chr7.vcf.gz | Full unfiltered variants and read data in chromosome 7. | 184 GB | 4 hours / 41 hours | 133a6611ebd9ba80b638bf3f5d14fc77 |
maizegdb2024_chr8.vcf.gz | Full unfiltered variants and read data in chromosome 8. | 185 GB | 4 hours / 41 hours | 1728205b913ecde564f34b6315b4f807 |
maizegdb2024_chr9.vcf.gz | Full unfiltered variants and read data in chromosome 9. | 163 GB | 4 hours / 36 hours | 59b8e5d6a0cccce85b0ad0e8dc7fa728 |
maizegdb2024_chr10.vcf.gz | Full unfiltered variants and read data in chromosome 10. | 155 GB | 4 hours / 34 hours | 7d384e34dcd343463b2e4555dc520797 |
maizegdb2024_scaf.vcf.gz | Full unfiltered variants and read data in unplaced scaffolds. | 3.1 GB | 4 minutes / 41 minutes | cdf61f383f664b22eba6a380e20abb78 |
*Download times are estimated and may vary.
MaizeGDB 2024 + NAM 2021 High Coverage Datasets (VCFs)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
chr1_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 1. | 2.1 GB | 3 minutes / 28 minutes | 5159061234ffc6b7c4a3f2a9f6b3d174 |
chr2_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 2. | 1.7 GB | 2 minutes / 23 minutes | 658376d7d7e3896a21b5966cb6304db1 |
chr3_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 3. | 1.6 GB | 2 minutes / 21 minutes | 0f4ac8ca9c224c78f20da4936ce570d1 |
chr4_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 4. | 1.6 GB | 2 minutes / 21 minutes | 3790c8161f1c9820976ee6fff6064da3 |
chr5_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 5. | 1.4 GB | 2 minutes / 19 minutes | 311fe04a5f246e022052e48073fc3624 |
chr6_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 6. | 1.2 GB | 2 minutes / 16 minutes | a66372d53ff88a132b752c67b6756021 |
chr7_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 7. | 1.2 GB | 2 minutes / 16 minutes | 9c75340a7cb74df616832f32d3c85faf |
chr8_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 8. | 1.2 GB | 2 minutes / 16 minutes | 3815ca66bf58ffd27d5573f8cf2c421c |
chr9_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 9. | 1.1 GB | 1 minute / 15 minutes | 544295825f8caa9ad3087afbf33aa475 |
chr10_high_coverage.vcf.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 10. | 1.1 GB | 1 minute / 15 minutes | 9d509344b0ca28484d1b67b603114b1a |
*Download times are estimated and may vary.
MaizeGDB 2024 + NAM 2021 High Coverage Datasets (H5)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
maizegdb2024_chr1_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 1. | 1.4 GB | 2 minutes / 19 minutes | cba0754c740c36f667f3c8d5b2d423ae |
maizegdb2024_chr2_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 2. | 1.1 GB | 1 minute / 15 minutes | 2118510f52f70f424e35f9141a4bad05 |
maizegdb2024_chr3_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 3. | 1.1 GB | 1 minute / 15 minutes | 3dd9d1b408f6058674501aea981bdc80 |
maizegdb2024_chr4_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 4. | 1.1 GB | 1 minute / 15 minutes | 3810747ab0abac75f03f520f18783fb4 |
maizegdb2024_chr5_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 5. | 922 MB | 1 minute / 12 minutes | 14fec2e2e02916b6c6c229b942502591 |
maizegdb2024_chr6_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 6. | 748 MB | 1 minute / 9 minutes | cefa57e24ce71bc4188e7cbc30394634 |
maizegdb2024_chr7_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 7. | 818 MB | 1 minute / 11 minutes | 62e522f9709420368d7e6f3b63a39e69 |
maizegdb2024_chr8_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 8. | 785 MB | 1 minute / 9 minutes | 756ba2fcb992862ba219d32db22a0b15 |
maizegdb2024_chr9_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 9. | 745 MB | 1 minute / 9 minutes | a1f54d0b6d30f4e8cd741b291d896788 |
maizegdb2024_chr10_HC.h5.gz | Intersection of High Coverage variants and NAM variants identified in chromosome 10. | 676 MB | 1 minute / 9 minutes | 3c8873d99bd9ebb80e6017afb051874a |
*Download times are estimated and may vary.
MaizeGDB 2024 + NAM 2021 High Quality Datasets (VCFs)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
chr1_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 1. | 894 MB | 1 minute / 12 minutes | 07cf62f39abed159017c8690c728380c |
chr2_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 2. | 706 MB | 1 minute / 9 minutes | 433717b0dcd8c22d5ab153cf49bea518 |
chr3_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 3. | 671 MB | 1 minute / 9 minutes | 36561e3b03dd3834ba8743187d1d3635 |
chr4_high_Quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 4. | 705 MB | 1 minute / 9 minutes | b617695025819814a14044a6149ce360 |
chr5_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 5. | 574 MB | 1 minute / 8 minutes | 68a7985fefd1fe84d8a89c646cf1a6e3 |
chr6_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 6. | 477 MB | 1 minute / 8 minutes | 742d4d21d0d0235f97ed3223d94c53fb |
chr7_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 7. | 529 MB | 1 minute / 8 minutes | c0904aa90c5d5af213e61a1b227720eb |
chr8_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 8. | 494 MB | 1 minute / 8 minutes | c0904aa90c5d5af213e61a1b227720eb |
chr9_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 9. | 486 MB | 1 minute / 8 minutes | 82a1582203b9196b746dd8ef50bbe6d1 |
chr10_high_quality.vcf.gz | Intersection of High Quality variants and NAM variants identified in chromosome 10. | 440 MB | 1 minute / 8 minutes | dcaf3a4fb16f12eaca713d1fe10e826f |
*Download times are estimated and may vary.
MaizeGDB 2024 + NAM 2021 High Quality Datasets (H5)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
maizegdb2024_chr1_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 1. | 1.4 GB | 2 minutes / 19 minutes | 6117e0d6f88a3baff5473a00ad389f63 |
maizegdb2024_chr2_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 2. | 1.1 GB | 1 minute / 15 minutes | c8a44ec6b1f4e43499d097de0bf91133 |
maizegdb2024_chr3_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 3. | 1.1 GB | 1 minute / 15 minutes | 4e428d9516d566e806e7a1c33eb3bd62 |
maizegdb2024_chr4_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 4. | 1.1 GB | 1 minute / 15 minutes | 090e9a4bea1c2fdbcaacb144823cd244 |
maizegdb2024_chr5_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 5. | 894 MB | 1 minute / 12 minutes | 1b623ad8d89e6513fb6a5aff71e1eab7 |
maizegdb2024_chr6_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 6. | 762 MB | 1 minute / 9 minutes | 245cd7aa3081d3d23e720e8cea125296 |
maizegdb2024_chr7_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 7. | 793 MB | 1 minute / 9 minutes | 0ed90bad730330cab38a4271ab409f6e |
maizegdb2024_chr8_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 8. | 780 MB | 1 minute / 9 minutes | 61a4a466231f856b3d731a5fbafee798 |
maizegdb2024_chr9_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 9. | 749 MB | 1 minute / 9 minutes | 978350d9b19f3da4b418b16808a1b7f6 |
maizegdb2024_chr10_HQ.h5.gz | Intersection of High Quality variants and NAM variants identified in chromosome 10. | 683 MB | 1 minute / 9 minutes | c4f94a598971893c3683a66aa4a70691 |
*Download times are estimated and may vary.
Schnable 2023 Imputation Datasets (VCFs)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
chr1_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 1. | 1.2 GB | 2 minutes / 16 minutes | 5b576f00da277a8ee00c09013b47eb92 |
chr2_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 2. | 938 MB | 1 minute / 12 minutes | b7e2f255efb5ec460165382bee3a7f74 |
chr3_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 3. | 915 MB | 1 minute / 12 minutes | 274192f88d3019bff09a2bbd5433766a |
chr4_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 4. | 984 MB | 1 minute / 12 minutes | 9f1a317b8453eec79a924a511b741fa3 |
chr5_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 5. | 814 MB | 1 minute / 11 minutes | 5951fc48a862e20cf213562519fc0ff9 |
chr6_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 6. | 631 MB | 1 minute / 8 minutes | 712382cc627c4eb7574d194ac1fe6541 |
chr7_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 7. | 689 MB | 1 minute / 9 minutes | e9c9155acc26ca408339db173ef17fec |
chr8_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 8. | 679 MB | 1 minute / 9 minutes | 020c567bd6bf33361be5f266ee38ce9c |
chr9_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 9. | 637 MB | 1 minute / 9 minutes | c52b8bc2dd2e2696f5d68446671b2056 |
chr10_high_coverage.vcf.gz | High quality filtered variants with imputation identified in chromosome 10. | 585 MB | 1 minute / 8 minutes | 00016d1b7fac84b789a580fb0545e9a9 |
*Download times are estimated and may vary.
Schnable 2023 Imputation Datasets (H5)
Dataset Name | Description | Size | *Download time (@100 Mbps / @10 Mbps) | MD5 Checksum |
maizegdb2024_chr1_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 1. | 1.4 GB | 2 minutes / 19 minutes | 99a13611972199d1f389481d8fb9f50a |
maizegdb2024_chr2_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 2. | 1.1 GB | 1 minute / 15 minutes | c515383ef0888a253c16745f6fa19f58 |
maizegdb2024_chr3_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 3. | 1.1 GB | 1 minute / 15 minutes | 9b06e8a203c6baab2ee8e96c0f5d2f43 |
maizegdb2024_chr4_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 4. | 1.2 GB | 2 minutes / 16 minutes | 1f31b607411e59ead16ff61cb044c4aa |
maizegdb2024_chr5_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 5. | 968 MB | 1 minute / 12 minutes | d6d4e4e1711cd70ef86240e6491985ee |
maizegdb2024_chr6_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 6. | 756 MB | 1 minute / 10 minutes | 9ce2a24cf969a65d0138109fc6177b3d |
maizegdb2024_chr7_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 7. | 807 MB | 1 minute / 11 minutes | 8b659879f1f43d88c16fab68d5f39422 |
maizegdb2024_chr8_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 8. | 810 MB | 1 minute / 11 minutes | 2faffe9fad563d68af252c8e789d3e87 |
maizegdb2024_chr9_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 9. | 753 MB | 1 minute / 10 minutes | 8d556f0d149ba7fb8486d136974ddbfb |
maizegdb2024_chr10_HC.h5.gz | High quality filtered variants with imputation identified in chromosome 10. | 703 MB | 1 minute / 10 minutes | 8d556f0d149ba7fb8486d136974ddbfb |
*Download times are estimated and may vary.
Need Help?
If you encounter any issues during the download or have any questions, please contact John Portwood.
Disclaimer
- The estimated download times are provided for convenience and can vary based on your network speed and connectivity.
- We reserve the right to modify the access procedures to ensure data security, integrity, and convenience. This includes using another platform to host the datasets. This page will be updated with the most up-to-date datasets and methods of accessing the data.
SNPversity 2.0
Citation
If you use SNPversity for your research, please cite one or more of the following:
- SNPversity and the MaizeGDB 2024 dataset:
Andorf CM, Ross-Ibarra J, Seetharam AS, Hufford MB, Woodhouse MR. (2024) A unified VCF data set from nearly 1,500 diverse maize accessions and resources to explore the genomic landscape of maize. bioRxiv 2024.04.30.591904; doi:10.1101/2024.04.30.591904.
- NAM 2021 dataset:
Hufford MB, Seetharam AS, Woodhouse MR, et al. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science. 2021;373(6555):655-662. doi:10.1126/science.abg5289
- Schnable 2023 dataset:
Grzybowski MW, Mural RV, Xu G, Turkus J, Yang J, Schnable JC. A common resequencing-based genetic marker data set for global maize diversity. Plant J. 2023;113(6):1109-1121. doi:10.1111/tpj.16123
Overview
Welcome to SNPversity 2.0, the second generation of the SNPversity platform. Designed as an open-source visualization tool, SNPversity 2.0 enables users to explore extensive variant datasets with ease. Here's how it works:
- Input & Output: Users can input specific genomic intervals and select maize accessions of interest. SNPversity 2.0 processes this input to deliver a Variant Call Format (VCF) file and a detailed table. These outputs showcase the alleles matching the user's query, providing a clear and concise visualization of genetic variations.
- Technical Architecture: At its core, SNPversity 2.0 is powered by a robust HDF5 database back-end that manages variant annotations efficiently. A data exchange layer, developed in Python, facilitates seamless data handling, while a JavaScript-based interface layer presents Single Nucleotide Polymorphisms (SNPs) and Insertions/Deletions (INDELs) in an interactive format.
- Real-Time Visualization: SNPversity 2.0 elevates data presentation by displaying information in real-time directly in your web browser. Tables are intuitively color-coded, reflecting each SNP's allelic status and mutational state for straightforward interpretation.
- Enhanced Data Insights: Beyond basic variant data, SNPversity 2.0 enriches your research with valuable metadata. This includes variant effect annotations, mapping quality scores, genotypic coverage, and maximum R2 scores to indicate linkage disequilibrium, offering a comprehensive view of genetic variations.
- Genomic Dataset: This version leverages a whole-genome sequencing (WGS) dataset encompassing 1,525 publicly available maize accessions. These accessions are aligned to the B73 RefGen_v5 (Zm-B73-REFERENCE-NAM-5.0), ensuring a broad and relevant genetic foundation for analysis.
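The input and output step described above (a genomic interval plus a set of accessions in, a VCF out) can be illustrated with a small stand-alone sketch. This is not the SNPversity implementation, which queries an HDF5 back-end. It only shows the same kind of subsetting applied directly to VCF text, and the function names are hypothetical.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def subset_vcf_lines(lines, chrom, start, end, accessions):
    """Yield VCF lines restricted to chrom:start-end and the chosen sample columns."""
    keep = None  # column indexes to keep, set once the #CHROM header line is seen
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            wanted = set(accessions)
            # The first nine columns (CHROM to FORMAT) are fixed by the VCF format.
            samples = [i for i, name in enumerate(fields) if i >= 9 and name in wanted]
            keep = list(range(9)) + samples
            yield "\t".join(fields[i] for i in keep)
            continue
        if keep is None:
            raise ValueError("data line found before the #CHROM header line")
        if fields[0] == chrom and start <= int(fields[1]) <= end:
            yield "\t".join(fields[i] for i in keep)
```

A call such as `subset_vcf_lines(open_text("chr1_high_quality.vcf.gz"), "chr1", 100000, 200000, ["B73"])` would stream the matching records. The sample name is only an example, and scanning a multi-gigabyte file this way is slow compared with an indexed query.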
Code
For information about the code used in SNPversity 2.0 please refer to the GitHub page. This page provides detailed information about web compatibility, the directory structure, system requirements, and a comprehensive description of the code. It also offers guidance on how to use the websites, information about the pipelines used, and instructions on preparing data for PanEffect.
For more details, please visit our GitHub page: GitHub - SNPVersity 2.0.
MaizeGDB 2024 Dataset
A variant dataset for maize was generated using a diverse set of inbred lines, landraces, and teosintes from 1,525 public resequenced lines through a standardized variant-calling pipeline against version 5 of the B73 reference genome. The output was filtered for mapping quality, genotypic coverage, and linkage disequilibrium, and annotated based on variant effects relative to the B73 RefGen_v5 gene annotations. Two versions of the dataset are available. A high-coverage dataset consisting of ~230 million loci was filtered on mapping quality and genotypic coverage. A high-quality dataset of ~75 million loci had an additional filtering step based on high-confidence linkage disequilibrium. See the tables below for the projects used to build the dataset and a summary of the variant effect annotations.
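The two filtering tiers can be expressed as simple predicates. The sketch below uses the thresholds quoted in the 'Select options' dataset description on this page (mapping quality >= 30, genotypic coverage >= 50%, max R2 > 0.5). The function and constant names are hypothetical, and the actual pipeline may apply the filters differently.

```python
MIN_MAPPING_QUALITY = 30   # mapping quality >= 30
MIN_COVERAGE_PERCENT = 50  # genotypic coverage >= 50%
MIN_MAX_R2 = 0.5           # max R2 > 0.5, applied to the high-quality tier only


def passes_high_coverage(mq, coverage_percent):
    """Filter used for the high-coverage dataset."""
    return mq >= MIN_MAPPING_QUALITY and coverage_percent >= MIN_COVERAGE_PERCENT


def passes_high_quality(mq, coverage_percent, max_r2):
    """High-coverage filter plus the linkage disequilibrium filter."""
    return passes_high_coverage(mq, coverage_percent) and max_r2 > MIN_MAX_R2


def dataset_tier(mq, coverage_percent, max_r2):
    """Return the strictest tier a locus qualifies for."""
    if passes_high_quality(mq, coverage_percent, max_r2):
        return "high quality"
    if passes_high_coverage(mq, coverage_percent):
        return "high coverage"
    return "filtered out"
```

Because the high-quality filter only adds a condition, every high-quality locus also passes the high-coverage filter, which fits the smaller locus count reported for the high-quality dataset.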
The table shows the composition of public resequencing data used to build the dataset. The table lists the number of accessions from each NCBI bioproject. The last column shows the color code used in the table headers in the table view.
The VCF and HDF5 files can be downloaded from the MaizeGDB Box account.
Effect | High Coverage | High Quality |
intergenic | 216,128,332 | 69,236,257 |
5' UTR | 544,839 | 287,968 |
synonymous | 1,042,272 | 523,102 |
missense | 1,311,927 | 572,404 |
stop | 61,129 | 20,049 |
frameshift | 138,463 | 40,550 |
intron | 8,601,103 | 4,349,068 |
non-coding | 3,708 | 1,978 |
3' UTR | 801,323 | 468,603 |
other | 46,355 | 18,411 |
TOTAL | 228,679,451 | 75,518,390 |
MaizeGDB 2024 + NAM 2021 Dataset
The MaizeGDB 2024 datasets were merged with the ~36 million variant sites found in the NAM 2021 study. Only variants with data in both datasets are shown in the MaizeGDB 2024 + NAM 2021 Dataset. Two versions of the dataset are available. A high-coverage dataset consisting of ~36 million loci was filtered on mapping quality and genotypic coverage. A high-quality dataset of ~23 million loci had an additional filtering step based on high-confidence linkage disequilibrium. See the tables below for the projects used to build the dataset and a summary of the variant effect annotations.
The table shows the composition of public resequencing data used to build the dataset. The table lists the number of accessions from each NCBI bioproject. The last column shows the color code used in the table headers in the table view.
The VCF and HDF5 files can be downloaded from the MaizeGDB Box account.
Effect | High Coverage | High Quality |
intergenic | 32,561,459 | 20,257,551 |
5' UTR | 153,116 | 103,276 |
synonymous | 356,466 | 236,279 |
missense | 351,855 | 220,882 |
stop | 9,298 | 5,045 |
frameshift | 0 | 0 |
intron | 2,151,042 | 1,667,176 |
non-coding | 0 | 0 |
3' UTR | 245,275 | 179,511 |
other | 384 | 253 |
TOTAL | 35,828,895 | 22,669,973 |
Schnable 2023 Dataset
This dataset consists of resequencing data from 1,276 previously published and 239 newly resequenced maize samples, resulting in a unified marker set of approximately 366 million segregating variants and 46 million high-confidence variants. It covers genetic diversity across crop wild relatives, landraces, and breeding lines, offering enhanced power to identify known genes and track allele frequency changes in modern maize. The VCF files from this study were reformatted for SNPversity.
The table shows the composition of public resequencing data used to build the dataset. The table lists the number of accessions from each NCBI bioproject. The last column shows the color code used in the table headers in the table view.
The VCF and HDF5 files can be downloaded from the MaizeGDB Box account.
Effect | Imputed |
intergenic | 41,741,685 |
5' UTR | 197,854 |
synonymous | 447,334 |
missense | 412,221 |
stop | 11,210 |
frameshift | 20,787 |
intron | 2,897,450 |
non-coding | 1,111 |
3' UTR | 311,904 |
other | 12,719 |
TOTAL | 46,054,275 |
The table shows the counts for each of the variant effect types for the high-coverage and high-quality datasets.
How to use the website
There are four tabs in the SNPversity 2.0 website: 'Select options', 'Table view', 'Tree view', and 'Help'.
Select options
The two main inputs for this tool are selecting the genomic interval of interest and which subset of maize accessions to include.
Select genomic interval:
This section allows the user to select the genomic interval of interest. The two main options are entering the genomic coordinates or entering a gene model identifier.
- Genome Version: Select the reference genome version. The only option currently available is B73 version 5.
- Dataset: Select the variation dataset to query. The 'MaizeGDB High Quality' dataset was built on 1,525 diverse maize accessions and was filtered on mapping quality (>= 30), genotypic coverage (>= 50%), and linkage disequilibrium with max R2 > 0.5. There are approximately 75 million annotated loci in this dataset. The 'MaizeGDB High Coverage' dataset was built on 1,525 diverse maize accessions and was filtered only on mapping quality (>= 30) and genotypic coverage (>= 50%). There are approximately 230 million annotated loci in this dataset.
- Chromosome: Select the chromosome.
- Genome Start Position (bp): Select the start position on the chromosome.
- Genome End Position (bp): Select the end position on the chromosome.
- Loci per page: Select the number of loci to view per page in the HTML table view. This number corresponds to the number of rows in the table.
- Gene Model ID: Optionally, select the gene model identifier and the number of base pairs to add as padding to the start and end of the gene model coordinates. Pressing the 'Load coordinates' button will load the chromosome, start, and end positions if the gene model is found. This option currently only accepts B73 RefGen_v5 gene models.
(NOTE: Genomic regions larger than 1 Mb will only be available as VCF downloads. The table and tree views will not be available.)
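The padding and region-size rules above can be sketched in a few lines of Python. This is only an illustration: the function names, the inclusive span calculation, the clamping of the start position at 1, and the reading of "1 Mb" as 1,000,000 bp are all assumptions, not the website's actual implementation.

```python
# Hypothetical helpers; the site's real logic is not documented here.
MAX_VIEW_SPAN_BP = 1_000_000  # assumed interpretation of the "1 Mb" limit


def padded_interval(gene_start, gene_end, padding_bp=0):
    """Add padding to both ends of a gene model, keeping the start >= 1."""
    if gene_start > gene_end:
        raise ValueError("start position must not exceed end position")
    start = max(1, gene_start - padding_bp)
    end = gene_end + padding_bp
    return start, end


def views_available(start, end):
    """Return True when the interval is small enough for table/tree views."""
    span = end - start + 1  # inclusive coordinates (assumption)
    return span <= MAX_VIEW_SPAN_BP


start, end = padded_interval(5000, 8000, padding_bp=2000)
print(start, end, views_available(start, end))
```

Intervals that fail this check would, per the note above, only be retrievable as VCF downloads.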
Select which accessions to include:
This section allows the user to select a subset of the maize accessions to view. The three options are to upload a file with the accession IDs, use the buttons to randomly subsample the datasets, or use the checkboxes to manually select the maize accessions. A list of all maize accessions is available as an XLSX file.
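For users who prefer to prepare an accession list offline, a random subsample can be drawn with a short script such as the one below. The one-ID-per-line layout is an assumption; the upload format expected by the website is not specified here, and the function names are hypothetical.

```python
import random


def subsample_accessions(accession_ids, n, seed=None):
    """Draw n accession IDs without replacement; a seed makes it repeatable."""
    ids = list(accession_ids)
    if not 0 <= n <= len(ids):
        raise ValueError("n must be between 0 and the number of accessions")
    rng = random.Random(seed)
    return sorted(rng.sample(ids, n))


def format_id_file(ids):
    """Render IDs one per line (assumed upload layout)."""
    return "".join(f"{accession}\n" for accession in ids)


all_ids = [
    "CML228_SRR8906784", "CML69_SRR8906963", "B97_CRX445264",
    "CML322_CRX445267", "CML333_CRX445268", "CML52_SRR5725841",
]
print(format_id_file(subsample_accessions(all_ids, 3, seed=1)), end="")
```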
Table view
The table view option allows the user to download the VCF generated from the select options tab and displays a table of the data (for regions <= 1Mb). Each row of the data corresponds to a locus position in the dataset. The descriptions of the columns are in the following table.
Column name |
Abbreviation |
Definition |
Example data |
Chromosome |
CHR |
The chromosome where the locus is located |
chr1 |
Position |
POS |
The genomic coordinate on the chromosome |
104985 |
Reference allele |
REF |
The allele value for the locus in the reference genome B73. |
A |
Alternate allele |
ALT |
The alternative allele value found in other maize accessions |
T |
Gene models |
Gene model(s) |
The name of the B73 RefGen_v5 gene model affected by the variant. Displays the closest gene models when the variant is intergenic. |
Zm00001eb404830 |
Effect type |
Effect type |
The type of effect using Sequence Ontology terms. |
stop gained |
Effect impact |
Effect impact |
An estimation of putative impact/deleteriousness. |
HIGH MODIFIER |
Mapping quality score |
MQ |
The average mapping quality of reads supporting the variant. |
58 |
Completeness |
COMP |
The percent of accessions that provide genotype data for a particular variant (i.e., there is at least one read for that accession at the given variant). Note that this is different from read coverage. |
99 |
Maximum squared correlation |
max R2 |
The linkage disequilibrium measured by the maximum R2 for a given locus. |
0.64 |
The gene models in the Gene model(s) column are linked to the MaizeGDB B73 genome browser. The position of the locus is shown as a vertical line on the browser.
Variant effects were calculated using SnpEff.
Linkage disequilibrium values were calculated using PLINK.
For synonymous and missense variant effect types, the Effect type column also shows, in parentheses, the amino acid in the reference genome, the amino acid position, and the amino acid in the alternate allele (e.g. G477S). Missense variant effect types are linked to the Maize PanEffect tool.
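A substitution label such as G477S can be split into its three parts with a small regular expression. The sketch below assumes single-letter amino acid codes (with `*` for a stop codon) and is not taken from the website's code; `parse_substitution` is a hypothetical helper name.

```python
import re

# Reference residue, position, alternate residue, e.g. "G477S".
_SUBSTITUTION = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")


def parse_substitution(label):
    """Return (ref_aa, position, alt_aa), or None if the label does not match."""
    match = _SUBSTITUTION.match(label.strip())
    if match is None:
        return None
    ref_aa, position, alt_aa = match.groups()
    return ref_aa, int(position), alt_aa


ref_aa, position, alt_aa = parse_substitution("G477S")
# Identical residues indicate a synonymous change, different ones a missense change.
print(ref_aa, position, alt_aa, "missense" if ref_aa != alt_aa else "synonymous")
```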
The remainder of the columns are based on the subset of selected maize accessions. There is one column for each maize accession and the column header is color-coded based on the project. The columns are named based on the accession name, an underscore, and the SRR ID. The values and colors of the data in these columns are based on the allele value for the given locus for that accession.
- 0: Homozygous reference genotype.
- 1: Heterozygous genotype with one reference and one alternate allele.
- 2: Homozygous alternate genotype.
- N: Missing or unknown genotypes.
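The relationship between these table codes and the GT values in a downloaded VCF can be illustrated as follows. This is a sketch that assumes biallelic sites (only alleles 0 and 1); how the website itself treats multi-allelic or partially missing genotypes is not documented here.

```python
def genotype_code(gt):
    """Map a VCF GT string (e.g. '0/1') to the table codes 0, 1, 2, or N."""
    alleles = gt.replace("|", "/").split("/")  # accept phased or unphased calls
    if any(allele in (".", "") for allele in alleles):
        return "N"  # missing or unknown genotype
    alt_count = sum(allele != "0" for allele in alleles)
    if alt_count == 0:
        return "0"  # homozygous reference
    if alt_count == len(alleles):
        return "2"  # homozygous alternate (biallelic assumption)
    return "1"      # heterozygous


for gt in ("0/0", "0/1", "1/1", "./."):
    print(gt, "->", genotype_code(gt))
```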
Variant effect types
Seq. Ontology |
Effect |
Description |
Impact |
intergenic_region |
INTERGENIC |
The variant is in an intergenic region |
MODIFIER |
upstream_gene_variant |
UPSTREAM |
Upstream of a gene (default length: 5K bases) |
MODIFIER |
5_prime_UTR_variant |
UTR_5_PRIME |
Variant hits 5'UTR region |
MODIFIER |
coding_sequence_variant |
CDS |
The variant hits a CDS. |
MODIFIER |
exon_variant |
EXON |
The variant hits an exon (from a non-coding transcript) or a retained intron. |
MODIFIER |
intron_variant |
INTRON |
Variant hits an intron. Technically, hits no exon in the transcript. |
MODIFIER |
frameshift_variant |
FRAME_SHIFT |
Insertion or deletion causes a frame shift e.g.: an indel whose size is not a multiple of 3 |
HIGH |
missense_variant |
NON_SYNONYMOUS_CODING |
Variant causes a codon that produces a different amino acid e.g.: Tgg/Cgg, W/R |
MODERATE |
synonymous_variant |
SYNONYMOUS_CODING |
Variant causes a codon that produces the same amino acid e.g.: Ttg/Ctg, L/L |
LOW |
stop_lost |
STOP_LOST |
Variant causes stop codon to be mutated into a non-stop codon e.g.: Tga/Cga, */R |
HIGH |
stop_gained |
STOP_GAINED |
Variant causes a STOP codon e.g.: Cag/Tag, Q/* |
HIGH |
3_prime_UTR_variant |
UTR_3_PRIME |
Variant hits 3'UTR region |
MODIFIER |
downstream_gene_variant |
DOWNSTREAM |
Downstream of a gene (default length: 5K bases) |
MODIFIER |
Impact |
Meaning |
Example |
HIGH |
The variant is assumed to have high (disruptive) impact in the protein, probably causing protein truncation, loss of function or triggering nonsense mediated decay. |
stop_gained, frameshift_variant |
MODERATE |
A non-disruptive variant that might change protein effectiveness. |
missense_variant, inframe_deletion |
LOW |
Assumed to be mostly harmless or unlikely to change protein behavior. |
synonymous_variant |
MODIFIER |
Usually non-coding variants or variants affecting non-coding genes, where predictions are difficult or there is no evidence of impact. |
exon_variant, downstream_gene_variant |
To see a full list of the variant effect types click here.
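The two tables above can be combined into a simple lookup, for example to find the most severe impact among several effect types. The dictionary below copies only the rows listed in the tables; the ranking HIGH > MODERATE > LOW > MODIFIER and the helper names are assumptions made for illustration.

```python
# Effect type -> impact, transcribed from the tables above (not the full list).
IMPACT_BY_EFFECT = {
    "intergenic_region": "MODIFIER",
    "upstream_gene_variant": "MODIFIER",
    "5_prime_UTR_variant": "MODIFIER",
    "coding_sequence_variant": "MODIFIER",
    "exon_variant": "MODIFIER",
    "intron_variant": "MODIFIER",
    "frameshift_variant": "HIGH",
    "missense_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "stop_lost": "HIGH",
    "stop_gained": "HIGH",
    "3_prime_UTR_variant": "MODIFIER",
    "downstream_gene_variant": "MODIFIER",
}
SEVERITY = ("MODIFIER", "LOW", "MODERATE", "HIGH")  # assumed ascending order


def impact_of(effect):
    """Look up the impact; accepts 'stop gained' as well as 'stop_gained'."""
    return IMPACT_BY_EFFECT.get(effect.strip().replace(" ", "_"))


def most_severe(effects):
    """Return the highest-ranked impact among known effects, or None."""
    impacts = [impact for impact in map(impact_of, effects) if impact is not None]
    if not impacts:
        return None
    return max(impacts, key=SEVERITY.index)


print(most_severe(["synonymous_variant", "missense_variant", "intron_variant"]))
```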
VCF Download from Search
After completing a search based on a genomic region and a subset of accessions, a "Download the VCF file" button will be available in the "Table view" tab. Clicking this button downloads a VCF file containing the variants that meet the search criteria. The VCF includes only the selected maize accessions and every variant within the selected genomic region. As a result, the VCF can contain rows in which none of the selected accessions differs from the reference. The genotype coverage fields (CVC and CVP) refer to the number of accessions with genotype data for a particular variant based on all 1,525 accessions, not the local genotype coverage of the selected accessions.
Below is a sample of an output VCF file generated from SNPversity 2.0:
##fileformat=VCFv4.2
##fileDate=20240531
##source=MaizeGDB2024
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS mapping quality">
##INFO=<ID=CVC,Number=1,Type=Integer,Description="The number of accessions that have genotype data for a particular variant">
##INFO=<ID=CVP,Number=1,Type=Float,Description="The percent of accessions that have genotype data for a particular variant.">
##INFO=<ID=TYPE,Number=.,Type=String,Description="The type of effect using Sequence Ontology terms">
##INFO=<ID=EFFECT,Number=.,Type=String,Description="An estimation of putative impact/deleteriousness">
##INFO=<ID=GENEMODEL,Number=.,Type=String,Description="The name of the gene model affected by the variant">
##INFO=<ID=SUB,Number=.,Type=String,Description="The amino acid substitution for missense and non-synonymous variants">
##INFO=<ID=MAXR2,Number=1,Type=Float,Description="The maximum R2 for a given loci">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CML228_SRR8906784 CML69_SRR8906963 B97_CRX445264 CML322_CRX445267 CML333_CRX445268 CML52_SRR5725841 CML103_SRR5976229 M37W_SRR5976317
chr10 218441 . T C 5127.59 . MQ=59.89;CVC=1481;CVP=98.87;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.910833 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218443 . G A 5127.84 . MQ=60;CVC=1484;CVP=99.07;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.910835 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218458 . T C 1656.67 . MQ=53.05;CVC=1484;CVP=99.07;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.666034 GT 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218498 . T A 1715.17 . MQ=52.05;CVC=1465;CVP=97.80;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.916933 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
chr10 218502 . C CTCTGTCTG 1676.75 . MQ=47.9;CVC=1444;CVP=96.40;TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;SUB=NA;MAXR2=0.916923 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0
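A data line like those in the sample above can be read with plain Python. The sketch below splits on any whitespace so that it works on both the tab-delimited file and the sample as displayed here. The threshold check simply mirrors the filters described for the 'MaizeGDB High Quality' dataset (MQ >= 30, CVP >= 50, MAXR2 > 0.5); it is an illustration with hypothetical function names, not the pipeline used to build the datasets.

```python
def parse_info(info):
    """Split an INFO column such as 'MQ=60;CVC=1484' into a dictionary."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flag-style keys carry no value
    return fields


def parse_record(line):
    """Parse one VCF data line into its main columns and sample genotypes."""
    cols = line.split()
    return {
        "chrom": cols[0],
        "pos": int(cols[1]),
        "ref": cols[3],
        "alt": cols[4],
        "info": parse_info(cols[7]),
        "genotypes": cols[9:],  # one GT value per selected accession
    }


def passes_thresholds(info, min_mq=30.0, min_cvp=50.0, min_r2=0.5):
    """Apply the documented thresholds; missing or 'NA' values fail the check."""
    try:
        mq, cvp, r2 = float(info["MQ"]), float(info["CVP"]), float(info["MAXR2"])
    except (KeyError, ValueError, TypeError):
        return False
    return mq >= min_mq and cvp >= min_cvp and r2 > min_r2


line = ("chr10 218441 . T C 5127.59 . MQ=59.89;CVC=1481;CVP=98.87;"
        "TYPE=3_prime_UTR_variant;EFFECT=MODIFIER;GENEMODEL=Zm00001eb404760;"
        "SUB=NA;MAXR2=0.910833 GT 0/0 1/1 0/0 0/0 0/0 0/0 0/0 0/0")
record = parse_record(line)
print(record["chrom"], record["pos"], passes_thresholds(record["info"]))
```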
Tree view
The tree view option allows the user to generate phylogenetic tree views based on the VCF2PopTree tool. The input to build the tree is based on the VCF file generated from the 'Select options' tab.
The tree can be constructed as either a UPGMA tree or a Neighbour-Joining tree (Unrooted). The drawing options include Rectangular tree or Radial tree. In addition, the trees can be saved in the following text formats: Newick tree, Pair-wise diversity (MEGA), or PHYLIP.
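To give a feel for what UPGMA clustering and Newick output involve, the sketch below builds a topology-only tree from per-accession genotype codes (0, 1, 2, N). It is not VCF2PopTree's implementation: the distance measure (mean difference in alternate-allele count over loci called in both accessions), the omission of branch lengths, and all function names are assumptions chosen to keep the example short.

```python
from itertools import combinations


def pairwise_distance(a, b):
    """Mean per-locus difference in alternate-allele count, scaled to 0..1."""
    pairs = [(int(x), int(y)) for x, y in zip(a, b) if x != "N" and y != "N"]
    if not pairs:
        return 0.0  # no shared calls; treated as zero distance (assumption)
    return sum(abs(x - y) for x, y in pairs) / (2 * len(pairs))


def upgma_newick(genotypes):
    """Cluster accessions by UPGMA and return a Newick string without lengths.

    genotypes maps accession name -> list of codes; names must be unique
    and at least one accession is required.
    """
    clusters = {name: 1 for name in genotypes}  # cluster label -> size
    dist = {
        frozenset((a, b)): pairwise_distance(genotypes[a], genotypes[b])
        for a, b in combinations(genotypes, 2)
    }
    while len(clusters) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair of clusters
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Size-weighted average distance from the merged cluster to the rest.
        updated = {
            other: (size_a * dist[frozenset((a, other))]
                    + size_b * dist[frozenset((b, other))]) / (size_a + size_b)
            for other in clusters
        }
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for other, value in updated.items():
            dist[frozenset((merged, other))] = value
        clusters[merged] = size_a + size_b
    (label,) = clusters
    return label + ";"


example = {
    "A": ["0", "0", "0", "0"],
    "B": ["0", "0", "0", "2"],
    "C": ["2", "2", "2", "2"],
}
print(upgma_newick(example))
```

Accessions A and B differ at only one locus, so they are joined first and C attaches to that pair.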
Downloads
The Downloads tab provides detailed information about the datasets available for download, along with guidelines and best practices.
Help
The help page gives an overview of the webpage, descriptions of the datasets and methods, and how to use the website.