This repository contains the code and data for the analysis of Cannabis samples for the manuscript titled "Cannabis labelling is associated with genetic variation in terpene synthase genes". This project involved the analysis of terpene and cannabinoid data from a previously published paper (Hazekamp et al. 2016) and a newly generated SNP dataset from Cannabis samples collected across the Netherlands.
The genetic data used in the genetic analysis can be found on DRYAD:
- 20191209_bedrocan_gen_filtered.map
- 20191209_bedrocan_gen_filtered.ped
- 20191209_bedrocan_gen_filtered.nosex
- 20191209_bedrocan_gen_filtered_pruned.raw
- 20191209_bedrocan_gen_filtered.mdist
- 20191209_bedrocan_gen_filtered.mdist.id
- 20191209_bedrocan_gen_filtered.genome
- 20191209_bedrocan_gen_filtered.hmp.txt
- 202008011_bedrocan_gen_filtered.raw
- 2020080607_bedro_kinship.txt
Please note that the genetic data files generated in this study have chromosomes numbered based on the old numbering system. Within the 'cannabis GWAS' source code the chromosomes are renumbered for presentation in the Manhattan plots according to the new chromosome numbering system for the CBDRx reference genome that was adopted in 2020 (https://www.ncbi.nlm.nih.gov/assembly/GCF_900626175.2). All resulting figures from the GWAS follow the new numbering system, however, the raw genetic files still reflect the old numbering system and should be recoded if used by others in the future.
During the review process some chemical names from the Hazekamp et al. 2016 paper were renamed, these changes are recorded in Supplementary Table 3 and described within the methods of the manuscript.
170428 Bedrocan_chem_sample mg per g.xlsx Raw chemical data file that contains terpenes and cannabinoid concentrations across 297 samples.
short_chem_names.txt Table with the short hand names for the chemical compounds.
sesqui_genes.xlsx Excel file containing annotations and bp positions for genes within the regions of interest on chromosome 6.
mono_genes.xlsx Excel file containing annotations and bp positions for genes within the regions of interest on chromosome 5.
cannabis_data_cleanup.Rmd: code used to clean up the chemical data.
cannabis_analysis.Rmd: code used to do the analyses for PCA and chemical correlations with label types.
cannabis_supplements.Rmd code used to do the supplemental analyses.
cannabis_simple_m.Rmd code to calculate the effective number fo independant tests for multiple test correction of the GWAS results.
cannabis_gwas.Rmd code for running the standard (EMMAX) and multi-locus mixed model (MLMM) gwas.
cannabis_zoom_ld.Rmd code for creating zoom in figures of GWAS resutls with LD heatmaps.
chem_data.csv Full chemical data for 297 samples that has been cleaned up.
gen_chem_data.csv Chemical data for 137 samples that also have genetic data.
pc_chem_load.R Chemical PC loadings 1-10 from 297 samples along with Sativa-indica labels.
pc_geno_load.R Genetic PC loadings 1-10 from 137 samples along with Sativa-indica labels.
chem_lm_tab.txt output statistics from the correlation label type with chemical concentration.
heatmap_pvals.csv P-values from Pearson correlations between chemical concentrations.
heatmap_r.csv Estimates (r) from Pearson correlations between chemical concentrations.
gwas_pvals_emmax.csv P-values from the standard emmax gwas with un-normalized chemical data.
gwas_pvals_mlmm.csv P-values from the MLMM gwas with un-normalized chemical data.
snp_r2 a sub directory that contains the outputs for the variance explained by the top SNPs included as co-factors in the mlmm GWAS with the un-normalized chemical data.
gwas_pvals_emmax_norm.csv P-values from the standard emmax gwas with normalized chemical data.
gwas_pvals_mlmm_norm.csv P-values from the MLMM gwas with normalized chemical data.
snp_r2_norm a sub directory that contains the outputs for the variance explained by the top SNPs included as co-factors in the mlmm GWAS with the normalized chemical data.
This directory contains the figures created in from the scripts above.
gwas is a sub directory that contains:
- mlmm_results a pdf with MLM and MLMM GWAS plots that used un-normalized chemical data.
- mlmm_results_norm a pdf with MLM and MLMM GWAS plots that used normalized chemical data.
This directory contains the final main figures presented in the manuscript.
This directory contains the supplementary figures and files.