Standardization of cell-free methylated DNA immunoprecipitation (cfMeDIP) results for population-scale inference

Author(s): Nicholas Cheng, Althaf Singhawansa, Sasha Main, Ming Han, Tim Triche, Michael M. Hoffman, Samantha L Wilson, Daniel de Carvalho, Emma Bell

Affiliation(s): University Health Network, Toronto, ON, CA



Minimally invasive diagnostic procedures using small quantities of biofluids have rapidly gained attention because of their low risk and high information yield. Traditional tissue biopsies not only carry a higher risk of infection, pain, and bleeding, but also sample a mixture of dormant and active diseased tissue that may or may not represent the cells of interest. Liquid biopsies sidestep both of these hazards and further enable serial monitoring of diseased tissue, subclonal composition, and full-body sequelae. For example, in an individual who has received a stem cell transplant, serial profiling can not only assess engraftment and monitor for disease relapse but also detect free nucleic acids from unexpected tissues, which can herald graft-vs-host disease and allow early prophylactic intervention against endothelial injury. In 2018, Shen and colleagues published a highly sensitive method for cell-free methylated DNA immunoprecipitation, suitable for template DNA inputs as low as 10 ng. By assessing not only DNA but also its modifications (specifically, clustered 5-methylcytosine), the method, cell-free methylated DNA immunoprecipitation sequencing (cfMeDIP-seq), provides additional information on the tissue of origin of cell-free DNA (cfDNA). To date, at least half a dozen variations on the Shen et al. protocol exist. The impact of these methodological inconsistencies on the resulting libraries is unknown, and the effect of different strategies for accounting for them during data processing and analysis is unclear. Better-established genomics methods incorporate synthetic spike-in controls to reduce technical variation. Wilson et al. recently published synthetic spike-in controls designed for cfMeDIP-seq experiments, which represents a major update to the cfMeDIP-seq method.
Thus, our data processing and analysis strategies must also be updated, and the data must be harmonized to integrate recent whole-genome bisulfite sequencing and long-read results with the substantially larger body of existing cfMeDIP results. Here, we report recommendations for cfMeDIP-seq library preparation, sequencing read processing, and data analysis. We address minimising batch effects, reproducible DNA methylation calling, quality control thresholds, and genomic regions of concern. In tandem, we present a step-by-step workflow, MEDIPIPE, and integration with the BSgenome and spiky R packages to reduce the method's steep learning curve. Adopting these techniques and metrics allows the already broad scope of cfMeDIP profiling to be extended and standardized across laboratories, disease indications, and centers, maximizing the value of this sensitive and inexpensive assay as well as of data from existing cohorts of rare disease subjects.
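To make the role of spike-in controls in reducing technical variation more concrete, the following is a minimal Python sketch of one generic approach: global scaling so that every sample has the same effective spike-in depth. It assumes that each sample received an equal amount of spike-in DNA and that per-window cfMeDIP read counts are already available. The function names and toy counts are hypothetical, and this is not the spiky package's API or the normalization described by Wilson et al.

```python
"""Minimal sketch of spike-in scaling normalization.

Hypothetical helpers for illustration only; not the spiky R package API.
"""
from typing import Dict, List


def spike_in_scale_factors(spike_counts: Dict[str, int]) -> Dict[str, float]:
    """Return a per-sample factor that equalizes total spike-in reads.

    Assumption: every sample received the same amount of spike-in DNA, so any
    difference in spike-in read counts is treated as technical variation
    (for example, sequencing depth or capture efficiency).
    """
    if any(count <= 0 for count in spike_counts.values()):
        raise ValueError("every sample needs a positive spike-in read count")
    # Reference depth: mean spike-in read count across all samples.
    reference = sum(spike_counts.values()) / len(spike_counts)
    return {sample: reference / count for sample, count in spike_counts.items()}


def normalize(
    window_counts: Dict[str, List[int]], spike_counts: Dict[str, int]
) -> Dict[str, List[float]]:
    """Scale each sample's per-window read counts by its spike-in factor."""
    factors = spike_in_scale_factors(spike_counts)
    return {
        sample: [count * factors[sample] for count in counts]
        for sample, counts in window_counts.items()
    }


if __name__ == "__main__":
    # Toy data: sample B carries the same signal as A but was sequenced twice as deeply.
    windows = {"A": [10, 40, 0], "B": [20, 80, 0]}
    spikes = {"A": 500, "B": 1000}
    print(normalize(windows, spikes))  # {'A': [15.0, 60.0, 0.0], 'B': [15.0, 60.0, 0.0]}
```

Because the two toy samples differ only in sequencing depth, their per-window counts agree after scaling. A single global factor is the simplest possible design; a spike-in model may also need to account for fragment properties such as length or CpG density, which this sketch ignores.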