DNA methylation, the covalent addition of a methyl group to cytosine, has critical roles in gene regulation and in modifying transcription factor binding affinity. Its role in gene silencing and genomic imprinting is also well studied1,2. Genomic methylation (mC) occurs primarily on the approximately 1 billion cytosines in the genome, almost exclusively in the context of cytosine-guanine dinucleotides (CG) for most cell types1. DNA methylation is correlated with gene expression3,4 and reflects cellular identity5. DNA methylation has also been linked to neurodevelopmental disorders in the human frontal cortex6. Notably, methylation also occurs at non-CG dinucleotides, where it is referred to as CH methylation (mCH, H = adenine, thymine, or cytosine). This occurs at high levels in embryonic stem cells and mature neurons, though at different trinucleotide patterns, namely CAG for stem cells and CAC for neurons1,7. In mature neurons, the amount of mCH exceeds that of mCG, a transition that occurs during synaptogenesis, roughly four weeks after birth in mice or two years after birth in humans4,8,9. Remarkably, gene body mCH levels in neurons correlate strongly and negatively with gene expression and are useful for cell type identification10. In bulk methylation profiling of cortical organoids, Luo et al. captured the shift in dominant mCH context from CAG to CAC as neuroepithelial cells differentiated into mature neurons, suggesting a point of methylome transition from stem-like to neuronal-like8. This provides both a model system and a key time-point for future analyses of mCH levels and their regulation11. Organoid methylomes were observed to differ from those of fetal cortex. These differences manifest as differential methylation across extracellular matrix genes (possibly due to the inclusion of Matrigel in culture) and hypomethylation around pericentromeric regions (a phenomenon previously reported for induced pluripotent stem cells)11.
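The CG and CH contexts described above can be made concrete with a minimal sketch. The function names (`cytosine_context`, `trinucleotide`) are hypothetical and chosen only for illustration. The sketch assumes a single-strand sequence read 5' to 3' and ignores the reverse strand.

```python
def cytosine_context(seq, i):
    """Classify the cytosine at index i as 'CG' or 'CH' (H = A, T, or C).

    Returns None if position i is not a cytosine, if there is no
    downstream base, or if the downstream base is ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    downstream = seq[i + 1]
    if downstream == "G":
        return "CG"
    if downstream in ("A", "T", "C"):
        return "CH"
    return None


def trinucleotide(seq, i):
    """Return the trinucleotide starting at index i (e.g. 'CAG' or 'CAC')."""
    tri = seq[i:i + 3].upper()
    return tri if len(tri) == 3 else None
```

With this, the stem-cell-associated CAG and neuron-associated CAC patterns are both classified as CH and differ only in the third base of the trinucleotide.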
In native methylation dynamics, removal of mC is initiated by the Tet family of mC hydroxylase proteins, which progressively oxidize the methyl moiety to hydroxymethyl, formyl, and carboxyl forms. Hydroxymethylation (5hmC) occurs almost exclusively in the CG context and accumulates in mature neurons. In neurons, 5hmC is known to be enriched near the promoter regions of constitutively expressed genes4. To this day, the role of 5hmC is understudied, likely because the most commonly used assay for methylation, bisulfite conversion, cannot distinguish mC from 5hmC. Two reports using alternative assays demonstrate a ratio of 5hmC to mC of 30-50% in mature excitatory neurons4,12. Alternatively, new enzymatic methods have been described in which APOBEC3A, a natively expressed deaminase, directly deaminates cytosine in an in vitro reaction13. To date, this method has not been published as a single-cell protocol; however, it is a promising candidate for adaptation to assay the understudied 5hmC moiety12.
Genome-wide methylation profiling is achieved by the selective mutation of non-methylated cytosines. Sodium bisulfite is applied to genomic DNA, where it deaminates non-methylated cytosine to uracil through a three-step reaction. Importantly, uracil pairs with adenine, which means subsequent library amplification will report non-methylated cytosines as thymine. From these point mutations relative to the reference genome, namely thymines where cytosines were expected, methylation profiles can be inferred (bisulfite sequencing, BS-seq)14. The first reported protocol for single-cell methylation was scRRBS (single-cell reduced representation bisulfite sequencing, Figure 12a). This method uses a methylation-insensitive restriction enzyme (MspI) to digest genomic DNA prior to bisulfite conversion. MspI enriches for CG-rich regions across the genome via its cut site (5’C|CGG). The resulting sticky ends, enriched at CG-rich regions of the genome, are then adapter ligated, the DNA is bisulfite converted, and sequencing libraries are prepared15.
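The inference step described above (a thymine where the reference has a cytosine implies an unmethylated cytosine) can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions: the read comes from the same strand as the reference, it aligns without gaps, conversion is complete, and there are no sequencing errors or true C-to-T variants. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite conversion followed by amplification.

    Unmethylated C is read out as T (via uracil); methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Infer methylation at each reference cytosine from a converted read.

    Returns {position: True if methylated, False if unmethylated}.
    Positions where the read shows neither C nor T are skipped.
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True       # protected from conversion: methylated
        elif read_base == "T":
            calls[i] = False      # converted: unmethylated
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False}
```

Only the cytosine at position 1 was marked as methylated, so it is the only cytosine that survives conversion and is called as methylated.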
Bisulfite treatment is harsh and fragments genomic DNA. This is of high concern when scaling the assay to single-cell resolution. To avoid heavy losses of genomic coverage, post-bisulfite adapter tagging (PBAT) is used. In PBAT, the library adapters necessary for PCR and sequencing are added to genomic DNA after BS conversion (Figure 11b)5,10,16. In this order of events, BS conversion fragments the genome and denatures the DNA to a single-stranded state before, rather than after, adapters are attached. Single-cell PBAT strategies such as scBS-seq introduce adapters after conversion through random priming, similar to the single-cell whole-genome amplification method DOP-PCR16,17. Secondary adapters are then added and libraries can be sequenced. An alternative approach, single-nucleus methylome sequencing (snmC-seq), uses a blunt-end adapter tagging strategy (Figure 12c)10. Cells are fully lysed by the bisulfite conversion chemical reaction, making this protocol difficult, but not impossible, to adapt to higher cell count strategies. I detail a new method for high-throughput single-cell methylome library generation (sci-MET). In this method, I use custom sequencing adapters and indexes depleted in cytosines. The lack of cytosines prevents BS conversion from altering the indexes, allowing for the split-pool indexing necessary for sci- chemistry (Figure 12d).
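The rationale for cytosine-depleted indexes can be sketched in a few lines. The index sequences below are enumerated purely for illustration and are not the actual sci-MET adapter or index sequences. The sketch assumes the index oligonucleotides are unmethylated, so every cytosine in them would be converted.

```python
from itertools import product


def bisulfite_convert_unmethylated(seq):
    """Read-out of a fully unmethylated sequence after BS conversion (C -> T)."""
    return seq.replace("C", "T")


def cytosine_free_indexes(length):
    """Enumerate every index of the given length built from A, G, and T only."""
    return ["".join(bases) for bases in product("AGT", repeat=length)]


def survives_conversion(index):
    """True if the index reads identically before and after BS conversion."""
    return bisulfite_convert_unmethylated(index) == index


# A cytosine-containing index collides with a different index after conversion.
print(bisulfite_convert_unmethylated("ACGT"))  # ATGT
print(survives_conversion("ACGT"))             # False

# Cytosine-free indexes are unchanged, so they remain distinguishable.
indexes = cytosine_free_indexes(4)
print(len(indexes))                                   # 81
print(all(survives_conversion(ix) for ix in indexes)) # True
```

Restricting to three bases shrinks the available index space from 4^n to 3^n, so a practical design would likely need longer indexes to keep enough distance between them. That trade-off is an inference from the sketch, not a statement about the published design.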
Analysis of single-cell methylation profiles leverages the point mutations induced through BS conversion. These mutations reduce sequence complexity and can make reference alignment difficult. To account for this, special considerations must be taken. In one approach, the tool Bismark18 prepares in silico bisulfite-converted versions of the reference genome (C-to-T and G-to-A) so that reads originating from each of the four possible bisulfite-converted strands can be aligned with the short-read aligner Bowtie19. From this, base-specific methylation of cytosines can be ascertained. Alignments in which more than 70% of non-CG cytosines are reported as methylated are generally removed from analysis, as this suggests a read-specific failure of bisulfite conversion10. Following filtering, methylation rates (% methylated CG/all CG) are generated across genomic bins and used for dimensionality reduction and clustering. To account for depth of coverage, some strategies apply a post-hoc probabilistic binomial model, wherein region methylation rates are weighted by coverage16. Notably, for neuronal data, CH methylation rates perform better than CG methylation rates for discriminating cell types10. Differentially methylated regions have been implicated as diagnostic biomarkers20, and can be calculated between cellular clusters via two-sided t-tests (Figure 13)21. High-throughput single-cell methods will allow for exploratory analyses of methylome changes across complex systems such as neurodevelopment or tumor progression. It is with this motivation in mind that we developed sci-MET.
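Three of the analysis steps above (in silico conversion for alignment, the read-level conversion filter, and binned methylation rates) can be sketched as follows. This is an illustrative outline, not the code of Bismark or of any published pipeline. The function names, the input formats, and the 100 kb default bin size are assumptions made for the example. Only the 70% threshold comes from the text.

```python
from collections import defaultdict


def in_silico_convert(seq, mode="CT"):
    """Fully convert a sequence in silico so reads and reference share a
    three-letter alphabet: 'CT' maps C to T, 'GA' maps G to A."""
    seq = seq.upper()
    if mode == "CT":
        return seq.replace("C", "T")
    if mode == "GA":
        return seq.replace("G", "A")
    raise ValueError("mode must be 'CT' or 'GA'")


def passes_conversion_filter(ch_calls, max_ch_rate=0.70):
    """Keep a read unless more than max_ch_rate of its non-CG cytosines are
    called methylated, which suggests failed bisulfite conversion.

    ch_calls: list of booleans, one per non-CG cytosine (True = methylated).
    Reads with no non-CG cytosines are kept.
    """
    if not ch_calls:
        return True
    return sum(ch_calls) / len(ch_calls) <= max_ch_rate


def bin_methylation_rates(cg_calls, bin_size=100_000):
    """Summarize CG calls into fixed-width genomic bins.

    cg_calls: iterable of (position, is_methylated) on one chromosome.
    Returns {bin_start: (methylation_rate, coverage)}; coverage is kept so a
    downstream model can weight each bin by how many sites support it.
    """
    counts = defaultdict(lambda: [0, 0])  # bin_start -> [methylated, total]
    for position, is_methylated in cg_calls:
        bin_start = (position // bin_size) * bin_size
        counts[bin_start][0] += int(is_methylated)
        counts[bin_start][1] += 1
    return {b: (m / n, n) for b, (m, n) in counts.items()}


print(in_silico_convert("ACGT", "CT"))                      # ATGT
print(passes_conversion_filter([True] * 8 + [False] * 2))   # False
print(bin_methylation_rates([(10, True), (20, False), (150_000, True)]))
# {0: (0.5, 2), 100000: (1.0, 1)}
```

The same binning function could be applied to CH calls instead of CG calls, which matters for neuronal data where CH methylation rates are reported to discriminate cell types better.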