Synthetic reversed sequences reveal default genomic states

Design of synthetic loci

The synthetic HPRT1 locus has been described previously¹⁸. The synthetic HPRT1R locus was designed by reversing (but not reverse-complementing) the sequence of the human HPRT1 locus corresponding to hg38 chromosome X:134429208-134529874. HPRT1R^noCpG was designed starting with the HPRT1R sequence, using a Python script to scan the sequence for occurrences of CG and randomly delete either the C or the G. As this sequence transformation can result in the formation of new CG instances, the script was reiterated until no CG sequences remained. We used software developed in house to split the synthetic loci into smaller DNA segments for commercial DNA synthesis. HPRT1R was split into 28 segments, 27 of ~4 kb and one of ~2 kb, and HPRT1R^noCpG was split into 36 segments, 35 of ~3 kb and one of 1,300 bp. Each synthetic segment had overlaps of ~300 bp, in both termini, with the neighbouring segments. MenDEL⁶⁹ was used to design primers for junction PCR screening of yeast clones harbouring the correct assembly. Synthetic DNA segments were ordered from Qinglan Biotech, and junction PCR primers were ordered from IDT.
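The two sequence transformations described above (reversal without complementing, then repeated CpG removal) can be sketched as follows. This is a minimal illustration, not the script used in the study; the function names and the seeded random generator are assumptions made for the example.

```python
import random


def reverse_sequence(seq: str) -> str:
    """Reverse a DNA sequence without complementing it."""
    return seq[::-1]


def remove_cpg(seq: str, rng: random.Random) -> str:
    """Randomly delete the C or the G of every CG, repeating until none remain."""
    seq = seq.upper()
    # A deletion can create a new CG (e.g. CCGG -> CGG), so repeat the scan.
    while "CG" in seq:
        kept = []
        i = 0
        while i < len(seq):
            if seq[i:i + 2] == "CG":
                # Keeping one base of the pair is the same as deleting the other.
                kept.append(rng.choice("CG"))
                i += 2
            else:
                kept.append(seq[i])
                i += 1
        seq = "".join(kept)
    return seq
```

Each pass shortens the sequence by one base per CG found, so the loop always terminates.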

Synthetic loci sequence features

Dinucleotides were counted across each synthetic locus. Expected CpG number was calculated as (no. of C × no. of G)/sequence length and CpG ratio was calculated as observed CpG/expected CpG. Yeast TFBSs were predicted by scanning the DNA sequences with the YEASTRACT+ database⁶⁵. Mouse TFBSs were predicted using FIMO⁶⁶ in the MEME suite using the JASPAR vertebrate motif database⁶⁷.
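The dinucleotide count and the CpG observed/expected ratio defined above can be written as a short sketch. The function names are illustrative, and the handling of a sequence with no C or no G (returning 0.0) is an assumption rather than something stated in the text.

```python
from collections import Counter


def count_dinucleotides(seq: str) -> Counter:
    """Count all overlapping dinucleotides in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def cpg_ratio(seq: str) -> float:
    """Observed CpG / expected CpG, with expected = (no. of C x no. of G) / length."""
    seq = seq.upper()
    observed = count_dinucleotides(seq)["CG"]
    expected = seq.count("C") * seq.count("G") / len(seq)
    # Assumption: report 0.0 when the expected count is zero.
    return observed / expected if expected else 0.0
```

For example, ACGCGT has two C, two G and two CG over 6 bp, so the expected count is 4/6 and the ratio is 3.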

Yeast assembly and BAC recovery

All yeast work was performed starting with the parental strain BY4741 using standard yeast media. HPRT1R was assembled from 28 synthetic DNA segments, first as two half-assemblies that were then combined using eSwAP-In¹⁸. HPRT1R^noCpG was assembled from 36 synthetic segments in one step. For both HPRT1R and HPRT1R^noCpG assemblies, ~50 ng each of linearized and gel-purified yeast assembly vector (YAV) (pLM1110 (ref. ¹⁷), Addgene #168460) backbone DNA and purified assembly fragments were transformed into yeast using the high-efficiency lithium acetate method⁷⁰. Transformants were plated on synthetic complete media lacking uracil or leucine (SC–Ura, SC–Leu) depending on the selectable marker (URA3 for HPRT1R segments 1–15 half-assembly, and LEU2 for HPRT1R segments 15–28 half-assembly and for HPRT1R^noCpG full assembly). Successful assemblies were screened by junction quantitative PCR (qPCR) on crude yeast genomic DNA (gDNA) prepared from 48 colonies from each assembly transformation. Crude yeast gDNA was prepared by performing three cycles of boiling in 20 mM NaOH at 98 °C for 3 min, followed by cooling at 4 °C for 1 min. Junction qPCRs were set up using an Echo 650 liquid handler (Labcyte) by dispensing 20 nl crude gDNA and 10 nl premixed junction primer pairs (50 µM) into a LightCycler 1536 Multiwell Plate (Roche 05358639001) containing 1 µl 1× LightCycler 1536 DNA Green mix (Roche 05573092001). qPCR reactions were performed using a LightCycler 1536 Instrument (Roche 05334276001) and successful assemblies were identified based on positive results for all junctions, defined as having a C_t value lower than 30 (with exceptions for primer pairs determined to be consistently poor). Candidate assemblies were verified by next-generation sequencing.
Libraries were prepared from 100 ng of DNA using the NEBNext Ultra II FS DNA Library Prep Kit for Illumina (NEB E7805L) with NEBNext Multiplex Oligos for Illumina (E7600S), according to the manufacturer’s protocol for FS DNA Library Prep Kit with Inputs ≤100 ng. Sequencing reactions were run on a NextSeq 500 system (Illumina SY-415-1001). Sequence-verified assemblons were recovered from yeast using the Zymoprep Yeast Miniprep I kit (Zymo Research D2001) and electroporated into TransforMax EPI300 Electrocompetent E. coli (Lucigen EC300150), recovered in LB + 5 mM MgCl₂ at 30 °C for 1 h and then selected on LB + kanamycin agar plates. Bacteria colonies were screened by colony PCR for one or two assembly junctions to confirm that they contained the assemblon, then assemblon DNA was isolated from overnight cultures using ZR BAC DNA Miniprep kit (Zymo Research D4048) and verified by next-generation sequencing. eSwAP-In¹⁸ was used to combine the two HPRT1R half-assemblies. The sequence-verified assembly of segments 15–28 was purified from E. coli and digested with I-SceI and NotI to release the HPRT1R portion along with the LEU2 marker. This digested segment was transformed into yeast harbouring the assemblon with segments 1–15, along with a Cas9–guide RNA (gRNA) expression vector, pYTK-Cas9 (ref. ⁷¹), with a URA3-targeting gRNA. The Cas9-induced break in the URA3 marker was repaired with the HPRT1R-15–28-LEU2 segment using homology provided by the common segment 15 and common sequence downstream of the selection markers. eSwAP-In transformants were selected on SC–Leu and colonies were picked to screen by junction PCR using a subset of primers spanning the entire locus. Candidate clones were verified by next-generation sequencing and recovered into E. coli as previously described.
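The junction qPCR criterion described above (every junction positive, with positive defined as a C_t below 30, excepting consistently poor primer pairs) amounts to a simple filter. The sketch below is illustrative only; the data layout, the function name and the treatment of missing C_t values as failures are assumptions.

```python
def passes_junction_screen(ct_by_junction, ct_cutoff=30.0, excluded=frozenset()):
    """Return True if every non-excluded junction has a C_t below the cutoff.

    ct_by_junction maps a junction name to its C_t value, or to None when no
    amplification was detected (assumed here to count as a failure).
    excluded holds junctions whose primer pairs are known to perform poorly.
    """
    return all(
        ct is not None and ct < ct_cutoff
        for junction, ct in ct_by_junction.items()
        if junction not in excluded
    )
```

A colony with one late junction fails the screen unless that junction is on the excluded list.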

The HPRT1 locus was transplanted from its original assembly vector¹⁸ by restriction digestion of purified assemblon DNA with NotI and NruI to release the HPRT1 locus, followed by co-transformation of the digested locus (~1.5 μg) along with the new, linearized, pLM1110 assembly vector (~100 ng) and linker DNAs that included loxP and loxM sites flanked by 200 bp of homology to the assembly vector and HPRT1 locus (~50 ng each). Forty-eight colonies were picked following transformation and selection and crude yeast gDNA was screened by PCR using primers spanning the vector-HPRT1 junctions. Candidate clones were verified by next-generation sequencing and recovered into E. coli as described above.

Assemblons were recovered from TransforMax EPI300 E. coli for delivery to mouse ES cells. Cultures of 250 ml were grown at 30 °C with shaking overnight in LB + kanamycin + 0.04% arabinose to induce copy number amplification of the assemblon BAC. DNA was purified using the NucleoBond XtraBAC kit (Takara Bio 740436.25) and stored at 4 °C for less than one week before delivery to mouse ES cells.

Integrating loci into the yeast genome

A landing pad containing a URA3 cassette flanked by loxM and loxP sites was installed at YKL162C-A²¹ in yeast strains harbouring either HPRT1 or HPRT1R assemblons. The landing pad was co-transformed, along with linker DNAs with terminal homologies to the yeast genomic locus and to the landing pad cassette (~200 ng each), into yeast as described above. Colonies were selected on SC–Ura plates, and 4 colonies were picked from each transformation and screened by PCR using primers spanning the genome–landing pad junctions. Landing pad integration was verified by Sanger sequencing of PCR products spanning the genome–landing pad junctions. The synthetic HPRT1 and HPRT1R loci were integrated by Cre-mediated recombination. A HIS3 plasmid expressing Cre-recombinase from a galactose-inducible promoter (pSH62 (ref. ⁷²), Euroscarf P30120) was introduced by yeast transformation, single colonies were picked and grown to saturation in SC–His–Leu with raffinose, subcultured 1:100 in SC–His media with galactose, and plated on SC + 5-Fluoroorotic acid (5FOA) plates after 2 days of growth. 5FOA-resistant colonies were picked, screened by PCR using primers spanning the yeast genome–HPRT1 or HPRT1R junctions, and verified by next-generation whole-genome sequencing as described above. Engineered yeast strains are available upon request.

Sphis5 insertion and transcription factor knockouts

The His5 gene, including 5′ and 3′ untranslated regions, was cloned by PCR using Q5 high-fidelity DNA polymerase (New England Biolabs M0494L) from S. pombe genomic DNA. PCR primers were designed to add 40 bp of homology on each side for the desired target location in the synthetic HPRT1 or HPRT1R sequence, or in the yeast genome. Sphis5 PCR products were purified using the DNA Clean and Concentrator 5 kit (Zymo Research D4004) and transformed into HPRT1 or HPRT1R episome-harbouring yeast strains, as described above. Transformations were selected on SC–His–Leu plates and correct insertions were determined by PCR using a forward primer annealing in the predicted promoter regions within the HPRT1 or HPRT1R locus or yeast genome, outside of the homology arm, and a reverse primer annealing inside of the Sphis5 sequence.

Select transcription factor genes were knocked out of His⁺ yeast strains by cloning the URA3 expression cassette from pAV116 (Addgene #63183) using primers designed to add 40-bp homology arms targeting the genomic region upstream and downstream of the transcription factor coding sequence. URA3 PCR products were purified using the DNA Clean and Concentrator 5 kit (Zymo Research D4004) and transformed into His⁺ yeast strains as above. Transformations were selected on SC–Leu–Ura and correct knockouts were verified by PCR using two sets of primers spanning the URA3–genome junctions.

Yeast spot assays

Fitness of yeast strains following Sphis5 insertions and transcription factor knockouts was assessed by spot assay. Yeast strains were grown to saturation in selective media and diluted to OD₆₀₀ of 1 in sterile water. Five tenfold serial dilutions were made of each strain, and 5 μl of each dilution was spotted on agar plates using a multichannel pipette. Plates were incubated at 37 °C for 2 days before imaging. 3-AT, a competitive inhibitor of the Sphis5 gene product, was used to better identify small magnitude changes in expression.

Mouse ES cell culture

C57BL6/6J × CAST/EiJ (BL6xCAST) ΔPiga mouse ES cells, which enable PIGA-based Big-IN genome rewriting, have been described previously¹⁷. Mouse ES cells were cultured in 80/20 medium, which consists of 80% 2i medium (1:1 mixture of Advanced DMEM/F12 (ThermoFisher 12634010) and Neurobasal-A (ThermoFisher 10888022) supplemented with 1% N2 Supplement (ThermoFisher 17502048), 2% B27 Supplement (ThermoFisher 17504044), 1% GlutaMAX (ThermoFisher 35050061), 1% penicillin-streptomycin (ThermoFisher 15140122), 0.1 mM 2-mercaptoethanol (Sigma M3148), 1,250 U ml⁻¹ LIF (ESGRO ESG1107l), 3 μM CHIR99021 (R&D Systems 4423), and 1 μM PD0325901 (Sigma PZ0162)), and 20% mouse ES cell medium (KnockOut DMEM (ThermoFisher 10829018) supplemented with 15% FBS (BenchMark 100106), 0.1 mM 2-mercaptoethanol, 1% GlutaMAX, 1% MEM non-essential amino acids (ThermoFisher 11140050), 1% nucleosides (EMD Millipore ES-008-D), 1% penicillin-streptomycin, and 1,250 U ml⁻¹ LIF). Mouse ES cells were maintained on plates coated with 0.1% gelatin (EMD Millipore ES-006-B) at 37 °C in a humidified incubator with 5% CO₂. C57BL6/6J × CAST/EiJ (BL6xCAST) mouse ES cells were originally provided by D. Spector, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. The BL6xCAST cell line was authenticated in next-generation capture-sequencing experiments, confirming cells as C57BL6/6J × CAST/EiJ hybrids on the basis of species-specific single-nucleotide polymorphisms. Cell lines were verified to be mycoplasma free prior to the study. There was no indication of contamination of any kind.

Integrating synthetic loci into mouse ES cells

Integration of synthetic loci was performed using the Big-IN method¹⁷. First, a landing pad, LP-PIGA2, containing a polycistronic cassette, pEF1α-PuroR-P2A-PIGA-P2A-mScarlet-EF1αpA, for selection and counterselection and flanked by loxM and loxP sites, was modified with homology arms for targeting the landing pad to the mouse Hprt1 locus. Specifically, ~130-bp homology arms (amplified from a mouse Hprt1 BAC) flanked by gRNA sites for the Hprt1-targeting gRNAs (see below) and protospacer adjacent motifs were cloned flanking the lox sites using BsaI Golden Gate Assembly. LP-PIGA2 was delivered to BL6xCAST ΔPiga mouse ES cells, along with Cas9–gRNA-expression plasmids (pSpCas9(BB)-2A-GFP, Addgene #48138) expressing gRNAs that target sites flanking the Hprt1 locus, by nucleofection using the Neon Transfection System (ThermoFisher) as described¹⁷. One million cells were used per transfection with 5 μg of the landing pad plasmid and 2.5 μg each of Cas9–gRNA-expression plasmids. Cells were selected with 1 μg ml⁻¹ puromycin starting day 1 post-transfection, with 6-thioguanine (Sigma-Aldrich A4660) starting day 7 post-transfection to select for the loss of Hprt1, and with 1 µM ganciclovir (Sigma PHR1593) to select against the landing pad plasmid backbone that contained a HSV1-ΔTK expression cassette. Candidate clones were picked on day 10, screened by qPCR using primers spanning the mouse genome–landing pad junctions and with primers for validating the loss of the endogenous Hprt1 gene and the absence of landing pad backbone or pSpCas9 plasmid integration. Mouse ES cell clones were further verified by next-generation baited Capture-seq¹⁷ to confirm that the Hprt1 locus was deleted and that the landing pad was present on target. Genomic integration of a landing pad at Sox2 has been described²⁰, replacing only the BL6 allele in the hybrid BL6xCAST cell line, leaving the CAST Sox2 allele intact. Engineered mouse ES cell lines are available upon request.

Delivery of the synthetic locus payloads was performed as described¹⁷ using the Amaxa 2b nucleofector (program A-23). In brief, 5 million cells were nucleofected with 5 μg pCAG-iCre (Addgene #89573) and 5 μg of assemblon DNA. Nucleofected mouse ES cells were treated with 10 µg ml⁻¹ blasticidin for 2 days starting 1 day post-transfection to transiently select for the presence of the synthetic assemblons, and then with 2 nM proaerolysin for 2 days starting day 7 post-transfection to select for loss of PIGA in the landing pad cassette. Cells delivered with HPRT1 were also selected with HAT medium (ThermoFisher Scientific 21060017) starting day 7 post-transfection. Clones were picked on day 9 post-transfection, expanded, and screened first by qPCR aided by an Echo 550 liquid handler (Labcyte) as described²⁰ using primers spanning the junctions between the mouse genome and HPRT1 or HPRT1R synthetic loci, and verified by Capture-seq¹⁷. For each locus integration we established two clonal cell lines from independent integration events.

Whole-genome sequencing and Capture-seq

Whole-genome sequencing and Capture-seq were performed as previously described¹⁷. Biotinylated bait DNA was generated by nick translation from purified BACs and plasmids of interest: the mouse Hprt1– and Sox2-containing BACs (RP23-412J16, RP23-274P9 respectively, BACPAC Resources Center), the synthetic HPRT1, HPRT1R, and HPRT1R^noCpG BACs, LP-PIGA2, pCAG-iCre and pSpCas9(BB)-2A-GFP.

Sequencing and initial data processing were performed as previously described¹⁷ with modifications. Illumina libraries were sequenced in paired-end mode on an Illumina NextSeq 500 operated at the Institute for Systems Genetics. All data were initially processed using a uniform mapping pipeline. Sequencing adapters were trimmed with Trimmomatic v0.39 (ref. ⁷³). Whole-genome and Capture-seq reads were aligned using BWA v0.7.17 (ref. ⁷⁴) to a reference genome (SacCer_April2011/sacCer3 or GRCm38/mm10), including unscaffolded contigs and alternate references, as well as independently to HPRT1 and HPRT1R custom references for relevant samples. PCR duplicates were marked using samblaster v0.1.24 (ref. ⁷⁵). Generation of per base coverage depth tracks and quantification was performed using BEDOPS v2.4.35 (ref. ⁷⁶). Data were visualized using the University of California, Santa Cruz Genome Browser. On-target, single-copy integrations were validated using DELLY⁷⁷ to call copy number variations and bamintersect¹⁷ to identify unexpectedly mapping read pairs. In these quality control steps, DELLY identifies duplications or deletions, and bamintersect identifies duplications based on read pairs mapping either between the end and the start of the synthetic locus (if duplicated in tandem) or between the synthetic locus and an unexpected genomic location (if duplicated by off-target integration). The sequencing processing pipeline is available at https://github.com/mauranolab/mapping.

ATAC-seq

For yeast, two independent clones for each strain were inoculated into 5 ml of SC–Leu (for assemblon strains) or YPD (for integration strains) for overnight culture at 30 °C. Saturated overnight cultures were diluted to an OD₆₀₀ of 0.1 and cultured for 6 h at 30 °C, until OD₆₀₀ reached ~0.6. Around 5 × 10⁶ cells were taken from each culture, pelleted at 3,000g for 5 min, washed twice with 500 μl spheroplasting buffer (1.4 M sorbitol, 40 mM HEPES-KOH pH 7.5, 0.5 mM MgCl₂), resuspended in 100 μl spheroplasting buffer with 0.2 U μl⁻¹ zymolyase (Zymo Research E1004), then incubated for 30 min at 30 °C on a rotator. Spheroplasts were washed twice with 500 μl spheroplasting buffer then resuspended in 50 μl 1× TD buffer with TDE (Illumina 20034197). Tagmentation was performed for 30 min at 37 °C on a rotator and DNA was purified using the DNA Clean and Concentrator 5 kit (Zymo Research D4004). PCR was performed as previously described⁷⁸ using 11 total cycles. The libraries were sequenced with 36-bp paired-end reads on a NextSeq 500 for ~1 million reads per sample.

For mouse ES cells, two independent cultures of each cell line were grown to medium confluency in 6-well plates. Cells were harvested by washing once with PBS, dissociated into single-cell suspension with TrypLE Express (ThermoFisher 12604013) and then neutralized with an equal volume of mouse ES cell medium. Cells were counted and 50,000 were taken for tagmentation. Cells were pelleted at 500g for 5 min at 4 °C, washed with 50 μl cold PBS, resuspended in 50 μl cold ATAC lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl₂, 0.1% IGEPAL CA-630), spun down at 500g for 10 min at 4 °C, resuspended in 50 μl TDE mix, and incubated at 37 °C on a rotator for 30 min. DNA was purified using the DNA Clean and Concentrator 5 kit (Zymo Research D4004). PCR was performed as previously described⁷⁸ using 10 total cycles. The libraries were sequenced with 36-bp paired-end reads for ~50 million reads per sample.

Illumina libraries were sequenced on an Illumina NextSeq 500 operated at the Institute for Systems Genetics. Sequencing adapters were trimmed with Trimmomatic v0.39 (ref. ⁷³). Reads were aligned using bowtie2 v2.2.9 (ref. ⁷⁹) to custom references in which the synthetic locus sequences were present on separate chromosomes or inserted at their specific integration sites in the SacCer_April2011/sacCer3 or GRCm38/mm10 genomes (produced using the reform tool; https://gencore.bio.nyu.edu/reform/). Coverage tracks were produced in bigWig format using bamCoverage (deepTools v3.5.0)⁸⁰ with bin size 10 and smooth length 100, normalized using RPGC to an effective genome size of 12,000,000 for sacCer3 and 2,652,783,500 for mm10, and visualized using IGV v2.12.3 (ref. ⁸¹). Peaks were called using macs2 v2.1.0 (ref. ⁸²) with the parameters: --nomodel -f BAMPE --keep-dup all -g 1.2e7 (sacCer3) or 1.87e9 (mm10). Relative coverage analysis was performed as described below.
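The coverage-track and peak-calling settings reported above map onto command lines roughly as sketched below. The file names and sample name are hypothetical placeholders, and the helper functions only assemble argument lists; they are not taken from the authors' pipeline.

```python
# Settings quoted in the text: effective genome size for RPGC normalization
# (bamCoverage) and the -g value passed to macs2, per reference genome.
GENOME_SETTINGS = {
    "sacCer3": {"effective_size": 12_000_000, "macs2_g": "1.2e7"},
    "mm10": {"effective_size": 2_652_783_500, "macs2_g": "1.87e9"},
}


def bamcoverage_cmd(bam, out_bigwig, genome):
    """Build a deepTools bamCoverage command (bin size 10, smooth length 100, RPGC)."""
    size = GENOME_SETTINGS[genome]["effective_size"]
    return [
        "bamCoverage", "-b", bam, "-o", out_bigwig,
        "--binSize", "10", "--smoothLength", "100",
        "--normalizeUsing", "RPGC", "--effectiveGenomeSize", str(size),
    ]


def macs2_cmd(bam, sample_name, genome):
    """Build a macs2 callpeak command for paired-end reads with the quoted flags."""
    return [
        "macs2", "callpeak", "-t", bam, "-n", sample_name,
        "--nomodel", "-f", "BAMPE", "--keep-dup", "all",
        "-g", GENOME_SETTINGS[genome]["macs2_g"],
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a system where the tools are installed.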

RNA-seq

For yeast, the remaining culture that was not used for ATAC-seq was centrifuged at 3,000g for 5 min to pellet cells, washed once with water, pelleted again at 3,000g for 5 min, and cell pellets were frozen at −80 °C. Frozen pellets were resuspended in 200 μl lysis buffer (50 mM Tris-HCl pH 8, 100 mM NaCl) and lysed by disruption with an equal volume of acid washed glass beads, vortexing 10× 15 s. 300 μl lysis buffer was added and samples were mixed by inversion followed by a short centrifugation to collect all liquid in the tube. Supernatant (450 μl) was mixed with an equal volume of phenol:chloroform:isoamyl alcohol, vortexed for 1 min, and centrifuged at maximum speed for 5 min. 350 μl of the aqueous layer was then mixed with an equal volume of phenol:chloroform:isoamyl alcohol, vortexed for 1 min, and centrifuged at maximum speed for 5 min. RNA was precipitated from 300 μl of the aqueous phase by adding 30 μl of 3 M sodium acetate and 800 μl of cold 99.5% ethanol, briefly vortexing, and centrifuging at maximum speed for 10 min. The pellet was rinsed with 70% ethanol and dried at room temperature before dissolving in 100 μl of RNase-free DNase set (Qiagen 79254) and incubating at room temperature for 10 min to remove DNA. RNA was purified using the RNeasy Plus Mini kit (Qiagen 74136) and eluted in 30 μl RNase-free water. RNA-seq libraries were prepared from 1 μg total RNA using the QIAseq FastSelect -rRNA Yeast kit (Qiagen 334217) and QIAseq Stranded RNA Library kit (Qiagen 180743) according to the manufacturer’s protocol. The libraries were sequenced on a NextSeq 500 with 75 bp paired-end reads for ~45 million reads per sample.

For mouse ES cells, the remaining cells that were not used for ATAC-seq were pelleted at 500g for 5 min and RNA was isolated using Qiagen RNeasy Plus Mini kit, resuspending in 350 μl buffer RLT Plus + β-mercaptoethanol, with homogenization using QIAshredder columns (Qiagen 79654). RNA-seq libraries were prepared from 1 μg total RNA using QIAseq FastSelect -rRNA HMR (Qiagen 334386) and QIAseq Stranded RNA kits (Qiagen 180743) according to the manufacturer’s protocol. The libraries were sequenced with 75-bp paired-end reads for ~50 million reads per sample.

Illumina libraries were sequenced on an Illumina NextSeq 500 operated at the Institute for Systems Genetics. Sequencing adapters were trimmed with Trimmomatic v0.39 (ref. ⁷³). STAR (v2.5.2a)⁸³ was used to align reads, without providing a gene annotation file, to custom references in which the synthetic HPRT1 and HPRT1R sequences were present on separate chromosomes or inserted at their specific integration sites in the SacCer_April2011/sacCer3 or GRCm38/mm10 genomes (produced using the reform tool; https://gencore.bio.nyu.edu/reform/). Coverage tracks were produced in bigWig format using bamCoverage (deepTools v3.5.0)⁸⁰ with bin size 10 and smooth length 100, filtering by strand, normalizing using TMM⁸⁴, and visualized using IGV v2.12.3 (ref. ⁸¹). Relative coverage analysis was performed as described below.

CUT&RUN

For yeast, two independent colonies for each strain were inoculated into 5 ml of SC–Leu (for assemblon strains) or YPD (for integration strains) for overnight culture at 30 °C. Saturated overnight cultures were diluted to OD₆₀₀ of 0.1 and cultured for ~6 h at 30 °C, until OD₆₀₀ reached ~0.6. Cells were pelleted at 3,000g for 5 min, washed twice with water, and resuspended in spheroplasting buffer (1.4 M sorbitol, 40 mM HEPES-KOH pH 7.5, 0.5 mM MgCl₂, 0.5 mM 2-mercaptoethanol). Spheroplasting was performed by adding 0.125 U μl⁻¹ Zymolyase (Zymo Research E1004) and incubating at 37 °C for 45 min on a rotator. Nuclei were prepared as previously described⁸⁵. Resuspended nuclei were split into aliquots of ~10⁸ nuclei each and snap frozen in liquid nitrogen.

For mouse ES cells, two independent cultures of each engineered cell line were harvested from tissue culture dishes using TrypLE Express (ThermoFisher 12604013), dissociated into single-cell suspension, and quenched with mouse ES cell medium. Crosslinking was performed by adding formaldehyde to a final concentration of 0.1% (v/v) and incubating at room temperature for 5 min with occasional mixing by inversion. Crosslinking was stopped by quenching with 125 mM glycine and incubating at room temperature for 5 min with occasional mixing by inversion. DMSO was added to a final concentration of 10% (v/v) and cells were frozen in aliquots of ~10⁶ cells.

Isolated yeast nuclei (~10⁸ per sample) or crosslinked mouse ES cells (~10⁶ per sample) were thawed and processed for CUT&RUN using the CUTANA ChIC/CUT&RUN kit (EpiCypher 14-1048) according to the manufacturer’s protocol. Antibodies were all used at 0.5 μg: rabbit IgG negative control (EpiCypher 13-0042), H3K4me3 (EpiCypher 13-0041), H3K27ac (EpiCypher 13-0045), H3K27me3 (Active Motif 39055, RRID: AB_2561020), RNAP2 (Santa Cruz Biotechnology sc-56767). Sequencing libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs E7645L) and sequenced with 75 bp paired-end reads for ~15 M reads for H3K4me3 and Pol II samples, and ~20 M reads for H3K27ac and H3K27me3 samples.

Illumina libraries were sequenced on an Illumina NextSeq 500 operated at the Institute for Systems Genetics. Sequencing adapters were trimmed with Trimmomatic v0.39 (ref. ⁷³). Reads were aligned using bowtie2 v2.2.9 (ref. ⁷⁹) to custom references in which the synthetic HPRT1 and HPRT1R sequences were present on separate chromosomes or inserted at their specific integration sites in the SacCer_April2011/sacCer3 or GRCm38/mm10 genomes (produced using the reform tool; https://gencore.bio.nyu.edu/reform/). Coverage tracks were produced in bigWig format using bamCoverage (deepTools v3.5.0)⁸⁰ with bin size 10 and smooth length 100, normalized using RPGC to an effective genome size of 12,000,000 for sacCer3 and 2,652,783,500 for mm10, and visualized using IGV v2.12.3 (ref. ⁸¹). Peaks were called using macs2 v2.1.0 (ref. ⁸²) with the parameters: --nomodel -f BAMPE --keep-dup all -g 1.2e7 (sacCer3) or 1.87e9 (mm10). Relative coverage analysis was performed as described below.

CAGE-seq

RNA was isolated as described above for RNA-seq, using two replicate colonies for each yeast strain. CAGE libraries were prepared as previously described²⁴, starting with 5 μg RNA, with the following modifications. SuperScript IV Reverse Transcriptase (Invitrogen 18090010) was used for the reverse transcription step. AMPure XP beads (Beckman Coulter A63881) were used for all bead cleanup steps. We also used custom-made linker and primer oligonucleotides so that linkers are universal to all samples and primers contain sample-specific barcodes. Libraries were amplified using universal forward and reverse primers with 20 cycles of PCR. Libraries were sequenced with 75 bp paired-end reads for ~22 million reads per sample.

Illumina libraries were sequenced on an Illumina NextSeq 500 operated at the Institute for Systems Genetics. Sequencing adapters were trimmed with Trimmomatic v0.39 (ref. ⁷³). The 5′ reads only were aligned using bowtie2 v2.2.9 (ref. ⁷⁹) to custom references in which the synthetic HPRT1 and HPRT1R sequences were present on separate chromosomes or inserted at their specific integration sites in the SacCer_April2011/sacCer3 or GRCm38/mm10 genomes (produced using the reform tool; https://gencore.bio.nyu.edu/reform/). Coverage tracks were produced in bigWig format using bamCoverage (deepTools v3.5.0)⁸⁰ with bin size 1, filtering by strand, normalized using RPGC to an effective genome size of 12,000,000, and visualized using IGV v2.12.3 (ref. ⁸¹). Peaks were called using macs2 v2.1.0 (ref. ⁸²) with the parameters: --nomodel -f BAM --keep-dup all -g 1.2e7.

Locus copy number estimation

For copy number estimation in yeast strains, coverage depth was calculated from whole-genome sequencing data for the synthetic HPRT1 and HPRT1R loci as well as the entire yeast genome (excluding chrM) using samtools v1.9 depth⁸⁶, and the calculated depth of the synthetic loci was divided by the genome average.
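The depth ratio described above can be sketched from `samtools depth` output, which has one tab-separated line per position (reference name, position, depth). The reference name used for the synthetic locus and the choice to leave the locus out of the genome average are assumptions for illustration. By default `samtools depth` omits zero-coverage positions, so averages are taken over the reported positions only.

```python
def copy_number_from_depth(lines, locus_name, exclude=("chrM",)):
    """Estimate locus copy number as mean locus depth / mean genome depth.

    lines: iterable of 'chrom<TAB>pos<TAB>depth' strings from samtools depth.
    locus_name: reference name of the synthetic locus (hypothetical here).
    exclude: reference names left out entirely (chrM, as in the text).
    """
    locus_sum = locus_n = genome_sum = genome_n = 0
    for line in lines:
        chrom, _pos, depth = line.rstrip("\n").split("\t")
        if chrom in exclude:
            continue
        if chrom == locus_name:
            locus_sum += int(depth)
            locus_n += 1
        else:
            genome_sum += int(depth)
            genome_n += 1
    return (locus_sum / locus_n) / (genome_sum / genome_n)
```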

Sequencing coverage analysis

Relative coverage analysis was performed for yeast ATAC-seq, RNA-seq, and CUT&RUN experiments. Average coverage depth was calculated over the synthetic HPRT1 and HPRT1R loci and over 100-kb sliding windows of the yeast genome using samtools v1.9 bedcov⁸⁶, which reports the total read base count (the sum of per base read depths) per specified region, and then dividing the total read base count by the region size (100,735 bp for the HPRT1/HPRT1R loci or 100,000 bp for the 100-kb windows). Coverage was corrected for estimated copy numbers of the HPRT1 and HPRT1R episomes. The yeast genome was split into 100-kb sliding windows with 10-kb step size using bedtools v2.29.2 makewindows⁸⁷. The average of the 100-kb windows was then calculated. The average coverage depth over the synthetic loci was then divided by the relevant genome average to determine relative coverage depth in each context (that is, HPRT1 average coverage/average coverage of yeast 100-kb windows = relative coverage of HPRT1 compared to the yeast genome). For peak analysis, total peaks were counted across the HPRT1 and HPRT1R loci, or averaged over the yeast genome 100-kb windows.
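The arithmetic of the relative coverage calculation can be sketched as below, assuming the total read base counts per region are already available (for example from `samtools bedcov`). The function names are illustrative. This sketch keeps only full-length windows, which may differ from `bedtools makewindows` output at chromosome ends. Dividing the locus average by the copy number is one plausible reading of the copy number correction, not a confirmed detail.

```python
def sliding_windows(chrom_length, window=100_000, step=10_000):
    """Return (start, end) for full-length sliding windows along one chromosome."""
    return [(s, s + window) for s in range(0, chrom_length - window + 1, step)]


def relative_coverage(locus_total, locus_size, window_totals,
                      window_size=100_000, copy_number=1.0):
    """Copy-number-corrected locus depth divided by the mean window depth.

    locus_total / window_totals: summed per-base read depths per region.
    """
    # Average depth = total read base count / region size.
    locus_avg = locus_total / locus_size / copy_number
    window_avgs = [total / window_size for total in window_totals]
    genome_avg = sum(window_avgs) / len(window_avgs)
    return locus_avg / genome_avg
```

A value near 1 means the locus is covered at about the same depth as a typical 100-kb stretch of the yeast genome.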

For mouse genome RNA-seq read analysis, the mouse genome was split into 100-kb sliding windows with 10-kb step size using bedtools v2.29.2 makewindows⁸⁷. The windows were then filtered to exclude ENCODE blacklist regions⁸⁸, centromeres, telomeres, and annotated transcripts based on Gencode comprehensive gene annotation, release M10 (GRCm38.p4). RNA-seq reads were counted for the synthetic loci and for the 100-kb genomic windows using samtools v1.9 (ref. ⁸⁶) view with arguments -c -F 2308 -L (reference bed file).
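The `-F 2308` argument excludes any read with at least one of three SAM flag bits set (2308 = 4 + 256 + 2048). The short sketch below decodes the value using the flag definitions from the SAM specification; the helper names are illustrative.

```python
# SAM flag bits that make up 2308.
EXCLUDED_BITS = {
    0x4: "read unmapped",
    0x100: "secondary alignment",
    0x800: "supplementary alignment",
}


def decode_excluded(flag_mask):
    """List which of the three bits above are set in a -F mask."""
    return [name for bit, name in sorted(EXCLUDED_BITS.items()) if flag_mask & bit]


def is_counted(read_flag, exclude_mask=2308):
    """A read is counted by 'samtools view -c -F mask' only if no masked bit is set."""
    return read_flag & exclude_mask == 0
```

With this mask, each mapped read contributes exactly one (primary) alignment to the count.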

Replicate correlation

Correlation between sequencing assay replicates was assessed using deepTools v3.5.0 (ref. ⁸⁰) multiBigwigSummary to first calculate average bigWig scores for each dataset across the mouse genome in 10-kb bins, and across the yeast genome in 100-bp bins. Biological and technical replicates were compared using plotCorrelation with the following arguments: --corMethod pearson --whatToPlot scatterplot --skipZeros --removeOutliers --log1p.
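The core of the replicate comparison is a Pearson correlation between binned scores. The simplified sketch below drops bins that are zero in both replicates and computes the coefficient on the raw scores. It is not the deepTools implementation: outlier removal is omitted, and the log1p transform is left out on the understanding that in plotCorrelation it affects only how the scatterplot is drawn.

```python
import math


def pearson_skip_zeros(x, y):
    """Pearson correlation of paired bin scores, skipping bins that are zero in both."""
    pairs = [(a, b) for a, b in zip(x, y) if not (a == 0 and b == 0)]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two non-zero bins")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    # Raises ZeroDivisionError if either replicate is constant across bins.
    return cov / math.sqrt(var_x * var_y)
```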

Metaplots analysis

TSSs were defined as the 5′ coordinate of the experimentally identified CAGE-seq peaks. Metaplots were produced using deepTools v3.5.0 (ref. ⁸⁰) computeMatrix and plotProfile, with argument --plotType se. Matrices were computed for ATAC-seq and H3K4me3 CUT&RUN signals and profiles were plotted for TSSs across the HPRT1 and HPRT1R loci and across the rest of the yeast genome.

Motif analysis

Putative promoter regions in the synthetic HPRT1 and HPRT1R loci were defined as 200 bp upstream and 100 bp downstream of the TSSs identified based on CAGE-seq peaks (above). Motif discovery was performed on the putative promoter regions, ATAC-seq peaks, and ATAC-seq peaks that intersect with putative promoters, identified with bedtools v2.29.2 intersect⁸⁷. Regions of interest were combined from HPRT1 and HPRT1R for motif analysis using MEME v4.102 (ref. ²⁵) with a maximum motif width of 10 bp. This width was determined empirically: increasing the width did not yield more informative motifs. Tomtom²⁷ was used to scan the identified motifs for matches to motifs in the YEASTRACT database⁶⁵. GOmo⁸⁹ was used to identify gene ontology terms linked to gene promoters containing the identified motifs.
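The promoter definition above is strand-aware: the TSS is the 5′ end of a CAGE-seq peak, and "upstream" points towards lower coordinates on the plus strand and higher coordinates on the minus strand. A minimal sketch follows, assuming 0-based half-open (BED-style) peak coordinates and counting the TSS base as the first downstream base; both conventions are assumptions.

```python
def tss_from_peak(start, end, strand):
    """Return the 0-based position of the 5' end of a peak given as [start, end)."""
    return start if strand == "+" else end - 1


def promoter_region(start, end, strand, upstream=200, downstream=100):
    """Return the [start, end) interval spanning upstream/downstream of the TSS."""
    tss = tss_from_peak(start, end, strand)
    if strand == "+":
        region_start, region_end = tss - upstream, tss + downstream
    else:
        # Mirror the window on the minus strand.
        region_start, region_end = tss - downstream + 1, tss + upstream + 1
    # Clip at the start of the sequence.
    return max(0, region_start), region_end
```

Both strands give a 300-bp interval unless the window is clipped at position 0.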

Public sequencing data

We obtained UCSC browser data for CpG islands⁹⁰,⁹¹, as well as the following ENCODE data⁹². DNase-seq from ES-E14 mouse embryonic stem cells, ENCSR000CMW⁹³. Chromatin immunoprecipitation with sequencing (ChIP-seq) from ES-Bruce mouse embryonic stem cells, ENCSR000CBG, ENCSR000CDE, ENCSR000CFN⁹⁴, ENCSR000CCC. RNA-seq from ES-E14 mouse embryonic stem cells, ENCSR000CWC, ENCSR000CWC. ATAC-seq data from embryonic day (E)11.5 mouse embryonic tissue, ENCSR282YTE, ENCFF936VGM²⁸. ChIP-seq data from E11.5 mouse embryonic tissue, ENCSR427OZM, ENCFF952ZWD, ENCSR531RZS, ENCFF033UPR, ENCSR240OUM, ENCFF179QWF²⁸. DNase-seq from H1 human ES cells ENCSR000EJN, ChIP-seq from H1 human ES cells ENCSR443YAS, ENCSR880SUY, ENCSR928HYM, RNA-seq from H1 human ES cells ENCSR000COU⁹⁵. Long RNA-seq data from H1 human ES cells, ENCSR000COU, ENCFF563OKS, ENCFF501KFP, ENCFF407PJY, ENCFF761BKF².

We obtained public sequencing data for yeast from the following datasets (Gene Expression Omnibus (GEO) accession numbers): ATAC-seq (GSM6139041), H3K4me3 ChIP-seq (GSM3193266), RNA-seq (GSM5702033) and yeast CAGE-seq (ref. ⁹⁶).

DNA reagents

Sequences and identifiers, where applicable, for all DNA reagents used in this study are available as supplementary material, including all oligonucleotides, synthetic DNA segments, plasmids, landing pads, homology arms and yeast strains.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
