Developmental DNA rearrangements have been shown to occur in a wide variety of multicellular organisms, but in only a single group of unicellular eukaryotes: the ciliates. Species in this monophyletic group, such as Paramecium tetraurelia, harbour two kinds of nuclei in the same cytoplasm, each with a different biological function. The germinal nucleus (micronucleus, MIC) undergoes meiosis and karyogamy, and therefore persists across generations. The associated MIC genome (currently unassembled) is diploid and contains non-expressed genes interrupted by parasitic DNA elements. The somatic nucleus (macronucleus, MAC) is required for gene expression but is lost at each sexual reproduction event, after which a new one develops from a copy of a micronucleus. The associated MAC genome (assembled) is highly polyploid and is a subset of the MIC genome from which transposons, repeated sequences and interspersed sequences have been eliminated during MAC development. As a result, the MAC genome is streamlined for gene expression, with 80% of its sequence lying within exons.
Identification of the deleted sequences has recently been achieved thanks to the discovery of the endonuclease required for DNA elimination. This protein has been named PiggyMac (Pgm) because it is a domesticated piggyBac transposase. Inactivation of PGM by RNAi leads to the development of an unrearranged macronucleus. High-throughput sequencing (HTS) of this unrearranged DNA gave us a first large-scale insight into the micronuclear genome. We developed bioinformatic tools to identify the deleted sequences, resulting in a list of ~45,000 Internal Eliminated Sequences (IESs), which are single-copy but derive from Tc1/mariner elements. We then refined our pipelines to study, in collaboration, how sequences are epigenetically marked for deletion. We show that only 7% of IESs depend on meiotic short RNAs for their excision, and that 70% require histone marks such as H3K27me3 and H3K4me3. This raises the question of how the remaining 30% are recognized by the endonuclease.