
Characterization of the nuclear and cytosolic transcriptomes in human brain tissue reveals new insights into the subcellular distribution of rna transcripts
- Select a language for the TTS:
- UK English Female
- UK English Male
- US English Female
- US English Male
- Australian Female
- Australian Male
- Language selected: (auto detect) - EN
Play all audios:

ABSTRACT Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our
understanding of gene function, and interpretation of gene expression signatures in cells. Here, we separated cytosolic and nuclear RNA from human fetal and adult brain samples and performed
a comprehensive analysis of cytosolic and nuclear transcriptomes. There are significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. We
show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Differential expression
analysis between fetal and adult frontal cortex show that results obtained from the cytosolic RNA differ from results using nuclear RNA both at the level of transcript types and the number
of differentially expressed genes. Our data provide a resource for the subcellular localization of thousands of RNA transcripts in the human brain and highlight differences in using the
cytosolic or the nuclear transcriptomes for expression analysis. SIMILAR CONTENT BEING VIEWED BY OTHERS ANALYSIS OF SUBCELLULAR RNA FRACTIONS DEMONSTRATES SIGNIFICANT GENETIC REGULATION OF
GENE EXPRESSION IN HUMAN BRAIN POST-TRANSCRIPTIONALLY Article Open access 24 August 2023 SMALL NON-CODING RNA TRANSCRIPTOMIC PROFILING IN ADULT AND FETAL HUMAN BRAIN Article Open access 12
July 2024 SINGLE-NUCLEI ISOFORM RNA SEQUENCING UNLOCKS BARCODED EXON CONNECTIVITY IN FROZEN BRAIN TISSUE Article Open access 07 March 2022 INTRODUCTION The mammalian transcriptome harbors a
myriad of coding and non-coding RNA transcripts, with biogenesis, abundance and subcellular localization tightly regulated to match the physiology and function of the cell. Genome-wide
technologies such as RNA sequencing have been instrumental in characterizing the transcriptome architecture. However, transcriptome analysis using RNA sequencing has primarily been performed
on total or polyA + RNA from whole cells, overlooking the spatial dimension of gene expression at the subcellular level. Nevertheless, an increasing number of reports are pointing to the
importance of investigating the subcellular repertoire of RNA molecules and understanding the mechanisms that govern their distribution inside the cell1,2,3,4. Subcellular RNA localization
is widespread and conserved from bacteria to mammals, and it is becoming evident that this process plays crucial roles in regulating gene expression5,6. For mRNA, subcellular localization
provides a means to spatially control protein production and target proteins to their site of function7,8,9,10,11. Therefore, the differential distribution of mature mRNA between the nucleus
and the cytoplasm may have a direct impact on protein expression levels. In fact, it has been recently shown that nuclear retention of mature mRNA is a mechanism that involves a wide range
of protein-coding transcripts. This nuclear retention is believed to ultimately fine-tune mRNA translation in the cytoplasm12. Similarly, identifying the subcellular localization of
non-coding RNA such as long non-coding RNAs (lncRNAs) can provide substantial insights into their biology and function. Based on the link between localization, function, and regulation it is
important to map the subcellular localization of coding and non-coding RNA in different cells and tissues to obtain a more comprehensive understanding of RNA’s biological functions. Early
attempts to determine the cellular localization of RNA molecules were performed on one transcript at a time, mostly using RNA fluorescent in situ hybridization (RNA FISH)13,14. However, in
the last few years, the field has benefited from the use of various high-throughput technologies, such as expression microarrays and RNA-seq in combination with cellular
fractionation11,15,16,17. The general aim of these studies was to characterize the contribution of each of the nuclear and cytoplasmic transcriptomes to the global gene expression profiles.
Several studies have demonstrated that although the nucleus and cytoplasm contain overlapping populations of RNA transcripts, there are significant differences between the two fractions, and
many coding and non-coding transcripts are unique to one fraction17. The differentially localized transcripts exhibit particular global characteristics. For example, transcripts localized
to the cytoplasm tend to have shorter coding sequences and shorter Untranslated regions (UTRs) compared to the transcripts localized in the nuclear fraction15,18. The subcellular expression
patterns of lncRNA have also been the focus of several studies during the last years. Although a large number of lncRNAs have been identified, the biological relevance for the majority of
lncRNAs is still elusive19,20. Using microarrays, RNA-seq and imaging technologies, several reports have demonstrated that although lncRNA are mostly localized in the nucleus, cytoplasmic
lncRNAs also exist19,21,22,23,24,25. In fact, lncRNA subcellular localization was divided into three categories: (1) exclusively expressed in the nucleus, (2) mainly enriched in the nucleus
and, (3) mainly enriched in the cytoplasm26. Due to the importance of expanding the catalog of RNA subcellular localization, Zhang et al., established a database (RNALocate), from previously
published data, that documents the subcellular localization of different RNA types in several species3. However, most of the current knowledge about RNA localization is based on cell lines
and in most cases restricted to targeted experiments rather than analysis of the complete RNA co-expression profiles. Global profiling of the differences between nuclear and cytosolic
fractions is crucial for accurately interpreting data from single-cell transcriptome studies. Single-cell analysis from frozen tissue mainly involves sequencing RNA from sorted nuclei27.
Therefore, understanding the main differences between the nuclear and cytosolic transcriptome in human tissues is important for meaningful interpretation of results from single-cell RNA
sequencing experiments. In this study, we utilized an efficient cytoplasmic and nuclear RNA extraction protocol28 to purify populations of subcellular RNA fractions from human fetal and
adult frontal cortex, as well as fetal cerebellum. We investigate the differences in relative transcript abundance between cytosol and nucleus and highlight genes and gene categories that
are over-represented in either compartments. Our results also provide important insight into the subcellular localization of nuclear-encoded-mitochondrial proteins (NEMPs) and non-coding RNA
in brain tissue samples. RESULTS To characterize the abundance of transcripts in the nuclear and cytosolic RNA fractions from brain tissue we sequenced fractionated total RNA extracted from
six adult frontal cortex samples, three fetal frontal cortex samples, and cerebellum from the same three fetal samples. The nuclear and cytosolic RNA was also sequenced from the human
neuroblastoma cell line (SHSY-5Y) for reference. Total RNA sequencing was carried out using IonProton whole transcriptome with an average of 80.5 million mapped reads per nuclear sample and
46.9 million reads per cytosolic sample. The cytosolic sequence reads mapped primarily to exons (84.1% exonic mapping) while the nuclear reads, as expected, had a large proportion aligned to
intron and intergenic sequences (19.81% exonic mapping). As we demonstrated in our previous studies28,29, the lower percentage of exonic reads and higher percentage of intronic reads in the
nucleus compared to the cytosol are primarily due to the presence of nascent transcripts. Details about the samples and sequencing of each sample are provided in Supplementary Table 1.
Hierarchical clustering of all samples based on the expression of protein-coding genes and lncRNAs showed brain tissues clustering together with their corresponding subcellular fraction (a
cytosolic cluster and a nuclear cluster), while both fractions from the SHSY-5Y cell line formed a separate cluster (Fig. 1A). Similarly, principal component analysis (PCA) separated all the
samples according to their subcellular fractions and tissue sources along the PC1 and PC2, respectively (Fig. 1B). To further investigate the relative distribution of transcripts across the
cytosolic and nuclear fractions and identify genes that show differential abundance between nucleus and cytosol, we first investigated how different classes of transcripts are distributed.
We found that protein-coding genes were similarly distributed between the nuclear and the cytosolic fractions, whereas lncRNAs and small nucleolar RNAs (snoRNAs), as previously shown30, were
more abundant in the nuclear fraction (Fig. 1C,D). We could confirm the nuclear localization of a number of highly expressed lncRNAs with known nuclear localization (including _XIST_ and
_MALAT1_). We observed similar distribution patterns when each brain tissue was analyzed separately, as well as in SHSY-5Y (Supplementary Fig. 1). However, the most skewed distribution was
exhibited by snoRNAs, which are well-known to have their primary function in the nucleus. Some snoRNAs, for example, SNORD113-8 and SNORD114-12, were observed to have extreme nuclear
association (Fig. 1C). Overall, the PCA and distribution of the different transcripts are in agreement with previous findings and indicate that our analysis readily identifies transcripts
with a previously demonstrated cytosolic or nuclear localization3. PROTEIN CODING GENES To identify transcripts exhibiting differential distribution in the cytosolic and nuclear fractions,
we first analyzed protein-coding transcripts across all tissues. In total, we identified 5109 transcripts that were significantly more abundant in the cytosol and 5397 transcripts
significantly more abundant in the nucleus (_P__adj._ < 0.05, Supplementary Table 2, Supplementary Fig. 2A). The differentially distributed transcripts for each individual tissue are
listed in Supplementary Table 3. A separate analysis grouped by tissue type and age showed more significant transcripts in both cytosol and nucleus in the adult samples, reflecting that the
adult tissues represented a larger sample group with less inter-individual heterogeneity compared with the fetal tissues (Data not shown). To evaluate the potential contribution of nascent
RNA to transcripts localized to the nucleus, we performed a similar analysis using polyA + RNA-seq data from the cytosol and nucleus from the Encode cell lines. We observed similar
subcellular distribution patterns of transcript abundance seen in our brain cytosolic and nuclear total RNA-seq data (Supplementary Fig. 2B–D). This indicates that nascent RNA does not fully
explain the transcript localization observed in the nuclear fraction. Next, we performed gene ontology (GO) enrichment analysis of genes that were significantly more abundant in cytosol and
nucleus across all tissues. We found transcripts representing specific biological processes to be enriched (_P__adj_ < 0.05). In cytosol, there was significant enrichment of transcripts
involved in metabolic and catabolic processes, translation processes, macromolecular and protein complex organization and disassembly, and in the nervous system and neural projection
development. Significant GO categories for cellular compartment showed that the biological processes associated with cytosolic transcripts can be primarily linked to the extracellular
organelles, vesicles, and ribosomes. Whereas in the nucleus, there were fewer transcripts in a particular ontology and a small number of significant GO categories in general. Significant
categories include regulation of transcription, sensory processes, and regulation of various biosynthetic processes (See Supplementary Table 4 for enriched GO categories). Previous studies
demonstrated that transcripts localized to the cytosol or the nucleus possess particular genic features in terms of exon length17. Analysis of transcript abundance differences between the
cytosol and the nucleus and mature transcript length for the three sample groups indicates that long transcripts are more nuclear compared to short transcripts (Supplementary Fig. 3A),
highlighting a transcript length signature in subcellular mRNA distribution. However, no such pattern was observed when the transcript abundance differences between the cytosol and nucleus
were analyzed against the UTR length of the genes (data not shown). We also found, when comparing the global expression levels between nucleus and cytosol, that genes with relatively higher
expression tend to be shifted toward the cytosol (Supplementary Fig. 3B). To examine whether the higher expression in the cytosol is caused by the transcript length bias detected between the
cytosol and the nucleus, we analyzed the RPKM expression values and transcript lengths in the three sample groups. The results demonstrated no influence of transcript length on gene
expression (Supplementary Fig. 3C). Although not evident in the GO analysis, when we manually investigated lists of differentially localized genes in the cellular fractions, we observed that
many nuclear-encoded-mitochondrial proteins (NEMPs) were abundant in the cytosol. To further analyze the enrichment of NEMP transcripts, we extracted a full list of genes (1158) encoding
proteins with evidence for mitochondrial localization from Mitocharta2.031,32. By investigating gene expression and fold change differences for NEMP transcripts in the cytosol and nucleus,
we observed a clear shift for NEMP transcripts to the cytosolic compartment (Fig. 2A) and that NEMPs were significantly enriched in the cytosol compared to all other protein-coding genes (p
= 2.8 × 10e−76, Fig. 2B). In our cellular fractionation protocol, we used a centrifugation power of 10,000 rpm, which results in the mitochondria being pelleted together with the nuclear
fraction. Therefore, it is unlikely that the enrichment of NEMPs in the cytosol was due to NEMP RNAs being physically anchored to the mitochondria. We further validated this by showing that
the transcripts produced by the mitochondria were clearly nuclear in their localization (p = 6.4 × 10e−08, Fig. 2A,B) indicating that the mitochondria themselves indeed end up in the nuclear
fraction. The results of preferential cytosolic localization of NEMPs and nuclear localization of mitochondrial transcripts were consistent across all the tissues (Supplementary Fig. 4). To
experimentally validate the cytosolic localization of NEMPs, we performed RNA Fluorescence in situ hybridization (RNA FISH)33 on two highly expressed NEMP targets in SHSY-5Y cells. The
results demonstrated that both of the NEMPs targets were preferentially localized in the cytosolic fraction (Fig. 2C). To further test if the cytosolic enrichment of NEMPs was consistent
across various tissue types, we extended our analysis to cytosolic and nuclear RNA-seq data from 11 cell lines from the ENCODE project22,34. In agreement with our brain data, we found that
NEMPs exhibited preferential cytosolic localization in all the cell lines (Supplementary Fig. 5), indicating that the cytosolic localization of NEMPs is a general phenomenon that involves
various human tissues. One explanation for the increased abundance of a group of transcripts in the cytosol could be an increased mRNA half-life. To examine if the preferential cytosolic
localization of NEMPs was due to long half-life, we used published mRNA half-life data35 and inquired whether these transcripts have a longer half-life than the rest of protein-coding
transcripts. We found that NEMPs displayed significantly longer half-life compared to other protein-coding transcripts (p = 7.1 × 10e−12, Fig. 2D), which could explain the preferential
enrichment of NEMPs in the cytosol. An analysis of the half-life of all protein-coding transcripts significantly associated with the cytosolic or nuclear compartments showed a significant
difference in half-life, with a longer half-life for transcripts enriched in the cytosol (p = 1.6 × 10e−21, Fig. 2E). LONG NON-CODING RNA (LNCRNAS) We next investigated if lncRNAs also
exhibit differential distribution in the nuclear and cytosolic fractions across all samples. Overall, we found a higher representation of lncRNAs in the nucleus (Fig. 3A). Out of all
expressed lncRNAs, we identified 1,114 lncRNAs in the nucleus compared to 118 lncRNA in the cytosol (_P__adj._ < 0.05; Supplementary Table 5). The analysis in Fig. 3A also shows the
nuclear localization of previously reported nuclear lncRNAs including _MALAT1_, _MEG3,_ and _XIST._ The nuclear and cytosolic enrichment of lncRNA in the individual tissues are shown in
Supplementary Fig. 6 (A-C). The adult frontal cortex samples harbored the highest number of lncRNAs that showed significant enrichment in either nucleus or cytosol, followed by the fetal
cortex and fetal cerebellum, respectively (Fig. 3B and Supplementary Table 6). Although there was an overlap within the cytosolic and within nuclear lncRNAs between the tissues, each tissue
type harbored a group of unique lncRNAs. To further understand how subcellular localization of lncRNAs varies between different tissue types, we explored the subcellular distribution of
lncRNAs in 10 cell lines from the ENCODE project. In this analysis, we only focused on lncRNAs that showed significant subcellular localization in either the cytosol or the nucleus in the
brain samples. Although we found an overlap of lncRNAs between the subcellular compartments, there were many lncRNAs that exhibited differential subcellular localization between the brain
samples and the cell lines (Fig. 3C). A range from 35 to 69 lncRNA displayed differential localization between the brain and each of the different cell lines (Supplementary Table 7), such as
_LINC00672_, which was cytosolic in the brain and nuclear in 7 out of 10 cell lines. The lncRNAs _RP11-452F19.3_ and _RP11-449H11.1_ were nuclear in the brain and cytosolic in 5 and 3
different cell lines, respectively. We also found that many of the lncRNAs that display differential localization between the brain and cell lines differ within the different cell types
(Supplementary Table 7). These results further support that subcellular localization of lncRNAs is often tissue specific. The lncRNAs that displayed similar localization patterns between the
brain and cell lines are listed in Supplementary Table 8. THE CYTOSOLIC AND NUCLEAR TRANSCRIPTOMES AS A SOURCE FOR GENE EXPRESSION ANALYSIS There is a longstanding interest in understanding
the reliability of using the cytosolic or nuclear RNA as a proxy for whole-cell gene expression analysis. To obtain a more accurate view of the differences between the cytosolic and nuclear
transcriptomes as a source for gene expression analysis, we compared the results of differential gene expression analysis between cytosolic and nuclear transcriptomes from the fetal frontal
cortex and the adult frontal cortex (cytosolic vs cytosolic [cyto–cyto], and nuclear vs nuclear [nuc–nuc]). To control for the differences in the amount of sequencing data produced from the
fetal and adult tissues (Supplementary Table 1), we first subsampled the read counts to obtain equal library sizes for the samples. Differentially expressed genes (DEGs) between the two
samples for the cyto–cyto and nuc–nuc before and after subsampling showed that the subsampling had a marginal effect on the number of DEGs (Fig. 4A). Subsampled data were used for the rest
of the analysis in Fig. 4. We then compared the number of up-regulated and down-regulated genes between the two tissues in the cyto–cyto and nuc–nuc data. There was a clear difference in the
number of DEGs detected in cyto–cyto and nuc–nuc, with a higher number in the cyto–cyto analysis compared to the nuc–nuc (Fig. 4B). Comparing the lists of DEGs between the two analyses
revealed that although there was an overlap between the DEGs identified from the two analyses, a large fraction of the DEGs was unique to cyto–cyto analysis (Fig. 4B). GO analysis for the
unique list of DEGs obtained from the cyto–cyto comparison revealed enrichment of transcripts related to translational processes, various metabolic and biosynthetic processes and gene
expression and RNA processing. Meanwhile, the unique DEGs from the nuc–nuc analysis were mainly enriched for transcripts that belong to mitochondrial translation, which as we mentioned was a
result of the mitochondria being separated with the nuclear fraction in the extraction protocol (Supplementary Table 9–12). Dividing the lists of DEGs from Fig. 4B into protein-coding and
lncRNA revealed that even though the nuc–nuc analysis resulted in a smaller number of protein-coding genes, it harbored a larger number of differentially expressed lncRNAs as compared to the
cyto–cyto analysis, which was not surprising given that most lncRNAs were localized to the nucleus (Fig. 4C). Collectively, these results indicate that although the cytosol and the nuclear
fractions contain largely overlapping lists of DEGs, each fraction also contains unique groups of DEGs. To our surprise, when we investigated the expression fold change of all DEGs obtained
from cyto–cyto, and the nuc–nuc analysis, a group of 61 genes showed opposite differential expression patterns (Fig. 4D). For example, _ETV2,_ which encodes a transcription factor and is
overexpressed and critically required during development, was up-regulated in the fetal sample when using cyto–cyto analysis and up-regulated in the adult sample when using the nuc–nuc
analysis. Other genes that showed opposite differential expression patterns are highlighted in Fig. 4D and listed in Supplementary Table 13 and Supplementary Table 14. These results imply
that gene expression analysis using either of these fractions could lead to different conclusions on differential expression between samples. Therefore, we suggest that neither the cytosolic
nor the nuclear transcriptomes alone represent an accurate proxy for a complete view of gene expression levels in a cell. In addition, using either the cytosol or the nucleus may lead to
conflicting data. A recent study by Lake et al_._ revealed a high concordance in the nuclear and whole-cell transcriptomes from mice when profiling cellular diversity using
cell-type-specific marker gene expression signatures36. We compared our list of significant cytosolic and nuclear transcripts obtained from all brain samples with the whole-cell and nuclear
data from Lake et al_._ and found that our analysis resulted in a larger number of cytosolic/nuclear-localized genes (5201 compared to 1518 cytosolic genes, and 5548 compared to 791). This
is likely due to the higher complexity and heterogeneity of samples used in our study. We also found that 65% of their cytosolic-accumulated and 45% nuclear-accumulated genes overlapped with
our results (Supplementary Fig. 7). We investigated overlaps between our list of genes that demonstrated opposite differential expression patterns between the cytosol and nucleus
(Supplementary Table 13 and 14) with the list of cell-type-specific genes from Lake et al. We found three cell-type specific markers: _Hsd17b10_ for Oligodendrocyte, _AK3_ for Astrocyte, and
_Chrnb1_ for Endothelial cells which showed opposite differential expression patterns between the cytosol and the nucleus i.e. _Hsd17b10_ and _Chrnb1_ were upregulated in the fetal sample
when using cyto–cyto analysis and upregulated in the adult sample when using the nuc–nuc analysis. Meanwhile, _AK3_ was upregulated in the adult sample when using cyto–cyto analysis and
upregulated in the fetal sample when using the nuc–nuc analysis. DISCUSSION To date, transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking
the impact of subcellular RNA localization and its influence on our understanding and interpretation of gene expression patterns in cells and tissues. We argue that defining gene expression
patterns at the subcellular level is crucial to (1) understand the differences between the cytosolic and nuclear transcriptomes and how each fraction contributes to the global gene
expression patterns, (2) investigate the dynamics of subcellular RNA localization between different cells and different developmental stages as a regulatory mechanism to control protein
expression and influence RNA function, and (3) evaluate the reliability of using nuclear transcriptome in single-cell studies as a proxy for whole-cell analysis. In this study, we therefore
aimed to obtain a comprehensive overview of the subcellular localization of the different RNA transcripts in the human brain. One group of transcripts strongly associated with the cytosol
was the nuclear-encoded mitochondrial transcripts. The biogenesis of the mitochondria relies on dual genetic origins; the nuclear and the mitochondrial genomes, but 99% of the mitochondrial
proteins are encoded in the nucleus31,37. This imposes the need for tight coordination between the two compartments to regulate gene expression. The detailed mechanisms that regulate gene
and protein expression coordination between the nucleus and the mitochondria are not well understood. Nevertheless, it is believed that this coordination occurs both at mRNA and protein
levels. Previous reports demonstrated that many mRNAs encoding NEMPs are localized to the outer membrane of mitochondria and it is believed that these mRNAs are translated locally38. It has
also been demonstrated that mRNAs coding for NEMPs localize to and are translated by ribosomes in the cytoplasm39. However, an accurate estimation of the fraction of NEMP mRNAs that localize
to the mitochondrial-membrane and cytosol is still an open question, particularly in humans as most of the current knowledge thus far is based on drosophila and yeast40. In this study, we
demonstrate, for the first time, that NEMP mRNAs in the human brain are significantly enriched in the cytosol as compared to the rest of the protein-coding genes. We validated this
observation in ENCODE cell lines, where independent extraction protocols were employed. We also showed that the cytosolic localization of NEMP mRNAs was not a result of their association to
the mitochondrial membrane since the mitochondria end up in the nuclear fraction in our cellular fractionation procedure. NEMPs have previously been assigned to classes based on their
location of translation in relation to mitochondria (MLR class)40, but we found no correlation between increased cytosolic localization and MLR class (data not shown). We were also unable to
find any correlation with the strength of the mitochondrial localization signal or scores based on evidence for mitochondrial localization (TargetP score or MitoCarta2.0 score) from the
MitoCarta database31 (data not shown). We further determine that NEMP mRNAs on average have a longer half-life than the rest of protein-coding genes. Thus, our data suggest that NEMP mRNAs
are rapidly transported from the nucleus after transcription and accumulate in the cytoplasm until their translation is required by the mitochondria. In line with our results, a recent study
in yeast sought to monitor nuclear and mitochondrial gene expression during mitochondrial biogenesis. The study demonstrated that the synchronization between the two compartments does not
occur on the transcriptional level. Instead, they are rapidly synchronized on protein translation level41. It is becoming increasingly evident that subcellular localization of lncRNAs is
strongly associated with their function. In light of this, ongoing efforts such as the LncATLAS and RNALocate are aiming to establish a complete map for subcellular localization of lncRNAs
in various tissue types3,30. So far, these databases rely on data from cell lines and lack knowledge about the subcellular localization of lncRNA in tissues and during development. Studies
that have investigated cellular localization of lncRNAs during development were mainly performed in drosophila and zebrafish42,43. Therefore, in this study, we sought to provide global data
of lncRNA subcellular localization in the human brain for two developmental stages. In accordance with previous studies, we found that the majority of lncRNAs are more abundant in the
nucleus than in the cytosol22. We further validated previous findings of tissue-specific expression and subcellular localization of lncRNAs in the adult frontal cortex, fetal frontal cortex,
and cerebellum. The higher number of cytosolic and nuclear-enriched lncRNAs detected in the adult frontal lobe compared to the fetal samples is most probably due to the larger number of
adult brain samples used in this study, as well as a more homogeneous expression in the adult brain compared to the fetal samples (represented by 14 and 38-week embryos), providing better
power to detect significant differences in localization. Considering the differences in subcellular localization between the different brain samples, we investigated whether subcellular
localization of lncRNAs could vary between different tissue contexts. When we compared the subcellular localization patterns of lncRNAs in the brain samples and ten cell lines from the
ENCODE project, we clearly show that some lncRNAs have opposite subcellular localization patterns in different samples. This indicates that the subcellular localization of certain lncRNAs is
dynamic, which could be explained by several processes including tissue-specific functionality or regulation, or that some lncRNA are retained in the nuclear compartment to regulate their
function, assuming that their site of action is in the cytosol. Our results provide a resource for lncRNAs enriched in cytosol and nucleus in both fetal and adult human brain samples. We
also provide evidence for the differential subcellular localization of lncRNAs in tissues. These data further encourage the need to understand the mechanisms that control the subcellular
fate of lncRNAs and the biological consequences of this process. The impact of using transcriptome data obtained from either cytosolic or nuclear RNA as a reliable proxy for gene expression
analysis is an active topic of investigation. Recently, single nucleus RNA sequencing has been introduced as an alternative to whole single-cell sequencing, particularly in tissues where it
is challenging or impossible to recover intact cells33,44,45. In addition to the data presented in this study, several other studies demonstrated significant differences between the nuclear
and the cytosolic transcriptome17,46. Also, the compartmentalization of mRNA transcripts in the nucleus was shown to play an active role in regulating gene expression in different biological
contexts and to prevent fluctuations arising from bursts in gene transcription12,47. On the same note, several studies revealed a higher correlation between RNA-seq data from the cytosol
and whole-cell RNA-seq compared with RNA-seq data from nuclei15,47. On the other hand, several studies comparing data from single-nuclei and single-cell concluded that nuclear RNA sequencing
faithfully replicates data from the whole-cell33,36,44. In light of this, we sought to analyze gene expression differences between fetal and adult brain cortex, and compare the results from
cytosol vs cytosol RNA-seq to the results from nucleus vs nucleus RNA-seq. We identified almost twice the number of differentially expressed genes in the cyto vs cyto analysis compared to
the nuc vs nuc analysis. Our data show that even though there was an overlap between the lists of DEGs between two analyses, almost 50% of the cyto vs cyto and 35% of the nuc vs nuc DEGs
were unique for each analysis. We also note that 61 transcripts showed contradictory differential expression patterns between the two analyses. A recent study by Lake et al_._36. found high
concordance between single nuclei and whole-cell data and concluded the efficiency of single nuclei sequencing to interpret cellular diversity. We found on average more than 50% of
cytosolic-enriched and nuclear-enriched transcripts from their study overlap with cytosolic and nuclear-enriched transcripts found in our data, despite the fact that the studies were
performed in different species and with different RNA preparation and sequencing protocols. We also found overlap, particularly for the metabolic processes, between the two data sets for the
list of ontology categories enriched in both the cytosol (or whole cell in the Lake et al. study) and nucleus. While we find a large overlap in genes identified as differentially expressed
between samples using comparisons of cytosolic vs cytosolic and nuclear vs nuclear RNA, respectively, our data also highlight that certain categories of genes will give rise to differences
in results depending on the cellular compartment used. This includes lncRNA, which are generally overrepresented in the nucleus, but primarily thought to be located in the compartment where
they are functional. We further find many genes to be exclusively found in either cyto vs cyto or nuc vs nuc analysis, with e.g. differential expression of genes involved in translational
processes and ribosomal function to be better assayed using cytosolic RNA. In single-cell analysis, there are now different protocols ranging from those employing very mild lysis that will
primarily interrogate cytosolic RNA, to harsh lysis and single nucleus sequencing that will capture the whole cell or only nuclear RNA, respectively. Our results show important differences
that may arise depending on the protocol used, and highlight the importance of which subcellular fraction that is analyzed in interpreting and comparing results from single-cell studies.
CONCLUSIONS Our data provide the first resource for the subcellular localization of RNA transcript in the human brain. We show significant differences in RNA expression for both
protein-coding and lncRNAs between the cytosol and the nucleus. We also provide novel knowledge about the subcellular localization of NEMP transcripts, which could provide deeper insights
into mitochondrial gene expression regulation. Our data suggest that using cytosol or nuclear RNA as a source for gene expression analysis might bias measurements of transcript levels.
METHODS SAMPLE PREPARATION Cytoplasmic and nuclear RNA was fractionated and purified from brain samples and SHSY-5Y cells using Cytoplasmic and nuclear RNA purification kit (Norgen) with
modifications as published in28. For each sample, 20 mg of frozen tissue was ground in liquid nitrogen using mortar and pestle. Tissue powder was transferred to ice cold 1.5 ml tubes and 200
μl lysis buffer (Norgen) was added. The tubes were incubated on ice for 10 min and then centrifuged for 3 min at 13,000 RPM to separate the cellular fractions. The supernatant containing
the cytoplasmic fraction and the pellet containing the nuclei were mixed with 400 μl 1.6 M sucrose solution and carefully layered on the top of two 500 μl sucrose solution in two separate
tubes. Both fractions were then centrifuged at 13,000-RPM for 15 min (4 °C). The cytoplasmic fraction was collected from the top of the sucrose cushion and the cytoplasmic RNA was then
further purified according to Norgen kit recommendations. The nuclear pellet was collected from the bottom of the tube and washed with 200 μl 1 × PBS. The nuclei were collected after another
centrifugation at 13,000 RPM for 3 min. The nuclear RNA was purified from the nuclear fraction according to the Norgen kit recommendations. The quality of the RNA extracted from the
cytosolic and nuclear fractions was examined using Bioanalyzer (Supplementary Fig. 8). As an additional support for the efficiency of our extraction protocol, we demonstrated that known
Small nuclear RNAs (snRNAs), in agreement with previously published data, are predominantly located in the nuclear fraction ((Supplementary Fig. 9). Prior to RNA extraction, SHSY-5Y cells
were cultured in 1:1 mixture of Ham's F12 and DMEM (Gibco) medium (Gibco) with L-glutamine and phenol red, supplemented with 2 mM l-glutamine (Sigma), 10% FBS (Sigma) and 1 × PEST
(Sigma). Cells were incubated at 37 °C, 5% CO2. 3 × 107 cells were used for RNA purification. RNA SEQUENCING _C_ytoplasmic RNA was treated with the Ribo-Zero Magnetic Gold Kit for
human/mouse/Rat (Epicentre) to remove ribosomal RNA, and purified using Agencourt RNAClean XP Kit (Beckman Coulter). The rRNA depleted cytoplasmic RNA and the nuclear RNA were then treated
with RNaseIII according to Ion Total RNA-Seq protocol v2 and purified with Magnetic Bead Cleanup Module (Thermo Fisher). Sequencing libraries were prepared using the Ion Total RNA-Seq Kit
for the AB Library Builder System and quantified using the Fragment analyzer (Advanced Analytical). The quantified libraries were pooled followed by emulsion PCR on the Ion OneTouch 2 system
and sequenced on the Ion Proton System. Nuclear RNA was sequenced on two Ion Proton chips yielding an average 80.5 million mapped reads per sample and cytosolic RNA was sequenced on one
chip yielding 46.9 million mapped reads per sample. READ MAPPING AND EXPRESSION QUANTIFICATION Quality metrics of the raw data were determined with RSeqC 2.6.148. Reads shorter than 36 bp
were filtered out and adaptor sequences were trimmed from the remaining data using Cutadapt v1.9.1. Read alignment was done with STAR v2.4.249, with default parameters using the human
reference genome (hg19). Gene expression was quantified from the alignments using HTSeq count (v0.6.1)50 by providing Ensembl gene annotation (v75) in GTF format. Only uniquely mapped reads
overlapping with exonic regions of a single gene were considered for quantification. Gene biotype annotation was extracted from the Ensembl GTF file. DIFFERENTIAL EXPRESSION ANALYSIS
Differential expression analysis was performed on counts of mapped reads for each gene in R version 3.2.151_._ Strand-specific read counts were used with the R package DESeq2 (v1.12.4)52 to
identify differentially expressed genes (adjusted _p-value_ < 0.05, Benjamini–Hochberg correction). Gene counts were normalized prior to differential analysis using the normalization
method implemented in DESeq2. To identify differentially expressed snoRNA genes, all the rRNA and tRNA genes were removed from the quantified genes prior to performing differential analysis.
For the cyto–cyto and nuc–nuc differential gene expression analysis, we subsampled the read counts from the samples to obtain equal library sizes using metaseqR53. Differential expression
analysis was performed in the similar way as mentioned earlier. SAMPLES CLUSTERING AND PCA ANALYSIS Euclidean distances between the samples were calculated on regularized log-transformed
(Rlog) counts. The distance matrix was used to compute hierarchical clustering of the samples. The Principal Component Analysis (PCA) was performed with prcomp function in R using Rlog
values of top 500 genes with high variance. Clustering and PCA were based on protein-coding and lncRNA genes. GENE ONTOLOGY ENRICHMENT The gene ontology (GO) term enrichment of DEGs was
analyzed using GOSeq (v.1.24)54, controlling for gene-length bias. All genes expressed in the samples were used as a background set in order to find over-represented GO terms in the DEGs.
Hypergeometric test (Benjamini and Hochberg correction) was performed to compensate for multiple testing at a significance level set to a value of < 0.05 in our analyses. GENE LENGTH
DISTRIBUTION Genes expressed in all fetal and adult tissues (base mean expression ≥ 10) were sorted by length. We then employed a sliding window of 200 genes in steps of 40 genes. The
log2-fold-change values for the 200 genes within each length bin were averaged and standard errors (SE) for a bin were calculated by propagating the SE determined from the bin’s
log2-fold-change values and the mean SE of the individual genes reflecting their sample variability. Similarly, we computed the relationship of transcript and UTRs length, and gene length
with expression (RPKM). NUCLEAR-ENCODED-MITOCHONDRIAL PROTEINS A list of 1158 human nuclear-encoded-mitochondrial proteins (NEMPs) was obtained from the Mitocarta2.0 project
(https://www.broadinstitute.org/pubs/MitoCarta/). Expression fold changes were estimated after normalizing the samples for only protein-coding genes using DESeq2. Significance of difference
in fold change distribution between NEMPs and other protein-coding genes (baseMean ≥ 10) was determined using the nonparametric Mann–Whitney–Wilcoxon test. ENCODE DATA Gene counts (hg19) of
polyA and non-polyA RNAseq data of cytoplasmic and nuclear fractions for the cell lines: GM12878, HUVEC, HepG2, HeLa-S3, NHEK, K562, IMR90, MCF-7, A549, and SK-N-SH, were obtained from
ENCODE project (GEO:GSE30567). For NEMPs and ribosomal protein-coding genes analyses, polyA data were normalized for protein-coding genes. For lncRNA analysis, gene counts (polyA) were
normalized for protein-coding and lncRNA genes. RNA FISH RNA FISH assays were performed using a custom-designed oligonucleotide probe set for OAT and SLC25A5, both labeled with CAL Fluor Red
610 and DesignReady Probe Sets for MALAt1 labeled with Quasar 670 Dye, and GAPDH labeled with CAL Fluor Red 610 (Stellaris, LGC Biosearch Technologies). 48 oligonucleotides were selected
for OAT, MALAT1, and GAPDH, and 27 oligonucleotides for SLC25A5. All probes were designed with a minimal mismatch. RNA FISH was carried using hybridization and wash buffers from LGC
Biosearch Technologies. In short, 10^5 human SHSY-5Y cells were seeded on 18 mm #1 round cover glass (VWR) in a 12 well culture plate. Cells were cultured in 1:1 mixture of Ham's F12
and DMEM (Gibco) medium (Gibco) with l-glutamine and phenol red, supplemented with 2 mM l-glutamine (Sigma), 10% FBS (Sigma) and 1 × PEST (Sigma). Cells were maintained in culture at 37 °C
and 5% CO2. Following overnight incubation, cells were fixed and permeabilized according to manufacturer recommendations. Cells were incubated for 16 h with FISH probes in the hybridization
buffer (LGC Biosearch Technologies). Cells were then washed using wash buffer A (LGC Biosearch Technologies) for 30 min at 37 °C, and DNA stained with DAPI (1 μg/ml) for 30 min at 37 °C.
Finally, cells were washed in wash buffer B, and coverslips were mounted in Vectashield anti-fade (VectorLabs) and imaged immediately. Imaging was carried out on Zeiss Axio Imager Z2
Widefield Fluorescence Microscopy using an Olympus 63x/1.42 PlanApo objective and using and ZEN imaging software. Z-stacks were collected at 0.5 μm intervals for 18 fields per sample.
Z-stacks were deconvolved using orthogonal projection (ZEN imaging software). ETHICS APPROVAL AND CONSENT All samples were collected with informed consent and the study was approved by the
regional ethical review board in Uppsala (dnr 2012/082) in accordance with regulations, guidelines, ethics codes and laws that regulate and place ethical demands on the research process in
Uppsala University. DATA AVAILABILITY RNA-seq data are deposited in the NCBI Sequence Read Archive (SRA) under accession GSE110727). REFERENCES * Martin, K. C. & Ephrussi, A. mRNA
localization: Gene expression in the spatial dimension. _Cell_ 136, 719–730. https://doi.org/10.1016/j.cell.2009.01.044 (2009). Article CAS PubMed PubMed Central Google Scholar *
Kuersten, S. & Goodwin, E. B. The power of the 3’ UTR: Translational control and development. _Nat. Rev. Genet._ 4, 626–637. https://doi.org/10.1038/nrg1125 (2003). Article CAS PubMed
Google Scholar * Zhang, T. _et al._ RNALocate: A resource for RNA subcellular localizations. _Nucleic Acids Res._ https://doi.org/10.1093/nar/gkw728 (2016). Article PubMed PubMed
Central Google Scholar * Taliaferro, J. M., Wang, E. T. & Burge, C. B. Genomic analysis of RNA localization. _RNA Biol._ 11, 1040–1050. https://doi.org/10.4161/rna.32146 (2014).
Article PubMed PubMed Central Google Scholar * Buxbaum, A. R., Haimovich, G. & Singer, R. H. In the right place at the right time: Visualizing and understanding mRNA localization.
_Nat. Rev. Mol. Cell Biol._ 16, 95–109. https://doi.org/10.1038/nrm3918 (2015). Article CAS PubMed Google Scholar * St Johnston, D. Moving messages: The intracellular localization of
mRNAs. _Nat. Rev. Mol. Cell Biol._ 6, 363–375. https://doi.org/10.1038/nrm1643 (2005). Article CAS PubMed Google Scholar * Steward, O. & Schuman, E. M. Compartmentalized synthesis
and degradation of proteins in neurons. _Neuron_ 40, 347–359 (2003). Article CAS PubMed Google Scholar * Marrison, J. L., Schunmann, P., Ougham, H. J. & Leech, R. M. Subcellular
visualization of gene transcripts encoding key proteins of the chlorophyll accumulation process in developing chloroplasts. _Plant Physiol._ 110, 1089–1096 (1996). Article CAS PubMed
PubMed Central Google Scholar * Sylvestre, J., Vialette, S., Corral Debrinski, M. & Jacq, C. Long mRNAs coding for yeast mitochondrial proteins of prokaryotic origin preferentially
localize to the vicinity of mitochondria. _Genome Biol._ 4, R44. https://doi.org/10.1186/gb-2003-4-7-r44 (2003). Article PubMed PubMed Central Google Scholar * Russo, A., Russo, G.,
Cuccurese, M., Garbi, C. & Pietropaolo, C. The 3’-untranslated region directs ribosomal protein-encoding mRNAs to specific cytoplasmic regions. _Biochim. Biophys. Acta_ 1763, 833–843.
https://doi.org/10.1016/j.bbamcr.2006.05.010 (2006). Article CAS PubMed Google Scholar * Barthelson, R. A., Lambert, G. M., Vanier, C., Lynch, R. M. & Galbraith, D. W. Comparison of
the contributions of the nuclear and cytoplasmic compartments to global gene expression in human cells. _BMC Genomics_ 8, 340. https://doi.org/10.1186/1471-2164-8-340 (2007). Article CAS
PubMed PubMed Central Google Scholar * Bahar Halpern, K. _et al._ Nuclear retention of mRNA in mammalian tissues. _Cell Rep._ 13, 2653–2662. https://doi.org/10.1016/j.celrep.2015.11.036
(2015). Article CAS PubMed PubMed Central Google Scholar * Hachet, O. & Ephrussi, A. Splicing of oskar RNA in the nucleus is coupled to its cytoplasmic localization. _Nature_ 428,
959–963. https://doi.org/10.1038/nature02521 (2004). Article ADS CAS PubMed Google Scholar * Lawrence, J. B. & Singer, R. H. Intracellular localization of messenger RNAs for
cytoskeletal proteins. _Cell_ 45, 407–415 (1986). Article CAS PubMed Google Scholar * Solnestam, B. W. _et al._ Comparison of total and cytoplasmic mRNA reveals global regulation by
nuclear retention and miRNAs. _BMC Genomics_ 13, 574. https://doi.org/10.1186/1471-2164-13-574 (2012). Article CAS PubMed PubMed Central Google Scholar * Trask, H. W. _et al._
Microarray analysis of cytoplasmic versus whole cell RNA reveals a considerable number of missed and false positive mRNAs. _RNA_ 15, 1917–1928. https://doi.org/10.1261/rna.1677409 (2009).
Article CAS PubMed PubMed Central Google Scholar * Chen, L. A global comparison between nuclear and cytosolic transcriptomes reveals differential compartmentalization of alternative
transcript isoforms. _Nucleic Acids Res._ 38, 1086–1097. https://doi.org/10.1093/nar/gkp1136 (2010). Article CAS PubMed Google Scholar * Neve, J. _et al._ Subcellular RNA profiling links
splicing and nuclear DICER1 to alternative cleavage and polyadenylation. _Genome Res._ 26, 24–35. https://doi.org/10.1101/gr.193995.115 (2016). Article CAS PubMed PubMed Central Google
Scholar * Derrien, T. _et al._ The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. _Genome Res._ 22, 1775–1789.
https://doi.org/10.1101/gr.132159.111 (2012). Article CAS PubMed PubMed Central Google Scholar * Iyer, M. K. _et al._ The landscape of long noncoding RNAs in the human transcriptome.
_Nat. Genet._ 47, 199–208. https://doi.org/10.1038/ng.3192 (2015). Article CAS PubMed PubMed Central Google Scholar * Ayupe, A. C. _et al._ Global analysis of biogenesis, stability and
sub-cellular localization of lncRNAs mapping to intragenic regions of the human genome. _RNA Biol._ 12, 877–892. https://doi.org/10.1080/15476286.2015.1062960 (2015). Article PubMed PubMed
Central Google Scholar * Djebali, S. _et al._ Landscape of transcription in human cells. _Nature_ 489, 101–108. https://doi.org/10.1038/nature11233 (2012). Article ADS CAS PubMed
PubMed Central Google Scholar * Bhatt, D. M. _et al._ Transcript dynamics of proinflammatory genes revealed by sequence analysis of subcellular RNA fractions. _Cell_ 150, 279–290.
https://doi.org/10.1016/j.cell.2012.05.043 (2012). Article CAS PubMed PubMed Central Google Scholar * van Heesch, S. _et al._ Extensive localization of long noncoding RNAs to the
cytosol and mono- and polyribosomal complexes. _Genome Biol._ 15, R6. https://doi.org/10.1186/gb-2014-15-1-r6 (2014). Article CAS PubMed PubMed Central Google Scholar * Cabili, M. N.
_et al._ Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. _Genome Biol_ 16, 20. https://doi.org/10.1186/s13059-015-0586-4 (2015). Article
CAS PubMed PubMed Central Google Scholar * Chen, L. L. Linking long noncoding RNA localization and function. _Trends Biochem. Sci._ 41, 761–772.
https://doi.org/10.1016/j.tibs.2016.07.003 (2016). Article CAS PubMed Google Scholar * Hochgerner, H. _et al._ STRT-seq-2i: Dual-index 5’ single cell and nucleus RNA-seq on an
addressable microwell array. _Sci. Rep._ 7, 16327. https://doi.org/10.1038/s41598-017-16546-4 (2017). Article ADS CAS PubMed PubMed Central Google Scholar * Zaghlool, A. _et al._
Efficient cellular fractionation improves RNA sequencing analysis of mature and nascent transcripts from human tissues. _BMC Biotechnol._ 13, 99. https://doi.org/10.1186/1472-6750-13-99
(2013). Article CAS PubMed PubMed Central Google Scholar * Ameur, A. _et al._ Total RNA sequencing reveals nascent transcription and widespread co-transcriptional splicing in the human
brain. _Nat. Struct. Mol. Biol._ 18, 1435–1440. https://doi.org/10.1038/nsmb.2143 (2011). Article CAS PubMed Google Scholar * Mas-Ponte, D. _et al._ LncATLAS database for subcellular
localization of long noncoding RNAs. _RNA_ 23, 1080–1087. https://doi.org/10.1261/rna.060814.117 (2017). Article CAS PubMed PubMed Central Google Scholar * Calvo, S. E., Clauser, K. R.
& Mootha, V. K. MitoCarta2.0: An updated inventory of mammalian mitochondrial proteins. _Nucleic Acids Res_ 44, D1251–1257, https://doi.org/10.1093/nar/gkv1003 (2016). * Pagliarini, D.
J. _et al._ A mitochondrial protein compendium elucidates complex I disease biology. _Cell_ 134, 112–123. https://doi.org/10.1016/j.cell.2008.06.016 (2008). Article CAS PubMed PubMed
Central Google Scholar * Grindberg, R. V. _et al._ RNA-sequencing from single nuclei. _Proc. Natl. Acad. Sci. U S A_ 110, 19802–19807. https://doi.org/10.1073/pnas.1319700110 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar * Consortium, E. P. An integrated encyclopedia of DNA elements in the human genome. _Nature_ 489, 57–74.
https://doi.org/10.1038/nature11247 (2012). Article ADS CAS Google Scholar * Sharova, L. V. _et al._ Database for mRNA half-life of 19 977 genes obtained by DNA microarray analysis of
pluripotent and differentiating mouse embryonic stem cells. _DNA Res_ 16, 45–58. https://doi.org/10.1093/dnares/dsn030 (2009). Article CAS PubMed Google Scholar * Lake, B. B. _et al._ A
comparative strategy for single-nucleus and single-cell transcriptomes confirms accuracy in predicted cell-type expression from nuclear RNA. _Sci. Rep._ 7, 6031.
https://doi.org/10.1038/s41598-017-04426-w (2017). Article ADS CAS PubMed PubMed Central Google Scholar * Smith, A. C. & Robinson, A. J. MitoMiner v3.1, an update on the
mitochondrial proteomics database. _Nucleic Acids Res_ 44, D1258–1261, https://doi.org/10.1093/nar/gkv1001 (2016). * Lesnik, C., Golani-Armon, A. & Arava, Y. Localized translation near
the mitochondrial outer membrane: An update. _RNA Biol._ 12, 801–809. https://doi.org/10.1080/15476286.2015.1058686 (2015). Article PubMed PubMed Central Google Scholar *
Corral-Debrinski, M., Blugeon, C. & Jacq, C. In yeast, the 3’ untranslated region or the presequence of ATM1 is required for the exclusive localization of its mRNA to the vicinity of
mitochondria. _Mol. Cell Biol._ 20, 7881–7892. https://doi.org/10.1128/mcb.20.21.7881-7892.2000 (2000). Article CAS PubMed PubMed Central Google Scholar * Pearce, S. F. _et al._
Regulation of mammalian mitochondrial gene expression: Recent advances. _Trends Biochem. Sci._ 42, 625–639. https://doi.org/10.1016/j.tibs.2017.02.003 (2017). Article CAS PubMed PubMed
Central Google Scholar * Couvillion, M. T., Soto, I. C., Shipkovenska, G. & Churchman, L. S. Synchronized mitochondrial and cytosolic translation programs. _Nature_ 533, 499–503.
https://doi.org/10.1038/nature18015 (2016). Article ADS CAS PubMed PubMed Central Google Scholar * Wilk, R., Hu, J., Blotsky, D. & Krause, H. M. Diverse and pervasive subcellular
distributions for both coding and long noncoding RNAs. _Genes Dev._ 30, 594–609. https://doi.org/10.1101/gad.276931.115 (2016). Article CAS PubMed PubMed Central Google Scholar * Pauli,
A. _et al._ Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis. _Genome Res._ 22, 577–591. https://doi.org/10.1101/gr.133009.111 (2012). Article CAS
PubMed PubMed Central Google Scholar * Krishnaswami, S. R. _et al._ Using single nuclei for RNA-seq to capture the transcriptome of postmortem neurons. _Nat. Protoc._ 11, 499–524.
https://doi.org/10.1038/nprot.2016.015 (2016). Article CAS PubMed PubMed Central Google Scholar * Habib, N. _et al._ Massively parallel single-nucleus RNA-seq with DroNc-seq. _Nat.
Methods_ 14, 955–958. https://doi.org/10.1038/nmeth.4407 (2017). Article CAS PubMed PubMed Central Google Scholar * Battich, N., Stoeger, T. & Pelkmans, L. Control of transcript
variability in single mammalian cells. _Cell_ 163, 1596–1610. https://doi.org/10.1016/j.cell.2015.11.018 (2015). Article CAS PubMed Google Scholar * Pastro, L. _et al._ Nuclear
compartmentalization contributes to stage-specific gene expression control in _Trypanosoma cruzi_. _Front. Cell Dev. Biol._ 5, 8. https://doi.org/10.3389/fcell.2017.00008 (2017). Article
PubMed PubMed Central Google Scholar * Wang, L., Wang, S. & Li, W. RSeQC: Quality control of RNA-seq experiments. _Bioinformatics_ 28, 2184–2185.
https://doi.org/10.1093/bioinformatics/bts356 (2012). Article CAS PubMed Google Scholar * Dobin, A. _et al._ STAR: Ultrafast universal RNA-seq aligner. _Bioinformatics_ 29, 15–21.
https://doi.org/10.1093/bioinformatics/bts635 (2013). Article CAS PubMed Google Scholar * Anders, S., Pyl, P. T. & Huber, W. HTSeq—A Python framework to work with high-throughput
sequencing data. _Bioinformatics_ 31, 166–169. https://doi.org/10.1093/bioinformatics/btu638 (2015). Article CAS PubMed Google Scholar * R Core Team. _R: A Language and Environment for
Statistical Computing_. (R Foundation for Statistical Computing, Vienna, 2015) https://www.R-project.org/. * Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and
dispersion for RNA-seq data with DESeq2. _Genome Biol._ 15, 550. https://doi.org/10.1186/s13059-014-0550-8 (2014). Article CAS PubMed PubMed Central Google Scholar * Moulos, P. &
Hatzis, P. Systematic integration of RNA-Seq statistical algorithms for accurate detection of differential gene expression patterns. _Nucleic Acids Res._ 43, e25.
https://doi.org/10.1093/nar/gku1273 (2015). Article CAS PubMed Google Scholar * Young, M. D., Wakefield, M. J., Smyth, G. K. & Oshlack, A. Gene ontology analysis for RNA-seq:
Accounting for selection bias. _Genome Biol._ 11, R14. https://doi.org/10.1186/gb-2010-11-2-r14 (2010). Article CAS PubMed PubMed Central Google Scholar Download references
ACKNOWLEDGEMENTS The authors would like to acknowledge support from Science for Life Laboratory, the National Genomics Infrastructure, NGI, and Uppmax for providing assistance in massive
parallel sequencing and computational infrastructure. We are also grateful for the Bioinformatics Long-term Support from NBIS (SciLifeLab) for this project. This work was supported by grants
Swedish Research Council (2012-4530 and 2012-2639) and the European Research Council ERC Starting Grant Agreement n. 282330. FUNDING Open access funding provided by Uppsala University.
AUTHOR INFORMATION Author notes * These authors contributed equally: Ammar Zaghlool and Adnan Niazi. AUTHORS AND AFFILIATIONS * Department of Immunology, Genetics and Pathology, Uppsala
University, BMC B11:4, Box 815, 751 08, Uppsala, Sweden Ammar Zaghlool, Adnan Niazi, Adam Ameur & Lars Feuk * Science for Life Laboratory in Uppsala, Uppsala University, Uppsala, Sweden
Ammar Zaghlool, Adnan Niazi, Adam Ameur & Lars Feuk * Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala
University, Husargatan 3, 752 37, Uppsala, Sweden Åsa K. Björklund * Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory,
Stockholm University, Box 1031, 17121, Solna, Sweden Jakub Orzechowski Westholm Authors * Ammar Zaghlool View author publications You can also search for this author inPubMed Google Scholar
* Adnan Niazi View author publications You can also search for this author inPubMed Google Scholar * Åsa K. Björklund View author publications You can also search for this author inPubMed
Google Scholar * Jakub Orzechowski Westholm View author publications You can also search for this author inPubMed Google Scholar * Adam Ameur View author publications You can also search for
this author inPubMed Google Scholar * Lars Feuk View author publications You can also search for this author inPubMed Google Scholar CONTRIBUTIONS A.Z. and L.F. conceived and designed the
study. A.Z., A.N., Å.B., J.W, A.A and L.F. coordinated experiments and analysis. A.N., Å.B conducted the bioinformatics analysis. A.Z. did the sample preparation and experimental analysis.
All authors participated in discussions of different parts of the study. A.Z., A.N., Å.B., and L.F. wrote the manuscript. All authors read and approved the manuscript. CORRESPONDING AUTHORS
Correspondence to Ammar Zaghlool or Lars Feuk. ETHICS DECLARATIONS COMPETING INTERESTS The authors declare no competing interests. ADDITIONAL INFORMATION PUBLISHER'S NOTE Springer
Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. SUPPLEMENTARY INFORMATION SUPPLEMENTARY TABLES. SUPPLEMENTARY FIGURE S1.
SUPPLEMENTARY FIGURE S2. SUPPLEMENTARY FIGURE S3. SUPPLEMENTARY FIGURE S4. SUPPLEMENTARY FIGURE S5. SUPPLEMENTARY FIGURE S6. SUPPLEMENTARY FIGURE S7. SUPPLEMENTARY FIGURE S8. SUPPLEMENTARY
FIGURE S9. RIGHTS AND PERMISSIONS OPEN ACCESS This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution
and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if
changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will
need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Reprints and permissions ABOUT THIS ARTICLE
CITE THIS ARTICLE Zaghlool, A., Niazi, A., Björklund, Å.K. _et al._ Characterization of the nuclear and cytosolic transcriptomes in human brain tissue reveals new insights into the
subcellular distribution of RNA transcripts. _Sci Rep_ 11, 4076 (2021). https://doi.org/10.1038/s41598-021-83541-1 Download citation * Received: 24 April 2020 * Accepted: 20 January 2021 *
Published: 18 February 2021 * DOI: https://doi.org/10.1038/s41598-021-83541-1 SHARE THIS ARTICLE Anyone you share the following link with will be able to read this content: Get shareable
link Sorry, a shareable link is not currently available for this article. Copy to clipboard Provided by the Springer Nature SharedIt content-sharing initiative