Alternative splicing variability in human populations
Mar Gonzalez-Porta, Micha Sammeth, Miquel Calvo, Roderic Guigo
Center for Genomic Regulation, Barcelona, Catalonia, Spain

We have developed statistical methodology to measure variation in gene expression and in splicing ratios within and between populations, and to deconvolute the contribution of each to the total variability in the abundances of individual transcripts. We have applied this methodology to estimates of transcript abundances obtained from RNA-seq experiments in lymphoblastoid cells from Caucasian and Yoruban individuals. We have found that protein-coding genes exhibit reduced gene expression variability in human populations, and an even greater reduction in the variability of splicing ratios, with many genes exhibiting constant ratios across individuals. Consistent with this observation, we have found that genes involved in the regulation of splicing show less expression variability than human genes overall. While splicing variability is correlated between populations, up to 10% of protein-coding genes could exhibit population-specific splicing ratios. We estimate that about 50% of the total variability observed in the abundance of transcript forms can be explained by variability in transcription. A large fraction of the remaining variance likely results from variability in splicing, although variability in splicing is uncommon without variability in transcription. Genes with high total variability (resulting from variability in both transcription and splicing) are particularly enriched in RNA-binding functions. Consistent with this finding (and with the reduced variability of splicing factors), we have also found that long non-coding RNAs show higher expression variability than protein-coding genes. This suggests that variation in the expression of long non-coding RNAs may play an important role in establishing the molecular basis of intraspecies phenotypic individuality.
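
The idea of splitting transcript variability into a transcription component and a splicing component can be illustrated with a minimal sketch. It is not the methodology described in this abstract, which is not specified here. It assumes only that a transcript's abundance is the product of its gene's expression and a splicing ratio, so that on a log scale the total variance equals the expression variance plus the splicing variance plus twice their covariance. The function names and the toy abundances below are hypothetical.

```python
import math
from statistics import mean


def variance(xs):
    """Population variance (divides by n)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def covariance(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def decompose(transcript_abundance, gene_expression):
    """Split log-scale transcript variance across individuals.

    Assumes transcript = gene expression * splicing ratio, hence
    log(transcript) = log(gene) + log(ratio). This is an illustrative
    identity, not the authors' statistical model.
    """
    log_gene = [math.log(g) for g in gene_expression]
    log_ratio = [
        math.log(t / g) for t, g in zip(transcript_abundance, gene_expression)
    ]
    log_tx = [math.log(t) for t in transcript_abundance]
    return {
        "total": variance(log_tx),
        "expression": variance(log_gene),
        "splicing": variance(log_ratio),
        "covariance": 2 * covariance(log_gene, log_ratio),
    }


# Made-up abundances for one gene and one of its transcripts
# in four hypothetical individuals.
gene = [100.0, 150.0, 80.0, 120.0]
tx = [60.0, 75.0, 56.0, 66.0]
parts = decompose(tx, gene)
for name, value in parts.items():
    print(f"{name}: {value:.4f}")
```

In this sketch the covariance term is reported separately rather than being assigned to either component, since how shared variance is attributed is a modeling choice.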