Recent work has identified a new component of the proteome: peptides translated from small open reading frames (sORFs) located within the untranslated and coding regions of mRNAs and in a variety of non-coding transcripts. In this study we have focused on the coding potential of upstream open reading frames (uORFs). We hypothesize that some uORFs encode bioactive peptides, which we have termed uPEPs (uORF-encoded peptides). Notably, uORF mutations have been reported in association with genetic disorders, including Marie Unna hereditary hypotrichosis.
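To make the uORF definition concrete, the following minimal Python sketch scans an mRNA for ATG-initiated ORFs that start upstream of the annotated coding sequence (CDS) and translates them into candidate uPEP sequences. It is not the uPEPperoni algorithm, which is not described here. The function name `find_uorfs`, the assumption that the CDS start index is already known, and the `min_codons` length cut-off are all hypothetical. Non-ATG start codons are ignored, and uORFs that overlap the main CDS are kept.

```python
# Minimal sketch (not the uPEPperoni algorithm): find ATG-initiated upstream
# ORFs in an mRNA, assuming the 0-based index of the annotated CDS start codon
# is known. `min_codons` is a hypothetical length cut-off.
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Standard genetic code, indexed in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_uorfs(mrna: str, cds_start: int,
               min_codons: int = 10) -> List[Tuple[int, int, str]]:
    """Return (start, end, peptide) for each ATG-initiated ORF that begins
    before cds_start and ends at an in-frame stop codon. `end` is exclusive
    and includes the stop codon; ORFs without a stop codon are skipped."""
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    for start in range(min(cds_start, len(seq))):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk codon by codon in this reading frame until the first stop.
        for pos in range(start, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    # Translate; ambiguous codons (e.g. containing N) become X.
                    peptide = "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                                      for p in range(start, pos, 3))
                    uorfs.append((start, pos + 3, peptide))
                break
    return uorfs


# Toy transcript: a 3-codon uORF (AUG GCU UGG UAA) upstream of a CDS at index 17.
print(find_uorfs("CCAUGGCUUGGUAACCCAUGAAAUAG", cds_start=17, min_codons=1))
# [(2, 14, 'MAW')]
```

A real pipeline would additionally compare the translated uPEPs across species to assess conservation, which this sketch does not attempt.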
We used an in-house sequence comparison program (uPEPperoni) to identify ~500 uPEP sequences that are highly conserved between human and other vertebrate transcripts. We cloned several of these into GFP expression vectors and expressed them in HeLa cells. uPEP expression was visualized by confocal microscopy, and five of the expressed uPEPs exhibited organelle-specific localization. We also confirmed the localization of three uPEPs using fluorescently labelled synthetic peptides. We complemented these studies with mass spectrometry to confirm the expression of uORF-derived peptides. Lysates from HeLa and 293 cells were enriched for low-molecular-weight proteins, digested with trypsin and analysed by nanoLC-MS/MS. The resulting MS/MS data were searched against SwissProt, and all unmatched spectra were then searched against the uPEPperoni output. MS/MS spectra matching uORF-encoded peptides were identified in both cell types. In particular, multiple peptides matching an 11.1 kDa uPEP were identified. Bioinformatic analysis showed that this uPEP is conserved across a number of species, including human, mouse, orangutan, rat, zebrafish and salmon.
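The two-stage database search can be illustrated with a simplified Python sketch. Real nanoLC-MS/MS searches score spectra against theoretical precursor and fragment masses. Here, as an assumption made for clarity, observed peptides are matched by sequence against in silico tryptic digests. Trypsin is modelled with the common rule of cleaving after K or R except before P, with no missed cleavages. The databases, accession names and the `min_len` cut-off are hypothetical.

```python
# Simplified sketch of a two-stage peptide assignment: observed peptides are
# matched by sequence (not by spectra, as a real MS/MS search engine would)
# first against a reference database, then against candidate uPEPs.
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


def tryptic_digest(protein: str, min_len: int = 6) -> List[str]:
    """Cleave after K or R unless the next residue is P; no missed cleavages.
    Fragments shorter than min_len are discarded."""
    fragments = re.split(r"(?<=[KR])(?!P)", protein)
    return [f for f in fragments if len(f) >= min_len]


def index_peptides(db: Dict[str, str], min_len: int = 6) -> Dict[str, Set[str]]:
    """Map each tryptic peptide to the accessions of the proteins it came from."""
    index: Dict[str, Set[str]] = {}
    for accession, sequence in db.items():
        for peptide in tryptic_digest(sequence, min_len):
            index.setdefault(peptide, set()).add(accession)
    return index


@dataclass
class SearchResult:
    reference: Dict[str, List[str]] = field(default_factory=dict)
    upep: Dict[str, List[str]] = field(default_factory=dict)
    unmatched: List[str] = field(default_factory=list)


def two_stage_search(observed: Iterable[str], reference: Dict[str, str],
                     upeps: Dict[str, str], min_len: int = 6) -> SearchResult:
    ref_index = index_peptides(reference, min_len)
    upep_index = index_peptides(upeps, min_len)
    result = SearchResult()
    for peptide in observed:
        if peptide in ref_index:       # stage 1: known proteins
            result.reference[peptide] = sorted(ref_index[peptide])
        elif peptide in upep_index:    # stage 2: only peptides unmatched in stage 1
            result.upep[peptide] = sorted(upep_index[peptide])
        else:
            result.unmatched.append(peptide)
    return result


# Hypothetical toy databases and observed peptides.
reference_db = {"REF1": "MSTAGEKLLPQVDNRAAFFPGK"}
upep_db = {"uPEP_1": "MWDEQLKTTSGAPLR"}
hits = two_stage_search(["LLPQVDNR", "TTSGAPLR", "GGGGGGK"], reference_db, upep_db)
print(hits.reference, hits.upep, hits.unmatched)
# {'LLPQVDNR': ['REF1']} {'TTSGAPLR': ['uPEP_1']} ['GGGGGGK']
```

Searching the reference database first means a peptide is attributed to a uPEP only when it cannot be explained by a known protein, which mirrors the order of searches described above.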
Our study has shown that uORF-derived peptides are expressed in human cells and that a substantial number of them are conserved across several species, suggesting possible bioactivity. It remains to determine what roles these peptides play and what the consequences are when they are altered by mutation.