This is the help documentation for the uPEPperoni web program. Here you will find an overview of the uPEPperoni program as well as example inputs and descriptions of each of the adjustable parameters. uPEPperoni is divided into two subprograms, the Conserved uPEP Search utility and the Heatmap Generation utility. You can skip the uPEPperoni overview and jump straight to the documentation of these subprograms by clicking on the respective link.

Overview of uPEPperoni

(The background section of Crowe, Wang and Rothnagel, BMC Genomics 2006, 7:16 offers an excellent lead-in to the following.)

uPEPperoni was designed to assist in the location and identification of upstream open reading frames (uORFs) that have the potential to encode bioactive peptides (uPEPs). In order to facilitate quick identification of conserved uORFs, it generates "heatmaps" that allow for visual comparison of pairs of sequences for regions of localised sequence similarity. As a result, uPEPperoni is divided into two subprograms. Conserved uPEP Search allows for the identification of conserved uORFs and uses the full functionality of uPEPperoni, while the Heatmap Generation utility generates heatmaps from any user-entered pair of sequences, and will show regions of localised sequence similarity in an arbitrary pair of nucleotide sequences.

The two subprograms allow for entry into the uPEPperoni pipeline at different stages (Figure 1). The Conserved uPEP Search takes user-entered query sequences and uses BLAST to compare them to a database of eukaryotic uORFs derived from the RefSeq mRNA release datafiles. It then generates heatmaps based on the alignment of each query/BLAST hit (reference) transcript pair. The Heatmap Generation utility allows entry at this point, with user-entered query/reference sequences. If the locations of the uPEP and coding sequence (CDS) are known for both transcripts, Ka/Ks ratios are calculated for both regions. Both subprograms will accept a user-entered plain sequence of nucleotides ('AATGCGATAGC...', for example), or the RefSeq accession/GI identifier of a query mRNA (e.g. 'NM_003463' or 'GI:62865860').
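As an illustration of the three accepted input forms, the following minimal Python sketch classifies a query string. The function name `classify_query` and the regular expressions are assumptions made for this example, not uPEPperoni's actual validation rules; in particular, the accession pattern is modelled only on the 'NM_003463' style shown above.

```python
import re

# Hypothetical patterns: uPEPperoni's real input checks are not documented here.
ACCESSION_RE = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")     # e.g. NM_003463
GI_RE = re.compile(r"^GI:\d+$", re.IGNORECASE)            # e.g. GI:62865860
SEQUENCE_RE = re.compile(r"^[ACGTUN]+$", re.IGNORECASE)   # plain nucleotides


def classify_query(text: str) -> str:
    """Return 'accession', 'gi', 'sequence' or 'unknown' for a query string."""
    # Remove all whitespace so that pasted multi-line sequences are handled.
    cleaned = "".join(text.split())
    if ACCESSION_RE.match(cleaned):
        return "accession"
    if GI_RE.match(cleaned):
        return "gi"
    if SEQUENCE_RE.match(cleaned):
        return "sequence"
    return "unknown"
```

Under these assumptions, `classify_query("NM_003463")` is treated as an accession, `classify_query("GI:62865860")` as a GI identifier, and a string such as `"AATGCGATAGC"` as a plain sequence.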

Figure 1: Diagram of the uPEPperoni pipeline. Coloured arrows denote variations in the pathway based on the type of query.

Full descriptions of the settings of each subprogram as well as examples of each are given in the respective subprogram's documentation below.

Conserved uPEP Search

The Conserved uPEP Search utility (Figure 2) is the default subprogram of uPEPperoni and is displayed when the uPEPperoni main site is first loaded. It can also be accessed by selecting the Conserved uPEP Search tab located on the grey menu bar below the uPEPperoni logo.

Figure 2: The Conserved uPEP Search form

The Conserved uPEP Search form contains a textbox for entering a query sequence (the RefSeq accession 'NM_001007775' has been entered in Figure 2), a combo box for selecting the internal uORF reference database (the complete uORF database has been selected in Figure 2), and an expandable settings panel. The settings panel can be expanded or collapsed by clicking the "+ Settings" hyperlink, and each subsection can be expanded or collapsed by clicking its "+ General", "+ Alignment Parameters" or "+ Heatmap Generation" hyperlink.

Settings - General:

Calculate Ka/Ks ratio: Checking this option will ask uPEPperoni to estimate synonymous and nonsynonymous substitution rates using the method of Yang and Nielsen (2000), implemented in a library compiled from modified source code of the yn00 program in the PAML package (Yang 2007). The integration of this library into uPEPperoni was done with the permission of the author of PAML, Prof. Ziheng Yang.

Generate Reference Heatmaps: The default behaviour of uPEPperoni after sequence pair alignment is to generate a heatmap representation of the query transcript. However, in certain situations, it is preferable to have a heatmap representation of the BLAST hit or reference sequence (one such example is in the Conserved uPEP Search documentation - Examples section). Selection of this option will generate heatmaps of both sequences involved in an alignment.

Marked Regions: The marked regions section allows you to specify domains and give them descriptions. At the moment, the descriptions are not added to the final heatmap, so they are only useful for keeping track of what has been entered. To specify a region/domain, type the start and end nucleotide positions into the two boxes below the small "Region:" caption. A description of the region may be entered in the box below the "Description:" caption. Press the "<< Add" button to add the domain. You can remove added domains by selecting them (use Shift to select a range, or Ctrl to select multiple individual entries) and clicking the "Remove >>" button. The domains will show up as black bars on the heatmap.

Settings - Alignment:

Standard parameters of any sequence alignment. The "Nucleotide Match:" and "Nucleotide Mismatch:" parameters specify the reward and penalty for nucleotide matches/mismatches, while "Gap Existence Penalty:" and "Gap Extension Penalty:" specify the penalties for the opening and extension of gaps in the alignment.
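To make the roles of the four parameters concrete, here is a minimal Python sketch that scores an already-aligned pair of sequences. The function name `score_alignment`, the default values, and the convention that a gap of length k costs the existence penalty plus k times the extension penalty are all assumptions for illustration; uPEPperoni's own defaults and gap convention may differ.

```python
def score_alignment(query, reference, match=1, mismatch=-3,
                    gap_existence=5, gap_extension=2):
    """Score two equal-length aligned strings in which '-' marks a gap.

    match is added for identical nucleotides and mismatch (a negative
    number here) for differing ones; gap penalties are subtracted.
    """
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap = None  # sequence holding the previous gap column: 'q', 'r' or None
    for q, r in zip(query, reference):
        if q == "-" and r == "-":
            raise ValueError("column with gaps in both sequences")
        if q == "-" or r == "-":
            side = "q" if q == "-" else "r"
            if side != prev_gap:
                score -= gap_existence  # a new gap is opened
            score -= gap_extension      # every gap column pays the extension cost
            prev_gap = side
        else:
            score += match if q.upper() == r.upper() else mismatch
            prev_gap = None
    return score
```

With these hypothetical defaults, a single two-column gap costs 5 + 2 + 2 = 9, whereas two separate one-column gaps cost 14, so raising the existence penalty relative to the extension penalty favours fewer, longer gaps.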

Settings - Heatmap Generation:

Gradient Options: The gradient options allow users to modify the reference colour (heat) gradient from which heatmaps are generated. Placing a value beneath a colour associates that colour with the percentage sequence identity of the value, and colours for in-between values are linearly interpolated. Threshold values can be created by using values other than 0 and 100 at the extremes.
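The linear interpolation described above can be sketched in Python as follows. The function name `colour_for_identity`, the representation of the gradient as `(percent, (r, g, b))` pairs, and the clamping of values beyond the outermost stops are assumptions for illustration, not a description of uPEPperoni's internal code.

```python
def colour_for_identity(identity, stops):
    """Map a percentage identity to an RGB tuple by linear interpolation.

    stops is a list of (percent, (r, g, b)) pairs. Identities below the
    lowest stop or above the highest stop take that stop's colour, which
    is one way a threshold could behave (an assumption).
    """
    stops = sorted(stops)
    if identity <= stops[0][0]:
        return stops[0][1]
    if identity >= stops[-1][0]:
        return stops[-1][1]
    for (lo_pct, lo_rgb), (hi_pct, hi_rgb) in zip(stops, stops[1:]):
        # Skip zero-width segments to avoid dividing by zero.
        if lo_pct < hi_pct and lo_pct <= identity <= hi_pct:
            t = (identity - lo_pct) / (hi_pct - lo_pct)
            return tuple(round(a + t * (b - a)) for a, b in zip(lo_rgb, hi_rgb))
```

For example, with stops at 40 and 90 rather than 0 and 100, every identity of 40% or less is drawn in the coldest colour, so only regions above that threshold show any variation.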

Window Size: This value allows users to change the size of the window used to calculate the percentage sequence identity surrounding a nucleotide. Smaller values give greater resolution, while larger values allow overall trends to be seen.
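The effect of the window size can be illustrated with a short Python sketch that computes a per-position percentage identity over an aligned pair. The function name `windowed_identity`, the centred window, and the truncation of the window at the ends of the alignment are assumptions for this example; the source does not specify how uPEPperoni positions the window or treats the sequence ends.

```python
def windowed_identity(query, reference, window=5):
    """Return, for each aligned column, the percentage identity in a centred window."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = [q == r and q != "-"
               for q, r in zip(query.upper(), reference.upper())]
    half = window // 2
    result = []
    for i in range(len(matches)):
        # The window is truncated at the ends of the alignment.
        start = max(0, i - half)
        end = min(len(matches), i + half + 1)
        span = matches[start:end]
        result.append(100.0 * sum(span) / len(span))
    return result
```

With `window=1` each position is either 0% or 100%, giving the sharpest picture of individual mismatches; wider windows average over neighbouring columns and smooth the profile.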

Heatmap Width: This option specifies the width (in pixels) of the heatmap uPEPperoni produces.

Conserved uPEP Search - Examples

Examples of usage for both the Conserved uPEP search and Heatmap Generation utilities can be found here.

Heatmap Generation

The Heatmap Generation utility (Figure 3) can be accessed by selecting the Heatmap Generation tab located on the grey menu bar below the uPEPperoni logo.

Figure 3: The Heatmap Generation form

The Heatmap Generation form contains two textboxes, one each for the query and reference sequences, and an expandable settings panel. The settings panel can be expanded or collapsed by clicking the "+ Settings" hyperlink, and each subsection can be expanded or collapsed by clicking its "+ General", "+ Alignment Parameters" or "+ Heatmap Generation" hyperlink.

Settings - General:

Draw CDS: If a RefSeq accession query is given, the coding sequence (CDS) of the query sequence will be shown on the final heatmap.

Locate uORFs: If a RefSeq accession query is given, any uORFs matching the entered parameters will be shown on the final heatmap. The parameters allow the user to modify the minimum and maximum size of a uORF as well as how many nucleotides into the CDS a uORF is allowed.
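A rough Python sketch of how such parameters might be applied is given below. The function name `find_uorfs`, the default values, the choice of ATG as the only start codon, the counting of length in codons (start codon included, stop codon excluded), and the reading of the grace length as the number of nucleotides a uORF may extend past the CDS start are all assumptions for illustration, not uPEPperoni's documented definitions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna, cds_start, min_codons=5, max_codons=100, grace=0):
    """Return (start, end) pairs for candidate uORFs in an mRNA.

    Coordinates are 0-based with end exclusive and the stop codon included.
    cds_start is the 0-based index of the first nucleotide of the CDS.
    """
    seq = mrna.upper().replace("U", "T")
    uorfs = []
    # The start codon must lie entirely upstream of the CDS.
    for start in range(max(0, cds_start - 2)):
        if seq[start:start + 3] != "ATG":
            continue
        # Walk in-frame codons until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                end = pos + 3
                codons = (pos - start) // 3  # start codon counted, stop codon not
                if min_codons <= codons <= max_codons and end <= cds_start + grace:
                    uorfs.append((start, end))
                break
    return uorfs
```

In this sketch a uORF that runs past the CDS start is rejected when `grace=0` and accepted once `grace` is at least the length of the overlap.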

Marked Regions: The marked regions section allows you to specify domains and give them descriptions. At the moment, the descriptions are not added to the final heatmap, so they are only useful for keeping track of what has been entered. To specify a region/domain, type the start and end nucleotide positions into the two boxes below the small "Region:" caption. A description of the region may be entered in the box below the "Description:" caption. Press the "<< Add" button to add the domain. You can remove added domains by selecting them (use Shift to select a range, or Ctrl to select multiple individual entries) and clicking the "Remove >>" button. The domains will show up as black bars on the heatmap.

Settings - Alignment:

Standard parameters of any sequence alignment. The "Nucleotide Match:" and "Nucleotide Mismatch:" parameters specify the reward and penalty for nucleotide matches/mismatches, while "Gap Existence Penalty:" and "Gap Extension Penalty:" specify the penalties for the opening and extension of gaps in the alignment.

Settings - Heatmap Generation:

Gradient Options: The gradient options allow users to modify the reference colour (heat) gradient from which heatmaps are generated. Placing a value beneath a colour associates that colour with the percentage sequence identity of the value, and colours for in-between values are linearly interpolated. Threshold values can be created by using values other than 0 and 100 at the extremes.

Window Size: This value allows users to change the size of the window used to calculate the percentage sequence identity surrounding a nucleotide. Smaller values give greater resolution, while larger values allow overall trends to be seen.

Heatmap Width: This option specifies the width (in pixels) of the heatmap uPEPperoni produces.

Database updates/schedule

On the 1st day of each month, uPEP checks whether there is a new NCBI RefSeq release. If RefSeq has been updated, uPEP downloads the required data files and builds new uORF reference databases. Once the build completes successfully, the new databases become the default databases queried. The uPEP server is switched to maintenance mode while the new databases are activated; the downtime is a few minutes. The version of RefSeq on which the uORF reference databases are based is always displayed at the top of the results page. We archive all old uORF reference databases. Please contact us if you would like access to a previous version (citing the RefSeq version on which it was based).

Acknowledgements

The authors of uPEPperoni would like to thank Prof. Ziheng Yang for his permission to include libraries generated from his PAML source code.

References

Yang, Z. 2007. PAML 4: a program package for phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24: 1586-1591.

Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Molecular Biology and Evolution 17: 32-43.

Example input and output from Conserved uPEP Searches

Sequence of interest: Asparagine Synthetase Domain Containing 1 (ASNSD1) against all databases. Input. Output.

Where the RefSeq identifier is known: Homo sapiens Hairless (HR) transcript against all databases. Input. Output.

Publication related:

Joe Rothnagel

Phone: +61 7 336 54629

Email: j.rothnagel@uq.edu.au

uPEP Server related

Mitchell Stanton-Cook

Email: m.stantoncook@gmail.com