Electronic Supplementary Material accompanying the paper entitled "EGene: a configurable pipeline generation system for automated sequence analysis" by Durham, A.M., Kashiwabara, A.I., Matsunaga, F.T.G., Ahagon, P.H., Rainone, F., Varuzza, L. and Gruber, A. Bioinformatics 21(12): 2812-2813, 2005.


Table 1. Components developed for the EGene pipeline generation system. By convention, component names have two parts: the prefix, stating the component function, and the suffix, stating the third-party software used, when appropriate.

| Component | Third-party software required | Function |
|---|---|---|
| assemble_cap3.pl | CAP3 | Creates a directory structure, runs CAP3 and analyzes redundancy |
| assemble_phrap.pl | Phrap | Creates a directory structure, runs Phrap and analyzes redundancy |
| bigou.pl | - | Initializes and runs the pipeline |
| filter_blast.pl | BLAST | Marks as invalid the sequences with a significant alignment block obtained with BLAST against a specified database |
| filter_cross_match.pl | Cross_match | Marks as invalid the sequences with a significant alignment block obtained with Cross_match |
| filter_quality.pl | - | Marks as invalid the sequences not attaining a set of minimum quality criteria |
| filter_size.pl | - | Marks as invalid the sequences below a threshold size |
| mask_cross_match.pl | Cross_match | Masks sequence blocks with significant Cross_match alignments against a database |
| outsave.pl | - | Produces a snapshot of all sequences in multi-PHD or XML files |
| report_bases.pl | - | Creates an HTML file reporting on masked, trimmed and good bases for each sequence and the respective averages for a pipeline run |
| report_filtering.pl | - | Produces an HTML report of the filtering performed on the sequences |
| report_graphic_complete.pl | - | Generates a detailed graphic report of the quality assignment, vector/primer masking and trimming in multiple HTML files |
| report_graphic_simple.pl | - | Generates a graphic report of the quality assignment, vector/primer masking and trimming in a single HTML file |
| snoop_filtered.pl | - | Produces a FASTA, XML or PHD file with either the valid sequences or the sequences invalidated by specified filters |
| trimming.pl | - | Trims sequences based on quality and masking |
| upload_fasta.pl | - | Uploads sequences from a multi-FASTA file |
| upload_fasta_STDIN.pl | - | Uploads multiple FASTA-formatted sequences from standard input [a] |
| upload_phd_dir.pl | - | Uploads PHD files contained in a directory |
| upload_phd.pl | - | Uploads multiple PHDs from a single concatenated file |
| upload_traces_phred.pl | Phred | Uploads trace files using Phred for base-calling and quality assignment |
| upload_seq_names_db.pl | PostgreSQL | Uploads sequences from the database based on their names. Names can be Perl regular expressions. Runs only in database mode. |
| upload_sql.pl | PostgreSQL | Uploads sequences from the database based on an SQL query. The query should return sequence identifiers. Runs only in database mode. |
| upload_xml.pl | - | Uploads sequences from an XML file |

[a] This program can be used to feed FASTA output from other UNIX software into a pipeline (e.g. using UNIX pipes).
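
The naming convention stated in the Table 1 caption can be illustrated with a short sketch. The code below is not part of EGene; it is a Python illustration only. The function name `split_component_name` and the `KNOWN_TOOLS` list are assumptions made for this example, and treating the prefix as the text before the first underscore is one possible reading of the convention.

```python
from pathlib import PurePath

# Third-party tools that appear as name suffixes in Table 1 (lowercased).
# This list is an assumption made for illustration, not an EGene data structure.
KNOWN_TOOLS = ("cross_match", "cap3", "phrap", "blast", "phred")


def split_component_name(filename):
    """Return (function, tool) for a component file name.

    function: the text before the first underscore (the whole stem if none).
    tool: the known third-party tool the stem ends with, or None.
    """
    stem = PurePath(filename).stem  # drops the ".pl" extension
    function = stem.split("_", 1)[0]
    tool = next((t for t in KNOWN_TOOLS if stem.endswith("_" + t)), None)
    return function, tool


if __name__ == "__main__":
    for name in ("assemble_cap3.pl", "filter_cross_match.pl",
                 "upload_traces_phred.pl", "filter_size.pl", "bigou.pl"):
        print(name, split_component_name(name))
```

Components whose names carry no tool suffix (for example filter_size.pl) yield no tool, which matches the "when appropriate" qualifier in the caption. The PostgreSQL-dependent components are not detected by this sketch, because their names do not end in the software name.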