Methods
-------
1. Extract intergenic regions from 30 sequenced genomes (see Genome Data Set below)
2. Perform all vs. all nucleotide-nucleotide BLAST
3. Select significant alignments, concatenate and format into QRNA program input
4. Run QRNA, extract alignments scoring as sRNAs vs. coding and null hypothesis regions
5. Eliminate alignment regions which overlap >50% with E. coli regulatory regions
6. Extend regions that lie within 25 nt of another region so that they include each other
7. Merge sRNA regions which align or exactly overlap into families
8. Eliminate family regions not found with both the query organism and the database organism as source
9. Verify results

Result Statistics
-----------------
Number of intergenic regions used: 94464
Average number of intergenic regions per organism: 3148.8
Total combined length of intergenic regions: 16663732 nt
Average length of intergenic region: 176.4 nt
Total number of sRNAs found: 76817
Total number of sRNA families created: 31416

Instructions for Reading Files
------------------------------
Files contain a list of sRNAs organized by various attributes. Each attribute is described below:

Family Designation: Name assigned to the family containing the listed sRNA entry. The family designation is expressed as [Organism name] [locus absolute start location] [locus absolute end location] and is synonymous with the first (header) entry of that family.
Source Organism Name: Name of the organism in which the sRNA was initially located (the source from which an alignment to another entry was made). Contains the first letter of the genus followed by the species. A number at the end of the Source Organism Name indicates the chromosome when the organism's genome contains multiple chromosomes (for example: Vcholerae1 indicates Vibrio cholerae chromosome 1).
Start: Position of the first nucleotide of the sRNA sequence in the genome of the source organism.
Stop: Position of the last nucleotide of the sRNA sequence in the genome of the source organism.
Length: Length of the sRNA sequence in nucleotides.
Score: Log-odds score of the BLAST alignment under the sRNA model versus the alternatives, as defined by the QRNA program.
Flanking Genes: Listed, from left to right, as: orientation of preceding gene, preceding gene name, an underscore (_), following gene name, orientation of following gene.

Genome Data Set
---------------
Gammaproteobacteria:
Acinetobacter calcoaceticus
Blochmannia floridanus
Buchnera aphidicola
Coxiella burnetii
Erwinia carotovora
Escherichia coli
Haemophilus ducreyi
Haemophilus influenzae
Pasteurella multocida
Photorhabdus luminescens
Pseudomonas aeruginosa
Pseudomonas putida
Pseudomonas syringae
Salmonella enterica
Salmonella typhimurium
Shewanella oneidensis
Shigella flexneri
Vibrio cholerae
Vibrio parahaemolyticus
Vibrio vulnificus
Wigglesworthia brevipalpis
Xanthomonas campestris
Xanthomonas citri
Xylella fastidiosa
Yersinia pestis

Alphaproteobacteria:
Agrobacterium tumefaciens
Brucella melitensis
Caulobacter crescentus
Mesorhizobium loti

Deinococci:
Deinococcus radiodurans

Paper Information
-----------------
Stanislav Luban and Daisuke Kihara. (2005)
Comparative Study of Small RNA and Small Peptides in Complete Genome Sequences
Currently submitted for review
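
Example: Extending Nearby Regions
---------------------------------
The Methods step that extends regions lying within 25 nt of each other can be illustrated with a short interval-merging sketch. This is not the code used to produce the files. The function name, the (start, stop) tuple layout, and the reading of "within 25 nt" as "the next region starts at most 25 positions after the previous region stops" are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout: (start, stop) in genome coordinates, start <= stop.
Region = Tuple[int, int]


def merge_nearby_regions(regions: List[Region], max_gap: int = 25) -> List[Region]:
    """Join regions whose gap is at most max_gap nt (assumed interpretation)."""
    merged: List[Region] = []
    for start, stop in sorted(regions):
        # Overlapping regions give a negative gap, so they are joined as well.
        if merged and start - merged[-1][1] <= max_gap:
            prev_start, prev_stop = merged[-1]
            merged[-1] = (prev_start, max(prev_stop, stop))
        else:
            merged.append((start, stop))
    return merged


if __name__ == "__main__":
    # Hypothetical coordinates, not taken from the result files.
    print(merge_nearby_regions([(300, 350), (100, 150), (170, 220)]))
    # prints [(100, 220), (300, 350)]
```

Sorting first means each region only has to be compared with the most recently extended one, so the whole pass is a single sweep over the list.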