Background
Since the genome of Escherichia coli K-12 was initially annotated in 1997, additional functional information based on biological characterization and on the functions of sequence-similar proteins has become available.

[...] of functions predicted by BLAST and DARWIN analyses and by the MAGPIE genome annotation system.

Conclusions
Much knowledge has been gained about functions encoded by the E. coli K-12 genome since the 1997 annotation was published. The data presented here should be useful for analysis of E. coli gene products as well as gene products encoded by other genomes.

Background
The field of genomics has been expanding at a rapid pace since the annotated E. coli K-12 genome was published in 1997 [1]: more than 66 genomes have now been published and another 364 are in progress, according to the Genomes OnLine Database (GOLD) [2]. Deciphering the functions encoded by all gene products of these genomes is the next big challenge in the field. Function attributions through experimental, biochemical and genetic analyses and through bioinformatic studies are continuing, and microarray technology is shedding additional light on the functions associated with the gene products of the organism in question. The wealth of biological information on E. coli is still increasing [3] and is contributing to a better understanding of this organism as well as of functions encoded in other organisms. It is therefore important that the most up-to-date information on E. coli gene products is available to, and used by, researchers.

Several databases have been assembled for various areas of knowledge about the E. coli genome [4,5,6,7,8,9]. Each compilation has a different emphasis and collects different sets of information related to the function of the gene products. In the GenProtEC database, we have been curating information on the physiological function and modular construction of E. coli gene products.
Other databases most closely related to ours include EcoCyc, with emphasis on metabolic pathways [6]; the CGSC database, with information on the genotypes and phenotypes of mutant strains [8]; and EcoGene, which includes information on gene reconstructions, alternative gene boundaries and verified amino-terminal amino-acid sequences of the mature proteins [5]. The genome project at the University of Wisconsin-Madison presents genome data on E. coli K-12 and pathogenic enterobacteria [9].

We present a functional update for E. coli K-12 gene products that incorporates information from the literature and referenced databases obtained since the 1997 GenBank deposit. Our focus has been the biological function of the gene products. Coding sequences (CDSs) encoding proteins whose function was previously imputed or unknown were re-evaluated, and putative functions were assigned by manually evaluating the results from BLAST and DARWIN (data analysis and retrieval with indexed nucleotide/peptide sequences) analyses. The MAGPIE (multipurpose automated genome project investigation environment) genome annotation system [10] was also applied. MAGPIE detected alternative boundaries for some of the open reading frames (ORFs).

Results

Number of genes in the E. coli K-12 genome
For the initial annotation of the E. coli K-12 genome [1], 4,404 genes were identified with Blattner numbers (Bnums). Of these, 4,288 were believed to encode proteins and 116 to encode RNAs. Since then, six Bnums have been retired: b0322, b0395, b0663, b0667, b0669 and b0671 (G. Plunkett, personal communication). In addition, three new genes have been identified and assigned Bnums: the protein-coding b4406 (SWISS-PROT P52099) and b4407 (SWISS-PROT O32583), and the RNA-encoding b4408. The current number of genes is 4,401, with 4,285 encoding proteins and 116 encoding RNAs.
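The gene-count bookkeeping above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable and function names are hypothetical, and the split of the six retired Bnums into protein-coding and RNA-coding genes is not given in the text. It is only what the stated before-and-after counts imply.

```python
# Gene counts as stated in the text (1997 annotation versus current update).
initial = {"protein": 4288, "rna": 116}
current = {"protein": 4285, "rna": 116}
retired_total = 6                   # six retired Bnums
added = {"protein": 2, "rna": 1}    # b4406 and b4407 (protein); b4408 (RNA)


def implied_retired(initial, added, current):
    """Retired genes per class implied by the stated counts.

    Assumption: the per-class split is inferred from the totals,
    not reported in the text.
    """
    return {k: initial[k] + added[k] - current[k] for k in initial}


initial_total = sum(initial.values())
current_total = initial_total - retired_total + sum(added.values())
retired = implied_retired(initial, added, current)

print(initial_total, current_total, retired)
```

Under these figures the totals are internally consistent: the initial count less the retired Bnums plus the new genes equals the current count, and the per-class differences sum to the six retired Bnums.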
MAGPIE identified 5,527 candidate CDSs, which were assigned MAGPIE identifiers (Magnums) (see MAGPIE [11] for details). The 4,285 CDSs identified by Bnums were also identified with Magnums. Variations in either the start or the stop position were detected for 1,077 of these CDSs, resulting in differences in the encoded proteins ranging from 1 to 147 amino acids, the latter in PtsA (Bnum b3947, Magnum ec_6103).