UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_01 STATISTICS

1. INTRODUCTION

Release 2013_01 of 09-Jan-2013 of UniProtKB/TrEMBL contains 29266939 sequence entries, comprising 9427157298 amino acids. 936120 sequences have been added since release 2012_11, the sequence data of 36647 existing entries has been updated, and the annotations of 11841242 entries have been revised. This represents an increase of 3%.

Number of fragments: 3804361

Protein existence (PE)             Entries        %
1: Evidence at protein level         14128    0.05%
2: Evidence at transcript level     647041    2.21%
3: Inferred from homology          6777086   23.16%
4: Predicted                      21828684   74.58%
5: Uncertain                             0    0.00%

The growth of the database is summarized below.

2. TAXONOMIC ORIGIN

Total number of species represented in this release of UniProtKB/TrEMBL: 386951

The first twenty species represent 1753638 sequences, 6% of the total number of entries.

2.1 Table of the frequency of occurrence of species

Species represented  Number of species
1x                               16242
2x                               64555
3x                               34983
4x                               23250
5x                               14696
6x                               10625
7x                                7987
8x                                6241
9x                                5054
10x                               9820
11-20x                           25669
21-50x                            9048
51-100x                           3468
>100x                             9130

2.2 Table of the most represented species

------  ---------  --------------------------------------------
Number  Frequency  Species
------  ---------  --------------------------------------------
     1     502941  Human immunodeficiency virus 1
     2     180820  uncultured bacterium
     3     113576  Homo sapiens (Human)
     4      96951  Oryza sativa subsp. japonica (Rice)
     5      78428  Hepatitis C virus
     6      73724  Glycine max (Soybean) (Glycine hispida)
     7      68970  Macaca mulatta (Rhesus macaque)
     8      58232  Mus musculus (Mouse)
     9      56116  Medicago truncatula (Barrel medic) (Medicago tribuloides)
    10      54935  Hepatitis B virus (HBV)
    11      54191  Danio rerio (Zebrafish) (Brachydanio rerio)
    12      54089  Vitis vinifera (Grape)
    13      50594  Trichomonas vaginalis
    14      49230  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
    15      48878  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
    16      44540  Populus trichocarpa (Western balsam poplar)
    17      43145  Callithrix jacchus (White-tufted-ear marmoset)
    18      42299  Arabidopsis thaliana (Mouse-ear cress)
    19      42129  Zea mays (Maize)
    20      39850  Paramecium tetraurelia
    21      39805  Oryza sativa subsp. indica (Rice)
    22      39291  Setaria italica (Foxtail millet) (Panicum italicum)
    23      38163  human gut metagenome
    24      35879  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
    25      35602  Ailuropoda melanoleuca (Giant panda)
    26      35193  Acyrthosiphon pisum (Pea aphid)
    27      35066  Caenorhabditis japonica
    28      34802  Physcomitrella patens subsp. patens (Moss)
    29      34453  Thalassiosira oceanica (Marine diatom)
    30      34176  Drosophila melanogaster (Fruit fly)
    31      33910  Rattus norvegicus (Rat)
    32      33777  Sorghum bicolor (Sorghum) (Sorghum vulgare)
    33      33267  Selaginella moellendorffii (Spikemoss)
    34      32769  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
    35      32339  Oryza brachyantha
    36      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
    37      32093  Oryza glaberrima (African rice)
    38      31833  Pan troglodytes (Chimpanzee)
    39      31472  Sus scrofa (Pig)
    40      31397  Ricinus communis (Castor bean)
    41      30917  Daphnia pulex (Water flea)
    42      30300  Caenorhabditis brenneri (Nematode worm)
    43      30143  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
    44      29815  Amphimedon queenslandica (Sponge)
    45      29451  Strongylocentrotus purpuratus (Purple sea urchin)
    46      29315  Pristionchus pacificus
    47      29178  Branchiostoma floridae (Florida lancelet) (Amphioxus)
    48      29053  Oikopleura dioica (Tunicate)
    49      28521  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
    50      28446  Canis familiaris (Dog) (Canis lupus familiaris)
    51      28362  Escherichia coli
    52      28343  Simian immunodeficiency virus (SIV)
    53      28055  Gasterosteus aculeatus (Three-spined stickleback)
    54      27685  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
    55      27490  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
    56      27089  Gorilla gorilla gorilla (Lowland gorilla)
    57      26818  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
    58      26790  Gallus gallus (Chicken)
    59      25900  Oryzias latipes (Medaka fish) (Japanese ricefish)
    60      25758  Loxodonta africana (African elephant)
    61      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent)
    62      25384  Bos taurus (Bovine)
    63      25074  Oryctolagus cuniculus (Rabbit)
    64      24879  Nematostella vectensis (Starlet sea anemone)
    65      24643  Tetrahymena thermophila (strain SB210)
    66      24200  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
    67      24059  Equus caballus (Horse)
    68      23715  Ornithorhynchus anatinus (Duckbill platypus)
    69      23565  Oxytricha trifallax
    70      23225  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
    71      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
    72      22714  Monodelphis domestica (Gray short-tailed opossum)
    73      22560  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
    74      22437  Caenorhabditis elegans
    75      22305  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
    76      22163  gut metagenome
    77      21821  Latimeria chalumnae (West Indian ocean coelacanth)
    78      21699  Hordeum vulgare var. distichum (Two-rowed barley)
    79      21546  Heterocephalus glaber (Naked mole rat)
    80      21339  Caenorhabditis briggsae
    81      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
    82      20854  Myotis lucifugus (Little brown bat)
    83      20732  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
    84      20130  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
    85      20114  Ciona savignyi (Pacific transparent sea squirt)
    86      20069  Cavia porcellus (Guinea pig)
    87      19969  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel)
    88      19671  Taeniopygia guttata (Zebra finch) (Poephila guttata)
    89      19438  Wuchereria bancrofti
    90      19320  Toxoplasma gondii
    91      19255  Anolis carolinensis (Green anole) (American chameleon)
    92      19200  Trypanosoma cruzi (strain CL Brener)
    93      18919  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
    94      18771  mine drainage metagenome
    95      18706  Drosophila simulans (Fruit fly)
    96      18585  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
    97      18121  Atta cephalotes (Leafcutter ant)
    98      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver)
    99      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
   100      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
   101      17384  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
   102      17374  Bombyx mori (Silk moth)
   103      17277  Nasonia vitripennis (Parasitic wasp)
   104      17031  Drosophila yakuba (Fruit fly)
   105      17011  Tribolium castaneum (Red flour beetle)
   106      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)
   107      16871  Meleagris gallopavo (Common turkey)
   108      16714  Drosophila persimilis (Fruit fly)
   109      16643  Fusarium oxysporum f. sp. lycopersici
   110      16475  Drosophila pseudoobscura pseudoobscura (Fruit fly)
   111      16426  Ectocarpus siliculosus (Brown alga)
   112      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
   113      16306  Danaus plexippus (Monarch butterfly)
   114      16263  Trichinella spiralis (Trichina worm)
   115      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7)
   116      16188  Drosophila sechellia (Fruit fly)
   117      16140  Schistosoma japonicum (Blood fluke)
   118      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
   119      15930  Hepatitis C virus subtype 1b
   120      15816  Plasmodium falciparum
   121      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL)
   122      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)
   123      15715  Naegleria gruberi (Amoeba)
   124      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI)
   125      15630  Anopheles gambiae (African malaria mosquito)
   126      15563  Phytophthora ramorum (Sudden oak death agent)
   127      15420  Drosophila willistoni (Fruit fly)
   128      15354  Loa loa (Eye worm) (Filaria loa)
   129      15225  Pythium ultimum
   130      15173  Hepatitis C virus subtype 1a
   131      15143  Drosophila ananassae (Fruit fly)
   132      15036  Harpegnathos saltator (Jerdon's jumping ant)
   133      14927  Drosophila erecta (Fruit fly)
   134      14851  Chlamydomonas reinhardtii (Chlamydomonas smithii)
   135      14797  Camponotus floridanus (Florida carpenter ant)
   136      14788  Drosophila mojavensis (Fruit fly)
   137      14701  Drosophila virilis (Fruit fly)
   138      14697  Plasmodium chabaudi
   139      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
   140      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1)
   141      14417  Volvox carteri (Green alga)
   142      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
   143      14336  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
   144      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold)
   145      13966  Acromyrmex echinatior (Panamanian leafcutter ant)
   146      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus)
   147      13863  Clonorchis sinensis (Chinese liver fluke)
   148      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
   149      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
   150      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)
   151      13530  Trypanosoma cruzi
   152      13329  Aspergillus flavus
   153      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)
   154      13186  Mustela putorius furo (European domestic ferret) (Mustela furo)
   155      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
   156      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)
   157      12983  Albugo laibachii Nc14
   158      12950  Stigmatella aurantiaca (strain DW4/3-1)
   159      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006)
   160      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)
   161      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
   162      12696  Trypanosoma congolense (strain IL3000)
   163      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255)
   164      12674  Schistosoma mansoni (Blood fluke)
   165      12603  Xenopus laevis (African clawed frog)
   166      12570  Ralstonia solanacearum (Pseudomonas solanacearum)
   167      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
   168      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)
   169      12440  Polysphondylium pallidum (Cellular slime mold)
   170      12390  Rabies virus
   171      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens)
   172      12352  Dictyostelium purpureum (Slime mold)
   173      12329  uncultured archaeon
   174      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
   175      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus)
   176      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)
   177      11946  Porcine reproductive and respiratory syndrome virus (PRRSV)
   178      11945  Emericella nidulans
   179      11927  Helicobacter pylori (Campylobacter pylori)
   180      11914  Apis mellifera (Honeybee)
   181      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
   182      11780  Piriformospora indica (strain DSM 11827)
   183      11752  Chondrocladia sp. SMF
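
The percentages in section 1 and the 6% share quoted for the first twenty species are ratios against the 29266939 entries of this release. The "Average per entry" columns in section 4, and the 322-residue average length in section 3, also match the same denominator. Below is a minimal Python sketch of that arithmetic. It assumes plain rounding of count / total, since the exact rounding rule UniProt uses is not stated here. `percent_of_total` and `average_per_entry` are hypothetical helper names.

```python
# Number of entries in UniProtKB/TrEMBL release 2013_01 (section 1).
TOTAL_ENTRIES = 29266939


def percent_of_total(count: int, total: int = TOTAL_ENTRIES) -> float:
    """Express a count as a percentage of all entries in the release."""
    return 100.0 * count / total


def average_per_entry(amount: int, total: int = TOTAL_ENTRIES) -> float:
    """Average of a quantity (lines, residues, ...) over all entries.

    Assumption: the denominator is every entry in the release, not only
    the entries that carry the line type in question.
    """
    return amount / total


# Protein existence level 4, "Predicted" (section 1).
print(f"{percent_of_total(21828684):.2f}%")    # 74.58%
# The twenty most represented species (section 2).
print(f"{percent_of_total(1753638):.0f}%")     # 6%
# Reference (RL) lines per entry (section 4).
print(f"{average_per_entry(35659949):.2f}")    # 1.22
# Mean sequence length in amino acids (sections 1 and 3).
print(f"{average_per_entry(9427157298):.0f}")  # 322
```

Using all entries as the denominator explains why sparse line types such as INTERACTION comments show "<0.01" even though every entry that has one has at least one.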

2.3 Taxonomic distribution of the sequences

Kingdom     Sequences  (% of the database)
Archaea        408842  ( 1%)
Bacteria     20361597  (70%)
Eukaryota     6846383  (23%)
Viruses       1547692  ( 5%)
Other          102424  (<1%)

Within Eukaryota:

Category          Sequences  (% of Eukaryota)  (% of the complete database)
Human                113612  ( 2%)             ( 0%)
Other Mammalia       845085  (12%)             ( 3%)
Other Vertebrata     753401  (11%)             ( 3%)
Viridiplantae       1317778  (19%)             ( 5%)
Fungi               1510587  (22%)             ( 5%)
Insecta              782338  (11%)             ( 3%)
Nematoda             252562  ( 4%)             ( 1%)
Other               1271020  (19%)             ( 4%)

3. SEQUENCE SIZE

Distribution of the sequences by size (excluding fragments)

   From-To      Number      From-To      Number
----------  ----------   ----------  ----------
      1-50      755730    1001-1100      168936
    51-100     2504899    1101-1200      118418
   101-150     2798075    1201-1300       83249
   151-200     2715002    1301-1400       53034
   201-250     2729580    1401-1500       42871
   251-300     2645019    1501-1600       29671
   301-350     2401964    1601-1700       22537
   351-400     1820598    1701-1800       17087
   401-450     1569876    1801-1900       14194
   451-500     1287032    1901-2000       12118
   501-550      851336    2001-2100        9497
   551-600      655543    2101-2200        9743
   601-650      479546    2201-2300        7592
   651-700      376441    2301-2400        6087
   701-750      317101    2401-2500        5197
   751-800      279539        >2500       42031
   801-850      213365
   851-900      190510
   901-950      132149
  951-1000       97011
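
The size table uses 50-residue classes up to 1000, 100-residue classes up to 2500, and one open class above 2500. Its counts add up to 25462578, which is the 29266939 entries minus the 3804361 fragments. The following is a sketch of how a length could be assigned to those classes. `size_bin` and `size_histogram` are hypothetical names, and the caller is assumed to have already removed fragments.

```python
from collections import Counter
from typing import Iterable


def size_bin(length: int) -> str:
    """Return the size-class label for a sequence length in amino acids.

    Classes follow the size table: 50-residue bins up to 1000,
    100-residue bins up to 2500, and a single open bin above 2500.
    """
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return ">2500"
    width = 50 if length <= 1000 else 100
    low = (length - 1) // width * width + 1  # first length in this bin
    return f"{low}-{low + width - 1}"


def size_histogram(lengths: Iterable[int]) -> Counter:
    """Count sequences per size class (fragments must be excluded first)."""
    return Counter(size_bin(n) for n in lengths)


# The shortest (1 aa) and longest (36805 aa) sequences of this release,
# plus two arbitrary lengths for illustration.
print(size_histogram([1, 322, 330, 36805]))
```

Because the lower bound of each class is computed as `(length - 1) // width * width + 1`, both edges of every class (for example 1 and 50, or 1001 and 1100) fall into the same label, as in the table.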

The average sequence length in UniProtKB/TrEMBL is 322 amino acids. The shortest sequence is G0XMK1_9MYRT: 1 amino acid. The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.

4. STATISTICS FOR SOME LINE TYPES

The following tables summarize, for some UniProtKB/TrEMBL line types, the total number of lines, the number of entries with at least one such line, and the average number of lines per entry (the total number divided by the 29266939 entries of the release). Where a rank is given, it orders the subtypes by their total number of lines.

                                      Total  Number of    Average
Line type / subtype                  number    entries  per entry
-------------------------------- ---------- ---------- ----------
References (RL)                    35659949                  1.22
Submitted to EMBL/GenBank/DDBJ     19645186   18124632       0.67
Journal                            14507716   13660992       0.50
Submitted to other databases        1490660    1481406       0.05
Thesis                                 9861       9803      <0.01
Book citation                          6506       6457      <0.01
Unpublished observations                 19         19      <0.01
Patent                                    1          1      <0.01

Total number of distinct authors cited in UniProtKB/TrEMBL: 456562

                                      Total  Number of    Average
Line type / subtype                  number    entries  per entry  Rank
-------------------------------- ---------- ---------- ---------- -----
Comments (CC)                      36180072                  1.24
CATALYTIC ACTIVITY                  3133293    2851026       0.11     4
CAUTION                            13393492   13393247       0.46     1
COFACTOR                            1186735    1098197       0.04     8
DOMAIN                               118504     113749      <0.01     9
FUNCTION                            3425325    3205049       0.12     3
INTERACTION                             690        690      <0.01    11
MISCELLANEOUS                         82701      82605      <0.01    10
PATHWAY                             1552547    1411355       0.05     7
SIMILARITY                          8834143    7669541       0.30     2
SUBCELLULAR LOCATION                2759375    2633226       0.09     5
SUBUNIT                             1693267    1673876       0.06     6

Total number of comment topics: 11

                                      Total  Number of    Average
Line type / subtype                  number    entries  per entry  Rank
-------------------------------- ---------- ---------- ---------- -----
Features (FT)                       7381377                  0.25
CHAIN                                774905     642346       0.03     2
NON_TER                             5990131    3805013       0.20     1
SIGNAL                               615484     612214       0.02     3
TRANSIT                                 857        856      <0.01     4

Total number of feature keys: 4

                                      Total  Number of    Average
Line type / subtype                  number    entries  per entry  Rank  Category
-------------------------------- ---------- ---------- ---------- -----  -------------------------------------
Cross-references (DR)             330404216                 11.29
AGD                                    2525       2525      <0.01    85  Organism-specific databases
ANU-2DPAGE                               52         52      <0.01   101  2D gel databases
Allergome                              3024       2409      <0.01    81  Protein family/group databases
ArachnoServer                            66         66      <0.01   100  Organism-specific databases
ArrayExpress                          86858      86858      <0.01    52  Gene expression databases
BRENDA                                 2680       2651      <0.01    83  Enzyme and pathway databases
Bgee                                 118824     118824      <0.01    49  Gene expression databases
BioCyc                              3585778    3547066       0.12    20  Enzyme and pathway databases
CAZy                                  74131      69652      <0.01    56  Protein family/group databases
CGD                                    7064       7064      <0.01    77  Organism-specific databases
COMPLUYEAST-2DPAGE                        5          5      <0.01   107  2D gel databases
CTD                                  322062     320469       0.01    38  Organism-specific databases
ChEMBL                                  576        576      <0.01    92  Other
ConoServer                              160        160      <0.01    96  Organism-specific databases
DIP                                    2782       2777      <0.01    82  Protein-protein interaction databases
DNASU                                 43665      43327      <0.01    61  Protocols and materials databases
EMBL                               32113816   28423223       1.10     3  Sequence databases
Ensembl                              959177     943419       0.03    29  Genome annotation databases
EnsemblBacteria                      834798     800722       0.03    30  Genome annotation databases
EnsemblFungi                         262813     261328       0.01    41  Genome annotation databases
EnsemblMetazoa                       629747     614533       0.02    32  Genome annotation databases
EnsemblPlants                        425229     405703       0.01    37  Genome annotation databases
EnsemblProtists                      126697     125197      <0.01    48  Genome annotation databases
EuPathDB                             178957     178954       0.01    45  Organism-specific databases
EvolutionaryTrace                      8177       8177      <0.01    75  Other
FlyBase                              182089     180690       0.01    44  Organism-specific databases
GO                                 59000915   18239516       2.02     2  Ontologies
Gene3D                             12548459    9987923       0.43     6  Family and domain databases
GeneID                              8760277    8545494       0.30     9  Genome annotation databases
GeneTree                             798347     798164       0.03    31  Phylogenomic databases
Genevestigator                        93341      93333      <0.01    51  Gene expression databases
GenoList                              14735      14462      <0.01    73  Organism-specific databases
GenomeRNAi                            21434      21434      <0.01    67  Other
GenomeReviews                       4251570    4152775       0.15    16  Genome annotation databases
Gramene                               67608      67608      <0.01    57  Organism-specific databases
H-InvDB                                 626        478      <0.01    91  Organism-specific databases
HAMAP                               2916833    2880798       0.10    22  Family and domain databases
HGNC                                  48958      48878      <0.01    59  Organism-specific databases
HOGENOM                             3658839    3658792       0.13    19  Phylogenomic databases
HOVERGEN                             311258     311246       0.01    39  Phylogenomic databases
HSSP                                 250661     250434       0.01    42  3D structure databases
IPI                                  309350     308676       0.01    40  Sequence databases
InParanoid                           189538     189538       0.01    43  Phylogenomic databases
IntAct                                16845      16845      <0.01    71  Protein-protein interaction databases
InterPro                           63181437   22759474       2.16     1  Family and domain databases
KEGG                                7902138    7711636       0.27    11  Genome annotation databases
KO                                  3098495    3085017       0.11    21  Phylogenomic databases
LegioList                              5138       5110      <0.01    78  Organism-specific databases
Leproma                                1272       1270      <0.01    88  Organism-specific databases
MEROPS                                81141      81140      <0.01    53  Protein family/group databases
MGI                                   35077      34422      <0.01    63  Organism-specific databases
MINT                                   8590       8590      <0.01    74  Protein-protein interaction databases
NextBio                              104044     103709      <0.01    50  Other
OMA                                 3889766    3889385       0.13    18  Phylogenomic databases
OrthoDB                              557052     557014       0.02    34  Phylogenomic databases
PANTHER                             4184001    3955873       0.14    17  Family and domain databases
PATRIC                              8310247    8310154       0.28    10  Genome annotation databases
PDB                                   18385      10298      <0.01    68  3D structure databases
PDBsum                                18152      10143      <0.01    69  3D structure databases
PHCI-2DPAGE                              99         99      <0.01    98  2D gel databases
PIR                                  173697     140856       0.01    46  Sequence databases
PIRSF                               2516170    2515493       0.09    26  Family and domain databases
PMAP-CutDB                              214        214      <0.01    94  Other
PMMA-2DPAGE                               2          2      <0.01   108  2D gel databases
PRIDE                                476086     476086       0.02    36  Proteomic databases
PRINTS                              4469756    3973785       0.15    15  Family and domain databases
PROSITE                            14613027    9689583       0.50     5  Family and domain databases
Pathway_Interaction_DB                   11          9      <0.01   106  Enzyme and pathway databases
PaxDb                                 17042      17042      <0.01    70  Proteomic databases
PeptideAtlas                            144        144      <0.01    97  Proteomic databases
PeroxiBase                             2558       2550      <0.01    84  Protein family/group databases
Pfam                               28789057   21143842       0.98     4  Family and domain databases
PharmGKB                               4279       4279      <0.01    80  Organism-specific databases
PhosphoSite                            1167       1167      <0.01    89  PTM databases
PhylomeDB                            151147     151147       0.01    47  Phylogenomic databases
PomBase                                  40         27      <0.01   102  Organism-specific databases
PptaseDB                                 36         34      <0.01   103  Protein family/group databases
ProDom                               572760     547986       0.02    33  Family and domain databases
ProMEX                                  268        268      <0.01    93  Proteomic databases
ProtClustDB                         2720788    2720777       0.09    24  Phylogenomic databases
ProteinModelPortal                  7763747    7763747       0.27    12  3D structure databases
PseudoCAP                              4539       4533      <0.01    79  Organism-specific databases
REBASE                                32949      32946      <0.01    64  Protein family/group databases
REPRODUCTION-2DPAGE                      84         83      <0.01    99  2D gel databases
RGD                                   24752      24429      <0.01    66  Organism-specific databases
Reactome                                209        179      <0.01    95  Enzyme and pathway databases
RefSeq                              8800149    8554497       0.30     8  Sequence databases
SGD                                      11         11      <0.01   105  Organism-specific databases
SMART                               6511821    4935454       0.22    14  Family and domain databases
SMR                                 1667955    1667955       0.06    27  3D structure databases
STRING                              2588334    2588334       0.09    25  Protein-protein interaction databases
SUPFAM                             12032612    9897317       0.41     7  Family and domain databases
SWISS-2DPAGE                             28         28      <0.01   104  2D gel databases
Siena-2DPAGE                              2          2      <0.01   109  2D gel databases
TAIR                                  15743      15666      <0.01    72  Organism-specific databases
TCDB                                   2389       2377      <0.01    86  Protein family/group databases
TIGRFAMs                            6651588    6065915       0.23    13  Family and domain databases
TubercuList                            1991       1986      <0.01    87  Organism-specific databases
UCSC                                  64222      64050      <0.01    58  Genome annotation databases
UniGene                              545475     514039       0.02    35  Sequence databases
UniPathway                          1514470    1409900       0.05    28  Enzyme and pathway databases
VectorBase                            78249      77732      <0.01    54  Genome annotation databases
World-2DPAGE                            675        670      <0.01    90  2D gel databases
WormBase                              42337      42219      <0.01    62  Organism-specific databases
Xenbase                               25586      25469      <0.01    65  Organism-specific databases
ZFIN                                  45574      45448      <0.01    60  Organism-specific databases
dictyBase                              7996       7774      <0.01    76  Organism-specific databases
eggNOG                              2770833    2770812       0.09    23  Phylogenomic databases
euHCVdb                               75267      75264      <0.01    55  Organism-specific databases

Number of explicitly cross-referenced databases: 135

5. AMINO ACID COMPOSITION

5.1 Composition in percent for the complete database

Ala (A)  8.63   Gln (Q)  3.97   Leu (L)  9.92   Ser (S)  6.65
Arg (R)  5.42   Glu (E)  6.18   Lys (K)  5.30   Thr (T)  5.56
Asn (N)  4.11   Gly (G)  7.08   Met (M)  2.47   Trp (W)  1.30
Asp (D)  5.32   His (H)  2.21   Phe (F)  4.03   Tyr (Y)  3.05
Cys (C)  1.24   Ile (I)  6.00   Pro (P)  4.67   Val (V)  6.77
Asx (B)  0.000  Glx (Z)  0      Xaa (X)  0.03

Legend: gray = aliphatic, red = acidic, green = small hydroxy, blue = basic, black = aromatic, white = amide, yellow = sulfur

5.2 Classification of the amino acids by their frequency

Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln, Tyr, Met, His, Trp, Cys

6. MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 577580
Total number of entries encoded on a Plasmid: 313599
Total number of entries encoded on a Plastid: 24319
Total number of entries encoded on a Plastid; Apicoplast: 701
Total number of entries encoded on a Plastid; Chloroplast: 212091
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 926
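
The composition figures in section 5 can be reproduced along the following lines. Table 5.1 gives each residue's share of all residues, and list 5.2 orders the twenty standard residues by that share. This Python sketch rests on two assumptions: residues are counted over all sequences pooled together, and the ambiguity codes B, Z and X are left out of the ranking. Leaving them out matches list 5.2, but the source does not document it. `composition`, `rank_by_frequency` and `TABLE_5_1` are hypothetical names. As a consistency check, the sketch ranks the values copied from table 5.1.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Three-letter names of the 20 standard amino acids, keyed by one-letter code.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def composition(sequences: Iterable[str]) -> Dict[str, float]:
    """Percentage of each residue code over all residues of all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.items()}


def rank_by_frequency(percent: Dict[str, float]) -> List[str]:
    """Standard residues, most frequent first.

    B, Z, X and any other non-standard codes are skipped (an assumption
    that matches the list in section 5.2).
    """
    standard = {c: p for c, p in percent.items() if c in THREE_LETTER}
    ordered = sorted(standard, key=standard.get, reverse=True)
    return [THREE_LETTER[c] for c in ordered]


# Percentages copied from table 5.1, keyed by one-letter code.
TABLE_5_1 = {
    "A": 8.63, "Q": 3.97, "L": 9.92, "S": 6.65,
    "R": 5.42, "E": 6.18, "K": 5.30, "T": 5.56,
    "N": 4.11, "G": 7.08, "M": 2.47, "W": 1.30,
    "D": 5.32, "H": 2.21, "F": 4.03, "Y": 3.05,
    "C": 1.24, "I": 6.00, "P": 4.67, "V": 6.77,
    "B": 0.0, "Z": 0.0, "X": 0.03,
}

print(", ".join(rank_by_frequency(TABLE_5_1)))
# Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln, Tyr, Met, His, Trp, Cys
```

The printed order is identical to the classification in section 5.2. No two standard residues in table 5.1 share a value, so tie-breaking does not matter here.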