                     UniProtKB/TrEMBL PROTEIN DATABASE
                        RELEASE 2012_01 STATISTICS


1.  INTRODUCTION

Release 2012_01 of 25-Jan-2012 of UniProtKB/TrEMBL contains 19434245 sequence
entries, comprising 6336667304 amino acids.

940824 sequences have been added since release 2011_12, the sequence data of
865 existing entries has been updated and the annotations of 3325172 entries
have been revised. This represents an increase of about 5% in the number of
entries over release 2011_12.

Number of fragments: 3034906

Protein existence (PE):            entries       %
1: Evidence at protein level         13062   0.07%
2: Evidence at transcript level     554302   2.85%
3: Inferred from homology          3981888  20.49%
4: Predicted                      14884993  76.59%
5: Uncertain                             0   0.00%

The growth of the database is summarized below.


2.  TAXONOMIC ORIGIN

Total number of species represented in this release of UniProtKB/TrEMBL: 414594

The first twenty species represent 1446592 sequences: 7.4 % of the total
number of entries.

   2.1  Table of the frequency of occurrence of species

        Species represented 1x:201041
                            2x: 71769
                            3x: 35551
                            4x: 21142
                            5x: 13163
                            6x:  9276
                            7x:  6978
                            8x:  5226
                            9x:  4194
                           10x:  8454
                       11- 20x: 21172
                       21- 50x:  7465
                       51-100x:  2779
                         >100x:  6384

   2.2  Table of the most represented species

   ------  ---------  --------------------------------------------
   Number  Frequency  Species
   ------  ---------  --------------------------------------------
        1     435592  Human immunodeficiency virus 1
        2      97753  Homo sapiens (Human)
        3      95235  Oryza sativa subsp. japonica (Rice)
        4      66370  Hepatitis C virus
        5      62676  uncultured bacterium
        6      60712  Mus musculus (Mouse)
        7      54043  Vitis vinifera (Grape)
        8      52763  Danio rerio (Zebrafish) (Brachydanio rerio)
        9      51312  Macaca mulatta (Rhesus macaque)
       10      50479  Trichomonas vaginalis
       11      50117  Medicago truncatula (Barrel medic) (Medicago tribuloides)
       12      47308  Hepatitis B virus (HBV)
       13      44405  Arabidopsis thaliana (Mouse-ear cress)
       14      44066  Populus trichocarpa (Western balsam poplar)
       15      42080  Zea mays (Maize)
       16      42045  Callithrix jacchus (White-tufted-ear marmoset)
       17      39850  Paramecium tetraurelia
       18      39389  Oryza sativa subsp. indica (Rice)
       19      35593  Ailuropoda melanoleuca (Giant panda)
       20      34804  Physcomitrella patens subsp. patens (Moss)
       21      33943  Rattus norvegicus (Rat)
       22      33660  Sorghum bicolor (Sorghum) (Sorghum vulgare)
       23      33290  Drosophila melanogaster (Fruit fly)
       24      33269  Selaginella moellendorffii (Spikemoss)
       25      32604  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
       26      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
       27      31571  Monodelphis domestica (Gray short-tailed opossum)
       28      31382  Ricinus communis (Castor bean)
       29      30550  Daphnia pulex (Water flea)
       30      30300  Caenorhabditis brenneri (Nematode worm)
       31      29162  Branchiostoma floridae (Florida lancelet) (Amphioxus)
       32      29026  Oikopleura dioica (Tunicate)
       33      28930  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
       34      28092  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
       35      28013  Gasterosteus aculeatus (Three-spined stickleback)
       36      27779  Bos taurus (Bovine)
       37      27264  Canis familiaris (Dog) (Canis lupus familiaris)
       38      27088  Gorilla gorilla gorilla (Lowland gorilla)
       39      26870  Ornithorhynchus anatinus (Duckbill platypus)
       40      25969  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz)
       41      25753  Loxodonta africana (African elephant)
       42      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent)
       43      25014  Oryctolagus cuniculus (Rabbit)
       44      24842  Sus scrofa (Pig)
       45      24829  Gallus gallus (Chicken)
       46      24817  Nematostella vectensis (Starlet sea anemone)
       47      24190  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
       48      23749  Equus caballus (Horse)
       49      23644  Escherichia coli
       50      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
       51      23099  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
       52      23064  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
       53      22507  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
       54      21554  Hordeum vulgare var. distichum (Two-rowed barley)
       55      21541  Heterocephalus glaber (Naked mole rat)
       56      21231  Caenorhabditis briggsae
       57      21087  Ixodes scapularis (Black-legged tick) (Deer tick)
       58      20982  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
       59      20961  Caenorhabditis elegans
       60      20845  Myotis lucifugus (Little brown bat)
       61      20427  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL)
       62      19526  Ralstonia solanacearum (Pseudomonas solanacearum)
       63      19201  Trypanosoma cruzi (strain CL Brener)
       64      19198  Toxoplasma gondii
       65      18906  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
       66      18771  mine drainage metagenome
       67      18602  Drosophila simulans (Fruit fly)
       68      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver)
       69      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
       70      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
       71      17032  Drosophila yakuba (Fruit fly)
       72      16992  Tribolium castaneum (Red flour beetle)
       73      16755  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
       74      16713  Drosophila persimilis (Fruit fly)
       75      16425  Ectocarpus siliculosus (Brown alga)
       76      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
       77      16306  Loa loa (Eye worm) (Filaria loa)
       78      16294  Danaus plexippus (Monarch butterfly)
       79      16256  Trichinella spiralis (Trichina worm)
       80      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
       81      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7)
       82      16190  Drosophila sechellia (Fruit fly)
       83      15984  Drosophila pseudoobscura pseudoobscura (Fruit fly)
       84      15976  Meleagris gallopavo (Common turkey)
       85      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)
       86      15714  Naegleria gruberi (Amoeba)
       87      15625  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI)
       88      15622  Anopheles gambiae (African malaria mosquito)
       89      15419  Drosophila willistoni (Fruit fly)
       90      15232  Tetrahymena thermophila (strain SB210)
       91      15143  Drosophila ananassae (Fruit fly)
       92      15029  Harpegnathos saltator (Jerdon's jumping ant)
       93      14961  Hepatitis C virus subtype 1a
       94      14923  Drosophila erecta (Fruit fly)
       95      14850  Chlamydomonas reinhardtii (Chlamydomonas smithii)
       96      14792  Camponotus floridanus (Florida carpenter ant)
       97      14782  Drosophila mojavensis (Fruit fly)
       98      14697  Drosophila virilis (Fruit fly)
       99      14669  Plasmodium chabaudi
      100      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
      101      14417  Volvox carteri (Green alga)
      102      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
      103      14324  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
      104      14238  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold)
      105      14186  Hepatitis C virus subtype 1b
      106      13964  Acromyrmex echinatior (Panamanian leafcutter ant)
      107      13773  Plasmodium falciparum
      108      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
      109      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)
      110      13328  Aspergillus flavus
      111      13271  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)
      112      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
      113      13042  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)
      114      12983  Albugo laibachii Nc14
      115      12950  Stigmatella aurantiaca (strain DW4/3-1)
      116      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006)
      117      12737  Glycine max (Soybean) (Glycine hispida)
      118      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
      119      12696  Trypanosoma congolense (strain IL3000)
      120      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255)
      121      12596  Schistosoma mansoni (Blood fluke)
      122      12576  Xenopus laevis (African clawed frog)
      123      12459  Trypanosoma cruzi
      124      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)
      125      12441  Polysphondylium pallidum (Cellular slime mold)
      126      12352  Dictyostelium purpureum (Slime mold)
      127      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
      128      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus)
      129      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)
      130      11936  Emericella nidulans
      131      11780  Piriformospora indica (strain DSM 11827)
      132      11716  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
      133      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp.
      134      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus)
      135      11648  Anopheles darlingi (Mosquito)
      136      11644  Plasmodium berghei (strain Anka)
      137      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
      138      11562  Trichoplax adhaerens (Trichoplax reptans)
      139      11557  Trypanosoma vivax Y486
      140      11513  Aureococcus anophagefferens (Harmful bloom alga)
      141      11497  Brugia malayi (Filarial nematode worm)
      142      11491  Aspergillus kawachii (strain NBRC 4308) (White koji mold)
      143      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)
      144      11323  Helicobacter pylori (Campylobacter pylori)
      145      11288  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
      146      11240  Clonorchis sinensis (Chinese liver fluke)
      147      11211  Ktedonobacter racemifer DSM 44963
      148      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
      149      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)
      150      10966  Streptomyces clavuligerus ATCC 27064
      151      10949  Aspergillus niger
      152      10934  Schistosoma japonicum (Blood fluke)
      153      10841  Pediculus humanus subsp. corporis (Body louse)
      154      10820  Chaetomium globosum
      155      10782  Porcine reproductive and respiratory syndrome virus (PRRSV)
      156      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
      157      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383)
      158      10545  Rabies virus
      159      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
      160      10387  Pseudomonas syringae pv. glycinea str. race 4
      161      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
      162      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
      163      10355  Phaeodactylum tricornutum (strain CCAP 1055/1)
      164      10276  Micromonas pusilla (Picoplanktonic green alga)
      165      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136)
      166      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
      167      10152  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
      168      10110  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
      169      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68)
      170      10088  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
      171      10052  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181)
      172      10013  Streptomyces bingchenggensis (strain BCW-1)
      173       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
      174       9836  Chlorella variabilis (Green alga)
      175       9822  Metarhizium acridum (strain CQMa 102)
      176       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
      177       9705  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163)
      178       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
      179       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
      180       9634  uncultured archaeon
      181       9551  Amycolatopsis mediterranei S699
      182       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
      183       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus)
      184       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586)
      185       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus)
      186       9439  Salmo salar (Atlantic salmon)
      187       9327  Anolis carolinensis (Green anole) (American chameleon)
      188       9237  Monosiga brevicollis (Choanoflagellate)
      189       9201  Amycolatopsis mediterranei (strain U-32)
      190       9197  Streptomyces himastatinicus ATCC 53653
      191       9157  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)
      192       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus)
      193       9139  Pseudomonas syringae pv. pisi str. 1704B
      194       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
      195       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
      196       9081  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799)
      197       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
      198       9013  Neurospora crassa
      199       9012  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100)
      200       8991  Dictyostelium discoideum (Slime mold)
      201       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus)
      202       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
      203       8941  Streptomyces violaceusniger Tu 4113
      204       8940  Burkholderia sp. TJI49
      205       8916  Klebsiella pneumoniae
      206       8900  Catenulispora acidiphila
      207       8860  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
      208       8796  Aspergillus clavatus
      209       8787  Bradyrhizobium japonicum USDA 6
      210       8783  Pseudomonas syringae pv. japonica str. M301072PT
      211       8755  Rhodococcus sp. (strain RHA1)
      212       8741  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
      213       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
      214       8698  Paracoccidioides brasiliensis (strain Pb18)
      215       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
      216       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
      217       8661  Arthroderma otae (strain ATCC MYA-4605 / CBS 113480) (Microsporum canis)
      218       8607  Batrachochytrium dendrobatidis (strain JAM81 / FGSC 10211) (Frog chytrid fungus)
      219       8599  Entamoeba dispar (strain ATCC PRA-260 / SAW760)
      220       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
      221       8437  Plesiocystis pacifica SIR-1
      222       8433  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
      223       8394  Streptomyces sp. AA4
      224       8374  Capsaspora owczarzaki (strain ATCC 30864)
      225       8338  Bradyrhizobium japonicum
      226       8320  Frankia sp. CN3
      227       8308  Entamoeba histolytica
      228       8308  Grosmannia clavigera (strain kw1407 / UAMH 11150) (Blue stain fungus)
      229       8270  Leishmania major
      230       8248  Microscilla marina ATCC 23134
      231       8202  Streptomyces sviceus ATCC 29083
      232       8201  Microcoleus chthonoplastes PCC 7420
      233       8200  Leishmania infantum
      234       8186  Leishmania braziliensis
      235       8163  Frankia sp. EUN1f
      236       8154  Burkholderia xenovorans (strain LB400)
      237       8049  Ichthyophthirius multifiliis (strain G5) (White spot disease agent) (Ich)
      238       8044  Leishmania mexicana (strain MHOM/GT/2001/U1103)
      239       8037  uncultured crenarchaeote
      240       7961  Leishmania donovani (strain BPK282A1)
      241       7957  Trichophyton verrucosum (strain HKI 0517)
      242       7954  Ostreococcus tauri
      243       7943  Rhodococcus opacus (strain B4)
      244       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
      245       7906  Arthroderma benhamiae (strain ATCC MYA-4681 / CBS 112371)
      246       7865  Streptomyces ghanaensis ATCC 14672
      247       7854  Acaryochloris marina (strain MBIC 11017)
      248       7824  Paracoccidioides brasiliensis (strain Pb03)
      249       7823  Burkholderia sp. Ch1-1
      250       7807  Plasmodium yoelii yoelii

   2.3  Taxonomic distribution of the sequences

   Kingdom        sequences (% of the database)
   Archaea           340145 (  2%)
   Bacteria        12406877 ( 64%)
   Eukaryota        5343009 ( 27%)
   Viruses          1304110 (  7%)
   Other              40103 ( <1%)

   Within Eukaryota:

   Category           sequences (% of Eukaryota) (% of the complete database)
   Human                  97789 (  2%)           (  1%)
   Other Mammalia        696374 ( 13%)           (  4%)
   Other Vertebrata      491043 (  9%)           (  3%)
   Viridiplantae         981327 ( 18%)           (  5%)
   Fungi                1160267 ( 22%)           (  6%)
   Insecta               735325 ( 14%)           (  4%)
   Nematoda              166753 (  3%)           (  1%)
   Other                1014131 ( 19%)           (  5%)


3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

        From   To    Number             From   To    Number
           1-  50    437315             1001-1100    117016
          51- 100   1575700             1101-1200     82582
         101- 150   1791500             1201-1300     57889
         151- 200   1731809             1301-1400     37889
         201- 250   1742697             1401-1500     30343
         251- 300   1686776             1501-1600     21590
         301- 350   1538660             1601-1700     16289
         351- 400   1177057             1701-1800     12769
         401- 450   1008683             1801-1900     10453
         451- 500    838337             1901-2000      8999
         501- 550    565264             2001-2100      7197
         551- 600    439648             2101-2200      7147
         601- 650    320348             2201-2300      5618
         651- 700    250346             2301-2400      4498
         701- 750    215162             2401-2500      3822
         751- 800    192484                 >2500     31999
         801- 850    144395
         851- 900    130541
         901- 950     89507
         951-1000     67010
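
Because the size table excludes fragments, its counts should add up to the total number of entries minus the number of fragments quoted in the introduction. The following is a minimal sketch of that consistency check; the variable and function names are illustrative only and are not part of any UniProt tooling.

```python
"""Consistency check for the sequence-size table (section 3)."""

# Figures quoted in the introduction of this release note.
TOTAL_ENTRIES = 19434245
FRAGMENTS = 3034906

# Counts copied from the size table, in table order.
# 20 bins of width 50 (1-50 ... 951-1000):
COUNTS_WIDTH_50 = [
    437315, 1575700, 1791500, 1731809, 1742697, 1686776, 1538660,
    1177057, 1008683, 838337, 565264, 439648, 320348, 250346,
    215162, 192484, 144395, 130541, 89507, 67010,
]
# 15 bins of width 100 (1001-1100 ... 2401-2500):
COUNTS_WIDTH_100 = [
    117016, 82582, 57889, 37889, 30343, 21590, 16289, 12769,
    10453, 8999, 7197, 7147, 5618, 4498, 3822,
]
# Open-ended last bin (>2500):
COUNT_OVER_2500 = 31999


def build_bins():
    """Return (label, count) pairs covering every row of the table."""
    bins = []
    for i, count in enumerate(COUNTS_WIDTH_50):
        bins.append((f"{50 * i + 1}-{50 * (i + 1)}", count))
    for i, count in enumerate(COUNTS_WIDTH_100):
        bins.append((f"{1001 + 100 * i}-{1100 + 100 * i}", count))
    bins.append((">2500", COUNT_OVER_2500))
    return bins


def summarize(bins):
    """Return the summed count and the most populated bin."""
    total = sum(count for _, count in bins)
    modal_label, modal_count = max(bins, key=lambda item: item[1])
    return total, modal_label, modal_count


if __name__ == "__main__":
    total, label, count = summarize(build_bins())
    print("binned sequences:     ", total)
    print("entries - fragments:  ", TOTAL_ENTRIES - FRAGMENTS)
    print("most populated range: ", label, f"({count} sequences)")
```

With the figures above, the binned total equals the number of non-fragment entries, and the most populated range is 101-150 residues.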

The average sequence length in UniProtKB/TrEMBL is 326 amino acids.

The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.


4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                      Total Number of    Average
Line type / subtype                  number   entries  per entry
---------------------------------  -------- ---------  ---------
References (RL)                    23411456                 1.20
   Submitted to EMBL/GenBank/DDBJ  12980676  11660440       0.67
   Journal                          9777016   9145857       0.50
   Submitted to other databases      638521    631438       0.03
   Thesis                              8842      8784      <0.01
   Book citation                       6372      6323      <0.01
   Unpublished observations              28        28      <0.01
   Patent                                 1         1      <0.01

Total number of distinct authors cited in UniProtKB/TrEMBL: 424297

                                      Total Number of    Average
Line type / subtype                  number   entries  per entry  Rank
---------------------------------  -------- ---------  ---------  ----
Comments (CC)                      19008613                 0.98
   CATALYTIC ACTIVITY               1780463   1635811       0.09     4
   CAUTION                          6096548   6096538       0.31     1
   COFACTOR                          625487    588812       0.03     8
   DOMAIN                             54901     52165      <0.01     9
   FUNCTION                         1984189   1829023       0.10     3
   INTERACTION                          610       610      <0.01    11
   MISCELLANEOUS                      36384     36318      <0.01    10
   PATHWAY                           868693    794487       0.04     7
   SIMILARITY                       5135663   4457841       0.26     2
   SUBCELLULAR LOCATION             1526875   1462867       0.08     5
   SUBUNIT                           898800    886808       0.05     6

Total number of comment topics: 11

                                      Total Number of    Average
Line type / subtype                  number   entries  per entry  Rank
---------------------------------  -------- ---------  ---------  ----
Features (FT)                       5938622                 0.31
   CHAIN                             576743    457258       0.03     2
   NON_TER                          4958919   3035013       0.26     1
   SIGNAL                            402361    401254       0.02     3
   TRANSIT                              599       599      <0.01     4

Total number of feature keys: 4

                                Total Number of    Average
Line type / subtype            number   entries  per entry  Rank  Category
--------------------------  --------- ---------  ---------  ----  -------------------------------------
Cross-references (DR)       223546855                11.50
   AGD                           2526      2526      <0.01    80  Organism-specific databases
   ANU-2DPAGE                      56        56      <0.01    97  2D gel databases
   Allergome                     2431      1834      <0.01    83  Protein family/group databases
   ArachnoServer                   66        66      <0.01    96  Organism-specific databases
   ArrayExpress                 89218     89206      <0.01    51  Gene expression databases
   BRENDA                        2751      2720      <0.01    78  Enzyme and pathway databases
   Bgee                        109239    109036       0.01    49  Gene expression databases
   BioCyc                      670292    655930       0.03    31  Enzyme and pathway databases
   CAZy                         74332     69834      <0.01    55  Protein family/group databases
   CGD                           7094      7094      <0.01    74  Organism-specific databases
   COMPLUYEAST-2DPAGE               5         5      <0.01   101  2D gel databases
   CTD                         265783    264722       0.01    40  Organism-specific databases
   CYGD                             2         2      <0.01   103  Organism-specific databases
   ConoServer                     152       152      <0.01    91  Organism-specific databases
   DIP                           2614      2609      <0.01    79  Protein-protein interaction databases
   EMBL                      21879665  19146425       1.13     3  Sequence databases
   Ensembl                     648873    632899       0.03    32  Genome annotation databases
   EnsemblBacteria             835276    801110       0.04    30  Genome annotation databases
   EnsemblFungi                167133    166889       0.01    48  Genome annotation databases
   EnsemblMetazoa              294293    284457       0.02    38  Genome annotation databases
   EnsemblPlants               271340    245434       0.01    39  Genome annotation databases
   EnsemblProtists              77646     76577      <0.01    52  Genome annotation databases
   EuPathDB                    178993    178992       0.01    46  Organism-specific databases
   FlyBase                     195586    194038       0.01    43  Organism-specific databases
   GO                        36225282  11849328       1.86     2  Ontologies
   Gene3D                     8352741   6703466       0.43     7  Family and domain databases
   GeneDB_Spombe                    1         1      <0.01   105  Organism-specific databases
   GeneID                     6583290   6461690       0.34    10  Genome annotation databases
   GeneTree                   1149010   1148665       0.06    25  Phylogenomic databases
   Genevestigator               95739     95733      <0.01    50  Gene expression databases
   GenoList                     14741     14468      <0.01    71  Organism-specific databases
   GenomeReviews              4251713   4153325       0.22    14  Genome annotation databases
   Gramene                      68591     68591      <0.01    56  Organism-specific databases
   H-InvDB                        584       479      <0.01    88  Organism-specific databases
   HAMAP                      1569494   1553226       0.08    23  Family and domain databases
   HGNC                         37151     37082      <0.01    62  Organism-specific databases
   HOGENOM                    2189777   2189735       0.11    21  Phylogenomic databases
   HOVERGEN                    314302    314292       0.02    37  Phylogenomic databases
   HSSP                        251482    251255       0.01    41  3D structure databases
   IPI                         326565    326438       0.02    36  Sequence databases
   InParanoid                  191602    191535       0.01    45  Phylogenomic databases
   IntAct                       18063     18063      <0.01    67  Protein-protein interaction databases
   InterPro                  41075418  14632171       2.11     1  Family and domain databases
   KEGG                       5294439   5195781       0.27    12
   KO                         1962403   1952732       0.10    22  Family and domain databases
   LegioList                     5140      5112      <0.01    75  Organism-specific databases
   Leproma                        936       935      <0.01    86  Organism-specific databases
   MEROPS                       55592     55592      <0.01    57  Protein family/group databases
   MGI                          37006     36743      <0.01    63  Organism-specific databases
   MINT                          8702      8702      <0.01    72  Protein-protein interaction databases
   NMPDR                       909936    909933       0.05    29  Genome annotation databases
   NextBio                      44179     44177      <0.01    59  Other
   OMA                        3305280   3305270       0.17    16  Phylogenomic databases
   OrthoDB                     570733    570569       0.03    33  Phylogenomic databases
   PANTHER                    2929639   2778535       0.15    18  Family and domain databases
   PATRIC                     8385058   8385026       0.43     6  Genome annotation databases
   PDB                          15979      9221      <0.01    69  3D structure databases
   PDBsum                       15751      9075      <0.01    70  3D structure databases
   PHCI-2DPAGE                    102       102      <0.01    94  2D gel databases
   PIR                         174097    141225       0.01    47  Sequence databases
   PIRSF                      1312941   1312614       0.07    24  Family and domain databases
   PMAP-CutDB                     234       234      <0.01    90  Other
   PMMA-2DPAGE                      2         2      <0.01   102  2D gel databases
   PRIDE                       228928    228882       0.01    42  Proteomic databases
   PRINTS                     3139809   2794504       0.16    17  Family and domain databases
   PROSITE                    9651663   6404202       0.50     5  Family and domain databases
   Pathway_Interaction_DB          11         9      <0.01   100  Enzyme and pathway databases
   PeptideAtlas                   146       146      <0.01    92  Proteomic databases
   PeroxiBase                    2510      2501      <0.01    81  Protein family/group databases
   Pfam                      18603453  13776021       0.96     4  Family and domain databases
   PharmGKB                      2885      2885      <0.01    77  Organism-specific databases
   PhosphoSite                   1592      1592      <0.01    85  PTM databases
   PhylomeDB                   919065    919042       0.05    28  Phylogenomic databases
   ProDom                      359657    340029       0.02    35  Family and domain databases
   ProMEX                         310       310      <0.01    89  Proteomic databases
   ProtClustDB                2723223   2723212       0.14    19  Phylogenomic databases
   ProteinModelPortal         5871028   5867436       0.30    11  3D structure databases
   PseudoCAP                     4564      4558      <0.01    76  Organism-specific databases
   REBASE                       24476     23791      <0.01    66  Protein family/group databases
   REPRODUCTION-2DPAGE             89        88      <0.01    95  2D gel databases
   RGD                          24937     24658      <0.01    65  Organism-specific databases
   Reactome                       140       118      <0.01    93  Enzyme and pathway databases
   RefSeq                     6605917   6462601       0.34     9  Sequence databases
   SGD                             11        11      <0.01    99  Organism-specific databases
   SMART                      4268878   3237514       0.22    13  Family and domain databases
   SMR                         938482    938472       0.05    27  3D structure databases
   STRING                     2602356   2602172       0.13    20  Protein-protein interaction databases
   SUPFAM                     7985170   6593481       0.41     8  Family and domain databases
   SWISS-2DPAGE                    29        29      <0.01    98  2D gel databases
   Siena-2DPAGE                     2         2      <0.01   104  2D gel databases
   TAIR                         16584     16504      <0.01    68  Organism-specific databases
   TCDB                          2496      2484      <0.01    82  Protein family/group databases
   TIGR                        194627    187572       0.01    44  Genome annotation databases
   TIGRFAMs                   3905058   3559800       0.20    15  Family and domain databases
   TubercuList                   2082      2077      <0.01    84  Organism-specific databases
   UCSC                         54877     54876      <0.01    58  Genome annotation databases
   UniGene                     482689    450844       0.02    34  Sequence databases
   VectorBase                   75570     75062      <0.01    53  Genome annotation databases
   World-2DPAGE                   931       926      <0.01    87  2D gel databases
   WormBase                     39843     39738      <0.01    61  Organism-specific databases
   Xenbase                      24959     24918      <0.01    64  Organism-specific databases
   ZFIN                         42665     41877      <0.01    60  Organism-specific databases
   dictyBase                     8000      7778      <0.01    73  Organism-specific databases
   eggNOG                     1142816   1142816       0.06    26  Phylogenomic databases
   euHCVdb                      75266     75263      <0.01    54  Organism-specific databases

Number of explicitly cross-referenced databases: 131


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.59   Gln (Q) 3.91   Leu (L) 9.87   Ser (S) 6.73
   Arg (R) 5.46   Glu (E) 6.17   Lys (K) 5.25   Thr (T) 5.60
   Asn (N) 4.10   Gly (G) 7.10   Met (M) 2.47   Trp (W) 1.31
   Asp (D) 5.30   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.03
   Cys (C) 1.28   Ile (I) 5.96   Pro (P) 4.76   Val (V) 6.74

   Asx (B) 0.000  Glx (Z) 0      Xaa (X) 0.04

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur

   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe, Gln,
   Tyr, Met, His, Trp, Cys


6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 604499
Total number of entries encoded on a Plasmid: 259357
Total number of entries encoded on a Plastid: 15095
Total number of entries encoded on a Plastid; Apicoplast: 388
Total number of entries encoded on a Plastid; Chloroplast: 164321
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 471
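
The frequency ordering given in section 5.2 follows directly from the percentages in section 5.1. As a minimal sketch (the dictionary and function names are illustrative, not taken from any UniProt tool), sorting the twenty standard residues by their percentage reproduces that ordering:

```python
"""Rank the twenty standard amino acids by the percentages of section 5.1."""

# Percentages copied from section 5.1 (Asx, Glx and Xaa are left out).
COMPOSITION = {
    "Ala": 8.59, "Gln": 3.91, "Leu": 9.87, "Ser": 6.73,
    "Arg": 5.46, "Glu": 6.17, "Lys": 5.25, "Thr": 5.60,
    "Asn": 4.10, "Gly": 7.10, "Met": 2.47, "Trp": 1.31,
    "Asp": 5.30, "His": 2.21, "Phe": 4.01, "Tyr": 3.03,
    "Cys": 1.28, "Ile": 5.96, "Pro": 4.76, "Val": 6.74,
}


def rank_by_frequency(composition):
    """Return residue names ordered from most to least frequent."""
    return sorted(composition, key=composition.get, reverse=True)


def total_percent(composition):
    """Return the sum of the listed percentages."""
    return sum(composition.values())


if __name__ == "__main__":
    print(", ".join(rank_by_frequency(COMPOSITION)))
    print(f"sum of listed percentages: {total_percent(COMPOSITION):.2f}")
```

The listed percentages add up to slightly less than 100; the small remainder is presumably due to rounding and to the ambiguous residue Xaa.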