Current Release Statistics


         UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2013_08 STATISTICS


1.  INTRODUCTION

Release 2013_08 of 24-Jul-2013 of UniProtKB/TrEMBL contains 41451118 sequence entries,
comprising 13208986710 amino acids.

Since release 2013_07, 1589741 sequences have been added, the sequence data
of 3518 existing entries has been updated, and the annotations of
4454156 entries have been revised. The added sequences represent an increase
of about 4% in the number of entries.
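
As a rough check of the 4% figure, the following Python sketch divides the
number of added sequences by the estimated size of the previous release. It
assumes that no entries were deleted between releases, which the report does
not state; the variable names are illustrative.

```python
# Figures quoted in the introduction of this report.
total_entries = 41451118   # entries in release 2013_08
added_entries = 1589741    # sequences added since release 2013_07

# Assumption: no entries were removed between releases, so the previous
# release size is approximated as the current total minus the additions.
previous_entries = total_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"Estimated growth: {growth_percent:.1f}%")
```

If entries were also deleted between the two releases, the true net growth
would be somewhat lower than this estimate.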

Number of fragments: 4339045

Protein existence (PE):              entries      %
1: Evidence at protein level           20352     0.05%
2: Evidence at transcript level       823009     1.99%
3: Inferred from homology            8297415    20.02%
4: Predicted                        32310342    77.95%
5: Uncertain                               0     0.00%
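
The percentages in the table above can be reproduced from the entry counts.
The sketch below is only an illustration in Python; the dictionary and
variable names are not part of the release files.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 20352,
    "2: Evidence at transcript level": 823009,
    "3: Inferred from homology": 8297415,
    "4: Predicted": 32310342,
    "5: Uncertain": 0,
}

# The PE levels partition the release, so their sum is the entry total.
total_entries = sum(pe_counts.values())

# Share of the release in each PE level, formatted to two decimals.
pe_percent = {
    level: f"{100 * count / total_entries:.2f}%"
    for level, count in pe_counts.items()
}

for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>10}")
```

The five counts add up to the 41451118 entries reported in the introduction.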

The growth of the database is summarized below.
[Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 426778

   The twenty most represented species account for 1885947 sequences, or
   4.5% of the total number of entries.
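
   The figure for the twenty most represented species can be checked by
   summing the first twenty frequencies of table 2.2. The Python sketch
   below is illustrative; the names top20 and share_percent are not taken
   from the release files.

```python
# Frequencies of the twenty most represented species (rows 1-20 of table 2.2).
top20 = [
    543519, 199370, 113495, 96864, 88788,
    73836, 70412, 69137, 60526, 59640,
    56466, 56144, 54889, 54121, 52253,
    50601, 49237, 48897, 44560, 43192,
]

total_entries = 41451118  # entries in release 2013_08

top20_total = sum(top20)
share_percent = 100 * top20_total / total_entries

print(f"{top20_total} sequences, {share_percent:.1f}% of all entries")
```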


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:17611
                            2x:69912
                            3x:37768
                            4x:26917
                            5x:16093
                            6x:11337
                            7x: 8656
                            8x: 6834
                            9x: 5396
                           10x:10576
                       11- 20x:30529
                       21- 50x:10129
                       51-100x: 3919
                         >100x:12598


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     543519  Human immunodeficiency virus 1
       2     199370  uncultured bacterium
       3     113495  Homo sapiens (Human)
       4      96864  Oryza sativa subsp. japonica (Rice)
       5      88788  Hepatitis C virus
       6      73836  Glycine max (Soybean) (Glycine hispida)
       7      70412  Hordeum vulgare var. distichum (Two-rowed barley)
       8      69137  Macaca mulatta (Rhesus macaque)
       9      60526  Zea mays (Maize)
      10      59640  Hepatitis B virus (HBV)
      11      56466  Mus musculus (Mouse)
      12      56144  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      13      54889  Solanum tuberosum (Potato)
      14      54121  Vitis vinifera (Grape)
      15      52253  Danio rerio (Zebrafish) (Brachydanio rerio)
      16      50601  Trichomonas vaginalis
      17      49237  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      18      48897  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      19      44560  Populus trichocarpa (Western balsam poplar) 
      20      43192  Callithrix jacchus (White-tufted-ear marmoset)
      21      41389  Arabidopsis thaliana (Mouse-ear cress)
      22      41203  Brassica rapa subsp. pekinensis (Chinese cabbage) (Brassica pekinensis)
      23      39850  Paramecium tetraurelia
      24      39838  Oryza sativa subsp. indica (Rice)
      25      39300  Setaria italica (Foxtail millet) (Panicum italicum)
      26      38796  Mustela putorius furo (European domestic ferret) (Mustela furo)
      27      38163  human gut metagenome
      28      36673  Drosophila melanogaster (Fruit fly)
      29      36522  Musa acuminata subsp. malaccensis (Wild banana) (Musa malaccensis)
      30      35905  Solanum lycopersicum (Tomato) (Lycopersicon esculentum)
      31      35631  Ailuropoda melanoleuca (Giant panda)
      32      35599  Emiliania huxleyi CCMP1516
      33      35205  Acyrthosiphon pisum (Pea aphid)
      34      35066  Caenorhabditis japonica
      35      34927  Simian immunodeficiency virus (SIV)
      36      34830  Physcomitrella patens subsp. patens (Moss)
      37      34570  Thalassiosira oceanica (Marine diatom)
      38      34355  Aegilops tauschii (Tausch's goatgrass) (Aegilops squarrosa)
      39      33839  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      40      33253  Selaginella moellendorffii (Spikemoss)
      41      32767  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      42      32342  Oryza brachyantha
      43      32200  Sus scrofa (Pig)
      44      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      45      32094  Oryza glaberrima (African rice)
      46      31849  Pan troglodytes (Chimpanzee)
      47      31386  Ricinus communis (Castor bean)
      48      31207  Capitella teleta
      49      30921  Daphnia pulex (Water flea)
      50      30300  Caenorhabditis brenneri (Nematode worm)
      51      30146  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      52      29815  Amphimedon queenslandica (Sponge)
      53      29451  Strongylocentrotus purpuratus (Purple sea urchin)
      54      29317  Pristionchus pacificus (Parasitic nematode)
      55      29183  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      56      29054  Oikopleura dioica (Tunicate)
      57      28835  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      58      28825  Capsella rubella
      59      28778  Escherichia coli
      60      28613  Prunus persica (Peach) (Amygdalus persica)
      61      28495  Canis familiaris (Dog) (Canis lupus familiaris)
      62      28080  Gasterosteus aculeatus (Three-spined stickleback)
      63      27743  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      64      27504  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      65      27454  Equus caballus (Horse)
      66      27089  Gorilla gorilla gorilla (Lowland gorilla)
      67      26824  Crassostrea gigas (Pacific oyster) (Crassostrea angulata)
      68      25964  Oryzias latipes (Medaka fish) (Japanese ricefish)
      69      25796  Loxodonta africana (African elephant)
      70      25724  Rattus norvegicus (Rat)
      71      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      72      25652  Bos taurus (Bovine)
      73      25092  Oryctolagus cuniculus (Rabbit)
      74      24905  Nematostella vectensis (Starlet sea anemone)
      75      24643  Tetrahymena thermophila (strain SB210)
      76      24590  Guillardia theta CCMP2712
      77      24374  Triticum urartu (Red wild einkorn) (Crithodium urartu)
      78      24208  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      79      23716  Ornithorhynchus anatinus (Duckbill platypus)
      80      23565  Oxytricha trifallax
      81      23502  Latimeria chalumnae (West Indian ocean coelacanth)
      82      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      83      22750  Monodelphis domestica (Gray short-tailed opossum)
      84      22562  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      85      22546  Caenorhabditis elegans
      86      22313  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      87      22163  gut metagenome
      88      21548  Heterocephalus glaber (Naked mole rat)
      89      21346  Caenorhabditis briggsae
      90      21309  Gallus gallus (Chicken)
      91      21106  Ixodes scapularis (Black-legged tick) (Deer tick)
      92      20937  Felis catus (Cat) (Felis silvestris catus)
      93      20867  Myotis lucifugus (Little brown bat)
      94      20838  Tupaia chinensis (Chinese tree shrew)
      95      20758  Pelodiscus sinensis (Chinese softshell turtle) (Trionyx sinensis)
      96      20512  Xiphophorus maculatus (Southern platyfish) (Platypoecilus maculatus)
      97      20133  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      98      20114  Ciona savignyi (Pacific transparent sea squirt)
      99      20072  Cavia porcellus (Guinea pig)
     100      19985  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
     101      19816  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
     102      19684  Taeniopygia guttata (Zebra finch) (Poephila guttata)
     103      19551  Anolis carolinensis (Green anole) (American chameleon)
     104      19544  Pteropus alecto (Black flying fox)
     105      19438  Wuchereria bancrofti
     106      19334  Toxoplasma gondii
     107      19200  Trypanosoma cruzi (strain CL Brener)
     108      19057  Chelonia mydas (Green sea-turtle) (Chelonia agassizi)
     109      18946  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
     110      18855  Drosophila simulans (Fruit fly)
     111      18771  mine drainage metagenome
     112      18592  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
     113      18555  Bos grunniens mutus
     114      18121  Atta cephalotes (Leafcutter ant)
     115      18024  Anopheles gambiae (African malaria mosquito)
     116      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
     117      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
     118      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
     119      17518  Bombyx mori (Silk moth)
     120      17412  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
     121      17296  Anas platyrhynchos (Domestic duck) (Anas boschas)
     122      17282  Nasonia vitripennis (Parasitic wasp)
     123      17046  Tribolium castaneum (Red flour beetle)
     124      17040  Drosophila yakuba (Fruit fly)
     125      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     126      16914  Meleagris gallopavo (Common turkey)
     127      16714  Drosophila persimilis (Fruit fly)
     128      16698  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     129      16643  Fusarium oxysporum f. sp. lycopersici  
     130      16638  Plasmodium falciparum
     131      16469  Hepatitis C virus subtype 1b
     132      16426  Ectocarpus siliculosus (Brown alga)
     133      16338  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     134      16329  Danaus plexippus (Monarch butterfly)
     135      16274  Trichinella spiralis (Trichina worm)
     136      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     137      16188  Drosophila sechellia (Fruit fly)
     138      16156  Schistosoma japonicum (Blood fluke)
     139      16110  Colletotrichum higginsianum (strain IMI 349063) (Crucifer anthracnose fungus)
     140      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     141      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     142      15716  Naegleria gruberi (Amoeba)
     143      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     144      15568  Phytophthora ramorum (Sudden oak death agent)
     145      15461  Myotis davidii (David's myotis)
     146      15421  Drosophila willistoni (Fruit fly)
     147      15371  Colletotrichum gloeosporioides (strain Nara gc5) (Anthracnose fungus) 
     148      15354  Loa loa (Eye worm) (Filaria loa)
     149      15345  Fusarium oxysporum f. sp. cubense race 1
     150      15225  Pythium ultimum
     151      15177  Hepatitis C virus subtype 1a
     152      15144  Drosophila ananassae (Fruit fly)
     153      15041  Harpegnathos saltator (Jerdon's jumping ant)
     154      14942  Acanthamoeba castellanii str. Neff
     155      14927  Drosophila erecta (Fruit fly)
     156      14910  Dendroctonus ponderosae (Mountain pine beetle)
     157      14858  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     158      14853  Klebsiella pneumoniae
     159      14801  Camponotus floridanus (Florida carpenter ant)
     160      14791  Drosophila mojavensis (Fruit fly)
     161      14713  Plasmodium chabaudi
     162      14704  Drosophila virilis (Fruit fly)
     163      14652  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     164      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     165      14417  Volvox carteri (Green alga)
     166      14341  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     167      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     168      14265  uncultured archaeon
     169      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     170      14147  Fusarium oxysporum f. sp. cubense race 4
     171      13970  Acromyrmex echinatior (Panamanian leafcutter ant) 
     172      13923  Hyaloperonospora arabidopsidis (strain Emoy2) (Downy mildew agent) 
     173      13900  Rabies virus
     174      13876  Clonorchis sinensis (Chinese liver fluke)
     175      13867  Phanerochaete carnosa (strain HHB-10118-sp) (White-rot fungus) 
     176      13801  Macrophomina phaseolina (strain MS6) (Charcoal rot fungus)
     177      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     178      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     179      13588  Trypanosoma cruzi
     180      13345  Aspergillus flavus 
     181      13336  Colletotrichum orbiculare   
     182      13267  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     183      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     184      13062  Pseudocercospora fijiensis CIRAD86
     185      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     186      12983  Albugo laibachii Nc14
     187      12962  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     188      12950  Stigmatella aurantiaca (strain DW4/3-1)
     189      12900  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     190      12858  Magnaporthe oryzae Y34
     191      12857  Bipolaris maydis C5
     192      12754  Porcine reproductive and respiratory syndrome virus (PRRSV)
     193      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     194      12711  Magnaporthe oryzae P131
     195      12705  Bipolaris maydis ATCC 48331
     196      12697  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     197      12696  Trypanosoma congolense (strain IL3000)
     198      12681  Schistosoma mansoni (Blood fluke)
     199      12629  Xenopus laevis (African clawed frog)
     200      12586  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     201      12447  Fusarium pseudograminearum (strain CS3096) (Wheat and barley crown-rot fungus)
     202      12440  Polysphondylium pallidum (Cellular slime mold)
     203      12414  Dothistroma septosporum NZE10
     204      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     205      12352  Dictyostelium purpureum (Slime mold)
     206      12342  Helicobacter pylori (Campylobacter pylori)
     207      12197  Rhizoctonia solani AG-1 IB
     208      12174  Bipolaris sorokiniana ND90Pr
     209      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     210      12078  Ceriporiopsis subvermispora B
     211      12011  Apis mellifera (Honeybee)
     212      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     213      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     214      11941  Emericella nidulans  
     215      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     216      11780  Piriformospora indica (strain DSM 11827)
     217      11752  Chondrocladia sp. SMF


   2.3  Taxonomic distribution of the sequences


   Kingdom        sequences (% of the database)
    Archaea          689097 (  2%)
    Bacteria       30768592 ( 74%)
    Eukaryota       8131770 ( 20%)
    Viruses         1757465 (  4%)
    Other            104193 ( <1%)



   Within Eukaryota:


    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 113531 (  1%)           ( <1%)
     Other Mammalia        973886 ( 12%)           (  2%)
     Other Vertebrata      839696 ( 10%)           (  2%)
     Viridiplantae        1673209 ( 21%)           (  4%)
     Fungi                1945973 ( 24%)           (  5%)
     Insecta               844097 ( 10%)           (  2%)
     Nematoda              253392 (  3%)           (  1%)
     Other                1487986 ( 18%)           (  4%)
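
   The two percentage columns use different denominators: the Eukaryota
   total and the complete database. A minimal Python sketch of that
   calculation follows; the names are illustrative and rounding to whole
   percent is assumed from the values shown in the table.

```python
# Sequence counts per category within Eukaryota, copied from the table above.
eukaryota_counts = {
    "Human": 113531,
    "Other Mammalia": 973886,
    "Other Vertebrata": 839696,
    "Viridiplantae": 1673209,
    "Fungi": 1945973,
    "Insecta": 844097,
    "Nematoda": 253392,
    "Other": 1487986,
}

database_total = 41451118                       # all entries in the release
eukaryota_total = sum(eukaryota_counts.values())  # denominator for column 1

# For each category: (% of Eukaryota, % of the complete database),
# both rounded to whole percent (assumed rounding rule).
shares = {
    name: (
        round(100 * count / eukaryota_total),
        round(100 * count / database_total),
    )
    for name, count in eukaryota_counts.items()
}

for name, (of_eukaryota, of_database) in shares.items():
    print(f"{name:<18}{of_eukaryota:>4}%{of_database:>4}%")
```

   The category counts sum to the Eukaryota total given in the Kingdom table.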



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50 1113998             1001-1100   227576
                 51- 100 3690225             1101-1200   158051
                101- 150 4121105             1201-1300   113566
                151- 200 3994933             1301-1400    68370
                201- 250 4028228             1401-1500    56584
                251- 300 3900661             1501-1600    38678
                301- 350 3531654             1601-1700    28177
                351- 400 2646190             1701-1800    21265
                401- 450 2297592             1801-1900    17222
                451- 500 1881256             1901-2000    14513
                501- 550 1206951             2001-2100    11709
                551- 600  931990             2101-2200    11894
                601- 650  680692             2201-2300     9158
                651- 700  535568             2301-2400     7383
                701- 750  446145             2401-2500     6512
                751- 800  384630             >2500        49973
                801- 850  299029
                851- 900  267011
                901- 950  183691
                951-1000  129893



   The average sequence length in UniProtKB/TrEMBL is 318 amino acids.

   The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
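
   The average length is consistent with dividing the total number of amino
   acids by the number of entries, both taken from the introduction. The
   Python sketch below assumes that all entries, including fragments, are
   counted; the report does not say so explicitly.

```python
total_amino_acids = 13208986710  # amino acids in release 2013_08
total_entries = 41451118         # sequence entries in release 2013_08

# Exact mean length, and the whole-number part of it.
mean_length = total_amino_acids / total_entries
mean_length_truncated = total_amino_acids // total_entries

print(f"mean length: {mean_length:.2f} amino acids")
print(f"whole-number part: {mean_length_truncated}")
```

   The exact mean lies between 318 and 319, so the reported value of 318
   appears to be truncated rather than rounded; this is an inference from
   the numbers, not a documented rule.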



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    49170975                1.19                                                    
   Submitted to EMBL/GenBank/DDBJ  28877412  27158547      0.70                                                    
   Journal                         18532072  17524570      0.45                                                    
   Submitted to other databases     1744448   1733538      0.04                                                    
   Thesis                             10306     10248     <0.01                                                    
   Book citation                       6736      6686     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 473993
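
The "Average per entry" column is consistent with dividing the total number
of lines by the total number of entries in the release. The Python sketch
below illustrates this; the function name average_per_entry is hypothetical,
and the rule for printing "<0.01" (used when the value would otherwise round
to 0.00) is inferred from the tables in this section.

```python
TOTAL_ENTRIES = 41451118  # sequence entries in release 2013_08


def average_per_entry(line_count, entry_count=TOTAL_ENTRIES):
    """Format the mean number of lines per entry as shown in the tables."""
    average = line_count / entry_count
    # Inferred rule: values that would round to 0.00 are shown as "<0.01".
    if average < 0.005:
        return "<0.01"
    return f"{average:.2f}"


# Total line counts copied from the References (RL) table above.
reference_lines = {
    "References (RL)": 49170975,
    "Submitted to EMBL/GenBank/DDBJ": 28877412,
    "Journal": 18532072,
    "Submitted to other databases": 1744448,
    "Thesis": 10306,
}

for name, count in reference_lines.items():
    print(f"{name:<34}{count:>10}{average_per_entry(count):>8}")
```

Note that the denominator is the whole release, not the number of entries
carrying the line type, which is why subtype averages can fall below 1.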


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      51904431                1.25                                                    
   CATALYTIC ACTIVITY               3911040   3509952      0.09     4                                              
   CAUTION                         23311857  23295569      0.56     1                                              
   COFACTOR                         1536560   1425126      0.04     8                                              
   DOMAIN                            159005    152928     <0.01     9                                              
   FUNCTION                         4361169   4105568      0.11     3                                              
   INTERACTION                         1251      1251     <0.01    11                                              
   MISCELLANEOUS                     101948    101752     <0.01    10                                              
   PATHWAY                          1929435   1756052      0.05     7                                              
   SIMILARITY                      10962690   9531540      0.26     2                                              
   SUBCELLULAR LOCATION             3421038   3301047      0.08     5                                              
   SUBUNIT                          2208438   2185108      0.05     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       8333857                0.20                                                    
   CHAIN                             869404    710847      0.02     2                                              
   NON_TER                          6773824   4340809      0.16     1                                              
   SIGNAL                            689228    685955      0.02     3                                              
   TRANSIT                             1401      1401     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             418778244               10.10                                                    
   Allergome                           3461      2831     <0.01    84   Protein family/group databases             
   ArachnoServer                         66        66     <0.01   101   Organism-specific databases                
   ArrayExpress                      195540    195540     <0.01    44   Gene expression databases                  
   BRENDA                              2648      2619     <0.01    86   Enzyme and pathway databases               
   Bgee                               99908     99908     <0.01    51   Gene expression databases                  
   BindingDB                           5826      5826     <0.01    77   Other                                      
   BioCyc                           5640234   5572915      0.14    16   Enzyme and pathway databases               
   CAZy                               74012     69539     <0.01    55   Protein family/group databases             
   CGD                                 7033      7033     <0.01    76   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     4         4     <0.01   107   2D gel databases                           
   CTD                               352099    350775      0.01    37   Organism-specific databases                
   ChEMBL                               607       607     <0.01    93   Other                                      
   ChiTaRS                            65948     65948     <0.01    56   Other                                      
   ConoServer                           160       160     <0.01    98   Organism-specific databases                
   DIP                                 2878      2873     <0.01    85   Protein-protein interaction databases      
   DNASU                              42342     42008     <0.01    62   Protocols and materials databases          
   EMBL                            44611291  40431685      1.08     3   Sequence databases                         
   Ensembl                          1002552    988095      0.02    29   Genome annotation databases                
   EnsemblBacteria                 17882907  17609037      0.43     5   Genome annotation databases                
   EnsemblFungi                      351526    349532      0.01    38   Genome annotation databases                
   EnsemblMetazoa                    676337    661010      0.02    33   Genome annotation databases                
   EnsemblPlants                     654222    620702      0.02    34   Genome annotation databases                
   EnsemblProtists                   156294    153898     <0.01    47   Genome annotation databases                
   EuPathDB                          147113    147111     <0.01    49   Organism-specific databases                
   EvolutionaryTrace                   8053      8053     <0.01    74   Other                                      
   FlyBase                           196133    194665     <0.01    43   Organism-specific databases                
   GO                              72922214  23638969      1.76     2   Ontologies                                 
   Gene3D                          16160011  12753815      0.39     8   Family and domain databases                
   GeneID                           9859724   9601683      0.24    11   Genome annotation databases                
   GeneTree                          900068    900008      0.02    31   Phylogenomic databases                     
   Genevestigator                     86540     86534     <0.01    52   Gene expression databases                  
   GenoList                           14733     14460     <0.01    72   Organism-specific databases                
   GenomeRNAi                         19569     19568     <0.01    69   Other                                      
   Gramene                           204056    204056     <0.01    42   Organism-specific databases                
   H-InvDB                              614       467     <0.01    92   Organism-specific databases                
   HAMAP                            3751883   3704375      0.09    20   Family and domain databases                
   HGNC                               47526     47455     <0.01    59   Organism-specific databases                
   HOGENOM                          3654171   3654126      0.09    21   Phylogenomic databases                     
   HOVERGEN                          305496    305485      0.01    39   Phylogenomic databases                     
   IPI                               280051    279160      0.01    40   Sequence databases                         
   InParanoid                        186582    186582     <0.01    45   Phylogenomic databases                     
   IntAct                             17273     17273     <0.01    70   Protein-protein interaction databases      
   InterPro                        77993391  27382046      1.88     1   Family and domain databases                
   KEGG                             8769565   8555102      0.21    12   Genome annotation databases                
   KO                               3578882   3561857      0.09    22   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    79   Organism-specific databases                
   Leproma                             1272      1270     <0.01    88   Organism-specific databases                
   MEROPS                            138764    138763     <0.01    50   Protein family/group databases             
   MGI                                51779     51463     <0.01    58   Organism-specific databases                
   MINT                               10264     10263     <0.01    73   Protein-protein interaction databases      
   NextBio                           208912    208907      0.01    41   Other                                      
   OMA                              4858421   4858204      0.12    19   Phylogenomic databases                     
   OrthoDB                           553206    553163      0.01    35   Phylogenomic databases                     
   PANTHER                          5196425   4894508      0.13    18   Family and domain databases                
   PATRIC                           8286168   8286051      0.20    13   Genome annotation databases                
   PDB                                20016     11115     <0.01    67   3D structure databases                     
   PDBsum                             19749     10919     <0.01    68   3D structure databases                     
   PIR                               172390    139562     <0.01    46   Sequence databases                         
   PIRSF                            3157576   3154380      0.08    23   Family and domain databases                
   PMAP-CutDB                           209       209     <0.01    96   Other                                      
   PRIDE                             930025    930025      0.02    30   Proteomic databases                        
   PRINTS                           5289728   4728545      0.13    17   Family and domain databases                
   PROSITE                         17532395  11642430      0.42     6   Family and domain databases                
   Pathway_Interaction_DB                10         8     <0.01   106   Enzyme and pathway databases               
   PaxDb                              29088     29086     <0.01    64   Proteomic databases                        
   PeptideAtlas                         129       129     <0.01    99   Proteomic databases                        
   PeroxiBase                          2596      2588     <0.01    87   Protein family/group databases             
   Pfam                            34831380  25528635      0.84     4   Family and domain databases                
   PharmGKB                            3589      3589     <0.01    83   Organism-specific databases                
   PhosphoSite                         1128      1128     <0.01    89   PTM databases                              
   PhylomeDB                         147299    147299     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   102   Organism-specific databases                
   PptaseDB                              36        35     <0.01   103   Protein family/group databases             
   ProDom                            704398    676902      0.02    32   Family and domain databases                
   ProMEX                              5235      5235     <0.01    78   Proteomic databases                        
   ProtClustDB                      2719644   2719633      0.07    26   Phylogenomic databases                     
   ProteinModelPortal               9871281   9871281      0.24    10   3D structure databases                     
   PseudoCAP                           4533      4527     <0.01    80   Organism-specific databases                
   REBASE                             39313     39305     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   66        65     <0.01   100   2D gel databases                           
   RGD                                21083     20192     <0.01    66   Organism-specific databases                
   Reactome                             180       145     <0.01    97   Enzyme and pathway databases               
   RefSeq                           9900105   9609735      0.24     9   Sequence databases                         
   SABIO-RK                             481       481     <0.01    94   Enzyme and pathway databases               
   SGD                                   11        11     <0.01   105   Organism-specific databases                
   SMART                            7772333   5889972      0.19    15   Family and domain databases                
   SMR                              2608329   2608329      0.06    27   3D structure databases                     
   STRING                           2903996   2903927      0.07    24   Protein-protein interaction databases      
   SUPFAM                          16314232  13168694      0.39     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   104   2D gel databases                           
   SignaLink                           4406      4404     <0.01    81   Enzyme and pathway databases               
   TAIR                               15255     15182     <0.01    71   Organism-specific databases                
   TCDB                                4242      4234     <0.01    82   Protein family/group databases             
   TIGRFAMs                         8254802   7533766      0.20    14   Family and domain databases                
   TubercuList                         1102      1101     <0.01    90   Organism-specific databases                
   UCSC                               57998     57854     <0.01    57   Genome annotation databases                
   UniGene                           551507    521841      0.01    36   Sequence databases                         
   UniPathway                       1599601   1489114      0.04    28   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    53   Genome annotation databases                
   World-2DPAGE                         673       668     <0.01    91   2D gel databases                           
   WormBase                           42540     42367     <0.01    61   Organism-specific databases                
   Xenbase                            25581     25512     <0.01    65   Organism-specific databases                
   ZFIN                               45657     45091     <0.01    60   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    75   Organism-specific databases                
   eggNOG                           2768423   2768403      0.07    25   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    54   Organism-specific databases                
   mycoCLAP                             422       422     <0.01    95   Protein family/group databases             

Number of explicitly cross-referenced databases: 128


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.64   Gln (Q) 3.98   Leu (L) 9.94   Ser (S) 6.55
   Arg (R) 5.36   Glu (E) 6.23   Lys (K) 5.33   Thr (T) 5.55
   Asn (N) 4.12   Gly (G) 7.08   Met (M) 2.49   Trp (W) 1.28
   Asp (D) 5.33   His (H) 2.19   Phe (F) 4.05   Tyr (Y) 3.08
   Cys (C) 1.20   Ile (I) 6.09   Pro (P) 4.57   Val (V) 6.79

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.02

[Figure: amino acid composition chart]

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Ranking of the amino acids by decreasing frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
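
   The ordering above follows from sorting the percentages of section 5.1 in
   decreasing order. A minimal Python sketch is given below; the dictionary
   name composition is illustrative. Asp and Lys have the same rounded value
   (5.33), so their relative order here depends on the input order, which is
   alphabetical in this sketch.

```python
# Composition in percent, copied from section 5.1 (alphabetical order).
composition = {
    "Ala": 8.64, "Arg": 5.36, "Asn": 4.12, "Asp": 5.33, "Cys": 1.20,
    "Gln": 3.98, "Glu": 6.23, "Gly": 7.08, "His": 2.19, "Ile": 6.09,
    "Leu": 9.94, "Lys": 5.33, "Met": 2.49, "Phe": 4.05, "Pro": 4.57,
    "Ser": 6.55, "Thr": 5.55, "Trp": 1.28, "Tyr": 3.08, "Val": 6.79,
}

# sorted() is stable, also with reverse=True, so tied residues
# (Asp and Lys) keep their input order.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```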



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 636914
Total number of entries encoded on a Plasmid: 347429
Total number of entries encoded on a Plastid: 26708
Total number of entries encoded on a Plastid; Apicoplast: 750
Total number of entries encoded on a Plastid; Chloroplast: 235505
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 1031