UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_01 STATISTICS


1.  INTRODUCTION

Release 2012_01 of 25-Jan-2012 of UniProtKB/TrEMBL contains 19434245 sequence entries,
comprising 6336667304 amino acids.

940824 sequences have been added since release 2011_12, the sequence data of
865 existing entries has been updated and the annotations of
3325172 entries have been revised. The added sequences represent an increase of
about 5% in the number of entries.
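
The growth figure can be reproduced from the counts in this introduction. The Python sketch below is only an illustration: it assumes the increase is measured against the size of the previous release, and that this size equals the current total minus the added sequences (entries deleted between releases are ignored). The variable names are hypothetical.

```python
# Counts taken from the introduction of this release note.
total_entries = 19434245   # entries in release 2012_01
added_entries = 940824     # sequences added since release 2011_12

# Assumption: previous release size = current size minus additions
# (entries deleted between releases are not accounted for).
previous_entries = total_entries - added_entries

# Relative growth in the number of entries, as a percentage.
increase_pct = 100 * added_entries / previous_entries
print(f"Increase: {increase_pct:.1f}%")
```

Dividing by the current total instead of the previous one gives a slightly smaller value that still rounds to 5%.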

Number of fragments: 3034906

Protein existence (PE):              entries      %
1: Evidence at protein level           13062     0.07%
2: Evidence at transcript level       554302     2.85%
3: Inferred from homology            3981888    20.49%
4: Predicted                        14884993    76.59%
5: Uncertain                               0     0.00%
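
The percentage column of the protein existence table can be recomputed from the entry counts. The following Python sketch assumes each percentage is the level's count divided by the total number of entries stated in the introduction, rounded to two decimals; the dictionary and function names are illustrative.

```python
total_entries = 19434245  # total stated in the introduction

# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 13062,
    "2: Evidence at transcript level": 554302,
    "3: Inferred from homology": 3981888,
    "4: Predicted": 14884993,
    "5: Uncertain": 0,
}

def percentage(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)

pe_percentages = {
    level: percentage(count, total_entries)
    for level, count in pe_counts.items()
}

# Print the table in a layout similar to the one above.
for level, pct in pe_percentages.items():
    print(f"{level:<33}{pe_counts[level]:>10}{pct:>9.2f}%")
```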

The growth of the database is summarized below.

   [Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 414594

   The first twenty species represent 1446592 sequences, or 7.4% of the
   total number of entries.


   2.1  Table of the frequency of occurrence of species

        Species represented 1x:201041
                            2x:71769
                            3x:35551
                            4x:21142
                            5x:13163
                            6x: 9276
                            7x: 6978
                            8x: 5226
                            9x: 4194
                           10x: 8454
                       11- 20x:21172
                       21- 50x: 7465
                       51-100x: 2779
                         >100x: 6384


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     435592  Human immunodeficiency virus 1
       2      97753  Homo sapiens (Human)
       3      95235  Oryza sativa subsp. japonica (Rice)
       4      66370  Hepatitis C virus
       5      62676  uncultured bacterium
       6      60712  Mus musculus (Mouse)
       7      54043  Vitis vinifera (Grape)
       8      52763  Danio rerio (Zebrafish) (Brachydanio rerio)
       9      51312  Macaca mulatta (Rhesus macaque)
      10      50479  Trichomonas vaginalis
      11      50117  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      47308  Hepatitis B virus (HBV)
      13      44405  Arabidopsis thaliana (Mouse-ear cress)
      14      44066  Populus trichocarpa (Western balsam poplar) 
      15      42080  Zea mays (Maize)
      16      42045  Callithrix jacchus (White-tufted-ear marmoset)
      17      39850  Paramecium tetraurelia
      18      39389  Oryza sativa subsp. indica (Rice)
      19      35593  Ailuropoda melanoleuca (Giant panda)
      20      34804  Physcomitrella patens subsp. patens (Moss)
      21      33943  Rattus norvegicus (Rat)
      22      33660  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      23      33290  Drosophila melanogaster (Fruit fly)
      24      33269  Selaginella moellendorffii (Spikemoss)
      25      32604  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      26      31827  Caenorhabditis remanei (Caenorhabditis vulgaris)
      27      31571  Monodelphis domestica (Gray short-tailed opossum)
      28      31382  Ricinus communis (Castor bean)
      29      30550  Daphnia pulex (Water flea)
      30      30300  Caenorhabditis brenneri (Nematode worm)
      31      29162  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      32      29026  Oikopleura dioica (Tunicate)
      33      28930  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      34      28092  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      35      28013  Gasterosteus aculeatus (Three-spined stickleback)
      36      27779  Bos taurus (Bovine)
      37      27264  Canis familiaris (Dog) (Canis lupus familiaris)
      38      27088  Gorilla gorilla gorilla (Lowland gorilla)
      39      26870  Ornithorhynchus anatinus (Duckbill platypus)
      40      25969  Simian immunodeficiency virus (isolate CPZ GAB1) (SIV-cpz) 
      41      25753  Loxodonta africana (African elephant)
      42      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      43      25014  Oryctolagus cuniculus (Rabbit)
      44      24842  Sus scrofa (Pig)
      45      24829  Gallus gallus (Chicken)
      46      24817  Nematostella vectensis (Starlet sea anemone)
      47      24190  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      48      23749  Equus caballus (Horse)
      49      23644  Escherichia coli
      50      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      51      23099  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      52      23064  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      53      22507  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      54      21554  Hordeum vulgare var. distichum (Two-rowed barley)
      55      21541  Heterocephalus glaber (Naked mole rat)
      56      21231  Caenorhabditis briggsae
      57      21087  Ixodes scapularis (Black-legged tick) (Deer tick)
      58      20982  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      59      20961  Caenorhabditis elegans
      60      20845  Myotis lucifugus (Little brown bat)
      61      20427  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
      62      19526  Ralstonia solanacearum (Pseudomonas solanacearum)
      63      19201  Trypanosoma cruzi (strain CL Brener)
      64      19198  Toxoplasma gondii
      65      18906  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      66      18771  mine drainage metagenome
      67      18602  Drosophila simulans (Fruit fly)
      68      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      69      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      70      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      71      17032  Drosophila yakuba (Fruit fly)
      72      16992  Tribolium castaneum (Red flour beetle)
      73      16755  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      74      16713  Drosophila persimilis (Fruit fly)
      75      16425  Ectocarpus siliculosus (Brown alga)
      76      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
      77      16306  Loa loa (Eye worm) (Filaria loa)
      78      16294  Danaus plexippus (Monarch butterfly)
      79      16256  Trichinella spiralis (Trichina worm)
      80      16239  Botryotinia fuckeliana (strain B05.10) (Noble rot fungus) (Botrytis cinerea)
      81      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
      82      16190  Drosophila sechellia (Fruit fly)
      83      15984  Drosophila pseudoobscura pseudoobscura (Fruit fly)
      84      15976  Meleagris gallopavo (Common turkey)
      85      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
      86      15714  Naegleria gruberi (Amoeba)
      87      15625  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
      88      15622  Anopheles gambiae (African malaria mosquito)
      89      15419  Drosophila willistoni (Fruit fly)
      90      15232  Tetrahymena thermophila (strain SB210)
      91      15143  Drosophila ananassae (Fruit fly)
      92      15029  Harpegnathos saltator (Jerdon's jumping ant)
      93      14961  Hepatitis C virus subtype 1a
      94      14923  Drosophila erecta (Fruit fly)
      95      14850  Chlamydomonas reinhardtii (Chlamydomonas smithii)
      96      14792  Camponotus floridanus (Florida carpenter ant)
      97      14782  Drosophila mojavensis (Fruit fly)
      98      14697  Drosophila virilis (Fruit fly)
      99      14669  Plasmodium chabaudi
     100      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     101      14417  Volvox carteri (Green alga)
     102      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     103      14324  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     104      14238  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     105      14186  Hepatitis C virus subtype 1b
     106      13964  Acromyrmex echinatior (Panamanian leafcutter ant) 
     107      13773  Plasmodium falciparum
     108      13767  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     109      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     110      13328  Aspergillus flavus 
     111      13271  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     112      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     113      13042  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     114      12983  Albugo laibachii Nc14
     115      12950  Stigmatella aurantiaca (strain DW4/3-1)
     116      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     117      12737  Glycine max (Soybean) (Glycine hispida)
     118      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     119      12696  Trypanosoma congolense (strain IL3000)
     120      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     121      12596  Schistosoma mansoni (Blood fluke)
     122      12576  Xenopus laevis (African clawed frog)
     123      12459  Trypanosoma cruzi
     124      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     125      12441  Polysphondylium pallidum (Cellular slime mold)
     126      12352  Dictyostelium purpureum (Slime mold)
     127      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     128      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     129      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     130      11936  Emericella nidulans  
     131      11780  Piriformospora indica (strain DSM 11827)
     132      11716  Thalassiosira pseudonana (Marine diatom) (Cyclotella nana)
     133      11703  Salpingoeca sp. (strain ATCC 50818) (Proterospongia sp. 
     134      11685  Pyrenophora teres f. teres (strain 0-1) (Barley net blotch fungus) 
     135      11648  Anopheles darlingi (Mosquito)
     136      11644  Plasmodium berghei (strain Anka)
     137      11586  Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
     138      11562  Trichoplax adhaerens (Trichoplax reptans)
     139      11557  Trypanosoma vivax Y486
     140      11513  Aureococcus anophagefferens (Harmful bloom alga)
     141      11497  Brugia malayi (Filarial nematode worm)
     142      11491  Aspergillus kawachii (strain NBRC 4308) (White koji mold) 
     143      11477  Arthrobotrys oligospora (strain ATCC 24927 / CBS 115.81 / DSM 1491)  
     144      11323  Helicobacter pylori (Campylobacter pylori)
     145      11288  Picea sitchensis (Sitka spruce) (Pinus sitchensis)
     146      11240  Clonorchis sinensis (Chinese liver fluke)
     147      11211  Ktedonobacter racemifer DSM 44963
     148      11177  Neurospora tetrasperma (strain FGSC 2509 / P0656)
     149      10971  Mycosphaerella graminicola (strain CBS 115943 / IPO323)  
     150      10966  Streptomyces clavuligerus ATCC 27064
     151      10949  Aspergillus niger 
     152      10934  Schistosoma japonicum (Blood fluke)
     153      10841  Pediculus humanus subsp. corporis (Body louse)
     154      10820  Chaetomium globosum  
     155      10782  Porcine reproductive and respiratory syndrome virus (PRRSV)
     156      10570  Metarhizium robertsii (strain ARSEF 23 / ATCC MYA-3075) (Metarhizium anisopliae)
     157      10547  Podospora anserina (strain S / ATCC MYA-4624 / DSM 980 / FGSC 10383) 
     158      10545  Rabies virus
     159      10542  Verticillium dahliae (strain VdLs.17 / ATCC MYA-4575 / FGSC 10137)
     160      10387  Pseudomonas syringae pv. glycinea str. race 4
     161      10378  Neurospora tetrasperma (strain FGSC 2508 / ATCC MYA-4615 / P0657)
     162      10377  Penicillium marneffei (strain ATCC 18224 / CBS 334.59 / QM 7333)
     163      10355  Phaeodactylum tricornutum (strain CCAP 1055/1)
     164      10276  Micromonas pusilla (Picoplanktonic green alga)
     165      10204  Verticillium albo-atrum (strain VaMs.102 / ATCC MYA-4576 / FGSC 10136) 
     166      10194  Coccidioides posadasii (strain RMSCC 757 / Silveira) (Valley fever fungus)
     167      10152  Ascaris suum (Pig roundworm) (Ascaris lumbricoides)
     168      10110  Micromonas sp. (strain RCC299 / NOUM17) (Picoplanktonic green alga)
     169      10089  Ajellomyces dermatitidis (strain ATCC 18188 / CBS 674.68) 
     170      10088  Aspergillus terreus (strain NIH 2624 / FGSC A1156)
     171      10052  Neosartorya fischeri (strain ATCC 1020 / DSM 3700 / FGSC A1164 / NRRL 181) 
     172      10013  Streptomyces bingchenggensis (strain BCW-1)
     173       9846  Sordaria macrospora (strain ATCC MYA-333 / DSM 997 / K(L3346) / K-hell)
     174       9836  Chlorella variabilis (Green alga)
     175       9822  Metarhizium acridum (strain CQMa 102)
     176       9760  Thielavia terrestris (strain ATCC 38088 / NRRL 8126) (Acremonium alabamense)
     177       9705  Neosartorya fumigata (strain CEA10 / CBS 144.89 / FGSC A1163) 
     178       9662  Trypanosoma brucei gambiense (strain MHOM/CI/86/DAL972)
     179       9651  Cordyceps militaris (strain CM01) (Caterpillar fungus)
     180       9634  uncultured archaeon
     181       9551  Amycolatopsis mediterranei S699
     182       9533  Ajellomyces dermatitidis (strain SLH14081) (Blastomyces dermatitidis)
     183       9510  Ajellomyces capsulata (strain H143) (Darling's disease fungus) 
     184       9485  Ajellomyces dermatitidis (strain ER-3 / ATCC MYA-2586) 
     185       9443  Ajellomyces capsulata (strain H88) (Darling's disease fungus) 
     186       9439  Salmo salar (Atlantic salmon)
     187       9327  Anolis carolinensis (Green anole) (American chameleon)
     188       9237  Monosiga brevicollis (Choanoflagellate)
     189       9201  Amycolatopsis mediterranei (strain U-32)
     190       9197  Streptomyces himastatinicus ATCC 53653
     191       9157  Ajellomyces capsulata (strain G186AR / H82 / ATCC MYA-2454 / RMSCC 2432)  
     192       9146  Ajellomyces capsulata (strain NAm1 / WU24) (Darling's disease fungus) 
     193       9139  Pseudomonas syringae pv. pisi str. 1704B
     194       9113  Sorangium cellulosum (strain So ce56) (Polyangium cellulosum (strain So ce56))
     195       9112  Hypocrea jecorina (strain QM6a) (Trichoderma reesei)
     196       9081  Thielavia heterothallica (strain ATCC 42464 / BCRC 31852 / DSM 1799) 
     197       9064  Paracoccidioides brasiliensis (strain ATCC MYA-826 / Pb01)
     198       9013  Neurospora crassa 
     199       9012  Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100) 
     200       8991  Dictyostelium discoideum (Slime mold)
     201       8971  Postia placenta (strain ATCC 44394 / Madison 698-R) (Brown rot fungus) 
     202       8944  Streptosporangium roseum (strain ATCC 12428 / DSM 43021 / JCM 3005 / NI 9100)
     203       8941  Streptomyces violaceusniger Tu 4113
     204       8940  Burkholderia sp. TJI49
     205       8916  Klebsiella pneumoniae
     206       8900  Catenulispora acidiphila 
     207       8860  Arthroderma gypseum (strain ATCC MYA-4604 / CBS 118893) (Microsporum gypseum)
     208       8796  Aspergillus clavatus 
     209       8787  Bradyrhizobium japonicum USDA 6
     210       8783  Pseudomonas syringae pv. japonica str. M301072PT
     211       8755  Rhodococcus sp. (strain RHA1)
     212       8741  Trypanosoma brucei brucei (strain 927/4 GUTat10.1)
     213       8705  Trichophyton rubrum (strain ATCC MYA-4607 / CBS 118892) (Athlete's foot fungus)
     214       8698  Paracoccidioides brasiliensis (strain Pb18)
     215       8691  Streptomyces scabies (strain 87.22) (Streptomyces scabiei)
     216       8676  Trichophyton equinum (strain ATCC MYA-4606 / CBS 127.97) (Horse ringworm fungus)
     217       8661  Arthroderma otae (strain ATCC MYA-4605 / CBS 113480) (Microsporum canis)
     218       8607  Batrachochytrium dendrobatidis (strain JAM81 / FGSC 10211) (Frog chytrid fungus)
     219       8599  Entamoeba dispar (strain ATCC PRA-260 / SAW760)
     220       8520  Trichophyton tonsurans (strain CBS 112818) (Scalp ringworm fungus)
     221       8437  Plesiocystis pacifica SIR-1
     222       8433  Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
     223       8394  Streptomyces sp. AA4
     224       8374  Capsaspora owczarzaki (strain ATCC 30864)
     225       8338  Bradyrhizobium japonicum
     226       8320  Frankia sp. CN3
     227       8308  Entamoeba histolytica
     228       8308  Grosmannia clavigera (strain kw1407 / UAMH 11150) (Blue stain fungus) 
     229       8270  Leishmania major
     230       8248  Microscilla marina ATCC 23134
     231       8202  Streptomyces sviceus ATCC 29083
     232       8201  Microcoleus chthonoplastes PCC 7420
     233       8200  Leishmania infantum
     234       8186  Leishmania braziliensis
     235       8163  Frankia sp. EUN1f
     236       8154  Burkholderia xenovorans (strain LB400)
     237       8049  Ichthyophthirius multifiliis (strain G5) (White spot disease agent) (Ich)
     238       8044  Leishmania mexicana (strain MHOM/GT/2001/U1103)
     239       8037  uncultured crenarchaeote
     240       7961  Leishmania donovani (strain BPK282A1)
     241       7957  Trichophyton verrucosum (strain HKI 0517)
     242       7954  Ostreococcus tauri
     243       7943  Rhodococcus opacus (strain B4)
     244       7917  Methylobacterium nodulans (strain ORS2060 / LMG 21967)
     245       7906  Arthroderma benhamiae (strain ATCC MYA-4681 / CBS 112371) 
     246       7865  Streptomyces ghanaensis ATCC 14672
     247       7854  Acaryochloris marina (strain MBIC 11017)
     248       7824  Paracoccidioides brasiliensis (strain Pb03)
     249       7823  Burkholderia sp. Ch1-1
     250       7807  Plasmodium yoelii yoelii
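
The rows of the table above follow a fixed layout (rank, frequency, species name), so they can be read with a short script. The Python sketch below is a hypothetical helper, not part of any UniProt tooling: it parses the first twenty rows, copied from the table, and recomputes the top-twenty total and share quoted at the start of section 2. The regular expression is an assumption about the layout of this text file.

```python
import re

# First twenty rows of the table in section 2.2, copied from above.
TOP_SPECIES = """\
       1     435592  Human immunodeficiency virus 1
       2      97753  Homo sapiens (Human)
       3      95235  Oryza sativa subsp. japonica (Rice)
       4      66370  Hepatitis C virus
       5      62676  uncultured bacterium
       6      60712  Mus musculus (Mouse)
       7      54043  Vitis vinifera (Grape)
       8      52763  Danio rerio (Zebrafish) (Brachydanio rerio)
       9      51312  Macaca mulatta (Rhesus macaque)
      10      50479  Trichomonas vaginalis
      11      50117  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      12      47308  Hepatitis B virus (HBV)
      13      44405  Arabidopsis thaliana (Mouse-ear cress)
      14      44066  Populus trichocarpa (Western balsam poplar)
      15      42080  Zea mays (Maize)
      16      42045  Callithrix jacchus (White-tufted-ear marmoset)
      17      39850  Paramecium tetraurelia
      18      39389  Oryza sativa subsp. indica (Rice)
      19      35593  Ailuropoda melanoleuca (Giant panda)
      20      34804  Physcomitrella patens subsp. patens (Moss)
"""

# Assumed row layout: rank, frequency, then the species name to end of line.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+(.+?)\s*$")

def parse_rows(text):
    """Return a list of (rank, frequency, species) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and separator lines do not match and are skipped
            rows.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return rows

rows = parse_rows(TOP_SPECIES)
top20_total = sum(freq for _, freq, _ in rows)
top20_share = 100 * top20_total / 19434245  # total entries in this release
print(top20_total, f"{top20_share:.1f}%")
```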


   
   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          340145 (  2%)
    Bacteria       12406877 ( 64%)
    Eukaryota       5343009 ( 27%)
    Viruses         1304110 (  7%)
    Other             40103 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                  97789 (  2%)           (  1%)
     Other Mammalia        696374 ( 13%)           (  4%)
     Other Vertebrata      491043 (  9%)           (  3%)
     Viridiplantae         981327 ( 18%)           (  5%)
     Fungi                1160267 ( 22%)           (  6%)
     Insecta               735325 ( 14%)           (  4%)
     Nematoda              166753 (  3%)           (  1%)
     Other                1014131 ( 19%)           (  5%)



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  437315             1001-1100   117016
                 51- 100 1575700             1101-1200    82582
                101- 150 1791500             1201-1300    57889
                151- 200 1731809             1301-1400    37889
                201- 250 1742697             1401-1500    30343
                251- 300 1686776             1501-1600    21590
                301- 350 1538660             1601-1700    16289
                351- 400 1177057             1701-1800    12769
                401- 450 1008683             1801-1900    10453
                451- 500  838337             1901-2000     8999
                501- 550  565264             2001-2100     7197
                551- 600  439648             2101-2200     7147
                601- 650  320348             2201-2300     5618
                651- 700  250346             2301-2400     4498
                701- 750  215162             2401-2500     3822
                751- 800  192484             >2500        31999
                801- 850  144395
                851- 900  130541
                901- 950   89507
                951-1000   67010

   


   The average sequence length in UniProtKB/TrEMBL is 326 amino acids.

   The shortest sequence is G0XMK1_9MYRT: 1 amino acid.
   The longest sequence is Q3ASY8_CHLCH: 36805 amino acids.
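
The figures in this section can be cross-checked against the introduction. The Python sketch below assumes the average length is the total number of amino acids divided by the total number of entries (fragments included), and checks that the size table, which excludes fragments, accounts for every non-fragment entry. The variable names are illustrative.

```python
total_entries = 19434245        # from the introduction
total_amino_acids = 6336667304  # from the introduction
fragments = 3034906             # "Number of fragments" in the introduction

# Counts per length bin from the table above, in order:
# 1-50, 51-100, ..., 951-1000, then 1001-1100, ..., 2401-2500, >2500.
bin_counts = [
    437315, 1575700, 1791500, 1731809, 1742697,
    1686776, 1538660, 1177057, 1008683, 838337,
    565264, 439648, 320348, 250346, 215162,
    192484, 144395, 130541, 89507, 67010,
    117016, 82582, 57889, 37889, 30343,
    21590, 16289, 12769, 10453, 8999,
    7197, 7147, 5618, 4498, 3822,
    31999,
]

# Assumption: the average is taken over all entries, fragments included.
average_length = total_amino_acids / total_entries
print(round(average_length))

# The size table should cover exactly the entries that are not fragments.
non_fragment_entries = sum(bin_counts)
print(non_fragment_entries == total_entries - fragments)
```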



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    23411456                1.20                                                    
   Submitted to EMBL/GenBank/DDBJ  12980676  11660440      0.67                                                    
   Journal                          9777016   9145857      0.50                                                    
   Submitted to other databases      638521    631438      0.03                                                    
   Thesis                              8842      8784     <0.01                                                    
   Book citation                       6372      6323     <0.01                                                    
   Unpublished observations              28        28     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 424297
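
The "Average per entry" column in the tables of this section is consistent with the total number of lines divided by the total number of entries in the release. The Python sketch below reproduces the column under that assumption; the function name and the rule that values rounding to 0.00 are shown as "<0.01" are inferred from the tables, not taken from UniProt documentation.

```python
TOTAL_ENTRIES = 19434245  # entries in release 2012_01

def average_per_entry(total_lines, entries=TOTAL_ENTRIES):
    """Format lines-per-entry like the 'Average per entry' column.

    Assumption: values that would round to 0.00 are shown as '<0.01'.
    """
    text = f"{total_lines / entries:.2f}"
    return "<0.01" if text == "0.00" else text

# Rows from the references table above.
print(average_per_entry(23411456))  # References (RL)
print(average_per_entry(12980676))  # Submitted to EMBL/GenBank/DDBJ
print(average_per_entry(8842))      # Thesis
```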


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      19008613                0.98                                                    
   CATALYTIC ACTIVITY               1780463   1635811      0.09     4                                              
   CAUTION                          6096548   6096538      0.31     1                                              
   COFACTOR                          625487    588812      0.03     8                                              
   DOMAIN                             54901     52165     <0.01     9                                              
   FUNCTION                         1984189   1829023      0.10     3                                              
   INTERACTION                          610       610     <0.01    11                                              
   MISCELLANEOUS                      36384     36318     <0.01    10                                              
   PATHWAY                           868693    794487      0.04     7                                              
   SIMILARITY                       5135663   4457841      0.26     2                                              
   SUBCELLULAR LOCATION             1526875   1462867      0.08     5                                              
   SUBUNIT                           898800    886808      0.05     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       5938622                0.31                                                    
   CHAIN                             576743    457258      0.03     2                                              
   NON_TER                          4958919   3035013      0.26     1                                              
   SIGNAL                            402361    401254      0.02     3                                              
   TRANSIT                              599       599     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             223546855               11.50                                                    
   AGD                                 2526      2526     <0.01    80   Organism-specific databases                
   ANU-2DPAGE                            56        56     <0.01    97   2D gel databases                           
   Allergome                           2431      1834     <0.01    83   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    96   Organism-specific databases                
   ArrayExpress                       89218     89206     <0.01    51   Gene expression databases                  
   BRENDA                              2751      2720     <0.01    78   Enzyme and pathway databases               
   Bgee                              109239    109036      0.01    49   Gene expression databases                  
   BioCyc                            670292    655930      0.03    31   Enzyme and pathway databases               
   CAZy                               74332     69834     <0.01    55   Protein family/group databases             
   CGD                                 7094      7094     <0.01    74   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   101   2D gel databases                           
   CTD                               265783    264722      0.01    40   Organism-specific databases                
   CYGD                                   2         2     <0.01   103   Organism-specific databases                
   ConoServer                           152       152     <0.01    91   Organism-specific databases                
   DIP                                 2614      2609     <0.01    79   Protein-protein interaction databases      
   EMBL                            21879665  19146425      1.13     3   Sequence databases                         
   Ensembl                           648873    632899      0.03    32   Genome annotation databases                
   EnsemblBacteria                   835276    801110      0.04    30   Genome annotation databases                
   EnsemblFungi                      167133    166889      0.01    48   Genome annotation databases                
   EnsemblMetazoa                    294293    284457      0.02    38   Genome annotation databases                
   EnsemblPlants                     271340    245434      0.01    39   Genome annotation databases                
   EnsemblProtists                    77646     76577     <0.01    52   Genome annotation databases                
   EuPathDB                          178993    178992      0.01    46   Organism-specific databases                
   FlyBase                           195586    194038      0.01    43   Organism-specific databases                
   GO                              36225282  11849328      1.86     2   Ontologies                                 
   Gene3D                           8352741   6703466      0.43     7   Family and domain databases                
   GeneDB_Spombe                          1         1     <0.01   105   Organism-specific databases                
   GeneID                           6583290   6461690      0.34    10   Genome annotation databases                
   GeneTree                         1149010   1148665      0.06    25   Phylogenomic databases                     
   Genevestigator                     95739     95733     <0.01    50   Gene expression databases                  
   GenoList                           14741     14468     <0.01    71   Organism-specific databases                
   GenomeReviews                    4251713   4153325      0.22    14   Genome annotation databases                
   Gramene                            68591     68591     <0.01    56   Organism-specific databases                
   H-InvDB                              584       479     <0.01    88   Organism-specific databases                
   HAMAP                            1569494   1553226      0.08    23   Family and domain databases                
   HGNC                               37151     37082     <0.01    62   Organism-specific databases                
   HOGENOM                          2189777   2189735      0.11    21   Phylogenomic databases                     
   HOVERGEN                          314302    314292      0.02    37   Phylogenomic databases                     
   HSSP                              251482    251255      0.01    41   3D structure databases                     
   IPI                               326565    326438      0.02    36   Sequence databases                         
   InParanoid                        191602    191535      0.01    45   Phylogenomic databases                     
   IntAct                             18063     18063     <0.01    67   Protein-protein interaction databases      
   InterPro                        41075418  14632171      2.11     1   Family and domain databases                
   KEGG                             5294439   5195781      0.27    12                                              
   KO                               1962403   1952732      0.10    22   Family and domain databases                
   LegioList                           5140      5112     <0.01    75   Organism-specific databases                
   Leproma                              936       935     <0.01    86   Organism-specific databases                
   MEROPS                             55592     55592     <0.01    57   Protein family/group databases             
   MGI                                37006     36743     <0.01    63   Organism-specific databases                
   MINT                                8702      8702     <0.01    72   Protein-protein interaction databases      
   NMPDR                             909936    909933      0.05    29   Genome annotation databases                
   NextBio                            44179     44177     <0.01    59   Other                                      
   OMA                              3305280   3305270      0.17    16   Phylogenomic databases                     
   OrthoDB                           570733    570569      0.03    33   Phylogenomic databases                     
   PANTHER                          2929639   2778535      0.15    18   Family and domain databases                
   PATRIC                           8385058   8385026      0.43     6   Genome annotation databases                
   PDB                                15979      9221     <0.01    69   3D structure databases                     
   PDBsum                             15751      9075     <0.01    70   3D structure databases                     
   PHCI-2DPAGE                          102       102     <0.01    94   2D gel databases                           
   PIR                               174097    141225      0.01    47   Sequence databases                         
   PIRSF                            1312941   1312614      0.07    24   Family and domain databases                
   PMAP-CutDB                           234       234     <0.01    90   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   102   2D gel databases                           
   PRIDE                             228928    228882      0.01    42   Proteomic databases                        
   PRINTS                           3139809   2794504      0.16    17   Family and domain databases                
   PROSITE                          9651663   6404202      0.50     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   100   Enzyme and pathway databases               
   PeptideAtlas                         146       146     <0.01    92   Proteomic databases                        
   PeroxiBase                          2510      2501     <0.01    81   Protein family/group databases             
   Pfam                            18603453  13776021      0.96     4   Family and domain databases                
   PharmGKB                            2885      2885     <0.01    77   Organism-specific databases                
   PhosphoSite                         1592      1592     <0.01    85   PTM databases                              
   PhylomeDB                         919065    919042      0.05    28   Phylogenomic databases                     
   ProDom                            359657    340029      0.02    35   Family and domain databases                
   ProMEX                               310       310     <0.01    89   Proteomic databases                        
   ProtClustDB                      2723223   2723212      0.14    19   Phylogenomic databases                     
   ProteinModelPortal               5871028   5867436      0.30    11   3D structure databases                     
   PseudoCAP                           4564      4558     <0.01    76   Organism-specific databases                
   REBASE                             24476     23791     <0.01    66   Protein family/group databases             
   REPRODUCTION-2DPAGE                   89        88     <0.01    95   2D gel databases                           
   RGD                                24937     24658     <0.01    65   Organism-specific databases                
   Reactome                             140       118     <0.01    93   Enzyme and pathway databases               
   RefSeq                           6605917   6462601      0.34     9   Sequence databases                         
   SGD                                   11        11     <0.01    99   Organism-specific databases                
   SMART                            4268878   3237514      0.22    13   Family and domain databases                
   SMR                               938482    938472      0.05    27   3D structure databases                     
   STRING                           2602356   2602172      0.13    20   Protein-protein interaction databases      
   SUPFAM                           7985170   6593481      0.41     8   Family and domain databases                
   SWISS-2DPAGE                          29        29     <0.01    98   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   104   2D gel databases                           
   TAIR                               16584     16504     <0.01    68   Organism-specific databases                
   TCDB                                2496      2484     <0.01    82   Protein family/group databases             
   TIGR                              194627    187572      0.01    44   Genome annotation databases                
   TIGRFAMs                         3905058   3559800      0.20    15   Family and domain databases                
   TubercuList                         2082      2077     <0.01    84   Organism-specific databases                
   UCSC                               54877     54876     <0.01    58   Genome annotation databases                
   UniGene                           482689    450844      0.02    34   Sequence databases                         
   VectorBase                         75570     75062     <0.01    53   Genome annotation databases                
   World-2DPAGE                         931       926     <0.01    87   2D gel databases                           
   WormBase                           39843     39738     <0.01    61   Organism-specific databases                
   Xenbase                            24959     24918     <0.01    64   Organism-specific databases                
   ZFIN                               42665     41877     <0.01    60   Organism-specific databases                
   dictyBase                           8000      7778     <0.01    73   Organism-specific databases                
   eggNOG                           1142816   1142816      0.06    26   Phylogenomic databases                     
   euHCVdb                            75266     75263     <0.01    54   Organism-specific databases                

Number of explicitly cross-referenced databases: 131
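
The rows of the table above can be read mechanically. The following is a
minimal Python sketch, not part of the release tooling. It assumes that the
third numeric column is the number of cross-references divided by the total
number of entries in the release (19434245, as stated in the introduction),
and that values which round to 0.00 are printed as "<0.01". Both assumptions
agree with every row shown here, but they are inferred from the figures
rather than documented. The names ROW, parse_row and ratio_label are
illustrative only.

```python
import re

# Total number of entries in this release, taken from the introduction.
TOTAL_ENTRIES = 19434245

# One table row: name, cross-references, entries, ratio, rank, category.
# The category may be empty, and the ratio may be printed as "<0.01".
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<xrefs>\d+)\s+(?P<entries>\d+)\s+"
    r"(?P<ratio><?\d+\.\d+)\s+(?P<rank>\d+)\s*(?P<category>.*?)\s*$"
)


def parse_row(line):
    """Split one fixed-width table row into typed fields."""
    m = ROW.match(line)
    if m is None:
        raise ValueError("not a cross-reference table row: %r" % line)
    return {
        "name": m.group("name"),
        "xrefs": int(m.group("xrefs")),
        "entries": int(m.group("entries")),
        "ratio": m.group("ratio"),  # kept as text because of "<0.01"
        "rank": int(m.group("rank")),
        "category": m.group("category"),
    }


def ratio_label(xrefs, total=TOTAL_ENTRIES):
    """Recompute the ratio column under the assumption described above."""
    label = "%.2f" % (xrefs / total)
    return "<0.01" if label == "0.00" else label


row = parse_row(
    "   Pfam                            18603453  13776021      0.96     4"
    "   Family and domain databases"
)
print(row["name"], row["ratio"], ratio_label(row["xrefs"]))
```

Keeping the printed ratio as text allows it to be compared directly with the
recomputed label, including for the "<0.01" rows.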


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.59   Gln (Q) 3.91   Leu (L) 9.87   Ser (S) 6.73
   Arg (R) 5.46   Glu (E) 6.17   Lys (K) 5.25   Thr (T) 5.60
   Asn (N) 4.10   Gly (G) 7.10   Met (M) 2.47   Trp (W) 1.31
   Asp (D) 5.30   His (H) 2.21   Phe (F) 4.01   Tyr (Y) 3.03
   Cys (C) 1.28   Ile (I) 5.96   Pro (P) 4.76   Val (V) 6.74

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.04

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
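
   The ordering in 5.2 follows directly from the percentages in 5.1. As a
   small illustrative sketch in Python (the variable names are arbitrary),
   sorting the twenty standard residues by decreasing percentage reproduces
   the list above:

```python
# Percentages copied from table 5.1 (standard residues only).
composition = {
    "Ala": 8.59, "Gln": 3.91, "Leu": 9.87, "Ser": 6.73,
    "Arg": 5.46, "Glu": 6.17, "Lys": 5.25, "Thr": 5.60,
    "Asn": 4.10, "Gly": 7.10, "Met": 2.47, "Trp": 1.31,
    "Asp": 5.30, "His": 2.21, "Phe": 4.01, "Tyr": 3.03,
    "Cys": 1.28, "Ile": 5.96, "Pro": 4.76, "Val": 6.74,
}

# Most frequent residue first.
ranked = sorted(composition, key=lambda aa: composition[aa], reverse=True)
print(", ".join(ranked))
```

   No two residues share the same percentage in this release, so the order
   is unambiguous.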



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 604499
Total number of entries encoded on a Plasmid: 259357
Total number of entries encoded on a Plastid: 15095
Total number of entries encoded on a Plastid; Apicoplast: 388
Total number of entries encoded on a Plastid; Chloroplast: 164321
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 471