UniProtKB/TrEMBL PROTEIN DATABASE RELEASE 2012_10 STATISTICS


1.  INTRODUCTION

Release 2012_10 of 31-Oct-2012 of UniProtKB/TrEMBL contains 27122814 sequence entries,
comprising 8765290755 amino acids.

1090897 sequences have been added since release 2012_09, the sequence data of
12851 existing entries has been updated, and the annotations of
5311792 entries have been revised. The added sequences represent an increase
of about 4% in the number of entries.
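
The 4% figure can be reproduced from the counts above. The sketch below
assumes the percentage is measured against the size of release 2012_09 and
that this size equals the current size minus the additions (entries removed
between releases are ignored); the variable names are illustrative.

```python
# Counts taken from the introduction above.
current_entries = 27122814   # entries in release 2012_10
added_entries = 1090897      # sequences added since release 2012_09

# Assumption: previous release size = current size - additions
# (entries deleted between releases are not accounted for).
previous_entries = current_entries - added_entries

growth_percent = 100 * added_entries / previous_entries
print(f"{growth_percent:.1f}%")
```

Rounded to a whole number, this gives the 4% quoted above.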

Number of fragments: 3619825

Protein existence (PE):              entries      %
1: Evidence at protein level           13916     0.05%
2: Evidence at transcript level       626691     2.31%
3: Inferred from homology            6248716    23.04%
4: Predicted                        20233491    74.60%
5: Uncertain                               0     0.00%
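
The percentage column of the protein existence table can be recomputed from
the entry counts. This is a minimal sketch; the dictionary and variable names
are illustrative, and the counts are copied from the table above.

```python
# Entry counts per protein existence (PE) level, copied from the table above.
pe_counts = {
    "1: Evidence at protein level": 13916,
    "2: Evidence at transcript level": 626691,
    "3: Inferred from homology": 6248716,
    "4: Predicted": 20233491,
    "5: Uncertain": 0,
}

# The five levels partition the database, so their sum is the entry total.
total_entries = sum(pe_counts.values())

pe_percent = {level: 100 * n / total_entries for level, n in pe_counts.items()}
for level, pct in pe_percent.items():
    print(f"{level:<34}{pe_counts[level]:>10}{pct:>9.2f}%")
```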

The growth of the database is summarized below.

   [Figure: growth of the database]



2.  TAXONOMIC ORIGIN

   Total number of species represented in this release of UniProtKB/TrEMBL: 378039

   The first twenty species represent 1620903 sequences: 6% of the
   total number of entries.
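
   As a check, the share of the first twenty species can be recomputed from
   the frequencies listed in rows 1 to 20 of the table in section 2.2. The
   variable names in this sketch are illustrative.

```python
# Frequencies of the twenty most represented species (section 2.2, rows 1-20).
top20 = [
    488416, 111051, 96975, 78858, 78140, 68945, 61235, 58246, 56115, 54321,
    54078, 53988, 50594, 49227, 48875, 44070, 43142, 42656, 42121, 39850,
]
total_entries = 27122814  # entries in release 2012_10

top20_total = sum(top20)
top20_share = 100 * top20_total / total_entries
print(top20_total, f"{top20_share:.0f}%")  # prints: 1620903 6%
```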


   2.1 Table of the frequency of occurrence of species

        Species represented 1x:15892
                            2x:63253
                            3x:34371
                            4x:22398
                            5x:14231
                            6x:10422
                            7x: 7869
                            8x: 6096
                            9x: 4900
                           10x: 9653
                       11- 20x:25181
                       21- 50x: 8820
                       51-100x: 3314
                         >100x: 8610


   2.2  Table of the most represented species

  ------  ---------  --------------------------------------------
  Number  Frequency  Species
  ------  ---------  --------------------------------------------
       1     488416  Human immunodeficiency virus 1
       2     111051  Homo sapiens (Human)
       3      96975  Oryza sativa subsp. japonica (Rice)
       4      78858  uncultured bacterium
       5      78140  Hepatitis C virus
       6      68945  Macaca mulatta (Rhesus macaque)
       7      61235  Glycine max (Soybean) (Glycine hispida)
       8      58246  Mus musculus (Mouse)
       9      56115  Medicago truncatula (Barrel medic) (Medicago tribuloides)
      10      54321  Danio rerio (Zebrafish) (Brachydanio rerio)
      11      54078  Vitis vinifera (Grape)
      12      53988  Hepatitis B virus (HBV)
      13      50594  Trichomonas vaginalis
      14      49227  Tetraodon nigroviridis (Spotted green pufferfish) (Chelonodon nigroviridis)
      15      48875  Takifugu rubripes (Japanese pufferfish) (Fugu rubripes)
      16      44070  Populus trichocarpa (Western balsam poplar) 
      17      43142  Callithrix jacchus (White-tufted-ear marmoset)
      18      42656  Arabidopsis thaliana (Mouse-ear cress)
      19      42121  Zea mays (Maize)
      20      39850  Paramecium tetraurelia
      21      39776  Oryza sativa subsp. indica (Rice)
      22      35601  Ailuropoda melanoleuca (Giant panda)
      23      35193  Acyrthosiphon pisum (Pea aphid)
      24      34802  Physcomitrella patens subsp. patens (Moss)
      25      34187  Drosophila melanogaster (Fruit fly)
      26      33925  Rattus norvegicus (Rat)
      27      33769  Sorghum bicolor (Sorghum) (Sorghum vulgare)
      28      33262  Selaginella moellendorffii (Spikemoss)
      29      32926  Monodelphis domestica (Gray short-tailed opossum)
      30      32680  Arabidopsis lyrata subsp. lyrata (Lyre-leaved rock-cress)
      31      32339  Oryza brachyantha
      32      32122  Caenorhabditis remanei (Caenorhabditis vulgaris)
      33      32093  Oryza glaberrima (African rice)
      34      31396  Ricinus communis (Castor bean)
      35      30855  Daphnia pulex (Water flea)
      36      30300  Caenorhabditis brenneri (Nematode worm)
      37      30141  Brachypodium distachyon (Purple false brome) (Trachynia distachya)
      38      29816  Amphimedon queenslandica (Sponge)
      39      29448  Strongylocentrotus purpuratus (Purple sea urchin)
      40      29315  Pristionchus pacificus
      41      29173  Branchiostoma floridae (Florida lancelet) (Amphioxus)
      42      29142  Sus scrofa (Pig)
      43      29053  Oikopleura dioica (Tunicate)
      44      28844  Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
      45      28436  Canis familiaris (Dog) (Canis lupus familiaris)
      46      28235  Escherichia coli
      47      28137  Simian immunodeficiency virus (SIV)
      48      28055  Gasterosteus aculeatus (Three-spined stickleback)
      49      27652  Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
      50      27487  Oreochromis niloticus (Nile tilapia) (Tilapia nilotica)
      51      27086  Gorilla gorilla gorilla (Lowland gorilla)
      52      26931  Ornithorhynchus anatinus (Duckbill platypus)
      53      26766  Gallus gallus (Chicken)
      54      25895  Oryzias latipes (Medaka fish) (Japanese ricefish)
      55      25758  Loxodonta africana (African elephant)
      56      25721  Phytophthora sojae (strain P6497) (Soybean stem and root rot agent) 
      57      25438  Caenorhabditis japonica
      58      25391  Bos taurus (Bovine)
      59      25069  Oryctolagus cuniculus (Rabbit)
      60      24873  Nematostella vectensis (Starlet sea anemone)
      61      24643  Tetrahymena thermophila (strain SB210)
      62      24199  Cricetulus griseus (Chinese hamster) (Cricetulus barabensis griseus)
      63      24165  Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii)
      64      24054  Equus caballus (Horse)
      65      23565  Oxytricha trifallax
      66      23224  Nomascus leucogenys (Northern white-cheeked gibbon) (Hylobates leucogenys)
      67      23115  Perkinsus marinus (strain ATCC 50983 / TXsc)
      68      22989  Pan troglodytes (Chimpanzee)
      69      22535  Sarcophilus harrisii (Tasmanian devil) (Sarcophilus laniarius)
      70      22460  Caenorhabditis elegans
      71      22163  gut metagenome
      72      21821  Latimeria chalumnae (West Indian ocean coelacanth)
      73      21699  Hordeum vulgare var. distichum (Two-rowed barley)
      74      21546  Heterocephalus glaber (Naked mole rat)
      75      21339  Caenorhabditis briggsae
      76      21086  Ixodes scapularis (Black-legged tick) (Deer tick)
      77      20853  Myotis lucifugus (Little brown bat)
      78      20130  Otolemur garnettii (Small-eared galago) (Garnett's greater bushbaby)
      79      20114  Ciona savignyi (Pacific transparent sea squirt)
      80      20067  Cavia porcellus (Guinea pig)
      81      19972  Spermophilus tridecemlineatus (Thirteen-lined ground squirrel) 
      82      19655  Taeniopygia guttata (Zebra finch) (Poephila guttata)
      83      19438  Wuchereria bancrofti
      84      19319  Toxoplasma gondii
      85      19200  Trypanosoma cruzi (strain CL Brener)
      86      19152  Anolis carolinensis (Green anole) (American chameleon)
      87      19035  Ciona intestinalis (Transparent sea squirt) (Ascidia intestinalis)
      88      18916  Culex quinquefasciatus (Southern house mosquito) (Culex pungens)
      89      18771  mine drainage metagenome
      90      18710  Drosophila simulans (Fruit fly)
      91      18121  Atta cephalotes (Leafcutter ant)
      92      17839  Laccaria bicolor (strain S238N-H82 / ATCC MYA-4686) (Bicoloured deceiver) 
      93      17784  Fusarium oxysporum (strain Fo5176) (Panama disease fungus)
      94      17599  Phytophthora infestans (strain T30-4) (Potato late blight fungus)
      95      17374  Aedes aegypti (Yellowfever mosquito) (Culex aegypti)
      96      17373  Bombyx mori (Silk moth)
      97      17031  Drosophila yakuba (Fruit fly)
      98      17009  Tribolium castaneum (Red flour beetle)
      99      16946  Rhizopus delemar (strain RA 99-880 / ATCC MYA-4621 / FGSC 9543 / NRRL 43880)  
     100      16871  Meleagris gallopavo (Common turkey)
     101      16714  Drosophila persimilis (Fruit fly)
     102      16643  Fusarium oxysporum f. sp. lycopersici  
     103      16475  Drosophila pseudoobscura pseudoobscura (Fruit fly)
     104      16426  Ectocarpus siliculosus (Brown alga)
     105      16345  Botryotinia fuckeliana (strain T4) (Noble rot fungus) (Botrytis cinerea)
     106      16304  Danaus plexippus (Monarch butterfly)
     107      16263  Trichinella spiralis (Trichina worm)
     108      16239  Colletotrichum higginsianum
     109      16237  Melampsora larici-populina (strain 98AG31 / pathotype 3-4-7) 
     110      16190  Drosophila sechellia (Fruit fly)
     111      16140  Schistosoma japonicum (Blood fluke)
     112      15793  Puccinia graminis f. sp. tritici (strain CRL 75-36-700-3 / race SCCL) 
     113      15769  Hepatitis C virus subtype 1b
     114      15762  Phaeosphaeria nodorum (strain SN15 / ATCC MYA-4574 / FGSC 10173)  
     115      15726  Plasmodium falciparum
     116      15715  Naegleria gruberi (Amoeba)
     117      15653  Nectria haematococca (strain 77-13-4 / ATCC MYA-4622 / FGSC 9596 / MPVI) 
     118      15630  Anopheles gambiae (African malaria mosquito)
     119      15557  Phytophthora ramorum (Sudden oak death agent)
     120      15419  Drosophila willistoni (Fruit fly)
     121      15354  Loa loa (Eye worm) (Filaria loa)
     122      15142  Drosophila ananassae (Fruit fly)
     123      15045  Hepatitis C virus subtype 1a
     124      15036  Harpegnathos saltator (Jerdon's jumping ant)
     125      14922  Drosophila erecta (Fruit fly)
     126      14850  Chlamydomonas reinhardtii (Chlamydomonas smithii)
     127      14797  Camponotus floridanus (Florida carpenter ant)
     128      14788  Drosophila mojavensis (Fruit fly)
     129      14700  Drosophila virilis (Fruit fly)
     130      14697  Plasmodium chabaudi
     131      14650  Drosophila grimshawi (Fruit fly) (Idiomyia grimshawi)
     132      14610  Gaeumannomyces graminis var. tritici (strain R3-111a-1) 
     133      14417  Volvox carteri (Green alga)
     134      14339  Serpula lacrymans var. lacrymans (strain S7.3) (Dry rot fungus)
     135      14336  Solenopsis invicta (Red imported fire ant) (Solenopsis wagneri)
     136      14236  Sclerotinia sclerotiorum (strain ATCC 18683 / 1980 / Ss-1) (White mold) 
     137      13966  Acromyrmex echinatior (Panamanian leafcutter ant) 
     138      13863  Clonorchis sinensis (Chinese liver fluke)
     139      13766  Aspergillus niger (strain CBS 513.88 / FGSC A1513)
     140      13648  Moniliophthora perniciosa (strain FA553 / isolate CP02)  
     141      13329  Aspergillus flavus 
     142      13266  Coprinopsis cinerea (strain Okayama-7 / 130 / ATCC MYA-4618 / FGSC 9003)  
     143      13180  Mustela putorius furo (European domestic ferret) (Mustela furo)
     144      13121  Schizophyllum commune (strain H4-8 / FGSC 9210) (Split gill fungus)
     145      13060  Trypanosoma cruzi
     146      13043  Magnaporthe oryzae (strain 70-15 / ATCC MYA-4617 / FGSC 8958)  
     147      12983  Albugo laibachii Nc14
     148      12950  Stigmatella aurantiaca (strain DW4/3-1)
     149      12936  Talaromyces stipitatus (strain ATCC 10500 / CBS 375.48 / QM 6759 / NRRL 1006) 
     150      12906  Gibberella zeae (strain PH-1 / ATCC MYA-4620 / FGSC 9075 / NRRL 31084)  
     151      12722  Serpula lacrymans var. lacrymans (strain S7.9) (Dry rot fungus)
     152      12696  Trypanosoma congolense (strain IL3000)
     153      12682  Penicillium chrysogenum (strain ATCC 28089 / DSM 1075 / Wisconsin 54-1255) 
     154      12650  Schistosoma mansoni (Blood fluke)
     155      12606  Xenopus laevis (African clawed frog)
     156      12549  Ralstonia solanacearum (Pseudomonas solanacearum)
     157      12446  Leptosphaeria maculans (strain JN3 / isolate v23.1.3 / race Av1-4-5-6-7-8)  
     158      12440  Polysphondylium pallidum (Cellular slime mold)
     159      12389  Hypocrea virens (strain Gv29-8 / FGSC 10586) (Gliocladium virens) 
     160      12352  Dictyostelium purpureum (Slime mold)
     161      12322  Rabies virus
     162      12152  Dictyostelium fasciculatum (strain SH3) (Slime mold)
     163      11994  Pyrenophora tritici-repentis (strain Pt-1C-BFP) (Wheat tan spot fungus) 
     164      11993  Colletotrichum graminicola (strain M1.001 / M2 / FGSC 10212)  
     165      11945  Emericella nidulans  
     166      11912  Apis mellifera (Honeybee)
     167      11815  Hypocrea atroviridis (strain ATCC 20476 / IMI 206040) (Trichoderma atroviride)
     168      11780  Piriformospora indica (strain DSM 11827)
     169      11752  Chondrocladia sp. SMF


   2.3  Taxonomic distribution of the sequences

   

   Kingdom        sequences (% of the database)
    Archaea          391574 (  1%)
    Bacteria       18674570 ( 69%)
    Eukaryota       6488502 ( 24%)
    Viruses         1503983 (  6%)
    Other             64184 ( <1%)



   Within Eukaryota:

   

    Category            sequences (% of Eukaryota) (% of the complete database)
     Human                 111087 (  2%)           ( <1%)
     Other Mammalia        847395 ( 13%)           (  3%)
     Other Vertebrata      717252 ( 11%)           (  3%)
     Viridiplantae        1219371 ( 19%)           (  4%)
     Fungi                1424221 ( 22%)           (  5%)
     Insecta               757822 ( 12%)           (  3%)
     Nematoda              242800 (  4%)           (  1%)
     Other                1168554 ( 18%)           (  4%)
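
   The two percentage columns above use different denominators: the number
   of Eukaryota sequences and the size of the complete database. The sketch
   below recomputes both from the counts in the table; the names are
   illustrative, and the Eukaryota total is taken as the sum of the rows.

```python
total_entries = 27122814  # entries in release 2012_10

# Sequence counts per category, copied from the table above.
eukaryota = {
    "Human": 111087,
    "Other Mammalia": 847395,
    "Other Vertebrata": 717252,
    "Viridiplantae": 1219371,
    "Fungi": 1424221,
    "Insecta": 757822,
    "Nematoda": 242800,
    "Other": 1168554,
}
eukaryota_total = sum(eukaryota.values())

# For each category: (% of Eukaryota, % of the complete database).
shares = {
    name: (100 * n / eukaryota_total, 100 * n / total_entries)
    for name, n in eukaryota.items()
}
for name, (of_euk, of_db) in shares.items():
    print(f"{name:<18}{eukaryota[name]:>9} ({of_euk:3.0f}%) ({of_db:3.0f}%)")
```

   Human accounts for about 0.4% of the complete database, which rounds to
   zero at whole-percent precision.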



3.  SEQUENCE SIZE

   Distribution of the sequences by size (excluding fragments)

               From   To  Number             From   To   Number
                  1-  50  684559             1001-1100   156472
                 51- 100 2306699             1101-1200   110218
                101- 150 2579663             1201-1300    77299
                151- 200 2502070             1301-1400    49469
                201- 250 2520366             1401-1500    40066
                251- 300 2443655             1501-1600    27794
                301- 350 2218299             1601-1700    21044
                351- 400 1681586             1701-1800    16064
                401- 450 1449063             1801-1900    13368
                451- 500 1189630             1901-2000    11428
                501- 550  788148             2001-2100     9002
                551- 600  608122             2101-2200     9154
                601- 650  443516             2201-2300     7211
                651- 700  347775             2301-2400     5717
                701- 750  293924             2401-2500     4894
                751- 800  260227             >2500        39946
                801- 850  197705
                851- 900  176605
                901- 950  122153
                951-1000   90078

   


   The average sequence length in UniProtKB/TrEMBL is 323 amino acids.

   The shortest sequence is G0XMK1_9MYRT:     1 amino acid.
   The longest sequence is  Q3ASY8_CHLCH: 36805 amino acids.
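
   The average length is consistent with the totals given in the
   introduction. The sketch below assumes the average is taken over all
   entries, fragments included; the variable names are illustrative.

```python
total_amino_acids = 8765290755  # amino acids in release 2012_10
total_entries = 27122814        # sequence entries in release 2012_10

# Assumption: the average covers every entry, including fragments.
average_length = total_amino_acids / total_entries
print(round(average_length))  # prints: 323
```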



4.  STATISTICS FOR SOME LINE TYPES

The following table summarizes the total number of some UniProtKB/TrEMBL lines,
as well as the number of entries with at least one such line, and the
average number of such lines per entry.

                                   Total    Number of  Average
Line type / subtype                number   entries    per entry
---------------------------------  -------- ---------  ---------

References (RL)                    33172901                1.22                                                    
   Submitted to EMBL/GenBank/DDBJ  18685090  17070687      0.69                                                    
   Journal                         13125207  12349216      0.48                                                    
   Submitted to other databases     1346264   1344575      0.05                                                    
   Thesis                              9837      9779     <0.01                                                    
   Book citation                       6483      6434     <0.01                                                    
   Unpublished observations              19        19     <0.01                                                    
   Patent                                 1         1     <0.01                                                    

Total number of distinct authors cited in UniProtKB/TrEMBL: 449953
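
The "Average per entry" column in these tables appears to be the total number
of lines divided by the total number of entries in the database, not by the
number of entries that carry such a line. The sketch below works under that
assumption; the function name format_average and the rule for printing
"<0.01" (used when the value rounds to 0.00) are inferred from the tables,
not taken from UniProt documentation.

```python
TOTAL_ENTRIES = 27122814  # entries in release 2012_10


def format_average(total_lines: int, total_entries: int = TOTAL_ENTRIES) -> str:
    """Return the average number of lines per entry, formatted as in the tables.

    Assumption: values that round to 0.00 are shown as "<0.01".
    """
    text = f"{total_lines / total_entries:.2f}"
    return "<0.01" if text == "0.00" else text


print(format_average(33172901))  # References (RL)
print(format_average(18685090))  # Submitted to EMBL/GenBank/DDBJ
print(format_average(9837))      # Thesis
```

The three calls print 1.22, 0.69 and <0.01, matching the table above.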


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Comments (CC)                      32399711                1.19                                                    
   CATALYTIC ACTIVITY               2831479   2586198      0.10     4                                              
   CAUTION                         11808561  11808508      0.44     1                                              
   COFACTOR                         1062294    985458      0.04     8                                              
   DOMAIN                            100372     96351     <0.01     9                                              
   FUNCTION                         3098888   2894930      0.11     3                                              
   INTERACTION                          686       686     <0.01    11                                              
   MISCELLANEOUS                      67560     67167     <0.01    10                                              
   PATHWAY                          1408182   1280946      0.05     7                                              
   SIMILARITY                       8034554   6974978      0.30     2                                              
   SUBCELLULAR LOCATION             2488572   2374638      0.09     5                                              
   SUBUNIT                          1498563   1485193      0.06     6                                              

Total number of comment topics: 11


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank
---------------------------------  -------- ---------  ---------  ----

Features (FT)                       7108060                0.26                                                    
   CHAIN                             765954    635145      0.03     2                                              
   NON_TER                          5739360   3620443      0.21     1                                              
   SIGNAL                            601917    599297      0.02     3                                              
   TRANSIT                              829       828     <0.01     4                                              

Total number of feature keys: 4


                                   Total    Number of  Average
Line type / subtype                number   entries    per entry  Rank  Category
---------------------------------  -------- ---------  ---------  ----  -------------------------------------------
Cross-references (DR)             300168669               11.07                                                    
   AGD                                 2525      2525     <0.01    84   Organism-specific databases                
   ANU-2DPAGE                            52        52     <0.01    99   2D gel databases                           
   Allergome                           2912      2299     <0.01    80   Protein family/group databases             
   ArachnoServer                         66        66     <0.01    98   Organism-specific databases                
   ArrayExpress                       87204     87136     <0.01    52   Gene expression databases                  
   BRENDA                              2689      2659     <0.01    82   Enzyme and pathway databases               
   Bgee                              127015    127000     <0.01    47   Gene expression databases                  
   BioCyc                            670934    656497      0.02    31   Enzyme and pathway databases               
   CAZy                               74148     69669     <0.01    56   Protein family/group databases             
   CGD                                 7078      7078     <0.01    75   Organism-specific databases                
   COMPLUYEAST-2DPAGE                     5         5     <0.01   105   2D gel databases                           
   CTD                               309842    308524      0.01    40   Organism-specific databases                
   ConoServer                           160       160     <0.01    94   Organism-specific databases                
   DIP                                 2747      2742     <0.01    81   Protein-protein interaction databases      
   DNASU                              43826     43501     <0.01    60   Protocols and materials databases          
   EMBL                            29672874  26328146      1.09     3   Sequence databases                         
   Ensembl                           949831    934607      0.04    28   Genome annotation databases                
   EnsemblBacteria                   834903    800816      0.03    29   Genome annotation databases                
   EnsemblFungi                      262827    261342      0.01    41   Genome annotation databases                
   EnsemblMetazoa                    539589    527361      0.02    34   Genome annotation databases                
   EnsemblPlants                     333072    319758      0.01    37   Genome annotation databases                
   EnsemblProtists                   111375    110044     <0.01    49   Genome annotation databases                
   EuPathDB                          178957    178954      0.01    45   Organism-specific databases                
   EvolutionaryTrace                   8186      8186     <0.01    73   Other                                      
   FlyBase                           195205    193661      0.01    43   Organism-specific databases                
   GO                              49355140  16013400      1.82     2   Ontologies                                 
   Gene3D                          11593706   9225342      0.43     6   Family and domain databases                
   GeneID                           8410396   8214479      0.31     9   Genome annotation databases                
   GeneTree                          814568    814508      0.03    30   Phylogenomic databases                     
   Genevestigator                     93776     93769     <0.01    51   Gene expression databases                  
   GenoList                           14735     14462     <0.01    71   Organism-specific databases                
   GenomeRNAi                         21817     21817     <0.01    66   Other                                      
   GenomeReviews                    4252471   4153705      0.16    15   Genome annotation databases                
   Gramene                            67631     67631     <0.01    57   Organism-specific databases                
   H-InvDB                              627       479     <0.01    90   Organism-specific databases                
   HAMAP                            2644452   2611496      0.10    23   Family and domain databases                
   HGNC                               46528     46450     <0.01    59   Organism-specific databases                
   HOGENOM                          3659451   3659424      0.13    19   Phylogenomic databases                     
   HOVERGEN                          311707    311697      0.01    38   Phylogenomic databases                     
   HSSP                              250825    250599      0.01    42   3D structure databases                     
   IPI                               310593    310367      0.01    39   Sequence databases                         
   InParanoid                        190116    189979      0.01    44   Phylogenomic databases                     
   IntAct                             16846     16845     <0.01    69   Protein-protein interaction databases      
   InterPro                        58189174  20947720      2.15     1   Family and domain databases                
   KEGG                             7569617   7403034      0.28    11   Genome annotation databases                
   KO                               2991798   2978485      0.11    20   Phylogenomic databases                     
   LegioList                           5138      5110     <0.01    76   Organism-specific databases                
   Leproma                             1272      1270     <0.01    87   Organism-specific databases                
   MEROPS                             81316     81316     <0.01    53   Protein family/group databases             
   MGI                                34702     34382     <0.01    62   Organism-specific databases                
   MINT                                8603      8603     <0.01    72   Protein-protein interaction databases      
   NextBio                           104532    104520     <0.01    50   Other                                      
   OMA                              3893345   3893316      0.14    17   Phylogenomic databases                     
   OrthoDB                           557152    557151      0.02    32   Phylogenomic databases                     
   PANTHER                          3745350   3552954      0.14    18   Family and domain databases                
   PATRIC                           8331559   8331471      0.31    10   Genome annotation databases                
   PDB                                17868     10076     <0.01    67   3D structure databases                     
   PDBsum                             17741      9970     <0.01    68   3D structure databases                     
   PHCI-2DPAGE                           99        99     <0.01    96   2D gel databases                           
   PIR                               173875    141034      0.01    46   Sequence databases                         
   PIRSF                            2280112   2279471      0.08    25   Family and domain databases                
   PMAP-CutDB                           214       214     <0.01    93   Other                                      
   PMMA-2DPAGE                            2         2     <0.01   106   2D gel databases                           
   PRIDE                             383938    383931      0.01    36   Proteomic databases                        
   PRINTS                           4166759   3695740      0.15    16   Family and domain databases                
   PROSITE                         13561539   8964480      0.50     5   Family and domain databases                
   Pathway_Interaction_DB                11         9     <0.01   104   Enzyme and pathway databases               
   PeptideAtlas                         144       144     <0.01    95   Proteomic databases                        
   PeroxiBase                          2555      2547     <0.01    83   Protein family/group databases             
   Pfam                            26273783  19376848      0.97     4   Family and domain databases                
   PharmGKB                            4393      4393     <0.01    78   Organism-specific databases                
   PhosphoSite                         1191      1191     <0.01    88   PTM databases                              
   PhylomeDB                         116147    116147     <0.01    48   Phylogenomic databases                     
   PomBase                               40        27     <0.01   100   Organism-specific databases                
   PptaseDB                              36        34     <0.01   101   Protein family/group databases             
   ProDom                            529080    505019      0.02    35   Family and domain databases                
   ProMEX                               277       277     <0.01    91   Proteomic databases                        
   ProtClustDB                      2723249   2723249      0.10    22   Phylogenomic databases                     
   ProteinModelPortal               7047494   7046475      0.26    12   3D structure databases                     
   PseudoCAP                           4539      4533     <0.01    77   Organism-specific databases                
   REBASE                             31128     31127     <0.01    63   Protein family/group databases             
   REPRODUCTION-2DPAGE                   84        83     <0.01    97   2D gel databases                           
   RGD                                24796     24482     <0.01    65   Organism-specific databases                
   Reactome                             216       179     <0.01    92   Enzyme and pathway databases               
   RefSeq                           8436936   8215586      0.31     8   Sequence databases                         
   SGD                                   11        11     <0.01   103   Organism-specific databases                
   SMART                            6041542   4570486      0.22    14   Family and domain databases                
   SMR                              1538794   1538794      0.06    26   3D structure databases                     
   STRING                           2591201   2591047      0.10    24   Protein-protein interaction databases      
   SUPFAM                          11136105   9150850      0.41     7   Family and domain databases                
   SWISS-2DPAGE                          28        28     <0.01   102   2D gel databases                           
   Siena-2DPAGE                           2         2     <0.01   107   2D gel databases                           
   TAIR                               15933     15854     <0.01    70   Organism-specific databases                
   TCDB                                2399      2387     <0.01    85   Protein family/group databases             
   TIGRFAMs                         6057132   5524974      0.22    13   Family and domain databases                
   TubercuList                         2027      2022     <0.01    86   Organism-specific databases                
   UCSC                               64378     64362     <0.01    58   Genome annotation databases                
   UniGene                           555426    521702      0.02    33   Sequence databases                         
   UniPathway                       1373821   1279802      0.05    27   Enzyme and pathway databases               
   VectorBase                         78249     77732     <0.01    54   Genome annotation databases                
   World-2DPAGE                         676       671     <0.01    89   2D gel databases                           
   WormBase                           42214     42095     <0.01    61   Organism-specific databases                
   Xenbase                            25673     25572     <0.01    64   Organism-specific databases                
   ZFIN                                3436      3436     <0.01    79   Organism-specific databases                
   dictyBase                           7996      7774     <0.01    74   Organism-specific databases                
   eggNOG                           2771148   2771147      0.10    21   Phylogenomic databases                     
   euHCVdb                            75267     75264     <0.01    55   Organism-specific databases                

Number of explicitly cross-referenced databases: 135


5.  AMINO ACID COMPOSITION

   5.1  Composition in percent for the complete database

   Ala (A) 8.62   Gln (Q) 3.95   Leu (L) 9.91   Ser (S) 6.65
   Arg (R) 5.43   Glu (E) 6.19   Lys (K) 5.30   Thr (T) 5.57
   Asn (N) 4.11   Gly (G) 7.08   Met (M) 2.46   Trp (W) 1.29
   Asp (D) 5.31   His (H) 2.21   Phe (F) 4.02   Tyr (Y) 3.05
   Cys (C) 1.25   Ile (I) 6.00   Pro (P) 4.68   Val (V) 6.77

   Asx (B) 0.00   Glx (Z) 0.00   Xaa (X) 0.03

   

   Legend: gray = aliphatic, red = acidic, green = small hydroxy,
           blue = basic, black = aromatic, white = amide, yellow = sulfur


   5.2  Classification of the amino acids by their frequency

   Leu, Ala, Gly, Val, Ser, Glu, Ile, Thr, Arg, Asp, Lys, Pro, Asn, Phe,
   Gln, Tyr, Met, His, Trp, Cys
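
   This ordering follows directly from the percentages in section 5.1. A
   minimal sketch that sorts the twenty standard amino acids by decreasing
   frequency (the names composition and ranking are illustrative):

```python
# Composition in percent for the complete database (section 5.1).
composition = {
    "Ala": 8.62, "Gln": 3.95, "Leu": 9.91, "Ser": 6.65,
    "Arg": 5.43, "Glu": 6.19, "Lys": 5.30, "Thr": 5.57,
    "Asn": 4.11, "Gly": 7.08, "Met": 2.46, "Trp": 1.29,
    "Asp": 5.31, "His": 2.21, "Phe": 4.02, "Tyr": 3.05,
    "Cys": 1.25, "Ile": 6.00, "Pro": 4.68, "Val": 6.77,
}

# Sort the three-letter codes from most to least frequent.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```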



6.  MISCELLANEOUS STATISTICS

Total number of entries encoded on a Mitochondrion: 562331
Total number of entries encoded on a Plasmid: 302793
Total number of entries encoded on a Plastid: 22600
Total number of entries encoded on a Plastid; Apicoplast: 701
Total number of entries encoded on a Plastid; Chloroplast: 205162
Total number of entries encoded on a Plastid; Cyanelle: 8
Total number of entries encoded on a Plastid; Non-photosynthetic plastid: 893