UniProtKB/Swiss-Prot protein knowledgebase release 2012_03 statistics
1. INTRODUCTION
Release 2012_03 of 21-Mar-12 of UniProtKB/Swiss-Prot contains 535248 sequence entries,
comprising 189901164 amino acids abstracted from 208076 references.
Since release 2012_02, 570 sequences have been added, the sequence data of
127 existing entries has been updated, and the annotations of
121706 entries have been revised.
Number of fragments: 8994
Number of additional sequences produced by alternative splicing, initiation or promoter usage, or ribosomal frameshifting: 31176
Protein existence (PE):              entries      %
1: Evidence at protein level           74284  13.9%
2: Evidence at transcript level        67762  12.7%
3: Inferred from homology             376894  70.4%
4: Predicted                           14424   2.7%
5: Uncertain                            1884   0.4%
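The percentages in the protein existence table can be recomputed from the entry counts. The following is a minimal sketch in Python, assuming each percentage is the category count divided by the sum of all five categories and rounded to one decimal; the names `pe_counts` and `pe_percentages` are illustrative and not part of any UniProt tool.

```python
# Entry counts copied from the protein existence (PE) table.
pe_counts = {
    "1: Evidence at protein level": 74284,
    "2: Evidence at transcript level": 67762,
    "3: Inferred from homology": 376894,
    "4: Predicted": 14424,
    "5: Uncertain": 1884,
}


def pe_percentages(counts):
    """Return each category's share of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# The five categories together should cover every entry in the release.
total_entries = sum(pe_counts.values())
shares = pe_percentages(pe_counts)

for label, share in shares.items():
    print(f"{label:<34}{pe_counts[label]:>8}{share:>6}%")
```

Under these assumptions the five counts add up to the 535248 entries stated in the introduction.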
The growth of the database is summarized below.
2. TAXONOMIC ORIGIN
Total number of species represented in this release of UniProtKB/Swiss-Prot: 12759
The twenty most represented species account for 111504 sequences, or 20.8% of
the total number of entries.
2.1 Table of the frequency of occurrence of species
Species represented      1x:  5376
                         2x:  1859
                         3x:   961
                         4x:   629
                         5x:   461
                         6x:   374
                         7x:   274
                         8x:   218
                         9x:   197
                        10x:   112
                    11- 20x:   657
                    21- 50x:   393
                    51-100x:   209
                      >100x:  1039
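A frequency-of-occurrence table of this shape can be built by counting entries per species and then counting species per bin. The sketch below is one way to do that in Python; the bin boundaries follow the table, while `bin_label`, `occurrence_table`, and the toy input are hypothetical and do not come from the release.

```python
from collections import Counter


def bin_label(n):
    """Map a per-species entry count to the bin labels used in the table."""
    if n <= 10:
        return f"{n}x"
    if n <= 20:
        return "11-20x"
    if n <= 50:
        return "21-50x"
    if n <= 100:
        return "51-100x"
    return ">100x"


def occurrence_table(entries_per_species):
    """Count how many species fall into each bin."""
    return Counter(bin_label(n) for n in entries_per_species.values())


# Hypothetical toy input (species name -> number of entries), for illustration only.
toy = {"species A": 1, "species B": 1, "species C": 7, "species D": 15, "species E": 20254}
table = occurrence_table(toy)

# Bin counts copied from the table; their sum should equal the species total.
published_bins = [5376, 1859, 961, 629, 461, 374, 274, 218, 197, 112, 657, 393, 209, 1039]
species_total = sum(published_bins)
```

The published bin counts add up to the 12759 species reported for this release.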
2.2 Table of the most represented species
------ --------- --------------------------------------------
Number Frequency Species
------ --------- --------------------------------------------
1 20254 Homo sapiens (Human)
2 16513 Mus musculus (Mouse)
3 11072 Arabidopsis thaliana (Mouse-ear cress)
4 7710 Rattus norvegicus (Rat)
5 6619 Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker's yeast)
6 5898 Bos taurus (Bovine)
7 4982 Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast)
8 4431 Escherichia coli (strain K12)
9 4244 Bacillus subtilis
10 4124 Dictyostelium discoideum (Slime mold)
11 3347 Caenorhabditis elegans
12 3337 Xenopus laevis (African clawed frog)
13 3146 Drosophila melanogaster (Fruit fly)
14 2839 Oryza sativa subsp. japonica (Rice)
15 2799 Danio rerio (Zebrafish) (Brachydanio rerio)
16 2235 Gallus gallus (Chicken)
17 2217 Pongo abelii (Sumatran orangutan)
18 2011 Escherichia coli O157:H7
19 1929 Mycobacterium tuberculosis
20 1797 Salmonella typhimurium
21 1787 Methanocaldococcus jannaschii
22 1707 Haemophilus influenzae (strain ATCC 51907 / DSM 11121 / KW20 / Rd)
23 1678 Shigella flexneri
24 1675 Escherichia coli O6
25 1634 Xenopus tropicalis (Western clawed frog) (Silurana tropicalis)
26 1407 Sus scrofa (Pig)
27 1346 Salmonella typhi
28 1244 Mycobacterium bovis
29 1222 Pseudomonas aeruginosa (strain ATCC 15692 / PAO1 / 1C / PRS 101 / LMG 12228)
30 1170 Macaca fascicularis (Crab-eating macaque) (Cynomolgus monkey)
31 1029 Synechocystis sp. (strain PCC 6803 / Kazusa)
32 1015 Yersinia pestis
33 1002 Archaeoglobus fulgidus
34 957 Vibrio cholerae
35 930 Salmonella paratyphi A
36 926 Ashbya gossypii (strain ATCC 10895 / CBS 109.51 / FGSC 9923 / NRRL Y-1056)
37 925 Staphylococcus aureus (strain N315)
38 923 Staphylococcus aureus (strain Mu50 / ATCC 700699)
39 909 Acanthamoeba polyphaga mimivirus (APMV)
40 901 Kluyveromyces lactis
41 899 Staphylococcus aureus (strain COL)
42 895 Staphylococcus aureus (strain MW2)
43 889 Staphylococcus aureus (strain MSSA476)
44 888 Escherichia coli O6:K15:H31 (strain 536 / UPEC)
45 888 Staphylococcus aureus (strain MRSA252)
46 886 Oryctolagus cuniculus (Rabbit)
47 882 Salmonella choleraesuis
48 878 Shigella sonnei (strain Ss046)
49 868 Rhizobium meliloti (strain 1021) (Ensifer meliloti) (Sinorhizobium meliloti)
50 864 Yersinia pseudotuberculosis
51 861 Candida glabrata
52 841 Escherichia coli O9:H4 (strain HS)
53 834 Escherichia coli O139:H28 (strain E24377A / ETEC)
54 832 Neurospora crassa
55 829 Shigella boydii serotype 4 (strain Sb227)
56 824 Escherichia coli (strain UTI89 / UPEC)
57 819 Shigella dysenteriae serotype 1 (strain Sd197)
58 819 Escherichia coli (strain ATCC 8739 / DSM 1576 / Crooks)
59 801 Canis familiaris (Dog) (Canis lupus familiaris)
60 795 Vibrio parahaemolyticus
61 791 Escherichia coli (strain SMS-3-5 / SECEC)
62 784 Erwinia carotovora subsp. atroseptica (Pectobacterium atrosepticum)
63 779 Aquifex aeolicus (strain VF5)
64 774 Pasteurella multocida (strain Pm70)
65 771 Escherichia coli (strain K12 / DH10B)
66 766 Emericella nidulans
67 765 Escherichia coli O127:H6 (strain E2348/69 / EPEC)
68 765 Escherichia coli (strain K12 / MC4100 / BW2952)
69 764 Escherichia coli O17:K52:H18 (strain UMN026 / ExPEC)
70 762 Escherichia coli (strain 55989 / EAEC)
71 761 Escherichia coli O8 (strain IAI1)
72 760 Shigella flexneri serotype 5b (strain 8401)
73 759 Staphylococcus epidermidis (strain ATCC 35984 / RP62A)
74 758 Streptomyces coelicolor
75 757 Staphylococcus epidermidis (strain ATCC 12228)
76 756 Escherichia coli (strain SE11)
77 756 Escherichia coli O45:K1 (strain S88 / ExPEC)
78 753 Escherichia coli O7:K1 (strain IAI39 / ExPEC)
79 748 Escherichia coli O157:H7 (strain EC4115 / EHEC)
80 744 Photorhabdus luminescens subsp. laumondii (strain TT01)
81 737 Staphylococcus aureus (strain NCTC 8325)
82 735 Yersinia enterocolitica serotype O:8 / biotype 1B (strain 8081)
83 734 Bacillus halodurans
84 733 Bacillus anthracis
85 733 Vibrio vulnificus
86 731 Escherichia coli O81 (strain ED1a)
87 721 Salmonella enteritidis PT4 (strain P125109)
88 717 Vibrio vulnificus (strain YJ016)
89 716 Salmonella paratyphi B (strain ATCC BAA-1250 / SPB7)
90 715 Yersinia pestis bv. Antiqua (strain Nepal516)
91 714 Salmonella paratyphi A (strain AKU_12601)
92 713 Enterobacter sp. (strain 638)
93 713 Salmonella agona (strain SL483)
94 713 Escherichia coli O1:K1 / APEC
95 713 Salmonella newport (strain SL254)
96 713 Yersinia pseudotuberculosis serotype O:1b (strain IP 31758)
97 712 Klebsiella pneumoniae subsp. pneumoniae (strain ATCC 700721 / MGH 78578)
98 712 Salmonella schwarzengrund (strain CVM19633)
99 711 Yersinia pestis bv. Antiqua (strain Antiqua)
100 710 Salmonella heidelberg (strain SL476)
101 702 Salmonella dublin (strain CT_02021853)
102 698 Shigella boydii serotype 18 (strain CDC 3083-94 / BS512)
103 696 Klebsiella pneumoniae (strain 342)
104 695 Escherichia fergusonii (strain ATCC 35469 / DSM 13698 / CDC 0568-73)
105 692 Candida albicans (strain SC5314 / ATCC MYA-2876) (Yeast)
106 689 Zea mays (Maize)
107 687 Mycoplasma pneumoniae (strain ATCC 29342 / M129)
108 687 Pan troglodytes (Chimpanzee)
109 687 Nostoc sp. (strain PCC 7120 / UTEX 2576)
110 683 Salmonella gallinarum (strain 287/91 / NCTC 13346)
111 678 Citrobacter koseri (strain ATCC BAA-895 / CDC 4225-83 / SGSC4696)
112 675 Pseudomonas putida (strain KT2440)
113 675 Pseudomonas syringae pv. tomato (strain DC3000)
114 669 Serratia proteamaculans (strain 568)
115 668 Mycobacterium leprae
116 667 Yersinia pestis (strain Pestoides F)
117 666 Staphylococcus aureus (strain USA300)
118 658 Rhizobium sp. (strain NGR234)
119 657 Bradyrhizobium japonicum
120 653 Debaryomyces hansenii
121 651 Bacillus cereus (strain ATCC 14579 / DSM 31)
122 643 Escherichia coli
123 643 Staphylococcus aureus (strain bovine RF122 / ET3-1)
124 642 Salmonella arizonae (strain ATCC BAA-731 / CDC346-86 / RSK2980)
125 642 Yarrowia lipolytica (strain CLIB 122 / E 150) (Yeast) (Candida lipolytica)
126 638 Yersinia pseudotuberculosis serotype O:3 (strain YPIII)
127 635 Neosartorya fumigata (strain ATCC MYA-4609 / Af293 / CBS 101355 / FGSC A1100)
128 634 Yersinia pseudotuberculosis serotype IB (strain PB1/+)
129 630 Agrobacterium tumefaciens (strain C58 / ATCC 33970)
130 629 Shewanella oneidensis
131 622 Cronobacter sakazakii (strain ATCC BAA-894) (Enterobacter sakazakii)
132 615 Treponema pallidum (strain Nichols)
133 612 Staphylococcus haemolyticus (strain JCSC1435)
134 610 Methanobacterium thermoautotrophicum (strain Delta H)
135 605 Rhizobium loti (strain MAFF303099) (Mesorhizobium loti)
136 605 Listeria monocytogenes
137 602 Xanthomonas campestris pv. campestris
138 602 Photobacterium profundum (Photobacterium sp. (strain SS9))
139 602 Staphylococcus saprophyticus subsp. saprophyticus
140 601 Salmonella paratyphi C (strain RKS4594)
141 601 Ralstonia solanacearum (strain GMI1000) (Pseudomonas solanacearum)
142 600 Yersinia pestis bv. Antiqua (strain Angola)
143 594 Oryza sativa subsp. indica (Rice)
144 590 Bacillus cereus (strain ATCC 10987)
145 590 Listeria innocua
146 589 Pectobacterium carotovorum subsp. carotovorum (strain PC1)
147 586 Rickettsia prowazekii (strain Madrid E)
148 576 Brucella suis biovar 1 (strain 1330)
149 574 Neisseria meningitidis serogroup B (strain MC58)
150 572 Brucella melitensis biotype 1 (strain 16M / ATCC 23456 / NCTC 10094)
151 572 Helicobacter pylori (strain ATCC 700392 / 26695) (Campylobacter pylori)
152 572 Buchnera aphidicola subsp. Acyrthosiphon pisum (strain APS)
153 567 Bacillus thuringiensis subsp. konkukian (strain 97-27)
154 565 Helicobacter pylori (strain J99) (Campylobacter pylori J99)
155 565 Pseudomonas syringae pv. syringae (strain B728a)
156 564 Caulobacter crescentus (Caulobacter vibrioides)
157 564 Caenorhabditis briggsae
158 562 Bacillus licheniformis (strain DSM 13 / ATCC 14580)
159 562 Vibrio fischeri (strain ATCC 700601 / ES114)
160 562 Buchnera aphidicola subsp. Schizaphis graminum (strain Sg)
161 560 Bacillus cereus (strain ZK / E33L)
162 559 Clostridium acetobutylicum
163 558 Pseudomonas aeruginosa (strain UCBPP-PA14)
164 556 Xanthomonas axonopodis pv. citri (Citrus canker)
165 552 Neisseria meningitidis serogroup A / serotype 4A (strain Z2491)
166 552 Pseudomonas fluorescens (strain Pf0-1)
167 551 Oceanobacillus iheyensis (strain DSM 14371 / JCM 11309 / KCTC 3954 / HTE831)
168 546 Pseudomonas fluorescens (strain Pf-5 / ATCC BAA-477)
169 544 Pseudomonas syringae pv. phaseolicola (strain 1448A / Race 6)
170 532 Lactococcus lactis subsp. lactis (strain IL1403) (Streptococcus lactis)
171 531 Erwinia tasmaniensis (strain DSM 17950 / Et1/99)
172 529 Sodalis glossinidius (strain morsitans)
173 529 Listeria monocytogenes serotype 4b (strain F2365)
174 527 Streptococcus pneumoniae
175 524 Thermotoga maritima
176 522 Xylella fastidiosa
177 521 Bordetella bronchiseptica (strain ATCC BAA-588 / NCTC 13252 / RB50)
178 515 Bordetella pertussis
179 514 Chromobacterium violaceum
180 512 Xylella fastidiosa (strain Temecula1 / ATCC 700964)
181 511 Pseudomonas aeruginosa (strain PA7)
182 511 Vibrio cholerae serotype O1 (strain ATCC 39541 / Ogawa 395 / O395)
183 510 Haemophilus ducreyi (strain 35000HP / ATCC 700724)
184 509 Bordetella parapertussis
185 507 Buchnera aphidicola subsp. Baizongia pistaciae (strain Bp)
186 507 Geobacillus kaustophilus (strain HTA426)
187 506 Staphylococcus aureus (strain Newman)
188 501 Deinococcus radiodurans
189 500 Pseudomonas entomophila (strain L48)
190 499 Corynebacterium glutamicum (Brevibacterium flavum)
191 499 Brucella abortus biovar 1 (strain 9-941)
192 497 Rickettsia conorii (strain ATCC VR-613 / Malish 7)
193 496 Bacillus clausii (strain KSM-K16)
194 495 Haemophilus influenzae (strain 86-028NP)
195 494 Burkholderia pseudomallei (Pseudomonas pseudomallei)
196 494 Streptomyces avermitilis
197 493 Proteus mirabilis (strain HI4320)
198 492 Bacillus amyloliquefaciens (strain FZB42)
199 491 Xanthomonas campestris pv. campestris (strain 8004)
200 490 Vibrio harveyi (strain ATCC BAA-1116 / BB120)
201 490 Clostridium perfringens
202 487 Methanosarcina acetivorans (strain ATCC 35395 / DSM 2834 / JCM 12185 / C2A)
203 487 Shewanella sp. (strain MR-7)
204 485 Mannheimia succiniciproducens (strain MBEL55E)
205 484 Pseudomonas aeruginosa (strain LESB58)
206 484 Staphylococcus aureus (strain Mu3 / ATCC 700698)
207 484 Shewanella sp. (strain MR-4)
208 483 Mycoplasma genitalium (strain ATCC 33530 / G-37 / NCTC 10195)
209 481 Thermosynechococcus elongatus (strain BP-1)
210 480 Acinetobacter sp. (strain ADP1)
211 479 Enterococcus faecalis (Streptococcus faecalis)
212 476 Synechococcus elongatus (strain PCC 7942) (Anacystis nidulans R2)
213 475 Pyrococcus horikoshii
214 474 Burkholderia sp. (strain 383) (Burkholderia cepacia
215 474 Pseudomonas putida (strain F1 / ATCC 700007)
216 473 Brucella abortus (strain 2308)
217 473 Aspergillus oryzae (strain ATCC 42149 / RIB 40) (Yellow koji mold)
218 466 Xanthomonas campestris pv. vesicatoria (strain 85-10)
219 466 Shewanella frigidimarina (strain NCIMB 400)
220 466 Pseudomonas putida (strain GB-1)
221 466 Halobacterium salinarium (strain ATCC 700922 / JCM 11081 / NRC-1)
222 465 Pyrococcus abyssi (strain GE5 / Orsay)
223 465 Methanosarcina mazei
224 464 Streptococcus pneumoniae (strain ATCC BAA-255 / R6)
225 464 Aeromonas hydrophila subsp. hydrophila (strain ATCC 7966 / NCIB 9240)
226 463 Shewanella sp. (strain ANA-3)
227 462 Anabaena variabilis (strain ATCC 29413 / PCC 7937)
228 462 Cupriavidus necator (strain ATCC 17699 / H16 / DSM 428 / Stanier 337)
229 462 Rhodopseudomonas palustris (strain ATCC BAA-98 / CGA009)
230 461 Burkholderia mallei (Pseudomonas mallei)
231 460 Lactobacillus plantarum (strain ATCC BAA-793 / NCIMB 8826 / WCFS1)
232 458 Cupriavidus pinatubonensis (strain JMP134 / LMG 1197) (Alcaligenes eutrophus)
233 455 Staphylococcus aureus (strain JH1)
234 455 Rhodobacter sphaeroides (strain ATCC 17023 / 2.4.1 / NCIB 8253 / DSM 158)
235 454 Xanthomonas oryzae pv. oryzae (strain MAFF 311018)
236 453 Ovis aries (Sheep)
237 453 Pseudomonas putida (strain W619)
238 453 Rickettsia felis (strain ATCC VR-1525 / URRWXCal2) (Rickettsia azadi)
239 452 Methylococcus capsulatus (strain ATCC 33009 / NCIMB 11132 / Bath)
240 452 Shewanella baltica (strain OS185)
241 451 Pyrococcus furiosus (strain ATCC 43587 / DSM 3638 / JCM 8422 / Vc1)
242 451 Streptococcus mutans
243 451 Aeromonas salmonicida (strain A449)
244 449 Mycobacterium paratuberculosis
245 449 Thermoanaerobacter tengcongensis
246 449 Staphylococcus aureus (strain JH9)
247 448 Hahella chejuensis (strain KCTC 2396)
248 447 Vibrio fischeri (strain MJ11)
249 445 Nicotiana tabacum (Common tobacco)
250 445 Pseudomonas mendocina (strain ymp)
2.3 Taxonomic distribution of the sequences
Kingdom           sequences  (% of the database)
Archaea               18834  (  4%)
Bacteria             327836  ( 61%)
Eukaryota            172538  ( 32%)
Viruses               16040  (  3%)
Within Eukaryota:
Category          sequences  (% of Eukaryota)  (% of the complete database)
Human                 20255  ( 12%)            (  4%)
Other Mammalia        45525  ( 26%)            (  9%)
Other Vertebrata      16953  ( 10%)            (  3%)
Viridiplantae         32221  ( 19%)            (  6%)
Fungi                 30594  ( 18%)            (  6%)
Insecta                8369  (  5%)            (  2%)
Nematoda               4217  (  2%)            (  1%)
Other                 14404  (  8%)            (  3%)
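The Eukaryota breakdown reports two percentages per category, one relative to Eukaryota and one relative to the complete database. A short Python sketch of that calculation follows, assuming both are rounded to the nearest whole percent; `percent` and `rows` are illustrative names.

```python
TOTAL_ENTRIES = 535248  # total entries in the release, from the introduction

# Counts copied from section 2.3.
kingdoms = {"Archaea": 18834, "Bacteria": 327836, "Eukaryota": 172538, "Viruses": 16040}
eukaryota = {
    "Human": 20255,
    "Other Mammalia": 45525,
    "Other Vertebrata": 16953,
    "Viridiplantae": 32221,
    "Fungi": 30594,
    "Insecta": 8369,
    "Nematoda": 4217,
    "Other": 14404,
}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


# For each category: (% of Eukaryota, % of the complete database).
rows = {
    name: (percent(n, kingdoms["Eukaryota"]), percent(n, TOTAL_ENTRIES))
    for name, n in eukaryota.items()
}
```

The same denominators also give two consistency checks: the kingdom counts sum to the database total, and the Eukaryota categories sum to the Eukaryota count.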
3. SEQUENCE SIZE
Distribution of the sequences by size (excluding fragments)
From   To   Number      From   To   Number
   1-  50     8691      1001-1100     3696
  51- 100    41027      1101-1200     2561
 101- 150    57213      1201-1300     1996
 151- 200    57415      1301-1400     1848
 201- 250    56187      1401-1500     1492
 251- 300    49513      1501-1600      722
 301- 350    49702      1601-1700      555
 351- 400    42983      1701-1800      453
 401- 450    35235      1801-1900      419
 451- 500    28326      1901-2000      340
 501- 550    20111      2001-2100      208
 551- 600    14390      2101-2200      277
 601- 650    12139      2201-2300      287
 651- 700     8762      2301-2400      170
 701- 750     7213      2401-2500      136
 751- 800     5111          >2500     1074
 801- 850     4481
 851- 900     4977
 901- 950     3835
 951-1000     2709
The average sequence length in UniProtKB/Swiss-Prot is 354 amino acids.
The shortest sequence is GWA_SEPOF (P83570): 2 amino acids.
The longest sequence is TITIN_MOUSE (A2ASS6): 35213 amino acids.
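The average length and the size table can both be cross-checked against figures given elsewhere in this document. The Python sketch below assumes the average is the total amino acid count divided by the total entry count from the introduction, and that the size bins cover every entry except the 8994 fragments; both assumptions are inferred from the numbers, not stated by the release.

```python
# Figures from the introduction.
total_amino_acids = 189901164
total_entries = 535248
fragments = 8994

# Mean length, and its integer part.
mean_length = total_amino_acids / total_entries
truncated_mean = total_amino_acids // total_entries

# Bin counts copied from the size table (left column, then right column).
size_bins = [
    8691, 41027, 57213, 57415, 56187, 49513, 49702, 42983, 35235, 28326,
    20111, 14390, 12139, 8762, 7213, 5111, 4481, 4977, 3835, 2709,
    3696, 2561, 1996, 1848, 1492, 722, 555, 453, 419, 340,
    208, 277, 287, 170, 136, 1074,
]
non_fragment_entries = sum(size_bins)

print(f"mean length: {mean_length:.2f} (integer part {truncated_mean})")
print(f"entries in size table: {non_fragment_entries}")
```

The quotient falls between 354 and 355, so the reported 354 looks like the integer part rather than a rounded value. The bin counts sum to the entry total minus the fragment count.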
4. JOURNAL CITATIONS
Note: the following citation statistics reflect the number of distinct
journal citations.
Total number of journals cited in this release of UniProtKB/Swiss-Prot: 2226
4.1 Table of the frequency of journal citations
Journals cited      1x:   728
                    2x:   288
                    3x:   144
                    4x:   109
                    5x:    97
                    6x:    75
                    7x:    48
                    8x:    38
                    9x:    33
                   10x:    26
               11- 20x:   177
               21- 50x:   185
               51-100x:   101
                 >100x:   177
4.2 List of the most cited journals in UniProtKB/Swiss-Prot
Nb Citations Journal name
-- --------- -------------------------------------------------------------
1 19961 Journal of Biological Chemistry
2 9090 Proceedings of the National Academy of Sciences of the U.S.A.
3 5413 Journal of Bacteriology
4 4872 Biochemical and Biophysical Research Communications
5 4558 Gene
6 4436 Nucleic Acids Research
7 4216 Biochemistry
8 4198 FEBS Letters
9 4073 The EMBO Journal
10 3800 Molecular and Cellular Biology
11 3540 Nature
12 3363 Journal of Molecular Biology
13 3181 European Journal of Biochemistry
14 3121 Biochimica et Biophysica Acta
15 2919 Cell
16 2499 Genomics
17 2357 Journal of Virology
18 2355 Biochemical Journal
19 2334 Science
20 1931 Molecular Microbiology
21 1774 Journal of Cell Biology
22 1630 Plant Physiology
23 1574 Plant Molecular Biology
24 1520 Genes and Development
25 1499 Virology
26 1474 The American Journal of Human Genetics
27 1431 Nature Genetics
28 1405 Human Molecular Genetics
29 1375 Oncogene
30 1318 Molecular and General Genetics
31 1276 Development
32 1225 Human Mutation
33 1209 Journal of Biochemistry
34 1198 Molecular Biology of the Cell
35 1136 The Plant Cell
36 1120 Journal of Immunology
37 1052 Genetics
38 1036 Molecular Cell
39 1002 Structure
40 1002 The Plant Journal
41 996 Journal of General Virology
42 923 Blood
43 915 Infection and Immunity
44 890 Archives of Biochemistry and Biophysics
45 871 Journal of Cell Science
46 798 Microbiology
47 791 Developmental Biology
48 781 Yeast
49 772 Cancer Research
50 745 Current Biology
51 692 FEMS Microbiology Letters
52 622 Acta Crystallographica, Section D
53 616 Human Genetics
54 615 Nature Structural Biology
55 612 Mechanisms of Development
56 610 Protein Science
57 607 Journal of Neuroscience
58 589 Applied and Environmental Microbiology
59 577 Toxicon
60 570 Neuron
61 553 Journal of Clinical Investigation
62 536 Current Genetics
63 515 American Journal of Physiology
64 504 The Journal of Experimental Medicine
65 478 Mammalian Genome
66 474 Molecular Endocrinology
67 453 Immunogenetics
68 449 Journal of Neurochemistry
69 446 Proteins
70 436 The Journal of Clinical Endocrinology and Metabolism
71 427 Molecular and Biochemical Parasitology
72 422 Endocrinology
73 403 Nature Cell Biology
74 402 Bioscience, Biotechnology, and Biochemistry
75 398 Plant and Cell Physiology
76 390 Journal of Molecular Evolution
77 386 Journal of Medical Genetics
78 373 DNA and Cell Biology
79 369 Molecular Biology and Evolution
80 361 DNA Sequence
81 355 Experimental Cell Research
82 327 Peptides
83 325 Brain Research. Molecular Brain Research
84 321 Tissue Antigens
85 317 PLoS ONE
86 314 Comparative Biochemistry and Physiology
87 299 Molecular Pharmacology
88 297 Antimicrobial Agents and Chemotherapy
89 296 Developmental Cell
90 293 Biological Chemistry Hoppe-Seyler
91 292 Journal of Investigative Dermatology
92 290 RNA
93 277 Cytogenetics and Cell Genetics
94 273 Biology of Reproduction
95 271 Neurology
96 263 Nature Structural and Molecular Biology
97 262 Developmental Dynamics
98 262 Virus Research
99 261 Planta
100 257 Genome Research
101 256 The FEBS Journal
102 252 Journal of General Microbiology
103 242 Molecular Plant-Microbe Interactions
104 235 Immunity
105 230 EMBO Reports
106 226 European Journal of Immunology
107 224 Biochimie
108 223 Genes to Cells
109 218 The New England Journal of Medicine
110 218 Eukaryotic Cell
111 218 Hoppe-Seyler's Zeitschrift fur Physiologische Chemie
112 217 Annals of Neurology
113 213 The FASEB Journal
114 211 DNA Research
115 210 European Journal of Human Genetics
116 200 Journal of Human Genetics
117 192 Investigative Ophthalmology and Visual Science
118 186 Archives of Virology
119 182 Molecular and Cellular Endocrinology
120 179 Archives of Microbiology
121 176 Journal of the American Chemical Society
122 175 Journal of Cellular Biochemistry
123 173 American Journal of Medical Genetics. Part A
124 172 Molecular Immunology
125 170 BMC Genomics
126 169 Diabetes
127 169 Insect Biochemistry and Molecular Biology
128 167 Glycobiology
129 167 Clinical Genetics
130 167 American Journal of Medical Genetics
131 167 Molecular Phylogenetics and Evolution
132 164 Nature Immunology
133 160 Journal of Medicinal Chemistry
134 159 DNA
135 158 International Journal of Cancer
136 156 Molecular Reproduction and Development
137 155 Circulation Research
138 155 Hemoglobin
139 153 Bioorganicheskaia Khimiia
140 146 Molecular and Cellular Neuroscience
141 146 Molecular Genetics and Metabolism
142 144 Biological Chemistry
143 142 Molecular Genetics and Genomics
144 139 British Journal of Haematology
145 138 General and Comparative Endocrinology
146 138 Acta Crystallographica, Section F
147 138 Animal Genetics
148 135 Protein Expression and Purification
149 134 Phytochemistry
150 133 Journal of Experimental Botany
5. STATISTICS FOR SOME LINE TYPES
The following table summarizes the total number of some UniProtKB/Swiss-Prot lines,
as well as the number of entries with at least one such line, and the
frequency of the lines.
                                         Total  Number of    Average
Line type / subtype                     number    entries  per entry  Rank
------------------------------------  --------  ---------  ---------  ----
References (RL)                        1009016                  1.89
  Journal                               803909     409997       1.50     1
  Submitted to EMBL/GenBank/DDBJ        196204     176614       0.37     2
  Submitted to other databases            6755       6296       0.01     3
  Book citation                            687        673      <0.01     4
  Plant Gene Register                      576        564      <0.01     5
  Thesis                                   406        403      <0.01     6
  Unpublished observations                 284        280      <0.01     7
  Patent                                   189        186      <0.01     8
  Worm Breeder's Gazette                     6          6      <0.01     9
Total number of distinct authors cited in UniProtKB/Swiss-Prot: 318763
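The "Average per entry" column in the line-type tables appears to be the total number of lines divided by the total number of entries in the release (535248), not by the number of entries that carry the line. That reading is an inference from the published figures. A minimal Python sketch under that assumption, with `average_per_entry` as a hypothetical helper name:

```python
TOTAL_ENTRIES = 535248  # total entries in the release, from the introduction


def average_per_entry(total_lines, total_entries=TOTAL_ENTRIES):
    """Format lines-per-entry the way the tables do.

    Values that would round to 0.00 are shown as "<0.01".
    """
    avg = total_lines / total_entries
    if round(avg, 2) < 0.01:
        return "<0.01"
    return f"{avg:.2f}"


# Totals copied from the tables in this section.
print(average_per_entry(1009016))  # References (RL)
print(average_per_entry(803909))   # Journal
print(average_per_entry(687))      # Book citation
```

The rank shown beside each subtype follows the "Total number" column in decreasing order.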
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Comments (CC) 2362125 4.41
ALLERGEN 516 516 <0.01 26
ALTERNATIVE PRODUCTS 20446 20446 0.04 13
BIOPHYSICOCHEMICAL PROPERTIES 4021 4021 0.01 23
BIOTECHNOLOGY 325 323 <0.01 28
CATALYTIC ACTIVITY 238939 216979 0.45 4
CAUTION 7912 7754 0.01 19
COFACTOR 103708 95334 0.19 7
DEVELOPMENTAL STAGE 9550 9550 0.02 16
DISEASE 4909 3303 0.01 21
DISRUPTION PHENOTYPE 4420 4420 0.01 22
DOMAIN 36867 32634 0.07 10
ENZYME REGULATION 10445 10445 0.02 15
FUNCTION 410264 393418 0.77 2
INDUCTION 13909 13909 0.03 14
INTERACTION 9196 9196 0.02 17
MASS SPECTROMETRY 5023 3822 0.01 20
MISCELLANEOUS 31210 28793 0.06 12
PATHWAY 131273 119071 0.25 6
PHARMACEUTICAL 85 85 <0.01 29
POLYMORPHISM 848 802 <0.01 24
PTM 42694 33941 0.08 8
RNA EDITING 623 623 <0.01 25
SEQUENCE CAUTION 40206 40206 0.08 9
SIMILARITY 631745 510464 1.18 1
SUBCELLULAR LOCATION 320849 315302 0.60 3
SUBUNIT 236091 236091 0.44 5
TISSUE SPECIFICITY 36818 36818 0.07 11
TOXIC DOSE 496 482 <0.01 27
WEB RESOURCE 8737 7009 0.02 18
Total number of comment topics: 29
Total Number of Average
Line type / subtype number entries per entry Rank
------------------------------------ -------- --------- --------- ----
Features (FT) 3517632 6.57
ACT_SITE 134418 82172 0.25 9
BINDING 247148 67595 0.46 4
CA_BIND 3814 1581 0.01 35
CARBOHYD 105965 27039 0.20 14
CHAIN 541800 529441 1.01 1
COILED 19754 13555 0.04 26
COMPBIAS 53088 28055 0.10 18
CONFLICT 124698 43707 0.23 11
CROSSLNK 6335 3756 0.01 34
DISULFID 104306 28060 0.19 15
DNA_BIND 11287 10396 0.02 31
DOMAIN 156946 93557 0.29 6
HELIX 153274 15931 0.29 7
INIT_MET 15179 15179 0.03 27
INTRAMEM 1921 844 <0.01 38
LIPID 11360 7219 0.02 30
METAL 298108 72948 0.56 3
MOD_RES 191871 63027 0.36 5
MOTIF 34681 22363 0.06 24
MUTAGEN 38665 9034 0.07 21
NON_CONS 2008 735 <0.01 37
NON_STD 353 278 <0.01 39
NON_TER 12136 9264 0.02 29
NP_BIND 113761 71194 0.21 12
PEPTIDE 9769 6579 0.02 32
PROPEP 12462 10715 0.02 28
REGION 110766 59223 0.21 13
REPEAT 93177 13796 0.17 16
SIGNAL 37444 37434 0.07 22
SITE 40736 24128 0.08 20
STRAND 150072 14811 0.28 8
TOPO_DOM 128007 26462 0.24 10
TRANSIT 7954 7860 0.01 33
TRANSMEM 352133 72517 0.66 2
TURN 35386 12412 0.07 23
UNSURE 2988 522 0.01 36
VAR_SEQ 41403 17871 0.08 19
VARIANT 83258 16626 0.16 17
ZN_FING 29201 12743 0.05 25
Total number of feature keys: 39
Total Number of Average
Line type / subtype number entries per entry Rank Category
------------------------------------ -------- --------- --------- ---- -------------------------------------------
Cross-references (DR) 15241820 28.48
2DBase-Ecoli 85 85 <0.01 125 2D gel databases
Aarhus/Ghent-2DPAGE 126 96 <0.01 122 2D gel databases
AGD 932 926 <0.01 100 Organism-specific databases
Allergome 1421 876 <0.01 96 Protein family/group databases
ANU-2DPAGE 26 26 <0.01 131 2D gel databases
ArachnoServer 763 755 <0.01 106 Organism-specific databases
ArrayExpress 59718 59718 0.11 42 Gene expression databases
Bgee 39344 39344 0.07 47 Gene expression databases
BindingDB 295 295 <0.01 118 Other
BioCyc 248409 239955 0.46 22 Enzyme and pathway databases
BRENDA 4242 4235 0.01 87 Enzyme and pathway databases
CAZy 7526 6768 0.01 73 Protein family/group databases
CGD 671 651 <0.01 107 Organism-specific databases
CleanEx 30109 29468 0.06 51 Gene expression databases
COMPLUYEAST-2DPAGE 99 98 <0.01 124 2D gel databases
ConoServer 915 833 <0.01 102 Organism-specific databases
Cornea-2DPAGE 67 67 <0.01 126 2D gel databases
CTD 68226 67618 0.13 39 Organism-specific databases
CYGD 5594 5591 0.01 77 Organism-specific databases
dictyBase 4200 4084 0.01 88 Organism-specific databases
DIP 13454 13346 0.03 66 Protein-protein interaction databases
DisProt 397 394 <0.01 114 3D structure databases
DMDM 16778 16777 0.03 60 Polymorphism databases
DNASU 18322 18251 0.03 57 Protocols and materials databases
DOSAC-COBS-2DPAGE 149 147 <0.01 121 2D gel databases
DrugBank 5318 1627 0.01 78 Other
EchoBASE 4167 4163 0.01 89 Organism-specific databases
ECO2DBASE 352 300 <0.01 116 2D gel databases
EcoGene 4292 4290 0.01 86 Organism-specific databases
eggNOG 429154 429154 0.80 9 Phylogenomic databases
EMBL 918127 524778 1.72 3 Sequence databases
Ensembl 66720 48142 0.12 40 Genome annotation databases
EnsemblBacteria 97884 84935 0.18 29 Genome annotation databases
EnsemblFungi 16559 16267 0.03 61 Genome annotation databases
EnsemblMetazoa 10917 8276 0.02 69 Genome annotation databases
EnsemblPlants 15885 13628 0.03 63 Genome annotation databases
EnsemblProtists 4432 4308 0.01 85 Genome annotation databases
euHCVdb 55 44 <0.01 127 Organism-specific databases
EuPathDB 793 792 <0.01 103 Organism-specific databases
FlyBase 5845 5471 0.01 76 Organism-specific databases
Gene3D 329891 254067 0.62 17 Family and domain databases
GeneCards 19974 19670 0.04 55 Organism-specific databases
GeneFarm 3048 3034 0.01 92 Organism-specific databases
GeneID 485088 465546 0.91 6 Genome annotation databases
GeneTree 56913 56883 0.11 43 Phylogenomic databases
Genevestigator 66479 66479 0.12 41 Gene expression databases
GenoList 7067 7055 0.01 74 Organism-specific databases
GenomeReviews 376260 356673 0.70 12 Genome annotation databases
GermOnline 41906 41332 0.08 46 Gene expression databases
GlycoSuiteDB 272 272 <0.01 119 PTM databases
GO 2176476 503082 4.07 1 Ontologies
Gramene 4727 4727 0.01 81 Organism-specific databases
H-InvDB 13250 12336 0.02 67 Organism-specific databases
HAMAP 311759 311556 0.58 18 Family and domain databases
HGNC 19762 19605 0.04 56 Organism-specific databases
HOGENOM 365460 365460 0.68 14 Phylogenomic databases
HOVERGEN 75276 75276 0.14 36 Phylogenomic databases
HPA 15760 12093 0.03 65 Organism-specific databases
HSSP 30170 30170 0.06 50 3D structure databases
InParanoid 68993 68993 0.13 38 Phylogenomic databases
IntAct 33011 33011 0.06 49 Protein-protein interaction databases
InterPro 1760072 510212 3.29 2 Family and domain databases
IPI 93496 66427 0.17 32 Sequence databases
KEGG 458748 437153 0.86 8 Genome annotation databases
KO 368549 368099 0.69 13 Phylogenomic databases
LegioList 764 762 <0.01 105 Organism-specific databases
Leproma 671 668 <0.01 108 Organism-specific databases
MaizeGDB 486 481 <0.01 112 Organism-specific databases
MEROPS 10277 10277 0.02 71 Protein family/group databases
MGI 16414 16369 0.03 62 Organism-specific databases
MIM 17337 13266 0.03 59 Organism-specific databases
MINT 17588 17588 0.03 58 Protein-protein interaction databases
NextBio 49276 49274 0.09 44 Other
neXtProt 20099 20099 0.04 54 Organism-specific databases
OGP 377 377 <0.01 115 2D gel databases
OMA 385229 385229 0.72 11 Phylogenomic databases
Orphanet 4141 2492 0.01 90 Organism-specific databases
OrthoDB 77952 77952 0.15 35 Phylogenomic databases
PANTHER 200334 186125 0.37 24 Family and domain databases
Pathway_Interaction_DB 4568 1666 0.01 84 Enzyme and pathway databases
PATRIC 308277 308256 0.58 20 Genome annotation databases
PDB 82799 17847 0.15 34 3D structure databases
PDBsum 82799 17847 0.15 33 3D structure databases
PeptideAtlas 5164 5164 0.01 79 Proteomic databases
PeroxiBase 766 749 <0.01 104 Protein family/group databases
Pfam 700045 490325 1.31 4 Family and domain databases
PharmGKB 15811 15486 0.03 64 Organism-specific databases
PHCI-2DPAGE 249 249 <0.01 120 2D gel databases
PhosphoSite 25548 25548 0.05 53 PTM databases
PhosSite 351 351 <0.01 117 PTM databases
PhylomeDB 169367 169367 0.32 25 Phylogenomic databases
PIR 117670 107578 0.22 28 Sequence databases
PIRSF 96836 96822 0.18 30 Family and domain databases
PMAP-CutDB 1457 1457 <0.01 95 Other
PMMA-2DPAGE 52 52 <0.01 128 2D gel databases
PomBase 5014 4954 0.01 80 Organism-specific databases
PptaseDB 34 34 <0.01 129 Protein family/group databases
PRIDE 74625 74625 0.14 37 Proteomic databases
PRINTS 137441 120319 0.26 27 Family and domain databases
ProDom 29184 29005 0.05 52 Family and domain databases
ProMEX 497 497 <0.01 111 Proteomic databases
PROSITE 476303 301399 0.89 7 Family and domain databases
ProtClustDB 342195 342195 0.64 15 Phylogenomic databases
ProteinModelPortal 428922 428922 0.80 10 3D structure databases
PseudoCAP 1230 1221 <0.01 98 Organism-specific databases
Rat-heart-2DPAGE 28 28 <0.01 130 2D gel databases
Reactome 10592 6701 0.02 70 Enzyme and pathway databases
REBASE 402 402 <0.01 113 Protein family/group databases
RefSeq 507453 466891 0.95 5 Sequence databases
REPRODUCTION-2DPAGE 1256 1035 <0.01 97 2D gel databases
RGD 7614 7610 0.01 72 Organism-specific databases
SGD 6638 6633 0.01 75 Organism-specific databases
Siena-2DPAGE 102 102 <0.01 123 2D gel databases
SMART 166295 124484 0.31 26 Family and domain databases
SMR 211297 211297 0.39 23 3D structure databases
STRING 308686 308684 0.58 19 Protein-protein interaction databases
SUPFAM 330193 261549 0.62 16 Family and domain databases
SWISS-2DPAGE 1183 1182 <0.01 99 2D gel databases
TAIR 11107 11036 0.02 68 Organism-specific databases
TCDB 3625 3610 0.01 91 Protein family/group databases
TIGR 34536 33754 0.06 48 Genome annotation databases
TIGRFAMs 288362 268056 0.54 21 Family and domain databases
TubercuList 1945 1909 <0.01 94 Organism-specific databases
UCD-2DPAGE 510 501 <0.01 110 2D gel databases
UCSC 47753 37204 0.09 45 Genome annotation databases
UniGene 95710 88049 0.18 31 Sequence databases
VectorBase 606 588 <0.01 109 Genome annotation databases
World-2DPAGE 919 908 <0.01 101 2D gel databases
WormBase 4705 3855 0.01 82 Organism-specific databases
Xenbase 4666 4661 0.01 83 Organism-specific databases
ZFIN 2713 2701 0.01 93 Organism-specific databases
Total number of cross-referenced databases: 131
6. AMINO ACID COMPOSITION
6.1 Composition in percent for the complete database
Ala (A) 8.26 Gln (Q) 3.93 Leu (L) 9.66 Ser (S) 6.55
Arg (R) 5.53 Glu (E) 6.75 Lys (K) 5.84 Thr (T) 5.34
Asn (N) 4.06 Gly (G) 7.08 Met (M) 2.42 Trp (W) 1.08
Asp (D) 5.45 His (H) 2.27 Phe (F) 3.86 Tyr (Y) 2.92
Cys (C) 1.36 Ile (I) 5.97 Pro (P) 4.70 Val (V) 6.87
Asx (B) 0.000 Glx (Z) 0.000 Xaa (X) 0.00
6.2 Amino acids in decreasing order of frequency
Leu, Ala, Gly, Val, Glu, Ser, Ile, Lys, Arg, Asp, Thr, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
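The ordering above follows from sorting the composition percentages of section 6.1 from highest to lowest. A short Python sketch of that step; `composition` and `ranked` are illustrative names.

```python
# Composition in percent, copied from section 6.1 (three-letter codes).
composition = {
    "Ala": 8.26, "Gln": 3.93, "Leu": 9.66, "Ser": 6.55,
    "Arg": 5.53, "Glu": 6.75, "Lys": 5.84, "Thr": 5.34,
    "Asn": 4.06, "Gly": 7.08, "Met": 2.42, "Trp": 1.08,
    "Asp": 5.45, "His": 2.27, "Phe": 3.86, "Tyr": 2.92,
    "Cys": 1.36, "Ile": 5.97, "Pro": 4.70, "Val": 6.87,
}

# Sort residue names by their percentage, highest first.
ranked = sorted(composition, key=composition.get, reverse=True)

print(", ".join(ranked))
```

No two residues share the same percentage in this release, so the ordering is unambiguous.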
7. MISCELLANEOUS STATISTICS
4461 entries are encoded on a mitochondrion, and 3644 are encoded on a plasmid.
12188 entries are encoded on a plastid,
of which 21 are encoded on apicoplasts,
11623 on chloroplasts,
51 on organellar chromatophores,
145 on cyanelles,
149 on non-photosynthetic plastids and
199 on unspecified types of plastid.
Number of entries with at least one sequence correction: 73169