ASSAYS AND ANNOTATION



Immunohistochemistry - tissues


The protein atlas contains histological images of sections from human tissues. The images represent a view similar to what is seen in a microscope when examining sections of tissue on glass slides. Each antibody in the database has been used for immunohistochemical staining of both normal and cancer tissue. The specific binding of an antibody to its corresponding antigen results in a brown color. The tissue section is also counterstained with hematoxylin to enable visualization of microscopical features. Hematoxylin staining is unspecific and results in a blue coloring of both cells and extracellular material. The immunohistochemical protocol is standardized (available for download) and performed in an identical manner every time; the only variables are the primary antibody dilution (optimized for every individual antibody) and the secondary antibody (host species dependent).

Tissue microarrays provide the possibility to immunohistochemically stain a large number and variety of normal and cancer tissues (movie about tissue microarray production and immunohistochemical staining). The generated tissue microarrays include samples from 144 individuals corresponding to 44 different normal tissue types, and samples from 216 cancer patients corresponding to 20 different types of cancer. Each sample is represented by 1 mm tissue cores, resulting in a total number of 576 images for each antibody. Normal tissues are represented by samples from three individuals each (except for endometrium, skin, soft tissue and stomach, which are represented by samples from six individuals each), one core per individual, and protein expression is annotated in 83 different normal cell types present in these tissues. For cancer tissues, two cores are sampled from each individual and protein expression is annotated in tumor cells. Normally, a small fraction of the 576 images is missing for each antibody due to technical issues. Specimens containing normal and cancer tissue have been collected and sampled from anonymized paraffin-embedded material of surgical specimens, in accordance with approval from the local ethics committee.
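
The image count per antibody follows from the sampling scheme described above. The short check below uses only figures stated in the text; the variable names are illustrative and not part of any atlas software.

```python
# Core counts per antibody, using only figures stated in the text.
normal_tissue_types = 44
tissues_with_six_individuals = 4   # endometrium, skin, soft tissue, stomach
cores_per_normal_individual = 1
cancer_patients = 216
cores_per_cancer_patient = 2

# Three individuals per normal tissue type, six for the four exceptions.
normal_individuals = (
    (normal_tissue_types - tissues_with_six_individuals) * 3
    + tissues_with_six_individuals * 6
)
normal_cores = normal_individuals * cores_per_normal_individual
cancer_cores = cancer_patients * cores_per_cancer_patient
total_images = normal_cores + cancer_cores

print(normal_individuals, cancer_cores, total_images)  # 144 432 576
```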

Since specimens are derived from surgical material, normal is here defined as non-neoplastic and morphologically normal. It is not always possible to obtain fully normal tissues and thus several of the tissues denoted as normal will include alterations due to inflammation, degeneration and tissue remodeling. In rare tissues, hyperplasia or benign proliferations are included as exceptions. It should also be noted that within normal morphology there may exist inter-individual differences and variations due to primary diseases, age, sex etc. Such differences may also affect protein expression and thereby immunohistochemical staining patterns.

Samples from cancer are also derived from surgical material. Due to subgroups and heterogeneity of tumors within each cancer type, included cases represent a typical mix of specimens from surgical pathology. The inclusion of tumors is based on availability and representativity; however, an effort has been made to include high and low grade malignancies where applicable. In certain tumor groups, subtypes have been included, e.g. breast cancer includes both ductal and lobular cancer, lung cancer includes both squamous cell carcinoma and adenocarcinoma, and liver cancer includes both hepatocellular and cholangiocellular carcinoma. Tumor heterogeneity and inter-individual differences may be reflected in diverse expression of proteins, resulting in variable immunohistochemical staining patterns.

Annotation


In order to provide an overview of protein expression patterns, all images of immunohistochemically stained tissues were manually annotated by specially educated personnel followed by review and verification by a second qualified member of the staff. Annotation of each different normal and cancer tissue was performed using fixed guidelines for classification of immunohistochemical outcome. Each tissue was examined for representability, and subsequently immunoreactivity in the different cell types present in normal or cancer tissues was annotated. Basic annotation parameters included an evaluation of i) staining intensity (negative, weak, moderate or strong), ii) fraction of stained cells (rare, <25%, 25-75% or >75%) and iii) subcellular localization (nuclear and/or cytoplasmic/membranous). The manual annotation also provides two summarizing texts describing the staining pattern for each antibody in normal tissues and in cancer tissues, respectively.

The terminology and ontology used is compliant with standards used in pathology and medical science. SNOMED classification has been used for assignment of topography and morphology. SNOMED classification also underlies the given original diagnosis from which normal as well as cancer samples were collected.

A histological dictionary used in the annotation is available as a PDF-document, containing images which are immunohistochemically stained with antibodies included in the protein atlas. The dictionary displays subtypes of cells distinguishable from each other and also shows specific expression patterns in different intracellular structures. Annotation dictionary: screen usage (15MB), printing (95MB).

Knowledge-based annotation


Knowledge-based annotation aims to create a comprehensive knowledge-based map of protein expression patterns in normal human tissues and cells. This is achieved by stringent evaluation of the immunohistochemical staining pattern, RNA-seq data from internal and external sources and available protein/gene characterization data, with special emphasis on RNA-seq. Annotated protein expression profiles are created using single antibodies as well as paired antibodies (two or more independent antibodies directed against different, non-overlapping epitopes on the same protein). For paired antibodies, the immunohistochemical data from all the different antibodies are taken into consideration. The immunohistochemical staining pattern in normal tissues, subjectively annotated based on the experienced evaluation of positive immunohistochemical signals in defined subpopulations of cells within a tissue context, provides the foundation for the subsequent annotated protein expression. The microscopical images and previous annotations of the 83 included normal cell types are reviewed to assess the performance of the antibody in comparison to RNA-seq data and available protein/gene characterization data, with special emphasis on RNA-seq data. Accordingly, a knowledge-based protein expression profile is produced by analyzing and combining the available information. The review also takes sub-optimal experimental procedures into consideration. This includes immunostaining errors such as sub-optimal titration of the primary antibody and suspected cross-reactivity, as well as the fact that multiple immunostainings have been performed on non-consecutive tissue microarray sections, allowing for differences in immunohistochemical staining patterns caused by inter-individual and inter-specimen variations.

For a knowledge-based expression profile, validation of the immunohistochemical staining pattern by one or several of the following additional data sources is necessary: i) an independent antibody targeting another epitope of the same protein and showing a similar staining pattern, ii) consistency between antibody staining and RNA-seq data, and iii) available protein/gene characterization data. When the information available at the time of analysis is judged insufficient for verifying the staining pattern and estimating the expected protein expression, a knowledge-based expression profile is not created. When possible, the RNA-seq data is summarized instead of a knowledge-based protein expression profile. Furthermore, in cases where the antibody staining pattern is considered worth mentioning although it could not be verified by other data sources, the staining pattern is described instead of a knowledge-based protein expression profile. The final annotated protein expression is considered a best estimate and as such reflects the most probable histological distribution and relative expression level for each evaluated protein. The knowledge-based protein expression profiles are created using fixed guidelines on evaluation and presentation of the resulting expression profiles. Standardized explanatory sentences are used when necessary to provide additional information required for full understanding of the expression profile. A reliability score, either supportive (premium, visualized with a yellow star symbol) or uncertain (not premium), is set for each annotated protein expression profile based on evaluation of all available data.



Immunofluorescence - mouse brain


As a complement to the immunohistochemically stained tissues, the protein atlas also includes the mouse brain atlas as a subcompartment of the normal tissue atlas, in which comprehensive profiles are available for mouse brain. A selected set of targets has been analyzed by applying the antibodies to serial sections of mouse brain covering 129 areas and subfields of the brain, several of which are difficult to cover in the human brain.

The tissue microarray method used within the human protein atlas enabled the global mapping of proteins in the human body, including the brain. Currently, the human tissue atlas covers four areas of the human brain: cerebral cortex, hippocampus, lateral ventricle, and cerebellum. Due to the heterogeneous structure of the brain, with many nuclei and cell types organized in complex networks, it is difficult to achieve a comprehensive overview in a 1 mm tissue sample. Analysis of more human brain samples, including smaller brain nuclei, is thus desirable in order to generate a more detailed map of protein distribution in the brain. Therefore, we have complemented the human brain atlas effort with a more comprehensive analysis of the mouse brain. A series of mouse brain sections is explored for protein expression and distribution in a large number of brain regions.

Antibodies are selected against proteins involved in normal brain physiology, brain development and neuropathological processes. A limit of 60% homology (human vs mouse) is used as the cut-off when comparing the PrEST sequence for the antibody targets.

Selected antibodies are exposed to mouse brain lysates in a western blot setting to identify specific and non-specific interactions with mouse brain proteins. Antibodies with strong off-target interactions are excluded from further analysis. During an initial immunofluorescence test, antibodies are applied to mouse brain sections and the staining patterns are evaluated and validated before further processing. Antibody immunoreactivity is visualized using tyramide signal amplification, shown in green. A nuclear reference staining (DAPI) is visualized in blue. The immunofluorescence protocol is standardized, though antibody concentration and incubation time vary depending on protein abundance and antibody affinity, as determined during the test staining. The complete mouse brain profile is represented by serial coronal sections of adult mouse brain, 16 µm thick. Stained slides are then scanned and digitized before further processing.

Table 1. Brain regions. Abbreviations are based on The Mouse Brain in Stereotaxic Coordinates, Third Edition: The coronal plates and diagrams (ISBN: 9780123742445)

Region | Abbreviation | Allen Brain Atlas
forebrain olfactory bulb anterior olfactory nucleus | aon | AON
forebrain olfactory bulb granule cell layer | gro | MOBgr
forebrain olfactory bulb internal plexiform layer | ipl | MOBipl
forebrain olfactory bulb mitral cell layer | mi | MOBmi
forebrain olfactory bulb glomerular layer | gl | MOBgl
forebrain olfactory bulb rostral migratory stream | rms | SEZ
forebrain olfactory bulb external plexiform layer | epl | MOBopl
forebrain olfactory bulb external plexiform layer of the accessory OB | epla |
forebrain olfactory bulb granule cell layer of the accessory OB | gra | AOBgr
forebrain olfactory bulb glomerular layer of the accessory OB | gla | AOBgl
forebrain basal forebrain dorsal tenia tecta | dtt | TTd
forebrain basal forebrain caudate putamen | cpu | CP
forebrain basal forebrain accumbens nucleus, core | acbc | ACB
forebrain basal forebrain accumbens nucleus, shell | acbsh | ACB
forebrain basal forebrain island of Calleja | icj | isl
forebrain basal forebrain ventral pallidum | vp | PALv
forebrain basal forebrain medial septum | ms | MS
forebrain basal forebrain nucleus of the vertical limb of the diagonal band | vdb | NDB
forebrain basal forebrain lateral septum | ls | LS
forebrain basal forebrain nucleus of the horizontal limb of the diagonal band | hdb | NDB
forebrain basal forebrain globus pallidus | gp | PALd

Annotation


The digitalized images are processed (axel-adjusted and tissue edges defined) and regions of interest (ROIs) are then marked according to the table above. These ROIs are then used for image analysis and the relative fluorescence intensity is listed for each region. The relative fluorescence is defined as the intensity of the annotated region relative to the intensity of the region with the highest intensity.
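
The relative fluorescence defined above can be sketched as a simple normalization. This is a minimal illustration, assuming one summary intensity value per ROI; the function name and the example intensities are hypothetical and are not taken from the atlas pipeline.

```python
def relative_fluorescence(region_intensity):
    """Scale each region's intensity to the brightest region (which becomes 1.0)."""
    brightest = max(region_intensity.values())
    if brightest <= 0:
        raise ValueError("at least one region must have positive intensity")
    return {region: value / brightest for region, value in region_intensity.items()}


# Made-up intensities for three ROIs, keyed by abbreviations from Table 1.
example = {"cpu": 120.0, "ms": 30.0, "gl": 60.0}
relative = relative_fluorescence(example)
print(relative)  # {'cpu': 1.0, 'ms': 0.25, 'gl': 0.5}
```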

The overview and preserved orientation in the mouse brain has enabled us to annotate additional cell classes (ependymal), glial subpopulations (microglia, oligodendrocytes, and astrocytes), and additional brain specific subcellular locations (axon, dendrite, synapse, and glia endfeet) for each investigated protein.

All images of immunofluorescence-stained sections were manually annotated by specially educated personnel followed by review and verification by a second qualified member of the staff. The cellular and subcellular location of the immunoreactivity is defined and a summarizing text is provided describing the general staining pattern.

Specificity is validated by comparing the data with in situ hybridization data (Allen Brain Atlas) and/or available literature. Support from other data leads to a supportive reliability score, while less well-characterized targets are scored as uncertain and await further validation.



Immunohistochemistry - cells


As a complement to the representation of normal and cancer tissue, the protein atlas displays images of a selection of widely used and well characterized human cell lines as well as cell samples from healthy individuals and leukemia/lymphoma patients.

A cell microarray (CMA) has been used to enable immunohistochemical staining of a panel of cell lines and cell samples. Duplicates from 46 cell lines, 10 leukemia blood cell samples and 2 samples of PBMC render a total of 116 cell images per antibody. Included cell lines are derived from DSMZ, ATCC or academic research groups (kindly provided by cell line founders). Information regarding sex and age of the donor, tissue origin and source is listed here. All cells are fixed in 4% paraformaldehyde and dispersed in agarose prior to paraffin embedding and immunohistochemical staining.

The CMA enables representation of leukemia and lymphoma cell lines, covering major hematopoietic neoplasms and even different stages of differentiation. Cell lines from solid tumors are also included in the CMA. A subset originates from solid tumors not represented in the tissue microarrays (TMAs), e.g. sarcoma, choriocarcinoma and small cell lung carcinoma, and the remaining cell lines are derived from tumor types also represented in the TMAs.

The immunohistochemical protocols used result in a brown-black staining, localized where an antibody has bound to its corresponding antigen. The section is furthermore histochemically counterstained with hematoxylin to enable visualization of microscopical features. Hematoxylin staining is unspecific, and results in a blue coloring of both cells and extracellular material.

Annotation


In order to provide an overview of protein expression patterns, all images of immunohistochemically stained cell lines are annotated using an automated recognition software for image analysis. The image analysis software, TMAx (Beecher Instruments, Sun Prairie, WI, USA), built on an object-oriented image analysis engine from Definiens, utilizes rule-based operations and multiple iterative segmentation processes together with fuzzy logic to identify cells and immunohistochemical stain deposits.

Output parameters from the software, always displayed in conjunction with the annotated images, are:

  • number of objects defined as cells in the image
  • staining intensity (negative, weak, moderate and strong)
  • fraction (%) of positive cells
In addition, two overlay images with additional numerical information are presented to facilitate interpretation. The information displayed includes:
  • Cell: object based view representing fraction (%) of immunostained cells. The color code for each cell represents a range of immunoreactivity, blue (negative/very weak), yellow (weak/moderate), orange (moderate/strong) and red (strong) cells. This classification is based on areas of different intensities within each object (cell). This differs slightly from the subjective classification provided by manual annotation of cells in normal and cancer tissue.
  • Area: area-based view representing immunostained areas (%) within cells. The color code represents a range of immunoreactivity, yellow (weak/moderate), green (moderate/strong) and red (strong). Negative/very weak areas are transparent. The intensity score is generated from the total of this area based analysis.


Immunofluorescence - cells


As a complement to the immunohistochemically stained cells and tissues, the protein atlas displays high resolution, multicolor images of immunofluorescently stained cells. This provides spatial information on protein expression patterns on a fine cellular and subcellular level.

Originally three cell lines, U-2 OS, A-431 and U-251 MG, originating from different human tissues were chosen to be included in the immunofluorescent analysis. Starting from year 2012, the cell line panel has been expanded to include additional cell lines: A-549, BJ, CACO-2, HaCaT, HEK 293, HeLa, Hep-G2, MCF-7, PC-3, RH-30, RT-4, SH-SY5Y, SiHa, SK-MEL-30 and TIME. To enhance the probability for a large number of proteins to be expressed, the cell lines were selected from different lineages, e.g. tumor cell lines from mesenchymal, epithelial and glial tumors. The selection was furthermore based on morphological characteristics, widespread use and multitude of publications using these cell lines. Information regarding sex and age of the donor, cellular origin and source is listed here. Based on mRNA expression data, two suitable cell lines from the cell line panel are selected for the immunofluorescent analysis of each protein. In order to localize the whole human proteome on a subcellular level in one specific cell line a third cell line, U-2 OS, is always chosen.

In addition to the human cell lines, the mouse cell line NIH 3T3 is also stained. This is only done for the antibodies corresponding to genes where the mouse and human genes are orthologues.

In order to facilitate the annotation of the subcellular localization of the protein targeted by the HPA antibody, the cells are also stained with reference markers. The following probes/organelles are used as references: (i) DAPI for the nucleus, (ii) anti-tubulin antibody as internal control and marker of microtubules, and (iii) anti-calreticulin or anti-KDEL for the endoplasmic reticulum (ER).

The resulting confocal images are single slice images representing one optical section of the cells. The microscope settings are optimized for each sample. The different organelle probes are displayed as different channels in the multicolor images; the HPA antibody staining is shown in green, nuclear stain in blue, microtubules in red and ER in yellow.

Annotation


In order to provide an interpretation of the staining patterns, all images of immunofluorescently stained cell lines are manually annotated. For each cell line and antibody the intensity and subcellular location of the staining is described. The staining intensity is classified as negative, weak, moderate or strong based on the laser power and detector gain settings used for image acquisition in combination with the visual appearance of the image. The subcellular location is further combined with parameters describing the staining characteristics (e.g. smooth, granular, speckled or fibrous). The table below lists the subcellular locations used, links to the cell structure dictionary and corresponding GO terms.

Subcellular location | GO term
Aggresome | GO:0016235
Cell Junctions | GO:0030054
Centrosome | GO:0005813
Cytoplasm | GO:0005737
Cytoskeleton (Actin filaments) | GO:0015629
Cytoskeleton (Cytokinetic bridge) | GO:0045171
Cytoskeleton (Intermediate filaments) | GO:0045111
Cytoskeleton (Microtubules) | GO:0015630
Endoplasmic reticulum | GO:0005783
Focal Adhesions | GO:0005925
Golgi apparatus | GO:0005794
Microtubule organizing center | GO:0005815
Mitochondria | GO:0005739
Nuclear membrane | GO:0031965
Nucleoli | GO:0005730
Nucleus | GO:0005634
Nucleus but not nucleoli | GO:0005654
Plasma membrane | GO:0005886
Vesicles | GO:0043231


Knowledge-based annotation


Knowledge-based annotation of subcellular location aims to provide an interpretation of the subcellular localization of a specific protein in at least three human cell lines. Combining immunofluorescence data from two or more antibody sources directed towards the same protein with a review of available protein/gene characterization data allows for a knowledge-based interpretation of the subcellular location.

Immunofluorescence - siRNA validation


To validate the protein subcellular localization determined with the HPA-antibody, the staining procedure is repeated on siRNA transfected U-2 OS cells.

A reverse solid phase transfection protocol is used to coat cell seeding surfaces with siRNA and transfection reagents prior to cell seeding. After siRNA transfection has occurred, cells are fixed and stained according to the standard protocol. For each antibody, the assay is performed in duplicate using siRNAs from two different providers, and the results are compared to negative control cells transfected with scrambled siRNA.

Images are automatically acquired using objectives with 10x and 40x magnification. An automated image analysis protocol segments the cells and extracts features from all acquired images, after which statistical software automatically compares the population median staining intensity between siRNA-coated and negative control samples.

Relative Fluorescence Intensity (RFI) denotes the percentage of remaining staining intensity after siRNA down regulation.
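
As a rough illustration of the RFI, the sketch below takes the ratio of population medians between knockdown and control cells. The exact formula used in the atlas is not given here, so the ratio-of-medians form, the function name and the example intensities are all assumptions.

```python
from statistics import median


def relative_fluorescence_intensity(sirna_cells, control_cells):
    """Percentage of staining remaining after knockdown (assumed ratio of medians)."""
    control_median = median(control_cells)
    if control_median <= 0:
        raise ValueError("control median must be positive")
    return 100.0 * median(sirna_cells) / control_median


# Made-up per-cell staining intensities.
control = [100.0, 110.0, 90.0, 105.0, 95.0]     # scrambled siRNA
knockdown = [20.0, 30.0, 25.0, 22.0, 28.0]      # target siRNA
rfi = relative_fluorescence_intensity(knockdown, control)
print(rfi)  # 25.0, i.e. 25% of the staining intensity remains
```

A low RFI indicates strong downregulation of the signal, which supports the specificity of the antibody staining.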

Annotation


For each antibody, the statistical analysis is performed in whichever of the three segmented cell areas (nucleus, cytoplasm or whole cell) best matches the antibody staining. Based on the RFI values, an siRNA validation score is set, grouping the siRNA assays according to the level of downregulation.

Immunofluorescence - GFP validation


In addition to the cell line panel mentioned above, antibodies targeting a subset of genes are analyzed in HeLa cell lines stably expressing the GFP-tagged target protein.

Cell lines are kindly provided by the group of Professor Anthony Hyman, Max Planck Institute, Dresden (Poser et al 2008). These are produced using BAC TransgeneOmics technology, where transfection with Bacterial Artificial Chromosomes allows for a large transgenic insert with all regulatory elements present, resulting in near-endogenous expression. Analysis is performed directly on the clone pool, where individual cells show variations in tagged protein expression level. An anti-GFP antibody is used to detect even low-abundance tagged target protein.

Annotation


All images are manually annotated to one or several subcellular locations and if applicable combined with parameters describing the staining characteristics (e.g. smooth, granular, speckled or fibrous).

The antibody staining intensity is classified as negative, weak, moderate or strong based on the laser power and detector gain settings used for image acquisition in combination with the visual appearance of the image. GFP intensity is classified as positive or negative.

The location of the tagged protein is taken into account when performing the knowledge-based annotation of subcellular location described above for each gene.



Western blot


Western blot analysis of antibody specificity has been done using a routine sample setup composed of IgG/HSA-depleted human plasma and protein lysates from a limited number of human tissues and cell lines. Antibodies with a non-supportive routine WB have been revalidated using an over-expression lysate (VERIFY Tagged Antigen(TM), OriGene Technologies, Rockville, MD) as a positive control. Antibody binding was visualized by chemiluminescence detection in a CCD-camera system using a peroxidase (HRP) labeled secondary antibody.

Antibodies included in the Human Protein Atlas have been analyzed without further efforts to optimize the procedure and therefore it cannot be excluded that certain observed binding properties are due to technical rather than biological reasons and that further optimization could result in a different outcome.

Western blot - siRNA validation


As an additional validation method for the HPA-antibody, Western blot was performed on lysates from siRNA transfected U-2 OS cells.

A reverse solid phase transfection protocol is used to coat cell seeding surfaces with siRNA and transfection reagents prior to cell seeding. After siRNA transfection, cells are lysed (150 mM NaCl, 50 mM Tris pH 8, 1% Triton, 0.5% sodium deoxycholate, 0.1% SDS) and reducing buffer is added to prepare the sample for SDS-PAGE. Two siRNA samples, with siRNA from two different providers, are produced per antibody and the result is compared to a negative control sample of cells transfected with scrambled siRNA. SDS-PAGE, protein transfer and Western blot are performed according to the standard Western blot protocol, except that 4-20% Criterion TGX Stain-Free precast gels (Bio-Rad Laboratories) are used, which allows a total protein image to be acquired prior to transfer. All Western blots are analyzed using Image Lab software (Bio-Rad Laboratories).

Protein array


All purified antibodies are analyzed on antigen microarrays. The specificity profile for each antibody is determined based on the interaction with 384 different antigens including its own target. The antigens present on the arrays are consecutively exchanged in order to correspond to the next set of 384 purified antibodies. Each microarray is divided into 21 replicated subarrays, enabling the analysis of 21 antibodies simultaneously. The antibodies are detected through a fluorescently labeled secondary antibody and a dual color system is used in order to verify the presence of the spotted proteins. A specificity profile plot is generated for each antibody, where the signal from the binding to its own antigen is compared to any off-target interactions with all the other antigens. The vast majority (86%) of antibodies are given a pass, but a fraction fail due to either low signal or low specificity.

HPA RNA-seq data


In total, 45 cell lines and 32 tissues have been analyzed by RNA-seq to estimate the transcript abundance of each protein-coding gene.

For cell lines, early-split samples were used as duplicates and total RNA was extracted using the RNeasy mini kit. Information regarding cellular origin and source of each cell line is listed here.

For normal tissue, specimens were collected with consent from patients and all samples were anonymized in accordance with approval from the local ethics committee (ref #2011/473) and Swedish rules and legislation. All tissues were collected from the Uppsala Biobank and RNA samples were extracted from frozen tissue sections.

For a total number of 93 cell line samples and 124 tissue samples, mRNA sequencing was performed on Illumina HiSeq2000 and 2500 machines (Illumina, San Diego, CA, USA) using the standard Illumina RNA-seq protocol with a read length of 2x100 bases. Transcript abundance estimation was performed using Tophat v2.0.8b and Cufflinks v2.1.1. For each gene, FPKM values ('number of Fragments Per Kilobase gene model and Million reads') were calculated as the sum over all its protein-coding transcripts, and the average FPKM value for replicate samples was used as the abundance score. The threshold level to detect presence of a transcript for a particular gene was set to > 0.5 FPKM.
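
The gene-level abundance score described above can be sketched as follows. This is an illustration of the summing and averaging steps only, assuming transcript-level FPKM values are already available (for example from Cufflinks output); the function names, transcript identifiers and values are hypothetical.

```python
from statistics import mean

DETECTION_THRESHOLD = 0.5  # FPKM, as stated in the text


def gene_fpkm(transcript_fpkm, protein_coding):
    """Sum FPKM over the protein-coding transcripts of one gene in one sample."""
    return sum(v for t, v in transcript_fpkm.items() if t in protein_coding)


def abundance_score(replicates, protein_coding):
    """Average the per-sample gene FPKM over replicate samples."""
    return mean(gene_fpkm(sample, protein_coding) for sample in replicates)


# Made-up transcript FPKM values for one gene in two replicate samples.
replicates = [
    {"tx1": 1.0, "tx2": 0.5, "tx3_noncoding": 4.0},
    {"tx1": 2.0, "tx2": 0.5, "tx3_noncoding": 6.0},
]
coding = {"tx1", "tx2"}

score = abundance_score(replicates, coding)   # (1.5 + 2.5) / 2
detected = score > DETECTION_THRESHOLD
print(score, detected)  # 2.0 True
```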

The RNA-seq data was used to classify all genes according to their tissue-specific expression into one of six different categories, defined based on the total set of all FPKM values in 32 tissues:

  • tissue enriched (expression in one tissue at least five-fold higher than all other tissues)
  • group enriched (five-fold higher average FPKM level in a group of two to seven tissues compared to all other tissues)
  • tissue enhanced (five-fold higher average FPKM level in one or more tissues compared to the mean FPKM of all tissues)
  • expressed in all (> 0.5 FPKM in all tissues)
  • not detected (< 0.5 FPKM in all tissues)
  • mixed (detected in 1-31 tissues and none of the above categories)
An additional category "elevated", containing all genes in the first three categories (tissue enriched, group enriched and tissue enhanced), has been used for some parts of the analysis.
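
The six categories can be sketched as a small decision function. The text does not specify the order in which the rules are applied or how groups are searched, so the sketch below makes several assumptions: rules are checked in the order listed, "not detected" is checked first, values of exactly 0.5 FPKM count as not detected, and a group is taken to be the k highest-expressing tissues (k from 2 to 7). All names are hypothetical.

```python
from statistics import mean

DETECTION_THRESHOLD = 0.5  # FPKM, from the text
FOLD = 5.0                 # five-fold, from the text


def classify_gene(fpkm_by_tissue):
    """Return one of the six categories for a {tissue: FPKM} mapping (>= 2 tissues)."""
    values = sorted(fpkm_by_tissue.values(), reverse=True)
    if all(v <= DETECTION_THRESHOLD for v in values):
        return "not detected"
    # Tissue enriched: top tissue at least five-fold above every other tissue.
    if values[0] >= FOLD * values[1]:
        return "tissue enriched"
    # Group enriched (assumed form): mean of the top k tissues, k = 2..7,
    # at least five-fold above the highest of the remaining tissues.
    for k in range(2, 8):
        if k < len(values) and mean(values[:k]) >= FOLD * values[k]:
            return "group enriched"
    # Tissue enhanced: some tissue at least five-fold above the mean of all tissues.
    if values[0] >= FOLD * mean(values):
        return "tissue enhanced"
    if all(v > DETECTION_THRESHOLD for v in values):
        return "expressed in all"
    return "mixed"


def is_elevated(category):
    """The additional 'elevated' category groups the first three categories."""
    return category in {"tissue enriched", "group enriched", "tissue enhanced"}


print(classify_gene({"liver": 100.0, "kidney": 10.0, "lung": 5.0}))  # tissue enriched
```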

FPKM thresholds were further set for categorization of transcript expression levels into low, medium or high RNA abundance.

Abundance | FPKM tissue | FPKM cell line
Not detected | 0-0.5 | 0-0.5
Low | 0.5-10 | 0.5-20
Medium | 10-50 | 20-50
High | >50 | >50
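
The thresholds in the table translate directly into a lookup. The table does not say which category a value exactly on a boundary belongs to, so treating upper bounds as inclusive is an assumption of this sketch, as is the function name.

```python
def abundance_level(fpkm, sample_type):
    """Map an FPKM value to an abundance category for 'tissue' or 'cell line'."""
    # Upper bound of the "Low" range differs between tissues and cell lines.
    low_upper = {"tissue": 10.0, "cell line": 20.0}[sample_type]
    if fpkm <= 0.5:
        return "Not detected"
    if fpkm <= low_upper:      # boundary handling is an assumption
        return "Low"
    if fpkm <= 50.0:
        return "Medium"
    return "High"


print(abundance_level(15.0, "tissue"))     # Medium
print(abundance_level(15.0, "cell line"))  # Low
```

The same FPKM value can therefore fall into different categories depending on whether it comes from a tissue or a cell line sample.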


GTEx RNA-seq data


The Genotype-Tissue Expression (GTEx) project collects and analyzes multiple human post mortem tissues. RNA-seq data from 28 of their tissues having a corresponding tissue in Human Protein Atlas have been included to allow for comparisons between the Human Protein Atlas data and GTEx data.

The GTEx RNA-seq data has been mapped using the Ensembl gene ID available from GTEx, and the RPKMs (number of Reads Per Kilobase gene model and Million mapped reads) for each gene were subsequently used to categorize the genes using the same classification and thresholds as described above.

Tissue | GTEx tissue | Number of samples
Adipose tissue | Adipose - Subcutaneous | 350
Adipose tissue | Adipose - Visceral (Omentum) | 227
Adrenal gland | Adrenal Gland | 145
Breast | Breast - Mammary Tissue | 214
Cerebellum | Brain - Cerebellar Hemisphere | 105
Cerebellum | Brain - Cerebellum | 125
Cerebral cortex | Brain - Cortex | 114
Cerebral cortex | Brain - Frontal Cortex (BA9) | 108
Cervix, uterine | Cervix - Ectocervix | 6
Cervix, uterine | Cervix - Endocervix | 5
Colon | Colon - Sigmoid | 149
Colon | Colon - Transverse | 196
Endometrium | Uterus - Endometrium | 14
Esophagus | Esophagus - Mucosa | 286
Fallopian tube | Fallopian Tube | 6
Heart muscle | Heart - Atrial Appendage | 194
Heart muscle | Heart - Left Ventricle | 218
Hippocampus | Brain - Hippocampus | 94
Kidney | Kidney - Cortex | 32
Liver | Liver | 119
Lung | Lung | 320
Ovary | Ovary | 97
Pancreas | Pancreas | 171
Prostate | Prostate | 106
Salivary gland | Minor Salivary Gland | 57
Skeletal muscle | Muscle - Skeletal | 430
Skin | Skin - Not Sun Exposed (Suprapubic) | 250
Skin | Skin - Sun Exposed (Lower leg) | 357
Small intestine | Small Intestine - Terminal Ileum | 88
Spleen | Spleen | 104
Stomach | Stomach | 193
Testis | Testis | 172
Thyroid gland | Thyroid | 323
Urinary bladder | Bladder | 11
Vagina | Vagina | 96


Evidence


Protein evidence is calculated for each gene based on three different sources: UniProt protein existence (UniProt evidence); a Human Protein Atlas antibody- or RNA based score (HPA evidence); and evidence based on two proteogenomics studies (MS evidence). In addition, for each gene, a protein evidence summary score is based on the maximum level of evidence in all three independent evidence scores (Evidence summary).

All scores are classified into the following categories:

  • Evidence at protein level
  • Evidence at transcript level
  • No evidence
  • Not available

UniProt evidence is based on UniProt protein existence data, which uses five types of evidence for the existence of a protein. All genes in the classes "experimental evidence at protein level" or "experimental evidence at transcript level" are classified into the first two evidence categories, whereas genes from the "inferred from homology", "predicted", or "uncertain" classes are classified as "No evidence". Genes where the gene identifier could not be mapped to UniProt from Ensembl version 78.38 are classified as "Not available".

The HPA evidence is calculated based on the manual curation of Western blot, tissue profiling and subcellular location as well as transcript profiling using RNA-seq. All genes with supportive protein reliability in one or more of the three methods immunohistochemistry, immunofluorescence, or Western blot (assays using over-expression lysates not included) are classified as "Evidence at protein level". For the remaining genes, all genes detected at FPKM > 0.5 in at least one of the tissues or cell lines used in the RNA-seq analysis are classified as "Evidence at transcript level". A small number of genes lack RNA-seq data due to software error and are classified as "Not available". All remaining genes are classified as "No evidence".

MS evidence is based on two proteogenomics studies, Kim et al 2014 and Ezkurdia et al 2014. Each gene detected by at least one of the MS-based studies is classified as "Evidence at protein level" and all remaining genes as "Not available".
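
The HPA evidence rules and the summary score can be sketched as below. The text states that the summary is the maximum level across the three scores but does not spell out the ordering of the categories; ranking "Not available" below "No evidence" is an assumption, as are the function names and argument shapes.

```python
# Assumed ordering from weakest to strongest evidence.
EVIDENCE_RANK = {
    "Not available": 0,
    "No evidence": 1,
    "Evidence at transcript level": 2,
    "Evidence at protein level": 3,
}


def hpa_evidence(supportive_assays, max_fpkm):
    """HPA evidence for one gene.

    supportive_assays: set of antibody methods with supportive reliability.
    max_fpkm: highest FPKM over all tissues and cell lines, or None if the
    gene lacks RNA-seq data.
    """
    if supportive_assays:
        return "Evidence at protein level"
    if max_fpkm is None:
        return "Not available"
    if max_fpkm > 0.5:
        return "Evidence at transcript level"
    return "No evidence"


def evidence_summary(uniprot, hpa, ms):
    """Maximum level of evidence across the three independent scores."""
    return max((uniprot, hpa, ms), key=EVIDENCE_RANK.__getitem__)


hpa = hpa_evidence(set(), max_fpkm=3.2)
print(hpa)                                                   # Evidence at transcript level
print(evidence_summary("No evidence", hpa, "Not available"))  # Evidence at transcript level
```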
