DATA QUALITY ASSURANCE AND SCORING



Quality assurance


The usefulness of antibodies in different assays depends on both the sensitivity and the specificity of epitope binding. The quality of antibodies in the database is monitored through a number of quality assurance steps, listed below, which ensure that the quality of produced and utilized antibodies is acceptable. All antibodies must pass steps 1-3 in order to be used for immunohistochemistry and immunofluorescence. Steps 4-8 provide a basis for evaluating and scoring antibody validity. All antibodies that provide a reasonable pattern of immunoreactivity are added to the Human Protein Atlas portal. Feedback from the research community is appreciated and needed for continuous curation of the data.

Quality assurance steps for antibodies generated within the Human Protein Atlas project:

  1. Plasmid inserts are sequenced to ensure that the correct PrEST sequence has been cloned.
  2. The size of the resulting recombinant protein (including the specific PrEST) is analyzed using mass spectrometry to ensure that the correct antigen has been produced and purified.
  3. To control for cross-reactivity, affinity purified antibodies are tested for sensitivity and specificity on protein arrays consisting of glass slides with spotted PrEST fragments.
  4. Antibody specificity is analyzed using Western blot in a standardized setup. Total protein lysates from a limited number of tissues (liver and tonsil), cell lines (RT4 and U-251 MG), and human plasma are used to evaluate the antibody target binding in a Western blot setting. Antibodies with a non-supportive routine WB have been revalidated using an over-expression lysate as a positive control.
  5. Immunohistochemical staining of normal and cancer tissue is examined by trained personnel to assure plausible immunohistochemical staining properties.
  6. High resolution confocal microscopy images of immunofluorescently stained human cell lines are annotated for specific subcellular localizations by trained cell biologists, and the subcellular localization patterns are compared with the immunohistochemical staining and available experimental gene/protein characterization data.
  7. For a subset of genes, the antibody specificity has been validated by siRNA knockdown of the target protein prior to immunofluorescence or Western blot analysis.
  8. For a subset of genes, the antibody specificity has been validated by co-staining of a cell line expressing the target protein tagged with GFP at near-endogenous levels.

For commercially available antibodies (CABs), immunohistochemistry has been performed in a similar manner as for HPA-antibodies. These antibodies have also been tested on Western blots. For each commercially available antibody, a link to the antibody provider is given on the "Antibody/Antigen" page.

Antibody validation


The antibody validation indicates how well the quality assurance data supports the specificity of the antibody towards the expected human target protein in various assays.

For antibodies supplied by commercial or other academic sources, we provide Western blot, immunofluorescence and immunohistochemistry validation. These are based on literature conformity and, for immunohistochemistry, also on RNA consistency. For further validation we refer to the quality controls provided by the respective company.

Immunohistochemistry (IH)


The result of the immunostaining of each antibody is compared with available gene/RNA/protein characterization data, resulting in two different validations: Literature conformity and RNA consistency. Literature conformity is based on conformance of the expression pattern to available gene/protein characterization data in scientific literature and data from bioinformatic predictions. UniProt is used as the main source of gene/protein characterization data and when relevant, available publications and other sources of information are probed in depth. Extensive or sufficient gene/protein data requires that there is evidence of existence on a protein level and that a substantial quantity of published experimental data is available from literature and public databases. Limited protein/gene data does not require evidence of existence on a protein level and refers to genes for which only bioinformatic predictions and scarce published experimental data is available. RNA consistency is based on a comparison of immunohistochemistry data with the internally generated RNA-seq data.

The different options of literature conformity are:

  • Consistent with extensive gene/protein characterization data
  • Consistent with gene/protein characterization data
  • Partly consistent with extensive gene/protein characterization data
  • Partly consistent with gene/protein characterization data
  • No available gene/protein characterization data
  • Not consistent with gene/protein characterization data
  • Not done

RNA consistency is scored as follows:

  • Consistent with RNA expression data
  • Mainly consistent with RNA expression data
  • Mainly not consistent with RNA expression data
  • Not consistent with RNA expression data
  • No internal RNA expression data available for correlation
  • Not done

Immunofluorescence Mouse brain (IFM)


In order to generate and present reliable and valuable data, several validation steps are incorporated in our workflow.

Antibody selection: Based on sequence homology, only antibodies raised against PrESTs with >60% homology with corresponding mouse genes are selected.

Translational validation: Antibodies are tested against mouse brain lysates by Western blot to identify possible off-target interactions with mouse proteins.

Internal comparative validation: If available, multiple antibodies raised against different fragments of the targeted protein are applied to mouse brain tissue. The reliability score increases when two or more antibodies reveal similar staining patterns.

External multidisciplinary validation: Staining patterns are evaluated against peer-reviewed published data on the cellular and regional distribution of proteins. In addition, protein distribution data is assessed against expression data available in the Allen Brain Atlas.

Immunofluorescence (IF)


For each cell line, the observed staining is assigned a validation score, classified as Supportive, Uncertain or Non-supportive based on concordance with available experimental gene/protein characterization data in the UniProtKB/Swiss-Prot database. The validation scores for the three cell lines are then merged into one of the same main categories (Supportive, Uncertain or Non-supportive) to represent the antibody staining in all analyzed cell lines.

Validation scores for Immunofluorescence:

Supportive

  • One/multiple location(s) supported by experimental gene/protein characterization data and supported by ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data or partly supported and partly conflicting data, but supported by ≥1 other antibody.
  • One/multiple location(s) supported by experimental gene/protein characterization data.
  • Multiple locations partly supported (at least one) by experimental gene/protein characterization data.

Uncertain

  • One/multiple location(s) in cytoplasm (e.g. Golgi apparatus, mitochondria) supported by experimental evidence for cytoplasmic localization.
  • Location not consistent with experimental gene/protein characterization data, but supported by ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data.
  • Not decisive - One/multiple location(s) where experimental gene/protein characterization data is partly supporting and partly conflicting.
  • No staining.
  • One/multiple location(s) supported by experimental gene/protein characterization data but showing dissimilar staining to ≥1 other antibody.

Non-supportive

  • Location not consistent with experimental gene/protein characterization data.
  • Location not consistent with experimental gene/protein characterization data and showing dissimilar staining to ≥1 other antibody.
  • One/multiple location(s) with no available experimental gene/protein characterization data or partly supported and partly conflicting data, but showing dissimilar staining to ≥1 other antibody.

The validation of multi-targeting antibodies (antibodies targeting proteins encoded by two or more genes) is based on the conformance of the expression pattern to available gene/protein characterization data. Similarity between paired antibodies is not taken into account due to the complexity of multiple gene targets.

Validation scores for Immunofluorescence - multi-targeting antibodies:

Supportive

  • The multi-targeting antibody yielding a staining pattern consistent with available gene/protein characterization data for all of the genes.
  • The multi-targeting antibody yielding a staining pattern partly consistent with available gene/protein characterization data for all of the genes.

Uncertain

  • The multi-targeting antibody yielding a staining pattern with no available gene/protein characterization data.
  • The multi-targeting antibody yielding a staining pattern consistent with available gene/protein characterization data for at least one of the genes but not all.
  • The multi-targeting antibody not yielding a staining pattern.

Non-supportive

  • The multi-targeting antibody yielding a staining pattern not consistent with available gene/protein characterization data.

Immunofluorescence siRNA (IF siRNA)


For each siRNA validation assay a validation score is assigned based on the decrease in antibody-based staining intensity upon target protein downregulation.

Validation scores for immunofluorescence siRNA validation:

Supportive

  • Signal downregulation >25% by both siRNAs.
  • Signal downregulation >25% by one siRNA and >10% by the other.
  • Signal downregulation >25% by one siRNA.
  • Signal downregulation <10% by one/two siRNAs.

Immunofluorescence GFP (IF GFP)


For each GFP validation assay a validation score is assigned based on colocalization of the antibody staining and the GFP-tagged protein.

Validation scores for immunofluorescence GFP validation:

Supportive

  • Antibody staining overlaps with GFP tagged protein.
  • Antibody staining overlaps with GFP tagged protein but additional locations are seen.

Western blot (WB)


Supportive

  • Bands corresponding to the predicted size in kDa (+/-20%).
  • Band of predicted size in kDa (+/-20%) with additional bands present.

Uncertain

  • Single band larger than predicted size in kDa (+20%) but partly supported by predicted transmembrane region, signal peptide or by other available data.
  • No bands detected.
  • Single band differing more than +/-20% from predicted size in kDa and not supported by predicted transmembrane region, signal peptide or by other available data.

Non-supportive

  • Weak band of predicted size in kDa (+/-20%) but with additional bands of higher intensity also present.
  • Only bands not corresponding to the predicted size.
  • Target too small/large to be analyzed with the present setup.
  • Current setup is not applicable due to low RNA count.

For antibodies showing non-supportive Western blot data the corresponding image is not shown.
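
The ±20% size window used in the Western blot criteria above can be expressed as a simple check. The following is a minimal sketch and not the Human Protein Atlas implementation: the function name `within_size_window` and its parameters are hypothetical, and treating the window as inclusive at exactly ±20% is an assumption, since the criteria do not state how the boundary is handled.

```python
def within_size_window(observed_kda: float, predicted_kda: float,
                       tolerance: float = 0.20) -> bool:
    """Return True if an observed band lies within +/- tolerance of the
    predicted molecular weight (both given in kDa).

    Assumption: the window is inclusive at the boundary.
    """
    if predicted_kda <= 0:
        raise ValueError("predicted_kda must be positive")
    return abs(observed_kda - predicted_kda) <= tolerance * predicted_kda


def bands_in_window(observed_bands_kda, predicted_kda, tolerance=0.20):
    """Split a list of observed band sizes into those inside and outside
    the size window around the predicted molecular weight."""
    inside = [b for b in observed_bands_kda
              if within_size_window(b, predicted_kda, tolerance)]
    outside = [b for b in observed_bands_kda
               if not within_size_window(b, predicted_kda, tolerance)]
    return inside, outside
```

For a predicted size of 50 kDa, a band at 55 kDa falls inside the window and a band at 70 kDa falls outside it. Deciding the final score from these bands still requires the additional information listed in the criteria (band intensity, predicted transmembrane regions, signal peptides), which this sketch does not model.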

Western blot siRNA (WB siRNA)


Western blot analysis for each antibody is scored as supportive if the signal from one or both siRNA lanes is >25% weaker than the signal from the control lane. The total protein amount in all lanes is taken into consideration when scoring.

Supportive

  • Downregulation visible in both siRNA lanes.
  • Downregulation visible in one of two siRNA lanes.
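
The >25% criterion above can be sketched as follows. This is an illustration under stated assumptions rather than the actual scoring procedure: the text says that total protein amount is taken into consideration but not how, so dividing each band signal by the total protein signal of its lane is an assumed normalization, and all function and parameter names are hypothetical.

```python
def relative_downregulation(sirna_signal: float, control_signal: float,
                            sirna_total_protein: float = 1.0,
                            control_total_protein: float = 1.0) -> float:
    """Fractional decrease of the siRNA lane signal relative to the control
    lane, after normalizing each band to the total protein of its lane.

    Assumption: normalization by simple division; the source does not
    specify the method.
    """
    if min(control_signal, sirna_total_protein, control_total_protein) <= 0:
        raise ValueError("control signal and total protein must be positive")
    normalized_sirna = sirna_signal / sirna_total_protein
    normalized_control = control_signal / control_total_protein
    return 1.0 - normalized_sirna / normalized_control


def wb_sirna_is_supportive(decreases, threshold: float = 0.25) -> bool:
    """Supportive if at least one siRNA lane is more than `threshold`
    (25% by default) weaker than the control lane."""
    return any(d > threshold for d in decreases)
```

For example, two siRNA lanes with normalized decreases of 0.40 and 0.05 would be scored as supportive because one of the two lanes exceeds the threshold.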

Protein array (PA)


Supportive

  • Pass with single peak corresponding to interaction only with its own antigen.

Uncertain

  • Pass with quality comment low specificity (binding to 1-2 PrESTs >15% and <40%).

Non-supportive

  • No or weak signal.
  • Low specificity (one antigen with >40% signal or more than two antigens with signal >15%).

Antibodies that are validated as non-supportive are not published.
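
The protein array thresholds above can be written as a small decision function. This is a hedged sketch, not the production scoring code. It assumes that off-target binding is expressed as a percentage on the same scale as the 15% and 40% thresholds, and that the "no or weak signal" decision is made upstream and passed in as a flag, because the text gives no numeric cut-off for a weak signal. The name `score_protein_array` and its parameters are hypothetical.

```python
def score_protein_array(off_target_percent, has_signal: bool = True) -> str:
    """Classify a protein array result from the signals on non-target PrESTs.

    off_target_percent: iterable of signals (in percent) for every PrEST
        other than the antibody's own antigen.
    has_signal: False when the antibody gives no or only a weak signal
        (assumption: decided by the caller).
    """
    if not has_signal:
        return "Non-supportive"

    above_15 = [p for p in off_target_percent if p > 15]

    # One antigen above 40%, or more than two antigens above 15%.
    if any(p > 40 for p in above_15) or len(above_15) > 2:
        return "Non-supportive"

    # One or two antigens above 15% and none above 40%.
    # Note: a value of exactly 40% is not covered by the stated criteria;
    # this sketch lets it fall into "Uncertain".
    if above_15:
        return "Uncertain"

    # Single peak: interaction only with the antibody's own antigen.
    return "Supportive"
```

With this sketch, off-target signals of 20% and 30% give "Uncertain", while a single off-target signal of 45% gives "Non-supportive".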


Reliability score


A reliability score is set for all genes and indicates the level of reliability of the analyzed protein expression pattern based on available protein/RNA/gene characterization data.

Immunohistochemistry (IH)


Genes with knowledge-based annotation

The reliability score, divided into supportive (premium) or uncertain (not premium), is manually selected based on the knowledge-based annotation. Supportive/premium is indicated by a star on the image that links to the tissue atlas data for a particular gene. Experienced personnel evaluate the performance of the antibodies and compare the staining pattern with available protein/gene characterization data as well as internally generated RNA-seq data. If data is available from more than one antibody, the staining patterns of all antibodies are taken into consideration. A similar immunostaining pattern between paired antibodies implies that two or more antibodies directed towards the same protein target show the same cellular and subcellular distribution pattern in a vast majority of analyzed normal tissues.

Genes without knowledge-based annotation

For genes without knowledge-based annotation, literature conformity and RNA consistency are used together to generate the reliability score of the gene automatically, divided into supportive (premium) or uncertain (not premium). Genes in the RNA categories tissue enriched, group enriched and tissue enhanced, which have higher expression in a small number of tissues than in the other analyzed tissues, automatically include positive and negative controls and are hence handled slightly differently with respect to the criteria.

The following criteria are needed for a gene to yield a Supportive reliability score:

  • Protein expression data consistent or mainly consistent with RNA-seq data.
  • Protein expression consistent with available gene/protein characterization data.
  • For the RNA categories tissue enriched, group enriched and tissue enhanced, protein expression partly consistent with available gene/protein characterization data, or no available gene/protein characterization data, in combination with protein expression consistent with RNA-seq data is also acceptable for a supportive reliability score.
  • For the RNA category tissue enriched also the combination of no available gene/protein characterization data with protein expression mainly consistent with RNA-seq data is acceptable for a supportive reliability score.
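
The automatic rule described by the criteria above can be sketched as a function. This is one possible reading, offered as an illustration only: it assumes that the first two criteria must hold together, that the last two are exceptions for the named RNA categories, and that every other combination yields Uncertain. The simplified labels ("consistent", "partly consistent", "no data", "not consistent", "mainly consistent") and the function name `ih_reliability` are hypothetical stand-ins for the literature conformity and RNA consistency options listed earlier.

```python
ENRICHED_CATEGORIES = {"tissue enriched", "group enriched", "tissue enhanced"}


def ih_reliability(literature: str, rna: str, rna_category: str = "") -> str:
    """Return "Supportive" or "Uncertain" for a gene without
    knowledge-based annotation.

    literature: "consistent", "partly consistent", "no data" or
        "not consistent" (conformity with gene/protein characterization data).
    rna: "consistent", "mainly consistent", "mainly not consistent" or
        "not consistent" (consistency with RNA-seq data).
    rna_category: e.g. "tissue enriched"; empty if none applies.
    """
    # General rule: consistent with literature and (mainly) consistent with RNA.
    if literature == "consistent" and rna in {"consistent", "mainly consistent"}:
        return "Supportive"

    # Exception for tissue enriched, group enriched and tissue enhanced genes.
    if (rna_category in ENRICHED_CATEGORIES
            and literature in {"partly consistent", "no data"}
            and rna == "consistent"):
        return "Supportive"

    # Additional exception for tissue enriched genes only.
    if (rna_category == "tissue enriched"
            and literature == "no data"
            and rna == "mainly consistent"):
        return "Supportive"

    return "Uncertain"
```

Under this reading, a group enriched gene with no available characterization data and protein expression mainly consistent with RNA-seq data remains Uncertain, whereas the same combination for a tissue enriched gene is Supportive.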

Immunofluorescence Mouse brain (IFM)


The reliability of antibodies in the mouse brain atlas is scored as supportive or uncertain depending on support from in situ hybridization data (Allen Brain Atlas) and/or previously published data and the UniProtKB/Swiss-Prot database.

Immunofluorescence (IF)


The reliability of the annotated protein expression data is scored as supportive or uncertain depending on similarity in immunostaining patterns and consistency with available experimental gene/protein characterization data in the UniProtKB/Swiss-Prot database. Assays referred to in the reliability scores are western blot (WB) and siRNA. If siRNA validation supports a subcellular localization it is always considered supportive. The reliability scores are based on the following criteria:

Supportive

  • Two independent antibodies yielding similar or partly similar staining patterns.
  • Two independent antibodies yielding dissimilar staining patterns, both supported by experimental gene/protein characterization data.
  • One antibody yielding a staining pattern supported by experimental gene/protein characterization data.
  • One antibody yielding a staining pattern with no available experimental gene/protein characterization data, but supported by another assay within the protein atlas.
  • One or more independent antibodies yielding staining patterns not consistent with experimental gene/protein characterization data, but supported by siRNA assay.

Uncertain

  • Two independent antibodies yielding partly similar staining patterns but not consistent with experimental protein/gene characterization data.
  • Two independent antibodies yielding dissimilar staining patterns with no available, or partly supportive/partly conflicting, experimental gene/protein characterization data.
  • One antibody yielding a staining pattern with no available, or partly supportive/partly conflicting experimental gene/protein characterization data.

RNA approval - cells


Antibodies used for the analysis of protein expression in cell lines were validated by comparison of immunohistochemical staining results with available transcript data in 44 cell lines. For two cell lines, LP-1 and Hth83, transcript data is missing.

Criteria for approval are listed in the scheme below. In brief, the approval is performed automatically from generated expression values, and is designed as a funnel in which the antibodies are tried against the selection criteria in descending order of stringency. Antibodies approved according to the more stringent criteria are denoted "supportive antibodies" (marked with a star in the Human Protein Atlas), while the remaining antibodies are denoted "uncertain".

Spearman correlation between continuous values of IHC quantification and FPKM values across the set of cell lines constitutes one of the basic strategies. To avoid the difficulty of comparing continuous numbers generated with two methods that offer vastly different levels of accuracy and sensitivity, other approval criteria have also been defined, as displayed in the table below.
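
To illustrate the correlation criterion, the sketch below computes a Spearman rank correlation between per-cell-line IHC quantification values and FPKM values using only the Python standard library. It is not the Human Protein Atlas code: the function names are hypothetical, and handling ties with average ranks is an assumption, since the text does not say how ties are treated.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(ihc_values, fpkm_values):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(ihc_values) != len(fpkm_values) or len(ihc_values) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(ihc_values), average_ranks(fpkm_values)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5
```

The resulting coefficient can then be compared with the correlation thresholds that appear among the approval categories (≥0.65 and ≥0.55).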


RNA-approval category (rows) by subcellular localization score (columns):

| RNA-approval category | Supportive | Partly supportive/not decisive | Not supportive | Not done |
| --- | --- | --- | --- | --- |
| Expression lymphoid cell lines | Supportive | Supportive | Uncertain | Fail |
| Expression myeloid cell lines | Supportive | Supportive | Uncertain | Fail |
| Expression hemato cell lines | Supportive | Supportive | Uncertain | Fail |
| Expression solid tumor cell lines | Supportive | Supportive | Uncertain | Fail |
| Expression epithelial cell lines | Supportive | Supportive | Uncertain | Fail |
| Expression single cell line | Supportive | Supportive | Uncertain | Fail |
| Expression subset of cell lines | Supportive | Supportive | Uncertain | Fail |
| Correlation ≥0.65 | Supportive | Uncertain | Fail | Fail |
| All high/all medium/all low | Supportive | Uncertain | Fail | Fail |
| Similar expression levels | Supportive | Uncertain | Fail | Fail |
| Congruent expression, highest/lowest | Supportive | Uncertain | Fail | Fail |
| Correlation ≥0.55, highest/lowest | Supportive | Uncertain | Fail | Fail |
| Congruent expression | Supportive | Uncertain | Fail | Fail |
| All no expression | Uncertain (subcell loc not applicable) | Uncertain (subcell loc not applicable) | Uncertain (subcell loc not applicable) | Fail |
| All no/low expression | Uncertain | Uncertain | Fail | Fail |
| Fail | Fail | Fail | Fail | Fail |
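
The table above can be read as a lookup from an RNA-approval category and a subcellular localization score to an approval outcome. The sketch below encodes it as a dictionary. It is illustrative only: the names `APPROVAL_TABLE` and `rna_approval` are hypothetical, and the row "All no expression" is encoded on the assumption that "Uncertain (subcell loc not applicable)" applies to the three scored localization columns and "Fail" to "Not done".

```python
LOCALIZATION_COLUMNS = (
    "Supportive",
    "Partly supportive/not decisive",
    "Not supportive",
    "Not done",
)

# Outcomes per row, in the column order given above.
EXPRESSION_ROW = ("Supportive", "Supportive", "Uncertain", "Fail")
CORRELATION_ROW = ("Supportive", "Uncertain", "Fail", "Fail")

APPROVAL_TABLE = {name: EXPRESSION_ROW for name in (
    "Expression lymphoid cell lines",
    "Expression myeloid cell lines",
    "Expression hemato cell lines",
    "Expression solid tumor cell lines",
    "Expression epithelial cell lines",
    "Expression single cell line",
    "Expression subset of cell lines",
)}
APPROVAL_TABLE.update({name: CORRELATION_ROW for name in (
    "Correlation ≥0.65",
    "All high/all medium/all low",
    "Similar expression levels",
    "Congruent expression, highest/lowest",
    "Correlation ≥0.55, highest/lowest",
    "Congruent expression",
)})
# Assumed reading of the merged cell in the source table.
APPROVAL_TABLE["All no expression"] = ("Uncertain", "Uncertain", "Uncertain", "Fail")
APPROVAL_TABLE["All no/low expression"] = ("Uncertain", "Uncertain", "Fail", "Fail")
APPROVAL_TABLE["Fail"] = ("Fail", "Fail", "Fail", "Fail")


def rna_approval(category: str, localization: str) -> str:
    """Look up the approval outcome for a category and localization score."""
    if category not in APPROVAL_TABLE:
        raise ValueError(f"unknown RNA-approval category: {category!r}")
    if localization not in LOCALIZATION_COLUMNS:
        raise ValueError(f"unknown localization score: {localization!r}")
    return APPROVAL_TABLE[category][LOCALIZATION_COLUMNS.index(localization)]
```

For example, `rna_approval("Correlation ≥0.65", "Partly supportive/not decisive")` returns "Uncertain". Which category an antibody falls into is decided by the funnel of criteria described earlier, which this lookup does not implement.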
