DISCLAIMER

Quality-assured antibodies have been used in this study and each image has been evaluated by a pathologist. However, the complexity of tissues and the lack of verified references for largely unknown proteins mean that immunoreactivity cannot be taken as firm proof of protein expression levels. It cannot be excluded that certain observed and annotated differences in immunoreactivity are due to technical rather than biological reasons. In addition, inter-individual differences in both expression patterns and image annotation may play a role.

Also note that antibody-based immunohistochemistry can result in off-target binding, yielding false-positive results. For antibodies that currently lack knowledge-based annotation, such binding will appear on the summary page of a particular gene as differences between RNA and protein expression, usually with more ubiquitous expression at the protein level. Such staining at the protein level should be interpreted with caution, and one of the objectives of the Human Protein Atlas program is to resolve whether the discrepancies between RNA and protein levels for these genes across the analyzed tissues are due to technical issues. During knowledge-based annotation of protein expression profiles, protein expression in each normal tissue is assessed as specific or off-target based on the available RNA-seq data and protein/gene characterization data. The protein expression levels presented on the summary page are the result of manual correction performed according to this assessment.

In the automatic annotation of cell images, object-based morphological operations and classification processes alternate. This results in objects of interest, i.e. cells, being analyzed with respect to IHC staining. Technical artefacts in the image that meet the classification criteria will therefore not be disregarded, but will instead be wrongly included in the analysis. The intensity score represents the composite overall staining in the image, based on areas of strong, moderate and weak staining. Small areas of strong intensity, for example granular/nucleolar staining, tend to be scored too low.

Because the intensity score is an overall measure with fixed cutoff values, a fraction of cells may be positive even though the image is scored as negative. Fractions of differentially stained cells do not form the basis for the overall intensity score, as this classification of cells is by and large subjective.
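To illustrate why an image can be scored as negative while some cells are positive, and why small strongly stained areas carry little weight, the following sketch uses a hypothetical area-weighted composite. The weights, the cutoff and the function names are assumptions made purely for illustration; they are not the formula or the values used in the automatic annotation.

```python
# Hypothetical weights per staining level (assumption, not the real values).
WEIGHTS = {"weak": 1.0, "moderate": 2.0, "strong": 3.0}

# Hypothetical cutoff below which an image is scored as negative (assumption).
NEGATIVE_CUTOFF = 0.25


def composite_intensity(area_fractions):
    """Sum of weight * image-area fraction over the staining levels."""
    return sum(
        WEIGHTS[level] * area_fractions.get(level, 0.0) for level in WEIGHTS
    )


def is_scored_negative(area_fractions):
    """True if the composite score falls below the hypothetical cutoff."""
    return composite_intensity(area_fractions) < NEGATIVE_CUTOFF


# Granular staining: strong, but covering only 5% of the image area.
granular = {"strong": 0.05}
granular_score = composite_intensity(granular)
granular_negative = is_scored_negative(granular)

# Moderate staining covering half of the image area, for comparison.
diffuse = {"moderate": 0.5}
diffuse_score = composite_intensity(diffuse)
diffuse_negative = is_scored_negative(diffuse)
```

Under these assumed values the granular image falls below the cutoff despite containing strongly stained areas, whereas the diffusely stained image does not.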

In the immunofluorescent analysis of the subcellular distribution of proteins, high-resolution images of a limited number of cells are acquired. The images are single-slice images representing one optical section of the cells. It therefore cannot be excluded that there are additional staining localizations that are not captured in the images.

Nor can it be excluded that differences in intensity and localization are due to technical rather than biological reasons. A single method for fixation and permeabilization of the cells is currently used, and this method may be more or less suitable for different types of proteins. Some types of proteins will not easily be resolved in their intact subcellular compartment and will hence fall into a less resolved compartment, i.e. cytoplasm or nucleus, possibly resulting in an over-representation of these "meta"-compartments.

Western blot analysis has, up until release 7.0, been performed only using a routine setup. This setup is composed of total protein extracts from a limited number of tissues/cells and human plasma depleted of serum albumin and IgG. The lack of a verified positive reference for many of the analyzed antibodies, together with the limited number of included protein sources, sometimes rules out Western blot as firm proof of antibody specificity. In addition, due to the high-throughput nature of the project, the majority of the antibodies have been analyzed using a standardized protocol in a single-shot approach without further efforts to optimize the procedure. Therefore, it cannot be excluded that certain observed binding properties are due to technical rather than biological reasons and that further optimization could result in a different outcome. Antibodies with a non-supportive standard Western blot will gradually be revalidated using an over-expression HEK293T lysate.



The transcriptomics data is based on deep sequencing of RNA libraries. The library preparation and data analysis will unavoidably introduce biases and errors for a small number of genes. While these errors are rare, it is important to be aware of them when studying affected genes, since they will cause discrepancies between the presented RNA and protein data. The library preparation method will not capture non-polyadenylated transcripts. This mainly affects histone genes, which incorrectly appear not to be expressed at the RNA level.

Finally, we have detected a minor leakage (on the order of 0.01-0.1%) between samples that were multiplexed in the sequencing. Because this leakage is so limited, its effect is very minor, but genes with high and specific expression in one sample will erroneously appear to have low levels of expression in some other samples. This leakage affects most experiments on the Illumina platform and has been previously described (Kircher et al. 2012).
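As a rough illustration of the scale of this effect, the sketch below applies the leakage range quoted above (0.01-0.1%) to a hypothetical gene. The expression value, the function name and the assumption that the leaked signal is simply proportional to the expression in the source sample are illustrative only and are not part of the analysis pipeline.

```python
# Leakage range quoted in the text, expressed as fractions.
LEAK_MIN = 0.0001  # 0.01%
LEAK_MAX = 0.001   # 0.1%


def leaked_expression(source_level, leak_fraction):
    """Apparent expression in another multiplexed sample.

    Assumes (for illustration) that leakage is proportional to the
    expression level in the source sample.
    """
    return source_level * leak_fraction


# Hypothetical gene with high, specific expression in one sample
# (arbitrary expression units; the value is an assumption).
source_level = 10_000.0

low = leaked_expression(source_level, LEAK_MIN)
high = leaked_expression(source_level, LEAK_MAX)

print(f"Apparent expression in other samples: {low:.1f} to {high:.1f} units")
```

Under these assumptions, a gene that is truly absent from a tissue could still show a low but non-zero expression value there if it is very highly expressed in another sample from the same sequencing run.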