Getting Started

Site organisation

The family page is the main page for accessing information contained within Pfam, as it describes the Pfam family entries. Most referring sites link to this page. Alternatively, users can navigate to family pages by entering the Pfam identifier or accession number, either via the home page, the “Jump-to” boxes or the keyword search box, or by clicking on a domain name or graphic from anywhere on the website. As with all Pfam pages, there is a context-sensitive icon bar in the top right-hand corner that provides a quick overview of the contents of the tabs. The tabs on the family page cover the following topics: functional annotation; domain organisation or architectures; alignments; HMM logo; trees; curation and models; species distribution; interactions; and structures.

Searching a protein sequence against Pfam

Searching a protein sequence against the Pfam library of HMMs lets you determine the domain architecture of the protein. If your protein is present in the version of UniProt, NCBI GenPept or the metagenomic sequence set that we used to make the current release of Pfam, we have already calculated its domain architecture. You can access this by entering the sequence accession or ID in the ‘view a sequence’ box on the Pfam homepage.

If your sequence is not in the Pfam database, you can perform a single-sequence or a batch search by clicking on the ‘Search’ link at the top of the Pfam page.

Local protein searches

If you have a very large number of protein searches to perform, or you do not wish to post your sequence across the web, it may be more convenient to run the Pfam searches locally using the ‘pfam_scan.pl’ script. To do this you will need the HMMER3 software, the Pfam HMM libraries and a couple of additional data files from the Pfam website. You will also need to download a few modules from CPAN, most notably Moose.

Full details on how to get ‘pfam_scan.pl’ up and running can be found on our FTP site.
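As a rough illustration of what a local run looks like once everything is installed, the sketch below assembles a ‘pfam_scan.pl’ command line from Python. It assumes that ‘pfam_scan.pl’ is on your PATH, that the HMM library and data files have been unpacked and prepared in one directory as described in the script's own instructions, and that the ‘-fasta’, ‘-dir’ and ‘-outfile’ options behave as in the copy of the script you downloaded. The function names and file names are hypothetical; check the script's help output before relying on any of this.

```python
import shutil
import subprocess


def build_pfam_scan_command(fasta, pfam_dir, outfile=None):
    """Return the argument list for a pfam_scan.pl run (nothing is executed).

    fasta    -- path to a FASTA file of protein sequences (hypothetical name)
    pfam_dir -- directory holding the Pfam HMM library and data files
    outfile  -- optional path for the results; assumed '-outfile' option
    """
    cmd = ["pfam_scan.pl", "-fasta", str(fasta), "-dir", str(pfam_dir)]
    if outfile is not None:
        cmd += ["-outfile", str(outfile)]
    return cmd


def run_pfam_scan(fasta, pfam_dir, outfile=None):
    """Run pfam_scan.pl if it can be found; fail with a clear error otherwise."""
    if shutil.which("pfam_scan.pl") is None:
        raise FileNotFoundError("pfam_scan.pl was not found on PATH")
    cmd = build_pfam_scan_command(fasta, pfam_dir, outfile)
    # check=True raises CalledProcessError if the script exits with an error.
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Only print the command here; call run_pfam_scan() once the setup is done.
    print(" ".join(build_pfam_scan_command("query.fasta", "pfam_data")))
```

Building the argument list separately from running it makes it easy to inspect the exact command before committing to a long batch of searches.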

Proteome analysis

Pfam pre-calculates the domain compositions and architectures for UniProtKB reference proteomes. To see the list of proteomes, click on the ‘browse’ link at the top of the Pfam website, and click on a letter of the alphabet in the ‘proteomes’ section. By clicking on a particular organism, you will be able to view the proteome page for that organism. From here you can view the domain organisation and the domain composition for that proteome.

The taxonomy query allows quick identification of families/domains which are present in one species but are absent from another. It can also be used to find families/domains that are unique to a particular species (note this can be very slow).
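Conceptually, both kinds of taxonomy query are set operations over the families found in each species. The sketch below is not how the website implements the query; it only illustrates the idea, assuming you already have a mapping from species name to the set of family identifiers found in that species. The species names, family identifiers and function names are placeholders.

```python
def present_but_absent(families_by_species, present_in, absent_from):
    """Families found in one species but not in another (set difference)."""
    return families_by_species[present_in] - families_by_species[absent_from]


def unique_to(families_by_species, species):
    """Families found in the given species and in no other species in the mapping."""
    seen_elsewhere = set()
    for name, families in families_by_species.items():
        if name != species:
            seen_elsewhere |= families
    return families_by_species[species] - seen_elsewhere


if __name__ == "__main__":
    # Placeholder data, not real Pfam content.
    families_by_species = {
        "species_a": {"FAM1", "FAM2", "FAM3"},
        "species_b": {"FAM2", "FAM4"},
        "species_c": {"FAM3", "FAM4"},
    }
    print(sorted(present_but_absent(families_by_species, "species_a", "species_b")))
    print(sorted(unique_to(families_by_species, "species_a")))
```

The second function has to look at every other species in the mapping, which gives some intuition for why finding families unique to one species is the slower of the two queries.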

Finding proteins with a specific set of domain combinations (‘architectures’)

For a detailed study of domain architectures you can use PfamAlyzer. PfamAlyzer allows you to find proteins which contain a specific combination of domains and to specify particular species and the evolutionary distances allowed between domains.
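To make the idea of an architecture query concrete, the sketch below treats each protein as an ordered list of domain names and selects the proteins in which the query domains occur in the given order, with other domains allowed in between. This is an illustration under those assumptions only; it is not PfamAlyzer's algorithm, and it ignores the species and distance constraints that PfamAlyzer supports. All protein names, domain names and function names are placeholders.

```python
def has_architecture(domains, query):
    """True if every query domain occurs in `domains` in the same order.

    Other domains may appear between the matches. The shared iterator is
    consumed as matches are found, which is what enforces the ordering.
    """
    remaining = iter(domains)
    return all(q in remaining for q in query)


def find_proteins(architectures, query):
    """Names of the proteins whose domain list contains the query architecture."""
    return [name for name, domains in architectures.items()
            if has_architecture(domains, query)]


if __name__ == "__main__":
    # Placeholder data, not real Pfam content.
    architectures = {
        "protein_1": ["DomA", "DomB", "DomC"],
        "protein_2": ["DomC", "DomA"],
        "protein_3": ["DomA", "DomC"],
    }
    print(find_proteins(architectures, ["DomA", "DomC"]))
```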

Wikipedia annotation

The Pfam consortium is now coordinating the annotation of Pfam families via Wikipedia. If a family has a Wikipedia article assigned to it, one that we feel provides a good description of the family, the summary tab of its family page shows the text of that article in preference to the traditional Pfam annotation text.

If a family does not yet have a Wikipedia article assigned to it, there are several ways for you to help us add one. You can find much more information about the process in the Pfam Annotation in Wikipedia section.