Searching Pfam¶
There are several ways to look for Pfam information using the InterPro website.
Searching a specific Pfam entry¶
Users can navigate to a specific Pfam entry page by entering its Pfam identifier, its accession number, or a keyword that forms part of its name in any of three Search boxes:
When selecting the Browse + By member database option, the search box is located in the header of the results table.
After selecting Search + By text, a larger text box is shown in the center of the page.
A search box is also available in the top right corner of any InterPro page, next to the magnifying glass.
This text box allows you to go quickly to the relevant page in the InterPro site, by using:
Search | Find |
---|---|
Pfam accession number | Pfam entry page |
Pfam identifier or name | Pfam entry page |
Clan identifier | Pfam Clan page |
UniProt accession | InterPro protein page, which includes Pfam matches (with coordinates) |
Gene names | InterPro protein page, which includes Pfam matches (with coordinates) |
PDB identifier | InterPro structure page, which includes a 3D visualisation of Pfam matches |
Proteomes | If it is a reference proteome, the InterPro proteome page will be displayed |
Keywords, free text | List of possible matches |
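As a rough illustration of the query types listed above, the sketch below guesses which kind of identifier a search string looks like. It is not the website's actual resolution logic, which is not described here. The function name `classify_query` and the pattern table are hypothetical, and the patterns only reflect the usual shapes of these identifiers (for example `PF` plus five digits for a Pfam accession). Names such as Pfam identifiers or gene names cannot be told apart by shape alone, so they fall through to free text.

```python
import re

# Usual identifier shapes, checked in order. This is an illustration only:
# the InterPro website may resolve queries differently.
PATTERNS = [
    ("Pfam accession number", re.compile(r"^PF\d{5}$", re.IGNORECASE)),
    ("Clan identifier", re.compile(r"^CL\d{4}$", re.IGNORECASE)),
    ("UniProt accession", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$",
        re.IGNORECASE)),
    ("PDB identifier", re.compile(r"^[0-9][A-Z0-9]{3}$", re.IGNORECASE)),
]


def classify_query(query: str) -> str:
    """Return the table row that a query string most resembles."""
    text = query.strip()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    # Names, gene names and keywords are not distinguishable by pattern.
    return "Keywords, free text"


if __name__ == "__main__":
    for example in ["PF00069", "CL0016", "P04637", "1abc", "kinase"]:
        print(example, "->", classify_query(example))
```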
Searching a protein sequence against Pfam¶
Searching a protein sequence against the Pfam library of HMMs will enable you to find out the domain architecture of the protein, and thus what its potential function might be. If your protein is present in the UniProt version used to make the current release of InterPro, we have already calculated its domain architecture. You can access this by entering the UniProt sequence identifier in any of the Search boxes mentioned above (see Searching a specific Pfam entry).
Using the InterPro online sequence search¶
If your sequence is not in the InterPro database, you can perform a single-sequence or batch search against the Pfam database on the InterPro website. This search uses the web-based InterProScan tool, which allows you to scan up to 100 sequences at a time, with a maximum length of 40,000 amino acids. To run an online search, follow these steps:
Click Search + By Sequence in the InterPro website menu. This opens the InterPro sequence search page.
Provide the FASTA formatted protein sequence(s) of interest by pasting them into the text box or by importing them from a file.
Expand the Advanced options, click on Unselect all protein sequence applications and select Pfam.
Click on the Search button.
While the sequence search is running, you can continue to navigate through the website, other browser tabs or applications. You will get a pop-up notification when the job has completed (this requires browser notifications to be enabled).
The results of the submitted job are accessible by selecting Results + Your InterProScan Searches in the InterPro website menu.
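Before pasting a batch into the search page, it can help to check it against the limits mentioned above. The following is a minimal sketch, not part of InterPro or InterProScan: the helper names `parse_fasta` and `check_batch` are hypothetical, and the code assumes the 40,000 amino acid limit applies to each individual sequence, which the text above does not state explicitly.

```python
MAX_SEQUENCES = 100   # batch limit stated for the online search
MAX_LENGTH = 40_000   # ASSUMPTION: interpreted here as a per-sequence limit


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_batch(text):
    """Return a list of human-readable problems; empty if the batch looks fine."""
    records = parse_fasta(text)
    problems = []
    if not records:
        problems.append("no sequences found")
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}")
    for header, sequence in records:
        if not sequence:
            problems.append(f"'{header}' has no sequence")
        elif len(sequence) > MAX_LENGTH:
            problems.append(
                f"'{header}' is {len(sequence)} residues long (limit {MAX_LENGTH})")
    return problems


if __name__ == "__main__":
    print(check_batch(">example\nMKVLAAGIVG\n"))
```

An empty list means the batch is within the assumed limits; otherwise each string describes one problem to fix before submitting.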
Interpreting the protein viewer¶
All Pfam entries - and the InterPro entries where they are integrated - are displayed in the protein sequence viewer. The Pfam and InterPro entries are grouped by type (family, domain, repeat, site). The coloured bars indicate the location of entry matches on the protein sequence. Each matched InterPro entry is displayed on a separate line, with the Pfam entries integrated in it displayed below where relevant. The Pfam entries that remain unintegrated in InterPro entries are displayed separately in the Unintegrated category.
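The grouping described above can be pictured as a small data structure. This is only a sketch of the layout, not the viewer's implementation: the `PfamMatch` class, the `group_for_viewer` function, and all accessions and coordinates in the example are placeholders invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PfamMatch:
    pfam_accession: str
    entry_type: str                           # "family", "domain", "repeat" or "site"
    start: int                                # match start on the protein sequence
    end: int                                  # match end on the protein sequence
    interpro_accession: Optional[str] = None  # None if the Pfam entry is unintegrated


def group_for_viewer(matches):
    """Group matches by entry type, then by InterPro entry.

    Pfam entries with no InterPro entry go under "Unintegrated",
    keyed by their own Pfam accession.
    """
    tracks = defaultdict(lambda: defaultdict(list))
    for match in matches:
        if match.interpro_accession is None:
            tracks["Unintegrated"][match.pfam_accession].append(match)
        else:
            tracks[match.entry_type][match.interpro_accession].append(match)
    return {category: dict(lines) for category, lines in tracks.items()}


if __name__ == "__main__":
    # Placeholder accessions and coordinates, for illustration only.
    example = [
        PfamMatch("PF00001", "domain", 10, 120, "IPR000001"),
        PfamMatch("PF00002", "family", 5, 300, "IPR000002"),
        PfamMatch("PF99999", "domain", 150, 210),
    ]
    for category, lines in group_for_viewer(example).items():
        print(category, list(lines))
```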
Above the protein sequence viewer, icons let you display the viewer in full screen and zoom in and out of the protein sequence. The Options button lets you personalise the display by changing the colour code of the entries, choosing the labels (accession number, short name and/or description can be displayed on the right-hand side of the viewer), and collapsing the visualisation to show InterPro entries only or expanding it to also show the contributing entries from the member databases. Keep the tooltip active to see a pop-up box with the accession number, description and amino acid coordinates of a match when hovering the mouse over it. Snapshots of the results can be saved in PNG format.
Local protein search¶
Alternatively, if you have a very large number of protein searches to perform, or you do not wish to share your sequence, it may be more convenient to install and run InterProScan.
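For a local installation, a Pfam-only run might be scripted along the following lines. This is a hedged sketch: it assumes the `interproscan.sh` launcher from InterProScan 5 is on your `PATH`, and that the `-i` (input), `-appl` (applications), `-f` (output format) and `-o` (output file) options behave as in the InterProScan documentation. Check `interproscan.sh --help` for your installed version. The helper name `build_interproscan_command` and the file names are hypothetical.

```python
import shlex


def build_interproscan_command(fasta_path, output_path,
                               executable="interproscan.sh"):
    """Build an argument list for a Pfam-only InterProScan run.

    ASSUMPTION: option names follow InterProScan 5; verify them against
    the help output of your own installation.
    """
    return [
        executable,
        "-i", fasta_path,    # FASTA file with the protein sequences
        "-appl", "Pfam",     # restrict the analysis to Pfam
        "-f", "tsv",         # tab-separated output
        "-o", output_path,   # explicit output file
    ]


if __name__ == "__main__":
    command = build_interproscan_command("proteins.fasta", "proteins.pfam.tsv")
    # Print the command rather than executing it, so the sketch has no side effects.
    print(shlex.join(command))
```

To actually launch the job, the list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting issues with unusual file names.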
Finding proteins with a specific set of domain combinations (Domain architectures)¶
Users can search for protein sequences that contain specific Pfam entries in a particular arrangement by selecting Search + By Domain architecture in the InterPro website menu. Pfam entries can be marked as included in or excluded from the domain architecture, depending on whether the proteins should or should not contain them. The Order of domain matters option restricts the results to proteins with the domains in the specified order. The Exact match option narrows the search to proteins containing only the selected domains (no extra domains). Domains can be selected by entering a domain name, Pfam accession or InterPro accession.
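The search options above can be read as a simple filter over each protein's ordered list of domains. The sketch below shows one possible interpretation, not the website's actual algorithm. The function `matches_architecture` is hypothetical; it assumes "order matters" means the included domains appear in the given order (other domains may sit between them) and "exact match" means no domain outside the included set is present. The accessions in the example are placeholders.

```python
def matches_architecture(protein_domains, include, exclude=(),
                         ordered=False, exact=False):
    """Check a protein's domains against a domain architecture query.

    protein_domains: accessions in N- to C-terminal order.
    include / exclude: accessions the protein must / must not contain.
    ordered: ASSUMED to mean `include` occurs as an ordered subsequence.
    exact:   ASSUMED to mean no domain outside `include` is present.
    """
    domains = list(protein_domains)
    if any(domain in exclude for domain in domains):
        return False
    if not all(domain in domains for domain in include):
        return False
    if exact and any(domain not in include for domain in domains):
        return False
    if ordered:
        # Subsequence test: each lookup consumes the iterator up to the hit,
        # so later domains must be found after earlier ones.
        remaining = iter(domains)
        if not all(domain in remaining for domain in include):
            return False
    return True


if __name__ == "__main__":
    protein = ["PF00017", "PF00018", "PF07714"]  # placeholder architecture
    print(matches_architecture(protein, ["PF00017", "PF07714"]))
    print(matches_architecture(protein, ["PF07714", "PF00017"], ordered=True))
    print(matches_architecture(protein, ["PF00017", "PF07714"], exact=True))
```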