Features

1) Introduction

  • WDAC is a new web-based server that identifies homologous proteins by comparing their domain architectures, that is, the ordered sequence of domains in each protein. Domains are the building blocks of proteins and one of the most useful features for determining protein function: the functions of the individual domains of a multidomain protein help explain the properties of the protein as a whole. WDAC also accounts for promiscuous domains (domains that typically carry out auxiliary functions and occur in many unrelated proteins), because sharing such a domain is not in itself evidence of homology.

  • To detect promiscuous domains, we assigned each domain extracted from RefSeq proteins a weight score based on its abundance and versatility. This score represents the domain's importance in the protein world. We use cosine similarity to measure the similarity between a pair of domain architectures.

2) Datasets

3) How to measure domain promiscuity

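  • To measure the abundance and versatility of each domain, we use its IAF (Inverse Abundance Frequency) and IV (Inverse Versatility). IAF is adapted from IDF (Inverse Document Frequency), a term-weighting measure used in information retrieval. Because both measures are inverse, a domain that is highly abundant or highly versatile (that is, promiscuous) receives a low weight score.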

                 Domain weight score = IAF*IV
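The page does not give exact formulas for IAF and IV, so the following Python sketch assumes IDF-style definitions purely for illustration: IAF(d) = log(N / n_d), where N is the number of proteins and n_d is the number of proteins containing domain d, and IV(d) = 1 / (1 + v_d), where v_d is the number of distinct partner families found immediately on the N- or C-terminal side of d. The function name `weight_scores` and both formulas are hypothetical; WDAC's actual definitions may differ.

```python
import math
from collections import defaultdict


def weight_scores(architectures):
    """Return {domain: IAF * IV} for a list of domain architectures.

    Each architecture is an ordered list of Pfam IDs (N- to C-terminus).
    ASSUMED definitions (illustrative only, not taken from WDAC):
      IAF(d) = log(N / n_d)   N = number of proteins, n_d = proteins containing d
      IV(d)  = 1 / (1 + v_d)  v_d = distinct partner families adjacent to d
    """
    n_proteins = len(architectures)
    containing = defaultdict(int)  # n_d: proteins containing domain d
    partners = defaultdict(set)    # distinct N- and C-side neighbour families of d

    for arch in architectures:
        for d in set(arch):        # count each protein at most once per domain
            containing[d] += 1
        for left, right in zip(arch, arch[1:]):
            if left != right:      # assumption: tandem repeats are not partners
                partners[left].add(right)  # right is a C-side partner of left
                partners[right].add(left)  # left is an N-side partner of right

    scores = {}
    for d, n_d in containing.items():
        iaf = math.log(n_proteins / n_d)        # abundant domains -> small IAF
        iv = 1.0 / (1 + len(partners[d]))      # versatile domains -> small IV
        scores[d] = iaf * iv
    return scores


if __name__ == "__main__":
    # Toy architectures, not real RefSeq data.
    toy = [["PF00001"], ["PF00069", "PF00017"], ["PF00017", "PF00018"], ["PF00069"]]
    # Prints PF00001, PF00018, PF00069, PF00017 in that (descending) order.
    for domain, score in sorted(weight_scores(toy).items(), key=lambda kv: -kv[1]):
        print(f"{domain}\t{score:.3f}")
```

In the toy run, PF00017 is both relatively common and flanked by two different families, so under these assumed formulas it ends up with the lowest weight, which is the behaviour the IAF × IV product is meant to capture for promiscuous domains.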

4) How to compare two domain architectures
  • WDAC searches for the best-matching domain architecture in a domain architecture database built from RefSeq proteins.

  • WDAC compares domain architectures using cosine similarity.

  • The domain architecture with the highest similarity score is reported as the best match.
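The comparison step can be sketched in Python as follows. The page states only that cosine similarity is used, so the vector construction here is an assumption: each architecture becomes a sparse vector indexed by Pfam family, whose entries are the family's count in the architecture multiplied by its weight score, and families missing from the weight table get a hypothetical default weight of 1.0. The function names are illustrative, and this bag-of-domains representation ignores domain order.

```python
import math
from collections import Counter


def arch_vector(arch, weights, default_weight=1.0):
    """Map an architecture to {domain: count * weight} (assumed weighting)."""
    counts = Counter(arch)
    return {d: c * weights.get(d, default_weight) for d, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def best_match(query, database, weights):
    """Return (protein_id, score) of the most similar architecture.

    `database` maps protein IDs to architectures; returns (None, -1.0)
    if the database is empty.
    """
    q = arch_vector(query, weights)
    best_id, best_score = None, -1.0
    for protein_id, arch in database.items():
        score = cosine(q, arch_vector(arch, weights))
        if score > best_score:  # keep the highest-scoring architecture
            best_id, best_score = protein_id, score
    return best_id, best_score


if __name__ == "__main__":
    # Toy weights and database (hypothetical values, not WDAC output).
    weights = {"PF00069": 0.35, "PF00017": 0.23, "PF00018": 0.69}
    database = {
        "protA": ["PF00069", "PF00017"],
        "protB": ["PF00017", "PF00018"],
        "protC": ["PF00001"],
    }
    print(best_match(["PF00017", "PF00018"], database, weights))
```

Because each coordinate is scaled by the domain's weight, a shared low-weight (promiscuous) domain contributes comparatively little to the dot product, so two architectures that agree only on such a domain receive a low similarity.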

Download
  • The frequency, versatility, and weight scores of all domains in RefSeq proteins are available for download. The data are provided as a simple tab-delimited text file for easy parsing.

        - Download [gz] (~110 KB)

  • Each line of the tab-delimited file consists of the following consecutive fields:

    Field         Description
    Domain        A single Pfam domain (e.g., PF00001).
    freq_euk      Frequency of proteins containing the domain among eukaryotic RefSeq proteins.
    freq_bac      Frequency of proteins containing the domain among bacterial RefSeq proteins.
    freq_arc      Frequency of proteins containing the domain among archaeal RefSeq proteins.
    partner_euk   Partner domain families on the N- and C-terminal sides of the domain in eukaryotic RefSeq proteins.
    partner_bac   Partner domain families on the N- and C-terminal sides of the domain in bacterial RefSeq proteins.
    partner_arc   Partner domain families on the N- and C-terminal sides of the domain in archaeal RefSeq proteins.
    WS_euk        Weight score of the domain in eukaryotic RefSeq proteins.
    WS_bac        Weight score of the domain in bacterial RefSeq proteins.
    WS_arc        Weight score of the domain in archaeal RefSeq proteins.
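A minimal Python reader for the downloaded file is sketched below. It assumes the file is gzip-compressed and tab-delimited with the ten columns in the order listed in the field table, that it may contain a header row whose first field is `Domain`, and that some numeric values may be empty or non-numeric (these become `None`). How the `partner_*` columns are encoded (a count or a list of families) is not documented here, so the sketch keeps them as raw strings. `parse_lines`, `load_records`, and any file path you pass in are hypothetical.

```python
import gzip

# Column order as documented in the field table.
FIELDS = ["Domain", "freq_euk", "freq_bac", "freq_arc",
          "partner_euk", "partner_bac", "partner_arc",
          "WS_euk", "WS_bac", "WS_arc"]
NUMERIC = {"freq_euk", "freq_bac", "freq_arc", "WS_euk", "WS_bac", "WS_arc"}


def _to_number(text):
    """Parse a numeric field; return None if it is empty or not a number."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_lines(lines):
    """Yield one dict per domain from tab-delimited text lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        if values[0] == "Domain":  # skip a header row, if the file has one
            continue
        if len(values) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}: {line!r}")
        record = dict(zip(FIELDS, values))
        for name in NUMERIC:
            record[name] = _to_number(record[name])
        # partner_* fields are left as raw strings (format assumed unknown).
        yield record


def load_records(path):
    """Read the gzipped download and return a list of per-domain records."""
    with gzip.open(path, "rt") as handle:
        return list(parse_lines(handle))
```

Keeping the parser separate from the gzip wrapper makes it easy to test on in-memory lines or to reuse it on an already decompressed copy of the file.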


Contact Information

Byungwook Lee, Ph.D. (E-mail: bulee@kribb.re.kr)

52 Eoeun-dong, Yuseong-gu, Daejeon, 305-333, Korean Bioinformation Center (KOBIC)