
Supplementary Data

Table S1. Statistics of raw data used in the VnD database.

In this table, the numbers of diseases, genes, proteins, and SNPs were counted by unique identifier.
*CUI: concept unique identifier in the Unified Medical Language System (UMLS).

Disease (CUI*) | Genes | Total SNPs | SNPs | nsSNPs | UniProt
36,109 (3,898) | 40,234 | 14,581,945 | 5,766,017 a | 91,038 (1.6%) b | 85,510
  • a SNPs located in genic regions.
  • b Non-synonymous SNPs (nsSNPs) located in genic regions; the percentage is relative to all SNPs in genic regions.

Table S2. Summary of the disease-related genetic variation.

Disease-related genes (dzGenes*) | SNPs in dzGenes | nsSNPs (%) in dzGenes | UniProt proteins from dzGenes
13,940 | 107,780 | 57,695 (53.5%) | 10,883
  • * dzGenes: genes associated with diseases.

Table S3. Summary of proteins associated with drug-related nsSNPs.

Drug targets (in DrugBank) | No. of PDB a | No. of UniProt b | Structure & docking: wild-type | Structure & docking: mutations
2,486 | 601 | 538 | 590 | 4,437
  • a Number of proteins mapped to PDB structures at 60% sequence identity.
  • b Number of UniProt proteins that have both structural pockets and SNPs.

Figure S1.

Modeling and comparison of wild-type and nsSNP mutant structures.
The structure modeling section (upper box) displays the protein structures with the highest stability. For each selected drug target protein template, 10 candidate structures are generated to model the wild-type and mutant pockets, and the most stable structures are chosen and displayed. The structure analysis section (lower box) compares structural features, such as stability and pocket size, between a wild-type protein and its mutants to identify the structural changes between them.
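The selection step in Figure S1 (keep the most stable of several candidate models, then compare wild type with mutant) can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the `Candidate` fields are hypothetical, and stability is assumed to be an energy score where lower means more stable.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One modeled structure (hypothetical fields for illustration)."""
    name: str
    energy: float       # assumed stability score: lower energy = more stable
    pocket_size: float  # pocket size reported by the pocket-detection step


def most_stable(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the lowest energy (highest assumed stability)."""
    if not candidates:
        raise ValueError("no candidate structures")
    return min(candidates, key=lambda c: c.energy)


def compare(wild: list[Candidate], mutant: list[Candidate]) -> dict[str, float]:
    """Pick the most stable wild-type and mutant models and report feature differences."""
    wt, mt = most_stable(wild), most_stable(mutant)
    return {
        "delta_energy": mt.energy - wt.energy,          # > 0: mutant less stable
        "delta_pocket_size": mt.pocket_size - wt.pocket_size,  # < 0: pocket shrank
    }
```

In practice each list would hold the 10 candidates modeled for one template; the numbers used to exercise it would come from the modeling tool, not from this sketch.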

Figure S2.

Protein structure change for protein P07550.
The query protein (P07550) is associated with obesity, diabetes, parasitic infection, and asthma. This protein has several structural pockets, and six disease-related nsSNPs have been identified in it. These SNPs change the amino acid sequence and affect the energetic stability of the protein. Notably, a single amino acid substitution, G257R, caused by nsSNP rs56100672, changes the residue at that position from small and hydrophobic (glycine) to polar and positively charged (arginine). This change disrupts the interaction between the protein and its ligand (SO4) (a) by sharply reducing the size and altering the shape of the pocket: the pocket size decreases from 214 Å in the wild type to 170 Å in the mutant (b).
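The G257R example can be made concrete with a small sketch that parses the one-letter mutation code, looks up residue properties, and computes the relative pocket-size change (214 Å to 170 Å is a reduction of roughly 20.6%). The `PROPERTIES` table is an assumed, deliberately simplified labeling that lists only a few residues; it is not a standard classification scheme.

```python
import re

# Illustrative, simplified property labels (an assumption, not a standard scheme);
# only a few residues are listed here.
PROPERTIES = {
    "G": {"small", "hydrophobic"},
    "A": {"small", "hydrophobic"},
    "R": {"polar", "positive"},
    "K": {"polar", "positive"},
    "D": {"polar", "negative"},
    "E": {"polar", "negative"},
}

# Wild-type residue, position, mutant residue, e.g. "G257R".
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutation(code: str) -> tuple[str, int, str]:
    """Split a one-letter protein change such as 'G257R' into (wild, position, mutant)."""
    match = MUTATION.match(code)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code!r}")
    wild, pos, mutant = match.groups()
    return wild, int(pos), mutant


def property_change(code: str) -> tuple[set[str], set[str]]:
    """Return the (wild-type, mutant) property labels; unknown residues get an empty set."""
    wild, _, mutant = parse_mutation(code)
    return PROPERTIES.get(wild, set()), PROPERTIES.get(mutant, set())


def percent_change(before: float, after: float) -> float:
    """Relative change in percent, e.g. for pocket size."""
    return (after - before) / before * 100.0


wt_props, mt_props = property_change("G257R")
print(sorted(wt_props), "->", sorted(mt_props))  # ['hydrophobic', 'small'] -> ['polar', 'positive']
print(round(percent_change(214, 170), 1))        # -20.6
```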

Figure S3.

Pie chart of distances between SNPs and structural pockets.
To identify the distribution of SNPs near pockets, we analyzed the pockets in each protein structure using LIGSITE, which calculates pocket sizes and potential ligand-binding sites using the protein-solvent-protein method. We then counted the number of SNPs located around the structural pockets.
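The protein-solvent-protein (PSP) idea behind LIGSITE can be illustrated on a voxel grid: a solvent point lying between protein points along a scan line counts as one PSP event, and points with enough events are pocket candidates. The sketch below is a simplification that scans only the x, y, and z axes (the published method also scans the four cubic diagonals); the `min_psp` threshold and the grid input are assumptions for illustration.

```python
import numpy as np


def psp_counts(protein: np.ndarray) -> np.ndarray:
    """Count protein-solvent-protein events along the x, y and z axes.

    protein: 3D boolean grid, True where a grid point lies inside the protein.
    Returns, for each grid point, how many axis directions (0-3) enclose it
    between protein points; protein points themselves always get 0.
    """
    solvent = ~protein
    counts = np.zeros(protein.shape, dtype=int)
    for axis in range(3):
        # True if any protein point occurs at or before this point along the axis.
        before = np.logical_or.accumulate(protein, axis=axis)
        # True if any protein point occurs at or after this point along the axis.
        after = np.flip(
            np.logical_or.accumulate(np.flip(protein, axis=axis), axis=axis), axis=axis
        )
        counts += (solvent & before & after).astype(int)
    return counts


def pocket_points(protein: np.ndarray, min_psp: int = 2) -> np.ndarray:
    """Grid indices of solvent points with at least `min_psp` PSP events."""
    return np.argwhere(psp_counts(protein) >= min_psp)


if __name__ == "__main__":
    # Toy "protein": a hollow 5x5x5 box inside a 7x7x7 grid.
    grid = np.zeros((7, 7, 7), dtype=bool)
    grid[1:6, 1:6, 1:6] = True
    grid[2:5, 2:5, 2:5] = False
    print(len(pocket_points(grid, min_psp=3)))  # 27 enclosed cavity points
```

Counting SNPs "around" a pocket then reduces to a distance check between mutation sites and these pocket points.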

Figure S4.

Distribution of SNPs located around structural pockets.
To investigate the relationship between SNPs and pocket configurations, we calculated the distances between the structural pockets on each protein and the mutation sites. The physical, chemical, and geometric properties of a protein's surface have been shown to affect its function, and surface patterns such as pockets have also been shown to be important to protein function. Changes in pocket size or stability caused by SNPs can therefore affect a protein's ability to interact with a drug ligand. Thus, nsSNPs near protein pockets are likely to have deleterious effects on protein function and may contribute to disease.
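A minimal sketch of the distance analysis: for each mutation site, take the smallest Euclidean distance to any pocket point and tally the sites into distance bins, which is the kind of summary a pie chart such as Figure S3 displays. The bin thresholds are placeholders (the section does not state them), and representing a mutation site by a single 3D coordinate (for example its C-alpha atom) is an assumption.

```python
import numpy as np

# Hypothetical distance bins in angstroms, as (exclusive upper bound, label);
# the thresholds used for Figures S3 and S4 are not stated, so these are placeholders.
BINS = [(5.0, "<5 Å"), (10.0, "5-10 Å"), (float("inf"), ">=10 Å")]


def min_distance(site: np.ndarray, pocket_points: np.ndarray) -> float:
    """Smallest Euclidean distance from one mutation site (x, y, z) to any pocket point."""
    return float(np.min(np.linalg.norm(pocket_points - site, axis=1)))


def bin_label(distance: float) -> str:
    """Label of the first bin whose upper bound exceeds the distance."""
    for upper, label in BINS:
        if distance < upper:
            return label
    raise ValueError(f"distance not binnable: {distance}")


def distance_distribution(sites, pocket_points: np.ndarray) -> dict[str, int]:
    """Count mutation sites per distance bin."""
    counts = {label: 0 for _, label in BINS}
    for site in sites:
        d = min_distance(np.asarray(site, dtype=float), pocket_points)
        counts[bin_label(d)] += 1
    return counts
```

Dividing each bin count by the total number of sites gives the pie-chart fractions.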

Figure S5.

Distribution of amino acids in wild-type and mutant proteins.
To determine the amino acid changes caused by nsSNPs, we counted the occurrences of each amino acid in the wild-type and mutant proteins.
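One way to tabulate these counts is sketched below, assuming each nsSNP is recorded as a one-letter protein change such as "G257R" (wild-type residue, position, mutant residue). The input format and the function name are hypothetical.

```python
from collections import Counter


def residue_counts(mutations: list[str]) -> tuple[Counter, Counter]:
    """Count wild-type and mutant residues from one-letter codes like 'G257R'.

    The first character is taken as the wild-type residue and the last
    character as the mutant residue (assumed input format).
    """
    wild = Counter(code[0] for code in mutations)
    mutant = Counter(code[-1] for code in mutations)
    return wild, mutant


wild, mutant = residue_counts(["G257R", "A12V", "G3D"])
print(wild.most_common(), mutant.most_common())
```

Comparing the two counters residue by residue shows which amino acids are lost or gained through nsSNPs.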

Figure S6.

Amino acid changes caused by non-synonymous SNPs.
The numbers in the table give how many times (change count) the amino acid on the Y-axis is replaced by the amino acid on the X-axis (Y→X). High change counts (>50) are marked in red.
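The change-count table can be built as a 20 x 20 matrix with rows for the wild-type residue (Y-axis) and columns for the mutant residue (X-axis), and the cells above the threshold of 50 are the ones highlighted in red. This sketch assumes the same hypothetical one-letter mutation codes (e.g. "G257R") as input.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def change_matrix(mutations: list[str]) -> dict[str, dict[str, int]]:
    """Nested dict m[wild][mutant] = change count, from codes like 'G257R'."""
    pairs = Counter((code[0], code[-1]) for code in mutations)
    # Rows: wild-type residue (Y-axis); columns: mutant residue (X-axis).
    return {a: {b: pairs[(a, b)] for b in AMINO_ACIDS} for a in AMINO_ACIDS}


def high_rate_changes(matrix: dict[str, dict[str, int]], threshold: int = 50):
    """(wild, mutant, count) cells whose count exceeds the threshold."""
    return [
        (a, b, n)
        for a, row in matrix.items()
        for b, n in row.items()
        if n > threshold
    ]
```

Using a strict `>` comparison matches the ">50" criterion in the legend, so a cell with exactly 50 changes is not flagged.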