Translational Bioinformatics Laboratory

BioGraph - An online service and a graph database for bioinformatics

ICAR

Motivations

In the era of Big Data, a huge amount of biological data about different entities, such as proteins, genes, non-coding RNAs, diseases and functional associations, has been made available.
These resources are typically stored in separate bioinformatics databases, each implementing its own data model and user interface.
However, many bioinformatics scenarios require more than one resource.
For a bioinformatician, this means extra effort to switch from one service to another. It also means wasted working time transferring data and intermediate results between resources, often while dealing with aliases and accession ID disambiguation.
For these reasons, a single bioinformatics platform that integrates many biological resources and services addresses a fundamental need.

Source Databases

Methods

BioGraphDB is an integrated graph database that collects and links heterogeneous bioinformatics resources. It is implemented on top of OrientDB.
Compared with traditional SQL databases, graph databases offer greater scalability and query efficiency as data size grows. This is especially true for queries that follow many relationships and would otherwise require multiple joins.
Each component database has been downloaded from its original site and processed with customized Extract-Transform-Load (ETL) modules, which assemble it into a graph architecture. Each biological entity has been mapped to a vertex, with its properties stored as vertex attributes, and each relationship between two biological entities has been mapped to an edge. The assembled graph can be traversed with suitable query languages, such as Gremlin. A single graph traversal can express what would otherwise be a set of separate queries, and this is enough to address several bioinformatics scenarios.
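
The entity-to-vertex and relationship-to-edge mapping can be illustrated on one source file, NCBI gene2go (listed under Data Sources). What follows is a minimal ETL sketch, not BioGraph's actual ETL code. It assumes gene2go's tab-separated column layout (tax_id, GeneID, GO_ID, Evidence, Qualifier, GO_term, PubMed, Category) and keeps the graph in plain Python dictionaries instead of writing to OrientDB. The function `load_gene2go` and the sample rows are hypothetical.

```python
import csv
import io

# Column order of NCBI gene2go (assumed from the file's header line).
GENE2GO_COLUMNS = ["tax_id", "GeneID", "GO_ID", "Evidence",
                   "Qualifier", "GO_term", "PubMed", "Category"]


def load_gene2go(text, tax_id="9606"):
    """Hypothetical ETL step: gene2go rows -> Gene/Go vertices and ANNOTATES edges (Go -> Gene).

    Returns (vertices, edges): vertices maps (label, key) -> properties,
    edges is a list of (edge label, source key, target key, properties).
    """
    vertices = {}
    edges = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and the header line
        rec = dict(zip(GENE2GO_COLUMNS, row))
        if rec["tax_id"] != tax_id:
            continue  # keep only the requested organism (9606 = Homo sapiens)
        gene = ("Gene", rec["GeneID"])
        go = ("Go", rec["GO_ID"])
        # Each entity becomes one vertex; repeated rows reuse the existing vertex.
        vertices.setdefault(gene, {"geneId": rec["GeneID"]})
        vertices.setdefault(go, {"goId": rec["GO_ID"], "name": rec["GO_term"]})
        # Each relationship becomes an edge carrying the row's annotation details.
        edges.append(("ANNOTATES", go, gene, {
            "evidence": rec["Evidence"],
            "qualifier": rec["Qualifier"],
            "category": rec["Category"],
        }))
    return vertices, edges


# Illustrative rows only (not real annotations); the mouse row (10090) is filtered out.
sample = (
    "#tax_id\tGeneID\tGO_ID\tEvidence\tQualifier\tGO_term\tPubMed\tCategory\n"
    "9606\t100\tGO:9999999\tIDA\tenables\texample term\t-\tFunction\n"
    "10090\t200\tGO:9999999\tIEA\tenables\texample term\t-\tFunction\n"
)
vertices, edges = load_gene2go(sample)
```

The edge properties chosen here (evidence, qualifier, category) match the ANNOTATES properties listed in the schema below.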

Database Schema

The graph data model used for BioGraphDB.

[Figure: BioGraphDB graph data model]

Node: properties
Gene: geneId (String), locusTag (String), chromosome (String), mapLocation (String), description (String), type (String), nomenclatureAuthoritySymbol (String), nomenclatureAuthorityFullName (String), nomenclatureStatus (String), otherDesignations (String)
GeneName: symbol (String)
Go: goId (String), name (String), namespace (String), definition (String), obsolete (String), comment (String)
Protein: name (String), fullName (String), alternativeName (String), gene (String), sequence (String), sequenceLenght (Int), sequenceMass (Int)
ProteinName: name (String)
Pathway: pathwayId (String), name (String), summation (String)
Cancer: name (String)
MiRNA: accession (String), name (String), description (String), comment (String), sequence (String)
MiRNAmature: ..., location (String), sequence (String)
MiRNASNP: SNPid (String), miRNA (String), chr (String), miRstart (Int), miRend (Int), lostNum (Int), gainNum (Int)
Interaction: transcriptId (String), extTranscriptId (String), mirAlignment (String), alignment (String), geneAlignment (String), mirStart (Int), mirEnd (Int), geneStart (Int), geneEnd (Int), genomeCoordinates (String), conservation (Double), alignScore (Int), seedCat (Int), energy (Double), mirSvrScore (String);
  mirTarBaseId (String), experiments (String), supportType (String);
  snpEnergy (Double), basePair (String), geneAve (Double), mirnaAve (Double);
  database (String)

Relation: properties
ANNOTATES: evidence (String), qualifier (String), category (String)
SYNONYM_OF: none
CODING: none
CONTAINS: none
REFERS_TO: none
CANCER2MIRNA: profile (String)
PRECURSOR_OF: none
HAS_SNP: none
INTERACTING_GENE: none
INTERACTING_MIRNA: none
INTERACTING_SNP: none
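
The tables above do not state edge directions. These can be read off the template and scenario queries below: for example, `out('CONTAINS').in('CODING')` goes from Pathway to Protein and then back along CODING to Gene. The sketch below records those inferred directions and computes which vertex labels a chain of `out`/`in` steps reaches. It is an inference from the queries on this page, not an export of the OrientDB schema. SYNONYM_OF and REFERS_TO are omitted because no query here shows their direction. `EDGE_DIRECTIONS` and `walk_labels` are hypothetical names.

```python
# Edge directions (source label -> target label) inferred from the queries on this page.
EDGE_DIRECTIONS = {
    "ANNOTATES": [("Go", "Gene"), ("Go", "Protein")],
    "CODING": [("Gene", "Protein")],
    "CONTAINS": [("Pathway", "Protein")],
    "CANCER2MIRNA": [("Cancer", "MiRNA")],
    "PRECURSOR_OF": [("MiRNA", "MiRNAmature")],
    "HAS_SNP": [("MiRNAmature", "MiRNASNP")],
    "INTERACTING_GENE": [("Interaction", "Gene")],
    "INTERACTING_MIRNA": [("Interaction", "MiRNAmature")],
    "INTERACTING_SNP": [("Interaction", "MiRNASNP")],
}


def walk_labels(start_label, steps):
    """Return the vertex labels reached by a chain of ("out" | "in", edge label) steps."""
    labels = {start_label}
    for direction, edge in steps:
        pairs = EDGE_DIRECTIONS[edge]
        if direction == "out":      # follow edges from source to target
            labels = {dst for src, dst in pairs if src in labels}
        elif direction == "in":     # follow edges backwards, from target to source
            labels = {src for src, dst in pairs if dst in labels}
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return labels


# Path of the scenario "miRNA functional analysis in cancer": Cancer ... -> Go
print(walk_labels("Cancer", [("out", "CANCER2MIRNA"), ("out", "PRECURSOR_OF"),
                             ("in", "INTERACTING_MIRNA"), ("out", "INTERACTING_GENE"),
                             ("in", "ANNOTATES")]))  # {'Go'}
```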

Templates

Templates are predefined queries, each with a simple form and a description, grouped by category.
After setting parameters, the ready-to-run query is sent to the Gremlin Workbench page for execution.
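
This page does not document how the form inserts parameter values into a template. A minimal sketch, assuming parameters are inlined as Groovy literals: bare identifiers such as `goTerm` or `energy` are replaced, while quoted strings such as `'energy'` are left untouched, and quotes inside user input are escaped. `fill_template` and `groovy_literal` are hypothetical helpers. When queries are sent to a Gremlin Server directly, passing parameters as bindings avoids building query strings at all.

```python
import re

# Matches a single-quoted Groovy string (with backslash escapes) or a bare identifier.
_TOKEN = re.compile(r"'(?:[^'\\]|\\.)*'|[A-Za-z_][A-Za-z0-9_]*")


def groovy_literal(value):
    """Render a Python value as a Gremlin-Groovy literal."""
    if isinstance(value, bool):          # check bool before int (bool is an int subclass)
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + text + "'"


def fill_template(template, params):
    """Replace bare parameter names with literals; quoted strings are left untouched."""
    def substitute(match):
        token = match.group(0)
        if token.startswith("'") or token not in params:
            return token
        return groovy_literal(params[token])
    return _TOKEN.sub(substitute, template)


template = ("g.V().hasLabel('Cancer').has('name', cancerName ).out('CANCER2MIRNA')."
            "dedup().out('PRECURSOR_OF').in('INTERACTING_MIRNA')."
            "has('database','miRanda').has('energy',lt( energy ))")
query = fill_template(template, {"cancerName": "breast cancer", "energy": -20.0})
print(query)
```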

GO Term Genes

Search for Genes annotated with a particular Gene Ontology (GO) term.
 


 g.V().hasLabel('Go').has('name', goTerm ).
 out('ANNOTATES').hasLabel('Gene').order().by('description')
 


Pathway Genes

For a given Pathway, show all Genes associated with it.
 


 g.V().hasLabel('Pathway').has('name', pathwayName ).
 out('CONTAINS').in('CODING').dedup().order().by('description')
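
A traversal that passes through an intermediate vertex can reach the same Gene more than once, for example when two Proteins in a Pathway are coded by one Gene. This is why `dedup()` is useful before `order()`. The in-memory sketch below mimics `out('CONTAINS').in('CODING').dedup().order().by('description')` using a hypothetical `TinyGraph` class and illustrative data. It is not the Gremlin or OrientDB API.

```python
from collections import defaultdict


class TinyGraph:
    """Hypothetical in-memory property graph, used only to mimic the traversal."""

    def __init__(self):
        self.props = {}                     # vertex id -> {"label": ..., properties}
        self.out_adj = defaultdict(list)    # (vertex id, edge label) -> target ids
        self.in_adj = defaultdict(list)     # (vertex id, edge label) -> source ids

    def add_vertex(self, vid, label, **props):
        self.props[vid] = {"label": label, **props}

    def add_edge(self, src, label, dst):
        self.out_adj[(src, label)].append(dst)
        self.in_adj[(dst, label)].append(src)

    def out(self, vids, label):
        return [t for v in vids for t in self.out_adj.get((v, label), [])]

    def in_(self, vids, label):
        return [s for v in vids for s in self.in_adj.get((v, label), [])]


def dedup(vids):
    """Drop repeated ids while keeping first-seen order, like Gremlin's dedup()."""
    return list(dict.fromkeys(vids))


def pathway_genes(graph, pathway_name):
    """Pathway -CONTAINS-> Protein <-CODING- Gene, deduplicated and ordered by description."""
    start = [v for v, p in graph.props.items()
             if p["label"] == "Pathway" and p.get("name") == pathway_name]
    genes = graph.in_(graph.out(start, "CONTAINS"), "CODING")
    return sorted(dedup(genes), key=lambda v: graph.props[v].get("description", ""))


# Illustrative data: two of the pathway's proteins are coded by the same gene.
g = TinyGraph()
g.add_vertex("pw", "Pathway", name="Example pathway")
for pid in ("p1", "p2", "p3"):
    g.add_vertex(pid, "Protein", name=pid.upper())
    g.add_edge("pw", "CONTAINS", pid)
g.add_vertex("g1", "Gene", description="alpha gene")
g.add_vertex("g2", "Gene", description="beta gene")
g.add_edge("g2", "CODING", "p1")
g.add_edge("g1", "CODING", "p2")
g.add_edge("g1", "CODING", "p3")

print(pathway_genes(g, "Example pathway"))  # ['g1', 'g2']
```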
 


Pathway Proteins

For a given pathway, show all proteins.
 


 g.V().hasLabel('Pathway').has('name', pathwayName ).
 out('CONTAINS').order().by('name')
 


Gene GO Term

Search for GO annotations for a particular gene.
 


 g.V().hasLabel('Gene').has('nomenclatureAuthoritySymbol', symbol ).
 in('ANNOTATES').order().by('name')
 


Gene Pathway

For a given Gene, show all associated Pathways.
 


 g.V().hasLabel('Gene').has('nomenclatureAuthoritySymbol', symbol ).
 out('CODING').in('CONTAINS').dedup().order().by('name')
 


Protein GO Term

For a given Protein, return the associated Gene Ontology (GO) terms.
 


 g.V().hasLabel('Protein').has('name', proteinName ).
 in('ANNOTATES').order().by('name')
 


miRNA Cancer

For a given miRNA, return the associated cancers from miRCancer.
 


 g.V().hasLabel('MiRNA').has('name', mirnaName ).
 in('CANCER2MIRNA').dedup().order().by('name')
 


miRNA mature Genes

For a given mature miRNA, return the Genes it targets according to all validated (miRTarBase) interactions.
 


 g.V().hasLabel('MiRNAmature').has('product', mirnaName ).
 in('INTERACTING_MIRNA').has('database','miRTarBase').out('INTERACTING_GENE').
 dedup().order().by('nomenclatureAuthoritySymbol')
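
As the schema and queries suggest, each interaction is stored as an Interaction vertex. It is linked to a MiRNAmature, a Gene and, for SNP-related records, a MiRNASNP through INTERACTING_* edges, and its `database` property records the source. Presumably this lets a single record carry source-specific attributes (energy, mirTarBaseId, snpEnergy). The sketch below flattens those vertices into hypothetical records and reproduces the template's `has('database','miRTarBase')` filter together with `dedup()` and ordering. The names and data are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionRecord:
    """Hypothetical flattened view of one Interaction vertex and its edges."""
    mirna_mature: str   # 'product' of the MiRNAmature reached via INTERACTING_MIRNA
    gene_symbol: str    # nomenclatureAuthoritySymbol of the Gene via INTERACTING_GENE
    database: str       # source database, e.g. 'miRanda' or 'miRTarBase'


def validated_targets(records, mirna_name):
    """Gene symbols linked to mirna_name by miRTarBase interactions, deduplicated and sorted."""
    return sorted({r.gene_symbol for r in records
                   if r.mirna_mature == mirna_name and r.database == "miRTarBase"})


records = [
    InteractionRecord("miR-example-5p", "GENE_B", "miRTarBase"),
    InteractionRecord("miR-example-5p", "GENE_A", "miRTarBase"),
    InteractionRecord("miR-example-5p", "GENE_A", "miRTarBase"),  # second record, same gene
    InteractionRecord("miR-example-5p", "GENE_C", "miRanda"),     # predicted only
]
print(validated_targets(records, "miR-example-5p"))  # ['GENE_A', 'GENE_B']
```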
 


Scenarios

Scenarios are predefined complex queries, provided as examples of how BioGraph and Gremlin can help in analyzing specific, non-trivial problems.
After setting parameters, the ready-to-run query is sent to the Gremlin Workbench page for execution.

miRNA functional analysis in cancer

The query investigates the functional role of miRNAs in cancer pathology.
Wild-type miRNAs that are differentially expressed (DE) in a specific cancer are examined, through interaction analysis, as regulators of their gene targets. An energy filter is then applied to the free energy score of the binding site predicted by miRanda, keeping only interactions below the chosen threshold. Because lower (more negative) free energy indicates more stable binding, this highlights only strongly bound miRNA-target interactions.
The resulting targets are then analyzed through their GO annotations (GO enrichment), to reveal the functional annotations that link these molecules to the selected cancer.
 


 g.V().hasLabel('Cancer').has('name', cancerName ).
 out('CANCER2MIRNA').dedup().out('PRECURSOR_OF').in('INTERACTING_MIRNA').
 has('database','miRanda').has('energy',lt( energy )).
 out('INTERACTING_GENE').dedup().in('ANNOTATES').dedup()
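
The query itself returns the set of GO terms annotating the filtered targets. Deciding which terms are over-represented requires a statistical test. One common choice is a one-sided hypergeometric test of each term against a background gene set. This is an assumption here, not necessarily the procedure used by BioGraph's authors. A minimal sketch with illustrative data follows; it applies no multiple-testing correction, and `go_enrichment` is a hypothetical helper.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_enrichment(targets, annotations, background):
    """Rank GO terms by one-sided hypergeometric p-value (no multiple-testing correction).

    targets: set of gene ids returned by the scenario query
    annotations: GO id -> set of annotated gene ids
    background: set of all gene ids considered (the "universe")
    """
    targets = targets & background
    N, n = len(background), len(targets)
    pvalues = {}
    for go_id, genes in annotations.items():
        genes = genes & background
        k = len(genes & targets)
        if k > 0:
            pvalues[go_id] = hypergeom_sf(k, N, len(genes), n)
    return dict(sorted(pvalues.items(), key=lambda item: item[1]))


# Illustrative data: 10 background genes, 3 targets.
background = {f"g{i}" for i in range(10)}
targets = {"g0", "g1", "g2"}
annotations = {
    "GO:A": {"g0", "g1", "g2"},                    # all 3 targets, nothing else
    "GO:B": {"g0", "g5", "g6", "g7", "g8", "g9"},  # 1 target out of 6 genes
}
result = go_enrichment(targets, annotations, background)
print(result)  # GO:A comes first, with p = 1/120
```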
 

miRNA-SNP functional analysis in cancer

The query helps reveal the functional significance of single nucleotide polymorphisms (SNPs) in miRNAs in cancer pathology.
Starting from a specific cancer type, the SNPs in the mature miRNAs derived from miRNAs linked to that cancer are selected. They are then used to retrieve the corresponding miRNA-target interactions, filtered by a free energy threshold (snpEnergy).
The resulting target genes are then used to retrieve the GO annotations related to the SNPs of DE miRNAs and to the cancer.
 


 g.V().hasLabel('Cancer').has('name', cancerName ).
 out('CANCER2MIRNA').dedup().out('PRECURSOR_OF').out('HAS_SNP').
 in('INTERACTING_SNP').has('snpEnergy',lt( snpEnergy )).
 out('INTERACTING_GENE').dedup().in('ANNOTATES').dedup()
 


Cancer involved miRNAs by pathway

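Starting from a specific pathway, finds the miRNAs that target the pathway genes (miRanda interactions below an energy threshold) and that have a given expression profile in cancer, such as up-regulated. The query returns the cancers linked to those miRNAs through that profile.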
 


 g.V().hasLabel('Pathway').has('name', pathwayName ).out('CONTAINS').in('CODING').
 in('INTERACTING_GENE').has('database', 'miRanda').has('energy',lt( energy )).
 out('INTERACTING_MIRNA').dedup().in('PRECURSOR_OF').inE('CANCER2MIRNA').has('profile', profile ).outV().dedup()
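
The final steps `inE('CANCER2MIRNA').has('profile', profile ).outV()` return the Cancer vertices at the other end of the matching edges, not the miRNAs. The sketch below contrasts the two outcomes on hypothetical flattened records. It applies the miRanda energy filter first, then returns either the miRNAs that have the given profile or the cancers linked through it. For simplicity, interactions are keyed directly by precursor miRNA, which collapses the `in('PRECURSOR_OF')` hop. In Gremlin, keeping the miRNAs instead should be possible with a filter such as `...in('PRECURSOR_OF').where(inE('CANCER2MIRNA').has('profile', profile)).dedup()`. All names and data are illustrative.

```python
def targeting_mirnas(pathway_genes, interactions, energy_threshold):
    """Precursor miRNAs with a miRanda interaction below energy_threshold on a pathway gene.

    interactions: (gene, precursor miRNA, database, energy) tuples (hypothetical layout;
    the mature-to-precursor hop is assumed to be already resolved).
    """
    return {mirna for gene, mirna, database, energy in interactions
            if gene in pathway_genes and database == "miRanda" and energy < energy_threshold}


def mirnas_with_profile(mirnas, cancer_edges, profile):
    """miRNAs having at least one CANCER2MIRNA edge with the given profile."""
    return sorted({m for cancer, m, p in cancer_edges if m in mirnas and p == profile})


def cancers_with_profile(mirnas, cancer_edges, profile):
    """Out-vertices (cancers) of the matching CANCER2MIRNA edges, as outV().dedup() returns."""
    return sorted({cancer for cancer, m, p in cancer_edges if m in mirnas and p == profile})


# Illustrative data only.
genes = {"GENE_A", "GENE_B"}
interactions = [
    ("GENE_A", "mir-x", "miRanda", -25.0),
    ("GENE_B", "mir-y", "miRanda", -10.0),    # too weak for the threshold below
    ("GENE_A", "mir-z", "miRTarBase", None),  # not a miRanda prediction
]
cancer_edges = [  # (cancer, miRNA, profile)
    ("cancer 1", "mir-x", "up"),
    ("cancer 2", "mir-x", "down"),
    ("cancer 3", "mir-y", "up"),
]
mirnas = targeting_mirnas(genes, interactions, energy_threshold=-20.0)
print(mirnas_with_profile(mirnas, cancer_edges, "up"))   # ['mir-x']
print(cancers_with_profile(mirnas, cancer_edges, "up"))  # ['cancer 1']
```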
 


Common pathways between two genes

Starting from two given genes, finds all of their pathways. If the genes share any pathways, these common pathways are highlighted immediately for the user.
 


 g.V().hasLabel('Gene').choose(values('nomenclatureAuthoritySymbol')).
 option( gene1 ,__.as('a')).option( gene2 ,__.as('b')).
 out('CODING').in('CONTAINS')
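
The query returns the pathways of both genes, tagging the starting genes as `a` and `b`, and the shared pathways are then picked out in the viewer. The same comparison can be sketched as a set intersection over a hypothetical gene-to-pathways mapping. Such a mapping could come, for example, from running the Gene Pathway template once per gene. `compare_pathways` and the data are illustrative.

```python
def compare_pathways(gene_to_pathways, gene1, gene2):
    """Split the pathways of two genes into shared and gene-specific sorted lists."""
    p1 = gene_to_pathways.get(gene1, set())
    p2 = gene_to_pathways.get(gene2, set())
    return {
        "common": sorted(p1 & p2),   # pathways that would be highlighted
        gene1: sorted(p1 - p2),      # pathways of gene1 only
        gene2: sorted(p2 - p1),      # pathways of gene2 only
    }


# Hypothetical results of the Gene Pathway template for two genes.
gene_to_pathways = {
    "GENE_A": {"Pathway 1", "Pathway 2"},
    "GENE_B": {"Pathway 2", "Pathway 3"},
}
print(compare_pathways(gene_to_pathways, "GENE_A", "GENE_B"))
# {'common': ['Pathway 2'], 'GENE_A': ['Pathway 1'], 'GENE_B': ['Pathway 3']}
```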
 



Data Sources

List of data sources used in the current version of BioGraph.

Data source (version/date): URLs
NCBI Entrez Gene (09/05/2017):
  ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz
  ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2go.gz
UniProt Swiss-Prot (2017_04):
  ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.xml.gz
  ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/by_organism/HUMAN_9606_idmapping_selected.tab.gz
HGNC (10/05/2017):
  ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/tsv/hgnc_complete_set.txt
  ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/tsv/locus_groups/protein-coding_gene.txt
  ftp://ftp.ebi.ac.uk/pub/databases/genenames/new/tsv/locus_groups/non-coding_RNA.txt
Gene Ontology (09/05/2017):
  http://archive.geneontology.org/latest-termdb/go_daily-termdb.obo-xml.gz
Reactome (v59):
  http://www.reactome.org/download/current/pathway2summation.txt
  http://www.reactome.org/download/current/gene_association.reactome
  http://www.reactome.org/download/current/miRBase2Reactome_All_Levels.txt
miRBase (Release 21):
  ftp://mirbase.org/pub/mirbase/CURRENT/miRNA.dat.gz
miRCancer (December 2016):
  http://mircancer.ecu.edu/downloads/miRCancerDecember2016.txt
microRNA.org (August 2010):
  http://cbio.mskcc.org/microrna_data/human_predictions_S_C_aug2010.txt.gz
miRTarBase (6.1):
  http://mirtarbase.mbc.nctu.edu.tw/cache/download/6.1/hsa_MTI.xlsx
miRNASNP (2.0):
  http://bioinfo.life.hust.edu.cn/miRNASNP2/download/snp_in_human_miRNA_seed_region.txt
  http://bioinfo.life.hust.edu.cn/miRNASNP2/download/miRNA_targets_gain_by_SNPs_in_seed_regions.txt
