Introduction
Overview
The Ignet (Integrated gene network) project is a centrality- and ontology-based literature discovery system for analyzing and visualizing biological gene interaction networks mined from the PubMed literature. Currently Ignet focuses on the literature mining of human gene interaction networks. Ignet is built on a literature mining strategy that we named “CONDL”, which stands for Centrality and Ontology-based Network Discovery using Literature data. Ignet is co-developed by three groups led by Dr. Yongqun “Oliver” He at the University of Michigan, USA, Dr. Junguk Hur at the University of North Dakota, USA, and Dr. Arzucan Özgür at Bogazici University. The details of CONDL and its case study applications are described in Ozgur et al., 2011 and Hur et al., 2012.
CONDL Strategy
Our CONDL strategy was initially applied to the literature mining of the Interferon-gamma (IFN-γ; Gene symbol: IFNG) and vaccine-mediated gene interaction networks. IFNG is vital in immune defense against bacterial and viral infections and tumors. It also regulates various immune responses that are often critical for the induction of protective immunity by vaccines. We first used a centrality-based literature discovery approach to study the IFN-γ and vaccine-mediated gene interaction network. That study identified a generic IFNG network containing 1,060 genes and 26,313 interactions among them (Reference: Ozgur et al., 2010). As a subset of this generic IFN-γ network, the vaccine-specific subnetwork contains 102 genes and 154 interactions. However, this literature mining strategy misses sentences that include specific vaccine names (e.g., BCG) without mentioning the words “vaccine”, “vaccination”, or their derivatives. Therefore, we used the hierarchy of the Vaccine Ontology (VO) to obtain specific vaccine names and their relations, and used them for further literature mining, which identified more results (Reference: Ozgur et al., 2011). The CONDL strategy was proposed on the basis of this work. Later we used the same CONDL strategy to study the fever- and vaccine-specific human gene interaction networks (Reference: Hur et al., 2012).
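The gain from using the VO hierarchy can be illustrated with a small sketch. It is only an illustration under stated assumptions: the mini-hierarchy in `CHILDREN`, the helper names `descendants` and `mentions_vaccine`, and the example sentences are all hypothetical, and only the vaccine name BCG comes from the text above. The actual Ignet matching rules may differ.

```python
import re

# Hypothetical mini-hierarchy standing in for VO; only "BCG" is named in the text.
CHILDREN = {
    "vaccine": ["Mycobacterium tuberculosis vaccine"],
    "Mycobacterium tuberculosis vaccine": ["BCG"],
}


def descendants(term, children):
    """Collect the names of all terms below `term` in the hierarchy."""
    names = []
    stack = list(children.get(term, []))
    while stack:
        name = stack.pop()
        names.append(name)
        stack.extend(children.get(name, []))
    return names


def mentions_vaccine(sentence, specific_names=()):
    """True if the sentence contains 'vaccin...' or one of the specific names."""
    if re.search(r"\bvaccin", sentence, re.IGNORECASE):
        return True
    # Specific names are matched case-sensitively as whole words.
    return any(
        re.search(r"\b" + re.escape(name) + r"\b", sentence)
        for name in specific_names
    )


VACCINE_NAMES = descendants("vaccine", CHILDREN)

# Illustrative sentences, not taken from PubMed.
SENTENCES = [
    "IFNG production increased after vaccination.",
    "BCG induced IFNG and TNF expression.",
    "IFNG interacts with IL2.",
]

for sentence in SENTENCES:
    keyword_only = mentions_vaccine(sentence)
    with_ontology = mentions_vaccine(sentence, VACCINE_NAMES)
    print(f"{keyword_only!s:5} {with_ontology!s:5} {sentence}")
```

In this toy run the second sentence is missed by the keyword-only filter and recovered once the names below "vaccine" are added, which is the situation described above.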
Centrality Analysis
Our CONDL approach integrates text mining with network centrality analysis to study various gene interaction networks. It uses natural language processing (NLP) and a machine learning-based method to automatically extract gene interaction networks from the biomedical literature. To rank the genes in the literature-mined networks and identify the most important ones, we analyze the networks from a centrality perspective. We calculate four different types of centralities (see reference: PMC2718658):
- Degree centrality: the number of neighbors of a node
- Eigenvector centrality: a function of the centralities of the node's neighbors
- Closeness centrality: the inverse of the sum of the distances from the node to the other nodes in the network
- Betweenness centrality: the proportion of the shortest paths between all pairs of nodes that pass through the node of interest
Different centralities measure different kinds of importance. For example, in betweenness centrality a node is considered important if it occurs on many shortest paths between other nodes, whereas in degree centrality a node is considered important if it is connected to many other nodes.
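The four measures can be sketched in a few lines of plain Python. This is a minimal illustration, not the Ignet implementation: the toy edge list and the function names are hypothetical, the network is assumed to be undirected, unweighted, and connected with at least three nodes, betweenness is normalized by the number of node pairs, and eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity (an assumption made here so the iteration also converges on bipartite graphs).

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network; these gene pairs are illustrative, not Ignet results.
EDGES = [("IFNG", "TNF"), ("IFNG", "IL2"), ("IFNG", "IL10"),
         ("TNF", "IL6"), ("IL6", "IL10")]


def build_graph(edges):
    """Build an undirected adjacency map from a list of gene pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bfs(graph, source):
    """Return (distances, shortest-path counts) from source to every node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def centralities(graph, iterations=100):
    """Compute degree, closeness, betweenness, and eigenvector centrality."""
    nodes = sorted(graph)
    n = len(nodes)
    paths = {s: bfs(graph, s) for s in nodes}

    degree = {v: len(graph[v]) for v in nodes}
    closeness = {v: 1.0 / sum(paths[v][0].values()) for v in nodes}

    # Fraction of shortest s-t paths that pass through v, summed over all pairs.
    betweenness = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = paths[s]
        for v in nodes:
            if v in (s, t):
                continue
            dist_v, sigma_v = paths[v]
            if dist_s[v] + dist_v[t] == dist_s[t]:
                betweenness[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    pair_count = (n - 1) * (n - 2) / 2
    betweenness = {v: b / pair_count for v, b in betweenness.items()}

    # Power iteration on (A + I); the shift leaves the leading eigenvector unchanged.
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[u] for u in graph[v]) for v in nodes}
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}

    return {"degree": degree, "closeness": closeness,
            "betweenness": betweenness, "eigenvector": score}


RESULT = centralities(build_graph(EDGES))
for measure, values in RESULT.items():
    ranked = sorted(values, key=values.get, reverse=True)
    print(measure, ranked)
```

On this toy network IFNG ranks first under all four measures, while IL2, which lies on no shortest path between other genes, has a betweenness of zero.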
Ontology Integration
One novel feature of Ignet is that its development is accompanied by our development of the Interaction Network Ontology (INO). INO contains more than 800 interaction keywords organized in a hierarchical structure. These terms are aligned with the Basic Formal Ontology (BFO). An example of an INO term is “increase”, whose parent term in INO is “positive regulation”, which is in turn a child term of “regulation” and “interaction”. In INO, 21 words are listed as synonyms for the term “increase”, for example increased, increasing, elevated, and enhanced. These synonyms are all used for literature retrieval, enabling comprehensive coverage of the “increase” interaction type.
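A small sketch may help show how synonyms and the term hierarchy work together during retrieval. It is a toy model, not the INO data or the Ignet code: `PARENT` and `SYNONYMS` contain only the “increase” example from the text (four of its 21 synonyms), the ancestors are assumed to form a single chain, the function names are hypothetical, and only single-word keywords are matched.

```python
import re

# Toy excerpt modeled on the "increase" example; the real INO has 800+ keywords.
# Assumption: the ancestors of "increase" form a single chain.
PARENT = {
    "increase": "positive regulation",
    "positive regulation": "regulation",
    "regulation": "interaction",
}
SYNONYMS = {"increase": ["increased", "increasing", "elevated", "enhanced"]}


def keyword_index(synonyms):
    """Map every keyword (the term itself and each synonym) to its term."""
    index = {}
    for term, words in synonyms.items():
        index[term] = term
        for word in words:
            index[word] = term
    return index


def ancestors(term, parent):
    """Walk up the hierarchy and return the chain of parent terms."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def match_interactions(sentence, index):
    """Return the sorted terms whose keywords occur in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return sorted({index[token] for token in tokens if token in index})


INDEX = keyword_index(SYNONYMS)
for term in match_interactions("IFNG enhanced TNF expression.", INDEX):
    print(term, "->", " -> ".join(ancestors(term, PARENT)))
```

Because each matched keyword resolves to a term with known ancestors, a sentence found through “enhanced” can also be retrieved by a query for the broader “positive regulation” or “regulation” types.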
Development History
The Ignet project was initiated in response to community requests for a web server to store analyzed data and provide a user-friendly interface for querying and visualizing gene interaction networks. We initially developed Ignet for analyses of IFNG gene networks, and later extended it to mine all possible human gene interactions based on the literature. The Ignet results are updated periodically. In each dynamic update, Ignet retrieves and stores PubMed abstract data, extracts gene-gene relationships, executes centrality analysis, and stores the results in the Ignet database for users to query.