Ortholog-link Setup Instructions

Context

Starting with Pathway Tools release 9.5, the comparative genome browser can display orthologs between several organisms. For this to work, the orthologs must have been precomputed, and the orthology relation must have been recorded in so-called ortholog-links. An ortholog-link is one type of dblink, and dblinks are stored in the DBLINKS slot. However, there are two distinct ways of storing ortholog-links: as DBLINKS on gene frames, or in a special MySQL server. The MySQL approach is advantageous when many organisms need to be compared, but it is more complicated to set up.

For most PGDBs served from http://biocyc.org/, ortholog-links are stored on a separate MySQL server, where they are accessible to the BioCyc WWW server (and to SRI-internal development images).

Pathway Tools users outside of SRI must precompute the ortholog data themselves or obtain it from another source, because SRI does not yet have a mechanism for distributing ortholog data.

Ortholog-link Server Setup Instructions

This section describes how to load the ortholog-link data into the MySQL server, which happens in two stages.

Stage 1. Dumping out Ortholog-link Flatfiles

One or more flatfiles containing the ortholog-link data need to be created. These files will be loaded into MySQL in Stage 2. The file format is simple: 5 tab-delimited columns. Each ortholog-link is a link between one gene in one PGDB and another gene in another PGDB. Each such link needs to appear only once in the total set of flatfiles, because the retrieval query searches in both directions. This halves the number of links that have to be put in flatfiles and stored in MySQL.

The 5 columns are: GeneID1, GeneID2, OrgID1, OrgID2, PValue. GeneID1 and GeneID2 are the frame IDs of gene frames in their respective PGDBs and need to be unique within those PGDBs. OrgID1 and OrgID2 are the unique IDs of the PGDBs. PValue is a double-precision floating-point number holding the P-value of the BLAST score. It is effectively optional, as Pathway Tools does not currently use the PValue information for anything. An example line from a flatfile looks like:

CC0008 CBU_0001 CAULO CBUR227377 .00000000000000000000000000000000000000000000000000000000000023
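
The one-direction rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not part of Pathway Tools: the helper name format_ortholog_links and the shape of its input are hypothetical, and the P-value is passed through as text exactly as supplied.

```python
def format_ortholog_links(links):
    """Return flatfile lines, keeping each undirected link only once.

    `links` is an iterable of (gene1, gene2, org1, org2, pvalue) tuples,
    where every field is a string. This helper is hypothetical and only
    illustrates the 5-column, tab-delimited layout described above.
    """
    seen = set()
    lines = []
    for gene1, gene2, org1, org2, pvalue in links:
        # Order-independent key: A->B and B->A describe the same link.
        key = frozenset([(org1, gene1), (org2, gene2)])
        if key in seen:
            continue
        seen.add(key)
        lines.append("\t".join([gene1, gene2, org1, org2, pvalue]))
    return lines


if __name__ == "__main__":
    pvalue = ".00000000000000000000000000000000000000000000000000000000000023"
    links = [
        ("CC0008", "CBU_0001", "CAULO", "CBUR227377", pvalue),
        # Same link in the reverse direction; it is skipped.
        ("CBU_0001", "CC0008", "CBUR227377", "CAULO", pvalue),
    ]
    for line in format_ortholog_links(links):
        print(line)
```

The key combines the organism ID with the gene ID because gene frame IDs are only unique within their own PGDB.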

Stage 2. Populating MySQL from the Ortholog-link Flatfiles

  1. Ensure that a MySQL server is running and that your account has permission to create a table in a database and to load data (see MySQL server details below).
  2. Create the "orthologs" database on your MySQL server:
    mysql> create database orthologs;
    NOTE: Whatever name you give the ortholog database, 
          set that name in ptools-init.dat via the Ortho-RDBMS-Database-Name 
          configuration directive.  
  3. Start up a Pathway Tools image.
  4. Ensure that the ec::*ortholog-link-host* variable is set correctly, pointing to the ortholog-link server. It is set by the parameter called Ortho-RDBMS-Server-Hostname in the ptools-init.dat file, along with the related parameters described under Pathway Tools configuration below.
  5. The ortholog-link data is stored in one SQL table called Orthologs. If this table already exists from a prior version of the data, it needs to be dropped by running the following at the LISP prompt:
    (connect-to-ortholog-link-db-if-needed)
    (dbi.mysql:sql "DROP TABLE Orthologs" :db *ortholog-link-db*)
    
  6. Create the Orthologs table, populate from the ortholog-link flatfiles, and build the indices, by running the following at the LISP prompt:
    (init-ortho-link-db "/var/ortholog-link-flat-files/")
    
    Replace the example path "/var/ortholog-link-flat-files/" with the directory that contains the flatfiles.

    This could take several hours to run to completion.
    For the 189 PGDBs of the 9.5 release, this took 37 minutes, running on cumin. (kr:Nov-6-2005)
    For 400 PGDBs of the 11.5 release, it took over 7 hours, running on baharat. (kr:Oct-1-2007)
  7. The ortholog-link server should now be ready to use.
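
Because step 6 can run for hours, it may be worth checking the flatfiles before loading them. The following Python sketch is one possible pre-flight check, not something Pathway Tools provides: validate_flatfile_lines is a hypothetical helper that flags lines without exactly 5 tab-delimited columns, empty IDs, unparseable P-values, and links that occur more than once (in either direction). It assumes that a non-empty fifth column should parse as a number; whether the loader accepts an empty P-value is not stated here, so an empty one is simply left unflagged.

```python
def validate_flatfile_lines(lines):
    """Return a list of (line_number, problem) pairs for suspicious lines.

    `lines` is any iterable of text lines, e.g. an open flatfile.
    Hypothetical helper; the checks mirror the format described in Stage 1.
    """
    problems = []
    seen = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        cols = line.split("\t")
        if len(cols) != 5:
            problems.append((lineno, f"expected 5 columns, found {len(cols)}"))
            continue
        gene1, gene2, org1, org2, pvalue = cols
        if not (gene1 and gene2 and org1 and org2):
            problems.append((lineno, "empty gene or organism ID"))
            continue
        if pvalue:
            try:
                float(pvalue)
            except ValueError:
                problems.append((lineno, f"P-value is not a number: {pvalue!r}"))
                continue
        # Links are undirected, so A->B and B->A count as duplicates.
        key = frozenset([(org1, gene1), (org2, gene2)])
        if key in seen:
            problems.append((lineno, "link already listed"))
        seen.add(key)
    return problems


if __name__ == "__main__":
    sample = [
        "CC0008\tCBU_0001\tCAULO\tCBUR227377\t.0000023\n",
        "CBU_0001\tCC0008\tCBUR227377\tCAULO\t.0000023\n",  # reverse duplicate
        "CC0008\tCBU_0001\tCAULO\n",                         # too few columns
    ]
    for lineno, problem in validate_flatfile_lines(sample):
        print(f"line {lineno}: {problem}")
```

To check duplicates across the whole set of flatfiles, the lines of all files would need to be passed through a single call.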

MySQL server details

  1. The mysql server usually runs as its own user (mysql). Ensure that:
    • Your ortholog data files AND directories are accessible by the mysql user/group.
    • Newer Linux distros (in particular Debian-based ones) use a security feature called AppArmor that limits which files and directories services like mysql can access. The AppArmor profile for mysql is usually stored in: /etc/apparmor.d/usr.sbin.mysqld
      Review and, if necessary, adjust the file and directory paths in that profile so that mysql has access to your data files.
  2. Ensure you have enough free disk space for your ortholog data on the mysql server. One of our MySQL servers once ran out of disk space while the indices were being built. The problem was that it ended up hanging forever and never returned any kind of error message. Running
    df
    should indicate whether a disk partition is 100% full. Also, the MySQL logs, stored at /var/log/mysql/, are likely to contain a disk space error message. However, ordinary users do not have read permission for these logs.
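  3. Ensure the user account you use to load the data has sufficient permissions to load data. Currently, our mysql interface only supports server-side loading of data files, so the account needs the FILE privilege. The statement below uses the older GRANT ... IDENTIFIED BY form; MySQL 8.0 and later no longer accept IDENTIFIED BY in GRANT, so on those versions create the account with CREATE USER first and then grant the privilege.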
    grant file on *.* to dbuserid@localhost identified by 'dbpassword';
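
The disk-space check from item 2 can also be scripted, for example to run it before starting a long load. This is a minimal sketch using Python's standard shutil.disk_usage. The path /var/lib/mysql is an assumption about where the MySQL data directory lives (a common default on Linux, but check your own server), and the 90% warning threshold is arbitrary.

```python
import shutil


def percent_used(path):
    """Return how full the partition holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Assumed data directory; adjust to your server's actual location.
    data_dir = "/var/lib/mysql"
    threshold = 90.0  # arbitrary warning level
    try:
        used = percent_used(data_dir)
    except OSError as err:
        print(f"cannot check {data_dir}: {err}")
    else:
        status = "WARNING" if used >= threshold else "ok"
        print(f"{status}: partition holding {data_dir} is {used:.1f}% full")
```

The figure may differ slightly from the percentage that df reports, because filesystems can reserve blocks that df accounts for differently.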

Pathway Tools configuration

To use the MySQL database for ortholog queries, you must set a few Pathway Tools parameters in the ptools-init.dat configuration file. These are the parameters you need to configure:
  • Ortho-RDBMS-Server-Hostname XXXXX (hostname of your ortholog-link MySQL server; see stage 2, step #4 above)
  • Ortho-RDBMS-Server-Port 3306 (default mysql port, ask your DBA if you're not sure)
  • Ortho-RDBMS-Database-Name XXXXX (whatever name you called your ortholog database in stage 2, step #2 above).
  • Ortho-RDBMS-Username XXXXX (username to access your mysql DB)
  • Ortho-RDBMS-Password XXXXX (password you use to access your mysql DB)
  • Get-Orthologs-From-SRI N (If you're behind a firewall, you'll want to set this to "N", otherwise, each ortholog query will attempt to query SRI's public ortholog database also.)
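
A quick way to catch a forgotten setting is to scan ptools-init.dat for the ortholog-related parameters named in this document. The Python sketch below is hypothetical and rests on an assumption about the file layout that is not confirmed here: each setting is taken to sit on its own line as a parameter name followed by whitespace and a value, and lines starting with "#" are treated as comments. Adjust the parsing if your file differs.

```python
REQUIRED = (
    "Ortho-RDBMS-Server-Hostname",
    "Ortho-RDBMS-Server-Port",
    "Ortho-RDBMS-Database-Name",
    "Ortho-RDBMS-Username",
    "Ortho-RDBMS-Password",
    "Get-Orthologs-From-SRI",
)


def parse_settings(text):
    """Parse 'Name value' lines into a dict (assumed layout, see above)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and assumed comment lines
        parts = line.split(None, 1)
        if len(parts) == 2:
            settings[parts[0]] = parts[1].strip()
    return settings


def missing_settings(text):
    """Return the required parameter names that have no value in `text`."""
    found = parse_settings(text)
    return [name for name in REQUIRED if name not in found]


if __name__ == "__main__":
    sample = """\
# example fragment, values are placeholders
Ortho-RDBMS-Server-Port 3306
Ortho-RDBMS-Database-Name orthologs
Get-Orthologs-From-SRI N
"""
    for name in missing_settings(sample):
        print(f"not set: {name}")
```

In practice the text would come from reading your own ptools-init.dat rather than from an inline sample.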

Please let us know if you run into trouble with any of this, and we will help guide you through it. Very few of our users have experimented with their own ortholog-links, so the setup is not very user-friendly yet.