Installation of the UCSC Genome Browser on a local server

A. Preface and MySQL configuration


The UCSC Genome Browser is one of the most essential tools in genomics research, and its value keeps growing along with the current explosion of Next Generation Sequencing data. Installing it is not a mainstream task. It requires a lot of patience and somewhat more than basic knowledge of the Linux environment and MySQL. Before you try, make sure that you know how to install Linux packages (including from source), how to perform a basic MySQL and Apache setup, and how to run Perl and shell scripts. This guide is not exactly a step-by-step procedure, as it often refers to external sources, blogs and wikis found around the web. Building on the work of others, I tried to install as customizable a version as possible on my server, to be used by several labs at the institution where I currently work.

My installation was performed on an Ubuntu 12.04 LTS Server; adjust it for your distribution. Throughout this guide we assume that our base storage location is /media/HD2/, so you will often see the shell variable $STORAGE, set with export STORAGE="/media/HD2". We also assume a temporary directory, $TEMP, which is /tmp by default.

If you have MySQL >= 5.5 (the default in Ubuntu >= 12.04), you must recompile it from source in order to enable the
LOAD DATA LOCAL INFILE
MySQL command, which is disabled by default in more recent versions. To this end, you can follow the instructions here. In the cmake command, add:
-DWITH_SSL=yes -DMYSQL_UNIX_ADDR=/var/run/mysqld/mysqld.sock -DWITH_INNOBASE_STORAGE_ENGINE=1
Then, edit the /etc/mysql/my.cnf file, comment out all the ssl options under [mysqld] and
change
lc-messages-dir = /usr/share/mysql
to
lc-messages-dir = /usr/local/mysql/share
While in my.cnf, add the following lines under [mysqld]
key_buffer              = 1024M
max_allowed_packet      = 64M
thread_stack            = 512K
thread_cache_size       = 32
table_cache             = 1024
query_cache_limit       = 16M
query_cache_size        = 1024M
sort_buffer_size        = 16M
read_buffer_size        = 16M
read_rnd_buffer_size    = 32M
myisam_sort_buffer_size = 512M
bulk_insert_buffer_size = 1024M
join_buffer_size        = 512M
innodb_flush_log_at_trx_commit  = 2
innodb_log_buffer_size  = 64M
innodb_log_file_size    = 512M
innodb_buffer_pool_size = 32768M # Watch this as I have 64GB of RAM!
innodb_thread_concurrency = 16
innodb_flush_method=O_DIRECT
the following under [myisamchk]
key_buffer              = 1024M
sort_buffer_size        = 512M
read_buffer             = 64M
write_buffer            = 64M
and the following under [isamchk]
key_buffer              = 1024M
sort_buffer_size        = 512M
read_buffer             = 64M
write_buffer            = 64M
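Because the same key (for example key_buffer) appears under [mysqld], [myisamchk] and [isamchk], it is easy to put a value into the wrong section. The following is a minimal sketch of a checker, using a hypothetical helper called ini_get that reads one key from one section. It assumes simple key = value lines and compares option names literally, whereas MySQL itself treats dashes and underscores in option names as equivalent.

```shell
#!/usr/bin/env bash
# Print the value of KEY in [SECTION] of an INI-style file such as my.cnf.
# Sketch: expects "key = value" lines, strips '#'/';' comments, compares names
# literally and returns 1 if the key is not present in that section.
ini_get() {
  local file="$1" section="$2" key="$3"
  awk -v want="[$section]" -v key="$key" '
    { sub(/[#;].*/, "") }                                # drop comments
    /^[ \t]*\[/ { gsub(/[ \t]/, ""); insec = ($0 == want); next }
    insec && index($0, "=") {
      k = substr($0, 1, index($0, "=") - 1); gsub(/[ \t]/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1); gsub(/^[ \t]+|[ \t]+$/, "", v)
        found = 1; val = v                               # last occurrence wins
      }
    }
    END { if (found) print val; else exit 1 }
  ' "$file"
}
```

For example, `ini_get /etc/mysql/my.cnf mysqld innodb_buffer_pool_size` should print 32768M once the lines above are in place. A non-zero exit status means the key is missing from that section.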
Before starting the newly compiled MySQL server, follow EXACTLY the instructions here (and a friendlier version here) to properly reconfigure InnoDB for storing and processing bigger bulk imports. This is mandatory for faster operation and faster import of custom tracks. You should also lock the MySQL version you just installed so that it is not affected by Ubuntu's update system. This is quite easy, and a little googling will show you how.

B. Installation of the UCSC Genome Browser web application and session system


This section contains instructions on how to install the Genome Browser application only. The visualization application, the session system and the other UCSC applications (e.g. the Table Browser) are independent of the underlying databases that hold the genomic features. This section assumes basic knowledge of Apache, of installing packages from source and of MySQL administration.
  1. Create $STORAGE/gbdb and $STORAGE/genomebrowser directories
    sudo mkdir $STORAGE/gbdb
    sudo mkdir $STORAGE/genomebrowser
  2. Fetch the kent source tree to a $STORAGE/kent directory
    sudo mkdir $STORAGE/kent
    sudo git clone git://genome-source.cse.ucsc.edu/kent.git $STORAGE/kent
  3. Copy the contents of $STORAGE/kent/src/product/scripts to $STORAGE/scripts
    sudo mkdir $STORAGE/scripts
    sudo cp -r $STORAGE/kent/src/product/scripts/. $STORAGE/scripts/
  4. Open the Synaptic package manager and install the libmysqlclient-dev package, and generally the other libmysql development packages, to get the header files. At this point, be careful not to interfere with the new MySQL installation of section A. It might take some time playing around, but it should generally work on the first try. If not, perform this step before installing MySQL from source in section A.
  5. Optionally, enable SSL in MySQL; see here for detailed instructions. Fix the AppArmor profile by following the instructions here.
  6. Install SAMtools from source from here, or look for the samtools package in the Ubuntu Synaptic application.
  7. Edit both your account's .bashrc file and /root/.bashrc, and add the line
    export MACHTYPE=x86_64
    (replace x86_64 with your machine's architecture, which can be found with uname -p or uname -m), then reload them:
    source ~/.bashrc
    su # ...and then your password
    source ~/.bashrc
  8. Edit $STORAGE/scripts/browserEnvironment.txt and set the following variables (the values below are mine). Define STORAGE in the file itself, because the scripts are run through sudo, which does not pass your shell's variables on:
    export STORAGE="/media/HD2"
    export KENTHOME=$STORAGE"/kenthome/"
    export kentSrc=$STORAGE"/kent"
    export GBDB=$STORAGE"/gbdb"
    export BROWSERHOME=$STORAGE"/genomebrowser"
    export HGSQL="mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD"
    #export MYSQLLIBS="/usr/lib/x86_64-linux-gnu/libmysqlclient.a -lz"
    export MYSQLLIBS="/usr/local/mysql/lib/libmysqlclient.a -lz"
    export MYSQLINC="/usr/local/mysql/include"
    export PNGLIB="/usr/lib/x86_64-linux-gnu/libpng.a"
    export PNGINCL="-I/usr/include/libpng12"
    export USE_BAM=1 # uncomment this line
    export KNETFILE_HOOKS=1 # uncomment this line
    export SAMDIR=/opt/NGSTools/SAMTools
    export SAMINC=${SAMDIR} # uncomment this line
    export SAMLIB=${SAMDIR}/libbam.a # uncomment this line
    export AUTH_MACHINE="sevenofnine"
    export AUTH_USER="root"
  9. Prepare Apache

    Enable XBitHack
    sudo a2enmod include
    In /etc/apache2/apache2.conf add the line
    XBitHack on
    Create a virtual host file called my_preferred_host_name in /etc/apache2/sites-available with the following contents:
    XBitHack on
    # Virtual host for genomebrowser
    <VirtualHost *:80>
     ServerAdmin your_admin_mail@yourdomain.com
     DocumentRoot /media/HD2/genomebrowser
     ServerName genomebrowser
     <Directory />
      Order deny,allow
      Deny from all
      Options FollowSymLinks
      AllowOverride None
     </Directory>
     <Directory /media/HD2/genomebrowser>
      AllowOverride AuthConfig
      Options +Includes
      Order allow,deny
      allow from all
     </Directory>
    
     ScriptAlias /cgi-bin/ /media/HD2/genomebrowser/cgi-bin/
     <Directory "/media/HD2/genomebrowser/cgi-bin">
      AllowOverride None
      Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
      Order allow,deny
      Allow from all
      AddHandler cgi-script .cgi .pl
     </Directory>
    
     ErrorLog /media/HD2/genomebrowser/logs/apache2/error.log
     CustomLog /media/HD2/genomebrowser/logs/apache2/access.log combined
     LogLevel warn
    
     Alias /doc/ "/usr/share/doc/"
     <Directory "/usr/share/doc/">
      Options Indexes MultiViews FollowSymLinks
      AllowOverride None
      Order deny,allow
      Deny from all
      Allow from 127.0.0.0/255.0.0.0 ::1/128
     </Directory>
    
     # Some security
     ServerSignature Off
    </VirtualHost>
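    Create the log directory used above (Apache will not start if it is missing) and enable the site
    sudo mkdir -p /media/HD2/genomebrowser/logs/apache2
    sudo a2ensite my_preferred_host_name
    Add the following line to /etc/hosts
    127.0.0.254     genomebrowser
    Restart the networking service
    sudo /etc/init.d/networking restart
    Restart Apache
    sudo /etc/init.d/apache2 restart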
  10. Create a MySQL user using the MySQL command line, webmin or phpMyAdmin (I created gbuser, with password MY_PASSWORD, using webmin). You should have these tools anyway, as they are very handy for managing your system.
  11. Create the hg.conf file as described here, in Part 1: Genome Browser engine. Here is mine:
    # Configuration file for the UCSC Human Genome server
    #
    # the format is in the form of name/value pairs, written as 'name=value'
    #
    # note that there is no space between the name and its value. Also, no blank lines should be in this file.
    #
    #--------------------------------------------------------------#
    #
    # db.host is the name of the MySQL host to connect to
    db.host=localhost
    #
    # db.user is the username used when connecting to the host
    db.user=THE_USER_CREATED_IN_STEP_10
    #
    #
    # this is the password to use with the above hostname
    db.password=THE_PASSWORD
    #
    db.trackDb=trackDb
    # central.host is the name of the host of the central MySQL
    # database where stuff common to all versions of the genome
    # and the user database is stored.
    central.db=hgcentral
    central.host=localhost
    central.user=THE_USER_CREATED_IN_STEP_10
    central.password=THE_PASSWORD
    central.domain=
    backupcentral.db=hgcentral
    backupcentral.host=localhost
    backupcentral.user=THE_USER_CREATED_IN_STEP_10
    backupcentral.password=THE_PASSWORD
    backupcentral.domain=
    # required to use hgLogin
    login.systemName=hgLogin CGI
    # url to server hosting hgLogin
    wiki.host=genomebrowser
    # name of cookie holding username - do not change!
    wiki.userNameCookie=wikidb_mw1_UserName
    # name of cookie holding user id - do not change!
    wiki.loggedInCookie=wikidb_mw1_UserID
    # title of host of browser, this text will be shown in the user interface of the login/sign up screens
    login.browserName=UCSC Genome Browser @Fleming
    # base url of browser install
    login.browserAddr=http://genomebrowser
    # signature written at the bottom of hgLogin system emails
    login.mailSignature=Local administrator: Panagiotis Moulos
    # from/return email address used for system emails
    login.mailReturnAddr=your_admin_mail@yourdomain.com
    The last lines (about login) enable the browser's independent login system, so that it can host different users.
  12. Create the /root/bin and $STORAGE/kenthome/bin/x86_64 directories, then link the latter to /root/bin/x86_64 (the link name must not already exist as a directory, otherwise ln creates the link inside it)
    sudo mkdir -p /root/bin
    sudo mkdir -p $STORAGE/kenthome/bin/x86_64
    sudo ln -s $STORAGE/kenthome/bin/x86_64 /root/bin/x86_64
  13. Create the /gbdb symlink. Very important...
    sudo ln -s $STORAGE/gbdb /gbdb
  14. Before fetching the html files with updateHtml.sh, I edited the updateHtml.sh kent script in $STORAGE/scripts so that it also displays the rsync output in stdout instead of only in the log. To do this, go to the ${RSYNC} commands towards the end of the script, add --verbose after ${RSYNC}, and replace
    >> ${FETCHLOG} 2>&1
    with
    2>&1 | tee -a ${FETCHLOG}
    Save the file and then run it:
    sudo sh updateHtml.sh ./browserEnvironment.txt
  15. Before fetching and compiling the source, we have to patch SAMtools to enable network support for BAM files. This has to be done manually, as SAMtools does not yet support it. The patch, as well as full instructions on how to apply it, can be found here. Please follow them carefully.
  16. Now we have to run kentSrcUpdate.sh in order to fetch the latest code and build the binaries and CGIs from source. Open the kentSrcUpdate.sh script. Towards the end, replace the > daily.log (and similar) redirections of the make commands with 2>&1 | tee -a followed by the same log file (see also step 14), to display all messages in STDOUT. Then, run the script
    sudo sh kentSrcUpdate.sh ./browserEnvironment.txt
  17. Now we must download the hgcentral database, using the fetchHgCentral.sh script:
    sudo sh fetchHgCentral.sh go > $TEMP/hgcentral.sql
  18. We must set up an SQL database to receive the file we just downloaded, along with a Genome Browser user. Ideally, we should have one user with SELECT permissions and one with ALL permissions... For now we set up a single user with ALL permissions, as the browser itself is graphical and does not allow writing. Skip the CREATE USER command if you already created gbuser in step 10.
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "CREATE USER 'gbuser'@'localhost' identified by 'password'; FLUSH PRIVILEGES;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD -e "CREATE DATABASE hgcentral;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "GRANT SELECT, INSERT, UPDATE, DELETE, INDEX, LOCK TABLES, \
    CREATE, DROP, ALTER, CREATE TEMPORARY TABLES ON hgcentral.* \
    TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "GRANT FILE ON *.* TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD -e "CREATE DATABASE hgFixed;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "GRANT SELECT ON hgFixed.* TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;"
  19. Import the hgcentral database
    mysql -ugbuser -ppassword hgcentral < $TEMP/hgcentral.sql
    The basic Genome Browser session functionality should now be almost ready. We need to create a couple more symbolic links to custom JavaScript and CSS files.
  20. Create the following symbolic links:
    sudo ln -s $STORAGE/genomebrowser/cgi-bin $STORAGE/genomebrowser/htdocs/cgi-bin
    sudo ln -s $STORAGE/genomebrowser/trash $STORAGE/genomebrowser/htdocs/trash
  21. Create the /usr/local/apache/htdocs directory (it holds nothing but links) and its goldenPath subdirectory, and then the following symbolic links:
    sudo mkdir -p /usr/local/apache/htdocs/goldenPath
    sudo ln -s $STORAGE/genomebrowser/htdocs/js /usr/local/apache/htdocs/js
    sudo ln -s $STORAGE/genomebrowser/htdocs/style /usr/local/apache/htdocs/style
    sudo ln -s $STORAGE/genomebrowser/htdocs/inc /usr/local/apache/htdocs/inc
    sudo ln -s $STORAGE/genomebrowser/htdocs/images /usr/local/apache/htdocs/images
    sudo ln -s $STORAGE/genomebrowser/htdocs/goldenPath/help /usr/local/apache/htdocs/goldenPath/help
    At this point the website should be partially functional. Next, we have to install some genome databases.
  22. Change the ownership of the contents of the genomebrowser directory to www-data and restart apache
    sudo chown -R www-data:www-data $STORAGE/genomebrowser
    sudo /etc/init.d/apache2 restart
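Step 7 can also be scripted so that running it twice does not pile up duplicate lines. Note that bash sets MACHTYPE on its own to a longer GNU triplet such as x86_64-pc-linux-gnu, which is presumably why the short form has to be exported for the kent build. The helper name add_machtype is hypothetical. Unlike step 7, it defaults to the output of uname -m.

```shell
#!/usr/bin/env bash
# Append "export MACHTYPE=<arch>" to a shell rc file unless a MACHTYPE export
# is already present. ARCH defaults to the output of `uname -m`.
add_machtype() {
  local rcfile="$1" arch="${2:-$(uname -m)}"
  if grep -q '^[[:space:]]*export[[:space:]]\{1,\}MACHTYPE=' "$rcfile" 2>/dev/null; then
    echo "MACHTYPE already exported in $rcfile"
  else
    printf '\nexport MACHTYPE=%s\n' "$arch" >> "$rcfile"
    echo "Added MACHTYPE=$arch to $rcfile"
  fi
}
```

Run `add_machtype ~/.bashrc` as yourself. Then paste the function into a root shell (sudo -i), run `add_machtype /root/.bashrc` there, and re-source both files as in step 7.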
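The header of the hg.conf file in step 11 asks for name=value pairs with no space between the name and its value and no blank lines. The small linter below can catch such slips before the CGIs read the file. It is a sketch, and lint_hgconf is a hypothetical name. It only checks those two rules. Spaces inside a value (as in login.browserName) are allowed.

```shell
#!/usr/bin/env bash
# Lint an hg.conf-style file: no blank lines, and every non-comment line must be
# name=value with no whitespace before or right after the '='.
# Prints offending line numbers and returns 1 if any are found.
lint_hgconf() {
  awk '
    /^#/ { next }                                        # comments are fine
    /^[ \t]*$/ { printf "line %d: blank line\n", NR; bad = 1; next }
    !/^[^ \t=]+=/ { printf "line %d: expected name=value\n", NR; bad = 1; next }
    /^[^=]*=[ \t]/ { printf "line %d: space after =\n", NR; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, `lint_hgconf $STORAGE/genomebrowser/cgi-bin/hg.conf` prints nothing and exits with 0 when the file follows both rules.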
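Several steps (12, 13, 20 and 21) depend on symbolic links that point at the right place. A link that ends up inside an existing directory, or that points at a wrong path, goes unnoticed until a CGI breaks. The sketch below uses a hypothetical helper, check_links, which compares each link with its expected target using readlink -f. Run it as root, since other users cannot read /root.

```shell
#!/usr/bin/env bash
# Check LINK TARGET pairs: each LINK must be a symlink resolving to TARGET,
# and TARGET must exist. Prints one line per pair; returns 1 if any pair fails.
check_links() {
  local rc=0 link target
  while [ "$#" -ge 2 ]; do
    link="$1"; target="$2"; shift 2
    if [ ! -L "$link" ]; then
      echo "NOT A LINK: $link"; rc=1
    elif [ "$(readlink -f "$link")" != "$(readlink -f "$target")" ]; then
      echo "WRONG TARGET: $link -> $(readlink "$link") (expected $target)"; rc=1
    elif [ ! -e "$link" ]; then
      echo "DANGLING: $link -> $target"; rc=1
    else
      echo "ok: $link -> $target"
    fi
  done
  return "$rc"
}
```

For example: `check_links /gbdb "$STORAGE/gbdb" /root/bin/x86_64 "$STORAGE/kenthome/bin/x86_64" /usr/local/apache/htdocs/js "$STORAGE/genomebrowser/htdocs/js"`.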
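The manual edit of step 14 can be done with sed instead of an editor. This sketch assumes that the rsync lines end exactly in >> ${FETCHLOG} 2>&1. It places 2>&1 before the pipe, so that both rsync's stdout and its stderr reach tee. A 2>&1 written after tee would only affect tee itself. show_rsync_output is a hypothetical name. It keeps a .bak copy, so compare the two files before running the patched script.

```shell
#!/usr/bin/env bash
# Patch a kent fetch script in place (backup in FILE.bak):
#  - add --verbose after ${RSYNC}, unless the line already has it
#  - turn ">> ${FETCHLOG} 2>&1" into "2>&1 | tee -a ${FETCHLOG}"
show_rsync_output() {
  sed -i.bak \
    -e '/\${RSYNC}/{/--verbose/!s/\${RSYNC}/${RSYNC} --verbose/;}' \
    -e 's/>> \${FETCHLOG} 2>&1/2>\&1 | tee -a ${FETCHLOG}/' \
    "$1"
}
```

For example, `sudo bash -c "$(declare -f show_rsync_output); show_rsync_output updateHtml.sh"` run from $STORAGE/scripts. After this change, the exit status of each of those lines is tee's rather than rsync's, so any check the script makes on $? no longer sees rsync failures.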
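The separate mysql calls of step 18 can be collected into a single SQL script. That script is easier to review and runs in one client session. hgcentral_setup_sql is a hypothetical helper. The user name and password are placeholders, and a password containing a single quote would need escaping.

```shell
#!/usr/bin/env bash
# Print the SQL of step 18 (user, hgcentral and hgFixed databases, grants)
# for USER identified by PASS, ready to be piped into one mysql call.
hgcentral_setup_sql() {
  local user="$1" pass="$2"
  cat <<EOF
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
CREATE DATABASE IF NOT EXISTS hgcentral;
GRANT SELECT, INSERT, UPDATE, DELETE, INDEX, LOCK TABLES, CREATE, DROP, ALTER, CREATE TEMPORARY TABLES ON hgcentral.* TO '${user}'@'localhost';
GRANT FILE ON *.* TO '${user}'@'localhost';
CREATE DATABASE IF NOT EXISTS hgFixed;
GRANT SELECT ON hgFixed.* TO '${user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}
```

For example: `hgcentral_setup_sql gbuser 'password' | mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD`. In batch mode the mysql client stops at the first failing statement, so remove the CREATE USER line if gbuser already exists from step 10.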

C. Installation of minimal genome databases


  1. Create a file named my.minimal.db.list.txt and type the following (five assemblies plus hgFixed):
    hg18
    hg19
    mm9
    mm10
    dm3
    hgFixed
  2. Fetch the minimal gbdb information for these organisms by running the fetchMinimalGbdb.sh script. Before running it, edit its last lines, where fetchOne is called, and replace > with | tee -a to display information as before. Also add the --verbose option to the ${RSYNC} commands if you want additional information (I did!).
    sudo sh fetchMinimalGbdb.sh ./browserEnvironment.txt ./my.minimal.db.list.txt
  3. Fetch the minimal golden path database information for these organisms by running the fetchMinimalGoldenPath.sh script. Before running it, edit it for more verbosity, as in step 2.
    sudo sh fetchMinimalGoldenPath.sh ./browserEnvironment.txt ./my.minimal.db.list.txt
  4. The hg18 SQL table creation files use the old TYPE= table option, which MySQL 5.5 no longer accepts (ENGINE= replaces it). Go to $STORAGE/genomebrowser/htdocs/goldenPath/hg18/database and run
    sudo sed -i 's/TYPE=/ENGINE=/g' *.sql
  5. Load the minimal golden path databases fetched with the script above
    sudo sh loadDb.sh ./browserEnvironment.txt hg18
    sudo sh loadDb.sh ./browserEnvironment.txt hg19
    sudo sh loadDb.sh ./browserEnvironment.txt mm9
    sudo sh loadDb.sh ./browserEnvironment.txt mm10
    sudo sh loadDb.sh ./browserEnvironment.txt dm3
    sudo sh loadDb.sh ./browserEnvironment.txt hgFixed
  6. Grant access to the new Genome Browser user
    for DB in hg18 hg19 mm9 mm10 dm3
    do
     mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
     -e "GRANT SELECT, INSERT, UPDATE, DELETE, INDEX, CREATE, DROP, ALTER, \
     CREATE TEMPORARY TABLES ON $DB.* TO 'gbuser'@'localhost'; FLUSH PRIVILEGES;" 
    done
  7. You should now have basic track functionality in the local version of the UCSC Genome Browser. However, not much can be done apart from custom track exploration and sequence retrieval, as there are no gene annotations etc. The next section explains how to customize the UCSC databases a bit further than the "take the minimum or all" approach of the kent scripts.
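Instead of typing one loadDb.sh line per assembly in step 5, you can let the list file from step 1 drive a loop, which keeps the list and the loaded databases in sync. load_db_list is a hypothetical helper. It assumes that loadDb.sh takes the environment file and a database name, as in the commands above. It gives the loader /dev/null as standard input so that the loader cannot consume the rest of the list.

```shell
#!/usr/bin/env bash
# Run loadDb.sh once per database named in LISTFILE (one name per line;
# blank lines and '#' comments are skipped). Stops at the first failure.
load_db_list() {
  local envfile="$1" listfile="$2" loader="${3:-./loadDb.sh}" db
  while IFS= read -r db || [ -n "$db" ]; do
    db="${db%%#*}"                                  # drop comments
    db="$(printf '%s' "$db" | tr -d '[:space:]')"   # drop whitespace
    [ -n "$db" ] || continue
    echo "== loading $db"
    # </dev/null keeps the loader from reading the rest of the list file
    sh "$loader" "$envfile" "$db" </dev/null ||
      { echo "loadDb.sh failed for $db" >&2; return 1; }
  done < "$listfile"
}
```

For example, save the function in a file together with the call `load_db_list ./browserEnvironment.txt ./my.minimal.db.list.txt`, and run that file with sudo bash from $STORAGE/scripts.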

D. Installation of other genome database tables


  1. Installing the full mirror of the Genome Browser databases is very costly in space (and mostly useless), and there is no straightforward way to determine which table corresponds to which feature. I therefore created a Perl script called fetchCustomDb.pl to fetch only the tables we need. This is not completely automatic. It requires some manual work to determine the tables for the required features (I did it using the UCSC Table Browser) and to note them down in a YAML configuration file, which the Perl script requires. The YAML configuration file is quite self-explanatory. It lists these tables, in a format the Perl script understands, together with other variables. The script can be downloaded from here and the YAML parameter file from here.
    I created the table list by selecting the features of interest in the UCSC Table Browser, viewing the source of the page and copying and pasting the contents of the respective SELECT list. This could of course be made more systematic by scraping the page with some package (it is on the TODO list...). The final table list contains a lot of duplicates, as many tables are interconnected, but the script takes care of that. As the tables and the way they are constructed across genomes are a mess (e.g. in some genomes features are split per chromosome, in others not), you need to explore the UCSC FTP server a bit to determine this. Another script takes care of the external databases that have to be installed (e.g. GO, UniProt, etc.) by running mysqldump directly against the UCSC MySQL server; it can be downloaded from here. Once done, you can pass these tables to the parameters file and the script takes care of the rest. After you define all these, just run
    sudo perl fetchCustomDb.pl --param your_param_file.yml
    The parameter file is optional if your needs are the same as mine (my settings are loaded by default). It is advisable to use the --dry parameter first to see the total amount of data to be downloaded, as UCSC data keep expanding.
    sudo perl fetchCustomDb.pl --param your_param_file.yml --dry
    The script uses a Perl interface for rsync. One might wonder why not use a simple shell script with multiple rsync lines. For me the answer is easy: ease of use, reusability, elegance and systematic maintenance!
  2. Now we must reload the database tables, using the kent script loadDb.sh. Keep in mind, however, that the databases created in step C.5 must be dropped first, as this script only works if the databases do not exist. This can easily be done with a GUI tool such as phpMyAdmin or webmin and sufficient user privileges, or on the command line with
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD -e "DROP DATABASE genome_to_be_dropped;"
    Do NOT drop the hgcentral database! The latest version of the aforementioned Perl script takes care of that for you; just read its documentation.
  3. Set up a cron job in /etc/cron.weekly to clean the trash data from custom tracks. You can do this either in webmin or with the following script; name it clean-gb-trash and set its permissions to 755:
    #!/bin/bash
    # cron does not know $STORAGE, so define it here
    STORAGE="/media/HD2"
    find $STORAGE/genomebrowser/trash/ \! \( -regex "$STORAGE/genomebrowser/trash/ct/.*" \
     -or -regex "$STORAGE/genomebrowser/trash/hgSs/.*" \) -type f -amin +10080 -exec rm -f {} \;
    find $STORAGE/genomebrowser/trash/    \( -regex "$STORAGE/genomebrowser/trash/ct/.*" \
     -or -regex "$STORAGE/genomebrowser/trash/hgSs/.*" \) -type f -amin +20160 -exec rm -f {} \;
    and
    sudo chmod 755 /etc/cron.weekly/clean-gb-trash
  4. Finally, download the processed GenBank files from the UCSC rsync path /gbdb/genbank/data/processed/ to $STORAGE/gbdb/genbank/data/processed/ (create the parent directory first, as rsync only creates the last component of the destination path)
    sudo mkdir -p $STORAGE/gbdb/genbank/data
    sudo rsync --archive --compress --partial --recursive --progress --stats --verbose --human-readable \
    rsync://hgdownload.cse.ucsc.edu/gbdb/genbank/data/processed/* \
    $STORAGE/gbdb/genbank/data/processed/
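Part of the copy-and-paste work from the Table Browser SELECT list in step 1 can be automated using a saved copy of the page. This is a rough sketch, not a real HTML parser. It assumes that the table drop-down is a <SELECT> named hgta_table (check this in the page source) and that each option carries a value="..." attribute. The final sort -u also removes the duplicate table names mentioned above.

```shell
#!/usr/bin/env bash
# Print the unique option values of one <SELECT> in a saved HTML page.
# SELECT_NAME defaults to "hgta_table", an assumption about the Table Browser markup.
tb_tables() {
  local html="$1" select_name="${2:-hgta_table}"
  awk -v name="$select_name" '
    { low = tolower($0) }
    index(low, "<select") && index(low, "name=\"" tolower(name) "\"") { insel = 1 }
    insel { print }
    insel && index(low, "</select>") { insel = 0 }
  ' "$html" |
    grep -io 'value="[^"]*"' |
    sed 's/^[^"]*"//; s/"$//' |
    sort -u
}
```

For example, `tb_tables hgTables.html >> my_tables.txt` after saving the Table Browser page for each feature of interest.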
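When you drop databases before reloading them in step 2, a small guard helps you avoid the one mistake this guide warns about. The sketch prints DROP DATABASE statements only if none of the given names is protected. The protected list (hgcentral, customTrash and MySQL's own schemas) is an assumption that goes beyond the hgcentral warning above. drop_genomes_sql is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print DROP DATABASE statements for the given databases, or nothing at all
# (exit status 1) if any of them is on the protected list.
drop_genomes_sql() {
  local db
  # Validate every name first, so nothing is printed if one is protected
  for db in "$@"; do
    case "$db" in
      hgcentral|customTrash|mysql|information_schema|performance_schema)
        echo "refusing to drop protected database: $db" >&2
        return 1 ;;
    esac
  done
  for db in "$@"; do
    printf 'DROP DATABASE IF EXISTS `%s`;\n' "$db"
  done
}
```

For example: `drop_genomes_sql hg18 hg19 mm9 mm10 dm3 | mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD`.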
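You can try the trash policy of step 3 on a scratch directory before trusting it with real data. The variant below follows the same idea. It is parameterised and uses find -path globs instead of -regex. Its default ages (10080 and 20160 minutes, i.e. one and two weeks without access) mirror the cron script above. clean_trash is a hypothetical name. Like the original, it relies on access times, so it is ineffective on a filesystem mounted with noatime.

```shell
#!/usr/bin/env bash
# Delete trash files not accessed for a while, printing each removed path:
# files under TRASH/ct and TRASH/hgSs after CT_MIN minutes, others after OTHER_MIN.
clean_trash() {
  local trash="${1%/}" other_min="${2:-10080}" ct_min="${3:-20160}"
  find "$trash" -type f ! \( -path "$trash/ct/*" -o -path "$trash/hgSs/*" \) \
    -amin +"$other_min" -print -exec rm -f {} +
  find "$trash" -type f \( -path "$trash/ct/*" -o -path "$trash/hgSs/*" \) \
    -amin +"$ct_min" -print -exec rm -f {} +
}
```

For example: `clean_trash "$STORAGE/genomebrowser/trash"`.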

E. Setting up the custom track database


This section describes how to set up support for the custom track database, which avoids using the trash directories for custom tracks and gives faster access. This feature was added to the UCSC Genome Browser later on. I advise you to follow this step, as it is recommended for proper user sessions.
  1. Enable the custom track database. This is not handled by the kent scripts. To do this, first create the customTrash database in MySQL
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD -e "CREATE DATABASE customTrash;"
  2. Create another user to work with custom tracks, e.g. ctgbuser (I created ctgbuser with password ctbguser@mydomain).
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "CREATE USER 'ctgbuser'@'localhost' IDENTIFIED BY 'password'; FLUSH PRIVILEGES;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, ALTER ON customTrash.* \
    TO 'ctgbuser'@'localhost'; FLUSH PRIVILEGES;"
    
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD \
    -e "GRANT FILE ON *.* TO 'ctgbuser'@'localhost'; FLUSH PRIVILEGES;"
  3. Create a temporary directory which is used by this functionality
    sudo mkdir -p $STORAGE/genomebrowser/data/tmp
    sudo chown -R www-data:www-data $STORAGE/genomebrowser/data/tmp
  4. Enter the following items in hg.conf. Write the full path for tmpdir: $STORAGE is a shell variable and is not expanded inside hg.conf.
    customTracks.host=localhost
    customTracks.user=ctgbuser
    customTracks.password=password
    customTracks.useAll=yes
    customTracks.tmpdir=/media/HD2/genomebrowser/data/tmp
  5. Do everything below as root, either with su or with sudo. Create a hidden directory .conf in $STORAGE/genomebrowser.
    sudo mkdir $STORAGE/genomebrowser/.conf
    In this directory, create the .ct.hg.conf file with the following contents:
    db.host=localhost
    db.user=ctgbuser
    db.password=password
    and set its permissions to 600
    sudo chmod 600 $STORAGE/genomebrowser/.conf/.ct.hg.conf
    Next, place a copy of hg.conf there too, with the same permissions, since it also holds passwords
    sudo cp $STORAGE/genomebrowser/cgi-bin/hg.conf $STORAGE/genomebrowser/.conf/.hg.conf
    sudo chmod 600 $STORAGE/genomebrowser/.conf/.hg.conf
    Finally, create two symbolic links in /root for these files
    sudo ln -s $STORAGE/genomebrowser/.conf/.hg.conf /root/.hg.conf
    sudo ln -s $STORAGE/genomebrowser/.conf/.ct.hg.conf /root/.ct.hg.conf
  6. In /etc/cron.daily, create the tmp cleaner script (it cleans the directory daily) and name it clean-gb-tmp:
    #!/bin/bash
    # cron does not know $STORAGE, so define it here
    STORAGE="/media/HD2"
    find $STORAGE/genomebrowser/data/tmp -type f -amin +1440 -exec rm -f {} \;
    and then
    sudo chmod 755 /etc/cron.daily/clean-gb-tmp
  7. Create the following script, to be run by a cron job (preferably scheduled through the webmin tool), which periodically cleans the custom tracks database. Name it clean-gb-ctdb (I keep it in /etc/cron.daily)
    #!/bin/sh
    
    # cron does not know $STORAGE, so define it here
    STORAGE="/media/HD2"
    DS=`date "+%Y-%m-%d"`
    YYYY=`date "+%Y"`
    MM=`date "+%m"`
    export DS YYYY MM
    
    mkdir -p $STORAGE/genomebrowser/data/trashLog/localhost/${YYYY}/${MM}
    RESULT="$STORAGE/genomebrowser/data/trashLog/localhost/${YYYY}/${MM}/${DS}.txt"
    export RESULT
    
    sudo $STORAGE/kenthome/bin/x86_64/dbTrash -age=168 -drop -verbose=2 \
    > ${RESULT} 2>&1
    and then
    sudo chmod 755 /etc/cron.daily/clean-gb-ctdb
    Each run drops custom track tables older than 168 hours (one week) and keeps a dated log. Don't forget to add $STORAGE/kenthome/bin/x86_64/dbTrash to your sudoers file, so that it does not ask for a password. You can google how to do this; it's easy. The reason is that dbTrash uses the .hg.conf located under the /root home, and otherwise the cron job cannot find it (even though the default cron user is root, strangely enough...).
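The .hg.conf and .ct.hg.conf files of step 5 hold MySQL passwords. It is therefore worth checking that the real files, and not just the links in /root, are private. check_private is a hypothetical helper that follows each symlink and reports any mode other than 600 or 400. It assumes GNU stat.

```shell
#!/usr/bin/env bash
# For each file (symlinks are followed), report whether its mode is 600 or 400.
# Returns 1 if any file is more open than that or cannot be inspected.
check_private() {
  local rc=0 f mode
  for f in "$@"; do
    f="$(readlink -f "$f")"                  # inspect the real file, not the link
    mode="$(stat -c '%a' "$f")" || { rc=1; continue; }
    case "$mode" in
      600|400) echo "ok: $f ($mode)" ;;
      *) echo "too open: $f ($mode), consider: chmod 600 $f"; rc=1 ;;
    esac
  done
  return "$rc"
}
```

For example: `sudo bash -c "$(declare -f check_private); check_private /root/.hg.conf /root/.ct.hg.conf"`.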

F. Setting up the Blat server (optional but required for most molecular biology labs)


  1. Most tools required by blat (gfServer, gfClient, faToNib and blat) have already been compiled. If some of them were not compiled in step B.16, compile them separately (see also the first note in Notes).
  2. Update the blatServers table in the hgcentral database with the address of your host (usually localhost).
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD -e "USE hgcentral; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%hg18%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%hg19%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%mm9%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%mm10%'; \
    UPDATE blatServers SET host='localhost' WHERE db LIKE '%dm3%';"
    It would be good to back up this table first, or to note down the initial entries, in case something goes wrong. You can also change the default ports (explore the relevant tables using the MySQL command line or phpMyAdmin).
  3. Everything else is straightforward. See also the instructions here on how to launch the blat server. I created a startup script, /etc/init.d/sblat, and made it run every time my machine boots (unfortunately with a certain delay). It also contains a lot of hardcoded paths, which is on my TODO list to change.
    sudo chmod +x /etc/init.d/sblat
    sudo update-rc.d sblat defaults
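The sblat script itself is not shown here. As a rough sketch of what such a startup script can do, the function below starts one untranslated and one translated gfServer per assembly. It uses the gfServer start options from the UCSC Blat documentation (-stepSize=5 for DNA, -trans -mask for translated searches). The db:port:port entries, the $GBDB/<db>/<db>.2bit location and the log directory are assumptions. The ports must match the blatServers rows of step 2. With DRY_RUN=1 the function only prints the commands.

```shell
#!/usr/bin/env bash
# Start (or, with DRY_RUN=1, just print) one untranslated and one translated
# gfServer per "db:untransPort:transPort" argument. GFSERVER, GBDB and LOGDIR
# can override the binary, the 2bit root and the log directory.
start_blat_servers() {
  local gfserver="${GFSERVER:-gfServer}" gbdb="${GBDB:-/gbdb}" logdir="${LOGDIR:-/var/log}"
  local entry db up tp twobit
  local -a untrans trans
  for entry in "$@"; do
    IFS=: read -r db up tp <<< "$entry"
    twobit="$gbdb/$db/$db.2bit"
    # DNA (untranslated) server
    untrans=("$gfserver" start localhost "$up" -stepSize=5 "-log=$logdir/gfServer-$db-untrans.log" "$twobit")
    # Translated server for protein/translated queries
    trans=("$gfserver" start localhost "$tp" -trans -mask "-log=$logdir/gfServer-$db-trans.log" "$twobit")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${untrans[*]}"
      echo "${trans[*]}"
    else
      # run each server in the background, detached from the terminal
      nohup "${untrans[@]}" >/dev/null 2>&1 &
      nohup "${trans[@]}" >/dev/null 2>&1 &
    fi
  done
}
```

For example, `DRY_RUN=1 start_blat_servers hg19:17779:17778 mm10:17781:17780` prints the commands for two assemblies using hypothetical ports. Run the same call as root without DRY_RUN to actually start the servers.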

Notes


  • I faced a lot of problems building the whole kent source tree. For example, one tool needed for cleaning the customTrash database, dbTrash, would not compile... There was a problem with the mysql and sql libraries, so I installed every related dev package from Synaptic and, in the src directory of the kent source, typed make libs (I found this tip here). The tool was still not compiled at first, but when I went to src/hg/dbTrash and typed make, it compiled. All of the above was done as root. In the same way, I compiled the hgsql tool.
  • It is recommended that you move the MySQL data directory away from its default location to keep the system filesystem light, as table sizes will grow fast, especially if you want to host a lot of features. The process is not very difficult and is explained in many blogs/forums; just google for it.
  • Also make sure you have plenty of space available for the gbdb directory, as all the big tracks that are not suitable for a database (e.g. ENCODE tracks and genome files) are stored there.
  • To insert a new table in any Genome Browser database without loading everything from the beginning:
    mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD --local-infile -e \
    "load data local infile '/path/to/my/table.ext' into table my_genome.table;"
    The table must first have been created from the respective table.sql file! I will soon provide a couple of example scripts, as there have already been many times when I had to add extra tables (according to the needs of my colleagues) without rebuilding everything from the beginning.
  • There is a general problem with the $MACHTYPE environment variable outside the kent shell scripts used for the general Genome Browser build (the src/product scripts). I fixed this by manually editing the makefile of each additional tool I wanted to use (e.g. the spToDb tool), replacing the line
    MYLIBDIR = ../../../lib/$(MACHTYPE)
    with
    MYLIBDIR = ../../../lib/x86_64
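Until those example scripts appear, here is a minimal sketch of the table insertion described above. It assumes the usual UCSC layout of one table.sql definition plus one table.txt.gz tab-separated dump per table, as found under goldenPath/<db>/database/. load_ucsc_table is a hypothetical helper. MYSQL holds the client command with placeholder credentials. LOAD DATA LOCAL only works if local infile is enabled, which is why MySQL is recompiled in section A.

```shell
#!/usr/bin/env bash
# Create TABLE in DB from TABLE.sql, then bulk-load the gzipped dump.
# MYSQL (client command + credentials) defaults to the placeholders used in this guide.
load_ucsc_table() {
  local db="$1" sqlfile="$2" datafile="$3" table tmp rc
  table="$(basename "$sqlfile" .sql)"
  tmp="$(mktemp)" || return 1
  gzip -dc "$datafile" > "$tmp" || { rm -f "$tmp"; return 1; }
  # ${MYSQL...} is intentionally unquoted so that it splits into command + options
  ${MYSQL:-mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD} "$db" < "$sqlfile" &&
    ${MYSQL:-mysql -uUSER_WITH_WRITE_PERMISSIONS -pPASSWORD} --local-infile=1 -e \
      "LOAD DATA LOCAL INFILE '$tmp' INTO TABLE \`$table\`;" "$db"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example: `load_ucsc_table hg19 refGene.sql refGene.txt.gz`. The default LOAD DATA format (tab-separated fields, one row per line) matches these dumps.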
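The makefile edit above can be applied with sed, keeping a .bak copy of each file. pin_mylibdir is a hypothetical helper. Alternatively, passing the value on the command line (make MACHTYPE=x86_64) should override the variable without editing anything, because make gives command-line variables precedence over ordinary assignments and the environment. This has not been tested with the kent makefiles here.

```shell
#!/usr/bin/env bash
# In each given makefile, replace "$(MACHTYPE)" at the end of the MYLIBDIR
# path with a fixed architecture (backup in FILE.bak).
pin_mylibdir() {
  local arch="$1"; shift
  sed -i.bak "s|^\(MYLIBDIR[[:space:]]*=.*/lib/\)\\\$(MACHTYPE)|\1${arch}|" "$@"
}
```

For example, `pin_mylibdir x86_64 makefile`, run from the tool's source directory (e.g. the spToDb directory).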


TODO


  • Log rotation scripts for the logs maintained by most of the UCSC Genome Browser tools... If someone has done it, I would really appreciate some sharing!
  • Better scraping of Table Browser pages for table names?
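For the first TODO item, logrotate can at least take over the Apache logs of the virtual host. This is a sketch. The weekly schedule, eight rotations and reload command are assumptions, loosely modelled on Ubuntu's own Apache rule. Logs written by other UCSC tools would need their own stanzas. write_gb_logrotate is a hypothetical name.

```shell
#!/usr/bin/env bash
# Write a logrotate rule for LOGDIR/*.log to OUT
# (default /etc/logrotate.d/genomebrowser).
write_gb_logrotate() {
  local logdir="$1" out="${2:-/etc/logrotate.d/genomebrowser}"
  cat > "$out" <<EOF
${logdir}/*.log {
    weekly
    rotate 8
    compress
    delaycompress
    missingok
    notifempty
    create 640 root adm
    sharedscripts
    postrotate
        /etc/init.d/apache2 reload > /dev/null
    endscript
}
EOF
}
```

For example, run `write_gb_logrotate /media/HD2/genomebrowser/logs/apache2` as root. `logrotate -d /etc/logrotate.d/genomebrowser` then shows what would happen without rotating anything.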

2 comments:

Unknown said...

Thank God (Jim Kent) for making Hubs exist!

Unknown said...

Thanks for this tutorial. I have an issue in step B.14: when I run "sudo sh kentSrcUpdate.sh ./browserEnvironment.txt", the system returns this error:
git update report summary is in email to root
kentSrcUpdate.sh: 73: kentSrcUpdate.sh: mail: not found
make: execvp: ./hg/sqlEnvTest.sh: Permission denied
make: *** [hgLib] Error 127
make: *** Waiting for unfinished jobs....
/bin/sh: 1: ./machTest.sh: Permission denied
make: *** [topLibs] Error 126
egrep: daily.log: No such file or directory


Copyright © Bioinformatics dance