change-history.org
Table of Contents
1 Database schema
In current we use Ensembl database schema as template.A full featured Ensembl database is consist of over 70 tables. For a gene prediction task using Augustus as annotation engine,we only need 3 of them.
1.1 table 'dna'
Contains DNA sequence. This table has a 1:1 relationship with the contig table. There's a one-one map for each record in this table to each single row in a plain file 'dna.txt' in which sequences are stored in format of 'int-id\tsequence'.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
seq_region_id | INT(10) | Primary key, internal identifier. Foreign key references to the seq_region table. | primary key | |
sequence | LONGTEXT | DNA sequence. |
1.2 table 'seq_region'
Stores information about sequence regions. The primary key is used as a pointer into the dna table so that actual sequence can be obtained, and the coord_system_id allows sequence regions of multiple types to be stored.Contigs are stored with the 'coord_system_id=2'. Chromosomes have 'coord_system_id=1',they have no corresponding record in table 'dna'. The relationship between contigs and chromosomes is stored in the assembly table.
Column | Type | Default value | Description | Index |
---|---|---|---|---|
seq_region_id | INT(10) | Primary key, internal identifier. | primary key | |
name | VARCHAR(40) | Sequence region name. | unique key: name_cs_idx | |
coord_system_id | INT(10) | Foreign key references to the coord_system table. | unique key: name_cs_idx | |
key: cs_idx | ||||
length | INT(10) | Sequence length. |
1.3 table 'assembly'
This is the assembly table structure.
Field | Type | Null | Key | Default | Extra |
---|---|---|---|---|---|
asm_seq_region_id | int(10) unsigned | NO | PRI | NULL | |
cmp_seq_region_id | int(10) unsigned | NO | PRI | NULL | |
asm_start | int(10) | NO | PRI | NULL | |
asm_end | int(10) | NO | PRI | NULL | |
cmp_start | int(10) | NO | PRI | NULL | |
cmp_end | int(10) | NO | PRI | NULL | |
ori | tinyint(4) | NO | PRI | NULL |
2 mysql++ API
In current,we use a third-part mysql API:mysql++ to handle sequence from database.I choose it because of its lightweight and it supports STL perfectly.
2.1 install
install package
# apt-get install libmysql++-devor see docs/INSTALL.md for a complete overview.
2.2 use SSQLS
mysqlpp allows user defined 'Specialized SQL Structure'.At the most superficial level,and SSQLS has a member variable corresponding to each field in the SQL table. In 'include/table_structure.h' defined 'dna','seq_region','assembly'.
sql_create_2(dna, 1, 2, int,seq_region_id, std::string, sequence) sql_create_4(seq_region, 1,4, int,seq_region_id, std::string,name, std::string,coord_system_id, int,length) sql_create_6(assembly, 1, 6, int, asm_seq_region_id, int, cmp_seq_region_id, int, asm_start, int, asm_end, int, cmp_start, int, cmp_end)
3 cmdline parameters
- –dbaccess accepts comma separated string "database name,host name,user,passwd,table name"
- the only parameter without a '–' is the query.If '–dbaccess' is indicated,query corresponds to a name in 'seq_region' table.So skip filetype detect in this case.
- –predictionStart and –predictionEnd still work the same way as when input file is a fasta or genebank.
augustus --dbaccess="fly,localhost,henry,123456,," 3L --predictionStart=100 --predictionEnd=30000000 --species=fly
4 modification
file | desc |
---|---|
Makefile | add 2 header path and 2 lib path;add -Wl,rpath=/your/run-timelib/path |
types.cc | l-322~l-324,comment an exception thow message to allow 'dbaccess' in sigle mode.I don't want to modify this behavior in system level so I just comment it. |
types.cc | reorder –dbaccess to "database name,host name,user,passwd,tablename" |
randaccess.{hh,cc} | accomplish the AnnoSequence* DbSeqAccess::getSeq method.Give a mysqlpp::connection object to class DbSeqAccess. |
genbank.cc | GBSplitter(string fname ),l-526. If input fname is a name in 'seq_region' in database,skip the filetype detect. |
table_structure.h | in 'trunks/include/mysqlppheader' add 3 SSQLS: 'dna','seq_region','assembly' |
Date: 2012-06-09 Sat