Genend - Update 1
Moved from Perl to Python. Heavy use of Perl in larger files proved hard for me to organize, and I was having trouble keeping straight what I was doing. I also don’t like the Perl object/class system and am more at home with Python’s.
Current progress includes a custom database object for interfacing with a SQLite database (and possibly PostgreSQL/MySQL/Firebird if SQLite gets too slow). Everything except ‘updates’ to an entry is done. The database object is about as simple as it gets: k-mers are added as a list of tuples and taxonomy as one large tuple.
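A rough sketch of what such a database object could look like, using Python’s built-in sqlite3 module. The class name KmerDB, the table layout, and the taxonomy fields are all made up for illustration and are not the actual Genend schema; the only parts taken from the notes above are the list-of-tuples input for k-mers and the single tuple for taxonomy.

```python
import sqlite3


class KmerDB:
    """Hypothetical wrapper around a SQLite database (names are assumptions)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Assumed schema: one taxonomy row per species, many k-mer rows per species.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS taxonomy ("
            "species_id INTEGER PRIMARY KEY, kingdom TEXT, phylum TEXT, "
            "genus TEXT, species TEXT)"
        )
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kmers ("
            "species_id INTEGER, kmer TEXT, n INTEGER)"
        )

    def add_taxonomy(self, taxonomy):
        """Insert one taxonomy tuple (kingdom, phylum, genus, species)."""
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT INTO taxonomy (kingdom, phylum, genus, species) "
                "VALUES (?, ?, ?, ?)",
                taxonomy,
            )
        return cur.lastrowid

    def add_kmers(self, species_id, kmers):
        """Insert a list of (kmer, count) tuples in a single transaction."""
        with self.conn:
            self.conn.executemany(
                "INSERT INTO kmers (species_id, kmer, n) VALUES (?, ?, ?)",
                [(species_id, kmer, n) for kmer, n in kmers],
            )

    def kmer_count(self, species_id, kmer):
        """Return the stored count for one k-mer, or 0 if it is absent."""
        row = self.conn.execute(
            "SELECT n FROM kmers WHERE species_id = ? AND kmer = ?",
            (species_id, kmer),
        ).fetchone()
        return row[0] if row else 0
```

Usage would be along the lines of `sid = db.add_taxonomy((...))` followed by `db.add_kmers(sid, [("ACGT", 3), ("CGTA", 1)])`. Batching the k-mers through executemany inside one transaction tends to be much faster than committing each row separately.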
Started working on an object that takes a directory full of genomes, an output directory, and the number of threads to run, and pools worker objects to process the files. Each worker will be a threadable object that uses the BioPython libraries to parse the genome files. The important question for queueing threads is whether SQLite will tolerate concurrent access to the same database. Also need to figure out how to handle inserts so that there isn’t fragmentation. There should be little fragmentation, since each file and species will be unique.
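One possible way around the concurrency question, sketched below: let the worker threads only parse and count, and keep every database write in the main thread. SQLite allows only one writer at a time, and a Python sqlite3 connection by default refuses to be used from a thread other than the one that created it, so a single writer sidesteps both issues. Everything here is an assumption for illustration: the function names, the table layout, the k-mer length, and the plain-text FASTA reader, which stands in for BioPython so the sketch runs with the standard library alone. It also writes to a database path instead of an output directory.

```python
import os
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def read_fasta(path):
    """Stand-in FASTA reader yielding (header, sequence) pairs.

    In the real tool this would presumably be replaced by BioPython's
    Bio.SeqIO.parse(path, "fasta").
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def count_kmers(path, k):
    """Worker task: count every overlapping k-mer in one genome file."""
    counts = Counter()
    for _, seq in read_fasta(path):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return os.path.basename(path), counts


def process_directory(genome_dir, db_path, threads, k=4):
    """Parse files in a thread pool; insert results from the main thread only."""
    paths = sorted(
        os.path.join(genome_dir, name)
        for name in os.listdir(genome_dir)
        if name.endswith((".fa", ".fasta"))
    )
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kmers (source TEXT, kmer TEXT, n INTEGER)"
    )
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order; the loop body is the
        # single writer, so the workers never touch the connection.
        for source, counts in pool.map(lambda p: count_kmers(p, k), paths):
            with conn:  # one transaction per genome file
                conn.executemany(
                    "INSERT INTO kmers VALUES (?, ?, ?)",
                    [(source, kmer, n) for kmer, n in counts.items()],
                )
    return conn
```

Since each file is written in its own transaction, a failure partway through leaves earlier genomes intact. Note that pure-Python counting is CPU-bound, so threads may not give much speedup under the interpreter lock; a process pool could be worth trying if that turns out to matter.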
For next week:
Finish up the database object and threading objects. Do a preliminary run to start building genomes. Determine the largest feasible genome before the laptop (2×2.4 GHz w/ 4 GB RAM) will puke. If it does so before getting too high in the phyla, will need to start writing some string-handling libraries in C to deal with static-length strings.