ChristianKl comments on What should normal people do? - Less Wrong
That sounds like you're doing one insert per transaction, which is how SQLite operates by default. It is possible to batch multiple inserts together into one transaction.
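A minimal sketch of the difference (not the original script; table name and data are made up for illustration). With Python's sqlite3 module, each commit() ends a transaction, so committing once per row forces a disk sync per row on a file-backed database, while executemany() plus a single commit() does one sync for the whole batch:

```python
import sqlite3
import time

rows = [(i, "log line %d" % i) for i in range(10000)]

con = sqlite3.connect(":memory:")  # use a file path to see the real speed gap
con.execute("CREATE TABLE logs (id INTEGER, line TEXT)")

# Slow pattern: one transaction per insert.
t0 = time.time()
for row in rows:
    con.execute("INSERT INTO logs VALUES (?, ?)", row)
    con.commit()  # on a file-backed DB this syncs to disk every row
slow = time.time() - t0

con.execute("DELETE FROM logs")
con.commit()

# Fast pattern: batch all inserts into a single transaction.
t0 = time.time()
con.executemany("INSERT INTO logs VALUES (?, ?)", rows)
con.commit()  # one sync for all 10,000 rows
fast = time.time() - t0

print(slow, fast)
```

On an in-memory database the timings are close, but on disk the batched version is typically orders of magnitude faster, since the per-transaction fsync dominates.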
If I remember right, the data was somewhere around 10GB in size. I think a computer should be able to do the logs->SQL step in less than a day, provided one doesn't do one insert per transaction.
I believe so, yeah. You can see an old copy of the script at http://github.com/bartosh/pomni/blob/master/mnemosyne/science_server/parse_logs.py (or download the Mnemosyne repo with bzr). My version is slightly different in that I made it a little more efficient by shifting the self.con.commit() call up into the exception handler, which is about as far as my current Python & SQL knowledge goes. I don't see anything in http://docs.python.org/2/library/sqlite3.html mentioning 'union', so I don't know how to improve the script.

The .bz2 logs are ~4GB; the half-done SQL database is ~18GB, so I infer the final database will be ~36GB.
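For illustration, here is a hypothetical sketch of the pattern being described: reading a .bz2 log, inserting rows, and committing once per batch (and inside the exception handler) rather than once per line. The file layout, schema, and function name are invented for the example, not taken from parse_logs.py:

```python
import bz2
import sqlite3

def import_log(path, db_path="logs.db", batch_size=10000):
    # Hypothetical importer: assumes comma-separated "user,data" lines.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS events (user TEXT, data TEXT)")
    pending = 0
    for line in bz2.BZ2File(path):
        try:
            user, data = line.decode("utf-8").rstrip("\n").split(",", 1)
            con.execute("INSERT INTO events VALUES (?, ?)", (user, data))
            pending += 1
        except ValueError:
            # Malformed line: flush what we have so far, then skip it.
            con.commit()
            pending = 0
            continue
        if pending >= batch_size:
            con.commit()  # one commit per batch, not per row
            pending = 0
    con.commit()  # flush the final partial batch
    con.close()
```

The effect is the same as the batching described above: SQLite groups all the inserts since the last commit into one transaction, so the disk sees one sync per batch instead of one per log line.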
EDIT: my ultimate solution was to just spend $540 on an SSD, which finished the import process in a day; the final uploaded dataset was 2.8GB compressed and 18GB uncompressed (I'm not sure why it was half the size I expected).