[Openchemistry-developers] Sample CSV data for mongochem?

Kyle Lutz kyle.lutz at kitware.com
Sat Feb 2 09:14:32 EST 2013


On Fri, Feb 1, 2013 at 2:45 PM, Eric E. Monson <emonson at cs.duke.edu> wrote:
> Hey guys,
>
> I've been using VTK charts for a GUI I built a while ago, and Marcus mentioned to me recently (on the VTK-dev list) that MongoChem had some good example code, so I've been trying to check it out. I was able to build it okay (I have some notes on that experience if you're interested), but I was wondering if there is someplace I can get some sample data to populate my database? I'm not totally ignorant of chemistry, but it's not my field, and it's not clear where I could get a decent-sized CSV file with the right format and data.

Hi Eric,

Thanks for trying out MongoChem. Any feedback on building and/or
running MongoChem would be greatly appreciated!

Currently, MongoChem supports loading molecular data from SDF files
and CSV files. To initially set up our database, I used one of the
PubChem SDF files, which can be downloaded from
ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/. Each
one contains about 25,000 molecular structures along with a few
descriptors. Once those were loaded, I used the
"import-obabel-diagrams.py" script in the descriptors directory to
import the 2D images (soon the image import will be done
automatically).

To generate CSV files, I use the "make-descriptors-csv-file.py"
script. Given a molecule file (e.g. one of the PubChem SDF files) and
a list of descriptor names (e.g. "mass tpsa vabc rotatable-bonds"), it
outputs a file that can then be used with MongoChem's CSV importer.
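
For illustration only, here is a minimal sketch of the general idea of
turning an SDF file into a CSV table. It is not the
"make-descriptors-csv-file.py" script and does not compute any
descriptors; it only copies data items already stored in the SDF
records. The function names and the column layout are assumptions, and
the exact format expected by MongoChem's CSV importer may differ.

```python
import csv
import io


def iter_sdf_records(text):
    """Yield each SDF record as a list of lines; records end with '$$$$'."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                yield record
            record = []
        else:
            record.append(line)


def read_data_fields(record):
    """Return a dict of the '> <NAME>' data items in one SDF record."""
    fields = {}
    name = None
    values = []
    in_data = False  # data items only appear after the 'M  END' line
    for line in record:
        if not in_data:
            in_data = line.startswith("M  END")
            continue
        if line.startswith(">"):
            # Header line such as '> <SOME_FIELD>'
            start = line.find("<")
            end = line.find(">", start + 1)
            name = line[start + 1:end] if start != -1 and end != -1 else None
            values = []
        elif name is not None:
            if line.strip() == "":
                # A blank line closes the current data item
                fields[name] = "\n".join(values)
                name = None
            else:
                values.append(line)
    if name is not None:
        fields[name] = "\n".join(values)
    return fields


def sdf_to_csv(text, field_names):
    """Build CSV text with one row per record and one column per field.

    Fields missing from a record are written as empty cells.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(field_names)
    for record in iter_sdf_records(text):
        fields = read_data_fields(record)
        writer.writerow([fields.get(name, "") for name in field_names])
    return out.getvalue()
```

To try it, read an uncompressed SDF file into a string and pass it to
sdf_to_csv() together with the data item names you want as columns
(for example PUBCHEM_COMPOUND_CID in the PubChem files).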

Also, we are currently working on adding more data importers. Can you
let us know what type of data files you have? Furthermore, if you
could provide any sample data sets that would be a great help in
getting them to work smoothly with MongoChem.

Let me know if you have any other questions or feedback.

Cheers,
Kyle


