[Openchemistry-developers] Sample CSV data for mongochem?

Tue Feb 5 14:52:12 EST 2013

Hey Kyle,

I'm trying to get this to work, but I'm stuck getting data into the DB. When I do a CSV import, I end up with only the final molecule in the database, as if the importer isn't generating a new ID for each row and is instead overwriting the same record for every molecule in the CSV file. I've attached my CSV in case there were any conversion problems from the SDF. I used the exact same string ("mass tpsa vabc rotatable-bonds") for the descriptor names as you suggested, so if other names should have been listed, please tell me. (And, BTW, what is the File->Add New Data menu option supposed to do?)
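
One way to rule out the CSV itself before looking at the importer is to count its data rows and check whether any row key repeats. The sketch below is only an illustration: it assumes the file has a header row and that the first column identifies the molecule, neither of which is confirmed for the format MongoChem expects. The sample text and its column names are made up, and summarize_csv is a hypothetical helper, not part of MongoChem or chemkit.

```python
import csv
import io
from collections import Counter


def summarize_csv(text, key_column=0):
    """Return (number of data rows, keys that occur more than once).

    Assumes the first row is a header and that `key_column` holds
    whatever value is meant to identify each molecule.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the assumed header row
    keys = [row[key_column] for row in reader if row]  # ignore blank lines
    counts = Counter(keys)
    duplicates = [key for key, n in counts.items() if n > 1]
    return len(keys), duplicates


# Made-up sample; the real columns would come from the generated CSV.
sample = (
    "name,mass,tpsa\n"
    "mol-a,18.0,1.0\n"
    "mol-b,46.1,20.2\n"
    "mol-a,18.0,1.0\n"
)
rows, dups = summarize_csv(sample)
print(rows, dups)
```

If the row count matches the number of molecules in the SDF and no key repeats, the overwriting is more likely happening on the import side than in the file.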

Thanks,
-Eric

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.csv
Type: text/csv
Size: 1738 bytes
Desc: not available
URL: <http://public.kitware.com/pipermail/openchemistry-developers/attachments/20130205/a1587915/attachment-0002.csv>
-------------- next part --------------


========================

For your notes, here is my experience building the chemkit Python bindings (FYI: I'm on Mac OS X 10.8.2):

I don't know how to force the chemkit python bindings to be generated in the superbuild, so I tried to build from a separate source tree.

First I tried the 0.1 stable release. I already had Qt and CMake, and I installed Eigen and Boost (1.52) with Homebrew. This built fine, but when I turned on the Python bindings I had to add ${PYTHON_LIBRARY} to the target_link_libraries() command (like I see you fixed in the git master). Even then, I couldn't do "import chemkit" with the library that was generated.

Next, I tried the git master, but I ran into some Boost linking errors. (BTW, I run into a huge number of these trying to do the openchemistry superbuild, which for now I got around by manually modifying linker commands…) I fixed this with the following change:

--- a/src/chemkit/CMakeLists.txt
+++ b/src/chemkit/CMakeLists.txt
@@ -8,7 +8,7 @@ find_package(Boost COMPONENTS system filesystem thread REQUIRED)
 
 # boost.thread in versions 1.50 and later require boost.chrono
 if(${Boost_VERSION} GREATER 104999)
-  find_package(Boost COMPONENTS chrono REQUIRED)
+  find_package(Boost COMPONENTS chrono system filesystem thread REQUIRED)
 endif()
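
For reference, the 104999 threshold in the diff above relies on Boost's numeric version encoding, where the value is major * 100000 + minor * 100 + patch, so 1.50.0 becomes 105000. The short sketch below decodes such a number and repeats the same comparison in Python. The function names are made up for illustration, and it assumes Boost_VERSION is reported in this numeric form, as the comparison in the CMakeLists.txt implies.

```python
def decode_boost_version(value):
    """Split a numeric Boost version (e.g. 105200) into (major, minor, patch)."""
    major = value // 100000
    minor = (value // 100) % 1000
    patch = value % 100
    return major, minor, patch


def needs_chrono(value):
    """Mirror the CMake test: if(${Boost_VERSION} GREATER 104999)."""
    return value > 104999


# Boost 1.52.0, the Homebrew version mentioned earlier, encodes as 105200.
print(decode_boost_version(105200), needs_chrono(105200))
```

So with Boost 1.52 the chrono branch is taken, which is why the find_package() call inside the if() block matters for this build.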

But I was still having trouble getting the Python bindings to work. Unfortunately, I'm pretty ignorant in this area. For the VTK Python wrapping, the build generates .so files (which I think are loadable modules rather than ordinary shared libraries), so I tried changing SHARED to MODULE in the chemkit-python add_library() command:

--- a/bindings/python/CMakeLists.txt
+++ b/bindings/python/CMakeLists.txt
@@ -66,7 +66,7 @@ add_custom_command(OUTPUT chemkit.cpp
                    DEPENDS ${SOURCES})
 
 # build library
-add_library(chemkit-python SHARED chemkit.cpp)
+add_library(chemkit-python MODULE chemkit.cpp)
 set_target_properties(chemkit-python PROPERTIES OUTPUT_NAME "chemkit" PREFIX "")
 target_link_libraries(chemkit-python ${CHEMKIT_LIBRARIES} ${PYTHON_LIBRARIES})
 
This worked: after a "make install" I can do "import chemkit" from Python.
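
A likely explanation for why the SHARED to MODULE change helps on OS X is that CMake names SHARED libraries with a .dylib suffix there by default and MODULE libraries with .so, while Python only looks for extension modules under a fixed set of suffixes that does not include .dylib. The sketch below shows how to check both things. It assumes a current Python 3 (the importlib calls used here did not exist in this form in Python 2), and the helper names are made up for illustration.

```python
import importlib.machinery
import importlib.util


def importable(name):
    """True if Python can locate a top-level module called `name`."""
    return importlib.util.find_spec(name) is not None


def loadable_as_extension(filename):
    """True if `filename` ends with a suffix Python accepts for extension modules."""
    return any(filename.endswith(suffix)
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)


# Suffixes this interpreter will consider when importing compiled modules.
print(importlib.machinery.EXTENSION_SUFFIXES)

# A library named chemkit.dylib would not be picked up by "import chemkit".
print(loadable_as_extension("chemkit.dylib"))

# Reports whether a chemkit module is currently visible on sys.path.
print(importable("chemkit"))
```

Running this from the interpreter used for the build is a quick way to confirm that the installed file name and location match what "import chemkit" will search for.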


On Feb 2, 2013, at 9:14 AM, Kyle Lutz <kyle.lutz at kitware.com> wrote:

> Hi Eric,
> 
> Thanks for trying out MongoChem. Any feedback on building and/or
> running MongoChem would be greatly appreciated!
> 
> Currently, MongoChem supports loading molecular data from SDF files
> and CSV files. To initially set up our database I used one of the
> PubChem SDF files which can be downloaded from
> ftp://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/. Each
> one contains about 25,000 molecular structures along with a few
> descriptors. Once those are loaded I used the
> "import-obabel-diagrams.py" script in the descriptors directory to
> import the 2D images (soon the image import will be done
> automatically).
> 
> To generate CSV files I use the "make-descriptors-csv-file.py" script
> which when given a molecule file (e.g. one of the PubChem SDF files)
> and a list of descriptor names (e.g. "mass tpsa vabc rotatable-bonds")
> will output a file that can then be used with MongoChem's CSV
> importer.
> 
> Also, we are currently working on adding more data importers. Can you
> let us know what type of data files you have? Furthermore, if you
> could provide any sample data sets that would be a great help in
> getting them to work smoothly with MongoChem.
> 
> Let me know if you have any other questions or feedback.
> 
> Cheers,
> Kyle