[Openchemistry-developers] Interested in Machine learning applied to parsing computational chemistry output

Karol Langner karol.langner at gmail.com
Sun Mar 11 01:14:15 EST 2018


Hi Yue,

This project is definitely about training an algo to extract data from
the text output files, not replacing the calculation method in programs.
There has been a bunch of work on the latter in the literature, and
although I am not averse to a project in that area it's not what I had in
mind. If you would like to propose a project that would predict molecular
properties or whatever, you're welcome to do that, but keep in mind it
should be a coding project, not a research-only thing.

If you would like to keep to the original intent for this project, then
it's like you said, using the text of the output file as input and the data
cclib extracts as output. In other words, can we train a model to extract
the data we want without writing a parser? As an example, we'd like to just
feed the logfile and get number of atoms, SCF energies, etc. To be honest,
I'm not sure what approach would be best or that anything would work well,
but the project is about exploring what can be done. We would expect you to
suggest several models and a procedure to evaluate them.
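A minimal sketch of one possible evaluation procedure follows. It assumes each example pairs a logfile's text with the reference values cclib extracts from it (for instance via cclib.io.ccread(path) and its natom and scfenergies attributes). Any candidate model is treated as a function from text to a dict of attributes, so different approaches can be scored the same way. The names evaluate and attribute_correct, the tolerance, and the toy data are hypothetical choices for illustration.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical types: a model maps raw logfile text to predicted attributes,
# and an example pairs logfile text with cclib-style reference attributes.
Model = Callable[[str], Dict[str, object]]
Example = Tuple[str, Dict[str, object]]


def attribute_correct(name: str, predicted: object, reference: object,
                      abs_tol: float = 1e-6) -> bool:
    """Check one predicted attribute against its reference value."""
    if predicted is None:  # the model did not produce this attribute
        return False
    if name == "scfenergies":
        # Same number of energies, each equal within an absolute tolerance.
        pred = list(predicted)
        ref = list(reference)
        return len(pred) == len(ref) and all(
            math.isclose(p, r, rel_tol=0.0, abs_tol=abs_tol)
            for p, r in zip(pred, ref))
    # Integer attributes such as natom must match exactly.
    return predicted == reference


def evaluate(model: Model, examples: Sequence[Example]) -> Dict[str, float]:
    """Return, per attribute, the fraction of examples extracted correctly."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for text, reference in examples:
        prediction = model(text)
        for name, ref_value in reference.items():
            totals[name] = totals.get(name, 0) + 1
            if attribute_correct(name, prediction.get(name), ref_value):
                hits[name] = hits.get(name, 0) + 1
    return {name: hits.get(name, 0) / totals[name] for name in totals}


# Usage with toy data and a trivial model that always guesses three atoms.
examples = [
    ("...water logfile text...", {"natom": 3, "scfenergies": [-2079.1]}),
    ("...methane logfile text...", {"natom": 5, "scfenergies": [-1102.6]}),
]
print(evaluate(lambda text: {"natom": 3}, examples))
# {'natom': 0.5, 'scfenergies': 0.0}
```

Scoring per attribute rather than per file keeps it visible which quantities a model handles well. Holding out entire programs (training on one package's logfiles, testing on another's) would be one way to check whether a model generalizes beyond the formats it has seen.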

HTH,
Karol


On Fri, Mar 9, 2018 at 6:06 AM, Yue Wang <ywang337 at jhu.edu> wrote:

> Dear Karol,
>
>
> Thanks for your message; it helps a lot!
>
>
> I find cclib is really a huge project and I have some new questions about
> the machine learning part.
>
>
> First, the issues listed on GitHub are all about previous projects (i.e.,
> debugging, maintenance). I've downloaded cclib and read some of the
> Python scripts, but I think this kind of work needs deep knowledge of
> target programs like ADF or ORCA. Thus, it might be difficult for me to
> solve them, and I was wondering if you could list some machine learning
> tasks for a student like me to work on.
>
>
> As you mentioned before, this machine learning project is brand new, so
> what are your expectations for it? (I've visited the GSoC website, but the
> ideas list comes without details.) To be more specific, what's the
> predictor, and what's the target? I think we cannot avoid reading the
> output files if we want to parse the logfiles. My plan is to use the
> numbers in the output files as predictors and the numbers in the parsed
> files as targets. Furthermore, we could even use machine learning
> techniques to replace the calculation methods.
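One way to turn the predictor/target idea into training data, sketched under assumptions: treat every number in the logfile as a token and label it with the cclib attribute whose reference value it matches (a form of distant supervision). A classifier could then learn, from each token's context, which numbers are targets. The function label_numbers, the regular expression, and the tolerance are hypothetical. Note that cclib reports scfenergies in eV while many programs print hartree, so the sketch converts before matching.

```python
import re
from typing import List, Tuple

# Numbers such as 3, -76.4089533 or 1.5D-03, not preceded by a letter, digit,
# underscore or dot (so the "3" in "B3LYP" is skipped).
NUMBER = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d*)?(?:[EeDd][-+]?\d+)?")
HARTREE_TO_EV = 27.211386245988  # CODATA 2018


def label_numbers(text: str, natom: int, scf_energies_ev: List[float],
                  tol_hartree: float = 1e-5) -> List[Tuple[str, str, str]]:
    """Return (previous word, number token, label) for each number in text."""
    # cclib reports scfenergies in eV; convert back to the hartree values
    # that programs typically print.
    scf_hartree = [e / HARTREE_TO_EV for e in scf_energies_ev]
    rows = []
    for line in text.splitlines():
        for match in NUMBER.finditer(line):
            token = match.group()
            value = float(token.replace("D", "E").replace("d", "e"))
            words = line[:match.start()].split()
            previous = words[-1] if words else "<bol>"
            if any(abs(value - e) < tol_hartree for e in scf_hartree):
                label = "scfenergy"
            elif token.isdigit() and int(token) == natom:
                label = "natom"
            else:
                label = "O"  # not a target value
            rows.append((previous, token, label))
    return rows


log = (" NAtoms=      3 NActive=      3\n"
       " SCF Done:  E(RB3LYP) =  -76.4089533     A.U. after   10 cycles")
for row in label_numbers(log, natom=3,
                         scf_energies_ev=[-76.4089533 * HARTREE_TO_EV]):
    print(row)
```

The labels are noisy by design: in the toy log above, the NActive= count is also tagged natom because it happens to equal the atom count, so part of the modelling problem is learning from context which match is the real one.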
>
>
> Thanks again and I'm looking forward to hearing from you soon!
>
>
> Best,
>
> Yue
>
>
>
>
>
>
> ------------------------------
> *From:* Karol Langner <karol.langner at gmail.com>
> *Sent:* Sunday, February 25, 2018 9:49:02 AM
> *To:* Yue Wang
> *Cc:* openchemistry-developers public.kitware.com; cclib-dev List
> *Subject:* Re: Interested in Machine learning applied to parsing
> computational chemistry output
>
> Hi Yue,
>
> CC'ing relevant mailing lists.
>
> Nice to hear from you. To get started, I would recommend taking a look
> around the cclib repository (https://github.com/cclib/cclib) and docs (
> https://cclib.github.io/how_to_parse.html). The docs are not perfect, but
> give a reasonable overview (of course, please tell us what to improve). If
> you feel like digging into some contributions, feel free to send a pull
> request on GitHub or to peruse our current list of bugs and issues (
> https://github.com/cclib/cclib/issues).
>
> As far as the ML project is concerned, it would be somewhat more research-y
> than the other projects, simply because we haven't really tried to do this
> before. We would expect a student to independently survey what approaches
> would be reasonable, and define the metrics/assumptions that can be applied.
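As one deliberately simple baseline to compare more sophisticated approaches against, here is a sketch that labels each number by majority vote over the word immediately preceding it. It is trained on (previous word, token, label) rows, such as those obtained by matching numbers in a logfile against cclib's reference values. PreviousWordBaseline and numeric_tokens are hypothetical names, and one word of context is an assumption chosen for brevity.

```python
import re
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Optional, Tuple

# Hypothetical number pattern: skips digits inside words like "B3LYP".
NUMBER = re.compile(r"(?<![\w.])[-+]?\d+(?:\.\d*)?(?:[EeDd][-+]?\d+)?")


def numeric_tokens(text: str) -> List[Tuple[str, str]]:
    """Return (previous word, number token) for each number in text."""
    pairs = []
    for line in text.splitlines():
        for match in NUMBER.finditer(line):
            words = line[:match.start()].split()
            pairs.append((words[-1] if words else "<bol>", match.group()))
    return pairs


class PreviousWordBaseline:
    """Label each number by majority vote over the word just before it."""

    def __init__(self) -> None:
        self.counts: DefaultDict[str, Counter] = defaultdict(Counter)

    def fit(self, rows: Iterable[Tuple[str, str, str]]) -> "PreviousWordBaseline":
        for previous, _token, label in rows:
            self.counts[previous][label] += 1
        return self

    def predict_label(self, previous: str) -> str:
        if previous not in self.counts:
            return "O"  # unseen context: assume the number is not a target
        return self.counts[previous].most_common(1)[0][0]

    def extract(self, text: str) -> Dict[str, object]:
        natom: Optional[int] = None
        energies: List[float] = []
        for previous, token in numeric_tokens(text):
            label = self.predict_label(previous)
            value = float(token.replace("D", "E").replace("d", "e"))
            if label == "natom" and natom is None and value.is_integer():
                natom = int(value)  # keep the first atom count found
            elif label == "scfenergy":
                energies.append(value)
        result: Dict[str, object] = {"scfenergies": energies}
        if natom is not None:
            result["natom"] = natom
        return result


train = [("NAtoms=", "3", "natom"), ("=", "-76.4089533", "scfenergy"),
         ("after", "10", "O")]
model = PreviousWordBaseline().fit(train)
print(model.extract(" NAtoms=      5 NActive=      5\n"
                    " SCF Done:  E(RB3LYP) =  -40.5183892     A.U. after   9 cycles"))
# {'scfenergies': [-40.5183892], 'natom': 5}
```

A single preceding word is often ambiguous (many programs print energies after a bare "="), so wider context windows or sequence models would be natural next candidates to evaluate against this baseline.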
>
> Hope that helps somewhat, don't hesitate to ask more questions.
>
> - Karol
>
>
> On Sat, Feb 24, 2018 at 4:48 AM, Yue Wang <ywang337 at jhu.edu> wrote:
>
> Hi Karol,
>
>
> I am a student at Johns Hopkins University and I am interested in your
> project idea: Machine learning applied to parsing computational chemistry
> output.
>
> I have experience with Python and machine learning, and have participated
> in Kaggle competitions and UW's Data Science Incubator program. Also, I
> worked with Prof. Xiao Gu during my undergrad doing DFT calculations and
> participated in a project exploring the alkali-resistance mechanism of a
> Hollandite deNOx catalyst, which was published in Environ. Sci. Technol. in
> 2015.
>
> But I'm new to open source projects and I don't know how to work with cclib
> to make a contribution. Could you give me some guidance?
>
> Thanks!
>
> Best,
> Yue
>
>
>
>