[Openchemistry-developers] Some queries regarding cclib projects

Karol Langner karol.langner at gmail.com
Mon Mar 13 16:17:56 EDT 2017


Geoff's and Adam's comments are spot on. I would add that both these ideas
are on the more challenging/advanced side (which also makes them interesting,
in my mind).

The first because, as Geoff mentioned, it is more of a research project,
requiring some scouting of which approaches might be appropriate for the kind
of input/output cclib deals with. I don't have any specific suggestions
here, since I haven't thought about the technical aspects much, but it would
definitely involve some combination of supervised learning and constraints
based on prior knowledge (relationships in the output space, such as atomic
charges summing to the molecular charge). The ideal end state would be taking
output from a program we don't support (like mpqc) and getting reasonable
attribute coverage. I don't think this is unreasonable, since output from all
programs has many things in common.
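
To make the prior-knowledge idea concrete, here is a minimal sketch of using
the charge-sum relationship to filter candidate values that some extraction
step might propose. The function names and the example numbers are made up for
illustration; none of this is existing cclib code.

```python
import math

def charges_consistent(atom_charges, molecular_charge, tol=1e-3):
    """Return True if the atomic charges sum to the net molecular charge."""
    return math.isclose(sum(atom_charges), molecular_charge, abs_tol=tol)

def filter_candidates(candidates, molecular_charge, tol=1e-3):
    """Keep only the candidate charge lists that satisfy the sum constraint."""
    return [c for c in candidates if charges_consistent(c, molecular_charge, tol)]

# Two hypothetical guesses for the atomic charges of a neutral, water-like molecule.
candidates = [
    [-0.8, 0.4, 0.4],  # sums to zero, so it is consistent with a neutral molecule
    [8.0, 1.0, 1.0],   # looks like atomic numbers mistaken for charges
]
plausible = filter_candidates(candidates, molecular_charge=0)
```

A check like this cannot say which column of numbers is the right one, but it
can cheaply rule out guesses that violate a known relationship.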

The second thing you listed is a design task in addition to coding.
Currently, cclib's parsers have a simple design: there are some helper
methods, but each parser is mostly one gigantic parse method that goes
through the file incrementally. Refactoring a single parser is probably a
good way to start coding and testing ideas. But what I think we're really
after here is some design concept. How should the parsers be structured so
the code is more modular? What steps should we take to make them easier to
maintain, test and extend?
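
As one possible starting point for that discussion, here is a rough sketch of
a more modular layout, in which small handler functions are registered with a
decorator and a short driver loop dispatches lines to them. Everything here
(the handles decorator, the handler names, the made-up output format) is
hypothetical and only meant to illustrate the idea, not how cclib works today.

```python
HANDLERS = []  # (trigger string, handler function) pairs, in registration order

def handles(trigger):
    """Register a function as the handler for lines containing `trigger`."""
    def register(func):
        HANDLERS.append((trigger, func))
        return func
    return register

@handles("Total charge")
def parse_charge(line, lines, data):
    data["charge"] = int(line.split()[-1])

@handles("SCF energy")
def parse_scf_energy(line, lines, data):
    data.setdefault("scfenergies", []).append(float(line.split()[-1]))

@handles("Atomic charges")
def parse_atom_charges(line, lines, data):
    # Consume the rows that follow the header, up to the first blank line.
    charges = []
    for row in lines:
        if not row.strip():
            break
        charges.append(float(row.split()[-1]))
    data["atomcharges"] = charges

def parse(text):
    """Go through the text once, handing each line to the first matching handler."""
    data = {}
    lines = iter(text.splitlines())
    for line in lines:
        for trigger, handler in HANDLERS:
            if trigger in line:
                handler(line, lines, data)
                break
    return data

# Made-up output in an invented format, used only to exercise the sketch.
SAMPLE = """\
Total charge = 0
SCF energy = -76.0
Atomic charges
O -0.8
H 0.4
H 0.4

SCF energy = -76.1
"""
result = parse(SAMPLE)
```

Each handler can be unit tested on a few lines of text in isolation, which is
hard to do with a single large parse method.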


On Mon, Mar 13, 2017 at 11:07 AM, Adam Tenderholt <atenderholt at gmail.com>
wrote:

> Hi Kunal,
>
> Geoff covered everything pretty well. Other than providing a correct link
> to our application guidelines
> (http://wiki.openchemistry.org/Applying_to_GSoC), I'll just add that the
> project ideas on the wiki are just that: ideas. It's up to you to write a
> proposal, so feel free to combine ideas into a project that you find
> exciting and that has realistic milestones.
>
> Adam
>
>
> On Mon, Mar 13, 2017 at 10:47 AM Geoffrey Hutchison <
> geoff.hutchison at gmail.com> wrote:
>
>> Hi Kunal,
>>
>> Thanks for your message. I think Adam and/or Karol can comment more, but
>> I'll give some in-line comments to your message.
>>
>> *1. Machine Learning applied to parsing computational chemistry output:*
>> Since a parser is used to get a very specific output from a specific input,
>> what is it that we expect from the final ML pipeline? Do we want it to get
>> all the available data from an output file (like most, if not all, of the
>> parameters mentioned in data.py)?
>>
>>
>>
>> Right. The question is whether it's possible to teach an ML model to find
>> all available data mentioned in data.py. This is clearly more of a research
>> project than some of the other ideas.
>>
>>
>> 2. Refactoring parser and Implementing new parsers: I was looking into
>> this and saw that you thought about an approach which utilized decorators
>> and partial parsing of the file, but maybe it was dropped? Also, can you
>> please provide a list of the parsers you would like to extend in cclib in
>> this GSoC ...
>>
>>
>> There are a lot of example files in the cclib data repository:
>> https://github.com/cclib/cclib-data
>>
>> I think the idea here is that you would choose which parsers you'd want
>> to refactor and/or add. There is, after all, no end to the number of
>> computational packages.
>>
>>
>> *I was thinking that given the overall duration of GSoC I would like to
>> attempt to do more than one project (Combining two projects). What are your
>> thoughts on this? Given the duration, would it be possible?*
>>
>>
>>
>> It depends a bit on the projects, but I can imagine these two projects
>> could be blended (e.g., refactoring and adding new parsers while trying the
>> ML approach).
>>
>> As for the application, here's a guide on the wiki:
>> https://github.com/cclib/cclib-data
>>
>> We usually recommend students start a proposal with Google Docs (or
>> something similar) and share with mentors/admins to get feedback.
>>
>> Hope that helps,
>> -Geoff
>>
>>