[Openchemistry-developers] Some queries regarding cclib projects

Kunal Sharma ks05111996 at gmail.com
Sun Mar 19 06:41:23 EDT 2017


Hello,

I have some last-minute questions before I submit my first proposal draft:

1. I have decided to combine another project with the following one:
Machine learning applied to Computational Chemistry data, but I am not sure
which other project would benefit the organisation most. *Basically, which
project should I combine with the ML one?*

2. I was also thinking of making a point group detection library in Python,
but then I came across Geoffrey's answer
<http://scicomp.stackexchange.com/questions/135/how-does-one-determine-the-point-group-of-a-molecule>
where he mentioned that this had already been added to Avogadro, so I will be
dropping this idea.

I will send a link to my first draft by *20th March, 4:30 AM GMT.*

Regards,
Kunal Sharma

On Tue, Mar 14, 2017 at 1:47 AM, Karol Langner <karol.langner at gmail.com>
wrote:

> Geoff's and Adam's comments are spot on. I would add that both these ideas
> are on the more challenging/advanced side (which also makes them interesting
> in my mind).
>
> The first because, as Geoff mentioned, it is more of a research project,
> requiring some scouting of what approaches might be appropriate for the kind
> of input/output cclib deals with. I don't have any specific suggestions
> here, since I haven't thought about the technical aspects much, but it would
> definitely involve some combination of supervised learning and constraints
> based on prior knowledge (relationships in the output space, like atomic
> charges summing up to the molecular charge). The ideal end state would be
> taking output from a program we don't support (like MPQC) and getting
> reasonable attribute coverage. I don't think this is unreasonable, since the
> outputs of all programs have many things in common.
>
> The second thing you listed is a design task in addition to coding.
> Currently, cclib's parsers have a simple design: a few helper methods, but
> mostly one gigantic parse method that goes through the file
> incrementally. Refactoring a single parser is probably a good way to start
> coding and testing ideas. But what I think we're really after here is a
> design concept. How should the parsers be structured so the code is more
> modular? What steps should we take to make them easier to maintain, test,
> and extend?
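One possible answer to the modularity question, sketched under assumptions: instead of one large parse method, each section of the output file gets its own small handler, registered with a decorator and keyed on a trigger string, while the parser still reads the file once, line by line. `ModularParser`, `block` and the handler names are hypothetical and are not cclib's API; the dictionary keys merely borrow cclib attribute names, and the sample text is made up, loosely modelled on Gaussian output.

```python
class ModularParser:
    """Hypothetical parser skeleton: one small handler per output section."""

    def __init__(self):
        self.blocks = []  # (trigger, handler) pairs, checked in order

    def block(self, trigger):
        """Decorator that registers a handler for lines containing `trigger`."""
        def register(handler):
            self.blocks.append((trigger, handler))
            return handler
        return register

    def parse(self, lines):
        data = {}
        inputfile = iter(lines)  # shared iterator: handlers may consume lines
        for line in inputfile:
            for trigger, handler in self.blocks:
                if trigger in line:
                    handler(line, inputfile, data)
        return data


parser = ModularParser()


@parser.block("SCF Done")
def parse_scf_energy(line, inputfile, data):
    # The energy is the first token after the "=" sign.
    energy = float(line.split("=")[1].split()[0])
    data.setdefault("scfenergies", []).append(energy)


@parser.block("Standard orientation")
def parse_coordinates(line, inputfile, data):
    # Skip the four header lines, then read atom rows until a dashed separator.
    for _ in range(4):
        next(inputfile)
    atoms = []
    for row in inputfile:
        if row.strip().startswith("---"):
            break
        fields = row.split()
        atoms.append((int(fields[1]), tuple(float(x) for x in fields[3:6])))
    data.setdefault("atomcoords", []).append(atoms)


sample = """\
 Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic      Atomic             Coordinates (Angstroms)
 Number     Number       Type             X           Y           Z
 ---------------------------------------------------------------------
      1          8           0        0.000000    0.000000    0.117790
      2          1           0        0.000000    0.755453   -0.471161
      3          1           0        0.000000   -0.755453   -0.471161
 ---------------------------------------------------------------------
 SCF Done:  E(RB3LYP) =  -76.4089533     A.U. after    9 cycles
""".splitlines()

result = parser.parse(sample)
print(result["scfenergies"])  # [-76.4089533]
```

Because each handler is a plain function, it can be unit-tested on a short snippet of output, and supporting a new program means registering a new set of handlers rather than editing one monolithic method.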
>
>
> On Mon, Mar 13, 2017 at 11:07 AM, Adam Tenderholt <atenderholt at gmail.com>
> wrote:
>
>> Hi Kunal,
>>
>> Geoff covered everything pretty well. Other than providing a correct link
>> to our application guidelines (http://wiki.openchemistry.org/Applying_to_GSoC),
>> I'll just add that the project ideas on the wiki are just that: ideas. It's
>> up to you to write a proposal, so feel free to combine ideas into a project
>> you find exciting, with realistic milestones.
>>
>> Adam
>>
>>
>> On Mon, Mar 13, 2017 at 10:47 AM Geoffrey Hutchison <
>> geoff.hutchison at gmail.com> wrote:
>>
>>> Hi Kunal,
>>>
>>> Thanks for your message. I think Adam and/or Karol can comment more, but
>>> I'll give some in-line comments to your message.
>>>
>>> *1. Machine Learning applied to parsing computational chemistry output:* Since
>>> a parser is used to get a very specific output from a specific input, what do
>>> we expect from the final ML pipeline? Do we want it to get all the
>>> available data from an output file (like most, if not all, of the
>>> parameters mentioned in data.py)?
>>>
>>>
>>>
>>> Right. The question is whether it's possible to teach a ML model to find
>>> all available data mentioned in data.py. This is clearly more of a research
>>> project than some of the other ideas.
>>>
>>>
>>> 2. Refactoring parsers and implementing new parsers: I was looking into
>>> this and saw that you thought about an approach which utilized decorators
>>> and partial parsing of the file, but maybe it was dropped? Also, can you
>>> please provide a list of the parsers you would like to extend in cclib in
>>> this GSoC ...
>>>
>>>
>>> There are a lot of example files in the cclib data repository:
>>> https://github.com/cclib/cclib-data
>>>
>>> I think the idea here is that you would choose which parsers you'd want
>>> to refactor and/or add. There is, after all, no end to the number of
>>> computational packages.
>>>
>>>
>>> *I was thinking that given the overall duration of GSoC I would like to
>>> attempt to do more than one project (Combining two projects). What are your
>>> thoughts on this? Given the duration, would it be possible?*
>>>
>>>
>>>
>>> It depends a bit on the projects, but I can imagine these two projects
>>> could be blended (e.g., refactoring and adding new parsers while trying the
>>> ML approach).
>>>
>>> As for the application, here's a guide on the wiki:
>>> https://github.com/cclib/cclib-data
>>>
>>> We usually recommend students start a proposal with Google Docs (or
>>> something similar) and share with mentors/admins to get feedback.
>>>
>>> Hope that helps,
>>> -Geoff
>>>
>>>
>