[Openchemistry-developers] Some queries regarding cclib projects

Sun Mar 19 19:16:43 EDT 2017

Hi Kunal,

I think the ML project stands on its own and can be pretty broad depending
on how you plan things out. I also think it's pretty orthogonal to other
projects on the list.

On Sun, Mar 19, 2017 at 3:41 AM, Kunal Sharma <ks05111996 at gmail.com> wrote:

> Hello,
>
> I had some last minute queries before I submit my first proposal draft,
>
> 1. I have decided to club together another project with the following
> project: Machine learning applied to Computational Chemistry data but I am
> not sure which other project will benefit the organisation more. *Basically,
> what project should I club together with the ML one?*
>
> 2. I was also thinking of making a Point Group detection library in python
> but then I came across Geoffrey's answer
> <http://scicomp.stackexchange.com/questions/135/how-does-one-determine-the-point-group-of-a-molecule>
> where he mentioned that this was added to Avogadro. So I will be dropping
> this.
>
> I will send a link to my first draft by *20th March, 4:30 AM GMT.*
>
> Regards,
> Kunal Sharma
>
> On Tue, Mar 14, 2017 at 1:47 AM, Karol Langner <karol.langner at gmail.com>
> wrote:
>
>> Geoff's and Adam's comments are spot on. I would add that both these
>> ideas are on the more challenging/advanced side (which also makes them
>> interesting in my mind).
>>
>> The first because, as Geoff mentioned, it is more of a research project,
>> requiring some scouting of what approaches might be appropriate for the kind
>> of input/output cclib deals with. I don't have any specific suggestions
>> here, since I haven't thought about the technical aspects much, but it
>> would definitely involve some combination of supervised learning and
>> constraints based on prior knowledge (relationships in the output space,
>> like atom charges summing up to the molecular charge). The ideal end state
>> would be taking output from a program we don't support (like mpqc) and
>> getting reasonable attribute coverage. I don't think this is unreasonable,
>> since output from all programs has many things in common.
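
The prior-knowledge constraint mentioned above (atom charges summing to the molecular charge) could be used to filter whatever candidate values a learned extractor proposes. The following is only a minimal sketch under stated assumptions: the function name, the tolerance, and the candidate values are all hypothetical and are not taken from cclib or from any real program output.

```python
from typing import Dict, List, Sequence


def charges_consistent(atom_charges: Sequence[float],
                       molecular_charge: int,
                       tol: float = 1e-2) -> bool:
    """Return True when the partial atomic charges sum to the molecular charge.

    The tolerance is an assumed value; printed charges are rounded, so an
    exact comparison would reject valid data.
    """
    return abs(sum(atom_charges) - molecular_charge) <= tol


# Hypothetical candidates that an extractor might propose for a neutral
# water molecule. The numbers are illustrative, not real output.
candidates: Dict[str, List[float]] = {
    "charge_block": [-0.68, 0.34, 0.34],  # looks like partial charges
    "wrong_column": [8.0, 1.0, 1.0],      # atomic numbers picked up by mistake
}

# Keep only the candidates that satisfy the constraint.
plausible = [name for name, q in candidates.items()
             if charges_consistent(q, molecular_charge=0)]
print(plausible)
```

A check like this does not say which candidate is right, only which ones cannot be right, so it would complement a supervised model rather than replace it.
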
>>
>> The second thing you listed is a design task in addition to coding.
>> Currently, cclib's parsers have a simple design: there are some helper
>> methods, but each parser is mostly one gigantic parse method that goes
>> through the file incrementally. Refactoring a single parser is probably a
>> good way to start
>> coding and testing ideas. But what I think we're really after here is some
>> design concept. How should the parsers be structured so the code is more
>> modular? What steps should we take to make them easier to maintain, test
>> and extend?
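
One possible answer to the modularity question above, and to the decorator idea raised further down the thread, is to register small per-block handlers instead of growing a single parse method. This is a hypothetical sketch, not cclib's design or API: the `BlockParser` class, the `handles` decorator, the toy output format, and the dictionary keys are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List, Pattern, Tuple

# A handler receives the matching line, the shared line iterator, and the
# dictionary that collects parsed values.
Handler = Callable[[str, Iterator[str], Dict[str, object]], None]


class BlockParser:
    """Dispatch each line to the first registered handler whose pattern matches."""

    def __init__(self) -> None:
        self._handlers: List[Tuple[Pattern[str], Handler]] = []

    def handles(self, pattern: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for lines matching `pattern`."""
        compiled = re.compile(pattern)

        def register(func: Handler) -> Handler:
            self._handlers.append((compiled, func))
            return func

        return register

    def parse(self, lines: Iterable[str]) -> Dict[str, object]:
        data: Dict[str, object] = {}
        stream = iter(lines)
        for line in stream:
            for pattern, handler in self._handlers:
                if pattern.search(line):
                    # The handler may consume further lines from the stream.
                    handler(line, stream, data)
                    break
        return data


toy = BlockParser()


@toy.handles(r"Total charge\s*=")
def parse_charge(line: str, stream: Iterator[str], data: Dict[str, object]) -> None:
    data["charge"] = int(line.split("=")[1])


@toy.handles(r"^\s*Atomic charges")
def parse_atom_charges(line: str, stream: Iterator[str], data: Dict[str, object]) -> None:
    charges = []
    for row in stream:          # read rows until the blank line ending the block
        if not row.strip():
            break
        charges.append(float(row.split()[-1]))
    data["atomcharges"] = charges


# Made-up output format, used only to exercise the sketch.
sample = """Total charge = 0
Atomic charges
 1 O -0.68
 2 H  0.34
 3 H  0.34

Done
""".splitlines()

result = toy.parse(sample)
print(result)
```

Each handler can then be tested on its own with a few lines of input, which is the main maintenance gain over one long method; whether this scales to real output files is an open design question.
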
>>
>>
>> On Mon, Mar 13, 2017 at 11:07 AM, Adam Tenderholt <atenderholt at gmail.com>
>> wrote:
>>
>>> Hi Kunal,
>>>
>>> Geoff covered everything pretty well. Other than providing a correct
>>> link to our application guidelines
>>> (http://wiki.openchemistry.org/Applying_to_GSoC), I'll just add that the
>>> project ideas on the wiki
>>> are just that—ideas. It's up to you to write a proposal, so feel free to
>>> combine ideas so that you find the project exciting with realistic
>>> milestones.
>>>
>>> Adam
>>>
>>>
>>> On Mon, Mar 13, 2017 at 10:47 AM Geoffrey Hutchison <
>>> geoff.hutchison at gmail.com> wrote:
>>>
>>>> Hi Kunal,
>>>>
>>>> Thanks for your message. I think Adam and/or Karol can comment more,
>>>> but I'll give some in-line comments to your message.
>>>>
>>>> *1. Machine Learning applied to parsing computational chemistry output:*
>>>> Since a parser is used to get a very specific output from a specific
>>>> input, what is it that we expect from the final ML pipeline? Do we want it
>>>> to get all the available data from an output file (like most
>>>> (if not all) of the parameters mentioned in data.py)?
>>>>
>>>>
>>>>
>>>> Right. The question is whether it's possible to teach a ML model to
>>>> find all available data mentioned in data.py. This is clearly more of a
>>>> research project than some of the other ideas.
>>>>
>>>>
>>>> 2. Refactoring parser and Implementing new parsers: I was looking into
>>>> this and saw that you thought about an approach which utilized decorators
>>>> and partial parsing of the file, but maybe it was dropped? Also, can you
>>>> please provide a list of the parsers you would like to extend in cclib in
>>>> this GSoC ...
>>>>
>>>>
>>>> There are a lot of example files in the cclib data repository:
>>>> https://github.com/cclib/cclib-data
>>>>
>>>> I think the idea here is that you would choose which parsers you'd want
>>>> to refactor and/or add. There is, after all, no end to the number of
>>>> computational packages.
>>>>
>>>>
>>>> *I was thinking that given the overall duration of GSoC I would like to
>>>> attempt to do more than one project (Combining two projects). What are your
>>>> thoughts on this? Given the duration, would it be possible?*
>>>>
>>>>
>>>>
>>>> It depends a bit on the projects, but I can imagine these two projects
>>>> could be blended (e.g., refactoring and adding new parsers while trying the
>>>> ML approach).
>>>>
>>>> As for the application, here's a guide on the wiki:
>>>> https://github.com/cclib/cclib-data
>>>>
>>>> We usually recommend students start a proposal with Google Docs (or
>>>> something similar) and share with mentors/admins to get feedback.
>>>>
>>>> Hope that helps,
>>>> -Geoff
>>>>
>>>>
>>
>