<div dir="ltr">Hi Kunal,<div><br></div><div>Geoff covered everything pretty well. Other than providing a correct link to our application guidelines (<a href="http://wiki.openchemistry.org/Applying_to_GSoC">http://wiki.openchemistry.org/Applying_to_GSoC</a>), I'll just add that the project ideas on the wiki are just that—ideas. It's up to you to write a proposal, so feel free to combine ideas so that you find the project exciting with realistic milestones.</div><div><br></div><div>Adam</div><div><br><div><br><div class="gmail_quote"><div dir="ltr">On Mon, Mar 13, 2017 at 10:47 AM Geoffrey Hutchison <<a href="mailto:geoff.hutchison@gmail.com">geoff.hutchison@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg">Hi Kunal,</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Thanks for your message. I think Adam and/or Karol can comment more, but I'll give some in-line comments to your message.</div><br class="gmail_msg"><div class="gmail_msg"></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><b class="gmail_msg">1. Machine Learning applied to parsing computational chemistry output: </b>Since parser is used to get a very specific output from a specific input, what is it that we expect from the final ML pipeline. Do we want it to get all the available data from an output file (like most </div><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg">(if not all) of the parameters mentioned in data.py)?</div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><br class="gmail_msg"></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg">Right. 
The question is whether it's possible to teach an ML model to find all available data mentioned in data.py. This is clearly more of a research project than some of the other ideas.</div><div class="gmail_msg"><br class="gmail_msg"></div><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><h3 style="background-image:none;background-position:initial;background-size:initial;background-repeat:initial;background-origin:initial;background-clip:initial;margin:0.3em 0px 0px;overflow:hidden;padding-top:0.5em;padding-bottom:0px;border-bottom:none;line-height:1.6" class="gmail_msg"><font size="2" class="gmail_msg">2. Refactoring parser and Implementing new parsers: <span style="font-weight:normal" class="gmail_msg">I was looking into this and saw that you thought about an approach which utilized decorators and partial parsing of the file, but maybe it was dropped? Also, can you please provide a list of the parsers you would like to extend in cclib in this GSoC ...</span></font></h3></div></div></div></blockquote><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">There are a lot of example files in the cclib data repository: </div><div class="gmail_msg"><a href="https://github.com/cclib/cclib-data" class="gmail_msg" target="_blank">https://github.com/cclib/cclib-data</a></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">I think the idea here is that you would choose which parsers you'd want to refactor and/or add. 
There is, after all, no end to the number of computational packages.</div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><br class="gmail_msg"></div><br class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div dir="ltr" class="gmail_msg"><div class="gmail_msg"><b class="gmail_msg">I was thinking that given the overall duration of GSoC I would like to attempt to do more than one project (Combining two projects). What are your thoughts on this? Given the duration, would it be possible?</b><br class="gmail_msg"></div></div></div></blockquote><br class="gmail_msg"></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">It depends a bit on the projects, but I can imagine these two projects could be blended (e.g., refactoring and adding new parsers while trying the ML approach).</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">As for the application, here's a guide on the wiki:</div><div class="gmail_msg"><a href="https://github.com/cclib/cclib-data" class="gmail_msg" target="_blank">https://github.com/cclib/cclib-data</a></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">We usually recommend students start a proposal with Google Docs (or something similar) and share with mentors/admins to get feedback.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Hope that helps,</div><div class="gmail_msg">-Geoff</div><br class="gmail_msg"></div></blockquote></div></div></div></div>