[Openchemistry-developers] GSoC 2019 - Project Discussion

Karol Langner karol.langner at gmail.com
Fri Mar 22 12:22:21 EDT 2019


Hi Bhavay,

Here are my opinionated responses to your questions. Please take them with
a grain of salt :) Also CC'ing two mailing lists in case others would like
to chime in.

1) I think of this project as an "infrastructure component". That means
that it would result in code that others could use to more easily build a
service that searches the content. But maybe it is feasible and even more
reasonable to set up a whole server in the summer, as a prototype. I'm not
sure about this and would love to hear your thoughts. But I do see those
two things as rather separate. I think the main issue is pinning down the
scope of the crawling and what content should be indexed. Note that there
are things like Chemspider out there, so it's worthwhile to see the
functionality those currently provide.

2) It's hard to call out one specific problem this is meant to solve. I see
this as more of a systematic gap in the comp chem sphere. Let's say I'm
researching how nitrogen behaves close to metal surfaces.... I can search
the web, literature, structural databases. But people must have done
thousands of computations already for nitrogen and metals... there's
currently no way to get that list and build on top of it. That really
should be the starting point. And that would be pretty cool to have. Now, I
don't think we can index all past computations, since people haven't
generally been uploading their raw outputs alongside scientific articles.
But something like this could help nudge people in that direction. There's
also a whole universe of possibilities if we had a publicly available
database of computational results at our disposal, like re-using orbitals
and other properties, learning on top of results, etc.

3) As a field I would say computational chemistry is VERY mature. It's been
around for decades and the main methods are well studied and entrenched. A
lot of current work happens around the edge cases and adds minor
improvements to existing numerical methods. Of course, there are some
really interesting new things going on, but most research is of the type "run
program X with method Y on system Z". Most people treat comp chem programs
as black boxes, which is why treating their raw results as research outputs
makes a lot of sense.

4) The main constraint is the GSoC timeline, so you need to think about how
to structure work in a way that fits into the space of a summer.

I don't have specific questions for you, but I would encourage you to also
look at other projects and how they relate to this one. For example, the
project around producing QCJSON output is really interesting. If you think
about how to make search results in this project available, you'll quickly
come to the conclusion that it should be some kind of common format or data
structure. This is exactly what QCJSON is trying to achieve. Of course, we
could just provide the Python API internal to cclib, but it makes a lot of
sense to rely on something self-contained.
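
To make the "common format" idea a bit more concrete, here is a minimal
sketch of what a self-contained search result could look like. Everything
in it is an assumption for illustration: the field names, the to_record and
search helpers, and the example URLs are hypothetical, and this is neither
the QCJSON schema nor the cclib API. It only shows results being passed
around as plain JSON rather than as Python objects.

```python
import json


def to_record(source_url, program, method, formula, attributes):
    """Bundle one parsed calculation into a plain, JSON-serializable dict.

    All field names here are hypothetical, not taken from QCJSON or cclib.
    """
    return {
        "source_url": source_url,
        "program": program,
        "method": method,
        "formula": formula,
        "attributes": attributes,
    }


def search(records, **criteria):
    """Return the records whose top-level fields equal every given criterion."""
    return [
        record
        for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Two made-up index entries; a real crawler would fill these from parsed files.
records = [
    to_record("https://example.org/n2_on_fe.log", "ProgramX", "MethodY",
              "Fe4N2", {"natom": 6}),
    to_record("https://example.org/water.log", "ProgramX", "MethodY",
              "H2O", {"natom": 3}),
]

hits = search(records, formula="H2O")

# Serializing to JSON keeps the result independent of any Python library.
payload = json.dumps(hits, indent=2, sort_keys=True)
print(payload)
```

A consumer in any language could read that payload back without knowing
anything about the code that produced it, which is the main argument for a
self-contained format here.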

Hope that helps a little,
Karol

On Thu, Mar 21, 2019 at 12:12 PM Bhavay Aggarwal <bhavay18384 at iiitd.ac.in>
wrote:

> Hi,
>
> I got a bit confused after reading the wiki but I think I understand now. I
> thought we would be calculating using cclib so that's gone. I think
> building the crawler first is necessary before we can think of any
> additional functionalities and for the time being, I can only think of web
> integration which also requires the completion of the major task first. I
> also wanted to ask some other questions related to the project which would
> be helpful with the proposal.
> 1) How does this project help OpenChem and its users?
> 2) Similar to the above but are there any problems we are trying to solve
> with this?
> 3) I am not very familiar with computational chemistry, so as a field is
> it still in its early stages for its main audience?
> 4) Since currently there is only a single project to work on do I need a
> rigid timeline?
>
> If there are any questions you would like me to answer, I would be more
> than happy to do so,
>
> Thanks
>