Author: Dr Anthony Nash

I am a Senior Research Scientist at the University of Oxford and a Research Member of Kellogg College, one of the University’s 39 constituent colleges. Prior to this, I was a postdoc for three and a half years in the Department of Chemistry, UCL, a PhD student for four years at Warwick University, and a Software Engineer with Motorola before then. I specialise in computational chemistry, clinical and drug epidemiology, structural biology and all things where computer science meets the chemical-life sciences.

Science Outreach Servers – Instagram

November 16, 2021 Dr Anthony Nash

Please follow the Science Outreach Servers Instagram account: follow here.

For the price of one cup of coffee, you can help support Science Outreach Servers and its mission to help students and researchers from developing/low-income nations.

Please follow this link to show your support.

With the support from very kind donors, Science Outreach Servers is hoping to be free for one year before we move across to a sustainable model, for example, science and technology grant funds or opening up a couple of machines for paying clients to then offset the running costs for those in great need of high-performance machines.

#coffee #computers #education #educationforall #equality #science #technology #hpc #community #poverty #developingnations #chemistry #simulations #bioinformatics #biology #physics #biophysics #STEM #academia #research #university

Science Outreach Servers #1

October 16, 2021 Dr Anthony Nash

Finally. A name.

I’ve settled on the name “Science Outreach Servers”. Thank you to those who came forward with suggestions; you really helped me come to a final decision. I have a domain name and an email address. The website will be live soon.

Donations

All donations are hugely appreciated. The funds will go towards the following:

Domain name and hosting (I will cover the costs of the web development)
NAS HDD for the new NAS server. I’m hoping to provide up to 12TB of /home space.
2 × 2080 Ti series graphics cards: one for a server without a GPU and one for a new server I’m building.
2 × Intel Xeon E3 chips, motherboard and RAM, and a PSU (I have a stock of 1TB HDDs for workspace and a spare case).
Some of the electricity (most of it I will cover).
Second-hand 45W solar panel with battery. I’m hoping to run some of the case fans with solar energy.

Please CLICK HERE to make a donation or share on your social media platform of choice.

Once I have sufficient resources and a user base, I will begin applying for funding.

The first user accounts.

I’m very pleased to announce that I have my first two users! When the website is up there will be a Projects page. For the time being, I will allocate accounts directly to machines. Users will all log in to the same node and from there SSH to an assigned node. Jobs will be submitted using tmux. This is not a long-term arrangement. When I’m confident with the SLURM setup, users will move across to a centralised job submission process, running from the login node.

Life Science Outreach Grid (update #3)

October 10, 2021 Dr Anthony Nash

Lots to update. Firstly, I’m quite unwell, so I’m going to dump this out onto the keyboard with little consideration to how it reads. Updates are as follows:

Website:
I have a domain name (.org), an email address and I’ve secured hosting with my kind web developer. The actual web design is under pen-and-paper construction with the aim of starting sometime next week.

Upgrade-project to storage space:
I’m so grateful to my closest of friends, James Drinkwater, for gifting me his 2008 gaming PC. It’s perfect to run as an Ubuntu-NAS server (and possibly the login node, which is currently a Raspberry Pi Model 2). I’m looking at buying two 8TB drives.
The NAS drives will store user data, so, unlike the internal drives on each node, I’ll set the HDD spin-down to 5 mins to keep the power consumption as low as possible.

Power consumption:
The UK is going through an energy crisis with electricity bills alone having trebled since last year. To mitigate this rise in costs I will be analysing the power consumption of all nodes and making changes where I can. This will probably include: reducing the time until HDD spin-down; lowering RAM voltage on machines dedicated to non-memory-intensive tasks; switching off on-board graphics, sound, USB ports, serial ports, etc. Some nodes, e.g., those with a lot of RAM but low CPU/GPU speeds, will probably power on and off as needed.
I am also interested in running all case fans off solar energy. However, until I find a generous supplier of solar panels, this option is too expensive. Essentially, a typical workstation case fan runs anywhere between 3 W and 24 W (12 V). Before you know it, you’re looking at a lot of energy consumed by case cooling alone. A brief internet search suggests that most free-standing solar panels would set me back around £150 to power several cases, and that’s without an essential battery pack.
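To put the fan figures into perspective, here is a rough back-of-the-envelope sketch. It assumes the fans run continuously all year, and the number of nodes and fans per node are hypothetical values chosen only for illustration; only the 3 W and 24 W figures come from the text above.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float, count: int = 1) -> float:
    """Energy used per year by `count` devices drawing `watts` continuously."""
    return watts * count * HOURS_PER_YEAR / 1000.0


# Hypothetical setup: four nodes with three case fans each.
fans = 4 * 3
low = annual_kwh(3, fans)    # every fan at the 3 W end of the range
high = annual_kwh(24, fans)  # every fan at the 24 W end of the range
print(f"{fans} fans: {low:.0f} to {high:.0f} kWh per year")
```

Multiplying the result by a unit price per kWh gives a rough annual cost for cooling alone.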

Network Access:
This is a tough one. I’m running the internet off a residential Virgin Media broadband package with a VM Hub 3.0. The router does not allow user-configured dynamic DNS settings. Sadly, with that out of the question, I don’t think I can use freedns. Unless I can find an alternative DDNS supplier, I might need to settle for Dyn (Oracle) at the costly sum of $50 a year.

Donations:
Any contributions are warmly welcomed. Here’s a link to the GoFundMe page. Also, if you or anyone you know is getting rid of old hardware please get in touch. I’m always happy to cover the courier fee.

Life Science Outreach Grid (update #2)

September 21, 2021 Dr Anthony Nash

Web site
I’ve enlisted the services of a local web developer. He’s sourced hosting, domain name and an email address, and we’ve run through a basic set of requirements. On top of that, a friend and colleague, Roshan Shrestha, has provided me with several excellent photos of the beautiful country of Nepal and his life there, in addition to some photos of the laptop that was the only resource he had to run a fairly small all-atom simulation; it was of an amyloid protein crossing a lipid-bilayer, part of the disease mechanism behind Alzheimer’s Disease.

Left-to-right: the countryside of Nepal, Roshan at a poster presentation, and an all-atom structure in VMD on a laptop.

The website will feature a user registration, project and resource proposal, resource and job allocation, and a wiki with a list of tutorials (those I’ve found the time to write), and external links to open software.

Funding
I am incredibly grateful for the latest donations. Your generous contributions will go towards a fourth server, more disk space, an Nvidia GTX 3080 Super graphics card, and the running costs for one year. The link to the GoFundMe page is here. I would be extremely grateful if you could share the link with others.

Infrastructure
I’ve looked at getting a broadband static IP address, but sadly these are only offered with business packages, which range between £35 – £50 a month. So instead, as a workaround, I will allow users to submit job requests online. That data is automatically emailed to an account, then read on the head node server by my software.
I’m considering having a second electricity smart meter installed. It’s quite a giant leap and creates quite a few challenges, but electricity will be the most significant expense. I am looking into solar power; more on that later.

Name
I still don’t have one yet. It will come to me.

On-going mission
There are now very few barriers to accessing fantastic scientific software packages and programs, and learning how to use them has never been easier. It doesn’t take much computing power to build and even energy minimise a biological simulation, to prepare a drug database for drug-binding calculations, or to download and perform quality control checks on whole-genome data. Performing simulations, free energy calculations, and GWAS calculations, however, requires a lot of computing power. Many universities in developing countries do not have access to onsite high-performance computing facilities, such as machines with Xeon processors and graphics cards. Meanwhile, access to cloud-based high-performance computing still incurs a high cost. I aim to build a high-performance cluster of workstations and permit students from developing countries to use the facilities to help with their scientific research. The project will run for one year as a trial.

Life Science Outreach Grid

September 15, 2021 / September 16, 2021 Dr Anthony Nash

Not a name; this is just a placeholder for now.

By now, some of you will know that I’m attempting to put together a small operation to help improve the educational provisions for university students in developing countries. I want to offer free high-performance computer time for the execution of computational chemistry, computational biophysics, and bioinformatics software to give those researchers some of the opportunities I was privileged to have simply because of where I was born.

This is going to be one of many blog posts documenting my progress. Eventually, it will be superseded by LSOG’s blog and website. But until then, this will do.

My progress thus far.

The GoFundMe page is at £140! Thank you so much for these very kind and generous donations. And, earlier today, I traded in my old MacBook Pro for £290. I have a further Macbook Pro, although it’s a 2009 model, so probably not worth much, some DVDs, spare hard disks, an unused SoundBlaster sound card, a Lenovo Ideapad (a tablet), and dozens of DVDs nobody needs! Finally, when I find the time to replace my car seats, I will have two Crossfire seats worth around £60. These will go towards helping the operation get up and running.

Any and all donations are massively appreciated:

https://www.gofundme.com/manage/help-computerscience-in-developing-countries#

I have also made a brief enquiry about having an additional electricity meter fitted in my home. That way, the running of the machines won’t affect my domestic bills.

I am also designing the nuts and bolts of a website with the help of a kind volunteer. WordPress will host the site, and it will include features such as a Science Software Tutorial Wiki and a User Portal where eligible applicants can submit project proposals. I would also like to have, right at the front, a breakdown of the social and economic barriers students face in developing countries. Also, I want to include a list of publications, conference abstracts and project proposals that have used this service.

My internet connection will require a static IP address if users are to SSH onto the login node. But that’s going to require a business broadband connection. So, for now, users will need to upload software run files via the website, which will send me an email with an attachment, and I’ll write some code to stick it into a SLURM queue.
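As a rough idea of what that glue code could look like, here is a minimal sketch using only Python’s standard library. It is not the code running on the head node: the function names, the spool directory and the partition name are hypothetical, and it assumes the raw email has already been fetched from the mailbox as bytes. It only builds the sbatch command; it does not run it.

```python
from email import message_from_bytes, policy
from pathlib import Path
from typing import List


def save_attachments(raw_email: bytes, spool_dir: Path) -> List[Path]:
    """Write every attachment in a raw email to spool_dir and return the paths."""
    msg = message_from_bytes(raw_email, policy=policy.default)
    spool_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        # Keep only the base name so a crafted filename cannot escape spool_dir.
        name = Path(part.get_filename() or "job.sh").name
        target = spool_dir / name
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved


def sbatch_command(script: Path, partition: str = "outreach") -> List[str]:
    """Build (but do not run) the sbatch call for one job script.

    The partition name is a hypothetical placeholder.
    """
    return ["sbatch", f"--partition={partition}", str(script)]
```

The returned list could be handed to subprocess.run once the script has been checked. Anything arriving by email should be treated as untrusted, so some validation of the attachment would be sensible before it reaches the queue.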

I’m excited to say I’ve found a cheap refurbished Nvidia 2080 Super 8GB graphics card. Once I have a few more funds I’ll add it to my collection of GPUs.

Final points

I’m nowhere near where I’d hoped to be. I’m juggling my commitment to our home, my place at Oxford, this project, any hopes of improving my fitness, and failing efforts at keeping up a social life with close friends. And to top it off, I’m finding life bloody hard. The last six months have been excruciating. My mental health is in a bad state, with levels of depression similar to what I experienced whilst working for a large American telecoms company almost twenty years ago; a meat grinder – young talented staff in, useless nervous wrecks out several years later. But I’m doing what I can to mitigate my depression, and this charitable operation is at the heart of keeping me motivated.

Hopefully, the next update will include an official name, a new GPU, and maybe even a web address (or at least a Twitter account)…

Finally, I’d like to point out an interesting publication concerning scientific research in Nepal:

https://www.sciencedirect.com/science/article/pii/S2405844020325949

How would you build a bulletproof protein-bilayer Gromacs protocol?

June 19, 2021 / June 20, 2021 Dr Anthony Nash

I am interested in learning how to circumvent potential structural issues when simulating a protein-bilayer (atomistic or coarse-grained) model. Each step, minimisation, equilibration and production, must operate without flaw or risk of crashing.

Motivation

I am writing software that automates the complete construction, equilibration, and production simulation of hundreds of CG protein-bilayer systems. Without automation, it would be far too time-consuming for any one person to undertake. But I have a problem: stopping a system from failing partway through the complete workflow (see below).

Potential structural issues include but are not limited to steric clashes after solvating a system, inserting a protein in a bilayer, and LINCS warnings/errors from a physically unrealistic system. To be clear, my basic protein-bilayer workflow, with accompanying thoughts, might include the following steps:

Workflow

Build protein-bilayer: All stages before an energy minimisation of the complete system, including bilayer construction, protein construction, inserting the protein across the bilayer, solvating the design, adding counter ions, removing water molecules inside the hydrocarbon lipid chains, etc. A coarse-grained model is a lot more forgiving in terms of protein and lipid placement.

For a CG protein-lipid bilayer system, I will use tools such as martinize and insane to build protein-lipid bilayer systems and encode protein models into CG representations. Then I solvate the system and remove all water molecules from inside the bilayer. Usually, I’ll take the Z-coordinate of the phosphorus atom (or corresponding CG bead) in the lipid furthest out from both bilayer leaflet planes and remove all water molecules found between the two Z-values.
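The water-removal step described above can be sketched as follows. This is a minimal illustration, not my production code: the Bead record is a hypothetical stand-in for whatever structure-file parser you use, the default names "W" and "PO4" assume Martini-style naming, and periodic boundary effects (a bilayer split across the box edge in Z) are ignored.

```python
from typing import List, NamedTuple, Tuple


class Bead(NamedTuple):
    resid: int
    resname: str
    name: str
    z: float  # Z-coordinate in nm


def bilayer_z_bounds(beads: List[Bead], phosphate_name: str = "PO4") -> Tuple[float, float]:
    """Lowest and highest Z of the phosphate beads, one extreme per leaflet."""
    zs = [b.z for b in beads if b.name == phosphate_name]
    if not zs:
        raise ValueError(f"no beads named {phosphate_name!r} found")
    return min(zs), max(zs)


def strip_buried_water(beads: List[Bead], water_resname: str = "W",
                       phosphate_name: str = "PO4") -> List[Bead]:
    """Drop every water residue with at least one bead between the two Z-values."""
    z_low, z_high = bilayer_z_bounds(beads, phosphate_name)
    buried = {b.resid for b in beads
              if b.resname == water_resname and z_low < b.z < z_high}
    # Remove whole residues so an atomistic water is never left half-deleted.
    return [b for b in beads
            if not (b.resname == water_resname and b.resid in buried)]
```

After removing molecules, the counts in the topology file need updating to match; that bookkeeping is left out here.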

Lipids can clash with the placement/orientation of an integral membrane protein, so I will usually remove those lipids nearest to the protein. Often this can leave a void that quickly fills with water. Suggestions to resolve this include adding position restraints to the water during the early stages of equilibration and running a short NVT equilibration simulation; the surface area of the lipids will increase as the unit cell stays fixed.

A coarse-grained model can self-assemble; however, this is not guaranteed, and I would always favour some degree of considered protein-bilayer construction over complete self-assembly. There are several automated techniques, including martinize, inflateGro and mdrun.

Energy minimisation: It shouldn’t need saying, but this is an essential step. It helps resolve steric clashes and yields a structure as close to its bonded equilibrium values as possible within the confines of the minimisation scheme (and the energy minimum you are inside). Coarse-grained systems can be a little more forgiving than atomistic systems. I’ve seen CG systems throw LINCS warnings and fail to converge, yet a short equilibration simulation with a small time-step can fix underlying problems.

Minimisation schemes include steepest descent, which is quick and efficient but can bounce around a local energy minimum indefinitely, and conjugate gradient, which is slow to begin with but a lot more efficient closer to an energy minimum.

NVT equilibration: Keep this as short as possible for atomistic simulations (no more than 50 ps at a dt of 0.002 ps). If left to evolve for too long, the bilayer can form pores, micelles and other artefacts. For relatively straightforward CG protein-bilayer systems, I will skip this stage. For complicated CG systems and any atomistic approach, I will usually place position restraints (1000 kJ mol⁻¹ nm⁻²) on all alpha-carbons. These position restraints help prevent the protein backbone from distorting under the effects of high velocities whilst allowing the side chains to adjust. I’ll use the Berendsen thermostat. (Note: You can use velocity rescaling instead. The use of Berendsen is an old habit of mine from the days before v-rescaling was introduced to Gromacs.)

NPT equilibration: This is where I introduce pressure to the system. I keep this relatively short (50 ps, maybe a little longer) with Berendsen pressure coupling in both CG and atomistic systems. In the complicated CG system and all atomistic systems, the position restraints stay on. After the simulation finishes, I use the velocities (.cpt file) as starting velocities for a further short NPT simulation whilst maintaining the position restraints. I then switch the temperature and pressure coupling to more appropriate schemes, e.g., velocity rescaling or Nosé-Hoover for temperature and Parrinello-Rahman for pressure. I repeat feeding preserved velocities into short NPT simulations whilst reducing the position restraints from 1000 to 100 and finally to 10 kJ mol⁻¹ nm⁻².
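The staged release of position restraints lends itself to automation. The sketch below only builds the command lines for each stage, feeding the previous stage’s structure and checkpoint into the next. The stage and file names are hypothetical, and the POSRES_FC macro is an assumption: it only has an effect if your topology’s position-restraint section is written in terms of that macro.

```python
from typing import Dict, List, Sequence


def restraint_stages(force_constants: Sequence[int] = (1000, 100, 10),
                     start: str = "npt_berendsen") -> List[Dict[str, object]]:
    """Chain short NPT runs, each restarting from the previous checkpoint."""
    stages: List[Dict[str, object]] = []
    previous = start  # hypothetical name of the initial Berendsen NPT run
    for fc in force_constants:
        name = f"npt_posres_{fc}"
        stages.append({
            "name": name,
            # Line to place in this stage's .mdp file (POSRES_FC is an assumption).
            "define": f"define = -DPOSRES -DPOSRES_FC={fc}",
            "grompp": ["gmx", "grompp", "-f", f"{name}.mdp",
                       "-c", f"{previous}.gro",   # coordinates from the last stage
                       "-r", f"{previous}.gro",   # reference positions for restraints
                       "-t", f"{previous}.cpt",   # preserved velocities
                       "-p", "topol.top", "-o", f"{name}.tpr"],
            "mdrun": ["gmx", "mdrun", "-deffnm", name],
        })
        previous = name
    return stages


for stage in restraint_stages():
    print(" ".join(stage["grompp"]))
    print(" ".join(stage["mdrun"]))
```

In an automated pipeline, each command’s exit status and log would be checked before moving to the next stage, so a failure stops the chain rather than propagating a broken structure.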

For simple CG systems, after running the Berendsen pressure coupling 50 ps simulation, I switch over the temperature and pressure coupling as mentioned above. Typically, you won’t need to systematically reduce position restraints as the CG secondary structure is predetermined.

Finally, as mentioned above, I will usually reduce the CG time-step down to 0.005 ps (from 0.025 ps) and run for 50 ps if the energy minimisation stage fails to converge or throws LINCS warnings.

NPT production run: By now, I’m using the correct pressure and temperature coupling, and all position restraints are gone. The system should be free of flaws and able to run indefinitely. Note: I’m not considering whether the system is equilibrated, nor am I considering when it is appropriate to conduct analysis; those are different questions.

Final thoughts

The above was a very brief and simplified Molecular Dynamics workflow. I’ve missed a lot, for example, water to bilayer/protein concentration, ion concentration, cleaning up PDB files, metal-ligand coordination (with a whole sidechain-ligand force field parameterisation), pressure and temperature coupling parameters, and all the other mdp parameters!

In terms of building a bulletproof Gromacs protocol, one in which you knew a computer was automating everything, what would you include to avoid error and human intervention?

June 15, 2021 Dr Anthony Nash

Determining transmembrane peptide oligomeric states using SASA calculations.

<UNDER CONSTRUCTION>

Objective: Determine the oligomeric state of four coarse grained helical transmembrane peptides in a model phospholipid bilayer system.

Background: I want to calculate the proportion of time four helices spend in each of the oligomeric states: monomeric, two dimers, two monomers and a dimer, a monomer and a trimer, and a 4mer. This analysis is part of a home-grown project running off two GPU server machines I built out of reclaimed parts. This is not work/university funded or supported. I’m hoping to fine-tune the sequence of amino acids of four helical transmembrane peptides using Evolutionary Computing to yield a desired oligomeric state. Part of this involves deriving a fitness function that yields a score determined by the proportion of time spent in a particular oligomeric state. Almost nine years ago I tried using a sequence of distance-based rules. As the system scales, the number of rules rapidly increases.
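The five states listed above are the integer partitions of four, which gives a feel for how the state space grows with more peptides. The sketch below enumerates those partitions and computes the simplest form of the fitness score, the fraction of frames spent in a target state. It assumes every frame has already been labelled with its state (for example by SASA or distance rules); that labelling step is the hard part and is not shown. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterator, Optional, Sequence, Tuple

State = Tuple[int, ...]  # oligomer sizes, largest first, e.g. (2, 1, 1)


def partitions(n: int, largest: Optional[int] = None) -> Iterator[State]:
    """Yield every way to split n peptides into oligomers, largest first."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for size in range(min(n, largest), 0, -1):
        for rest in partitions(n - size, size):
            yield (size,) + rest


def state_fraction(frames: Sequence[State], target: State) -> float:
    """Fraction of labelled frames spent in the target oligomeric state."""
    if not frames:
        return 0.0
    return Counter(frames)[target] / len(frames)


print(list(partitions(4)))
```

Here (1, 1, 1, 1) is the monomeric state, (2, 2) is two dimers, (2, 1, 1) is two monomers and a dimer, (3, 1) is a monomer and a trimer, and (4,) is the 4mer.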

Methods: I performed 100 NPT simulations, each 50 ns long, of a DPPC/POPC solvated CG model bilayer. Each simulation included four transmembrane peptides positioned at the vertices of a square (looking at the x-y plane) of 5 nm by 5 nm. All four peptides shared the same amino acid sequence, and the sequence was derived by randomly changing eight of the leucines in the sequence xxx-LLLLLLGLLLGLLLLLL-xxx (where xxx are polar/charged side chains to keep the peptide orientated as an integral membrane peptide). The motif GLLLG was used to encourage association. The sequence was checked to confirm that the DeltaG for protein insertion was negative.
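A minimal sketch of the sequence-generation step might look like this. The set of replacement residues is an assumption made purely for illustration, since the text above does not say which residues replace the leucines. The sketch also does not protect the three leucines inside the GLLLG motif, and the flanking xxx residues are left out.

```python
import random
from typing import Optional

CORE = "LLLLLLGLLLGLLLLLL"  # hydrophobic core from the text, flanks omitted
REPLACEMENTS = "AVIFMW"     # hypothetical pool of substitute residues


def mutate_leucines(core: str = CORE, n_changes: int = 8,
                    rng: Optional[random.Random] = None) -> str:
    """Replace n_changes randomly chosen leucines with residues from the pool."""
    rng = rng or random.Random()
    leucine_positions = [i for i, aa in enumerate(core) if aa == "L"]
    chosen = rng.sample(leucine_positions, n_changes)
    residues = list(core)
    for i in chosen:
        residues[i] = rng.choice(REPLACEMENTS)
    return "".join(residues)


print(mutate_leucines(rng=random.Random(2021)))
```

Passing a seeded random.Random makes a generated sequence reproducible, which helps when a simulation needs to be rebuilt later.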

I observed each of the simulations, discarding the first 150 steps. The rest of the simulation time was divided into the observations: monomeric, two dimers, two monomers and a dimer, a monomer and a trimer, and a 4mer. The Gromacs SASA analysis tool was used to calculate the solvent accessible surface area for each observation.

Results: There is a clear distinction between the SASA calculations for each of the oligomeric states, with the exception of “two dimers” and “a monomer and a trimer”.

From the SASA distributions it’s clear that a fitness function using SASA results alone is not enough, and a second metric is required to distinguish between the oligomeric states “two dimers” and “a monomer and a trimer”.

Conclusion: Early days! More to come.

Picking a programming language. A slight case of popularity.

February 17, 2021 Dr Anthony Nash

I like Python. I like it just as much as I ‘like’ every other programming language. If I had to ‘love’ a language, I would pick Java for its history during the 90s and the early public-days of the Internet, Prolog and LISP for their impact on Artificial Intelligence, and C++ for making the Object Orientated paradigm prevalent*.

When you put history and nostalgia aside, love and like for a language goes out the window. I am currently working on three areas of research. The first involves a considerable amount of statistics, so I picked R. The second project incorporates cheminformatics, calling external programs such as the quantum chemical suite ORCA, and a few neural networks. Joining up those tasks and having a good source of existing packages made Python the best choice. Finally, the third project involves some modelling of differential expressions and for that Matlab was a great choice.

Three projects, three languages. I didn’t pick those languages because I like them. I decided on those languages because I thought they were the most appropriate for the tasks at hand. All three had the necessary package/module support (why reinvent the wheel if it already exists) and I knew that I would be able to get community support if I needed it.

This next part might make me unpopular, but I hope you can find some truth in these words whilst I’ll try my best to avoid sounding stuck up. I dislike what I see creep out of universities – “the cult of Python”. Take Python, change it for another language, and apply the same concerns, and this scenario is the same. I am not attacking a programming language; I’m attacking the mindset behind picking a programming language.

With a vast amount of data becoming increasingly available, and solutions behind research aims requiring a multidisciplinary approach, writing code is steadily becoming desired, if not essential. However, these conditions put research scientists under a lot of pressure to produce results fast with almost no proper software engineering training. As a result, I’m seeing many early-career scientists and PhD students reach for what is popular. And what is popular at the moment? Python. Ten years ago it would be a different language and again for the same reason, popularity.

I don’t blame the researcher/student; I am merely reporting an observation and a concern. When I was doing my first degree in AI and Software Engineering back in 1998, universities were using Pascal as the language of choice for introducing programming to students. A couple of years later, universities dropped Pascal and Java took its place. Then C# was soon to replace Java as the new favourite. Now it’s Python. While on my PhD, our cohort was given a short course in R. Why R? It was the only language the lecturer knew, and it was popular in the small department at the time. My cohort had two mathematicians, two physicists, a software engineer/computer scientist, two biochemists and two chemists. Every Wednesday for an academic term, we had to attend tutorials on a language that wasn’t helpful to our project needs (some were doing wet lab work). However, most of my cohort continued to use R as a de facto language. It was like watching someone trying to fit a cube through a round hole.

Recently I saw an advertisement for a PhD and postdoc position in a life science/biochemistry department. One of the perks was “learn Python”. That’s not a perk. Learning a language is an excellent consequence of getting the science done, but it’s not a perk. You can pick up a Python book and start running through the tutorials over a weekend, so “learn Python” shouldn’t be a reason to take a job.

I found start-up companies to be the worst offenders. Usually staffed with researchers/developers straight out of university, development begins with whatever flavour of programming was taught at university at the time. One company I worked with was developing software solutions in Python using Jupyter Notebook. I have no problems with this on the surface, but it quickly became apparent that Python in a Notebook environment felt like putting together Lego whilst wearing boxing gloves! Again, this was not the language’s fault; picking a language inappropriate for the task was the fault.

There have been several tweets that have almost had me cry over my morning toast. The science researchers were promoting Python over several alternative languages using the syntax for one line of code to measure language complexity. They argued that Python looks more like English, so it should be easier to learn. Firstly, that’s an awful measurement to compare language complexities and nuances. It’s the first, and hopefully the last time I see this. Secondly, Python may read a bit like English, but is it going to be an appropriate language for many banks’ enterprise-level software solution? Probably not. Or as a quick prototype to calculate the similarity between drug compounds? Probably, yes. Am I going to design heat-seeking missiles in Python? Hell no! The choice of language is dictated by how complicated the task is and whether it fits within the programming language’s framework. Every general-purpose programming language is “Turing complete,” i.e., each programming language can be reduced down to the same fundamental parts and therefore one language can emulate another language. However, this isn’t an excuse to pick the popular language over what is needed to get the job done.

In conclusion, it’s February 2021, and we’re in the middle of a pandemic. People on Twitter have suggested learning a programming language if you’re stuck at home, unable to return to the lab any time soon. I wouldn’t bother learning a language unless you’ve got a real task to solve. Also, it’s a pandemic! Learning to survive is a lot more critical. Get your friend into Virtual Reality and meet up with them there. But suppose you are determined to grab a keyboard and compiler/interpreter. In that case, I believe you would gain a lot more transferable skills from learning computer science basics rather than the latest popular language.

Finally, I’ve been developing software for twenty years. I’m tired of it. A language is just a language; it’s not something I feel the need to champion. Your favourite language today will be a chapter in the history books of tomorrow.

*C++ and Java increased the popularity of Object Orientation design. However, in the early 1960s, it first made an appearance via the LISP programming language.

Autodock Raccoon (1.1) – a hack for a strange python problem.

December 24, 2020 Dr Anthony Nash

Introduction

I had hoped to calculate potential ligand binding sites and the top N drugs from the ZINC15 FDA dataset. For this I required:

AutoDockToolkit (ADT)
AutoGrid
AutoDock
AutoLigand
Raccoon (1.1) – this is not the same as Raccoon 2, which comes packed with MGLTools.

I’m running an Intel i9 38-core CPU, 128 GB of RAM, a 1 TB NVMe drive, and an NVIDIA Quadro 4000 GPU. I’m using Windows 10* (please see note below).

I was able to launch ADT, build all the grid maps for my receptor, and after performing AutoLigand, I refined the grid to cover the potential binding site. All was going well until I hit a road block when I tried performing a virtual scan using Raccoon(1.1).

The error

For the next step, I loaded up the original Raccoon and I selected the directory that contained the multipart mol2 file of my drug library. I then picked my receptor pdbqt file, the directory that contains my maps files and then finally, the default template parameter values. I selected the output directory and clicked “G E N E R A T E”. A python exception is immediately thrown and the error presented is:

Error: 1
<type 'exceptions.NotImplementedError'> Exception in Tk callback
  Function: <function TheFunction at 0x03FF9470> (type: <type 'function'>)
  Args: ()
Traceback (innermost last):
  File "C:\Program Files (x86)\MGLTools-1.5.7\lib\site-packages\Pmw\Pmw_1_3\lib\PmwBase.py", line 1747, in __call__
    return apply(self.func, args)
  File "C:\Program Files (x86)\MGLTools-1.5.7\raccoon.py", line 4573, in TheFunction
    prepareDPF(dpf_file, receptor, ligand, flex_res)
  File "C:\Program Files (x86)\MGLTools-1.5.7\raccoon.py", line 4022, in prepareDPF
    dm.write_dpf(dpf_filename, parameter_list, pop_seed)
  File "C:\Program Files (x86)\MGLTools-1.5.7\raccoon.py", line 3761, in write_dpf
    self.dpo.write42(dpf_filename, parm_list)
  File "C:\Program Files (x86)\MGLTools-1.5.7\lib\site-packages\AutoDockTools\DockingParameters.py", line 1556, in write42
    dpf_ptr.write( self.make_param_string('autodock_parameter_version'))
  File "C:\Program Files (x86)\MGLTools-1.5.7\lib\site-packages\AutoDockTools\DockingParameters.py", line 1236, in make_param_string
    raise NotImplementedError, "type (%s) of parameter %s unsupported" % (vt.__name__, param)
<type 'exceptions.NotImplementedError'>: type (unicode) of parameter autodock_parameter_version unsupported

The exception thrown is essentially a complaint about the text input from a Tk UI component and on further inspection (see highlight) you’ll notice the error was thrown when trying to parse the first docking parameter from the default docking template. I deleted this parameter and found that Python would throw the same exception but for the next (which was now the first) docking parameter. I tried removing all parameters, but Raccoon requires something in the docking textbox.

The solution

Leaving just a single line comment (check there is no white space beneath) tricks Raccoon into thinking you have specific docking parameters whilst a set of default parameters is always applied. When you check the corresponding log files you’ll see for yourself that a set of default docking parameters are always used even when not explicitly supplied. Of course, this is only a fix if the default parameters are appropriate.

One more problem

Although I was now able to run Raccoon on Windows 10, I still had one more issue to fix. Raccoon writes a RunVS.bat in the results target directory. The file contains a set of instructions to step into each small-molecule subdirectory and execute run.bat. Unfortunately, I couldn’t get the call to autodock4.exe inside each run.bat to find the executable. I opened raccoon.py, searched for autodock4.exe, and added the full path.
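If you would rather not edit raccoon.py, an alternative is to patch the generated run.bat files afterwards. This is a different approach from the one I took above, and the sketch below is untested against real Raccoon output: it assumes each run.bat refers to the executable by its bare name, and the function name is hypothetical.

```python
from pathlib import Path


def patch_run_bats(results_dir: Path, exe_path: str,
                   exe_name: str = "autodock4.exe") -> int:
    """Replace the bare executable name in every run.bat with a quoted full path."""
    patched = 0
    for bat in results_dir.rglob("run.bat"):
        text = bat.read_text()
        # Skip files already patched, or ones that never mention the executable.
        if exe_path in text or exe_name not in text:
            continue
        # Quote the path because "Program Files" contains a space.
        bat.write_text(text.replace(exe_name, f'"{exe_path}"'))
        patched += 1
    return patched
```

Run it once on the results directory before launching RunVS.bat; calling it a second time leaves already-patched files alone.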

With those two issues resolved, I used Raccoon to generate the 1400+ run.bat files and then executed the master RunVS.bat from the command line terminal.

Retrieving drug targets from clue.io (CMAP) using Python.

November 7, 2020 Dr Anthony Nash

Some basic code that retrieves all known targets for a specified drug using the clue.io (CMAP) API. There are better sites and services (PubChem and DrugBank) for target retrieval. However, I’ve yet to figure out how to do this with PubChem (I can do the reverse, all drugs for a target), and unless one wants to deal with a huge XML Schema, DrugBank is behind a paywall.

"""
For example: filter={"where":{"pert_iname":"Loperamide"}}        
"""
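import json
from typing import List, Optional

import requests

# Assumption: the original ran inside a class that supplied self.url and
# self.headers. The endpoint and header below are placeholders; check them
# against your own clue.io account documentation.
CLUE_URL = "https://api.clue.io/api/perts"
HEADERS = {"user_key": "YOUR_CLUE_API_KEY"}

drugNameStr: str = "Loperamide"
print("drugNameStr", drugNameStr)
# json.dumps builds the filter and escapes the drug name safely.
params = {"filter": json.dumps({"where": {"pert_iname": drugNameStr}})}

req = requests.get(CLUE_URL, headers=HEADERS, params=params, timeout=30)
req.raise_for_status()
resultList: List = req.json()
targetList: Optional[List] = None
if not resultList:
    print("Warning: Could not find targets for drug", drugNameStr)
else:
    resultDict = resultList[0]
    targetList = resultDict.get("target")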

Unfortunately, drug nomenclature is a field of its own. Whilst “Loperamide” will return a match, you’re likely to supply a drug name that isn’t recognized, for example, whilst iterating through a file of several hundred drug names. Although accuracy could be improved by supplying a CID number, the returned results do not contain a list of targets.
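When iterating over a long list of names, it helps to collect the unrecognised ones rather than stop at the first miss. Below is a minimal sketch of that loop. The lookup argument is a hypothetical function you would write yourself, for example by wrapping the request above so that it returns the target list for one drug name, or None when nothing matches.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def collect_targets(
    drug_names: Iterable[str],
    lookup: Callable[[str], Optional[List[str]]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Look up each drug name; return the hits and the names with no match."""
    found: Dict[str, List[str]] = {}
    missing: List[str] = []
    for raw in drug_names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines in the input file
        targets = lookup(name)
        if targets:
            found[name] = targets
        else:
            missing.append(name)
    return found, missing
```

The list of missing names can then be reviewed by hand or retried with alternative spellings and synonyms.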