matter.farm

Computer-generated molecule database for drug discovery

About

matter.farm is an open database that continually generates and publishes novel molecular structures that are potential drug candidates.

Pharmacological space, the set of possible molecules that are likely to have pharmacological activity, is vast. Some estimates place it at 10^60 to 10^63 distinct molecular structures.

matter.farm aims to delineate and publish as much of this space as possible. It also makes potential synthesis routes available so that researchers and others can produce and test the molecules it generates. matter.farm is a non-commercial entity, and the information it publishes is free and open to the public, to researchers, and to manufacturers. matter.farm does not lay any claim of ownership to the molecular structures it publishes. Under the patent law of many jurisdictions, the structures will become “prior art” or “background art” from the date of their publication. As a result, other entities may be unable to patent the structures published by matter.farm. (This information is not intended as legal advice.)

Each molecule generated by matter.farm is checked against the PubChem database to determine whether it is novel at the time of publication. Molecular structures that already exist in PubChem when they are generated and checked are not published. matter.farm intends to publish only novel structures. However, researchers, manufacturers and others should do their own search to determine whether any particular structure is claimed in a patent held by another entity. Additionally, molecular structures that do not appear in PubChem but exist elsewhere in the literature or in the world may be published by matter.farm even though they are not in fact novel. matter.farm makes no warranty that its information is correct and is not liable for any damages that may result from the use of its site and information (see Terms of Use for more information).
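
As an illustration only, a lookup of this kind could be sketched in Python against PubChem's public PUG REST interface. The source does not describe how matter.farm performs its check, so everything here is an assumption: the function names are hypothetical, the lookup is a plain identity search by SMILES, and a returned CID of 0 or an HTTP 404 is taken to mean "no match".

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(smiles: str) -> str:
    """Build a PUG REST URL that asks for the compound IDs matching a SMILES string."""
    # SMILES can contain characters such as '=', '#', '(' and '/',
    # so the whole string is percent-encoded.
    return f"{PUG_REST}/compound/smiles/{quote(smiles, safe='')}/cids/TXT"


def parse_cids(body: str) -> list[int]:
    """Extract positive compound IDs from a plain-text response body."""
    cids = [int(token) for token in body.split() if token.isdigit()]
    # Assumption: a CID of 0 is a placeholder meaning "no record found".
    return [cid for cid in cids if cid > 0]


def appears_in_pubchem(smiles: str, timeout: float = 30.0) -> bool:
    """Return True if PubChem reports at least one compound for this SMILES."""
    try:
        with urlopen(cid_lookup_url(smiles), timeout=timeout) as response:
            body = response.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 404:  # assumption: 404 also means "no match"
            return False
        raise
    return bool(parse_cids(body))
```

Calling appears_in_pubchem("CC(=O)O") sends a network request and would be expected to return True for a well-known compound such as acetic acid. A negative result only means that PubChem has no matching record at that moment. It says nothing about patents or other literature.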

The current version of matter.farm, launched in 2018, focuses on small-molecule ligands, primarily following Lipinski’s rule of five. Future versions may include biologics and other areas.
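
For readers unfamiliar with the rule, the sketch below shows its four commonly cited criteria as a Python check. It is not matter.farm's filter: the Descriptors container and function names are hypothetical, the descriptor values are assumed to be computed elsewhere, and the tolerance of one violation is a common convention that the source does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties of one molecule (hypothetical container)."""
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int       # count of N-H and O-H groups
    h_bond_acceptors: int    # count of N and O atoms


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the rule-of-five criteria that the molecule breaks."""
    checks = {
        "molecular_weight > 500": d.molecular_weight > 500,
        "log_p > 5": d.log_p > 5,
        "h_bond_donors > 5": d.h_bond_donors > 5,
        "h_bond_acceptors > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a molecule that breaks at most max_violations criteria."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate, illustrative values for aspirin.
aspirin = Descriptors(molecular_weight=180.16, log_p=1.2,
                      h_bond_donors=1, h_bond_acceptors=4)
print(lipinski_violations(aspirin))
print(passes_rule_of_five(aspirin))
```

Returning the list of broken criteria, rather than a bare yes or no, makes it easy to see why a borderline molecule was kept or dropped.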

Using matter.farm

Each molecular structure generated by matter.farm is timestamped with its date of publication (or “Birthday”). For ease of use, molecules are grouped under the human receptors, enzymes and other therapeutic targets that matter.farm’s algorithm determines them to be most likely to activate (or inhibit). However, researchers, manufacturers and others should consider all other possible, relevant targets for a given molecule. The target a molecule is listed under and the type of activity (e.g. agonist or antagonist) are meant as guidelines to help users find structures on matter.farm. They are not intended to limit the potential therapeutic uses of a structure.

matter.farm also includes a predicted ATC (Anatomical Therapeutic Chemical classification) code for each entry for ease of use. As with the listed receptor and enzyme targets, these ATC codes should not be construed as limiting the potential therapeutic uses of a structure. All other potentially relevant uses should also be considered.
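
An ATC code is hierarchical: a letter for the anatomical main group, two digits for the therapeutic subgroup, a letter each for the pharmacological and chemical subgroups, and two digits for the chemical substance. The Python sketch below splits a code into those five levels. It assumes a full seven-character code, which the source does not confirm for matter.farm entries, and the names split_atc and AtcLevels are hypothetical.

```python
import re
from typing import NamedTuple

# Letter, two digits, letter, letter, two digits (e.g. N02BA01).
ATC_PATTERN = re.compile(r"^([A-Z])([0-9]{2})([A-Z])([A-Z])([0-9]{2})$")


class AtcLevels(NamedTuple):
    anatomical_group: str          # level 1
    therapeutic_subgroup: str      # level 2
    pharmacological_subgroup: str  # level 3
    chemical_subgroup: str         # level 4
    chemical_substance: str        # level 5


def split_atc(code: str) -> AtcLevels:
    """Split a full ATC code into the cumulative prefix for each level."""
    match = ATC_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a full five-level ATC code: {code!r}")
    a, t, p, c, s = match.groups()
    return AtcLevels(a, a + t, a + t + p, a + t + p + c, a + t + p + c + s)


# N02BA01 is the ATC code for acetylsalicylic acid (aspirin).
print(split_atc("N02BA01"))
```

Working with the cumulative prefixes makes it easy to group entries at a coarser level, for example all structures predicted to fall under N02 (analgesics).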

The entry for each molecular structure also includes a unique identifier and a SMILES string pertaining to the structure.

Clicking on the entry will lead to a page showing a 3-dimensional, interactive view of the structure.

At the bottom of each entry is a link to a potential synthesis plan to produce the molecule. This route may not be the most efficient or cost-effective, and other synthesis routes should also be considered.

Citations and Acknowledgements

  1. Botev, Viktor, Kaloyan Marinov, and Florian Schäfer. "Word importance-based similarity of documents metric (WISDM): Fast and scalable document similarity metric for analysis of scientific documents." Proceedings of the 6th International Workshop on Mining Scientific Publications. ACM, 2017.
  2. Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." arXiv preprint arXiv:1802.04364 (2018).
  3. Kusner, Matt J., Brooks Paige, and José Miguel Hernández-Lobato. "Grammar variational autoencoder." arXiv preprint arXiv:1703.01925 (2017).
  4. Goh, Garrett B., Nathan O. Hodas, and Abhinav Vishnu. "Deep learning for computational chemistry." Journal of computational chemistry 38.16 (2017): 1291-1307.
  5. Yang, Xiufeng, et al. "ChemTS: an efficient python library for de novo molecular generation." Science and technology of advanced materials 18.1 (2017): 972-976.
  6. Liu, Yue, et al. "Materials discovery and design using machine learning." Journal of Materiomics 3.3 (2017): 159-177.
  7. Segler, Marwin HS, Mike Preuss, and Mark P. Waller. "Planning chemical syntheses with deep neural networks and symbolic AI." Nature 555.7698 (2018): 604.
  8. Kim, Edward, et al. "Virtual screening of inorganic materials synthesis parameters with deep learning." npj Computational Materials 3.1 (2017): 53.
  9. Josse, Julie, Jérome Pagès, and François Husson. "Testing the significance of the RV coefficient." Computational Statistics & Data Analysis 53.1 (2008): 82-91.
  10. Cordasco, Gennaro, and Luisa Gargano. "Community detection via semi-synchronous label propagation algorithms." Business Applications of Social Network Analysis (BASNA), 2010 IEEE International Workshop on. IEEE, 2010.
  11. Community Detection in Python
  12. Jin, Wengong, Regina Barzilay, and Tommi Jaakkola. "Junction Tree Variational Autoencoder for Molecular Graph Generation." arXiv preprint arXiv:1802.04364 (2018).
  13. What are the differences between community detection algorithms in igraph?
  14. Summary of community detection algorithms in igraph 0.6
  15. Wang, Yong-Cui, et al. "Network predicting drug’s anatomical therapeutic chemical code." Bioinformatics 29.10 (2013): 1317-1324.
  16. Liu, Zhongyang, et al. "Similarity-based prediction for Anatomical Therapeutic Chemical classification of drugs by integrating multiple data sources." Bioinformatics 31.11 (2015): 1788-1795.
  17. Cheng, Xiang, et al. "iATC-mHyb: a hybrid multi-label classifier for predicting the classification of anatomical therapeutic chemicals." Oncotarget 8.35 (2017): 58494.
  18. Szklarczyk D, Santos A, von Mering C, Jensen LJ, Bork P, Kuhn M. STITCH 5: augmenting protein-chemical interaction networks with tissue and affinity data. Nucleic Acids Res. 2016 Jan 4;44(D1):D380-4.
  19. Wishart DS, Feunang YD, Guo AC, Lo EJ, Marcu A, Grant JR, Sajed T, Johnson D, Li C, Sayeeda Z, Assempour N, Iynkkaran I, Liu Y, Maciejewski A, Gale N, Wilson A, Chin L, Cummings R, Le D, Pon A, Knox C, Wilson M. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 2017 Nov 8. doi: 10.1093/nar/gkx1037.
  20. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH. PubChem Substance and Compound databases. Nucleic Acids Res. 2016 Jan 4; 44(D1):D1202-13. Epub 2015 Sep 22 [PubMed PMID: 26400175] doi: 10.1093/nar/gkv951.
  21. Gilson, Michael K., et al. "BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology." Nucleic acids research 44.D1 (2015): D1045-D1053.
  22. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45: D158-D169 (2017)
  23. Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E, Davies M, Dedman N, Karlsson A, Magariños MP, Overington JP, Papadatos G, Smit I, Leach AR. The ChEMBL database in 2017. Nucleic Acids Res. 2017; 45(D1):D945-D954.
  24. Papadatos, George, et al. "SureChEMBL: a large-scale, chemically annotated patent document database." Nucleic acids research 44.D1 (2015): D1220-D1228.
  25. Lowe, Daniel Mark. Extraction of chemical structures and reactions from the literature. Diss. University of Cambridge, 2012.
  26. Lowe, Daniel (2017): Chemical reactions from US patents (1976-Sep2016). figshare. Fileset. CC0 License.
  27. Liu, Bowen, et al. "Retrosynthetic reaction prediction using neural sequence-to-sequence models." ACS central science 3.10 (2017): 1103-1113.
  28. Klucznik, Tomasz, et al. "Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory." Chem 4.3 (2018): 522-532.
  29. Segler, Marwin HS, and Mark P. Waller. "Neural‐Symbolic Machine Learning for Retrosynthesis and Reaction Prediction." Chemistry–A European Journal 23.25 (2017): 5966-5971.
  30. Law, James, et al. "Route designer: a retrosynthetic analysis tool utilizing automated retrosynthetic rule generation." Journal of chemical information and modeling 49.3 (2009): 593-602.
  31. Schwaller, Philippe, et al. "“Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models." Chemical science 9.28 (2018): 6091-6098.
  32. Kipf, Thomas N., and Max Welling. "Semi-supervised classification with graph convolutional networks." arXiv preprint arXiv:1609.02907 (2016).
  33. Yang, Zhilin, William W. Cohen, and Ruslan Salakhutdinov. "Revisiting semi-supervised learning with graph embeddings." arXiv preprint arXiv:1603.08861 (2016).
  34. Kipf, Thomas, et al. "Neural relational inference for interacting systems." arXiv preprint arXiv:1802.04687 (2018).
  35. Wei, Jennifer N., David Duvenaud, and Alán Aspuru-Guzik. "Neural networks for the prediction of organic chemistry reactions." ACS central science 2.10 (2016): 725-732.
  36. Coley, Connor W., et al. "Prediction of organic reaction outcomes using machine learning." ACS central science 3.5 (2017): 434-443.
  37. Gupta, Anvita. "Predicting Chemical Reaction Type and Reaction Products with Recurrent Neural Networks."
  38. Plehiers, Pieter P., et al. "Automated reaction database and reaction network analysis: extraction of reaction templates using cheminformatics." Journal of cheminformatics 10.1 (2018): 11.
  39. MIT's ASKCOS ("Automated System for Knowledge-based Continuous Organic Synthesis")