According to IBM, the database includes more than 2.4 million chemical compounds extracted from around 4.7 million patents. The company also said it contains 11 million biomedical journal abstracts from 1976 to 2000.
IBM will contribute the data to the National Center for Biotechnology Information (NCBI), which is part of the National Library of Medicine, and to the National Cancer Institute’s computer-aided drug design (CADD) group.
The data will be incorporated into NCBI’s PubChem, a public resource that aggregates results for the scientific community. The data will also expand the NCI CADD Group’s services, including the chemical structure lookup service and the chemical identifier resolver.
The National Institutes of Health (NIH) will make the content available on PubChem at http://pubchem.ncbi.nlm.nih.gov.
Ultimately, IBM hopes the data will help researchers visualize important relationships among chemical compounds and will aid drug development and cancer research.
“Information overload continues to be a challenge in drug discovery and other areas of scientific research,” said Steve Heller, project director for the nonprofit InChI Trust, which supports the “interlinking and combining of chemical, biological and related information.”
To extract the data, IBM said it used its Strategic IP Insight Platform (SIIP), an analytics and optimization program delivered through the IBM SmartCloud. The company developed the cloud-based extraction system in collaboration with several major life sciences organizations.