In recent years, molecular graph representation learning has been applied to a variety of downstream tasks in the biomedical field, such as molecular property prediction and drug design. Graph contrastive learning does not require complex hand-designed pre-training tasks; instead, it learns graph representations by mining self-supervised signals from large-scale unlabeled data. Unlike contrastive learning on images, contrastive learning on molecular graphs faces unique challenges. First, the structural and semantic information of graphs differs substantially across domains, which makes it difficult to design a general graph augmentation scheme. This is especially true for molecular graphs: adding or deleting a chemical bond or functional group can drastically change a molecule's identity and properties. Most current graph contrastive learning methods focus mainly on graph topology and rarely consider the domain knowledge embedded in the graph. Another easily overlooked problem is that atoms in a molecular graph are usually modeled as individuals that are connected only when a chemical bond exists between them, which ignores implicit correlations between atoms (for example, the commonalities between atoms that share the same properties).
To address these problems, researchers from Zhejiang University used domain knowledge to guide contrastive learning on molecular graphs.
First, to capture the microscopic relationships between elements and the basic domain knowledge of each element, the study constructed a Chemical Element Knowledge Graph (Chemical Element KG) from the periodic table. As shown in the figure below, the Chemical Element KG describes the relationships between elements (green in the figure) and their basic chemical properties (for example, period and metallicity; red in the figure).
The Chemical Element KG establishes connections between atoms that are not joined by chemical bonds but are chemically related.
The study then uses the Chemical Element KG to guide the augmentation of the original molecular graph, which helps connect atoms that are not adjacent but share the same properties. In this way, the augmented molecular graph contains both topological knowledge and fundamental chemical domain knowledge about the elements. Building on the Chemical Element KG, the paper proposes a knowledge-enhanced contrastive learning framework for molecular graphs, Knowledge-enhanced Contrastive Learning (KCL). KCL uses the Chemical Element KG to guide the augmentation of the original molecular graph, designs a Knowledge-aware Message Passing Neural Network (KMPNN) for the augmented graph, and constructs a contrastive loss that optimizes the model by maximizing the agreement between positive pairs and the discrepancy between hard negative pairs. Experimental results show that KCL achieves state-of-the-art (SOTA) performance on 8 datasets covering different molecular properties.
Specifically, the KCL framework consists of three modules.
(1) Knowledge-guided graph augmentation
The knowledge-guided graph augmentation module uses the Chemical Element KG to guide the augmentation of the original molecular graph, so that the augmented graph contains not only topological knowledge but also fundamental domain knowledge about the elements.
Chemical Element KG Construction: The study extracts all chemical elements and their basic chemical properties from the periodic table. Each element has more than 15 properties, including metallicity, periodicity, state, weight, electronegativity, electron affinity, melting point, boiling point, ionization, radius, hardness, modulus, density, conductivity, heat, and abundance. The extracted triples are stored in the KG in the form (Gas, isStateOf, Cl), indicating that a specific relation holds between a property and an element.
Statistics of the Chemical Element KG
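The construction step amounts to flattening a property table into triples. Below is a minimal Python sketch, assuming a hypothetical three-element excerpt of the periodic table: only the `isStateOf` relation comes from the article, while `isPeriodOf`, `isMetallicityOf`, the property labels, and the helper names `build_element_kg` and `elements_sharing` are illustrative.

```python
# Hypothetical excerpt of periodic-table data: element -> {relation: property value}.
# Only "isStateOf" appears in the article; the other relation names are illustrative.
ELEMENT_PROPERTIES = {
    "Cl": {"isStateOf": "Gas", "isPeriodOf": "Period3", "isMetallicityOf": "Nonmetal"},
    "Na": {"isStateOf": "Solid", "isPeriodOf": "Period3", "isMetallicityOf": "Metal"},
    "O": {"isStateOf": "Gas", "isPeriodOf": "Period2", "isMetallicityOf": "Nonmetal"},
}


def build_element_kg(table):
    """Turn the property table into (head, relation, tail) triples.

    Following the (Gas, isStateOf, Cl) example, the property value is the
    head entity and the element is the tail entity.
    """
    return [
        (value, relation, element)
        for element, props in table.items()
        for relation, value in props.items()
    ]


def elements_sharing(triples, head):
    """Elements linked to the same property node, e.g. all gaseous elements."""
    return sorted(tail for h, _, tail in triples if h == head)


triples = build_element_kg(ELEMENT_PROPERTIES)
```

Here `elements_sharing(triples, "Gas")` returns `["Cl", "O"]`: the shared property node is what lets the KG relate elements that never appear together in a bond.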
Graph Augmentation: For each atom in the original molecular graph, the triples in the Chemical Element KG whose tail entity is that atom's element are retrieved. The head entities of these triples (the properties) are added as new nodes, and the relations become edges between the head entities (properties) and the tail entities (elements/atoms), yielding the augmented molecular graph. The augmented graph serves as the positive sample for the original molecular graph; it contains richer and more complex information and can capture microscopic connections between atoms.
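The augmentation step can be sketched as follows, assuming hypothetical triples in the same (property, relation, element) form and a molecule given as a list of element symbols plus bond index pairs. The function name `augment_molecule` and the choice to create each property node once per molecule, so that all atoms sharing a property attach to the same node, are assumptions of this sketch rather than details confirmed by the article.

```python
# Hypothetical triples in (property, relation, element) form; real KCL data is much larger.
TRIPLES = [
    ("Gas", "isStateOf", "O"),
    ("Period2", "isPeriodOf", "O"),
    ("Solid", "isStateOf", "C"),
    ("Period2", "isPeriodOf", "C"),
]


def augment_molecule(atoms, bonds, triples):
    """Return (nodes, edges) of the augmented graph.

    atoms: element symbol per atom index; bonds: list of (i, j) atom pairs.
    Edges are (src, dst, label); property nodes are appended after the atoms.
    """
    nodes = list(atoms)                         # atom nodes keep their indices
    edges = [(i, j, "bond") for i, j in bonds]  # original topology
    property_node = {}                          # property value -> node index
    for atom_id, symbol in enumerate(atoms):
        for head, relation, tail in triples:
            if tail != symbol:                  # only triples whose tail is this element
                continue
            if head not in property_node:       # create each property node once
                property_node[head] = len(nodes)
                nodes.append(head)
            edges.append((property_node[head], atom_id, relation))
    return nodes, edges


# Carbon dioxide (O=C=O), heavy atoms only.
nodes, edges = augment_molecule(["C", "O", "O"], [(0, 1), (0, 2)], TRIPLES)
```

In carbon dioxide the two oxygen atoms are not bonded to each other, but after augmentation they are two hops apart through the shared `Gas` node, and all three atoms attach to the `Period2` node.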
(2) Knowledge-aware graph representation
The knowledge-aware graph representation module designs a Knowledge-aware Message Passing Neural Network (KMPNN) for the augmented molecular graph, so that the two different types of knowledge in the augmented graph are better propagated and fused.
Knowledge Feature Initialization: The study uses RotatE, a commonly used knowledge graph embedding (KGE) method, to initialize the property nodes and relations in the augmented molecular graph.
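RotatE embeds entities as complex vectors and models each relation as an element-wise rotation, so a triple (h, r, t) is plausible when h ∘ r is close to t. A minimal NumPy sketch of that distance follows; the function name is hypothetical, and how KCL turns the trained complex embeddings into real-valued node and edge features is not described in the article.

```python
import numpy as np


def rotate_distance(h, r_phase, t):
    """RotatE-style distance ||h o r - t|| for complex embeddings.

    r = exp(i * r_phase) has unit modulus in every dimension, so the relation
    rotates each coordinate of h in the complex plane. This sketch sums the
    moduli of the element-wise differences; lower means more plausible.
    """
    r = np.exp(1j * r_phase)
    return float(np.abs(h * r - t).sum())


rng = np.random.default_rng(0)
h = rng.normal(size=4) + 1j * rng.normal(size=4)  # head entity embedding
phase = rng.uniform(-np.pi, np.pi, size=4)        # relation as rotation angles
```

A tail equal to `h * np.exp(1j * phase)` gives a distance of essentially zero, which is what RotatE training pushes true triples such as (Gas, isStateOf, Cl) towards.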
KMPNN Encoder: KMPNN provides two different types of message passing for the two types of neighbors and assigns different attention weights to neighbors according to their importance. KMPNN produces the representation of the augmented molecular graph. Algorithm 1 describes the encoding process of KMPNN:
Algorithm 1: Encoding process of the knowledge-aware message passing network KMPNN
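The article does not reproduce Algorithm 1, so the following is only a hypothetical, simplified illustration of the two ideas described above: type-specific messages for bond neighbors versus property neighbors, and attention weights that rank neighbors by importance. The neighbor kinds `"bond"` and `"attr"`, the dot-product attention, the tanh residual update, and the sum readout are all assumptions of this sketch, not KMPNN's actual equations.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def kmpnn_layer(h, neighbors, params):
    """One simplified knowledge-aware message-passing step (hypothetical).

    h: (N, d) states for atom and property nodes.
    neighbors[v]: list of (u, kind) with kind "bond" (chemical-bond neighbor)
    or "attr" (property neighbor from the KG).
    params[kind] = (W, a): per-type projection W (d, d) and attention vector a (2d,).
    """
    new_h = h.copy()
    for v, nbrs in enumerate(neighbors):
        if not nbrs:
            continue  # isolated nodes keep their state
        msgs, scores = [], []
        for u, kind in nbrs:
            W, a = params[kind]
            m = W @ h[u]                                  # type-specific message
            msgs.append(m)
            scores.append(a @ np.concatenate([h[v], m]))  # importance of neighbor u
        alpha = softmax(np.array(scores))                 # attention over neighbors
        new_h[v] = np.tanh(h[v] + alpha @ np.stack(msgs))  # residual update
    return new_h


def readout(h):
    """Sum node states into a single graph-level vector."""
    return h.sum(axis=0)


rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))
params = {k: (rng.normal(size=(d, d)), rng.normal(size=2 * d)) for k in ("bond", "attr")}
neighbors = [[(1, "bond"), (2, "attr")], [(0, "bond")], []]
out = kmpnn_layer(h, neighbors, params)
```

Keeping separate `(W, a)` pairs per neighbor type is the simplest way to let bond and knowledge edges carry differently transformed messages while still competing in one attention softmax.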
GNN-based Encoder: For the original molecular graph, a GNN model is used to learn its representation.
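The article only says that a GNN encodes the original molecular graph. As one common choice, here is a minimal GIN-style encoder in NumPy, assuming a dense adjacency matrix and mean-pooling readout; GIN is swapped in for illustration, and each layer uses a single linear map plus ReLU where GIN proper uses an MLP.

```python
import numpy as np


def gin_layer(x, adj, w, eps=0.0):
    """GIN-style update: h_v' = ReLU(((1 + eps) * h_v + sum of neighbor states) @ W)."""
    return np.maximum(((1.0 + eps) * x + adj @ x) @ w, 0.0)


def encode_graph(x, adj, weights):
    """Stack GIN-style layers, then mean-pool atom states into one graph vector."""
    h = x
    for w in weights:
        h = gin_layer(h, adj, w)
    return h.mean(axis=0)


# Toy 3-atom chain 0-1-2 with 2-d atom features.
x = np.ones((3, 2))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
graph_vec = encode_graph(x, adj, [np.eye(2)])
```

With identity weights, the middle atom aggregates two neighbors and the end atoms one, so the pooled vector is `[7/3, 7/3]`.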
(3) Contrastive objective
The contrastive objective module builds a contrastive loss that maximizes the agreement between positive pairs and the discrepancy between hard negative pairs, and uses it to optimize the learned representations.
Projection Head: The representations of the original molecular graph and the augmented molecular graph are mapped into the same latent space, where the contrastive loss is computed.
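A projection head is typically a small MLP applied after the encoder. The sketch below assumes one head per view, because the KMPNN and GNN encoders may output vectors of different sizes; the class name, the layer sizes (300, 256, 128, 64), and the L2 normalization are illustrative assumptions.

```python
import numpy as np


class ProjectionHead:
    """Two-layer MLP g(z) = ReLU(z W1) W2, followed by L2 normalization."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=in_dim ** -0.5, size=(in_dim, hidden_dim))
        self.W2 = rng.normal(scale=hidden_dim ** -0.5, size=(hidden_dim, out_dim))

    def __call__(self, z):
        p = np.maximum(z @ self.W1, 0.0) @ self.W2
        norm = np.linalg.norm(p, axis=-1, keepdims=True)
        return p / np.maximum(norm, 1e-12)  # guard against a zero vector


# Hypothetical sizes: augmented-graph encoder 300-d, original-graph encoder 256-d,
# both projected into a shared 64-d space.
head_aug = ProjectionHead(300, 128, 64, seed=0)
head_orig = ProjectionHead(256, 128, 64, seed=1)
rng = np.random.default_rng(42)
p_aug = head_aug(rng.normal(size=(5, 300)))
p_orig = head_orig(rng.normal(size=(5, 256)))
```

Because both outputs are unit vectors of the same size, a plain dot product between them is a cosine similarity that the contrastive loss can use directly.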
Negative Mining: Using hard negative mining, molecular graphs (together with their augmented graphs) that lie close to the anchor molecule in molecular fingerprint space are selected as negative samples.
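Hard negative mining in fingerprint space can be sketched with Tanimoto similarity over binary fingerprints, represented here as Python sets of on-bit indices. The article does not name the fingerprint type or similarity measure; Tanimoto is assumed because it is the usual choice for bit fingerprints, and in practice the bits could come from a cheminformatics toolkit such as RDKit. The helper names and toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity |A ∩ B| / |A ∪ B| of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mine_hard_negatives(fps, k):
    """For each molecule, return the indices of the k most similar other molecules."""
    hard = []
    for i, fi in enumerate(fps):
        ranked = sorted(
            (j for j in range(len(fps)) if j != i),
            key=lambda j: tanimoto(fi, fps[j]),
            reverse=True,  # most similar first; ties keep index order
        )
        hard.append(ranked[:k])
    return hard


# Toy fingerprints for four molecules.
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8}, {1, 2, 3, 5}]
negatives = mine_hard_negatives(fps, k=1)
```

Molecules that look alike by fingerprint but are still different molecules make harder negatives than random ones, which forces the encoder to separate them on finer detail.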
Contrastive Loss: The loss is computed over a training batch of positive pairs, each consisting of a molecule and its augmented molecular graph, contrasted against the mined hard negatives.
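The exact loss used by KCL is defined in the paper. As a hedged illustration, a standard one-directional InfoNCE loss over a batch of molecule/augmented-graph pairs looks like this, assuming cosine similarity and a temperature `tau`; when the batch is assembled from mined hard negatives, the off-diagonal entries play the role of those negatives.

```python
import numpy as np


def info_nce(z, z_aug, tau=0.1):
    """InfoNCE over a batch: row i of z_aug is the positive for row i of z,
    and every other row of z_aug serves as a negative."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    z_aug = z_aug / np.linalg.norm(z_aug, axis=1, keepdims=True)
    logits = z @ z_aug.T / tau                            # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # average over positive pairs


eye = np.eye(3)
aligned = info_nce(eye, eye)                       # each molecule matches its own view
shuffled = info_nce(eye, np.roll(eye, 1, axis=0))  # positives mismatched
```

Minimizing this quantity raises the similarity of each positive pair (the diagonal) relative to all negatives in the row, which is the agreement/discrepancy trade-off the module describes.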
To verify the effectiveness of KCL, the study ran experiments on 8 benchmark datasets from MoleculeNet. Details of the datasets are listed below.
The study ran experiments under two settings, the fine-tune protocol and the linear protocol. The results show that KCL outperforms previous molecular graph representation learning methods under both settings.
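Under the linear protocol, the pre-trained encoder is frozen and only a linear classifier is trained on its embeddings; under the fine-tune protocol, the encoder's weights are updated as well. A minimal sketch of the linear case, assuming precomputed embeddings, binary labels, and plain gradient-descent logistic regression (the function name, toy data, and hyperparameters are hypothetical):

```python
import numpy as np


def linear_probe(emb, y, lr=0.5, epochs=500):
    """Fit logistic regression on frozen embeddings; the encoder is never touched."""
    w = np.zeros(emb.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(emb @ w + b)))  # predicted probabilities
        grad = p - y                              # d(cross-entropy)/d(logit)
        w -= lr * emb.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


# Toy "frozen embeddings" for four molecules with binary property labels.
emb = np.array([[1.0, 0.5], [2.0, -0.3], [-1.0, -0.5], [-2.0, 0.3]])
y = np.array([1.0, 1.0, 0.0, 0.0])
w, b = linear_probe(emb, y)
```

Because only `w` and `b` are learned, the linear protocol measures how linearly separable the pre-trained representations already are, while fine-tuning measures how well they serve as an initialization.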
Results under the fine-tune protocol
This paper aims to integrate fundamental chemical knowledge into molecular graph representation learning. The study constructs a Chemical Element KG to establish microscopic connections between elements and proposes KCL, a knowledge-guided contrastive learning framework for molecular graphs. Experiments demonstrate the effectiveness of KCL under both the fine-tune protocol and the linear protocol, and show that KCL offers better interpretability and expressive power than previous methods.
The researchers plan to extend this work in several directions: introducing domain knowledge at different granularities to enrich the Chemical Element KG; using more expressive knowledge representation languages, such as OWL 2, to add description logic to the Chemical Element KG; and releasing an open, multilingual dataset while continuously updating the Chemical Element KG.
Author: [Heart of Machine]. Please include a link to the original article when reprinting.