
AAAI 2022 | Zhejiang University proposes KCL: molecular graph contrastive learning guided by a chemical element knowledge graph

2022-02-04 23:52:28 Heart of machine

In recent years, molecular graph representation learning has been applied to a variety of biomedical downstream tasks, such as molecular property prediction and drug design. Graph contrastive learning does not require complex hand-designed pre-training tasks; it learns graph representations by mining self-supervised signals from large-scale unlabeled data. Unlike contrastive learning on images, contrastive learning on molecular graphs faces unique challenges. First, the structure and semantics of graphs differ significantly across domains, so it is difficult to design a universal graph augmentation scheme. For molecular graphs in particular, adding or deleting chemical bonds or functional groups can substantially change the identity and properties of a molecule. At present, most graph contrastive learning methods focus on graph topology and rarely consider the domain knowledge contained in the graph. Another easily overlooked problem is that atoms in a molecular graph are usually modeled as individuals that are connected only when a chemical bond exists between them, which ignores implicit correlations between atoms (for example, commonalities between atoms with the same properties).

To address these problems, researchers from Zhejiang University use domain knowledge to guide contrastive learning on molecular graphs.

First, to establish microscopic relationships between elements and capture the basic domain knowledge of each element, the study constructs a chemical element knowledge graph (Chemical Element Knowledge Graph) based on the periodic table. As shown in the figure below, the chemical element knowledge graph describes the relationships between elements (green in the figure) and their basic chemical properties (for example, periodicity and metallicity; red in the figure).

[Image: The chemical element knowledge graph establishes connections between atoms that are not joined by chemical bonds but are chemically related]

Next, the study uses the chemical element knowledge graph to guide the augmentation of the original molecular graph, which helps establish connections between atoms that are not adjacent but share the same properties. In this way, the augmented molecular graph contains both topological knowledge and basic chemical domain knowledge about the elements. Building on this chemical element knowledge graph, the paper proposes a knowledge-enhanced molecular graph contrastive learning framework, Knowledge-enhanced Contrastive Learning (KCL). KCL uses the chemical element knowledge graph to guide the augmentation of the original molecular graph, designs a knowledge-aware message passing network (KMPNN) for the augmented molecular graph, and constructs a contrastive loss that optimizes the model by maximizing agreement between positive pairs and disagreement between hard negative pairs. Experimental results show that KCL achieves SOTA performance on 8 datasets covering different molecular properties.


  • Paper: https://arxiv.org/pdf/2112.00544.pdf

  • Datasets and code: https://github.com/ZJU-Fangyin/KCL


Method

[Image: KCL framework diagram]

Specifically, the KCL framework is divided into three modules.

(1) Knowledge-guided graph augmentation

The knowledge-guided graph augmentation module uses the chemical element knowledge graph to guide the augmentation of the original molecular graph, so that the augmented molecular graph contains not only topological knowledge but also basic domain knowledge about the elements.

Chemical Element KG Construction: The study obtains all chemical elements and their basic chemical properties from the periodic table. Each element has more than 15 properties, including metallicity, periodicity, state, weight, electronegativity, electron affinity, melting point, boiling point, ionization, radius, hardness, modulus, density, conductivity, heat, and abundance. The extracted triples are stored in the KG in the form (Gas, isStateOf, Cl), indicating that a specified relation holds between a property and an element.
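As a minimal sketch of how such (property, relation, element) triples might be stored and queried: only the triple (Gas, isStateOf, Cl) comes from the article; the other triples, the relation name `isMetallicityOf`, and the helper names are illustrative assumptions, not the paper's actual schema.

```python
from collections import defaultdict

# (head, relation, tail) triples: head = property value, tail = element.
# Only (Gas, isStateOf, Cl) appears in the article; the rest are hypothetical.
TRIPLES = [
    ("Gas", "isStateOf", "Cl"),
    ("Gas", "isStateOf", "F"),
    ("Nonmetal", "isMetallicityOf", "Cl"),
    ("Nonmetal", "isMetallicityOf", "C"),
    ("Solid", "isStateOf", "C"),
]

def index_by_tail(triples):
    """Map each element (tail entity) to its (property, relation) pairs."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[tail].append((head, relation))
    return index

def shared_properties(index, a, b):
    """Return the property values that two elements have in common."""
    props_a = {head for head, _ in index[a]}
    props_b = {head for head, _ in index[b]}
    return props_a & props_b

index = index_by_tail(TRIPLES)
common = shared_properties(index, "Cl", "F")
```

Indexing by tail entity makes the later lookup "which triples have this atom's element as the tail" a single dictionary access.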

[Image: Statistics of the chemical element knowledge graph]

Graph Augmentation: For each atom in the original molecular graph, the triples in the chemical element knowledge graph that have that atom's element as the tail entity are retrieved. The head entities of these triples are added as new nodes, and the relations become edges between the head entities (properties) and the tail entities (elements/atoms), yielding the augmented molecular graph. The augmented molecular graph serves as the positive sample for the original molecular graph; it contains richer and more complex information and can capture microscopic connections between atoms.
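The augmentation step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the function name `augment_molecule`, the node and edge encoding, the example triples, and the choice to share one property node among all atoms that have that property are hypothetical and may differ from the released KCL code.

```python
def augment_molecule(atoms, bonds, triples):
    """Build an augmented graph from a molecule and KG triples.

    atoms:   list of element symbols, one per atom (list index = atom id)
    bonds:   list of (i, j) atom-index pairs
    triples: list of (property, relation, element) tuples
    """
    nodes = {("atom", i): symbol for i, symbol in enumerate(atoms)}
    edges = [(("atom", i), "bond", ("atom", j)) for i, j in bonds]
    for i, symbol in enumerate(atoms):
        for head, relation, tail in triples:
            if tail == symbol:
                # Assumption: one property node is shared by all matching atoms,
                # which links non-adjacent atoms that have the same property.
                nodes[("property", head)] = head
                edges.append((("property", head), relation, ("atom", i)))
    return nodes, edges

# Heavy atoms of dichloromethane: one carbon bonded to two chlorines.
atoms = ["C", "Cl", "Cl"]
bonds = [(0, 1), (0, 2)]
triples = [
    ("Gas", "isStateOf", "Cl"),
    ("Nonmetal", "isMetallicityOf", "Cl"),  # hypothetical relation name
    ("Nonmetal", "isMetallicityOf", "C"),
]
nodes, edges = augment_molecule(atoms, bonds, triples)
```

In this toy example the two chlorine atoms are not bonded to each other, yet both become neighbors of the shared `Gas` node, which is the kind of implicit connection the article describes.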

(2) Knowledge-aware graph representation

The knowledge-aware graph representation module designs a knowledge-aware message passing network (KMPNN) for the augmented molecular graph, to better propagate and fuse the two different types of knowledge in the augmented graph.

Knowledge Feature Initialization: The study uses a common knowledge graph embedding (KGE) method, RotatE, to initialize the property nodes and relations in the augmented molecular graph.
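For readers unfamiliar with RotatE, its core idea is to embed entities as complex vectors and to model each relation as an element-wise rotation in the complex plane, so that a true triple satisfies tail ≈ head ∘ relation. The sketch below only illustrates that scoring idea; the function name and toy vectors are assumptions, and it says nothing about how KCL trains or loads the embeddings.

```python
import cmath
import math

def rotate_distance(head, phases, tail):
    """RotatE-style distance between a rotated head and a tail.

    head, tail: lists of complex numbers (entity embeddings)
    phases:     list of rotation angles in radians (relation embedding)
    A smaller distance means the triple is more plausible.
    """
    rotated = [h * cmath.exp(1j * theta) for h, theta in zip(head, phases)]
    return sum(abs(r - t) for r, t in zip(rotated, tail))

# Toy example: rotating 1 by 90 degrees gives i; rotating i by 180 degrees gives -i.
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]
matching_tail = [0 + 1j, 0 - 1j]
other_tail = [1 + 0j, 1 + 0j]

d_match = rotate_distance(head, phases, matching_tail)
d_other = rotate_distance(head, phases, other_tail)
```

Because each relation is a pure rotation, the embedding norm of an entity is preserved when it is transformed, which is one reason this family of models handles symmetric and inverse relations well.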

KMPNN Encoder: KMPNN provides two different types of message passing for different types of neighbors and assigns different attention to neighbors according to their importance. Through KMPNN, the representation of the augmented molecular graph is obtained. Algorithm 1 describes the KMPNN encoding process:

[Image: Algorithm 1, the encoding process of the knowledge-aware message passing network KMPNN]
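Since Algorithm 1 is only available as an image, the following is a heavily simplified sketch of the general idea (type-specific messages plus attention over neighbors), not a reproduction of the paper's algorithm. The function names, the dot-product attention, the residual update, and the toy message functions are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def kmpnn_layer_sketch(features, neighbors, message_fns):
    """One simplified message passing layer.

    features:    {node: vector}
    neighbors:   {node: [(neighbor, edge_type), ...]}
    message_fns: {edge_type: function(vector) -> vector}, one per neighbor type
    """
    updated = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = list(h)
            continue
        # A different message function is applied depending on the edge type.
        messages = [message_fns[etype](features[n]) for n, etype in nbrs]
        # Attention: neighbors whose messages align with the node get more weight.
        weights = softmax([dot(h, m) for m in messages])
        agg = [sum(w * m[k] for w, m in zip(weights, messages)) for k in range(len(h))]
        updated[node] = [a + b for a, b in zip(h, agg)]  # residual-style update
    return updated

features = {"a0": [1.0, 0.0], "a1": [0.0, 1.0], "p": [1.0, 1.0]}
neighbors = {"a0": [("a1", "bond"), ("p", "property")]}
message_fns = {
    "bond": lambda v: list(v),                  # hypothetical atom-to-atom message
    "property": lambda v: [0.5 * x for x in v], # hypothetical property-to-atom message
}
out = kmpnn_layer_sketch(features, neighbors, message_fns)
```

A real implementation would use learned weight matrices and a graph library; the point here is only that bonded neighbors and property neighbors pass through separate message functions before being combined with attention weights.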

GNN-based Encoder: For the original molecular graph, a GNN model is used to learn its representation.

(3) Contrastive objective

The contrastive objective module constructs a contrastive loss that optimizes the model by maximizing agreement between positive pairs and disagreement between hard negative pairs.

Projection Head: The representations of the original molecular graph and the augmented molecular graph are mapped to the same latent feature space, in which the contrastive loss is computed.

Negative Mining: A hard negative mining technique is used: molecular graphs that are close to the anchor molecule in molecular fingerprint space, together with their augmented molecular graphs, are selected as negative samples.
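One plausible way to pick such hard negatives is to rank the other molecules by fingerprint similarity to the anchor and keep the closest ones. The sketch below assumes Tanimoto similarity over fingerprints represented as sets of on-bit indices; the article does not name the similarity measure, and the function names and toy fingerprints are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

def hard_negatives(fingerprints, anchor, k):
    """Return the k molecules most similar to the anchor (excluding itself)."""
    others = [name for name in fingerprints if name != anchor]
    others.sort(
        key=lambda name: tanimoto(fingerprints[anchor], fingerprints[name]),
        reverse=True,
    )
    return others[:k]

# Toy fingerprints; in practice these would come from a cheminformatics toolkit.
fingerprints = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3, 5},
    "mol_c": {7, 8},
    "mol_d": {1, 2, 9},
}
negatives = hard_negatives(fingerprints, "mol_a", k=2)
```

Molecules that look alike in fingerprint space but are still different compounds are harder to tell apart than random ones, so they give the contrastive objective a more informative training signal.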

Contrastive Loss: For a training batch, the loss function for a positive pair consisting of a molecule and its augmented molecular graph can be expressed as:

[Image: contrastive loss formula]
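The exact formula is only available as an image, so the sketch below shows a generic InfoNCE-style contrastive loss of the kind commonly used in this setting, not necessarily the paper's precise definition. The cosine similarity, the temperature value, and the function names are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor.

    anchor:    projected representation of the original molecular graph
    positive:  projected representation of its augmented graph
    negatives: projected representations of the (hard) negative samples
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# The loss is small when the positive is aligned with the anchor
# and large when a negative is aligned instead.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls each molecule toward its augmented graph in the projected space while pushing it away from the selected negatives.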


Experiments

(1) Datasets

To verify the effectiveness of KCL, the study ran experiments on 8 benchmark datasets from MoleculeNet. Details of the datasets are as follows:
[Image: Dataset information]

(2) Experimental results

The study ran experiments under two settings, the fine-tune protocol and the linear protocol. The experiments show that KCL outperforms previous molecular graph representation learning methods under both settings.
[Image: Results under the fine-tune protocol]
[Image: Results under the linear protocol]

Summary and outlook

This paper aims to integrate basic chemical knowledge into molecular graph representation learning. The study constructs a chemical element knowledge graph to establish microscopic connections between elements and proposes a knowledge-guided molecular graph contrastive learning framework, KCL. Experiments demonstrate the effectiveness of KCL under both the fine-tune protocol and the linear protocol, and show that KCL has better interpretability and expressive power than previous methods.

The researchers plan to extend this work in the following directions: introducing domain knowledge of different granularities to enrich the chemical element knowledge graph; using richer knowledge representation formalisms, such as OWL2, to add description logic to the chemical element knowledge graph; and releasing an open dataset in multiple languages while continuing to update the chemical element knowledge graph.

