Dataset: Drug-target binding data from BindingDB using Kd measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKd (pKd = -log10(Kd in M); higher means stronger binding). Dataset: bindingdb_kd. (1) The small molecule is Cn1cnc2c(F)c(Nc3ccc(Br)cc3Cl)c(C(=O)NOCCO)cc21. The target protein (P52564) has sequence MSQSKGKKRNPGLKIPKEAFEQPQTSSTPPRDLDSKACISIGNQNFEVKADDLEPIMELGRGAYGVVEKMRHVPSGQIMAVKRIRATVNSQEQKRLLMDLDISMRTVDCPFTVTFYGALFREGDVWICMELMDTSLDKFYKQVIDKGQTIPEDILGKIAVSIVKALEHLHSKLSVIHRDVKPSNVLINALGQVKMCDFGISGYLVDSVAKTIDAGCKPYMAPERINPELNQKGYSVKSDIWSLGITMIELAILRFPYDSWGTPFQQLKQVVEEPSPQLPADKFSAEFVDFTSQCLKKNSKERPTYPELMQHPFFTLHESKGTDVASFVKLILGD. The pKd is 5.0. (2) The compound is CCC1=C(C(C)C)/C(=C/C(C)=C\C=C\C(C)=C\C(=O)O)CCC1. The target protein (P62965) has sequence MPNFAGTWKMRSSENFDELLKALGVNAMLRKVAVAAASKPHVEIRQDGDQFYIKTSTTVRTTEINFKVGEGFEEETVDGRKCRSLPTWENENKIHCTQTLLEGDGPKTYWTRELANDELILTFGADDVVCTRIYVRE. The pKd is 6.7. (3) The small molecule is CCCCC(O)(c1ccc2nc(Cl)c(-c3ccccc3)c(Cl)c2c1)c1cc(C)no1. The target protein sequence is MAHHHHHHAGGAENLYFQGAMDSTPEAPYASLTEIEHLVQSVCKSYRETCQLRLEDLLRQRSNIFSREEVTGYQRKSMWEMWERCAHHLTEAIQYVVEFAKRLSGFMELCQNDQIVLLKAGAMEVVLVRMCRAYNADNRTVFFEGKYGGMELFRALGCSELISSIFDFSHSLSALHFSEDEIALYTALVLINAHRPGLQEKRKVEQLQYNLELAFHHHLCKTHRQSILAKLPPKGKLRSLCSQHVERLQIFQHLHPIVVQAAFPPLYKELFSTETESPVGLSK. The pKd is 6.6. (4) The small molecule is Nc1ccc(NS(=O)(=O)c2ccc(Cl)cc2)cc1. The target protein sequence is MSAVALPRVSGGHDEHGHLEEFRTDPIGLMQRVRDECGDVGTFQLAGKQVVLLSGSHANEFFFRAGDDDLDQAKAYPFMTPIFGEGVVFDASPERRKEMLHNAALRGEQMKGHAATIEDQVRRMIADWGEAGEIDLLDFFAELTIYTSSACLIGKKFRDQLDGRFAKLYHELERGTDPLAYVDPYLPIESFRRRDEARNGLVALVADIMNGRIANPPTDKSDRDMLDVLIAVKAETGTPRFSADEITGMFISMMFAGHHTSSGTASWTLIELMRHRDAYAAVIDELDELYGDGRSVSFHALRQIPQLENVLKETLRLHPPLIILMRVAKGEFEVQGHRIHEGDLVAASPAISNRIPEDFPDPHDFVPARYEQPRQEDLLNRWTWIPFGAGRHRCVGAAFAIMQIKAIFSVLLREYEFEMAQPPESYRNDHSKMVVQLAQPACVRYRRRTGV. 
The pKd is 5.1. (5) The pKd is 5.0. The drug is O=C1CC(CN2CCN(c3ccccc3)CC2)Cc2occc21. The target protein (P30994) has sequence MASSYKMSEQSTISEHILQKTCDHLILTDRSGLKAESAAEEMKQTAENQGNTVHWAALLIFAVIIPTIGGNILVILAVSLEKRLQYATNYFLMSLAVADLLVGLFVMPIALLTIMFEATWPLPLALCPAWLFLDVLFSTASIMHLCAISLDRYIAIKKPIQANQCNSRTTAFVKITVVWLISIGIAIPVPIKGIEADVVNAHNITCELTKDRFGSFMLFGSLAAFFAPLTIMIVTYFLTIHALRKKAYLVRNRPPQRLTRWTVSTVLQREDSSFSSPEKMVMLDGSHKDKILPNSTDETLMRRMSSAGKKPAQTISNEQRASKVLGIVFLFFLLMWCPFFITNVTLALCDSCNQTTLKTLLQIFVWVGYVSSGVNPLIYTLFNKTFREAFGRYITCNYQATKSVKVLRKCSSTLYFGNSMVENSKFFTKHGIRNGINPAMYQSPVRLRSSTIQSSSIILLNTFLTENDGDKVEDQVSYI. (6) The compound is CNC(=O)c1c(F)cccc1Nc1nc(Nc2cc3c(cc2OC)CCN3C(=O)CN(C)C)nc2[nH]ccc12. The target protein (O15197) has sequence MATEGAAQLGNRVAGMVCSLWVLLLVSSVLALEEVLLDTTGETSEIGWLTYPPGGWDEVSVLDDQRRLTRTFEACHVAGAPPGTGQDNWLQTHFVERRGAQRAHIRLHFSVRACSSLGVSGGTCRETFTLYYRQAEEPDSPDSVSSWHLKRWTKVDTIAADESFPSSSSSSSSSSSAAWAVGPHGAGQRAGLQLNVKERSFGPLTQRGFYVAFQDTGACLALVAVRLFSYTCPAVLRSFASFPETQASGAGGASLVAAVGTCVAHAEPEEDGVGGQAGGSPPRLHCNGEGKWMVAVGGCRCQPGYQPARGDKACQACPRGLYKSSAGNAPCSPCPARSHAPNPAAPVCPCLEGFYRASSDPPEAPCTGPPSAPQELWFEVQGSALMLHWRLPRELGGRGDLLFNVVCKECEGRQEPASGGGGTCHRCRDEVHFDPRQRGLTESRVLVGGLRAHVPYILEVQAVNGVSELSPDPPQAAAINVSTSHEVPSAVPVVHQVSRA.... The pKd is 5.0.
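The task description defines the label as pKd = -log10(Kd in M). A minimal Python sketch of that conversion follows. The function names `kd_to_pkd` and `pkd_to_kd` are illustrative and not part of the dataset, and the sketch assumes Kd has already been converted to mol/L.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant in mol/L to pKd = -log10(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the transform: recover Kd in mol/L from a pKd value."""
    return 10.0 ** (-pkd)


if __name__ == "__main__":
    # Assumed example input: Kd = 10 micromolar, expressed in mol/L.
    kd = 1e-5
    print(f"Kd = {kd:.1e} M gives pKd = {kd_to_pkd(kd):.2f}")
```

Under this definition, a pKd of 5.0 corresponds to a Kd of 1e-5 M (10 µM), and each additional pKd unit means a tenfold lower Kd, that is, stronger binding.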