Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKd (pKd = -log10(Kd in M); higher means stronger binding). Dataset: bindingdb_kd, drug-target binding data from BindingDB with Kd measurements. (1) The compound is COc1cc(C2CCNCC2)ccc1Nc1ncc(C(F)(F)F)c(CCc2ccccc2CC(N)=O)n1. The target protein sequence is MAAAYLDPNLNHTPSSSTKTHLGTGMERSPGAMERVLKVFHYFESSSEPTTWASIIRHGDATDVRGIIQKIVDSHKVKHVACYGFRLSHLRSEEVHWLHVDMGVSSVREKYELAHPPEEWKYELRIRYLPKGFLNQFTEDKPTLNFFYQQVKSDYMQEIADQVDQEIALKLGCLEIRRSYWEMRGNALEKKSNYEVLEKDVGLKRFFPKSLLDSVKAKTLRKLIQQTFRQFANLNREESILKFFEILSPVYRFDKECFKCALGSSWIISVELAIGPEEGISYLTDKGCNPTHLADFNQVQTIQYSNSEDKDRKGMLQLKIAGAPEPLTVTAPSLTIAENMADLIDGYCRLVNGATQSFIIRPQKEGERALPSIPKLANSEKQGMRTHAVSVSHCQHKVKKARRFLPLVFCSLEPPPTDEISGDETDDYAEIIDEEDTYTMPSKSYGIDEARDYEIQRERIELGRCIGEGQFGDVHQGVYLSPENPALAVAIKTCKNCTSD.... The pKd is 9.1. (2) The pKd is 9.0. The target protein sequence is HHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGHYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQES. The small molecule is Cc1nc(Nc2ncc(C(=O)Nc3c(C)cccc3Cl)s2)cc(N2CCN(CCO)CC2)n1. (3) The compound is CCCCNc1ccc(C(=O)OCC[N+](C)(C)CCC[NH3+])cc1. The target protein (Q00194) has sequence MKKVIINTWHSFVNIPNVVGPDVEKEITRMENGACSSFSGDDDDSASMFEESETENPHARDSFRSNTHGSGQPSQREQYLPGAIALFNVNNSSNKEQEPKEKKKKKKEKKSKPDDKNENKKDPEKKKKKEKDKDKKKKEEKGKDKKEEEKKEVVVIDPSGNTYYNWLFCITLPVMYNWTMIIARACFDELQSDYLEYWLAFDYLSDVVYLLDMFVRTRTGYLEQGLLVKEERKLIDKYKSTFQFKLDVLSVIPTDLLYIKFGWNYPEIRLNRLLRISRMFEFFQRTETRTNYPNIFRISNLVMYIIIIIHWNACVYFSISKAIGFGNDTWVYPDVNDPDFGRLARKYVYSLYWSTLTLTTIGETPPPVRDSEYFFVVADFLIGVLIFATIVGNIGSMISNMNAARAEFQARIDAIKQYMHFRNVSKDMEKRVIKWFDYLWTNKKTVDEREVLKYLPDKLRAEIAINVHLDTLKKVRIFADCEAGLLVELVLKLQPQVYSP.... The pKd is 5.8.
(4) The small molecule is [NH3+][C@@H](Cc1ccc(OS(=O)(=O)[O-])cc1)C(=O)[O-]. The target protein sequence is QAEEWYFGKITRRESERLLLNPENPRGTFLVRRLQRVKGAYALSVSDFDNAKGLNVLHYKIRKLDSGGFYITSRTQFSSLQQLVAYYSKHADGLCHRLTNVCPT. The pKd is 8.2. (5) The drug is CC(C)[C@H](NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](N)CO)C(=O)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@@H](CCC(=O)O)C(=O)O)[C@@H](C)O)[C@@H](C)O. The target protein sequence is SLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVY. The pKd is 6.2. (6) The drug is C=CCC(=C)/C=C/[C@@](C)(O)[C@H]1O[C@@H]2C[C@@H]3O[C@@H]4C[C@@H]5O[C@@H]6C[C@@H]7O[C@@H]8C[C@@H]9O[C@@H]%10C[C@@H]%11O[C@](C)(CCOS(=O)(=O)[O-])[C@@H](OS(=O)(=O)[O-])C[C@H]%11O[C@H]%10C[C@H]9O[C@H]8CC[C@@]7(C)O[C@@]6(C)CC[C@H](C)[C@H]5O[C@H]4C[C@@]3(C)O[C@H]2CC1=C. The pKd is 5.4. The target protein (P62834) has sequence MREYKLVVLGSGGVGKSALTVQFVQGIFVEKYDPTIEDSYRKQVEVDCQQCMLEILDTAGTEQFTAMRDLYMKNGQGFALVYSITAQSTFNDLQDLREQILRVKDTEDVPMILVGNKCDLEDERVVGKEQGQNLARQWCNCAFLESSAKSKINVNEIFYDLVRQINRKTPVEKKKPKKKSCLLL. (7) The small molecule is N#CC[C@H](C1CCCC1)n1cc(-c2ncnc3[nH]ccc23)cn1. The target protein (P42681) has sequence MILSSYNTIQSVFCCCCCCSVQKRQMRTQISLSTDEELPEKYTQRRRPWLSQLSNKKQSNTGRVQPSKRKPLPPLPPSEVAEEKIQVKALYDFLPREPCNLALRRAEEYLILEKYNPHWWKARDRLGNEGLIPSNYVTENKITNLEIYEWYHRNITRNQAEHLLRQESKEGAFIVRDSRHLGSYTISVFMGARRSTEAAIKHYQIKKNDSGQWYVAERHAFQSIPELIWYHQHNAAGLMTRLRYPVGLMGSCLPATAGFSYEKWEIDPSELAFIKEIGSGQFGVVHLGEWRSHIQVAIKAINEGSMSEEDFIEEAKVMMKLSHSKLVQLYGVCIQRKPLYIVTEFMENGCLLNYLRENKGKLRKEMLLSVCQDICEGMEYLERNGYIHRDLAARNCLVSSTCIVKISDFGMTRYVLDDEYVSSFGAKFPIKWSPPEVFLFNKYSSKSDVWSFGVLMWEVFTEGKMPFENKSNLQVVEAISEGFRLYRPHLAPMSIYEVMY.... The pKd is 5.0. 
(8) The target protein sequence is MTLHSQSTTSPLFPQISSSWVHSPSEAGLPLGTVTQLGSYQISQETGQFSSQDTSSDPLGGHTIWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLASADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLTYCAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFVGKRTVPPGECFIQFLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRTKELAGLQASGTEIEGRIEGRIEGRTRSQITKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFADSAIPKTYWNLGYWLCYINSTVNPVAYALSNKTFRTTFKTLLLSQSDKRKRRKQQYQQRQSVIFHKRVPEQAL. The pKd is 9.2. The small molecule is C[N+]1(C)[C@H]2CC(OC(=O)[C@H](CO)c3ccccc3)C[C@@H]1[C@H]1O[C@@H]21. (9) The target protein (O60346) has sequence MEPAAAATVQRLPELGREDRASAPAAAAAAAAAAAAAAAALAAAAGGGRSPEPALTPAAPSGGNGSGSGAREEAPGEAPPGPLPGRAGGAGRRRRRGAPQPIAGGAAPVPGAGGGANSLLLRRGRLKRNLSAAAAAASSSSSSSAAAASHSPGAAGLPASCSASASLCTRSLDRKTLLLKHRQTLQLQPSDRDWVRHQLQRGCVHVFDRHMASTYLRPVLCTLDTTAGEVAARLLQLGHKGGGVVKVLGQGPGAAAAREPAEPPPEAGPRLAPPEPRDSEVPPARSAPGAFGGPPRAPPADLPLPVGGPGGWSRRASPAPSDSSPGEPFVGGPVSSPRAPRPVVSDTESFSLSPSAESVSDRLDPYSSGGGSSSSSEELEADAASAPTGVPGQPRRPGHPAQPLPLPQTASSPQPQQKAPRAIDSPGGAVREGSCEEKAAAAVAPGGLQSTPGRSGVTAEKAPPPPPPPTLYVQLHGETTRRLEAEEKPLQIQNDYLFQL.... The pKd is 6.2. The small molecule is CC(C)CCCC1(C)CCc2cc(S(N)(=O)=O)cc(Br)c2O1.
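The target is defined as pKd = -log10(Kd in M), so each label maps directly back to a dissociation constant: pKd 9.1 corresponds to a Kd of roughly 0.79 nM, while pKd 5.0 corresponds to 10 µM. The sketch below shows that conversion in both directions. The helper names `kd_to_pkd` and `pkd_to_kd_nm` are hypothetical and are not part of the bindingdb_kd dataset or any loader API.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant Kd (mol/L) to pKd = -log10(Kd).

    Hypothetical helper for illustration; Kd must be strictly positive.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive (in mol/L)")
    return -math.log10(kd_molar)


def pkd_to_kd_nm(pkd: float) -> float:
    """Invert pKd back to Kd, expressed in nanomolar (1 nM = 1e-9 M).

    Hypothetical helper for illustration.
    """
    kd_molar = 10 ** (-pkd)
    return kd_molar * 1e9


if __name__ == "__main__":
    # Labels taken from the examples above.
    for label in (9.1, 6.2, 5.0):
        print(f"pKd {label} -> Kd ~ {pkd_to_kd_nm(label):.3g} nM")
```

Because the scale is logarithmic, a one-unit difference in pKd is a tenfold difference in Kd. For example, the 9.2 and 6.2 labels above are three orders of magnitude apart in affinity.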