Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKi (pKi = -log10(Ki in M); higher means stronger inhibition). Dataset: bindingdb_ki (drug-target binding data from BindingDB using Ki measurements). (1) The drug is COC(=O)C(Cc1ccccc1)NC(=O)C[C@H](O)[C@H](CC(C)C)NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)CC(C)C)C(C)C)C(C)C. The target protein (P06026) has sequence MKFTLISSCIAIAALAVAVDAAPGEKKISIPLAKNPNYKPSAKNAIQKAIAKYNKHKINTSTGGIVPDAGVGTVPMTDYGNDVEYYGQVTIGTPGKKFNLDFDTGSSDLWIASTLCTNCGSRQTKYDPKQSSTYQADGRTWSISYGDGSSASGILAKDNVNLGGLLIKGQTIELAKREAASFANGPNDGLLGLGFDTITTVRGVKTPMDNLISQGLISRPIFGVYLGKASNGGGGEYIFGGYDSTKFKGSLTTVPIDNSRGWWGITVDRATVGTSTVASSFDGILDTGTTLLILPNNVAASVARAYGASDNGDGTYTISCDTSRFKPLVFSINGASFQVSPDSLVFEEYQGQCIAGFGYGNFDFAIIGDTFLKNNYVVFNQGVPEVQIAPVAQ. The pKi is 8.0. (2) The drug is Cl.O=C(Oc1ccc(O)cc1)[C@@H]1[C@@H]2CCC[C@H]1NCC2. The target protein sequence is MMENTGNISDLLYALSNPMVSNSSILCRNFSNSSGLVNMNSSVCDRTPELDKSSTPVIVAIIITALYSIVCVMGMGLVGNVLVMYVIIRYTKMKTATNIYIFNLALADSLATSTLPFQSVNYLMGTWPFGDELCKIVMSIDYYNMFTSIFTLTTMSVDRYIAVCHPVKALDFRTPRNAKIVNVCNWILSSAIGLPVMVMASTTSDLHSNGIIDCTLLFPHPSWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRMLSGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIFVIIKALVTIPNSLLQTITWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCVPSPSVLDLQNSTRSRNPQRDGQSSGHTVDRTNQQV. The pKi is 4.9. (3) The small molecule is Cc1c(C(=O)NN2CCCCC2)nn(-c2ccc(Cl)cc2Cl)c1-c1ccc(Cl)cc1. The target protein sequence is MDTMTMETLLLSSTLLLLPFNKHGGAKLAGWLVNDLPRRRIMKPPSRFEVHIRFCCLKPENAETTFKVDGRRFNMSTKTLKLYRDTTYRIGVTSSPPMEFEEAEINGENLISHLEPDGGIEADWSTAGFSKTKSRSRCNIRLMLRGVFGSVTQDLQCKFYDISDPHAQWGDKFRQMVLVCSTYDDCMINVVEVELK. The pKi is 7.5. (4) The drug is CC1=N[C@@H]2[C@@H](O)[C@@H](O)[C@@H](CO)O[C@]2(CNC(=O)OC(C)(C)C)S1. 
The target protein sequence is MTTGAAPDRKAPVRPTPLDRVIPAPASVDPGGAPYRITRGTHIRVDDSREARRVGDYLADLLRPATGYRLPVTAHGHGGIRLRLAGGPYGDEGYRLDSGPAGVTITARKAAGLFHGVQTLRQLLPPAVEKDSAQPGPWLVAGGTIEDTPRYAWRSAMLDVSRHFFGVDEVKRYIDRVARYKYNKLHLHLSDDQGWRIAIDSWPRLATYGGSTEVGGGPGGYYTKAEYKEIVRYAASRHLEVVPEIDMPGHTNAALASYAELNCDGVAPPLYTGTKVGFSSLCVDKDVTYDFVDDVIGELAALTPGRYLHIGGDEAHSTPKADFVAFMKRVQPIVAKYGKTVVGWHQLAGAEPVEGALVQYWGLDRTGDAEKAEVAEAARNGTGLILSPADRTYLDMKYTKDTPLGLSWAGYVEVQRSYDWDPAGYLPGAPADAVRGVEAPLWTETLSDPDQLDYMAFPRLPGVAELGWSPASTHDWDTYKVRLAAQAPYWEAAGIDFYRS.... The pKi is 2.7. (5) The compound is CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cccc(I)c4)ncnc32)[C@H](O)[C@@H]1O. The target protein (Q60613) has sequence MGSSVYIMVELAIAVLAILGNVLVCWAVWINSNLQNVTNFFVVSLAAADIAVGVLAIPFAITISTGFCAACHGCLFFACFVLVLTQSSIFSLLAIAIDRYIAIRIPLRYNGLVTGMRAKGIIAICWVLSFAIGLTPMLGWNNCSQKDENSTKTCGEGRVTCLFEDVVPMNYMVYYNFFAFVLLPLLLMLAIYLRIFLAARRQLKQMESQPLPGERTRSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCSTCQHAPPWLMYLAIILSHSNSVVNPFIYAYRIREFRQTFRKIIRTHVLRRQEPFRAGGSSAWALAAHSTEGEQVSLRLNGHPLGVWANGSAPHSGRRPNGYTLGPGGGGSTQGSPGDVELLTQEHQEGQEHPGLGDHLAQGRVGTASWSSEFAPS. The pKi is 6.0. (6) The target protein (P01275) has sequence MKSIYFVAGLFVMLVQGSWQRSLQDTEEKSRSFSASQADPLSDPDQMNEDKRHSQGTFTSDYSKYLDSRRAQDFVQWLMNTKRNRNNIAKRHDEFERHAEGTFTSDVSSYLEGQAAKEFIAWLVKGRGRRDFPEEVAIVEELGRRHADGSFSDEMNTILDNLAARDFINWLIQTKITDRK. The compound is CC1(C)CC(C(Oc2cnc(-n3cc(C(F)(F)F)cn3)nc2)c2ccc(C(=O)NCCC(=O)O)cc2)C1. The pKi is 7.0. (7) The drug is CCOc1cc(=O)n(C)cc1-c1cc(NCC2CC2)ccc1Oc1ccc(F)cc1F. The target protein sequence is KRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPTEE. The pKi is 6.7.
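
The pKi labels above follow the definition given in the task statement, pKi = -log10(Ki in M). As a minimal sketch of that conversion (the function names `ki_to_pki` and `pki_to_ki` are hypothetical helpers for illustration, not part of the dataset or of any library), Ki values could be mapped to and from the pKi scale like this:

```python
import math


def ki_to_pki(ki_molar: float) -> float:
    """Convert an inhibition constant Ki (in mol/L) to pKi = -log10(Ki)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be a positive concentration in mol/L")
    return -math.log10(ki_molar)


def pki_to_ki(pki: float) -> float:
    """Invert the transform: Ki (in mol/L) = 10 ** (-pKi)."""
    return 10.0 ** (-pki)


if __name__ == "__main__":
    # A Ki of 10 nM, expressed in mol/L, converted to the pKi scale.
    print(ki_to_pki(10e-9))
    # Each pKi label from the examples above, converted back to Ki in mol/L.
    for pki in (8.0, 4.9, 7.5, 2.7, 6.0, 7.0, 6.7):
        print(pki, pki_to_ki(pki))
```

Because the scale is logarithmic, a difference of one pKi unit corresponds to a tenfold difference in Ki. For instance, the label 8.0 in example (1) corresponds to a Ki of 10 nM.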