Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKi (pKi = -log10(Ki), with Ki in M; higher means stronger inhibition). Dataset: bindingdb_ki, drug-target binding data from BindingDB using Ki measurements.
(1) The compound is O=C(O)c1ncccc1S. The target protein (P07379) has sequence MPPQLHNGLDFSAKVIQGSLDSLPQEVRKFVEGNAQLCQPEYIHICDGSEEEYGRLLAHMQEEGVIRKLKKYDNCWLALTDPRDVARIESKTVIITQEQRDTVPIPKSGQSQLGRWMSEEDFEKAFNARFPGCMKGRTMYVIPFSMGPLGSPLAKIGIELTDSPYVVASMRIMTRMGTSVLEALGDGEFIKCLHSVGCPLPLKKPLVNNWACNPELTLIAHLPDRREIISFGSGYGGNSLLGKKCFALRIASRLAKEEGWLAEHMLILGITNPEGKKKYLAAAFPSACGKTNLAMMNPTLPGWKVECVGDDIAWMKFDAQGNLRAINPENGFFGVAPGTSVKTNPNAIKTIQKNTIFTNVAETSDGGVYWEGIDEPLAPGVTITSWKNKEWRPQDEEPCAHPNSRFCTPASQCPIIDPAWESPEGVPIEGIIFGGRRPAGVPLVYEALSWQHGVFVGAAMRSEATAAAEHKGKVIMHDPFAMRPFFGYNFGKYLAHWLSM.... The pKi is 3.9. (2) The small molecule is O=[N+]([O-])[O-]. The target protein sequence is MKKTFLIALALAASLIGAENTKWDYKNKENGPHRWDKLHKDFEVCKSGKSQSPINIEHYYHTQDKTDLQFKYAASKPKAVFFTHHTLKASFEPTNHINYRGHDYVLDNVHFHAPMEFLINNKTRPLSAHFVHKDAKGRLLVLAIGFEEGKENPNLDPILEDIQKKQNFKEVALDAFLPKTINYYHFNGSLTAPPCTEGVAWFVIEEPLEVSAKQLAEIKKRMKNSPNQRPVQPDYNTVIIKSSAETR. The pKi is 3.1. (3) The compound is OCCCN1CCN(Cc2cccnc2)CC1. The target protein (P28570) has sequence MAKKSAENGIYSVSGDEKKGPLIVSGPDGAPSKGDGPAGLGAPSSRLAVPPRETWTRQMDFIMSCVGFAVGLGNVWRFPYLCYKNGGGVFLIPYVLIALVGGIPIFFLEISLGQFMKAGSINVWNICPLFKGLGYASMVIVFYCNTYYIMVLAWGFYYLVKSFTTTLPWATCGHTWNTPDCVEIFRHEDCANASLANLTCDQLADRRSPVIEFWENKVLRLSTGLEVPGALNWEVTLCLLACWVLVYFCVWKGVKSTGKIVYFTATFPYVVLVVLLVRGVLLPGALDGIIYYLKPDWSKLGSPQVWIDAGTQIFFSYAIGLGALTALGSYNRFNNNCYKDAIILALINSGTSFFAGFVVFSILGFMATEQGVHISKVAESGPGLAFIAYPRAVTLMPVAPLWAALFFFMLLLLGLDSQFVGVEGFITGLLDLLPASYYFRFQREISVALCCALCFVIDLSMVTDGGMYVFQLFDYYSASGTTLLWQAFWECVVVAWVYGA.... The pKi is 2.0. 
(4) The small molecule is CC[C@H](C)[C@H](NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CO)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](Cc1ccc(O)cc1)NC(=O)[C@H](CCCN=C(N)N)NC(=O)[C@H](C)NC(=O)[C@H](CCSC)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](C)NC(=O)[C@@H]1CCCN1C(=O)[C@H](C)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)CNC(=O)[C@@H]1CCCN1C(=O)[C@H](CC(N)=O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCCN)NC(=O)[C@H](CO)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](N)Cc1ccc(O)cc1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(=O)N[C@H](CC1CNc2ccccc21)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(N)=O)[C@@H](C)CC. The target protein (P21555) has sequence MNSTLFSRVENYSVHYNVSENSPFLAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAVMCLPFTFVYTLMDHWVFGETMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPRGWRPNNRHAYIGITVIWVLAVASSLPFVIYQILTDEPFQNVSLAAFKDKYVCFDKFPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNNMMDKIRDSKYRSSETKRINVMLLSIVVAFAVCWLPLTIFNTVFDWNHQIIATCNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDYETIAMSTMHTDVSKTSLKQASPVAFKKISMNDNEKI. The pKi is 6.0. (5) The pKi is 6.7. The compound is N#Cc1ccc(C(=O)O)cc1. The target protein sequence is MPLFSFEGRSPRIDPTAFVAPTATLIGDVTIEAGASVWFNAVLRGDYAPVVVREGANVQDGAVLHAPPGIPVDIGPGATVAHLCVIHGVHVGSEALIANHATVLDGAVIGARCMIAAGALVVAGTQIPAGMLVTGAPAKVKGPIEGTGAEMWVNVNPQAYRDLAARHLAGLEPM. (6) The compound is CCc1nc(N)nc(N)c1-c1ccc(C)cc1. 
The target protein (O02604) has sequence MEDLSDVFDIYAICACCKVAPTSEGTKNEPFSPRTFRGLGNKGTLPWKCNSVDMKYFSSVTTYVDESKYEKLKWKRERYLRMEASQGGGDNTSGGDNTHGGDNADKLQNVVVMGRSSWESIPKQYKPLPNRINVVLSKTLTKEDVKEKVFIIDSIDDLLLLLKKLKYYKCFIIGGAQVYRECLSRNLIKQIYFTRINGAYPCDVFFPEFDESQFRVTSVSEVYNSKGTTLDFLVYSKVGGGVDGGASNGSTATALRRTAMRSTAMRRNVAPRTAAPPMGPHSRANGERAPPRARARRTTPRQRKTTSCTSALTTKWGRKTRSTCKILKFTTASRLMQHPEYQYLGIIYDIIMNGNKQGDRTGVGVMSNFGYMMKFNLSEYFPLLTTKKLFLRGIIEELLWFIRGETNGNTLLNKNVRIWEANGTREFLDNRKLFHREVNDLGPIYGFQWRHFGAEYTNMHDNYEDKGVDQLKNVIHLIKNEPTSRRIILCAWNVKDLDQM.... The pKi is 9.3. (7) The small molecule is CC(=O)N[C@H]1CS(=O)(=O)[C@@H]2[C@@H](C(=O)O)C[C@@H](N)[C@@H]21. The target protein sequence is MNPNQKIITIGSISIAIGIISLMLQIGNIISIWASHSIQTGSQNNTGICNQRIITYENSTWVNHTYVNINNTNVVAGEDKTSVTLAGNSSLCSISGWAIYTKDNSIRIGSKGDVFVIREPFISCSHLECRTFFLTQGALLNDKHSNGTVKDRSPYRALMSCPLGEAPSPYNSKFESVAWSASACHDGMGWLTIGISGPDNGAVAVLKYNGIITGTIKSWKKQILRTQESECVCMNGSCFTIMTDGPSNKAASYKIFKIEKGKVTKSIELNAPNFHYEECSCYPDTGIVMCVCRDNWHGSNRPWVSFNQNLDYQIGYICSGVFGDNPRPEDGEGSCNPVTVDGANGVKGFSYKYDNGVWIGRTKSNRLRKGFEMIWDPNGWTNTDSDFSVKQDVVAITDWSGYSGSFVQHPELTGLDCIRPCFWVELVRGLPRENTTIWTSGSSISFCGVNSDTANWSWPDGAELPFTIDK. The pKi is 4.7. (8) The pKi is 6.9. The target protein sequence is MEENTFGVQQIQPNVISVRLFKRKVGGLGFLVKERVSKPPVIISDLIRGGAAEQSGLIQAGDIILAVNDRPLVDLSYDSALEVLRGIASETHVVLILRGPEGFTTHLETTFTGDGTPKTIRVTQPLGPPTKAVDLSHQPSASKDQSLAVDRVTGLGNGPQHAQGHGQGAGSVSQANGVAIDPTMKSTKANLQDIGEHDELLKEIEPVLSILNSGSKATNRGGPAKAEMKDTGIQVDRDLDGKSHKAPPLGGDNDRVFNDLWGKDNVPVILNNPYSEKEQSPTSGKQSPTKNGSPSRCPRFLKVKNWETDVVLTDTLHLKSTLETGCTEHICMGSIMLPSQHTRKPEDVRTKDQLFPLAKEFLDQYYSSIKRFGSKAHMDRLEEVNKEIESTSTYQLKDTELIYGAKHAWRNASRCVGRIQWSKLQVFDARDCTTAHGMFNYICNHVKYATNKGNLRSAITIFPQRTDGKHDFRVWNSQLIRYAGYKQPDGSTLGDPANVQ.... The small molecule is CNCCN(C)c1cncc(CCc2cc(C)cc(N)n2)c1. (9) The drug is COc1ccc(/C=C2\S/C(=N\c3ccccc3)N(Cc3ccco3)C2=O)cc1. 
The target protein sequence is MSNIHDVVIIGSGPAAHTAAIYLGRSSLKPVMYEGFMAGGVAAGGQLTTTTIIENFPGFPNGIDGNELMMNMRTQSEKYGTTIITETIDHVDFSTQPFKLFTEEGKEVLTKSVIIATGATAKRMHVPGEDKYWQNGVSACAICDGAVPIFRNKVLMVVGGGDAAMEEALHLTKYGSKVIILHRRDAFRASKTMQERVLNHPKIEVIWNSELVELEGDGDLLNGAKIHNLVSGEYKVVPVAGLFYAIGHSPNSKFLGGQVKTADDGYILTEGPKTSVDGVFACGDVCDRVYRQAIVAAGSGCMAALSCEKWLQTH. The pKi is 5.7.
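The pKi label defined in the task statement is a plain log transform of the inhibition constant. A minimal sketch of the conversion in both directions follows. The helper names are hypothetical, and `nanomolar_to_pki` assumes the raw Ki value is reported in nM (a common unit for Ki values in BindingDB exports); it is not taken from any dataset loader.

```python
import math


def ki_to_pki(ki_molar: float) -> float:
    """Convert an inhibition constant Ki in mol/L (M) to pKi = -log10(Ki)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be a positive concentration")
    return -math.log10(ki_molar)


def pki_to_ki(pki: float) -> float:
    """Invert pKi back to Ki in mol/L."""
    return 10.0 ** (-pki)


def nanomolar_to_pki(ki_nm: float) -> float:
    # Assumption: the raw Ki is given in nM, so scale to M first (1 nM = 1e-9 M).
    return ki_to_pki(ki_nm * 1e-9)


if __name__ == "__main__":
    # Labels taken from the examples above, converted back to Ki.
    for label in (2.0, 3.9, 9.3):
        print(f"pKi {label} -> Ki = {pki_to_ki(label):.3g} M")
```

Under this definition, the pKi of 9.3 in example (6) corresponds to a Ki of about 0.5 nM, and the pKi of 2.0 in example (3) corresponds to about 10 mM. Each unit of pKi is therefore a tenfold change in Ki, which is why the regression target is the log value and not the raw concentration.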