Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The compound is COc1ccc(-c2nnn(CC(=O)N(C3CCCCC3)C(C(=O)NC3CCCC3)C(C)C)n2)cc1OC. The target protein (Q61009) has sequence MGGSSRARWVALGLGALGLLFAALGVVMILMVPSLIKQQVLKNVRIDPSSLSFGMWKEIPVPFYLSVYFFEVVNPNEVLNGQKPVVRERGPYVYREFRQKVNITFNDNDTVSFVENRSLHFQPDKSHGSESDYIVLPNILVLGGSILMESKPVSLKLMMTLALVTMGQRAFMNRTVGEILWGYDDPFVHFLNTYLPDMLPIKGKFGLFVGMNNSNSGVFTVFTGVQNFSRIHLVDKWNGLSKIDYWHSEQCNMINGTSGQMWAPFMTPESSLEFFSPEACRSMKLTYNESRVFEGIPTYRFTAPDTLFANGSVYPPNEGFCPCRESGIQNVSTCRFGAPLFLSHPHFYNADPVLSEAVLGLNPNPKEHSLFLDIHPVTGIPMNCSVKMQLSLYIKSVKGIGQTGKIEPVVLPLLWFEQSGAMGGKPLSTFYTQLVLMPQVLHYAQYVLLGLGGLLLLVPIICQLRSQEKCFLFWSGSKKGSQDKEAIQAYSESLMSPAAK.... The pIC50 is 4.6. (2) The pIC50 is 8.0. The drug is CN(C)CCCOC(=O)C(C)(c1ccccc1)C1CCCCC1. The target protein (P00689) has sequence MKFVLLLSLIGFCWAQYDPHTADGRTAIVHLFEWRWADIAKECERYLAPKGFGGVQVSPPNENIIINNPSRPWWERYQPISYKICSRSGNENEFKDMVTRCNNVGVRIYVDAVINHMCGSGNSAGTHSTCGSYFNPNNREFSAVPYSAWYFNDNKCNGEINNYNDANQVRNCRLSGLLDLALDKDYVRTKVADYMNNLIDIGVAGFRLDAAKHMWPGDIKAVLDKLHNLNTKWFSQGSRPFIFQEVIDLGGEAIKGSEYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFVPTDRALVFVDNHDNQRGHGAGGASILTFWDARMYKMAVGFMLAHPYGFTRVMSSYRRTRNFQNGKDVNDWIGPPNNNGVTKEVTINPDTTCGNDWVCEHRWRQIRNMVAFRNVVNGQPFANWWDNGSNQVAFSRGNRGFIVFNNDDWALSSTLQTGLPAGTYCDVISGDKVNGNCTGLKVNVGSDGKAHFSISNSAEDPFI.... (3) The small molecule is CC(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCCN=C(N)N)C(=O)COC(=O)c1c(C)cccc1C. 
The target protein sequence is MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV. The pIC50 is 4.9. (4) The drug is Cn1nc(Nc2nc(N[C@@H](CO)c3ccncc3Cl)nc3n[nH]cc23)cc1C(C)(C)C. The target protein (P17952) has sequence MNTQQLAKLRSIVPEMRRVRHIHFVGIGGAGMGGIAEVLANEGYQISGSDLAPNPVTQQLMNLGATIYFNHRPENVRDASVVVVSSAISADNPEIVAAHEARIPVIRRAEMLAELMRFRHGIAIAGTHGKTTTTAMVSSIYAEAGLDPTFVNGGLVKAAGVHARLGHGRYLIAEADESDASFLHLQPMVAIVTNIEADHMDTYQGDFENLKQTFINFLHNLPFYGRAVMCVDDPVIRELLPRVGRQTTTYGFSEDADVRVEDYQQIGPQGHFTLLRQDKEPMRVTLNAPGRHNALNAAAAVAVATEEGIDDEAILRALESFQGTGRRFDFLGEFPLEPVNGKSGTAMLVDDYGHHPTEVDATIKAARAGWPDKNLVMLFQPHRFTRTRDLYDDFANVLTQVDTLLMLEVYPAGEAPIPGADSRSLCRTIRGRGKIDPILVPDPARVAEMLAPVLTGNDLILVQGAGNIGKIARSLAEIKLKPQTPEEEQHD. The pIC50 is 7.2. (5) The small molecule is O=C(NCCCl)NNc1c(Cl)cc(Cl)cc1Cl. The pIC50 is 4.5. The target protein (O60755) has sequence MADAQNISLDSPGSVGAVAVPVVFALIFLLGTVGNGLVLAVLLQPGPSAWQEPGSTTDLFILNLAVADLCFILCCVPFQATIYTLDAWLFGALVCKAVHLLIYLTMYASSFTLAAVSVDRYLAVRHPLRSRALRTPRNARAAVGLVWLLAALFSAPYLSYYGTVRYGALELCVPAWEDARRRALDVATFAAGYLLPVAVVSLAYGRTLRFLWAAVGPAGAAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAFSPATYACRLASHCLAYANSCLNPLVYALASRHFRARFRRLWPCGRRRRHRARRALRRVRPASSGPPGCPGDARPSGRLLAGGGQGPEPREGPVHGGEAARGPE. (6) The compound is O=C(NCCOCCOCCNc1nc(Nc2ccccc2)nc(Nc2ccc(O)cc2)n1)c1ccccc1. 
The target protein (Q00910) has sequence MGLLLKPGARQGSGTSSVPDRRCPRSVFSNIKVFVLCHGLLQLCQLLYSAYFKSSLTTIEKRFGLSSSSSGLISSLNEISNATLIIFISYFGSRVNRPRMIGIGGLLLAAGAFVLTLPHFLSEPYQYTSTTDGNRSSFQTDLCQKHFGALPPSKCHSTVPDTHKETSSLWGLMVVAQLLAGIGTVPIQPFGISYVDDFAEPTNSPLYISILFAIAVFGPAFGYLLGSVMLRIFVDYGRVDTATVNLSPGDPRWIGAWWLGLLISSGFLIVTSLPFFFFPRAMSRGAERSVTAEETMQTEEDKSRGSLMDFIKRFPRIFLRLLMNPLFMLVVLSQCTFSSVIAGLSTFLNKFLEKQYGATAAYANFLIGAVNLPAAALGMLFGGILMKRFVFPLQTIPRVAATIITISMILCVPLFFMGCSTSAVAEVYPPSTSSSIHPQQPPACRRDCSCPDSFFHPVCGDNGVEYVSPCHAGCSSTNTSSEASKEPIYLNCSCVSGGSA.... The pIC50 is 5.3. (7) The small molecule is COC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](O)[C@@H](N)CCCCN. The target protein (P15684) has sequence MAKGFYISKTLGILGILLGVAAVCTIIALSVVYAQEKNRNAENSAIAPTLPGSTSATTSTTNPAIDESKPWNQYRLPKTLIPDSYQVTLRPYLTPNEQGLYIFKGSSTVRFTCNETTNVIIIHSKKLNYTNKGNHRVALRALGDTPAPNIDTTELVERTEYLVVHLQGSLVKGHQYEMDSEFQGELADDLAGFYRSEYMEGGNKKVVATTQMQAADARKSFPCFDEPAMKASFNITLIHPNNLTALSNMLPKDSRTLQEDPSWNVTEFHPTPKMSTYLLAYIVSEFKYVEAVSPNRVQIRIWARPSAIDEGHGDYALQVTGPILNFFAQHYNTAYPLEKSDQIALPDFNAGAMENWGLVTYRESALVFDPQSSSISNKERVVTVIAHELAHQWFGNLVTVDWWNDLWLNEGFASYVEFLGADYAEPTWNLKDLIVLNDVYRVMAVDALASSHPLSSPANEVNTPAQISELFDSITYSKGASVLRMLSSFLTEDLFKKGLS.... The pIC50 is 3.0. (8) The drug is C[C@H]1[C@H](NC(=O)/C(=N\OC(C)(C)C(=O)O)c2csc(N)n2)C(=O)N1S(=O)(=O)O. The target protein sequence is MMKKSLCCALLLTASFSTFAAAKTEQQIADIVNRTITPLMQEQAIPGMAVAVIYQGKPYYFTWGKADIANNHPVTQQTLFELGSVSKTFNGVLGGDCIARGEIKLSDPVTKYWPELTGKQWQGIRLLHLATYTAGGLPLQIPDDVRDKAALLHFYQNWQPQWTPGAKRLYANSSIGLFGALAVKPSGMSYEEAMTRRVLQPLKLAHTWITVPENEQKDYAWGYREGKPVHVSPGQLDAEAYGVKSSVIDMARWVQANMDASHVQEKTLQQGIALAQSRYWRIGDMYQGLGWEMLNWPLKADSIINGSDSKVALAALPAVEVNPPAPAVKASWVHKTGSTGGFGSYVAFVPEKNLGIVMLANKSYPNPVRVEAAWRILEKLQ. The pIC50 is 8.2. (9) The drug is O=C(/C=C/c1ccc(CN(CCO)CCc2c[nH]c3ccccc23)cc1)NO. 
The target protein (Q96DB2) has sequence MLHTTQLYQHVPETRWPIVYSPRYNITFMGLEKLHPFDAGKWGKVINFLKEEKLLSDSMLVEAREASEEDLLVVHTRRYLNELKWSFAVATITEIPPVIFLPNFLVQRKVLRPLRTQTGGTIMAGKLAVERGWAINVGGGFHHCSSDRGGGFCAYADITLAIKFLFERVEGISRATIIDLDAHQGNGHERDFMDDKRVYIMDVYNRHIYPGDRFAKQAIRRKVELEWGTEDDEYLDKVERNIKKSLQEHLPDVVVYNAGTDILEGDRLGGLSISPAGIVKRDELVFRMVRGRRVPILMVTSGGYQKRTARIIADSILNLFGLGLIGPESPSVSAQNSDTPLLPPAVP. The pIC50 is 8.2. (10) The compound is C=C1NC(=O)C(C)C(CCC(C)C(=O)C=CC(C)=CCC(C)CCCCCCC)OC(=O)[C@H](CC(OS(=O)(=O)O)C(N)=O)NC(=O)[C@@H](C)CNC1=O. The target protein (P00640) has sequence MKELKLKEAKEILKALGLPPQQYNDRSGWVLLALANIKPEDSWKEAKAPLLPTVSIMEFIRTEYGKDYKPNSRETIRRQTLHQFEQARIVDRNRDLPSRATNSKDNNYSLNQVIIDILHNYPNGNWKELIQQFLTHVPSLQELYERALARDRIPIKLLDGTQISLSPGEHNQLHADIVHEFCPRFVGDMGKILYIGDTASSRNEGGKLMVLDSEYLKKLGVPPMSHDKLPDVVVYDEKRKWLFLIEAVTSHGPISPKRWLELEAALSSCTVGKVYVTAFPTRTEFRKNAANIAWETEVWIADNPDHMVHFNGDRFLGPHDKKPELS. The pIC50 is 4.7.
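For reference, the pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be sketched in Python as below. The function names are hypothetical and are not part of the bindingdb_ic50 dataset or any library; the sketch assumes IC50 values are supplied in mol/L.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 expressed in nanomolar (1 M = 1e9 nM)."""
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # A 10 nM inhibitor (1e-8 M) has a pIC50 of about 8.
    print(round(ic50_to_pic50(1e-8), 3))
    # A pIC50 of 4.6 corresponds to roughly 25 micromolar.
    print(round(pic50_to_ic50_nm(4.6) / 1000, 1), "uM")
```

Under this definition, each unit increase in pIC50 is a tenfold decrease in IC50, so the pair labelled 8.0 above is about 100,000 times more potent than the pair labelled 3.0.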