Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is [N-]=[N+]=Nc1ccc(-c2cc(C(=O)O)c3cc(C(=O)c4ccccc4)ccc3n2)cc1. The target protein (O78749) has sequence MFINRWLFSTNHKDIGTLYLLFGAWAGMVGTALSLLIRAELGQPGTLLGDDQIYNVIVTAHAFVMIFFMVMPIMIGGFGNWLVPLMIGAPDMAFPRMNNMSFWLLPPSFLLLLASSMVEAGAGTGWTVYPPLAGNLAHAGASVDLTIFSLHLAGVSSILGAINFITTIINMKPPAMSQYQTPLFVWSVLITAVLLLLSLPVLAAGITMLLTDRNLNTTFFDPAGGGDPILYQHLFWFFGHPEVYILILPGFGMISHIVTYYSGKKEPFGYMGMVWAMMSIGFLGFIVWAHHMFTVGMDVDTRAYFTSATMIIAIPTGVKVFSWLATLHGGNIKWSPAMMWALGFIFLFTVGGLTGIVLANSSLDIVLHDTYYVVAHFHYVLSMGAVFAIMGGFVHWFPLFSGYTLNDTWAKIHFAIMFVGVNMTFFPQHFLGLSGMPRRYSDYPDAYTMWNTISSMGSFISLTAVMLMIFIIWEAFASKREVLTVDLTTTNLEWLNGCPP.... The pIC50 is 4.0. (2) The compound is Cc1cccc(COc2ccccc2CN(CC[C@H](N)C(=O)O)Cc2ccccc2OCc2cccc(C)c2)c1. The target protein sequence is MAVDPPKADPKGVVAVDPTANCGSGLKSREDQGAKAGGCCSSRDQVCRCLRANLLVLLTVAAAVAGVVLGLGVSAAGGAEALGHARFTAFAFPGELLLRLLEMIILPLVVCSLIGGAASLDPSALGRLGAWALLFFLVTTLLSSALGVALALALKPGAAFAAINSSVVDSSVHRAPTKEVLDSFLELLRNMFPSNLVSASAAFRIPCGACPQRSNATMDQPHCEMKMNILGLVVFAIVFGVALRKLGPEGELLIRFFNSFNDATMVLVSWIMWYAPIGILFLVAGKIVEMKDIRQLFIGLGKYIVCCLLGHAIHGLLVLPLIYFLFTRKNPYRFLWGIVTPLATAFGTSSSSATLPLMMKCVEEKNGVAKHISRFILPIGATVNMDGAALFQCVAAVFIAQLNGMSLDFVKIITILVTATASSVGAAGIPAGGVLTLAIILEAISLPVKDISLILAVDWLVDRSCTVLNVEGDAFGAGLLQSYVDRTKMPSSEPELIQVK.... The pIC50 is 5.0. (3) The compound is COc1nn(C)cc1Nc1nccc(-c2c[nH]c3c(NC(=O)[C@@H](C)N4CCN(C)[C@H](C)C4)cccc23)n1. 
The target protein sequence is CQDPTIFEERHLKYISQLGKGNFGSVELCRYDPLGDNTGALVAVKQLQHSGPDQQRDFQREIQILKALHSDFIVKYRGVSYGPGRQSLRLVMEYLPSGCLRDFLQRHRARLDASRLLLYSSQICKGMEYLGSRRCVHRDLAARNILVESEAHVKIADFGLAKLLPLDKDYYVVREPGQSPIFWYAPESLSDNIFSRQSDVWSFGVVLYELFTYCDKSCSPSAEFLRMMGCERDVPALCRLLELLEEGQRLPAPPACPAEVHELMKLCWAPSPQDRPSFSALGPQLDMLWSGSRGCETHAFTAHPEGKHHSLSFS. The pIC50 is 4.5. (4) The small molecule is CC(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)N[C@H](C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCN=C(N)N)C(=O)O)[C@@H](C)O. The target protein (Q16581) has sequence MASFSAETNSTDLLSQPWNEPPVILSMVILSLTFLLGLPGNGLVLWVAGLKMQRTVNTIWFLHLTLADLLCCLSLPFSLAHLALQGQWPYGRFLCKLIPSIIVLNMFASVFLLTAISLDRCLVVFKPIWCQNHRNVGMACSICGCIWVVAFVMCIPVFVYREIFTTDNHNRCGYKFGLSSSLDYPDFYGDPLENRSLENIVQPPGEMNDRLDPSSFQTNDHPWTVPTVFQPQTFQRPSADSLPRGSARLTSQNLYSNVFKPADVVSPKIPSGFPIEDHETSPLDNSDAFLSTHLKLFPSASSNSFYESELPQGFQDYYNLGQFTDDDQVPTPLVAITITRLVVGFLLPSVIMIACYSFIVFRMQRGRFAKSQSKTFRVAVVVVAVFLVCWTPYHIFGVLSLLTDPETPLGKTLMSWDHVCIALASANSCFNPFLYALLGKDFRKKARQSIQGILEAAFSEELTRSTHCPSNNVISERNSTTV. The pIC50 is 7.6. (5) The compound is COCCN1CCN(C[C@@H]2Cc3ccccc3CN2C(=O)c2cc3c(cc2-c2cc(C(=O)N(c4ccccc4)c4ccc(O)cc4)c4n2CCCC4)OCO3)CC1. The target protein (P10417) has sequence MAQAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDADAAPLGAAPTPGIFSFQPESNPMPAVHRDMAARTSPLRPLVATAGPALSPVPPVVHLTLRRAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRDGVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK. The pIC50 is 8.0.
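
The pIC50 definition given at the top (pIC50 = -log10(IC50 in M)) can be sketched in Python as follows. This is a minimal illustration, not part of the dataset's own tooling. The function names `pic50_from_molar` and `pic50_from_nanomolar` are hypothetical, and the nanomolar helper assumes the raw IC50 is reported in nM, which the text above does not state.

```python
import math


def pic50_from_molar(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations.
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convert an IC50 in nM to pIC50 (assumed unit; 1 nM = 1e-9 M)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    # -log10(x * 1e-9) = 9 - log10(x)
    return 9.0 - math.log10(ic50_nm)


if __name__ == "__main__":
    # A lower IC50 (more potent compound) gives a higher pIC50.
    for ic50_m in (1e-4, 1e-5, 1e-8):
        print(f"IC50 = {ic50_m:.0e} M, pIC50 = {pic50_from_molar(ic50_m):.1f}")
```

Under this definition, a pIC50 of 4.0 corresponds to an IC50 of 1e-4 M (100 µM) and a pIC50 of 8.0 corresponds to 1e-8 M (10 nM), so each unit of pIC50 is a tenfold change in potency.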