Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is COc1cc(C(=O)OCCCN2CCN(CCCOC(=O)c3cc(OC)c(OC)c(OC)c3)CC2)cc(OC)c1OC. The target protein (O54699) has sequence MAHGNAPRDSYHLVGISFFILGLGTLLPWNFFITAIPYFQGRLAGTNSSAETPSTNHTSPTDTFNFNNWVTLLSQLPLLLFTLLNSFLYQCIPESVRILGSLLAILLLFALTAALVKVDLSPGLFFSITMASVWFINSFCAVLQGSLFGQLGTMPSTYSTLFLSGQGLAGIFAALAMLTSLASGVDPQTSALGYFITPCVGILLSIICYLSLPHLKFARYYLTKKPQAPVQELETKAELLGADEKNGIPVSPQQAGPTLDLDPEKELELGLEEPQKPGKPSVFVVFRKIWLTALCLVLVFTVTLSVFPAITAMVTTSSNSPGKWSQFFNPICCFLLFNVMDWLGRSLTSYFLWPDEDSQLLPLLVCLRFLFVPLFMLCHVPQRARLPIIFWQDAYFITFMLLFAISNGYFVSLTMCLAPRQVLPHEREVAGALMTFFLALGLSCGASLSFLFKALL. The pIC50 is 5.9. (2) The drug is NCCOc1cccc(OCC2CCCCC2)c1. The target protein (Q28175) has sequence MSSQVEHPAGGYKKLFETVEELSSPLTAHVTGRIPLWLTGSLLRCGPGLFEVGSEPFYHLFDGQALLHKFDFKEGHVTYHRRFIRTDAYVRAMTEKRIVITEFGTCAFPDPCKNIFSRFFSYFRGVEVTDNALVNIYPVGEDYYACTETNFITKVNPETLETIKQVDLCNYVSVNGATAHPHIENDGTVYNIGNCFGKNFSIAYNIVKIPPLQADKEDPISKSEIVVQFPCSDRFKPSYVHSFGLTPNYIVFVETPVKINLFKFLSSWSLWGANYMDCFESNETMGVWLHIADKKRKKYINNKYRTSPFNLFHHINTYEDHEFLIVDLCCWKGFEFVYNYSYLANLRENWEEVKKNARKAPQPEVRRYVLPLNIDKADTGKNLVTLPNTTATAILCSDETIWLEPEVLFSGPRQAFEFPQINYQKYGGKPYTYAYGLGLNHFVPDRLCKLNVKTKETWVWQEPDSYPSEPIFVSHPDALEEDDGVVLSVVVSPGAGQKPA.... The pIC50 is 6.0. (3) The pIC50 is 4.4. 
The target protein (P43378) has sequence MEPATAPRPDMAPELTPEEEQATKQFLEEINKWTVQYNVSPLSWNVAVKFLMARKFDVLRAIELFHSYRETRRKEGIVKLKPHEEPLRSEILSGKFTILNVRDPTGASIALFTARLHHPHKSVQHVVLQALFYLLDRAVDSFETQRNGLVFIYDMCGSNYANFELDLGKKVLNLLKGAFPARLKKVLIVGAPIWFRVPYSIISLLLKDKVRERIQILKTSEVTQHLPRECLPENLGGYVKIDLATWNFQFLPQVNGHPDPFDEIILFSLPPALDWDSVHVPGPHAMTIQELVDYVNARQKQGIYEEYEDIRRENPVGTFHCSMSPGNLEKNRYGDVPCLDQTRVKLTKRSGHTQTDYINASFMDGYKQKNAYIGTQGPLENTYRDFWLMVWEQKVLVIVMTTRFEEGGRRKCGQYWPLEKDSRIRFGFLTVTNLGVENMNHYKKTTLEIHNTEERQKRQVTHFQFLSWPDYGVPSSAASLIDFLRVVRNQQSLAVSNMGA.... The compound is Cc1cccc(CSc2nnc(NC(=O)CSc3nc4ccc(N5C(=O)c6ccccc6C5=O)cc4s3)s2)c1. (4) The small molecule is CCN(c1cc(C#CC2CCN(C)CC2)cc(C(=O)NCc2c(C)cc(C)[nH]c2=O)c1C)C1CCOCC1. The target protein sequence is ATKAARKSAPATGGVKKPHRYRPGGK. The pIC50 is 8.1. (5) The small molecule is COCc1cccc(COC)c1CN. The target protein (Q9TRC7) has sequence MGRGTLALGWAGAALLLLQMLAAAERSPRTPGGKAGVFADLSAQELKAVHSFLWSQKELKLEPSGTLTMAKNSVFLIEMLLPKKQHVLKFLDKGHRRPVREARAVIFFGAQEQPNVTEFAVGPLPTPRYMRDLPPRPGHQVSWASRPISKAEYALLSHKLQEATQPLRQFFRRTTGSSFGDCHEQCLTFTDVAPRGLASGQRRTWFILQRQMPGYFLHPTGLELLVDHGSTNAQDWTVEQVWYNGKFYRSPEELAQKYNDGEVDVVILEDPLAKGKDGESLPEPALFSFYQPRGDFAVTMHGPHVVQPQGPRYSLEGNRVMYGGWSFAFRLRSSSGLQILDVHFGGERIAYEVSVQEAVALYGGHTPAGMQTKYIDVGWGLGSVTHELAPDIDCPETATFLDALHHYDADGPVLYPRALCLFEMPTGVPLRRHFNSNFSGGFNFYAGLKGQVLVLRTTSTVYNYDYIWDFIFYPNGVMEAKMHATGYVHATFYTPEGLRY.... The pIC50 is 3.0. (6) The compound is N#Cc1ccc2cc3c(=O)[nH]c(=O)nc-3n(-c3ccccc3)c2c1. The target protein sequence is MASGSSSDAAEPAGPAGRAASAPEAAQAEEDRVKRRRLQCLGFALVGGCDPTMVPSVLRENDWQTQKALSAYFELPENDQGWPRQPPTSFKSEAYVDLTNEDANDTTILEASPSGTPLEDSSTISFITWNIDGLDGCNLPERARGVCSCLALYSPDVVFLQEVIPPYCAYLKKRAASYTIITGNEEGYFTAILLKKGRVKFKSQEIIPFPNTKMMRNLLCVNVSLGGNEFCLMTSHLESTRGHAAERIRQLKTVLGKMQEAPDSTTVIFAGDTNLRDREVTRCGGLPDNVFDAWEFLGKPKHCQYTWDTKANNNLGITAACKLRFDRIFFRAEEGHLIPQSLDLVGLEKLDCGRFPSDHWGLLCTLNVVL. The pIC50 is 6.6. (7) The drug is CN[C@@H](C)C(=O)N[C@H](C(=O)N1c2ncccc2C[C@H]1CNC(=O)N(C)C)C(C)C. 
The target protein sequence is MRHHHHHHRDHFALDRPSETHADYLLRTGQVVDISDTIYPRNPAMYSEEARLKSFQNWPDYAHLTPRELASAGLYYTGIGDQVQCFACGGKLKNWEPGDRAWSEHRRHEPNCFFVLGRNLNIRSE. The pIC50 is 8.2.
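The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be illustrated with a minimal Python sketch. The function names below are hypothetical helpers introduced for illustration only; they are not part of BindingDB or of any dataset tooling. The nanomolar helper assumes the raw IC50 is reported in nM, which this record does not state.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def ic50_nm_to_pic50(ic50_nanomolar: float) -> float:
    """Same conversion for an IC50 in nM (assumed unit): 1 nM = 1e-9 M."""
    if ic50_nanomolar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nanomolar)


if __name__ == "__main__":
    # pIC50 labels that appear in the examples above.
    for label in (3.0, 5.9, 8.2):
        ic50_m = pic50_to_ic50(label)
        print(f"pIC50 {label} corresponds to IC50 = {ic50_m:.3g} M")
```

Because the scale is logarithmic, each additional pIC50 unit corresponds to a tenfold lower IC50, which is why higher values mean a more potent compound.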