Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The compound is O=C(O)[C@H]1/C(=C/CO)O[C@@H]2CC(=O)N21. The target protein sequence is MSKKNFILIFIFVILISCKNTEKISNETTLIDNIFTNSNAEGTLVIYNLNDDKYIIHNKERAEQRFYPASTFKIYNSLIGLNEKAVKDVDEVFYKLMAKSFLESWAKDSNLRYAIKNSQVPAYKELARRIGIKKMKENIEKLDFGNKSIGDSVDTFWLEGPLEISAMEQVKLLTKLAQNELQYPIEIQKAISDITITRANLHITLHGKTGLADSKNMTTEPIGWFVGWLEENDNIYVFALNIDNINSDDLAKRINIVKESLKALNLLK. The pIC50 is 5.7. (2) The drug is CCCCC1=C(OC)C(OC)=CC(=O)C1=O. The target protein (Q2M385) has sequence MNNFRATILFWAAAAWAKSGKPSGEMDEVGVQKCKNALKLPVLEVLPGGGWDNLRNVDMGRVMELTYSNCRTTEDGQYIIPDEIFTIPQKQSNLEMNSEILESWANYQSSTSYSINTELSLFSKVNGKFSTEFQRMKTLQVKDQAITTRVQVRNLVYTVKINPTLELSSGFRKELLDISDRLENNQTRMATYLAELLVLNYGTHVTTSVDAGAALIQEDHLRASFLQDSQSSRSAVTASAGLAFQNTVNFKFEENYTSQNVLTKSYLSNRTNSRVQSIGGVPFYPGITLQAWQQGITNHLVAIDRSGLPLHFFINPNMLPDLPGPLVKKVSKTVETAVKRYYTFNTYPGCTDLNSPNFNFQANTDDGSCEGKMTNFSFGGVYQECTQLSGNRDVLLCQKLEQKNPLTGDFSCPSGYSPVHLLSQIHEEGYNHLECHRKCTLLVFCKTVCEDVFQVAKAEFRAFWCVASSQVPENSGLLFGGLFSSKSINPMTNAQSCPAG.... The pIC50 is 5.0. (3) The drug is COc1cccc(CN2CCN(c3cc(N4CCCC4)nc(N4CCCC4)n3)CC2)c1. The target protein (P49662) has sequence MAEGNHRKKPLKVLESLGKDFLTGVLDNLVEQNVLNWKEEEKKKYYDAKTEDKVRVMADSMQEKQRMAGQMLLQTFFNIDQISPNKKAHPNMEAGPPESGESTDALKLCPHEEFLRLCKERAEEIYPIKERNNRTRLALIICNTEFDHLPPRNGADFDITGMKELLEGLDYSVDVEENLTARDMESALRAFATRPEHKSSDSTFLVLMSHGILEGICGTVHDEKKPDVLLYDTIFQIFNNRNCLSLKDKPKVIIVQACRGANRGELWVRDSPASLEVASSQSSENLEEDAVYKTHVEKDFIAFCSSTPHNVSWRDSTMGSIFITQLITCFQKYSWCCHLEEVFRKVQQSFETPRAKAQMPTIERLSMTRYFYLFPGN. The pIC50 is 6.1. 
(4) The target protein sequence is APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIASPKGPVIQMYTNVDQDLVGWPAPQGARSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPAGHAVGLFRAAVCTRGVAKAVDFIPVENLETTMRS. The small molecule is C=CC(=O)NC[C@H](NC(=O)[C@@H](NC(=O)c1cnccn1)C1CCCCC1)C(=O)N1C[C@@H]2CCC[C@@H]2[C@H]1C(=O)N[C@@H](CCC)C(=O)C(=O)NC1CC1. The pIC50 is 6.6. (5) The small molecule is O=C(O)c1cc(C(F)(F)F)ccc1N1CC[C@H](OCCOCc2ccccc2)C1. The target protein (P06858) has sequence MESKALLVLTLAVWLQSLTASRGGVAAADQRRDFIDIESKFALRTPEDTAEDTCHLIPGVAESVATCHFNHSSKTFMVIHGWTVTGMYESWVPKLVAALYKREPDSNVIVVDWLSRAQEHYPVSAGYTKLVGQDVARFINWMEEEFNYPLDNVHLLGYSLGAHAAGIAGSLTNKKVNRITGLDPAGPNFEYAEAPSRLSPDDADFVDVLHTFTRGSPGRSIGIQKPVGHVDIYPNGGTFQPGCNIGEAIRVIAERGLGDVDQLVKCSHERSIHLFIDSLLNEENPSKAYRCSSKEAFEKGLCLSCRKNRCNNLGYEINKVRAKRSSKMYLKTRSQMPYKVFHYQVKIHFSGTESETHTNQAFEISLYGTVAESENIPFTLPEVSTNKTYSFLIYTEVDIGELLMLKLKWKSDSYFSWSDWWSSPGFAIQKIRVKAGETQKKVIFCSREKVSHLQKGKAPAVFVKCHDKSLNKKSG. The pIC50 is 4.5. (6) The drug is CCCCCCCCCCCCCC1OC(=O)/C1=C\CCCCC(N)=O. The target protein (Q96IZ2) has sequence MTKTSTCIYHFLVLSWYTFLNYYISQEGKDEVKPKILANGARWKYMTLLNLLLQTIFYGVTCLDDVLKRTKGGKDIKFLTAFRDLLFTTLAFPVSTFVFLAFWILFLYNRDLIYPKVLDTVIPVWLNHAMHTFIFPITLAEVVLRPHSYPSKKTGLTLLAAASIAYISRILWLYFETGTWVYPVFAKLSLLGLAAFFSLSYVFIASIYLLGEKLNHWKWGDMRQPRKKRK. The pIC50 is 5.9. (7) The compound is NC1=Nc2cccc3cccc(c23)N1. The target protein (P13738) has sequence MKHLHRFFSSDASGGIILIIAAILAMIMANSGATSGWYHDFLETPVQLRVGSLEINKNMLLWINDALMAVFFLLVGLEVKRELMQGSLASLRQAAFPVIAAIGGMIVPALLYLAFNYADPITREGWAIPAATDIAFALGVLALLGSRVPLALKIFLMALAIIDDLGAIIIIALFYTNDLSMASLGVAAVAIAVLAVLNLCGARRTGVYILVGVVLWTAVLKSGVHATLAGVIVGFFIPLKEKHGRSPAKRLEHVLHPWVAYLILPLFAFANAGVSLQGVTLDGLTSILPLGIIAGLLIGKPLGISLFCWLALRLKLAHLPEGTTYQQIMVVGILCGIGFTMSIFIASLAFGSVDPELINWAKLGILVGSISSAVIGYSWLRVRLRPSV. The pIC50 is 5.8.
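The pIC50 definition stated in the task description (pIC50 = -log10(IC50 in M)) can be illustrated with a minimal Python sketch. The function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical helpers introduced here for illustration; they are not part of BindingDB or of any dataset tooling.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar.

    IC50 [M] = 10 ** (-pIC50), and 1 M = 1e9 nM, so IC50 [nM] = 10 ** (9 - pIC50).
    """
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # A 1 micromolar IC50 (1e-6 M) corresponds to a pIC50 of 6.
    print(round(ic50_to_pic50(1e-6), 3))
    # The pIC50 of 5.7 reported in example (1), expressed in nM.
    print(round(pic50_to_ic50_nm(5.7), 1))
```

Because the scale is logarithmic, a difference of 1.0 in pIC50 corresponds to a tenfold difference in IC50. Under this definition, the pIC50 of 5.7 in example (1) corresponds to an IC50 of roughly 2 µM.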