Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The compound is C[C@@H]1CN(Cc2ccc(F)cc2)CCN1C(=O)COc1ccc(Cl)cc1Nc1c(NCCN2CCOCC2)c(=O)c1=O. The target protein sequence is MEISNITETYPTTTEYDYGDSTPCQKTDVRAFGAGLLPPLYSFVFIIGVVGNILVILVLMQHRRLQSMTSIYLFNLAVSDLVFLFTLPFWIDYKLKDNWVFGDAMCKLLSGFYYLGLYSEIFFIILLTIDRYLAIVHAVFSLRARTVTFGIITSIIIWALAILASIPALCFFKAQWEFTHHTCSPHFPDESLKTWKRFQALKLNLLGLILPLLVMIICYAGIIRILLRRPNEKKAKAVRLIFAITLLFFLLWTPYNLTVFVSAFQDVLFTNQCEQSKQLDLAIQVTEVIAYTHCCVNPIIYVFVGERFRKYLRQLFQRHVAIPLAKWLPFFSVDQLERTSSLTPSTGEHELSGGF. The pIC50 is 7.1. (2) The drug is N=C1N[C@H]2[C@H](COC(=O)NC3CCCCC3)NC(=N)N3CCC(O)(O)[C@]23N1. The target protein sequence is MASSSLPNLVPPGPHCLRPFTPESLAAIEQRAVEEEARLQRNKQMEIEEPERKPRSDLEAGKNLPLIYGDPPPEVIGIPLEDLDPYYSDKKTFIVLNKGKAIFRFSATPALYLLSPFSIVRRVAIKVLIHALFSMFIMITILTNCVFMTMSNPPSWSKHVEYTFTGIYTFESLIKMLARGFCIDDFTFLRDPWNWLDFSVITMAYVTEFVDLGNISALRTFRVLRALKTITVIPGLKTIVGALIQSVKKLSDVMILTVFCLSVFALVGLQLFMGNLRQKCVRWPPPMNDTNTTWYGNDTWYSNDTWYGNDTWYINDTWNSQESWAGNSTFDWEAYINDEGNFYFLEGSNDALLCGNSSDAGHCPEGYECIKAGRNPNYGYTSYDTFSWAFLALFRLMTQDYWENLFQLTLRAAGKTYMIFFVVIIFLGSFYLINLILAVVAMAYAEQNEATLAEDQEKEEEFQQMLEKYKKHQEELEKAKAAQALESGEEADGDPTHNKD.... The pIC50 is 6.2. (3) The compound is CCc1cc(CO[C@H]2CNC[C@@H]2NC(=O)c2cc(-c3ccncc3)cnc2N)ccc1C. 
The target protein (Q12866) has sequence MGPAPLPLLLGLFLPALWRRAITEAREEAKPYPLFPGPFPGSLQTDHTPLLSLPHASGYQPALMFSPTQPGRPHTGNVAIPQVTSVESKPLPPLAFKHTVGHIILSEHKGVKFNCSISVPNIYQDTTISWWKDGKELLGAHHAITQFYPDDEVTAIIASFSITSVQRSDNGSYICKMKINNEEIVSDPIYIEVQGLPHFTKQPESMNVTRNTAFNLTCQAVGPPEPVNIFWVQNSSRVNEQPEKSPSVLTVPGLTEMAVFSCEAHNDKGLTVSKGVQINIKAIPSPPTEVSIRNSTAHSILISWVPGFDGYSPFRNCSIQVKEADPLSNGSVMIFNTSALPHLYQIKQLQALANYSIGVSCMNEIGWSAVSPWILASTTEGAPSVAPLNVTVFLNESSDNVDIRWMKPPTKQQDGELVGYRISHVWQSAGISKELLEEVGQNGSRARISVQVHNATCTVRIAAVTRGGVGPFSDPVKIFIPAHGWVDYAPSSTPAPGNAD.... The pIC50 is 8.0. (4) The compound is CC[C@H](Cn1nncc1CCCCn1ccc(=O)[nH]c1=O)c1cccc(OCC2CC2)c1. The target protein (P33316) has sequence MTPLCPRPALCYHFLTSLLRSAMQNARGARQRAEAAVLSGPGPPLGRAAQHGIPRPLSSAGRLSQGCRGASTVGAAGWKGELPKAGGSPAPGPETPAISPSKRARPAEVGGMQLRFARLSEHATAPTRGSARAAGYDLYSAYDYTIPPMEKAVVKTDIQIALPSGCYGRVAPRSGLAAKHFIDVGAGVIDEDYRGNVGVVLFNFGKEKFEVKKGDRIAQLICERIFYPEIEEVQALDDTERGSGGFGSTGKN. The pIC50 is 6.1. (5) The drug is Cc1noc(C)c1-c1ccc2c(c1)C(NC[C@@H](C)O)(C1CCCCC1)C(=O)N2. The target protein sequence is MLQNVTPHNKLPGEGNAGLLGLGPEAAAPGKRIRKPSLLYEGFESPTMASVPALQLTPANPPPPEVSNPKKPGRVTNQLQYLHKVVMKALWKHQFAWPFRQPVDAVKLGLPDYHKIIKQPMDMGTIKRRLENNYYWAASECMQDFNTMFTNCYIYNKPTDDIVLMAQTLEKIFLQKVASMPQEEQELVVTIPKNSHKKGAKLAALQGSVTSAHQVPAVSSVSHTALYTPPPEIPTTVLNIPHPSVISSPLLKSLHSAGPPLLAVTAAPPAQPLAKKKGVKRKADTTTPTPTAILAPGSPASPPGSLEPKAARLPPMRRESGRPIKPPRKDLPDSQQQHQSSKKGKLSEQLKHCNGILKELLSKKHAAYAWPFYKPVDASALGLHDYHDIIKHPMDLSTVKRKMENRDYRDAQEFAADVRLMFSNCYKYNPPDHDVVAMARKLQDVFEFRYAKMPDEPLEPGPLPVSTAMPPGL. The pIC50 is 7.5. (6) The small molecule is Nc1ncc(-c2cc(N3C(=O)O[C@@H](c4ccc(OCCCO)cc4)[C@@H]3CO)nc(N3CCOCC3)n2)c(C(F)(F)F)n1. 
The target protein (A0A0G2K344) has sequence MPPRPSSGELWGIHLMPPRILVECLLPNGMIVTLECLREATLVTIKHELFKEARKYPLHQLLQDESSYIFVSVTQEAEREEFFDETRRLCDLRLFQPFLKVIEPVGNREEKILNREIGFVIGMPVCEFDMVKDPEVQDFRRNILNVCKEAVDLRDLNSPHSRAMYVYPPNVESSPELPKHIYNKLDKGQIIVVIWVIVSPNNDKQKYTLKINHDCVPEQVIAEAIRKKTRSMLLSSEQLKLCVLEYQGKYILKVCGCDEYFLEKYPLSQYKYIRSCIMLGRMPNLMLMAKESLYSQLPIDSFTMPSYSRRISTATPYMNGETATKSLWVINSALRIKILCATYVNVNIRDIDKIYVRTGIYHGGEPLCDNVNTQRVPCSNPRWNEWLNYDIYIPDLPRAARLCLSICSVKGRKGAKEEHCPLAWGNINLFDYTDTLVSGKMALNLWPVPHGLEDLLNPIGVTGSNPNKETPCLELEFDWFSSVVKFPDMSVIEEHANWSV.... The pIC50 is 7.3.
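Note on the pIC50 scale: the definition in the task statement (pIC50 = -log10(IC50 in M)) can be sketched as a small unit conversion. This is a minimal illustration, assuming IC50 values are supplied in nanomolar; the function names `ic50_nm_to_pic50` and `pic50_to_ic50_nm` are hypothetical and not part of the dataset.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in M).

    Assumption: the input is in nM; 1 nM = 1e-9 M.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_nm * 1e-9)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition: IC50 in M = 10 ** (-pIC50), returned in nM."""
    return 10.0 ** (-pic50) * 1e9


# A pIC50 of 8.0 corresponds to an IC50 of about 10 nM, and a pIC50 of 6.1
# to roughly 800 nM, so each pIC50 unit is a tenfold change in potency.
for label in (8.0, 7.1, 6.1):
    print(f"pIC50 {label} -> IC50 of about {pic50_to_ic50_nm(label):.1f} nM")
```

Because the scale is logarithmic, a prediction error of 1.0 in pIC50 corresponds to a tenfold error in the underlying IC50.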