Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is CC(O)c1cccc(Nc2ncc3cc(-c4cc(NC(=O)c5cccc(C(F)(F)F)c5)ccc4Cl)c(=O)n(C)c3n2)c1. The target protein sequence is MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQES.... The pIC50 is 7.6. (2) The small molecule is Oc1ccc2c(c1F)CCCC(c1cccc(F)c1Cl)=C2c1ccc(O[C@H]2CCN(CCCF)C2)cc1. The target protein sequence is MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVYNYPEGAAYEFNAAAAANAQVYGQTGLPYGPGSEAAAFGSNGLGGFPPLNSVSPSPLMLLHPPPQLSPFLQPHGQQVPYYLENEPSGYTVREAGPPAFYRPNSDNRRQGGRERLASTNDKGSMAMESAKETRYCAVCNDYASGYHYGVWSCEGCKAFFKRSIQGHNDYMCPATNQCTIDKNRRKSCQACRLRKCYEVGMMKGGIRKDRRGGRMLKHKRQRDDGEGRGEVGSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQ.... The pIC50 is 7.6. (3) The drug is COC1=CC(=O)C(=O)C(C[C@@]2(C)CC[C@]3(C)C(C)=CCC[C@H]3[C@H]2C)C1=O. The target protein (P00630) has sequence MQVVLGSLFLLLLSTSHGWQIRDRIGDNELEERIIYPGTLWCGHGNKSSGPNELGRFKHTDACCRTHDMCPDVMSAGESKHGLTNTASHTRLSCDCDDKFYDCLKNSADTISSYFVGKMYFNLIDTKCYKLEHPVTGCGERTEGRCLHYTVDKSKPKVYQWFDLRKY. The pIC50 is 7.0.
(4) The compound is O=C1c2cc(CO)cc(O)c2C(=O)c2c1ccc(C1(C3O[C@H](CO)[C@@H](O)[C@H](O)[C@H]3O)c3cccc(O)c3C(=O)c3c(O)cc(CO)cc31)c2O. The target protein (P13601) has sequence MSSPAQPAVPAPLANLKIQHTKIFINNEWHNSLNGKKFPVINPATEEVICHVEEGDKADVDKAVKAARQAFQIGSPWRTMDASERGCLLNKLADLMERDRVLLATMESMNAGKIFTHAYLLDTEVSIKALKYFAGWADKIHGQTIPSDGDVFTYTRREPIGVCGQIIPWNGPLILFIWKIGAALSCGNTVIVKPAEQTPLTALYMASLIKEAGFPPGVVNVVPGYGSTAGAAISSHMDIDKVSFTGSTEVGKLIKEAAGKSNLKRVTLELGGKSPCIVFADADLDSAVEFAHQGVFFHQGQICVAASRLFVEESIYDEFVRRSVERAKKYVLGNPLDSGISQGPQIDKEQHAKILDLIESGKKEGAKLECGGGRWGNKGFFVQPTVFSNVTDEMRIAKEEIFGPVQQIMKFKSIDEVIKRANNTPYGLAAGVFTKDLDRAITVSSALQAGTVWVNCYLTLSVQCPFGGFKMSGNGREMGEQGVYEYTELKTVAMKISQKN.... The pIC50 is 6.8. (5) The compound is C[C@H]1NC(=O)[C@@H]1NC(=O)OCCCCC1CCCCC1. The target protein (Q02083) has sequence MRTADREARPGLPSLLLLLLAGAGLSAASPPAAPRFNVSLDSVPELRWLPVLRHYDLDLVRAAMAQVIGDRVPKWVHVLIGKVVLELERFLPQPFTGEIRGMCDFMNLSLADCLLVNLAYESSVFCTSIVAQDSRGHIYHGRNLDYPFGNVLRKLTVDVQFLKNGQIAFTGTTFIGYVGLWTGQSPHKFTVSGDERDKGWWWENAIAALFRRHIPVSWLIRATLSESENFEAAVGKLAKTPLIADVYYIVGGTSPREGVVITRNRDGPADIWPLDPLNGAWFRVETNYDHWKPAPKEDDRRTSAIKALNATGQANLSLEALFQILSVVPVYNNFTIYTTVMSAGSPDKYMTRIRNPSRK. The pIC50 is 4.5.
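
The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be sketched as a pair of conversion helpers, which may help when reading the labels above as concentrations. This is a minimal sketch, not part of the dataset: the function names ic50_to_pic50 and pic50_to_ic50_nm are hypothetical, and the nanomolar output unit is an assumption chosen for readability.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (-pic50) * 1e9


if __name__ == "__main__":
    # pIC50 labels taken from the five examples above.
    for pic50 in (7.6, 7.6, 7.0, 6.8, 4.5):
        print(f"pIC50 {pic50} corresponds to IC50 of about {pic50_to_ic50_nm(pic50):.3g} nM")
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, so a regression error of 0.3 units is roughly a factor of two in concentration.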