This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is O=C1c2c(O)cc(O)cc2O[C@H](c2cc(O)c(O)c(O)c2)[C@H]1O. The target protein sequence is MLPGLALLLLAAWTARALEVPTDGNAGLLAEPQIAMFCGRLN. The pIC50 is 4.6. (2) The compound is CC(C)CN([C@H](CO)CCCCNC(=O)[C@H](Cc1ccccc1Br)NC(=O)C1=NCC(C)N=C1)S(=O)(=O)c1ccc(N)cc1. The pIC50 is 9.0. The target protein sequence is PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGKKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF. (3) The compound is O=C(O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](CS)C1CCc2ccccc21. The target protein (P0DPD8) has sequence MASPGAGRAPPELPERNCGYREVEYWDQRYQGAADSAPYDWFGDFSSFRALLEPELRPEDRILVLGCGNSALSYELFLGGFPNVTSVDYSSVVVAAMQARHAHVPQLRWETMDVRKLDFPSASFDVVLEKGTLDALLAGERDPWTVSSEGVHTVDQVLSEVGFQKGTRQLLGSRTQLELVLAGASLLLAALLLGCLVALGVQYHRDPSHSTCLTEACIRVAGKILESLDRGVSPCEDFYQFSCGGWIRRNPLPDGRSRWNTFNSLWDQNQAILKHLLENTTFNSSSEAEQKTQRFYLSCLQVERIEELGAQPLRDLIEKIGGWNITGPWDQDNFMEVLKAVAGTYRATPFFTVYISADSKSSNSNVIQVDQSGLFLPSRDYYLNRTANEKVLTAYLDYMEELGMLLGGRPTSTREQMQQVLELEIQLANITVPQDQRRDEEKIYHKMSISELQALAPSMDWLEFLSFLLSPLELSDSEPVVVYGMDYLQQVSELINRTEP.... The pIC50 is 7.8. (4) The compound is [N-]=[N+]=Nc1ccc(OC[C@H](O)/C=C/[C@H]2[C@H](O)C[C@H](O)[C@@H]2C/C=C\CCCC(=O)O)c(O)c1. The target protein (P37289) has sequence MSTNSSIQPVSPESELLSNTTCQLEEDLSISFSIIFMTVGILSNSLAIAILMKAYQRFRQKYKSSFLLLASALVITDFFGHLINGTIAVFVYASDKDWIYFDKSNILCSIFGICMVFSGLCPLFLGSLMAIERCIGVTKPIFHSTKITTKHVKMMLSGVCFFAVFVALLPILGHRDYKIQASRTWCFYKTDEIKDWEDRFYLLLFAFLGLLALGISFVCNAITGISLLKVKFRSQQHRQGRSHHFEMVIQLLGIMCVSCICWSPFLVTMASIGMNIQDFKDSCERTLFTLRMATWNQILDPWVYILLRKAVLRNLYVCTRRCCGVHVISLHVWELSSIKDSLKVAAISDLPVTEKVTQQTST. The pIC50 is 3.9. 
(5) The compound is [C-]#[N+]C1(c2cc(Nc3nc(NC4CC(F)(F)C4)nc(-c4cccc(C(F)(F)F)n4)n3)ccn2)CC1. The target protein sequence is MSKKISGGSVVEMQGDEMTRIIWELIKEKLIFPYVELDLHSYDLGIENRDATNDQVTKDAAEAIKKHNVGVKCATITPDEKRVEEFKLKQMWKSPNGTIRNILGGTVFREAIICKNIPRLVSGWVKPIIIGCHAYGDQYRATDFVVPGPGKVEITYTPSDGTQKVTYLVHNFEEGGGVAMGMYNQDKSIEDFAHSSFQMALSKGWPLYLSTKNTILKKYDGRFKDIFQEIYDKQYKSQFEAQKIWYEHRLIDDMVAQAMKSEGGFIWACKNYDGDVQSDSVAQGYGSLGMMTSVLVCPDGKTVEAEAAHGTVTRHYRMYQKGQETSTNPIASIFAWTRGLAHRAKLDNNKELAFFANALEEVSIETIEAGFMTKDLAACIKGLPNVQRSDYLNTFEFMDKLGENLKIKLAQAKL. The pIC50 is 7.1. (6) The compound is CCCC[C@H](NC(=O)[C@H](Cc1ccc(S(=O)(=O)O)cc1)NC(=O)OC(C)(C)C)C(=O)NCC(=O)N[C@H](Cc1c[nH]c2ccccc12)C(=O)N[C@@H](CCCC)C(=O)NC(CCCOc1ccccc1)CC(=O)O. The target protein (P04746) has sequence MKFFLLLFTIGFCWAQYSPNTQQGRTSIVHLFEWRWVDIALECERYLAPKGFGGVQVSPPNENVAIYNPFRPWWERYQPVSYKLCTRSGNEDEFRNMVTRCNNVGVRIYVDAVINHMCGNAVSAGTSSTCGSYFNPGSRDFPAVPYSGWDFNDGKCKTGSGDIENYNDATQVRDCRLTGLLDLALEKDYVRSKIAEYMNHLIDIGVAGFRLDASKHMWPGDIKAILDKLHNLNSNWFPAGSKPFIYQEVIDLGGEPIKSSDYFGNGRVTEFKYGAKLGTVIRKWNGEKMSYLKNWGEGWGFVPSDRALVFVDNHDNQRGHGAGGASILTFWDARLYKMAVGFMLAHPYGFTRVMSSYRWPRQFQNGNDVNDWVGPPNNNGVIKEVTINPDTTCGNDWVCEHRWRQIRNMVIFRNVVDGQPFTNWYDNGSNQVAFGRGNRGFIVFNNDDWSFSLTLQTGLPAGTYCDVISGDKINGNCTGIKIYVSDDGKAHFSISNSAED.... The pIC50 is 9.0. (7) The compound is C[C@@H]1CC=CCCCCCC(=O)Cc2c(Cl)c(O)cc(O)c2C(=O)O1. The target protein (P02829) has sequence MASETFEFQAEITQLMSLIINTVYSNKEIFLRELISNASDALDKIRYKSLSDPKQLETEPDLFIRITPKPEQKVLEIRDSGIGMTKAELINNLGTIAKSGTKAFMEALSAGADVSMIGQFGVGFYSLFLVADRVQVISKSNDDEQYIWESNAGGSFTVTLDEVNERIGRGTILRLFLKDDQLEYLEEKRIKEVIKRHSEFVAYPIQLVVTKEVEKEVPIPEEEKKDEEKKDEEKKDEDDKKPKLEEVDEEEEKKPKTKKVKEEVQEIEELNKTKPLWTRNPSDITQEEYNAFYKSISNDWEDPLYVKHFSVEGQLEFRAILFIPKRAPFDLFESKKKKNNIKLYVRRVFITDEAEDLIPEWLSFVKGVVDSEDLPLNLSREMLQQNKIMKVIRKNIVKKLIEAFNEIAEDSEQFEKFYSAFSKNIKLGVHEDTQNRAALAKLLRYNSTKSVDELTSLTDYVTRMPEHQKNIYYITGESLKAVEKSPFLDALKAKNFEVLF.... The pIC50 is 6.7. 
(8) The target protein (P30872) has sequence MFPNGTASSPSSSPSPSPGSCGEGGGSRGPGAGAADGMEEPGRNASQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRHWPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIKAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAANSDGTVACNMLMPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALKAGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQDDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLSWMDNAAEEPVDYYATALKSRAYSVEDFQPENLESGGVFRNGTCTSRITTL. The compound is CC(=O)N[C@@H]1SS[C@H](C(=O)O)NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](CCCCN)NC(=O)[C@@H](Cc2c[nH]c3ccccc23)NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@H](Cc2ccccc2)NC1=O. The pIC50 is 6.5. (9) The small molecule is CC(=N)N1CCC(Oc2ccc3c(c2)OCC(=O)N3Cc2cc3ccc(C(=N)N)cc3n2C)CC1. The target protein (P00763) has sequence MRALLFLALVGAAVAFPVDDDDKIVGGYTCQENSVPYQVSLNSGYHFCGGSLINDQWVVSAAHCYKSRIQVRLGEHNINVLEGNEQFVNAAKIIKHPNFDRKTLNNDIMLIKLSSPVKLNARVATVALPSSCAPAGTQCLISGWGNTLSSGVNEPDLLQCLDAPLLPQADCEASYPGKITDNMVCVGFLEGGKDSCQGDSGGPVVCNGELQGIVSWGYGCALPDNPGVYTKVCNYVDWIQDTIAAN. The pIC50 is 7.1. (10) The small molecule is Cc1c(CCC(=O)O)c(=O)oc2cc(OCc3cccc(-c4ccccc4)c3)ccc12. The target protein sequence is MSDVMADRTPPHNIEAEQAVLGAILIDQDALTSASELLVPDSFYRTKHQKIFEVMLGLSDKGEPIDLVMMTSAMADQGLLEEVGGVSYLAELAEVVPTAANVEYYARIIAEKALLRRLIRTATHIVSDGYEREDDVDGLLNEAEKKILEVSHQTNAKAFQNIKDVLVDAYDKIELLHNQKGEVTGIPTGFTELDKMTAGFQRNDLIIVAARPSVGKTAFSLNIAQNVATKTDENVAIFSLEMGADQLVMRMLCAEGNIDAQRLRTGSLTSDDWAKLTMAMGSLSNAGIYIDDTPGIKVNEIRAKCRRLKQEQGLGMILIDYLQLIQGSGKSGENRQQEVSEISRTLKGIARELQVPVIALSQLSRGVESRQDKRPMMSDIRESGSIEQDADIVAFLYREDYYDRETENKNTIEIIIAKQRNGPVGSVELAFVKEFNKFVNLERRFEDGHAPPA. The pIC50 is 4.8.
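The pIC50 definition given at the top (pIC50 = -log10(IC50 in M)) can be illustrated with a minimal sketch. The function names below are hypothetical and are not part of BindingDB or of the bindingdb_ic50 dataset; the only assumption is that the IC50 value is expressed in mol/L.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 (hypothetical helper)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 in mol/L (hypothetical helper)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # A 1 nM inhibitor (1e-9 M) has a pIC50 of 9.
    print(ic50_to_pic50(1e-9))
    # A pIC50 of 4.6 corresponds to an IC50 of roughly 25 micromolar.
    print(pic50_to_ic50(4.6))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, so the entries with pIC50 9.0 are about 10,000 times more potent than an entry near 5.0.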