Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them, expressed as pIC50 (pIC50 = -log10(IC50 in M); higher values mean more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is CC(C)C[C@H](CO)NC(=O)C1O[C@H]2CN(Cc3ccccc3)C(=O)[C@@H]1O2. The target protein (P0CS83) has sequence MFLKNIFIALAIALLVDATPTTTKRSAGFVALDFSVVKTPKAFPVTNGQEGKTSKRQAVPVTLHNEQVTYAADITVGSNNQKLNVIVDTGSSDLWVPDVNVDCQVTYSDQTADFCKQKGTYDPSGSSASQDLNTPFKIGYGDGSSSQGTLYKDTVGFGGVSIKNQVLADVDSTSIDQGILGVGYKTNEAGGSYDNVPVTLKKQGVIAKNAYSLYLNSPDAATGQIIFGGVDNAKYSGSLIALPVTSDRELRISLGSVEVSGKTINTDNVDVLLDSGTTITYLQQDLADQIIKAFNGKLTQDSNGNSFYEVDCNLSGDVVFNFSKNAKISVPASEFAASLQGDDGQPYDKCQLLFDVNDANILGDNFLRSAYIVYDLDDNEISLAQVKYTSASSISALT. The pIC50 is 5.0. (2) The drug is O=C(NCC1(C(F)(F)F)OC(=O)Nc2ccc(Cl)cc21)Nc1ccc(F)cc1. The target protein (Q920L5) has sequence MNMSVLTLQEYEFEKQFNENEAIQWMQENWKKSFLFSALYAAFIFGGRHLMNKRAKFELRKPLVLWSLTLAVFSIFGALRTGAYMLYILMTKGLKQSVCDQSFYNGPVSKFWAYAFVLSKAPELGDTIFIILRKQKLIFLHWYHHITVLLYSWYSYKDMVAGGGWFMTMNYGVHAVMYSYYALRAAGFRVSRKFAMFITLSQITQMLMGCVINYLVFNWMQHDNDQCYSHFQNIFWSSLMYLSYLVLFCHFFFEAYIGKVKKATKAE. The pIC50 is 6.4. (3) The compound is COc1cc(OCc2cccc(-c3ccc(C(F)(F)F)cn3)c2)c2cc(-c3cn4nc(OC)sc4n3)oc2c1. The target protein (Q96RI0) has sequence MWGRLLLWPLVLGFSLSGGTQTPSVYDESGSTGGGDDSTPSILPAPRGYPGQVCANDSDTLELPDSSRALLLGWVPTRLVPALYGLVLVVGLPANGLALWVLATQAPRLPSTMLLMNLAAADLLLALALPPRIAYHLRGQRWPFGEAACRLATAALYGHMYGSVLLLAAVSLDRYLALVHPLRARALRGRRLALGLCMAAWLMAAALALPLTLQRQTFRLARSDRVLCHDALPLDAQASHWQPAFTCLALLGCFLPLLAMLLCYGATLHTLAASGRRYGHALRLTAVVLASAVAFFVPSNLLLLLHYSDPSPSAWGNLYGAYVPSLALSTLNSCVDPFIYYYVSAEFRDKVRAGLFQRSPGDTVASKASAEGGSRGMGTHSSLLQ. The pIC50 is 9.2.
(4) The small molecule is C[C@H]1[C@H](O)CN1c1nc(N2C[C@H]3[C@H](CC(=O)O)[C@H]3C2)cc(C(F)(F)F)c1C#N. The target protein (P50053) has sequence MEEKQILCVGLVVLDVISLVDKYPKEDSEIRCLSQRWQRGGNASNSCTVLSLLGAPCAFMGSMAPGHVADFLVADFRRRGVDVSQVAWQSKGDTPSSCCIINNSNGNRTIVLHDTSLPDVSATDFEKVDLTQFKWIHIEGRNASEQVKMLQRIDAHNTRQPPEQKIRVSVEVEKPREELFQLFGYGDVVFVSKDVAKHLGFQSAEEALRGLYGRVRKGAVLVCAWAEEGADALGPDGKLLHSDAFPPPRVVDTLGAGDTFNASVIFSLSQGRSVQEALRFGCQVAGKKCGLQGFDGIV. The pIC50 is 8.2. (5) The drug is CCCCCCC=CC(=O)C(F)(F)F. The target protein sequence is MIQQRMLQLLLLGQLLAGPGPFCAALATVDQLTVCPPSVGCLKGTNLQGYQSERFEAFMGIPYALPPIGDLRFSNPKVMPKLLGMYDASAPKMDCIQKNYLLPTPVVYGDEDCLYLNVYRPEIRKSALPVMVYIHGGGFFGGSAGPGVTGPEYFMDSGEVILVTMAYRLGPFGFLSTQDAVMSGNFGLKDQNLALRWVQRNIRFFGGDPQRVTIFGQSAGGVAAHMHLLSPRSHGLFHRVISMSGTANVPFAIAEQPLEQARLLAEFADVPDARNLSTVKLTKALRRINATKLLNAGDGLKYWDVDHMTNFRPVVEEGLEVDAFLNAHPMDMLAQGMPTSIPLLLGTVPGEGAVRVVNILGNETLRQSFNLRFDELLQELLEFPASFSQDRREKMMDLLVEVYFQGQHEVNELTVQGFMNLISDRGFKQPLYNTIHKNVCHTPNPVYLYSFNYQGPLSYASAYTSANVTGKYGVVHCDDLLYLFRSPLLFPDFQRNSTEA.... The pIC50 is 4.1. (6) The drug is C[C@]12CC[C@H](O)C[C@]1(CO)CC=C1CCC12. The target protein (O93875) has sequence MDIVLEICDYYLFDKVYADVFPKDGAVHEFLKPAIQSFSQIDFPSLPNLDSFDTNSTLISSNNFNISNVNPATIPSYLFSKIASYQDKSEIYGLAPKFFPATDFINTSFLARSNIFRETLSLFIITTIFGWLLYFIVAYLSYVFVFDKKIFNHPRYLKNQMSLEIKRATTAIPVMVLLTIPFFLLELNGYSFLYLDINECTGGYKAILWQIPKFILFTDCGIYFLHRWLHWPSVYKVLHKPHHKWIVCTPFASHAFHPVDGFFQSLPYHLYPLLFPLHKVLYLFLFTFVNFWTVMIHDGSYWSNDPVVNGTACHTVHHLYFNYNYGQFTTLWDRLGNSYRRPDDSLFVKDVKAEEEKKIWKEQTRKMEEIRGEVEGKVDDREYVEQ. The pIC50 is 4.0. (7) The small molecule is COc1ccc(C2C=C(c3ccccc3)NC(C)=C2C(=O)SC(C)(C)C)cc1.
The target protein (Q01668) has sequence MMMMMMMKKMQHQRQQQADHANEANYARGTRLPLSGEGPTSQPNSSKQTVLSWQAAIDAARQAKAAQTMSTSAPPPVGSLSQRKRQQYAKSKKQGNSSNSRPARALFCLSLNNPIRRACISIVEWKPFDIFILLAIFANCVALAIYIPFPEDDSNSTNHNLEKVEYAFLIIFTVETFLKIIAYGLLLHPNAYVRNGWNLLDFVIVIVGLFSVILEQLTKETEGGNHSSGKSGGFDVKALRAFRVLRPLRLVSGVPSLQVVLNSIIKAMVPLLHIALLVLFVIIIYAIIGLELFIGKMHKTCFFADSDIVAEEDPAPCAFSGNGRQCTANGTECRSGWVGPNGGITNFDNFAFAMLTVFQCITMEGWTDVLYWMNDAMGFELPWVYFVSLVIFGSFFVLNLVLGVLSGEFSKEREKAKARGDFQKLREKQQLEEDLKGYLDWITQAEDIDPENEEEGGEEGKRNTSMPTSETESVNTENVSGEGENRGCCGSLCQAISKSK.... The pIC50 is 4.6.
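To make the unit convention in the task statement concrete (pIC50 = -log10(IC50 in M)), here is a minimal sketch of the conversion in both directions. The helper names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical and not part of the bindingdb_ic50 dataset or any library. For example, an IC50 of 10 µM (1e-5 M) corresponds to a pIC50 of 5.0, and the pIC50 of 9.2 in example (3) corresponds to an IC50 of roughly 0.63 nM.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Hypothetical helper: convert an IC50 given in molar units to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in M")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Hypothetical helper: invert pIC50 back to an IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10 ** (-pic50) * 1e9


# A 10 uM inhibitor has pIC50 5.0; each extra pIC50 unit is a 10-fold lower IC50.
print(ic50_to_pic50(10e-6))
print(pic50_to_ic50_nm(9.2))
```

Because the scale is logarithmic, a one-unit difference in predicted pIC50 corresponds to a 10-fold difference in IC50, so regression errors on this scale are relative (fold-change) errors in concentration.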