Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. From a dataset of Drug-target binding data from BindingDB using IC50 measurements. (1) The small molecule is COc1cc2cc(-c3cccc(F)c3)n(Cc3cccc(C(=O)O)n3)c2cc1F. The target protein (P70597) has sequence MSPYGLNLSLVDEATTCVTPRVPNTSVVLPTGGNGTSPALPIFSMTLGAVSNVLALALLAQVAGRLRRRRSTATFLLFVASLLAIDLAGHVIPGALVLRLYTAGRAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLIHAARVSVARARLALALLAAMALAVALLPLVHVGHYELQYPGTWCFISLGPPGGWRQALLAGLFAGLGLAALLAALVCNTLSGLALLRARWRRRRSRRFRENAGPDDRRRWGSRGLRLASASSASSITSTTAALRSSRGGGSARRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAVRLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELSLTKSAWEASSLRSSRHSGFSHL. The pIC50 is 7.6. (2) The drug is Cn1ccnc1SC[C@]1(C)S[C@@H]2[C@H](Br)C(=O)N2[C@H]1C(=O)O. The target protein sequence is MIKSSWRKIAMLAAAVPLLLASGALWASTDAIHQKLTDLEKRSGGRLGVALINTADNSQILYRGDERFAMCSTSKVMAAAAVLKQSESNKEVVNKRLEINAADLVVWSPITEKHLQSGMTLAELSAATLQYSDNTAMNLIIGYLGGPEKVTAFARSIGDATFRLDRTEPTLNTAIPGDERDTSTPLAMAESLRKLTLGDALGEQQRAQLVTWLKGNTTGGQSIRAGLPESWVVGDKTGAGDYGTTNDIAVIWPEDHAPLILVTYFTQPQQDAKNRKEVLAAAAKIVTEGL. The pIC50 is 5.8. (3) The compound is Cc1ccc(C)n1-c1ccc(C(=O)O)c(O)c1. The target protein sequence is MAGIFYFALFSCLFGICDAVTGSRVYPANEVTLLDSRSVQGELGWIASPLEGGWEEVSIMDEKNTPIRTYQVCNVMEPSQNNWLRTDWITREGAQRVYIEIKFTLRDCNSLPGVMGTCKETFNLYYYESDNDKERFIRENQFVKIDTIAADESFTQVDIGDRIMKLNTEIRDVGPLSKKGFYLAFQDVGACIALVSVRVFYKKCPLTVRNLAQFPDTITGADTSSLVEVRGSCVNNSEEKDVPKMYCGADGEWLVPIGNCLCNAGHEERSGECQACKIGYYKALSTDATCAKCPPHSYSVWEGATSCTCDRGFFRADNDAASMPCTRPPSAPLNLISNVNETSVNLEWSSPQNTGGRQDISYNVVCKKCGAGDPSKCRPCGSGVHYTPQQNGLKTTKVSITDLLAHTNYTFEIWAVNGVSKYNPNPDQSVSVTVTTNQAAPSSIALVQAKEVTRYSVALAWLEPDRPNGVILEYEVKYYEKDQNERSYRIVRTAARNTDI.... The pIC50 is 5.9. (4) The drug is CCOC(=O)n1ccc2c1C(c1ccc(Cl)cc1)N(c1cc(C)c3nnc(C)n3c1)C2=O. 
The target protein sequence is NPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPTEETEIMIVQAKGRGRGRKETGTAKPGVSTVPNTTQASTPPQTQTPQPNPPPVQATPHPFPAVTPDLIVQTPVMTVVPPQPLQTPPPVPPQPQPPPAPAPQPVQSHPPIIAATPQPVKTKKGVKRKADTTTPTTIDPIHEPPSLPPEPKTTKLGQRRESSRPVKPPKKDVPDSQQHPAPEKSSKVSEQLKCCSGILKEMFAKKHAAYAWPFYKPVDVEALGLHDYCDIIKHPMDMSTIKSKLEAREYRDAQEFGADVRLMFSNCYKYNPPDHEVVAMARKLQDVFEMRFAKMPDEPEEPVVAVSSPAVPPPT. The pIC50 is 7.6. (5) The compound is CCOc1ncccc1-c1cc(NCc2cnn(C)n2)c2c(n1)c(C)nn2C(C)C. The target protein (Q14123) has sequence MESPTKEIEEFESNSLKYLQPEQIEKIWLRLRGLRKYKKTSQRLRSLVKQLERGEASVVDLKKNLEYAATVLESVYIDETRRLLDTEDELSDIQSDAVPSEVRDWLASTFTRQMGMMLRRSDEKPRFKSIVHAVQAGIFVERMYRRTSNMVGLSYPPAVIEALKDVDKWSFDVFSLNEASGDHALKFIFYELLTRYDLISRFKIPISALVSFVEALEVGYSKHKNPYHNLMHAADVTQTVHYLLYKTGVANWLTELEIFAIIFSAAIHDYEHTGTTNNFHIQTRSDPAILYNDRSVLENHHLSAAYRLLQDDEEMNILINLSKDDWREFRTLVIEMVMATDMSCHFQQIKAMKTALQQPEAIEKPKALSLMLHTADISHPAKAWDLHHRWTMSLLEEFFRQGDREAELGLPFSPLCDRKSTMVAQSQVGFIDFIVEPTFTVLTDMTEKIVSPLIDETSQTGGTGQRRSSLNSISSSDAKRSGVKTSGSEGSAPINNSVIS.... The pIC50 is 7.5. (6) The compound is CCCC[C@]1(CC)CS(=O)(=O)c2cc(CP(=O)(O)O)c(OC)cc2[C@@H](c2ccccc2)N1. The target protein (Q62633) has sequence MDNSSVCSPNATFCEGDSCLVTESNFNAILSTVMSTVLTILLAMVMFSMGCNVEINKFLGHIKRPWGIFVGFLCQFGIMPLTGFILSVASGILPVQAVVVLIMGCCPGGTGSNILAYWIDGDMDLSVSMTTCSTLLALGMMPLCLFIYTKMWVDSGTIVIPYDSIGISLVALVIPVSIGMFVNHKWPQKAKIILKIGSIAGAILIVLIAVVGGILYQSAWIIEPKLWIIGTIFPIAGYSLGFFLARLAGQPWYRCRTVALETGMQNTQLCSTIVQLSFSPEDLNLVFTFPLIYTVFQLVFAAIILGMYVTYKKCHGKNDAEFLEKTDNDMDPMPSFQETNKGFQPDEK. The pIC50 is 9.0. (7) The compound is CCc1cn([C@@H]2C[C@H](O)[C@@H](CO)O2)c(=O)[nH]c1=O. 
The target protein sequence is MASHAGHQDSPALDRVAGSAGHGDHPSALLRIYVDGPHGLGKTTTAAALAAALGRRDEIEYVPEPMAYWQTLGGPQTITRIFDAQHRLDRGEISASEAAMAMASAQVTMSTPYAVTESAVAPHIGAELPPGHGPHPNIDLTLVFDRHPVASLLCYPAARYLMGSLSLPTVLSFAALLPQTTPGTNLVLGALPEAVHAERLAQRQRPGERLDLAMLSAIRRVYDMLGNAIVYLQRGGSWRADWRRLSPARSAAASGRPARILPRPEIEDTIFALFCAPELLDETGEPYRVFAWTLDLLAERLRPMHLLVLDYNQAPHHCWMDLMEMIPEMTPTLPATPGSMLTLQLLAREFAREMTSTRGGDVGGEGRETR. The pIC50 is 4.3.
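
The pIC50 definition stated at the top of this task (pIC50 = -log10(IC50 in M)) can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset tooling: the function names `ic50_to_pic50`, `pic50_to_ic50`, and `nanomolar_to_molar` are hypothetical helpers introduced here, and the sketch assumes IC50 values are supplied as positive concentrations.

```python
import math


def nanomolar_to_molar(ic50_nm: float) -> float:
    """Convert a concentration in nM to M (hypothetical helper)."""
    return ic50_nm * 1e-9


def ic50_to_pic50(ic50_molar: float) -> float:
    """Return pIC50 = -log10(IC50 in M); higher means more potent."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("IC50 must be a positive concentration in M")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the definition: IC50 in M = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # An IC50 of 1 nM corresponds to a pIC50 of 9.0
    print(ic50_to_pic50(nanomolar_to_molar(1.0)))
```

Under this definition, a pIC50 of 9.0 (as in example 6) corresponds to an IC50 of 1 nM, and each unit increase in pIC50 is a tenfold increase in potency.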