This is drug-target binding data from BindingDB, based on IC50 measurements. The task is regression: given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The predicted value is pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50.

(1) The small molecule is COc1ccc2oc(Nc3ccc(SC)cc3)nc2c1. The target protein (P48999) has sequence MPSYTVTVATGSQWFAGTDDYIYLSLIGSAGCSEKHLLDKAFYNDFERGAVDSYDVTVDEELGEIYLVKIEKRKYWLHDDWYLKYITLKTPHGDYIEFPCYRWITGEGEIVLRDGRAKLARDDQIHILKQHRRKELEARQKQYRWMEWNPGFPLSIDAKCHKDLPRDIQFDSEKGVDFVLNYSKAMENLFINRFMHMFQSSWHDFADFEKIFVKISNTISERVKNHWQEDLMFGYQFLNGCNPVLIKRCTALPPKLPVTTEMVECSLERQLSLEQEVQEGNIFIVDYELLDGIDANKTDPCTHQFLAAPICLLYKNLANKIVPIAIQLNQTPGESNPIFLPTDSKYDWLLAKIWVRSSDFHVHQTITHLLRTHLVSEVFGIAMYRQLPAVHPLFKLLVAHVRFTIAINTKAREQLICEYGLFDKANATGGGGHVQMVQRAVQDLTYSSLCFPEAIKARGMDSTEDIPFYFYRDDGLLVWEAIQSFTMEVVSIYYENDQVV.... The pIC50 is 5.6.

(2) The drug is CC[C@H]1NC[C@H](O)[C@@H]1O. The target protein (Q8BVW0) has sequence MEAAEKEEISVEDEAVDKTIFKDCGKIAFYRRQKQQLTKTTTYQALLGSVDTEQDSTRFQIISEATKIPLVAEVYGIEKDIFRLKINEETPLKPRLVCSGDTGSLILTNRKGDLKCHVSANPFKIDLLSKNEAVISINSLGQLYFEHLQVPHKQRATKGNGQNTPAATSQENQEDLGLWEEKFGKFVDVKANGPSSVGLDFSLHGFEHLYGIPQHAESHQLKNTRDGDAYRLYNLDVYGYQVHDKMGIYGSVPYLLAHKQGRTVGIFWLNASETLVEINTEPAVEYTLTQMGPAAAKPKVRCRTDVHWMSESGIIDVFLLTGPTPADVFKQYSYITGTQAMPPLFSLGYHQCRWNYEDEQDVKAVDAGFDEHDIPYDVMWLDIEHTEDKKYFTWDKKRFANPKRMQELLRSKKRKLVVISDPHIKVDPDYTVYAQAKEQGFFVKNPEGGDFEGVCWPGLSSYLDFTNPKVREWYSSLFAFPVYQGSTDILFLWNDMNEPS.... The pIC50 is 4.8.

(3) The small molecule is Oc1ccc(-c2csc(Nc3ccccc3)n2)cc1. The target protein (P00636) has sequence MTDQAAFDTNIVTLTRFVMEEGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVSEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATMLVLAMVNGVNCFMLDPAIGEFILVDRDVKIKKKGSIYSINEGYAKEFDPAITEYIQRKKFPPDNSAPYGARYVGSMVADVHRTLVYGGIFMYPANKKSPKGKLRLLYECNPMAYVMEKAGGLATTGKEAVLDIVPTDIHQRAPIILGSPEDVTELLEIYQKHAAK. The pIC50 is 3.5.

(4) The compound is Cc1[nH]c(C(=O)N[C@@H]2CCN(c3ncc(C(=O)O)s3)C[C@@H]2C)c(Cl)c1Cl. The target protein (Q79EC5) has sequence MTAYILTAEAEADLRGIIRYTRREWGAAQVRRYIAKLEQGIARLAAGEGPFKDMSELFPALRMARCEHHYVFCLPRAGEPALVVAILHERMDLMTRLADRLKG. The pIC50 is 6.0.

(5) The compound is CC(C)N(C[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1O)[C@H]1C[C@@H](CCc2nc3cc(C(C)(C)C)ccc3[nH]2)C1. The target protein sequence is MGEKLELRLKSPVGAEPAVYPWPLPVYDKHHDAAHEIIETIRWVCEEIPDLKLAMENYVLIDYDTKSFESMQRLCDKYNRAIDSIHQLWKGTTQPMKLNTRPSTGLLRHILQQVYNHSVTDPEKLNNYEPFSPEVYGETSFDLVAQMIDEIKMTDDDLFVDLGSGVGQVVLQVAAATNCKHHYGVEKADIPAKYAETMDREFRKWMKWYGKKHAEYTLERGDFLSEEWRERIANTSVIFVNNFAFGPEVDHQLKERFANMKEGGRIVSSKPFAPLNFRINSRNLSDIGTIMRVVELSPLKGSVSWTGKPVSYYLHTIDRTILENYFSSLKNPKLREEQEAARRRQQRESKSNAATPTKGPEGKVAGPADAPMDSGAEEEKAGAATVKKPSPSKARKKKLNKKGRKMAGRKRGRPKK. The pIC50 is 9.1.

(6) The drug is CC1=CC2=NC(C(=O)NCCSCc3ccccc3F)CN2C=C1. The target protein (Q01860) has sequence MAGHLASDFAFSPPPGGGGDGPGGPEPGWVDPRTWLSFQGPPGGPGIGPGVGPGSEVWGIPPCPPPYEFCGGMAYCGPQVGVGLVPQGGLETSQPEGEAGVGVESNSDGASPEPCTVTPGAVKLEKEKLEQNPEESQDIKALQKELEQFAKLLKQKRITLGYTQADVGLTLGVLFGKVFSQTTICRFEALQLSFKNMCKLRPLLQKWVEEADNNENLQEICKAETLVQARKRKRTSIENRVRGNLENLFLQCPKPTLQQISHIAQQLGLEKDVVRVWFCNRRQKGKRSSSDYAQREDFEAAGSPFSGGPVSFPLAPGPHFGTPGYGSPHFTALYSSVPFPEGEAFPPVSVTTLGSPMHSN. The pIC50 is 5.1.

(7) The compound is Clc1cccc(Cn2cnc(-c3cc4[nH]nnc4cn3)c2)c1. The target protein sequence is MASVPVYCLCRLPYDVTRFMIECDMCQDWFHGSCVGVEEEKAADIDLYHCPNCEVLHGPSIMKKRRGSSKGHDTHKGKPVKTGSPTFVRELRSRTFDSSDEVILKPTGNQLTVEFLEENSFSVPILVLKKDGLGMTLPSPSFTVRDVEHYVGSDKEIDVIDVTRQADCKMKLGDFVKYYYSGKREKVLNVISLEFSDTRLSNLVETPKIVRKLSWVENLWPEECVFERPNVQKYCLMSVRDSYTDFHIDFGGTSVWYHVLKGEKIFYLIRPTNANLTLFECWSSSSNQNEMFFGDQVDKCYKCSVKQGQTLFIPTGWIHAVLTPVDCLAFGGNFLHSLNIEMQLKAYEIEKRLSTADLFRFPNFETICWYVGKHILDIFRGLRENRRHPASYLVHGGKALNLAFRAWTRKEALPDHEDEIPETVRTVQLIKDLAREIRLVEDIFQQNVGKTSNIFGLQRIFPAGSIPLTRPAHSTSVSMSRLSLPSKNGSKKKGLKPKEL.... The pIC50 is 5.3.
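
The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be illustrated with a short conversion sketch. The function names below are hypothetical and are not part of BindingDB or the dataset; the code only applies the stated formula and its inverse, and it assumes the label values are plain base-10 logarithms of molar concentrations.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nanomolar(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (-pic50) * 1e9


if __name__ == "__main__":
    # pIC50 labels of examples (1) to (7) above, in order.
    labels = (5.6, 4.8, 3.5, 6.0, 9.1, 5.1, 5.3)
    for index, pic50 in enumerate(labels, start=1):
        ic50_nm = pic50_to_ic50_nanomolar(pic50)
        print(f"({index}) pIC50 {pic50:.1f} corresponds to IC50 of about {ic50_nm:.3g} nM")
```

Under this definition, a difference of one pIC50 unit is a tenfold difference in IC50, so example (5) with pIC50 9.1 is a sub-nanomolar binder, while example (3) with pIC50 3.5 sits in the hundreds-of-micromolar range.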