Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. From a dataset of Drug-target binding data from BindingDB using IC50 measurements. (1) The drug is CN(C)C(=O)c1ccc2c(c1)nc(-c1ccccn1)n2[C@@H]1CCC[C@H](NC(=O)c2ccc(Br)s2)C1. The target protein (O15382) has sequence MAAAALGQIWARKLLSVPWLLCGPRRYASSSFKAADLQLEMTQKPHKKPGPGEPLVFGKTFTDHMLMVEWNDKGWGQPRIQPFQNLTLHPASSSLHYSLQLFEGMKAFKGKDQQVRLFRPWLNMDRMLRSAMRLCLPSFDKLELLECIRRLIEVDKDWVPDAAGTSLYVRPVLIGNEPSLGVSQPTRALLFVILCPVGAYFPGGSVTPVSLLADPAFIRAWVGGVGNYKLGGNYGPTVLVQQEALKRGCEQVLWLYGPDHQLTEVGTMNIFVYWTHEDGVLELVTPPLNGVILPGVVRQSLLDMAQTWGEFRVVERTITMKQLLRALEEGRVREVFGSGTACQVCPVHRILYKDRNLHIPTMENGPELILRFQKELKEIQYGIRAHEWMFPV. The pIC50 is 4.7. (2) The small molecule is OC[C@H]1N[C@H](CO)[C@@H](O)[C@H](O)[C@H]1O. The target protein (Q8BVW0) has sequence MEAAEKEEISVEDEAVDKTIFKDCGKIAFYRRQKQQLTKTTTYQALLGSVDTEQDSTRFQIISEATKIPLVAEVYGIEKDIFRLKINEETPLKPRLVCSGDTGSLILTNRKGDLKCHVSANPFKIDLLSKNEAVISINSLGQLYFEHLQVPHKQRATKGNGQNTPAATSQENQEDLGLWEEKFGKFVDVKANGPSSVGLDFSLHGFEHLYGIPQHAESHQLKNTRDGDAYRLYNLDVYGYQVHDKMGIYGSVPYLLAHKQGRTVGIFWLNASETLVEINTEPAVEYTLTQMGPAAAKPKVRCRTDVHWMSESGIIDVFLLTGPTPADVFKQYSYITGTQAMPPLFSLGYHQCRWNYEDEQDVKAVDAGFDEHDIPYDVMWLDIEHTEDKKYFTWDKKRFANPKRMQELLRSKKRKLVVISDPHIKVDPDYTVYAQAKEQGFFVKNPEGGDFEGVCWPGLSSYLDFTNPKVREWYSSLFAFPVYQGSTDILFLWNDMNEPS.... The pIC50 is 6.5. (3) The drug is CCC1=C[C@H](C(=O)O)[C@H]2C(=O)C(C(=O)OC)=C(CC)[C@H]2[C@H]1C(=O)OC. 
The target protein (P09884) has sequence MAPVHGDDSLSDSGSFVSSRARREKKSKKGRQEALERLKKAKAGEKYKYEVEDFTGVYEEVDEEQYSKLVQARQDDDWIVDDDGIGYVEDGREIFDDDLEDDALDADEKGKDGKARNKDKRNVKKLAVTKPNNIKSMFIACAGKKTADKAVDLSKDGLLGDILQDLNTETPQITPPPVMILKKKRSIGASPNPFSVHTATAVPSGKIASPVSRKEPPLTPVPLKRAEFAGDDVQVESTEEEQESGAMEFEDGDFDEPMEVEEVDLEPMAAKAWDKESEPAEEVKQEADSGKGTVSYLGSFLPDVSCWDIDQEGDSSFSVQEVQVDSSHLPLVKGADEEQVFHFYWLDAYEDQYNQPGVVFLFGKVWIESAETHVSCCVMVKNIERTLYFLPREMKIDLNTGKETGTPISMKDVYEEFDEKIATKYKIMKFKSKPVEKNYAFEIPDVPEKSEYLEVKYSAEMPQLPQDLKGETFSHVFGTNTSSLELFLMNRKIKGPCWLE.... The pIC50 is 4.3. (4) The compound is Cc1[nH]n(-c2ccccc2)c(=O)c1N=Nc1c(O)cc(S(=O)(=O)O)c2ccccc12. The target protein (P25044) has sequence MAAAPWYIRQRDTDLLGKFKFIQNQEDGRLREATNGTVNSRWSLGVSIEPRNDARNRYVNIMPYERNRVHLKTLSGNDYINASYVKVNVPGQSIEPGYYIATQGPTRKTWDQFWQMCYHNCPLDNIVIVMVTPLVEYNREKCYQYWPRGGVDDTVRIASKWESPGGANDMTQFPSDLKIEFVNVHKVKDYYTVTDIKLTPTDPLVGPVKTVHHFYFDLWKDMNKPEEVVPIMELCAHSHSLNSRGNPIIVHCSAGVGRTGTFIALDHLMHDTLDFKNITERSRHSDRATEEYTRDLIEQIVLQLRSQRMKMVQTKDQFLFIYHAAKYLNSLSVNQ. The pIC50 is 3.9. (5) The drug is CCc1ccc2nc(O)c(CNc3ccc(N(C)C)cc3)cc2c1. The target protein (P9WHH9) has sequence MTHYDVVVLGAGPGGYVAAIRAAQLGLSTAIVEPKYWGGVCLNVGCIPSKALLRNAELVHIFTKDAKAFGISGEVTFDYGIAYDRSRKVAEGRVAGVHFLMKKNKITEIHGYGTFADANTLLVDLNDGGTESVTFDNAIIATGSSTRLVPGTSLSANVVTYEEQILSRELPKSIIIAGAGAIGMEFGYVLKNYGVDVTIVEFLPRALPNEDADVSKEIEKQFKKLGVTILTATKVESIADGGSQVTVTVTKDGVAQELKAEKVLQAIGFAPNVEGYGLDKAGVALTDRKAIGVDDYMRTNVGHIYAIGDVNGLLQLAHVAEAQGVVAAETIAGAETLTLGDHRMLPRATFCQPNVASFGLTEQQARNEGYDVVVAKFPFTANAKAHGVGDPSGFVKLVADAKHGELLGGHLVGHDVAELLPELTLAQRWDLTASELARNVHTHPTMSEALQECFHGLVGHMINF. The pIC50 is 5.3. (6) The compound is Cc1cc(O)c2c(c1)C(=O)c1c(c(O)cc(O)c1-c1c(O)cc(O)c3c1C(=O)c1cc(C)cc(O)c1C3=O)C2=O. 
The target protein (P0DPI1) has sequence MPFVNKQFNYKDPVNGVDIAYIKIPNAGQMQPVKAFKIHNKIWVIPERDTFTNPEEGDLNPPPEAKQVPVSYYDSTYLSTDNEKDNYLKGVTKLFERIYSTDLGRMLLTSIVRGIPFWGGSTIDTELKVIDTNCINVIQPDGSYRSEELNLVIIGPSADIIQFECKSFGHEVLNLTRNGYGSTQYIRFSPDFTFGFEESLEVDTNPLLGAGKFATDPAVTLAHELIHAGHRLYGIAINPNRVFKVNTNAYYEMSGLEVSFEELRTFGGHDAKFIDSLQENEFRLYYYNKFKDIASTLNKAKSIVGTTASLQYMKNVFKEKYLLSEDTSGKFSVDKLKFDKLYKMLTEIYTEDNFVKFFKVLNRKTYLNFDKAVFKINIVPKVNYTIYDGFNLRNTNLAANFNGQNTEINNMNFTKLKNFTGLFEFYKLLCVRGIITSKTKSLDKGYNKALNDLCIKVNNWDLFFSPSEDNFTNDLNKGEEITSDTNIEAAEENISLDLIQ.... The pIC50 is 4.5. (7) The compound is Nc1ncnc2c1ncn2C1OC(COC(=O)c2ccc([S+](=O)([O-])Oc3ccc(/C=C/[N+](=O)[O-])cc3)cc2)C(O)C1O. The target protein (P00521) has sequence YITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTIYGVSPNYDKWEMERTDITMKHKLGGGQYGEVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVSAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQESSISDEVEKELGKRGTRGGAGSMLQAPELPTKTRTCRRAAEQKASPPSLTPKLLRRQVTASPSSGLSHKKEATKGSASGMGTPATAEPAPPSNKVGLSKASSEEMRVRRHKHSSE.... The pIC50 is 4.0. (8) The drug is O=C(Nc1ccccc1C(=O)O)c1ccc(-c2ccccc2)c(Oc2ccccc2)c1. The target protein (Q2FZS0) has sequence MNVGIKGFGAYAPEKIIDNAYFEQFLDTSDEWISKMTGIKERHWADDDQDTSDLAYEASLKAIADAGIQPEDIDMIIVATATGDMPFPTVANMLQERLGTGKVASMDQLAACSGFMYSMITAKQYVQSGDYHNILVVGADKLSKITDLTDRSTAVLFGDGAGAVIIGEVSDGRGIISYEMGSDGTGGKHLYLDKDTGKLKMNGREVFKFAVRIMGDASTRVVEKANLTSDDIDLFIPHQANIRIMESARERLGISKDKMSVSVNKYGNTSAASIPLSIDQELKNGKIKDDDTIVLVGFGGGLTWGAMTIKWGK. The pIC50 is 5.0. (9) The compound is CCCCCCCCNS(=O)(=O)NC1CCOC1=O. 
The target protein (P12746) has sequence MKNINADDTYRIINKIKACRSNNDINQCLSDMTKMVHCEYYLLAIIYPHSMVKSDISILDNYPKKWRQYYDDANLIKYDPIVDYSNSNHSPINWNIFENNAVNKKSPNVIKEAKTSGLITGFSFPIHTANNGFGMLSFAHSEKDNYIDSLFLHACMNIPLIVPSLVDNYRKINIANNKSNNDLTKREKECLAWACEGKSSWDISKILGCSERTVTFHLTNAQMKLNTTNRCQSISKAILTGAIDCPYFKN. The pIC50 is 5.1.
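The pIC50 definition stated at the top, pIC50 = -log10(IC50 in M), can be sketched as a pair of conversion helpers. This is a minimal illustration only: the function names `ic50_to_pic50` and `pic50_to_ic50` are hypothetical and not part of the dataset or any BindingDB tooling, and the sketch assumes the IC50 value has already been converted to molar units.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # A pIC50 of 4.7 (example 1) corresponds to an IC50 of roughly 2e-5 M (about 20 uM).
    print(f"{pic50_to_ic50(4.7):.3e} M")
    # A pIC50 of 6.5 (example 2) corresponds to roughly 3.2e-7 M (about 320 nM).
    print(f"{pic50_to_ic50(6.5):.3e} M")
    # A 1 uM IC50 maps to a pIC50 of 6.
    print(ic50_to_pic50(1e-6))
```

Because the scale is logarithmic, a difference of one pIC50 unit between two examples corresponds to a tenfold difference in IC50, which is why higher values mean more potent compounds.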