Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. From a dataset of Drug-target binding data from BindingDB using IC50 measurements. (1) The drug is CC(C)(C)c1ccc(CC(=O)N2CCC3(CC2)CCN(CCc2cccnc2)c2ccccc2O3)cc1. The target protein (P07308) has sequence MPAHMLQEISSSYTTTTTITEPPSGNLQNGREKMKKVPLYLEEDIRPEMREDIHDPSYQDEEGPPPKLEYVWRNIILMALLHVGALYGITLIPSSKVYTLLWGIFYYLISALGITAGAHRLWSHRTYKARLPLRIFLIIANTMAFQNDVYEWARDHRAHHKFSETHADPHNSRRGFFFSHVGWLLVRKHPAVKEKGGKLDMSDLKAEKLVMFQRRYYKPGLLLMCFILPTLVPWYCWGETFLHSLFVSTFLRYTLVLNATWLVNSAAHLYGYRPYDKNIQSRENILVSLGAVGEGFHNYHHAFPYDYSASEYRWHINFTTFFIDCMAALGLAYDRKKVSKAAVLARIKRTGDGSHKSS. The pIC50 is 5.5. (2) The compound is Cn1cnc(S(=O)(=O)n2cc3c(n2)CN([C@H]2CO[C@H](c4cc(F)ccc4F)[C@@H](N)C2)C3)c1. The target protein (A5D7B7) has sequence MKTWLKIVFGVATSAVLALLVMCIVLRPSRVHNSEESTTRALTLKDILNGTFSYKTFFPNWISGQEYLHQSTDNNVVFYNIETGESYTILSNTTMKSVNASNYGLSPDRQFAYLESDYSKLWRYSYTATYHIYDLTNGEFIRRNELPRPIQYLCWSPVGSKLAYVYQNNIYLKQRPEDPPFQITYNGKENKIFNGIPDWVYEEEMLATKYALWWSPNGKFLAYAEFNDTEIPVIAYSYYGDEQYPRTINIPYPKAGAKNPVVRIFIIDATYPEHIGPREVPVPAMIASSDYYFSWLTWVTDDRICLQWLKRIQNVSVLSTCDFREDWQTWNCPKTQEHIEESRTGWAGGFFVSTPVFSHDTISYYKIFSDKDGYKHIHYIRDTVENAIQITSGKWEAINIFRVTQDSLFYSSNEFEGYPGRRNIYRISIGSHSPSKKCITCHLRKKRCQYYTASFSDYAKYYALVCYGPGLPISTLHDGRTDQEIKILEDNKELENALKN.... The pIC50 is 4.0. (3) The small molecule is O=c1oc(OCCCCCc2ccccc2)c(Cl)c2ccc([N+](=O)[O-])cc12. The target protein (P46116) has sequence MAEQQNPFSIKSKARFSLGAIALTLTLVLLNIAVYFYQIVFASPLDSRESNLILFGANIYQLSLTGDWWRYPISMMLHSNGTHLAFNCLALFVIGIGCERAYGKFKLLAIYIISGIGAALFSAYWQYYEISNSDLWTDSTVYITIGVGASGAIMGIAAASVIYLIKVVINKPNPHPVIQRRQKYQLYNLIAMIALTLINGLQSGVDNAAHIGGAIIGALISIAYILVPHKLRVANLCITVIAASLLTMMIYLYSFSTNKHLLEEREFIYQEVYTELADANQ. The pIC50 is 5.0. (4) The drug is CC(C)OC(=O)NC(=O)[C@@H]1CCCN1C(=O)[C@H](CC1CCCC1)CN(O)C=O. The pIC50 is 9.2. 
The target protein (Q9F2F0) has sequence MSAIERITKAAHLIDMNDIIREGNPTLRAIAEEVTFPLSDQEIILGEKMMQFLKHSQDPVMAEKMGLRGGVGLAAPQLDISKRIIAVLVPNIVEEGETPQEAYDLEAIMYNPKIVSHSVQDAALGEGEGCLSVDRNVPGYVVRHARVTVDYFDKDGEKHRIKLKGYNSIVVQHEIDHINGIMFYDRINEKDPFAVKDGLLILE. (5) The small molecule is O=C1c2ccccc2C(=O)N1c1ccc(S(=O)(=O)Nc2ccc([N+](=O)[O-])cc2)cc1. The target protein (P14340) has sequence MNNQRKKARNTPFNMLKRERNRVSTVQQLTKRFSLGMLQGRGPLKLFMALVAFLRFLTIPPTAGILKRWGTIKKSKAINVLRGFRKEIGRMLNILNRRRRTAGMIIMLIPTVMAFHLTTRNGEPHMIVSRQEKGKSLLFKTEDGVNMCTLMAMDLGELCEDTITYKCPFLKQNEPEDIDCWCNSTSTWVTYGTCTTTGEHRREKRSVALVPHVGMGLETRTETWMSSEGAWKHAQRIETWILRHPGFTIMAAILAYTIGTTHFQRALIFILLTAVAPSMTMRCIGISNRDFVEGVSGGSWVDIVLEHGSCVTTMAKNKPTLDFELIETEAKQPATLRKYCIEAKLTNTTTDSRCPTQGEPSLNEEQDKRFVCKHSMVDRGWGNGCGLFGKGGIVTCAMFTCKKNMKGKVVQPENLEYTIVITPHSGEEHAVGNDTGKHGKEIKITPQSSITEAELTGYGTVTMECSPRTGLDFNEMVLLQMENKAWLVHRQWFLDLPLPW.... The pIC50 is 4.0. (6) The drug is CC(C)[C@H](CO)NC(=O)c1ccc2ncc(-c3cc4ccccc4o3)n2n1. The target protein sequence is MVSSQKLEKPIEMGSSEPLPIADGDRRRKKKRRGRATDSLPGKFEDMYKLTSELLGEGAYAKVQGAVSLQNGKEYAVKIIEKQAGHSRSRVFREVETLYQCQGNKNILELIEFFEDDTRFYLVFEKLQGGSILAHIQKQKHFNEREASRVVRDVAAALDFLHTKGIAHRDLKPENILCESPEKVSPVKICDFDLGSGMKLNNSCTPITTPELTTPCGSAEYMAPEVVEVFTDQATFYDKRCDLWSLGVVLYIMLSGYPPFVGHCGADCGWDRGEVCRVCQNKLFESIQEGKYEFPDKDWAHISSEAKDLISKLLVRDAKQRLSAAQVLQHPWVQGQAPEKGLPTPQVLQRNSSTMDLTLFAAEAIALNRQLSQHEENELAEEPEALADGLCSMKLSPPCKSRLARRRALAQAGRGEDRSPPTAL. The pIC50 is 8.2. (7) The drug is COc1ccccc1Nc1nc(Cl)nc2nc[nH]c12. 
The target protein (Q9FUJ3) has sequence MANLRLMITLITVLMITKSSNGIKIDLPKSLNLTLSTDPSIISAASHDFGNITTVTPGGVICPSSTADISRLLQYAANGKSTFQVAARGQGHSLNGQASVSGGVIVNMTCITDVVVSKDKKYADVAAGTLWVDVLKKTAEKGVSPVSWTDYLHITVGGTLSNGGIGGQVFRNGPLVSNVLELDVITGKGEMLTCSRQLNPELFYGVLGGLGQFGIITRARIVLDHAPKRAKWFRMLYSDFTTFTKDQERLISMANDIGVDYLEGQIFLSNGVVDTSFFPPSDQSKVADLVKQHGIIYVLEVAKYYDDPNLPIISKVIDTLTKTLSYLPGFISMHDVAYFDFLNRVHVEENKLRSLGLWELPHPWLNLYVPKSRILDFHNGVVKDILLKQKSASGLALLYPTNRNKWDNRMSAMIPEIDEDVIYIIGLLQSATPKDLPEVESVNEKIIRFCKDSGIKIKQYLMHYTSKEDWIEHFGSKWDDFSKRKDLFDPKKLLSPGQDI.... The pIC50 is 4.0. (8) The drug is CN(C)c1ccc(/C=C(\C#N)c2ccccc2Cl)cc1. The target protein (P35396) has sequence MEQPQEETPEAREEEKEEVAMGDGAPELNGGPEHTLPSSSCADLSQNSSPSSLLDQLQMGCDGASGGSLNMECRVCGDKASGFHYGVHACEGCKGFFRRTIRMKLEYEKCDRICKIQKKNRNKCQYCRFQKCLALGMSHNAIRFGRMPEAEKRKLVAGLTASEGCQHNPQLADLKAFSKHIYNAYLKNFNMTKKKARSILTGKSSHNAPFVIHDIETLWQAEKGLVWKQLVNGLPPYNEISVHVFYRCQSTTVETVRELTEFAKNIPNFSSLFLNDQVTLLKYGVHEAIFAMLASIVNKDGLLVANGSGFVTHEFLRSLRKPFSDIIEPKFEFAVKFNALELDDSDLALFIAAIILCGDRPGLMNVPQVEAIQDTILRALEFHLQVNHPDSQYLFPKLLQKMADLRQLVTEHAQMMQWLKKTESETLLHPLLQEIYKDMY. The pIC50 is 7.3. (9) The small molecule is CC(C)=CCC/C(C)=C/CC/C(C)=C/CC/C=C(\C)CC/C=C(\C)CCCNC1CC1. The target protein sequence is MWTFLGIATFTYFYKKCGDFVSLANKELLLGVLVFLSLGLVLSYRCRYRNGALLGRQQSGSQFAVFSDILSALPLIGFFWAKSPTGSEKKEQLGSRRGKKGSNISETTLVGAAASPLISSQNDPEIIIVGSGVLGSALAAVLSRDGRKVTVIERDLKEPDRILGEYLQPGGCHVLKDLGLEDTMEGIDAQVVDGYIIHDQESKSEVQIPFPLSENNHVQSGRAFRHGRFIMGLRKAAMAEPNAKFIEGTVLQLLEEEDVVLGVQYRDKETGDIKELHAPLTIVADGLFSKFRKNLISNKVSVSSHFVGFLMENAPQFKANHAELVLANPSPVLIYQISPSETRVLVDIRGEMPRNLREYMIENIYPQLPDHLKEPFLEASQNSHLRSMPASFLPSSPVNKRGVLLLGDAHNMRHPLTGGGMTVAFNDIKLWRKLLKGIPDLYDDAAILQAKKSFYWTRKMSHSFVVNVLAQALYELFSATDDSLYQLRKACFFYFKLGGE.... The pIC50 is 5.7. (10) The small molecule is CNC(=O)c1cc(OCCCCN2CCN(c3cccc4sccc34)CC2)ccc1F. 
The target protein sequence is MDILCEENTSLSSTTNSLMQLNDDTRLYSNDFNSGEANTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQDDSKVFKEGSCLLADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEATLCVSDLGTRAKLASFSFLPQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNEDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILVNTIPALAYKSSQLQMGQKKNSKQDAKTTDNDCSMVALGKQHSEEASKDNSDGVNEKVSCV. The pIC50 is 7.1.
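The pIC50 definition stated in the task description, pIC50 = -log10(IC50 in M), can be sketched as a pair of conversion helpers. This is a minimal illustration only: the function names are hypothetical and are not part of BindingDB or of any dataset tooling, and the nanomolar helper assumes the raw IC50 is reported in nM, which the records above do not state.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in molar units to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the definition: IC50 in M = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def nanomolar_to_pic50(ic50_nm: float) -> float:
    """Convenience wrapper, assuming the raw IC50 is given in nM."""
    return ic50_to_pic50(ic50_nm * 1e-9)
```

Under this definition, a pIC50 of 4.0 corresponds to an IC50 of 100 µM, and a pIC50 of 9.2 corresponds to roughly 0.63 nM, so each unit increase in pIC50 is a tenfold gain in potency.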