Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The small molecule is CCCCCC(=O)Cc1cc(O)cc2c1C(=O)Oc1cc(O)c(C(=O)O)c(CCCCC)c1O2. The target protein (Q96Q89) has sequence MESNFNQEGVPRPSYVFSADPIARPSEINFDGIKLDLSHEFSLVAPNTEANSFESKDYLQVCLRIRPFTQSEKELESEGCVHILDSQTVVLKEPQCILGRLSEKSSGQMAQKFSFSKVFGPATTQKEFFQGCIMQPVKDLLKGQSRLIFTYGLTNSGKTYTFQGTEENIGILPRTLNVLFDSLQERLYTKMNLKPHRSREYLRLSSEQEKEEIASKSALLRQIKEVTVHNDSDDTLYGSLTNSLNISEFEESIKDYEQANLNMANSIKFSVWVSFFEIYNEYIYDLFVPVSSKFQKRKMLRLSQDVKGYSFIKDLQWIQVSDSKEAYRLLKLGIKHQSVAFTKLNNASSRSHSIFTVKILQIEDSEMSRVIRVSELSLCDLAGSERTMKTQNEGERLRETGNINTSLLTLGKCINVLKNSEKSKFQQHVPFRESKLTHYFQSFFNGKGKICMIVNISQCYLAYDETLNVLKFSAIAQKVCVPDTLNSSQEKLFGPVKSSQ.... The pIC50 is 5.3. (2) The compound is COc1cccc(S(=O)(=O)Nc2cc3c(cc2Oc2ccccc2)n(C)c(=O)n3C)c1. The target protein (O95696) has sequence MRRKGRCHRGSAARHPSSPCSVKHSPTRETLTYAQAQRMVEIEIEGRLHRISIFDPLEIILEDDLTAQEMSECNSNKENSERPPVCLRTKRHKNNRVKKKNEALPSAHGTPASASALPEPKVRIVEYSPPSAPRRPPVYYKFIEKSAEELDNEVEYDMDEEDYAWLEIVNEKRKGDCVPAVSQSMFEFLMDRFEKESHCENQKQGEQQSLIDEDAVCCICMDGECQNSNVILFCDMCNLAVHQECYGVPYIPEGQWLCRHCLQSRARPADCVLCPNKGGAFKKTDDDRWGHVVCALWIPEVGFANTVFIEPIDGVRNIPPARWKLTCYLCKQKGVGACIQCHKANCYTAFHVTCAQKAGLYMKMEPVKELTGGGTTFSVRKTAYCDVHTPPGCTRRPLNIYGDVEMKNGVCRKESSVKTVRSTSKVRKKAKKAKKALAEPCAVLPTVCAPYIPPQRLNRIANQVAIQRKKQFVERAHSYWLLKRLSRNGAPLLRRLQSSL.... The pIC50 is 5.7. (3) The small molecule is CC(=O)N1CCc2c(c(Nc3ccc(-c4cnn(C)c4)cc3F)nn2[C@@H]2CCOC2)C1. 
The target protein (Q9H0E9) has sequence MATGTGKHKLLSTGPTEPWSIREKLCLASSVMRSGDQNWVSVSRAIKPFAEPGRPPDWFSQKHCASQYSELLETTETPKRKRGEKGEVVETVEDVIVRKLTAERVEELKKVIKETQERYRRLKRDAELIQAGHMDSRLDELCNDIATKKKLEEEEAEVKRKATDAAYQARQAVKTPPRRLPTVMVRSPIDSASPGGDYPLGDLTPTTMEEATSGVNESEMAVASGHLNSTGVLLEVGGVLPMIHGGEIQQTPNTVAASPAASGAPTLSRLLEAGPTQFTTPLASFTTVASEPPVKLVPPPVESVSQATIVMMPALPAPSSAPAVSTTESVAPVSQPDNCVPMEAVGDPHTVTVSMDSSEISMIINSIKEECFRSGVAEAPVGSKAPSIDGKEELDLAEKMDIAVSYTGEELDFETVGDIIAIIEDKVDDHPEVLDVAAVEAALSFCEENDDPQSLPGPWEHPIQQERDKPVPLPAPEMTVKQERLDFEETENKGIHELVD.... The pIC50 is 4.7. (4) The drug is COc1cc(C=CC(=O)CC(=O)C=Cc2ccc(O)cc2)ccc1O. The target protein (P15840) has sequence MSKVENKTKKLRVFEAFAGIGAQRKALEKVRKDEYEIVGLAEWYVPAIVMYQAIHNNFHTKLEYKSVSREEMIDYLENKTLSWNSKNPVSNGYWKRKKDDELKIIYNAIKLSEKEGNIFDIRDLYKRTLKNIDLLTYSFPCQDLSQQGIQKGMKRGSGTRSGLLWEIERALDSTEKNDLPKYLLMENVGALLHKKNEEELNQWKQKLESLGYQNSIEVLNAADFGSSQARRRVFMISTLNEFVELPKGDKKPKSIKKVLNKIVSEKDILNNLLKYNLTEFKKTKSNINKASLIGYSKFNSEGYVYDPEFTGPTLTASGANSRIKIKDGSNIRKMNSDETFLYIGFDSQDGKRVNEIEFLTENQKIFVCGNSISVEVLEAIIDKIGG. The pIC50 is 7.5. (5) The small molecule is CC1=N[C@H]2[C@@H](O[C@H](CO)[C@@H](O[C@@H]3O[C@H](CO[C@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@@H]4O)[C@@H](O)[C@H](O[C@@H]4O[C@H](CO)[C@@H](O)[C@H](O)[C@@H]4O)[C@@H]3O)[C@@H]2O)S1. The target protein sequence is MRKAFLVGLVCTACVLLHDDPVAASTYNGPLSSHWFPEELAQWEPDSDPDAPFNRSHVPLEPGRVANRVNANADKDAHLVSLSALNRHTSGVPSQGAPVFYENTFSYWHYTDLMVYWAGSAGEGIIVPPSADVIDASHRNGVPILGNVFFPPTVYGGQLEWLEQMLEQEEDGSFPLADKLLEVADYYGFDGWFINQETEGADEGTAEAMQAFLVYLQEQKPEGMHIMWYDSMIDTGAIAWQNHLTDRNKMYLQNGSTRVADSMFLNFWWRDQRQSNELAQALGRSPYDLYAGVDVEARGTSTPVQWEGLFPEGEKAHTSLGLYRPDWAFQSSETMEAFYEKELQFWVGSTGNPAETDGQSNWPGMAHWFPAKSTATSVPFVTHFNTGSGAQFSAEGKTVSEQEWNNRSLQDVLPTWRWIQHGGDLEATFSWEEAFEGGSSLQWHGSLAEGEHAQIELYQTELPISEGTSLTWTFKSEHGNDLNVGFRLDGEEDFRYVEGE.... The pIC50 is 4.9. 
(6) The target protein sequence is MPFVNKQFNYKDPVNGVDIAYIKIPNAGQMQPVKAFKIHNKIWVIPERDTFTNPEEGDLNPPPEAKQVPVSYYDSTYLSTDNEKDNYLKGVTKLFERIYSTDLGRMLLTSIVRGIPFWGGSTIDTELKVIDTNCINVIQPDGSYRSEELNLVIIGPSADIIQFECKSFGHDVLNLTRNGYGSTQYIRFSPDFTFGFEESLEVDTNPLLGAGKFATDPAVTLAHELIHAEHRLYGIAINPNRVFKVNTNAYYEMSGLEVSFEELRTFGGHDAKFIDSLQENEFRLYYYNKFKDVASTLNKAKSIIGTTASLQYMKNVFKEKYLLSEDTSGKFSVDKLKFDKLYKMLTEIYTEDNFVNFFKVINRKTYLNFDKAVFRINIVPDENYTIKDGFNLKGANLSTNFNGQNTEINSRNFTRLKNFTGLFEF. The drug is O=C(CS)Nc1cc(-c2ccccc2C(F)(F)F)n[nH]1. The pIC50 is 5.1. (7) The compound is Cc1ccc(C23OC[C@@](CO)(O2)[C@@H](O)[C@H](O)[C@H]3O)cc1Cc1ccc(-c2ccc(F)cc2)s1. The target protein (Q9QXI6) has sequence MDSSTLSPAVTATDAPIPSYERIRNAADISVIVIYFVVVMAVGLWAMFSTNRGTVGGFFLAGRSMVWWPIGASLFASNIGSGHFVGLAGTGAAAGIAMGGFEWNALVLVVVLGWIFVPIYIKAGVVTMPEYLRKRFGGKRIQIYLSVLSLLLYIFTKISADIFSGAIFINLALGLDIYLAIFILLAITALYTITGGLAAVIYTDTLQTAIMLVGSFILTGFAFNEVGGYEAFMDKYMKAIPTKVSNGNFTAKEECYTPRADSFHIFRDPITGDMPWPGLIFGLAILALWYWCTDQVIVQRCLSAKNMSHVKAGCTLCGYLKLLPMFLMVMPGMISRILYTEKIACVLPEECQKYCGTPVGCTNIAYPTLVVELMPNGLRGLMLSVMMASLMSSLTSIFNSASTLFTMDIYTKIRKKASEKELMIAGRLFILVLIGISIAWVPIVQSAQSGQLFDYIQSITSYLGPPIAAVFLLAIFCKRVNEQGAFWGLILGFLIGISRM.... The pIC50 is 5.9. (8) The compound is COc1ccc2nc(SCc3cccc(Oc4ccccc4)c3)[nH]c2c1. The target protein (P10632) has sequence MEPFVVLVLCLSFMLLFSLWRQSCRRRKLPPGPTPLPIIGNMLQIDVKDICKSFTNFSKVYGPVFTVYFGMNPIVVFHGYEAVKEALIDNGEEFSGRGNSPISQRITKGLGIISSNGKRWKEIRRFSLTTLRNFGMGKRSIEDRVQEEAHCLVEELRKTKASPCDPTFILGCAPCNVICSVVFQKRFDYKDQNFLTLMKRFNENFRILNSPWIQVCNNFPLLIDCFPGTHNKVLKNVALTRSYIREKVKEHQASLDVNNPRDFIDCFLIKMEQEKDNQKSEFNIENLVGTVADLFVAGTETTSTTLRYGLLLLLKHPEVTAKVQEEIDHVIGRHRSPCMQDRSHMPYTDAVVHEIQRYSDLVPTGVPHAVTTDTKFRNYLIPKGTTIMALLTSVLHDDKEFPNPNIFDPGHFLDKNGNFKKSDYFMPFSAGKRICAGEGLARMELFLFLTTILQNFNLKSVDDLKNLNTTAVTKGIVSLPPSYQICFIPV. The pIC50 is 4.3.
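
Note on units: the pIC50 values above follow the definition given in the task statement, pIC50 = -log10(IC50 in M). The Python sketch below only illustrates that conversion in both directions. The function names are hypothetical and are not part of the bindingdb_ic50 dataset or of any BindingDB tooling.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # pIC50 labels taken from the first few examples in this prompt.
    for label in (5.3, 5.7, 4.7, 7.5):
        print(f"pIC50 {label:.1f} -> IC50 about {pic50_to_ic50_nm(label):.1f} nM")
```

Under this definition, a pIC50 of 5.3 corresponds to an IC50 of roughly 5 µM, and a pIC50 of 7.5 to roughly 32 nM, which is why higher values mean more potent compounds.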