Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The small molecule is C[n+]1ccc(-c2c3nc(c(-c4cc[n+](C)cc4)c4ccc([nH]4)c(-c4cc[n+](C)cc4)c4ccc([nH]4)c(-c4cc[n+](C)cc4)c4nc2C=C4)C=C3)cc1. The target protein sequence is MSSMWSEYTIGGVKIYFPYKAYPSQLAMMNSILRGLNSKQHCLLESPTGSGKSLALLCSALAWQQSLSGKPADEGVSEKAEVQLSCCCACHSKDFTNNDMNQGTSRHFNYPSTPPSERNGTSSTCQDSPEKTTLAAKLSAKKQASIYRDENDDFQVEKKRIRPLETTQQIRKRHCFGTEVHNLDAKVDSGKTVKLNSPLEKINSFSPQKPPGHCSRCCCSTKQGNSQESSNTIKKDHTGKSKIPKIYFGTRTHKQIAQITRELRRTAYSGVPMTILSSRDHTCVHPEVVGNFNRNEKCMELLDGKNGKSCYFYHGVHKISDQHTLQTFQGMCKAWDIEELVSLGKKLKACPYYTARELIQDADIIFCPYNYLLDAQIRESMDLNLKEQVVILDEAHNIEDCARESASYSVTEVQLRFARDELDSMVNNNIRKKDHEPLRAVCCSLINWLEANAEYLVERDYESACKIWSGNEMLLTLHKMGITTATFPILQGHFSAVLQK.... The pIC50 is 5.5. (2) The drug is CC(C)Nc1cc(-c2cnc3cc(C#N)cnn23)ncc1-c1cc(C2CCOCC2)[nH]n1. The target protein (P48544) has sequence MAGDSRNAMNQDMEIGVTPWDPKKIPKQARDYVPIATDRTRLLAEGKKPRQRYMEKSGKCNVHHGNVQETYRYLSDLFTTLVDLKWRFNLLVFTMVYTVTWLFFGFIWWLIAYIRGDLDHVGDQEWIPCVENLSGFVSAFLFSIETETTIGYGFRVITEKCPEGIILLLVQAILGSIVNAFMVGCMFVKISQPKKRAETLMFSNNAVISMRDEKLCLMFRVGDLRNSHIVEASIRAKLIKSRQTKEGEFIPLNQTDINVGFDTGDDRLFLVSPLIISHEINQKSPFWEMSQAQLHQEEFEVVVILEGMVEATGMTCQARSSYMDTEVLWGHRFTPVLTLEKGFYEVDYNTFHDTYETNTPSCCAKELAEMKREGRLLQYLPSPPLLGGCAEAGLDAEAEQNEEDEPKGLGGSREARGSV. The pIC50 is 8.3. (3) The small molecule is Cc1c(C(=O)Nc2c(C)n(C)n(C3CCCCC3)c2=O)noc1/C=C/C(C)(C)C. The pIC50 is 7.2. 
The target protein sequence is RPKDLKKRLMVKFRGEEGLDYGGVAREWLYLLCHEMLNPYYGLFQYSTDNIYMLQINPDSSINPDHLSYFHFVGRIMGLAVFHGHYINGGFTVPFYKQLLGKPIQLSDLESVDPELHKSLVWILENDITPVLDHTFCVEHNAFGRILQHELKPNGRNVPVTEENKKEYVRLYVNWRFMRGIEAQFLALQKGFNELIPQHLLKPFDQKELELIIGGLDKIDLNDWKSNTRLKHCVADSNIVRWFWQAVETFDEERRARLLQFVTGSTRVPLQGFKALQGSTGAAGPRLFTIHLIDANTDNLPKAHTCFNRIDIPPYESYEKLYEKLLTAVEETCGFAVE. (4) The drug is CC(C)(C)OC(=O)N1CCN(C(=O)/C=C\C(=O)Nc2cccc(C(N)=O)c2)CC1. The target protein (Q460N5) has sequence MAVPGSFPLLVEGSWGPDPPKNLNTKLQMYFQSPKRSGGGECEVRQDPRSPSRFLVFFYPEDVRQKVLERKNHELVWQGKGTFKLTVQLPATPDEIDHVFEEELLTKESKTKEDVKEPDVSEELDTKLPLDGGLDKMEDIPEECENISSLVAFENLKANVTDIMLILLVENISGLSNDDFQVEIIRDFDVAVVTFQKHIDTIRFVDDCTKHHSIKQLQLSPRLLEVTNTIRVENLPPGADDYSLKLFFENPYNGGGRVANVEYFPEESSALIEFFDRKVLDTIMATKLDFNKMPLSVFPYYASLGTALYGKEKPLIKLPAPFEESLDLPLWKFLQKKNHLIEEINDEMRRCHCELTWSQLSGKVTIRPAATLVNEGRPRIKTWQADTSTTLSSIRSKYKVNPIKVDPTMWDTIKNDVKDDRILIEFDTLKEMVILAGKSEDVQSIEVQVRELIESTTQKIKREEQSLKEKMIISPGRYFLLCHSSLLDHLLTECPEIEIC.... The pIC50 is 6.4. (5) The small molecule is COc1cc(C(C#Cc2c(C)nc(N)nc2N)OC)cc(OC)c1OC. The target protein sequence is MSEKNVSIVVAASVLSSGIGINGQLPWSISEDLKFFSKITNNKCDSNKKNALIMGRKTWDSIGRRPLKNRIIVVISSSLPQDEADPNVVVFRNLEDSIENLMNDDSIENIFVCGGESIYRDALKDNFVDRIYLTRVALEDIEFDTYFPEIPETFLPVYMSQTFCTKNISYDFMIFEKQEKKTLQNCDPARGQLKSIDDTVDLLGEIFGIRKMGNRHKFPKEEIYNTPSIRFGREHYEFQYLDLLSRVLENGAYRENRTGISTYSIFGQMMRFDMRESFPLLTTKKVAIRSIFEELIWFIKGDTNGNHLIEKKVYIWSGNGSKEYLERIGLGHREENDLGPIYGFQWRHYNGEYKTMHDDYTGVGVDQLAKLIETLKNNPKDRRHILTAWNPSALSQMALPPCHVLSQYYVTNDNCLSCNLYQRSCDLGLGSPFNIASYAILTMMLAQVCGYEPGELAIFIGDAHIYENHLTQLKEQLSRTPRPFPQLKFKRKVENIEDFK.... The pIC50 is 5.3. (6) The drug is CC1(C)CC[C@]2(C(=O)O)CC[C@]3(C)C(=CC[C@@H]4[C@@]5(C)CCC(=O)C(C)(C)[C@@H]5CC[C@]43C)[C@@H]2C1. 
The target protein (P13726) has sequence METPAWPRVPRPETAVARTLLLGWVFAQVAGASGTTNTVAAYNLTWKSTNFKTILEWEPKPVNQVYTVQISTKSGDWKSKCFYTTDTECDLTDEIVKDVKQTYLARVFSYPAGNVESTGSAGEPLYENSPEFTPYLETNLGQPTIQSFEQVGTKVNVTVEDERTLVRRNNTFLSLRDVFGKDLIYTLYYWKSSSSGKKTAKTNTNEFLIDVDKGENYCFSVQAVIPSRTVNRKSTDSPVECMGQEKGEFREIFYIIGAVVFVVIILVIILAISLHKCRKAGVGQSWKENSPLNVS. The pIC50 is 9.5. (7) The small molecule is O=C(CNC(=O)[C@@H]1COc2ccccc2O1)NN=Cc1cccs1. The target protein sequence is MEYSPNEVIKQEREVFVGKEKSGSKFKRKRSIFIVLTVSICFMFALMLFYFTRNENNKTLFTNSLSNNINDDYIINSLLKSESGKKFIVSKLEELISSYDKEKKMRTTGAEENNMNMNGIDDKDNKSVSFVNKKNGNLKVNNNNQVSYSNLFDTKFLMDNLETVNLFYIFLKENNKKYETSEEMQKRFIIFSENYRKIELHNKKTNSLYKRGMNKFGDLSPEEFRSKYLNLKTHGPFKTLSPPVSYEANYEDVIKKYKPADAKLDRIAYDWRLHGGVTPVKDQALCGSCWAFSSVGSVESQYAIRKKALFLFSEQELVDCSVKNNGCYGGYITNAFDDMIDLGGLCSQDDYPYVSNLPETCNLKRCNERYTIKSYVSIPDDKFKEALRYLGPISISIAASDDFAFYRGGFYDGECGAAPNHAVILVGYGMKDIYNEDTGRMEKFYYYIIKNSWGSDWGEGGYINLETDENGYKKTCSIGTEAYVPLLE. The pIC50 is 4.3. (8) The drug is O=C1NCc2nc(Sc3ccccc3)ccc2N1c1c(Cl)cccc1Cl. The target protein sequence is MSQERPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGLRVAVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDHVQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQTVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLTQMPKMNFANVFIGANPLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPPLDQEEMES. The pIC50 is 6.6.
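For reference, the pIC50 definition in the task description (pIC50 = -log10(IC50 in M)) can be sketched as a pair of conversion helpers. This is a minimal illustration, not part of the dataset: the function names are hypothetical, and the first helper assumes IC50 is supplied in molar units.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in molar units to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nanomolar(pic50: float) -> float:
    """Invert the definition and report IC50 in nanomolar.

    IC50 in M is 10 ** (-pIC50); multiplying by 1e9 nM per M
    gives 10 ** (9 - pIC50).
    """
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # pIC50 labels that appear in the examples above
    for label in (4.3, 5.5, 9.5):
        print(f"pIC50 {label} -> IC50 of about {pic50_to_ic50_nanomolar(label):.3g} nM")
```

Under this definition each unit of pIC50 is a tenfold change in IC50, so the label 9.5 in example (6) corresponds to an IC50 of roughly 0.3 nM, while 4.3 in example (7) corresponds to roughly 50 µM.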