Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. From a dataset of Drug-target binding data from BindingDB using IC50 measurements. (1) The drug is COc1cc(-c2ccc(=O)[nH]n2)ccc1OC(F)F. The target protein sequence is HMKVSDDEYTKLLHDGIQPVAAIDSNFASFTYTPRSLPEDDTSMAILSMLQDMNFINNYKIDCPTLARFCLMVKKGYRDPPYHNWMHAFSVSHFCYLLYKNLELTNYLEDIEIFALFISCMCHDLDHRGTNNSFQVASKSVLAALYSSEGSVMERHHFAQAIAILNTHGCNIFDHFSRKDYQRMLDLMRDIILATDLAHHLRIFKDLQKMAEVGYDRNNKQHHRLLLCLLMTSCDLSDQTKGWKTTRKIAELIYKEFFSQGDLEKAMGNRPMEMMDREKAYIPELQISFMEHIAMPIYKLLQDLFPKAAELYERVASNREHWTKVSHKFTIRGLPSNNSLDFLDEEYEVPDLDGTRAPINGCCSLDAE. The pIC50 is 3.7. (2) The drug is COC(=O)c1c(-c2cc(C)on2)n[nH]c1C. The target protein sequence is MGSSHHHHHHSSGLVPRGSHMTEQEDVLAKELEDVNKWGLHVFRIAELSGNRPLTVIMHTIFQERDLLKTFKIPVDTLITYLMTLEDHYHADVAYHNNIHAADVVQSTHVLLSTPALEAVFTDLEILAAIFASAIHDVDHPGVSNQFLINTNSELALMYNDSSVLENHHLAVGFKLLQEENCDIFQNLTKKQRQSLRKMVIDIVLATDMSKHMNLLADLKTMVETKKVTSSGVLLLDNYSDRIQVLQNMVHCADLSNPTKPLQLYRQWTDRIMEEFFRQGDRERERGMEISPMCDKHNASVEKSQVGFIDYIVHPLWETWADLVHPDAQDILDTLEDNREWYQSTIPQS. The pIC50 is 3.7. (3) The drug is CO/N=C(\C(=O)NCP(=O)(O)Oc1ccc(C#N)c(F)c1)c1cccs1. The target protein (P0A3M1) has sequence MRYIRLCIISLLATLPLAVHASPQPLEQIKLSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGASKRGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAALIEHWQR. The pIC50 is 4.9. (4) The compound is Nc1ncnc2c1c(-c1ccc3nc(Cc4ccccc4Cl)[nH]c3c1)nn2[C@H]1CC[C@H](N2CCOCC2)CC1. 
The target protein sequence is RHIVRKRTLRRLLQERELVEPLTPSGEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLITQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAARNVLVKTPQHVKITDFGRAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQQGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTEDSIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLNTVQPTCVNSTFDSPAHWAQKGSHQISLDNP.... The pIC50 is 7.2. (5) The small molecule is CNC(=O)c1c(-c2ccc(F)cc2)oc2cc(N3CCCC3=O)c(-c3ccc4c(n3)-c3cc5cc(F)ccc5n3CO4)cc12. The target protein (P26662) has sequence MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGRRQPIPKARRPEGRTWAQPGYPWPLYGNEGMGWAGWLLSPRGSRPSWGPTDPRRRSRNLGKVIDTLTCGFADLMGYIPLVGAPLGGAARALAHGVRVLEDGVNYATGNLPGCSFSIFLLALLSCLTIPASAYEVRNVSGIYHVTNDCSNSSIVYEAADMIMHTPGCVPCVRESNFSRCWVALTPTLAARNSSIPTTTIRRHVDLLVGAAALCSAMYVGDLCGSVFLVSQLFTFSPRRYETVQDCNCSIYPGHVSGHRMAWDMMMNWSPTTALVVSQLLRIPQAVVDMVAGAHWGVLAGLAYYSMVGNWAKVLIVMLLFAGVDGHTHVTGGRVASSTQSLVSWLSQGPSQKIQLVNTNGSWHINRTALNCNDSLQTGFIAALFYAHRFNASGCPERMASCRPIDEFAQGWGPITHDMPESSDQRPYCWHYAPRPCGIVPAS.... The pIC50 is 8.3. (6) The drug is CO/N=C(\C(=O)NCP(=O)(O)Oc1ccc(C#N)c(F)c1)c1cnc(NC(=O)C(Cl)(Cl)Cl)s1. The target protein (P0A3M1) has sequence MRYIRLCIISLLATLPLAVHASPQPLEQIKLSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGASKRGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAALIEHWQR. The pIC50 is 4.8. (7) The pIC50 is 7.0. 
The target protein (P16473) has sequence MRPADLLQLVLLLDLPRDLGGMGCSSPPCECHQEEDFRVTCKDIQRIPSLPPSTQTLKLIETHLRTIPSHAFSNLPNISRIYVSIDVTLQQLESHSFYNLSKVTHIEIRNTRNLTYIDPDALKELPLLKFLGIFNTGLKMFPDLTKVYSTDIFFILEITDNPYMTSIPVNAFQGLCNETLTLKLYNNGFTSVQGYAFNGTKLDAVYLNKNKYLTVIDKDAFGGVYSGPSLLDVSQTSVTALPSKGLEHLKELIARNTWTLKKLPLSLSFLHLTRADLSYPSHCCAFKNQKKIRGILESLMCNESSMQSLRQRKSVNALNSPLHQEYEENLGDSIVGYKEKSKFQDTHNNAHYYVFFEEQEDEIIGFGQELKNPQEETLQAFDSHYDYTICGDSEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLLILLTSHYKLNVPRFLMCNLAFADFCMGMYLLLIASVDLYTHSEYYNHAIDWQTGPGCNTAGFF.... The drug is CC(=O)N1c2ccc(NC(=O)CCc3cccc(Cl)c3)cc2[C@@](C)(c2ccccc2)CC1(C)C. (8) The drug is CSC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1O. The target protein sequence is MDKLISNNKLKLSVVLLGGLCSLAYYHLKNKFHLSQFCFSKKWFSEFSIMWPGQAFSVEIKKILYETKSKYQNVLVFESTTYGKVLVLDGVIQLTEKDEFAYHEMMTHIPMTVSKEPKNVLVVGGGDGGIIRELCKYKSVENIDICEIDETVIEVSKIYFKNISCGYEDKRVNVFIEDASKFLENVTNTYDVIIVDSSDPIGPAETLFNQNFYEKIYNALKPNGYCVAQCESLWIHVGTIKNMIGYAKKLFKKVEYANISIPTYPCGCIGILCCSKTDTGLTKPNKKLESKEFADLKYYNYENHSAAFKLPAFLLKEIENI. The pIC50 is 3.8. (9) The pIC50 is 8.0. The drug is COc1cc(N2CCC(N3CCN(C)CC3)CC2)ccc1Nc1ncc(Cl)c(Nc2ccccc2S(=O)(=O)C(C)C)n1. The target protein (P97793) has sequence MGAAGFLWLLPPLLLAAASYSGAATDQRAGSPASGPPLQPREPLSYSRLQRKSLAVDFVVPSLFRVYARDLLLPQPRSPSEPEAGGLEARGSLALDCEPLLRLLGPLPGISWADGASSPSPEAGPTLSRVLKGGSVRKLRRAKQLVLELGEETILEGCIGPPEEVAAVGILQFNLSELFSWWILHGEGRLRIRLMPEKKASEVGREGRLSSAIRASQPRLLFQIFGTGHSSMESPSETPSPPGTFMWNLTWTMKDSFPFLSHRSRYGLECSFDFPCELEYSPPLHNHGNQSWSWRHVPSEEASRMNLLDGPEAEHSQEMPRGSFLLLNTSADSKHTILSPWMRSSSDHCTLAVSVHRHLQPSGRYVAQLLPHNEAGREILLVPTPGKHGWTVLQGRVGRPANPFRVALEYISSGNRSLSAVDFFALKNCSEGTSPGSKMALQSSFTCWNGTVLQLGQACDFHQDCAQGEDEGQLCSKLPAGFYCNFENGFCGWTQSPLSP.... (10) The compound is CC(NCc1ccc(N(C)C)cc1)C1CC2CCC1C2. The target is TRQARRNRRRRWRERQR. The pIC50 is 4.1.
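The pIC50 definition given at the top (pIC50 = -log10(IC50 in M)) can be sketched in Python as a minimal illustration. The function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical helpers introduced here and are not part of the dataset or of any BindingDB tooling.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # An IC50 of 100 nM (1e-7 M) corresponds to a pIC50 of about 7.
    print(round(ic50_to_pic50(1e-7), 2))
    # A pIC50 of 3.7, as in examples (1) and (2), is roughly 200 micromolar.
    print(round(pic50_to_ic50_nm(3.7) / 1000, 1), "uM")
```

Because the scale is logarithmic, a one-unit increase in pIC50 corresponds to a tenfold lower IC50, which is why higher values mean a more potent compound.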