Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The small molecule is Cc1ccc(S(=O)(=O)OCC(=O)Nc2ccc(C(=O)O)c(O)c2)cc1. The target protein (P42225) has sequence MSQWFELQQLDSKFLEQVHQLYDDSFPMEIRQYLAQWLEKQDWEHAAYDVSFATIRFHDLLSQLDDQYSRFSLENNFLLQHNIRKSKRNLQDNFQEDPVQMSMIIYNCLKEERKILENAQRFNQAQEGNIQNTVMLDKQKELDSKVRNVKDQVMCIEQEIKTLEELQDEYDFKCKTSQNREGEANGVAKSDQKQEQLLLHKMFLMLDNKRKEIIHKIRELLNSIELTQNTLINDELVEWKRRQQSACIGGPPNACLDQLQTWFTIVAETLQQIRQQLKKLEELEQKFTYEPDPITKNKQVLSDRTFLLFQQLIQSSFVVERQPCMPTHPQRPLVLKTGVQFTVKSRLLVKLQESNLLTKVKCHFDKDVNEKNTVKGFRKFNILGTHTKVMNMEESTNGSLAAELRHLQLKEQKNAGNRTNEGPLIVTEELHSLSFETQLCQPGLVIDLETTSLPVVVISNVSQLPSGWASILWYNMLVTEPRNLSFFLNPPCAWWSQLSE.... The pIC50 is 3.5. (2) The compound is Cc1nc2ccccc2n1CCCSCCCSc1nc(-c2ccccc2)c(-c2ccccc2)[nH]1. The target protein (Q61263) has sequence MSLRNRLSKSGENPEQDEAQKNFMDTYRNGHITMKQLIAKKRLLAAEAEELKPLFMKEVGCHFDDFVTNLIEKSASLDNGGCALTTFSILEEMKKNHRAKDLRAPPEQGKIFISRQSLLDELFEVDHIRTIYHMFIALLILFVLSTIVVDYIDEGRLVLEFNLLAYAFGKFPTVIWTWWAMFLSTLSIPYFLFQRWAHGYSKSSHPLIYSLVHGLLFLVFQLGVLGFVPTYVVLAYTLPPASRFILILEQIRLIMKAHSFVRENIPRVLNAAKEKSSKDPLPTVNQYLYFLFAPTLIYRDNYPRTPTVRWGYVAMQFLQVFGCLFYVYYIFERLCAPLFRNIKQEPFSARVLVLCVFNSILPGVLILFLSFFAFLHCWLNAFAEMLRFGDRMFYKDWWNSTSYSNYYRTWNVVVHDWLYYYVYKDLLWFFSKRFKSAAMLAVFALSAVVHEYALAICLSYFYPVLFVLFMFFGMAFNFIVNDSRKRPIWNIMVWASLFLG.... The pIC50 is 5.1. (3) The drug is CSCC[C@H](NC(=O)[C@@H](Cc1ccc(C)cc1C)NC(=O)[C@@H](NC(=O)[C@@H](N)CS)C(C)C)C(=O)O. 
The target protein (P29702) has sequence MAAADGVGEAAQGGDPGQPEPPPPPQPHPPPPPPQPPQEEAAAASPIDDGFLSLDSPTYVLYRDRPEWADIDPVPQNDGPNPVVQIIYSEKFQDVYDYFRAVLQRDERSERAFKLTRDAIELNAANYTVWHFRRVLLKSLQKDLHEEMNYISAIIEEQPKNYQVWHHRRVLVEWLRDPSQELEFIADILTQDAKNYHAWQHRQWVIQEFKLWDNELQYVDQLLKEDVRNNSVWNQRYFVISNTTGYNDRAILEREVQYTLEMIKLVPHNESAWNYLKGILQDRGLSKYPNLLNQLLDLQPSHSSPYLIAFLVDIYEDMLENQCDNKEDILNKALELCEILAKEKDTIRKEYWRYIGRSLQSKHSTESDPPTNVQQ. The pIC50 is 5.4. (4) The compound is NS(=O)(=O)c1ccc(CCNC(=S)Nc2cccc(C(F)(F)F)c2)cc1. The target protein (P00959) has sequence MTQVAKKILVTCALPYANGSIHLGHMLEHIQADVWVRYQRMRGHEVNFICADDAHGTPIMLKAQQLGITPEQMIGEMSQEHQTDFAGFNISYDNYHSTHSEENRQLSELIYSRLKENGFIKNRTISQLYDPEKGMFLPDRFVKGTCPKCKSPDQYGDNCEVCGATYSPTELIEPKSVVSGATPVMRDSEHFFFDLPSFSEMLQAWTRSGALQEQVANKMQEWFESGLQQWDISRDAPYFGFEIPNAPGKYFYVWLDAPIGYMGSFKNLCDKRGDSVSFDEYWKKDSTAELYHFIGKDIVYFHSLFWPAMLEGSNFRKPSNLFVHGYVTVNGAKMSKSRGTFIKASTWLNHFDADSLRYYYTAKLSSRIDDIDLNLEDFVQRVNADIVNKVVNLASRNAGFINKRFDGVLASELADPQLYKTFTDAAEVIGEAWESREFGKAVREIMALADLANRYVDEQAPWVVAKQEGRDADLQAICSMGINLFRVLMTYLKPVLPKLT.... The pIC50 is 4.3. (5) The compound is Cc1ccc(CNC(=O)c2cc(O[C@@H]3CCOC3)cc(-c3ncc(C)s3)c2)nn1. The target protein (Q9UBL9) has sequence MAAAQPKYPAGATARRLARGCWSALWDYETPKVIVVRNRRLGVLYRAVQLLILLYFVWYVFIVQKSYQESETGPESSIITKVKGITTSEHKVWDVEEYVKPPEGGSVFSIITRVEATHSQTQGTCPESIRVHNATCLSDADCVAGELDMLGNGLRTGRCVPYYQGPSKTCEVFGWCPVEDGASVSQFLGTMAPNFTILIKNSIHYPKFHFSKGNIADRTDGYLKRCTFHEASDLYCPIFKLGFIVEKAGESFTELAHKGGVIGVIINWDCDLDLPASECNPKYSFRRLDPKHVPASSGYNFRFAKYYKINGTTTRTLIKAYGIRIDVIVHGQAGKFSLIPTIINLATALTSVGVGSFLCDWILLTFMNKNKVYSHKKFDKVCTPSHPSGSWPVTLARVLGQAPPEPGHRSEDQHPSPPSGQEGQQGAECGPAFPPLRPCPISAPSEQMVDTPASEPAQASTPTDPKGLAQL. The pIC50 is 6.0. (6) The small molecule is COCCNc1ccc2c3c(cccc13)C(=O)N(c1cccc(Br)c1)C2=O. 
The target protein sequence is MSTNPKPQRKTKRNTNRRPQDVKFPGGGQIVGGVYLLPRRGPRLGVRATRKTSERSQPRGRRQPIPKDRRSTGKSWGKPGYPWPLYGNEGCGWAGWLLSPRGSRPTWGPTDPRHRSRNLGRVIDTITCGFADLMGYIPVVGAPVGGVARALAHGVRVLEDGINYATGNLPGCSFSIFLLALLSCVTVPVSAVEVRNISSSYYATNDCSNNSITWQLTDAVLHLPGCVPCENDNGTLHCWIQVTPNVAVKHRGALTRSLRTHVDMIVMAATACSALYVGDVCGAVMILSQAFMVSPQRHNFTQECNCSIYQGHITGHRMAWDMMLSWSPTLTMILAYAARVPELVLEIIFGGHWGVVFGLAYFSMQGAWAKVIAILLLVAGVDATTYSSGQEAGRTVAGFAGLFTTGAKQNLYLINTNGSWHINRTALNCNDSLQTGFLASLFYTHKFNSSGCPERLSSCRGLDDFRIGWGTLEYETNVTNDGDMRPYCWHYPPRPCGIVP.... The pIC50 is 6.9. (7) The drug is Cc1cc(OCC(=O)Nc2cccnc2)c2c(C)c(C)c(=O)oc2c1. The target protein (Q62848) has sequence MASPRTRKVLKEVRAQDENNVCFECGAFNPQWVSVTYGIWICLECSGRHRGLGVHLSFVRSVTMDKWKDIELEKMKAGGNAKFREFLEAQDDYEPSWSLQDKYSSRAAALFRDKVATLAEGKEWSLESSPAQNWTPPQPKTLQFTAHRPAGQPQNVTTSGDKAFEDWLNDDLGSYQGAQENRYVGFGNTVPPQKREDDFLNSAMSSLYSGWSSFTTGASKFASAAKEGATKFGSQASQKASELGHSLNENVLKPAQEKVKEGRIFDDVSSGVSQLASKVQGVGSKGWRDVTTFFSGKAEDTSDRPLEGHSYQNSSGDNSQNSTIDQSFWETFGSAEPPKAKSPSSDSWTCADASTGRRSSDSWDIWGSGSASNNKNSNSDGWESWEGASGEGRAKATKKAAPSTAADEGWDNQNW. The pIC50 is 4.2. (8) The drug is CN[C@@H](C)C(=O)N[C@H]1CN(C(=O)c2ccc(C(C)O)cc2)c2ccccc2N(Cc2c(OC)ccc3c(Br)cccc23)C1=O. The target protein sequence is MRHHHHHHRSDAVSSDRNFPNSTNLPRNPSMADYEARIFTFGTWIYSVNKEQLARAGFYALGEGDKVKCFHCGGGLTDWKPSEDPWEQHAKWYPGCKYLLEQKGQEYINNIHLTHSLEECLVRTT. The pIC50 is 4.3. (9) The small molecule is CC1(C)[C@H](C(=O)O)N2C(=O)C[C@H]2S1(=O)=O. The target protein sequence is MRFKKISCLLLPPLFIFSSSIYAGNTPKEQEIKKLVDQNFKPLLEKYDVPGMAVGVIQNNKKYEMYYGLQSVQDKKAVNSNTIFELGSVSKLFTATAGGYAKTKGTISFNDTPGKYWKELKNTPIDQVNLLQLATYTSGNLALQFPDEVKTDQQVLTFFKEWKPKNPIGEYRQYSNPSIGLFGKVVALSMNKPFDQVLEKTIFPDLGLKHSYVNVPKTQMQNYAFGYNQENQPIRVNPGPLDAPAYGVKSTLPDMLSFINANLNPQKYPANIQRAINETHQGFYQVGTMYQALGWEEFSYPALLQTLLDSNSEQIVMKPNKVTAISKEPSVKMFHKTGSTNGFGTYVVFIPKENIGLVMLTNKRIPNEERIKAAYAVLNAIKK. The pIC50 is 5.1.
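The pIC50 definition stated in the task description (pIC50 = -log10(IC50 in M)) can be sketched as a small conversion helper. This is a minimal illustration only; the function names `ic50_molar_to_pic50` and `pic50_to_ic50_molar` are hypothetical and are not part of BindingDB or of any dataset tooling.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_molar(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # pIC50 values taken from the examples listed above.
    for pic50 in (3.5, 5.1, 6.0):
        ic50_um = pic50_to_ic50_molar(pic50) * 1e6  # mol/L -> micromolar
        print(f"pIC50 {pic50} corresponds to an IC50 of {ic50_um:.3g} uM")
```

Under this definition, a pIC50 of 6.0 corresponds to an IC50 of 1 µM and a pIC50 of 3.5 to roughly 316 µM, which is why higher values mean more potent compounds.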