Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is COCCc1cccc(C[C@H](O)CC[C@H]2CCC(=O)N2CCCCCCC(=O)O)c1. The target protein (P43114) has sequence MSIPGVNASFSSTPERLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKEQKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGDQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYFYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGRSERQYPGTWCFIDWTTNVTAYAAFSYMYAGFSSFLILATVLCNVLVCGALLRMLRQFMRRTSLGTEQHHAAAAAAVASVACRGHAAASPALQRLSDFRRRRSFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPSVVKDISRNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSGRDGSAQHCSESRRTSSAMSGHSRSFLSRELREISSTSHTLLYLPDLTESSLGGKNLLPGTHGMGLTQADTTSLRTLRISETSDSSQGQDSESVLLVDEVSGSQREEPASKGNSLQVTFPSETLKLSEKCI. The pIC50 is 6.7. (2) The small molecule is Cc1ccc2cc(C(=O)O)[nH]c2c1. The target protein (P38919) has sequence MATTATMATSGSARKRLLKEEDMTKVEFETSEEVDVTPTFDTMGLREDLLRGIYAYGFEKPSAIQQRAIKQIIKGRDVIAQSQSGTGKTATFSISVLQCLDIQVRETQALILAPTRELAVQIQKGLLALGDYMNVQCHACIGGTNVGEDIRKLDYGQHVVAGTPGRVFDMIRRRSLRTRAIKMLVLDEADEMLNKGFKEQIYDVYRYLPPATQVVLISATLPHEILEMTNKFMTDPIRILVKRDELTLEGIKQFFVAVEREEWKFDTLCDLYDTLTITQAVIFCNTKRKVDWLTEKMREANFTVSSMHGDMPQKERESIMKEFRSGASRVLISTDVWARGLDVPQVSLIINYDLPNNRELYIHRIGRSGRYGRKGVAINFVKNDDIRILRDIEQYYSTQIDEMPMNVADLI. The pIC50 is 4.0. 
(3) The target protein (Q28021) has sequence MSRPPPTGKMPGAPEAVSGDGAGASRQRKLEALIRDPRSPINVESLLDGLNPLVLDLDFPALRKNKNIDNFLNRYEKIVKKIRGLQMKAEDYDVVKVIGRGAFGEVQLVRHKASQKVYAMKLLSKFEMIKRSDSAFFWEERDIMAFANSPWVVQLFCAFQDDKYLYMVMEYMPGGDLVNLMSNYDVPEKWAKFYTAEVVLALDAIHSMGLIHRDVKPDNMLLDKHGHLKLADFGTCMKMDETGMVHCDTAVGTPDYISPEVLKSQGGDGYYGRECDWWSVGVFLFEMLVGDTPFYADSLVGTYSKIMDHKNSLCFPEDAEISKHAKNLICAFLTDREVRLGRNGVEEIKQHPFFKNDQWNWDNIRETAAPVVPELSSDIDSSNFDDIEDDKGDVETFPIPKAFVGNQLPFIGFTYYRENLLLSDSPSCKENDSIQSRKNEESQEIQKKLYTLEEHLSTEIQAKEELEQKCKSVNTRLEKVAKELEEEITLRKNVESTLRQ.... The pIC50 is 6.3. The drug is O=S(=O)(c1cccc2cnccc12)N1CCCNCC1. (4) The compound is CC(=O)C1=C(O)C(=O)N(CCc2c[nH]c3ccccc23)C1c1ccc(C)cc1O. The target protein (Q8VIJ4) has sequence MATIEEIAHQIIDQQMGEIVTEQQTGQKIQIVTALDHSTQGKQFILANHEGSTPGKVFLTTPDAAGVNQLFFASPDLSTPHLQLLTENSPDQGPNKVFDLCVVCGDKASGRHYGAITCEGCKGFFKRSIRKNLVYSCRGSKDCIINKHHRNRCQYCRLQRCIAFGMKQDSVQCERKPIEVSREKSSNCAASTEKIYIRKDLRSPLAATPTFVTDSETARSTGLLDSGMFVNIHPSGIKTEPALLMTPDKAESCQGDLGTLASVVTSLANLGKAKDLSHCGGDLPVVQSLRNGDTSFGAFHQDIQTNGDVSRAFDNLAKALTPGENPACQSPGESMEGSTHLIAGEPSCMEREGPLLSDSHVVFRLTMPSPMPEYLNVHYIGESASRLLFLSMHWALSIPSFQALGQENSISLVKAYWNELFTLGLAQCWQVMNVATILATFVNCLHNSLQQDKMSPERRKLLMEHIFKLQEFCNSMVKLCIDGHEYAYLKAIVLFSPDHP.... The pIC50 is 4.8. (5) The pIC50 is 4.2. The target protein sequence is MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGLDALGHAVDALTAFAGHSISLQLISATQTDGSGKGKVGNEAYLEKHLPTLPTLGARQEAFDINFEWDASFGIPGAFYIKNFMTDEFFLVSVKLEDIPNHGTINFVCNSWVYNFKSYKKNRIFFVNDTYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDRIYDYDIYNDLGNPDGGDPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLKSSDFLTYGIKSLSQNVIPLFKSIILNLRVTSSEFDSFDEVRGLFEGGIKLPTNILSQISPLPVLKEIFRTDGENTLQFPPPHVIRVSKSGWMTDDEFAREMIAGVNPNVIRRLQEFPPKSTLDPATYGDQTSTITKQQLEINLGGVTVEEAISAHRLFILDYHDAFFPYLTKINSLPIAKAYATRTILFLKDDGSLKPLAIELSKPATVSKVVLPATEGVESTIW.... The compound is COc1cccc(/C=C/C(=O)/C=C/c2cccc(OC)c2OC)c1OC. (6) The small molecule is CCCCCCCN(CCCCCSc1nc[nH]n1)C(=O)Nc1ccc(F)cc1F. 
The target protein (Q61263) has sequence MSLRNRLSKSGENPEQDEAQKNFMDTYRNGHITMKQLIAKKRLLAAEAEELKPLFMKEVGCHFDDFVTNLIEKSASLDNGGCALTTFSILEEMKKNHRAKDLRAPPEQGKIFISRQSLLDELFEVDHIRTIYHMFIALLILFVLSTIVVDYIDEGRLVLEFNLLAYAFGKFPTVIWTWWAMFLSTLSIPYFLFQRWAHGYSKSSHPLIYSLVHGLLFLVFQLGVLGFVPTYVVLAYTLPPASRFILILEQIRLIMKAHSFVRENIPRVLNAAKEKSSKDPLPTVNQYLYFLFAPTLIYRDNYPRTPTVRWGYVAMQFLQVFGCLFYVYYIFERLCAPLFRNIKQEPFSARVLVLCVFNSILPGVLILFLSFFAFLHCWLNAFAEMLRFGDRMFYKDWWNSTSYSNYYRTWNVVVHDWLYYYVYKDLLWFFSKRFKSAAMLAVFALSAVVHEYALAICLSYFYPVLFVLFMFFGMAFNFIVNDSRKRPIWNIMVWASLFLG.... The pIC50 is 4.6. (7) The compound is COc1ccc2c(c1)c(CC(=O)O)c(C)n2C(=O)c1ccc(Cl)cc1. The target protein sequence is ANPCCSNPCQNRGECMSTGFDQYKCDCTRTGFYGENCTTPEFLTRIKLLLKPTPNTVHYILTHFKGVWNIVNNIPFLRSLIMKYVLTSQSYLIDSPPTYNVHYGYKSWEAFSNLSYYTRALPPVADDCPTPMGVKGNKELPDSKEVLEKVLLRREFIPDPQGSNMMFAFFAQHFTHQFFKTDHKRGPGFTRGLGHGVDLNHIYGETLDRQHKLRLFKDGKLKYQVIGGEVYPPTVKDTQVEMIYPPHIPENLQFAVGQEVFGLVPGLMMYATIWLREHNRVCDILKQEHPEWGDEQLFQTSRLILIGETIKIVIEDYVQHLSGYHFKLKFDPELLFNQQFQYQNRIASEFNTLYHWHPLLPDTFNIEDQEYSFKQFLYNNSILLEHGLTQFVESFTRQIAGRVAGGRNVPIAVQAVAKASIDQSREMKYQSLNEYRKRFSLKPYTSFEELTGEKEMAAELKALYSDIDVMELYPALLVEKPRPDAIFGETMVELGAPFSL.... The pIC50 is 6.5. (8) The drug is CS(=O)(=O)c1ccc(-c2cnc(NCc3ccco3)n3cnnc23)cc1. The target protein sequence is TTNVGDSTLADLLDHSCTSGSGSGLPFLVQRTVARQITLLECVGKGRYGEVWRGSWQGENVAVKIFSSRDEKSWFRETELYNTVMLRHENILGFIASDMTSRHSSTQLWLITHYHEMGSLYDYLQLTTLDTVSCLRIVLSIASGLAHLHIEIFGTQGKPAIAHRDLKSKNILVKKNGQCCIADLGLAVMHSQSTNQLDVGNNPRVGTKRYMAPEVLDETIQVDCFDSYKRVDIWAFGLVLWEVARRMVSNGIVEDYKPPFYDVVPNDPSFEDMRKVVCVDQQRPNIPNRWFSDPTLTSLAKLMKECWYQNPSARLTALRIKKTLTKID. The pIC50 is 5.0. (9) The compound is CC1=CC(=O)c2ccccc2C1=O. 
The target protein (Q5FB27) has sequence MDRASELLFYVNGRKVIEKNVDPETMLLPYLRKKLRLTGTKYGCGGGGCGACTVMISRYNPITNRIRHHPANACLIPICSLYGTAVTTVEGIGSTHTRIHPVQERIAKCHGTQCGFCTPGMVMSIYTLLRNHPEPTLDQLTDALGGNLCRCTGYRPIIDACKTFCETSGCCQSKENGVCCLDQRINGLPEFEEGSKTSPKLFAEEEFLPLDPTQELIFPPELMIMAEKQPQRTRVFGSERMMWFSPVTLKELLEFKFKYPQAPVIMGNTSVGPQMKFKGVFHPVIISPDRIEELSVVNHTHNGLTLGAGLSLAQVKDILADVVQKLPGEKTQTYHALLKHLGTLAGSQIRNMASLGGHIISRHPDSDLNPILAVGNCTLNLLSKEGKRQIPLNEQFLSKCPNADLKPQEILVSVNIPYSRKLEFVSAFRQAQRQENALAIVNSGMRVFFGEGHGIIRELSISYGGVGPATICAKNSCQKLIGRHWNEEMLDTACRLVLEE.... The pIC50 is 7.0. (10) The target protein sequence is KNPFSTGDTDLDLEMLAPYIPMDDDFQLRSFDQLSNGQTKPLPALKLALEYIVPCMNKHGICVVDDFLGKETGQQIGDEVRALHDTGKFTDGQLVSQKSDSSKDIRGDKITWIEGKEPGCETIGLLMSSMDDLIRHCNGKLGSYKINGRTKAMVACYPGNGTGYVRHVDNPNGDGRCVTCIYYLNKDWDAKVSGGILRIFPEGKAQFADIEPKFDRLLFFWSDRRNPHEVQPAYATRYAITVWYFDADERARAKVKYLTGEKGVRVELNKPSDSVGKDVF. The compound is CN(C)c1nc(-n2cc(C(=O)O)cn2)nc2cc(F)c(OC3CCCCC3)cc12. The pIC50 is 7.1.
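
The pIC50 definition given in the task statement (pIC50 = -log10(IC50 in M)) can be sketched in code as follows. This is a minimal illustration, not part of the dataset: the function names `pic50_from_ic50_molar` and `ic50_nanomolar_from_pic50` are hypothetical and chosen only for this example.

```python
import math


def pic50_from_ic50_molar(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M).

    Hypothetical helper for illustration; not a BindingDB API.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def ic50_nanomolar_from_pic50(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar."""
    ic50_molar = 10.0 ** (-pic50)
    return ic50_molar * 1e9  # 1 M = 1e9 nM


if __name__ == "__main__":
    # A 100 nM inhibitor (1e-7 M) corresponds to a pIC50 of 7.
    print(pic50_from_ic50_molar(1e-7))
    # Example (1) above reports pIC50 = 6.7; recover its IC50 in nM.
    print(ic50_nanomolar_from_pic50(6.7))
```

Under this definition, each unit increase in pIC50 is a tenfold drop in IC50, so the pIC50 of 6.7 in example (1) corresponds to an IC50 of roughly 200 nM, while the pIC50 of 4.0 in example (2) corresponds to 100 µM.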