Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them as pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements).
(1) The compound is Nc1nc2[nH]cc(CCc3ccc(C(=O)N[C@@H](CCC(=O)O)C(=O)O)cc3)c2c(=O)[nH]1. The target protein (P45352) has sequence MLVEGSELQSGAQQPRTEAPQHGELQYLRQVEHIMRCGFKKEDRTGTGTLSVFGMQARYSLRDEFPLLTTKRVFWKGVLEELLWFIKGSTNAKELSSKGVRIWDANGSRDFLDSLGFSARQEGDLGPVYGFQWRHFGADYKDMDSDYSGQGVDQLQKVIDTIKTNPDDRRIIMCAWNPKDLPLMALPPCHALCQFYVVNGELSCQLYQRSGDMGLGVPFNIASYALLTYMIAHITGLQPGDFVHTLGDAHIYLNHIEPLKIQLQREPRPFPKLRILRKVETIDDFKVEDFQIEGYNPHPTIKMEMAV. The pIC50 is 4.7.
(2) The drug is CC(C)C[C@H](NC(=O)Cc1ccc(NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@H](Cc2cnc[nH]2)NC(=O)CNC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@@H](N)CCCN=C(N)N)C(C)(C)S)[C@@H](C)O)cc1)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CS)C(=O)O. The target protein (P55899) has sequence MGVPRPQPWALGLLLFLLPGSLGAESHLSLLYHLTAVSSPAPGTPAFWVSGWLGPQQYLSYNSLRGEAEPCGAWVWENQVSWYWEKETTDLRIKEKLFLEAFKALGGKGPYTLQGLLGCELGPDNTSVPTAKFALNGEEFMNFDLKQGTWGGDWPEALAISQRWQQQDKAANKELTFLLFSCPHRLREHLERGRGNLEWKEPPSMRLKARPSSPGFSVLTCSAFSFYPPELQLRFLRNGLAAGTGQGDFGPNSDGSFHASSSLTVKSGDEHHYCCIVQHAGLAQPLRVELESPAKSSVLVVGIVIGVLLLTAAAVGGALLWRRMRSGLPAPWISLRGDDTGVLLPTPGEAQDADLKDVNVIPATA. The pIC50 is 3.9.
(3) The target protein (Q8HZR1) has sequence MSRGSRLHRWPLLLLLLLLLPPPPVLPAEARTPAPVNPCCYYPCQHQGICVRFGLDRYQCDCTRTGYSGPNCTIPELWTWLRNSLRPSPSFLHFLLTHGRWFWEFINATFIRDMLMRLVLTARSNLIPSPPTYNIAHDYISWESFSNVSYYTRVLPSVPQDCPTPMGTKGKKQLPDAQLLGRRFLLRRKFIPDPQGTNLMFAFFAQHFTHQFFKTSGKMGPGFTKALGHGVDLGHIYGDNLDRQYQLRLFKDGKLKYQVLDGEMYPPSVEEAPVLMHYPRGILPQSQMAVGQEVFGLLPGLMLYATLWLREHNRVCDLLKAEHPTWGDEQLFQTARLILIGETIKIVIEEYVQQLSGYFLQLKFDPELLFSAQFQYRNRIAMEFNQLYHWHPLMPDSFWVGSQEYSYEQFLFNTSMLTHYGIEALVDAFSRQSAGRIGGGRNIDHHVLHVAVETIKESRELRLQPFNEYRKRFGMRPYMSFQELTGEKEMAAELEELYGD.... The pIC50 is 4.5. The compound is CS(=O)(=O)c1ccc(-n2nc(C(F)(F)F)cc2-c2ccc(Br)cc2)nc1.
(4) The compound is CCCOc1ccc(NC(=O)CC2C(=O)N(C)C(=S)N2Cc2ccc3c(c2)OCO3)cc1. The target protein (O00482) has sequence MSSNSDTGDLQESLKHGLTPIGAGLPDRHGSPIPARGRLVMLPKVETEALGLARSHGEQGQMPENMQVSQFKMVNYSYDEDLEELCPVCGDKVSGYHYGLLTCESCKGFFKRTVQNNKRYTCIENQNCQIDKTQRKRCPYCRFQKCLSVGMKLEAVRADRMRGGRNKFGPMYKRDRALKQQKKALIRANGLKLEAMSQVIQAMPSDLTISSAIQNIHSASKGLPLNHAALPPTDYDRSPFVTSPISMTMPPHGSLQGYQTYGHFPSRAIKSEYPDPYTSSPESIMGYSYMDSYQTSSPASIPHLILELLKCEPDEPQVQAKIMAYLQQEQANRSKHEKLSTFGLMCKMADQTLFSIVEWARSSIFFRELKVDDQMKLLQNCWSELLILDHIYRQVVHGKEGSIFLVTGQQVDYSIIASQAGATLNNLMSHAQELVAKLRSLQFDQREFVCLKFLVLFSLDVKNLENFQLVEGVQEQVNAALLDYTMCNYPQQTEKFGQLL.... The pIC50 is 5.8.
(5) The drug is COC(=O)c1c(OCc2ccc(-c3ccccc3-c3nnn[nH]3)cc2)cc(C)nc1C. The target protein (Q9WV26) has sequence MILNSSTEDGIKRIQDDCPKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADICFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMKSRLRRTMLVAKVTCVIIWLMAGLASLPAVIHRNVFFIENTNITVCAFHYESQNSTLPIGLGLTKNILGFMFPFLIILTSYTLIWKALKKAYEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGIIHDCKISDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSTLSTKMSTLSYRPSDNVSSSAKKPVQCFEVE. The pIC50 is 7.3.
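
The pIC50 definition in the task statement (pIC50 = -log10(IC50 in M)) can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical, and the only assumption is that IC50 is given in molar units, as the task statement says.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in molar units to pIC50 = -log10(IC50 in M).

    Hypothetical helper for illustration; not part of the dataset.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in M")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report IC50 in nanomolar.

    IC50 [M] = 10 ** (-pIC50), and 1 M = 1e9 nM, so
    IC50 [nM] = 10 ** (9 - pIC50).
    """
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # Labels taken from the five examples above.
    for label in (4.7, 3.9, 4.5, 5.8, 7.3):
        print(f"pIC50 {label} -> IC50 about {pic50_to_ic50_nm(label):.3g} nM")
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50. The label 7.3 in example (5) is an IC50 of roughly 50 nM, while the label 4.7 in example (1) is roughly 20 µM.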