From the BindingDB drug-target binding dataset (IC50 measurements). Task: regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The predicted quantity is pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The compound is CC(C)(Oc1ccc(F)cc1F)c1nnc(-c2ccc(C(N)=O)cc2F)n1C1CC1. The target protein (P16232) has sequence MKKYLLPVLVLCLGYYYSTNEEFRPEMLQGKKVIVTGASKGIGREMAYHLSKMGAHVVLTARSEEGLQKVVSRCLELGAASAHYIAGTMEDMAFAERFVVEAGKLLGGLDMLILNHITQTTMSLFHDDIHSVRRSMEVNFLSYVVLSTAALPMLKQSNGSIAIISSMAGKMTQPLIASYSASKFALDGFFSTIRKEHLMTKVNVSITLCVLGFIDTETALKETSGIILSQAAPKEECALEIIKGTVLRKDEVYYDKSSWTPLLLGNPGRRIMEFLSLRSYNRDLFVSN. The pIC50 is 7.2. (2) The compound is Cc1cccc(C)c1CN(NC(=O)[C@H](C)NC(=O)[C@@H](Cc1c[nH]c2ccccc12)NC(=O)[C@@H](N)Cc1cnc[nH]1)C(=O)N[C@H](Cc1ccccc1)C(=O)N[C@@H](CCCCN)C(N)=O. The target protein (Q07969) has sequence MGCDRNCGLITGAVIGAVLAVFGGILMPVGDLLIEKTIKREVVLEEGTIAFKNWVKTGTTVYRQFWIFDVQNPEEVAKNSSKIKVKQRGPYTYRVRYLAKENITQDPKDSTVSFVQPNGAIFEPSLSVGTENDNFTVLNLAVAAAPHIYTNSFVQGVLNSLIKKSKSSMFQTRSLKELLWGYKDPFLSLVPYPISTTVGVFYPYNNTVDGVYKVFNGKDNISKVAIIDTYKGKRNLSYWESYCDMINGTDAASFPPFVEKSQTLRFFSSDICRSIYAVFESEVNLKGIPVYRFVLPANAFASPLQNPDNHCFCTEKVISNNCTSYGVLDIGKCKEGKPVYISLPHFLHASPDVSEPIEGLNPNEDEHRTYLDVEPITGFTLQFAKRLQVNILVKPARKIEALKNLKRPYIVPILWLNETGTIGDEKAEMFRNQVTGKIKLLGLVEMVLLGVGVVMFVAFMISYCACRSKNGK. The pIC50 is 5.3. (3) The drug is COc1cccc(CSC(C(=O)O)c2ccccc2)c1. 
The target protein (P15304) has sequence MKPRRPISFTREITAMEPSSTSVSRPEWRPEAQQTLTDYPGSRELQEFGIPQKQSLPNEATAQQGAEFQQEQGVQQSTLLQKLLTPLAFPVPQQSFPSHKVHSDQQEATSQNGPGAGKVHTTQKELEHRDEHVGTAESGPAEPPPATEVEATSIAQAVSGPDKKLPTQTDLVSQERAEQSDPTAQQTPLVQGVKSDQGSLIESGILARLQKLAIQQPSQEWKTFLDCVTESDMEKYLNSSSKSNPPEPSGGTVIPGTLPSKQKPDCGKMSGYGGKLPHGKKGILQKHKHYWDTASAFSHSMDLRTMTQSLVALAEDNMAFFSSQGPGETARRLSNVFAGVREQALGLEPTLGQLLGVAHHFDLDTETPANGYRSLVHTARCCLAHLLHKSRYVASNRRSIFFRASHNLAELEAYLAALTQLRALAYYAQRLLTINRPGVLFFEGDEGLSADFLQDYVTLHKGCFYGRCLGFQFTPAIRPFLQTLSIGLVSFGEHYKRNET.... The pIC50 is 6.3. (4) The compound is Cn1nc(C(=O)NCc2ccc(C(=O)O)cc2)cc1-c1ccc2[nH]c(C3=NCCO3)cc2c1. The target protein (P33435) has sequence MHSAILATFFLLSWTPCWSLPLPYGDDDDDDLSEEDLVFAEHYLKSYYHPATLAGILKKSTVTSTVDRLREMQSFFGLEVTGKLDDPTLDIMRKPRCGVPDVGEYNVFPRTLKWSQTNLTYRIVNYTPDMSHSEVEKAFRKAFKVWSDVTPLNFTRIYDGTADIMISFGTKEHGDFYPFDGPSGLLAHAFPPGPNYGGDAHFDDDETWTSSSKGYNLFIVAAHELGHSLGLDHSKDPGALMFPIYTYTGKSHFMLPDDDVQGIQFLYGPGDEDPNPKHPKTPEKCDPALSLDAITSLRGETMIFKDRFFWRLHPQQVEAELFLTKSFWPELPNHVDAAYEHPSRDLMFIFRGRKFWALNGYDILEGYPRKISDLGFPKEVKRLSAAVHFENTGKTLFFSENHVWSYDDVNQTMDKDYPRLIEEEFPGIGNKVDAVYEKNGYIYFFNGPIQFEYSIWSNRIVRVMPTNSILWC. The pIC50 is 7.4. (5) The small molecule is Cc1cc(F)cc2c(=O)[nH]c(-c3ccc(C(=O)N4CCN(C)CC4)nc3)cc12. The target protein sequence is MHHHHHHSSGVDLGTENLYFQSMQGTNPYLTFHCVNQGTILLDLAPEDKEYQSVEEEMQSTIREHRDGGNAGGIFNRYNVIRIQKVVNKKLRERFCHRQKEVSEENHNHHNERMLFHGSPFINAIIHKGFDERHAYIGGMFGAGIYFAENSSKSNQYVYGIGGGTGCPTHKDRSCYICHRQMLFCRVTLGKSFLQFSTIKMAHAPPGHHSVIGRPSVNGLAYAEYVIYRGEQAYPEYLITYQIMKPEAPSQTATAAEQ. The pIC50 is 7.8. (6) The compound is COc1cc2c(cc1OC)-c1c(OC)c(OC)cc3c1[C@H](C2)N(C)CC3. The target protein (Q14761) has sequence MALPCTLGLGMLLALPGALGSGGSAEDSVGSSSVTVVLLLLLLLLLATGLALAWRRLSRDSGGYYHPARLGAALWGRTRRLLWASPPGRWLQARAELGSTDNDLERQEDEQDTDYDHVADGGLQADPGEGEQQCGEASSPEQVPVRAEEARDSDTEGDLVLGSPGPASAGGSAEALLSDLHAFAGSAAWDDSARAAGGQGLHVTAL. The pIC50 is 4.1. (7) The pIC50 is 3.9. 
The target protein (Q01970) has sequence MAGAQPGVHALQLEPPTVVETLRRGSKFIKWDEETSSRNLVTLRVDPNGFFLYWTGPNMEVDTLDISSIRDTRTGRYARLPKDPKIREVLGFGGPDARLEEKLMTVVSGPDPVNTVFLNFMAVQDDTAKVWSEELFKLAMNILAQNASRNTFLRKAYTKLKLQVNQDGRIPVKNILKMFSADKKRVETALESCGLKFNRSESIRPDEFSLEIFERFLNKLCLRPDIDKILLEIGAKGKPYLTLEQLMDFINQKQRDPRLNEVLYPPLRPSQARLLIEKYEPNQQFLERDQMSMEGFSRYLGGEENGILPLEALDLSTDMTQPLSAYFINSSHNTYLTAGQLAGTSSVEMYRQALLWGCRCVELDVWKGRPPEEEPFITHGFTMTTEVPLRDVLEAIAETAFKTSPYPVILSFENHVDSAKQQAKMAEYCRSIFGDALLIEPLDKYPLAPGVPLPSPQDLMGRILVKNKKRHRPSAGGPDSAGRKRPLEQSNSALSESSAA.... The compound is O=C(CCCN1C(=O)/C(=C/c2ccco2)SC1=S)Nc1cccc(C(=O)O)c1.
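The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be sketched in Python as follows. This is only an illustrative helper, not part of the dataset; the function names are hypothetical, and the nanomolar variant assumes the raw IC50 is reported in nM, which the record itself does not state.

```python
import math


def pic50_from_ic50_molar(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Same conversion for an IC50 given in nanomolar (assumed unit)."""
    return pic50_from_ic50_molar(ic50_nm * 1e-9)


def ic50_molar_from_pic50(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # A lower IC50 (more potent compound) gives a higher pIC50.
    for ic50_nm in (10.0, 100.0, 1000.0):
        print(ic50_nm, "nM ->", round(pic50_from_ic50_nm(ic50_nm), 2))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50.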