This dataset contains drug-target binding data from BindingDB with IC50 measurements. Task: regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The predicted quantity is pIC50 (pIC50 = -log10(IC50 in M)); higher values mean a more potent compound. Dataset: bindingdb_ic50.
(1) The compound is c1nc(NCCc2cnc[nH]2)c2cc(-c3ccc4c(c3)OCO4)ccc2n1. The target protein (P49761) has sequence MPVLSARRRELADHAGSGRRSGPSPTARSGPHLSALRAQPARAAHLSGRGTYVRRDTAGGGPGQARPLGPPGTSLLGRGARRSGEGWCPGAFESGARAARPPSRVEPRLATAASREGAGLPRAEVAAGSGRGARSGEWGLAAAGAWETMHHCKRYRSPEPDPYLSYRWKRRRSYSREHEGRLRYPSRREPPPRRSRSRSHDRLPYQRRYRERRDSDTYRCEERSPSFGEDYYGPSRSRHRRRSRERGPYRTRKHAHHCHKRRTRSCSSASSRSQQSSKRSSRSVEDDKEGHLVCRIGDWLQERYEIVGNLGEGTFGKVVECLDHARGKSQVALKIIRNVGKYREAARLEINVLKKIKEKDKENKFLCVLMSDWFNFHGHMCIAFELLGKNTFEFLKENNFQPYPLPHVRHMAYQLCHALRFLHENQLTHTDLKPENILFVNSEFETLYNEHKSCEEKSVKNTSIRVADFGSATFDHEHHTTIVATRHYRPPEVILELGWA.... The pIC50 is 5.0.
(2) The small molecule is CC(C)Oc1ccc(NC(=O)[C@@H]2C[C@@H]3CC[C@@H]2N(S(=O)(=O)c2cn(C)cn2)C3)cc1. The target protein (Q9NYP7) has sequence MEHFDASLSTYFKALLGPRDTRVKGWFLLDNYIPTFICSVIYLLIVWLGPKYMRNKQPFSCRGILVVYNLGLTLLSLYMFCELVTGVWEGKYNFFCQGTRTAGESDMKIIRVLWWYYFSKLIEFMDTFFFILRKNNHQITVLHVYHHASMLNIWWFVMNWVPCGHSYFGATLNSFIHVLMYSYYGLSSVPSMRPYLWWKKYITQGQLLQFVLTIIQTSCGVIWPCTFPLGWLYFQIGYMISLIALFTNFYIQTYNKKGASRRKDHLKDHQNGSMAAVNGHTNSFSPLENNVKPRKLRKD. The pIC50 is 5.0.
(3) The small molecule is O=C1NCc2cc(NS(=O)(=O)c3cc(Cl)ccc3O)ccc2N1c1c(Cl)cccc1Cl. The target protein (P47811) has sequence MSQERPTFYRQELNKTIWEVPERYQNLSPVGSGAYGSVCAAFDTKTGHRVAVKKLSRPFQSIIHAKRTYRELRLLKHMKHENVIGLLDVFTPARSLEEFNDVYLVTHLMGADLNNIVKCQKLTDDHVQFLIYQILRGLKYIHSADIIHRDLKPSNLAVNEDCELKILDFGLARHTDDEMTGYVATRWYRAPEIMLNWMHYNQTVDIWSVGCIMAELLTGRTLFPGTDHIDQLKLILRLVGTPGAELLKKISSESARNYIQSLAQMPKMNFANVFIGANPLAVDLLEKMLVLDSDKRITAAQALAHAYFAQYHDPDDEPVADPYDQSFESRDLLIDEWKSLTYDEVISFVPPPLDQEEMES. The pIC50 is 6.2.
(4) The small molecule is CCn1sc(=O)n(Cc2ccccc2)c1=S. The target protein sequence is XTSFAESXKPVQQPSAFGS. The pIC50 is 5.2.
(5) The target protein sequence is MSDGFSLSDALPAHNPGAPPPQGWNRPPGPGAFPAYPGYPGAYPGAPGPYPGAPGPHHGPPGPYPGGPPGPYPGGPPGPYPGGPPGPYPGGPTAPYSEAPAAPLKVPYDLPLPAGLMPRLLITITGTVNSNPNRFSLDFKRGQDIAFHFNPRFKEDHKRVIVCNSMFQNNWGKEERTAPRFPFEPGTPFKLQVLCEGDHFKVAVNDAHLLQFNFREKKLNEITKLCIAGDITLTSVLTSMI. The compound is O=C1NCc2ccc(cc2)CNC(=O)[C@@H]2O[C@@H](C/C=C/C[C@H]3O[C@H]1[C@@H](OCc1cn([C@@H]4O[C@H](CO)[C@@H](O[C@@H]5O[C@H](CO)[C@H](O)[C@H](O)[C@H]5O)[C@H](O)[C@H]4O)nn1)[C@H](OCc1ccccc1)[C@H]3OCc1ccccc1)[C@@H](OCc1ccccc1)[C@H](OCc1ccccc1)[C@H]2OCc1cn([C@@H]2O[C@H](CO)[C@@H](O[C@@H]3O[C@H](CO)[C@H](O)[C@H](O)[C@H]3O)[C@H](O)[C@H]2O)nn1. The pIC50 is 3.1.
(6) The compound is CC(C)C[C@H](NP(=O)(O)O[C@@H]1O[C@@H](C)[C@H](O)[C@@H](O)[C@H]1O)C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)O. The target protein (P42891) has sequence MMSTYKRATLDEEDLVDSLSESDVYPNHLQVNFRGPRNGQRCWAARTPVEKRLVVLVALLAAALVACLAVLGIQYQTRTPSVCLSEACISVTSSILSSMDPTVDPCQDFFTYACGGWIKANPVPDGHSRWGTFSNLWEHNQAIIKHLLENSTASVSEAERKAQVYYRACMNETRIEELKAKPLMELIEKLGGWNITGPWDKDNFQDTLQVVTSHYHTSPFFSVYVSADSKNSNSNVIQVDQSGLGLPSRDYYLNKTENEKVLTGYLNYMVQLGKLLGGGAEDTIRPQMQQILDFETALANITIPQEKRRDEELIYHKVTAAELQTLAPAINWLPFLNTIFYPVEINESEPIVIYDKEYLSKVSTLINSTDKCLLNNYMIWNLVRKTSSFLDQRFQDADEKFMEVMYGTKKTCLPRWKFCVSDTENTLGFALGPMFVKATFAEDSKNIASEIILEIKKAFEESLSTLKWMDEDTRKSAKEKADAIYNMIGYPNFIMDPKEL.... The pIC50 is 5.5.
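To make the pIC50 target concrete, here is a minimal Python sketch of the transform pIC50 = -log10(IC50 in M) and its inverse. It assumes the raw IC50 is recorded in nanomolar, which is the usual unit in BindingDB tables. The helper names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical and are not part of any dataset loader. With this transform, each one-unit increase in pIC50 corresponds to a tenfold lower IC50. For example, pIC50 5.0 corresponds to 10 µM (10,000 nM).

```python
import math


def ic50_to_pic50(ic50_nm: float) -> float:
    """Hypothetical helper: convert an IC50 in nanomolar to pIC50 = -log10(IC50 in M)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_nm * 1e-9  # assumption: input unit is nM, so scale to M
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Hypothetical helper: invert the transform, giving IC50 in nanomolar."""
    # 10**(-pIC50) is the IC50 in M; multiplying by 1e9 gives nM, i.e. 10**(9 - pIC50)
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # pIC50 labels taken from examples (1)-(6) in this dataset description
    for p in (5.0, 5.0, 6.2, 5.2, 3.1, 5.5):
        print(f"pIC50 {p} -> IC50 ~ {pic50_to_ic50_nm(p):,.0f} nM")
```

Working on the log scale keeps the regression target in a narrow range (about 3 to 6 in these examples) rather than spanning several orders of magnitude in raw IC50.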