This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is CCN(Cc1ccccc1)c1nc(Nc2ccc3ncsc3c2)[nH]c(=O)n1. The target protein (P16092) has sequence MWGWKCLLFWAVLVTATLCTARPAPTLPEQAQPWGVPVEVESLLVHPGDLLQLRCRLRDDVQSINWLRDGVQLVESNRTRITGEEVEVRDSIPADSGLYACVTSSPSGSDTTYFSVNVSDALPSSEDDDDDDDSSSEEKETDNTKPNRRPVAPYWTSPEKMEKKLHAVPAAKTVKFKCPSSGTPNPTLRWLKNGKEFKPDHRIGGYKVRYATWSIIMDSVVPSDKGNYTCIVENEYGSINHTYQLDVVERSPHRPILQAGLPANKTVALGSNVEFMCKVYSDPQPHIQWLKHIEVNGSKIGPDNLPYVQILKTAGVNTTDKEMEVLHLRNVSFEDAGEYTCLAGNSIGLSHHSAWLTVLEALEERPAVMTSPLYLEIIIYCTGAFLISCMLGSVIIYKMKSGTKKSDFHSQMAVHKLAKSIPLRRQVTVSADSSASMNSGVLLVRPSRLSSSGTPMLAGVSEYELPEDPRWELPRDRLVLGKPLGEGCFGQVVLAEAIGL.... The pIC50 is 5.0. (2) The small molecule is CC[C@H](C)[C@H](NC(=O)[C@@H](N)Cc1cnc[nH]1)C(=O)N[C@@H](CO)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(C)C)C(=O)O. The target protein (P9WKK7) has sequence MSVVGTPKSAEQIQQEWDTNPRWKDVTRTYSAEDVVALQGSVVEEHTLARRGAEVLWEQLHDLEWVNALGALTGNMAVQQVRAGLKAIYLSGWQVAGDANLSGHTYPDQSLYPANSVPQVVRRINNALQRADQIAKIEGDTSVENWLAPIVADGEAGFGGALNVYELQKALIAAGVAGSHWEDQLASEKKCGHLGGKVLIPTQQHIRTLTSARLAADVADVPTVVIARTDAEAATLITSDVDERDQPFITGERTREGFYRTKNGIEPCIARAKAYAPFADLIWMETGTPDLEAARQFSEAVKAEYPDQMLAYNCSPSFNWKKHLDDATIAKFQKELAAMGFKFQFITLAGFHALNYSMFDLAYGYAQNQMSAYVELQEREFAAEERGYTATKHQREVGAGYFDRIATTVDPNSSTTALTGSTEEGQFH. The pIC50 is 3.7. (3) The compound is Cc1ccc(SCCC(=O)N/N=C/c2cccnc2)cc1. 
The target protein sequence is MALKLLSEKANSQALKVLLCSYYVKRPVEVSLSGAYATPILHHPAFKQPIIAPNEMARVILFYSVEPTSNNGGAADSSNGDGTASPVAGLTNLTLEHETWLEWEATTFTRAVHPLYTQRRQTAESLAVFSYLDKKISENDDRCVYSPAVEGKGAADPTDAVSTFFIDCIVWCAVLPALCESGVLRDSEKQQLPHLVKWFNTFQKEQKTLIDNAFENLSVQEAADFLRCPRVYKVSAKVEKVFFVTSPIYYVNAAPHIGHVYSTLITDVIGRYHRVKGERVFALTGTDEHGQKVAEAAKQKQVSPYDFTAAVAGEFKKCFEQMDYSIDYFIRTTNEQHKAVVKELWTKLEQKGDIYLGRYEGWYSISDESFLTPQNITDGVDKDGNPCKVSLESGHVVTWVSEENYMFRLSAFRERLLEWYHANPGCIVPEFRRREVIRAVEKGLPDLSVSRKKETLHNWAIPVPGNPDHCVYVWLDALTNYLTGSRLRVDESGKEVSLAD.... The pIC50 is 4.6. (4) The small molecule is O=S(=O)(N[C@H]1C(O)O[C@H](CNS(=O)(=O)c2cccc(Cl)c2Cl)[C@@H](O)[C@@H]1O)c1cccc(-c2ccc(Cl)cc2)c1. The target protein (P19367) has sequence MIAAQLLAYYFTELKDDQVKKIDKYLYAMRLSDETLIDIMTRFRKEMKNGLSRDFNPTATVKMLPTFVRSIPDGSEKGDFIALDLGGSSFRILRVQVNHEKNQNVHMESEVYDTPENIVHGSGSQLFDHVAECLGDFMEKRKIKDKKLPVGFTFSFPCQQSKIDEAILITWTKRFKASGVEGADVVKLLNKAIKKRGDYDANIVAVVNDTVGTMMTCGYDDQHCEVGLIIGTGTNACYMEELRHIDLVEGDEGRMCINTEWGAFGDDGSLEDIRTEFDREIDRGSLNPGKQLFEKMVSGMYLGELVRLILVKMAKEGLLFEGRITPELLTRGKFNTSDVSAIEKNKEGLHNAKEILTRLGVEPSDDDCVSVQHVCTIVSFRSANLVAATLGAILNRLRDNKGTPRLRTTVGVDGSLYKTHPQYSRRFHKTLRRLVPDSDVRFLLSESGSGKGAAMVTAVAYRLAEQHRQIEETLAHFHLTKDMLLEVKKRMRAEMELGLR.... The pIC50 is 6.0. (5) The drug is CC[C@@]1(O)C(=O)OCc2c1cc1n(c2=O)Cc2cc3c(NC(=O)CN)cccc3nc2-1.Cl. The target protein (Q9Y6Q9) has sequence MSGLGENLDPLASDSRKRKLPCDTPGQGLTCSGEKRRREQESKYIEELAELISANLSDIDNFNVKPDKCAILKETVRQIRQIKEQGKTISNDDDVQKADVSSTGQGVIDKDSLGPLLLQALDGFLFVVNRDGNIVFVSENVTQYLQYKQEDLVNTSVYNILHEEDRKDFLKNLPKSTVNGVSWTNETQRQKSHTFNCRMLMKTPHDILEDINASPEMRQRYETMQCFALSQPRAMMEEGEDLQSCMICVARRITTGERTFPSNPESFITRHDLSGKVVNIDTNSLRSSMRPGFEDIIRRCIQRFFSLNDGQSWSQKRHYQEAYLNGHAETPVYRFSLADGTIVTAQTKSKLFRNPVTNDRHGFVSTHFLQREQNGYRPNPNPVGQGIRPPMAGCNSSVGGMSMSPNQGLQMPSSRAYGLADPSTTGQMSGARYGGSSNIASLTPGPGMQSPSSYQNNNYGLNMSSPPHGSPGLAPNQQNIMISPRNRGSPKIASHQFSPV.... The pIC50 is 6.0. (6) The drug is N=C(NCCCO)NC(=O)Cn1c(-c2ccccc2)ccc1C12CC3CC(CC(C3)C1)C2. 
The target protein (P0DJD7) has sequence MKWLLLLGLVALSECIMYKVPLIRKKSLRRTLSERGLLKDFLKKHNLNPARKYFPQWEAPTLVDEQPLENYLDMEYFGTIGIGTPAQDFTVVFDTGSSNLWVPSVYCSSLACTNHNRFNPEDSSTYQSTSETVSITYGTGSMTGILGYDTVQVGGISDTNQIFGLSETEPGSFLYYAPFDGILGLAYPSISSSGATPVFDNIWNQGLVSQDLFSVYLSADDQSGSVVIFGGIDSSYYTGSLNWVPVTVEGYWQITVDSITMNGEAIACAEGCQAIVDTGTSLLTGPTSPIANIQSDIGASENSDGDMVVSCSAISSLPDIVFTINGVQYPVPPSAYILQSEGSCISGFQGMNLPTESGELWILGDVFIRQYFTVFDRANNQVGLAPVA. The pIC50 is 4.3. (7) The compound is N#CCNC(=O)[C@H](Cc1cccc(Cl)c1)NC(=O)c1cccc2c1CCCC2. The target protein (O60911) has sequence MNLSLVLAAFCLGIASAVPKFDQNLDTKWYQWKATHRRLYGANEEGWRRAVWEKNMKMIELHNGEYSQGKHGFTMAMNAFGDMTNEEFRQMMGCFRNQKFRKGKVFREPLFLDLPKSVDWRKKGYVTPVKNQKQCGSCWAFSATGALEGQMFRKTGKLVSLSEQNLVDCSRPQGNQGCNGGFMARAFQYVKENGGLDSEESYPYVAVDEICKYRPENSVANDTGFTVVAPGKEKALMKAVATVGPISVAMDAGHSSFQFYKSGIYFEPDCSSKNLDHGVLVVGYGFEGANSNNSKYWLVKNSWGPEWGSNGYVKIAKDKNNHCGIATAASYPNV. The pIC50 is 5.0. (8) The target protein (P32245) has sequence MVNSTHRGMHTSLHLWNRSSYRLHSNASESLGKGYSDGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIVITLLNSTDTDAQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALQYHNIMTVKRVGIIISCIWAACTVSGILFIIYSDSSAVIICLITMFFTMLALMASLYVHMFLMARLHIKRIAVLPGTGAIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNPYCVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY. The pIC50 is 4.2. The compound is c1ccc(Cn2c(CNc3nc4ccccc4n3CCN3CCCCC3)nc3ccccc32)cc1. (9) The compound is CCCCCCCCCCCC(=O)c1c(O)c(C)c(O)c(C(=O)CCCCCCCCCCC)c1O. The target protein (Q9QXX3) has sequence MLLLLLLLLLGPGPGFSEATRRSHVYKRGLLELAGTLDCVGPRSPMAYMNYGCYCGLGGHGEPRDAIDWCCYHHDCCYSRAQDAGCSPKLDRYPWKCMDHHILCGPAENKCQELLCRCDEELAYCLAGTEYHLKYLFFPSILCEKDSPKCN. The pIC50 is 5.8. (10) The compound is COc1ccc(C(Nc2c(Nc3cccc(C(=O)N(C)C)c3O)c(=O)c2=O)C2(C)COC2)cc1. The pIC50 is 5.0. 
The target protein (P25024) has sequence MSNITDPQMWDFDDLNFTGMPPADEDYSPCMLETETLNKYVVIIAYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNGWIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMNLSLPFFLFRQAYHPNNSSPVCYEVLGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTLFKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQVIQESCERRNNIGRALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHRVTSYTSSSVNVSSNL.
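The pIC50 definition given at the top of this record (pIC50 = -log10(IC50 in M)) can be sketched in a few lines of Python. This is only an illustration of the stated formula, not part of the dataset; the function names `ic50_to_pic50` and `pic50_to_ic50` are hypothetical, and the input is assumed to already be in mol/L.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the transform: recover the IC50 in mol/L from a pIC50."""
    return 10.0 ** (-pic50)


# A pIC50 of 5.0 corresponds to an IC50 of about 1e-5 M (10 micromolar);
# a pIC50 of 6.0 corresponds to about 1e-6 M (1 micromolar).
micromolar_10 = ic50_to_pic50(1e-5)
micromolar_1 = ic50_to_pic50(1e-6)
```

Because the scale is logarithmic, each unit increase in pIC50 corresponds to a tenfold lower IC50, which is why higher values mean a more potent compound.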