Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. From a dataset of Drug-target binding data from BindingDB using IC50 measurements. (1) The small molecule is COC(=O)NC(Cc1ccc(N=C(N)N)cc1)P(=O)(Oc1ccc(NC(C)=O)cc1)Oc1ccc(NC(C)=O)cc1. The target protein (P18292) has sequence MLHVRGLGLPGCLALAALASLVHSQHVFLAPQQALSLLQRVRRANSGFLEELRKGNLERECVEEQCSYEEAFEALESPQDTDVFWAKYTVCDSVRKPRETFMDCLEGRCAMDLGLNYHGNVSVTHTGIECQLWRSRYPHRPDINSTTHPGADLKENFCRNPDSSTSGPWCYTTDPTVRREECSIPVCGQEGRTTVKMTPRSRGSKENLSPPLGECLLERGRLYQGNLAVTTLGSPCLAWDSLPTKTLSKYQNFDPEVKLVQNFCRNPDRDEEGAWCFVAQQPGFEYCSLNYCDEAVGEENHDGDESIAGRTTDAEFHTFFDERTFGLGEADCGLRPLFEKKSLTDKTEKELLDSYIDGRIVEGWDAEKGIAPWQVMLFRKSPQELLCGASLISDRWVLTAAHCILYPPWDKNFTENDLLVRIGKHSRTRYERNVEKISMLEKIYIHPRYNWRENLDRDIALLKLKKPVPFSDYIHPVCLPDKQTVTSLLQAGYKGRVTGW.... The pIC50 is 4.3. (2) The compound is COc1cccc(C2OC(CC(=O)N3CCC(CC(=O)O)CC3)c3cccn3-c3ccc(Cl)cc32)c1OC. The target protein (Q02769) has sequence MEFVKCLGHPEEFYNLLRFRMGGRRNFIPKMDRNSLSNSLKTCYKYLDQTSRSFAAVIQALDGDIRHAVCVFYLILRAMDTVEDDMAISVEKKIPLLRNFHTFLYEPEWRFTESKEKHRVVLEDFPTISLEFRNLAEKYQTVIADICHRMGCGMAEFLNKDVTSKQDWDKYCHYVAGLVGIGLSRLFSASEFEDPIVGEDTECANSMGLFLQKTNIIRDYLEDQQEGRQFWPQEVWGKYVKKLEDFVKPENVDVAVKCLNELITNALQHIPDVITYLSRLRNQSVFNFCAIPQVMAIATLAACYNNHQVFKGVVKIRKGQAVTLMMDATNMPAVKAIIYQYIEEIYHRVPNSDPSASKAKQLISNIRTQSLPNCQLISRSHYSPIYLSFIMLLAALSWQYLSTLSQVTEDYVQREH. The pIC50 is 8.6. (3) The compound is COC1CC(O)/C=C/C=C/C=C\C(C)C(C(C)C(O)CC(C)O)OC(=O)/C=C\C=C\C=C/C(C)=C/C(C)C(O)/C=C\C=C\c2coc(n2)C1C. 
The target protein (P60010) has sequence MDSEVAALVIDNGSGMCKAGFAGDDAPRAVFPSIVGRPRHQGIMVGMGQKDSYVGDEAQSKRGILTLRYPIEHGIVTNWDDMEKIWHHTFYNELRVAPEEHPVLLTEAPMNPKSNREKMTQIMFETFNVPAFYVSIQAVLSLYSSGRTTGIVLDSGDGVTHVVPIYAGFSLPHAILRIDLAGRDLTDYLMKILSERGYSFSTTAEREIVRDIKEKLCYVALDFEQEMQTAAQSSSIEKSYELPDGQVITIGNERFRAPEALFHPSVLGLESAGIDQTTYNSIMKCDVDVRKELYGNIVMSGGTTMFPGIAERMQKEITALAPSSMKVKIIAPPERKYSVWIGGSILASLTTFQQMWISKQEYDESGPSIVHHKCF. The pIC50 is 6.8. (4) The small molecule is CCN(CC)c1ccc(-c2c(C)[nH]n(-c3ccccn3)c2=O)cc1. The target protein (P49281) has sequence MVLGPEQKMSDDSVSGDHGESASLGNINPAYSNPSLSQSPGDSEEYFATYFNEKISIPEEEYSCFSFRKLWAFTGPGFLMSIAYLDPGNIESDLQSGAVAGFKLLWILLLATLVGLLLQRLAARLGVVTGLHLAEVCHRQYPKVPRVILWLMVELAIIGSDMQEVIGSAIAINLLSVGRIPLWGGVLITIADTFVFLFLDKYGLRKLEAFFGFLITIMALTFGYEYVTVKPSQSQVLKGMFVPSCSGCRTPQIEQAVGIVGAVIMPHNMYLHSALVKSRQVNRNNKQEVREANKYFFIESCIALFVSFIINVFVVSVFAEAFFGKTNEQVVEVCTNTSSPHAGLFPKDNSTLAVDIYKGGVVLGCYFGPAALYIWAVGILAAGQSSTMTGTYSGQFVMEGFLNLKWSRFARVVLTRSIAIIPTLLVAVFQDVEHLTGMNDFLNVLQSLQLPFALIPILTFTSLRPVMSDFANGLGWRIAGGILVLIICSINMYFVVVYVR.... The pIC50 is 5.3. (5) The compound is CNC(=O)[C@@H](NC(=O)[C@H](CC(C)C)[C@H](O)C(=O)NO)C(C)(C)C. The target protein (O75173) has sequence MSQTGSHPGRGLAGRWLWGAQPCLLLPIVPLSWLVWLLLLLLASLLPSARLASPLPREEEIVFPEKLNGSVLPGSGAPARLLCRLQAFGETLLLELEQDSGVQVEGLTVQYLGQAPELLGGAEPGTYLTGTINGDPESVASLHWDGGALLGVLQYRGAELHLQPLEGGTPNSAGGPGAHILRRKSPASGQGPMCNVKAPLGSPSPRPRRAKRFASLSRFVETLVVADDKMAAFHGAGLKRYLLTVMAAAAKAFKHPSIRNPVSLVVTRLVILGSGEEGPQVGPSAAQTLRSFCAWQRGLNTPEDSDPDHFDTAILFTRQDLCGVSTCDTLGMADVGTVCDPARSCAIVEDDGLQSAFTAAHELGHVFNMLHDNSKPCISLNGPLSTSRHVMAPVMAHVDPEEPWSPCSARFITDFLDNGYGHCLLDKPEAPLHLPVTFPGKDYDADRQCQLTFGPDSRHCPQLPPPCAALWCSGHLNGHAMCQTKHSPWADGTPCGPAQA.... The pIC50 is 5.0. (6) The small molecule is C=CC(=O)N[C@@H]1[C@@H](O)CC(O)(C(=O)O)O[C@H]1[C@H](O)[C@H](O)CO. 
The target protein (P03437) has sequence MKTIIALSYIFCLALGQDLPGNDNSTATLCLGHHAVPNGTLVKTITDDQIEVTNATELVQSSSTGKICNNPHRILDGIDCTLIDALLGDPHCDVFQNETWDLFVERSKAFSNCYPYDVPDYASLRSLVASSGTLEFITEGFTWTGVTQNGGSNACKRGPGSGFFSRLNWLTKSGSTYPVLNVTMPNNDNFDKLYIWGIHHPSTNQEQTSLYVQASGRVTVSTRRSQQTIIPNIGSRPWVRGLSSRISIYWTIVKPGDVLVINSNGNLIAPRGYFKMRTGKSSIMRSDAPIDTCISECITPNGSIPNDKPFQNVNKITYGACPKYVKQNTLKLATGMRNVPEKQTRGLFGAIAGFIENGWEGMIDGWYGFRHQNSEGTGQAADLKSTQAAIDQINGKLNRVIEKTNEKFHQIEKEFSEVEGRIQDLEKYVEDTKIDLWSYNAELLVALENQHTIDLTDSEMNKLFEKTRRQLRENAEEMGNGCFKIYHKCDNACIESIRNG.... The pIC50 is 3.6. (7) The small molecule is CC(=O)N([N-][CH+]c1ccc([N+](=O)[O-])o1)c1nc2ccccc2c(=O)n1-c1ccccc1. The target protein (P04993) has sequence MKLQKQLLEAVEHKQLRPLDVQFALTVAGDEHPAVTLAAALLSHDAGEGHVCLPLSRLENNEASHPLLATCVSEIGELQNWEECLLASQAVSRGDEPTPMILCGDRLYLNRMWCNERTVARFFNEVNHAIEVDEALLAQTLDKLFPVSDEINWQKVAAAVALTRRISVISGGPGTGKTTTVAKLLAALIQMADGERCRIRLAAPTGKAAARLTESLGKALRQLPLTDEQKKRIPEDASTLHRLLGAQPGSQRLRHHAGNPLHLDVLVVDEASMIDLPMMSRLIDALPDHARVIFLGDRDQLASVEAGAVLGDICAYANAGFTAERARQLSRLTGTHVPAGTGTEAASLRDSLCLLQKSYRFGSDSGIGQLAAAINRGDKTAVKTVFQQDFTDIEKRLLQSGEDYIAMLEEALAGYGRYLDLLQARAEPDLIIQAFNEYQLLCALREGPFGVAGLNERIEQFMQQKRKIHRHPHSRWYEGRPVMIARNDSALGLFNGDIGI.... The pIC50 is 3.9. (8) The drug is CN1CCN(CCCn2c(O)cn(N=C3CCOc4ccc(Cl)cc43)c2=O)CC1. The target protein (O08703) has sequence AVWDWLILLLVIYTAVFTPYSAAFLLKEPEEDAQTADCGYACQPLAVVDLIVDIMFIVDILINFRTTYVNANEEVVSHPGRIAVHYFKGWFLIDMVAAIPFDLLIFGSGSEELIGLLKTARLLRLVRVARKLDRYSEYGAAVLFLLMCTFALIAHWLACIWY. The pIC50 is 5.4. (9) The compound is O=C1c2c(O)ccc(O)c2C(=O)c2c(NCCNCCO)ccc(NCCNCCO)c21. 
The target protein (Q86VL8) has sequence MDSLQDTVALDHGGCCPALSRLVPRGFGTEMWTLFALSGPLFLFQVLTFMIYIVSTVFCGHLGKVELASVTLAVAFVNVCGVSVGVGLSSACDTLMSQSFGSPNKKHVGVILQRGALVLLLCCLPCWALFLNTQHILLLFRQDPDVSRLTQDYVMIFIPGLPVIFLYNLLAKYLQNQGWLKGQEEESPFQTPGLSILHPSHSHLSRASFHLFQKITWPQVLSGVVGNCVNGVANYALVSVLNLGVRGSAYANIISQFAQTVFLLLYIVLKKLHLETWAGWSSQCLQDWGPFFSLAVPSMLMICVEWWAYEIGSFLMGLLSVVDLSAQAVIYEVATVTYMIPLGLSIGVCVRVGMALGAADTVQAKRSAVSGVLSIVGISLVLGTLISILKNQLGHIFTNDEDVIALVSQVLPVYSVFHVFEAICCVYGGVLRGTGKQAFGAAVNAITYYIIGLPLGILLTFVVRMRIMGLWLGMLACVFLATAAFVAYTARLDWKLAAEE.... The pIC50 is 6.1. (10) The compound is NNCC(=O)O. The target protein (P32929) has sequence MQEKDASSQGFLPHFQHFATQAIHVGQDPEQWTSRAVVPPISLSTTFKQGAPGQHSGFEYSRSGNPTRNCLEKAVAALDGAKYCLAFASGLAATVTITHLLKAGDQIICMDDVYGGTNRYFRQVASEFGLKISFVDCSKIKLLEAAITPETKLVWIETPTNPTQKVIDIEGCAHIVHKHGDIILVVDNTFMSPYFQRPLALGADISMYSATKYMNGHSDVVMGLVSVNCESLHNRLRFLQNSLGAVPSPIDCYLCNRGLKTLHVRMEKHFKNGMAVAQFLESNPWVEKVIYPGLPSHPQHELVKRQCTGCTGMVTFYIKGTLQHAEIFLKNLKLFTLAESLGGFESLAELPAIMTHASVLKNDRDVLGISDTLIRLSVGLEDEEDLLEDLDQALKAAHPPSGSHS. The pIC50 is 5.0.
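The task description above defines pIC50 as -log10(IC50 in M). The following is a minimal sketch of that conversion in both directions. The function names are hypothetical and are not part of the dataset or of any BindingDB tooling. It assumes IC50 values are given in molar units.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_molar(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def pic50_to_ic50_nanomolar(pic50: float) -> float:
    """Same inversion, reported in nM (1 M = 1e9 nM)."""
    return pic50_to_ic50_molar(pic50) * 1e9
```

Under this definition, a pIC50 of 5.0 corresponds to an IC50 of 10 µM, and a pIC50 of 8.6 corresponds to roughly 2.5 nM, so each unit increase in pIC50 is a tenfold gain in potency.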