Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. From a dataset of Drug-target binding data from BindingDB using IC50 measurements. (1) The small molecule is Cc1ccc(S(=O)(=O)Nc2ccc3c(c2)C(=O)N(N2CCOCC2)C3=O)cc1. The target protein (Q02401) has sequence MELPWTALFLSTVLLGLSCQGSDWESDRNFISAAGPLTNDLVLNLNYPPGKQGSDVVSGNTDHLLCQQPLPSFLSQYFSSLRASQVTHYKVLLSWAQLLPTGSSKNPDQEAVQCYRQLLQSLKDAQLEPMVVLCHQTPPTSSAIQREGAFADLFADYATLAFQSFGDLVEIWFTFSDLEKVIMDLPHKDLKASALQTLSNAHRRAFEIYHRKFSSQGGKLSVVLKAEDIPELLPDPALAALVQGSVDFLSLDLSYECQSVATLPQKLSELQNLEPKVKVFIYTLKLEDCPATGTSPSSLLISLLEAINKDQIQTVGFDVNAFLSCTSNSEESPSCSLTDSLALQTEQQQETAVPSSPGSAYQRVWAAFANQSREERDAFLQDVFPEGFLWGISTGAFNVEGGWAEGGRGPSIWDHYGNLNAAEGQATAKVASDSYHKPASDVALLRGIRAQVYKFSISWSGLFPLGQKSTPNRQGVAYYNKLIDRLLDSHIEPMATLFHW.... The pIC50 is 3.7. (2) The target protein (Q01984) has sequence MASFMRSLFSDHSRYVESFRRFLNNSTEHQCMQEFMDKKLPGIIARIGETKAEIKILSIGGGAGEIDLQILSKVQAQYPGICINNEVVEPNAEQIVKYKELVAKTSNMENIKFAWHKETSSEYQKRVVEEDEEPPKWDFIHMIQMLYYVKDIPATLKFFHGLLAANAKILIILVSGTSGWEKLWKKYGFRLPRDDLCQYVTSSDLAQILDDLGIKYECYDLLSTMDITDCFIDGNENGDLLWDFLTETCNFIKTAPLDLKEEIMKDLQEPEFSVKKEGKVLFNNNLSFIVVEANV. The drug is c1ccc2c(NCCCCc3ccc(OCCCN4CCCCC4)cc3)c3c(nc2c1)CCCC3. The pIC50 is 7.3. (3) The drug is COc1ccc(NC(=O)c2ccc(-c3ccc(C(N)=O)cc3C)cc2)cc1N1CCN(C)CC1. The target protein (P28564) has sequence MEEQGIQCAPPPPATSQTGVPLANLSHNCSADDYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVDYSAKRTPKRAAIMIVLVWVFSISISLPPFFWRQAKAEEEVLDCFVNTDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRILKQTPNKTGKRLTRAQLITDSPGSTSSVTSINSRVPEVPSESGSPVYVNQVKVRVSDALLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDACWFHMAIFDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTG. The pIC50 is 8.8. (4) The drug is Cc1sc(-c2cccc(F)c2)nc1CSc1nc(N)cc(N)n1. 
The target protein (P43346) has sequence MATPPKRFCPSPSTSSEGTRIKKISIEGNIAAGKSTFVNILKQASEDWEVVPEPVARWCNVQSTQEEFEELTTSQKSGGNVLQMMYEKPERWSFTFQSYACLSRIRAQLASLNGKLKDAEKPVLFFERSVYSDRYIFASNLYESDCMNETEWTIYQDWHDWMNSQFGQSLELDGIIYLRATPEKCLNRIYLRGRNEEQGIPLEYLEKLHYKHESWLLHRTLKTSFDYLQEVPVLTLDVNEDFKDKHESLVEKVKEFLSTL. The pIC50 is 6.6. (5) The small molecule is C=CCOC(=O)C[C@H](NC(=O)[C@H](Cc1ccccc1)N(C)C(=O)[C@H](CCCN=C(N)NC(=O)NC)NC(C)=O)C(=O)O. The target protein sequence is MSTNNTINAVAADDAAIMPSIANKKILMGFWHNWAAGASDGYQQGQFANMNLTDIPTEYNVVAVAFMKGQGIPTFKPYNLSDTEFRRQVGVLNSQGRAVLISLGGADAHIELKTGDEDKLKDEIIRLVEVYGFDGLDIDLEQAAIGAANNKTVLPAALKKVKDHYAAQGKNFIISMAPEFPYLRTNGTYLDYINALEGYYDFIAPQYYNQGGDGIWVDELNAWITQNNDAMKEDFLYYLTESLVTGTRGYAKIPAAKFVIGLPSNNDAAATGYVVNKQAVYNAFSRLDAKNLSIKGLMTWSINWDNGKSKAGVAYNWEFKTRYAPLIQGGVTPPPGKPNAPTALTVAELGATSLKLSWAAATGAFPIASYTVYRNGNPIGQTAGLSLADGGLTPATQYSYFVTATDSQGNTSLPSSALAVKTANDGTPPDPGAPEWQNNHSYKAGDVVSYKGKKYTCIQAHTSNAGWTPDAAFTLWQLIA. The pIC50 is 4.5. (6) The small molecule is Cn1cnc(S(=O)(=O)N(CCN(Cc2cncn2C)c2ccc(C#N)cc2)Cc2ccccn2)c1. The target protein (Q04631) has sequence MAATEGVGESAPGGEPGQPEQPPPPPPPPPAQQPQEEEMAAEAGEAAASPMDDGFLSLDSPTYVLYRDRAEWADIDPVPQNDGPSPVVQIIYSEKFRDVYDYFRAVLQRDERSERAFKLTRDAIELNAANYTVWHFRRVLLRSLQKDLQEEMNYIIAIIEEQPKNYQVWHHRRVLVEWLKDPSQELEFIADILNQDAKNYHAWQHRQWVIQEFRLWDNELQYVDQLLKEDVRNNSVWNQRHFVISNTTGYSDRAVLEREVQYTLEMIKLVPHNESAWNYLKGILQDRGLSRYPNLLNQLLDLQPSHSSPYLIAFLVDIYEDMLENQCDNKEDILNKALELCEILAKEKDTIRKEYWRYIGRSLQSKHSRESDIPASV. The pIC50 is 7.4. (7) The compound is Cc1cccc(C)c1/C=C/n1cnc2c(Nc3ccc(P(C)(C)=O)cc3)ncnc21. 
The target protein sequence is MLRGPGPGLLLLAVQCLGTAVPSTGASKSKRQAQQMVQPQSPVAVSQSKPGCYDNGKHYQINQQWERTYLGNALVCTCYGGSRGFNCESKPEAEETCFDKYTGNTYRVGDTYERPKDSMIWDCTCIGAGRGRISCTIANRCHEGGQSYKIGDTWRRPHETGGYMLECVCLGNGKGEWTCKPIAEKCFDHAAGTSYVVGETWEKPYQGWMMVDCTCLGEGSGRITCTSRNRCNDQDTRTSYRIGDTWSKKDNRGNLLQCICTGNGRGEWKCERHTSVQTTSSGSGPFTDVRAAVYQPQPHPQPPPYGHCVTDSGVVYSVGMQWLKTQGNKQMLCTCLGNGVSCQETAVTQTYGGNSNGEPCVLPFTYNGRTFYSCTTEGRQDGHLWCSTTSNYEQDQKYSFCTDHTVLVQTQGGNSNGALCHFPFLYNNHNYTDCTSEGRRDNMKWCGTTQNYDADQKFGFCPMAAHEEICTTNEGVMYRIGDQWDKQHDMGHMMRCTCVG.... The pIC50 is 8.0.
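The pIC50 definition stated at the top, pIC50 = -log10(IC50 in M), can be sketched as a small conversion helper. This is a minimal illustration and not part of the dataset: the function names are hypothetical, and the nanomolar helper assumes the raw IC50 values are supplied in nM, which the record itself does not state.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50."""
    return ic50_molar_to_pic50(ic50_nm * 1e-9)


def pic50_to_ic50_molar(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # Labels taken from the examples above; higher pIC50 means more potent.
    for label in (3.7, 8.0):
        print(f"pIC50 {label} -> IC50 = {pic50_to_ic50_molar(label):.3g} M")
```

Under this definition, a label of 8.0 corresponds to an IC50 of about 10 nM, while a label of 3.7 corresponds to roughly 200 µM, so each unit of pIC50 is a tenfold change in potency.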