Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The pIC50 is 5.2. The target protein sequence is MTDRVSVGNLRIARVLYDFVNNEALPGTDIDPDSFWAGVDKVVADLTPQNQALLNARDELQAQIDKWHRRRVIEPIDMDAYRQFLTEIGYLLPEPDDFTITTSGVDAEITTTAGPQLVVPVLNARFALNAANARWGSLYDALYGTDVIPETDGAEKGPTYNKVRGDKVIAYARKFLDDSVPLSSGSFGDATGFTVQDGQLVVALPDKSTGLANPGQFAGYTGAAESPTSVLLINHGLHIEILIDPESQVGTTDRAGVKDVILESAITTIMDFEDSVAAVDAADKVLGYRNWLGLNKGDLAAAVDKDGTAFLRVLNRDRNYTAPGGGQFTLPGRSLMFVRNVGHLMTNDAIVDTDGSEVFEGIMDALFTGLIAIHGLKASDVNGPLINSRTGSIYIVKPKMHGPAEVAFTCELFSRVEDVLGLPQNTMKIGIMDEERRTTVNLKACIKAAADRVVFINTGFLDRTGDEIHTSMEAGPMVRKGTMKSQPWILAYEDHNVDAG.... The drug is Cc1ccc(C(=O)CC(=O)C(=O)O)cc1. (2) The compound is CCN1C(=S)N2CCCC2c2c1nc(C(C)N1CCOCC1)[nH]c2=O. The target protein (P32871) has sequence MPPRPSSGELWGIHLMPPRILVECLLPNGMIVTLECLREATLITIKHELFKEARKYPLHQLLQDESSYIFVSVTQEAEREEFFDETRRLCDLRLFQPFLKVIEPVGNREEKILNREIGFAIGMPVCEFDMVKDPEVQDFRRNILNVCKEAVDLRDLNSPHSRAMYVYPPNVESSPELPKHIYNKLDKGQIIVVIWVIVSPNNDKQKYTLKINHDCVPEQVIAEAIRKKTRSMLLSSEQLKLCVLEYQGKYILKVCGCDEYFLEKYPLSQYKYIRSCIMLGRMPNLMLMAKESLYSQLPMDCFTMPSYSRRISTATPYMNGETSTKSLWVINSALRIKILCATYVNVNIRDIDKIYVRTGIYHGGEPLCDNVNTQRVPCSNPRWNEWLNYDIYIPDLPRAARLCLSICSVKGRKGAKEEHCPLAWGNINLFDYTDTLVSGKMALNLWPVPHGLEDLLNPIGVTGSNPNKETPCLELEFDWFSSVVKFPDMSVIEEHANWSV.... The pIC50 is 9.4. (3) The small molecule is CCN(c1cc(-c2ccc(CN3CCOCC3)cc2)cc(C(=O)NCc2c(C)c(F)c(C)[nH]c2=O)c1C)C1CCOCC1. 
The target protein sequence is MGQTGKKSEKGPVCWRKRVKSEYMRLRQLKRFRRADEVKSMFSSNRQKILERTEILNQEWKQRRIQPVHILTSVSSLRGTRECSVTSDLDFPTQVIPLKTLNAVASVPIMYSWSPLQQNFMVEDETVLHNIPYMGDEVLDQDGTFIEELIKNYDGKVHGDRECGFINDEIFVELVNALGQYNDDDDDDDGDDPEEREEKQKDLEDHRDDKESRPPRKFPSDKIFEAISSMFPDKGTAEELKEKYKELTEQQLPGALPPECTPNIDGPNAKSVQREQSLHSFHTLFCRRCFKYDCFLHPFHATPNTYKRKNTETALDNKPCGPQCYQHLEGAKEFAAALTAERIKTPPKRPGGRRRGRLPNNSSRPSTPTINVLESKDTDSDREAGTETGGENNDKEEEEKKDETSSSSEANSRCQTPIKMKPNIEPPENVEWSGAEASMFRVLIGTYYDNFCAIARLIGTKTCRQVYEFRVKESSIIAPAPAEDVDTPPRKKKRKHRLWA.... The pIC50 is 7.0. (4) The drug is CN(C)CCCN1c2ccccc2Sc2ccc(Cl)cc21. The target protein (P80456) has sequence MEPAPELLFYVNGRKVVEKQVDPETMLLPYLRKKLRLTGTKYGCGGGGCGACTVMISRYNRVTKKIRHYPVNACLTPICSLYGAAVTTVEGIGSTTTRLHPVQERIAKFHGTQCGFCTPGMVMSMYALLRNHPEPTLDQLADALGGNLCRCTGYRPIIEAYKTFCKTSDCCQNKENGFCCLDQGINGLPEVEEENQTRPNLFSEEEYLPLDPTQELIFPPELMTMAEKQPQRTRVFSGERMMWISPVTLKALLEAKSTYPQAPVVMGNTSVGPGVKFKGIFHPVIISPDSIEELNVVSHTHSGLTLGAGLSLAQVKDILADVVQKVPEENAQTYRALLKHLGTLAGSQIRNMASLGGHIISRHLDSDLNPLLAVGNCTLNVLSKEGERQIPLDEQFLSRCPEADLKPQEILASVHIPYSRKWEFVLAFRQAQRKQNALAIVNSGMRVFFGEGDGIIRELAISYGGVGPTIICAKNSCQKLIGRSWNEEMLDTACRLILDE.... The pIC50 is 4.7. (5) The drug is O=C(c1ccccc1)c1ccccc1OCC(O)CN1CCN(c2cccc3[nH]c(=O)oc23)CC1. The target protein (P35498) has sequence MEQTVLVPPGPDSFNFFTRESLAAIERRIAEEKAKNPKPDKKDDDENGPKPNSDLEAGKNLPFIYGDIPPEMVSEPLEDLDPYYINKKTFIVLNKGKAIFRFSATSALYILTPFNPLRKIAIKILVHSLFSMLIMCTILTNCVFMTMSNPPDWTKNVEYTFTGIYTFESLIKIIARGFCLEDFTFLRDPWNWLDFTVITFAYVTEFVDLGNVSALRTFRVLRALKTISVIPGLKTIVGALIQSVKKLSDVMILTVFCLSVFALIGLQLFMGNLRNKCIQWPPTNASLEEHSIEKNITVNYNGTLINETVFEFDWKSYIQDSRYHYFLEGFLDALLCGNSSDAGQCPEGYMCVKAGRNPNYGYTSFDTFSWAFLSLFRLMTQDFWENLYQLTLRAAGKTYMIFFVLVIFLGSFYLINLILAVVAMAYEEQNQATLEEAEQKEAEFQQMIEQLKKQQEAAQQAATATASEHSREPSAAGRLSDSSSEASKLSSKSAKERRNR.... The pIC50 is 4.5. (6) The compound is O=C1NC(=S)N(c2ccc(Cl)cc2)C(=O)/C1=C\c1cc2ccccc2[nH]1. 
The target protein (P22188) has sequence MADRNLRDLLAPWVPDAPSRALREMTLDSRVAAAGDLFVAVVGHQADGRRYIPQAIAQGVAAIIAEAKDEATDGEIREMHGVPVIYLSQLNERLSALAGRFYHEPSDNLRLVGVTGTNGKTTTTQLLAQWSQLLGEISAVMGTVGNGLLGKVIPTENTTGSAVDVQHELAGLVDQGATFCAMEVSSHGLVQHRVAALKFAASVFTNLSRDHLDYHGDMEHYEAAKWLLYSEHHCGQAIINADDEVGRRWLAKLPDAVAVSMEDHINPNCHGRWLKATEVNYHDSGATIRFSSSWGDGEIESHLMGAFNVSNLLLALATLLALGYPLADLLKTAARLQPVCGRMEVFTAPGKPTVVVDYAHTPDALEKALQAARLHCAGKLWCVFGCGGDRDKGKRPLMGAIAEEFADVAVVTDDNPRTEEPRAIINDILAGMLDAGHAKVMEGRAEAVTCAVMQAKENDVVLVAGKGHEDYQIVGNQRLDYSDRVTVARLLGVIA. The pIC50 is 4.4. (7) The compound is C=C(CC)CCCCC[C@@H]1NC(=O)[C@H]2CCCCN2CC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@H](Cc2cn(OC)c3ccccc23)NC1=O. The target protein sequence is MGAKKKIAYFYDEEVGNFHYGLGHPMKPHRVRMTHDLVSQYGLLEKVDVMVPTPGTVESLTRFHSNDYVDFLRSVNTDNMHDYSDHLARFNVGEDCPVFDGLWEFCQLSAGGSLGGAQSVNELGYQYAINWAGGLHHGKKHEASGFCYVNDCVLGALEFLKYQHRVCYVDIDIHHGDGVEEAFYTSPRCMCVSFHKYGDYFPGTGALNDVGVEEGLGYSVNVPLKDGVDDATFIDLFTKVMTLVMENYRPGAIVLQCGADSLSGDRLGCFNLSLKGHGHAVSFLKKFNVPLLILGGGGYTLRNVPKCWTYETSLIVDTYIDEQLPNSSNFYGYYGPDFSLAVRTSNMENLNSRQDCEEIYRKISENFRDYVFPIGSQISAYDIPEKLPLLYNPNKTPDDYKDGNNIKHEQHQDFDDEMKEWPTVDYNNRAIG. The pIC50 is 5.8. (8) The drug is CS(=O)(=O)c1ccc2nc(NC(=O)NCc3cccs3)sc2c1. The target protein sequence is MKTRITELLKIDYPIFQGGMAWVADGDLAGAVSKAGGLGIIGGGNAPKEVVKANIDKIKSLTDKPFGVNIMLLSPFVEDIVDLVIEEGVKVVTTGAGNPSKYMERFHEAGIIVIPVVPSVALAKRMEKIGADAVIAEGMEAGGHIGKLTTMTLVRQVATAISIPVIAAGGIADGEGAAAGFMLGAEAVQVGTRFVVAKESNAHPNYKEKILKARDIDTTISAQHFGHAVRAIKNQLTRDFELAEKDAFKQEDPDLEIFEQMGAGALAKAVVHGDVDGGSVMAGQIAGLVSKEETAEEILKDLYYGAAKKIQEEASRWTGVVRND. The pIC50 is 4.5. (9) The small molecule is COC(=O)C[C@H]1[C@]2(C)C3=C(C)[C@H](c4ccoc4)C[C@H]3O[C@@H]2[C@H](OC(C)=O)[C@H]2[C@](C)(C(=O)OC)C=CC(=O)[C@]12C. The target protein sequence is MPCCEVITNVNLPDDNVQSTLSQIENAISDVMGKPLGYIMSNYDYQKNLRFGGSNEAYCFVRITSIGGINRSNNSALADQITKLLVSNLNVKSRRIYVEFRDCSAQNFAFSGSLFG. The pIC50 is 4.7. 
(10) The small molecule is O=C(OC[C@H]1O[C@@H](O)[C@H](OC(=O)c2cc(O)c(O)c(O)c2)[C@@H](OC(=O)c2cc(O)c(O)c(O)c2)[C@@H]1OC(=O)c1cc(O)c(O)c(O)c1)c1cc(O)c(O)c(O)c1. The target protein sequence is MNPNQKIITIGSVSLTIATVCFLMQIAILATTVTLHFKQHECDSPASNQVMPCEPIIIERNITEIVYLNNTTIEKEICPKVVEYRNWSKPQCQITGFAPFSKDNSIRLSAGGDIWVTREPYVSCDPGKCYQFALGQGTTLDNKHSNDTVHDRIPHRTLLMNELGVPFHLGTRQVCIAWSSSSCHDGKAWLHVCITGDDKNATASFIYDGRLVDSIGSWSQNILRTQESECVCINGTCTVVMTDGSASGRADTRILFIEEGKIVHISPLSGSAQHIEECSCYPRYPGVRCICRDNWKGSNRPVVDINMEDYSIDSSYVCSGLVGDTPRNDDSSSNSNCRNPNNERGTQGVKGWAFDNGNDLWMGRTISKESRSGYETFKVIGGWSTPNSKSQVNRQVIVDNNNWSGYSGIFSVEGKSCINRCFYVELIRGRPQETRVWWTSNSIVVFCGTSGTYGTGSWPDGANINFMPI. The pIC50 is 5.0.
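The task description defines pIC50 = -log10(IC50 in M). A minimal sketch of that conversion follows, in case it helps when reading the labels above. The function names are hypothetical and not part of the dataset, and the nanomolar helper assumes raw IC50 values are reported in nM, which the records here do not state.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the transform: return the IC50 in mol/L for a given pIC50."""
    return 10.0 ** (-pic50)


def pic50_from_nanomolar(ic50_nm: float) -> float:
    """Convenience wrapper, assuming the raw IC50 is reported in nM."""
    return ic50_to_pic50(ic50_nm * 1e-9)


if __name__ == "__main__":
    # Show the molar IC50 implied by a few labels from the records above.
    for label in (9.4, 7.0, 4.5):
        print(f"pIC50 {label} -> IC50 {pic50_to_ic50(label):.3g} M")
```

Under this definition, the pIC50 of 9.4 in record (2) corresponds to an IC50 of roughly 0.4 nM, and the pIC50 of 4.5 in record (5) to roughly 32 µM, so a one-unit increase in pIC50 means a tenfold lower IC50.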