Dataset: bindingdb_ic50, drug-target binding data from BindingDB with IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them as pIC50 (pIC50 = -log10(IC50 in M); higher values mean more potent inhibition).
(1) The drug is N#Cc1cccc2c(O[C@H]3CC[C@H](NC(=O)c4cccc(F)c4)CC3)ccnc12. The target protein sequence is MEVQLGLGRVYPRPPSKTYRGAFQNLFQSVREVIQNPGPRHPEAASAAPPGASLLLLQQQQQQQQQQQQQQQQQQQQQETSPRQQQQQQGEDGSPQAHRRGPTGYLVLDEEQQPSQPQSALECHPERGCVPEPGAAVAASKGLPQQLPAPPDEDDSAAPSTLSLLGPTFPGLSSCSADLKDILSEASTMQLLQQQQQEAVSEGSSSGRAREASGAPTSSKDNYLGGTSTISDNAKELCKAVSVSMGLGVEALEHLSPGEQLRGDCMYAPLLGVPPAVRPTPCAPLAECKGSLLDDSAGKSTEDTAEYSPFKGGYTKGLEGESLGCSGSAAAGSSGTLELPSTLSLYKSGALDEAAAYQSRDYYNFPLALAGPPPPPPPPHPHARIKLENPLDYGSAWAAAAAQCRYGDLASLHGAGAAGPGSGSPSAAASSSWHTLFTAEEGQLYGPCGGGGGGGGGGGGGGGGGGGGGGGGEAGAVAPYGYTRPPQGLAGQESDFTAPD.... The pIC50 is 7.5. (2) The compound is CC1=C(CCC(=O)O)c2cc3nc(cc4[nH]c(cc5[nH]c(cc1n2)c(C)c5C(COC(=O)C12C[C@H]5C[C@@H](C1)C[C@@H](C2)C5)OC(=O)C12C[C@H]5C[C@@H](C1)C[C@@H](C2)C5)c(C)c4C(COC(=O)C12C[C@H]4C[C@@H](C1)C[C@@H](C2)C4)OC(=O)C12C[C@H]4C[C@@H](C1)C[C@@H](C2)C4)C(C)=C3CCC(=O)O. The target protein sequence is MTGDTPINIFGRNILTALGMSLNLPVARIEPIKITLKPGKDGPRLKQWPLTKEKVEALKEICEKMEKEGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNRVTQDFTEIQLGIPHPAGLAKKKRITVLDVGDAYFSIPLYEDFRPYTAFTLPSVNNVEPGKRYIYKVLPQGWKGSPAIFQYTMRQILEPFRKANPDVILIQYMDDILIASDRTGLEHDKVVLQLKELLNGLGFSTPEEKFQKDPPFQWMGYELWPTKWKLQKIQLPQKETWTVNDIQKLVGILNWAAQIYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENRIILDQEQEGHYYQEEKELEATIQKSQDNQWTYKIHQEEKILKVGKYAKIKNTHTNGVRLLAQVVQKIGKEALVIWGRIPKFHLPVERETWEQWWDNYWQVTWIPEWDFVSTPPLVRLTFNLVGDPIPGTETFYTDGSCNRQSKEGKAGYVTDRGRDKVRVLEQTTNQQA.... The pIC50 is 4.9. (3) The small molecule is Cl.N=C(N)N1CC2C3CCC(C4CCC43)C2C1. The target protein (P0DOF5) has sequence MSLLTEVETPIRNEWGCRCNDSSDPLVVAASIIGILHLILWILDRLFFKCIYRFFEHGLKRGPSTEGVPESMREEYRKEQQSAVDADDSHFVSIELE. The pIC50 is 5.7.
(4) The compound is CCCc1nn(C)c2c(=O)[nH]c(-c3cc(S(=O)(=O)N4CCN(C)CC4)ccc3OCC)nc12. The target protein sequence is EETRELQSLAAAVVPSAQTLKITDFSFSDFELSDLETALCTIRMFTDLNLVQNFQMKHEVLCRWILSVKKNYRKNVAYHNWRHAFNTAQCMFAALKAGKIQNKLTDLEILALLIAALSHDLDHRGVNNSYIQRSEHPLAQLYCHSIMEHHHFDQCLMILNSPGNQILSGLSIEEYKTTLKIIKQAILATDLALYIKRRGEFFELIRKNQFNLEDPHQKELFLAMLMTACDLSAITKPWPIQQRIAELVATEFFDQGDRERKELNIEPTDLMNREKKNKIPSMQVGFIDAICLQLYEALTHVSEDCFPLLDGCRKNRQKWQALAEQQ. The pIC50 is 8.6. (5) The drug is CCNC(=O)c1ccc(C(=O)NCCC2CCN(c3ncnc4cc(C(N)=O)sc34)CC2)s1. The target protein (Q96EB6) has sequence MADEAALALQPGGSPSAAGADREAASSPAGEPLRKRPRRDGPGLERSPGEPGGAAPEREVPAAARGCPGAAAAALWREAEAEAAAAGGEQEAQATAAAGEGDNGPGLQGPSREPPLADNLYDEDDDDEGEEEEEAAAAAIGYRDNLLFGDEIITNGFHSCESDEEDRASHASSSDWTPRPRIGPYTFVQQHLMIGTDPRTILKDLLPETIPPPELDDMTLWQIVINILSEPPKRKKRKDINTIEDAVKLLQECKKIIVLTGAGVSVSCGIPDFRSRDGIYARLAVDFPDLPDPQAMFDIEYFRKDPRPFFKFAKEIYPGQFQPSLCHKFIALSDKEGKLLRNYTQNIDTLEQVAGIQRIIQCHGSFATASCLICKYKVDCEAVRGDIFNQVVPRCPRCPADEPLAIMKPEIVFFGENLPEQFHRAMKYDKDEVDLLIVIGSSLKVRPVALIPSSIPHEVPQILINREPLPHLHFDVELLGDCDVIINELCHRLGGEYAKL.... The pIC50 is 8.4. (6) The compound is CCO[C@@H](Cc1ccc(OCc2cc(CO)ccn2)cc1)C(=O)N(C)OC. The target is CKENALLRYLLDKDD. The pIC50 is 4.6. (7) The pIC50 is 6.1. The small molecule is CCCCCNC(=O)N1CCC(CN(Cc2ccc(Cl)cc2)Cc2ccc([N+](=O)[O-])s2)C1. The target protein (P20393) has sequence MTTLDSNNNTGGVITYIGSSGSSPSRTSPESLYSDNSNGSFQSLTQGCPTYFPPSPTGSLTQDPARSFGSIPPSLSDDGSPSSSSSSSSSSSSFYNGSPPGSLQVAMEDSSRVSPSKSTSNITKLNGMVLLCKVCGDVASGFHYGVHACEGCKGFFRRSIQQNIQYKRCLKNENCSIVRINRNRCQQCRFKKCLSVGMSRDAVRFGRIPKREKQRMLAEMQSAMNLANNQLSSQCPLETSPTQHPTPGPMGPSPPPAPVPSPLVGFSQFPQQLTPPRSPSPEPTVEDVISQVARAHREIFTYAHDKLGSSPGNFNANHASGSPPATTPHRWENQGCPPAPNDNNTLAAQRHNEALNGLRQAPSSYPPTWPPGPAHHSCHQSNSNGHRLCPTHVYAAPEGKAPANSPRQGNSKNVLLACPMNMYPHGRSGRTVQEIWEDFSMSFTPAVREVVEFAKHIPGFRDLSQHDQVTLLKAGTFEVLMVRFASLFNVKDQTVMFLSR....
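The pIC50 definition in the task header can be made concrete with a small conversion helper. The following is a minimal Python sketch, not part of the dataset itself: the function names are hypothetical, and the assumption that raw IC50 values arrive in nanomolar (the unit BindingDB commonly reports affinities in) should be checked against the actual data source before use.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """pIC50 = -log10(IC50 in mol/L). IC50 must be a positive concentration."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in M")
    return -math.log10(ic50_molar)


def nanomolar_to_pic50(ic50_nm: float) -> float:
    # Assumption: the raw value is in nM (1 nM = 1e-9 M).
    return ic50_to_pic50(ic50_nm * 1e-9)


def pic50_to_ic50_nm(pic50: float) -> float:
    # Inverse transform: IC50 (M) = 10^(-pIC50), then scale M -> nM.
    return 10 ** (-pic50) * 1e9


if __name__ == "__main__":
    # Interpret a few of the labels listed above as IC50 concentrations.
    for label in (8.6, 7.5, 4.9):
        print(f"pIC50 {label} -> IC50 of about {pic50_to_ic50_nm(label):.1f} nM")
```

Because the scale is logarithmic, each unit of pIC50 is a tenfold change in IC50: the label 7.5 corresponds to roughly 31.6 nM, while 4.9 corresponds to roughly 12.6 µM, about 400 times weaker.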