Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is Cn1c(CCCC(=O)O)nc2cc(N(CCCl)CCCl)ccc21. The target protein (Q969S8) has sequence MGTALVYHEDMTATRLLWDDPECEIERPERLTAALDRLRQRGLEQRCLRLSAREASEEELGLVHSPEYVSLVRETQVLGKEELQALSGQFDAIYFHPSTFHCARLAAGAGLQLVDAVLTGAVQNGLALVRPPGHHGQRAAANGFCVFNNVAIAAAHAKQKHGLHRILVVDWDVHHGQGIQYLFEDDPSVLYFSWHRYEHGRFWPFLRESDADAVGRGQGLGFTVNLPWNQVGMGNADYVAAFLHLLLPLAFEFDPELVLVSAGFDSAIGDPEGQMQATPECFAHLTQLLQVLAGGRVCAVLEGGYHLESLAESVCMTVQTLLGDPAPPLSGPMAPCQSALESIQSARAAQAPHWKSLQQQDVTAVPMSPSSHSPEGRPPPLLPGGPVCKAAASAPSSLLDQPCLCPAPSVRTAVALTTPDITLVLPPDVIQQEASALREETEAWARPHESLAREEALTALGKLLYLLDGMLDGQVNSGIAATPASAAAATLDVAVRRGLS.... The pIC50 is 7.1. (2) The compound is CNc1cc(C2(C(=O)Nc3cc(C(=O)Nc4ccc(OC5CCN(C)CC5)c(C(F)(F)F)c4)ccc3C)CC2)ncn1. The target protein (P10398) has sequence MEPPRGPPANGAEPSRAVGTVKVYLPNKQRTVVTVRDGMSVYDSLDKALKVRGLNQDCCVVYRLIKGRKTVTAWDTAIAPLDGEELIVEVLEDVPLTMHNFVRKTFFSLAFCDFCLKFLFHGFRCQTCGYKFHQHCSSKVPTVCVDMSTNRQQFYHSVQDLSGGSRQHEAPSNRPLNELLTPQGPSPRTQHCDPEHFPFPAPANAPLQRIRSTSTPNVHMVSTTAPMDSNLIQLTGQSFSTDAAGSRGGSDGTPRGSPSPASVSSGRKSPHSKSPAEQRERKSLADDKKKVKNLGYRDSGYYWEVPPSEVQLLKRIGTGSFGTVFRGRWHGDVAVKVLKVSQPTAEQAQAFKNEMQVLRKTRHVNILLFMGFMTRPGFAIITQWCEGSSLYHHLHVADTRFDMVQLIDVARQTAQGMDYLHAKNIIHRDLKSNNIFLHEGLTVKIGDFGLATVKTRWSGAQPLEQPSGSVLWMAAEVIRMQDPNPYSFQSDVYAYGVVLY.... The pIC50 is 7.4. (3) The small molecule is Cc1ccc(-c2nc(CSc3nc(N)cc(N)n3)cs2)cc1OCCF. The pIC50 is 5.6. The target protein (P43346) has sequence MATPPKRFCPSPSTSSEGTRIKKISIEGNIAAGKSTFVNILKQASEDWEVVPEPVARWCNVQSTQEEFEELTTSQKSGGNVLQMMYEKPERWSFTFQSYACLSRIRAQLASLNGKLKDAEKPVLFFERSVYSDRYIFASNLYESDCMNETEWTIYQDWHDWMNSQFGQSLELDGIIYLRATPEKCLNRIYLRGRNEEQGIPLEYLEKLHYKHESWLLHRTLKTSFDYLQEVPVLTLDVNEDFKDKHESLVEKVKEFLSTL. 
(4) The drug is Cc1ccc(CNc2nc(N)nc3[nH]c4cc(C)c(O)cc4c23)cc1. The target protein (P51617) has sequence MAGGPGPGEPAAPGAQHFLYEVPPWVMCRFYKVMDALEPADWCQFAALIVRDQTELRLCERSGQRTASVLWPWINRNARVADLVHILTHLQLLRARDIITAWHPPAPLPSPGTTAPRPSSIPAPAEAEAWSPRKLPSSASTFLSPAFPGSQTHSGPELGLVPSPASLWPPPPSPAPSSTKPGPESSVSLLQGARPFPFCWPLCEISRGTHNFSEELKIGEGGFGCVYRAVMRNTVYAVKRLKENADLEWTAVKQSFLTEVEQLSRFRHPNIVDFAGYCAQNGFYCLVYGFLPNGSLEDRLHCQTQACPPLSWPQRLDILLGTARAIQFLHQDSPSLIHGDIKSSNVLLDERLTPKLGDFGLARFSRFAGSSPSQSSMVARTQTVRGTLAYLPEEYIKTGRLAVDTDTFSFGVVVLETLAGQRAVKTHGARTKYLKDLVEEEAEEAGVALRSTQSTLQAGLAADAWAAPIAMQIYKKHLDPRPGPCPPELGLGLGQLACCC.... The pIC50 is 4.0. (5) The compound is O=c1ccn(COCCCCNS(=O)(=O)c2cccc(OCC3CC3)c2)c(=O)[nH]1. The target protein (P33316) has sequence MTPLCPRPALCYHFLTSLLRSAMQNARGARQRAEAAVLSGPGPPLGRAAQHGIPRPLSSAGRLSQGCRGASTVGAAGWKGELPKAGGSPAPGPETPAISPSKRARPAEVGGMQLRFARLSEHATAPTRGSARAAGYDLYSAYDYTIPPMEKAVVKTDIQIALPSGCYGRVAPRSGLAAKHFIDVGAGVIDEDYRGNVGVVLFNFGKEKFEVKKGDRIAQLICERIFYPEIEEVQALDDTERGSGGFGSTGKN. The pIC50 is 6.2. (6) The drug is CCCCOCCOc1ccc(-c2ccc3c(c2)C=C(C(=O)Nc2ccc([S@@](=O)Cc4cncn4CCC)cc2)CCCN3CC(C)C)cc1. The target protein (P32248) has sequence MDLGKPMKSVLVVALLVIFQVCLCQDEVTDDYIGDNTTVDYTLFESLCSKKDVRNFKAWFLPIMYSIICFVGLLGNGLVVLTYIYFKRLKTMTDTYLLNLAVADILFLLTLPFWAYSAAKSWVFGVHFCKLIFAIYKMSFFSGMLLLLCISIDRYVAIVQAVSAHRHRARVLLISKLSCVGIWILATVLSIPELLYSDLQRSSSEQAMRCSLITEHVEAFITIQVAQMVIGFLVPLLAMSFCYLVIIRTLLQARNFERNKAIKVIIAVVVVFIVFQLPYNGVVLAQTVANFNITSSTCELSKQLNIAYDVTYSLACVRCCVNPFLYAFIGVKFRNDLFKLFKDLGCLSQEQLRQWSSCRHIRRSSMSVEAETTTTFSP. The pIC50 is 5.0. (7) The pIC50 is 5.0. The drug is O=[N+]([O-])c1ccc2c(N=Nc3ccc4ccccc4c3O)c(O)cc(S(=O)(=O)O)c2c1. The target protein (P9WMN1) has sequence MLRVAVPNKGALSEPATEILAEAGYRRRTDSKDLTVIDPVNNVEFFFLRPKDIAIYVGSGELDFGITGRDLVCDSGAQVRERLALGFGSSSFRYAAPAGRNWTTADLAGMRIATAYPNLVRKDLATKGIEATVIRLDGAVEISVQLGVADAIADVVGSGRTLSQHDLVAFGEPLCDSEAVLIERAGTDGQDQTEARDQLVARVQGVVFGQQYLMLDYDCPRSALKKATAITPGLESPTIAPLADPDWVAIRALVPRRDVNGIMDELAAIGAKAILASDIRFCRF. 
(8) The drug is O=C(c1ccc(OCCN2CCCCC2)cc1)c1c(-c2ccc(O)cc2)sc2cc(O)ccc12. The target protein (Q62986) has sequence MEIKNSPSSLSSPASYNCSQSILPLEHGPIYIPSSYVDNRHEYSAMTFYSPAVMNYSVPGSTSNLDGGPVRLSTSPNVLWPTSGHLSPLATHCQSSLLYAEPQKSPWCEARSLEHTLPVNRETLKRKLSGSSCASPVTSPNAKRDAHFCPVCSDYASGYHYGVWSCEGCKAFFKRSIQGHNDYICPATNQCTIDKNRRKSCQACRLRKCYEVGMVKCGSRRERCGYRIVRRQRSSSEQVHCLSKAKRNGGHAPRVKELLLSTLSPEQLVLTLLEAEPPNVLVSRPSMPFTEASMMMSLTKLADKELVHMIGWAKKIPGFVELSLLDQVRLLESCWMEVLMVGLMWRSIDHPGKLIFAPDLVLDRDEGKCVEGILEIFDMLLATTSRFRELKLQHKEYLCVKAMILLNSSMYPLASANQEAESSRKLTHLLNAVTDALVWVIAKSGISSQQQSVRLANLLMLLSHVRHISNKGMEHLLSMKCKNVVPVYDLLLEMLNAHTL.... The pIC50 is 8.5. (9) The compound is O=C(COc1ccc(-c2oc3cc(O)c(C(=O)O)cc3c2C#Cc2cccc(Cl)c2)cc1)NC1CC1. The target protein (Q9Y2R2) has sequence MDQREILQKFLDEAQSKKITKEEFANEFLKLKRQSTKYKADKTYPTTVAEKPKNIKKNRYKDILPYDYSRVELSLITSDEDSSYINANFIKGVYGPKAYIATQGPLSTTLLDFWRMIWEYSVLIIVMACMEYEMGKKKCERYWAEPGEMQLEFGPFSVSCEAEKRKSDYIIRTLKVKFNSETRTIYQFHYKNWPDHDVPSSIDPILELIWDVRCYQEDDSVPICIHCSAGCGRTGVICAIDYTWMLLKDGIIPENFSVFSLIREMRTQRPSLVQTQEQYELVYNAVLELFKRQMDVIRDKHSGTESQAKHCIPEKNHTLQADSYSPNLPKSTTKAAKMMNQQRTKMEIKESSSFDFRTSEISAKEELVLHPAKSSTSFDFLELNYSFDKNADTTMKWQTKAFPIVGEPLQKHQSLDLGSLLFEGCSNSKPVNAAGRYFNSKVPITRTKSTPFELIQQRETKEVDSKENFSYLESQPHDSCFVEMQAQKVMHVSSAELNYS.... The pIC50 is 6.6. (10) The small molecule is Cc1cnc(C(=O)N[C@@H](Cc2nc[nH]c2Br)C(=O)N2CCC[C@H]2C(N)=O)cn1. The target protein sequence is MDGPSNVSLVHGDTTLGLPEYKVVSVLLVLLVCTVGIVGNAMVVLVVLTSRDMHTPTNCYLVSLALADLIVLLAAGLPNVSDSLVGHWIYGHAGCLGITYFQYLGINVSSCSILAFTVERYIAICHPMRAQTVCTVARARRIIAGIWGVTSLYCLLWFFLVDLNVRDNQRLECGYKVSRGLYLPIYLLDFAVFFIAPLLGTLVLYGFIGRILFQSPLSQEAWQKERQSHGQSEGTPGNCSRSKSSMSSRKQ. The pIC50 is 4.3.
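The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset tooling; the function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical helpers introduced here, and the input is assumed to be already expressed in mol/L.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the transform and report the IC50 in nanomolar.

    IC50 in M is 10**(-pIC50); multiplying by 1e9 gives nM.
    """
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # 100 nM = 1e-7 M
    print(round(ic50_to_pic50(1e-7), 3))
    # pIC50 values taken from the records above
    for value in (7.1, 4.0, 8.5):
        print(value, round(pic50_to_ic50_nm(value), 1), "nM")
```

Read this way, a pIC50 of 7.1 corresponds to an IC50 of roughly 79 nM, while a pIC50 of 4.0 corresponds to 100 µM, which is why higher values mean more potent compounds.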