Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The target value is pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50, drug-target binding data from BindingDB based on IC50 measurements. (1) The compound is Cc1ccc2c(Cl)c(C(=O)Nc3cc(C)on3)sc2c1. The target protein (Q9Y5X4) has sequence METRPTALMSSTVAAAAPAAGAASRKESPGRWGLGEDPTGVSPSLQCRVCGDSSSGKHYGIYACNGCSGFFKRSVRRRLIYRCQVGAGMCPVDKAHRNQCQACRLKKCLQAGMNQDAVQNERQPRSTAQVHLDSMESNTESRPESLVAPPAPAGRSPRGPTPMSAARALGHHFMASLITAETCAKLEPEDADENIDVTSNDPEFPSSPYSSSSPCGLDSIHETSARLLFMAVKWAKNLPVFSSLPFRDQVILLEEAWSELFLLGAIQWSLPLDSCPLLAPPEASAAGGAQGRLTLASMETRVLQETISRFRALAVDPTEFACMKALVLFKPETRGLKDPEHVEALQDQSQVMLSQHSKAHHPSQPVRFGKLLLLLPSLRFITAERIELLFFRKTIGNTPMEKLLCDMFKN. The pIC50 is 4.9. (2) The drug is NS(=O)(=O)c1ccccc1NS(=O)(=O)c1ccc(C2CCCCC2)cc1. The target protein (Q9H7Z7) has sequence MDPAARVVRALWPGGCALAWRLGGRPQPLLPTQSRAGFAGAAGGPSPVAAARKGSPRLLGAAALALGGALGLYHTARWHLRAQDLHAERSAAQLSLSSRLQLTLYQYKTCPFCSKVRAFLDFHALPYQVVEVNPVRRAEIKFSSYRKVPILVAQEGESSQQLNDSSVIISALKTYLVSGQPLEEIITYYPAMKAVNEQGKEVTEFGNKYWLMLNEKEAQQVYGGKEARTEEMKWRQWADDWLVHLISPNVYRTPTEALASFDYIVREGKFGAVEGAVAKYMGAAAMYLISKRLKSRHRLQDNVREDLYEAADKWVAAVGKDRPFMGGQKPNLADLAVYGVLRVMEGLDAFDDLMQHTHIQPWYLRVERAITEASPAH. The pIC50 is 5.8. (3) The compound is CCCCNC(=O)OCC#CCOc1nc(-c2ccncc2)nc(NS(=O)(=O)c2ccc(C(C)C)cn2)c1Oc1ccccc1OC. The target protein (P24530) has sequence MQPPPSLCGRALVALVLACGLSRIWGEERGFPPDRATPLLQTAEIMTPPTKTLWPKGSNASLARSLAPAEVPKGDRTAGSPPRTISPPPCQGPIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIVIDIPINVYKLLAEDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSRIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDIITMDYKGSYLRICLLHPVQKTAFMQFYKTAKDWWLFSFYFCLPLAITAFFYTLMTCEMLRKKSGMQIALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYNQNDPNRCELLSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFEEKQSLEEKQSCLKFKANDHGYDNFRSSNKYSSS. The pIC50 is 7.7.
(4) The small molecule is CS(=O)(=O)c1ccc(/C=C2/C(=O)Nc3ccc(F)cc32)cc1. The target protein (O02768) has sequence MLARALLLCAAVALSHAANPCCSNPCQNRGVCMTMGFDQYKCDCTRTGFYGENCSTPEFLTRIKLLLKPTPDTVHYILTHFKGVWNIVNSIPFLRNSIMKYVLTSRSHMIDSPPTYNVHYNYKSWEAFSNLSYYTRALPPVADDCPTPMGVKGKKELPDSKDVVEKLLLRRKFIPDPQGTNMMFAFFAQHFTHQFFKTDLKRGPAFTKGLGHGVDLNHIYGETLDRQHKLRLFKDGKMKYQVIDGEVYPPTVKDTQVEMIYPPHIPAHLQFAVGQEVFGLVPGLMMYATIWLREHNRVCDVLKQEHPEWDDEQLFQTSRLILIGETIKIVIEDYVQHLSGYHFKLKFDPELLFNQQFQYQNRIAAEFNTLYHWHPLLPDTFQIDDQQYNYQQFLYNNSILLEHGLTQFVESFTRQIAGRVAGGRNVPPAVQKVAKASIDQSRQMKYQSLNEYRKRFLLKPYESFEELTGEKEMAAELEALYGDIDAVELYPALLVERPRP.... The pIC50 is 4.3. (5) The drug is COc1ccc(CN2CCc3c(sc(NC(=O)c4cc(OCCN=C(N)N)ccc4Cl)c3C#N)C2)cc1. The target protein sequence is MKLTIHEIAQVVGAKNDISIFEDTQLEKAEFDSRLIGTGDLFVPLKGARDGHDFIETAFENGAAVTLSEKEVSNHPYILVDDVLTAFQSLASYYLEKTTVDVFAVTGSNGKTTTKDMLAHLLSTRYKTYKTQGNYNNEIGLPYTVLHMPEGTEKLVLEMGQDHLGDIHLLSELARPKTAIVTLVGEAHLAFFKDRSEIAKGKMQIADGMASGSLLLAPADPIVEDYLPTDKKVVRFGQGAELEITDLVERKDSLTFKANFLEQVLDLPVTGKYNATNAMIASYVALQEGVSEEQIHQAFQDLELTRNRTEWKKAANGADILSDVYNANPTAMKLILETFSAIPANEGGKKIAVLADMKELGNQSVQLHNQMILSLSPDVLDTVIFYGEDIAELAQLASQMFPIGHVYYFKKTEDQDQFEDLVKQVKESLSANDQILLKGSNSMNLAMLVESLENETK. The pIC50 is 4.2. (6) The drug is CC(O)C(O)C1CNc2c(nc(N)[nH]c2=O)N1. The target protein (P00439) has sequence MSTAVLENPGLGRKLSDFGQETSYIEDNCNQNGAISLIFSLKEEVGALAKVLRLFEENDVNLTHIESRPSRLKKDEYEFFTHLDKRSLPALTNIIKILRHDIGATVHELSRDKKKDTVPWFPRTIQELDRFANQILSYGAELDADHPGFKDPVYRARRKQFADIAYNYRHGQPIPRVEYMEEEKKTWGTVFKTLKSLYKTHACYEYNHIFPLLEKYCGFHEDNIPQLEDVSQFLQTCTGFRLRPVAGLLSSRDFLGGLAFRVFHCTQYIRHGSKPMYTPEPDICHELLGHVPLFSDRSFAQFSQEIGLASLGAPDEYIEKLATIYWFTVEFGLCKQGDSIKAYGAGLLSSFGELQYCLSEKPKLLPLELEKTAIQNYTVTEFQPLYYVAESFNDAKEKVRNFAATIPRPFSVRYDPYTQRIEVLDNTQQLKILADSINSEIGILCSALQKIK. The pIC50 is 6.0. (7) The compound is O=[N+]([O-])c1ccccc1CO[C@@H](CO)[C@H](O)C[S+]1C[C@@H](O)[C@H](O)[C@H]1CO.[Cl-]. 
The target protein (P23739) has sequence MAKKKFSALEISLIVLFIIVTAIAIALVTVLATKVPAVEEIKSPTPTSNSTPTSTPTSTSTPTSTSTPSPGKCPPEQGEPINERINCIPEQHPTKAICEERGCCWRPWNNTVIPWCFFADNHGYNAESITNENAGLKATLNRIPSPTLFGEDIKSVILTTQTQTGNRFRFKITDPNNKRYEVPHQFVKEETGIPAADTLYDVQVSENPFSIKVIRKSNNKVLCDTSVGPLLYSNQYLQISTRLPSEYIYGFGGHIHKRFRHDLYWKTWPIFTRDEIPGDNNHNLYGHQTFFMGIGDTSGKSYGVFLMNSNAMEVFIQPTPIITYRVTGGILDFYIFLGDTPEQVVQQYQEVHWRPAMPAYWNLGFQLSRWNYGSLDTVSEVVRRNREAGIPYDAQVTDIDYMEDHKEFTYDRVKFNGLPEFAQDLHNHGKYIIILDPAISINKRANGAEYQTYVRGNEKNVWVNESDGTTPLIGEVWPGLTVYPDFTNPQTIEWWANECN.... The pIC50 is 6.7. (8) The compound is O=P(O)(O)C(O)(Cn1ccnc1)P(=O)(O)O. The target protein sequence is MEAKIDELINNDPVWSSQNESLISKPYNHILLKPGKNFRLNLIVQINRVMNLPKDQLAIVSQIVELLHNSSLLIDDIEDNAPLRRGQTTSHLIFGVPSTINTANYMAARAMQLVSQLTTKEPLYHNLITIFNEELINLHRGQGLDIYWRDFLPEIIPTQEMYLNMVMNKTGGLFRLTLRLMEALSPSSHHGHSLVPFINLLGIIYQIRDDYLNLKDFQMSSEKGFAEDITEGKLSFPIVHALNFTKTKGQTEQHNEILRILLLRTSDKDIKLKLIQILEFDTNSLAYTKNFINQLVNMIKNDNENKYLPDLASHSDTATNLHDELLYIIDHLSEL. The pIC50 is 6.2. (9) The small molecule is O=C(O)CCC(=O)N1N=C(c2c(-c3ccc(F)cc3)c3ccccc3[nH]c2=O)CC1c1ccc(Cl)cc1. The target protein (Q62645) has sequence MRGAGGPRGPRGPAKMLLLLALACASPFPEEVPGPGAVGGGTGGARPLNVALVFSGPAYAAEAARLGPAVAAAVRSPGLDVRPVALVLNGSDPRSLVLQLCDLLSGLRVHGVVFEDDSRAPAVAPILDFLSAQTSLPIVAVHGGAALVLTPKEKGSTFLQLGSSTEQQLQVIFEVLEEYDWTSFVAVTTRAPGHRAFLSYIEVLTDGSLVGWEHRGALTLDPGAGEAVLGAQLRSVSAQIRLLFCAREEAEPVFRAAEEAGLTGPGYVWFMVGPQLAGGGGSGVPGEPLLLPGGSPLPAGLFAVRSAGWRDDLARRVAAGVAVVARGAQALLRDYGFLPELGHDCRTQNRTHRGESLHRYFMNITWDNRDYSFNEDGFLVNPSLVVISLTRDRTWEVVGSWEQQTLRLKYPLWSRYGRFLQPVDDTQHLTVATLEERPFVIVEPADPISGTCIRDSVPCRSQLNRTHSPPPDAPRPEKRCCKGFCIDILKRLAHTIGFSY.... The pIC50 is 6.2. (10) The drug is Clc1ccc2c3[nH]ncc3n(-c3cccnc3)c2c1. 
The target protein (P11137) has sequence MADERKDEAKAPHWTSAPLTEASAHSHPPEIKDQGGAGEGLVRSANGFPYREDEEGAFGEHGSQGTYSNTKENGINGELTSADRETAEEVSARIVQVVTAEAVAVLKGEQEKEAQHKDQTAALPLAAEETANLPPSPPPSPASEQTVTVEEDLLTASKMEFHDQQELTPSTAEPSDQKEKESEKQSKPGEDLKHAALVSQPETTKTYPDKKDMQGTEEEKAPLALFGHTLVASLEDMKQKTEPSLVVPGIDLPKEPPTPKEQKDWFIEMPTEAKKDEWGLVAPISPGPLTPMREKDVFDDIPKWEGKQFDSPMPSPFQGGSFTLPLDVMKNEIVTETSPFAPAFLQPDDKKSLQQTSGPATAKDSFKIEEPHEAKPDKMAEAPPSEAMTLPKDAHIPVVEEHVMGKVLEEEKEAINQETVQQRDTFTPSGQEPILTEKETELKLEEKTTISDKEAVPKESKPPKPADEEIGIIQTSTEHTFSEQKDQEPTTDMLKQDSFP.... The pIC50 is 8.3.
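
The pIC50 labels above follow the stated definition, pIC50 = -log10(IC50 in M). As a minimal sketch of that conversion (the function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical and are not part of the dataset or of any BindingDB tooling), the relationship can be written in Python as:

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # Labels taken from examples (5) and (10) above.
    for label in (4.2, 8.3):
        print(f"pIC50 {label} -> IC50 of about {pic50_to_ic50_nm(label):.3g} nM")
```

Under this definition, the pIC50 of 8.3 in example (10) corresponds to an IC50 of roughly 5 nM, while the pIC50 of 4.2 in example (5) corresponds to roughly 63 µM, which is why higher values indicate more potent compounds.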