Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The compound is COc1cc(Cc2cnc(N)nc2N)cc(OC)c1OC. The target protein (Q54801) has sequence MTKKIVAIWAQDEEGVIGKENRLPWHLPAELQHFKETTLNHAILMGRVTFDGMGRRLLPKRETLILTRNPEEKIDGVATFQDVQSVLDWYQAQEKNLYIIGGKQIFQAFEPYLDEVIVTHIHARVEGDTYFPEELDLSLFETVSSKFYAKDEKNPYDFTIQYRKRKEV. The pIC50 is 5.5. (2) The small molecule is COC(=O)c1ccc(Oc2ccc(C=C3SC(O)=NC3=O)cc2OC)cc1C(F)(F)F. The target protein (P11474) has sequence MSSQVVGIEPLYIKAEPASPDSPKGSSETETEPPVALAPGPAPTRCLPGHKEEEDGEGAGPGEQGGGKLVLSSLPKRLCLVCGDVASGYHYGVASCEACKAFFKRTIQGSIEYSCPASNECEITKRRRKACQACRFTKCLRVGMLKEGVRLDRVRGGRQKYKRRPEVDPLPFPGPFPAGPLAVAGGPRKTAAPVNALVSHLLVVEPEKLYAMPDPAGPDGHLPAVATLCDLFDREIVVTISWAKSIPGFSSLSLSDQMSVLQSVWMEVLVLGVAQRSLPLQDELAFAEDLVLDEEGARAAGLGELGAALLQLVRRLQALRLEREEYVLLKALALANSDSVHIEDAEAVEQLREALHEALLEYEAGRAGPGGGAERRRAGRLLLTLPLLRQTAGKVLAHFYGVKLEGKVPMHKLFLEMLEAMMD. The pIC50 is 7.2. (3) The drug is O=C(Nc1cccc(OC(F)(F)F)c1)Nc1ccc(Oc2ccnc(NC(=O)C3CC3)c2)c(F)c1. The target protein sequence is KRANGGELKTGYLSIVMDPDELPLDEHCERLPYDASKWEFPRDRLKLGKPLGRGAFGQVIEADAFGIDKTATCRTVAVKMLKEGATHSEHRALMSELKILIHIGHHLNVVNLLGACTKPGGPLMVIVEFCKFGNLSTYLRSKRNEFVPYKTKGARFRQGKDYVGAIPVDLKRRLDSITSSQSSASSGFVEEKSLSDVEEEEAPEDLYKDFLTLEHLICYSFQVAKGMEFLASRKCIHRDLAARNILLSEKNVVKICDFGLARDIYKDPDYVRKGDARLPLKWMAPETIFDRVYTIQSDVWSFGVLLWEIFSLGASPYPGVKIDEEFCRRLKEGTRMRAPDYTTPEMYQTMLDCWHGEPSQRPTFSELVEHLGNLLQANAQQDGKDYIVLPISETLSMEEDSGLSLPTSPVSCMEEEEVCDPKFHYDNTAGISQYLQNSKRKSRPVSVKTFEDIPLEEPEVKVIPDDNQTDSGMVLASEELKTLEDRTKLSPSFGGMVPSK.... The pIC50 is 8.3. (4) The small molecule is CS(=O)(=O)Nc1ccc([N+](=O)[O-])cc1OC1CCCCC1. 
The target protein (Q9JM51) has sequence MPSPGLVMESGQVLPAFLLCSTLLVIKMYAVAVITGQMRLRKKAFANPEDALKRGGLQYYRSDPDVERCLRAHRNDMETIYPFLFLGFVYSFLGPNPLIAWIHFLVVLTGRVVHTVAYLGKLNPRLRSGAYVLAQFSCFSMALQILWEVAHHL. The pIC50 is 8.2. (5) The compound is CC(=O)N[C@@H]1[C@@H](N=C(N)N)C=C(C(=O)O)O[C@H]1[C@H](O)[C@H](O)CO. The target protein sequence is MNPNQKIITIGSICMVVGIVSLMLQIGNMISIWVSHSIQTGNQHQAEPIRNTNFLTENAVASVTLAGNSSLCPIRGWAVHSKDNSIRIGSKGDVFVIREPFISCSHLECRTFFLTQGALLNDKHSNGTVKDRSPHRTLMSCPVGEAPSPYNSRFESVAWSASACHDGTSWLTIGISGPDNGAVAVLKYNGIITDTIKSWRNNILRTQESECACVNGSCFTVMTDGPSNGQASYKIFKMEKGKVVKSVELNAPNYHYEECSCYPDAGEIICVCRDNWHGSNRPWVSFNQNLEYQIGYICSGVFGDNPRPNDGTGSCGPVSPNGAYGIKGFSFKYGNGVWIGRTKSTNSRSGFEMIWDPNGWTGTDSNFSMKQDIVAITDWSGYSGSFVQHPELTGLDCIRPCFWVELIRGRPKESTIWTSGSSISFCGVNSDTVSWSWPDGAELPFTIDK. The pIC50 is 9.1. (6) The target protein sequence is MNLTIKEEDFTNTFMKNEESFNTFRVTKVKRWNAKRLFKILFVTVFIVLAGGFSYYIFENFVFQKNRKINHIIKTSKYSTVGFNIENSYDRLMKTIKEHKLKNYIKESVKLFNKGLTKKSYLGSEFDNVELKDLANVLSFGEAKLGDNGQKFNFLFHTASSNVWVPSIKCTSESCESKNHYDSSKSKTYEKDDTPVKLTSKAGTISGIFSKDLVTIGKLSVPYKFIEMTEIVGFEPFYSESDVDGVFGLGWKDLSIGSIDPYIVELKTQNKIEQAVYSIYLPPENKNKGYLTIGGIEERFFDGPLNYEKLNHDLMWQVDLDVHFGNVSSKKANVILDSATSVITVPTEFFNQFVESASVFKVPFLSLYVTTCGNTKLPTLEYRSPNKVYTLEPKQYLEPLENIFSALCMLNIVPIDLEKNTFVLGDPFMRKYFTVYDYDNHTVGFALAKNL. The small molecule is Cc1cccc(C)c1OCC(=O)N[C@@H](Cc1ccccc1)[C@H](O)C(=O)N1CSC(C)(C)[C@H]1C(=O)N[C@H]1c2ccccc2C[C@H]1O. The pIC50 is 6.2.
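
The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be sketched in Python as below. This is a minimal illustration, not part of the dataset. The function names `ic50_to_pic50`, `pic50_to_ic50`, and `ic50_nm_to_pic50` are hypothetical, and the nanomolar helper assumes the raw IC50 values are reported in nM, which the text above does not state.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the transform: return the IC50 in mol/L for a given pIC50."""
    return 10.0 ** (-pic50)


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convenience wrapper, assuming the raw IC50 is given in nanomolar."""
    return ic50_to_pic50(ic50_nm * 1e-9)


if __name__ == "__main__":
    # A 1 micromolar IC50 is a pIC50 of 6; each unit of pIC50 is a
    # tenfold change in IC50, so higher values mean a more potent compound.
    print(round(ic50_to_pic50(1e-6), 3))
    # Example (1) above reports pIC50 = 5.5; recover its IC50 in mol/L.
    print(pic50_to_ic50(5.5))
```

Under this definition, the pIC50 of 5.5 in example (1) corresponds to an IC50 of roughly 3.2 micromolar, and the pIC50 of 9.1 in example (5) to roughly 0.8 nanomolar.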