Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The compound is O=S(=O)(Cc1noc(CSc2ccc(Cl)cc2)n1)c1ccccn1. The target protein sequence is MSEDAGLPVPRSQWVERGVSCATCGKRFSLFTAKSNCPCCGKLCCSDCVQAECAIVGGSAPSKVCIDCFSMLQSRRRVEPDEGSSFREFNAASAFPLQTRLLADGRVESGETSRVSPPNDGRVQHVSRANGYSNSLPVLDEYVDDLLRKSELLRMENDVLLNRLREQEAEIHALRLERDRAVARIVPDGGSMAGRSGLPQVSDEIVKELRGELAVAHLRIESVKRELKNALDRAKSSETMVRNLKQGLCNYKEEVVRPLQSREEVEMLPGVNGRRDMISTRRLPPSIVQDTILAVVPPKSCAAIGTDVDLRDWGFDTFEVASRVPSVLQSVAMHVALAWNFFASQEEAQKWAFLVAAVENNYRPNPYHNAIHAADVLQGTFSLVSAAKPLMEHLTPLECKAAAFAALTHDVCHPGRTNAFLAAVQDPVSFKFSGKGTLEQLHTVTAFELLNVTEFDFTSSMDNASFLEFKNIVSHLIGHTDMSLHSETIAKHGAKLSAGG.... The pIC50 is 5.0. (2) The drug is COc1ccc(C2=NNC3(S2)C(=O)N(c2ccccc2C)c2ccc(C)cc23)cc1. The pIC50 is 4.7. The target protein (P39900) has sequence MKFLLILLLQATASGALPLNSSTSLEKNNVLFGERYLEKFYGLEINKLPVTKMKYSGNLMKEKIQEMQHFLGLKVTGQLDTSTLEMMHAPRCGVPDVHHFREMPGGPVWRKHYITYRINNYTPDMNREDVDYAIRKAFQVWSNVTPLKFSKINTGMADILVVFARGAHGDFHAFDGKGGILAHAFGPGSGIGGDAHFDEDEFWTTHSGGTNLFLTAVHEIGHSLGLGHSSDPKAVMFPTYKYVDINTFRLSADDIRGIQSLYGDPKENQRLPNPDNSEPALCDPNLSFDAVTTVGNKIFFFKDRFFWLKVSERPKTSVNLISSLWPTLPSGIEAAYEIEARNQVFLFKDDKYWLISNLRPEPNYPKSIHSFGFPNFVKKIDAAVFNPRFYRTYFFVDNQYWRYDERRQMMDPGYPKLITKNFQGIGPKIDAVFYSKNKYYYFFQGSNQFEYDFLLQRITKTLKSNSWFGC. (3) The drug is CS(=O)(=O)c1ccc2nc(-c3ccc(-c4ccc(-c5ccccc5)s4)cc3)[nH]c2c1. 
The target protein (P07308) has sequence MPAHMLQEISSSYTTTTTITEPPSGNLQNGREKMKKVPLYLEEDIRPEMREDIHDPSYQDEEGPPPKLEYVWRNIILMALLHVGALYGITLIPSSKVYTLLWGIFYYLISALGITAGAHRLWSHRTYKARLPLRIFLIIANTMAFQNDVYEWARDHRAHHKFSETHADPHNSRRGFFFSHVGWLLVRKHPAVKEKGGKLDMSDLKAEKLVMFQRRYYKPGLLLMCFILPTLVPWYCWGETFLHSLFVSTFLRYTLVLNATWLVNSAAHLYGYRPYDKNIQSRENILVSLGAVGEGFHNYHHAFPYDYSASEYRWHINFTTFFIDCMAALGLAYDRKKVSKAAVLARIKRTGDGSHKSS. The pIC50 is 4.7. (4) The compound is CO/C(=C/C=C/c1cc2cc(Cl)c(Cl)cc2[nH]1)C(=O)NC1CC(C)(C)N(C)C(C)(C)C1. The target protein (P18434) has sequence MAALQEKKSCSQRMEEFQRYCWNPDTGQMLGRTLSRWVWISLYYVAFYVVMSGIFALCIYVLMRTIDPYTPDYQDQLKSPGVTLRPDVYGEKGLDISYNVSDSTTWAGLAHTLHRFLAGYSPAAQEGSINCTSEKYFFQESFLAPNHTKFSCKFTADMLQNCSGRPDPTFGFAEGKPCFIIKMNRIVKFLPGNSTAPRVDCAFLDQPRDGPPLQVEYFPANGTYSLHYFPYYGKKAQPHYSNPLVAAKLLNVPRNRDVVIVCKILAEHVSFDNPHDPYEGKVEFKLKIQK. The pIC50 is 5.6. (5) The compound is Cc1cncc2cccc(S(=O)(=O)N3CCCNC[C@@H]3C)c12. The target protein sequence is MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPAQNTAHLDQFERIKTIGTGSFGRVMLVKHMETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYMPGGDMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIKVADFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFSEF. The pIC50 is 6.8.
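The pIC50 definition in the task description (pIC50 = -log10(IC50 in M)) can be sketched as a small conversion helper. This is a minimal illustration, not part of the dataset tooling. The function names `ic50_molar_to_pic50` and `pic50_to_ic50_nanomolar` are hypothetical, and the helper assumes the IC50 has already been converted to molar units.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nanomolar(pic50: float) -> float:
    """Invert the transform and report the IC50 in nM (1 M = 1e9 nM)."""
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # A pIC50 of 5.0, as in example (1), corresponds to an IC50 of 1e-5 M.
    print(ic50_molar_to_pic50(1e-5))
    print(pic50_to_ic50_nanomolar(5.0))
```

Under this definition each unit increase in pIC50 is a tenfold decrease in IC50, so the pIC50 of 6.8 in example (5) indicates a much more potent compound than the 4.7 values in examples (2) and (3).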