Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them, expressed as pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50, drug-target binding data from BindingDB using IC50 measurements. (1) The pIC50 is 6.4. The compound is O=C(CCCCCCc1ccccc1)c1ncc(-c2ccc(CBr)cn2)o1. The target protein (Q8VCT4) has sequence MGLYPLIWLSLAACTAWGYPSSPPVVNTVKGKVLGKYVNLEGFTQPVAVFLGVPFAKPPLGSLRFAPPQPAEPWSFVKNTTSYPPMCSQDAVGGQVLSELFTNRKENIPLQFSEDCLYLNIYTPADLTKNSRLPVMVWIHGGGLVVGGASTYDGLALSAHENVVVVTIQYRLGIWGFFSTGDEHSRGNWGHLDQVAALRWVQDNIANFGGNPGSVTIFGESAGGFSVSVLVLSPLAKNLFHRAISESGVSLTAALITTDVKPIAGLVATLSGCKTTTSAVMVHCLRQKTEDELLETSLKLNLFKLDLLGNPKESYPFLPTVIDGVVLPKAPEEILAEKSFSTVPYIVGINKQEFGWIIPTLMGYPLAEGKLDQKTANSLLWKSYPTLKISENMIPVVAEKYLGGTDDLTKKKDLFQDLMADVVFGVPSVIVSRSHRDAGASTYMYEFEYRPSFVSAMRPKAVIGDHGDEIFSVFGSPFLKDGASEEETNLSKMVMKFWAN.... (2) The drug is CC(C)(C#N)c1cc(Cn2cncn2)cc(C(C)(C)C#N)c1. The target protein (P22310) has sequence MARGLQVPLPRLATGLLLLLSVQPWAESGKVLVVPTDGSPWLSMREALRELHARGHQAVVLTPEVNMHIKEEKFFTLTAYAVPWTQKEFDRVTLGYTQGFFETEHLLKRYSRSMAIMNNVSLALHRCCVELLHNEALIRHLNATSFDVVLTDPVNLCGAVLAKYLSIPAVFFWRYIPCDLDFKGTQCPNPSSYIPKLLTTNSDHMTFLQRVKNMLYPLALSYICHTFSAPYASLASELFQREVSVVDLVSYASVWLFRGDFVMDYPRPIMPNMVFIGGINCANGKPLSQEFEAYINASGEHGIVVFSLGSMVSEIPEKKAMAIADALGKIPQTVLWRYTGTRPSNLANNTILVKWLPQNDLLGHPMTRAFITHAGSHGVYESICNGVPMVMMPLFGDQMDNAKRMETKGAGVTLNVLEMTSEDLENALKAVINDKSYKENIMRLSSLHKDRPVEPLDLAVFWVEFVMRHKGAPHLRPAAHDLTWYQYHSLDVIGFLLAVV.... The pIC50 is 5.3. (3) The small molecule is CN(CCCn1c(=N)n(CC(=O)c2ccc(Cl)cc2)c2cccc(Cl)c21)C(=O)c1ccccn1. The pIC50 is 6.7. 
The target protein (O88410) has sequence MYLEVSERQVLDASDFAFLLENSTSPYDYGENESDFSDSPPCPQDFSLNFDRTFLPALYSLLFLLGLLGNGAVAAVLLSQRTALSSTDTFLLHLAVADVLLVLTLPLWAVDAAVQWVFGPGLCKVAGALFNINFYAGAFLLACISFDRYLSIVHATQIYRRDPRVRVALTCIVVWGLCLLFALPDFIYLSANYDQRLNATHCQYNFPQVGRTALRVLQLVAGFLLPLLVMAYCYAHILAVLLVSRGQRRFRAMRLVVVVVAAFAVCWTPYHLVVLVDILMDVGVLARNCGRESHVDVAKSVTSGMGYMHCCLNPLLYAFVGVKFREQMWMLFTRLGRSDQRGPQRQPSSSRRESSWSETTEASYLGL. (4) The drug is CN1CCN(Cc2cc3c(nc2C=O)N(C(=O)Nc2cc(NC45CC(C4)C5)c(C#N)cn2)CCC3)C(=O)C1. The target protein sequence is GLYRGQALHGRHPRPPATVQKLSRFPLARQFSLESGSSGKSSSSLVRGVRLSSSGPALLAGLVSLDLPLDPLWEFPRDRLVLGKPLGEGCFGQVVRAEAFGMDPARPDQASTVAVKMLKDNASDKDLADLVSEMEVMKLIGRHKNIINLLGVCTQEGPLYVIVECAAKGNLREFLRARRPPGPDLSPDGPRSSEGPLSFPVLVSCAYQVARGMQYLESRKCIHRDLAARNVLVTEDNVMKIADFGLARGVHHIDYYKKTSNGRLPVKWMAPEALFDRVYTHQSDVWSFGILLWEIFTLGGSPYPGIPVEELFSLLREGHRMDRPPHCPPELYGLMRECWHAAPSQRPTFKQLVEALDKVLLAVSEEYLDLRLTFGPYSPSGGDASSTCSSSDSVFSHDPLPLGSSSFPFGSGVQT. The pIC50 is 8.7. (5) The compound is COc1cccc2c(O)nc(-c3ccc([N+](=O)[O-])cc3)nc12. The target protein (O95271) has sequence MAASRRSQHHHHHHQQQLQPAPGASAPPPPPPPPLSPGLAPGTTPASPTASGLAPFASPRHGLALPEGDGSRDPPDRPRSPDPVDGTSCCSTTSTICTVAAAPVVPAVSTSSAAGVAPNPAGSGSNNSPSSSSSPTSSSSSSPSSPGSSLAESPEAAGVSSTAPLGPGAAGPGTGVPAVSGALRELLEACRNGDVSRVKRLVDAANVNAKDMAGRKSSPLHFAAGFGRKDVVEHLLQMGANVHARDDGGLIPLHNACSFGHAEVVSLLLCQGADPNARDNWNYTPLHEAAIKGKIDVCIVLLQHGADPNIRNTDGKSALDLADPSAKAVLTGEYKKDELLEAARSGNEEKLMALLTPLNVNCHASDGRKSTPLHLAAGYNRVRIVQLLLQHGADVHAKDKGGLVPLHNACSYGHYEVTELLLKHGACVNAMDLWQFTPLHEAASKNRVEVCSLLLSHGADPTLVNCHGKSAVDMAPTPELRERLTYEFKGHSLLQAAREA.... The pIC50 is 6.1. (6) The small molecule is C=CCOC(Cn1cc(C)c(=O)[nH]c1=O)OCP(=O)(O)O. 
The target protein (Q5FVR2) has sequence MAAPGTPPPLAPETAGADSGGGSGEHRQLPELIRLKRNGGHLSEADIRNFVHALMDGRAQDTQIGAMLMAIRLQGMDLEETSVLTQALAESGQQLEWPKAWHQQLVDKHSTGGVGDKVSLVLAPALAACGCKVPMISGRSLGHTGGTLDKLESIPGFSVTQSPEQMLQILEEVGCCIVGQSEKLVPADGILYAARDVTATVDSVPLITASILSKKAVEGLSTLVVDVKFGGAAVFPDQEKARELAKMLVRVGMGLGLQVAAALTAMDNPLGRNVGHTLEVEEALLCLDGAGPPDLRDLVIRLGGAILWLSGQAETQDQGAARVAAALDDGSALHRFQLMLSAQGVDPGLARALCSGSPTQRRQLLPHARKQEELLSPADGIVECVRALPLACVLHELGAGRSRAGQPIRPGVGAELLVDVGQWLSRGTPWLRVHLDGPALSSQQRRTLLGALVLSDRAPFKAPSPFAELVLPPTTP. The pIC50 is 6.5. (7) The compound is N#Cc1c(N)nc2sc(C(=O)c3cccc(Cl)c3)c(N)c2c1-c1ccccc1Cl. The target protein sequence is MSLNAAAAADERSRKEMDRFQVERMAGQGTFGTVQLGKEKSTGMSVAIKKVIQDPRFRNRELQIMQDLAVLHHPNIVQLQSYFYTLGERDRRDIYLNVVMEYVPDTLHRCCRNYYRRQVAPPPILIKVFLFQLIRSIGCLHLPSVNVCHRDIKPHNVLVNEADGTLKLCDFGSAKKLSPSEPNVAYICSRYYRAPELIFGNQHYTTAVDIWSVGCIFAEMMLGEPIFRGDNSAGQLHEIVRVLGCPSREVLRKLNPSHTDVDLYNSKGIPWSNVFSDHSLKDAKEAYDLLSALLQYLPEERMKPYEALCHPYFDELHDPATKLPNNKDLPEDLFRFLPNEIEVMSEAQKAKLVRK. The pIC50 is 4.0. (8) The compound is CC(=NN=C(N)N)c1ccc(C(C)=NN=C(N)N)cc1. The target protein sequence is MSVYPKALRDEYIMSKTLGSGACGEVKLAFERKTCKKVAIKIISKRKFAIGSAREADPALNVETEIEILKKLNHPCIIKIKNFFDAEDYYIVLELMEGGELFDKVVGNKRLKEATCKLYFYQMLLAVQYLHENGIIHRDLKPENVLLSSQEEDCLIKITDFGHSKILGETSLMRTLCGTPTYLAPEVLVSVGTAGYNRAVDCWSLGVILFICLSGYPPFSEHRTQVSLKDQITSGKYNFIPEVWAEVSEKALDLVKKLLVVDPKARFTTEEALRHPWLQDEDMKRKFQDLLSEENESTALPQVLAQPSTSRKRPREGEAEGAE. The pIC50 is 4.7. (9) The pIC50 is 6.9. 
The target protein (Q64654) has sequence MVLLGLLQSGGSVLGQAMEQVTGGNLLSTLLIACAFTLSLVYLFRLAVGHMVQLPAGAKSPPYIYSPIPFLGHAIAFGKSPIEFLENAYEKYGPVFSFTMVGKTFTYLLGSDAAALLFNSKNEDLNAEEVYGRLTTPVFGKGVAYDVPNAVFLEQKKILKSGLNIAHFKQYVSIIEKEAKEYFKSWGESGERNVFEALSELIILTASHCLHGKEIRSQLNEKVAQLYADLDGGFSHAAWLLPGWLPLPSFRRRDRAHREIKNIFYKAIQKRRLSKEPAEDILQTLLDSTYKDGRPLTDDEIAGMLIGLLLAGQHTSSTTSAWMGFFLARDKPLQDKCYLEQKTVCGEDLPPLTYEQLKDLNLLDRCIKETLRLRPPIMTMMRMAKTPQTVAGYTIPPGHQVCVSPTVNQRLKDSWVERLDFNPDRYLQDNPASGEKFAYVPFGAGRHRCIGENFAYVQIKTIWSTMLRLYEFDLINGYFPSVNYTTMIHTPENPVIRYKR.... The drug is CC(=O)N1CCN(c2ccc(OC[C@@H]3CO[C@@](Cn4ccnc4)(c4ccc(Cl)cc4Cl)O3)cc2)CC1. (10) The compound is CCCCc1oc2ccccc2c1C(=O)c1cc(I)c(OCCN)c(I)c1. The target protein (Q9H160) has sequence MLGQQQQQLYSSAALLTGERSRLLTCYVQDYLECVESLPHDMQRNVSVLRELDNKYQETLKEIDDVYEKYKKEDDLNQKKRLQQLLQRALINSQELGDEKIQIVTQMLELVENRARQMELHSQCFQDPAESERASDKAKMDSSQPERSSRRPRRQRTSESRDLCHMANGIEDCDDQPPKEKKSKSAKKKKRSKAKQEREASPVEFAIDPNEPTYCLCNQVSYGEMIGCDNEQCPIEWFHFSCVSLTYKPKGKWYCPKCRGDNEKTMDKSTEKTKKDRRSR. The pIC50 is 4.0.
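The pIC50 definition in the task statement (pIC50 = -log10(IC50 in M)) can be sketched as a pair of conversion helpers. This is only an illustration of the stated formula; the function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical and not part of the dataset or of BindingDB.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar."""
    ic50_molar = 10.0 ** (-pic50)
    return ic50_molar * 1e9  # 1 M = 1e9 nM


if __name__ == "__main__":
    # Labels taken from examples (1), (4) and (10) above.
    for label in (6.4, 8.7, 4.0):
        print(f"pIC50 {label} corresponds to IC50 of about "
              f"{pic50_to_ic50_nm(label):.3g} nM")
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, so an IC50 of 1 µM maps to a pIC50 of 6.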