Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is CCOC(=O)/C=C/OC(=O)N(CC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](C)NC(=O)OCc1ccccc1. The target protein (Q99538) has sequence MVWKVAVFLSVALGIGAVPIDDPEDGGKHWVVIVAGSNGWYNYRHQADACHAYQIIHRNGIPDEQIVVMMYDDIAYSEDNPTPGIVINRPNGTDVYQGVPKDYTGEDVTPQNFLAVLRGDAEAVKGIGSGKVLKSGPQDHVFIYFTDHGSTGILVFPNEDLHVKDLNETIHYMYKHKMYRKMVFYIEACESGSMMNHLPDNINVYATTAANPRESSYACYYDEKRSTYLGDWYSVNWMEDSDVEDLTKETLHKQYHLVKSHTNTSHVMQYGNKTISTMKVMQFQGMKRKASSPVPLPPVTHLDLTPSPDVPLTIMKRKLMNTNDLEESRQLTEEIQRHLDARHLIEKSVRKIVSLLAASEAEVEQLLSERAPLTGHSCYPEALLHFRTHCFNWHSPTYEYALRHLYVLVNLCEKPYPLHRIKLSMDHVCLGHY. The pIC50 is 7.5. (2) The compound is CC(C)C[C@H](NC(=O)[C@@H](O)[C@H](N)Cc1ccccc1)C(=O)O. The target protein (P15145) has sequence MAKGFYISKALGILGILLGVAAVATIIALSVVYAQEKNKNAEHVPQAPTSPTITTTAAITLDQSKPWNRYRLPTTLLPDSYNVTLRPYLTPNADGLYIFKGKSIVRLLCQEPTDVIIIHSKKLNYTTQGHMVVLRGVGDSQVPEIDRTELVELTEYLVVHLKGSLQPGHMYEMESEFQGELADDLAGFYRSEYMEGNVKKVLATTQMQSTDARKSFPCFDEPAMKATFNITLIHPNNLTALSNMPPKGSSTPLAEDPNWSVTEFETTPVMSTYLLAYIVSEFQSVNETAQNGVLIRIWARPNAIAEGHGMYALNVTGPILNFFANHYNTSYPLPKSDQIALPDFNAGAMENWGLVTYRENALLFDPQSSSISNKERVVTVIAHELAHQWFGNLVTLAWWNDLWLNEGFASYVEYLGADHAEPTWNLKDLIVPGDVYRVMAVDALASSHPLTTPAEEVNTPAQISEMFDSISYSKGASVIRMLSNFLTEDLFKEGLASYLH.... The pIC50 is 5.1. (3) The small molecule is Cc1nnnn1C(/C=C/[C@@H](O)C[C@@H](O)CC(=O)[O-])=C(c1ccc(F)cc1)c1ccc(F)cc1. 
The target protein (Q01237) has sequence MLSRLFRMHGLFVASHPWEVIVGTVTLTICMMSMNMFTGNNKICGWNYECPKFEEDVLSSDIIILTITRCIAILYIYFQFQNLRQLGSKYILGIAGLFTIFSSFVFSTVVIHFLDKELTGLNEALPFFLLLIDLSRASALAKFALSSNSQDEVRENIARGMAILGPTFTLDALVECLVIGVGTMSGVRQLEIMCCFGCMSVLANYFVFMTFFPACVSLVLELSRESREGRPIWQLSHFARVLEEEENKPNPVTQRVKMIMSLGLVLVHAHSRWIADPSPQNSTAEQAKVSLGLDEDVSKRIEPSVSLWQFYLSKMISMDIEQVITLSLAFLLAVKYIFFEQAETESTLSLKNPITSPVVTSKKAQDNCCRREPLLVRRNQKLSSVEEDPGANQERKVEVIKPLVVEAETTSRATFVLGASVASPPSALGTQEPGIELPIEPRPNEECLQILENAEKGAKFLSDAEIIQLVNAKHIPAYKLETLMETHERGVSIRRQLLST.... The pIC50 is 6.2. (4) The compound is O=c1ccoc2cc(OCCCN3CCN(CCCNc4c5c(nc6ccccc46)CCCC5)CC3)ccc12. The target protein (P81908) has sequence EEDIIITTKNGKVRGMNLPVLGGTVTAFLGIPYAQPPLGRLRFKKPQSLTKWSNIWNATKYANSCYQNTDQSFPGFLGSEMWNPNTELSEDCLYLNVWIPAPKPKNATVMIWIYGGGFQTGTSSLPVYDGKFLARVERVIVVSMNYRVGALGFLALSENPEAPGNMGLFDQQLALQWVQKNIAAFGGNPRSVTLFGESAGAASVSLHLLSPRSQPLFTRAILQSGSSNAPWAVTSLYEARNRTLTLAKRMGCSRDNETEMIKCLRDKDPQEILLNEVFVVPYDTLLSVNFGPTVDGDFLTDMPDTLLQLGQFKRTQILVGVNKDEGTAFLVYGAPGFSKDNNSIITRKEFQEGLKIFFPRVSEFGRESILFHYMDWLDDQRAENYREALDDVVGDYNIICPALEFTRKFSELGNDAFFYYFEHRSTKLPWPEWMGVMHGYEIEFVFGLPLERRVNYTRAEEILSRSIMKRWANFAKYGNPNGTQNNSTRWPVFKSTEQKY.... The pIC50 is 7.6. (5) The compound is C[n+]1cccc2cc(NC(=O)c3ccc4ccc5ccc(C(=O)Nc6ccc7c(ccc[n+]7C)c6)nc5c4n3)ccc21. The target protein (P27296) has sequence MALTAALKAQIAAWYKALQEQIPDFIPRAPQRQMIADVAKTLAGEEGRHLAIEAPTGVGKTLSYLIPGIAIAREEQKTLVVSTANVALQDQIYSKDLPLLKKIIPDLKFTAAFGRGRYVCPRNLTALASTEPTQQDLLAFLDDELTPNNQEEQKRCAKLKGDLDTYKWDGLRDHTDIAIDDDLWRRLSTDKASCLNRNCYYYRECPFFVARREIQEAEVVVANHALVMAAMESEAVLPDPKNLLLVLDEGHHLPDVARDALEMSAEITAPWYRLQLDLFTKLVATCMEQFRPKTIPPLAIPERLNAHCEELYELIASLNNILNLYMPAGQEAEHRFAMGELPDEVLEICQRLAKLTEMLRGLAELFLNDLSEKTGSHDIVRLHRLILQMNRALGMFEAQSKLWRLASLAQSSGAPVTKWATREEREGQLHLWFHCVGIRVSDQLERLLWRSIPHIIVTSATLRSLNSFSRLQEMSGLKEKAGDRFVALDSPFNHCEQGKI.... The pIC50 is 7.3. (6) The drug is CC1CNC[C@H](O)C1O. 
The target protein (P16278) has sequence MPGFLVRILPLLLVLLLLGPTRGLRNATQRMFEIDYSRDSFLKDGQPFRYISGSIHYSRVPRFYWKDRLLKMKMAGLNAIQTYVPWNFHEPWPGQYQFSEDHDVEYFLRLAHELGLLVILRPGPYICAEWEMGGLPAWLLEKESILLRSSDPDYLAAVDKWLGVLLPKMKPLLYQNGGPVITVQVENEYGSYFACDFDYLRFLQKRFRHHLGDDVVLFTTDGAHKTFLKCGALQGLYTTVDFGTGSNITDAFLSQRKCEPKGPLINSEFYTGWLDHWGQPHSTIKTEAVASSLYDILARGASVNLYMFIGGTNFAYWNGANSPYAAQPTSYDYDAPLSEAGDLTEKYFALRNIIQKFEKVPEGPIPPSTPKFAYGKVTLEKLKTVGAALDILCPSGPIKSLYPLTFIQVKQHYGFVLYRTTLPQDCSNPAPLSSPLNGVHDRAYVAVDGIPQGVLERNNVITLNITGKAGATLDLLVENMGRVNYGAYINDFKGLVSNLT.... The pIC50 is 3.0. (7) The compound is OC[C@H]1NC[C@H](O)[C@@H](O)[C@@H]1O. The target protein (P21139) has sequence MAAAPFLKHWRTTFERVEKFVSPIYFTDCNLRGRLFGDSCPVTLSSFLTPERLPYEKAVQQNFSPAQVGDSFGPTWWTCWFRVELVIPEVWVGKEVHLCWESDGESLVWRDGEPVQGLTKEGEKTSYVLSERLHAADPRSLTLYVEVACNGLLGAGKGSMIAAPDPEKMFQLSQAKLAVFHRDVHNLLVDLELLLGVAKGLGEDNQRSFQALYTANQMVNICDPAQPETYPAAEALASKFFGQRGGESQHTIHATGHCHIDTAWLWPFKETVRKCARSWSTAVKLMERNTEFTFACSQAQQLEWVKNQYPGLYAQLQEFACRGQFVPVGGTWVEMDGNLPSGEAMVRQFLQGQNFFLQEFGKMCSEFWLPDTFGYSAQLPQIMQGCGIKRFLTQKLSWNLVNSFPHHTFFWEGLDGSQVLVHFPPGDSYGMQGSVEEVLKTVTNNRDKGRTNHSGFLFGFGDGGGGPTQTMLDRLKRLGNTDGQPRVQLSSPGQLFTALE.... The pIC50 is 4.3.
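
The pIC50 definition stated in the task description (pIC50 = -log10(IC50 in M)) can be sketched in code as follows. This is a minimal illustration, not part of the dataset tooling: the function names are hypothetical, and the choice of nanomolar as the input unit is an assumption made for the example.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in M), and 1 nM = 1e-9 M, so
    pIC50 = 9 - log10(IC50 in nM).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the conversion: recover IC50 in nanomolar from a pIC50 value."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # Convert the pIC50 labels of examples (1), (6) and (7) back to nM.
    for label in (7.5, 3.0, 4.3):
        print(f"pIC50 {label} -> IC50 of about {pic50_to_ic50_nm(label):.3g} nM")
```

Because the scale is logarithmic, each one-unit increase in pIC50 corresponds to a tenfold lower IC50, which is why higher values mean a more potent compound.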