Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them, expressed as pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The small molecule is COC(=O)/C(=C\C=C\c1cc2ccccc2[nH]1)OC. The target protein (Q9I8D0) has sequence MGELFRSEEMTLAQLFLQSEAAYCCVSELGELGKVQFRDLNPDVNVFQRKFVNEVRRCEEMDRKLRFVEKEIKKANIPIMDTGENPEVPFPRDMIDLEANFEKIENELKEINTNQEALKRNFLELTELKFILRKTQQFFDEMADPDLLEESSSLLEPSEMGRGAPLRLGFVAGVINRERIPTFERMLWRVCRGNVFLRQAEIENPLEDPVTGDYVHKSVFIIFFQGDQLKNRVKKICEGFRASLYPCPETPQERKEMASGVNTRIDDLQMVLNQTEDHRQRVLQAAAKNIRVWFIKVRKMKAIYHTLNLCNIDVTQKCLSAEVWCPVADLDSIQFALRRGTEHSGSTVPSILNRMQTNQTPPTYNKTNKFTCGFQNIVDAYGIGTYREINPAPYTIITFPFLFAVMFGDFGHGILMTLIAIWMVLRESRILSQKSDNEMFSTVFSGRYIILLMGLFSTYTGLIYNDCFSKSLNMFGSSWSVRPMFSKANWSDELLKTTPL.... The pIC50 is 7.0. (2) The compound is O=C(C=Cc1ccccc1CSc1nc(Cc2ccccc2)cc(=O)[nH]1)NO. The target protein sequence is MEVGGQEVKPGATVSCKVGDGLVIHLSQAALGESKKASENAILSVNIDDKKLVLGTLSVEKHPQISCDLVFDKDFELPHNSKTRSVFFRGYKSPVPLFESNSGEDSSDEELKTDQIPLQNNEIKISAAKVPAKDDDDDVFIILAMMMMIYSSDDDDDDFTTSDSDNEMSEEDDSSDEDEMSEEDDSSDEDEMSGGADPSDDSSDESGSEHTSAPKKTDVVVGKKRAIKAEAPYGKKAKSEQSSQKTGDKASTSHPAKQSIKTPADKSRKTPTADKKSPKSGSHGCK. The pIC50 is 6.1. (3) The drug is COc1ccc(CN2C(=O)c3ccccc3C(C=NOCc3ccc(F)cc3)C2=O)cc1. The target protein (P15273) has sequence MNLSLSDLHRQVSRLVQQESGDCTGKLRGNVAANKETTFQGLTIASGARESEKVFAQTVLSHVANIVLTQEDTAKLLQSTVKHNLNNYELRSVGNGNSVLVSLRSDQMTLQDAKVLLEAALRQESGARGHVSSHSHSVLHAPGTPVREGLRSHLDPRTPPLPPRERPHTSGHHGAGEARATAPSTVSPYGPEARAELSSRLTTLRNTLAPATNDPRYLQACGGEKLNRFRDIQCCRQTAVRADLNANYIQVGNTRTIACQYPLQSQLESHFRMLAENRTPVLAVLASSSEIANQRFGMPDYFRQSGTYGSITVESKMTQQVGLGDGIMADMYTLTIREAGQKTISVPVVHVGNWPDQTAVSSEVTKALASLVDQTAETKRNMYESKGSSAVADDSKLRPVIHCRAGVGRTAQLIGAMCMNDSRNSQLSVEDMVSQMRVQRNGIMVQKDEQLDVLIKLAEGQGRPLLNS. The pIC50 is 4.3. 
(4) The compound is CN1CCN(c2cc(C(=O)Nc3cccc(Nc4ccc5c(c4)NC(=O)/C5=C\c4ccc[nH]4)c3)cc(C(F)(F)F)c2)CC1. The target protein (P43404) has sequence MPDPAAHLPFFYGSISRAEAEEHLKLAGMADGLFLLRQCLRSLGGYVLSLVHDVRFHHFPIERQLNGTYAIAGGKAHCGPAELCQFYSQDPDGLPCNLRKPCNRPPGLEPQPGVFDCLRDAMVRDYVRQTWKLEGDALEQAIISQAPQVEKLIATTAHERMPWYHSSLTREEAERKLYSGQQTDGKFLLRPRKEQGTYALSLVYGKTVYHYLISQDKAGKYCIPEGTKFDTLWQLVEYLKLKADGLIYRLKEVCPNSSASAAVAAPTLPAHPSTFTQPQRRVDTLNSDGYTPEPARLASSTDKPRPMPMDTSVYESPYSDPEELKDKKLFLKRENLLVADIELGCGNFGSVRQGVYRMRKKQIDVAIKVLKQGTEKADKDEMMREAQIMHQLDNPYIVRLIGVCQAEALMLVMEMAGGGPLHKFLLGKKEEIPVSNVAELLHQVAMGMKYLEEKNFVHRDLAARNVLLVNRHYAKISDFGLSKALGADDSYYTARSAGKW.... The pIC50 is 5.2. (5) The compound is CNC(=O)c1c(-c2ccc(F)cc2)oc2cc(N(C)S(C)(=O)=O)c(-c3cnc4ncn(-c5ccc(F)cc5)c(=O)c4c3)cc12. The target protein sequence is APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTATQTFLATCINGVCWTVYHGAGTRTIASPKGPVIQTYTNVDQDLVGWPAPQGSRSLTPCTCGSSDLYLVTRHADVIPVRRRGDSRGSLLSPRPISYLKGSSGGPLLCPTGHAVGLFRAAVCTRGVAKAVDFIPVENLETTMRSPVFTDNSSPPAVPQSFQVAHLHAPTGSGKSTKVPAAYAAKGYKVLVLNPSVAATLGFGAYMSKAHGVDPNIRTGVRTITTGSPITYSTYGKFLADAGCSGGAYDIIICDECHSTDATSISGIGTVLDQAETAGARLVVLATATPPGSVTVSHPNIEEVALSTTGEIPFYGKAIPLEVIKGGRHLIFCHSKKKCDELAAKLVALGINAVAYYRGLDVSVIPTSGDVVVVSTDALMTGFTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAVSRTQRRGRTGRGKPGIYRFVAPGERPSGMFDSSVLCECYDAGCA.... The pIC50 is 7.8. (6) The small molecule is CCCCCC/C=C/CC(=O)N[C@H](CSc1ccc2ccccc2c1)CC(=O)O. The target protein sequence is ARMRTGEKYPLIIFSHGLGAFRTIYSAIGTDLASYGFIVAAVEHRDGSASATCFFKDQSAAEIRNKTWLYLRTLGKGEEEFPLRNEQVRQRAEECVCLHEFCT. The pIC50 is 7.6. (7) The compound is O=C1Oc2cc(O)ccc2/C1=C\c1ccc(O)cc1. 
The target protein (O62855) has sequence MATLSPLLLAALLWVPVGTLTCYGDSGQPVDWFVVYKLPAHSSPGDVAQSGLRYKYLDEESGGWRDGAGSINSSTGALGRSLLPLYRNTSQLAFLLYNDQPPKYRGSQHSSNRGHTKGVLLLDQEGGFWLIHSVPNFPPPSSSAAYSWPPSARTYGQTLICVSFPLTQFLNISRQLTYTYPMVYDYKLEGDFARKFPYLEEVVKGHHVLQEPWNSSVTLTSKAGASFQSFAKCGNFGDDLYSGWLAEALGSNLQVQFWQRSAGILPSNCSGVQHVLDVTQIAFPGPAGPNFNATEDHSKWCVAPERPWTCVGDMNRNKREEHRGGGTLCAQLPALWKAFKPLVKAWEPCEKENRAFSPRSPAKD. The pIC50 is 3.4. (8) The small molecule is CCn1ccn(C(=O)OC)c1=[Se]. The target protein (P22079) has sequence MRVLLHLPALLASLILLQAAASTTRAQTTRTSAISDTVSQAKVQVNKAFLDSRTRLKTAMSSETPTSRQLSEYLKHAKGRTRTAIRNGQVWEESLKRLRQKASLTNVTDPSLDLTSLSLEVGCGAPAPVVRCDPCSPYRTITGDCNNRRKPALGAANRALARWLPAEYEDGLSLPFGWTPGKTRNGFPLPLAREVSNKIVGYLNEEGVLDQNRSLLFMQWGQIVDHDLDFAPDTELGSSEYSKAQCDEYCIQGDNCFPIMFPPNDPKAGTQGKCMPFFRAGFVCPTPPYKSLAREQINALTSFLDASFVYSSEPSLASRLRNLSSPLGLMAVNQEVSDHGLPYLPYDSKKPSPCEFINTTARVPCFLAGDSRASEHILLATSHTLFLREHNRLARELKRLNPQWDGEKLYQEARKILGAFVQIITFRDYLPILLGDHMQKWIPPYQGYSESVDPRISNVFTFAFRFGHLEVPSSMFRLDENYQPWGPEPELPLHTLFFNT.... The pIC50 is 4.6. (9) The drug is Oc1csc(S)n1. The target is XTSFAESXKPVQQPSAFGS. The pIC50 is 4.0. (10) The compound is COc1ccc(N2CCN(C(=O)c3cc4c(s3)-c3ccccc3S(=O)(=O)C4)CC2)cc1. The target protein (O00764) has sequence MEEECRVLSIQSHVIRGYVGNRAATFPLQVLGFEIDAVNSVQFSNHTGYAHWKGQVLNSDELQELYEGLRLNNMNKYDYVLTGYTRDKSFLAMVVDIVQELKQQNPRLVYVCDPVLGDKWDGEGSMYVPEDLLPVYKEKVVPLADIITPNQFEAELLSGRKIHSQEEALRVMDMLHSMGPDTVVITSSDLPSPQGSNYLIVLGSQRRRNPAGSVVMERIRMDIRKVDAVFVGTGDLFAAMLLAWTHKHPNNLKVACEKTVSTLHHVLQRTIQCAKAQAGEGVRPSPMQLELRMVQSKRDIEDPEIVVQATVL. The pIC50 is 5.0.
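
As a reading aid for the pIC50 values in the examples, the definition in the task statement (pIC50 = -log10(IC50 in M)) can be sketched in Python. This is a minimal illustration, not part of the dataset: the function names `pic50_from_ic50_nm` and `ic50_nm_from_pic50` are hypothetical, and the choice of nanomolar as the input unit is an assumption made for convenience.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50.

    pIC50 = -log10(IC50 in M). Since 1 nM = 1e-9 M, this equals
    9 - log10(IC50 in nM). Nanomolar input is an assumption of this sketch.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Invert the conversion: return IC50 in nanomolar for a given pIC50."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # A pIC50 of 7.0, as in example (1), corresponds to an IC50 of 100 nM.
    print(ic50_nm_from_pic50(7.0))
    # Higher pIC50 means a lower IC50, i.e. a more potent compound.
    print(pic50_from_ic50_nm(10.0) > pic50_from_ic50_nm(1000.0))
```

Because the scale is logarithmic, each unit of pIC50 corresponds to a tenfold change in IC50.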