This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is COc1ccc(S(=O)(=O)NCC(CN2CCOCC2)OP(=O)(O)O)cc1. The target protein (P11880) has sequence MISVTLSQLTDILNGELQGADITLDAVTTDTRKLTPGCLFVALKGERFDAHDFADQAKAGGAGALLVSRPLDIDLPQLIVKDTRLAFGELAAWVRQQVPARVVALTGSSGKTSVKEMTAAILSQCGNTLYTAGNLNNDIGVPMTLLRLTPEYDYAVIELGANHQGEIAWTVSLTRPEAALVNNLAAAHLEGFGSLAGVAKAKGEIFSGLPENGIAIMNADNNDWLNWQSVIGSRKVWRFSPNAANSDFTATNIHVTSHGTEFTLQTPTGSVDVLLPLPGRHNIANALAAAALSMSVGATLDAIKAGLANLKAVPGRLFPIQLAENQLLLDDSYNANVGSMTAAVQVLAEMPGYRVLVVGDMAELGAESEACHVQVGEAAKAAGIDRVLSVGKQSHAISTASGVGEHFADKTALITRLKLLIAEQQVITILVKGSRSAAMEEVVRALQENGTC. The pIC50 is 3.3. (2) The compound is CC[C@H](C)[C@H](NC(=O)[C@H](CCCCN=[N+]=[N-])NC(=O)OCc1ccccc1)C(=O)N[C@@H](Cc1ccccc1)B1O[C@@H]2C[C@@H]3C[C@@H](C3(C)C)[C@]2(C)O1. The target protein (P28063) has sequence MALLDLCGAARGQRPEWAALDAGSGGRSDPGHYSFSAQAPELALPRGMQPTAFLRSFGGDQERNVQIEMAHGTTTLAFKFQHGVIVAVDSRATAGSYISSLRMNKVIEINPYLLGTMSGCAADCQYWERLLAKECRLYYLRNGERISVSAASKLLSNMMLQYRGMGLSMGSMICGWDKKGPGLYYVDDNGTRLSGQMFSTGSGNTYAYGVMDSGYRQDLSPEEAYDLGRRAIAYATHRDNYSGGVVNMYHMKEDGWVKVESSDVSDLLYKYGEAAL. The pIC50 is 7.8. (3) The drug is O=C(Nc1ccccc1Cl)c1ccc(OC(F)F)c(OC2CCCC2)c1. The pIC50 is 9.0. 
The target protein (P51160) has sequence MGEINQVAVEKYLEENPQFAKEYFDRKLRVEVLGEIFKNSQVPVQSSMSFSELTQVEESALCLELLWTVQEEGGTPEQGVHRALQRLAHLLQADRCSMFLCRSRNGIPEVASRLLDVTPTSKFEDNLVGPDKEVVFPLDIGIVGWAAHTKKTHNVPDVKKNSHFSDFMDKQTGYVTKNLLATPIVVGKEVLAVIMAVNKVNASEFSKQDEEVFSKYLNFVSIILRLHHTSYMYNIESRRSQILMWSANKVFEELTDVERQFHKALYTVRSYLNCERYSIGLLDMTKEKEFYDEWPIKLGEVEPYKGPKTPDGREVNFYKIIDYILHGKEEIKVIPTPPADHWTLISGLPTYVAENGFICNMMNAPADEYFTFQKGPVDETGWVIKNVLSLPIVNKKEDIVGVATFYNRKDGKPFDEHDEYITETLTQFLGWSLLNTDTYDKMNKLENRKDIAQEMLMNQTKATPEEIKSILKFQEKLNVDVIDDCEEKQLVAILKEDLPD.... (4) The target protein (Q9ZLT0) has sequence MKIGVFDSGVGGFSVLKSLLKARLFDEIIYYGDSARVPYGTKDPTTIKQFGLEALDFFKPHEIELLIVACNTASALALEEMQKYSKIPIVGVIEPSILAIKRQVEDKNAPILVLGTKATIQSNAYDNALKQQGYLNISHLATSLFVPLIEESILEGELLETCMHYYFTPLEILPEVIILGCTHFPLIAQKIEGYFMGHFALPTPPLLIHSGDAIVEYLQQKYALKNNACTFPKVEFHASGDVIWLERQAKEWLKL. The pIC50 is 5.4. The compound is CCCNC1=Nc2ccccc2C(c2ccccc2)=NC1c1cccs1. (5) The small molecule is CC(=O)N[C@@H](CC(=O)O)C(=O)N[C@H]1CCC[C@H]2SC[C@@H](C(=O)N[C@H]3CC(=O)OC3O)N2C1=O. The target protein (P29452) has sequence MADKILRAKRKQFINSVSIGTINGLLDELLEKRVLNQEEMDKIKLANITAMDKARDLCDHVSKKGPQASQIFITYICNEDCYLAGILELQSAPSAETFVATEDSKGGHPSSSETKEEQNKEDGTFPGLTGTLKFCPLEKAQKLWKENPSEIYPIMNTTTRTRLALIICNTEFQHLSPRVGAQVDLREMKLLLEDLGYTVKVKENLTALEMVKEVKEFAACPEHKTSDSTFLVFMSHGIQEGICGTTYSNEVSDILKVDTIFQMMNTLKCPSLKDKPKVIIIQACRGEKQGVVLLKDSVRDSEEDFLTDAIFEDDGIKKAHIEKDFIAFCSSTPDNVSWRHPVRGSLFIESLIKHMKEYAWSCDLEDIFRKVRFSFEQPEFRLQMPTADRVTLTKRFYLFPGH. The pIC50 is 5.0. (6) The small molecule is COCC(COC)NC(=O)c1nc(N2CC[C@@H](NC(=O)c3[nH]c(C)c(Cl)c3Cl)[C@@H](OC)C2)sc1C(=O)O. 
The target protein (Q63120) has sequence MDKFCNSTFWDLSLLESPEADLPLCFEQTVLVWIPLGFLWLLAPWQLYSVYRSRTKRSSITKFYLAKQVFVVFLLILAAIDLSLALTEDTGQATVPPVRYTNPILYLCTWLLVLAVQHSRQWCVRKNSWFLSLFWILSVLCGVFQFQTLIRALLKDSKSNMAYSYLFFVSYGFQIVLLILTAFSGPSDSTQTPSVTASFLSSITFSWYDRTVLKGYKHPLTLEDVWDIDEGFKTRSVTSKFEAAMTKDLQKARQAFQRRLQKSQRKPEATLHGLNKKQSQSQDVLVLEEAKKKSEKTTKDYPKSWLIKSLFKTFHVVILKSFILKLIHDLLVFLNPQLLKLLIGFVKSSNSYVWFGYICAILMFAVTLIQSFCLQSYFQHCFVLGMCVRTTVMSSIYKKALTLSNLARKQYTIGETVNLMSVDSQKLMDATNYMQLVWSSVIQITLSIFFLWRELGPSILAGVGVMVLLIPVNGVLATKIRNIQVQNMKNKDKRLKIMNE.... The pIC50 is 4.1. (7) The drug is COc1cc(CN=Nc2ccc(Cl)c(C(=O)O)c2)ccc1OC(=O)c1ccccc1F. The target protein sequence is MRVKGIRRNYQHLWRWGTMLLGMLMICSAKEQLWVTAYYGVPVWKEATTTLFCASDAKAYDTEVHNVWATHACVPTDPNPREVVMGNVTEEFNIWNNSMVEQMHEDIISLWDESLKPCVKLTPLCVTFNCTNYNGTRNGTTTEPPEVKNCTTKETGIKNCSFNIATSGVEDRFKKEYALLYTADIVQIDNSSINYTLIGCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGKGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEVVIRSDNFSDNAKTIIVQLKDPVVINCTRPNNNTRKGIRIGPGRTFYTTERIIGDIRQAHCNISRTQWNNTLRLIAAKLKKQFNNKTIIFRNSSGGDPEIVMHSFNCGGEFFYCNTTQLFNSTWVHNNTWVHNNTGNDTEEGTITLPCRIKQIINMWQEVGKAMYAPPIKGQIRCSSNITGLILTRDGGNTSSNNETFRPGGGDMRDNWRSELYKYKVVKIEPLGVAP.... The pIC50 is 4.2. (8) The drug is O=C(Nc1cccc(Cl)c1N1CCN(c2ccccn2)CC1)c1ccc(Br)o1. The target protein sequence is MSKSKVDNQFYSVEVGDSTFTVLKRYQNLKPIGSGAQGIVCAAYDAVLDRNVAIKKLSRPFQNQTHAKRAYRELVLMKCVNHKNIISLLNVFTPQKTLEEFQDVYLVMELMDANLCQVIQMELDHERMSYLLYQMLCGIKHLHSAGIIHRDLKPSNIVVKSDCTLKILDFGLARTAGTSFMMTPYVVTRYYRAPEVILGMGYKENVDIWSVGCIMGEMVRHKILFPGRDYIDQWNKVIEQLGTPCPEFMKKLQPTVRNYVENRPKYAGLTFPKLFPDSLFPADSEHNKLKASQARDLLSKMLVIDPAKRISVDDALQHPYINVWYDPAXXXXXDEREHTIEEWKELIYKEVMNSE. The pIC50 is 6.0. (9) The compound is O=[N+]([O-])c1ccc(/C=C/c2sc(Nc3ccccc3)n[n+]2-c2ccccc2)cc1. 
The target protein sequence is MSRAYDLVVLGAGSGGLEAGWNAAVTHKKKVAVVDVQATHGPPLFAALGGTCVNVGCVPKKLMVTGAQYMDLIRESGGFGWEMDRESLCPNWKTLIAAKNKVVNSINESYKSMFADTEGLSFHMGFGALQDAHTVVVRKSEDPHSDVLETLDTEYILIATGSWPTRLGVPGDEFCITSNEAFYLEDAPKRMLCVGGGYIAVEFAGIFNGYKPCGGYVDLCYRGDLILRGFDTEVRKSLTKQLGANGIRVRTNLNPTKITKNEDGSNHVHFNDGTEEDYDQVMLAIGRVPRSQALQLDKAGVRTGKNGAVQVDAYSKTSVDNIYAIGDVTNRVMLTPVAINEGAAFVETVFGGKPRATDHTKVACAVFSIPPIGTCGMTEEEAAKNYETVAVYASSFTPLMHNISGSKHKEFMIRIITNESNGEVLGVHMLGDSAPEIIQSVGICMKMGAKISDFHSTIGVHPTSAEELCSMRTPAYFYESGKRVEKLSSNL. The pIC50 is 5.8. (10) The compound is CC[C@H](C)[C@@H]1NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCCN=C(N)N)NC(=O)CNC(=O)[C@@H](N)CSSC[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](Cc2c[nH]c3ccccc23)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](CO)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC(C)C)C(=O)N[C@H](C(=O)O)CSSC[C@@H](C(=O)NCC(=O)NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](Cc2c[nH]c3ccccc23)C(=O)O)NC(=O)[C@H]([C@@H](C)O)NC(=O)[C@H](Cc2ccccc2)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@H](C(C)C)NC(=O)[C@H](C)NC(=O)[C@H](CCCN=C(N)N)NC1=O. The target protein (Q8TDU9) has sequence MPTLNTSASPPTFFWANASGGSVLSADDAPMPVKFLALRLMVALAYGLVGAIGLLGNLAVLWVLSNCARRAPGPPSDTFVFNLALADLGLALTLPFWAAESALDFHWPFGGALCKMVLTATVLNVYASIFLITALSVARYWVVAMAAGPGTHLSLFWARIATLAVWAAAALVTVPTAVFGVEGEVCGVRLCLLRFPSRYWLGAYQLQRVVLAFMVPLGVITTSYLLLLAFLQRRQRRRQDSRVVARSVRILVASFFLCWFPNHVVTLWGVLVKFDLVPWNSTFYTIQTYVFPVTTCLAHSNSCLNPVLYCLLRREPRQALAGTFRDLRLRLWPQGGGWVQQVALKQVGRRWVASNPRESRPSTLLTNLDRGTPG. The pIC50 is 8.4.
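The header defines the label as pIC50 = -log10(IC50 in M). As a minimal sketch of that conversion, the Python below maps between IC50 and pIC50. The function names (`ic50_to_pic50`, `pic50_to_ic50`, `ic50_nm_to_pic50`) are hypothetical helpers introduced for illustration and are not part of BindingDB or the dataset tooling. The nanomolar variant assumes the raw IC50 is given in nM.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the transform: return the IC50 in mol/L for a given pIC50."""
    return 10.0 ** (-pic50)


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Same conversion for an IC50 in nanomolar (assumed unit): 1 nM = 1e-9 M."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


if __name__ == "__main__":
    # Labels taken from records (1) and (3) above.
    for label in (3.3, 9.0):
        print(f"pIC50 {label} -> IC50 {pic50_to_ic50(label):.3g} M")
```

Under this definition, the pIC50 of 9.0 in record (3) corresponds to an IC50 of 1 nM, while the pIC50 of 3.3 in record (1) corresponds to roughly 0.5 mM, which is why higher values mean more potent binders.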