This is drug-target binding data from BindingDB, measured as IC50. The task is regression: given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The predicted value is pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is O=c1oc2c(O)c(O)cc3c(=O)oc4c(O)c(O)cc1c4c23. The target protein (P11142) has sequence MSKGPAVGIDLGTTYSCVGVFQHGKVEIIANDQGNRTTPSYVAFTDTERLIGDAAKNQVAMNPTNTVFDAKRLIGRRFDDAVVQSDMKHWPFMVVNDAGRPKVQVEYKGETKSFYPEEVSSMVLTKMKEIAEAYLGKTVTNAVVTVPAYFNDSQRQATKDAGTIAGLNVLRIINEPTAAAIAYGLDKKVGAERNVLIFDLGGGTFDVSILTIEDGIFEVKSTAGDTHLGGEDFDNRMVNHFIAEFKRKHKKDISENKRAVRRLRTACERAKRTLSSSTQASIEIDSLYEGIDFYTSITRARFEELNADLFRGTLDPVEKALRDAKLDKSQIHDIVLVGGSTRIPKIQKLLQDFFNGKELNKSINPDEAVAYGAAVQAAILSGDKSENVQDLLLLDVTPLSLGIETAGGVMTVLIKRNTTIPTKQTQTFTTYSDNQPGVLIQVYEGERAMTKDNNLLGKFELTGIPPAPRGVPQIEVTFDIDANGILNVSAVDKSTGKENK.... The pIC50 is 4.0. (2) The drug is NC(=O)c1nc(Nc2ccc3ccccc3c2)sc1NC(=O)c1ccc(CN2CC[C@H](N)C2)cc1. The target protein (Q9UKE5) has sequence MASDSPARSLDEIDLSALRDPAGIFELVELVGNGTYGQVYKGRHVKTGQLAAIKVMDVTGDEEEEIKQEINMLKKYSHHRNIATYYGAFIKKNPPGMDDQLWLVMEFCGAGSVTDLIKNTKGNTLKEEWIAYICREILRGLSHLHQHKVIHRDIKGQNVLLTENAEVKLVDFGVSAQLDRTVGRRNTFIGTPYWMAPEVIACDENPDATYDFKSDLWSLGITAIEMAEGAPPLCDMHPMRALFLIPRNPAPRLKSKKWSKKFQSFIESCLVKNHSQRPATEQLMKHPFIRDQPNERQVRIQLKDHIDRTKKKRGEKDETEYEYSGSEEEEEENDSGEPSSILNLPGESTLRRDFLRLQLANKERSEALRRQQLEQQQRENEEHKRQLLAERQKRIEEQKEQRRRLEEQQRREKELRKQQEREQRRHYEEQMRREEERRRAEHEQEYIRRQLEEEQRQLEILQQQLLHEQALLLEYKRKQLEEQRQAERLQRQLKQERDYL.... The pIC50 is 8.5. (3) The compound is Oc1cccc(C=NNC(=S)NN=Cc2cccc(O)c2O)c1O. 
The target protein (P11412) has sequence MSEGPVKFEKNTVISVFGASGDLAKKKTFPALFGLFREGYLDPSTKIFGYARSKLSMEEDLKSRVLPHLKKPHGEADDSKVEQFFKMVSYISGNYDTDEGFDELRTQIEKFEKSANVDVPHRLFYLALPPSVFLTVAKQIKSRVYAENGITRVIVEKPFGHDLASARELQKNLGPLFKEEELYRIDHYLGKELVKNLLVLRFGNQFLNASWNRDNIQSVQISFKERFGTEGRGGYFDSIGIIRDVMQNHLLQIMTLLTMERPVSFDPESIRDEKVKVLKAVAPIDTDDVLLGQYGKSEDGSKPAYVDDDTVDKDSKCVTFAAMTFNIENERWEGVPIMMRAGKALNESKVEIRLQYKAVASGVFKDIPNNELVIRVQPDAAVYLKFNAKTPGLSNATQVTDLNLTYASRYQDFWIPEAYEVLIRDALLGDHSNFVRDDELDISWGIFTPLLKHIERPDGPTPEIYPYGSRGPKGLKEYMQKHKYVMPEKHPYAWPVTKPE.... The pIC50 is 4.2. (4) The drug is O=c1[nH]c(COCCc2ccc(F)cc2)nc2ncccc12. The target protein (Q80Z39) has sequence MSKQNHFLVINGKNCCVFRDENIAKVLPPVLGLEFVFGLLGNGLALWIFCFHLKSWKSSRIFLFNLAVADFLLIICLPFLTDNYVQNWDWRFGSIPCRVMLFMLAMNRQGSIIFLTVVAVDRYFRVVHPHHFLNKISNRTAAIISCFLWGITIGLTVHLLYTDMMTRNGDANLCSSFSICYTFRWHDAMFLLEFFLPLGIILFCSGRIIWSLRQRQMDRHVKIKRAINFIMVVAIVFVICFLPSVAVRIRIFWLLYKHNVRNCDIYSSVDLAFFTTLSFTYMNSMLDPVVYYFSSPSFPNFFSTCINRCLRRKTLGEPDNNRSTSVELTGDPSTIRSIPGALMTDPSEPGSPPYLASTSR. The pIC50 is 5.6. (5) The drug is O=C(O)Cn1c(O)csc1=S. The target is XTSFAESXKPVQQPSAFGS. The pIC50 is 4.0. (6) The compound is CC[C@H](C)[C@H](NC(=O)c1cnc2ccccc2n1)[C@@H](O)C[C@@H](CC(C)C)C(=O)NC. The target protein (P32246) has sequence METPNTTEDYDTTTEFDYGDATPCQKVNERAFGAQLLPPLYSLVFVIGLVGNILVVLVLVQYKRLKNMTSIYLLNLAISDLLFLFTLPFWIDYKLKDDWVFGDAMCKILSGFYYTGLYSEIFFIILLTIDRYLAIVHAVFALRARTVTFGVITSIIIWALAILASMPGLYFSKTQWEFTHHTCSLHFPHESLREWKLFQALKLNLFGLVLPLLVMIICYTGIIKILLRRPNEKKSKAVRLIFVIMIIFFLFWTPYNLTILISVFQDFLFTHECEQSRHLDLAVQVTEVIAYTHCCVNPVIYAFVGERFRKYLRQLFHRRVAVHLVKWLPFLSVDRLERVSSTSPSTGEHELSAGF. The pIC50 is 5.4. (7) The compound is CON1C(=O)[C@@H](N)Cc2ccccc21. 
The target protein (Q64602) has sequence MNYSRFLTATSLARKTSPIRATVEIMSRAPKDIISLAPGSPNPKVFPFKSAVFTVENGSTIRFEGEMFQRALQYSSSYGIPELLSWLKQLQIKLHNPPTVNYSPNEGQMDLCITSGCQDGLCKVFEMLINPGDTVLVNEPLYSGALFAMKPLGCNFISVPSDDCGIIPEGLKKVLSQWKPEDSKDPTKRTPKFLYTIPNGNNPTGNSLTGDRKKEIYELARKYDFLIIEDDPYYFLQFTKPWEPTFLSMDVDGRVIRADSLSKVISSGLRVGFITGPKSLIQRIVLHTQISSLHPCTLSQLMISELLYQWGEEGFLAHVDRAIDFYKNQRDFILAAADKWLRGLAEWHVPKAGMFLWIKVNGISDAKKLIEEKAIEREILLVPGNSFFVDNSAPSSFFRASFSQVTPAQMDLVFQRLAQLIKDVS. The pIC50 is 5.0.
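
The pIC50 labels in the records above follow the definition pIC50 = -log10(IC50 in M). As a minimal sketch of that conversion (the function names below are illustrative and not part of the dataset), the relationship can be written in Python as:

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def ic50_nm_to_pic50(ic50_nanomolar: float) -> float:
    """Convenience wrapper, assuming the raw IC50 is reported in nM.

    1 nM = 1e-9 M, so pIC50 = 9 - log10(IC50 in nM).
    """
    if ic50_nanomolar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nanomolar)
```

Under this definition, a pIC50 of 4.0 corresponds to an IC50 of 100 µM and a pIC50 of 8.5 to roughly 3.2 nM, which is why a higher value means a more potent compound.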