This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The small molecule is C[C@@H]1NC(=O)[C@@H]([C@@H](C)O)NC(=O)CNC(=O)[C@@H](Cc2cnc[nH]2)NC(=O)[C@H](Cc2c[nH]c3ccccc23)NC(=O)[C@@H](CC(N)=O)NC(=O)CNC(=O)C[C@H](C(=O)N[C@@H](Cc2c[nH]c3ccccc23)C(=O)N[C@@H](Cc2ccccc2)C(=O)N[C@@H](Cc2ccccc2)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](Cc2ccc(O)cc2)C(=O)N[C@@H](Cc2ccc(O)cc2)C(=O)N[C@H](Cc2ccc3ccccc3c2)C(=O)O)NC(=O)[C@H]2CCCN2C1=O. The target protein (P21450) has sequence METFWLRLSFWVALVGGVISDNPESYSTNLSIHVDSVATFHGTELSFVVTTHQPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRWPFEQNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSRVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPFEYKGAQHRTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRRNGSLRIALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDEMDTNRCELLSFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPMNGTSIQWKNHEQNNHNTERSSHKDSIN. The pIC50 is 6.4. (2) The drug is COc1cc2c(NCc3ccc4c(c3)OCO4)ncnc2c(OC)c1OC. The target protein sequence is MGLYLLRAGVRLPLAVALLAACCGGEALVQIGLGVGEDHLLSLPAATWLVLRLRLGVLMIALTSAVRTVSLISLERFKVAWRPYLAYLAGVLGILLARYVEQILPQSAGAAPREHFGSQLLAGTKEDIPEFKRRRRSSSVVSAEMSGCSSKSHRRTSLPCIPREQLMGHSEWDHKRGPRGSQSSGTSITVDIAVMGEAHGLITDLLADPSLPPNVCTSLRAVSNLLSTQLTFQAIHKPRVNPAVSFSENYTCSDSEESAEKDKLAIPKRLRRSLPPGLLRRVSSTWTTTTSATGLPTLEPSPVRRDRSASIKLHEAPSSSAINPDSWKNPVMMTLTKSRSFTSSYAVSASNHVKAKKQSRPGSLVKISPLSSPCSSALQGTPASSPVSKISTVQFPEPADATAKQGLSSHKALTYTQSAPDLSPHILTPPVICSSCGRPYSQGNPADGPLERSGPAIQAQSRTDDTAQVTSDYETNNNSDSSDIVQNEDETECSREPLRK.... The pIC50 is 4.0. (3) The compound is Nc1nc2c(CC3CCCC3)c[nH]c2c(=O)[nH]1. 
The target protein (P85973) has sequence MENEFTYEDYQRTAEWLRSHTKHRPQVAVICGSGLGGLTAKLTQPQAFDYNEIPNFPQSTVQGHAGRLVFGFLNGRSCVMMQGRFHMYEGYSLSKVTFPVRVFHLLGVDTLVVTNAAGGLNPKFEVGDIMLIRDHINLPGFCGQNPLRGPNDERFGVRFPAMSDAYDRDMRQKAFNAWKQMGEQRELQEGTYIMSAGPTFETVAESCLLRMLGADAVGMSTVPEVIVARHCGLRVFGFSLITNKVVMDYNNLEKASHQEVLEAGKAAAQKLEQFVSILMESIPPRERAN. The pIC50 is 7.5. (4) The drug is O=C1C(=O)C(C2CCC(c3ccc(Cl)cc3)CC2)C(=O)c2ccccc21. The target protein sequence is TATGDDHFYAEYLMPGLQRLLDPESAHRLAVRVTSLGLLPRATFQDSDMLEVKVLGHKFRNPVGIAAGFDKNGEAVDGLYKLGFGFVEVGSVTPQPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSVVEHRLRARQQKQAQLTADGLPLGINLGKNKTSEDAAADYAEGVRTLGPLADYLVVNVSSPNTAGLRSLQGKTELRHLLSKVLQERDALKGTRKPAVLVKIAPDLTAQDKEDIASVARELGIDGLIVTNTTVSRPVGLQGALRSETGGLSGKPLRDLSTQTIREMYALTQGRIPIIGVGGVSSGQDALEKIQAGASLVQLYTALIFLGPPVVVRVKRELEALLKERGFTTVTDAIGADHRR. The pIC50 is 6.0. (5) The small molecule is O=S(=O)([O-])O[C@@H]1[C@H](O)C[C@@H](O)C[C@H]1OCCCCc1ccccc1O. The target protein (P20456) has sequence MADPWQECMDYAVTLAGQAGEVVREALKNEMNIMVKSSPADLVTATDQKVEKMLITSIKEKYPSHSFIGEESVAAGEKSILTDNPTWIIDPIDGTTNFVHGFPFVAVSIGFVVNKKMEFGIVYSCLEDKMYTGRKGKGAFCNGQKLQVSHQEDITKSLLVTELGSSRTPETVRIILSNIERLLCLPIHGIRGVGTAALNMCLVAAGAADAYYEMGIHCWDVAGAGIIVTEAGGVLLDVTGGPFDLMSRRVIASSNKTLAERIAKEIQIIPLQRDDED. The pIC50 is 4.0. (6) The target protein (Q9NYW2) has sequence MFSPADNIFIILITGEFILGILGNGYIALVNWIDWIKKKKISTVDYILTNLVIARICLISVMVVNGIVIVLNPDVYTKNKQQIVIFTFWTFANYLNMWITTCLNVFYFLKIASSSHPLFLWLKWKIDMVVHWILLGCFAISLLVSLIAAIVLSCDYRFHAIAKHKRNITEMFHVSKIPYFEPLTLFNLFAIVPFIVSLISFFLLVRSLWRHTKQIKLYATGSRDPSTEVHVRAIKTMTSFIFFFFLYYISSILMTFSYLMTKYKLAVEFGEIAAILYPLGHSLILIVLNNKLRQTFVRMLTCRKIACMI. The small molecule is Cc1noc(C)c1Cn1cc(N2C(=O)CN(Cc3ccc(O)cc3)C2=O)cn1. The pIC50 is 7.2. (7) The small molecule is CNC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CS(C)=O)NC(=O)CS. 
The target protein (O88766) has sequence MLHLKTLPFLFFFHTQLATALPVPPEHLEEKNMKTAENYLRKFYHLPSNQFRSARNATMIAEKLKEMQRFFGLPETGKPDAATIEIMEKPRCGVPDSGDFLLTPGSPKWTHTNLTYRIINHTPQMSKAEVKTEIEKAFKIWSVPSTLTFTETLEGEADINIAFVSRDHGDNSPFDGPNGILAHAFQPGRGIGGDAHFDSEETWTQDSKNYNLFLVAAHEFGHSLGLSHSTDPGALMYPNYAYREPSTYSLPQDDINGIQTIYGPSDNPVQPTGPSTPTACDPHLRFDAATTLRGEIYFFKDKYFWRRHPQLRTVDLNFISLFWPFLPNGLQAAYEDFDRDLVFLFKGRQYWALSAYDLQQGYPRDISNYGFPRSVQAIDAAVSYNGKTYFFVNNQCWRYDNQRRSMDPGYPTSIASVFPGINCRIDAVFQQDSFFLFFSGPQYFAFNLVSRRVTRVARSNLWLNCP. The pIC50 is 6.3. (8) The drug is N#Cc1ccccc1S(=O)(=O)Nc1cccc(-c2c(O)ccc3cc(-c4cccc(O)c4)ccc23)c1. The target protein sequence is MSTFFSDTAWICLAVPTVLCGTVFCKYKKSSGQLWSWMVCLAGLCAVCLLILSPFWGLILFSVSCFLMYTYLSGQELLPVDQKAVLVTGGDCGLGHALCKYLDELGFTVFAGVLNENGPGAEELRRTCSPRLSVLQMDITKPVQIKDAYSKVAAMLQDRGLWAVINNAGVLGFPTDGELLLMTDYKQCMAVNFFGTVEVTKTFLPLLRKSKGRLVNVSSMGGGAPMERLASYGSSKAAVTMFSSVMRLELSKWGIKVASIQPGGFLTNIAGTSDKWEKLEKDILDHLPAEVQEDYGQDYILAQRNFLLLINSLASKDFSPVLRDIQHAILAKSPFAYYTPGKGAYLWICLAHYLPIGIYDYFAKRHFGQDKPMPRALRMPNYKKKAT. The pIC50 is 6.4. (9) The drug is CCCCC(CC#N)n1cc(-c2ncnc3[nH]ccc23)cn1. The target protein sequence is FFRAIMRDINKLEEQNPDIVSEKKPATEVDPTHFEKRFLKRIRDLGEGHFGKVELCRYDPEGDNTGEQVAVKSLKPESGGNHIADLKKEIEILRNLYHENIVKYKGICTEDGGNGIKLIMEFLPSGSLKEYLPKNKNKINLKQQLKYAVQICKGMDYLGSRQYVHRDLAARNVLVESEHQVKIGDFGLTKAIETDKEYYTVKDDRDSPVFWYAPECLMQSKFYIASDVWSFGVTLHELLTYCDSDSSPMALFLKMIGPTHGQMTVTRLVNTLKEGKRLPCPPNCPDEVYQLMRKCWEFQPSNRTSF. The pIC50 is 8.7. (10) The drug is CC(=O)NC1C(O)O[C@@H](CO)[C@@H](OC2OC(CO)[C@H](O)[C@H](OC3(C(=O)[O-])C[C@@H](O)[C@H](NC(=O)NC(=O)CC4OC(CO)C(O)C(O)C4O)C([C@H](O)[C@H](O)CO)O3)[C@@H]2O)C1O[C@@H]1O[C@@H](C)C(O)C(O)C1O. The target protein (P02763) has sequence MALSWVLTVLSLLPLLEAQIPLCANLVPVPITNATLDQITGKWFYIASAFRNEEYNKSVQEIQATFFYFTPNKTEDTIFLREYQTRQDQCIYNTTYLNVQRENGTISRYVGGQEHFAHLLILRDTKTYMLAFDVNDEKNWGLSVYADKPETTKEQLGEFYEALDCLRIPKSDVVYTDWKKDKCEPLEKQHEKERKQEEGES. The pIC50 is 2.1.
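The pIC50 definition stated at the top of this record (pIC50 = -log10(IC50 in M); higher means more potent) can be sketched in a few lines. This is only an illustration of that formula, not part of the dataset or its tooling; the function names `ic50_to_pic50`, `pic50_to_ic50`, and `ic50_nm_to_pic50` are hypothetical, and the nanomolar helper assumes the raw IC50 is reported in nM, which the record itself does not state.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Hypothetical convenience helper, assuming the input is in nanomolar."""
    return ic50_to_pic50(ic50_nm * 1e-9)


if __name__ == "__main__":
    # Labels from the examples above, mapped back to concentrations in mol/L.
    for label in (8.7, 6.4, 4.0, 2.1):
        print(f"pIC50 {label} -> IC50 {pic50_to_ic50(label):.3g} M")
```

Read this way, the pIC50 of 8.7 in example (9) corresponds to an IC50 of roughly 2 nM, the 4.0 in examples (2) and (5) to 100 µM, and the 2.1 in example (10) to roughly 8 mM. Each unit of pIC50 is a tenfold change in IC50.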