From a dataset of Drug-target binding data from BindingDB using IC50 measurements. Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is [NH2+]=C1CCCC(O)(O)C2NC(=[NH2+])NC2C(COC(=O)NCCCCCNC(=O)c2ccc(C(=O)c3ccccc3)cc2)N1. The target protein (P15390) has sequence MASSSLPNLVPPGPHCLRPFTPESLAAIEQRAVEEEARLQRNKQMEIEEPERKPRSDLEAGKNLPLIYGDPPPEVIGIPLEDLDPYYSDKKTFIVLNKGKAIFRFSATPALYLLSPFSIVRRVAIKVLIHALFSMFIMITILTNCVFMTMSNPPSWSKHVEYTFTGIYTFESLIKMLARGFCIDDFTFLRDPWNWLDFSVITMAYVTEFVDLGNISALRTFRVLRALKTITVIPGLKTIVGALIQSVKKLSDVMILTVFCLSVFALVGLQLFMGNLRQKCVRWPPPMNDTNTTWYGNDTWYSNDTWYGNDTWYINDTWNSQESWAGNSTFDWEAYINDEGNFYFLEGSNDALLCGNSSDAGHCPEGYECIKAGRNPNYGYTSYDTFSWAFLALFRLMTQDYWENLFQLTLRAAGKTYMIFFVVIIFLGSFYLINLILAVVAMAYAEQNEATLAEDQEKEEEFQQMLEKYKKHQEELEKAKAAQALESGEEADGDPTHNKD.... The pIC50 is 7.3. (2) The compound is CCOc1ccccc1-c1cc(F)c(NC(=O)c2ccsc2C(=O)O)c(F)c1. The target protein sequence is MATGDERFYAEHLMPTLQGLLDPESAHRLAVRFTSLGLLPRARFQDSDMLEVRVLGHKFRNPVGIAAGFDKHGEAVDGLYKMGFGFVEIGSVTPKPQEGNPRPRVFRLPEDQAVINRYGFNSHGLSVVEHRLRARQQKQAKLTEDGLPLGVNLGKNKTSVDAAEDYAEGVRVLGPLADYLVVNVSSPNTAGLRSLQGKAELRRLLTKVLQERDGLRRVHRPAVLVKIAPDLTSQDKEDIASVVKELGIDGLIVTNTTVSRPAGLQGALRSETGGLSGKPLRDLSTQTIREMYALTQGRVPIIGVGGVSSGQDALEKIRAGASLVQLYTALTFWGPPVVGKVKRELEALLKEQGFGGVTDAIGADHRR. The pIC50 is 8.0. (3) The drug is COc1ccc(S(=O)(=O)N(Cc2ccccc2)[C@H](C)C(=O)NO)cc1. 
The target protein (P34960) has sequence MSCTLLKGVCTMKFLMMIVFLQVSACGAAPMNDSEFAEWYLSRFYDYGKDRIPMTKTKTNRNFLKEKLQEMQQFFGLEATGQLDNSTLAIMHIPRCGVPDVQHLRAVPQRSRWMKRYLTYRIYNYTPDMKREDVDYIFQKAFQVWSDVTPLRFRKLHKDEADIMILFAFGAHGDFNYFDGKGGTLAHAFYPGPGIQGDAHFDEAETWTKSFQGTNLFLVAVHELGHSLGLQHSNNPKSIMYPTYRYLNPSTFRLSADDIRNIQSLYGAPVKPPSLTKPSSPPSTFCHQSLSFDAVTTVGEKIFFFKDWFFWWKLPGSPATNITSISSIWPSIPSGIQAAYEIESRNQLFLFKDEKYWLINNLVPEPHYPRSIYSLGFSASVKKVDAAVFDPLRQKVYFFVDKHYWRYDVRQELMDPAYPKLISTHFPGIKPKIDAVLYFKRHYYIFQGAYQLEYDPLFRRVTKTLKSTSWFGC. The pIC50 is 8.0. (4) The drug is COc1cc2nc(-c3ccsc3)c(CNCCc3ccc(Br)cc3)cc2cc1O. The target protein (Q80SS6) has sequence MMTPNSTELSAIPMGVLGLSLALASLIVIANLLLALGIALDRHLRSPPAGCFFLSLLLAGLLTGLALPMLPGLWSRNHQGYWSCLLLHLTPNFCFLSLLANLLLVHGERYMAVLQPLRPHGSVRLALFLTWVSSLFFASLPALGWNHWSPDANCSSQAVFPAPYLYLEVYGLLLPAVGATALLSVRVLATAHRQLCEIRRLERAVCRDVPSTLARALTWRQARAQAGATLLFLLCWGPYVATLLLSVLAYERRPPLGPGTLLSLISLGSTSAAAVPVAMGLGDQRYTAPWRTAAQRCLRVLRGRAKRDNPGPSTAYHTSSQCSIDLDLN. The pIC50 is 6.5. (5) The drug is CC(C)CN1CCN(C2CCN(c3nnc(N)[nH]3)CC2)[C@@H](Cc2ccc(Cl)cc2)C1. The target protein (Q13231) has sequence MVRSVAWAGFMVLLMIPWGSAAKLVCYFTNWAQYRQGEARFLPKDLDPSLCTHLIYAFAGMTNHQLSTTEWNDETLYQEFNGLKKMNPKLKTLLAIGGWNFGTQKFTDMVATANNRQTFVNSAIRFLRKYSFDGLDLDWEYPGSQGSPAVDKERFTTLVQDLANAFQQEAQTSGKERLLLSAAVPAGQTYVDAGYEVDKIAQNLDFVNLMAYDFHGSWEKVTGHNSPLYKRQEESGAAASLNVDAAVQQWLQKGTPASKLILGMPTYGRSFTLASSSDTRVGAPATGSGTPGPFTKEGGMLAYYEVCSWKGATKQRIQDQKVPYIFRDNQWVGFDDVESFKTKVSYLKQKGLGGAMVWALDLDDFAGFSCNQGRYPLIQTLRQELSLPYLPSGTPELEVPKPGQPSEPEHGPSPGQDTFCQGKADGLYPNPRERSSFYSCAAGRLFQQSCPTGLVFSNSCKCCTWN. The pIC50 is 4.3. (6) The drug is COc1ccc(-c2cc(=O)c3ccccc3o2)cc1. 
The target protein (Q96P20) has sequence MKMASTRCKLARYLEDLEDVDLKKFKMHLEDYPPQKGCIPLPRGQTEKADHVDLATLMIDFNGEEKAWAMAVWIFAAINRRDLYEKAKRDEPKWGSDNARVSNPTVICQEDSIEEEWMGLLEYLSRISICKMKKDYRKKYRKYVRSRFQCIEDRNARLGESVSLNKRYTRLRLIKEHRSQQEREQELLAIGKTKTCESPVSPIKMELLFDPDDEHSEPVHTVVFQGAAGIGKTILARKMMLDWASGTLYQDRFDYLFYIHCREVSLVTQRSLGDLIMSCCPDPNPPIHKIVRKPSRILFLMDGFDELQGAFDEHIGPLCTDWQKAERGDILLSSLIRKKLLPEASLLITTRPVALEKLQHLLDHPRHVEILGFSEAKRKEYFFKYFSDEAQARAAFSLIQENEVLFTMCFIPLVCWIVCTGLKQQMESGKSLAQTSKTTTAVYVFFLSSLLQPRGGSQEHGLCAHLWGLCSLAADGIWNQKILFEESDLRNHGLQKADVS.... The pIC50 is 4.5. (7) The drug is Cc1onc(-c2ccccc2)c1C(=O)Nc1ccc2c(c1)OC1(CCCC1)O2. The target protein (P42582) has sequence MFPSPALTPTPFSVKDILNLEQQQRSLASGDLSARLEATLAPASCMLAAFKPEAYSGPEAAASGLAELRAEMGPAPSPPKCSPAFPAAPTFYPGAYGDPDPAKDPRADKKELCALQKAVELDKAETDGAERPRARRRRKPRVLFSQAQVYELERRFKQQRYLSAPERDQLASVLKLTSTQVKIWFQNRRYKCKRQRQDQTLELLGPPPPPARRIAVPVLVRDGKPCLGDPAAYAPAYGVGLNAYGYNAYPYPSYGGAACSPGYSCAAYPAAPPAAQPPAASANSNFVNFGVGDLNTVQSPGMPQGNSGVSTLHGIRAW. The pIC50 is 4.6. (8) The small molecule is CCC(Oc1ccc(Br)cc1)c1nnc(Nc2ccc(F)cc2)o1. The target protein sequence is MKLSPREVEKLGLHNAGYLAQKRLARGVRLNYTEAVALIASQIMEYARDGEKTVAQLMCLGQHLLGRRQVLPAVPHLLNAVQVEATFPDGTKLVTVHDPISRENGELQEALFGSLLPVPSLDKFAETKEDNRIPGEILCEDECLTLNIGRKAVILKVTSKGDRPIQVGSHYHFIEVNPYLTFDRRKAYGMRLNIAAGTAVRFEPGDCKSVTLVSIEGNKVIRGGNAIADGPVNETNLEAAMHAVRSKGFGHEEEKDASEGFTKEDPNCPFNTFIHRKEYANKYGPTTGDKIRLGDTNLLAEIEKDYALYGDECVFGGGKVIRDGMGQSCGHPPAISLDTVITNAVIIDYTGIIKADIGIKDGLIASIGKAGNPDIMNGVFSNMIIGANTEVIAGEGLIVTAGAIDCHVHYICPQLVYEAISSGITTLVGGGTGPAAGTRATTCTPSPTQMRLMLQSTDYLPLNFGFTGKGSSSKPDELHEIIKAGAMGLKLHEDWGSTPA.... The pIC50 is 5.1. (9) The compound is CCOc1ccc(C[C@@H](NC(=O)c2ccccc2)C(=O)N[C@@H](Cc2ccccc2)C(=O)NCc2ccc(CN)cc2)cc1. 
The target protein sequence is IPPWEAPKEHKYKAEEHTVVLTVTGEPCHFPFQYHRQLYHKCTHKGRPGPQPWCATTPNFDQDQRWGYCLEPKKVKDHCSKHSPCQKGGTCVNMPSGPHCLCPQHLTGNHCQKEKCFEPQLLRFFHKNEIWYRTEQAAVARCQCKGPDAHCQRLASQACRTNPCLHGGRCLEVEGHRLCHCPVGYTGAFCDVDTKASCYDGRGLSYRGLARTTLSGAPCQPWASEATYRNVTAEQARNWGLGGHAFCRNPDNDIRPWCFVLNRDRLSWEYCDLAQCQTPTQAAPPTPVSPRLHVPLMPAQPAPPKPQPTTRTPPQSQTPGALPAKREQPPSLTRNGPLSCGQRLRKSLSSMTRVVGGLVALRGAHPYIAALYWGHSFCAGSLIAPCWVLTAAHCLQDRPAPEDLTVVLGQERRNHSCEPCQTLAVRSYRLHEAFSPVSYQHDLALLRLQEDADGSCALLSPYVQPVCLPSGAARPSETTLCQVAGWGHQFEGAEEYASFL.... The pIC50 is 5.0.
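
The pIC50 definition stated in the task description (pIC50 = -log10(IC50 in M)) can be sketched in code as follows. This is a minimal illustration only; the function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical helpers and are not part of the bindingdb_ic50 dataset or any BindingDB tooling.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 expressed in nanomolar.

    IC50 in M is 10**(-pIC50); multiplying by 1e9 gives nM,
    which is the same as 10**(9 - pIC50).
    """
    return 10 ** (9 - pic50)
```

Under this definition, a pIC50 of 8.0 corresponds to an IC50 of about 10 nM and a pIC50 of 5.0 to about 10 µM, so each unit increase in pIC50 means a tenfold increase in potency.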