Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50, drug-target binding data from BindingDB using IC50 measurements. (1) The pIC50 is 8.7. The small molecule is COC1CCC2(CC1)Cc1ccc(C#CC3CC3)cc1C21N=C(N)N(Cc2ccccc2)C1=O. The target protein sequence is MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDESTLMTI. (2) The drug is CCN(CC)c1ccc2c(-c3ccc(S(=O)(=O)NCCCC[C@H](NC(=O)Cc4csc(=[NH2+])n4C)C(=O)N[C@@H](Cc4cn(Cc5ccccc5)c[n+]4C)C(=O)NC4CC[NH+](C)CC4)cc3S(=O)(=O)[O-])c3ccc(=[N+](CC)CC)cc-3oc2c1.O=C([O-])C(F)(F)F.O=C([O-])C(F)(F)F.O=C([O-])C(F)(F)F. The target protein (Q9R0R3) has sequence MNLSFCVQALLLLWLSLTAVCGVPLMLPPDGKGLEEGNMRYLVKPRTSRTGPGAWQGGRRKFRRQRPRLSHKGPMPF. The pIC50 is 5.5. (3) The pIC50 is 7.7. The drug is CSCC[C@H](NC(=O)[C@H](Cc1ccccc1)NC[C@@H](NC[C@@H](N)CS)C(C)C)C(=O)O. The target protein sequence is MEFVKCLGHPEEFYNLLRFQMGGRRKVIPKMDQDSLSSSLKTCYKYLNQTSRSFAAVIQALDGEMRHAVCIFYLVLRALDTLEDDMTISIERKVPLLHNFHSYLYEPDWRFTESKEKDRQVLEDFPTISLEFRNLAEKYQTVIVDVCQKMGFGMAEFLDKRVTSEREWDKYCHYVAGLVGIGLSRLFSASELEDPLIGEDTERANSMGLFLQKTNIIRDYLEDQREGREFWPQETWSKYVKKLGDFAKPENIDLAVQCLNELITNTLHHIPDVITYLSRLRNQSIFNFCAIPQVMAIATLAACYNNQQVFKGVVKIRKGQAVTLMMDATNMPAVKAIIHQYMEEIYHRIPNSDPCSTKTQQIISTIRTQNLPNCQLVSRSHYSPIYLSFVMLLAALSWQYLSTLSQVTEDYVQTGEH. (4) The drug is O=C(Nc1nc2cccc(-c3ccc(OCc4ccccc4)c(F)c3)n2n1)C1CC1. 
The target protein sequence is PHNLADVLTVNPDSPASDPTVFHKRYLKKIRDLGEGHFGKVSLYCYDPTNDGTGEMVAVKALKADCGPQHRSGWKQEIDILRTLYHEHIIKYKGCCEDQGEKSLQLVMEYVPLGSLRDYLPRHSIGLAQLLLFAQQICEGMAYLHAQHYIHRDLAARNVLLDNDRLVKIGDFGLAKAVPEGHEYYRVREDGDSPVFWYAPECLKEYKFYYASDVWSFGVTLYELLTHCDSSQSPPTKFLELIGIAQGQMTVLRLTELLERGERLPRPDKCPCEVYHLMKNCWETEASFRPTFENLIPILKTVHEKYQGQAPSVFSVC. The pIC50 is 6.0. (5) The compound is C[C@@H]1CCCC(N)=NC1. The target protein sequence is MEDHMFGVQQIQPNVISVRLFKRKVGGLGFLVKERVSKPPVIISDLIRGGAAEQSGLIQAGDIILAVNGRPLVDLSYDSALEVLRGIASETHVVLILRGPEGFTTHLETTFTGDGTPKTIRVTQPLGPPTKAVDLSHQPPAGKEQPLAVDGASGPGNGPQHAYDDGQEAGSLPHANGLAPRPPGQDPAKKATRVSLQGRGENNELLKEIEPVLSLLTSGSRGVKGGAPAKAEMKDMGIQVDRDLDGKSHKPLPLGVENDRVFNDLWGKGNVPVVLNNPYSEKEQPPTSGKQSPTKNGSPSKCPRFLKVKNWETEVVLTDTLHLKSTLETGCTEYICMGSIMHPSQHARRPEDVRTKGQLFPLAKEFIDQYYSSIKRFGSKAHMERLEEVNKEIDTTSTYQLKDTELIYGAKHAWRNASRCVGRIQWSKLQVFDARDCTTAHGMFNYICNHVKYATNKGNLRSAITIFPQRTDGKHDFRVWNSQLIRYAGYKQPDGSTLGD.... The pIC50 is 5.4. (6) The drug is Cc1cnccc1-c1ccc(-c2nc(-c3ccc(Cl)c(Cl)c3)cs2)cc1C(=O)O. The target protein (P06730) has sequence MATVEPETTPTPNPPTTEEEKTESNQEVANPEHYIKHPLQNRWALWFFKNDKSKTWQANLRLISKFDTVEDFWALYNHIQLSSNLMPGCDYSLFKDGIEPMWEDEKNKRGGRWLITLNKQQRRSDLDRFWLETLLCLIGESFDDYSDDVCGAVVNVRAKGDKIAIWTTECENREAVTHIGRVYKERLGLPPKIVIGYQSHADTATKSGSTTKNRFVV. The pIC50 is 4.9. (7) The compound is COC(=O)[C@H]1[C@H]2C[C@@H]3c4[nH]c5cc(OC)ccc5c4CCN3C[C@H]2C[C@@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)[C@@H]1OC. The target protein sequence is MALSDLVLLRWLRDSRHSRKLILFIVFLALLLDNMLLTVVVPIIPSYLYSIKHEKNSTEIQTTRPELVVSTSESIFSYYNNSTVLITGNATGTLPGGQSHKATSTQHTVANTTVPSDCPSEDRDLLNENVQGGLLFASKATVQLLTNPFIGLLTNRIGYPIPMFAGFCIMFISTVMFAFSSSYAFLLIARSLQGIGSSCSSVAGMGMLASVYTDDEERGKPMGIALGGLAMGVLVGPPFGSVLYEFVGKTAPFLVLAALVLLDGAIQLFVLQPSRVQPESQKGTPLTTLLKDPYILIAAGSICFANMGIAMLEPALPIWMMETMCSRKWQLGVAFLPASISYLIGTNIFGILAHKMGRWLCALLGMVIVGISILCIPFAKNIYGLIAPNFGVGFAIGMVDSSMMPIMGYLVDLRHVSVYGSVYAIADVAFCMGYAIGPSAGGAIAKAIGFPWLMTIIGIIDIAFAPLCFFLRSPPAKEEKMAILMDHNCPIKRKMYTQNN.... The pIC50 is 7.0. 
(8) The compound is Nc1nc(Nc2ccc(S(N)(=O)=O)cc2)sc1C(=O)c1ccccc1[N+](=O)[O-]. The target protein (O75909) has sequence MKENKENSSPSVTSANLDHTKPCWYWDKKDLAHTPSQLEGLDPATEARYRREGARFIFDVGTRLGLHYDTLATGIIYFHRFYMFHSFKQFPRYVTGACCLFLAGKVEETPKKCKDIIKTARSLLNDVQFGQFGDDPKEEVMVLERILLQTIKFDLQVEHPYQFLLKYAKQLKGDKNKIQKLVQMAWTFVNDSLCTTLSLQWEPEIIAVAVMYLAGRLCKFEIQEWTSKPMYRRWWEQFVQDVPVDVLEDICHQILDLYSQGKQQMPHHTPHQLQQPPSLQPTPQVPQVQQSQPSQSSEPSQPQQKDPQQPAQQQQPAQQPKKPSPQPSSPRQVKRAVVVSPKEENKAAEPPPPKIPKIETTHPPLPPAHPPPDRKPPLAAALGEAEPPGPVDATDLPKVQIPPPAHPAPVHQPPPLPHRPPPPPPSSYMTGMSTTSSYMSGEGYQSLQSMMKTEGPSYGALPPAYGPPAHLPYHPHVYPPNPPPPPVPPPPASFPPPAIP.... The pIC50 is 7.9. (9) The target protein sequence is MFKLLSKLLVYLTASIMAIASPLAFSVDSSGEYPTVSEIPVGEVRLYQIADGVWSHIATQSFDGAVYPSNGLIVRDGDELLLIDTAWGAKNTAALLAEIEKQIGLPVTRAVSTHFHDDRVGGVDVLRAAGVATYASPSTRRLAEVEGNEIPTHSLEGLSSSGDAVRFGPVELFYPGAAHSTDNLVVYVPSASVLYGGCAIYELSRTSAGNVADADLAEWPTSIERIQQHYPEAQFVIPGHGLPGGLDLLKHTTNVVKAHTNRSVVE. The pIC50 is 6.3. The compound is CN1CCN(NC(=O)CN2C(=O)/C(=C/c3c(Cl)ccc(Cl)c3Cl)SC2=S)CC1.
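
The pIC50 definition in the task description (pIC50 = -log10(IC50 in M)) can be sketched in Python as below. This is only an illustrative conversion helper, not part of the dataset: the function names are hypothetical, and the use of nanomolar as the input unit is an assumption made here for convenience.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in M).

    Assumption: the input is in nM, so IC50 in M is ic50_nm * 1e-9
    and -log10(ic50_nm * 1e-9) simplifies to 9 - log10(ic50_nm).
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Invert the definition: IC50 in M is 10**(-pIC50); return it in nM."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # A pIC50 of 8.7 (record 1) corresponds to an IC50 of roughly 2 nM,
    # and a pIC50 of 4.9 (record 6) to roughly 12.6 micromolar.
    for p in (8.7, 4.9):
        print(f"pIC50 {p} -> IC50 {ic50_nm_from_pic50(p):.3g} nM")
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, which is why higher values mean more potent compounds.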