Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is COc1cc2c(cc1-c1c(C)noc1C)ncc1c2n([C@@H](C)c2ccccn2)c(=O)n1CC(=O)NCCOCCNC(=O)Cn1c(=O)n([C@H](C)c2ccccn2)c2c3cc(OC)c(-c4c(C)noc4C)cc3ncc21. The target protein sequence is NTKKNGRLTNQLQYLQKVVLKDLWKHSFSWPFQRPVDAVKLQLPDYYTIIKNPMDLNTIKKRLENKYYAKASECIEDFNTMFSNCYLYNKPGDDIVLMAQALEKLFMQKLSQMPQEE. The pIC50 is 6.7. (2) The compound is Cc1cc(O)c2c(c1)C(=O)C(c1c(C)cc3c(c1O)C(=O)C=CC3=O)=CC2=O. The target protein sequence is MTDTTLPPDDSLDRIEPVDIEQEMQRSYIDYAMSVIVGRALPEVRDGLKPVHRRVLYAMFDSGFRPDRSHAKSARSVAETMGNYHPHGDASIYDSLVRMAQPWSLRYPLVDGQGNFGSPGNDPPAAMRYTEARLTPLAMEMLREIDEETVDFIPNYDGRVQEPTVLPSRFPNLLANGSGGIAVGMATNIPPHNLRELADAVFWALENHDADEEETLAAVMGRVKGPDFPTAGLIVGSQGTADAYKTGRGSIRMRGVVEVEEDSRGRTSLVITELPYQVNHDNFITSIAEQVRDGKLAGISNIEDQSSDRVGLRIVIEIKRDAVAKVVINNLYKHTQLQTSFGANMLAIVDGVPRTLRLDQLIRYYVDHQLDVIVRRTTYRLRKANERAHILRGLVKALDALDEVIALIRASETVDIARAGLIELLDIDEIQAQAILDMQLRRLAALERQRIIDDLAKIEAEIADLEDILAKPERQRGIVRDELAEIVDRHGDDRRTRIIA.... The pIC50 is 4.8. (3) The small molecule is CC[C@H](C)[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](NC(C)=O)C1c2ccccc2CCc2ccccc21)C(=O)N[C@H](C(=O)N[C@@H](Cc1c[nH]c2ccccc12)C(=O)O)[C@@H](C)CC. The target protein (Q29010) has sequence METFCFRVSFWVALLGCVISDNPESHSTNLSTHVDDFTTFRGTEFSLVVTTHRPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRWPFENHDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSRVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPFEYKGEEHKTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRRNGSLRIALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDEMDKNRCELLSFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPMNGTSIQWKNHEQNNHNTERSSHKDSIN. The pIC50 is 8.0. (4) The drug is O=c1cc(-c2ccc(O)cc2)oc2cc(O)cc(O)c12. 
The target protein (Q16678) has sequence MGTSLSPNDPWPLNPLSIQQTTLLLLLSVLATVHVGQRLLRQRRRQLRSAPPGPFAWPLIGNAAAVGQAAHLSFARLARRYGDVFQIRLGSCPIVVLNGERAIHQALVQQGSAFADRPAFASFRVVSGGRSMAFGHYSEHWKVQRRAAHSMMRNFFTRQPRSRQVLEGHVLSEARELVALLVRGSADGAFLDPRPLTVVAVANVMSAVCFGCRYSHDDPEFRELLSHNEEFGRTVGAGSLVDVMPWLQYFPNPVRTVFREFEQLNRNFSNFILDKFLRHCESLRPGAAPRDMMDAFILSAEKKAAGDSHGGGARLDLENVPATITDIFGASQDTLSTALQWLLLLFTRYPDVQTRVQAELDQVVGRDRLPCMGDQPNLPYVLAFLYEAMRFSSFVPVTIPHATTANTSVLGYHIPKDTVVFVNQWSVNHDPLKWPNPENFDPARFLDKDGLINKDLTSRVMIFSVGKRRCIGEELSKMQLFLFISILAHQCDFRANPNEP.... The pIC50 is 7.6. (5) The compound is COc1cc(-c2ccc3c(c2)[C@@]2(N=C(C)C(N)=N2)[C@]2(CC[C@H](OC)CC2)C3)cc(C#N)c1F. The target protein sequence is MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGGAPLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQGYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHRYYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSIPHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEIARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSEVLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIVRVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFEAAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNIFPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDDCYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAVSACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDESTLMTIAY. The pIC50 is 8.6. (6) The target protein (P0C1P9) has sequence MTEQQKFKVLADQIKISNQLDAEILNSGELTRIDVSNKNRTWEFHITLPQFLAHEDYLLFINAIEQEFKDIANVTCRFTVTNGTNQDEHAIKYFGHCIDQTALSPKVKGQLKQKKLIMSGKVLKVMVSNDIERNHFDKACNGSLIKAFRNCGFDIDKIIFETNDNDQEQNLASLEAHIQEEDEQSARLATEKLEKMKAEKAKQQDNKQSAVDKCQIGKPIQIENIKPIESIIEEEFKVAIEGVIFDINLKELKSGRHIVEIKVTDYTDSLVLKMFTRKNKDDLEHFKALSVGKWVRAQGRIEEDTFIRDLVMMMSDIEEIKKATKKDKAEEKRVEFHLHTAMSQMDGIPNIGAYVKQAADWGHPAIAVTDHNVVQAFPDAHAAAEKHGIKMIYGMEGMLVDDGVPIAYKPQDVVLKDATYVVFDVETTGLSNQYDKIIELAAVKVHNGEIIDKFERFSNPHERLSETIINLTHITDDMLVDAPEIEEVLTEFKEWVGDAI.... The pIC50 is 5.4. The small molecule is CCc1cc(Nc2nc(O)c(Cc3ccccc3)c(=O)[nH]2)ccc1C. (7) The pIC50 is 7.2. 
The target protein (P79227) has sequence MKFLLLILTLWVTSSGADPLKENDMLFAENYLENFYGLKVERIPMTKMKTNRNFIEEKVQEMQQFLGLNVTGQLDTSTLEMMHKPRCGVPDVYHFKTMPGRPVWRKHYITYRIKNYTPDMKREDVEYAIQKAFQVWSDVTPLKFRKITTGKADIMILFASGAHGDYGAFDGRGGVIAHAFGPGPGIGGDTHFDEDEIWSKSYKGTNLFLVAVHELGHALGLDHSNDPKAIMFPTYGYIDLNTFHLSADDIRGIQSLYGGPEQHQPMPKPDNPEPTACDHNLKFDAVTTVGNKIFFFKDSFFWWKIPKSSTTSVRLISSLWPTLPSGIEAAYEIGDRHQVFLFKGDKFWLISHLRLQPNYPKSIHSLGFPDFVKKIDAAVFNPSLRKTYFFVDNLYWRYDERREVMDAGYPKLITKHFPGIGPKIDAVFYFQRYYYFFQGPNQLEYDTFSSRVTKKLKSNSWFDC. The compound is CC(C)[C@H](NS(=O)(=O)c1ccc2c(c1)oc1ccc(-c3nccs3)cc12)C(=O)O. (8) The compound is Nc1ncnc2c1nc(N)n2C1O[C@H](COP(=O)(O)O[C@H]2C(n3c(N)nc4c(N)ncnc43)O[C@H](COP(=O)(O)O[C@H]3C(n4c(N)nc5c(N)ncnc54)O[C@H](COP(=O)(O)O[C@H]4C(n5c(N)nc6c(N)ncnc65)O[C@H](COP(=O)(O)O)[C@H]4O)[C@H]3O)[C@H]2O)[C@@H](O)[C@H]1O. The target protein (Q05921) has sequence METPDYNTPQGGTPSAGSQRTVVEDDSSLIKAVQKGDVVRVQQLLEKGADANACEDTWGWTPLHNAVQAGRVDIVNLLLSHGADPHRRKKNGATPFIIAGIQGDVKLLEILLSCGADVNECDENGFTAFMEAAERGNAEALRFLFAKGANVNLRRQTTKDKRRLKQGGATALMSAAEKGHLEVLRILLNDMKAEVDARDNMGRNALIRTLLNWDCENVEEITSILIQHGADVNVRGERGKTPLIAAVERKHTGLVQMLLSREGINIDARDNEGKTALLIAVDKQLKEIVQLLLEKGADKCDDLVWIARRNHDYHLVKLLLPYVANPDTDPPAGDWSPHSSRWGTALKSLHSMTRPMIGKLKIFIHDDYKIAGTSEGAVYLGIYDNREVAVKVFRENSPRGCKEVSCLRDCGDHSNLVAFYGREDDKGCLYVCVSLCEWTLEEFLRLPREEPVENGEDKFAHSILLSIFEGVQKLHLHGYSHQDLQPQNILIDSKKAVRLA.... The pIC50 is 6.3. (9) The compound is O=C(Cn1c(=O)n(-c2ccccc2)c(=O)n1CC(=O)c1ccccc1)c1ccccc1. The target is XTSFAESXKPVQQPSAFGS. The pIC50 is 4.0.
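
The pIC50 labels above follow the definition given in the task statement, pIC50 = -log10(IC50 in M). A minimal sketch of that conversion is shown here; the function names are hypothetical, and the nanomolar helper assumes the raw IC50 value is reported in nM, which the text above does not state.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def ic50_nanomolar_to_pic50(ic50_nm: float) -> float:
    """Assumption: the raw IC50 is in nanomolar; 1 nM = 1e-9 M."""
    return ic50_molar_to_pic50(ic50_nm * 1e-9)
```

Under this definition a tenfold drop in IC50 raises pIC50 by one unit, which is why higher values mean a more potent compound.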