Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is Cc1sc(NC(=O)C2=C(C(=O)O)CCCC2)c(-c2nc(C3CC3)no2)c1C. The target protein (Q01469) has sequence MATVQQLEGRWRLVDSKGFDEYMKELGVGIALRKMGAMAKPDCIITCDGKNLTIKTESTLKTTQFSCTLGEKFEETTADGRKTQTVCNFTDGALVQHQEWDGKESTITRKLKDGKLVVECVMNNVTCTRIYEKVE. The pIC50 is 7.0. (2) The drug is CO/N=C(\C(=O)NCP(=O)(O)Oc1ccc(C#N)c(F)c1)c1cnc(NC(=O)C(Cl)(Cl)Cl)s1. The target protein (P05364) has sequence MMRKSLCCALLLGISCSALATPVSEKQLAEVVANTITPLMKAQSVPGMAVAVIYQGKPHYYTFGKADIAANKPVTPQTLFELGSISKTFTGVLGGDAIARGEISLDDAVTRYWPQLTGKQWQGIRMLDLATYTAGGLPLQVPDEVTDNASLLRFYQNWQPQWKPGTTRLYANASIGLFGALAVKPSGMPYEQAMTTRVLKPLKLDHTWINVPKAEEAHYAWGYRDGKAVRVSPGMLDAQAYGVKTNVQDMANWVMANMAPENVADASLKQGIALAQSRYWRIGSMYQGLGWEMLNWPVEANTVVEGSDSKVALAPLPVAEVNPPAPPVKASWVHKTGSTGGFGSYVAFIPEKQIGIVMLANTSYPNPARVEAAYHILEALQ. The pIC50 is 6.2. (3) The small molecule is CCC(C)[C@H](N)C(=O)NS(=O)(=O)OC[C@H]1O[C@@H](c2nc(CCc3ccc(Oc4ccccc4)cc3)cs2)[C@H](O)[C@@H]1O. The target protein (P41972) has sequence MDYKETLLMPKTDFPMRGGLPNKEPQIQEKWDAEDQYHKALEKNKGNETFILHDGPPYANGNLHMGHALNKILKDFIVRYKTMQGFYAPYVPGWDTHGLPIEQALTKKGVDRKKMSTAEFREKCKEFALEQIELQKKDFRRLGVRGDFNDPYITLKPEYEAAQIRIFGEMADKGLIYKGKKPVYWSPSSESSLAEAEIEYHDKRSASIYVAFNVKDDKGVVDADAKFIIWTTTPWTIPSNVAITVHPELKYGQYNVNGEKYIIAEALSDAVAEALDWDKASIKLEKEYTGKELEYVVAQHPFLDRESLVINGDHVTTDAGTGCVHTAPGHGEDDYIVGQKYELPVISPIDDKGVFTEEGGQFEGMFYDKANKAVTDLLTEKGALLKLDFITHSYPHDWRTKKPVIFRATPQWFASISKVRQDILDAIENTNFKVNWGKTRIYNMVRDRGEWVISRQRVWGVPLPVFYAENGEIIMTKETVNHVADLFAEHGSNIWFEREA.... The pIC50 is 6.9. (4) The compound is O=C(Nc1cccc(Br)c1)Nc1cc(Cl)ccc1C(=O)O. 
The target protein (P19493) has sequence MRIICRQIVLLFSGFWGLAMGAFPSSVQIGGLFIRNTDQEYTAFRLAIFLHNTSPNASEAPFNLVPHVDNIETANSFAVTNAFCSQYSRGVFAIFGLYDKRSVHTLTSFCSALHISLITPSFPTEGESQFVLQLRPSLRGALLSLLDHYEWNCFVFLYDTDRGYSILQAIMEKAGQNGWHVSAICVENFNDVSYRQLLEELDRRQEKKFVIDCEIERLQNILEQIVSVGKHVKGYHYIIANLGFKDISLERFIHGGANVTGFQLVDFNTPMVTKLMDRWKKLDQREYPGSETPPKYTSALTYDGVLVMAETFRSLRRQKIDISRRGNAGDCLANPAAPWGQGIDMERTLKQVRIQGLTGNVQFDHYGRRVNYTMDVFELKSTGPRKVGYWNDMDKLVLIQDMPTLGNDTAAIENRTVVVTTIMESPYVMYKKNHEMFEGNDKYEGYCVDLASEIAKHIGIKYKIAIVPDGKYGARDADTKIWNGMVGELVYGKAEIAIAP.... The pIC50 is 4.0. (5) The compound is COC(=O)C1C[C@@H](NC(=O)c2ccc(O)cc2)[C@H](OC(=O)c2cc(O)c(C(=O)c3c(O)cccc3C(=O)O)c(O)c2)C1. The target protein (P24723) has sequence MSSGTMKFNGYLRVRIGEAVGLQPTRWSLRHSLFKKGHQLLDPYLTVSVDQVRVGQTSTKQKTNKPTYNEEFCANVTDGGHLELAVFHETPLGYDHFVANCTLQFQELLRTTGASDTFEGWVDLEPEGKVFVVITLTGSFTEATLQRDRIFKHFTRKRQRAMRRRVHQINGHKFMATYLRQPTYCSHCREFIWGVFGKQGYQCQVCTCVVHKRCHHLIVTACTCQNNINKVDSKIAEQRFGINIPHKFSIHNYKVPTFCDHCGSLLWGIMRQGLQCKICKMNVHIRCQANVAPNCGVNAVELAKTLAGMGLQPGNISPTSKLVSRSTLRRQGKESSKEGNGIGVNSSNRLGIDNFEFIRVLGKGSFGKVMLARVKETGDLYAVKVLKKDVILQDDDVECTMTEKRILSLARNHPFLTQLFCCFQTPDRLFFVMEFVNGGDLMFHIQKSRRFDEARARFYAAEIISALMFLHDKGIIYRDLKLDNVLLDHEGHCKLADFGM.... The pIC50 is 7.0. (6) The small molecule is CN1CCN(C2=C(Cl)C(=O)N(c3c(Cl)cccc3Cl)C2=O)CC1. The target protein sequence is MAESELMHIHSLAEHYLQYVLQVPAFESAPSQACRVLQRVAFSVQKEVEKNLKSYLDDFHVESIDTARIIFNQVMEKEFEDGIINWGRIVTIFAFGGVLLKKLPQEQIALDVCAYKQVSSFVAEFIMNNTGEWIRQNGGWEDGFIKKFEPKSGWLTFLQMTGQIWEMLFLLK. The pIC50 is 4.9. (7) The drug is CC[C@H](O)CC1(OCC/C=C/c2coc(C(C)CCNC(=O)[C@@H](O)C(O)[C@H](COC)N(C)C)n2)O[C@H]([C@H](C[C@H](O)C(C)C2OC(C)(/C(C)=C/C=C/C(C)=C/C#N)[C@H](Br)[C@H]2C)OC)[C@H](OP(=O)(O)O)C1(C)C. 
The target protein (Q15257) has sequence MAEGERQPPPDSSEEAPPATQNFIIPKKEIHTVPDMGKWKRSQAYADYIGFILTLNEGVKGKKLTFEYRVSEMWNEVHEEKEQAAKQSVSCDECIPLPRAGHCAPSEAIEKLVALLNTLDRWIDETPPVDQPSRFGNKAYRTWYAKLDEEAENLVATVVPTHLAAAVPEVAVYLKESVGNSTRIDYGTGHEAAFAAFLCCLCKIGVLRVDDQIAIVFKVFNRYLEVMRKLQKTYRMEPAGSQGVWGLDDFQFLPFIWGSSQLIDHPYLEPRHFVDEKAVNENHKDYMFLECILFITEMKTGPFAEHSNQLWNISAVPSWSKVNQGLIRMYKAECLEKFPVIQHFKFGSLLPIHPVTSG. The pIC50 is 7.7.
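
For reference, the pIC50 definition stated in the task (pIC50 = -log10(IC50 in M)) can be sketched in Python as follows. This is a minimal illustration only; the function names are hypothetical and are not part of the dataset or of any BindingDB tooling, and the nanomolar helper simply assumes the usual 1 M = 1e9 nM conversion.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar (hypothetical helper)."""
    ic50_molar = 10.0 ** (-pic50)
    return ic50_molar * 1e9  # 1 M = 1e9 nM


if __name__ == "__main__":
    # The pIC50 values are the labels quoted in the examples above.
    for label in (7.0, 6.2, 4.0, 7.7):
        print(f"pIC50 {label} corresponds to IC50 of about {pic50_to_ic50_nm(label):.3g} nM")
```

Because the scale is logarithmic, a difference of 1.0 in pIC50 corresponds to a tenfold difference in IC50, which is why a higher value means a more potent compound.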