Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKd (pKd = -log10(Kd in M); higher means stronger binding). Dataset: bindingdb_kd (drug-target binding data from BindingDB using Kd measurements). (1) The small molecule is C/C=C/C=C/C=C/C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@H]1COC(=O)[C@@H]2CCCN2C(=O)[C@H](C)NC(=O)[C@H](C)N(C)C(=O)[C@@H]2CCCN2C1=O. The target protein (P0A6G7) has sequence MSYSGERDNFAPHMALVPMVIEQTSRGERSFDIYSRLLKERVIFLTGQVEDHMANLIVAQMLFLEAENPEKDIYLYINSPGGVITAGMSIYDTMQFIKPDVSTICMGQAASMGAFLLTAGAKGKRFCLPNSRVMIHQPLGGYQGQATDIEIHAREILKVKGRMNELMALHTGQSLEQIERDTERDRFLSAPEAVEYGLVDSILTHRN. The pKd is 6.2. (2) The compound is COc1c2ccoc2cc2oc(=O)ccc12. The target protein (P16390) has sequence MTVVPGDHLLEPEAAGGGGGDPPQGGCGSGGGGGGCDRYEPLPPALPAAGEQDCCGERVVINISGLRFETQLKTLCQFPETLLGDPKRRMRYFDPLRNEYFFDRNRPSFDAILYYYQSGGRIRRPVNVPIDIFSEEIRFYQLGEEAMEKFREDEGFLREEERPLPRRDFQRQVWLLFEYPESSGPARGIAIVSVLVILISIVIFCLETLPEFRDEKDYPASPSQDVFEAANNSTSGAPSGASSFSDPFFVVETLCIIWFSFELLVRFFACPSKATFSRNIMNLIDIVAIIPYFITLGTELAERQGNGQQAMSLAILRVIRLVRVFRIFKLSRHSKGLQILGQTLKASMRELGLLIFFLFIGVILFSSAVYFAEADDPSSGFNSIPDAFWWAVVTMTTVGYGDMHPVTIGGKIVGSLCAIAGVLTIALPVPVIVSNFNYFYHRETEGEEQAQYMHVGSCQHLSSSAEELRKARSNSTLSKSEYMVIEEGGMNHSAFPQTPF.... The pKd is 4.0. (3) The drug is CSCC[C@H](NC(=O)[C@H](CC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](CO)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@@H]1CCCN1C(=O)[C@@H](NC(=O)[C@H](CS)NC(=O)[C@H](Cc1cnc[nH]1)NC(=O)[C@@H]1CCCN1)C(C)C)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(C)C)C(=O)N1CCC[C@H]1C(=O)O. 
The target protein (P31946) has sequence MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN. The pKd is 8.2. (4) The drug is C=CC(=O)Nc1cc2c(Nc3ccc(F)c(Cl)c3)ncnc2cc1OCCCN1CCOCC1. The target is PFCDPK1(Pfalciparum). The pKd is 5.0. (5) The compound is CC(=O)NC1C(O)CC(OC2C(O)C(CO)OC(OC3C(CO)OC(O)C(O)C3O)C2O)(C(=O)O)OC1[C@H](O)[C@H](O)CO. The target protein sequence is MLAPGSSRVELFKRKNSTVPFEDKAGKVTERVVHSFRLPALVNVDGVMVAIADARYDTSNDNSLIDTVAKYSVDDGETWETQIAIKNSRVSSVSRVVDPTVIVKGNKLYVLVGSYYSSRSYWSSHGDARDWDILLAVGEVTKSIAGGKITASIKWGSPVSLKKFFPAEMEGMHTNQFLGGAGVAIVASNGNLVYPVQVTNKRKQVFSKIFYSEDDGKTWKFGKGRSDFGCSEPVALEWEGKLIINTRVDWKRRLVYESSDMGNTWVEAVGTLSRVWGPSPKSDHPGSQSSFTAVTIEGMRVMLFTHPLNFKGRWLRDRLNLWLTDNQRIYNVGQVSIGDENSAHSSVLYKDDKLYCLHEINTDEVYSLVFARLVGELRIIKSVLRSWKNWDSHLSSICTPADPAASSSESGCGPAVTTVGLVGFLSGNASQNVWEDAYRCVNASTANAERVRNGLKFAGVGGGALWPVSQQGQNQRYRFANHAFTLVASVTIHEAPRAAS.... The pKd is 3.6. (6) The compound is C[C@@H](O)[C@H](NC(=O)[C@H](CC(=O)O)NC(=O)[C@H](CC(=O)O)NC(=O)[C@@H](N)CCC(=O)O)C(=O)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](CC(=O)O)C(=O)NCC(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)O. The target protein sequence is PVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEME. The pKd is 4.2. (7) The small molecule is C[N+]1(C)[C@H]2CC(OC(=O)[C@H](CO)c3ccccc3)C[C@@H]1[C@H]1O[C@@H]21. The target protein sequence is MTLHSQSTTSPLFPQISSSWVHSPSEAGLPLGTVTQLGSYQISQETGQFSSQDTSSDPLGGHTIWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLASADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLTYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFVGKRTVPPGECFIQFLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRTKELAGLQASGTEIEGRIEGRIEGRTRSQITKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFADSAIPKTYWNLGYWLCYINSTVNPVAYALSNKTCRTTFKTLLLSQSDKRKRRKQQYQQRQSVIFHKRVPEQAL. 
The pKd is 9.2.
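
For readers who want to relate the pKd labels back to concentrations, the definition in the task statement (pKd = -log10(Kd in M)) can be sketched in Python as follows. The function names `kd_to_pkd` and `pkd_to_kd` are illustrative assumptions and are not part of the dataset or of any BindingDB tooling; the list of labels is simply copied from the seven examples above.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant in mol/L to pKd = -log10(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the definition: Kd in mol/L = 10 ** (-pKd)."""
    return 10.0 ** (-pkd)


# pKd labels copied from examples (1) through (7) above.
example_pkd = [6.2, 4.0, 8.2, 5.0, 3.6, 4.2, 9.2]

for index, pkd in enumerate(example_pkd, start=1):
    kd_nanomolar = pkd_to_kd(pkd) * 1e9  # mol/L -> nmol/L
    print(f"({index}) pKd = {pkd:.1f}  ->  Kd is roughly {kd_nanomolar:.3g} nM")
```

Because the scale is logarithmic, a difference of one pKd unit corresponds to a tenfold difference in Kd, so the strongest binder listed (pKd 9.2) has a Kd several orders of magnitude lower than the weakest (pKd 3.6).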