Dataset: bindingdb_kd, drug-target binding data from BindingDB with Kd measurements. Task: regression. Given a target protein amino acid sequence and a drug SMILES string, predict their binding affinity as pKd (pKd = -log10(Kd in M); higher values mean stronger binding). (1) The compound is CSCC[C@H](NC(=O)[C@@H](NC(=O)[C@@H](NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CC(C)C)NC(=S)Nc1ccc2c(c1)C(=O)OC21c2ccc(O)cc2Oc2cc(O)ccc21)[C@@H](C)O)C(C)C)C(=O)O. The target protein sequence is NENEPREADKSHPEQRELRPRLCTMKKGPSGYGFNLHSDKSKPGQFIRSVDPDSPAEASGLRAQDRIVEVNGVCMEGKQHGDVVSAIRAGGDETKLLVVDRETDEFFKKCRVIPSQEHLNGPLPVPFTNGEIQKENSREALAEAALESPRPALVRSASSDTSEELNSQ. The pKd is 5.6. (2) The small molecule is CCN1CCN(Cc2ccc(-c3cc4c(N[C@H](C)c5ccccc5)ncnc4[nH]3)cc2)CC1. The target protein sequence is GEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLIMQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAARNVLVKTPQHVKITDFGRAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQQG. The pKd is 7.7. (3) The drug is C=CC1=C(C)c2cc3[n-]c(cc4[n-]c(cc5nc(cc1n2)C(C)=C5C=C)c(C)c4CCC(=O)O)c(CCC(=O)O)c3C. The target protein sequence is HHHHHHHHMLARALLLCAVLALSHTANPCCSHPCQNRGVCMSVGFDQYKCDCTRTGFYGENCSTPEFLTRIKLFLKPTPNTVHYILTHFKGFWNVVNNIPFLRNAIMSYVLTSRSHLIDSPPTYNADYGYKSWEAFSNLSYYTRALPPVPDDCPTPLGVKGKKQLPDSNEIVEKLLLRRKFIPDPQGSNMMFAFFAQHFTHQFFKTDHKRGPAFTNGLGHGVDLNHIYGETLARQRKLRLFKDGKMKYQIIDGEMYPPTVKDTQAEMIYPPQVPEHLRFAVGQEVFGLVPGLMMYATIWLREHNRVCDVLKQEHPEWGDEQLFQTSRLILIGETIKIVIEDYVQHLSGYHFKLKFDPELLFNKQFQYQNRIAAEFNTLFHWHPLLPDTFQIHDQKYNYQQFIYNNSILLEHGITQFVESFTRQIAGRVAGGRNVPPAVQKVSQASIDQSRQMKYQSFNEYRKRFMLKPYESFEELTGEKEMSAELEALYGDIDAVELYPA.... The pKd is 5.8. 
(4) The small molecule is CC[C@H](C)[C@H](NC(=O)[C@H](CCCCN)NC(=O)[C@H](Cc1cnc[nH]1)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CS)[C@@H](C)O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(=O)O)C(=O)O. The target protein sequence is IRKDRRGGRMLKHKRQRDDGEGRGEVGSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHAPTSRGGASVEETDQSHLATAGSTSSHSLQKYYITGEAEGFPATV. The pKd is 6.0. (5) The pKd is 6.8. The target protein sequence is NPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWAFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPT. The compound is Cc1sc2c(c1C)C(c1ccc(Cl)cc1)=N[C@@H](CC(=O)OC(C)(C)C)c1nnc(C)n1-2. (6) The compound is CSCC[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CCCNC(=N)N)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)NCC(=O)NCC(=O)N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)NCC(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(=O)O)C(=O)O)C(C)C. The target protein sequence is NPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDAVLMAEALEKLFLQKINELPT. The pKd is 4.8. (7) The compound is COc1ccc(NC(=O)CCC(=O)O)cc1. 
The target protein sequence is APITAYAQQTRGLLGCIITSLTGRDKNQVEGEVQIVSTAAQTFLATCINGVCWTVYHGAGTRTIASSKGPVIQMYTNVDQDLVGWPAPQGARSLTPCTCGSSDLYLVTRHADVIPVRRRGDGRGSLLSPRPISYLKGSSGGPLLCPAGHAVGIFRAAVCTRGVAKAVDFIPVEGLETTMRSPVFSDNSSPPAVPQSYQVAHLHAPTGSGKSTKVPAAYAAQGYKVLVLNPSVAATLGFGAYMSKAHGIDPNIRTGVRTITTGSPITYSTYGKFLADGGCSGGAYDIIICDECHSTDATSILGIGTVLDQAETAGARLTVLATATPPGSVTVPHPNIEEVALSTTGEIPFYGKAIPLEAIKGGRHLIFCHSKKKCDELAAKLVALGVNAVAYYRGLDVSVIPASGDVVVVATDALMTGFTGDFDSVIDCNTCVTQTVDFSLDPTFTIETTTLPQDAVSRTQRRGRTGRGKPGIYRFVTPGERPSGMFDSSVLCECYDAGCA.... The pKd is 4.0. (8) The compound is N[C@@H](CCC(=O)N[C@@H](CS)C(=O)NCC(=O)O)C(=O)O. The pKd is 6.6. The target protein sequence is MKLTTKLSFIAAGLMIFNSAQASNKTFINCVSRAPSSFSPALVMEGISYNASSQQVYNRLIEFKRGSTDIEPALAESWQISDDGLTYTFYLRKGVKFHKTKDYQPSREFNADDVIFSFQRQLDKTHPYHEVSKGTYPYFNAMKFPKLLQAVEKVDDYTVKITLTKPDATFLASLGMDFISIYSAEYADKMMKAGTPEKVDTTPIGTGPFIFAGYQLEQKIRFLANPDYWQPKAEIDRLIFDIVPDAGTRYAKLQSGACDMIDFPNIADLAKMKADPKIQLMSKEGLNIAYIAFNTEKAPFDNVKVRQALNYATDKKAIIDVVYQGAGVMAKNVLPPTIWSYNDDVQDYPFDIEKAKQLLAEAGYPNGFETEIWVQPVVRASNPNPRRMAEVIQNDWAKAGVKAKLVSYEWGDYIKRTKAGELTAGTYGWSGDNGDPDNFLSPLLGTENIGNSNYARWSNAEFDALLSKAISLSNQAERAELYKQAQVIAKEQAPWITVAH.... (9) The small molecule is CCOc1cc2ncc(C#N)c(Nc3ccc(OCc4ccccn4)c(Cl)c3)c2cc1NC(=O)/C=C/CN(C)C. The pKd is 5.0. The target protein (P61075) has sequence MEKYHGLEKIGEGTYGVVYKAQNNYGETFALKKIRLEKEDEGIPSTTIREISILKELKHSNIVKLYDVIHTKKRLVLVFEHLDQDLKKLLDVCEGGLESVTAKSFLLQLLNGIAYCHDRRVLHRDLKPQNLLINREGELKIADFGLARAFGIPVRKYTHEVVTLWYRAPDVLMGSKKYSTTIDIWSVGCIFAEMVNGTPLFPGVSEADQLMRIFRILGTPNSKNWPNVTELPKYDPNFTVYEPLPWESFLKGLDESGIDLLSKMLKLDPNQRITAKQALEHAYFKENN.
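The label in every example is pKd = -log10(Kd in M). A minimal Python sketch of that conversion and its inverse follows; it is illustrative only, not part of the dataset's own tooling. The function names are hypothetical, and `kd_nm_to_pkd` assumes the raw Kd values are in nanomolar (as in BindingDB's tabular downloads). Change the scale factor if your source uses different units.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant Kd (mol/L) to pKd = -log10(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the transform: Kd (mol/L) = 10 ** (-pKd)."""
    return 10.0 ** (-pkd)


def kd_nm_to_pkd(kd_nanomolar: float) -> float:
    """Assumption: raw Kd is in nM, so rescale to M (1 nM = 1e-9 M) first."""
    return kd_to_pkd(kd_nanomolar * 1e-9)


if __name__ == "__main__":
    # A smaller Kd (tighter binding) gives a larger pKd.
    for pkd in (4.0, 5.6, 7.7):
        print(f"pKd {pkd}: Kd = {pkd_to_kd(pkd):.3e} M")
```

For example, the pKd of 5.6 in example (1) corresponds to a Kd of about 2.5e-6 M (roughly 2.5 µM). The 7.7 in example (2) corresponds to roughly 20 nM, which is about a hundredfold tighter binding. A Kd of 10 nM maps to a pKd of 8.0.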