This data comes from BindingDB drug-target binding records with Kd measurements. The task is regression: given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The predicted value is pKd (pKd = -log10(Kd in M); higher means stronger binding). Dataset: bindingdb_kd. (1) The small molecule is NCCCC[C@@H]1NC(=O)[C@H](CO)NC(=O)[C@H](CO)NC(=O)[C@@H]2CSSC[C@@H](C(N)=O)NC(=O)[C@@H]3CSSC[C@H](NC(=O)[C@@H](N)CSSC[C@H](NC(=O)[C@H](Cc4c[nH]c5ccccc45)NC1=O)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCCN=C(N)N)C(=O)N3)C(=O)N[C@H](CC(N)=O)C(=O)N2. The target protein (P04774) has sequence MEQTVLVPPGPDSFNFFTRESLAAIERRIAEEKAKNPKPDKKDDDENGPKPNSDLEAGKNLPFIYGDIPPEMVSEPLEDLDPYYINKKTFIVLNKGKAIFRFSATSALYILTPFNPLRKIAIKILVHSLFSMLIMCTILTNCVFMTMSNPPDWTKNVEYTFTGIYTFESLIKIIARGFCLEDFTFLRDPWNWLDFTVITFAYVTEFVDLGNVSALRTFRVLRALKTISVIPGLKTIVGALIQSVKKLSDVMILTVFCLSVFALIGLQLFMGNLRNKCVQWPPTNASLEEHSIEKNVTTDYNGTLVNETVFEFDWKSYIQDSRYHYFLEGVLDALLCGNSSDAGQCPEGYMCVKAGRNPNYGYTSFDTFSWAFLSLFRLMTQDFWENLYQLTLRAAGKTYMIFFVLVIFLGSFYLINLILAVVAMAYEEQNQATLEEAEQKEAEFQQMLEQLKKQQEAAQQAAAATASEHSREPSAAGRLSDSSSEASKLSSKSAKERRNR.... The pKd is 6.5. (2) The small molecule is CO[C@@H]1[C@H](N(C)C(=O)c2ccccc2)C[C@H]2O[C@]1(C)n1c3ccccc3c3c4c(c5c6ccccc6n2c5c31)C(=O)NC4. The target protein (O15146) has sequence MRELVNIPLVHILTLVAFSGTEKLPKAPVITTPLETVDALVEEVATFMCAVESYPQPEISWTRNKILIKLFDTRYSIRENGQLLTILSVEDSDDGIYCCTANNGVGGAVESCGALQVKMKPKITRPPINVKIIEGLKAVLPCTTMGNPKPSVSWIKGDSPLRENSRIAVLESGSLRIHNVQKEDAGQYRCVAKNSLGTAYSKVVKLEVEVFARILRAPESHNVTFGSFVTLHCTATGIPVPTITWIENGNAVSSGSIQESVKDRVIDSRLQLFITKPGLYTCIATNKHGEKFSTAKAAATISIAEWSKPQKDNKGYCAQYRGEVCNAVLAKDALVFLNTSYADPEEAQELLVHTAWNELKVVSPVCRPAAEALLCNHIFQECSPGVVPTPIPICREYCLAVKELFCAKEWLVMEEKTHRGLYRSEMHLLSVPECSKLPSMHWDPTACARLPHLDYNKENLKTFPPMTSSKPSVDIPNLPSSSSSSFSVSPTYSMTVIISI.... The pKd is 5.0. 
(3) The small molecule is Cc1ccc(NC(=O)c2cccc(C(F)(F)F)c2)cc1Nc1ncccc1-c1ncnc(Nc2ccc(OCCNC(=O)CCCCNC(=O)CCC3=[N+]4B(F)n5c(C)cc(C)c5C=C4C=C3)cc2)n1. The target protein sequence is MLEICLKLVGCKSKKGLSSSSSCYLEEALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEMERTDITMKHKLGGGQYGVVYEGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTREPPFYIITEFMTYGNLLDYLRECNRQEVNAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHLVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNKFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYELLEKDYRMERPEGCPEKVYELMRACWQWNPSDRPSFAEIHQAFETMFQES.... The pKd is 7.3. (4) The drug is CSC1OC(CO)C(O)C(n2cc(C(=O)NCc3ccccc3)nn2)C1O. The target protein (P16110) has sequence MADSFSLNDALAGSGNPNPQGYPGAWGNQPGAGGYPGAAYPGAYPGQAPPGAYPGQAPPGAYPGQAPPSAYPGPTAPGAYPGPTAPGAYPGQPAPGAFPGQPGAPGAYPQCSGGYPAAGPYGVPAGPLTVPYDLPLPGGVMPRMLITIMGTVKPNANRIVLDFRRGNDVAFHFNPRFNENNRRVIVCNTKQDNNWGKEERQSAFPFESGKPFKIQVLVEADHFKVAVNDAHLLQYNHRMKNLREISQLGISGDITLTSANHAMI. The pKd is 4.0.
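The pKd definition stated at the top of this record (pKd = -log10(Kd in M)) can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the function names `kd_to_pkd` and `pkd_to_kd` are hypothetical, and the sketch assumes Kd is already expressed in molar units.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant in molar units to pKd."""
    if kd_molar <= 0:
        # log10 is undefined for zero or negative concentrations.
        raise ValueError("Kd must be a positive concentration in M")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the transform: recover Kd in molar units from pKd."""
    return 10.0 ** (-pkd)


if __name__ == "__main__":
    # pKd labels taken from the four examples in this record.
    for pkd in (6.5, 5.0, 7.3, 4.0):
        print(f"pKd {pkd} -> Kd {pkd_to_kd(pkd):.3g} M")
```

Under this definition, the label 6.5 in example (1) corresponds to a Kd of roughly 316 nM, and the label 4.0 in example (4) corresponds to 100 µM, so a one-unit increase in pKd means a tenfold tighter binder.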