Dataset: Drug-target binding data from BindingDB using Kd measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKd (pKd = -log10(Kd in M); higher means stronger binding). Dataset: bindingdb_kd. (1) The small molecule is COc1cc([C@@H]2c3cc4c(cc3[C@H](O)[C@H]3COC(=O)[C@H]23)OCO4)cc(OC)c1OC. The target protein (P03118) has sequence MENLSSRLDLLQEQLMNLYEQDSKLIEDQIKQWNLIRQEQVLFHFARKNGVMRIGLQAVPSLASSQEKAKTAIEMVLHLESLKDSPYGTEDWSLQDTSRELFLAPPAGTFKKSGSTLEVTYDNNPDNQTRHTIWNHVYYQNGDDVWRKVSSGVDAVGVYYLEHDGYKNYYVLFAEEASKYSTTGQYAVNYRGKRFTNVMSSTSSPRAAGAPAVHSDYPTLSESDTAQQSTSIDYTELPGQGETSQVRQRQQKTPVRRRPYGRRRSRSPRGGGRREGESTPSRTPGSVPSARDVGSIHTTPQKGHSSRLRRLLQEAWDPPVVCVKGGANQLKCLRYRLKASTQVDFDSISTTWHWTDRKNTERIGSARMLVKFIDEAQREKFLERVALPRSVSVFLGQFNGS. The pKd is 6.0. (2) The small molecule is CC(C)(C)c1cc(NC(=O)Nc2ccc(-c3cn4c(n3)sc3cc(OCCN5CCOCC5)ccc34)cc2)no1. The target protein (Q8NG66) has sequence MLKFQEAAKCVSGSTAISTYPKTLIARRYVLQQKLGSGSFGTVYLVSDKKAKRGEELKVLKEISVGELNPNETVQANLEAQLLSKLDHPAIVKFHASFVEQDNFCIITEYCEGRDLDDKIQEYKQAGKIFPENQIIEWFIQLLLGVDYMHERRILHRDLKSKNVFLKNNLLKIGDFGVSRLLMGSCDLATTLTGTPHYMSPEALKHQGYDTKSDIWSLACILYEMCCMNHAFAGSNFLSIVLKIVEGDTPSLPERYPKELNAIMESMLNKNPSLRPSAIEILKIPYLDEQLQNLMCRYSEMTLEDKNLDCQKEAAHIINAMQKRIHLQTLRALSEVQKMTPRERMRLRKLQAADEKARKLKKIVEEKYEENSKRMQELRSRNFQQLSVDVLHEKTHLKGMEEKEEQPEGRLSCSPQDEDEERWQGREEESDEPTLENLPESQPIPSMDLHELESIVEDATSDLGYHEIPEDPLVAEEYYADAFDSYCEESDEEEEEIALE.... The pKd is 5.0. (3) The drug is O=S(=O)([O-])OC[C@H]1O[C@H](O[C@@H]2[C@H](OS(=O)(=O)[O-])[C@@H](O[C@@H]3[C@@H](OCCCCCCCCn4cc(-c5ccccc5)nn4)O[C@H](COS(=O)(=O)[O-])[C@@H](OS(=O)(=O)[O-])[C@@H]3OS(=O)(=O)[O-])O[C@H](COS(=O)(=O)[O-])[C@H]2OS(=O)(=O)[O-])[C@@H](OS(=O)(=O)[O-])[C@@H](O[C@H]2O[C@H](COS(=O)(=O)[O-])[C@@H](OS(=O)(=O)[O-])[C@H](OS(=O)(=O)[O-])[C@@H]2OS(=O)(=O)[O-])[C@@H]1OS(=O)(=O)[O-]. 
The target protein (Q9Y251) has sequence MLLRSKPALPPPLMLLLLGPLGPLSPGALPRPAQAQDVVDLDFFTQEPLHLVSPSFLSVTIDANLATDPRFLILLGSPKLRTLARGLSPAYLRFGGTKTDFLIFDPKKESTFEERSYWQSQVNQDICKYGSIPPDVEEKLRLEWPYQEQLLLREHYQKKFKNSTYSRSSVDVLYTFANCSGLDLIFGLNALLRTADLQWNSSNAQLLLDYCSSKGYNISWELGNEPNSFLKKADIFINGSQLGEDFIQLHKLLRKSTFKNAKLYGPDVGQPRRKTAKMLKSFLKAGGEVIDSVTWHHYYLNGRTATKEDFLNPDVLDIFISSVQKVFQVVESTRPGKKVWLGETSSAYGGGAPLLSDTFAAGFMWLDKLGLSARMGIEVVMRQVFFGAGNYHLVDENFDPLPDYWLSLLFKKLVGTKVLMASVQGSKRRKLRVYLHCTNTDNPRYKEGDLTLYAINLHNVTKYLRLPYPFSNKQVDKYLLRPLGPHGLLSKSVQLNGLTL.... The pKd is 6.5. (4) The compound is O=c1ncn2nc(Sc3ccc(F)cc3F)ccc2c1-c1c(Cl)cccc1Cl. The pKd is 5.0. The target protein (Q8TD19) has sequence MSVLGEYERHCDSINSDFGSESGGCGDSSPGPSASQGPRAGGGAAEQEELHYIPIRVLGRGAFGEATLYRRTEDDSLVVWKEVDLTRLSEKERRDALNEIVILALLQHDNIIAYYNHFMDNTTLLIELEYCNGGNLYDKILRQKDKLFEEEMVVWYLFQIVSAVSCIHKAGILHRDIKTLNIFLTKANLIKLGDYGLAKKLNSEYSMAETLVGTPYYMSPELCQGVKYNFKSDIWAVGCVIFELLTLKRTFDATNPLNLCVKIVQGIRAMEVDSSQYSLELIQMVHSCLDQDPEQRPTADELLDRPLLRKRRREMEEKVTLLNAPTKRPRSSTVTEAPIAVVTSRTSEVYVWGGGKSTPQKLDVIKSGCSARQVCAGNTHFAVVTVEKELYTWVNMQGGTKLHGQLGHGDKASYRQPKHVEKLQGKAIRQVSCGDDFTVCVTDEGQLYAFGSDYYGCMGVDKVAGPEVLEPMQLNFFLSNPVEQVSCGDNHVVVLTRNKE.... (5) The compound is CC1=NN(C(=O)c2ccc(Cl)cc2)C(=O)C1/N=N/c1ccc(S(=O)(=O)Nc2ncccn2)cc1. The target protein sequence is SNIEQYIHDLDSNSFELDLQFSEDEKRLLLEKQAGGNPWHQFVENNLILKMGPVDKRKGLFARRRQLLLTEGPHLYYVDPVNKVLKGEIPWSQELRPEAKNFKTFFVHTPNRTYYLMDPSGNAHKWCRKIQEVWRQRYQSHPDAAVQ. The pKd is 5.8. (6) The compound is CO[C@]12CC[C@@]3(C[C@@H]1C(C)(C)O)[C@H]1Cc4ccc(O)c5c4[C@@]3(CCN1CC1CC1)[C@H]2O5. The target protein sequence is MDSPIQIFRGEPGPTCAPSACLPPNSSAWFPGWAEPDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNSWPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVREDVDVIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLLSGSREKDRNLRRITRLVLVVVAVFVVCWTPIAIFILVEALGSTSHSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKMRMERQSTSRVRNTVQDPAYLRDIDGMNKPV. 
The pKd is 8.2. (7) The drug is N#Cc1cnc2cc(OC[C@H](O)CO)c(NC(=O)C[C@H]3CCSS3)cc2c1Nc1ccc(OCc2cccc(F)c2)c(Cl)c1. The target protein sequence is MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVLGNLEITYVQRNYDLSFLKTIQEVAGYVLIALNTVERIPLENLQIIRGNMYYENSYALAVLSNYDANKTGLKELPMRNLQEILHGAVRFSNNPALCNVESIQWRDIVSSDFLSNMSMDFQNHLGSCQKCDPSCPNGSCWGAGEENCQKLTKIICAQQCSGRCRGKSPSDCCHNQCAAGCTGPRESDCLVCRKFRDEATCKDTCPPLMLYNPTTYQMDVNPEGKYSFGATCVKKCPRNYVVTDHGSCVRACGADSYEMEEDGVRKCKKCEGPCRKVCNGIGIGEFKDSLSINATNIKHFKNCTSISGDLHILPVAFRGDSFTHTPPLDPQELDILKTVKEITGFLLIQAWPENRTDLHAFENLEIIRGRTKQHGQFSLAVVSLNITSLGLRSLKEISDGDVIISGNKNLCYANTINWKKLFGTSGQKTKIISNRGENSCK.... The pKd is 7.3. (8) The compound is Cc1sc2c(c1C)C(c1ccc(Cl)cc1)=N[C@@H](CC(=O)OC(C)(C)C)c1nnc(C)n1-2. The target protein sequence is NPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYAKPGDDIVLMAEALEKLFLQKINELPT. The pKd is 8.1. (9) The drug is CC(C)c1ccc(C2=CCC(C)(C)c3ccc(C#Cc4ccc(C(=O)O)cc4)cc32)cc1. The target protein (P10826) has sequence MTTSGHACPVPAVNGHMTHYPATPYPLLFPPVIGGLSLPPLHGLHGHPPPSGCSTPSPATIETQSTSSEELVPSPPSPLPPPRVYKPCFVCQDKSSGYHYGVSACEGCKGFFRRSIQKNMIYTCHRDKNCVINKVTRNRCQYCRLQKCFEVGMSKESVRNDRNKKKKETSKQECTESYEMTAELDDLTEKIRKAHQETFPSLCQLGKYTTNSSADHRVRLDLGLWDKFSELATKCIIKIVEFAKRLPGFTGLTIADQITLLKAACLDILILRICTRYTPEQDTMTFSDGLTLNRTQMHNAGFGPLTDLVFTFANQLLPLEMDDTETGLLSAICLICGDRQDLEEPTKVDKLQEPLLEALKIYIRKRRPSKPHMFPKILMKITDLRSISAKGAERVITLKMEIPGSMPPLIQEMLENSEGHEPLTPSSSGNTAEHSPSISPSSVENSGVSQSPLVQ. The pKd is 8.0. (10) The compound is CC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H]2O[B-]3(OCc4cc(F)ccc43)OC21. 
The target protein sequence is MSGPVTFEKTFRRDALIDIEKKYQKVWAEEKVFEVDAPTFEECPIEDVEQVQEAHPKFFATMAYPYMNGVLHAGHAFTLSKVEFATGFQRMNGKRALFPLGFHCTGMPIKAAADKIKREVELFGSDFSKAPIDDEDAVESQQPAKTETKREDVTKFSSKKSKAAAKQGRAKFQYEIMMQLGIPREEVAKFANTDYWLEFFPPLCQKDVTAFGARVDWRRSMITTDANPYYDAFVRWQINRLRDVGKIKFGERYTIYSEKDGQACLDHDRQSGEGVGPQEYVGIKIRLTDVAPQAQELFKKENLDVKENKVYLVAATLRPETMYGQTCCFVSPKIDYGVFDAGNGDYFITTERAFKNMSFQNLTPKRGYYKPLFIINGKTLIGSRIDAPYAVNKNLRVLPMETVLATKGTGVVTCVPSDSPDDFVTTRDLANKPEYYGIEKDWVQTDIVPIVHTEKYGDKCAEFLVNDLKIQSPKDSVQLANAKELAYKEGFYNGTMLIGK.... The pKd is 6.2.
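The task description defines pKd = -log10(Kd in M). The following is a minimal sketch of that conversion, not part of the dataset itself. The function names are hypothetical, and the nanomolar helper assumes the raw Kd values are supplied in nM, which should be checked against the actual source units.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Convert a dissociation constant in mol/L to pKd = -log10(Kd)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the transform: return Kd in mol/L for a given pKd."""
    return 10.0 ** (-pkd)


def kd_nm_to_pkd(kd_nanomolar: float) -> float:
    """Hypothetical helper: assumes the input Kd is given in nM."""
    return kd_to_pkd(kd_nanomolar * 1e-9)


if __name__ == "__main__":
    # A Kd of 1 micromolar (1e-6 M) corresponds to a pKd of 6.
    print(round(kd_to_pkd(1e-6), 3))
    # Higher pKd means a smaller Kd, i.e. stronger binding.
    print(pkd_to_kd(8.0) < pkd_to_kd(6.0))
```

Because the scale is logarithmic, each unit increase in pKd corresponds to a tenfold decrease in Kd.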