Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them as pKd (pKd = -log10(Kd in M); higher values mean stronger binding). Dataset: bindingdb_kd, drug-target binding data from BindingDB restricted to Kd measurements.
(1) The compound is CCCC(=O)O. The target protein (Q9Z429) has sequence MQQFTIRTRLLMLVGAMFIGFITIELMGFSALQRGVASLNTVYLDRVVPLRDLKTIADLYAVKIVDSSHKARSGRMTYAQAEQEVKDAGRQIDMLWHAYQKTKKIDEEQRSVDALAKLVDEAQDPIERLKGILERGDKAALDTFVENEMYPLIDPLSEGLSHLTQIQVEESKRAYDAAVVLYDSSRTMLALLLLGILICGGVFATRLIRSIIHPLTTLKDAAARVALGDLSQSIQVSGRNEVTDVQQSVQAMQANLRNTLQDIQGSAAQLAAAAEELQTATESTAQGIHRQNDEMQMAATAVTEMSAAVDEVADNANRTSNASHEAMDLADGGRKQVMLTRETIDRLSGKLNETTRTVFRLAEEASNIGRVLDVIRAIAEQTKLLALNAAIEAAHAGEAGRGFAVVADEVRNLAQRTQTSTQEIERMISAIQSVTQEGVRDVQQSCEFAARSQTMSSEADQALTLIAERITEINGMNLVIASAAEEQAQVAREVDRNLVA.... The pKd is 4.0. (2) The pKd is 5.7. The target protein (Q9Y6R4) has sequence MREAAAALVPPPAFAVTPAAAMEEPPPPPPPPPPPPEPETESEPECCLAARQEGTLGDSACKSPESDLEDFSDETNTENLYGTSPPSTPRQMKRMSTKHQRNNVGRPASRSNLKEKMNAPNQPPHKDTGKTVENVEEYSYKQEKKIRAALRTTERDRKKNVQCSFMLDSVGGSLPKKSIPDVDLNKPYLSLGCSNAKLPVSVPMPIARPARQTSRTDCPADRLKFFETLRLLLKLTSVSKKKDREQRGQENTSGFWLNRSNELIWLELQAWHAGRTINDQDFFLYTARQAIPDIINEILTFKVDYGSFAFVRDRAGFNGTSVEGQCKATPGTKIVGYSTHHEHLQRQRVSFEQVKRIMELLEYIEALYPSLQALQKDYEKYAAKDFQDRVQALCLWLNITKDLNQKLRIMGTVLGIKNLSDIGWPVFEIPSPRPSKGNEPEYEGDDTEGELKELESSTDESEEEQISDPRVPEIRQPIDNSFDIQSRDCISKKLERLESE.... The compound is COc1cc2c(Oc3ccc(NC(=O)C4(C(=O)NC5=CCC(F)C=C5)CC4)cc3F)ccnc2cc1OCCCN1CCOCC1. (3) The drug is Oc1ccccc1Cl. 
The target protein (P42535) has sequence MSTYPINAPGQSADAAVLIVGGGPTGLIAANELLRRGVSCRMIDRLPVAHQTSKSCTIHARSMEMMEHIGIAARYIETGVRSNGFTFNFENTDANALLDFSVLPGRYPFITIYNQNETERVLRHDLEATYSFQPEWGTQLLALNQDENGIRADLRLKDGTKQTISPRWVIGADGVRSRVRECLGIAYEGEDYEENVLQMMDVGIQDFEAGDDWIHYFIGQDKFVFVTKLPGSNYRVIISDLGGANKSNLEETREAFQGYLSSFDDHATLDEPRWATKWRVWKRMATAYRKGNVFLAGDAAHCHSPSGGSGMNVGMQDAFNLGWKIAMVERGEAKPDLLDTYHTERTPVAQQLLEGTHAMHEIIMGHGKGLTDRIELTQAPGWHDAATYRVSGMSYNYRDQLVSFNDDRLAGPSAGDRIPDAELAPRIRLFDLVRNTRPTLLVAPATEAEVAEAEKLRDLIREQWPLVKPVLVRPQGSEESIEGDVHVDSYGQLKREWGDN.... The pKd is 3.0. (4) The small molecule is CSCC[C@H](N)C(=O)N[C@@H](CO)C(=O)NCC(=O)N[C@@H](CCCNC(=N)N)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)NCC(=O)NCC(=O)N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)NCC(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](Cc1cnc[nH]1)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(=O)O)C(=O)O)C(C)C. The target protein sequence is NPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGADIVLMAEALEKLFLQKINELPT. The pKd is 5.2. (5) The small molecule is C[C@H](C[C@H](O)[C@H]1OC1(C)C)C1=C2C[C@H](O)[C@H]3[C@@]4(C)CCC(=O)C(C)(C)[C@@H]4CC[C@]3(C)[C@@]2(C)CC1. The target protein (P04191) has sequence MEAAHSKSTEECLAYFGVSETTGLTPDQVKRHLEKYGHNELPAEEGKSLWELVIEQFEDLLVRILLLAACISFVLAWFEEGEETITAFVEPFVILLILIANAIVGVWQERNAENAIEALKEYEPEMGKVYRADRKSVQRIKARDIVPGDIVEVAVGDKVPADIRILSIKSTTLRVDQSILTGESVSVIKHTEPVPDPRAVNQDKKNMLFSGTNIAAGKALGIVATTGVSTEIGKIRDQMAATEQDKTPLQQKLDEFGEQLSKVISLICVAVWLINIGHFNDPVHGGSWIRGAIYYFKIAVALAVAAIPEGLPAVITTCLALGTRRMAKKNAIVRSLPSVETLGCTSVICSDKTGTLTTNQMSVCKMFIIDKVDGDFCSLNEFSITGSTYAPEGEVLKNDKPIRSGQFDGLVELATICALCNDSSLDFNETKGVYEKVGEATETALTTLVEKMNVFNTEVRNLSKVERANACNSVIRQLMKKEFTLEFSRDRKSMSVYCSP.... The pKd is 4.6. (6) The drug is CO[C@]12CC[C@@]3(C[C@@H]1C(C)(C)O)[C@H]1Cc4ccc(O)c5c4[C@@]3(CCN1CC1CC1)[C@H]2O5. 
The target protein sequence is MDSPIQIFRGEPGPTCAPSACLPPNSSAWFPGWAEPDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNSWPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVREDVDVIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLLSGSREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSTSHSTAALSSAYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKMRMERQSTSRVRNTVQDPAYLRDIDGMNKPV. The pKd is 9.1. (7) The small molecule is CC(O)[C@H](NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CCCNC(N)=[NH2+])NC(=O)[C@H](CCCCNC(=O)C[C@@H](NC(=O)CCCCCNC(=O)[C@H]1O[C@@H](n2cc(I)c3c(N)ncnc32)[C@H](O)[C@@H]1O)C(=O)[O-])NC(=O)[C@H](CCCNC(N)=[NH2+])NC(=O)[C@H](C)[NH3+])C(=O)N[C@@H](C)C(=O)N[C@H](CCCCNC(=O)c1ccc(-c2c3ccc(=[N+](C)C)cc-3oc3cc(N(C)C)ccc23)c(C(=O)O)c1)C(N)=O. The target protein sequence is KGPVPFSHCLPTEKLQRCEKIGEGVFGEVFQTIADHTPVAIKIIAIEGPDLVNGSHQKTFEEILPEIIISKELSLLSGEVCNRTEGFIGLNSVHCVQGSYPPLLLKAWDHYNSTKGSANDRPDFFKDDQLFIVLEFEFGGIDLEQMRTKLSSLATAKSILHQLTASLAVAEASLRFEHRDLHWGNVLLKKTSLKKLHYTLNGKSSTIPSCGLQVSIIDYTLSRLERDGIVVFCDVSMDEDLFTGDGDYQFDIYRLMKKENNNRWGEYHPYSNVLWLHYLTDKMLKQMTFKTKCNTPAMKQIKRKIQEFHRTMLNFSSATDLLCQHSLFK. The pKd is 9.7. (8) The small molecule is COc1cc(-c2cn(C)c(=O)c3cnccc23)cc(OC)c1CN(C)C. The target protein (Q9BXF3) has sequence MCPEEGGAAGLGELRSWWEVPAIAHFCSLFRTAFRLPDFEIEELEAALHRDDVEFISDLIACLLQGCYQRRDITPQTFHSYLEDIINYRWELEEGKPNPLREASFQDLPLRTRVEILHRLCDYRLDADDVFDLLKGLDADSLRVEPLGEDNSGALYWYFYGTRMYKEDPVQGKSNGELSLSRESEGQKNVSSIPGKTGKRRGRPPKRKKLQEEILLSEKQEENSLASEPQTRHGSQGPGQGTWWLLCQTEEEWRQVTESFRERTSLRERQLYKLLSEDFLPEICNMIAQKGKRPQRTKAELHPRWMSDHLSIKPVKQEETPVLTRIEKQKRKEEEEERQILLAVQKKEQEQMLKEERKRELEEKVKAVEGMCSVRVVWRGACLSTSRPVDRAKRRKLREERAWLLAQGKELPPELSHLDPNSPMREEKKTKDLFELDDDFTAMYKVLDVVKAHKDSWPFLEPVDESYAPNYYQIIKAPMDISSMEKKLNGGLYCTKEEFV.... The pKd is 6.7. (9) The small molecule is Oc1ccc(-c2nc(-c3ccncc3)c(-c3ccc(F)cc3)[nH]2)cc1. 
The target protein sequence is MNKMKNFKRRFSLSVPRTETIEESLAEFTEQFNQLHNRRNENLQLGPLGRDPPQECSTFSPTDSGEEPGQLSPGVQFQRRQNQRRFSMEDVSKRLSLPMDIRLPQEFLQKLQMESPDLPKPLSRMSRRASLSDIGFGKLETYVKLDKLGEGTYATVFKGRSKLTENLVALKEIRLEHEEGAPCTAIREVSLLKNLKHANIVTLHDLIHTDRSLTLVFEYLDSDLKQYLDHCGNLMSMHNVKIFMFQLLRGLAYCHHRKILHRDLKPQNLLINERGELKLADFGLARAKSVPTKTYSNEVVTLWYRPPDVLLGSTEYSTPIDMWGVGCIHYEMATGRPLFPGSTVKEELHLIFRLLGTPTEETWPGVTAFSEFRTYSFPCYLPQPLINHAPRLDTDGIHLLSSLLLYESKSRMSAEAALSHSYFRSLGERVHQLEDTASIFSLKEIQLQKDPGYRGLAFQQPGRGKNRRQSIF. The pKd is 5.0.
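The pKd target defined at the top is a plain log transform of the dissociation constant. A minimal Python sketch of converting between Kd and pKd follows; the helper names `kd_to_pkd` / `pkd_to_kd` and the unit table (including the ASCII key "uM" for micromolar) are illustrative assumptions, not part of the dataset.

```python
import math

# Assumed unit table: multiplier that converts a Kd in the given unit to molar (M).
_UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def kd_to_pkd(kd: float, unit: str = "M") -> float:
    """Return pKd = -log10(Kd in M); a higher pKd means tighter binding."""
    if kd <= 0:
        raise ValueError("Kd must be positive")
    return -math.log10(kd * _UNIT_TO_MOLAR[unit])


def pkd_to_kd(pkd: float, unit: str = "M") -> float:
    """Invert kd_to_pkd: Kd = 10**(-pKd) M, reported in the requested unit."""
    return 10.0 ** (-pkd) / _UNIT_TO_MOLAR[unit]


if __name__ == "__main__":
    print(kd_to_pkd(100, "uM"))   # a 100 uM binder
    print(pkd_to_kd(9.7, "nM"))   # Kd in nM for a pKd of 9.7
```

For orientation: the pKd of 4.0 in example (1) corresponds to a Kd of 100 µM, while the 9.7 in example (7) corresponds to roughly 0.2 nM. Each unit of pKd is a tenfold change in Kd, which is why regression on the log scale is used rather than on raw Kd values spanning many orders of magnitude.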