This data is from Drug-target binding data from BindingDB using Kd measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKd (pKd = -log10(Kd in M); higher means stronger binding). Dataset: bindingdb_kd. (1) The small molecule is OCCc1cc(-c2ccccc2OCc2ccc(Br)cc2)no1. The target protein (P34158) has sequence MQKSPLEKASFISKLFFSWTTPILRKGYRHHLELSDIYQAPSSDSADHLSEKLEREWDREQASKKKPQLIHALRRCFVWRFVFYGVLLYLGEVTKAVQPVLLGRIIASYDPDNTEERSIAIYLGIGLCLLFIVRTLLLHPAIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLISLLSNNLNKFDEGLALAHFIWIAPLQVVLLMGLLWDLLQFSAFCGLGLLIVLVIFQAILGKMMVKYRDKRAAKINERLVITSEVIDNIYSVKAYCWESAMEKIIESLREEELKMTRRSAYMRFFTSSAFFFSGFFVVFLSVLPYTVINGIVLRKIFTTISFCIVLRMSVTRQFPTAVQIWYDSLGMIRKIQDFLQTQEYKVLEYNLMFTGLVMENVTAFWEEGFQELLEKVQLNNDDRKTSNGENHLSFSHLCLVGNPVLKNINLNIKKGEMLAITGSTGAGKTSLLMLILGELEASEGIIKHSGRVSFSSQISWIMPG.... The pKd is 4.6. (2) The drug is O=C(O)[C@H](Cc1ccccc1)N1C(=O)/C(=C/c2ccc(Br)cc2)SC1=S. The target protein sequence is MKRKKILIVGAGFSGAVIGRQLAEKGHQVHIIDQRDHIGGNSYDARDSETNVMVHVYGPHIFHTDNESVWNYVNKHAEMMPYVNRVKATVNGQVFSLPINLHTINQFFSKTCSPDEARALIAEKGDSTIADPQTFEEQALRFIGKELYEAFFKGYTIKQWGMQPSELPASILKRLPVRFNYDDNYFNHKFQGMPKCGYTQMIKSILKHENIKVDLQREFIVDERTHYDHVFYSGPLDAFYGYQYGRLGYRTLDFKKFIYQGDYQGCAVMNYCSVDVPYTRITEHKYFSPWEQHDGSVCYKEYSRACEENDIPYYPIRQMGEMALLEKYLSLAENETNITFVGRLGTYRYLDMDVTIAEALKTAEVYLNSLTENQPMPVFTVSVR. The pKd is 4.5. (3) The drug is CC[C@H](C)[C@H](NC(=O)[C@@H]1CCCN1C(=O)[C@H](CO)NC(=O)[C@@H](N)CO)C(=O)N[C@@H](Cc1ccc(OP(=O)(O)O)cc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)O. The target protein sequence is PVSRNAAEYLLSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHSTVADGLITTLHYPAPKRNKPTVYGVSPNYDKWEME. The pKd is 5.5. (4) The compound is CN(C)C[C@@H]1CCn2cc(c3ccccc32)C2=C(C(=O)NC2=O)c2cn(c3ccccc23)CCO1. 
The target protein (Q92772) has sequence MEKYENLGLVGEGSYGMVMKCRNKDTGRIVAIKKFLESDDDKMVKKIAMREIKLLKQLRHENLVNLLEVCKKKKRWYLVFEFVDHTILDDLELFPNGLDYQVVQKYLFQIINGIGFCHSHNIIHRDIKPENILVSQSGVVKLCDFGFARTLAAPGEVYTDYVATRWYRAPELLVGDVKYGKAVDVWAIGCLVTEMFMGEPLFPGDSDIDQLYHIMMCLGNLIPRHQELFNKNPVFAGVRLPEIKEREPLERRYPKLSEVVIDLAKKCLHIDPDKRPFCAELLHHDFFQMDGFAERFSQELQLKVQKDARNVSLSKKSQNRKKEKEKDDSLVEERKTLVVQDTNADPKIKDYKLFKIKGSKIDGEKAEKGNRASNASCLHDSRTSHNKIVPSTSLKDCSNVSVDHTRNPSVAIPPLTHNLSAVAPSINSGMGTETIPIQGYRVDEKTKKCSIPFVKPNRHSPSGIYNINVTTLVSGPPLSDDSGADLPQMEHQH. The pKd is 5.0. (5) The drug is CC(C)C[C@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](Cc1cnc[nH]1)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCOC(C(F)(F)F)(C(F)(F)F)C(F)(F)F)NC(=O)[C@H](CCCCN)NC(=O)[C@H](Cc1cnc[nH]1)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCC(=O)O)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CS)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CCC(=O)O)C(=O)O. The target protein sequence is IRKDRRGGRMLKHKRQRDDGEGRGEVGSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLADRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEGMVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLMAKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHAPTSRGGASVEETDQSHLATAGSTSSHSLQKYYITGEAEGFPATV. The pKd is 5.2. (6) The small molecule is COc1cc2c(N3CCN(C(=O)Nc4ccc(OC(C)C)cc4)CC3)ncnc2cc1OCCCN1CCCCC1. The target protein sequence is MRHSKRTYCPDWDDKDWDYGKWRSSSSHKRRKRSHSSAQENKRCKYNHSKMCDSHYLESRSINEKDYHSRRYIDEYRNDYTQGCEPGHRQRDHESRYQNHSSKSSGRSGRSSYKSKHRIHHSTSHRRSHGKSHRRKRTRSVEDDEEGHLICQSGDVLSARYEIVDTLGEGAFGKVVECIDHKAGGRHVAVKIVKNVDRYCEAARSEIQVLEHLNTTDPNSTFRCVQMLEWFEHHGHICIVFELLGLSTYDFIKENGFLPFRLDHIRKMAYQICKSVNFLHSNKLTHTDLKPENILFVQSDYTEAYNPKIKRDERTLINPDIKVVDFGSATYDDEHHSTLVSTRHYRAPEVILALGWSQPCDVWSIGCILIEYYLGFTVFPTHDSKEHLAMMERILGPLPKHMIQKTRKRKYFHHDRLDWDEHSSAGRYVSRACKPLKEFMLSQDVEHERLFDLIQKMLEYDPAKRITLREALKHPFFDLLKKSI. The pKd is 6.2. 
(7) The drug is COc1cccc(C(=O)NC2C(O)[C@H](O[C@@H]3OC(CO)[C@H](O)C(OCc4ccc5ccccc5c4)C3O)C(CO)O[C@H]2OCCNC(=O)CCCCCN2C(=O)C=CC2=O)c1. The target protein (P47929) has sequence MSNVPHKSSLPEGIRPGTVLRIRGLVPPNASRFHVNLLCGEEQGSDAALHFNPRLDTSEVVFNSKEQGSWGREERGPGVPFQRGQPFEVLIIASDDGFKAVVGDAQYHHFRHRLPLARVRLVEVGGDVQLDSVRIF. The pKd is 3.9. (8) The pKd is 9.1. The compound is Cc1nc(Nc2ncc(C(=O)Nc3c(C)cccc3Cl)s2)cc(N2CCN(CCO)CC2)n1. The target protein sequence is MRGARGAWDFLCVLLLLLRVQTGSSQPSVSPGEPSPPSIHPGKSDLIVRVGDEIRLLCTDPGFVKWTFEILDETNENKQNEWITEKAEATNTGKYTCTNKHGLSNSIYVFVRDPAKLFLVDRSLYGKEDNDTLVRCPLTDPEVTNYSLKGCQGKPLPKDLRFIPDPKAGIMIKSVKRAYHRLCLHCSVDQEGKSVLSEKFILKVRPAFKAVPVVSVSKASYLLREGEEFTVTCTIKDVSSSVYSTWKRENSQTKLQEKYNSWHHGDFNYERQATLTISSARVNDSGVFMCYANNTFGSANVTTTLEVVDKGFINIFPMINTTVFVNDGENVDLIVEYEAFPKPEHQQWIYMNRTFTDKWEDYPKSENESNIRYVSELHLTRLKGTEGGTYTFLVSNSDVNAAIAFNVYVNTKPEILTYDRLVNGMLQCVAAGFPEPTIDWYFCPGTEQRCSASVLPVDVQTLNSSGPPFGKLVVQSSIDSSAFKHNGTVECKAYNDVGKT....
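The pKd values above follow the stated definition pKd = -log10(Kd in M). The following is a minimal sketch of that conversion in Python. The function names (`kd_to_pkd`, `pkd_to_kd`, `nanomolar_to_molar`) are hypothetical helpers introduced for illustration only and are not part of BindingDB or the bindingdb_kd dataset tooling.

```python
import math


def nanomolar_to_molar(kd_nm: float) -> float:
    """Convert a Kd given in nanomolar (nM) to molar (M)."""
    return kd_nm * 1e-9


def kd_to_pkd(kd_molar: float) -> float:
    """Return pKd = -log10(Kd), with Kd expressed in mol/L.

    Higher pKd means a smaller Kd, i.e. stronger binding.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration in mol/L")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the transform: Kd (in mol/L) = 10 ** (-pKd)."""
    return 10.0 ** (-pkd)


if __name__ == "__main__":
    # pKd values taken from examples (7) and (8) above.
    for pkd in (3.9, 9.1):
        print(f"pKd {pkd} corresponds to Kd = {pkd_to_kd(pkd):.3e} M")
```

Under this definition, the pKd of 9.1 in example (8) corresponds to a Kd of roughly 0.8 nM, while the pKd of 3.9 in example (7) corresponds to roughly 126 µM, so each unit of pKd is a tenfold change in Kd.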