Dataset: Drug-target binding data from BindingDB using IC50 measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The small molecule is Cc1nnn2c1-c1ccc(Cn3cccn3)cc1N(c1ccc(Cl)cc1)CC2. The target protein sequence is WKHQFAWPFRQPVDAVKLGLPDYHKIIKQPMDMGTIKRRLENNYYWAASECMQDFNTMFTNCYIYNKPTDDIV. The pIC50 is 6.3. (2) The small molecule is Oc1cc(O)cc(-c2cc3ccc(O)cc3o2)c1. The target protein (Q08499) has sequence MEAEGSSAPARAGSGEGSDSAGGATLKAPKHLWRHEQHHQYPLRQPQFRLLHPHHHLPPPPPPSPQPQPQCPLQPPPPPPLPPPPPPPGAARGRYASSGATGRVRHRGYSDTERYLYCRAMDRTSYAVETGHRPGLKKSRMSWPSSFQGLRRFDVDNGTSAGRSPLDPMTSPGSGLILQANFVHSQRRESFLYRSDSDYDLSPKSMSRNSSIASDIHGDDLIVTPFAQVLASLRTVRNNFAALTNLQDRAPSKRSPMCNQPSINKATITEEAYQKLASETLEELDWCLDQLETLQTRHSVSEMASNKFKRMLNRELTHLSEMSRSGNQVSEFISNTFLDKQHEVEIPSPTQKEKEKKKRPMSQISGVKKLMHSSSLTNSSIPRFGVKTEQEDVLAKELEDVNKWGLHVFRIAELSGNRPLTVIMHTIFQERDLLKTFKIPVDTLITYLMTLEDHYHADVAYHNNIHAADVVQSTHVLLSTPALEAVFTDLEILAAIFASA.... The pIC50 is 5.5. (3) The drug is CN[C@@H]1C[C@H]2O[C@@](C)([C@@H]1OC)n1c3ccccc3c3c4c(c5c6ccccc6n2c5c31)C(=O)NC4. The target protein sequence is MSATIEREFEELDAQCRWQPLYLEIRNESHDYPHRVAKFPENRNRNRYRDVSPYDHSRVKLQSAENDYINASLVDIEEAQRSYILTQGPLPNTCCHFWLMVWQQKTRAVVMLNRTVEKESVKCAQYWPTDDREMVFKETGFSVKLLSEDVKSYYTVHLLQLENINSGETRTISHFHYTTWPDFGVPESPASFLNFLFKVRESGSLNPDHGPAVIHCSAGIGRSGTFSLVDTCLVLMEKGEDVNVKQILLSMRKYRMGLIQTPDQLRFSYMAIIEGAKYTKGDSNIQKRWKELSKEDLSPVCRHSQNRTMTEKYNGKRIGSEDEKLTGLSSKVPDTVEESSESILRKRIREDRKATTAQKVQQMRQRLNETERKRKRWLYWQPILTKMGFVSVILVGALVGWTLLFQLNVLPRLTDT. The pIC50 is 6.4. (4) The small molecule is Nc1ncnc2c1nc(NCc1ccc(-c3ccccc3)c(OCCCCO)c1)n2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O. 
The target protein (Q62773) has sequence MAKSEGRKSASQDTSENGMENPGLELMEVGNLEQGKTLEEVTQGHSLKDGLGHSSLWRRILQPFTKARSFYQRHAGLFKKILLGLLCLAYAAYLLAACILNFRRALALFVITCLVIFILACHFLKKFFAKKSIRCLKPLKNTRLRLWLKRVFMGAAVVGLILWLALDTAQRPEQLISFAGICMFILILFACSKHHSAVSWRTVFWGLGLQFVFGILVIRTEPGFNAFQWLGDQIQIFLAYTVEGSSFVFGDTLVQSVFAFQSLPIIIFFGCVMSILYYLGLVQWVIQKIAWFLQITMGTTAAETLAVAGNIFVGMTEAPLLIRPYLADMTLSEIHAVMTGGFATIAGTVLGAFISFGIDASSLISASVMAAPCALALSKLVYPEVEESKFKSKEGVKLPRGEERNILEAASNGATDAIALVANVAANLIAFLAVLAFINSTLSWLGEMVDIHGLTFQVICSYVLRPMVFMMGVQWADCPLVAEIVGVKFFINEFVAYQQL.... The pIC50 is 4.4. (5) The drug is O=C(NCCc1ccccc1)Nc1ccc2[nH]ncc2c1. The target protein sequence is MSTGDSFETRFEKMDNLLRDPKSEVNSDCLLDGLDALVYDLDFPALRKNKNIDNFLSRYKDTINKIRDLRMKAEDYEVVKVIGRGAFGEVQLVRHKSTRKVYAMKLLSKFEMIKRSDSAFFWEERDIMAFANSPWVVQLFYAFQDDRYLYMVMEYMPGGDLVNLMSNYDVPEKWARFYTAEVVLALDAIHSMGFIHRDVKPDNMLLDKSGHLKLADFGTCMKMNKEGMVRCDTAVGTPDYISPEVLKSQGGDGYYGRECDWWSVGVFLYEMLVGDTPFYADSLVGTYSKIMNHKNSLTFPDDNDISKEAKNLICAFLTDREVRLGRNGVEEIKRHLFFKNDQWAWETLRDTVAPVVPDLSSDIDTSNFDDLEEDKGEEETFPIPKAFVGNQLPFVGFTYYSNRRYLSSANPNDNRTSSNADKSLQESLQKTIYKLEEQLHNEMQLKDEMEQKCRTSNIKLDKIMKELDEEGNQRRNLESTVSQIEKEKMLLQHRINEYQR.... The pIC50 is 6.2. (6) The small molecule is O=C(NS(=O)(=O)c1ccccc1)c1ccc(-c2ccccn2)nc1. The target protein (Q9NXG6) has sequence MAAAAVTGQRPETAAAEEASRPQWAPPDHCQAQAAAGLGDGEDAPVRPLCKPRGICSRAYFLVLMVFVHLYLGNVLALLLFVHYSNGDESSDPGPQHRAQGPGPEPTLGPLTRLEGIKVGHERKVQLVTDRDHFIRTLSLKPLLFEIPGFLTDEECRLIIHLAQMKGLQRSQILPTEEYEEAMSTMQVSQLDLFRLLDQNRDGHLQLREVLAQTRLGNGWWMTPESIQEMYAAIKADPDGDGVLSLQEFSNMDLRDFHKYMRSHKAESSELVRNSHHTWLYQGEGAHHIMRAIRQRVLRLTRLSPEIVELSEPLQVVRYGEGGHYHAHVDSGPVYPETICSHTKLVANESVPFETSCRYMTVLFYLNNVTGGGETVFPVADNRTYDEMSLIQDDVDLRDTRRHCDKGNLRVKPQQGTAVFWYNYLPDGQGWVGDVDDYSLHGGCLVTRGTKWIANNWINVDPSRARQALFQQEMARLAREGGTDSQPEWALDRAYRDARV.... The pIC50 is 4.5. (7) The compound is C[C@]12CC[C@@H](O)[C@@](C)(CO)[C@@H]1CC[C@H]1C[C@@H]3C[C@@]12CC[C@]3(O)CO. 
The target protein (P04292) has sequence MFSGGGGPLSPGGKSAARAASGFFAPAGPRGAGRGPPPCLRQNFYNPYLAPVGTQQKPTGPTQRHTYYSECDEFRFIAPRVLDEDAPPEKRAGVHDGHLKRAPKVYCGGDERDVLRVGSGGFWPRRSRLWGGVDHAPAGFNPTVTVFHVYDILENVEHAYGMRAAQFHARFMDAITPTGTVITLLGLTPEGHRVAVHVYGTRQYFYMNKEEVDRHLQCRAPRDLCERMAAALRESPGASFRGISADHFEAEVVERTDVYYYETRPALFYRVYVRSGRVLSYLCDNFCPAIKKYEGGVDATTRFILDNPGFVTFGWYRLKPGRNNTLAQPRAPMAFGTSSDVEFNCTADNLAIEGGMSDLPAYKLMCFDIECKAGGEDELAFPVAGHPEDLVIQISCLLYDLSTTALEHVLLFSLGSCDLPESHLNELAARGLPTPVVLEFDSEFEMLLAFMTLVKQYGPEFVTGYNIINFDWPFLLAKLTDIYKVPLDGYGRMNGRGVFR.... The pIC50 is 6.4. (8) The drug is CC(C)C(O)(c1ccc2c3c(ccc2c1)C(=O)NC3)c1cnc[nH]1. The target protein sequence is MALRVTADVWARPWQCLHRTRALGSTATQAPKTLKPFEAIPQYSRNKWLKMIQILREQGQENLHLEMHQAFQELGPIFRHSAGGAQIVSVMLPEDAEKLHQVESILPRRMTLESWVAHRELRGLRRGVFLLNGADWRFNRLQLNPNMLSPKAVQSFVPFVDVVARDFVENLKKRMLENVHGSMSMDIQSNVFNYTMEASHFVISGERLGLTGHDLNPESLKFIHALHSMFKSTTQLMFLPKNLTRWTSTQVWKGHFESWDIISEYVTNVSRNVYRELAEGRQQSWSVISEMVAQSTLSMDAIHANSMELIAGSVDTTAISLVMTLFELARNPDVQQALRQESLAAEASIAANPQKAMSDLPLLRAALKETLRLYPIGSSLERIVDSDLVLQNYHVPAGTLVIIYLYSMGRNPAVFPRPERYMPQRWLERKRSFQHLAFGFGVRQCLGRRLAEVEVLLLLHHMLKIFQVETLRQEDVQMAYRFVLMPNPRLVLTIRPVS. The pIC50 is 6.7. (9) The compound is CC(=O)Nc1ccc2c(c1)C(=O)c1cc(CC(=O)O)ccc1SC2. The target protein (P15038) has sequence MELKATTLGKRLAQHPYDRAVILNAGIKVSGDRHEYLIPFNQLLAIHCKRGLVWGELEFVLPDEKVVRLHGTEWGETQRFYHHLDAHWRRWSGEMSEIASGVLRQQLDLIATRTGENKWLTREQTSGVQQQIRQALSALPLPVNRLEEFDNCREAWRKCQAWLKDIESARLQHNQAYTEAMLTEYADFFRQVESSPLNPAQARAVVNGEHSLLVLAGAGSGKTSVLVARAGWLLARGEASPEQILLLAFGRKAAEEMDERIRERLHTEDITARTFHALALHIIQQGSKKVPIVSKLENDTAARHELFIAEWRKQCSEKKAQAKGWRQWLTEEMQWSVPEGNFWDDEKLQRRLASRLDRWVSLMRMHGGAQAEMIASAPEEIRDLFSKRIKLMAPLLKAWKGALKAENAVDFSGLIHQAIVILEKGRFISPWKHILVDEFQDISPQRAALLAALRKQNSQTTLFAVGDDWQAIYRFSGAQMSLTTAFHENFGEGERCDLDT.... The pIC50 is 5.8. (10) The small molecule is C[C@H]1CN(C)[C@H](C)C[C@@]1(O)CCc1cccc2ccccc12. 
The target protein (P58406) has sequence MERAPPDGLMNASGALAGEAAAAGGARGFSAAWTAVLAALMALLIVATVLGNALVMLAFVADSSLRTQNNFFLLNLAISDFLVGAFCIPLYVPYVLTGRWTFGRGLCKLWLVVDYLLCASSVFNIVLISYDRFLSVTRAVSYRAQQGDTRRAVRKMALVWVLAFLLYGPAILSWEYLSGGSSIPEGHCYAEFFYNWYFLITASTLEFFTPFLSVTFFNLSIYLNIQRRTRLRLDGGREAGPEPPPDAQPSPPPAPPSCWGCWPKGHGEAMPLHRYGVGEAGPGVETGEAGLGGGSGGGAAASPTSSSGSSSRGTERPRSLKRGSKPSASSASLEKRMKMVSQSITQRFRLSRDKKVAKSLAIIVSIFGLCWAPYTLLMIIRAACHGHCVPDYWYETSFWLLWANSAVNPVLYPLCHYSFRRAFTKLLCPQKLKVQPHGSLEQCWK. The pIC50 is 7.1.
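
The pIC50 definition used above (pIC50 = -log10(IC50 in M)) can be sketched in Python as follows. This is a minimal illustration only: the function names `pic50_from_ic50_molar` and `ic50_nm_from_pic50` are hypothetical and are not part of BindingDB or of the bindingdb_ic50 dataset tooling.

```python
import math


def pic50_from_ic50_molar(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        # log10 is undefined for zero or negative concentrations.
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Invert the definition and report IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10 ** (9 - pic50)


if __name__ == "__main__":
    # An IC50 of 1 micromolar (1e-6 M) corresponds to a pIC50 of about 6.
    print(pic50_from_ic50_molar(1e-6))
    # Higher pIC50 means a lower IC50, i.e. a more potent compound.
    print(ic50_nm_from_pic50(7.1))
```

Under this definition, the pIC50 of 7.1 reported for example (10) corresponds to an IC50 of roughly 79 nM, and each one-unit increase in pIC50 is a tenfold decrease in IC50.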