Dataset: Drug-target binding data from BindingDB using Ki measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKi (pKi = -log10(Ki in M); higher means stronger inhibition). Dataset: bindingdb_ki. (1) The small molecule is Cc1c(CC(=O)c2ccccc2)oc(=O)c(C)c1O. The pKi is 3.2. The target protein (Q99895) has sequence MLGITVLAALLACASSCGVPSFPPNLSARVVGGEDARPHSWPWQISLQYLKNDTWRHTCGGTLIASNFVLTAAHCISNTRTYRVAVGKNNLEVEDEEGSLFVGVDTIHVHKRWNALLLRNDIALIKLAEHVELSDTIQVACLPEKDSLLPKDYPCYVTGWGRLWTNGPIADKLQQGLQPVVDHATCSRIDWWGFRVKKTMVCAGGDGVISACNGDSGGPLNCQLENGSWEVFGIVSFGSRRGCNTRKKPVVYTRVSAYIDWINEKMQL. (2) The compound is Cc1ccc(C(=O)Nc2ccc(S(=O)(=O)O)c3cc(S(=O)(=O)O)cc(S(=O)(=O)O)c23)cc1NC(=O)c1cccc(NC(=O)Nc2cccc(C(=O)Nc3cc(C(=O)Nc4ccc(S(=O)(=O)O)c5cc(S(=O)(=O)O)cc(S(=O)(=O)O)c45)ccc3C)c2)c1. The target protein (O35795) has sequence MAGKLVSLVPPLLLAAAGLTGLLLLCVPTQDVREPPALKYGIVLDAGSSHTSMFVYKWPADKENDTGIVGQHSSCDVQGGGISSYANDPSKAGQSLVRCLEQALRDVPRDRHASTPLYLGATAGMRPFNLTSPEATARVLEAVTQTLTQYPFDFRGARILSGQDEGVFGWVTANYLLENFIKYGWVGRWIRPRKGTLGAMDLGGASTQITFETTSPSEDPGNEVHLRLYGQHYRVYTHSFLCYGRDQILLRLLASALQIHRFHPCWPKGYSTQVLLQEVYQSPCTMGQRPRAFNGSAIVSLSGTSNATLCRDLVSRLFNISSCPFSQCSFNGVFQPPVAGNFIAFSAFYYTVDFLTTVMGLPVGTLKQLEEATEITCNQTWTELQARVPGQKTRLADYCAVAMFIHQLLSRGYHFDERSFREVVFQKKAADTAVGWALGYMLNLTNLIPADLPGLRKGTHFSSWVALLLLFTVLILAALVLLLRQVRSAKSPGAL. The pKi is 4.2. (3) The drug is OC[C@H]1O[C@@H](n2cnc3c2NC=NC[C@H]3O)C[C@@H]1O. The target protein sequence is MTASRIDTETLRRLPKAVLHDHLDGGLRPATVVELAAAVGHTLPTTDPDELAAWYVEAANSGDLVRYIATFEHTLAVMQTREGLLRTAEEYVLDLAADGVVYAEVRYAPELMLKGGLTLTEVVEAVQEGLAAGMAKAAAAGTPVRVGTLLCGMRMFDRVREAAGLAVAYRDAGVVGFDIAGAEDGFPPADHLDAFAYLRAESMPFTIHAGEAYGLPSIHQALQVCGAQRIGHGVRLTEDIVDGKLGRLASWVRDRRIALEMCPTSNLQTGCATSIAEHPITALKDLGFRVTLNTDNRLVSGTTMTREMSLLVEQAGWTVEDLRTVTVNALKSAFVPFDERTALIEDVVLPGYAAAL. The pKi is 8.7. 
(4) The drug is CC[C@H](C)[C@H](NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CCCNC(=N)N)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](Cc1ccccc1)NC(=O)CNC(=O)CNC(=O)[C@@H](N)Cc1ccc(O)cc1)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)O. The target protein sequence is MDSPIQIFRGEPGPTCAPSACLPPNSSAWFPGWAEPDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNSWPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVKALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVREDVDVIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLLSGSREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSTSHSTAALSSYAFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKMRMERQSTSRVRNTVQDPAYLRDIDGMNKPV. The pKi is 7.3. (5) The small molecule is N[C@@]1(C(=O)O)CC[C@H](C(=O)O)C1. The target protein sequence is MVLLLILSVLLLKEDVRGSAQSSERRVVAHMPGDIIIGALFSVHHQPTVDKVHERKCGAVREQYGIQRVEAMLHTLERINSDPTLLPNITLGCEIRDSCWHSAVALEQSIEFIRDSLISAEEEEGLVRCVDGSSSFRSKKPIVGVIGPGSSSVAIQVQNLLQLFNIPQIAYSATSMDLSDKTLFKYFMRVVPSDAQQARAMVDIVKRYNWTYVSAVHTEGNYGESGMEAFKDMSAKEGICIAHSYKIYSNAGEQSFDKLLKKLTSHLPKARVVACFCEGMTVRGLLMAMRRLGLAGEFLLLGSDGWADRYDVTDGYQREAVGGITIKLQSPDVKWFDDYYLKLRPETNLRNPWFQEFWQHRFQCRLEGFAQENSKYNKTCNSSLTLRTHHVQDSKMGFVINAIYSMAYGLHNMQMSLCPGYAGLCDAMKPIDGRKLLDSLMKTNFTGVSGDMILFDENGDSPGRYEIMNFKEMRKDYFDYINVGSWDNGELKMDDDEVWS.... The pKi is 5.3. (6) The compound is CNC(=O)[C@H]1O[C@@H](n2cnc3c(NCc4cccc(I)c4)nc(SC)nc32)[C@H](O)[C@@H]1O. The target protein (O54698) has sequence MTTSHQPQDRYKAVWLIFFVLGLGTLLPWNFFITATQYFTSRLNTSQNISLVTNQSCESTEALADPSVSLPARSSLSAIFNNVMTLCAMLPLLIFTCLNSFLHQKVSQSLRILGSLLAILLVFLVTATLVKVQMDALSFFIITMIKIVLINSFGAILQASLFGLAGVLPANYTAPIMSGQGLAGFFTSVAMICAVASGSKLSESAFGYFITACAVVILAILCYLALPWMEFYRHYLQLNLAGPAEQETKLDLISEGEEPRGGREESGVPGPNSLPANRNQSIKAILKSIWVLALSVCFIFTVTIGLFPAVTAEVESSIAGTSPWKNCYFIPVACFLNFNVFDWLGRSLTAICMWPGQDSRWLPVLVACRVVFIPLLMLCNVKQHHYLPSLFKHDVWFITFMAAFAFSNGYLASLCMCFGPKKVKPAEAETAGNIMSFFLCLGLALGAVLSFLLRALV. The pKi is 4.3. 
(7) The compound is O=[N+]([O-])c1ccc2ccn([C@H]3C[C@H](O)[C@@H](COP(=O)(O)OP(=O)(O)OP(=O)(O)O)O3)c2c1. The pKi is 5.0. The target protein sequence is MITVNEKEHILEQKYRPSTIDECILPAFDKETFKSITSKGKIPHIILHSPSPGTGKTTVAKALCHDVNADMMFVNGSDCKIDFVRGPLTNFASAASFDGRQKVIVIDEFDRSGLAESQRHLRSFMEAYSSNCSIIITANNIDGIIKPLQSRCRVITFGQPTDEDKIEMMKQMIRKLTEICKHEGIAIADMKVVAALVKKNFPDFRKTIGELDSYSSKGVLDAGILSLVTNDRGAIDDVLESLKNKDVKQLRALAPKYAADYSWFVGKLAEEIYSRVTPQSIIRMYEIVGENNQYHGIAANTELHLAYLFIQLACEMQWKMSLFKDDIQLNEHQVAWYSKDWTAVQSAADSFKEKAENEFFEIIGAINNKTKCSIAQKDYSKFMVENALSQFPECMPAVYAMNLIGSGLSDEAHFNYLMAAVPRGKRYGKWAKLVEDSTEVLIIKLLAKRYQVNTNDAINYKSILTKNGKLPLVLKELKGLVTDDFLKEVTKNVKEQKQLK.... (8) The drug is CCCC/N=C1\SC[C@@H]2[C@H](O)[C@H](O)[C@@H](O)CN12. The target protein sequence is LRNATQRMFEIDYSRDSFLKDGQPFRYTSGSIHYSRVPRFYWKDRLLKMKMAGLNAIQTYVPWNFHEPWPGQYQFSEDHDVEYFLRLAHELGLLVILRPGPYICAEWEMGGLPAWLLEKESILLRSSDPDYLAAVDKWLGVLLPKMKPLLYQNGGPVITVQVENEYGSYFACDFDYLRFLQKRFRHHLGDDVVLFTTDGAHKTFLKCGALQGLYTTVDFGTGSNITDAFLSQRKCEPKGPLINSEFYTGWLDHWGQPHSTIKTEAVASSLYDILARGASVNLYMFIGGTNFAYWNGANSPYAAQPTSYDYDAPLSEAGDLTEKYFALRNIIQKFEKVPEGPIPPSTPKFAYGKVTLEKLKTVGAALDILCPSGPIKSLYPLTFIQVKQHYGFVLYRTTLPQDCSNPAPLSSPLNGVHDRAYVAVDGIPQGVLERNNVITLNITGKAGATLDLLVENMGRVNYGAYINDFKGLVSNLTLSSNILTDWTIFPLDTEDAVRSH.... The pKi is 4.2. (9) The compound is C[C@@H](CC[C@H](F)[C@@H](C)C(=O)SCCNC(=O)CCNC(=O)C(O)C(C)(C)COP(=O)(O)OP(=O)(O)OC[C@H]1O[C@@H](n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1OP(=O)(O)O)[C@H]1CC[C@H]2[C@H]3[C@H](C[C@H](O)[C@@]21C)[C@@]1(C)CC[C@@H](O)C[C@H]1C[C@H]3O. The target protein (P70473) has sequence MALRGVRVLELAGLAPGPFCGMILADFGAEVVLVDRLGSVNHPSHLARGKRSLALDLKRSPGAAVLRRMCARADVLLEPFRCGVMEKLQLGPETLRQDNPKLIYARLSGFGQSGIFSKVAGHDINYVALSGVLSKIGRSGENPYPPLNLLADFGGGGLMCTLGILLALFERTRSGLGQVIDANMVEGTAYLSTFLWKTQAMGLWAQPRGQNLLDGGAPFYTTYKTADGEFMAVGAIEPQFYTLLLKGLGLESEELPSQMSIEDWPEMKKKFADVFARKTKAEWCQIFDGTDACVTPVLTLEEALHHQHNRERGSFITDEEQHACPRPAPQLSRTPAVPSAKRDPSVGEHTVEVLKDYGFSQEEIHQLHSDRIIESNKLKANL. The pKi is 5.4.
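The header defines the label as pKi = -log10(Ki in M). As a minimal sketch of that conversion, assuming Ki is already expressed in molar units, the relationship can be written as below. The helper names `ki_to_pki` and `pki_to_ki` are hypothetical and are not part of the dataset or of any BindingDB tooling.

```python
import math


def ki_to_pki(ki_molar: float) -> float:
    """Convert an inhibition constant Ki (in mol/L) to pKi = -log10(Ki)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be a positive concentration in M")
    return -math.log10(ki_molar)


def pki_to_ki(pki: float) -> float:
    """Invert the definition: Ki (in mol/L) = 10 ** (-pKi)."""
    return 10.0 ** (-pki)


# Illustrative use with a label from the examples above: a pKi of 8.7
# corresponds to a Ki of roughly 2e-9 M (about 2 nM).
ki_example = pki_to_ki(8.7)
pki_roundtrip = ki_to_pki(ki_example)
```

Because the scale is logarithmic, each unit of pKi corresponds to a tenfold change in Ki, which is why higher values indicate stronger inhibition.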