Dataset: Drug-target binding data from BindingDB using Ki measurements. Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pKi (pKi = -log10(Ki in M); higher means stronger inhibition). Dataset: bindingdb_ki. (1) The drug is Nc1ncnc2c1ncn2[C@@H]1O[C@H](COS(=O)(=O)NC(=O)[C@@H](N)CCC(=O)O)[C@@H](O)[C@H]1O. The target protein (Q9CXJ1) has sequence MAAPLKRLLLAEPHVVALGHRVGRREASLGPDPGAPVRVRFAPSPTGFLHLGGLRTALYNYIFAKKHQGSFILRLEDTDQSRLVPGAAESIEDMLEWAGIPPDESPRRGGPAGPYCQSQRLALYAQATEALLRSGAAYPCFCLPQRLELLKKEALRSRQTPRYDNRCRNLSQAQVAQKLAVDPKPAIRFRLEEAVPAFQDLVYGWTQHEVASVEGDPVILKSDGFPTYHLACVVDDHHMSISHVLRGSEWLVSTSKHLLLYQALGWQPPRFAHLPLLLNRDGSKLSKRQGDIFLEHFAATGFLPEALLDIITNCGSGFAENQMGRTLPELITQFDLTRITCHSALLDLEKLPEFNRLHLRRLVSSETQRPQLVEKLQGLVKEAFGSELQNKDVLDPAYMERILLLRQGHISRLQDLVSPVYSYLWTRPAVHRSELGASSENVDVIAKRLLGLLERPGLSLTQDVLNRELKKLSEGLEGAKHSSVMKLLRMALSGQLQGPP.... The pKi is 7.2. (2) The drug is CSCCC(NC(=O)C(CC(C)C)NC(=O)CNC(=O)C(NC(=O)C(Cc1ccccc1)NC(=O)C(CO)NC(=O)C(CC(=O)O)NC(=O)C(NC(=O)C(CCCCN)NC(=O)C(N)Cc1cnc[nH]1)C(C)O)C(C)C)C(N)=O. The target protein (P05504) has sequence MNENLFASFITPTMMGLPIVVTIIMFPSILFPSSERLISNRLHSFQHWLIKLIIKQMMLIHTPKGRTWALMIVSLIMFIGSTNLLGLLPHTFTPTTQLSMNLSMAIPLWAGAVILGFRHKLKNSLAHFLPQGTPISLIPMLIIIETISLFIQPMALAVRLTANITAGHLLMHLIGGATLVLMDISPPTATITFIILLLLTVLEFAVALIQAYVFTLLVSLYLHDNT. The pKi is 7.4. (3) The drug is [NH3+]C(CCS(=O)CCC(=O)NCC(=O)O)C(=O)O. 
The target protein (P07314) has sequence MKNRFLVLGLVAVVLVFVIIGLCIWLPTTSGKPDHVYSRAAVATDAKRCSEIGRDMLQEGGSVVDAAIASLLCMGLINAHSMGIGGGLFFTIYNSTTRKAEVINAREMAPRLANTSMFNNSKDSEEGGLSVAVPGEIRGYELAHQRHGRLPWARLFQPSIQLARHGFPVGKGLARALDKKRDIIEKTPALCEVFCRQGKVLQEGETVTMPKLADTLQILAQEGARAFYNGSLTAQIVKDIQEAGGIMTVEDLNNYRAEVIEHPMSIGLGDSTLYVPSAPLSGPVLILILNILKGYNFSPKSVATPEQKALTYHRIVEAFRFAYAKRTMLGDPKFVDVSQVIRNMSSEFYATQLRARITDETTHPTAYYEPEFYLPDDGGTAHLSVVSEDGSAVAATSTINLYFGSKVLSRVSGILFNDEMDDFSSPNFTNQFGVAPSPANFIKPGKQPLSSMCPSIIVDKDGKVRMVVGASGGTQITTSVALAIINSLWFGYDVKRAVEE.... The pKi is 4.3. (4) The small molecule is NC(CCC(=O)NC(CSC(=O)OCc1ccc([N+](=O)[O-])cc1)C(=O)NCC(=O)O)C(=O)O. The target protein (O35952) has sequence MVLGRGSLCLRSLSVLGAACARRGLGQALLGLSLCHTDFRKNLTVQQDMMKIELLPALTDNYMYLIIDEDTQEAAVVDPVQPQKVIETVKKHRVKLTTVLTTHHHWDHAGGNEKLVKLEPGLKVYGGDDRIGALTHKVTHLSTLEVGSLSVKCLSTPCHTSGHICYFVSKPGSSEPSAVFTGDTLFVAGCGKFYEGTADEMYKALLEVLGRLPPDTKVICGHEYTVNNLKFARHVEPGNTAVQEKLAWAKEKNAIGEPTVPSTLAEEFTYNPFMRVKEKTVQQHAGETDPVTTMRAIRREKDQFKVPRD. The pKi is 5.2. (5) The compound is Cc1ccnc(C)c1OC[C@H]1CN(CCN2CCc3ccccc32)CCO1. The target protein (P51436) has sequence MGNSSATEDGGLLAGRGPESLGTGAGLGGAGAAALVGGVLLIGLVLAGNSLVCVSVASERTLQTPTNYFIVSLAAADLLLAVLVLPLFVYSEVQGGVWLLSPRLCDTLMAMDVMLCTASIFNLCAISVDRFVAVTVPLRYNQQGQCQLLLIAATWLLSAAVASPVVCGLNDVPGRDPAVCCLENRDYVVYSSVCSFFLPCPLMLLLYWATFRGLRRWEAARHTKLHSRAPRRPSGPGPPVSDPTQGPFFPDCPPPLPSLRTSPSDSSRPESELSQRPCSPGCLLADAALPQPPEPSSRRRRGAKITGRERKAMRVLPVVVGAFLVCWTPFFVVHITRALCPACFVSPRLVSAVTWLGYVNSALNPIIYTIFNAEFRSVFRKTLRLRC. The pKi is 5.3. (6) The drug is NC1CC1c1ccc(NC(=O)C(Cc2ccccc2)NC(=O)OCc2ccccc2)cc1. 
The target protein (Q3UXZ9) has sequence MASVGPGGYAAEFVPPPECPVFEPSWEEFTDPLSFIGRIRPFAEKTGICKIRPPKDWQPPFACEVKTFRFTPRVQRLNELEAMTRVRLDFLDQLAKFWELQGSTLKIPVVERKILDLYALSKIVASKGGFEIVTKEKKWSKVGSRLGYLPGKGTGSLLKSHYERILYPYELFQSGVSLMGVQMPDLDLKEKVEAEVLSTDIQPSPERGTRMNIPPKRTRRVKSQSDSGEVNRNTELKKLQIFGAGPKVVGLAVGAKDKEDEVTRRRKVTNRSDAFNMQMRQRKGTLSVNFVDLYVCMFCGRGNNEDKLLLCDGCDDSYHTFCLLPPLPDVPKGDWRCPKCVAEECNKPREAFGFEQAVREYTLQSFGEMADNFKSDYFNMPVHMVPTELVEKEFWRLVSSIEEDVIVEYGADISSKDFGSGFPKKDGQRKMLPEEEEYALSGWNLNNMPVLEQSVLAHINVDISGMKVPWLYVGMCFSSFCWHIEDHWSYSINYLHWGEP.... The pKi is 4.7. (7) The compound is CC(C)C[C@H](N)C(=O)CC(C)C(=O)O. The target protein (Q9ULA0) has sequence MQVAMNGKARKEAVQTAAKELLKFVNRSPSPFHAVAECRNRLLQAGFSELKETEKWNIKPESKYFMTRNSSTIIAFAVGGQYVPGNGFSLIGAHTDSPCLRVKRRSRRSQVGFQQVGVETYGGGIWSTWFDRDLTLAGRVIVKCPTSGRLEQQLVHVERPILRIPHLAIHLQRNINENFGPNTEMHLVPILATAIQEELEKGTPEPGPLNAVDERHHSVLMSLLCAHLGLSPKDIVEMELCLADTQPAVLGGAYDEFIFAPRLDNLHSCFCALQALIDSCAGPGSLATEPHVRMVTLYDNEEVGSESAQGAQSLLTELVLRRISASCQHPTAFEEAIPKSFMISADMAHAVHPNYLDKHEENHRPLFHKGPVIKVNSKQRYASNAVSEALIREVANKVKVPLQDLMVRNDTPCGTTIGPILASRLGLRVLDLGSPQLAMHSIREMACTTGVLQTLTLFKGFFELFPSLSHNLLVD. The pKi is 3.0. (8) The compound is CCC(=O)OC1CC2CCCC1N2C. The target protein (P11229) has sequence MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLSYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLVGERTVLAGQCYIQFLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRARELAALQGSETPGKGGGSSSSSERSQPGAEGSPETPPGRCCRCCRAPRLLQAYSWKEEEEEDEGSMESLTSSEGEEPGSEVVIKMPMVDPEAQAPTKQPPRSSPNTVKRPTKKGRDRAGKGQKPRGKEQLAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDCVPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPGSVHRTPSRQC. The pKi is 6.6. (9) The drug is COc1ccc(/C=N/NC(=O)c2ccccc2)cc1. 
The target protein sequence is MNPSFFLTVLCLGVASAAPKLDPNLDAHWHQWKATHRRLYGMNEEGWRRAVWEKNKKIIDLHNQEYSQGKHGFSMAMNAFGDMTNEEFRQVMNGFQSQKRKKGKLFREPLLIDVPKSVDWTKKGYVTPVKNQGQCGSCWAFSATGALEGQMFRKTGKLVSLSEQNLVDCSRPQGNQGCNGGLMDNAFQYIKENGGLDSEESYPYLATDTNSCTYKPECSAANDTGFVDIPQREKALMKAVATVGPISVAIDAGHASFQFYKSGIYYDPDCSSKDLDHGVLVVGYGFEGTDSNNNKFWIVKNSWGPEWGWNGYVKMAKDQNNHCGIATAASYPTV. The pKi is 7.7. (10) The drug is CN(CCC#N)C[C@H]1OC(n2cnc3c(N)ncnc32)[C@H](O)[C@@H]1O. The target protein (P17708) has sequence MEAAHFFEGTEKLLEVWFSRQQSDASQGSGDLRTIPRSEWDVLLKDVQCSIISVTKTDKQEAYVLSESSMFVSKRRFILKTCGTTLLLKALVPLLKLARDYSGFDSIQSFFYSRKNFMKPSHQGYPHRNFQEEIEFLNAIFPNGAAYCMGRMNSDCWYLYTLDLPESRVINQPDQTLEILMSELDPAVMDQFYMKDGVTAKDVTRESGIRDLIPGSVIDATLFNPCGYSMNGMKSDGTYWTIHITPEPEFSYVSFETNLSQTSYDDLIRKVVEVFKPGKFVTTLFVNQSSKCRTVLSSPQKIDGFKRLDCQSAMFNDYNFVFTSFAKKQQQQS. The pKi is 3.5.
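
The pKi definition stated in the task description (pKi = -log10(Ki in M)) can be sketched as a pair of conversion helpers. This is a minimal illustration, not part of the dataset tooling: the function names are hypothetical, and the nanomolar variant assumes raw Ki values are reported in nM, which the text above does not state.

```python
import math


def ki_molar_to_pki(ki_molar: float) -> float:
    """Convert an inhibition constant Ki (in mol/L) to pKi = -log10(Ki)."""
    if ki_molar <= 0:
        raise ValueError("Ki must be a positive concentration")
    return -math.log10(ki_molar)


def ki_nanomolar_to_pki(ki_nm: float) -> float:
    """Convert Ki given in nM (assumed unit) to pKi.

    Since 1 nM = 1e-9 M, -log10(ki_nm * 1e-9) = 9 - log10(ki_nm).
    """
    if ki_nm <= 0:
        raise ValueError("Ki must be a positive concentration")
    return 9.0 - math.log10(ki_nm)


def pki_to_ki_molar(pki: float) -> float:
    """Invert the transform: recover Ki (in mol/L) from pKi."""
    return 10.0 ** (-pki)
```

Under this definition, the pKi of 7.2 in example (1) corresponds to a Ki of roughly 63 nM, and the pKi of 3.0 in example (7) corresponds to a Ki of 1 mM, which is why higher pKi means stronger inhibition.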