Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The small molecule is Cc1cc(C)c2nc(CC(C)C)n(Cc3ccc(/C=C/CN4CCN(C(C)C)CC4)cc3)c2n1. The target protein (Q4KLH9) has sequence MDNSTGTWEGCHVDSRVDHLFPPSLYIFVIGVGLPTNCLALWAAYRQVRQRNELGVYLMNLSIADLLYICTLPLWVDYFLHHDNWIHGPGSCKLFGFIFYSNIYISIAFLCCISVDRYLAVAHPLRFARLRRVKTAVAVSSVVWATELGANSAPLFHDELFRDRYNHTFCFEKFPMERWVAWMNLYRVFVGFLFPWALMLLCYRGILRAVQSSVSTERQEKVKIKRLALSLIAIVLVCFAPYHALLLSRSAVYLGRPWDCGFEERVFSAYHSSLAFTSLNCVADPILYCLVNEGARSDVAKALHNLLRFLASNKPQEMANASLTLETPLTSKRSTTGKTSGAVWAVPPTAQGDQVPLKVLLPPAQ. The pIC50 is 4.7. (2) The compound is Nc1ncnc2c1ncn2[C@@H]1O[C@H](COP(=O)([O-])CP(=O)([O-])OP(=O)([O-])CP(=O)([O-])OC[C@H]2O[C@@H](n3cnc4c(N)ncnc43)[C@H](O)[C@@H]2O)[C@@H](O)[C@H]1O. The pIC50 is 4.5. The target protein (P22413) has sequence MERDGCAGGGSRGGEGGRAPREGPAGNGRDRGRSHAAEAPGDPQAAASLLAPMDVGEEPLEKAARARTAKDPNTYKVLSLVLSVCVLTTILGCIFGLKPSCAKEVKSCKGRCFERTFGNCRCDAACVELGNCCLDYQETCIEPEHIWTCNKFRCGEKRLTRSLCACSDDCKDKGDCCINYSSVCQGEKSWVEEPCESINEPQCPAGFETPPTLLFSLDGFRAEYLHTWGGLLPVISKLKKCGTYTKNMRPVYPTKTFPNHYSIVTGLYPESHGIIDNKMYDPKMNASFSLKSKEKFNPEWYKGEPIWVTAKYQGLKSGTFFWPGSDVEINGIFPDIYKMYNGSVPFEERILAVLQWLQLPKDERPHFYTLYLEEPDSSGHSYGPVSSEVIKALQRVDGMVGMLMDGLKELNLHRCLNLILISDHGMEQGSCKKYIYLNKYLGDVKNIKVIYGPAARLRPSDVPDKYYSFNYEGIARNLSCREPNQHFKPYLKHFLPKRLH....
(3) The target protein sequence is MEYETSLKCLDEIRCVNNVKYMETEDLTDFNKKSAYYICKEIYEKQLSNENGYVVIGLSGGKTPIDVYKNMCAIKDIKIDKNKLIFFIIDERYKNDDHKFSNYNNIKFLFDELNINKETQLYKPDTKKDLVSCIRDYNEQIKSMIEKYKKIDIVILGMGSDFHIASLFPNVYYNIYMNNYQNNYIYEDNETIRSLNADNNVNLSLLNEQVYFTTTNNFDVRKRITVSLNLLSNSTSKIFLLNTADKLNLWKNMLLNFYVNPNYNLYPAFKMIDSSNTTVIACGHKNYSKMLEDLYVQKDEALSPISNNNVENKNELLTIVIFGCSGDLAKKKIYPALFKLFCNNLLPKNIIIIGFARTGQDFESFFNKIAIYLKISLNSYKNLSVFEKAERLNSFKSKCRYFIGNYLSPESFENFDVYITQEERIALGCCGQKGNEKHKQVNVTSQFPNNHTSINIINNIDNGCESPMLTDSPKRYPCSSSYSSTSGTAVCPYSSQHDVK.... The pIC50 is 4.1. The drug is COc1cc(CNc2ccc(N3CCCC3)cc2)cc(Cl)c1OC. (4) The target protein sequence is MAVAKVEPIKIMLKPGKDGPKLRQWPLTKEKIEALKEICEKMEKEGQLEEAPPTNPYNTPTFAIKKKDKNKWRMLIDFRELNKVTQDFTEIQLGIPHPAGLAKKRRITVLDVGDAYFSIPLHEDFRPYTAFTLPSVNNAEPGKRYIYKVLPQGWKGSPAIFQHTMRQVLEPFRKANKDVIIIQYMDDILIASDRTDLEHDRVILQLKELLNGLGFSTPDEKFQKDPPYHWMGYELWPTKWKLQKIQLPQKEIWTVNDIQKLVGVLNWAAQLYPGIKTKHLCRLIRGKMTLTEEVQWTELAEAELEENRIILSQEQEGHYYQEEKELEATVQKDQDNQWTYKIHQEDKILKVGKYAKVKNTHTNGIRLLAQVVQKIGKEALVIWGRIPKFHLPVEREIWEQWWDNYWQVTWIPDWDFVSTPPLVRLAFNLVGDPIPGAETFYTDGSCNRQSKEGKAGYVTDRGKDKVKKLEQTTNQQAELEAFAMALTDSGPKVNIIVDSQ.... The pIC50 is 5.4. The small molecule is Cc1ccnc2c1NC(=O)c1cccnc1N2C1CC1. (5) The compound is CC1(C)S[C@@H]2[C@@H](CO)C(=O)N2[C@H]1C(=O)[O-]. The target protein (P62593) has sequence MSIQHFRVALIPFFAAFCLPVFAHPETLVKVKDAEDQLGARVGYIELDLNSGKILESFRPEERFPMMSTFKVLLCGAVLSRVDAGQEQLGRRIHYSQNDLVEYSPVTEKHLTDGMTVRELCSAAITMSDNTAANLLLTTIGGPKELTAFLHNMGDHVTRLDRWEPELNEAIPNDERDTTMPAAMATTLRKLLTGELLTLASRQQLIDWMEADKVAGPLLRSALPAGWFIADKSGAGERGSRGIIAALGPDGKPSRIVVIYTTGSQATMDERNRQIAEIGASLIKHW. The pIC50 is 4.2. 
(6) The target protein sequence is MEPGSDDFLPPPECPVFEPSWAEFRDPLGYIAKIRPIAEKSGICKIRPPADWQPPFAVEVDNFRFTPRIQRLNELEAQTRVKLNYLDQIAKFWEIQGSSLKIPNVERRILDLYSLSKIVVEEGGYEAICKDRRWARVAQRLNYPPGKNIGSLLRSHYERIVYPYEMYQSGANLVQCNTRPFDNEEKDKEYKPHSIPLRQSVQPSKFNSYGRRAKRLQPDPEPTEEDIEKNPELKKLQIYGAGPKMMGLGLMAKDKTLRKKDKEGPECPPTVVVKEELGGDVKVESTSPKTFLESKEELSHSPEPCTKMTMRLRRNHSNAQFIESYVCRMCSRGDEDDKLLLCDGCDDNYHIFCLLPPLPEIPKGVWRCPKCVMAECKRPPEAFGFEQATREYTLQSFGEMADSFKADYFNMPVHMVPTELVEKEFWRLVNSIEEDVTVEYGADIHSKEFGSGFPVSDSKRHLTPEEEEYATSGWNLNVMPVLEQSVLCHINADISGMKVP.... The drug is O=c1[nH]c(Oc2cnn(Cc3ccccc3)c2)nc2cnccc12. The pIC50 is 7.2. (7) The drug is COc1ccc(-c2csc3nc(SCC(=O)NN)n(-c4cccc(F)c4)c(=O)c23)cc1. The target protein (P21980) has sequence MAEELVLERCDLELETNGRDHHTADLCREKLVVRRGQPFWLTLHFEGRNYEASVDSLTFSVVTGPAPSQEAGTKARFPLRDAVEEGDWTATVVDQQDCTLSLQLTTPANAPIGLYRLSLEASTGYQGSSFVLGHFILLFNAWCPADAVYLDSEEERQEYVLTQQGFIYQGSAKFIKNIPWNFGQFEDGILDICLILLDVNPKFLKNAGRDCSRRSSPVYVGRVVSGMVNCNDDQGVLLGRWDNNYGDGVSPMSWIGSVDILRRWKNHGCQRVKYGQCWVFAAVACTVLRCLGIPTRVVTNYNSAHDQNSNLLIEYFRNEFGEIQGDKSEMIWNFHCWVESWMTRPDLQPGYEGWQALDPTPQEKSEGTYCCGPVPVRAIKEGDLSTKYDAPFVFAEVNADVVDWIQQDDGSVHKSINRSLIVGLKISTKSVGRDEREDITHTYKYPEGSSEEREAFTRANHLNKLAEKEETGMAMRIRVGQSMNMGSDFDVFAHITNNTA.... The pIC50 is 6.5. (8) The drug is Fc1ccc(C(OCCN2CCN(CCCc3ccccc3)CC2)c2ccc(F)cc2)cc1. The target protein sequence is MKRRSVLLSGVALSGTALANDSIFFSPLKYLGAEQQRSIDASRSLLDNLIPPSLPQYDNLAGKLARRAVLTSKKLVYVWTENFGNVKGVPMARSVPLGELPNVDWLLKTAGVIVELIVNFVASLPASAAAQFERIATGLSGDLEAARQVHEALLEEAKNDPAAAGSLLLRFTELQTRVIAILTRVGLLVDDILKSASNLVTQRGQGDGLNRFRAVFGTLRLPEVADSFRDDEAFAYWRVAGPNPLLIRRVDALPANFPLGEEQFRRVMGADDSLLEAAASRRLYLLDYAELGKLAPSGAVDKLLTGTGFAYAPIALFALGKDRARLLPVAIQCGQDPATHPMFVRPAESESDLYWGWQMAKTVVQVAEENYHEMFVHLAQTHLVSEAFCLATQRTLAPSHPLHVLLAPHFEGTLFINEGAARILLPSAGFIDVMFAAPIQDTQATAGGNRLGFDFYRGMLPESLKARNVDDPLALPDYPYRDDGLLVWNAIRQWAADYVA.... The pIC50 is 5.1. (9) The small molecule is CC(=O)N[C@@H]1[C@@H](N=C(Cc2ccccc2)NS(C)(=O)=O)C=C(C(=O)O)O[C@H]1[C@H](O)[C@H](O)CO. 
The target protein sequence is MAEKGKTNSSYWSTTRNDNSTVNTYIDTPAGKTHIWLLIATTMHTILSFIIMILCIDLIIKQDTCMKTNIITISSMNESAKTIKETITELIRQEVISRTINIQSSVQSGIPILLNKQSRDLTQLIEKSCNRQELAQICENTNAIHHADGISPLDPHDFWRCPVGEPLLSDNPNISLLPGPSLLSGSTTISGCVRLPSLSIGDAIYAYSSNLITQGCADIGKSYQVLQLGYISLNSDMYPDLNPVISHTYDINDNRKSCSVIAAGTRGYQLCSLPTVNETTDYSSEGIEDLVFDILDLKGKTKSHRYKNEDITFDHPFSAMYPSVGSGIKIENTLIFLGYGGLTTPLQGDTKCVTNRCANVNQSVCNDALKITWRLKKRQVNVLIRINNYLSDRPKIVVETIPITQNYLGAEGRLLKLGKKIYIYTRSSGWHSHLQIGSLDINNPMTIKWAPHEVLSRPGNQDCNWYNRCPRECISGVYTDAYPLSPDAVNVATTTLYANT.... The pIC50 is 2.7.
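The labels above are pIC50 values, defined as pIC50 = -log10(IC50 in M). As a minimal sketch of that conversion, the helpers below go from a molar IC50 to pIC50 and back. The function names are hypothetical, and the nanomolar variant assumes raw IC50 values arrive in nM, which is a common unit for BindingDB exports but should be checked against the actual data.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Hypothetical helper, assuming the raw IC50 is in nanomolar (1 nM = 1e-9 M)."""
    return ic50_to_pic50(ic50_nm * 1e-9)


def pic50_to_ic50(pic50: float) -> float:
    """Inverse conversion: pIC50 back to IC50 in mol/L."""
    return 10.0 ** (-pic50)


# A label of 4.7, as in example (1), corresponds to an IC50 of roughly 20 micromolar.
print(f"{pic50_to_ic50(4.7) * 1e6:.1f} uM")  # prints: 20.0 uM
```

Because the scale is logarithmic, each unit of pIC50 is a tenfold change in IC50: the 7.2 label in example (6) reflects a compound about 10,000 times more potent than the 2.7 label in example (9).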