Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is O=C(c1ncc[nH]1)[C@@H]1CCCN1C(=O)[C@@H]1CCCN1C(=O)CCCc1ccccc1. The target protein (P23687) has sequence MLSFQYPDVYRDETAIQDYHGHKVCDPYAWLEDPDSEQTKAFVEAQNKITVPFLEQCPIRGLYKERMTELYDYPKYSCHFKKGKRYFYFYNTGLQNQRVLYVQDSLEGEARVFLDPNILSDDGTVALRGYAFSEDGEYFAYGLSASGSDWVTIKFMKVDGAKELPDVLERVKFSCMAWTHDGKGMFYNAYPQQDGKSDGTETSTNLHQKLYYHVLGTDQSEDILCAEFPDEPKWMGGAELSDDGRYVLLSIREGCDPVNRLWYCDLQQESNGITGILKWVKLIDNFEGEYDYVTNEGTVFTFKTNRHSPNYRLINIDFTDPEESKWKVLVPEHEKDVLEWVACVRSNFLVLCYLHDVKNTLQLHDLATGALLKIFPLEVGSVVGYSGQKKDTEIFYQFTSFLSPGIIYHCDLTKEELEPRVFREVTVKGIDASDYQTVQIFYPSKDGTKIPMFIVHKKGIKLDGSHPAFLYGYGGFNISITPNYSVSRLIFVRHMGGVLA.... The pIC50 is 8.0. (2) The drug is CN(C)CCCN1c2ccccc2Sc2ccc(C(F)(F)F)cc21. The target protein (P33302) has sequence MPEAKLNNNVNDVTSYSSASSSTENAADLHNYNGFDEHTEARIQKLARTLTAQSMQNSTQSAPNKSDAQSIFSSGVEGVNPIFSDPEAPGYDPKLDPNSENFSSAAWVKNMAHLSAADPDFYKPYSLGCAWKNLSASGASADVAYQSTVVNIPYKILKSGLRKFQRSKETNTFQILKPMDGCLNPGELLVVLGRPGSGCTTLLKSISSNTHGFDLGADTKISYSGYSGDDIKKHFRGEVVYNAEADVHLPHLTVFETLVTVARLKTPQNRIKGVDRESYANHLAEVAMATYGLSHTRNTKVGNDIVRGVSGGERKRVSIAEVSICGSKFQCWDNATRGLDSATALEFIRALKTQADISNTSATVAIYQCSQDAYDLFNKVCVLDDGYQIYYGPADKAKKYFEDMGYVCPSRQTTADFLTSVTSPSERTLNKDMLKKGIHIPQTPKEMNDYWVKSPNYKELMKEVDQRLLNDDEASREAIKEAHIAKQSKRARPSSPYTVS.... The pIC50 is 5.3. (3) The small molecule is COC(=O)[C@H]1[C@H]2C[C@@H]3c4[nH]c5cc(OC)ccc5c4CCN3C[C@H]2C[C@@H](OC(=O)c2cc(OC)c(OC)c(OC)c2)[C@@H]1OC. 
The target protein sequence is MALSDLVLLRWLRDSRHSRKLILFIVFLALLLDNMLLTVVVPIIPSYLYSIKHEKNSTEIQTTRPELVVSTSESIFSYYNNSTVLITGNATGTLPGGQSHKATSTQHTVANTTVPSDCPSEDRDLLNENVQVGLLFASKATVQLLTNPFIGLLTNRIGYPIPMFAGFCIMFISTVMFAFSSSYAFLLIARSLQGIGSSCSSVAGMGMLASVYTDDEERGKPMGIALGGLAMGVLVGPPFGSVLYEFVGKTAPFLVLAALVLLDGAIQLFVLQPSRVQPESQKGTPLTTLLKDPYILIAAGSICFANMGIAMLELALPIWMMETMCSRKWQLGVAFLPASISYLIGTNIFGILAHKMGRWLCALLGMVIVGISILCIPFAKNIYGLIAPNFGVGFAIGMVDSSMMPIMGYLVDLRHVSVYGSVYAIADVAFCMGYAIGPSAGGAIAKAIGFPWLMTIIGIIDIAFAPLCFFLRSPPAKEEKMAILMDHNCPIKRKMYTQNN.... The pIC50 is 8.3. (4) The small molecule is Cc1ccc2nc(-c3ccc(NC(=O)c4ccc(Cl)cc4)cc3)sc2c1. The target protein (P06721) has sequence MADKKLDTQLVNAGRSKKYTLGAVNSVIQRASSLVFDSVEAKKHATRNRANGELFYGRRGTLTHFSLQQAMCELEGGAGCVLFPCGAAAVANSILAFIEQGDHVLMTNTAYEPSQDFCSKILSKLGVTTSWFDPLIGADIVKHLQPNTKIVFLESPGSITMEVHDVPAIVAAVRSVVPDAIIMIDNTWAAGVLFKALDFGIDVSIQAATKYLVGHSDAMIGTAVCNARCWEQLRENAYLMGQMVDADTAYITSRGLRTLGVRLRQHHESSLKVAEWLAEHPQVARVNHPALPGSKGHEFWKRDFTGSSGLFSFVLKKKLNNEELANYLDNFSLFSMAYSWGGYESLILANQPEHIAAIRPQGEIDFSGTLIRLHIGLEDVDDLIADLDAGFARIV. The pIC50 is 3.7. (5) The small molecule is N#C/C(=C\c1cn(Cc2ccccc2F)c2ccccc12)C(=O)NCc1ccco1. The target protein (P03211) has sequence MSDEGPGTGPGNGLGEKGDTSGPEGSGGSGPQRRGGDNHGRGRGRGRGRGGGRPGAPGGSGSGPRHRDGVRRPQKRPSCIGCKGTHGGTGAGAGAGGAGAGGAGAGGGAGAGGGAGGAGGAGGAGAGGGAGAGGGAGGAGGAGAGGGAGAGGGAGGAGAGGGAGGAGGAGAGGGAGAGGGAGGAGAGGGAGGAGGAGAGGGAGAGGAGGAGGAGAGGAGAGGGAGGAGGAGAGGAGAGGAGAGGAGAGGAGGAGAGGAGGAGAGGAGGAGAGGGAGGAGAGGGAGGAGAGGAGGAGAGGAGGAGAGGAGGAGAGGGAGAGGAGAGGGGRGRGGSGGRGRGGSGGRGRGGSGGRRGRGRERARGGSRERARGRGRGRGEKRPRSPSSQSSSSGSPPRRPPPGRRPFFHPVGEADYFEYHQEGGPDGEPDVPPGAIEQGPADDPGEGPSTGPRGQGDGGRRKKGGWFGKHRGQGGSNPKFENIAEGLRALLARSHVERTTDE.... The pIC50 is 5.2. 
(6) The target protein sequence is MGACSSKAQHQTRDPEPREQQAAQEQKSTGPSGAPNDAPAPAEAERKMSGSSATAPKGEMPTASTGTPEQQQQQQQQQQQQQEQQQHPEHQQSEKQQQHGEEQQQERKPSQQQQNEEAAAPHKHGGERKVQKAIKQQEDTQAEDARLLGHLEKREKTPSDLSLIRDSLSTNLVCSSLNDAEVEALANAVEFFTFKKGDVVTKQGESGSYFFIVHSGEFEVIVNDKVVNKILTGQAFGEISLIHNSARTATIKTLSEDAALWGVQRQVFRETLKQLSSRNFAENRQFLASVKFFEMLTEAQKNVITNALVVQSFQPGQAIVKEGEKGDVLYILKSGKALVSIKNKEVRVLQRGEYFGERALLYDEPRSATITAEEPTVCVSIGRDLLDRVLGNLQHVLFRNIMLEALQQSKVFASFPTEQLSRLIGSVVVKDYPENYIILDRENRTRASASALFSAQGVRFFFVLEGEVSVFAYKDKSSSSSSSSSSSSSSSSAEGEMELH.... The pIC50 is 9.7. The drug is CN(C)Cc1ccn2c(-c3ccnc(NC4CC4)n3)c(-c3ccc(F)cc3)nc2c1. (7) The compound is N#CCNC(=O)[C@H](Cc1cccc(Cl)c1)NC(=O)C1=CCCCC1. The target protein sequence is APRSVDWREKGYVTPVKNQGQCGSCWAFSATGALEGQMFRKTGRLISLSEQNLVDCSGPQGNEGCNGGLMDYAFQYVQDNGGLDSEESYPYEATEESCKYNPKYSVANDAGFVDIPKQEKALMKAVATVGPISVAIDAGHESFLFYKEGIYFEPDCSSEDMDHGVLVVGYGFESTESDNNKYWLVKNSWGEEWGMGGYVKMAKDRRNHCGIASAASYPTV. The pIC50 is 6.3. (8) The small molecule is O=c1[nH]c(Oc2cnn(C3CCCC3)c2)nc2cnccc12. The target protein (O75164) has sequence MASESETLNPSARIMTFYPTMEEFRNFSRYIAYIESQGAHRAGLAKVVPPKEWKPRASYDDIDDLVIPAPIQQLVTGQSGLFTQYNIQKKAMTVREFRKIANSDKYCTPRYSEFEELERKYWKNLTFNPPIYGADVNGTLYEKHVDEWNIGRLRTILDLVEKESGITIEGVNTPYLYFGMWKTSFAWHTEDMDLYSINYLHFGEPKSWYSVPPEHGKRLERLAKGFFPGSAQSCEAFLRHKMTLISPLMLKKYGIPFDKVTQEAGEFMITFPYGYHAGFNHGFNCAESTNFATRRWIEYGKQAVLCSCRKDMVKISMDVFVRKFQPERYKLWKAGKDNTVIDHTLPTPEAAEFLKESELPPRAGNEEECPEEDMEGVEDGEEGDLKTSLAKHRIGTKRHRVCLEIPQEVSQSELFPKEDLSSEQYEMTECPAALAPVRPTHSSVRQVEDGLTFPDYSDSTEVKFEELKNVKLEEEDEEEEQAAAALDLSVNPASVGGRLV.... The pIC50 is 6.1. (9) The small molecule is COc1ccc(N2CCN(C(=O)c3cc4c(s3)-c3ccccc3S(=O)(=O)C4)CC2)cc1. The target protein sequence is MCGNTMSVPLLTDAATVSGAERETAAVIFLHGLGDTGHSWADALSTIRLPHVKYICPHAPRIPVTLNMKMVMPSWFDLMGLSPDAPEDEAGIKKAAENIKALIEHEMKNGIPANRIVLGGFAQGGALSLYTALTCPHPLAGIVALSCWLPLHRAFPQAANGSAKDLAILQCHGELDPMVPVRFGALTAEKLRSVVTPARVQFKTYPGVMHSSCPQEMAAVKEFLEKLLPPV. The pIC50 is 6.0. 
(10) The small molecule is COc1ccccc1OCCNCC(O)COc1cccc2[nH]c3ccccc3c12. The target protein (P22309) has sequence MAVESQGGRPLVLGLLLCVLGPVVSHAGKILLIPVDGSHWLSMLGAIQQLQQRGHEIVVLAPDASLYIRDGAFYTLKTYPVPFQREDVKESFVSLGHNVFENDSFLQRVIKTYKKIKKDSAMLLSGCSHLLHNKELMASLAESSFDVMLTDPFLPCSPIVAQYLSLPTVFFLHALPCSLEFEATQCPNPFSYVPRPLSSHSDHMTFLQRVKNMLIAFSQNFLCDVVYSPYATLASEFLQREVTVQDLLSSASVWLFRSDFVKDYPRPIMPNMVFVGGINCLHQNPLSQEFEAYINASGEHGIVVFSLGSMVSEIPEKKAMAIADALGKIPQTVLWRYTGTRPSNLANNTILVKWLPQNDLLGHPMTRAFITHAGSHGVYESICNGVPMVMMPLFGDQMDNAKRMETKGAGVTLNVLEMTSEDLENALKAVINDKSYKENIMRLSSLHKDRPVEPLDLAVFWVEFVMRHKGAPHLRPAAHDLTWYQYHSLDVIGFLLAVVL.... The pIC50 is 4.9.
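The pIC50 definition stated in the task description (pIC50 = -log10(IC50 in M)) can be sketched in Python as below. This is a minimal illustration only: the helper names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical and not part of the dataset or of any BindingDB tooling, and the inputs are assumed to be IC50 values already expressed in molar units.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in molar units to pIC50 = -log10(IC50 in M).

    Hypothetical helper for illustration; not part of the dataset.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in M")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (-pic50) * 1e9


if __name__ == "__main__":
    # Labels taken from examples (1), (4) and (6) above.
    for label in (8.0, 3.7, 9.7):
        print(f"pIC50 {label} corresponds to IC50 of about {pic50_to_ic50_nm(label):.3g} nM")
```

Under this definition, the label 8.0 in example (1) corresponds to an IC50 of 10 nM, and each unit increase in pIC50 is a tenfold decrease in IC50, which is why a higher value means a more potent compound.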