This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The drug is O=C(CCCCCCCc1cc(-c2cccnc2)on1)Nc1ccccc1-c1ccccc1. The target protein (Q99KQ4) has sequence MNAAAEAEFNILLATDSYKVTHYKQYPPNTSKVYSYFECREKKTENSKVRKVKYEETVFYGLQYILNKYLKGKVVTKEKIQEAKEVYREHFQDDVFNERGWNYILEKYDGHLPIEVKAVPEGSVIPRGNVLFTVENTDPECYWLTNWIETILVQSWYPITVATNSREQKKILAKYLLETSGNLDGLEYKLHDFGYRGVSSQETAGIGASAHLVNFKGTDTVAGIALIKKYYGTKDPVPGYSVPAAEHSTITAWGKDHEKDAFEHIVTQFSSVPVSVVSDSYDIYNACEKIWGEDLRHLIVSRSTEAPLIIRPDSGNPLDTVLKVLDILGKKFPVTENSKGYKLLPPYLRVIQGDGVDINTLQEIVEGMKQKKWSIENVSFGSGGALLQKLTRDLLNCSFKCSYVVTNGLGVNVFKDPVADPNKRSKKGRLSLHRTPAGNFVTLEEGKGDLEEYGHDLLHTVFKNGKVTKSYSFDEVRKNAQLNIEQDVAPH. The pIC50 is 7.3. (2) The drug is C[C@H](N)P(=O)(O)OCC(=O)O. The target protein (Q06241) has sequence MEIGFTFLDEIVHGVRWDAKYATWDNFTGKPVDGYEVNRIVGTYELAESLLKAKELAATQGYGLLLWDGYRPKRAVNCFMQWAAQPENNLTKESYYPNIDRTEMISKGYVASKSSHSRGSAIDLTLYRLDTGELVPMGSRFDFMDERSHHAANGISCNEAQNRRRLRSIMENSGFEAYSLEWWHYVLRDEPYPNSYFDFPVK. The pIC50 is 3.3. (3) The small molecule is CCOC(C(=O)c1ccc(-c2cc(OC)c(Cl)c(OC)c2)o1)c1ccc(-c2nnc(C)s2)cc1. The target protein sequence is MRRQPAASLDPLAKEPGPPGSRDDRLEDALLSLGSVIDISGLQRAVKEALSAVLPRVETVYTYLLDGESQLVCEDPPHELPQEGKVREAIISQKRLGCNGLGFSDLPGKPLARLVAPLAPDTQVLVMPLADKEAGAVAAVILVHCGQLSDNEEWSLQAVEKHTLVALRRVQVLQQRGPREAPRAVQNPPEGTAEDQKGGAAYTDRDRKILQLCGELYDLDASSLQLKVLQYLQQETRASRCCLLLVSEDNLQLSCKVIGDKVLGEEVSFPLTGCLGQVVEDKKSIQLKDLTSEDVQQLQSMLGCELQAMLCVPVISRATDQVVALACAFNKLEGDLFTDEDEHVIQHCFHYTSTVLTSTLAFQKEQKLKCECQALLQVAKNLFTHLDDVSVLLQEIITEARNLSNAEICSVFLLDQNELVAKVFDGGVVDDESYEIRIPADQGIAGHVATTGQILNIPDAYAHPLFYRGVDDSTGFRTRNILCFPIKNENQEVIGVAELV.... The pIC50 is 6.3. (4) The drug is NC(=Nc1ccc2nc(NCCN3CCOCC3)sc2c1)c1cccs1. 
The target protein (Q62600) has sequence MGNLKSVGQEPGPPCGLGLGLGLGLCGKQGPASPAPEPSQAPVPPSPTRPAPDHSPPLTRPPDGPKFPRVKNWEVGSITYDTLSAQAQQDGPCTPRRCLGSLVFPRKLQSRPTQGPSPTEQLLGQARDFINQYYNSIKRSGSQAHEQRLQEVEAEVVATGTYQLRESELVFGAKQAWRNAPRCVGRIQWGKLQVFDARDCRTAQEMFTYICNHIKYATNRGNLRSAITVFPQRYAGRGDFRIWNSQLVRYAGYRQQDGSVRGDPANVEITELCIQHGWTPGNGRFDVLPLLLQAPDEPPELFTLPPELVLEVPLEHPTLEWFAALGLRWYALPAVSNMLLEIGGLEFPAAPFSGWYMSSEIGMRDLCDPHRYNILEDVAVCMDLDTRTTSSLWKDKAAVEINVAVLYSYQLAKVTIVDHHAATASFMKHLENEQKARGGCPADWAWIVPPISGSLTPVFHQEMVNYFLSPAFRYQPDPWKGSAAKGTGITRKKTFKEVAN.... The pIC50 is 5.0. (5) The target protein (P45446) has sequence MCENQLKTKADGTAQIEVIPCKICGDKSSGIHYGVITCEGCKGFFRRSQQNNASYSCPRQRNCLIDRTNRNRCQHCRLQKCLALGMSRDAVKFGRMSKKQRDSLYAEVQKHQQRLQEQRQQQSGEAEALARVYSSSISNGLSNLNTETGGTYANGHVIDLPKSEGYYNIDSGQPSPDQSGLDMTGIKQIKQEPIYDLTSVHNLFTYSSFNNGQLAPGITMSEIDRIAQNIIKSHLETCQYTMEELHQLAWQTHTYEEIKAYQSKSREALWQQCAIQITHAIQYVVEFAKRITGFMELCQNDQILLLKSGCLEVVLVRMCRAFNPLNNTVLFEGKYGGMQMFKALGSDDLVNEAFDFAKNLCSLQLTEEEIALFSSAVLISPDRAWLLEPRKVQKLQEKIYFALQHVIQKNHLDDETLAKLIAKIPTITAVCNLHGEKLQVFKQSHPDIVNTLFPPLYKELFNPDCAAVCK. The pIC50 is 5.0. The small molecule is Cc1cccc(CC2=N[C@@H](CCCCN3C[C@H](Cc4ccccc4)N(CCc4ccccc4)C3=S)CN2)c1. (6) The compound is CS(=O)(=O)N1CCc2oc(-c3ccnc(Nc4ccc(OCCN5CCCC5)cc4)n3)cc2C1. The target protein (Q62137) has sequence MAPPSEETPLIPQRSCSLSSSEAGALHVLLPPRGPGPPQRLSFSFGDYLAEDLCVRAAKACGILPVYHSLFALATEDFSCWFPPSHIFCIEDVDTQVLVYRLRFYFPDWFGLETCHRFGLRKDLTSAILDLHVLEHLFAQHRSDLVSGRLPVGLSMKEQGEFLSLAVLDLAQMAREQAQRPGELLKTVSYKACLPPSLRDVIQGQNFVTRRRIRRTVVLALRRVVACQADRYALMAKYILDLERLHPAATTETFRVGLPGAQEEPGLLRVAGDNGISWSSGDQELFQTFCDFPEIVDVSIKQAPRVGPAGEHRLVTVTRMDGHILEAEFPGLPEALSFVALVDGYFRLICDSRHYFCKEVAPPRLLEEEAELCHGPITLDFAIHKLKAAGSLPGTYILRRSPQDYDSFLLTACVQTPLGPDYKGCLIRQDPSGAFSLVGLSQPHRSLRELLAACWNSGLRVDGAALNLTSCCAPRPKEKSNLIVVRRGCTPAPAPGCSPS.... The pIC50 is 5.3. (7) The small molecule is O=C(O)COc1cc(Cl)ccc1C(=O)NCc1c(Br)c(Br)c(Br)c(Br)c1Br. 
The target protein sequence is MATFVELSTKAKMPIVGLGTWKSPLGKVKEAVKVAIDAGYRHIDCAYVYQNEHEVGEAIQEKIQEKAVKREDLFIVSKLWPTFFERPLVRKAFEKTLKDLKLSYLDVYLIHWPQGFKSGDDLFPRDDKGNAIGGKATFLDAWEAMEELVDEGLVKALGVSNFSHFQIEKLLNKPGLKYKPVTNQVECHPYLTQEKLIQYCHSKGITVTAYSPLGSPDRPWAKPEDPSLLEDPKIKEIAAKHKKTAAQVLIRFHIQRNVIVIPKSVTPARIVENIQVFDFKLSDEEMATILSFNRNWRACNLLQSSHLEDYPFNAEY. The pIC50 is 7.1. (8) The compound is CCCC[C@]1(CC)CS(=O)(=O)c2cc(OC)c(OC)cc2[C@@H](c2ccccc2)N1. The target protein (P70172) has sequence MDNSSVCPPNATVCEGDSCVVPESNFNAILNTVMSTVLTILLAMVMFSMGCNVEVHKFLGHIKRPWGIFVGFLCQFGIMPLTGFILSVASGILPVQAVVVLIMGCCPGGTGSNILAYWIDGDMDLSVSMTTCSTLLALGMMPLCLFVYTKMWVDSGTIVIPYDSIGISLVALVIPVSFGMFVNHKWPQKAKIILKIGSITGVILIVLIAVIGGILYQSAWIIEPKLWIIGTIFPIAGYSLGFFLARLAGQPWYRCRTVALETGMQNTQLCSTIVQLSFSPEDLNLVFTFPLIYTVFQLVFAAVILGIYVTYRKCYGKNDAEFLEKTDNEMDSRPSFDETNKGFQPDEK. The pIC50 is 8.5. (9) The small molecule is CC[C@@]1(O)C(=O)OCc2c1cc1n(c2=O)Cc2cc3c(CN(C)C)c(O)ccc3nc2-1. The target protein (Q86VL8) has sequence MDSLQDTVALDHGGCCPALSRLVPRGFGTEMWTLFALSGPLFLFQVLTFMIYIVSTVFCGHLGKVELASVTLAVAFVNVCGVSVGVGLSSACDTLMSQSFGSPNKKHVGVILQRGALVLLLCCLPCWALFLNTQHILLLFRQDPDVSRLTQDYVMIFIPGLPVIFLYNLLAKYLQNQGWLKGQEEESPFQTPGLSILHPSHSHLSRASFHLFQKITWPQVLSGVVGNCVNGVANYALVSVLNLGVRGSAYANIISQFAQTVFLLLYIVLKKLHLETWAGWSSQCLQDWGPFFSLAVPSMLMICVEWWAYEIGSFLMGLLSVVDLSAQAVIYEVATVTYMIPLGLSIGVCVRVGMALGAADTVQAKRSAVSGVLSIVGISLVLGTLISILKNQLGHIFTNDEDVIALVSQVLPVYSVFHVFEAICCVYGGVLRGTGKQAFGAAVNAITYYIIGLPLGILLTFVVRMRIMGLWLGMLACVFLATAAFVAYTARLDWKLAAEE.... The pIC50 is 5.1.
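
The pIC50 definition stated above (pIC50 = -log10(IC50 in M)) can be sketched as a small conversion helper. This is a minimal illustration only; the function names are hypothetical and are not part of BindingDB or of any dataset tooling.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in molar units to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report IC50 in nanomolar (1 M = 1e9 nM)."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # A 100 nM IC50 is 1e-7 M.
    print(ic50_to_pic50(1e-7))
    # Example (1) above reports a pIC50 of 7.3, i.e. an IC50 of roughly 50 nM.
    print(pic50_to_ic50_nm(7.3))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold change in IC50, which is why higher values mean more potent compounds.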