Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. The data are drug-target binding records from BindingDB based on IC50 measurements. (1) The drug is O=C(Nc1ccccc1C(=O)O)c1ccc(-c2ccc(F)c(F)c2)c(Oc2ccccc2)c1. The target protein (Q820T1) has sequence MKNYARISCTSRYVPENCVTNHQLSEMMDTSDEWIHSRTGISERRIVTQENTSDLCHQVAKQLLEKSGKQASEIDFILVATVTPDFNMPSVACQVQGAIGATEAFAFDISAACSGFVYALSMAEKLVLSGRYQTGLVIGGETFSKMLDWTDRSTAVLFGDGAAGVLIEAAETPHFLNEKLQADGQRWAALTSGYTINESPFYQGHKQASKTLQMEGRSIFDFAIKDVSQNILSLVTDETVDYLLLHQANVRIIDKIARKTKISREKFLTNMDKYGNTSAASIPILLDEAVENGTLILGSQQRVVLTGFGGGLTWGSLLLTL. The pIC50 is 6.5. (2) The drug is CCCCCCOC(=O)[C@]1(O)C[C@@H]2O[C@@]1(C)n1c3ccccc3c3c4c(c5c6ccccc6n2c5c31)C(=O)NC4. The target protein sequence is MGNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPAQNTAHLDQFERIKTLGTGSFGRVMLVKHMETGNHYAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPFLVKLEFSFKDNSNLYMVMEYVPGGEMFSHLRRIGRFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPENLLIDQQGYIKVADFGFAKRVKGRTWTLCGTPEYLAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPFFADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNLLQVDLTKRFGNLKNGVNDIKNHKWFATTDWIAIYQRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSINEKCGKEFSEF. The pIC50 is 7.3. (3) The small molecule is O=C(NCC1CCNCC1)[C@@H]1CC[C@@H]2CN1C(=O)N2OS(=O)(=O)O. The target protein sequence is MKKFILPIFSISILVSLSACSSIKTKSEDNFHISSQQHEKAIKSYFDEAQTQGVIIIKEGKNLSTYGNALARANKEYVPASTFKMLNALIGLENHKATTNEIFKWDGKKRTYPMWEKDMTLGEAMALSAVPVYQELARRTGLELMQKEVKRVNFGNTNIGTQVDNFWLVGPLKITPVQEVNFADDLAHNRLPFKLETQEEVKKMLLIKEVNGSKIYAKSGWGMGVTPQVGWLTGWVEQANGKKIPFSLNLEMKEGMSGSIRNEITYKSLENLGII. The pIC50 is 4.3. (4) The compound is O=C(N[C@@H]1CNCCC[C@H]1OC(=O)c1cc(O)c(Cc2c(O)cccc2C(=O)O)c(O)c1)c1ccc(O)cc1.
The target protein (P05771) has sequence MADPAAGPPPSEGEESTVRFARKGALRQKNVHEVKNHKFTARFFKQPTFCSHCTDFIWGFGKQGFQCQVCCFVVHKRCHEFVTFSCPGADKGPASDDPRSKHKFKIHTYSSPTFCDHCGSLLYGLIHQGMKCDTCMMNVHKRCVMNVPSLCGTDHTERRGRIYIQAHIDRDVLIVLVRDAKNLVPMDPNGLSDPYVKLKLIPDPKSESKQKTKTIKCSLNPEWNETFRFQLKESDKDRRLSVEIWDWDLTSRNDFMGSLSFGISELQKASVDGWFKLLSQEEGEYFNVPVPPEGSEANEELRQKFERAKISQGTKVPEEKTTNTVSKFDNNGNRDRMKLTDFNFLMVLGKGSFGKVMLSERKGTDELYAVKILKKDVVIQDDDVECTMVEKRVLALPGKPPFLTQLHSCFQTMDRLYFVMEYVNGGDLMYHIQQVGRFKEPHAVFYAAEIAIGLFFLQSKGIIYRDLKLDNVMLDSEGHIKIADFGMCKENIWDGVTTKT.... The pIC50 is 5.3. (5) The compound is COCCOc1ccn2c(-c3ccc4cccc(O[C@@H]5CCNC[C@H]5F)c4n3)cnc2c1. The target protein sequence is MGLPGVIPALVLRGQLLLSVLWLLGPQTSRGLVITPPGPEFVLNISSTFVLTCSGSAPVMWEQMSQVPWQEAAMNQDGTFSSVLTLTNVTGGDTGEYFCVYNNSLGPELSERKRIYIFVPDPTMGFLPMDSEDLFIFVTDVTETTIPCRVTDPQLEVTLHEKKVDIPLHVPYDHQRGFTGTFEDKTYICKTTIGDREVDSDTYYVYSLQVSSINVSVNAVQTVVRQGESITIRCIVMGNDVVNFQWTYPRMKSGRLVEPVTDYLFGVPSRIGSILHIPTAELSDSGTYTCNVSVSVNDHGDEKAINISVIENGYVRLLETLGDVEIAELHRSRTLRVVFEAYPMPSVLWLKDNRTLGDSGAGELVLSTRNMSETRYVSELILVRVKVSEAGYYTMRAFHEDDEVQLSFKLQVNVPVRVLELSESHPANGEQTIRCRGRGMPQPNVTWSTCRDLKRCPRKLSPTPLGNSSKEESQLETNVTFWEEDQEYEVVSTLRLRHVD.... The pIC50 is 8.5. (6) The compound is N#CC(c1ccc(C(F)(F)F)cc1)c1ccc(C(F)(F)F)c(Cl)n1. The pIC50 is 5.6. The target protein sequence is MGMRTVLTGLAGMLLGSMMPVQADMPRPTGLAADIRWTAYGVPHIRAKDERGLGYGIGYAYARDNACLLAEEIVTARGERARYFGSEGKSSAELDNLPSDIFYAWLNQPEALQAFWQAQTPAVRQLLEGYAAGFNRFLREADGKTTSCLGQPWLRAIATDDLLRLTRRLLVEGGVGQFADALVAAAPPGTEKVALSGEQAFQVAEQRRQRFRLERGSNAIAVGSERSADGKGMLLANPHFPWNGAMRFYQMHLTIPGRLDVMGASLPGLPVVNIGFSRHLAWTHTVDTSSHFTLYRLALDPKDPRRYLVDGRSLPLEEKSVAIEVRGADGKLSRVEHKVYQSIYGPLVVWPGKLDWNRSEAYALRDANLENTRVLQQWYSINQASDVADLRRRVEALQGIPWVNTLAADEQGNALYMNQSVVPYLKPELIPACAIPQLVAEGLPALQGQDSRCAWSRDPAAAQAGITPAAQLPVLLRRDFVQNSNDSAWLTNPASPLQGF....
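
The target definition stated at the top, pIC50 = -log10(IC50 in M), can be sketched as a pair of conversion helpers. This is a minimal sketch: the function names `ic50_to_pic50` and `pic50_to_ic50_nm` are illustrative assumptions and are not part of the bindingdb_ic50 dataset or any BindingDB tooling.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert a pIC50 back to an IC50 expressed in nanomolar."""
    ic50_molar = 10.0 ** (-pic50)   # invert the logarithm, result in mol/L
    return ic50_molar * 1e9         # 1 M = 1e9 nM


if __name__ == "__main__":
    # An IC50 of 1 micromolar (1e-6 M) corresponds to a pIC50 of 6.
    print(ic50_to_pic50(1e-6))
    # The pIC50 of 6.5 listed for example (1), expressed in nM.
    print(pic50_to_ic50_nm(6.5))
```

Under this definition, the pIC50 of 6.5 in example (1) corresponds to an IC50 of roughly 316 nM, and each unit increase in pIC50 is a tenfold increase in potency.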