Dataset: bindingdb_ic50 (drug-target binding data from BindingDB, IC50 measurements). Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict their binding affinity as pIC50, where pIC50 = -log10(IC50 in M); higher values mean a more potent compound.
(1) The compound is CC(C)Cc1c(C(=O)C(N)=O)c2c(OCC(=O)NS(=O)(=O)c3ccccc3)cccc2n1Cc1ccccc1. The target protein (Q9QUL3) has sequence MKPPIALACLCLLVPLAGGNLVQFGVMIERMTGKPALQYNDYGCYCGVGGSHWPVDETDWCCHAHDCCYGRLEKLGCDPKLEKYLFSITRDNIFCAGRTACQRHTCECDKRAALCFRHNLNTYNRKYAHYPNKLCTGPTPPC. The pIC50 is 5.8.
(2) The drug is C[C@]1(/C=C\c2cncs2)[C@H](C(=O)[O-])N2C(=O)C[C@H]2S1(=O)=O. The target protein (P05193) has sequence MMKKSICCALLLTASFSTFAAAKTEQQIADIVNRTITPLMQEQAIPGMAVAIIYEGKPYYFTWGKADIANNHPVTQQTLFELGSVSKTFNGVLGGDRIARGEIKLSDPVTKYWPELTGKQWRGISLLHLATYTAGGLPLQIPGDVTDKAELLRFYQNWQPQWTPGAKRLYANSSIGLFGALAVKSSGMSYEEAMTRRVLQPLKLAHTWITVPQSEQKNYAWGYLEGKPVHVSPGQLDAEAYGVKSSVIDMARWVQANMDASHVQEKTLQQGIELAQSRYWRIGDMYQGLGWEMLNWPLKADSIINGSDSKVALAALPAVEVNPPAPAVKASWVHKTGSTGGFGSYVAFVPEKNLGIVMLANKSYPNPARVEAAWRILEKLQ. The pIC50 is 6.4.
(3) The drug is C[C@H](NC(=O)OCc1ccccc1)C(=O)N[C@@H](C)C(=O)NN(CC(N)=O)C(=O)C=CC(=O)N(C)c1ccccc1. The target protein sequence is MMLFSLFLISILHILLVKCQLDTNYEVSDETVSDNNKWAVLVAGSNGYPNYRHQADVCHAYHVLRSKGIKPEHIITMMYDDIAYNLMNPFPGKLFNDYNHKDWYEGVVIDYRGKKVNSKTFLKVLKGDKSAGGKVLKSGKNDDVFIYFTDHGAPGLIAFPDDELYAKQFMSTLKYLHSHKRYSKLVIYIEACESGSMFQRILPSNLSIYATTAASPTESSYGTFCDDPTITTCLADLYSYDWIVDSQTHHLTQRTLDQQYKEVKRETNLSHVQRYGDTRMGKLHVSEFQGSRDKSSTENDEPPMKPRHSIASRDIPLHTLHRQIMMTNNAEDKSFLMQILGLKLKRRDLIEDTMKLIVKVMNNEEIPNTKATIDQTLDCTESVYEQFKSKCFTLQQAPEVGGHFSTLYNYCADGYTAETINEAIIKICG. The pIC50 is 7.2.
(4) The small molecule is CC(=O)Nc1nc(C)c(-c2cnc(Nc3cccc(C(=O)O)c3)o2)s1. The target protein (Q9JHG7) has sequence MELENYEQPVVLREDNLRRRRRMKPRSAAGSLSSMELIPIEFVLPTSQRISKTPETALLHVAGHGNVEQMKAQVWLRALETSVAAEFYHRLGPDQFLLLYQKKGQWYEIYDRYQVVQTLDCLHYWKLMHKSPGQIHVVQRHVPSEETLAFQKQLTSLIGYDVTDISNVHDDELEFTRRRLVTPRMAEVAGRDAKLYAMHPWVTSKPLPDYLSKKIANNCIFIVIHRGTTSQTIKVSADDTPGTILQSFFTKMAKKKSLMNISESQSEQDFVLRVCGRDEYLVGETPLKNFQWVRQCLKNGDEIHLVLDTPPDPALDEVRKEEWPLVDDCTGVTGYHEQLTIHGKDHESVFTVSLWDCDRKFRVKIRGIDIPVLPRNTDLTVFVEANIQHGQQVLCQRRTSPKPFAEEVLWNVWLEFGIKIKDLPKGALLNLQIYCCKTPSLSSKASAETPGSESKGKAQLLYYVNLLLIDHRFLLRHGDYVLHMWQISGKAEEQGSFNAD.... The pIC50 is 5.0.
(5) The compound is Cn1cc(-c2cc(-c3ccc(N4CCN(Cc5ccccc5F)CC4)nc3)c3c(C#N)cnn3c2)cn1. The target protein sequence is HCYHKFAHKPPISSAEMTFRRPAQAFPVSYSSSGARRPSLDSMENQVSVDAFKILEDPKWEFPRKNLVLGKTLGEGEFGKVVKATAFHLKGRAGYTTVAVKMLKENASPSELRDLLSEFNVLKQVNHPHVIKLYGACSQDGPLLLIMEYAKYGSLRGFLRESRKVGPGYLGSGGSRNSSSLDHPDERALTMGDLISFAWQISQGMQYLAEMKLVHRDLAARNILVAEGRKMKISDFGLSRDVYEEDSYVKRSQGRIPVKWMAIESLFDHIYTTQSDVWSFGVLLWEIVTLGGNPYPGIPPERLFNLLKTGHRMERPDNCSEEMYRLMLQCWKQEPDKRPVFADISKDLEKMMVKRRDYLDLAASTPSDSLIYDDGLSEEETPLVDCNNAPLPRALPSTWIENKLYGMSDPNWPGESPVPLTRADGTNTGFPRYPNDSVYANWMLSPSAAKLMDTFDS. The pIC50 is 7.6.
(6) The compound is NC(N)=Nc1cc(N)cc(C(=O)O)c1. The target protein (P03472) has sequence MNPNQKILCTSATALVIGTIAVLIGITNLGLNIGLHLKPSCNCSHSQPEATNASQTIINNYYNDTNITQISNTNIQVEERAIRDFNNLTKGLCTINSWHIYGKDNAVRIGEDSDVLVTREPYVSCDPDECRFYALSQGTTIRGKHSNGTIHDRSQYRALISWPLSSPPTVYNSRVECIGWSSTSCHDGKTRMSICISGPNNNASAVIWYNRRPVTEINTWARNILRTQESECVCHNGVCPVVFTDGSATGPAETRIYYFKEGKILKWEPLAGTAKHIEECSCYGERAEITCTCRDNWQGSNRPVIRIDPVAMTHTSQYICSPVLTDNPRPNDPTVGKCNDPYPGNNNNGVKGFSYLDGVNTWLGRTISIASRSGYEMLKVPNALTDDKSKPTQGQTIVLNTDWSGYSGSFMDYWAEGECYRACFYVELIRGRPKEDKVWWTSNSIVSMCSSTEFLGQWDWPDGAKIEYFL. The pIC50 is 2.7.
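As a sanity check on the pIC50 definition (pIC50 = -log10(IC50 in M)), the minimal Python sketch below converts between IC50 and pIC50. For example, a pIC50 of 5.8 corresponds to an IC50 of 10^-5.8 M, about 1.6 µM, while a pIC50 of 2.7 corresponds to roughly 2 mM. The function names are hypothetical helpers for illustration only, not part of any BindingDB tooling, and the nanomolar wrapper assumes the raw measurement is reported in nM, which should be checked for any given record.

```python
import math


def pic50_from_ic50(ic50_molar: float) -> float:
    """Hypothetical helper: pIC50 = -log10(IC50), with IC50 in molar (M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def ic50_from_pic50(pic50: float) -> float:
    """Hypothetical helper: invert the definition, IC50 (M) = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Hypothetical wrapper assuming IC50 is given in nanomolar (1 nM = 1e-9 M)."""
    return pic50_from_ic50(ic50_nm * 1e-9)


# Convert a few pIC50 labels from the examples back to IC50 in micromolar.
for label, p in [("(1)", 5.8), ("(5)", 7.6), ("(6)", 2.7)]:
    ic50_um = ic50_from_pic50(p) * 1e6
    print(f"{label} pIC50 {p} -> IC50 {ic50_um:.3g} uM")
```

Because the scale is logarithmic, each unit of pIC50 is a tenfold change in IC50, so example (5) (pIC50 7.6) is roughly 10^4.9, or about 80,000 times, more potent than example (6) (pIC50 2.7).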