Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The drug is Cc1cn(C2CN(C)C2)c2c1C(=O)N(c1cc(C)c3nnn(C)c3n1)C2c1ccc(Cl)cc1. The target protein sequence is NPPPPETSNPNKPKRQTNQLQYLLRVVLKTLWKHQFAWPFQQPVDAVKLNLPDYYKIIKTPMDMGTIKKRLENNYYWNAQECIQDFNTMFTNCYIYNKPGDDIVLMAEALEKLFLQKINELPTEETEIMIVQAKGRGRGRKETGTAKPGVSTVPNTTQASTPPQTQTPQPNPPPVQATPHPFPAVTPDLIVQTPVMTVVPPQPLQTPPPVPPQPQPPPAPAPQPVQSHPPIIAATPQPVKTKKGVKRKADTTTPTTIDPIHEPPSLPPEPKTTKLGQRRESSRPVKPPKKDVPDSQQHPAPEKSSKVSEQLKCCSGILKEMFAKKHAAYAWPFYKPVDVEALGLHDYCDIIKHPMDMSTIKSKLEAREYRDAQEFGADVRLMFSNCYKYNPPDHEVVAMARKLQDVFEMRFAKMPDEPEEPVVAVSSPAVPPPT. The pIC50 is 7.8. (2) The drug is COC(=O)[C@H](Cc1ccc(-c2ccccc2)cc1)NC(=O)CCCCCCCC(=O)NO. The target protein (Q4QQW4) has sequence MAQTQGTKRKVCYYYDGDVGNYYYGQGHPMKPHRIRMTHNLLLNYGLYRKMEIYRPHKANAEEMTKYHSDDYIKFLRSIRPDNMSEYSKQMQRFNVGEDCPVFDGLFEFCQLSTGGSVASAVKLNKQQTDIAVNWAGGLHHAKKSEASGFCYVNDIVLAILELLKYHQRVLYIDIDIHHGDGVEEAFYTTDRVMTVSFHKYGEYFPGTGDLRDIGAGKGKYYAVNYPLRDGIDDESYEAIFKPVMSKVMEMFQPSAVVLQCGSDSLSGDRLGCFNLTIKGHAKCVEFVKSFNLPMLMLGGGGYTIRNVARCWTYETAVALDTEIPNELPYNDYFEYFGPDFKLHISPSNMTNQNTNEYLEKIKQRLFENLRMLPHAPGVQMQAIPEDAIPEESGDEDEEDPDKRISICSSDKRIACEEEFSDSDEEGEGGRKNSSNFKKAKRVKTEDEKEKDPEEKKEVTEEEKTKEEKPEAKGVKEEVKMA. The pIC50 is 5.8. (3) The small molecule is CN(CCOC(c1ccc(C(F)(F)F)cc1)c1ccc(C(F)(F)F)cc1)[C@H]1CCCC[C@H]1CO. 
The target protein (P31649) has sequence MENRASGTTSNGETKPVCPAMEKVEEDGTLEREHWNNKMEFVLSVAGEIIGLGNVWRFPYLCYKNGGGAFFIPYLIFLFTCGIPVFFLETALGQYTNQGGITAWRRICPIFEGIGYASQMIVSLLNVYYIVVLAWALFYLFSSFTTDLPWGSCSHEWNTENCVEFQKANDSMNVTSENATSPVIEFWERRVLKLSDGIQHLGSLRWELVLCLLLAWIICYFCIWKGVKSTGKVVYFTATFPYLMLVVLLIRGVTLPGAAQGIQFYLYPNITRLWDPQVWMDAGTQIFFSFAICLGCLTALGSYNKYHNNCYRDCIALCILNSSTSFMAGFAIFSILGFMSQEQGVPISEVAESGPGLAFIAYPRAVVMLPFSPLWACCFFFMVVLLGLDSQFVCVESLVTALVDMYPRVFRKKNRREVLILIVSVISFFIGLIMLTEGGMYVFQLFDYYAASGMCLLFVAIFESLCVAWVYGAGRFYDNIEDMIGYKPWPLIKYCWLFFT.... The pIC50 is 3.5. (4) The target protein (Q99884) has sequence MKKLQGAHLRKPVTPDLLMTPSDQGDVDLDVDFAAHRGNWTGKLDFLLSCIGYCVGLGNVWRFPYRAYTNGGGAFLVPYFLMLAICGIPLFFLELSLGQFSSLGPLAVWKISPLFKGAGAAMLLIVGLVAIYYNMIIAYVLFYLFASLTSDLPWEHCGNWWNTELCLEHRVSKDGNGALPLNLTCTVSPSEEYWSRYVLHIQGSQGIGSPGEIRWNLCLCLLLAWVIVFLCILKGVKSSGKVVYFTATFPYLILLMLLVRGVTLPGAWKGIQFYLTPQFHHLLSSKVWIEAALQIFYSLGVGFGGLLTFASYNTFHQNIYRDTFIVTLGNAITSILAGFAIFSVLGYMSQELGVPVDQVAKAGPGLAFVVYPQAMTMLPLSPFWSFLFFFMLLTLGLDSQFAFLETIVTAVTDEFPYYLRPKKAVFSGLICVAMYLMGLILTTDGGMYWLVLLDDYSASFGLMVVVITTCLAVTRVYGIQRFCRDIHMMLGFKPGLYFRA.... The pIC50 is 5.0. The small molecule is O=C(Nc1ccc(C(=O)N2CCN(c3ncccn3)CC2)cc1)OCc1ccccc1. (5) The compound is N=C(N)NCCC[C@H](NC(=O)[C@@H]1CCC2CC[C@@](N)(Cc3ccccc3)CN21)C(=O)c1nc2ccccc2s1. The target protein (Q01177) has sequence MDHKEIILLFLLFLKPGQGDSLDGYVSTQGASLHSLTKKQLAAGSIADCLAKCEGETDFICRSFQYHSKEQQCVIMAENSKTSSIIRMRDVILFEKRVYLSECKTGIGKGYRGTMSKTKTGVTCQKWSDTSPHVPKYSPSTHPSEGLEENYCRNPDNDEQGPWCYTTDPDQRYEYCNIPECEEECMYCSGEKYEGKISKTMSGLDCQSWDSQSPHAHGYIPAKFPSKNLKMNYCRNPDGEPRPWCFTTDPNKRWEYCDIPRCTTPPPPPGPTYQCLKGRGENYRGTVSVTASGKTCQRWSEQTPHRHNRTPENFPCKNLEENYCRNPDGETAPWCYTTDSQLRWEYCEIPSCGSSVSPDQSDSSVLPEQTPVVQECYQGNGKSYRGTSSTTNTGKKCQSWVSMTPHSHSKTPANFPDAGLEMNYCRNPDNDQRGPWCFTTDPSVRWEYCNLKRCSETGGGVAESAIVPQVPSAPGTSETDCMYGNGKEYRGKTAVTAAGT.... The pIC50 is 6.6. (6) The drug is CC(C)NC(=O)Cn1nc(-c2ccc(CCN3CCOCC3)cc2)nc1-c1cccc(Cl)c1. 
The target protein (P47901) has sequence MDSGPLWDANPTPRGTLSAPNATTPWLGRDEELAKVEIGVLATVLVLATGGNLAVLLTLGQLGRKRSRMHLFVLHLALTDLAVALFQVLPQLLWDITYRFQGPDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLQQPGQSTYLLIAAPWLLAAIFSLPQVFIFSLREVIQGSGVLDCWADFGFPWGPRAYLTWTTLAIFVLPVTMLTACYSLICHEICKNLKVKTQAWRVGGGGWRTWDRPSPSTLAATTRGLPSRVSSINTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDKNAPDEDSTNVAFTISMLLGNLNSCCNPWIYMGFNSHLLPRPLRHLACCGGPQPRMRRRLSDGSLSSRHTTLLTRSSCPATLSLSLSLTLSGRPRPEESPRDLELADGEGTAETIIF. The pIC50 is 8.6. (7) The drug is O=c1[nH]cnc2c1nc(NCc1ccc(-c3ccccc3)c(OCCCCO)c1)n2[C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O. The target protein (Q62773) has sequence MAKSEGRKSASQDTSENGMENPGLELMEVGNLEQGKTLEEVTQGHSLKDGLGHSSLWRRILQPFTKARSFYQRHAGLFKKILLGLLCLAYAAYLLAACILNFRRALALFVITCLVIFILACHFLKKFFAKKSIRCLKPLKNTRLRLWLKRVFMGAAVVGLILWLALDTAQRPEQLISFAGICMFILILFACSKHHSAVSWRTVFWGLGLQFVFGILVIRTEPGFNAFQWLGDQIQIFLAYTVEGSSFVFGDTLVQSVFAFQSLPIIIFFGCVMSILYYLGLVQWVIQKIAWFLQITMGTTAAETLAVAGNIFVGMTEAPLLIRPYLADMTLSEIHAVMTGGFATIAGTVLGAFISFGIDASSLISASVMAAPCALALSKLVYPEVEESKFKSKEGVKLPRGEERNILEAASNGATDAIALVANVAANLIAFLAVLAFINSTLSWLGEMVDIHGLTFQVICSYVLRPMVFMMGVQWADCPLVAEIVGVKFFINEFVAYQQL.... The pIC50 is 3.0. (8) The drug is CN(C)C/C=C/C(=O)Nc1ccc2ncnc(Nc3cccc(Br)c3)c2c1. The target protein sequence is GEAPNQALLRILKETEFKKIKVLGSGAFGTVYKGLWIPEGEKVKIPVAIKELREATSPKANKEILDEAYVMASVDNPHVCRLLGICLTSTVQLIMQLMPFGCLLDYVREHKDNIGSQYLLNWCVQIAKGMNYLEDRRLVHRDLAARNVLVKTPQHVKITDFGLAKLLGAEEKEYHAEGGKVPIKWMALESILHRIYTHQSDVWSYGVTVWELMTFGSKPYDGIPASEISSILEKGERLPQPPICTIDVYMIMVKCWMIDADSRPKFRELIIEFSKMARDPQRYLVIQGDERMHLPSPTDSNFYRALMDEEDMDDVVDADEYLIPQQGFFSSPSTSRTPLLSSLSATSNNSTVACIDRNGLQSCPIKEDSFLQRYSSDPTGALTEDSIDDTFLPVPEYINQSVPKRPAGSVQNPVYHNQPLNPAPSRDPHYQDPHSTAVGNPEYLNTVQPTCVNSTFDSPAHWAQKGSHQISLDNPDYQQDFFPKEAKPNGIFKGSTAENA.... The pIC50 is 8.5. (9) The compound is O=C(Cn1c(-c2nccs2)nc2ccccc21)Nc1ccc2ccccc2c1. 
The target protein (Q8T6T2) has sequence MGTKNIGKGLTFEDILLVPNYSEVLPREVSLETKLTKNVSLKIPLISSAMDTVTEHLMAVGMARLGGIGIIHKNMDMESQVNEVLKVKNWISNLEKNESTPDQNLDKESTDGKDTKSNNNIDAYSNENLDNKGRLRVGAAIGVNEIERAKLLVEAGVDVIVLDSAHGHSLNIIRTLKEIKSKMNIDVIVGNVVTEEATKELIENGADGIKVGIGPGSICTTRIVAGVGVPQITAIEKCSSVASKFGIPIIADGGIRYSGDIGKALAVGASSVMIGSILAGTEESPGEKELIGDTVYKYYRGMGSVGAMKSGSGDRYFQEKRPENKMVPEGIEGRVKYKGEMEGVVYQLVGGLRSCMGYLGSASIEELWKKSSYVEITTSGLRESHVHDVEIVKEVMNYSK. The pIC50 is 8.1.
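
The pIC50 definition stated in the task, pIC50 = -log10(IC50 in M), can be sketched as a pair of conversions. This is a minimal illustration only; the function names `ic50_molar_to_pic50` and `pic50_to_ic50_nanomolar` are hypothetical and are not part of the dataset or of any BindingDB tooling.

```python
import math


def ic50_molar_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nanomolar(pic50: float) -> float:
    """Invert the definition and report the IC50 in nM (1 M = 1e9 nM)."""
    return 10.0 ** (-pic50) * 1e9


if __name__ == "__main__":
    # An IC50 of 10 nM (1e-8 M) corresponds to a pIC50 of 8.
    print(ic50_molar_to_pic50(1e-8))
    # The labels in the examples above can be read back as concentrations.
    for label in (7.8, 5.8, 3.5):
        print(label, pic50_to_ic50_nanomolar(label))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, which is why higher values mean more potent compounds.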