This is drug-target binding data from BindingDB, using IC50 measurements. The task is regression: given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The pIC50 is 5.8. The target protein (Q6J4K2) has sequence MAGRRLNLRWALSVLCVLLMAETVSGTRGSSTGAHISPQFPASGVNQTPVVDCRKVCGLNVSDRCDFIRTNPDCHSDGGYLDYLEGIFCHFPPSLLPLAVTLYVSWLLYLFLILGVTAAKFFCPNLSAISTTLKLSHNVAGVTFLAFGNGAPDIFSALVAFSDPHTAGLALGALFGAGVLVTTVVAGGITILHPFMAASRPFFRDIVFYMVAVFLTFLMLFRGRVTLAWALGYLGLYVFYVVTVILCTWIYQRQRRGSLFCPMPVTPEILSDSEEDRVSSNTNSYDYGDEYRPLFFYQETTAQILVRALNPLDYMKWRRKSAYWKALKVFKLPVEFLLLLTVPVVDPDKDDQNWKRPLNCLHLVISPLVVVLTLQSGTYGVYEIGGLVPVWVVVVIAGTALASVTFFATSDSQPPRLHWLFAFLGFLTSALWINAAATEVVNILRSLGVVFRLSNTVLGLTLLAWGNSIGDAFSDFTLARQGYPRMAFSACFGGIIFNIL.... The compound is O=C1CSC(c2ccccc2Cl)c2cc(Cl)ccc2N1. (2) The compound is O=C(COc1ccc(C(=O)Nc2cccc(F)c2)c2ccccc12)Nc1ccc(O)cc1. The target protein (P20264) has sequence MATAASNPYLPGNSLLAAGSIVHSDAAGAGGGGGGGGGGGGGGAGGGGGGMQPGSAAVTSGAYRGDPSSVKMVQSDFMQGAMAASNGGHMLSHAHQWVTALPHAAAAAAAAAAAAVEASSPWSGSAVGMAGSPQQPPQPPPPPPQGPDVKGGAGRDDLHAGTALHHRGPPHLGPPPPPPHQGHPGGWGAAAAAAAAAAAAAAAAHLPSMAGGQQPPPQSLLYSQPGGFTVNGMLSAPPGPGGGGGGAGGGAQSLVHPGLVRGDTPELAEHHHHHHHHAHPHPPHPHHAQGPPHHGGGGGGAGPGLNSHDPHSDEDTPTSDDLEQFAKQFKQRRIKLGFTQADVGLALGTLYGNVFSQTTICRFEALQLSFKNMCKLKPLLNKWLEEADSSTGSPTSIDKIAAQGRKRKKRTSIEVSVKGALESHFLKCPKPSAQEITNLADSLQLEKEVVRVWFCNRRQKEKRMTPPGIQQQTPDDVYSQVGTVSADTPPPHHGLQTSVQ.... The pIC50 is 3.9. 
(3) The target protein (Q9P8Q7) has sequence MPYTPIDIQKEEADFQKEVAEIKKWWSEPRWRKTKRIYSAEDIAKKRGTLKINHPSSQQADKLFKLLETHDADKTVSFTFGALDPIHVAQMAKYLDSIYVSGWQCSSTASTSNEPSPDLADYPMDTVPNKVEHLWFAQLFHDRKQREERLTLSKEERAKTPYIDFLRPIIADADTGHGGITAIIKLTKMFIERGAAGIHIEDQAPGTKKCGHMAGKVLVPVQEHINRLVAIRASADIFGSNLLAVARTDSEAATLITSTIDHRDHYFIIGATNPEAGDLAALMAEAESKGIYGNELAAIESEWTKKAGLKLFHEAVIDEIKNGNYSNKDALIKKFTDKVNPLSHTSHKEAKKLAKELTGKDIYFNWDVARAREGYYRYQGGTQCAVMRGRAFAPYADLIWMESALPDYAQAKEFADGVKAAVPDQWLAYNLSPSFNWNKAMPADEQETYIKRLGKLGYVWQFITLAGLHTTALAVDDFSNQYSQIGMKAYGQTVQQPEIE.... The compound is O=C(O)C1=NCCc2c1[nH]c1ccc(O)cc21. The pIC50 is 3.4. (4) The compound is CCCC(C(=O)O)=C1O[C@@H]2CC(=O)N2C1C(=O)OC. The target protein (P0AD63) has sequence MRYIRLCIISLLATLPLAVHASPQPLEQIKLSESQLSGRVGMIEMDLASGRTLTAWRADERFPMMSTFKVVLCGAVLARVDAGDEQLERKIHYRQQDLVDYSPVSEKHLADGMTVGELCAAAITMSDNSAANLLLATVGGPAGLTAFLRQIGDNVTRLDRWETELNEALPGDARDTTTPASMAATLRKLLTSQRLSARSQRQLLQWMVDDRVAGPLIRSVLPAGWFIADKTGAGERGARGIVALLGPNNKAERIVVIYLRDTPASMAERNQQIAGIGAALIEHWQR. The pIC50 is 6.5. (5) The compound is Cc1cc(N)ccc1/N=N/c1cc(S(=O)(=O)O)c2cccc(S(=O)(=O)[O-])c2c1. The target protein (Q9BQF6) has sequence MDKRKLGRRPSSSEIITEGKRKKSSSDLSEIRKMLNAKPEDVHVQSPLSKFRSSERWTLPLQWERSLRNKVISLDHKNKKHIRGCPVTSKSSPERQLKVMLTNVLWTDLGRKFRKTLPRNDANLCDANKVQSDSLPSTSVDSLETCQKLEPLRQSLNLSERIPRVILTNVLGTELGRKYIRTPPVTEGSLSDTDNLQSEQLSSSSDGSLESYQNLNPHKSCYLSERGSQRSKTVDDNSAKQTAHNKEKRRKDDGISLLISDTQPEDLNSGSRGCDHLEQESRNKDVKYSDSKVELTLISRKTKRRLRNNLPDSQYCTSLDKSTEQTKKQEDDSTISTEFEKPSENYHQDPKLPEEITTKPTKSDFTKLSSLNSQELTLSNATKSASAGSTTETVENSNSIDIVGISSLVEKDENELNTIEKPILRGHNEGNQSLISAEPIVVSSDEEGPVEHKSSEILKLQSKQDRETTNENESTSESALLELPLITCESVQMSSELCPY.... The pIC50 is 5.7. (6) The drug is C=CS(=O)(=O)Nc1cccc(-c2nn(C(C)C)c3ncnc(N)c23)c1. 
The target protein sequence is QTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLAQLYAVVSEEPIYIVCEYMSKGSLLDFLKGEMGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQCWRKDPEERPTFEYLQAFLEDYFTSTEPQYQPGENL. The pIC50 is 7.2. (7) The small molecule is NC(=O)c1cc2c(SC3CCCCC3)cncc2s1. The target protein (P05362) has sequence MAPSSPRPALPALLVLLGALFPGPGNAQTSVSPSKVILPRGGSVLVTCSTSCDQPKLLGIETPLPKKELLLPGNNRKVYELSNVQEDSQPMCYSNCPDGQSTAKTFLTVYWTPERVELAPLPSWQPVGKNLTLRCQVEGGAPRANLTVVLLRGEKELKREPAVGEPAEVTTTVLVRRDHHGANFSCRTELDLRPQGLELFENTSAPYQLQTFVLPATPPQLVSPRVLEVDTQGTVVCSLDGLFPVSEAQVHLALGDQRLNPTVTYGNDSFSAKASVSVTAEDEGTQRLTCAVILGNQSQETLQTVTIYSFPAPNVILTKPEVSEGTEVTVKCEAHPRAKVTLNGVPAQPLGPRAQLLLKATPEDNGRSFSCSATLEVAGQLIHKNQTRELRVLYGPRLDERDCPGNWTWPENSQQTPMCQAWGNPLPELKCLKDGTFPLPIGESVTVTRDLEGTYLCRARSTQGEVTRKVTVNVLSPRYEIVIITVVAAAVIMGTAGLST.... The pIC50 is 6.1. (8) The small molecule is CCCCOc1ccc2cc(S(=O)(=O)N[C@H](CCC(=O)OC)C(=O)OC)ccc2c1. The target protein (P14900) has sequence MADYQGKNVVIIGLGLTGLSCVDFFLARGVTPRVMDTRMTPPGLDKLPEAVERHTGSLNDEWLMAADLIVASPGIALAHPSLSAAADAGIEIVGDIELFCREAQAPIVAITGSNGKSTVTTLVGEMAKAAGVNVGVGGNIGLPALMLLDDECELYVLELSSFQLETTSSLQAVAATILNVTEDHMDRYPFGLQQYRAAKLRIYENAKVCVVNADDALTMPIRGADERCVSFGVNMGDYHLNHQQGETWLRVKGEKVLNVKEMKLSGQHNYTNALAALALADAAGLPRASSLKALTTFTGLPHRFEVVLEHNGVRWINDSKATNVGSTEAALNGLHVDGTLHLLLGGDGKSADFSPLARYLNGDNVRLYCFGRDGAQLAALRPEVAEQTETMEQAMRLLAPRVQPGDMVLLSPACASLDQFKNFEQRGNEFARLAKELG. The pIC50 is 2.7. (9) The small molecule is O=C1C(Cl)=C(Cl)C(=O)N1c1ccccc1Cl. The target protein sequence is MAESELMHIHSLAEHYLQYVLQVPAFESAPSQACRVLQRVAFSVQKEVEKNLKSYLDDFHVESIDTARIIFNQVMEKEFEDGIINWGRIVTIFAFGGVLLKKLPQEQIALDVCAYKQVSSFVAEFIMNNTGEWIRQNGGWEDGFIKKFEPKSGWLTFLQMTGQIWEMLFLLK. The pIC50 is 5.4.
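The pIC50 definition used in the records above (pIC50 = -log10(IC50 in M)) can be sketched as a pair of conversion helpers. This is a minimal illustration only; the function names `ic50_to_pic50`, `pic50_to_ic50`, and `ic50_nm_to_pic50` are hypothetical and not part of the dataset, and the nanomolar helper assumes a raw IC50 reported in nM, which this record does not state.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_to_ic50(pic50: float) -> float:
    """Invert the transform: return the IC50 in mol/L for a given pIC50."""
    return 10.0 ** (-pic50)


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Assumption: the raw IC50 is in nanomolar, so scale to mol/L first."""
    return ic50_to_pic50(ic50_nm * 1e-9)


if __name__ == "__main__":
    # pIC50 labels taken from entries (6) and (8) above.
    for label in (7.2, 2.7):
        print(f"pIC50 {label} -> IC50 {pic50_to_ic50(label):.3g} M")
```

Under this definition, the pIC50 of 7.2 in entry (6) corresponds to an IC50 of roughly 63 nM, while the 2.7 in entry (8) corresponds to roughly 2 mM, which is why higher values mean more potent compounds.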