This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The small molecule is CC(C)C[C@@H](N=C(N)N)C(=O)NCC(=O)N1CCC(c2cc(-c3ccc(OCc4ccc(C(=O)O)cc4)c(Cl)c3Cl)nn2C)CC1. The target protein (P01590) has sequence MEPRLLMLGFLSLTIVPSCRAELCLYDPPEVPNATFKALSYKNGTILNCECKRGFRRLKELVYMRCLGNSWSSNCQCTSNSHDKSRKQVTAQLEHQKEQQTTTDMQKPTQSMHQENLTGHCREPPPWKHEDSKRIYHFVEGQSVHYECIPGYKALQRGPAISICKMKCGKTGWTQPQLTCVDEREHHRFLASEESQGSRNSSPESETSCPITTTDFPQPTETTAMTETFVLTMEYKVAVASCLFLLISILLLSGLTWQHRWRKSRRTI. The pIC50 is 6.7. (2) The compound is CC(C)(C)NC(=O)C1c2ccccc2C(=O)N1Cc1ccccc1-c1ccccc1. The target protein (P22460) has sequence MEIALVPLENGGAMTVRGGDEARAGCGQATGGELQCPPTAGLSDGPKEPAPKGRGAQRDADSGVRPLPPLPDPGVRPLPPLPEELPRPRRPPPEDEEEEGDPGLGTVEDQALGTASLHHQRVHINISGLRFETQLGTLAQFPNTLLGDPAKRLRYFDPLRNEYFFDRNRPSFDGILYYYQSGGRLRRPVNVSLDVFADEIRFYQLGDEAMERFREDEGFIKEEEKPLPRNEFQRQVWLIFEYPESSGSARAIAIVSVLVILISIITFCLETLPEFRDERELLRHPPAPHQPPAPAPGANGSGVMAPPSGPTVAPLLPRTLADPFFIVETTCVIWFTFELLVRFFACPSKAGFSRNIMNIIDVVAIFPYFITLGTELAEQQPGGGGGGQNGQQAMSLAILRVIRLVRVFRIFKLSRHSKGLQILGKTLQASMRELGLLIFFLFIGVILFSSAVYFAEADNQGTHFSSIPDAFWWAVVTMTTVGYGDMRPITVGGKIVGSLC.... The pIC50 is 5.0. (3) The compound is C[C@@H]1NC(=O)[C@@H]([C@@H](C)O)NC(=O)CNC(=O)[C@@H](Cc2cnc[nH]2)NC(=O)[C@H](Cc2c[nH]c3ccccc23)NC(=O)[C@@H](CC(N)=O)NC(=O)CNC(=O)C[C@H](C(=O)N[C@@H](Cc2c[nH]c3ccccc23)C(=O)N[C@@H](Cc2ccccc2)C(=O)N[C@@H](Cc2ccccc2)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](Cc2ccc(O)cc2)C(=O)N[C@@H](Cc2ccc(O)cc2)C(=O)N[C@H](Cc2c[nH]c3ccccc23)C(=O)O)NC(=O)[C@H]2CCCN2C1=O. 
The target protein (P21450) has sequence METFWLRLSFWVALVGGVISDNPESYSTNLSIHVDSVATFHGTELSFVVTTHQPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRWPFEQNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSRVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPFEYKGAQHRTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRRNGSLRIALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDEMDTNRCELLSFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPMNGTSIQWKNHEQNNHNTERSSHKDSIN. The pIC50 is 6.0. (4) The drug is CC(=O)n1c(O)c(-c2sc(=S)n(CCCCCC(=O)O)c2O)c2ccccc21. The target protein (Q29495) has sequence MSTPSVHCLKPSPLHLPSGIPGSPGRQRRHTLPANEFRCLTPEDAAGVFEIEREAFISVSGNCPLNLDEVQHFLTLCPELSLGWFVEGRLVAFIIGSLWDEERLTQESLALHRPRGHSAHLHALAVHRSFRQQGKGSVLLWRYLHHVGAQPAVRRAVLMCEDALVPFYQRFGFHPAGPCAIVVGSLTFTEMHCSLRGHAALRRNSDR. The pIC50 is 5.2. (5) The compound is C[C@]12CC[C@H](O)C[C@H]1CC[C@@H]1[C@@H]2CC[C@]2(C)[C@@H](/C=C/C=O)CC[C@]12O. The target protein (P50997) has sequence MGKGVGRDKYEPAAVSEHGDKKKAKKERDMDELKKEVSMDDHKLSLDELHRKYGTDLSRGLTTARAAEILARDGPNALTPPPTTPEWVKFCRQLFGGFSMLLWIGAILCFLAYGIQAATEEEPQNDNLYLGVVLSAVVIITGCFSYYQEAKSSKIMESFKNMVPQQALVIRNGEKMSINAEEVVIGDLVEVKGGDRIPADLRIISANGCKVDNSSLTGESEPQTRSPDFTNENPLETRNIAFFSTNCVKGTARGIVVYTGDRTVMGRIATLASGLEGGQTPIAAEIEHFIHIITGVAVFLGVSFFILSLILEYTWLEAVIFLIGIIVANVPEGLLATVTVCLTLTAKRMARKNCLVKNLEAVETLGSTSTICSDKTGTLTQNRMTVAHMWFDNQIHEADTTENQSGVSFDKSSATWLALSRIAGLCNRAVFQANQENLPILKRAVAGDASESALLKCIELCCGSVKEMRDRYAKIVEIPFNSTNKYQLSIHKNPNTSEPR.... The pIC50 is 6.6. (6) The small molecule is CCCCCCCCCCC(N)C(=O)N(CCCN(C)C)OCc1ccccc1. The target protein (P09598) has sequence MKKKVLALAAAITVVAPLQSVAFAHENDGGSKIKIVHRWSAEDKHKEGVNSHLWIVNRAIDIMSRNTTLVKQDRVAQLNEWRTELENGIYAADYENPYYDNSTFASHFYDPDNGKTYIPFAKQAKETGAKYFKLAGESYKNKDMKQAFFYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNWKGTNPEEWIHGAAVVAKQDYSGIVNDNTKDWFVKAAVSQEYADKWRAEVTPMTGKRLMDAQRVTAGYIQLWFDTYGDR. The pIC50 is 4.5. (7) The drug is O=C(O)[C@H]1CCc2c([nH]c3ccc(Cl)cc23)C1. 
The target protein (P0A988) has sequence MKFTVEREHLLKPLQQVSGPLGGRPTLPILGNLLLQVADGTLSLTGTDLEMEMVARVALVQPHEPGATTVPARKFFDICRGLPEGAEIAVQLEGERMLVRSGRSRFSLSTLPAADFPNLDDWQSEVEFTLPQATMKRLIEATQFSMAHQDVRYYLNGMLFETEGEELRTVATDGHRLAVCSMPIGQSLPSHSVIVPRKGVIELMRMLDGGDNPLRVQIGSNNIRAHVGDFIFTSKLVDGRFPDYRRVLPKNPDKHLEAGCDLLKQAFARAAILSNEKFRGVRLYVSENQLKITANNPEQEEAEEILDVTYSGAEMEIGFNVSYVLDVLNALKCENVRMMLTDSVSSVQIEDAASQSAAYVVMPMRL. The pIC50 is 3.4. (8) The compound is Cc1c(C(F)(F)F)nn(-c2ccc(S(C)(=O)=O)cn2)c1OC(C)C. The target protein sequence is MLARALVLCAALAVVRAANPCCSHPCQNQGICMSTGFDQYKCDCTRTGFYGENCSTPEFLTRIKLYLKPTPNTVHYILTHFKGVWNIVNNIPFLRNTIMKYVLTSRSHLIESPPTYNVNYGYKSWEAFSNLSYYTRALPPVPDDCPTPMGVKGKKELPDSKEIVEKFLLRRKFIPDPQGTNMMFAFFAQHFTHQFFKTDHKRGPAFTKGLGHGVDLNHVYGETLDRQHKLRLFKDGKMKYQVIDGEVYPPTVKDTQVEMIYPPHVPEHLQFAVGQEVFGLVPGLMMYATIWLREHNRVCDVLKQEHPEWDDERLFQTSRLILIGETIKIVIEDYVQHLSGYHFKLKFDPELLFNQQFQYQNRIAAEFNTLYHWHPLLPDTLQIDDQEYNFQQFIYNNSILLEHGLTQFVESFSRQIAGRVAGGRNVPAAVQQVAKASIDQSRQMKYQSLNEYRKRFRLKPYTSFEELTGEKEMAAGLEALYGDIDAMELYPALLVEKPRP.... The pIC50 is 6.3. (9) The drug is O=C(O)C(Cc1ccc(O)cc1)NC(=O)C(Cc1ccccc1)C(S)CCc1ccccc1. The target protein (P08049) has sequence MGRSESQMDITDINTPKPKKKQRWTPLEISLSVLVLLLTVIAVTMIALYATYDDGICKSSDCIKSAARLIQNMDATAEPCTDFFKYACGGWLKRNVIPETSSRYSNFDILRDELEVILKDVLQEPKTEDIVAVQKAKTLYRSCVNETAIDSRGGQPLLKLLPDVYGWPVATQNWEQTYGTSWSAEKSIAQLNSNYGKKVLINFFVGTDDKNSMNHIIHIDQPRLGLPSRDYYECTGIYKEACTAYVDFMIAVAKLIRQEEGLPIDENQISVEMNKVMELEKEIANATTKSEDRNDPMLLYNKMTLAQIQNNFSLEINGKPFSWSNFTNEIMSTVNINIPNEEDVVVYAPEYLIKLKPILTKYFPRDFQNLFSWRFIMDLVSSLSRTYKDSRNAFRKALYGTTSESATWRRCANYVNGNMENAVGRLYVEAAFAGESKHVVEDLIAQIREVFIQTLDDLTWMDAETKKKAEEKALAIKERIGYPDDIVSNDNKLNNEYLEL.... The pIC50 is 6.7. (10) The small molecule is COc1cc2sc(-c3c(-c4cccs4)n[nH]c3N)nc2cc1F. 
The target protein (Q13418) has sequence MDDIFTQCREGNAVAVRLWLDNTENDLNQGDDHGFSPLHWACREGRSAVVEMLIMRGARINVMNRGDDTPLHLAASHGHRDIVQKLLQYKADINAVNEHGNVPLHYACFWGQDQVAEDLVANGALVSICNKYGEMPVDKAKAPLRELLRERAEKMGQNLNRIPYKDTFWKGTTRTRPRNGTLNKHSGIDFKQLNFLTKLNENHSGELWKGRWQGNDIVVKVLKVRDWSTRKSRDFNEECPRLRIFSHPNVLPVLGACQSPPAPHPTLITHWMPYGSLYNVLHEGTNFVVDQSQAVKFALDMARGMAFLHTLEPLIPRHALNSRSVMIDEDMTARISMADVKFSFQCPGRMYAPAWVAPEALQKKPEDTNRRSADMWSFAVLLWELVTREVPFADLSNMEIGMKVALEGLRPTIPPGISPHVCKLMKICMNEDPAKRPKFDMIVPILEKMQDK. The pIC50 is 6.2.
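The pIC50 definition used above (pIC50 = -log10(IC50 in M)) can be sketched in Python as follows. This is a minimal illustration, not part of the dataset tooling: the function names `ic50_nm_to_pic50` and `pic50_to_ic50_nm` are hypothetical, and the nanomolar input unit is an assumption about how raw IC50 values might be stored.

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar (assumed unit) to pIC50.

    pIC50 = -log10(IC50 in M), so the value is first converted to molar.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    ic50_molar = ic50_nm * 1e-9  # nM -> M
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse conversion: pIC50 back to an IC50 in nanomolar."""
    return (10.0 ** (-pic50)) * 1e9  # M -> nM


if __name__ == "__main__":
    # A pIC50 of 6.0 corresponds to 1 micromolar (1000 nM).
    print(round(pic50_to_ic50_nm(6.0), 3))
    # Higher pIC50 means a lower IC50, i.e. a more potent compound.
    print(pic50_to_ic50_nm(6.7) < pic50_to_ic50_nm(5.0))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50.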