This data is from Drug-target binding data from BindingDB using IC50 measurements. The task is: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The small molecule is CC[C@H](C)[C@@H]1NC(=O)CNC(=O)[C@H](CC(=O)O)NC(=O)[C@@H]2CSSC[C@@H]3NC(=O)[C@H](CC(C)C)NC(=O)[C@H](Cc4c[nH]c5ccccc45)NC(=O)[C@@H]4CCCN4C(=O)[C@H](CC(=O)O)NC(=O)[C@H](CSSC[C@H](N)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(=O)N[C@@H](Cc4cnc[nH]4)C(=O)N[C@@H](Cc4ccc(O)cc4)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N2)NC(=O)[C@H](CSSC[C@@H](C(=O)N[C@@H](CC(C)C)C(=O)O)NC(=O)[C@H](Cc2ccccc2)NC(=O)CNC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@H]([C@@H](C)CC)NC(=O)[C@@H]2CCCN2C(=O)[C@@H]2CCCN2C(=O)[C@H]([C@@H](C)O)NC3=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CC(N)=O)NC(=O)[C@H]([C@@H](C)CC)NC1=O. The target protein (P56634) has sequence QKDANFASGRNSIVHLFEWKWNDIADECERFLQPQGFGGVQISPPNEYLVADGRPWWERYQPVSYIINTRSGDESAFTDMTRRCNDAGVRIYVDAVINHMTGMNGVGTSGSSADHDGMNYPAVPYGSGDFHSPCEVNNYQDADNVRNCELVGLRDLNQGSDYVRGVLIDYMNHMIDLGVAGFRVDAAKHMSPGDLSVIFSGLKNLNTDYGFADGARPFIYQEVIDLGGEAISKNEYTGFGCVLEFQFGVSLGNAFQGGNQLKNLANWGPEWGLLEGLDAVVFVDNHDNQRTGGSQILTYKNPKPYKMAIAFMLAHPYGTTRIMSSFDFTDNDQGPPQDGSGNLISPGINDDNTCSNGYVCEHRWRQVYGMVGFRNAVEGTQVENWWSNDDNQIAFSRGSQGFVAFTNGGDLNQNLNTGLPAGTYCDVISGELSGGSCTGKSVTVGDNGSADISLGSAEDDGVLAIHVNAKL. The pIC50 is 5.6. (2) The small molecule is Nc1c(S(=O)(=O)[O-])cc(Nc2cccc3ccccc23)c2c1C(=O)c1ccccc1C2=O. The target protein (Q63371) has sequence MERDNGTIQAPGLPPTTCVYREDFKRLLLPPVYSVVLVVGLPLNVCVIAQICASRRTLTRSAVYTLNLALADLLYACSLPLLIYNYARGDHWPFGDLACRLVRFLFYANLHGSILFLTCISFQRYLGICHPLAPWHKRGGRRAAWVVCGVVWLVVTAQCLPTAVFAATGIQRNRTVCYDLSPPILSTRYLPYGMALTVIGFLLPFTALLACYCRMARRLCRQDGPAGPVAQERRSKAARMAVVVAAVFVISFLPFHITKTAYLAVRSTPGVSCPVLETFAAAYKGTRPFASANSVLDPILFYFTQQKFRRQPHDLLQKLTAKWQRQRV. The pIC50 is 4.5. (3) The drug is CCc1nc(N)nc(N)c1-c1ccccc1. 
The target protein (P06865) has sequence MTSSRLWFSLLLAAAFAGRATALWPWPQNFQTSDQRYVLYPNNFQFQYDVSSAAQPGCSVLDEAFQRYRDLLFGSGSWPRPYLTGKRHTLEKNVLVVSVVTPGCNQLPTLESVENYTLTINDDQCLLLSETVWGALRGLETFSQLVWKSAEGTFFINKTEIEDFPRFPHRGLLLDTSRHYLPLSSILDTLDVMAYNKLNVFHWHLVDDPSFPYESFTFPELMRKGSYNPVTHIYTAQDVKEVIEYARLRGIRVLAEFDTPGHTLSWGPGIPGLLTPCYSGSEPSGTFGPVNPSLNNTYEFMSTFFLEVSSVFPDFYLHLGGDEVDFTCWKSNPEIQDFMRKKGFGEDFKQLESFYIQTLLDIVSSYGKGYVVWQEVFDNKVKIQPDTIIQVWREDIPVNYMKELELVTKAGFRALLSAPWYLNRISYGPDWKDFYIVEPLAFEGTPEQKALVIGGEACMWGEYVDNTNLVPRLWPRAGAVAERLWSNKLTSDLTFAYERL.... The pIC50 is 4.5. (4) The small molecule is N#Cc1ccc2cc3c(=O)[nH]c(=O)nc-3n(-c3cccc(-n4cnnn4)c3)c2c1. The target protein sequence is MASGSSSDAAEPAGPAGRAASAPEAAQAEEDRVKRRRLQCLGFALVGGCDPTMVPSVLRENDWQTQKALSAYFELPENDQGWPRQPPTSFKSEAYVDLTNEDANDTTILEASPSGTPLEDSSTISFITWNIDGLDGCNLPERARGVCSCLALYSPDVVFLQEVIPPYCAYLKKRAASYTIITGNEEGYFTAILLKKGRVKFKSQEIIPFPNTKMMRNLLCVNVSLGGNEFCLMTSHLESTRGHAAERIRQLKTVLGKMQEAPDSTTVIFAGDTNLRDREVTRCGGLPDNVFDAWEFLGKPKHCQYTWDTKANNNLGITAACKLRFDRIFFRAEEGHLIPQSLDLVGLEKLDCGRFPSDHWGLLCTLNVVL. The pIC50 is 7.5. (5) The drug is COc1cc(N2CCNCC2)ccc1Nc1ncc(C(F)(F)F)c(CCc2ccccc2CC(N)=O)n1. The target protein sequence is DPGEVPLEEQCEYLSYDASQWEFPRERLHLGRVLGYGAFGKVVEASAFGIHKGSSCDTVAVKMLKEGATASEHRALMSELKILIHIGNHLNVVNLLGACTKPQGPLMVIVEFCKYGNLSNFLRAKRDAFSPSPLTMEDLVCYSFQVARGMEFLASRKCIHRDLAARNILLSESDVVKICDFGLARDIYKDPDYVRKGSARLPLKWMAPESIFDKVYTTQSDVWSFGVLLWEIFSLGASPYPGVQINEEFCQRLRDGTRMRAPELATPAIRRIMLNCWSGDPKARPAFSELVEILGDLLQGRG. The pIC50 is 6.6. (6) The small molecule is COc1cccc(CCCCCCc2ccccc2)c1C(=O)O. The target protein (P43680) has sequence MQRSPPGYGAQDDPPSRRDCAWAPGIGAAAEARGLPVTNVSPTSPASPSSLPRSPPRSPESGRYGFGRGERQTADELRIRRPMNAFMVWAKDERKRLAQQNPDLHNAVLSKMLGKAWKELNTAEKRPFVEEAERLRVQHLRDHPNYKYRPRRKKQARKVRRLEPGLLLPGLVQPSAPPEAFAAASGSARSFRELPTLGAEFDGLGLPTPERSPLDGLEPGEASFFPPPLAPEDCALRAFRAPYAPELARDPSFCYGAPLAEALRTAPPAAPLAGLYYGTLGTPGPFPNPLSPPPESPSLEGTEQLEPTADLWADVDLTEFDQYLNCSRTRPDATTLPYHVALAKLGPRAMSCPEESSLISALSDASSAVYYSACISG. The pIC50 is 3.5. 
(7) The compound is O=C1c2cc(CO)cc(O)c2C(=O)c2c1ccc(C1(C3O[C@H](CO)[C@@H](O)[C@H](O)[C@H]3O)c3cccc(O)c3C(=O)c3c(O)cc(CO)cc31)c2O. The target protein (P13601) has sequence MSSPAQPAVPAPLANLKIQHTKIFINNEWHNSLNGKKFPVINPATEEVICHVEEGDKADVDKAVKAARQAFQIGSPWRTMDASERGCLLNKLADLMERDRVLLATMESMNAGKIFTHAYLLDTEVSIKALKYFAGWADKIHGQTIPSDGDVFTYTRREPIGVCGQIIPWNGPLILFIWKIGAALSCGNTVIVKPAEQTPLTALYMASLIKEAGFPPGVVNVVPGYGSTAGAAISSHMDIDKVSFTGSTEVGKLIKEAAGKSNLKRVTLELGGKSPCIVFADADLDSAVEFAHQGVFFHQGQICVAASRLFVEESIYDEFVRRSVERAKKYVLGNPLDSGISQGPQIDKEQHAKILDLIESGKKEGAKLECGGGRWGNKGFFVQPTVFSNVTDEMRIAKEEIFGPVQQIMKFKSIDEVIKRANNTPYGLAAGVFTKDLDRAITVSSALQAGTVWVNCYLTLSVQCPFGGFKMSGNGREMGEQGVYEYTELKTVAMKISQKN.... The pIC50 is 6.3. (8) The drug is O=C(CNc1cccc(Cl)c1)N/N=C/c1ccccc1Cl. The target protein sequence is MNKISQRLLFLFLHFYTIVCFIQNNTQKTFHNVLHNEQIRGKEKAFYRKEKRENIFIGNKMKHLNNMNNTHNNNHYMEKEEQDASNIYKIKEENKNEDICFIAGIGDTNGYGWGIAKELSKRNVKIIFGIWPPVYNIFMKNYKNGKFDNDMIIDKDKKMNILDMLPFDASFDTANDIDEETKNNKRYNMLQNYTIEDVANLIHQKYGKINMLVHSLANAKEVQKDLLNTSRKGYLDALSKSSYSLISLCKYFVNIMKPQSSIISLTYHASQKVVPGYGGGMSSAKAALESDTRVLAYHLGRNYNIRINTISAGPLKSRAATAINKLNNTYENNTNQNKNRNSHDVHNIMNNSGEKEEKKNSASQNYTFIDYAIEYSEKYAPLRQKLLSTDIGSVASFLLSRESRAITGQTIYVDNGLNIMFLPDDIYRNENE. The pIC50 is 4.1. (9) The compound is N#CCCN. The target protein (B5DF27) has sequence MEIPFGSCLYSCLALLVLLPSLSLAQYESWPYQLQYPEYFQQPPPEHHQHQVPSDVVKIQVRLAGQKRKHNEGRVEVYYEGQWGTVCDDDFSIHAAHVVCREVGYVEAKSWTASSSYGPGEGPIWLDNIYCTGKESTLAACSSNGWGVTDCKHPEDVGVVCSEKRIPGFKFDNSLINQIESLNIQVEDIRIRPILSAFRHRKPVTEGYVEVKEGKAWKQICDKHWTAKNSHVVCGMFGFPAEKTYNPKAYKTFASRRKLRYWKFSMNCTGTEAHISSCKLGPPMFRDPVKNATCENGQPAVVSCVPSQIFSPDGPSRFRKAYKPEQPLVRLRGGAQVGEGRVEVLKNGEWGTVCDDKWDLVSASVVCRELGFGTAKEAVTGSRLGQGIGPIHLNEVQCTGTEKSIIDCKLNTESQGCNHEEDAGVRCNIPIMGFQKKVRLNGGRNPYEGRVEVLTERNGSLVWGNVCGQNWGIVEAMVVCRQLGLGFASNAFQETWYWHG.... The pIC50 is 6.9. (10) The drug is CC1=C(/C=N/NC(=S)Nc2ccccc2)C(C)(C)CC=C1. 
The target protein (P21397) has sequence MENQEKASIAGHMFDVVVIGGGISGLSAAKLLTEYGVSVLVLEARDRVGGRTYTIRNEHVDYVDVGGAYVGPTQNRILRLSKELGIETYKVNVSERLVQYVKGKTYPFRGAFPPVWNPIAYLDYNNLWRTIDNMGKEIPTDAPWEAQHADKWDKMTMKELIDKICWTKTARRFAYLFVNINVTSEPHEVSALWFLWYVKQCGGTTRIFSVTNGGQERKFVGGSGQVSERIMDLLGDQVKLNHPVTHVDQSSDNIIIETLNHEHYECKYVINAIPPTLTAKIHFRPELPAERNQLIQRLPMGAVIKCMMYYKEAFWKKKDYCGCMIIEDEDAPISITLDDTKPDGSLPAIMGFILARKADRLAKLHKEIRKKKICELYAKVLGSQEALHPVHYEEKNWCEEQYSGGCYTAYFPPGIMTQYGRVIRQPVGRIFFAGTETATKWSGYMEGAVEAGERAAREVLNGLGKVTEKDIWVQEPESKDVPAVEITHTFWERNLPSVSG.... The pIC50 is 4.7.
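The pIC50 definition stated at the top of this record (pIC50 = -log10(IC50 in M)) can be sketched in code as follows. This is a minimal illustration, not part of the dataset. The function names are hypothetical, and the nanomolar helper assumes the raw IC50 values are available in nM, which this record does not state.

```python
import math


def pic50_from_ic50_molar(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def ic50_molar_from_pic50(pic50: float) -> float:
    """Invert the definition: IC50 in mol/L = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


def pic50_from_ic50_nanomolar(ic50_nm: float) -> float:
    """Same conversion for an IC50 given in nM (assumed unit): 1 nM = 1e-9 M."""
    return pic50_from_ic50_molar(ic50_nm * 1e-9)


if __name__ == "__main__":
    # Labels taken from examples (4) and (3) above: pIC50 7.5 and 4.5.
    for label in (7.5, 4.5):
        ic50 = ic50_molar_from_pic50(label)
        print(f"pIC50 {label} -> IC50 of about {ic50:.3g} M")
```

Under this definition, each unit of pIC50 is a tenfold change in IC50. The label 7.5 in example (4) corresponds to roughly 3.2e-8 M (about 32 nM), and the label 4.5 in example (3) to roughly 3.2e-5 M (about 32 µM).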