This data comes from BindingDB drug-target binding data based on IC50 measurements. The task is regression: given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity between them. The predicted value is pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50. (1) The compound is COc1ccc(C(C)=NOC(N)=O)cc1OC1CCCC1. The target protein sequence is MEKLSYHSICTSEEWQGLMQFTLPVRLCKEIELFHFDIGPFENMWPGIFVYMVHRSCGTSCFELEKLCRFIMSVKKNYRRVPYHNWKHAVTVAHCMYAILQNNHTLFTDLERKGLLIACLCHDLDHRGFSNSYLQKFDHPLAALYSTSTMEQHHFSQTVSILQLEGHNIFSTLSSSEYEQVLEIIRKAIIATDLALYFGNRKQLEEMYQTGSLNLNNQSHRDRVIGLMMTACDLCSVTKLWPVTKLTANDIYAEFWAEGDEMKKLGIQPIPMMDRDKKDEVPQGQLGFYNAVAIPCYTTLTQILPPTEPLLKACRDNLSQWEKVIRGEETATWISSPSVAQKAAASED. The pIC50 is 4.6. (2) The compound is COc1cc(C=Nc2nnc(S)s2)ccc1O. The target protein sequence is LIPFDDAVGPTEFSPFDQWTGYCTHGSTLFPTWHRPYVLILEQILSGHAQQIADTYTVNKSEWKKAATEFRHPYWDWASNSVPPPEVISLPKVTITTPNGQKTSVANPLMRYTFNSVNDGGFYGPYNQWDTTLRQPDSTGVNAKDNVNRLKSVLKNAQASLTRATYDMFNRVTTWPHFSSHTPASGGSTSNSIEAIHDNIHVLVGGNGHMSDPSVAPFDPIFFLHHANVDRLIALWSAIRYDVWTSPGDAQFGTYTLRYKQSVDESTDLAPWWKTQNEYWKSNELRSTESLGYTYPEFVGLDMYNKDAVNKTISRKVAQLYGPQRGGQRSLVEDLSNSHARRSQRPAKRSRLGQLLKGLFSDWSAQIKFNRHEVGQSFSVCLFLGNVPEDPREWLVSPNLVGARHAFVRSVKTDHVAEEIGFIPINQWIAEHTGLPSFAVDLVKPLLAQGLQWRVLLADGTPAELDSLEVTILEVPSELTDDEPNPRSRPPRYHKDITHG.... The pIC50 is 7.4. (3) The drug is Cn1cc(/C=C/C(=O)c2cc(F)cc(F)c2)cc1/C=C/C(=O)NO. The target protein sequence is MAASGEGVSLPSPAGGEDAHRRRVSYFYEPSIGDYYYGQGHPMKPHRIRMAHSLVVHYGLHRLLELSRPYPASEADIRRFHSDDYVAFLASATGNPGVLDPRAIKRFNVGEDCPVFDGLFPFCQASAGGSIGAAVKLNRGDADITVNWAGGLHHAKKSEASGFCYVNDIVLAILELLKFHRRVLYVDIDVHHGDGVEEAFFTTNRVMTVSFHKYGDFFPGTGHITDVGAAEGKHYALNVPLSDGIDDTTFRGLFQCIIKKVMEVYQPDVVVLQCGADSLAGDRLGCFNLSVKGHADCLRFLRSYNVPMMVLGGGGYTIRNVARCWCYETAVAVGVEPDNKLPYNDYYEYFGPDYTLHIQPKSVENLNTTKDLENIKNMILENLSKIEHVPSTQFHDRPSDPEAPEEKEEDMDKRPPQRSRLWSGGAYDSDTEDPDSLKSEGKDVTANFQMKDEPKDDL. The pIC50 is 4.6.
(4) The small molecule is COc1ccc(-c2nnc(Sc3ccc([N+](=O)[O-])c4nonc34)o2)cc1. The target protein (Q03330) has sequence MVTKHQIEEDHLDGATTDPEVKRVKLENNVEEIQPEQAETNKQEGTDKENKGKFEKETERIGGSEVVTDVEKGIVKFEFDGVEYTFKERPSVVEENEGKIEFRVVNNDNTKENMMVLTGLKNIFQKQLPKMPKEYIARLVYDRSHLSMAVIRKPLTVVGGITYRPFDKREFAEIVFCAISSTEQVRGYGAHLMNHLKDYVRNTSNIKYFLTYADNYAIGYFKKQGFTKEITLDKSIWMGYIKDYEGGTLMQCSMLPRIRYLDAGKILLLQEAALRRKIRTISKSHIVRPGLEQFKDLNNIKPIDPMTIPGLKEAGWTPEMDALAQRPKRGPHDAAIQNILTELQNHAAAWPFLQPVNKEEVPDYYDFIKEPMDLSTMEIKLESNKYQKMEDFIYDARLVFNNCRMYNGENTSYYKYANRLEKFFNNKVKEIPEYSHLID. The pIC50 is 6.0. (5) The compound is Cc1cc(COc2ccc(S(=O)(=O)CC(C=C3CCC(=O)CC3)N(O)C=O)cc2)c2ccccc2n1. The pIC50 is 5.6. The target protein (Q9ULZ9) has sequence MRRRAARGPGPPPPGPGLSRLPLPLLLLLALGTRGGCAAPAPAPRAEDLSLGVEWLSRFGYLPPADPTTGQLQTQEELSKAITAMQQFGGLEATGILDEATLALMKTPRCSLPDLPVLTQARRRRQAPAPTKWNKRNLSWRVRTFPRDSPLGHDTVRALMYYALKVWSDIAPLNFHEVAGSAADIQIDFSKADHNDGYPFDGPGGTVAHAFFPGHHHTAGDTHFDDDEAWTFRSSDAHGMDLFAVAVHEFGHAIGLSHVAAAHSIMRPYYQGPVGDPLRYGLPYEDKVRVWQLYGVRESVSPTAQPEEPPLLPEPPDNRSSAPPRKDVPHRCSTHFDAVAQIRGEAFFFKGKYFWRLTRDRHLVSLQPAQMHRFWRGLPLHLDSVDAVYERTSDHKIVFFKGDRYWVFKDNNVEEGYPRPVSDFSLPPGGIDAAFSWAHNDRTYFFKDQLYWRYDDHTRHMDPGYPAQSPLWRGVPSTLDDAMRWSDGASYFFRGQEYWK.... (6) The target protein (P10499) has sequence MTVMSGENADEASAAPGHPQDGSYPRQADHDDHECCERVVINISGLRFETQLKTLAQFPNTLLGNPKKRMRYFDPLRNEYFFDRNRPSFDAILYYYQSGGRLRRPVNVPLDMFSEEIKFYELGEEAMEKFREDEGFIKEEERPLPEKEYQRQVWLLFEYPESSGPARVIAIVSVMVILISIVIFCLETLPELKDDKDFTGTIHRIDNTTVIYTSNIFTDPFFIVETLCIIWFSFELVVRFFACPSKTDFFKNIMNFIDIVAIIPYFITLGTEIAEQEGNQKGEQATSLAILRVIRLVRVFRIFKLSRHSKGLQILGQTLKASMRELGLLIFFLFIGVILFSSAVYFAEAEEAESHFSSIPDAFWWAVVSMTTVGYGDMYPVTIGGKIVGSLCAIAGVLTIALPVPVIVSNFNYFYHRETEGEEQAQLLHVSSPNLASDSDLSRRSSSTISKSEYMEIEEDMNNSIAHYRQANIRTGNCTATDQNCVNKSKLLTDV. The small molecule is CC(C)(C)OC(=O)NCCNC(=O)c1ccc(-c2c3nc(c(-c4ccc(C(=O)NCCNC(=O)OC(C)(C)C)cc4)c4ccc([nH]4)c(-c4ccc(C(=O)NCCNC(=O)OC(C)(C)C)cc4)c4nc(c(-c5ccc(C(=O)NCCNC(=O)OC(C)(C)C)cc5)c5ccc2[nH]5)C=C4)C=C3)cc1. The pIC50 is 4.9.
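The pIC50 definition used in the records above, pIC50 = -log10(IC50 in M), can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the choice of nanomolar as the input unit is an assumption, not something stated in this dataset description.

```python
import math


def pic50_from_ic50_nm(ic50_nm: float) -> float:
    """Convert an IC50 value to pIC50 = -log10(IC50 in M).

    Assumption: the input is given in nanomolar (nM); 1 nM = 1e-9 M.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


def ic50_nm_from_pic50(pic50: float) -> float:
    """Invert the definition: IC50 in M = 10 ** (-pIC50), returned in nM."""
    return 10.0 ** (-pic50) * 1e9


if __name__ == "__main__":
    # A pIC50 of 4.6 corresponds to an IC50 of roughly 25 micromolar.
    print(ic50_nm_from_pic50(4.6))
    # An IC50 of 1000 nM (1 micromolar) corresponds to a pIC50 of about 6.
    print(pic50_from_ic50_nm(1000.0))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, which is why higher values mean more potent compounds.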