Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The target protein (P9WG47) has sequence MTDTTLPPDDSLDRIEPVDIEQEMQRSYIDYAMSVIVGRALPEVRDGLKPVHRRVLYAMFDSGFRPDRSHAKSARSVAETMGNYHPHGDASIYDSLVRMAQPWSLRYPLVDGQGNFGSPGNDPPAAMRYTEARLTPLAMEMLREIDEETVDFIPNYDGRVQEPTVLPSRFPNLLANGSGGIAVGMATNIPPHNLRELADAVFWALENHDADEEETLAAVMGRVKGPDFPTAGLIVGSQGTADAYKTGRGSIRMRGVVEVEEDSRGRTSLVITELPYQVNHDNFITSIAEQVRDGKLAGISNIEDQSSDRVGLRIVIEIKRDAVAKVVINNLYKHTQLQTSFGANMLAIVDGVPRTLRLDQLIRYYVDHQLDVIVRRTTYRLRKANERAHILRGLVKALDALDEVIALIRASETVDIARAGLIELLDIDEIQAQAILDMQLRRLAALERQRIIDDLAKIEAEIADLEDILAKPERQRGIVRDELAEIVDRHGDDRRTRIIA.... The small molecule is COc1c(N2CC[C@@H](C(C)(C)N)C2)c(F)cc2c(=O)c3c(=O)[nH]sc3n(C3CC3)c12. The pIC50 is 5.5. (2) The compound is Cn1c(=O)c(S(=O)(=O)c2ccc(F)cc2F)cc2cnc(Nc3ccc4[nH]ccc4c3)nc21. The target protein (Q99986) has sequence MPRVKAAQAGRQSSAKRHLAEQFAVGEIITDMAKKEWKVGLPIGQGGFGCIYLADMNSSESVGSDAPCVVKVEPSDNGPLFTELKFYQRAAKPEQIQKWIRTRKLKYLGVPKYWGSGLHDKNGKSYRFMIMDRFGSDLQKIYEANAKRFSRKTVLQLSLRILDILEYIHEHEYVHGDIKASNLLLNYKNPDQVYLVDYGLAYRYCPEGVHKEYKEDPKRCHDGTIEFTSIDAHNGVAPSRRGDLEILGYCMIQWLTGHLPWEDNLKDPKYVRDSKIRYRENIASLMDKCFPEKNKPGEIAKYMETVKLLDYTEKPLYENLRDILLQGLKAIGSKDDGKLDLSVVENGGLKAKTITKKRKKEIEESKEPGVEDTEWSNTQTEEAIQTRSRTRKRVQK. The pIC50 is 5.0. (3) The compound is Nc1nonc1C(=Nc1ccc(F)c(Cl)c1)NO. 
The target protein (P48775) has sequence MSGCPFLGNNFGYTFKKLPVEGSEEDKSQTGVNRASKGGLIYGNYLHLEKVLNAQELQSETKGNKIHDEHLFIITHQAYELWFKQILWELDSVREIFQNGHVRDERNMLKVVSRMHRVSVILKLLVQQFSILETMTALDFNDFREYLSPASGFQSLQFRLLENKIGVLQNMRVPYNRRHYRDNFKGEENELLLKSEQEKTLLELVEAWLERTPGLEPHGFNFWGKLEKNITRGLEEEFIRIQAKEESEEKEEQVAEFQKQKEVLLSLFDEKRHEHLLSKGERRLSYRALQGALMIYFYREEPRFQVPFQLLTSLMDIDSLMTKWRYNHVCMVHRMLGSKAGTGGSSGYHYLRSTVSDRYKVFVDLFNLSTYLIPRHWIPKMNPTIHKFLYTAEYCDSSYFSSDESD. The pIC50 is 5.5. (4) The target protein (P43119) has sequence MADSCRNLTYVRGSVGPATSTLMFVAGVVGNGLALGILSARRPARPSAFAVLVTGLAATDLLGTSFLSPAVFVAYARNSSLLGLARGGPALCDAFAFAMTFFGLASMLILFAMAVERCLALSHPYLYAQLDGPRCARLALPAIYAFCVLFCALPLLGLGQHQQYCPGSWCFLRMRWAQPGGAAFSLAYAGLVALLVAAIFLCNGSVTLSLCRMYRQQKRHQGSLGPRPRTGEDEVDHLILLALMTVVMAVCSLPLTIRCFTQAVAPDSSSEMGDLLAFRFYAFNPILDPWVFILFRKAVFQRLKLWVCCLCLGPAHGDSQTPLSQLASGRRDPRAPSAPVGKEGSCVPLSAWGEGQVEPLPPTQQSSGSAVGTSSKAEASVACSLC. The pIC50 is 7.3. The drug is O=C([O-])COc1cccc(CN2CCC[C@@H]2c2nc(-c3ccccc3)c(-c3ccccc3)o2)c1. (5) The drug is COc1cc([N+](=O)[O-])ccc1-c1ccc(/C=C2\C(=O)NC(=O)N(c3ccc4c(c3)OCO4)C2=O)o1. The target protein (P12259) has sequence MFPGCPRLWVLVVLGTSWVGWGSQGTEAAQLRQFYVAAQGISWSYRPEPTNSSLNLSVTSFKKIVYREYEPYFKKEKPQSTISGLLGPTLYAEVGDIIKVHFKNKADKPLSIHPQGIRYSKLSEGASYLDHTFPAEKMDDAVAPGREYTYEWSISEDSGPTHDDPPCLTHIYYSHENLIEDFNSGLIGPLLICKKGTLTEGGTQKTFDKQIVLLFAVFDESKSWSQSSSLMYTVNGYVNGTMPDITVCAHDHISWHLLGMSSGPELFSIHFNGQVLEQNHHKVSAITLVSATSTTANMTVGPEGKWIISSLTPKHLQAGMQAYIDIKNCPKKTRNLKKITREQRRHMKRWEYFIAAEEVIWDYAPVIPANMDKKYRSQHLDNFSNQIGKHYKKVMYTQYEDESFTKHTVNPNMKEDGILGPIIRAQVRDTLKIVFKNMASRPYSIYPHGVTFSPYEDEVNSSFTSGRNNTMIRAVQPGETYTYKWNILEFDEPTENDAQC.... The pIC50 is 5.3.
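
The pIC50 definition given in the task description (pIC50 = -log10(IC50 in M)) can be illustrated with a minimal Python sketch. The function names `ic50_to_pic50` and `pic50_to_ic50_nm` are hypothetical helpers introduced here for illustration only; they are not part of BindingDB or of any dataset tooling, and the sketch assumes the IC50 is supplied in molar units.

```python
import math


def ic50_to_pic50(ic50_molar: float) -> float:
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50).

    Hypothetical helper; assumes the input is already in molar units.
    """
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration in mol/L")
    return -math.log10(ic50_molar)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition and report the IC50 in nanomolar.

    IC50 [M] = 10 ** (-pIC50), and 1 M = 1e9 nM, so IC50 [nM] = 10 ** (9 - pIC50).
    """
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # A 1 micromolar IC50 is 1e-6 M.
    print(ic50_to_pic50(1e-6))
    # Labels taken from the examples above, expressed back in nM.
    for label in (5.0, 5.5, 7.3):
        print(label, pic50_to_ic50_nm(label))
```

Because the scale is logarithmic, a difference of one pIC50 unit corresponds to a tenfold difference in IC50, which is why higher values mean a more potent compound.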