Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements). (1) The small molecule is N[C@H]1Cc2cc(Cl)ccc2N(O)C1=O. The target protein (Q64602) has sequence MNYSRFLTATSLARKTSPIRATVEIMSRAPKDIISLAPGSPNPKVFPFKSAVFTVENGSTIRFEGEMFQRALQYSSSYGIPELLSWLKQLQIKLHNPPTVNYSPNEGQMDLCITSGCQDGLCKVFEMLINPGDTVLVNEPLYSGALFAMKPLGCNFISVPSDDCGIIPEGLKKVLSQWKPEDSKDPTKRTPKFLYTIPNGNNPTGNSLTGDRKKEIYELARKYDFLIIEDDPYYFLQFTKPWEPTFLSMDVDGRVIRADSLSKVISSGLRVGFITGPKSLIQRIVLHTQISSLHPCTLSQLMISELLYQWGEEGFLAHVDRAIDFYKNQRDFILAAADKWLRGLAEWHVPKAGMFLWIKVNGISDAKKLIEEKAIEREILLVPGNSFFVDNSAPSSFFRASFSQVTPAQMDLVFQRLAQLIKDVS. The pIC50 is 6.6. (2) The pIC50 is 4.4. The drug is Cn1cc(C(=O)NCc2ccc(Cl)cc2)c(=O)c2cc(CN3CCOCC3)sc21. The target protein (P09252) has sequence MAIRTGFCNPFLTQASGIKYNPRTGRGSNREFLHSYKTTMSSFQFLAPKCLDEDVPMEERKGVHVGTLSRPPKVYCNGKEVPILDFRCSSPWPRRVNIWGEIDFRGDKFDPRFNTFHVYDIVETTEAASNGDVSRFATATRPLGTVITLLGMSRCGKRVAVHVYGICQYFYINKAEVDTACGIRSGSELSVLLAECLRSSMITQNDATLNGDKNAFHGTSFKSASPESFRVEVIERTDVYYYDTQPCAFYRVYSPSSKFTNYLCDNFHPELKKYEGRVDATTRFLMDNPGFVSFGWYQLKPGVDGERVRVRPASRQLTLSDVEIDCMSDNLQAIPNDDSWPDYKLLCFDIECKSGGSNELAFPDATHLEDLVIQISCLLYSIPRQSLEHILLFSLGSCDLPQRYVQEMKDAGLPEPTVLEFDSEFELLIAFMTLVKQYAPEFATGYNIVNFDWAFIMEKLNSIYSLKLDGYGSINRGGLFKIWDVGKSGFQRRSKVKING.... (3) The small molecule is CC(=O)N[C@@H](CC(C)C)C(=O)NCC(CC(=O)O)S(=O)(=O)N[C@H](C(N)=O)[C@@H](C)O. 
The target protein (Q13477) has sequence MDFGLALLLAGLLGLLLGQSLQVKPLQVEPPEPVVAVALGASRQLTCRLACADRGASVQWRGLDTSLGAVQSDTGRSVLTVRNASLSAAGTRVCVGSCGGRTFQHTVQLLVYAFPDQLTVSPAALVPGDPEVACTAHKVTPVDPNALSFSLLVGGQELEGAQALGPEVQEEEEEPQGDEDVLFRVTERWRLPPLGTPVPPALYCQATMRLPGLELSHRQAIPVLHSPTSPEPPDTTSPESPDTTSPESPDTTSQEPPDTTSPEPPDKTSPEPAPQQGSTHTPRSPGSTRTRRPEISQAGPTQGEVIPTGSSKPAGDQLPAALWTSSAVLGLLLLALPTYHLWKRCRHLAEDDTHPPASLRLLPQVSAWAGLRGTGQVGISPS. The pIC50 is 3.3. (4) The compound is Nc1nc2c(ncn2[C@@H]2O[C@@H]3CO[P@](=O)(O)O[C@@H]4[C@@H](CO[P@](=O)(O)O[C@H]3[C@H]2O)O[C@@H](n2cnc3c(=O)[nH]c(N)nc32)[C@@H]4O)c(=O)[nH]1. The target protein sequence is MNDLNVLVLEDEPFQRLVAVTALKKVVPGSILEAADGKEAVAILESCGHVDIAICDLQMSGMDGLAFLRHASLSGKVHSVILSSEVDPILRQATISMIECLGLNFLGDLGKPFSLERITALLTRYNARRQDLPRQIEVAELPSVADVVRGLDNGEFEAYYQPKVALDGGGLIGAEVLARWNHPHLGVLPPSHFLYVMETYNLVDKLFWQLFSQGLATRRKLAQLGQPINLAFNVHPSQLGSRALAENISALLTEFHLPPSSVMFEITETGLISAPASSLENLVRLRIMGCGLAMDDFGAGYSSLDRLCEFPFSQIKLDRTFVQKMKTQPRSCAVISSVVALAQALGISLVVEGVESDEQRVRLIELGCSIAQGYLFARPMPEQHFLDYCSGS. The pIC50 is 7.7. (5) The drug is CNCC(O)CN1CCC(Cc2ccccc2)CC1. The target protein (Q96LA8) has sequence MSQPKKRKLESGGGGEGGEGTEEEDGAEREAALERPRRTKRERDQLYYECYSDVSVHEEMIADRVRTDAYRLGILRNWAALRGKTVLDVGAGTGILSIFCAQAGARRVYAVEASAIWQQAREVVRFNGLEDRVHVLPGPVETVELPEQVDAIVSEWMGYGLLHESMLSSVLHARTKWLKEGGLLLPASAELFIAPISDQMLEWRLGFWSQVKQHYGVDMSCLEGFATRCLMGHSEIVVQGLSGEDVLARPQRFAQLELSRAGLEQELEAGVGGRFRCSCYGSAPMHGFAIWFQVTFPGGESEKPLVLSTSPFHPATHWKQALLYLNEPVQVEQDTDVSGEITLLPSRDNPRRLRVLLRYKVGDQEEKTKDFAMED. The pIC50 is 5.5. (6) The small molecule is COCC(=O)Nc1ccc(O)c(-c2cc(C(=O)O)ccn2)c1. 
The target protein (Q6B0I6) has sequence METMKSKANCAQNPNCNIMIFHPTKEEFNDFDKYIAYMESQGAHRAGLAKIIPPKEWKARETYDNISEILIATPLQQVASGRAGVFTQYHKKKKAMTVGEYRHLANSKKYQTPPHQNFEDLERKYWKNRIYNSPIYGADISGSLFDENTKQWNLGHLGTIQDLLEKECGVVIEGVNTPYLYFGMWKTTFAWHTEDMDLYSINYLHLGEPKTWYVVPPEHGQRLERLARELFPGSSRGCGAFLRHKVALISPTVLKENGIPFNRITQEAGEFMVTFPYGYHAGFNHGFNCAEAINFATPRWIDYGKMASQCSCGEARVTFSMDAFVRILQPERYDLWKRGQDRAVVDHMEPRVPASQELSTQKEVQLPRRAALGLRQLPSHWARHSPWPMAARSGTRCHTLVCSSLPRRSAVSGTATQPRAAAVHSSKKPSSTPSSTPGPSAQIIHPSNGRRGRGRPPQKLRAQELTLQTPAKRPLLAGTTCTASGPEPEPLPEDGALMDK.... The pIC50 is 5.4. (7) The drug is Cc1cc(Nc2cccc(NC(=O)c3cccc(Nc4ccnc5ccccc45)c3)c2)nc(N)n1. The target protein (Q9UJW3) has sequence MAAIPALDPEAEPSMDVILVGSSELSSSVSPGTGRDLIAYEVKANQRNIEDICICCGSLQVHTQHPLFEGGICAPCKDKFLDALFLYDDDGYQSYCSICCSGETLLICGNPDCTRCYCFECVDSLVGPGTSGKVHAMSNWVCYLCLPSSRSGLLQRRRKWRSQLKAFYDRESENPLEMFETVPVWRRQPVRVLSLFEDIKKELTSLGFLESGSDPGQLKHVVDVTDTVRKDVEEWGPFDLVYGATPPLGHTCDRPPSWYLFQFHRLLQYARPKPGSPRPFFWMFVDNLVLNKEDLDVASRFLEMEPVTIPDVHGGSLQNAVRVWSNIPAIRSRHWALVSEEELSLLAQNKQSSKLAAKWPTKLVKNCFLPLREYFKYFSTELTSSL. The pIC50 is 5.5. (8) The drug is CC12CC3CC1(C)CC3(CN)C2. The target protein sequence is MSLLTEVETPIRNEWGCRCNDSSDPLVVAASIIGILHLILWILDRLFFKCIYRFFEHGLKRGPSTEGVPESMREEYRKEQQSAVDADDSHFVSIEL. The pIC50 is 5.1. (9) The drug is N#C/C(=C\c1cccc(-n2cc(-c3ccc(Cl)cc3)c3c(N)ncnc32)c1)C(=O)N1CCC1. The target protein sequence is MGSNKSKPKDASQRRRSLEPAENVHGAGGGAFPASQTPSKPASADGHRGPSAAFAPAAAEPKLFGGFNSSDTVTSPQRAGPLAGGVTTFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGQTGYIPSNYVAPSDSIQAEEWYFGKITRRESERLLLNAENPRGTFLVRESETTKGAYCLSVSDFDNAKGLNVKHYKIRKLDSGGFYITSRTQFNSLQQLVAYYSKHADGLCHRLTTVCPTSKPQTQGLAKDAWEIPRESLRLEVKLGQGCFGEVWMGTWNGTTRVAIKTLKPGTMSPEAFLQEAQVMKKLRHEKLVQLYAVVSEEPIYIVTEYMCKGSLLDFLKGETGKYLRLPQLVDMAAQIASGMAYVERMNYVHRDLRAANILVGENLVCKVADFGLARLIEDNEYTARQGAKFPIKWTAPEAALYGRFTIKSDVWSFGILLTELTTKGRVPYPGMVNREVLDQVERGYRMPCPPECPESLHDLMCQ.... The pIC50 is 5.6.
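
The task statement defines the label as pIC50 = -log10(IC50 in M). As a minimal sketch of that conversion, the Python below maps between IC50 and pIC50. The function names are hypothetical and not part of the dataset, and the nanomolar helper assumes that raw IC50 values are reported in nM, which the records above do not state.

```python
import math


def pic50_from_ic50_molar(ic50_molar: float) -> float:
    """Convert an IC50 given in mol/L to pIC50 = -log10(IC50 in M)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def pic50_from_ic50_nanomolar(ic50_nm: float) -> float:
    """Same conversion for an IC50 in nM (assumed unit: 1 nM = 1e-9 M)."""
    return pic50_from_ic50_molar(ic50_nm * 1e-9)


def ic50_molar_from_pic50(pic50: float) -> float:
    """Invert the definition: IC50 in M = 10 ** (-pIC50)."""
    return 10.0 ** (-pic50)


if __name__ == "__main__":
    # Example (1) above reports a pIC50 of 6.6; recover the implied IC50.
    ic50 = ic50_molar_from_pic50(6.6)
    print(f"IC50 = {ic50:.3e} M ({ic50 * 1e9:.0f} nM)")
    # Round trip back to the pIC50 scale.
    print(f"pIC50 = {pic50_from_ic50_molar(ic50):.2f}")
```

Because the scale is logarithmic, a difference of 1.0 in pIC50 corresponds to a tenfold difference in IC50, so example (4) at 7.7 is about ten thousand times more potent than example (3) at 3.3 only in the sense of measured IC50 against their respective targets.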