Task: Regression. Given a target protein amino acid sequence and a drug SMILES string, predict the binding affinity score between them. We predict pIC50 (pIC50 = -log10(IC50 in M); higher means more potent). Dataset: bindingdb_ic50 (drug-target binding data from BindingDB using IC50 measurements).
(1) The small molecule is COc1ccc(/C=C2/Oc3c(ccc(O)c3O)C2=O)c(O)c1. The target protein (P0A2Y6) has sequence MRYLTAGESHGPRLTAIIEGIPAGLPLTAEDINEDLRRRQGGYGRGGRMKIENDQVVFTSGVRHGKTTGAPITMDVINKDHQKWLDIMSAEDIEDRLKSKRKITHPRPGHADLVGGIKYRFDDLRNSLERSSARETTMRVAVGAVAKRLLAELDMEIANHVVVFGGKEIDVPENLTVAEIKQRAAQSEVSIVNQEREQEIKDYIDQIKRDGDTIGGVVETVVGGVPVGLGSYVQWDRKLDARLAQAVVSINAFKGVEFGLGFEAGYRKGSQVMDEILWSKEDGYTRRTNNLGGFEGGMTNGQPIVVRGVMKPIPTLYKPLMSVDIETHEPYKATVERSDPTALPAAGMVMEAVVATVLAQEILEKFSSDNLEELKEAVAKHRDYTKNY. The pIC50 is 5.8.
(2) The drug is Cn1nc(-c2nnc(Cc3ccc(F)cc3)o2)/c(=N/O)c2ncccc21. The target protein (P24740) has sequence MGARASVLSGKKLDSWEKIRLRPGGNKKYRLKHLVWASRELEKFTLNPGLLETAEGCQQILGQLQPALQTGTEELRSLYNTVAVLYCVHQRIDVKDTKEALNKIEEMQNKNKQRTQQAAANTGSSQNYPIVQNAQGQPVHQALSPRTLNAWVKVVEDKAFSPEVIPMFSALSEGATPQDLNMMLNVVGGHQAAMQMLKDTINEEAAEWDRLHPVHAGPIPPGQMREPRGSDIAGTTSTVQEQIGWMTGNPPIPVGDIYRRWIILGLNKIVRMYSPVSILDIRQGPKEPFRDYVDRFFKTLRAEQATQDVKNWMTETLLVQNANPDCKSILRALGPGATLEEMMTACQGVGGPGHKARVLAEAMSQVQQTSIMMQRGNFRGPRRIKCFNCGKEGHLAKNCRAPRKKGCWKCGKEGHQMKDCTERQANFLRENLAFQQGEAREFSSEQTRANSPTSRNLWDGGKDDLPCETGAERQGTDSFSFPQITLWQRPLVTVKIGGQL.... The pIC50 is 4.6.
(3) The small molecule is CNC(=O)NC(N)=NCCC[C@H](NC(=O)[C@@H](C)NC(C)=O)C(=O)N(C)[C@@H](Cc1ccccc1)C(=O)N[C@@H](CC(=O)O)C(=O)O. The target protein (Q873X9) has sequence MRFATSTIVKVALLLSSLCVDAAVMWNRDTSSTDLEARASSGYRSVVYFVNWAIYGRNHNPQDLPVERLTHVLYAFANVRPETGEVYMTDSWADIEKHYPGDSWSDTGNNVYGCIKQLYLLKKQNRNLKVLLSIGGWTYSPNFAPAASTDAGRKNFAKTAVKLLQDLGFDGLDIDWEYPENDQQANDFVLLLKEVRTALDSYSAANAGGQHFLLTVASPAGPDKIKVLHLKDMDQQLDFWNLMAYDYAGSFSSLSGHQANVYNDTSNPLSTPFNTQTALDLYRAGGVPANKIVLGMPLYGRSFANTDGPGKPYNGVGQGSWENGVWDYKALPQAGATEHVLPDIMASYSYDATNKFLISYDNPQVANLKSGYIKSLGLGGAMWWDSSSDKTGSDSLITTVVNALGGTGVFEQSQNELDYPVSQYDNLRNGMQT. The pIC50 is 5.4.
(4) The compound is S=C1Nc2ccc(-c3cccc(Cl)c3)cc2C12CCCCC2. The target protein (Q63449) has sequence MTELQAKDPRTLHTSGAAPSPTHVGSPLLARLDPDPFQGSQHSDASSVVSPIPISLDRLLFSRSCQAQELPDEKTQNQQSLSDVEGAFSGVEASRRRSRNPRAPEKDSRLLDSVLDTLLAPSGPEQSQTSPPACEAITSWCLFGPELPEDPRSVPATKGLLSPLMSRPESKAGDSSGTGAGQKVLPKAVSPPRQLLLPTSGSAHWPGAGVKPSQQPATVEVEEDGGLETEGSAGPLLKSKPRALEGMCSGGGVTANAPGAAPGGVTLVPKEDSRFSAPRVSLEQDAPVAPGRSPLATTVVDFIHVPILPLNHALLAARTRQLLEGDSYDGGAAAQVPFAPPRGSPSAPSPPVPCGDFPDCTYPPEGDPKEDGFPVYGEFQPPGLKIKEEEEGTEAASRSPRPYLLAGASAATFPDFPLPPRPPRAPPSRPGEAAVAAPSAAVSPVSSSGSALECILYKAEGAPPTQGSFAPLPCKPPAASSCLLPRDSLPAAPTSSAAPA.... The pIC50 is 9.3.
(5) The small molecule is O=c1[nH]nc(CCCc2ccccc2)cc1O. The target protein (P18894) has sequence MRVAVIGAGVIGLSTALCIHERYHPTQPLHMKIYADRFTPFTTSDVAAGLWQPYLSDPSNPQEAEWSQQTFDYLLSCLHSPNAEKMGLALISGYNLFRDEVPDPFWKNAVLGFRKLTPSEMDLFPDYGYGWFNTSLLLEGKSYLPWLTERLTERGVKLIHRKVESLEEVARGVDVIINCTGVWAGALQADASLQPGRGQIIQVEAPWIKHFILTHDPSLGIYNSPYIIPGSKTVTLGGIFQLGNWSGLNSVRDHNTIWKSCCKLEPTLKNARIVGELTGFRPVRPQVRLEREWLRHGSSSAEVIHNYGHGGYGLTIHWGCAMEAANLFGKILEEKKLSRLPPSHL. The pIC50 is 6.4.
(6) The drug is CC[C@H](C)[C@H](NC(=O)[C@H](C)NC(=O)[C@@H](N)[C@@H](C)O)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](CCCNC(=N)N)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](Cc1ccccc1)C(=O)N[C@H](C(=O)N[C@H](C(=O)N[C@@H](CO)C(=O)O)[C@@H](C)O)[C@@H](C)CC)[C@@H](C)O)C(C)C)[C@@H](C)O. The target protein (P21146) has sequence MADLEAVLADVSYLMAMEKSKATPAARASKKILLPEPSIRSVMQKYLEDRGEVTFEKIFSQKLGYLLFRDFCLKHLEEAKPLVEFYEEIKKYEKLETEEERLVCSREIFDTYIMKELLACSHPFSKSAIEHVQGHLVKKQVPPDLFQPYIEEICQNLRGDVFQKFIESDKFTRFCQWKNVELNIHLTMNDFSVHRIIGRGGFGEVYGCRKADTGKMYAMKCLDKKRIKMKQGETLALNERIMLSLVSTGDCPFIVCMSYAFHTPDKLSFILDLMNGGDLHYHLSQHGVFSEADMRFYAAEIILGLEHMHNRFVVYRDLKPANILLDEHGHVRISDLGLACDFSKKKPHASVGTHGYMAPEVLQKGVAYDSSADWFSLGCMLFKLLRGHSPFRQHKTKDKHEIDRMTLTMAVELPDSFSPELRSLLEGLLQRDVNRRLGCLGRGAQEVKESPFFRSLDWQMVFLQKYPPPLIPPRGEVNAADAFDIGSFDEEDTKGIKLLD.... The pIC50 is 4.4.
(7) The drug is CCC/C=C1/OC(=O)c2ccccc21. The target protein (P07265) has sequence MTISDHPETEPKWWKEATIYQIYPASFKDSNNDGWGDLKGITSKLQYIKDLGVDAIWVCPFYDSPQQDMGYDISNYEKVWPTYGTNEDCFELIDKTHKLGMKFITDLVINHCSTEHEWFKESRSSKTNPKRDWFFWRPPKGYDAEGKPIPPNNWKSFFGGSAWTFDETTNEFYLRLFASRQVDLNWENEDCRRAIFESAVGFWLDHGVDGFRIDTAGLYSKRPGLPDSPIFDKTSKLQHPNWGSHNGPRIHEYHQELHRFMKNRVKDGREIMRVGEVAHGSDNALYTSAARYEVSEVFSFTHVEVGTSPFFRYNIVPFTLKQWKEAIASNFLFINGTDSWATTYIENHDQARSITRFADDSPKYRKISGKLLTLLECSLTGTLYVYQGQEIGQINFKEWPIEKYEDVDVKNNYEIIKKSFGKNSKEMKDFFKGIALLSRDHSRTPMPWTKDKPNAGFTGPDVKPWFFLNESFEQGINVEQESRDDDSVLNFWKRALQARK.... The pIC50 is 2.6.
(8) The compound is N=C(N)Nc1ccc(C(=O)Oc2ccc(CC(=O)N[C@@H](Cc3ccccc3)C(=O)O)c(C(F)(F)F)c2)cc1. The target protein sequence is MSALLFLALVGAAVAFPVDDDDKIVGGYTCRENSVPYQVSLNSGYHFCGGSLINDQWVVAAHCYKTRIQVRLGEHNINVLEGNEQFIDAAKIIKHPNFNRKTLNNDIMLIKLSSPVTLNARVATVALPSSCAPAGTQCLISGWGNTLSFGVSEPDLLQCLDAPLLPQADCEASYPGKITGNMVCAGFLEGGKDSCQGDSGGPVVCNGELQGIVSWGYGCALPDNPGVYTKVCNYVDWIQDTIAAN. The pIC50 is 9.5.
(9) The target protein (Q05209) has sequence MEQVEILRKFIQRVQAMKSPDHNGEDNFARDFMRLRRLSTKYRTEKIYPTATGEKEENVKKNRYKDILPFDHSRVKLTLKTPSQDSDYINANFIKGVYGPKAYVATQGPLANTVIDFWRMIWEYNVVIIVMACREFEMGRKKCERYWPLYGEDPITFAPFKISCEDEQARTDYFIRTLLLEFQNESRRLYQFHYVNWPDHDVPSSFDSILDMISLMRKYQEHEDVPICIHCSAGCGRTGAICAIDYTWNLLKAGKIPEEFNVFNLIQEMRTQRHSAVQTKEQYELVHRAIAQLFEKQLQLYEIHGAQKIADGVNEINTENMVSSIEPEKQDSPPPKPPRTRSCLVEGDAKEEILQPPEPHPVPPILTPSPPSAFPTVTTVWQDNDRYHPKPVLHMVSSEQHSADLNRNYSKSTELPGKNESTIEQIDKKLERNLSFEIKKVPLQEGPKSFDGNTLLNRGHAIKIKSASPCIADKISKPQELSSDLNVGDTSQNSCVDCSV.... The small molecule is CCOc1ccc2nc(NC(=O)CSc3nc4sc(C)c(C)c4c(=O)[nH]3)sc2c1. The pIC50 is 4.7.
(10) The drug is O=C(COn1nnc2ccc(C(F)(F)F)cc21)Nc1c(Cl)cccc1Cl. The target protein (G5EFF5) has sequence MGTNGGVIAEQSMEIETNENPDKVEEPVVRRKRVTRRRHRRIHSKNNCLTPPNSDDDPQMSTPDDPVIHSPPSIGAAPGMNGYHGSGVKLEESSGACGSPDDGLLDSSEESRRRQKTCRVCGDHATGYNFNVITCESCKAFFRRNALRPKEFKCPYSEDCEINSVSRRFCQKCRLRKCFTVGMKKEWILNEEQLRRRKNSRLNNTGTCNKRSQPGNQQSPQGPNQQPHLSPHHPGVAIYPPQPQRPLTINPMDNQMMHHMQANRPNAMPQLISPPGAQPYPLTSPVGSSASDSPPNRSLTMMHNGEKSPDGYDPNIMAHRAPPPSFNNRPKMDSGQVVLSTEEYKQLLSRIPGAQVPGLMNEEEPINKRAAYNCNGHPMPAETTPPYSAPMSDMSLSRHNSTSSGTEKNHMTHSTVSAIPGNSAQNHFDIASFGMGIVTATGGGDAAEEMYKRMNMFYENCIQSALDSPENQEPKPQEAMIPKEEYMTPTHGFQYQSDPY.... The pIC50 is 4.2.
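The pIC50 definition in the task statement (pIC50 = -log10(IC50 in M)) can be made concrete with a small conversion helper. This is only an illustrative sketch: the function names are hypothetical, and it assumes raw IC50 values arrive in nanomolar, the unit BindingDB commonly reports, so that pIC50 = -log10(IC50_nM x 1e-9) = 9 - log10(IC50_nM).

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in M).

    Hypothetical helper; assumes the input unit is nM.
    -log10(ic50_nm * 1e-9) is computed as 9 - log10(ic50_nm), which is
    algebraically identical and avoids rounding noise from the 1e-9 factor.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Invert the definition: IC50 (nM) = 10 ** (9 - pIC50)."""
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # 1 uM (1000 nM) corresponds to pIC50 = 6.0.
    print(ic50_nm_to_pic50(1000.0))
    # Example (1), pIC50 5.8, is roughly a 1.6 uM inhibitor.
    print(f"{pic50_to_ic50_nm(5.8):.0f} nM")
```

Because the scale is logarithmic, one pIC50 unit is a tenfold change in IC50, so the examples above, which run from 2.6 (example 7) to 9.5 (example 8), span nearly seven orders of magnitude in IC50.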