Abstract
Glycans are critical to every facet of biology and medicine, from viral infections to embryogenesis. Tools to study glycans are evolving rapidly; however, the majority of our knowledge depends heavily on binding by glycan-binding proteins (e.g., lectins). The specificities of lectins, which are often proteins isolated from natural sources, have not been well defined, making it difficult to leverage their full potential for glycan analysis. Herein, we use a combination of machine learning algorithms and expert annotation to define lectin specificity for this important probe set. Our analysis draws on comprehensive glycan microarray data for commercially available lectins, which we obtained using version 5.0 of the Consortium for Functional Glycomics glycan microarray (CFGv5). This data set was made public in 2011. We report the creation of this data set and its use in a large-scale evaluation of lectin-glycan binding behaviors. Our motif analysis integrated 68 manually defined glycan features with systematic computational probing of rules for significant binding motifs built from mono- and disaccharides and their linkages. Combining machine learning with manual annotation, we derive a detailed interpretation of glycan-binding specificity for 57 unique lectins, categorized by their major binding motifs: mannose, complex-type N-glycan, O-glycan, fucose, sialic acid and sulfate, GlcNAc and chitin, Gal and LacNAc, and GalNAc. Our work provides fresh insights into the complex binding features of commercially available lectins in current use and serves as a critical guide to these important reagents.