Ando and Zhang (2005)
Rie Johnson and Tong Zhang. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. JMLR 2005.
The bulk of the structural learning presentation is devoted to this work. In this journal paper, Rie Johnson (formerly Ando) and Tong Zhang pose structural learning as a multitask learning problem. They assume that a good (linear) hypothesis for the original problem lies close to a low-dimensional hypothesis space. This low-dimensional space, in turn, approximates the space spanned by linear auxiliary predictors, which can be trained using only unlabeled data. The algorithm we present in the tutorial is a variant of their alternating structure optimization (ASO) algorithm, which discovers a hypothesis space that minimizes the joint empirical risk across all of the auxiliary problems. In addition to entity recognition, they give results for part-of-speech tagging, chunking, and digit recognition.
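The idea of recovering a shared low-dimensional space from auxiliary predictors can be sketched in a few lines of NumPy. This is a simplified, single-pass illustration and not the full alternating procedure from the paper: it fits the auxiliary predictors with ridge regression (an assumption made here for brevity) and takes the top left singular vectors of the stacked weight matrix as the shared structure. The function names, the synthetic data, and the dimensions are all hypothetical.

```python
import numpy as np

def fit_auxiliary_predictors(X, aux_targets, ridge=1.0):
    """Fit one ridge-regression predictor per auxiliary problem.

    X: (n, d) unlabeled examples; aux_targets: (n, m), one column per
    auxiliary problem. Returns the weight matrix W of shape (d, m).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ aux_targets)

def shared_structure(W, h):
    """Return theta (h, d): the top-h left singular vectors of W."""
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T

# Synthetic data: m auxiliary targets that all depend on the same
# h-dimensional subspace of the d input features.
rng = np.random.default_rng(0)
n, d, m, h = 500, 20, 30, 3
basis = rng.normal(size=(d, h))
X = rng.normal(size=(n, d))
aux_targets = X @ basis @ rng.normal(size=(h, m)) + 0.01 * rng.normal(size=(n, m))

W = fit_auxiliary_predictors(X, aux_targets)
theta = shared_structure(W, h)

# Augment the original features with the h structural features.
X_aug = np.hstack([X, X @ theta.T])
```

A supervised learner for the original problem would then be trained on `X_aug` using the labeled examples only.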
John Blitzer, Ryan McDonald, and Fernando Pereira. Domain Adaptation with Structural Correspondence Learning. EMNLP 2006.
John Blitzer, Mark Dredze, and Fernando Pereira. Biographies, Bollywood, Boomboxes, and Blenders: Domain Adaptation for Sentiment Classification. ACL 2007.
The domain adaptation subsection of the tutorial discusses the algorithms and results described in these papers. Structural correspondence learning (SCL) addresses the problem of semi-supervised domain adaptation: we have labeled training data from a source domain, but we wish to make predictions in a different target domain for which we have no labeled data. The SCL procedure chooses as auxiliary problems the prediction of pivot features, which occur frequently in both domains and also characterize the original problem we want to solve. In both papers, Blitzer et al. also examine how to incorporate labeled examples from the target domain in tandem with the SCL representation.
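A minimal sketch of the pivot-based construction follows. It makes several simplifying assumptions that are not from the papers: pivots are chosen purely by frequency in both domains, the pivot predictors are fit with ridge regression on ±1 targets, and the data are random binary feature matrices. The names `select_pivots` and `scl_projection` are hypothetical.

```python
import numpy as np

def select_pivots(X_src, X_tgt, num_pivots):
    """Pick the features whose document frequency is high in BOTH domains."""
    src_counts = (X_src > 0).sum(axis=0)
    tgt_counts = (X_tgt > 0).sum(axis=0)
    score = np.minimum(src_counts, tgt_counts)
    return np.argsort(-score, kind="stable")[:num_pivots]

def scl_projection(X_unlabeled, pivots, h, ridge=1.0):
    """Predict each pivot from the non-pivot features, then take the
    top-h left singular vectors of the weight matrix. Returns (h, d)."""
    d = X_unlabeled.shape[1]
    non_pivot = np.setdiff1d(np.arange(d), pivots)
    inputs = X_unlabeled[:, non_pivot]
    # One binary auxiliary problem per pivot: does the pivot occur?
    targets = np.where(X_unlabeled[:, pivots] > 0, 1.0, -1.0)
    gram = inputs.T @ inputs + ridge * np.eye(len(non_pivot))
    W = np.zeros((d, len(pivots)))
    W[non_pivot] = np.linalg.solve(gram, inputs.T @ targets)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :h].T

# Random binary "bag of words" data standing in for the two domains.
rng = np.random.default_rng(0)
X_src = (rng.random((200, 40)) < 0.2).astype(float)
X_tgt = (rng.random((200, 40)) < 0.2).astype(float)

pivots = select_pivots(X_src, X_tgt, num_pivots=8)
theta = scl_projection(np.vstack([X_src, X_tgt]), pivots, h=4)

# Source-domain training features: original features plus SCL features.
X_src_aug = np.hstack([X_src, X_src @ theta.T])
```

The same `theta` is applied to target-domain examples at prediction time, so that a classifier trained on `X_src_aug` sees features with a shared meaning across domains.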
Sham Kakade and Dean Foster. Multi-view Regression via Canonical Correlation Analysis. COLT 2007.
Kakade and Foster do not directly analyze (or even address) structural learning, but we show in the presentation how to relate structural learning to canonical correlation analysis (CCA). The paper is theoretical. They begin by assuming a weakened version of the first co-training assumption. Namely, they assume that the single-view models $f_1$ and $f_2$ have low regret relative to the best linear model which uses both views (but not necessarily low error overall). Under this assumption, they suggest an algorithm which performs ridge regression on one view in the CCA-transformed canonical space. The resulting bound gives a rate of convergence which depends on the amount of correlation captured by the canonical basis: the more correlation, the slower the rate of convergence.
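The two steps (CCA on unlabeled two-view data, then ridge regression on one view in the canonical basis) can be illustrated roughly as follows. This sketch uses a plain ridge penalty with a uniform weight, which is an assumption made here; the paper's exact regularizer is not reproduced. The helper names, the synthetic two-view data, and the choice of `k` are hypothetical.

```python
import numpy as np

def inv_sqrt(C, eps=1e-8):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, eps))) @ vecs.T

def cca(X1, X2, reg=1e-6):
    """Return projections A (d1, r), B (d2, r) and correlations (r,)."""
    X1c, X2c = X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1c.T @ X1c / n + reg * np.eye(X1.shape[1])
    C22 = X2c.T @ X2c / n + reg * np.eye(X2.shape[1])
    C12 = X1c.T @ X2c / n
    K1, K2 = inv_sqrt(C11), inv_sqrt(C22)
    U, corr, Vt = np.linalg.svd(K1 @ C12 @ K2, full_matrices=False)
    return K1 @ U, K2 @ Vt.T, corr

def ridge_fit(Z, y, ridge=1.0):
    """Ordinary ridge regression without an intercept."""
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(Z.shape[1]), Z.T @ y)

# Two views generated from a shared 2-dimensional latent variable z.
rng = np.random.default_rng(0)
n_unl, n_lab, k = 2000, 50, 2
z = rng.normal(size=(n_unl, 2))
X1 = z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(n_unl, 6))
X2 = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(n_unl, 5))
y = z @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n_unl)

# CCA uses all examples (no labels needed); ridge uses only n_lab labels.
A, B, corr = cca(X1, X2)
mu1 = X1.mean(axis=0)
Z_lab = (X1[:n_lab] - mu1) @ A[:, :k]
w = ridge_fit(Z_lab, y[:n_lab])
y_hat = (X1[n_lab:] - mu1) @ A[:, :k] @ w
```

Only view 1 is needed at prediction time; view 2 contributes solely through the canonical basis estimated from unlabeled data.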
More structural learning
Rie Johnson, Mark Dredze, and Tong Zhang. TREC 2005 Genomics Track Experiments at IBM Watson. TREC 2005.
An application of the ideas of structural learning to information retrieval. The key idea of the paper is to use structural learning as a kind of pseudo-relevance feedback mechanism. Given a query, Johnson et al. generate a set of alternative queries by discarding different words in the query. For each alternative query, they then create a "centroid document" using the BM25-weighted documents retrieved for it. Finally, the top singular vector of the matrix whose columns are these centroid vectors functions as an augmented query. They show how this approach to pseudo-relevance feedback can significantly improve retrieval results for their biomedical task.
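The pipeline above can be sketched as follows, under loud assumptions: alternative queries are formed by leaving out one word at a time, documents are scored by summing term weights from a generic document-term matrix (a stand-in for BM25, not an implementation of it), and one centroid is built per alternative query. The function `augmented_query` and all parameters are hypothetical.

```python
import numpy as np

def augmented_query(doc_term, query_terms, top_k=3):
    """Build an augmented query vector from leave-one-word-out queries.

    doc_term: (num_docs, vocab) nonnegative term-weight matrix.
    query_terms: list of vocabulary indices (at least two terms).
    """
    centroids = []
    for skip in range(len(query_terms)):
        alt = [t for i, t in enumerate(query_terms) if i != skip]
        scores = doc_term[:, alt].sum(axis=1)      # stand-in retrieval score
        top = np.argsort(-scores)[:top_k]          # top_k documents
        weights = scores[top]
        if weights.sum() <= 0:
            continue                               # nothing retrieved
        # Score-weighted centroid of the retrieved documents.
        centroids.append(weights @ doc_term[top] / weights.sum())
    C = np.stack(centroids, axis=1)                # (vocab, num_alt_queries)
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    q = U[:, 0]
    # Singular vectors are defined up to sign; prefer the nonnegative side.
    return q if q.sum() >= 0 else -q

rng = np.random.default_rng(0)
doc_term = rng.random((50, 12))
q = augmented_query(doc_term, query_terms=[0, 3, 7])
```

The resulting vector `q` would be used in place of (or alongside) the original query when scoring documents.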
Ariadna Quattoni, Michael Collins, and Trevor Darrell. Learning Visual Representations using Images with Captions. CVPR 2007.
Quattoni et al. apply structural learning to the problem of image classification. They consider a setting in which they have a small number of labeled images (without captions) and a large number of unlabeled images which have captions. Using SIFT features as their image representation, they propose to create auxiliary problems from the captions for use in a structural learning task. They then use the resulting structural representation to perform semi-supervised learning on the original problem.