Gaussian Processes With Categorical Inputs

Abstract

Regression problems arise in a variety of contexts, including the development of Gaussian process models for computer simulators. Many approaches already exist for Gaussian process regression with continuous-valued inputs; however, many simulators (and observational data sets) contain both continuous and discrete-valued inputs. There are relatively few approaches to Gaussian process regression with mixed continuous and categorical inputs. These include treed Gaussian processes, Dirichlet processes with generalized linear models, and Gaussian processes that use a hypersphere parameterization. The aim of this work is to extend Gaussian process models so that they can use categorical inputs, e.g. someone’s occupation, {Student, Lecturer...}, alongside the usual continuous inputs. A naïve approach would be to fit an independent Gaussian process for each category, but this quickly becomes inefficient as the number of categories, and in particular the number of categorical inputs, increases. In this work we propose to model the categorical inputs by including a mapping from each categorical element to a continuous real value, and to learn this mapping using likelihood-based methods. The posterior distribution of the categorical mappings, and the relations between the mapped values, are expected to reflect the relative influence of the categories on the output. Using examples, we illustrate the learning dynamics of our method. We explore the strongly multi-modal nature of the posterior distributions for the mappings of the categorical data into real values. We contrast the plug-in estimators obtained using likelihood methods with a Bayesian approach using MCMC. Comparisons between our approach and other existing methods for categorical inputs are made on simple data sets.
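The idea of mapping each categorical element to a real value and scoring that mapping by likelihood can be illustrated with a small sketch. It is not the thesis implementation: the squared-exponential kernel, the choice to treat the mapped value as one extra input dimension, the fixed noise level, the toy data, and all function names are assumptions made purely for illustration.

```python
import math

def sq_exp_kernel(a, b, lengthscales, variance=1.0):
    """Squared-exponential covariance between two augmented input vectors."""
    d2 = sum(((ai - bi) / l) ** 2 for ai, bi, l in zip(a, b, lengthscales))
    return variance * math.exp(-0.5 * d2)

def augment(x_cont, category, mapping):
    """Append the real value assigned to the category to the continuous inputs."""
    return list(x_cont) + [mapping[category]]

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def neg_log_marginal_likelihood(X_cont, cats, y, mapping, lengthscales, noise=1e-2):
    """Negative log marginal likelihood of a zero-mean GP on the augmented inputs."""
    Z = [augment(x, c, mapping) for x, c in zip(X_cont, cats)]
    n = len(y)
    K = [[sq_exp_kernel(Z[i], Z[j], lengthscales) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Solve L v = y by forward substitution, so that y^T K^{-1} y = v^T v.
    v = []
    for i in range(n):
        v.append((y[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    quad = sum(vi * vi for vi in v)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return 0.5 * (quad + log_det + n * math.log(2.0 * math.pi))

# Toy data (made up): one continuous input and one categorical input.
X_cont = [[0.0], [0.5], [1.0], [0.0], [0.5], [1.0]]
cats = ["Student", "Student", "Student", "Lecturer", "Lecturer", "Lecturer"]
y = [1.0, 1.2, 1.1, -1.0, -1.1, -0.9]
lengthscales = [1.0, 1.0]  # one per continuous input, plus one for the mapped value

# Candidate mappings from category to a real value.
nll_separated = neg_log_marginal_likelihood(
    X_cont, cats, y, {"Student": 0.0, "Lecturer": 3.0}, lengthscales)
nll_collapsed = neg_log_marginal_likelihood(
    X_cont, cats, y, {"Student": 0.0, "Lecturer": 0.0}, lengthscales)
nll_swapped = neg_log_marginal_likelihood(
    X_cont, cats, y, {"Student": 3.0, "Lecturer": 0.0}, lengthscales)

print("separated:", nll_separated)
print("collapsed:", nll_collapsed)
print("swapped:  ", nll_swapped)
```

In this toy setting, the mapping that places the two categories far apart scores a lower negative log marginal likelihood than the one that collapses them onto the same value, because the two groups have clearly different outputs. The swapped mapping scores the same as the separated one, since a stationary kernel depends only on differences between mapped values. Under the assumptions of this sketch, that symmetry is one simple way for several equally good modes to appear in the likelihood surface for the mapping.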

Publication DOI: https://doi.org/10.48780/publications.aston.ac.uk.00044156
Additional Information: Copyright © Tulloch, Sean-Michael. 2012. Sean-Michael Tulloch asserts their moral right to be identified as the author of this thesis. This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without appropriate permission or acknowledgement.
Institution: Aston University
Last Modified: 15 May 2025 14:51
Date Deposited: 01 Sep 2022 15:37
Completed Date: 2012-10
Authors: Tulloch, Sean-Michael
