OSDS Workshop

Abstracts

Thomas Lumley (Department of Statistics, University of Auckland)

Talk title: Data Science: Will Computing and Informatics Eat Our Lunch Now?

Abstract: Mainstream statistics ignored computing for many years, so that students were taught to handle infinite N, but not N of a million. Practical estimation of conditional probabilities and conditional distributions in large data sets was often left to computer science and informatics. Although statistics started behind, we are catching up: many individual statisticians and some statistics departments are taking computing seriously. More importantly, applied statistics has a long tradition of understanding how to formulate questions: large-scale empirical data can tell you a lot of things, but not what your question is. I will talk about what statistics needs to do to avoid being left behind, but also what we have to offer. In particular, we understand sampling and understand interpretation and implications of models, which are helpful in avoiding Pointless, Creepy, or Evil data science. This is an update of the 2015 Lancaster Lecture.

Golbon Zakeri (Department of Engineering Science, University of Auckland)

Talk title: Analytics, optimization and economics of rain, wind and weather for the NZEM

Abstract: New Zealand has one of the pioneering electricity markets in the world, and in many instances other jurisdictions look to it to inform the development of their own electricity market regulation. In this talk we will provide an overview of a number of operations research and data analytics projects we have undertaken to improve the efficiency of the New Zealand Electricity Market (NZEM).

Tim Robinson (Department of Statistics, University of Wyoming)

Talk title: A Case Study in Data Science: Mobilizing Data for the Monitoring of Native Grass Composition in the Prairie Pothole Region

Abstract: The Prairie Pothole Region (PPR) is a 715,000 km² area in North America that is filled with wetlands, hills, and lakes formed by glaciers as they melted and moved through the area more than 10,000 years ago (https://en.wikipedia.org/wiki/Prairie_Pothole_Region). Over the last century, much of the acreage in the PPR has been converted to agricultural use. It is estimated that nearly half of the potholes have been drained for agricultural use and that, in some areas, nearly 90% of the potholes have disappeared. Effective management for conservation requires data-informed decision making and subsequent policy development within a complex system. For data to be of value for conservation efforts, thought must be given to the workflow required to mobilize the data effectively for decision makers. This talk will provide an overview of the workflow being used to mobilize data for conservation managers tasked with adaptive management of the PPR. As part of this talk, I will also discuss the importance of giving careful thought to the sampling design (selection of sampling units as well as a power analysis for trend detection) used for monitoring native prairie health objectives.

Miriam Hodge (Faculty of Agriculture and Life Sciences, Lincoln University)

Talk title: Avoiding extreme outcomes in transport

Abstract: Goods can experience a wide variety of conditions during transport. The conditions that the goods experience result from the interaction of localised conditions inside the vehicle and external environmental conditions. Once an extreme outcome, such as the death of livestock, has occurred, it is not possible to recover the goods. The first objective of this study is to identify the thresholds for extreme conditions. The second objective is to determine conditions under our control that can be altered to avoid extreme conditions before they occur.

Kourosh Neshatian (Computer Science and Software Engineering, University of Canterbury)

Talk title: Which learning algorithm is best?

Abstract: When reading the literature or exploring software libraries on supervised machine learning, one often sees a wide spectrum of algorithms ranging from those that produce linear models or decision trees to Support Vector Machines and deep learning. One natural question to ask is whether there is any (partial) order between these algorithms, particularly from the perspective of optimisation or generalisation over large classes of problems. In this talk we review a few formal frameworks that try to address this question.

Blair Robertson (Department of Mathematics and Statistics, University of Canterbury)

Talk title: Connections between local non-smooth optimization and global optimization

Abstract: In this talk, we show that there is a connection between local non-smooth optimization and global optimization. This connection is then used to define an algorithm for non-smooth local optimization. The method forms a partition on the search region using training observations and classification trees. The partition defines regions where the objective function is relatively low based on observed values. Further points are selected from these low regions, before a new partition is formed. Alternating between partition and sampling phases provides an effective method for local optimization. The sequence of iterates generated by the algorithm can be shown to converge to an essential local minimizer with probability one under mild conditions.
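
As a rough illustration of the alternating partition and sampling phases described in this abstract, the Python sketch below labels the lowest-valued observations as "low", fits a small axis-aligned classification tree to those labels, and draws new points from the leaves classified as low. It is a toy under stated assumptions, not the speaker's algorithm: the function names (`gini`, `low_boxes`, `partition_sample_minimize`), the Gini splitting rule, the 20% "low" fraction, the tree depth, and the uniform choice among low leaves are all hypothetical choices made for this example, and the sketch carries none of the convergence guarantees mentioned in the abstract.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def low_boxes(points, labels, box, depth):
    """Split `box` with axis-aligned cuts (a small classification tree) and
    return the leaf boxes whose training points are mostly labelled low (1)."""
    n = len(points)
    majority_low = n > 0 and 2 * sum(labels) >= n
    if depth == 0 or n < 4 or sum(labels) in (0, n):
        return [box] if majority_low else []
    best = None
    for d in range(len(box)):
        values = sorted(set(p[d] for p in points))
        for a, b in zip(values, values[1:]):
            cut = 0.5 * (a + b)  # candidate cut halfway between neighbours
            left = [l for p, l in zip(points, labels) if p[d] <= cut]
            right = [l for p, l in zip(points, labels) if p[d] > cut]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, d, cut)
    if best is None:  # all points coincide, so no cut is possible
        return [box] if majority_low else []
    _, d, cut = best
    leaves = []
    for go_left in (True, False):
        pairs = [(p, l) for p, l in zip(points, labels) if (p[d] <= cut) == go_left]
        sub = list(box)
        sub[d] = (box[d][0], cut) if go_left else (cut, box[d][1])
        leaves += low_boxes([p for p, _ in pairs], [l for _, l in pairs], sub, depth - 1)
    return leaves


def partition_sample_minimize(f, bounds, iters=10, batch=20, low_frac=0.2, seed=0):
    """Alternate partition and sampling phases; return the best point and value seen."""
    rng = random.Random(seed)

    def sample(box):
        return tuple(rng.uniform(lo, hi) for lo, hi in box)

    pts = [sample(bounds) for _ in range(batch)]
    vals = [f(p) for p in pts]
    for _ in range(iters):
        # Partition phase: label the lowest fraction of observed values as low.
        cutoff = sorted(vals)[max(1, int(low_frac * len(vals))) - 1]
        labels = [1 if v <= cutoff else 0 for v in vals]
        boxes = low_boxes(pts, labels, list(bounds), depth=4)
        # Sampling phase: draw new points from the low leaves
        # (fall back to the whole search region if no leaf is mostly low).
        for _ in range(batch):
            p = sample(rng.choice(boxes)) if boxes else sample(bounds)
            pts.append(p)
            vals.append(f(p))
    i = min(range(len(pts)), key=vals.__getitem__)
    return pts[i], vals[i]
```

For example, calling `partition_sample_minimize(lambda p: abs(p[0] - 1.0) + abs(p[1] + 0.5), [(-5.0, 5.0), (-5.0, 5.0)])` applies the sketch to a simple non-smooth objective whose minimizer is at (1, -0.5). Choosing uniformly among low leaves, rather than in proportion to their volume, is a simplification that tends to concentrate samples in small leaves.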