John Healy

I'm a mathematician and data scientist at the Tutte Institute for Mathematics and Computing (TIMC). I enjoy identifying the fundamental mathematical problems that underlie a variety of real-world problems and working with a team to design algorithms to solve them. I then enjoy closing the loop by bringing those solutions back to clients and helping them understand how to use them to make a difference.
I have worked with a wide variety of machine learning and statistical techniques over the years, from neural networks to relational or graph analytics. Most recently, my work has focused on unsupervised learning, specifically clustering, outlier detection, dimension reduction, and interactive data visualization.
Current research and/or projects
Most recently, the projects I have been most heavily involved in are the development of a fast version of the density-based clustering algorithm HDBSCAN, which is currently in scikit-learn-contrib; the invention and development of a dimension reduction algorithm called Uniform Manifold Approximation and Projection (UMAP); and a Python library for vectorizing variable-length sequences called Vectorizers.
The theme of my current work is developing a solid, practical pipeline for vectorizing, exploring, and labelling data within low-dimensional interactive maps. I have a particular interest in cyber defense data.
Professional activities / interests
I am actively involved in promoting and assisting with data science during the cyber defense Geek Week workshops.
Key publications
- McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018
- Etienne Becht, Leland McInnes, John Healy, Charles-Antoine Dutertre, Immanuel WH Kwok, Lai Guan Ng, Florent Ginhoux, Evan W Newell, Dimensionality reduction for visualizing single-cell data using UMAP, Nature Biotechnology 37, 38–44 (2019)
- McInnes, L, Healy, J, Accelerated Hierarchical Density Based Clustering, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, pp. 33-42, 2017
- Xin Li, Ondrej E. Dyck, Mark P. Oxley, Andrew R. Lupini, Leland McInnes, John Healy, Stephen Jesse & Sergei V. Kalinin, Manifold learning of four-dimensional scanning transmission electron microscopy, npj Computational Materials 5, 5 (2019)
- M. Dewar, J. Healy, X. Perez-Gimenez, P. Pralat, J. Proos, B. Reiniger and K. Ternovsky, "Subhypergraphs in non-uniform random hypergraphs", Internet Mathematics, 2018 (2018).
- M. Dewar, J. Healy, X. Perez-Gimenez, P. Pralat, J. Proos, B. Reiniger and K. Ternovsky, "Subhypergraphs in non-uniform random hypergraphs", Lecture Notes in Computer Science, vol. 10088: Proceedings of the 13th International Workshop on Algorithms and Models for the Web Graph, eds. Anthony Bonato, Fan Chung Graham and Pawel Pralat, Springer, New York, 2016.