Adam Ashenfelter

Co-founder and hacker at BigML (Corvallis, OR)

I've been a co-founder and hacker at BigML since 2011. Before that I worked on recommender systems and on applying Dynamic Bayesian Networks to assorted problems (mostly in the defense world). More recently at BigML I've spent my time implementing various ML techniques in a scalable fashion as part of our web service (trees, ensembles, g-means and fancy k-means, isolation forests, etc.). I also spend time working on visualizations and a few open source projects, which is what this page covers.

Open Source Projects

A few BigML open source projects where I've acted as the main contributor. They're all data-oriented, offering techniques for summarizing or sampling over large (or streaming) data sources.

Sketching Algorithms in Clojure

Various probabilistic hashing/sketching algorithms in Clojure (Bloom filters, min-hash, HyperLogLog, count-min). Useful for making compact and mergeable summaries of streaming data. Handy for distributed systems and even for ML tasks (specifically min-hash).
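To give a feel for what "mergeable" means here, below is a minimal Bloom filter sketch in Python. It is not the Clojure library's API: the class name, the default sizes, and the SHA-256-based hashing are all assumptions chosen to keep the example short. The point is that two filters built separately (say, on different machines) can be combined with a bitwise OR.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical names; illustration only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bitset

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def merge(self, other):
        # Merging only makes sense for filters with identical parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("cannot merge filters with different parameters")
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


# Two summaries built independently, then combined.
left, right = BloomFilter(), BloomFilter()
left.add("apple")
right.add("banana")
combined = left.merge(right)
```

After the merge, both "apple" and "banana" test as present in `combined`, since a Bloom filter never produces false negatives for items that were added.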

Random Sampling

A Clojure library for random sampling. Includes simple in-memory sampling, reservoir sampling, and stream-oriented sampling, along with options for with-replacement, without-replacement, weighting, seeds, and more. See the blog post and check out the code.
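For readers who haven't met reservoir sampling, here is a rough Python sketch of the classic version (often called Algorithm R). It is only an illustration of the idea, not the library's code or API; the function name and the `seed` parameter are made up for this example. It keeps a uniform sample of `k` items from a stream of unknown length, without replacement, using memory proportional to `k`.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return up to k items drawn uniformly, without replacement, from stream."""
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), 10, seed=42)
```

The stream is read exactly once and never held in memory, which is what makes the approach a good fit for large or streaming sources.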

Fun Projects

A selection of my ever-changing side projects. These ones I actually finished... or at least came close enough to be presentable!

Cycling and Public Transit in PDX

My wife and I have been dreaming of Portland. So I collected oodles of data to get a better grasp of which areas are well connected to downtown, grocers, and schools. The resulting app lets you set the filters to find the best connected homes. Beware, the page takes a while to load and I've done almost no testing for mobile devices.

Gerrymandering Grades

Grades each state and individual congressional district based on how badly the districts appear to be gerrymandered. Grades are calculated for both the 2009 (pre-census) and 2013 (post-census) districts. The grades are found using the ratio of a district’s area to the area of its convex hull. See the fancy viz and the code used to generate it.
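The area-to-convex-hull ratio is easy to sketch. The Python below is an illustration under simplifying assumptions, not the code behind the grades: it treats a district as a single simple polygon in planar coordinates (real boundaries would need a map projection and may have multiple parts), and the function names are invented for this example. A ratio near 1 means a compact shape, while lower values mean more of the hull is left unfilled. How ratios map to letter grades is not shown here.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points):
    """Convex hull via Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def hull_ratio(polygon):
    """Polygon area divided by the area of its convex hull (between 0 and 1)."""
    return polygon_area(polygon) / polygon_area(convex_hull(polygon))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square_ratio = hull_ratio(square)    # a convex shape fills its hull entirely
l_shape_ratio = hull_ratio(l_shape)  # the notch lowers the ratio
```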

Congressional Partisanship

Tracks the current and historical levels of partisanship in the voting behavior of the US Congress. You guessed it, today's climate isn't pretty. Check out the code (kinda ancient) or the write-up (updated with current data).

Visualizations

Demos of some of the visualizations I've created at BigML. These all use the fantastic D3 library. Many of them are now in production, but others are still being polished for prime time. See more on my bl.ocks.org page.

Dynamic Heatmap

An experimental heatmap that supports both numeric and categorical features, automatically selects the grid scale, uses color to code a third dimension, and shades according to density.

Dynamic Scatterplot

A scatterplot that supports both numeric and categorical features, clumps together nearby points, uses color as a third dimension, and has pretty transition animations to boot. The iris dataset is a classic example, but I think the abalone and autos examples are more interesting to explore.

t-SNE + LDA Topics

LDA is nice for transforming unstructured text into a set of numeric features, or topics. t-SNE gives us a nice way to plot those topics so that closely related topics tend to be near one another.