# Feature finding using an Indian Buffet Process

A demonstration of using Indian Buffet Process for feature finding. Following A Nonparametric Bayesian Method for Inferring Features From Similarity Judgments. Navarro and Griffiths. NeurIPS 2006.

Extensions and imports for this Literate Haskell file
``{-# LANGUAGE ScopedTypeVariables #-}``
``module AdditiveClusteringDemo where``
``````import LazyPPL
import Distr
import Distr.IBP
import Distr.Memoization``````
``````import Data.List
import Data.Monoid``````
``````import Data.Array
import Data.Maybe
import Graphics.Matplotlib``````

Additive clustering is a method for inferring features over a set of of objects using similarity data. The nature of the features is unknown, and the number of possible features is unknown, and inferred using the Indian Buffet Process. Each feature has an associated “saliency” weight indicating how much it contributes to the similarity coefficient. We also infer the saliency weights.

## Data set

``data Country = China | Cuba | Germany | Indonesia | Iraq | Italy | Jamaica | Japan | Libya | Nigeria | Philippines | Russia | Spain | US | Vietnam | Zimbabwe deriving (Eq, Ord, Show, Enum, Bounded, Ix)``
``````countries :: [Country]
countries = [China .. Zimbabwe]``````
``similarityData :: Array (Country,Country) Double``
Data from experiment
``````similarityData = array ((minBound,minBound),(maxBound,maxBound)) [((i,j),if i > j then (countriesDataset !! (fromEnum i) !! (fromEnum j)) else if i == j then 1 else (countriesDataset !! (fromEnum j) !! (fromEnum i))) | i <- countries , j <- countries]
where
countriesDataset :: [[Double]] =
[[ ],
[0.11],
[0.06, 0.04],
[0.43, 0.06, 0.03],
[0.06, 0.32, 0.04, 0.14],
[0.02, 0.09, 0.70, 0.02, 0.03],
[0.02, 0.59, 0.02, 0.14, 0.04, 0.10],
[0.69, 0.01, 0.26, 0.35, 0.03, 0.06, 0.03],
[0.03, 0.32, 0.01, 0.04, 0.70, 0.04, 0.11, 0.01],
[0.01, 0.12, 0.01, 0.04, 0.20, 0.03, 0.31, 0.01, 0.45],
[0.42, 0.12, 0.01, 0.87, 0.09, 0.02, 0.17, 0.31, 0.05, 0.04],
[0.51, 0.35, 0.55, 0.01, 0.13, 0.22, 0.02, 0.17, 0.05, 0.02, 0.03],
[0.02, 0.37, 0.58, 0.03, 0.04, 0.90, 0.20, 0.04, 0.04, 0.03, 0.04, 0.15],
[0.30, 0.11, 0.42, 0.03, 0.06, 0.20, 0.12, 0.46, 0.02, 0.04, 0.01, 0.43, 0.20],
[0.60, 0.12, 0.03, 0.55, 0.12, 0.01, 0.05, 0.45, 0.10, 0.03, 0.57, 0.08, 0.02, 0.12],
[0.01, 0.08, 0.01, 0.11, 0.15, 0.02, 0.29, 0.01, 0.31, 0.83, 0.08, 0.01, 0.02, 0.01, 0.03]]``````

Data set taken from Representing Stimulus Similarity, PhD thesis, D. J. Navarro. Department of Psychology University of Adelaide. December, 2002

The Indian Buffet Process provides abstract types `Restaurant` and `Dish`, and the following interface:

• `newRestaurant :: Double -> Prob Restaurant` opens a new restaurant;

• `newCustomer :: Restaurant -> Prob [Dish]` assigns a finite set of dishes to a new customer.

Our additive clustering models has three parameters (`alpha`, `lambda1` and `lambda2`) and produces a mapping from countries to lists of features. We use an Indian Buffet Process abstraction, so we imagine that each country arrives as a customer at a buffet, and takes some dishes, which are the features. We can then assess which countries took similar dishes to see which countries are similar.

``````additiveClustering :: Double -> Double -> Double
-> Array (Country,Country) Double -> Meas (Array Country [Dish])``````

(Recall that in Haskell, the type `(Array a b)` comprises `a`-indexed arrays of things of type `b`.)

``````additiveClustering alpha lambda1 lambda2 similarityData = do
(restaurant :: Restaurant) <- sample \$ newRestaurant alpha``````

Each dish is randomly assigned a weight, which is how important it is to the similarities between countries.

``    (weights :: Dish -> Double) <- sample \$ memoize (\d -> gamma lambda1 lambda2)``

Each country is assigned a list of dishes, by regarding it as a new customer in the restaurant.

``````    (features :: Array Country [Dish])
<- listArray (minBound,maxBound) <\$>
(replicateM (fromEnum (maxBound :: Country) + 1) \$
sample \$ newCustomer restaurant)``````

We define a function that calculates the similarity of two countries given the dishes that they share.

``````    let similarity :: Country -> Country -> Double
similarity i j = sum [weights a | a <- features!j, a `elem` features!i]``````

We score based on how closely this similarity matches the experimental data.

``    mapM_ (\(i, j) -> score \$ normalPdf (similarityData!(i,j)) 0.1 (similarity i j)) [ (i, j) | i <- countries, j <- countries, j < i ]``

We return the feature assignment.

``    return features``

## Simulation

We sample from this using Metropolis Hastings simulation, and plot some MAP features. The columns represent the found features, with a coloured box if the country has that feature.

``````test = do
samples <- mhirreducible 0.2 0.01 \$ additiveClustering 2 6 0.3 similarityData
return \$ plotFeatureMatrix \$ maxap \$ take 100000 samples``````

This is a highly multi-modal distribution. For the three runs, we typically have different numbers of features, and they are different. But in each case they admit a human interpretation, for example geographic, history, development status.

Plotting routines
``````maxap samples =
let maxw = (maximum \$ map snd samples) in
fromJust \$ Data.List.lookup maxw \$ map (\(z,w) -> (w,z)) samples ``````
``````plotFeatureMatrix :: (Ix x,Enum x,Bounded x,Eq y) => Array x [y] -> Matplotlib
plotFeatureMatrix features = do
let ys = nub \$ concat \$ elems features
let matrix = [[ if elem y (features ! x) then fromJust (elemIndex y ys) else 0 | y <- ys] | x <- [minBound..maxBound] ]
matShow matrix @@ [o2 "cmap" \$ raw "twilight"] % yticks (map fromEnum countries) % ytickLabels (map (raw . show) countries) ``````
``````combinePlots filename fs = do
putStrLn \$ "Generating " ++ filename ++ "..."
file filename \$ foldl (\p i -> p % setSubplot i % fs !! i) (subplots @@ [o2 "ncols" (length fs), o2 "sharey" True, o2 "gridspec_kw" \$ lit "{'width_ratios': [1, 1, 1]}"]) [0..(length fs -1)]
putStrLn "Done."``````
``main = do { ps <- replicateM 3 test ; combinePlots "images/additiveclustering-matrix.svg" ps }``

Generated by Pandoc from Literate Haskell. Full source at Bitbucket.