# hotspotr: hotspot mapping in R

08 Jul 2014**Updated 6/17/2016: A new version of hotspotr is available for download here. The new version is a standalone program with a number of improvements and better performance than the R package discussed here, which still works but is no longer maintained.**

A very alpha version of a package I put together to do hotspot mapping in R is now available on github. All you to get started is a a dataset with x,y coordinates and markers for case vs. control status (i.e. home locations of diseased vs. non-diseased individuals).

Although there are already great packages out there to do this kind of thing (i.e. sparr and spatstat), I put this together to make it easy to compare between different algorithms and also to facilitate plotting the hotspot map on top of a geographic map using the `ggmap`

package.

The following is a short demo of how the package can be used to create a very simple hotspot map using the algorithm of your choosing to calculate local case densities (i.e., distance-based mapping, kernel density estimation, etc.). In the next few weeks, I hope to post a tutorial or two talking about these methods and potential applications in more depth.

## Demo

To install, make sure you have the `devtools`

package installed and loaded and run:

Import `hotspotr`

:

Generate a set of (x,y) points in the unit square:

Using the `random_hotspot`

function in `hotspotr`

, place an area of increased risk in the center of the square. In this case, we’ll select an area covering the middle 30% of the unit square where 80% of individuals within this are are cases and only 20% outside of it are cases, for a relative risk of 4:

Create a new data frame with case points labeled as z = 1 and controls as z = 0:

We can plot this and see anecdotally that there is a greater density of cases (represented by triangles) in the center:

We can verify whether this area of increased density is statistically significant using the `hotspot_map`

function in `hotspotr`

.

The first argument to `hotspot_map`

is the data frame with columns `x`

, `y`

, and `z`

with x and y coordinates and case/control designations, respectively. The second argument is the density estimation method to use (currently only the distance-based-mapping method of Jeffery et al. is supported (as the function `dbm_score_rr`

).

User-defined density functions can easily be written. All that is required is a function of the form `fn(hs)`

that returns a density measure at each (x,y) point in the `hs`

dataframe. `p`

specifies the width of the smoothing window, and `color_samples`

specifies the number of random permutations of case/control designations to use when generating the color scale for the map:

We can then plot the resulting map and see that in fact the area at the center of the map represents a likely hotspot: