Showing posts with label statistics. Show all posts

Wednesday, 7 September 2016

rpostgis and RQGIS: two useful statistical tools for archaeologists


Hi all.
Just a short post to announce the recent release of two R packages: rpostgis (see also here) and RQGIS (see also here and here). The former facilitates the transfer between PostGIS "Geometry" objects (stored in PostgreSQL databases) and R spatial objects; the latter establishes an interface between R and QGIS and gives the user access to the many QGIS geoalgorithms from within R.
I tested them briefly and I think they are very useful tools for performing and simplifying statistical and geostatistical analyses in archaeological contexts. Here I present a quick usage example.
First, I imported a set of archaeological site points stored in a PostgreSQL/PostGIS database. This is very simple with the rpostgis package: it is enough to create a database connection (as in the RPostgreSQL package) and call the "pgGetGeom" function.
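A minimal sketch of this import step could look like the following. It is not the exact code I ran: the helper name import_sites, the database name, the credentials and the table name are hypothetical placeholders, and the RPostgreSQL, DBI and rpostgis packages are assumed to be installed.

```r
# Sketch only: import_sites() and all connection details are hypothetical.
import_sites <- function(dbname, user, password,
                         host = "localhost", port = 5432,
                         table = c("public", "sites"), geom = "geom") {
  # Open the connection with the RPostgreSQL driver
  conn <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                         dbname = dbname, host = host, port = port,
                         user = user, password = password)
  # Close the connection when the function exits, even after an error
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  # Read the geometry column of schema.table into an R spatial object
  rpostgis::pgGetGeom(conn, name = table, geom = geom)
}

# Example call (not run here, it needs a live PostGIS database):
# sites <- import_sites("archaeo_db", "myuser", "mypassword")
```

Wrapping the steps in a function keeps the connection handling in one place; the packages are only needed when the function is actually called.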

Then I used the RQGIS package to run (within R) the QGIS geoalgorithm that builds a polygon from the layer extent. After setting the same parameters required by QGIS, the function "run_qgis" creates a red polygon around the outermost points of my dataset.
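To make clear what this geoalgorithm computes, the layer extent can also be sketched in base R without calling QGIS. This is not the RQGIS call described above: the coordinates and the helper name extent_polygon are invented for illustration.

```r
# Hypothetical site coordinates (projected units, e.g. metres)
sites_xy <- cbind(x = c(660500, 662300, 664100, 661200, 663700),
                  y = c(5100400, 5102900, 5101100, 5104600, 5103300))

# Return the closed ring (5 vertices, first = last) of the bounding box
extent_polygon <- function(xy) {
  xr <- range(xy[, 1])
  yr <- range(xy[, 2])
  cbind(x = c(xr[1], xr[2], xr[2], xr[1], xr[1]),
        y = c(yr[1], yr[1], yr[2], yr[2], yr[1]))
}

ext <- extent_polygon(sites_xy)
print(ext)

# Draw points and extent when working interactively
if (interactive()) {
  plot(sites_xy, asp = 1, pch = 16)
  polygon(ext, border = "red")
}
```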


However, we must be careful about the version of QGIS we are using. With 2.14 there is no problem, but if you are using 2.16.1 or 2.16.2 (like me) you must modify the QGIS file "AlgorithmExecutor.py" (the path for Linux users should be: "/usr/share/qgis/python/plugins/processing/AlgorithmExecutor.py") as described in the web page. This problem should be fixed by the QGIS core team in the near future.
Finally, I performed a specific point pattern analysis with the data imported and created by our two packages: in this example I calculated Ripley's K function (for an archaeological example see here) in order to identify the distribution model (random, regular or clustered) of my archaeological sites.
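The idea behind Ripley's K can be illustrated with a naive estimator written in base R. This is only a sketch under stated assumptions: it applies no edge correction (dedicated packages such as spatstat provide corrected estimators), it is not necessarily the implementation used for the plot in this post, and the two point sets and the function name ripley_k are invented for illustration. Values of K(r) above pi * r^2 suggest clustering, values below suggest a regular pattern.

```r
# Naive Ripley's K without edge correction:
# K(r) = area / (n * (n - 1)) * number of ordered pairs (i, j), i != j, with d_ij <= r
ripley_k <- function(xy, r, area) {
  n <- nrow(xy)
  d <- as.matrix(dist(xy))   # pairwise Euclidean distances
  diag(d) <- Inf             # exclude each point's distance to itself
  sapply(r, function(ri) area * sum(d <= ri) / (n * (n - 1)))
}

# Hypothetical patterns in a 10 x 10 window (area = 100)
xy_clustered <- cbind(x = c(1.0, 1.1, 1.2, 1.1, 8.0, 8.1, 8.2, 8.1),
                      y = c(1.0, 1.1, 1.0, 0.9, 8.0, 8.1, 8.0, 7.9))
xy_regular <- expand.grid(x = seq(1, 9, by = 2), y = seq(1, 9, by = 2))

r <- c(0.5, 1, 1.5)
k_clustered <- ripley_k(xy_clustered, r, area = 100)
k_regular   <- ripley_k(xy_regular,   r, area = 100)

# Compare with the expectation under complete spatial randomness, pi * r^2
print(data.frame(r, k_clustered, k_regular, csr = pi * r^2))
```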


In my opinion these two new R packages make traditional spatial analyses in R easier and faster, and facilitate a closer integration between GIS, geodatabases and statistics.
Bye.

Denis Francisci

Wednesday, 6 February 2013

Financial Candlestick chart for archaeological purposes: preliminary tests

A candlestick chart is a plot used primarily in finance to describe the price movements of a quoted stock, derivative or currency over time. It is a combination of a line chart and a bar chart:

Chart made with the R package "quantmod"

the wick (i.e. one or two lines coming out from the polygon) illustrates the highest and lowest traded prices during the time interval represented; the body (i.e. the polygon) illustrates the opening and closing trades. Candlestick charts look similar to box plots, but they are totally different (source: Wikipedia, 05/02/2013).
I have often seen this kind of chart in newspapers or on TV, but only now have I tried to understand how it works; and so I had the “crazy” idea of applying candlestick charts to archaeological data.
More specifically, I thought about archaeological finds. For many of them, in particular for ceramic types, we know a starting date (i.e. the period in which a specific production begins), a time range of maximum diffusion (defined by an initial and a final moment), and an end date after which there are no more traces of the object.
If we replace the 4 financial values (highest, lowest, opening and closing price) with these 4 chronological values (starting date, initial and final moments of the range of maximum diffusion, end date), we can profitably use the candle plot to describe the life span of each archaeological material found in a stratigraphic unit (US); the goal is to date the US itself by comparing the candlesticks of all the materials it contains.

In R, candlestick charts are provided by the quantmod package, Quantitative Financial Modelling & Trading Framework for R (http://cran.r-project.org/web/packages/quantmod/index.html). But this package is designed specifically for financial purposes and requires specific data types like time series (xts), so I put aside the idea of using quantmod and tried to build a new R function for plotting candlesticks with non-financial data.
This is a first (simplified) example of my preliminary tests (for which I have to thank the R-help mailing list: https://stat.ethz.ch/mailman/listinfo/r-help).

I built a table like this:

find, min, int_1, int_2, max
find_a, 250, 300, 400, 550
find_b, 200, 350, 400, 450
find_c, 350, 400, 450, 500
find_d, 250, 350, 450, 500
find_e, 200, 400, 500, 600

For each archaeological object (find_a, find_b, find_c, …) the table gives the starting date (min), the initial and final moments of the range of maximum presence (int_1, int_2) and the end date (max), all in approximate years.
I plotted this data frame in R using the “with()” function, which evaluates an expression in an environment built from the data, so that the columns can be referred to by name. Here is the source code:

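# Read the comma-separated table of finds shown above
US1 <- read.table("../example.txt", header=TRUE, sep=",")
# One candle per find: box = range of maximum diffusion (int_1 to int_2), whiskers = starting (min) and end (max) dates
# seq_along(find) is used for the x positions so that the call works whether "find" is read as a factor or as text
with(US1, symbols(seq_along(find), (int_1 + int_2)/2, boxplots=cbind(.4,int_2-int_1,int_1-min,max-int_2,0), inches=FALSE, ylim=range(US1[,-1]), xaxt="n", ylab="Years (AD)", xlab="Findings", main="Findings chronological distribution of US 1", fg="brown", bg="orange", col="brown"))
axis(1,seq_along(US1$find),labels=US1$find)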

and here is the result:


Analyzing this plot, it is possible to deduce that layer US1 probably dates back to the first half of the 5th century A.D.; the materials find_a and find_b could be residual finds from earlier periods.


As I said, this is just a simple example, but the potential is clear. This method makes it possible to plot the duration of archaeological materials and to compare the dating of the objects found in stratigraphic units in order to assign them a chronology. The statistical environment could provide other advantages such as probabilistic analyses, confidence intervals, etc., giving mathematical and statistical support to the usual (and often subjective) dating of archaeological layers.
The next steps will be to build a specific R function for the “archaeological” candle plot, starting from the simple code written above, and to test plotting the duration of archaeological finds with other statistical techniques like seriation, box plots, etc.
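A possible starting point for such a function is sketched here with base graphics only. It is a rough draft, not a finished tool: the function name candle_plot, its arguments and its return value are my own assumptions, and the data frame simply repeats the example table from this post.

```r
# Example table from this post, typed in directly
US1 <- data.frame(find  = c("find_a", "find_b", "find_c", "find_d", "find_e"),
                  min   = c(250, 200, 350, 250, 200),
                  int_1 = c(300, 350, 400, 350, 400),
                  int_2 = c(400, 400, 450, 450, 500),
                  max   = c(550, 450, 500, 500, 600))

# Hypothetical helper: one candle per row of d
candle_plot <- function(d, width = 0.4, ...) {
  x <- seq_len(nrow(d))
  # Empty plot with the right limits; the x axis is drawn later with find names
  plot(NA, xlim = c(0.5, nrow(d) + 0.5), ylim = range(d$min, d$max),
       xaxt = "n", xlab = "Findings", ylab = "Years (AD)", ...)
  # Wick: from the starting date to the end date
  segments(x, d$min, x, d$max, col = "brown")
  # Body: range of maximum diffusion
  rect(x - width / 2, d$int_1, x + width / 2, d$int_2,
       col = "orange", border = "brown")
  axis(1, at = x, labels = d$find)
  # Return the durations invisibly, in case they are useful afterwards
  invisible(data.frame(find = d$find,
                       body = d$int_2 - d$int_1,
                       life = d$max - d$min))
}

# Draw the chart when working interactively
if (interactive()) candle_plot(US1, main = "Findings chronological distribution of US 1")
```

Drawing the wick with segments() and the body with rect() keeps every graphical element under direct control, which should make it easier to add colours or labels per find later.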

Any suggestions, websites, literature and bibliographic references on this topic, as well as advice on R packages other than quantmod that provide candlestick charts for non-financial data, are welcome.

by Denis Francisci

Tuesday, 22 January 2013

manageR, a useful plugin for QGIS

manageR is a QGIS plugin providing a simple and useful interface to the R statistical programming environment (http://www.r-project.org/). It was created by Carson J. Q. Farmer (http://www.ftools.ca/manageR) and can be downloaded from this repository: http://www.ftools.ca/cfarmerQgisRepo.xml. To install it in QGIS, it is enough to add this repository to the QGIS Python Plugin Installer (Plugins → Fetch Python Plugins).

One of the most interesting things is that you can take data directly from the .dbf table of a shapefile layer loaded in QGIS and process them in the R environment. Usually, when I work with PostgreSQL/PostGIS or SQLite/SpatiaLite to manage the attribute tables of vector layers, I connect the database directly to R using the RODBC or RSQLite packages. But if I have to use shapefiles and their .dbf tables, manageR can be a good solution, especially for quick and simple jobs.

Here I would like to present a small example of the plugin's use. In QGIS I created a distribution map of Roman funerary sites in the Trentino-Alto Adige region (Northern Italy). The sites (blue dots) are recorded in a simple shapefile and every point is associated with a record stored in a .dbf table. As usual, the .dbf table is divided into several columns, each of which contains a different attribute of the sites (ID, coordinates, height, date, etc.).


I need to plot a histogram of heights above sea level to get an immediate view of how the sites are distributed by height. I can launch manageR from QGIS.


At first sight, manageR is a simple GUI that includes the R command line, some toolbars for managing data, graphic devices, history, etc. and several buttons for running some of the most common statistical analyses.
As I said, in manageR I can import layer attributes with the button “Action → Import Layers Attribute” (or CTRL+T) and then select the column I need (in my case, “height”) using the R language.


By typing in the R command line or using the “Analysis” button in the main toolbar, I can select and launch the statistical function I need and plot the diagram; in my example I plotted a histogram of the heights a.s.l. of my funerary sites.
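In plain R terms, the column selection and the histogram come down to something like the sketch below. The data frame sites and its height values are invented stand-ins for the imported attribute table; outside manageR, a .dbf table could be read with foreign::read.dbf instead.

```r
# Hypothetical attribute table: heights above sea level in metres
sites <- data.frame(id = 1:10,
                    height = c(210, 250, 265, 480, 520, 560, 610, 930, 1040, 1250))

# Select the column of interest
heights <- sites$height

# Compute the histogram in 250 m classes without drawing it
h <- hist(heights, breaks = seq(0, 1500, by = 250), plot = FALSE)
print(h$counts)

# Draw it when working interactively
if (interactive()) {
  plot(h, main = "Funerary sites by height a.s.l.",
       xlab = "Height (m a.s.l.)", col = "grey")
}
```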



This is a simple example, but the manageR plugin could be a very useful tool for archaeologists, also for more complex work. Its main advantage is that it works directly with the .dbf table, avoiding the need to export the data or open the .dbf file in Calc/Excel.

by Denis Francisci

Friday, 27 July 2012

July 27, 2011 - July 27, 2012: a year of ATOR

Hello all,
today ATOR turns one year old, so I thought I would post and analyze some statistics to see how this experiment in shared research is progressing.
So far we have six active authors, who have written 79 posts. The community has reacted with 96 comments, although most of them were written by the authors in response to direct questions from readers. Overall the blog has counted 21484 visits (8695 visits since the activation of the Revolver Maps plugin, as you can see in the image below).


Today ATOR has 25 members and the general trend is still growing, but single posts may affect the statistics with an increase in visitors related both to the quality of the post and to the interest aroused by the topic. A good example of this is the post by Cicero Moraes about forensic facial reconstruction, which captured the attention of the communities of 3D modelers and physical anthropologists, reaching the peak in traffic you can see in June 2012 in the graph below.





The post also reached the attention of Ton Roosendaal (original creator of Blender), who wrote a tweet about it:


Anyway, the main strength of ATOR and of an open approach to research remains the active collaboration between researchers working in different fields (not only archaeologists). Examples are the new collaborations with the 3D artist Cicero Moraes (already mentioned) and with the anthropologist Moreno Tiziani, creator of the anthropological association Antrocom Onlus, which publishes the Online Journal of Anthropology.
We will go on working with this open philosophy in archaeology, inspired by the Free/Libre and Open Source Software movement, and we will try to further increase the quality of the posts in ATOR with the help of the community. As usual, if you want to collaborate, just contact us!
Thank you. 
This work is licensed under a Creative Commons Attribution 4.0 International License.