Showing posts with label Free/Libre and Open Source Software. Show all posts

Friday, 24 April 2015

Doing quantitative archaeology with open source software

This short post is written for archaeologists who frequently perform common data analysis and visualisation tasks in Excel, SPSS or similar commercial packages. It was motivated by my recent observations at the Society for American Archaeology meeting in San Francisco - the largest annual meeting of archaeologists in the world - where I noticed that the great majority of archaeologists use Excel and SPSS. I wrote this post to describe why those packages might not be the best choices, and to explain what one good alternative might be. There’s nothing specifically about archaeology in here, so this post is likely to be relevant to researchers in the social sciences in general. It’s also cross-posted on the Software Sustainability Institute blog.

Prevailing tools for data analysis and visualization in archaeology have severe limitations

For many archaeologists, the standard tools for any kind of quantitative analysis include Microsoft Excel, SPSS, and for more exotic methods, PAST. While these programs are widely used, they have a few limitations that are obvious to anyone who has worked with them for a long time, which raises the question of what alternatives are available. Here are three key limitations:
  • File formats: each program has its own proprietary format, and while there is some interoperability between them, we cannot open their files in any program that we wish. And because these formats are controlled by companies rather than a community of researchers, we have no guarantee that the Excel or SPSS file format of today will be readable by any software 10 or 20 years from now. 
  • Click-trails: the main interaction with these programs is by using the mouse to point and click on menus, windows, buttons and so on. These mouse actions are ephemeral and unrecorded, so that many of the choices made during a quantitative analysis in Excel are undocumented. When a researcher wants to retrace the steps of their workflow days, months or years after the original effort, they are dependent on their memory or some external record of many of the choices made in an analysis. This can make it very difficult for another person to understand how an analysis was conducted because many of the details are not recorded.
  • Black boxes: the algorithms that these programs use for generating results are not available for convenient inspection by the researcher. The programs are a classic black box, where data and settings go in, and a result comes out, as if by magic. For moderately complicated computations, this can make it difficult for the researcher to interpret their results, since they do not have access to all of the details of the computation. This black box design also limits the extent to which the researcher can customise or extend built-in methods to new applications.
How to overcome these limitations?

For a long time archaeologists had few options to deal with these problems because there were few alternative programs. The general alternative to using a point-and-click program is writing scripts to program algorithms for statistical analysis and visualisations. Writing scripts means that the data analysis workflow is documented and preserved, so it can be revisited in the future and distributed to others for them to inspect, reuse or extend. For many years this was only possible using ubiquitous but low-level computer languages such as C or Fortran (or exotic higher level languages such as S), which required a substantial investment of time and effort, and a robust knowledge of computer science. In recent years, however, there has been a convergence of developments that have dramatically increased the ease of using a high level programming language, specifically R, to write scripts to do statistical analysis and visualisations. As an open source programming language with special strengths in statistical analysis and visualisations, R has the potential to be a solution to the three problems of using software such as Excel and SPSS. Open source means that all of the code and algorithms that make the program operate are available for inspection and reuse, so that there is nothing hidden from the user about how the program operates (and the user is free to alter their copy of the program in any way they like, for example, to increase computation speed).
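As a rough illustration of what a scripted workflow looks like, here is a minimal sketch in base R. The data, the column names (context, length_mm) and the file paths are invented for this example; a real project would read its own plain-text files instead.

```r
# Hypothetical artefact measurements, invented for illustration only.
artefacts <- data.frame(
  context   = c("A", "A", "B", "B", "B"),
  length_mm = c(42.1, 38.5, 51.0, 47.3, 49.8)
)

# Store the data as CSV, an open plain-text format that any program can read.
csv_path <- tempfile(fileext = ".csv")
write.csv(artefacts, csv_path, row.names = FALSE)
measurements <- read.csv(csv_path)

# Each analytical choice is a line of code, so it is recorded and repeatable.
mean_by_context <- aggregate(length_mm ~ context, data = measurements, FUN = mean)
print(mean_by_context)

# Save the plot from the script instead of exporting it by hand.
plot_path <- tempfile(fileext = ".pdf")
pdf(plot_path)
boxplot(length_mm ~ context, data = measurements,
        xlab = "Context", ylab = "Length (mm)")
dev.off()
```

Running the same script again, next week or in ten years, repeats exactly the same steps, and anyone who receives the script can see every choice that was made.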

Three reasons why R has become easier to use

Although R was first released in 1993, it has only been in the last five years or so that it has really become accessible and a viable option for archaeologists. Until recently, only researchers steeped in computer science and fluent in other programming languages could make effective use of R. Now the barriers to getting started with R are very low, and archaeologists without any background with computers and programming can quickly get to a point where they can do useful work with R. There are three factors that are relevant to the recent increase in the usability of R, and that any new user should take advantage of:
  • the release of an Integrated Development Environment, RStudio, especially for R
  • the shift toward more user-friendly idioms of the language resulting from the prolific contributions of Hadley Wickham, and 
  • the massive growth of an active online community of users and developers from all disciplines.
1. RStudio

For the beginner user of R, the free and open source program RStudio is by far the easiest way to quickly get to the point of doing useful work. First released in 2011, it has numerous conveniences that simplify writing and running code, and handling the output. Before RStudio, an R user had little more than a blinking command line prompt to work with, and might struggle for some time to identify efficient methods for getting data in, running code (especially if more than a few lines) and then getting data and plots out for use in reports, etc. With RStudio, the barriers to doing these things are lowered substantially. The biggest help is having a text editor right next to the R console. The text editor is like a plain text editor (such as Notepad on Windows), but has many features to help with writing code. For example, it is code-aware and automatically colours the text to make it a lot easier to read (functions are one colour, objects another, etc.). The code editor has a comprehensive auto-complete feature that shows suggested options while you type, and gives in-context access to the help documentation. This makes spelling mistakes rare when writing code, which is very helpful. There is a plot pane for viewing visualisations and buttons for saving them in various formats, and a workspace pane for inspecting data objects that you've created. These kinds of features lower the cognitive burden of working with a programming language, and make it easier to be productive with a limited knowledge of the language.

2. The Hadleyverse

A second recent development that makes it easier for a new user to be productive using R is a set of contributed packages affectionately known in the R user community as the Hadleyverse. User contributed packages are add-on modules that extend the functionality of base R. Base R is what you get when you download R from r-project.org, and while it is a complete programming language, the 6000-odd user contributed packages provide ready-made functions for a vast range of data analysis and visualization tasks. Because the large number of packages can make discovering relevant ones challenging, they have been organised into 'task views' that list packages relevant to specific areas of analysis. There is a task view for archaeology, providing an annotated list of R packages useful for archaeological research. Among these user-contributed packages are a set by Hadley Wickham (Chief Scientist at RStudio and adjunct Professor at Rice University) and his collaborators that make plotting better, simplify common data analysis activities, speed up importing data into R (including from Excel and SPSS files), and improve many other common tasks. The overall result is that for many people, programming in R is shifting from the base R idioms to a new set of idioms enabled by Wickham's packages. This is an advantage for the new user of R because writing code with Wickham's packages results in code that is easier for people to read, as well as being highly efficient to compute. This is because it simplifies many common tasks (so the user doesn't have to specify exotic options if they don't want to), uses common English verbs ('filter', 'arrange', etc.), and uses pipes. Pipes mean that functions are written one after the other, following the order they would appear in when you explain the code to another person in conversation.
This is different from the base R idiom, which doesn't have pipes and instead has functions nested inside each other, requiring them to be read from the center (or inside of the nest) to the left (outside of the nest), and relies on temporary objects, which is a counter-intuitive flow for most people new to programming.
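To make the contrast concrete, here is a small sketch that does the same task in both idioms. It assumes the dplyr package is installed (it provides the %>% pipe along with filter, arrange and pull); the lithics data frame and its column names are invented for this example.

```r
library(dplyr)

# Hypothetical stone artefact data, invented for illustration only.
lithics <- data.frame(
  site         = c("Site1", "Site1", "Site2", "Site2", "Site2"),
  raw_material = c("chert", "quartz", "chert", "chert", "quartz"),
  mass_g       = c(12.4, 3.1, 8.9, 15.2, 2.7)
)

# Base R idiom: nested calls, read from the innermost function outwards.
nested <- sort(subset(lithics, raw_material == "chert")$mass_g,
               decreasing = TRUE)

# Piped idiom: read top to bottom, in the order you would describe it aloud.
# "Take the lithics, keep the chert pieces, sort by mass, extract the masses."
piped <- lithics %>%
  filter(raw_material == "chert") %>%
  arrange(desc(mass_g)) %>%
  pull(mass_g)

# Both approaches give the same vector of chert masses, heaviest first.
print(nested)
print(piped)
```

Neither version is more correct than the other; the piped form simply keeps the steps in reading order and avoids creating temporary objects along the way.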

3. Big open online communities of users

A third major factor in the improved accessibility of R to new users is the growth of active online communities of R users. There has long been an email list for R users, but more recently, user communities have formed around websites such as Stack Overflow. Stack Overflow is a free question-and-answer website for programmers using any language. The unique concept is that it gamifies the process of asking and answering questions, so that if you ask a good question (i.e. one that is well described and includes a small self-contained example of the code that is causing the problem), other users can reward your effort by upvoting your question. High quality questions can attract very quick answers, because of the size of the community active on the site. Similarly, if you post a high-quality answer to someone else's question, other users can recognise this by upvoting your answer. These voting processes make the site very useful even for the casual R user searching for answers (and who may not care for voting), because they can identify the high-quality answers by the number of votes they've received. It's often the case that if you copy and paste an error message from the R console into the Google search box, the first few results will be Q&A pages on Stack Overflow. This is a very different experience compared to using the r-help email list, where help can come slowly, if at all, and to searching the email list archives, where it's not always clear which is the best solution. Another useful output from the online community of R users is the set of blogs that document how to conduct various analyses or produce visualizations (some 500 blogs are aggregated at http://www.r-bloggers.com/). The key advantage of Stack Overflow and blogs, aside from their free availability, is that they very frequently include enough code for the casual user to reproduce the described results.
They are like a method exchange, where you can collect a method in the form of someone else's code, and adapt it to suit your own research workflow.

There's no obvious single explanation for the growth of this online community of R users. Contributing factors might include a shift from SAS (a commercial product with licensing fees) to R as the software used to teach students in many academic departments, following the Global Financial Crisis of 2008 that forced budget reductions at many universities. This led to a greater proportion of recent generations of graduates being R users. The flexibility of R as a data analysis tool, combined with the rise of data science as an attractive career path and the demand for data mining skills in the private sector, may also have contributed to the growing number of R users who are active online, since so many of the user contributed packages are focused on statistical analyses.

So What?

The prevailing programs used for statistical analyses in archaeology have severe limitations resulting from their corporate origins (proprietary file formats, uninspectable algorithms) and mouse-driven interfaces (impeding reproducibility). The generic solution is an open source programming language with tools for handling diverse file types and a wide range of statistical and visualization functions. In recent years R has become a very prominent and widely used language that fulfills these criteria. Here I have briefly described three recent developments that have made R highly accessible to the new user, in the hope that archaeologists who are not yet using it might adopt it as a more flexible and useful program for data analysis and visualization than their current tools. Of course it is quite likely that the popularity of R will rise and fall like that of many other programming languages, and ten years from now the fashionable choice may be Julia or something that hasn't even been invented yet. However, the general principle that scripted analyses using an open source language are better for archaeologists, and science generally, will remain true regardless of the details of the specific language.

Wednesday, 30 April 2014

The Austro-Hungarian emplacements on top of Mt. Roteck

(2390m)
Dolomites / South-Tyrol 

A case study for extensive survey and documentation on the occasion of the 100th anniversary of the beginning of WW1 on the Italian front in May 2015.

As reported on ATOR in summer 2013, Arc-Team is pushing ahead with the plan of mapping extensive areas of the high alpine frontline of WW1 from the Swiss border to the Dolomites.
Our approach consists of a very detailed DGPS survey, terrestrial structure from motion, geolocalized images, archaeological description and aerial survey by our drone.
Of course we are basing the whole working process on open source software and, where possible, also on open hardware.
Now we want to share the latest version of a presentation originally given at the 7th Fields of Conflict Conference in Budapest (Hungary) on 18-21 October 2012.
It outlines the characteristics of the high alpine working environment, the nature of the WW1 remains, the challenges to be met, our project strategy and first results.

Thursday, 17 October 2013

Augmented Reality at Cultways

Yesterday Arc-Team attended the workshop "Cultural tourism and mobile technologies" organized by Trentino Sviluppo in the city of Rovereto. The meeting was related to the European project Cultways and, although we had not participated in the work (already concluded), we were invited to show some related research we did for other institutions (especially the Soprintendenza per i Beni Librari, Archivistici e Archeologici di Trento).
Below is the poster we presented, made (as always) with Inkscape and GIMP:

The poster for the workshop

In addition to the poster, we prepared some Augmented Reality applications to show the potential of these techniques in Cultural Heritage.
The first installation we made is a prototype for the upcoming exhibition that should take place in Padua in November 2014, as a collaboration between the Anthropological Museum of the city, Antrocom Onlus, and Arc-Team. This event, called "Facce. I volti della storia umana", is the natural evolution of the Taung Project, and is ideally connected with the exhibition "Faces da Evolução" which took place in Curitiba (Brazil). Both exhibitions are intended to be "open source", as the data, the software and the know-how have been (and will be) shared through the net.
Here is a short clip of the application, which is based on the joint use of Augmented Reality and objects 3D printed with a RepRap (with the help of our friends at Kentstrapper):





The second installation we prepared is a prototype we developed for the Museum of Torre di Pordenone, to let tourists experience part of the Roman villa buried under the garden.



The third application concerns a pilot project we are working on for the Archaeological Office of the Provincia Autonoma di Bolzano Alto Adige / Autonome Provinz Bozen Suedtirol. The research is connected with a survey to map and document the WW1 remains in the territory. As an experimental stage of the same project we are testing some Augmented Reality tools to develop cultural sightseeing of the landscape, looking ahead to the commemoration of this event in 2014/2015. Currently we are considering the possibility of printing interactive maps like the prototype below:

Just an example of interactive map

Please note that the information displayed on the map is not geographically correct: it is just an example to show all the possible kinds of data which can be loaded (images, movies, 3D models).




Another option for visitors is interactive panels, like the one below...

A test for a panel


... in this case we just added a simple image, but, like the map before, we could load movies or 3D objects.




All the Augmented Reality applications were made with Free and Open Source Software. I will describe these tools in the next post, but you can already find some information here on ATOR (just search for this topic).


Monday, 8 October 2012

Kinect 3D indoor: excavation test

To complete the "Kinect trilogy", today I am writing this post about our first test during real archaeological fieldwork.
In this case too, we (Alessandro Bezzi and I) used our "hacked Kinect" with the external battery connected to the rugged PC and, again, the software chosen for data acquisition was RGBDemo. This time we documented a layer in 3D during an "indoor" excavation, to avoid the problems with direct sunlight I described in this post.
The video below tries to summarize this operation...




... and here are some screenshots to give an idea of the final result:

The pointcloud (frontal view)


The pointcloud (side view)

The mesh

The mesh (wireframe)

As you can see, the general quality is lower than the results we can obtain with other techniques (e.g. SfM and IBM), but Kinect and RGBDemo have the advantage of acquiring and processing the data almost simultaneously, with the possibility of seeing the documentation process in real time.
Ultimately Kinect is one more option to consider for 3D indoor documentation, depending on the peculiarities of the archaeological project (the light conditions, the available time, the required level of detail, etc.). Our experiments will now continue with some tests in particular situations where this technique could be the best option (especially in underground environments).
Have a nice day!

Tuesday, 17 July 2012

3D PDF skull restored

Hi,
today I am revisiting the post 3D PDF for archaeology, which had a broken link to the PDF file. The link is now restored (I took the time to check the data). If you download the PDF file and open it with Adobe Acrobat Reader 9 (we still have not found an open source PDF reader able to visualize u3d data), this is what you should see:


Sorry for the slow clip, but I had to virtualize Windows in ArcheOS (within VirtualBox) and the u3d file of the skull is too heavy to create a light 3D PDF file (it was just a first test).
I would like to thank the people who noticed the broken link. Their report was very helpful in reviewing ATOR's old posts!



Wednesday, 19 October 2011

Data sharing (Vervò webgis)

One of the main topics during the workshop in Ferrara concerned "open data", and in particular we talked about the problems of data sharing. It looks like the situation has not changed very much in recent years (at least in Italy): archaeological discussion and research inside the scientific community are still slowed by the difficulties official institutions have in releasing data (for many different reasons, not least of which is a general climate of suspicion between archaeologists).
Anyway, judging from our past experiences, we have to say that we were quite lucky, often finding (among our institutional partners) people who did not underestimate the problem and allowed us to share archaeological data in specific projects. The medium we normally choose for this purpose is the webgis.
The image below shows one of these projects, oriented to archaeological research and conservation in a small area (the territory of Vervò, in north-west Trentino, Italy).

 
The webgis was developed in 2009 by Giuseppe Naponiello using entirely Free and Open Source Software (soon Giuseppe will write a post with more technical information about it); the data come from the research of Alessandro Bezzi and are released with a Creative Commons license. The project was possible thanks to Dr. Nicoletta Pisu of the "Soprintendenza per i Beni Librari Archivistici e Archeologici di Trento".
Currently you can view the webgis here.

Monday, 22 August 2011

GNewgnewarchaeology

Good news from the University of Ferrara! A new workshop about Free/Libre and Open Source Software in archaeology will take place on 13-14 October 2011. More info on the official website. It will be a good opportunity to meet and exchange experiences and information. See you there!

GNU logo

This work is licensed under a Creative Commons Attribution 4.0 International License.