Methodology

Methodological Approach for a Study on Urban Retail Trade in the Belle Époque

Challenges and framework

In terms of methodology, we faced two challenges, one at each of two distinct stages of the research: data collection and georeferencing.

The first challenge was the difficulty of digitizing a volume of data of this nature, available only in the archive, with no funding for a mass digitization of the original tax records. In addition, all the sources are handwritten, drawn up by several clerks with different handwriting, which, given the current state of handwritten character recognition technology (Beatty 2010; Brumfield 2014), ruled out automatic or even semi-automatic data processing.

The second challenge was geocoding the roughly fifty thousand addresses obtained from this documentation, relying essentially on computer processing power. The various geocoding methods are known to have advantages and disadvantages (Goldberg, Wilson, and Knoblock 2007; St-Hilaire et al. 2007; Zandbergen 2008). Applying geocoding to Lisbon is difficult because the city has undergone a deep transformation of its urban morphology and significant changes in its street names throughout the twentieth century (Alves 2005; Oliveira and Pinho 2006). Nevertheless, geocoding remains one of the best methods for the automatic assignment of geographic coordinates. The challenge was to devise a process that could overcome these difficulties without requiring a full reconstitution of the urban network, not least because the available sources do not make it possible to recreate all the buildings and their functional classification in a GIS environment, as was achieved in other projects (Dunae et al. 2013).

While recognizing the limitations of GIS (Boonstra 2009; Bodenhamer, Corrigan, and Harris 2010; Silveira 2014), the capabilities of its spatial analysis, processing and data visualization tools must be highlighted; they have been particularly useful in the context of urban history (Hillier 2010; DeBats and Gregory 2011; Baics 2016b). In recent years, the use of GIS for the study of cities and their retail trade (Beascoechea Gangoiti 2003; Mirás Araujo 2008; Bassols and Oyón Bañales 2010; Novak and Gilliland 2011; Baics 2016a) has enhanced the theoretical framework and the possibility of international comparative studies, which was a decisive stimulus for applying these methodologies to this case study.


Collecting the data

The solution to the first challenge followed an approach close to that of crowdsourcing projects (Brumfield 2014; Causer and Terras 2014), using shared databases and collaborative work, a method already applied with excellent results in previous studies of literature and the evolution of urban space (Alves and Queiroz 2013). However, crowdsourcing is known to be a potentially lengthy process, and the quality of its final results is not always easy to check. Furthermore, it requires easy access to the sources through digital copies, because not all potential participants have the availability or the skills for archival work. Finally, it requires an understanding of the historical context of the sources in order to transcribe them well. In our case, the sources are thick volumes of bound sheets, held in a municipal archive with poor conditions for on-site consultation. There was also the problem of data quality, since the handwritten information is sometimes difficult to read. For these reasons, collaborative work by volunteer history students, who had the availability and sensitivity for archival work and knew the historical context of the retail trade, was decisive.

The data were collected in a PostgreSQL database shared with all the students through ODBC, so that data entered by one student became available for subsequent use by the others. This process, already tested in another project with very good results (Alves and Queiroz 2015), saved a great deal of time in data collection and avoided redundancies: once a new address or a new commercial activity found in the sources had been registered in the database by one student, it was automatically available to be selected by all the others. Validation was also made easier. Because the data model prevented duplicates, each address or activity was registered only once, so even if the original record contained an error, it had to be checked and corrected in only one place.
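The duplicate-preventing data model described above can be illustrated with a minimal sketch. It is not the project's actual schema: the table and column names, the sample street and the helper `get_or_create_address` are all assumptions made for illustration, and an in-memory SQLite database stands in for the shared PostgreSQL server so that the example is self-contained.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the shared PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE address (
    id     INTEGER PRIMARY KEY,
    street TEXT NOT NULL,
    door   TEXT NOT NULL,
    UNIQUE (street, door)          -- the same address can be stored only once
);
CREATE TABLE shop (
    id         INTEGER PRIMARY KEY,
    activity   TEXT NOT NULL,
    year       INTEGER NOT NULL,
    address_id INTEGER NOT NULL REFERENCES address(id)
);
""")


def get_or_create_address(conn, street, door):
    """Return the id of an address, inserting it only if it is not yet registered."""
    conn.execute(
        "INSERT OR IGNORE INTO address (street, door) VALUES (?, ?)",
        (street, door),
    )
    row = conn.execute(
        "SELECT id FROM address WHERE street = ? AND door = ?",
        (street, door),
    ).fetchone()
    return row[0]


# Two transcribers meet the same (illustrative) address in different volumes.
first_id = get_or_create_address(conn, "Rua Augusta", "100")
second_id = get_or_create_address(conn, "Rua Augusta", "100")

# Both shop records point at the single shared address row.
conn.execute(
    "INSERT INTO shop (activity, year, address_id) VALUES (?, ?, ?)",
    ("grocer", 1890, first_id),
)
conn.execute(
    "INSERT INTO shop (activity, year, address_id) VALUES (?, ?, ?)",
    ("tavern", 1900, second_id),
)
address_count = conn.execute("SELECT COUNT(*) FROM address").fetchone()[0]
shop_count = conn.execute("SELECT COUNT(*) FROM shop").fetchone()[0]
```

With a design of this kind, a misread street name has to be corrected in one row of the address table and every shop record that refers to it is corrected with it.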


Building the map and geocoding the data

The solution to the second challenge involved reconstructing the street network of the period, based on georeferenced digital cartography and city directories. An address database was thus created which, with slight adaptations to the geocoding algorithm of the GIS platform used, allowed a success rate of around 90%.

The first and greatest problem relates to the urban changes that the city has undergone over the last 120 years. On the one hand, strong population growth led to the expansion of, or changes in, the urban area, even if we consider only the three years analyzed. The city of 1890 is very different from that of 1910 in the layout of its streets, its expansion into new areas, and the construction or renovation of its buildings. On the other hand, Lisbon has the particularity of having gone through, between 1890 and the present, four very different political regimes, each of which left a distinctive and deeply transformative mark on the city's toponymy. Even if the urban area had remained stable, the change of street names alone, over more than a century of profound political change, would pose great challenges to an automated geocoding process.

To address this, it was first necessary to rebuild the street network of the period, recreating the map of Lisbon in the Belle Époque. This map was fixed at 1890, after which the usual geocoding techniques could be applied, adapting either the collected data or the software algorithm to overcome difficulties that were small in themselves but significant for the final results: the software could not handle some Portuguese names and characters, nor recognize certain street types that were in use at the time but are ignored or little used today, and some street names are identical or very similar in different areas of the city. This was an iterative process of trial and error, aimed at refining the data model until it gave the best results.
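One common way of adapting the collected data before matching is to normalize street names on both sides. The sketch below is only an illustration of that idea, not the adaptation actually made in the project: the abbreviation table `STREET_TYPES` and the function names are assumptions, and the list of street types is deliberately short.

```python
import unicodedata

# Assumption: a small, illustrative table of street-type spellings.
# Keys are already lower-case and accent-free.
STREET_TYPES = {
    "r": "rua", "rua": "rua",
    "tv": "travessa", "trav": "travessa", "travessa": "travessa",
    "calc": "calcada", "calcada": "calcada",
    "bc": "beco", "beco": "beco",
    "lg": "largo", "largo": "largo",
}


def strip_accents(text):
    """Remove diacritics (ç -> c, ã -> a, é -> e) via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def normalize_street(name):
    """Lower-case, strip accents and dots, and expand the leading street type."""
    cleaned = strip_accents(name).lower().replace(".", " ")
    tokens = cleaned.split()
    if tokens and tokens[0] in STREET_TYPES:
        tokens[0] = STREET_TYPES[tokens[0]]
    return " ".join(tokens)


# Abbreviated and full spellings of the same street reduce to one key.
abbreviated = normalize_street("Calç. do Combro")
full = normalize_street("Calçada do Combro")
```

Normalization of this kind does not resolve streets that share a name in different parts of the city; those still need an extra discriminator, such as the parish, in the matching key.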

However, there was still the issue of street names changing over time. For the period in question this problem is not very serious, since the first regime change, the Republican Revolution, took place only at the end of 1910, with the consequent wave of place-name changes. But there were occasional changes to some street names, and new streets were opened in the city almost every year. These changes were incorporated into the database while keeping the address structure already geocoded for 1890, which made it possible to include data for streets that appear in the sources under different names.
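The idea of keeping the 1890 address structure while accepting later names can be sketched as a simple alias lookup. The street names below are invented placeholders, not real renamings, and the dictionary `ALIASES` and function `reference_name` are assumptions made for illustration; in the project this information lives in the shared database.

```python
# Assumption: invented placeholder names; each later name maps to the
# name under which the street was geocoded on the 1890 reference map.
ALIASES = {
    "rua exemplo nova": "rua exemplo antiga",
}


def reference_name(name, aliases=ALIASES):
    """Map a street name found in a source to its 1890 reference name."""
    key = " ".join(name.lower().split())  # collapse case and extra spaces
    return aliases.get(key, key)


# A record from a later year that uses the new name resolves to the same
# reference street as a record from 1890.
later = reference_name("Rua  Exemplo Nova")
earlier = reference_name("Rua Exemplo Antiga")
```

Because only the lookup table grows when a renaming is discovered, the streets already geocoded for 1890 do not have to be touched.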

As for the quality or accuracy of the geocoding, the ideal would obviously be to reconstruct not only the network of city streets but also the actual location of the individual residential and commercial blocks. Unfortunately, no sources with this level of detail are available, comparable to the fire insurance plans that exist for some North American cities (Novak and Gilliland 2011; Dunae et al. 2013). In the original shapefile, which represented the city streets in 2012, the streets carry information about the building numbers on each block. Since we have no way of knowing, at this stage of the research, how many buildings or doors existed on each block in 1890, we decided to distribute all points along the entire length of the street, according to the geocoding algorithm. This choice can obviously cause significant deviations in the less consolidated urban areas of the city, but the overwhelming majority of small businesses were traditionally located in streets already established by the end of the eighteenth century. There the problem is much smaller and the geocoding accuracy much higher.
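Distributing points along the whole street amounts to linear interpolation of the door number over the street's length. The following is a simplified sketch under stated assumptions, not the GIS platform's algorithm: it works in planar coordinates, ignores the side of the street and any offset from the centre line, and the function names and the number range are hypothetical.

```python
import math


def interpolate_along(line, fraction):
    """Return the point at `fraction` (0 to 1) of the total length of a polyline."""
    fraction = min(max(fraction, 0.0), 1.0)
    segments = list(zip(line, line[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    target = fraction * sum(lengths)
    for (a, b), seg in zip(segments, lengths):
        if seg > 0 and target <= seg:
            t = target / seg
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg
    return line[-1]  # rounding left a tiny remainder: use the end of the street


def locate_door(line, door, first, last):
    """Place a door number proportionally between the first and last numbers."""
    if last == first:
        return interpolate_along(line, 0.5)
    return interpolate_along(line, (door - first) / (last - first))


# Hypothetical street: an L-shaped centre line 200 units long, doors 1 to 201.
street = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0)]
midpoint = locate_door(street, 101, 1, 201)
three_quarters = locate_door(street, 151, 1, 201)
```

The sketch makes the source of the positional error visible: if the numbering in 1890 covered only part of the present-day street, the proportional position will be off, which is why the deviation is larger in less consolidated areas.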

The reconstruction of the 1890 street map, the basis of all subsequent work, started from a current map in shapefile format provided by the Municipality. This map was superimposed on a properly georeferenced digital copy of an old city map. The overlay of the two maps alone identified a large set of streets that did not exist at the time (which were deleted from the shapefile) and others whose layout had changed (which were corrected in the shapefile). Using city directories, in particular a very complete one for 1890 that gives the name, location and building numbers of all the city's streets, it was possible to correct the street names in the shapefile that had changed over these 120 years. Linking this corrected shapefile to the shop addresses collected in the shared database allowed geocoding, with minor adjustments to the software algorithm, namely modifying the “.cls” files that contain information about the classes and types of addresses.


Results and reuse of the methodology

Once these issues were overcome, it was possible to obtain a far more dynamic source, more accessible and manageable, able to respond quickly to old research questions or to introduce new perspectives on Lisbon and its retail trade in the Belle Époque. To mention just one possibility, the Lisbon municipal archive holds many data that could be cross-referenced with the information collected on shops and shopkeepers. In particular, it holds data on electoral registers and elections, with voter lists organized by parish and address for several years of the nineteenth and twentieth centuries, and these data can also be mapped and analyzed using the methodology developed here. The same applies to data on primary education, for example, because this collection also records the students' addresses. Since the shop addresses are collected in a separate table that can be linked either to the GIS or to other tables with different data, whether on small businesses, elections or education, this methodology can yield results in other projects or add new information relevant to the history of the city without starting from scratch.
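The reuse described above rests on keeping addresses in their own table and linking other datasets to it by key. A minimal sketch of that link follows; all records, coordinates, field names and the function `attach_points` are invented for illustration and do not reproduce the project's tables.

```python
# Assumption: invented sample data. One geocoded address table is shared
# by several thematic tables through an address identifier.
ADDRESS_POINTS = {
    1: (10.0, 20.0),
    2: (30.0, 40.0),
}
SHOPS = [
    {"activity": "grocer", "address_id": 1},
    {"activity": "tavern", "address_id": 2},
]
VOTERS = [
    {"parish": "Example parish", "address_id": 1},
    {"parish": "Example parish", "address_id": 3},  # not geocoded yet
]


def attach_points(records, address_points):
    """Split records into those with a known location and those without."""
    located, unmatched = [], []
    for record in records:
        point = address_points.get(record["address_id"])
        if point is None:
            unmatched.append(record)
        else:
            located.append({**record, "point": point})
    return located, unmatched


# The same address table serves both datasets without any new geocoding.
shops_located, shops_unmatched = attach_points(SHOPS, ADDRESS_POINTS)
voters_located, voters_unmatched = attach_points(VOTERS, ADDRESS_POINTS)
```

Only addresses that are absent from the shared table, such as the unmatched voter record here, would need to be added and geocoded for a new dataset.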
