2.2 Attribute data Attribute data are data in the form that most people understand by the term. Many GIS software packages include their own attribute databases but allow the user to link to external database management systems (DBMS) or spreadsheets. Examples of this include MapInfo, allowing the user to link to data in Microsoft Excel, ArcView to DBase, and ArcInfo to Oracle. Attribute data are frequently either statistical or textual. As the software improves and becomes more flexible, they can be in virtually any format that the DBMS used to store them can support. Increasingly this includes image formats, animations, hyperlinks, multimedia, and so on. Most of the DBMSs used in GIS are relational database management systems. This means that two or more tables can be joined together based on a common field known as a key. With historical data this can often be either a place name or an ID number (note that place names are not considered to be spatial data: to be spatial the data must have a coordinate-based location), and it allows data from various sources to be integrated without requiring spatial data. For example, a user has a table of Poor Law data organised by Poor Law Unions (these were a type of administrative unit used in England and Wales in the 19th and early 20th centuries to administer relief of the poor), a table of voting statistics organised by parliamentary constituency, and some employment statistics based on towns. A relation join will join all three tables together and all of the data for 'Bristol', for example, will appear on a single row. There are three main problems with doing this: firstly, the join has no knowledge that the entity referred to as 'Bristol' may be a different entity in each table. Secondly, problems will occur where names are not unique, such as 'Whitchurch' which appears in both Hampshire and Shropshire. Different software will handle this in different ways, the most common (and theoretically sound) being to duplicate rows of data. One way round this is to use more than one column as the key, for example, place name and county. The third problem with using place names is that their spellings must be identical to produce a match. Even minor differences in the use of hyphens or apostrophes will cause a non-match. This can be worked around using gazetteers that standardise all possible spellings and create a single spelling from an authority list, or through the use of ID numbers. Creating these can be time consuming. Many attribute databases use Structured Query Language (SQL) to allow flexible querying and joining. This is often implemented though a Graphical User Interface but follows the basic structure: select <column names> from <table names> where <condition> So, for example, we have two tables: 'unemp' that contains data on unemployment rates, and 'inf_mort' that contains data on infant mortality. These have the following fields: Unemp | | Inf_mort | | Field Name | Contents | Field Name | Contents | | Name | The area's place name | Name | The area's place name | | U_rate | The area's unemployment rate | Im_rate | The area's infant mortality rate | | Tot_pop | The area's total population | Births | The number of births in the area |
Table 2.1: Sample tables of attribute data The SQL query: select name, u_rate, tot_pop from unemp where u_rate>10.0 will select the names, unemployment rates, and total populations from the table unemp for places with an unemployment rate of over 10% as is shown in Table 2.2. | Name | U_rate | Tot_pop | | Bolton | 13.2 | 10,000 | | Oldham | 12.1 | 7,500 | | Rochdale | 10.9 | 8,000 |
Table 2.2: Sample data returned by the query above Relational joins are also implemented in this way. For example the query: select unemp.name, unemp.u_rate, inf_mort.im_rate from unemp, inf_mort where unemp.name=inf_mort.name will select the unemployment rates from unemp and infant mortality rates from inf_mort where the values in the name fields in both tables are identical.
| Unemp | Unemp | Inf_mort | | Name | U_rate | Im_rate | | Bolton | 13.2 | 120 | | Oldham | 12.1 | 115 | | Rochdale | 10.9 | 106 | | Burnley | 9.4 | 98 | | Colne | 9.1 | 89 |
Table 2.3: Sample data returned by the query above While relational databases and SQL are not fundamental to an understanding of GIS, they are a useful skill that can enhance an understanding of GIS data and GIS software. Many guides to the use of SQL and relational databases are available: see the bibliography for further information. |