Wednesday 16 December 2009

3.11 References and Resources

Blog
http://eleanor-mclachlan-dita.blogspot.com/
Webspace
http://www.student.city.ac.uk/~abgy297/first.html
JavaScript application
http://www.student.city.ac.uk/~abgy297/javascript.html


3.1 Introduction

Web 2.0 technology
http://en.wikipedia.org/wiki/Web_2.0
John Kitson’s blog
http://www.insurancetimes.co.uk/thejohnkitsonblog/


3.2 Text/ HTML

ASCII
http://en.wikipedia.org/wiki/ASCII
ESRI ASCII Raster format
http://resources.esri.com/help/9.3/arcgisengine/java/GP_ToolRef/spatial_analyst_tools/esri_ascii_raster_format.htm
ESRI Whitepaper on metadata
http://www.esri.com/library/whitepapers/pdfs/metadata-and-gis.pdf


3.3 Internet/WWW

Hypertext markup language
http://en.wikipedia.org/wiki/HTML
Internet
http://www.walthowe.com/navnet/history.html
World Wide Web
http://en.wikipedia.org/wiki/History_of_the_World_Wide_Web
Digital divide
http://commons.wikimedia.org/wiki/File:Global_Digital_Divide1.png
Webpages
http://www.student.city.ac.uk/~abgy297/first.html
http://www.student.city.ac.uk/~abgy297/index.html
http://www.student.city.ac.uk/~abgy297/Cat.html


3.4 Images and Graphics

GIF
http://en.wikipedia.org/wiki/Gif
JPEG
http://en.wikipedia.org/wiki/JPEG
PNG
http://en.wikipedia.org/wiki/Portable_Network_Graphics
Other image formats used in GIS
http://en.wikipedia.org/wiki/GIS_file_formats


3.5 XML

XML
http://en.wikipedia.org/wiki/XML
GML
http://gislounge.com/geography-markup-language-gml-20-enabling-the-geo-spatial-web/
OS Mastermap in GML
http://www.ordnancesurvey.co.uk/oswebsite/business/sectors/wireless/news/articles/pdf/OS%20MasterMap%20in%20GML.pdf


3.6 CSS

Cascading style sheets
http://en.wikipedia.org/wiki/Cascading_Style_Sheets
Examples of a blog post using different style sheets
http://www.student.city.ac.uk/~abgy297/css_blog_inverse.html
http://www.student.city.ac.uk/~abgy297/css_blog_large.html
http://www.student.city.ac.uk/~abgy297/css_blog_own.html


3.7 Databases

Program data dependence
http://wiki.answers.com/Q/What_is_program_data_dependence
Spatial databases
http://en.wikipedia.org/wiki/Spatial_database
Spatial database engines
http://www.esri.com/software/arcgis/arcsde/index.html


3.8 Information Retrieval

Stop words
http://www.webconfs.com/stop-words.php
Stemming
http://en.wikipedia.org/wiki/Stemming
Inverted file
http://en.wikipedia.org/wiki/Inverted_index
Google’s three distinct parts
http://www.googleguide.com/google_works.html
Confusion about Google’s use of stop words
http://searchengineland.com/conjunction-junction-google-no-longer-displays-stop-words-malfunction-13161


3.9 Client side programming

My JavaScript program
http://www.student.city.ac.uk/~abgy297/javascript.html


3.10 Information Architectures

Grid indexing
http://en.wikipedia.org/wiki/Grid
Quadtree indexing
http://en.wikipedia.org/wiki/Quadtree
R-Tree indexing
http://en.wikipedia.org/wiki/R-tree
ESRI Whitepaper
http://www.esri.com/library/whitepapers/pdfs/metadata-and-gis.pdf
ArcCatalog
http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=An_overview_of_ArcCatalog

Tuesday 8 December 2009

3.10 Information Architectures

Information Architecture relates to the organisation, labelling and navigation of information within an information system. The effective organisation of information to facilitate its efficient retrieval is becoming increasingly important as data volumes increase exponentially. This is especially relevant in the field of GIS, where the vast quantities of data available make it imperative that only the relevant information is returned to the user, so that queries execute and maps draw swiftly. Geographic information stored in databases is indexed for this purpose. Grid, Quadtree and R-tree indices all organise geographic information according to its spatial location to speed up queries and the return of information.
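To make the idea of a spatial index more concrete, here is a minimal sketch of a grid index in Python. It is only an illustration under stated assumptions: the cell size, the point ids and the function names are all made up for this example, and real GIS databases use far more sophisticated implementations.

```python
from collections import defaultdict

CELL_SIZE = 100_000  # assumed cell width and height, in coordinate units


def cell_of(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) of the grid cell containing the point."""
    return (int(x // cell_size), int(y // cell_size))


def build_grid_index(points, cell_size=CELL_SIZE):
    """Map each grid cell to the ids of the points that fall inside it."""
    index = defaultdict(list)
    for point_id, (x, y) in points.items():
        index[cell_of(x, y, cell_size)].append(point_id)
    return index


def query_box(index, points, xmin, ymin, xmax, ymax, cell_size=CELL_SIZE):
    """Return ids of points inside the box, visiting only overlapping cells."""
    col_min, row_min = cell_of(xmin, ymin, cell_size)
    col_max, row_max = cell_of(xmax, ymax, cell_size)
    found = []
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            for point_id in index.get((col, row), []):
                x, y = points[point_id]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    found.append(point_id)
    return sorted(found)


# Hypothetical points, keyed by a made-up id
points = {
    "A": (816304, 497628),
    "B": (794382, 201975),
    "C": (794682, 412876),
    "D": (994685, 325874),
}
index = build_grid_index(points)
result = query_box(index, points, 700000, 400000, 900000, 500000)
print(result)
```

The query only looks inside the grid cells that overlap the search box, so points in distant cells (B and D here) are never examined. Quadtrees and R-trees pursue the same goal with cells that adapt to how the data is distributed.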

The labelling of geographic information takes the form of metadata, literally defined as data about data. In GIS, metadata is often stored as a separate XML file containing information about a file's content, quality, type, creation and spatial properties such as the coordinate system. An ESRI whitepaper describes metadata as making “spatial information more useful to all types of users by making it easier to document and locate data sets” (ESRI, 2002). Thus this is another form of information architecture that facilitates the efficient organisation and effective retrieval of spatial data.

Navigation of geographic information is handled in ESRI's GIS software by the ArcCatalog application. Spatial data often consists of a series of files; for example, a single polygon shapefile can consist of up to seven separate files including an index file, a projection file and a geometry file. ArcCatalog displays all these separate pieces of information as a single file, thus making storage, organisation and editing simpler. The application allows you to browse and find geographic information from various sources such as databases, the internet and local drives; view and manage metadata; and manage datasets and data sources.

3.9 Client Side Programming

My JavaScript application uses three functions declared in the head of the HTML document. The first, ‘newsorsport’, produces a prompt box asking the user if they are interested in news or sport and requesting they enter 1 for news or 2 for sport. The user's input is stored in a variable and converted to an integer, and depending on the value either the function ‘area’ or ‘sporttype’ is called. Both these functions request further input from the user via a prompt box and display a link depending on the input.


I found the following aspects of creating the program particularly challenging:
  • Whether to put all the script in the body section of the HTML document or whether to use functions scripted in the head part and then call these in the body
  • The syntax of If statements. In VBA (which I have some limited experience of) 'If, Then, Else' is used, whereas in JavaScript just the word 'if' is required, followed by the condition in brackets
  • Getting the curly brackets in the correct place
  • Remembering to use double equals signs (==) when comparing values
  • Getting the anchor-tagged URLs to work. They need to be in the paragraph tag and the whole phrase needs to sit within double quotation marks, whereas the URL is within single quotes
The fruits of all my labours (and frustrations) can be found here.
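The branching logic described above can be sketched roughly as follows. This is a Python approximation, not the JavaScript from the page itself: the function names are borrowed from the description, the two follow-up functions are reduced to placeholders, and the handling of invalid input is an assumption.

```python
def area():
    # Placeholder: the real function prompts for an area and shows a news link.
    return "area"


def sporttype():
    # Placeholder: the real function prompts for a sport and shows a sport link.
    return "sporttype"


def newsorsport(user_input):
    """Convert the typed answer to an integer and pick the follow-up function."""
    try:
        choice = int(user_input)
    except ValueError:
        # Assumed behaviour: anything that is not a whole number is rejected.
        return "invalid"
    if choice == 1:  # double equals compares; a single equals would assign
        return area()
    elif choice == 2:
        return sporttype()
    else:
        return "invalid"


print(newsorsport("1"))
print(newsorsport("2"))
```

The conversion to an integer matters because a prompt box always returns text, so the comparison would otherwise be between a string and a number.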

Sunday 29 November 2009

3.8 Information Retrieval

Information retrieval refers to the retrieval of unstructured information relevant to a particular user’s requirements. Because the relevance of the results is subjective it is probabilistic, whereas querying a database for structured information is deterministic. For example, many users may enter the same search terms into a search engine while actually looking for different information, whereas if several users query an RDBMS using the same SQL they should be attempting to retrieve the same information.

In order to facilitate the efficient retrieval of unstructured information such as text, the information has to be indexed by identifying relevant fields and words for indexing and preparing the text. This is achieved by removing stop words, stemming and identifying synonyms. The most widely used type of index is an inverted file, an index of searchable terms containing a list of associated documents.
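The indexing steps described above can be illustrated with a small Python sketch. It makes several simplifying assumptions: the stop word list is tiny, the stemmer is a crude suffix stripper standing in for a real stemming algorithm, synonyms are not handled, and the three documents are invented for the example.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}  # assumed tiny list


def stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lower-case the text, drop stop words and stem what is left."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term."""
    query_terms = terms(query)
    if not query_terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in query_terms))
    return sorted(matches)


# Invented documents for illustration
documents = {
    1: "Indexing the maps of London",
    2: "A map is indexed by a spatial index",
    3: "Retrieving documents and maps",
}
index = build_inverted_index(documents)
print(search(index, "indexing maps"))
```

Because the query is prepared in the same way as the documents, a search for "indexing maps" matches a document containing "map" and "index". A query made up only of stop words, such as "the", has no terms left to look up.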

In order to find resources for my DITA blog I have relied mainly on Google. Google has three distinct parts: Googlebot, the web crawler that finds and retrieves web pages; the indexer, which sorts through the full text of web pages and stores search terms in a massive database; and the query processor, which carries out the search by comparing entered terms with the index. There is currently some confusion about Google’s use of stop words. Google used to automatically ignore stop words but informed you that it was doing so and gave you the option to repeat the search with the words included. This message no longer appears and it is unclear whether Google no longer uses stop words and indexes every single word, or whether it still uses stop words but just doesn’t tell the searcher.

Tuesday 17 November 2009

3.7 Databases

Before the advent of the database approach in the early 1970s, data users had no means by which to centrally store and share information, leading to duplication, inaccuracies and program data dependence. A Database Management System (DBMS) is a suite of software programs which allows information to be stored, organised and accessed in a systematic and consistent way and in a central location, allowing numerous users to access the same data. This increases efficiency by removing the duplication and inaccuracies that come with maintaining multiple tables of the same information.

In GIS the development of spatial databases and spatial database engines has enabled geographic data to be stored alongside non-spatial database tables within a single DBMS, thus driving the integration of spatial information. Using SQL, information can be retrieved from both spatial and non-spatial data simultaneously. For example, if we wanted to view a database table of customer addresses on a map, the table could be joined to a spatial table of addresses. As long as the spatial attribute field(s) are included in the output table (either the numeric coordinates or a proprietary geometry field) the table can be imported into a GIS and the customers' addresses viewed spatially. The following SQL query will retrieve the customer number field from the Customer_table and the coordinates from the Address_table and write them to an output table called customer_location.

CREATE TABLE customer_location AS
SELECT customer_number, xcoord, ycoord
FROM customer_table
JOIN address_table
  ON customer_table.address = address_table.address;

Customer_table
Customer_number   Address             Postcode
NR173974          45 Laurel Avenue    HP1584
TM184903          7 High Street       E45GE
HA194829          Mill Cottage        IP76CD
MX960417          11 Vincent Street   HP114YE


Address_table
ID           Address             Postcode   Xcoord   Ycoord
ODFD197843   45 Laurel Avenue    HP1584     816304   497628
BNBV497553   7 High Street       E45GE      794382   201975
ASTT796962   Mill Cottage        IP76CD     794682   412876
PEKD969710   11 Vincent Street   HP114YE    994685   325874

Customer_location
Customer_number   Xcoord   Ycoord
NR173974          816304   497628
TM184903          794382   201975
HA194829          794682   412876
MX960417          994685   325874
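For anyone wanting to try the join, the sketch below runs the same query against an in-memory SQLite database using Python's built-in sqlite3 module. The choice of SQLite and the column types are assumptions made for the example. The table names, column names and rows are those shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types are assumed; the original tables do not state them.
conn.executescript("""
CREATE TABLE customer_table (customer_number TEXT, address TEXT, postcode TEXT);
CREATE TABLE address_table (id TEXT, address TEXT, postcode TEXT,
                            xcoord INTEGER, ycoord INTEGER);
""")

conn.executemany(
    "INSERT INTO customer_table VALUES (?, ?, ?)",
    [
        ("NR173974", "45 Laurel Avenue", "HP1584"),
        ("TM184903", "7 High Street", "E45GE"),
        ("HA194829", "Mill Cottage", "IP76CD"),
        ("MX960417", "11 Vincent Street", "HP114YE"),
    ],
)
conn.executemany(
    "INSERT INTO address_table VALUES (?, ?, ?, ?, ?)",
    [
        ("ODFD197843", "45 Laurel Avenue", "HP1584", 816304, 497628),
        ("BNBV497553", "7 High Street", "E45GE", 794382, 201975),
        ("ASTT796962", "Mill Cottage", "IP76CD", 794682, 412876),
        ("PEKD969710", "11 Vincent Street", "HP114YE", 994685, 325874),
    ],
)

# The join from the post: match customers to coordinates on the address text.
conn.execute("""
CREATE TABLE customer_location AS
SELECT customer_number, xcoord, ycoord
FROM customer_table
JOIN address_table
  ON customer_table.address = address_table.address
""")

rows = conn.execute(
    "SELECT customer_number, xcoord, ycoord "
    "FROM customer_location ORDER BY customer_number"
).fetchall()
for row in rows:
    print(row)
```

Joining on the address text works here because the addresses match exactly in both tables. In practice a shared identifier would probably be a more reliable join field, since free-text addresses are easily entered inconsistently.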

Sunday 8 November 2009

3.6 CSS

Cascading Style Sheets are a means of describing the aesthetic and stylistic aspects of a web page, using defined syntax to instruct a browser how to display the contents of a page: for example, the font type, size and colour, the background colour and the layout.

Style sheets bring efficiency to web design by being applicable to any number of documents. A whole website can reference the same CSS and adhere to the same stylistic rules, giving it a distinct aesthetic feel. The term ‘Cascade’ refers to the fact that numerous style sheets can be referenced in the same document; the browser reads the sheets in order, so where rules conflict and are otherwise equal, earlier sheets are overridden by later ones. Cascading Style Sheets can be included in an HTML document as an external CSS file to which the HTML points using the link tag, embedded in the document using the style tag or applied directly to an element via the style attribute.

Pros of Cascading Style Sheets:
  • Separate style from content so HTML remains legible and accessible to all users (eg, visually impaired users can use screen readers or apply different style sheets to the content)
  • Make it easy to change the look of webpages
  • Can be applied to any number of documents improving efficiency as code only has to be written once
  • Reduce network traffic, as if the same sheet is applied to numerous pages it is only downloaded once

Cons of Cascading Style Sheets:
  • Different browsers treat some of the styling instructions in different ways
  • Earlier versions of Internet Explorer don’t support CSS well

Examples of Cascading Style Sheets:
Here, here and here are examples of this blog post using different style sheets.

Friday 30 October 2009

3.5 XML

XML is a means of describing data. It is a method of encoding data in text which renders the data easy to store, transport and interpret. It is not a language with a fixed vocabulary but a framework that allows users to write their own markup languages for their own specific needs. GML is one such language, concerned with the description of geographic content. It describes data or objects that have a spatial element by encoding geometry and spatial reference systems. The fact that it is based on XML means that it can be read and edited using any text editor, is easy to transport and transform and, crucially, can be easily integrated with non-spatial data.
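As a small illustration of how spatial and non-spatial content can sit side by side in one XML document, the Python sketch below reads a GML-style point with the standard library parser. The Building element, its name and the coordinates are hypothetical and are not taken from the MasterMap schema; only the GML namespace and the Point and coordinates elements follow GML 2 conventions.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# Hypothetical feature: a plain attribute (name) next to a GML geometry.
sample = """
<Building xmlns:gml="http://www.opengis.net/gml">
  <name>Mill Cottage</name>
  <gml:Point srsName="EPSG:27700">
    <gml:coordinates>794682,412876</gml:coordinates>
  </gml:Point>
</Building>
""".strip()

root = ET.fromstring(sample)

# Non-spatial attribute, read like any other XML element
name = root.find("name").text

# Spatial part: the geometry and its spatial reference system
point = root.find(f"{{{GML_NS}}}Point")
srs = point.get("srsName")
coords_text = point.find(f"{{{GML_NS}}}coordinates").text
x, y = (float(value) for value in coords_text.split(","))

print(name, srs, x, y)
```

The same general-purpose XML parser handles both the name and the geometry, which is the practical sense in which GML makes spatial data easy to combine with other information.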

This final point is incredibly important. For many years spatial data has been stored, viewed and analysed separately from non-spatial data. In recent years GIS developers have striven to integrate spatial data and analysis with other information. GML enables the integration of geographical information by using a set of rules and guidelines (XML) which can be applied to any data type for any purpose. This means that geographic information can be integrated with a massive range of non-geographic data types, thus greatly enhancing the value and accessibility of spatial information.

The Ordnance Survey’s MasterMap dataset is a digital database of vector information describing geographical features on the ground in incredible detail. Individual building outlines, bollards, trees and road networks all have their own unique identifier called a TOID (Topographic Identifier) as well as information on their geometry and attributes. MasterMap is based on GML. The OS use GML because its “well defined geometric primitives coupled with a structured mechanism for defining features ensures that when spatial data is exchanged in GML it can be interpreted and understood by everyone.”

Below is a sample of MasterMap for my house viewed in a GIS, and its associated GML.

MasterMap Data Around My House



Crown Copyright/ Database right 2009
An Ordnance Survey/ Edina supplied service

A Sample of the GML for the above MasterMap Data





Monday 19 October 2009

3.4 Images and Graphics

Table Showing Main Differences Between GIF, JPEG and PNG Image Formats

              PNG              JPG           GIF
Format        24-Bit           24-Bit        8-bit
Colours       16 Million       16 Million    256
Compression   Lossless         Lossy         Lossless
Best For      Multiple Edits   Photographs   Simple Images



GIF (Graphics Interchange Format) is an image format developed by CompuServe in 1987. It stores a colour lookup table and records the value of each raster cell as a position within that table. This, together with compression in which repeated sequences of bytes are referenced by their previous occurrence, makes it particularly suitable for images containing large areas of a limited number of colours. GIF is not a suitable format for complex images such as photographs.

Complex images with smooth variations in colour and tone are best suited to the JPEG file format, so named because it was developed by the Joint Photographic Experts Group. The 24-bit format allows for over 16 million colours. However, the compression sacrifices some of the original data, making it unsuitable for important images or those which will undergo multiple edits.

The PNG (Portable Network Graphics) format seeks to combine the colour depth of JPEG with the lossless compression of GIF by supporting both a 24-bit format and the indexed colour method of storage. Because this format is lossless it is particularly suited to images which need to be preserved in their entirety.
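The colour lookup table idea described for GIF can be sketched in a few lines of Python. This is only an illustration of indexed colour storage under simple assumptions (a made-up row of twelve pixels and hypothetical function names); it leaves out the compression step that a real GIF applies afterwards.

```python
def to_indexed(pixels):
    """Split a list of RGB tuples into a colour table and per-pixel indices."""
    palette = []      # the colour lookup table
    positions = {}    # colour -> position in the table
    indices = []      # one table position per pixel
    for colour in pixels:
        if colour not in positions:
            if len(palette) == 256:
                raise ValueError("more than 256 colours: too many for an 8-bit table")
            positions[colour] = len(palette)
            palette.append(colour)
        indices.append(positions[colour])
    return palette, indices


WHITE = (255, 255, 255)
BLUE = (0, 0, 255)

# A made-up row of 12 pixels using only two colours
pixels = [WHITE] * 6 + [BLUE] * 2 + [WHITE] * 4
palette, indices = to_indexed(pixels)

raw_bytes = len(pixels) * 3                      # 3 bytes (24 bits) per pixel
indexed_bytes = len(palette) * 3 + len(indices)  # table plus 1 byte per pixel
print(palette, indices, raw_bytes, indexed_bytes)
```

With only two colours, the twelve pixels need 18 bytes in indexed form against 36 bytes at 24 bits per pixel, and the long runs of identical index values are exactly what the subsequent compression exploits. A photograph with thousands of distinct colours would hit the 256-entry limit at once, which is why the format suits simple images.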

GIS makes use of all three of these image formats both in bringing images into GIS for spatial enrichment (georeferencing) and exporting map documents for printing or use in other applications such as Microsoft PowerPoint. In addition GIS systems make use of other image formats such as BIL, IMG and JPEG 2000.

Below is the same map layout in JPEG, GIF and PNG image formats. All files are 99KB in size. As you can see, the JPEG has not coped well with the resizing.
JPEG

GIF

PNG

Friday 16 October 2009

3.3 Internet/ WWW

The Internet and the World Wide Web are not the same thing. If the Internet were a road the www would be a bus. But if the Internet was the bus the passengers would be the www. And if the Internet were the passengers the www would be...don’t know. But you get the point.

The Internet is a networking infrastructure connecting millions of computers and allowing them to communicate with each other as long as they are connected to the Internet (or on the bus?). Information is shared between computers on the Internet via a variety of different protocols. The WWW consists of web pages written in Hypertext Markup Language (HTML), which are hosted on servers and transferred using the Hypertext Transfer Protocol (HTTP). The client-server architecture allows server machines to respond to requests from clients and return the requested HTML documents via the Internet, where they are interpreted by the client's web browser and displayed as web pages. Internet Protocol (IP) addresses identify the computers connected to the Internet, and the Domain Name System translates between these numbers and more familiar domain names. A domain name is combined with the local path of a document to form a Uniform Resource Locator (URL), which allows the requested page to be located on the server and returned to the client.
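The way a URL combines a domain name with the local path of a document can be seen by splitting one apart. The short Python sketch below does this for the address of my first web page, using the standard library; the variable names are just for illustration.

```python
from urllib.parse import urlsplit

url = "http://www.student.city.ac.uk/~abgy297/first.html"
parts = urlsplit(url)

scheme = parts.scheme   # the protocol used to request the page
host = parts.hostname   # the domain name, resolved to an IP address by DNS
path = parts.path       # the local path of the document on the server

print(scheme)
print(host)
print(path)
```

The browser hands the domain name to the Domain Name System to find the server, then asks that server for the document at the given path.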

The Internet was developed in the 1960s by the American military to create a communications network impervious to nuclear attack. The World Wide Web came in the 1990s, initially as a means to share academic papers across the Internet.

It is easy to become swept up in the hype about the Internet and World Wide Web; however, it is important to consider the socio-economic bias. The digital revolution has occurred predominantly in the western, developed world and a clear digital divide exists.

Here, here and here are the web pages I created in this week's practical.

Thursday 8 October 2009

3.2 Text/ Html

Different data formats use different binary encodings as an agreed method of interpreting sequences of binary numbers. For example, the American National Standards Institute (ANSI) developed an encoding for alphanumeric characters known as the American Standard Code for Information Interchange or ASCII. ASCII can be used to translate between seven-digit binary sequences and alphanumeric characters. It is a format which is familiar to me in my profession as a GIS analyst via the ESRI ASCII Raster format, which we use to store digital elevation models.
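The translation between characters and seven-digit binary sequences can be tried out with a few lines of Python. The function names are made up for this example.

```python
def to_seven_bit(text):
    """Return the seven-digit binary code for each ASCII character."""
    codes = []
    for ch in text:
        number = ord(ch)
        if number > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        codes.append(format(number, "07b"))
    return codes


def from_seven_bit(codes):
    """Turn a list of seven-digit binary codes back into text."""
    return "".join(chr(int(code, 2)) for code in codes)


codes = to_seven_bit("GIS")
print(codes)                  # ['1000111', '1001001', '1010011']
print(from_seven_bit(codes))  # GIS
```

Seven binary digits give 128 possible values, which is why ASCII covers the basic Latin letters, digits and punctuation but nothing beyond them.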


Different data formats allow computers to use binary code to represent data in different ways to support a range of uses and activities. For example, in GIS the .shp file format is used to store spatial information relating to a vector data type and .prj is used to store information about the geographic projection system. Most data types are proprietary, so knowing which program should be used to view a particular data type is key. If I open an ESRI .shp file in Microsoft Word, for example, the result is useless (see image). Different data formats require a particular program to allow the file to be read correctly and so enable the user to extract meaning from it.


So data formats allow us to extract meaning from different types of data by interpreting sequences of binary in different ways. Metadata and markup allow us to impart even more meaning to data, essentially by providing data about data. For example, in a Word (.doc) document, metadata might be used to denote the font type, size and colour. In GIS, metadata is normally stored and displayed as an XML file. It is described in an ESRI whitepaper as "a summary document providing content, quality, type, creation and spatial information about a dataset".


Friday 2 October 2009

3.1 - Introduction

Being completely new to blogs, I was more than a little daunted about using this particular Web 2.0 technology to document my understanding of digital technologies and architectures. However, having selected Blogger.com I have been pleasantly surprised.

Blogger.com is owned by Google, which is immediately evident in the style and ease of use of the website. Before setting up a blog, users are invited to 'take a quick tour', which provides a brief definition and history of blogs. I particularly liked their view of blogs as a revolutionary tool which has altered not only the virtual world but "impacted politics, shaken up journalism and allowed millions of people to have a voice and connect with others". A bit of hyperbolising by Blogger.com? We shall see. What I was surprised to find was the prevalence of business blogs within my own organisation, Aviva, for the dissemination of information on the business as a whole or on particular projects. The most noteworthy is the blog of John Kitson, our Sales and Marketing Director. It seems the whole world's been blogging except me...

The service is organised in such a way as to make it appealing and accessible to non-technical users, with options allowing you to customise the look and feel of your page, such as altering the layout, colours and fonts, without having to use HTML. On signing in you are taken to your profile page, which has clear links to allow you to view other people's blogs via the 'Blogs I'm Following' and 'Blogs of note' tabs, making the development of social networks a seamless element of the blogging experience. The options to share and present information also seem to be straightforward, with buttons to add images or videos, edit HTML and personalise your text.