Sunday 29 November 2009

3.8 Information Retrieval

Information retrieval refers to the retrieval of unstructured information relevant to a particular user’s requirements. Because the relevance of the results is subjective, it is probabilistic, whereas querying a database for structured information is deterministic. For example, many users may enter the same search terms into a search engine while actually looking for different information, whereas several users who query an RDBMS using the same SQL should be attempting to retrieve the same information.

In order to facilitate the efficient retrieval of unstructured information such as text, the information has to be indexed: relevant fields and words are identified for indexing, and the text is prepared by removing stop words, stemming and identifying synonyms. The most widely used type of index is an inverted file, an index of searchable terms in which each term is stored with a list of the documents that contain it.
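
The preparation and indexing steps described above can be sketched in a few lines of Python. This is only a minimal illustration under some loud assumptions: the stop-word list, the crude suffix-stripping `stem` function and the three sample documents are all made up for the example, and synonym handling is left out entirely. A real system would use a proper stemmer and a far larger stop-word list.

```python
import re
from collections import defaultdict

# Illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def prepare(text):
    """Lower-case the text, split it into words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each prepared term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in prepare(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every prepared query term."""
    terms = prepare(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample documents, keyed by document id.
documents = {
    1: "Indexing the text of web pages",
    2: "Databases store structured information",
    3: "Web crawlers retrieve pages for the indexer",
}
index = build_inverted_index(documents)
print(sorted(search(index, "web pages")))
```

Because the query goes through the same `prepare` step as the documents, a search for "indexing" looks up the stored term "index", and a query made up only of stop words returns nothing.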

In order to find resources for my DITA blog I have relied mainly on Google. Google has three distinct parts: Googlebot, the web crawler that finds and retrieves web pages; the indexer, which sorts through the full text of web pages and stores search terms in a massive database; and the query processor, which carries out the search by comparing entered terms with the index. There is currently some confusion about Google’s use of stop words. Google used to ignore stop words automatically, but informed you that it was doing so and gave you the option to repeat the search with the words included. This message no longer appears, and it is unclear whether Google no longer uses stop words and indexes every single word, or whether it still uses stop words but just doesn’t tell the searcher.

Tuesday 17 November 2009

3.7 Databases

Before the advent of the database approach in the early 1970s, data users had no means by which to centrally store and share information, leading to duplication, inaccuracies and program-data dependence. A Database Management System (DBMS) is a suite of software programs which allows information to be stored, organised and accessed in a systematic and consistent way and in a central location, so that numerous users can access the same data. This increases efficiency by removing the duplication and inaccuracies that come with maintaining multiple tables of the same information.

In GIS, the development of spatial databases and spatial database engines has enabled geographic data to be stored alongside non-spatial database tables within a single DBMS, thus driving the integration of spatial information. Using SQL, information can be retrieved from both spatial and non-spatial data simultaneously. For example, if we wanted to view a database table of customer addresses on a map, the table could be joined to a spatial table of addresses. As long as the spatial attribute field(s) are included in the output table (either the numeric coordinates or a proprietary geometry field), the table can be imported into a GIS and the customers' addresses viewed spatially. The following SQL query will retrieve the customer number field from the Customer_table and the coordinates from the Address_table and write them to an output table called customer_location.

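-- join each customer to the coordinates of the matching address
create table customer_location as
select customer_table.customer_number, address_table.xcoord, address_table.ycoord
from customer_table join address_table
on customer_table.address = address_table.address;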

Customer_table

Customer_number   Address             Postcode
NR173974          45 Laurel Avenue    HP1584
TM184903          7 High Street       E45GE
HA194829          Mill Cottage        IP76CD
MX960417          11 Vincent Street   HP114YE


Address_table

ID           Address             Postcode   Xcoord   Ycoord
ODFD197843   45 Laurel Avenue    HP1584     816304   497628
BNBV497553   7 High Street       E45GE      794382   201975
ASTT796962   Mill Cottage        IP76CD     794682   412876
PEKD969710   11 Vincent Street   HP114YE    994685   325874

Customer_location

Customer_number   Xcoord   Ycoord
NR173974          816304   497628
TM184903          794382   201975
HA194829          794682   412876
MX960417          994685   325874
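
To check the join against the sample tables, it can be run on a throwaway database. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The choice of SQLite and the column types are assumptions made for the example only; a spatial DBMS would normally hold the coordinates in its own geometry type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Column types are assumptions; the post only lists the field names.
cur.execute(
    "create table customer_table (customer_number text, address text, postcode text)"
)
cur.execute(
    "create table address_table "
    "(id text, address text, postcode text, xcoord integer, ycoord integer)"
)

# Sample rows copied from the tables in the post.
cur.executemany(
    "insert into customer_table values (?, ?, ?)",
    [
        ("NR173974", "45 Laurel Avenue", "HP1584"),
        ("TM184903", "7 High Street", "E45GE"),
        ("HA194829", "Mill Cottage", "IP76CD"),
        ("MX960417", "11 Vincent Street", "HP114YE"),
    ],
)
cur.executemany(
    "insert into address_table values (?, ?, ?, ?, ?)",
    [
        ("ODFD197843", "45 Laurel Avenue", "HP1584", 816304, 497628),
        ("BNBV497553", "7 High Street", "E45GE", 794382, 201975),
        ("ASTT796962", "Mill Cottage", "IP76CD", 794682, 412876),
        ("PEKD969710", "11 Vincent Street", "HP114YE", 994685, 325874),
    ],
)

# The same join as in the post, writing its result to customer_location.
cur.execute(
    """
    create table customer_location as
    select customer_table.customer_number, address_table.xcoord, address_table.ycoord
    from customer_table join address_table
    on customer_table.address = address_table.address
    """
)

# Read the output table back, ordered by its first column (the customer number).
rows = cur.execute("select * from customer_location order by 1").fetchall()
for row in rows:
    print(row)
```

The join matches rows on the address text, so it only works while both tables spell each address identically; a shared key would be a more robust choice in practice.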

Sunday 8 November 2009

3.6 CSS

Cascading Style Sheets (CSS) are a means of describing the aesthetic and stylistic aspects of a web page, using a defined syntax to instruct a browser how to display the contents of the page: for example, the font type, size and colour, the background colour and the layout.

Style sheets bring efficiency to web design because one sheet can be applied to any number of documents. A whole website can reference the same CSS file and adhere to the same stylistic rules, giving it a distinct aesthetic feel. The term ‘cascade’ refers to the fact that numerous style sheets can be referenced in the same document; the browser reads the sheets in order, so where rules conflict, earlier sheets are overridden by later ones. CSS can be applied to an HTML document in three ways: as an external CSS file to which the HTML points using the link tag, as rules embedded in the document using the style tag, or directly on an element via the style attribute.
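
The ordering rule can be illustrated with a small Python simulation. This is a deliberately simplified sketch, not how a browser is implemented: it only models later sheets overriding earlier ones, and ignores selector specificity, origin and !important, which real browsers also take into account. The function name `resolve_cascade` and the two sample sheets are hypothetical.

```python
def resolve_cascade(style_sheets):
    """Merge style sheets in the order they are referenced.

    Each sheet is a dict mapping a selector to a dict of property: value
    declarations. Later sheets override earlier ones property by property.
    Simplification: specificity, origin and !important are ignored.
    """
    resolved = {}
    for sheet in style_sheets:
        for selector, declarations in sheet.items():
            resolved.setdefault(selector, {}).update(declarations)
    return resolved


# Hypothetical sheets: a site-wide sheet followed by a page-specific one.
site_sheet = {
    "body": {"font-family": "Georgia", "color": "black", "background-color": "white"},
    "h1": {"font-size": "2em"},
}
page_sheet = {
    "body": {"color": "navy"},
}

styles = resolve_cascade([site_sheet, page_sheet])
print(styles["body"])
```

Only the conflicting property (the body text colour) is replaced by the later sheet; everything else from the earlier sheet still applies.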

Pros of Cascading Style Sheets:
  • Separate style from content, so HTML remains legible and accessible to all users (e.g. visually impaired users can use screen readers or apply different style sheets to the content)
  • Make it easy to change the look of web pages
  • Can be applied to any number of documents, improving efficiency as code only has to be written once
  • Reduce network traffic, because a sheet that is applied to numerous pages only has to be downloaded once

Cons of Cascading Style Sheets:
  • Different browsers treat some of the styling instructions in different ways
  • Earlier versions of Internet Explorer don’t support CSS well

Examples of Cascading Style Sheets:
Here, here and here are examples of this blog post using different style sheets.