Digital Information Technologies and Architectures: November 2009

Sunday, 29 November 2009

3.8 Information Retrieval

Information retrieval refers to the retrieval of unstructured information relevant to a particular user’s requirements. Because the relevance of a result is subjective, retrieval is probabilistic, whereas querying a database for structured information is deterministic. For example, many users may enter the same search terms into a search engine while actually looking for different information, whereas several users who run the same SQL against an RDBMS should all be attempting to retrieve the same information.

To make unstructured information such as text efficiently retrievable, it first has to be indexed: the relevant fields and words are identified and the text is prepared for indexing. Preparation involves removing stop words, stemming and identifying synonyms. The most widely used type of index is an inverted file, which lists each searchable term together with the documents that contain it.
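The indexing steps can be sketched in a few lines of Python. This is only a minimal illustration, not how any real search engine works: the stop-word list, the crude suffix-stripping stem function and the three sample documents are all made up for the example, and synonym handling is left out.

```python
import re
from collections import defaultdict

# Assumption for illustration: a tiny, hand-picked stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def stem(word):
    """Very rough stemmer: strip one common English suffix, if present."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lower-case the text, split it into words, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each indexed term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document id.
documents = {
    1: "Indexing the web with crawlers",
    2: "The crawler retrieves web pages",
    3: "Databases store structured tables",
}

index = build_inverted_index(documents)
print(index["crawler"])
print(search(index, "web crawlers"))
```

Because "crawlers" and "crawler" are stemmed to the same term, a query for either form finds both documents, while a query consisting only of stop words returns nothing.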

In order to find resources for my DITA blog I have relied mainly on Google. Google has three distinct parts: Googlebot, the web crawler that finds and retrieves web pages; the indexer, which sorts through the full text of web pages and stores search terms in a massive database; and the query processor, which carries out the search by comparing the entered terms with the index. There is currently some confusion about Google’s use of stop words. Google used to ignore stop words automatically, but it informed you that it was doing so and gave you the option to repeat the search with the words included. This message no longer appears, and it is unclear whether Google has stopped using stop words and now indexes every single word, or whether it still uses stop words but just doesn’t tell the searcher.

Tuesday, 17 November 2009

3.7 Databases

Before the advent of the database approach in the early 1970s, data users had no means of centrally storing and sharing information, which led to duplication, inaccuracies and program-data dependence. A Database Management System (DBMS) is a suite of software programs that allows information to be stored, organised and accessed in a systematic and consistent way in a central location, so that numerous users can access the same data. This increases efficiency by removing the duplication, and the resulting inaccuracies, of maintaining multiple copies of the same information.

In GIS, the development of spatial databases and spatial database engines has enabled geographic data to be stored alongside non-spatial database tables within a single DBMS, thus driving the integration of spatial information. Using SQL, information can be retrieved from spatial and non-spatial data simultaneously. For example, to view a database table of customer addresses on a map, the table could be joined to a spatial table of addresses. As long as the spatial attribute field(s) are included in the output table (either the numeric coordinates or a proprietary geometry field), the table can be imported into a GIS and the customers' addresses viewed spatially. The following SQL query retrieves the customer_number field from customer_table and the coordinates from address_table, joining the two tables on the address field, and writes the result to an output table called customer_location.

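-- Match each customer to the coordinates of the same address text
create table customer_location as
select customer_table.customer_number, address_table.xcoord, address_table.ycoord
from customer_table join address_table
on customer_table.address = address_table.address;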

Customer_table

Customer_number	Address	Postcode
NR173974	45 Laurel Avenue	HP1584
TM184903	7 High Street	E45GE
HA194829	Mill Cottage	IP76CD
MX960417	11 Vincent Street	HP114YE

Address_table

ID	Address	Postcode	Xcoord	Ycoord
ODFD197843	45 Laurel Avenue	HP1584	816304	497628
BNBV497553	7 High Street	E45GE	794382	201975
ASTT796962	Mill Cottage	IP76CD	794682	412876
PEKD969710	11 Vincent Street	HP114YE	994685	325874

Customer_location

Customer_number	Xcoord	Ycoord
NR173974	816304	497628
TM184903	794382	201975
HA194829	794682	412876
MX960417	994685	325874
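
For anyone who wants to try the join without a full DBMS, it can be reproduced with Python's built-in sqlite3 module and an in-memory database. This is just a sketch under assumptions: the column types are my own guesses, and a real spatial database would normally hold a geometry field rather than plain numeric coordinates.

```python
import sqlite3

# In-memory SQLite database; the column types are assumed for this example.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute(
    "create table customer_table (customer_number text, address text, postcode text)"
)
cur.execute(
    "create table address_table "
    "(id text, address text, postcode text, xcoord integer, ycoord integer)"
)

# Sample rows copied from the tables shown in the post.
cur.executemany(
    "insert into customer_table values (?, ?, ?)",
    [
        ("NR173974", "45 Laurel Avenue", "HP1584"),
        ("TM184903", "7 High Street", "E45GE"),
        ("HA194829", "Mill Cottage", "IP76CD"),
        ("MX960417", "11 Vincent Street", "HP114YE"),
    ],
)
cur.executemany(
    "insert into address_table values (?, ?, ?, ?, ?)",
    [
        ("ODFD197843", "45 Laurel Avenue", "HP1584", 816304, 497628),
        ("BNBV497553", "7 High Street", "E45GE", 794382, 201975),
        ("ASTT796962", "Mill Cottage", "IP76CD", 794682, 412876),
        ("PEKD969710", "11 Vincent Street", "HP114YE", 994685, 325874),
    ],
)

# The join from the post: match customers to coordinates by address text.
cur.execute(
    """
    create table customer_location as
    select customer_number, xcoord, ycoord
    from customer_table join address_table
    on customer_table.address = address_table.address
    """
)

rows = cur.execute(
    "select customer_number, xcoord, ycoord "
    "from customer_location order by customer_number"
).fetchall()
for row in rows:
    print(row)

conn.close()
```

Each customer picks up exactly one pair of coordinates because every address string in the sample data is unique. With real data, joining on the address text alone could match the wrong row, so the postcode would probably need to be part of the join condition as well.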

Sunday, 8 November 2009

3.6 CSS

Cascading Style Sheets (CSS) are a means of describing the aesthetic and stylistic aspects of a web page, using a defined syntax to instruct a browser how to display the page's contents: for example the font type, size and colour, the background colour and the layout.

Style sheets bring efficiency to web design because one sheet can be applied to any number of documents. A whole website can reference the same CSS file and adhere to the same stylistic rules, giving it a distinct aesthetic feel. The term ‘cascade’ refers to the fact that numerous style sheets can be referenced in the same document; the browser reads them in order, so where rules conflict, declarations in earlier sheets are overridden by later ones (all else, such as selector specificity, being equal). CSS can be attached to an HTML document in three ways: as an external CSS file that the HTML points to using the link tag, embedded in the page using the style tag, or applied directly to an element via the style attribute.
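
The "later sheets override earlier ones" idea can be modelled very roughly in Python by merging dictionaries in order. This is a deliberate simplification, not the real cascade: the function name apply_cascade and both sample sheets are invented for the illustration, and browsers also weigh selector specificity, the origin of each sheet and !important declarations.

```python
def apply_cascade(*sheets):
    """Merge declarations sheet by sheet; later sheets win on conflicts.

    Each sheet is modelled as {selector: {property: value}}.
    """
    computed = {}
    for sheet in sheets:
        for selector, declarations in sheet.items():
            computed.setdefault(selector, {}).update(declarations)
    return computed


# Hypothetical sheets: a site-wide sheet, then a page-specific one.
site_sheet = {"p": {"color": "black", "font-size": "12px"}}
page_sheet = {"p": {"color": "navy"}}

style = apply_cascade(site_sheet, page_sheet)
print(style["p"])
```

The paragraph colour comes from the later sheet, while the font size set by the earlier sheet survives because nothing overrode it.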

Pros of Cascading Style Sheets:

Separate style from content, so HTML remains legible and accessible to all users (e.g. visually impaired users can use screen readers or apply different style sheets to the content)
Make it easy to change the look of web pages
Can be applied to any number of documents, improving efficiency as the code only has to be written once
Reduce network traffic, because a sheet shared by numerous pages only has to be downloaded once and can then be cached by the browser

Cons of Cascading Style Sheets:

Different browsers treat some of the styling instructions in different ways
Earlier versions of Internet Explorer don’t support CSS well

Examples of Cascading Style Sheets:
Here, here and here are examples of this blog post using different style sheets.
