2-4 February, AB 2005, Gaziantep
Information Retrieval: Basic Concepts
Yaşar Tonta, Hacettepe University, yunus.hacettepe.edu.tr/~tonta/
DOK324/BBY220 Principles of Information Retrieval
Outline
– Definition of information
– Definition of document
– Logical structure of information retrieval systems
– Basic concepts
– Retrieval rules
– Performance measures
Knowledge in Philosophy
Knowledge:
– the activity of knowing
– the output obtained as a result of this activity
Knowledge activities:
– perceiving
– understanding
– thinking
– reasoning
– interpreting
– explaining
– verifying
– evaluating
Source: Kuçuradi, 1995, p. 97
Information in Information Studies
– Information-as-process
– Information-as-knowledge
– Information-as-object
Different Perspectives on Information
Source: Buckland, 1991, p. 6
Document
– docere: to teach, to inform
– -ment: means, instruments
"all concrete or symbolic indexical signs, preserved or recorded in order to represent, reconstruct or prove a physical or intellectual phenomenon" (Suzanne Briet)
Examples of documents: clay tablets, sculptures, papyri, maps, manuscripts, books, journals, pictures, films, cassettes, CD-ROMs, DVDs, Web pages, digital documents, etc.
Document in Different Disciplines
Document = form + sign + medium
– Form: calligraphers, music and film producers, pattern recognition specialists, librarians, archivists, museum professionals
– Sign: linguists, computer scientists, artificial intelligence specialists
– Medium: archivists, historians, lawyers, diplomatics scholars, publishers, librarians, etc.
Information Management
– The application of management principles to the acquisition, organization, control, dissemination and use of information relevant to the effective operation of organizations of all kinds
– "providing the right information, in the right form, to the right person, at the right cost, at the right time, in the right place, to make the right decision"
Knowledge Management
– A management practice that relies on an organization's intellectual capital to accomplish its mission
– Intellectual capital: the knowledge derived from the experience, services and products that the organization's employees have developed or accumulated
– Knowledge:
  – Explicit (information-as-object)
  – Tacit (information-as-knowledge)
What Does an Information Manager Manage?
– The tacit knowledge stored in people's brains?
– The objects (documents) presumed to carry information?
– Or both?
  – Librarianship
  – Archival science
  – Documentation, document management, records management, administrative documentation
  – Data management, information resources management, information technology management
  – Information science, information studies
  – Information management (management of documents that carry information)
Information Management
– Acquisition, organization, maintenance, use, preservation and archiving of documents
– Identifying and meeting users' information needs
– Designing, building and operating information systems
– Information technology management
Information Retrieval
"the technique and process of collecting, classifying, cataloguing and storing information, searching large amounts of data, and producing (or displaying) the desired information from those data"
The Fundamental Dilemma of Information Retrieval
"the need to describe something you do not know in order to find information about it" (Hjerppe)
Information Discovery, Description, Organization and Retrieval
[Diagram relating discovery, description, organization and retrieval]
Logical Organization of a Document Retrieval System
[Diagram components: documents → indexing → index records; thesaurus/dictionary; users → query formulation → formal query statement; retrieval rule]
Source: Maron, 1984
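The logical organization above can be sketched in a few lines: indexing turns documents into index records, the user's need becomes a formal query statement, and a retrieval rule compares the two. This is a minimal sketch, assuming index records are plain sets of terms and that the retrieval rule is a Boolean AND (an assumption chosen for simplicity); the document IDs, terms and function names are hypothetical.

```python
from collections import defaultdict


def build_index(index_records: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert index records (document -> terms) into term -> document IDs."""
    inverted = defaultdict(set)
    for doc_id, terms in index_records.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)


def retrieve_and(inverted: dict[str, set[str]], query_terms: list[str]) -> set[str]:
    """Assumed retrieval rule: return documents indexed under every query term."""
    postings = [inverted.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical index records produced by the indexing step
records = {
    "d1": {"information retrieval", "relevance"},
    "d2": {"information retrieval", "thesauri"},
    "d3": {"relevance", "evaluation"},
}
index = build_index(records)
print(sorted(retrieve_and(index, ["information retrieval", "relevance"])))  # ['d1']
```

Swapping `retrieve_and` for a different function (OR, ranking, probabilistic weighting) changes the retrieval rule without touching the index records, which mirrors the separation in the diagram.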
The Ideal Information Retrieval System
– Should retrieve all relevant documents and only relevant documents
– The concept of "relevance"
  – Objective relevance
  – Subjective relevance
– Bringing similar information together and keeping dissimilar information apart
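"Bringing similar information together" needs some measure of similarity between document representations. The sketch below uses cosine similarity over raw term counts; this particular measure, the whitespace tokenization and the sample texts are assumptions for illustration, not something the slide prescribes.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity of two texts represented as term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0  # empty text: treat as dissimilar


# Hypothetical document texts
docs = {
    "d1": "relevance in information retrieval",
    "d2": "information retrieval evaluation and relevance",
    "d3": "clay tablets and papyrus",
}
print(round(cosine(docs["d1"], docs["d2"]), 3))  # 0.671: three shared terms
print(cosine(docs["d1"], docs["d3"]))  # 0.0: no shared terms
```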
Background Concepts for IR
– User information needs
– Controlled vocabularies (pre- and postcoordination)
– Indexing languages
– IR definitions and concepts
  – Documents
  – Queries
  – Collections
  – Evaluation
  – Relevance
User Information Need
– Why build IR systems at all?
– People have different and highly varied needs for information
– People often do not know what they want, or may not be able to express it in a usable form
  – Boulding's "Image"
– How can these user needs for information be satisfied?
Controlled Vocabularies
– Vocabulary control is the attempt to provide a standardized and consistent set of terms (such as subject headings, names, classifications, etc.) with the intent of aiding the searcher in finding information.
– Controlled vocabularies are a kind of metadata:
  – Data about data
  – Information about information
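Vocabulary control can be sketched as a lookup from entry terms (a lead-in vocabulary) to preferred terms, applied in the same way when indexing and when searching. A minimal sketch, assuming a small hypothetical mapping called `LEAD_IN`; terms outside the vocabulary are returned separately rather than guessed.

```python
# Hypothetical lead-in (entry) vocabulary: variant forms -> preferred term
LEAD_IN = {
    "ir": "information retrieval",
    "document retrieval": "information retrieval",
    "information retrieval": "information retrieval",
    "cataloguing": "cataloging",
    "cataloging": "cataloging",
}


def normalize(terms: list[str]) -> tuple[list[str], list[str]]:
    """Map each term to its preferred form; collect terms the vocabulary lacks."""
    controlled, unmatched = [], []
    for term in terms:
        key = term.strip().lower()
        if key in LEAD_IN:
            controlled.append(LEAD_IN[key])
        else:
            unmatched.append(term)
    return controlled, unmatched


print(normalize(["IR", "Cataloguing", "folksonomy"]))
# (['information retrieval', 'cataloging'], ['folksonomy'])
```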
Pre- and Postcoordination
– Precoordination relies on the indexer (librarian, etc.) to construct some adequate representation of the meaning of a document.
– Postcoordination relies on the user or searcher to combine more atomic concepts in the attempt to describe the documents that would be considered relevant.
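The difference shows up at search time: a precoordinated index only answers the combinations the indexer built, while a postcoordinated index lets the searcher combine atomic concepts freely. A minimal sketch with hypothetical headings and documents; the `--` subdivision style imitates subject-heading practice, and the Boolean AND combination is an assumption.

```python
# Precoordination: the indexer assigns ready-made compound headings (hypothetical data).
PRE = {
    "d1": {"Information retrieval--Evaluation"},
    "d2": {"Information retrieval--History"},
    "d3": {"Libraries--Evaluation"},
}
# Postcoordination: the same documents indexed with atomic concepts only.
POST = {
    "d1": {"information retrieval", "evaluation"},
    "d2": {"information retrieval", "history"},
    "d3": {"libraries", "evaluation"},
}


def search_pre(heading: str) -> set[str]:
    """Match a compound heading exactly as the indexer constructed it."""
    return {doc for doc, headings in PRE.items() if heading in headings}


def search_post(*concepts: str) -> set[str]:
    """The searcher combines atomic concepts at query time (Boolean AND)."""
    wanted = set(concepts)
    return {doc for doc, terms in POST.items() if wanted <= terms}


print(sorted(search_pre("Information retrieval--Evaluation")))  # ['d1']
print(sorted(search_pre("Evaluation")))  # [] because no heading is exactly "Evaluation"
print(sorted(search_post("evaluation")))  # ['d1', 'd3']
print(sorted(search_post("information retrieval", "evaluation")))  # ['d1']
```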
Structure of an IR System
[Diagram: Information Storage and Retrieval System]
– Storage line: documents & data → indexing (descriptive and subject) → storage of documents → Store 2: document representations
– Search line: interest profiles & queries → formulating query in terms of descriptors → storage of profiles → Store 1: profiles/search requests
– Comparison/matching of Store 1 against Store 2 → potentially relevant documents
– Rules of the game = rules for subject indexing + thesaurus (which consists of a lead-in vocabulary and an indexing language)
Adapted from Soergel, p. 19
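The storage line and the search line meet at the comparison/matching step. A minimal sketch, assuming both lines have already expressed their input in descriptors from the same indexing language, and assuming a profile matches a document when every descriptor in the profile was assigned to the document. `STORE1`, `STORE2`, `Profile` and `match` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """An interest profile or search request, already expressed in descriptors."""
    user: str
    descriptors: frozenset[str]


# Store 1: profiles/search requests (output of the search line), hypothetical
STORE1 = [
    Profile("user1", frozenset({"thesauri", "evaluation"})),
    Profile("user2", frozenset({"relevance"})),
]
# Store 2: document representations (output of the storage line), hypothetical
STORE2 = {
    "d1": frozenset({"thesauri", "evaluation", "standards"}),
    "d2": frozenset({"relevance", "evaluation"}),
}


def match(profiles: list[Profile], documents: dict[str, frozenset[str]]) -> dict[str, list[str]]:
    """Comparison/matching: a document is potentially relevant to a profile
    when it carries every descriptor in that profile (assumed rule)."""
    return {
        p.user: sorted(doc for doc, desc in documents.items() if p.descriptors <= desc)
        for p in profiles
    }


print(match(STORE1, STORE2))  # {'user1': ['d1'], 'user2': ['d2']}
```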
Uses of Controlled Vocabularies
– Library subject headings, classification and authority files
– Commercial journal indexing services and databases
– Yahoo and other Web classification schemes
– Online and manual systems within organizations
  – SunSolve
  – MacArthur
Types of Indexing Languages
– Uncontrolled keyword indexing
– Indexing languages
  – Controlled, but not structured
– Thesauri
  – Controlled and structured
– Classification systems
  – Controlled, structured, and coded
– Faceted classification systems
Thesauri
A thesaurus is a collection of selected vocabulary (preferred terms or descriptors) with links among synonymous, equivalent, broader, narrower and other related terms.
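Thesaurus links are conventionally tagged BT (broader term), NT (narrower term), RT (related term) and USE/UF for equivalence between non-preferred and preferred terms. The sketch below stores a hypothetical fragment as a dictionary and uses it for one common purpose: mapping a non-preferred term to its descriptor and then expanding a query with narrower terms. The entries and the names `THESAURUS`, `USE`, `preferred` and `expand` are illustrative only.

```python
# Hypothetical thesaurus fragment: descriptors with BT/NT/RT links
THESAURUS = {
    "information science": {"BT": [], "NT": ["information retrieval"], "RT": []},
    "information retrieval": {
        "BT": ["information science"],
        "NT": ["boolean searching", "ranked retrieval"],
        "RT": ["indexing"],
    },
    "boolean searching": {"BT": ["information retrieval"], "NT": [], "RT": []},
    "ranked retrieval": {"BT": ["information retrieval"], "NT": [], "RT": []},
    "indexing": {"BT": [], "NT": [], "RT": ["information retrieval"]},
}
# Equivalence links: non-preferred term USE preferred term (the reverse link is UF)
USE = {"document retrieval": "information retrieval", "ir": "information retrieval"}


def preferred(term: str) -> str:
    """Map a non-preferred (equivalent) term to its descriptor."""
    key = term.strip().lower()
    return USE.get(key, key)


def expand(term: str, depth: int = 1) -> list[str]:
    """Return the descriptor plus its narrower terms, `depth` levels down."""
    start = preferred(term)
    result, frontier = [start], [start]
    for _ in range(depth):
        frontier = [nt for t in frontier for nt in THESAURUS.get(t, {}).get("NT", [])]
        result.extend(frontier)
    return result


print(expand("Document retrieval"))
# ['information retrieval', 'boolean searching', 'ranked retrieval']
print(expand("information science", depth=2))
# ['information science', 'information retrieval', 'boolean searching', 'ranked retrieval']
```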
Thesauri (cont.)
National and international standards for thesauri:
– ANSI/NISO z American National Standard Guidelines for the Construction, Format and Management of Monolingual Thesauri
– ANSI/NISO Draft Standard Z x -- American National Standard Guidelines for Indexes in Information Retrieval
– ISO Documentation -- Guidelines for the establishment and development of monolingual thesauri
– ISO Documentation -- Guidelines for the establishment and development of multilingual thesauri
Development of a Thesaurus
– Term selection
– Merging and development of concept classes
– Definition of broad subject fields and subfields
– Development of classificatory structure
– Review, testing, application, revision
Categorization Summary
– Processes of categorization underlie many of the issues having to do with information organization.
– Categorization is messier than our computer systems would like.
– Human categories have graded membership, consisting of family resemblances.
– Family resemblance is expressed in part by which subsets of features are shared.
– It is also determined by underlying understandings of the world that do not get represented in most systems.
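Graded membership can be illustrated by scoring how many of a category's characteristic features an item shares. This is a toy sketch: the feature sets are invented, and the shared-feature proportion is only one simple proxy for family resemblance; it deliberately leaves out the "underlying understandings of the world" that, as the slide notes, most systems do not represent. The antelope item nods to Briet's well-known example of an animal in a zoo treated as a document.

```python
# Hypothetical features associated with the category "document"
DOCUMENT_FEATURES = {"recorded", "textual", "bound", "published", "physical", "evidential"}

# Hypothetical candidate members and the features each one has
ITEMS = {
    "printed book": {"recorded", "textual", "bound", "published", "physical", "evidential"},
    "web page": {"recorded", "textual", "published"},
    "antelope in a zoo": {"physical", "evidential"},
}


def membership(features: set[str], category: set[str] = DOCUMENT_FEATURES) -> float:
    """Graded membership in [0, 1]: the share of the category's features an item has."""
    return len(features & category) / len(category)


for name, feats in sorted(ITEMS.items(), key=lambda kv: membership(kv[1]), reverse=True):
    print(f"{name}: {membership(feats):.2f}")
# printed book: 1.00
# web page: 0.50
# antelope in a zoo: 0.33
```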
Classification Systems
– A classification system is an indexing language often based on a broad ordering of topical areas.
– Thesauri and classification systems both use this broad ordering and maintain a structure of broader, narrower, and related topics.
– Classification schemes commonly use a coded notation for representing a topic and its place in relation to other terms.
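A coded notation can carry the hierarchy itself. The sketch below uses a simplified model of Dewey-style notation in which each significant digit of a three-digit class number narrows the topic, so the broader class is found by dropping the last significant digit. Real DDC notation has decimals and exceptions that this ignores; `broader`, `hierarchy` and the small `LABELS` table are illustrative assumptions.

```python
from typing import Optional

# A few Dewey Decimal Classification captions (main class, divisions, section)
LABELS = {
    "500": "Natural sciences and mathematics",
    "510": "Mathematics",
    "516": "Geometry",
    "530": "Physics",
}


def broader(notation: str) -> Optional[str]:
    """Broader class of a three-digit notation: drop the last significant digit.

    Simplified: assumes whole numbers only (no digits after the decimal point).
    """
    significant = notation.rstrip("0")
    if len(significant) <= 1:
        return None  # a main class such as "500" has no broader class here
    return significant[:-1].ljust(3, "0")


def hierarchy(notation: str) -> list[str]:
    """Walk from a class up to its main class."""
    chain = [notation]
    while (parent := broader(chain[-1])) is not None:
        chain.append(parent)
    return chain


for n in hierarchy("516"):
    print(n, LABELS[n])
# 516 Geometry
# 510 Mathematics
# 500 Natural sciences and mathematics
```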
Classification Systems (cont.)
Examples:
– The Library of Congress Classification System
– The Dewey Decimal Classification System
– The ACM Computing Reviews Categories
– The American Mathematical Society Classification System
Central Concepts in IR
– Documents
– Queries
– Collections
– Evaluation
– Relevance
Documents
– What do we mean by a document?
  – Full document?
  – Document surrogates?
  – Pages?
– Buckland, "What is a Document?" and "What is a 'Digital Document'?"
– Are IR systems better called document retrieval systems?
– A document is a representation of some aggregation of information, treated as a unit.
Collection
– A collection is some physical or logical aggregation of documents
  – A database
  – A library
  – An index?
  – Others?
Queries
– A query is some expression of a user's information needs
– Queries can take many forms
  – Natural language description of need
  – Formal query in a query language
– Queries may not be accurate expressions of the information need
  – Differences between conversation with a person and formal query expression
Evaluation
– Why evaluate?
– What to evaluate?
– How to evaluate?
Why Evaluate?
– Determine if the system is desirable
– Make comparative assessments
– Others?
What to Evaluate?
– How much of the information need is satisfied
– How much was learned about a topic
– Incidental learning:
  – How much was learned about the collection
  – How much was learned about other topics
– How inviting the system is
What to Evaluate?
What can be measured that reflects users' ability to use the system? (Cleverdon, 1966)
– Coverage of information
– Form of presentation
– Effort required/ease of use
– Time and space efficiency
– Recall: proportion of relevant material actually retrieved
– Precision: proportion of retrieved material actually relevant
Recall and precision together measure effectiveness.
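Recall and precision can be computed directly from two sets: the documents the system retrieved and the documents judged relevant. A minimal sketch with hypothetical document IDs; returning 0.0 when a denominator is empty is an assumed convention, not part of Cleverdon's definitions. A system that retrieves all and only the relevant documents scores 1.0 on both.

```python
def recall_precision(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Recall = |relevant & retrieved| / |relevant|;
    precision = |relevant & retrieved| / |retrieved|."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0  # assumed convention for empty sets
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical search result and relevance judgements
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d7", "d8", "d9"}
print(recall_precision(retrieved, relevant))  # (0.4, 0.5): 2 of 5 relevant found, 2 of 4 retrieved relevant
```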
Relevance
In what ways can a document be relevant to a query?
– Answer a precise question precisely
– Partially answer the question
– Suggest a source for more information
– Give background information
– Remind the user of other knowledge
– Others...
Relevance
"Intuitively, we understand quite well what relevance means. It is a primitive 'y' know' concept, as is information for which we hardly need a definition. … if and when any productive contact [in communication] is desired, consciously or not, we involve and use this intuitive notion of relevance." (Saracevic, 1975, p. 324)
Relevance
– How relevant is the document
  – for this user, for this information need?
– Subjective, but measurable to some extent
  – How often do people agree a document is relevant to a query?
– How well does it answer the question?
  – Complete answer? Partial?
  – Background information?
  – Hints for further exploration?
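One way to quantify "how often do people agree a document is relevant" is to compare two assessors' judgements over the same documents. A minimal sketch, assuming binary relevant/not relevant judgements; it reports raw percent agreement and Cohen's kappa, which discounts the agreement expected by chance. The judgement lists are invented.

```python
def agreement(a: list[bool], b: list[bool]) -> tuple[float, float]:
    """Observed agreement and Cohen's kappa for two assessors' binary judgements."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n  # share each assessor judged relevant
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1:
        # Assumed convention: both assessors gave the same single label throughout.
        return observed, 1.0
    return observed, (observed - expected) / (1 - expected)


# Hypothetical judgements for the same eight documents (True = relevant)
judge1 = [True, True, False, True, False, False, True, False]
judge2 = [True, False, False, True, False, True, True, False]
print(agreement(judge1, judge2))  # (0.75, 0.5)
```

Here the assessors agree on 6 of 8 documents, but because each judged half the set relevant, half of that agreement would be expected by chance, so kappa is 0.5.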
Relevance Research and Thought
– Review to 1975 by Saracevic
– Reconsideration of user-centered relevance by Schamber, Eisenberg and Nilan, 1990
– Special issue of JASIS on relevance (April 1994, 45(3))
Saracevic
Relevance is considered as a measure of effectiveness of the contact between a source and a destination in a communications process.
– Systems view
– Destinations view
– Subject literature view
– Subject knowledge view
– Pertinence
– Pragmatic view
Define Your Own Relevance
Relevance is the (A) gage of relevance of an (B) aspect of relevance existing between an (C) object judged and a (D) frame of reference as judged by an (E) assessor, where…
From Saracevic, 1975 and Schamber, 1990
A. Gages: measure, degree, extent, judgement, estimate, appraisal, relation
B. Aspect: utility, matching, informativeness, satisfaction, appropriateness, usefulness, correspondence
C. Object judged: document, document representation, reference, textual form, information provided, fact, article
D. Frame of reference: question, question representation, research stage, information need, information used, point of view, request
E. Assessor: requester, intermediary, expert, user, person, judge, information specialist
Schamber, Eisenberg and Nilan
"Relevance is the measure of retrieval performance in all information systems, including full-text, multimedia, question-answering, database management and knowledge-based systems."
– Systems-oriented relevance: topicality
– User-oriented relevance
– Relevance as a multi-dimensional concept
Schamber et al. Conclusions
"Relevance is a multidimensional concept whose meaning is largely dependent on users' perceptions of information and their own information need situations.
Relevance is a dynamic concept that depends on users' judgements of the quality of the relationship between information and information need at a certain point in time.
Relevance is a complex but systematic and measurable concept if approached conceptually and operationally from the user's perspective."
Froehlich
– Centrality and inadequacy of topicality as the basis for relevance
– Suggestions for a synthesis of views
Janes' View of Relevance
– Topicality
– Pertinence
– Relevance
– Utility
– Satisfaction