Main | Browse | Search | Author Links | Manage ETD List | Review ETDs | Catalog ETDs | Help
 

Title page for ETD etd-07152008-150924


Type of Document Dissertation
Author Acosta Negrin, William Felix
Author's Email Address wacosta@gmail.com
URN etd-07152008-150924
Title Improving Search in Unstructured Peer-to-Peer Systems
Degree Doctor of Philosophy
Department Computer Science and Engineering
Advisory Committee
Advisor Name Title
Surendar Chandra Committee Chair
Amitabh Chaudhary Committee Member
John Tracey Committee Member
Patrick Flynn Committee Member
Keywords
  • workload analysis
  • Peer-to-peer
  • P2P
  • information retrieval
  • search
Date of Defense 2008-07-02
Availability unrestricted
Abstract
Peer-to-peer (P2P) systems have been studied as alternatives to centralized systems for their ability to federate the processing, storage and network resources available among a set of independently owned peers for the common good. Various distributed applications have been designed on top of a P2P infrastructure. For example, systems such as PPLive and Joost use peers for streaming videos while systems such as Gnutella and BitTorrent use peers for distributing files.

In this thesis, I focus on P2P file sharing systems. Peers in file-sharing systems form an application-level distributed overlay to manage the communication between peers. Peers then use this overlay to locate files available from other peers. The effectiveness of locating files in unstructured P2P systems depended on the ease with which other peers could be reached. Prior research efforts created overlays that had good network behavior. However, these prior research efforts did not fully investigate the nature of query resolution. In general, queries for objects were resolved by matching terms in the query to terms in the object annotations. The performance of file-sharing systems fundamentally depended not only on the ability to route queries to various peers, but also on whether the query itself could be resolved.

We performed an extensive analysis of the annotations of the files stored in the Gnutella and iTunes P2P systems as well as queries that users were issuing to locate objects in Gnutella. Despite differences in how objects were annotated between Gnutella and iTunes, the overall popularity distribution of objects was similar. I observed a fundamental mismatch between the way that objects were named and the way that they were searched in Gnutella; terms that were popular in the file annotations were not popular in the queries. This mismatch meant that many search queries were unlikely to find matching objects in the system. I showed that approximately 55% of the queries in Gnutella had no matching files. Optimizing the overlays or the search mechanisms itself was unlikely to resolve the mismatch between query terms and file annotations. These results remained consistent across my analysis of the P2P system over several years, suggesting that both the users query behavior and the way that objects were annotated had stabilized.

In this thesis, I addressed the problem of unresolved queries due to query term and object annotation mismatches by transforming the original query and selectively removing terms from the original query in order to broaden its scope. The challenge was to choose query terms without analyzing user intent or performing semantic analysis of the queries and objects. I removed query terms based on observed data from a small number of peers (< 1%) before issuing the modified query against the entire P2P system. Using captured queries and files shared by peers in Gnutella, I showed that my approach improved the maximum success rates from about 45% to over 85% of the queries. We also performed experimental analysis of the query term selection algorithm on a live Gnutella network. We showed that over 61% of the re-written queries were resolved successfully by the Gnutella network.

Files
  Filename       Size       Approximate Download Time (Hours:Minutes:Seconds) 
 
 28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access 
  AcostaWF072008.pdf 1.81 Mb 00:08:22 00:04:18 00:03:46 00:01:53 00:00:09

Browse All Available ETDs by ( Author | Department )

If you have more questions or technical problems, please Contact the Graduate School.