Youssef Bassil

CL
18 papers
516 citations
Novelty 32%
AI Score 21

18 Papers

SE May 31, 2012
A Simulation Model for the Waterfall Software Development Life Cycle

Youssef Bassil

The software development life cycle, or SDLC for short, is a methodology for designing, building, and maintaining information and industrial systems. Many SDLC models exist, one of which is the Waterfall model, which comprises five phases to be completed sequentially in order to develop a software solution. However, the SDLC of software systems has always encountered problems and limitations that resulted in significant budget overruns, late or suspended deliveries, and dissatisfied clients. The major reason for these deficiencies is that project directors are not wisely assigning the required number of workers and resources to the various activities of the SDLC. Consequently, some SDLC phases with insufficient resources may be delayed while others with excess resources may sit idle, leading to a bottleneck between the arrival and delivery of projects and to a failure to deliver an operational product on time and within budget. This paper proposes a simulation model for the Waterfall development process using the Simphony.NET simulation tool, whose role is to assist project managers in determining how to achieve maximum productivity with the minimum expenses, workers, and hours. It helps maximize the utilization of development processes by keeping all employees and resources busy all the time, to keep pace with the arrival of projects and to decrease waste and idle time. As future work, other SDLC models such as spiral and incremental are to be simulated, giving project executives a choice among a diversity of software development methodologies.
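
The resource-allocation question raised in this abstract can be illustrated with a small sketch. The paper's model is built in Simphony.NET; the Python code below is not that model, only a minimal stand-in that treats each phase as a first-come, first-served station with a fixed number of workers. The function name `simulate`, the phase durations, and the worker counts are assumptions made for illustration.

```python
import heapq

def simulate(arrivals, phases):
    """Return the finish time of each project after the last phase.

    arrivals: project arrival times (hours).
    phases:   list of (workers, hours_per_project), processed in order.
              Both values are hypothetical inputs, not figures from the paper.
    """
    ready = list(arrivals)
    for workers, hours in phases:
        # Min-heap of the times at which each worker of this phase becomes free.
        free_at = [0.0] * workers
        heapq.heapify(free_at)
        done = []
        for t in sorted(ready):
            # A project starts when it has arrived AND a worker is available.
            start = max(t, heapq.heappop(free_at))
            finish = start + hours
            heapq.heappush(free_at, finish)
            done.append(finish)
        ready = done  # output of this phase feeds the next one
    return ready

# Three projects, a 1-worker phase (2 h each) followed by a 2-worker phase (3 h each).
finish_times = simulate([0, 1, 2], [(1, 2), (2, 3)])
```

With a single worker in the first phase, projects queue there while the second phase has spare capacity, which is the kind of imbalance such a simulation is meant to expose.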

CL Apr 26, 2012
Context-sensitive Spelling Correction Using Google Web 1T 5-Gram Information

Youssef Bassil, Mohammad Alwani

In computing, spell checking is the process of detecting and sometimes providing spelling suggestions for incorrectly spelled words in a text. Basically, a spell checker is a computer program that uses a dictionary of words to perform spell checking. The bigger the dictionary, the higher the error detection rate. Because spell checkers are based on regular dictionaries, they suffer from the data sparseness problem, as they cannot capture a large vocabulary of words including proper names, domain-specific terms, technical jargon, special acronyms, and terminologies. As a result, they exhibit a low error detection rate and often fail to catch major errors in the text. This paper proposes a new context-sensitive spelling correction method for detecting and correcting non-word and real-word errors in digital text documents. The approach hinges on data statistics from the Google Web 1T 5-gram data set, which consists of a large volume of n-gram word sequences extracted from the World Wide Web. Fundamentally, the proposed method comprises an error detector that detects misspellings, a candidate spellings generator based on a character 2-gram model that generates correction suggestions, and an error corrector that performs contextual error correction. Experiments conducted on a set of text documents from different domains and containing misspellings showed an outstanding spelling error correction rate and a drastic reduction of both non-word and real-word errors. In a further study, the proposed algorithm is to be parallelized so as to lower the computational cost of the error detection and correction processes.
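
The three stages named in this abstract (detection, candidate generation from character 2-grams, contextual correction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `UNIGRAMS` and `TRIGRAMS` tables are made-up stand-ins for Web 1T counts, Jaccard overlap is one plausible way to compare character 2-grams, a 3-word window replaces the 5-gram context, and only non-word errors are handled.

```python
from collections import Counter

# Hypothetical toy counts standing in for Google Web 1T statistics.
UNIGRAMS = Counter({"the": 1000, "weather": 120, "whether": 150,
                    "is": 900, "nice": 80, "today": 200})
TRIGRAMS = Counter({("the", "weather", "is"): 40,
                    ("the", "whether", "is"): 1})

def char_bigrams(word):
    """Set of character 2-grams, with boundary markers."""
    padded = f"^{word}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def candidates(word, k=3):
    """Rank vocabulary words by character 2-gram overlap (Jaccard)."""
    target = char_bigrams(word)

    def score(vocab_word):
        grams = char_bigrams(vocab_word)
        return len(target & grams) / len(target | grams)

    return sorted(UNIGRAMS, key=score, reverse=True)[:k]

def correct(tokens):
    """Replace out-of-vocabulary tokens with the best candidate in context."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok in UNIGRAMS:  # detection: known words are left untouched here
            continue
        left = out[i - 1] if i > 0 else "<s>"
        right = out[i + 1] if i + 1 < len(out) else "</s>"
        # Contextual correction: pick the candidate with the highest n-gram count.
        out[i] = max(candidates(tok), key=lambda c: TRIGRAMS[(left, c, right)])
    return out

fixed = correct(["the", "wether", "is", "nice"])
```

In this toy data, "whether" is the closest candidate by spelling alone, but the context counts favor "weather", which is the point of making the final step context-sensitive.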

SE Apr 1, 2012
Management Language Specifications For Digital Ecosystems

Youssef Bassil

This paper defines the specifications of a management language intended to automate the control and administration of various service components connected to a digital ecosystem. It is called EML, short for Ecosystem Management Language; it is based on a proprietary syntax and notation and contains a set of managerial commands issued by the system's administrator via a command console. Additionally, EML is shipped with a collection of self-adaptation procedures called SAP. Their purpose is to provide self-adaptation properties to the ecosystem, allowing it to self-optimize based on the state of its execution environment. On top of that, there is the EMU, short for Ecosystem Management Unit, which interprets, validates, parses, and executes EML commands and SAP procedures. Future research can extend EML to support a larger set of commands and a larger set of SAP procedures.

SE Apr 1, 2012
Communication Language Specifications For Digital Ecosystems

Youssef Bassil

Service-based IT infrastructures are today's trend and the future for every enterprise willing to support dynamic and agile business to contend with ever-changing e-demands and requirements. A digital ecosystem is an emerging business IT model for developing agile e-enterprises made out of self-adaptable, self-manageable, self-organizing, and sustainable service components. This paper defines the specifications of a communication language for exchanging data between connecting entities in digital ecosystems. It is called ECL, short for Ecosystem Communication Language, and is based on XML to format its request and response messages. An ECU, short for Ecosystem Communication Unit, is also presented; it interprets, validates, and parses ECL messages and routes them to their destination entities. ECL is open and provides transparent, portable, and interoperable communication between heterogeneous distributed components, allowing them to send requests to and receive responses from each other regardless of their incompatible protocols, standards, and technologies. As future research, digital signatures for ECL are to be investigated so as to deliver data integrity as well as message authenticity for the digital ecosystem.

CL Apr 1, 2012
OCR Post-Processing Error Correction Algorithm using Google Online Spelling Suggestion

Youssef Bassil, Mohammad Alwani

With the advent of digital optical scanners, many paper-based books, textbooks, magazines, articles, and documents are being transformed into electronic versions that can be manipulated by a computer. For this purpose, OCR, short for Optical Character Recognition, was developed to translate scanned graphical text into editable computer text. Unfortunately, OCR is still imperfect, as it occasionally misrecognizes letters and falsely identifies scanned text, leading to misspellings and linguistic errors in the OCR output text. This paper proposes a post-processing context-based error correction algorithm for detecting and correcting OCR non-word and real-word errors. The proposed algorithm is based on Google's online spelling suggestion, which harnesses an internal database containing a huge collection of terms and word sequences gathered from all over the web and is therefore well suited to suggesting replacements for words that have been misspelled during the OCR process. Experiments carried out revealed a significant improvement in the OCR error correction rate. Future research can improve upon the proposed algorithm so that it can be parallelized and executed over multiprocessing platforms.

CL Apr 1, 2012
OCR Context-Sensitive Error Correction Based on Google Web 1T 5-Gram Data Set

Youssef Bassil, Mohammad Alwani

Since the dawn of the computing era, information has been represented digitally so that it can be processed by electronic computers. Paper books and documents were abundant and widely published at that time; hence, there was a need to convert them into digital format. OCR, short for Optical Character Recognition, was conceived to translate paper-based books into digital e-books. Regrettably, OCR systems are still erroneous and inaccurate, as they produce misspellings in the recognized text, especially when the source document is of low printing quality. This paper proposes a post-processing OCR context-sensitive error correction method for detecting and correcting non-word and real-word OCR errors. The cornerstone of this proposed approach is the use of the Google Web 1T 5-gram data set as a dictionary of words to spell-check OCR text. The Google data set incorporates a very large vocabulary and word statistics entirely reaped from the Internet, making it a reliable source for dictionary-based error correction. The core of the proposed solution is a combination of three algorithms: the error detection, candidate spellings generator, and error correction algorithms, which all exploit information extracted from the Google Web 1T 5-gram data set. Experiments conducted on scanned images written in different languages showed a substantial improvement in the OCR error correction rate. As future development, the proposed algorithm is to be parallelized so as to support parallel and distributed computing architectures.

IR Apr 1, 2012
Semantic-Sensitive Web Information Retrieval Model for HTML Documents

Youssef Bassil, Paul Semaan

With the advent of the Internet, a new era of digital information exchange has begun. Currently, the Internet encompasses more than five billion online sites and this number is exponentially increasing every day. Fundamentally, Information Retrieval (IR) is the science and practice of storing documents and retrieving information from within these documents. Mathematically, IR systems are at their core based on a feature vector model coupled with a term weighting scheme that weights terms in a document according to their significance with respect to the context in which they appear. Practically, the Vector Space Model (VSM), Term Frequency (TF), and Inverse Document Frequency (IDF) are among the long-established techniques employed in mainstream IR systems. However, present IR models only target generic text documents, in that they do not consider specific file formats such as HTML web documents. This paper proposes a new semantic-sensitive web information retrieval model for HTML documents. It consists of a vector model called SWVM and a weighting scheme called BTF-IDF, designed specifically to support the indexing and retrieval of HTML web documents. The chief advantage of the proposed model is that it assigns extra weight to terms that appear in certain pre-specified HTML tags that are correlated to the semantics of the document. Additionally, the model is semantic-sensitive, as it generates synonyms for every term being indexed and later weights them appropriately to increase the likelihood of retrieving documents with similar context but different vocabulary terms. Experiments conducted revealed a marked improvement in the precision of web IR systems and a large increase in the number of relevant documents being retrieved. As further research, the proposed model is to be upgraded so as to support the indexing and retrieval of web images in multimedia-rich web documents.
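
The idea of giving extra weight to terms found in particular HTML tags can be sketched as below. The abstract does not state the BTF-IDF formula, so this is only an illustration under assumptions: it uses the standard `tf * log(N / df)` weighting, and the `TAG_BONUS` values and the `(tag, term)` input format are hypothetical. Synonym expansion is not covered.

```python
import math
from collections import Counter

# Hypothetical per-tag bonuses; the paper's actual tags and values may differ.
TAG_BONUS = {"title": 3.0, "h1": 2.0, "p": 1.0}

def weighted_tf(doc):
    """doc is a list of (tag, term) pairs; each occurrence counts by its tag bonus."""
    tf = Counter()
    for tag, term in doc:
        tf[term] += TAG_BONUS.get(tag, 1.0)
    return tf

def tag_weighted_tf_idf(docs):
    """Return one {term: weight} dict per document."""
    n = len(docs)
    tfs = [weighted_tf(d) for d in docs]
    # Document frequency: number of documents containing each term.
    df = Counter(term for tf in tfs for term in tf)
    return [{t: w * math.log(n / df[t]) for t, w in tf.items()} for tf in tfs]

docs = [
    [("title", "rover"), ("p", "rover"), ("p", "mars")],
    [("p", "mars"), ("p", "ocr")],
]
weights = tag_weighted_tf_idf(docs)
```

Here "rover" gets a weighted frequency of 4.0 in the first document (3.0 from the title plus 1.0 from the paragraph), while "mars" appears in every document and so receives zero weight from the IDF factor.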

RO Apr 1, 2012
Service-Oriented Architecture for Space Exploration Robotic Rover Systems

Youssef Bassil

Currently, industrial sectors are transforming their business processes into e-services and component-based architectures to build flexible, robust, and scalable systems, and to reduce integration-related maintenance and development costs. Robotics is yet another promising and fast-growing industry that deals with the creation of machines that operate autonomously and serve various applications including space exploration, weaponry, laboratory research, and manufacturing. In space exploration, the most common type of robot is the planetary rover, which moves across the surface of a planet and conducts a thorough geological study of the celestial surface. This type of rover system is still ad hoc in that it incorporates its software into its core hardware, making the whole system cohesive, tightly coupled, more susceptible to shortcomings, less flexible, hard to scale and maintain, and impossible to adapt to other purposes. This paper proposes a service-oriented architecture for space exploration robotic rover systems made out of loosely coupled and distributed web services. The proposed architecture consists of three elementary tiers: the client tier, which corresponds to the actual rover; the server tier, which corresponds to the web services; and the middleware tier, which corresponds to an Enterprise Service Bus that promotes interoperability between the interconnected entities. The distinguishing feature of this architecture is that the rover's software components are decoupled and isolated from the rover's body and possibly deployed at a distant location. A service-oriented architecture promotes integrability, scalability, reusability, maintainability, and interoperability for client-to-server communication.

CL Apr 1, 2012
Parallel Spell-Checking Algorithm Based on Yahoo! N-Grams Dataset

Youssef Bassil

Spell-checking is the process of detecting and sometimes providing suggestions for incorrectly spelled words in a text. Basically, the larger the dictionary of a spell-checker, the higher the error detection rate; otherwise, misspellings would pass undetected. Unfortunately, traditional dictionaries suffer from out-of-vocabulary and data sparseness problems, as they do not encompass the large vocabulary needed to cover proper names, domain-specific terms, technical jargon, special acronyms, and terminologies. As a result, spell-checkers incur low error detection and correction rates and fail to flag all errors in the text. This paper proposes a new parallel shared-memory spell-checking algorithm that uses rich real-world word statistics from the Yahoo! N-Grams Dataset to correct non-word and real-word errors in computer text. Essentially, the proposed algorithm can be divided into three sub-algorithms that run in parallel: the error detection algorithm that detects misspellings, the candidates generation algorithm that generates correction suggestions, and the error correction algorithm that performs contextual error correction. Experiments conducted on a set of text articles containing misspellings showed a remarkable spelling error correction rate that resulted in a large reduction of both non-word and real-word errors in electronic text. In a further study, the proposed algorithm is to be optimized for message-passing systems so as to become more flexible and less costly to scale over distributed machines.

NE Apr 1, 2012
Neural Network Model for Path-Planning of Robotic Rover Systems

Youssef Bassil

Today, robotics is a promising and fast-growing branch of technology that involves the manufacturing, design, and maintenance of robot machines that can operate autonomously and can be used in a wide variety of applications including space exploration, weaponry, household, and transportation. In space applications in particular, one type of robot has come into widespread use in recent years: the planetary rover, a robot vehicle that moves across the surface of a planet and conducts detailed geological studies pertaining to the properties of the landing environment. However, rovers are always impeded by obstacles along the traveling path, which can destabilize the rover's body and prevent it from reaching its goal destination. This paper proposes an ANN model that allows rover systems to carry out autonomous path-planning to successfully navigate through challenging planetary terrains and reach their goal location while avoiding dangerous obstacles. The proposed ANN is a multilayer network made out of three layers: an input, a hidden, and an output layer. The network is trained in offline mode using the back-propagation supervised learning algorithm. Experiments with a software-simulated rover showed that it was able to follow the safest trajectory despite existing obstacles. As future work, the proposed ANN is to be parallelized so as to speed up the execution time of the training process.
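
A three-layer network trained offline with back-propagation, as described in this abstract, can be sketched in plain Python. This is a generic illustration and not the paper's network: the class name `TinyMLP`, the two binary obstacle sensors used as inputs, the single steering output, the layer sizes, and the learning rate are all assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyMLP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row has one extra entry for the bias.
        self.w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                    for _ in range(n_hidden)]
        self.w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in self.w_h]
        hb = h + [1.0]
        y = sigmoid(sum(w * v for w, v in zip(self.w_o, hb)))
        return xb, hb, y

    def train_step(self, x, target, lr=0.5):
        xb, hb, y = self.forward(x)
        # Output error term for squared error with a sigmoid unit.
        delta_o = (y - target) * y * (1 - y)
        # Hidden error terms use the output weights before they are updated.
        delta_h = [delta_o * self.w_o[j] * hb[j] * (1 - hb[j])
                   for j in range(len(self.w_h))]
        self.w_o = [w - lr * delta_o * v for w, v in zip(self.w_o, hb)]
        for j, ws in enumerate(self.w_h):
            self.w_h[j] = [w - lr * delta_h[j] * v for w, v in zip(ws, xb)]

def train(net, data, epochs=2000, lr=0.5):
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)

def mse(net, data):
    return sum((net.forward(x)[2] - t) ** 2 for x, t in data) / len(data)

# Hypothetical training set: (obstacle_left, obstacle_right) -> 1.0 means "steer right".
DATA = [((0, 0), 0.0), ((0, 1), 0.0), ((1, 0), 1.0), ((1, 1), 1.0)]
```

A typical use is to build `TinyMLP(2, 3)`, call `train(net, DATA)`, and compare `mse(net, DATA)` before and after; a real path planner would need far richer terrain inputs than the two sensors assumed here.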

IR Apr 1, 2012
Hybrid Information Retrieval Model For Web Images

Youssef Bassil

The Big Bang of the Internet in the early 1990s dramatically increased the number of images being distributed and shared over the web. As a result, image information retrieval systems were developed to index and retrieve image files spread over the Internet. Most of these systems are keyword-based and search for images based on their textual metadata; thus, they are imprecise, as describing an image in human language is inherently vague. Besides these, there are content-based image retrieval systems, which search for images based on their visual information. However, content-based systems are still immature and not very effective, as they suffer from low retrieval recall and precision. This paper proposes a new hybrid image information retrieval model for indexing and retrieving web images published in HTML documents. The distinguishing mark of the proposed model is that it is based on both graphical content and textual metadata. The graphical content is denoted by the color features and color histogram of the image, while the textual metadata are denoted by the terms that surround the image in the HTML document, more particularly the terms that appear in the tags p, h1, and h2, in addition to the terms that appear in the image's alt attribute, filename, and class-label. Moreover, this paper presents a new term weighting scheme called VTF-IDF, short for Variable Term Frequency-Inverse Document Frequency, which, unlike traditional schemes, exploits the HTML tag structure and assigns an extra bonus weight to terms that appear within particular HTML tags that are correlated to the semantics of the image. Experiments conducted to evaluate the proposed IR model showed a high retrieval precision rate that outpaced other current models.

AI Apr 1, 2012
Expert PC Troubleshooter With Fuzzy-Logic And Self-Learning Support

Youssef Bassil

Expert systems use human knowledge, often stored as rules within the computer, to solve problems that generally would entail human intelligence. Today, with information systems becoming more pervasive and with the myriad advances in information technologies, automating computer fault diagnosis is becoming so fundamental that soon every enterprise will have to adopt it. This paper proposes an expert system called Expert PC Troubleshooter for diagnosing computer problems. The system is composed of a user interface, a rule-base, an inference engine, and an expert interface. Additionally, the system features a fuzzy-logic module to troubleshoot POST beep errors, and an intelligent agent that assists in the knowledge acquisition process. The proposed system is meant to automate the maintenance, repair, and operations (MRO) process, and free up human technicians from manually performing routine, laborious, and time-consuming maintenance tasks. As future work, the proposed system is to be parallelized so as to boost its performance and speed up its various operations.

RO Apr 1, 2012
Service-Oriented Architecture for Weaponry and Battle Command and Control Systems in Warfighting

Youssef Bassil

The military is one of many industries that are more computer-dependent than ever before, from soldiers with computerized weapons and tactical wireless devices to commanders with advanced battle management and command and control systems. Fundamentally, command and control is the process of planning, monitoring, and commanding military personnel, weaponry equipment, and combat vehicles to execute military missions. In fact, command and control systems are being revolutionized as warfighting shifts toward cyber, technology, information, and unmanned warfare. As a result, a new design model that supports scalability, reusability, maintainability, survivability, and interoperability is needed to allow commanders, hundreds of miles away from the battlefield, to plan, monitor, evaluate, and control war events in a dynamic, robust, agile, and reliable manner. This paper proposes a service-oriented architecture for weaponry and battle command and control systems, made out of loosely coupled and distributed web services. The proposed architecture consists of three elementary tiers: the client tier, which corresponds to any computing military equipment; the server tier, which corresponds to the web services that deliver the basic functionalities for the client tier; and the middleware tier, which corresponds to an enterprise service bus that promotes interoperability between all the interconnected entities. A simulated command and control system was tested and successfully exhibited the desired features of SOA. Future research can improve upon the proposed architecture so that it supports encryption for securing the exchange of data between the various communicating entities of the system.

SE Mar 24, 2012
Distributed, Cross-Platform, and Regression Testing Architecture for Service-Oriented Architecture

Youssef Bassil

According to leading IT experts, today's large enterprises are going through business transformations. They are adopting service-based IT models such as SOA to develop their enterprise information systems and applications. In fact, SOA is an integration of loosely coupled interoperable components, possibly built using heterogeneous software technologies and hardware platforms. As a result, traditional testing architectures are no longer adequate for verifying and validating the quality of SOA systems and whether they are operating to specifications. This paper first discusses the various state-of-the-art methods for testing SOA applications, and then proposes a novel automated, distributed, cross-platform, and regression testing architecture for SOA systems. The proposed testing architecture consists of several testing units, which include test engine, test code generator, test case generator, test executer, and test monitor units. Experiments conducted showed that the proposed testing architecture managed to use parallel agents to test heterogeneous web services whose technologies were incompatible with the testing framework. As future work, the testing of non-functional aspects of SOA applications is to be investigated so as to allow the testing of such properties as performance, security, availability, and scalability.

CL Mar 23, 2012
ASR Context-Sensitive Error Correction Based on Microsoft N-Gram Dataset

Youssef Bassil, Paul Semaan

At the present time, computers are employed to solve complex tasks and problems ranging from simple calculations to intensive digital image processing and from intricate algorithmic optimization problems to computationally demanding weather forecasting problems. ASR, short for Automatic Speech Recognition, is yet another type of computational problem whose purpose is to recognize human speech and convert it into text that can be processed by a computer. Although ASR has many versatile and pervasive real-world applications, it is still relatively erroneous and not perfectly solved, as it is prone to produce spelling errors in the recognized text, especially if the ASR system is operating in a noisy environment, its vocabulary size is limited, or its input speech is of low quality. This paper proposes a post-editing ASR error correction method based on the Microsoft N-Gram dataset for detecting and correcting spelling errors generated by ASR systems. The proposed method comprises an error detection algorithm for detecting word errors; a candidate corrections generation algorithm for generating correction suggestions for the detected word errors; and a context-sensitive error correction algorithm for selecting the best candidate for correction. The virtue of using the Microsoft N-Gram dataset is that it contains real-world data and word sequences extracted from the web, which can mimic a comprehensive dictionary of words having a large and all-inclusive vocabulary. Experiments conducted on numerous speeches, performed by different speakers, showed a remarkable reduction in ASR errors. Future research can improve upon the proposed algorithm so that it can be parallelized to take advantage of multiprocessor and distributed systems.

CL Mar 23, 2012
Post-Editing Error Correction Algorithm for Speech Recognition using Bing Spelling Suggestion

Youssef Bassil, Mohammad Alwani

ASR, short for Automatic Speech Recognition, is the process of converting spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise, especially if used in harsh surroundings wherein the input speech is of low quality. This paper proposes a post-editing ASR error correction method and algorithm based on Bing's online spelling suggestion. In this approach, the ASR recognized output text is spell-checked using Bing's spelling suggestion technology to detect and correct misrecognized words. More specifically, the proposed algorithm breaks down the ASR output text into several word-tokens that are submitted as search queries to the Bing search engine. A returned spelling suggestion implies that a query is misspelled, and thus it is replaced by the suggested correction; otherwise, no correction is performed and the algorithm continues with the next token until all tokens are validated. Experiments carried out on various speeches in different languages indicated a successful decrease in the number of ASR errors and an improvement in the overall error correction rate. Future research can improve upon the proposed algorithm so that it can be parallelized to take advantage of multiprocessor computers.
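
The token-by-token loop described in this abstract can be sketched as follows. The abstract does not say how Bing is called, so the lookup is passed in as a `suggest` callable; `post_edit`, `stub_suggest`, and the `STUB_SUGGESTIONS` table are hypothetical names and data used only to make the sketch runnable, and no real Bing API is invoked.

```python
def post_edit(text, suggest):
    """Replace each token for which `suggest` returns a correction.

    suggest(token) should return a corrected string, or None when the
    service offers no suggestion (the token is then kept unchanged).
    """
    corrected = []
    for token in text.split():
        suggestion = suggest(token)
        corrected.append(suggestion if suggestion else token)
    return " ".join(corrected)

# Hypothetical offline stand-in for the online spelling-suggestion service.
STUB_SUGGESTIONS = {"recgnize": "recognize", "speach": "speech"}

def stub_suggest(token):
    return STUB_SUGGESTIONS.get(token)

result = post_edit("recgnize speach now", stub_suggest)
```

Passing the lookup in as a function keeps the loop independent of any particular search service and makes it easy to test offline.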

SE Feb 11, 2012
Building sustainable ecosystem-oriented architectures

Youssef Bassil

Currently, organizations are transforming their business processes into e-services and service-oriented architectures to improve coordination across sales, marketing, and partner channels, to build flexible and scalable systems, and to reduce integration-related maintenance and development costs. However, this new paradigm is still fragile and lacks many features crucial for building sustainable and progressive computing infrastructures able to rapidly respond and adapt to the always-changing market and business environment. This paper proposes a novel framework for building sustainable Ecosystem-Oriented Architectures (EOA) using e-service models. The backbone of this framework is an ecosystem layer comprising several computing units whose aim is to deliver universal interoperability, transparent communication, automated management, self-integration, self-adaptation, and security to all the interconnected services, components, and devices in the ecosystem. Overall, the proposed model seeks to deliver a comprehensive and generic sustainable business IT model for developing agile e-enterprises that constantly keep up with new business constraints, trends, and requirements. Future research can improve upon the proposed model so that it supports computational intelligence to help in decision making and problem solving.

SE Feb 11, 2012
Autonomic HTML interface generator for web applications

Youssef Bassil, Mohammad Alwani

Recent advances in computing systems have led to a new digital era in which nearly every area of life is interrelated with information technology. However, with the trend towards large-scale IT systems, a new challenge has emerged. The complexity of IT systems is becoming an obstacle that hampers the manageability, operability, and maintainability of modern computing infrastructures. Autonomic computing emerged to provide an answer to these ever-growing pitfalls. Fundamentally, autonomic systems are self-configuring, self-healing, self-optimizing, and self-protecting; hence, they can automate complex IT processes without human intervention. This paper proposes an autonomic HTML web-interface generator based on XML Schema and Style Sheet specifications for self-configuring graphical user interfaces of web applications. The goal of this autonomic generator is to automate the process of customizing GUI web-interfaces according to the ever-changing business rules, policies, and operating environment with the least IT labor involvement. The conducted experiments showed a successful automation of web interface customization that dynamically self-adapts to keep up with the always-changing business requirements. Future research can improve upon the proposed solution so that it supports the self-configuring of not only web applications but also desktop applications.