Tuesday, 09 August 2022

An example technical paper for you: Computing for Sustainable Global Development

 


2016 3rd International Conference on Computing for Sustainable Global Development

(INDIACom 2016)

New Delhi, India

16-18 March 2016

Pages 1-820

Volume 1/5

IEEE Catalog Number: CFP1683W-POD

ISBN: 978-1-4673-9417-8

 

Copyright © 2016, Bharati Vidyapeeth’s Institute of Computer Applications and

Management

All Rights Reserved

***This publication is a representation of what appears in the IEEE Digital Libraries. Some format issues inherent in the e-media version may also appear in this print version.

IEEE Catalog Number: CFP1683W-POD

ISBN (Print-On-Demand): 978-1-4673-9417-8

ISBN (Online): 978-9-3805-4421-2

Additional Copies of This Publication Are Available From:

Curran Associates, Inc.

57 Morehouse Lane

Red Hook, NY 12571 USA

Phone: (845) 758-0400

Fax: (845) 758-2633

E-mail: curran@proceedings.com

Web: www.proceedings.com

 

 

Table of Contents

Power Consumption Analysis Across Heterogeneous Data Center using Cloudsim 1

Pradeep Singh Rawat, Priti Dimri, G P Saroha and Varun Barthwal

Underlying Text Independent Speaker Recognition 6

Nilu Singh and R A Khan

Automatic Ration Distribution System-A Review 11

Swapnil R Kurkute, Chetan Medhe, Ashlesha Revgade and Ashwini Kshirsagar

Evaluation for POS Tagger, Chunk and Resolving Issues in Word Sense Disambiguate in Machine Translation for Hindi to

English Languages 14

Shachi Mall and Umesh Chandra Jaiswal

Virtual Calling Number for ESME 19

Raghavendra G and K Satyanarayan Reddy

Measuring Performance Outcome of e-Governance Projects through eTaal 23

Shefali Sushil Dash, I P S Sethi and O P Gupta

SCD: Secret Communication Devices (SCD to Control User’s Audio Noise in Confidential Manner) 28

Monika Jain, Dilip Sisodia and Anil Kumar Dubey

New Performance Analysis of AODV, DSDV and OLSR Routing Protocol for MANET 33

Neelu Kumari, Sandeep Kumar Gupta, Rajni Choudhary and Shubh Laxshmi Agrwal

A Novel Filtering Mechanism using Cascading Filters on 3D Images 36

Jasvinder Sadana and Navin Rajpal

Ant Colony Optimization based TDM for DUA 42

Shivendra Goel

Performance Improvement in AODV Routing Protocol with Artificial Intelligence 45

Amarjeet Singh, Tripatdeep Singh, Mohit Mittal and Krishan Kumar

Green Computing: A Greener Approach towards IT 50

Sharmistha Dutta and Ankit Kumar Gupta

A Study of Social Media Applications in Indian Tourism 54

V Senthil

A Comparative Analysis of Fuzzy based Ranking Functions for Information Retrieval 60

Yogesh Gupta, Ashish Saini and A K Saxena

Finding Research Groups using Modularity based Community Detection Algorithm 65

S Rao Chintalapudi and M H M Krishna Prasad

Designing of SPF based Secure Web Application using Reverse Engineering 70

Nitish Pathak, Girish Sharma and B M Singh

Modeling a SQL Injection Attack 77

Navdeep Kaur and Parminder Kaur


Automated Waterlogging Prevention System 83

Mitesh Vinayak Naik and Tanvi Santosh Javkar

Traffic Signal Preemption (TSP) System by Ordinary Vehicles in Case of Emergency based on Internet of Things

Ecosystem 85

Prabhanshu Attri, Fatima Rafiqui and Neha Rawal

ICT Based Communication Systems as Enabler for Technology Transfer 90

Anil K Saini and Vijay Kumar Khurana

Some Observations on Migration from Data Mining to Web Mining 100

Satyaveer Singh and Mahendra Singh Aswal

Analysis of Security Algorithms in Cloud Computing 106

Tanvi Agrawal and S K Singh

Enhancing e-Learning through Data Mining in the Context of Education Data 109

Lovkesh

Adaptability in Constant Modulus Algorithm and Optimization for Smart Antenna Systems 114

A Udawat, P C Sharma and S Katiyal

Design of Gain Enhanced Stacked Rectangular Dielectric Resonator Antenna for C-Band Applications 119

Abhishek Kumar Gautam and Mahender Singh

Design of Rectangular Dielectric Resonator Antenna using Offset Micro-Strip Feed for Satellite Application 124

Mahender Singh, Abhishek Kumar Gautam, Deep Shikha Gautam and Arti Vaish

Identifying Moving Objects in a Video using Modified Background Subtraction and Optical Flow Method 129

Sumati Manchanda and Shanu Sharma

Identification of Parameters of Digital IIR Filters using Teaching-Learning Optimization Algorithm and Statistical

Inference Comparison with Particle Swarm Optimization Algorithms 134

S Sharma, S Katiyal and L D Arya

Discrete Frailty Models 140

S G Parekh, D K Ghosh, S R Patel and D P Raykundaliya

Proximal Privacy Preserving Linear Classification of Horizontally and Checkerboard Partitioned Data 144

Aparna Mehra, Anu G Aggarwal and Deepti Chadha Singhal

Affective Computing: Emotion Sensing using 3D Images 150

Madhusudan and Aman Kumar Sharma

Beyond M-Score for Detection of Misusability by Malicious Insiders 155

J Pradeep Kumar, A Udaya Kumar and T Ravi

Survey on University Timetabling Problem 160

Jaya Pandey and A K Sharma

Usability of KSOM and Classical Set in Information Retrieval 165

Mukul Aggarwal and Amod Kumar Tiwari

Job Shop Scheduling Algorithms - A Shift from Traditional Techniques to Non-Traditional Techniques 169

Meenu Dave and Kirti Choudhary


Minimum Configuration MLP for Solving XOR Problem 174

Vaibhav Kant Singh and Shweta Pandey

Different Approaches to Convert Speech into Sign Language 180

Shahnaj Fatima, Ambuj Agarwal and Pooja Gupta

PAID: Predictive Agriculture Analysis of Data Integration in India 184

Purva Grover and Rahul Johari

Effect of Population and Bit Size on Optimization of Function by Genetic Algorithm 189

N K Jain, Uma Nangia and Jyoti Jain

Design and Advances of Cylindrical Dielectric Resonator Antenna - A Review 195

Simranjit Singh, Prabhjot Singh and Mahender Singh

Graph Pattern Matching: A Brief Survey of Challenges and Research Directions 199

Komal Singh and Vikram Singh

Microchips a Leading Innovation in Medicine 205

Shruti Shukla, Ambuj Kumar Agarwal and Ashish Lakhmani

Is Sanskrit the Most Suitable Language for Natural Language Processing? 211

Rashmi Jha, Deeptanshu Jha, Aman Jha and Sonika Jha

A Novel Compact Slotted Microstrip Patch Dual-Band Antenna with Rectangular Slotted Substrate and Symmetrical W 

Shaped Slotted Ground Having Band-Notching Characteristics for UWB Application 217

Somesh Sharma

A Study on Video Surveillance System for Object Detection and Tracking 221

Pawan Kumar Mishra and G P Saroha

"Face Detection Through LSE Devoid of Re-initiation via RD" 227

Tanvi Dhingra and Virendra P Vishwakarma

Analytical Study of IoT as Emerging Need of the Modern Era 233

Sambhav Gupta, Nishant Mudgal and Rishabh Mehta

Analysis of Astrology and Scientific Calculation through Orbital Period 236

Harsh Sharma, Naveen Rao and Mohit Sharma

Ultra Sapient Inference Engine 240

Ashish Sharma and Nausheen Khilji

A Review on Improving Recommendation Quality by using Relevant Contextual Information 244

Sanjay Kumar Dwivedi and Bhupesh Rawat

Identification of Scopes for Applicability of Grammatical Inferences for Mobile Network Traffic Prediction 249

M Amarendhar Reddy, Rajeshwar Rao Kodipaka and B Venkata Ramudu

A Survey Paper on Workload Prediction Requirements of Cloud Computing 254

Supreet Kaur Sahi and V S Dhaka

Segmentation on Moving Shadow Detection and Removal by Symlet Transform for Vehicle Detection 259

Cheruku Sandesh Kumar, Ratnadeep Roy, Archek Praveen Kumar, Ashwani Kumar Yadav and Mayank Gupta


Speech Recognition using Arithmetic Coding and MFCC for Telugu Language 265

Archek Praveen Kumar, Neeraj Kumar, Cheruku Sandesh Kumar, Ashwani Kumar Yadav and Abhay Sharma

A Stacked Sparse Autoencoder based Architecture for Punjabi and English Spoken Language Classification using MFCC

Features 269

Vaibhav Arora, Pulkit Sood and Kumar Utkarsh Keshari

A Generic Tool to Process MongoDB or Cassandra Dataset using Hadoop Streaming 273

Chandangole Gopal R and Tidke Bharat A

Web Performance Optimization through Smart Resource and Asset Optimizations 277

Shailesh K S and P V Suresh

An Implementation Approach of Big Data Computation by Mapping Java Classes to MapReduce 282

Chitresh Verma and Rajiv Pandey

An Analysis of Images using Fuzzy Contrast Enhancement Techniques 288

Pushpa Mamoria and Deepa Raj

Auto-Segmentation using Mean-Shift and Entropy Analysis 292

Seba Susan and Ankit Kumar

Performance of Wavelet based Image Compression on Medical Images for Cloud Computing 297

D Ravichandran, Ramesh Nimmatoori and Ashwin Dhivakar M R

Versatility Exploration of FPGAs by Fabrication of Variegated Logic Designs 303

Mitushi Jain, S Banerjee, M Mahawar, S Sinha, Sukriti and S Singhal

Development of a Flower Pollination Algorithm Toolkit in Labview 309

Navneet Kaur Bhatia, Vineet Kumar, K P S Rana, Pradeep Gupta and Puneet Mishra

Developing Smart Cities using Internet of Things: An Empirical Study 315

Gaurav Sarin

Illumination Variation for Object using Sparse Non Negative Matrix Factorization 321

Smita D Khandagale and Gargi Sameer Phadke

Fuzzy Logic Controller with SVC to Improve Transient Stability for a Grid Connected Distributed Generation System 324

Poonam Suraj and Sunita Chauhan

Modeling, Design and Analysis of High CMRR Two Stage Gate Driven Operational Transconductance Amplifier using

0.18 μm CMOS Technology 329

Siddesh Gaonkar and Sushma P S

Identification of Tigers (Panthera tigris) through their Pugmark using Pattern Recognition using Android Phone

Application 335

Rushikesh Dhande and Vijay Gulhane

Efficient Pre-Processing for Enhanced Semantics based Distributed Document Clustering 338

Neepa Shah and Sunita Mahajan

Data Cleaning for Data Quality 344

S Swapna, P Niranjan, B Srinivas and R Swapna


Parametric Analysis of Cellular Network Design in Warfare Situation 349

Neha Aggarwal, Shalini Bahel, Teglovy Singh and Rajan Vohra

Performance Enhancement and Analysis of Privacy Preservation using Slicing Approach over HADOOP 353

Abhang Vikram Kishor and Sable Balasaheb Shrimantrao

Graphene in the Core of Optical Fibers 358

D K Das and S Sahoo

Inclination from Conventional to Contemporary Image Alignment Techniques in Remote Sensing 362

B Sirisha, P Chandra Sekhar, A S C Sastry and B Sandhya

A Novel Energy Efficiency Protocol for WSN based on Optimal Chain Routing 368

Arun Agarwal, Khushboo Gupta and K P Yadav

A Theoretical Study on Classifier Ensemble Methods and its Applications 374

Nazia Tabassum and Tanvir Ahmed

Dynamic Selection of Job Scheduling Policies for Performance Improvement in Cloud Computing 379

Vinay Chavan, Kishore Dhole and Parag Ravikant Kaveri

Pipelined Processor for Image Compression through Burrows - Wheeler Transform 383

Chirag Babulal Bafna, Ankit Berde, Karan Jashnani, Monali Chaudhari and Simeran Sharma

UOCR: A Ligature based Approach for an Urdu OCR System 388

Tofik Ali, Tauseef Ahmad and Mohd. Imran

Prototype for Localization of Multiple Fire Detecting Mobile Robots in a Dynamic Environment 395

Sanober Farheen Memon, Imtiaz H Kalwar, Ian Grout, Elfed Lewis and Yasmeen Naz Panhwar

Extracting Brief Note from Internet Newspaper 401

Suraj B Karale and G A Patil

A Framework of Lasers for Optical interconnects 407

Sandeep Dahiya, Suresh Kumar and B K Kaushik

Data Retrieval Mechanism using Amazon Simple Storage Service and Windows Azure 412

P Subhashini and Sindhura Nalla

Priority Aware Longest Job First (PA-LJF) Algorithm for Utilization of the Resource in Cloud Environment 415

Mohit Kumar and Subhash Chander Sharma

Optimization of Cost using DE in VLSI Floorplanning 421

Shefali Gautam

Classification of Renal Diseases using First Order and Higher Order Statistics 425

Komal Sharma and Jitendra Virmani

Obstacle Detection for Vehicles in Intelligent Transport System 431

Sejal V Maru, Vidhi R Shah and Rutvij H Jhaveri

An Innovative Method of Acquiring Optimization for Image Retrieval via Dual Clustering Method based on Segmentation 436

R Tamilkodi, W Jaishri, G Rosline Nesa Kumari and S Maruthuperumal


Provisioning Resources Elastically in IoT Cloud Systems 441

Parth S Panchmatia and Kailas K Devadkar

Providing Context-Aware Healthcare Services using Circular Geofencing Technique 446

Sushama Rani Dutta and Monideepa Roy

Opinion Mining and Sentiment Analysis 452

Rushlene Kaur Bakshi, Ravneet Kaur, Navneet Kaur and Gurpreet Kaur

Natural Language Parsing: using Finite State Automata 456

Rachana Rangra and Madhusudan

A Practical Approach to Overcome Glitches in Achieving High Performance Computing 464

Shaik Khaja Mohiddin, Suresh Babu Yalavarthi and D V Chandra Shekar

Mitigate Black Hole Attack using Trust with AODV in MANET 470

V Keerthika and N Malarvizhi

Improved Real-Time Energy Aware Parallel Task Scheduling in a Cluster 475

Apoorva Dobhal and Ranvijay

Distributed Data Association Rule Mining: Tools and Techniques 481

Manoj Sethi and Rajni Jindal

A New Approach to ANFIS Modeling using Kernel based FCM Clustering 486

Sharifa Rajab and Vinod Sharma

A Secure and High Capacity Image Hiding Scheme using DWT and Arithmetic Coding 492

Janki Jasani and Sarita Visavalia

Road Accidents: Overview of its Causes, Avoidance Scheme and a New Proposed Technique for Avoidance 497

Nikhat Ikram and Shilpa Mahajan

Design & Fabrication of e-shaped Microstrip Patch Antenna for WLAN Application 500

Chandan and B S Rai

Comparative Analysis of Trust Establishment through Semantic and Non Semantic Models 504

Shally Garg and Suresh Kumar

Efficient e-Learning Management System through Web Socket 509

Shushant Arora, Jisha Maini, Priyanka Mallick, Poorva Goel and Rohit Rastogi

A Novel QoS-Aware Improved-Clustering-Heuristic for Wireless Sensor Networks 513

Manju, Satish Chand and Bizender Kumar

Designing of HB++ Protocol by using Hardware 519

Preeti and Sachin Kumar

Improvisation of Security in Image Steganography using DWT, Huffman Encoding & RC4 based LSB Embedding 523

Palak Mahajan and Heena Gupta

Insider and Flooding Attack in Cloud: A Discussion 530

Meenu Gupta, Rajeev Yadav and Gundeep Tanwar


A Survey on Security Attacks in Wireless Sensor Networks 536

Atul Gaware and S B Dhonde

Implementation of Automatic Meter Reading Technique using SYNC 2000 & SYNC 5000 in Indian Power System 540

Alok Jain and M K Verma

A Study of Quality of Service in Network Mobility (NeMo) using Qualnet 7.0 546

M Altamash Sheikh, Neeta Singh and Sumbul Afroz

A Novel Approach to Secure WEP by Introducing an Additional Layer over RC4 551

Ashish Garg

Privacy in Data Mining: A Review 556

Sharmistha Dutta and Ankit Kumar Gupta

Prevention Against DDOS Attack on Cloud Systems using Triple Filter: An Algorithmic Approach 560

Neeta Sharma, Mayank Singh and Anuranjan Misra

Privacy Preservation in Personalized Web Environment 566

Esmita Gupta and Deepali Vora

Design and Implementation of a Power and Speed Efficient Carry Select Adder on FPGA 571

Simarpreet Singh Chawla, Swapnil Aggarwal, Nidhi Goel and Mantek Singh Bhatia

Privacy Prevention of Sensitive Rules and Values using Perturbation Technique 577

Aparna Shinde, Khushboo Saxena, Amit Mishra and Shiv K Sahu

Performance Analysis of Characteristic Parameters for Coverage Model in Wireless Sensor Networks 582

Pooja Chaturvedi and A K Daniel

Security in RFID based Smart Retail System 587

Ravi V and Aparna R

Performance Analysis of Data Link Layer Protocols with a Special Emphasis on Improving the Performance of Stop-and 

Wait-ARQ 593

Arjun Malhotra and Khushboo Chitre

Power Management in WSN 598

Deepti Goyal and Sonal

Comprehensive Analysis of Spherical Wave Propagation in a Hilly Terrain using Uniform Theory of Diffraction 602

Diwaker Pant, Piyush Dhuliya and Priyanka Sharma

A New Chaotic-Primitive and its Application in Customizing AES for Lightweight Multimedia Encryption 607

Sakshi Dhall, Saibal K Pal and Kapil Sharma

A Retrospective Investigation of Message Authentication in Wireless Sensor Networks: A Review 613

Uma Meena and M K Jha

Design of Logical Scrambling and De-Scrambling System for High Speed Application 617

Dhiraj S Bhojane, Sneha S Oak and Atul S Joshi

Performance Investigation of Reactive Routing Protocol under Flooding Attack in MANET 623

Parveen Kakkar and Krishan Saluja


A KNN-ACO Approach for Intrusion Detection using KDDCUP’99 Dataset 628

Sakchi Jaiswal, Khushboo Saxena, Amit Mishra and Shiv Kumar Sahu

The Helping Protocol "DHCP" 634

Abhijeet Kumar Rajput

Zero-Knowledge Proofs Technique using Integer Factorization for Analyzing Robustness in Cryptography 638

Chitranjan Prasad Sah, Kanhaiya Jha and Sushil Nepal

A Hybrid Technique for Spatial Image Steganography 643

Shreya Gupta, Akshay Kalra and Charru Hasti

Attack Graphs for Defending Cyber Assets 648

Yogesh Chandra, Pallaw Kumar Mishra and Chaman Prakash Arya

Civic Body Information Framework for Deploying M Governance 654

Mayuresh N Mahajan, Tushar M Desai and Mandar S Bhave

Secure Rule Mining Techniques with Wireless and Wired Distributed Medium 657

Shravan Kumar, Jawahar Kumar, Raghvendra Kumar and Priyanka Pandey

Prediction Analysis of Delay in Transferring the Packets in Adhoc Networks 660

Harshita Tuli and Sanjay Kumar

Exploration of Efficient Symmetric Algorithms 663

Shivlal Mewada, Pradeep Sharma and S S Gautam

Credit Card Fraud Detection at Merchant SIDE using Neural Networks 667

Aman Srivastava, Mugdha Yadav, Sandipani Basu, Shubham Salunkhe, and Muzaffar Shabad

Broadcasting Techniques for Route Discovery in Mobile Adhoc Network - A Survey 671

Panthini V Patel and Bintu Kadhiwala

Design of Penta Band T Shaped Fractal Patch Antenna for Wireless Application 675

Manpreet Kaur and A P Deepinder Singh

An Efficient Signcryption Algorithm using Bilinear Mapping 680

Vandani Verma and Deepika Gupta

A Survey on Congestion Control Mechanism in Multi-hop Wireless Network 683

Saumya Yadav and Dayashankar Singh

A Dive into Web Scraper World 689

Deepak Kumar Mahto and Lisha Singh

Qualitative Assessment of Authentication Measures 694

Amanpreet Kaur and K Mustafa

Optimized Study of Polarization Technique Imaging Sensor 699

Dimple Chawla and Lipi Passi

Enhanced Secure Image Steganography using Double Encryption Algorithms 705

Y Manjula and K B Shivakumar


Matching of Video Objects Taken from Different Camera Views by using Multi-Feature Fusion and Evolutionary

Learning Methods 709

S R Kharabe and B Raghu

Resource Utilization based Congestion Control for Wireless Sensor Network: A Review 715

Sushma N Rana and Pariza Kamboj

Steganography of the Keys into an Encrypted Speech Signal using Matlab 721

Divya Sharma and Deepshikha Sharma

Cryptography by Reversal of Speech Signal Elements and Implementing Checksum 725

Divya Sharma

Accessing Risk Priority of SSL SYN Attack using Game Theoretic Attack Defense Tree Model for VANETs 729

Sahil Garg and Gagangeet Singh Aujla

Open PGP based Secure Web Email 735

Rakesh Shukla, Hari Om Prakash and R Phanibhusan

Design of Low Profile and Broadband Microstrip Monopolar Patch Antenna 739

Chetan A Mishra, Uday Pandit Khot and Kochar Inderkumar

A Review of Efficient Data Utilization Schemes in Cloud Computing 743

Ankita Kaushik and Amit Chaturvedi

Risk based Quantitative Analysis of SQLIA on Web Application Database 748

Chandershekhar Sharma, S C Jain and Arvind K Sharma

Real-Time Intrusion Detection with Genetic, Fuzzy, Pattern Matching Algorithm 753

Priya Uttam Kadam and Manjusha Deshmukh

A Survey on Clustering Algorithms for Energy Efficiency in Wireless Sensor Network 759

Tuba Firdaus and M Hasan

An Improved Online Plagiarism Detection Approach for Semantic Analysis using Custom Search Engine 764

Kamalpreet Sharma and Balkrishan Jindal

An Overview of Social Semantic Web Framework 769

Usha Yadav, Gagandeep Singh Narula, Neelam Duhan and B K Murthy

Homomorphic Encryption Over Integers 774

Rachna Jain, Sushila Madan and Bindu Garg

The Next Generation Internet of Things 779

Shalini Vermani

Performance Analysis of Approaches for Coverage Issues in WSN 782

Meenakshi Yadav and A K Daniel

Wireless Communication-Moving from RF to Optical 788

Mrinmoyee Mukherjee

A Research on Point to Point QoS in 3G using OPNET Modeler 796

Charu Rawal and Rajeev Gupta


Secure IoT Architecture for Integrated Smart Services Environment 800

Vimal Jerald A, Albert Rabara S and Daisy Premila Bai

A Channel Utilization Scheme for IEEE 802.15.4 Working on ISM Band under IEEE 802.11 Interference 806

Manish Kumar Giri and Girish Tiwari

"Biometrics - Iris Recognition System" A Study of Promising Approaches for Secured Authentication 811 Sarika B Solanke and Ratnadeep R Deshmukh

Intrusion Detection and Prevention System using K-Learning Classification in Cloud 815

Sunita Kumawat, Anil Kumar Sharma and Anjali Kumawat

Internet of Things (IoT) Security 821

Shivaji Kulkarni, Shrihari Durg and Nalini Iyer

Automatic Dynamic Malware Analysis Techniques for Linux Environment 825

Gaurav Damri and Deepti Vidyarthi

Survey on Bias Minimization and Application Performance Maximization using Trust Management in Mobile Adhoc

Networks 831

Pradnya M Nanaware, Sachin D Babar and Ajay Sinha

Acknowledgement based Approaches for Detecting Routing Misbehaviour in MANETs 835

Aditya Bansal, Aishwarey Varshney, Raghav Matta and Ashish Khanna

Tie Strength Prediction in OSN 841

Pratima and Rishabh Kaushal

Internet Gateway Discovery Approaches in Multihop Wireless Networks 845

Ravi Sharma, Rakesh Kumar and Jay Prakash

A Recent Trends in Software Defined Networking (SDN) Security 851

Mudit Saxena and Rakesh Kumar

Problems Issues in the Information Security due to the Manual Mistakes 856

Mehtab Mehdi, Malik Rababah and M K Sharma

Biometric based Key-generation System for Multimedia Data Security 864

Indu Verma and Sanjay Jain

An Overview of Healthcare Perspective based Security Issues in Wireless Sensor Networks 870 Mamta and Shiva Prakash

Performance Evaluation of the Multi-slotted Micro-machined Patch Antenna 876

Rajat Arora, Shashi B Rana, Sandeep Arya and Saleem Khan

Secure Routing in MANETs using Three Reliable Matrices 880

Gopal Singh, Rahul Rishi and Harish Rohil

New Approach for Highly Secured I/O Transfer with Data on Timer Streaming 885

Suja G J and Sangeetha Jose

Cloud Computing Service Models: A Comparative Study 890

Mohammad Ubaidullah Bokhari, Qahtan Makki Shallal and Yahya Kord Tamandani


Security and Privacy Issues in Cloud Computing 896

Mohammad Ubaidullah Bokhari, Qahtan Makki Shallal and Yahya Kord Tamandani

Optimal Sensing Simulation in CRNs under Shadow-Fading Environments 901

Inderdeep Kaur Aulakh and Navpreet Kaur

Detection Techniques for Deadlock in Distributed Database System 906

Mamta Yadav and Udai Shanker

Mutual Exclusion and its Variants Survey in MANETs 912

Nikhil Gupta, Ashish Khanna, Sagar Gupta and Prashant Aggarwal

Self-Organized Routing for Radial Underwater Networks 918

Waheeduddin Hyder, Javier Poncela and Pablo Otero

An Improvement Over Kalman Filter for GPS Tracking 923

Ajit Singh and Sonal

Ontology-Driven Modeling for the Culturally-Sensitive Curriculum Development: A Case Study in the Context of

Vocational ICT Education in Afghanistan 928

Mohammad Hadi Hedayati and Laanpere Mart

A Comparison of Image Encryption Techniques based on Chaotic Maps 933

Ritesh Bansal, Rashmi Chawla and Shailender Gupta

Modified Play-fair Encryption Method using Quantum Concept 939

Reena Singh, Shaurya Taneja and Kavneet Kaur

Novel Simulation of Computer Memory Management 945

Ritika Wason, Surbhi Dagaur, Ayush Prashar, Chahat Bansal, Manjot Kaur and Smahi Maini

Compact Slotted Meandered PIFA Versus Conventional PIFA Antenna for DCS, GPS, Bluetooth/WLAN, 4G LTE,

WiMax, UMTS and GLONASS Applications 951

Akhilesh Verma and Anamika Chauhan

Hybrid Two-Tier Framework for Improved Security in Cloud Environment 955

Aarti Singh and Manisha Malhotra

Code Optimization as a Tool for Testing Software 961

Manpreet Singh Bajwa, Arun Prakash Aggarwal and Nitika Gupta

Efficient Applications for Mathematical Resources on Web 968

Sharaf Hussain, Samita Bai and Shakeel Khoja

Different Types of SRAM Chips for Power Reduction: A Survey 974

Vaishnavi Agey and D K Shedge

The Speaking Compiler - A Compiler with Audio for Immediate Error Correction 980

A Yaganteeswarudu

Implementing User based Collaborative Filtering to Build a Generic Product Recommender using Apache Mahout 984

Umar Farooque

A Survey on Image Processing Techniques for Tumor Detection in Mammograms 988

Amit Verma and Gayatri Khanna


Comparative Analysis of Software Engineering Paradigms 994

Amit Verma, Iqbaldeep Kaur and Namita Arora

Quantitative Evaluation of Software Architecture 1000

Chandni Ahuja, Parminder Kaur and Hardeep Singh

A Revolutionary Approach of Data Replication in Mining Methodology 1007

Anshi Gupta, Neha Goel and Shailendra Narayan Singh

Software Design Pattern Approach to Develop Login Framework 1013

Keval A Pithva and Ravirajsinh K Vaghela

An Insight to Multi Tasking in Cognitive Robotics 1018

Ranjan Kumar Panda and Shobhit Saxena

Optimization and Appraisal of PSO for CBS using CBSE Metrics 1024

Chander Diwaker and Pradeep Tomar

Bandwidth Enhancement and Modification of Single Band Patch Antenna into Double Band 1029

Ranjeet Pratap Singh Bhadoriya and Sumit Nigam

Test Case Prioritization Technique based on Early Fault Detection using Fuzzy Logic 1033

Dharmveer Kumar Yadav and Sandip Dutta

Revisiting Software Reliability Engineering with Fuzzy Techniques 1037

Syed Wajahat Abbas Rizvi, Vivek Kumar Singh and Raees Ahmad Khan

Analysis of Website Usability Evaluation Methods 1043

Sukhpuneet Kaur, Kulwant Kaur and Parminder Kaur

Comparative Analysis of Open Source ERP Softwares for Small and Medium Enterprises 1047

Swati Bajaj and Sanjay Ojha

Tuning of Software Cost Drivers using BAT Algorithm 1051

Sanchi Girotra and Kapil Sharma

Cloud Era in Mobile Application Testing 1057

Anureet Kaur and Kulwant Kaur

Defining Parameters for Examining Effectiveness of Genetic Algorithm for Optimization Problems 1061

Rugved V Deolekar

Automated System for Code Generation from Unstructured Algorithm 1065

Naitik Chetan Soni, Dhruv Ashok Pawar, Namita Sandeep Tambe and Rugved Vivek Deolekar

Analysis of Intelligence Techniques on Sensorless Speed Control of Doubly Fed Induction Machine (DFIM) 1071

Arunesh Kumar Singh, Abhinav Saxena, Abhilasha Pawar and Sangeeta Singh

Quality Model for Component-Based e-Commerce Application 1075

Keshav Agarwal, Shardul Kaushik, Chhaviraj, Akansh Gulati and Kavita Sheoran

Enhanced LZW Technique for Medical Image Compression 1080

Sadhana Singh and Preeti Pandey


Facial Features: A Study for Pain Expression Recognition 1085

Renu Taneja, Anshul Goel and Megha Kumari

Comparative Analysis of Machine Learning Algorithms in OCR 1089

Arun Kumar Dubey, Amit Gupta and Sanchit Sharma

Implementation of Stateful Firewall using POX Controller 1093

Vipin Gupta, Sukhveer Kaur and Karamjeet Kaur

Synchronization Software Tool for Effective Data Mining 1097

Devendra Rao, Praveen Kr Srivastava, M Balasubramaniam and Vikas Bansal

A Measure for Modelling Non-Functional Requirements using Extended Use Case 1101

Harsimran Kaur and Ashish Sharma

Security Framework based on QoS and Networking for Service Oriented Architecture 1106

Abdul Muttalib Khan, Alankar Mishra and Riya Agarwal

Evolution of the Web and e-learning Application 1110

Prajakta Tambe and Deepali Vora

A Novel Expression for BER under Composite Fading for Nano Communication System 1114

Amit Kumar and S Pratap Singh

Comparative Analysis of Sensor-less Speed Control of Three Phase Induction Motor 1118

Abhinav Saxena, Divyang Singhal, Shipra Singh, Dushyant Hari Sharma and Priyal Gupta

A Novel Approach for Selecting an Effective Regression Testing Technique 1122

Priyanka, Harish Kumar and Naresh Chauhan

A Secure Approach of Image Encryption using QR Code on Social Media 1126

Amit Yadav, Surendra Yadav and Brahmdutt Bohra

EEG based Vowel Classification During Speech Imagery 1130

Basil M Idrees and Omar Farooq

Identification of Quality Parameters Associated with 3Vs of Big Data 1135

Ankur Aggarwal

Properties and Interaction of Object Oriented Software Agent with System 1141

Anand Kumar Pandey, A K Vasishtha and A S Saxena

Attributes Identification for Test Case Reusability in Regression Test Case Selection Techniques 1144

Priyanka Dhareula and Anita Ganpati

Frequency Regulation in a Standalone Wind-Diesel Hybrid Power System using Pitch-Angle Controller 1148

Om Krishan and Sathans

Review of Web Document Clustering Algorithms 1153

Sanjib Kumar Sahu and Shalini Srivastava

Data Trait Outflow in Cloud Computing 1156

Sakshi Jolly and Neha Gupta


Performance Analysis of Image Segmentation using Watershed Algorithm, Fuzzy C Means of Clustering Algorithm and

Simulink Design 1160

Nilesh B Bahadure, Arun Kumar Ray and H Pal Thethi

Customer Behavior Patterns Analysis in Indian Mobile Telecommunications Industry 1165

Renuka Mahajan and Subhranil Som

Network Programmability using Software Defined Networking 1170

Vipin Gupta, Karamjeet Kaur and Sukhveer Kaur

Optimize the Software Testing Efficiency using Genetic Algorithm and Mutation Analysis 1174

Rijwan Khan and Mohd Amjad

Current State of the Research in Agile Quality Development 1177

Parita Jain, Laxmi Ahuja and Arun Sharma

Harmonic Diagnosis of Cascaded Multilevel Inverters 1180

Vivek Sharma, Shobhit Garg and Mohit Sharma

FinFET-One Scale Up CMOS: Resolving Scaling Issues 1183

D Sudha, Ch Santhirani, Sreenivasa Rao Ijjada and Sushree Priyadarsinee

Comparative Analysis of VSI, CSI and ZSI-Fed Induction Motor Drive System 1188

Vivek Sharma, Shobhit Garg, Parvesh Saini and Bhawna Negi

Predictive Analysis for Monitoring Mobile Patients using Wearable Sensors 1192

Sourabh Dudakiya, Heren Galani, Deven Thanki, Amaad Shaikh and Suvarna Pawar

Comparative Analysis of Big Data Management for Social Networking Sites 1196

Purti Beri and Sanjay Ojha

Internet of Things - Architecture, Applications, Security and other Major Challenges 1201

Litun Patra and Udai Pratap Rao

Development of Power Supply for Atmospheric Pressure Plasma Jet at Room Temperature for Bio-medical Applications 1207

Sadhan Chandra Das, Abhijit Majumdar, Subroto Mukherjee, Sumant Katiyal and T Shripathi

Analysis of Software Fault Detection and Correction Process Models with Burr Type XII Testing-Effort 1210

Md Zafar Imam, Sarwar Sultan and N Ahmad

Provisioning of Medical Application Analysis in Cloud 1214

Ajanta Das, Patralika Mitra and Debajyoti Basu

Statistical Tuning of Cost-231 Hata Model at 1.8 GHz over Dense Urban Areas of Ghaziabad 1220

Ranjeeta Verma and Garima Saini

Data Mining based Decision Making: A Conceptual Model for Public Healthcare System 1226

Anand Sharma and Vibhakar Mansotra

Security Issues in Cloud Computing for Healthcare 1231

Repu Daman, Manish M Tripathi and Saroj K Mishra

Implementation of Custom Exception and its Optimization in Java 1237

Anurag, Akanksha and Ankur Saxena


Virtual Instrumentation based Wireless Sensor Networks in Smart Grid Scenario 1243

Tabia Ahmad and Mohd Rihan

A Comparative Analysis of Effect of Different Domestic Light Sources on the Grid using Virtual Instrumentation 1249

Mohd Zuhaib, Zarin Akram and Mohd Rihan

Comparative Results of Zernike Moments and Pseudo-Zernike Moments 1254

Satish Kumar

A Comparative Study of Discriminative Approaches for Classifying Languages into Tonal and Non-tonal Categories at

Syllabic Level 1260

Biplav Choudhury, Chuya China Bhanja, Tameem Salman Choudhury, Aniket Pramanik and R H Laskar

Feature Fusion for Fake Indian Currency Detection 1265

Neeru Rathee, Arun Kadian, Rajat Sachdeva, Vijul Dalel and Yatin Jaie

Comparative Analysis of Agile Methods and Iterative Enhancement Model in Assessment of Software Maintenance 1271

Ruchika Malhotra and Anuradha Chug

Land Use and Land Cover Classification of LISS-III Satellite Image using KNN and Decision Tree 1277

Anand Upadhyay, Santosh Kumar Singh, Aditya Shetty and Zibreel Siddiqui

Personalized Hybrid Book Recommender System using Neural Network 1281

Ankit Kanojia, Om Prakash Verma and Hitesh Nirwan

Implementation of Cloud Computing and Big Data with Java based Web Application 1289

Ankur Saxena, Neeraj Kaushik, Nidhi Kaushik and Asit Dwivedi

YARN Versus MapReduce - A Comparative Study 1294

Sarah Shaikh and Deepali Vora

The Role of Verification and Validation in Software Testing 1298

Jogannagari Malla Reddy and S V A V Prasad

Adaptive e-Learning Systems for Visual and Verbal Learners 1302

Mahendra A Sethi, Santosh S Lomte and Ulhas B Shinde

Model for Heterogeneous Data Integration on Cloud 1307

Deepali Tripathi and N K Joshi

A Review of Supervised Machine Learning Algorithms 1310

Amanpreet Singh, Narina Thakur and Aakanksha Sharma

Content Variations in Video Streams 1316

Mohsina Ishrat and Pawanesh Abrol

An Approach for Multimedia Software Size Estimation 1321

Sushil Kumar and Rajiv Nag

Ideation Governance for Human-Centered Innovation in Information Systems 1327

Sisira and Heath Keighran

Natural Language to Ontology Chart 1333

Ahmed Barnawi, Georgios Tsaramirsis, Seyed M Buhari and Naseir A Shatwan Aserey


EEG-Based Emotion Recognition of Quran Listeners 1338

Anas Fattouh, Ibrahim Albidewi and Bader Baterfi

Towards Simulation of the Classroom Learning Experience: Virtual Reality Approach 1343

Georgios Tsaramirsis, Seyed M Buhari, Khalid Obaid AL-Shammari, Saud Ghazi, Mohd Saleem Nazmudeen and Kostandinos

Tsaramirsis

Word Prediction Algorithm in Resolving Ambiguity in Malay Text 1347

Siti Syakirah Sazali, Zainab Abu Bakar and Jafreezal Jaffar

Scrum to Manage Humans and Intelligent Systems 1353

Georgios Tsaramirsis and Abdullah Basahel

A Novel Approach for Precise Search Results Retrieval based on Semantic Web Technologies 1357 Usha Yadav, Gagandeep Singh Narula, Neelam Duhan and Vishal Jain

M-Learning Preferences and Learning Preferences 1363

Mazen Al Ismail, Tom Gedeon, Ramesh Sankaranarayana and Mohammad Yamin

Integrating Social Media and Mobile Apps into Hajj Management 1368

Abdullah H Almuhammadi, Hasan M Al-Ahmadi and Mohammad Yamin

Suitable Water Desalination Process Selection using AHP 1373

Morched Derbali, Anas Fattouh, Seyed M. Buhari, Georgios Tsaramirsis, Houssem Jerbi and Mohamed Naceur Abdelkrim

Big 5 Personality Traits Affect M-Learning Preferences in Different Contexts and Cultures 1378 Mazen Al Ismail, Tom Gedeon, Ramesh Sankaranarayana and Mohammad Yamin

Future of e-Business for Small and Medium Enterprises (SMEs) in Saudi Arabia 1383

Awad Saleh Alharbi

E-Training Technologies Application and their Impact on Performance Development 1390 Kamel Khoualdi and Muaz Al Ahmadi

Feasibility Analysis for Popularity Prediction of Stack Exchange Posts based on its Initial Content 1397

Devaraj Phukan and Aayush Kumar Singha

A Study of Keeping Low Cost in Sensors and Controller Implementations for Daily Activities 1403

Dimitrios Piromalis, Michail Papoutsidakis and Georgios Tsaramirsis

Using of Intuitionistic Fuzzy Database for Medical Diagnosis 1408

P K Sharma, Shakil A Habib and Mohammad Yamin

Smart Parking: An IoT Application for Smart City 1412

Ioannis Karamitsos, Georgios Tsaramirsis and Charalampos Apostolopoulos

A Smart Fusion Framework for Multimodal Object, Activity and Event Detection 1417

Girija Chetty and Mohammad Yamin

Electricity Measuring Device based on Internet of Things 1423

Yogesh Prabhakar Pingle, Sayli Dalvi, Sailee Chaudhari and Pooja Bhatkar

Performance of Hybrid Medium Access Control Scheme for Large Machine to Machine Networks: A Review 1427

Akshat Mokashi and B M Patil


Health Care Application for Android Smartphones using Internet of Things (IoT) 1430

Rahul Girish Sharma, Parth Ajay Soni, Keyur Rashmikant Shah and Bhavesh Narayanbhai Panchal

Human Detection by Measuring Its Distance Based on IoT 1434

Yogesh Pingle, Neha C Singh, Vaishali Shirsath, Tejaswini Ogale and Shraddha S Sandimani

Database-As-A-Service for IoT 1436

Swati Saigaonkar, Fasih Khatib, Anand Gogawale and Pratik Sontakke

Efficient Routing Protocol for MANET 1439

Kavita Mhatre and Uday Pandit Khot

Digital Forensic Investigations (DFI) using Internet of Things (IoT) 1443

Kritika Dhar and Yogesh Pingle

The Combined Scheme of Selective Mapping and Clipping for PAPR Reduction of OFDM 1448

Kavita Mhatre and Uday Pandit Khot

IoT for Music Therapy 1453

Yogesh Pingle

IoT in Agriculture 1456

Jeetendra Shenoy and Yogesh Pingle

Medical Image Sequence Compression using Fast Block Matching Algorithm and SPIHT 1459

Jayant Kumar Rai, Chandrashekhar Kamargaonkar and Monisha Sharma

Block based SVD Approach for Additive White Gaussian Noise Level Estimation in Satellite Images 1464

K Sateesh Kumar, S Varadarajan and G Sreenivasulu

Utility based Underlay Power Control Approach for Dynamic Spectrum Allocation 1469

Krishna Kant Singh, Chandra Shekhar Singh and Debjani Mitra

A Comparative Analysis for Haar Wavelet Efficiency to Remove Gaussian and Speckle Noise from Image 1473

Shallu, Pankaj Nanglia and Yogendra Narayan

Real Time Smart Home Automation based on PIC Microcontroller, Bluetooth and Android Technology 1478

Anand Nayyar and Vikram Puri

A Review of Arduino Boards, Lilypads & Arduino Shields 1485

Anand Nayyar and Vikram Puri

Ant Colony Optimization - Computational Swarm Intelligence Technique 1493

Anand Nayyar and Rajeshwar Singh

Low Power Area Efficient ALU with Low Power Full Adder 1500

Usha S, M Rajendiran and A Kavitha

Design and Implementation of 8-Bit Ancient Vedic Multiplier using Serf Technique 1506 N Yogeshwari, P Vairava Raja and A Kavitha

Low Power-Area Efficient Compressor Design for Fast Digital Arithmetic Integrated Circuits 1510 P Selvan, K Dhivyakala and A Kavitha


Transistor Level Implementation of an 8-Bit Multiplier using Vedic Mathematics in 180nm Technology 1514

Selvakumari, M Jayaprakash and A Kavitha

Low Power Encoder for Flash ADC Architecture 1521

R Sindhuja, V Navaneethakrishnan and A Kavitha

Hybrid Wave-Pipelined Adder 1525

P Senthil, M Mohanraj and A Kavitha

A Novel and Compact UWB-MIMO Diversity Antenna for Wireless Applications 1528

Madan Kumar Sharma, M Kumar, J P Saini, Shashank Sukla and Nikhil Bhati

A Novel Two Stage Improved Spectrum Sensing for Cognitive Radio Systems 1533

Bhawna Ahuja and Gurjit Kaur

Throughput Analysis of Energy Detection based Spectrum Sensing in Cognitive Radio 1539 Avantika Bhati and Bhawna Ahuja

Design and Optimization of Ultra Wide Band Monopole Antenna with DGS for Microwave Imaging 1545

Madan Kumar Sharma, M Kumar, J P Saini, Shashank Sukla and Nikhil Bhati

Lifetime Enhancement by Optimization of Grid using Mobility and Energy Proficient Clustering Technique of Wireless

Sensor Network 1550

Manpreet Singh and Tejinder Kaur

Control of Mean Arterial Pressure using Fractional PID Controller 1556

Shabana Urooj and Bharat Singh

Geometric Invariant Feature Extraction of Medical Images using Hu's Invariants 1560

Satya P Singh and Shabana Urooj

Equivalent Circuit Modelling using Electrochemical Impedance Spectroscopy for Different Materials of SOFC 1563

Shuab Khan, Syed Mohd Aijaz Rizvi and Shabana Urooj

An Efficient Barcode Classification Using Morphological Operations 1568

R Anitha, S Jyothi and P RameshBabu

Comparative Study of Encryption Algorithm over Big Data in Cloud Systems 1571

K Sekar and M Padmavathamma

Hybrid Model based Uncertainty Analysis for Geospatial Metadata Supporting Decision Making for Spatial Exploration 1575

Manjula K R and Gangothri Rajaram

Prawns Identification based on Edge Detection and Feature Vectors 1580

G Nagalakshmi, S Jyothi and D M Mamatha

Performance Analysis of Classification Algorithms under Different Datasets 1584

S Jyothi and A Swarupa Rani

Mathematical Model of MHD Unsteady Flow Induced by a Stretching Surface Embedded in a Rotating Casson Fluid with

Thermal Radiation 1590

G Sarojamma, K Sreelakshmi and B Vasundhara

The Application of Semantic Web on Agricultural Domain State of the Art Survey 1596

Naren J and Mohanraj I


An Architectural Framework for e-Agricultural System 1600

Naren J and Mohanraj I

A Semantic Feedback on Student's Performance with Data Mining Techniques: State of the Art Survey 1605

Naren J, Vaishnavi Raghavendran and Kirthika Ashok Kumar

Comparison of Machine Learning Algorithms for Classification of Penaeid Prawn Species 1610

V Sucharita, P Venkateswara Rao and S Jyothi

A Study on Different Gene Expressions using an Evolutionary Optimization 1614

P Bhargavi and K Lohitha Lakshmi

Construction of Gazetteers from Geo Big Data using Machine Learning Technique on Hadoop 1619

Manjula K R and S Pradeepa

Performance of Synthetic Minority Oversampling Technique on Imbalanced Breast Cancer Data 1623

K Usha Rani, G Naga Ramadevi and D Lavanya

Multi Stage Classification and Segmentation of Brain Tumor 1628

Praveen G B and Anita Agrawal

Combination of Scan and Trace Buffers using the Mixed-Granularity Method for Efficient Signal Selection 1633

Agalya R and Saravanan S

RNA Interference (RNAi) Technology of microRNAs Targeting Juvenile Hormone Epoxide Hydrolase (JHEH) Gene for

Increased Silk Productivity in Bombyx mori 1636

D M Mamatha, V V Satyavathi, S Jyothi and K Swetha Kumari

ICT based Special Education Assessment Framework for Inclusive Education in India 1644

Kumar Mandula, Ramu Parupalli, Annie Joyce Vullamparthi, Ch AS Murty, E Magesh and Sarat Chandra Babu Nelaturu

Wavelet Transform based Multiple Features Extraction for Detection of Epileptic/Non-Epileptic Multichannel EEG 1648

Manish N Tibdewal, Himanshu R Dey, Mahadevappa M, Ajoy Kumar Ray and Monika Malokar

Power Line and Ocular Artifact Denoising from EEG through Notch Filter and Wavelet Transform 1654

Manish Narayandas Tibdewal, Mahadevappa M, Ajoy Kumar Ray, Monika Malokar and Himanshu R Dey

Analysis of Piezoelectric Buzzers as Vibration Energy Harvesters 1660

Shruti Jain, Ritendra Mishra and C Durgaprasad

Utilization of Exhaust Heat from Automotive using Thermopile 1665

Shruti Jain, Akshay Kataria and Palak Kaistha

Clustering using Fuzzy Logic in Wireless Sensor Networks 1669

Manjeet Singh, Gaurav, Surender Soni and Vicky Kumar

Improving Network Lifetime & Reporting Delay in Wireless Sensor Networks using Multiple Mobile Sinks 1675

Manjeet Singh, Gaurav, Vicky Kumar and Ashok Kumar

Localization using Varying Anchor Range in Randomly Distributed Wireless Sensor Network 1679

Manjeet Singh, Gaurav, Vicky Kumar and Ashok Kumar

Optimum Design and Simulation of Capacitive Pressure Sensor 1685

Gargi Khanna and Khalilullah Ibrahim


A Comparative Study of Energy Harvesting Circuit for Piezo Electric Energy Harvesting 1689

Gargi Khanna and Prateek Asthana

Implementation of Clustering in Load Sharing Routing Algorithm to Increase the Lifetime of Wireless Sensor Networks 1695

Vinith Chauhan and Surender Soni

Classification of Breast Lesions based on Laws Feature Extraction Techniques 1700

Shruti Jain, Jitendra Virmani and Sahil Bhusri

Classification of Kidney Lesions using Gabor Wavelet Features 1705

Shruti Jain, Jitendra Virmani and Shailja Rana

Interactive Module for Dyslexic Children 1710

Pushp Bajaj, Basudha Dewan, Arjun Chauhan and Meenakshi Sood

Braille Hand Glove - A Real Time Translation and Communication Device 1714

Rahul Saxena, Tarunam Mahajan, Pragya Sharma and Meenakshi Sood

Comparison of Spatial Prediction Methods for Zn and Fe 1719

Garima Shukla, G C Mishra and S K Singh

Analysis of Rainfall and Temperature Variability and Trend Detection: A Non Parametric Mann Kendall Test Approach 1723

Ghanshyam T Patle, A Libang and Sangeeta Ahuja

Solar Cell: A New Era 1728

Animesh Jain, Abhishake Jain and Abhishek Jain

Performance Evaluation of Energy Efficient Routing Protocols in Ad-hoc Network using Different Metrics 1732

Joni Birla, Shreyta Raj, Kamal Kumar Ranga and Mahesh Kumar

Apaf-1 and Its Isoforms Structural Relationship 1736

Palak Ahuja

Germplasm Evaluation using Cluster Ensemble 1741

Sangeeta Ahuja, Mudenge Febrice, H L Raiger, A K Choubey and O P Sharma

WSN based Intelligent Lighting Control using Android 1743

Tanmay K Dalal and Vivek Deodeshmukh

Soft Computing based Transformer Incipient Fault Detection 1747

Tapsi Nagpal and Yadwinder Singh Brar

Cross-Platform Application Development for Smartphones: Approaches and Implications 1752

Kavita Taneja, Harmunish Taneja and Rohit K Bhullar

Control Strategies for Parallel Operation and Load Sharing Between the Inverters 1759

Ruchika, A P Mittal and D K Jain

NLP and Ontology based Clustering - An Integrated Approach for Optimal Information Extraction from Social Web 1765

Kavita Taneja, Harmunish Taneja and Shabina Dhuria

Performance Comparison of Commercial VMMs: ESXi, Xen, Hyper-V & KVM 1771

Varun Manik and Deepak Arora


Secham: Secure and Efficient Cluster Head Selection Algorithm for MANET 1776

Renu Popli, Kanwal Garg and Sahil Batra

Modified Method for Reducing the Order of LTI System using Factor Division Algorithm 1780

Sharad Kumar Tiwari and Gagandeep Kaur

A Comparative Analysis of Task Scheduling Approaches for Cloud Environment 1787

Anurag Jain and Rajneesh Kumar

Application of Soft Computing Technique for Optical Add/Drop Multiplexer in Terms of OSNR and Optical Noise Power 1793

Parveen Bajaj, A K Goel and Harbhajan Singh

An Efficient Load Balancing Algorithm for Cloud Computing using Dynamic Cluster Mechanism 1799

Upasana Lakhina, Niharika Singh and Ajay Jangra

Factors Affecting the Accuracy of Automatic Signature Verification 1805

Kamlesh Kumari and V K Shrivastava

Diagnosis and Evaluation of ADHD using Naive Bayes and J48 Classifiers 1809

K Krishnaveni and E Radhamani

Model of Gestational Diabetes - Impact on Placental Volume and the Fetus Weight 1815

Richa Gupta and Deepak Kumar

Intuitionistic Trapezoidal Fuzzy Hybrid Aggregation (Itrfha) Operator: An Algorithm for the Selection of Suitable

Treatment for Lung Cancer 1820

Kumar Vijay and Jain Sarika

Dynamic Intuitionistic Fuzzy Weighting Averaging Operator: A Method for the Diagnosis of the Type of Brain Tumor 1826

Kumar Vijay, Arora Hari and Pal Kiran

Mathematical Modelling of Mucus Transport in Diseased Airways with Effects of Constriction of Airway Diameter and

Mucus Viscosity 1832

Pankaj Kumar, Arti Saxena and A P Tyagi

Mathematical Modeling of Diseased Human Knee Joint under Jumping Condition 1837

Kapil Shekhar, Arti Saxena and A P Tyagi

Modeling for Diabetes Detection with the Help of Epinephrine Behavior 1842

Deepak Kumar and Sandhya

Measures of Cosine Similarity Intended for Fuzzy Sets, Intuitionistic and Interval-Valued Intuitionistic Fuzzy Sets with

Application in Medical Diagnoses 1846

Pratiksha Tiwari and Priti Gupta

Enhanced Weighted PageRank Algorithm based on Contents and Link Visits 1850

Radheshyam Prajapati and Suresh Kumar

Hadoop-MapReduce: A Platform for Mining Large Datasets 1856

Maedeh Afzali, Nishant Singh and Suresh Kumar

Mathematical Analysis of Bronchitis Infection 1861

Vinod Kumar Bais and Deepak Kumar


The Probability Strata: Algorithmic Approach to DRDoS Defense 1865

Sakshi Yadav, Soumya Bhatnagar, Jaskaran Singh, Khushboo Goyal and Amita Yadav

A Review of Some Generalized Fuzzy Operators in Decision Making Processes 1871

Vijay Kumar, Arora Hari and Kiran Pal

AgrETL: Tool for ETL Activities for Agriculture Domain 1877

Rajni Jain, Sonal Sharma and Prateek Mittal

Weighted Association Rule Mining for the Occurrence of the Insect Pest Helicoverpa armigera (Hübner) Related with

Abiotic Factors on Cotton 1884

M Pratheepa, Abraham Verghese and H Bheemanna

In-Silico Identification of Inhibitors for Controlling Rice Blast 1888

A K Mishra, Amrender Kumar and A K Jain

Software Process Model for Agricultural Productivity Analysis 1893

Rajni Jain and S Surchand Singh

Online Software for Forewarning of Onion Thrips 1898

Subashchandra, Alka Arora, Amrender Kumar, Rajni Jain, Sudeep Marwaha and Tanuj Misra

Applying Data Mining Techniques to Predict Yield of Rice in Humid Subtropical Climatic Zone of India 1901

Niketa Gandhi and Leisa Armstrong

Agricultural Decision Support Framework for Visualisation and Prediction of Western Australian Crop Production 1907

Leisa J Armstrong and Sreedhar A Nallan

Smart Farming using Arduino and Data Mining 1913

Akshay Naik, Mayur Beldar, Ankita Patil and Sachin Deshpande

eResource for DUS Characterization of Maize Varieties and Inbreds 1918

Mukesh Kumar, Arun Gupta, S Amrapali, P K Agrawal, V Mahajan, L Kant and J K Bisht

Interpolating Climate Data using wTICD: A Web Based Tool 1923

Ipshita Roy Chowdhury, Anshu Bharadwaj, S N Islam, K K Chaturvedi, Prachi Mishra Sahoo and R K Paul

Determinants of Adoption of IPM in Cauliflower Cultivation in Haryana State 1930

Usha Ahuja, D B Ahuja, Rajni Jain, Digvijay Singh Negi and Prem Narayan

A Hybrid Linked Data Query Execution Approach using Backlinks 1936

Samita Bai, Sharaf Hussain and Shakeel Khoja

Inter Digital Transducer Modelling through Mason Equivalent Circuit Model Design and Simulation 1941

Dipti Mishra, Abhishek Singh, D M Akbar Hussain, Sweety Dabas and Minal Dhankar

Circuit Segmentation using GP in FPGA's Technology Mapping 1947

Atul K Srivastava, Himanshu Shishodia and Himanshu

Detecting Imposture in Mobile Application 1952

Sunil Ghadge, Kiran Bhausaheb Sawant, Monika Balasaheb Kadam, Vaibhav Pandurang Khedekar and Aditya Dagadu Chavan

GSM based Advanced Noticeboard Display 1958

B S Chowdhry, Azam Rafique Memon, Sonia Bibi, Rizwan Ali Shah and Quratulain Pathan


Projection Virtual Keyboard using Echo Sound Technique and Image Processing for Mobile Phone 1963

Amid Patel, Zeenat Shaikh, Harshal Ahire and Kumari Neha

Grade Identification of Astrocytoma using Image Processing - A Literature Review 1968

Chaitanya Singla and Sheifali Gupta

LVDCI IO Standard based Green Parallel Integrator Design in Green Communications on FPGA 1974

Bishwajeet Pandey, Love Kumar, Tanesh Kumar and Manoj Kumar

Automated Microcalcification Analysis using Breast Mammogram 1980

Sushma S J and Prasanna Kumar S C

High Performance Design of 100 Gb/s DPSK Optical Transmitter 1984

Bhagwa Das, Mohammad Faiz Liew, Nor Shahida Mohd Shah and Dil M Akber Hussain

Low Power & High Performance Implementation of Multiplier Architectures 1989

Gaurav Verma, Oorja M Srivastava, Sushant Shekhar, Shikhar Maheshwari and Sukhbani Kaur Virdi

Matlab based FPGA Power Validation Utility 1993

Gaurav Verma, Sushant Shekhar, Shikhar Maheshwari, Sukhbani Kaur Virdi and Oorja M Srivastava

Automated Red Light Enforcement Camera for Traffic Control 1997

Gaurav Verma, Sonam Chugh, Manish Kamti, Ipsita Singh and Deval Verma

Smart Cafe System Implementation 2001

Swati Gautam, Chetan Arora, Priyank Sharma and Gaurav Verma

Low Power Implementation of Telecommunication Switching Architectures for Network on Chip 2004

Gaurav Verma, Shikhar Maheshwari, Sushant Shekhar and Sukhbani Kaur Virdi

Street Light Power Reduction System using Microcontroller and Solar Panel 2008

Ashish Sharma, Gaurav Verma, Sandeep Banarwal and Himanshu Verma

Effects of Haze Weather Condition on 980nm Free Space Optical Communication in Malaysia 2011

Syed Mohammad Ali Shah, Tahir Riaz, Muhammad Shafie Abd Latiff and Bhawani Shankar Chowdhry

Rough Set Theory Approach to Find Why Employees of Government and Non Government Organisation Running after

Legal Justice 2018

Sujogya Mishra, Shakti Prasad Mohanty and Sateesh Kumar Pradhan

Implementation of Novel Web based Data Extraction using Template Extraction Technique and Non-Information Filtering 2023

Jaishree G Waghmare and Vikas B Maral

Wireless Braking System in Train 2027

Aayushi Gautam, Khushhali Goel, Divya Bareja, Ipsita Singh and Gaurav Verma

Investigation of Underwater Acoustic Modems: Architecture, Test Environment & Performance 2031

Muhammad Yousuf Irfan Zia, Pablo Otero, Abdul Moid Khan and Javier Poncela

Spoofing Detection in Face Recognition System: A Review 2037

Manpreet Bagga and Baljit Singh

Automated Skull Stripping in Brain MR Images 2043

Srinivasan Aruchamy, Ravi Kant Kumar, Partha Bhattacharjee and Goutam Sanyal


Real Time Implementation of Path Planning Algorithm with Obstacle Avoidance for Autonomous Vehicle 2048

Kiran Rafique, Sugandh, Batool, Azam Rafique and Syed Muhammad Zaigham Abbas Shah

Low Power Implementation of FSM based Vending Machine on FPGA 2054

Gaurav Verma, Ashish Papreja, Sushant Shekhar, Shikhar Maheshwari and Sukhbani Kaur Virdi

Evolutionary Trends in Embedded System Design 2059

Priyank Sharma, Himanshu Verma, Vivek Negi, Ashish Sharma, Sandeep Banarwal and Gaurav Verma

Study of Diagnosis of Diabetes Mellitus under Healthcare 2063

Pawar Suvarna Eknath and Sikchi Smita

Pawar Suvarna Eknath and Sikchi Smita

Internet of Things (IoT) Enabled Smart Animal Farm 2067

Muhammad Hunain Memon, Wanod Kumar, Azam Rafique Memon, Bhawani S Chowdhry, Muhammad Aamir and Pardeep

Kumar

Smart Home System based on Internet of Things 2073

Himanshu Verma, Madhu Jain, Khushhali Goel, Aditya Vikram and Gaurav Verma

Casting Multipath Behaviour into OANTALG to Improve QoS 2076

Amanpreet Kaur, Vijaypal S Dhaka and Gurpreet Singh

Model-View-Controller Pattern in BI Dashboards: Designing Best Practices 2082

Prathamesh P Churi, Sharad S Wagh, Medha Kalelkar and Deepa Kalelkar

Implementation of a Crossbar Switch for NoC on FPGA 2087

Sukhbani Kaur Virdi, Gaurav Verma, Sushant Shekhar, Shikhar Maheshwari and Oorja M Srivastava

Design of Low Power and Secure Implementation of SBOX for AES 2092

Harshita Prasad, Divya Sharma, Jyoti Kandpal and Gaurav Verma

Estimation of Age from Human Facial Features 2098

Poonam Shirode and S M Handore

Squashed Amplifier Integration in Dual Slit-Cut Equilateral Triangular Microstrip Antenna 2104

Prabal Pratap and Ravinder Singh Bhatia

Privacy Preserving Data Mining: A State of the Art 2108

Mamta Narwaria and Suchita Arya

Clouds and Big Data: A Compelling Combination 2113

Pramila Joshi

Detection and Update Method for Attack Behavior Models in Intrusion Detection Systems 2119

Mohd Anuaruddin Bin Ahmadon, Zhaolong Gou, Shingo Yamaguchi and B B Gupta

Comparative Analysis of Features based Machine Learning Approaches for Phishing Detection 2125

B B Gupta and Ankit Kumar Jain

Cross-Site Scripting (XSS) Worms in Online Social Network (OSN): Taxonomy and Defensive Mechanisms 2131

Pooja Chaudhary, B B Gupta and Shashank Gupta

Analogous Study of 4G and 5G 2137

Tushant Sharma and Sarika Agarwal


Secure Data Transference Architecture for Cloud Computing using Cryptography Algorithms 2141 Manju Khari, Manoj Kumar and Vaishali

Analysis of Software Security Testing using Metaheuristic Search Technique 2147

Manju Khari, Manoj Kumar and Vaishali

Security Outlook for Cloud Computing: A Proposed Architectural based Security Classification for Cloud Computing 2153 Manju Khari, Sana Gupta and Manoj Kumar

Comprehensive Study of Web Application Attacks and Classification 2159

Manju Khari, Manoj Kumar, Vaishali and Sonam

Internet of Things : Proposed Security Aspects for Digitizing the World 2165

Manju Khari, Manoj Kumar, Sonakshi Vij, Priyank Pandey and Vaishali

Appraising A Decade of Research in the Field of Big Data: The Next Big Thing 2171

Satyajee Srivastava and Neetu Chaudhari

Analysis of Various Information Retrieval Models 2176

Manju Khari, Amita Jain, Sonakshi Vij and Manoj Kumar

Embedding Security in Software Development Life Cycle (SDLC) 2182

Manju Khari, Vaishali and Prabhat Kumar

Web-Application Attacks: A Survey 2187

Manju Khari, Parikshit Sangwan and Vaishali

5G: Evolution of A Secure Mobile Technology 2192

Sonakshi Vij and Amita Jain

Word Sense Disambiguation using Fuzzy Semantic Relations 2197

Amita Jain, Devendra K Tayal and Sonakshi Vij

Network Forensics: Methodical Literature Review 2203

Gulshan Shrivastava

Smartphone Nabbing: Analysis of Intrusion Detection and Prevention Systems 2209

Sonakshi Vij and Amita Jain

Realistic Golf Flight Simulation 2215

Sumukha R M, Naresh V, Abhishek Reddy, Shashidhar G Koolagudi and Fathima Afroz

Amalgamation of Web Analytics with Cloud Computing 2220

Himani Singal and Shruti Kohli

An Overview: Context- Dependent Acoustic Modeling for Lvcsr 2223

Priyanka Sahu and Mohit Dua

A Review on the Optimization Techniques for Bio-inspired Antenna Design 2228

Rohit Anand and Paras Chawla

Network Forensics: Today and Tomorrow 2234

Gulshan Shrivastava, Kavita Sharma and Reema Kumari


Stochastic Programming and C-SOMGA: Animal Ration Formulation 2239

Pratiksha Saxena, Neha Khanna and Dipti Singh

Performance Enhancement of Tunneling Accelerometer using Cuckoo Search Algorithm 2245 Akshit Kanda, K P S Rana, Akarsh Kumar, Vineet Kumar, Amanvir Singh Sidana and Anchan Saxena

Reliable Techniques of Leakage Current Reduction for SRAM-6T Cell :A Review 2251 Ankita Chauhan, D S Chauhan and Neha Sharan

Optimization for Animal Diet Formulation: Programming Technique 2255

Pratiksha Saxena,Vinod Kumar and Rajkumar

Optical Properties of Zinc Oxide (Zno) Thin Films for Applications in Optical Devices: Matlab Simulation 2261 Bhawana Joshi, Pratiksha Saxena and Natasha Khera

Performance Analysis of Queueing Model using Supply Chain Management 2266

Jitendra Kumar and Vikas Shinde

Improvement in Efficiency of PV Module using Soft Computing based MPPT 2273

Divya and Vivek Shrivastava

An Improved Efficiency of Li-Ion Batteries using Optimization Technique 2279

Vivek Shrivastava and Nishi Swal

Development of A Generic Structural Feature Extraction Method for Printed Gurumukhi and Similar Scripts 2285 Roop Sidhu, Jaspreet Singh Madan and Dharam Veer Sharma

Survey on Acoustic Modeling and Feature Extraction for Speech Recognition 2291

Poonam Sharma and Anjali Garg

An Affix Removal Stemmer for Gujarati Text 2296

Nikita Desai and Bijal Dalwadi

Sentiment Analysis of e-Commerce and Social Networking Sites 2300

Ashna Goel, Shubhi Mittal and Rachna Jain

A Roadmap to Auto Story Generation 2306

Anushree Mehta, Resham Gala and Lakshmi Kurup

Listening Deaf through Tactile Sign Language 2311

Urmila Shrawankar and Sayli Dixit

Question Systematization using Templates 2316

Urmila Shrawankar and Komal Pawar

Construction of News Headline from Detailed News Article 2321

Urmila Shrawankar and Kranti Bhaskarrao Wankhede

A Study of Different Stemmer for Sindhi Language based on Devanagari Script 2326

Sangita D Makhija

An Optimized Rule based Approach to Extract Relevant Features for Sentiment Mining 2330 Ashwini Rao and Ketan Shah


Building A New Model for Feature Optimization in Agricultural Sectors 2337

Soumya Sahoo, Sushruta Mishra, Bijayalaxmi Pand and Nachiketa Jena

Optimal Controller Selection in Software Defined Network using a Greedy-Sa Algorithm 2342 Kshira Sagar Sahoo

Honeypot-Based Intrusion Detection System: A Performance Analysis 2347

Janardhan Reddy Kondra, Santosh Kumar Bharti, Sambit Kumar Mishra and Korra Sathya Babu

Twitter Truths: Authenticating Analysis of Information Credibility 2352

Samiksha Agarwal and Ram Chatterjee

Novel Power Efficient Bit-Wise Sequence Detector (Non-Overlapping) 2358 Naman Sharma, Rajat Yadav, Upanshu Saraswat, Rajat Sachdeva and Gunjeet Kaur

Highlighting Image Tampering by Feature Extraction based on Image Quality Deterioration 2362 Surbhi and Parvinder Singh Sandhu

Augmented Reality Cognitive Paradigm 2368

Utkarshani Jaimini and Mayur Dhaniwala

Javascript Empowered Internet of Things 2373

Utkarshani Jaimini and Mayur Dhaniwala

Low Voltage Low Output Impedance High Bandwidth Flipped Voltage Follower Cell 2378 Abhishek, Ankit Kumar, Rohit Kumar Sharma and Shilpa Agrawal

Cross-Country Path Finding Algorithm using Hybridization of Bat and Cuckoo Search 2382 Monica Sood and Subhita Menon

Analog Voltage Comparator based on Differential Circuit 2385

Shukla Jagrut, Ankit Kumar, Abhishek Shrivastava and Shilpa Agarwal

Research and Analysis of Advancements in BAT Algorithm 2391

Shabnam Sharma, Ashish Kr Luhach and Kiran Jyoti

Emotion Detection of Audio Files 2397

Renu Taneja, Javesh Monga, Purva Marwaha and Aman Bhatia

Analytical Evaluation for the Enhancement of Satellite Images using Swarm Intelligence Techniques 2401 Geetika Arora, Vartika Singh and Gourav Kumar

Opinion Mining using Sentiful 2406

Komal Dhingra, Sumit Yadav and Dharna Kaushik

Intelligent Aircraft Landing Decision Support System using Artificial Bee Colony 2412 Jasmeet Singh and Samiksha Goel

Traffic Rerouting in Dynamic Environment Inheriting Ant's Perception Radius 2417 Pushkar Goel, Palak Madan and Samiksha Goel

Software Quality Score Board based on SQA Framework to Improvise Software Reliability 2421 Anshu Bansal and Sudhir Pundir


Review of Recent Load Balancing Techniques in Cloud Computing and BAT Algorithm Variants 2428 Sinha Sheikh Abdhullah, Shabnam Sharma, Kiran Jyoti and U S Pandey

Exploration of Latent Fingerprints Enhancement using Soft Computing Techniques 2432 Richa Jindal and Sanjay Singla

Robust Tracking for Multiple Objects in A Video 2437

Abhineet Sharma

Preoperative Planning Simulator with Haptic Feedback for Raven-II Surgical Robotics Platform 2443 Vijyant Agarwal, Akshay Bhardwaj and Aarshay Jain

A Comparative Analysis Between MRAC and FMRAC for An Unstable System 2449

Anil Kumar Yadav and Prerna Gaur

An Extended Kalman Filter for Real Time Estimation and Control of Omni Robot with Stochastic Noise 2456 Vijyant Agarwal

Image Acquisition and Colour based Segregation of Objects using Labview 2461

Prerna Gaur, Poras Khetarpal, Abhijeet Bharadwaj and Rajat Sood

An Enhanced Hybrid Fuzzy Logic Controller for Robotic Manipulator 2465

Richa Sharma, Prerna Gaur and A P Mittal

Investigative Analysis of Bio-inspired Robust Controller for A CNC System 2471

Shyama Kant Jha and Anuli Dass

Various Intelligent Control Techniques for Attitude Control of An Aircraft System 2476 Prerna Gaur, Anil Kumar Yadav and Shyama Kant Jha

Master Slave Tracking Control using Adaptive Least Squares Filter 2481

Akshay Bhardwaj, Aarshay Jain, Vijyant Agarwal and Harish Parthasarathy

Design, Analysis and Performance of Zeta Converter in Renewable Energy Systems 2487 Prerna Gaur, Ahana Malhotra, Charvi Malhotra and Shitiz Vij

Comparison of Different topologies of Fuzzy Logic Controller to Control D-STATCOM 2492 Rachana and Anuj

Pervasive Monitoring of Carbon Monoxide and Methane using Air Quality Prediction 2498 Sunil Karamchandani, Aaklin Gonsalves and Deven Gupta

Analysis and Detection of Eventful Messages in Instant Messengers 2503

Darshana Desai and Abhijit R Joshi

Review of Knowledge Representation Techniques for Intelligent Tutoring System 2508

Neha Mendjoge, Abhijit R Joshi and Meera Narvekar

A Review on Student Modeling Approaches in ITS 2513

Lakshmi D Kurup, Abhijit Joshi and Narendra Shekhokar

Low Power VLSI Implementation of Data Compression for Multimedia Devices using CDF M/N DWT on to Resource Constrained Dynamically Reconfigurable Memories 2518

Chetan H and Indumathi G


A Review on 3D Network on Chip : Architecture Design and Optimization of Multicore Media Applications 2524 Sridevi S and Indumathi G

VLSI Global Routing Algorithms: A Survey 2528

Geetanjali Udgirkar and G Indumathi

Study of Multigrain Mrpsoc 2534

Kavitha V and K V Ramakrishnan

Detection of Epileptic Seizure Patterns in EEG through Fragmented Feature Extraction 2539 Piyush Swami, Tapan K Gandhi, Bijaya K Panigrahi, Anirudha Kumar and Durga Siva Teja Behara

Emotion Classification from EEG Signals 2543

Soumava Kumar Roy, Chetan Ralekar and Tapan K Gandhi

Efficient Recognition of Ictal Activities in EEG through Correlation based Dimensionality Reduction 2547 Piyush Swami, Tapan K Gandhi, Sanjeev Nara and Bijaya K Panigrahi

Resting State fMRI Analysis using Seed based and ICA Methods 2551

Neha and Tapan K Gandhi

Virtual Repository of Microscopic and Neuroendoscopic Instrumentation in Neurosurgery 2555 Ramandeep Singh, Britty Baby, Sneh Anand and Ashish Suri

Classification of Post Contrast T1 Weighted MRI Brain Images using Support Vector Machine 2560

B K Panigrahi, Tanvi Gupta and Tapan K Gandhi

Serious Games: An Overview of the Game Designing Factors and their Application in Surgical Skills Training 2564 Britty Baby, Vinkle Srivastav, Ramandeep Singh, Ashish Suri and Subhashis Banerjee

Spam Detection on Twitter : A Survey 2570

Prabhjot, Anubha Singhal and Jasleen Kaur

Platelet Count using Image Processing 2574

Prabhjot, Vishakha Sharma and Nishtha Garg

Building A Framework for Network Security Situation Awareness 2578

Pardeep Bhandari and Manpreet Singh

FCM based Conceptual Framework for Software Effort Estimation 2584

Sangeeta Bhandari

Optimized Sentiment Analysis Tool 2589

Nayonika Sharma, Priyanka Chugh (Shivanka), Chetna Sharma and S Indu

Development of Anti-Spam Technique using Modified K-Means & Naive Bayes Algorithm 2593 Amita Jain, Kanak Meena and Devendra K Tayal

Impact of Feature Selection and Engineering in the Classification of Handwritten Text 2598 Anupama Kaushik, Digvijay Singh Latwal and Himanshu Gupta

Survey of Fuzzy based Techniques to Address Class Imbalance Problem 2602

Prabhjot and Anshul Gupta


Edge Detection using BAT Algorithm 2605

Hargun Kaur Sethi, Rinky Dwivedi, Monisha Rohilla, Vaani Garg and Yashika Nagpal

Hybrid FCM PSO Algorithm with City Block Distance 2609

Varshini Ailavajhala, Saujanya Sengar, Mansi Sharma and Jyoti Arora

Application of Log based Kernel Function in Image Segmentation 2615

Jyotsna Nigam, Dinesh Rai and Meena

Techniques based Upon Boosting to Counter Class Imbalance Problem-A Survey 2620

Prabhjot and Vasu Negi

Performance Analysis and Comparison of A.O.D.V., D.S.R and D.S.D.V Routing Protocols for Wimax Mesh Networks 2624 Arjun Narain, Tripti Sharma and Prateek Handa

Analyzing Twitter Sentiments through Big Data 2628

Monu Kumar and Anju Bal

Analysis of Cosine Alpha Window Function using Linear Canonical Transform 2632

Navdeep Goel

Analysis of Sampling based Classification Techniques to Overcome Class Imbalancing 2637

Deepika Singh and Anjana Gosain

Optimized Association Rules for MRI Brain Tumor Classification 2644

Poonam Sonar and Udhav Bhosle

Embedding PCA Encrypted Audio Data within a Digital Video using LSB Steganography 2650 S R Gosalia , Shaan Shetty and Ravathi A S

Novel Algorithm for Embedding Audio Watermark in Images 2654

Ratnababu Mamidi, Govind Dadhich, Eisenhower D'souza, Febin George, Stephen Poojari and S Sujana Chowdary

Image Classification using Visible RGB bands 2660

Pragati Dwivedi

Reversible Watermarking for Colored Medical Images using Histogram Shifting Method 2664 Hitesh Shripad Nemade and Vishakha Kelkar

Secure Semi-Blind Steganography using Chaotic Transforms 2669

Janhavi Kulkarni, Karan Nair, Mansi Warde, Vedashree Rawalgaonkar and Jonathan Joshi

Efficient Stereo and 2D Object Tracking 2674

Kushal Kardam Vyas and Jonathan Joshi

Performance Evaluation of Data Hiding Techniques 2680

Pragati Upadhyay, Sheetal Mahadik and Arathi Kamble

Dynamic Video Steganography using LBP on CIELAB based K-Means Clustering 2684

Diljeet Singh and Navdeep Kanwal

High Performance Parallel Processing to Cluster Visually Similar Image Data Sets 2690

Ketan Kanere, Hitesh Mhatre and Arjun Jaiswal


TCP/IP Network Protocols- Security Threats, Flaws and Defense Methods 2693

Manan Shah, Vishwa Soni, Harshal Shah and Meghav Desai

Entropy based Image Watermarking using Discrete Wavelet Transform and Singular Value Decomposition 2700 Gurwinder Singh and Navdeep Goel

A Comparative Review of Various Approaches for Feature Extraction in Face Recognition 2705 Gurpreet Kaur and Navdeep Kanwal

Challenges in Recognition of Devanagari Scripts Due to Segmentation of Handwritten Text 2711 Ashok Kumar Bathla, Sunil Kumar Gupta and Manish Kumar Jindal

Deep Learning and its Application in Silent Sound Technology 2716

Ayush Tiwari, Vibhu Varshney and Deeksha Singh

Power Efficient, Reliable & Secure Wireless Body Area Network 2722

Pooja Mohnani and Fathima Jabeen

Low Power Quantum Gates for the Implementation of Reversible Memory Elements using Quantum Dot Cellular Automata 2727

Preeta Sharan and Samyukta A Hassan

A Comparative Study of Saline and Non-Saline Water in Application of Tomato Yield by using Photonic Sensor 2733 Preeta Sharan, Harshitha M and Sandip Kumar Roy

An Efficient Design of 4-Bit Serial Input Parallel Output/Serial Output Shift Register in Quantum-Dot Cellular Automata 2736 Allen Vivean Miranda, T Srinivas and Ashwin Padmanabhan

An Optical Storage Device by Surface Plasmon Resonance Technology 2739

Preeta Sharan and Navyashree H A

Wireless Digital Cross Connect SOC for Optical Networks using FPGA 2743

Gunjan Tahakur, Mrinal Sarvagya and Preeta Sharan

Detection of Oncological Cell for Breast Cancer by using SPR Technology 2747

Preeta Sharan, Savitha and K Srinivas Rao

An Efficient Design of QCA based Memories 2750

Preeta Sharan, Vinay Kumar and Pratibha S K

Android Security: Permission based Attack 2754

Arushi Jain and Prachi

A Review to Predictive Methodology to Diagnose Chronic Kidney Disease 2760

Anu Batra, Usha Batra and Vijendra Singh

Software Module Clustering using Metaheuristic Search Techniques: A Survey 2764

Vineeta Singh

Analysis and Interpretation of Segmentation Techniques based on Delaunay Triangulation and Iterative Thresholding Explicitly Used for Detection of Anti-Personnel Landmines 2768

Khandakar Faridar Rahman and Saurabh Mukherjee

Distributed Applications of Triangular Array in SHF Band 2774

Sandeep Singh Sengar and Gargi Punia


Design and Analysis of Controllers for Continuous Stirred Tank Reactor (CSTR) 2778

Parvesh Saini, Saneev Kumar Gaba, Nalini Rajput and Astha Aggarwal

Arduino and Rx/Tx based Low Cost Class Monitoring System 2785

Rajesh Singh, Anita, Abinash, Abhas and Abhinav Shukla

Cooling of Solar Cells by Open & Closed Refrigerant Systems 2791

Tanmay Jain, Saumya Shukla, Kushal Kukkar and Ayush Tandon

Solar Hybrid Electric Vehicle- A Green Vehicle for Future Impulse 2794

Ayush Goel , Vishal Rajput, Shivam Bajaj, Rajveer Mittal, Anirudh Dube and Ruchika

Risk Assessment and Mitigation Approach for Architecture Evaluation in Component based Software Development 2801 Maushumi Lahon and Uzzal Sharma

Identification of Emotion from Speech Signal 2805

Uzzal Sharma

Extracting Acoustic Feature Vectors of South Kamrupi Dialect through MFCC 2808

Ranjan Das and Uzzal Sharma

Bengali Speech Emotion Recognition 2812

Abhijit Mohanta and Uzzal Sharma

Human Robot Interaction using Android and Pointbug Algorithm 2815

Nupur Choudhury and Rupesh Mandal

Automatic Video Surveillance for theft Detection in ATM Machines: An Enhanced Approach 2821 Nupur Choudhury and Rupesh Mandal

Automatic Identification of the Dialects of Assamese Language in the District of Nagaon 2827 Jahnabi Borah and Uzzal Sharma

A Study on Human Activity Recognition from Video 2832

Debashis Barman and Usha Mary Sharma

Transformations of Graph Database Model from Multidimensional Data Model 2836

Brajen Kumar Deka

Geometrical, Profile and HOG Feature based Recognition of Meetei Mayek Characters 2841 Chandan Jyoti Kumar and Sanjib Kr Kalita

Analysis of Expected Delay of LAR Protocol for Vanets 2846

Kamlesh Rana, Sachin Tripathi and Ram Shringar Raw

A Relative Analysis of Modern Temporal Data Models 2851

Shailender Kumar and Rahul Rishi

Bringing Healthcare to Doorstep using Vanets 2856

Arvind Kumar, Sachin Tripathi and Ram Shringar Raw

Efficient Algorithms for Cluster Head Selection in Wireless Sensor Networks: A Comparative Study 2860 Priyank Pandey and Prakhar Srivastava


Vanet based Health Monitoring through Wireless Body Sensor Network 2865

Pratibha Kamal, Ram Shringar Raw, Nanhey Singh, Shailendra Kumar and Arvind Kumar

Sql Injection: Types, Methodology, Attack Queries and Prevention 2872

Nanhay Singh, Mohit Dayal, Ram Shringar Raw and Suresh Kumar

Hop Count and Delay Analysis of D-Lar Protocol 2877

Kavita Pandey, Saurabh Kumar Raina and Ram Shringar Rao

An Adaptive Environmental Modelling for V2I Application Services 2882

Suresh Kumar and Rama Shankar Yadav

An Efficient Data Replication and Load Balancing Technique for Fog Computing Environment 2888 Sagar Verma, Arun Kumar Yadav, Deepak Motwani, R S Raw and Harsh Kumar Singh

Modified Mad Protocol for Vanets 2896

Suresh Kumar, R S Yadav, Rashmi Pratap and Pushpendra Kumar

Round Robin Selection of Datacenter Simulation Technique Cloudsim and Cloud Analyst Architecture and Making It Efficient By using Load Balancing Technique 2901

Lahar Singh Nishad, Ankita Pareek, Sumitra Beniwal, Sumit Kumar Bola and Sarvesh Kumar

Survey on Steganography Methods (Text, Image, Audio, Video, Protocol and Networks Steganography) 2906 Prashant Johri, Amba Mishra, Sanjoy Das and Arun Kumar

Speckle Noise Removal Filtering Techniques for Ultrasound Image with Comparison Between Techniques According to Standard Measures 2910

Shrusti Porwal, Sarvesh Kumar, Jitendra Joshi, Vani Madhur and Sumaila Khan

Prolonging Network Lifetime by Electing Suitable Cluster Head for Weighted Clustering Algorithm in MANET 2915 Vijayanand Kumar and Rajesh Kumar Yadav

Energy Efficient Reactive Protocol for Data Aggregation in Wireless Sensor Network 2921 Rajesh Kumar Yadav, Daya Gupta and D K Lobiyal

A Novel Energy Efficient Geocast Routing Algorithm for Mobile Ad Hoc Networks 2926 Arvind Kumar, Sushil Kumar and Vipin Kumar

Design and Analysis of Elliptical Slot Loaded Microstrip Antenna for C-Band Communication 2930 Dinesh Kumar Raheja and Binod K Kanaujia

A Page Prefetching Technique Utilizing Semantic Information of Links 2934

Sonia Setia, Jyoti and Neelam Duhan

Cloud Computing Model and its Load Balancing Algorithm 2940

Sakshi and Navtej Singh Ghumman

A Systematic Review of Model-Based Testing in Aspect - Oriented Software Systems 2944 Susheela, Sandeep Dalal and Kamna Solanki

Alignment based Graphical Password Authentication System 2950

Abutalha Danish, Labhya Sharma, Harshit Varshney and Asad Khan

An Efficient Power Aware Routing Protocol for Mobile Ad Hoc Networks using Cluster Head 2955 Pawan, A K Sharma, Rajendra K Sharma and Vinod Jain


A Coverage Hole Healing Scheme to Reduce the Impact of Coverage Holes in a Wireless Sensor Network 2959 Latesh Mehta and Manik Gupta

Fuzzy Logic based Enhancement of Wind Power System 2966

Sajan Varma and Malkiat Singh

A Survey on Cloud Computing and its Type 2971

Sheenam Kamboj and Navtej Singh Ghumman

Field Programmable Analog Array: A Boon for Analog World 2975

Dipti and B V R Reddy

An Optimized Algorithm for Solving Travelling Salesman Problem using Greedy Cross over Operator 2981 Vinod Jain and Jay Shankar Prasad

An Efficient Algorithm to Enhance the Digital/Medical Image using SWT and AMF in Wavelet Transformation Domain 2985 Pawanpreet Kaur and Aarti

Social Networking Sites: Issues and Challenges Ahead 2989

Hardeep Singh and Bikrampal Singh

Black Hole Attack Detection and Prevention Mechanism for Mobile Ad-Hoc Networks 2993 Sandeep Sharma, Siddharth Dhama and Mukul Saini

Detection of Osteoarthritis using SVM Classifications 2997

Sandeep Sharma, Sunpreet Singh Virk and Vibhor Jain

An Embedded System Model for Air Quality Monitoring 3003

Sneha Jangid and Sandeep Sharma

Cognitive Radio Spectrum Sensing 3009

Sandeep Sharma and Anupam Kumar Yadav

Lightweight Multilevel Key Management Scheme for Large Scale Wireless Sensor Network 3014 Akansha Singh, Amit K Awasthi and Karan Singh

Sensor Node Deployment and Coverage Prediction for Underwater Sensor Networks 3018 Anvesha Katti and D K Lobiyal

Energy Efficient Clustering with Secured Data Transmission Technique for Wireless Sensor Networks 3023 Karuna Babber and Rajneesh Randhawa

Comparison Between AODV Protocol and LSGR Protocol in Vanet 3026

Vimlesh and Anurag Singh Baghel

Digital Watermarking using Spatial Domain and Triple DES 3031

Mudita Srivastava, H M Singh, Manish Gupta and Dharm Raj

Composing an Aspect Oriented Approach to Synchronization Problems 3036

Santosh Kumar Gupta, Jaiveer Singh and Manoj Kumar

Web Documents Prioritization using Genetic Algorithm 3042

Santosh Kumar Gupta, Deepti Singh and Amit Doegar


Power Efficiency Analysis of Advanced Modulation Formats 3048

Sandeep Sharma and Luxmi Kant Vishwakarma

Modeling and Performance Analysis of Wireless Channel 3052

Sandeep Sharma, Jaiprakash Nagar and Poonam Singh

Hybrid Routing Protocol for Mobile-Ad Hoc Network in Rural Environments 3058

Vikash and R K Singh

Security in Vehicular Ad hoc Network by using Multiple Operating Channels 3064

Nitish Shukla, Aarti Gautam Dinker, Nihal Srivastava and Ankita Singh

Attacks and Challenges in Wireless Sensor Networks 3069

Aarti Dinker and Vidushi Sharma

Pattern Recognition for Toxic Gases based on Electronic Nose using Artificial Neural Networks 3075

M Sreelatha, G. M. Nasira and P Thangamani

Time Series Modeling and Forecasting: Tropical Cyclone Prediction using ARIMA Model 3080 A Geetha and G M Nasira

Extraction and Dimensionality Reduction of Features for Renal Calculi Detection and Artifact Differentiation from Segmented Ultrasound Kidney Image 3087

Ranjitha M

Improved Fault Tolerance in Workload Execution through Quality Particle Swarm Optimization for Grid Environment 3093 V Indhumathi

Diagnosing of Heart Diseases using Average K-Nearest Neighbor Algorithm of Data Mining 3099

C Kalaiselvi

Joint Approach for Secure Communication using Video Steganography 3104

R Umadevi

Classification and Prediction of Heart Disease Risk using Data Mining Techniques of Support Vector Machine and Artificial Neural Network 3107

S Radhimeenakshi

A Survey on Application of Data Mining Techniques to Analyze the Soil for Agricultural Purpose 3112

N Hemageetha

Particle Swarm Optimization Enabled Filtering for Fabric Images in Automated Fabric inspection System 3118 S Sahaya and Tamil Selvi

Detecting and Preventing Intrusion in Multi-Tier Web Applications using Double Guard 3124

D Seethalakshmi and G M Nasira

Robust Adaptive Watermarking in Video for Protecting Intellectual Properties 3128

Pawan Kumar Mishra and Itti Hooda

A Critical Study on the Role of Unified Power Flow Control in Voltage Power Transfer 3132

Pardeep Rana and C Ram Singla

Cryptography & Security Implementation in Network Computing Environments 3136

Rajanikant Pandey and Vinay Kumar Pandey


A Temporal Domain Audio Steganography Technique using Genetic Algorithm 3141

Manisha Rana and Fateh Bahadur Kunwar

Analysis of the Heat Sink of A Tail Lamp using Finite Element Method 3147

Mohit and Tarun

Photo Acquisition System for GEO-Tagged Photo using Image Compression 3150

Sachin Rathee, Fateh Bahadur Kunwar and Manoj Kumar

Key Parameters Modeling using Bayesian Network in Higher Education: an Indian case based Data Analysis 3153 Raju Ranjan, Jayanthi Ranjan and Fateh Bahadur Kunwar

Number Plate Recognition through Image using Morphological Algorithm 3157

Sandeep Singh and Bikrampal Kaur

2-D Geometric Shape Recognition using Canny Edge Detection Technique 3161

Pahulpreet Kaur and Bikrampal Kaur

Color based Segmentation using K-Mean Clustering and Watershed Segmentation 3165 Ishu Garg and Bikrampal Kaur

Meliorate ACO Routing using Wi-MAX in Vanets 3170

Yogesh and Parminder Singh

A Minimal Spanner and Backoff Approach for Topology and Collision Control in Mesh Networks 3176 Shafi Jasuja and Parminder Singh

Performance Metrics of AODV and OLSR in Wireless Mesh Network 3182

Shivani Kukreja and Parminder Singh

Performance Analysis of Intrusion Detection System 3186

Palamdeep Kaur and Parminder Singh

Analysis and Detection of Byzantine Attack in Wireless Sensor Network 3189

Sukhpreet Kaur and Parminder Singh

Implementation of ICA based Score Level Fusion of Iris and Ear Biometrics 3192

Ramanpreet Kaur, Harsimran Kaur and Shashi Bhushan

Fault Occurrence in Leach Protocol in Wireless Sensor Networks 3197

Jasneet Kaur and Parminder Singh

Sybil Attack in Vanet 3201

Harvinder Kaur, Mandeep Devgan and Parminder Singh

Image Retrieval in Cloud Computing Environment with the Help of Fuzzy Semantic Relevance Matrix 3205 Pawandeep, Hardeep Singh and Surabhi Soni

A Robust 4G/Lte Network Authentication for Realization of Flexible and Robust Security Scheme 3211 Niharika Singh and Mandeep Singh Saini

A Flexible Security Architecture for Mobile Data Offloading 3217

Gunjan Bhatnagar and Mandeep Singh Saini

Robust Login Authentication using Time-Based Otp through Secure Tunnel 3222


Navpreet Kaur, Mandeep Singh Devgan and Shashi Bhushan

Measuring the Data Routing Efficacy in Level-2 Leach Protocol 3227

Jaspreet Kaur and Parminder Singh

Image Classification using NPR and Comparison of Classification Results 3231

Pankaj Mehta, Purushottam Das, Ashish Bajpai and Ankur Singh Bist

Web Data Mining- A Perspective of Research Issues and Challenges 3235

Kavita, Priyanka Mahani and Neelam Ruhil

Online Voting System Linked with Aadhar 3239

Vishal, Vibhu Chinmay, Rishabh Garg and Poonam Yadav

Digital Watermarking - Effectiveness and Implications on Anti-Circumvention 3241 Prateek Chakraverty

Smart Farming Pot 3247

Shweta Saini, Pooja Kumari, Pooja Yadav, Parul Bansal and Neelam Ruhil

Photo of A Visitor using Whatsapp 3251

Amit Saini, Akansha Marwah and Neelam Ruhil

A Review Paper on Automatic Energy Meter Reading System 3254

Nitesh Rawat, Bhuvesh Yadav, Neha and Sonia

Blood Vessel Detection for Diabetic Retinopathy 3258

Poonam, Poonam Yadav and Neelam Ruhil

A Review and Comparison of Aodv, Dsr and Zrp Routing Protocols on the Basis of Qualitative Metrics 3262 Parvinder Kaur, Dalveer Kaur and Rajiv Mahajan

Comparison Study of Routing Protocol in Wireless Sensor Network- A Road Map 3267 Mohit Angurala and Ankita Saini

A Comparative Study Between Leach and Pegasis- A Review 3271

Mohit Angurala and Bharti

Ttp based Vivid Protocol Design for Authentication and Security for Cloud 3275

Akhilesh Kumar Bhardwaj, Rajiv Mahajan and Surender

CBSE Approach using Clean Room Software Engineering for Intrusion Detection System 3279 Mohit Angurala and Geetanjali Sharma

Authentication based Secure Protocol using Ttp for Wmns 3286

Parveen Sharma, Rajiv Mahajan and Surender

Comparative Analysis of Application and Protocols in WSNs 3291

Amandeep Kaur

An Overview of Diverse Protocols and Probing of Malicious Node using Aodv in Vanet Environment 3296 Inderpreet Kaur, Rajeev Bedi and R C Gangwar

Wireless Sensor Networks and Security 3301

Sonia Sharma


Disaster Management and Business Continuity in Indian Banking Sector 3305

Tejinder Pal Singh Brar, Sawtantar Singh and Dhiraj Sharma

Novel Scheme for Stable Routing using Battery Status 3309

Ashwani Kush and Nancy Garg

Detecting Malicious Node in Network using Packet Delivery Ratio 3313

Sanjay Tyagi, Girdhar Gopal and Vikas Garg

Body Area Networks: A Survey 3319

Kriti Chaudhary and Divya Sharma

Secret Key Establishment for Self Healing of Adhoc Networks 3324

Rosy Pawar and Ashwani Kush

Scheme to Make Aodv and Dsr Energy Efficient 3331

Vishal Dattana, Ashwani Kush and Ruchika

Energy Efficient Virtual Machine Migrations based on Genetic Algorithm in Cloud Data Center 3335 Inderjit Singh Dhanoa and Sawtantar Singh Khurmi

Development of Navigational Structure for Buildings from their Valid 3D Citygml Models 3341 Geetika and N L Sarda

Four-Tier Network Architecture for Wireless Sensor and Actor Networks 3347 Nisha and Mayank Dave

A Novel Authentication Algorithm for Vertical Handoff in Heterogeneous Wireless Networks 3352 Suman

Handover Algorithm for Heterogeneous Networks 3358

Suman Deswal and Anita Singhrova

Realization of Wireless Sensor Network through Real Time Test Bed 3365

Shalu and Amita Malik

Performance of Integrated Signature Verification Approach: Review 3369

Surjeet Dalal and Upasna Jindal

Developing Human Family Tree with Swrl Rules 3374

Ranjna Jain and Neelam Duhan

Dwban: Dynamic Priority based Wban Architecture for Healthcare System 3380 Sapna Gambhir and Madhumita Kathuria

A Novel Approach for Rank Optimization using Search Engine Transaction Logs 3387 Shipra Kataria

Performance of Static Sink in Wireless Sensor Networks when Implementing Geographical Routing 3394 Priyanka Chugh Shivanka, S Indu, Anupam Joshi, Abhinav Singh and Rahul Rajpal

Detection and Prevention of Black Hole Attacks in Cluster based Wireless Sensor Networks 3399 Prachi Dewal, Gagandeep Singh Narula and Vishal Jain

An Efficient Personalized Query Suggestion Technique for Providing Relevant Results 3404


Shilpa Sethi and Ashutosh Dixit

Ontology Driven Web Search Cerebration System 3409

Tushar Atreja, Komal Kumar Bhatia and Jyoti

Efficient Concurrency Control Mechanism for Distributed Databases 3415

Parul Tomar and Suruchi

Semantic Related Tag Recommendation using Folksonomized Ontology 3419

Usha Yadav, Jaipreet Kaur and Neelam Duhan

An Approach for Ranking Search Results in Digital Library using Bookmark 3424

Sumita Gupta, Neelam Duhan and Poonam Bansal

Design of Focused Crawler based on Feature Extraction, Classification and Term Extraction 3430 Shilpi Gupta

A Survey of Semantic Similarity Measuring Techniques for Information Retrieval 3435

Mamta Kathuria, C K Nagpal and Neelam Duhan

A Novel Approach for Key Distribution through Fingerprint based Authentication using Mobile Agent 3441 Umesh Kumar and Sapna Gambhir

Monitoring Ambient Light Conditions of A School using IoT 3446

Akansha Chitransh, Garima Singh and Gundeep Tanwar

Software Design for Social Profile Matching Algorithm to Create Ad-Hoc Social Network on Top of Android 3450 Sapna Gambhir, Samridhi Mangla and Nagender Aneja

Channel Capacity of Different Adaptive Transmission Techniques Over Log-Normal Shadowed Fading Environment 3455 P K Verma, Priyanka Jain, Sanjay Soni and R S Raw

The Performance Analysis of N-S Architecture to Mitigate Ddos Attack in Cloud Environment 3460 Nagaraju Kilari and Sridaran

A Survey on Data Center Network Virtualization 3464

Grishma Ghoda, Tirth Gajjar, Dhaivat Dave, Nayana Meruliya, Disha H Parekh and R Sridaran

Distance, Energy and Storage Efficient Dynamic Load Balancing Algorithm in Cloud Computing 3471 Maulik Parekh, Nootan Padia and Amit Kothari

Component Safety Assessment using Three State Markov Model 3476

Gandi Satyanarayana and P Seetharamaiah

Channel Estimation using Iterative Extended Kalman Filter for Superposition Coded Modulation System 3482 Rashmi and Mrinal Sarvagya

ARP Cache Rectification for Defending Spoofing and Poisoning Attacks 3487

Alok Pandey and Jatinderkumar R Saini

Moving Word Cloud from Visual towards Text Analysis to Endow Elearning 3493

Shailaja Jayashankar and Sridaran


Simulation based Analysis of Location Update Strategies in Mobile Computing with Analytical Modeling 3499 Kalpesh A Popat and Priyanka Sharma

Study of Possible Attacks on Image and Video Watermark 3505

Venugopala P S, Shravya Jain, Sarojadevi H and Niranjan N C

System Safety Assessment with the Markov Chain Model and Ternary Decision Diagram 3511 Gandi Satyanarayana and P Seetharamaiah

Performance Evaluation of Traditional TCP Variants in Wireless Multihop Networks 3517 Sunil Lalchand Bajeja and Atul M Gonsai

Hit on Private Data Access and Distinctive Cryptographic Mechanism - A Survey 3523 Dikshan N Shah and R Sridaran

Simulation based Study of Gray Hole Attack in MANET 3529

Joshi Shraddha Dipakkumar, Ashish Kumar Srivastava and Sunil K Vithlani

A Survey of Outlier Detection Algorithms for Data Streams 3535

Jinita Tamboli and Madhu Shukla

Consolidated Study & Analysis of Different Clustering Techniques for Data Streams 3541 Meghnesh Jayswal and Madhu Shukla

Analysis and Impact of Cyber Threats on Online Social Networks 3548

Seema D Trivedi, Dhaivat Dave and R Sridaran

Video Watermarking for Android Mobile Devices 3554

Venugopala P S, Ankitha A Nayak, Sarojadevi H and Niranjan N Chiplunkar

A Survey of Game based Strategies of Resource Allocation in Cloud Computing 3561 Husain Godhrawala and R Sridaran

Novel Approach to Improvise Congestion Control over Vehicular Ad Hoc Networks (VANET) 3567 Bhargavi Goswami and Saleh Asadollahi

Ontology based Framework for Detecting Ambiguities in Software Requirement Specifications 3572 M P S Bhatia, Akshi Kumar and Rohit Beniwal

Optimization of Revenue Generated by Hydro Power Plant by Bat Algorithm 3576

Shrey Gupta and Kapil Sharma

Clustering based Feature Selection Methods from Fmri Data for Classification of Cognitive States of the Human Brain 3581 Awijit Gupta, Arjun Gupta and Kapil Sharma

Comparison of Generalized and Big Data Business Intelligence Tools 3585

Parth Nagar, Labhansh Atriwal, Himanshi Mehra and Sandeep Tayal

Quality Issues with Big Data Analytics 3589

Sangeeta and Kapil Sharma

Survey and Evaluation of Food Recommendation Systems and Techniques 3592

Akshi Kumar, Pulkit Tanwar and Saurabh Nigam


Evolution of Spark Framework for Simplifying Big Data Analytics 3597

Subhash Kumar

Software Bug Localization using Pachinko Allocation Model 3603

Tanu Sharma, Kapil Sharma and Tapan Sharma

Swot Analysis of Search based Software Engineering 3609

Abhilasha Sharma and Yogita Khatri

Application of Artificial Bee Colony Algorithm using Hadoop 3615

Nupur Bansal, Sanjay Kumar and Ashish Kumar Tripathi

A Comparative Study of Classification Techniques: Support Vector Machine, Fuzzy Support Vector Machine & Decision Trees 3620

Priyank Pandey and Amita Jain

Classification Techniques for Big Data: A Survey 3625

Priyank Pandey, Manoj Kumar and Prakhar Srivastava

5G Millimeter Wave (Mmwave) Communications 3630

S K Agrawal and Kapil Sharma

Efficient Routing Algorithm using Sectorized Antenna for Mobile Ad-Hoc Networks 3635

Viomesh Kumar Singh, Goldie Gabrani and Sachin Dubey

An Approach to Handle Big Data Analytics using Potential of Swarm Intelligence 3640

Sonu Lal Gupta, Sofia Goel and Anurag Singh Baghel

Ontology based Framework for Reverse Engineering of Conventional Softwares 3645

M P S Bhatia, Akshi Kumar and Rohit Beniwal

A Comparative Study of Wind Power Forecasting Techniques - A Review Article 3649

Madan Mohan Tripathi and Jyoti Varanasi

On using Reviews and Comments for Cross Domain Recommendations and Decision Making 3656 Mala Saraswat, Shampa Chakraverty, Namrata Mahajan and Nikita Tokas

Robot Navigation - Review of Techniques and Research Challenges 3660

Swati Aggarwal, Kushagra Sharma and Manisha Priyadarshni

Applications of Fuzzy Learning Automata: A Review 3666

Vidushi Gupta and Swati Aggarwal

Uncertain Data Mining : A Review of Optimization Methods for Uk-Means 3672

Swati Aggarwal, Nitika Agarwal and Monal Jain

Quantum Cryptanalysis using Digital Ant in Pervasive Environment 3678

Suruchi Sinha, D. Shantha Devi, Shukun Tokas and Vanita Kareer

Disease Recognition and Classification from Movement Patterns 3682

Garima Bhatia and Sangeeta Rani

Design of Random Image Slicer using Implementation on Steganography 3688

Tanvi Chavan and Umesh Kulkarni


Finding the Malicious Urls using Search Engines 3692

Amruta Rajeev Nagaonkar and Umesh L Kulkarni

Effect of Analysis and Design Phase Factors on Testing of Object Oriented Software 3695 Sanjeev Patwa and Anubha Jain

Big Data Protection Via Neural and Quantum Cryptography 3701

Deepshikha Sharma and Amita Sharma

A Review on Quality Models to Analyse the Impact of Refactored Code on Maintainability with Reference to Software Product Line 3705

U Devi, A Sharma and N Kesswani

A Statistical Analysis of Impact of Metadata in Data Warehousing 3709

Vijay Gupta

Design and Development of An Intelligent Agent based Framework for Predictive Analysis 3715 Deepshikha Bhargava, Ramesh C Poonia and Upma Arora

Customer Service Experience and Satisfaction in Retail Stores 3719

Pankaj Deshwal and Anish Krishna

Comparative Analysis of Algorithms for Identification of Session on the Basis of Threshold Value 3724 Priyanka Verma

How Privacy Invasive Android Apps Are? 3731

Nishtha Kesswani

Intrusion Detection based on Key Feature Selection using Binary Gwo 3735

Jitendra Kumar Seth and Satish Chandra

Evaluation of Some Recent Image Segmentation Methods 3741

Peeyush Tiwari and Prerna Surbhi

Literature Survey on Different Type of Fingerprint Recognition 3748

Sharad Pratap Singh, Shahanz Ayub and J P Saini

Online Transaction Fraud Detection Techniques: A Review of Data Mining Approaches 3756 Bharat Bhushan Sagar, Pratibha Singh and S Mallika

A Nucleic Filter to Enhance the Security in Cloud Computing Environment 3762

Oinam David Singh, Amit Asthana and Yogesh Kushwaha

Twitter Sentiment Analysis in Healthcare using Hadoop and R 3766

Vijay Shankar Gupta and Shruti Kohli

Review of Job Recommender System using Big Data Analytics 3773

Pooja Tripathi, Ruchi Agarwal and Tanushi Vashishtha

Identifying Evolutionary Approach for Search Result Clustering 3778

Shashi Mehrotra and Shruti Kohli

Machine Learning Techniques for Effective Text Analysis of Social Network e-Health Data 3783 Sonia Saini and Shruti Kohli


Impact of Demographic Factors On Online Purchase Frequency: A Decision Tree Approach 3789 Dilpreet Singh and Sahil Raj

Impact of Bagging on Mlp Classifier for Credit Evaluation 3794

Shashi Dahiya, S S Handa and N P Singh

A Cross Project Source Code Based Risk Analysis by Identifying Fault Prone Modules 3801 Ishleen Kaur and Neha Kapoor

The Analysis of Software Metrics for Design Complexity and Its Impact on Reliability 3808 Aditya Pratap Singh and Pradeep Tomar

Self Controlled Traffic Management using Autonomic System 3813

Puneet Kumar Aggarwal, Prashant Nigam and Vineet Shrivastava

Building Detection and Extraction Techniques: A Review 3816

Anupama Mishra, Akhileshwar Pandey and Anurag Singh Baghel

A Survey on Object Recognition and Segmentation Techniques 3822

Palak Khurana, Anshika Sharma and Shailendra Narayan Singh

Analytical Review on Shadow Detection and Removal in Images and Videos 3827

Sobiya Amin, Arti Tiwari and Abhishek Srivastava

A Unique Data Security using Text Steganography 3834

Savitha D Torvi, Rupam Das and K B Shivakumar

Analytical Review on Video Based Human Activity Recognition 3839

Akansha Mishra, Shailendra Narayan Singh and Uzair Asad

Green Computing Evaluation Process 3845

Monalisa Kushwaha and Shailendra Narayan Singh

Quality Assurance of Component based Software Systems 3850

Ravi Kumar Sharma and Parul Gandhi

Network Security Problems and Security Attacks 3855

Komal Gandhi

Cloud Computing Security Issues: An Analysis 3858

Komal Gandhi and Parul Gandhi

Empirical Analysis of Image Compression using Wavelets, Discrete Cosine Transform and Neural Network 3862 Gaurav Kumar and Pradeep Kumar Bhatia

A Survey on Prospects of Automated Software Test Case Generation Methods 3867

Vishawjyoti and Parul Gandhi

Selection of Query-Utilize Trust-Based Algorithms to Propagate Trust 3872

Jyoti Pruthi

Exhaustive Block Matching Algorithm to Estimate Disparity Between Stereo Images 3876 D Chandra Devi and M Sundaresan


Predictions and Recommendations for the Higher Education Institutions from Facebook Social Networks 3882 Mamta Madan, Meenu Dave and Meenu Chopra

Diabetic Retinopathical Nicking Identification and Classification using Color Fundus Images 3892 Vinitha K and M Sundaresan

Framework Model and Algorithm of Request based One Time Passkey (Rotp) Mechanism to Authenticate Cloud Users in Secured Way 3898

Boopathy D and M Sundaresan

Significant Image Enhancement Technique for Removal of Noise in Lidar Images 3904

A Vijaya and M Sundaresan

Performance Analysis of Interpolation Methods for Improving Sub-Image Content-Based Retrieval 3909 A B Dhivya and M Sundaresan

Comparative Study on Wavelet-Based Denoising Techniques for Removing Speckle Noise from Partial Fingerprint Images 3913 P S Meenakshi and M Sundaresan

Recognition of Merged Characters in Text based CAPTCHAs 3917

Rafaqat Hussain, Hui Gao, Kamlesh Kumar and Imran Khan

A Comparative Study Among Colorful Image Descriptors for Content based Image Retrieval 3922 Kamlesh Kumar, Jian-Ping Li, Zain-Ul-Abidin and Imran Khan

Content based Grading of Fresh Fruits using Markov Random Field 3927

Riaz Ahmed Shaikh, Jian-Ping Li, Asif Khan and Imran Khan

Vision based Classification of Fresh Fruits Using Fuzzy Logic 3932

Asif Khan, Jian-Ping Li, Riaz Ahmed Shaikh and Imran Khan

Secured Cloud Database Health Care Mining Analysis 3937

Kissi Mireku Kingsford , Zhang FengLi , Mensah Dennis NiiAyeh, Asif Khan and Riaz Ahmed Shaikh

Computing in Cryptography 3941

Mohammad Ubaidullah Bokhari and Shabbir Hassan

Tools of Development of Expert Systems: A Comparative Study 3947

Haider Khalaf Jabbar and Rafiqul Zaman Khan

Power Management for Android Platform by Set CPU 3953

Muhammad Hammad Memon, Muhammad Hunain Memon, Asif Khan, Riaz Ahmed Shaikh and Imran Khan

Building Better e-Learning Environment using HTML5 3959

Mohammad Ubaidullah Bokhari, Hassan Faisal Aldheleai and Yahya Kord Tamandani

An Analysis of Software Requirements Prioritization Techniques: A Detailed Survey 3966

Masooma Yousuf, Mohammad Ubaidullah Bokhari and Md Zeyauddin

Comparative Study of Data Mining Tools Used for Clustering 3971

Parvej Aalam and Tamanna Siddiqui

Sense Adaptive Multimodal Information Fusion: A Proposed Model 3976

Mohammad Ubaidullah Bokhari and Faraz Hasan


An Effective Model for Big Data Analytics 3980

Mohammad Ubaidullah Bokhari

A Novel Task Scheduling Algorithm for Parallel System 3983

Zaki Ahmad Khan, Jamshed Siddiqui and Abdus Samad

Human Speech Sentiments Recognition: A Data Mining Approach for Categorization of Speech 3987 Ritika Gupta and Gaurav Aggarwal

A Review to Different Approaches Used for Devanagari Characters Segmentation 3992

Ankita Srivastav and Neha Sahu

Review of Brain Tumor Detection from MRI Images 3997

Deepa and Akansha Singh

Gender Specific Speech Feature Variation in Hindi and English Language 4001

Sumalata Gautam and Surbhi Dewan

Segmentation of Devanagari Characters 4005

Mayank Sahai and Neha Sahu

A Comparative Study: Spectral Parameter in Speech of Intellectually Disabled and Normal Population 4009 Sumanlata Gautam and Latika Singh

Image Processing Techniques for Offline Handwritten Recognition 4014

Shivani Sihmar and Poonam Sharma

A Review of Exudates Detection using Retinal Images 4019

Mamta Rawat and Akansha Singh

Data Mining in Healthcare Informatics: Techniques and Applications 4023

Tanvi Anand, Rekha Pal and Sanjay Kumar Dubey

To Recognize and Analyse Spam Domains from Spam Emails by Data Mining 4030

Kavita Patel and Sanjay Kumar Dubey

Modified Hierarchical Clustering Algorithm for Time Series Data 4036

Sangeeta Rani

Metrics based Evaluation of Mobile Application using AHP Entropy Model 4041

Kirti Sharawat and Sanjay Kumar Dubey

Analytical and Critical Approach for Usability Measurement Method 4045

Kritika Puri and Sanjay Kumar Dubey

A Systematic Review of Performance Analysis and Implementation of OSPFV3 in Ipv6 Network 4051 Sheikh Raashid Javid and Sanjay Kumar Dubey

Metrics based Usability Evaluation for Adoption and Usage of Mobile Devices and Services By Elderly Population 4058 Ruchika Singh and Sanjay Kumar Dubey

Usability Evaluation of Mobile Phones By using Ahp-Entropy Approach 4063

Divyaa S K, Neha Yadav and Sanjay Kumar Dubey


Big Data with Integrated Cloud Computing for Healthcare Analytics 4068

Rajesh Jangade and Ritu Chauhan

Integrating Neural Networks with Software Reliability 4072

Deepak Kumar

A Novel Comparison between the Optical and Structural Characteristics of ZnO and NiO Thin Films 4078 Surabhi Gupt, Dikshant Pandey, Shail K. Gupta and Shruti Aggarwal

DDoS Attack Algorithm using ICMP Flood 4082

Neha Gupta, Ankur Jain, Pranav Saini and Vaibhav Gupta

 


Software Sizing, Cost, Schedule, and Risk...the 10 Step Process

Daniel D. Galorath

CEO

Galorath Incorporated

100 N. Sepulveda Blvd., Ste. 1801

El Segundo, CA 90245

P: 310.414.3222

F: 310.414.3220

galorath@galorath.com 

Abstract

An effective software estimate provides the information needed to design a workable software development plan. How well the project is estimated is ultimately the key to the project’s (and product’s) success. An effective software estimate provides important information for making project decisions, projecting performance, and defining objectives and plans. Without the proper guidance in a project, the results could be disastrous.

The focus of this paper is how to make software projects more successful by properly estimating and planning costs, schedules, risks, and resources. It begins by covering the fundamental problems of unreasonable software estimation: not planning up front, failing to use viable estimates as the basis of an achievable project plan, not updating the plan and estimates when a project changes, and failing to consider the uncertainties inherent in estimates. Most estimates are prepared early in the life cycle of a project, when there are typically a large number of undefined areas related to the project. The steps presented in this paper provide a complete method for developing estimates and plans.

This paper proposes a 10-step estimation process that begins by addressing the need for project metrics and the fundamental software estimation concepts. It shows how to build a viable project estimate, covering the work involved in actually generating one: sizing the software, generating the software project estimate itself, and performing risk/uncertainty analysis. Finally, the process rounds out with a discussion on validating the estimate, capturing lessons learned, and using the estimate throughout the project. Based on the book Software Sizing, Estimation, and Risk Management: When Performance is Measured Performance Improves, by Daniel D. Galorath and Michael W. Evans (Auerbach Publications, February 2006, ISBN: 0849335930).


Introduction

Too many software projects fail. And more of these failures are due to planning inadequacies than to unachievable technical objectives, failed technology, or infeasible requirements.

A software estimation process that is integrated with the software development process can help projects establish realistic and credible plans to implement the project requirements and satisfy commitments. It also can support other management activities by providing accurate and timely planning information.

Many elements are involved in determining the structure of a project, including requirements, architecture, quality provisions, and staffing mix. Perhaps the most important element in the success or failure of a project is the estimate of its scope, in terms of both the time and cost that will be required, together with the plans based on that estimate. The estimate drives every aspect of the project, constrains the actions that can be taken in the development or upgrade of a product, and limits available options. Although many people think they can estimate project scope based on their engineering or management experience, most off-the-cuff estimates are incorrect: they are most often based on simple assumptions and over-optimism or, worse, are made to accord with what others want to hear. Needless to say, such estimates often lead to disaster.

If the estimate is unrealistically low, the project will be understaffed from its outset and, worse still, the resulting excessive overtime or staff burnout will cause attrition and compound the problems facing the project. Overestimation is not the answer. Indeed, overestimating a project can have the same effects as any other inaccurate estimate.

The definition of the verb to estimate is to produce a statement of the approximate value of some quantity. Estimates are based upon incomplete, imperfect knowledge and assumptions about the future. Most importantly, however, all estimates have uncertainty.

Ideally an estimate should be produced using the ten-step process described in Figure 1.

Figure 1: 10 Step Estimation Process

Step One: Establish Estimate Scope and Purpose

Define and document estimate expectations. When all participants understand the scope and purpose of the estimate, you’ll not only have a baseline against which to gauge the effect of future changes; you’ll also head off misunderstandings among the project group and clear up contradictory assumptions about what is expected.

Documenting the application specifications, including technical details, external dependencies, and business requirements, will provide valuable input for estimating the resources required to complete the project. The more detailed the specs, the better. Only when these requirements are known and understood can you establish realistic development costs.

An estimate should be considered a living document; as data changes or new information becomes available, it should be documented and factored into the estimate in order to maintain the project’s integrity.

Step Two: Establish Technical Baseline, Groundrules, and Assumptions

To establish a reasonable technical baseline, you must first identify the functionality included in the estimate. If detailed functionality is not known, groundrules and assumptions should clearly state what is and isn’t included in the estimate. Issues of COTS, reuse, and other assumptions should be documented as well.

Groundrules and assumptions form the foundation of the estimate and, although in the early stages of the estimate they are preliminary and therefore rife with uncertainty, they must be credible and documented. Review and redefine these assumptions regularly as the estimate moves forward.

Step Three: Collect Data

Any estimate, by definition, encompasses a range of uncertainty, so you should express estimate inputs as least, likely, and most rather than characterizing them as single data points. Using ranges for inputs permits the development of a viable initial estimate even before you have fully defined the scope of the system you are estimating.

Certain core information must be obtained in order to ensure a consistent estimate. Not all data will come from one source and it will not all be available at the same time, so a comprehensive data collection form will aid your efforts. As new information is collected, you will already have an organized and thorough system for documenting it.
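To make the range convention concrete, here is a minimal Python sketch, offered only as an illustration: the RangeInput structure, its field names, and the PERT-style weighting are assumptions of this sketch, not part of the ten-step process itself.

    from dataclasses import dataclass

    @dataclass
    class RangeInput:
        """An estimate input expressed as a least/likely/most range."""
        name: str
        least: float
        likely: float
        most: float

        def expected(self) -> float:
            # PERT-style weighted mean: the likely value carries most
            # of the weight, but the tails still pull the expectation.
            return (self.least + 4 * self.likely + self.most) / 6

        def spread(self) -> float:
            # Common rough approximation: one sixth of the total range.
            return (self.most - self.least) / 6

    # Example: new code size collected as a range, not a single point.
    size = RangeInput("new SLOC", least=20_000, likely=32_000, most=60_000)
    print(f"{size.name}: expected {size.expected():,.0f} +/- {size.spread():,.0f}")

Collected this way, every input carries its uncertainty with it, and the estimate can be refined later simply by narrowing the ranges as better data arrives.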

Step Four: Software Sizing

If you lack the time to complete all the activities described in the ten-step process, prioritize the estimation effort: spend the bulk of the available time on sizing (sizing databases and tools like SEER-AccuScope can save time in this process). An automated software cost and schedule tool like SEER-SEM can also save the analyst time; its knowledge bases, for example, speed up data collection.

Size is generally the most significant (but certainly not the only) cost and schedule driver. The overall scope of a software project is defined by identifying not only the amount of new software that must be developed, but also the amount of preexisting, COTS, and other software that will be integrated into the new system. In addition to estimating product size, you will need to estimate any rework that will be required to develop the product, which will generally be expressed as source lines of code (SLOC) or function points, although there are other possible units of measure. To help establish the overall uncertainty, the size estimate should be expressed as a least-likely-most range.

Predicting Size

Whenever possible, start the process of size estimation using formal descriptions of the requirements such as the customer’s request for proposal or a software requirements specification. You should reestimate the project as soon as more scope information is determined. The most widely used methods of estimating product size are:

Expert opinion — This is an estimate based on recollection of prior systems and assumptions regarding what will happen with this system, and the experts’ past experience.

Analogy — A method by which you compare a proposed component to a known component it is thought to resemble, at the most fundamental level of detail possible. Most matches will be approximate, so for each closest match, make additional size adjustments as necessary. A relative sizing approach such as SEER-AccuScope can provide viable size ranges based on comparisons to known projects.

Formalized methodology — Use of automated tools and/or pre-defined algorithms, such as counting the number of subsystems or classes and converting them to function points.

Statistical sizing — Provides a range of potential sizes that is characterized by least, likely, and most.

Use the Galorath sizing methodology to quantify size and size uncertainty. This includes preparing as many size estimates as time permits and putting them all in a table (Figure 2), then choosing the size range from the variety of sources.

 


Figure 2: Galorath Size Methodology Identifies Size Ranges From Multiple Methods

Step Five: Prepare Baseline Estimate

Budget and schedule are derived from estimates, so if an estimate is not accurate, the resulting schedules and budgets are likely to be inaccurate also. Given the importance of the estimation task, developers who want to improve their software estimation skills should understand and embrace some basic practices. First, trained, experienced, and skilled people should be assigned to size the software and prepare the estimates. Second, it is critically important that they be given the proper technology and tools. And third, the project manager must define and implement a mature, documented, and repeatable estimation process.

To prepare the baseline estimate there are various approaches that can be used, including guessing (which is not recommended), using existing productivity data exclusively, the bottom-up approach, expert judgment, and cost models.

Bottom-Up Estimating: Bottom-up estimating, which is also referred to as “grassroots” or “engineering” estimating, entails decomposing the software to its lowest levels by function or task and then summing the resulting data into work elements. This approach

 


can be very effective for estimating the costs of smaller systems. It breaks down the required effort into traceable components that can be effectively sized, estimated, and tracked; the component estimates can then be rolled up to provide a traceable estimate that is comprised of individual components that are more easily managed. You thus end up with a detailed basis for your overall estimate.
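
As a minimal sketch of the rollup just described, the following C fragment sums hypothetical work-element estimates from a decomposition into a single traceable total; the task names and staff-hour figures are invented for illustration.

#include <stdio.h>

/* Hypothetical work elements from a bottom-up decomposition. */
typedef struct {
    const char *task;
    double effort_hours; /* estimated staff-hours for this element */
} work_element;

int main(void)
{
    work_element wbs[] = {
        { "parse input",   120.0 },
        { "core logic",    340.0 },
        { "report output",  90.0 },
    };
    int n = (int)(sizeof wbs / sizeof wbs[0]);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += wbs[i].effort_hours; /* roll component estimates up */
    printf("rolled-up effort: %.0f staff-hours\n", total);
    return 0;
}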

Software cost models: Different cost models have different information requirements. However, any cost model will require the user to provide at least a few — and sometimes many — project attributes or parameters. This information describes the project, its characteristics, the team’s experience and training levels, and various other attributes the model requires to be effective, such as the processes, methods, and tools that will be used.

Parametric cost models provide a means for applying a consistent method for subjecting uncertain situations to rigorous mathematical and statistical analysis. Thus they are more comprehensive than other estimating techniques and help to reduce the amount of bias that goes into estimating software projects. They also provide a means for organizing the information that serves to describe the project, which facilitates the identification and analysis of risk.

A cost model uses various algorithms to project the schedule and cost of a product from specific inputs. Those who attempt merely to estimate size and divide it by a productivity factor are missing the mark: the people, the products, and the process are all key components of a successful software project. Cost models range from simple, single-formula models to complex models that involve thousands of calculations.
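
As a rough illustration of what "a cost model uses various algorithms" can mean, the sketch below evaluates a deliberately tiny parametric form, effort = a * (KSLOC)^b, in the spirit of COCOMO-style models. The coefficients are hypothetical stand-ins for the many calibrated parameters a real model such as SEER-SEM would use; this is not that model's actual mathematics.

#include <math.h>
#include <stdio.h>

/* Tiny parametric cost model: effort = a * (KSLOC)^b.
   The coefficients below are hypothetical, chosen only for illustration. */
static double effort_staff_months(double ksloc, double a, double b)
{
    return a * pow(ksloc, b);
}

int main(void)
{
    double e = effort_staff_months(18.0, 2.94, 1.10);
    printf("estimated effort: %.1f staff-months\n", e);
    return 0;
}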

Delphi and Wideband Delphi: You can offset the effects of estimator bias by implementing the Delphi estimation method, in which several expert teams or individuals, each with an equal voice and an understanding up front that there are no correct answers, start with the same description of the task at hand and generate estimates anonymously, repeating the process until consensus is reached.

Activity-Based Estimates: Another way to estimate the various elements of a software project is to begin with the requirements of the project and the size of the application, and then, based on this information, define the required tasks, which will serve to identify the overall effort that will be required.

The major cost drivers on a typical project are the non-coding tasks, which must be adequately considered, planned for, and included in any estimate of required effort. Of course, not every project will require all of these tasks. Tailor the list to the specific requirements of your project, adding and deleting tasks as necessary and modifying task descriptions if required, and then build a task hierarchy — which usually takes the form of a WBS — that represents how the work will be organized and performed. The resulting work breakdown structure is the backbone of the project plan and provides a means to identify the tasks to be implemented on a specific project. It is

 


not a to-do list of every possible activity required for the project; it does provide a structure of tasks that, when completed, will result in satisfaction of all project commitments.

Step Six: Quantify Risks and Risk Analysis

It is important to understand what a risk is: a risk, in itself, does not necessarily pose a threat to a software project if it is recognized and addressed before it becomes a problem. A risk is characterized by a potential loss of time, quality, money, control, understanding, and so on. The loss associated with a risk is called the risk impact.

We must also have some idea of the probability that the event will occur. The likelihood of the risk, measured from 0 (impossible) to 1 (certainty), is called the risk probability. When the risk probability is 1, the risk is called a problem, since it is certain to happen.

For each risk, we must determine what we can do to minimize or avoid the impact of the event. Risk control involves a set of actions taken to reduce or eliminate a risk.
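
The two quantities defined above, risk probability and risk impact, are commonly multiplied into a risk exposure in order to rank risks; the sketch below assumes that common convention, and the risks listed are invented.

#include <stdio.h>

/* Risk exposure = probability x impact, using the quantities defined above. */
typedef struct {
    const char *name;
    double probability;  /* 0 (impossible) .. 1 (certain: a problem) */
    double impact_weeks; /* schedule loss if the risk becomes a problem */
} risk;

int main(void)
{
    risk risks[] = {
        { "requirements creep", 0.6,  8.0 },
        { "COTS integration",   0.3, 12.0 },
    };
    int n = (int)(sizeof risks / sizeof risks[0]);
    for (int i = 0; i < n; i++)
        printf("%-20s exposure = %4.1f weeks\n",
               risks[i].name, risks[i].probability * risks[i].impact_weeks);
    return 0;
}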

Risk management enables you to identify and address potential threats to a project, whether they result from internal issues or conditions or from external factors that you may not be able to control. Problems associated with sizing and estimating software can potentially have dramatic negative effects. The key word here is potentially: if problems can be foreseen and their causes acted upon in time, their effects can be mitigated. The risk management process is the means of doing so.

Step Seven: Estimate Validation and Review

At this point in the process, your estimate should already be reasonably good. It is still important to validate your methods and your results; validation is simply a systematic confirmation of the integrity of an estimate. By validating the estimate, you can be more confident that your data is sound, your methods are effective, your results are accurate, and your focus is properly directed.

There are many ways to validate an estimate. Both the process used to build the estimate and the estimate itself must be evaluated. Ideally, the validation should be performed by someone who was not involved in generating the estimate and who can therefore view it objectively. The analyst validating an estimate should employ different methods, tools, and separately collected data from those used in the estimate under review.

When reviewing an estimate you must assess the assumptions made during the estimation process. Make sure that the adopted ground rules are consistently applied throughout the estimate. Below-the-line costs and the risk associated with extraordinary requirements may have been underestimated or overlooked, while productivity estimates may have

 


been overstated. The slippery slope of requirements creep may have created more uncertainty than was accounted for in the original estimate.

A rigorous validation process will expose faulty assumptions, unreliable data and estimator bias, providing a clearer understanding of the risks inherent in your projections. Having isolated problems at their source, you can take steps to contain the risks associated with them, and you will have a more realistic picture of what your project will actually require to succeed.

Despite the costs of performing one, a formal validation should be scheduled into every estimation project, before the estimate is used to establish budgets or constraints on your project process or product engineering. Failing to do so may result in much greater downstream costs, or even a failed project.

Step Eight: Generate A Project Plan

The process of generating a project plan includes taking the estimate and allocating the cost and schedule to a function- and task-oriented work breakdown structure.

To avoid tomorrow’s catastrophes, a software manager must confront today’s challenges. A good software manager must possess a broad range of technical software development experience and domain knowledge, and must be able to manage people and the unique dynamics of a team environment, recognize project and staff dysfunction, and lead so as to achieve the expected or essential result.

Some managers, mainly due to lack of experience, are unable to evaluate what effects their decisions will have over the long run. They either lack the necessary information, or incorrectly believe that if they take the time to develop that information the project will suffer as a result. Other managers make decisions based on what they think higher management wants to hear; this is a significant mistake. A good software manager understands what a project can realistically achieve, even if it is not what higher management wants, and explains that reality in language management can understand. Both types of "problem manager," although they may mean well, either lead a project to an unintended conclusion or, worse, drift down the road to disaster.

Software management problems have been recognized for decades as the leading causes of software project failures. In addition to the types of management choices discussed above, three other issues contribute to project failure: bad management decisions, incorrect focus, and destructive politics. Models such as SEER-SEM handle these issues by guiding you in making appropriate changes in the environment related to people, process, and products.

Step Nine: Document Estimate and Lessons Learned

 


Each time you complete an estimate, and again at the end of the software development, you should document the pertinent information that constitutes the estimate and record the lessons you learned. By doing so, you will have evidence that your process was valid and that you generated the estimate in good faith, and you will have actual results with which to calibrate your estimation models. Be sure to document any missing or incomplete information, the risks, issues, and problems that the process addressed, and any complications that arose. Also document all the key decisions made during the conduct of the estimate, their results, and the effects of the actions you took. Finally, describe and document the dynamics that occurred during the process, such as the interactions of your estimation team, the interfaces with your clients, and the trade-offs you had to make to address issues identified during the process.

You should conduct a lessons-learned session as soon as possible after the completion of a project while the participants’ memories are still fresh. Lessons-learned sessions can range from two team members meeting to reach a consensus about the various issues that went into the estimation process to highly structured meetings conducted by external facilitators who employ formal questionnaires. No matter what form it may take, it is always better to hold a lessons-learned meeting than not, even if the meeting is a burden on those involved. Every software project should be used as an opportunity to improve the estimating process.

Step Ten: Track Project throughout Development

In-process information should be collected, and the project should be tracked and compared to the original plan. If a project varies far from its plan, refined estimates should be prepared. Ideally, the following attributes of a software project would be tracked:

1. Cost, in terms of staff effort, phase effort and total effort

2. Defects found or corrected, and the effort associated with them

3. Process characteristics such as development language, process model and technology

4. Project dynamics including changes or growth in requirements or code and schedule

5. Project progress (measuring performance against schedule, budget, etc.)

6. Software structure in terms of size and complexity

Earned value, combined with quality and growth measures, can be used to forecast completion quite accurately and to flag the areas where managers should focus their control efforts.
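
As one hedged illustration of forecasting completion from earned value, the fragment below applies the standard cost performance index, CPI = EV / AC, and the estimate at completion, EAC = BAC / CPI. The figures are hypothetical, and this is only one of several standard earned-value forecasting formulas.

#include <stdio.h>

int main(void)
{
    double bac = 1000.0; /* budget at completion */
    double ev  =  400.0; /* earned value: budgeted cost of work performed */
    double ac  =  500.0; /* actual cost of work performed */
    double cpi = ev / ac; /* cost performance index */
    printf("CPI = %.2f, forecast EAC = %.0f\n", cpi, bac / cpi);
    return 0;
}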

Summary

Software cost estimation is a difficult process, but a necessary part of successful software development. You can help ensure useful results by adopting a process that is standardized and repeatable. Several of the steps we have discussed, particularly those that do not result directly in the production of the estimate (Steps 1, 6, and 7), are often deferred or, worse still, not performed at all, often for what appear to be good reasons,

 


such as a lack of adequate time or resources, or a reluctance to face the need to devise a plan if a problem is detected. Sometimes you simply have more work than you can handle, and such steps do not seem absolutely necessary. Sometimes management is reluctant to take these steps, not because the resources are unavailable, but because managers do not really want to know what they may learn by scoping their estimates, quantifying and analyzing risks, or validating their estimates. This can be a costly attitude, because in reality every shortcut results in a dramatic increase in project risk.

Server-Side I/O Coordination for Parallel File Systems


Huaiming Song†*, Yanlong Yin†, Xian-He Sun†, Rajeev Thakur$, Samuel Lang$

†Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616, USA

$Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA

{huaiming.song, yyin2, sun}@iit.edu, {thakur, slang}@mcs.anl.gov

 

ABSTRACT

Parallel file systems have become a common component of modern high-end computers to mask the ever-increasing gap between disk data access speed and CPU computing power. However, while working well for certain applications, current parallel file systems lack the ability to effectively handle concurrent I/O requests with data synchronization needs, whereas concurrent I/O is the norm in data-intensive applications. Recognizing that an I/O request will not complete until all involved file servers in the parallel file system have completed their parts, in this paper we propose a server-side I/O coordination scheme for parallel file systems. The basic idea is to coordinate file servers to serve one application at a time in order to reduce the completion time, and in the meantime maintain the server utilization and fairness. A window-wide coordination concept is introduced to serve our purpose. We present the proposed I/O coordination algorithm and its corresponding analysis of average completion time in this study. We also implement a prototype of the proposed scheme under the PVFS2 file system and MPI-IO environment. Experimental results demonstrate that the proposed scheme can reduce average completion time by 8% to 46%, and provide higher I/O bandwidth than that of default data access strategies adopted by PVFS2 for heavy I/O workloads. Experimental results also show that the server-side I/O coordination scheme has good scalability.

Categories and Subject Descriptors

B.4.3 [Interconnections]: Parallel I/O; D.4.3 [File Systems Management]: Access methods

Keywords

server-side I/O coordination; parallel I/O synchronization; I/O optimization; parallel file systems

1. INTRODUCTION

*This author has now joined R&D center, Dawning Information Industrial LLC, Beijing, China. Email: songhm@sugon.com

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SC11, November 12-18, 2011, Seattle, Washington, USA

Copyright 2011 ACM 978-1-4503-0771-0/11/11 ...$10.00.

 

Large-scale data-intensive supercomputing relies on parallel file systems, such as Lustre [1], GPFS [22], PVFS [9], and PanFS [18], for high-performance I/O. However, performance improvements in computing capacity have vastly outpaced the improvements in I/O performance in the past few decades and will likely continue to do so in the future. Many high-performance computing (HPC) applications have become "I/O bound", unable to scale with increasing compute power. The gap in performance between compute and I/O is amplified further when multiple applications compete for limited I/O and storage resources at the same time, as this leads to thrashing scenarios within the HPC storage system. Parallel file systems have difficulty handling I/O workloads of multiple applications for two primary reasons. First, the file servers perform data accesses in an interleaved fashion, resulting in excessive disk seeks. Second, file servers perform I/O requests independently, without knowledge of the order of requests performed at other servers, whereas HPC applications tend to coordinate I/O across all processes. This scenario leads to under-utilization of compute resources, as all compute processes are held waiting for completion of an I/O request that is delayed by the interleaved scheduling choices made by an individual file server.

In general, data files are striped across all or a part of the file servers in parallel file systems. One I/O request issued from a single client often involves data accesses on multiple servers, and the parallel I/O library has to merge the multiple data pieces from these file servers together. Moreover, collective data access from multiple clients, such as collective I/O in MPI-IO [26], has to wait for all aggregators to complete. Synchronization of I/O requests across processes is common in parallel computing, and can be classified into the following three categories (as shown in Figure 1).

Intra-request Synchronization: One I/O request issued by one client accesses data in multiple file servers. It needs to gather/scatter data pieces from/to multiple storage nodes and merge them together to complete the I/O request.

Collective I/O Synchronization: Multiple I/O clients access data from multiple file servers collectively within a single application. It has to wait for all aggregators to complete their collective I/O operations before continuing.

Inter-request Synchronization: Multiple clients access data from a parallel file system independently, and there is explicit synchronization among these I/O clients.

Figure 1 shows the three scenarios of data synchronization. The first two categories are implicit synchronization and the third one is explicit. In a large-scale and high-performance computing system, the parallel file system is often shared by multiple applications. When these applications run simultaneously, each file server may

 


Figure 1: Three scenarios of data synchronization in parallel I/O

 

receive multiple I/O requests from different applications. However, these requests are likely to be served in different orders on different file servers because they are scheduled independently. Figure 2 gives an example of I/O request scheduling on 4 file servers with 3 applications: A, B, and C. Usually, the completion time of each application depends on the completion time of the last file server to finish its request. In the left subfigure, the nodes serve the requests in different orders, and the completion times of the I/O requests from the three applications are TA = 4t, TB = 4t, and TC = 4t, so the average completion time is Tavg = 4t. If we re-arrange the requests on the file servers, letting all nodes service the requests in the same order, as shown in the right part of Figure 2, the completion times are TA = 2t, TB = 3t, and TC = 4t, and the average completion time is Tavg = 3t. In other words, after request re-ordering the average completion time decreases from 4t to 3t, which reveals a significant potential for shortening average completion time through request re-ordering at the file servers.


Figure 2: Order of request handling affects completion time. In the left subfigure, service order is different on different file servers, and the average completion time for the three applications is 4t. In the right subfigure, requests are serviced in concert, and the average completion time reduces to 3t.

 

Existing scheduling algorithms in parallel file systems, such as disk-directed I/O [13], server-directed I/O [23], and stream-based I/O [11, 21], focus on reducing data access overhead on either storage nodes or network traffic, to improve the throughput of each file server. These approaches have demonstrated the importance of scheduling in parallel file systems to improve performance. However, little attention has been paid to server-side I/O coordination in order to reduce the average completion time of multiple applications competing for limited I/O resources. In this paper, we propose a new server-side I/O coordination scheme for parallel file systems that enables all file servers to schedule requests from different applications in a coordinated way, to reduce the synchronization time across clients for multiple applications.

The contribution of this paper is four-fold. First, we present the data synchronization problems in parallel file systems. Second, we propose an effective server-side I/O coordination scheme for parallel I/O systems to reduce the average completion time of I/O requests, and thus to alleviate the performance penalties of data synchronization. Third, we implement a prototype of the I/O coordination scheme in PVFS2 and MPI-IO. Finally, we evaluate the proposed scheme both analytically and experimentally.

The remainder of this paper is organized as follows. Section 2 examines the overhead of data synchronization without I/O coordination. Section 3 describes the design of the I/O coordination algorithm and gives an analysis of completion time. Section 4 presents the implementation of the proposed I/O scheme in PVFS2 and MPI-IO. Experimental and analytical results are discussed in Section 5. Section 6 reviews related work in server-side I/O scheduling and parallel job scheduling. Finally, Section 7 concludes this study and discusses potential future work.

2. THE IMPACT OF DATA SYNCHRONIZATION

Data synchronization is common in parallel file systems, where I/O requests usually consist of multiple pieces of data access on multiple file servers and will not complete until all involved servers have completed their parts. However, due to independent scheduling strategies on file servers, I/O requests with synchronization

 

[Figure 3 panels: (a) finish time on different file servers (HDD); (b) minimum and maximum finish time (HDD); (c) finish time on different file servers (SSD); (d) minimum and maximum finish time (SSD)]

Figure 3: The finish time of I/O requests from different applications on different file servers. This set of experiments used the intra-request synchronization scenario with 10 concurrent IOR instances and an 8-node PVFS2 system. The stripe size of PVFS2 was 64KB, and each IOR instance issued a 4MB contiguous read request to the PVFS2 system, so every request involved all 8 file servers and the size of the requested data on one file server was 512KB. 'App$k' (k=0-9) refers to an IOR instance, and 'FS$N' (N=0-7) refers to a file server. 'MIN' refers to the finish time of the first file server to complete, and 'MAX' to the finish time of the last. The completion time of each application relies on the 'MAX' finish time for that application over all involved file servers.

 

needs from different applications are very likely to be served in different orders on different file servers.

Understanding the impact of data synchronization in parallel I/O systems is critical to efficiently improving completion time. In this section, we evaluate the request completion time when file servers serve requests from multiple applications simultaneously. We employed 8 nodes for PVFS2 file servers. Each file server was installed with a 7200RPM SATA II 250GB hard disk drive (HDD) and a PCI-E X4 100GB solid state disk (SSD), and the interconnection was 4X InfiniBand. We adopted the IOR benchmark to simulate the intra-request synchronization scenario and measured the finish time of all requests on different file servers. The number of concurrent IOR instances was 10, to simulate 10 concurrent applications. In these experiments we show only the intra-request data synchronization case, so each instance was configured with only one process, which issued a 4MB contiguous data read request. Figure 3 shows the finish time of different requests on different file servers. From Figure 3 (a) and (c), we can see that, in both HDD and SSD environments, the finish time of every application varies a lot across file servers. From subfigures (b) and (d), we can see that the maximum finish time is 4.4 times the minimum on average in the HDD environment and 3.1 times in the SSD environment. The completion time of one request is equal to the maximum value

 

of all finish times on all involved file servers. Therefore, the significant deviation of finish time across multiple file servers leads to high completion time of data accesses.

The experimental results also indicate that, due to the independent scheduling strategy on each file server, data accesses are finished in different orders for concurrent applications. The difference in service orders on different file servers will become much greater in the inter-request or collective I/O synchronization cases, where each application has multiple processes. As a result, the independent scheduling strategy on file servers introduces a large number of idle CPU cycles waiting for data synchronization on computing nodes, and the situation becomes even worse for large-scale HPC clusters. The results also reveal that there is a significant potential to shorten completion time by coordinated I/O scheduling on file servers.

3. I/O COORDINATION

In order to reduce the overhead of data synchronization, we propose a server-side I/O coordination scheme which re-arranges I/O requests on file servers, so that requests are serviced in the same order in terms of applications on all involved nodes. Data synchronization usually exists, explicitly or implicitly, among the parallel processes of parallel applications. The re-ordering aims at scheduling

 

Figure 4: I/O coordination scheme in parallel file systems

 


 

the parallel I/O requests that need to be synchronized to run together, which can benefit the system with a shorter average completion time of all I/O requests. A good scheduling algorithm should take into account both performance and fairness. A good practical scheduling algorithm also requires simplicity of implementation. The proposed I/O coordination is no exception. For fairness, all I/O requests should be serviced within an acceptable period to avoid starvation. To provide the right balance of performance and fairness, the concepts of 'Time Window' and 'Application ID' are introduced to support the server-side I/O coordination approach.

Time Window. All I/O requests issued to a file server can be regarded as a time series. The time series is then divided into successive segments by a fixed time interval. Here each segment of the time series is referred to as a Time Window; thus, one Time Window consists of a number of I/O requests. The value of the time interval is the Time Window Width.

Application ID. We allocate an integer value for each application running on the cluster. The integer identifies which application an I/O request belongs to, and is referred to as the Application ID. Each I/O request passes this integer on to the file servers.

According to this definition, all I/O requests from one application have the same 'Application ID'. For applications with multiple parallel processes, such as MPI programs, there might be large amounts of data synchronization. In order to alleviate the performance penalties of synchronization, I/O requests from all processes should have the same 'Application ID', and they should be served in concert on multiple file servers. The 'Application ID' is generated automatically in the parallel I/O library and is transparent to users. It can be implemented in parallel I/O client libraries or the middleware layer, without modifying application programs.

For fairness, requests in an earlier 'Time Window' will be serviced prior to those in a later one, to avoid starvation. The request time can use either file-server-side time (the arrival time of a request) or client-side time (the issue time of a request). Because of

 

network latency and load imbalance issues, one client-side request may have different arrival times on different file servers. In a system with many concurrent clients, a request issued earlier might get a later arrival time on some file servers. For these reasons, in our implementation we choose client-side time as the request time.

3.1 Algorithm

It is not difficult to imagine that in a parallel file system, a large number of I/O requests might be queued on each file server at any given time, and these I/O requests might come from multiple applications. As all arriving requests are tagged with a request time and an 'Application ID', the I/O coordination algorithm can be described as follows: within the same 'Time Window', I/O requests are ordered by the value of 'Application ID'; across different 'Time Windows', requests in an earlier window are serviced prior to those in a later one.

The proposed I/O scheduling algorithm is based on the observation that requests from the same application have better locality and, equally important, that execution will be optimized if these requests finish at the same time. It takes both performance and fairness into consideration. In each time window, requests are served one application at a time in order to reduce the overhead of data synchronization. In addition, none of the requests will be starved, because requests in an earlier time window will always be performed first.

Figure 4 illustrates how the I/O coordination algorithm works in parallel file systems. In this example, there are 4 file servers and three concurrent applications. The original request arrival orders are inconsistent on different file servers, as in subfigure (a). The series of I/O requests is split into successive 'Time Windows' by a fixed time interval on all file servers, as shown in subfigure (b). The scheduler on each file server then reorders the requests in each 'Time Window' by 'Application ID', so that requests from one application can be serviced at the same time on all file servers, as shown in subfigure (c).

 

The scheduler on each file server maintains a queue of all requests, which determines the service order of I/O requests. When a new I/O request arrives, if the queue is empty, the request is scheduled immediately. If the queue is not empty, the scheduler inserts the request into the queue according to its 'Time Window' and 'Application ID'. The scheduler keeps issuing the request with the highest priority (i.e., the head of the queue) to the low-level storage devices on each file server. Since the 'Application ID' and request time are generated at the client side and then passed to the file servers, there is no communication between different file servers while scheduling the requests. The use of 'Application ID' and 'Time Window' significantly simplifies the implementation of the coordination and lays the foundation for good scalability as the number of file servers increases.

3.2 Completion Time Analysis

Assume that the number of file servers is n, the number of concurrent applications is m, and that each application needs to access data on all file servers (for simplicity). A collective data access from one application is mapped into n sub-parts, one per file server, and each sub-part is itself a request on that server. The service time on each file server for each sub-part is t.

Without I/O coordination, the sub-parts are served on the different file servers independently. As requests are issued simultaneously, the sub-parts may be served in arbitrary order on each file server. Hence for each sub-part, the finish time on a file server falls randomly in {t, 2t, 3t, ..., mt}, and the finish time of a data access for one application depends on the latest finish time over all nodes. The expected completion time of one data access is therefore the expectation of the maximum finish time over all n file servers, given by Formula (1), where F(k) = (k/m)^n is the probability that all n sub-parts have finished by time kt:

T_{avg} = E(\max(T)) = \left( \sum_{k=1}^{m} k \left( F(k) - F(k-1) \right) \right) t = \left( \sum_{k=1}^{m} k \left( \left( \frac{k}{m} \right)^n - \left( \frac{k-1}{m} \right)^n \right) \right) t = \left( m - \frac{1}{m^n} \sum_{k=1}^{m-1} k^n \right) t    (1)

From the formula we observe that, with only one file server (n = 1), the expected completion time is ((m+1)/2) t, which conforms to the uniform distribution of our assumption. The formula also indicates that the completion time increases as the number of file servers n increases, and as the number of concurrent applications m increases. When n is very large, the term (1/m^n) \sum_{k=1}^{m-1} k^n approaches zero, so the average completion time approaches mt.

With the I/O coordination strategy, all file servers serve applications one at a time, and I/O requests with synchronization needs are served at the same time on all file servers. Therefore, the completion times for the m applications are t, 2t, ..., mt, and the average completion time is given by Formula (2):

T'_{avg} = \frac{1}{m} \sum_{k=1}^{m} k t = \frac{m+1}{2} t    (2)

The formula indicates that the average completion time is independent of n, the number of file servers. That means the average completion time of the I/O coordination scheme is much more scalable than that of existing independent scheduling strategies. Currently, parallel file systems often reach hundreds of storage nodes or beyond. The proposed I/O coordination strategy is a practical way to reduce request completion time for data-intensive applications.

From Formulas (1) and (2), we can calculate the reduction of the average completion time as follows:

T_{diff} = T_{avg} - T'_{avg} = \left( m - \frac{1}{m^n} \sum_{k=1}^{m-1} k^n - \frac{m+1}{2} \right) t    (3)

As can be seen in Formula (3), when the number of file servers n is very large, the reduction in completion time approaches ((m-1)/2) t, and the decrease rate approaches (m-1)/2m. As the number of concurrent applications m increases, the decrease rate approaches 50%.
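
To make the analysis concrete, the short C program below evaluates Formula (1) and the coordinated average of Formula (2), both in units of t, for m = 10 applications and a growing number of file servers n. It is a numeric check written for this document, not code from the paper; for m = 10 the saving should climb toward the predicted (m-1)/2m = 45% as n grows.

#include <math.h>
#include <stdio.h>

/* Formula (1): T_avg / t = m - (1/m^n) * sum_{k=1}^{m-1} k^n. */
static double t_avg_uncoordinated(int m, int n)
{
    double sum = 0.0;
    for (int k = 1; k < m; k++)
        sum += pow((double)k / m, (double)n);
    return (double)m - sum;
}

int main(void)
{
    int m = 10; /* concurrent applications */
    double coordinated = (m + 1) / 2.0; /* Formula (2): (m+1)/2 */
    for (int n = 1; n <= 64; n *= 2) {
        double uncoord = t_avg_uncoordinated(m, n);
        printf("n=%2d: uncoordinated %.2ft, coordinated %.2ft, saving %2.0f%%\n",
               n, uncoord, coordinated,
               100.0 * (uncoord - coordinated) / uncoord);
    }
    return 0;
}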

4. IMPLEMENTATION

We have implemented the server-side I/O coordination scheme under PVFS2 [9] and MPI-IO. PVFS2 is an open-source parallel file system developed jointly by Clemson University and Argonne National Laboratory. It is a virtual parallel file system for Linux clusters built on the underlying native file systems of the storage nodes. The prototype implementation includes modifications to the PVFS2 request scheduling module and to the PVFS2 driver package in the ROMIO [26] MPI-IO library.

4.1 Implementation in PVFS2

We modified the client interface and the server-side request scheduler in PVFS2. The client interface passes the 'Application ID' and 'Request Time' to the file servers, and the file servers then re-arrange request service orders based on these two parameters.

We utilize the 'PVFS_hint' mechanism to pass the two parameters between I/O clients and file servers. Two new hint types are defined in the PVFS2 source code: 'PINT_HINT_APP_ID' and 'PINT_HINT_REQ_TIME', representing the Application ID and the request time respectively. We modified the client-side interface PVFS_sys_read/write(), adding PVFS_hint as a parameter, so that the hint can be passed to the PVFS2 server side.

When a file server receives a request, the scheduler first calculates its priority, and then inserts the request into the request queue in ascending order of priority. The smaller the priority number a request gets, the earlier it is scheduled. The request priority is calculated as follows.

req_prior = req_time / interval * 32768 + app_id;

Here req_time is the issue time of the I/O request from the client side, an integer value giving the number of milliseconds since '1970-01-01 00:00:00 UTC'. interval is the width of the 'Time Window', which can be defined as a startup parameter in the PVFS2 configuration file; if it is not configured, the default value is used (1000ms for HDD and 250ms for SSD). app_id represents the 'Application ID', an integer value in the range 0 to 32767. From the formula we observe that the req_prior

 

of a request in an earlier 'Time Window' is guaranteed to be smaller than that of a request in a later one. Also, within one 'Time Window', a request with a small Application ID will be scheduled prior to one with a large ID. Therefore, all the I/O requests in the file servers are ordered by the value of req_prior.

In current PVFS2, each file server maintains a set of request queues for different file handles, and services the requests in each queue in FCFS (First Come First Served) order. A file handle corresponds to a data file on one file server, which is usually a subfile of a whole PVFS2 file. We designed a global shared request queue to store all I/O requests of different jobs in the request scheduler module. In the request post function PINT_req_sched_post(), instead of adding an I/O request to the tail of the request queue of its file handle, the I/O scheduler inserts the request into the shared request queue according to the value of req_prior. The trove module of PVFS2 handles read/write operations on block devices one by one from the head of the shared queue. Therefore, all I/O requests are serviced in the order of req_prior.
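
A minimal sketch of the priority-ordered shared queue described above, assuming hypothetical struct and function names: the priority is computed with the same formula, and requests are kept in ascending req_prior order so the head of the queue is always served next. In PVFS2 itself this logic lives in the request scheduler around PINT_req_sched_post().

#include <stddef.h>

struct io_request {
    long long req_time_ms; /* client-side issue time, ms since the epoch */
    int app_id;            /* 0..32767 */
    long long prior;       /* smaller value = served earlier */
    struct io_request *next;
};

/* Insert a request into the shared queue in ascending req_prior order:
   the time-window index dominates, and the application id breaks ties. */
static void enqueue(struct io_request **head, struct io_request *r,
                    long long window_ms)
{
    r->prior = r->req_time_ms / window_ms * 32768 + r->app_id;
    while (*head && (*head)->prior <= r->prior)
        head = &(*head)->next;
    r->next = *head;
    *head = r;
}

int main(void)
{
    struct io_request *q = NULL;
    struct io_request a = { 1000, 7, 0, NULL };
    struct io_request b = { 1200, 3, 0, NULL };
    enqueue(&q, &a, 1000);
    enqueue(&q, &b, 1000); /* same window: app 3 moves ahead of app 7 */
    return 0;
}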

4.2 Implementation in MPI-IO Library

We also modified the PVFS2 driver in ROMIO [26] to pass 'Request Time' and 'Application ID' via 'PVFS_hint'. The 'Application ID' is generated the first time an MPI program calls MPI_File_open(), and is then broadcast to all MPI processes. The 'Application ID' is a global variable shared by all MPI processes, so that all processes of an MPI program get the same value. It is an unsigned integer value, generated randomly between 0 and 32767 by default. For system performance tuning, we also provide a configuration interface for parallel file system administrators: an administrator can specify the value of the 'Application ID' in a global configuration file, either as a fixed number or as a range. If it is specified as a range, the value is generated randomly within that range.
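
The following is a minimal MPI sketch, under assumed names, of how such a shared 'Application ID' could be generated once and broadcast so that every process tags its requests identically; ROMIO's actual driver code differs in detail.

#include <mpi.h>
#include <stdlib.h>
#include <time.h>

static int app_id = -1; /* shared by all ranks after the broadcast */

/* Generate the 'Application ID' once, on first use: rank 0 draws a
   random value in 0..32767 and broadcasts it to every process. */
void ensure_app_id(MPI_Comm comm)
{
    int rank;
    if (app_id >= 0)
        return; /* already generated and broadcast */
    MPI_Comm_rank(comm, &rank);
    if (rank == 0) {
        srand((unsigned)time(NULL));
        app_id = rand() % 32768;
    }
    MPI_Bcast(&app_id, 1, MPI_INT, 0, comm);
}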

ROMIO [26] is a high-performance, portable implementation of MPI-IO, providing applications with a uniform interface in the top layer and dealing with data access to various file systems through an internal abstract I/O device layer called ADIO. It provides various file system drivers in this layer, including one for PVFS2. We modified the PVFS2 driver package in ROMIO: for every data access function, it first generates a request time, adds the request time and the global 'Application ID' into a variable of PVFS_hint type, and then passes the hint to the file servers by calling the modified data access functions. The following is an example of calling the PVFS2 data read interface.

...

PVFS_hint chint = PVFS_HINT_NULL;
int appid = app_id;
struct timeval rtime;
gettimeofday(&rtime, NULL);
/* request time in milliseconds since the epoch, matching the
   millisecond 'Time Window' width used on the servers */
long int req_time = rtime.tv_sec * 1000L + rtime.tv_usec / 1000;

/* add application id and request time to the hint */
PVFS_hint_add(&chint, "pvfs.hint.app_id", sizeof(int), &appid);
PVFS_hint_add(&chint, "pvfs.hint.req_time", sizeof(long int), &req_time);

/* call the new read/write function with the hint parameter */
ret = PVFS_sys_read2(pvfs_fs->object_ref, file_req, offset, buf, mem_req,
                     &(pvfs_fs->credentials), &resp_io, chint);

...

These code modifications in the MPI-IO library are transparent to application programmers and users. There is no need to modify the source code of an application; the user can simply relink the program against the modified MPI-IO library.

The request time is one of the primary factors used for request reordering on the file servers in the proposed I/O coordination strategy. For this reason, the clocks of all machines in a large-scale system must be synchronized. In our implementation, the request time is generated in the MPI-IO library at the client side, so all the client machines must share the same clock. Clock skew among client nodes may lead to unexpected request service orders, especially in the collective I/O synchronization and inter-request synchronization cases. Currently, most high-performance computing clusters have synchronized clocks, using either an NTP service or hardware clock synchronization (for example, in Blue Gene/P).

5. EXPERIMENTAL EVALUATION

5.1 Experiments Setup

Our experiments were conducted on a 65-node Sun Fire Linux-based cluster, with one head node and 64 computing nodes. The head node was a Sun Fire X4240 with dual 2.7GHz Opteron quad-core processors, 8GB memory, and twelve 500GB 7200RPM SATA II disk drives configured as a RAID5 disk array. The computing nodes were Sun Fire X2200 servers, each with dual 2.3GHz Opteron quad-core processors, 8GB memory, and a 250GB 7200RPM SATA hard drive. All 65 nodes were connected with Gigabit Ethernet; in addition, 17 of the nodes (including the head node) were connected with a 4X InfiniBand network and had a PCI-E X4 100GB SSD. All nodes ran the Ubuntu 9.04 (Linux kernel 2.6.28.10) operating system. We implemented the I/O coordination strategy in MPICH2-1.1.1p1 and the PVFS2 2.8.1 file system.

We evaluated the proposed I/O coordination strategy in both 'Gigabit Ethernet + HDD' and 'InfiniBand + SSD' environments. We measured average completion time, system scalability, and bandwidth with the IOR, PIO-Bench, MPI-TILE-IO, and Noncontig benchmarks. IOR is a benchmark used to test the random and sequential I/O performance of parallel file systems. PIO-Bench provides a flexible framework for standardized testing of multiple file access methods. MPI-TILE-IO and Noncontig are designed to test the performance of MPI-IO for non-contiguous access workloads. All tests were repeated 3 times, and before each run we flushed memory to avoid the impact of memory cache and buffering.

5.2 Results and Analysis

First we conducted experiments to evaluate the completion time of I/O requests with the proposed I/O coordination strategy, comparing it with the original scheduling strategy (without I/O coordination) in PVFS2. We used the same application scenarios shown in Figure 3. Figure 5 shows the completion time of different applications on different file servers. From the results we see that the I/O requests from one application were served together, and different applications finished one by one on all file servers. The maximum finish time is reduced from 4.4 to 1.3 times the minimum finish time in the HDD environment, and from 3.1 to 1.2 times in the SSD environment. The completion time of one application relies on the maximum finish time over all file servers. From the results, we observe that the average completion time of all applications is reduced by around 29.8% in the HDD environment and 19.5% in the SSD environment. Compared with the results in Figure 3, the proposed I/O coordination lets requests from the same application complete together, one application at a time, rather than randomly mixed. We also notice some crossover in the completion times of the requests in subfigures (a) and (c). The reason is that, due to nonuniform network delays,

 

[Figure 5 panels: (a) finish time on different file servers (HDD); (b) minimum and maximum finish time (HDD); (c) finish time on different file servers (SSD); (d) minimum and maximum finish time (SSD)]

Figure 5: The finish time of I/O requests on different file servers with the proposed I/O coordination strategy. All file servers serve one application at a time together, and they serve I/O requests from multiple applications in the same order. The application scenarios are the same as in Figure 3.

 

some requests with low priority were already issued to the storage devices when requests with higher priority arrived late on some file servers. The proposed I/O coordination scheme always issues the request with the highest priority in the queue to the low-level storage devices on each file server, so a small percentage of crossover is expected.

We then compared the average completion time with different numbers of concurrent applications. We employed 16 file servers, and tested in both HDD and SSD environments. We used multiple instances of IOR to simulate concurrent applications. The numbers of concurrent instances were 2, 4, 6, 8, 10, 12, 14, and 16, and the numbers of MPI processes per IOR instance were 8, 16, and 32, respectively. The width of the 'Time Window' was set to 1000 milliseconds. The I/O request size was 128KB, and the stripe size of PVFS2 was 4KB. We added an MPI_Barrier operation between two requests and measured the completion time of each I/O request. Figure 6 shows the performance results. The prefix in the legend indicates the number of processes in each application, e.g. '32C' means 32 processes per application. The suffix 'cio' denotes the proposed I/O coordination strategy, and 'ori' the original scheduling strategy in PVFS2. From the results we observe that the proposed I/O coordination always achieves lower average completion time; the decrease in completion time is about 8% to 42% in the HDD environment and 11% to 43% in the SSD environment. Moreover, as the number of concurrent applications increases, the

 

decrease rate of completion time rises, which matches our previous analysis.

Next we conducted experiments to evaluate the scalability of the proposed I/O coordination strategy. We configured PVFS2 with 2, 4, 8, 16, 32, and 64 file servers in the HDD environment and with 2, 4, 8, and 16 file servers in the SSD environment. We adopted PIO-Bench instances as the applications, with 8, 16, and 32 MPI processes per application, respectively. In this set of experiments, we measured the completion time of sequential read and write. The request sizes were 8KB × n (the number of file servers) for the different runs, so that for each request the data size on each file server was the same (8KB). We ran 10 concurrent PIO-Bench instances together. Figure 7 shows the results; the X axis represents the number of MPI processes for the I/O coordination and original data access strategies. The figure demonstrates that I/O coordination sustains a steady completion time as the number of file servers increases, while with the original data access strategy the average completion time grows as the system scale increases. In the case of 2 file servers, I/O coordination obtained about a 10% reduction in average completion time compared to the original scheduling strategy in both HDD and SSD environments. As the number of file servers increases, the completion time decrease reaches around 46% for the 64-node HDD environment and 39% for the 16-node SSD environment. The results indicate that the proposed I/O coordination

 

Figure 6: Average completion time with different numbers of concurrent applications. Prefixes '8C', '16C', and '32C' mean each application has 8, 16, and 32 MPI processes, respectively. Suffix 'cio' denotes the I/O coordination scheme, and 'ori' the original data access strategy. Labels in Figure 7 are similarly defined. We used multiple instances of IOR to simulate concurrent applications.

strategy is effective and even more appropriate for large-scale parallel file systems.

We also conducted experiments to evaluate the effect of different lengths of the time window in the proposed I/O coordination scheme. We set the time window sizes to 250ms, 500ms, 1000ms, and 2000ms, and compared the completion time and I/O bandwidth with those obtained without I/O coordination. The number of file servers was 16 in both the HDD and SSD experiments. We adopted 3 IOR, 3 PIO-Bench, 2 MPI-TILE-IO, and 2 Noncontig instances to simulate 10 concurrent applications. The numbers of MPI processes per application were 8, 16, and 32, respectively (labelled '8C', '16C', and '32C'). The request sizes of all programs were 128KB, and the stripe size was 4KB. Figure 8 shows the experimental results, where subfigures (a) and (c) show average completion time and subfigures (b) and (d) show aggregate I/O bandwidth. From subfigures (a) and (c) we observe that, for all time window sizes, the completion times with the proposed I/O coordination strategy are lower than those with the original strategy without I/O coordination. In addition, the 1000ms window size results in the lowest completion time in almost all cases in the HDD tests, and the 250ms window size results in the lowest completion time in the SSD tests. From

 

Figure 7: Average completion time with different numbers of file servers. In the HDD tests, the number of file servers was configured as 2, 4, 8, 16, 32, or 64. In the SSD tests, the number of file servers was configured as 2, 4, 8, or 16. We adopted PIO-Bench instances to simulate concurrent applications.

subfigure (b) we can observe that, in the HDD tests, the I/O bandwidth increases as the window size grows from 250ms to 1000ms, and the I/O bandwidths with window sizes of 1000ms and 2000ms are similar. From subfigure (d) we observe that, in the SSD tests, the 250ms window size obtained the highest bandwidth. The results in the HDD environment show that, when the number of processes in each application is 8, the bandwidth of the original I/O scheduling strategy is a little higher (up to 1.2%) than that of the I/O coordination scheme in some cases; with 32 processes, however, the I/O coordination strategy achieved about 10.9% higher aggregate bandwidth than the original strategy. The results in the SSD environment show that the I/O coordination scheme always obtains the highest I/O bandwidth. These results indicate that the I/O coordination strategy can achieve bandwidth comparable to the original strategy when the I/O workload of a parallel file system is heavy. The experimental results also indicate that the size of the time window affects both completion time and I/O bandwidth. Generally, parallel file systems consisting of high-performance storage devices should use a short time window, and those consisting of lower-performance storage devices should use a relatively large window size. Based on the results of this set of experiments, we recommend setting the window size to 1000ms in an HDD environment and 250ms in an SSD environment.

 

Figure 8: Average completion time and aggregate I/O bandwidth under different window sizes. We ran 10 concurrent applications (3 IOR, 3 PIO-Bench, 2 MPI-TILE-IO, and 2 Noncontig instances) in both HDD and SSD environments. The window sizes of the I/O coordination scheme were 250ms, 500ms, 1000ms, and 2000ms, respectively. We also measured the results without the I/O coordination strategy (labelled 'ORI' in the figure).

 

6. RELATED WORK

We discuss related work in the areas of scheduling in parallel I/O and parallel file systems, and in coordinated scheduling techniques, and we note how our work differs from those efforts.

6.1 Server-side I/O Scheduling in Parallel File Systems

In order to obtain sustained peak I/O performance, a collection of I/O scheduling techniques have been developed for server-side I/O scheduling in parallel file systems, such as disk-directed I/O [13], server-directed I/O [23], and stream-based I/O [11, 21]. These techniques succeed in achieving high bandwidth on the disks and networks of file servers by reducing either the frequency of disk seeks or the waiting time of socket connections. However, to the best of our knowledge, little effort has been devoted to reducing the average completion time of I/O requests of multiple applications across multiple file servers.

Numerous research efforts have been devoted to improving the quality of service (QoS) of I/O requests in distributed or parallel storage systems [4, 8, 10, 12, 19, 29]. Some of them adopted deadline-driven strategies [12, 19, 31] that allow the upper layer to specify latency and throughput goals of file servers and schedule the requests based on Earliest Deadline First (EDF) [16] or its variants [19, 20, 31]. Some approaches employed proportional-sharing

 

scheduling strategies [8, 29] between competing I/O workloads. These strategies aim either to provide fair sharing of bandwidth between clients, or to control the request issue queue lengths of I/O clients to guarantee a moderate latency. Our approach is different from these works in that we do not use explicit deadline-driven or throttling control algorithms, making it completely transparent to I/O clients. Moreover, our approach takes multiple file servers into consideration.

6.2 Coordinated Scheduling

Coordinated scheduling has been recognized as an effective approach to obtaining efficient execution in parallel or distributed environments. It has been achieved with gang scheduling [5, 6, 7, 14, 17, 27, 28, 30] and co-scheduling [2, 3, 15, 24, 25]. A large body of research has been devoted to reducing the synchronization time for communication between threads/processes, by scheduling related threads/processes to run simultaneously on different processors in a parallel or distributed system. The scheduler packs synchronized processes into gangs and schedules them simultaneously, to alleviate the performance penalties of communication synchronization. Feitelson et al. [5] compared various packing schemes for gang scheduling, and evaluated them under different cases. Wiseman et al. [30] matched gangs that make heavy use of the CPU with gangs that make light use of the CPU, and scheduled such pairs together, to improve throughput by making better utilization of system resources. Wang et al. [27, 28] presented a mathematical model that can measure system performance under different scheduling parameters, to guide the design of scheduling policies. All these coordinated scheduling techniques focus on the thread, process, or job level, either to reduce synchronization time or to better utilize system resources. However, none of these works focused on I/O request scheduling. Moreover, current coordinated scheduling policies employ centralized or distributed schedulers, both of which rely on communication among processors/nodes. Our approach requires neither a central control mechanism nor communication between file servers, making it much more suitable for large-scale parallel file systems.

Zhang et al. [32] proposed an inter-server coordination technique for parallel file systems to improve spatial locality and program reuse distance. They calculated access distances and grouped requests with small distances together, but this optimization does not apply to SSDs. Our approach, on the other hand, is based on the observation that requests with synchronization needs are best served if they finish at the same time. We coordinate file servers so that they work on one application at a time together. The motivation and methodology behind the design and implementation of the two approaches are thus very different. In addition, our approach does not require central control, is simple to implement, and can be extended to SSDs.

7. CONCLUSIONS AND FUTURE WORK

Parallel file systems are widely used for data-intensive and high-performance computing applications. However, I/O performance lags far behind the computing capacities of current systems, resulting in processors wasting large numbers of cycles waiting for data to arrive. The situation becomes even worse when multiple applications try to access data concurrently. Existing server-side scheduling algorithms focus on tapping the potential capacity of each single file server to achieve higher throughput on each storage node, and thus to improve overall system throughput. Little has been done to investigate coordinated data access across multiple file servers to reduce the average completion time from the parallel data access point of view. This paper targets the problem of reducing the average completion time of I/O requests from multiple applications.

This paper proposes a novel server-side I/O coordination scheme in which all file servers serve requests in step, to alleviate the impact of data synchronization while maintaining fairness and simplicity. The proposed scheme lets all file servers work on one application at a time, based on an automatically created chronological order recognized within the whole cluster, rather than having all servers work independently. This paper makes the following contributions. First, we describe the I/O synchronization problems in parallel I/O systems, and demonstrate that re-arranging service orders on multiple file servers is beneficial. Second, we propose an I/O coordination scheme that lets all file servers work in concert. Third, we have implemented the proposed I/O strategy in PVFS2 and MPI-IO. The experimental results demonstrate that, compared with the conventional data access strategy, the proposed I/O coordination scheme can reduce I/O completion time by up to 46% and provide comparable I/O bandwidth. Analytical and experimental results confirm that the control mechanism of the proposed I/O coordination scheme is simple and effective, and that it is an appropriate choice for large-scale parallel file systems with heavy I/O workloads.

In the future, we plan to investigate optimization of the I/O coordination strategy based on application data access patterns. We also plan to add minimal group communication to I/O coordination, to explore its feasibility for imbalanced data access workloads.

 

8. ACKNOWLEDGMENTS

The authors are thankful to Jibing Li from the Illinois Institute of Technology and Robert Ross from Argonne National Laboratory for their constructive and thoughtful suggestions toward this study. This research was supported in part by the National Science Foundation under NSF grants CCF-0621435 and CCF-0937877, and in part by the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract DE-AC02-06CH11357.

9. REFERENCES

[1] High-performance Storage Architecture and Scalable Cluster File System. Lustre File System White Paper, Dec. 2007.

[2] R. H. Arpaci, A. C. Dusseau, A. M. Vahdat, L. T. Liu, T. E. Anderson, and D. A. Patterson. The Interaction of Parallel and Sequential Workloads on a Network of Workstations. In SIGMETRICS '95/PERFORMANCE '95: Proceedings of the 1995 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pages 267-278, New York, NY, USA, 1995. ACM.

[3] A. C. Arpaci-Dusseau. Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems. ACM Transactions on Computer Systems (TOCS), 19(3):283-331, 2001.

[4] D. D. Chambliss, G. A. Alvarez, P. Pandey, D. Jadav, J. Xu, R. Menon, and T. P. Lee. Performance Virtualization for Large-scale Storage Systems. In SRDS '03: Proceedings of the 22nd International Symposium on Reliable Distributed Systems, pages 109-118, 2003.

[5] D. G. Feitelson. Packing Schemes for Gang Scheduling. In IPPS '96: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pages 89-110, London, UK, 1996. Springer-Verlag.

[6] D. G. Feitelson and M. A. Jette. Improved Utilization and Responsiveness with Gang Scheduling. In IPPS '97: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pages 238-261, London, UK, 1997. Springer-Verlag.

[7] D. G. Feitelson and L. Rudolph. Gang Scheduling Performance Benefits for Fine-Grain Synchronization. Journal of Parallel and Distributed Computing, 16:306-318, 1992.

[8] A. Gulati, I. Ahmad, and C. A. Waldspurger. PARDA: Proportional Allocation of Resources for Distributed Storage Access. In FAST '09: Proceedings of the 7th Conference on File and Storage Technologies, pages 85-98, Berkeley, CA, USA, 2009. USENIX Association.

[9] I. F. Haddad. PVFS: A Parallel Virtual File System for Linux Clusters. Linux Journal, page 5, 2000.

[10] L. Huang, G. Peng, and T.-c. Chiueh. Multi-dimensional Storage Virtualization. SIGMETRICS Perform. Eval. Rev., 32(1):14-24, 2004.

[11] W. B. Ligon III and R. B. Ross. Implementation and Performance of a Parallel File System for High Performance Distributed Applications. In HPDC '96: Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing, page 471, Washington, DC, USA, 1996. IEEE Computer Society.

[12] M. Karlsson, C. Karamanolis, and X. Zhu. Triage: Performance Differentiation for Storage Systems Using Adaptive Control. ACM Trans. Storage, 1(4):457-480, 2005.

[13] D. Kotz. Disk-directed I/O for MIMD Multiprocessors. ACM Trans. Comput. Syst., 15(1):41-74, 1997.

[14] W. Lee, M. Frank, V. Lee, K. Mackenzie, and L. Rudolph. Implications of I/O for Gang Scheduled Workloads. In IPPS '97: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pages 215-237, London, UK, 1997. Springer-Verlag.

[15] S. T. Leutenegger and M. K. Vernon. The Performance of Multiprogrammed Multiprocessor Scheduling Algorithms. In SIGMETRICS '90: Proceedings of the 1990 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 226-236, New York, NY, USA, 1990. ACM.

[16] C. L. Liu and J. W. Layland. Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment. Journal of the ACM, 20(1):46-61, 1973.

[17] J. E. Moreira, W. Chan, L. L. Fong, H. Franke, and M. A. Jette. An Infrastructure for Efficient Parallel Job Execution in Terascale Computing Environments. In Supercomputing '98: Proceedings of the 1998 ACM/IEEE Conference on Supercomputing (CDROM), pages 1-14, Washington, DC, USA, 1998. IEEE Computer Society.

[18] D. Nagle, D. Serenyi, and A. Matthews. The Panasas ActiveScale Storage Cluster: Delivering Scalable High Bandwidth Storage. In Supercomputing '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, page 53, Washington, DC, USA, 2004. IEEE Computer Society.

[19] A. Povzner, D. Sawyer, and S. Brandt. Horizon: Efficient Deadline-driven Disk I/O Management for Distributed Storage Systems. In HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, pages 1-12, 2010.

[20] A. L. Narasimha Reddy and J. C. Wyllie. Disk Scheduling in a Multimedia I/O System. In MULTIMEDIA '93: Proceedings of the First ACM International Conference on Multimedia, pages 225-233, New York, NY, USA, 1993. ACM.

[21] R. B. Ross and W. B. Ligon III. Server-Side Scheduling in Cluster Parallel I/O Systems. Calculateurs Parallèles, Special Issue on Parallel I/O for Cluster Computing, 2001.

[22] F. Schmuck and R. Haskin. GPFS: A Shared-Disk File System for Large Computing Clusters. In FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies, page 19, Berkeley, CA, USA, 2002. USENIX Association.

[23] K. E. Seamons, Y. Chen, P. Jones, J. Jozwiak, and M. Winslett. Server-directed Collective I/O in Panda. In Supercomputing '95: Proceedings of the 1995 ACM/IEEE Conference on Supercomputing (CDROM), page 57, New York, NY, USA, 1995. ACM.

[24] A. Snavely and D. M. Tullsen. Symbiotic Job Scheduling for a Simultaneous Multithreaded Processor. In ASPLOS-IX: Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, pages 234-244, New York, NY, USA, 2000. ACM.

[25] P. G. Sobalvarro. Demand-Based Coscheduling of Parallel Jobs on Multiprogrammed Multiprocessors. PhD thesis, 1997. Supervisor: William E. Weihl.

[26] R. Thakur, W. Gropp, and E. Lusk. Data Sieving and Collective I/O in ROMIO. In FRONTIERS '99: Proceedings of the 7th Symposium on the Frontiers of Massively Parallel Computation, page 182, Washington, DC, USA, 1999. IEEE Computer Society.

[27] F. Wang, H. Franke, M. C. Papaefthymiou, P. Pattnaik, L. Rudolph, and M. S. Squillante. A Gang Scheduling Design for Multiprogrammed Parallel Computing Environments. In IPPS '96: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pages 111-125, London, UK, 1996. Springer-Verlag.

[28] F. Wang, M. C. Papaefthymiou, and M. S. Squillante. Performance Evaluation of Gang Scheduling for Parallel and Distributed Multiprogramming. In IPPS '97: Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, pages 277-298, London, UK, 1997. Springer-Verlag.

[29] Y. Wang and A. Merchant. Proportional-share Scheduling for Distributed Storage Systems. In FAST '07: Proceedings of the 5th USENIX Conference on File and Storage Technologies, pages 4-4, Berkeley, CA, USA, 2007. USENIX Association.

[30] Y. Wiseman and D. G. Feitelson. Paired Gang Scheduling. IEEE Trans. Parallel Distrib. Syst., 14(6):581-592, 2003.

[31] T. M. Wong, R. A. Golding, C. Lin, and R. A. Becker-Szendy. Zygaria: Storage Performance as a Managed Resource. In RTAS '06: Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, pages 125-134, 2006.

[32] X. Zhang, K. Davis, and S. Jiang. IOrchestrator: Improving the Performance of Multi-node I/O Systems via Inter-Server Coordination. In Supercomputing '10: Proceedings of the 2010 ACM/IEEE Conference on Supercomputing, Washington, DC, USA, 2010. IEEE Computer Society.

 

A Calculus for Game-based Security Proofs

David Nowak (1) and Yu Zhang (2)

(1) Research Center for Information Security, AIST, Japan

(2) Institute of Software, Chinese Academy of Sciences, China

Abstract. The game-based approach to security proofs in cryptography is a widely-used methodology for writing proofs rigorously. However, a unifying language for writing games is still missing. In this paper we show how CSLR, a probabilistic lambda-calculus with a type system that guarantees that computations are probabilistic polynomial time, can be equipped with a notion of game indistinguishability. This allows us to define cryptographic constructions, effective adversaries, security notions, computational assumptions, game transformations, and game-based security proofs in the unified framework provided by CSLR. Our code for cryptographic constructions is close to implementation in the sense that we do not assume primitive uniform distributions but use a realistic algorithm to approximate them. We illustrate our calculus on cryptographic constructions for public-key encryption and pseudorandom bit generation.

Keywords: game-based proofs, implicit complexity, computational indistinguishability

1 Introduction

Cryptographic constructions are fundamental components for information security. A cryptographic construction must come with a security proof. But those proofs can be subtle and tedious, and thus not easy to check. Bellare and Rogaway even claim in [9] that:

"Many proofs in cryptography have become essentially unverifiable. Our field may be approaching a crisis of rigor."

With Shoup [27], they advocate game-based proofs as a remedy. This is a methodology for writing security proofs that makes them easier to read and check. In this approach, a security property is modeled as a probabilistic program implementing a game to be solved by the adversary. The adversary itself is modeled as an external probabilistic procedure interfaced with the game. Proving security amounts to proving that any adversary has at most a negligible advantage over a random player. An adversary is assumed to be efficient, i.e., it is modeled as a probabilistic polynomial-time (for short, PPT) function.

However, a unifying language for writing games is still missing. In this paper we show how Computational SLR [29] (for short, CSLR), a probabilistic lambda-calculus with a type system that guarantees that computations are probabilistic polynomial time, can be equipped with a notion of game indistinguishability. This allows us to define cryptographic constructions, effective adversaries, security notions, computational assumptions, game transformations, and game-based security proofs in the unified framework provided by CSLR.

Related work. Nowak has given a formal account of the game-based approach, and formalized it in the proof assistant Coq [24, 25]. He follows Shoup by modeling games directly as probability distributions, without going through a programming language. With this approach, he can machine-check game transformations, but not the complexity bound on the adversary. Previously, Corin and den Hartog had proposed a probabilistic Hoare logic [10] to formalize game-based proofs, but it suffers from the same limitation. This issue is addressed in [3], where the authors mention that their implementation includes tactics that can help establish that a program is PPT. Their approach is direct in the sense that polynomial-time computation is characterized by explicitly counting the number of computation steps. Backes et al. [2] are also working on a similar approach, with the addition of higher order, aimed at reasoning about oracles.

The above approaches are limited to the verification of cryptographic algorithms, and cannot deal with their implementations. This issue has been tackled by Affeldt et al. in [1] where, by adding a new kind of game transformation (so-called implementation steps), game-based security proofs can be conducted directly on implementations in assembly language. They have applied their approach to the verification of an assembly-language implementation of a pseudorandom bit generator (PRBG). However, they do not address the issue of uniform distributions. Indeed, because computers are based on binary digits, the cardinality of the support of a uniform distribution has to be a power of 2. Even at a theoretical level, the probabilistic Turing machines used in the definition of PPT choose random numbers only among sets whose cardinality is a power of 2 [14]. For any other cardinality, the uniform distribution can only be approximated, or must rely on code that might not terminate, although it terminates with probability 1 [18]. With arbitrary random choices, one can define more distributions than those allowed by the definition of PPT. This raises a fundamental concern that is usually overlooked by cryptographers.

Mitchell et al. have proposed a process calculus with bounded replication and messages, guaranteeing that those processes are computable in polynomial time [22]. Messages can be terms of OSLR, i.e., SLR with a random oracle [21]. Their calculus aims at being general enough to deal with cryptographic protocols, whereas we aim at a simpler calculus able to deal with cryptographic constructions. Blanchet and Pointcheval have implemented CryptoVerif, a semi-automatic tool for making game-based security proofs, also based on a process calculus. Courant et al. have proposed a specialized Hoare logic for analyzing generic asymmetric encryption schemes in the random oracle model [11]. In our work, we do not want to restrict ourselves to generic schemes. Impagliazzo and Kapron have proposed two logics for reasoning about cryptographic constructions [19]. The first one is based on a non-standard arithmetic model which, they prove, captures probabilistic polynomial-time computations. The second one is built on top of the first, with rules justifying computational indistinguishability. More recently, Zhang has developed a logic for computational indistinguishability on top of Hofmann's SLR [29].

Contributions. We propose to use CSLR [29] to conduct game-based security proofs. Because the only basic type in CSLR is the type for bits, our code for cryptographic constructions is closer to implementation than the code in related work: in particular, we address the issue of uniform distributions by using an algorithm that approximates them.

CSLR allows neither superpolynomial-time computations (i.e., computations that are not bounded above by any polynomial) nor arbitrary uniform choices. Although this restriction makes sense for the cryptographic constructions and the adversary, the game-based approach to cryptographic proofs does not preclude the possibility of introducing games that perform superpolynomial-time computations or that use arbitrary uniform distributions. They are just idealized constructions that are used to define security notions but are not meant to make their way into implementations. We thus extend CSLR into CSLR+, which allows for superpolynomial-time computations and arbitrary uniform choices. However, the cryptographic constructions and the adversary will be constrained to be terms of CSLR.

We propose a notion of game indistinguishability. Although it is not stronger than the notion of computational indistinguishability of [29], it is simpler to prove and well-suited for formalizing game-based security proofs. We show that this notion allows us to easily model security definitions and computational assumptions. Moreover, we show that computational indistinguishability implies game indistinguishability, so that we can reuse the equational proof system of [29] as is. We illustrate the usability of our approach by proving formally, in our proof system for CSLR, that an implementation in CSLR of the public-key encryption scheme ElGamal is semantically secure, and by formalizing the pseudorandom bit generator of Blum, Blum and Shub with the related security definition and computational assumption.

Compared with [2] and [3], our approach has the advantage that it can automatically prove (by type inference [16]) that a program is PPT [17].

Outline. We introduce CSLR in Section 2, and use it in Section 3 to define cryptographic constructions. We also discuss in Section 3 the problem of approximating uniform sampling from sets of arbitrary size using just fair coin tosses. In Section 4, we introduce the notion of game indistinguishability, which we use to define security notions (using higher order) and game transformations. We deal with the example of ElGamal in Section 5. In Section 6, we extend CSLR with superpolynomial-time primitives and arbitrary uniform choices to be used in intermediate games, and we illustrate this extension on the pseudorandom bit generator of Blum, Blum and Shub. Finally, we conclude in Section 7.

2 Computational SLR

Bellantoni and Cook have proposed to replace the model of Turing machines by their safe recursion scheme, which defines exactly the functions that are computable in polynomial time on a Turing machine [4]. This is an intrinsic, purely syntactic mechanism: variables are divided into safe variables and normal variables, and safe variables must be instantiated with values that are computed using only safe variables; recursion must take place on normal variables, and intermediate recursion results are never sent to normal variables. Where higher-order recursors are concerned, it is also required that step functions be linear, i.e., intermediate recursive results can be used only once in each step. Thanks to those syntactic restrictions, exponential-time computations are avoided. This is an elegant approach in the sense that polynomial-time computation is characterized without explicitly counting the number of computation steps.

Hofmann later developed a functional language called SLR to implement safe recursion [16, 17]. It provides a complete characterization, through typing, of the complexity class of polynomial-time computations. He introduces a type system with modality, to distinguish between normal variables and safe variables, and linearity, to distinguish between normal functions and linear functions. He proves that well-typed functions of a proper type are exactly the polynomial-time computable functions. Moreover, there is a type-inference algorithm that can automatically determine the type of any expression [16]. Mitchell et al. have extended SLR by adding a random-bit oracle to simulate the oracle tape of probabilistic Turing machines [21].

More recently, Zhang has introduced CSLR, a non-polymorphic version of SLR extended with probabilistic computations and a primitive notion of bitstrings [29]. His use of monadic types [23] allows for an explicit distinction in CSLR between probabilistic and purely deterministic functions. This was not possible with the extension by Mitchell et al. [21]. We recall below the definition of CSLR and its main property.

Types. Types are defined by:

τ, τ', ... ::= Bits | τ × τ' | □τ → τ' | τ → τ' | τ ⊸ τ' | Tτ


Bits is the base type for bitstrings. The monadic types Tτ capture probabilistic computations that produce a result of type τ. All other types are from Hofmann's SLR [17]. τ × τ' are cartesian product types. There are three kinds of function types: □τ → τ' are types for modal functions, with no restriction on the use of their argument; τ → τ' are types for non-modal functions, where the argument must be a safe value; τ ⊸ τ' are types for linear functions, where the argument can only be used once. Note that linear types are not necessary when we do not have higher-order recursors, which are themselves not necessary for characterizing PTIME computations but can ease and simplify the programming of certain functions (such as defining the Blum-Blum-Shub pseudorandom bit generator in Section 3).

SLR also has a subtyping relation <: between types. In particular, the subtyping relation between the three kinds of function types is: τ ⊸ τ' <: τ → τ' <: □τ → τ'. We also have Bits → τ <: Bits ⊸ τ, stating that bitstrings can be duplicated without violating linearity. CSLR inherits the subtyping relation of SLR, with an additional rule saying that the constructor T preserves subtyping [29].

Expressions. Expressions of CSLR are defined by the following grammar:

e1, e2, ... ::= x | nil | B0 | B1 | caseτ | recτ | λx.e | e1 e2
             | ⟨e1, e2⟩ | proj1(e) | proj2(e) | rand | return(e) | bind x ← e1 in e2

B0 and B1 are two constants for constructing bitstrings: if u is a bitstring, B0u (respectively, B1u) is the new bitstring with a bit 0 (respectively, 1) added at the left end of u. caseτ is the constant for case distinction: caseτ(n, (e, f0, f1)) tests the bitstring n and returns e if n is the empty bitstring, f0(n) if the first bit of n is 0, and f1(n) if the first bit of n is 1. recτ is the constant for recursion on bitstrings: recτ(e, f, n) returns e if n is empty, and f(n, recτ(e, f, n')) otherwise, where n' is the bitstring n with its first bit cut off. rand returns a random bit 0 or 1, each with probability 1/2. return(e) is the trivial (deterministic) computation, which returns e with probability 1. bind x ← e1 in e2 is the sequential computation which first runs the probabilistic computation e1, binds its result to the variable x, then computes e2. All other expressions are from Hofmann's SLR [17].
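To make the operational reading of caseτ and recτ concrete, here is a minimal Python sketch of the two constants, modelling bitstrings as Python strings of '0'/'1' (an encoding of ours, not part of CSLR syntax):

    def case(n, e, f0, f1):
        # case(n, (e, f0, f1)): e if n is empty, otherwise dispatch on the first bit
        if n == "":
            return e
        return f0(n) if n[0] == "0" else f1(n)

    def rec(e, f, n):
        # rec(e, f, n): e if n is empty, otherwise f(n, rec(e, f, tail(n)))
        if n == "":
            return e
        return f(n, rec(e, f, n[1:]))

For instance, rec("", lambda n, r: n[0] + r, "101") rebuilds its input bit by bit, mirroring the recursion scheme on which the sugared definitions below are built.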

To ease the reading of CSLR terms, we shall use some syntactic sugar and abbreviations in the rest of the paper:

– λ_. e represents λx. e when x does not occur as a free variable in e;

– x ←$ e1; e2 represents the probabilistic sequential computation bind x ← e1 in e2;

– x ← e1; e2 represents the deterministic sequential (call-by-value) computation (λx. e2) e1;

– if e then e1 else e2 represents a simple case distinction caseτ(e, (e2, λ_. e2, λ_. e1)), which tests the first bit of e: if it is 1 then e1 is executed, otherwise e2 is executed;

– when a program F is defined recursively by λn. recτ(e1, e2, n), we often write the definition as

F =def λn. if n ?= nil then e1 else e2(n, F(tail(n))),

where ?= and tail are respectively the equality test between two bitstrings and the function that removes the left-most bit from a bitstring. These functions can be defined in CSLR [29].

Type system. Typing assertions for expressions are of the form Γ ⊢ t : τ, where Γ is a typing context that assigns types and aspects (inherited from Hofmann's system) to variables. Intuitively, an aspect specifies how a variable can be used in the program. For instance, a linear aspect forces the variable to be used only once. A typing context is typically written as a list of bindings x1 :a1 τ1, ..., xn :an τn, where a1, ..., an are aspects. The type system of CSLR can be found in [29].

Operational semantics. We can define a reduction system for CSLR, and prove that every closed term has a canonical form. In particular, the canonical forms of type Bits are:

b ::= nil | B0 b | B1 b.

If u is a closed term of type Bits, we write |u| for its length. The length of a bitstring is defined on its canonical form b:

|nil| = 0,   |Bi b| = |b| + 1  (i = 0, 1).

If e is a closed program of type TBits and all possible results of e are of the same length, we write |e| for the length of its result bitstrings.

The language deals with bitstrings, but in many discussions of cryptography it is more convenient to see them as integers. We write b̂ for the integer value of the bitstring b.

Denotational semantics. The denotational semantics of CSLR is defined on a set-theoretic model [29]. We write B for the set of bitstrings, with a special element ε denoting the empty bitstring. To interpret probabilistic computations, we adopt the probabilistic monad defined in [26]: if A is a set, we write D(A) for the set of probability mass functions A → [0, 1] over A. The original monad in [26] is defined using measures instead of mass functions, and is of type (2^A → [0, ∞]) → [0, ∞], where 2^A denotes the set of all subsets of A, so that it can also represent probabilities over infinite data structures, not just discrete probabilities. For the sake of simplicity, in this paper as well as in [29], we work with mass functions instead of measures. Note that this monad is not the one defined in [21], which is used to keep track of the bits read from the oracle tape rather than to reason about probabilities.

When d is a mass function of D(A) and a ∈ A, we also write Pr[d = a] for the probability d(a). If there are finitely many elements in the support of d ∈ D(A), we can write d as {(a1, p1), ..., (an, pn)}, where ai ∈ A and pi = d(ai). When we restrict ourselves to finite distributions, our monad becomes identical to the one used in [24, 25].

With this monad, every computation type Tτ in CSLR is interpreted as D(⟦τ⟧), where ⟦τ⟧ is the interpretation of τ. Expressions are interpreted within an environment ρ which maps every free variable to an element of the corresponding type. In particular, the two computational constructions are interpreted as:

⟦return(e)⟧ρ = {(⟦e⟧ρ, 1)}

⟦x ←$ e1; e2⟧ρ = λv. Σ_{v' ∈ ⟦τ⟧} ⟦e2⟧ρ[x ↦ v'](v) × ⟦e1⟧ρ(v')

where τ is the type of x (i.e., Tτ is the type of e1). The interpretation of other types and expressions is given in [29].

The main property of CSLR [21, 29] is:

Theorem 1. The set-theoretic interpretations of closed terms of type Bits → TBits in CSLR define exactly the functions that can be computed by a probabilistic Turing machine in polynomial time.

This theorem implies that CSLR is expressive enough to model an adversary and to implement cryptographic constructions, as both are probabilistic polynomial-time functions. We remark that adversaries can return values of types other than Bits (e.g., tuples of bitstrings), but we can always define adversaries as PPT functions of type Bits → TBits by adopting some encoding of the different types of values into bitstrings, so the theorem still applies. The same is true in the case of functions with multiple arguments: we can uncurry them and then adopt some encoding so that the theorem still applies.

An example of a PPT function. Random bitstring generation is defined as follows:

rs =def λn. if n ?= nil then return(nil)
            else b ←$ rand; u ←$ rs(tail(n)); return(b • u)

where • denotes the concatenation of bitstrings, which can be programmed and typed in CSLR [29]. rs receives a bitstring and returns a uniformly random bitstring of the same length. It can be checked that ⊢ rs : Bits → TBits.
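A direct Python transcription of rs, under the same string encoding of bitstrings as above, could look as follows (random.choice stands in for the rand primitive):

    import random

    def rs(n: str) -> str:
        # return a uniformly random bitstring of the same length as n
        if n == "":
            return ""
        b = random.choice("01")   # b <-$ rand
        return b + rs(n[1:])      # b . rs(tail(n))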

3 Cryptographic constructions in CSLR

Uniform distributions are ubiquitous in cryptography. However, modern computers are based on binary digits, and thus in implementations the cardinality of the support of a uniform distribution has to be a power of 2. In the case of a different cardinality, such a distribution can be approximated by repeatedly selecting a random value in a larger distribution whose cardinality is a power of 2, until one obtains a value in the desired range or reaches the maximal number of allowed attempts (the timeout, which determines the precision of the approximation). In the latter case, a default value is returned. We implement this pseudo-uniform sampling in CSLR as follows:

zrand =def λn. λt. if t ?= nil then return(0^{|n|})
                   else v ←$ rs(n); if v ≥ n then zrand(n, tail(t)) else return(v)

The program takes two arguments: the sampling range (represented by the value n̂) and the timeout (represented by |t|). The test ≥ can be programmed in CSLR. The timeout is represented by the length of the bitstring t for the sake of simplicity and readability of the program, but an alternative representation using the integer value t̂ as the timeout is certainly acceptable.

The program zrand uses u = 2^⌈log₂ n̂⌉ as the cardinality of the larger distribution and samples from it. The probability that one sample falls outside the desired range is (u − n̂)/u, so the probability that |t| consecutive attempts fail is ((u − n̂)/u)^{|t|}. zrand returns the all-zero bitstring 0^{|n|} as the default value after |t| consecutive failures, so the probability that a value smaller than n̂ but other than 0^{|n|} is returned is (1 − ((u − n̂)/u)^{|t|}) / n̂, and the probability that 0^{|n|} is returned is (1 + (n̂ − 1) · ((u − n̂)/u)^{|t|}) / n̂.
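The rejection-sampling loop of zrand translates to the following Python sketch, working on integer values rather than bitstrings for readability (function and variable names are ours):

    import secrets

    def zrand(n: int, timeout: int) -> int:
        # approximate a uniform draw from {0, ..., n-1} with fair coin tosses:
        # sample bitstrings and reject out-of-range values, returning the
        # default 0 after `timeout` consecutive failures
        bits = max(n - 1, 1).bit_length()    # u = 2**bits is the enlarged range
        for _ in range(timeout):
            v = secrets.randbits(bits)       # v <-$ rs(n)
            if v < n:
                return v
        return 0

For n = 5 and a timeout of 40 attempts, the residual bias toward 0 is ((8 − 5)/8)^40, on the order of 10^-17, matching the analysis above.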

Similarly, a finite group can be encoded in CSLR, and multiplication and group exponentiation can be programmed (as implied by Theorem 1). In the sequel, we shall write Zq (q a bitstring) for the set of bitstrings (of the same length as q) whose integer values are in {0, 1, ..., q̂ − 1}, and Z$_q for the truly uniform distribution over Zq.

The public-key encryption scheme ElGamal. Let G be a finite cyclic group of order q̂ (depending on the security parameter η) and γ ∈ G be a generator. The ElGamal encryption scheme [13] can be implemented in CSLR by the following programs:

– Key generation:

KG =def λη. x ←$ zrand(q, η); return(⟨γ^x, x⟩)


KG is of type Bits → T(Bits × Bits).

– Encryption:

Enc =def λη. λpk. λm. y ←$ zrand(q, η); return(⟨γ^y, pk^y * m⟩)

Enc is of type Bits → Bits → Bits → T(Bits × Bits).

– Decryption:

Dec =def λη. λsk. λc. proj2(c) * (proj1(c)^sk)^{-1}

Dec is of type Bits → Bits → Bits → Bits, which involves no monadic type because decryption is deterministic.

Note that when encoding cryptographic constructions in CSLR, we put the security parameter η explicitly as an argument of the programs. However, as we work on bitstrings in CSLR, the security parameter of traditional cryptographic contexts actually corresponds to |η| here. In the case of ElGamal encryption, the group order q̂ is determined by η. In particular, for the encryption scheme to be semantically secure, we must choose a suitable group in which the DDH assumption holds, and its order will necessarily be exponential in |η|. There are efficient algorithms which compute a suitable DDH group given η, and they can hence be programmed in CSLR [8].

In the implementation of KG and Enc, the security parameter η is used directly as the timeout of zrand. A more general implementation would instantiate the timeout with a polynomial in |η|, i.e., zrand(q, p(η)) where p is a well-typed SLR function of type Bits → Bits. The choice of p affects the final distribution of the program and consequently the advantage of adversaries in security games or experiments, but that advantage remains negligible. It is possible to use CSLR to deal with exact security, and the exact timeout given by p is necessary in that case. In this paper, we use the specific timeout for the sake of clarity.

The Blum-Blum-Shub pseudorandom bit generator. The BBS generator defined in [7] is a deterministic function and can be programmed in CSLR as follows:

BBS =def λη. λl. λs. bbsrec(η, l, s² mod n)

where bbsrec is defined recursively as

bbsrec =def λη. λl. λx. if l ?= nil then nil else parity(x) • bbsrec(η, tail(l), x² mod n)

and n is determined by the security parameter η. BBS is a well-typed SLR function of type □Bits → Bits → Bits → Bits, with the second argument giving the length of the resulting pseudorandom bitstring and the third argument being the seed.
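Read operationally, BBS squares the seed modulo n and emits the parity of each intermediate value. A Python sketch with a toy Blum integer (our own choice, far too small for real use) is:

    N = 11 * 19   # hypothetical Blum integer: both primes are 3 mod 4

    def bbs(length: int, seed: int) -> str:
        # x_0 = s^2 mod n; each round outputs parity(x) and squares x
        x = (seed * seed) % N
        out = []
        for _ in range(length):
            out.append(str(x & 1))   # parity(x)
            x = (x * x) % N          # x^2 mod n
        return "".join(out)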

4 Game indistinguishability

In game-based proofs, an adversary involved in a game can be an arbitrary probabilistic polynomial-time program; hence it can be encoded as a CSLR program of type Bits → Tτ, where the security parameter bounds its running time and τ is the type of messages returned by the adversary. In CSLR, a game is a closed higher-order CSLR function of type Bits → (Bits → Tτ) → TBits that returns one bit.

 


 

Definition 1 (Game indistinguishability). Two CSLR games g1 and g2 are game indistinguishable (written g1 ≈ g2) if for every closed CSLR term A of type Bits → Tτ and every positive polynomial P, there exists some N ∈ ℕ such that for all bitstrings η with |η| > N,

|Pr[⟦g1(η, A)⟧ = 1] − Pr[⟦g2(η, A)⟧ = 1]| < 1 / P(|η|).

We introduce the notion of game indistinguishability primarily for representing game transformations in game-based security proofs. A more general notion of computational indistinguishability in cryptography has been defined in the original CSLR system [29].

Definition 2 (Computational indistinguishability [29]). Two CSLR terms f1 and f2, both of type Bits → τ, are computationally indistinguishable (written f1 ≃ f2) if for every closed CSLR term A of type Bits → τ → TBits and every positive polynomial P, there exists some N ∈ ℕ such that for all bitstrings η with |η| > N,

|Pr[⟦A(η, f1(η))⟧ = ε] − Pr[⟦A(η, f2(η))⟧ = ε]| < 1 / P(|η|),

where ε denotes the empty bitstring.

This definition is a reformulation, in CSLR, of Definition 3.2.2 of [14]. In particular, a CSLR term of type Bits → Tτ defines a so-called probabilistic ensemble.

Intuitively, the difference between the two notions of indistinguishability is that computational indistinguishability allows any arbitrary use of the compared terms by the adversary, while game indistinguishability provides more control over the adversary, as is usual in game-based security definitions. Hence, game indistinguishability is no stronger than computational indistinguishability, as proved in the following proposition. This is why we can sometimes use the CSLR proof system, which is designed for proving computational indistinguishability, to prove game indistinguishability.
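The quantity bounded in Definition 1 can also be estimated empirically. The following Python harness (purely illustrative scaffolding of ours, with hypothetical callables g1, g2 and adversary, where each game returns a 0/1 result) Monte-Carlo-estimates the adversary's distinguishing advantage:

    def advantage(g1, g2, adversary, eta, trials: int = 100_000) -> float:
        # estimate |Pr[g1(eta, A) = 1] - Pr[g2(eta, A) = 1]|
        wins1 = sum(g1(eta, adversary) for _ in range(trials))
        wins2 = sum(g2(eta, adversary) for _ in range(trials))
        return abs(wins1 - wins2) / trials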

Proposition 1. Computational indistinguishability implies game indistinguishability.

Proof. Let g1 and g2 be two arbitrary games of type Bits → (Bits → Tτ) → TBits such that g1 ≃ g2. For every adversary A of type Bits → Tτ, construct the following adversary A':

λη. λg. b ←$ g(A); if b ?= 1 then return(nil) else return(0)

Clearly, Pr[⟦A'(η, gi(η))⟧ = nil] = Pr[⟦gi(η, A)⟧ = 1], and because g1 and g2 are computationally indistinguishable, |Pr[⟦A'(η, g1(η))⟧ = nil] − Pr[⟦A'(η, g2(η))⟧ = nil]| is negligible. ⊓⊔

We will also use the program equivalence defined in [29]. Roughly speaking, two terms e1 and e2 are equivalent (written e1 ≡ e2) if they have the same denotational semantics in any environment. Our further development in CSLR also relies on the following lemma about zrand:

Lemma 1. Let q be a CSLR bitstring. The probabilistic ensemble ⟦λη. zrand(q, η)⟧ and the ensemble of truly uniform distributions Z$_q are computationally indistinguishable, i.e., for every closed CSLR term A of type Bits → τ → TBits and every positive polynomial P, there exists some N ∈ ℕ such that for all bitstrings η with |η| > N,

|Pr[⟦A(η, zrand(q, η))⟧ = ε] − Pr[⟦A(η)⟧(Z$_q) = ε]| < 1 / P(|η|).

 


 

Proof. We show that the two ensembles are statistically close:

(1/2) · Σ_{v ∈ Zq} |Pr[⟦zrand(q, η)⟧ = v] − Pr[Z$_q = v]|
  = (1/2) · ( |(1 + (q̂ − 1) · ε)/q̂ − 1/q̂| + (q̂ − 1) · |(1 − ε)/q̂ − 1/q̂| )
  = ((q̂ − 1)/q̂) · ε,

which is negligible with respect to |η|, where ε = ((u − q̂)/u)^{|η|} and u = 2^⌈log₂ q̂⌉. We can then conclude because statistical closeness implies computational indistinguishability (cf. Section 3.2.2 of [14]). ⊓⊔

4.1 Security notions

Security notions can be defined in terms of game indistinguishability. We show how to use it to define some common security notions in cryptography.

Semantic security. A public-key encryption scheme (KG, Enc, Dec) is said to be semantically secure [15] if:

λη. λA. (pk, sk) $ KG(η);

(m0, m1, ') $(η, pk);

b $ rand;

c  Enc(η, mb, pk);

$

b' $'(c); return(b' =? b)  λη . λ . rand


where A and A' are functions of respective types Bits → τk → T(τm × τm × (τe → TBits)) and τe → TBits. Here τk, τe and τm are the respective types of public keys, ciphertexts and plaintexts, which can be tuples of bitstrings that are distinguished in the language. Roughly speaking, the definition means that any adversary A playing the semantic security game (the left-side game) cannot do significantly better than a random player (the right-side game). The semantic security game is to be read as follows: a pair (pk, sk) of public and secret keys is generated; the public key pk is passed to the adversary A, which returns two messages m0, m1 and a function A', which can be seen as the continuation of the adversary A and contains any information that A has already obtained; one of the messages, mb, is selected at random and encrypted with the public key pk; the obtained ciphertext c is then passed to the function A', which returns its guess b' for the selected message; the result of the game is whether the adversary is right or not.
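Concretely, the left-side game can be run against the toy ElGamal sketch from Section 3. The harness below is our own scaffolding (the adversary is assumed to return two messages and a continuation) and returns whether the adversary guessed the challenge bit:

    import secrets

    def semantic_security_game(adversary) -> bool:
        pk, sk = keygen()                    # (pk, sk) <-$ KG(eta)
        m0, m1, cont = adversary(pk)         # adversary picks messages and A'
        b = secrets.randbits(1)              # b <-$ rand
        c = encrypt(pk, m1 if b else m0)     # c <-$ Enc(eta, pk, m_b)
        return cont(c) == b                  # b' ?= b

A random player, e.g. adversary = lambda pk: (1, 2, lambda c: secrets.randbits(1)), wins with probability 1/2, which is the baseline the definition compares against.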

Left-bit unpredictability. An SLR function F is left-bit unpredictable if:


λη. λA. s $ zrand (q, η); u  F(η, s);

b $(η, tail(u)); return(b ?= head(u)) λη. λA. rand (1)


where A is of type Bits → Bits → TBits. Roughly speaking, it means that any adversary A playing the unpredictability game (the left-side game) cannot do significantly better than a random player (the right-side game). The left-bit unpredictability game is to be read as follows: a seed s is selected at random in a set of cardinality q̂; the function F is then used to compute a pseudorandom sequence of bits u of size l(q) > q, where l is a polynomial; the sequence u minus its first bit is passed to the adversary A, which returns its guess b for the first bit; the result of the game is whether the adversary is right or not. It was proved by Yao in [28] that left-bit unpredictability is equivalent to passing all polynomial-time statistical tests.
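With the toy bbs generator from Section 3, game (1) can be sketched as the following harness (again our own scaffolding; the adversary sees all output bits but the first and guesses that bit):

    import secrets

    def left_bit_game(adversary, length: int = 64) -> bool:
        seed = secrets.randbelow(N)      # s <-$ zrand(q, eta), roughly
        u = bbs(length, seed)            # u <- F(eta, s)
        guess = adversary(u[1:])         # b <-$ A(eta, tail(u))
        return guess == u[0]             # b ?= head(u)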

A notion of next-bit unpredictability was defined in [29], but it is based on sampling from bitstrings of a given length. We can generalize this notion and obtain another notion of left-bit unpredictability, which we shall refer to as strong left-bit unpredictability because it implies the game-based notion of left-bit unpredictability (1). An SLR function F is strongly left-bit unpredictable if

λ,7. s $ zrand (q, ,7); return(F (,7, s)) ' λ,7 . s $ zrand (q, ,7);

b $ rand;

return(btail (F (,7, s)))


Proposition 2. Strong left-bit unpredictability implies left-bit unpredictability.

Proof. The proof can be done using the CSLR proof system. See Figure 1 for details.

λη. λA. s $ zrand (q,η); u  F(η, s); b $A(η, tail(u)); return(b ?= head(u))

s $ !

 λη . λ. u $   zrand(q, η); ; b $ (η, tail(u)); return(b =? head(u))

return(F(η, s))

(By rules AX-BIND-3 and AX-BIND-1 of [29])

0

λη. λ. u$ B @ s  zrand(q, η);

$

b' $ rand;

return(b'tail (F(η, s))) 1

A; b $

C (η, tail(u)); return(b ?= head(u))

(By strong left-bit unpredictability)

λη. λ. s $ zrand (q,η); b' $ rand; u  b'tail (F(η, s)); b $(η, tail(u));

return(b ?= head(u))

(By rules AX-BIND-3 and AX-BIND-1 of [29])

λη. λ. b' $ rand; s $ zrand (q,η); u  b'tail (F(η, s)); b $(η, tail(u));

return(b ?= b')

(By rules AX-BIND-3 of [29])

 λη . λ . rand

(By Lemma 2)

Fig. 1. Proof of Proposition 2

4.2 Game transformations

Game transformations consist in rewriting modulo the game indistinguishability relation or computational indistinguishability. In particular, we reuse the equational proof system of [29] as is for game transformations.

We will also need some intermediate lemmas. Those lemmas state basic game transformations used in almost all game-based proofs. The first one states that an expression e which does not depend on a random bit b cannot guess this bit b.


 

Lemma 2. If Γ ⊢ e : TBits and, for all definable ρ ∈ ⟦Γ⟧, the domain of the distribution ⟦e⟧ρ is {0, 1}, then

b ←$ rand; x ←$ e; return(x ?= b) ≡ rand

where x, b ∉ dom(Γ).

Proof. We denote by e' the program on the left-hand side. For every definable ρ ∈ ⟦Γ⟧, ⟦e'⟧ρ = {(0, p0), (1, p1)}, where

p0 = Pr[⟦rand⟧ρ ≠ ⟦e⟧ρ] = (1/2) · Pr[⟦e⟧ρ = 1] + (1/2) · Pr[⟦e⟧ρ = 0] = 1/2

p1 = Pr[⟦rand⟧ρ = ⟦e⟧ρ] = (1/2) · Pr[⟦e⟧ρ = 0] + (1/2) · Pr[⟦e⟧ρ = 1] = 1/2

hence e' ≡ rand. ⊓⊔

The second lemma allows for a simplification when the semantics of a subexpression is a permutation.

Lemma 3. Let f, f' be two closed CSLR terms of type Bits → Bits such that ⟦f⟧ is a permutation over B, and, for every bitstring q, ⟦f'⟧ is a permutation over {⟦f⟧(v) | v ∈ Zq}. It holds that

λη. x ←$ zrand(q, η); return(f x) ≃ λη. x ←$ zrand(q, η); return(f'(f x))

Proof. Let e1, e2 denote the two programs on the left-hand and right-hand side respectively. Then for a given bitstring η, ⟦ei⟧(η) are two distributions over bitstrings, and dom(⟦e2⟧(η)) = {⟦f⟧(v) | v ∈ Zq} = dom(⟦e1⟧(η)) since ⟦f'⟧ is a permutation over dom(⟦e1⟧(η)). For every CSLR adversary A of type Bits → TBits → TBits, define two new adversaries

A1 =def λη. λw. A(η, x ←$ w; return(f x))

A2 =def λη. λw. A(η, x ←$ w; return(f'(f x))).

Clearly, both A1 and A2 are well-typed CSLR adversaries, and ⟦A(η, ei(η))⟧ = ⟦Ai(η, zrand(q, η))⟧ (i = 1, 2). According to Lemma 1,

εi = |Pr[⟦Ai(η, zrand(q, η))⟧ = ε] − Pr[⟦Ai(η)⟧(Z$_q) = ε]|

(i = 1, 2) are negligible. Also, by Lemma 3.1 of [25], ⟦A1(η)⟧(Z$_q) = ⟦A2(η)⟧(Z$_q) as ⟦f'⟧ is a permutation. Hence,

|Pr[⟦A(η, e1(η))⟧ = ε] − Pr[⟦A(η, e2(η))⟧ = ε]|
= |Pr[⟦A1(η, zrand(q, η))⟧ = ε] − Pr[⟦A1(η)⟧(Z$_q) = ε] − (Pr[⟦A2(η, zrand(q, η))⟧ = ε] − Pr[⟦A2(η)⟧(Z$_q) = ε])|
≤ ε1 + ε2

is still negligible. ⊓⊔

 


 

5 Applications

5.1 Computational assumptions

Computational assumptions can be defined in CSLR too. As in the case of defining the ElGamal encryption scheme in CSLR, we have to replace all occurrences of uniform distributions by calls to the function zrand.

Decisional Diffie-Hellman assumption. Let q be a bitstring depending on the security parameter η, G be a finite cyclic group of order q̂, and γ ∈ G be a generator. The Decisional Diffie-Hellman (DDH) assumption [12] states, roughly speaking, that no efficient algorithm can distinguish between triples of the form (γ^x, γ^y, γ^xy) and (γ^x, γ^y, γ^z), where x, y and z are random numbers such that 0 ≤ x, y, z < q̂.³ DDH cannot be written directly in CSLR because it involves arbitrary uniform distributions. Instead we write the following assumption, which we call DDH-Bits:

DDHB_L ≃ DDHB_R

where

DDHB_L =def λη. x ←$ zrand(q, η); y ←$ zrand(q, η); return(⟨γ^x, γ^y, γ^xy⟩)

DDHB_R =def λη. x ←$ zrand(q, η); y ←$ zrand(q, η); z ←$ zrand(q, η); return(⟨γ^x, γ^y, γ^z⟩)

Proposition 3. DDH-Bits holds when the DDH assumption holds.

Proof. By Lemma 1, the ensemble ⟦λη. zrand(q, η)⟧ is computationally indistinguishable from the truly uniform ensemble Z$_q. Replacing, one at a time, the uniform choices of x, y and z in the DDH triples by calls to zrand therefore changes the success probability of any CSLR adversary only negligibly. Hence a CSLR adversary distinguishing DDHB_L from DDHB_R with non-negligible advantage would yield a distinguisher for the original DDH triples, contradicting the DDH assumption. ⊓⊔

³ We do not assume that q̂ is prime. However, most groups in which DDH is believed to hold have prime order [8].


 

5.2 Semantic security of the ElGamal encryption scheme

In this section, we illustrate our proof system by proving the semantic security of the ElGamal encryption scheme in Fig. 2. The proof follows the same structure as the one in [24], but here the type system of CSLR guarantees that the adversary is probabilistic polynomial-time, which was not dealt with in [24]. Moreover, here all transformations are purely syntactic (thus allowing the immediate prospect of being implemented in a tool), while in [24] they were done at the semantic level.

Note that by using Lemma 3, we assume that the adversary A will not send any junk messages, i.e., bitstrings that are not elements of the group. This is considered a trivial case in cryptographic proofs, because the ElGamal encryption procedure will automatically reject junk messages. But in practice, in more complex cryptosystems, this may not be trivial at all. In our proof system, we can also consider the case where adversaries may send junk messages: it suffices to provide the corresponding code in the program Enc to test the validity of incoming messages, and we can still prove semantic security in the CSLR proof system. Another possibility would be to use a richer type system to reject adversaries returning junk.

6 Extending CSLR

The discussion in the previous sections was limited to the setting of CSLR with bitstrings. In particular, it allows neither superpolynomial-time computations nor arbitrary uniform sampling. Although these restrictions make sense for the cryptographic constructions and the adversary, the game-based approach to cryptographic proofs does not preclude the possibility of introducing games that perform superpolynomial-time computations or that use arbitrary uniform distributions. They are just idealized constructions that are used to define security notions but are not meant to make their way into implementations.

In this section, we extend CSLR into CSLR+ so that we can manipulate games with superpolynomial-time computations and arbitrary uniform choices.

6.1 CSLR+

CSLR+ extends CSLR with a uniform sampling primitive sample of type Bits ⊸ TBits and constants for primitive (and possibly superpolynomial-time) computations. sample receives a bitstring as argument and returns uniformly a random bitstring of the same length whose integer value is strictly smaller than that of the argument. For instance, the distribution produced by sample(101) is ⟦sample(101)⟧ = {(000, 1/5), ..., (100, 1/5)}. We can program sampling from an arbitrary finite set (of CSLR-definable elements, usually just bitstrings in cryptography) using sample, assuming that there is an index function over the set, but we shall omit the implementation details and write x ←$ A; for assigning to x a uniformly sampled value from the set A.
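In contrast with the approximating zrand of Section 3, sample is an exact uniform choice; in a Python sketch it is simply:

    import secrets

    def sample(n: int) -> int:
        # exact uniform draw from {0, ..., n-1}; e.g. sample(5) matches
        # the distribution of sample(101) given above
        return secrets.randbelow(n)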

The type system of CSLR+ is extended with only the appropriate rules for sample and the constants. Note that the type of sample is Bits ⊸ TBits, so that it can accept arguments that are defined using linear resources. In fact, in CSLR+ we no longer care about the complexity class that can be characterized using the type system:⁴ CSLR+ is the language for describing games, not adversaries.

⁴ One might expect that the complexity class characterized by CSLR+ is PPT^X, where X is the smallest complexity class in which the additional constants can be defined, but the exact relation between CSLR+ and the complexity classes remains to be clarified; the addition of the primitive sample alone already allows for defining more distributions than in PPT.

 


 

λη. λA. ⟨pk, sk⟩ ←$ KG(η); ⟨m0, m1, A'⟩ ←$ A(η, pk);
        b ←$ rand; c ←$ Enc(η, pk, mb); b' ←$ A'(c);
        return(b ?= b')

≈ λη. λA. ⟨pk, sk⟩ ←$ (x ←$ zrand(q, η); return(⟨γ^x, x⟩)); ⟨m0, m1, A'⟩ ←$ A(η, pk);
          b ←$ rand; c ←$ (y ←$ zrand(q, η); return(⟨γ^y, pk^y * mb⟩)); b' ←$ A'(c);
          return(b ?= b')
    (inlining the definitions of KG and Enc)

≈ λη. λA. x ←$ zrand(q, η); y ←$ zrand(q, η); b ←$ rand;
          ⟨m0, m1, A'⟩ ←$ A(η, γ^x); b' ←$ A'(⟨γ^y, (γ^x)^y * mb⟩);
          return(b ?= b')
    (by the equivalence rules AX-BIND-3 and AX-BIND-1 in [29])

≈ λη. λA. v ←$ DDHB_L(η); b ←$ rand; ⟨m0, m1, A'⟩ ←$ A(η, proj1(v));
          b' ←$ A'(⟨proj2(v), proj3(v) * mb⟩);
          return(b ?= b')
    (inlining DDHB_L)

≈ λη. λA. v ←$ DDHB_R(η); b ←$ rand; ⟨m0, m1, A'⟩ ←$ A(η, proj1(v));
          b' ←$ A'(⟨proj2(v), proj3(v) * mb⟩);
          return(b ?= b')
    (by the DDH-Bits assumption and SUB)

≈ λη. λA. x ←$ zrand(q, η); y ←$ zrand(q, η); z ←$ zrand(q, η); b ←$ rand;
          ⟨m0, m1, A'⟩ ←$ A(η, γ^x); b' ←$ A'(⟨γ^y, γ^z * mb⟩);
          return(b ?= b')
    (inlining DDHB_R)

≈ λη. λA. x ←$ zrand(q, η); y ←$ zrand(q, η); b ←$ rand; ⟨m0, m1, A'⟩ ←$ A(η, γ^x);
          v' ←$ (z ←$ zrand(q, η); return(γ^z * mb)); b' ←$ A'(⟨γ^y, v'⟩);
          return(b ?= b')
    (by the equivalence rules AX-BIND-3 and AX-BIND-1 in [29])

≈ λη. λA. x ←$ zrand(q, η); y ←$ zrand(q, η); b ←$ rand; ⟨m0, m1, A'⟩ ←$ A(η, γ^x);
          v' ←$ (z ←$ zrand(q, η); return(γ^z)); b' ←$ A'(⟨γ^y, v'⟩);
          return(b ?= b')
    (by Lemma 3, since (· * mb) is a permutation over the group when mb is also a group element)

≈ λη. λA. b ←$ rand; x ←$ zrand(q, η); y ←$ zrand(q, η); z ←$ zrand(q, η);
          ⟨m0, m1, A'⟩ ←$ A(η, γ^x); b' ←$ A'(⟨γ^y, γ^z⟩);
          return(b ?= b')
    (by the equivalence rules AX-BIND-3 and AX-BIND-1 in [29])

≈ λη. λA. rand
    (by Lemma 2)

Fig. 2. Proof of semantic security of ElGamal

 


 


 

The definitions of computational indistinguishability and game indistinguishability are almost the same as before, except that we are now considering distributions that are produced by CSLR+ programs:

Definition 3 (Game indistinguishability in CSLR+). Two closed CSLR+ programs g1 and g2, both of type Bits → (Bits → Tτ) → TBits, are game indistinguishable (written g1 ≈+ g2) if for every closed CSLR term A of type Bits → Tτ, and every positive polynomial P, there exists some N ∈ ℕ such that for all bitstrings η with |η| > N,

|Pr[⟦g1(η, A)⟧ = 1] − Pr[⟦g2(η, A)⟧ = 1]| < 1 / P(|η|).

Definition 4 (Computational indistinguishability in CSLR+). Two CSLR+ terms f1 and f2, both of type Bits → τ, are computationally indistinguishable (written f1 ≃+ f2) if for every closed CSLR term A of type Bits → τ → TBits and every positive polynomial P, there exists some N ∈ ℕ such that for all bitstrings η with |η| > N,

|Pr[⟦A(η, f1(η))⟧ = ε] − Pr[⟦A(η, f2(η))⟧ = ε]| < 1 / P(|η|),

where ε denotes the empty bitstring.

CSLR+ inherits most of the equational proof system of CSLR. All the rules for program equivalence in CSLR can be used directly in CSLR+. No extra rules are needed for the primitive sample, but we can add rules for constants if necessary. The four rules for proving computational indistinguishability remain the same as in CSLR (Figure 3), except that in the rule SUB a new premise enforces that the substitution context (the term e) must be definable in CSLR, i.e., that it is a program containing neither sample nor any CSLR+ constant. The soundness of the system still holds, and the proof goes just as for CSLR [29]. In particular, the proof for the rule SUB contains a construction of a new adversary from the context, which remains a CSLR term (i.e., a PPT adversary) thanks to the new premise enforcing that the context must be definable in CSLR.

 ei : Bits  τ (i = 1, 2) e1 + e2

EQUIV

e1 + e2

 ei : Bits  τ (i = 1, 2, 3) e1 + e2 e2 + e3

TRANS-INDIST

e1 + e3

x :° Bits, y :° τ ` e : τ0 e is definable in CSLR ` ei : Bits  τ (i = 1, 2) e1 '+e2

SUB

λx . e[e1(x)/y] '+ λx . e[e2(x)/y]

x :° Bits, n :° Bits  e : τ λn.e[u/x] is numerical for all bitstring u

λx . e[i(x)/n] '+ λx . e[B1i(x)/n] for all canonical polynomial i such that i < p

H-IND λx . e[nil/n] '+ λx . e[p(x)/n]

Fig. 3. Rules for computational indistinguishability in CSLR+


Note that the rule H-IND is not used in this paper, but it is an important rule representing the hybrid proof technique that is frequently used in cryptography. Interested readers can find more detailed explanations and examples in [29].


 

6.2 Applications

The extension to CSLR+ allows us to express DDH directly in the formalism, and thus we no longer need the non-standard computational assumption introduced in Section 5. We can reproduce almost verbatim the proof of semantic security for ElGamal given in [24]. The difference is that now we can check automatically that the adversary built in the proof is PPT, and all transformations are purely syntactic.

We can also reproduce the proof of unpredictability for the pseudorandom bit generator of Blum, Blum and Shub given in [25]. The proof requires a test for quadratic residuosity, which is a superpolynomial-time computation; it can be introduced into CSLR+ as a constant. Moreover, this proof is based on the Quadratic Residuosity Assumption, which uses arbitrary uniform choices.

Quadratic Residuosity Assumption. Let n be a positive number and Zn be the set of integers modulo n. The multiplicative group of Zn is written Z*_n and consists of the subset of integers modulo n which are coprime with n. An integer x ∈ Z*_n is a quadratic residue modulo n iff there exists a y ∈ Z*_n such that y² = x (mod n). Such a y is called a square root of x modulo n. We write Z*_n(+1) for the subset of integers in Z*_n with Jacobi symbol equal to 1. The quadratic residuosity problem is the following: given an odd composite integer n, decide whether or not an x ∈ Z*_n(+1) is a quadratic residue modulo n. The quadratic residuosity assumption (QRA) states that the above problem is intractable when n is the product of two distinct odd primes [20]. We reformulate the assumption in CSLR+:

λη. λA. x $ Z* n(+1); b $A(η, n, x); return(b =? qr(x)) + λη. λA. rand

where A must be definable in CSLR of type Bits → Bits → Bits → TBits, qr(x) is the quadratic residuosity test of the element x of Z*n in our encoding, and n is an expression that depends on the security parameter η.
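To fix intuitions, the following Python sketch plays the same game with toy parameters (the helper names are invented for this illustration; this is not the paper's encoding). Note the two ingredients the extension provides: the uniform choice from Z*n(+1) is realized by rejection sampling, which is the role the sample primitive plays, and the reference answer qr(x) is computed here with the trapdoor factorization p, q, standing in for a superpolynomial-time constant.

```python
# Sketch of the QRA game with toy parameters (not the paper's encoding).
import math
import random

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by the standard binary algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

def sample_star_plus1(n):
    """Uniform element of Z*_n with Jacobi symbol +1, by rejection sampling."""
    while True:
        x = random.randrange(1, n)
        if math.gcd(x, n) == 1 and jacobi(x, n) == 1:
            return x

def qra_game(adversary, p, q, trials=20_000):
    n = p * q
    wins = 0
    for _ in range(trials):
        x = sample_star_plus1(n)
        # Euler's criterion with the trapdoor primes stands in for qr(x).
        is_qr = pow(x, (p - 1) // 2, p) == 1 and pow(x, (q - 1) // 2, q) == 1
        wins += adversary(n, x) == is_qr
    return wins / trials            # QRA: ~1/2 for every efficient adversary

print(qra_game(lambda n, x: random.random() < 0.5, p=1019, q=1031))
```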

Blum-Blum-Shub. CSLR+ is expressive enough to encode the proof of [25] that BBS is left-bit unpredictable: for every positive integer l,

λη. λA. s $ Z* q; uBBS (η,l+ 1, s); + λη.λA. rand b $A(η,q,tail(u));return(b ?= head(u))

where A must be definable in CSLR of type Bits → Bits → Bits → TBits.
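For concreteness, here is a Python sketch of the generator itself (toy modulus and names invented for this illustration; the left-bit unpredictability statement above concerns predicting head(u) from tail(u)):

```python
# Blum-Blum-Shub with a toy modulus n = p*q, p = q = 3 (mod 4); real use
# requires large random Blum primes. Each state is the square of the
# previous one mod n; each output bit is the parity of the state.
import math
import random

def bbs(seed, l, n):
    s = seed * seed % n          # start inside the group of quadratic residues
    bits = []
    for _ in range(l):
        bits.append(s & 1)
        s = s * s % n
    return bits

p, q = 1019, 1031                # toy Blum primes (both congruent to 3 mod 4)
n = p * q
seed = random.randrange(2, n)
while math.gcd(seed, n) != 1:    # the seed must be coprime with n
    seed = random.randrange(2, n)
print(bbs(seed, 16, n))
```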

7 Conclusions

We have shown how Zhang’s CSLR can be equipped with a notion of game indistinguishability. The system allows us to define cryptographic constructions, effective adversaries, security notions, computational assumptions, game transformations, and game-based security proofs in the unified framework provided by CSLR. We have illustrated our calculus by formalizing the proof of semantic security for a binary implementation of the public-key encryption scheme ElGamal.

CSLR pushes users to write binary encodings of cryptographic constructions, which are close to their computer implementations, but the programming overhead is probably heavy for people who want to check their cryptographic proofs in the mathematical setting only. Also, the lack of superpolynomial-time computation power limits the application of CSLR in cryptography. We have thus introduced CSLR+, an extension of CSLR with arbitrary uniform sampling and superpolynomial-time constants, and formalized the pseudorandom bit generator of Blum, Blum and Shub with the related security definition and computational assumption. CSLR+ keeps the feature of characterizing PPT adversaries through typing in CSLR but allows users to write security games in a richer language, which is closer to the mathematical language and reduces the programming overhead. As future work, it might be interesting to allow arbitrary types in CSLR+, because intermediate games might be easier to write without having to encode everything into bitstrings.

The most immediate direction for future work is to consider more complex examples. We could also consider an implementation of ElGamal that would use BBS as a source of pseudorandom bits. Another possible direction would be to implement CSLR and CSLR+ (possibly in a proof assistant) and develop a library of reusable security definitions, assumptions, and game transformations. This would help in dealing with complex examples.

The notion of oracle is frequently used in cryptography and it is sometimes necessary for defining security notions. For instance, with symmetric keys, an encryption oracle allows the attacker to encrypt messages without knowing the key. The higher-order nature of CSLR makes it easy to define such oracles. As an example, we can define the notion of IND-CPA (indistinguishability under chosen-plaintext attack) using encryption oracles: an encryption scheme (KG, Enc, Dec) is said to be IND-CPA secure if


λη. λA. (pk, sk) $← KG(η); b $← rand;
         O ← λ(m0, m1). Enc(η, mb, pk);
         b′ $← A(η, pk, O);
         return(b =? b′)
   ≃   λη. λA. rand


This is a CSLR reformulation of the definition from [5] (adapted for asymmetric encryption): the game first generates a pair of public and secret keys and a challenge bit b, then sets up a left-right encryption oracle which, upon receiving a pair of messages, will encrypt one of them (according to the challenge bit) using the public key and return the resulting ciphertext; the public key is passed to the adversary, who is allowed to query the encryption oracle; in CSLR the oracle can be encoded as a function and passed to the adversary as an argument, just like the public key; the adversary then outputs its guess of the challenge bit b.
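A Python sketch of this game shape makes the higher-order reading explicit: the left-right oracle is just a closure over b and pk that is handed to the adversary. The scheme and all names below are toy stand-ins invented for this illustration, not the paper's or [5]'s constructions.

```python
# IND-CPA game with the oracle passed as a function, mirroring the
# higher-order encoding described above. The scheme is a toy stand-in:
# it XORs the message with a fresh uniform pad, so ciphertexts reveal
# nothing about the challenge bit and any adversary wins with ~1/2.
import random

BITS = 16

def ind_cpa_game(keygen, enc, adversary, trials=20_000):
    wins = 0
    for _ in range(trials):
        pk, sk = keygen()
        b = random.randrange(2)
        oracle = lambda m0, m1: enc((m0, m1)[b], pk)   # left-right oracle
        wins += adversary(pk, oracle) == b
    return wins / trials                                # IND-CPA: close to 1/2

def toy_keygen():
    return "pk", "sk"

def toy_enc(m, pk):
    return m ^ random.getrandbits(BITS)                 # fresh pad per query

adv = lambda pk, oracle: oracle(0, (1 << BITS) - 1) & 1  # guess from one query
print(ind_cpa_game(toy_keygen, toy_enc, adv))
```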

The exact relation between these different definitions of security notions remains to be clarified. It would be interesting to investigate how much CSLR can help dealing with oracles.

References

1. R. Affeldt, D. Nowak, and K. Yamada. Certifying assembly with formal cryptographic proofs: the case of BBS. In Proceedings of the 9th International Workshop on Automated Verification of Critical Systems (AVoCS 2009). To appear.

2. M. Backes, M. Berg, and D. Unruh. A formal language for cryptographic pseudocode. In Proceedings of the 15th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR 2008), volume 5330 of Lecture Notes in Computer Science, pages 353–376. Springer.

3. G. Barthe, B. Grégoire, and S. Zanella Béguelin. Formal certification of code-based cryptographic proofs. In Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2009), pages 90–101. ACM Press.

4. S. Bellantoni and S. A. Cook. A new recursion-theoretic characterization of the polytime functions. Computational Complexity, 2:97–110, 1992.

5. M. Bellare and C. Namprempre. Authenticated Encryption: Relations among Notions and Analysis of the Generic Composition Paradigm. Journal of Cryptology, 21:469–491, 2008.


6. B. Blanchet and D. Pointcheval. Automated security proofs with sequences of games. In Proceedings of the 26th Annual International Cryptology Conference (CRYPTO 2006), volume 4117 of Lecture Notes in Computer Science, pages 537–554. Springer.

7. L. Blum, M. Blum, and M. Shub. A simple unpredictable pseudo-random number generator. SIAM Journal on Computing, 15(2):364–383, 1986.

8. D. Boneh. The Decision Diffie-Hellman problem. In Proceedings of the 3rd International Symposium on Algorithmic Number Theory (ANTS-III), volume 1423 of Lecture Notes in Computer Science, pages 48–63. Springer, 1998.

9. M. Bellare and P. Rogaway. Code-based game-playing proofs and the security of triple encryption. Cryptology ePrint Archive, Report 2004/331, 2004.

10. R. Corin and J. den Hartog. A probabilistic Hoare-style logic for game-based cryptographic proofs. In Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP 2006), volume 4052 of Lecture Notes in Computer Science, pages 252–263. Springer.

11. J. Courant, M. Daubignard, C. Ene, P. Lafourcade, and Y. Lakhnech. Towards automated proofs for asymmetric encryption schemes in the random oracle model. In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS 2008), pages 371–380. ACM Press.

12. W. Diffie and M. E. Hellman. New directions in cryptography. IEEE Transactions on Information Theory, 22(6):644–654, 1976.

13. T. Elgamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31(4):469–472, 1985.

14. O. Goldreich. The Foundations of Cryptography: Basic Tools. Cambridge University Press, 2001.

15. S. Goldwasser and S. Micali. Probabilistic encryption. Journal of Computer and System Sciences, 28(2):270–299, 1984. An earlier version appeared in the proceedings of STOC’82.

16. M. Hofmann. A Mixed Modal/Linear Lambda Calculus with Applications to Bellantoni-Cook Safe Recursion. In Proceedings of the 11th International Workshop on Computer Science Logic (CSL 1997), volume 1414 of Lecture Notes in Computer Science, pages 275–294. Springer.

17. M. Hofmann. Safe recursion with higher types and BCK-algebra. Annals of Pure and Applied Logic, 104(1–3):113–166, 2000.

18. J. Hurd. A formal approach to probabilistic termination. In Proceedings of the 15th International Conference on Theorem Proving in Higher Order Logics (TPHOLs 2002), volume 2410 of Lecture Notes in Computer Science, pages 230–245. Springer.

19. R. Impagliazzo and B. M. Kapron. Logics for reasoning about cryptographic constructions. Journal of Computer and System Sciences, 72(2):286–320, 2006.

20. A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.

21. J. C. Mitchell, M. Mitchell, and A. Scedrov. A linguistic characterization of bounded oracle computation and probabilistic polynomial time. In Proceedings of the 39th Annual Symposium on Foundations of Computer Science (FOCS’98), pages 725–733. IEEE, 1998.

22. J. C. Mitchell, A. Ramanathan, A. Scedrov, and V. Teague. A probabilistic polynomial-time process calculus for the analysis of cryptographic protocols. Theoretical Computer Science, 353(1-3):118–164, 2006.

23. E. Moggi. Notions of computation and monads. Information and Computation, 93(1):55–92, 1991.

24. D. Nowak. A framework for game-based security proofs. In Proceedings of the 9th International Conference on Information and Communications Security (ICICS 2007), volume 4861 of Lecture Notes in Computer Science, pages 319–333. Springer.

25. D. Nowak. On formal verification of arithmetic-based cryptographic primitives. In Proceedings of the 11th International Conference on Information Security and Cryptology (ICISC 2008), volume 5461 of Lecture Notes in Computer Science, pages 368–382. Springer.

26. N. Ramsey and A. Pfeffer. Stochastic lambda calculus and monads of probability distributions. In Proceedings of the 29th SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL 2002), pages 154–165.

27. V. Shoup. Sequences of games: a tool for taming complexity in security proofs. Cryptology ePrint Archive, Report 2004/332, 2004.

28. A.C. Yao. Theory and applications of trapdoor functions. In Proceedings of the IEEE 23rd Annual Symposium on Foundations of Computer Science (FOCS’82), pages 80–91. IEEE, 1982.

29. Y. Zhang. The Computational SLR: A Logic for Reasoning about Computational Indistinguishability. In Proceedings of the 9th International Conference on Typed Lambda Calculi and Applications (TLCA 2009), volume 5608 of Lecture Notes in Computer Science, pages 401–415. Springer.

 


 

Modeling the Value-Adding Attributes of Real Estate to the Wealth Maximization of the Firm

Authors: Anna-Liisa Lindholm, Karen M. Gibler, and Kari I. Leväinen

Abstract: Firms develop strategies to help them achieve their primary goal of maximizing the wealth of the shareholders. These strategies should define the supporting role corporate real estate management plays; however, current theory and practice do not adequately identify the direct and indirect methods by which corporate real estate management (CREM) adds value to the firm. This paper develops a model of how real estate adds value to the firm to help fill this void. This model can then be used to develop more precise and complete metrics to measure the value real estate adds to the firm.

Globalization of business operations and other competitive pressures are forcing corporations to re-evaluate their real estate needs. The demand for more efficient utilization of space and higher workplace productivity has led to businesses adopting a range of strategies for managing their facilities. The emergence of corporate real estate management (CREM) as a distinct discipline has supported this drive and the search for strategies aimed at enhancing the value of real estate assets and facility-related services to the core business. Yet, the relationship between core and non-core business, in the context of real estate management and facilities management, is not well understood. The field lacks research that develops theoretical models of the relationship between corporate strategic management systems and real estate decisions and operations. The field also lacks empirical testing using well-defined models to quantify the value that real estate adds to the firm.

The lack of unifying corporate real estate models means that the contribution of real estate to the firm and the possibilities that exist for adding value are often not recognized, nor properly considered. In many corporations, real estate and facilities management have evolved over the years from individual transaction-based decisions about physical spaces. As such, they tend to follow traditional approaches of cost minimization and focus on short-term results rather than long-term strategy, still not moving from taskmaster to business strategist (Joroff, Louargand, Lambert, and Becker, 1993). Many real estate and facilities units within corporations have been established from the perspective of managing existing buildings. CREM decisions are, therefore, based primarily on functions and requirements in relation to structures and not the businesses that are performed within them. Little attention has been paid to the added value that CREM can generate from strategically supporting core business processes.

This traditional approach places buildings and services installations in the foreground and “softer” issues in the background. Realization that both tangible and intangible assets are important to the successful support of the core business calls for a broader view of real estate’s contribution to the firm. Not only direct facility costs, but also indirect costs and contribution to the long-term success of the core business must be identified and measured. This requires not only a broad theoretical framework, but also new techniques and tools for measuring, amongst other things, performance, productivity, usability, and functionality that result from real estate decisions rather than just relying on the traditional financial measures corporate real estate officers report using most often (Nourse, 1994; and Bdeir, 2003).

Indeed, many businesspeople and researchers discuss such value-adding concepts yet struggle with their proof. The absence of some form of objective measurement using leading indicators as well as financial outcomes inhibits comparison of alternative CREM strategies, and generally, leaves corporations in the dark as to what they are achieving. Furthermore, a broader, more coherent assessment of the ability of best practice CREM to add value to the core business is missing.

This paper contributes to the field by developing a model of how CREM can produce added value for the core business of the non-real estate firm through a broader strategic management framework. The objective of the paper is to use theory from strategic management along with research on business performance, CREM, facilities management, workplace performance, and results of a survey to develop a framework that will illustrate how corporate real estate directly and indirectly adds value to the core business and the wealth of the firm. The paper presents ways corporate real estate strategies can be linked to the overall business strategy of the firm and explains how real estate tactical decisions and actions relate to these real estate strategies. This work is based on previous theoretical models, in-depth interviews with corporate real estate executives and service providers, and the limited empirical studies that have been conducted to date. The result is a model that can be used in future research to empirically test the contribution of real estate to the primary long-term goal of maximizing the wealth of the firm’s shareholders.

 


Previous Research

Strategic Planning and Financial Performance of the Firm

Over the past two decades, the ideology of shareholder value has become entrenched as a principle of corporate governance (Lazonick and O’Sullivan, 2000; and Nappi-Choulet, 2002). According to shareholder value theory, value to the firm is created by maximizing the wealth of the shareholders. A firm should strive to maximize the return to shareholders, as measured by the sum of capital gains and dividends, for a given level of risk or reduce the risk with the same level of income. However, creating value takes more than acceptance of value maximization as the organizational objective. As a statement of corporate purpose or vision, value maximization is not likely to tap into energy and enthusiasm of employees and managers to create value (Jensen, 2001). The choice of value maximization as the corporate driver must be complemented by a corporate vision, strategy, and tactics that unite participants in its struggle for dominance in the competitive arena. A business strategy gives direction to all the functional areas within the company, including real estate. Thirty-six years ago, Ackoff (1970) identified several conditions that, when present, make a company’s decision a strategic one: it has an effect of long duration; it is difficult to reverse; it affects a large number of organizational functions; and it affects organizational values. It is easy to see from this definition how real estate decisions should form an integral part of any company’s strategic plan.

Strategic planning takes time and money. So undertaking a strategic planning process is only economical if the benefits outweigh the costs. Based on a meta-analysis drawn from twenty-six studies, Miller and Cardinal (1994) created a model to explain the relationship between strategic planning and firm performance. They created a planning performance model that demonstrated that strategic planning positively affects performance, or more specifically, the amount of strategic planning a firm conducts affects its financial performance. This may be extended to establish a relationship between the amount of real estate strategic planning and the firm’s performance.

The organization needs to compute relevant performance measures, which should derive from the firm’s strategy (Keegan, Eiler, and Jones, 1989). Such performance measures are used to ensure that an organization is achieving its aims and objectives, as well as to evaluate, control, and improve organizational processes (Ghalayini and Noble, 1996). A problem area for researchers in strategic management has been the identification and quantification of the contribution of specific policies and decisions to achieving the financial goals of the firm, especially in support areas such as corporate real estate. To play a strategic role in the organization, better real estate performance measures are needed to reflect how well real estate is being utilized in the business, not just its cost to the firm (Nourse, 1994).


Many real estate decisions have an indirect and lagged effect on the firm’s financial success that is going unmeasured. Financial performance is correlated with creation of value and delivery of quality products and services (Heskett et al., 1997). These, in turn, are related to employee morale, productivity and both employee and customer satisfaction. Employee morale, productivity, and satisfaction are partially a function of the workplace environment, which is determined by corporate real estate decisions. Customer satisfaction is partially a function of convenient and functional product and service delivery locations. Although researchers may have difficulty in developing reliable measures of such important factors as employee productivity (Kaplan and Aronoff, 1996), the importance of measuring the lagged effect of decisions affecting these conditions is evident.

Banker, Potter, and Srinivasan (2000) show that current nonfinancial measures of customer satisfaction can be significantly associated with future financial performance in the hotel industry. Similarly, Ittner and Larcker (1996) provide evidence that hedge portfolios formed on the basis of customer satisfaction measures outperform the stock market in subsequent periods, demonstrating that decisions that create customer satisfaction, including real estate decisions, lead to better financial performance by the firm.

Gallup and Fortune studies of employee morale and financial returns also support such correlations (Grant, 1998). Meanwhile, Sears finds that a quantitative measure of improvement in employee attitudes drives improvement in customer satisfaction, which, in turn, drives improvement in revenue growth (Rucci, Kirn, and Quinn, 1998), and Maister (2001) determines from statistical analysis of twenty-nine firms that employee satisfaction leads to improved revenues and profits of firms in all industries and countries surveyed.

Reducing employee turnover is a key way to improve financial performance. Research indicates that the cost of losing a trained employee ranges from 1.5 to 3 times salary (Izzo and Withers, 2001). Experienced employees provide stability, institutional memory, and long-term relationships with customers. For example, according to Heskett et al. (1994), a conservative estimate is that it takes nearly five years for a securities broker to rebuild relationships with customers that can return $1 million per year in commissions to the brokerage house, resulting in a cumulative loss of at least $2.5 million in commissions during the five years as the relationship is slowly rebuilt to the original level.

Yet another way to ensure financial performance is through innovation (Cefis and Ciccarelli, 2005). Innovation is ideally considered as a process of continuous improvement (Bradley, 2002), which leads to commercial success (OECD, 1991). Continuous innovation is a prerequisite for corporations to be at the leading edge of global competition. According to Nonaka and Takeuchi (1995), knowledge creation leads to continuous innovation, and finally to sustainable competitive advantage. For a company, it is not enough to absorb information; the essential skill is the ability to question old truths and to recreate the world “in an ongoing process of personal and organizational self-renewal.” It is even said that “companies that don’t innovate die” (Chesbrough, 2003).

Strategic planning can contribute to the financial success of the firm, but only if the firm identifies the critical drivers of success, develops functional strategies (including real estate strategies) that incorporate these drivers, and develops a system of key leading and lagging performance indicators to provide feedback over time (Kaplan and Norton, 1996; and Barkley, 2001). The system must incorporate the complex set of cause-and-effect relationships between the performance drivers and the financial outcome measures. Only in this way will a firm know if its strategic plan has been successfully translated into functional action plans and implemented with operating decisions that produce the desired results.

Linking Real Estate Value-Adding Strategies to Corporate Strategy

An integrated corporate strategy should lead to a real estate strategy that ensures that real estate actions are directly linked to the organization’s strategic goals. The role of real estate within the corporate strategy should not be limited to minimization of costs of physical structures or outsourcing activities on the basis of achieving operational effectiveness (Krumm, 2001). The strategic planning process should align the facilities infrastructure with the core business, as well as drive corporate real estate initiatives relative to process, people, and enabling systems. While many corporate real estate organizations are developing property portfolio strategies, most still do not engage in strategic planning for service offerings and capabilities to support the core business (Acoba and Foster, 2003).

By producing strategic real estate plans that address the business units’ objectives (e.g., efficiency, customer satisfaction, productivity, etc.), corporate real estate executives can best demonstrate their value and provide a platform for being involved in the broader corporate planning process (Lambert, Poteete, and Waltch, 1995). This will help corporate real estate executives overcome the problems associated with being excluded from the strategic planning process cited in previous research (Pittman and Parker, 1989; Veale, 1989; Teoh, 1993; Carn, Black, and Rabianski, 1999; Schaefers, 1999; Gibler, Black, and Moon, 2002).

According to Nourse and Roulac (1993), to effectively support a range of corporate objectives, multiple rather than single real estate strategies may be required. They list eight types of real property strategies that encompass how a company’s property decisions can be guided (Exhibit 1). The first seven strategies encompass common corporate real estate decisions regarding site selection, facility design, and leasing, but place them in a strategic context within the broader aims of the firm. Some encompass the traditional goals of reducing occupancy costs and facilitating production, operations, and service delivery. However, Nourse and Roulac also separate facilitating knowledge work from other operations, include flexibility as a real estate strategy, and identify that real estate strategies can be integrated with other functional strategies, such as human resources and marketing.

Exhibit 1: Alternative Real Estate Strategies

1. Occupancy cost minimization
   - Explicit lowest-cost provider strategy
   - Signal to critical constituencies of cost-consciousness
2. Flexibility
   - Accommodate changing organizational space requirements
   - Manage variability/risk associated with dramatic escalation/compression of space needs
   - Favor facilities that can readily be adapted to multiple uses by corporation and others
3. Promote human resources objectives
   - Provide efficient environment to enhance productivity
   - Recognize that environments are important elements of job satisfaction and therefore compensation
   - Seek locations convenient to employees with preferred amenities
4. Promote marketing message
   - Symbolic statement of substance or some other value
   - Form of physical institutional advertising
   - Control environment of interaction with company’s product/service offering
5. Promote sales and selling process
   - High traffic location to attract customers
   - Attractive environment to support/enhance sale
6. Facilitate and control production, operations, service delivery
   - Seek/design facilities that facilitate making company products/delivering company services
   - Favor locations and arrangements that are convenient to customers
   - Select locations and layouts that are convenient to suppliers
7. Facilitate managerial process and knowledge work
   - Emphasize knowledge work setting over traditional industrial paradigm
   - Recognize changing character, tools used in, and location of work
8. Capture the real estate value creation of business
   - Real estate impacts resulting from demand created by customers
   - Real estate impacts resulting from demand created by employees
   - Real estate impacts resulting from demand created by suppliers

Note: The source of the information is Nourse and Roulac, 1993, p. 480.

In an effort to pinpoint the added value of real estate, De Jonge (1996) describes seven elements of added value (Exhibit 2) that contribute to the transformation of real estate from mere “cost of doing business” to a true corporate asset (Krumm, 1999). De Jonge also identifies cost reduction, flexibility, and the relationship between real estate and marketing as ways real estate can add value to the firm. His list differs from that of Nourse and Roulac (1993) by reformulating facilitating operations to increasing productivity, more clearly identifying increasing value as a strategy, highlighting changing culture by introducing workplace innovations, and grouping a range of real estate decisions under the heading of risk control.

Exhibit 2: Elements of Added Value of Real Estate

1. Increasing productivity
   - Offering adequate accommodation
   - Site selection
   - Introducing alternative workplaces
   - Reducing absence of leave
2. Cost reduction
   - Creating insight into cost structure
   - More efficient use of workplaces
   - Controlling costs of financing
3. Risk control
   - Retaining a flexible real estate portfolio
   - Selecting suitable locations
   - Controlling the value development of the real estate portfolio
   - Controlling the process risk during (re)construction
   - Controlling environmental aspects and labor conditions
4. Increase of value
   - Timely purchase and sale of real estate
   - Redevelopment of obsolete properties
   - Knowledge and insight into real estate market
5. Increase of flexibility
   - Organizational measures (working hours, occupancy rates)
   - Legal/financial measures (mix own/rent/lease)
6. Changing the culture
   - Introducing workplace innovations
7. PR and marketing
   - Selection of branch locations
   - Image of buildings
   - Governing corporate identity

Note: The source of the information is De Jonge, 1996, in Krumm, 1999, p. 66.

Any strategic real estate model must recognize that corporate real estate management has traditionally focused on meeting the continuous need for accommodation, providing the facilities for the firm’s production and delivery of goods and services. However, to meet their biggest challenges in today’s fast-paced competitive business environment, firms need flexible, efficient, innovative, and productive work environments (Gibson and Lizieri, 1999; Gibler, Black, and Moon, 2002; and Gibson and Louargand, 2001). Gibson (2000) and Blakstad (2001) consider the physical, functional, and financial aspects of property as sources of flexibility. From the physical perspective, flexibility is articulated in terms of building design, including usable areas, modular floor plates, and the ability to change the internal configuration of space (Harris, 1996). Functional flexibility is about the organization’s use of space and the space’s functional possibilities, such as if the space is multifunctional and able to accommodate changes. The main issues related to the organization’s use of space include alternative workplace solutions (e.g., hot desking, shared workspaces, free address areas, team space, etc.), varying density, operating hours, and flexible working locations (Becker and Steel, 1995; and Blakstad, 2001). Financial flexibility is related to the financial situation and arrangements of owners and users of the property and in the real estate market in general (Blakstad, 2001). It is influenced by the tenure of the occupier, lease terms, and the level of services offered by the property provider (Gibson, 2000).

Employers must provide appropriately designed workspaces in locations that attract and retain the best knowledge workers and allow them to do their best work in an efficient manner. The physical workplace is the third most important factor (after compensation and benefits) in the decision to accept or leave a job; 41% of those surveyed in the United States said it would influence their decision to take a position (ASID, 1999). Research conducted by the Buffalo Organization for Social and Technological Innovation (BOSTI) demonstrates that the physical environment for office work can measurably affect job performance, satisfaction, and ease and quality of communication, and suggests that supportive design has positive effects on work and workers. The economic benefit of properly planning and designing office space can equal 2% to 5% of each worker’s salary annually, and could be higher (up to 15%) if the office were planned and designed to be a “perfect fit” for the work (Brill, 1984).

Retailers, hotels, and industrial firms have long recognized that site selection is an essential component of financial success (e.g., Craig, Ghosh, and McLafferty, 1984; Kimes and Fitzsimmons, 1990; and Singhvi, 1987). Service providers can also trace financial success to proper site selection (Becker, Kaldenberg, and McAlexander, 1997). Office occupiers can gain value by using buildings to create or reinforce a corporate image, using them as symbols to reflect their values and culture (Capowski, 1993).

Thus, real estate is expected to serve multiple roles within the firm’s plans. The real estate decision maker must balance the shareholder’s perspective of the firm’s real estate holdings with the user’s perspective to make optimal decisions (Pfnuer, Schaefer, and Armonat, 2004). Properly managing the company’s portfolio must start with an inventory and valuation of current facilities. Many firms lack accurate property information and accounting systems (Gibler, Black, and Moon, 2002). Lease versus own decisions can have a direct impact on the wealth of the shareholders (Allen, Rutherford, and Springer, 1993) and must be made considering both space users and the overall long-range corporate strategic and financial plans.

 


Unfortunately, many firms nowadays are focusing on outsourcing real estate services (Kimbler and Rutherford, 1993; Kleeman, 1994; Lyne, 1997; Gibson, 1998; McDonagh and Hayward, 2000; Gibson and Barkham, 2001; Ernst & Young, 2002; Acoba and Foster, 2003; and Gibler and Black, 2004) and reducing the impact of real estate assets on the corporate balance sheet. An ongoing focus solely on cost reduction, rather than cost efficiency, may provide immediate financial results while creating long-term performance problems. A more comprehensive approach to real estate decision-making is needed.

Initial Model

The following model is proposed to visually capture how corporate real estate can add value to the firm in the modern business environment (Exhibit 3). The primary aim is maximizing the wealth of shareholders. A business strategy for achieving this goal is developed based on the firm’s vision. The firm must develop strategies for the functional areas such as human resources, information technology, finance, and real estate that follow from and support the general business strategy. Within the corporate real estate area, strategies are implemented through asset management (AM), property management (PM), and facilities management (FM). Staff makes operating decisions in each of these areas that can directly and indirectly affect the core business and the value of the firm, and thereby shareholder wealth. Key to this model is linking real estate strategies to overall business strategy, identifying how real estate decisions directly and indirectly affect the firm’s financial success, and measuring those impacts on the firm.

Exhibit 3: CREM as a Part of the Firm’s Strategic Framework (figure not reproduced)

One basis for a strategic management system incorporating the direct and indirect value-adding abilities of real estate is Kaplan and Norton’s (1996, 2000, 2004) Balanced Scorecard (BSC) approach. Their model places corporate strategy at the center, organizing strategic objectives into four perspectives that must be balanced to ensure success: financial (growth, profitability, and risk viewed from the perspective of the shareholder), customer (creating value and differentiation from the customer’s perspective), internal (priorities for business processes that create customer and shareholder satisfaction), and organizational learning and growth (climate that supports change, innovation, and growth and provides the needed training and technology). Organizations have two basic approaches for increasing economic value: revenue growth and productivity. The former generally has two components: build the franchise with revenue from new markets, new products, and new customers; and increase value to existing customers by deepening relationships with them through expanded sales. The productivity strategy also usually has two parts: improve the company’s cost structure by reducing direct and indirect expenses, and use assets more efficiently by reducing the working and fixed capital needed to support a given level of business.

In line with Kaplan and Norton (1996, 2000, 2004), Krumm and de Vries (2003) state that cost reduction and revenue growth are the key elements for global performance. Also, Burns (2002) comes to the conclusion that the contribution of CREM to the organization’s value could be measured by adapting the BSC view, where organizations have two financial strategies for driving shareholder value: profitability and growth. Typically, corporate real estate’s performance has related to the profitability or productivity aspect of organizational performance and its contribution measured through space efficiency, cost reduction, and capital minimization. For example, according to Nourse’s (1994), Arthur Andersen’s (1993), and Bdeir’s (2003) research, space and occupancy cost measures such as cost per square foot are the most common methods to evaluate real estate performance by both senior management and corporate real estate executives. However, real estate decisions can also contribute to increased revenues. This is especially important to recognize in knowledge-based businesses whose value lies mainly in their intangible assets. These firms are more likely than manufacturers or retailers to view real estate not as a physical factor of production, but as a facilitator that creates an inviting and supportive workspace that enables employees to provide high quality services.

The BSC approach focuses on the drivers of performance that ultimately support the overall objective of maximizing wealth. Often real estate decisions affect financial outcomes through causal pathways involving two or three intermediate stages. For example, proper site selection may lead to higher customer satisfaction, which leads to better financial performance as found by Ittner and Larcker (1996) and Banker, Potter, and Srinivasan (2000). Such indirect methods of influencing financial performance are recognized by the BSC approach. Some of the drivers of performance on critical dimensions relating to customers, internal processes, and organizational learning are best measured by non-financial indicators, an innovation that has been lacking in the corporate real estate field.

The framework is also appropriate for non-profit and governmental agencies (Simons, 1993; and Wilson, Leckman, Cappucino, and Pullen, 2001). While the primary goal of these agencies is not wealth maximization for shareholders, they do have identifiable stakeholders and a corporate mission, which can be translated into a business strategy with supporting real estate strategy and appropriate performance indicators.

Data Gathering

The aim of this research is to devise a framework and key concepts for analyzing the value CREM adds to the core business and wealth of the firm. To achieve this objective, in addition to synthesizing previous models and research, organizations in a variety of industries in four different countries were surveyed.

Questionnaire

Based on the previous research discussed above and consultations with corporate real estate researchers, a structured questionnaire was developed for the interview survey. The questionnaire comprised a mix of closed- and open-ended questions to get respondents to fully explain their ideas and opinions on subjects not previously specifically studied. The questionnaire was pretested with two Finnish corporate real estate executives: one is a corporate real estate director for a Finnish transportation firm and the other holds a similar position with a public organization. The questionnaire was revised based on their comments.

The questionnaire covers several topics. First, it is used to gather classification data on the respondents and their firms. In an effort to identify the attributes of CREM that can add value to the core business of an organization, respondents are asked how they would define the term ‘added value’ and how they thought the CREM units could add value to the core business.

Sample

A convenience sample of 26 firms was selected that had a range of core businesses in Finland, the Netherlands, the United Kingdom, and the U.S. Firms were selected across a wide range of industries, real estate portfolios, and countries to ensure development of a general model, which will be useful across borders and industries. The number of responses is sufficient and suitable for exploratory and theory-building research. The results, while not generalizable for estimating parameters or sufficient for tests of statistical significance, are useful for development of a model for subsequent statistical testing.

Data on each of the organizations was gathered from their websites, annual reports, and interviews published as part of previous research projects. Specific corporate real estate executives within each firm were selected for interviews to draw on the knowledge they had gained from continuous involvement in their organizations’ corporate real estate decisions and strategies. The individual interviewees were chosen on the basis of their being active in the CREM field (participation in professional networks, seminars, workshops, etc.), as well as professional contacts through CoreNet Global. In some of the organizations, multiple members of the corporate real estate staff participated in the interviews to provide complete data on the organization’s corporate real estate operations. When questions asked for opinions and definitions, the participants often brainstormed and provided a group answer that was used in the analysis. Exhibit 4 presents the core business of each of the twenty-six organizations, the home country of each organization, the number of people participating in the interviews, the job titles of respondents, and some descriptive statistics of the interviewed organizations and their real estate portfolios.

Interviews

The interviews with the corporate real estate managers were conducted between January and June 2004. Typically, each interview lasted from one to two hours. At least two multilingual investigators participated in each interview, taking full notes. In the U.S., U.K., and the Netherlands, the interviews were conducted in English; in Finland, the interviews were conducted in Finnish. Thus, respondents in the U.S., U.K., and Finland were interviewed in their native language and those in the Netherlands were interviewed in their second language. After each interview, the notes and findings of both investigators were combined and compared. Subsequent to the interviews, the notes were transcribed and the Finnish interview transcripts were translated into English by the researchers.

In addition, four leading corporate real estate consultants were interviewed, one from each of the countries included in the study, to gain perspective through their knowledge and experience with dealing with these issues in different kinds of organizations and business environments. Consultants were selected based on having experience working with corporate real estate issues and strategic decision-making. In each country, the selected consultant represents a major CREM service provider firm. The most common job title among interviewed consultants was director or managing director. Their comments were helpful in the interpretation and organization of the results of the interviews with the corporate real estate executives.

Results

The survey data was analyzed using open, inductive content analysis following Miles and Huberman’s (1994) framework. Patterns and themes in the data were noted, links with previous literature drawn, and areas of notable contribution to existing knowledge identified. As is common with open-ended questions, respondents provided a variety of answers that require distillation and interpretation. A comparison of the content analysis between two of the authors was made and inter-researcher differences were resolved through discussion and reference back to the interview transcripts, as suggested by Miles and Huberman.

Exhibit 4: Interviewed Organizations and Respondents

Panel A: Private Organizations

Core Business | Country | Resp. | Titles of Respondents | Total Employees | CREM Employees | Total (m²) | Properties Owned
Air transportation | U.S. | 1 | CRE manager | 60,000 | 57 | 430,000 | 43%
Alcohol industry | U.K. | 2 | Facilities manager | 24,000 | — | 1,000,000 | 90%
Automotive systems | Netherlands | 1 | CRE director | 40,000 | — | — | —
Bakery industry | Finland | 1 | CRE director | 3,900 | 1 | 180,000 | —
Banking services | U.S. | 1 | CRE transactions director | 130,000 | 100 | 6,500,000 | 30%
Beverage industry | U.S. | 1 | CRE director | 70,000 | 11 | 4,000,000 | 88%
Broadcasting | U.S. | 2 | CRE director; VP of strategic planning (property) | 8,000 | 250 | 285,000 | 54%
Broadcasting | Finland | 1 | CRE manager | 3,700 | 60 | 270,000 | 70%
Building services consulting | Finland | 1 | Property manager | 280 | 0.5 | 5,800 | 1%
Business consulting services | U.K. | 2 | CRE director | 9,000 | 20 | — | —
Data management | U.S. | 2 | CRE director; CRE manager | 4,800 | 2 | 120,000 | 2%
Electronics | Netherlands | 1 | CRE financial controller | 165,000 | 450 | 8,500,000 | 67%
Energy providing | U.S. | 1* | CRE manager | 25,000 | 91 | 1,600,000 | 40%
Energy providing | Finland | 1 | CRE director | 14,000 | 55 | 320,000 | 30%
Home appliances manufacturing | U.S. | 1* | CRE director | 68,000 | 8 | 4,600,000 | 68%
Telecommunication services | Finland | 1 | CRE director | 6,500 | 15 | 500,000 | 40%
Transportation (railway) | Finland | 2 | CRE director; Environment manager | 14,400 | 140 | — | —

Panel B: Public Organizations

Core Business | Country | Resp. | Titles of Respondents | Total Employees | CREM Employees | Total (m²) | Properties Owned
Education & research | Finland | 2 | CRE director; Project manager | 3,000 | 28 | 230,000 | 0%
Education & research | U.S. | 2 | FM director; CRE manager | 3,000 | 250 | 420,000 | 90%
Education & research | Netherlands | 1 | CRE director | 4,100 | 30 | 400,000 | 95%
Federal services | U.S. | 5 | CRE director; Portfolio management director; FM director; Property disposals director; Planning and development director | 1,000,000 | 500 | 3,600,000 | 44%
Municipal services | Finland | 2 | CRE director; CRE manager | 6,300 | 300 | 625,000 | 90%
Municipal services | Finland | 1 | Facilities manager | 13,000 | 390 | 900,000 | 85%
Municipal services | Finland | 2 | CRE director; Property manager | 6,300 | 36 | 430,000 | 85%
Municipal services | Netherlands | 1 | CRE director | 1,700 | 12 | 47,764 | 100%
National central banking | Finland | 1 | CRE director | 630 | 20 | 130,000 | 90%

Notes: There were 26 organizations and 39 respondents. * Phone interview. — = not available.

The intention was to get the respondents to think about defining ‘added value’ to the firm in a general way, and then to get them to specifically describe how they believe CREM adds value. Participants were allowed to provide multiple definitions of added value, some of which could overlap. However, when asked to define ‘added value’ in a general way, many of the respondents answered in the context of corporate real estate’s contribution, rather than in a broader, more generic sense. Thus, participants from ten firms (38%) stated that supporting the core business is adding value. Among those who were able to describe added value in a broader context, their perceptions do reflect several of the interpretations found in the literature. In line with shareholder theory, respondents from six firms (23%) contend that added value is about increasing the value of the firm. Twelve (46%) identify increasing profitability as a primary way to add value. Nine (35%) mention either improving efficiency or productivity as a means of adding value. Eight (31%) cite decreasing costs. Eight (31%) also mention increasing revenue or income. All these are consistent with the model proposed by Kaplan and Norton (2004). Evident from four of the responses is the contextual nature of what one can do to add value to the firm. The appropriate actions to add value to the firm vary with economic conditions and competitive position. Thus, one cannot identify one “best” way to add value to the firm. Respondents from seven firms point out the need for understanding that every firm has multiple stakeholders (owners, employees, customers) with sometimes conflicting goals. Thus, what would add value for one stakeholder may not add value to the position of other stakeholders, so actions must be evaluated in terms of their impact on each group of interested parties. A representative list of the respondents’ statements from which these interpretations are drawn is provided in Exhibit 5.

When asked how corporate real estate executives perceive that real estate and facilities management functions specifically create added value to the core business, the answers reflect several different themes. Exhibit 6 indicates how the responses from different organizations can be grouped to identify the most common themes, while Exhibit 7 provides, with some editing, the interviewees’ stated perceptions of how CREM creates added value to the core business. Dominant throughout the responses is CREM’s supportive relationship to the core functions of the firm and the need for real estate decision makers to participate in the strategic process to ensure real estate strategies and decisions that support the core business. Several respondents point out how their real estate knowledge and expertise allows them to establish standards and decision criteria that ensure their efforts have the desired effects in support of the firm’s goals. Interviewees often describe the value added by CREM in terms of more than one concept.


Exhibit 5: Summary of Interviewees’ Definitions of Added Value
(Statements complete the phrase “Added value is ...”; a statement may be classified into more than one category.)

Supporting core business:
- ‘the contribution to the employees’ performance by workplace.’
- ‘supporting the core business workers so that they can concentrate on doing their work.’
- ‘improving core business processes.’
- ‘intangible and tangible goods that support the core business.’
- ‘created by being a good service provider for end-users.’
- ‘improving the core business. Could be work or services (output) which is more than economic or human capital (input).’
- ‘producing high-quality and economical services to the core business.’
- ‘operation or activity that develops or improves core business. For employees it is different than for owners.’

Increasing the value of the firm:
- ‘activity or operation that increases directly or indirectly the value of the business compared to the situation where such an activity or an operation is not performed.’
- ‘the value over that profit investors are expecting for their investments.’
- ‘improving business performance and value to the owners.’
- ‘increase in shareholder value (better returns to investments).’
- ‘ability to proactively manage the portfolio so that the company can manage and survive in the ever changing environment.’
- ‘improving the core corporation value.’

Increasing profitability:
- ‘multi-layered—for employee it’s about satisfying the basic need, so they can work efficiently; for the shareholders its about making profit.’
- ‘improving the company’s operating income.’
- ‘providing services that help the customers grow their businesses and increase their profitability.’
- ‘decreasing costs and improving efficiency.’

Increasing efficiency or productivity:
- ‘when output is more than input.’
- ‘input -> process -> service (added value = service - input).’
- ‘economies of scale.’
- ‘the contribution to the employees’ performance.’
- ‘supporting the core business workers so that they can concentrate on doing their work.’
- ‘improving core business processes.’
- ‘improving the core business. Could be work or services (output) which is more than economic or human capital (input).’
- ‘contribution to the effectiveness of the primary processes of core business.’
- ‘multi-layered—for employee it’s about satisfying the basic need, so they can work efficiently; for the shareholders its about making profit.’

Decreasing costs:
- ‘decreasing costs.’
- ‘producing high-quality and economical services to the core business.’
- ‘improving the company’s operating income.’
- ‘decreasing costs and improving efficiency.’
- ‘economical value added (expenses/revenue).’
- ‘depends on cycles—business is struggling–cost reduction.’

Increasing revenue or income:
- ‘improving core business processes and generating revenue.’
- ‘improving core business processes.’
- ‘improving the company’s operating income.’
- ‘generating revenue.’
- ‘providing services that help the customers grow their businesses and increase their profitability.’
- ‘depends on cycles—business is growing–revenue growth.’

The most frequently cited way (65% of firms) that respondents believe CREM adds value to the core business is through cost reduction, a value-adding attribute identified by De Jonge (1996) and partially included in Nourse and Roulac’s (1993) real estate strategy list. Respondents mention reducing acquisition costs and occupancy costs through proper timing and economies of scale. Almost two-thirds (16) of the responding firms’ CREM staff think CREM adds value through actions that could increase productivity by supporting production and maintaining workspaces. Productivity is also one of the elements of added value identified by De Jonge.

The third most common method (50% of firms) of CREM adding value is “participating in the strategic process.”Respondents relate the importance of strategic planning in terms of “translating the business needs into real estate strategies and operations, which support core business strategies”and “aligning the core business and real estate and workplace strategies.”The added value of CREM is also thought to be related to knowledge of the core business and having good communication links and networks with the strategic level of the firm. Respondents cite how their “real estate department operates closely with other

support services of the firm such as human resources and information technology,”the “real estate department consults regularly with other business units concerning

the role of real estate,”and the “real estate department is able to speak the same language with different stakeholders.”These findings are in line with Pittman and Parker’s (1989) results, where they found that of the five most important factors that are believed to be important to a top-performing CREM department, four

J R E R 1 Vol. 28 1 N o. 4 – 2006

 


 

Modeling the Value-Adding Attributes  463

Exhibit 7 | Summary of Interviewees’ Definitions of Added Value of CREM

Definition Statements: Added value is created to the core business by... (Statements may be classified into more than one category)

Decreasing costs:
“standardizing workplaces.”
“being a control mechanism between business units and real estate needs.”
“efficient use of resources (workplaces).”
“minimizing occupancy costs.”
“optimizing real estate service production (outsourcing).”
“providing negotiation rooms with high-tech connections (major savings in travel expenses).”
“having an economical view in service purchasing.”
“creating economies of scale (cheaper contracts).”
“providing services that save time or costs or work.”
“providing solutions for core businesses that lower their expenses.”
“providing more efficient working environment for the core business.”

Increasing productivity:
“having knowledge of core business and by providing facilities that support the core business.”
“providing services that save time or costs or work.”
“providing more efficient working environment for the core business.”
“supporting production.”
“providing space that is efficient and attracts employees.”
“improving logistics through site selection and planning.”
“ensuring that maintenance operations do not impact on the core business.”
“finding suitable locations for different functions.”
“providing workplace solutions that affect the productivity and innovativeness of employees.”
“providing optimal working environment (lighting, acoustics, temperature, etc.) for employees.”

Participating in strategic processes:
“creating a good communication link with the strategic level of the organization.”
“better core process knowledge.”
“speaking the same language with different shareholders.”
“providing strategic support in real estate issues.”
“realizing that all problems aren’t necessarily real estate problems.”
“forming a strategic link with the core business.”
“being professionals; advising core business in every level of real estate issues.”
“having good relationships with the decision makers.”
“aligning the core business and real estate and workplace strategies.”
“translating the business needs.”

Increasing employee satisfaction:
“providing optimal working environment for employees.”
“providing amenities desired by employees.”
“providing pleasant working environments to employees (clean...).”
“maintaining a world class workforce and world class workplace, which is pleasant and productive.”
“being service oriented.”
“providing space that is efficient and attracts employees.”
“providing pleasant workplaces that are also pleasurable for end-users and usability is high.”
“providing better customer services so that the end-customers are happier.”

 


Exhibit 7 (continued) | Summary of Interviewees’ Definitions of Added Value of CREM

Definition Statements: Added value is created to the core business by... (Statements may be classified into more than one category)

Increasing value of assets:
“making sure the portfolio is optimal for core business (not too much capital tied up).”
“financial management in real estate issues.”
“selling properties (generating cash).”
“providing alternatives to meet operational and financial objectives.”
“knowing the right timing (when to sell).”
“providing real estate solutions that create value to shareholders.”
“acquiring properties for the lowest price and selling properties that are surplus assets.”

Increasing flexibility:
“finding flexible accommodation solutions with a short term, mid term, and long term perspective.”
“being flexible (instant offices, hot-desking etc.).”
“making sure the portfolio is optimal for core business (not too much capital tied up).”
“delivering real estate when needed.”
“providing alternatives to meet operational and financial objectives.”
“becoming as flexible as possible (cost efficiency and asset efficiency).”

Promoting marketing and sales:
“selecting properties that support the image and brand of the firm and image of the whole industry.”
“ensuring that facilities support customer’s mission.”
“providing an appropriate infrastructure that focuses on safe environments and customer service.”
“providing the right combination of amenities to support given operations at an appropriate marketplace.”
“creating a high-level real estate environment for the core business, which attracts also customers, high-class buildings etc.”
“supporting the organization brand (providing workplaces that mirror the brand).”

Increasing innovations:
“workplace solutions that affect the productivity and innovativeness of employees.”
“providing real estate solutions that support the revenue generating opportunities (innovations).”
“providing pleasant workplaces that are also pleasurable for end-users and usability is high.”
“creating synergy advantages by placing the employees based on their job tasks (use of work process descriptions in the workplace planning).”

 


Thus, the respondents recognize that their real estate decisions will not add value to the firm unless they are in alignment with the other functional departments’ decisions and in support of the firm’s overarching goals. Consistent with Miller and Cardinal’s (1994) finding that the amount of strategic planning can affect a firm’s performance, these corporate real estate managers recognize that such planning is essential to the firm’s success and to their unit’s contribution to that success.

Respondents from 10 firms (38%) suggest the workplace and its role in recruiting and retaining a world class workforce are important in identifying how real estate indirectly adds value to the firm. De Jonge (1996) did not identify this as an element of real estate’s added value, but Nourse and Roulac (1993) mention employee satisfaction under the broader heading of “promote human resource objectives.”

Timing the purchase and sale of real estate assets (managing the firm’s real estate portfolio) is perceived by seven (27%) responding firms as a way they create added value. Respondents from seven firms also mention either physical or financial flexibility. Another six (23%) cite image and serving customers, both related to promotion and marketing, as means of CREM adding value to the firm. All three of these value-adding attributes were listed by De Jonge (1996), while Nourse and Roulac (1993) did not identify asset management as a corporate real estate strategy.

A less frequently perceived way to create added value is to “increase innovations” (12% of firms). The role of real estate in innovations was somewhat recognized by De Jonge (1996) as changing the culture and by Nourse and Roulac (1993) as facilitating knowledge work.

Revised Model of Value-Adding Attributes of CREM

Using the Balanced Scorecard structure and the research findings, the model can be expanded as presented in Exhibit 8, showing that business strategy comprises two basic approaches to increasing shareholder value: revenue growth and profitability. These corporate strategies must then be translated into supporting real estate strategies that guide operating decisions (as shown in Exhibit 8). The key idea in this model is to identify real estate strategies that can create added value to the core business, contributing to the wealth of the firm and to shareholder value. The proper combination of real estate strategies will vary depending on the corporation’s strategic positioning within the market. The firm may want to emphasize revenue growth through building the franchise and/or increasing value to its customers. Alternatively, it may want to emphasize profitability through an improved cost structure and more efficient use of assets. Based on previous research presented earlier in this paper and the results of the interviews, the corporate real estate strategies are organized to support these core business strategies into the seven alternatives shown in Exhibit 8: (1) increasing the value of assets, (2) promoting marketing and sales, (3) increasing innovation, (4) increasing employee satisfaction, (5) increasing productivity, (6) increasing flexibility, and (7) reducing costs.


 

Exhibit 8 | Possible Tactical Real Estate Decisions in Support of Alternative Real Estate Strategies

Real estate decision making and operation level:

Increase the value of assets:
Obtain current valuations of facilities
Select suitable locations
Manage risk associated with properties
Make lease/purchase decision on a facility by facility basis
Redevelop obsolete properties
Create and maintain IT-system for property management

Promote marketing and sales:
Select locations that attract customers
Provide space that attracts customers
Make symbolic statement through design and location
Create workplaces that support the brand
Provide environment that supports the sale

Increase innovation:
Develop usability of the workplaces
Design facilities that allow innovative processes
Emphasize knowledge work settings
Allow users to participate in design phase

Increase employee satisfaction:
Seek locations convenient to employees
Provide pleasant working environment
Provide functional workplace
Provide desired amenities

Increase productivity:
Respond quickly to real estate requests
Maintain facilities to accommodate optimal operations
Provide environment that enhances productivity
Choose convenient layouts and locations for providers
Design facilities that improve the creation and delivery of products
Choose convenient locations for employees in separate buildings

Increase flexibility:
Choose leasing instead of owning
Negotiate short-term leases
Create flexible workplace solutions
Favour multiple use facilities
Select serviced offices

Reduce costs:
Minimize acquisition and financing costs
Minimize operating expenses
Create economies of scale in acquisitions
Use workplaces more efficiently
Conduct routine maintenance
Balance between outsourced and in-house services
Act as a control mechanism
Utilize government incentives
Establish workplace standards

 


These strategies can be used to set objectives and guide real estate decisions, which have been shown in previous research to directly or indirectly affect the value of the firm.

The first strategy, increasing the value of assets through managing the real estate portfolio, views real property as a capital asset that can be managed to optimize financial contribution to the firm. Objectives may be to maximize the value of the property portfolio or ensure that the lowest cost alternative is chosen that considers all the short- and long-term costs of owning versus renting. However, proper management of the company’s portfolio must start with an inventory and valuation of current facilities, then management via a property information system.

Real estate can contribute to the marketing and sales strategies through site selection and physical design. Accessibility and visibility are keys to attracting customers and increasing revenues. Physical design can be used to create an image for the company among its suppliers, employees, customers, and investors, an indirect way of adding value to the firm.

Increasing innovations is a less familiar real estate strategy. Many firms are in knowledge businesses, which operate in very competitive environments. To survive and grow, they need to innovate. These firms need to provide workspaces that encourage and support innovative thinking and working. This requires the participation of the space users in planning spaces and providing the type, size, and design of workspace that creates an inspiring working atmosphere. This, in turn, will lead the firms to the increased revenues that manufacturers achieve through innovation.

Increasing employee satisfaction with their working environments depends on real estate and facilities management decisions concerning site selection, workplace design and amenities, and environmental quality. Firms making workplace decisions to improve employee satisfaction can expect to achieve the increased financial returns experienced by other firms in a range of industries that have recognized this indirect path to profits.

Increasing productivity will also lead to increased profitability. Real estate decisions about site selection, infrastructure, and interior design directly impact the functionality of the space, allowing employees to work more efficiently and effectively. Real estate and facilities decisions influence a number of personnel and system factors, which influence the level of productivity of the individual and, subsequently, the level of productivity of teams and profitability of an organization.

A strategy of increasing flexibility may include both physical workspace and financial terms. Many firms form and reform work teams within their offices on a regular basis. They experiment with flex time and shared jobs, which allow workers to share space. Others want to be ready to move into and exit markets


 


quickly as conditions change. In contrast, most space agreements are long-term and workspaces relatively fixed, obligating the firm to pay for space that may not be optimal for its operations. If one of the key drivers of flexibility for the firm is its workspace, then a real estate strategy that focuses on providing flexible space that can match the duration of business needs will support the firm’s core strategy and add value to the firm. Some operating decisions that would follow from a flexible real estate strategy include choosing spaces that can be adapted to multiple uses and workers, creating flexible workspaces within the structures, negotiating short-term leases that include options for expansion and contraction, and leasing rather than purchasing properties that are not essential to the core business.

The most familiar of the strategies to increase profitability is cost reduction. Reducing cost in any area has a direct and immediate impact on the financial performance of the firm. The most often mentioned real estate operating decisions for achieving cost reduction objectives include outsourcing some real estate services and using corporate real estate staff to oversee operating units’ real estate transactions. Other actions firms may consider in pursuit of this strategy include co-locating business units, occupying green buildings, and choosing locations based on governmental incentives. They may also reduce expenses by negotiating lower rates for real estate related services and utilities and by improving the quality and timing of facilities maintenance to avoid costly repairs and capital expenditures.

For the real estate strategies outlined above to add value to the firm, CREM decision making must be linked to the strategic decision-making level of the organization and corporate real estate staff must possess knowledge of the core business and its needs. Such knowledge creates confidence among business units who are then more willing to cooperate and depend upon the corporate real estate staff to make value-adding decisions. It also ensures that CREM can communicate its contribution to the firm in a language that the top decision makers understand.

Exhibit 9 is an example of how to apply the developed framework and choose the right set of real estate strategies, linked to core business strategies. To demonstrate the path from core business goal to CREM operating decision, the BSC strategy map structure is used, which specifies the critical elements and their linkages. One builds a strategy map from the top down, starting with the core business strategy and then identifying the path to follow to reach that destination. The map illustrates how an organization can pursue CREM strategies, developing CRE skills and technology that will enable CREM to select and maintain work space that provides the proper environment (learning and growth perspective) that will improve operational efficiencies, enhance customer relationships, and increase innovations (internal process perspective) so that the organization can deliver value to the market (customer perspective), which will lead to sales that increase shareholder value (financial perspective). Firms can use this template together with the added-value framework (Exhibit 6) to develop their own strategy maps, which will help them identify the most suitable real estate strategies and operating decisions to support their firm’s core business strategies.

 


Exhibit 9 | Example of How to Apply the Added Value of CREM Framework Using BSC Strategy Map Structure

In the example illustrated in Exhibit 9, management has set a company-wide goal to increase flexibility to enable the firm to react quickly to changes in the marketplace, reducing overhead costs and increasing sales. CREM can support this goal by establishing and following a flexible real estate strategy. To be successful in implementing this strategy, the firm must employ a qualified real estate staff who understand the company’s plans and how to implement them. The CREM staff needs the technology and tools to create and maintain an inventory of owned and leased properties. Then the CREM staff can help create flexible workplace solutions by selecting appropriate properties, recommending leasing rather than owning non-core properties, and negotiating flexible lease terms. The firm can then move workers when needed, reconfigure workspaces, and dispose of underutilized and obsolete properties more quickly. These actions help reduce real estate-related costs, improve productivity, and thereby increase profitability. The firm can adjust locations quickly to improve accessibility to customers, increase sales to current customers, and open markets to new customers as well.

For long-term success, the firm must also develop a set of performance measures to assess its progress toward achieving its objectives and thereby its main goal of

 


 


maximizing shareholder wealth. Once a firm has translated its overall business strategy into the proper combination of real estate strategies, it can set specific objectives appropriate to its products and services and its position in the market. Measurements of key performance indicators can then be used to quantitatively assess whether real estate decisions are having the desired effect on the financial success of the firm. Simply relying on traditional measures, such as space per employee, will not provide sufficient data on which to base strategic decisions. Analysis of key performance indicators will allow managers to adjust real estate strategies and operations accordingly.

Conclusion

Many writers on corporate real estate stress the importance of the business environment and the role that CREM should play in enhancing business performance. This research identifies common themes from previous research as to how CREM could advance overall business performance and create added value for the core business. In addition, further knowledge has been gathered about actual corporate real estate practices from in-depth interviews with corporate real estate executives and service providers.

Using this information, a model was developed of how the value-adding attributes of CREM contribute to the core business and wealth maximization of the firm’s owners. Starting with a Balanced Scorecard approach, the two main ways by which CREM can add value to the firm are identified: revenue growth and profitability growth. The value-added section of the model is based on seven real estate strategies, which support these methods of maximizing wealth. This model provides a comprehensive structure spanning both traditional real estate strategies, such as cost reduction and increasing the value of assets, and other value-adding strategies related to real estate that often go unrecognized and unmeasured: promoting marketing and sales, increasing innovations, increasing employee satisfaction, increasing productivity, and increasing flexibility. The model incorporates many current business and management practices such as flexible work spaces and integration of information technology. The key to this model is that the real estate strategies follow from and support the overall business strategy and are both consistent and mutually reinforcing with other functional strategies within the firm. The model is then extended to identify real estate operating decisions that can support each of the seven real estate strategies.

This model can be operationalized within the established BSC framework. Each core strategy can be viewed from four perspectives: learning and growth, internal processes, customer value, and financial. Such analysis and structure will ensure that firms make real estate decisions that will both directly and indirectly support the core organizational goals. Further research is needed to validate the model in practice. Testing the developed framework with financial analysis will help firms determine which strategies are most effective for their circumstances, both in terms of revenue growth and profitability. Such analysis will require data collection

 


across time and firms to evaluate the different impacts of various real estate strategies and operating decisions.

The interviews conducted for this research reinforce how much both corporate real estate and general managers still view real estate as a cost of production that must be minimized, not as a strategic resource. While the majority of those contacted still emphasize reducing real estate costs, a substantial minority recognize the opportunity to use real estate resources to increase productivity, support core business strategies, and increase employee satisfaction. This research should be extended with the identification and refinement of specific performance measures, which can be used to quantify these additional ways value is added to the firm by corporate real estate via the strategic model presented in this paper. Leading and lagging performance indicators now being used require testing for reliability and validity. New indicators may be needed to better quantify the direct and indirect effects real estate has on corporate performance. Then a set of preferred measures can be offered from which firms can choose depending on their specific business strategy. This will identify what data firms need to collect to analyze real estate’s contribution to the firm and help CREM gain better recognition and reward for the value real estate adds to the firm.

Endnote

1 Using a convenience sample based on expert judgment, as in this survey, may result in sampling bias in that the sample respondents may not be representative of the population. Because some members of the population have no chance of being sampled, the extent to which a convenience sample actually represents the entire population cannot be known. Because one cannot specify the probability that each member of the population can be chosen, the results are not generalizable for statistical tests.

References

Ackoff, R., A Concept of Corporate Planning, New York: John Wiley & Sons, Inc., 1970.

Allen, M.T., R.T. Rutherford and T.M. Springer, The Wealth Effects of Corporate Real Estate Leasing, Journal of Real Estate Research, 1993, 8:4, 567–78.

Acoba, F.J. and S.P. Foster, Aligning Corporate Real Estate with Evolving Corporate Missions: Process-based Management Models, Journal of Corporate Real Estate, 2003, 5: 2, 143–64.

American Society of Interior Designers (ASID), Recruiting and Retaining Qualified Employees, Washington, DC: ASID, 1999.

Arthur Andersen & Co, NACORE International and CCIM, Real Estate in the Corporation: The Bottom Line from Senior Management, Chicago, IL: Arthur Andersen & Co., 1993.

Banker, R.D., G. Potter, and D. Srinivasan, The Empirical Investigation of an Incentive Plan that Includes Nonfinancial Performance Measures, Accounting Review, 2000, 75:1, 69–92.

Barkley, L., Key Performance Indicators, Journal of Corporate Real Estate, 2001, 3:2, 161–71.


 


Bdeir, Z., Strategic Performance in Corporate Real Estate, M.Sc. thesis in Real Estate Development, Boston: Massachusetts Institute of Technology, September 2003.

Becker, B.W., D.O. Kaldenberg and J.H. McAlexander, Site Selection by Professional Service Providers, Journal of Marketing Theory and Practice, 1997 Fall, 5, 35–44.

Becker, F. and F. Steele, Workplace by Design, San Francisco, CA: Jossey-Bass Ltd, 1995.

Blakstad, S.H., A Strategic Approach to Adaptability in Office Buildings, PhD thesis, Trondheim: Norwegian University of Science and Technology, Department of Building Technology, 2001.

Bradley, S.J., What’s Working? Briefing and Evaluating Workplace Performance Improvement, Journal of Corporate Real Estate, 2002, 4:2, 150–59.

Brill, M., Using Office Design to Increase Productivity. Buffalo, N.Y.: Workplace Design and Productivity, 1984.

Burns, C.M., Analysing the Contribution of Corporate Real Estate to the Strategic Competitive Advantage of Organisations, 2002, Occupier.org, working papers, available at: http://www.occupier.org/papers/working paper10.pdf.

Capowski, G.S., Designing a Corporate Identity, Management Review, 1993, 82:6, 37–39.

Carn, N.G., R.T. Black, and J.S. Rabianski, Operational and Organizational Issues Facing Corporate Real Estate Executives and Managers, Journal of Real Estate Research, 1999, 17:3, 281–99.

Cefis, E. and M. Ciccarelli, Profit Differentials and Innovation, Economics of Innovation & New Technology, 2005, 14:1/2, 43–62.

Chesbrough, H.W., Open Innovation: New Imperative for Creating and Profiting from Technology, Boston, MA: Harvard Business School Press, 2003.

Craig, C.S., A. Ghosh, and S. McLafferty, Models of the Retail Location Process: A Review, Journal of Retailing, 1984, 60:1, 5–36.

De Jonge, H., Toegevoegde Waarde Van Concernhuisvesting, Paper presented at NSC-Conference, October 15, 1996.

Ernst & Young, Corporate Real Estate Outsourcing: 10 Years Later, Ernst & Young, 2002.

Ghalayini, A.M. and J.S. Noble, The Changing Basis of Performance Measurement, International Journal of Operations & Production Management, 1996, 16:8, 63–80.

Gibler, K.M. and R.T. Black, Agency Risks in Outsourcing Corporate Real Estate Functions, Journal of Real Estate Research, 2004, 26:2, 137–60.

Gibler, K.M., R.T. Black and K.P. Moon, Time, Place, Space, Technology and Corporate Real Estate Strategy, Journal of Real Estate Research, 2002, 24:3, 235–62.

Gibson, V., Changing Business Practice and Its Impact on Occupational Property Portfolios, London: RICS, 1998.

Gibson, V., Property Portfolio Dynamics: the Flexible Management of Inflexible Assets, Facilities, 2000, 18:3/4, 150–54.

Gibson, V.A. and R. Barkham, Corporate Management in the Retail Sector, Journal of Real Estate Research, 2001, 22, 107–27.

Gibson, V. and C. Lizieri, New Business Practices and the Corporate Property Portfolio, Journal of Property Research, 1999, 16, 201–18.

 


Gibson, V. and M. Louargand, The Workplace Portfolio as Contractual Arrangements, in M. Joroff and M. Bell (eds.), The Agile Workplace, Boston: Gartner and MIT, 2001, 37–55.

Grant, L., Happy Workers, High Returns, Fortune, 1998 October 12, 137:1, 81.

Harris, R., Less a Castle, More a Condominium: Taking a Look at the Office of the Future, London: Gerald Eve Research, 1996.

Heskett, J.L., W.E. Sasser and L.A. Schlesinger, The Service Profit Chain: How Leading Companies Link Profit and Growth to Loyalty, Satisfaction, and Value, New York: Free Press, 1997.

Heskett, J.L., T.O. Jones, G.W. Loveman, W.E. Sasser Jr. and L.A. Schlesinger, Putting the Service-Profit Chain to Work, Harvard Business Review, 1994, 72:2, 164–74.

Iszo, J. and P. Withers, Value Shift: The New Work Ethic and What It Means for Business, Gloucester, MA: Fairwinds Press, 2001.

Ittner, C.D. and D.F. Larcker, Measuring the Impact of Quality Initiatives on Firm Financial Performance, Advances in the Management of Organizational Quality, 1996, 1, 1–37.

Jensen, M.C., Value Maximization, Stakeholder Theory, and the Corporate Objective Function, European Financial Management, 2001, 7:3, 297–317.

Joroff, M., M. Louargand, S. Lambert and F. Becker, Strategic Management of the Fifth Resource: Corporate Real Estate, Corporate Real Estate 2000 Series report number 49, IDRC, 1993.

Kaplan, A. and S. Aronoff, Productivity Paradox, Facilities, 1996, 14, March/April, 6–14.

Kaplan, R.S. and D.P. Norton, Using the Balanced Scorecard as a Strategic Management System, Harvard Business Review, 1996, January–February, 75–85.

Kaplan, R.S. and D.P. Norton, Having Trouble With Your Strategy? Then Map It, Harvard Business Review, 2000. 78: September–October, 167–76.

Kaplan, R.S. and D.P. Norton, Strategy Maps: Converting Intangible Assets into Tangible Outcomes, Boston: Harvard Business School Publishing Corporation, 2004.

Keegan, D.P., R.G. Eiler, and C.R. Jones, Are Your Performance Measures Obsolete? Management Accounting, 1989, 70:12, 45–50.

Kimbler, L.B. and R.C. Rutherford, Corporate Real Estate Outsourcing, Journal of Real Estate Research, 1993, 8, 525–40.

Kimes, S.E. and J.A. Fitzsimmons, Selecting Profitable Hotel Sites at La Quinta Motor Inns, Interfaces, 1990 March–April, 20, 12–20.

Kleeman, W.B. Jr., Out-tasking More Widespread than Outsourcing in the USA, Facilities, 1994, 12:2, 24–26.

Krumm, P.J.M.M., Corporate Real Estate Management in Multinational Corporations, Nieuwegeing: ARKO Publishers, 1999.

Krumm, P.J.M.M., History of Real Estate Management from a Corporate Perspective, Facilities, 2001, 19:7/8, 276–86.

Krumm, P.J.M.M. and J. de Vries, Value Creation through the Management of Corporate Real Estate, Journal of Property Investment & Finance, 2003, 21:1, 61–72.

Lambert, S., J. Poteete, and A. Waltch, Generating High-Performance Corporate Real Estate Service, Corporate Real Estate 2000 Series report No. 52, IDRC, 1995.

 


 


Lazonick, W. and M. O’Sullivan, Maximising Shareholder Value: A New Ideology for Corporate Governance, Economy and Society, 2000, 29:1, 13–35.

Lyne, J., Strategic Alliances, Site Selection, 1997, 39:3, 256–61.

Maister, D.H. Employee Attitudes Affect a Company’s Financial Success, Employment Relations Today, 2001, 28:3; 17–33.

McDonagh, J. and T. Hayward, Outsourcing Corporate Real Estate Asset Management in New Zealand, Journal of Corporate Real Estate, 2000, 2, 351–71.

Miles, M.C. and A.M. Huberman, Qualitative Data Analysis: An Expanded Source Book, Thousand Oaks, CA: Sage Publications, 1994.

Miller C.C. and L.B. Cardinal, Strategic Planning and Firm Performance, Academy of Management Journal, 1994, 37:6, 1649–65.

Nappi-Choulet, I., Corporate Property Outsourcing in Europe: Present Trends and a New Approach for Real Estate Economics. Paper presented to the IPD European Property Strategies Conference, Wiesbaden, May 2002.

Nonaka, I. and H. Takeuchi, The Knowledge-creating Company: How Japanese Companies Create the Dynamics of Innovation, New York: Oxford University Press, 1995.

Nourse, H.O., Measuring Business Real Property Performance, Journal of Real Estate Research, 1994, 9:4, 431–44.

Nourse, H.O. and S.E. Roulac, Linking Real Estate Decisions to Corporate Strategy, Journal of Real Estate Research, 1993, 8:4, 475–94.

OECD, The Nature of Innovation and the Evolution of the Productive System. Technology and Productivity—the Challenge for Economic Policy, Paris: OECD, 1991.

Pfnuer, A., C. Schaefer, and S. Armonat, Aligning Corporate Real Estate to Real Estate Investment Functions, Journal of Corporate Real Estate, 2004, 6:3, 243–63.

Pittman, R.H. and J.R. Parker, A Survey of Corporate Real Estate Executives on Factors Influencing Corporate Real Estate Performance, Journal of Real Estate Research, 1989, 4:3, 107–19.

Rucci, A.J., S.P. Kirn, and R.T. Quinn, The Employee-Customer Profit Chain at Sears, Harvard Business Review, 1998, 76:1, 82–97.

Schaefers, W., Corporate Real Estate Management: Evidence from German Companies, Journal of Real Estate Research, 1999, 17:3, 301–20.

Simons, R.A., Public Real Estate Management—Adapting Corporate Practice to the Public Sector, Journal of Real Estate Research, 1993, 8:4, 639–54.

Singhvi, S.S., A Quantitative Approach to Site Selection, Management Review, 1987 April, 76, 47–50.

Teoh, W.K., Corporate Real Estate Asset Management: The New Zealand Evidence, Journal of Real Estate Research, 1993, 8:4, 607–23.

Veale, P.R., Managing Corporate Assets, Journal of Real Estate Research, 1989, 4:3, 1–22.

Wilson, C., J. Leckman, K. Cappucino and W. Pullen, Towards Customer Delight: Added Value in Public Sector Corporate Real Estate, Journal of Corporate Real Estate, 2001, 3: 3, 215–21.

 


The authors would like to thank the financial sponsors for this research: Tekes (National Technology Agency of Finland), Yleisradio Oy, VR-Group, Fortum Oy, Bank of Finland, Olof Granlund Oy, City of Espoo, City of Lahti and City of Kuopio, as well as two anonymous reviewers for their helpful suggestions.

 

Anna-Liisa Lindholm, Helsinki University of Technology, FI-02015 HUT, Finland or anna-liisa.lindholm@tkk.fi.

Karen M. Gibler, Georgia State University, Atlanta, GA 30302-4020 or kgibler@gsu.edu.

Kari I. Leväinen, Helsinki University of Technology, FI-02015 HUT, Finland or kari.levainen@tkk.fi.


 

 

ESANN 2012 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Bruges (Belgium), 25-27 April 2012, i6doc.com publ., ISBN 978-2-87419-049-0. Available from http://www.i6doc.com/en/livre/?GCOI=28001100967420.

Cluster homogeneity as a semi-supervised principle for feature selection using mutual information

Frederico Coelho¹, Antonio Padua Braga¹ and Michel Verleysen²

1- Universidade Federal de Minas Gerais - Brazil

2- Université Catholique de Louvain - Belgium

Abstract. In this work the principle of homogeneity between labels and data clusters is exploited in order to develop a semi-supervised Feature Selection method. This principle permits the use of cluster information to improve the estimation of feature relevance in order to increase selection performance. Mutual Information is used in a Forward-Backward search process in order to evaluate the relevance of each feature to the data distribution and the existent labels, in a context of few labeled and many unlabeled instances.

1 Introduction

The solution of machine learning problems is often hampered by redundant information embedded into a large number of variables, which are usually chosen to represent the problem according to their availability and to some sort of a priori knowledge. Reducing the number of variables by Feature Selection (FS) may improve learning performance by smoothing the effects of the well known “curse of dimensionality” and “concentration of the Euclidean norm” [1] problems. FS may also contribute to a better understanding of variable behavior, bringing a clearer physical interpretation of real problems [2].

A common approach to FS is to estimate the relevance of each feature and to rank the features according to their relation (or correlation) with the output targets [3, 4]. This approach is intuitive and easy to implement, but it usually fails to consider the relevance of a given feature in the presence of others [2], since most filter methods are univariate [4, 5]. In this context, Mutual Information (MI) [6] arises as a good “relation” criterion, since it is a multivariate measure which is widely used to evaluate relations among sets of features and output labels.

Basically, labels and data are the available sources of information to perform FS. Many methods [3, 7] are able to deal only with labeled data, while others deal only with unlabeled data [8, 9]. However, in many real situations, the amount of labeled data is not sufficient to characterize well the relations between input data and output classes. Since labeling by human experts can be costly, it is common in many kinds of problems to have large unlabeled data sets available and very little labeled data. Given the availability of the large unlabeled data set, the question that arises in such a context is “why not use information extracted from the unlabeled data in order to estimate feature relevance and

Work developed with funding support from CAPES - Process BEX 1456105

 


 


to induce models?” The joint use of labeled and unlabeled data to perform FS characterizes the semi-supervised feature selection paradigm.

Some machine learning approaches include clustering methods in order to label instances. They are based on the assumption that the underlying distributions of the data, and their modes, can be estimated from the sampled data by clustering methods. One of the basic principles of structural data analysis is that labels are consistent with data distributions. Accordingly, the relevance of features to labels should also be reflected by the relevance of features to clusters.

In this work a semi-supervised FS strategy based on MI is introduced. The basic principle of the method is to replace, for unsupervised data, the label information by cluster information in order to estimate the relevance of each feature or feature subset.

This paper is organized as follows: first the FS framework is summarized. Then the use of unlabeled data within this framework is detailed. Next, some experiments are presented, as well as their results, leading to the conclusions.

2 Feature Selection

Feature selection is usually accomplished according to a relevance criterion and a search strategy. The former aims to assess how relevant a single feature subset is, while the latter aims to guide the search towards the most relevant feature subset, since, in practice, testing all possible subsets (exhaustive search) can be unfeasible even for problems with few variables. In this work a filter method is implemented using MI as the relevance criterion. Roughly speaking, MI measures the amount of information shared among two or more sets of variables [6], capturing even nonlinear relations among them. The multivariate properties of MI make it an important approach for assessing the relevance of subsets of features, since it is affected by the joint behavior of a feature in the presence of others. Equation 1 shows the relevance evaluation between the input data X and the output vector Y:

r = MI(X, Y).   (1)

The implemented search technique is the forward-backward (FB) procedure [10, 2]. The forward strategy is less capable of finding complementary features than backward selection; on the other hand, even its smallest nested subset is predictive. The backward strategy, in turn, is capable of finding complementary features, but its performance degrades for the smallest nested subsets. The forward-backward process therefore tries to get the best of both approaches.
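To make the search concrete, here is a minimal sketch of the FB filter in Python, under stated assumptions: the multivariate MI is approximated crudely by discretizing each feature of the candidate subset into equal-width bins and computing the MI between the resulting joint cells and the labels. This stand-in estimator and the helper names (`mutual_information`, `forward_backward`) are illustrative; it is not the estimator of Gómez-Verdejo et al. [10] used later in the experiments.

```python
import numpy as np

def mutual_information(X, y, bins=8):
    """Crude multivariate MI estimate: discretize the feature subset
    (X of shape (n_samples, n_subset_features)) into equal-width bins
    and compute MI between the joint cells and the labels y."""
    cols = [np.digitize(c, np.histogram_bin_edges(c, bins)[1:-1]) for c in X.T]
    _, cells = np.unique(np.stack(cols, axis=1), axis=0, return_inverse=True)
    _, labels = np.unique(y, return_inverse=True)
    cells, labels = cells.ravel(), labels.ravel()
    p = np.zeros((cells.max() + 1, labels.max() + 1))
    np.add.at(p, (cells, labels), 1)          # joint counts
    p /= p.sum()                              # joint probabilities
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def forward_backward(X, y, relevance=mutual_information):
    """Greedy forward pass adding the feature that most increases the
    relevance, then a backward pass dropping features whose removal
    does not decrease it."""
    n = X.shape[1]
    selected, best = [], -np.inf
    improved = True
    while improved:                            # forward: grow the subset
        improved = False
        for j in set(range(n)) - set(selected):
            r = relevance(X[:, selected + [j]], y)
            if r > best:
                best, best_j, improved = r, j, True
        if improved:
            selected.append(best_j)
    for j in list(selected):                   # backward: prune redundancy
        rest = [k for k in selected if k != j]
        if rest and relevance(X[:, rest], y) >= best:
            selected, best = rest, relevance(X[:, rest], y)
    return selected
```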

3 Using Unlabeled Data

Evaluating feature relevance using MI requires that the data set contains some labeled data; however, small data sets may fail to represent well the general

 


 


 


 


Fig. 1: For the two-class XOR problem, in 1(a) none of the features alone can explain the distribution of the classes, defined by circles and crosses, and in 1(b), even without the labels, features 1 and 2 are still able to explain the data distribution.

relation between input and output variables, as shown in the illustrative example of Figure 1(a). In this example, the distribution of labels is not well represented if a small data set is sampled within the central circle. Since labeling can be costly, it is expected that unlabeled data could provide some information about the posterior probability of labels that could improve FS. The feature selection task could be performed by searching for those features that are important not only for labels, but also for clusters, which are expected to be consistent with labels. The use of both labeled and unlabeled data characterizes the semi-supervised paradigm.

Data distribution information can be useful even when there is a reasonable amount of labeled data. As an example, consider a forward FS procedure applied to a three-dimensional problem for which features X1 and X2 together fully explain the labels in Y and X3 is completely random (Fig. 1(a) shows the relevant features). Individually, none of the three features is able to explain the labels, so in the first iteration of the algorithm (which is univariate), their MI values will be small and, by chance, feature X3 could be ranked first, resulting in a poor initial subset selection. In such a situation the distribution of the dataset may provide additional information about the relevance of X1 and X2.

Features X1 and X2, together, are still able to discriminate the instances into four different clusters according to the distribution of the dataset, regardless of labels, as shown in Figure 1(b). So, if we are able to estimate the cluster structure that best fits the data generating functions, we can estimate the relevance of each feature subset according to the dataset distribution. Each pattern, especially the unlabeled ones, can be associated to a given cluster and receive a tag according

¹ This is a hypothetical example to illustrate the problem. In real problems labeled and unlabeled data are not expected to be concentrated in different regions of the space.


 


to the cluster number (Figure 1(b)). These “cluster labels”, assigned to each unlabeled instance, form the cluster label vector Ycl. In addition, the number of clusters Nc should be sufficiently large in order to guarantee label homogeneity within clusters.

In general, the MI between a feature set X and its vector of labels Y can be defined in terms of their joint and marginal probabilities as

MI(X, Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log [ p(x, y) / (p(x) p(y)) ].   (2)

Equation (2) can be rewritten by splitting the data according to their classes, as shown in Equation (3) for a binary case, where superscripts (+1) and (−1) indicate respectively the data belonging to classes +1 and −1:

MI(X, Y) = Σ_{x∈X^(+1)} p(x, +1) log [ p(x, +1) / (p(x) p(+1)) ] + Σ_{x∈X^(−1)} p(x, −1) log [ p(x, −1) / (p(x) p(−1)) ].   (3)

Assuming that, after the clustering procedure, the clusters C_i, i = 1, 2, ..., k, are homogeneous and correspond to instances from the same class, i.e., they were generated in such a way that C_1, C_2, ..., C_i map to Y^(+1) and C_{i+1}, ..., C_k map to Y^(−1), the MI can be rewritten as

MI(X, Ycl) = Σ_{x∈C_1∪...∪C_i} p(x, +1) log [ p(x, +1) / (p(x) p(+1)) ] + Σ_{x∈C_{i+1}∪...∪C_k} p(x, −1) log [ p(x, −1) / (p(x) p(−1)) ].   (4)

In such a situation, MI(X, Ycl) = MI(X, Y). In practice, as we are dealing with unlabeled data, if the number of clusters is defined sufficiently large to allow the clusters to encompass mostly instances from the same class, we have MI(X, Ycl) ≈ MI(X, Y). Equation (1) can now be rewritten as

r_ss = MI( X^(l) ∪ X^(u), Y ∪ Ycl ),   (5)

where X^(l) and X^(u) are respectively the labeled and unlabeled data sets, Y is the label vector and Ycl is the vector of cluster labels. Equation (5) can be directly used in our forward-backward FS filter method. In this way, more information about the relevance of each feature subset is provided by taking the cluster information into account. Cluster information therefore replaces the label information for unlabeled data, so that they can be considered in the evaluation of the MI.
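The relevance of Equation (5) can be sketched as follows, under one plausible reading in which the K-means cluster indices of the unlabeled points act as extra label categories (offset so they cannot collide with the true class labels). The function names, the crude discretized MI estimate, and the default value of Nc are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def _mi(X, y, bins=8):
    # Discretized multivariate MI estimate (same crude scheme as the
    # sketch in Section 2).
    cols = [np.digitize(c, np.histogram_bin_edges(c, bins)[1:-1]) for c in X.T]
    _, cells = np.unique(np.stack(cols, axis=1), axis=0, return_inverse=True)
    _, lab = np.unique(y, return_inverse=True)
    cells, lab = cells.ravel(), lab.ravel()
    p = np.zeros((cells.max() + 1, lab.max() + 1))
    np.add.at(p, (cells, lab), 1)
    p /= p.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def semi_supervised_relevance(X_lab, y_lab, X_unl, subset, n_clusters=30):
    """r_ss = MI(X^(l) u X^(u), Y u Ycl) for one candidate feature subset."""
    X_all = np.vstack([X_lab, X_unl])
    # Nc is chosen large enough that clusters are (mostly) label-homogeneous.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    y_cl = km.fit_predict(X_all)[len(X_lab):]      # cluster ids of unlabeled rows
    # Offset cluster ids so they form categories disjoint from class labels.
    y_aug = np.concatenate([y_lab, y_cl + int(y_lab.max()) + 1])
    return _mi(X_all[:, subset], y_aug)
```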

4 Experiments and Results

The experiments consist in comparing the performances of feature subsets selected according to a purely supervised approach and the semi-supervised method


 


presented in this paper. A sequential FB FS strategy [10, 2] was implemented and applied to some real and synthetic datasets, using an MI estimator tailored to classification problems. This estimator, developed by Gómez-Verdejo et al. [10], performs well even in a context of scarce data. Data was clustered with the K-means algorithm [11]; the number of clusters Nc is shown in Table 1. Nc was chosen empirically so as to be sufficiently large to guarantee label homogeneity within clusters.

The final results aim at comparing the final feature subset obtained when using only labeled data, Ssup, with the one obtained using both labeled and unlabeled data, Sss. After sampling the two sets, the Linear Discriminant Analysis (LDA) method was used to classify the test set in three different conditions: considering only Ssup, only Sss, or the set F of all features. The mean classification accuracy and standard deviation over 10 different trials are presented in Table 2. LDA was chosen to perform the classification tests due to its simplicity and robustness.

Three data sets were used in the experiments. The first one (FBench) is a synthetic data set, originally developed for benchmark regression problems [12], whose output is a function of some of its random input variables. Its output was discretized into two classes (+1 for Y > 0 and −1 for Y < 0) in order to transform it into a classification problem. Two other problems come from the UCI Machine Learning Repository (www.ics.uci.edu/mlearn/): the Sonar data set, composed of instances of sonar responses from rocks and mines, and the Pen-Based Handwritten Digits data set, composed of digit samples from 44 different writers. For this last problem only instances of digits 1 and 2 were considered in the experiments.

On each trial a very small portion of the data (Nl instances) was chosen as labeled data, another Nt instances were selected as a test set, and the remaining Nu instances were treated as unlabeled data, so their labels were not used in the FS task.
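The trial protocol can be sketched as below; the split sizes, the `select_features` callable (standing in for the FB procedure above), and training the LDA on the labeled split only are assumptions about details the paper does not spell out.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def evaluate(X, y, select_features, n_lab, n_test, trials=10, seed=0):
    """Repeat the paper's protocol: sample labeled/test/unlabeled splits,
    select features on the (semi-)supervised pool, score an LDA classifier."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(trials):
        idx = rng.permutation(len(y))
        lab, test = idx[:n_lab], idx[n_lab:n_lab + n_test]
        unl = idx[n_lab + n_test:]                  # labels hidden from FS
        subset = select_features(X, y, lab, unl)    # returns feature indices
        clf = LinearDiscriminantAnalysis().fit(X[lab][:, subset], y[lab])
        scores.append(clf.score(X[test][:, subset], y[test]))
    return float(np.mean(scores)), float(np.std(scores))
```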

Table 1: Data and algorithm parameters, where Nf is the number of features, N the total number of instances, Nl, Nu and Nt the numbers of labeled, unlabeled and test instances, Nc the number of clusters, and Ssup and Sss the subsets selected using only labeled data and using the semi-supervised method.

Problem | Nf | N | Nl | Nu | Nt | Nc | Ssup | Sss
FBench | 10 | 10000 | 49 | 7952 | 1999 | 100 | 4-1-5 | 1-5-4-10-2-3
Sonar | 60 | 208 | 11 | 147 | 41 | 40 | 46 | 46-36-20-27-30-16-43-24
Pen | 16 | 2287 | 114 | 1717 | 456 | 30 | 4 | 4-15


Table 2: Results for each test, where # is the final number of features in each subset. Entries are classification accuracy ± σ.

Problem (#) | F | (#) Ssup | (#) Sss
FBench (10) | 0.8502±0.0123 | (3) 0.8199±0.0127 | (6) 0.8504±0.0122
Sonar (60) | 0.7117±0.6667 | (1) 0.6052±0.1343 | (8) 0.6924±0.0803
Pen (16) | 0.9808±0.0075 | (1) 0.8478±0.0251 | (2) 0.8780±0.0264


In all experiments the accuracy obtained with the subset Sss is higher than that obtained with the features selected using only the labeled data. Table 2 also shows that, for the FBench and Sonar problems, there is no significant accuracy loss when using only the features in Sss instead of using

 


 


all features. Only for the Pen data set is there a loss with respect to F. However, there is an improvement in accuracy with respect to using only the supervised set Ssup, as expected, since the objective here is to show that cluster information from unlabeled data, and consequently the proposed method, conveys information that improves FS.

5 Conclusion

This work proposes a semi-supervised FS method based on the principle of homogeneity between labels and data clusters. According to this principle, the label distribution is consistent and coherent with the distribution of the data. In that sense, estimation of data clusters can provide some hints about the posterior label distribution. Therefore, features that are relevant to labels are also relevant to the data distribution and, consequently, to clusters. The results show that information retrieved from clusters can improve the estimation of feature relevance and the feature selection task, especially when the labeled data are very few and the unlabeled data are numerous.

References

[1] Michel Verleysen. Learning high-dimensional data, pages 141–162. IOS Press, Amsterdam, 2003.

[2] Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lotfi A. Zadeh. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.

[3] Kenji Kira and Larry A. Rendell. A practical approach to feature selection. In ML92: Proceedings of the ninth international workshop on Machine learning, pages 249–256, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc.

[4] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical recipes in C (2nd ed.): the art of scientific computing. Cambridge University Press, New York, NY, USA, 1992.

[5] Ronald A. Fisher. The use of multiple measurements in taxonomic problems. Annals Eugen., 7:179–188, 1936.

[6] Thomas M. Cover and Joy A. Thomas. Elements of information theory. Wiley-Interscience, New York, NY, USA, 1991.

[7] C. Krier, D. François, F. Rossi, and M. Verleysen. Feature clustering and mutual information for the selection of variables in spectral data. Neural Networks, pages 25–27, 2007.

[8] Pabitra Mitra, C. A. Murthy, and Sankar K. Pal. Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell., 24(3):301–312, 2002.

[9] E. Llobet, O. Gualdron, J. Brezmes, X. Vilanova, and X. Correig. An unsupervised dimensionality-reduction technique. In Sensors, 2005 IEEE, 30 2005.

[10] Vanessa Gómez-Verdejo, Michel Verleysen, and Jérôme Fleury. Information-theoretic feature selection for functional data classification. Neurocomputing, 72:3580–3589, October 2009.

[11] S. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, March 1982.

[12] Jerome H. Friedman. Multivariate Adaptive Regression Splines. The Annals of Statistics, 19(1):1–67, 1991.

 


 

Application of Fuzzy Logic + Proportional Integral Control to the Process Control Simulator PCS 327 MK2 Module

Wrastawa Ridwan

Department of Electrical Engineering

Universitas Negeri Gorontalo

email: r1space@yahoo.com

Abstract. The output response obtained by applying an adaptive fuzzy logic controller to the Process Control Simulator PCS 327 MK2 module is still unsatisfactory: the settling time is slow under load disturbances and the overshoot is large. In this study, a hybrid control scheme consisting of an adaptive fuzzy logic controller and a Proportional Integral (PI) controller is designed to control the output response of the Process Control Simulator PCS 327 MK2 module. A control gain is used to adjust the control signal so that the output response of the system matches the desired setpoint. The integral controller is added to reduce the remaining overshoot. The learning algorithm used in the fuzzy system is gradient descent, which estimates the fuzzy parameters. Load disturbances of 0.5 volt and 1 volt are applied to the system after it reaches steady state.

In this study, the output response with the smallest mean error, 0.2245 volt, was obtained at K = 1.2 and Ki = 0.2.

Keywords: Process Control Simulator PCS 327 MK2, hybrid control

1. Introduction

The Process Control Simulator PCS-327 MK-2 is an analog simulator with specific functions that uses operational amplifier integrated circuits arranged in such a way that the principles of process control methods can be taught at the technician and technologist level. In general, the PCS 327 MK-2 consists of a controller section and a process section. The controller section provides integral, derivative, and proportional control actions. As technology for regulating system operation has developed, system controllers have become increasingly necessary. Particularly in hardware control, for both electronic and mechanical devices, various reasonably good control methods have been introduced, one of which is the fuzzy logic controller.

In this study, the application of hybrid control to the process section of the Process Control Simulator PCS-327 MK-2 is analyzed, with the controller implemented as an adaptive fuzzy logic controller combined with a PI (Proportional Integral) controller. The limited knowledge of human operators about the plant to be controlled makes it difficult to determine the basic rules in designing a fuzzy logic controller, so a method was developed to identify the system by applying the causal inversion concept. The PI controller is expected to shape the system response so as to overcome the overshoot that occurs under load disturbances.

 


The problem addressed in this study is how to obtain the desired output response of the plant, the Process Control Simulator PCS-327 MK2 module, using hybrid control, i.e., an adaptive fuzzy logic controller and a PI (Proportional Integral) controller. Identification and control design apply the causal inversion method, with gradient descent training as the learning algorithm. Identification is performed online to obtain the parameters of the fuzzy model, which are then used as a reference to determine the control signal needed to produce the desired output response (equal to the setpoint), especially when the system is subjected to load disturbances.

This study is limited to the application of hybrid control (adaptive fuzzy logic controller and PI controller) to the process section of the Process Control Simulator PCS-327 MK2 module. The system is given load disturbances of 0.5 volt and 1 volt after reaching steady state.

2. Literature Review

2.1 Adaptive Fuzzy Systems

There are two approaches to adaptive fuzzy controllers: direct adaptive control and indirect adaptive control [Passino, 1997]. In this study, an indirect adaptive fuzzy controller is used.

In an indirect adaptive fuzzy controller, online identification is performed to estimate the plant parameters, and the “controller designer” block adjusts the controller parameters. If the plant parameters change, the identifier re-estimates them and the controller designer updates the controller accordingly. This approach is called indirect adaptive control because the controller is not changed directly; the plant parameters must be estimated first.

A direct adaptive fuzzy controller does not require an intermediate process model or a parameter identification stage; it obtains the controller parameters directly by comparing the actual closed-loop performance with some desired behavior, through an output performance index that drives the controller adaptation.

Some advantages of indirect adaptive control over the direct approach are:

- Generating a plant model allows abrupt parameter changes to be detected, based on the tracking of transient characteristics.

- Separating the adaptation model from the controller design allows the model to be analyzed independently of controller performance and system stability.

- Controller performance specifications can be changed to accommodate new constraints.

 

Figure 1: Indirect adaptive fuzzy controller

Fuzzy Model Identification

The fuzzy logic system used here employs:

- Singleton fuzzifier

- Center average defuzzifier

- Product inference engine

- Gaussian membership functions

The fuzzy system to be designed has the form

f(x) = [ Σ_{l=1}^{M} ȳ^l Π_{i=1}^{N} exp( −((x_i − x̄_i^l)/σ_i^l)² ) ] / [ Σ_{l=1}^{M} Π_{i=1}^{N} exp( −((x_i − x̄_i^l)/σ_i^l)² ) ].

The adjustable parameters of the fuzzy logic system above are:

- ȳ^l ∈ V

- x̄_i^l ∈ U_i

- σ_i^l

V is the universe of discourse of the output, while U_i is the universe of discourse of each input. M is the number of fuzzy rules and N is the number of inputs, while f(x) is the output of the fuzzy system. The variables x̄_i^l and σ_i^l (i = 1, ..., N; l = 1, ..., M) are, respectively, the center and the width of the input membership functions. The inputs of the fuzzy system are denoted x_i (i = 1, ..., N), while the centers of the output membership functions are ȳ^l (l = 1, ..., M).
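As an illustration, the fuzzy system above can be evaluated numerically as in the following sketch (Python is used here for readability; the control software in this study was written in C, and the array names are assumptions):

```python
import numpy as np

def fuzzy_output(x, xc, sigma, yc):
    """Singleton fuzzifier + product inference + center-average defuzzifier
    with Gaussian memberships.
    x     : inputs, shape (N,)
    xc    : membership centers, shape (M, N)
    sigma : membership widths,  shape (M, N)
    yc    : output centers,     shape (M,)
    """
    # Firing strength of each rule: product of Gaussian memberships.
    z = np.exp(-(((x - xc) / sigma) ** 2)).prod(axis=1)   # shape (M,)
    return float((yc * z).sum() / z.sum())                # f(x)
```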

2.2 Hybrid Control

Hybrid control is control that uses two or more controllers in one system. The overall block diagram of the hybrid control, consisting of the indirect adaptive fuzzy logic controller and the Proportional Integral (PI) controller, is shown in Figure 2.

Figure 2: Block diagram of the hybrid control

If the adaptive fuzzy modeling procedure that provides the process model for the controller design is carried out online, the result is an indirect adaptive fuzzy logic controller. The PI controller is added to reduce overshoot. The PI controller equation is as follows:

U(k)  Ki * (e(k)  e(k  1))

This controller structure consists of two feedback loops: one for the controller and one for updating the plant model. The controller feedback uses the observed output to determine the control signal needed to produce the desired response, while the model-update feedback uses the observed plant input/output to adapt the inverse process model required by the controller feedback.
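A sketch of one sampling step of the hybrid law, under the reading above: the adaptive fuzzy inverse model proposes a control signal, the gain K scales it, and the integral term Ki·(e(k) + e(k−1)) is added to suppress the remaining overshoot. The function names and the exact way the two signals are combined are assumptions, not the authors' C implementation. K = 1.2 and Ki = 0.2 are the best-performing gains reported in the abstract.

```python
def hybrid_step(setpoint, y_meas, state, fuzzy_inverse, K=1.2, Ki=0.2):
    """One control update of the hybrid (adaptive fuzzy + PI) scheme.
    state is a dict carrying the previous error between calls."""
    e = setpoint - y_meas
    u_fuzzy = fuzzy_inverse(setpoint, y_meas)     # inverse-model control signal
    u_i = Ki * (e + state.get("e_prev", 0.0))     # integral action, per U(k)
    state["e_prev"] = e
    return K * u_fuzzy + u_i
```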

3 Research Method

The work in this study was divided into several stages:

1. Collecting the literature.

2. Studying fuzzy logic system theory for identification and control.

3. Writing the identification and controller software.

4. Preparing the hardware (PCS 327 MK-2, computer, ADC/DAC).

5. Collecting data by identifying the system and applying the controller to the plant (the process section of the Process Control Simulator PCS 327 MK2 module) under varying control gain.

 


6. Analysing the data to determine the system response specifications.

7. Writing the research report.

The research was carried out in the Control Engineering Laboratory, Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember Surabaya, in particular the hardware preparation, data collection, and data analysis stages.

The hardware used in this study consists of one computer for online data processing, a PCL-712 multi-lab card for the ADC/DAC, and the plant under control, a PCS 327 MK2 module. The interconnection of the hardware is shown in Figure 3.

The control algorithm was implemented in the C programming language.

Figure 3 Block diagram of the system design (computer, PCL-712, plant)

Before the adaptive fuzzy logic controller is applied, an identification step is required. The identification produces the fuzzy parameters that are then used by the adaptive fuzzy logic controller. The learning process uses the gradient descent algorithm.
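A minimal sketch of one such learning step is given below, using the standard gradient-descent update formulas for this class of Gaussian fuzzy systems (see Wang [6]). Array shapes follow the parameterisation of Section 2; the in-place update style and all names are our choices, not the paper's C code.

```python
import numpy as np

def gd_update(x, d, centers, widths, y_centers, alpha=0.5):
    """One gradient-descent step on E = 0.5*(f(x) - d)^2.

    d is the desired output for input x; alpha is the learning rate.
    centers, widths: (M, N) float arrays; y_centers: (M,) float array.
    """
    diff = x - centers                                  # (M, N)
    z = np.exp(-(diff / widths) ** 2).prod(axis=1)      # rule firing strengths (M,)
    b = z.sum()
    f = np.dot(y_centers, z) / b                        # current model output
    e = f - d                                           # output error
    common = (e * (y_centers - f) * z / b)[:, None]     # shared gradient factor (M, 1)
    g_y = e * z / b                                     # dE/dy^l
    g_c = common * 2 * diff / widths ** 2               # dE/dx_i^l
    g_w = common * 2 * diff ** 2 / widths ** 3          # dE/dsigma_i^l
    y_centers -= alpha * g_y                            # in-place parameter updates
    centers   -= alpha * g_c
    widths    -= alpha * g_w
    return f
```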

More concretely, the adaptive fuzzy logic controller applied to the Process Control Simulator PCS 327 MK2 module is designed using the following algorithm:

a. Plant identification algorithm

1. Apply a 3.8 V step input to the Process Control Simulator PCS 327 MK2 module.

2. Set the number of rules (M = 30).

3. Initialise the fuzzy inverse-model parameters $x_1^{l}(0)$, $x_2^{l}(0)$, $y^{l}(0)$, $\sigma_1^{l}(0)$, $\sigma_2^{l}(0)$ that will be estimated.

4. Estimate these parameters with the gradient descent algorithm from the plant input/output data.

5. Use the parameters from step 4 to initialise the controller.

 

b. Hybrid controller algorithm

 


1. The fuzzy parameters obtained from the inverse-model identification initialise the adaptive fuzzy controller.

2. A load disturbance is applied once the closed-loop response of the plant reaches steady state; loads of 0.5 V (13.15%) and 1 V (26.31%) are used.

3. Apply a varying control gain (K) and compute the Mean Error (ME).

4. For the same K, apply different integral gains (Ki).

5. Analyse the system response to determine its performance characteristics: settling time (ts) during the transient and under load, maximum overshoot (Mp) and percent maximum overshoot (%Mp), steady-state output, steady-state error, and the mean error (a sketch of these computations follows below).
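The sketch below computes these specifications from a sampled step response. The paper does not state its settling-time tolerance band, so the 2% band here, like all names, is an assumption.

```python
import numpy as np

def response_metrics(t, y, setpoint, band=0.02):
    """Step-response specifications used in the analysis (illustrative).

    t, y : equally indexed NumPy arrays of time stamps and plant output.
    band : settling tolerance as a fraction of the setpoint (assumed 2%).
    """
    err = setpoint - y
    oss = y[-1]                                    # steady-state output (last sample)
    ess = abs(setpoint - oss)                      # steady-state error
    mp = max(0.0, y.max() - setpoint)              # maximum overshoot (V)
    pct_mp = 100.0 * mp / setpoint                 # percent maximum overshoot
    outside = np.abs(err) > band * setpoint
    ts = t[outside][-1] if outside.any() else t[0]  # last time outside the band
    return dict(ts=ts, Mp=mp, pct_Mp=pct_mp, Oss=oss, Ess=ess,
                mean_error=float(np.abs(err).mean()))
```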

The software handles data acquisition from the Process Control Simulator module, collects the input-output data for online identification, and implements the hybrid (adaptive fuzzy logic plus PI) control.

4 Results and Discussion

4.1 Plant Identification Results

The system was identified online by applying a 3.8 V step input to the plant. A fuzzy inverse model of the plant was identified to obtain the fuzzy system parameters, using the gradient descent learning algorithm under the following conditions:

- Number of rules (M) = 30; learning rate (α) = 0.5

The fuzzy system parameters obtained from the identification are listed in Table 1 [Ridwan, 2007].

These identified parameters are later used to initialise the parameters of the online adaptive fuzzy logic controller. The data show that $x_1^{l}$ changes considerably from rule 1 to rule 15, changes little from rule 16 to rule 29, and changes again at rule 30. The value of $x_2^{l}$ stops changing after rule 22. The value of $y^{l}$ is constant from rule 15 to rule 27 and changes slightly from rule 28 to rule 30. The value of $\sigma_2^{l}$ is essentially constant, whereas $\sigma_1^{l}$ varies from rule 1 to rule 29 and changes drastically at rule 30 [Ridwan, 2007].

4.2 System Response without the PI Controller

The output responses after implementing the adaptive fuzzy logic controller, for a 3.8 V step input and varying control gain, are shown in Table 2 and Table 3 [Ridwan, 2007].

 


Table 1 Fuzzy system parameters $x_1^{l}$, $x_2^{l}$, $y^{l}$, $\sigma_1^{l}$, $\sigma_2^{l}$ obtained from identification

Rule | x1^l (V) | x2^l (V) | y^l (V) | σ1^l (V) | σ2^l (V)
1 | 0.6268473761 | 3.0500610500 | 3.4168127115 | 3.6446857694 | 0.0500000000
2 | 0.5881927408 | 3.0500610500 | 3.1784460826 | 3.6446857694 | 0.0500000000
3 | 0.8579908202 | 3.0500610500 | 2.4620093940 | 3.8796155001 | 0.0500000000
4 | 1.0618599637 | 3.2063492063 | 1.5919073582 | 3.8986814149 | 0.0500000000
5 | 1.6083521280 | 3.2063492063 | 1.0031400384 | 4.3459853749 | 0.0500000000
6 | 1.7946944795 | 3.3626373626 | 0.6514397262 | 4.3001862782 | 0.0500000000
7 | 2.4477608092 | 3.3626373626 | 0.3047860891 | 4.5940793200 | 0.0500000000
8 | 2.7516889183 | 3.3626373626 | 0.0831116144 | 4.3474977849 | 0.0500000000
9 | 3.0981920489 | 3.3626373626 | -0.041106747 | 4.4427051556 | 0.0500000000
10 | 3.1201334242 | 3.5189255189 | -0.186007371 | 4.4614273753 | 0.0500000000
11 | 3.2958200965 | 3.5189255189 | -0.310024055 | 4.4977758296 | 0.0500000000
12 | 3.4360228654 | 3.5189255189 | -0.378948859 | 4.3231318135 | 0.0500000000
13 | 3.5719326341 | 3.5189255189 | -0.411089448 | 4.3428175513 | 0.0500000000
14 | 3.6807137855 | 3.5189255189 | -0.437764554 | 4.1725253782 | 0.0500000000
15 | 3.7478390179 | 3.5189255189 | 3.7999999999 | 4.1790979614 | 0.0500000000
16 | 3.7486254407 | 3.6752136752 | 3.7999999999 | 4.2289555727 | 0.0500000000
17 | 3.7821402191 | 3.6752139470 | 3.7999999999 | 4.2326570255 | 0.0500008496
18 | 3.8216375798 | 3.6752139470 | 3.7999999999 | 4.0868048292 | 0.0500008496
19 | 3.8544592207 | 3.6752139470 | 3.7999999999 | 4.0881946683 | 0.0500008496
20 | 3.9177562854 | 3.6752139470 | 3.7999999999 | 4.0216559066 | 0.0500008496
21 | 3.9392872794 | 3.6799724815 | 3.7999999999 | 4.0221217483 | 0.0648749014
22 | 3.9355882148 | 3.8315018315 | 3.7999999999 | 4.1050773056 | 0.0500000000
23 | 3.9462880257 | 3.8315047017 | 3.7999999999 | 4.1053234543 | 0.0500089708
24 | 3.9700170073 | 3.8315047017 | 3.7999999993 | 4.0513128230 | 0.0500089708
25 | 3.9740670116 | 3.8315047017 | 3.7999999993 | 4.0513559643 | 0.0500089708
26 | 3.9740670116 | 3.8315047017 | 3.7999999993 | 4.0513559643 | 0.0500089708
27 | 3.9775356182 | 3.8315047017 | 3.7999888812 | 4.0513936251 | 0.0500089708
28 | 3.9980878646 | 3.8315047017 | 3.6053363235 | 4.0028210263 | 0.0500089708
29 | 2.8937728937 | 3.8315047017 | 3.6053363235 | 3.8315018315 | 0.0500089708
30 | 3.0500610500 | 3.8315047017 | 3.6053363235 | 0.0500000000 | 0.0500089708


Table 2 Control specifications for varying K, no load

Gain (K) | ts (s) | Mp (V) | Oss (V) | Ess (V) | Mean error (V)
1.0 | 7.4699 | 0.4688 | 3.8315 | 0.0315 | 0.4364
1.1 | 7.1899 | 0.7814 | 3.8315 | 0.0315 | 0.4103
1.2 | 6.5899 | 0.9377 | 3.8315 | 0.0315 | 0.3873
1.3 | 9.9399 | 1.0940 | 3.8315 | 0.0315 | 0.3707

 


Table 3 Control specifications for varying K, under load

Gain (K) | ts1 (s), load 0.5 V | ts2 (s), load 0.5 V | ts3 (s), load 1 V | ts4 (s), load 1 V
1.0 | 6.0899 | 4.8899 | 6.2599 | 7.4699
1.1 | 4.6199 | 4.2299 | 5.2799 | 6.9199
1.2 | 4.0699 | 3.3499 | 4.4499 | 4.8899
1.3 | 3.2399 | 2.8599 | 3.7899 | 4.7799


These specifications show that the mean error is largest for K = 1.0, because the settling time (ts) is slow both during the transient and under load. The mean error is smallest at K = 1.3, which has the fastest ts under load. However, this large K also makes the transient ts and the maximum overshoot larger.

The data show that the larger K is, the larger Mp becomes and the faster ts under load becomes [Ridwan, 2007]. Mp and ts were also found to be still too large, so these are the specifications the hybrid control is intended to improve, by adding a PI controller to the system.

4.3 System Response with the Hybrid Controller

The hybrid control in this study combines the adaptive fuzzy controller with a Proportional Integral (PI) controller. The PI controller is added to reduce the maximum overshoot (Mp) and speed up the transient, especially under load. In this experiment a 3.8 V setpoint was applied with varying control gain (K) and integrator constant (Ki). The load voltage was added once the system had reached steady state.

Table 4 Control specifications for Ki = 0.2 and varying K

K | ts (s) | Mp (V) | Ess (V) | ts1 (s) | ts2 (s) | ts3 (s) | ts4 (s) | Mean error (V)
1.0 | 2.69 | 0.1563 | 0.0315 | 6.10 | 4.51 | 8.79 | 7.14 | 0.3047
1.1 | 6.22 | 0.3126 | 0.0315 | 4.99 | 3.95 | 6.48 | 6.98 | 0.2929
1.2 | 5.43 | 0.4688 | 0.0315 | 3.96 | 3.14 | 4.99 | 4.61 | 0.2885
1.3 | 5.44 | 0.7814 | 0.0315 | 3.24 | 2.69 | 5.06 | 3.68 | 0.3006

(ts1 through ts4 are the settling times under load, as in Table 3.)


The data show that the larger K is, the smaller ts under load becomes, and hence the smaller the mean error. The steady-state error (Ess) is the same for all K. Mp grows in proportion to K. The best response is obtained at K = 1.2, with the smallest mean error.

 


Table 5 Control specifications for Ki = 0.25 and varying K

K | ts (s) | Mp (V) | Ess (V) | ts1 (s) | ts2 (s) | ts3 (s) | ts4 (s) | Mean error (V)
1.0 | 2.20 | 0.1563 | 0.0315 | 6.54 | 4.34 | 8.95 | 7.58 | 0.2876
1.1 | 4.23 | 0.3126 | 0.0315 | 5.16 | 3.68 | 7.31 | 6.99 | 0.2820
1.2 | 4.99 | 0.4688 | 0.0315 | 4.06 | 3.14 | 5.44 | 4.61 | 0.2925
1.3 | 5.50 | 0.4688 | 0.0315 | 3.35 | 2.85 | 4.22 | 3.68 | 0.2560


The data show that the larger K is, the larger the transient ts and Mp become. Under load, the larger K is, the smaller ts becomes. The mean error tends to decrease as K increases. The best response is obtained at K = 1.3, with the smallest mean error.

Table 6 Control specifications for Ki = 0.3 and varying K

K | ts (s) | Mp (V) | Ess (V) | ts1 (s) | ts2 (s) | ts3 (s) | ts4 (s) | Mean error (V)
1.0 | 2.14 | 0 | 0.0315 | 6.54 | 4.62 | 9.72 | 7.86 | 0.2524
1.1 | 3.62 | 0.1563 | 0.0315 | 5.16 | 3.68 | 7.69 | 6.37 | 0.2532
1.2 | 3.95 | 0.3126 | 0.0315 | 4.18 | 3.13 | 7.10 | 5.00 | 0.2245
1.3 | 9.34 | 0 | 0.0315 | 6.10 | 3.68 | 6.87 | 6.37 | 0.3641


The data show that the larger K is, the larger the transient ts becomes. Mp is relatively small; at K = 1 and K = 1.3 there is no overshoot at all. Under load, ts tends to decrease as K increases, and the mean error tends to decrease with increasing K. The best response is obtained at K = 1.2, with the smallest mean error.

Overall, the analysis of the output responses shows that increasing K makes the transient ts larger, Mp larger, and the ts under load smaller. Increasing Ki makes the transient ts larger and Mp smaller, while ts under load tends to increase.

Across all the data, the smallest mean error is obtained at Ki = 0.3 and K = 1.2.

5 Conclusions

1. Increasing K makes the transient settling time ts larger, the maximum overshoot Mp larger, and the settling time under load smaller.

2. Increasing Ki makes the transient ts larger and Mp smaller, while ts under load tends to increase.

3. Overall, the smallest mean error is obtained at Ki = 0.3 and K = 1.2.

 


References

1 Jang, J.S.R., Sun, C.T., Mizutani, E., 1997, Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence, New Jersey: Prentice Hall International, Inc.

2 Landau, I.D, 1990, System Identification and Control Design Using P.I.M + Software, New Jersey : Prentice Hall International, Inc.

3 Ogata, Katsuhiko, 1997, Teknik Kontrol Automatik Jilid I, Jakarta : Erlangga.

4 Passino, Kevin M., Yurkovich, S., 1997, Fuzzy Control, New York: Addison Wesley Longman.

5 Ridwan, Wrastawa, 2007, Penerapan Adaptive Fuzzy Logic Controller Pada Modul Process Control Simulator PCS 327 MK2, Gorontalo : Penelitian Dosen Muda.

6 Wang, Li-Xin, 1997, A Course in Fuzzy Systems and Control, New Jersey : Prentice-Hall International, Inc.

7 Yan, J., Ryan, M., Power, J., 1994, Using Fuzzy Logic Toward Intelligent Systems, Cambridge: Prentice-Hall International (UK) Limited.

8 ....., 1998, Model PCL-712 : Multi-Lab (12 bit ) A/D+D/A+DIO, London : Advantech Co. Ltd.

9 ....., 1998, Process Control Simulator PCS327 Book 1, London : Feedback Instruments Ltd.

 

CALL FOR PAPERS

 

 

THE 35th IEEE

PHOTOVOLTAIC

SPECIALISTS CONFERENCE

June 20-25, 2010

Hawaii Convention Center

Honolulu, Hawaii

PLEASE CHECK OUR WEB SITE AT

www.ieee-pvsc.org

 

 

 

Sponsored by the IEEE Electron Devices Society

 

Invitation from the Chair

On behalf of the Organizing, Cherry, and International Committees, it is my great pleasure to invite you to join the 35th IEEE Photovoltaic Specialists Conference, June 20-25, 2010, at the Hawaii Convention Center in Honolulu, Hawaii. We continue our role as the premier technical conference covering all aspects of PV technology from basic material science to installed system performance. We also continue our Industrial Exhibition that brings our PV Specialists together with the PV industry. Set with the backdrop of tremendous progress in Hawaiian renewable energy initiatives and the beautiful location of Waikiki, this will be THE PV conference in 2010.

Highlights include:

Strong Technical Program: In addition to our traditional eight topical areas, we have created a separate Area for Organic Photovoltaics, and we created a new area titled: “Advances in Characterization of Photovoltaics”.

Full Day of Tutorials: We will have nine tutorials this time, consisting of half-day lectures taught by experts in the field. The topics will range from the basic physics of solar cell operation to details about the latest trends in the industry that will be valuable to newcomers to PV as well as seasoned veterans.

Industrial Exhibition: Within the gorgeous Hawaii Convention Center, our exhibit space will be designed to bring together the commercial sector and the Photovoltaic Technologist. Our focus will be on measurement and characterization tools, and we will be highlighting space PV applications.

Student Participation: Our technical community is only as vibrant as our student body, so we have created incentives to encourage students to attend and to be active participants in the conference, including reduced registration and tutorial fees, special student hotel rates, and best student presentation awards in each technical area.

Hotel Accommodations: One of our assets is the Photovoltaic Specialists community itself, and our conference provides the meeting place for this community. To maintain our continuity as a community, we are using the Hilton Hawaiian Village as our Base Hotel. Within easy walking distance to the Convention Center, you cannot find a more beautiful location right on Waikiki Beach. We have a low conference rate of $179/night (the Federal Government rate) and an incentive package for our attendees.

Auxiliary Program: We will hold a full agenda throughout the week addressing broader issues within the PV Community. The “PV Velocity Forum” will explore methods to accelerate the transition of new technology from research to market. The “Women in Photovoltaics Symposium” will address the role of women in the historically male-dominated PV World. PV grid integration symposia will benefit from our location since Hawaii has been a leader in PV grid integration and members of the Hawaiian Government and Energy Company will join us.

Social Program: Continuing our theme of enhancing our PV Specialists community, our goal is to create relationships on a social as well as professional level amongst our attendees, families, and companions. From the Cherry Award Reception to the Conference Banquet to the daily sightseeing tours, the social program is going to be a blast. Arrive early and stay late!

We urge you to register for the meeting, as well as to make your hotel reservation, well ahead of the deadline. The increased interest in PV is likely to lead to greatly increased attendance, and the hotel will maintain our group rate and room block for a limited time only. Please join us in Honolulu and help to make the 35th Photovoltaic Specialists Conference a memorable event.

Robert Walters

General Chair

 

Call For Papers

35th IEEE Photovoltaic Specialists Conference

On behalf of the Technical Program Committee, I invite you to submit an abstract on your latest results in photovoltaics research, development, and applications to the 35th IEEE PVSC. We are in the midst of a crucial time for energy management on our planet. Environmental, climate change, and energy security concerns are among the most pressing issues we face today. Clearly, photovoltaics can be part of the solution. Public awareness is growing that photovoltaics can shape energy use patterns for future generations – much as the automobile transformed transportation within a time span of 50 years – as evidenced by the exponential rise in photovoltaic production over the last decade. Science and technology developments in PV over the next several years, and their influence on the economics of PV installations, are likely to establish which energy technologies become dominant for decades to come. The chance to share and discuss these crucial PV developments in a timely and influential forum is what the PVSC is all about. Please join us in continuing the PVSC's tradition as the premier international conference on the science and technology of photovoltaics.

Abstracts summarizing original research on all aspects of photovoltaics are encouraged. The technical sessions are organized into 10 major areas as outlined below. We have adopted a system of international chairs and co-chairs for each technical area, to further foster international participation and collaboration at the PVSC.

We have also started two new technical areas at the 35th PVSC. In recognition of the rapidly growing interest in organic photovoltaics and dye-sensitized solar cells, which had previously been part of Area 1, we have launched Area 6: Organic Photovoltaics. The other new area is Area 8: Advances in Characterization of Photovoltaics. Here the focus will be on methods of measurement and analysis themselves, rather than on a particular photovoltaic material system. By breaking these topics out separately in Area 8, it is hoped that researchers will have more exposure to characterization tools typically used for PV materials outside their area of specialization. This cross-fertilization will hopefully give researchers some 'new eyes' with which to look at their PV materials.

To have your paper considered for presentation at the 35th PVSC, please submit a 3-page evaluation abstract, and a short abstract no more than 300 words in length for display on the PVSC website, by the deadline below. Other than the 3-page limit, there are no format restrictions on the evaluation abstract, except that it be detailed enough to allow a competent technical review. The preferred way to submit your abstract is via the 35th PVSC website at www.ieee-pvsc.org . Click on "Submit Your Abstract/Manuscript Online" and please follow the instructions line-by-line to upload your abstract successfully. If you are unable to submit your abstract or manuscript electronically, please contact Brent Nelson as soon as possible for instructions:

Brent Nelson, National Renewable Energy Laboratory

1617 Cole Blvd., Golden, CO 80401

brent.nelson@nrel.gov

The deadline for electronic submission of the 3-page extended abstract and the short abstract of 300 words or less is February 15, 2010, 12:00 midnight Pacific Standard Time (UTC - 8 hours). Contributing authors will be notified of the acceptance status of their papers after March 22, 2010. Upon acceptance, we ask all authors to confirm that they will be able to present their work at the conference, and upload their manuscript by the due date of June 20, 2010 (before the conference) for publication in the conference proceedings. A small number of select papers from the IEEE PVSC are planned to be included in a special journal issue devoted to photovoltaics. Papers in the PVSC proceedings are searchable and accessible via the internet through the IEEE Xplore® system. To ensure IEEE Xplore®-compliant proceedings, please submit your manuscripts electronically through the website, if at all possible.

Please join us in making the 35th PVSC the place to be to present and learn about the latest advances in the science, engineering, and applications of photovoltaics!

Richard R. King

Program Chair, 35th PVSC

 

Technical Areas 

Area 1: Fundamentals and New Concepts for Future Technologies

Chair: Ryne Raffaelle, National Center for Photovoltaics, Golden, Colorado, USA

Co-Chair: N. (Ned) Ekins-Daukes, Imperial College, London, United Kingdom

Co-Chair: Yoshitaka Okada, The University of Tokyo, Japan

Subarea 1.1 Fundamental Conversion Mechanisms

Subarea 1.2 Quantum Dots, Nanowires, and Quantum Wells

Subarea 1.3 Nanostructures for Hybrid Solar Cells

Subarea 1.4 Novel Material Systems

Papers are sought that describe basic research in physical, chemical and optical phenomena, new materials and novel device concepts, which are essential to feed the innovation pipeline leading to future-generation PV technologies. General areas of interest include, but are not limited to, synthesis, characterization and modeling of: (1) non-conventional PV conversion processes based on quantum confinement and nanostructured concepts, intermediate-band solar cells, multiple charge generation, up/down converters, thermophotovoltaics, hot-carrier cells, and other concepts; (2) quantum dots, nanowires, and quantum wells, highly metamorphic materials, new materials systems; and (3) cross-cutting science and hybrid materials that include organic/inorganic materials and innovative devices such as luminescent concentrators.

Area 2: CIGS and CdTe Thin Film Solar Cells and Related Materials

Chair: Rommel Noufi, National Renewable Energy Laboratory, Golden, Colorado, USA

Co-Chairs: Tokio Nakada, Aoyama Gakuin University, Japan

Hans-Werner Schock, Helmholtz-Zentrum Berlin, Germany

Ayodhya N. Tiwari, EMPA, Swiss Federal Laboratory, Switzerland

Jim Sites, Colorado State University, USA

Subarea 2.1 Thin Film Deposition and Characterization of Absorber and Related Wide Band Gap and Novel Materials

Subarea 2.2 Transparent Conductors, Buffer Layers, and Back Contacts

Subarea 2.3 Device Properties and Modeling/Characterization

Subarea 2.4 Advanced Processes and Controls: Atmospheric and Vacuum

Subarea 2.5 Modules and Manufacturing: Process Controls, Performance, Interconnect, and Reliability

As the CdTe and CIGS technologies move from the lab to the factory, we encourage contributions addressing recent advances in manufacturing processes utilizing vacuum and/or atmospheric conditions, process controls and diagnostics, alternative buffers, TCOs, novel contacts, moisture barriers and other measures related to stability/reliability of the solar cell. To maintain a strong and broad science foundation for these two thin film technologies, we solicit contributions on the science and engineering of thin-film deposition, characterization of structural, optical and electrical properties, modeling, and the role of electrically active defects and impurities. Looking forward, we also solicit contributions exploring new materials, wide band gap absorbers, novel device structures, and tandem cells.

 

Area 3: III-V and Concentrator Technologies

Chair: Frank Dimroth, Fraunhofer ISE, Freiburg, Germany

Co-Chair: Sarah Kurtz, National Renewable Energy Laboratory, Golden, Colorado, USA

Co-Chair: Kenji Araki, Daido Steel, Japan

Subarea 3.1 III-V Epitaxy, Materials, Processing and Devices; III-V Concentrator Solar Cells

Subarea 3.2 High Concentration PV Modules, Optics and Receivers

Subarea 3.3 High Concentration PV Systems and Power Plants

Subarea 3.4 Low concentration PV - Si Concentrator Cells, Modules and Systems

The highest conversion efficiencies of >40 % are obtained with multijunction solar cells made of III-V compound semiconductors. Materials science is the basis for the continuous improvements in the understanding and further development of these complex solar cell structures. We therefore call for papers on the materials science and technology in this field. This may include (but not be limited to) work on theoretical device modeling, epitaxy, solar cell processing and characterization. III-V multijunction solar cells are the basis for the growing terrestrial market of high concentration photovoltaics. At the same time, lower concentration approaches using silicon solar cells are gaining attention. At this conference we are encouraging submission of papers in all fields related to the materials science and technology of Si and III-V concentrator solar cells, receivers and systems. Manufacturing aspects, product reliability and testing are important aspects to be discussed for both solar cells and concentrator systems. Papers on the development of new concentrators including optics for high- as well as low-concentration are welcome. Further topics may focus on: tracker development, thermal hybrid systems, annual power rating, industry standards, CPV market development, cost reduction or ecological impact. Contributions may range from exploratory research through applied research, technology development, and engineering improvements.

Area 4: Crystalline Silicon Technologies

Chair: Klaus Weber, Australian National Univ., Canberra, Australia

Co-Chair: Stefan Glunz, Fraunhofer ISE, Freiburg, Germany

Co-Chair: Stuart Bowden, Arizona State University, USA

Subarea 4.1 Feedstock and Crystallization

Subarea 4.2 Defect Passivation and Advanced Optics

Subarea 4.3 Device Fabrication

Subarea 4.4 Modeling, Metrology, and Characterization

Subarea 4.5 Manufacturing

The continuing drive for higher conversion efficiencies and lower costs of crystalline Si cells demands an increasingly sophisticated understanding of the materials and processes involved, in order to drive the development of new or improved manufacturing methods, materials and device structures. Papers reporting on all aspects of c-Si technology are welcomed, including but not limited to: feedstock materials and crystal growth; defect characterization and passivation; advanced optics for light trapping and reflection control; new cell designs; device modelling; advanced measurement techniques; and solutions for large scale manufacturing.

 

Area 5: Amorphous, Nano, and Film Si Technologies

Chair: Arno Smets, Eindhoven Univ. of Technology, The Netherlands

Co-Chair: Sumit Agarwal, Colorado School of Mines, USA

Co-Chair: Takuya Matsui, National Institute of Advanced Industrial Science and Technology, Japan

Subarea 5.1 Fundamental Properties of Thin Silicon Films

Subarea 5.2 Processing Issues for Thin Silicon Films and Devices

Subarea 5.3 Novel Concepts for Thin Silicon Solar Cell Devices

Subarea 5.4 Amorphous, Nano/Microcrystalline and Silicon Film Devices and Modules

Thin-film photovoltaics based on amorphous, nano/microcrystalline and polycrystalline silicon on non-Si substrates have matured through three decades of advances in the design and processing of high-quality materials, solar cells and modules. Detailed research studies and visionary papers addressing the entire spectrum of the subject are welcomed, including material characterization concerning microstructure, light-induced degradation, SiGe alloys, film oxidation; processing issues concerning large throughput, large area, high deposition rates, processing routes for polycrystalline silicon; novel concepts for thin silicon solar cells concerning films with new functionalities, light trapping using plasmonic films, texturing, multi-layers and intermediate reflective layers; and all topics related to amorphous/microcrystalline and silicon film solar cells and modules, such as multijunction structures, performance and long-term reliability.

Area 6: Organic Photovoltaics

Chair: David Ginley, National Renewable Energy Laboratory, Golden, Colorado, USA

Co-Chair: Jan Kroon, ECN, The Netherlands

Co-Chair: Gitti Frey, Technion, Israel

Subarea 6.1 Polymer and Small Molecule Based Organic Photovoltaics

Subarea 6.2 Stability, Processing, and Packaging for Organic Photovoltaics

Subarea 6.3 Tandem, QD Enhanced, and Advanced Concept Organic Solar Cells

Subarea 6.4 Hybrid and Dye Sensitized Solar Cells

Subarea 6.5 Mechanisms, Interfaces, and Models in Excitonic Solar Cells

Organic, hybrid inorganic/organic, and dye sensitized solar cells are rapidly advancing technologies that are beginning to demonstrate commercial viability. The flexibility of different donor/acceptor combinations, including both organic small molecule and polymer as well as nanostructured inorganic materials, stimulates a large diversity of approaches to the promise of more stable and efficient devices. Many of the devices are excitonic in nature, necessitating new device modeling, and all of them are dominated by interfaces between very heterogeneous materials with different structural, thermal and chemical properties. The symposium will focus on the examination of many of the key areas evolving in this diverse approach to solar energy. This includes papers in the broad spectrum of areas including: an exploration of the evolving devices and materials based on polymers, small molecules, and dyes; the potential enhancement of these devices with tandem or QD structures; the stability and packaging of organic based devices; and an examination of new models and data for the performance of excitonic and dye based devices and their complex interfaces.

 

Area 7: Space Technologies

Chair: Alex Howard, AFRL, Kirtland Air Force Base, Albuquerque, NM

Co-Chair: Mitsuru Imaizumi, JAXA, Japan

Co-Chair: Carsten Baur, ESA

Subarea 7.1 Space Materials and Devices

Subarea 7.2 Space Systems

Subarea 7.3 Flight Performance and Environmental Effects

Topics of interest are solar cells suited for space use, especially devices capable of high efficiency or high specific power, including solar array designs. The scope includes III-V, thin-film, and novel solar cells. Also of interest are papers concerning space reliability, space environmental effects, and protective materials for the space environment. We welcome papers concerning characterization and qualification of space solar cells and papers concerning flight experiments and missions.

Area 8: Advances in Characterization of Photovoltaics

Chair: Angus Rockett, Univ. of Illinois, Urbana-Champaign, USA

Co-Chairs: Gerald Siefer, Fraunhofer ISE, Freiburg, Germany

Manuel Romero, National Renewable Energy Laboratory, Golden, Colorado, USA

Ayodhya Tiwari, EMPA, Swiss Federal Laboratory, Switzerland

Yoshihiro Hishikawa, Advanced Industrial Science and Technology (AIST), Tsukuba, Japan

Thorsten Trupke, BT Imaging Pty Ltd, Surry Hills, Australia

Subarea 8.1 New Characterization Methods for PV: Optoelectronic, Physical, Chemical

Subarea 8.2 Methods for Characterization of Defects

Subarea 8.3 PV Cell and Module Measurement Techniques

Subarea 8.4 In-situ Characterization Methods

Subarea 8.5 Process Control and Modeling

Subarea 8.6 Methods for Reliability Testing and Standards

The focus of Area 8 is to present works primarily focused on methods of characterization of photovoltaic materials and devices, as distinct from the materials and devices characterized. Thus papers submitted to this area could range from new scanning probe methods to determine semiconductor properties to methods to calibrate an accelerated lifetime testing apparatus. In-situ characterization methods and process control methods are appropriate to Area 8 because they are about implementing a method in a given environment. Papers describing the performance or properties of specific materials and devices, if focused primarily on those materials and devices, should go to the areas concerned with the relevant technology. However, a paper describing the application of a technique to a material, focused primarily on demonstrating the capabilities of the technique, belongs in Area 8. Thus, a paper describing cathodoluminescence (CL) of CuInSe2 would belong in Area 2 if focused on the CIS, but in Area 8 if focused on how to conduct CL or the capabilities of a CL instrument. Exciting new work is being reported in this area, ranging from novel methods of photoemission to advanced imaging and characterization methods for individual Si wafers through full modules.

 

Area 9: PV Modules and Terrestrial Systems

Chair: Angèle Reinders, University of Twente, Enschede, The Netherlands

Co-Chair: Terry Jester, Hudson Clean Energy Partners, USA

Co-Chair: Pierre Verlinden, Solar Systems, Australia

Subarea 9.1 Markets and Customers

Subarea 9.2 PV Module Materials, Encapsulation and Manufacturing

Subarea 9.3 Inverters and other BOS Components

Subarea 9.4 Grid Connected Systems and Building Integration

Subarea 9.5 Stand Alone Applications

PV modules are a vital commodity in the market of PV systems. We encourage submissions in all subjects associated with PV module materials, manufacturing and the performance of PV modules. Also papers reporting on markets and costs, and regarding the energy yield of PV modules are encouraged. Power conditioning equipment affects the reliability and efficiency of PV systems. Therefore, contributions describing technical issues and standardization of inverters and Balance-of-Systems (BOS) components are encouraged. Papers about design engineering, monitoring and control of very large scale grid-connected PV installations are welcome, as well as papers about incentives for, and experiences with residential grid-connected systems and building-integrated PV systems. The growing need for autonomous electricity supply is advancing the development of stand-alone PV solutions. We welcome contributions describing sizing and simulation of system integrated PV systems in the context of functionality, regulations, costs and environmental aspects. In Area 9 contributions can range from applied research and technology development, to papers about design, engineering, markets and user studies.

Area 10: PV Velocity Forum: Accelerating the PV Economy

Chair: John Benner, National Renewable Energy Laboratory, Golden, Colorado

Co-Chair: B. J. Stanbery, HelioVolt Corp., Austin, Texas, USA

Subarea 10.1 PV Programs, Policies and Incentives

Subarea 10.2 PV Markets

Subarea 10.3 Sustainability and Environmental Issues

The PV Velocity Forum brings technologists, investors and policy-makers together to explore methods for driving more cost-effective emerging technologies through production and into the market. Speakers and panelists will engage with attendees to explore gating factors affecting the adoption of new PV technologies, such as research support, policy development, regulations, supply chain, environmental issues and market-based project management. The Forum will address strategies to sustain or accelerate the high growth rate and drive costs down faster.

GENERAL CHAIR

Robert J. Walters
U.S. Naval Research Laboratory
Code 6818, Bldg. 208, Rm. 135
4555 Overlook Ave., SW
Washington, DC 20375
Tel: 202-767-2533
Robert.Walters@nrl.navy.mil

PROGRAM CHAIR

Richard R. King
Spectrolab, Inc.
12500 Gladstone Ave.
Sylmar, CA 91342
Tel: (818) 838-7404
rking@spectrolab.com

 

WILLIAM R. CHERRY AWARD

This award is named in honor of William R. Cherry, a founder of the photovoltaic community. In the 1950's, he was instrumental in establishing solar cells as the ideal power source for space satellites and for recognizing, advocating, and nurturing the use of photovoltaic systems for terrestrial applications. The William R. Cherry award was instituted in 1980, shortly after his death. The purpose of the award is to recognize engineers and scientists who devote a part of their professional life to the advancement of the technology of photovoltaic energy conversion. The nominee must have made significant contributions to the science and/or technology of PV energy conversion, with dissemination by substantial publications and presentations. Professional society activities, promotional and/or organizational efforts and achievements are not considerations in the election for the award.

This award is presented at each IEEE Photovoltaic Specialists Conference. The recipient is selected by the William R Cherry Committee composed of past PVSC conference chairpersons and past recipients of the award. Those nominated for the award do not participate in the process.

To be eligible for the award, the nominee must currently be active in the science and technology of PV conversion. He/she must have been active in the field for an extended period, with expectation of continued activity. Short term activities in the field, and/or single outstanding contributions are not adequate to make a person eligible for the award.

To make a nomination, please submit:

1. The name of your nominee, and his/her current affiliation.

2. A summary (less than 100 words) of the nominee's contributions to the advancement of the PV field.

3. A citation (less than 40 words) listing the nominee's specific contributions to make them deserving of the award.

4. A list of the nominee's activities in the field.

5. Nominator's name, address, phone number and e-mail address.

Please send any nominations for the next William R. Cherry award (35th IEEE PVSC) to:

Dr. Antonio Luque, Instituto de Energia Solar

Universidad Politecnica de Madrid

E28040 Madrid, SPAIN

Tel: +34 91 544 1060, Fax: +34 91544 6341

E-mail: Luque@ies-def.upm.es

The deadline for Cherry Award nominations to be considered for the 35th IEEE PVSC is

December 31, 2009.

 

Previous recipients of the William R. Cherry Award:

Dr. Paul Rappaport 1980
Dr. Joseph L. Loferski 1981
Prof. Martin Wolf 1982
Dr. Henry W. Brandhorst 1984
Mr. Eugene L. Ralph 1985
Dr. Charles E. Backus 1987
Dr. David E. Carlson 1988
Dr. Martin A. Green 1990
Mr. Peter A. Iles 1991
Dr. Lawrence L. Kazmerski 1993
Prof. Yoshihiro Hamakawa 1994
Dr. Allen M. Barnett 1996
Dr. Adolf Goetzberger 1997
Dr. Richard J. Schwartz 1998
Dr. Christopher R. Wronski 2000
Dr. Richard M. Swanson 2002
Dr. Ajeet Rohatgi 2003
Dr. Timothy J. Coutts 2005
Dr. Antonio Luque 2006
Dr. Masafumi Yamaguchi 2008
Dr. Stuart Wenham 2009

 

A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification

Nigel Williams, Sebastian Zander, Grenville Armitage

Centre for Advanced Internet Architectures (CAIA)

Swinburne University of Technology

Melbourne, Australia

+61 3 9214 {4837, 4835, 8373}

{niwilliams,szander,garmitage}@swin.edu.au

 

ABSTRACT

The identification of network applications through observation of associated packet traffic flows is vital to the areas of network management and surveillance. Currently popular methods such as port number and payload-based identification exhibit a number of shortfalls. An alternative is to use machine learning (ML) techniques and identify network applications based on per-flow statistics, derived from payload-independent features such as packet length and inter-arrival time distributions. The performance impact of feature set reduction, using Consistency-based and Correlation-based feature selection, is demonstrated on Naïve Bayes, C4.5, Bayesian Network and Naïve Bayes Tree algorithms. We then show that it is useful to differentiate algorithms based on computational performance rather than classification accuracy alone, as although classification accuracy between the algorithms is similar, computational performance can differ significantly.

Categories and Subject Descriptors

C.2.3 [Computer-Communication Networks]: Network Operations - Network monitoring; C.4 [Performance of Systems]: Measurement Techniques

General Terms: Algorithms, Measurement

Keywords: Traffic Classification, Machine Learning

1. INTRODUCTION

There is a growing need for accurate and timely identification of networked applications based on direct observation of associated traffic flows. Also referred to as ‘classification’, application identification is used for trend analyses (estimating capacity demand trends for network planning), adaptive network-based Quality of Service (QoS) marking of traffic, dynamic access control (adaptive firewalls that detect forbidden applications or attacks) or lawful interception.

Classification based on well-known TCP or UDP ports is becoming increasingly less effective – growing numbers of networked applications are port-agile (allocating dynamic ports as needed), end users are deliberately using non-standard ports to hide their traffic, and use of network address port translation (NAPT) is widespread (for example a large amount of peer-to-peer file sharing traffic is using non-default ports [1]).

Payload-based classification relies on some knowledge about the payload formats for every application of interest: protocol decoding requires knowing and decoding the payload format, while signature matching relies on knowledge of at least some characteristic patterns in the payload. This approach is limited by the fact that classification rules must be updated whenever an application implements even a trivial protocol change, and privacy laws and encryption can effectively make the payload inaccessible.

Machine learning (ML) [2] techniques provide a promising alternative in classifying flows based on application protocol (payload) independent statistical features such as packet length and inter-arrival times. Each traffic flow is characterised by the same set of features but with different feature values. A ML classifier is built by training on a representative set of flow instances where the network applications are known. The built classifier can be used to determine the class of unknown flows.

Much of the existing research focuses on the achievable accuracy (classification accuracy) of different machine learning algorithms. The studies have shown that a number of different algorithms are able to achieve high classification accuracy. The effect of using different sets of statistical features on the same dataset has seen little investigation. Additionally, as different (in some cases private) network traces have been used with different features, direct comparisons between studies are difficult.

There have been no comparisons of the relative speed of classification (computational performance) for different algorithms when classifying IP traffic flows. However, within a practical IP traffic classification system, considerations as to computational performance and the type and number of statistics calculated are vitally important.

In this paper we attempt to provide some insight into these aspects of ML traffic classification. We define 22 practical flow features for use within IP traffic classification, and further reduce the number of features using Correlation-based and Consistency-based feature reduction algorithms. We confirm that a similar level of classification accuracy can be obtained when using several different algorithms with the same set of features and training/testing data. We then differentiate the algorithms on the basis of computational performance. Our key findings are:

Feature reduction greatly reduces the number of features needed to identify traffic flows and hence greatly improves computational performance

While feature reduction greatly improves performance it does not severely reduce the classification accuracy.

Given the same features and flow trace, we find that different ML algorithms (Bayes Net, Naïve Bayes Tree and C4.5) provide very similar classification accuracy.

 


 

However, the different algorithms show significant differences in their computational performance (build time and classification speed).

The paper is structured as follows. Section 2 summarises key machine learning concepts. Section 3 discusses related work, while Section 4 outlines our approach, including details on algorithms, features and datasets. Section 5 presents our main findings and in Section 6 we conclude and discuss future work.

2. MACHINE LEARNING

2.1 Machine Learning Concepts

We use machine learning algorithms to map instances of network traffic flows into different network traffic classes. Each flow is described by a set of statistical features and associated feature values. A feature is a descriptive statistic that can be calculated from one or more packets – such as mean packet length or the standard deviation of inter-arrival times. Each traffic flow is characterised by the same set of features, though each will exhibit different feature values depending on the network traffic class to which it belongs.

ML algorithms that have been used for IP traffic classification generally fall into the categories of being supervised or unsupervised. Unsupervised (or clustering) algorithms group traffic flows into different clusters according to similarities in the feature values. These clusters are not pre-defined and the algorithm itself determines their number and statistical nature. For supervised algorithms the class of each traffic flow must be known before learning. A classification model is built using a training set of example instances that represent each class. The model is then able to predict class membership for new instances by examining the feature values of unknown flows.

2.2 Feature Reduction

As previously stated, features are any statistics that can be calculated from the information at hand (in our case packets within a flow). Standard deviation of packet length, Fourier transform of packet inter-arrival times and the initial TCP window size are all valid features. As network flows can be bi-directional, features can also be calculated for both directions of the flow.

In practical IP classification tasks we need to decide which features are most useful given a set of working constraints. For instance calculating Fourier transform statistics for thousands of simultaneous flows may not be feasible. In addition, the representative quality of a feature set greatly influences the effectiveness of ML algorithms. Training a classifier using the maximum number of features obtainable is not always the best option, as irrelevant or redundant features can negatively influence algorithm performance.
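As a concrete illustration, a few of the cheaper per-direction flow features mentioned above can be computed as follows. This is an illustrative subset only; the paper's full set of 22 features is defined in the paper itself, and the function and key names here are ours.

```python
import numpy as np

def flow_features(pkt_lengths, arrival_times):
    """Payload-independent statistics for one direction of a flow."""
    lengths = np.asarray(pkt_lengths, dtype=float)
    iats = np.diff(np.sort(np.asarray(arrival_times)))  # inter-arrival times
    return {
        "pkt_count": int(lengths.size),
        "mean_len": lengths.mean(),
        "std_len": lengths.std(),
        "min_len": lengths.min(),
        "max_len": lengths.max(),
        "mean_iat": iats.mean() if iats.size else 0.0,
        "std_iat": iats.std() if iats.size else 0.0,
    }
```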

The process of carefully selecting the number and type of features used to train the ML algorithm can be automated through the use of feature selection algorithms. Feature selection algorithms are broadly categorised into the filter or wrapper model. Filter model algorithms rely on a certain metric to rate and select subsets of features. The wrapper method evaluates the performance of different features using specific ML algorithms, hence produces feature subsets ‘tailored’ to the algorithm used.

2.3 Evaluation Techniques

Central to evaluating the performance of supervised learning algorithms is the notion of training and testing datasets. The training set contains examples of network flows from different classes (network applications) and is used to build the classification model. The testing set represents the unknown network traffic that we wish to classify. The flows in both the training and testing sets are labelled with the appropriate class a-priori. As we know the class of each flow within the datasets we are able to evaluate the performance of the classifier by comparing the predicted class against the known class.

To test and evaluate the algorithms we use k-fold cross validation. In this process the data set is divided into k subsets. Each time, one of the k subsets is used as the test set and the other k-1 subsets form the training set. Performance statistics are calculated across all k trials. This provides a good indication of how well the classifier will perform on unseen data. We use k=10 and compute three standard metrics:

Accuracy: the percentage of correctly classified instances over the total number of instances.

Precision: the number of class members classified correctly over the total number of instances classified as class members.

Recall (or true positive rate): the number of class members classified correctly over the total number of class members.

In this paper we refer to the combination of accuracy, precision and recall using the term classification accuracy.

We use the term computational performance to describe two additional metrics: build time and classification speed. Build time refers to the time (in seconds) required to train a classifier on a given dataset. Classification speed describes the number of classifications that can be performed each second.
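The sketch below shows this evaluation loop end to end: 10-fold cross-validation reporting the three classification-accuracy metrics plus the two computational metrics. It uses scikit-learn rather than the Weka suite the authors used, and all function and key names are ours.

```python
import time
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

def evaluate(clf, X, y, k=10):
    """k-fold cross-validation reporting classification accuracy
    (accuracy, precision, recall) and computational performance
    (build time in seconds, classifications per second)."""
    accs, precs, recs, build, speed = [], [], [], [], []
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        t0 = time.perf_counter()
        clf.fit(X[train_idx], y[train_idx])            # build time
        build.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        pred = clf.predict(X[test_idx])                # classification speed
        speed.append(len(test_idx) / (time.perf_counter() - t0))
        accs.append(accuracy_score(y[test_idx], pred))
        precs.append(precision_score(y[test_idx], pred, average="macro"))
        recs.append(recall_score(y[test_idx], pred, average="macro"))
    names = ["accuracy", "precision", "recall", "build_s", "flows_per_s"]
    return {n: float(np.mean(v)) for n, v in
            zip(names, [accs, precs, recs, build, speed])}

# e.g. evaluate(GaussianNB(), X, y) as a simple Naive Bayes baseline
```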

2.4 Sampling Flow Data

System memory usage increases with the number of instances in the training/testing set [3], as does CPU usage. As the public trace files used in this study contain millions of network flows, we perform flow sampling to limit the number of flows and therefore the memory and CPU time required for training and testing (see Section 4.3).

We sample an equal number of traffic flows for each of the network application classes. Thus class prior probabilities are equally weighted. Although this may negatively impact algorithms that rely on prior class probabilities, it prevents algorithms from optimising towards a numerically superior class when training (leading to overly optimistic results). Furthermore, this allows us to evaluate the accuracy of ML algorithms based on feature characteristics without localising to particular trace-dependent traffic mixes (different locations can be biased towards different traffic classes).
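A minimal sketch of this equal-per-class sampling, assuming the flows sit in a NumPy array with one class label per flow (names and array layout are our assumptions):

```python
import numpy as np

def sample_per_class(flows, labels, n_per_class=1000, seed=0):
    """Draw an equal number of flows from each class, so class prior
    probabilities in the resulting dataset are uniform."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=n_per_class, replace=False)
        for c in np.unique(labels)
    ])
    return flows[idx], labels[idx]
```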

3. RELATED WORK

The Expectation Maximization (EM) algorithm was used by McGregor et al. [4] to cluster flows described by features such as packet length, inter-arrival time and flow duration. Classification of traffic into generic groups (such as bulk-transfer, for instance) was found to be achievable.

Dunnigan and Ostrouchov [5] use principal component analysis (PCA) for the purpose of Intrusion Detection. They find that network flows show consistent statistical patterns and can be detected when running on default and non-default ports.

We have proposed an approach for identifying different network applications based on greedy forward feature search and EM in [6]. We show that a variety of applications can be separated into an arbitrary number of clusters.

 


 

Roughan et al. [7] use nearest neighbour (NN) and linear discriminant analysis (LDA) to map different applications to different QoS classes (such as interactive and transactional). They demonstrated that supervised ML algorithms are also able to separate traffic into classes, with encouraging accuracy.

Moore and Zuev [3] used a supervised Naive Bayes classifier and 248 flow features to differentiate between different application types. Among these were packet length and inter-arrival times, in addition to numerous TCP header derived features. Correlation-based feature selection was used to identify ‘stronger’ features, and showed that only a small subset of fewer than 20 features is required for accurate classification.

Karagiannis et al. [8] have developed a method that characterises host behaviour on different levels to classify traffic into different application types.

Recently Bernaille et al. [9] used a Simple K-Means clustering algorithm to perform classification using only the first five packets of the flow.

Lim et al. [10] conducted an extensive survey of 33 algorithms across 32 diverse datasets. They find that algorithms show similar classification accuracy but quite different training performance (for a given dataset and complementary features). They recommend that users select algorithms based on criteria such as model interpretability or training time. Classification speed of the algorithms was not compared.

4. EXPERIMENTAL APPROACH

As the main focus of this study is to demonstrate the benefit of using computational performance as a metric when choosing an ML algorithm to implement, we use a single dataset and fixed algorithm configurations, varying only the feature set used in training.

In the following sections we detail the machine learning and feature selection algorithms used in the paper. The IP traffic dataset, traffic classes, flow and feature definitions are also explained.

4.1 Machine Learning Algorithms

We use the following supervised algorithms that have been implemented in the Weka [20] ML suite:

Bayesian Network

C4.5 Decision Tree

Naïve Bayes

Naïve Bayes Tree

The algorithms used in this study are simple to implement and have either few or no parameters to be tuned. They also produce classification models that can be more easily interpreted. Thus algorithms such as Support Vector Machines or Neural Networks were not included.

The algorithms used in this investigation are briefly described in the following paragraphs, with extended descriptions in Appendix A.

Naive-Bayes (NBD, NBK) is based on the Bayesian theorem [11]. This classification technique analyses the relationship between each attribute and the class for each instance to derive a conditional probability for the relationships between the attribute values and the class. Naïve Bayesian classifiers must estimate the probabilities of a feature having a certain feature value. Continuous features can have a large (possibly infinite) number of values and the probability cannot be estimated from the frequency distribution. This can be addressed by modelling features with a continuous probability distribution or by using discretisation. We evaluate Naive Bayes using both discretisation (NBD) and kernel density estimation (NBK). Discretisation transforms the continuous features into discrete features, and a distribution model is not required. Kernel density estimation models features using multiple (Gaussian) distributions, and is generally more effective than using a single (Gaussian) distribution.

C4.5 Decision Tree (C4.5) creates a model based on a tree structure [12]. Nodes in the tree represent features, with branches representing possible values connecting features. A leaf representing the class terminates a series of nodes and branches. Determining the class of an instance is a matter of tracing the path of nodes and branches to the terminating leaf.

Bayesian Network (BayesNet) is structured as a combination of a directed acyclic graph of nodes and links, and a set of conditional probability tables [13]. Nodes represent features or classes, while links between nodes represent the relationship between them. Conditional probability tables determine the strength of the links. There is one probability table for each node (feature) that defines the probability distribution for the node given its parent nodes. If a node has no parents the probability distribution is unconditional. If a node has one or more parents the probability distribution is a conditional distribution, where the probability of each feature value depends on the values of the parents.

Naïve Bayes Tree (NBTree) is a hybrid of a decision tree classifier and a Naïve Bayes classifier [14]. Designed to allow accuracy to scale up with increasingly large training datasets, the NBTree model is a decision tree of nodes and branches with Naïve Bayes classifiers on the leaf nodes.

4.2 Feature Reduction Algorithms

We use two different algorithms to create reduced feature sets: Correlation-based Feature Selection (CFS) and Consistency-based Feature selection (CON). These algorithms evaluate different combinations of features to identify an optimal subset. The feature subsets to be evaluated are generated using different subset search techniques. We use Best First and Greedy search methods in the forward and backward directions, explained below.

Greedy search considers changes local to the current subset through the addition or removal of features. For a given ‘parent’ set, a greedy search examines all possible ‘child’ subsets through either the addition or removal of features. The child subset that shows the highest goodness measure then replaces the parent subset, and the process is repeated. The process terminates when no more improvement can be made.

Best First search is similar to greedy search in that it creates new subsets based on the addition or removal of features to the current subset. However, it has the ability to backtrack along the subset selection path to explore different possibilities when the current path no longer shows improvement. To prevent the search from backtracking through all possibilities in the feature space, a limit is placed on the number of non-improving subsets that are considered. In our evaluation we chose a limit of five.
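A sketch of the greedy forward variant is shown below (our own skeleton around an arbitrary goodness measure, standing in for CFS or consistency evaluation); best-first search extends this by keeping a queue of explored subsets and tolerating up to five non-improving expansions before stopping:

    def greedy_forward(features, score):
        # Grow the subset one feature at a time; stop when no child improves.
        current, best = set(), score(set())
        while True:
            candidates = [(score(current | {f}), f) for f in features - current]
            if not candidates:
                return current
            gain, f = max(candidates)
            if gain <= best:
                return current
            best, current = gain, current | {f}

    # Invented toy goodness measure: reward 'useful' features, penalise size.
    features = {"duration", "protocol", "meanfpktl", "minbiat"}
    useful = {"duration", "meanfpktl"}
    score = lambda subset: len(subset & useful) - 0.1 * len(subset)
    print(greedy_forward(features, score))  # {'duration', 'meanfpktl'}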

The following brief descriptions of the feature selection algorithms are supplemented by additional details in Appendix B.

Consistency-based feature subset search [15] evaluates subsets of features simultaneously and selects the optimal subset. The optimal subset is the smallest subset of features that can identify instances of a class as consistently as the complete feature set.

Correlation-based feature subset search [16] uses an evaluation heuristic that examines the usefulness of individual features along with the level of inter-correlation among the features. High scores are assigned to subsets containing attributes that are highly correlated with the class and have low inter-correlation with each other.

To maintain a consistent set of features when testing each of the algorithms, wrapper selection was not used (as it creates algorithm-specific optimised feature sets). It is recognised that wrapper-generated subsets provide the upper bound of accuracy for each algorithm, but do not allow direct comparison of the algorithm performance (as features are different).

4.3 Data Traces and Traffic Classes

Packet data is taken from three publicly available NLANR network traces [17], which were captured in different years and at different locations. We used four 24-hour periods of these traces (auckland-vi-20010611, auckland-vi-20010612, leipzig-ii-20030221, nzix-ii-20000706). As mentioned in Section 2.4 we use stratified sampling to obtain flow data for our dataset. 1,000 flows were randomly and independently sampled for each class and each trace. The traces were then aggregated into a single dataset containing 4,000 flows per application class. This is referred to as the 'combined' dataset. 10-fold cross-validation is used to create testing and training sets (see Section 2.3).
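The sampling step itself is simple arithmetic; the sketch below reproduces it with stand-in data structures (the trace names are abbreviated and the flow records are invented placeholders):

    import random

    random.seed(1)
    # Hypothetical stand-in data: 4 traces x 6 classes, each with >1,000 flows.
    classes = ["FTP-Data", "Telnet", "SMTP", "DNS", "HTTP", "Half-Life"]
    flows = {t: {c: [f"{t}-{c}-{i}" for i in range(5000)] for c in classes}
             for t in ["auck-0611", "auck-0612", "leipzig", "nzix"]}

    # Draw 1,000 flows per class per trace and pool into the combined set.
    combined = [rec
                for per_class in flows.values()
                for records in per_class.values()
                for rec in random.sample(records, 1000)]
    print(len(combined))  # 24,000 instances, 4,000 per class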

We chose a number of prominent applications and defined the flows based on the well-known ports: FTP-Data (port 20), Telnet (port 23), SMTP (port 25), DNS (port 53) and HTTP (port 80). We also include the multiplayer game Half-Life (port 27015). The chosen traffic classes account for a large proportion (up to 75%) of the traffic in each of the traces. The choice of six application classes creates a total of 24,000 instances in the combined dataset.

A drawback of using anonymised trace files is the lack of application layer data, making verification of the true application impossible. As we use default ports to obtain flow samples, we can expect that the majority of flows are of the intended application. We accept, however, that a percentage of flows on these ports will belong to different applications. Despite this, the number of incorrectly labelled flows is expected to be small, as port-agile applications are more often found on arbitrary ports than on the default ports of other applications (as found for peer-to-peer traffic in [1]). Therefore the error introduced into the training and testing data is likely to be small.

4.4 Flow and Feature Definitions

We use NetMate [18] to process packet traces, classify packets to flows and compute feature values. Flows are defined by the 5-tuple of source IP address and port, destination IP address and port, and protocol.

Flows are bidirectional and the first packet seen by the classifier determines the forward direction. Flows are of limited duration. UDP flows are terminated by a flow timeout. TCP flows are terminated upon proper connection teardown (TCP state machine) or after a timeout (whichever occurs first). We use a 600 second flow timeout, the default timeout value of NeTraMet (the implementation of the IETF Realtime Traffic Flow Measurement working group's architecture) [19]. We consider only UDP and TCP flows that have at least one packet in each direction and transport at least one byte of payload. This excludes flows without payload (e.g. failed TCP connection attempts) and 'unsuccessful' flows (e.g. requests without responses).
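A sketch of this flow definition is given below, under the assumption that a canonical ordering of the two endpoints is used so that packets from both directions hash to the same flow record (the paper itself assigns the forward direction from the first packet seen, tracked separately from the key):

    FLOW_TIMEOUT = 600.0  # seconds, the NeTraMet default used above

    def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
        # Canonical bidirectional 5-tuple: order the endpoints so both
        # directions of a flow produce the same key.
        a, b = (src_ip, src_port), (dst_ip, dst_port)
        return (proto,) + (a + b if a <= b else b + a)

    def expired(last_seen, now):
        return now - last_seen > FLOW_TIMEOUT

    key = flow_key("10.0.0.1", 3345, "192.0.2.7", 80, "tcp")
    assert key == flow_key("192.0.2.7", 80, "10.0.0.1", 3345, "tcp")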

When defining the flow features, the ‘kitchen-sink’ method of using as many features as possible was eschewed in favour of an economical, constraint-based approach. The main limitation in choosing features was that calculation should be realistically possible within a resource constrained IP network device.

Thus potential features needed to fit the following criteria:

Packet payload independent

Transport layer independent

Context limited to a single flow (i.e. no features spanning multiple flows)

Simple to compute

The following features were found to match the above criteria and became the base feature set for our experiments:

Protocol

Flow duration

Flow volume in bytes and packets

Packet length (minimum, mean, maximum and standard deviation)

Inter-arrival time between packets (minimum, mean, maximum and standard deviation).

Packet lengths are based on the IP length excluding link layer overhead. Inter-arrival times have at least microsecond precision and accuracy (traces were captured using DAG cards [17]). As the traces contained both directions of the flows, features were calculated in both directions (except protocol and flow duration). This produces a total of 22 flow features, which we refer to as the ‘full feature set’. A list of these features and their abbreviations can be found in Appendix C.
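For one direction of a flow, the per-direction feature block can be computed in a few lines; the following sketch assumes a packet list of (timestamp, IP length) pairs and uses population standard deviations:

    import statistics

    def direction_features(packets):
        # packets: list of (timestamp_seconds, ip_length_bytes) tuples
        lengths = [l for _, l in packets]
        iats = [t2 - t1 for (t1, _), (t2, _) in zip(packets, packets[1:])]
        feats = {"packets": len(packets), "bytes": sum(lengths),
                 "minpktl": min(lengths), "meanpktl": statistics.mean(lengths),
                 "maxpktl": max(lengths), "stdpktl": statistics.pstdev(lengths)}
        if iats:  # inter-arrival statistics need at least two packets
            feats.update({"miniat": min(iats), "meaniat": statistics.mean(iats),
                          "maxiat": max(iats), "stdiat": statistics.pstdev(iats)})
        return feats

    fwd = [(0.000, 60), (0.020, 1500), (0.051, 1500)]
    print(direction_features(fwd))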

Our features are simple and well understood within the networking community. They represent a reasonable benchmark feature set to which more complex features might be added in the future.

5. RESULTS AND ANALYSIS

Our ultimate goal is to show the impact of feature reduction on the relative computational performance of our chosen ML algorithms. First we identify significantly reduced feature sets using CFS and Consistency subset evaluation. Then, having demonstrated that classification accuracy is not significantly degraded by the use of reduced feature sets, we compare the relative computational performance of each tested ML algorithm with and without reduced feature sets.

5.1 Feature Reduction

The two feature evaluation metrics were run on the combined dataset using the four different search methods. The resulting reduced feature sets were then used to train and test each of the algorithms using cross-validation as described in Section 2.3. We obtain an average accuracy across the algorithms for each of the search methods. We then determine the ‘best’ subset by comparing the average accuracy against the average accuracy across the algorithms using the full feature set.

Figure 1 plots the average accuracy for each search method using Consistency evaluation. The horizontal line represents the average achieved using the full feature set (94.13%). The number of features in the reduced feature subset is shown in each bar.

 

Figure 1: Consistency-generated subset accuracy according to search method

A large reduction in the feature space is achieved with relatively little change in accuracy. Two search methods (greedy forward, best first forward) produced an identical set of nine features. This set also provided the highest accuracy, and was thus determined to be the ‘best’ Consistency selected subset. The features selected are detailed in Table 1, referred to as CON subset in the remainder of this paper.

CFS subset evaluation produced the same seven features for each of the search methods. This is therefore the ‘best’ subset by default. We refer to this as the CFS subset, shown in Table 1.

Table 1: The best feature subsets according to feature reduction method

CFS subset: fpackets, maxfpktl, minfpktl, meanfpktl, stdbpktl, minbpktl, protocol

CON subset: fpackets, maxfpktl, meanbpktl, maxbpktl, minfiat, maxfiat, minbiat, maxbiat, duration


The ‘best’ feature sets chosen by CFS and Consistency are somewhat different, with the former relying predominantly on packet length statistics, the latter having a balance of packet length and inter-arrival time features. Only maxfpktl and fpackets are common to both reduced feature sets. Both metrics select a mixture of features calculated in the forward and backward directions.

The reduction methods provided a dramatic decrease in the number of features required, with the best subsets providing similar mean accuracies (CFS: 93.76%, CON: 93.14%). There appears to be a very good trade-off between feature space reduction and loss of accuracy.

5.2 Impact of Feature Reduction on Classification Accuracy

We examine the impact that feature reduction has on individual algorithms, in terms of accuracy, precision and recall, using the feature sets obtained in Section 5.1. Cross-validation testing is performed for each of the algorithms using the full feature set, the CFS subset and the CON subset. We obtain the overall accuracy and mean class recall/precision rates across the classes after each test. These values provide an indication as to the overall performance of the algorithm as well as the performance for individual traffic classes. Figure 2 compares the accuracy for each ML algorithm when using the CFS subset, CON subset and the full feature set.

 

Figure 2: Accuracy of algorithms using CFS subset, CON subset and all features

The majority of algorithms achieve greater than 95% accuracy using the full feature set, and there is little change when using either of the reduced subsets. NBK does not perform as well as in [3], possibly due to the use of different traffic classes, features and equally weighted classes.

Figure 3 plots the relative change in accuracy for each of the algorithms compared to the accuracy using all features. The decrease in accuracy is not substantial in most cases, with the largest change (2-2.5%) occurring for NBD and NBK when using the CON subset. Excluding this case however, both subsets produce a similar change, despite the different features used in each.

Figure 3: Relative change in accuracy depending on feature selection metric for each algorithm compared to using full feature set

Examining the mean class recall and precision rates for each of the algorithms showed a similar result to that seen with overall accuracy. The mean rates were initially high (>0.9) and remained largely unchanged when testing using the reduced feature sets.

Although the classification accuracies for each of the algorithms were quite high, they do not necessarily indicate the best possible performance for each algorithm, nor can any wider generalisation of accuracy for different traffic mixes be inferred. They do however provide an indication as to the changes in classification accuracy that might be expected when using reduced sets of features more appropriate for use within operationally deployed, high-speed, IP traffic classification systems.

Despite using subsets of differing sizes and features, each of the algorithms achieves high accuracy and mean recall/precision rates, with little variation in performance for the majority of algorithms. An interesting implication of these results is that, given our feature set and dataset, we might expect to obtain similar levels of classification accuracy from a number of different algorithms. Though ours is a preliminary evaluation, a more extensive study [10] reached similar conclusions.

5.3 Comparing Algorithm Computational Performance

It is clearly difficult to convincingly differentiate ML algorithms (and feature reduction techniques) on the basis of their achievable accuracy, recall and precision. We therefore focus on the build time and classification speed of the algorithms when using each of the feature sets. Computational performance is particularly important when considering real-time classification of potentially thousands of simultaneous network flows.

Tests were performed on an otherwise unloaded 3.4GHz Pentium 4 workstation running SUSE Linux 9.3. It is important to note that we have measured the performance of concrete implementations (found in WEKA) as opposed to theoretically investigating the complexity of the algorithms. This practical approach was taken to obtain some tangible numbers with which some preliminary performance comparisons could be made.

Figure 4 shows the normalised classification speed for the algorithms when tested with each of the feature sets. A value of 1 represents the fastest classification speed (54,700 classifications per second on our test platform).

Figure 4: Normalised classification speed of the algorithms for each feature set

Although there was little separating the results when considering accuracy, classification speed shows significant differences. C4.5 is the fastest algorithm when using any of the feature sets, although the difference is less pronounced when using the CFS subset and CON subset. Using the smaller subsets provides noticeable speed increases for all algorithms except C4.5.

There are a number of factors that may have caused the reduction of classification speed for C4.5 when using the smaller subsets. It appears that decreasing the features available during training has produced a larger decision tree (more tests and nodes), thus slightly lengthening the classification time. The difference is relatively minor however, and in the context of a classification system benefits might be seen in reducing the number of features that need to be calculated.

Figure 5 compares the normalised build time for each of the algorithms when using the different feature sets. A value of 1 represents the slowest build time (1266 seconds on our test platform).

 

Figure 5: Normalised build time for each algorithm and feature set

It is immediately clear that NBTree takes substantially longer to build than the remaining algorithms. Also quite clear is the substantial drop in build time when using the reduced feature sets (though still much slower than the other algorithms).

Figure 6 provides a closer view of the algorithms excluding NBTree. Higher values represent lengthier build time.

Figure 6: Normalised build time for each algorithm and feature set except NBTree

Moore and Zuev [3] found that NBK classifiers could be built quickly, and this is also the case here. NBK builds quickly as this only involves storing feature distributions (classification is slow however as probability estimates must be calculated for all distinct feature values). Bayes Net and NBD have comparable build times, while C4.5 is slower.

Overall the reduced feature sets allow for large decreases in the time taken to build a classifier. The different mix of features within the reduced subsets now appear to have some impact – the subset relying predominantly on packet length features (CFS subset) builds much faster than the subset using a mixture of packet lengths and inter-arrival times (CON subset).

One explanation for this behaviour is that there are many more possible values for inter-arrival times compared to packet length statistics (packet length statistics are more discrete). This produces a wider distribution of feature values that requires finer quantisation when training, with the increased computation leading to longer build times.

These preliminary results show that when using our features, there is a large difference in the computational performance between the algorithms tested. The C4.5 algorithm is significantly faster in terms of classification speed and appears to be the best suited for real-time classification tasks.

 


 

6. CONCLUSIONS AND FUTURE WORK

Traffic classification has a vital role in tasks as wide ranging as trend analyses, adaptive network-based QoS marking of traffic, dynamic access control and lawful interception. Traditionally performed using port and payload based analysis, recent years have seen an increased interest in the development of machine learning techniques for classification.

Much of this existing research focuses on the achievable accuracy (classification accuracy) of different machine learning algorithms. These experiments have used different (thus not comparable) datasets and features. The process of defining appropriate features, performing feature selection and the influence of this on classification and computation performance has not been studied.

In this paper we recognise that real-time traffic classifiers will operate under constraints, which limit the number and type of features that can be calculated. On this basis we define 22 flow features that are simple to compute and are well understood within the networking community. We evaluate the classification accuracy and computational performance of C4.5, Bayes Network, Naïve Bayes and Naïve Bayes Tree algorithms using the 22 features and with two additional reduced feature sets.

We find that the feature reduction techniques are able to greatly reduce the feature space, while only minimally impacting classification accuracy and at the same time significantly increasing computation performance. When using the CFS and Consistency selected subsets, only small decreases in accuracy (on average <1%) were observed for each of the algorithms. We also found that the majority of algorithms achieved similar levels of classification accuracy given our feature space and dataset, making differentiation of them using standard evaluation metrics such as accuracy, recall and precision difficult.

We find that better differentiation of algorithms can be obtained by examining computational performance metrics such as build time and classification speed. In comparing the classification speed, we find that C4.5 is able to identify network flows faster than the remaining algorithms. We found NBK to have the slowest classification speed followed by NBTree, Bayes Net, NBD and C4.5.

In terms of build time, NBTree was slowest by a considerable margin. The remaining algorithms were more even, with NBK building a classifier the fastest, followed by NBD, Bayes Net and C4.5.

As this paper represents a preliminary investigation, there are a number of potential avenues for further work, such as an in-depth evaluation of why different algorithms exhibit different classification accuracy and computational performance. In a wider context, investigating the robustness of ML classification (for instance training on data from one location and classifying data from other locations) and a comparison between ML and non-ML techniques on an identical dataset would also be valuable. We would also like to explore different methods for sampling and constructing training datasets.

7. ACKNOWLEDGEMENTS

This paper has been made possible in part by a grant from the Cisco University Research Program Fund at Community Foundation Silicon Valley. We also thank the anonymous reviewers for their constructive comments.

 

8. REFERENCES

[1] T. Karagiannis, A. Broido, N. Brownlee, kc claffy, “Is P2P dying or just hiding?”, In Proceedings of Globecom, November/December 2004.

[2] T. M. Mitchell, “Machine Learning”, McGraw-Hill Education (ISE Editions), December 1997.

[3] A. W. Moore, D. Zuev, “Internet Traffic Classification Using Bayesian Analysis Techniques”, in Proceedings of ACM SIGMETRICS, Banff, Canada, June 2005.

[4] A. McGregor, M. Hall, P. Lorier, J. Brunskill, “Flow Clustering Using Machine Learning Techniques”, Passive & Active Measurement Workshop, France, April 2004.

[5] T. Dunnigan, G. Ostrouchov, “Flow Characterization for Intrusion Detection”, Technical Report, Oak Ridge National Laboratory, November 2000.

[6] S. Zander, T.T.T. Nguyen, G. Armitage, “Automated Traffic Classification and Application Identification using Machine Learning”, in Proceedings of IEEE LCN, Australia, November 2005.

[7] M. Roughan, S. Sen, O. Spatscheck, N. Duffield, “Class-of-Service Mapping for QoS: A statistical signature-based approach to IP traffic classification”, in Proceedings of ACM SIGCOMM Internet Measurement Workshop, Italy, 2004.

[8] T. Karagiannis, K. Papagiannaki, M. Faloutsos, “BLINC: Multilevel Traffic Classification in the Dark”, in Proceedings of ACM SIGCOMM, USA, August 2005.

[9] L. Bernaille, R. Teixeira, I. Akodkenou, A. Soule, K. Salamatian, “Traffic Classification on the Fly”, ACM SIGCOMM Computer Communication Review, vol. 36, no. 2, April 2006.

[10] T. Lim, W. Loh, Y. Shih, “A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms”, Machine Learning, volume 40, pp. 203-229, Kluwer Academic Publishers, Boston, 2000.

[11] G. H. John, P. Langley, “Estimating Continuous Distributions in Bayesian Classifiers”, in Proceedings of 11th Conference on Uncertainty in Artificial Intelligence, pp. 338-345, Morgan Kaufman, San Mateo, 1995.

[12] R. Kohavi and J. R. Quinlan, Will Klosgen and Jan M. Zytkow, editors, “Decision-tree discovery”, in Handbook of Data Mining and Knowledge Discovery, pp. 267-276, Oxford University Press, 2002.

[13] R. Bouckaert, “Bayesian Network Classifiers in Weka”, Technical Report, Department of Computer Science, Waikato University, Hamilton, NZ 2005.

[14] R. Kohavi, “Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid”, in Proceedings of 2nd International Conference on Knowledge Discovery and Data Mining (KDD), 1996.

[15] M. Dash, H. Liu, “Consistency-based Search in Feature Selection”, Artificial Intelligence, vol. 151, issue 1-2, pp. 155-176, 2003.

[16] M. Hall, “Correlation-based Feature Selection for Machine Learning”, PhD Diss. Department of Computer Science, Waikato University, Hamilton, NZ, 1998.

[17] NLANR traces: http://pma.nlanr.net/Special/ (viewed August 2006).

[18] NetMate, http://sourceforge.net/projects/netmate-meter/ (viewed August 2006).

 


 

[19] N. Brownlee, “NeTraMet & NeMaC Reference Manual”, University of Auckland, http://www.auckland.ac.nz/net/Accounting/ntmref.pdf, June 1999.

[20] Waikato Environment for Knowledge Analysis (WEKA) 3.4.4, http://www.cs.waikato.ac.nz/ml/weka/ (viewed August 2006).

9. APPENDIX

9.1 Appendix A: Machine Learning Algorithms

9.1.1 Naïve Bayes

Naive Bayes is based on Bayes' theorem [11]. This classification technique analyses the relationship between each attribute and the class for each instance, deriving a conditional probability for the relationship between the attribute values and the class. We assume that each instance is described by a vector of attributes {X_1, ..., X_k} and that a random variable C denotes the class of an instance. Let x be a particular instance and c a particular class.

Using Naive Bayes for classification is a fairly simple process. During training, the probability of each class is computed by counting how many times it occurs in the training dataset. This is called the prior probability P(C=c). In addition to the prior probability, the algorithm also computes the probability of the instance x given a class c. Under the assumption that the attributes are independent, this probability becomes the product of the probabilities of the individual attributes. Surprisingly, Naive Bayes has achieved good results in many cases even when this assumption is violated.

The probability that an instance x belongs to a class c can be computed by combining the prior probability and the probability from each attribute's density function using the Bayes formula:

$$P(C = c \mid X = x) = \frac{P(C = c)\,\prod_i P(X_i = x_i \mid C = c)}{P(X = x)}. \qquad (1)$$

The denominator is invariant across classes and only necessary as a normalising constant (scaling factor). It can be computed as the sum of all joint probabilities of the numerator:

$$P(X = x) = \sum_j P(C = c_j)\,\prod_i P(X_i = x_i \mid C = c_j). \qquad (2)$$

Equation 1 is only applicable if the attributes Xi are qualitative (nominal). A qualitative attribute takes a small number of values. The probabilities can then be estimated from the frequencies of the instances in the training set. Quantitative attributes can have a large number (possibly infinite) of values and the probability cannot be estimated from the frequency distribution. This can be addressed by modelling attributes with a continuous probability distribution or by using discretisation. We evaluate Naive Bayes using both discretisation (NBD) and kernel density estimation (NBK). Discretisation transforms the continuous features into discrete features, and a distribution model is not required. Kernel density estimation models features using multiple (Gaussian) distributions, and is generally more effective than using a single (Gaussian) distribution.

9.1.2 C4.5 Decision Tree

The C4.5 algorithm [12] creates a model based on a tree structure. Nodes in the tree represent features, with branches representing possible values connecting features. A leaf representing the class terminates a series of nodes and branches. Determining the class of an instance is a matter of tracing the path of nodes and branches to the terminating leaf. C4.5 uses the 'divide and conquer' method to construct a tree from a set S of training instances. If all instances in S belong to the same class, the decision tree is a leaf labelled with that class. Otherwise the algorithm uses a test to divide S into several non-trivial partitions. Each of the partitions becomes a child node of the current node, and the tests separating S are assigned to the branches.

C4.5 uses two types of tests, each involving only a single attribute A. For discrete attributes the test is A = ?, with one outcome for each value of A. For numeric attributes the test is A ≤ θ, where θ is a constant threshold. Possible threshold values are found by sorting the distinct values of A that appear in S and then identifying a threshold between each pair of adjacent values. For each attribute a test set is generated. To find the optimal partitions of S, C4.5 relies on greedy search and in each step selects the test that maximises the entropy-based gain ratio splitting criterion (see [12]).

The divide and conquer approach partitions until every leaf contains instances from only one class or further partitioning is not possible, e.g. because two instances have the same features but different classes. If there are no conflicting cases the tree will correctly classify all training instances. However, this over-fitting decreases the prediction accuracy on unseen instances.

C4.5 attempts to avoid over-fitting by removing some structure from the tree after it has been built. Pruning is based on estimated true error rates. After building a classifier, the ratio of misclassified instances to total instances can be viewed as the real error. However this error is underestimated because the classifier was constructed specifically for the training instances. Instead of using the real error, the C4.5 pruning algorithm uses a more conservative estimate: the upper limit of a confidence interval constructed around the real error probability. With a given confidence CF, the real error will be below the upper limit with probability 1 − CF. C4.5 uses subtree replacement or subtree raising to prune the tree as long as the estimated error can be decreased.

In our test the confidence level is 0.25 and the minimum number of instances per leaf is set to two. We use subtree replacement and subtree raising when pruning.

9.1.3 Bayesian Networks

A Bayesian Network is a combination of a directed acyclic graph of nodes and links, and a set of conditional probability tables. Nodes represent features or classes, while links between nodes represent the relationship between them.

Conditional probability tables determine the strength of the links. There is one probability table for each node (feature) that defines the probability distribution for the node given its parent nodes. If a node has no parents the probability distribution is unconditional. If a node has one or more parents the probability distribution is a conditional distribution where the probability of each feature value depends on the values of the parents.

Learning in a Bayesian network is a two-stage process. First the network structure Bs is formed (structure learning) and then probability tables Bp are estimated (probability distribution estimation).

We use a local score metric to form the initial structure and refine the structure using K2 search and the Bayesian Metric [20]. An estimation algorithm is used to create the conditional probability tables for the Bayesian Network. We use the Simple Estimator, which estimates probabilities directly from the dataset [13]. The Simple Estimator calculates class membership probabilities for each instance, as well as the conditional probability of each node given its parent node in the Bayes network structure.

There are various other combinations of structure learning and search technique that can be used to create Bayesian Networks.

9.1.4 Naïve Bayes Tree

The NBTree [14] is a hybrid of a decision tree classifier and a Naïve Bayes classifier. Designed to allow accuracy to scale up with increasingly large training datasets, the NBTree algorithm has been found to have higher accuracy than C4.5 or Naïve Bayes on certain datasets. The NBTree model is best described as a decision tree of nodes and branches with Bayes classifiers on the leaf nodes.

As with other tree-based classifiers, NBTree spans out with branches and nodes. Given a node with a set of instances the algorithm evaluates the ‘utility’ of a split for each attribute. If the highest utility among all attributes is significantly better than the utility of the current node the instances will be divided based on that attribute. Threshold splits using entropy minimisation are used for continuous attributes while discrete attributes are split into all possible values. If there is no split that provides a significantly better utility a Naïve Bayes classifier will be created for the current node.

The utility of a node is computed by discretising the data and performing 5-fold cross validation to estimate the accuracy using Naïve Bayes. The utility of a split is the weighted sum of the utility of the nodes, where the weights are proportional to the number of instances in each node. A split is considered to be significant if the relative (not the absolute) error reduction is greater than 5% and there are at least 30 instances in the node.

9.2 Appendix B: Subset Evaluation Metrics

9.2.1 Consistency

The consistency-based subset search algorithm evaluates subsets of features simultaneously and selects the optimal subset. The optimal subset is the smallest subset of features that can identify instances of a class as consistently as the complete feature set.

To determine the consistency of a subset, each combination of feature values is given a pattern label. All instances of a given pattern should thus represent the same class. If two instances of the same pattern represent different classes, that pattern is deemed inconsistent. The inconsistency count of a pattern p is:

$$IC(p) = n_p - c_p, \qquad (3)$$

where $n_p$ is the number of instances of the pattern and $c_p$ the number of instances of the majority class among those $n_p$ instances. The overall inconsistency of a feature subset S is the ratio of the sum of all the pattern inconsistencies to the total number of instances $n_S$:

$$IR(S) = \frac{\sum_p IC(p)}{n_S}. \qquad (4)$$

The entire feature set is considered to have the lowest inconsistency rate, and the subset most similar or equal to this is considered the optimal subset.
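Equations (3) and (4) translate directly into code; the sketch below (with invented toy instances) computes the inconsistency rate of a candidate subset:

    from collections import Counter

    def inconsistency_rate(instances, subset):
        # instances: list of (feature_dict, label); subset: feature names.
        # Group instances by their pattern over the subset, then sum each
        # pattern's count minus its majority-class count (equation 3).
        patterns = {}
        for feats, label in instances:
            key = tuple(feats[f] for f in subset)
            patterns.setdefault(key, Counter())[label] += 1
        ic = sum(sum(c.values()) - max(c.values()) for c in patterns.values())
        return ic / len(instances)  # equation (4)

    data = [({"a": 1, "b": 0}, "x"), ({"a": 1, "b": 1}, "y"),
            ({"a": 1, "b": 0}, "x")]
    print(inconsistency_rate(data, ["a"]))  # 1/3: pattern (1,) mixes x and y
    print(inconsistency_rate(data, ["a", "b"]))  # 0.0: fully consistent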

9.2.2 Correlation-based Feature Selection

The CFS algorithm uses an evaluation heuristic that examines the usefulness of individual features along with the level of inter-correlation among the features. High scores are assigned to subsets containing attributes that are highly correlated with the class and have low inter-correlation with each other.

Conditional entropy is used to provide a measure of the correlation between features and class and between features. If H(X) is the entropy of a feature X and H(X|Y) the entropy of feature X given the occurrence of feature Y, the correlation between two features X and Y can be calculated using the symmetrical uncertainty:

$$C(X \mid Y) = \frac{H(X) - H(X \mid Y)}{H(Y)}. \qquad (5)$$

The class of an instance is considered to be a feature. The goodness of a subset is then determined as:

$$G_S = \frac{k\,\overline{r_{ci}}}{\sqrt{k + k(k-1)\,\overline{r_{ii}}}}, \qquad (6)$$

where $k$ is the number of features in a subset, $\overline{r_{ci}}$ the mean feature correlation with the class and $\overline{r_{ii}}$ the mean feature-feature correlation. The feature-class and feature-feature correlations are the symmetrical uncertainty coefficients (Equation 5).
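Equation (6) is easy to evaluate once the mean correlations are known; the sketch below plugs in invented correlation values to show how low inter-correlation raises the merit:

    import math

    def cfs_merit(k, mean_rci, mean_rii):
        # Equation (6): subset goodness from mean feature-class correlation
        # (mean_rci) and mean feature-feature correlation (mean_rii).
        return (k * mean_rci) / math.sqrt(k + k * (k - 1) * mean_rii)

    # Features that track the class but not each other score higher:
    print(cfs_merit(7, mean_rci=0.6, mean_rii=0.2))  # ~1.07
    print(cfs_merit(7, mean_rci=0.6, mean_rii=0.9))  # ~0.63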

9.3 Appendix C: Table of Features

Feature description                                  Abbreviation

Minimum forward packet length                        minfpktl
Mean forward packet length                           meanfpktl
Maximum forward packet length                        maxfpktl
Standard deviation of forward packet length          stdfpktl
Minimum backward packet length                       minbpktl
Mean backward packet length                          meanbpktl
Maximum backward packet length                       maxbpktl
Standard deviation of backward packet length         stdbpktl
Minimum forward inter-arrival time                   minfiat
Mean forward inter-arrival time                      meanfiat
Maximum forward inter-arrival time                   maxfiat
Standard deviation of forward inter-arrival times    stdfiat
Minimum backward inter-arrival time                  minbiat
Mean backward inter-arrival time                     meanbiat
Maximum backward inter-arrival time                  maxbiat
Standard deviation of backward inter-arrival times   stdbiat
Protocol                                             protocol
Duration of the flow                                 duration
Number of packets in forward direction               fpackets
Number of bytes in forward direction                 fbytes
Number of packets in backward direction              bpackets
Number of bytes in backward direction                bbytes

 


 

Econometric analysis of financial trade processes by mixture duration models

Reinhard Hujer a,*, Sandra Vuletić b

a University of Frankfurt/M., IZA Bonn, ZEW Mannheim
b University of Frankfurt/M.

Version: 18 October 2004

Abstract

We propose a new framework for modelling the time dependence in duration processes. The well-known ACD approach introduced by Engle and Russell (1998) is extended so that an unobservable stochastic process accompanies the duration process. Our creation, called the Mixture ACD model (MACD), puts this conjunction into practice and offers a tractable tool for describing financial duration processes. The introduction of a latent regime variable can be justified in the light of recent market microstructure theories. In an empirical application we show that the MACD approach is able to capture specific characteristics of intraday transaction durations where alternative ACD models fail.

Key words: Duration models, time series models, mixture models, financial transaction data, market microstructure.

JEL classification: C41, C22, C25, C51, G14.

1 Introduction

Investigating the microstructure of financial markets has become very popular over the last twenty years. Theoretical assertions concerning the behavior of market participants in the presence of asymmetric information are discussed

* Corresponding author. Johann Wolfgang Goethe-University, Department of Economics and Business Administration, Institute of Statistics and Econometrics, Mertonstrasse 17, 60054 Frankfurt on the Main, Germany. Tel.: +49 69 798 28115; Fax: +49 69 798 23673. E-mail: hujer@wiwi.uni-frankfurt.de (R. Hujer).

1 The authors thank Dr. Stefan Kokot for valuable preparatory work.

 

in many contributions. In this respect Easley, Kiefer, O'Hara, and Paperman (1996) deliver a prominent approach. Statistical methodology is employed in order to check empirically the validity of the implications of market microstructure models. Since rich transaction data sets are available containing detailed information about the timing of trades, prices, volume and other relevant characteristics for a wide range of financial securities, it is possible to get to the bottom of financial markets. Theory and the application of a tailor-made statistical instrument are combined in the elaboration of Kokot (2004).

New econometric methods appear rapidly and find extensive application in finance. The autoregressive conditional duration model (ACD) introduced by Engle and Russell (1998) is an auspicious approach which couples the spirit of time series models with econometric tools for the analysis of transition data. Ultra-high-frequency data, stemming from transaction data sets and characterised by irregular spacing in time, are ideal material for this innovative framework. The ACD model is well suited to analysing the dynamics of arbitrary events associated with the trading process over time, where the durations between successive occurrences of interesting market events are the object of investigation.

As demonstrated by Bauwens, Giot, and Grammig (2000), the periods of time elapsing between successive trades exhibit an idiosyncrasy which could not be captured even by extensions of the original model. The flexible Markov switching ACD model developed by Hujer, Vuletić, and Kokot (2002) is, for the first time, capable of higher forecast accuracy for the trading process itself, but it requires much effort and computing power in estimation. We intend to introduce an alternative model with a parsimonious parameterization, called the Mixture ACD model (MACD), which also attains good performance. An integral part of the MACD model is a latent discrete-valued regime variable whose involvement can be justified in the light of recent market microstructure models. The unobservable regime can be associated with the presence (or absence) of private information about an asset's value that is initially available exclusively to a subset of informed traders and only eventually disseminates, through the mere process of trading, to the broader public of all market participants.

The manageable MACD model bears a resemblance to the general switching autoregression model introduced by Hamilton (1989) and nests many of the existing autoregressive duration models as special cases. There are several models that are closely related to our approach as well. Despite its affinity to the duration model given by De Luca and Gallo (2004), the MACD model differs substantially in the distributional assumption. It has the discrete mixture in common with the threshold ACD model introduced by Zhang, Russell, and Tsay (2001).

This paper is structured as follows: a brief review of the idea of ACD modeling is given in Section 2. In Section 3 the MACD model is introduced and compared to related work on duration models. Moreover, we discuss estimation procedures and specification tests for MACD models. In an empirical application in Section 4 we present estimation results employing a transaction data set for the common share of Boeing traded on the New York Stock Exchange. Finally, in Section 5 we summarize our main results and give a perspective on possible issues for future research.

2 The ACD model

Autoregressive conditional duration (ACD) models, introduced by Engle and Russell (1998), are designed to account for patterns of autocorrelation typically observed in time series of intervals between successive occurrences of market events associated with the trading process. The definition of the market event depends on the specific aim of the study.

Let x = tt1 be the duration between the recordings of the (n1) - th and the n - th market event with the deterministic conditional mean function

yo = E(x1; è), (2.1)

where the information set 1consists of all preceding durations up to time t1 and è is the corresponding set of parameters. The ACD model is defined by some parameterization of this conditional mean and by the decomposition

yo, (2.2)

where the residual process ån is assumed to be i. i. d. with density g (ån; è) depending on a set of distributional parameters è, support on the positive real line and an unconditional expectation equal to one. The flexibility of the ACD model can be altered by modifying the distributional assumption of the residuals and/or by changing the specification of the conditional mean function. The distributional assumption of the residuals determines the density

 

of the durations $f_n(x_n \mid F_{n-1}; \theta)$, where $\theta = (\theta_\psi, \theta_\varepsilon)$ represents the whole parameter set. A list of common choices for $g(\varepsilon_n; \theta_\varepsilon)$ includes the exponential, the Weibull, the Burr (1942), and the generalized gamma distribution, all of them nested in a comprehensive family of distributions.

In a standard ACD(p, q) model the parameterization of the conditional mean is linear according to

$$\psi_n = \omega + \sum_{k=1}^{p} \beta_k\,\psi_{n-k} + \sum_{k=1}^{q} \alpha_k\,x_{n-k}, \qquad (2.3)$$

and it can be transformed into an ARMA(max(p, q), p) representation from which expressions for the unconditional moments of $x_n$ may be derived easily. In order to ensure non-negativity of the conditional mean, the parameters $\omega$, $\alpha_k$, and $\beta_k$ are forced to be non-negative. Computational problems due to this strong restriction may be circumvented by using logarithmic versions. Bauwens and Giot (2002) propose the following LACD(p, q) specification

$$\ln(\psi_n) = \omega + \sum_{k=1}^{p} \beta_k \ln(\psi_{n-k}) + \sum_{k=1}^{q} \alpha_k \ln(x_{n-k}), \qquad (2.4)$$

and the corresponding analytical expressions for the unconditional moments are given by Bauwens, Galli, and Giot (2003). In both specifications stationarity depends on the magnitudes of the parameters $\alpha_k$ and $\beta_k$.
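To illustrate the recursion, the following sketch simulates an ACD(1,1) process as in (2.3) with unit-mean exponential innovations (the parameter values are invented for the example):

    import numpy as np

    omega, alpha, beta = 0.1, 0.2, 0.7   # illustrative parameter values
    rng = np.random.default_rng(3)
    N = 10000
    x, psi = np.empty(N), np.empty(N)
    psi[0] = omega / (1 - alpha - beta)  # unconditional mean
    x[0] = psi[0] * rng.exponential()
    for n in range(1, N):
        # Equation (2.3) with p = q = 1: the conditional mean reacts to the
        # previous duration, producing clusters of short and long durations.
        psi[n] = omega + beta * psi[n - 1] + alpha * x[n - 1]
        x[n] = psi[n] * rng.exponential()
    print(x.mean())  # close to omega / (1 - alpha - beta) = 1.0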

3 The Mixture ACD model

3.1 The basic framework

The basic assumption of the Mixture ACD model, also referred to as MACD, is that the duration process $x_n$ is accompanied by an unobservable stochastic process $s_n$. The stochastic process $s_n$ is characterized by a discrete-valued random variable with countable support $\mathcal{J} = \{j \mid 1 \le j \le J\}$, $J \in \mathbb{N}$, and has the task of representing the regime in which the duration process $x_n$ prevails at time $t_n$. In financial applications the existence of different trading regimes may provide evidence on the presence of agents with private information about an asset's value.

Decomposition (2.2) holds in the sense that the innovation process $\varepsilon_n$ has a known discrete mixture distribution with $E(\varepsilon_n) = 1$ and invariant higher moments across the $N$ observations considered in the sample. The density of

 

each innovation $\varepsilon_n$ has the following formal appearance:

$$g(\varepsilon_n; \theta_\varepsilon, \pi) = \sum_{j=1}^{J} \pi^{(j)}\, g\!\left(\varepsilon_n \mid s_n = j; \theta_\varepsilon^{(j)}\right), \qquad (3.1)$$

where each weight $0 < \pi^{(j)} < 1$ represents the corresponding long-run probability of prevailing in state $j$, and $\theta_\varepsilon^{(j)}$ is the corresponding parameter vector characterizing the conditional density of the innovation process in the $j$-th regime. Consequently, the unconditional density of the innovation process as given in equation (3.1) depends on all regime-specific distributional parameters gathered into the vector $\theta_\varepsilon = (\theta_\varepsilon^{(1)}, \ldots, \theta_\varepsilon^{(J)})'$ and on $\pi = (\pi^{(1)}, \ldots, \pi^{(J)})'$.

Any of the densities mentioned in Section 2 may be used in order to specify the regime-specific distributions of the innovation process. De Luca and Gallo (2004) build up a duration model where the innovation process follows the Schuhl distribution, being simply a discrete mixture of exponential distributions. The MACD model can be recognized as a generalization which allows for more flexibility.

On the one hand the expected value of each innovation $E(\varepsilon_n)$ is constrained to be equal to one; on the other hand this expected value turns out to be a discrete mixture of regime-specific expectations $E(\varepsilon_n \mid s_n = j; \theta_\varepsilon^{(j)})$. This implies the maintenance of the equality

$$\sum_{j=1}^{J} \pi^{(j)} \cdot E\!\left(\varepsilon_n \mid s_n = j; \theta_\varepsilon^{(j)}\right) = 1, \qquad (3.2)$$

which does not require that all the regime-specific expectations are equal to one. In the case of $E(\varepsilon_n \mid s_n = j; \theta_\varepsilon^{(j)}) = 1$ for all $j \in \mathcal{J}$, the MACD model coincides with a special case of the static variant of the Markov switching ACD model developed by Hujer, Vuletić, and Kokot (2002).

By the change of variable technique with $x_n = \varepsilon_n \cdot \psi_n$, the relevant density for statistical inference is the duration's marginal density

$$f_n(x_n \mid F_{n-1}; \theta) = \sum_{j=1}^{J} \pi^{(j)}\, f\!\left(x_n \mid s_n = j, F_{n-1}; \theta_\varepsilon^{(j)}, \theta_\psi\right), \qquad (3.3)$$

which depends on the parameter vector $\theta = (\theta_\psi, \theta_\varepsilon, \pi)'$. The mean function $\psi_n = E(x_n \mid F_{n-1}; \theta_\psi)$ is assumed to capture the whole persistence of the duration process by an appropriate recursion.

Note that the MACD model does not allow for different regime-specific mean functions. This feature may invite criticism, especially from a theoretical point of view, but the empirical experience with strongly restricted Markov switching ACD models can be used as a supporting argument: Hujer, Kokot, and Vuletić (2003) conclude that even the static variant of the Markov switching ACD model with regime-independent dynamics in the mean function and regime-specific distributional parameters performs reasonably well in terms of forecast accuracy.

3.2 Estimation of the Mixture ACD model

For discrete mixture models there are two ways by which maximum likelihood estimates of the parameter vector $\theta$ may be obtained. The standard approach is direct numerical maximization of the incomplete log-likelihood function

$$L(\theta) = \sum_{n=1}^{N} \ln\left[f_n(x_n \mid F_{n-1}; \theta)\right] \qquad (3.4)$$

under the linear constraint $\sum_{j=1}^{J} \pi^{(j)} = 1$ and additional restrictions for non-negativity, stationarity and possibly for distributional parameters. Log-likelihood functions of mixture models are characterized by the existence of multiple local maxima. In order to find the global maximum, repeating the parameter estimation with different start values is strongly recommended. Since standard maximization algorithms often fail or produce nonsensical results, maximum likelihood estimates for discrete mixture models are often obtained by the use of the robust Expectation-Maximization (EM) algorithm introduced by Dempster, Laird, and Rubin (1977).

In the hypothetical situation where we can observe the realizations of the regime variable, the complete log-likelihood function is given by

$$L_C(\theta) = \sum_{n=1}^{N} \sum_{j=1}^{J} z_n^{(j)} \ln\left[\pi^{(j)}\, f\!\left(x_n \mid s_n = j, F_{n-1}; \theta_\varepsilon^{(j)}, \theta_\psi\right)\right], \qquad (3.5)$$

where $z_n^{(j)} = 1$ if $s_n = j$ and zero otherwise. The expectation of (3.5) conditional on all observed data $X_N = (x_1, \ldots, x_N)$ leads to the expected complete log-likelihood function $L_{EC}(\theta, \theta_0) = E(L_C(\theta) \mid X_N; \theta_0)$, which is simply obtained by replacing $z_n^{(j)}$ by the probabilistic inference

$$\xi_n^{(j)} = \frac{\pi_0^{(j)}\, f\!\left(x_n \mid s_n = j, F_{n-1}; \theta_{\varepsilon,0}^{(j)}, \theta_{\psi,0}\right)}{\sum_{k=1}^{J} \pi_0^{(k)}\, f\!\left(x_n \mid s_n = k, F_{n-1}; \theta_{\varepsilon,0}^{(k)}, \theta_{\psi,0}\right)}, \qquad (3.6)$$

evaluated for some parameter vector guess $\theta_0$. Evaluation of $L_{EC}(\theta, \theta_0)$ constitutes the first part of the EM-algorithm and is commonly referred to as the E-step. The associated M-step consists of maximizing $L_{EC}(\theta, \theta_0)$ with respect to the parameter vector $\theta$, and can be conducted separately with respect to the regression parameters and the regime probabilities if $\partial f(x_n \mid s_n = j, F_{n-1}; \theta_\varepsilon^{(j)}, \theta_\psi) / \partial \theta_\varepsilon^{(k)} = 0$ for all $j \neq k \in \{1, \ldots, J\}$. The estimates for the regime probabilities are given by

$$\hat{\pi}^{(j)} = \frac{1}{N} \sum_{n=1}^{N} \xi_n^{(j)}, \qquad (3.7)$$

and the remaining parameters may be obtained from the solution to

$$\sum_{n=1}^{N} \xi_n^{(j)} \cdot \frac{\partial \ln f\!\left(x_n \mid s_n = j, F_{n-1}; \theta_\varepsilon^{(j)}, \theta_\psi\right)}{\partial \theta} = 0. \qquad (3.8)$$

By repeating the two steps of the EM-algorithm until the absolute change of the parameter vector is smaller than some prespecified convergence criterion, estimates of the parameter vector are obtained. Hamilton (1990) shows that the final estimates $\hat{\theta}$ maximize the incomplete log-likelihood function.
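As a minimal worked example of the E- and M-steps, the sketch below runs EM on a two-component exponential mixture (the static 'Schuhl' special case mentioned above, not the full Burr-MACD likelihood); a closed-form M-step for the rate parameters replaces the general condition (3.8):

    import numpy as np

    rng = np.random.default_rng(0)
    # Simulated data: mixture of exponentials with means 0.5 and 3.0.
    x = np.concatenate([rng.exponential(0.5, 700), rng.exponential(3.0, 300)])

    pi, lam = np.array([0.5, 0.5]), np.array([1.0, 0.2])  # start values
    for _ in range(200):
        dens = pi * lam * np.exp(-np.outer(x, lam))   # (N, 2) joint densities
        xi = dens / dens.sum(axis=1, keepdims=True)   # E-step, eq. (3.6)
        pi = xi.mean(axis=0)                          # M-step, eq. (3.7)
        lam = xi.sum(axis=0) / (xi * x[:, None]).sum(axis=0)
    print(pi, 1 / lam)  # weights and component means, roughly (0.7, 0.3), (0.5, 3.0)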

3.3 Statistical inference

Diebold, Gunther, and Tay (1998) propose a method which can be applied to test the forecast performance of general dynamic models. The idea behind this specification test has been used extensively by Bauwens, Giot, and Grammig (2000) to compare different types of ACD models. Denote by $\{f_n(x_n \mid F_{n-1}; \hat{\theta})\}_{n=1}^{N}$ the sequence of density forecasts evaluated using the parameter vector estimate $\hat{\theta}$ from some parametric model, and denote by $\{f_n(x_n \mid F_{n-1}; \theta)\}_{n=1}^{N}$ the sequence of densities corresponding to the true but unobservable data generating process of $x_n$. As shown by Rosenblatt (1952), under the null hypothesis

$$H_0: \{f_n(x_n \mid F_{n-1}; \hat{\theta})\}_{n=1}^{N} = \{f_n(x_n \mid F_{n-1}; \theta)\}_{n=1}^{N}, \qquad (3.9)$$

the sequence of empirical integral transforms defined by

$$\hat{\zeta}_n = \int_{0}^{x_n} f_n(u \mid F_{n-1}; \hat{\theta})\, du \qquad (3.10)$$

will be i.i.d. uniform on the unit interval. Any test for uniformity of the sequence of integral transforms can be used to assess the forecast performance of the model under consideration. Consider partitioning the support of $\hat{\zeta}_n$ into $K$ equally spaced bins and denote the number of observations falling into the $k$-th bin by $N_k$. The test statistic

$$RT = N \sum_{k=1}^{K} \frac{(\hat{\varsigma}_k - \varsigma_k)^2}{\varsigma_k} \qquad (3.11)$$

compares the theoretical frequency $\varsigma_k = 1/K$ to the observed relative frequency $\hat{\varsigma}_k = N_k / N$ and has a $\chi^2$ distribution with $(K - 1)$ degrees of freedom under the null hypothesis. The independence property may be checked by computing the Ljung and Box (1978) test for the sequence of empirical integral transforms. The statistical tests for i.i.d. uniformity may be supplemented by graphical tools: departures from uniformity can easily be detected using a histogram plot or quantile-quantile plot based on the sequence of $\hat{\zeta}_n$, while the autocorrelogram of $\hat{\zeta}_n$ can be used to assess the independence property.
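The sketch below illustrates the test under an assumed exponential conditional distribution, for which the integral transform (3.10) has a closed form; the data are simulated under the null, so the chi-square test should typically not reject:

    import numpy as np
    from scipy.stats import chisquare

    def pit_exponential(x, psi):
        # Integral transform (3.10) for an exponential with mean psi_n.
        return 1.0 - np.exp(-x / psi)

    rng = np.random.default_rng(1)
    psi = np.full(5000, 1.0)              # stand-in conditional means
    x = rng.exponential(psi)              # data generated under H0
    zeta = pit_exponential(x, psi)

    K = 20
    counts, _ = np.histogram(zeta, bins=K, range=(0.0, 1.0))
    stat, pval = chisquare(counts)        # chi^2 test against uniform bins
    print(stat, pval)                     # a high p-value: uniformity not rejected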

3.4 Link to microstructure models

The modern literature on the microstructure of financial markets, gradually widening in the style of Easley, Kiefer, O'Hara, and Paperman (1996), picks out the presence of diverse types of market participants (traders) as a central theme. The common ground of this broad literature is the initial position that market participants are differentiated by the level of information which they privately possess, so that the trading mechanism is discussed under the aspect of asymmetric information. In this respect it is easy to imagine that some traders catch a signal indicating that an asset is either overpriced or underpriced, while other traders do not notice anything. The market development can thus be characterized by the coexistence and interaction of just two categories of traders: informed traders and uninformed traders, also called liquidity traders or followers. The informed trader's strategy consists of making purchases and sales of assets in the immediate aftermath of recognizing favorable or unfavorable signals. The informed traders jointly impinge on the market development and trigger clusters of transactions as soon as they come across relevant news. Uninformed traders are insensitive to this information flow and retain their habitual trading activity.

The collectivity of transactions, carried out either by the large attendance of uninformed traders or by sporadic appearances of informed traders acting on information-based decisions, can be seen as a realization of a point process, and the corresponding probability law that governs the occurrence of trades can be specified by a duration model. The presence of different traders acting on the financial market makes the embedding of trader-specific characteristics into the ordinary ACD framework natural. Because a specific transaction does not reveal by which type of trader it has been induced, the introduction of an underlying unobservable mixing variable with a discrete distribution is reasonable. The mixing parameters represent the corresponding probabilities that a transaction arises from a specific type of trader.

This simple theoretical background is well reflected in the MACD framework, which rests upon an arbitrary mixture distribution for the stochastic process of innovations. Thereby the regime variable acts as the mixing variable, and the mixing parameters can be interpreted as fractions of the different trader types acting on the market. The level of discrepancy between trader-specific peculiarities in trading behavior can easily be regulated by adapting the parameters inside equation (3.2). The instantaneous transaction rates turn out to be different across the trader categories, and this is what we primarily want to achieve.

Bauwens, Giot, and Grammig (2000) report on a deficiency of ordinary ACD models, namely their inability to model observations in the tails of the distributions appropriately. This arouses the suspicion that the duration process is deprived of some facts of fundamental importance. The thoughts stimulated by market microstructure theory justify an advanced approach for duration data, which is materialized in the concise MACD framework. By doing this, we hope to succeed in overcoming the lack of satisfactory forecast performance of ordinary ACD models, and we expect a clear answer from the empirical application given in the following section.

 


 

4 Empirical application

4.1 The data set

The data used in our empirical application consist of transactions of the common stock of Boeing, recorded on the New York Stock Exchange, from the trades and quotes database provided by the NYSE Inc. The sampling period spans 19 trading days from November 1 to November 27, 1996. We used all trades observed during the regular trading day (9:30-16:00). The trading times have been recorded with a precision measured in seconds. Observations occurring within the same second have been aggregated to one trade. In the final data set we removed censored observations: durations from the last trade of the day until the close and durations from the open until the first trade of the day.

It is well known that the length of the durations varies in a deterministic manner during the trading day, resembling an inverted U-shaped pattern. Engle and Russell (1997) propose to decompose the duration series into a deterministic time-of-day function $\Phi(t_{n-1})$ and a stochastic component $x_n$, so that the raw durations are generated from $\tilde{x}_n = x_n \cdot \Phi(t_{n-1})$. In order to remove the deterministic component we apply the two-step method proposed by Engle and Russell (1997), in which the time-of-day function is estimated separately from the other model parameters. 2 Dividing each raw duration $\tilde{x}_n$ in the sample by an estimate of the time-of-day function $\Phi(t_{n-1})$, a sequence of deseasonalized durations $x_n$ is obtained which is used in all subsequent analyses. 3
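As a crude stand-in for the semi-nonparametric regression used in the paper (see footnote 3), the sketch below estimates the time-of-day function by binned means and divides it out:

    import numpy as np

    def deseasonalise(times, raw_durations, n_bins=13):
        # times: seconds since the 9:30 open for the trade that starts each
        # duration; the binned mean serves as the estimate of Phi(t_{n-1}).
        bins = np.linspace(0, 6.5 * 3600, n_bins + 1)
        idx = np.clip(np.digitize(times, bins) - 1, 0, n_bins - 1)
        phi = np.array([raw_durations[idx == b].mean() for b in range(n_bins)])
        return raw_durations / phi[idx]   # x_n = x~_n / Phi(t_{n-1})

    rng = np.random.default_rng(2)
    t = np.sort(rng.uniform(0, 6.5 * 3600, 10000))
    # Invented diurnal pattern: longer durations around midday.
    xraw = rng.exponential(1 + np.cos(np.pi * t / (6.5 * 3600)) ** 2)
    print(deseasonalise(t, xraw).mean())  # close to one after adjustment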

Descriptive information about sample moments and Ljung Box statistics of the raw and the seasonally adjusted duration data is reported in Table 1.

< insert Table 1 about here >

2 Simultaneous ML-estimation as in Engle and Russell (1998) and Veredas et al. (2002) is also feasible. Engle and Russell (1998) report that both procedures give similar results if sufficient data is available.

3 Estimates of the time of day function were obtained by conducting a semi-nonparametric regression of the durations on the time of day according to Gallant (1981) and Eubank and Speckman (1990). Details on the seasonality adjustment step are available from the authors upon request.

 


 

As expected, the series of adjusted durations has a mean of approximately one. Both time series exhibit overdispersion relative to the exponential distribution, for which the standard deviation equals the mean. A mixture of distributions accommodates the stylized fact of overdispersion well. Another eye-catching characteristic of the data is the presence of strong positive autocorrelation in the trade durations, as can be seen in Figure 1.

< insert Figure 1 about here >

Even after seasonal adjustment, the Ljung-Box tests reject the hypothesis of no autocorrelation up to 50 lags at the 5% significance level, although the shape of the autocorrelation function changes slightly. Therefore, an autoregressive approach appears to be appropriate as a model for the transaction durations.

4.2 Specification of the Mixture ACD Model

We estimate an ordinary ACD model and also two contrasting specifications of the MACD model with two regimes, i.e. $J = 2$. The mean function $\psi_n$ is logarithmic and both lag orders $p$ and $q$ in the recursion are equal to one, i.e.

$$\psi_n = \exp(\omega) \cdot \psi_{n-1}^{\beta_1} \cdot x_{n-1}^{\alpha_1}. \qquad (4.1)$$

Concerning the demand for a unit mean of the innovation process $\varepsilon_n$ we distinguish between two different cases. The restrictive variant, denoted by the character R in the following, requires that all regime-specific expectations of the innovation process $E(\varepsilon_n \mid s_n = j; \theta_\varepsilon^{(j)})$ are forced to be equal to one, so that equation (3.2) needs no further attention. This variant may be estimated by employing the EM-algorithm, while the nonrestrictive variant, denoted by the character R̄ in the following, has to be estimated by maximizing the incomplete log-likelihood function directly.

Each regime specific distribution of the innovation process ε_n | s_n = j is taken from the Burr (1942) family of distributions, with regular time-invariant distributional parameters κ^{(j)} and σ^{(j)} associated with each of the two regimes of interest. The introduction of additional time-invariant distributional parameters µ^{(j)} is considered in the nonrestrictive case, where the equality



=1

π() • µ() 1  ê(j)• 1

Γ 1  +1• Γ

(j)()1

(j)

= 1 (4.2)

1+ 1 1

σ() ê(j) • Γ (j) + 1


has to be ensured in the course of estimation. Because two constraints have to be considered in estimation, i.e. the sum of all regime probabilities is equal to one and the requirement given in (4.2), one has to estimate only µ^{(1)} and π^{(1)} besides the regular distributional parameters and the parameters of the mean function. In contrast, the restrictive case incorporates corresponding distributional parameters which obey the parameterization


µ() =



1+  1  1

σ() ê(j) • Γ (j) + 1

1

Γ 1 + 1• Γ

(j) ()  1 

(j)

 (j) (4.3)


so that they are exempted from estimation. This parameter determination implies that each regime specific expectation of the innovation process is equal to one. Bringing the restrictive and the unrestrictive variant together, each ε_n | s_n = j follows the Burr distribution with the three distributional parameters µ^{(j)}, κ^{(j)}, σ^{(j)}, and the regime specific density of the duration x_n turns out to be

µ()• κ()• x(j)1

f x s = j, 1; è(j)

 , θ =

1 (4.4)

1 + ó(j) • µ(jn)•x(j) ó(j)+1

with the time variant parameter µ()

 = ()

• µ(). Regardless to the in 

ner constitution of equation (4.2) which makes room for the restrictive and unrestrictive variant of the MACD model, the regime specific distributions of a selective duration x turn out to be entirely different. Even the restric 

tive variant which implies E xs =j, 1; è(j)

 , θ =  for every regime , gives leeway to different regime specific distributional features, i. e. the first moment of  is fix across all regimes but all higher moments are regime variant. The unrestrictive variant provides a cut above in the sense that all moments are allowed to be regime specific. But for all that, both specification variants definitively imply the fact (1; ) = . An interesting issue becoming apparent is whether the restrictive variant is sufficiently flexible to catch regime specific characteristics hidden in the duration process.

First, however, we address the complexity of MACD models. Compared to the standard ACD approach, there is no exorbitant increase in the number of parameters composing a MACD model. In comparison to the ordinary Burr ACD model, the corresponding two regime MACD model, which conforms to the R variant (R̄ variant), requires the estimation of only three (four) additional parameters.

Parameter estimates, standard errors 4, values of the log-likelihood function and the information criterion; descriptive statistics for the series of empirical integral transforms, together with p-values of tests of whether the corresponding parameters equal the population counterparts implied by the uniform distribution on the unit interval; and the results of the specification tests for all of the model specifications we estimated are presented in Table 2.

< insert Table 2 about here >

First of all, the Bayesian information criterion BIC proposed by Schwarz (1978) does not support the ordinary logarithmic ACD model, which is nested as a special case in the MACD framework with logarithmic mean specification and J = 1. The test on the mean supports the null hypothesis E(ζ_n) = 0.5, but the result of the variance test is not in favor of the null hypothesis Var(ζ_n) = 1/12. The low p-values obtained from the quantile tests are a sign of bad adaption in the tails of the distribution. Moreover, the specification test that we performed does not support the one regime model: the p-value of the ratio test is equal to zero. Hence, the apparent defect of the ordinary logarithmic ACD model stems from the improper choice of distribution for the innovation process. However, the ordinary logarithmic ACD model is able to capture the autocorrelation pattern of the intertrade durations adequately, as indicated by the high p-value of the Ljung-Box statistic for the series of empirical integral transforms.

The present results for proper mixture models indicate a significant improvement over the performance of the ordinary logarithmic ACD model. They admit the general conclusion that, for J greater than one, first order MACD models are able to eliminate the distributional problem of ordinary ACD models while still capturing the autocorrelation pattern in the duration data

4 Standard errors have been computed based on numerical derivatives of the incomplete log-likelihood function, using the quasi-maximum likelihood estimates of the information matrix as suggested by White (1982).


adequately. Already the two regime case yields good results, as can be seen from the last four columns of Table 2. For each of the two variants we estimated, the arithmetic mean of the empirical integral transforms, denoted by ζ̄, is close to one half, the corresponding empirical variance ŝ²_ζ does not differ significantly from one twelfth, and the first, second and third quartiles do not differ significantly from 0.25, 0.50 and 0.75; these facts express the close conformance to the uniform distribution on the unit interval. The p-values of the ratio test increase by leaps and bounds. The hypothesis of no autocorrelation in the integral transforms cannot be rejected at conventional significance levels.

For purposes of comparison, Figure 2 contains histogram plots and QQ-plots for the series of integral transforms for the 1-regime and the nonrestrictive 2-regime model specification.

< insert Figure 2 about here >

The plots clearly show that the estimated MACD model produces empirical integral transforms that match the implied theoretical density very well, and that it tends to give accurate forecasts over the whole range of observed values of x. In contrast, the plots for the one regime model show that the empirical integral transforms disagree sharply with the theoretical density, and that the model tends to produce systematically biased forecasts of small x: the histogram bars for the first four quantiles lie outside the 95% confidence interval.
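The empirical integral transforms underlying Table 2 and Figure 2 can be computed from the Burr CDF, F(x) = 1 − (1 + σ µ_n x^κ)^{−1/σ}, mixed over the regimes. The sketch below also includes a simple chi-square test on 20 equiprobable bins, which we use as a stand-in for the ratio test; the paper's exact test statistic may be defined differently.

import numpy as np
from scipy.stats import chi2

def burr_cdf(x, psi, kappa, sigma, mu):
    mu_n = psi ** (-kappa) * mu
    return 1.0 - (1.0 + sigma * mu_n * x ** kappa) ** (-1.0 / sigma)

def integral_transforms(x, psi, params, weights):
    # zeta_n = sum_j pi^(j) * F_j(x_n | psi_n), cf. Diebold et al. (1998);
    # params is a list of (kappa, sigma, mu) tuples, weights the pi^(j).
    return sum(w * burr_cdf(x, psi, k, s, m) for (k, s, m), w in zip(params, weights))

def uniformity_test(zeta, bins=20):
    # Chi-square goodness-of-fit test against the uniform distribution.
    observed, _ = np.histogram(zeta, bins=bins, range=(0.0, 1.0))
    expected = len(zeta) / bins
    stat = np.sum((observed - expected) ** 2 / expected)
    return stat, chi2.sf(stat, df=bins - 1)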

The parameter estimates for ω, α1 and β1, which determine the evolution of the duration's conditional mean over time, differ only marginally across the three models we estimated. The same may be noticed for the distributional parameters. The estimation results obtained from the multiple regime models show that the two regular distributional parameters κ^{(j)} and σ^{(j)} vary markedly across the regimes, each with a larger value in the second regime than in the first. This has a strong impact on the shape of the hazard function considered for each regime separately. For both variants of the MACD model, Figure 3 displays the two regime specific hazard functions λ_n(x_n | s_n = j, F_{n−1}; θ̂) and also the regime unspecific hazard rate λ_n(x_n | F_{n−1}; θ̂), each evaluated for ψ_n = 1 and using the parameter vector estimate.

< insert Figure 3 about here >


Note first that the choice of one or the other variant does not change the qualitative nature of the hazard rates. The hazard rate assigned to the second regime tends to rise rather quickly after a transaction has been observed. In contrast, the hazard function under the first regime increases moderately and gives clearly more weight to spells with a length of more than two units of time. This corresponds nicely to the fact that the first regime has a higher probability π^{(1)} than the second regime: roughly three fourths of all transactions were generated in the first regime.

Thus, the application of the MACD model supports the existence of two fundamentally different streams governing the process of intertrade durations and visualizes the different speeds at which trading evolves. The inertial trading activity, captured by the hazard rate of the first regime, dominates the overall trading process and can be associated with the theoretical view of trading behavior ascribed to uninformed traders. The second regime conveys the image of brisk trading, which can be traced back to informed traders participating in the financial market.

5 Conclusions

Mixture models are frequently used in econometrics. This motivates us to combine the basic idea of mixture models with ACD modelling as originally introduced by Engle and Russell (1998). The fusion is realized in the Mixture ACD (MACD) model, which we present and put to the test in this paper. We conclude from our research that the MACD model is a promising new framework for modelling autocorrelated durations obtained from high frequency data sets from stock and foreign exchange markets.

In the first instance, the MACD model emerges as a successful tool for forecasting time series of intraday transaction durations, as it is able to remove the distributional problem from which ordinary ACD models occasionally suffer. Since the flexible Markov switching ACD model of Hujer, Vuletić, and Kokot (2002) and the parsimonious discrete mixture exponential ACD model of De Luca and Gallo (2004) can be seen as its rivals, the MACD model can be recognized as a compromise between these two extremes. As a generalization it extends the discrete


mixture exponential ACD model, and at the same time it constitutes a manageable special case of the Markov switching ACD model. The flexibility of the MACD model can be regulated in four directions: the number of regimes, the regime specific distributional assumptions, the mean function and, finally, the condition for unit mean in the innovation process are starting points for modification.

A further asset of the MACD model is its interpretation in the context of recent market microstructure models. The weights π^{(j)} can be naturally regarded as the fractions of informed and uninformed traders acting on the financial market, although the assumption of proportions that remain constant over time may be questionable. Therefore, an interesting extension of the MACD model would be to allow for time varying regime probabilities.


References

Bauwens, L., Galli, F., Giot, P., 2003. The moments of log-ACD models. Discussion Paper 11, CORE, Université Catholique de Louvain.

Bauwens, L., Giot, P., 2002. The logarithmic ACD model: An application to the bid-ask quote process of three NYSE stocks. Annales d'Economie et Statistique 60, 117–149.

Bauwens, L., Giot, P., Grammig, J., Veredas, D., 2000. A comparison of financial duration models via density forecasts. Discussion Paper 60, CORE, Université Catholique de Louvain and University of Frankfurt, forthcoming in: International Journal of Forecasting.

Burr, I. W., 1942. Cumulative frequency functions. Annals of Mathematical Statistics 13, 215–232.

De Luca, G., Gallo, G. M., 2004. Mixture processes for financial intradaily durations. Studies in Nonlinear Dynamics & Econometrics 8 (2), 1–18, Article 8.

Dempster, A. P., Laird, N. M., Rubin, D. B., 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39, 1–38.

Diebold, F. X., Gunther, T. A., Tay, A. S., 1998. Evaluating density forecasts with applications to financial risk management. International Economic Review 39 (4), 863–883.

Easley, D., Kiefer, N., O'Hara, M., Paperman, J. P., 1996. Liquidity, information and infrequently traded stocks. Journal of Finance 51 (4), 1405–1436.

Engle, R. F., Russell, J. R., 1997. Forecasting the frequency of changes in quoted foreign exchange prices with the autoregressive conditional duration model. Journal of Empirical Finance 4 (2-3), 187–212.

Engle, R. F., Russell, J. R., 1998. Autoregressive conditional duration: A new model for irregularly spaced transaction data. Econometrica 66 (5), 1127–1162.

Eubank, R. L., Speckman, P., 1990. Curve fitting by polynomial-trigonometric regression. Biometrika 77 (1), 1–9.

Gallant, A. R., 1981. On the bias in flexible functional forms and an essentially unbiased form. Journal of Econometrics 20 (2), 285–323.

Hamilton, J. D., 1989. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica 57 (2), 357–384.

Hamilton, J. D., 1990. Analysis of time series subject to changes in regime. Journal of Econometrics 45, 39–70.

Hujer, R., Kokot, S., Vuletić, S., 2003. Comparison of MSACD models. Working paper, Johann Wolfgang Goethe University, Frankfurt am Main.

Hujer, R., Vuletić, S., Kokot, S., 2002. The Markov switching ACD model. Working Paper Series: Finance and Accounting 90, University of Frankfurt.

Kokot, S., 2004. The Econometrics of Sequential Trade Models: Theory and Applications Using High Frequency Data. Springer Verlag.

Ljung, G. M., Box, G. E. P., 1978. On a measure of lack of fit in time series models. Biometrika 65 (2), 297–303.

Rosenblatt, M., 1952. Remarks on a multivariate transformation. Annals of Mathematical Statistics 23 (3), 470–472.

Schwarz, G., 1978. Estimating the dimension of a model. Annals of Statistics 6 (2), 461–464.

Veredas, D., Rodriguez-Poo, J., Espasa, A., 2002. On the intradaily seasonality and dynamics of a financial point process: A semiparametric approach. Discussion Paper 23, CORE, Université Catholique de Louvain.

White, H., 1982. Maximum likelihood estimation of misspecified models. Econometrica 50 (1), 1–25.

Zhang, M. Y., Russell, J. R., Tsay, R. S., 2001. A nonlinear autoregressive conditional duration model with applications to financial transaction data. Journal of Econometrics 104 (1), 179–207.


Tables

Table 1

Descriptive Statistics for trade durations

Statistic  Raw durations x̃_n  Adjusted durations x_n

Arithmetic mean 48.3248 1.0007

Standard deviation 61.8416 1.1933

Minimum 1.0000 0.0141

First Quartile 10.0000 0.2323

Median 27.0000 0.5875

Third Quartile 61.0000 1.2980

Maximum 894.0000 16.1672

Sample size 9092 9092

Ljung-Box statistic 3815.6633 1362.7593


The Ljung-Box statistic is based on 50 lags; for a significance level of 5%, the tabulated critical value is 67.1671.


Table 2

Estimation results and specification tests


Parameter   Ordinary: Estimate (Stderr)   R variant: Estimate (Stderr)   R̄ variant: Estimate (Stderr)

ω       0.0147 (0.0021)   0.0184 (0.0028)   0.0139 (0.0020)
α1      0.0248 (0.0035)   0.0224 (0.0032)   0.0233 (0.0033)
β1      0.9715 (0.0047)   0.9725 (0.0045)   0.9713 (0.0047)
µ(1)    -                 -                 0.9241 (0.0438)
κ(1)    1.1699 (0.0182)   1.4220 (0.1801)   1.4652 (0.0513)
κ(2)    -                 2.7822 (0.0448)   2.4410 (0.1546)
σ(1)    0.3333 (0.0284)   0.3542 (0.1750)   0.3887 (0.0425)
σ(2)    -                 2.5387 (0.0395)   1.5921 (0.2256)
π(1)    -                 0.7252 (0.0228)   0.7238 (0.0249)

N       9092              9092              9092
ℓ       -8691.82          -8518.40          -8510.35
BIC     17429.21          17109.73          17102.75

ζ̄, p(T_µ(ζ))          0.4963, 0.2217    0.4984, 0.5972    0.4994, 0.8429
ŝ_ζ, p(T_σ(ζ))        0.2960, 0.0006    0.2880, 0.7618    0.2883, 0.8706
ζ̂_0.25, p(T_25(ζ))    0.2219, 0.0000    0.2543, 0.3433    0.2564, 0.1577
ζ̂_0.50, p(T_50(ζ))    0.4903, 0.0631    0.4887, 0.0310    0.4915, 0.1050
ζ̂_0.75, p(T_75(ζ))    0.7654, 0.0007    0.7500, 0.9915    0.7507, 0.8846
RT_ζ, p(RT_ζ)         248.4424, 0.0000  25.5817, 0.1423   30.7584, 0.0429
LB_ζ, p(LB_ζ)         54.2217, 0.3166   53.4368, 0.3437   52.4001, 0.3810


ℓ is the value of the incomplete log-likelihood function. BIC is the Bayesian information criterion, computed as −2 · ℓ + ln(N) · k, where k denotes the number of estimated parameters. Several descriptive statistics are given for the series of empirical integral transforms ζ: ζ̄ is the arithmetic mean and p(T_µ(ζ)) is the p-value of a test for E(ζ) = 0.5; ŝ_ζ is the standard deviation and p(T_σ(ζ)) is the p-value of a test for Var(ζ) = 1/12; ζ̂_0.25 is the 25 percent quantile and p(T_25(ζ)) is the p-value of a test for ζ_0.25 = 0.25; the analogous computations are done for the 50 and 75 percent quantiles. RT_ζ is the value of the ratio test for i.i.d. uniformity of ζ using 20 equal bins and p(RT_ζ) is the corresponding p-value. LB_ζ is the value of the Ljung-Box statistic for 50 lags and p(LB_ζ) is the corresponding p-value.


Figures

 

Fig. 1. Autocorrelation function for durations

 

Raw durations x̃_n    Adjusted durations x_n


Fig. 2. Histograms and QQ-plots for integral transforms

 

1-regime model 2-regime model


Fig. 3. Hazard function

 

 

Restrictive variant  Nonrestrictive variant 


Towards Measuring Process Model Granularity

via Natural Language Analysis

Henrik Leopold¹, Fabian Pittke¹,³, and Jan Mendling²

¹ Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany
henrik.leopold@wiwi.hu-berlin.de

² WU Vienna, Augasse 2-6, A-1090 Vienna, Austria
jan.mendling@wu.ac.at

³ SRH University Berlin, Ernst-Reuter-Platz 10, 10587 Berlin, Germany
fabian.pittke@srh-uni-berlin.de

Abstract. Nowadays business process modeling is an integral part of many organizations to document and redesign complex organizational processes. Particularly due to the large number of process models, quality assurance represents an important issue in many organizations. While many quality aspects are well understood and can be automatically checked with existing tools, there is currently no possibility to support modelers in maintaining a consistent degree of granularity. In this paper, we leverage natural language analysis in process models to introduce a novel set of metrics that indicate the granularity of process models. We evaluate the proposed metrics using two hierarchically organized process model collections from practice. Statistical tests demonstrate the expressive power of the proposed metrics.

Key words: Process Modeling, Model Granularity, Granularity Metrics

1 Introduction

Nowadays business process modeling is an integral part of many organizations. The use cases of business process modeling range from documentation to the redesign of complex organizational operations [1]. As a result of such initiatives, many companies face huge process model repositories, including, in extreme cases, up to thousands of models [2]. The sheer amount of process models motivates the need for techniques and concepts that ensure the efficient management and organization of these repositories.

One of the key concepts to keep track of the large number of process models is the introduction of a process architecture [3]. Such a process architecture represents an organized overview of the company’s business process models and their interrelations [4]. Therefore, process architectures typically define a hierarchy with different levels of granularity. While models on higher levels represent processes on a rather abstract level, models on lower levels illustrate fine-grained process details. A driving issue in this context is to define, to identify, and to maintain a consistent degree of detail on each level of a process architecture


[5]. Although there exist several guidelines on the proper design of process models [6, 7, 8], the provided recommendations on process model granularity are not very specific and do not support process modelers in deciding on the appropriate level of detail. As there is currently no sufficiently effective possibility of measuring the granularity of a process model, the decision about the appropriate level of detail is purely based on the subjective assessment of the modelers.

In this paper, we address the problem of measuring the granularity of process models. Our contribution is twofold: first, we propose a set of metrics that operationalize the concept of granularity for process models from a natural language perspective, and second, we evaluate the statistical power of each metric to indicate process model granularity. As a result, a modeler can be supported and guided during the modeling process in order to create models with consistent granularity.

The remainder of the paper is structured as follows. Section 2 reflects upon the concept of granularity in related research streams and proposes several perspectives for measuring process model granularity. Then, Section 3 introduces a set of granularity metrics based on these perspectives. Subsequently, Section 4 challenges these metrics against two model repositories from practice and evaluates their statistical significance. Finally, Section 5 concludes the paper.

2 Background

This section introduces the concept of granularity and relates it to the context of this paper. First, we provide a generic perspective on granularity before discussing other approaches to granularity from a variety of research streams. Finally, we discuss granularity in the context of process models and the associated perspectives.

2.1 On the Concept of Granularity

Granularity is one of the most basic concepts of human cognition: it decomposes a whole into smaller parts. The term granularity originates from the Latin word granum and refers to the property of being granular, i.e. consisting of smaller grains or particles¹. Zadeh [9] defines this concept as the construction, interpretation, and representation of granules, i.e., clumps of objects drawn together by indistinguishability, similarity, proximity, or functionality.

We can distinguish three different interpretations of the granularity of a system composed of a number of entities [10]. First, granularity is interpreted in terms of the gestalt effect. This theory states that human cognition of concepts is highly narrow and causes the effect that a collection of entities is perceived as a whole rather than a set of individual parts. Second, granularity can be interpreted in terms of generality and specificity which typically leads to a number of hierarchical levels of granularity used for classification purposes. Third,

1 see Oxford Online Dictionary for granularity and granular


granularity can be perceived by instantiation relationships. This is the case, when an object is created from another object by adding detail or, from the opposite viewpoint, when an object is linked to another by removing detail. Considering the different interpretations of the granularity concept, a variety of approaches exist that apply the granularity concept on different subjects.

2.2 Operationalizations of Granularity

The concept of granularity has been recognized in various research fields such as granular computing, information retrieval, service-oriented architectures, software and application design, text and language processing, and conceptual modeling. However, only a part of these works define metrics that operationalize granularity.

Particularly in the field of service-oriented architectures, many authors proposed strategies to measure granularity. This is due to the fundamental importance of granularity in this context. Many of these approaches consider the scope of a service by referring to its lines of code (LOC) or its function points [11, 12, 13, 14]. Krammer et al. [11] as well as Heinrich and Zimmermann [12] extend these approaches by a distance-oriented metric that explicitly considers the hierarchical position of a service in an architecture and by a size-oriented metric that considers the amount of directly and indirectly included services. This idea is also applied by Wang et al. to decompose software components into different layers [15].

In information retrieval, granularity concerns two subjects: the queries that retrieve documents, and the documents themselves which result from a given query. The approaches defined in [16] and [17] adhere to the second interpretation of granularity and formalize the specificity of a query in terms of the number of retrieved documents and their size. In contrast to this syntactic approach, Yan et al. [18] combine syntactic and semantic aspects of the document content. Specifically, the topical coverage of a document and its semantic associations among the domain concepts are used as an indicator for granularity.

Another research stream that applies the granularity concept is natural language processing (NLP). The main focus of NLP is the automated analysis of large natural language texts, which are also referred to as corpora. In this context, granularity is concerned with predicting the scope of a corpus, i.e. its topical coverage, or the specificity of a given text. In order to predict text specificity, Allen and Wu [19] propose a set of predictors, such as entropy and word co-occurrence, to determine the specificity of input terms with respect to a list of general and specific terms. To decide on the collection scope, Allen and Wu combine a semantic relatedness measure and the co-occurrence of words [20].

In accordance with prior research from other fields, granularity metrics are also adapted in the domain of process modeling. The need for embracing this concept arises from the hierarchical organization of process models in many process model collections. Typically, such a hierarchy contains coarse-grained models on the top level and more fine-granular models on the lower levels. Hence, the granularity of a process model is indicated by its position in the level hierarchy. To measure


process model granularity, different metrics have been introduced. Similarly to approaches from other fields, Holschke et al. use the number of model elements to discuss granularity [21]. Other authors make use of the number of meronymic relationships in a model [10]. Nevertheless, these approaches do not consider the special characteristics of process models provided by the combination of modeling elements and natural language elements. Therefore, such metrics are not sufficient to effectively guide modelers in reaching a consistent level of granularity.

2.3 Perspectives on Process Model Granularity

Aiming at measuring the granularity of process models, it is necessary to investigate the characteristics of process models in more detail. In particular, it is essential to understand process models as a combination of modeling language and natural language. Thus, both perspectives should be considered for process model granularity. Therefore, we follow a systematic approach and explore how process models convey real-world semantics.

Looking at a typical process model, we observe that it is composed of two basic elements. First, we identify different constructs from modeling languages such as BPMN or EPC. Generally, we can distinguish modeling constructs like activities, events, gateways, flow relations, and roles. In addition, these constructs are combined according to certain modeling language rules. Second, we identify natural language text labels that enrich the model constructs with the information necessary to fully capture real-world semantics. Therefore, both elements are required, as modeling constructs without natural language text labels would omit necessary details.

Following this line of argumentation, a process model combines two types of languages: a modeling language and a natural language. Both languages have a syntactic dimension and a semantic dimension. The syntactic dimension defines how the constructs of the modeling language or the words of the natural language can be combined. The semantic dimension specifies the meaning of the constructs of the modeling language or the meaning of the words of the natural language. The overall semantics of the model is thus given by combining the semantics of both languages as depicted in Figure 1. Accordingly, we can assess the granularity from four perspectives: the syntax of the modeling language, the semantics of the modeling language, the syntax of the natural language, and the semantics of the natural language.

As shown in the literature review, prior work on process model granularity primarily focused on the syntactic and semantic perspectives of the modeling language in process models [10, 21]. In this context, the number of process model nodes represents a syntactic aspect and the number of different constructs a semantic aspect. Obviously, an adequate consideration of natural language is missing. As natural language, however, represents an important share of the overall model semantics, it can be expected to be an informative source for assessing process model granularity. Therefore, we aim at leveraging the potential of natural language and focus on the syntactic and the semantic dimension of natural language.


Fig. 1. Perspectives on a process model, adapted from [22]

3 Conceptual Approach

In this section, we introduce a set of metrics indicating the granularity of a process model. Therefore, we consider a process model P containing a set of activities A. Each activity a ∈ A consists of three components: an action a_a, a business object a_bo, and an addition a_add. As an example, consider the activity Notify customer of problem, which consists of the action notify, the business object customer, and the addition of problem. Note that the business object and the addition of an activity can be empty. The automatic annotation of activities with these components can be accomplished with the technique defined in [23].

In the following, we define two classes of metrics. First, we specify metrics operationalizing the syntactic dimension of natural language. This includes aspects like the number of words per label, but also more informed characteristics such as the usage of label components. Second, we introduce metrics operationalizing the semantic dimension of natural language. Here, we consider aspects like the specificity of words. The following subsections introduce each metric in detail.

3.1 Metrics Building on Natural Language Syntax

Average Number of Words. As discussed by several works from prior research (e.g. [11, 12]), size is a commonly applied indicator for the granularity of an object. With the average number of words per label, we directly address this basic characteristic and adapt it to the nature of process model labels. Thus, we explicitly consider all words that form a label. The rationale is that coarse-grained


process models use fewer words in their activities than fine-grained ones. We define the average number of words as follows:

NoW = ( ∑_{a∈A} |a| ) / |A|   (1)

with |a| being the number of words of the activity a.

Average Number of Words per Business Object. As discussed earlier, activity labels are typically composed of action, business object, and addition. Conceivably, we can also apply size oriented measures to these components. Particularly business objects, being the central artifacts of activities, represent a valuable source of information in this context. Looking into the linguistic literature on how complex words are constructed, we can identify the concept of compounding [24]. Compounding is a word formation process that combines two words into a new and more specific word. For example, we can combine the words order and purchase to form the more specific word purchase order. Inspired by this concept, we consider the number of words of a business object as an indicator for word specificity. Accordingly, we introduce the following metric:

NoW_bo = ( ∑_{a∈A_bo} |a_bo| ) / |A_bo|   (2)

with |a_bo| being the number of words of the business object and A_bo representing all activities having a business object.

Average Number of Components. We already discussed that activities tend to become more specific with an increasing number of words. Using the label annotation technique [23], we gain another syntactic perspective on activity labels because we can explicitly distinguish between the three different components. Assuming that activities in fine-granular models tend to include more components than activities in coarser grained models, we can define the following metric:

NoC = ( ∑_{a∈A} cov(a) ) / |A|   (3)

with cov(a) being the number of components of activity a.

Maximum Business Object Count. In [25], the authors introduce the notion of a dominating business object, a business object that is, due to its importance for the process, mentioned more often than the other business objects. As a result, this particular business object is apparently discussed in more detail. Hence, we may conclude that models containing a dominating business object are more specific than models including a different business object in each activity. Following this line of argumentation, we introduce a measure that considers the maximal number of occurrences of a business object in a process model:

MaxC_bo = max { count(a_bo, A) | a_bo ∈ A }   (4)

 

Towards Measuring Process Model Granularity 7

where the function count returns the number of occurrences of the business object a_bo in the model A.

Share of Business Objects or Additions. As we can distinguish between different components, we can analyze the contribution of each component to the label as a whole. Since we require activity labels to have at least one action, we restrict ourselves to business objects and additions. Similar to the compounding approach, a label that encompasses a business object or an additional fragment is more specific than one that does not. Accordingly, process models with a considerably higher share of labels with business objects or additional fragments are more fine-grained, and vice versa. Thus, we can define the following two metrics:

SoC_bo = |{a_bo ∈ A | a_bo ≠ ∅}| / |A|   (5)

SoC_add = |{a_add ∈ A | a_add ≠ ∅}| / |A|   (6)
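A compact sketch of the syntactic metrics (1)-(6) in Python, assuming activities are given as (action, business object, addition) triples of space-separated strings with None for empty components; this representation is our assumption, not part of the paper:

from collections import Counter

def syntactic_metrics(activities):
    # activities: list of (action, business_object, addition) triples.
    n = len(activities)
    words = lambda s: len(s.split()) if s else 0
    bos = [bo for _, bo, _ in activities if bo]

    now = sum(words(c) for a in activities for c in a) / n          # eq. (1)
    now_bo = sum(words(bo) for bo in bos) / len(bos) if bos else 0  # eq. (2)
    noc = sum(sum(1 for c in a if c) for a in activities) / n       # eq. (3)
    maxc_bo = max(Counter(bos).values()) if bos else 0              # eq. (4)
    soc_bo = len(bos) / n                                           # eq. (5)
    soc_add = sum(1 for _, _, add in activities if add) / n         # eq. (6)
    return now, now_bo, noc, maxc_bo, soc_bo, soc_add

For the single activity Notify customer of problem, i.e. syntactic_metrics([("notify", "customer", "of problem")]), this returns 4.0, 1.0, 3.0, 1, 1.0 and 1.0.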

3.2 Metrics Building on Natural Language Semantics

Intrinsic Information Content of Business Objects and Actions. In the context of process model specificity, Friedrich [26] identified the taxonomy depth as an indicator for specificity. We seize this indication and extend it with the concept of information content [27]. More specifically, we use the intrinsic information content, which is based on the number of hyponyms (more specific concepts) of a term in a taxonomy [28]. Since the addition of an activity does not represent a term but a phrase, we calculate the intrinsic information content only for actions and business objects:

IIC_bo = ( ∑_{a_bo∈A} IIC_seco(a_bo) ) / |A|   (7)

IIC_a = ( ∑_{a_a∈A} IIC_seco(a_a) ) / |A|   (8)

where the function IICseco represents the intrinsic information content as defined in [28].

WordNet Coverage of Business Objects and Actions. The last set of metrics is motivated by work from information retrieval, in which the authors explicitly consider the scope of a query to decide on the granularity [16, 17]. Similar to these approaches, we measure the number of words that are not retrievable in a lexical database. We assume that specific words, such as compound words, are not covered and can therefore give valuable clues about the granularity of process models. Accordingly, we define:

WNC_bo = |{a_bo ∈ A | a_bo ∉ WordNet}|   (9)


WNC_a = |{a_a ∈ A | a_a ∉ WordNet}|   (10)
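The two semantic metrics can be sketched on top of NLTK's WordNet interface. The sense choice (first synset) and the noun-synset count used for normalization are our assumptions, following the intrinsic information content of Seco et al. [28]:

import math
from nltk.corpus import wordnet as wn

WN_NOUN_SYNSETS = 82115  # approximate number of noun synsets in WordNet 3.0

def intrinsic_ic(term, pos=wn.NOUN):
    # IIC_seco(c) = 1 - log(hypo(c) + 1) / log(max_wn): leaf concepts score 1,
    # concepts with many hyponyms score close to 0. Returns None if the term
    # is not covered by WordNet. The normalizing constant refers to nouns.
    synsets = wn.synsets(term, pos=pos)
    if not synsets:
        return None
    hypo = len(set(synsets[0].closure(lambda s: s.hyponyms())))
    return 1.0 - math.log(hypo + 1) / math.log(WN_NOUN_SYNSETS)

def wn_coverage(terms):
    # WNC, eq. (9)/(10): number of terms not retrievable in WordNet.
    return sum(1 for t in terms if t and not wn.synsets(t))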

4 Evaluation

In this section, we evaluate the applicability of the introduced metrics for measuring process model granularity. We use two hierarchically organized process model collections from practice and evaluate the statistical power of the metrics to indicate the hierarchy level of the considered process models. First, Section 4.1 gives a brief overview of the evaluation setup. Then, Section 4.2 introduces the process model collections we utilize. Finally, Section 4.3 presents the evaluation results.

4.1 Setup

To evaluate the proposed metrics, we implemented them in the context of a Java prototype. For the metrics WNCbo and WNCa, we accordingly employed the WordNet database [29]. To determine whether the metrics have sufficient statistical power to indicate the hierarchy level of a process model, we conducted a statistical test. Specifically, we first used the Kolmogorov-Smirnov test to evaluate whether the metric values are normally distributed. As this is not the case, we employed the Kruskal-Wallis test for determining the statistical significance. In particular, this test is suited for comparing the means of two or more non-normally distributed and independent samples. For these reasons, we preferred this test over other statistical tests, such as the Mann-Whitney or the t-test.
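With SciPy, this test cascade takes only a few lines; a sketch assuming the metric values are grouped per hierarchy level in NumPy arrays:

from scipy import stats

def level_significance(values_by_level, alpha=0.05):
    # Kolmogorov-Smirnov check for normality of each level's sample
    # (on standardized values), mirroring the evaluation setup.
    normal = all(
        stats.kstest(stats.zscore(v), "norm").pvalue > alpha
        for v in values_by_level
    )
    # Kruskal-Wallis test across the hierarchy levels.
    stat, pvalue = stats.kruskal(*values_by_level)
    return normal, stat, pvalue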

4.2 Test Collection Demographics

In order to demonstrate the applicability of the presented metrics, we employ two process model collections differing with respect to important characteristics including domain, model size, and the number of hierarchy levels:

o SAP Reference Model: The SAP Reference Model (SRM) captures the business processes of the SAP R/3 system in its version from the year 2000 [30, pp. 145-164]. It comprises 604 Event-driven Process Chains with in total 2433 activities covering 29 functional branches such as sales and accounting. The SAP Reference Model organizes the included process models in a hierarchy of two levels.

o Insurance Model Collection: The Insurance Model Collection (IMC) contains 349 EPCs dealing with the claims handling activities of a large insurance company. In total, the models include 1840 activities and hence are slightly bigger than the models from the SRM. The models are organized in a hierarchy of three different hierarchy levels.

Table 1 gives a summarizing overview of the characteristics of both process model collections.


Table 1. Characteristics of Employed Model Collections

Property                               SRM    IMC

No. of Hierarchy Levels                2      3
No. of Models on Level 1
No. of Models on Level 2
No. of Models on Level 3
No. of Activities                      2433   1840
Average No. of Activities per Model

 

4.3 Evaluation Results

Table 2 gives an overview of the obtained results. It shows the arithmetic mean of the metrics for each hierarchy level (AM), the respective standard deviation (STD), and the p-value from the statistical test. From the table, we can classify the introduced metrics into three categories: (1) metrics being statistically significant for both collections, (2) metrics being statistically significant for one collection, and (3) metrics that are not significant for any of the collections.

Metrics falling into the first category include the total number of words (NoW), the number of words of the business object (NoW_bo), the number of components (NoC), and the share of additions (SoC_add). The fact that these metrics are significant for both collections emphasizes that the degree of linguistic detail is a good indicator for the hierarchy level. Apparently, fine-granular models use more words to describe the process semantics. Interestingly, this cannot only be derived from the total number of words, but also from the label components. In particular, additions are more frequently used in models from the lower levels of the hierarchy (level ≥ 2). The significance of the metric NoW_bo also emphasizes that models from lower levels use more specific business objects. As an example, consider the business object customer service request, consisting of three words. Such a business object is typically only used in models belonging to lower hierarchy levels.

Metrics belonging to the second category include the share of business objects (SoC_bo), the intrinsic information content of actions (IIC_a), the WordNet coverage of business objects (WNC_bo), and the maximum number of business objects (MaxC_bo). In general, these differences can be explained by different styles of modeling. For the two semantic metrics IIC_a and WNC_bo, we obtain significant results for the three layered IMC. Apparently, the IMC models consistently use more specific words on the lower hierarchy levels. Further, they use a higher number of business objects on the lower levels; hence, the metric MaxC_bo is significant as well. For the SRM collection, only the share of business objects SoC_bo obtains significant results. From this constellation we may conclude that the SRM models rather use no business objects on the lower levels than less specific ones. Nevertheless, these results still emphasize that these metrics may represent valuable indicators for the granularity of a process model.


Table 2. Evaluation Results

Metric    Level 1 AM (STD)    Level 2 AM (STD)    Level 3 AM (STD)    p-value

SRM
NoW       2.73 (1.12)    3.66 (1.31)    -    < 0.01**
NoWbo     1.46 (0.57)    1.86 (0.79)    -    < 0.01**
NoC       1.83 (0.32)    2.07 (0.32)    -    < 0.01**
MaxCbo    1.39 (0.68)    1.40 (0.74)    -    0.88
SoCbo     0.73 (0.25)    0.89 (0.22)    -    < 0.01**
SoCadd    0.07 (0.15)    0.14 (0.27)    -    < 0.05*
IICbo     0.55 (0.30)    0.57 (0.28)    -    0.55
IICa      0.51 (0.25)    0.51 (0.22)    -    0.81
WNCbo     0.37 (0.89)    0.36 (0.85)    -    0.88
WNCa      0.04 (0.20)    0.04 (0.21)    -    0.77

IMC
NoW       3.58 (1.07)    3.94 (1.05)    4.98 (1.27)    < 0.01**
NoWbo     2.56 (1.03)    2.28 (0.90)    2.52 (0.95)    < 0.01**
NoC       1.98 (0.11)    2.17 (0.22)    2.35 (0.29)    < 0.01**
MaxCbo    1.06 (0.21)    1.37 (0.61)    1.38 (0.91)    0.04*
SoCbo     0.98 (0.11)    0.98 (0.07)    0.98 (0.93)    0.66
SoCadd    0.00 (0.00)    0.18 (0.22)    0.38 (0.29)    < 0.01**
IICbo     0.61 (0.41)    0.69 (0.20)    0.65 (0.20)    0.13
IICa      0.97 (0.16)    0.52 (0.26)    0.46 (0.24)    < 0.01**
WNCbo     0.18 (0.39)    0.82 (1.22)    1.03 (1.65)    < 0.01**
WNCa      0.00 (0.00)    0.00 (0.00)    0.01 (0.09)    0.63

* Metric is significant at the 0.05 level (2-tailed), ** Metric is significant at the 0.01 level (2-tailed).

Although many metrics turned out to be relevant indicators for at least one collection, we did not obtain significant results for the WordNet coverage of actions (WNC_a) and the intrinsic information content of business objects (IIC_bo). These results can be explained by a closer look at the WordNet dictionary. The non-significance of the metric WNC_a results from the fact that WordNet covers almost all verbs of the English language. Hence, non-covered actions represent an exception and cannot be used as a granularity indicator. The non-significance of the metric IIC_bo suffers from the opposite problem. As many business objects are compounded from two or more words, and are thus rather specific, they are not part of the WordNet dictionary. As a result, it is not possible to employ WordNet to calculate useful specificity values for business objects from a semantic perspective.

All in all, the evaluation highlights that many of the introduced metrics have the statistical power to explain the hierarchy level of a process model. This has two important implications for maintaining a consistent quality of process model collections. First, the introduced metrics can be used to evaluate the consistency of existing process model collections, and second, they can support modelers to achieve an adequate level of detail right from the start. Depending on the goals of an organization, modeling tools could be configured to indicate the desired


characteristics. This may include syntactic aspects, such as the existence of an addition, or semantic aspects, such as the specificity of actions and business objects.

5 Conclusion

In this paper, we introduced a novel set of metrics to measure the granularity of a process model. Specifically, we used natural language analysis techniques and exploited syntactic and semantic aspects of natural language to operationalize process model granularity. The evaluation with two hierarchically organized process model collections from practice illustrated that a major share of the metrics has statistically significant power to explain the hierarchy level of a process model. We highlighted that these results pave the way for evaluating and ensuring an adequate level of detail of process models. Considering the increasing size of process model collections in practice, this work represents an important contribution to the automated quality assurance of process models.

As this work is, to our knowledge, the first endeavour aiming at measuring process model granularity via natural language analysis, there are several directions for future work. First, we aim at investigating whether the proposed metrics can be combined into a single granularity indicator. Second, we plan to evaluate how the granularity assessment can support other process model techniques. For instance, process model matching could benefit from determining regions of differing granularity in the matching candidates. Finally, we plan to study to what extent the metrics can effectively guide users to create models with an adequate level of granularity.

References

1. Kettinger, W., Teng, J., Guha, S.: Business Process Change: a Study of Methodologies, Techniques, and Tools. MIS Quarterly (1997) 55–80

2. Rosemann, M.: Potential Pitfalls of Process Modeling: Part A. Business Process Management Journal 12(2) (2006) 249–254

3. Malinova, M., Leopold, H., Mendling, J.: An empirical investigation on the design of process architectures. In: Wirtschaftsinformatik. (2013) 75

4. Dijkman, R., Vanderfeesten, I., Reijers, H.A.: The road to a business process architecture: an overview of approaches and their use. Technical report, Working Paper WP-350, Eindhoven University of Technology (2011)

5. Becker, J., zur Mühlen, M.: Towards a classification framework for application granularity in workflow management systems. In: CAiSE'99. (1999) 411–416

6. Becker, J., Rosemann, M., Uthmann, C.: Guidelines of Business Process Modeling. In van der Aalst, W.M.P., Desel, J., Oberweis, A., eds.: Business Process Management. Models, Techniques, and Empirical Studies. Springer, Berlin (2000) 30–49

7. Mendling, J., Reijers, H.A., van der Aalst, W.M.P.: Seven Process Modeling Guidelines (7PMG). Information and Software Technology 52(2) (2010) 127–136

8. Krogstie, J., Sindre, G., Jørgensen, H.: Process models representing knowledge for action: a revised quality framework. Eur. J. Inf. Syst. 15(1) (2006) 91–102


9. Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 90(2) (1997) 111–127

10. Henderson-Sellers, B., Gonzalez-Perez, C.: Granularity in conceptual modelling: Application to metamodels. In: ER 2010. Volume 6412. (2010) 219–232

11. Krammer, A., Heinrich, B., Henneberger, M., Lautenbacher, F.: Granularity of services. Business & Information Systems Engineering 3(6) (2011) 345–358

12. Heinrich, B., Zimmermann, S.: Granularity metrics for it services. In: ICIS’12. (2012) 1–19

13. Kulkarni, N., Dwivedi, V.: The role of service granularity in a successful soa realization - a case study. In: SERVICES ’08. (2008) 423–430

14. Ma, Q., Zhou, N., Zhu, Y., Wang, H.: Evaluating service identification with design metrics on business process decomposition. In: SCC ’09. (2009) 160–167

15. Wang, Z.J., Zhan, D.C., Xu, X.F.: Stcim: a dynamic granularity oriented and stability based component identification method. SIGSOFT Softw. Eng. Notes 31(3) (2006) 1–14

16. He, B., Ounis, I.: Inferring query performance using pre-retrieval predictors. In: String Processing and Information Retrieval. (2004) 43–54

17. Plachouras, V., Cacheda, F., Ounis, I., Rijsbergen, C.J.V.: University of Glasgow at the Web track: Dynamic application of hyperlink analysis using the query scope. In: TREC'03. (2003) 636–642

18. Yan, X., Lau, R.Y., Song, D., Li, X., Ma, J.: Toward a semantic granularity model for domain-specific information retrieval. ACM Trans. Inf. Syst. 29(3) (2011) 1–46

19. Allen, R.B., Wu, Y.: Generality of texts. In: ICADL’02. (2002) 111–116

20. Allen, R.B., Wu, Y.: Metrics for the scope of a collection: Research articles. J. Am. Soc. Inf. Sci. Technol. 56(12) (2005) 1243–1249

21. Holschke, O., Rake, J., Levina, O.: Granularity as a cognitive factor in the effectiveness of business process model reuse. In: BPM. Volume 5701. (2009) 245–260

22. Leopold, H.: Natural Language in Business Process Models. PhD thesis, Humboldt-Universität zu Berlin (2013)

23. Leopold, H., Smirnov, S., Mendling, J.: On the refactoring of activity labels in business process models. Information Systems 37(5) (2012) 443 – 459

24. Bieswanger, M., Becker, A.: Introduction to English Linguistics. UTB für Wissenschaft: Uni-Taschenbücher. Francke (2010)

25. Leopold, H., Mendling, J., Reijers, H.: On the Automatic Labeling of Process Models. In: Proceedings of the 23rd international conference on Advanced Information Systems Engineering. Volume 6741 of LNCS., Springer (2011) 512–520

26. Friedrich, F.: Measuring semantic label quality using wordnet. In: EPK. (2009) 7–21

27. Resnik, P.: Using information content to evaluate semantic similarity in a taxonomy. In: IJCAI’95. (1995) 448–453

28. Seco, N., Veale, T., Hayes, J.: An intrinsic information content metric for semantic similarity in WordNet. Proc. of ECAI 4 (2004) 1089–1090

29. Miller, G.A.: WordNet: a Lexical Database for English. Communications of the ACM 38(11) (1995) 39–41

30. Keller, G., Teufel, T.: SAP(R) R/3 Process Oriented Implementation: Iterative Process Prototyping. Addison-Wesley (1998)

 

IJCSNS International Journal of Computer Science and Network Security, VOL.17 No.12, December 2017 91

 

Intelligent Method for Software Requirement Conflicts

Identification and Removal: Proposed Framework and Analysis

Maysoon Aldekhail and Djamal Ziani

King Saud University, Riyadh, Saudi Arabia

 

Summary

Requirement engineering has recently assumed a significant role in software engineering. In software development, requirements should be correct, complete and consistent; consistency means requirements without any conflicts or contradictions. Requirement consistency is a critical factor in project success, as any conflict may waste cost, time and effort. This paper proposes a novel intelligent approach to finding and solving conflicts in functional requirements. The approach works at two levels: a rule-based system to detect the conflicts in functional requirements, and the application of a genetic algorithm to resolve conflicts and optimize the set of functional requirements so as to produce minimum conflicts.

Key words:

Requirement conflicts, genetic algorithm, rule-based system

1. Introduction

Conflicts among requirements are a serious concern for project success. In requirement engineering, the term conflict covers interference, interdependency, and inconsistency between requirements [20]. In a recent research study [21], a very high number of conflicting requirements was identified among software projects; it reported discovering on the order of n² conflicts among n requirements. Another research study [19] has reported 40% to 60% of requirements to be in conflict. Previous studies have stated that one of the main reasons for high project cost and time is the failure to manage requirement conflicts [6]. To prevent repetition of all the phases, it is important to detect and resolve conflicts in the early phases of the project's lifecycle [15]. Many research studies have shown the risks of working with requirements that are in conflict with other requirements. These risks include schedule or budget overruns, which can lead to project failure; at the very least, they result in extra effort being expended. The requirement phase is the most critical phase of the software development cycle because the quality of the requirements affects the overall quality of the software. Wrong or incomplete requirements may cause an incomplete or incorrect project [3].

The literature review [2] demonstrated that most techniques proposed to decrease these risks and detect requirement conflicts are manual, which costs software engineers a great deal of time and effort, whereas the automated approaches are tools that still rest on human analysis. As such, they may incur costs to the project due to human error and wrong decision making. Moreover, most of the proposed approaches have not been evaluated to measure their rate of efficiency. No previous works have used Artificial Intelligence (AI) techniques to find or resolve conflicts. The application of AI techniques in Requirement Engineering (RE) is an emerging area of research that spans ideas across the two domains.

By applying an artificial intelligence technique to detect and resolve conflicts in requirements, much of the manual analysis is replaced and a lot of time and effort is saved for engineers. Additionally, this increases the quality of the requirements analysis, which in turn provides more accurate results in detecting and resolving conflicts. Moreover, an artificial intelligence technique is not subject to human lapses in judgment and thus reduces the costs the project would have incurred due to human error and incorrect decisions. Artificial intelligence techniques are self-learning and evolving; they will provide better solutions and make reuse easier. In addition, they can reduce the cost of hiring experts in requirement management.
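To make the two-level idea outlined in the summary concrete, the following Python sketch is a toy illustration only: a minimal genetic algorithm searches for a selection of functional requirements that keeps as many as possible while minimizing conflicts flagged by a rule engine. The fitness weights, operators and conflict representation are our assumptions, not the framework proposed in this paper.

import random

def count_conflicts(selected, conflict_pairs):
    # conflict_pairs: index pairs that a rule-based detector has flagged.
    return sum(1 for i, j in conflict_pairs if selected[i] and selected[j])

def fitness(selected, conflict_pairs):
    # Reward keeping requirements, penalize every remaining conflict.
    return sum(selected) - 10 * count_conflicts(selected, conflict_pairs)

def genetic_search(n_reqs, conflict_pairs, pop_size=30, generations=100):
    # Each individual is a 0/1 vector marking the retained requirements.
    pop = [[random.randint(0, 1) for _ in range(n_reqs)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, conflict_pairs), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n_reqs)   # one-point crossover
            child = a[:cut] + b[cut:]
            k = random.randrange(n_reqs)        # point mutation
            child[k] = 1 - child[k]
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda s: fitness(s, conflict_pairs))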

The structure of the rest of the paper is as follows: Section 2 provides a description of requirement conflicts and the current research on detecting them, while Section 3 presents the current requirement conflict resolution techniques and their critique. Section 4 offers a review of rule-based systems and their application in requirements engineering. An overview of genetic algorithms and their different applications in software engineering is provided in Section 5. The following section expands on the new approach and the potential benefits of applying this method. Finally, we conclude with recommendations for the future in Section 7.

2. Requirement Conflict Identification

Successful development of software systems requires complete, consistent and clear-cut requirements. Conflicting requirements are a problem that occurs when a requirement is inconsistent with another requirement [28]. Kim, Park, Sugumaran and Yang provide a useful definition


of requirements conflict: "The interactions and dependencies between requirements that can lead to negative or undesired operation of the system." [18]. Aldekhail, Chikh and Ziani provided general classifications of requirements conflicts based on the types of requirements, functional and non-functional [2]. An example of a conflict between non-functional requirements is security (privacy metric) versus usability (ease of function learning metric), which calls for a compromise; the developer must choose an acceptable solution that strikes the right balance of attributes.

Many research studies try to find new methods to define and detect conflicts between requirements. The paper [2] provides an overview of the previous research conducted in this area; it has analysed and classified twenty-two different techniques into different categories. The categorization is as follows:

The first classification is based on the method that is used to identify the conflicts, either manually by the requirement engineers or automatically using software tools.

The second classification is focused on the type of requirements that the technique will be applied to: functional or non-functional requirements.

The third classification determines the scope of the proposed approach: whether it covers only the detection problem, whether it also analyses the detected conflicts into different conflict types, and whether it offers a resolution technique.

The fourth classification is based on the representation used for the requirements: whether the technique uses a particular formalization, structures the requirements in a particular model, or uses an ontology.

The literature review [2] has established that most techniques proposed to detect requirement conflicts are manual techniques that take extensive time as well as effort and may cause delays in the project. In addition, they are considered fallible, since human effort is involved. Some conflict detection techniques include tools that try to automate the detection process, which decreases the human effort and time. However, all the automated approaches are still based on human analysis to detect and resolve conflicts.
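Purely for illustration (this is not one of the twenty-two surveyed techniques), a rule-based detector can be reduced to predicates applied to every pair of structured requirements; the attributes and the single rule below are hypothetical:

def conflicting(r1, r2):
    # Hypothetical rule over structured requirements, e.g. dictionaries like
    # {"actor": ..., "action": ..., "object": ..., "modality": "shall"/"shall not"}.
    same_subject = (r1["actor"] == r2["actor"] and r1["action"] == r2["action"]
                    and r1["object"] == r2["object"])
    opposite_modality = {r1["modality"], r2["modality"]} == {"shall", "shall not"}
    return same_subject and opposite_modality

def detect_conflicts(requirements):
    # Pairwise scan; returns the index pairs flagged by the rules.
    return [(i, j) for i in range(len(requirements))
            for j in range(i + 1, len(requirements))
            if conflicting(requirements[i], requirements[j])]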

 

3. Requirement Conflict Resolution

In order to provide a complete picture of how conflicts are solved in practice, different techniques have been proposed by experts and software engineers. Described below are some techniques used to solve conflicts between requirements.

Sameer Abufardeh from the University of Minnesota Crookston offered a few techniques from his experience and from the literature [1]:

Rethinking the requirements: going back to the sources of the conflicting requirements, trying to understand them and, thereafter, addressing them differently.

Getting all the stakeholders in one place, making them discuss and analyze the trade-offs among the conflicting requirements, and coming up with a prioritization process with regard to the value to the project, cost, time, etc.

Trying to replace two or more conflicting requirements with a single one that addresses the goals of the conflicting requirements.

On the other hand, Samuel Sepúlveda from the Universidad de La Frontera offered other options, as follows [1]:

Using group techniques such as focus groups, brainstorming, the KJ method, workshops, etc.

Using a win-win model.

Using GORE and i* diagrams to share goals and objectives with the stakeholders.

David Espina proposed deploying a prioritization method that scores each requirement with regard to its value, cost, and risk for the organization [10].

Jeff Grigg suggests prioritizing the business goals and then tracing each requirement back to the business goal it is trying to achieve; the requirement that traces back to a higher-priority business goal is assigned the higher priority [12]. When conflicts are detected, negotiation for conflict resolution can be conducted either by selecting alternatives or by re-evaluating priorities.

Papa, Daniels, and Spiker presented some management methods used for negotiation and conflict resolution, such as theory x, where the managers are responsible for resolving conflicts between employees and for deciding what to report to upper management [22].


However, this method is not very popular because employees view it negatively.

Another theory, called theory y, gives the whole responsibility for resolving conflicts to the employees. Theory z extends theory y: goals are set for employees before they start working, and the employees should always work towards achieving them. Finally, theory w works through mutual consideration and, by its nature, makes everyone a winner.

The pairwise comparison method (PCM) was mentioned in [14] as a means of requirements conflict resolution. It uses a matrix in which the requirements are listed with their priorities for each stakeholder.

Another technique is the win-win model, which builds on theory w and the idea that everyone is a winner. It has four steps: identifying conflict issues, exploring options in architectural strategies, reaching agreements, and eliciting win-win conditions.

Finally, we can see that conflict resolution rests on two main techniques: negotiation between stakeholders, and prioritization of requirements based on business goals and objectives. However, resolution techniques are lacking: all the existing ones are manual and are usually mere guidelines to help software engineers fix problems, and no work resolves the conflicts automatically. Also, most of them are proposed techniques whose efficiency in detecting and resolving conflicts has not been evaluated.

By studying the limitations of previous works, this research proposes applying artificial intelligence techniques to fill this gap in the area of requirements conflicts.

4. Rule-Based Systems

A rule-based system is the simplest form of artificial intelligence; it uses rules as a way of representing the knowledge saved in the knowledge base [13]. A rule-based system builds on the expert-system idea of mimicking the reasoning behind a human expert's decisions in problem solving and decision making.

The research on applying rule-based systems in requirements engineering has mostly used them for verification purposes. Wang, Bai, Cai, and Yan presented a rule-based expert system to help evaluate software quality, and their evaluation results showed an improvement in design efficiency [32]. Chan et al. proposed a new requirement modelling approach called rule-based behaviour engineering to formally model requirements, and provided a tool for communication among stakeholders [6]. Dzung and Ohnishi proposed a method that uses a rule-based system to verify the correctness of a requirement ontology [7]; as the size of an ontology increases, it becomes difficult to check the accuracy of the information stored in it.

5. Computational Intelligence and Genetic Algorithm

Computational Intelligence (CI) is a sub-branch of artificial intelligence, also known as soft computing. It refers to the ability of a computer to learn a specific task from data or experimental observation [23]. Computational intelligence has several paradigms: neural networks, evolutionary algorithms, swarm intelligence, and fuzzy systems [9]. Figure 1 displays the CI paradigms, with evolutionary algorithms (evolutionary computing) as a subdivision of soft computing:

 

Fig. 1 Computational intelligence techniques.

The objective of evolutionary algorithms is to mimic the process of natural evolution, where the main idea is survival of the fittest and the eventual dying out of the weak [8]. Evolution is an optimization process whose goal is to improve the ability of a system to survive in a dynamically changing and competitive environment [16].

In the domain of search techniques, evolutionary algorithms are a family of stochastic search techniques that mimic the natural evolution proposed by Charles Darwin in 1858. The following classification (Figure 2) indicates the position of evolutionary algorithms in the area of search techniques:


 

Fig. 2 Search techniques.

A genetic algorithm is a search-based optimization technique based on the principles of genetics and natural selection. It is usually used in optimization problems where a given objective function value must be maximized or minimized under a given set of constraints [17]. A genetic algorithm starts with guesses and attempts to improve those guesses through evolution. It is one of the most powerful methods for quickly creating high-quality solutions to a problem [27].

Some basic terminology used when working with genetic algorithms [5] is as follows:

Search space – all possible solutions to the specific problem.

Population – a subset of all the possible solutions to the given problem.

Chromosome – one such solution to the given problem.

Gene – one element position of a chromosome.

Allele – the value a gene takes for a particular chromosome.
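
To make this terminology concrete, the following minimal Python fragment shows one possible encoding for a requirements problem; the attribute names and values are illustrative assumptions, not part of [5]:

# Hypothetical encoding of the GA terms above for a requirements problem.
chromosome = ["high", 30, "encrypted"]            # one candidate solution
gene = chromosome[1]                              # one element position (here, a timeout attribute)
allele = 30                                       # the value that gene takes in this chromosome
population = [chromosome, ["low", 10, "plain"]]   # a subset of the search space
# The search space is every combination of attribute values the domain allows.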

According to Goodman, GA essentially includes the following [11]:

1. Representation of a solution, called a chromosome; this is represented in a specific data structure or in binary.

2. An initial set of solutions; the initial population is usually built randomly.

3. The fitness function; measures how well any proposed solution meets the objective.

4. The selection function; selects which chromosomes will participate in the next evolution phase.

5. The crossover operator; used to reproduce new chromosomes by exchanging genes between two chromosomes.

6. The mutation operator; changes a gene in a chromosome and, in turn, creates a new chromosome.

7. The termination condition; determines when a genetic algorithm run will stop.

Pseudo-code for a basic genetic algorithm is as follows [30]:

GA()

Initialize population

Find fitness of population

While (termination criteria not reached) do

Parent selection

Crossover with probability pc

Mutation with probability pm

Decode and fitness calculation

Survivor selection

Find best

Return best
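
As an illustration, the pseudo-code above can be fleshed out into the following minimal Python sketch. The binary chromosome encoding, the placeholder fitness function, and the parameter values (population size, pc, pm, generation budget) are assumptions for demonstration, not the implementation evaluated in this paper:

import random

POP_SIZE, PC, PM, MAX_GEN = 50, 0.8, 0.05, 200

def fitness(chrom):
    # Placeholder objective to be maximized; replace with a problem-specific measure.
    return sum(chrom)

def select(pop):
    # Tournament selection of size 2.
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover with probability pc.
    if random.random() < PC:
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def mutate(chrom):
    # Flip each binary gene with probability pm.
    return [g ^ 1 if random.random() < PM else g for g in chrom]

def ga(chrom_len=20):
    # Initialize the population randomly.
    pop = [[random.randint(0, 1) for _ in range(chrom_len)] for _ in range(POP_SIZE)]
    for _ in range(MAX_GEN):                      # termination criterion: generation budget
        nxt = []
        while len(nxt) < POP_SIZE:
            c1, c2 = crossover(select(pop), select(pop))
            nxt += [mutate(c1), mutate(c2)]
        pop = nxt[:POP_SIZE]                      # survivor selection: full replacement
    return max(pop, key=fitness)                  # return the best chromosome found

print(ga())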

Figure 3 below presents the flowchart of the basic genetic algorithm.

 

Fig. 3 Flowchart of basic GA.


Genetic algorithms have mostly been applied to scheduling and optimization problems and to search problems such as the travelling salesman problem (TSP) [24]. In [25], a genetic algorithm technique was used for conflict identification and resolution among project activities.

[26] and [27] both present lists of applications of GAs in software engineering and the benefits of applying them. Many research studies have used GAs in project effort and time estimation, one of the most challenging aspects of software development, with very positive results. Other studies have used GAs to help measure system performance by applying them to software metrics in design, coding, quality, reliability and maintenance. GAs have also produced very good results in software testing.

Sharma, Sabharwal and Sibal demonstrated that GAs have been used in all types of software testing: functional testing, model-based test case generation, regression testing, and object-oriented unit testing, as well as black-box testing [29]. Software testing is laborious and time-consuming work; it consumes almost 50% of software system development resources [31]. Research has shown that this percentage generally decreases when AI techniques, and especially GAs, are applied.

6. Proposed Intelligent Conflict Identification and Removal Framework

As the previous sections have shown, there are limitations in the previous works on requirements conflict identification and resolution, and there is a need for applying an AI technique in this area, with clear benefits. We therefore work on functional requirements. The proposed solution is divided into two parts:

1. Defining requirements conflicts: The proposed solution is to build a rule-based system in if-then form, based on discussions with experts in requirements engineering regarding the definition of conflicts between two functional requirements. A set of rules will be defined, and these rules will determine when requirements are in conflict.

2. Resolving requirements conflicts: We search for an optimum solution among alternative solutions, which is exactly what optimization techniques do: an optimization algorithm searches for an optimal solution through an iterative process. We will use a genetic algorithm to resolve the conflicts in requirements intelligently.

Figure 4 below shows the basic structure of the proposed model for detecting and resolving requirements conflicts:

 

 

Fig. 4 Basic structure of proposed approach.

6.1 Algorithm for Proposed Approach

The main steps in the proposed approach are as follows:

Part A:

1. Develop the rule-based system (if-then-else) to detect conflicts between functional requirements.

2. Read the input (the functional requirements set) from an Excel file.

3. Calculate the number of conflicts in the original functional requirements set, and list the functional requirements that are in conflict together with the number of the rule that detected each conflict.
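
A minimal sketch of part A is given below, assuming the requirements sheet has columns such as object, access, max_delay and min_delay; both the column names and the two example rules are hypothetical stand-ins for the if-then rules that would be elicited from experts:

import pandas as pd

def rule1(r1, r2):
    # Hypothetical rule: same data object accessed with contradictory access levels.
    return r1["object"] == r2["object"] and r1["access"] != r2["access"]

def rule2(r1, r2):
    # Hypothetical rule: one requirement demands a delay the other makes impossible.
    return r1["object"] == r2["object"] and r1["max_delay"] < r2["min_delay"]

RULES = [rule1, rule2]

def count_conflicts(frame):
    # Check every pair of functional requirements against every rule.
    conflicts, rows = [], frame.to_dict("records")
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            for k, rule in enumerate(RULES, start=1):
                if rule(rows[i], rows[j]):
                    conflicts.append((i, j, k))   # conflicting pair and detecting rule number
    return conflicts

reqs = pd.read_excel("requirements.xlsx")         # assumed input file
for i, j, k in count_conflicts(reqs):
    print(f"FR{i} conflicts with FR{j} (detected by rule {k})")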

Part B:

1. Build the initial population randomly by generating attribute values within the domain.

2. Find Fitness

3. Apply the genetic algorithm until the solution with the least conflicts is found.

The algorithm for the proposed model is as follows:

Start

Get Input from Excel

Calculate conflict

The initial population is built randomly by generating values for the attributes within the domain.

Run Fitness (Conflict on each solution)

Select Solutions for GA

Apply Crossover between FRs

Apply Mutation between FRs

Repeat Fitness to Mutation until the Stopping Criteria are met

Stop
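
The link between the two parts is the fitness function. A sketch, assuming a count_conflicts detector like the one outlined for part A above, is:

def conflict_fitness(candidate, count_conflicts):
    # Fitness used in part (B): the fewer rule violations the rule-based
    # detector reports for a candidate requirements set, the fitter it is.
    return -len(count_conflicts(candidate))

# The GA run can then stop early once a candidate reaches fitness 0
# (no rule fires), or when the generation budget is exhausted.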


Figures 5 and 6 below show the flowchart for each part of the proposed model:

 

Fig. 5 Flowchart of part (A), Rule-based system.

 

Fig. 6 Flowchart of part (B), Intelligent system.

6.2 Theoretical Analysis of Proposed Approach

By applying the proposed approach of combining a rule-based system and a genetic algorithm, we expect to obtain the benefits of an automated process for finding conflicts between requirements that intelligently simulates the work of an expert. This will reduce human error and the time and effort of the software engineers.

 

We also expect positive results from applying the genetic algorithm: reducing the number of conflicts as far as possible to reach an optimal solution and resolve the conflicts. Furthermore, previous research studies that applied AI techniques to requirements problems in the requirements engineering field have reported positive results.

Using a genetic algorithm to resolve requirements conflicts will automate the task intelligently and increase the quality of software development, because it removes manual input and thereby provides accurate results in defining and solving conflicts. It will also eliminate the emotional side of resolving conflicts between different stakeholders and save the costs of human error and inefficient decisions. Artificial intelligence techniques are self-learning and self-improving, which allows them to be reused and thereby reduces the cost of hiring experts.

7. Conclusion and Future Works

Working with inconsistent requirements costs a project dearly in time and effort and can eventually lead to project failure. This novel approach proposes using artificial intelligence techniques to define and resolve conflicts in functional requirements: a rule-based system is used to identify the conflicts, and a genetic algorithm is employed to resolve them and produce a set of functional requirements with a minimum number of conflicts. Applying artificial intelligence techniques would increase project efficiency and quality and reduce human effort and errors. In future work, the proposed approach will be tested on different sets of functional requirements from different projects.

Acknowledgements

The authors would like to thank the College of Computer and Information Sciences and the Research Center at King Saud University for their sponsorship.

References

[1] S. Abufardeh and S. Sepúlveda. “How to Deal with Stakeholders Conflicts in Requirements Gathering?” https://www.researchgate.net/post/How_to_deal_with_stakeholders_conflicts_in_requirements_gathering2, accessed Sept. 2, 2016.

[2] M. Aldekhail, A. Chikh and D. Ziani. “Software Requirements Conflict Identification: Review and Recommendations.” IJACSA, 7(10), pp. 326 – 335, 2016.


[3] A.A. Alshazly, A.M. Elfatatry and M.S. Abougabal. “Detecting defects in software requirements specification.” AEJ, 53, pp. 513 – 527, 2014.

[4] W. H. Butt, S. Amjad and F. Azam. Requirement conflicts resolution: using requirement filtering and analysis. New York, U.S.A.: Springer. 383 – 397, 2011.

[5] J. Carr. “An Introduction to Genetic Algorithms.” Senior Project, pp. 1 – 40. https://www.whitman.edu/Documents/Academics/Mathematics/2014/carrjk.pdf, accessed Aug. 21, 2015.

[6] L.W. Chan, R. Hexel and L. Wen. “Rule-based behaviour engineering: Integrated, intuitive formal rule modelling.” Proc. 22nd Australian Software Engineering Conf. (ASWEC), Hawthorne, Victoria, Australia, pp. 20 – 29, June 2013.

[7] D.V. Dzung, and A. Ohnishi, “Customizable rule-based verification of requirements ontology.” Proc. IEEE 1st Int. Workshop on AIRE. Karlskrona, Sweden. pp. 19 – 26, August 2014.

[8] A.A. El-Sawy, M.A. Hussein, E.S.M. Zaki and A.A. Mousa. “An Introduction to Genetic Algorithms: A Survey A Practical Issues.” Int. Journal of SER, 5, pp. 252 – 262, 2014.

[9] A.P. Engelbrecht. Computational intelligence: An Introduction. John Wiley & Sons, 2007.

[10] D. Espina. “How Do You Manage Conflicting Stakeholder Demands?” http://pm.stackexchange.com/questions/1399/how-do-you-manage-conflicting-stakeholder-demands, accessed Nov. 26, 2016.

[11] E.D. Goodman. “Introduction to Genetic Algorithms.” Proc. Companion Publication of the 2014 Annu. Conf. on Genetic and Evolutionary Computation. Vancouver, BC, Canada: ACM. pp. 205 – 226, July 2014.

[12] J. Grigg. “Conflicting Requirements.” http://c2.com/cgi/wiki?ConflictingRequirements, accessed Nov. 10, 2016.

[13] C. Grosan and A. Abraham. “Rule-Based Expert Systems.” Intelligent Systems. Intelligent Systems Reference Library, (17). Springer, Berlin, Heidelberg, pp. 149 – 185, 2011.

[14] F. Hameed and M. Ejaz. “Model for Conflict Resolution in Aspects within Aspect Oriented Requirement Engineering” (Master’s thesis). http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.476.4292&rep=rep1&type=pdf, accessed Aug. 6, 2016.

[15] M. Heisel and J.A. Souquières. “Heuristic Algorithm to Detect Feature Interactions in Requirements.” In: S. Gilmore and M. Ryan, eds., Language Constructs for Describing Features. Springer, London, pp. 143 – 162, 2001.

[16] M.A. Iqbal, N.K. Khan, M.A. Jaffar, M. Ramazan and A.R. Baig. “Opposition Based Genetic Algorithm with Cauchy Mutation for Function Optimization.” Proc. 2010 Int. Conf. on Information Science and Applications. Hotel Rivera, Seoul, South Korea, pp. 1 – 7, April 2010.

[17] H. Jiang. “Can the Genetic Algorithm Be a Good Tool for Software Engineering Searching Problems?” Proc. 30th Annu. Int. COMPSAC. Chicago, USA, 2, pp. 362 – 366, September 2006.

[18] M. Kim, S. Park, V. Sugumaran and H. Yang. “Managing requirements conflicts in software product lines: A goal and scenario based approach.” Data & Knowledge Engineering, 61, pp. 417 – 432, 2007.

 

[19] D. Mairiza and D. Zowghi. “An ontological framework to manage the relative conflicts between security and usability requirements.” Proc. 3rd Int. Workshop on MARK, Sydney, Australia, pp. 1 – 6, September 2010.

[20] D. Mairiza, D. Zowghi and N. Nurmuliani. “Managing conflicts among non-functional requirements.” In L. Nguyen, D. Randall and D. Zowghi, eds., ARWE 2009 Proc. 12th AWRE, Sydney, Australia: University of Technology, Sydney, October 2009.

[21] T. Moser, D. Winkler, M. Heindl and S. Biffl. “Requirements Management with Semantic Technology: An Empirical Study on Automated Requirements Categorization and Conflict Analysis.” Proc. from 23rd Int. Conf. AISE, Berlin, Springer-Verlag, Berlin, Heidelberg, pp. 3 – 17, June 2011.

[22] M.J. Papa, T.D. Daniels and B.K. Spiker. Organizational communication perspectives and trends. SAGE Publications, Inc, 2008.

[23] H.M. Pandey. “Solving lecture time tabling problem using GA.” Proc. 6th Int. Conf. – Cloud System and Big Data Engineering, Noida, India, Amity University, Department of Computer Science & Engineering, pp. 45 – 50, January 2016.

[24] R. Raghavjee and N. Pillay. “An Application of Genetic Algorithms to the School Timetabling Problem.” Proc. 2008 Annu. Research Conf. of the South African Institute of Computer Scientists and Information Technologists on IT Research in Developing Countries: Riding the Wave of Technology. Wilderness, South Africa. New York, U.S.A.: ACM, pp. 193 – 199, October 2008.

[25] M. Ramazan, M.A. Iqbal, M.A. Jaffar, A. Rauf, S. Anwar and A.A. Shahid. “Project Scheduling Conflict Identification and Resolution Using Genetic Algorithms.” Proc. Int. Conf. on Information Science and Applications. Hotel Rivera, Seoul, South Korea, pp. 1 – 6, April 2010.

[26] P. Reena, K. Bhatia. “Application of Genetic Algorithm in Software Engineering: A Review.” IRJES, 6, pp. 63–69, 2017.

[27] Samriti. “Applications of Genetic Algorithm in Software Engineering, Distributed Computing and Machine Learning.” IJCAIT, vol. 9, no. 2, 2016.

[28] B. Schär. “Requirements Engineering Process HERMES 5 and SCRUM.” (Master’s thesis). University of Applied Sciences and Arts, Northwestern Switzerland, 2015.

[29] C. Sharma, S. Sabharwal and R. Sibal. “Applying genetic algorithm for prioritization of test case scenarios derived from UML diagrams.” IJCSI, vol. 8, no. 3, pp. 433 – 444, 2011

[30] S.N. Sivanandam and S.N. Deepa. Introduction to genetic algorithms. Berlin: Springer-Verlag Berlin Heidelberg, 2007.

[31] P.R. Srivastava and T. Kim. “Application of genetic algorithm in software testing.” IJSEA, vol. 3, no. 4, pp. 87 – 96, 2009.

[32] X. Wang, Y. Bai, C. Cai, and X. Yan. “A production rule-based knowledge system for software quality evaluation.” Proc. 2nd Int. Conf. ICCET, Chengdu, China., 6, pp. 208 – 211, April 2010.


Maysoon Aldekhail has been a PhD candidate at King Saud University in the Computer Sciences and Information Systems College since 2013. She received her Master’s degree in Information Systems from King Saud University, Saudi Arabia in 2009. Her current professional occupation is Lecturer at the Information System Department at the College of Computer and Information Sciences in Al-Imam University in Riyadh, Saudi Arabia. Her research interests include Requirements Engineering and ERP.

Djamal Ziani has been an associate professor at King Saud University in the Computer Sciences and Information Systems College since 2009. He is also a researcher in ERP and in the data management group of CCIS, King Saud University. He received a Master’s degree in Computer Sciences from the University of Valenciennes, France in 1992, and a Ph.D. in Computer Science from the University of Paris Dauphine, France in 1996. He has been a consultant and project manager in various companies in Canada, such as SAP, Bombardier Aerospace, and Montreal Stock Exchange from 1998 to 2009.

 

www.cs.brandeis.edu/~dcc

PROGRAM

Data Compression Conference (DCC 2016)

Sponsored by U. Arizona, Brandeis U., Microsoft Research, and the IEEE Signal Processing Society

Proceedings published by IEEE Computer Society Conference Publishing Services (CPS)

Snowbird, Utah, March 29 - April 1, 2016

PROGRAM COMMITTEE

Michael W. Marcellin, University of Arizona (DCC Co-Chair)

James A. Storer, Brandeis University (DCC Co-Chair)

Ali Bilgin, University of Arizona (Committee Co-Chair)

Joan Serra-Sagrista, U. Autonoma de Barcelona (Committee Co-Chair)

Henrique Malvar, Microsoft Research (Publications Chair)

James E. Fowler, Mississippi State University (Publicity Chair)

Alberto Apostolico (honorary member), Georgia Institute of Technology

Charles D. Creusere, New Mexico State University

Travis Gagie, University of Helsinki

Hamid Jafarkhani, University of California Irvine

Yuval Kochman, Hebrew University

Alistair Moffat, The University of Melbourne

Giovanni Motta, Google, Inc.

Gonzalo Navarro, University of Chile

Antonio Ortega, University of Southern California

Jan Ostergaard, Aalborg University

Majid Rabbani, Eastman Kodak Co.

Yuriy Reznik, InterDigital, Inc.

Thomas Richter, University of Stuttgart

Victor Sanchez, University of Warwick

Serap Savari, Texas A&M University

Khalid Sayood, University of Nebraska

Rahul Shah, Louisiana State University

Dana Shapira, Ariel University

Ofer Shayevitz, Tel Aviv University

Dafna Sheinwald, IBM Haifa Lab

Gary J. Sullivan, Microsoft Corporation

Jiangtao Wen, Tsinghua University

Ji-Zheng Xu, Microsoft Research

En-Hui Yang, University of Waterloo

Yan Ye, Interdigital, Inc.

SCHEDULE OVERVIEW:

Tuesday Evening, March 29:

Registration and Reception (7pm - 10pm)

Wednesday, March 30:

Morning: Technical Sessions 1, 2 (8:00am - 12:20pm)

Mid-Day: Panel Presentation (2:00pm - 3:30pm)

Afternoon: Technical Sessions 3, 4 (4:00pm - 7:20pm)

Thursday, March 31:

Morning: Technical Sessions 5, 6 (8:00am - 12:20pm)

Mid-Day: Keynote Presentation (2:00pm - 3:00pm)

Afternoon: Technical Session 7 (3:20pm - 5:40pm)

Evening: Poster Session and Reception (6:00pm - 8:30pm)

Friday, April 1:

Morning & Mid-Day: Technical Sessions 8,9,10 (8:00am - 2:40pm)

 

TUESDAY EVENING

Registration / Reception, 7:00-10:00pm (Golden Cliff Room)

WEDNESDAY MORNING

SESSION 1, Compressed Data Structures, Part 1

8:00am: Lempel-Ziv Computation in Compressed Space (LZ-CICS) 3

Dominik Köppl1 and Kunihiko Sadakane2

1TU Dortmund, 2University of Tokyo

8:20am: Linear Time Succinct Indexable Dictionary Construction with Applications 13

Guy Feigenblat1,2, Ely Porat1, and Ariel Shiftan1,3

1Bar-Ilan University, 2IBM Research, 3NorthBit

8:40am: Computing LZ77 in Run-Compressed Space 23

Alberto Policriti1,2 and Nicola Prezza1

1University of Udine, 2Institute of Applied Genomics

9:00am: Parallel Lightweight Wavelet Tree, Suffix Array and FM-Index Construction 33

Julian Labeit1, Julian Shun2, and Guy E. Blelloch3

1Karlsruhe Institute of Technology, 2UC Berkeley, 3Carnegie Mellon University

9:20am: Induced Suffix Sorting for String Collections 43

Felipe A. Louza1, Simon Gog2, and Guilherme P. Telles1

1University of Campinas, 2Karlsruhe Institute of Technology

9:40am: Faster, Minuter 53

Simon Gog1, Juha Kärkkäinen2, Dominik Kempa2, Matthias Petri3, and Simon J. Puglisi2

1Karlsruhe Institute of Technology, 2University of Helsinki,

3University of Melbourne

10:00am: A Space Efficient Direct Access Data Structure 63

Gilad Baruch1, Shmuel T. Klein1, and Dana Shapira2

1Bar-Ilan University, 2Ariel University

Break: 10:20am - 10:40am

SESSION 2, Recent Developments in Video Coding, Part 1

10:40am: Enhanced Multiple Transform for Video Coding 73

Xin Zhao, Jianle Chen, Marta Karczewicz, Li Zhang, Xiang Li,

and Wei-Jung Chien

Qualcomm Technologies, Inc.

11:00am: Bi-directional Optical Flow for Future Video Codec 83

Alshin Alexander and Alshina Elena

Samsung

11:20am: Structure-driven Adaptive Non-local Filter for High Efficiency

Video Coding (HEVC) 91

Jian Zhang1, Chuanmin Jia1, Nan Zhang2, Siwei Ma1, and Wen Gao1

1Peking University, 2Capital Medical University

11:40am: Adaptive Motion Vector Resolution Scheme for Enhanced Video Coding 101

Zhao Wang1, Jian Zhang1, Nan Zhang2, and Siwei Ma1

1Peking University, 2Capital Medical University

12:00pm: Intra Frame Flicker Reduction for Parallelized HEVC Encoding 111

Ziyu Wen, Jisheng Li, Jiashuo Liu, Yikai Zhao, and Jiangtao Wen

Tsinghua University


Wednesday Lunch Break: 12:20pm - 2:00pm

WEDNESDAY MID-DAY

Panel Presentation

2:00pm - 3:30pm

Video Coding: Recent Developments for HEVC and Future Trends

Abstract:

This special event at DCC 2016 will consist of a keynote talk by Gary Sullivan (co-chair of the MPEG & VCEG Joint Collaborative Team on Video Coding) followed by a panel discussion with key members of the video coding and standardization community. Highlights of the presentation will include HEVC Screen Content Coding (SCC), High Dynamic Range (HDR) video coding, the Joint Exploration Model (JEM) for advances in video compression beyond HEVC, and recent initiatives in royalty-free video coding.

Panel Members:

Anne Aaron

Manager, Video Algorithms - Netflix

Arild Fuldseth

Principal Engineer, Video Coding - Cisco Systems

Marta Karczewicz

VP, Technology, Video R&D and Standards - Qualcomm

Jörn Ostermann

Professor - Leibniz Universität Hannover Institute for Information Processing

Jacob Ström

Principal Researcher - Ericsson

Gary Sullivan

Video Architect - Microsoft

Yan Ye

Senior Manager, Video Standards - InterDigital


WEDNESDAY AFTERNOON

SESSION 3

4:00pm: Regression Wavelet Analysis for Progressive-Lossy-to-Lossless Coding

of Remote-Sensing Data 121

Naoufal Amrani1, Joan Serra-Sagristà1, Miguel Hernández-Cabronero2, and Michael Marcellin3

1Universitat Autònoma de Barcelona, 2University of Warwick,

3University of Arizona

4:20pm: Transform Optimization for the Lossy Coding of Pathology Whole-Slide Images 131

Miguel Hernández-Cabronero1, Francesc Aulí-Llinàs2, Victor Sanchez1, and Joan Serra-Sagristà2

1University of Warwick, 2Universitat Autònoma de Barcelona

4:40pm: Point Cloud Attribute Compression Using 3-D Intra Prediction

and Shape-Adaptive Transforms 141

Robert A. Cohen, Dong Tian, and Anthony Vetro

Mitsubishi Electric Research Laboratories

5:00pm: On the Minimum Distortion of Quantizers with Heterogeneous

Reproduction Points 151

Erdem Koyuncu and Hamid Jafarkhani

University of California, Irvine

Break: 5:20pm - 5:40pm

SESSION 4

5:40pm: Nonconvex Lp Nuclear Norm Based ADMM Framework

for Compressed Sensing 161

Chen Zhao, Jian Zhang, Siwei Ma, and Wen Gao

Peking University

6:00pm: Compressive-Sensed Image Coding via Stripe-Based DPCM 171

Chen Zhao, Jian Zhang, Siwei Ma, and Wen Gao

Peking University

6:20pm: Compressive Tensor Sampling with Structured Sparsity 181

Yong Li1, Wenrui Dai2, and Hongkai Xiong1

1Shanghai Jiao Tong University, 2University of California, San Diego

6:40pm: Bayesian Compressed Sensing with Heterogeneous Side Information 191

Evangelos Zimos1, João F. C. Mota2, Miguel R. D. Rodrigues2,

and Nikos Deligiannis1

1Vrije Universiteit Brussels, 2University College London

7:00pm: A Reconstruction Algorithm with Multiple Side Information

for Distributed Compression of Sparse Sources 201

Huynh Van Luong1, Jürgen Seiler1, André Kaup1, and Søren Forchhammer2

1Friedrich-Alexander-Universität, 2DTU Fotonik


THURSDAY MORNING

SESSION 5, Genome Compression

8:00am: Burrows-Wheeler Transform for Terabases 211

Jouni Sirén

Wellcome Trust Sanger Institute

8:20am: An Evaluation Framework for Lossy Compression of Genome Sequencing

Quality Values 221

Claudio Alberti1, Noah Daniels2, Mikel Hernaez3, Jan Voges4,

Rachel L. Goldfeder3, Ana A. Hernandez-Lopez1, Marco Mattavelli1,

and Bonnie Berger2

1École Polytechnique Fédérale de Lausanne, 2Massachusetts

Institute of Technology, 3Stanford University,

4Institut fuer Informationsverarbeitung

8:40am: Efficient Compression of Genomic Sequences 231

Diogo Pratas, Armando J. Pinho, and Paulo J. S. G. Ferreira

University of Aveiro

9:00am: Predictive Coding of Aligned Next-Generation Sequencing Data 241

Jan Voges, Marco Munderloh, and Jörn Ostermann

Institut für Informationsverarbeitung

9:20am: Denoising of Quality Scores for Boosted Inference and Reduced Storage 251

Idoia Ochoa, Mikel Hernaez, Rachel Goldfeder, Tsachy Weissman,

and Euan Ashley

Stanford University

9:40am: A Cluster-Based Approach to Compression of Quality Scores 261

Mikel Hernaez, Idoia Ochoa, and Tsachy Weissman

Stanford University

10:00am: CS2A: A Compressed Suffix Array-Based Method for Short Read Alignment 271

Hongwei Huo1, Zhigang Sun1, Shuangjiang Li1, Jeffrey Scott Vitter2,

Xinkun Wang3, Qiang Yu1, and Jun Huan4

1Xidian University, 2University of Mississippi, 3Northwestern University,

4University of Kansas

Break: 10:20am - 10:40am

SESSION 6, Recent Developments in Video Coding, Part 2

10:40am: Compression Efficiency Improvement over HEVC Main 10 Profile for HDR

and WCG Content 279

Taoran Lu1, Fangjun Pu1, Peng Yin1, Yuwen He2, Louis Kerofsky2, Yan Ye2,

Zhouye Gu3, and David Baylon3

1Dolby Laboratories, 2InterDigital Communications, 3ARRIS Group Inc.

11:00am: High Dynamic Range Video Coding with Backward Compatibility 289

Dmytro Rusanovskyy1, Done Bugdayci Sansli2, Adarsh Ramasubramonian1,

Sungwon Lee1, Joel Sole1, and Marta Karczewicz1

1Qualcomm Tech. Inc., 2Qualcomm Tech. Finland

11:20am: Optimal Bitrate Allocation for High Dynamic Range and Wide Color

Gamut Services Deployment Using SHVC 299

T. Biatek1, W. Hamidouche2, J.-F. Travers3, and O. Deforges2

1IRT b<>com, 2IETR/INSA Rennes, 3TDF

11:40am: Backward Compatible HDR Video Compression System 309

Sébastien Lasserre, Fabrice Le Léannec, Tangi Poirier, and Franck Galpin

Technicolor

12:00pm: Luma Adjustment for High Dynamic Range Video 319

Jacob Ström, Jonatan Samuelsson, and Kristofer Dovstam

Ericsson Research


Thursday Lunch Break: 12:20pm - 2:00pm

THURSDAY MID-DAY

Keynote Address

2:00pm - 3:00pm

JPEG PLENO: Towards a New Standard for Plenoptic Image Compression

Touradj Ebrahimi

École Polytechnique Fédérale De Lausanne (EPFL)

EPFL and JPEG Convener

Abstract:

The JPEG format is today synonymous with modern digital imaging, and one of the most popular and widely used standards in recent history. The number of images created in JPEG format now exceeds one billion per day, and most of us can count a couple, if not more, JPEG codecs in the devices we regularly use in our daily lives: in our mobile phones, in our computers, in our tablets, and of course in our cameras. The JPEG ecosystem is strong and will continue its exponential growth for the foreseeable future. A significant number of small and large successful companies created in the last two decades have relied on the JPEG format, and this trend will likely continue.

A question to ask ourselves is: will we continue to have the same relationship with flat snapshots in time (the so-called Kodak moments) we call pictures, or could a different and enhanced experience be created when capturing and using images and video, going beyond the experience images have provided us for the last 120 years? Several researchers, artists, professionals, and entrepreneurs have been asking this same question and attempting to find answers, with more or less success. Stereoscopic and multi-view photography, panoramic and 360-degree imaging, image fusion, point clouds, high dynamic range imaging, integral imaging, light field imaging, and holographic imaging are among the solutions that have been proposed as the future of imaging.

(continued on the next page)

 

(keynote abstract continued)

Recent progress in advanced visual sensing has made it feasible to capture visual content in richer modalities than conventional image and video. Examples include Kinect by Microsoft, mobile sensors in Project Tango by Google and Intel, light-field image capture by Lytro, light-field video by Raytrix, and point cloud acquisition by LIDAR (Light Detection And Ranging). Likewise, image and video rendering solutions are increasingly relying on the richer modalities offered by such new sensors. Examples include Head Mounted Displays by Oculus and Sony, 3D projectors by Ostendo, and 3D light field display solutions by Holografika. This promises a major change in the way visual information is captured, processed, stored, delivered and displayed.

JPEG PLENO revolves around an approach called plenoptic representation, relying on a solid mathematical concept known as the plenoptic function. This promises radically new ways of representing visual information when compared to traditional image and video, offering richer and more holistic information. The plenoptic function describes the structure of the light information impinging on observers’ eyes, directly measuring various underlying visual properties such as light ray direction, multi-channel colours, etc.

The road map for JPEG PLENO follows a path that started in 2015 and will continue beyond 2020, with the objective of making the same type of impact that the original JPEG format has had on today's digital imaging over the last 20 years. Several milestones are in the works to approach the ultimate image representation in well-thought-out, precise, and useful steps. Each step could potentially offer an enhanced experience compared to the previous one, immediately ready to be used in applications, with potential backward compatibility. Backward compatibility could be either at the coding or at the file format level, allowing a JPEG decoder from 20 years ago to still be able to decode an image, even if that image won’t take full advantage of the intended experience, which will only be offered with a JPEG PLENO decoder.

This talk starts by providing various illustrations of the example applications that can be enabled when extending conventional image and video models toward plenoptic representation. In doing so, we will discuss use cases and application requirements, as well as examples of potential solutions that are, or could be, considered to fulfill them. We will then discuss the current status of development of the JPEG PLENO standard and the various milestones ahead. The talk will conclude with a list of technical challenges and other considerations that need to be overcome for the successful completion of JPEG PLENO.


THURSDAY AFTERNOON

SESSION 7

3:20pm: Authorship Attribution Using Relative Compression 329

Armando J. Pinho, Diogo Pratas, and Paulo J. S. G. Ferreira

University of Aveiro

3:40pm: Timeliness in Lossless Block Coding 339

Jing Zhong and Roy D. Yates

Rutgers University

4:00pm: Online Grammar Transformation Based on Re-Pair Algorithm 349

Takuya Masaki and Takuya Kida

Hokkaido University

4:20pm: On Compression Techniques for Computing Convolutions 359

Eduardo Laber, Pedro Moura, and Lucas Pavanelli

PUC-RIO

4:40pm: A Simple and Efficient Approach for Adaptive Entropy Coding over Large Alphabets 369

Amichai Painsky, Saharon Rosset, and Meir Feder

Tel Aviv University

5:00pm: Interactive Function Compression with Asymmetric Priors 379

Basak Guler1, Aylin Yener1, Ebrahim MolavianJazi1, Prithwish Basu2, Ananthram Swami3, and Carl Andersen2

1The Pennsylvania State University, 2Raytheon BBN Technologies, 3Army Research Laboratory

5:20pm: Compressing Combinatorial Objects 389

Christian Steinruecken

University of Cambridge

POSTER SESSION AND RECEPTION

6:00 - 8:30pm

In the Golden Cliff Room.

(Titles are listed at the end of this program; abstracts of each presentation appear in the proceedings.)


FRIDAY MORNING

SESSION 8

8:00am: Tiny Descriptors for Image Retrieval with Unsupervised Triplet Hashing 397

Jie Lin1, Olivier Morère1,2, Julie Petta3, Vijay Chandrasekhar1, and Antoine Veillard2

1Institute for Infocomm Research, 2Université Pierre et Marie Curie, 3Supélec

8:20am: From Visual Search to Video Compression: A Compact Representation

Framework for Video Feature Descriptors 407

Xiang Zhang1, Siwei Ma1, Shiqi Wang1, Shanshe Wang1, Xinfeng Zhang2, and Wen Gao1

1Peking University, 2Rapid-Rich Object Search (ROSE) Lab

8:40am: Locally-Weighted Template-Matching Based Prediction for Cloud-Based

Image Compression 417

Jean Bégaint1, Dominique Thoreau1, Philippe Guillotel1, and Mehmet Türkan2

1Technicolor, 2Izmir University of Economics

9:00am: Coding Scheme for the Transmission of Satellite Imagery 427

Francesc Aulí-Llinàs1, Michael W. Marcellin2, Victor Sanchez3, Joan Serra-Sagristà1, Joan Bartrina-Rapesta1, and Ian Blanes1

1Universitat Autònoma de Barcelona, 2University of Arizona,

3University of Warwick

9:20am: Optimizing Subjective Quality in HEVC-MSP: An Approximate Closed-form

Image Compression Approach 437

Shengxi Li1, Mai Xu1,2, Yun Ren1, Chengzhang Ma1, and Zulin Wang1,2

1Beihang University, 2Collaborative Innovation Center of Geospatial Technology

9:40am: Graph-Based Transform for 2D Piecewise Smooth Signals

with Random Discontinuities 447

Dong Zhang and Jie Liang

Simon Fraser University

10:00am: On Perceptual Audio Compression with Side Information at the Decoder 456

Adel Zahedi1, Jan Østergaard1, Søren Holdt Jensen1, Patrick Naylor2, and Søren Bech1,3

1Aalborg University, 2Imperial College, 3Bang & Olufsen

Break: 10:20am - 10:40am

SESSION 9, Recent Developments in Video Coding, Part 3

10:40am: Daala: A Perceptually-Driven Next Generation Video Codec 466

Thomas J. Daede1,2, Nathan E. Egge1,2, Jean-Marc Valin1,2, Guillaume Martres1,3, and Timothy B. Terriberry1,2

1Xiph.Org Foundation, 2Mozilla, 3EPFL

11:00am: The Thor Video Codec 476

Gisle Bjøntegaard, Thomas Davies, Arild Fuldseth, and Steinar Midtskogen

Cisco Systems

11:20am: Fast Algorithm for HDR Color Conversion 486

Andrey Norkin

Netflix Inc.

11:40am: General Synthesized View Distortion Estimation for Depth Map Compression

of FTV 496

Ang Lu, Yichen Zhang, and Lu Yu

Zhejiang University

12:00pm: A Framework of Complexity Optimally Scalable Algorithms for HEVC 506

Tingting Wang1, Yihao Zhang1, Huang Li1, Hongyang Chao1, and Feng Wu2

1Sun Yat-sen University, 2University of Science and Technology of China


FRIDAY MID-DAY

Break: 12:20pm - 12:40pm

SESSION 10, Compressed Data Structures, Part 2

12:40pm: Improved Range Minimum Queries 516

Héctor Ferrada and Gonzalo Navarro

University of Chile

1:00pm: Self-Indexing RDF Archives 526

Ana Cerdeira-Pena1, Antonio Fariña1, Javier D. Fernández2,

and Miguel A. Martínez-Prieto3

1University of A Coruna, 2Vienna University of Economics and Business, 3University of Valladolid

1:20pm: Shortest DNA Cyclic Cover in Compressed Space 536

Bastien Cazaux, Rodrigo Cánovas, and Eric Rivals

Université de Montpellier

1:40pm: Traversing Grammar-Compressed Trees with Constant Delay 546

Markus Lohrey1, Sebastian Maneth2, and Carl Philipp Reh1

1Universität Siegen, 2University of Edinburgh

2:00pm: Practical Index Framework for Efficient Time-Travel Phrase Queries

on Versioned Documents 556

Chun-Ting Kuo and Wing-Kai Hon

National Tsing Hua University

2:20pm: Compact Navigation Oracles for Graphs with Bounded Clique-Width 556

Shahin Kamali

Massachusetts Institute of Technology


Poster Session

(listed alphabetically by first author)

Motion Hint Field with Content Adaptive Motion Model for High Efficiency

Video Coding (HEVC) 579

Ashek Ahmmed and Mark Pickering

University of New South Wales

Joint Framework for Signal Reconstruction Using Matched Wavelet

Estimated from Compressively Sensed Data 580

Naushad Ansari and Anubha Gupta

Indraprastha Institute of Information Technology-Delhi

Lossy Compression of Unordered Rooted Trees 581

Romain Azaïs1, Jean-Baptiste Durand2, and Christophe Godin3

1Université de Lorraine, 2Université Grenoble Alpes, 3Université Montpellier 2

Single-Loop Software Architecture for JPEG 2000 582

David Barina, Ondrej Klima, and Pavel Zemcik

Brno University of Technology

Transforms for Motion-Compensated Residuals Based on Prediction

Inaccuracy Modeling 583

Xun Cai and Jae S. Lim

Massachusetts Institute of Technology

RKLT-Based Lossless Hyperspectral Image Compression Combined

with Principal Components Selection 584

Hao Chen, Yi Hua, and Shuang Zhou

Harbin Institute of Technology

Compression-Inspired Author Profiling 585

Francisco Claude, Roberto Konow, and Susana Ladra

Universidad Diego Portales, Universidad de Chile, University of A Coruña

Grammatical Ziv-Lempel Compression: Achieving PPM-Class Text

Compression Ratios with LZ-Class Decompression Speed 586

Kennon J. Conrad and Paul R. Wilson

Independent Consultant

Quick Access to Compressed Data in Storage Systems 587

Cornel Constantinescu and David Chambliss

IBM Almaden Research Center San Jose

A Fast Splitting Algorithm for an H.264/AVC to HEVC Intra Video Transcoder 588

Antonio J. Díaz-Honrubia', José Luis Martínez', Pedro Cuenca',

and Hari Kalva2

1University of Castilla-La Mancha, 2Florida Atlantic University


StarIso: Graph Isomorphism Through Lossy Compression 589

Jason Fairey and Lawrence Holder

Washington State University

Computational Architecture for Fast Seismic Data Transmission

between CPU and FPGA by Using Data Compression 590

Carlos A. Fajardo1, Carlos A. Angulo1, Julián G. Mantilla1, Iván F. Obregón1, Javier Castillo2, César Pedraza3, and Óscar M. Reyes1

1Universidad Industrial de Santander, 2Universidad Rey Juan Carlos, 3Universidad Nacional

Fast Cover Song Retrieval in Advanced Audio Coding Domain

Based on Deep Learning Technique 591

Jiunn-Tsair Fang1, Yu-Ruey Chang2, and Pao-Chi Chang2

1Ming Chuan University, 2National Central University

Delta Encoding of Virtual-Machine Memory in the Dynamic Analysis

of Malware 592

James E. Fowler

Mississippi State University

Network of Spiking Neurons Driven by Compression 593

Alexander Gain1 and Lawrence Holder2

1Tulane University, 2Washington State University

HEVC Fast CU Encoding Based on Quadtree Prediction 594

Yuan Gao, Pengyu Liu, Yueying Wu, and Kebin Jia

Beijing University of Technology

Realistic 3D Mesh Compression Based on Predicted Angle-Normal Images 595

Yuan Gao1,2, Yunhui Shi1, Shaofan Wang1, Wenpeng Ding1, Jin Wang1, and Baocai Yin1

1Beijing University of Technology, 2Beijing Electronic Science

and Technology Institute

Compressed Forensic Source Image Using Source Pattern Map 596

Hamidreza Ghasemi Damavandi1, Ananya Sen Gupta1, Robert Nelson2, and Christopher Reddy2

1University of Iowa, 2Woods Hole Oceanographic Institution

Fast Acquisition for Quantitative MRI Maps: Sparse Recovery

from Non-linear Measurements 597

Anupriya Gogna and Angshul Majumdar

IIIT Delhi

Connection between DCT and Discrete-Time Fractional Brownian Motion 598

Anubha Gupta1 and ShivDutt Joshi2

1Indraprastha Institute of Information Technology, 2Indian Institute of Technology

Analysis and Synthesis Prior Greedy Algorithms for Non-linear

Sparse Recovery 599

Kavya Gupta, Ankita Raj, and Angshul Majumdar

IIIT Delhi


Rate-Distortion Optimized Compression Algorithm for 3D Triangular

Mesh Sequences 600

M. Hachani1, A. Ouled Zaid2, and W. Puech1

1University of Tunis El Manar, 2Montpellier University

When Less is More — Using Restricted Repetition Search

in Fast Compressors 601

Danny Harnik, Ety Khaitzin, and Dmitry Sotnikov

IBM Research

Efficient Environmental Temperature Monitoring Using

Compressed Sensing 602

Ali Hashemi1, Mohammad Rostami2, and Ngai-Man Cheung1

1Singapore University of Technology and Design, 2University of Pennsylvania

Engineering Wavelet Tree Implementations for Compressed

Web Graph Representations 603

Meng He and Chen Miao

Dalhousie University

Approximate String Matching for Self-Indexes 604

Lukáš Hrbek and Jan Holub

Czech Technical University in Prague

Hardware Based Compression in Big Data 605

Deepak Jain1, Gordon McFadden2, and Brian Will2

1Intel Ireland, 2Intel Corporation

Small Polygon Compression 606

Abhinav Jauhri, Martin Griss, and Hakan Erdogmus

Carnegie Mellon University

Opportunities for High-Level Parallelism in Multiview Video Coding 607

Caoyang Jiang and Saeid Nooshabadi

Michigan Tech

Massively Efficient Motion Estimation by Exploiting Inter-Pixel Similarities 608

Caoyang Jiang and Saeid Nooshabadi

Michigan Tech

Decision Zone-Based Parallel Fast Motion and Disparity Estimation Scheme

for Multiview Coding 609

Caoyang Jiang and Saeid Nooshabadi

Michigan Tech

Low-Latency Lossless Compression for Data Bus Using

Multiple-Type Dictionaries 610

Yuki Katsu and Haruhiko Kaneko

Tokyo Institute of Technology

Analysis of a Rewriting Compression System for Flash Memory 611

Shmuel T. Klein1 and Dana Shapira2

1Bar Ilan University, 2Ariel University


Multi-mode Kernel-Based Minimum Mean Square Error Estimator

for Accelerated Image Error Concealment 612

Ján Koloda1, Jürgen Seiler1, Antonio M. Peinado2, and André Kaup1

1Friedrich-Alexander University, 2Universidad de Granada

A Performance Case-Study on Memristive Computing-in-Memory

Versus Von Neumann Architecture 613

Lauri Koskinen, Jari Tissari, Jukka Teittinen, Eero Lehtonen, Mika Laiho, and Jussi H. Poikonen

University of Turku

Textural and Gradient Feature Extraction from JPEG2000 Codestream

for Airfield Detection 614

Cheng Li, Chenwei Deng, and Baojun Zhao

Beijing Institute of Technology

Accelerate Data Compression in File System 615

Weigang Li and Yu Yao

Intel

A New Transform Video Coding Algorithm 616

Jianyu Lin

Curtin University

Deep Convolutional Neural Network for Decompressed Video Enhancement 617

Rongqun Lin, Yongbing Zhang, Haoqian Wang, Xingzheng Wang,

and Qionghai Dai

Tsinghua University

Content Adaptive Interpolation Filters for HEVC Framework 618

Xiaojie Liu, Wenpeng Ding, Yunhui Shi, and Baocai Yin

Beijing Key Laboratory of Multimedia and Intelligent Software Technology

Compression Ratio Design in Compressive Spectral Imaging 619

Jeison Marín1, Leonardo Betancur1, and Henry Arguello2

1Universidad Pontificia Bolivariana, 2Universidad Industrial de Santander

Overview of the MPEG Activity on Point Cloud Compression 620

Rufael Mekuria1 and Lazar Bivolarsky2

1CWI, 2Tata Communications

Novel Algorithm for Stereoscopic Image Quality Assessment 621

Jaime Moreno1, Beatriz Jaime1, Alessandro Rizzi2, and Christine Fernandez3

1National Polytechnic Institute, 2University of Milan, 3University of Poitiers

A Novel Development Infrastructure for Scalable Video

Coding/Transcoding Applications 622

Vida Movahedi1, Amir Asif1, Alicia Chin2, Ihab Amer1, Zane Zhenhua Hu2, and Yongang Hu2

1York University, 2IBM Canada

A Context-Aware Taxonomy of Deduplication Metrics for Backup Strategies 623

Lilian Noronha Nassif and Janaína Coutinho Mattos

Public Ministry of Minas Gerais


Globally Optimal Algorithms for Transform Selection in Multiple-Transform

Signal Compression 624

Lucas Nissenbaum and Jae S. Lim

Massachusetts Institute of Technology

Leveraging CABAC for No-Reference Compression of Genomic Data

with Random Access Support 625

Tom Paridaens1, Jens Panneel1, Wesley De Neve1,2, Peter Lambert1, and Rik Van de Walle1

1iMinds-Ghent University, 2Center for Biotech Data Science GUGC-K

Adaptive Quantization Matrices for HD and UHD Display Resolutions

in Scalable HEVC 626

Lee Prangnell and Victor Sanchez

University of Warwick

Positional Inverted Self-Index 627

Petr Procházka and Jan Holub

Czech Technical University in Prague

Transform Coding for On-the-Fly Learning Based Block Transforms 628

Saurabh Puri1, Sebastien Lasserre1, Patrick Le Callet2,

and Fabrice Le Léannec1

1Technicolor, 2IRCCyN Université de Nantes

Just Noticeable Difference Based Fast Coding Unit Partition in 3D-HEVC

Intra Coding 629

Hai Ren1, Huihui Bai1, Chunyu Lin1, Mengmeng Zhang2, and Yao Zhao1

1Beijing Jiaotong University, 2North China University of Technology

Generalization of Efficient Implementation of Compression

by Substring Enumeration 630

Shumpei Sakuma, Kazuyuki Narisawa, and Ayumi Shinohara

Tohoku University

Joint Design of Layered Coding Quantizers to Extract and Exploit

Common Information 631

Mehdi Salehifar, Tejaswi Nanjundaswamy, and Kenneth Rose

University of California, Santa Barbara

Low-Complexity, Backward-Compatible Coding of High Dynamic

Range Images and Video 632

Emanuele Salvucci

ForwardGames S.r.l.

The Rate Loss in Binary Source Coding with Decoder Side Information 633

Andrei Sechelea1, Adrian Munteanu1, Samuel Cheng2, and Nikos Deligiannis1

1Vrije Universiteit Brussels, 2University of Oklahoma

Interactive Quantization for Extremum Computation in Collocated Networks 634

Solmaz Torabi, Jie Ren, John MacLaren Walsh

Drexel University


Low Delay Complexity Constrained Encoding 635

Thijs Vermeir1, Jürgen Slowack1, Glenn Van Wallendael2, Peter Lambert2, and Rik Van de Walle

1Barco N.V., 2Data Science Lab, Ghent University

Low Complexity Pixel Domain Perceptual Image Compression via Adaptive

Down-Sampling 636

Zhe Wang and Sven Simon

University of Stuttgart

Quality and Error Robustness Assessment of Low-Latency Lightweight

Intra-Frame Codecs 637

Alexandre Willème and Benoit Macq

Université catholique de Louvain

Coefficient-wise Deadzone Hard-decision Quantizer with Adaptive Rounding

Offset Model 638

Haibing Yin, Hongkui Wang, Xiumin Wang, and Zhelei Xia

China Jiliang University

A Novel Algorithm to Decrease the Computational Complexity of HEVC

Intra Coding 639

Mengmeng Zhang, Heng Zhang, and Zhi Liu

North China University of Technology

Author Index 649


Independent Component Analysis for Texture Defect Detection

O. Gökhan Sezer1, Aydın Ertüzün1, Aytül Erçil2

1 Boğaziçi University, Electrical and Electronics Engineering Department, Istanbul-Turkey

2 Sabancı University, Faculty of Engineering and Natural Sciences, Istanbul-Turkey

ogsezer@hotmail.com, ertuz@boun.edu.tr, aytulercil@sabanciuniv.edu

ABSTRACT

In this paper, a novel method for texture defect detection is presented. The method makes use of Independent Component Analysis (ICA) for feature extraction from the non-overlapping subwindows of texture images and classifies a subwindow as defective or non-defective according to the Euclidean distance between the average feature vector of a defect-free sample and the feature vector obtained from a subwindow of the test image. Experimental results demonstrating the use of this method for the visual inspection of textile products obtained from a real factory environment are also presented.

I- INTRODUCTION

Defect detection from images plays a significant role in the quality of manufactured products, and its application areas continue to increase. Numerous methods have been proposed for performing this task. Amet et al. [1] used sub-band domain co-occurrence matrices for texture defect detection; Karras et al. [11] suggested detecting defects from the wavelet transform of the images and vector-quantization-related properties of the associated wavelet coefficients; Chetverikov et al. [4] approached the texture defect detection problem in a more theoretical way, based on regularity and orientation criteria. Chen and Jain used a structural approach to defect detection in textured images. Dewaele et al. used signal processing methods to detect point defects and line defects in texture images. Cohen et al. [5] used MRF models for defect inspection of textile surfaces, while Erçil et al. [6] used similar techniques for the inspection of painted metallic surfaces. Atalay implemented an MRF-model-based method on a TMS320C40 parallel processing system for real-time defect inspection of textile fabrics [2]. For surveys of texture analysis, see Van Gool et al., Reed et al., Rao, and Tuceryan and Jain [12 – 14].

In the work of Hurri [7], different types of texture images were used to examine the general characteristics of the independent components (ICs) of texture images, and the results were found inadequate to draw any conclusions about the ICs of texture images. The reason is that every texture image in the set brings about its own ICs that describe its environment. Since every texture defines its own environment, finding the ICs of a set of texture images of the same type helps us comprehend the weave structure of the texture through its ICs.

This paper addresses a new application that uses ICA for locating and also partially identifying defects in textile fabric images.

II- ICA FUNDAMENTALS

1- Overview

ICA is a generative model, which means that it describes how the observed data x can be represented as a superposition of independent components si:

x=As (1)

where x is the observed vector consisting of the observations xi, s is the source vector consisting of the independent components si, and A is the mixing matrix. In ICA, the vector x is the only a priori known quantity; both A and s are assumed to be unknown. Therefore, A and s must be estimated using the information that s is non-Gaussian and that the entries of s are statistically independent. Fortunately, ICA enables us to make use of these model assumptions to estimate both A and s. Once A is estimated, the sources can be computed as:

s=Wx (2)

 

where W is the (pseudo)inverse of the mixing matrix A and is called the demixing matrix [8, 9].
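
For readers who want to experiment, the estimation of A and W sketched above is available in scikit-learn; the following is a minimal sketch on synthetic non-Gaussian sources (the data and dimensions are assumptions, not the paper's data set):

import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
S_true = rng.uniform(-1, 1, size=(1000, 4))       # non-Gaussian independent sources s
A_true = rng.standard_normal((4, 4))              # unknown mixing matrix A
X = S_true @ A_true.T                             # observations x = As, Eq. (1)

ica = FastICA(n_components=4, fun="logcosh", random_state=0)  # tanh-type nonlinearity
S_est = ica.fit_transform(X)                      # estimated sources s = Wx, Eq. (2)
A_est = ica.mixing_                               # estimated mixing matrix A
W_est = ica.components_                           # estimated demixing matrix W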

2- ICA Model and Sparse Coding for Image Feature Extraction

Sparse coding is closely related to ICA; it represents data by having just a few active units out of a larger collection of model vectors. Therefore, in sparse coding, each component used to represent the data is rarely active (i.e., zero most of the time) [10].

Using sparse coding to represent image data in lower dimensions is a practical feature extraction tool in pattern recognition problems, as sparse coding aims to reduce the redundancy in representing the data.

Sparse coding can be denoted by a linear representation:

s=Mx (3)

where x is an n-D observed (random) vector, s is the m-D linearly transformed vector, and M is the m x n matrix that linearly transforms x into s.

The relation between sparse coding and ICA is that their data models are closely related, as can be seen from Eqs. (2) and (3).

In this paper, ICA, or equivalently sparse coding, is used for feature extraction from the texture images. Hence, the columns of A in the ICA model represent the features (basis vectors, or ICs), and the component si of the vector s becomes the coefficient of the i-th feature in the observed data x (see Eq. (1)).

III- METHODOLOGY & SYSTEM DESCRIPTION

A machine vision system generally contains two important parts: (i) feature extraction and (ii) decision making.

In this work, the defect detection system consists of two blocks. The first is the offline block, in which ICs for a set of texture images are extracted; using these basis vectors, feature vectors from defect-free images are calculated and fed into the defect detection part of the online phase. The second is the online block, where test images with a wide range of texture defects are handled and their features are extracted by the sparse coding method.

 

Fig. 3.1: Defect detection system block diagram used in this project.

In this project, the data set contains one defect-free image and 19 defective textile fabric images, each corresponding to a different defect type. In the first stage of the feature extraction, ICs were obtained from this set.

Test images I(n,m) of size 256x256 are assumed to be acquired by a CCD camera in real time, and each image is fed through the feature extractor as seen in Fig. 3.1. Feature vectors are calculated within local non-overlapping subwindows of size NxN. The choice of subwindow size depends on two factors: 1) how localized the defects are (i.e., the size of the defects); and 2) how representative of the texture the data in a window of that size are for a non-defective sample [1]. The second factor is important since the size of the window determines how well the texture is represented: too small a window size will result in ICs that do not represent the texture appropriately. In this project, the best results were achieved with a subwindow size of 16x16, so this size is used in the rest of the paper. Thus, a feature vector is extracted from every subwindow, each of which represents a distinct region in image I(n,m), and these feature vectors correspond to the columns of the SI matrix obtained at the output of the feature extraction part (see Fig. 3.1).
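A sketch of this partitioning step is given below. The test image and the demixing matrix W are random stand-ins; only the blocking into non-overlapping 16x16 subwindows and the stacking of feature vectors into SI follow the description above.

```python
# Sketch: partition a 256x256 test image into non-overlapping 16x16
# subwindows and stack their feature vectors as the columns of SI.
import numpy as np

def subwindows(img, N=16):
    """Yield the non-overlapping NxN blocks of img in raster order."""
    H, W_ = img.shape
    for r in range(0, H - N + 1, N):
        for c in range(0, W_ - N + 1, N):
            yield img[r:r + N, c:c + N]

rng = np.random.default_rng(2)
I = rng.random((256, 256))                  # stand-in test image I(n, m)
W = rng.standard_normal((16, 256))          # stand-in demixing matrix

SI = np.column_stack([W @ w.ravel() for w in subwindows(I)])
print(SI.shape)                             # (16, 256): one column per subwindow
```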

 

The steps can be summarized as follows:

Off-line (Learning) Phase:

1) Obtain the ICA basis vectors from a set of non-defective and defective images.

2) Construct the matrix A using the ICA basis vectors as the columns of A.

3) Partition a defect-free textile fabric image into sub-windows of size NxN.

4) For each sub-window, calculate the coefficient vector (the feature vector) using the sparse coding method (Eq. (3)).

5) Compute the strue vector by averaging the feature vectors computed for each sub-window.

On-line Feature Extraction Phase:

1) Partition a test image I(p,q) into sub-windows of size NxN.

2) For each sub-window, calculate the coefficient vector (the feature vector) using the sparse coding method (Eq. (3)).

3) Construct the matrix SI using the feature vector si of each sub-window as the columns of SI.

On-line Detection Part:

1) Compute the Euclidean distance between each column of matrix SI, corresponding to the feature vector of each sub-window, and the vector strue:

distance_i = [ (strue - si)^T (strue - si) ]^(1/2) (4)

where si is the i-th column of matrix SI.

2) Classify a sub-window as defective if the distance exceeds a threshold value α.

 

The threshold value is determined by the following formula:

α = Dm + q · IQR (5)

where Dm is the median value of the feature vector of a subwindow (i.e., a column of the SI matrix), IQR is the interquartile range, and q is a constant determined experimentally. (For a Gaussian distribution, q = 1.67 corresponds to a 95% confidence interval.)
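Taken together, the on-line detection part can be sketched as follows. The feature matrix and the strue vector below are synthetic stand-ins; the distance computation follows Eq. (4) and the threshold follows Eq. (5), with Dm and the IQR taken over each column of SI as the text describes.

```python
# Sketch of the on-line detection part: Euclidean distance of each
# subwindow's feature vector from strue (Eq. (4)), thresholded with
# the median/IQR rule of Eq. (5).
import numpy as np

def detect_defects(SI, s_true, q=1.67):
    """Return a boolean array: True where a subwindow is flagged defective."""
    # Eq. (4): distance_i = ||s_true - s_i||_2 for each column s_i of SI.
    distances = np.linalg.norm(SI - s_true[:, None], axis=0)

    # Eq. (5): alpha = Dm + q * IQR, computed here per column of SI,
    # following the paper's description of Dm and IQR.
    q1, Dm, q3 = np.percentile(SI, [25, 50, 75], axis=0)
    alpha = Dm + q * (q3 - q1)

    return distances > alpha

rng = np.random.default_rng(3)
SI = rng.standard_normal((16, 256))        # stand-in feature matrix
s_true = SI.mean(axis=1)                   # stand-in defect-free profile
print(detect_defects(SI, s_true).sum(), "subwindows flagged")
```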

IV- IMPLEMENTATION & RESULTS

In this paper, the symmetric fixed-point ICA algorithm with the tanh(x) nonlinearity is used [15]. By means of ICA, the hidden factors underlying the fabric image data set are obtained. 16 ICs are used for the analysis; this value was obtained by trial and error.
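In scikit-learn's FastICA, the symmetric fixed-point algorithm corresponds to algorithm='parallel' and the tanh nonlinearity to fun='logcosh' (whose derivative is tanh), so a comparable setup might look like the sketch below. The patch matrix X is a random stand-in for the real training patches.

```python
# One way to approximate the paper's ICA setup: symmetric fixed-point
# algorithm with a tanh nonlinearity, estimating 16 components.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(4)
X = rng.standard_normal((5000, 256))       # 5000 vectorized 16x16 patches

ica = FastICA(n_components=16, algorithm="parallel", fun="logcosh",
              whiten="unit-variance", max_iter=500, random_state=0)
S = ica.fit_transform(X)                   # coefficients for each patch
basis = ica.mixing_                        # columns approximate the 16 ICs
print(basis.shape)                         # (256, 16)
```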

First, ICA is applied directly to the original images, which have many vertical and horizontal stripes representing the weave structure. In this case it is observed that some of the ICs (or, equivalently, basis vectors) have high-frequency characteristics that affect the Euclidean distances (cf. Fig. 4.1, basis vectors 5, 7, 8, 9, 11, 13, and 15); hence the resulting defect detection performance was low.

 

Fig. 4.1: Independent components of the original set.

In order to eliminate this problem, ICs with high-frequency characteristics are neglected and distances are calculated for the remaining set of ICs. These new results are promising and the defect detection performance increased, but some defects are still missed and there are false alarms as well (Fig. 4.2).

 

 

Fig. 4.2: The image on the left is processed without, and the image on the right with, neglecting the high-frequency ICs (white boxes correspond to subwindows detected as defective).

Changing the threshold values did not effectively improve the results. From this experience came the idea that the weave structure of the textile fabric image should somehow be eliminated while preserving the essential characteristics of the defects. Thereby, median filtering and some histogram modification operations are performed consecutively (see Fig. 4.3). For median filtering, a 3x3 mask is used, whereas the histogram modification consists of intensity level slicing, setting all pixels with gray-level values greater than 200 to 200, and then increasing the brightness, thus removing the underlying texture.

This pre-processing operation is carried out for all images, and a new set of images is obtained, from which ICs without high-frequency characteristics are extracted (Fig. 4.4).
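A sketch of this pre-processing chain is shown below using scipy's median_filter. The brightness offset of 55 is an assumption, since the paper does not state the exact amount by which the brightness is increased.

```python
# Sketch of the described pre-processing: 3x3 median filtering,
# intensity level slicing at gray level 200, then a brightness increase.
import numpy as np
from scipy.ndimage import median_filter

def preprocess(img):
    """Suppress the weave structure while keeping defect characteristics."""
    out = median_filter(img, size=3)                  # 3x3 median mask
    out = np.minimum(out, 200)                        # intensity level slicing
    out = np.clip(out.astype(np.int32) + 55, 0, 255)  # assumed brightness offset
    return out.astype(np.uint8)

img = (np.random.default_rng(5).random((256, 256)) * 255).astype(np.uint8)
print(preprocess(img).max())
```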

 

Fig. 4.3: The image on the left is obtained after pre-processing, and the image on the right is the result of our algorithm (black boxes correspond to subwindows detected as defective).

These ICs, in fact, correspond to various structures present in the texture of the fabric. In the resulting set of ICs, the vertical bars in basis vectors 1, 9, and 11 can be thought of as the consequence of vertical defects and vertical stripes in the textile fabric structure, whereas the horizontal bars in basis vectors 2, 3, 7, 8, 13, and 16 can be thought of as the consequence of horizontal defects and horizontal stripes in the textile fabric structure. There are also spot-like structures in basis vectors 10, 12, 14, and 15, which can result from partially captured defects or small holes in the texture. Finally, some basis vectors, such as 4, 5, and 6, represent the typical weave characteristics.

 

Fig. 4.4: Independent components of the pre-processed set.

In the light of this cause-and-effect relation between the structure of the texture and the ICs, a defective region in an image is expected to have one or more dominant basis vectors that describe the defect (refer to Fig. 4.5).

Figure 4.5 illustrates the basis vector coefficients of a defective sub-image, window 40 in Fig. 4.3 (right image). In Fig. 4.5 it is observed that the coefficient of the 11th basis vector is dominant over the other basis vectors. This basis vector has a vertical-bar shape, as can be seen in Fig. 4.4. Thus, a vertical defect in a fabric image activated the vertical-bar-shaped ICs in the set of basis vectors. Hence, defect type identification can also be achieved with this method.
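This identification step reduces to finding the dominant coefficient, as in the sketch below; the coefficient vector here is fabricated to mimic the window-40 profile of Fig. 4.5 and is not taken from the paper's data.

```python
# Sketch of the defect-type cue: the IC with the largest-magnitude
# coefficient in a defective subwindow indicates the defect's character
# (e.g., a vertical-bar IC for a vertical defect).
import numpy as np

def dominant_component(s):
    """Index (0-based) of the basis vector with the largest |coefficient|."""
    return int(np.argmax(np.abs(s)))

s_window40 = np.zeros(16)
s_window40[10] = 4.2                        # 11th IC dominant, as in Fig. 4.5
print(dominant_component(s_window40) + 1)   # -> 11
```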

 

Fig. 4.5: Bar graph showing the coefficients of the corresponding basis vectors (or equivalently ICs).

In Fig. 4.6, the source profile of a defect-free subwindow (subwindow 49) of the same image is given. Observe that there is no dominant basis vector in the 49th subwindow and the coefficient values are close to zero. Compare the coefficients of the 49th and 40th subwindows. (Note that the scales of Figs. 4.5 and 4.6 are different.)

 

 

Fig. 4.6: Bar graph showing the coefficients of the corresponding basis vectors (or equivalently ICs).

Detection Rates:

Defect detection rates are calculated by the following formula:

CR = 100 x (NCC + NDD) / NTotal (6)

where NCC is the number of correctly classified nondefective subwindows, NDD is the number of correctly classified defective subwindows and NTotal is the total number of subwindows being tested.
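Eq. (6) translates directly into code, as in the small sketch below; the counts in the example call are illustrative, not taken from the paper's experiments.

```python
# Sketch of the detection-rate formula in Eq. (6).
def detection_rate(n_cc, n_dd, n_total):
    """CR = 100 * (NCC + NDD) / NTotal, in percent."""
    return 100.0 * (n_cc + n_dd) / n_total

print(detection_rate(230, 18, 256))        # illustrative counts -> 96.875
```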

i- First Method: 88.59% (ICA applied directly to the original set without neglecting the ICs with high-frequency characteristics, q = 1.0)

ii- Second Method: 89.41% (ICA applied directly to the original set while neglecting the ICs with high-frequency characteristics, q = 1.0)

iii- Third Method: 96.74% (ICA applied to the pre-processed image set, q = 1.5)

V- CONCLUSION

A new methodology for defect detection is developed, which can also be applied to partial identification of the defect type. Compared to the previous detection rates in [1], which vary between 85 and 92 percent, ICA enables better detection with a 4-5 percent overall increase. Besides that, this new method has very low real-time computational requirements, since the online part of the computations involves just a simple matrix multiplication.

 

In conclusion, the proposed ICA-based defect detection method is a promising approach, suitable for a real-time inspection system in the textile industry.

REFERENCES:

[1] A. Latif-Amet, A. Ertüzün, A. Erçil, "An Efficient Method for Texture Defect Detection: Sub-band Domain Co-occurrence Matrices," Image and Vision Computing, Vol. 18, pp. 543-553, 2000.

[2] A. Atalay, "Automated Defect Inspection of Textile Fabrics Using Machine Vision Techniques," M.S. Thesis, Boğaziçi University, 1995.

[3] J. Chen, A.K. Jain, "A Structural Approach to Identify Defects in Textured Images," Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 29-32, Beijing, 1988.

[4] D. Chetverikov, A. Hanbury, "Finding Defects in Texture Using Regularity and Local Orientation," Pattern Recognition, Vol. 35, pp. 203-218, 2002.

[5] F.S. Cohen, Z. Fan, and S. Attali, "Automated Inspection of Textile Fabrics Using Textural Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 8, pp. 803-808, August 1991.

[6] A. Erçil and B. Özüyılmaz, "Automated Visual Inspection of Metallic Surfaces," Proceedings of the Third International Conference on Automation, Robotics and Computer Vision (ICARCV'94), pp. 1950-1954, Singapore, November 1994.

[7] J. Hurri, "Independent Component Analysis of Image Data," M.S. Thesis, Helsinki University of Technology, Helsinki, Finland, 1997.

[8] A. Hyvarinen, "Survey on Independent Component Analysis," Helsinki University of Technology, Finland, 1999.

[9] A. Hyvarinen, E. Oja, "Independent Component Analysis: A Tutorial," Helsinki University of Technology, Finland, 1999.

[10] A. Hyvarinen, E. Oja, P. Hoyer, and J. Hurri, "Image Feature Extraction by Sparse Coding and Independent Component Analysis," Helsinki University of Technology, Finland.

 

[11] D.A. Karras, S.A. Karkanis, D.K. Iakovidis, D.E. Maroulis, and B.G. Mertzios, "Improved Defect Detection in Manufacturing Using Novel Multidimensional Wavelet Feature Extraction Involving Vector Quantization and PCA Techniques," Proceedings of the 8th Panhellenic Conference on Informatics, Nicosia, Cyprus, Nov. 7-10, 2001.

[12] T.R. Reed, J.M. Hans Du Buf, "A Review of Recent Texture Segmentation and Feature Extraction Techniques," CVGIP: Image Understanding, Vol. 57, No. 3, pp. 359-372, May 1993.

[13] M. Tuceryan, A. Jain, "Texture Analysis," in The Handbook of Pattern Recognition and Computer Vision, C.H. Chen, L.F. Pau, P.S.P. Wang (eds.), World Scientific Publishing Co., 1993.

[14] L. Van Gool, P. Dewaele, and A. Oosterlinck, "Survey: Texture Analysis Anno 1983," Computer Vision, Graphics, and Image Processing, Vol. 29, pp. 336-357, 1985.

[15] http://www.cis.hut.fi/projects/ica/imageica

 
