# Algorithm Development for Sparse Signal Recovery and Performance Limits Using Multiple-User Information Theory

TABLE OF CONTENTS

- Signature Page .... iii
- Dedication .... iv
- Epigraph .... v
- Table of Contents .... vi
- List of Figures .... x
- List of Tables .... xii
- Acknowledgements .... xiii
- Vita and Publications .... xvii
- Abstract of the Dissertation .... xix
- Chapter 1: Introduction .... 1
  - 1.1 Background .... 3
    - 1.1.1 Sparse Signal Recovery with SMV .... 3
    - 1.1.2 Sparse Signal Recovery with MMV .... 5
  - 1.2 Main Contributions of the Thesis .... 7
    - 1.2.1 Performance Limits of Support Recovery .... 7
    - 1.2.2 Algorithm Design and Analysis Using Multiuser Information Theoretic Techniques .... 8
    - 1.2.3 Robust Linear Regression .... 10
    - 1.2.4 Adaptive Filtering Algorithms with Sparsity Concerns .... 11
  - 1.3 Thesis Outline .... 11
- Chapter 2: Performance Limits of Support Recovery Using Multiuser Information Theory .... 13
  - 2.1 Introduction .... 13
  - 2.2 Formal Definition of the Problem .... 15
  - 2.3 A Multiuser Information Theoretic Perspective on Sparse Signal Recovery .... 17
    - 2.3.1 Brief Review on the Single-Input Multiple-Output MAC .... 17
    - 2.3.2 Similarities to the Problem of Support Recovery .... 19
    - 2.3.3 Key Differences .... 20
  - 2.4 Support Recovery with SMV: Main Results and Implications .... 21
    - 2.4.1 Fixed Number of Nonzero Entries .... 22
    - 2.4.2 Growing Number of Nonzero Entries .... 24
    - 2.4.3 Further Discussions .... 27
  - 2.5 Support Recovery with MMV: Main Results and Implications .... 30
    - 2.5.1 Main Results .... 30
    - 2.5.2 The Low-Noise-Level Scenario .... 32
    - 2.5.3 The Role of the Nonzero Signal Matrix .... 35
    - 2.5.4 A Generalization of W .... 38
    - 2.5.5 Relation to Performance Limit of Multivariate Group Lasso .... 40
  - 2.6 Random Nonzero Entries .... 41
  - 2.7 Acknowledgements .... 44
  - 2.8 Appendices .... 45
    - 2.8.1 Proof of Theorem 1 .... 45
    - 2.8.2 Proof of Theorem 2 .... 61
    - 2.8.3 Proof of Theorem 3 .... 67
    - 2.8.4 Proof of Theorem 4 .... 72
    - 2.8.5 Proof of Theorem 5 .... 75
    - 2.8.6 Proof of Theorem 6 .... 94
    - 2.8.7 Proof of Corollary 4 .... 100
- Chapter 3: Algorithm Design and Analysis Using Multiuser Information Theoretic Perspectives .... 104
  - 3.1 Introduction .... 105
  - 3.2 The MultiPass Algorithmic Framework and the MultiPass Lasso Algorithms .... 107
    - 3.2.1 Motivation of the MultiPass Algorithmic Framework .... 107
    - 3.2.2 The MultiPass Lasso Algorithm (MPL) .... 108
    - 3.2.3 Observations on MPL .... 109
    - 3.2.4 Preliminary Analysis of MPL .... 110
    - 3.2.5 The Reweighted MultiPass Lasso Algorithm (RMPL) .... 112
  - 3.3 Experimental Study .... 113
    - 3.3.1 MPL: Selection of Regularization Parameter λ^(l) .... 114
    - 3.3.2 RMPL: Selection of ϵ and q_max .... 115
    - 3.3.3 MPL and RMPL: Selection of s_max .... 115
    - 3.3.4 Comparison with BP, Lasso, and Reweighted ℓ1 Minimization .... 117
    - 3.3.5 Computational Efficiency of MPL and RMPL .... 118
  - 3.4 Performance Limit of Orthogonal Matching Pursuit .... 119
    - 3.4.1 Orthogonal Matching Pursuit (OMP) .... 120
    - 3.4.2 Connection to Successive Interference Cancellation .... 121
    - 3.4.3 Intuitive Necessary Condition for OMP to Succeed .... 121
    - 3.4.4 Observations on the Performance Limit of OMP .... 124
    - 3.4.5 Experiments .... 125
  - 3.5 Acknowledgements .... 129
  - 3.6 Appendices .... 129
    - 3.6.1 Probability Lower Bound for Lasso .... 129
    - 3.6.2 Derivation of Probability Lower Bound for MPL .... 132
- Chapter 4: Robust Linear Regression by Exploiting the Connection to Sparse Signal Recovery .... 141
  - 4.1 Introduction .... 142
    - 4.1.1 Background .... 143
  - 4.2 The Two-Component Model of Measurement Noise .... 144
  - 4.3 Sparse Signal Recovery Algorithms for Robust Linear Regression .... 146
    - 4.3.1 Maximum a Posteriori (MAP) Based Robust Regression .... 146
    - 4.3.2 Empirical Bayesian Inference Based Robust Regression .... 147
  - 4.4 Experiments .... 150
    - 4.4.1 Simulated Data Sets .... 150
    - 4.4.2 Brownlee's Stackloss Data Set .... 152
    - 4.4.3 Bupa Liver Data Set .... 154
  - 4.5 Acknowledgements .... 155
- Chapter 5: LMS Type Adaptive Filtering Algorithms that Incorporate Sparsity .... 156
  - 5.1 Introduction .... 157
  - 5.2 Problem Formulation and Background .... 161
    - 5.2.1 Problem Formulation .... 161
    - 5.2.2 Background on Adaptive Filters Exploiting Sparsity .... 162
  - 5.3 An Algorithmic Framework for Adaptive Filtering with Sparsity Concerns .... 166
    - 5.3.1 Prototype LMS Adaptive Filtering Algorithm .... 166
    - 5.3.2 Relation to ANG Algorithms .... 170
    - 5.3.3 Prototype NLMS Adaptive Filtering Algorithm .... 173
    - 5.3.4 Steady-State Performance Analysis of Prototype Algorithms .... 173
  - 5.4 The pALMS and pANLMS Algorithms .... 179
    - 5.4.1 Derivation and Discussion of pALMS .... 179
    - 5.4.2 Steady-State Performance Analysis of pALMS .... 182
    - 5.4.3 Derivation and Discussion of pANLMS .... 183
    - 5.4.4 Steady-State Performance Analysis of pANLMS .... 185
  - 5.5 Experiments .... 186
    - 5.5.1 Experimental Justification of Theorem 10 for pALMS .... 187
    - 5.5.2 The Effect of Parameters .... 187
    - 5.5.3 Comparisons to Existing LMS Type Algorithms .... 191
    - 5.5.4 Comparisons to Existing NLMS Type Algorithms .... 197
  - 5.6 Acknowledgements .... 199
  - 5.7 Appendices .... 200
    - 5.7.1 Proof of Theorem 9 .... 200
    - 5.7.2 Derivation of a General NLMS Type Algorithm Using Affine Scaling Transform .... 201
    - 5.7.3 Proofs of Theorems 10 and 11 .... 202
- Chapter 6: Concluding Remarks .... 209
  - 6.1 Summary of Contributions .... 209
  - 6.2 Suggestions for Future Research .... 211
- Bibliography .... 215

LIST OF FIGURES

- Figure 3.1: Comparisons of performance lower bounds for Lasso and MPL. Parameters: (a) n = 800, σ^2 = 1, δ = 3, x_high = 20, k_h = 10, x_low = 10, k_l = 5; (b) n = 800, m = 10^5, x_high = α·x_low, σ^2 = 1, δ = 3, k_h = 10, k_l = 5. .... 112
- Figure 3.2: The role of γ in λ^(l) for MPL. .... 114
- Figure 3.3: Parameter selection for RMPL. .... 115
- Figure 3.4: The role of s_max for MPL and RMPL. In each case, the upper plot shows support recovery rate, whereas the lower plot shows success rate. .... 116
- Figure 3.5: Sparse signal recovery in different settings. In each case, the upper plot shows support recovery rate, whereas the lower plot shows success rate. .... 117
- Figure 3.6: Comparison of algorithm efficiency. Each column presents the performance and the corresponding run time of algorithms implemented via the same Lasso solver specified in the title. The upper row indicates the empirical probability of success, and the lower row illustrates the average time consumed per experiment. .... 119
- Figure 3.7: Capacity region of a 2-sender Gaussian MAC. .... 122
- Figure 3.8: Performance of OMP in different scenarios. In (a), (b), (c): m = 10^4, k = 7. In (d): n = 100, m = 1000, k = 7, w_i = 0.4, i ∈ [k]. .... 127
- Figure 3.9: The performance of support recovery of OMP with large measurement matrices (m = ⌊2^{0.2n}⌋, k = 7). .... 128
- Figure 4.1: Empirical bias and variance (symmetric outlier case). .... 151
- Figure 4.2: Empirical bias and variance (asymmetric outlier case; legends are the same as in Figure 4.1). .... 152
- Figure 4.3: Index plots for different regression algorithms. The interval [−2.5, 2.5] is marked by red lines for inspecting outliers. .... 153
- Figure 5.1: The application scenario for the experimental study. .... 186
- Figure 5.2: Experimental justification of the theoretic estimation of MSE for pALMS. In each plot, the horizontal lines indicate the MSE predicted by the theorems. For each choice of µ, the color of the predicted MSE matches the color of the empirical learning curve of pALMS. .... 188
- Figure 5.3: The effect of p in pALMS. .... 189
- Figure 5.4: The effect of β in pALMS. .... 190
- Figure 5.5: The effect of p in pANLMS. .... 191
- Figure 5.6: Performance comparison of LMS type algorithms (white input). .... 193
- Figure 5.7: pALMS: mean of predictor (p = 1, white input). .... 193
- Figure 5.8: pALMS: mean of predictor (p = 1.4, white input). .... 194
- Figure 5.9: Performance comparison of LMS type algorithms (correlated input). .... 196
- Figure 5.10: pALMS: mean of predictor (p = 1, correlated input). .... 196
- Figure 5.11: pALMS: mean of predictor (p = 1.2, correlated input). .... 197
- Figure 5.12: Performance comparison of NLMS type algorithms (white input). .... 198
- Figure 5.13: Performance comparison of NLMS type algorithms (correlated input). .... 199

LIST OF TABLES

- Table 2.1: Sufficient Conditions for Support Recovery in Different Sparsity Regions .... 26
- Table 2.2: Sufficient Conditions for Support Recovery in Existing Literature .... 26
- Table 2.3: Necessary Conditions for Support Recovery .... 27
- Table 2.4: Bounds under different scenarios .... 37
- Table 4.1: Regression results for LS and Alg1 .... 153
- Table 4.2: Regression results for the Bupa liver data set .... 154
- Table 4.3: F-values of different algorithms .... 154
- Table 4.4: Final regression results .... 155

ACKNOWLEDGEMENTS

Working toward a Ph.D. is a journey.

First and foremost, I am deeply indebted to my advisor, Professor Bhaskar Rao, who has been extremely patient and helpful during my study. Professor Rao has taught me everything about doing research and, more importantly, being a professional. He also gave me a great deal of freedom in choosing the research area of my interest, allowing me to grow to the best of my capabilities. From his guidance, I learned that the value of research is the intellectual progression and expansion that lets us push the boundaries of our knowledge. The expertise he generously shared with me, and the industry experience he helped me to obtain, have become my greatest assets.

Another great pleasure during my Ph.D. study has been information theory. This is an area that I found difficult to grasp in the beginning but extremely powerful as my understanding developed. Professor Young-Han Kim demonstrated the power of network information theory in his lectures, where the techniques I learned became the foundation of my research. Professor Kim welcomed me into the domain of information theory and inspired me to appreciate its beauty. He taught me many useful mathematical tools, especially large deviation theory, which made my theoretical work possible. He also patiently helped me to improve my presentation skills, particularly on effectively communicating complex ideas.

I would like to thank the other members of my Ph.D. committee, Professors Ken Kreutz-Delgado, Sanjoy Dasgupta, William Hodgkiss, and Dr. Scott Makeig, for their insightful discussions during my thesis proposal and many research meetings. Meanwhile, I thank Professor Nuno Vasconcelos for generously permitting me access to the resources of the Statistical Visual Computing Lab during my first months at UCSD. The servers named after mathematicians in his lab enabled my timely completion of the costly EM iterations for the Cheetah project. Professor Truong Nguyen and his group granted me access to the printer in the Video Processing Lab, which has printed several (indeed, many) big piles of papers for me over these years. Professor Rajesh Hegde helped me a great deal when I started my study as a Ph.D. student. Under his guidance, I coauthored my first research paper ever. Ananthapadmanabhan Kandhadai and Jeff (Pengjun) Huang at Qualcomm kindly offered me several summer internships to practice my expertise in signal processing on commercialized speech coders. Dr. Kuansan Wang at Microsoft Research guided me in solving web service problems using signal processing techniques and led me to a promising industry. I am very grateful for all their help, support, and career advice.

I am very lucky to have met my girlfriend, Yuan Zhi, during my Ph.D. study. To all, she exudes kindness, laughter, and strength. To me, she is also the light in my difficult times and the determination standing by all my decisions.

Many friends have helped me and enlivened my life here. I gained many analytical perspectives from senior students including Dashan Gao, Jun Zheng, Junwen Wu, Min Li, Yanhua Mao, Jing Zhu, Dayou Zhou, Wenyi Zhang, Chengmo Yang, Shankar Shivappa, Yogananda Isukapalli, Ling Zhang, Zhou Lu, Lingyun Zhang, Honghao Shan, Yushi Shen, Xinran Wu, Hairuo Zhuang, and Meng-Ping (Ben) Kao. I want to thank my lab members, Zhilin Zhang, Liwen Yu, Ali Masnadi-Shirazi, Ethan Duni, Bongyong Song, Anh Nguyen, and Eddy (Hwan Joon) Kwon, for stimulating discussions. Meanwhile, I enjoyed many chats with Renshen Wang, Wanping Zhang, Bo Hu, Luo Gu, Yi Jing, Yan Gao, Mingjing Chen, Liang Feng, David Wipf, Ali Karimian, Matt Pugh, Natan Jacobson, Emanuele Coviello, Paul Cuff, Hamed Masnadi-Shirazi, Alex Rasmussen, and Ozan Koyluoglu. A reward for staying here long enough is that I got to know many friends who came after I joined UCSD. It is my pleasure to have met Fei Liu, Wensong Xu, Yu Xiang, Lele Wang, Minghai Qin, Chiao-Yi Chen, Shengjun Pan, Hao Zheng, Jianjian Gao, Sheusheu Tan, Peng Du, Daqian Jin, Peng Wang, Enming Luo, Yanqin Jin, Yupeng Fu, Jing Zheng, and Yujia Wang. Discussions with them are always entertaining and inspiring.

None of my career advances could have been achieved without the selfless love and unconditional support from my parents. This work is the best present to embody my love and appreciation for them.

The material of this thesis has been published, is submitted for possible publication, or is in preparation for submission.

Chapter 2 is, in part, a reprint of material in the paper "Limits on Support Recovery of Sparse Signals via Multiple Access Communication Techniques," Y. Jin, Y.-H. Kim, and B. D. Rao, submitted to IEEE Transactions on Information Theory, 2010; the paper "Performance Tradeoffs for Exact Support Recovery of Sparse Signals," Y. Jin, Y.-H. Kim, and B. D. Rao, published in the proceedings of the International Symposium on Information Theory, 2010; the paper "Sparse Signal Recovery in Presence of Multiple Measurement Vectors," Y. Jin and B. D. Rao, in preparation for IEEE Transactions on Information Theory; and the paper "On the Role of the Properties of the Non-zero Entries on Sparse Signal Recovery," Y. Jin and B. D. Rao, published in the proceedings of the Asilomar Conference on Signals, Systems and Computers, Nov. 2010. In all cases, I was the primary author, and B. D. Rao supervised the research. Y.-H. Kim contributed to the paper "Limits on Support Recovery of Sparse Signals via Multiple Access Communication Techniques" and the paper "Performance Tradeoffs for Exact Support Recovery of Sparse Signals."

Chapter 3 is, in part, a reprint of the paper "MultiPass Lasso Algorithms for Sparse Signal Recovery," Y. Jin and B. D. Rao, published in the proceedings of the International Symposium on Information Theory, 2011, and the paper "Performance Limit of Matching Pursuit Algorithms," Y. Jin and B. D. Rao, published in the proceedings of the International Symposium on Information Theory, July 2008. In both cases, I was the primary author, and B. D. Rao supervised the research.

Chapter 4 is, in part, a reprint of the material published as "Algorithms for Robust Linear Regression by Exploiting the Connection to Sparse Signal Recovery," Y. Jin and B. D. Rao, International Conference on Acoustics, Speech, and Signal Processing, March 2010. I was the primary author, and B. D. Rao supervised the research.

Chapter 5 is, in part, a reprint of the paper "LMS Type Adaptive Filtering Algorithms that Incorporate Sparsity," Y. Jin, B. Song, and B. D. Rao, submitted to IEEE Transactions on Signal Processing. I was the primary author, B. Song contributed to the research, and B. D. Rao supervised the research.

VITA AND PUBLICATIONS

2005  B.S. in Computer Science and Technology, Tsinghua University, Beijing, China.

2008  Research Intern, Internet Services Research Center, Microsoft Research, Redmond, WA.

2009  Intern, Multimedia R&D and Standard Group, Qualcomm Inc., San Diego, CA.

2005-2011  Research Assistant, University of California, San Diego, La Jolla, CA.

2011  Ph.D. in Electrical Engineering (Signal and Image Processing), University of California, San Diego, La Jolla, CA.

Y. Jin, Y.-H. Kim, and B. D. Rao, "Limits on Support Recovery of Sparse Signals via Multiple Access Communication Techniques," submitted to IEEE Transactions on Information Theory, 2010.

Y. Jin, B. Song, and B. D. Rao, "LMS Type Adaptive Filtering Algorithms that Incorporate Sparsity," submitted to IEEE Transactions on Signal Processing, 2011.

Y. Jin and B. D. Rao, "Sparse Signal Recovery in Presence of Multiple Measurement Vectors," in preparation for IEEE Transactions on Information Theory.

Y. Jin and B. D. Rao, "MultiPass Lasso Algorithms for Sparse Signal Recovery," International Symposium on Information Theory, July 2011.

Y. Jin and B. D. Rao, "On the Role of the Properties of the Non-zero Entries on Sparse Signal Recovery," Asilomar Conference on Signals, Systems and Computers, November 2010.

Y. Jin, Y.-H. Kim, and B. D. Rao, "Performance Tradeoffs for Exact Support Recovery of Sparse Signals," International Symposium on Information Theory, June 2010.

Y. Jin and B. D. Rao, "Algorithms for Robust Linear Regression by Exploiting the Connection to Sparse Signal Recovery," International Conference on Acoustics, Speech, and Signal Processing, March 2010.

Y. Jin and B. D. Rao, "Performance Limit of Matching Pursuit Algorithms," International Symposium on Information Theory, July 2008.

Y. Jin and B. D. Rao, "Insights into the Stable Recovery of Sparse Solutions in Overcomplete Representations Using Network Information Theory," International Conference on Acoustics, Speech, and Signal Processing, April 2008.

R. M. Hegde, Y. Jin, and B. D. Rao, "Spectral Estimation of Voiced Speech Using a Family of MVDR Estimates," International Conference on Acoustics, Speech, and Signal Processing, April 2007.

ABSTRACT OF THE DISSERTATION

Algorithm Development for Sparse Signal Recovery and Performance Limits Using Multiple-User Information Theory

by

Yuzhe Jin

Doctor of Philosophy in Electrical Engineering (Signal and Image Processing)

University of California, San Diego, 2011

Bhaskar D. Rao, Chair

The problem of sparse signal recovery can be formulated as the problem of finding a sparse solution X ∈ R^m to an underdetermined system of equations

Y = AX + Z,

where A ∈ R^{n×m}, n < m, is the measurement matrix, Z ∈ R^n is the measurement noise, and Y ∈ R^n is the noisy measurement. Let k be the number of nonzero entries in X. Then, X is sparse when k ≪ m.

First, we study the performance limits of support recovery, i.e., the recovery of the locations of the nonzero entries of X. By connecting sparse signal recovery to communication over a multiple access channel (MAC), we show that the MAC capacity region sheds light on the performance limits of support recovery. Sufficient and necessary conditions are derived to reveal the roles of the model parameters, such as the dimension of the signal m, the number of measurements n, and especially the magnitudes of the nonzero entries of X, in ensuring asymptotically successful support recovery. By drawing an analogy with the single-input multiple-output MAC, the information theoretic results are extended to the multiple measurement vectors (MMV) problem to shed light on the role of multiple measurements.

Next, we propose a multiple-pass algorithmic framework which exploits the variety in the dynamic range of the nonzero entries. Specifically, nonzero entries with similar magnitudes are jointly detected as a group, and different groups are detected sequentially. The MultiPass Lasso algorithm and its variant are studied in detail, with experimental results demonstrating their performance improvement over their one-pass counterparts, respectively, in reconstruction accuracy and computational complexity.

We also examine two novel applications of sparse signal recovery. For robust linear regression, we model the outliers in the observations, which occur infrequently, as the nonzero entries of a sparse vector, and derive methods rooted in sparse signal recovery to tackle this problem. For adaptive filtering with sparse predictors, we employ a general diversity measure for sparse signal recovery to develop a prototype least-mean-squares (LMS) type algorithm, in which the optimization is performed in the affine scaling domain to expedite the convergence. Steady-state analysis is conducted to provide insight into the performance. As two instantiations, the pALMS and pANLMS algorithms are studied in detail. Experimental results support the potential of sparse signal recovery techniques in both applications.

Chapter 1

Introduction

The problem of sparse signal recovery can be formulated as the problem of finding a sparse solution to an underdetermined system of equations

Y = AX + Z    (1.1)

where X ∈ R^m is the signal of interest, A ∈ R^{n×m} is the measurement matrix with n < m, Z ∈ R^n is the measurement noise, and Y ∈ R^n is the noisy measurement. Let k denote the number of nonzero entries of X; X is said to be sparse when k ≪ m. The problem with model (1.1) is usually referred to as sparse signal recovery with a single measurement vector (SMV).

When considered in an application context, the rows or columns of A usually form a physically meaningful model, and the nonzero elements of X are usually of interest and need to be identified. As one example, in the problem of the estimation of a wireless multipath channel [5, 29], each row of A represents a segment of the training sequence, and X represents the coefficients of the sparse multipath channel to be estimated. As another example, in the applications of electroencephalography (EEG) and magnetoencephalography (MEG) [59, 60], the columns of A can represent the instantaneous impulse responses of the brain sources on the EEG or MEG sensors, and identifying the nonzero entries in the vector X can be viewed as localizing the neuronal activities for a given observation Y. Recently, the development of compressed sensing [16, 39] has provided flexibility in designing the matrix A. In this case, each row of A can be randomly generated according to a certain probability distribution, and hence Y contains n random projections of X onto the rows of A. The ability to recover a sparse vector X from as few random projections as possible offers a universal approach to signal acquisition and compression that is independent of the domain-specific structures of the signals. Further, (1.1) also serves as the underlying mathematical model for many other applications, such as image processing [71], [40], robust face recognition [135], bandlimited extrapolation and spectral estimation [13], speech processing [27], echo cancellation [41], [101], body area networks [53], and wireless communication [62].

It is worthwhile to note the growing recognition of the importance of the area of sparse signal recovery, as evidenced by numerous special research journal issues, conference workshops, and sessions. One of the earliest full sessions, if not the first session, on sparse signal recovery was organized by Yoram Bresler and Bhaskar Rao at ICASSP, 1998. Since then, many conference sessions and journal issues have been dedicated to the issue of sparsity. Some recent notable examples include several tutorial sessions at the ITA Workshop, 2008; a panel session at ICASSP, 2008; a special issue of the IEEE Journal of Selected Topics in Signal Processing, 2010; and a special issue of the IEEE Journal of Selected Topics in Signal Processing, 2011. Overall, sparse signal recovery has become an important and promising area with both theoretical and practical potential.

1.1 Background

1.1.1 Sparse Signal Recovery with SMV

For the problem of sparse signal recovery with SMV, computationally efficient algorithms have been proposed to find or approximate the sparse solution X ∈ R^m in various settings. A partial list includes matching pursuit [82], orthogonal matching pursuit (OMP) [94], Lasso [113], basis pursuit [24], FOCUSS [60], iteratively reweighted ℓ1 minimization [20], iteratively reweighted ℓ2 minimization [21], sparse Bayesian learning (SBL) [114, 130], finite rate of innovation [120], CoSaMP [88], and subspace pursuit [34]. Analysis has been developed to shed light on the performance of these practical algorithms. For example, Donoho [39], Donoho, Elad, and Temlyakov [36], Candès and Tao [19], and Candès, Romberg, and Tao [18] presented sufficient conditions for ℓ1-norm minimization algorithms, including basis pursuit and its variant in the noisy setting, to successfully recover the sparse signals with respect to different performance metrics. Wainwright [123] and Zhao and Yu [140] provided sufficient and necessary conditions for Lasso to recover the support of the sparse signal, i.e., the set of indices of the nonzero entries. Tropp [115], Tropp and Gilbert [116], and Donoho, Tsaig, Drori, and Starck [37] studied the performance of greedy sequential selection methods such as matching pursuit and its variants. On the other hand, from an information theoretic perspective, a series of papers, for instance, Wainwright [122], Fletcher, Rangan, and Goyal [51], Wang, Wainwright, and Ramchandran [124], and Akçakaya and Tarokh [2], provided sufficient and necessary conditions that indicate the performance limits of optimal algorithms for support recovery, regardless of computational complexity.

In addition to the batch estimation techniques, which are based on blocks of measurements, the development of adaptive algorithms with sparsity concerns has recently become an area of growing interest. The advantages of adaptive algorithms include their relatively lower computational complexity and their ability to track the system dynamics. As a notable example, the proportionate NLMS (PNLMS) algorithm was developed by Duttweiler [41] to deal with the sparse channel estimation that arises in echo cancellers. Although this algorithm was not formally derived by minimizing an underlying objective function, it was well motivated, with theoretical analysis and simulations to support its effectiveness in adaptively estimating the sparse predictor. Other recent adaptive filtering algorithms with sparsity concerns include PNLMS++ [54], improved PNLMS (IPNLMS) [7], improved IPNLMS (IIPNLMS) [33], zero attracting LMS (ZA-LMS) and reweighted zero attracting LMS (RZA-LMS) [25, 26], and ℓ0-LMS and ℓ0-NLMS [61]. Performance analyses have been derived for the algorithms above to support their usage in practical applications.
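As a concrete illustration of the greedy sequential selection principle behind OMP, the following is a minimal, textbook-style sketch run on a synthetic instance of model (1.1). It is not the exact formulation analyzed later in the thesis, and the dimensions, support locations, and magnitudes below are illustrative choices only:

```python
import numpy as np

def omp(A, y, k):
    """Textbook orthogonal matching pursuit: greedily select the column
    most correlated with the current residual, then refit the selected
    coefficients by least squares and deflate the measurement."""
    m = A.shape[1]
    support, coef = [], np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        j = int(np.argmax(np.abs(A.T @ residual)))   # best-aligned column
        if j not in support:
            support.append(j)
        # least-squares refit on the currently selected columns
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef          # remove explained part
    x_hat = np.zeros(m)
    x_hat[support] = coef
    return x_hat, sorted(support)

# Synthetic noiseless SMV instance: n = 40 measurements, m = 80 columns,
# k = 3 nonzero entries with distinct magnitudes (illustrative values).
rng = np.random.default_rng(0)
n, m, k = 40, 80, 3
A = rng.standard_normal((n, m)) / np.sqrt(n)
x = np.zeros(m)
x[[5, 17, 60]] = [5.0, -4.0, 3.0]
y = A @ x
x_hat, est_support = omp(A, y, k)
print(est_support)
```

In this noiseless, well-separated regime greedy selection typically succeeds; the contrast drawn in the text is that it degrades when the nonzero magnitudes are similar, whereas joint recovery methods such as Lasso handle that case but may not exploit magnitude disparity.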

5 1.1.2 Sparse Signal Recovery with MMV An emerging trend in the area of sparse signal recovery is the capability of col- lecting multiple measurement vectors in an increasing number of applications,such as EEG and MEG [128,136],blind source separation [28],source localization [80],multi- variate regression [89],and direction of arrival estimation [111].This gives rise to the problemof sparse signal recovery with multiple measurement vectors (MMV) [30],for which the model is given by Y = AX +Z (1.2) where X ∈ R m×l is the (matrix) signal of interest,A ∈ R n×m is again the measurement matrix,Z ∈ R n×l is the measurement noise,and Y ∈ R n×l is the noisy measurement, for l > 1.Let k be the number of nonzero rows in X,and X is sparse when k ≪m. Practical algorithms have been developed to address the new challenges in this scenario.One class of algorithms for solving the MMV problem can be viewed as straightforward extensions based on their counterparts for the SMVproblem.To sample a few,M-OMP [29,117],M-FOCUSS [29],ℓ 1 /ℓ 2 minimization method 1 [46],multi- variate group Lasso [89],and M-SBL [134] can be all viewed as examples of this kind. Another class of algorithms additionally make explicit effort to exploit the structure underlying the sparse signal X,such as the temporal correlation or the autoregressive 1 This method is sometimes referred to as ℓ 2 /ℓ 1 minimization,due to the naming convention in a speciﬁc paper.In this thesis,we use ℓ 1 /ℓ p to indicate a cost of a matrix B which is deﬁne as ∑ i |( ∑ j |b i,j | p ) 1/p |.

6 nature across the columns of X which would be otherwise unavailable when l = 1,to aim for better performance of sparse signal recovery.For instance,the improved M- FOCUSS algorithms [136],auto-regressive sparse Bayesian learning (AR-SBL) [138], and T-SBL [139] all have the capability of explicitly taking advantage of the structural properties of X to improve the recovery performance.Alone side the algorithmic ad- vancement,a series of work have been focusing on the theoretical analysis to support the effectiveness of existing algorithms for the MMV problem.We brieﬂy divide these results into two categories.The ﬁrst category of theoretic analysis aims at practical algorithms for sparse signal recovery with MMV.For example,Chen and Huo [23] dis- covered the sufﬁcient conditions for ℓ 1 /ℓ p norm minimization method and orthogonal matching pursuit to exactly recover every sparse signal within certain sparsity level in the noiseless setting.Eldar and Rauhut [47] also analyzed the performance of sparse signal recovery using the ℓ 1 /ℓ 2 norm minimization method in the noiseless setting,but the sparse signal was assumed to be randomly distributed according to certain probabil- ity distribution and the performance was averaged over all possible realizations of the sparse signal.Obozinski,Wainwright,and Jordan [89] provided sufﬁcient and necessary conditions for multivariate group Lasso to successfully recover the support of the sparse signal 2 in the presence of measurement noise.The second category of theoretic analysis are of an information theoretic nature,and explore the performance limits that any al- gorithm,regardless of computational complexity,could possibly achieve.In this regard, Tang and Nehorai [111] employed a hypothesis testing framework with the likelihood 2 We refer to the support of a matrix X as the set of indices corresponding to the nonzero rows of X.

7 ratio test as the optimal decision rule to study howfast the error probability decays.Suf- ﬁcient and necessary conditions are further identiﬁed in order to guarantee successful support recovery in the asymptotic sense. 1.2 Main Contributions of the Thesis The main contributions of the thesis focus on the theory,algorithms,and appli- cations of sparse signal recovery.We brieﬂy preview these aspects in this section. 1.2.1 Performance Limits of Support Recovery The problemof support recovery of sparse signals concerns the identiﬁcation of the locations of the nonzero entries of X in the SMV problem (or the nonzero rows of X in the MMVproblem).This problemarises in applications where the locations of the nonzero entries capture important information about the underlying signal activity.As an example,in medical imaging applications such as MEG and EEG,it is desirable to localize active neuronal activities which corresponds to the detection of the locations of the nonzero entries of X. To address this problem,we proposed a connection between the problems of sparse signal recovery and multiuser communication.Let us consider the SMVproblem for the discussion to follow.Based on this connection,the columns of the measurement matrix A form a common codebook for all senders.Codewords from the senders are individually multiplied by unknown channel gains,which correspond to nonzero entries

of X. Then, the noise-corrupted linear combination of these codewords is observed. Thus, support recovery can be interpreted as decoding messages from multiple senders. Using techniques rooted in multiuser communication but suitably modified for the support recovery problem, we develop sharp necessary and sufficient conditions for support recovery of sparse signals to be asymptotically successful. For example, when k is fixed, we show that n = (log m)/c(X) is both sufficient and necessary. We give a complete characterization of c(X), which depends on the values of all nonzero entries of X. This result provides clear insight into the role of the nonzero entries in support recovery, and improves upon many existing results in which only the minimum nonzero magnitude entered the performance tradeoffs. We also provide performance limits for the scenarios with increasing k, random nonzero activities, and MMV, respectively. In particular, the performance limit for the MMV model, which is enabled by a connection to the single-input multiple-output (SIMO) MAC communication problem, suggests the potential performance improvement afforded by having multiple measurement vectors.

1.2.2 Algorithm Design and Analysis Using Multiuser Information Theoretic Techniques

One can broadly classify the existing algorithms for sparse signal recovery into two categories: sequential selection methods, such as matching pursuit and orthogonal matching pursuit, and joint recovery methods, such as basis pursuit and Lasso. It has been observed that sequential selection methods can deal well with sparse signals

whose nonzero entries have very different magnitudes, but perform poorly for sparse signals with similar magnitudes. In contrast, joint recovery methods handle sparse signals with similar magnitudes well, but they may not exploit the disparity in nonzero magnitudes. In practice, the nonzero entries of a sparse signal can be modeled as clusters, with each cluster comprising a group of nonzero entries with comparable magnitudes. The variation in the dynamic range of the signals poses a new challenge for algorithmic development.

Motivated by the group detector for multiuser detection, we explore the opportunity of merging these different algorithmic design principles to develop novel algorithms suitable for practical signals. We propose the MultiPass algorithmic framework, which sequentially detects different groups of nonzero entries. This is achieved by applying a joint recovery method at each iteration, with the goal of identifying, with high probability, a subset of the support on which the nonzero entries have comparable magnitudes. The MultiPass Lasso (MPL) algorithm is developed as an instantiation of this algorithmic framework. In addition, we propose the Reweighted MultiPass Lasso algorithm, which utilizes MPL as a component. Experimental studies show that the proposed algorithms yield improved estimation accuracy and computational efficiency. Theoretical analysis is also provided to support the performance improvement of the MultiPass Lasso algorithm.

We then focus on the performance limit of orthogonal matching pursuit (OMP) using the connection between OMP and the successive interference cancellation (SIC) scheme for multiuser detection. Inspired by the decoding criterion for SIC, we propose an intuitive necessary condition for OMP to successfully recover the support of the sparse signal. Experiments and discussions are provided to shed light on the performance limit of OMP.

1.2.3 Robust Linear Regression

The problem of robust linear regression focuses on learning a linear model from data in the presence of outliers, which are observations that “deviate so much from other observations as to arouse suspicion that they were generated by a different mechanism” [67]. It is important to recognize the impact of outliers on modeling and analysis; since some methods, such as least-squares (LS) estimation, are very sensitive to outliers, the need for robust regression methods becomes evident.

We leverage techniques for sparse signal recovery to design novel algorithms for robust regression. The key aspect of our approach is the use of a two-component model for the measurement noise, where one component accounts for the regular Gaussian noise and the other captures the outliers in the data. Since outliers occur infrequently, the component vector representing the outliers can be viewed as a sparse vector. We propose a maximum-a-posteriori (MAP) based method and a sparse Bayesian learning (SBL) based method to jointly estimate the regression coefficients as well as the outlier component. Experimental results on simulated and real data sets support the effectiveness of the proposed algorithms.
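To make the two-component noise model concrete, the following is a minimal sketch, not the MAP or SBL methods developed in this thesis: it poses the joint estimation of the regression coefficients β and the sparse outlier vector e as the convex program min over (β, e) of 0.5‖y − Aβ − e‖² + λ‖e‖₁, solved by alternating an exact least-squares update of β with a soft-thresholding update of e. The regularization weight λ, the data sizes, and the outlier magnitudes below are illustrative choices, not values from the thesis.

```python
import numpy as np

def robust_regress(A, y, lam=0.5, n_iter=100):
    """Alternating minimization (a sketch of the two-component noise model) for
        min_{beta, e}  0.5 * ||y - A @ beta - e||^2 + lam * ||e||_1,
    where e is a sparse vector absorbing the outliers."""
    e = np.zeros_like(y)
    for _ in range(n_iter):
        # Exact least-squares update of beta on the outlier-corrected data.
        beta, *_ = np.linalg.lstsq(A, y - e, rcond=None)
        # Soft-threshold the residual: entries larger than lam are flagged
        # as outliers; small (Gaussian-level) residuals give e_i = 0.
        r = y - A @ beta
        e = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)
    return beta, e

# Illustrative data: 60 samples, 3 coefficients, 5 gross outliers.
rng = np.random.default_rng(0)
n = 60
A = rng.standard_normal((n, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = A @ beta_true + 0.05 * rng.standard_normal(n)
y[:5] += 8.0                                      # inject five outliers

beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)   # ordinary least squares
beta_rob, e_hat = robust_regress(A, y, lam=0.5)   # robust fit
```

Because the objective is jointly convex in (β, e) and the two updates are exact block minimizations, the iteration converges to a global minimizer; the support of the recovered e then points directly at the corrupted observations, which is the sense in which outlier rejection becomes a sparse recovery problem.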