Nonlinear Programming: Third Edition, 2016


The third edition of the book is a thoroughly rewritten version of the 1999 second edition. New material was included, some of the old material was discarded, and a large portion of the remainder was reorganized or revised. The total number of pages has increased by about 10 percent.

Aside from incremental improvements, the changes aim to bring the book up to date with recent research progress, and in harmony with the major developments in convex optimization theory and algorithms that have occurred in the meantime. These developments were documented in three of my books: the 2015 book ``Convex Optimization Algorithms," the 2009 book ``Convex Optimization Theory," and the 2003 book ``Convex Analysis and Optimization" (coauthored with Angelia Nedic and Asuman Ozdaglar). A major difference is that these books have dealt primarily with convex, possibly nondifferentiable, optimization problems and rely on convex analysis, while the present book focuses primarily on algorithms for possibly nonconvex differentiable problems, and relies on calculus and variational analysis.

Having written several interrelated optimization books, I have come to see nonlinear programming and its associated duality theory as the linchpin that holds together deterministic optimization. I have consequently set as an objective for the present book to integrate the contents of my books, together with internet-accessible material, so that they complement each other and form a unified whole. I have thus provided bridges to my other works with extensive references to generalizations, discussions, and elaborations of the analysis given here, and I have used throughout fairly consistent notation and mathematical level.

Another connecting link of my books is that they all share the same style: they rely on rigorous analysis, but they also aim at an intuitive exposition that makes use of geometric visualization. This stems from my belief that success in the practice of optimization strongly depends on the intuitive (as well as the analytical) understanding of the underlying theory and algorithms.

Some of the more prominent new features of the present edition are:

  1. An expanded coverage of incremental methods and their connections to stochastic gradient methods, based in part on my 2000 joint work with Angelia Nedic; see Sections 2.4 and 7.3.2.
  2. A discussion of asynchronous distributed algorithms based in large part on my 1989 ``Parallel and Distributed Computation" book (coauthored with John Tsitsiklis); see Section 2.5.
  3. A discussion of the proximal algorithm and its variations in Section 3.6, and of its relation to the method of multipliers in Section 7.3.
  4. A substantial coverage of the alternating direction method of multipliers (ADMM) in Section 7.4, with a discussion of its many applications and variations, as well as references to my 1989 ``Parallel and Distributed Computation" and 2015 ``Convex Optimization Algorithms" books.
  5. A fairly detailed treatment of conic programming problems in Section 6.4.1.
  6. A discussion of the question of existence of solutions in constrained optimization, based on my 2007 joint work with Paul Tseng [BeT07], which contains further analysis; see Section 3.1.2.
  7. Additional material on network flow problems in Sections 3.8 and 6.4.3, and their extensions to monotropic programming in Section 6.4.2, with references to my 1998 ``Network Optimization" book.
  8. An expansion of the material of Chapter 4 on Lagrange multiplier theory, using a strengthened version of the Fritz John conditions, and the notion of pseudonormality, based on my 2002 joint work with Asuman Ozdaglar.
  9. An expansion of the material of Chapter 5 on Lagrange multiplier algorithms, with references to my 1982 ``Constrained Optimization and Lagrange Multiplier Methods" book.

The book contains a few new exercises. As in the second edition, many of the theoretical exercises have been solved in detail, and their solutions have been posted at the book's internet site.

These exercises have been marked with the symbol "www". Many other exercises contain detailed hints and/or references to internet-accessible sources. The book's internet site also contains links to additional resources, such as many additional solved exercises from my convex optimization books, computer codes, my lecture slides from MIT Nonlinear Programming classes, and full course contents from the MIT OpenCourseWare (OCW) site.

I would like to express my thanks to the many colleagues who contributed suggestions for improvement of the third edition. In particular, let me note with appreciation my principal collaborators on nonlinear programming topics since the 1999 second edition: Angelia Nedic, Asuman Ozdaglar, Paul Tseng, Mengdi Wang, and Huizhen (Janey) Yu.

Dimitri P. Bertsekas, June 2016

Nonlinear Programming: First Edition, 1996


Nonlinear programming is a mature field that has experienced major developments in the last ten years. The first such development is the merging of linear and nonlinear programming algorithms through the use of interior point methods. This has resulted in a profound rethinking of how we solve linear programming problems, and in a major reassessment of how we treat constraints in nonlinear programming. A second development, less visible but still important, is the increased emphasis on large-scale problems, and the associated algorithms that take advantage of problem structure as well as parallel hardware. A third development has been the extensive use of iterative unconstrained optimization to solve the difficult least squares problems arising in the training of neural networks. As a result, simple gradient-like methods and stepsize rules have attained increased importance.

The purpose of this book is to provide an up-to-date, comprehensive, and rigorous account of nonlinear programming at the beginning graduate student level. In addition to the classical topics, such as descent algorithms, Lagrange multiplier theory, and duality, we cover some of the important recent developments: interior point methods for linear and nonlinear programs, major aspects of large-scale optimization, and least squares problems and neural network training.

A further noteworthy feature of the book is that it treats Lagrange multipliers and duality using two different and complementary approaches: a variational approach based on the implicit function theorem, and a convex analysis approach based on geometrical arguments. The former approach applies to a broader class of problems, while the latter is more elegant and more powerful for the convex programs to which it applies.

A chapter-by-chapter description of the book follows:

Chapter 1:

This chapter covers unconstrained optimization: main concepts, optimality conditions, and algorithms. The material is classic, but there are discussions of topics frequently left untreated, such as the behavior of algorithms for singular problems, neural network training, and discrete-time optimal control.

Chapter 2:

This chapter treats constrained optimization over a convex set without the use of Lagrange multipliers. I prefer to cover this material before dealing with the complex machinery of Lagrange multipliers because I have found that students easily absorb algorithms such as conditional gradient, gradient projection, and coordinate descent, which can be viewed as natural extensions of unconstrained descent algorithms. This chapter also contains a treatment of the affine scaling method for linear programming.

Chapter 3:

This chapter gives a detailed treatment of Lagrange multipliers, the associated necessary and sufficient conditions, and sensitivity analysis. The first three sections deal with nonlinear equality and inequality constraints. The last section deals with linear constraints and develops a simple form of duality theory for linearly constrained problems with differentiable cost, including linear and quadratic programming.

Chapter 4:

This chapter treats constrained optimization algorithms that use penalties and Lagrange multipliers, including barrier, augmented Lagrangian, sequential quadratic programming, and primal-dual interior point methods for linear programming. The treatment is extensive, and borrows from my 1982 research monograph on Lagrange multiplier methods.

Chapter 5:

This chapter provides an in-depth coverage of duality theory (Lagrange and Fenchel). The treatment is entirely geometric, and everything is explained in terms of intuitive figures.

Chapter 6:

This chapter deals with large-scale optimization methods based on duality. Some material is borrowed from my ``Parallel and Distributed Computation" book (coauthored with John Tsitsiklis), but there is also an extensive treatment of nondifferentiable optimization, including subgradient, ε-subgradient, and cutting plane methods. Decomposition methods such as Dantzig-Wolfe and Benders are also discussed.


Four appendixes are given. The first gives a summary of calculus, analysis, and linear algebra results used in the text. The second is a fairly extensive account of convexity theory, including proofs of the basic polyhedral convexity results on extreme points and Farkas' lemma, as well as the basic facts about subgradients. The third appendix covers one-dimensional minimization methods. The last appendix discusses an implementation of Newton's method for unconstrained optimization.

Inevitably, some coverage compromises had to be made. The subject of nonlinear optimization has grown so much that leaving out a number of important topics could not be avoided. For example, a discussion of variational inequalities, a deeper treatment of optimality conditions, and a more detailed development of quasi-Newton methods are not provided. Also, a larger number of sample applications would have been desirable. I hope that instructors will supplement the book with the type of practical examples that their students are most familiar with.

The book was developed through a first-year graduate course that I taught at the University of Illinois and at MIT over a period of 20 years. The mathematical prerequisites are matrix-vector algebra and advanced calculus, including a good understanding of convergence concepts. A course in analysis and/or linear algebra should also be very helpful, and would provide the mathematical maturity needed to follow and to appreciate the mathematical reasoning used in the book. Some of the sections in the book may be omitted at first reading without loss of continuity. These sections have been marked by a star. The rule followed here is that the material discussed in a starred section is not used in a non-starred section.

The book can be used to teach several different types of courses.

  1. A two-quarter course that covers most sections of every chapter.
  2. A one-semester course that covers Chapter 1 except for Section 1.9, Chapter 2 except for Sections 2.4 and 2.5, Chapter 3 except for Section 3.4, Chapter 4 except for parts of Sections 4.2 and 4.3, the first three sections of Chapter 5, and a selection from Section 5.4 and Chapter 6. This is the course I usually teach at MIT.
  3. A one-semester course that covers most of Chapters 1, 2, and 3, and selected algorithms from Chapter 4. I have taught this type of course several times. It is less demanding of the students because it does not require the machinery of convex analysis, yet it still provides a fairly powerful version of duality theory (Section 3.4).
  4. A one-quarter course that covers selected parts of Chapters 1, 2, 3, and 4. This is a less comprehensive version of course 3 above.
  5. A one-quarter course on convex analysis and optimization that starts with Appendix B and covers Sections 1.1, 2.1, 3.4, and Chapter 5.

There is a very extensive literature on nonlinear programming, and giving a complete bibliography and a historical account of the research that led to the present form of the subject would have been impossible. I thus have not attempted to compile a comprehensive list of original contributions to the field. I have cited sources that I have used extensively, that provide important extensions to the material of the book, that survey important topics, or that are particularly well suited for further reading. I have also cited selectively a few sources that are historically significant, but the reference list is far from exhaustive in this respect. Generally, to aid researchers in the field, I have preferred to cite surveys and textbooks for subjects that are relatively mature, and to give a larger number of references for relatively recent developments.

Finally, I would like to express my thanks to a number of individuals for their contributions to the book. My conceptual understanding of the subject was formed at Stanford University while I interacted with David Luenberger and I taught using his books. This experience had a lasting influence on my thinking. My research collaborations with several colleagues, particularly Joe Dunn, Eli Gafni, Paul Tseng, and John Tsitsiklis, were very useful and are reflected in the book. I appreciate the suggestions and insights of a number of people, particularly David Castanon, Joe Dunn, Terry Rockafellar, Paul Tseng, and John Tsitsiklis. I am thankful to the many students and collaborators whose comments led to corrections and clarifications. Steve Patek, Serap Savari, and Cynara Wu were particularly helpful in this respect. David Logan, Steve Patek, and Lakis Polymenakos helped me generate the graph on the cover, which depicts the cost function of a simple neural network training problem. My wife Joanna cheered me up with her presence and humor during the long hours of writing, as she has with her companionship of over 30 years. I dedicate this book to her with my love.

Dimitri P. Bertsekas

November 1995

Nonlinear Programming: Second Edition, 1999


This second edition expands the coverage of the original by about 130 pages. Nearly 40% of the new material represents miscellaneous additions scattered throughout the text. The remainder deals with three new topics. These are:

(a) A new section in Chapter 3 that focuses on a simple but far-reaching treatment of Fritz John necessary conditions and constraint qualifications, and also includes semi-infinite programming.

(b) A new section in Chapter 5 on the use of duality and Lagrangian relaxation for solving discrete optimization problems. This section describes several motivating applications, and provides a connecting link between continuous and discrete optimization.

(c) A new section in Chapter 6 on approximate and incremental subgradient methods. This material is the subject of ongoing joint research with Angelia Geary, but it was thought sufficiently significant to be included in summary here.

One of the aims of the revision was to highlight the connections of nonlinear programming with other branches of optimization, such as linear programming, network optimization, and discrete/integer optimization. This should provide some additional flexibility for using the book in the classroom. In addition, the presentation was improved, the mathematical background material in the appendixes was expanded, the exercises were reorganized, and a substantial number of new exercises were added.

A new internet-based feature was added to the book, which significantly extends its scope and coverage. Many of the theoretical exercises, quite a few of them new, have been solved in detail and their solutions have been posted on the book's www page. These exercises have been marked with the symbol "www".

The book's www page also contains links to additional resources, such as computer codes and lecture transparencies from my MIT Nonlinear Programming class.

I would like to express my thanks to the many colleagues who contributed suggestions for improvement of the second edition. I would like to thank particularly Angelia Geary for her extensive help with the internet-posted solutions of the theoretical exercises.

Dimitri P. Bertsekas

June 1999
