HIGH COSTS AND NEGATIVE VALUE OF PAIR PROGRAMMING


June 13, 2013 Draft 1.0

Abstract
Pair programming is a new method in which two programmers share an office and a terminal and take turns coding while the other watches and comments.  The method is often associated with flavors of agile development such as extreme programming, although it is used with other methods too.

Pair programming costs about 2.5 times as much as one programmer working alone.  There are no significant quality benefits compared to a single programmer who uses static analysis and inspections.

Pair programming is the opposite of lean development; it is bloat.  There are a total of 116 occupation groups employed by large companies.  As many as 50 occupations work on large systems.  It is unknown why only programmers were selected to be doubled.

Capers Jones, CTO and Vice President

Namcook Analytics LLC
Email: [email protected]
Web: www.Namcook.com

Copyright 2013 by Capers Jones.
All Rights Reserved.

Introduction
The pair programming method could only exist in an industry that does not measure well and does not understand economics.

In pair programming two programmers share an office and a work station and take turns coding and observing.  The coder is called the “driver” and the observer is called the “navigator.”  The two switch roles frequently.

The literature and experiments with pairs are deficient because they do not consider all of the variables that can affect the outcome of pairs versus single programmers.  Some of the topics that should have been included but were not are:

  1. Single programmers who use static analysis
  2. Single programmers who use inspections
  3. Single programmers who use both static analysis and inspections
  4. Expert single programmers compared to average pairs
  5. Expert pairs compared to average single programmers
  6. Novice single programmers compared to average pairs
  7. Novice pairs compared to expert single programmers
  8. Novice single programmers compared to novice pairs
  9. Average single programmers compared to average pairs
  10. Expert single programmers compared to expert pairs

Many of the studies appear to take random pairs and compare the results to random single programmers without either side using static analysis or inspections.  This is not a realistic comparison.  Both sides should use at least static analysis.

Most of the studies are very small scale.  There is zero literature on using pairs for large systems which might have 500 or more programmers.  No company is going to hire 500 additional programmers without substantial economic reason.   Even if they tried, it is unlikely that 500 capable personnel could be hired to work as pairs.

Most of the studies on pair programming assume that both the pairs and the single programmers do their own testing and seem to stop at unit test.  There is little information about the impact of certified test personnel.  Would testing by certified professional testers for function test, regression test, component test, performance test, and system test change the results?  Would there be pairs of certified testers too?

All of the studies are based solely on programmers and ignore all other software occupations such as business analysts, architects, testers, quality assurance, data base analysts, project managers, and any of the total of 116 occupation groups associated with software development and software maintenance.

On large software systems there may be as many as 50 occupation groups including business analysts, data base analysts, architects, designers, quality assurance, test personnel, technical writers, project office staff, managers, and many more.  Why not double some or all of these other personnel?

Negative Economics of Pair Programming
Average pairs of programmers code about 15% more slowly than average single programmers.

This reduction in speed would only have positive value if defects were reduced almost to zero but that is not the case.  Also, requirements and design defects outnumber code defects on large systems and pair programmers have little impact on non-code defects.

Table 1 is extracted from a pair-programming calculator developed by the author.  It allows researchers to input a number of variables including staff compensation, application size in lines of code, and coding speeds for both pairs and individual programmers.  Table 1 shows a typical pattern for average pairs and average individual programmers for 1000 code statements:

Table 1: Pair Programming vs. Single Programming Costs and Schedules

 
Application size in LOC:                1,000
Single coding speed in LOC per hour:    20
Pair coding speed in LOC per hour:      15
Monthly compensation:                   $7,500
Hourly compensation:                    $56.82

                    Single Programmer   Pair Programming   Difference
Clock hours:        50                  66.67              -16.67
Staff hours:        50                  133.33             -83.33
Code cost:          $2,841              $7,576             ($4,735)
Cost Percent                            266.67%
Schedule days:      8.33                11.11              -2.78
Schedule months:    0.42                0.56               -0.14
Schedule Percent                        133.33%

As can be seen from table 1 the economics of pair programming are severely negative.  Pair programming costs were $7,576 versus $2,841, about 267% of the single-programmer cost (an increase of roughly 167%).  Pair programming schedules were 11.11 days as opposed to 8.33 days, about 133% of the single-programmer schedule (an increase of roughly 33%).
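
The arithmetic behind Table 1 can be sketched in a few lines of Python.  This is not the author's calculator, only an illustrative reconstruction.  The function name and the three calendar constants (132 paid hours per month, 6 coding hours per day, 20 working days per month) are assumptions inferred from the table's figures rather than values stated in the text.

```python
# Illustrative reconstruction of the Table 1 arithmetic.
# The three calendar constants are assumptions inferred from the table.
HOURS_PER_MONTH = 132   # assumption: $7,500 per month / $56.82 per hour
HOURS_PER_DAY = 6       # assumption: 50 clock hours -> 8.33 schedule days
DAYS_PER_MONTH = 20     # assumption: 8.33 days -> 0.42 schedule months


def coding_estimate(loc, loc_per_hour, monthly_pay, headcount):
    """Return hours, cost, and schedule for coding `loc` statements."""
    hourly_pay = monthly_pay / HOURS_PER_MONTH
    clock_hours = loc / loc_per_hour
    staff_hours = clock_hours * headcount   # a pair bills two people per clock hour
    days = clock_hours / HOURS_PER_DAY
    return {
        "clock_hours": clock_hours,
        "staff_hours": staff_hours,
        "cost": staff_hours * hourly_pay,
        "days": days,
        "months": days / DAYS_PER_MONTH,
    }


single = coding_estimate(1000, 20, 7500, headcount=1)
pair = coding_estimate(1000, 15, 7500, headcount=2)

for key in single:
    print(f"{key:12} {single[key]:10.2f} {pair[key]:10.2f}")

# Pair results expressed as a percentage of the single-programmer results.
cost_percent = 100 * pair["cost"] / single["cost"]
schedule_percent = 100 * pair["days"] / single["days"]
print(f"cost {cost_percent:.2f}%  schedule {schedule_percent:.2f}%")
```

Changing the two coding speeds or the headcount shows how sensitive the cost ratio is to the assumed speed of the pair: because two people are billed for every clock hour, the pair would have to code more than twice as fast as the individual before its cost fell below 100%.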

Unless the quality results for the pair approached zero defects and the quality results for the individual programmer were very poor there would seem to be no economic justification for pair programming.

Unfortunately the quality data from the pair programming literature is not adequate because it omits the impacts of inspections and static analysis.  Table 2 shows the quality results of a single programmer using both static analysis and inspections versus a pair that uses only testing:

Table 2: Pair Programming vs. Single Programming Quality

 
                     Single Programmer   Pair Programming   Difference
Defect potential     13                  12                 1
Static analysis %    50%
Defects remaining    7                                      7
Inspection %         65%
Defects remaining    2                                      2
Test %               75%
Defects remaining    1                   3                  -2
Defects delivered    1                   3                  -2

If the individual programmer used a combination of static analysis and inspections, then the quality would be much better than the pair.  However this is not a fair comparison because the pair could also use static analysis.

The point of table 2 is that the literature on pair programming fails to include the potential impacts of either static analysis or inspections on quality.  If a single programmer uses static analysis and inspections the quality results will be much better than the pair.  If both the pair and the individual use static analysis, the quality results will be about the same but the pairs are still more than 250% as expensive.
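
The removal chain in Table 2 can be approximated as a product of stage efficiencies.  The sketch below is illustrative only: the function name is hypothetical, and it keeps fractional defects through every stage instead of rounding to whole defects after each one as the table appears to do, so intermediate values differ slightly from the table.

```python
def defects_delivered(defect_potential, removal_efficiencies):
    """Apply each removal stage in turn; a stage removes a fixed share of what is left."""
    remaining = float(defect_potential)
    for efficiency in removal_efficiencies:
        remaining *= 1.0 - efficiency
    return remaining


# Removal efficiencies taken from Table 2.
STATIC_ANALYSIS, INSPECTION, TEST = 0.50, 0.65, 0.75

# Table 2 scenario: the individual uses all three stages, the pair only tests.
single_all = defects_delivered(13, [STATIC_ANALYSIS, INSPECTION, TEST])
pair_test_only = defects_delivered(12, [TEST])

# Like-for-like scenario from the text: both sides use static analysis, then test.
single_sa = defects_delivered(13, [STATIC_ANALYSIS, TEST])
pair_sa = defects_delivered(12, [STATIC_ANALYSIS, TEST])

print(f"all stages vs test only: {single_all:.2f} vs {pair_test_only:.2f}")
print(f"both use static analysis: {single_sa:.2f} vs {pair_sa:.2f}")
```

Under these inputs the individual delivers well under one defect against three for the pair, while the like-for-like case leaves both sides within a fraction of a defect of each other.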

Ranges of Programming Experience Levels
Programmers are not all equal in experience.  Let us assume three plateaus:  expert, average, and novice for both the individual programmers and the pair programmers.  These ranges yield nine possible combinations:

  Single Programmer    Pair Programmers
  Expert               Expert
  Expert               Average
  Expert               Novice
  Average              Expert
  Average              Average
  Average              Novice
  Novice               Expert
  Novice               Average
  Novice               Novice

Table 3 shows the results from these nine combinations for coding costs for 1000 code statements at a cost of $7,500 per month:

Table 3: Single and Pair Programming Experience Levels

Single Programmer   Pair Programming   Difference

Expert              Expert
$2,583              $5,682             ($3,099)

Expert              Average
$2,583              $7,576             ($4,993)

Expert              Novice
$2,583              $8,741             ($6,158)

Average             Expert
$2,841              $5,682             ($2,841)

Average             Average
$2,841              $7,576             ($4,735)

Average             Novice
$2,841              $8,741             ($5,900)

Novice              Expert
$3,551              $5,682             ($2,131)

Novice              Average
$3,551              $7,576             ($4,025)

Novice              Novice
$3,551              $8,741             ($5,190)

It is obvious from table 3 that the best economic results do not come from pair programming but rather from hiring and using expert single programmers.  Since only about 10% of the programming population can be classed as “experts” it is also obvious that pair programming dilutes the pool of expertise without adding any economic value.
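
The nine cost pairs in Table 3 can be regenerated with the same cost formula used for Table 1.  The text does not state the coding speeds for experts and novices, so the speeds in this sketch are assumptions back-computed from the table's dollar figures, as is the 132-hour month behind the $56.82 hourly rate.

```python
# Assumption: 132 paid hours per month, which gives about $56.82 per hour.
HOURLY_PAY = 7500 / 132
LOC = 1000

# Assumed coding speeds in LOC per hour, back-computed from Table 3's costs.
# Only the two "Average" speeds (20 and 15) are stated in the article.
SINGLE_SPEED = {"Expert": 22, "Average": 20, "Novice": 16}
PAIR_SPEED = {"Expert": 20, "Average": 15, "Novice": 13}


def coding_cost(loc_per_hour, headcount):
    """Cost of coding LOC statements at the given speed and headcount."""
    return LOC / loc_per_hour * headcount * HOURLY_PAY


rows = []
for single_level, single_speed in SINGLE_SPEED.items():
    for pair_level, pair_speed in PAIR_SPEED.items():
        single_cost = coding_cost(single_speed, 1)
        pair_cost = coding_cost(pair_speed, 2)
        rows.append((single_level, pair_level, single_cost, pair_cost))
        print(f"{single_level:8} {pair_level:8} "
              f"${single_cost:,.0f}  ${pair_cost:,.0f}  "
              f"(${pair_cost - single_cost:,.0f})")

# Compare the cheapest pair with the most expensive individual.
cheapest_pair = min(row[3] for row in rows)
costliest_single = max(row[2] for row in rows)
```

With these assumed speeds even the cheapest pair (expert) costs more than the most expensive individual (novice), which is the pattern the table shows.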

Because the main claim for pair programming is improved quality, table 4 shows the quality results from individuals and pairs at expert, average, and novice skill levels.  Table 4 assumes static analysis followed by unit testing.  Only code bugs are shown in table 4:  requirements and design bugs are hard to remove by either static analysis or by testing.  Inspections are the best technology for removing front-end defects in requirements and design.

Table 4 assumes 1000 source code statements:

Table 4: Software Quality Levels by Experience

 
                 Single Programmer   Pair Programming   Difference

                 Expert              Expert
Code defects     10                  10                 0
Removed          9                   10                 -1
Delivered        1                   0                  1

                 Expert              Average
Code defects     10                  12                 -2
Removed          9                   10                 0
Delivered        1                   2                  -2

                 Expert              Novice
Code defects     10                  15                 -5
Removed          9                   9                  0
Delivered        1                   6                  -5

                 Average             Expert
Code defects     13                  10                 3
Removed          10                  10                 0
Delivered        3                   0                  3

                 Average             Average
Code defects     13                  12                 1
Removed          10                  10                 0
Delivered        3                   2                  1

                 Average             Novice
Code defects     13                  15                 -2
Removed          10                  9                  1
Delivered        3                   6                  -3

                 Novice              Expert
Code defects     16                  10                 6
Removed          9                   10                 -1
Delivered        7                   0                  7

                 Novice              Average
Code defects     16                  12                 4
Removed          9                   10                 -1
Delivered        7                   2                  5

                 Novice              Novice
Code defects     16                  15                 1
Removed          9                   9                  0
Delivered        7                   6                  1

In the case where both the pair and the individual programmer are experts, pair programming has an advantage.  In cases where the individual programmer is an expert and the pairs are either average or novice, the individual programmer has the quality advantage.  But it should be remembered that the pair is still much more expensive than the individual.

For quality as well as costs, the best long-term strategy is not pair programming but selecting top experienced individual programmers.

As noted for costs, only about 10% of the programming population might be viewed as “expert.”  For quality too, pair programming would dilute the pool of available experts.

Harmful Consequences of Divided Authority
To date the concept of “pairs” has only been applied to programmers.  Because divided authority has been shown to be quite harmful in many other fields, doubling personnel is something to be cautious about.

The software industry employs a total of 116 different occupation groups.  To date there is no data or literature on other kinds of pairs such as:

  • Pairs of business analysts
  • Pairs of architects
  • Pairs of designers
  • Pairs of testers
  • Pairs of quality assurance
  • Pairs of cost estimation specialists
  • Pairs of technical writers
  • Pairs of project managers

In many kinds of work divided authority has been shown to be harmful.  For example divided authority in military campaigns has led to many famous military disasters such as the battle of Cannae where Carthage defeated a Roman army with divided command.

For some kinds of work pairs have been shown to be helpful:

  • Airline pilots and copilots
  • Scuba diving with pairs
  • Authoring some but not all books
  • Police patrol units in hazardous locations

For other kinds of work pairs have been shown to be quite harmful:

  • Military commands
  • Corporate CEO level executives
  • Medical practices for individual patients
  • Professional football with two quarterbacks sharing playing time

Pair programming was an interesting experiment, but not a method that should have been rushed into production use without a great deal more due diligence than has actually occurred.

REFERENCES AND ADDITIONAL INFORMATION
Jones, Capers; Software Engineering Best Practices; McGraw Hill, 2010.
Jones, Capers; The Economics of Software Quality, Addison Wesley, 2011.

Comments

  1. Larry Zevon Author

    Interesting read Capers, I was particularly interested to see the cost factors.

  2. You often use the formulation “most of the studies [on pair programming]” without giving any reference. What are the papers you’re talking of?

    Pair Programming is about programming. I didn’t know that Business Analysts program.

    Have you ever considered soft factors like knowledge transfer?
    Or what about a company’s interest in *not* having indispensable employees and/or being stuck in case of an employee’s illness? What about other quality assuring means than static analysis and inspections?

    • Capers Jones

      Bjorn,

      A google search on “pair programming” will turn up many new studies and also old ones. Perhaps you have seen “Pair Programming Considered Harmful” by John Evans. There is also a study by Laurie Williams of the University of Utah.

      Over and above citations, it is clear that paying 2 people to do the work of 1 person will cost more unless the two are twice as fast.

      One of my clients has about 50,000 software engineers. Doubling them to 100,000 would cost about $5 billion a year. Do you think this would be a good investment?

      Regards,
      Capers Jones

      • I don’t think doubling the number of engineers at your client would be effective. But as minimum-WIP approaches show in manufacturing, getting the same number of people to concentrate on smaller amounts of stuff at once can often yield productivity gains. So I don’t think common intuition is enough to discount pair programming.

        • Capers Jones

          William,

          You raise an interesting point which deals with a metric called the “assignment scope” or the optimal amount that one person can handle.

          Experts have larger assignment scopes than novices.

          This metric can be joined to another metric called “production rate.” This is the amount of work done in a given time period such as a month.

          Regards,
          Capers Jones

      • Seth

        You are confirming my suspicion: you didn’t need any research to arrive at your obvious conclusion.

        Your whole hand-wavy response to a request for references seems to mean “everybody knows pair programming doesn’t work”.

        Also, in the comments you seem to be stressing that any ‘good results’ with pair programming must be in smaller projects. It’s no accident that PP originates from the Agile world of software development.

        Now, have you considered whether productivity in your “large” (cruise vessel scale) projects could be improved by making the process more Agile?

        You’re not actually proving that the productivity of your 50,000 mythical software engineers is anywhere near optimal. At all. Perhaps, you could find that 10,000 developers pair programming could do the same work.

        Just making you think a bit wider than a spreadsheet with simple formulas.

        • Capers Jones

          Seth,

          On July 15 Don Reifer published a study of about 500 agile projects. His data is similar to my own in showing that agile is concentrated below 1000 function points. My own data on around 15,000 projects shows RUP, TSP, and hybrid to be the most widely used formal methods in the 10,000 function point range. Neither Don nor I found any agile projects for 10,000 function points. Do you know of any?

          Thanks,
          Capers Jones

        • Capers Jones

          Seth,

          Agile does not scale up well. Both TSP and RUP are better above 10,000 function points.
          Neither Don Reifer’s data nor my own have encountered Agile above 5,000 function points.
          If you know of any uses of Agile for big systems let the group know.

          Based on both my client data and Don Reifer’s study of about 500 agile projects, agile is not great for quality and not great for maintenance.
          It is pretty good for development; especially for development of small projects < 1000 function points.

          Thanks,
          Capers Jones

      • Dan Puzey

        I’m sorry, but this is a non-reply. You’re discussing what appears to be a formal paper that you’ve authored yourself; a reply to the question “what are your sources?” should not be as flippant as “a google search returns lots of results that could have been used as supporting evidence.” Nor is posting rhetorical questions constructive. You should cite your sources directly and clearly if you expect your assertions to hold weight.

        Please note that I’m not against your arguments: I find myself broadly in agreement. But to present a ream of numbers with no clear source (beyond a “calculator developed by the author”) and to then *not give* a source when asked reflects badly. I can go to Google and find evidence to support just about anything – including the counter to your assertions.

        • Capers Jones

          Dan,

          Instead of your complaint, which I understand, about my non-disclosure agreements it would benefit pair programming if the teams themselves participated in benchmarks with the non-profit International Software Benchmark Standards Group (ISBSG.org). Here are a number of topics that need additional study:

          Health Hazards from Pair Programming

          Several studies of bacteria found on computer keyboards and mice found that keyboards harbored more bacteria than toilet seats. Keyboards are among the least sanitary of all forms of office equipment. Pair programming with the pair sharing one keyboard, mouse, or touch screen raises the probability of bacterial or viral infection for both members of the pair. A study is recommended to examine the incidence of flu and other contagious diseases among pair programming teams and also individual programmers.

          Mean Time to Interrupt from Shared Office Cubicles

          A study at IBM of private offices and shared cubicles by the author found that the mean time to interrupt or some form of distracting social commentary was about 13 minutes for two-person cubicles and 11 minutes for cubicles with more than two occupants. This study was congruent with similar but independent analysis by Tom DeMarco who found that solo programming productivity was higher for individual offices than for shared cubicles. There is a need for long-term research on pair programming and also for solo programming to confirm or challenge these earlier studies. In particular the mean time to interrupt or social rather than technical discussions needs more study in a pair programming context.

          Remote Pairing rather than Co-Location

          A study by the author of staff meetings and other technical meetings in IBM software groups involving 5 to 8 participants found that live meetings averaged about 15 minutes of non-technical social discussions prior to actually starting the technical discussion. The study also found non-relevant side issues came up about every 7 minutes. The meetings lasted about 3.25 hours but technical discussions occupied less than 90 minutes of the total meeting time.

          By contrast technical telephone conference calls with the same numbers of participants started within 5 minutes of the nominal start time and lasted about 53 minutes. There were many fewer side issues during conference calls than during live meetings. The technical discussions occupied about 44 minutes of the 53 minute calls.

          These results suggest that remote pairing with the pairs sharing information via shared screens and voice communication might be more effective than co-location and probably less of a health hazard for the participants. Remote pairing is a subject that would benefit from field trials.

          Modern studies on scrum sessions that measured total duration, technical content, and non-technical social topics would also be useful. It is possible that remote scrum sessions rather than live ones might have the same patterns noted in the early studies of live versus telephone meetings.

          Formal Inspections compared to Pair Programming

          Formal inspections of requirements, design, code, and other deliverable items have more than 40 years of empirical data and thousands of measured inspections. Key aspects of the success of inspections are the fact that time recording is an integral part of the task so that preparation time, inspection time, and defect repair time are all known. Another aspect with relevance to pair programming would be the use of 2-hour sessions with breaks, and possibly limiting the number of sessions to a maximum of three per day. Many aspects of programming benefit from introspection and private research. Both the effort and the ranges of defect removal efficiency are known for inspections, but are not known for pair programming. Additional research on pair programming with limited sessions per day would be useful.

          Lack of Pair Programming Benchmark Data

          A check with the International Software Benchmark Standards Group (ISBSG.org) found that out of their collection of more than 5,000 software projects not even one pair programming project had submitted data. Elsewhere in this blog is a catalog of software benchmark providers that cites 23 organizations that produce quantitative benchmark information with a combined total of around 91,000 projects. Here too there are no pair programming projects reported other than the ones cited in this blog by the author. Pair programming seems to have less benchmark data available than any current methodology. It would benefit pair programming credibility if paired projects downloaded the ISBSG data collection questionnaire and began to produce benchmark reports. There is no charge for downloading the ISBSG report or submitting project data. Pair programming benchmarks would add credibility to the various claims of higher quality and productivity, or perhaps challenge those claims. In either case pair programming benchmarks would benefit the software industry.

          Pair Programming in Context

          The literature on pair programming concentrates on the task of coding and does not include either the earlier tasks of requirements and design or the later tasks of quality assurance, critical inspections, all forms of testing, project management, or many other tasks. It would be beneficial to the pair programming literature to include full life-cycle studies that encompassed other work such as the following for 10,000 function points:

          Development Activities             Work Hours per    Burdened Cost per
                                             Function Point    Function Point

           1  Business analysis              0.02              $1.33
           2  Risk analysis/sizing           0.00              $0.29
           3  Risk solution planning         0.01              $0.67
           4  Requirements                   0.38              $28.57
           5  Requirement. Inspection        0.22              $16.67
           6  Prototyping                    0.33              $25.00
           7  Architecture                   0.05              $4.00
           8  Architecture. Inspection       0.04              $3.33
           9  Project plans/estimates        0.03              $2.00
          10  Initial Design                 0.75              $57.14
          11  Detail Design                  0.75              $57.14
          12  Design inspections             0.53              $40.00
          13  Coding                         4.00              $303.03
          14  Code inspections               3.30              $250.00
          15  Reuse acquisition              0.01              $1.00
          16  Static analysis                0.02              $1.33
          17  COTS Package purchase          0.01              $1.00
          18  Open-source acquisition.       0.01              $1.00
          19  Code security audit.           0.04              $2.86
          20  Ind. Verif. & Valid.           0.07              $5.00
          21  Configuration control.         0.04              $2.86
          22  Integration                    0.04              $2.86
          23  User documentation             0.29              $22.22
          24  Unit testing                   0.88              $66.67
          25  Function testing               0.75              $57.14
          26  Regression testing             0.53              $40.00
          27  Integration testing            0.44              $33.33
          28  Performance testing            0.33              $25.00
          29  Security testing               0.26              $20.00
          30  Usability testing              0.22              $16.67
          31  System testing                 0.88              $66.67
          32  Cloud testing                  0.13              $10.00
          33  Field (Beta) testing           0.18              $13.33
          34  Acceptance testing             0.05              $4.00
          35  Independent testing            0.07              $5.00
          36  Quality assurance              0.18              $13.33
          37  Installation/training          0.04              $2.86
          38  Project measurement            0.01              $1.00
          39  Project office                 0.18              $13.33
          40  Project management             4.40              $333.33

              Cumulative Results             20.44             $1,548.68

          The studies would identify the full sets of tasks normally performed in a pair-programming context and the costs, schedule, and staffing for these tasks. Software development involves much more than just coding and the impact of pair programming on total development is missing from the pair programming literature.

          Since pair programming often occurs in an agile or extreme programming context, similar data would be shown by sprint rather than for the full project as shown above.

    • Capers Jones

      Bjorn,

      Programmers are less than 30% of the total employment for large systems. Are you saying nobody else has any value?

      Most projects have back up plans in case of illness or lost personnel – that is not unique to pairs. What happens if you lose both of the pair at the same time?

      Capers Jones

    • Capers Jones

      Bjorn,

      Pair programmers are not immune to flu or other illness. Losing half of a pair is just as troublesome as losing a solo programmer. In any case most companies have backup plans. I myself have taken over code from programmers who were ill or who changed jobs.

      As to knowledge transfer there are dozens of methods besides pair programming: inspections, mentoring, wiki sites, quality function deployment (QFD), joint application design (JAD), and today virtual team meetings for international projects.

      Capers Jones

  3. Capers,

    I find this an embarrassing text.

    You correctly state that pair programming is usually associated
    with agile methods (which assume requirements are incompletely
    understood, can best be uncovered by _building_ the software,
    and change rapidly).

    Then you compare this with large organizations employing dozens
    of different occupation groups. Such organizations need to
    assume known and stable requirements, because otherwise
    such a high degree of functional specialization would be
    counter-productive.

    That comparison does not make much sense.
    In the agile view, the “programmer” is not a mere coder.
    Rather, programmer work also comprises design, testing,
    infrastructure building, configuration management,
    and training your colleagues, among other things.
    Therefore, your focus on only coding productivity and
    defect densities is overly narrow.

    What is worse, your key assumption is, I quote
    “Average pairs of programmers code about 15% more
    slowly than average single programmers.”
    You do not provide a source for this.

    I readily agree the scientific evidence available so far
    is incomplete — but it is not nil and it suggests
    your above statement to be wrong.
    For a more differentiated view, I suggest you consult
    “The effectiveness of pair programming: A meta-analysis”
    (2009, Jo Hannay, Tore Dybå, Erik Arisholm, Dag Sjøberg):
    Only one of the 11 experiments this article analyzed found
    pair programmers to be slower than solos (see its Fig 1).

    As for when it actually makes sense to use pair programming
    and how to do it properly, my own group has been working for several
    years to understand the internal mechanisms of (effective and
    ineffective) pair programming.
    One early result is a (still incomplete) analysis of the
    functional specialization _within_ pair programming (as
    opposed to the hopelessly naive driver/observer model).
    See
    “Liberating pair programming research from the oppressive
    Driver/Observer regime”
    (2013, Franz Zieris, Stephan Salinger, Lutz Prechelt)
    for a preliminary description.

    Once these and several other mechanisms are understood,
    we will work out helpful and unhelpful patterns of behavior
    within a pair programming session so that programmers can
    easily learn how (and when) to pair effectively.
    Only _then_ should we measure the efficiency of
    pair programming and make broad claims about its suitability.
    Until then, let us just leave it to the judgment of competent
    agile “programmers”.

    • Capers Jones

      Lutz,

      You and I probably come from different forms of software. If you work on large systems with development staffs of more than 500 people involving more than 50 occupations building systems for more than 10,000 users pair programming is not a good choice.

      Do you recommend pairing architects, business analysts, quality assurance, testers, and some of the other 50 occupations who work on big systems?

      Whenever you pay 2 people to do the work of 1 person the costs go up unless the two are more than twice as fast. This is not true for pair programming.

      Regards,
      Capers Jones

      • Jack9

        > unless the two are more than twice as fast. This is not true for pair programming.

        The first part is a strange assumption. The second part is unsubstantiated.
        The individual programmer output is minuscule in regards to code run in production (when talking about a modern development process that includes testing and other quality checks). For example, if I write 100 lines of code and, in a code review, there is a different solution that another came up with in 60, there is a cost benefit analysis applied to rewrite that code (and is rarely done). If you have the 2 coders pair, you get a solution of 60 (or less) and the need to rework to get it is 0. There are many factors to establish what code is produced and raw loc is not the only one. So when you claim you have to pay twice to get the output of 1, you’re misunderstanding the process. It would be great if we had actual data to support the “pair programming is not economical” (which I believe). Simply saying it isn’t, doesn’t convince anyone.

        • Capers Jones

          Jack,

          You are forgetting that defects in requirements and design outnumber defects in the code and often are harder to find and repair.

          There are also bad fixes or secondary defects found in defect repairs themselves. About 7% is the U.S. average for bugs introduced by bug repairs.

          Regards,
          Capers Jones

          • Freya

            “You are forgetting that defects in requirements and design outnumber defects in the code and often are harder to find and repair.”

            This is exactly why agile teams employ pair programming. On the projects I work on we never have all the other professions you quote. Architect, designer, qa, tester. We have none of these positions where I work, this is why we are considered software developers or engineers. We do so much more than write loc. We as a team of “programmers” are the architects, the designers, the testers, we provide our own qa, we do our own estimations and we don’t have a project manager.

            Where I work pairs are very useful for working with legacy code, or even last week’s code. We mainly work on adapting already existing code, probably not written by ourselves. This fits very neatly with the agile iterations process. We also have sparse documentation as it gets out of date. We try to make our code self-documenting. We achieve this by having input from the people who will have to work with our code at a later date, i.e. a pair from within the team. It can save many more man hours in the long run, than it will ever take to pair, to have something written in a clean coherent house style. I appreciate that you don’t need a pair to flip burgers, but you do need a pair to conduct code surgery.

          • Capers Jones

            Freya,

            You must work on small projects with few users. Many ways can be used for small projects with few users. When you have more than 10,000 clients and more than 500 team members you can’t omit the other occupations.

            It is easy to build a row boat. Not so easy to build a 75,000 ton cruise ship.

            Capers Jones

          • Capers Jones

            Freya,

            Have you ever built applications > 10,000 function points or > 500,000 LOC?

            Your description of your environment is typical for small projects.

            Regards,
            Capers Jones

          • Capers Jones

            Freya,

            Carpenters or even home owners can build a shed without using an architect or any specialists such as electricians and plumbers.
            But if you build large structures such as office buildings or cruise ships you need a lot of specialists.

            Any application small enough to be done by generalists is not very interesting. It may be useful and fun, but technically not challenging compared to large systems.

            Capers Jones

    • Capers Jones

      Lutz,

      You don’t seem to think that business analysts, architects, designers, data base designers, and any of the other occupations add value.

      Do you think pairs of programmers can handle all of the client interactions for something like a central office switching system that will serve millions of clients?

      So far every comment about pairs seems to have a hidden assumption that applications are small and have only one or two clients that provide requirements.

      Capers Jones

    • Capers Jones

      While you are collecting data, suppose an outsource vendor arrives at your CEO’s office and says they will do what you are doing for half the cost.
      Can you provide an effective defense?

      This happens quite often.

      Thanks,
      Capers Jones

      • Caoilte O'Connor

        If a CEO wishes to goose this year’s executive bonus package by leveraging the undeniable short-term boost to the balance sheet from gutting the organisation then no one is going to succeed in dissuading him. The natural course of action is to leave immediately and accept a higher paying position at a company experiencing the longer term consequences of outsourcing (inability to innovate, fees that rise exponentially as business loses capability to work without outsourcer) and now wishes to move development in-house again.

        This happens all the time.

        We would, of course, avoid the bureaucratic overhead of architects, BAs and manual testers by practising pair-programming in large numbers of small autonomous teams that work closely with the business to prioritise incremental improvements.

        • Capers Jones

          Caoilte,

          Your points are interesting. I do a lot of work as an expert witness in outsource litigation. About 70% of outsource agreements work OK and both parties are satisfied. About 20% are in trouble and 5% end up in court.

          I had hoped you might have suggested having available quantitative data from pair programming – poor quality is often a claim in outsource litigation.

          For example you might suggest to the CEO that the outsource vendor provide proof that their quality would be as good as or better than yours.

          The unfortunate tendency of pair groups not to measure and not to have quantitative data weakens their case.

          Thanks,
          Capers Jones

    • Capers Jones

      Lutz,

      You seem to be saying that pair programming often fails and needs to be tuned and adjusted. After trial and error pairs can get better.
      Is this your point or did I misunderstand it?

      Thanks,
      Capers Jones

    • Capers Jones

      Lutz,

      Of the 34 methods I collect data on agile is not the best; pair programming is near the bottom.
      Both are better than cowboy; agile is better than waterfall almost always. Neither is better than TSP or RUP for large systems.
      Both agile and pair programming are used mainly for small applications below 1000 function points.

      Neither among my own clients nor in Don Reifer’s study of about 500 agile projects has agile been noted above 5,000 function points.
      If you know of any agile projects for large systems, let us all know.

      Longer-range studies show agile is a bit weak in quality compared to TSP and RUP and not as good for maintenance; better than waterfall though.

      Another issue with pair programming is that quite a few programmers don’t like it and some have changed jobs to avoid it.

      Also, for the kind of programming I do dealing with my own inventions that will be patented later it is not a good choice.
      Several years ago a colleague working on a patentable invention and using pair programming had his invention stolen and patented by his partner, which led to expensive litigation. For my own inventions I don’t want anyone else to know how they work before the patents are filed.

      Thanks,
      Capers Jones

  4. Capers,
    It’s good that you’re trying to quantify the usefulness of pair programming. You have many good points, and also some serious flaws in your arguments. I will point out only the most important one from my perspective:

    As you, yourself, have often emphasized, in the full cost of putting software into production, the cost of writing code is a minor element. If it turns out that method 1 produces even slightly lower quality software than method 2, the cost of using method 1 is often many times the cost of using method 2.

    In my experience, a good pair produces higher quality code than an individual of the same skill as the pair members. This alone makes pairs much more efficient, and I’d love to see you do an analysis that takes the cost of quality into account.

    Of course, pairs, like individuals, vary greatly in their ability to produce high quality code at low cost, or at any cost. Without going into full detail, my own conclusion from more than 50 years’ experience working with software development teams would be somewhat different from yours:

    For quality as well as costs, the best long-term strategy is pair programming, selecting well-matched pairs of top experienced individual programmers with demonstrated interpersonal skills. (…with the addition of some less experienced programmers paired with more experienced ones for their personal development into the next generation of top experienced individual programmers …)

    • Capers Jones

      Gerry,

      Good to hear from you and thanks for the comment. I’ll send some additional materials that are too big for this response – the measured results of many different kinds of pre-test removal and testing.

      Individual programmers average about 1.6 defects per function point and find about 35% of the bugs via unit test.

      Pairs average about 1.5 defects per function point and find about 45% via unit test.

      However formal inspections average about 85%; static analysis averages about 60%; testing by professional test personnel finds up to 50% for each test stage.

      A single programmer who uses a combination of static analysis plus inspections will have higher defect removal efficiency than pairs, unless they also use static analysis and formal inspections.
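The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, not the calculator mentioned in this thread: it assumes each removal stage takes out its quoted percentage of whatever defects are still present, and that static analysis, inspection, and unit test run one after another. The function names are hypothetical.

```python
# Removal efficiencies quoted in the comment above, as fractions.
UNIT_TEST_SOLO = 0.35
UNIT_TEST_PAIR = 0.45
STATIC_ANALYSIS = 0.60
INSPECTION = 0.85


def remaining_defects(defects_per_fp, stage_efficiencies):
    """Defects per function point left after applying each stage in turn.

    Assumption (not from the source): every stage removes its quoted
    share of the defects that are still present when it runs.
    """
    remaining = defects_per_fp
    for efficiency in stage_efficiencies:
        remaining *= 1.0 - efficiency
    return remaining


def cumulative_efficiency(stage_efficiencies):
    """Combined removal efficiency of a chain of stages, from 0.0 to 1.0."""
    return 1.0 - remaining_defects(1.0, stage_efficiencies)


# Solo programmer and pair, each relying on unit test alone.
solo_unit_only = remaining_defects(1.6, [UNIT_TEST_SOLO])
pair_unit_only = remaining_defects(1.5, [UNIT_TEST_PAIR])

# Solo programmer who adds static analysis and a formal inspection.
solo_chain = [STATIC_ANALYSIS, INSPECTION, UNIT_TEST_SOLO]
solo_full = remaining_defects(1.6, solo_chain)

print(f"solo, unit test only: {solo_unit_only:.4f} defects/FP remain")
print(f"pair, unit test only: {pair_unit_only:.4f} defects/FP remain")
print(f"solo, full chain:     {solo_full:.4f} defects/FP remain")
print(f"solo chain efficiency: {cumulative_efficiency(solo_chain):.1%}")
```

Under these assumptions the pair leaves fewer defects than the solo programmer when both rely on unit test alone, while the solo programmer with static analysis and inspections leaves far fewer than either. If the stages overlap in the defects they find, the combined figure would be lower than this simple product suggests.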

      Since paying two people to do the work of one person is expensive, how many pairs do you think would be needed in companies who employ from 10,000 to 50,000 software engineers? The annual cost of doubling 50,000 software engineers is about $5 billion per year.
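The $5 billion figure implies a per-engineer cost that the comment does not state. The sketch below backs it out and applies it to the smaller company size mentioned above. It assumes the cost of doubling scales linearly with headcount; the function name is hypothetical.

```python
ENGINEERS = 50_000
ANNUAL_COST_OF_DOUBLING = 5_000_000_000  # dollars per year, from the comment

# Implied annual cost of each added engineer (an inference, not a quoted figure).
implied_cost_per_engineer = ANNUAL_COST_OF_DOUBLING / ENGINEERS


def cost_of_doubling(engineers, cost_per_engineer):
    """Annual cost of adding one partner for every existing engineer."""
    return engineers * cost_per_engineer


for headcount in (10_000, 50_000):
    cost = cost_of_doubling(headcount, implied_cost_per_engineer)
    print(f"{headcount:>6,} engineers: ${cost:,.0f} per year")
```

The implied figure is $100,000 per added engineer per year, which puts the low end of the quoted range (10,000 engineers) at about $1 billion per year.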

      For large systems requirements and design bugs outnumber code bugs and pairs have little impact on them. The U.S. average for bugs is about as follows:

      Requirements = 1.00 per function point
      Design = 1.25 per function point
      Code = 1.50 per function point for mid range languages
      Documents = 0.6 per function point
      Bad fixes or secondary bugs = 0.4 per function point
      Total bugs = 4.75 per function point

      Competent single programmers using a synergistic combination of pre-test inspections and static analysis followed by formal testing with certified test personnel and test cases designed using mathematical methods achieve 99% in defect removal efficiency.

      The current U.S. average is only a bit over 85% and Agile even with pairs is only about 92%.
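The defect potentials and removal efficiencies above combine into a rough delivered-defect estimate. The sketch below is illustrative only: the 1,000 function point application size is an assumed example, "a bit over 85%" is rounded to 0.85, and the function name is hypothetical.

```python
# U.S. average defect potentials per function point, from the list above.
DEFECT_POTENTIALS = {
    "Requirements": 1.00,
    "Design": 1.25,
    "Code": 1.50,
    "Documents": 0.60,
    "Bad fixes": 0.40,
}

# Defect removal efficiency (DRE) levels quoted in the comment.
DRE_LEVELS = {
    "U.S. average": 0.85,
    "Agile with pairs": 0.92,
    "Inspections + static analysis + formal test": 0.99,
}

total_per_fp = sum(DEFECT_POTENTIALS.values())
code_share = DEFECT_POTENTIALS["Code"] / total_per_fp
upstream = DEFECT_POTENTIALS["Requirements"] + DEFECT_POTENTIALS["Design"]


def delivered_defects(size_fp, dre):
    """Defects still present at delivery for an application of size_fp."""
    return size_fp * total_per_fp * (1.0 - dre)


APP_SIZE_FP = 1000  # assumed example size, not from the source

print(f"total potential: {total_per_fp:.2f} defects/FP")
print(f"code share of all defects: {code_share:.1%}")
print(f"requirements + design: {upstream:.2f} vs code: 1.50 defects/FP")
for label, dre in DRE_LEVELS.items():
    print(f"{label}: {delivered_defects(APP_SIZE_FP, dre):.1f} delivered defects")
```

For the assumed 1,000 function point application, the three removal levels leave roughly 712, 380, and 48 delivered defects. Code accounts for less than a third of the total potential, which is the basis for the point that a practice aimed only at code cannot reach most defects.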

      Best Regards,
      Capers Jones

      • > Individual programmers average about 1.6 defects per function point and find about 35% of the bugs via unit test.

        > Pairs average about 1.5 defects per function point and find about 45% via unit test.

        [Citation Needed]

        • Capers Jones

          Curtis,

          I have a pair programming calculator that allows users to put in their own numbers.
          What numbers do you have for defect densities?

          Capers Jones

          • Drew

            Curtis’ numbers are irrelevant. He’s asking for *your* data. If you’re going to make sweeping statements expect someone to ask to see the data. And the onus is on you, not the person asking.

          • Capers Jones

            I have non disclosure agreements with my clients. However if you read any of my books the front matter acknowledges the clients that provide the data in general forms.

            Capers Jones

          • Capers Jones

            I have non-disclosure agreements with my clients. If you read my books the acknowledgements list many of the companies in a general way.
            NDA agreements prohibit naming companies.

            Capers Jones

          • Capers Jones

            Drew,

            One of the best sources for comparative data is the International Software Benchmark Standards Group (ISBSG.org). This is a non-profit organization from Australia that has data available on more than 5,000 software projects. The data is available to the general public by subscription or for fairly small fees for specific kinds of data.

            About the time I started this blog I checked and as of a few weeks ago not even one pair programming project had submitted any data to ISBSG, although quite a few agile projects had data.

            Why do you suppose no pair programming projects have submitted quantitative benchmark data? There are thousands of projects where inspections have been measured, and also many other methodologies such as RUP, waterfall, iterative, TSP, and the like.

            Thanks,
            Capers Jones

          • Capers Jones

            Drew,

            I published the data just not the sources due to NDA. I note that none of the 50 or so comments on pair programming had any data at all or at least none that they shared in their comments.

            Capers Jones

          • Capers Jones

            Drew,

            I published the data but due to non-disclosures not the sources, which does not negate the data itself.
            If you don’t like it publish your own data on pair programming and include your sources.

            Capers Jones

        • Capers Jones

          Curtis,

          I have non-disclosure agreements with clients. If you read any of my books the acknowledgements cite some of the companies in a general way, but the NDAs prevent naming specific clients.

          Capers Jones

  5. Capers,
    Thanks for the additional information. Still, you seem to be comparing “competent single programmers” to “average pair programmers.” There are lots of average pairs, but very few “competent single programmers” (using your definition). In my experience as a programmer and an observer of programmers, I’m capable of being “competent,” but much more likely to act competently when I’m working with a partner who won’t allow me to take shortcuts or forget one of those practices that allows me to reach that 99% level. That’s perhaps the principal advantage of working with a partner.

    You’re right about what levels *could* be reached by the individual programmer, but the critical question is “why don’t more individual programmers reach this level consistently?” Pairing is one way of becoming more consistently “competent.” I don’t know of any other way that equals it.

    • Capers Jones

      Jerry,

      Inspections also transfer knowledge and improve performance of novices. IBM data noted that participation in formal inspections drove down average numbers of bugs in both designs and code because inspection participants were avoiding mistakes noted during the inspection process. There is almost 40 years of data for inspections.

      Do you have any actual data for pair programming that differs from mine?

      Best Regards,
      Capers Jones

      • Capers Jones

        Sean,

        I have data on around 15,000 projects using 34 methods. Some are hybrid or composite methods. The 34 are:

        Methods
        1 Mashup
        2 Hybrid
        3 IntegraNova
        4 TSP/PSP
        5 Microsoft Solutions Framework
        6 RUP
        7 XP
        8 Agile/Scrum
        9 Data state design
        10 T-VEC
        11 Information engineering (IE)
        12 Object Oriented
        13 EVO
        14 RAD
        15 Jackson
        16 SADT
        17 Spiral
        18 SSADM
        19 Open source
        20 Flow based
        21 Iterative
        22 Crystal development
        23 V-Model
        24 Prince2
        25 Merise
        26 DSDM
        27 Clean room
        28 ISO/IEC
        29 Waterfall
        30 Pair programming
        31 DoD 2167
        32 Proofs of correctness
        33 Cowboy
        34 None

        Thanks,
        Capers Jones

      • Capers Jones

        Sean,

        Probably without intending it, your reply indicates that pair programming is unscientific.

        Usually with medicine and science new medicines and new theories have some validation prior to release. They are also measured after release.

        For example IBM validated inspections with hundreds of trials prior to release. A colleague, Gary Gack, has data on more than 1,600 measured inspections.

        With pair programming there seemed to have been no initial validation or proof of efficacy prior to release. Fortunately it seems to work sometimes, but it fails sometimes too.

        If pair programming had gone through a normal scientific release at least 50 trials would have occurred prior to release. Measured results should now be in the thousands, which is not the case.

        Capers Jones

    • Capers Jones

      Jerry,

      Competence follows a bell-shaped curve as we all know. The bell would have more on the lagging end and fewer on the high performance end with pairs.

      Suppose you measured 100 Olympic runners in a 100 meter event. Then suppose you measured 100 pairs of runners where the average of the pair is the value measured.

      The results would probably favor the single sprinters.
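The runner analogy can be tried out numerically. This is a minimal sketch under assumptions that are not in the comment: times are drawn from a normal distribution with made-up parameters, and the same pool of 200 runners is scored first individually and then as 100 pairs, so the two groups are directly comparable.

```python
import random
import statistics


def simulate(num_runners=200, mean_time=10.0, spread=0.15, seed=1):
    """Return (individual times, pair scores) for one pool of runners.

    Assumed model: 100 m times in seconds follow a bell curve, and a
    pair is scored by the average of its two members' times.
    """
    rng = random.Random(seed)
    times = [rng.gauss(mean_time, spread) for _ in range(num_runners)]
    pair_scores = [
        (times[i] + times[i + 1]) / 2 for i in range(0, num_runners, 2)
    ]
    return times, pair_scores


times, pair_scores = simulate()

print(f"mean, individuals:   {statistics.mean(times):.3f} s")
print(f"mean, pairs:         {statistics.mean(pair_scores):.3f} s")
print(f"spread, individuals: {statistics.stdev(times):.3f} s")
print(f"spread, pairs:       {statistics.stdev(pair_scores):.3f} s")
print(f"fastest individual:  {min(times):.3f} s")
print(f"fastest pair:        {min(pair_scores):.3f} s")
```

Averaging leaves the mean unchanged, and the fastest pair can never beat the fastest individual in the pool, because a pair's score is never better than its faster member's time. Whether software pairs behave like averaged sprint times is the point in dispute in this thread; the sketch shows only what the averaging assumption implies.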

      Thanks,
      Capers

  6. Hi Capers,

    One thing I immediately noticed was the strawman argument in the beginning. Pair Programming is not new – Wikipedia shows references dating at least back to 2000 on Pair Programming (http://en.wikipedia.org/wiki/Pair_programming). It is also not necessarily a technique where one codes and one comments – we often employ ping pong programming, where Dev A writes a failing test, Dev B makes the test pass, they both refactor, and then Dev B writes the next failing test and Dev A makes it pass.

    Further, you extrapolate that only devs were selected to pair. In many critical industries, having a pair is critical. For example, as a former firefighter, we never went into a fire alone, but always with a pair. Likewise Paramedics would also have a second medic with them, and for critical cases, they both stayed in the back of the ambulance while a third party drove it to the hospital. Closer to home, I’ve observed many cases of managers, PMs, Accountants, etc having people review their work closely with them.

    One thing that isn’t clear from your calculator is whether these are metrics based on calculations of teams actually doing pair programming, or assumptions made from your texts. I can say that subjectively I’ve seen significant productivity with pairing teams in at least three separate organizations. I’d be happy to connect you with them if you would be interested in studying them more closely.

    Cory

    • Capers Jones

      Cory,

      Thanks for the comments. I have also produced a list (not on the blog) of occupations where pairs are helpful and pairs are harmful. For example pilots and co-pilots are helpful; surgical teams need more than pairs.

      Among the worst and most harmful use of pairs are military commands, where divided commands often lead to defeats.

      There are 116 different occupations associated with software. How many of these do you think should be paired?

      A client of mine has about 50,000 software engineers. Pairing all of them up to 100,000 would cost about $5 billion a year.

      From observations of cost structures, introducing inspections and static analysis would cut the number of software people down to 40,000 and achieve as much or more than is accomplished today.

      Regards,
      Capers Jones

    • Capers Jones

      Cory,

      Most people would be interested in the data you mention – why not tell us how to connect.

      What measures do you use for productivity?

      Thanks,
      Capers Jones

    • Capers Jones

      Cory,

      I’d be pleased to connect and compare the data of your clients.

      Here are some typical values for a sample of methods noted among my clients for applications in the 1000 FP range or roughly 50,000 Java statements.

      The data is expressed in terms of function points per month:

      Agile = 12.5
      XP without pairs = 12
      TSP = 11.5
      RUP = 10.5
      Pairs = 9.5
      Waterfall = 8.0
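The rates above translate directly into effort for an application of the quoted size. This is a minimal sketch that reads "function points per month" as function points per staff month, which is an assumption about the unit rather than something stated in the comment.

```python
# Productivity in function points per month, from the list above.
PRODUCTIVITY_FP_PER_MONTH = {
    "Agile": 12.5,
    "XP without pairs": 12.0,
    "TSP": 11.5,
    "RUP": 10.5,
    "Pairs": 9.5,
    "Waterfall": 8.0,
}

APP_SIZE_FP = 1000  # the size range quoted in the comment

# Effort in months = size / productivity (assumed to be staff months).
effort_months = {
    method: APP_SIZE_FP / rate
    for method, rate in PRODUCTIVITY_FP_PER_MONTH.items()
}

for method, months in sorted(effort_months.items(), key=lambda item: item[1]):
    print(f"{method:<18}{months:>8.1f} months")

extra_for_pairs = effort_months["Pairs"] - effort_months["Agile"]
print(f"Pairs vs Agile: {extra_for_pairs:.1f} additional months")
```

On these figures a 1000 function point application takes 80 months of effort with Agile and about 105 with pairs, a difference of roughly 25 months.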

      Thanks,
      Capers Jones

    • Capers Jones

      Cory,

      You can also include aircraft pilots and copilots. However for some fields such as military commands pairing has often led to disaster and was a contributing factor for the loss at the battle of Cannae.

      There are also methods such as inspections which bring 3 to 5 people into collaboration – there is some advantage for the larger number of views. However too big a group can be divisive.

      Thanks,
      Capers Jones

  7. As someone who has done a lot of pairing, I’m pretty skeptical of these conclusions.

    As one friend puts it, pairing gets me third-draft code in first-draft time. For me, benefits include maximizing work not done, increasing clarity of code, finding better solutions to problems, and greatly increased team sharing of knowledge. That gets substantial productivity gains for the team over time.

    So, for example, would I expect a team using pairs to produce fewer lines of code per hour? Sure, but to me that’s a sign of better code, not of lower output volume. I also definitely don’t believe that “pair programmers have little impact on non-code defects”. Working paired makes me much more likely to push back on requirements that don’t make sense. And I’d question the notion that pairing dilutes experts; for me it lets me amplify my expertise by helping my colleagues learn all my tricks.

    Could it be that the stats you are using are confounded by the environments you’re measuring it in? I think pairing makes sense in a collaborative context, where you’ve got tight cross-functional teams regularly producing releasable output, and where there’s some sort of feedback loop between what’s released and what next gets built.

    I’d also be curious to see the details of your numbers by quality of pairing. Do they have a station where they can code with equal access to keyboard? Are they using tools equally familiar to both? Does control change frequently? Is it really a collaborative relationship? Do pairs rotate frequently so that team members all pair with one another on a regular basis?

    As with any practice, there are a lot of ways to do pairing wrong, and the viewpoint of this article doesn’t give me a lot of confidence that the author understands how to do it well or what necessary supporting practices are.

    • Capers Jones

      William,

      Thanks for your email. There are quite a few other issues with pair programming that need more work and more research.
      Here are some examples:

      Health Hazards from Pair Programming

      Several studies of bacteria found on computer keyboards and mice found that keyboards harbored more bacteria than toilet seats. Keyboards are among the least sanitary of all forms of office equipment. Pair programming with the pair sharing one keyboard, mouse, or touch screen raises the probability of bacterial or viral infection for both members of the pair. A study is recommended to examine the incidence of flu and other contagious diseases among pair programming teams and also individual programmers.

      Mean Time to Interrupt from Shared Office Cubicles

      A study at IBM of private offices and shared cubicles by the author found that the mean time to interrupt or some form of distracting social commentary was about 13 minutes for two-person cubicles and 11 minutes for cubicles with more than two occupants. This study was congruent with similar but independent analysis by Tom DeMarco who found that solo programming productivity was higher for individual offices than for shared cubicles. There is a need for long-term research on pair programming and also for solo programming to confirm or challenge these earlier studies. In particular the mean time to interrupt for social rather than technical discussions needs more study in a pair programming context.

      Remote Pairing rather than Co-Location

      A study by the author of staff meetings and other technical meetings in IBM software groups involving 5 to 8 participants found that live meetings averaged about 15 minutes of non-technical social discussions prior to actually starting the technical discussion. The study also found non-relevant side issues came up about every 7 minutes. The meetings lasted about 3.25 hours but technical discussions occupied less than 90 minutes of the total meeting time.

      By contrast technical telephone conference calls with the same numbers of participants started within 5 minutes of the nominal start time and lasted about 53 minutes. There were many fewer side issues during conference calls than during live meetings. The technical discussions occupied about 44 minutes of the 53 minute calls.
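The meeting figures above are easier to compare as shares of total time. A minimal sketch using only the quoted numbers; since the live-meeting technical time is given as "less than 90 minutes", the live figure is an upper bound. The function name is hypothetical.

```python
def technical_share(technical_minutes, total_minutes):
    """Fraction of a meeting spent on technical discussion."""
    return technical_minutes / total_minutes


# Live meetings: about 3.25 hours long, under 90 minutes of technical content.
LIVE_TOTAL_MIN = 3.25 * 60
LIVE_TECHNICAL_MAX_MIN = 90

# Conference calls: about 53 minutes long, about 44 minutes technical.
CALL_TOTAL_MIN = 53
CALL_TECHNICAL_MIN = 44

live_share_upper = technical_share(LIVE_TECHNICAL_MAX_MIN, LIVE_TOTAL_MIN)
call_share = technical_share(CALL_TECHNICAL_MIN, CALL_TOTAL_MIN)

print(f"live meetings:    at most {live_share_upper:.0%} technical")
print(f"conference calls: about {call_share:.0%} technical")
```

That is at most about 46% technical content for live meetings against about 83% for calls.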

      These results suggest that remote pairing with the pairs sharing information via shared screens and voice communication might be more effective than co-location and probably less of a health hazard for the participants. Remote pairing is a subject that would benefit from field trials.

      Modern studies on scrum sessions that measured total duration, technical content, and non-technical social topics would also be useful. It is possible that remote scrum sessions rather than live ones might have the same patterns noted in the early studies of live versus telephone meetings.

      Formal Inspections compared to Pair Programming

      Formal inspections of requirements, design, code, and other deliverable items have more than 40 years of empirical data and thousands of measured inspections. Key aspects of the success of inspections are the fact that time recording is an integral part of the task so that preparation time, inspection time, and defect repair time are all known. Another aspect with relevance to pair programming would be the use of 2-hour sessions with breaks, and possibly limiting the number of sessions to a maximum of three per day. Many aspects of programming benefit from introspection and private research. Both the effort and the ranges of defect removal efficiency are known for inspections, but are not known for pair programming. Additional research on pair programming with limited sessions per day would be useful.

      Lack of Pair Programming Benchmark Data

      A check with the International Software Benchmark Standards Group (ISBSG.org) found that out of their collection of more than 5,000 software projects not even one pair programming project had submitted data. Elsewhere in this blog is a catalog of software benchmark providers that cites 23 organizations that produce quantitative benchmark information with a combined total of around 91,000 projects. Here too there are no pair programming projects reported other than the ones cited in this blog by the author. Pair programming seems to have less benchmark data available than any current methodology. It would benefit pair programming credibility if paired projects downloaded the ISBSG data collection questionnaire and began to produce benchmark reports. There is no charge for downloading the ISBSG report or submitting project data. Pair programming benchmarks would add credibility to the various claims of higher quality and productivity, or perhaps challenge those claims. In either case pair programming benchmarks would benefit the software industry.

      Pair Programming in Context

      The literature on pair programming concentrates on the task of coding and does not include either the earlier tasks of requirements and design or the later tasks of quality assurance, critical inspections, all forms of testing, project management, or many other tasks. It would be beneficial to the pair programming literature to include full life-cycle studies that encompassed other work such as the following for 10,000 function points:

      Development activities, with work hours and burdened cost per function point:

       #  Activity                    Work hours  Burdened cost

       1  Business analysis                 0.02          $1.33
       2  Risk analysis/sizing              0.00          $0.29
       3  Risk solution planning            0.01          $0.67
       4  Requirements                      0.38         $28.57
       5  Requirements inspection           0.22         $16.67
       6  Prototyping                       0.33         $25.00
       7  Architecture                      0.05          $4.00
       8  Architecture inspection           0.04          $3.33
       9  Project plans/estimates           0.03          $2.00
      10  Initial design                    0.75         $57.14
      11  Detail design                     0.75         $57.14
      12  Design inspections                0.53         $40.00
      13  Coding                            4.00        $303.03
      14  Code inspections                  3.30        $250.00
      15  Reuse acquisition                 0.01          $1.00
      16  Static analysis                   0.02          $1.33
      17  COTS package purchase             0.01          $1.00
      18  Open-source acquisition           0.01          $1.00
      19  Code security audit               0.04          $2.86
      20  Ind. Verif. & Valid.              0.07          $5.00
      21  Configuration control             0.04          $2.86
      22  Integration                       0.04          $2.86
      23  User documentation                0.29         $22.22
      24  Unit testing                      0.88         $66.67
      25  Function testing                  0.75         $57.14
      26  Regression testing                0.53         $40.00
      27  Integration testing               0.44         $33.33
      28  Performance testing               0.33         $25.00
      29  Security testing                  0.26         $20.00
      30  Usability testing                 0.22         $16.67
      31  System testing                    0.88         $66.67
      32  Cloud testing                     0.13         $10.00
      33  Field (Beta) testing              0.18         $13.33
      34  Acceptance testing                0.05          $4.00
      35  Independent testing               0.07          $5.00
      36  Quality assurance                 0.18         $13.33
      37  Installation/training             0.04          $2.86
      38  Project measurement               0.01          $1.00
      39  Project office                    0.18         $13.33
      40  Project management                4.40        $333.33

          Cumulative results               20.44      $1,548.68

      The studies would identify the full sets of tasks normally performed in a pair-programming context and the costs, schedule, and staffing for these tasks. Software development involves much more than just coding and the impact of pair programming on total development is missing from the pair programming literature.
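The point that coding is only one part of the life cycle can be read off the figures quoted for the 10,000 function point example. A minimal sketch using the coding, code inspection, project management, and cumulative values as given; the function name is hypothetical.

```python
# Work hours and burdened cost per function point, as quoted above.
TOTAL_HOURS_PER_FP = 20.44
TOTAL_COST_PER_FP = 1548.68

SELECTED_ACTIVITIES = {
    # activity: (work hours per FP, burdened cost per FP)
    "Coding": (4.00, 303.03),
    "Code inspections": (3.30, 250.00),
    "Project management": (4.40, 333.33),
}


def share_of_total(value, total):
    """Fraction of the life-cycle total taken by one activity."""
    return value / total


coding_hours, coding_cost = SELECTED_ACTIVITIES["Coding"]
coding_hours_share = share_of_total(coding_hours, TOTAL_HOURS_PER_FP)
coding_cost_share = share_of_total(coding_cost, TOTAL_COST_PER_FP)

for name, (hours, cost) in SELECTED_ACTIVITIES.items():
    hours_share = share_of_total(hours, TOTAL_HOURS_PER_FP)
    cost_share = share_of_total(cost, TOTAL_COST_PER_FP)
    print(f"{name:<20}{hours_share:>7.1%} of hours{cost_share:>8.1%} of cost")
```

Coding comes to a little under 20% of both hours and cost in this example, so a study limited to the coding task describes about a fifth of the total effort.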

      Since pair programming often occurs in an agile or extreme programming context, similar data would be shown by sprint rather than for the full project as shown above.

  8. To be honest, you provide data based on your own calculator, and when digging deeper I find flaws in the comparisons and in the relevance of the underlying tables. For example, for table 2 you talk about differences between using inspection and static analysis, and how that outperforms quality in a single programmer setting. In table 1 you compare salaries based on staff hours. You leave out the consideration of differences in staff hours for the second table, while one of the benefits that you may get from pairing together with another person (I leave open in which role or function you are putting two people together) may deliver the same results as if you had inspections and static analysis in place.

    Similarly, another fact that I don’t see covered is that pair programming all day is actually quite exhausting. That said, you don’t do it all day. You also don’t pair on all tasks, but only on the ones where you want to have the benefit of pairing, that is, knowledge sharing between two people so that your lottery factor goes up. The lottery factor is the number of people that may win in the lottery before your project starts to suffer. In most projects I have seen that did not use pair programming, the psychological effects of not pairing had been that programmers were afraid to touch the code of some other developer, that testers were afraid to test functionality of another, and that business analysts didn’t dare to touch a particular document another BA had written. I would claim that in such a setting the lottery factor was 1 or even below that as the project was already suffering, and no code review or static analysis helped overcome those fears. On the other hand, combining the knowledge of two different experts in certain areas yielded a math of 1 + 1 >>> 2 – but only for the tasks that are worth pairing on. The simple things, you probably should be doing in solitude, still.

    This whole story reminds me of a proof that 1 = 2 that I once presented to my physics teacher. The proof started with “let x be the smallest real number smaller than 1”. My teacher replied: “Based on a flawed premise, you can prove anything.”

    • Capers Jones

      Markus,

      Thanks for your comments. Cooperation is often useful.

      Best,
      Capers Jones

    • Capers Jones

      Markus,

      There are quite a few additional topics centering on pairs that need further study. Here are a few of them:

      Health Hazards from Pair Programming

      Several studies of bacteria found on computer keyboards and mice found that keyboards harbored more bacteria than toilet seats. Keyboards are among the least sanitary of all forms of office equipment. Pair programming with the pair sharing one keyboard, mouse, or touch screen raises the probability of bacterial or viral infection for both members of the pair. A study is recommended to examine the incidence of flu and other contagious diseases among pair programming teams and also individual programmers.

      Mean Time to Interrupt from Shared Office Cubicles

      A study at IBM of private offices and shared cubicles by the author found that the mean time to interrupt or some form of distracting social commentary was about 13 minutes for two-person cubicles and 11 minutes for cubicles with more than two occupants. This study was congruent with similar but independent analysis by Tom DeMarco who found that solo programming productivity was higher for individual offices than for shared cubicles. There is a need for long-term research on pair programming and also for solo programming to confirm or challenge these earlier studies. In particular the mean time to interrupt for social rather than technical discussions needs more study in a pair programming context.

      Remote Pairing rather than Co-Location

      A study by the author of staff meetings and other technical meetings in IBM software groups involving 5 to 8 participants found that live meetings averaged about 15 minutes of non-technical social discussions prior to actually starting the technical discussion. The study also found non-relevant side issues came up about every 7 minutes. The meetings lasted about 3.25 hours but technical discussions occupied less than 90 minutes of the total meeting time.
      By contrast technical telephone conference calls with the same numbers of participants started within 5 minutes of the nominal start time and lasted about 53 minutes. There were many fewer side issues during conference calls than during live meetings. The technical discussions occupied about 44 minutes of the 53 minute calls.
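
      The meeting figures above can be turned into a share of time spent on technical discussion. The snippet below is only a sketch: the function name is hypothetical, and the live-meeting figure is treated as an upper bound because the text says technical discussion took "less than 90 minutes."

```python
def technical_share(technical_minutes, total_minutes):
    """Fraction of a meeting spent on technical discussion."""
    return technical_minutes / total_minutes

# Live meetings: about 3.25 hours long, under 90 minutes of technical content.
live_total_minutes = 3.25 * 60
live_share_upper_bound = technical_share(90, live_total_minutes)

# Conference calls: about 53 minutes long, about 44 minutes of technical content.
call_share = technical_share(44, 53)

print(f"Live meetings: at most {live_share_upper_bound:.0%} technical")
print(f"Conference calls: about {call_share:.0%} technical")
```

      On these figures the calls spend roughly 83% of their time on technical content, against less than about 46% for the live meetings.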

      These results suggest that remote pairing with the pairs sharing information via shared screens and voice communication might be more effective than co-location and probably less of a health hazard for the participants. Remote pairing is a subject that would benefit from field trials.
      Modern studies on scrum sessions that measure total duration, technical content, and non-technical social topics would also be useful. It is possible that remote scrum sessions rather than live ones might have the same patterns noted in the early studies of live versus telephone meetings.

      Formal Inspections compared to Pair Programming

      Formal inspections of requirements, design, code, and other deliverable items have more than 40 years of empirical data and thousands of measured inspections. Key aspects of the success of inspections are the fact that time recording is an integral part of the task so that preparation time, inspection time, and defect repair time are all known. Another aspect with relevance to pair programming would be the use of 2-hour sessions with breaks, and possibly limiting the number of sessions to a maximum of three per day. Many aspects of programming benefit from introspection and private research. Both the effort and the ranges of defect removal efficiency are known for inspections, but are not known for pair programming. Additional research on pair programming with limited sessions per day would be useful.

      Thanks,
      Capers Jones

  9. Sal freudenberg

    Really interesting work. I agree that there are so many variables that it is very tricky to prove or disprove pair programming; however, I have done extensive empirical research of professional, experienced pair programmers in this area and find a number of issues missing in your argument. Namely: knowledge transfer and decreasing the ‘truck count’ for your project, collaborative understanding not only of expertise but of what is going on right now (one of the benefits of pairers’ peripheral awareness of each other’s chat), increase in problem space and potential options covered from collaborating, pulling in extra help from those who hear what you are talking about, having the fresh beginner mind challenge our dogmas, the human aspect of collaborating etc. I look at pairing from a cognitive point of view and find the benefits more compelling than the costs.

    • Capers Jones

      Sallyann,

      Thanks for the comments. In principle there may be cognitive values and I’m glad you are studying them.

      Unfortunately what might happen is that an offshore outsource vendor will arrive and point out to the CEO that the backlog of corporate applications waiting to be developed might be done somewhere else for much less money; so all of the pairs lose their jobs.

      A paper that evaluates 17 learning channels will be showing up on my blog as a PDF file. I’ll send it to you.

      Regards,
      Capers Jones

    • Capers Jones

      Sal,

      There is always communication and cross sharing of ideas. Pair programming is only one way, and not necessarily the best way.

      There are inspections, wiki sites, joint application design (JAD), quality function deployment (QFD), and many other methods that use cooperation and collaboration.

      One issue with pair programming is that an offshore outsource vendor might arrive and convince a CEO that the work could be done for less than half the cost.
      How do you defend against that?

      Regards,
      Capers Jones

    • Capers Jones

      Sal,

      There are many other ways to collaborate and cooperate in addition to pair programming.

      A lot of the work I do involves international projects with development teams in Europe, Asia, and the U.S. For these we use wiki sites, Skype meetings, joint application design (JAD), quality function deployment (QFD), and in recent years virtual team meetings. Software on large systems in large companies involves much more than just programming.

      I was commissioned to study software employment in major corporations and government groups. Here is what we found:

      1.  Accounting/Financial Specialists
      2.  Agile coaches
      3.  Architects (Software)
      4.  Architects (Systems)
      5.  Architects (Enterprise)
      6.  Assessment Specialists
      7.  Audit Specialists
      8.  Baldrige Award Specialists
      9.  Baselining Specialists
      10.  Benchmarking Specialists
      11.  Business analysts (BA)
      12.  Business Process Reengineering (BPR) Specialists
      13.  Capability Maturity Model Integrated (CMMI) Specialists
      14.  CASE and tool Specialists
      15.  Client-Server Specialists
      16.  CMMI Assessors
      17.  Complexity Specialists
      18.  Component Development Specialists
      19.  Configuration Control Specialists
      20.  Cost Estimating Specialists
      21.  Consulting Specialists
      22.  Curriculum Planning Specialists
      23.  Customer Liaison Specialists
      24.  Customer Support Specialists
      25.  Data Base Administration Specialists
      26.  Data Center Support Specialists
      27.  Data quality Specialists
      28.  Data Warehouse Specialists
      29.  Decision Support Specialists
      30.  Development specialists
      31.  Distributed Systems Specialists
      32.  Domain Specialists
      33.  Earned Value Specialists
      34.  Education Specialists
      35.  E-Learning Specialists
      36.  Embedded Systems Specialists
      37.  Enterprise Resource Planning (ERP) Specialists
      38.  Executive Assistants
      39.  Frame Specialists
      40.  Expert-System Specialists
      41.  Function Point Specialists (certified)
      42.  Generalists (who perform a variety of software-related tasks)
      43.  Globalization and Nationalization Specialists
      44.  Graphics Production Specialists
      45.  Graphical User Interface (GUI) Specialists
      46.  Human Factors Specialists
      47.  Information Engineering (IE) Specialists
      48.  Instructors (Management Topics)
      49.  Instructors (Software Topics)
      50.  Integration Specialists
      51.  Intellectual Property (IP) Specialists
      52.  Internet specialists
      53.  ISO Certification Specialists
      54.  Joint Application Design (JAD) Specialists
      55.  Kanban Specialists
      56.  Kaizen Specialists
      57.  Knowledge specialists
      58.  Key Process Indicators (KPI) specialists
      59.  Library Specialists (for project libraries)
      60.  Litigation support Specialists
      61.  Maintenance Specialists
      62.  Marketing Specialists
      63.  Member of the Technical Staff (multiple specialties)
      64.  Measurement Specialists
      65.  Metric Specialists
      66.  Microcode Specialists
      67.  Model Specialists
      68.  Multi-Media Specialists
      69.  Network maintenance Specialists
      70.  Network Specialists (LAN)
      71.  Network Specialists (WAN)
      72.  Network Specialists (Wireless)
      73.  Neural Net Specialists
      74.  Object-Oriented Specialists
      75.  Outsource Evaluation Specialists
      76.  Package Evaluation Specialists
      77.  Pattern Specialists
      78.  Performance Specialists
      79.  Programming Language Specialists (Java, C#, Ruby, PHP, SQL, etc.)
      80.  Project Cost Analysis Specialists
      81.  Project managers
      82.  Project Office Specialists
      83.  Project Planning Specialists
      84.  Process Improvement Specialists
      85.  Productivity Specialists
      86.  Quality Assurance Specialists
      87.  Quality function deployment (QFD) Specialists
      88.  Quality Measurement Specialists
      89.  Rapid Application Development (RAD) Specialists
      90.  Research Fellow Specialists
      91.  Reliability Specialists
      92.  Repository Specialists
      93.  Reengineering Specialists
      94.  Requirements engineer
      95.  Reverse engineering Specialists
      96.  Reusability Specialists
      97.  Reverse Engineering Specialists
      98.  Risk Management Specialists
      99.  Sales Specialists
      100.  Sales Support Specialists
      101. Scrum masters
      102. Security specialists
      103. Standards specialists
      104. Systems analyst specialists
      105. Systems support specialists
      106. Technical translation specialists
      107. Technical writing specialists
      108. Test case design specialists (mathematical)
      109. Testing specialists (automated)
      110. Testing specialists (manual)
      111. Testing specialists (model driven)
      112. Total quality management (TQM) specialists
      113. Virtual reality specialists
      114. Web design specialists
      115. Web page design specialists
      116. Web master

      Thanks,
      Capers Jones

    • Capers Jones

      Sal,

      Formal inspections provide many of the same benefits, and also have a wider range of people involved than just pairs.

      Thanks,
      Capers Jones

    • Capers Jones

      Sal,

      Inspections have more than 40 years of empirical data and thousands of measured projects. They provide an even wider set of different views than pairs.
      There are also interesting pre-code methods such as quality function deployment (QFD) and joint application design (JAD) that provide collaboration among many participants – many more than just pairs.

      Thanks,
      Capers Jones

  10. jon w

    Your tables seem to contain made-up numbers. I can’t find any source of the data?
    Also, static analysis is great for languages and environments where it works, but many languages and environments don’t lend themselves well to static analysis.
    So your argument is, if anything, even weaker than the argument made in the pair programming research, which isn’t saying much

    • Capers Jones

      Jon,

      You are right about static analysis. There are currently around 2,500 programming languages and static analysis only works on about 25 out of that total. As it happens that set of 25 is still a pretty big quantity of applications.

      In any case inspections work on all 2,500 languages and they have more than 40 years of successful empirical data. It is too bad that the pair programming community is not as good with measures as the inspection community.

      For inspections preparation time, inspection time, and defect repair time are all recorded.

      Regards,
      Capers Jones

    • Capers Jones

      Jon,

      Here are the comparative defect removal efficiency levels for a sample of defect prevention, pre-test defect removal, and test stages:

      Defect Prevention Efficiency
      1  JAD               27%
      2  QFD               30%
      3  Prototype         20%
      4  Models            68%
         Subtotal          86%

      Pre-Test Removal Efficiency
      1  Desk check        27%
      2  Static analysis   55%
      3  Inspections       93%
         Subtotal          98%

      Test Removal Efficiency
      1  Unit              32%
      2  Function          35%
      3  Regression        14%
      4  Component         32%
      5  Performance       14%
      6  System            36%
      7  Acceptance        17%
         Subtotal          87%
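
      The comment does not say how the subtotals are derived. One plausible reading, offered here only as an assumption, is that the stages run in series and each stage removes its stated percentage of whatever defects remain. Under that assumption the combined efficiency is 1 minus the product of the per-stage miss rates. The function name below is hypothetical.

```python
def cumulative_efficiency(stage_efficiencies):
    """Combined efficiency of stages applied in series.

    Assumes each stage independently removes its stated fraction of the
    defects still present; this model is an assumption, not the author's
    stated method.
    """
    remaining = 1.0
    for efficiency in stage_efficiencies:
        remaining *= 1.0 - efficiency
    return 1.0 - remaining

prevention = [0.27, 0.30, 0.20, 0.68]                     # JAD, QFD, prototype, models
pre_test = [0.27, 0.55, 0.93]                             # desk check, static analysis, inspections
testing = [0.32, 0.35, 0.14, 0.32, 0.14, 0.36, 0.17]      # unit through acceptance

for label, stages in [("Prevention", prevention),
                      ("Pre-test", pre_test),
                      ("Testing", testing)]:
    print(f"{label}: {cumulative_efficiency(stages):.1%}")
```

      This gives roughly 87%, 98%, and 88%, each within about a percentage point of the quoted subtotals of 86%, 98%, and 87%. The serial model is therefore consistent with the table, although it may not be exactly how the subtotals were produced.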

      Regards,
      Capers Jones

  11. At my firm, we cultivate full-stack generalist developers who regularly engage in dialog with the system owner to refine requirements and develop according to an agile process. Our teams are never larger than 6 people. There is much to say on the topic of pair programming but I enjoy both pairing and going solo and have done it in different companies. I find I am more disciplined in a pair and more consistently productive.

    I find the “lines of code” measure to be woefully inadequate for any kind of measure of productivity. Every single code base I have ever seen written by solo programmers is full of unnecessary code, poor decisions that were never questioned and so on. What “lines of code” does not measure (and this can apply as easily to the 10x developer question as it does to pairing) is that the most productive developer may actually write fewer lines of code. The most productive developer might be the guy that says this module is BS and why are we writing it in the first place! $100k saved right there. Give that person a raise. (never happens)

    What I have also observed is that developers in a large corporate setting are more than capable of gaming the system and only emitting as much code as is required to keep them employed or make the promotion if they are lucky enough to have a technical ladder. If they are more productive than average, they will play games on the side or develop extra bells and whistles for fun. They will “architect” swiss-army knife solutions that serve no purpose other than their own amusement. I’ve never seen a pair go too far off into the weeds.

    So until you can measure also the hours that were worked as pairs that would have been wasted by a solo dev and the code that was never developed because it shouldn’t have been, you are not getting any kind of realistic idea of the cost or benefits of the practice. Additional areas to look into would be maintenance of unfamiliar code, knowledge diffusion, team cohesion, retention, and let’s just be blunt, replaceability. A lot of developers like to “own” whole pieces of a system — this politicises architecture and appears to generate job security. I’ll bet this is a huge reason that many operations require 100s of programmers. People are writing web apps that service millions of users with small teams of just a few people. That is where you should look for the evidence of what works and what doesn’t. Bloated corporations run largely by and for the benefit of overpaid management, probably aren’t going to give you the best idea.

    • Capers Jones

      Steven,

      I’ve added a new PDF to the blog that points out that LOC metrics are professional malpractice if used for cross-language or multi-language economic studies. Only function points work well for complex projects with many kinds of workers.

      Also, some applications have a dozen or more languages. For many languages there are not even any counting rules.

      In the blog I bypassed these issues by limiting the results to the same language in all.

      Best Regards,
      Capers Jones

      • Capers Jones

        Sean,

        I’ve only said that LOC is a very bad metric across languages. I never said it was a good metric. It ignores requirements and design and also penalizes high-level languages which produce more functionality with less code. You can find these results on my blog in the history of LOC metrics.

        A side by side comparison of productivity rates for 50 languages shows that LOC consistently penalizes newer and more high level languages and shows the highest productivity for basic assembly. This is common knowledge, but when all examples use one language the issue is minimized.

        Capers Jones

      • Capers Jones

        Sean,

        You are correct that large code volumes do not correlate with real economic productivity. One of the virtues of function point metrics is that they do correlate with real economic productivity by showing that low code volumes are often much more productive than high code volumes. Here are side by side comparisons of productivity measured using function points and LOC. As you can see, LOC violates all of the canons of economic productivity.

        Capers Jones

        Language              Function Pts.  LOC per
                              per Month      Month

        Machine language        1.45         927.54
        Basic Assembly          2.70         864.86
        JCL                     3.69         815.29
        Macro Assembly          3.80         810.13
        HTML                    4.76         761.90
        C                       5.62         719.10
        Algol                   6.38         680.85
        Bliss                   6.38         680.85
        Chill                   6.38         680.85
        COBOL                   6.38         680.85
        Coral                   6.38         680.85
        Fortran                 6.38         680.85
        Jovial                  6.38         680.85
        GW Basic                6.74         663.21
        Pascal                  7.07         646.46
        PL/S                    7.07         646.46
        ABAP                    7.69         615.38
        Modula                  7.69         615.38
        PL/I                    7.69         615.38
        ESPL/I                  8.26         587.16
        Javascript              8.26         587.16
        Forth                   8.77         561.40
        Lisp                    8.77         561.40
        Prolog                  8.77         561.40
        Basic (interpreted)     8.77         561.40
        Quick Basic             9.01         549.36
        C++                     9.68         516.13
        Java                    9.68         516.13
        PHP                     9.68         516.13
        Python                  9.68         516.13
        C#                      9.88         505.93
        Ada 95                 10.08         496.12
        RPG III                10.27         486.69
        CICS                   10.45         477.61
        DTABL                  10.45         477.61
        Ruby                   10.45         477.61
        Simula                 10.45         477.61
        DB2                    11.11         444.44
        Oracle                 11.11         444.44
        Mixed Languages        11.41         429.53
        Haskell                11.41         429.53
        Pearl                  11.69         415.58
        Speakeasy              11.69         415.58
        APL                    12.20         390.24
        Delphi                 12.64         367.82
        Objective C            13.04         347.83
        Visual Basic           13.04         347.83
        ASP NET                13.40         329.90
        Eiffel                 13.73         313.73
        Smalltalk              14.02         299.07
        IBM ADF                14.29         285.71
        MUMPS                  14.53         273.50
        Forte                  14.75         262.30
        APS                    14.96         251.97
        TELON                  15.15         242.42
        QBE                    15.92         203.82
        SQL                    15.92         203.82
        Excel                  17.73         113.48
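
        One way to read the table is to divide the second column by the first, which gives the lines of code implied per function point for each language. The sketch below does this for a handful of rows copied from the table. The function name is hypothetical, and the ratio is a derived quantity rather than a figure stated in the comment.

```python
def implied_loc_per_function_point(fp_per_month, loc_per_month):
    """LOC needed per function point, derived from the two table columns."""
    return loc_per_month / fp_per_month

# (language, function points per month, LOC per month), copied from the table
rows = [
    ("Machine language", 1.45, 927.54),
    ("C", 5.62, 719.10),
    ("Java", 9.68, 516.13),
    ("SQL", 15.92, 203.82),
    ("Excel", 17.73, 113.48),
]

for language, fp_rate, loc_rate in rows:
    ratio = implied_loc_per_function_point(fp_rate, loc_rate)
    print(f"{language:<18} {ratio:7.1f} LOC per function point")
```

        The implied ratio falls from about 640 LOC per function point for machine language to about 6 for Excel. This is why the two columns move in opposite directions: the higher-level the language, the fewer lines are written per unit of delivered function, so LOC per month drops even as function points per month rise.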

    • Capers Jones

      Steve,

      You seem to assume that all pairs are top guns who don’t make mistakes – not true.

      If your teams are never larger than 6 then probably your company builds small applications.

      Have you built commercial packages aimed at 100,000 users or with teams of 500 people in half a dozen cities?

      An analogy is that there are many ways to build a row boat and have it come out well. There are only a few ways to build a 75,000 ton cruise ship and have it come out well.

      Regards,
      Capers Jones

    • Capers Jones

      Steven,

      Here are the occupations noted on applications in the 100,000 function point size range:

      Occupation                  Normal    Peak
                                   Staff   Staff

      Programmers                    290     434
      Testers                        256     384
      Designers                      138     228
      Business analysts              138     214
      Technical writers               60      84
      Quality assurance               51      82
      1st line managers               45      63
      Data base administration        26      34
      Project Office staff            23      31
      Administrative support          26      33
      Configuration control           15      21
      Project librarians              12      17
      2nd line managers                9      13
      Estimating specialists           9      12
      Architects                       6       9
      Security specialists             3       5
      Performance specialists          3       5
      Function point counters          3       5
      Human factors specialists        3       5
      3rd line managers                2       3
      TOTAL                        1,119   1,680

      Here are the occupations noted on applications in the 100 function point size range:

      Occupation                  Normal    Peak
                                   Staff   Staff

      Programmers                      3       4
      Testers                          0       1
      Designers                        0       1
      Business analysts                0       1
      Technical writers                0       1
      Quality assurance                0       1
      1st line managers                1       1
      Data base administration         0       0
      Project Office staff             0       0
      Administrative support           0       0
      Configuration control            0       0
      Project librarians               0       0
      2nd line managers                0       0
      Estimating specialists           0       0
      Architects                       0       0
      Security specialists             0       0
      Performance specialists          0       0
      Function point counters          0       0
      Human factors specialists        0       0
      3rd line managers                0       0
      TOTAL                            4      10

      It sounds like you don’t use architects, designers, business analysts, data base administrators, or any of the other specialists.

      Thanks,
      Capers Jones

    • Capers Jones

      Steven,

      Not all big systems are bloated, although many are. Some are big because of their functionality, such as SAP and Oracle. Others are big because they deal with big problems, such as the Worldwide Military Command and Control System (WWMCCS).

      Thanks,
      Capers Jones

    • Capers Jones

      Steven,

      Some applications can service millions of users with a few developers, but others require more developers. For example, a central-office telephone switching system such as ESS5 or System 12 serves millions of users and also required hundreds of developers.

      Thanks,
      Capers Jones

  12. godzilla

    I took the liberty of re-creating the results from Table 1. Throughout the post you claim that a pair of programmers is 15% slower than a single programmer. However, in the calculated data you give the single programmer 20 LOC/hr, and the pair of programmers 15 LOC/hr. After claiming a 15% advantage throughout the text, you calculate using a 33% advantage. How do you justify exaggerating the numbers for the calculation?

    The results of Table 1 are entirely dictated by that one assumption. It takes 33% more hours, and the pay of 1/2 of the pair becomes 133% that of the pay of the single programmer (which is, of course, 266% for the complete pair).

    If you were to stick to your own numbers instead of the exaggerated assumptions used in the calculation, then the results would of course be 15% more hours and 230% cost. But even that is based on a false premise.

    If the numbers come from actual studies, you find a 15% increase in man-hours. In other words, you forgot to multiply “Staff Hours” by 2 in the left column. Since the number of man-hours increases by 15%, the cost of using pair programmers would be 115% that of using two individual programmers.
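
    The three readings in this comment can be checked with a few lines of arithmetic. This is only a sketch of the commenter's reasoning: the function name is hypothetical, and it assumes both members of a pair are paid the same hourly rate as the solo programmer.

```python
def pair_cost_ratio(elapsed_hours_ratio, people=2):
    """Cost of a pair relative to one solo programmer.

    elapsed_hours_ratio is pair elapsed hours divided by solo hours.
    Assumes every person is paid the same hourly rate.
    """
    return people * elapsed_hours_ratio

# Reading 1: the rates used in Table 1 (solo 20 LOC/hr, pair 15 LOC/hr).
table_hours_ratio = 20 / 15
table_cost_ratio = pair_cost_ratio(table_hours_ratio)

# Reading 2: the 15% figure from the text, taken as 15% more elapsed hours.
text_hours_ratio = 1.15
text_cost_ratio = pair_cost_ratio(text_hours_ratio)

# Reading 3: the 15% figure taken as total staff-hours, already counting both people.
staff_hours_cost_ratio = 1.15

print(f"Table 1 rates:        {table_hours_ratio:.2f}x hours, {table_cost_ratio:.2f}x cost")
print(f"15% more elapsed:     {text_hours_ratio:.2f}x hours, {text_cost_ratio:.2f}x cost")
print(f"15% more staff-hours: {staff_hours_cost_ratio:.2f}x cost")
```

    The spread between roughly 2.67x, 2.30x, and 1.15x shows how much the conclusion depends on which productivity assumption is fed in.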

    • Capers Jones

      I have a pair programming calculator that allows users to put in their own numbers for both productivity and quality.

      It was simple to build. The fundamental point is that if you pay two people to do the work of 1 person it costs more.

      Regards,
      Capers Jones

  13. I might be missing it, but where in this paper do you describe your methodology for obtaining the results you’ve presented?

    • Capers Jones

      Chris,

      You can use your own data. No matter whose data you use, so long as you pay two people to do the work of 1 person it costs more money.

      Capers Jones

      • Donald Ball

        That sounds very clever, but it’s just begging the question. You eliminate a priori the possibility that the two people pairing may produce more useful results than two people working alone. Some studies have demonstrated this to be the case.

        • Capers Jones

          Donald,

          Two experts can easily do better than one amateur. One expert can easily do better than two amateurs. I don’t see your point.

          Capers Jones

        • Capers Jones

          Donald,

          And some show the opposite. There is no definite evidence that all pairs are better than all solo programmers.

          Here are some papers where the authors did not get good results from pairs.

          1. Sargeant, Will; Where Pair Programming Fails for Me; 12/29/2010; Tersesystems.com

          2. Evers, Jon; Pair Programming Considered Harmful; 3/12/2012; Techcrunch.com

          3. Lai, Kim Man, Atikus, Bruce, Chan, Keith; Pair Programming Issues and Challenges; Geogle.comp.polyu.edu.hk

          4. CMountford; Pair Programming is Kryptonite; 6/25/2009; blogs.atlassian.com/2009/06/pair_programming_is_kryptonite

          5. Berba Velasco jr., V; The Pitfalls and Perils of Pair Programming; http://www.streetdirectory.com/travel_guide/12455/programming/the_pitfalls_and_perils_of_pair_programming.html

          6. Wilden, Mark; Why I don’t like Pair Programming (and why I left Pivotal); 12/7/2009; mwilden.blogspot.com/2009/11/why_I_don’t_like_pair_programming_and.html

          7. Atwood, Jeff; Pair Programming vs. Code Reviews; http://www.codinghorror.blog/2007/pair-programming-vs-code-reviews.html

          8. Begel, Andrew and Nagappan, Nachiappan; Pair Programming: What’s in it for me?; research.microsoft.com/en-us/um/people/abegel/papers/esm-begel-2008.pdf

          Thanks,
          Capers Jones

        • Capers Jones

          Donald,

          Other studies have demonstrated the opposite. A good programmer using static analysis is cheaper and has higher quality than most pairs, unless they use static analysis too. Even if they do use static analysis, the solo programmer is still less expensive.

          Thanks,
          Capers Jones

    • Capers Jones

      Chris,

      I do remote or on-site interviews and use a chart of accounts that includes these 40 activities. Only large systems use all 40 – small projects use only a few and they don’t have as many specialists.

      1 Business analysis
      2 Risk analysis/sizing
      3 Risk solution planning
      4 Requirements
      5 Requirements inspection
      6 Prototyping
      7 Architecture
      8 Architecture inspection
      9 Project plans/estimates
      10 Initial Design
      11 Detail Design
      12 Design inspections
      13 Coding
      14 Code inspections
      15 Reuse acquisition
      16 Static analysis
      17 COTS Package purchase
      18 Open-source acquisition
      19 Code security audit
      20 Ind. Verif. & Valid.
      21 Configuration control
      22 Integration
      23 User documentation
      24 Unit testing
      25 Function testing
      26 Regression testing
      27 Integration testing
      28 Performance testing
      29 Security testing
      30 Usability testing
      31 System testing
      32 Cloud testing
      33 Field (Beta) testing
      34 Acceptance testing
      35 Independent testing
      36 Quality assurance
      37 Installation/training
      38 Project measurement
      39 Project office
      40 Project management

      Thanks,
      Capers Jones

  14. Interesting read.

    In my personal experience (2-4 person teams, “agile”-ish) pair programming is about 20% slower initially to build a feature, but we have 90% fewer corrections (including bugs and “oh we forgot X”). In our case “pair programmers have little impact on non-code defects” isn’t true; we can push back on bad design choices before they get hardened into code.

    I was a skeptic and had a client that wanted pair programming, so we measured the heck out of the project and compared it to projects of similar size. My numbers aren’t entirely anecdotal, but a very small sample size certainly.

    Another value that’s hard to measure is training and improvement; picking up better ways to do things, or even just learning new keyboard shortcuts make me more productive going forward. Plus, our designs are better: more resistant to change, easier to test, easier to integrate. I wish I could put a finger on why, but all I can come up with is that all the little conversations that go into “oh, this should be done by X” add up more than expected.

    Very interesting, I’ll check back to re-read the final version when it’s done. Thanks for doing this analysis!

    • Capers Jones

      Wes,

      Thanks for the comments. Interesting points. I hope you revisit and perhaps provide some data with your results.

      Regards.
      Capers Jones

    • Capers Jones

      Wes,

      One of the measured benefits of inspections is that they serve to prevent defects as well as find them. People who participate in inspections tend to avoid the kinds of bugs found. Over time, and this has been measured at IBM and elsewhere, inspections lower defects in all of the kinds of material that are inspected – requirements, design, code, etc.

      With inspections you get a wider set of opinions than with pairs.

      Regards,
      Capers Jones

    • Capers Jones

      Wes,

      Did your study and comparison use static analysis or inspections on either the individual or pair side?

      Thanks,
      Capers Jones

    • Capers Jones

      Wes,

      Thanks for the interesting observations – glad things worked so well for your use of pairs.

      Best Regards,
      Capers Jones

    • Capers Jones

      Wes,

      Glad you got good results from pairs. Inspections often have similar results and benefits, and are less expensive.

      Thanks,
      Capers Jones

  15. Victor Boudolf

    Is defect count the primary measure of quality? It’s easily measurable, but there are other factors such as maintainability and modifiability that affect the long term cost and usefulness of a system. Pairing as part of agile was a response to combat brittle systems that could become obsolete before they were completed. Did you find any research on this?

    • Capers Jones

      Victor,

      Defects are not the only quality measure. However they do have some strong correlations.

      High defect counts lead to low reliability and hence unhappy clients.

      The correlation with maintainability is less strong.

      Regards,
      Capers Jones

    • Capers Jones

      Victor,

      My data (around 15,000 projects in total) show Agile to have better quality than waterfall or cowboy development but not as good as TSP or RUP.
      A similar study by Don Reifer with more than 500 agile projects comes out on July 15 and shows similar results.

      Regards,
      Capers Jones

    • Capers Jones

      Victor,

      Your point is a good one, but pairing is not the only method that can help avoid brittle systems. Inspections are useful for the same purpose.
      Both RUP and TSP have built many complex systems that were not very brittle.

      Thanks,
      Capers Jones

  16. Josh

    I have to wonder what the economic sense would be for *any* company to, on a whim, double their staff. One would imagine that the rapport built up between the ~500 already-hired employees would be fair enough that they could work together.

    Except Becky, everyone hates Becky.

    • Capers Jones

      Josh,

      If every company tried to adopt pair programming at the same time there would not be enough programmers to staff all the jobs.

      Regards,
      Capers Jones

    • Capers Jones

      Josh,

      Earlier this week I got a question about results in my data collection for projects larger than 10,000,000 lines of code.
      There are very few of these giant projects – major defense systems, ERP packages, and operating systems are examples.

      An application of 10,000,000 LOC in a mid-level language such as Java is about 200,000 function points. There would be as many as 2,000 software engineers involved.

      Assuming that you could find and hire 2,000 more capable programmers for pair programming, the annual costs would be about $200,000,000 assuming a burdened rate of $100,000 per year.
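
      The figures in this example can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the author's data or tools. The names are illustrative, and the combined total for a fully paired staff is an inference from the numbers given rather than a figure stated in the comment.

```python
# Staffing-cost arithmetic for the 10,000,000 LOC example.
# Input figures are taken from the comment; all names are illustrative.

LINES_OF_CODE = 10_000_000      # application size in lines of code
FUNCTION_POINTS = 200_000       # stated size in function points
ENGINEERS = 2_000               # stated upper bound on software engineers
BURDENED_RATE = 100_000         # stated burdened cost per person per year, USD


def loc_per_function_point(lines_of_code: int, function_points: int) -> float:
    """Implied ratio of code size to functional size."""
    return lines_of_code / function_points


def annual_cost(headcount: int, burdened_rate: int) -> int:
    """Annual staffing cost for a given headcount."""
    return headcount * burdened_rate


ratio = loc_per_function_point(LINES_OF_CODE, FUNCTION_POINTS)
added_cost = annual_cost(ENGINEERS, BURDENED_RATE)       # the 2,000 extra partners
paired_cost = annual_cost(2 * ENGINEERS, BURDENED_RATE)  # inferred total if everyone is paired

print(f"LOC per function point: {ratio:.0f}")
print(f"Added annual cost of pairing: ${added_cost:,}")
print(f"Inferred annual cost of a fully paired staff: ${paired_cost:,}")
```

      Changing ENGINEERS or BURDENED_RATE shows how sensitive the total is to each assumption.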

      None of my clients who build these big systems use either agile or pair programming. Has anyone else seen either agile or pairs in this large size range?

      Thanks,
      Capers Jones


  17. Yann Ollivier

    “Pair programming is a new method”: are you serious? I had already heard about it in the late 90s as a part of XP.
    Lines of code per hour as a measurement of productivity… I am speechless. I thought this kind of nonsense died in the last century, and everyone had moved on.
    I guess you live in a cave.
    As for the value of pair programming, I will just say: not everything is a quantity you can measure. Pair programming can help, e.g., to identify potential problems with a requirement earlier, to improve communication between developers who usually work on different parts of a system, to spread knowledge in a team…
    Maybe instead of focusing on figures, statistics and categorization of people as different types of factory machines, you should just go and talk to developers who have tried PP. Ask them what they think about it. What are the cons and pros… The people who know best how to code are the people who code! Not their managers or some other productivity guru.
    Managers treating their employees as production units with some inputs and outputs are the plague of the IT industry.

    • Capers Jones

      Yann,

      Sounds like you work on small projects. What about architects, designers, software quality assurance, business analysts, testers, and any of the other 116 occupations found in big software groups? Don’t they do anything useful in your opinion?

      Capers Jones

  18. There are several highly questionable assumptions underlying this study. I will choose one: that the writing of code is “the work of one person”. That may be true for the typing of the code, but it may not be true of the writing or authorship of the code. Indeed, when we look carefully at software development, we see that many people are involved in a collaborative process: requirements development, design, review and inspections, testing, building, maintaining, and supporting the product. If one were to take an overly structured model of software development, it could look like an assembly line, with all of the elements happening in a discrete, linear, and sequential way. But people who observe people doing development and testing directly will quickly recognize that the assembly line is an overly simplistic, inaccurate, and unhelpful model. Drawing on my experience both in software and in an earlier career, theatre, I can assure you that software development is the work of creation and design, not merely the assembly of widgets or the production of lines of code. In creative work, close collaboration and rapid feedback accelerate the development of the work and greatly improve its quality. Quantitative models such as the ones you are using here grossly oversimplify and misrepresent real development work.

    • Capers Jones

      Michael,

      Most applications have more than one person involved. Big applications can have hundreds and more than 50 occupations.

      My blog does not say cooperation is bad. Inspections, for example, have more than 40 years of empirical data behind them.

      Regards,
      Capers Jones

    • Capers Jones

      Michael,

      One of the reasons I would not use pair programming myself is that most of my programming is on algorithms that I will be patenting.
      For inventions that have not yet been patented pair programming is hazardous. I don’t want anyone else to know how my inventions work until after the patents are filed.

      Capers Jones

  19. Capers -

    This is always a fun debate. I’ve been in the industry since 1973, and have led software teams since 1992. I have direct experience in both the traditional model (for 26 years) and the paired model (for the last 14 years). There is no comparison when it comes to quality, productivity, or delight.

    I began pairing my programmers in 1999, and the results were profound and dramatic. In 2001, four of us formed a company called Menlo Innovations around this model. The pairing construct is the most powerful managerial tool I have ever discovered. We pair all of our roles, PMs, High-tech Anthropologists®, Programmers, QA. We have defeated head-to-head competitors who later became customers. During the head-to-head time, they have told us they outspent us 10 to 1 and never got a product to market. In the meantime, our product development effort captured 30% of the worldwide units shipped.

    One of the dramatic changes is that we don’t need a support team, or a support hot line. Our phone doesn’t ring with trouble. The last time our team can recall a software emergency was 2004. In my old life I used to allocate upwards of 30% of my team resources to fixing problems created in the past. The cost was more than dollars, it was demoralizing to the teams I led (and to me).

    Our team at Menlo enjoys a 40 hour work week, never weekends, and we’ve never had to deny a vacation request in our history. One of the reasons we can do this is because a paired model easily scales (up and down). Without towers of knowledge, we regularly defeat Brooks’ Law. If we need more done, we can add more people.

    If any of us ever really believed that speed of coding was the answer, we would give all of our programmers typing classes. It’s never been about efficiency, but rather effectiveness. A programming error these days can cost lives, companies, or economies. The days of hero-based programming models are long gone.

    If you’d like to see The Menlo Software Factory firsthand, we host tours. Last year over 2,000 people traveled to see us from around the world. This year we are on pace for over 3,000 visitors. None of the questions posed above have to remain theoretical. We’ve been doing nothing but pairing for over a dozen years. We just tripled our office space for the third time in our history, and have received 5 awards from Inc. magazine for our revenue growth. We build software products for other companies, so our project sponsors are not a captive audience. They can choose any vendor they want, but they choose us, even though it would appear we are twice as expensive. Most come to us after they have failed at their own attempts and suddenly our system makes sense to them.

    You are welcome to come explore and ask questions.

    http://www.menloinnovations.com/by-visiting/classes-tours

    Rich Sheridan
    Co-founder, CEO, Chief Storyteller, Tour Guide
    Menlo Innovations LLC
    Ann Arbor, Michigan

    • Capers Jones

      Rich,

      Your comments are interesting. Do you have any actual quality or productivity data?

      Your company sounds like a fun place to work.

      Thanks,
      Capers Jones

    • Capers Jones

      Rich,

      Your comment is interesting. From my data on around 15,000 projects here are productivity rates for the projects with high quality:

      10 function points = 16.84 function points per month
      100 function points = 13.37 function points per month
      1000 function points = 11.81 function points per month
      10000 function points = 5.72 function points per month
      100000 function points = 2.68 function points per month
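
      The rates above can be converted into approximate effort by dividing application size by productivity. The sketch below assumes the rates are function points per staff month (the list says only "per month") and that effort is simply size divided by rate. The helper `effort_months` is hypothetical and is not part of the author's data set.

```python
# Productivity rates quoted in the list above: size in function points -> FP per month.
RATES = {
    10: 16.84,
    100: 13.37,
    1_000: 11.81,
    10_000: 5.72,
    100_000: 2.68,
}


def effort_months(size_fp: float, fp_per_month: float) -> float:
    """Approximate effort, assuming effort = size / productivity rate."""
    return size_fp / fp_per_month


for size_fp, rate in RATES.items():
    months = effort_months(size_fp, rate)
    print(f"{size_fp:>7,} FP at {rate:5.2f} FP/month: about {months:,.1f} staff months")
```

      Because the rate falls as size grows, effort under this assumption rises faster than size does.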

      Are your results better than these?

      What is the largest size of application that you build?

      I’ll send you summary data for all of my projects. It is too big for this blog.

      Thanks,
      Capers Jones

    • Capers Jones

      Matt,

      Paying two people to do the work of one costs more unless the two are more than twice as good, which is not the case.

      There are also 116 different kinds of occupations associated with software. Do you recommend pairing all the others?

      Regards,
      Capers Jones

  20. You are calculating only people costs, and you don’t keep in mind future value. If we keep in mind that most projects are late by 2-3 times, maybe it’s reasonable to employ pair programming at a certain level.

    • Capers Jones

      Andrej,

      Projects run late for many reasons but two stand out as the most common: 1) Having too many bugs when testing starts, which stretches the test cycle; 2) Having changing requirements that expand the scope of the project without expanding schedules.

      Bugs originate in requirements, design, code, documents, and bad fixes. All of these need to be reduced. Formal inspections prior to testing are beneficial; so is static analysis. Pair programming sometimes helps but not always.

      Regards,
      Capers Jones

    • Capers Jones

      Andrej,

      Suppose an offshore outsource vendor arrived at your company and told the CEO they could do the work for half the cost.
      How would you explain to the CEO that these claims may not be true?

      Regards,
      Capers Jones

    • Capers Jones

      Andrej,

      Pair programming is only one of many methods that have an impact on quality and cost effectiveness. Some of the others are:

      Defect Prevention Methods
      1 Agile embedded users
      2 Automated quality prediction tools
      3 Business process analysis
      4 Certified SQA personnel
      5 Certified test personnel
      6 CMMI 2
      7 CMMI 3
      8 CMMI 4
      9 CMMI 5
      10 Code “intelligent agents” that support major languages
      11 Cyclomatic complexity measures
      12 Data mining of legacy requirements
      13 Decision tables
      14 Defect detection efficiency (DDE)
      15 Defect removal efficiency (DRE)
      16 Design “intelligent agents” that support reuse and best practices
      17 Due diligence analysis – venture-backed software companies
      18 Enterprise architecture
      19 Essential complexity measures
      20 Formal quality estimates
      21 Function extraction
      22 Function point quality measures
      23 Hexawise test case design
      24 Inspections – architecture
      25 Inspections – code
      26 Inspections – design
      27 Inspections – requirements
      28 Inspections – test cases
      29 Inspections (formal)
      30 ISO/IEEE quality standards
      31 Joint Application Design (JAD)
      32 Kaizen
      33 Kanban
      34 Modeling – requirements
      35 Pair programming
      36 Patterns – design
      37 Peer reviews (semi formal)
      38 Poka-yoke
      39 Problem statement language (PSL)
      40 Prototypes – disposable
      41 Prototypes – evolutionary
      42 Quality circles
      43 Quality clauses in outsource contracts
      44 Quality function deployment (QFD)
      45 Reuse – certified code
      46 Reuse – certified design
      47 Reuse – certified requirements
      48 Reuse – certified test cases
      49 Reuse – documents
      50 Reuse (certified)
      51 Risk analysis – automated
      52 Risk analysis – manual
      53 Root cause analysis
      54 SANS/MITRE analysis of coding defects
      55 SCRUM
      56 Security “intelligent agents” that scan code during development
      57 Security reviews
      58 Six-sigma (generic)
      59 Six-Sigma (Lean)
      60 Six-Sigma for software
      61 Software Quality Assurance (SQA)
      62 Structured code principles
      63 Test coverage analysis
      64 Total quality management (TQM)
      65 UML diagrams

      Pre-Test Defect Removal Methods
      1 Audits (compliance)
      2 Audits (financial)
      3 Capture/recapture
      4 Client reviews of specifications
      5 Data quality inspections
      6 Document editing – trained editors
      7 Document editing – untrained editors
      8 Due diligence inspections – acquisitions
      9 Due diligence inspections – venture investments
      10 Formal requirements inspections – changes
      11 Formal architecture inspections – changes
      12 Formal architecture inspections – original
      13 Formal code inspections – changes
      14 Formal code inspections – original
      15 Formal design inspections – changes
      16 Formal design inspections – original
      17 Formal requirements inspections – original
      18 Formal test case inspections – changes
      19 Formal test case inspections – original
      20 Function extraction
      21 Governance audits
      22 Independent verification and validation (IV&V)
      23 Informal peer reviews
      24 Intelligent agents that scan code
      25 Intelligent agents that scan requirements and text
      26 ISO/IEEE standards audits
      27 Kaizen
      28 Kanban
      29 Management reviews
      30 Modeling – requirements
      31 Pair programming
      32 Pattern matching (costs)
      33 Pattern matching (design)
      34 Pattern matching (quality)
      35 Pattern matching (requirements)
      36 Pattern matching (sizing)
      37 Peer reviews (semi formal)
      38 Personal desk checking
      39 Phase reviews
      40 Poka-Yoke
      41 Proofs of correctness (automated)
      42 Proofs of correctness (manual)
      43 Refactoring
      44 Re-inspection of materials that fail inspections
      45 Requirements modeling
      46 Scrum sessions
      47 Software quality assurance reviews
      48 Static analysis – code
      49 Static analysis – text
      50 Structural quality reviews

      Thanks,
      Capers Jones

  21. The purist approach presented in this article seems to take this concept way beyond how a pragmatic Agile team would apply PP in its daily development routine. All or Nothing isn’t the answer, and spending so much energy to deny a great concept to the point of not even trying it undermines all the great aspects that Pair Programming taken in pragmatic bites has to offer. Guess what… it’s a throttling of the concept of Pair Programming that’s highly successful. You Pair Program WITH the resources at hand… you don’t take your current development labor force and multiply it by 2. You spend part of your day in a Pair Programming set when it makes the most sense. Great times to Pair up are when you’re in very complex code… deep in the “J” curve of a new piece of functionality and/or starting out a pattern for a new project. You break up your pairs when the efficiency makes sense. It’s not a “Siamese Union” as you speak of in this dialogue. It’s a pragmatic application of the concept and you do it with the resources at hand. Yes, you may hire an extra developer or two per team of 8-10, but the primary purpose is to avoid serious coding mistakes, to avoid having rogue coders who obfuscate by habit, and to eliminate a single point of failure from a “Domain” knowledge perspective. What kind of impact does a single developer with 10 years of solo development on a BRE have on a company when they get hit by a bus, leave the company, or get promoted to another position or team? Pro-Active pairing eases this transition AND is healthy for the code, the team and ongoing transitions that would otherwise bring an IT shop to its knees.

    • Capers Jones

      QuiJon,

      In large projects there are backups.

      I myself have taken over work from a programmer who left a company.

      Regards,
      Capers Jones

      • Oleg

        What is the cost of having backups, and how is knowledge transferred on a constant basis?

        • Capers Jones

          Oleg,

          Elsewhere in my blog you will find a paper on software learning channels. It discusses 17 methods of learning and sharing information for software teams.

          Thanks,
          Capers Jones

    • Capers Jones

      QuiJon,

      How do you explain the many programmers who don’t like pair programming and change jobs just to get away from it?

      Capers Jones

  22. Mark Schell

    Capers,

    I could not disagree more. My experiences include working as a single programmer for 11 years and as a pair programmer for 3 years. During my time as a pair programmer I have produced far more software with fewer defects and more readable/maintainable code, and have learned far more than in the previous 11 years of single code development.

    You have missed in your article many of the other benefits to pair programming that will all help the organization develop software better and faster in the future. These topics include:
    • faster and more knowledge transfer between the team members
    • hands on mentoring/teaching of junior developers by senior developers
    • increased code readability/maintainability
    • the ability for a pair to keep each other focused on the task at hand (similar to the weight lifting partner)
    • increased collective code ownership
    • increasing team morale
    • shared stress across the development team with less chance of burn out
    • increased collaboration

    There are a lot of statistics stated in this document, but I do not see anywhere an indication of exactly where these numbers have come from or whether you have any operational experience with pair programming yourself. It is difficult to understand the true benefits of pair programming until you do it yourself; I was not originally a believer until I lived it every day. I have also seen a large number of places of employment that do not optimize their use of pair programming. Just because an organization reports they are doing pair programming does not mean that they are doing it well. Optimization of pair programming includes having two keyboards, mice and monitors such that the pair developer is not just watching but is an active member of the pair. Did you study or measure teams that paired in that fashion?

    Again I disagree with your beliefs stated in the section titled “Harmful Consequences of Divided Authority.” Please note that all of the professions that you state work well with pair programming are the doers/producers in the organization while the positions that do not work well are the management/leadership positions. Software engineers are the doers, and oftentimes in a complicated environment the use of pair programming is very effective, allowing the complicated to be simplified via the use of two people collaborating on a solution. In fact I would recommend that pairing be used for some of the other professions in the software industry. It also appears that other industries are looking for ways to increase collaboration in their work environments, and they too are looking for ways to introduce pairing to increase collaboration and success for their company goals.

    A number of times in your article you state that pairing cannot scale. Again I disagree. First, if you keep teams small within the large program it will make it easier to scale. Secondly, I believe you are looking at the problem incorrectly. We have seen that pairs produce more code over the long term than 2 singles given all of the other factors; therefore it is a fallacy that a large company needs to double its workforce. Instead it can decrease it. Not only can you decrease developers but you can also decrease the number of testing staff (was this measured in your research?) given the decrease in number of bugs. Also it has been shown that when you increase the size of the team communication becomes drastically more difficult, which also requires additional staff to handle the additional communication. Therefore decreasing team size will also make you more efficient, requiring even less staff.

    Mark Schell

    • Capers Jones

      Mark,

      Interesting comments. I’ve done pair programming and also trio programming, and also co-authoring.

      No matter how you look at it paying 2 people to do the work of 1 costs more.

      Regards,
      Capers Jones

      • Mark Schell

        Capers,

        Ineffective pairs may produce the same work as one, but I have seen effective pairs produce more than 2 developers over the long term. Just like any process, it is only as effective as the individuals using it.

        Mark

        • Capers Jones

          Mark,

          That is why Google searches on “pair programming successes” and “pair programming problems” turn up about the same number of results.

          Other methods also involve cooperation – inspections, quality function deployment (QFD), joint application design (JAD), and many others.
          In fact the actual name of one of the most effective methods is “team software process” (TSP) so cooperation is included in the name itself.

          I have data on 34 methods and all of them have both successes and failures.

          Thanks,
          Capers Jones

      • Dave

        > No matter how you look at it paying 2 people to do the work of 1 costs more.

        You keep on asserting this, but you’ve shown no evidence that it is actually true. Your article is based on this flawed premise.

        • Capers Jones

          Dave,

          The math is so simple I’m surprised by your question. If you have an average burdened compensation of $100,000 per year and you pay two people the costs will be $200,000 per year.

          Unless the two people are more than twice as fast, or produce the same features with less than half the code, or both, then pairs will cost more. Neither of these occurs with enough regularity to make pair programming profitable.
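
          The break-even condition described here can be written as a single ratio. The sketch below is a simplified illustration, not the author's model. It assumes both programmers have the same burdened rate and that cost scales with headcount times the amount of code, divided by coding speed. The parameter names `speedup` and `code_ratio` are hypothetical.

```python
def pair_cost_ratio(speedup: float, code_ratio: float = 1.0, headcount: int = 2) -> float:
    """Cost of a pair relative to one programmer delivering the same features.

    speedup    -- pair coding speed divided by single-programmer speed (assumed input)
    code_ratio -- code the pair writes divided by code the single programmer
                  writes for the same features (assumed input)
    headcount  -- people being paid; 2 for a pair

    A result above 1.0 means the pair costs more; 1.0 is break-even.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return headcount * code_ratio / speedup


# Illustrative inputs only, not measured values.
print(pair_cost_ratio(speedup=1.0))                   # same speed, same code
print(pair_cost_ratio(speedup=2.0))                   # twice as fast
print(pair_cost_ratio(speedup=1.5, code_ratio=0.75))  # faster and more concise
```

          Under these assumptions a pair only matches the cost of one programmer when the speedup divided by the code ratio reaches 2, which is the condition stated above.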

          Capers Jones

          • Dave

            Ok – so you’ve got it! Yes! There is a possibility that two people could be twice as fast!
            It is well established that lines of code is a poor productivity metric.

            For real research you need to debunk the real claims:
            * Does pair programming reduce defect rates?
            * Does pair programming reduce number of times a programmer goes down a “blind alley”?
            * Does pair programming reduce the need for rework?
            * Does pair programming create more concise code?

          • Capers Jones

            Dave,

            Normally before a method is released to the world it is validated. For example IBM validated inspections prior to release.

            With pair programming there was no validation – just a release of an unproven method.

            There are thousands of measured projects that used inspections with very reliable data on defect prevention and defect removal. Except for this blog and a few other small studies, pair programming has less empirical data than RUP, TSP, inspections, or other methods.

            Thanks,
            Capers Jones

          • Capers Jones

            Dave,

            While you are asking questions, what about:

            How many programmers have quit to get away from pair programming?
            Does pair programming raise the odds of flu or communicable diseases?
            Does pair programming find as many bugs as static analysis?
            Do bad pairs cause more defects and more than double the cost of one average programmer?

            Most of the pair literature seems to compare unaided solo programmers against unaided pairs without even considering debugging tools, static analysis, inspections, mathematical test case design, and other methods that actually have solid data.

            Capers Jones

          • Dave

            In terms of empirical evidence. I was skeptical of the benefits myself, until I trialled the adoption at a small/medium company.

            What we found was that on projects that used pair programming, there were fewer spikes in the burn down. We found that delivery was more consistent and predictable. Programmers can’t ‘hide’ problems for short term gains as easily.

            We probably could have achieved the same results using other methods. However, the result of using pair programming was overall improved delivery.

          • Capers Jones

            Dave,

            An issue that this blog addresses is that phrases such as “improve” are not really data.

            Did you try static analysis or inspections, both of which also improve delivery? Inspections don’t allow hiding of problems either.

            Thanks,
            Capers Jones

    • Capers Jones

      Mark,

      Your comments are interesting. Have you published specific numbers that confirm your long-term statement?

      Thanks,
      Capers Jones

    • Capers Jones

      Mark,

      I have tried pair programming and did not like it. Many others don’t like it either.
      There are some substantial differences in the kinds of work done for large systems compared to small.
      Here are the total activities that I measure:

      1 Business analysis
      2 Risk analysis/sizing
      3 Risk solution planning
      4 Requirements
      5 Requirements inspection
      6 Prototyping
      7 Architecture
      8 Architecture inspection
      9 Project plans/estimates
      10 Initial Design
      11 Detail Design
      12 Design inspections
      13 Coding
      14 Code inspections
      15 Reuse acquisition
      16 Static analysis
      17 COTS Package purchase
      18 Open-source acquisition
      19 Code security audit
      20 Independent verification and validation (IV&V)
      21 Configuration control
      22 Integration
      23 User documentation
      24 Unit testing
      25 Function testing
      26 Regression testing
      27 Integration testing
      28 Performance testing
      29 Security testing
      30 Usability testing
      31 System testing
      32 Cloud testing
      33 Field (Beta) testing
      34 Acceptance testing
      35 Independent testing
      36 Quality assurance
      37 Installation/training
      38 Project measurement
      39 Project office
      40 Project management

      Thanks,
      Capers Jones

  23. Pair programming is hardly new. I suffered through it in the mid to late 90s. At that time and place, it was a reward for being “in” with management. It was understood that only the Golden Boys (or Girls) were good enough for pairing. Lots of jealousy ensued. Some of us quit the company. Hehe.

    OTOH, using LoC (or any of its surrogates) as a measure of productivity is a false choice. The best code is the least code, and depending on the reward system in place (all too often using LoC), both singles and pairs will endeavour to produce to the reward system.

    For database-centric applications (embedded, games, and the like need not apply), using advanced schemas and DRI will gain more value than client (or server) code. Such systems are where I live. In such system development, collaboration (call it pair programming) generally yields more cohesive designs, and not so much code.

    I don’t abide pair programming, as it happens. But divided authority is hardly one of its problems; the popular matrix management *is* divided authority. Pair programming is codified collaboration. Open offices with everybody yakking and playing frisbee is du jour among the younger set. Social networking and forced collaboration in the workplace. It’s not surprising that the younger set is re-inventing pair programming. Meh.

    The most effective alternative to pair programming (assuming that single coders appear to be failing) is clear headed management, both in the business/customer/user/etc. side and the technical side. Far too often, one or both of these managements is foggy on requirements/design, and uses the coders as scapegoats for the inevitable Death March and failure. As Jimmy Carter pointed out: “The fish rots from the head.” I guess such managements conclude that pairs will be better able to divine from the entrails, when they cannot. Hiring smarter singleton coders won’t help either; they’ll just get pissed off at the bozos sooner.

    • Capers Jones

      Robert,

      A new post on my blog shows that LOC metrics constitute professional malpractice for cross- or multi-language economic analysis.

      There are 116 occupations associated with the software industry. Do you recommend pairing any of the others?

      Regards,
      Capers Jones

    • Capers Jones

      Robert,

      From being an expert witness in a number of lawsuits about poor quality or failing projects, I agree that management causes more problems than technologists.

      The 4 most common problems noted in breach of contract lawsuits are:

      1 Poor estimates prior to beginning.

      2 Poor quality control and bypassing pre-test inspections and static analysis

      3 Poor change control in the face of requirements changes at > 2% per calendar month

      4 Poor and sometimes deceitful management status tracking which conceals problems until too late

      Regards,
      Capers Jones

  24. Thank you Capers for an interesting read.
    I find it very encouraging, being an agile practitioner.
    Let me explain. Probably the most important thing about being agile is relying on empirical data, relevant in the context of the system being measured. And in this work I find very little empirical data. The data added in your reply to Mr. Weinberg are irrelevant to what is being measured. For example: does pair programming reduce the cost of fixing bugs? Increase the autonomy of the team? Reduce defect fixing time? And more.
    Furthermore, your work focuses on the efficiency of the individual, whereas agility is far more concerned with the productiveness of the team and of the organization.
    For example: one of my best experiences as an architect was pairing with the product manager: we would write the user-stories together, switching navigation and driving frequently, and drive each other to create better requirements. It also made our interaction with the development teams far more rewarding: their ability to provide better acceptance tests was dramatically improved; they involved us far earlier in the iteration in reviewing working software; we would enhance the requirements more frequently owing to the team’s input; and finally (not that it affects the programmers so much… Or does it?) the customer was much happier with the results, compared to other projects. Not that I am certain that our pairing is the one single source of influence on these phenomena.
    Also, not that I have any problem with the scientific method. On the contrary, as long as the context is correct.
    Personally, I like the seminal work of Arlo Belshee – Promiscuous Pairing and Beginner’s Mind. By no means does this work comply with scientific standards. But it does one thing very well: it emphasizes the importance of context. In fact, it clearly mentions that the results are relevant only to the organization in which the work was done. If it’s any use to you, may I humbly suggest you design a set of experiments in the spirit of this work? Of course, your experiments will be different, being in a different context.
    Thank you again for making the opportunity for this fruitful debate.

    • Capers Jones

      Ilan,

      Thanks for the comment. An interesting study with more than 500 agile projects will come out later this month by another researcher, Don Reifer.

      Regards,
      Capers Jones

    • Capers Jones

      Ilan,

      The paper dealt only with pair programming, which is a narrow concept in which two people are paid to do the work of 1.

      More general forms of cooperation are useful and widely used: inspections, focus groups, wiki groups, quality function deployment (QFD), joint application design (JAD), and many more.

      Regards,
      Capers Jones

  25. Capers Jones

    I was not quoting Laurie. I have a pair programming calculator that anyone can use – it was simple enough to build that probably anyone reading the blog could have done the same. It allows users to put in their own assumptions about pair programming and individual coding speeds and also monthly costs.

    Another part of the calculator handles code defects for one or a pair, with or without static analysis and inspections.

    It is surprising that in all of these responses not a single one has yet shown any data.

    Regards,
    Capers Jones
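The author's calculator itself is not published in this thread. As a rough illustration of the kind of tool described above (user-supplied coding speeds and monthly costs for a solo programmer and a pair), here is a minimal sketch. The class name, field names, the 132 hours per month, and all sample numbers are hypothetical placeholders, not the author's data, and defect costs (the second part of the calculator) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One staffing option; every field is a user-supplied assumption."""
    size_units: float               # amount of work, in any size unit
    units_per_hour: float           # speed of the solo coder, or of the pair as a whole
    people: int                     # 1 for solo, 2 for a pair
    monthly_cost: float             # burdened cost per person per month
    hours_per_month: float = 132.0  # assumed effective work hours per month

    def schedule_months(self) -> float:
        return self.size_units / self.units_per_hour / self.hours_per_month

    def cost(self) -> float:
        # Everyone assigned is paid for the whole schedule.
        return self.schedule_months() * self.people * self.monthly_cost


def cost_ratio(pair: Scenario, solo: Scenario) -> float:
    """Pair cost divided by solo cost; above 1.0 means the pair costs more."""
    return pair.cost() / solo.cost()


if __name__ == "__main__":
    # Placeholder inputs: replace with your own measurements.
    solo = Scenario(size_units=1000, units_per_hour=20, people=1, monthly_cost=12_500)
    pair = Scenario(size_units=1000, units_per_hour=30, people=2, monthly_cost=12_500)
    print(f"solo cost: {solo.cost():,.0f}")
    print(f"pair cost: {pair.cost():,.0f}")
    print(f"pair/solo cost ratio: {cost_ratio(pair, solo):.2f}")
```

In this simplified model the pair finishes sooner whenever it codes faster than the solo programmer, but it only costs less if it codes more than twice as fast. Any quality effect would have to be added as a separate term.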

  26. Capers Jones

    Alex,

    More than 40 years of empirical data show that inspections also lead to low cyclomatic complexity. For that matter many single programmers don’t even code with high cyclomatic complexity.

    Capers Jones

  27. Capers Jones

    LOC per hour is not a good choice for different languages or for the majority of applications that use more than one language at the same time such as C# and MySQL or Java and HTML.

    Another newer paper on my blog says that LOC is professional malpractice for economic studies of multiple languages.

    Capers Jones

  28. Capers Jones

    I was not quoting Laurie’s paper. I use a pair programming calculator. It allows changes in speed for both the pair and the individual.

    No matter how you change things, you are still paying two people for the work of one.

    Use your own data and see if you can make pair programming cost less.

    Capers Jones

  29. I can’t say that any of the 116 should be done as Pair, by default. Collaborative development is widely used, in my experience; just not as structured with two humans on one machine. Relational database development is a good example where collaboration is helpful, in that it forces the group to understand how all the data hangs together (or we shall all hang separately). This is a good thing.

    • Capers Jones

      Robert,

      My paper was on pair programming – not on more general forms of collaboration. All large software projects and all large engineering projects depend upon collaboration across many fields.

      One issue with pair programming is that it ignores all other contributors. Another issue is that companies using pairs are likely to be visited by offshore outsourcing vendors who will say they can do the same work for less than half the cost. Unless the pair groups have good data, and few do, they are vulnerable to losing their jobs to overseas vendors.

      Regards,
      Capers Jones

  30. Dragonfly

    Table 1 compares costs/time; however, there is no reference to a controlled statistical study showing where this table is derived from.
    In short, until there is a double-blind trial that proves the figures, the article is not worth the paper it’s written on.

    There is also no reference made to the rework required due to quality issues in complex problems solved by single or paired programmers. Rework might in fact play a larger role, in that it could contribute to downtime, which ultimately could cost a company a lot.

    • Capers Jones

      jschrap,

      I think you have your science backwards. Usually before a medicine or new device is released to the public it is tested and validated. That did not happen for pair programming – it went out cold without any proof of success.

      If you do a Google search on “pair programming success” and “pair programming problems” you get about the same number of hits. Where is the proof of success? Why was it not proven successful before release?

      Some of the reports that are unfavorable include:

      1. Sargeant, Will; Where Pair Programming Fails for Me; 12/29/2010; Tersesystems.com

      2. Evers, Jon; Pair Programming Considered Harmful; 3/12/2012; Techcrunch.com

      3. Lai, Kim Man, Atikus, Bruce, Chan, Keith; Pair Programming Issues and Challenges; Geogle.comp.polyu.edu.hk

      4. CMountford; Pair Programming is Kryptonite; 6/25/2009; blogs.atlassian.com/2009/06/pair_programming_is_kryptonite

      5. Berba Velasco jr., V; The Pitfalls and Perils of Pair Programming; http://www.streetdirectory.com/travel_guide/12455/programming/the_pitfalls_and_perils_of_pair_programming.html

      6. Wilden, Mark; Why I don’t like Pair Programming (and why I left Pivotal); 12/7/2009; mwilden.blogspot.com/2009/11/why_I_don’t_like_pair_programming_and.html

      7. Atwood, Jeff; Pair Programming vs. Code Reviews; http://www.codinghorror.blog/2007/pair-programming-vs-code-reviews.html

      8. Begel, Andrew and Nagappan, Nachiappan; Pair Programming: What’s in it for me?; research.microsoft.com/en-us/um/people/abegel/papers/esm-begel-2008.pdf

      Of the 25 or so papers I’ve read there is an interesting mix of results, but the gist of the studies looks like this to me:

      Without static analysis, pairs have slightly higher quality than solo programming at higher costs.
      With static analysis for solo and no static analysis for the pairs, the solo results are better for quality and costs.
      With static analysis for both the solo and the pair, quality is about the same but solo is cheaper.
      With inspections the solo has better quality than a pair but equivalent costs.
      An expert solo is better than an average pair.
      An expert pair is better than an average solo.
      An expert solo and an expert pair have similar quality but the solo is cheaper.

      Thanks,
      Capers Jones

    • Capers Jones

      Controlled studies are normally done by academics. I collect data from corporations with whom I have non-disclosure agreements.

      Normally controlled studies occur prior to releasing a new medicine or a new methodology. They take place before the release, to keep the products from causing harm.

      So far as I know there were no controlled studies or even any due diligence on pair programming – it simply was released without any knowledge of its effectiveness, which ranges from marginal to very bad.

      The software industry is lucky that pair programming sometimes works, even if it is always more expensive. However it often fails and sometimes fails so badly that top-tier programmers change jobs rather than put up with it.

      Thanks,
      Capers Jones

  31. Christian

    I think that to try to quantify the costs of pair programming is the worst thing I have heard in years (measuring LOC? Really? I suggest reading “Refactoring”). In a post like this I would prefer to see metrics related to motivation, confidence and focus on the product rather than the ones you are providing in your article. Pair programming is a good practice in terms of good design and code quality, and contributes to what I call “developer happiness”, increasing team motivation and leading to some dynamics that would otherwise be impossible to get. I think you should focus your concerns on your customers and the product you are providing. A good starting point would be http://agilemanifesto.org/principles.html.

    Best,
    Christian.

    • Capers Jones

      Christian,

      If you read elsewhere in the blog you will see a history of LOC metrics with the conclusion that it is professional malpractice for studies across unlike languages.

      You will also find papers on the economic fallacy of “cost per defect” and the untrue claim that cost per defect goes up exponentially.

      A third metric paper on the blog deals with the many uses of function point metrics, which are the best for economic studies.

      However, your claims about the efficacy of pair programming are subjective and without data. Many people do not like pair programming and some change jobs due to this dislike – they are often top-tier programmers with high appraisal scores. Here are a few citations:

      1. Sargeant, Will; Where Pair Programming Fails for Me; 12/29/2010; Tersesystems.com

      2. Evers, Jon; Pair Programming Considered Harmful; 3/12/2012; Techcrunch.com

      3. Lai, Kim Man, Atikus, Bruce, Chan, Keith; Pair Programming Issues and Challenges; Geogle.comp.polyu.edu.hk

      4. CMountford; Pair Programming is Kryptonite; 6/25/2009; blogs.atlassian.com/2009/06/pair_programming_is_kryptonite

      5. Berba Velasco jr., V; The Pitfalls and Perils of Pair Programming; http://www.streetdirectory.com/travel_guide/12455/programming/the_pitfalls_and_perils_of_pair_programming.html

      6. Wilden, Mark; Why I don’t like Pair Programming (and why I left Pivotal); 12/7/2009; mwilden.blogspot.com/2009/11/why_I_don’t_like_pair_programming_and.html

      7. Atwood, Jeff; Pair Programming vs. Code Reviews; http://www.codinghorror.blog/2007/pair-programming-vs-code-reviews.html

      8. Begel, Andrew and Nagappan, Nachiappan; Pair Programming: What’s in it for me?; research.microsoft.com/en-us/um/people/abegel/papers/esm-begel-2008.pdf

      Thanks,
      Capers Jones

  32. Greg Young

    I read this the first time and thought wow pair programming must be very bad, odd how people are not seeing it (especially on lean teams!).

    Then I realized the numbers are all made up.

    Mark Twain would have some choice words about this. It is sad that this can be posted to look like “research” in our industry.

    • Capers Jones

      Greg,

      Other authors also have published negative information on pair programming. Here are a few.

      Capers Jones

      1. Sargeant, Will; Where Pair Programming Fails for Me; 12/29/2010; Tersesystems.com

      2. Evers, Jon; Pair Programming Considered Harmful; 3/12/2012; Techcrunch.com

      3. Lai, Kim Man, Atikus, Bruce, Chan, Keith; Pair Programming Issues and Challenges; Geogle.comp.polyu.edu.hk

      4. CMountford; Pair Programming is Kryptonite; 6/25/2009; blogs.atlassian.com/2009/06/pair_programming_is_kryptonite

      5. Berba Velasco jr., V; The Pitfalls and Perils of Pair Programming; http://www.streetdirectory.com/travel_guide/12455/programming/the_pitfalls_and_perils_of_pair_programming.html

      6. Wilden, Mark; Why I don’t like Pair Programming (and why I left Pivotal); 12/7/2009; mwilden.blogspot.com/2009/11/why_I_don’t_like_pair_programming_and.html

      7. Atwood, Jeff; Pair Programming vs. Code Reviews; http://www.codinghorror.blog/2007/pair-programming-vs-code-reviews.html

      8. Begel, Andrew and Nagappan, Nachiappan; Pair Programming: What’s in it for me?; research.microsoft.com/en-us/um/people/abegel/papers/esm-begel-2008.pdf

  33. Stephan Eggermont

    Development with 500 people with 50 roles, programmers 20 lines/hour. How many of those 500 are then programmers?

    • Capers Jones

      Stephan,

      Here is the distribution of occupations for a project of 100,000 function points or about 5,000,000 LOC in a mid-level language such as Java.

      Regards.
      Capers Jones

      Occupation                   Normal staff

      Programmers                           290
      Testers                               256
      Designers                             138
      Business analysts                     138
      Technical writers                      60
      Quality assurance                      51
      1st line managers                      45
      Data base administration               26
      Project Office staff                   23
      Administrative support                 26
      Configuration control                  15
      Project librarians                     12
      2nd line managers                       9
      Estimating specialists                  9
      Architects                              6
      Security specialists                    3
      Performance specialists                 3
      Function point counters                 3
      Human factors specialists               3
      3rd line managers                       2
      TOTAL                               1,119
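To relate the table back to Stephan's question, the programmers' share of total staff can be computed directly from the listed figures. The sketch below only re-uses the numbers in the table. The `scaled_programmers` helper is a hypothetical extrapolation that assumes the same proportions hold for other team sizes, which the table itself does not claim. Note that the listed rows sum to 1,118, one less than the stated total of 1,119, possibly because of rounding in the source.

```python
# Staffing figures copied from the table above (100,000 function point project).
staff = {
    "Programmers": 290,
    "Testers": 256,
    "Designers": 138,
    "Business analysts": 138,
    "Technical writers": 60,
    "Quality assurance": 51,
    "1st line managers": 45,
    "Data base administration": 26,
    "Project Office staff": 23,
    "Administrative support": 26,
    "Configuration control": 15,
    "Project librarians": 12,
    "2nd line managers": 9,
    "Estimating specialists": 9,
    "Architects": 6,
    "Security specialists": 3,
    "Performance specialists": 3,
    "Function point counters": 3,
    "Human factors specialists": 3,
    "3rd line managers": 2,
}

STATED_TOTAL = 1119                  # total as printed in the table
listed_total = sum(staff.values())   # sum of the rows actually listed
programmer_share = staff["Programmers"] / listed_total


def scaled_programmers(team_size: int) -> int:
    """Hypothetical: programmers on a team of `team_size` at the same proportion."""
    return round(team_size * programmer_share)


if __name__ == "__main__":
    print(f"listed rows sum to {listed_total} (stated total: {STATED_TOTAL})")
    print(f"programmers are {programmer_share:.1%} of listed staff")
    print(f"at that share, a 500-person project has about {scaled_programmers(500)} programmers")
```

Because programmers are roughly a quarter of the staff here, doubling only the programmers would raise total headcount by about the same fraction.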

  34. I used to always suspect pair programming as an unproductive way to get things done and this post removed all doubts.

    We did try pair programming before – and we noticed that we have two scenarios:

    - Expert + Novice: The novice will do nothing and will get de-motivated immediately.
    - Expert + Expert: They will always fight about how to do things.

    Productivity, in my opinion, is even less than the productivity of a single person programming.

    Thanks for sharing this data with us!

    • Capers Jones

      Thanks for the reply – your observations are congruent with my data and with several other research papers.

      Best Regards,
      Capers Jones

  35. Unfortunately this “work” adds nothing to the debate. Your sources are secret and thus unverifiable, and your “data” is generated by a tool you plugged your assumptions into. Surprise, surprise, it backs up your assumptions.

    Please stop acting as if it is a worthwhile contribution, it is not. Not until you do the proper scientific thing and allow others to check your results.

    You should definitely stop slapping down other people’s comments accusing them of not having done research, when by any objective measure, you haven’t either. It smacks of arrogance and hubris.

    That said, if you DO release the data and the source of your “calculator”, and your data is verified by others, then it WILL be a valuable contribution. Possibly a very valuable one.

    Until then, tone down on the puffed up chest a bit, okay?

    • Capers Jones

      Sean,

      Elsewhere in this blog is a catalog of software benchmark providers. It lists 23 quantitative benchmark providers with a combined total of about 91,000 measured projects. Unfortunately all of us collect data using non-disclosure agreements so none of us can show the names of the companies providing the data.

      An interesting question for the pair programming community is why none of these projects are pair programming projects? I asked the International Software Benchmark Standards Group (ISBSG.org) about pair programming. (They are a non-profit organization.) Not even one pair project had submitted data to add to their collection of more than 5,000 projects.

      Why don’t some of the pair projects download the ISBSG benchmark questionnaire and begin to provide reliable quantitative data? Their company names will not be revealed, but the data would add credibility to the claims of the pair programming community.

      These various benchmarks do identify methods so you can find data on agile, RUP, TSP, waterfall, Prince2, and many other methods – only pair programming is missing.

      If you don’t agree with my conclusions you probably won’t agree with the other papers that are a bit negative.

      1. Sargeant, Will; Where Pair Programming Fails for Me; 12/29/2010; Tersesystems.com

      2. Evers, Jon; Pair Programming Considered Harmful; 3/12/2012; Techcrunch.com

      3. Lai, Kim Man, Atikus, Bruce, Chan, Keith; Pair Programming Issues and Challenges; Geogle.comp.polyu.edu.hk

      4. CMountford; Pair Programming is Kryptonite; 6/25/2009; blogs.atlassian.com/2009/06/pair_programming_is_kryptonite

      5. Berba Velasco jr., V; The Pitfalls and Perils of Pair Programming; http://www.streetdirectory.com/travel_guide/12455/programming/the_pitfalls_and_perils_of_pair_programming.html

      6. Wilden, Mark; Why I don’t like Pair Programming (and why I left Pivotal); 12/7/2009; mwilden.blogspot.com/2009/11/why_I_don’t_like_pair_programming_and.html

      7. Atwood, Jeff; Pair Programming vs. Code Reviews; http://www.codinghorror.blog/2007/pair-programming-vs-code-reviews.html

      8. Begel, Andrew and Nagappan, Nachiappan; Pair Programming: What’s in it for me?; research.microsoft.com/en-us/um/people/abegel/papers/esm-begel-2008.pdf

      There is one other case where pair programming should not be used and probably will never be used because it is too hazardous. The programming I’m doing now involves patentable inventions. I don’t want anyone else to know how the algorithms work until the patents are filed. If I did try pairing the other person would need to sign both a non-disclosure agreement and a non-competition agreement. Even then it is best to work solo on new inventions before patents are filed.

      Capers Jones

  36. A Developer

    Surely this is a joke?

    Pair programming is just a technique which is very effective sometimes, and not so effective at other times. You certainly shouldn’t be making a binary decision about it – the author’s suggestion about doubling the workforce from 50k to 100k actually made me laugh out loud.

    That’s almost like saying, “It’s better if 2 people lift boxes that are heavier than Xkg so we must double the amount of warehouse staff”. Clearly when a heavy box needs lifting, 2 of the existing staff will temporarily team up to lift it.

    To any non-developer reading this – please don’t frown at your developers next time they pair up on something, we’re all human here, and I don’t think I need to spoon feed anyone the benefits of teamwork.

    • Capers Jones

      The issue is that other methods provide better results than pair programming for a lower cost.

      Capers Jones

    • Capers Jones

      To: A Developer

      Pair programming is only one method for team work, and not a very good one.

      Team Software Process (TSP) has substantial proof of success.

      Inspections have thousands of measured projects and 40 years of proven success.

      If you look at the methods that are effective on fairly large applications TSP and RUP are much better than pairs; XP without pairs is better, agile is a bit better. Pairs are near the bottom in productivity and only fairly good for quality. All are better than waterfall or cowboy though.

      Have you tried TSP or RUP? Have you tried static analysis and inspections?

      Capers Jones

  37. Mark Stamp

    In general I agree with your views on pairing but you have neglected some intangible benefits which make the figures less stark.
    You say that a single programmer will have an inspection, but you have not allowed for the cost of another, usually senior, developer taking time out of his work to do this.
    When talking about different skill level developers, you have neglected the fact that a novice will improve by pairing with an expert. This will decrease future costs but can’t be quantified.

    • Capers Jones

      Mark,

      Thanks for the comment.

      Mentoring is many years older than pair programming. Many companies ask senior personnel to help new hires but this is not pair programming.

      Inspections also have provable value along the same lines, in that novice programmers learn from going through the code of experts.
      They also are helped when experts go through their code.

      Inspections rank as both a top defect removal method and a top defect prevention method.

      Inspection teams run from a low of 3 participants to a maximum of 8 with about 5 being the mode. Thus inspections bring a wider range of views than pair programming.

      Best Regards,
      Capers Jones

  38. Bors

    Hi Jones,

    What exactly do you mean by “expert pairs”?
    I think there are 2 possible meanings:
    1. A couple of expert programmers working as a pair
    2. A couple of programmers who are experts in pair programming
    I think these are not the same thing.
    I usually compare pair programming to driving a car, and I can see some similarities:
    1. A family on holiday: the husband is driving and the wife is looking at the map (note: usually the husband doesn’t believe the wife’s directions).
    2. A taxi: the taxi driver drives and the passenger looks around.
    3. A rally team: the driver drives and the navigator is a fundamental support for driving faster and faster ;)
    What kind of pair programming are you thinking about?

    So, actually my question is: “How is it possible to train expert programmers to become expert pair programmers?”

    Another point is that you are measuring LOC.
    How can you evaluate a refactoring that reduces the LOC and makes the code cleaner for further modifications?

    I don’t think that pair programming is always the best solution. But I think it is very complicated to make a good comparison.

    Thanks for your essay. I think it is a very interesting starting point for more accurate investigations.

    Boris

    • Capers Jones

      Boris,

      If you read the papers on the history of lines of code elsewhere in the blog you will see that LOC penalizes high level languages and makes them seem more expensive than low-level languages. That problem was avoided in the pair piece by assuming the same language.

      However if cutting a module in half is valuable then you are right about LOC. The function point metric is the best choice for economic analysis across multiple languages. Here are samples for 3 languages:

      Assembly = 480 LOC per month; function points per month = 1.92

      PL/I = 364 LOC per month; function points per month = 4.55

      C++ = 281 LOC per month; function points per month = 5.10

      Total months of code effort for the three examples were:

      Assembly = 300

      PL/I = 89

      C++ = 66

      Thanks,
      Capers Jones
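The reversal in the three samples above can be checked with a few lines of arithmetic. The sketch below uses only the LOC-per-month and function-points-per-month figures quoted in the reply. The variable names are illustrative, and the LOC-per-function-point ratios are simply derived by dividing one quoted figure by the other.

```python
# (LOC per month, function points per month), as quoted in the reply above.
samples = {
    "Assembly": (480, 1.92),
    "PL/I": (364, 4.55),
    "C++": (281, 5.10),
}

# Implied language level: how many lines of code it takes to deliver one function point.
loc_per_fp = {name: loc / fp for name, (loc, fp) in samples.items()}

# Rank the languages by each productivity metric, best first.
by_loc = sorted(samples, key=lambda name: samples[name][0], reverse=True)
by_fp = sorted(samples, key=lambda name: samples[name][1], reverse=True)

if __name__ == "__main__":
    for name, ratio in loc_per_fp.items():
        print(f"{name}: about {ratio:.0f} LOC per function point")
    print("ranked by LOC per month:", by_loc)
    print("ranked by function points per month:", by_fp)
```

The two rankings come out in exactly opposite order: the language that looks most productive in LOC per month delivers the least functionality per month, which is the penalty on high-level languages described above.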

  39. Simon Kenyon

    “new method”.
    First used in 1990, so not that new.

  40. Steffen Helbo

    @Capers

    It seems to me that you are trying to scale up something that only works in small teams to a large-systems development situation, and this is where your major flaw is. Pair programming and agile programming do not scale well to large systems. So for pair programming to be effective you need to make the comparison on the development of smaller systems, and then in some cases you can make pair programming cost effective.

    So talking about an entire company with 50,000 employees going up to 100,000 is irrelevant when talking about pair programming, because that company would maybe have a small number of teams that could use pair programming effectively. The result is not 50,000 more people to hire, but usually just a few situations where some of the 50,000 employees would work together in pair programming.

    Then there is the whole issue that you assume that the design flaws are bigger and won’t be corrected by pair programming. This is not true: in projects using pair programming the designers are usually also the programmers who handle that part, so the pairs will be there to iron out any design, architectural and programming flaws before they become too big to handle without a major rewrite of the code.

    And in your maths you entirely forget to account for the knowledge transfer from one person to another, which the company can exploit at a later date.

    You also do not cover the fact that in companies where only one person knows his part of the code, he can easily command such a high salary that he costs the same as two. All he needs to do is go to the boss and say he has found another job, and the boss will realize that to keep him he needs to make an offer that is difficult or impossible to walk away from, thereby costing the company more in the long term than pair programming would have done. Simply put, pair programming prevents the company from ending up in a situation where one person can ask any price for his work because he is the only one who can do it…

    There are lots of other things that you don’t look at, but I think your biggest flaw is not providing us (readers) with sources for all your numbers and claims.

    • Capers Jones

      Steffen,

      There are many forms of knowledge transfer including mentoring, inspections, formal training, joint application design, and quite a few others that are more cost effective than pairs.

      Some people forced to use pair programming quit and change jobs. You are right that neither agile nor pairs scale up well.

      You neglected the fact that there are about 128 different occupations involved with software – do you also recommend paired business analysts, paired testers, and paired technical writers? What about paired managers?

      No matter how you look at it paying two people for the work of one costs more. Your argument that a super programmer might cost more than an average pair is interesting – probably the super programmer would be much more effective than the pair and hence worth more. A baseball player with a .400 average will probably make more money than two players with .200 averages, and is worth it.

      Thanks,
      Capers Jones

      • Steffen Helbo

        Again you take a huge company with many different software occupations.

        The only situation where pair programming is useful is when you don’t need or use the 128 different occupations that can be involved with software…

        It’s a very simple fact that you keep overlooking: pair programming should not be used when you have a huge arsenal of people to work on the project.

        And again, when the company and/or the project is a small one, the benefit of pair programming will shine through, simply because there will not be 128 different occupations working on that project. There might be just 2 people or 10 people in total, and then it’s much more cost efficient to use pair programming than to hire an architect and a designer and so on, because when you end up with only one person who can program, the project will take forever to finish. If you instead hire 6-7 programmers and 3-4 other people with sales or management as their focus, you will get much better, very reliable software and a quicker solution than that one programmer could have delivered with all the others trying to support the creation of the product.

        • Capers Jones

          Steffen,

          Nobody hires architects or specialists for small projects. Even if you limit pair programming to small projects the burdened cost with benefits and medical is about $150,000 per year per person.

          If you have 1 pair you pay an extra $150,000 per year.
          If you have 5 pairs you pay an extra $750,000 per year.
          If you have 10 pairs you pay an extra $1,500,000 per year.

          There are other ways that are less expensive and have equal or better quality and shorter schedules than pair programming.

          Capers Jones
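The three figures in this reply follow from one multiplication, under the author's premise that each pair does work a single programmer would otherwise do, so the second person is pure added cost. A minimal sketch of that arithmetic follows; the function and constant names are illustrative only.

```python
# Burdened annual cost per person, as quoted in the reply above.
BURDENED_COST_PER_PERSON = 150_000


def extra_annual_cost(pairs: int, cost_per_person: int = BURDENED_COST_PER_PERSON) -> int:
    """Added yearly cost of pairing, assuming one extra person per pair."""
    return pairs * cost_per_person


if __name__ == "__main__":
    for pairs in (1, 5, 10):
        print(f"{pairs} pair(s): extra ${extra_annual_cost(pairs):,} per year")
```

If a pair were measurably faster than a solo programmer, that premise would change and the extra cost would shrink accordingly; the calculation here takes the reply's assumption as given.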

    • Capers Jones

      Daniel,

      I don’t speak French – could you comment in English?

      Thanks

  41. Hey there, You’ve done an excellent job. I’ll definitely digg it and personally recommend
    to my friends. I am confident they’ll be benefited from this website.

    • Capers Jones

      Donte,

      Thanks for the kind words. Sorry for the delay – out of the U.S. and off the web for several weeks.

      Best Regards,
      Capers Jones

    • Capers Jones

      Donte,

      Thanks.

      Capers Jones

  42. I read this post completely on the topic of
    the comparison of most recent and previous technologies, it’s awesome article.

    • Capers Jones

      Annabelle,

      Sorry for the delay – out of the country and off the web for a while. Thanks for the kind words.

      Happy Holidays.
      Capers Jones

  43. It’s awesome to pay a visit this web page and reading the views of all colleagues on the
    topic of this paragraph, while I am also keen of
    getting experience.

    • Capers Jones

      Claudette,

      Thanks for the comments.

      Happy Holidays,
      Capers Jones

  44. John Conyers

    Two heads are better than one! Isn’t that what we are told? Sounds so reasonable.

    So here we go. Let’s take two “programmers,” each with an IQ of 80, and pair them up. Now we have the equivalent skill of one programmer with an IQ of 160.

    Two heads are better than one! N’est-ce pas?

    To what depth have we, as a nation, sunk? Knowledge, skill, creativity, innovation – OUT. Lashing two dummies together at a terminal – IN.

    It is hard to be sanguine about the future of the country in the presence of a tide of ignorance and stupidity washing over the country.

    • Capers Jones

      John,

      It may not be quite as bad as you state, but pair programming is definitely suboptimal for costs and only marginal for quality.

      Thanks and Happy Holidays,
      Capers Jones

  45. Capers Jones

    Sorry but I don’t speak French – any chance of resending in English?

    Thanks,
    Capers Jones

  46. Johnc120

    My brother recommended I may like this website. He used to be totally right. This post truly made my day. You can not consider simply how so much time I had spent for this info! Thanks!

    • Capers Jones

      Johnc

      Thanks for the kind words.

      Capers Jones

  47. Johnf474

    Hi there! This is my first visit to your blog! We are a group of volunteers and starting a new initiative in a community in the same niche. Your blog provided us beneficial information to work on. You have done a marvellous job!

    • Capers Jones

      Johnf

      Many thanks. Glad the info is useful.

      Capers Jones

  48. Capers Jones

    thanks,
    Capers Jones

  49. Capers Jones

    Thanks,
    Capers Jones

  50. Capers Jones

    Adalbarto,

    Thanks for the kind words. I might consider a section for other people. However, as you may have noted, every piece must have quantified data.
    This is not the place for soft opinions. Let me know what topics you are interested in.

    Happy Holidays,
    Capers Jones

  51. Capers Jones

    Thanks for the kind words. The end of 2013 was busy with client work but I’ll be adding new materials next week and from now on.

Pingbacks

  1. Comparing Apples and Pairs – Simple-Talk
  2. Pair Programming « Greg Young's Blog
