Properties of the C Programming Language
and their Implications for Software Quality


Prof. David Bernstein
James Madison University

Computer Science Department
bernstdh@jmu.edu

Motivation
  • C is Very Popular:
    • An enormous amount of code has been and continues to be written in C (in the TIOBE index it was number 1 for Jan. 1988, number 1 for Jan. 1998, and number 2 for Jan. 2008)
  • C is Prone to Defects:
    • Especially when programmers are familiar with "heavyweight" languages that protect them where C does not
  • The Questions:
    • Why is C prone to defects?
    • Why is C popular despite these drawbacks?
Overview
  • Consider the guiding principles used by the standards committee
  • Discuss the motivations of both the implementers of C compilers and C programmers
  • Identify the resulting characteristics/properties of C
  • Consider the implications for software quality
What is an Implementation?
  • Formal Definition:
    • Particular software, running in a particular environment, under particular control options
  • A Loose Definition:
    • A compiler command including flags/options
The Guiding Principles
  • Existing code is important:
    • The bulk of existing code should be acceptable to any implementation
  • C code can be portable:
    • The language and library should be as widely implementable as possible
The Guiding Principles (cont.)
  • C code can be non-portable:
    • The ability to write machine-specific code is one of the strengths of C
  • Avoid "quiet changes":
    • Avoid changes to the language which cause a working program to work differently without notice
The Guiding Principles (cont.)
  • A standard is a "treaty":
    • Implementers and programmers have different objectives
  • Keep the spirit of C:
    • (a) Trust the programmer.
    • (b) Don't prevent the programmer from doing what needs to be done.
    • (c) Keep the language small and simple.
    • (d) Provide only one way to do an operation.
    • (e) Make it fast, even if it is not guaranteed to be portable.
The Spirit of C Revisited
  • A Recognized Shortcoming:
    • C code is often not safe/secure
  • A New Facet for the C1X Revision:
    • (f) Make support for safety and security demonstrable
Resulting Characteristics of C
  • C is Lightweight:
    • Many things are the responsibility of the programmer, not the language
  • C is Permissive:
    • The language does not prevent the programmer from doing almost anything
  • C is Close to the Machine:
    • Many operations are defined by how the target machine's hardware performs them, not by a general abstract rule (e.g., whether char values widen to signed or unsigned values depends on which byte operation is more efficient on the target machine)
Some Recent History - C9X (1994)
  • Support international programming
  • Codify existing practice; try not to invent
  • Minimize incompatibilities with C90
  • Minimize incompatibilities with C++ (but don't try to become C++)
  • Maintain conceptual simplicity
Some Recent History - C1X (2007)
  • Programmers need the ability to check their work (for security and safety reasons)
  • No invention (under no circumstances should the language be used to invent new concepts)
  • The ability to mix and match code from different standards is important
Kinds of Behavior
  • Locale-Specific Behavior:
    • Defined: Behavior that depends on local conventions of nationality, culture, and language, as documented by each implementation
    • Example: Whether islower() returns true for characters other than the 26 lowercase Latin letters
  • Unspecified Behavior:
    • Defined: A behavior for which the standard provides two or more possibilities and does not say which one is chosen
    • Example: The order in which the arguments of a function (which may be expressions) are evaluated (e.g., in f(g(i), h(i)), the order in which g(i) and h(i) are evaluated is unspecified)
Kinds of Behavior (cont.)
  • Implementation-Defined Behavior:
    • Defined: A behavior that is unspecified in the standard but that each implementation must specify and document
    • Example: Propagation of the high-order bit when a signed integer is shifted right
  • Undefined Behavior:
    • Defined: A behavior that violates a "shall" or "shall not" requirement, a behavior that is noted as undefined in the standard, or a behavior that is not discussed in the standard
Why Allow for Explicitly Undefined Behaviors?
  • So the implementer need not catch program errors that are difficult to diagnose.
  • To avoid defining edge cases that would favor one implementation strategy over another.
  • To identify possible extensions.
Some Implications
  • Implication of All Four Kinds of Behaviors:
    • Portability problems often arise
  • Implications of Undefined Behaviors:
    • Anything can happen when the compiled program is executed
    • Optimizing compilers may assume that undefined behavior never occurs, so they are not obligated to generate code for it
Levels of Portability
  • Strictly Conforming Programs:
    • Use only those features of the language and library specified in the C Standard
    • Do not produce output that is dependent on any unspecified, undefined, or implementation-defined behavior
  • Conforming Programs:
    • Are acceptable to a conforming implementation (i.e., may depend on nonportable features of a conforming implementation)
Type Safety
  • Defined:
    • Preservation: If a variable x has a type t and x evaluates to a value v then v has type t
    • Progress: Evaluation of an expression either results in a value or there is another way to proceed
  • The Type Safety of C:
    • Most people consider C to be weakly typed
  • Examples in C:
    • If you cast a pointer to an entity of type t to a pointer to an entity of an incompatible type s and dereference it, then the behavior is generally undefined
    • If you perform an operation on signed and unsigned integers of differing lengths using implicit conversion, then an operand's value can be unrepresentable in the converted type, so the result can be unexpected
There's Always More to Learn