Fortran

From Citizendium
Revision as of 13:10, 9 February 2010 by imported>Milton Beychok (Wiki links)
This article is developing and not approved.

Fortran (Formula translation) is the oldest high-level programming language. It was the first computer language that gave priority to humans over computers, meaning that Fortran users did not have to know machine instructions, as was required of computer programmers before the development of high-level languages. Fortran statements are close transcriptions of mathematical formulas and, as in any high-level language, they are independent of the instruction set of the computer on which the program is to be executed.

Fortran being the oldest high-level computer language, there is a huge legacy of Fortran programs in the scientific and engineering communities, with some of the programs exceeding a million statements. Fortran has always been the primary language for intensive (super)computing tasks, such as weather and climate modeling, oil reservoir simulation, computational fluid dynamics, computational chemistry, computational economics, and computational physics. The existing body of Fortran programs being too large to rewrite into another computer language in any reasonable amount of time, it is likely that Fortran is here to stay, all the more so because a very noticeable feature of all Fortran versions that have appeared to date is their "upward compatibility". In principle, a modern Fortran 95 compiler can still translate a Fortran program written in, say, 1959 and prepare it to run on a present-day computer.

It is unlikely that the modern versions of the language are used as widely as the earlier versions. Until 1991, when the Fortran 90 standard was established, the language lagged behind modern developments such as recursion, pointers, dynamic memory allocation, and object-oriented programming. Many programmers in need of such constructs switched to C, C++, and variants of these languages, and they are not expected to return to a modern form of Fortran. In any case, by all sorts of metrics Fortran is far behind C, C++, and Java in popularity among programmers.[1] In the specialized field of high performance computing, however, where computing runs lasting days, or even weeks, are common, Fortran is still the major language. This is because Fortran compilers have traditionally been, and still are, strong in the optimization of computationally intensive tasks.

History

The first version of the language was developed in 1954 by an IBM team led by John Backus. This version, now referred to as Fortran I, was mainly restricted to IBM computers (notably the IBM 704 that, starting in 1957, was delivered with a Fortran compiler). In 1958 a variant, known as Fortran II, was introduced that was soon also implemented by other computer vendors; it now became possible to transfer (Fortran) programs between computers with completely different instruction sets. The issue of portability of programs (the possibility of transferring them between computers of different architecture) was born. In 1966 the first standard endorsed by the American Standards Association (now ANSI) was established: Fortran 66. The idea of this standard was that a program that strictly adhered to it would be portable. Fortran 66 was largely based on a version, called Fortran IV, developed by IBM around 1961.

From the end of the 1950s onward, many important computer language concepts were developed by the early computer scientists, such as recursion, dynamic storage allocation, local and global variables, and if-then-else statements. When at the end of the 1960s it was decided that a more modern form of Fortran was needed (that later came to be known as Fortran 77), there was much pressure from (numerical) mathematicians and computer scientists to extend the language with such constructs. However, this was successfully resisted by programmers, mainly physical scientists and engineers, who feared that the new, more abstract, concepts would erode Fortran's pragmatic, natural style. In addition, workers applying Fortran to numerically intensive tasks were afraid that the introduction of the constructs would be at the expense of the numerical efficiency of the language. Very sophisticated optimizing Fortran compilers were already in use at that time, and it was feared that concepts such as recursion would impede the optimization and slow down the programs considerably. In consequence, the Fortran 77 standard does not allow recursion, dynamic storage allocation, or local variables. It does, however, allow more extended "if statements" and "loop" structures than its predecessors. Character handling facilities were also introduced into Fortran 77.

The next Fortran standard was for quite some time known as Fortran 88 because it was expected 11 years after Fortran 77 (which itself came 11 years after Fortran 66). However, the international standards committees dragged their feet[2] and the first compilers following the new and hotly debated standard (adopted 11 April 1991)[3] started appearing soon afterwards. Fortran 90 allows recursion, pointers, dynamic storage allocation, case statements, and modules, a construct that is reminiscent of the objects used in object-oriented languages. Very importantly, Fortran 90 allows handling of arrays as one entity, thus enabling compilers to apply parallel processing to them.

The latest standard, Fortran 2003, supports object-oriented and generic programming.

Some features of the language prior to Fortran 90

See also: Fortran/Catalogs

As stated above, much old Fortran code (written before the 1990s) is still in use. It is therefore of interest to discuss some of the rules and features of the older versions of the language. Moreover, nearly all of the old rules are still valid, even in the latest Fortran standard.

The majority of statements are simply assignment statements of the form

      A = expression

where the expression on the right-hand side is first fully evaluated and then assigned to the variable named A. The arithmetic expressions are the same as in most computer languages, with the exception of raising to a power, which is indicated by **, as in 10**3. Comparison operators are written between periods: .LT., .LE., .EQ., .NE., .GE., .GT.
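
As a small illustration of this notation, the following fixed-format sketch evaluates an expression containing the power operator and then applies a comparison operator. The program name EXPR and the variables X and Y are chosen here for illustration only.

```fortran
      PROGRAM EXPR
      REAL X, Y
      X = 2.0
C     The right-hand side is evaluated first, then stored in Y.
C     ** binds more tightly than * and +, so this is 3*(X**2) + 1
      Y = 3.0*X**2 + 1.0
      PRINT *, 'Y =', Y
C     Comparison operators are written between periods
      IF (Y .GE. 10.0) PRINT *, 'Y is at least 10'
      END
```

With X = 2.0 the expression evaluates to 13.0, so the comparison succeeds and both lines are printed.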

Implicit typing

Fortran 77 introduced character data. Before that time Fortran only recognized real and complex (floating point) numbers, integer numbers, and logicals (booleans). Names of variables have to begin with a letter. Only complex and logical variables (and in Fortran 77 character variables) have to be declared explicitly. The real and integer variables are implicitly declared by the rule: all variables with names beginning with I, J, K, L, M, and N are integer, the variables of which the names start with the other letters are real.
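
The implicit typing rule can be seen at work in a short sketch such as the one below, which contains no declarations at all. The names IMPTYP, N, K, and X are illustrative and do not come from any particular program.

```fortran
      PROGRAM IMPTYP
C     N begins with a letter in the range I-N, so it is an INTEGER
      N = 7
C     K is INTEGER as well: the integer division 7/2 truncates to 3
      K = N/2
C     X begins with another letter, so it is REAL and holds 3.5
      X = N/2.0
      PRINT *, K, X
      END
```

The same division gives different results for K and X purely because of the first letters of the names, which is why a misspelled or badly chosen name can silently change the arithmetic.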

The oldest Fortran versions only allowed variable names of at most 6 characters, which gave rise to names such as "CNVRG" for "convergence", "ITHR" for an integer threshold, etc. Moreover, the Fortran standard, up to and including Fortran 77, required instructions and variables to be written in capitals, although many compilers relaxed this requirement and accepted mixed case.

Fixed format

Fortran 90 introduced free format; until the appearance of this standard, Fortran statements had a fixed format adapted to punch cards of 80 columns. The first column can contain the letter "C", indicating a comment line. Columns 1 to 5 of the line are either blank or contain a numeric label. The 6th column is reserved for a continuation character; if any character other than a blank or a zero is in the 6th column, the statement is interpreted as the continuation of the previous statement. Columns 7 through 72 contain the actual statement; blanks there are completely ignored and may be used to improve readability by humans. Statements are not closed by a semicolon or otherwise. Columns 73 through 80 are ignored and may be used for sequence numbers.
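
The column rules may be easier to follow in a concrete layout. The sketch below is an illustrative program (the name FIXFMT, the variable TOTAL, and the label 100 are arbitrary choices) that has a comment line, a statement continued with a character in column 6, and a label in columns 1 to 5.

```fortran
C23456789012345678901234567890   column ruler; the C is in column 1
      PROGRAM FIXFMT
C     Statements start in column 7 or later
      TOTAL = 1.0 + 2.0 + 3.0
C     The ampersand in column 6 continues the previous statement
     &      + 4.0
      IF (TOTAL .GT. 5.0) GO TO 100
      PRINT *, 'this line is skipped'
C     The label 100 occupies columns 3 to 5
  100 PRINT *, 'TOTAL =', TOTAL
      END
```

Such a file has to be compiled as fixed-format source; many compilers select this mode from a file extension such as .f, but that convention is compiler-specific.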

Change of execution flow

The oldest Fortran versions knew only the "arithmetic if". This is a statement of the form

      IF (N) 10, 20, 30

where execution continues at the statement labeled 10, 20, or 30 for N < 0, N = 0, and N > 0, respectively. A label, written in columns 1 through 5, is a number of up to 5 digits that is unique within the program unit.

Later the "logical if" was added that was usually used in conjunction with a "goto" statement, for example

       IF (N .GE. M) GO TO 1234

If N ≥ M execution continues with the statement carrying the label 1234.

Fortran 77 knows the "IF (a) THEN ... ELSE ... END IF" construct, where a is either true or false. This construct does not need labels.
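
To compare the styles, the following sketch makes the same three-way decision twice, first with an arithmetic IF and labels and then with the Fortran 77 block IF. The program name IFDEMO and the labels are illustrative. Recent Fortran standards no longer include the arithmetic IF, so a current compiler may warn about the first half.

```fortran
      PROGRAM IFDEMO
      N = -4
C     Arithmetic IF: branch on the sign of N (negative, zero, positive)
      IF (N) 10, 20, 30
   10 PRINT *, 'negative'
      GO TO 40
   20 PRINT *, 'zero'
      GO TO 40
   30 PRINT *, 'positive'
   40 CONTINUE
C     Block IF (Fortran 77): the same decision without any labels
      IF (N .LT. 0) THEN
         PRINT *, 'negative'
      ELSE IF (N .EQ. 0) THEN
         PRINT *, 'zero'
      ELSE
         PRINT *, 'positive'
      END IF
      END
```

The labeled version needs a GO TO after each branch to avoid falling through into the next one, which is the kind of bookkeeping the block IF removes.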

Subroutines and functions

A Fortran program can be partitioned into smaller independent subprograms that at load/link time are combined into one executable program. The most common subprograms are "subroutines", which have a number of parameters that are input, output, or both. "Functions", which have input parameters and return a single value, are also possible. Functions are most commonly used for built-in subprograms provided by the vendor, such as SQRT (square root).

A subroutine is invoked by a call statement, for example

       D  =  3.14
       CALL CUBE(D, D3)
C      D3 is equal to D cubed at this point
       ...
       SUBROUTINE CUBE(A, B)
       B = A**3
       RETURN
       END

The statement RETURN indicates that execution returns to the calling program, and END tells the compiler that this is the end of the program unit. (From Fortran 77 onward END serves both functions.)

The same as a function:

       D = 3.14
       D3 = CUBE(D)
       ...
       FUNCTION CUBE(A)
       CUBE = A**3
       RETURN
       END

Common blocks and blank commons

As stated, the older Fortran variants did not explicitly distinguish local and global variables. As a matter of fact, most compilers made local copies of variables, and kept arrays global.

To provide for global variables Fortran has "common blocks", either named or unnamed ("blank common"). These are memory areas that are assigned to the executable program at load/link time. Each subroutine in which the common block is declared has read/write access to it. Because the common block is an independent programming unit, no information about it, other than its starting address, is passed to the subroutine that accesses it. In other words, every subroutine makes its own assumptions about the length and the layout of the common block. Needless to say, common blocks in the hands of careless programmers are almost infinite sources of confusion and programming errors.
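
A minimal sketch of a named common block is given below. The block name STATE, the subroutine BUMP, and all variable names are invented for this example. Note that the subroutine declares the block with different variable names; only the order and types of the entries determine which storage is meant.

```fortran
      PROGRAM COMDEM
C     Named common block /STATE/: two REALs followed by one INTEGER
      COMMON /STATE/ X, Y, NCALLS
      X = 1.0
      Y = 2.0
      NCALLS = 0
      CALL BUMP
      CALL BUMP
C     X is now 5.0 and NCALLS is 2, although neither was passed
      PRINT *, X, Y, NCALLS
      END

      SUBROUTINE BUMP
C     The layout is repeated here under other names; the compiler
C     does not check it against the declaration in the main program
      COMMON /STATE/ A, B, NCALLS
      A = A + B
      NCALLS = NCALLS + 1
      RETURN
      END
```

If the subroutine listed the entries in another order, or with other types, the program would still compile and link, which is exactly the source of errors described above.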

Loops

The original form of the subscripted loop ("do loop") is

       DO 10 I = M, N
         ....
   10  (any executable statement)

The statements after the "do statement" up to and including the statement carrying the label 10 are executed with I running in unit steps from M to N. Prior to Fortran 77, the loop was executed at least once, even for M = N+1, i.e., the range of I was checked at the end of the loop cycle, not at the beginning. In the majority of cases this is not what the logic of the program dictates, so the older Fortran programs are full of constructs such as

       IF (M .GT. N) GO TO 20
       DO 10 I = M, N
         ....
   10  (any executable statement)
   20  (any executable statement)

This was changed in Fortran 77, where by default the loop was skipped when M > N. This change was contrary to the "upward compatibility" philosophy of Fortran and therefore most compilers introduced a "onetrip" flag to ask for at least one trip through the loop, in accordance with the older standard. Forgetting to set the flag (or not knowing that it is necessary) can cause crashed runs, or even worse, (slightly) wrong results.

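Furthermore, a label-free syntax for the do loop became available. It was not part of the Fortran 77 standard itself, but it was accepted as an extension by many Fortran 77 compilers and was standardized in Fortran 90: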

       DO  I = M, N
         ....
       END DO

Combined with the "if then else end if" construct, this makes it possible to write programs in Fortran 77 with hardly any labels or "GO TO" statements. The possibility of avoiding these was a great step forward. Prior to Fortran 77, programs were plagued by a proliferation of labels and GO TO statements, making older Fortran programs very hard for humans to read. (This was the notorious "spaghetti Fortran".)
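
The two loop forms and the zero-trip rule can be tried out with a sketch like the following. The names LOOPS, NTRIPS, and ISUM are illustrative, and the comments assume that the compiler follows the Fortran 77 rule and has not been switched to "onetrip" behavior.

```fortran
      PROGRAM LOOPS
      M = 3
      N = 2
      NTRIPS = 0
C     M > N: under the Fortran 77 rule the body is never executed
      DO 10 I = M, N
         NTRIPS = NTRIPS + 1
   10 CONTINUE
      PRINT *, 'trips with M > N:', NTRIPS
C     Label-free loop with a block IF: sums the even numbers 2 to 10
      ISUM = 0
      DO I = 1, 10
         IF (MOD(I, 2) .EQ. 0) THEN
            ISUM = ISUM + I
         END IF
      END DO
      PRINT *, 'sum of even I:', ISUM
      END
```

Under these assumptions NTRIPS stays 0 and ISUM ends up as 30; with a one-trip setting the first loop would instead run once.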

Arrays

Basically, arrays in the older Fortran versions are static and declared by a statement of the type

       DIMENSION A(100,100), B(25)

meaning that A(I,J) gives access to an element of the square array A, provided 1 ≤ I ≤ 100 and 1 ≤ J ≤ 100, and B(K), 1 ≤ K ≤ 25, gives access to an element of the linear array B. Before Fortran 77, array indices had to be larger than 0. This restriction could be very cumbersome, because the index 0 appears naturally in many problems and the programmer had to shift the index by one, leading to confusion about the choice between ±1 shifts. In Fortran 77 the lower bound of each array dimension can be declared explicitly, as in A(0:99), and may be zero or negative.
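
As an illustration of explicit bounds, the sketch below stores polynomial coefficients in an array whose index starts at 0, next to a small two-dimensional array with the default lower bound 1. The names ARRDEM, COEF, and A are chosen for this example only.

```fortran
      PROGRAM ARRDEM
C     COEF has an explicit lower bound 0; A uses the default bound 1
      DIMENSION COEF(0:4), A(3,3)
C     COEF(K) is the coefficient of X**K, so index 0 arises naturally
      DO 10 K = 0, 4
         COEF(K) = 1.0
   10 CONTINUE
C     Clear the 3 by 3 array element by element
      DO 30 J = 1, 3
         DO 20 I = 1, 3
            A(I,J) = 0.0
   20    CONTINUE
   30 CONTINUE
      A(2,2) = COEF(0) + COEF(4)
      PRINT *, A(2,2)
      END
```

Before Fortran 77 the coefficient array would have had to be declared as COEF(5), with every index shifted by one.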

Some features of Fortran 90/95

Fortran 90/95 is a major revision that extends the language with a wealth of new features. Unlike the earlier revisions, which hardly changed the appearance of programs, it makes it possible to write a Fortran 95 program that is unintelligible to a Fortran 77 programmer. Fortran 95 compilers nevertheless translate Fortran 77 code without problems; in fact, it happens regularly at large computer centers that programmers compile their Fortran 77 programs with Fortran 95 compilers without noticing it. Modules and user-defined data types, in particular, are significant innovations with no counterparts in the older versions.

The new features of Fortran 90 include:

  • Free format source code form (optional) in which:
    • Blanks are significant
    • Lines are up to 132 characters in length
    • Up to 39 continuation lines (compared with 19 in fixed format) are allowed
  • Semicolon (;) as statement separator for multiple statements per line
  • Names can be up to 31 characters in length and may contain underscores
  • Exclamation mark (!) as comment symbol (also inline)
  • Inclusion of source text from other files, through INCLUDE
  • Modern control structures:
    • DO statement, with CYCLE and EXIT options
    • DO WHILE or no control clause.
    • CASE statement.
  • Specification of numeric precision and range
  • Handling of arrays (array sections, array operators, etc.)
    • Arrays can be treated as whole objects, for example, A=B*SIN(A), where A and B are arrays.
  • Dynamic memory management
    • Allocate, deallocate
    • Pointers
    • Recursion
  • User defined data types
  • Modules - packages containing variables and code
    • Operator overloading
    • Generic procedures
  • Keyword argument passing to procedures
  • The INTENT (in, out, inout) procedure argument attribute
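
Several of the listed features can be combined in one short free-format program. The following is only a sketch: the names f90_features, rescale, and factorial are invented for the illustration, and the requested precision of 12 decimal digits is an arbitrary choice.

```fortran
program f90_features
   implicit none
   integer, parameter :: dp = selected_real_kind(12)  ! precision request
   real(dp), allocatable :: a(:), b(:)
   integer :: i, n

   n = 5
   allocate(a(n), b(n))                ! dynamic memory management
   a = (/ (real(i, dp), i = 1, n) /)   ! array constructor: 1, 2, ..., n
   b = 2.0_dp                          ! scalar assigned to every element
   a = b*sin(a)                        ! whole-array expression
   call rescale(a(2:4), 10.0_dp)       ! array section as argument

   do i = 1, n                         ! DO loop with CYCLE and EXIT
      if (a(i) < 0.0_dp) cycle         ! skip negative elements
      if (i > 3) exit                  ! leave the loop early
      print *, i, a(i)
   end do
   print *, 'factorial(5) =', factorial(5)
   deallocate(a, b)

contains

   subroutine rescale(x, factor)
      real(dp), intent(inout) :: x(:)  ! modified in place
      real(dp), intent(in)    :: factor
      x = factor*x
   end subroutine rescale

   recursive function factorial(k) result(f)
      integer, intent(in) :: k
      integer :: f
      if (k <= 1) then
         f = 1
      else
         f = k*factorial(k - 1)
      end if
   end function factorial

end program f90_features
```

None of these statements has a direct equivalent in Fortran 77, where the array sizes would have been fixed at compile time and the array operations written as explicit loops.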

Fortran 95 is a minor update of Fortran 90.[4] Notably, a few old Fortran features, which were deprecated in Fortran 90 but not declared illegal, were removed from the Fortran 95 standard. They are:

  • Real and double precision DO variables (which were introduced in Fortran 77)
  • Branching to an END IF statement from outside its IF construct (an oversight of Fortran 77)
  • PAUSE statement (a means of suspending the program and informing the computer operator; dating back to Fortran I)
  • ASSIGN and assigned GOTO statements and assigned format specifiers (very rarely used features; dating back to Fortran I)
  • H (from Hollerith) edit descriptor (a rarely used manner of defining strings; Fortran 77 allows strings between quotes, and most vendors already allowed this in Fortran 66. Thus, e.g., 9HRaindrops is equivalent to 'Raindrops')

Strictly speaking, this removal of features is contrary to Fortran's philosophy of upward compatibility, but it is not expected to cause insurmountable practical problems.

Example

The following example shows some of the novel (in comparison to Fortran 77) syntax features of Fortran 90. Free format is used.

  Module people                ! Module, may contain declarations and subroutines

     Type name                 ! User defined type name 
        sequence               ! The order in name is to be preserved
        character*20 Lastname  ! Last names cannot be longer than 20 characters
        character*10 Firstname !        and first name not longer than 10
        character*1  Middle_initial
     End type name

     Type person               ! Another user defined type
        integer age
        integer birthdate(3)   ! Array (m, d, y)
        type (name) fullname   ! Declare type of structure Fullname
     End type person

  End module people

  PROGRAM EXAMPLE                      ! Main program 
     Use people                        ! Make module available
     Type (person), target ::  Smith   ! Declare structure Smith 

     Smith = person(46, (/6, 30, 1963/), name('Smith', 'John', 'K')) ! Assign structure
     Call setrec                       ! Call internal subroutine
     Contains

        Subroutine setrec              ! Internal subroutine
           Type Emp
              integer empl_nr          ! Employee number
              Type (person), pointer :: employee ! Will point to structures of type person
           End type emp
           Type (emp) empl_rec

           empl_rec = emp(2744, Smith) ! Set record for John K. Smith, 2nd field ..
                                       !  .. is pointer pointing to structure Smith

           write(*,'(2i5, 2x, a20)')&  ! Writes: " 2744   46  Smith"
          &   empl_rec%empl_nr, empl_rec%employee%age,&
          &   empl_rec%employee%fullname%Lastname

              Smith%age = 53
              write(*,'(i5, 2x, a10)')& ! Writes: "   53  John"
          &   empl_rec%employee%age,&
          &   empl_rec%employee%fullname%Firstname

        End subroutine setrec
  End program example 

Explanation

A module (a separate independent program unit) is used to define two derived (i.e., user-defined) types: "name" and "person". The module is made available to the main program "EXAMPLE" by the use statement. (Note that Fortran 90 is not case-sensitive.) Because a pointer will later point to the structure "Smith", it is declared with the target attribute. The components of the structure "Smith", namely "age" (an integer), "birthdate" (an integer array), and "fullname" (a sequence structure), are set. The internal routine "setrec" is called. Such an internal routine has access to the variables of its host (in this case the host is the main program "EXAMPLE"). Hence subroutine "setrec" can access "Smith" and the derived type "person".

The routine "setrec" sets an employee record ("empl_rec") for "Smith". The first component contains the employee number "empl_nr", the second the pointer "employee" that points to "Smith" (which is of type "person"). Note in the write statements that components of derived types are accessed through the use of the percentage symbol (%); this is a general rule, not restricted to input/output statements. The ampersands (&) in the write statements indicate continuation lines.

After the age of "Smith" has been changed, his new age and his first name are printed, both being accessed through the pointer "employee". Note that the pointer itself is not changed, only the content of the structure "Smith" that "employee" points to is changed.

Fortran 2003

Fortran 2003, published 18 November 2004, is an upwardly compatible extension of Fortran 95, adding, among other things, support for IEEE floating-point exception handling, object-oriented programming, and improved interoperability with the C language.
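
To give an impression of the object-oriented additions, the sketch below defines a type with a type-bound procedure, extends it, and calls the procedure through a polymorphic variable. All names (shapes_mod, square, box, area, oop_demo) are invented for this illustration, and the example assumes a compiler that implements these Fortran 2003 features.

```fortran
module shapes_mod
   implicit none

   type :: square
      real :: side = 1.0
   contains
      procedure :: area => square_area   ! type-bound procedure
   end type square

   type, extends(square) :: box          ! box inherits the component side
      real :: height = 1.0
   contains
      procedure :: area => box_area      ! overrides the inherited binding
   end type box

contains

   function square_area(this) result(a)
      class(square), intent(in) :: this
      real :: a
      a = this%side**2
   end function square_area

   function box_area(this) result(a)
      class(box), intent(in) :: this
      real :: a
      ! Surface of a box with a square base: two bases plus four sides
      a = 2.0*this%side**2 + 4.0*this%side*this%height
   end function box_area

end module shapes_mod

program oop_demo
   use shapes_mod
   implicit none
   class(square), allocatable :: s       ! polymorphic variable

   allocate(s, source=box(2.0, 3.0))     ! dynamic type of s becomes box
   print *, 'surface area:', s%area()    ! dispatches to box_area
end program oop_demo
```

Although s is declared with the base type, the call s%area() is resolved at run time according to the dynamic type, so the box version is used here.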

References

  1. Programming language popularity Retrieved July 30, 2009
  2. See The Fortran saga by Brian Meek for a detailed story of the political fights around the establishment of the new standard
  3. Foreword by Michael Metcalf in: W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery: Numerical Recipes in Fortran 90. Cambridge University Press, 1999, ISBN 0-521-57439-0. Online
  4. Final working draft