Perl Programming


Perl (commonly expanded as the backronym "Practical Extraction and Report Language") is a high-level, general-purpose, interpreted, dynamic programming language.


Perl was originally developed by Larry Wall in 1987, while he was working as a programmer at Unisys, as a general-purpose UNIX scripting language to make report processing easier. It gained widespread popularity in the late 1990s as a CGI scripting language, in part due to its parsing abilities, and as of 2010 it is used for a wide range of tasks including system administration, web development, network programming, games, bioinformatics, and GUI and graphics programming. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). Its major features include support for multiple programming paradigms (procedural, object-oriented and functional styles), reference-counting memory management (without a cycle-detecting garbage collector), built-in support for text processing, and a large collection of third-party modules. According to Larry Wall, Perl has two slogans. The first is "There's more than one way to do it", commonly known as TMTOWTDI. The second is "Easy things should be easy and hard things should be possible".


Features

Perl borrows features from other programming languages including C, shell scripting, awk and sed. The overall structure of Perl derives broadly from C. Perl is procedural in nature, with variables, expressions, assignment statements, brace-delimited blocks, control structures and subroutines. Perl also takes features from shell programming. All variables are marked with leading sigils ($ for scalars, @ for arrays, % for hashes), which unambiguously identify the data type of the variable in context. Importantly, sigils allow variables to be interpolated directly into strings. Perl has many built-in functions that provide tools often used in shell programming (although many of these tools are implemented by programs external to the shell), such as sorting and calling on system facilities. Perl takes lists from Lisp, hashes ("associative arrays") from awk, and regular expressions from sed. These simplify and facilitate many parsing, text-handling, and data-management tasks.
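
As a small illustration of sigils, string interpolation, hashes and regular expressions working together, the following sketch tallies words in a string. The variable names and sample text are invented for the example and are not taken from any particular program.

```perl
use strict;
use warnings;

# Each sigil marks the kind of value: $ scalar, @ array, % hash.
my $name   = 'report';
my @fields = ('alice', 'bob', 'carol');
my %count;

# A regular expression extracts each word; the hash tallies occurrences.
my $text = 'the cat and the dog and the bird';
$count{$1}++ while $text =~ /(\w+)/g;

# Sigils let scalars, array elements, hash elements and whole arrays
# interpolate directly into a double-quoted string.
my $summary = "$name: $fields[0] saw 'the' $count{the} times; fields=@fields";
print "$summary\n";
```

An interpolated array such as @fields is joined with single spaces by default.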


Perl 5 added features that support complex data structures, first-class functions (that is, closures as values), and an object-oriented programming model. These include references, packages, class-based method dispatch, and lexically scoped variables, along with compiler directives (for example, the strict pragma). A major additional feature introduced with Perl 5 was the ability to package code as reusable modules. Larry Wall later stated that "The whole intent of Perl 5's module system was to encourage the growth of Perl culture rather than the Perl core."
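
The Perl 5 features listed above can be sketched in a few lines. The names make_counter, Stack and %config are hypothetical and exist only for this illustration; the sketch shows a closure over a lexically scoped variable, a minimal class built from a package, and a nested data structure built from references.

```perl
use strict;
use warnings;

# A closure: the anonymous sub keeps its own lexically scoped $n.
sub make_counter {
    my $n = shift;
    return sub { return $n++ };
}

# A minimal class: a package whose methods receive the object first.
{
    package Stack;
    sub new  { my $class = shift; return bless { items => [] }, $class }
    sub add  { my $self = shift; push @{ $self->{items} }, @_; return $self }
    sub size { my $self = shift; return scalar @{ $self->{items} } }
}

# References allow nested structures: a hash holding an array and a hash.
my %config = (
    hosts => [ 'alpha', 'beta' ],
    ports => { http => 80, https => 443 },
);

my $next  = make_counter(10);
my @ids   = ($next->(), $next->(), $next->());
my $stack = Stack->new->add(@ids);   # method calls dispatch on the class

print "ids: @ids\n";
print "stack size: ", $stack->size, "\n";
print "second host: $config{hosts}[1], https port: $config{ports}{https}\n";
```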

All versions of Perl do automatic data-typing and automatic memory-management. The interpreter knows the type and storage requirements of every data object in the program; it allocates and frees storage for them as necessary using reference counting (so it cannot deallocate circular data structures without manual intervention). Legal type-conversions — for example, conversions from number to string — are done automatically at run time; illegal type conversions are fatal errors.
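
The "manual intervention" needed for circular structures is often a weak reference. The sketch below assumes the core Scalar::Util module and a hypothetical Node class whose destructor counts how many objects are freed; it also shows an automatic string/number conversion and a conversion that fails at run time under the strict pragma.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;

{
    package Node;   # hypothetical class used only for this illustration
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { $destroyed++ }   # called when the reference count reaches zero
}

{
    my $parent = Node->new('parent');
    my $child  = Node->new('child');
    $parent->{child} = $child;
    $child->{parent} = $parent;    # this would create a cycle...
    weaken($child->{parent});      # ...so the back-reference is made weak
}
# Both nodes are freed when the block above ends.
print "destroyed: $destroyed\n";

# Legal conversions between strings and numbers happen automatically.
my $n     = '40' + 2;       # the string '40' is used as a number
my $label = $n . ' items';  # the number 42 is used as a string

# Using a plain string as an array reference is fatal under strict refs.
my $not_a_ref = 'plain string';
my $ok = eval { my @list = @{$not_a_ref}; 1 };
print "label: $label; dereference ", ($ok ? 'succeeded' : 'failed'), "\n";
```

Without the call to weaken, the two nodes would keep each other alive and neither destructor would run until the program exits.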


The upcoming major version of Perl is Perl 6.


Design

Perl has many features that ease the task of the programmer at the expense of greater CPU and memory requirements. These include automatic memory management, dynamic typing, strings, lists, and hashes, regular expressions, introspection, and an eval() function. Perl follows the theory of "no built-in limits". Perl syntax reflects the idea that "things that are different should look different." Perl does not enforce any particular programming paradigm (procedural, object-oriented, functional, or others) or even require the programmer to choose among them.

No written specification or standard for the Perl language exists for Perl versions through Perl 5, and there are no plans to create one for the current version of Perl. There has been only one implementation of the interpreter, and the language has evolved along with it. Perl 6, however, started with a specification.


Applications

In the early days of the Web, programmers used Perl to write CGI scripts, and it remains one of the most popular dynamic languages for writing Web applications. Large projects written in Perl include cPanel, Slash, RT, TWiki and Movable Type. Many high-traffic websites use Perl extensively, for example Amazon.com, bbc.co.uk, Priceline.com, Craigslist, IMDb, LiveJournal, Slashdot and Ticketmaster.

Perl is also well suited to converting or processing large amounts of data for tasks such as creating reports. This makes it a popular all-purpose language for system administrators, particularly because short programs can be entered and run on a single command line. With a degree of care, Perl code can be made portable across Windows and Unix. Graphical user interfaces (GUIs) may be developed using Perl, for example with Perl/Tk. Perl is also widely used in finance and in bioinformatics, where it is valued for rapid application development and deployment and for its capability to handle large data sets.


Implementation

Perl is implemented as a core interpreter, written in C, together with a large collection of modules, written in Perl and C. As of 2010, the stable version (5.12.3) is 14.2 MB when packaged in a tar file and gzip compressed. The interpreter is 150,000 lines of C code and compiles to a 1 MB executable on typical machine architectures. Alternatively, the interpreter can be compiled to a link library and embedded in other programs. There are nearly 500 modules in the distribution, comprising 200,000 lines of Perl and an additional 350,000 lines of C code. (Much of the C code in the modules consists of character-encoding tables.)


The interpreter has an object-oriented architecture. All of the elements of the Perl language—scalars, arrays, hashes, coderefs, filehandles—are represented in the interpreter by C structs. Operations on these structs are defined by a large collection of macros, typedefs, and functions; these constitute the Perl C API. The Perl API can be bewildering to the uninitiated, but its entry points follow a consistent naming-scheme, which provides guidance to those who use it.


The life of a Perl interpreter divides broadly into a compile phase and a run phase.

Most of what happens in Perl's compile phase is compilation, and most of what happens in Perl's run phase is execution, but there are significant exceptions. Perl makes important use of its capability to execute Perl code during the compile phase. Perl will also delay compilation into the run phase. The terms that indicate the kind of processing that is actually occurring at any moment are compile time and run time. Perl is in compile time at most points during the compile phase, but compile time may also be entered during the run phase. The compile time for code in a string argument passed to the eval built-in occurs during the run phase. Perl is often in run time during the compile phase and spends most of the run phase in run time. Code in BEGIN blocks executes at run time but in the compile phase.
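
The overlap between phases and times can be observed directly. In the following sketch (the @log array and its messages are invented for the example), a BEGIN block runs during the compile phase even though it appears after the first ordinary statement, and a string passed to eval is compiled only once the run phase reaches it.

```perl
use strict;
use warnings;

our @log;

# An ordinary statement: executed in the run phase.
push @log, 'run phase: main program starts';

# A BEGIN block executes as soon as the parser has finished reading it,
# that is, at run time but inside the compile phase. It therefore runs
# before the statement above, despite appearing later in the file.
BEGIN { push @log, 'compile phase: BEGIN block runs' }

# A string eval is compiled (compile time) during the run phase.
my $result = eval q{ push @log, 'run phase: eval string compiled and run'; 6 * 7 };
die $@ if $@;

print "$_\n" for @log;
print "result: $result\n";
```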


At compile time, the interpreter parses Perl code into a syntax tree. At run time, it executes the program by walking the tree. Text is parsed only once, and the syntax tree is subject to optimization before it is executed, so that execution is relatively efficient. Compile-time optimizations on the syntax tree include constant folding and context propagation, but peephole optimization is also performed.

Perl has a Turing-complete grammar because parsing can be affected by run-time code executed during the compile phase. Therefore, Perl cannot be parsed by a straightforward Lex/Yacc lexer/parser combination. Instead, the interpreter implements its own lexer, which coordinates with a modified GNU bison parser to resolve ambiguities in the language.


The Perl interpreter can simulate a Turing machine during its compile phase.


Availability

Perl is dual licensed under both the Artistic License and the GNU General Public License. Distributions are available for most operating systems. The CPAN (Comprehensive Perl Archive Network) carries a complete list of supported platforms with links to the distributions available on each. CPAN is also the source for publicly available Perl modules that are not part of the core Perl distribution.

Because of unusual changes required for the Mac OS environment, a special port called MacPerl was shipped independently.


Windows

Users of Microsoft Windows typically install one of the native binary distributions of Perl for Win32, most commonly Strawberry Perl or ActivePerl. Compiling Perl from source code under Windows is possible, but most installations lack the requisite C compiler and build tools. This also makes it difficult to install modules from the CPAN, particularly those that are partially written in C. ActivePerl is a closed-source distribution from ActiveState that has regular releases that track the core Perl releases. The distribution also includes the Perl Package Manager (PPM), a popular tool for installing, removing, upgrading, and managing the use of common Perl modules.

Strawberry Perl is an open-source distribution for Windows. It has had regular, quarterly releases since January 2008, including new modules as feedback and requests come in. Strawberry Perl aims to be able to install modules like standard Perl distributions on other platforms, including compiling XS modules.


Database interfaces

Perl is widely favored for database applications. Its text-handling facilities are useful for generating SQL queries; arrays, hashes, and automatic memory management make it easy to collect and process the returned data. 

In early versions of Perl, database interfaces were created by relinking the interpreter with a client-side database library. This was sufficiently difficult that it was done for only a few of the most-important and most widely used databases, and it restricted the resulting perl executable to using just one database interface at a time. In Perl 5, database interfaces are implemented by Perl DBI modules. The DBI (Database Interface) module presents a single, database-independent interface to Perl applications, while the DBD (Database Driver) modules handle the details of accessing some 50 different databases; there are DBD drivers for most ANSI SQL databases. 

DBI provides caching for database handles and queries, which can greatly improve performance in long-lived execution environments such as mod_perl, helping high-volume systems avert load spikes as in the Slashdot effect. In modern Perl applications, especially those written using Web application frameworks such as Catalyst, the DBI module is often used indirectly via object-oriented mappers such as DBIx::Class, Class::DBI or Rose::DB::Object which generate SQL queries and handle data transparently to the application author.
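
A minimal sketch of the DBI/DBD split described above follows. It assumes that the DBI module and the DBD::SQLite driver are installed (neither is part of the core distribution), and the users table and its rows are invented for the example. Switching to another database would, in principle, mean changing only the data source string passed to connect.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available. The 'dbi:SQLite:...' data source
# selects that driver; ':memory:' requests a throwaway in-memory database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

# Placeholders (?) keep values separate from the SQL text.
my $insert = $dbh->prepare('INSERT INTO users (name) VALUES (?)');
$insert->execute($_) for qw(alice bob);

# prepare_cached reuses the statement handle if the same SQL is seen again.
my $select = $dbh->prepare_cached('SELECT id, name FROM users WHERE name = ?');
$select->execute('bob');
my $row = $select->fetchrow_hashref;
$select->finish;

print "id=$row->{id} name=$row->{name}\n";
$dbh->disconnect;
```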


Comparative performance

Large Perl programs start more slowly than similar programs in compiled languages because perl has to compile the source every time it runs. In one reported case, an author's Perl programs took much longer to run than expected because the perl interpreter spent much of its time searching for modules along an over-large include path. Unlike Java, Python, and Ruby, Perl has only experimental support for pre-compiling, so Perl programs pay this overhead penalty on every execution. A number of tools have been introduced to improve this situation. The first such tool was Apache's mod_perl, which sought to address one of the most common reasons that small Perl programs were invoked rapidly: CGI Web development. ActivePerl, via Microsoft ISAPI, provides similar performance improvements.

Once Perl code is compiled, there is additional overhead during the execution phase that typically isn't present for programs written in compiled languages such as C or C++. Examples of such overhead include bytecode interpretation, reference-counting memory management, and dynamic type-checking.


Optimizing

Like any code, Perl programs can be tuned for performance using benchmarks and profiles. In part because of Perl's interpreted nature, writing more-efficient Perl will not always be enough to meet one's performance goals for a program. In such situations, the most-critical routines of a Perl program can be written in other languages such as C or Assembler, which can be connected to Perl via simple Inline modules or the more-complex-but-flexible XS mechanism. 
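
One way to benchmark alternatives is the core Benchmark module. The sketch below compares two hypothetical ways of building a string; the subroutine names and the test data are invented for the example, and the timings printed will vary from machine to machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = map { "word$_" } 1 .. 200;

# Candidate 1: append each word in an explicit loop.
sub concat_loop {
    my $out = '';
    $out .= $_ for @words;
    return $out;
}

# Candidate 2: let the built-in join do the work.
sub concat_join {
    return join '', @words;
}

# A negative count asks Benchmark to run each candidate for at least
# that many CPU seconds, then print a table of relative speeds.
cmpthese(-1, {
    loop => \&concat_loop,
    join => \&concat_join,
});
```

Before comparing speeds, it is worth checking that the candidates return the same result, since a faster routine that computes something different is not an optimization.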

Nicholas Clark, a Perl core developer, discusses some Perl design trade-offs and solutions in a 2002 document called "When perl is not quite fast enough".

