Computing Terminology
What is a computer?
- any device, entity, or object that can perform computations
- need not be electronic in nature
Processor/microprocessor/central processing unit (CPU)
- the core unit that does the actual computations.
- only does a few things, such as:
- jump to a particular location in RAM
- read a number from RAM
- save a number to RAM
- add two numbers together
- compare two numbers to see which is larger
- send a number to a particular device (e.g. a monitor)
- receive a number from a particular device (e.g. a keyboard)
- the processor is often called “the brains” of the computer.
- processors are grouped in families that all share similar characteristics
- usually each family has a specific variety of machine code that it understands
- all instructions to the processor as to what it should do must be specified as a sequence of these machine codes
- watch video: See How the CPU Works In One Lesson
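The short list of processor operations above can be sketched as a toy simulation. This is only an illustration, not how any real processor family works: the instruction names (LOAD, STORE, ADD, CMP, JUMP_IF_NOT_LARGER, HALT) are hypothetical, and for simplicity the program is kept in its own list rather than in RAM.

```python
# A toy processor. RAM is a list of numbers and each instruction is a tuple.
# All instruction names here are hypothetical, invented for this sketch.

def run(program, ram):
    """Execute the program until HALT and return the final RAM contents."""
    accumulator = 0      # the one number the processor is currently working on
    flag_larger = False  # result of the most recent comparison
    pc = 0               # program counter: index of the next instruction

    while True:
        op, *args = program[pc]
        pc += 1
        if op == "LOAD":                  # read a number from RAM
            accumulator = ram[args[0]]
        elif op == "STORE":               # save a number to RAM
            ram[args[0]] = accumulator
        elif op == "ADD":                 # add a number from RAM to the accumulator
            accumulator += ram[args[0]]
        elif op == "CMP":                 # is the accumulator larger than a RAM value?
            flag_larger = accumulator > ram[args[0]]
        elif op == "JUMP_IF_NOT_LARGER":  # jump to a particular instruction
            if not flag_larger:
                pc = args[0]
        elif op == "HALT":
            return ram
        else:
            raise ValueError(f"unknown instruction: {op}")


# RAM layout for this example: [running total, step, limit]
program = [
    ("LOAD", 0),                # 0: fetch the running total
    ("ADD", 1),                 # 1: add the step to it
    ("STORE", 0),               # 2: save the new total
    ("CMP", 2),                 # 3: is the total larger than the limit?
    ("JUMP_IF_NOT_LARGER", 0),  # 4: if not, go around again
    ("HALT",),                  # 5: stop
]

final_ram = run(program, [0, 3, 10])
print(final_ram)
```

Even this handful of operations is enough to express a loop: the program keeps adding the step to the total until the total exceeds the limit.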
Clockspeed
- the rate of the processor’s electric clock pulses (a signal oscillating at a certain frequency to keep time).
- measured in hertz (i.e. cycles per second)
- the processor can only perform a fixed number of computations with each pulse
- the pulses that the clock sends to the processor are typically 5 Volts or less.
- see more on this topic
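The relationship between clock frequency and computation rate is simple arithmetic. The sketch below uses made-up figures (a 3 GHz clock and 4 computations per pulse); neither number describes any particular processor.

```python
def clock_period_seconds(frequency_hz):
    """Duration of one clock cycle, in seconds."""
    return 1.0 / frequency_hz


def max_computations_per_second(frequency_hz, computations_per_cycle):
    """Upper bound on computations per second for a given clock."""
    return frequency_hz * computations_per_cycle


# Hypothetical figures, chosen only to make the arithmetic concrete.
frequency_hz = 3_000_000_000   # 3 GHz = 3 billion cycles per second
computations_per_cycle = 4

period = clock_period_seconds(frequency_hz)
ceiling = max_computations_per_second(frequency_hz, computations_per_cycle)

print(f"one cycle lasts {period:.3e} seconds")
print(f"at most {ceiling:,} computations per second")
```

A faster clock shortens each cycle, which raises the ceiling on how much work can be done per second.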
Source code
- the language written by the creator of the program (usually a human).
- written as plain text
- there are many popular source code programming languages, such as Java, Python, JavaScript, C, C++, Objective-C, C#, etc.
- source code must usually be translated into some other type of code that the processor can natively understand, called machine code, before it can be executed.
Machine code
- language understood by processor
- each family of processors has its own variety of machine code
- instructions in machine code can be directly executed by the processor it is designed for
- lowest level of code that the processor knows how to execute (i.e. how to “run”)
- binary code: all 0’s and 1’s - see more on number systems
- e.g. 1101101010011010
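To make the 16-bit pattern above less opaque, here is a small sketch that reads it as a number. What the bits would mean as an instruction depends entirely on the processor family, so this only shows the number-system side.

```python
bits = "1101101010011010"

# Interpret the string of 0's and 1's as a base-2 (binary) number.
value = int(bits, 2)

# The same value written in base 16 (hexadecimal) and back in base 2.
as_hex = format(value, "04X")
round_trip = format(value, "016b")

print(value)       # 55962
print(as_hex)      # DA9A
print(round_trip)  # 1101101010011010
```

Hexadecimal is a common shorthand for binary because each hexadecimal digit stands for exactly four bits.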
Assembling
- assembly language is a simple set of mnemonics, such as ‘add’ or ‘sub’ for subtract.
- each mnemonic is a shorthand code that directly maps to one of the binary machine code instructions available on the processor
- for every machine code instruction, there is a mnemonic using alphabetic characters that can be written instead
- assembling is the process of translating assembly language mnemonic codes into machine code for a specific processor family
- an assembler is the software that assembles
- once assembled, the machine code is executable on the processors for which it was designed
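The mapping from mnemonics to machine code can be sketched as a toy assembler. The opcode table and the 8-bit instruction layout (4 bits of opcode, 4 bits of operand) are invented for this example and do not correspond to any real processor family.

```python
# Hypothetical opcode table: each mnemonic maps to one 4-bit binary code.
OPCODES = {
    "LOAD": 0b0001,
    "ADD": 0b0010,
    "SUB": 0b0011,
    "STORE": 0b0100,
}


def assemble(source):
    """Translate assembly text into a list of 8-bit machine code strings."""
    machine_code = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        mnemonic, operand_text = line.split()
        operand = int(operand_text)
        if not 0 <= operand <= 15:
            raise ValueError(f"operand does not fit in 4 bits: {operand}")
        # Upper 4 bits hold the opcode, lower 4 bits hold the operand.
        word = (OPCODES[mnemonic.upper()] << 4) | operand
        machine_code.append(format(word, "08b"))
    return machine_code


assembly_source = """
load 2    ; fetch the value stored at address 2
add 3     ; add the value stored at address 3
sub 1     ; subtract the value stored at address 1
store 4   ; save the result at address 4
"""

assembled = assemble(assembly_source)
print(assembled)
```

Because each mnemonic corresponds to exactly one opcode, the translation is a straightforward table lookup, which is what distinguishes assembling from compiling a high-level language.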
Compiling
- compiling is translating an entire program from one programming language to another
- this term is most often used to refer to the process of translating high-level source code directly to machine code
- but also used to refer to the process of translating high-level programming code to an intermediate-level language, such as byte code
- if compiled directly to machine code, that machine code is then “executable”, meaning it can be run on the processor, if desired
- another program, called an executor, performs the execution of the machine code, if desired; this is not part of the compilation process
- if compiled to an intermediate-level language instead of machine code, another compiler or interpreter, such as a virtual machine (VM), must then be used to translate that intermediate-level language to machine code before it can be executed.
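The separation between compiling and executing can be seen with Python's built-in compile() and exec() functions. Note the assumption here: in the standard CPython implementation, compile() produces intermediate-level bytecode rather than machine code, so this illustrates the second case described above.

```python
source = "result = 6 * 7\n"

# Step 1: compile the whole source text. Nothing is executed yet.
code_object = compile(source, "<example>", "exec")

namespace = {}
ran_during_compile = "result" in namespace  # still False: compiling ran nothing

# Step 2: execute the compiled code as a separate, optional step.
exec(code_object, namespace)

print(ran_during_compile)
print(namespace["result"])
```

The compiled code object could just as well be kept and run later, or never run at all.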
Interpreting
- at a simplified level, an interpreter translates a single statement of source code at a time from a higher-level programming language into machine code and immediately executes that machine code on the processor
- the code is being translated “on the fly” into machine code and immediately executed
- contrast this with compiling, which involves bulk analysis and translation of an entire source code base to machine code, but does not execute the machine code
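The “one statement at a time” idea can be sketched with a toy interpreter for a made-up three-command language (set, add, print). This is a simplification: the sketch carries out each statement directly in Python rather than emitting machine code, and the language itself is hypothetical.

```python
def interpret(source):
    """Read, translate, and carry out one statement at a time."""
    variables = {}
    output = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        command, name, *rest = parts
        if command == "set":      # set NAME VALUE
            variables[name] = int(rest[0])
        elif command == "add":    # add NAME VALUE
            variables[name] += int(rest[0])
        elif command == "print":  # print NAME
            output.append(variables[name])
        else:
            raise ValueError(f"unknown command: {command}")
    return output


script = """
set x 5
add x 3
print x
"""

printed = interpret(script)
print(printed)
```

Each line is acted on as soon as it is read, so a mistake on a later line is only discovered after the earlier lines have already run. A compiler, by contrast, would analyze the whole script before anything executes.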
The reality of compiling vs interpreting
- read this short explanation by a user on stackoverflow.com for a bit more of the reality of modern-day interpreters, which are more complex than my simplified explanation
Implementations
Many real world high-level programming language implementations use one or both of compiling and interpreting. Here are a few examples:
C:
- high-level C code is usually compiled directly to machine code for a specific processor family
- that compiled machine code can then be executed on the processors for which it was compiled
Shell scripts:
- bash and similar shell scripting languages are interpreted directly into machine code by the shell interpreter.
- each shell statement is immediately translated to machine code and executed on the processor
Java:
- high-level Java code is typically compiled to intermediate-level bytecode
- bytecode can then be interpreted into machine code by any Java Virtual Machine
Python:
- high-level Python code is typically compiled to intermediate-level bytecode
- bytecode can then be interpreted into machine code by any Python Virtual Machine
Bytecode
- Python and Java both are usually compiled to their respective versions of bytecode
- Bytecode is called such because the operation code (opcode) that identifies each instruction is one byte (8 bits) of data
- Bytecode is an intermediate-level code that is interpreted “natively” by the corresponding virtual machine - a software simulation of a computer that has bytecode as its machine code.
- The advantage of bytecode is that any processor for which a Python Virtual Machine or Java Virtual Machine has been created can execute the respective bytecode, whereas machine code typically differs from one processor family to another. So any given machine code can only run on the processor family for which it was created, whereas bytecode can run on any computer that has the corresponding virtual machine installed.
- this allows Python and Java to be “write once, run anywhere” languages.
- see this explanation by a Python core developer for a more detailed description of what bytecode is (starting at 2’10” in the video)
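Python bytecode can be inspected with the standard library's dis module. The sketch below assumes the standard CPython implementation; the exact instruction names it prints differ between Python versions, so no particular listing is shown here.

```python
import dis


def add_numbers(a, b):
    return a + b


# Decode the function's compiled bytecode into individual instructions.
instructions = list(dis.get_instructions(add_numbers))

for instruction in instructions:
    # opcode is the numeric code; opname is its human-readable name.
    print(instruction.opcode, instruction.opname, instruction.argrepr)

# The raw bytecode itself is just a sequence of bytes.
raw_bytes = add_numbers.__code__.co_code
print(len(raw_bytes), "bytes of bytecode")
```

The opname column plays the same role for bytecode that assembly mnemonics play for machine code: a readable name for each numeric instruction.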
Java bytecode
- Java was one of the early languages that relied on the bytecode compiler/interpreter paradigm
- in Java, any family of processors that has had a “Java Virtual Machine” (JVM) designed for it can run the Java bytecode - the community who work on Java have made JVMs for most popular processor families
- a programmer writes high-level Java source code, which is typically compiled to bytecode; this bytecode is saved onto whatever computer the user wants to run the program on, and the JVM there interprets the bytecode into the appropriate machine code for that processor family
- thus Java is a “write once, run anywhere” sort of language
This paradigm is now common in other popular programming languages such as Python, Ruby, and even PHP.
Documentation
Documenting code is important for readability and maintenance of that code. Most languages have common conventions for how developers leave notes and document their code.
Python
- See Python docstring conventions
- All programs must be documented following these conventions
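As a rough illustration, here is a documented function, assuming the conventions referred to above are those of PEP 257 (a one-line summary in triple double quotes, then a blank line, then any further detail). The function is hypothetical, and the Args/Returns layout is one common style rather than something PEP 257 itself requires.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Args:
        celsius: Temperature in degrees Celsius.

    Returns:
        The equivalent temperature in degrees Fahrenheit.
    """
    return celsius * 9 / 5 + 32


# The docstring is stored on the function and shown by help().
print(celsius_to_fahrenheit.__doc__)
print(celsius_to_fahrenheit(100))
```

Because the docstring travels with the function object, tools such as help() and documentation generators can display it without reading the source file.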
Java
- See documenting source code in Java
- All programs must be documented following these conventions