Knowledge Dump

C++ Basics

In this article, some basic concepts of C++ (version 17) are covered.

General
C++ has the following properties/syntax:
  • Each program needs a main function to be executable. However, many compilers allow compilation without a main function, returning a non-executable file (e.g. g++ compiler with "-c" option).
  • Single-line comments are denoted by two slashes // this is a comment, while multi-line comments are written as /* this is a comment */.
  • Declarations/expressions end with semicolons (excluding preprocessor commands).
  • Every variable has a name that is unique within its scope and a fixed data type ("statically typed" language). For some of the already implemented types, see Fundamental Data Types.
  • C++ is case sensitive, i.e. the name variable is not equivalent to vaRiable.
  • Variables can be initialized with several notations. For example, int a = 10;, int a(10); (constructor notation) and int a{10}; all define an integer a with value 10.
  • Numbers with exponents are written as follows: 1e10 $=10^{10}$, similarly 4.1e-5 $=4.1*10^{-5}$. The e could also be replaced by an uppercase E.
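
The initialization notations and the exponent notation from the list above can be tried out with a minimal sketch like the following. The function names are hypothetical helpers chosen for illustration only.

```cpp
// Hypothetical helper: returns true if all three initialization
// notations produce the same value.
bool same_initialization() {
    int a = 10;   // assignment-style notation
    int b(10);    // constructor notation
    int c{10};    // brace notation
    return a == b && b == c;
}

// Exponent notation: the number before the e is multiplied by 10^(exponent).
double one_e_ten()   { return 1e10; }    // 1 * 10^10
double small_value() { return 4.1E-5; }  // 4.1 * 10^-5, written with an uppercase E
```

Both floating point literals are of type double, since no suffix is used.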

Fundamental Data Types
There are various predefined data types of differing sizes and uses. However, none of these is smaller than 1 byte (8 bits), since it isn't possible to define pointers to the address of a single bit. Also note that the size of these data types is not universal, but may vary from system to system. The operator sizeof can be used to find out the size of the respective data types.
Type         Description                                           Size
bool         Boolean, with values true/false                       8+ bits
int          (Signed) integer                                      16+ bits (typically 32)
short        Integer, abbreviation of short int                    16+ bits (typically 16)
long         Integer, abbreviation of long int                     32+ bits (typically 32 or 64)
long long    Integer, abbreviation of long long int                64+ bits (typically 64)
float        Floating point type                                   typically 32 bits
double       Floating point type with double precision             typically 64 bits
long double  Floating point type, at least as precise as double    typically 64, 80 or 128 bits
char         Character type, e.g. for ASCII characters             8+ bits (typically 8)
void         "Empty" type, mostly used as the return type of       n/a
             functions that return nothing. No objects can be of
             type void, but pointers to void (void*) exist.
             (The null pointer literal nullptr has its own type,
             std::nullptr_t.)
We can also add unsigned as a prefix to character or integer types, in order to free the number's sign bit. This results in exclusively non-negative numbers, with a maximum value of roughly twice that of the corresponding signed type.
As many of these data types' sizes and thus value ranges vary from system to system, one can use the class template std::numeric_limits from the <limits> header: std::numeric_limits<XYZ>::max() returns the maximum possible value and std::numeric_limits<XYZ>::lowest() the lowest possible value (replace "XYZ" with "int", "float", etc.). For floating point types, std::numeric_limits<XYZ>::min() returns the smallest positive normalized value instead. For integer types, "min" is equivalent to "lowest". More members of numeric_limits are listed in the standard library reference.
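
As a minimal sketch of how these limits can be queried, the following block stores a few of them in constants. The constant names are assumptions made for illustration; only std::numeric_limits and sizeof come from the text above.

```cpp
#include <limits>

// Hypothetical constants holding some limits of int and double.
constexpr int    int_max       = std::numeric_limits<int>::max();
constexpr int    int_lowest    = std::numeric_limits<int>::lowest();
constexpr double double_lowest = std::numeric_limits<double>::lowest(); // most negative finite value
constexpr double double_min    = std::numeric_limits<double>::min();    // smallest positive normalized value

// For integer types, min() and lowest() agree.
static_assert(std::numeric_limits<int>::min() == std::numeric_limits<int>::lowest());
// For floating point types they differ: min() is positive, lowest() is negative.
static_assert(double_min > 0.0 && double_lowest < 0.0);
// sizeof returns sizes in bytes; sizeof(char) is 1 by definition.
static_assert(sizeof(char) == 1);
```

The static_assert checks are evaluated at compile time, so the block documents these relations without any runtime cost.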

Note that all number literals have a data type. By default, integers are treated as int and floating point numbers as double. To enforce the use of other data types, suffixes can be used: 123u for unsigned int, 123l for long and 123ul for unsigned long (uppercase U and L can be used, too). Similarly, 3.14f denotes a float and 3.14l a long double (again, uppercase F and L are usable as well).
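
The effect of the suffixes can be checked at compile time. The following sketch assumes C++17 (for std::is_same_v) and uses decltype to obtain the type of each literal.

```cpp
#include <type_traits>

// decltype yields the type of a literal; std::is_same_v compares two types.
static_assert(std::is_same_v<decltype(123),   int>);
static_assert(std::is_same_v<decltype(123u),  unsigned int>);
static_assert(std::is_same_v<decltype(123l),  long>);
static_assert(std::is_same_v<decltype(123ul), unsigned long>);
static_assert(std::is_same_v<decltype(3.14),  double>);
static_assert(std::is_same_v<decltype(3.14f), float>);
static_assert(std::is_same_v<decltype(3.14L), long double>);
```

If one of the suffixes is changed or removed, the corresponding static_assert fails and the program no longer compiles.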

Operators
While there are many operators already implemented in C++, they only work on the fundamental data types. To be usable on new data structures and classes, they need to be overloaded, i.e. redefined for each type. Here are some examples of operators and how they are used:
Operator          Description                                       Examples
=                 Assignment operator.                              int a,b; a = b = 5;
+,-,*,/,%         Arithmetic operators; % denotes modulo.           int a = 5 * 4;
++,--             Increment and decrement operators,                int a = 1; a++; ++a; --a; a--;
                  increasing/decreasing by 1.
+=,-=,*=,/=,%=    Compound assignment operators that shorten        int a,b; a = b = 2; a += 7; b *= a;
                  some expressions.                                 (Equivalent to a = a + 7 and b = b * a.)
==,!=,<,>,<=,>=   Operators for comparing variables.                int a,b; bool c; a = 1; b = 2; c = (a != b);
                  Each returns a boolean.                           (c is true)
&&,||,!           "And" (&&), "or" (||), "not" (!) operators        bool a,b,c,d; a = true; b = !a; c = a && b; d = a || b;
                  on bool expressions.                              (b is false, c is false, d is true)
? :               Conditional operator, similar to if/else:         bool a,b,c; a = b = true; c = (a==b ? false : true);
                  "X ? Y : Z" translates to "if X is true,          (c is false)
                  then return Y, else Z."
,                 Allows for several expressions where normally     int a,b,c; a = (b = 3, c = 2, b+c);
                  only one is allowed. Only the rightmost returns   (a is 5)
                  a value, the rest are just evaluated.
::                Scope resolution operator; specifies which        int a = 1; namespace N { int b = ::a + 1; }
                  namespace a variable is to be taken from.         namespace M { int c = ::a + N::b; }
                                                                    (b is 2, c is 3)
sizeof()          Returns the size of a variable or data type       int a = sizeof(short);
                  (in bytes).                                       (a is typically 2. size_t can be used instead of int.)
&                 "Address-of" operator, added as a prefix to       int a = 1; int *b = &a;
                  identifiers.                                      (b holds the address of a)
*                 "Dereferencing" operator, accessing the value     *b = 2;
                  pointed to by a pointer.                          (Continuing the example for &: a is 2)
[]                Offset operator; dereferences pointer-like        int a[5]; a[3] = 3; (4th array value is 3.)
                  objects after adding an offset to them.           *(a+3) = 3; (Equivalent to the assignment above.)
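
To illustrate what overloading an operator for a new type can look like, here is a minimal sketch. The type Vec2 is a hypothetical example and not taken from any library.

```cpp
// Hypothetical 2D vector type, used only for illustration.
struct Vec2 {
    int x;
    int y;
};

// Overload + so that two Vec2 objects can be added component-wise.
Vec2 operator+(const Vec2& lhs, const Vec2& rhs) {
    return Vec2{lhs.x + rhs.x, lhs.y + rhs.y};
}

// Overload == so that two Vec2 objects can be compared.
bool operator==(const Vec2& lhs, const Vec2& rhs) {
    return lhs.x == rhs.x && lhs.y == rhs.y;
}
```

With these definitions, an expression like Vec2{1, 2} + Vec2{3, 4} == Vec2{4, 6} evaluates to true, just as the built-in operators would for int.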

Statements
Statements (e.g. declarations/expressions like int a = 1;) are the building blocks of every C++ program. In order to efficiently assemble them, their execution can be controlled or repeated by selection/iteration statements.
Selection Statements
As the name suggests, these commands are used to select statements. There are two types of selection statements: if/else and switch.

An if statement executes a statement only if a certain condition is met. If the condition is not met, an alternative statement can be executed with else.
Example:
int a,b,c; a = 1; b = 2;
if (a > b) c = 3;		//a is smaller than b, hence the statement "c=3" is not selected.
else c = 4;			//Instead, we get "c=4".

switch is somewhat similar to chained if/else statements. It takes an expression, e.g. an int variable, and checks whether any of the listed constant expressions (cases) matches its value. If a case matches, the statements following it are executed until the end of the switch or a break statement (see Jump Statements) is reached. Without a break, execution "falls through" into the statements of the following cases. To cover the possibility that none of the cases match, one can also add a default label, whose statements are executed if no case matches.
Example:
int a,b; a = 2;
switch (a) {
	case 1:			//If a==1...
		b = 1;		//Not executed, since a is not equal to 1.
		break;
	case 2:			//If a==2...
		b = 2;		//This statement will be executed.
		break;		//Jump to end of switch.
	case 3:
		b = 3;		//If there was no break in case 2, this statement would be executed, too.
		break;
	default:		//If none of the cases above matches...
		b = 0;		//Not executed here, since case 2 matched and ended with a break.
}
Note that the case expressions need to be constants:
int a,b,c; a = 2; c = 1;

switch (a) {
	case c:			//Error, does not compile... Would work with "const int c = 1", though.
		b = 1;
		...
}
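
The effect of leaving out break statements can be sketched as follows. The function name is a hypothetical helper, and the [[fallthrough]] attribute (available since C++17) only documents that the missing break is intentional; it does not change the behavior.

```cpp
// Hypothetical helper: counts how many blocks of the switch are executed
// for a given input when no break statements are used.
int executed_blocks(int a) {
    int count = 0;
    switch (a) {
        case 1:
            ++count;            // Runs for a == 1, then falls through.
            [[fallthrough]];
        case 2:
            ++count;            // Runs for a == 1 and a == 2, then falls through.
            [[fallthrough]];
        default:
            ++count;            // Runs for every input.
    }
    return count;
}
```

Here executed_blocks(1) yields 3, executed_blocks(2) yields 2, and any other input yields 1.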
Iteration Statements
Iteration statements repeat certain commands, as long as the specified expression has value true. There are four types of iterations: while, do while, for and for with range.

while loop example:
int a,i; a = i = 1;
while (i < 10) {		//As long as i is smaller than 10, i<10 returns true,
	a *= 2;			//a is doubled
	++i;			//and i is increased by 1. End result of a=2^9, i=10.
}
do while is almost the same as a while iteration. The only difference is that a do while loop executes its statements once before checking whether the specified condition is fulfilled, so the body always runs at least once. Example:
int a,i; a = i = 1;
do {
	a *= 2;			//Double a
	++i;			//and increase i by 1.
}
while ((i < 10) && (i > 1));	//If i is smaller than 10 and greater than 1, repeat the statements above.
				//Since the do statement was executed once before checking the condition,
				//i>1 is true and the loop runs until a=2^9 and i=10.
				//Note: Don't forget the semicolon after the condition.
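
The difference between the two loop types becomes visible when the condition is false from the start. The following sketch uses two hypothetical helper functions that count how often each loop body runs.

```cpp
// Both helpers start with i = 10, so the condition i < 10 is false right away.
int while_iterations() {
    int count = 0, i = 10;
    while (i < 10) { ++count; ++i; }       // Condition is checked first: the body never runs.
    return count;
}

int do_while_iterations() {
    int count = 0, i = 10;
    do { ++count; ++i; } while (i < 10);   // Body runs once before the condition is checked.
    return count;
}
```

The first function returns 0, the second returns 1.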
for is a more general version of a while loop. It does not only hold the loop condition, but also includes an initialization of the loop variable, as well as the incremental/decremental statement. Multiple initialization/increment statements are allowed and need to be separated with a ,. Example:
int a = 1;
for (int i = 1; i < 10; ++i) {	//Initialize; check condition; increment. ++i increments first and then returns i, 
	a *= 2;			//while i++ returns first and increments after. Again, we have a=2^9 and i=10 at the end of the loop.
}				//However, i is not accessible outside of the loop, here.
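
A for loop with several comma-separated initializations and increments could look like the following sketch. The function name is a hypothetical helper.

```cpp
// Hypothetical helper: counts the steps until two indices meet in the middle.
int steps_until_meeting() {
    int steps = 0;
    for (int i = 0, j = 10; i < j; ++i, --j) {  // Two loop variables, two updates per iteration.
        ++steps;
    }
    return steps;   // i and j meet at 5 after 5 iterations.
}
```

Note that all variables declared in the initialization part must share the same base type, here int.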
A ranged for iteration takes a variable that successively takes all the values covered by the range expression. This range expression can e.g. be an array, a class/struct (with begin() and end() member functions) or simply an initializer list in curly braces (e.g. {4, 9, 3, 5}). Example:
int a = 1;
for(int i : {4, 9, 3, 5}){	//Variable i takes all values in the initializer list, one after another.
	a *= i;			//Multiply all list values with a. End result: a=540.
}
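
When the loop variable is declared as a reference, a ranged for loop can also modify the elements it visits. A minimal sketch, with a hypothetical helper function:

```cpp
// Hypothetical helper: doubles every element of a local array in place
// and returns the sum of the doubled values.
int sum_of_doubled() {
    int values[4] = {4, 9, 3, 5};
    for (int& v : values) {   // int& refers to the array element itself, not a copy.
        v *= 2;
    }
    int sum = 0;
    for (int v : values) {    // Plain int copies each element; fine for reading.
        sum += v;
    }
    return sum;               // 8 + 18 + 6 + 10
}
```

Without the & in the first loop, only copies would be doubled and the array would stay unchanged.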
Jump Statements
Jump statements affect the order in which commands are executed or allow skipping the code that follows. There are four types of jump statements: return, break, continue and goto.

return statements are not really used to affect the order of statement execution, but mostly to return the desired output of functions. Once a return statement is executed, the expression after return is evaluated and the current function call is terminated, possibly with a returned value from the evaluated expression. Example:
int foo(int x) {		//Function that returns -1 for negative and 1 for non-negative integer input.
	if (x < 0) {
		return -1;	//If the input is smaller than 0, -1 is returned and the function call ends.
	}
	return 1;		//1 is returned, if the return statement before wasn't triggered, i.e. if x >= 0.
}
break is used to terminate iteration statements or switch commands prematurely. Example:
for (int i = 1; i > 0; i++) {	//Technically infinite loop, since i is steadily increased. 
	if (i == 10) {		//When integer overflow is reached, this would result in undefined behavior.
		break;		//However, the loop ends, once i is 10, due to the break command.
	}
}
A continue command is similar to break, but instead of ending the whole loop, it just causes a jump to the end of the current iteration, without executing the rest of its code. Example:
int a;
for (int i = 1; i < 20; i++) {
	if (i != 14) continue;	//Unless i is 14, jump to the end of the current iteration.
	a = i;			//Expression is only evaluated for i=14, so a=14 after the loop.
}
With goto, one can jump to any label in the current function. Labels are defined by adding a colon after an identifier (e.g. label1:). When goto jumps out of the scope of one or more variables, their destructors are called in reverse order of the variables' construction. Special care is needed when jumping into the scope of variables: Unless these are fundamental type variables without initialization or class types with trivial constructor and destructor and without initialization, the code will not compile. Example:
	goto label1;		//Jumping to label1 is ok: it only skips the declaration of int a, which has no initializer.
	int a;
label1:
	int b = 1;
label2:				//Program would not compile if we jumped to label2 instead of label1, since b is initialized.
	for (int i = 1; i < 20; i++) {
		if (i == 14) goto label3;	//Jump to label3, once i=14. The loop terminates, since label3 is outside the loop.
		a = i;		//In the end, we have a=13.
	}
label3: ;			//In C++17, a label has to be followed by a statement; here, an empty one is used.

Arrays
Arrays can be defined/read/written by using [] brackets. Example: int a[3]; yields an array of 3 integers, which can then be set by a[0]=23; a[1]=-34; a[2]=9; (array indices always start at 0). Alternatively, it could have been initialized by using curly brackets: int a[3]={23,-34,9}; ("=" sign may be left out).
Multi-dimensional arrays are declared similarly: int a[2][4];, which can be interpreted as a 2x4 matrix. Its values are accessed as follows: a[0][3]=3, setting the value in the "first row, fourth column" to 3.
It should be noted that the number in an array declaration only specifies the array's size (see: sizeof() operator); indices are not checked against it when accessing/writing values. For example, after declaring and initializing an array of 2 integers, one could still access the (nonexistent) 5th value of the array, resulting in undefined behavior: int a[2] = {1,2}; int b = a[4];. This is due to the way the offset operator works: it merely adds the offset to the array's start address. Compiling such a program typically won't yield an error, but may lead to problems during runtime.
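
One way to stay within the valid index range is to compute the number of elements with sizeof. The following is a minimal sketch with hypothetical helper functions; it only works where the array itself (not a pointer to it) is visible.

```cpp
#include <cstddef>

// Hypothetical helper: number of elements of a local array, computed with sizeof.
std::size_t element_count() {
    int a[5] = {1, 2, 3, 4, 5};
    return sizeof(a) / sizeof(a[0]);   // total bytes / bytes per element
}

// Using only the indices 0..count-1 avoids undefined behavior.
int last_element() {
    int a[5] = {1, 2, 3, 4, 5};
    std::size_t count = sizeof(a) / sizeof(a[0]);
    return a[count - 1];
}
```

Both functions return 5 here: the array has five elements and its last value happens to be 5.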

When an array is used as a function parameter, the size of its first dimension is not part of the parameter type: specifying an array size doesn't yield a different result, as long as the size is valid (i.e. an integer > 0). The function doesn't receive a copy of the full array, which is potentially very large, but just a pointer to the start of the array. When working with multi-dimensional arrays, all dimensions but the first have to be specified, though.
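
Since only a pointer arrives in the function, the number of elements usually has to be passed along separately. A minimal sketch, where sum and sum_rows are hypothetical helper functions:

```cpp
#include <cstddef>

// The parameter "const int a[]" is treated like "const int* a": only a pointer
// is copied, so the number of elements n is passed as a second argument.
int sum(const int a[], std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += a[i];
    return total;
}

// For multi-dimensional arrays, every dimension except the first must be given.
int sum_rows(const int m[][4], std::size_t rows) {
    int total = 0;
    for (std::size_t r = 0; r < rows; ++r) total += sum(m[r], 4);
    return total;
}
```

For example, calling sum with the array {23, -34, 9} and n = 3 returns -2.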

Pointers
Contrary to variables of fundamental data type, pointers don't save actual values, but the address of another variable in the computer's memory. They can only point to one specific data type (with the exception of void type pointers) and one variable at a time. We can declare them by adding an asterisk * between the data type and identifier. Note that this asterisk is not the dereference operator and is only used to declare a pointer. The dereference operator is used to access the value of the variable the pointer points to. To obtain an address for initializing a pointer, the address-of operator & is added as a prefix to the identifier of a variable with the fitting data type. Example:
int a, b, c;
a = b = 1; c = 4;

int *p1, *p2;		//Declare pointers. Operator placement and whitespaces don't matter, 
int* p3, *   p4;	//as long as it's between the data type and identifier.
int* p5, p6;		//Note that this declares one pointer (p5) and one integer (p6), since there is no * before p6!

p1 = &a; p2 = &b; p3 = &c;	//Initialize some pointers.
*p1 = 2;		//Change value of a to 2 with the dereference operator *.
*p2 = c;		//p2 points to b, which is changed to the value of c.
*p3 = *p1;		//Value of c changed to value of a.
p4 = p1;		//Pointer p4 points to the same address as p1.
p5 = &(*p2);		//Dereference and address-of operator cancel each other out. Parentheses not required.
//End values: a=2, b=4, c=2.
Pointers also allow for basic addition/subtraction of integers. Incrementing a pointer by one makes it jump to the memory address right after the variable it points to, while keeping the pointer type. Example: When working with a pointer to a short variable, incrementing skips sizeof(short) bytes of memory (typically two). Since the pointer is still of type short*, dereferencing it would read that many bytes as a short, even if there is no short variable saved there, which is undefined behavior. Thus, pointer manipulations like this should be handled with care.
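
Pointer arithmetic is well-defined as long as the pointer stays within one array. The following sketch uses two hypothetical helper functions to show addition and subtraction.

```cpp
#include <cstddef>

// Hypothetical helper: moves a pointer through an array with pointer arithmetic.
int third_element() {
    short values[4] = {10, 20, 30, 40};
    short* p = values;   // Points to values[0].
    p = p + 2;           // Advances by 2 * sizeof(short) bytes, now at values[2].
    return *p;
}

// Subtracting two pointers into the same array yields their distance in elements.
std::ptrdiff_t element_distance() {
    short values[4] = {10, 20, 30, 40};
    short* first = &values[0];
    short* last  = &values[3];
    return last - first; // Counted in elements, independent of sizeof(short).
}
```

third_element() returns 30 and element_distance() returns 3.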

As noted in the array section above, arrays are passed to functions as pointers. Generally, arrays and pointers are mostly equivalent in the way they can be used to access or write values. Both can be dereferenced; for arrays, this yields the value of the first array entry. The offset operator also works for both, but is mostly used for arrays, since the array entries are saved consecutively in memory. The biggest difference, however, is that pointers can be redirected to different memory locations, while arrays are fixed to one memory block and cannot be reassigned. Arrays also contain information about their size, while a similarly declared pointer doesn't.


In case we want to use pointers, but don't want the original variables to be changeable through them, we can declare the pointers with const. It can be added either as a prefix before the asterisk, for a pointer that can't change the original value (e.g. const int *a; or equivalently int const * a;), or as a suffix after the asterisk, in order to specify a constant pointer fixed to one variable (e.g. int * const a;, which has to be initialized right away). Both of these const types can be combined for a fixed pointer that can't change the variable's value (e.g. const int * const a;).
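
The three const variants can be compared in a short sketch. The function and variable names are hypothetical; the commented-out assignments mentioned in the comments are the ones the compiler would reject.

```cpp
// Hypothetical helper demonstrating the three kinds of const pointers.
int const_pointer_demo() {
    int x = 1, y = 5;

    const int* read_only = &x;   // Pointer to const int: "*read_only = 2;" would not compile.
    read_only = &y;              // The pointer itself may be redirected.

    int* const fixed = &x;       // Const pointer to int: "fixed = &y;" would not compile.
    *fixed = 3;                  // The pointed-to value may be changed, so x is now 3.

    const int* const both = &y;  // Neither the pointer nor the value can be changed via both.

    return x + *read_only + *both;   // 3 + 5 + 5
}
```

The function returns 13, since x was changed to 3 through fixed and both other pointers refer to y.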


When a pointer is passed to a function by value, it is copied, so the caller's pointer remains unchanged after the function call, even if the function assigns a new address to its parameter. If we want a function to change the pointer itself, we can pass it by reference (writing int * &ptr as function parameter) or use a pointer to said pointer instead. Pointers to pointers are declared simply by adding a second asterisk, e.g. int *a; int **b = &a;.
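
The three ways of handing a pointer to a function can be sketched as follows. All function names are hypothetical helpers; each one tries to redirect a pointer to the address given as target.

```cpp
// Passed by value: only the local copy of the pointer is changed.
void by_value(int* ptr, int* target)      { ptr = target; }

// Passed by reference: the caller's pointer is changed.
void by_reference(int*& ptr, int* target) { ptr = target; }

// Pointer to pointer: the caller's pointer is changed through dereferencing.
void by_pointer(int** ptr, int* target)   { *ptr = target; }
```

Given int a = 1, b = 2; int* p = &a;, the call by_value(p, &b) leaves p pointing to a, while by_reference(p, &b) and by_pointer(&p, &b) both make p point to b.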

Lastly, it should be mentioned that there's a pointer type with unspecified underlying data type: void pointers. While these allow for more flexibility, since they can point to any data type, they also can't be dereferenced, since they don't carry the information of the size of a variable's memory block. However, the void pointer can be converted to point to the desired data type. Example:
#include <iostream>

int main()
{
	int a = 123; char b = 'b'; float c = 3.14f;
	void *p = &a;					//Declare void pointer, initialize to the address of integer a.
	std::cout << *static_cast<int*>(p) << "\n";	//Convert void to int pointer, in order to dereference.
	p = &b;
	std::cout << *static_cast<char*>(p) << "\n";	//Convert void to char pointer.
	p = &c;
	std::cout << *static_cast<float*>(p);		//Convert void to float pointer.

	return 0;
}
Output:
123
b
3.14 

Namespaces
Variables have varying accessibility, depending on where they are defined. When declared outside of any block, they are globally accessible by name. In contrast to this, variables declared inside blocks (e.g. function definitions) are only visible there and can not be accessed from outside the block. Namespaces pose a middle ground to these cases: While variables or functions declared in namespaces are globally accessible, they can not directly be called by their name, without specifying the namespace that contains them. Example:
int a = 1;			//a is a global variable

namespace N {
	int b = 2;		//b is in namespace N and can be accessed by N::b globally.
}

void foo() {
	int c = a;		//c is a local variable. Since a is globally accessible, c=1.
}

int b = N::b + 1;		//Global variable b, different from the one declared in namespace N.
// "int b = c" would not have worked, since c is only defined inside the function foo()
If we want to make a variable or function declared in a namespace directly accessible outside of it, the keyword using can be applied. When the whole namespace is to be included, we can write using namespace. Example:
#include <iostream>

namespace N {
	int b = 2;
	int c = 3;
	void foo() {
		std::cout << 1234 << "\n";	//Prints 1234 onto console and starts new line.
	}
}

int main() {
	using N::foo;				//Makes N::foo usable as foo in this scope.
	foo();					//Executes function foo. No scope resolution operator :: required.
	using namespace N;			//Includes the rest of the declarations in namespace N.
	std::cout << b << ", " << c << "\n";	//Prints b and c to the console.

	return 0;
}
One has to be careful when including large namespaces with lots of commonly used identifiers, since careless use can easily lead to name collisions.
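
A name collision of this kind can be sketched with two small namespaces that both define a function called value. The namespaces and functions are hypothetical examples.

```cpp
namespace N { int value() { return 1; } }
namespace M { int value() { return 2; } }

int sum_qualified() {
    // After "using namespace N; using namespace M;", an unqualified call to
    // value() would be ambiguous and fail to compile.
    // Qualified names with the scope resolution operator avoid the collision.
    return N::value() + M::value();
}
```

sum_qualified() returns 3, because each call explicitly names the namespace it refers to.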