Python Essentials
My Essential Python Notes
Working with Files in Python Cheatsheet
Working with files in files is syntatically intuitive. There are are built-in functions and there are also modules like os
that can be imported to achieve such tasks.
We can start with with some well-known built-in functions:
open()
A built-in function that returns a file object. The second parameter can be either of the following:
-
r:
- Reads the file (selected by default).
- It will return an exception if the file does not exist.
-
r+:
- Reads and appends the file
- It will return an exception if file does not exist.
-
a:
- Can only appends text to the file. It cannot read the file.
- File will be created if it does not exist.
-
a+:
- Appends and reads the file.
- File will be created if it does not exist.
-
w:
- Overwrites the file.
- File will be created if it does not exist.
-
x:
- Creates the specified file but cannot read the file, it can only write to the file.
- It will return an exception if file exists.
-
x+:
- Creates the specified file but can read and write the file.
- It will return an exception if file already exists.
Based on the modes above, it may return the exception FileNotFoundError
.
Note: Do not be confused open()
with os.open()
. os.open()
returns a file descriptor while open()
returns a file object.
write()
It is a built-in function in Python.
If the mode in open()
is set to r+
, w
, or a
, write()
can be called.
Otherwise, it will complain that the file is not meant to be written:
io.UnsupportedOperation: not writable
read()
Reads the whole file all at once. It is ideal for short texts.
file = open("text.txt", "r")
content = file.readline()
readline()
Reads a file line by line with every call. If the function is used inside a loop, it will read the whole line including the newline character (notice the end in the print function).
Ex:
file = open("text.txt", "r")
# This will read only 5 lines.
for i in range(0, 5):
print(file.read(), end='\0')
The two ways to open a file
What you might notice is that a file can be opened either like this:
file = open("text.txt", "r")
file.read()
file.close()
Or this:
with open("text.txt") as file
print(file.read())
The latter has two advantages:
- It is more concise.
- It manages exceptions better when the files are closed.
Some os module functions to know of include:
-
os.remove(): Removes the specified file in the current directory.
-
os.rename(): Renames the specified file ("src") to a new one ("dst").
-
os.mkdir(): Creates a new directory in the current working directory (cwd).
-
os.chdir(): Changes the current working directory to the specified directory. It can either be relative or absolute paths.
-
os.rmdir(): Deletes the specified directory (it will only work if it is empty)
-
os.listdir(): Lists all the files and subdirectories in a specified directory. It returns a list of files/dirs found in the cwd.
-
os.getcwd(): Returns the name of the cwd.
-
os.pardir: It is a constant string that gives us the cwd's parent directory name (Note that
os.chdir("..").getcwd()
does not work).
The os.path
submodule is reserved for path related functions. Almost all of the function's arguments accept pathnames.
Such functions include:
- os.path.exists()
- Checks if the path in question exists.
- os.path.isdir()
- Checks if the directory in the cwd exists.
- os.path.isfile()
- Checks if the file specified exists in the cwd.
- os.path.getsize()
- Returns the size of the specified file in bytes.
- os.path.abspath()
- Returns the absolute path of the file specified.
- os.path.join()
- Returns a string of a concatenated relative path with either a forward slash (/) or a backslash (). It is used for OS compatibility purposes.
- os.path.getmtime(): Returns a Unix time of a particular path passed.
The datetime
module involves manipulating date and time.
There is a datetime object in the datetime module (datetime.datetime
).
- datetime.datetime.timestamp(): Returns a Unix time of a particular file or directory passed since its creation.
Working with CSV Files
There is a csv module to parse csv files.
Such well-known functions in the csv module include
- csv.reader(): Returns a list from each row in the csv text file.
- csv.writer(): Returns an instance of a csv writer class.
- csvwriter.writerow(row): Writes the a single row at a time (even if there are multiple rows in the list, it will be treated as a single row and the brackets are included. It is ideal if called inside a loop).
- csvwriter.writerows(row): Writes multiple rows at a time.
- csv.DictReader(): Reads csv file and creates a dict object in each order.
- csv.DictWriter(): Writes a dict object to a csv file. The fieldname parameter is optional.
DictReader creates an object that operates like a regular reader (an object that iterates over lines in the given CSV file), but also maps the information it reads into a dictionary where keys are given by the optional fieldnames parameter. If we omit the fieldnames parameter, the values in the first row of the CSV file will be used as the keys. So, in this case, the first line of the CSV file has the keys and so there's no need to pass fieldnames as a parameter.
Regex stuff in Python
Functions
Checkout the re module. It is built-in in Python.
We use the re module. Inside the re module there are functions like:
re.search()
: Finds the first match anywhere in the string.re.match()
: Finds the match only at the BEGINNING of the string.re.fullmatch()
: Checks for entire string to be a match. It does not return a substring.re.findall()
: It is likere.search
, but it finds all the matches anywhere in the string. It returns a list or tuples.
Their differences are outlined here
They all return a re.Match
object:
There are functions for the re.Match
object.
Match.groups: Returns a tuple of groups if the parentheses ()
exist in the passed regex (like (FirstWord,)
). Otherwise, it will return ()
.
The Match object can be treated as an array where index 0 is the fully matched string while the latter indices are the groups.
Special characters
Some special characters to note are:
-
.
(dot): Accepts any single character except for a newline character (unless if dotall has been specified) -
^
(caret): Matches at the start of the string. If inside the[]
, it means "not the following". -
$
: Matches at the end of the string. -
[]
: If the reggex is with other characters, it is like.
. But the only difference is that it has to match the characters passed. For example:[Pp]
: P or p[0-9]
: 0 to 9|
: A match that either this substring or that substring. Ex:cat|dog
.[a-z]
: only lowercase a to z[A-Z]
: only uppercase A to Z
-
()
: Creates a group. It is used to make tuples with match.groups()
Note that [aA-zZ]
is valid while [Aa-Zz]
is invalid as the ASCII character 'a' comes before 'A' and not the other way around.
*
(asterisk): Matches everything including repetitions
+
: Matches one or more occurrrences of the character that comes before it.?
: Matches zero or more occurrences of the character that comes before it.
Escaping/Matching characters
\
(escape character): Escapes the wildcard characters. It can escape a special regex character or a special string character. It is for these reasons that raw strings are important.\w
: Matches alphanumeric characters. It matches letters, numbers, and underscores but NOT whitespace characters. It is equivalent to [a-zA-Z0-9_]\s
: Matches whitespace characters. It is equivalent to [ \t\n\r\f\v]\d
: Matches digits. It is equivalent to [0-9]\b
: Matches word boundaries.
Check regex101.com for more.
Rawstrings, IGNORECASE, and DOTALL
Rawstrings do not accept any special characters. We specify it using the letter r
before the string.
Ex:
result = re.search(r"ion", "occupation")
print(result)
# <re.Match object; span=(7, 10), match='ion'>
It is highly recommended to use rawstrings for regex stuff.
We can pass in values like IGNORECASE and DOTALL in our third parameters.
IGNORECASE: Ignores the difference between uppercase and lowercase letters.
DOTALL: The .
special can take a newline character if passed.
Environment Variables
To access the environment variables, we type in os.environ
.
os.environ
will return a mapping object. It is an instance of os._Environ
class which itself is a subclass of collections.abc.MutableMapping
. It is made to behave like dict
but it is not related to the dict
type.
If a variable passed exists in our environment (type env
to see all the environment variables), we can type:
os.environ["PATH"]
This returns the paths for the PATH variable.
If the variable did not exist, it will print to stderr
. It will complain that the key does not exist.
A workaround for there is the get
function. It is written like so.
os.environ.get("PATH")
It can accept two arguments. Here, the second argument is assigned to None
. It will return by default if a particular variable did not exist unless states othewise..
Exit Status
See GNU's website for more info.
In general:
0: Successful 1: Not successful
The more you know!
You can use the inspect
module to find the source file and the documentation of a particular function.
Ex:
print(inspect.getfile(os.environ.get))
This will give us /usr/lib/python3.10/_collections_abc.py
. It's very neat. Check out the inspect
module for more.
subprocess modules
It is used to run Linux/Windows commands.
subprocess.run
sends ICMP packets that are executed within a script.
Ex:
subprocess.run(["ls", "-l"])
If we want to manually check the exit status. The subprocess
module has a variable called returncode
. We use it as such:
result = subprocess.run(["ls", "This_file_does_not_exist"])
print(result.returncode)
# Prints 2
If we want to take a "screenshot" of the output when we passed in our commands, we need to set capture_output to true. We set to true in order to order the use of attributes stdout and stderr.
We use it like so:
result = subprocess.run(["host", "8.8.8.8"], capture_output=True)
# Now it's stored in the stdout attribute as it won't output an error.
#Print the stdout
print(result.stdout)
# Returns
# b'8.8.8.8.in-addr.arpa domain name pointer dns.google.\n'
Note that the letter "b" is meant to say, "This is not a proper string, it is an array of bytes"
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes. link
In order to translate an arary of bytes into a regular Python string, we use the decode()
function. It will turn into an UTF-8 encoding by default.
print(result.stdout.decode())
# Prints (including the newline character)
#b'8.8.8.8.in-addr.arpa domain name pointer dns.google.
#
The get() function in the object dict
According to pydoc:
| get(self, key, default=None, /)
| Return the value for key if key is in the dictionary, else default.
And the Python manual
The dict datatype in Python has a get function. We use this function in order to avoid errors.
If the second parameter is not passed, it will return None
.
If the key does not exist in the dictionary, it will return the 2nd parameter if passed.
usernames = {}
name = "good_user"
usernames[name] = usernames.get(name, 0) + 1
print(usernames)
The sys module
Let's say you want to take in command line arguments before running a Python file. In C, we pass these in our main function:
int main(int argc, char* argv[]){
return 0;
}
We Python, we simply import sys
:
import sys
first_arg = sys.argv[0] #Returns the filename
second_arg = sys.argv[1] #Returns the passing argument (if it exists)
#Note that if the 2nd argument is not passed, it will return an IndexError exception
print("This is the first arg", first_arg)
print("This is the second arg", second_arg)