Python Essentials

My Essential Python Notes

Garen Ikezian published on

Working with Files in Python Cheatsheet

Working with files in Python is syntactically intuitive. There are built-in functions, and there are also modules like os that can be imported for such tasks.

We can start with some well-known built-in functions:

open()

A built-in function that returns a file object. The second parameter is the mode, which can be any of the following:

  • r:

    • Reads the file (selected by default).
    • It raises FileNotFoundError if the file does not exist.
  • r+:

    • Reads and writes the file. Writing starts at the beginning of the file and overwrites existing content in place; the file is not truncated.
    • It raises FileNotFoundError if the file does not exist.
  • a:

    • Can only append text to the file. It cannot read the file.
    • File will be created if it does not exist.
  • a+:

    • Appends and reads the file.
    • File will be created if it does not exist.
  • w:

    • Overwrites the file (existing content is discarded). It cannot read the file.
    • File will be created if it does not exist.
  • x:

    • Creates the specified file but cannot read the file, it can only write to the file.
    • It raises FileExistsError if the file already exists.
  • x+:

    • Creates the specified file and can both read and write the file.
    • It raises FileExistsError if the file already exists.

Depending on the mode, open() may raise FileNotFoundError (r and r+) or FileExistsError (x and x+).

Note: Do not confuse open() with os.open(). os.open() returns a file descriptor while open() returns a file object.
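To see a few of the modes in action, here is a minimal sketch. It works inside a temporary directory, and the file name notes.txt is only an assumption for illustration:

```python
import os
import tempfile

# Work in a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    # "r" on a missing file raises FileNotFoundError.
    try:
        open(path, "r")
        missing_error = None
    except FileNotFoundError as err:
        missing_error = type(err).__name__

    # "x" creates the file, and fails if it is already there.
    with open(path, "x") as f:
        f.write("first\n")
    try:
        open(path, "x")
        exists_error = None
    except FileExistsError as err:
        exists_error = type(err).__name__

    # "a" appends to the end; "w" would discard the old content instead.
    with open(path, "a") as f:
        f.write("second\n")

    with open(path, "r") as f:
        content = f.read()

print(missing_error, exists_error)
print(content)
```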

write()

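It is a method of the file object returned by open(), not a standalone built-in function.

write() can be called if the mode in open() allows writing: r+, w, a, x, or any of the other + modes (w+, a+, x+).

Otherwise, it raises an exception saying that the file is not writable: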

io.UnsupportedOperation: not writable
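A small sketch of both cases, using a temporary file (the file name is an assumption): write() succeeds in w mode and raises io.UnsupportedOperation in r mode.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text.txt")  # hypothetical file name

    # In "w" mode, write() works and returns the number of characters written.
    with open(path, "w") as f:
        chars_written = f.write("hello")

    # In "r" mode, the same call is refused.
    with open(path, "r") as f:
        try:
            f.write("more")
            write_error = None
        except io.UnsupportedOperation as err:
            write_error = str(err)

print(chars_written)
print(write_error)
```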

read()

Reads the whole file all at once. It is ideal for short texts.

    file = open("text.txt", "r")

    content = file.read()

readline()

Reads one line of the file per call. Each returned line includes its trailing newline character, which is why the example passes end='' to print(): it avoids printing a second newline.

Ex:

    file = open("text.txt", "r")
    # This will read only the first 5 lines.
    for i in range(0, 5):
        print(file.readline(), end='')
    file.close()

The two ways to open a file

What you might notice is that a file can be opened either like this:

    file = open("text.txt", "r")
    file.read()
    file.close()

Or this:

    with open("text.txt") as file:
        print(file.read())

The latter has two advantages:

  • It is more concise.
  • It closes the file automatically when the block ends, even if an exception is raised inside it.
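The following sketch checks the behaviour around exceptions. The temporary file and the deliberately raised ValueError are assumptions made just for the demonstration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "text.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("some text\n")

    try:
        with open(path) as file:
            handle = file  # keep a reference so we can inspect it afterwards
            raise ValueError("something went wrong while reading")
    except ValueError:
        pass

    # The with block closed the file even though an exception escaped it.
    closed_after_error = handle.closed

print(closed_after_error)
```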

Some os module functions to know of include:

  • os.remove(): Removes the file at the specified path.

  • os.rename(): Renames the specified file or directory ("src") to a new name ("dst").

  • os.mkdir(): Creates a new directory at the specified path. A relative path is resolved against the current working directory (cwd).

  • os.chdir(): Changes the current working directory to the specified directory. The path can be either relative or absolute.

  • os.rmdir(): Deletes the specified directory (it will only work if the directory is empty).

  • os.listdir(): Returns a list of the names of the files and subdirectories in the specified directory. If no path is passed, it lists the cwd.

  • os.getcwd(): Returns the path of the cwd.

  • os.pardir: A constant string ("..") that refers to the parent directory. (Note that os.chdir("..").getcwd() does not work, because os.chdir() returns None. Call os.chdir("..") and then os.getcwd() as separate statements.)
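Several of these functions can be tried together in a temporary directory. This is only a sketch; the directory and file names are made up:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    sub_dir = os.path.join(tmp_dir, "reports")    # hypothetical names
    old_path = os.path.join(sub_dir, "draft.txt")
    new_path = os.path.join(sub_dir, "final.txt")

    os.mkdir(sub_dir)
    with open(old_path, "w") as f:
        f.write("data\n")

    os.rename(old_path, new_path)
    listing = os.listdir(sub_dir)

    os.remove(new_path)
    os.rmdir(sub_dir)  # works because the directory is empty again
    remaining = os.listdir(tmp_dir)

print(listing, remaining, os.pardir)
```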

The os.path submodule is reserved for path-related functions. Almost all of its functions accept pathnames as arguments.

Such functions include:
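A few os.path functions that come up often, shown as a small sketch. The selection and the example path are illustrative assumptions:

```python
import os.path

path = os.path.join("logs", "2024", "app.log")  # hypothetical relative path

file_name = os.path.basename(path)   # last component of the path
parent = os.path.dirname(path)       # everything before the last component
stem, extension = os.path.splitext(file_name)

print(file_name, stem, extension)
print(parent)
print(os.path.isabs(path))           # False, since the path is relative
print(os.path.exists(path))          # True only if such a file really exists
```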

The datetime module is used to manipulate dates and times.

The module contains a class with the same name, datetime (datetime.datetime).
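As a brief sketch of what the datetime class can do, the example below builds a date, shifts it with timedelta, and converts it to and from text. The date itself is arbitrary:

```python
from datetime import datetime, timedelta

start = datetime(2024, 1, 15, 9, 30)  # arbitrary example date
deadline = start + timedelta(days=10, hours=2)

# strftime() turns a datetime into text; strptime() parses text back.
formatted = deadline.strftime("%Y-%m-%d %H:%M")
parsed = datetime.strptime("2024-01-25 11:30", "%Y-%m-%d %H:%M")

print(formatted)
print(parsed == deadline)
```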

Working with CSV Files

There is a csv module to parse CSV files.

Well-known functions and classes in the csv module include:

  • csv.reader(): Returns a reader object that yields each row of the CSV file as a list of strings.
  • csv.writer(): Returns a writer object.
  • csvwriter.writerow(row): Writes a single row. The argument is always treated as one row, so passing a list of rows writes each inner list, brackets included, as a single field. To write several rows with writerow(), call it inside a loop.
  • csvwriter.writerows(rows): Writes multiple rows at a time.
  • csv.DictReader(): Reads a CSV file and yields a dict for each row.
  • csv.DictWriter(): Writes dicts to a CSV file. Unlike in DictReader, the fieldnames parameter is required.

DictReader creates an object that operates like a regular reader (an object that iterates over lines in the given CSV file), but also maps the information it reads into a dictionary where keys are given by the optional fieldnames parameter. If we omit the fieldnames parameter, the values in the first row of the CSV file will be used as the keys. So when the first line of the CSV file already holds the keys, there is no need to pass fieldnames as a parameter.
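Here is a minimal round trip with DictReader and DictWriter. To keep it self-contained, it reads from and writes to io.StringIO objects instead of real files, and the sample data is made up:

```python
import csv
import io

csv_text = "name,department\nAlice,IT\nBob,Sales\n"  # made-up sample data

# DictReader takes its keys from the header row when fieldnames is omitted.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# DictWriter needs fieldnames to know which columns to write, and in what order.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["name", "department"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

print(rows[0]["name"], rows[1]["department"])
print(output.getvalue() == csv_text)
```

With a real file, the csv documentation recommends opening it with newline="" so that the module can handle line endings itself.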

Regex stuff in Python

Functions

Check out the re module. It is part of Python's standard library.

Inside the re module there are functions like:

  • re.search(): Finds the first match anywhere in the string.
  • re.match(): Finds a match only at the BEGINNING of the string.
  • re.fullmatch(): Checks whether the entire string is a match. It does not match a substring.
  • re.findall(): It is like re.search(), but it finds all the matches anywhere in the string. It returns a list of strings, or a list of tuples if the pattern has more than one group.

Their differences are outlined here

re.search(), re.match(), and re.fullmatch() return a re.Match object, or None if there is no match. re.findall() returns a list instead.

The re.Match object has its own methods.

Match.groups(): Returns a tuple of groups if the parentheses () exist in the passed regex (like ('FirstWord',)). Otherwise, it will return ().

The Match object can be indexed like an array, where index 0 is the fully matched string and the following indices are the groups.
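The four functions can be compared side by side on one string. This is just a sketch; the sample text and patterns are made up:

```python
import re

text = "Error 404: page not found"  # made-up sample text

first = re.search(r"(\w+) (\d+)", text)       # first match anywhere
at_start = re.match(r"\d+", text)             # must match at position 0
whole = re.fullmatch(r"Error \d+: .*", text)  # must cover the entire string
all_words = re.findall(r"[a-z]+", text)       # every match, as a list

print(first[0], "|", first[1], "|", first[2])
print(first.groups())
print(at_start)             # None, because the string does not start with a digit
print(whole is not None)
print(all_words)
```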

Special characters

Some special characters to note are:

  • . (dot): Matches any single character except for a newline character (unless DOTALL has been specified).

  • ^ (caret): Matches at the start of the string. As the first character inside [], it means "not the following".

  • $: Matches at the end of the string.

  • []: A character class. Like ., it matches a single character, but only one of the characters listed inside the brackets. For example:

    • [Pp]: P or p
    • [0-9]: 0 to 9
    • [a-z]: only lowercase a to z
    • [A-Z]: only uppercase A to Z
  • |: Matches either this substring or that substring. Ex: cat|dog.
  • (): Creates a group. Groups fill the tuple returned by match.groups().

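Note that [aA-zZ] is valid while [Aa-Zz] is invalid. In ASCII, 'A' (65) comes before 'a' (97), so the range A-z is in ascending order while the range a-Z is reversed. Even so, [aA-zZ] also matches the punctuation characters that sit between 'Z' and 'a', so [a-zA-Z] is the safer way to match letters.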

  • * (asterisk): Matches zero or more occurrences of the character that comes before it.
  • +: Matches one or more occurrences of the character that comes before it.
  • ?: Matches zero or one occurrence of the character that comes before it.
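The quantifiers and the character ranges are easy to test on a handful of strings. A small sketch, with made-up sample strings:

```python
import re

samples = ["ct", "cat", "caat"]  # made-up test strings

star = [s for s in samples if re.fullmatch(r"ca*t", s)]      # zero or more "a"
plus = [s for s in samples if re.fullmatch(r"ca+t", s)]      # one or more "a"
optional = [s for s in samples if re.fullmatch(r"ca?t", s)]  # zero or one "a"

# [aA-zZ] compiles, but its A-z range also covers "_" and other punctuation.
loose_class_hit = re.fullmatch(r"[aA-zZ]", "_") is not None

# [Aa-Zz] is rejected because the range a-Z runs backwards.
try:
    re.compile(r"[Aa-Zz]")
    reversed_range_error = None
except re.error as err:
    reversed_range_error = str(err)

print(star, plus, optional)
print(loose_class_hit)
print(reversed_range_error)
```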

Escaping/Matching characters

  • \ (escape character): Escapes special characters. The backslash is special both in regex syntax and in ordinary Python string literals, which is why raw strings are important.
  • \w: Matches word characters: letters, digits, and underscores, but NOT whitespace characters. In ASCII mode (re.ASCII) it is equivalent to [a-zA-Z0-9_]; by default it also matches Unicode letters and digits.
  • \s: Matches whitespace characters. In ASCII mode it is equivalent to [ \t\n\r\f\v]
  • \d: Matches digits. In ASCII mode it is equivalent to [0-9]
  • \b: Matches word boundaries.
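A short sketch of these escapes on a made-up log line:

```python
import re

line = "admin_7 signed in at 10:35"  # made-up log line

words = re.findall(r"\w+", line)    # runs of letters, digits, and underscores
numbers = re.findall(r"\d+", line)  # runs of digits
pieces = re.split(r"\s+", line)     # split on whitespace

# Without \b, "in" is first found inside "admin"; with \b, only the whole word counts.
plain_position = re.search(r"in", line).start()
word_position = re.search(r"\bin\b", line).start()

print(words)
print(numbers)
print(pieces)
print(plain_position, word_position)
```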

Check regex101.com for more.

Rawstrings, IGNORECASE, and DOTALL

Raw strings do not treat the backslash as the start of an escape sequence, so a pattern like \d reaches the re module unchanged. We specify a raw string using the letter r before the string. Ex:

result = re.search(r"ion", "occupation")
print(result)
# <re.Match object; span=(7, 10), match='ion'>

It is highly recommended to use raw strings for regex patterns.

We can pass flags like re.IGNORECASE and re.DOTALL as the third argument.

  • IGNORECASE: Ignores the difference between uppercase and lowercase letters.
  • DOTALL: The . special character also matches a newline character if this flag is passed.
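Both flags can be seen in a short sketch (the two-line sample string is made up):

```python
import re

text = "First line\nSECOND line"  # made-up two-line string

case_sensitive = re.search(r"second", text)
case_insensitive = re.search(r"second", text, re.IGNORECASE)

# Without DOTALL the dot cannot cross the newline, so the pattern fails.
without_dotall = re.search(r"First.*line$", text)
with_dotall = re.search(r"First.*line$", text, re.DOTALL)

print(case_sensitive, case_insensitive[0])
print(without_dotall, with_dotall is not None)
```

Several flags can be combined with the | operator, for example re.IGNORECASE | re.DOTALL.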

Environment Variables

To access the environment variables, we use os.environ.

os.environ is a mapping object. It is an instance of os._Environ class which itself is a subclass of collections.abc.MutableMapping. It is made to behave like dict but it is not related to the dict type.

If a variable exists in our environment (type env in a shell to see all the environment variables), we can read it like so:

os.environ["PATH"]

This returns the value of the PATH variable as a string.

If the variable does not exist, a KeyError is raised, complaining that the key does not exist.

A workaround for this is the get() method. It is written like so:

os.environ.get("PATH")

It can accept two arguments. The second argument is the default value, which is None unless stated otherwise. get() returns this default if the variable does not exist.
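The difference between the two lookups, as a small sketch. The variable name is hypothetical and is removed first so that it is guaranteed to be unset:

```python
import os

missing_name = "NOTES_DEMO_UNSET_VARIABLE"  # hypothetical variable name
os.environ.pop(missing_name, None)          # make sure it really is unset

# Indexing raises KeyError for a missing variable.
try:
    os.environ[missing_name]
    lookup_error = None
except KeyError as err:
    lookup_error = type(err).__name__

# get() returns None, or the default we pass as the second argument.
no_default = os.environ.get(missing_name)
with_default = os.environ.get(missing_name, "fallback")

print(lookup_error, no_default, with_default)
```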

Exit Status

See GNU's website for more info.

In general:

  • 0: Successful
  • Any non-zero value (commonly 1): Not successful

The more you know!

You can use the inspect module to find the source file and the documentation of a particular function.

Ex:

print(inspect.getfile(os.environ.get))

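On a Python 3.10 installation on Linux, this gives us /usr/lib/python3.10/_collections_abc.py (the exact output depends on the Python version and installation). It's very neat. Check out the inspect module for more.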
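A slightly longer sketch of the inspect module. The greet function is a made-up example, and json simply stands in for any module written in Python:

```python
import inspect
import json

def greet(name, greeting="Hello"):
    """Return a short greeting for name."""
    return f"{greeting}, {name}!"

doc = inspect.getdoc(greet)                         # the docstring
params = list(inspect.signature(greet).parameters)  # the parameter names
json_file = inspect.getfile(json)                   # where the json package lives

print(doc)
print(params)
print(json_file)
```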

The subprocess module

It is used to run external commands (Linux or Windows) from a Python script.

subprocess.run executes a command from within a script and waits for it to finish.

Ex:

    subprocess.run(["ls", "-l"])

To check the exit status manually, use the returncode attribute of the object returned by subprocess.run:

    result = subprocess.run(["ls", "This_file_does_not_exist"])
    print(result.returncode)
    # Prints 2 with GNU ls (the exact non-zero value depends on the command)

If we want to take a "screenshot" of the output of the command, we need to set capture_output to True. This makes the output available through the stdout and stderr attributes of the result.

We use it like so:

    result = subprocess.run(["host", "8.8.8.8"], capture_output=True)
    # Now it's stored in the stdout attribute as it won't output an error.

    #Print the stdout 
    print(result.stdout)
    # Returns 
    # b'8.8.8.8.in-addr.arpa domain name pointer dns.google.\n'

Note that the letter "b" is meant to say, "This is not a proper string, it is an array of bytes"

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes. (Python language reference)

In order to translate an array of bytes into a regular Python string, we use the decode() method. It decodes the bytes as UTF-8 by default.

    print(result.stdout.decode())
    # Prints (including the newline character)

    # 8.8.8.8.in-addr.arpa domain name pointer dns.google.
    #
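The sketch below puts returncode, capture_output, and decode() together. To stay independent of the operating system, it runs the current Python interpreter (sys.executable) as the external command instead of ls or host; that substitution is an assumption made for portability:

```python
import subprocess
import sys

# sys.executable is the running Python interpreter; it stands in for commands
# like ls or host so that the sketch does not depend on the operating system.
ok = subprocess.run(
    [sys.executable, "-c", "print('hello from the child')"],
    capture_output=True,
)
failed = subprocess.run(
    [sys.executable, "-c", "import sys; sys.exit(3)"],
    capture_output=True,
)

raw_output = ok.stdout                       # bytes
decoded_output = ok.stdout.decode().strip()  # str, without the trailing newline

print(ok.returncode, failed.returncode)
print(type(raw_output).__name__, decoded_output)
```

Passing text=True to subprocess.run is an alternative to calling decode(): stdout and stderr then come back as str.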

The get() method of the dict object

According to pydoc:

| get(self, key, default=None, /)

| Return the value for key if key is in the dictionary, else default.

And the Python manual

The dict datatype in Python has a get method. We use it to avoid a KeyError when a key may be missing.

If the key does not exist in the dictionary, it returns the second parameter if one was passed, and None otherwise.

    usernames = {}
    name = "good_user"
    usernames[name] = usernames.get(name, 0) + 1
    print(usernames)
    # Prints {'good_user': 1}
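The same pattern scales to counting inside a loop. A minimal sketch, with made-up user names:

```python
log_entries = ["alice", "bob", "alice", "carol", "alice"]  # made-up user names

logins = {}
for user in log_entries:
    # Start from 0 the first time a user is seen, then add 1.
    logins[user] = logins.get(user, 0) + 1

unknown = logins.get("dave")  # no default passed, so a missing key gives None

print(logins)
print(unknown)
```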

The sys module

Let's say you want to take in command line arguments before running a Python file. In C, we pass these in our main function:

int main(int argc, char* argv[]){
    return 0;
}

In Python, we simply import sys:

    import sys

    first_arg = sys.argv[0]  # The name of the script
    second_arg = sys.argv[1]  # The first argument passed on the command line (if it exists)
    # Note that if no argument is passed, sys.argv[1] raises an IndexError exception

    print("This is the first arg", first_arg)
    print("This is the second arg", second_arg)
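To avoid the IndexError, we can check the length of sys.argv first. A minimal sketch; first_argument is a hypothetical helper, not part of the sys module:

```python
import sys

def first_argument(argv, default=None):
    """Return the first command line argument, or default if none was passed."""
    # argv[0] is the script name, so real arguments start at index 1.
    if len(argv) > 1:
        return argv[1]
    return default

print(first_argument(["script.py", "report.csv"]))
print(first_argument(["script.py"], default="no argument"))
print(first_argument(sys.argv, default="no argument"))  # the real arguments
```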