> [!META]- Inline Metadata
> [tags:: #advent-of-code/2022]
> [status:: done]
> [link::]
> [up:: [[Advent of Code 2022 MOC]]]
## Problem
You can hear birds chirping and raindrops hitting leaves as the expedition proceeds. Occasionally, you can even hear much louder sounds in the distance; how big do the animals get out here, anyway?
The device the Elves gave you has problems with more than just its communication system. You try to run a system update:
```
$ system-update --please --pretty-please-with-sugar-on-top
Error: No space left on device
```
Perhaps you can delete some files to make space for the update?
You browse around the filesystem to assess the situation and save the resulting terminal output (your puzzle input). For example:
```
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
```
The filesystem consists of a tree of files (plain data) and directories (which can contain other directories or files). The outermost directory is called `/`. You can navigate around the filesystem, moving into or out of directories and listing the contents of the directory you're currently in.
Within the terminal output, lines that begin with `
are _commands you executed_, very much like some modern computers:
- `cd` means _change directory_. This changes which directory is the current directory, but the specific result depends on the argument:
- `cd x` moves _in_ one level: it looks in the current directory for the directory named `x` and makes it the current directory.
- `cd ..` moves _out_ one level: it finds the directory that contains the current directory, then makes that directory the current directory.
- `cd /` switches the current directory to the outermost directory, `/`.
- `ls` means _list_. It prints out all of the files and directories immediately contained by the current directory:
- `123 abc` means that the current directory contains a file named `abc` with size `123`.
- `dir xyz` means that the current directory contains a directory named `xyz`.
Given the commands and output in the example above, you can determine that the filesystem looks visually like this:
```
- / (dir)
- a (dir)
- e (dir)
- i (file, size=584)
- f (file, size=29116)
- g (file, size=2557)
- h.lst (file, size=62596)
- b.txt (file, size=14848514)
- c.dat (file, size=8504156)
- d (dir)
- j (file, size=4060174)
- d.log (file, size=8033020)
- d.ext (file, size=5626152)
- k (file, size=7214296)
```
Here, there are four directories: `/` (the outermost directory), `a` and `d` (which are in `/`), and `e` (which is in `a`). These directories also contain files of various sizes.
Since the disk is full, your first step should probably be to find directories that are good candidates for deletion. To do this, you need to determine the _total size_ of each directory. The total size of a directory is the sum of the sizes of the files it contains, directly or indirectly. (Directories themselves do not count as having any intrinsic size.)
The total sizes of the directories above can be found as follows:
- The total size of directory `e` is _584_ because it contains a single file `i` of size 584 and no other directories.
- The directory `a` has total size _94853_ because it contains files `f` (size 29116), `g` (size 2557), and `h.lst` (size 62596), plus file `i` indirectly (`a` contains `e` which contains `i`).
- Directory `d` has total size _24933642_.
- As the outermost directory, `/` contains every file. Its total size is _48381165_, the sum of the size of every file.
To begin, find all of the directories with a total size of _at most 100000_, then calculate the sum of their total sizes. In the example above, these directories are `a` and `e`; the sum of their total sizes is `_95437_` (94853 + 584). (As in this example, this process can count files more than once!)
Find all of the directories with a total size of at most 100000. _What is the sum of the total sizes of those directories?_
## Scratchpad
### Part 1
Build a TreeNode class: parent (TreeNode), children (List[TreeNode]), size (if file,), name (str)
add_child() (adds parameter Node to children, updates parameter's parent, ensure size is 0)
Build a Tree class:
root TreeNode
current_node TreeNode
calculate_size(TreeNode) if size 0, get size of all children, else return size
~~add_child -> add file or directory~~
cd -> move either to parent or to specified child
Find directories where size > 100000
>[!NOTE]
>This can include a subdirectory and the parent directory separately, even though the parent size includes the subdirectory size
This part screams "subproblems!" to me. So we can probably do a variation of [[70. Climbing Stairs]], but since the subproblems are not repeated, we don't get that advantage. We have to basically do a depth-first recursion to the lowest node with children, get its size, then that of siblings, and then that will give that of the parent node (dir).
In this vein, we could basically drill down until we find items all with sizes (which tells us they're files). We can then add them and mutate the directory's 0 size to the sum of those files (with parent), then when we go back through, treat the directory as a file and stay at that level? Or just use whatever data structure to keep track of each dir's total size as a 'visited' or 'measured' item (ie if in dictionary, stay at this level and add to parent).
**Idea** have a `visited` dictionary, keys are the file/dir names, value is the size (if dir size of all children). Do DFS-type algo to drill down, at each dir, check if children are in visited hash, if they all are, add the sum of them together and add to
Once the size dict is built, iterate through `visited` with `dict.items()` and add anything over 10000 to a `sum` variable to be returned.
This doesn't work because there can be subdirectories with different parents and the same name.
Ended up going with a size mutation (and updating code to use the presence of children to determine if a node is a dir) strategy, then writing another recursive function to go through the tree again to collect into a list all sizes <= a given limit, then sum that.
## Solution
```python
TEST_INPUT = """
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
"""
class TreeNode:
def __init__(self, name: str, size: int = 0):
self.name = name
self.size = size
self.parent = None
self.children = [] # Deciding to go with a common structure for dirs and files
def __str__(self):
return f"Name: {self.name}\tSize: {self.size}"
def __repr__(self, level=0):
ret = "\t" * level + repr(self.name) + "\n"
for child in self.children:
ret += child.__repr__(level + 1)
return ret
class Tree:
def __init__(self):
self.root = TreeNode(name="/")
self.current_node = self.root
def cd(self, dest: str) -> None:
if dest == "..":
if self.current_node != self.root:
self.current_node = self.current_node.parent
elif dest == "/":
self.current_node = self.root
else:
for child in self.current_node.children:
if not child.size and dest == child.name:
self.current_node = child
return
print(f"Directory {dest} not found")
def insert_child(self, child: TreeNode) -> None:
if self.current_node.size:
# Not raising an exception because we want the program to continue
print(f"This node is not a directory:\n{self.current_node}")
elif child in self.current_node.children:
print(f"Item already in directory:\n{child}")
else:
self.current_node.children.append(child)
child.parent = self.current_node
def parse_input(input_string: str) -> Tree:
fs_tree = Tree()
for cmd in input_string.split("\n"):
cmd = cmd.split()
if not cmd:
continue
if cmd[0] == "quot;:
if cmd[1] == "cd":
fs_tree.cd(dest=cmd[-1])
else:
if cmd[0] == "dir":
fs_tree.insert_child(child=TreeNode(name=cmd[1]))
else:
fs_tree.insert_child(child=TreeNode(name=cmd[1], size=int(cmd[0])))
_load_node_sizes(node=fs_tree.root)
fs_tree.current_node = fs_tree.root
return fs_tree
def find_dir_total_size(tree: Tree) -> int:
"""
Answers part 1
"""
sizes = _get_dir_sizes(node=tree.root, sizes=list(), limit=100_000)
return sum(sizes)
def delete_smallest_dir(tree: Tree) -> int:
"""
Find smallest directory that will free up total space needed
Answers part 2
"""
total_drive_size = 70_000_000
needed_free_space = 30_000_000
current_free_space = total_drive_size - tree.root.size
space_to_free_up = needed_free_space - current_free_space
sizes = _get_dir_sizes(node=tree.root, sizes=list())
return min([size for size in sizes if size >= space_to_free_up])
def _load_node_sizes(node: TreeNode):
node_total_size = 0
for child in node.children:
if child.children:
_load_node_sizes(node=child)
node_total_size += child.size
node.size = node_total_size
def _get_dir_sizes(node: TreeNode, sizes: list, limit: int = 0) -> list:
for child in node.children:
if child.children:
if limit and child.size <= limit:
sizes.append(child.size)
elif not limit:
sizes.append(child.size)
_get_dir_sizes(node=child, sizes=sizes, limit=limit)
return sizes
if __name__ == "__main__":
tree = parse_input(TEST_INPUT)
print(f"Total dir size for test input (part 1): {find_dir_total_size(tree=tree)}")
print(f"Free up space (part 2): {delete_smallest_dir(tree=tree)}")
with open("input.txt", "r") as f:
input_string = f.read()
tree = parse_input(input_string)
print(
f"Total dir size for test input (part 1): {find_dir_total_size(tree=tree)}"
)
print(f"Free up space (part 2): {delete_smallest_dir(tree=tree)}")
```
## Things I Learned