Understanding Package Imports in Python
I have been having an embarrassingly hard time getting a handle on package imports in Python. I’ll get something working, only to have it break inexplicably when I make what seems to be an incidental change. Tests will run in one directory but not another, then inadvertently start working, only to stop a few days or minutes later. I’ve tried to be methodical in investigating which changes lead to which behavior, but it’s been difficult. To put this to rest, I’m going to investigate and record all the behavior I can isolate regarding package imports, and hopefully make some sense of what’s going on.
PYTHONPATH
First, let’s start with a project I’m calling ‘backend’. Here’s the file structure:
backend/
    backend/
        __init__.py
        analyzer.py
        tests/
            __init__.py
            test_analyzer.py
And my PYTHONPATH:
So, my PYTHONPATH is pointing to the directory containing the backend package, but not to the backend package itself (which contains __init__.py).
Let’s open up a terminal and play around:
Very cool. Now let’s cd into the backend package and see if anything changes:
That’s curious.
Let’s see what happens when we remove the path to the project from our PYTHONPATH.
And into Python, from backend/:
And, from backend/backend:
Ah. Now we’re getting somewhere. So you can import a package that sits in your current working directory without having it on your PYTHONPATH, as a local import: the interactive interpreter searches the current directory in addition to PYTHONPATH. From anywhere else, you’ll need your PYTHONPATH to point to the directory that contains it.
To double-check, let’s cd all the way to / and try to import:
So, a package must be contained in a directory on your PYTHONPATH to be able to import it from anywhere other than the directory immediately above it.
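The experiment so far can be replayed in a throwaway directory. This is only a minimal sketch, not the exact session above: the helper names build_project and can_import are made up for illustration, and it runs python -c in a fresh subprocess, which (like the interactive prompt) puts the current directory at the front of sys.path.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def build_project(root):
    """Create root/backend/backend/ with __init__.py and analyzer.py."""
    project = Path(root) / "backend"
    package = project / "backend"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("")
    (package / "analyzer.py").write_text("def analyze():\n    return 'ok'\n")
    return project  # the directory that contains the package


def can_import(name, cwd, pythonpath=None):
    """Try `import name` in a fresh interpreter started in cwd."""
    # Drop the caller's own settings so they do not leak into the experiment.
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    if pythonpath is not None:
        env["PYTHONPATH"] = str(pythonpath)
    result = subprocess.run(
        [sys.executable, "-c", f"import {name}"],
        cwd=str(cwd), env=env, capture_output=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = build_project(tmp)
        elsewhere = Path(tmp) / "elsewhere"
        elsewhere.mkdir()
        print("from backend/, no PYTHONPATH:  ", can_import("backend", project))
        print("from elsewhere, no PYTHONPATH: ", can_import("backend", elsewhere))
        print("from elsewhere, PYTHONPATH set:", can_import("backend", elsewhere, project))
```

Running each attempt in its own subprocess keeps one import from being cached in sys.modules and masking the next.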
To verify, let’s try changing our PYTHONPATH:
Here, we’ve pointed it to the package itself, not to the containing directory. Let’s try importing it from backend/:
and from backend/backend/:
and from /:
Makes sense. But what happens if we cd up one directory above backend/?
Strange. I would have expected this import to fail, but it imported backend as though it were local. Let’s go up one more directory:
And now it fails, as it should. My suspicion is that, since the paragon directory (the one above backend/) contains the backend directory, which in turn contains the backend package, Python was able to look into the similarly named directories. Let’s try renaming the outer backend directory to backend1 and see what happens.
Ok, so that makes sense. (Note that renaming backend to backend1 meant that the PYTHONPATH was no longer valid, so the failure means that the local import wasn’t working either.)
We can verify this by playing a bit more with the PYTHONPATH:
From paragon/:
Note that this import is resolved through the PYTHONPATH, not as a local import, because the name of the directory and the package are no longer the same. And now changing the directory name back to backend:
It goes back to importing locally. Now I understand the convention of naming directories after the packages they contain.
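One plausible reading of the surprise above, assuming Python 3.3 or later: a directory without an __init__.py can still be imported as a namespace package. From paragon/, import backend would then pick up the outer backend/ directory itself rather than the package inside it, and renaming that directory to backend1 takes the name away. The sketch below tries to show the difference in a scratch directory. describe_import and build_paragon are hypothetical helpers, and paragon here is just a temporary folder, not the real project.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Runs inside the child interpreter: report what kind of thing `backend` is.
# A namespace package has no __init__.py, so it has no usable __file__.
PROBE = (
    "import backend\n"
    "is_namespace = getattr(backend, '__file__', None) is None\n"
    "print('namespace package' if is_namespace else 'regular package')\n"
)


def build_paragon(root):
    """Create root/paragon/backend/backend/__init__.py and return root/paragon."""
    paragon = Path(root) / "paragon"
    package = paragon / "backend" / "backend"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("")
    return paragon


def describe_import(cwd):
    """Import `backend` from cwd in a fresh interpreter and classify the result."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    result = subprocess.run(
        [sys.executable, "-c", PROBE],
        cwd=str(cwd), env=env, capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else "import failed"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paragon = build_paragon(tmp)
        print("from paragon/:        ", describe_import(paragon))
        print("from paragon/backend/:", describe_import(paragon / "backend"))
        (paragon / "backend").rename(paragon / "backend1")
        print("after the rename:     ", describe_import(paragon))
```

If this reading is right, the import from paragon/ never reached analyzer.py at all: it only found an empty shell named backend, which would explain why it looked local.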
Submodules
Now, let’s look at importing modules from within a package.
For a long time, I assumed that if you imported a package, you could automatically access all of the modules within the package. It took an uncomfortably long time debugging test errors before I finally realized that this wasn’t the case.
Let’s play around and try importing the analyzer module inside the backend package.
Alright. So it seems that we have to explicitly import submodules inside a package. Once a module is imported, though, we can use all of the functions that module defines.
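Here is a small sketch of the same observation, run in a scratch directory so it does not depend on any real project. It assumes an empty __init__.py, as in the layout above. The analyze function and the make_package and run_in helpers are placeholders invented for the example.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Code executed in a fresh interpreter whose working directory holds the package.
SNIPPET = (
    "import backend\n"
    "print(hasattr(backend, 'analyzer'))\n"   # submodule has not been loaded yet
    "import backend.analyzer\n"
    "print(hasattr(backend, 'analyzer'))\n"   # now bound as an attribute of backend
    "print(backend.analyzer.analyze())\n"
)


def make_package(root):
    """Create root/backend/ with an empty __init__.py and a tiny analyzer.py."""
    package = Path(root) / "backend"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "analyzer.py").write_text("def analyze():\n    return 'ok'\n")


def run_in(cwd, code):
    """Run code with `python -c` from cwd and return its output lines."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=str(cwd), env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_package(tmp)
        for line in run_in(tmp, SNIPPET):
            print(line)
```

The package only gains an analyzer attribute once something has imported backend.analyzer, which is why an import inside __init__.py is the usual way to make a submodule available automatically.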
What if we don’t want to import things one-by-one? Can we use the from module import * form on a package?
Doesn’t seem like it. But what about this?
Ah! Since analyzer is a module, we can import all of the functions from the module, without bringing the package or module names themselves into the namespace. Good to know.
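The two star imports can be compared side by side in a scratch directory. As before this is a sketch under assumptions: the package has an empty __init__.py with no __all__, and analyze, make_package and run_in are illustrative names rather than anything from the real project.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

SNIPPET = (
    "from backend import *\n"
    "print('analyzer' in dir())\n"            # the package star import brought nothing in
    "from backend.analyzer import *\n"
    "print('analyze' in dir())\n"             # the function is now a bare name
    "print('analyzer' in dir(), 'backend' in dir())\n"  # but not the module or package
)


def make_package(root):
    """Create root/backend/ with an empty __init__.py and a tiny analyzer.py."""
    package = Path(root) / "backend"
    package.mkdir()
    (package / "__init__.py").write_text("")
    (package / "analyzer.py").write_text("def analyze():\n    return 'ok'\n")


def run_in(cwd, code):
    """Run code with `python -c` from cwd and return its output lines."""
    env = {k: v for k, v in os.environ.items()
           if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}
    result = subprocess.run(
        [sys.executable, "-c", code],
        cwd=str(cwd), env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_package(tmp)
        for line in run_in(tmp, SNIPPET):
            print(line)
```

Order matters here: if backend.analyzer had already been imported earlier in the same session, from backend import * would pick up analyzer too, since by then it is an attribute of the package.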
Running Tests
Now that that’s a bit clearer, let’s take a look at running tests.
For reference, these are the import statements in the test file:
scripts.py and synth.py contain tools for generating synthetic data and mocks; feel free to ignore those for now.
First, running the test file as a simple Python script:
Ok, that worked out. Now, though, we’ll try running the test using the pytest framework:
What is this? This is the bug that has been haunting me. Usually I just delete files and change paths at random until something starts to work. This time, I decided to delete the __init__.py from the tests/ directory. Why? I have no idea. YOLO. I ran the tests again and got this:
Well… something changed at least. Investigating the error, I notice that it suggests I delete the __pycache__ folders that have been popping up in my projects. I’ve set my IPython interpreter not to generate these kinds of files, but I’ve been dropping into vanilla Python from time to time, so I guess that’s where these came from. I go ahead and delete all of these folders from the project, and try running the test again:
OH COME ON. Really? This isn’t the first time that .pyc files and __pycache__ folders have caused me some pain. But this is good, this is progress. Let’s run the whole shebang:
Alright! Let’s try an experiment: moving the tests/ directory one level up, so it’s a sibling directory to the backend package, rather than a child. Typing this command: mv backend/tests . gives us this:
backend/
    backend/
        __init__.py
        analyzer.py
    tests/
        test_analyzer.py
Fingers crossed. Running py.test:
Error! But the same as before. Delete tests/__pycache__ and try again. SUCCESS!! Let’s commit.
Sigh. As with most things in programming, #itsalwaysusererror. Mind your PYTHONPATH, attend to naming conventions, clear out your cached files, and you’ll have a long and happy life.
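Clearing out the cached files is easy to script. A minimal sketch, assuming you want every __pycache__ folder and stray .pyc file under a project gone; clear_bytecode_caches is a made-up name, and the path in the usage line is a placeholder.

```python
import shutil
from pathlib import Path


def clear_bytecode_caches(project_root):
    """Delete every __pycache__ directory and stray .pyc file under project_root."""
    root = Path(project_root)
    removed = []
    # Collect the matches first so nothing is deleted while still walking the tree.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            shutil.rmtree(cache_dir)
            removed.append(cache_dir)
    # Anything left over is a .pyc sitting next to its source file.
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink()
        removed.append(pyc)
    return removed


if __name__ == "__main__":
    # "." is a placeholder: point this at the top of your own project.
    for path in clear_bytecode_caches("."):
        print("removed", path)
```

Python regenerates these files on the next import, so deleting them costs nothing beyond a slightly slower first run.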
BONUS: Testing a Non-Package
Let’s say you’re working on a project but don’t want to add it to your PYTHONPATH. It’s a work-in-progress, no one else should be able to import it, what have you. Can you still import those modules to test them?
It seems like it.
Let’s consider another project, a webapp, with the following structure:
webapp/
    webapp/
        __init__.py
        views.py
    tests/
        test_integration.py
Note that this directory is not in my PYTHONPATH:*
*While writing this post, I read some articles advocating for keeping trailing slashes in your PATH variables. I thought it was a good idea, so I’ve changed my PYTHONPATH accordingly.
Let’s try running py.test from webapp/:
Very cool. But what happens if we change directories?
From webapp/tests/:
Hmm. Can’t find the package. Here are the import statements at the top of the test:
It seems like running py.test from the top of the project means that the test can look for local packages from that location. Running the tests from inside the test directory means that the package has to be imported via PYTHONPATH.
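If the tests should pass no matter which directory py.test is launched from, one common workaround is to put the project root on sys.path from inside the test file (or a conftest.py) before importing the package. This is a sketch of that idea, not what the project above does. add_project_root is a hypothetical helper, and it assumes tests/ sits directly under the project root, as in the layout shown.

```python
import sys
from pathlib import Path


def add_project_root(test_file, levels_up=1):
    """Put the directory `levels_up` above the test file's folder on sys.path."""
    # parents[0] is the folder holding the test file; parents[1] is one level above it.
    root = Path(test_file).resolve().parents[levels_up]
    root_str = str(root)
    if root_str not in sys.path:
        # Insert at the front so the local checkout wins over any installed copy.
        sys.path.insert(0, root_str)
    return root


# At the top of tests/test_integration.py one might then write:
#     add_project_root(__file__)
#     from webapp import views
```

Because the path is derived from the test file's own location rather than the current working directory, the import resolves the same way from webapp/ and from webapp/tests/.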
Well… that makes sense! In the end, it all makes sense.
Further reading: another pretty good article on imports.