Count BYU (reading and writing files) Transcript
Start visual description. The instructor’s screen is shared where he shows how to prepare and write code for a particular program. He demonstrates the steps as he describes each aspect. The instructor can be seen in the top right-hand corner in a small box. End visual description.
[00:00:00]
Instructor: All right, let’s go through the Count BYU file reading and writing
activity. So here we’re going to write a program that takes an input file and an
output file as command-line arguments.
[00:00:11]
And for each line in the input file, we’re going to write a line in the output file
that has the corresponding number of B, Y, or U characters found in the input, and
we’ll ignore casing while we do it. So for example, "Your big book is ugly" has a y, b, b,
u, and y, oh, and a u in "your". So that’s six, right? Or "Yuba is beyond": y, u, b, b, y.
Five letters. So we’re going to count up the number of B’s, Y’s, and U’s in each
line and then print that to the output file.
[00:00:49]
All right. So what do we see here? First thing I like to do is just sketch out the
basic skeleton of the program, right? So we know we need command line
arguments. So I’ll just put import sys right at the top. We’re going to have
some main function, and if __name__ is equal to "__main__", then we’re going to call main.
Great.
[00:01:22]
Now we know we’re going to have command line arguments. Let’s wire those in
and see what that means. So here we have a program that takes an input file and
an output file as command line arguments. We’ll just use that order, input file
first, output file second.
[00:01:39]
So the way I think about these kinds of programs where I’m taking command line
arguments in and working with files, I do all of the sys stuff down in the if __name__
equals "__main__" block. There’s a lot of reasons for that and I can’t explain them all
right now. It’s sufficient to know though that that’s good practice and best
practice. It’s wise to get into that habit and learn how to separate the sys argv
stuff from the rest of your code.
[00:02:10]
So I want to use references to sys argv down here and I want my main function to
instead just talk about the kinds of data that it needs, right? So this program, it
needs an input file and an output file. And so I’m going to specify that here. Input
file, output file. That’s what my program needs. And the main function kind of
represents my program and the data that it needs.
[00:02:35]
Down here, the job of this little block is to grab the information from the
command line and interpret it: which arguments are at which position of sys.argv,
and then pass those into main.
[00:02:46]
So we want the input file to be first. So that’s going to be sys argv at position one
and then the output file will be second at sys argv position two. So I’ll pass the
values that are found at these two positions into my main function. My main
function can now just think about input files and output files and doesn’t have to
think about command lines. That’s important.
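The separation described here can be sketched as follows. This is a minimal illustration, not the instructor's exact file: the print statement in main is a placeholder, the script name count_byu.py in the usage message is an assumed spelling, and the length check is an addition so the sketch does not crash when run without arguments.

```python
import sys


def main(input_file, output_file):
    # main only knows about file names, not about the command line.
    # Placeholder body for this sketch; the real work comes later.
    print(f"reading {input_file}, writing {output_file}")


if __name__ == "__main__":
    # sys.argv[0] is the script name, so the arguments start at position 1.
    # The length check is an addition for this sketch, not from the lecture.
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python count_byu.py INPUT_FILE OUTPUT_FILE")
```

Because main takes plain parameters, it can be called directly from a test or a debugger without touching sys.argv.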
[00:03:07]
So now we turned our empty file into not an empty file. Now, we can think about
how are we going to decompose the problem that we have. Well, a lot of these
file-processing programs follow the same pattern over and over and over, right?
You have some input file that you’re going to read into your program and you’re
going to do something to that data and then you’re going to end up writing that
back out to another file, right? And so you kind of have a three-step process over
and over and over. You’ll see this pattern repeatedly.
[00:03:46]
And so looking for this pattern makes writing the programs simpler. And so you
can see, oh, I need to read the lines, I need to transform those lines somehow
and then I need to write those lines back out.
[00:04:00]
And that transformed the lines part, or change the lines, work with the lines,
that’s just the list pattern. We’ve seen this before, you know, you’re going to
transform them or you’re going to filter them or you’re going to select from them
or something to that effect, which we know how to do. And so we’re breaking
down a larger program once again into just a simple pattern of read, list pattern,
write. So let’s set that basic structure up in our program. We like to read and
write stuff.
[00:04:32]
You can copy and paste these from files you already have or if you’ve got them
memorized, you can just write it out. def readlines, which takes a filename: with
open filename as file, return file dot readlines. And then we’re going to have def
writelines, which takes a filename and some content: with open filename in write
mode as file, file dot writelines, content.
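Written out, the two helpers dictated above might look like the sketch below. The function and parameter names follow what is spoken in the lecture; the exact spelling in the instructor's file is an assumption.

```python
def readlines(filename):
    # Read the whole file and return a list of lines,
    # each still ending in its newline character.
    with open(filename) as file:
        return file.readlines()


def writelines(filename, content):
    # Open in write mode ("w"), which creates or overwrites the file.
    # writelines does not add newlines, so each item must already end in one.
    with open(filename, "w") as file:
        file.writelines(content)
```

Note that writelines writes each string exactly as given, which is why the transformed lines later need their own trailing newline.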
[00:05:05]
So now we have a writelines function, so we can set this up. I can say lines equals read
lines from the input file. And then we’re going to have some kind of like lines
equals change lines. We could rename that in a minute. And then we’re going to
write lines to the output file. The content is the lines in this case, that’s what we
want to write. So there’s the overall structure now of our program with the first
and last steps already written.
[00:05:40]
So now we just need to understand what does this mean. Now change lines, we
got to take a bunch of lines and turn them into outputs that look like this. Well,
that’s a list pattern, right? That’s a mapping pattern. So if we come up here, def,
we’ll just call it change lines. Sure, why not? Lines. Then let’s set up the basic mapping
pattern, right? So new lines equals an empty list, for line in lines, new lines append
change line of line.
[00:06:21]
So we’ve peeled off one more layer of the problem. We knew that this was
just a simple mapping pattern. We’ve seen this code many times before and now
we just need to write the function that operates on an individual line, and
change lines applies that now to all of the lines and returns a new list.
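The mapping pattern described here can be sketched like this. The change_line function below is a hypothetical stand-in that upper-cases the line, used only so the sketch runs on its own; the lecture replaces it with the real per-line function afterwards.

```python
def change_line(line):
    # Hypothetical stand-in for the per-line work: upper-case the line.
    return line.upper()


def change_lines(lines):
    # Mapping pattern: build a new list by applying
    # the same function to every item of the old list.
    new_lines = []
    for line in lines:
        new_lines.append(change_line(line))
    return new_lines
```

The input list is left untouched, and the result always has the same number of items as the input.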
[00:06:40]
What does it mean to change a line? Maybe we should come up with
something better for this, right? What’s the process that we’re trying to do here?
Well, I’m trying to count BYU maybe for each one, b’s, y’s and u’s.
[00:06:53]
So I’m going to rename this count_byu. So for each line, I want to count byu, and
defining what that should be in a docstring can help us better understand
the specific problem, right? So let’s say count_byu of line.
[00:07:17]
So in this case, line looks something like what? It’s going to look like this. So I
want to highlight, copy, paste, maybe we’ll put that in quotes, right? So this is
what my line looks like. In reality, there’s also a newline at the end of it that
doesn’t copy over, but I can put that back in. So I want something that looks like
this and I want my return to look something like what? I want it to look ready to
be written out because I’m writing a line.
[00:07:52]
I need to remember the newline that’s going out. So I want a six followed by a newline to be
the result of something that looks like this. So can I turn this into that? All right.
So now we have our marching orders.
[00:08:08]
So we could say, you know, result lines is empty. I’m sorry, this is a single line,
right? So what have we got? We’re going to march through here, we’re going to
count. That sounds like an accumulate pattern. So we march through each letter.
[00:08:24]
If it belongs to B, Y or U, we’re going to count it. So let’s say total equals zero,
for letter in line, if letter dot lower is in "byu", total plus equals one. So that gives us
our total as a number, as a six.
[00:08:46]
Remember the pattern for ignore case very often involves just making everything
lowercase and comparing it there. So if the lowercase letter is in "byu", then both
uppercase or lowercase letters count and that’s what we want. We’re going to
count it up.
[00:09:03]
Now, I have a six and I need the string six with a newline in it. How do we get data
into strings? We use a formatting pattern. So I can just say, hey, let’s return a
formatted string where the total is here, followed by a newline. Return that.
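Putting the accumulate pattern and the formatted string together, the per-line function might look like this sketch. The name count_byu comes from the lecture; the docstring wording is an assumption.

```python
def count_byu(line):
    """Count the b, y, and u characters in line, ignoring case.

    count_byu("Your big book is ugly\n") returns "6\n"
    """
    total = 0
    for letter in line:
        # Lower-casing first means "B" and "b" are both counted.
        if letter.lower() in "byu":
            total += 1
    # Return a string ready to be written out as one line.
    return f"{total}\n"
```

The comparison string has to be lowercase: checking letter.lower() against "BYU" would never match anything.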
[00:09:29]
So now I’ve taken something that should look like this and turned it into
something that should look like that. And now for every line we do that, this
function will apply that change to every line and return a new list that’s ready to
write and we’re all set up and we can give this a go if we want. So we can open
up our terminal.
[00:09:52]
Mine’s going to drop me into the lecture files because that was the last folder
that I had used in my terminal. I can look in here. There’s all kinds of files I might
consider. But I’ve got BYU text dot txt. I can find that up in here as well, right?
Your big book is ugly. Yuba is beyond. Yes. No. BYU. That looks like something
we could use to test this out and see what the output should be.
[00:10:20]
You know this should be a six, this should be a five, this should be a one, this
should be a zero, and this should be a three when we run it. So in the terminal, I
can say Python and then our script name, which was called count BYU py and we
give it the input file.
[00:10:41]
How do we know we need an input file, right? We have to kind of look and see.
What was the name we used in our main function that helps us understand the
meaning of that parameter? So if I’m passing sys argv one first in my main
function, that means that I intend for that to be my input file.
[00:10:59]
And if I’m passing sys argv two second, then I’m intending for that to be my
output file, right? So down here, let’s give it an input file. Let’s give it BYU text
dot txt and now we need an output file for it. And so you could say BYU counts
dot txt. And let’s give that a try. So then it runs and we’ll see up here BYU counts
dot txt just appeared.
[00:11:26]
I can also use ls in my terminal, and there it is, BYU counts dot txt. We can look
in there: 6, 5, 1, 0, 3, which is what we were hoping for. Yes.
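For reference, the whole program assembled from the read, list pattern, write steps could look like the sketch below. It follows the structure described in the lecture, but the exact file contents are an assumption, and the argument-count check in the last block is an addition so the sketch does not fail when run with no arguments.

```python
import sys


def readlines(filename):
    with open(filename) as file:
        return file.readlines()


def writelines(filename, content):
    with open(filename, "w") as file:
        file.writelines(content)


def count_byu(line):
    # Accumulate pattern: count letters that are b, y, or u in any case.
    total = 0
    for letter in line:
        if letter.lower() in "byu":
            total += 1
    return f"{total}\n"


def change_lines(lines):
    # Mapping pattern: one output line per input line.
    new_lines = []
    for line in lines:
        new_lines.append(count_byu(line))
    return new_lines


def main(input_file, output_file):
    lines = readlines(input_file)   # read
    lines = change_lines(lines)     # list pattern
    writelines(output_file, lines)  # write


# The len check is an addition for this sketch, not from the lecture.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run against a file containing the five sample lines, this writes 6, 5, 1, 0, and 3, one per line.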
[00:11:37]
Now, sometimes when you’re writing programs that use command line
arguments here, you want to debug, right? If I just debug this up here and come
up here and hit run, for example, or let’s go like this, say run count BYU. It’s going
to try and it’s going to say list index out of range because there are no command-line
arguments when I click the button up here. There is a way to supply them, and you’re
welcome to Google that and understand how to do it through the edit
configurations button up here.
[00:12:15]
But another simple way is that you can comment out your sys.argv line here and
just hard-code some file names. So I could say BYU text dot txt and BYU counts
two dot txt, for example. And by hard-coding these in, now it’s not looking on the
command line anymore for anything, and this is great. Now I could debug it, I
could put some breakpoint in here.
[00:12:47]
I could say go, load up the debugger for us. Great. And now it hits my break point
and stops, and I can see here’s the lines, your big book is ugly, blah, blah, blah,
blah blah, right? And then if I take a step to the next line, now lines shows my
outputs and I can write that out. Let’s hit continue and it’ll finish and write out
my new file.
[00:13:09]
So that’s a great, simple way to be able to run your code within PyCharm,
usually because you want to use the debugger and the breakpoints. Just
remember that you did that, because if you plug this into the autograder in
Gradescope or you try to run the pytest tests, it’s going to fail here. You’ve
hard-coded this now, but the tests actually try to pass in different arguments, and
you’re not looking at the arguments, you’re looking at a very specific file. So that’s not
going to work.
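One way to reduce the risk of forgetting, which is not what the lecture does, is to fall back to debugging defaults only when no arguments are given. The sketch below assumes hypothetical file names and a hypothetical helper called get_file_names; whether an autograder accepts this variation is not something the lecture addresses, so restoring the plain sys.argv version before submitting remains the safe choice.

```python
import sys

# Hypothetical defaults for debugging; use whatever test files you have.
DEFAULT_INPUT = "byu_text.txt"
DEFAULT_OUTPUT = "byu_counts2.txt"


def get_file_names(argv):
    """Return (input_file, output_file) from argv, or the defaults."""
    if len(argv) >= 3:
        # Arguments were supplied, so use them.
        return argv[1], argv[2]
    # No arguments, for example when clicking Run in the IDE.
    return DEFAULT_INPUT, DEFAULT_OUTPUT


if __name__ == "__main__":
    input_file, output_file = get_file_names(sys.argv)
    print(f"input: {input_file}, output: {output_file}")
```

With this shape, a run that passes arguments still uses them, while a run from the IDE picks up the defaults.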
[00:13:40]
And so when you’re done debugging, just make sure you come back and change
it back to using the command-line arguments before you run pytest or before
you submit it online. With that, have fun reading and writing from files!