Count BYU (Reading and Writing Files) Transcript

Start visual description. The instructor’s screen is shared where he shows how to prepare and write code for a particular program. He demonstrates the steps as he describes each aspect. The instructor can be seen in the top right-hand corner in a small box. End visual description.

[00:00:00] Instructor: Alright, let’s go through the BYU Count file reading and writing activity. So here we’re going to write a program that takes an input file and an output file as command-line arguments.

[00:00:11] And for each line in the input file, we’re going to write a line in the output file that has the corresponding number of B, Y, or U characters found in the input, and we’ll ignore casing while we do it. So for example, “Your big book is ugly” has a y, b, b, u, and y, oh, and a u in “your.” So that’s six, right? Or “Yuba is beyond”: y, u, b, b, y. Five letters. So we’re going to count up the number of B’s, Y’s, and U’s in each line and then print that to the output file.

[00:00:49] All right. So what do we see here? First thing I like to do is just sketch out the basic skeleton of the program, right? So we know we need command-line arguments. So I’ll just import sys right off at the top. We’re going to have some main function, and if __name__ is equal to "__main__", then we’re going to call main. Great.

[00:01:22] Now we know we’re going to have command line arguments. Let’s wire those in and see what that means. So here we have a program that takes an input file and an output file as command line arguments. We’ll just use that order, input file first, output file second.

[00:01:39] So the way I think about these kinds of programs, where I’m taking command-line arguments in and working with files, is that I do all of the sys stuff down in the if __name__ equals "__main__" block. There are a lot of reasons for that and I can’t explain them all right now. It’s sufficient to know, though, that that’s good practice. It’s wise to get into that habit and learn how to separate the sys.argv stuff from the rest of your code.

[00:02:10] So I want to use references to sys.argv down here and I want my main function to instead just talk about the kinds of data that it needs, right? So this program, it needs an input file and an output file. And so I’m going to specify that here. Input file, output file. That’s what my program needs. And the main function kind of represents my program and the data that it needs.

[00:02:35] Down here, the job of this little block is to grab the information from the command line and interpret it. So it works out which argument is at which position of sys.argv and passes those into main.

[00:02:46] So we want the input file to be first. So that’s going to be sys.argv at position one, and then the output file will be second, at sys.argv position two. So I’ll pass the values that are found at these two positions into my main function. My main function can now just think about input files and output files and doesn’t have to think about command lines. That’s important.
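The skeleton described so far might look like the following minimal sketch. The placeholder body of main and the usage message are illustrative assumptions, and the length check on sys.argv is an addition for this sketch; the version built in the video indexes sys.argv directly. The script name count_byu.py is assumed from how it is spoken later.

```python
import sys


def main(input_file, output_file):
    # Placeholder body: the read, transform, and write steps come later.
    print(f"input: {input_file}, output: {output_file}")


if __name__ == "__main__":
    # sys.argv[0] is the script name, so the real arguments start at 1.
    # The length check below is an addition for this sketch, so that a run
    # with no arguments prints a hint instead of raising IndexError.
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: python count_byu.py <input_file> <output_file>")
```

Only the bottom block mentions sys.argv; main sees two ordinary parameters, which is what lets it be called from a test or a debugger without a command line.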

[00:03:07] So now we turned our empty file into not an empty file. Now, we can think about how are we going to decompose the problem that we have. Well, a lot of these file-processing programs follow the same pattern over and over and over, right? You have some input file that you’re going to read into your program and you’re going to do something to that data and then you’re going to end up writing that back out to another file, right? And so you kind of have a three-step process over and over and over. You’ll see this pattern repeatedly.

[00:03:46] And so looking for this pattern makes writing the programs simpler. And so you can see, oh, I need to read the lines, I need to transform those lines somehow and then I need to write those lines back out.

[00:04:00] And that transform-the-lines part, or change the lines, work with the lines, that’s just the list pattern. We’ve seen this before, you know, you’re going to transform them or you’re going to filter them or you’re going to select from them or something to that effect, which we know how to do. And so we’re breaking down a larger program once again into just a simple pattern of read, list pattern, write. So let’s set that basic structure up in our program. We like to read and write stuff.

[00:04:32] You can copy and paste these from files you already have or, if you’ve got them memorized, you can just write them out. Def readlines, which takes a filename: with open filename as file, return file dot readlines. And then we’re going to have def writelines, which takes a filename and some content: with open filename, writeable, as file, file dot writelines, content.

[00:05:05] So now we have readlines and writelines, so we can set this up. I can say lines equals readlines from the input file. And then we’re going to have some kind of, like, lines equals change lines. We could rename that in a minute. And then we’re going to writelines to the output file. The content is the lines in this case, that’s what we want to write. So there’s the overall structure now of our program, with the first and last steps already written.
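As a rough sketch of the structure at this point, the read and write helpers and the three-step main could be written as below. The change_lines placeholder that returns its input unchanged is an assumption added here so the sketch runs before the transform exists.

```python
def readlines(filename):
    # Read every line of the file, keeping the trailing newlines.
    with open(filename) as file:
        return file.readlines()


def writelines(filename, content):
    # "w" opens the file for writing; content is a list of strings
    # that already end in newlines.
    with open(filename, "w") as file:
        file.writelines(content)


def change_lines(lines):
    # Placeholder (assumption): pass the lines through untouched for now.
    return lines


def main(input_file, output_file):
    lines = readlines(input_file)    # step 1: read
    lines = change_lines(lines)      # step 2: list pattern
    writelines(output_file, lines)   # step 3: write
```

With the placeholder in place the program simply copies the input file to the output file, which is a quick way to confirm the first and last steps work.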

[00:05:40] So now we just need to understand what this means. Now change lines, we’ve got to take a bunch of lines and turn them into outputs that look like this. Well, that’s a list pattern, right? That’s a mapping pattern. So if we come up here, def, we’ll just call it change lines. Sure, why not? Lines. Then let’s set up the basic mapping pattern, right? So new lines equals empty, for line in lines, new lines append change line of line, and return new lines.

[00:06:21] So we’ve peeled off one more layer of the problem. We knew that this was just a simple mapping pattern. We’ve seen this code many times before, and now we just need to write the function that operates on an individual line, and change lines applies that now to all of the lines and returns a new list.
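The mapping pattern can be sketched as follows. The change_line function here is a hypothetical stand-in that uppercases a line, used only so the loop has something to apply; the real per-line function is written later in the video.

```python
def change_line(line):
    # Hypothetical stand-in for the per-line transformation.
    return line.upper()


def change_lines(lines):
    # Mapping pattern: build a new list with one output per input line.
    new_lines = []
    for line in lines:
        new_lines.append(change_line(line))
    return new_lines


print(change_lines(["ab\n", "cd\n"]))
```

The loop never changes shape; only the function called inside append changes from one program to the next.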

[00:06:40] What does it mean to change a line? Maybe we should come up with something better for this, right? What’s the process that we’re trying to do here? Well, I’m trying to count BYU, maybe, for each one: B’s, Y’s, and U’s.

[00:06:53] So I’m going to rename this count_byu. So for each line, I want to count BYU, and defining what that should be in a docstring can help us better understand the specific problem, right? So let’s say count_byu of line.

[00:07:17] So in this case, line looks something like what? It’s going to look like this. So I want to highlight, copy, paste, maybe we’ll put that in quotes, right? So this is what my line looks like. In reality, there’s also a newline at the end of it that doesn’t copy over, but I can put that back in. So I want something that looks like this, and I want my return to look something like what? I want it to look ready to be written out, because I’m writing a line.

[00:07:52] I need to remember the newline that’s going out. So I want a six followed by a newline to be the result of something that looks like this. So can I turn this into that? All right. So now we have our marching orders.

[00:08:08] So we could say, you know, result lines is empty. I’m sorry, this is a single line, right? So what have we got? We’re going to march through here, we’re going to count. That sounds like an accumulate pattern. So we march through each letter.

[00:08:24] If it belongs to B, Y, or U, we’re going to count it. So let’s say total equals zero, for letter in line, if letter dot lower in “byu”, total plus equals one. So that gives us our total as a number, as a six.

[00:08:46] Remember, the pattern for ignoring case very often involves just making everything lowercase and comparing it there. So if the lowercase letter is in “byu”, then both uppercase and lowercase letters count, and that’s what we want. We’re going to count it up.

[00:09:03] Now, I have a six and I need a string six with a newline in it. How do we get data into strings? We use a formatting pattern. So I can just say, hey, let’s return a formatted string where the total is here, followed by a newline. Return that.
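A minimal sketch of count_byu as described, combining the accumulate pattern with the formatting step. The docstring wording is an assumption; the video only says that the docstring shows a sample line and the result expected for it.

```python
def count_byu(line):
    """
    Count the b, y, and u characters in a line, ignoring case.

    input:  "Your big book is ugly\n"
    output: "6\n"
    """
    total = 0
    for letter in line:
        # Lowercasing first means "B" and "b" both match.
        if letter.lower() in "byu":
            total += 1
    # Format the number as a string that is ready to be written as a line.
    return f"{total}\n"


print(repr(count_byu("Your big book is ugly\n")))  # prints '6\n'
```

The string being searched has to be the lowercase "byu", since the letter has already been lowercased before the comparison.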

[00:09:29] So now I’ve taken something that should look like this and turned it into something that should look like that. And now for every line we do that, this function will apply that change to every line and return a new list that’s ready to write and we’re all set up and we can give this a go if we want. So we can open up our terminal.

[00:09:52] Mine’s going to drop me into the lecture files because that was the last folder that I had used in my terminal. I can look in here. There’s all kinds of files I might consider. But I’ve got BYU text dot txt. I can find that up in here as well, right? Your big book is ugly. Yuba is beyond. Yes. No. BYU. That looks like something we could use to test this out and see what the output should be.

[00:10:20] You know, this should be a six, this should be a five, this should be a one, this should be a zero, and this should be a three when we run it. So in the terminal, I can say python and then our script name, which was called count BYU dot py, and we give it the input file.

[00:10:41] How do we know we need an input file, right? We have to kind of look and see. What was the name we used in our main function that helps us understand the meaning of that parameter? So if I’m passing sys.argv one first into my main function, that means that I intend for that to be my input file.

[00:10:59] And if I’m passing sys.argv two second, then I’m intending for that to be my output file, right? So down here, let’s give it an input file. Let’s give it BYU text dot txt, and now we need an output file for it. And so you could say BYU counts dot txt. And let’s give that a try. So then it runs and we’ll see up here BYU counts dot txt just appeared.

[00:11:26] I can also use ls in my terminal, and there it is, BYU counts dot txt. We can look in there: 6, 5, 1, 0, 3, which is what we were hoping for. Yes.
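The same check can be sketched without a terminal by calling main on a temporary file. The run_sample helper and the underscore file names are assumptions for illustration, and the sample lines are typed in from the ones read aloud, with the second line spelled so that it counts to five as in the opening example.

```python
import os
import tempfile


def readlines(filename):
    with open(filename) as file:
        return file.readlines()


def writelines(filename, content):
    with open(filename, "w") as file:
        file.writelines(content)


def count_byu(line):
    total = 0
    for letter in line:
        if letter.lower() in "byu":
            total += 1
    return f"{total}\n"


def change_lines(lines):
    new_lines = []
    for line in lines:
        new_lines.append(count_byu(line))
    return new_lines


def main(input_file, output_file):
    lines = readlines(input_file)
    lines = change_lines(lines)
    writelines(output_file, lines)


def run_sample():
    # Hypothetical helper: writes the sample input, runs main, reads the result.
    sample = ["Your big book is ugly\n", "Yuba is beyond\n",
              "Yes\n", "No\n", "BYU\n"]
    with tempfile.TemporaryDirectory() as folder:
        input_file = os.path.join(folder, "byu_text.txt")
        output_file = os.path.join(folder, "byu_counts.txt")
        writelines(input_file, sample)
        main(input_file, output_file)
        return readlines(output_file)


print(run_sample())  # prints ['6\n', '5\n', '1\n', '0\n', '3\n']
```

Because main takes plain file names rather than reading sys.argv itself, it can be exercised this way with no command line involved.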

[00:11:37] Now, sometimes when you’re writing programs that use command-line arguments, you want to debug, right? If I just debug this up here and come up here and hit run, for example, or let’s go like this, say run count BYU, it’s going to try and it’s going to say list index out of range, because there are no input arguments when I click the button up here. There is a way to supply them, and you’re welcome to Google that and understand how to do it through the edit configurations button up here.

[00:12:15] But another simple way is that you can comment out your input line here and just hard-code some option. So I could say BYU text dot txt and BYU counts two dot txt, for example. And by hard-coding these in, now it’s not looking on the command line anymore for anything, and this is great. Now I could debug it, I could put some breakpoint in here.
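The hard-coding trick might look like the sketch below. The exact file names are assumptions based on how they are spoken, and main is reduced to a placeholder so the sketch runs without those files existing.

```python
import sys


def main(input_file, output_file):
    # Placeholder body (assumption); the real main reads, transforms, writes.
    print(f"would read {input_file} and write {output_file}")


if __name__ == "__main__":
    # Temporarily commented out so the IDE run button and debugger work:
    # main(sys.argv[1], sys.argv[2])
    # Hard-coded for debugging only; restore the line above before
    # running the tests or submitting.
    main("byu_text.txt", "byu_counts2.txt")
```

Leaving the original call as a comment right above the hard-coded one makes it harder to forget to switch back.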

[00:12:47] I could say go, load up the debugger for us. Great. And now it hits my break point and stops, and I can see here’s the lines, your big book is ugly, blah, blah, blah, blah blah, right? And then if I take a step to the next line, now lines shows my outputs and I can write that out. Let’s hit continue and it’ll finish and write out my new file.

[00:13:09] So that’s a great way, a simple way, to be able to run your code within PyCharm, usually because you want to use the debugger and the breakpoints. Just remember that you did that, because if you plug this into the autograder in Gradescope or you try to run the pytest tests, it’s going to fail here. You’ve hard-coded this now, the tests actually try to pass in different arguments, and you’re not looking at the arguments, you’re looking at a very specific file. So that’s not going to work.

[00:13:40] And so when you’re done debugging, just make sure you come back and change it back to using the command-line arguments before you run pytest or before you submit it online. With that, have fun reading and writing files!