This semester is blowing by fast. For Digital Humanities, I'm a little behind schedule. I am still trying to work through Python and understand it.
I am having the same trouble with the Command Line, since the two are related.
I could follow the steps in the Python tutorial pretty well, but was having
trouble using my own Command Line. I am still having trouble wrapping my head
around the whole process (and even, to an extent, the purpose) of the Command Line and Python, as well as understanding how
to run and use them. However, after reading about and discussing text mining and topic
modeling, I can better see what Python and the Command Line are actually for, at least in that one respect.
Especially after this week’s topic modeling readings, I could see that Python can be used to clean up text that OCR pulls out of scanned documents. At the beginning of the month, I was less sure about how this was
going to fit into the course and what I was supposed to use it for. Now I can
see it’s for cleaning up the text from PDFs so it can be mined, which is cool. I have not
finished the whole tutorial yet, but I had to come back to it after working with
Voyant last week in order to get ready to work with Mallet. I haven’t started on Mallet
yet, though. I did the readings for topic modeling, so I feel I know, at least
in theory, what I’ll be working with and for. I have not programmed the OCR yet
either, which I will need to do before I start Mallet, at least if I am
going to try using my own PDFs (although I will probably end up using stock
ones or finding full-text URLs that have already been cleaned to use instead of
my PDFs, which would first need OCR and cleanup). In all, I’m behind, but I will get
there, and I feel I know more about the things we are supposed to be doing. Even
though I'm really struggling to understand how to actually do
these things step by step, I feel I conceptually understand what the
programs are and what they are supposed to do. I feel I have progressed in
general understanding, which is a comfort.