April update: machine learning and nature

March was just sort of a not-great month. My grandmother passed away and plans just sort of slid to the side — nothing really derailed, it just slid. Oh well.

April has been more productive overall but is going really fast. The academic year always seems to force an increased energy in April as May will bring final exams, final senior projects, a big talk for me, and several holiday-like events! We’re also gearing up at MCFAM for the summer machine learning camp for high school students. This involves advertising the camp, soliciting more funding, and starting to recruit speakers from local companies and colleges. If you’re interested in speaking or giving us $$$, email me! $1000 funds an instructor for a week; $500 funds an undergraduate TA for a week; and the students and instructors who teach this camp will be able to take pretty specialized knowledge to other educational settings to share with more students, so it’s really a good investment.

Back to the personal: this winter I spent a lot of time thinking about nature. Our backyard is not looking great anymore — trees have grown and turned sunny areas into shady areas, and the addition of a small person to our family means that we need to change how we do outdoor work and play. So I started to plan a garden. Time to jump ship now ’cause this is going to get really really long.

As I took an inventory of what we actually have, I realized that almost nothing in our landscaping was actually that pollinator-friendly! I had never realized that. Flowers mean bees are happy, right? Wrong. A lot of the plants we had are not that useful to our bees. They’re not native, they’re not helpful non-natives. Since in the wintertime you can do nothing for the garden but contemplate, I contemplated and read my way through several local library systems.

Gardeners are apparently really fanatical. I had no idea. Did you know there’s a New Perennial Movement? Did you know that there are heated discussions (and a little bit of research) about cultivars of native plants? Cultivars are the prettier versions, the Instagram+Photoshop versions, of regular plant species. They’re selected for attractiveness and growth habit. They often also end up being sterile or having a weird shape that bees can’t get into, for instance.

My spouse picked up “The Know-Maintenance Perennial Garden” and “The No-Work Garden” from the library because that’s what we want, a lazy garden. First I read the No Work Garden book because the author’s name was Bob Flowerdew and how can one get more qualified. He had many smart and snarky British remarks that exposed my gardening bad habits and ignorance. It was funny. He pointed out that growing vegetables with small children is for the delusional if you’re picking up a book called the No Work Garden — you should just plant fruit trees and berry bushes and call it good. That man is right.

I ignored the Know-Maintenance Garden at first because it seemed a little weird with all these grid layouts in the back and all this stuff about grasses. But it ended up being hands down the most provocative book I got, even without snarky Britishisms. Roy Diblik talked about how plants live together: if you’ve got a plant with deep roots and a plant with rhizomes next to each other, they aren’t competing for the same water source; depending on the growth habits of neighbor plants you might get a sprawling mess or you might get a nice thick set of mutually supporting stems. He talked about the ecology of seasonal succession, too. I never thought about plants’ roots and growth habit and neighborliness before, or how, if you know these things, you can plan a garden that basically won’t need maintenance for three to five years. Wow!

Then I followed some names from the Know-Maintenance guy’s recommendations and ended up learning about Piet Oudolf, who has some kick-ass gardens that blew my mind. I did not know that gardens could be that way.

Piet Oudolf and Roy Diblik use a lot of prairie plants in their gardens, and as I mentioned Diblik had started me thinking about the ecology of plant communities — not just thinking about gardens as a few flowers, but as complex ecosystems. This is Minnesota, so I checked out local companies like Prairie Moon Nursery and Prairie Restorations. There is a lot of advice for prairie restoration projects. However, (1) we’ve got a minuscule back yard, (2) everyone says it’ll take three years to prep the ground to get a prairie (first you need to kill everything in the soil), (3) unfortunately part of the point is to render the yard acceptable to the urban eye and not have the city called on us, and (4) the rest of the point is to not work so hard! Ugh. So. No prairie restoration. The prairie-inspired edits of Diblik and Oudolf still appealed to me, but again, tiny tiny backyard. There is nothing evoking the wide-open prairie in my 11-foot-deep backyard and there are tons of shrubs and trees. Wait…… trees and shrubs occur in nature. Why isn’t there a shrubland edit?

At that point I stumbled on “Planting in a Post-Wild World” which — hooray! — covered prairies, shrubland/woodland communities, and forest. Yes, these folks thought about getting inspiration from the forest edge, meadows, shrubby savannahs, etc. That’s what I need. But I don’t see any amazing Instagramming award-winning multi-million-dollar-garden-establishing shrubland garden designers. Ideas?

(I told you to bail a while ago. Don’t know why you’re still reading.)

To bring this extended monologue to the current day, I’ll wrap up by saying that throughout this, I’ve stayed interested in how we use plants for food, medicine, and fragrance. Maybe it’s for another day to list out Original Local, Buffalo Bird Woman’s book, A Taste of Heritage which I picked up while visiting the Crow Nation, The Sioux Chef, and all the other amazing books and resources that have been helping me understand the Native American perspective on our local ecology, as well as some eclectic reading on European-American approaches to herbalism. There’s a ton to learn at this intersection of science, complex systems, dinnertime, nutrition and health, backyard play, fragrance, and spirituality. I’ll just point out that I’m going to rip up my hostas and sauté them for dinner, I’ll work on extirpating my garlic mustard by making it into pesto, and whenever the Virginia waterleaf starts to annoy me I’m gonna eat that too. And I’ll be able to do any of that before my attempted shrubland masterpiece even gets established. See you in the backyard.

Data Management

This post is for my senior project and research students (undergrads and grads at the University of Minnesota). Reproducibility of research is very important to me. If we’re writing a paper or doing a project together, every member of the project should be able to verify the results and present the project to others. As you go on in your career and build skills and trust with collaborators, you may end up in collaborations where you each contribute skills that the others don’t have. My collaborations with people in public health come to mind, for instance, because I don’t have their years of experience in their field and they have a different set of math and stats skills than I do. But the point of the current collaborations with undergrads and master’s students is that we all can take ownership of the work!

Data management is a huge part of reproducibility and it will serve you well later. Maybe five years from now you’ll start a project and you’ll think, huh, I think I could use some ideas from that paper I did during my master’s program… and then you won’t be able to find the files, or won’t be able to open them, or they won’t make any sense (what does KHUIG mean as a column header again?!!). Plan ahead and avoid that!

What files are important?
• Code that you are using for the project
• The input files for your final analysis
• A data dictionary for the input files. What do all those mysterious column names mean? It may be obvious to you now, but in five years the meaning of GVKEY may have slipped your memory.
• Drafts of the paper, in some cases
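A data dictionary can be as simple as a table of column names and descriptions kept next to the input files. Here is a minimal sketch in Python of checking an input file against one. The dictionary entries, the sample data, and the function name `undocumented_columns` are all made up for illustration, so adapt them to your own project.

```python
import csv
import io

# Hypothetical data dictionary: column name -> plain-language description.
# The descriptions here are illustrative assumptions, not from a real dataset.
DATA_DICTIONARY = {
    "GVKEY": "Company identifier (assumed meaning, for illustration)",
    "date": "Observation date, YYYY-MM-DD",
    "log_return": "Natural log of the price ratio between consecutive dates",
}

def undocumented_columns(csv_text, dictionary):
    """Return the column headers in csv_text that the dictionary does not describe."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    return [name for name in header if name not in dictionary]

# Made-up sample input with one mysterious column.
sample = "GVKEY,date,log_return,KHUIG\n1234,2019-04-01,0.01,7\n"
missing = undocumented_columns(sample, DATA_DICTIONARY)
print(missing)
```

If the list that comes back is not empty, those are the columns to document now, while you still remember what they mean.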

What format should files be in?
• Code files can be in various formats, but one should be able to open them with a text editor
• Written work should be in .tex or .txt or maybe a Word format….

Where do files go?
• Less good but OK: a Google Drive folder. Why is this less good? It doesn’t have dynamic updating — you need to update your work manually.
• More good but not best: a Dropbox folder. There is versioning and automatic updating.
• Best: Github, either at github.umn.edu or github.com. Github.com can only be used for data that is not owned by the University — in case of a data breach I don’t want to be responsible for theft of intellectual property. Github.umn.edu can be used for data that belongs to the University; by using the U’s service for the U’s data, we’re trusting that the U will take appropriate security measures.

Why else is Github best? You’re using version control, you have a history of your work, you can share with others easily, it’s a transferable skill that is valuable to many employers. If you have a public Github repository you can show your skills to others easily by including it on your resume and LinkedIn page, or in your profile if you give a talk someplace.

“My collaborator sent me this code and I can’t run anything. I don’t think they are incompetent — they said it worked on their machine. What’s the problem?”
• The problem is probably that not all the information about packages and versions was shared!
• For instance, Python 3 differs significantly from Python 2.7: functions like map and range changed behavior (in Python 3 they return lazy objects rather than lists).
• Many of the packages I/we are using in research are pretty new, too: the TDAstats package for R and the kepler-mapper package for Python both have had changes just during the time I’ve been working on projects with them.
• For true reproducibility, it would be best to actually package up a virtual environment in Docker (there are some other ways, too). This is called containerization — watch for it in job ads! If you want a career in a coding-intensive field, like some varieties of data science or machine learning, let me know, because it is a good way to do things but requires a bit of learning.
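Short of a full container, one lightweight step is to record the Python version and package versions alongside your results. The sketch below is one possible way to do that with the standard library (`importlib.metadata`, available in Python 3.8 and later); the function name `environment_snapshot` is hypothetical, and I am assuming kepler-mapper is installed under the distribution name `kmapper`. It also shows the map/range behavior mentioned above.

```python
import platform
from importlib import metadata

def environment_snapshot(package_names):
    """Collect the Python version and the installed versions of the named packages."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap rather than failing, so the snapshot is always written.
            versions[name] = "not installed"
    return {"python": platform.python_version(), "packages": versions}

# In Python 3, map and range return lazy objects; in Python 2.7 both returned lists.
squares = map(lambda x: x * x, range(3))
squares_is_list = isinstance(squares, list)  # False under Python 3
squares_list = list(squares)                 # force evaluation to get a real list

# "kmapper" is an assumed distribution name; the second name is deliberately fake.
snapshot = environment_snapshot(["kmapper", "definitely-not-a-real-package-xyz"])
print(snapshot)
```

Saving the printed snapshot in the project repository gives a collaborator a starting point for rebuilding the same environment, though it is not a substitute for a container.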

That’s it for now. We can talk in person about how you’ll implement this for your project.


The annual reflection on plagiarism

Plagiarism always prompts an annual ritual of reflection for me. I’m lucky it’s basically annual; it could be twice a year, since there are two semesters, but it’s really a December problem. Why? No one dares plagiarise with me in May 🙂

In December, though, people don’t know me well enough yet. They think I won’t notice that the words they typed don’t answer the question I asked in the assignment, or they think it’s the right way to do things (?!) despite all the conversation to the contrary in September and late November, or they think I don’t actually read what they write. In December, it’s dark and students want to get out. They’re nervous and doubt themselves and their writing skills. They realize approximately 36 hours before the assignment is due that writing math is not like writing English, and they fall back on the sentences of someone in the first 10 Google search results.

I’m sure I don’t catch everyone and everything. But I have an ear for language: I can tell when someone’s writing like they talk, and when they’re not, and I talk to every student I’ve got. (It’s a luxury to have <40 students a semester and be able to learn names and talk to every one!) I notice when the cadence changes, when the vocabulary shifts. Every year there are sentences that I’m sure, sure, a student didn’t write themselves, but I can’t find that sentence out in the world. And every year there are sentences that I’m sure a student didn’t write and that come up in the sources they cited or in the sources I find that they didn’t cite.

(Yes, I spot-check citations. I’m not perfect — in one paper, I thought to myself, “Wow, it’s really interesting that someone named Nakamoto wrote about linear regression *and* about Bitcoin!” and not until the next read did I realize this student was sprinkling citations randomly throughout her paper. Really randomly. It’s as if citations were sprinkles on a cupcake — they make the cupcake look good & it’s the density that matters, not the placement. This student cited Nakamoto’s paper on Bitcoin to justify her use of linear regression, an article about the internet of things to bolster the discussion of currency exchange rates, a review of a book about statistics that included linear regression in the name to support her equation for log returns, although maybe by now I’m permuting things myself. The references were often relevant to the paper; they were just randomly sprinkled around like decoration and after four rounds of revision I didn’t really force any improvement….)

Anyhow, back to plagiarism. There’s almost always a reason given. “I was nervous, I didn’t know, I thought their wording was better, I figured since they were published they were right.”

I try to address all these concerns beforehand. “Send me drafts & I’ll pre-grade it! I’ll help you with wording! It’s ok if it’s not perfect! Writing math is hard — I expect it to be awkward when you’re learning!”

Most students believe me, at least eventually. Many send me their drafts & I’ve caught some plagiarism or near-plagiarism there, and if it’s in a draft I just let the student know & they re-write it & it’s fine! Learning experience for all, round of applause.

But there’s always someone who doesn’t believe me. No drafts or as few as possible. The desperate hope, as far as I can tell, that if the work is turned in at the last minute I’ll just give it an A without looking closely. And then usually the cascade of emotions afterward, when I give the work a poor grade.

argh.

PEOPLE! In general, I hate grading with the passion of at least ten suns. I’d like to poke a fork in my eye every time I’m marking exams, ok? My whiskey consumption goes up slightly at equidistributed intervals during the semester, corresponding to midterms and other exams — it’s always one problem that does me in, really, the rest’s not so bad. Grading makes me see in acute relief what I should have taught differently, spent more time on, etc.

But reading student writing, engaging in a dialogue and finding out what students have learned and thought and discovered? That’s actually interesting and fun and enjoyable, if students actually learned and thought and discovered! Basically the only reasons I’m still in academia are the joy of discovery and the dialogue through the written word. I love interacting with people through writing, and that includes editing and feedback and then discussion afterward. I’m far from perfect. I’m not always a consistent editor and I’m not always the most careful reader. But I really do engage with my students’ papers and the work I do with coauthors and colleagues. I like it, it’s that simple. And so as I’m having a conversation with your paper, I’m going to have questions and go search for them. I’m going to listen to the rhythm of your voice and see what it has to say both through what’s said and what’s not said. I’m going to notice if you don’t answer my questions, at least if you don’t distract me with something more interesting. And I’m going to notice if it’s not you talking to me, or if it’s not you talking to me.

I’d rather have your crappy English-math sentences than what you think of as someone else’s perfect reflection. Their reflection probably isn’t that good, anyway, and I’m not interested in anyone’s thoughts but yours!

Writing math articles: be a good leader

The end of the semester is upon me (and many others). It’s time for editing papers and reviewing lots of things, as well as working on my own writing. I’ve got three senior project students, three other research groups, and my own papers-in-progress. Some themes recur.

Big analogy: You’re waltzing or salsa-ing at a delightful party. Pick the music to suit your paper.

  • The paper author is the leader in this dance. A good leader shifts pressure back and forth from the base of the palm of the hand to the fingers, and up and down, to tell the following dancer what he intends, three-quarters of a beat before it is to occur. I’m not a great dancer, but when I am partnered with a great lead, I know exactly what’s coming and I can do my best to execute — and I end up looking like a good dancer! In the math paper, tell the reader what move is coming. It is ok to say, “Now I’ll define factorials, which are the number of ways n objects can be arranged in order.” If the reader knows what is coming, she can prepare, and she’ll feel successful!
  • Build trust with your reader/follower before asking for faith. Before you can carry out fancy moves, you need trust with your reader/dance follower. Don’t ask the reader to read three pages of complicated stuff without telling them where you’re going. Do tell them explicitly where they’re going first. Don’t confuse motivation with explicit signals like this — telling your dance partner, “This will be great in an hour!” is not the same as telling your dance partner, “You will twirl and then we’ll go backwards for a measure.” The second builds trust. The first sounds good, but the reader may not trust you yet.
  • The number of words you write is proportional to the faith you ask. Reading takes time, as does dancing, and time is something we can’t recover. If I’m not dancing with the person I’m in love with, I like best dancing with someone who alternates between fancy moves & relaxing bits. Dancing salsa with someone who never does turns or twirls is boring. Box stepping through an entire waltz is boring. Too many words without math ideas is boring! One easy way to start dealing with this is deleting words if they aren’t necessary. “In some sense these are dual to one another, where we define dual to mean…” -> “These are dual to each other: define dual as…”  “We will present a definition…” -> “We define…” It helps you get to the fancy moves faster 🙂
  • It’s a party! Introduce your guest to everyone! References and citations are not just for information you used to do the math. They’re also resources for the reader if she’d like to learn more or follow up on a tangent or know who else worked on this. So introduce your reader to everyone else at the party! and use names! Rather than, “See [1],” it’s so much more hospitable to say, “To read more about algebraic varieties, see the excellent book by Smith, “An Invitation to Algebraic Geometry.””
  •  Seriously, introduce your guest. Here are the people I hang out with (mathematically) all the time and their related papers. Here are the folks I totally disagree with in approach, but we talk about the same math. Here are the people who did the original work on this problem — it was 40 years ago but they can still dance! Here are some good additional resources. Besides being hospitable, references show that you have a command of the field, they make other authors feel good (someone read my paper?!), they allow the reader to potentially notice that there’s a set of articles on the same topic but in math physics that you never heard of and you should really get introduced…!

Ok, that’s it for now. I’m by no means a perfect writer, but it’s a skill I work on regularly. Good luck!

(Yes, I know my dance example here is super heteronormative, but trying to do something clever about it would not help the reader or the message…)