Countdown to Jen's Dissertation Defense: April 1, 2005

Sunday, February 27, 2005

33 days left

nothing - preparing for MIT interview

Saturday, February 26, 2005

34 days left

nothing - writing slides for job talk

35 days left

Binary networks - incorporated Jim's changes to produce the final draft of this chapter.

Friday, February 25, 2005

36 days left

Future work - wrote 660 words, including one section on network structure and trust and the beginning of a section on FilmTrust.

Thursday, February 24, 2005

37 days left

Finished and sent the FilmTrust and continuous algorithms chapters.

Tuesday, February 22, 2005

38 days left

Chapter 4: Background

I think I finished reworking this chapter. Added about 600 words on game theory, did some reorganization.

Chapter 7: FilmTrust

I think this chapter is finally finished. I added in some text on coverage by my algorithm.

Chapter 6: Continuous algorithms

I also think this chapter is about done. I emailed Cai and Paolo again today to check my sanity on the algorithms.

39 days left

I assembled more of the front matter - table and figure lists, TOC, etc. Reformatted all of the chapters to the style required by the university. I did a preliminary check on the references to make sure everything mentioned in the chapters appears in the reference section. I added pieces to several chapters, cleaning them up. I ran some new experiments that will give me some coverage information for the FilmTrust chapter. I think FilmTrust will be done after that and a conclusion. I can't actually find another algorithm that does the trust recommendation like I am doing it. I need to search a bit more, but I may just wrap up that chapter without anything to do a real comparison against.

I need to work a bit more on the background chapter. Though it's basically done, I need to clean it up to add in a few things that I think are important.

I think I may actually be able to finish up the entire first draft by the end of this week.

Monday, February 21, 2005

40 days left

I put together the beginning of the front matter today, including the abstract. I also separated the references out of the individual chapters and put them in the separate section at the end.

Chapter 6: Continuous networks

Wrote 1600 words with pseudo code, algorithmic description, and complexity analysis as well as a description of why I could not do a direct comparison to other algorithms.

Sunday, February 20, 2005

41 days left

Chapter 7: FilmTrust

I didn't do too much today. I went through and cleaned up this chapter. Wrote only about 150 words. After Monday's usability subjects, this one will be totally ready to send off.

Chapter 6: Continuous algorithms

I didn't write any more about this, but I've done more thinking about the ranking issue of the other algorithms. As far as I can tell so far, there aren't a lot of options to get the ratings I want/need for a direct comparison. I may just have to explain this in the paper.

Saturday, February 19, 2005

Current Status

  1. Introduction - done
  2. Social Networks - done
  3. Trust - done
  4. Inferring Trust Background - done
  5. Binary Networks - done
  6. Continuous Networks - 6500 words, 22 ss pages. Only outstanding section is the last one comparing algorithms.
  7. FilmTrust - 4100 words, 12 ss pages. Basically finished. The last section may be expanded after I get a couple more usability subjects tested.
  8. TrustMail - done
  9. Conclusion - none
  10. Future work - 2600 words, 6 pages. Still a ways to go to lay out all of the future items.

42 days left

Chapter 7: FilmTrust

Conducted the usability study and got some good results and feedback. I wrote up the current results in about 600 words in the FilmTrust chapter. I may sit with a few more subjects on Monday, but essentially the chapter is about finished.

Chapter 6: Continuous algorithms

Found the output from Cai's algorithm. Like EigenTrust, it's giving me rankings rather than recommended ratings. I need to figure out how to handle this.

Chapter 10: Future work

Added a section on filtering statements on the semantic web with trust and provenance. Total addition of 2000 words and 5 pages to this chapter.

I also gave Jim copies of the previously sent chapters and got one back from him with minimal changes.

Friday, February 18, 2005

43 days left

Tomorrow, I conduct the usability study that will let me finish up the FilmTrust chapter.

Chapter 6: Continuous algorithms

I ran EigenTrust today on my data. I'm not sure how to deal with the output since it really doesn't give actual trust values. Instead, you just get a vector that allows you to make relative comparisons among nodes. I tried to do a rank correlation, and it's coming up low, but I'm not really sure how to incorporate that.
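For reference, a rank correlation of the kind mentioned above can be sketched in a few lines. This is only an illustration of Spearman's coefficient (Pearson correlation computed on ranks); the entry does not say which rank statistic was actually used, and the function names and sample numbers below are made up.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: global scores from a ranking algorithm versus the
# trust ratings one user actually assigned to the same five people.
global_scores = [0.31, 0.05, 0.22, 0.12, 0.30]
user_ratings = [9, 2, 7, 8, 6]
rho = spearman(global_scores, user_ratings)
```

A value of rho near 1 would mean the two orderings agree; a low value only says the orderings differ, which is part of why it is hard to turn into an accuracy claim about recommended ratings.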

Cai also sent me his code, which is based on a spreading activation model. It runs beautifully on the network and I just need to hear back from him on how to get the output.

Thursday, February 17, 2005

44 days left

Chapter 7: FilmTrust

Today I created the pre- and post-test questionnaires and tasks for my usability study on this chapter. I conducted the study on my first subject (Dan) and signed up 5 additional people for the study.

Chapter 6: Continuous algorithms

FINALLY I got EigenTrust working, converging to the left eigenvector, etc etc. It's great. Finally, I'm going to be able to do the comparison I've needed to do.
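As a rough picture of what "converging to the left eigenvector" means, here is a minimal power-iteration sketch in the spirit of EigenTrust. It is not the implementation described in this entry: the function name, the uniform starting vector, the damping weight `alpha`, and the sample matrix are all assumptions made for illustration.

```python
def eigentrust(local_trust, alpha=0.1, tol=1e-10, max_iter=1000):
    """Power iteration over a row-normalized trust matrix.

    local_trust[i][j] is how much node i trusts node j (assumed layout).
    With alpha=0 the fixed point is the left principal eigenvector of the
    normalized matrix; alpha > 0 mixes in a uniform vector each round.
    """
    n = len(local_trust)
    normalized = []
    for row in local_trust:
        clipped = [max(v, 0.0) for v in row]  # ignore negative trust
        total = sum(clipped)
        if total > 0:
            normalized.append([v / total for v in clipped])
        else:
            # A node that trusts nobody spreads its weight uniformly.
            normalized.append([1.0 / n] * n)

    uniform = [1.0 / n] * n
    scores = uniform[:]
    for _ in range(max_iter):
        updated = [
            (1 - alpha) * sum(normalized[i][j] * scores[i] for i in range(n))
            + alpha * uniform[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical three-node network in which nodes 1 and 2 both trust node 0 most.
scores = eigentrust([[0, 1, 1], [4, 0, 1], [4, 1, 0]])
```

The result is a vector that sums to 1, so it supports relative comparisons among nodes rather than giving a trust rating on the original scale.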

Wednesday, February 16, 2005

45 days left

Chapter 4: Background

Finished this chapter. Total is 6 pages, 2500 words. I deleted a big chunk that was there before because it didn't really relate. Chapter has been sent off to Jim.

Chapter 6: Continuous algorithms

I spent a few hours implementing EigenTrust (again). It still isn't working, but I think it's closer. I sent it off to the peeps at Stanford and hopefully I'll get some feedback from them tomorrow.

Tuesday, February 15, 2005

46 days left

Chapter 7: FilmTrust

Today I did a comparison of the trust-based accuracy with the accuracy of recommendations using an automated collaborative filtering algorithm based on Pearson correlations. The ACF followed the mean while the accuracy from trust stayed high. This is excellent because it shows that in the context of my example, trust outperforms the traditional methods. I am going to stop with this one example because it is a classic ACF algorithm, and my goal is not to show that I'm better than all of them, but that I'm not worse.
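For readers unfamiliar with Pearson-based ACF, a minimal sketch of one common formulation follows: the prediction is the user's own mean plus a similarity-weighted average of how far other raters deviate from their means. The entry does not say which variant was run, so the function names, the data layout, and the sample ratings are assumptions.

```python
from math import sqrt


def pearson_similarity(a, b):
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[item] for item in common) / len(common)
    mean_b = sum(b[item] for item in common) / len(common)
    numerator = sum((a[item] - mean_a) * (b[item] - mean_b) for item in common)
    denominator = sqrt(
        sum((a[item] - mean_a) ** 2 for item in common)
        * sum((b[item] - mean_b) ** 2 for item in common)
    )
    return numerator / denominator if denominator else 0.0


def predict_rating(ratings, user, item):
    """Predict `user`'s rating of `item` from correlated users who rated it.

    `ratings` maps user -> {item: rating}. Falls back to the user's own
    mean rating when no correlated rater is available.
    """
    own = ratings[user]
    own_mean = sum(own.values()) / len(own)
    weighted_sum = 0.0
    weight_total = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        weight = pearson_similarity(own, theirs)
        their_mean = sum(theirs.values()) / len(theirs)
        weighted_sum += weight * (theirs[item] - their_mean)
        weight_total += abs(weight)
    if weight_total == 0:
        return own_mean
    return own_mean + weighted_sum / weight_total


# Hypothetical ratings: "ann" has not rated film "m3", "bob" has.
sample = {
    "ann": {"m1": 4, "m2": 2},
    "bob": {"m1": 5, "m2": 3, "m3": 5},
}
prediction = predict_rating(sample, "ann", "m3")
```

When few users correlate strongly with the target user, the weighted term shrinks and the prediction drifts toward a mean, which is consistent with the "followed the mean" behavior noted above.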

Wrote 555 words to conclude that result and also write up the theoretical basis for the review ranking.

Chapter 6: Continuous Algorithms

Tried the correlation ACF for trust ratings. The basic run is not showing a significant difference between mine and theirs. I need to look a bit deeper at this.

Monday, February 14, 2005

47 days left

Chapter 7: FilmTrust

I finished reading all of the background papers on recommender systems. Frankly, the ones I read today didn't have much to offer, but it's good that I've read them. I did some experiments on rank correlation, but they didn't offer much and I don't know how useful that would have been anyway. I analyzed the database and, to complete the FilmTrust chapter, I need more reviews. I emailed all the members about that and hopefully I'll get more data from that.

I also found code for one of the better recommender systems that I read about. I've struggled trying to get the MySQL database installed and running on my computer. That will probably be one of tomorrow's projects since it will allow me to complete one of the comparisons I'd like to have.

Sunday, February 13, 2005

48 days left

Chapter 7: FilmTrust

Wrote just over 500 words on the previous research section. I also found some papers related to another movie recommender system. They allegedly have a web service that I'd like to try out for comparison. If not, I should be able to implement their algorithms relatively easily.

Saturday, February 12, 2005

49 days left

Chapter 1: Introduction

Finished. Sent to Jim. It's a short one - 4 pages, 1500 words, but it needed to be done and it is.

Chapter 8: TrustMail

Done. FINALLY. This chapter has been sitting for days waiting for only a few numbers because I was rushing to run the code and it was running with errors. I finally figured out a way to randomly select some points to make it more efficient while still giving pretty good results. I corrected all the errors after a seemingly endless series of stupid bugs, and got the results I was looking for. I also sent this one off to Jim tonight.

At this point, I have 5 chapters finished. Of the ones remaining, chapter 4 (background), 9 (conclusion), and 10 (future work) are just writing - they don't require any real intellectual work. Chapter 6 (continuous networks) is waiting for a comparison with other implementations. Chapter 7 (FilmTrust) is waiting for a user study on the effect of ordering reviews. In reality, if it came down to it, I think the dissertation would pass even without the parts missing from chapter 6 and 7. I'm going to finish them, but if I were pressed, I could have a draft of the whole thing done in a week. This is a great relief because I know now that it's not an issue to get everything completed in time.

Current Status

  1. Introduction - done at 4 pages (single spaced) and 1560 words. Sent to Jim today.
  2. Social networks - done
  3. Trust - done
  4. Background - 5 pages, 2000 words. P2P section is complete, but I need more on existing algorithms, recommenders, etc
  5. Binary Trust - done
  6. Continuous Trust - 22 pages, 6500 words. The only missing section is a comparison of my algorithms with others. Need to implement the others.
  7. FilmTrust - 8 pages, 2200 words. Only section left is on sorted reviews. Part of that can be written with a justification, but I'm thinking a user study with their feedback would also be useful for this chapter
  8. TrustMail - done (6 pages, 2300 words). Sent to Jim today. Finally.
  9. Conclusion - none
  10. Future Work - still 2 pages, 489 words. Needs more sections.

Friday, February 11, 2005

50 days left

Chapter 7: FilmTrust

Finally some progress. I cracked the thick-shelled nut of the recommended ratings this afternoon. I don't know why it was giving me problems because it works exactly like I expected it to. I can show that as the user diverges from the average tastes, the recommended rating stays close to their opinions (much closer than the average). I also have tweaked the implementation to correspond with what I discovered, and it seems to be working quite well.

I wrote all of this up to essentially complete the section on recommended ratings. That is 5 pages, 1500 words. Total for the chapter now is 8 pages, 2200 words. Previous research and the section on ranked reviews still to go.

The ranked reviews section can be started since there is a firm justification for it. However, this is where I think I need to conduct the user study. I started assembling questions for that today, and also emailed Ben to set up an appointment to talk about it.

It's a great relief to have figured this all out.

Chapter 8: TrustMail

Yes. There was another error. My code completed, I wrote the new code to do the last calculation. It was backwards. So yesterday I started it running again, forwards, but I was not checking a case that I should have and that DRASTICALLY increased the run time such that it didn't finish and wouldn't have finished for a long time. So again I had to stop it, and it's running now. All this for 3 stupid numbers that don't actually matter that much in the first place.

Chapter 6: Continuous Trust

It's slowed enough that I'm putting the title next to the chapter number again. I got an email from the guys at Stanford and another trust implementation guy I emailed. Haven't worked up the EigenTrust code based on this new email, but it makes sense (I think). The other algorithm should be coming soon and I expect in a form I can use. That will be great.

Thursday, February 10, 2005

51 days left

Chapter 4: Background and related work

I completed the section on P2P trust applications, including the citations needed for that. That only amounted to 100 words or so, but it required some reading and at least now that section is finished.

Chapter 8: TrustMail

The code ran to completion, looked nice, and then I realized that I computed things backwards. Yet again, I need to re-run the code. It's completing in about 12 hours, though, so I really honestly should be able to finish the chapter tomorrow. I did add to the previous work section to cite Pattie Maes's work on email filtering (she's my host at the MIT Media Lab in a few weeks). About another 100 words here.

Chapter 6

Well, no luck with the EigenTrust implementation yet. I certainly have enough to explain why it won't work for these sorts of tasks, but really a working implementation would be a lot better. In a couple days I'll give this another shot, but there is just some information I'm missing to make it work.

Richardson et al. is also proving difficult. *MY* papers included pseudocode. Producing a working implementation from these papers is not easy, and I would ♥ some pseudocode to work from.

Wednesday, February 09, 2005

52 days left

This dissertation is doing things to me. Bad things. I've had to start wearing an ace bandage on my right wrist because it's so so sore from all the typing. And yesterday, I got into the shower without remembering to fully undress. I couldn't figure out why I felt so weird until I realized that I still had my underwear on. This can't be a good sign.

Chapter 8: TrustMail

Last night's simulations had a bug. They ran for 22 hours but were missing one of the two data points I needed. I have to re-run them and, unfortunately, the bug-fix has created a major slow-down. So it will probably take a few days to get this done. Once it's complete, I can plug in the numbers and ship it off to Jim.

Chapter 6

I realize I need to compare to some other algorithms here. I unsuccessfully tried to code a couple things and ended up emailing the authors of those projects to get ahold of their code or to send them my network so they can run it through their stuff. That will save me time, and also (and more importantly) prevent errors if I were to mis-code or misunderstand what is part of their work.

Chapter 7: FilmTrust

I did the lit search and pulled out a bunch of articles on movies, recommender systems, and collaborative filtering. Perhaps the future-work section of this chapter will get written tomorrow.

Chapter 1: Introduction

I had to include some writing in the day's work, so I started on the introduction. It's not done, but almost everything has a good start. I wrote 1161 words, to be precise.

Tuesday, February 08, 2005

53 days left

Chapter 8: TrustMail

The chapter is basically finished. All the writing is done and I'm just waiting for the numbers to come out of the code that's running. The pace of the code I started earlier means it will take a couple days. I started another one just now that should be quicker, and a check in the morning will let me know which is progressing faster. In any case, I just have to plug in the resulting numbers and this chapter will be finished. It's a short one (only about 5 pages single spaced), but it's good to get it crossed off the list.

Monday, February 07, 2005

54 days left

Chapter 8: TrustMail

Wrote just over 600 words in the results section. Wrote the code to extract and build the social network from the Enron email corpus. Created the database for the corpus. Wrote code to calculate the percentage of senders who could be identified in a trust network. That code is running now and should be done in the morning. That will let me finish up the writing of that chapter. I'm also keeping track of pairs and how the path was found so, tomorrow, I can also compute the percentage of messages that would be tagged.
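The "percentage of senders who could be identified in a trust network" calculation can be pictured with a small breadth-first search. This is only a sketch under stated assumptions, not the code that was running that night: the adjacency-dict layout, the function names, the `max_depth` parameter, and the toy network are all hypothetical.

```python
from collections import deque


def reachable(graph, source, max_depth=None):
    """Return every node reachable from `source`, optionally within max_depth hops.

    `graph` maps a person to the people they are directly connected to
    (an assumed representation of the extracted social network).
    """
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(source)
    return seen


def percent_identified(graph, recipient, senders, max_depth=None):
    """Percentage of distinct senders connected to `recipient` by some path."""
    distinct = set(senders)
    if not distinct:
        return 0.0
    known = reachable(graph, recipient, max_depth)
    return 100.0 * len(distinct & known) / len(distinct)


# Hypothetical network: a knows b, b knows c, and d is a stranger.
network = {"a": ["b"], "b": ["c"]}
close_only = percent_identified(network, "a", ["b", "c", "d"], max_depth=1)
extended = percent_identified(network, "a", ["b", "c", "d"])
```

Setting `max_depth=1` counts only direct contacts, while leaving it unset follows paths of any length, which is one way to contrast a close social network filter with an extended one.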

Sunday, February 06, 2005

55 days left

Chapter 6

I'm a bit stuck on how to progress with the chapter. I've completed everything to justify and describe the development of the algorithm. Within that I've presented the numbers on accuracy. I'm not sure what to do next - do I have to implement these other algorithms and do a comparison? Can I just wrap it up and show how it applies? I'm not convinced that it is a significant enough contribution to just present the algorithm and give its actual accuracy. I have to show an improvement or something. Because I don't have a clear plan, I was not able to make any real progress here today.

Chapter 8: TrustMail

I figured out that the analysis I want to do on this is to compare how many messages are caught with a close social network filter and how many are caught with an extended social network filter. I am going to use the Enron email repository for this, and I downloaded that (150 users, about 5GB of messages) tonight. Those 150 users are the Enron employees, but there should be many more *total* users because the employees received and sent email to people outside the company. Hopefully the social network there won't be too dense. If I can show that with an extended social network a greater percentage of messages are caught, then the analysis from chapter 6 will lead to a reasonable conclusion that that percentage of messages will also be rated accurately enough to give the user a benefit.

The obvious next step will be to extract the social network from these messages, into a *much* smaller data set.

Both FilmTrust and TrustMail rely on the assumption that sorting what the user sees according to trust rating is a good thing. I need to prove that somewhere. Since I am not conducting a user study on TrustMail, I will have to devise a test to administer to FilmTrust users to see if they see a benefit from having the reviews with higher ratings appear more prominently.

Saturday, February 05, 2005

56 days left

Chapter 6

Wrote 1000 words. Completed all of the background to the algorithm with statistical analyses. All but the pseudocode for the algorithmic description itself. Currently at 24 pages, 6400 words.

Friday, February 04, 2005

57 days left

Chapter 6

Limited writing due to another afternoon spent at NSA. ~175 words in the chapter. I re-performed my experiments that I did yesterday and now have correct data. I've started writing this in the chapter, done the statistical analysis on the data points, and am ready to write them. Tomorrow should be very productive since I don't have any scheduled distractions.

Thursday, February 03, 2005

58 days left

As predicted, a full afternoon at NSA followed by a full night teaching class led to a very unproductive day.

Chapter 6

The only things I managed to get done were some experiments to compare the original algorithm to a new variation. On my larger network, the new variation performed significantly better. On the FilmTrust network, there was no significant difference. That may be ok as a claim that some networks will benefit. I need to think more about this one.

Hopefully, I'll get more done tomorrow.

Wednesday, February 02, 2005

59 days left

Today, I spent most of the day working on an overdue chapter on terrorism and the semantic web. I completed about half of that. It came pretty easily, but it still took time to write it. I also wrote the slides for the talk I have to give tomorrow.

I am starting to feel burnt out with this chapter. It's very important, but I've been working on it 14 hours a day for about a week. I've been working 14 hours a day in one form or another for a long time, and the hours aren't really the problem. The dissertation writing, though, has required such intense focus. My usual work cycle involves switching between several active projects, so when I get fatigued on one, I just move to the next. With this, I can't do that, and the more I progress, the less I can even switch between chapters.

Chapter 6

Started the section about the algorithm, wrote 340 words. I also wrote some code to test a few variations on the algorithm that I'm writing about. I found a couple of good results to strengthen what I'm doing.

Tuesday, February 01, 2005

Current Status - 2/1/05

  1. Introduction - nothing (2 days)
  2. Social Networks - Done. Sent to Jim
  3. Trust - Done. Sent to Jim
  4. Background - 4 pages, 1780 words complete. Spotty, but not a lot of thought work to finish.
  5. Binary Networks - Done. Comments back from Jim.
  6. Continuous Networks - 17 pages, 4640 words done. Complete introduction, background and properties of experimental networks, properties and behavior of trust in social networks. To do: presentation of the algorithm, analysis of its performance on networks, potentially a comparison to other algorithms on the same networks. (2-3 days of work until first draft unless the comparison is done. Then, 5-6 days)
  7. FilmTrust - 4 pages, 748 words. Complete background on website. To do: Introduction and related work, analysis of algorithms on the website, comparison to other recommender system algorithms. (5-6 days of work until first draft).
  8. TrustMail - 3 pages, 1309 words. Background and introduction, description of the software complete. To do: some sort of analysis of the results. (3-4 days of work to come up with analysis method, perform it, and write it up).
  9. Conclusions - nothing. (2 days)
  10. Future work - 2 pages, 489 words. MOAM completed. To do: write up of other applications; terrorism, expert social network, possibly filtering of statements. (2-3 days of work to get basic introductions to topics, 3-4 to present anything resulty).

Those day estimates add up to 24. That does not include time for revisions with Jim and days spent on work outside of the dissertation. So far, it looks achievable. In 24 days, we'll be down to 35 days to defense, and 14 days to when the dissertation needs to go to committee. With parallel editing, that should be achievable.

60 days left

Chapter 2: Social Networks

The journal paper version was sent back without review from Social Networks because it is apparently too basic for that audience. I'll have to look for a computer science outlet and present it in more of a web context.

Chapter 6: Blah blah blah

Finally, I've finished the analysis of trust and length and their effect on accuracy. I wrote 1250 words today to complete this section, along with a lot of code to generate some big tables of data. This will be the longest chapter, without a doubt. It's already 17 pages and I haven't presented the algorithm or the results from running it yet. This has the potential to be a good journal article, except that I think it will be too long even for that.

Tomorrow I may have to dedicate to other things that need to be done - an overdue book chapter, grading assignments, and writing the talk I have to give Wednesday. All of this is time sensitive, but hopefully it will only take up the day part of my day, leaving me the night part of my day for more dissertation work.