Thanks for providing a clue and example of what to do with all these lovely *.json files after we've captured them. I wouldn't call those archives nice or ideal from a subject finding / rereading standpoint, but at least they work, and it doesn't take a lot of effort! Maybe better archive display strategies will emerge after the Yahoo debacle has been over with for a bit.
Update2: Okay! I just discovered that this project has been archived and that yahoo-group-archiver is being recommended (which seems to be doing the trick) https://github.com/IgnoredAmbience/yahoo-group-archiver
Update: Turns out all was working. The 404s were on account of the first few hundred messages being deleted by hand (it was a very contentious group in the early days ;-). Now I'm locked out with a 500, though I am still able to read from my browser. Will let some time pass and try again. Thanks!
I got this far. I set the cookies in archive_group.py
I wonder if the problem could be related to this group being left over from the OLD version of Yahoo Groups? The URL format for messages in my (private) group is: https://groups.yahoo.com/neo/groups/NAMEOFMYGROUP/conversations/messages/1754 Is there some way I could reformat the request?
Archiving group 'nameofmygroup', mode: update , on Sun Dec 1 23:22:55 2019
Archiving message 1 of 1759
Cannot get message 1, attempt 1 of 3 due to HTTP status code 404
Cannot get message 1, attempt 2 of 3 due to HTTP status code 404
Cannot get message 1, attempt 3 of 3 due to HTTP status code 404
Failed to retrive message 1 due to HTTP status code 404
Archiving message 2 of 1759
Cannot get message 2, attempt 1 of 3 due to HTTP status code 404
Cannot get message 2, attempt 2 of 3 due to HTTP status code 404
Cannot get message 2, attempt 3 of 3 due to HTTP status code 404
Failed to retrive message 2 due to HTTP status code 404
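For anyone puzzling over a log like the one above: retrying may not help when a message has been deleted, since the 404 will just repeat. As a rough illustration only (this is not the code in archive_group.py; `fetch_with_retries` and the `fetch` callable are hypothetical names), a retry loop could give up immediately on a 404 and retry only other failures:

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical fetcher type: takes a message number, returns (status_code, body).
Fetcher = Callable[[int], Tuple[int, str]]


def fetch_with_retries(
    fetch: Fetcher,
    message_number: int,
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[str]:
    """Return the message body, or None if the message could not be retrieved."""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch(message_number)
        if status == 200:
            return body
        if status == 404:
            # Assumption: a 404 means the message no longer exists,
            # so further attempts would return the same result.
            print(f"Message {message_number} not found (404), skipping")
            return None
        print(
            f"Cannot get message {message_number}, attempt {attempt} "
            f"of {max_attempts} due to HTTP status code {status}"
        )
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None
```

Passing the fetcher in as a function keeps the sketch independent of any particular HTTP library, so the retry policy can be tried out without touching the network.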
I'm getting a fail and traceback — I imagine there are python modules I don't have loaded that are required?
Traceback (most recent call last):
File "archive_group.py", line 25, in <module>
import requests #required for fetching the raw messages
ImportError: No module named requests
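The traceback means the third-party requests package is not installed for the Python interpreter running the script. A quick way to check for missing modules before starting, sketched here assuming Python 3 (the helper name `missing_modules` is made up for illustration):

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["requests"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available")
```

If it reports requests as missing, installing it with pip (for example `python -m pip install requests`) is the usual fix.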
If you're not technical and need a free, easy to use solution, consider PG Offline.
http://www.personalgroupware.com/downloads.htm
The software has become effectively free.
It has a 14 day trial after which you can no longer download from YG! but you can still read and search the downloaded archive.
So, if you can DL your group in 14 days, you’ve won a watch haven’t you?
The lack of a download facility after the group has been deleted by Yahoo isn’t going to be an issue.
Cheers,
Wilson.
On December 14th Yahoo will shut down Yahoo Groups. Since my communities have mostly moved away from @yahoogroups.com hosting, to Facebook, @googlegroups, and other places, the bit that hit me was that they are deleting all the mailing list archives.
Digital archives of text conversations are close to ideal from the perspective of a historian: unlike in-person or audio-based interaction this naturally leaves a skimmable and easily searchable record. If I want to know, say, what people were thinking about in the early days of GiveWell, their early blog posts (including comments) are a great source. Their early mailing list archives, however, are about to be deleted.
Luckily we still have two months to export the data before it's wiped, and people have written tools to automate this. Here's how to download a backup of all the conversations in a group:
If things are going well it will start spitting out messages like:
And it will be creating files:
If you get a message like:
It may mean that you have been blocked, but it may also just mean that for some reason an individual message can't be downloaded. In that case, to tell it to give up on that message and just continue on, create the json file with the stuck message number:
You might also get a message like:
This is what I see if I try to archive a private group. It's still possible to use the tool to archive a private group that you have access to, but it's a bit involved. First you visit Yahoo Groups in your web browser with Devtools open to the Networking tab. Then you look at what cookies are set on the HTML request, and find the T and Y cookies. The T cookie should start with z= and the Y cookie should start with v=. Paste these into the cookie_T and cookie_Y variable definitions at the beginning of archive_group.py.
Once you've downloaded all the messages in a group you can run:
Which will create a bunch of files like [group-name]-archive/archive-YYYY.html. They're not that easy to read, because it doesn't do any kind of quote folding, but we can always do that later. If you made any empty files to get around messages that wouldn't archive (see the touch command above) you'll get an error at this stage; just delete the empty files and re-run.
I've archived five groups: givewell, Boston-Contra, BostonAreaContraCommunity, contrasf, and trad-dance-callers. The first two are public groups with public archives, so I've made archives available at /givewell-archive and /Boston-Contra-archive. The remaining three are private, but if you want to look at them and you were a participant or otherwise have a good reason let me know.
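The workaround described earlier (create an empty json file for a stuck message, then delete the empty files before generating the HTML) can also be scripted. This is a minimal sketch, not part of the archiving tool: the function names are hypothetical, and it assumes each message is stored as [group-name]/[message-number].json, which you should check against the files the tool actually creates.

```python
from pathlib import Path


def skip_message(group_dir, message_number):
    """Create an empty placeholder file for a message that will not download."""
    # Assumption: messages live at <group_dir>/<message_number>.json.
    path = Path(group_dir) / f"{message_number}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(exist_ok=True)
    return path


def remove_empty_placeholders(group_dir):
    """Delete zero-byte .json files and return the paths that were removed."""
    removed = []
    for path in sorted(Path(group_dir).glob("*.json")):
        if path.stat().st_size == 0:
            path.unlink()
            removed.append(path)
    return removed
```

Calling `skip_message("my-group", 1754)` would stand in for the manual touch step, and `remove_empty_placeholders("my-group")` would clear the placeholders out again before building the HTML archive; both the group name and message number here are only examples.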