Thanks! And it's a lot of info, yeah. ~90% of new data in yesterday's drop was photographs, which they redacted for us.
The House Oversight Committee's giant drop in November had tons of data we still didn't take advantage of even after doing the original Jmail, like flight logs.
For the Yahoo release, which is still ongoing, the folks at Drop Site News (see https://www.jmail.world/about) are handling the manual redaction, which has been very time-consuming, even with tons of AI to help in the background.
Yes! We used our friends at Reducto (https://reducto.ai/) for all document extraction and parsing (one of the best companies I've ever referred to YC ;) )
We did an initial parsing pass of all four DOJ document batches on Friday. This takes a raw PDF and returns chunks containing typed blocks, each with a type (Title, Text, Figure, etc.), bounding boxes, content, and confidence scores. For PDFs that were just scans of photographs (which was like 90% of new content in Friday's release), it gave in-depth descriptions of those! You can type search terms like "door" at https://www.jmail.world/photos to see what I mean.
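To make the "chunks containing typed blocks" idea concrete, here's a rough Python sketch of that shape and of a keyword search over Figure descriptions. This is not Reducto's actual response format or the real photo search: the `Block` and `Chunk` classes, the bounding box layout, the `search_figures` helper, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    type: str                                # e.g. "Title", "Text", "Figure"
    bbox: Tuple[float, float, float, float]  # assumed layout: (left, top, width, height)
    content: str                             # text, or a description for a Figure
    confidence: float                        # assumed to be a score in [0, 1]


@dataclass
class Chunk:
    blocks: List[Block]


def search_figures(chunks: List[Chunk], term: str, min_confidence: float = 0.5) -> List[Block]:
    """Return Figure blocks whose description mentions `term` (case-insensitive)."""
    needle = term.lower()
    return [
        block
        for chunk in chunks
        for block in chunk.blocks
        if block.type == "Figure"
        and block.confidence >= min_confidence
        and needle in block.content.lower()
    ]


# Made-up sample data, only to exercise the search.
chunks = [
    Chunk(blocks=[
        Block("Title", (0.1, 0.05, 0.8, 0.05), "Exhibit 12", 0.98),
        Block("Figure", (0.1, 0.15, 0.8, 0.6), "A hallway with a wooden door at the far end.", 0.91),
    ]),
    Chunk(blocks=[
        Block("Figure", (0.0, 0.0, 1.0, 1.0), "A swimming pool seen from above.", 0.88),
    ]),
]

hits = search_figures(chunks, "door")
for hit in hits:
    print(hit.content)
```

A plain substring match like this is about the simplest thing that could work; a real search would probably use a proper index.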
For apps like Jmail and JFlights we use their structured extraction endpoint instead—you define a schema (e.g. {from, to, subject, date, body} for emails or {departure_airport, arrival_airport, passengers[], date} for flights) and it pulls those fields directly into JSON.
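For anyone curious what "define a schema" might look like, here's a minimal sketch in Python using JSON-Schema-style dicts. Only the field names come from the description above; the exact schema format the endpoint expects, the `missing_fields` helper, and the sample record (placeholder values, not real data) are my assumptions.

```python
import json

# Field names follow the schemas described above; the JSON-Schema-style
# layout is an assumption, not the endpoint's documented format.
EMAIL_SCHEMA = {
    "type": "object",
    "properties": {
        "from": {"type": "string"},
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "date": {"type": "string"},
        "body": {"type": "string"},
    },
}

FLIGHT_SCHEMA = {
    "type": "object",
    "properties": {
        "departure_airport": {"type": "string"},
        "arrival_airport": {"type": "string"},
        "passengers": {"type": "array", "items": {"type": "string"}},
        "date": {"type": "string"},
    },
}


def missing_fields(record: dict, schema: dict) -> list:
    """List schema fields absent from an extracted record (hypothetical helper)."""
    return [name for name in schema["properties"] if name not in record]


# Placeholder record standing in for what an extraction call might return.
raw = (
    '{"departure_airport": "AAA", "arrival_airport": "BBB", '
    '"passengers": ["Passenger One"], "date": "2000-01-01"}'
)
flight = json.loads(raw)

print(missing_fields(flight, FLIGHT_SCHEMA))
print(missing_fields({"from": "someone@example.com"}, EMAIL_SCHEMA))
```

Checking for missing fields after extraction is a cheap way to flag pages that need a human look.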
The JFlights example served as the best ad for Reducto, and for how doc parsing technology can save hours of journalistic investigation like this.
While true, it would probably be useful to provide examples. The one that I am aware of seems to be a picture showing Clinton, Michael Jackson, and Diana Ross with "redacted" victims.
However, it seems that this photo was actually taken at a 2003 Democratic fundraiser, and the redacted "victims" were Diana Ross' son Evan and Michael Jackson's kids, Paris and Prince Jackson. This may or may not be accurate either, since I have not been able to dig into the photo and determine whether it has any connection to a supposed 2003 fundraiser.
But it seems more likely to be true than not that this was sloppily planted evidence that was especially insultingly fake.
On edit: looking closer, these do not seem to be the exact same photo, but two different photos taken at the same time and place. So it is from the 2003 Dem fundraiser, just a different photo of it. So it could be that Epstein had it and DOJ thought hey, look at these pervs! Let's release!!
As you say, it's not the same photo. If the one in the dump was in Epstein's possession, the reason for the redactions is either that some drone in the DOJ just redacted all children out of habit, or that it was deliberately done in such a way as to frame Clinton. I can't decide which I find more credible.
I think if it hadn't been those adults with the kids, an alert staffer might have thought, "whose kids are these, these aren't young teenage girls, I better double check." But Michael Jackson, kids, Clinton's arms around him, Diana Ross with a young male: they're thinking they walked into an armory filled with nothing but smoking guns!
>the reason for the redactions is either that some drone in the DOJ just redacted all children out of habit, or that it was deliberately done in such a way as to frame Clinton
They were supposed to redact all minors, not just "victims".
I see people are not clued into this and incredulously downvote, because the file release appears to them to be in good faith, such that illegal evidence tampering is out of the question.
I'm being snarky, this isn't a very serious comment, and I don't really mean it about Gemini specifically, but can you imagine using something like Gemini ("Hi, please comb through this") and it just refuses on ethical grounds?
I just have real institutional problems with Google. They have all the best tech minds, but some things are just off limits to them because they're being politically correct.
And no, not Epstein. It's a general statement; but it's disappointing that they're like this (and of course Gemini was famously the one that gave black Nazis and things like that)
Google has never fixed their black people/gorilla issue. The foundational tech that all of their products run on going back a decade is fundamentally flawed (and produces outputs that many would say align with racist ideologies, among others).
But, whoever’s doing the redacting sees the original right? What prevents the redactor from saying, “here’s what the document really said.” Or “here’s who’s in the image, I saw it before I redacted it?”
That’s a good point. I would imagine they break it up into pieces - in a reCAPTCHA sorta way - and any given person sees a sentence or a piece of a sentence.
An alternative would be to strip out all obvious known words and only leave unknowns (i.e., names) and then have those fragments reviewed (in a reCAPTCHA sorta way).
Finally, for images, cover all faces and then, one by one, decide which should remain covered and which should not.
LOTS of work but there are workflows to mitigate the ability for reviewers to connect more than they should.
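A toy Python sketch of the "reCAPTCHA sorta way" split described above, just to show it's mechanically simple. I have no idea what workflow is actually used; `split_fragments`, `assign_fragments`, and the reviewer names are all hypothetical. It cuts text into sentences and deals them out round-robin, so with two or more reviewers nobody sees two neighboring sentences.

```python
import re
from collections import defaultdict


def split_fragments(text: str) -> list:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def assign_fragments(fragments: list, reviewers: list) -> dict:
    """Deal fragments out round-robin so adjacent ones go to different reviewers."""
    if len(reviewers) < 2:
        raise ValueError("need at least two reviewers to keep neighbors apart")
    queues = defaultdict(list)
    for index, fragment in enumerate(fragments):
        reviewer = reviewers[index % len(reviewers)]
        queues[reviewer].append((index, fragment))
    return dict(queues)


# Placeholder text and reviewer names.
text = "First sentence. Second sentence! Third one? Fourth."
fragments = split_fragments(text)
queues = assign_fragments(fragments, ["reviewer_a", "reviewer_b", "reviewer_c"])

for reviewer, items in queues.items():
    print(reviewer, items)
```

Round-robin is predictable, so a real setup would likely shuffle assignments and mix fragments from many documents into each queue.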
Given how MTG went completely silent despite her high profile platform, I'm guessing the civil (or at this point, royal) servants don't want their families harmed.
I’d guess a first pass is done automatically? E.g., if a page mentions, say, Trump, just redact that whole page/paragraph/etc. So the people who have done the closer reading to redact further probably don’t actually know the scale of what was already redacted. Just a guess though.
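To illustrate the guess above, an automatic first pass could be as simple as this Python sketch. This is purely hypothetical, not a description of what anyone actually ran: `auto_redact`, the mask string, and the placeholder term are all made up. Any paragraph mentioning a listed term is blanked entirely, so a later human reviewer never sees what was there.

```python
import re


def auto_redact(paragraphs, terms, mask="[REDACTED]"):
    """Replace every paragraph that mentions any term; return (result, count)."""
    if not terms:
        return list(paragraphs), 0
    pattern = re.compile("|".join(re.escape(term) for term in terms), re.IGNORECASE)
    redacted = []
    count = 0
    for paragraph in paragraphs:
        if pattern.search(paragraph):
            redacted.append(mask)  # whole paragraph goes, not just the name
            count += 1
        else:
            redacted.append(paragraph)
    return redacted, count


# Placeholder paragraphs and term.
paragraphs = [
    "Meeting with Name A on Tuesday.",
    "Weather was fine.",
    "name a called again.",
]
result, count = auto_redact(paragraphs, ["Name A"])
print(result, count)
```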