A file repository library / how to organize?

trashHeap · April 21, 2019, 3:08pm

I’d like to start building a file repository of archive.org eBooks and other publicly accessible eTexts from the internet. While it would certainly house a lot of open content works, I’m not sure why it shouldn’t also house things that are given away gratis online, even under standard copyright terms.

Subject matter spread should be anything that is relevant or interesting to talkgroup members.

I regularly come across things like this that I feel are important to mirror. For example, as I mull over the Inform programming languages, I often refer back to this netbook: http://adamcadre.ac/gull

It’s possibly the only text I’ve found that explains, in easy-to-read terms and with beginners in mind, how to use the multimedia functions of the Glulx virtual machine or Z-machine; and it doesn’t have a whole lot of mirrors.

I also for example came across a description of “What is to be done?” by Nikolay Gavrilovich Chernyshevsky while reading another book, and I discovered it’s in the public domain on archive.org. It’s landed on my “to read” pile and I think that should be fair game too.

But now to the meat and potatoes.

I’ve been pondering how to organize this before I start asking @tim for space on allthecodes.

I am thinking maybe a directory structure based on the Universal Decimal Classification, with a root index as a kind of card catalog. Maybe an awk-parseable text database, which should make the metadata importable into any long-term data projects.
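A minimal sketch of what such a root index could look like, assuming a tab-separated file with one record per line. The column order, the directory layout in the paths, the UDC codes, and the names `INDEX_TSV`, `Record`, `load_index`, and `by_class` are all assumptions made up for illustration, not a settled format.

```python
import csv
import io
from typing import NamedTuple


class Record(NamedTuple):
    udc: str     # classification code; the directory path is derived from it
    path: str    # file location relative to the repository root
    title: str
    author: str


# Hypothetical index contents. The UDC codes are illustrative only and have
# not been checked against the official UDC schedules.
INDEX_TSV = (
    "004.43\t0/004/004.43/gull.html\tGull\tAdam Cadre\n"
    "821.161.1\t8/821/821.161.1/what-is-to-be-done.txt\t"
    "What Is to Be Done?\tNikolay Chernyshevsky\n"
)


def load_index(text: str) -> list[Record]:
    """Parse the tab-separated index into records, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [Record(*row) for row in reader if row]


def by_class(records: list[Record], prefix: str) -> list[Record]:
    """Return every record whose UDC code starts with the given prefix."""
    return [r for r in records if r.udc.startswith(prefix)]


records = load_index(INDEX_TSV)
for record in by_class(records, "821"):
    print(record.title, "->", record.path)
```

Because the file is plain tab-separated text, the same lookup stays within reach of awk, for example `awk -F'\t' '$1 ~ /^821/ {print $3}' index.tsv` (with `index.tsv` as a hypothetical file name).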

Does that make sense to people? Would another structure make more sense for a file store of documents?

maiki · April 21, 2019, 7:54pm

I haven’t articulated this, since I want to say it in a way that shows evidence of my claims, and not just “information wants to be free, man!” But so much of the last century’s works have been forgotten, helped to fade due to industry practices.

And so… I fully support sharing. Copyright is a blight on our ability to self-organize, etc. I don’t want anyone getting hurt, but I personally am fine walking into grey territories for knowledge. Forgiveness > permission, and all that.

Whoo, that’s a big, cool index system to use!

I’ll think on it. I’m trying to come up with these flat hierarchies for interi.org, so basically things won’t ever get more than two sections deep from root. Instead of depth, I go for broad and try to infer from metadata.

So, I’d have a bunch of documents I found and am sharing. I’d metatate them with all the classifications I could, and then let machines take care of the rest.
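A rough sketch of the flat approach, assuming every file sits directly under one directory and all classification lives in tags. The file names, the tags, and the helpers `build_tag_index` and `find` are hypothetical, just something to poke at.

```python
from collections import defaultdict

# Hypothetical metadata for a flat store: no nesting, only tags.
DOCUMENTS = {
    "docs/gull.html": {"inform", "glulx", "howto"},
    "docs/what-is-to-be-done.txt": {"fiction", "public-domain"},
    "docs/example-essay.md": {"howto", "public-domain"},
}


def build_tag_index(documents):
    """Invert the path -> tags mapping into tag -> set of paths."""
    index = defaultdict(set)
    for path, tags in documents.items():
        for tag in tags:
            index[tag].add(path)
    return index


def find(index, *tags):
    """Return the paths that carry every requested tag."""
    if not tags:
        return set()
    matches = [index.get(tag, set()) for tag in tags]
    return set.intersection(*matches)


index = build_tag_index(DOCUMENTS)
print(sorted(find(index, "howto", "public-domain")))
```

The trade-off in this sketch is that a document can belong to as many classifications as it likes, at the cost of needing a tool to browse by anything other than file name.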

I don’t awk often, but I do search Hugo sites all the time, using my local file index provider (my non-generic example is search in Nautilus). I clone so many folks’ repos to debug, but they keep their content in the same repos, so I get really fascinating (serendipitous!) results back!

I think it works for portability and practical local use.

trashHeap · April 21, 2019, 10:46pm

My thinking is that the file repo should work both as an individual standalone product and as something easily importable/referenced/integrated by bigger knowledge engines. Which is why I’d like to have some sort of internal hierarchy with a machine- and human-readable index per se. Keeping it human-browsable, I think, is going to require a structure with some death, if the collection gets any size.

My concern there is that, unlike with Hugo, we could end up with some documents that aren’t easily parseable (image-based PDFs, for example). Thus the need for some sort of minimal index capturing some searchable metadata.

maiki · April 21, 2019, 10:53pm

I think that is “dearth”, but easily one of my favorite sentences! Thrembode, Master Necro, on stand-by!

Dur, I wasn’t done. @trashHeap, gather me six interesting and disparate documents/artifacts, please. That will help get me on your page, and give us something to play with.

trashHeap · April 21, 2019, 11:50pm

I might try and build a sample of what I’m after to demonstrate, but I’ll let it cook in my noggin for a day or two too, to make sure I’m properly convinced.

maiki · April 22, 2019, 12:00am

Ahem topic timers…

judytuna · April 22, 2019, 4:35pm

yall are a good influence on me because my brain goes straight to cloud(butt) services. i just looked at a friend’s demo app for a new service from the company that makes elasticsearch and i forgot the name and can’t find it on their website, weirdly. it crawls your (publicly available) data and lets you search?

judytuna · June 9, 2020, 2:23am

Can I play? From Black History Month Library, six things include…

Black Music/Beyonce - Coachella Pt. 2.mp4
Curriculums/Study Guides/Black Panther Study Guide.pdf (images and text)
Great African Americans Coloring Book.pdf
Eyes Of The Rainbow a documentary film with Assata Shakur.mp4
This Nonviolent Stuff’ll Get You Killed - Charles E. Cobb.epub
Butler, Octavia - Parable of the Sower.pdf

maiki · June 9, 2020, 2:47am

Coolio!

One thing I’ve decided: all text needs to sit in a human markup (such as markdown) so it can be shared in many formats. Folks should not need to download a PDF or epub; they should have an app that they feed a URL and it’s made for them precisely.

I haven’t looked at that library yet, but I’ve been hanging on various anarchist library sites and zones and want to show folks how nice that could be as a web page.

As for other media, I’m not sure what’s best. But time to experiment!

maiki · June 9, 2020, 2:52am

Oh, for music: the highest def and lowest free stream. I think that means FLAC (and other lossless) and opus.

A file repo for audio should have lossless files available, but make “good enough” opus files for listening; this allows for easier auditory skimming, a form of sound search.
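A small sketch of how the listening copies could be planned, assuming ffmpeg built with libopus is the encoder. The 96k bitrate, the choice to put the opus file next to its master, and the helper names `opus_path`, `encode_command`, and `plan` are assumptions for illustration; the code only builds the commands and does not run them.

```python
from pathlib import PurePosixPath

# Assumed set of lossless formats worth keeping as masters.
LOSSLESS_SUFFIXES = {".flac", ".wav"}


def opus_path(master: str) -> str:
    """Place the listening copy next to the master, with an .opus suffix."""
    return str(PurePosixPath(master).with_suffix(".opus"))


def encode_command(master: str, bitrate: str = "96k") -> list[str]:
    """Build an ffmpeg invocation that encodes the master to opus."""
    return ["ffmpeg", "-i", master, "-c:a", "libopus", "-b:a", bitrate, opus_path(master)]


def plan(files: list[str]) -> list[list[str]]:
    """Return one encode command per lossless file, ignoring everything else."""
    return [
        encode_command(name)
        for name in files
        if PurePosixPath(name).suffix.lower() in LOSSLESS_SUFFIXES
    ]


for command in plan(["audio/track01.flac", "audio/notes.txt"]):
    print(" ".join(command))
```

Each command could then be handed to `subprocess.run(command, check=True)` on a machine where ffmpeg is installed.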

For video I think a similar approach makes sense, but I am not there yet. Video is difficult for me to use on my connection, so there is an opportunity there to figure out best practices. Making a copy of a video file for every known viewing combo doesn’t seem sensible.

judytuna · June 11, 2020, 7:05am

Woah, check out A magic formula for ebooks! It’s from 2010! Does it make sense to revisit some of what we said?

Do you use any of these technologies now?

One of the reasons talkgroup is remarkable to me is how you prove yourself right after ten years. Even ten years ago, you were weighing accuracy and making-do-with-what’s-available (a scan of a hard copy of a book) against what you want as the ideal (the promise of a digital text, that you can read in a manner of your choosing, without needing special programs). I am half in awe that you struck the core of the matter right away, and half dismayed that I feel like after 10 years our society hasn’t really made forward progress on this question. lol. The .pdfs of books I’m looking at right now that are text-based (rather than being image scans of physical copies of the books) are formatted badly and riddled with errors that I assume are from the OCR / image-to-text conversion software. And the .pdfs of books where the pdf is a scan of a physical book are… well, they’re not text, so they’re not searchable, and they’re even farther from the markdown-formatted text you mentioned (for example) than the text-based pdfs of books.

WE’RE NOT DEAD YET LET’S DO IT

I don’t agree with myself from 10 years ago on this. However, I don’t have something better to suggest right now. The best I can do off-the-cuff, flailing, now, is “curate stuff for your pod, and you decide who you trust, I guess,” which is super handwavy.

Okay, I guess I do agree with myself from 10 years ago sometimes lol

Sometimes, I like thinking of talkgroup as a brain slowly mulling things over. We can watch it happen over ten years. TEN YEARS. TEN YEARS IS A LONG TIME. So much has changed, and so much has stayed the same.

maiki · June 11, 2020, 7:54am

I’m about to crash, so busy today (looking forward to reading all this stuff tomorrow!). But I wanted to say: I use git daily, and Calibre is still probably the best suite for most folks to do epub/format conversion stuff. I sometimes use it to convert epub to mobi to email to folks’ kindles.
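For the epub-to-mobi step, a minimal sketch that wraps Calibre’s `ebook-convert` command-line tool, which takes an input file and an output file. The helper names `convert_command` and `convert` and the file name in the example are hypothetical, and the emailing step is left out.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(source: str, target_format: str = "mobi") -> list[str]:
    """Build the ebook-convert invocation; the output sits next to the source."""
    target = str(Path(source).with_suffix("." + target_format))
    return ["ebook-convert", source, target]


def convert(source: str, target_format: str = "mobi") -> None:
    """Run the conversion, failing early if Calibre's CLI is not on PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    subprocess.run(convert_command(source, target_format), check=True)


print(convert_command("parable.epub"))
```

Calling `convert("parable.epub")` would write `parable.mobi` alongside the original, assuming Calibre is installed.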

I’m afflicted with noticing typos and misspellings, and I rarely get through a digital book without finding a few errors in the text. And there’s nothing to be done because we’ve created a system of literary distribution that favors ego (the author as rockstar) and distraction (marketing) over usefulness.

Oh hey, we’re in this topic! Quick aside in an aside: I don’t know what to do with fiction, especially fiction by live authors. But non-fiction I’m just going to copy and publish on the open web. I feel it’s potentially asking for a fight, but it’s the fight I’m willing to engage.

I’m so fucking tired of reading poorly marked up manifestos and essays. They should be shared and easy to learn from. That’s the point.

Whew! I thought I was gonna just sound crazy for continually mentioning “it will make sense one day”.
