@litchralee

litchralee@sh.itjust.works · 7 days ago

im not much of a writer, im sure its more clear from AI than if i did it myself

Please understand this in the kindest possible way: if you were not willing to write documentation yourself, why should I want to review it? I too could use an AI/LLM to distill documentation rather than posting this comment but I choose not to, because I believe that open discussion is a central tenet of open-source software. Even if you are not great at writing in technical English, any attempt at all will be more germane to your intentions and objectives than what an LLM generates. You would have had to first describe your intentions and objectives to the LLM anyway. Might as well get real-life practice at writing.

It’s not that AI and LLMs can’t find their way into the software development process, but the question is to what end: using an AI system to give the appearance of a fully fleshed-out project when it isn’t, that is deceitful. Using an AI system to learn, develop, and revise the codebase, to the point that you yourself can adequately teach someone else how it works, that is divine.

With that out of the way, we can talk about the high-level merits of your approach.

how the authentication works: https://positive-intentions.com/docs/research/authentication

What is the lifetime of each user’s public/private keypair? What is the lifetime of the symmetric key shared between two communicating users? The former is important because people can and do lose their private key, or have a need to intentionally destroy the key. In such an instance, does the browser app explicitly invalidate a key and inform the counterparty? Or do keys silently disappear and also take the message history with them?

The latter is important because the longer a symmetric key is used, the more ciphertext that a malicious actor can store-and-decrypt later in time, possibly in the future when quantum computers can break today’s encryption. More pressing, though, is that a leak of the symmetric key means all prior and future messages are revealed, until the symmetric key is rotated.
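To make the rotation point concrete, here is a minimal sketch of a one-way symmetric key ratchet, assuming HMAC-SHA256 as the derivation function. The function names and the `b"chain"` / `b"message"` labels are hypothetical and are not taken from the project's docs; this is an illustration of the general idea, not a vetted protocol.

```python
import hashlib
import hmac
from typing import List, Tuple


def ratchet(chain_key: bytes) -> Tuple[bytes, bytes]:
    """Derive (next_chain_key, message_key) from the current chain key.

    HMAC is one-way, so knowing a later chain key does not reveal
    earlier chain keys or the message keys derived from them.
    """
    next_chain_key = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    message_key = hmac.new(chain_key, b"message", hashlib.sha256).digest()
    return next_chain_key, message_key


def derive_message_keys(initial_key: bytes, count: int) -> List[bytes]:
    """Produce one fresh key per message by stepping the ratchet."""
    keys = []
    chain_key = initial_key
    for _ in range(count):
        chain_key, message_key = ratchet(chain_key)
        keys.append(message_key)
    return keys
```

With a scheme like this, each message is encrypted under its own key, so a leak of the current chain key does not expose earlier messages, provided the old keys were deleted. It does still expose every later message until the two parties agree on fresh key material, which is why the rotation interval matters.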

how security works: https://positive-intentions.com/blog/security-privacy-authentication

I take substantial notice whenever a promise of “true privacy” is made, because it either delivers a very strange definition of privacy, or relies upon the reader to supply their own definition of what privacy means to them. When privacy is on offer, I’m always inclined to ask: privacy from whom? From network taps? From other apps running in the same browser?

This document pays only lip service to some sort of privacy notion, but not in any concrete terms. Instead, it spends a whole section on attempting to solve secure key exchange, but simply boils down to “user validates the hash they received through a secure medium”. If a secure medium existed, then secure key exchange would already be solved. If there isn’t one, using an “a priori” hash of the expected key is still vulnerable to hash attacks.
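As a rough sketch of what “user validates the hash” amounts to, assuming a SHA-256 fingerprint over the raw public key bytes (the function names here are hypothetical, not the project's API):

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw public key bytes."""
    return hashlib.sha256(public_key).hexdigest()


def matches_expected(public_key: bytes, expected_fingerprint: str) -> bool:
    """Compare the received key against a fingerprint obtained out-of-band."""
    # Tolerate spaces and upper-case hex, as a human might read it aloud.
    normalized = expected_fingerprint.replace(" ", "").lower()
    return hmac.compare_digest(
        fingerprint(public_key).encode(), normalized.encode()
    )
```

Note that nothing in this check says where `expected_fingerprint` came from. The comparison is only as trustworthy as the channel that delivered it, which is exactly the unsolved part.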

this is my sideproject and im trying to get it off the ground

I applaud you for undertaking an interesting project, but you also have to be aware that many others have also tried their hand at secure messaging, with more fails than successes. The blog posts of Soatok show us the fails within just the basic cryptography, and that doesn’t even get to some of the privacy issues that exist separately. For example, until Signal added support for usernames, it was mandatory to reveal one’s phone number to bootstrap the user’s identity. That has since been fixed, but they go into detail about why it wasn’t easy to arrive at the present solution.

am i a cryptographer yet?

I recall a recent post I saw on Mastodon, where someone who was implementing a cryptographic library made sure to clarify that they were a “cryptography engineer” and not a cryptographer, because they themselves have to consult with a cryptographer regarding how the implementation would work. That is to say, they recognized that although they are writing the code which implements a cryptographic algorithm, the guarantees come from the algorithm itself, which are understood by and discussed amongst cryptographers. Sometimes nicely, and other times necessarily very bluntly. Those examples come from this blog post.

I myself am definitely not a cryptographer. But I can reference the distilled works of cryptographers, such as from this 1999 post which still finds relevancy today:

The point here is that, like medicine, cryptography is a science. It has a body of knowledge, and researchers are constantly improving that body of knowledge: designing new security methods, breaking existing security methods, building theoretical foundations, etc. Someone who obviously does not speak the language of cryptography is not conversant with the literature, and is much less likely to have invented something good. It’s as if your doctor started talking about “energy waves and healing vibrations.” You’d worry.

I wish you the very best with this endeavor, but also caution as the space is vast and the pitfalls are manifold.

litchralee@sh.itjust.works · 7 days ago

Aiming to create the worlds most secure messaging app

For anyone else that was looking for it, this is the link to the threat model: https://positive-intentions.com/docs/research/threat-model/

That said, it seems quite thin on hard details, such as how identities (ie usernames) are managed – eg are they unique? How can users cross-check an online identity to a real person? Fingerprints? QR codes? SHA256 hashes? – and whether they are considered publicly-exchangeable. Plus how users are bootstrapped so they can find each other.

While a threat model is the minimum needed to even begin an assessment of anything that utters the word “security”, I do have to ask:

Was that document crafted for this project specifically?
Was it prepared by a cryptographer?
And was it generated using an AI/LLM?

litchralee@sh.itjust.works · edit-2 11 days ago

This doesn’t answer OP’s question, but is more of a PSA for anyone that seeks to self-host the backend of an E2EE messaging app: only proceed if you’re willing and able to uphold your end of the bargain to your users. In the case of Signal, the server cannot decrypt messages when they’re relayed. But this doesn’t mean we can totally ignore where the server is physically located, nor how users connect to it.

As Soatok rightly wrote, the legal jurisdiction of the Signal servers is almost entirely irrelevant when the security model is premised on cryptographic keys that only the end devices have. But also:

They [attackers] can surely learn metadata (message length, if padding isn’t used; time of transmission; sender/recipients). Metadata resistance isn’t a goal of any of the mainstream private messaging solutions, and generally builds atop the Tor network. This is why a threat model is important to the previous section.
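For the message-length part of that metadata, a minimal sketch of bucket padding applied to the plaintext before encryption looks like the following. The 256-byte bucket and the 4-byte length prefix are assumptions made for illustration, not how Signal or any particular app does it.

```python
import struct

BUCKET = 256  # hypothetical bucket size in bytes


def pad(message: bytes, bucket: int = BUCKET) -> bytes:
    """Prefix the true length, then zero-fill to the next bucket multiple."""
    framed = struct.pack(">I", len(message)) + message
    padded_len = -(-len(framed) // bucket) * bucket  # ceiling division
    return framed.ljust(padded_len, b"\x00")


def unpad(padded: bytes) -> bytes:
    """Recover the original message using the 4-byte length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]
```

An observer then sees only which bucket a message falls into rather than its exact length. Time of transmission and sender/recipient are untouched by this, so it addresses just one item in the list above.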

So if you’re going to be self-hosting from a country where superinjunctions exist or the right against unreasonable searches is being eroded, consider that well before an agent with a wiretap warrant demands that you attach a logger for “suspicious” IP addresses.

If you do host your Signal server and it’s only accessible through Tor, this is certainly an improvement. But still, you must adequately inform your users about what they’re getting into, because even Tor is not fully resistant to deanonymization, and then by the very nature of using a non-standard Signal server, your users would be under immediate suspicion and subject to IRL side-channel attacks.

I don’t disagree with the idea of wanting to self-host something which is presently centralized. But also recognize that the network effect with Signal is the same as with Tor: more people using it for mundane, everyday purposes provides “herd immunity” to the most vulnerable users. Best place to hide a tree is in a forest, after all.

If you do proceed, don’t oversell what you cannot provide, and make sure your users are fully abreast of this arrangement and they fully consent. This is not targeted at OP, but anyone that hasn’t considered the things above needs to pause before proceeding.

litchralee@sh.itjust.works · 12 days ago

I mean, at the USA average price of electricity of $0.13 per kWh, then for a halving of 70 Watts, it’s about 11 cents per day, or $40 per year. But at the California average price of $0.35, then the savings is 29 cents per day, or $107 per year.
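For anyone wanting to plug in their own numbers, the arithmetic can be sketched as below. It assumes “a halving of 70 Watts” means 35 W saved around the clock, which is the reading that reproduces the figures above; the function name is just for illustration.

```python
from typing import Tuple


def savings(watts_saved: float, price_per_kwh: float) -> Tuple[float, float]:
    """Return (dollars per day, dollars per year) for a constant power saving."""
    kwh_per_day = watts_saved * 24 / 1000
    per_day = kwh_per_day * price_per_kwh
    return per_day, per_day * 365


for label, price in [("USA average", 0.13), ("California average", 0.35)]:
    per_day, per_year = savings(35, price)
    print(f"{label}: ${per_day:.2f}/day, ${per_year:.0f}/year")
# USA average: $0.11/day, $40/year
# California average: $0.29/day, $107/year
```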

That’s not small money, especially if it’s free to make these gains by ripping out unneeded functionality. But the point is taken that it’ll be hard to find savings from older hardware, which simply didn’t prioritize energy efficiency.

litchralee@sh.itjust.works · 21 days ago

IANAL either, but I’m vaguely aware that this realm of USA law is known as “choice of law” provisions and the applicability of “click wrap” contracts, and it’s a thorny issue in the digital age. Essentially, the problem is whether Meta can be made reasonably aware that a ToS exists for a given web server. Unlike a “NO TRESPASSING” sign posted on a gate, or a sticker on the packaging of a physical copy of Microsoft Word 97 that says “opening this package constitutes agreement to the EULA, at this URL…”, it can be argued that unless the ToS is made blitheringly obvious to a web scraper, it might not pass muster.

To be clear, this isn’t a problem for normal web users, because the ToS link will very easily appear at the bottom of the page, when rendered in a standard web browser. The issue is whether scrapers – including AI scrapers but also bot-crawlers and even plain ol Curl – would see the notice of the ToS. There is no convention – either de facto or in law – about where a ToS has to be or what format it has to take. And it would be problematic to say that all scrapers need to thoroughly search a website for a “legal.txt”, because such a file might be somewhere non-obvious and because it exacerbates the whole “scrape servers until they collapse” issue.

So already, getting a ToS to bind Meta – or any other high-volume scraper – is an uphill battle. Hence why I suggested a remedy rooted in common law, premised on the idea that actively causing expenses for the server owner is actionable, even without a ToS.

That said, I do want to point out one other detail about choice-of-law: normally if a contract specifies the venue for disputes, that will be honored. Example: the courts of Santa Clara County in California. But supposing the instance owner lives in Montreal and specifies the venue as the Court of Quebec, and if the issue with binding Meta to the ToS was solved, then there’s the challenge of actually targeting Meta. As a USA-domiciled corporation, they’re not automatically within the jurisdiction that the Quebec courts can reach. If there’s a Canadian subsidiary, that might be a valid target. But if not, the Quebec courts wouldn’t be able to compel Meta’s lawyers to even show up, let alone rule in favor of the instance owner. And then there’s the whole aspect of getting an American court to ratify a judgement issued by a foreign court. It’s doable, but it’s so much harder than specifying a venue within the USA.

But again, that’s problematic if the instance isn’t located within the USA, because then the owner must travel to the USA for their court dates. And I can’t really recommend that anyone travel to the USA except for only the most critical or dire of situations.

litchralee@sh.itjust.works · edit-2 24 days ago

The cynicism surrounding the USA court system is not without cause, but the suggestion to not even bother trying has always rubbed me the wrong way. Firstly, on philosophical grounds, it’s defeatism and on-par with appeasement. But secondly, average Americans can and have prevailed when up against a multinational company.

The one which often comes to mind is the case of a Philadelphia man who won a default judgement against Wells Fargo and was on the cusp of having the local sheriff auction off a branch’s furniture, until they all settled the matter. The man in question wrote about his experience here: https://lawsintexas.com/this-is-how-my-qwr-foreclosed-wells-fargo/

As for how to sue Meta, the average Joe need not hire a major law firm, but can choose to pursue a limited suit in small claims court. For Meta, which is headquartered in Silicon Valley in California, the Superior Court in Santa Clara County would be the venue. Drawbacks include: having to get to Silicon Valley for court dates, and a total claims limit of $12.5k.

But on the flip side, the small claims court does not allow lawyers to argue the case before the judge, meaning it’s basically you and Meta’s representative. That representative might still have legal training, but it won’t be a situation like in the 1997 film The Rainmaker where it’s one solo lawyer versus a whole team of lawyers.

There’s also fewer avenues for Meta to inflate costs, such as attempting to pull the case into federal court: diversity jurisdiction isn’t available unless a claim is over $75k. But they can create difficulties through the discovery process, and other pre-trial activities.

Do I think this is viable? Possibly, but it’ll still take a fair amount of effort to have a lawyer work the case prior to trial, even if that lawyer can’t actually do the talking in front of the judge. Easily 5 digit territory to pay your lawyer. But again, this is balanced by Meta having to deal with the nuisance of having someone on their side also put in a similar amount of effort. And when the max cap for small claims is $12.5k, Meta also has enough money to just pay up and then steer their AI scrapers away from your server, saving everyone the bother. See “nuisance value lawsuits”. Also, spiteful lawsuits are a thing.

After all, it’s not like everyone is going to sue Meta in small claims court, right? Right?

litchralee@sh.itjust.works · edit-2 24 days ago

“Trespass to chattels” is a type of lawsuit in Anglo-American law that, in the somewhat-distant past, could be raised in response to the abuse of a publicly-accessible computer system; it was originally meant as a remedy for the diminishment of someone else’s property (eg milking their cow). As the modern case law is understood, it allows the owner of a system (eg a Fediverse instance) to recover money due to a tortfeasor’s (eg Meta) conduct that interferes with the normal function of the system. The bar has been raised since the 80s, requiring direct impact to the system, not just that someone accessed the system without explicit authorization. Even outright malice does not suffice, since the test is whether the system was degraded in some way.

A run-of-the-mill scraper querying once daily wouldn’t meet the test, and something as minimal as an ICMP ping every second wouldn’t meet the test. But AI scraping to the tune of hundreds of queries per day, adding up to double digit percentage points of server bandwidth for a small Fediverse instance, that might.

That some instance operators have had to consider adding more vCPUs or RAM, or have resorted to blockers like Anubis, in response to AI scraping underscores how harmful – and thus potentially legally actionable – that scraping is, suggesting a decent chance that such a lawsuit could be successful.