Links are almost always base64 encoded now and the online url decoders always produce garbage. I was wondering if there is a project out there that would allow me to self-host this type of tool?

I’d probably network this container through gluetun because, yanno, privacy.

Edit to add: It doesn’t have to be specifically base64-focused. Any link decoder that I can use in a privacy-respecting way would be welcome.

Edit 2: See if your solution will decode this link (the one in the image): https://link.sfchronicle.com/external/41488169.38548/aHR0cHM6Ly93d3cuaG90ZG9nYmlsbHMuY29tL2hhbWJ1cmdlci1tb2xkcy9idXJnZXItZG9nLW1vbGQ_c2lkPTY4MTNkMTljYzM0ZWJjZTE4NDA1ZGVjYSZzcz1QJnN0X3JpZD1udWxsJnV0bV9zb3VyY2U9bmV3c2xldHRlciZ1dG1fbWVkaXVtPWVtYWlsJnV0bV90ZXJtPWJyaWVmaW5nJnV0bV9jYW1wYWlnbj1zZmNfYml0ZWN1cmlvdXM/6813d19cc34ebce18405decaB7ef84e41 (it should decode to this page: https://www.hotdogbills.com/hamburger-molds)

  • moseschrute@lemmy.world
    2 days ago (edited) · 1 upvote

    Not that you should vibe code, but you could vibe code this so easily. Have it output a static website. Give the source code a scan if you’re paranoid. Check the network tab if you’re really really paranoid. But literally you could have it output this as a static index.html file that you drop into your browser of choice.

    This is the only type of coding LLMs should ever be used for imo. A small, very clearly defined task that is very easy to verify if it works. And code that won’t infect a larger project.

    Edit: as others pointed out, that url isn’t base64 encoded. You would have to clearly define what you are trying to do if you want this to work. For example, do all urls follow the same format as the above?

  • Hawk@lemmy.dbzer0.com
    7 days ago · 36 upvotes

    There is no such thing as a base64-encoded URL. Part of a URL might hold base64-encoded data, but never the URL itself.

    These online tools aren’t working because you’re using them wrong.

  • 𝕸𝖔𝖘𝖘@infosec.pub
    6 days ago · 12 upvotes

    Just take the base64 bit of the URL. The whole URL isn’t base64, so it decodes to garbage.

    The base64 bit decodes just fine.
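
    A minimal Python sketch of that, assuming the payload is the second-to-last path segment as in the sample link from the post. The function name `decode_segment` and the `link.example.com` link are made up for illustration. Links like this use the URL-safe base64 alphabet (`-` and `_` instead of `+` and `/`) and drop the trailing `=` padding, so the padding is restored before decoding:

```python
import base64


def decode_segment(segment: str) -> str:
    """Decode one URL-safe base64 segment, restoring stripped '=' padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Made-up link with the same shape as the sample in the post (an assumption:
# the payload sits in the second-to-last path segment).
sample = (
    "https://link.example.com/external/1.2/"
    "aHR0cHM6Ly9leGFtcGxlLmNvbS9hYmM_eD0xMg/abc123"
)
print(decode_segment(sample.split("/")[-2]))  # https://example.com/abc?x=12
```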

  • Finadil@lemmy.world
    6 days ago (edited) · 10 upvotes

    I mean… It’s decoding into garbage because you’re feeding it more than just the base64 section. I suppose if you’re already running nginx or something you could easily make a page that uses javascript to break the link down (possibly using /, ?, = as separators) and decode sections that look like base64. If you make it javascript and client side there’s not really any privacy concerns.
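
    The same idea sketched in Python rather than JavaScript, to show the logic: split on the separators, keep sections that look like base64, and try to decode them. The "looks like base64" rule (16 or more URL-safe alphabet characters) and the name `decode_sections` are assumptions, not a standard. Splitting on `/` also means this only handles the URL-safe alphabet, where `/` never appears inside the payload.

```python
import base64
import binascii
import re

# Heuristic (an assumption): a section "looks like base64" if it is 16+
# characters from the URL-safe alphabet.
BASE64_LIKE = re.compile(r"[A-Za-z0-9_-]{16,}")


def decode_sections(url: str) -> list[str]:
    """Split a URL on / ? = & and return the sections that decode to text."""
    decoded = []
    for section in re.split(r"[/?=&]", url):
        if not BASE64_LIKE.fullmatch(section):
            continue
        # Splitting on "=" removed any padding, so restore it.
        padded = section + "=" * (-len(section) % 4)
        try:
            text = base64.urlsafe_b64decode(padded).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not base64 after all, or not text
        if text.isprintable():
            decoded.append(text)
    return decoded
```

    Sections that merely resemble base64 (long hex IDs, for example) usually fail the UTF-8 or printable check, but false positives are possible with a heuristic like this.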

    EDIT: Oops. My Lemmy client didn’t load the other replies at first, I didn’t realize you already had plenty of other options.

  • irotsoma@lemmy.blahaj.zone
    7 days ago · 13 upvotes

    Don’t include the non-encoded part of the data or it will corrupt the decoding. The decoder can’t tell the difference between data that’s not encoded and data that is encoded since it’s all text.

  • Scripter17@lemmy.world
    2 days ago (edited) · 7 upvotes

    I’ve been working on a URL cleaning tool for almost 2 years now and just committed support for that type of URL. I’ll release it to crates.io shortly after Rust 1.90 on the 18th.

    https://github.com/Scripter17/url-cleaner

    It has 3 frontends right now: a CLI, an HTTP server and userscript to clean every URL on every webpage you visit, and a discord bot. If you want any other integration let me know and I’ll see what I can do.

    Also, amusingly, you decoded the base64 wrong. You forgot to change the _ to / and thus missed the /burger-dog-mold and tracking parameter garbage at the end. I made sure to remove the tracking parameters.
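
    To illustrate both points in Python (a sketch only, not how url-cleaner works internally): map the URL-safe characters back to the standard alphabet before decoding, then drop tracking parameters from the result. Which parameters count as tracking is an assumption here. The names `sid`, `ss`, `st_rid`, and the `utm_` prefix are what the sample link appears to carry, and the function names are invented for the example.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: parameter names treated as tracking. A real cleaner would need
# a much longer, per-site list.
TRACKING_NAMES = {"sid", "ss", "st_rid"}
TRACKING_PREFIXES = ("utm_",)


def decode_urlsafe(segment: str) -> str:
    """Map '-' and '_' back to '+' and '/', re-pad, then decode as base64."""
    standard = segment.replace("-", "+").replace("_", "/")
    standard += "=" * (-len(standard) % 4)
    return base64.b64decode(standard).decode("utf-8")


def strip_tracking(url: str) -> str:
    """Drop query parameters that look like tracking and rebuild the URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


decoded = decode_urlsafe("aHR0cHM6Ly9leGFtcGxlLmNvbS9hYmM_eD0xMg")
print(decoded)  # https://example.com/abc?x=12
print(strip_tracking(decoded + "&utm_source=newsletter"))  # https://example.com/abc?x=12
```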

    Edit: Published on crates.io and github under AGPL. Sadly the discord frontend couldn’t be published to crates.io because to work around something (I forget exactly what) I changed a dependency from the one on crates.io to a more up-to-date version of it on github. Crates.io correctly rejects that kind of stuff. If you want to use the discord frontend, git clone the repository then run cargo build -r -p url-cleaner-discord-app.

    The offer to write extra frontends stands, btw. If you want a slack bot I’ll make one.

  • amzd@lemmy.world
    7 days ago · 10 upvotes

    It’s 3 lines of code in basically every programming language, no need for selfhosting, just open the terminal?

  • e0qdk@reddthat.com
    7 days ago · 13 upvotes

    There’s something else going on there besides base64 encoding of the URL – possibly they have some binary tracking data or other crap that only makes sense to the creator of the link.

    It’s not hard to write a small Python script that gets what you want out of a URL like that though. Here’s one that works with your sample link:

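    #!/usr/bin/env python3

    import base64
    import binascii
    import itertools
    import string
    import sys

    input_url = sys.argv[1]
    parts = input_url.split("/")

    # Rebuild candidate chunks right-to-left: last part, last two parts joined by "/", etc.
    for chunk in itertools.accumulate(reversed(parts), lambda b,a: "/".join([a,b])):
      try:
        # b64decode silently skips characters outside the standard alphabet (e.g. "_", ".")
        text = base64.b64decode(chunk).decode("ascii", errors="ignore")
        # Keep only the leading run of printable characters
        clean = "".join(itertools.takewhile(lambda x: x in string.printable, text))
        print(clean)
      except binascii.Error:
        # Not decodable as base64 (bad length or padding); try the next, longer chunk
        continue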
    

    Save that to a file like decode.py and then you can run it on the command line like python3 ./decode.py 'YOUR-LINK-HERE'

    e.g.

    $ python3 ./decode.py 'https://link.sfchronicle.com/external/41488169.38548/aHR0cHM6Ly93d3cuaG90ZG9nYmlsbHMuY29tL2hhbWJ1cmdlci1tb2xkcy9idXJnZXItZG9nLW1vbGQ_c2lkPTY4MTNkMTljYzM0ZWJjZTE4NDA1ZGVjYSZzcz1QJnN0X3JpZD1udWxsJnV0bV9zb3VyY2U9bmV3c2xldHRlciZ1dG1fbWVkaXVtPWVtYWlsJnV0bV90ZXJtPWJyaWVmaW5nJnV0bV9jYW1wYWlnbj1zZmNfYml0ZWN1cmlvdXM/6813d19cc34ebce18405decaB7ef84e41'
    https://www.hotdogbills.com/hamburger-molds/burger-dog-mold
    

    This script works by splitting the URL at ‘/’ characters and then recombining the parts (right-to-left) and checking if that chunk of text can be base64 decoded successfully. If it can, it then takes any printable ASCII characters at the start of the string and outputs them (to clean up the garbage characters at the end). If there’s more than one possible valid interpretation as base64 it will print them all as it finds them.

  • markstos@lemmy.world
    5 days ago · 3 upvotes

    The encoding format of URLs is URL encoding, also known as percent-encoding. Content in the URL may first be encoded in some other format, like JSON or base64, and then additionally percent-encoded.

    While there is a standard way to decode percent-encoding, websites are free to use base64 or JSON in URLs however they wish, so there’s not a one-size-fits-all way to decode them all. For example, the “/” character is valid both as a literal URL path separator and as a character in the standard base64 alphabet, so to know if it’s part of a base64-encoded blob or not, you might end up trying to decode several parts of the URL as base64 and checking whether the result looks like a URL, which is essentially brute force.
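
    For the percent-encoding half, Python’s standard library already does the work. A small sketch, where the `tracker.example` link, the `url` parameter name, and the helper `embedded_urls` are all hypothetical:

```python
from urllib.parse import parse_qs, unquote, urlsplit


def embedded_urls(url: str) -> list[str]:
    """Return query parameter values that themselves look like http(s) URLs."""
    found = []
    # parse_qs percent-decodes each value for us.
    for values in parse_qs(urlsplit(url).query).values():
        for value in values:
            if value.startswith(("http://", "https://")):
                found.append(value)
    return found


# Hypothetical click-tracker link carrying a percent-encoded destination.
tracker = (
    "https://tracker.example/click?id=42"
    "&url=https%3A%2F%2Fexample.com%2Fpage%3Fa%3D1"
)
print(unquote("https%3A%2F%2Fexample.com%2Fpage%3Fa%3D1"))  # https://example.com/page?a=1
print(embedded_urls(tracker))  # ['https://example.com/page?a=1']
```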

    A smarter way to do this might be to maintain a mapping between your favorite sites that you want to decode and what methods they use to encode links. Then a tool could efficiently directly decode the URLs embedded in these click trackers.
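
    One way such a mapping could look in Python, as a rough sketch. The hostnames and layouts in the table are illustrative assumptions (the first entry mirrors the shape of the sample link in the post, with the payload in the second-to-last path segment), and all function names are invented for the example.

```python
import base64
from urllib.parse import parse_qs, urlsplit


def b64_path_segment(index: int):
    """Build a decoder for URL-safe base64 stored in one path segment."""
    def decode(url: str) -> str:
        segment = urlsplit(url).path.split("/")[index]
        segment += "=" * (-len(segment) % 4)  # restore stripped padding
        return base64.urlsafe_b64decode(segment).decode("utf-8")
    return decode


def query_param(name: str):
    """Build a decoder for a percent-encoded destination in a query parameter."""
    def decode(url: str) -> str:
        return parse_qs(urlsplit(url).query)[name][0]
    return decode


# Hypothetical table: which site uses which method.
DECODERS = {
    "link.example.com": b64_path_segment(-2),
    "tracker.example": query_param("url"),
}


def resolve(url: str) -> str:
    """Use the decoder registered for the link's host; pass unknown hosts through."""
    decoder = DECODERS.get(urlsplit(url).hostname)
    return decoder(url) if decoder else url
```

    Adding a site then means adding one table entry instead of guessing which part of the URL to decode.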

  • masterofn001@lemmy.ca
    7 days ago · 5 upvotes

    I have nothing to add except the appreciation for everyone who helped and amazement at the vastly differing ways people produced working results.

  • liliumstar@lemmy.dbzer0.com
    7 days ago (edited) · 4 upvotes

    I wrote this little webapp thing some time ago. It’s not exactly what you asked for but is a good example.

    All it does is base64 encode a link and add the server URL in front of it. When someone visits that link it will redirect them to the destination. The intent is to bypass simple link tracking / blocking in discord and other platforms.
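
    The encode-and-prefix idea can be sketched in a few lines of Python. This is not taken from the redir source; the server URL `https://redir.example` and the names `wrap` and `unwrap` are placeholders, and the redirect itself is left out.

```python
import base64

SERVER = "https://redir.example"  # hypothetical server URL


def wrap(destination: str) -> str:
    """Base64-encode the destination and put the server URL in front of it."""
    token = base64.urlsafe_b64encode(destination.encode("utf-8")).decode("ascii")
    return f"{SERVER}/{token}"


def unwrap(link: str) -> str:
    """Recover the destination a server would redirect to."""
    # The URL-safe alphabet contains no "/", so the token is the last segment.
    token = link.rsplit("/", 1)[-1]
    return base64.urlsafe_b64decode(token).decode("utf-8")


link = wrap("https://example.com/abc?x=12")
print(link)
print(unwrap(link))  # https://example.com/abc?x=12
```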

    There are also checks for known bad domains and an attempt to remove known tracking query parameters.

    https://git.tsps-express.xyz/liliumstar/redir

    Edit: I forgot to add it also blocks known crawlers (at least at time of writing) so that they can’t just follow the 302 and figure out where it goes.