  • Basically it’s a system for running all kinds of directed acyclic task graphs (DAGs), built primarily around data ingestion. It’s very flexible and powerful, which also means there’s a steep learning curve.

    To give an example, you could have a task that gathers a list of instances and updates the database. It could also spawn a new task for each instance to check whether the server is up and get its version number, and you could even have it email you so you can create an account on any new instances.

    Then, from the task that made sure the server is up, you could spawn a new task that gets that instance’s communities, which in turn spawns new tasks to ingest the posts from each community.

    And when this whole process is done, you could have it kick off a new set of tasks to do the indexing, or whatever else, on the up-to-date data set.

    It has some nice visualization of the process, you can allocate workers across devices, and you can kick off the process through an API. You can use it for anything from monitoring to scraping and running map-reduce over the results. You could even federate and wire into ActivityPub directly, use the instances’ APIs, or mix and match either of those with scraping.

    I’ve never worked with crawlers and I’m not sure what angle you’re going to attack this from, but if normal crawlers don’t play well with the fediverse, this is an option.
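
To make the fan-out above a bit more concrete, here is a rough sketch in plain Python of tasks that spawn follow-up tasks (instances, then a health check, then communities, then posts), with an indexing step once everything has drained. It does not use the actual tool's API. Every name in it (`FAKE_FEDIVERSE`, `gather_instances`, `run`, and so on) is made up for illustration, and the "network" is just a hard-coded dict.

```python
from collections import deque

# Hypothetical stand-in for the network: instance -> community -> posts.
# An empty dict marks an instance that is treated as down.
FAKE_FEDIVERSE = {
    "lemmy.example": {"tech": ["post-1", "post-2"], "news": ["post-3"]},
    "down.example": {},
}

# Each task receives the shared state plus its own arguments and returns
# a list of (function, args) pairs: the follow-up tasks it wants spawned.

def gather_instances(state):
    state["instances"] = sorted(FAKE_FEDIVERSE)
    return [(check_instance, (name,)) for name in state["instances"]]

def check_instance(state, name):
    if not FAKE_FEDIVERSE[name]:
        state["down"].append(name)  # server is down: spawn nothing further
        return []
    return [(get_communities, (name,))]

def get_communities(state, name):
    return [(ingest_posts, (name, community))
            for community in sorted(FAKE_FEDIVERSE[name])]

def ingest_posts(state, name, community):
    state["posts"].extend(FAKE_FEDIVERSE[name][community])
    return []

def run(root):
    """Run tasks in FIFO order until none are left, then 'index' the result."""
    state = {"instances": [], "down": [], "posts": []}
    queue = deque([(root, ())])
    while queue:
        func, args = queue.popleft()
        queue.extend(func(state, *args))
    # Final stage: only runs once every ingestion task has finished.
    state["indexed"] = sorted(state["posts"])
    return state
```

Calling `run(gather_instances)` walks the whole chain in a single process. A real orchestrator would add the parts this sketch skips: retries, scheduling, the visualization, and spreading tasks across workers.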