Computer Science Lounge - [Too Many Idea Men Edition]

Potentially, but my middle name is "Exalted Lord of Over Commitment".

Just an update. My Node.js site Prospector has taken longer than expected to iron out, but I am in the process of making front-end changes. The back end now works with clustering, and I've hashed the passwords and all the rest. Soon I will be working on a way to automate deployment to a server I set up in town, so it will be accessible soon and will improve over time.

Secondly, I wanted to get back into some C programming because it's my favorite language, so I wrote this hashset in C over the past weekend. Enjoy!!!
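The hashset itself isn't included in this thread, so for anyone curious what one looks like, here is a minimal sketch of a string hashset in C, assuming separate chaining with a fixed bucket count and the FNV-1a hash. The names (`hashset`, `hs_create`, `hs_add`, `hs_contains`, `hs_destroy`) are hypothetical and not taken from the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One entry in a bucket's singly linked chain (hypothetical layout). */
typedef struct hs_node {
    char *key;
    struct hs_node *next;
} hs_node;

typedef struct {
    hs_node **buckets;
    size_t nbuckets;
    size_t count;      /* number of distinct keys stored */
} hashset;

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t hs_hash(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Create an empty set with a fixed number of buckets; NULL on OOM. */
hashset *hs_create(size_t nbuckets) {
    assert(nbuckets > 0);
    hashset *set = malloc(sizeof *set);
    if (!set) return NULL;
    set->buckets = calloc(nbuckets, sizeof *set->buckets);
    if (!set->buckets) { free(set); return NULL; }
    set->nbuckets = nbuckets;
    set->count = 0;
    return set;
}

/* Walk the key's bucket chain looking for an exact match. */
bool hs_contains(const hashset *set, const char *key) {
    for (hs_node *n = set->buckets[hs_hash(key) % set->nbuckets]; n; n = n->next)
        if (strcmp(n->key, key) == 0) return true;
    return false;
}

/* Insert a copy of key. Returns true if it was newly added,
   false if it was already present or memory ran out. */
bool hs_add(hashset *set, const char *key) {
    if (hs_contains(set, key)) return false;
    hs_node *n = malloc(sizeof *n);
    if (!n) return false;
    n->key = malloc(strlen(key) + 1);
    if (!n->key) { free(n); return false; }
    strcpy(n->key, key);
    size_t i = hs_hash(key) % set->nbuckets;
    n->next = set->buckets[i];   /* push onto the front of the chain */
    set->buckets[i] = n;
    set->count++;
    return true;
}

/* Free every node, every key copy, and the set itself. */
void hs_destroy(hashset *set) {
    if (!set) return;
    for (size_t i = 0; i < set->nbuckets; i++) {
        hs_node *n = set->buckets[i];
        while (n) {
            hs_node *next = n->next;
            free(n->key);
            free(n);
            n = next;
        }
    }
    free(set->buckets);
    free(set);
}
```

A fixed bucket count keeps the sketch short; a fuller implementation would usually grow and rehash the table once `count` passes some load factor.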

I've been working on a .soft file parser in flex/bison as a critical part of the TF-Cluster grid program I've been hacking on. I also found out that the dataset is 79TB, and that running the program would likely DDoS the National Center for Biotechnology Information (NCBI), because their infrastructure and programmatic access are terrible.

If I stop posting here for a long time, I might have gotten in trouble for inadvertently DDoS'ing a federal agency.


It's a good sign for this community to have this type of content; content about hardware gets boring after a while. The industry is moving toward developers using cloud-based configurations such as Amazon Web Services (AWS), where you can define your hardware in configuration files alongside your software.


What's your GitHub? Maybe I will take a look and make a pull request or two.


github.com/anadon, under the madlib and TF-cluster repos. I should write something up when I'm done with the parser, because there seem to be very few good how-tos on re-entrant parsers using flex and bison. I'm going to work on it when I wake up tomorrow because it needs to be done by the 20th.
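The core idea of a re-entrant parser can be sketched in plain C: every piece of state lives in a struct the caller passes in, instead of in globals, so two files can be parsed at the same time (in flex and bison the corresponding switches are `%option reentrant` and `%define api.pure full`). The sketch below is hypothetical and is not flex/bison output; it only classifies .soft lines by their first character, assuming the GEO SOFT conventions of `^` for entity lines, `!` for attributes and `#` for column descriptions, which are worth double-checking against NCBI's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line categories for a .soft file. */
typedef enum { SOFT_ENTITY, SOFT_ATTRIBUTE, SOFT_COLUMN, SOFT_DATA, SOFT_BLANK } soft_kind;

/* All parser state lives here rather than in globals, so several
   parsers can run at once (one per thread or per input file). */
typedef struct {
    size_t line_no;    /* lines fed so far */
    size_t entities;   /* number of '^' lines seen */
    char entity[64];   /* text after the last '^', truncated to fit */
} soft_state;

void soft_init(soft_state *st) {
    assert(st != NULL);
    memset(st, 0, sizeof *st);
}

/* Classify one line, updating only the state that was passed in. */
soft_kind soft_feed_line(soft_state *st, const char *line) {
    assert(st != NULL && line != NULL);
    st->line_no++;
    switch (line[0]) {
    case '^':
        st->entities++;
        strncpy(st->entity, line + 1, sizeof st->entity - 1);
        st->entity[sizeof st->entity - 1] = '\0';
        return SOFT_ENTITY;
    case '!':
        return SOFT_ATTRIBUTE;
    case '#':
        return SOFT_COLUMN;
    case '\0':
    case '\n':
        return SOFT_BLANK;
    default:
        return SOFT_DATA;   /* e.g. a tab-separated table row */
    }
}
```

Because nothing is static, interleaving calls on two `soft_state` values never mixes up their line counts or current entity.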

The plan is to add .soft file support to madlib, update TF-Cluster v3 to handle the new functionality, wrap that in a BOINC wrapper pointed at URL ranges in NCBI, and send the compressed results to my university.

I've also been contemplating creating a .bsoft file format to store the data in a more compressed way: properly structuring it could remove all explicit key values and reduce the character space to 7 bits, and the result could then be fed into bzip. This matters because the current concise dataset I have to run this over is 79TB, and that size presents all sorts of issues. The raw dataset is several times that, using .CEL files from Affymetrix, which I should still add support for.
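A minimal sketch of the 7-bit idea, assuming the input is plain ASCII (every byte below 0x80): each character is written as 7 bits, most significant bit first, so n characters fit in ceil(7n/8) bytes, saving one byte in eight before any compression. The function names are hypothetical, and dropping the explicit keys would be a separate, format-specific step not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold n 7-bit characters: ceil(7n / 8). */
size_t packed7_size(size_t n) {
    return (n * 7 + 7) / 8;
}

/* Pack n ASCII bytes (each < 0x80) into out, 7 bits per character,
   most significant bit first. out must hold packed7_size(n) bytes. */
void pack7(const unsigned char *in, size_t n, uint8_t *out) {
    size_t nbytes = packed7_size(n);
    for (size_t i = 0; i < nbytes; i++) out[i] = 0;
    size_t bit = 0;  /* next bit position to write in out */
    for (size_t i = 0; i < n; i++) {
        assert(in[i] < 0x80);  /* only 7-bit input is representable */
        for (int b = 6; b >= 0; b--, bit++) {
            if ((in[i] >> b) & 1)
                out[bit / 8] |= (uint8_t)(0x80 >> (bit % 8));
        }
    }
}

/* Inverse of pack7: recover n characters from the packed stream. */
void unpack7(const uint8_t *in, size_t n, unsigned char *out) {
    size_t bit = 0;  /* next bit position to read from in */
    for (size_t i = 0; i < n; i++) {
        unsigned char c = 0;
        for (int b = 0; b < 7; b++, bit++)
            c = (unsigned char)((c << 1) | ((in[bit / 8] >> (7 - bit % 8)) & 1));
        out[i] = c;
    }
}
```

It may be worth benchmarking this against running bzip on the unpacked text, since bzip works on bytes and packed 7-bit symbols no longer line up with byte boundaries, which could cost some of the redundancy it would otherwise find.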


You said something about a "perfect alignment algorithm". Could you elaborate on what type of algorithm it is and where you would want it?

The maximally conserved non-contiguous subsequence. It would be an optimized dynamic programming algorithm, with its range limited by a suffix array algorithm that finds the longest contiguous subsequence, to limit edge cases. I've been trying to find time to work on a better SA-IS implementation.
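For reference, the textbook dynamic program for the longest common non-contiguous subsequence of two sequences is sketched below. This is the plain O(nm) version, not the optimized, suffix-array-bounded algorithm described above; `lcs_length` is a hypothetical name, and since it keeps only two rows of the table it returns the length but not the subsequence itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common (not necessarily contiguous) subsequence
   of a and b, using the classic O(n*m) dynamic program with two rows.
   Returns 0 if memory allocation fails. */
size_t lcs_length(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t n = strlen(a), m = strlen(b);
    size_t *prev = calloc(m + 1, sizeof *prev);  /* row i-1 of the table */
    size_t *cur = calloc(m + 1, sizeof *cur);    /* row i of the table */
    if (!prev || !cur) { free(prev); free(cur); return 0; }
    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (a[i - 1] == b[j - 1])
                cur[j] = prev[j - 1] + 1;            /* extend a match */
            else
                cur[j] = prev[j] > cur[j - 1] ? prev[j] : cur[j - 1];
        }
        size_t *tmp = prev; prev = cur; cur = tmp;   /* row i becomes previous */
    }
    size_t result = prev[m];
    free(prev);
    free(cur);
    return result;
}
```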

I'd just want it under the /src folder, and I have a pretty good API defined in my suffix array library.
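The suffix array library's API isn't shown here, so as a stand-in, here is a naive suffix array construction in C that sorts pointers to every suffix with `qsort` and `strcmp`. It is O(n^2 log n) in the worst case, far slower than SA-IS, but small enough to check a faster implementation against; `suffix_array` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: elements are pointers into the same text. */
static int cmp_suffix(const void *x, const void *y) {
    const char *a = *(const char *const *)x;
    const char *b = *(const char *const *)y;
    return strcmp(a, b);
}

/* Build the suffix array of text: sa[k] is the start index of the k-th
   smallest suffix. Naive O(n^2 log n) worst case; returns NULL on OOM.
   The caller frees the result. */
size_t *suffix_array(const char *text) {
    assert(text != NULL);
    size_t n = strlen(text);
    const char **suf = malloc((n ? n : 1) * sizeof *suf);
    size_t *sa = malloc((n ? n : 1) * sizeof *sa);
    if (!suf || !sa) { free(suf); free(sa); return NULL; }
    for (size_t i = 0; i < n; i++) suf[i] = text + i;
    qsort(suf, n, sizeof *suf, cmp_suffix);
    for (size_t i = 0; i < n; i++) sa[i] = (size_t)(suf[i] - text);
    free(suf);
    return sa;
}
```

Pairing this with an LCP array (the longest common prefix of adjacent suffixes) is the usual way to read off the longest repeated or shared contiguous substring.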

Would it be related to the FKT algorithm?

I'm actually not familiar. Could you explain or give me a link?

It looks like it might be a more general form, but it doesn't have the same time complexity as sequence alignment using dynamic programming. The algorithm I'm using is O(n^2) in the worst case, and there are a number of decent cases where it is faster.

What file currently has that algorithm in the project?

It hasn't been added, if I recall correctly. It should be in the TODO file, along with formally adding Doxygen to the project. I'm in a D&D session right now, so I'm kinda preoccupied.

OK, well, I will take a look, but this looks advanced for my current knowledge. I know basic discrete math and data structures, but I might need to expand my knowledge a little further before I contribute.

Does anybody know any good programs that can format code to a certain spec?

Also, if anybody is interested, Prospector is up and running. You might consider it early pre-alpha, but this is my first website built from scratch. It's not the best, but I made it and learned a lot doing so. I will continue to polish it in the future. To access it, go here:

http://prospector.center:3000/

Please send me your biggest complaints or problems with it, and I can work on those issues. Also, if anybody knows how I can get it to use HTTPS, that would be nice; I have no idea how that works.

http://astyle.sourceforge.net/

I'm unable to connect to prospector.center


I got it going again. For some reason I can't get it to start on boot.

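rc.local:

```sh
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.

# rc.local already runs as root, so "sudo su" is not needed: in a script it
# just starts a separate root shell (which can sit waiting for input), and
# the commands after it do not run inside that shell.
cd /home/sdrafahl/Prospector

# Because of "sh -e", any failing command aborts the rest of the script.
# If redis.sh stays in the foreground, background it the same way as npm.
./scripts/redis.sh -start

# "npm start" never returns, so run it in the background and log its output;
# otherwise rc.local blocks and never reaches "exit 0". rc.local also runs
# with a minimal PATH, so if npm lives somewhere like ~/.nvm, use its full
# path (find it with "which npm" as the normal user).
nohup npm start >> /var/log/prospector.log 2>&1 &

exit 0
```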