I'm trying to use Accord.NET for text classification, but I can't find a way to represent sparse vectors and matrices. For example, we have a lot of texts, and after tokenization with n-grams and hashing, every text is represented as a set of feature indices (given by featureHasher), each with a weight (tf). It is impossible to load all of the data into memory as a non-sparse matrix. Is there a way to do incremental processing, represent a sparse matrix, or do feature reduction with sparse data?
Unfortunately, not all models and methods support sparse matrices at this time. However, if you are trying to do text categorization, you might be able to do it using a Support Vector Machine with a Sparse kernel.
Sparse kernels, such as SparseLinear and SparseGaussian, can be found in the Accord.Statistics.Kernels.Sparse namespace. These kernels expect data to be given in LibSVM's sparse format. The specification for this format can be found in LibSVM's FAQ under the question "Why sometimes not all attributes of a data appear in the training/model files?".
Basically, in this format, a feature vector that would be represented as
is represented as
or in other words, as a list of position:value pairs, where position starts at 1.
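To make the position:value idea concrete, here is a minimal, language-neutral sketch in Python (not Accord.NET code). The helper names `to_sparse_pairs`, `format_libsvm`, and `to_dense`, as well as the sample vector, are hypothetical and only illustrate the textual format; how Accord.NET expects the pairs to be laid out in memory is not shown here.

```python
def to_sparse_pairs(dense):
    """Return (position, value) pairs for the nonzero entries.

    Positions start at 1, as in LibSVM's sparse format.
    """
    return [(pos, val) for pos, val in enumerate(dense, start=1) if val != 0]


def format_libsvm(pairs):
    """Render the pairs as a space-separated 'position:value' string."""
    return " ".join(f"{pos}:{val:g}" for pos, val in pairs)


def to_dense(pairs, length):
    """Rebuild a dense vector of the given length from (position, value) pairs."""
    dense = [0.0] * length
    for pos, val in pairs:
        dense[pos - 1] = val  # convert the 1-based position to a 0-based index
    return dense


# Hypothetical sample vector, chosen only for illustration.
example = [1, 0, 2, 0]
pairs = to_sparse_pairs(example)
print(format_libsvm(pairs))  # prints "1:1 3:2"
```

Only the nonzero entries are stored, so a hashed n-gram vector with millions of possible positions but a few hundred active features stays small.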
Here is an example on how to use SVMs with a SparseLinear kernel using LibSVM's sparse linear format: