Wednesday, February 1, 2012

Nasty StreamFS Bug

Several months ago I ran into a nasty bug in StreamFS. When the incoming data rate was too high, StreamFS would crash with a flood of "Too many open files" exceptions. I took an initial approach to the problem and then dug deeper to find that it was fundamentally a timing bug -- although I'm still unsure why the timing issue caused the file-descriptor exhaustion.

Initial approach:
  • Call close() on all database connections, in both the MySQL driver and the MongoDB driver.
  • Make sure all HttpExchange objects are properly cleaned up after a response is sent.
  • Added a finally clause to the sendResponse() method in the Resource object code that ensures the input/output streams are closed and exchange.close() is called.
The cleanup code was spread all over the place, so it took a while to make sure I had "plugged every hole" -- and there were many! So it wasn't a completely wasted effort: it made the code base more stable, no question. However, the problem still occurred when I unleashed the deluge of data onto the system.
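For reference, the cleanup pattern looks roughly like this. This is only a sketch; the real StreamFS Resource class and sendResponse() signature are more involved, and the names here are illustrative:

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class Resource {
    // Sketch of the sendResponse() cleanup; signature and names are illustrative.
    public void sendResponse(HttpExchange exchange, int status, String body) {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        try {
            exchange.sendResponseHeaders(status, payload.length);
            OutputStream out = exchange.getResponseBody();
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            // log and fall through to cleanup
        } finally {
            // Always release the streams and the exchange, even on error,
            // so the underlying socket (a file descriptor) is returned.
            try { exchange.getRequestBody().close(); } catch (IOException ignored) {}
            try { exchange.getResponseBody().close(); } catch (IOException ignored) {}
            exchange.close();
        }
    }
}
```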

For the next couple of weeks I tried everything: tracing, logging, etc. Nothing cleared the bug. So I finally decided to just dig through the datapath. Data is submitted, forwarded to the subscription engine, and finally placed in MongoDB. I added arbitrary delays at various points and it turned out the serialization of this path was what caused the problem. The process described above was entirely serialized:
  • Forward to subscribers
  • Save in database
With a delay of more than 100 ms between these two steps and an incoming post rate of about 700 Kbps (at about 300 bytes per entry, roughly 300 JSON documents per second), the process was not fast enough to send the reply and handle the next incoming document. I created a separate task using a thread pool to parallelize the subscription forwarding and the database insertion, and the problem disappeared. There must be something in the underlying HttpExchange code in the com.sun.net.httpserver library that prevents close from ever being called when the incoming data rate is too high. In any case, the change fixed the bug and StreamFS is better for it.
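A minimal sketch of the fix, with hypothetical forwardToSubscribers() and saveToMongo() methods standing in for the real subscription engine and MongoDB insert -- the point is only that the HTTP handler no longer blocks on them:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: decouple the HTTP reply from the (slow) forward + insert path.
// Method names are illustrative, not the actual StreamFS API.
public class DataPath {
    private final ExecutorService workers = Executors.newFixedThreadPool(8);
    final AtomicInteger processed = new AtomicInteger();

    public void handlePost(String jsonDoc) {
        // Hand the slow work to the pool; the handler can send its reply
        // and accept the next document immediately.
        workers.submit(() -> {
            forwardToSubscribers(jsonDoc);
            saveToMongo(jsonDoc);
            processed.incrementAndGet();
        });
    }

    void forwardToSubscribers(String doc) { /* push to subscription engine */ }
    void saveToMongo(String doc) { /* insert into MongoDB */ }

    public void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```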

Tuesday, July 12, 2011

Losing your past...

My academic and professional success to date has certainly not come for free. There are quite a number of sacrifices that I've had to make along the way. But what's interesting, as I get older, is that what I notice most is not the sacrifice of working hard and staying up late, but, for me in particular, the sacrifice that's come with being away from my closest family and friends. A few minutes ago, I was looking through the Facebook pages of some of my closest friends from my neighborhood. Many of them still hang out and now have young children of their own, who also hang out. I realized -- as I looked through the photos -- that I'm missing out on their lives. Of all the friendships I've made, none have been more meaningful than the ones I cultivated while growing up. Hopefully in the near future I can re-insert myself into their lives. It'll be a difficult transition, I'm sure, but one that's probably worthwhile.

Friday, January 7, 2011

Data acquisition challenges: Non-technical barriers to acquisition

I'm trying to figure out a systematic way of integrating building data from the Siemens BMS in Sutardja Dai Hall. I'm about 90 minutes from dialing into a meeting with Siemens engineers, and I figured I'd write a quick blog post with my notes on OPC. The notes are actually based on this pdf.

OPC stands for OLE for Process Control (too many TLAs -- three-letter acronyms!). OLE stands for Object Linking and Embedding. It's used to read/write values to/from sensors and controllers. For my StreamFS system, I'm interested in collecting as many data streams as possible from physical data sources (various large sensor deployments) and integrating them into a single system. So let's look a little closer at OPC.

OPC is a software process that runs over a client-server connection between a computer and a gateway -- the front-end machine that receives all the deployment data. The gateway reads and sets values on the sensors attached to it. The computer is typically connected to the gateway via Ethernet or serial; in SDH, a serial connection is used. Serial connections are bandwidth-limited in comparison to Ethernet: serial bandwidth typically ranges from 128 Kbps to 8 Mbps, depending on the type of serial cable used. I do not know the available bandwidth of the serial link installed in SDH. According to a meeting we had a few weeks ago, there is a 300-object download limit per sampling period on the Siemens system. We assumed this was because of the serial connection; however, we were informed that through OPC we can obtain all the objects from all the sensors, so the bottleneck cannot be the serial connection. It must be a topological bottleneck through the Siemens root server, which OPC must bypass to avoid the limit. (There's also a push to upgrade from serial to Ethernet, but if the bottleneck is topological, Ethernet won't help!)

The OPC server can be accessed through either the serial interface or the TCP/IP interface; I'm not sure whether that depends on how the gateway server is connected. When setting up the OPC server, it must have the same address and process name as the gateway program.

The main set of questions that come out of this short overview for the Siemens folks are:

1) Is the bottleneck topological?
2) Do we need to purchase Siemens OPC?
3) Can we access the deployment information bus and send control signals to sensors from the web interface? What documentation should we look at to learn more about that interface?
4) How does the web interface compare with OPC?
5) What's the security model? How can we gain read/write access? Who manages/sets the security policy?

These conversations with Siemens have been dragging on for months. It's incredible how little information we get from them, phone call after phone call. They are clearly trying to protect their business by being vague. You can't really blame them, but it's very frustrating as a researcher trying to do some science!

Sunday, June 27, 2010

More on the Evo

I decided to keep my Evo and switch to Sprint. The switch from the G1 to the Evo was a no-brainer. The Evo is far, far superior, as it should be, since it's a later generation of Android-based device.

I did get a chance to play with the iPhone 4 the other day and I was quite impressed with the UI and general feel of the phone. Functionality-wise, I think the Evo is better (Evo versus iPhone 4), especially if you're a power user and use the internet a lot. I've found that the 3G AP (access point) feature is not only extremely useful for me, but far more stable than the access point connection I have at the CS lab at school. The bandwidth is great for productivity. I've even streamed TV using the Slingbox client for hours on end without any hitches. It's really fantastic!

However, that said, the iPhone's UI is FAR superior. I very much hate to admit that, but it's true. Their engineers and artists have clearly spent an inordinate amount of time making the feel of the phone just right. They really nailed it. It's sensitive to the user's touch, but not too sensitive; it's very responsive, deciphers user gestures with high accuracy, and it just looks so pretty. I understand why people love the phone so much.

From a functionality perspective, I think the Evo is a better phone. Once 4G is ubiquitous, it will become much more valuable. But it's going to be difficult to get the lay user to see that those differences matter, since the lay user surely doesn't use the sophisticated features. Apple has focused its attention on the lay user while not compromising too much on the technical end of things. It's certainly a nice tradeoff. Secondly, most people couldn't care less about Apple's practices, because they really don't affect them. I am against Apple in principle, since they have a closed platform -- a walled-garden usage model. Android is quite the opposite, and there are certainly tradeoffs to consider there as well, but all things being equal, I'd rather have choices.

My neighborhood -- Woodside

Just a couple of pictures of where I grew up. My family still lives there, and it's where I call home when I visit NYC. Woodside, Queens: Always reppin'










Thursday, May 20, 2010

Sprint Evo Review

There was no 4G coverage here in Berkeley so, for now, all the tests were run on the 3G network.

===============================
Download speed: ~500 Kbps
Upload speed: ~120 Kbps
===============================

Supposedly the 4G network is 10x faster. That would make the connection speed similar to the one I have at home: about 5 Mbps down and 1.2 Mbps up. That's pretty amazing.

The live TV feature is pretty incredible. And the fact that the data is carried over the 3G network is actually quite impressive.

The keyboard is not as bad as I expected it to be, although it's seriously lacking in comparison to the keyboard on the Droid or the G1.

Battery life. It might be a good idea to really run some battery tests on this thing. Live TV, the access-point feature, etc. -- those have really got to drain the battery. Battery drain is largely a function of the cellular radio's duty cycle. It's probably easy to do a 'back-of-the-envelope' calculation to figure out how long the battery is expected to last.
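Here's that back-of-the-envelope sketch. The 1500 mAh capacity matches the Evo's stock battery; the current-draw figures are illustrative guesses, not measurements:

```java
public class BatteryEstimate {
    // Hours of runtime = battery capacity (mAh) / average current draw (mA).
    static double hours(double capacityMah, double avgDrawMa) {
        return capacityMah / avgDrawMa;
    }

    public static void main(String[] args) {
        double capacity = 1500.0; // Evo stock battery, mAh
        // Hypothetical average draws (mA) -- guesses for illustration only.
        System.out.printf("mostly idle (~20 mA): %.0f h%n", hours(capacity, 20));
        System.out.printf("screen on + 3G radio streaming (~400 mA): %.1f h%n", hours(capacity, 400));
    }
}
```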

The screen is huge, and clearly this also drains the battery.

Not all 3G networks are created equal. Compared to T-Mobile's 3G network in this area, there really is no comparison: the Sprint 3G network is significantly better. On T-Mobile I can barely hold a voice call for my drive from school, which is only 3 miles/10 minutes away by car.

Radio -- it has one and it's awesome. I put on some headphones and was able to get local radio stations. Radio hasn't been integrated into any of my previous devices, and that has made it very difficult to keep up with the radio shows I hear in the morning on my way to school. Not all the stations I listen to stream audio online, so when I turn the car off upon arrival, no more radio show.


More to come....

Monday, January 4, 2010

Open research questions in IS4

I'm working on building a sensor storage system with features that will make it easy to integrate multiple streams of data into a database for long-term storage, real-time streaming, and perhaps real-time trending/search. The name of the system is the Integrated Sensor-stream Storage System, or IS4, and there is more information here.

Long-term storage is trivial. The real challenge, and contribution, in this area is the protocols and associated schemas for integrating data streams from a heterogeneous set of sensors. There are also challenges, not directly addressed in the literature, with respect to integrating data that lacks a unifying condition -- in other words, there's no column whose values match, on which to run a join. It's important to note that in this work we assume all the data is timeseries data. Furthermore, the associated metadata is also timeseries data. Therefore we can use aggregation, windowing, extrapolation, and interpolation techniques to create views (temporary and materialized).

From the ActionWebs meeting, the main take-away related to my work is the relationship between models and data. Up to this point our mindset has been to apply a model to the data -- for example, running linear interpolation in time and surface-fitting in space to generate a 2D temperature-distribution map overlaid on a map of a space. But this kind of interpolation has major flaws. Especially when dealing with temperature data, linear interpolation has no bounding constraints: there is no notion of heat transfer, no notion of spatial separation, no physical model that can be fed the collected data so that observations are grounded in physical reality rather than pure mathematical modeling. Really, the model should set the context for the data, and the data should then inform the model. Adjustments to the model should be made according to the data.

On the data model...

I am reluctant to use the words column or view, since they imply a relational data model. The data model is object-oriented. We can, however, map data objects to tables. A column is an ordered vector of object attribute values for the same attribute label. A row corresponds to an object of attribute-value pairs. Views are composed of a mapping of attributes in column form. Since all objects are timestamped, we can use a model (physical, mathematical, or otherwise) to fill in any missing values so that a merge/join can be performed.
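A minimal sketch of this mapping, using linear interpolation as a stand-in for whatever model fills the missing values (all names here are illustrative, not the IS4 API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: timestamped objects -> columns, with model-based fill for joins.
public class ObjectColumns {
    // An "object" is a timestamp plus attribute-value pairs.
    static class Obj {
        final long ts;
        final Map<String, Double> attrs;
        Obj(long ts, Map<String, Double> attrs) { this.ts = ts; this.attrs = attrs; }
    }

    // A column: the ordered vector of values for one attribute label.
    // Entries are null where the object lacks that attribute.
    static List<Double> column(List<Obj> objs, String attr) {
        List<Double> col = new ArrayList<>();
        for (Obj o : objs) col.add(o.attrs.get(attr));
        return col;
    }

    // Fill a missing value at time t by linear interpolation between the
    // nearest surrounding samples -- the simplest possible "model".
    static double interpolate(long t, long t0, double v0, long t1, double v1) {
        return v0 + (v1 - v0) * (double) (t - t0) / (t1 - t0);
    }
}
```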

Open challenges:

1. Real-time streaming queries
2. Actuation
3. Staged pub/sub system
4. Alerts and modeling
5. Query language

Real-time streaming queries
This problem has been addressed in the literature. Windowing is one of the most common techniques.
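A sketch of the technique -- a time-based sliding window that maintains an aggregate over only the most recent samples (the window size and aggregate are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a time-based sliding window over a stream.
public class SlidingWindow {
    private final long windowMs;
    private final Deque<long[]> samples = new ArrayDeque<>(); // {timestamp, value}

    public SlidingWindow(long windowMs) { this.windowMs = windowMs; }

    public void add(long ts, long value) {
        samples.addLast(new long[] {ts, value});
        // Evict samples that have fallen out of the window.
        while (!samples.isEmpty() && samples.peekFirst()[0] <= ts - windowMs) {
            samples.removeFirst();
        }
    }

    // The streaming query: an aggregate over the current window contents.
    public double average() {
        double sum = 0;
        for (long[] s : samples) sum += s[1];
        return samples.isEmpty() ? Double.NaN : sum / samples.size();
    }
}
```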

Actuation
This is a problem in sensor networks that has not been addressed. Most papers mention the need for actuation; very few actually do it.

Staged pub/sub system
Client systems of IS4 will have the ability to subscribe to various stages of the incoming data and processing stream. This implies the need to name stream flows and to forward those flows to all subscribers.
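A minimal sketch of what staged subscription could look like (the stage names and string-document payloads are illustrative assumptions, not the IS4 design):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: clients subscribe to a named stage of the pipeline
// (e.g. "raw", "cleaned", "stored"); every document published at
// that stage is forwarded to all of its subscribers.
public class StagedPubSub {
    private final Map<String, List<Consumer<String>>> stages = new HashMap<>();

    public void subscribe(String stage, Consumer<String> client) {
        stages.computeIfAbsent(stage, s -> new ArrayList<>()).add(client);
    }

    public void publish(String stage, String doc) {
        for (Consumer<String> c : stages.getOrDefault(stage, List.of())) {
            c.accept(doc); // forward the flow to every subscriber
        }
    }
}
```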

Alerts and modeling
Models are applied to incoming data, and the data informs adjustments to the model. When data deviates from the model beyond pre-defined thresholds, the system should trigger either alerts to client processes or online model adjustment.
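A sketch of the threshold check, with the model abstracted as an arbitrary function of time (all names are illustrative; real IS4 models would be physical or statistical):

```java
import java.util.function.Consumer;
import java.util.function.DoubleUnaryOperator;

// Sketch: compare each sample against the model's prediction and fire
// an alert when the deviation exceeds a pre-defined threshold.
public class DeviationAlert {
    private final DoubleUnaryOperator model; // t -> predicted value
    private final double threshold;
    private final Consumer<String> alert;    // client notification hook

    public DeviationAlert(DoubleUnaryOperator model, double threshold, Consumer<String> alert) {
        this.model = model;
        this.threshold = threshold;
        this.alert = alert;
    }

    /** Returns true if the sample deviated beyond the threshold. */
    public boolean observe(double t, double value) {
        double deviation = Math.abs(value - model.applyAsDouble(t));
        if (deviation > threshold) {
            alert.accept("t=" + t + " deviation=" + deviation);
            return true;
        }
        return false;
    }
}
```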

Query language
How do we query the data? If end clients are processes rather than human users, should we still provide a SQL-like language that allows clients to express what they would like from the data without having to express how to get it?

[more later]