Archive for the 'Uncategorized' Category

A love poem by wounded veteran Nguyễn Toàn Thắng

I lay wounded on a hospital bed

In my delirium I shouted the order to charge.

When the delirium passed, my heart was glad

By my bed there was a nurse

Her two hands touched my cheek:

"Does this wound hurt very much?"

With words so gentle and sincere

……

The nurse had an earnest gaze

"Seeing you in pain, I don't know what to do"

Her eyes brimmed over with tears

I admired her, proud of my comrades

In the war on the southwestern border

There were comrades who gave their lives

Through hardship they fought selflessly

And carried me into the bond of comradeship

Whether bombs rained down or bullets poured

In love, comrades had one another

Caring for every pill, every stalk of greens

Tending comrades in place of the mothers who bore them

At the sound of a call, I am there

"Whatever you need, I am here"

"I am the nurse on duty

Minding every heartbeat, the blood I transfuse"

"Your leg is broken, a bullet went through the right side

With this wound the right leg is severed in two"

Fever rising in wave after wave

"I take your temperature, restless at your side"

I came to and quickly called out

Hearing, as in a haze, the sound of your voice

I knew a comrade was beside me

…….

The bond of comrades, of the same age and the same cause

Is a love noble beyond measure

In the trenches we shared rice and water

On the sickbed we bear hardship together

Nguyễn Toàn Thắng

Mother

I will not wait for some day
when Mother is gone, to start up and weep
Do rivers that flow away ever return?
I panic before harsh time
racing madly through my mother's aging years
with each passing day I feel more forlorn
who can hold back time?
who can?
Each day I grow older
Each day Mother grows more frail
A silent journey toward dusk.

I will not wait for some day
when someone pins a rose to my shirt
to realize with a start that I have lost my mother
each passing day is pinning a rose on me
the flower is lovely, so why is my heart afraid?
I left, ten years away from Mother's arms
Living free as a great bird on the wing
I wrote poems for the world and for so many girls
Did I ever write a poem for my mother?


Poems piled high in my soul
suffering, parting, joy and sorrow, happiness
There were feet that trampled cruelly on my heart
yet still I lay awake at night writing poems
I forgot the old doorstep where Mother sat waiting
her aged tears too spent to fall
I was lost in my wandering feet
while Mother's old eyes silently followed behind me
When life's thorns pierced my feet and drew blood
how many passed by
how many stopped?
Why, when Mother lives so far away
has her anxious heart already urged her to come looking
while I stay heedless
while I stay unmoved?

Today…
how many times have I stopped on a familiar street
taken off my hat and stood as a hearse went by
who has lost a mother?
why is my heart afraid
how long until that weeping
is mine?
Let this poem light a dawn
over Mother's life, so many years in darkness
this poem is like a rosebud
that I pin on in advance for the days
to come!

Dedicated to all who are blessed to still have a mother
Đỗ Trung Quân, 1986

20/2/09

It is not often that I have the time and inspiration to sit down and write a blog post like this. Today my dad's friends unexpectedly brought flowers and gifts to celebrate Grandma's longevity, wishing her good health and a long life. I was so happy and touched. Dad has truly wonderful friends. Today the whole family was happy. I too only wish for my grandmother to stay healthy and live to a hundred, to enjoy life with her children and grandchildren. Nothing is difficult; the only fear is a heart that does not persevere. The difficulties are clearly many. But I have to try, at least try my very best, so that Grandma, Mom and Dad, and my younger sibling are happy.

Today it snowed out of the blue. By now spring should be starting. But the last few days have been unusually cold. The snow is falling thick, but it will probably warm up again from tomorrow. I stood outside for 30 minutes watching the snow and waiting for someone. The snow drifting across the streetlight was beautiful. But that person did not come, still sulking over something that was not my fault. I thought of trying to make a snowman, but the snow crumbled too easily. And making one alone is no fun, so I gave up. And yet, by chance, the track where I rolled the snowball along the ground came out in the shape of a heart, which was really special.

How to finish your PhD in a reasonable number of years

My upcoming defense will be almost exactly 4 years to the day after I started this Ph.D. program. This has prompted some reflection on how I’ve managed to get it done. Well, and the upcoming Carnival of GRADual Progress may have made me want to blog something that others might find interesting. Or maybe not. Anyways, what follows is my advice for getting your PhD in the shortest amount of time that still allows some sleep and a wee bit of a life. Unfortunately, not all of the advice is that which I would give my hypothetical future grad students, either because taking short-cuts to minimize time might cut into the quality of their experience or because as a selfish professor, I would want the best possible science to occur. Hence, things I’ve done are getting a :) and things I wouldn’t tell my grad students are getting a :( .

  1. Know – really know – what you want to do before you start your Ph.D. This will hopefully help you avoid losing time to advisor- or project-hopping, or worse, leaving the program partway through. This may be particularly hard if you are coming straight out of undergraduate and the narrow specialization of graduate school seems so constraining. In my experience, the most focused grad students are the ones who have an M.S. or job experience or both. Of course, knowing what you want to do won’t negate the possibility that your project may morph before your eyes partway through. That’s just how science works. :)
  2. Know what programs – and what advisors – have a reputation for graduating students in a timely fashion. There is a big name in my field whose students routinely take 6-7 years to finish, even if they started with an M.S. Avoid people (and places) like that unless you have a really good reason for wanting to work with that one specific person.
  3. Get a fellowship. The single biggest slow-down I see among my fellow students (other than ski season) is having to work or TA all the way through. TAing usually takes ~15 hours a week and, when you are also taking classes, that doesn’t leave much time for research. On the other hand, a fellowship is paying you to do the best possible job of graduate school. And often it is paying you substantially better than you would make as a TA or RA. But getting a fellowship takes some advance planning. Some, like those from NSF, require you to have completed <1 … :)
  4. But if you don’t get a fellowship, get an RA that lets you work on your thesis. An RA is a research assistantship, usually funded by a grant that your advisor got. Basically, you work at least 20 hours per week on a research project. At some schools, almost all students on RA funding are using that project as their thesis, because their advisors specifically got the funding to fund a thesis project. Again, this means you are basically being paid to do the research you have to do to get your degree. At other schools, (insane) rules prevent you from using your RA work in your thesis. This is no better than TAing in terms of time effectiveness. The rules on RAs would be something to look at when applying for programs.
  5. Minimize the number of classes you take. Sure, there are a ton of courses that look interesting and may be helpful someplace down the road. But take what you need to meet your degree requirements and then stop taking classes. This gives you the maximum amount of time to work on your research. If the course listings are just too tempting, ask the professor if you can sit in on the lectures or formally audit. Then do the minimum amount of work necessary to know what’s going on in class. The point of the class is to give you the basics and expose you to a field; you can always learn the fine points on your own when you need them. :) :(
  6. Learn how to just do “enough” in your classes. Yes, the subjects are fascinating and you feel like you should maximize the material you get out of each class, but don’t. If you spend too much time doing all of the readings thoroughly and making sure you understand how to solve every last problem, you’ll never get your research done. The point of the class is to give you the basics and expose you to a field; you can always learn the fine points on your own when you need them. Also, figure out how to relate term papers and class projects to your thesis topic. Not only will it make your term papers easier to write, it may come in handy on your thesis itself.
  7. Don’t over-study for your comps/prelims. Set yourself a limited number of weeks to focus mainly on getting ready for your exams. Don’t allot a whole summer or term; you’ll get burned out and you’ll slow your research progress. I gave myself about a month. As my advisor told me, and his advisor told him: “There’s no way to study for these things, but you can’t not study.” The point is to reinforce the things you already know and to get them back up to the top of your brain so that you will be able to recall (most of) them under pressure. :)
  8. Winter break doesn’t mean a month off and spring break is imaginary. This is one of the biggest differences I’ve noticed between MS and PhD students. When finals end in the fall, the MS students leave and don’t come back until classes restart. Meanwhile, they’ve lost several weeks of time without distractions that they could have been using for research. Actually, I’ve gone home for ~2 weeks most Christmases, but I’ve always brought some finite piece of work that I need to complete as well as a stack of journal articles that I usually ignore. Ask my in-laws, I usually spend at least an hour or two every day of “vacation” getting some work done. :)
  9. Shorter breaks can recharge your batteries. I’ve also taken shorter vacations (like around my anniversary) and refused to bring any work along. And I generally take at least one weekend day completely off. These mini-breaks really help me sustain my enthusiasm, and they also help with that “having a life” thing. :)
  10. Don’t pick a research topic that requires multiple years of data. Long time-series or needing multiple field seasons can really slow you down. Let’s say you need two years of data to do a before/after study and the first year of data is worthless. That means you won’t even have results until after your third year of grad school is completed. And if you do field work, all sorts of natural factors (hurricanes, avalanches, etc.) can obliterate your field site and potential data through no fault of your own. (Unfortunately, I did not do this.) :(
  11. Pick a field site within a few hours of your university/house. Because then when you find that you’ve forgotten a piece of equipment back at the lab, you don’t have to hop an airplane (or pay overnight shipping) to get it. And I guarantee that even if you have the best planned field campaign ever, as you write up your results, you will discover that you could really use a few extra measurements or some nice photos or something. I learned this lesson the hard way during my MS. :(
  12. Don’t pick your grad school location based on the skiing/surfing/rafting season. At least not if you know you’d probably end up participating in the sport more than 1 day per week.
  13. Limit your volunteer commitments. I know that saying this makes me sound like a selfish person, but what I mean is that rather than spending one day a week working at the animal shelter, you might consider spending 1 week per year building an animal shelter in a hurricane-ravaged area. You get an intense experience that helps with #9, makes you feel more altruistic, actually accomplishes something, and doesn’t take as much time. :)
  14. Commit yourself to externally imposed deadlines. Submitting an abstract for a conference is a great way to make sure that you have a chunk of research done a few months later. Going on the job market will sure light the fire for you to finish and defend. Getting pregnant, however, does not do the same trick as it will make you fatigued and less focused. :)
  15. Make to-do lists and set goals. Things that work for me: (1) listing my goals for each month at its beginning, posting them on the whiteboard by my desk, and then revisiting them at the end of the month; and (2) at the end of the day, especially on Fridays, making a list of a few (<6) … :)
  16. When you are not in the mood for writing, make figures. Or similar variations on the theme. Find something that you can be productive at and do that for a day or two until you are ready to write again. Variety is the spice of life. :)
  17. Finally, have a life. Life is too short and too precious and the world is too interesting to do nothing but work. Make friends, have a significant other, have a dog, develop a hobby. Time spent away from school work will make the time spent doing school work more bearable.

Daniel Lemire’s blog, My research process

My research process

Filed under: Academia/Research, Favorite - Daniel Lemire @ 10:01

One thing you never read about is how people do research in their mind. People do describe how to write papers, how to get an academic job, but somehow, I cannot recall anyone describing their thought process.

Mine is simple enough. It includes both theoretical and experimental work. So here it is…

  • I usually start with a specific problem. This problem must be about something significant: a few people worldwide might want to know about the solution. It must be sufficiently narrow that I can address it in a few months. I try to apply Turney’s principle: be ambitious. In other words, it should not be obvious when I begin that I will succeed. Yes, this means that I do not know I will be able to write a paper at the end! And yes, this means that I sometimes fail. Ideally, I pick a problem so original that I am the only one working on it, worldwide. Almost invariably, the nicest problems take one of the following forms: 1) I want to explain theoretically something I observe experimentally 2) I want to improve on an existing method by at least an order of magnitude (in accuracy, simplicity, speed). Merely aiming to improve an existing approach by a small amount is something I avoid, if only because I know that given enough time, I can always hope to improve any technique by a tiny amount. There is no challenge, no surprise, no risk of failure!
  • A good problem is such that I can then process it down to at least one simple conjecture. A simple conjecture is one that I can realistically hope to make progress on within a few days or a few hours. Sometimes I verify the conjecture experimentally, sometimes theoretically, it does not matter. I avoid working on several small conjectures at the same time: I try to handle them one at a time. Sometimes, the result of my work on a conjecture will be another conjecture. Sometimes these conjectures turn out to be silly, in retrospect.
  • Once I have processed the first simple conjecture, I try to come up with other ones that will bring me closer to a solution to my problem. Always picking the next most promising one.
  • Very often, I will give up on a problem or the problem will change drastically over time. Or the problem will generate worthwhile subproblems. At any given time, I have about a dozen different problems on my radar, but only about 2 or 3 active ones, and only about 2 or 3 conjectures I am working on.

Collaboration messes up this process because I no longer control the overall problem. But I will still decompose the problem into conjectures that I take one at a time. One benefit of working with someone else is that you have someone who will read and check your conjectures. You can also check someone else’s conjecture, which is refreshing. You are also much less likely to make crucial mistakes in the process if you work with others (especially if your collaborators are any good).

To a large extent, my process does not rely on brilliant insights or luck. I merely grind the problem slowly, each time approaching closer and closer to the solution (hopefully). I do not care about making mistakes. I am very, very often wrong. In the past, I have wasted months working on useless problems, generating useless conjectures: this tends to happen more frequently if I work alone.

What makes me more productive, mostly, are nice problems. Often, picking the small conjectures is rather simple: after all, I do not need to be right, I just need to grind at the problem. If there is any talent involved at all in my process, it has to do with how I pick the overall problem. But even then, I think that passion matters more than talent. The more I care about the problem, the faster I make progress. And more importantly, the happier I am as I work.

Funding opportunities, networking, fame and fortune play no role in the above process. At no point do I worry about what others will think except maybe when I pick the overall problem. And even then, I only check, in my mind, that a few people will care, enough that some journal will publish it, eventually. This egocentric process is probably suboptimal. However, my overarching goal is not to be famous, but rather to enjoy myself and get paid in the process. This is not to say I do not care about my peers: I want to earn their respect.

I can sometimes offload some of the conjectures to people working on my projects. However, my process does not scale up very well. I can work in small teams (2 or 3 people), but I could not run a large laboratory (10 people or more) with the above process. I am more of a craftsman than a tycoon.

Daniel Lemire’s blog

How to become smarter

Filed under: Academia/Research, Favorite - Daniel Lemire @ 8:22

  • Work on projects you love doing, even if only part of the time. You can only be as smart as you are motivated. I will never be a smart electrician.
  • Reading and learning are important, but people learn by doing, by tinkering.
  • Carry a notebook or a PDA, and use it to record ideas. Periodically discard most of your ideas.
  • Having a blog can’t hurt.
  • This is probably the most important point: hang around with smart people. If you live among monkeys, you might have a good life, but you will not earn a Ph.D. (except if you are studying monkeys!). Happily, you can easily hang around with smart people wherever you live thanks to the Internet. This is important because if you hang around with people who do great work, you will be motivated by emulation: nobody likes to feel like a loser among his peers.
  • Push yourself: try daring projects and learn to fail. Be ambitious! Do not waste your time with things you know how to do well. Go beyond. Aim as high as you can, while trying to stay on track.
  • Context is important when solving problems. I found that offices are nearly the worst place to work for me. My home office is much better. Sometimes, a coffee place can be a decent alternative office (presumably because of the white noise effect). Sometimes, using a pen is better than a keyboard. Sometimes, working with a laptop in your bed is better than working on a desk. Change, try new contexts!
  • Come back to important projects regularly. Do not get lost in the small stuff.
  • Urgency is an important factor. Somehow, being too happy about what you achieved can slow you down. This suggests that you should be critical of your own work, and you should not underestimate your competitors. Of course, you need to stay motivated, so do not overestimate your competitors or underestimate your own work either!
  • Omega-3 is good for you and might make you smarter. Eating fish seems like a good idea.
  • When you are tense, eat carbs (bread, cookies). Do not make things worse by drinking coffee.
  • Too much coffee tends to get your mind to speed up and you lose focus easily. You end up getting many things done, but you no longer have time for thinking about the hard problems.
  • When you need energy, eat proteins (cheese, meat, beans). Coffee alone will only help you temporarily, it does not get you through a lot of hard work.
  • Drink a lot of water: after all, your brain is mostly water.
  • Sleep a decent amount. Some people claim sleep-deprivation allows them to get more done, and it might be true, and I do not know of any evidence that sleep-deprivation hurts your brain, but being sleepy does slow you down and tends to get you to work on routine problems.
  • Taking long walks (at least 20 minutes) out in a quiet park, thinking about some deep issues, tends to set me up for good work for the rest of the day.

See also my post My research process.

For further reading and scientific evidence, read my posts Physical factors making you smarter: white noise, carbohydrates, music, alcohol, and coffee? and Thinking intelligence is innate makes you stupid.

Daniel Lemire’s blog

Understanding what makes database indexes work

Filed under: Data Warehousing and OLAP, Favorite, Science and Technology - Daniel Lemire @ 11:56

Why do database indexes work?

In a previous post, I explained that only two factors make indexing possible:

  • your index expects specific queries
  • or you make specific assumptions about the data sets.

In other cases, you are better off just scanning the entire data set.

What makes database indexes work?

As far as I know, there are only 6 strategies that make indexes work. By combining them in different ways, we get all of the various existing schemes. (I would love to hear your feedback on this claim!)

1. You expect specific queries: restructure your data!

Suppose you know ahead of time that you will only need to select some of the elements in your data set. Then you can tailor an index for such queries and thus avoid scanning much of the content. For example, an inverted index in full-text search will select which documents contain the various keywords. Instead of working with all documents, you will only worry about the ones matching at least one keyword. Indexing a column with a B-tree or a hash table is another scenario where you try to immediately go to the relevant rows in a table.

Of course, if I look for all documents containing the words “the” and “will”, and want to know how many there are and what is their average length, such a form of indexing will not help.
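
To make the inverted index idea concrete, here is a minimal sketch in Python. The function names (build_inverted_index, search_any) and the toy documents are made up for illustration, and the tokenizer is just a whitespace split, so treat it as a sketch rather than how a real search engine does it.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_any(index, keywords):
    """Return ids of documents matching at least one keyword."""
    matches = set()
    for word in keywords:
        # .get avoids inserting empty entries for unknown words
        matches |= index.get(word.lower(), set())
    return matches

# Hypothetical toy collection
documents = [
    "the index will help",
    "scan the raw data",
    "a hash table or a B-tree",
]
index = build_inverted_index(documents)
```

A query for a rare word such as "hash" touches a single document, whereas a query for "the" and "will" touches most of the collection, which is the case where this kind of index stops paying off.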

2. You expect specific queries: materialize them!

Another commonly used strategy is view materialization. If 10% of all visitors on Google type in the word “sex”, they might as well precompute the result of the query. In Business Intelligence, if you can expect your users to mostly care about results aggregated over weeks, months or years, it makes sense to precompute these values instead of always working from the raw data. Alternatively, you can materialize intermediate elements that are needed to compute your results. For example, even if people do not need data aggregated per day, precomputing it might be useful for computing weekly numbers faster.

This form of indexing tends to work well to address the most popular queries, but it fails when people have more specific needs.
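
Here is a small Python sketch of materialization, assuming raw data arrives as (day, amount) pairs. The names materialize_daily and weekly_totals and the sample sales figures are hypothetical; the point is only that the weekly numbers are computed from the precomputed daily view instead of the raw rows.

```python
from collections import defaultdict
from datetime import date

def materialize_daily(rows):
    """Precompute total sales per day from raw (day, amount) rows."""
    daily = defaultdict(float)
    for day, amount in rows:
        daily[day] += amount
    return dict(daily)

def weekly_totals(daily):
    """Roll up the daily view into ISO (year, week) totals."""
    weekly = defaultdict(float)
    for day, total in daily.items():
        iso = day.isocalendar()  # (ISO year, ISO week, weekday)
        weekly[(iso[0], iso[1])] += total
    return dict(weekly)

# Hypothetical raw data
raw_sales = [
    (date(2009, 2, 16), 10.0),
    (date(2009, 2, 16), 5.0),
    (date(2009, 2, 20), 2.5),
    (date(2009, 2, 23), 4.0),
]
daily_view = materialize_daily(raw_sales)
weekly_view = weekly_totals(daily_view)
```

The weekly roll-up only reads one entry per day, however many raw rows each day contains.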

3. You expect specific queries: redundancy is (sometimes) your friend

When you do not know exactly which queries to expect, you can try to index the data in different ways, for different queries. For example, you could both use a B-tree and a hash table, and determine at query time which is the best evaluation strategy. You might even determine that the best way is to forgo the indexes and scan the raw data!
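
As a rough illustration in Python, the sketch below keeps two indexes over the same column and picks one at query time. The class name RedundantIndex is invented for this example, and a sorted list searched with bisect stands in for a B-tree, which is an assumption made to keep the code short.

```python
import bisect
from collections import defaultdict

class RedundantIndex:
    """Keep a hash index and a sorted index over the same values."""

    def __init__(self, values):
        self.values = list(values)
        # Hash index: value -> list of row ids
        self.by_value = defaultdict(list)
        for row_id, value in enumerate(self.values):
            self.by_value[value].append(row_id)
        # Sorted index: (value, row id) pairs, standing in for a B-tree
        self.sorted_pairs = sorted((v, i) for i, v in enumerate(self.values))
        self.sorted_keys = [v for v, _ in self.sorted_pairs]

    def equal_to(self, value):
        """Equality queries go to the hash index."""
        return sorted(self.by_value.get(value, []))

    def between(self, low, high):
        """Range queries (inclusive bounds) go to the sorted index."""
        start = bisect.bisect_left(self.sorted_keys, low)
        stop = bisect.bisect_right(self.sorted_keys, high)
        return sorted(i for _, i in self.sorted_pairs[start:stop])

# Hypothetical column of values
idx = RedundantIndex([30, 10, 20, 10, 40])
```

The price of the redundancy is that both structures must be stored and kept up to date.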

4. Use multiresolution!

Suppose that you look for specific images, but you may still need to scan 50% of them. An index that would point you to only the relevant images might not be effective. Instead, you should try to quickly discard the irrelevant candidates. What you could do is create thumbnails (low resolution images). Then you can dismiss quickly the images that are obviously not a good match. Naturally, you can have progressively finer resolutions.

Database indexes often bin values together. For example, you could bin all workers earning between $10,000 and $30,000, then all workers earning between $30,000 and $50,000, and so on. If you are looking for workers earning between $40,000 and $45,000, you can first find all workers that are in the $30,000-$50,000 bin, and then look up their actual salaries, one by one. You can adapt the bins either to the data distribution or to the types of queries you expect.
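
The salary bins can be sketched in a few lines of Python. I assume here that bins are half-open (a worker earning exactly $30,000 falls in the $30,000-$50,000 bin); the names BIN_START, BIN_WIDTH, bin_of, build_bins and workers_between, as well as the sample salaries, are made up for the example.

```python
from collections import defaultdict

BIN_START = 10_000  # lower edge of the first bin (assumption)
BIN_WIDTH = 20_000  # $10,000-$30,000, $30,000-$50,000, and so on

def bin_of(salary):
    """Return the bin number of a salary."""
    return (salary - BIN_START) // BIN_WIDTH

def build_bins(salaries):
    """Map each bin number to the ids of the workers falling in it."""
    bins = defaultdict(list)
    for worker_id, salary in enumerate(salaries):
        bins[bin_of(salary)].append(worker_id)
    return bins

def workers_between(salaries, bins, low, high):
    """Workers with low <= salary <= high; only candidate bins are inspected."""
    result = []
    for b in range(bin_of(low), bin_of(high) + 1):
        for worker_id in bins.get(b, []):
            # Coarse bin matched: now check the actual salary
            if low <= salaries[worker_id] <= high:
                result.append(worker_id)
    return sorted(result)

# Hypothetical salaries, indexed by worker id
salaries = [12_000, 41_000, 48_000, 30_000, 44_500, 75_000]
bins = build_bins(salaries)
```

For the $40,000 to $45,000 query, only the four workers in the $30,000-$50,000 bin have their salaries looked up.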

For more examples, see my post How to speed up retrieval without any index?.

5. Your data is not random: compress it!

Most real-world data is highly compressible. By compressing the data, you can make it so that your CPU and IO subsystem process less data. However, you have to worry about bottlenecks. Too much compression may overload your CPU. Too little compression and most time will be spent in loading the data from disk. Two techniques are often combined to get good results out of compression: sorting and run-length encoding.
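
A tiny Python sketch shows why sorting and run-length encoding go well together. The helper names and the sample column are hypothetical, and a real column store would encode the runs far more compactly than a list of tuples.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Hypothetical column of country codes
column = ["FR", "US", "FR", "CA", "US", "FR"]
unsorted_runs = run_length_encode(column)
sorted_runs = run_length_encode(sorted(column))
```

On the unsorted column every value is its own run, so nothing is gained; after sorting, the six values collapse into three runs.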

6. In any case: optimize your code

You should be using cache-aware and CPU-aware indexes. Be aware that comparing two bits together may take nearly as long as comparing two integers. Be aware that jumping all over the place (as in a B-tree) takes longer than processing the data by tiny chunks.

Daniel Lemire’s blog

Write good papers

Daniel Lemire @ 22:02

So, you want to write a good paper? The most amusing reference is E. Robert Schulman, How to Write a Scientific Paper, Annals of Improbable Research, Vol. 2, No. 5, pg. 8.

For a more serious methodology, follow these steps.

1. Picking a topic, an idea

My friend Peter Turney has a key piece of advice: be ambitious. Imagine each new paper you write as a lasting reference for your peers. Do not merely aim to get your papers accepted. Aim to have a lasting impact on your field.

We should learn something new. You have to challenge your readers and yourself!

I know of three strategies to write an ambitious paper:

  • Pick a new problem nobody has worked on. Define the problem and be the first to propose a solution. This is the best way to get highly cited and become famous.
  • Try to explain something significant nobody has managed to explain.
  • Improve by at least an order of magnitude what others have done.

2. Before you ever pick up your pen…

  • What is your message? What point are you making? Most papers should make a single point.
  • Why is this message important? Why should the reader take his precious time to read your paper?
  • How are you going to make your point? What experiments can you run? What theorems can you prove?
  • Has this point been made before? How is your contribution different from what has been said a thousand times before?

3. What a good paper should contain

  • A sexy start: tell the reader early why he should read your paper. Don’t summarize, sell! A good abstract answers the question why should I read this paper?, it does not summarize the paper. Convince us early that your paper is important. Starting out the paper with a punch line is important.
  • You should clearly say what your contribution is. Reviewers are lazy, they do not want to have to figure out what your message is. Spend some time telling us exactly what your contribution is. Spell it out, do not assume we will read the paper carefully.
  • A review of related work: demonstrate that you know all of the related work and that you can relate your own contribution to it.
  • A large reference section: people like to be cited, so make sure you cite every paper that might have some relevance.
  • Experimental evidence: you need to confront your idea with the real world and report on how well it fares. Explicitly compare your results with the best results elsewhere.
  • Relevant and non-obvious theoretical results: it is easier for people to build on your work if there is some theory and it helps give people confidence in your work.
  • Pictures! Really, even if you feel silly doing it or that you think you can’t draw. A picture can help tremendously in communicating difficult ideas.
  • Original examples over original data sets.
  • A conclusion telling us about future work and summarizing (again) the strong points of the paper.

5. What a good paper should not contain

  • Weak unnecessary results: if you derived ten theorems but only one is necessary, throw the rest of them in your drawers. I do not want to know about useless results!
  • Technical details: technical papers made of several small ideas are usually not interesting.

6. Good pedagogy and style

  • Use strong verbs (replace “we made use of categorization” by “we categorized”).
  • Always give the example first, and the result next.
  • Use as few parentheses, footnotes and bold characters as you can.
  • Use a spell checker. Just do it.
  • Use a tool such as style-check.rb to check for verbose phrases and other common mistakes.
  • Learn about and use unbreakable spaces.
  • Do not use negations…
  • Avoid UA (useless acronyms).
  • DUAT: Do not use acronyms in titles.
  • Your writing will be in an active voice… (hint: avoid the verb “to be”)
  • Avoid carefully needless words.
  • Employ uncomplicated terms.
  • Learn to use the em-dash — it is a good friend.
  • Short sentences — no more than 15 words — are better.

7. Run through this check list before submission

  • Are section headers consistent with respect to case? (“Our Methodology” versus “Our algorithm”)
  • Do the figures look nice? Are the fonts large enough for easy browsing? Are they readable once printed out in black-and-white? Can we see any compression artifacts?
  • If the page limit is x pages, is your paper x pages long?
  • Do you have at least one figure?
  • Is the layout of each page elegant?
  • Do you have widows or orphans?
  • Did you spell check?
  • Do you have a step-by-step toy example for every new algorithm being introduced? Present your examples early.
  • Are all equations arithmetically correct?
  • Can you replace some mathematical notation by plain English?
  • Are all terms defined?
  • Is the mathematical notation consistent? (If you use t for time in the first section, do you use t to denote the term in the second section?)
  • Are the title and the abstract geared toward making the paper attractive?
  • Do you summarize your contribution in the introduction?
  • Is the bibliography consistent? (If you abbreviate first names once, do it all the way through. If you have page numbers once, have page numbers throughout.)
  • Is the spelling of all proper names correct? You would hate to get your paper reviewed by someone who would find his name misspelt in your paper.
  • Are the captions correct? Do you put the table caption before or after the table? Do you put the figure caption before or after the figure? Do you center captions or not?
  • Do you refer to a figure as “Fig. 1” or as “Figure 1”? Which one is correct?
  • Are all internal references correct? If you refer to Fig. 10, does Figure 10 exist? (Some LaTeX packages can mess this up, so always check!) Are all tables and figures referenced in the text?
  • If this is a recurring conference or a journal, have you compared your paper with ten or so other articles to make sure that yours is consistent with how these other papers look and feel?
  • Do you use the right fonts? Be watchful: sometimes the font for the section header can differ from the font used in the main text.

8. How to write more than one good paper

Write daily for at least 15 to 30 minutes, ideally two hours. Studies show this is the key to becoming a prolific writer.

Daniel Lemire’s blog

Good journals?

I don’t claim to be able to tell anyone where to publish. This page is just a bunch of approximate facts about some journals that are of interest to me. These facts alone cannot determine where you should publish and they are biased. The ordering or coloring is not an indication of the value of the journal. Please do comment on this table (see below for comment form).

It seems like the Eigenfactor journal ranking site is a good source.

| Journal | Impact Factor | Google Scholar (*) | Google Scholar (***) | Listed in DBLP | ACM DL | Free online proceedings |
|---|---|---|---|---|---|---|
| Data Mining and Knowledge Discovery | 2.08 | 984 | 984 | yes | yes | no (Springer) |
| SIGMOD Record | 0.76 | 428 | 428 | yes | yes | no (ACM) |
| IEEE Transactions on Pattern Analysis and Machine Intelligence | 1.54 | 489 | 159 | yes | yes | no (IEEE) |
| IEEE Transactions on Visualization and Computer Graphics | 1.53 | 150 | 150 | yes | yes | no (IEEE) |
| IEEE Transactions on Information Theory | - | 208 | 208 | yes | ? | no (IEEE) |
| Machine Learning Research | - | 202 | 202 | yes | yes | yes (MIT Press) OPEN ACCESS |
| IEEE Transactions on Signal Processing | - | 387 | 68 | no | no | no (IEEE) |
| IEEE Computer | 1.15 | 452 | 52 | yes | no? | no (IEEE) |
| VLDB Journal | 1.81 | 132 | 123 | yes | yes | no (Springer) |
| ACM Transactions on Database Systems | 0.875 | 143 | 143 | yes | yes | no (ACM) |
| IBM Systems Journal | 0.48 | 130 | 130 | yes | yes | yes |
| IEEE Transactions on Knowledge and Data Engineering | - | 121 | 121 | yes | yes | no (IEEE) |
| Software – Practice and Experience (SPE) | 0.57 | 195 | 118 | yes | yes | no (Wiley) |
| Knowledge and Information Systems (KAIS) | 0.57 | 93 | 93 | yes | yes | no (Springer) |
| ACM Transactions on Information Systems (TOIS) | 5.059 | 97 | 97 | yes | yes | no (ACM) |
| International Journal of Intelligent Information Systems | 0.17 | 90 | 90 | yes | yes | no (Springer) |
| Data and Knowledge Engineering | 0.697 | 70 | 70 | yes | no | no (Elsevier) |
| Journal of the American Society for Information Science and Technology | - | 56 | 56 | yes | yes | no (Wiley) |
| Computer Methods in Applied Mechanics and Engineering | - | 67 | 28 | no | yes | no (Elsevier) |
| Journal of Computational and Applied Mathematics | 0.486 | 44 | 44 | no | yes | no (Elsevier) |
| Signal Processing | - | 51 | 30 | no | yes | no (Elsevier) |
| Computers and Mathematics with Applications | 0.413 | 32 | 32 | no | no (?) | no (Elsevier) |
| Applied Numerical Mathematics | 0.4 | 44 | 15 | no | yes | no (Elsevier) |
| Information Processing Letters | 0.58 | 39 | 39 | yes | only 1987 | no (Elsevier) |
| Acta Informatica | 0.341 | 56 | 44 | yes | yes | no (Springer) |
| Information Retrieval | 0.61 | 36 | 22 | yes | yes | no (Springer) |
| Information Sciences | 0.23 | 31 | 15 | yes | yes | no (Elsevier) |
| Nordic Journal of Computing | 0.80 | 22 | 15 | yes | yes | no |
| International Journal of Computer Mathematics | 0.254 | 40 | 6 | yes | no | no (Taylor & Francis) |
| Journal of systems and software | 0.592 | 27 | 27 | yes | yes | no (Elsevier) |
| Advances in Computational Mathematics | 1.143 | 20 | 14 | yes | yes | no (Springer) |
| Applied Mathematics Letters | 0.414 | 35 | 2 | yes | only 1988 | no (Elsevier) |
| Informatica (Lithuanian Academy of Sciences) | - | 32 | 32 | yes | no | yes OPEN ACCESS |
| SIGKDD Exploration | - | 11 | 1 | yes | no | yes (ACM) |
| Discrete Mathematics & Theoretical Computer Science | 0.45 | 7 | 7 | yes | no | yes |
| International Journal of Business Intelligence and Data Mining (IJBIDM) | - | 0 | 0 | no | no | no (Inderscience) |
| ACM Transactions on Algorithms | - | 1 | 1 | no | yes | no (ACM) |
| ACM Transactions on Knowledge Discovery in Data | - | - | - | no | yes | no (ACM) |
| ACM Transactions on the Web | - | - | - | no | yes | no (ACM) |
| ACM Transactions on Storage | - | - | - | no | yes | no (ACM) |
| International Journal of Data Warehousing and Mining | - | 0 | 0 | yes (but no paper added yet) | no | no (Idea Group) |
| Applied Mathematical Sciences | - | 0 | 0 | no | no | no (Hikari Ltd) |
| Journal of Informetrics | - | 0 | 0 | no | no | no (Elsevier) |
| Applied Mathematics and Computation | - | - | - | yes | - | no (Elsevier) |

Interesting open access journals:

Other journals to include:

(*) Maximal number of citations for a paper printed after 2002. This is an indication of the size of the community.

(***) Maximal number of citations for a paper printed after 2003.


Data for Database Research

Sometimes, it can be hard to find just the right time series for testing a new algorithm.

My own stuff

I have my own data repository. Quite small!

Hunt and Kill Terrorists

Geology

Astronomy

Motion Capture

Biomedical

e-Commerce

Financial

Meteorological

Climate and environmental data

Motion

Blog

OCR

Collaborative Filtering

Text and XML

Web Monitoring

Web Graphs

Sounds

Voice

Web 2.0

(Where people share data sets freely.)

Time Series (Various)

Various


Not really data, but useful nonetheless:

Synthetic data generator

General documentation on Time Series theory

General Time Series software

  • TDDTool is great time series plotting software
